A comment about terms used in EP5-A3 and bias

December 11, 2014



I have the new version of EP5-A3, which is CLSI’s document about precision. Having been kicked out of CLSI, I was loath to buy it, but if one consults on evaluating assays, it’s required.

One note on terminology as I read through the document: the term “total precision” has been dropped (this was in the A2 version as well) and replaced with either “within laboratory precision” or “within device precision.”

All three terms have issues – the replacement does not solve these issues. The problem is that whichever term one is using does not account for all sources of error, which is implied in the terms. In an experiment such as EP5, the goal is to randomly sample sources of imprecision from the population of interest. Take reagents for example. The study may use one reagent or in many cases in industry – three or more reagents. But these reagents are not a random sample from the population of reagents – that’s of course impossible, because for a new assay, there are often only a few reagents that have been made and future reagents don’t exist. Are future reagents the same? That’s hard to say as raw materials change, vendor and manufacturing procedures change, QC procedures for approving lots change, personnel change, and so on.

The same could be said for the 20 days. Say the assay’s projected life is 10 years. One cannot randomly select 20 days from all future 20 day sequences in the 10 years – one is stuck with the 20 days that are current.

Formally, these are forms of bias and thus the EP5 protocol is biased. This is not some bad, deliberate bias – it is unavoidable bias, but bias nevertheless.

So in reality, the EP5 experiment is estimating precision based on the error sources that are allowed to be in the experiment. Whatever term is used: “total precision”, “within laboratory precision” or “within device precision”, it is likely that precision has been underestimated.
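To make the underestimation concrete, here is a minimal simulation sketch. It is not the EP5 analysis itself (EP5 uses a nested analysis of variance); it simply takes the overall SD of all simulated results. The SD values for the lot, day, and within-day effects, the concentration of 100, and the function name `simulate_results` are all assumptions picked for illustration.

```python
import random
import statistics

def simulate_results(n_lots, n_days, reps, sd_lot, sd_day, sd_within, rng):
    """Simulate results at one concentration with lot, day, and replicate effects."""
    results = []
    for _ in range(n_lots):
        lot_effect = rng.gauss(0.0, sd_lot)
        for _ in range(n_days):
            day_effect = rng.gauss(0.0, sd_day)
            for _ in range(reps):
                results.append(100.0 + lot_effect + day_effect + rng.gauss(0.0, sd_within))
    return results

rng = random.Random(1)

# Hypothetical standard deviations, chosen only for illustration
SD_LOT, SD_DAY, SD_WITHIN = 3.0, 1.5, 2.0

# A study with one reagent lot, 20 days, 4 results per day
one_lot = simulate_results(1, 20, 4, SD_LOT, SD_DAY, SD_WITHIN, rng)

# Many lots: possible in a simulation, not for a new assay
many_lots = simulate_results(500, 20, 4, SD_LOT, SD_DAY, SD_WITHIN, rng)

sd_one_lot = statistics.stdev(one_lot)      # lot-to-lot variation never enters
sd_many_lots = statistics.stdev(many_lots)  # lot-to-lot variation is sampled
true_sd = (SD_LOT**2 + SD_DAY**2 + SD_WITHIN**2) ** 0.5

print(f"one lot: {sd_one_lot:.2f}, many lots: {sd_many_lots:.2f}, true: {true_sd:.2f}")
```

With a single lot, the lot effect is a constant offset, so it cannot show up in the SD no matter how many days are run.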

More glucose fiction

December 1, 2014


In the latest issue of Clinical Chemistry, there are two articles (1-2) about how much glucose meter error is ok and an editorial (3) which discusses these papers. Once again, my work on this topic has been ignored (4-12). Ok, to be fair not all of my articles are directly relevant but the gist of my articles and particularly reference #10 is that if you use the wrong model, the outcome of a simulation is not relevant to the real world.

How are the authors’ models wrong?

In paper #1, the authors state: “The measurement error was assumed to be uncorrelated and normally distributed with zero mean…”

In paper #2, the authors state: “We ignored other analytical errors (such as nonlinear bias and drift) and user errors in this model.”

In both papers, the objective is to state a maximum glucose error that will be medically ok. But since the modeling omits errors that occur in the real world, the results and conclusions are unwarranted.

Ok, here’s a thought people – instead of simulations based on the wrong model, why not construct simulations based on actual glucose evaluations? An example of such a study is: Brazg RL, Klaff LJ, Parkin CG. Performance variability of seven commonly used self-monitoring of blood glucose systems: clinical considerations for patients and providers. J Diabetes Sci Technol. 2013;7:144-152. Given sufficient method comparison data, one could construct an empirical distribution of differences and randomly sample from it.
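A rough sketch of that idea follows. The paired values are hypothetical numbers made up for illustration, not data from the Brazg study or any other evaluation, and working in percent differences (rather than absolute differences) is an assumption. The point is only that the simulated error is drawn from what was observed, outliers included, rather than from an assumed Normal distribution.

```python
import random

# Hypothetical paired results (reference, meter) in mg/dL; NOT real study data
paired = [(95, 99), (110, 104), (150, 162), (200, 191), (250, 263),
          (80, 78), (130, 171), (300, 287), (180, 183), (60, 66)]

# Empirical distribution of relative differences, with nothing trimmed
pct_diffs = [(meter - ref) / ref for ref, meter in paired]

def simulate_meter_reading(true_glucose, rng):
    """Draw one simulated meter result by resampling an observed difference."""
    return true_glucose * (1.0 + rng.choice(pct_diffs))

rng = random.Random(0)
simulated = [simulate_meter_reading(120.0, rng) for _ in range(1000)]

print(f"min {min(simulated):.1f}, max {max(simulated):.1f} mg/dL for a true value of 120")
```

A real version would need far more pairs than this, and probably separate distributions by glucose range, since meter error often depends on concentration.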

And finally, I’m sick of seeing the Box quote (reference 3): “Essentially, all models are wrong, but some are useful.” Give it a rest – it doesn’t apply here.


  1. Malgorzata E. Wilinska and Roman Hovorka. Glucose Control in the Intensive Care Unit by Use of Continuous Glucose Monitoring: What Level of Measurement Error Is Acceptable? Clinical Chemistry, 2014;60:1500-1509.
  2. Tom Van Herpe, Bart De Moor, Greet Van den Berghe, and Dieter Mesotten. Modeling of Effect of Glucose Sensor Errors on Insulin Dosage and Glucose Bolus Computed by LOGIC-Insulin. Clinical Chemistry, 2014;60:1510-1518.
  3. James C. Boyd and David E. Bruns. Performance Requirements for Glucose Assays in Intensive Care Units. Clinical Chemistry, 2014;60:1463-1465.
  4. Jan S. Krouwer: Wrong thinking about glucose standards. Clin Chem, 2010;56:874-875.
  5. Jan S. Krouwer and George S. Cembrowski A review of standards and statistics used to describe blood glucose monitor performance. Journal of Diabetes Science and Technology, 2010;4:75-83.
  6. Jan S. Krouwer: Analysis of the Performance of the OneTouch SelectSimple Blood Glucose Monitoring System: Why Ease of Use Studies Need to Be Part of Accuracy Studies. Journal of Diabetes Science and Technology, 2011;5:610-611.
  7. Jan S. Krouwer: Evaluation of the Analytical Performance of the Coulometry-Based Optium Omega Blood Glucose Meter: What Do Such Evaluations Show? Journal of Diabetes Science and Technology, 2011;5:618-620.
  8. Jan S. Krouwer: Why specifications for allowable glucose meter errors should include 100% of the data. Clinical Chemistry and Laboratory Medicine, 2013;51:1543-1544.
  9. Jan S. Krouwer: The new glucose standard, POCT12-A3 misses the mark. Journal of Diabetes Science and Technology, 2013;7:1400-1402.
  10. Jan S. Krouwer: The danger of using total error models to compare glucose meter performance. Journal of Diabetes Science and Technology, 2014;8:419-421.
  11. Jan S. Krouwer and George S. Cembrowski: Acute Versus Chronic Injury in Error Grids. Journal of Diabetes Science and Technology, 2014;8:1057.
  12. Jan S. Krouwer and George S. Cembrowski. The chronic injury glucose error grid. A tool to reduce diabetes complications. Journal of Diabetes Science and Technology, in press (available online)

How the journal Clinical Chemistry has become elitist

November 21, 2014


At a recent AACC dinner meeting, I heard an interesting talk by Nader Rifai, the editor of Clinical Chemistry. About halfway through his talk, I remembered an event that took place a couple of years ago, so I asked him a question after his talk ended. My question and Nader’s responses went something like this:

Me: “A while ago, I read a commentary article that I didn’t agree with and submitted a Letter to the editor about it. The response from the journal was…”
Rifai: “It wouldn’t be reviewed because it wasn’t about an original article, right?”
Me: “Yes, that’s right, then I looked at a few issues and saw that the percentage of original articles is only about 50% of the journal. This means that one can’t comment about a large portion of the journal.”
Rifai: “Well, we were seeing Letters to the editor about other Letters to the editor and with commentary articles it is common that many people won’t have the same opinion as the author, so we don’t want to fill up the journal with such stuff.”

This is roughly what I remember. It is not verbatim, but that is the gist of it.

So basically, Rifai is putting Letters to the editor into a generic category similar to junk mail or the endless comments associated with Twitter or a blog and at the same time giving immunity to authors – other than those who write original articles – from any kind of comment.

But the problem is that commentary articles in Clinical Chemistry are about science and if the authors get the science wrong, it is a mistake to prevent people from pointing that out. That is unscientific / elitist. Perhaps contributing to this elitism is Rifai’s remark that articles in Clinical Chemistry are of high quality due to the extensive review process. But an extensive review process doesn’t guarantee correctness.

And Clinical Chemistry has changed its policy. I commented briefly on this topic before in this blog. My 2010 Letter to the editor about a “Question and Answer” type article was published. Moreover, I think my 2010 Letter had a role in shaping glucose meter standards but these days the Letter would not have been considered.

So now I have less interest in reading Clinical Chemistry.

A1c Result Reliability – Not!

November 20, 2014

A recent article in Clin Chem – available without subscription – purports to show the result reliability of different A1c assays (1).

The basic premise of this paper is that given:

  • total error goals
  • a QC program
  • a study to estimate imprecision (CLSI EP5) and average bias (CLSI EP9)

one can determine the risk of reporting unreliable results.

This is simply not true. I have shown before – see ref 2 for the most recent – that the Westgard model of total error = (a multiple of imprecision + average bias) is incomplete and typically underestimates the true amount of error.

Thus, the authors’ risk of reporting unreliable results is itself unreliable and probably underestimates things because:

  • There is no information about interfering substances, not even a list of the standard errors of the estimate from the regressions, which would provide some information about this error source.
  • One can assume that one reagent was used. Yet lot-to-lot reagent error is usually the largest component of error in an assay. Hence, this error source is inadequately measured.
  • One does not know if the people that ran the study are representative of people who routinely run the assay – important since user error is often a significant source of error.

And finally, the use of one set of total error goals is questionable. If some results fail the total error goal, one wants to know if they just fail or if they are way out because just as error can be small or large, so can the resulting patient harm. Studies of the type in the paper can’t really help here because they use one Normal distribution. But in the real world, errors tend to come from different sources (distributions) so the risk of large errors is completely unknown.
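Here is a small sketch of why one Normal distribution says so little about large errors. All numbers are hypothetical: a routine error distribution built from an average bias and an SD, plus an assumed rare error source (say, an interference) affecting 1% of results. The calculation uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# All numbers are hypothetical, in percent error, for illustration only
routine = NormalDist(mu=1.0, sigma=2.0)   # average bias 1%, imprecision (SD) 2%
rare = NormalDist(mu=15.0, sigma=5.0)     # assumed rare error source, e.g. an interference
RARE_FRACTION = 0.01                      # assumed: 1% of results are affected

def fraction_beyond(limit, dist):
    """Fraction of errors with absolute value greater than `limit` under `dist`."""
    return dist.cdf(-limit) + (1.0 - dist.cdf(limit))

def mixture_fraction_beyond(limit):
    """Same fraction when a small share of results comes from the rare source."""
    return ((1.0 - RARE_FRACTION) * fraction_beyond(limit, routine)
            + RARE_FRACTION * fraction_beyond(limit, rare))

for limit in (6.0, 12.0, 20.0):
    print(f"limit +/-{limit:.0f}%: one Normal {fraction_beyond(limit, routine):.2e}, "
          f"mixture {mixture_fraction_beyond(limit):.2e}")
```

Under these assumptions the single Normal predicts that errors beyond the wider limits essentially never happen, while the mixture puts them at a rate that is small but many orders of magnitude higher. The rare source barely changes the average bias or SD that an EP5 / EP9 style study would report.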

What should one do to get a better prediction of risk?

  • Conduct risk analysis by performing a fault tree and FMEA (Failure Mode and Effects Analysis) that includes:
    • The correct model (see reference 2)
    • The error sources missed in the paper

  1. Woodworth A, Korpi-Steiner N, Miller JJ, et al. Utilization of Assay Performance Characteristics to Estimate Hemoglobin A1c Result Reliability. Clin Chem, 2014;60:1073-1079.
  2. Krouwer JS. The danger of using total error models to compare glucose meter performance. Journal of Diabetes Science and Technology, 2014;8:419-421.

Review of Laboratory QC

October 24, 2014


Recommended reading – CAP interview of Jim Westgard regarding lab QC over the last 30 years including the current focus on risk management: http://www.captodayonline.com/lab-qc-much-room-improvement/

Published: The Chronic Injury Glucose Error Grid: A Tool to Reduce Diabetes Complications

October 15, 2014


This is the full paper http://dst.sagepub.com/content/early/2014/10/10/1932296814554415.abstract – a Letter to the Editor was already mentioned.

The paper suggests that a different glucose error grid is needed for diabetes complications such as diabetic retinopathy rather than the traditional glucose error grid which deals with acute injuries. This is because slightly or even moderately elevated glucose would fall in the no treatment needed zone of a traditional glucose error grid but would be harmful for diabetic complications. Thus, a glucose meter could look good in terms of a traditional glucose error grid but have a bias that would allow elevated glucose to occur for up to 6 months – until the patient’s next A1c determination. Patients and providers would be better informed if they knew which glucose meters were free from these long-term biases.

QC vs. Clinical Limits

September 10, 2014


This entry adds to but does not contradict the Westgard Web site’s: http://www.westgard.com/badqc-goodsoftware.htm

Clinical limits provide a range of results that do not cause patient harm.

A problem with traditional clinical limits is that going from just under the limit to just over the limit changes the outcome from no harm to harm. This is difficult to understand for two results that are almost identical. That is why error grids were developed. They separate harm into categories so when one goes from just under the limit to just over the limit the outcome changes from no harm to minor harm.

QC limits are set to control the process.

Thus, QC and clinical limits are different.

For an assay with a high process capability (similar to a high six sigma) clinical limits will always be much wider than QC limits. And for an assay that is out of control (values that exceed QC limits), results may still be within clinical limits. But it is nevertheless important to detect an out of control process because the process may become so out of control that results fail clinical limits.
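As a quick numerical illustration, using the common sigma metric of (allowable error minus absolute bias) divided by SD. The clinical limit, bias, SD, and the choice of QC limits at 3 SD are all hypothetical values assumed for the example.

```python
def sigma_metric(allowable_error, bias, sd):
    """Sigma metric: how many SDs fit between the bias and the allowable error."""
    return (allowable_error - abs(bias)) / sd

# Hypothetical values in percent, for illustration only
CLINICAL_LIMIT = 10.0   # assumed allowable error
BIAS = 1.0              # assumed average bias
SD = 1.5                # assumed imprecision
QC_MULTIPLIER = 3.0     # assumed: QC limits set at target +/- 3 SD

sigma = sigma_metric(CLINICAL_LIMIT, BIAS, SD)
qc_limit = QC_MULTIPLIER * SD

# A QC value 6% from target is out of control but still inside the clinical limit
qc_deviation = 6.0
out_of_control = abs(qc_deviation) > qc_limit
within_clinical = abs(qc_deviation) <= CLINICAL_LIMIT

print(f"sigma = {sigma:.1f}, QC limit = +/-{qc_limit}%, clinical limit = +/-{CLINICAL_LIMIT}%")
```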

For a process that is in control (all values within QC limits), can results be outside of clinical limits (and potentially cause patient harm)? YES!

Glucose examples:

Patient spills coke on himself. Provider comes in to use glucose meter and fails to wash and dry the site from which the capillary sample is taken. The result is 200 mg/dL too high. There is nothing wrong with the glucose meter or QC.

Patient has an interference. Result is way off. QC is ok.

There is a one hour shift in results – they are way off. QC, run once every eight hours, doesn’t detect the shift.

Thus, QC detects long-term bias which is important but there can still be other errors that can harm patients.
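The last example can be laid out as a toy timeline. The hour of the shift, its size, and the QC limit are hypothetical values assumed for illustration.

```python
# Hypothetical one-day timeline of the bias (mg/dL) affecting results in each hour
SHIFT_HOUR = 3               # assumed hour during which the shift occurs
SHIFT_SIZE = 40.0            # assumed size of the shift, mg/dL
QC_LIMIT = 10.0              # assumed QC acceptance limit, mg/dL from target
QC_HOURS = range(0, 24, 8)   # QC run once every eight hours: hours 0, 8, 16

bias_by_hour = [SHIFT_SIZE if hour == SHIFT_HOUR else 0.0 for hour in range(24)]

# Hours in which a QC run would have flagged a problem
qc_failures = [hour for hour in QC_HOURS if abs(bias_by_hour[hour]) > QC_LIMIT]

# Hours in which patient results were actually affected
affected_hours = [hour for hour, bias in enumerate(bias_by_hour) if abs(bias) > QC_LIMIT]

print(f"QC failures at hours: {qc_failures}; patient results affected in hours: {affected_hours}")
```

Every QC run passes, yet an hour of patient results is off by the full shift.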

