The value of error grids

March 29, 2019

My colleague and I sang the praises of error grids as a way to specify performance – for any assay. To recall, here are some of the benefits:

  1. Unlike most specifications, the limits can change with concentration
  2. Unlike most specifications, the limits need not be symmetrical
  3. Most specifications have one set of limits, implying that results within limits cause no harm and results outside of limits cause harm. Error grids have multiple sets of limits – called zones – whereby harm can be none, minor, or major.
  4. Error grid zones account for 100% of the results – they cover the XY space of candidate assay vs. reference assay. Most specifications cover 95% or 99% of results, leaving the balance unspecified.
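To make the zone idea concrete, here is a sketch in Python. The limits and zone boundaries below are invented for illustration – they are not from any published error grid (such as the Clarke or Parkes grids) – but they show all four benefits at once: concentration-dependent limits, asymmetry, multiple harm zones, and full coverage of the XY space.

```python
# Illustrative sketch only: these limits are made up for demonstration
# and are NOT from any published error grid.
def classify_result(reference, candidate):
    """Assign a candidate-vs-reference pair to a harm zone.

    The limits change with concentration, are asymmetrical,
    and the zones together cover the entire XY space.
    """
    error = candidate - reference
    if reference < 100:                      # low range: absolute limits
        lower, upper = -15, 10               # asymmetrical on purpose
    else:                                    # high range: percent limits
        lower, upper = -0.20 * reference, 0.15 * reference
    if lower <= error <= upper:
        return "A"   # no harm
    if 2 * lower <= error <= 2 * upper:
        return "B"   # minor harm
    return "C"       # major harm

print(classify_result(80, 88))    # small error in the low range -> "A"
print(classify_result(200, 320))  # large error in the high range -> "C"
```

Every possible (reference, candidate) pair lands in some zone – nothing is left unspecified.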

Krouwer JS, Cembrowski GS. Towards more complete specifications for acceptable analytical performance – a plea for error grid analysis. Clinical Chemistry and Laboratory Medicine 2011;49:1127-1130.

How Should Glucose Meters Be Evaluated for Critical Care

July 24, 2018

There is a new IFCC document with the same title as this blog entry. Ok, I know better than to try to publish a critique of an IFCC document, so I’ll keep my thoughts to this blog.

The glucose meter goals suggested by IFCC are the same as those contained in the CLSI document POCT12-A3. Now I do have a published critique of this CLSI standard – it is here. Not a surprise, but my critique of POCT12-A3 is not listed among the many IFCC references.

Upon skimming the IFCC document, I see that it has the same accuracy goals as POCT12-A3, which basically leaves 2% of glucose meter results unspecified (e.g., they could be really bad). Since the IFCC document acknowledges interferences and user errors as sources of error, someone needs to tell me why 2% of glucose meter results are unspecified.

The problem is, say you did an evaluation with 100 samples and 1 of them had a large error (much greater than a 20% error). A problem with the POCT12-A3 spec is that it allows one to say that the spec has been met for this evaluation, even though the bad result could cause patient harm. Hence, meeting the POCT12-A3 spec implies that one has achieved accuracy as suggested by a standards group, and this could justify ignoring the bad result.
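To see how the arithmetic plays out, here is a sketch in Python. The limits below follow the general form of the POCT12-A3 goals as I understand them (95% of results within tight limits, 98% within wider limits) – treat the exact numbers as placeholders and consult the standard itself for the real values.

```python
# Sketch of a POCT12-A3-style accuracy check. The specific limits here
# are placeholders; consult the standard for the actual values.
def meets_spec(pairs):
    """pairs: list of (reference, meter) glucose values in mg/dL."""
    def within(ref, meter, abs_lim, pct_lim):
        err = abs(meter - ref)
        return err <= abs_lim if ref < 100 else err <= ref * pct_lim

    n = len(pairs)
    tight = sum(within(r, m, 12, 0.125) for r, m in pairs)
    wide = sum(within(r, m, 15, 0.20) for r, m in pairs)
    return tight / n >= 0.95 and wide / n >= 0.98

# 99 perfect results plus one result with a huge, potentially harmful error
pairs = [(100, 100)] * 99 + [(100, 300)]
print(meets_spec(pairs))  # True: the spec is met despite the bad result
```

The one wildly wrong result (300 vs. 100 mg/dL) falls in the unspecified 2%, so the evaluation "passes."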

Big errors and little errors

May 27, 2018

In clinical assay evaluations, the focus is most of the time on “little” errors. What I mean by little errors are average bias and imprecision that exceed goals. Now I don’t mean to be pejorative about little errors, since if bias or imprecision don’t meet goals, the assay is unsuitable. One reason to distinguish between big and little errors is that in evaluations, big errors are often discarded as outliers. This is especially true in proficiency surveys, but even in a simple method comparison one is justified in discarding an outlier, because the value would otherwise perturb the bias and imprecision estimates.

But big errors cause big problems, and most evaluations focus on little errors, so how are big errors studied? Other than running thousands of samples, a valuable technique is to perform an FMEA (Failure Mode Effects Analysis). This can and should cover user error, software, and interferences, in addition to the usual items. An FMEA study is often not very enthusiastically received, but it is a necessary step in trying to ensure that an assay is free from both big and little errors. Of course, even with a completed FMEA, there are no guarantees.


Mandel and Westgard

May 20, 2018

Readers may know that I have been known to critique Westgard’s total error model.

But let’s step back to 1964 with Mandel’s representation of total error (1):

Total Error (TE) = x – R = (x – mu) + (mu – R)

where

x = the sample measurement,
R = the reference value, and
mu = the population mean of the sample.

Thus, mu-R is the bias and x-mu the imprecision – the same as the Westgard model. There is an implicit assumption that the replicates of x which estimate mu are only affected by random error. For example, if the observations of the replicates contain drift, the Mandel model would be incorrect. For replicates sampled close in time, this is a reasonable assumption, although it is rarely if ever tested.
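A quick numeric check shows that the decomposition is an identity for each measurement, with mu estimated from the replicates (the data below are made up):

```python
import statistics

# Numeric illustration of the Mandel decomposition (values are invented).
R = 5.0                                   # reference value
replicates = [5.3, 5.1, 5.4, 5.2, 5.5]    # repeated measurements of one sample
mu = statistics.mean(replicates)          # estimate of the population mean

for x in replicates:
    total_error = x - R
    imprecision = x - mu     # random component
    bias = mu - R            # systematic component
    # The decomposition holds exactly for every measurement
    assert abs(total_error - (imprecision + bias)) < 1e-12

print(f"bias estimate (mu - R): {mu - R:.2f}")
```

Note that this identity says nothing about whether the replicates really contain only random error – that assumption has to be checked separately.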

Interferences are not a problem because even if they exist, there is only one sample. Thus, interference bias is mixed in with any other biases in the sample.

Total error is often expressed for 95% of the results. I have argued that 5% of results are unspecified, but if the assumption of random error is true for the repeated measurements, this is not a problem, because these results come from a Normal distribution. Thus, the probability that high multiples of the standard deviation will occur is extremely remote.
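To see just how remote, one can compute Normal tail probabilities directly:

```python
import math

# Probability that a Normal result falls more than k SDs from the mean
# (two-sided tail).
def two_sided_tail(k):
    return math.erfc(k / math.sqrt(2))

for k in (2, 3, 4, 6):
    print(f"P(|error| > {k} SD) = {two_sided_tail(k):.2e}")
```

At 6 SDs the probability is on the order of 10^-9 – truly remote, as long as the Normal assumption holds.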

But outliers are a problem. Typically in these studies, outliers (if found) are deleted because they will perturb the estimates – the problem is that the outliers are usually not dealt with, and now the 5% of unspecified results becomes a problem.

If no outliers are observed, that is a good thing, but here are 95% upper confidence bounds on the outlier rate when 0 outliers have been found in the indicated number of sample replicates.

N          Maximum outlier rate (95% upper bound)
10         25.9%
100        3.0%
1,000      0.3%
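These bounds come from solving (1 – p)^N = 0.05 for p, which is easy to verify:

```python
# Exact 95% upper confidence bound on the outlier rate when 0 outliers
# are seen in N replicates: solve (1 - p)^N = 0.05 for p.
def max_outlier_rate(n, confidence=0.95):
    return 1 - (1 - confidence) ** (1 / n)

for n in (10, 100, 1000):
    print(f"N = {n:>5}: {max_outlier_rate(n):.1%}")
```

For large N this approaches the familiar "rule of three": the bound is roughly 3/N.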

So if one is measuring TE for a control or patient pool and keeping the time between replicates short, then the Westgard model estimate of total error is reasonable, although one still has to worry about outliers.

But when one applies the Westgard model to patient samples, it is no longer correct since each patient sample can have a different amount of interference bias. And while large interferences are rare, interferences can come in small amounts and affect every sample – inflating the total error. Moreover, other sources of bias can be expected with patient samples, such as user error in sample preparation. And with patient samples, outliers while still rare, can occur.
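A small simulation illustrates the inflation. The error magnitudes below are invented; the point is only that sample-to-sample interference bias adds to the spread seen with patient samples but not with a control:

```python
import random
import statistics

random.seed(1)

# Simulation sketch: a control material has only random analytical error,
# while each patient sample also carries its own small interference bias.
# All magnitudes here are invented for illustration.
analytical_sd = 1.0
interference_sd = 0.8   # sample-to-sample interference bias

control_errors = [random.gauss(0, analytical_sd) for _ in range(10000)]
patient_errors = [random.gauss(0, analytical_sd) + random.gauss(0, interference_sd)
                  for _ in range(10000)]

print(f"control SD: {statistics.stdev(control_errors):.2f}")
print(f"patient SD: {statistics.stdev(patient_errors):.2f}")
# The patient-sample SD is inflated toward sqrt(1.0**2 + 0.8**2), about 1.28.
```

A total error estimate from the control alone would understate the spread of patient-sample errors.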

This raises the question as to the interpretation of results from a study that uses the Westgard model (such as a Six Sigma study). These studies typically use controls but the implication is that they inform about the quality of the assay – meaning of course for patient samples. This is a problem for the reasons stated above. So one can say that if an assay has a bad six sigma value, the assay has a problem, but if the assay has a good six sigma value, one cannot say the assay is without problems.



  1. Mandel J. The Statistical Analysis of Experimental Data. Dover, NY, 1964, p. 105.


Commutability and revival of a 39-year-old model

March 13, 2018

Commutability is a hot topic these days and it should be. One would like to think that someone tested on one system will get the same result if they are tested on another system.

In reading the second paper (1) in a three-article series, I note that a term for interferences is present (in addition to average bias and imprecision) in the error model. Almost forty years ago, this was suggested (see reference 2).

Although reference 2 was not cited in the Clinical Chemistry paper, at least a model accounting for interferences is being used.



  1. Clinical Chemistry 64:3 455–464 (2018)
  2. Lawton WH, Sylvester EA, Young-Ferraro BJ. Statistical comparison of multiple analytic procedures: application to clinical chemistry. Technometrics. 1979;21:397-409.

Performance specifications, lawsuits, and irrelevant statistics

March 11, 2018

Readers of this blog know that I’m in favor of specifications that account for 100% of the results. The danger of specifications that cover only 95% or 99% of the results is that errors can occur that cause serious patient harm even for assays that meet specifications! Large and harmful errors are rare – certainly less than 1% of results. But hospitals might not want specifications that account for 100% of results (and remember that hospital clinical chemists populate standards committees). A potential reason: if a large error occurs, a 95% or 99% specification can be an advantage for a hospital if there is a lawsuit.

I’m thinking of an example where I was an expert witness. Of course, I can’t go into the details but this was a case where there was a large error, the patient was harmed, and the hospital lab was clearly at fault. (In this case it was a user error). The hospital lab’s defense was that they followed all procedures and met all standards, e.g., sorry but stuff happens.

As for irrelevant statistics, I’ve heard two well-known people in the area of diabetes (Dr. David B Sachs and Dr. Andreas Pfützner) say in public meetings that one should not specify glucose meter performance for 100% of the results because one can never prove that the number of large errors is zero.

That one can never prove that the number of large errors is zero is true but this does not mean one should abandon a specification for 100% of the results.

Here, I’m reminded of blood gas. For blood gas, obtaining a result is critical. Hospital labs realize that blood gas instruments can break down and fail to produce a result. Since this is unacceptable, one can calculate the failure rate and reduce the risk of no result with redundancy (meaning using multiple instruments). No matter how many instruments are used, the possibility that all instruments will fail at the same time is not zero!
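The redundancy arithmetic is simple – the probability that all instruments fail at once shrinks geometrically with the number of instruments but never reaches zero (the failure rate below is invented):

```python
# The chance that every instrument is down at the same time shrinks
# geometrically with redundancy but never reaches zero.
p_down = 0.01   # assumed probability a single instrument is unavailable

for k in (1, 2, 3):
    print(f"{k} instrument(s): P(no result) = {p_down ** k:.0e}")
```

One can never prove the risk is zero, yet labs still specify and manage it – the same logic applies to specifying 100% of glucose meter results.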

A final problem with not specifying 100% of the results is that it may cause labs to not put that much thought into procedures to minimize the risk of large errors.

And in industry (at least at Ciba-Corning) we always had specifications for 100% of the results, as did the original version of the CLSI total error document, EP21-A (this was dropped in the A2 version).

Assumptions – often a missing piece in data analysis for lab medicine

February 24, 2018

A few blog entries ago, I described a case where calculating the SD did not provide an estimate of random error because the observations contained drift.

Any time that data analysis is used to estimate a parameter, there are usually a set of assumptions that must be checked to ensure that the parameter estimate will be valid. In the case of estimating random error from a set of observations from the same sample, an assumption is that the errors are IIDN, which means that the observations are independently and identically distributed in a normal distribution with mean zero and variance sigma squared. This can be checked visually by examining a plot of the observations vs. time, the distribution of the residuals, the residuals vs. time, or any other plot that makes sense.

The model is Yi = ηi + εi, and the residuals are simply Yi – YiPredicted (observed minus predicted).
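A small simulation shows why checking the assumptions matters: the same random noise, with and without drift, gives very different SD estimates (all numbers invented):

```python
import random
import statistics

random.seed(42)

# Sketch: identical random noise, with and without drift added.
# The SD computed from drifting observations overstates random error.
noise = [random.gauss(0, 1.0) for _ in range(200)]
stable = [10.0 + e for e in noise]
drifting = [10.0 + 0.02 * i + e for i, e in enumerate(noise)]  # slow drift

print(f"SD without drift: {statistics.stdev(stable):.2f}")
print(f"SD with drift:    {statistics.stdev(drifting):.2f}")
# A plot of observations vs. time would reveal the drift immediately.
```

The summary statistic alone cannot distinguish the two cases – only a check of the assumptions (such as the plots described above) can.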