Antwerp talk about total error

March 12, 2017

Looking at my blog stats, I see that a lot of people are reading the total analytical error vs. total error post. So below are the slides from a talk that I gave at a conference in Antwerp in 2016 called The “total” in total error. The slides have been updated. Because they were written to accompany a talk, the slides on their own are less effective than the talk was.





Published – my one-man Milan Conference

March 23, 2016


Having read the consensus statement and all the papers from the Milan conference (available without subscription), I prepared my version of this for the Antwerp conference. This talk contained the following:

  • A description of why the Westgard model for total error is incomplete (with of course Jim Westgard sitting in the audience)
  • A description of why expanded total error models are nevertheless also incomplete
  • A critique of Boyd and Bruns’ glucose meter performance simulations using the Westgard model
  • A critique of the ISO and CLSI glucose meter specifications, both based on total error
  • A description of what the companies with most of the market share in glucose meters did, when they started to lose market share
  • How Ciba Corning specified and evaluated performance
  • What I currently recommend

I submitted a written version of this talk to Clin Chem and Lab Medicine, with recommended reviewers being Milan authors with whom I disagreed. (The journal asks authors to recommend reviewers). Now I don’t know who the reviewers were, but suffice it to say that they didn’t like my paper at all. So after several revisions, I scaled back my paper to its current version, which is here (subscription required).

Whining rewarded?

August 29, 2014


Looking at the table of contents of Clinical Chemistry for September, there is a list of the most downloaded point/counterpoint articles, and I am number one on that list for my discussion of GUM (the Guide to the Expression of Uncertainty in Measurement):

Why GUM will never be enough

April 7, 2014


I occasionally come across articles that describe a method evaluation using GUM (the Guide to the Expression of Uncertainty in Measurement). These papers can be quite impressive with respect to the modeling that occurs. However, there is often a statement that relates the results to clinical acceptability. Here’s why there is a problem.

Clinical acceptability is usually not defined, but it is often implied to mean performance that will not cause patient harm due to assay error.

A GUM analysis usually specifies the interval expected to contain 95% of the results. But if the analysis shows that the assay just meets limits, then 5% of the results will cause patient harm. According to GUM models, those 5% will fall close to the limits because the data are assumed to be Gaussian, so this is a minor problem.
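The argument above can be sketched numerically. The following is a minimal illustration (all values hypothetical): if an assay’s error distribution is Gaussian and its 95% interval (coverage factor k = 1.96) exactly reaches an allowable-error limit TEa, then 5% of results exceed the limit, but under the Gaussian assumption almost none of them exceed it by much.

```python
from statistics import NormalDist

# Hypothetical example: zero bias, SD chosen so the 95% interval
# (k = 1.96) exactly reaches the allowable total error limit TEa.
TEa = 10.0            # allowable total error, arbitrary units (assumed)
sd = TEa / 1.96       # SD at which the assay "just meets" the limit

nd = NormalDist(mu=0.0, sigma=sd)

# Fraction of results beyond +/- TEa: 5% by construction.
frac_outside = 2 * (1 - nd.cdf(TEa))
print(f"fraction outside limits: {frac_outside:.3f}")   # ~0.05

# Under the Gaussian assumption the tail is thin: very few results
# exceed, say, 1.5 * TEa.
frac_far = 2 * (1 - nd.cdf(1.5 * TEa))
print(f"fraction beyond 1.5*TEa: {frac_far:.5f}")       # ~0.003
```

This is exactly why a purely Gaussian model understates risk: the rare, large errors discussed next are not in the model at all.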

A bigger problem is that GUM analysis often ignores rare but large errors, such as a rare interference or something more insidious such as a user error that results in a large assay error. (Often, GUM analyses don’t assess user error at all.) These large errors, while rare, are associated with major harm or death.

The remedy is to conduct an FMEA or fault tree analysis in addition to GUM, to brainstorm how large errors could occur and whether mitigations are in place to reduce their likelihood. Unless risk analysis is added to GUM, talking about clinical acceptability is misleading.
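As a minimal sketch of what such an FMEA exercise produces (all failure modes and rankings below are hypothetical, not from any actual analysis): candidate large-error sources are listed and ranked, commonly by a risk priority number (severity × occurrence × detectability), so mitigation effort goes to the riskiest items first.

```python
# Hypothetical FMEA-style ranking for a glucose-meter-like assay.
# Each entry: (description, severity 1-10, occurrence 1-10, detection 1-10,
# where higher detection scores mean the failure is HARDER to detect).
failure_modes = [
    ("rare interfering substance", 9, 2, 8),
    ("user applies short sample volume", 8, 4, 6),
    ("wrong calibrator lot used", 7, 2, 3),
]

# Rank by risk priority number (RPN = severity * occurrence * detection).
for desc, sev, occ, det in sorted(
        failure_modes, key=lambda m: m[1] * m[2] * m[3], reverse=True):
    print(f"RPN {sev * occ * det:3d}: {desc}")
```

The point is not the arithmetic but the coverage: items like these never appear in a GUM uncertainty budget, yet they drive the worst patient outcomes.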

AACC 2012

July 19, 2012

I went early to AACC 2012 on a consulting assignment which ended two days before AACC. So the highlight of my trip was a helicopter tour of LA in an R22, a tour of Warner Bros. studios, and a walk on the Santa Monica pier.

At the meeting, I got to hear two plenary lectures, both on genomics. They were very interesting talks (Eric Green and Robert Roberts) and it was humbling to realize how little I know about genomics.

I also attended the Evaluations Protocol Area Committee, although I think it is now called a consensus committee. There were two projects that I had started: EP27 (error grids) and the revision of EP21A2 (total error) – I proposed and completed EP21A. As of January, I had been unexpectedly and rather unceremoniously kicked off both projects. There was little discussion about EP27 – it should be available around September 2012, little changed. It will be interesting to see who the authors will be.

There was more discussion about EP21, including a proposal to drop it completely in favor of a GUM uncertainty analysis, which is covered by a CLSI document (C51). This is about when I had had enough and bolted from the meeting. Yet I did have the comfort of knowing that the financial way projects are valued is something I put in place a while back and is probably unknown to this current group.

I went to a talk about medical error. One thing that is always missing from these talks is a measure of overall patient harm – maybe the subject of a future blog entry.

I also talked with a very nice Spanish lady about her poster. She assayed a sample on each of two analyzers of the same type and compared the results to various total error goals, including biological variation, CLIA, and others. The results were often outside the goals, especially the biological variation goals, which makes one wonder whether such goals are meaningful.

CLSI C51 – measurement uncertainty – or the classic comic version of GUM

February 29, 2012

GUM (Guide to the Expression of Uncertainty in Measurement) for laboratories (and manufacturers) is what CLSI C51 is all about. (GUM was originally used to provide information about reference materials.) I have previously commented that I didn’t think GUM was a good idea for laboratories (1). I was also initially on the C51 subcommittee, but since I couldn’t convince anyone of my point of view, I bailed.

To recall some of the problems with GUM …

  1. Bias is not allowed – it must be corrected. Yet one could still ignore big, rare biases (outliers) as well as real but small biases.
  2. Obtaining the standard deviations or bias corrections applied by manufacturers is impractical if not impossible for laboratories – as in: let’s set up a fixture and measure the variability of 10 pumps we just bought for this experiment.
  3. The math required to put together an estimate will make most people’s heads spin.
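To make point 3 concrete, here is a deliberately simplified sketch of a GUM-style combined standard uncertainty for an additive model (all component names and values are hypothetical). Even this toy version requires classifying each component as type A or type B, assuming a distribution for each type B component, and propagating everything in quadrature – and real budgets have many more terms, sensitivity coefficients, and correlations.

```python
import math

# Toy GUM-style budget for y = x1 + x2 + x3, uncorrelated inputs,
# all sensitivity coefficients equal to 1 (hypothetical values).

u_repeat = 1.2          # type A: SD of replicate measurements
u_cal = 0.8             # type B: calibrator uncertainty from a certificate

# Type B from an assumed rectangular distribution of half-width a
# (GUM convention: standard uncertainty = a / sqrt(3)).
a = 2.0                 # e.g. a temperature effect bounded by +/- 2 units
u_temp = a / math.sqrt(3)

# Combined standard uncertainty: root sum of squares.
u_c = math.sqrt(u_repeat**2 + u_cal**2 + u_temp**2)

# Expanded uncertainty with coverage factor k = 2 (~95% if Gaussian).
U = 2 * u_c
print(f"u_c = {u_c:.2f}, U (k=2) = {U:.2f}")
```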

In the C51 version of GUM, there is only one example – that of measuring a bunch of controls. This is not GUM! And it will not provide an uncertainty estimate for patient samples, since controls do not capture the nonspecificity (interference) errors that occur in patient samples.


  1. Krouwer JS. A critique of the GUM method of estimating and reporting uncertainty in diagnostic assays. Clin Chem 2003;49:1818-1821.

Comments on “Measurement uncertainty is not synonym of measurement repeatability or measurement reproducibility”

October 29, 2008

Of course, I agree with this statement (1) and here are some comments.


First, some definitions and observations.


1. A repeatability condition is defined as a “condition of measurement, out of a set of conditions that includes the same measurement procedure, same operators, same measuring system, same operating conditions and same location, and replicate measurements on the same or similar objects over a short period of time”

2. A reproducibility condition is defined as a “condition of measurement, out of a set of conditions that includes different locations, operators, measuring systems, and replicate measurements on the same or similar objects”

3. GUM includes two sources of uncertainty, both expressed as standard deviations: type A, which is characterized by measurements, and type B, which is characterized by either measurements or, more commonly, by experience or assumptions.

4. In the real world of commercial diagnostic assays, reproducibility is almost always larger than repeatability.

5. It is logical to assume that the reason for #4 is uncorrected systematic effects.

6. A reason that some effects are uncorrected (or not better corrected) is economics.


Now, if one could take an infinite set of measurements for a diagnostic assay, would there be a difference between reproducibility and uncertainty of measurement? I maintain the answer is no; they would be the same. All of the type B effects (and type A) would be expressed in an infinite set of measurements.


For a shorter set of measurements, reproducibility and uncertainty of measurement will be different, although for diagnostic assays, one routinely has large sets of quality control data (reproducibility) that span relatively long times.


One problem with this quality control data is that it is not patient data, and thus some effects, such as patient interferences, cannot be sampled. (Postulating them through assumptions in GUM is not easy either.)


Another consideration is the types of effects that manifest themselves, such as calibration bias and non-calibratable reagent lot effects. In principle, these systematic effects could be made smaller, but they aren’t, since economics prevents it. Because these effects can be relatively large, given a long enough sampling time these effects, along with the other unknown effects expressed over time, will approximate an uncertainty of measurement estimate, although as stated above, quality control results will never account for effects such as patient interferences.
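This convergence can be illustrated with a small simulation (all parameters hypothetical): QC results are generated with within-run noise plus a persistent shift that changes with each reagent lot. Within a single lot the observed SD reflects only the within-run component, but over many lots the observed SD approaches the combined SD of both components – the reproducibility experiment "finds" the systematic effects, given enough time.

```python
import math
import random

random.seed(0)

# Hypothetical components: within-run noise and a lot-to-lot shift.
sd_within, sd_lot = 1.0, 0.8

results = []
for lot in range(200):                     # 200 reagent lots
    shift = random.gauss(0.0, sd_lot)      # persistent lot-to-lot effect
    for _ in range(25):                    # 25 QC results per lot
        results.append(shift + random.gauss(0.0, sd_within))

# Sample SD of all results vs. the combined SD of the two components.
mean = sum(results) / len(results)
sd_obs = math.sqrt(sum((x - mean) ** 2 for x in results) / (len(results) - 1))
sd_combined = math.sqrt(sd_within**2 + sd_lot**2)
print(f"observed SD = {sd_obs:.2f}, combined SD = {sd_combined:.2f}")
```

With only a few lots in the window, sd_obs sits much closer to sd_within; the two estimates agree only as the sampling time grows long enough to express the systematic effects.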


So although measurement uncertainty is not a synonym of measurement repeatability or measurement reproducibility, a reproducibility experiment conducted over a long enough time will probably give a result similar to an uncertainty of measurement approach (save for the patient interference problem).


And finally, it is a lot easier to calculate a standard deviation on some quality control data than to go through a proper uncertainty of measurement procedure.




1. De Bièvre P. Measurement uncertainty is not synonym of measurement repeatability or measurement reproducibility. Accred Qual Assur 2008;13:61–62.