Antwerp talk about total error

March 12, 2017

Looking at my blog stats, I see that a lot of people are reading the total analytical error vs. total error post. So, below are the slides from a talk called The “total” in total error that I gave at a conference in Antwerp in 2016. The slides have been updated. Because they were made for a talk, the slides on their own are not as effective as the talk was.




Help with sigma metric analysis

January 27, 2017


I’ve been interested in glucose meter specifications and evaluations. There are three sources of glucose meter specifications:

FDA glucose meter guidance
ISO 15197:2013
glucose meter error grids

There are various ways to evaluate glucose meter performance. What I wished to look at was the combination of sigma metric analysis and the error grid. I found this article about the sigma metric analysis and glucose meters.

After looking at this, I understand how to construct these so-called method decision charts (MEDX). But here’s my problem. In these charts, the total allowable error TEa is a constant – this is not the case for TEa for error grids. The TEa changes with the glucose concentration. Moreover, it is not even the same at a specific glucose concentration because the “A” zone limits of an error grid (I’m using the Parkes error grid) are not symmetrical.

I have simulated data with a fixed bias and constant CV throughout the glucose meter range. But with a changing TEa, the estimated sigma also changes with glucose concentration.

So I’m not sure how to proceed.
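To make the difficulty concrete, here is a minimal sketch in Python. It assumes the conventional sigma metric, (TEa - |bias|) / CV, and, as one possible workaround borrowed from the process capability idea of treating each limit separately, it computes the distance to the lower and upper limits on their own. The limits in `hypothetical_limits` are invented for illustration and are not the Parkes A zone boundaries; the bias and CV values are also assumptions.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Conventional sigma metric: (TEa - |bias|) / CV, all in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct


def asymmetric_sigma(lower_pct, upper_pct, bias_pct, cv_pct):
    """Distance from the biased mean to each limit, in SD units.

    lower_pct is negative (e.g. -20.0), upper_pct is positive.
    Returns (sigma toward the lower limit, sigma toward the upper limit).
    """
    return ((bias_pct - lower_pct) / cv_pct, (upper_pct - bias_pct) / cv_pct)


# HYPOTHETICAL limits for illustration only; these are NOT the Parkes A zone
# boundaries. Format: glucose (mg/dL) -> (lower %, upper %).
hypothetical_limits = {
    50: (-30.0, 40.0),
    100: (-20.0, 25.0),
    300: (-15.0, 20.0),
}

bias_pct, cv_pct = 3.0, 5.0  # assumed fixed bias and constant CV

for glucose, (lower, upper) in hypothetical_limits.items():
    s_low, s_high = asymmetric_sigma(lower, upper, bias_pct, cv_pct)
    print(f"{glucose} mg/dL: sigma to lower limit {s_low:.1f}, "
          f"to upper limit {s_high:.1f}")
```

Even with bias and CV held fixed, every row gives a different pair of sigma values, so there is no single sigma to plot on a chart that assumes one constant TEa.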

The problem with the FDA standard explained

October 25, 2016


The previous blog entry criticized the updated FDA POCT glucose meter performance standard, which now allows 2% of the results to be unspecified.

What follows is an explanation of why this is wrong. My logic applies to:

  1. Total error performance standards which state that 95% (or 99%) of results should be within stated limits
  2. Measurement uncertainty performance standards which state that 95% (or 99%) of results should be within stated limits
  3. The above FDA standard which states that 98% of results should be within stated limits

One argument that surfaces for allowing results to be unspecified is that one cannot prove that 100% of results are within limits. This is of course true. But here’s the problem with using that fact to allow unspecified results.

Consider a glucose meter example with truth = 30 mg/dL. Assume the glucose meter has a 5% CV (an SD of 1.5 mg/dL) and that the precision results are normally distributed. One can calculate the observed glucose value at various SD multiples, note where each falls in a Parkes error grid, and note how rarely an error of that size due to precision alone would occur.

Truth (mg/dL)   SD multiple   Observed glucose (mg/dL)   Parkes grid   Occurs 1 in
30              2             33                         A zone        20
30              3             34.5                       A zone        370
30              8             42                         A zone        7E+14
30              22            63                         C zone        1E+106


(To get an error in the E zone, an extremely dangerous result, would require 90 multiples of the standard deviation, and Excel refuses to tell me how rare this is.) I think it’s clear that leaving a portion of the results unspecified is not justified by worrying about precision and/or the normal distribution.
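The rarity figures in the table can be approximated outside Excel. The following is a minimal Python sketch, assuming a two-sided normal tail probability (a result more than k SDs from truth in either direction). For very large multiples the direct calculation underflows double precision, so the sketch falls back to a log-space estimate based on the standard large-k approximation, tail probability ≈ density(k) / k. The function names are my own, and the numbers it prints need not match the table's rounded entries exactly.

```python
import math


def two_sided_tail(k):
    """Probability that a normal result lies more than k SDs from the mean."""
    return math.erfc(k / math.sqrt(2.0))


def log10_one_in(k):
    """Approximate log10 of 1/p for large k, using Q(k) ~ pdf(k) / k.

    Works in log space, so it does not underflow the way the direct
    calculation does for very large k.
    """
    log10_p = (math.log10(2.0)
               - (k * k / 2.0) / math.log(10.0)
               - math.log10(k * math.sqrt(2.0 * math.pi)))
    return -log10_p


truth, cv = 30.0, 0.05      # values from the example above
sd = truth * cv             # 1.5 mg/dL

for k in (2, 3, 8, 22, 90):
    observed = truth + k * sd
    p = two_sided_tail(k)
    if p > 0.0:
        print(f"{k} SD -> {observed:g} mg/dL, about 1 in {1.0 / p:.3g}")
    else:
        # Double precision underflows here; use the log-space estimate.
        print(f"{k} SD -> {observed:g} mg/dL, "
              f"about 1 in 10^{log10_one_in(k):.0f}")
```

At 90 SD the direct probability comes back as zero, which is presumably why a spreadsheet gives up; the log-space estimate puts it at roughly 1 in 10 to the power 1761, under the normality assumption.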

Now errors in higher zones of the Parkes error grid do occur, including E zone errors, and clearly this has nothing to do with precision. These errors have other causes, such as interferences.

A better way to think of these errors is as “attribute” errors – they either occur or don’t occur. For more on this, see: Krouwer JS. Recommendation to treat continuous variable errors like attribute errors. Clinical Chemistry and Laboratory Medicine 2006;44(7):797–798.

Note that one cannot prove that attribute errors won’t occur. But no one else allows results to be unspecified the way clinical chemistry standards committees do. For example, you don’t hear “we want 98% of surgeries to be performed on the correct organ on the correct patient.”

MU vs TE vs EG

July 29, 2016


Picture is aerial view from a Cirrus of Foxwoods casino in CT

MU = measurement uncertainty, TE = total error, EG = error grid

I read a blog entry by the Westgards (their blog is always interesting), and here are my thoughts.

To recap, MU is a “bottom-up” way to model error in a clinical chemistry assay (TE uses a “top-down” model) and EG has no model at all.

MU is a bad idea for clinical chemistry – Here are the problems with MU:

  1. Unless things have changed, MU doesn’t allow for bias in its modeling process. If a bias is found, it must be eliminated. Yet in the real world, there are many uncorrected biases in assays (calibration bias, interferences).
  2. The modeling required by MU is not practical for a typical clinical chemistry lab. One can view the modeling as having two major components: the biological equations that govern the assay (e.g., Michaelis-Menten kinetics) and the instrumentation (e.g., the properties of the syringe that picks up the sample). Whereas clinical chemists may know the biological equations, they won’t have access to the manufacturer’s instrumentation data.
  3. The math required to perform the analysis is extremely complicated.
  4. Some of the errors that occur cannot be modeled (e.g., user errors, manufacturing mistakes, software errors).
  5. The MU result is typically reported as the location of 95% of the results. But one needs to account for 100% of the results.
  6. So some people get the SD for a bunch of controls and call this MU – a joke.

TE has been much more useful than MU, but still has problems:

  1. The Westgard model for TE doesn’t account for some important errors, such as patient interferences.
  2. Other errors that occur (e.g., user errors, manufacturing mistakes, software errors) may be captured by TE, but the potential for these errors is often excluded from experiments (e.g., users in these experiments are often more highly trained than typical users).
  3. Although both MU and TE rely on experimental data, TE relies solely on an experiment (method comparison or quality control). There are likely to be biases in the experiment which will cause TE to be underestimated. (See #2).
  4. The TE result is typically reported as the location of 95% of the results. But one needs to account for 100% of the results.
  5. TE is often overstated; e.g., the sigma value is said to provide a specific (numeric) quality for patient results. But this is untrue since TE underestimates the true total error.
  6. TE fails to account for the importance of bias. That is, one can have results that are within TE goals but can still cause harm due to bias. Klee has shown this, as have I. For example, bias for a glucose meter can cause diabetic complications but still be within TE goals.
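Point 6 can be illustrated with a small worked example. This is a minimal Python sketch assuming the Westgard-style estimate TE = |bias| + z × CV with z = 1.96; the multiplier, the TE goal, and the meter's bias and CV are all hypothetical values chosen for illustration.

```python
def westgard_te(bias_pct, cv_pct, z=1.96):
    """Westgard-style total error estimate: |bias| + z * CV (percent)."""
    return abs(bias_pct) + z * cv_pct


tea_goal = 15.0                 # hypothetical TE goal, percent
bias_pct, cv_pct = 8.0, 2.0     # hypothetical meter: large bias, small CV

te = westgard_te(bias_pct, cv_pct)
verdict = "meets" if te <= tea_goal else "fails"
print(f"TE = {te:.1f}% against a goal of {tea_goal}%: {verdict} the goal")
print(f"Yet the average result is still shifted by {bias_pct}%")
```

The hypothetical meter meets the TE goal comfortably, yet its results are shifted by 8% on average, and the TE verdict alone does not reveal that.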

I favor error grids.


  1. Error grids still have the problem that they rely on experimental data and hence there may be bias in the studies.
  2. But 100% of the results are accounted for.
  3. There is the notion of increasing patient harm in EG. With either MU or TE, there is only the concept of harm vs. no harm. This is not the real world. A glucose meter result of 95 mg/dL (truth = 160 mg/dL) causes much less harm than a glucose meter result of 350 mg/dL (truth = 45 mg/dL).
  4. EG simply plots test vs. reference. There are no models (but there is no way to tell the origin of the error source).

Published – my one man Milan Conference

March 23, 2016


Having read the consensus statement and all the papers from the Milan conference (available without subscription), I prepared my version of this for the Antwerp conference. This talk contained the following:

  • A description of why the Westgard model for total error is incomplete (with of course Jim Westgard sitting in the audience)
  • A description of why expanded total error models are nevertheless also incomplete
  • A critique of Boyd and Bruns’ glucose meter performance simulations using the Westgard model
  • A critique of the ISO and CLSI glucose meter specifications, both based on total error
  • A description of what the companies with most of the market share in glucose meters did, when they started to lose market share
  • How Ciba Corning specified and evaluated performance
  • What I currently recommend

I submitted a written version of this talk to Clinical Chemistry and Laboratory Medicine, with recommended reviewers being Milan authors with whom I disagreed. (The journal asks authors to recommend reviewers.) Now I don’t know who the reviewers were, but suffice it to say that they didn’t like my paper at all. So after several revisions, I scaled back my paper to its current version, which is here (subscription required).

Hemoglobin A1c quality targets

March 16, 2015


There is a new article in Clinical Chemistry about a complicated (to me) analysis of quality targets for A1c, when it would seem that a simple error grid, prepared by surveying clinicians, would fit the bill.

Thus, this paper has problems. They are:

  1. The total error model is limited to average bias and imprecision. Error from interferences, user error, or other sources is not included. It is unfortunate to call this “total” error, since there is nothing total about it.
  2. A pass/fail system is mentioned, which is dichotomous, unlike an error grid, which allows for varying degrees of error according to the severity of harm to patients.
  3. A hierarchy of possible goals is mentioned. This comes from a 1999 conference. But there is really only one way to set patient goals (listed near the top of the 1999 conference hierarchy): namely, a survey of clinician opinions.
  4. Discussed in the Clinical Chemistry paper is the use of biological variation based goals for quality targets. Someone needs to explain to me how this could ever be useful.
  5. The analysis is based on proficiency survey materials, which, due to the absence of patient interferences (see #1), capture only a subset of total error.
  6. From what I could tell from their NICE reference (#11 in the paper), the authors have inferred that total allowable error should be 0.46%, but this did not come from surveying clinicians.
  7. I’m on-board with six sigma in its original use at Motorola. But I don’t see its usefulness in laboratory medicine compared to an error grid.

Review of Laboratory QC

October 24, 2014


Recommended reading – CAP interview of Jim Westgard regarding lab QC over the last 30 years including the current focus on risk management: