Mandel and Westgard

May 20, 2018

Readers may know that I have been known to critique Westgard’s total error model.

But let’s step back to 1964 and Mandel’s representation of total error (1), where:

Total Error (TE) = x - R = (x - mu) + (mu - R), where

x = the sample measurement,
R = the reference value, and
mu = the population mean of the sample.

Thus, mu-R is the bias and x-mu the imprecision – the same as the Westgard model. There is an implicit assumption that the replicates of x which estimate mu are only affected by random error. For example, if the observations of the replicates contain drift, the Mandel model would be incorrect. For replicates sampled close in time, this is a reasonable assumption, although it is rarely if ever tested.
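
As a sketch of why the random-error assumption matters, here is a small simulation (all values hypothetical) comparing replicates affected only by random error to replicates contaminated by drift:

```python
import random
import statistics

random.seed(1)

MU, SD, N = 100.0, 2.0, 200  # hypothetical population mean, SD, replicates

# Replicates affected only by random error (Mandel's implicit assumption)
pure = [random.gauss(MU, SD) for _ in range(N)]

# The same process with a drift of 0.05 units per replicate added over time
drift = [random.gauss(MU, SD) + 0.05 * i for i in range(N)]

print(statistics.stdev(pure))   # near the true SD of 2
print(statistics.stdev(drift))  # inflated: drift masquerades as imprecision
```

With drift present, the computed SD overstates the true imprecision, so the decomposition mislabels a systematic effect as random error, which is why sampling the replicates close in time matters.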

Interferences are not a problem because even if they exist, there is only one sample. Thus, interference bias is mixed in with any other biases in the sample.

Total error is often expressed for 95% of the results. I have argued that 5% of results are unspecified but if the assumption of random error is true for the repeated measurements, this is not a problem because these results come from a Normal distribution. Thus, the probability is extremely remote that high multiples of the standard deviation will occur.

But outliers are a problem. Typically for these studies, outliers (if found) are deleted because they would perturb the estimates. The problem is that the outliers are usually not dealt with further, and now the 5% of unspecified results becomes a problem.

If no outliers are observed, this is a good thing, but here are 95% upper confidence bounds for the outlier rate, given the number of sample replicates indicated, when 0 outliers have been found.

N        Maximum outlier rate (95% upper bound)
10       25.9%
100      3.0%
1,000    0.3%
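
These bounds follow from a simple argument: if the true outlier rate were p, the chance of seeing zero outliers in N replicates is (1 - p)^N; setting that to 0.05 and solving for p gives the 95% upper bound. A minimal sketch:

```python
# 95% upper confidence bound on the outlier rate when 0 outliers are
# observed in n replicates: solve (1 - p)**n = 0.05 for p.
def max_outlier_rate(n: int, confidence: float = 0.95) -> float:
    return 1 - (1 - confidence) ** (1 / n)

for n in (10, 100, 1000):
    print(f"{n:>5}: {max_outlier_rate(n):.1%}")  # 25.9%, 3.0%, 0.3%
```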

So if one is measuring TE for a control or patient pool and keeping the time between replicates short, then the Westgard model estimate of total error is reasonable, although one still has to worry about outliers.

But when one applies the Westgard model to patient samples, it is no longer correct, since each patient sample can have a different amount of interference bias. And while large interferences are rare, interferences can come in small amounts and affect every sample, inflating the total error. Moreover, other sources of bias can be expected with patient samples, such as user error in sample preparation. And with patient samples, outliers, while still rare, can occur.

This raises the question as to the interpretation of results from a study that uses the Westgard model (such as a Six Sigma study). These studies typically use controls but the implication is that they inform about the quality of the assay – meaning of course for patient samples. This is a problem for the reasons stated above. So one can say that if an assay has a bad six sigma value, the assay has a problem, but if the assay has a good six sigma value, one cannot say the assay is without problems.



  1. Mandel J. The Statistical Analysis of Experimental Data. Dover, New York, 1964, p. 105.



New publication about interferences

April 20, 2018

My article “Interferences, a neglected error source for clinical assays” has been published. This article may be viewed using the following link

Risk based SQC – What does it really mean?

December 4, 2017

Having just read a paper on risk based SQC, here are my thoughts…

CLSI has recently adopted a risk management theme for some of their standards. The fact that Westgard has jumped on the risk management bandwagon is as we say in Boston, wicked smaaht.

But what does this really mean and is it useful?

SQC as described in the Westgard paper is performed to prevent patient results from exceeding an allowable total error (TEa). To recall, in this model total error = |bias| + 1.65*SD, and the requirement is that this not exceed TEa. I have previously commented that this model does not account for all error sources, especially for QC samples. But for the moment, let’s assume that the only error sources are average bias and imprecision. The remaining problem with TEa is that it is always given as a percentage of results, usually 95%. So if some SQC procedure were to just meet its quality requirement, up to 5% of patient results could exceed TEa and potentially cause medical errors. This is 1 in every 20 results! I don’t see how this is a good thing even if one were to use a 99% TEa.
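
To make the 95% point concrete, here is a sketch (hypothetical bias, SD, and TEa values) of the fraction of results within ±TEa when, per the usual model TE = |bias| + 1.65*SD, only average bias and imprecision are at work:

```python
import math

def fraction_within_tea(bias: float, sd: float, tea: float) -> float:
    """Fraction of results within +/-TEa of truth, assuming results
    are Normal(bias, sd), i.e., only average bias and imprecision."""
    def phi(z):  # standard normal CDF
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return phi((tea - bias) / sd) - phi((-tea - bias) / sd)

# Hypothetical assay: bias 1, SD 2, TEa = |bias| + 1.65*SD = 4.3.
# About 95% of results fall within +/-TEa; roughly 1 in 20 can exceed it.
print(fraction_within_tea(1.0, 2.0, 4.3))
```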

The problem is one of “spin.” SQC, while valuable, does not guarantee the quality of patient results. The laboratory testing process is like a factory process and with any such process, to be useful it must be in control (meaning in statistical quality control). Thus, SQC helps to guard against an out of control process. To be fair, if the process were out of control, patient sample results might exceed TEa.

The actual risk of medical errors due to lab error is a function not only of an out of control process but also due to all other error sources not accounted for by QC, such as user errors with patient samples (as opposed to QC samples), patient interferences, and so on. Hence, to say that risk based SQC can address the quality of patient results is “spin.” SQC is a process control tool – nothing more and nothing less.

And the best way of running SQC would be for a manufacturer to assess results from all laboratories.

Now some people might think this is a nit-picking post, but here is an additional point. One might be lulled into thinking that with risk based SQC, labs don’t have to worry about bad results. But interferences can cause large errors that can cause medical errors. For example, in the maltose problem for glucose meters, 6 of 13 deaths occurred after an FDA warning. And recently, there have been concerns about biotin interference in immunoassays. So it’s not good to oversell SQC, since people might lose focus on other, important issues.

HbA1c – use the right model, please

August 31, 2017

I had occasion to read a paper (CCLM paper) about HbA1c goals and evaluation results. This paper refers to an earlier paper (CC paper) which says that Sigma Metrics should be used for HbA1c.

So here are some problems with all of this.

The CC paper says that TAE (which they use) is derived from bias and imprecision. Now I have many blog entries as well as peer reviewed publications going back to 1991 saying that this approach is flawed. That the authors chose to ignore this prior work doesn’t mean the prior work doesn’t exist – it does – or that it is somehow not relevant – it is.

In the CC paper, controls were used to arrive at conclusions. But real data involves patient samples so the conclusions are not necessarily transferable. And in the CCLM paper, patient samples are used without any mention as to whether the CC paper conclusions still apply.

In the CCLM paper, precision studies, a method comparison, linearity, and interferences were carried out. This is hard to understand since the TAE model of (absolute) average bias + 2x imprecision does not account for either linearity or interference studies.

The linearity study says it followed CLSI EP6, but there are no results to show this (e.g., no reported higher order polynomial regressions). The graphs shown do look linear.

But the interference studies are more troubling. From what I can make of it, the target values are given ±10% bands, and any candidate interfering substance whose data does not fall outside of these bands is said to not clinically interfere (i.e., the bias is less than an absolute 10%). But that does not mean there is no bias! To see how silly this is, one could equally say that if the average bias from regression were less than an absolute 10%, it should be set to zero since there was no clinical interference.

The real problem is that the authors’ chosen TAE model cannot account for interferences – such biases are not in their model. But interference biases still contribute to TAE! And what do the reported values of six sigma mean? They are valid only for samples containing no interfering substances. That’s neither practical nor meaningful.

Now one could better model things by adding an interference term to TAE and simulating various patient populations as a function of interfering substances (including the occurrence of multiple interfering substances). But Sigma Metrics, to my knowledge, cannot do this.
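
As a sketch of what such a model might look like, the simulation below (all parameters hypothetical) adds a probabilistic interference term to the bias + imprecision model and reports the 95th percentile of absolute error for a simulated patient population:

```python
import random

random.seed(2)

BIAS, SD, N = 1.0, 2.0, 100_000  # hypothetical assay bias, SD, sample count

def total_error(p_interf: float, interf_bias: float) -> float:
    """95th percentile of |error| when a fraction p_interf of patients
    carries an interference adding interf_bias to their result."""
    errors = []
    for _ in range(N):
        e = BIAS + random.gauss(0, SD)
        if random.random() < p_interf:  # this patient has the interference
            e += interf_bias
        errors.append(abs(e))
    errors.sort()
    return errors[int(0.95 * N)]

print(total_error(0.0, 0.0))  # controls: no interference, near |bias|+1.65*SD
print(total_error(0.1, 8.0))  # 10% of patients with a +8 interference
```

Even though only 10% of patients are affected, the 95th percentile of error roughly doubles, which is the sense in which interference bias inflates total error for patient samples but not for controls.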

Another comment is that whereas HbA1c is not glucose, the subject matter is diabetes and in the glucose meter world, error grids are well known as a way to evaluate required clinical performance. But the term “error grid” does not appear in either paper.

Error grids account for the entire range of the assay. It seems that Sigma Metrics are chosen to apply at only one point in the assay.

Antwerp talk about total error

March 12, 2017

Looking at my blog stats, I see that a lot of people are reading the total analytical error vs. total error post. So, below are the slides from a talk that I gave at a conference in Antwerp in 2016 called The “total” in total error. The slides have been updated. Because they were written to accompany a talk, the slides on their own are not as effective as the talk was.




Help with sigma metric analysis

January 27, 2017


I’ve been interested in glucose meter specifications and evaluations. There are three glucose meter specifications sources:

FDA glucose meter guidance
ISO 15197:2013
glucose meter error grids

There are various ways to evaluate glucose meter performance. What I wished to look at was the combination of sigma metric analysis and the error grid. I found this article about the sigma metric analysis and glucose meters.

After looking at this, I understand how to construct these so-called method decision charts (MEDx). But here’s my problem: in these charts, the total allowable error (TEa) is a constant, which is not the case for the TEa of error grids. There, TEa changes with the glucose concentration. Moreover, it is not even the same at a specific glucose concentration, because the “A” zone limits of an error grid (I’m using the Parkes error grid) are not symmetrical.

I have simulated data with a fixed bias and constant CV throughout the glucose meter range. But with a changing TEa, the estimated sigma also changes with glucose concentration.
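
To illustrate the issue with made-up numbers: hold bias and CV constant, and the sigma metric, sigma = (TEa - |bias|) / CV, still varies across the range because the error-grid TEa does (the TEa percentages below are hypothetical, not actual Parkes A-zone limits):

```python
# Sigma metric with all quantities in percent of the glucose concentration.
def sigma_metric(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    return (tea_pct - abs(bias_pct)) / cv_pct

BIAS, CV = 2.0, 3.0  # percent, held constant across the meter's range

# Hypothetical concentration-dependent TEa, loosely wider at low glucose
for glucose, tea in [(50, 25.0), (100, 18.0), (300, 15.0)]:
    print(glucose, round(sigma_metric(tea, BIAS, CV), 2))
```

The same assay thus earns a different sigma at each concentration, which is exactly why a single MEDx chart with one fixed TEa does not map cleanly onto an error grid.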

So I’m not sure how to proceed.

The problem with the FDA standard explained

October 25, 2016


The previous blog entry criticized the updated FDA POCT glucose meter performance standard, which now allows 2% of the results to be unspecified.

What follows is an explanation of why this is wrong. My logic applies to:

  1. Total error performance standards which state that 95% (or 99%) of results should be within stated limits
  2. Measurement uncertainty performance standards which state that 95% (or 99%) of results should be within stated limits
  3. The above FDA standard which states that 98% of results should be within stated limits

One argument that surfaces for allowing results to be unspecified is that one cannot prove that 100% of results are within limits. This is of course true. But here’s the problem of using that fact to allow unspecified results.

Consider a glucose meter example with truth = 30 mg/dL. Assume the glucose meter has a 5% CV and that the precision results are normally distributed. One can then calculate the size of glucose meter errors at various SD multiples, note their location in a Parkes error grid, and estimate how often an error of each size would occur due to precision alone.

Truth   SD multiple   Observed glucose   Parkes grid   Occurs 1 in
30      2             33                 A zone        20
30      3             34.5               A zone        370
30      8             42                 A zone        7E+14
30      22            63                 C zone        1E+106


(To get an error in the E zone, an extremely dangerous result, would require 90 multiples of the standard deviation, and Excel refuses to tell me how rare this is.) I think it’s clear that not specifying a portion of the results is not justified by worrying about precision and/or the normal distribution.
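
The “Occurs 1 in” column can be reproduced, to within rounding and the choice of a one- versus two-sided tail, from the normal distribution:

```python
import math

def one_in(z: float) -> float:
    """Approximate 'occurs 1 in' count for an error of at least z SDs
    (two-sided), assuming normally distributed imprecision."""
    tail = math.erfc(z / math.sqrt(2))  # P(|Z| >= z)
    if tail == 0.0:
        raise OverflowError("probability underflows double precision")
    return 1.0 / tail

for z in (2, 3, 8, 22):
    print(z, f"{one_in(z):.0e}")

# one_in(90) raises: like Excel, doubles cannot represent the probability.
```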

Now errors in higher zones of the Parkes error grid do occur, including E zone errors, and clearly this has nothing to do with precision. These errors have other causes, such as interferences.

A better way to think of these errors is as “attribute” errors: they either occur or don’t occur. For more on this, see: Krouwer JS. Recommendation to treat continuous variable errors like attribute errors. Clinical Chemistry and Laboratory Medicine 2006;44(7):797–798.

Note that one cannot prove that attribute errors won’t occur. But no one allows results to be unspecified the way clinical chemistry standards committees do. For example, you don’t hear “we want 98% of surgeries to be performed on the correct organ on the correct patient.”