My friends and colleagues Jim and Sten Westgard have written an article about total analytical error in which I am mentioned. Their article says that total analytical error (TAE) is important and can be estimated by the equation TAE = average bias + 2 × SD. They later go on to say that higher multiples of the SD are useful for Six Sigma methods.
So what are the problems?
Here’s the first problem. The Westgards say (I have added bolding):
“After Westgard, Carey, and Wold proposed these definitions, some analysts argued that there were additional components of error that should be considered, such as interferences that affect individual patient samples, sometimes referred to as random biases. To include such effects, Krouwer recommended a direct estimation of TAE obtained by using a comparison with a reference method (2), and the Clinical Laboratory Standards Institute (CLSI) subsequently developed the EP21A guidance document using that approach (3).”
The use of the word “argued” implies that perhaps this consideration is wrong. It isn’t. The original analysis comes from: Lawton WH, Sylvester EA, Young-Ferraro BJ. Statistical comparison of multiple analytic procedures: application to clinical chemistry. Technometrics 1979;21:397-409. Now this is a technical article. For those who want a simpler version, consider the following picture:
These assays have the same TAE according to the Westgards because the average bias is the same. But clearly, the assay on the left has more actual TAE than the one on the right. Patient interferences aren’t the only problem. Any intermittent error will inflate the TAE but, depending on the error source, may not affect either average bias or precision. Moreover, it’s not just large interferences. Say that every sample has a bias from a mixture of interferences that ranges from -1% to +1%. The average bias is zero, but the SD of differences between the assay and reference will be inflated, and that inflation is not accounted for by the equation average bias + 2 SD.
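The -1% to +1% example can be checked with a small simulation. This is only a sketch under stated assumptions, none of which come from the Westgard article: the reference method is treated as error-free, every sample has a true value of 100, the repeatability CV is 1%, and the sample-specific bias is drawn uniformly from -1% to +1%. The function name `simulate` and its parameters are made up for the illustration.

```python
import random
import statistics


def simulate(n_samples=20000, true_value=100.0, cv_repeat=0.01,
             interference_range=0.01, seed=1):
    """Compare simple precision with the SD of assay-minus-reference differences.

    All numbers are illustrative assumptions, not values from any real assay.
    """
    rng = random.Random(seed)

    # Precision experiment: one sample measured repeatedly, so its
    # sample-specific bias is a constant and adds nothing to the SD.
    one_sample_bias = rng.uniform(-interference_range, interference_range)
    replicates = [
        true_value * (1 + one_sample_bias) + rng.gauss(0, cv_repeat * true_value)
        for _ in range(n_samples)
    ]
    sd_precision = statistics.stdev(replicates)

    # Method comparison: each patient sample carries its own small bias.
    diffs = []
    for _ in range(n_samples):
        sample_bias = rng.uniform(-interference_range, interference_range)
        result = true_value * (1 + sample_bias) + rng.gauss(0, cv_repeat * true_value)
        diffs.append(result - true_value)  # reference assumed error-free

    avg_bias = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return avg_bias, sd_precision, sd_diff


avg_bias, sd_precision, sd_diff = simulate()
print(f"average bias:      {avg_bias:.3f}")
print(f"simple precision:  {sd_precision:.3f}")
print(f"SD of differences: {sd_diff:.3f}")
```

Under these assumptions the average bias stays near zero while the SD of differences exceeds the simple precision, in theory by a factor of about sqrt(1 + 1/3) ≈ 1.15. An estimate built from average bias and simple precision never sees that extra variation.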
The second problem is perhaps more important. The Westgards say (I have added bolding):
“Given that ATE is intended to be an estimate of the quality of a measurement procedure, its practical value depends on a comparison to the quality required for the intended use of a test result. In other words, the definition refers to the amount of error that is allowable without invalidating the interpretation of a test result.”
The problem is that a measurement procedure includes all errors made during the conduct of a test including pre-analytical and post-analytical error (e.g., user error). Thus, the Westgards should be using total error, not total analytical error. Now I made this suggestion for the revision of CLSI EP21-A with the result that I got kicked off of my own committee.
Consider the following scenario. A hospitalized diabetic patient spills Coke on their hands. In comes a provider who fails to wash and dry the site where the capillary sample is taken. The result is a huge glucose error which will occur for any meter. This is part of total error and is part of the measurement procedure and will affect the interpretation of the test result.
The third problem has to do with choosing higher multiples of the SD – using average bias + Y SD, where Y=4 or 6.
“While the original recommendation for a total error criterion was ATE ≥ bias + 2 SD, later papers recommended ATE ≥ bias + 4 SD (8) and, with adoption of Six Sigma concepts (9), suggested ATE ≥ bias + 5 SD and ATE ≥ bias + 6 SD.”
The problem is that simple precision does not account for all (random) errors. It would make some sense if one used the SD of differences between an assay and reference, because this quantity contains all errors allowed to occur in the experiment. The phrase “allowed to occur in the experiment” is key. To evaluate total error, an experiment must be conducted in which no possible error sources are excluded. This is difficult if not impossible to achieve in a method comparison experiment, and even if it could be done, the Westgards’ analysis method is incorrect.
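To see why a larger multiple of simple precision does not rescue the estimate, here is a rough sketch, again with invented numbers: a precision SD of 1 unit, and a gross error of +20 units that hits 2% of samples (standing in for any intermittent error source). The function `fraction_outside` is hypothetical, and the precision SD is taken as known rather than estimated. It counts how many differences fall outside average bias ± k × SD.

```python
import random
import statistics


def fraction_outside(k, n_samples=20000, sd_precision=1.0,
                     gross_rate=0.02, gross_size=20.0, seed=2):
    """Fraction of differences outside average bias +/- k * SD (simple precision).

    gross_rate and gross_size are assumed values describing an
    intermittent error; they are not taken from any real data set.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_samples):
        d = rng.gauss(0.0, sd_precision)  # ordinary random error
        if rng.random() < gross_rate:     # occasional gross error
            d += gross_size
        diffs.append(d)

    avg_bias = statistics.mean(diffs)
    lower = avg_bias - k * sd_precision
    upper = avg_bias + k * sd_precision
    outside = sum(1 for d in diffs if d < lower or d > upper)
    return outside / n_samples


for k in (2, 4, 6):
    print(f"k = {k}: fraction outside limits = {fraction_outside(k):.4f}")
```

In this setup, going from 2 SD to 6 SD removes almost all of the ordinary random errors from the count, but the fraction outside the limits stays near the assumed 2% gross error rate. A purely Gaussian model would predict a vanishingly small fraction at 6 SD.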
So what’s the bottom line?
To do what the Westgards say is generally a good thing. But it should not be overinterpreted. Their measures do not describe the whole measurement procedure, only a subset of it (albeit an important subset).