Theranos / AACC – You have to answer for Santino

July 31, 2018

The Theranos topic reappeared at the 2018 AACC meeting. This time there was an interview with John Carreyrou, the author who first reported on the problems at Theranos in The Wall Street Journal in October 2015. The interview was well done in a question and answer format, much like a TV talk show. For people like me who read Carreyrou’s book Bad Blood, there was not much new. The AACC audience had a smug response, as in: how could this have happened – I would never have bought this stuff.

But the two questions I wanted to hear were never asked!

  1. Why did AACC invite the founder of Theranos to present at AACC in 2016, after it was discovered that Theranos had committed fraud?
  2. Why did four former AACC presidents (Susan Evans, Ann Gronowski, Larry Kricka, and Jack Ladenson) join the Theranos scientific board after the 2016 AACC meeting?

Ok, maybe the Godfather reference doesn’t work, but I would have liked to hear answers to these questions.

How Should Glucose Meters Be Evaluated for Critical Care

July 24, 2018

There is a new IFCC document with the same title as this blog entry. Ok, I know better than to try to publish a critique of an IFCC document, so I’ll keep my thoughts to this blog.

The glucose meter goals suggested by IFCC are the same as those contained in the CLSI document POCT12-A3. Now I do have a published critique of this CLSI standard – it is here. Not a surprise, but my critique of POCT12-A3 is not listed in the many IFCC references.

From skimming the IFCC document, I see it has the same accuracy goals as POCT12-A3, which basically leaves 2% of glucose meter results unspecified (e.g., they could be really bad). Since the IFCC document covers interferences and user errors as possible causes of error, someone needs to tell me why 2% of glucose meter results are unspecified.

The problem is this: say you did an evaluation with 100 samples and 1 of them had a large error (much greater than a 20% error). A problem with the POCT12-A3 spec is that it allows one to say that, for this evaluation, the spec has been met even though the bad result could cause patient harm. Hence, meeting the POCT12-A3 spec implies that one has achieved accuracy as defined by a standards group, and this could be used to justify ignoring the bad result.
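To make the 100-sample example concrete, here is a minimal Python sketch. It assumes a deliberately simplified criterion, "at least 98% of results within ±20% of reference", which is not the full POCT12-A3 specification. The function name, the 80% outlier, and the 5% error assigned to the other 99 samples are all hypothetical.

```python
def meets_simplified_spec(percent_errors, limit_pct=20.0, required_fraction=0.98):
    """Return True if enough results fall within +/- limit_pct.

    Simplified stand-in for a '98% within limits' accuracy criterion;
    it says nothing about how bad the remaining results may be.
    """
    within = sum(1 for e in percent_errors if abs(e) <= limit_pct)
    return within / len(percent_errors) >= required_fraction


# Hypothetical evaluation: 99 samples with a 5% error, 1 sample with an 80% error.
errors = [5.0] * 99 + [80.0]

passes = meets_simplified_spec(errors)
worst_error = max(abs(e) for e in errors)

print("Spec met:", passes)
print("Largest error (%):", worst_error)
```

Under these assumptions the evaluation passes, and nothing in the pass/fail result reveals how large the one excluded error is.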

A selected catalog of critiques

July 12, 2018

The highlighted articles can be viewed without a subscription.

Imprecision calculations – Evaluations commonly reported total imprecision as less than within-run imprecision. Correct calculations are explained.

How to Improve Estimates of Imprecision Clin. Chem., 30, 290-292 (1984)

Total error models – Modeling total error by adding imprecision to bias is popular but fails to account for several other error sources. These articles (and others) provide alternative models.

Estimating Total Analytical Error and Its Sources: Techniques to Improve Method Evaluation Arch Pathol Lab Med., 116, 726-731 (1992)

Setting Performance Goals and Evaluating Total Analytical Error for Diagnostic Assays Clin. Chem., 48: 919-927 (2002)

Too optimistic project completion schedules – Project managers would forecast completion dates that were never met. The article shows how to get better completion estimates using past data.

Beware the Percent Completion Metric Research Technology Management, 41, 13-15, (1998)

GUM – It was suggested that hospital labs apply the Guide to the Expression of Uncertainty in Measurement (GUM). There’s no way a hospital lab could carry out this work.

A Critique of the GUM Method of Estimating and Reporting Uncertainty in Diagnostic Assays Clin. Chem., 49:1818-1821 (2003)

ISO 9001 – There have been many valuable quality initiatives. In the late 80s, ISO 9001 was a program to certify that companies that passed had high quality. But it was nothing more than documentation – it did nothing to improve quality. Maybe the lab equivalent ISO 15189 is the same.

ISO 9001 has had no effect on quality in the in-vitro medical diagnostics industry Accred. Qual. Assur., 9: 39-43 (2004)

Bland-Altman plots – For Bland-Altman plots (difference plots), Bland and Altman suggest plotting the difference y-x vs. (y+x)/2 in order to prevent spurious correlations. But the article below shows that if x is a reference method, following Bland and Altman’s advice will produce a spurious correlation. The difference y-x vs. x should be plotted when x is a reference method.

Why Bland-Altman plots should use X, not (Y+X)/2 when X is a reference method Statistics in Medicine, 27 778-780 (2008)
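The Bland-Altman point above can be illustrated with a small simulation. This is only a sketch under stated assumptions: x is treated as an error-free reference method, y equals x plus random error with no true bias, and the concentration range, error SD, and sample size are hypothetical choices.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(42)

# Assumption: x is a reference method with no measurement error.
x = [rng.uniform(80.0, 120.0) for _ in range(5000)]
# Assumption: y has random error only (SD = 10), unrelated to x.
y = [xi + rng.gauss(0.0, 10.0) for xi in x]

diff = [yi - xi for xi, yi in zip(x, y)]
mean_xy = [(yi + xi) / 2.0 for xi, yi in zip(x, y)]

corr_vs_x = pearson(diff, x)
corr_vs_mean = pearson(diff, mean_xy)

print("correlation of (y-x) with x:       ", round(corr_vs_x, 3))
print("correlation of (y-x) with (y+x)/2: ", round(corr_vs_mean, 3))
```

In this simulated setup the differences are essentially uncorrelated with x but positively correlated with (y+x)/2, because the random error in y appears on both axes of the plot.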

Six Sigma – This metric is often presented as the sole quality measure, but it basically measures only average bias and imprecision. As this article shows, there can be severe problems with an assay even when it has a high sigma.

Six Sigma can be dangerous to your health Accred Qual Assur 14 49-52 (2009)

Glucose standards – The glucose meter standard ISO 15197 has flaws. This letter pointed out what the experts missed in a question and answer forum.

Wrong thinking about glucose standards Clin Chem, 56 874-875 (2010)

POCT12-A3 – The article explains flaws in this CLSI glucose standard.

The new glucose standard POCT12-A3 misses the mark Journal of Diabetes Science and Technology, September 7 1400–1402 (2013)

Regulatory approval evaluations – The performance of assays during regulatory evaluations is often much better than when the assays are in the field. The article gives some reasons why.

Biases in clinical trials performed for regulatory approval Accred Qual Assur, 20:437-439 (2015)

MARD – This metric to classify glucose meter quality leaves a lot to be desired. The article below suggests an alternative.

Improving the Glucose Meter Error Grid with the Taguchi Loss Function Journal of Diabetes Science and Technology, 10 967-970 (2016)

Interferences – Motivated by a recent paper where interferences were treated almost as a new discovery (and given a new name), this paper discusses how specifications and analysis methods can be improved by accounting for interferences. I also mention how the CLSI EP7 standard reports interferences incorrectly and could cause problems for labs.

Interferences, a neglected error source. Accred. Qual. Assur. 23(3):189-192 (2018).

My battle with commutability – postscript

July 10, 2018

Since my article about commutability was rejected a number of times and I published my thoughts on my blog, I reflect here on some differences between blogs and journal publications.

I felt less constrained in writing on my blog. Whereas the basic ideas were the same, the actual content is quite different.

Of course for a blog, publishing means a few button clicks whereas publishing in a journal is much more difficult.

A blog is in some ways more accessible than a published article since a subscription is required for many journals.

Although it’s not that common, I’ve seen blogs referenced in journal articles. This is interesting since there’s nothing preventing the blog author from changing content.

I note in passing that when you’re the editor of a journal, that journal effectively becomes your blog.

But a blog is not reviewed. The reviews I received for my article about commutability did help, as did reviews that I received for other articles. The only bone to pick about the reviews for my commutability article is that some reviewers said that there’s an established literature about the value of commutability – hence case closed.

My battle with Commutability part 4

July 7, 2018

Part 3 of this series is here.

The final installment in this series is facilitated by a recent article. The article was written by three people, two of whom are on the IFCC committee – hence there is traceability :)

The article provides a review of standardization and harmonization, which are essential in getting results from different labs to agree. I encourage people to read this well written review of standardization and harmonization.

Finally, this article discusses commutability and provides a figure showing three graphs. Each graph contains the results from measurement procedure 2 vs measurement procedure 1 for both clinical samples and reference materials. (I won’t show this figure due to copyright issues). The results are as follows:

Panel A shows clinical samples and reference materials on the same line – thus these reference materials are called commutable.

Panel B shows clinical samples with reference materials not on the same line – thus these reference materials are called noncommutable.

Panel C shows the noncommutable reference materials from Panel B used as calibrators with the result that the clinical samples are offset and said to be inaccurate.

But this assumes that measurement procedure 1 is being used as a reference procedure. So even when there is no real reference method, often a method is chosen to be the reference method. And the information in Panels A and B is equivalent. If one calibrates Panel A using its reference materials as calibrator, one gets accurate values, but one can still get accurate values in Panel B using its reference materials as calibrator, provided one offsets the calibrator values to make the patient samples come out right. Thus, Panels A and B have the same information.

To summarize, standardization and harmonization are essential for results to agree – commutability is not.

My battle with commutability part 3

July 7, 2018

Part 2 of this series is here.

It is helpful to review bias in clinical chemistry. For virtually any assay, one can envision a graph of response vs. dose. The response is some physical signal and the dose is the concentration of measurand. One gets concentration through the calibration curve, which is an algorithm that relates response to concentration. A necessary component of the calibration is the assignment of values to the calibrator. One has responses for reference materials and clinical samples. If the responses are the same, the reference materials are commutable; if not, they are noncommutable.

Consider some cases of systematic bias that affect all results:

Case 1 – the reagent degrades within the same calibration cycle. Basically this means the response will no longer be appropriate for the assigned values of the calibrator. Bias will occur whether the calibrator is commutable or noncommutable.

Case 2 – Whether one is using commutable or noncommutable calibrators, most manufacturers rely on the concept of master lots, meaning a lot whose assigned values give correct answers but which is in limited supply. Thus, secondary lots are produced and the values from the master lot are transferred to the secondary lot. This process continues for many secondary lots. Unfortunately, there can be error in each transfer and these errors can accumulate. Bias can occur whether the master lot of calibrator is commutable or noncommutable.
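How transfer errors can accumulate across lots can be sketched with a simple simulation. The assumptions here are mine, not from any manufacturer's procedure: each value transfer introduces an independent relative error with an SD of 1%, each new lot is assigned from the previous lot, and the function name and chain counts are hypothetical.

```python
import random
import statistics


def simulate_lot_bias(n_transfers, transfer_sd, n_chains=2000, seed=1):
    """Simulate the percent bias of a lot after n_transfers value transfers.

    Each chain starts from a master lot with no bias; every transfer
    multiplies the assigned value by (1 + random relative error).
    """
    rng = random.Random(seed)
    biases = []
    for _ in range(n_chains):
        value_ratio = 1.0
        for _ in range(n_transfers):
            value_ratio *= 1.0 + rng.gauss(0.0, transfer_sd)
        biases.append((value_ratio - 1.0) * 100.0)
    return biases


sd_after_1 = statistics.stdev(simulate_lot_bias(1, 0.01))
sd_after_10 = statistics.stdev(simulate_lot_bias(10, 0.01))

print("SD of lot bias after 1 transfer (%):  ", round(sd_after_1, 2))
print("SD of lot bias after 10 transfers (%):", round(sd_after_10, 2))
```

In this model the spread of possible lot bias grows with the number of transfers, and nothing in the simulation depends on whether the master lot is commutable.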

Of course, there are many other causes of bias, many of which are random, such as interferences. But I fail to see how using commutable calibrators will eliminate differences in results as implied in the series of articles on commutability.

My battle with commutability part 2

July 3, 2018

The first installment about commutability is here.

Let’s consider how IFCC evaluates the difference between the reference material and the clinical samples. The terms are (I’m not using the exact symbols):

  1. an acceptance criterion “C” which must be achieved
  2. the average difference “D” between the reference material and the clinical samples
  3. an uncertainty interval “U” estimated for the average difference D

IFCC states 3 possible cases for the result of the commutability experiment, although when one reads further there is a fourth case. Why it is absent from the 3 cases is a mystery, but for ease of understanding, I’ll list all 4:

  1. Commutable: D±U includes 0 and ±U is inside of C
  2. Noncommutable: D±U doesn’t include 0 and ±U is outside of C
  3. Indeterminate: D±U doesn’t include 0 and ±U is inside of C
  4. Bad experiment: ±U is greater than abs(C)

Now the acceptance criterion C is said to be a fraction of the total allowable error. Let’s assume C is ±10% error and focus on some cases.

Case 3 – Let’s say for case 3 D=5% and ±U= ±3. To call this “indeterminate” makes no sense. The basic result of a commutability experiment is that a difference between the reference material and clinical samples when tested on two methods has either been detected or not. In this case a difference has been detected! IFCC calls this “indeterminate” because the difference is within “C.” But in practical terms (e.g., like in baseball, one is either safe or out) the only meaningful result is commutable or noncommutable, and in this case the result is noncommutable.

Case 1 – Let’s say for case 1 D=4% and ±U= ±4. This result is commutable because D±U includes 0 and U is within C.
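The two worked cases above can be checked with a small sketch. It encodes the four outcomes as I have listed them, not IFCC's exact wording or symbols, and it assumes that "inside of C" means the whole interval D±U lies within ±C. The function name and the extra "not covered" outcome are hypothetical.

```python
def classify_commutability(d, u, c):
    """Classify a commutability result from D, U and C (all in percent).

    Assumption: 'inside of C' means the whole interval D +/- U lies within +/- C.
    """
    limit = abs(c)
    if u > limit:
        return "bad experiment"
    low, high = d - u, d + u
    includes_zero = low <= 0.0 <= high
    inside_c = -limit <= low and high <= limit
    if includes_zero and inside_c:
        return "commutable"
    if not includes_zero and inside_c:
        return "indeterminate"
    if not includes_zero and not inside_c:
        return "noncommutable"
    # Interval includes 0 but extends beyond C: not one of the four listed cases.
    return "not covered by the four listed cases"


case_3 = classify_commutability(d=5.0, u=3.0, c=10.0)
case_1 = classify_commutability(d=4.0, u=4.0, c=10.0)

print("D=5, U=3, C=10:", case_3)
print("D=4, U=4, C=10:", case_1)
```

Note that the case 1 example is labeled commutable even though its best estimate of the difference is 4%.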

Before going on, what if this result were for a suspected interfering substance? Now, having worked in manufacturing, I understand the concept of acceptability. We often set acceptability limits at 50% of the total allowable error, so 10% as above could be used. But our notion of acceptability is intended for random errors. An interfering substance might occur occasionally and thus could be considered a random error, and the probability of concurrent random errors (e.g., an interference and a high multiple of the SD) is quite remote. Hence the result would be accepted, even if the result were D=4% and U=±1.

Back to the reference material. In this version of case 1, if we accept this reference material as a commutable calibrator, we will have a systematic error which saddles every clinical result with a 4% bias. Now no one in industry would do this, especially if the comparison method were a reference method (the case where the comparison method is not a reference method will be covered later). What would a manufacturer do? Simple – set the reference material used as a calibrator to a value that would guarantee that the clinical samples have no bias – in other words, treat this case as if the reference material is noncommutable.
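The manufacturer's fix can be sketched with a single-point linear calibration through the origin. The calibration model, the signal units, and every number other than the 4% bias are assumptions chosen for illustration.

```python
def make_calibration(cal_response, assigned_value):
    """Single-point calibration through the origin.

    Returns a function that converts a response into a concentration.
    """
    factor = assigned_value / cal_response
    return lambda response: response * factor


SIGNAL_PER_UNIT = 2.0     # hypothetical response of clinical samples per unit concentration
REFERENCE_VALUE = 200.0   # hypothetical reference-method value of the reference material
TRUE_SAMPLE_CONC = 100.0  # hypothetical clinical sample, true concentration

# Assumed matrix effect: the material's response is lower than that of a clinical
# sample at the same concentration, such that patient results come out 4% high.
cal_response = SIGNAL_PER_UNIT * REFERENCE_VALUE / 1.04
sample_response = SIGNAL_PER_UNIT * TRUE_SAMPLE_CONC

# Treat the material as commutable: assign it its reference-method value.
naive = make_calibration(cal_response, REFERENCE_VALUE)
naive_result = naive(sample_response)

# Treat it as noncommutable: offset the assigned value so patient samples are unbiased.
adjusted = make_calibration(cal_response, REFERENCE_VALUE / 1.04)
adjusted_result = adjusted(sample_response)

print("Result with reference value assigned:", round(naive_result, 2))
print("Result with offset value assigned:   ", round(adjusted_result, 2))
```

The same material serves as calibrator in both runs; only the value assigned to it changes.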

Now a reader might suggest that my examples are rigged, but they are within the parameters suggested by IFCC. Moreover, even though “no difference detected” is true for case 1, it is also true that the expected value of D is 4%. Also, one might ask: what’s the big deal with a bias of 4%? Well, Klee (1) has shown that even small biases cause diagnostic errors, so when one can avoid introducing bias, one should.

I mention in passing that, as reported in the first installment, eliminating clinical samples that might interfere, as suggested by IFCC, is a prescription for bias. Thus, it is possible that all of the differences calculated in the commutability experiment are biased and cannot be counted on as representative or, in simpler terms, are bogus.


  1. Klee GG, Schryver PG, Kisbeth RM. (1999) Analytic bias specifications based on the analysis of effects on performance of medical guidelines. Scand J Clin Lab Invest. 59:509-512