The bad and the good in lab medicine

May 5, 2020

First, the bad. Well, bad is maybe too strong a word.


If you have ever read an ISO standard, you will notice that something is missing: there is no list of authors or committee members. The people who write the standard should be listed!

ISO 9001, ISO 15189

In the 90s, companies would display banners stating they were ISO 9001 certified. From the ISO website: “Using ISO 9001 helps ensure that customers get consistent, good-quality products and services, which in turn brings many business benefits.” But accreditation success is judged by the documentation the organization keeps to show that it is following the processes it has developed. A company could have poor quality, but if it can prove through documentation that it follows its processes, it will be accredited. The same applies to ISO 15189, the clinical laboratory version.

Krouwer JS. ISO 9001 has had no effect on quality in the in-vitro medical diagnostics industry. Accred Qual Assur. 2004;9:39-43.

ISO 15197

This standard describes the required accuracy for patients with diabetes who self-monitor their glucose with glucose meters. It came out in 2003 and was updated in 2013. The 2003 version allowed 5% of the results to have any difference from reference (hence, 5% of results were unspecified). The 2013 version reduced the unspecified amount to 1%. People with diabetes do a lot of testing. To have 5% unspecified meant that once a week you could get a result that could kill you from an ISO-acceptable glucose meter (2003 version). For the 2013 version this was once a month. I was invited to attend an early meeting of the 15197 committee. It was not run by endocrinologists, but rather by regulatory affairs people from industry.

Krouwer JS. Wrong thinking about glucose standards. Clin Chem. 2010;56:874-875.


I spent many years contributing to CLSI in the area of evaluations. This group is dominated by regulatory affairs people, and it was always difficult to finish any evaluation standard. For example, when I became chairholder of the committee, I finally published a bunch of standards that had been sitting around for 14 years!

I thought Jim Westgard’s original idea about total error made a lot of sense, so I established and chaired a standard about total error (EP21). When it was time to revise the standard, it occurred to me that the original document was about “total analytical error.” I suggested that in the revision, we include pre- and post-analytical error. There was strong opposition to this and not just by the regulatory affairs people but also by hospital clinical chemists. After a while, the CLSI management threw me out of CLSI.

Clinical Chemistry (Journal)

There have been a lot of good things in the journal Clinical Chemistry, but here is one that is not so good. The journal will not accept for review a Letter to the Editor unless the Letter is about an original article. That means that about half of the content in the journal (case studies, opinions, editorials, and so on) is off limits. I asked the editor of the journal during a local AACC meeting, and his response went along these lines: we vet our articles very carefully and don’t wish to burden the journal with useless blather. My Letter to Clinical Chemistry published in 2010 about the ISO glucose standard would not be considered today.

The good

I like all sections of the AACC Artery. I look at it every day.

My thoughts on QC

June 15, 2019

Having just read a Clin Lab News article on QC which IMHO is misleading, here are my thoughts.

The purpose of QC is to determine whether a process is in control or not. In clinical chemistry, the process is running an assay. An out of control process is undesirable because it can yield unpredictable results.

QC by itself cannot guarantee the quality of patient results, even when the process is in control. This is because QC does not detect all errors (for example, an interference).

The quality of the results of an in-control process is called the process capability (e.g., the inherent accuracy of the process). QC cannot change this, regardless of the QC rules that are used.

QC is like insurance, hence cost should not be considered in designing a QC program. That is, regardless of how low risk a failure mode is, one should never abandon QC.

Although running more QC can detect an out of control process sooner, any QC program should always prevent patient results from being reported when an out of control condition is detected. Risk is not involved.
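The release logic above can be sketched in a few lines. This is a minimal illustration with hypothetical targets and a simple mean ± 3 SD rule, not a full QC design:

```python
# Minimal sketch of a QC check in the spirit described above: flag a run as
# out of control when a QC result falls outside mean +/- 3 SD, and hold
# patient results whenever any QC value fails. Targets and values are
# hypothetical.

def qc_in_control(value, target_mean, target_sd, k=3.0):
    """Return True if a QC value is within target_mean +/- k * target_sd."""
    return abs(value - target_mean) <= k * target_sd

def release_results(qc_values, target_mean, target_sd):
    """Release patient results only if every QC value is in control."""
    return all(qc_in_control(v, target_mean, target_sd) for v in qc_values)

# Example: target mean 100, SD 2 -> limits are 94 to 106.
ok = release_results([99.0, 101.5], 100.0, 2.0)      # in control
held = release_results([99.0, 110.0], 100.0, 2.0)    # one value out -> hold
```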

Minimum system accuracy performance criteria

February 13, 2019

I had occasion to read the ISO 15197:2013 standard about blood glucose meters and was struck by the words “minimum system accuracy performance criteria” (6.3.3).

This reminds me of the movie “Office Space”, where Jennifer Aniston, who plays a waitress, is being chastised for wearing just the minimum number of pieces of flair (buttons on her uniform). Sorry if you haven’t seen the movie.

Or when I participated in an earlier version of the CLSI method comparison standard EP9. The discussion at the time was to arrive at a minimum sample size. The A3 version says at least 40 samples should be run. I pointed out that 40 would become the default sample size.

Back to glucose meters. No one will report that they have met the minimum accuracy requirements. They will always report they have exceeded the accuracy requirements.


Review of total error

February 6, 2019

History – Total error has probably been around for a long time but the first mention that I found is from Mandel (1). In talking about a measurement error, he wrote:

error = x − R = (x − μ) + (μ − R), where x is a measurement, μ is the mean of the measurements, and R is the reference value.

The term (x − μ) is the imprecision and (μ − R) is the inaccuracy. An implied assumption is that the errors are IIDN: independently and identically distributed in a normal distribution with mean zero and variance σ². With laboratory assays of blood, this is almost never true.

Westgard model – The Westgard model of total error (2) is the same as Mandel; namely that

Total error: TE = bias + 2 × imprecision.

The problem with this model is that it neglects other errors, with interfering substances affecting individual samples as perhaps the most important. Note that it is not just rare, large interferences that are missed in this model. I described a case where small interferences inflate the total error (3).
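A small numeric sketch of this point, using made-up differences and computing TE as |bias| + 2 × SD: a single interference-affected sample can land outside the model-based total error band.

```python
# Sketch of the Westgard-style total error estimate, TE = |bias| + 2 * SD,
# applied to hypothetical method-minus-reference differences. The last
# sample carries a large interference that the bias-plus-imprecision
# model does not capture.

import statistics

def westgard_te(differences):
    """Total error as |mean bias| + 2 * SD of the differences."""
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)
    return abs(bias) + 2.0 * sd

diffs = [-1.2, 0.5, -0.3, 0.8, -0.6, 0.2, 15.0]  # last value: interference
te = westgard_te(diffs)                 # roughly 13.5 for these data
worst = max(abs(d) for d in diffs)      # 15.0: exceeds the TE band
```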

Lawton model – The Lawton model (4) adds interfering substances affecting individual samples.

Other factors – I added (5) to the Lawton model by including other factors such as drift, sample carryover, reagent carryover.

Here’s an example of a problem with the Westgard model. This model suggests that average bias accounts for systematic error and imprecision accounts for random error. Say you have an assay with linear drift over a 30-minute calibration cycle. The assay starts out with a negative bias, has zero bias at 15 minutes, and ends with a positive bias. The Westgard model would estimate zero bias for the systematic error and assign imprecision for the random error. But this is not right. There is clearly systematic bias (as a function of time), and the calculated imprecision (the SD of the observations) is not equal to the random error.
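The drift example can be put in numbers. This sketch uses hypothetical values and, for clarity, no random error at all, so the entire SD is drift masquerading as imprecision:

```python
# Numeric sketch of the drift example (hypothetical numbers): results drift
# linearly from -3 to +3 over a 30-minute calibration cycle. The average
# bias is zero, so the Westgard model sees no systematic error, yet every
# individual result has a time-dependent bias, and the SD of the
# observations is entirely drift (there is no random error here at all).

import statistics

minutes = range(0, 31, 5)                        # samples at 0, 5, ..., 30 min
drift_bias = [(t - 15) / 5.0 for t in minutes]   # -3 at t=0, 0 at t=15, +3 at t=30

mean_bias = statistics.mean(drift_bias)          # 0: drift cancels on average
sd = statistics.stdev(drift_bias)                # > 2 despite zero random error
```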

The problem with Bland Altman Limits of Agreement – In this method, one multiplies the SD of the differences of the candidate method from reference (usually by 2). This is an improvement, since interferences or other error sources are included in the SD of differences. But the differences must be normally distributed, and outliers are allowed to be discarded. By discarding outliers, one cannot claim total error.
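A minimal sketch of the limits-of-agreement calculation, using hypothetical differences:

```python
# Bland-Altman limits of agreement: mean difference +/- 2 * SD of the
# differences. Data are hypothetical. Note the caveat in the text: if
# outliers are discarded before computing the limits, the result no
# longer describes total error.

import statistics

def limits_of_agreement(differences, k=2.0):
    """Return (lower, upper) limits as bias +/- k * SD."""
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)
    return bias - k * sd, bias + k * sd

diffs = [-1.0, 0.4, -0.2, 0.9, -0.5, 0.3]
lower, upper = limits_of_agreement(diffs)   # roughly -1.4 to +1.4 here
```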

The problem with measurement uncertainty – The GUM method (Guide to the Expression of Uncertainty in Measurement) is a bottom-up approach that adds all errors as sources of imprecision. I have critiqued this method (6): bias is not allowed in the method, which does not seem to match what happens in the real world, and errors that cannot be modeled will not be captured.

The problem with probability models – Any one of the above models paradoxically cannot account for 100% of the results, which makes the term “total” in total error meaningless. These probability models will never account for 100% of the results, because 100% probability error limits stretch from minus infinity to plus infinity (7).

Errors that cannot be modeled – An additional problem is that there are errors that can occur but really can’t be modeled, such as user errors, software errors, manufacturing mistakes, and so on (7). The Bland-Altman method does not suffer from this problem, while all of the other methods above do.

A method to account for all results – The mountain plot (8) is simply a plot (or table) of differences of the candidate method from reference. No data are discarded. This is a nonparametric estimate of total error. A limitation is that error sources that are not part of the experiment may lead to an underestimate of total error.
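The mountain-plot idea can be sketched as a nonparametric percentile lookup over all of the differences; the data and the nearest-rank rule here are my own illustration:

```python
# Sketch of the mountain-plot idea: rank all differences (no data
# discarded) and read off nonparametric percentiles directly. The data
# and the nearest-rank convention are hypothetical illustrations.

def empirical_percentile(differences, pct):
    """Nonparametric percentile by nearest rank; keeps every observation."""
    ordered = sorted(differences)
    rank = max(1, round(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

diffs = [-4.0, -1.2, -0.5, 0.1, 0.3, 0.8, 1.1, 1.6, 2.2, 9.5]
p_low = empirical_percentile(diffs, 2.5)     # -4.0
p_high = empirical_percentile(diffs, 97.5)   # 9.5
# Extreme differences (e.g., the 9.5) stay in the tally instead of being
# discarded as outliers.
```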

Error Grid Analysis – One overlays a scatterplot from a method comparison on an error grid. The analysis is simply to tally the proportions of observations in each error grid zone. This analysis also accounts for all results.
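A sketch of the tally, with the caveat that the zone boundaries below are simplified placeholders of my own, not the actual Clarke or Parkes grid geometry (real grids define zones as regions in the reference-versus-measured plane):

```python
# Toy error grid tally. The zones here are simplified percent-error bands,
# NOT the real Clarke/Parkes grid geometry; they only illustrate that every
# observation lands in some zone, so all results are accounted for.

def toy_zone(reference, measured):
    """Classify one observation by its percent error (simplified zones)."""
    pct = abs(measured - reference) / reference * 100.0
    if pct <= 15:
        return "A"      # clinically accurate (placeholder boundary)
    if pct <= 40:
        return "B"      # benign error (placeholder boundary)
    return "C+"         # potentially harmful

def tally_zones(pairs):
    """Count observations per zone for (reference, measured) pairs."""
    counts = {}
    for ref, meas in pairs:
        zone = toy_zone(ref, meas)
        counts[zone] = counts.get(zone, 0) + 1
    return counts

pairs = [(100, 104), (80, 95), (120, 118), (90, 150)]
counts = tally_zones(pairs)   # every result is tallied somewhere
```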

The CLSI EP21 story – The original CLSI total error standard used the Westgard model but had a requirement that outliers could not be discarded; thus, if outliers were present that exceeded limits, the assay would fail the total error requirement – 100% of the results had to meet goals. In the revision of EP21, the statements about outliers were dropped, and the standard simply became the Westgard model. The mountain plot, which was an alternative method in EP21, was dropped in the revision.

Moreover, I argued that user error had to be included in the experimental setup. This too was rejected, as was the proposed title change from total analytical error to total error.


  1. Mandel J. The statistical analysis of experimental data. Dover, New York, 1964, p 105.
  2. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem. 1974;20:825-833.
  3. Krouwer JS. The danger of using total error models to compare glucose meter performance. J Diabetes Sci Technol. 2014;8:419-421.
  4. Lawton WH, Sylvester EA, Young-Ferraro BJ. Statistical comparison of multiple analytic procedures: application to clinical chemistry. Technometrics. 1979;21:397-409.
  5. Krouwer JS. Setting performance goals and evaluating total analytical error for diagnostic assays. Clin Chem. 2002;48:919-927.
  6. Krouwer JS. A critique of the GUM method of estimating and reporting uncertainty in diagnostic assays. Clin Chem. 2003;49:1818-1821.
  7. Krouwer JS. The problem with total error models in establishing performance specifications and a simple remedy. Clin Chem Lab Med. 2016;54:1299-1301.
  8. Krouwer JS, Monti KL. A simple graphical method to evaluate laboratory assays. Eur J Clin Chem Clin Biochem. 1995;33:525-527.

Comment about the interferences AACC webinar

September 13, 2018

I listened to the AACC webinar on interferences presented by David Grenache. He did a great job. But one thing that was presented struck me – the CLSI EP07-A2 definition of interferences –“a cause of clinically significant bias in the measured analyte concentration due to the effect of another component or property of the sample.” (Note – I have corrected a typo in Grenache’s presentation – no biggie).

This definition is bogus and conflicts with the VIM (although the VIM definition is in tortured English; for example, the word interference doesn’t appear, and what’s defined is influence quantity).

Clearly, if a candidate interfering substance can be detected, meaning that its presence affects the result, then the substance interferes.

Whether the interfering substance causes a clinically significant bias is a different question and shouldn’t be used as the definition.

CLSI EP7 3rd Edition

May 24, 2018

I have critiqued how results are presented in the previous version of EP7, where an example is given that if an interference is found to be less than 10% (the same logic applies to whatever goal is chosen), the substance can be said not to interfere.

This is in Section 9 of the 2nd Edition. I am curious if this problem is in the 3rd edition but not curious enough to buy the standard.

Speaking of interferences …

May 13, 2018

I have discussed some shortcomings in how interferences are handled. This reminded me of something that my coworker and I published a number of years ago (1).

The origin of this publication came from Dr. Stan Bauer at Technicon Instruments. He was a pathologist with a passion for statistics. He had hired Cuthbert Daniel, a well-known consulting statistician, who developed a protocol for the SMA analyzer. This was a nine-sample run of three concentration levels that provided estimates of precision, proportional and constant bias, sample carryover, linear drift, and nonlinearity. The reason the protocol worked was the sample order chosen by Cuthbert Daniel.

In 1985, I chose to make a CLSI standard out of the protocol – EP10. It is now in version A3 AMD. (I have no idea what the AMD means).

The protocol could be extended to provide even more information by adding a candidate interfering substance to up to all three concentration levels. Since each level is repeated three times, the interfering substance is added to only one replicate. Using multiple regression, one can now estimate eight parameters: in addition to the original parameters, the bias (if any) due to the interfering substance at each of the three levels.
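A sketch of the regression idea, with a made-up design and noise-free data so the fitted coefficients recover the level means and interference biases exactly. For simplicity, only level means and interference terms are modeled here; the full EP10 model also includes drift, carryover, and nonlinearity terms:

```python
# Sketch of estimating per-level interference bias by multiple regression.
# Design and numbers are hypothetical: three replicates per level, with the
# first replicate at each level spiked with the candidate interferent.
# With noise-free data, least squares recovers the true values exactly.

import numpy as np

true_means = {1: 50.0, 2: 100.0, 3: 150.0}          # level means (mg/dL)
true_interference = {1: 4.0, 2: 6.0, 3: 8.0}        # bias when spiked

X, y = [], []
for level in (1, 2, 3):
    for rep in range(3):
        spiked = 1.0 if rep == 0 else 0.0           # first replicate spiked
        row = [0.0] * 6
        row[level - 1] = 1.0                        # level-mean indicator
        row[3 + level - 1] = spiked                 # interference indicator
        X.append(row)
        y.append(true_means[level] + spiked * true_interference[level])

coef, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
# coef[0:3] are the level means; coef[3:6] are the interference biases.
```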

One run by itself is virtually useless, but at Ciba Corning we ran these protocols repeatedly during the development of an assay, so that with multiple runs, if a substance interfered, it would be detected.


Krouwer JS, Monti KL. A Modification of EP10 to Include Interference Screening. Clin Chem. 1995;41:325-326.

A simple example of why the CLSI EP7 standard for interference testing is flawed

May 10, 2018

I have recently suggested that the CLSI EP7 standard causes problems (1). Basically, EP7 says that if an interfering substance results in an interference less than the goal (commonly set at 10%), then the substance can be reported not to interfere. Of course, this makes no sense. If a substance interferes at a level less than 10%, it still interferes!

Here’s a real example from the literature (2). Lorenz and coworkers say “substances frequently reported to interfere with enzymatic, electrochemical-based transcutaneous CGM systems, such as acetaminophen and ascorbic acid, did not affect Eversense readings.”

Yet in their table of interference results they show:

at 74 mg/dL of glucose, interference from 3 mg/dL of acetaminophen is -8.7 mg/dL

at 77 mg/dL of glucose, interference from 2 mg/dL of ascorbic acid is 7.7 mg/dL
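Expressed as a percent of the glucose concentration (my calculation, assuming the interference bias is taken relative to the glucose level), both table entries are at or above a 10% criterion:

```python
# Quick check of the two table entries above as percent interference,
# assuming the bias is expressed relative to the glucose concentration.

def percent_interference(bias_mg_dl, glucose_mg_dl):
    """Interference bias as a percent of the glucose concentration."""
    return 100.0 * bias_mg_dl / glucose_mg_dl

acetaminophen = percent_interference(-8.7, 74)   # about -11.8%
ascorbic_acid = percent_interference(7.7, 77)    # 10.0%
```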


  1. Krouwer, J.S. Accred Qual Assur (2018).
  2. Lorenz C, Sandoval W, Mortellaro M. Interference assessment of various endogenous and exogenous substances on the performance of the Eversense long-term implantable continuous glucose monitoring system. Diabetes Technol Ther. 2018;20(5). DOI: 10.1089/dia.2018.0028.

New publication about interferences

April 20, 2018

My article “Interferences, a neglected error source for clinical assays” has been published. This article may be viewed using the following link

Performance specifications, lawsuits, and irrelevant statistics

March 11, 2018

Readers of this blog know that I’m in favor of specifications that account for 100% of the results. The danger of specifications that are for 95% or 99% of the results is that errors can occur that cause serious patient harm for assays that meet specifications! Large and harmful errors are rare and certainly less than 1%. But hospitals might not want specifications that account for 100% of results (and remember that hospital clinical chemists populate standards committees). A potential reason is that if a large error occurs, the 95% or 99% specification can be an advantage for a hospital if there is a lawsuit.

I’m thinking of an example where I was an expert witness. Of course, I can’t go into the details, but this was a case where there was a large error, the patient was harmed, and the hospital lab was clearly at fault. (In this case it was a user error.) The hospital lab’s defense was that they followed all procedures and met all standards; i.e., sorry, but stuff happens.

As for irrelevant statistics, I’ve heard two well-known people in the area of diabetes (Dr. David B Sachs and Dr. Andreas Pfützner) say in public meetings that one should not specify glucose meter performance for 100% of the results because one can never prove that the number of large errors is zero.

That one can never prove that the number of large errors is zero is true but this does not mean one should abandon a specification for 100% of the results.

Here, I’m reminded of blood gas. For blood gas, obtaining a result is critical. Hospital labs realize that blood gas instruments can break down and fail to produce a result. Since this is unacceptable, one can calculate the failure rate and reduce the risk of no result with redundancy (meaning using multiple instruments). No matter how many instruments are used, the possibility that all instruments will fail at the same time is not zero!
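The redundancy arithmetic is simple: if instruments fail independently, the chance that all n are down at once is p raised to the n. The failure probability below is made up for illustration:

```python
# Sketch of the redundancy point above: if each blood gas instrument is
# down independently with probability p at a given moment, the chance that
# all n instruments are down at once is p**n. It shrinks fast with n but
# never reaches exactly zero. The value of p here is a made-up illustration.

def all_fail_probability(p_single_failure, n_instruments):
    """Probability that every one of n independent instruments is down."""
    return p_single_failure ** n_instruments

p = 0.01  # hypothetical chance one instrument is down at a given moment
probs = [all_fail_probability(p, n) for n in (1, 2, 3)]
# 1 instrument: 1 in 100; 2 instruments: 1 in 10,000; 3: 1 in 1,000,000.
```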

A final problem with not specifying 100% of the results is that it may cause labs to put less thought into procedures that minimize the risk of large errors.

And in industry (at least at Ciba-Corning) we always had specifications for 100% of the results, as did the original version of the CLSI total error document, EP21-A (this was dropped in the A2 version).