Performance specifications, lawsuits, and irrelevant statistics

March 11, 2018

Readers of this blog know that I’m in favor of specifications that account for 100% of the results. The danger of specifications that cover only 95% or 99% of the results is that an assay can meet the specification and still produce errors that cause serious patient harm! Large and harmful errors are rare, certainly occurring in less than 1% of results, which is exactly the portion such specifications leave unexamined. But hospitals might not want specifications that account for 100% of results (and remember that hospital clinical chemists populate standards committees). A potential reason: if a large error occurs, the 95% or 99% specification can be an advantage for a hospital if there is a lawsuit.

I’m thinking of an example where I served as an expert witness. Of course, I can’t go into the details, but this was a case where there was a large error, the patient was harmed, and the hospital lab was clearly at fault (in this case it was a user error). The hospital lab’s defense was that they had followed all procedures and met all standards; in effect: sorry, but stuff happens.

As for irrelevant statistics, I’ve heard two well-known people in the area of diabetes (Dr. David B. Sacks and Dr. Andreas Pfützner) say in public meetings that one should not specify glucose meter performance for 100% of the results because one can never prove that the number of large errors is zero.

It is true that one can never prove that the number of large errors is zero, but this does not mean one should abandon a specification for 100% of the results.

Here, I’m reminded of blood gas. For blood gas, obtaining a result is critical. Hospital labs realize that blood gas instruments can break down and fail to produce a result. Since this is unacceptable, one can estimate the failure rate and reduce the risk of having no result through redundancy (meaning using multiple instruments). Yet no matter how many instruments are used, the probability that all instruments fail at the same time is not zero!
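To make the redundancy arithmetic concrete, here is a minimal Python sketch. It assumes each instrument fails independently with the same probability, and the failure probability and instrument counts are made-up illustrative values, not data from any real analyzer.

```python
# Redundancy sketch: probability that every instrument is down at once,
# assuming independent failures with a hypothetical per-instrument
# probability p.

def prob_all_down(p: float, n: int) -> float:
    """Probability that all n instruments are down simultaneously."""
    return p ** n

p = 0.01  # hypothetical chance a single analyzer is down at any moment
for n in range(1, 5):
    print(f"{n} instrument(s): P(no result) = {prob_all_down(p, n):.0e}")
```

Each added instrument shrinks the risk geometrically (1e-02, 1e-04, 1e-06, 1e-08), but it never reaches zero, which is precisely the point: unable to prove the risk is zero, one still works to drive it down.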

A final problem with not specifying 100% of the results is that it may discourage labs from putting much thought into procedures that minimize the risk of large errors.

And in industry (at least at Ciba-Corning) we always had specifications for 100% of the results, as did the original version of the CLSI total error document, EP21-A (this was dropped in the A2 version).


Blog Review

May 26, 2017

I started this blog 13 years ago, in March 2004; the first two articles were about six sigma, here and here. The entry being posted now is my 344th.

Although the blog covers an eclectic range of topics, one unifying theme for many entries is specifications: how to set them and how to evaluate them.

A few years ago, I was working on a hematology analyzer, which has a multitude of reported parameters. The company was evaluating the parameters with the usual methods: precision studies and accuracy assessed by regression. I asked them:

  1. What are the limits such that, when differences from reference are contained within them, no wrong medical decision resulting in patient harm would be made based on the reported result?
  2. What are the (wider) limits such that, when differences from reference are contained within them, no wrong medical decision resulting in severe patient harm would be made based on the reported result?

This was a way of asking for an error grid for each parameter. I believed then, and still believe, that constructing an error grid is the best way to set specifications for any assay.
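To illustrate, here is a toy error-grid classifier for a single parameter in Python. The zone limits are hypothetical placeholders: real limits would come from the clinician questions above, and for many assays (glucose, for example) the limits vary with concentration rather than being a single percentage.

```python
# Toy error grid for one parameter. Zone A: no wrong medical decision;
# zone B: a wrong decision (patient harm) is possible; zone C: a wrong
# decision causing severe patient harm is possible. Limits are made up.

NO_HARM_LIMIT = 10.0      # hypothetical +/- % difference, zone A boundary
SEVERE_HARM_LIMIT = 20.0  # hypothetical +/- % difference, zone B boundary

def zone(result: float, reference: float) -> str:
    """Classify a result against its reference value into zones A/B/C."""
    pct_diff = 100.0 * (result - reference) / reference
    if abs(pct_diff) <= NO_HARM_LIMIT:
        return "A"
    if abs(pct_diff) <= SEVERE_HARM_LIMIT:
        return "B"
    return "C"

print(zone(105, 100))  # A: within the no-harm limits
print(zone(115, 100))  # B: wrong decision and harm possible
print(zone(130, 100))  # C: severe harm possible
```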

As an example of the importance of specifications, there was a case for which I was an expert witness in which the lab had produced an incorrect result that led to patient harm. The lab’s defense was that they had followed all procedures; thus, as long as they followed procedures, they were not to blame. But procedures, which contain specifications, are not always adequate. As an example, remember the CMS program “equivalent quality control”?

Biases in clinical trials performed for regulatory approval

May 31, 2015


An article with the same title as this post has been accepted for publication in the journal Accreditation and Quality Assurance. The article describes common biases and how they might be avoided.

Hemoglobin A1c quality targets

March 16, 2015


There is a new article in Clinical Chemistry about a complicated (to me) analysis of quality targets for A1c, when it would seem that a simple error grid, prepared by surveying clinicians, would fit the bill.

Thus, this paper has problems. They are:

  1. The total error model is limited to average bias and imprecision. Error from interferences, user error, or other sources is not included. It is unfortunate to call this “total” error, since there is nothing total about it (see the sketch after this list).
  2. A pass/fail system is mentioned, which is dichotomous, unlike an error grid, which allows for varying degrees of error according to the severity of harm to patients.
  3. A hierarchy of possible goals is mentioned. This comes from a 1999 conference. But there is really only one way to set patient-based goals (listed near the top of the 1999 hierarchy): namely, a survey of clinician opinions.
  4. The Clinical Chemistry paper discusses the use of goals based on biological variation for quality targets. Someone needs to explain to me how this could ever be useful.
  5. The analysis is based on proficiency survey materials, which, due to the absence of patient interferences (see #1), capture only a subset of total error.
  6. From what I could tell from their NICE reference (#11) in the paper, the authors have inferred that total allowable error should be 0.46%, but this did not come from surveying clinicians.
  7. I’m on board with six sigma in its original use at Motorola, but I don’t see its usefulness in laboratory medicine compared to an error grid.
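To make items 1 and 7 concrete, here is a small Python sketch of the usual “total” error and sigma-metric arithmetic. The bias, CV, and allowable-error numbers are hypothetical, not taken from the paper.

```python
# The common "total" error model and sigma metric, with hypothetical
# numbers. Neither quantity can see an interference that affects only
# some patient samples, which is why "total" is a misnomer.

Z = 1.96  # multiplier covering ~95% of a normal distribution

def simple_total_error(bias_pct: float, cv_pct: float) -> float:
    """'Total' error as commonly modeled: |bias| + 1.96 * CV."""
    return abs(bias_pct) + Z * cv_pct

def sigma_metric(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    """Six-sigma metric: (allowable total error - |bias|) / CV."""
    return (tea_pct - abs(bias_pct)) / cv_pct

bias, cv, tea = 1.0, 1.5, 6.0  # hypothetical A1c bias %, CV %, TEa %
print(f"Modeled total error: {simple_total_error(bias, cv):.2f}%")  # 3.94%
print(f"Sigma metric: {sigma_metric(tea, bias, cv):.1f}")           # 3.3

# An interference adding, say, 8% to a subset of patient samples is
# invisible to both numbers above, yet it exceeds the 6% allowable error.
```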

More glucose fiction

December 1, 2014


In the latest issue of Clinical Chemistry, there are two articles (1-2) about how much glucose meter error is ok and an editorial (3) which discusses these papers. Once again, my work on this topic has been ignored (4-12). Ok, to be fair, not all of my articles are directly relevant, but the gist of my articles, and particularly reference #10, is that if you use the wrong model, the outcome of a simulation is not relevant to the real world.

How are the authors’ models wrong?

In paper #1, the authors state: “The measurement error was assumed to be uncorrelated and normally distributed with zero mean…”

In paper #2, the authors state: “We ignored other analytical errors (such as nonlinear bias and drift) and user errors in this model.”

In both papers, the objective is to state a maximum glucose error that will be medically ok. But since the modeling omits errors that occur in the real world, the results and conclusions are unwarranted.

Ok, here’s a thought, people: instead of simulations based on the wrong model, why not construct simulations based on actual glucose evaluations? An example of such a study is: Brazg RL, Klaff LJ, Parkin CG. Performance variability of seven commonly used self-monitoring of blood glucose systems: clinical considerations for patients and providers. J Diabetes Sci Technol. 2013;7:144-152. Given sufficient method comparison data, one could construct an empirical distribution of differences and randomly sample from it.
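As a sketch of what that could look like in Python: the differences below stand in for meter-minus-reference values from a method comparison study and are made up; a real study would supply hundreds of them, and the differences might scale with concentration (percent rather than mg/dL) depending on the glucose range.

```python
# Simulate meter error by resampling observed differences instead of
# drawing from an assumed zero-mean normal distribution.

import random

# Stand-in for (meter - reference) differences, in mg/dL, from a real
# evaluation; note the occasional large error a normal model would miss.
observed_diffs = [-12.0, -3.0, -1.5, 0.0, 0.5, 2.0, 2.5, 4.0, 9.0, 25.0]

def simulated_reading(true_glucose: float) -> float:
    """Return a simulated meter reading: truth plus a resampled error."""
    return true_glucose + random.choice(observed_diffs)

random.seed(1)
print([simulated_reading(100.0) for _ in range(5)])
```

Resampling with replacement in this way (a bootstrap) keeps whatever skewness, heavy tails, and outliers the evaluation actually showed, which is exactly what the zero-mean normal assumption throws away.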

And finally, I’m sick of seeing the Box quote (reference 3): “Essentially, all models are wrong, but some are useful.” Give it a rest – it doesn’t apply here.


  1. Malgorzata E. Wilinska and Roman Hovorka. Glucose Control in the Intensive Care Unit by Use of Continuous Glucose Monitoring: What Level of Measurement Error Is Acceptable? Clinical Chemistry 2014;60:1500-1509.
  2. Tom Van Herpe, Bart De Moor, Greet Van den Berghe, and Dieter Mesotten. Modeling of Effect of Glucose Sensor Errors on Insulin Dosage and Glucose Bolus Computed by LOGIC-Insulin. Clinical Chemistry 2014;60:1510-1518.
  3. James C. Boyd and David E. Bruns. Performance Requirements for Glucose Assays in Intensive Care Units. Clinical Chemistry 2014;60:1463-1465.
  4. Jan S. Krouwer. Wrong thinking about glucose standards. Clinical Chemistry 2010;56:874-875.
  5. Jan S. Krouwer and George S. Cembrowski. A review of standards and statistics used to describe blood glucose monitor performance. Journal of Diabetes Science and Technology 2010;4:75-83.
  6. Jan S. Krouwer. Analysis of the Performance of the OneTouch SelectSimple Blood Glucose Monitoring System: Why Ease of Use Studies Need to Be Part of Accuracy Studies. Journal of Diabetes Science and Technology 2011;5:610-611.
  7. Jan S. Krouwer. Evaluation of the Analytical Performance of the Coulometry-Based Optium Omega Blood Glucose Meter: What Do Such Evaluations Show? Journal of Diabetes Science and Technology 2011;5:618-620.
  8. Jan S. Krouwer. Why specifications for allowable glucose meter errors should include 100% of the data. Clinical Chemistry and Laboratory Medicine 2013;51:1543-1544.
  9. Jan S. Krouwer. The new glucose standard POCT12-A3 misses the mark. Journal of Diabetes Science and Technology 2013;7:1400-1402.
  10. Jan S. Krouwer. The danger of using total error models to compare glucose meter performance. Journal of Diabetes Science and Technology 2014;8:419-421.
  11. Jan S. Krouwer and George S. Cembrowski. Acute Versus Chronic Injury in Error Grids. Journal of Diabetes Science and Technology 2014;8:1057.
  12. Jan S. Krouwer and George S. Cembrowski. The chronic injury glucose error grid: a tool to reduce diabetes complications. Journal of Diabetes Science and Technology, in press (available online).

QC (quality control) is not quality

May 14, 2013


Based on recent events, I’m restating that for a clinical assay, good quality control results do not imply good quality. Of course, good quality control results are a good thing and poor quality control results mean that there are problems, but here are some examples where good quality control results don’t mean good quality.

  1. QC samples do not inform about patient-sample interferences, which can cause large errors and result in patient harm. Such events could occur with perfect QC results.
  2. QC informs about biases that persist across time. For example, if QC is performed twice per day, a bad calibration (where a calibration lasts for a month) will likely be detected, but short-term biases will likely be missed (see the sketch after this list).
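Here is a toy Python illustration of item 2; the QC schedule and the three-hour bias window are hypothetical.

```python
# Twice-daily QC versus a short-lived bias. A transient error that
# starts and ends between QC events produces no abnormal QC result.

qc_times = list(range(0, 72, 12))    # QC every 12 hours over 3 days
shift_start, shift_end = 26.0, 29.0  # hypothetical 3-hour transient bias

def biased(t: float) -> bool:
    """True if the assay is biased at time t (hours)."""
    return shift_start <= t < shift_end

caught = any(biased(t) for t in qc_times)
print(f"QC times (h): {qc_times}")
print(f"Transient bias seen by any QC run? {caught}")  # False

# A miscalibration lasting a month, by contrast, overlaps every QC run
# in that month and is almost certain to be flagged.
```

Every patient result reported between hours 26 and 29 could carry the bias, yet the QC record for the period looks perfect.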

So if anyone claims that you can select your lab’s quality by running QC according to some scheme, it’s simply not true.

The basis of a spec

May 16, 2012

I proposed and was the chairholder of EP27, the CLSI standard about error grids. A while ago, during the document development committee discussions, I suggested that an error limit specification contain two items: the level of error (e.g., ±10%) and the percentage of results that must meet the limit (e.g., 95%). A committee member strongly objected and said no, the spec should be the level of error only. So I asked, would it be acceptable, for the 10% spec, if 20% of results met the spec? He said, of course not. So I asked, what about 60%? He again said no and commented, I see where you’re going; yes, the percentage of results is used to determine acceptability but is not part of the spec. So I said, a spec is a set of criteria and an evaluation is conducted to determine whether those criteria have been met, but this line of reasoning didn’t convince him. There might have been more to this story, but then I was unexpectedly and rather unceremoniously thrown off the document development committee.
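To show why I think the percentage belongs in the spec, here is a minimal Python sketch in which the spec is explicitly the pair (error limit, required percentage) and the evaluation simply checks whether the criteria are met; the difference data are made up.

```python
# A spec as two criteria: an error limit and the fraction of results
# required to fall within it. The evaluation tests both at once.

def meets_spec(pct_diffs, limit_pct=10.0, required_fraction=0.95):
    """True if at least required_fraction of the percent differences
    from reference fall within +/- limit_pct."""
    within = sum(1 for d in pct_diffs if abs(d) <= limit_pct)
    return within / len(pct_diffs) >= required_fraction

diffs = [1.2, -3.4, 8.9, -0.5, 11.0, 2.2, -9.9, 4.1, 0.0, -1.7]
print(meets_spec(diffs))                          # False: 90% < 95%
print(meets_spec(diffs, required_fraction=0.85))  # True:  90% >= 85%
```

With the error limit alone, there is no way to say whether 90% of results within ±10% passes or fails; the required percentage does half the work, which is why it belongs in the spec rather than only in the evaluation.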