Assay specifications are not based on clinical needs

June 18, 2010

Published specifications for assays are often said to be based on clinical requirements – an example is the glucose meter specification ISO 15197:

“The minimum acceptable accuracy criteria are based on the medical requirements for glucose monitoring.” 

This blog entry shows that rather than medical requirements, specifications are based on currently achievable performance.

Regulators have to make a binary decision: approve or reject an assay. The issue can be seen in the following table:

                Benefit                                        Risk
Approve assay   Information helps clinicians                   Assay errors cause wrong medical decisions
Reject assay    No wrong medical decisions from assay errors   Lack of information from assay causes harm

Consider glucose meters. Each year, a very small percentage of people are harmed by glucose meter errors. Suppose specifications were made so stringent that no existing glucose meter could meet them, so that all meters had to be taken off the market. The harm from losing the information the meters provide would be much greater than the harm caused by existing glucose meter errors. Even if the error rate of existing meters increased, the analysis would be the same: the harm from the lack of information would still be much greater than the harm from meter errors.
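The benefit-risk trade-off above can be made concrete with a toy expected-harm comparison. This is a minimal sketch, not a real risk model: every rate in it (the number of uses, the chance that a meter error harms a patient, the chance that having no meter harms a patient) is a hypothetical placeholder chosen only to show the structure of the argument, and the function names are illustrative.

```python
# Toy expected-harm comparison for the approve/reject decision.
# All numbers are HYPOTHETICAL placeholders, not clinical estimates.

def expected_harm(decision: str, n_uses: int,
                  p_harm_from_error: float,
                  p_harm_without_info: float) -> float:
    """Expected number of harmed patients for one regulatory decision."""
    if decision == "approve":
        # Harm comes from assay errors leading to wrong medical decisions.
        return n_uses * p_harm_from_error
    if decision == "reject":
        # Harm comes from clinicians lacking the assay's information.
        return n_uses * p_harm_without_info
    raise ValueError(f"unknown decision: {decision!r}")


def better_decision(n_uses, p_harm_from_error, p_harm_without_info):
    """Return the decision with the lower expected harm."""
    return min(("approve", "reject"),
               key=lambda d: expected_harm(d, n_uses, p_harm_from_error,
                                           p_harm_without_info))


N_USES = 1_000_000           # hypothetical number of meter uses
P_HARM_WITHOUT_INFO = 1e-3   # hypothetical harm rate with no meter at all

for p_err in (1e-5, 1e-4):   # a baseline error harm rate, then ten times worse
    print(p_err, better_decision(N_USES, p_err, P_HARM_WITHOUT_INFO))
```

Under these assumptions rejection only wins once the harm rate from meter errors exceeds the harm rate from having no information at all, which is why a tenfold worse error rate still leaves "approve" as the less harmful choice.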

Because current glucose meters have a given level of performance (bias, imprecision, ease of use, and so on), a new meter is expected to perform comparably to them, and specifications are written to reflect that current performance. This has nothing to do with the clinical need for glucose meter performance.

Simulations need to account for more than 95% of results

June 16, 2010

In 2001, Boyd and Bruns published a paper about the effects of glucose meter error on insulin dose errors (1). I commented that their simulation model was incomplete (2), and they agreed (3). Now one of the authors, together with others, has published a new paper (4) using a simulation similar to the original, so once again I have sent a Letter to the editor in response.

Since the first paper, a colleague and I published a review of glucose meter performance statistics (5), and I published a Letter (6) critiquing experts’ views (7-8) on glucose meter performance standards. References 5 and 6 point out problems beyond those I raised about reference 1.

The authors of reference 4 have chosen to ignore all of this previous work. Their simulation has many problems, but here is the biggest one: the authors propose constructing limits such that 95% of results fall within them. They then state which insulin dose errors will occur for various limits. But by their own definition, 5% of results fall beyond the limits, so to account for 100% of the population, one always has to add that 5% to the bad results. Thus, even if they found limits under which, according to them, no serious insulin dose errors were made, in reality 5% of results would still be serious dose errors.
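To see why the missing 5% matters, here is a minimal simulation sketch. It assumes, purely for illustration, that meter errors are Gaussian with a hypothetical SD of 5% and sets the limit at 1.96 SD so that about 95% of results fall inside it; real meter errors are unlikely to be this well behaved. The names and numbers are hypothetical.

```python
import random


def fraction_outside(errors, limit):
    """Share of results whose absolute error exceeds the limit."""
    return sum(abs(e) > limit for e in errors) / len(errors)


random.seed(1)                     # reproducible illustration
SD = 5.0                           # HYPOTHETICAL meter error SD, in %
errors = [random.gauss(0.0, SD) for _ in range(100_000)]

limit = 1.96 * SD                  # limit chosen so ~95% of results fall inside
outside = fraction_outside(errors, limit)
print(f"share of results beyond the limit: {outside:.3f}")
```

Whatever the simulation says about the results inside the limit, roughly one result in twenty sits outside it, and those are exactly the results a 95% specification leaves unexamined.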

This is equivalent to specifying that 95% of surgeries should be performed on the correct site. No one would accept a 5% wrong-site surgery rate.

With glucose meters, there will be insulin dose errors (including serious ones). There are 7.2 million diabetics in the US who inject insulin every day; even at one test per day, this gives 2.6 billion glucose meter assays per year, so one cannot expect every meter error to stay below a limit at which no serious insulin dose errors occur. But 5% of those results, which is 131 million serious insulin dose errors caused by glucose meter errors, is far too many for a glucose meter performance specification.
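The arithmetic in the paragraph above can be checked directly. The only inputs are the figures quoted there: 7.2 million US diabetics who inject insulin, one test per day, and 5% of results outside the limits.

```python
# Figures quoted in the text; one test per day is the conservative case.
DIABETICS_ON_INSULIN = 7_200_000
TESTS_PER_DAY = 1
DAYS_PER_YEAR = 365
PCT_OUTSIDE_LIMITS = 5             # results allowed outside a 95% limit

tests_per_year = DIABETICS_ON_INSULIN * TESTS_PER_DAY * DAYS_PER_YEAR
outside_per_year = tests_per_year * PCT_OUTSIDE_LIMITS // 100

print(f"{tests_per_year:,} tests per year")         # 2,628,000,000
print(f"{outside_per_year:,} outside the limits")   # 131,400,000
```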


  1. Boyd JC, Bruns DE. Quality specifications for glucose meters: assessment by simulation modeling of errors in insulin dose. Clin Chem 2001;47:209-214.
  2. Krouwer JS. How to improve total error modeling by accounting for error sources beyond imprecision and bias. Clin Chem 2001;47:1329-1330.
  3. Boyd JC, Bruns DE. Response to How to improve total error modeling by accounting for error sources beyond imprecision and bias. Clin Chem 2001;47:1330-1331.
  4. Karon BS, Boyd JC, Klee GG. Glucose meter performance criteria for tight glycemic control estimated by simulation modeling.
  5. Krouwer JS, Cembrowski GS. A review of standards and statistics used to describe blood glucose monitor performance. J Diabetes Sci Technol 2010;4:75-83.
  6. Krouwer JS. Wrong thinking about glucose standards. Clin Chem 2010;56:874-875.
  7. Sacks DB. Tight glucose control in critically ill patients: should glucose meters be used? Clin Chem 2009;55:1580-1583.
  8. AACC. Higher standards on the way for glucose meters? Clinical Laboratory News, September 2009 (accessed November 2009).

Bad R&D

June 8, 2010

The Health Care Renewal blog made me aware of a case of R&D fraud involving a diagnostic assay. The company is Sequenom and the test is an assay for Down’s syndrome. The story was covered by that blog, and there is also a Wiki article about it.

EP27 (Error Grid) Update

June 6, 2010

EP27 is a CLSI standard about constructing error grids for diagnostic assays and about evaluating data in error grids. It was first published as a P (proposed) standard about a year ago, and the A (approved) version, EP27A, has been stuck for a year!

So here is the objection to EP27A.

EP27 needs to define the various regions that form the error grid. The innermost region (zone A) is currently defined as the region which is thought to have a low potential for patient error and to include a high percentage of the data (often 95% or greater).

The commentator objects to including the percentage of data in the definition. The rationale is that the definition should be based on clinical grounds, not on assay performance (e.g., percentages).

This objection seems reasonable but here’s why I favor the current definition.

“Clinical grounds” are stated as if they were limits set in stone. They are usually obtained by asking clinicians, which is reasonable. As assay error increases, the probability of a clinician making an incorrect medical decision increases. Is there an amount of error that separates no patient harm from patient harm? I would submit that the answer is no, and that clinicians asked to provide this limit will give different answers, based on anecdotal data rather than on a clinical trial.

If a clinical trial could be performed, it would show the percentage of patients who are not harmed when the error just meets the limit. Say this is 90%. Another group of clinicians might propose a somewhat more stringent limit, which might yield 95% unharmed patients. In this hypothetical trial, the patients differ in clinical symptoms and histories and the clinicians differ too, but the errors are all the same and sit at the specified limit. The point is that whatever limit is chosen (here, the limit that leaves 95% of patients unharmed), it corresponds to a percentage of unharmed patients, so percentages are involved. Call this percentage 1.

So the “clinical limits” imply percentages before any data is collected. The standard talks about how much data is in zone A – call this percentage 2 – not the percentage of patients that are unharmed. These two percentages are different.
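Percentage 2 is simply the share of (reference, meter) pairs that land in zone A. The sketch below computes it for a hypothetical zone A that allows ±15 mg/dL below 75 mg/dL and ±20% at or above it; these limits, the data pairs, and the function names are illustrative assumptions, not values taken from EP27.

```python
def in_zone_a(reference, measured, abs_limit, rel_limit, cutoff):
    """True if a result falls in a HYPOTHETICAL zone A.

    Below `cutoff` the allowed error is +/- abs_limit (mg/dL);
    at or above it the allowed error is +/- rel_limit * reference.
    """
    allowed = abs_limit if reference < cutoff else rel_limit * reference
    return abs(measured - reference) <= allowed


def percentage_2(pairs, abs_limit, rel_limit, cutoff):
    """Percentage of (reference, measured) pairs that land in zone A."""
    hits = sum(in_zone_a(r, m, abs_limit, rel_limit, cutoff) for r, m in pairs)
    return 100.0 * hits / len(pairs)


# Illustrative (reference, meter) pairs in mg/dL; not real data.
pairs = [(60, 70), (60, 80), (100, 115), (100, 125),
         (200, 230), (200, 250), (150, 150), (80, 95)]

print(percentage_2(pairs, abs_limit=15, rel_limit=0.20, cutoff=75))  # 62.5
```

Nothing in this calculation says how many patients were harmed; that is percentage 1, which the data cannot show directly.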

Percentage 1 is not measurable, although the intent is for it to be high (most patients unharmed). But percentage 1 will remain high only if percentage 2 is high. In the hypothetical trial, where everything is held constant, 100% of the errors would by definition be just below the limit and 95% of the patients would be unharmed. Since 100% is not a practical number to specify, a high percentage 2 will lead to close to 95% of patients unharmed. With real errors, the number of patients with no harm is unknown, but as long as a high percentage of errors (95% or more) is within the limit, the percentage of unharmed patients will be close to 95%. Hence percentages should be part of the definition.
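The argument can be put as a simple bound. Assume, as the hypothetical trial does, that errors at the zone A limit leave 95% of patients unharmed, and assume in addition (an assumption made only for this sketch) that smaller errors never cause more harm than errors at the limit. If every result outside zone A is pessimistically counted as harmful, percentage 1 is at least percentage 2 times 95%. The function name is hypothetical.

```python
def unharmed_floor(pct2, unharmed_at_limit=0.95):
    """Lower bound on percentage 1 (share of unharmed patients).

    pct2: share of results inside zone A (percentage 2), as a fraction.
    unharmed_at_limit: share unharmed when every error sits at the limit.
    Assumes harm never decreases as error grows, and counts every result
    outside zone A as harmful (the pessimistic case).
    """
    return pct2 * unharmed_at_limit


for pct2 in (1.00, 0.99, 0.95, 0.80):
    print(f"percentage 2 = {pct2:.0%} -> percentage 1 >= {unharmed_floor(pct2):.1%}")
```

As percentage 2 approaches 100%, the floor approaches 95%; as percentage 2 drops, the guarantee weakens, which is the case for keeping a high percentage in the zone A definition.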

As a footnote, harm associated with results exceeding the zone A limits is considered minor injury, as opposed to results in zone C, the LER (Limits of Erroneous Results) zone, where harm is associated with major injury or death.

EP21 (Total Error) Update

June 6, 2010

EP21 is a CLSI standard about estimating total error for diagnostic assays. It was first published in 2002 and has been a core part of the FDA’s guidance for CLIA-waived assays. In 1992 I published a paper that, along with other papers, supports the standard (Krouwer JS. Estimating Total Analytical Error and Its Sources: Techniques to Improve Method Evaluation. Arch Pathol Lab Med 1992;116:726-731).

As required by CLSI procedure, I led a group that revised EP21A, and the revision was completed about a year ago. Although the changes were minor, EP21A2 has been stuck for a year!

So here are some of the objections to EP21A2.

One reject vote came with a lengthy harangue that, for simplicity, can be summarized as saying EP21A2 was too complicated. The comment included:

Finally, this document not only is a departure but an abandonment of the traditional formula for estimating total error as a quantity [TE = (z factor)(CV%) + Bias%] widely publicized by Westgard and others.

I pointed out that the author of this comment is listed as an author of the original EP21A standard and never complained that the original was too complicated, and that the revision to it is minor. But since the original standard was issued, the commentator’s company has entered into a financial arrangement with Westgard. I should add that I tried to point out this conflict of interest, but CLSI always deleted this part of my response to the comment.

Another comment objected to including pre- and post-analytical error in EP21A2. I have since modified the text of the document, but the basic idea remains: the protocol is intended to estimate all (i.e., total) error that would be expected to occur in routine use. For example, if finger-stick samples are used routinely, the protocol should be conducted with finger-stick rather than venous samples. Put another way, if venous blood were used, the potential error source from finger sticks would be excluded from the protocol, which biases the estimate toward understating the error seen in routine use.
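A small simulation shows how the choice of sample type changes the estimate. It is a sketch under assumed conditions: analytical error is Gaussian with a hypothetical bias of 1% and SD of 2%, finger-stick sampling adds an independent hypothetical SD of 3%, and total error is summarized as the central 95% of observed differences (2.5th to 97.5th percentiles). That summary is in the spirit of estimating total error directly from differences; it is not a reproduction of the EP21 protocol.

```python
import random
import statistics


def central_95_limits(diffs):
    """Nonparametric 2.5th and 97.5th percentiles of observed differences."""
    cuts = statistics.quantiles(diffs, n=40)  # 39 cut points, 2.5% apart
    return cuts[0], cuts[-1]


random.seed(7)                 # reproducible illustration
N = 5_000
BIAS = 1.0                     # HYPOTHETICAL analytical bias, %
ANALYTICAL_SD = 2.0            # HYPOTHETICAL analytical SD, %
FINGERSTICK_SD = 3.0           # HYPOTHETICAL extra SD from finger-stick sampling, %

# Protocol run on venous samples: only analytical error is present.
venous = [random.gauss(BIAS, ANALYTICAL_SD) for _ in range(N)]

# Protocol run on finger-stick samples: sampling error is added on top.
fingerstick = [random.gauss(BIAS, ANALYTICAL_SD) + random.gauss(0.0, FINGERSTICK_SD)
               for _ in range(N)]

print("venous protocol:       %.1f to %.1f %%" % central_95_limits(venous))
print("finger-stick protocol: %.1f to %.1f %%" % central_95_limits(fingerstick))
```

Under these assumptions the finger-stick protocol reports noticeably wider limits than the venous one, so a venous-only study would understate the error patients actually see.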

The objection to this was:

Pre-and post analytic error is not measurable in a reproducible manner.  It will vary by location (ED vs. clinic, ICU private office etc.), personnel obtaining samples, reporting results. … We in the lab and manufacturers have very little control over this.  We can however measure bias and imprecision quite well and then estimate total analytic error.

This is a misconception, and even if it were true, it wouldn’t matter. Whether one has control over an error is not the issue; the task is to measure the error that will be observed in routine use.

As an example, say nurses perform finger sticks for assays that are under the control of the laboratory. One could perform an experiment to measure the error from this source as an imprecision component, and it would be as reproducible as any other imprecision component (e.g., calibration error). As for control, reagent bias is an analytical error; why does the commentator think he has control over the amount of reagent bias in an assay?
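Estimating finger-stick error as an imprecision component can be sketched with a simple subtraction of variances. The assumptions: finger-stick and analytical errors are independent, so their variances add; one set of replicates comes from separate finger sticks on one patient and the other from repeat measurements of a single sample. The data and function name are hypothetical, and a real experiment would use a proper nested (ANOVA) design across many patients.

```python
import math
import statistics


def component_sd(total_replicates, analytical_replicates):
    """SD of an extra imprecision component, estimated by subtracting variances.

    total_replicates: results from separate finger sticks on one patient
        (finger-stick error plus analytical error).
    analytical_replicates: repeat measurements of one sample
        (analytical error only).
    Assumes the two error sources are independent, so their variances add.
    """
    extra_var = (statistics.variance(total_replicates)
                 - statistics.variance(analytical_replicates))
    return math.sqrt(max(0.0, extra_var))  # a negative estimate is truncated to zero


# HYPOTHETICAL glucose results, mg/dL.
finger_sticks = [100.0, 104.0, 97.0, 103.0, 96.0]
same_sample = [100.0, 101.0, 99.0, 100.0, 100.0]

print(f"finger-stick SD ~ {component_sd(finger_sticks, same_sample):.2f} mg/dL")
```

The finger-stick component comes out like any other imprecision estimate, a number with its own uncertainty that can be repeated and compared across sites.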

The problem is that laboratorians have been comfortable using the wrong model (e.g., Westgard’s) to estimate total error, and also excluding pre-analytical error from evaluation experiments even though it could easily be included (an omission that also occurs in some standards, such as the glucose meter standard ISO 15197).
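The difference between the two approaches shows up when a limit built from bias and imprecision alone is checked against the errors actually observed. The sketch computes the traditional formula quoted earlier, TE = |Bias%| + z·CV%, from hypothetical bias and CV values, then counts how many hypothetical routine-use differences fall beyond it. The two large differences in the list stand in for error sources (such as sampling or interferences) that the bias and CV experiments did not capture; all values and function names are illustrative.

```python
def westgard_total_error(bias_pct, cv_pct, z=1.96):
    """Traditional model: TE = |Bias%| + z * CV%."""
    return abs(bias_pct) + z * cv_pct


def fraction_beyond(diffs_pct, limit_pct):
    """Share of observed % differences whose magnitude exceeds the limit."""
    return sum(abs(d) > limit_pct for d in diffs_pct) / len(diffs_pct)


# HYPOTHETICAL inputs from separate bias and precision experiments.
te = westgard_total_error(bias_pct=1.0, cv_pct=2.0)

# HYPOTHETICAL % differences observed under routine-use conditions.
observed = [0.5, -1.2, 2.3, 1.1, -0.4, 3.0, 1.8, -2.1, 0.9, 2.6,
            -0.7, 1.4, 4.1, -1.9, 0.2, 1.6, 9.5, -3.3, 2.2, -8.0]

print(f"TE = {te:.2f}%, share beyond TE = {fraction_beyond(observed, te):.0%}")
```

A limit meant to cover about 95% of results covers only 90% of these; estimating total error directly from the observed differences would have included the large errors automatically.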

EP21A2 attempts to make things right and provide more faithful estimates of the error that will be routinely observed.