Just published

May 8, 2019

The article “Getting More Information From Glucose Meter Evaluations” has just been published in the Journal of Diabetes Science and Technology.

Our article makes several points. In the ISO 15197 glucose meter standard (2013 edition), one is supposed to prepare a table showing the percentage of system accuracy results within 5, 10, and 15 mg/dL. Our recommendation is to graph these results in a mountain plot; it is a perfect example of when a mountain plot should be used.
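For readers who have not built a mountain plot, here is a minimal Python sketch of the idea: sort the differences from reference, compute each one's percentile, and "fold" the percentiles above 50 so the curve peaks near the median. The function names, the rank/n percentile convention, and the example differences are my assumptions for illustration only; they are not taken from the article or the standard.

```python
def percent_within(differences, limit):
    """Percentage of differences whose absolute value is within the limit."""
    inside = sum(1 for d in differences if abs(d) <= limit)
    return 100.0 * inside / len(differences)


def mountain_plot_points(differences):
    """Return (difference, folded percentile) pairs for a mountain plot.

    Percentiles above 50 are folded to 100 - percentile, so the curve
    rises to a peak near the median difference and falls off in the tails.
    """
    ordered = sorted(differences)
    n = len(ordered)
    points = []
    for rank, diff in enumerate(ordered, start=1):
        percentile = 100.0 * rank / n  # assumed convention: rank / n
        folded = percentile if percentile <= 50.0 else 100.0 - percentile
        points.append((diff, folded))
    return points


# Hypothetical meter-minus-reference differences in mg/dL
example = [-12, -4, -1, 0, 2, 3, 7, 16]
table = {limit: percent_within(example, limit) for limit in (5, 10, 15)}
points = mountain_plot_points(example)
```

The table gives three numbers, while the folded curve shows the whole distribution, including how far the tails extend beyond 15 mg/dL.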

Now I must confess that until we prepared this paper, I had not read ISO 15197 (2013). But based on some reviewer comments, it was clear that I had to bite the bullet, send money to ISO and get the standard. Reading it was an eye opener. The accuracy requirement is:

95% within ± 15 mg/dL (< 100 mg/dL) and within ± 15% (> 100 mg/dL) and
99% within the A and B zones of an error grid

I knew this. But what I didn’t know until I read the standard is that user error from the intended population is excluded from this accuracy protocol. Moreover, even the healthcare professionals performing this study could exclude any result if they thought they made an error. I can imagine how this might work: That result can’t be right…

In any case, as previously mentioned in this blog, in the section where users are tested, the requirement for 99% of the results to be within the A and B zones of an error grid was dropped.

In the section where results may be excluded, failure to obtain a result is listed since if there’s no result, you can’t get a difference from reference. But there’s no requirement for the percentage of times a result can be obtained. This is ironic since section 5 is devoted to reliability. How can you have a section on reliability without a failure rate metric?

The value of error grids

March 29, 2019

My colleague and I sang the praises of error grids as a way to specify performance – for any assay. To recall, here are some of the benefits:

  1. Unlike most specifications, the limits can change with concentration
  2. Unlike most specifications, the limits need not be symmetrical
  3. Most specifications have one set of limits, implying that results within limits cause no harm and results outside of limits cause harm. Error grids have multiple sets of limits, called zones, whereby harm can be none, minor, or major.
  4. Error grid zones account for 100% of the results: they cover the XY space of candidate assay vs reference assay. Most specifications cover 95% or 99% of results, leaving the balance unspecified.

Krouwer JS, Cembrowski GS. Towards more complete specifications for acceptable analytical performance: a plea for error grid analysis. Clinical Chemistry and Laboratory Medicine. 2011;49:1127-1130.

Review of setting goals (to determine if the estimated total error is acceptable)

February 7, 2019

The last post described ways to estimate total error. But the reason total error is estimated is to determine if it meets goals. This post describes how to set goals.

Consider the following scenario. A clinician is deciding on a treatment for a patient. Among the criteria used to make that decision are the patient’s history, the physical exam, and one or more blood tests or images. Given the other criteria and a specific blood test with value A, the clinician will decide on a treatment (which may include no treatment). Now assume the blood test’s value keeps diverging from value A. At some point, call it value B, the clinician will make a different treatment decision. If the value B is an error, then it is reasonable to assume that the magnitude of error (B-A) is enough to cause the wrong medical decision by the clinician based on test error. Thus, just under the magnitude B-A is a reasonable error limit. There are a bunch of other assumptions…

  1. The clinician’s decision conforms to acceptable medical practice.
  2. A wrong decision usually causes harm to the patient.
  3. Larger errors may cause different decisions leading to greater harm to the patient.
  4. Although all patients are unique, one can describe a “typical” patient for a disease.
  5. Although all clinicians are unique, most clinicians will make the same decision within a narrow enough distribution of errors so that one can use the average error as the limit.
  6. Given the X-Y space for the range of the test, where X=truth and Y=the candidate medical test, the entire space can be designated with error limits.
  7. It is common (given #6) that there will be multiple limits with different levels of patient harm throughout the range of the medical test.

All of the above can be satisfied by an error grid such as the glucose meter error grid. The error grid should work for any assay.

Note that many conventional error limits are not as comprehensive because …

  1. They use one limit for the entire range of the assay
  2. They do not take into account greater harm for larger errors.
  3. They are not always based on patient results but on controls (e.g., CLIA limits).

Given the above discussion, setting limits using biological variability or state of the art is not relevant to answering the question of what magnitude of error will cause a clinician to make an incorrect medical decision. The only reasonable way to answer the question is to ask clinicians. An example of this was done for glucose meters (1).

A total error specification could easily be improved by adding to it:

  1. A limit for the average bias (2)
  2. A limit (greater than the total error limit) where there should be no observations, making the total error specification similar to an error grid.

Adding a limit for the average bias would also improve an error grid (3).


  1. Klonoff DC, Lias C, Vigersky R, et al. The surveillance error grid. J Diabetes Sci Technol. 2014;8:658-672.
  2. Klee GG, Schryver PG, Kisbeth RM. Analytic bias specifications based on the analysis of effects on performance of medical guidelines. Scand J Clin Lab Invest. 1999;59:509-512.
  3. Krouwer JS, Cembrowski GS. The chronic injury glucose error grid: a tool to reduce diabetes complications. J Diabetes Sci Technol. 2015;9:149-152.

New FDA Glucose meter draft guidelines (November 2018)

January 31, 2019

The FDA continues to dis the ISO 15197 standard in both their POC and lay user (over the counter) proposed guidelines:

POC: “Although many manufacturers design their BGMS validation studies based on the International Standards Organizations document 15197: In vitro diagnostic test systems—Requirements for blood glucose monitoring systems for self-testing in managing diabetes mellitus, FDA believes that the criteria set forth in the ISO 15197 standard do not adequately protect patients using BGMSs in professional settings, and does not recommend using the criteria in ISO 15197 for BGMSs.”

The POC accuracy criteria are:

95% within +/- 12 mg/dL (<75 mg/dL) and within +/- 12% (>75 mg/dL)
98% within +/- 15 mg/dL (<75 mg/dL) and within +/- 15% (>75 mg/dL)

Over the counter: “FDA believes that the criteria set forth in the ISO 15197 standard are not sufficient to adequately protect lay-users using SMBGs; therefore, FDA recommends performing studies to support 510(k) clearance of a SMBG according to the recommendations below.”

The over the counter accuracy criteria are:

95% within +/- 15% over the entire claimed range
99% within +/- 20% over the entire claimed range

To recall, ISO 15197 2013 accuracy criteria are:

95% within ± 15 mg/dL (<100 mg/dL) and within ± 15% (>100 mg/dL)
99% within A and B zones of a glucose meter error grid
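To get a feel for how these criteria differ, here is a rough Python sketch that scores one set of paired results against each of the numeric limits listed above. The error grid part of ISO 15197 is not covered. The function names, the tuple layout, the handling of a reference value that falls exactly on a cutoff (treated as the percentage case), and the sample pairs are all my own assumptions for illustration.

```python
def within_limit(reference, meter, cutoff, abs_limit, pct_limit):
    """True if the meter result meets the limit that applies at this reference value."""
    error = abs(meter - reference)
    if reference < cutoff:
        return error <= abs_limit                 # absolute limit in mg/dL
    return error <= pct_limit * reference / 100.0  # percentage limit


def percent_passing(pairs, cutoff, abs_limit, pct_limit):
    """Percentage of (reference, meter) pairs that fall within the limits."""
    ok = sum(1 for ref, met in pairs
             if within_limit(ref, met, cutoff, abs_limit, pct_limit))
    return 100.0 * ok / len(pairs)


# (name, cutoff mg/dL, absolute limit mg/dL, percent limit, required % of results)
# A cutoff of 0 means the percent limit applies over the entire range.
CRITERIA = [
    ("ISO 15197:2013", 100.0, 15.0, 15.0, 95.0),
    ("FDA POC, 95% tier", 75.0, 12.0, 12.0, 95.0),
    ("FDA POC, 98% tier", 75.0, 15.0, 15.0, 98.0),
    ("FDA OTC, 95% tier", 0.0, 0.0, 15.0, 95.0),
    ("FDA OTC, 99% tier", 0.0, 0.0, 20.0, 99.0),
]


def evaluate(pairs):
    """Return {criterion name: (observed percent, meets required percent)}."""
    results = {}
    for name, cutoff, abs_limit, pct_limit, required in CRITERIA:
        observed = percent_passing(pairs, cutoff, abs_limit, pct_limit)
        results[name] = (observed, observed >= required)
    return results


# Hypothetical (reference, meter) pairs in mg/dL, far too few for a real study
sample = [(50, 59), (50, 64), (80, 90), (200, 228), (200, 232)]
summary = evaluate(sample)
```

With real data one would of course use the sample sizes the protocols call for; the point here is only that the same results can score very differently depending on which set of limits is applied.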

New publication about interferences

April 20, 2018

My article “Interferences, a neglected error source for clinical assays” has been published. This article may be viewed using the following link: https://rdcu.be/L6O2

Comments about clinical chemistry goals based on biological variation – Revised Feb. 7, 2018

February 5, 2018

There is a recent article which says that measurement uncertainty should contain a term for biological variation. The rationale is that diagnostic uncertainty is caused in part by biological variation. My concerns are with how biological variation is turned into goals.

On the Westgard web site, there are some formulas on how to convert biological variation into goals and on another page, there is a list of analytes with biological variation entries and total error goals.

Here are my concerns:

  1. There are three basic uses of diagnostic tests: screening, diagnosis, and monitoring. It is not clear to me what the goals refer to.
  2. Monitoring is an important use of diagnostic tests. It makes no sense to construct a total error goal for monitoring that takes between-patient biological variation into account. The PSA total error goal is listed at 33.7%. Example: For a patient tested every 3 months after undergoing radiation therapy, a total error goal of 33.7% is too big. Thus, for values of 1.03, 0.94, 1.02, and 1.33, the last value is within goals but in reality would be cause for alarm.
  3. The web site listing goals has only one goal per assay. Yet, goals often depend on the analyte value, especially for monitoring. For example the glucose goal is listed at 6.96%. But if one examines a Parkes glucose meter error grid, at 200 mg/dL, the error goal to separate harm from no harm is 25%. Hence, the biological goal is too small.
  4. The formulas on the web site are hard to believe. For example, I < 0.5 * within person biological variation. Why 0.5, and why is it the same for all analytes?
  5. Biological variation can be thought to have two sources of variation, explained and unexplained, much like in a previous entry where the measured imprecision could be not just random error, but inflated with biases. Thus, PSA could rise due to asymptomatic prostatitis (a condition that by definition has no symptoms and could be part of a “healthy” cohort). Have explained sources of variation been excluded from the databases? And there can be causes of explained variation other than diseases. For example, exercise can cause PSA to rise in an otherwise healthy person.
  6. Biological variation makes no sense for a bunch of analytes. For example, blood lead measures exposure to lead. Without lead in the environment, the blood lead would be zero. Similar arguments apply to drugs of abuse and infectious diseases.
  7. The goals are based on 95% limits from a normal distribution. This leaves up to 5% of results as unspecified. Putting things another way, up to 5% of results could cause serious problems for an assay that meets goals.
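To make concerns 2 and 4 concrete, here is a small Python sketch. The first function uses the "desirable" formulas as I understand them to be commonly published (imprecision < 0.5 × within-person CV, bias < 0.25 × the root sum of squares of the within- and between-person CVs, total error = 1.65 × imprecision + bias); treat that as my assumption and check the web site for the exact form. The second function compares the last PSA value from the example above with the mean of the earlier values, which is my own choice of baseline. The CV inputs are made up.

```python
import math


def desirable_goals(cv_within, cv_between):
    """Goals (in %) from within- and between-person biological variation (in %).

    Assumed form of the commonly published 'desirable' specifications.
    """
    imprecision = 0.5 * cv_within
    bias = 0.25 * math.sqrt(cv_within ** 2 + cv_between ** 2)
    total_error = 1.65 * imprecision + bias
    return imprecision, bias, total_error


def percent_change_from_baseline(values):
    """Percent change of the last value from the mean of the earlier values."""
    baseline = sum(values[:-1]) / (len(values) - 1)
    return 100.0 * (values[-1] - baseline) / baseline


# Hypothetical CVs, not taken from any database
imprecision_goal, bias_goal, te_goal = desirable_goals(10.0, 20.0)

# Serial PSA values from the monitoring example in concern 2
psa_change = percent_change_from_baseline([1.03, 0.94, 1.02, 1.33])
within_listed_goal = psa_change < 33.7  # listed PSA total error goal
```

The last PSA value is roughly a third higher than the earlier ones, yet it still falls inside the 33.7% goal, which is the problem with using a goal built from between-patient variation to monitor a single patient.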

A simple improvement to total error and measurement uncertainty

January 15, 2018

There has been some recent discussion about the differences between total error and measurement uncertainty, regarding which is better and which should be used. Rather than rehash the differences, let’s examine some similarities:

  1. Both specifications are probability based.
  2. Both are models.

Being probability based is the bigger problem. If you specify limits for a high percentage of results (say 95% or 99%), then either 5% or 1% of results are unspecified. If all of the unspecified results caused problems this would be a disaster, when one considers how many tests are performed in a lab. There are instances of medical errors due to lab test error but these are (probably?) rare (meaning much less than 5% or 1%). But the point is probability based specifications cannot account for 100% of the results because the limits would include minus infinity to plus infinity.

The fact that both total error and measurement uncertainty are models is only a problem because the models are incorrect. Rather than rehash why, here’s a simple solution to both problems.

Add to the specification (either total error or measurement uncertainty) the requirement that zero results are allowed beyond a set of limits. To clarify, there are two sets of limits: an inner set to contain 95% or 99% of results and an outer set that no results should exceed.

Without this addition, one cannot claim that meeting either a total error or measurement uncertainty specification will guarantee quality of results, where quality means that the lab result will not lead to a medical error.
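Here is a minimal Python sketch of how such a two-tier specification could be checked. The function name, the symmetric limits, and the example errors are my assumptions for illustration; real limits could depend on concentration and need not be symmetrical.

```python
def check_two_tier_spec(errors, inner_limit, outer_limit, required_fraction=0.95):
    """Check errors against an inner (probability based) and an outer (zero results) limit."""
    n = len(errors)
    within_inner = sum(1 for e in errors if abs(e) <= inner_limit)
    beyond_outer = sum(1 for e in errors if abs(e) > outer_limit)
    fraction = within_inner / n
    return {
        "fraction_within_inner": fraction,
        "count_beyond_outer": beyond_outer,
        # Passing requires both conditions, not just the percentage
        "passes": fraction >= required_fraction and beyond_outer == 0,
    }


# Hypothetical errors: 98 small, one between the limits, one far outside
errors = [1.0] * 98 + [20.0, 50.0]
with_outlier = check_two_tier_spec(errors, inner_limit=10.0, outer_limit=30.0)
without_outlier = check_two_tier_spec(errors[:-1], inner_limit=10.0, outer_limit=30.0)
```

In this made-up data set, 98% of the results are within the inner limits, so a conventional 95% specification is met. The single result beyond the outer limit is what causes the two-tier check to fail.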