A little more on the glucose standard ISO 15197 and total error

April 24, 2010

EP21 is a CLSI standard on total error, which predated the ISO glucose standard. Although I advocated the use of total error 10 years earlier, EP21 was the first standard. As the ISO glucose standard was being prepared, the authors could choose traditionally based limits or total error.

Traditionally based limits specify average bias and imprecision, and define total error as the average bias ± twice the imprecision. This underestimates total error because it neglects random patient interferences.

EP21 simply looks at the differences between the candidate and comparative method. It will slightly overestimate total error since the imprecision of the comparative method will be included in the total error estimate. The overestimate can be minimized by using the average of replicates of the comparative method.
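The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and taking the 2.5th and 97.5th percentiles of the differences as the central 95% region is a simplification for this sketch, not text from EP21.

```python
import statistics


def paired_differences(candidate, comparative_replicates):
    """Candidate result minus the mean of the comparative replicates, per sample.

    Averaging the replicates reduces the comparative method's imprecision
    that would otherwise inflate the total error estimate.
    """
    return [c - statistics.mean(reps)
            for c, reps in zip(candidate, comparative_replicates)]


def traditional_limits(average_bias, imprecision_sd):
    """Traditional model: average bias +/- twice the imprecision."""
    return (average_bias - 2 * imprecision_sd,
            average_bias + 2 * imprecision_sd)


def empirical_limits(differences):
    """Central 95% of the observed differences (assumed percentile approach).

    statistics.quantiles with n=40 returns cut points every 2.5%,
    so the first and last cut points are the 2.5th and 97.5th percentiles.
    """
    cuts = statistics.quantiles(differences, n=40, method="inclusive")
    return (cuts[0], cuts[-1])
```

Because the empirical limits come from the differences themselves, anything that affects individual patient samples (such as an interference) widens them, while the traditional limits only reflect the bias and imprecision that were fed in.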

I was asked to attend a meeting of the ISO 15197 group to answer questions about total error as estimated by EP21, which is why I know a bit about the ISO glucose standard. Although I can’t say why the EP21 method of estimating total error was chosen, EP21 will provide wider limits than traditionally based limits. If you are a regulatory affairs person within industry, this is good.

One thing that EP21 was clear about is reproduced here.

An additional consideration is the relationship between outliers and total analytical error (measurement error). In this document, outliers are considered values beyond predefined limits. Typically, these limits will be wider than total analytical error limits. Note that this definition means that results that are not detached from a distribution might still be considered outliers. To evaluate outliers, one estimates the rate at which they occur. To pass an outlier goal, the observed rate must be below the specified rate. Table 1 shows the possible outcomes of a study to evaluate total analytical error and outliers.

Table 1. Evaluation Results for Total Analytical Error and Outliers

Case   Evaluation of Total Analytical Error   Evaluation of Outliers
1      Pass                                   Pass
2      Pass                                   Fail
3      Fail                                   Pass
4      Fail                                   Fail
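As a rough sketch of how the two evaluations in Table 1 could be run side by side, the Python below classifies a set of differences into one of the four cases. The function name, the 95% coverage requirement, and the example outlier rate of 0.1% are assumptions for illustration; EP21 leaves the actual limits and rates to be specified by the user.

```python
def evaluate_case(differences, te_limit, outlier_limit,
                  te_coverage=0.95, max_outlier_rate=0.001):
    """Return the Table 1 case number (1 to 4) for a set of differences.

    te_limit      -- total analytical error limit (absolute difference)
    outlier_limit -- wider limit beyond which a result counts as an outlier
    """
    n = len(differences)
    fraction_within = sum(abs(d) <= te_limit for d in differences) / n
    outlier_rate = sum(abs(d) > outlier_limit for d in differences) / n

    te_pass = fraction_within >= te_coverage
    # The observed outlier rate must be below the specified rate.
    outlier_pass = outlier_rate < max_outlier_rate

    if te_pass and outlier_pass:
        return 1
    if te_pass:
        return 2
    if outlier_pass:
        return 3
    return 4
```

Case 2 is the one that matters here: a data set can meet a 95% total error goal and still contain an unacceptable rate of outliers.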

So this is much like an error grid. The problem with ISO 15197 is that it uses total error from EP21 without mentioning the problem of outliers.

Reducing glucose meter error rates – comments about the day 1 FDA glucose meeting

April 20, 2010

Let’s view the glucose meter situation as a FRACAS (Failure Reporting And Corrective Action System). A FRACAS is a process to reduce an unacceptable error rate. As an example of a successful FRACAS, Dr. Peter Pronovost reduced the infection rate for placing central lines (described here). The steps he used are:

  1. Measure the error rate
  2. Observe events (e.g. errors)
  3. Classify events
  4. Propose and initiate corrective actions
  5. Re-measure the error rate

In this case, the initial error rate was 11% and the re-measured rate was 0.

So how does this relate to glucose meters?

Step 1 – FDA has an estimate of an error rate of about 0.1% for the 7.2 million insulin-using diabetics.

Step 2 – For the FDA, this is the same as step 1: the MAUDE adverse event database. However, if one looks at a MAUDE adverse event report, one sees problems. The advice on how to report an event suggests that a user should email the problem with as much information as possible, as listed on this web page. How many users actually find this web page? What should be used is a web form with dropdown lists where appropriate, which guides the user in filling in information. This would reduce misspellings and duplicate categories. The web form address should be publicized (listed on every glucose meter?). The currently suggested items to fill in are meter-centric; they do not follow the process of obtaining a glucose result. In the central line FRACAS, each process step was observed and it was determined whether it was followed or not. In glucose testing, for example, it is important to wash and dry one’s hands. This should be listed on the web form:

Wash hands? Y or N

Dry hands? Y or N
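To illustrate what a process-centric, structured report might look like, here is a minimal Python sketch. Everything in it is hypothetical: the field names, the placeholder dropdown choices, and the helper function are assumptions for illustration and do not describe MAUDE or any existing FDA form.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical dropdown choices; a real form would list actual meter models.
METER_MODELS = ("Meter A", "Meter B", "Other")


@dataclass
class EventReport:
    meter_model: str               # chosen from a dropdown, not free text
    washed_hands: Optional[bool]   # None means the question was not answered
    dried_hands: Optional[bool]
    description: str = ""          # free text for anything else

    def __post_init__(self):
        # A fixed list prevents misspellings and duplicate categories.
        if self.meter_model not in METER_MODELS:
            raise ValueError(f"unknown meter model: {self.meter_model!r}")


def step_compliance(reports, step):
    """Fraction of answered reports in which a process step was followed."""
    answers = [getattr(r, step) for r in reports
               if getattr(r, step) is not None]
    if not answers:
        return None
    return sum(answers) / len(answers)
```

With reports captured this way, the frequency of each process step (for example `step_compliance(reports, "washed_hands")`) can be tallied directly instead of being extracted from free-text emails.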

Steps 3-5 – As one collects statistics about the frequency of hand washing, one cannot say that lack of hand washing causes adverse events. However, if it is important to wash hands, then the procedure should be followed, and one could postulate that a corrective action (increasing the frequency of hand washing) might lower the adverse event rate. So a program would need to be put in place to increase the frequency of following the process and then to see what effect the action has on the adverse event rate. In the central line infection rate case, it was observed that doctors omitted one of the five steps in placing a central line 30% of the time. When the compliance rate was raised to 100%, the infection rate dropped from 11% to 0%.

On another note, a lot of effort is spent on evaluating glucose meters to qualify them before they are sold. To evaluate glucose meters, a method comparison is performed. There are various ways to analyze the data from this experiment but even if an error grid is used, the results are most likely not relevant to the error rate.

This is because, as specified in ISO 15197 and as supported by FDA comments, the method comparison study excludes the possibility of many use errors. So the analytical and usability performance of the meter are assessed separately, without an attempt to combine them to arrive at the only meaningful performance for clinicians and patients: how the meter performs in routine use. This needs to be changed such that the potential for use error is included in glucose evaluations. Even with the inclusion of use error, large errors will probably not show up due to the small sample size (e.g., you need 10 evaluations of a sample size of 100 to expect one large error, assuming that an adverse event rate of 0.1% is triggered by a large error). But including use error will give a much better estimate of the width of data around the “A” zone in an error grid, which is the subject of much interest.
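The sample size arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the calculation assumes that large errors occur independently at a constant rate of 0.1% per result.

```python
def expected_large_errors(rate, sample_size, evaluations=1):
    """Expected number of large errors across all evaluations."""
    return rate * sample_size * evaluations


def prob_at_least_one(rate, sample_size):
    """Chance that a single evaluation contains at least one large error."""
    return 1 - (1 - rate) ** sample_size


# Ten evaluations of 100 samples each, with a 0.1% large error rate.
print(expected_large_errors(0.001, 100, evaluations=10))
# Chance that one evaluation of 100 samples shows any large error at all.
print(prob_at_least_one(0.001, 100))
```

Under these assumptions a single evaluation of 100 samples has a bit less than a 10% chance of containing even one large error, which is why such studies say little about the rate of large errors.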

Comments on the FDA Glucose Meeting

April 19, 2010

On March 16-17, there was a public FDA/CDRH meeting on glucose meters. Here are some comments based on the transcripts of the March 16th meeting.

Jeffrey E. Shuren, M.D., J.D., Director, Center for Devices and Radiological Health – In the US, there are 24 million diabetics of which 30% take insulin and use glucose meters. FDA receives notice of about 12,000 adverse events about glucose meters each year.

Comment – This equals 7.2 million glucose meter users. If one tests once per day, the adverse event rate is 0.17%. If one tests four times per day, the adverse event rate is 0.04%. In either case, there will be an adverse event about once every two years.

Patricia Bernhardt, M.T.(ASCP), Office of In Vitro Diagnostic Device Evaluation and Safety – There is an evaluation of user performance required by the FDA that is distinct from the evaluation of accuracy. The comparative method for accuracy is typically the YSI meter, whereas the comparative method for user performance is the candidate meter itself, with the comparison being between different users (health care professionals vs. lay users).

Comment – It makes no sense to separate these two studies, or putting things another way, there is no attempt to combine the information from the two studies to arrive at a total error estimate.

Mitchell Scott, Ph.D., Washington University School of Medicine, St. Louis, Missouri – They do 600,000 glucose results a year. If 95% are within limits that means 30,000 results can be anywhere. “So I think we really need to think about this 5 percent of the values just sort of being unclassified.”

Comment – Finally, we’re starting to see others point out the problem with only specifying 95% of the results. George Cembrowski and I mention this (1) and I mention this also in my Letter (2).

From August of 2002 in an email I wrote to the leader of the group working on ISO 15197:

“Along these lines, the section about total error is excellent. Yet, I have a concern. In EP21, there is mention that total error by itself is inadequate. One must have information about outliers. This seems to be missing in my (quick) reading of the ISO document – maybe I missed it. I mentioned this problem formally in a recent Clinical Chemistry paper (this year 919-927). Thus for glucose meters, if 4.9% of the values are worse then goals, the meter is ok (ISO) but 4.9% outliers is a huge problem. Of course, one has the Lifescan problem for a real example – you probably have a lot of information on this. I suspect their outlier rate was less than 1% but this was still a huge problem.”

I got a thank you for my email, but no action about the point I made. The reference above is (3).

Mitchell Scott, Ph.D., Washington University School of Medicine, St. Louis, Missouri – Dr. Scott has data in his hospital that show that 0.8% to 1.2% of glucose results are repeated within 15 minutes, indicating that the operator did not believe the results. The mean difference between the two results is 84 mg/dL.

Barry Ginsberg, M.D., Ph.D., Diabetes Consultants, Wyckoff, New Jersey – Virtually every small study shows great data (no values in zones C or higher in an error grid) but when you look at large studies from manufacturers, there is a 0.03 to 0.3% outlier rate with values 100 to 200 mg/dL different.

Comment – These outlier rates are close to the adverse event rate reported by the FDA.

David B. Sacks, M.D., M.B., Ch.B., Harvard Medical School and Brigham and Women’s Hospital, Boston, Massachusetts – “And as has been mentioned, the other recommendations, CLSI or ISO, 95 percent of results, and this has been discussed in some detail, but I’m going to discuss it again. So, as was suggested by Dr. Scott, I think we need an addendum to meet a performance criteria, because 5 percent are excluded from accuracy criteria. And these values can be essentially anything. So, if you do the calculation, if a patient does self-monitoring of blood glucose four times a day, you’d expect one result to be outside the analytical limit every five days. The problem is the patient won’t know which this one result is, which is outside. So that’s very, very frequent. So I think we need to define criteria that include these 5 percent of values.”

Comment – This sounds familiar, lol – from George’s and my reference 1 – “The problem with this [ISO] standard is simple: up to 5% of the results can be medically unacceptable. Consider what this means for SMBG. If a subject tests his (her) blood glucose four times daily, then on average there could be a medically unacceptable result every 5 days (once per 20 measurements).”
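Both the “every 5 days” figure and Dr. Scott’s 30,000 results follow from the same arithmetic, sketched below in Python. The function names are hypothetical; the only inputs are the numbers quoted above.

```python
def results_outside_limits(total_results, fraction_within=0.95):
    """Number of results allowed to fall outside the limits."""
    return total_results * (1 - fraction_within)


def days_between_outside_results(tests_per_day, fraction_within=0.95):
    """Average number of days between results outside the limits."""
    return 1 / ((1 - fraction_within) * tests_per_day)


# A hospital producing 600,000 glucose results a year.
print(round(results_outside_limits(600_000)))
# A patient testing four times a day.
print(round(days_between_outside_results(4), 1))
```

Nothing in a 95% specification constrains how far outside the limits those remaining results may fall.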

Steve Brotman, M.D., J.D., Advanced Medical Technology Association (AdvaMed) – “The standard currently governing blood glucose meters for self-testing, ISO 15197, itself recognizes the importance of usability improvement. Specifically, it notes that the goals for performance criteria should be weighed against the capabilities of current self- monitoring devices. Furthermore, the standard notes that care should be taken implementing performance requirements that cause manufacturers to focus design improvements on analytical performance at the expense of other important attributes, such as greater convenience and greater compliance. Thus, the standard acknowledges the careful balance of these factors and the minimum acceptable device performance for glucose meters for self-testing. The standard supports performance improvements beyond analytical performance, such as advances that reduce dependence on user technology, otherwise referred to as patient usability.”

Comment – Sounds like double talk. The ISO standard does not measure total error where error equals accuracy + user error.

Barry Ginsberg, M.D., Ph.D., Diabetes Consultants, Wyckoff, New Jersey – “But when you look at the ISO standard and ask, So if the actual blood glucose is 70, at the various ISO standards, what is my 95 percent confidence limits on what I’m going to see? And so at the ISO standard of 20 percent, that 70 is somewhere between 55 and 85.”

Comment – Dr. Ginsberg does not understand the standard. The 95 percent is not a confidence interval; it states what percentage of the population of results should be contained within the stated limits.
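The distinction can be illustrated with a Python sketch that treats the 95% as a proportion of results to be counted, not as an interval around a single result. The limits coded here (±15 mg/dL below 75 mg/dL, ±20% at or above 75 mg/dL) are my understanding of the 2003 edition of ISO 15197 and should be checked against the standard itself; the function names are hypothetical.

```python
def iso_15197_2003_limits(reference_mg_dl):
    """Acceptance limits around a reference glucose value (assumed 2003 criteria)."""
    if reference_mg_dl < 75:
        return (reference_mg_dl - 15, reference_mg_dl + 15)
    return (reference_mg_dl * 0.8, reference_mg_dl * 1.2)


def fraction_within_limits(pairs):
    """Fraction of (reference, meter) result pairs that fall inside the limits."""
    inside = 0
    for reference, meter in pairs:
        low, high = iso_15197_2003_limits(reference)
        if low <= meter <= high:
            inside += 1
    return inside / len(pairs)


def meets_criterion(pairs, required_fraction=0.95):
    """True if at least the required fraction of results is within limits."""
    return fraction_within_limits(pairs) >= required_fraction
```

For a reference value of 70 mg/dL the limits are 55 to 85 mg/dL, matching the numbers in the quote, but the criterion only requires that 95% of results fall inside them; it does not describe the other 5%.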

Barry Ginsberg, M.D., Ph.D., Diabetes Consultants, Wyckoff, New Jersey – His experience is that four out of five do not wash their hands before testing and this can drastically affect results, especially considering the reduced sample volume in today’s meters.

Comment – Interesting point.

David C. Klonoff, M.D., F.A.C.P., Mills-Peninsula Health Services, San Mateo, California – A regulatory standard means this is something that’s achievable. When doctors make requests or standards, we tend to talk about what’s needed. Those are clinical standards.

Comment – Also in reference 1.

Diane Rutherford, Ken Block Consulting, Dallas, Texas – She makes the point that high values are repeated but values we like may not be repeated, but also may not be accurate.

Comment – Interesting point. This is a type of selection bias and the overall effect is to underestimate the rate of outliers.

Richard Melker, University of Florida College of Medicine – “The last thing I want to say is what you said about hand washing, which I think is really interesting. Because if you wash your hands and you don’t dry them really well, you get low glucose readings, because you have water on your hands. So not washing your hands is a problem, and washing your hands and not drying — which takes a fair amount of time to do properly — is a problem. The other problem with not drying your hands completely is if you open the vial and you take a glucose test strip out while your hands are wet, you can ruin all the other test strips in that vial. Nobody teaches patients about any of these issues, so have at any of them.”

Comment – Interesting point. Mr. Melker is a type I diabetic.

Gary L. Myers, Ph.D., Division of Laboratory Sciences at the Centers for Disease Control and Prevention, Atlanta, Georgia – “There is no current consensus exists among manufacturers about the most appropriate way to publish guidelines on how a interfering substance is affecting a particular method.”

Comment – CLSI had a guideline to address this – EP11 (Uniformity of Claims) but (some) manufacturers with the help of AdvaMed killed it.


Note: All references were available before the FDA meeting.

  1. Krouwer JS and Cembrowski GS. A review of standards and statistics used to describe blood glucose monitor performance. Journal of Diabetes Science and Technology 2010;4:75-83.
  2. Krouwer JS. Wrong thinking about glucose standards. Clin Chem, in press. Available online.
  3. Krouwer JS. Setting performance goals and evaluating total analytical error for diagnostic assays. Clin Chem 2002;48:919-927.

Bad risk management vs. good risk management

April 15, 2010

I gave a talk at NERCE about risk management and based on a comment, realized that I failed to explain something properly. So after thinking about it, here it is.

Bad risk management – This could occur either by a manufacturer or a clinical laboratory. As a manufacturer example, assume a glucose meter is being released for sale with remaining known issues. The manufacturer could perform a risk-benefit analysis which weighs the risk of releasing the glucose meter with known issues against the benefit of having diabetics use glucose meters. This analysis will always favor releasing the product because the harm from the lack of knowledge when a glucose meter is not used outweighs the risk of harm from erroneous results.

Good risk management – This could also occur either by a manufacturer or a clinical laboratory. As a clinical laboratory example, blood gas results for an operating room must be produced. If the machine fails, patient harm may result. As a control measure, many laboratories have multiple blood gas analyzers. If one has two blood gas analyzers, the risk of not producing a result is lower because both analyzers must fail simultaneously. Yet, the risk of failure is not zero. One can add a third analyzer and get an even lower risk, but again it is still above zero. So at some point, one must accept the failure risk because funds are limited. This risk level is often called ALARP (as low as reasonably practicable).
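The effect of adding analyzers can be sketched in Python. This assumes that analyzers fail independently and with the same probability, which will not hold exactly in practice (a power failure affects all of them), and the 1% failure probability is a made-up number for illustration.

```python
def risk_of_no_result(p_single_failure, analyzers):
    """Probability that every analyzer is down at once (independence assumed)."""
    return p_single_failure ** analyzers


# Hypothetical 1% chance that any one analyzer is unavailable when needed.
for count in (1, 2, 3):
    print(count, risk_of_no_result(0.01, count))
```

Each added analyzer lowers the risk but never brings it to zero, and each one has a cost, which is the trade-off that the ALARP level expresses.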

The “bad risk management” example is bad because the risk was not reduced to the ALARP level. Of course, what is practicable will differ depending on resources available, culture, regulations, and so on.

EPCA-2 update number 5

April 14, 2010

I attended a lecture by Dr. Daniel Chan, a coauthor of the first paper (1) on EPCA-2. The title of Dr. Chan’s lecture was “Translation of Proteomics Biomarkers into Clinical Practice for Ovarian and Prostate Cancer.” Guess what – not a peep about EPCA-2 during this talk. I spoke to Dr. Chan before the talk. It seems as if Dr. Chan is distancing himself from Getzenberg. I didn’t ask Dr. Chan any hard questions (such as is all this totally bogus) and the only thing that Dr. Chan offered was that there were likely problems with the EPCA-2 assay. But this is not news as it was already reported by Dr. Diamandis. So not much to report.


  1. Leman ES, Cannon GW, Trock BJ, Sokoll LJ, Chan DW, Mangold L, Partin AW, Getzenberg RH. EPCA-2: a highly specific serum marker for prostate cancer. Urology 2007;69:714-20.