The revised total error standard EP21, an example of manufacturers dominating CLSI

May 18, 2015


I had a chance to look at the revision of EP21 – the document about total error that I proposed and chaired. So after 12 years, here are the major changes.

In the original EP21, I realized that even if 95% of the results met goals, the remaining 5% might not, so there was a table that accounted for this. An acceptable assay had to have 100% of its results within goals. The revised EP21 – call it A2 – only talks about 95% of results (similar to the 2003 ISO glucose meter standard). There is no longer any mention of the remaining 5% – these results are unspecified. This fits my thinking that manufacturers will refuse to talk about assay results that can cause severe injury or death. Thus, if 95% of the results just meet goals, a portion of the remaining 5% could cause severe injury or death, and even a small percentage translates into a huge absolute number: with roughly 8 billion glucose meter results each year in the US, the unspecified 5% amounts to 400 million results, so even if only a tiny fraction of those were dangerously wrong, the count of potentially harmful results would still be enormous.

The mountain plot and all references to it are gone in A2. To recall, the mountain plot is ideal for visualizing outlier observations: there could be 10,000 observations, but if there were 5 outliers, they would be clearly visible. In place of the mountain plot, there is a histogram, with an example showing normal-looking results – the example that had outliers is gone. And the histogram has only 9 bins, so if there were outliers, they would disappear. So again, this is a way to minimize talking about results which can cause major problems.
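To illustrate the point, here is a minimal sketch contrasting the two displays. The data are simulated (not from EP21 or A2), and the mountain plot is built the usual way, as a folded empirical CDF of the candidate-minus-comparison differences; numpy and matplotlib are assumed.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Simulated candidate-minus-comparison differences: 10,000 well-behaved
# results plus 5 gross outliers (all values made up for illustration).
diffs = np.concatenate([rng.normal(0, 1, 10_000), [12, 15, -14, 18, -16]])

# Mountain plot: percentile rank of each sorted difference, folded at 50%.
d = np.sort(diffs)
pct = 100 * (np.arange(1, d.size + 1) - 0.5) / d.size
folded = np.where(pct <= 50, pct, 100 - pct)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(d, folded, ".", markersize=2)
ax1.set(title="Mountain plot: the 5 outliers sit far out on the tails",
        xlabel="difference", ylabel="folded percentile")

# Histogram with only 9 bins: the 5 outliers land in bars too short to see.
ax2.hist(diffs, bins=9)
ax2.set(title="9-bin histogram: the outliers disappear",
        xlabel="difference", ylabel="count")

plt.tight_layout()
plt.show()
```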

Somehow, sigma metrics have become part of A2. How this happened is a mystery – perhaps someone can explain it to me. I understand the equation total error = |bias| + 2 × imprecision, but the total error in EP21 is the difference between the candidate and comparison assays, and this difference cannot be separated into bias and imprecision.
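To make the mismatch concrete, here is a minimal sketch; the numbers are hypothetical, and the sigma formula shown is the commonly used (TEa − |bias|)/CV, not anything taken from A2.

```python
# Sigma-metric world: bias and imprecision are known separately (hypothetical values, in %).
TEa, bias, cv = 10.0, 2.0, 3.0
total_error_model = abs(bias) + 2 * cv        # |bias| + 2 x imprecision = 8.0
sigma_metric = (TEa - abs(bias)) / cv         # usual sigma metric formula, ~2.7

# EP21 world: all we have are candidate-minus-comparison differences per sample.
# Random error, average bias, and sample-specific effects (e.g., interferences)
# are mixed together in each difference, so there is no way to recover a single
# "bias" and "imprecision" from them to plug into the formulas above.
differences = [1.2, -0.5, 3.4, 0.8, -2.1, 9.7, 0.3]        # made-up example data
total_error_limits = (min(differences), max(differences))  # or empirical percentiles
print(total_error_model, round(sigma_metric, 2), total_error_limits)
```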

And then there is the section on distinguishing between total error and total analytical error. This is part of the reason I was booted out of CLSI. A2 is constrained to include only analytical error.

Total error, including all sources of variation, is the only thing that matters to clinicians. A total error experiment (e.g., EP21) will include errors only from those sources that are sampled, and practically speaking, the sources will be limited, even for analytical error. For example, even if more than one reagent is used, this is not the same as randomly sampling from the population of all reagents over the lifetime of the device – impossible, since that population includes future reagents that don't yet exist. The same is true for pre- and post-analytical error, but the point is that one should not exclude pre- and post-analytical error sources from the experiment.

There is a section on various ways to establish goals. The examples shown are the ISO, CLSI, and NACB glucose meter standards, which have performance goals for glucose meters. A2 talks about the strengths and weaknesses of using expert bodies to create these standards. Now, A2 has a reference from May of 2015, but somehow it missed the FDA draft guidance on glucose meters (January 2014), which, unlike the examples cited in A2, wants evaluators to account for 100% of the data. And FDA's opinion about the ISO glucose meter standard is pretty clear:

Although many manufacturers design their BGMS validation studies based on the International Standards Organizations document 15197, FDA believes that the criteria set forth in the ISO 15197 standard do not adequately protect patients using BGMS devices in professional settings, and does not recommend using these criteria for BGMS devices.

I have published a critique of the CLSI glucose meter standard, which is available here.

When I was chair holder of the Evaluation Protocols Committee, there were battles between the regulatory affairs people, who populated the manufacturing contingent, and the rest of the committee. For example, I remember one such battle over EP6, the linearity document. The proposed new version finally had a sensible statistical method to evaluate nonlinearity, but one regulatory affairs member insisted on having an optional procedure whereby one could just graph the data and look at it to declare whether it was linear. After many delays, this optional procedure was rejected.

By looking at the new version of EP21, my sense is that the regulatory affairs view now dominates these committees.


Total Error and Milan 3

May 11, 2015


Having mentioned in my first blog entry, “Total Error and Milan,” that clinician surveys were dropped as a means of constructing performance specifications, I looked at the published paper on this topic. Many of the citations are from the 80s – there’s nothing wrong with that – but I was surprised to see that a recent paper on glucose meter performance specifications, which is here and was available before the Milan conference, was not cited. In this glucose paper, 206 clinicians were surveyed using 4 scenarios and asked for the range of glucose levels that would correspond to each of 5 types of actions: (A) emergency treatment for low BG; (B) take oral glucose; (C) no action needed; (D) take insulin; and (E) emergency treatment for high BG.

Maybe if the Milan conference had been aware of this work, they would have added clinician surveys as a primary means to establish performance specifications.


Total Error and Milan 2

May 11, 2015


Having looked further into this conference, I see that the original slides of the Milan conference talks are available, as well as a list of articles (without needing a subscription).

So one of the articles of interest to me was the one that describes using simulation to set performance goals. It is here.

And sure enough, this article refers to the glucose meter simulations originally published by Boyd and Bruns and continued by them and others, which I have critiqued over the years.

An article that I wrote, which shows why such a model can be misleading, is now available without a subscription and is here.

And another letter by me – published after the Milan conference – is here (subscription required). This makes three articles I have published showing that the Boyd and Bruns model is incomplete and misleading.


Total Error and Milan

May 8, 2015


Recently, I talked to someone who attended a conference on total error in Milan. Had I known about the conference or been invited, I would have attended. Searching the web, I found that the Westgard web site has summaries of and links about this conference. So here are my comments:

  1. The use of allowable performance specifications implies a set of limits that demarcate no harm from harm. This further implies that for many analytes, results that just exceed the limit will cause minor harm. But for many analytes, harm increases as the error increases (such as for glucose meter errors). Thus, small errors may result in minor harm and large errors can be fatal. This can be accounted for by using an error grid (such as a glucose meter error grid) which has separate zones for increasing error and harm.
  2. The allowable performance specifications are for analytical performance. Although pre- and post-analytical errors are mentioned, there is no attempt to present allowable performance specifications that include all sources of error. Thus, in the consensus statement, “The SPC encourages users to expand those specifications [referring to analytical performance specifications] to the total examination process.” This is not something that should be a user exercise.
  3. The primary method for establishing allowable analytical performance specifications is: “Based on the effect of analytical performance on clinical outcomes.” It is interesting to compare, for this item, the unofficial summary from the Westgard site with the official summary. Note that, IMHO, the most important method, a clinician survey, has been dropped in the official version.
  4. Also problematic is the suggestion in #2 below of using simulations. In glucose meter modeling, I have published on how misleading these simulations have been.

Unofficial:

In order to develop quality specifications using outcomes, you must complete one of the following:

  1. an Outcome study investigating the impact of analytical performance on clinical outcomes
  2. a Simulation study investigating the impact of analytical performance on the probability of clinical outcomes
  3. a Survey of clinicians’ and/or experts’ opinion investigating the impact of analytical performance on medical decisions

Official:

This can, in principle, be done using different types of studies:

  1.  Direct outcome studies – investigating the impact of analytical performance of the test on clinical outcomes;
  2. Indirect outcome studies – investigating the impact of analytical performance of the test on clinical classifications or decisions and thereby on the probability of patient outcomes, e.g., by simulation or decision analysis.

Glucose error grids vs. ISO / CLSI standards

April 11, 2015


Ever wonder why ISO or CLSI glucose standards use primarily one set of limits rather than an error grid? Here’s my explanation.

With an error grid – especially a glucose error grid – there are multiple sets of limits. Data inside the innermost limits imply no harm to patients, and data outside the outermost limits imply serious injury or death. And of course there are limits in between, which correspond to increasing degrees of harm to patients. Although the limits are provided without percentages of data that should fall in any region, it is implied that there should be no results beyond the outermost limits.

With ISO or CLSI, the use of one primary set of limits (corresponding to the innermost limits of an error grid) relieves these standards organizations from even having to mention a case where serious injury or death may occur. And this is probably because these groups are dominated by regulatory affairs people from industry.
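As a concrete illustration of the difference, here is a minimal zone-classification sketch. The 15% and 40% limits below are hypothetical and deliberately simplified – they are not the Clarke or Parkes grid limits – but they show how a graded grid reports dangerous errors separately, whereas a single ISO/CLSI-style limit only reports pass/fail at the inner boundary.

```python
def zone(reference, measured, inner=15.0, outer=40.0):
    """Classify one glucose result by percent error against a reference value.

    The 15% and 40% limits are hypothetical, for illustration only:
    'A' = within the innermost limits (no harm),
    'B' = between the inner and outermost limits (increasing harm),
    'C' = beyond the outermost limits (serious injury or death possible).
    """
    pct_error = abs(measured - reference) / reference * 100
    if pct_error <= inner:
        return "A"
    if pct_error <= outer:
        return "B"
    return "C"

# A single set of limits would only say 2 of 4 results pass; the grid also
# reports that one result lands in the dangerous outer zone.
pairs = [(100, 110), (100, 130), (100, 160), (250, 240)]   # (reference, measured), made up
counts = {"A": 0, "B": 0, "C": 0}
for ref, meas in pairs:
    counts[zone(ref, meas)] += 1
print(counts)   # {'A': 2, 'B': 1, 'C': 1}
```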

 


EPCA-2 Update number 6

March 19, 2015


For no particular reason, I searched for Dr. Getzenberg on Google. To recall the previous entries on this topic, search this blog for EPCA-2 (there is a search form at the top right). I found two rather different results in Google.

One deals with the seventh retraction of an article written by Dr. Getzenberg.

The other talks about awards and distinctions and how he is a senior leader in oncology and urology.


Hemoglobin A1c quality targets

March 16, 2015


There is a new article in Clinical Chemistry about a complicated (to me) analysis of quality targets for A1c, when it would seem that a simple error grid – prepared by surveying clinicians – would fit the bill.

Thus, this paper has problems. They are:

  1. The total error model is limited to average bias and imprecision. Error from interferences, user error, or other sources is not included. It is unfortunate to call this “total” error, since there is nothing total about it (a brief simulation of this point appears after this list).
  2. A pass/fail system is mentioned, which is dichotomous, unlike an error grid, which allows for varying degrees of error with respect to severity of harm to patients.
  3. A hierarchy of possible goals is mentioned. This comes from a 1999 conference. But there is really only one way to set patient goals (listed near the top of the 1999 hierarchy): namely, a survey of clinician opinions.
  4. Discussed in the Clinical Chemistry paper is the use of biological-variation-based goals for quality targets. Someone needs to explain to me how this could ever be useful.
  5. The analysis is based on proficiency survey materials, which, due to the absence of patient interferences (see #1), represent only a subset of total error.
  6. From what I could tell from their NICE reference (#11) in the paper, the authors have inferred that total allowable error should be 0.46%, but this did not come from surveying clinicians.
  7. I’m on board with six sigma in its original use at Motorola. But I don’t see its usefulness in laboratory medicine compared to an error grid.
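Regarding point 1, here is a brief simulation sketch. The bias, imprecision, and the size and frequency of the interference are all made-up numbers, not taken from the paper; the point is only that an effect absent from the bias-plus-imprecision calculation can still push many results beyond the “total” error limit.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical numbers, in %HbA1c units: average bias and imprecision (SD).
bias, sd = 0.1, 0.15
model_total_error = abs(bias) + 2 * sd      # the narrow model: 0.40 %HbA1c

n = 100_000
clean = rng.normal(bias, sd, n)                                # bias + imprecision only
interfered = clean + np.where(rng.random(n) < 0.03, 1.0, 0.0)  # 3% of specimens carry a
                                                               # hypothetical +1.0 interference

for label, e in [("bias + imprecision only", clean),
                 ("with interference", interfered)]:
    beyond = np.mean(np.abs(e) > model_total_error)
    print(f"{label}: {beyond:.1%} of results exceed {model_total_error:.2f} %HbA1c")
# The interference more than doubles the fraction of results beyond the "total"
# error limit, yet it never appears in the bias-plus-imprecision calculation.
```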
