Who influences CMS and CDC?

March 23, 2019

A recent editorial, published online in J Diabetes Science and Technology, disagrees with the CLIA limits for HbA1c proposed by CMS and CDC ("The Need for Accuracy in Hemoglobin A1c Proficiency Testing: Why the Proposed CLIA Rule of 2019 Is a Step Backward"). The proposed CLIA limits are ±10%, whereas the NGSP limits are 5% and the CAP limits are 6%. Having read the Federal Register, I don't understand the basis for the 10%.

This reminds me of another CMS decree from the early 2000s: Equivalent Quality Control. Under this program, a lab director could run quality control for 10 days alongside the instrument's automated internal quality checks and decide whether the two were equivalent. If the answer was yes, the frequency of quality control could be reduced to once a month. This made no sense!


Minimum system accuracy performance criteria – part 2

February 13, 2019

I had occasion to read the ISO 15197:2013 standard about blood glucose meters, specifically Section 6.3.3, "Minimum system accuracy performance criteria."

Note that this accuracy requirement is what is typically cited as the accuracy requirement for glucose meters.

But the two Notes in this section say that testing meters with actual users is covered elsewhere in the document (Section 8). Thus, because of the protocol used, the system accuracy estimate does not account for all errors, since user errors are excluded. Hence, the system accuracy requirement is not the total error of the meter but rather a subset of total error.

Moreover, in the user test section, the acceptance goals are different from the system accuracy section!

Ok, I get it. The authors of the standard want to separate two major error sources: error from the instrument and reagents (the system error) and errors caused by users.

But there is no attempt to reconcile the two estimates. And if one considers the user test as a total error test, which is reasonable (e.g., it includes system accuracy and user error), then the percentage of results that must meet goals is 95%. The 99% requirement went poof.
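The separation of the two error sources can be made concrete with a small sketch. The cut points below (±15 mg/dL when the reference is under 100 mg/dL, ±15% at or above) are the commonly cited ISO 15197:2013 system accuracy zone; treat them as assumptions for illustration, not quotations from the standard:

```python
def within_system_accuracy(meter: float, ref: float) -> bool:
    """Commonly cited ISO 15197:2013 system accuracy zone (assumed here):
    within +/-15 mg/dL when the reference is < 100 mg/dL, else within +/-15%."""
    if ref < 100:
        return abs(meter - ref) <= 15
    return abs(meter - ref) <= 0.15 * ref

def fraction_within(pairs) -> float:
    """Fraction of (meter, reference) pairs inside the zone; the system
    accuracy section requires at least 95% of results to fall within it."""
    return sum(within_system_accuracy(m, r) for m, r in pairs) / len(pairs)
```

A user-performance protocol would apply its own (different) acceptance goal to the same kind of tally, which is exactly the unreconciled gap described above.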

 


Review of setting goals (to determine if the estimated total error is acceptable)

February 7, 2019

The last post described ways to estimate total error. But the reason total error is estimated is to determine if it meets goals. This post describes how to set goals.

Consider the following scenario. A clinician is deciding on a treatment for a patient. Among the criteria used to make that decision are the patient's history, the physical exam, and one or more blood tests or images. Given the other criteria and a specific blood test with value A, the clinician will decide on a treatment (which may include no treatment). Now assume the blood test's value keeps diverging from value A. At some point, call it value B, the clinician will make a different treatment decision. If value B is an error, then it is reasonable to assume that an error of magnitude B-A is enough to cause the clinician to make the wrong medical decision. Thus, just under the magnitude B-A is a reasonable error limit. There are a bunch of other assumptions…

  1. The clinician’s decision conforms to acceptable medical practice.
  2. A wrong decision usually causes harm to the patient.
  3. Larger errors may cause different decisions leading to greater harm to the patient.
  4. Although all patients are unique, one can describe a “typical” patient for a disease.
  5. Although all clinicians are unique, most clinicians will make the same decision within a narrow enough distribution of errors so that one can use the average error as the limit.
  6. Given the X-Y space for the range of the test, where X=truth and Y=the candidate medical test, the entire space can be designated with error limits.
  7. It is common (given #6) that there will be multiple limits with different levels of patient harm throughout the range of the medical test.

All of the above can be satisfied by an error grid such as the glucose meter error grid. The error grid should work for any assay.

Note that many conventional error limits are not as comprehensive because …

  1. They use one limit for the entire range of the assay.
  2. They do not take into account greater harm for larger errors.
  3. They are not always based on patient results but on controls (e.g., CLIA limits).

Given the above discussion, setting limits using biological variability or state of the art is not relevant to answering the question of what magnitude of error will cause a clinician to make an incorrect medical decision. The only reasonable way to answer the question is to ask clinicians. An example of this was done for glucose meters (1).

A total error specification could easily be improved by adding to it:

  1. A limit for the average bias (2)
  2. A limit (greater than the total error limit) where there should be no observations, making the total error specification similar to an error grid.

Adding a limit for the average bias would also improve an error grid (3).
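As a rough sketch (all numerical limits below are hypothetical, not taken from any standard), a total error specification augmented with a bias limit and an outer "no observations" limit could be checked as:

```python
import statistics

def meets_spec(results, truth, te_limit=0.10, bias_limit=0.03,
               abs_limit=0.20, pct=0.95):
    """Hypothetical two-tier total error check: at least `pct` of results
    within the total error limit, average bias within its own limit, and
    no result beyond the outer "no observations" limit (error-grid style)."""
    rel = [(r - t) / t for r, t in zip(results, truth)]
    within = sum(abs(e) <= te_limit for e in rel) / len(rel)
    return (within >= pct
            and abs(statistics.mean(rel)) <= bias_limit   # average bias limit
            and all(abs(e) <= abs_limit for e in rel))    # no-observation zone
```

The outer limit is what makes this resemble an error grid: a single result past it fails the assay regardless of how well the other 95% behave.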

References

  1. Klonoff DC, Lias C, Vigersky R, et al. The surveillance error grid. J Diabetes Sci Technol. 2014;8:658-672.
  2. Klee GG, Schryver PG, Kisbeth RM. Analytic bias specifications based on the analysis of effects on performance of medical guidelines. Scand J Clin Lab Invest. 1999;59:509-512.
  3. Krouwer JS, Cembrowski GS. The chronic injury glucose error grid: a tool to reduce diabetes complications. J Diabetes Sci Technol. 2015;9:149-152.

Mandel and Westgard

May 20, 2018

Readers may know that I've been known to critique Westgard's total error model.

But let’s step it back to 1964 with Mandel’s representation of total error (1), where:

Total Error (TE) = x - R = (x - mu) + (mu - R), where

x = the sample measurement,
R = the reference value, and
mu = the population mean of the sample.

Thus, mu - R is the bias and x - mu the imprecision, the same as in the Westgard model. There is an implicit assumption that the replicates of x, which estimate mu, are affected only by random error. For example, if the replicate observations contain drift, the Mandel model would be incorrect. For replicates sampled close together in time, this is a reasonable assumption, although it is rarely if ever tested.
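A minimal simulation sketch of the decomposition (all numbers hypothetical): replicates are drawn around mu with purely random error, so (x - mu) + (mu - R) recovers x - R exactly.

```python
import random
import statistics

random.seed(1)
R = 100.0      # reference value
mu = 103.0     # population mean of the sample (true bias = +3)
sd = 2.0       # random measurement error (imprecision)

# Replicates of x, affected only by random error around mu
reps = [random.gauss(mu, sd) for _ in range(10_000)]
mu_hat = statistics.mean(reps)

x = reps[0]                  # a single sample measurement
bias = mu_hat - R            # estimates mu - R
imprecision = x - mu_hat     # estimates x - mu
total_error = x - R

# The Mandel decomposition holds term by term:
assert abs((imprecision + bias) - total_error) < 1e-9
```

Adding drift to the replicates (e.g., a term that grows with the replicate index) would bias mu_hat and break the interpretation of x - mu_hat as pure random error, which is the untested assumption noted above.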

Interferences are not a problem because even if they exist, there is only one sample. Thus, interference bias is mixed in with any other biases in the sample.

Total error is often expressed for 95% of the results. I have argued that the remaining 5% of results are unspecified, but if the assumption of random error holds for the repeated measurements, this is not a problem: these results come from a Normal distribution, so results many multiples of the standard deviation from the mean are extremely improbable.
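To put "extremely improbable" in numbers, the two-sided Normal tail probability beyond k standard deviations can be computed with the standard library (a small sketch):

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal distribution
for k in (2, 3, 4, 5):
    tail = 2 * (1 - z.cdf(k))  # P(|error| > k standard deviations)
    print(f"beyond {k} SD: {tail:.1e}")
```

By 4 to 5 standard deviations, the tail probability is in the range of one in tens of thousands to one in a million, which is why the unspecified 5% is harmless under a purely random error model.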

But outliers are a problem. Typically in these studies, outliers (if found) are deleted because they would perturb the estimates. The problem is that the outliers are usually not dealt with further, so the 5% of unspecified results becomes a problem.

Observing no outliers is a good thing, but here are 95% upper confidence limits for the outlier rate, given the number of sample replicates indicated, when 0 outliers have been found.

N          Maximum outlier rate (95% upper limit)
10         25.9%
100        3.0%
1,000      0.3%
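The table values follow from the exact binomial bound: if zero outliers are seen in N replicates, the largest rate p consistent with that observation at 95% confidence satisfies (1 - p)^N = 0.05. A small sketch:

```python
def max_outlier_rate(n: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the outlier rate when 0 outliers are
    observed in n replicates: solve (1 - p)**n = 1 - confidence for p."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

for n in (10, 100, 1000):
    print(f"N = {n:>5}: {max_outlier_rate(n):.1%}")
```

For large N this approaches the familiar "rule of three" (about 3/N), which is why even 1,000 clean replicates can only guarantee an outlier rate below 0.3%.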

So if one is measuring TE for a control or patient pool and keeping the time between replicates short, then the Westgard model estimate of total error is reasonable, although one still has to worry about outliers.

But when one applies the Westgard model to patient samples, it is no longer correct, since each patient sample can have a different amount of interference bias. And while large interferences are rare, small interferences can affect every sample, inflating the total error. Moreover, other sources of bias can be expected with patient samples, such as user error in sample preparation. And with patient samples, outliers, while still rare, can occur.

This raises the question of how to interpret results from a study that uses the Westgard model (such as a Six Sigma study). These studies typically use controls, but the implication is that they inform about the quality of the assay, meaning, of course, for patient samples. This is a problem for the reasons stated above. So one can say that if an assay has a bad six sigma value, the assay has a problem; but if the assay has a good six sigma value, one cannot say the assay is without problems.

 

Reference

  1. Mandel J. The statistical analysis of experimental data. Dover, New York, 1964, p. 105.

 


When large lab errors don’t cause bad patient outcomes

April 21, 2018

In the Milan conference, the preferred specification is the effect of assay error on patient outcomes. This seems reasonable enough but consider the following two cases.

Case 1, a glucose meter reads 350 mg/dL, truth is 50 mg/dL; the clinician administers insulin resulting in severe harm to the patient.

Case 2, a glucose meter reads 350 mg/dL, truth is 50 mg/dL; the clinician questions the result and repeats the test. The second test is 50 mg/dL; the clinician administers sugar resulting in no harm to the patient.

One must realize that lab tests by themselves cannot cause harm to patients; only clinicians can cause harm by making an incorrect medical decision based in part on a lab test. The lab test in cases 1 and 2 has the potential (a high potential) to result in patient harm. Case 2 could also be considered a near miss. From a performance vs. specification standpoint, both cases should be treated equally in spite of different patient outcomes.

Thus, the original Milan statement should really be the effect of assay error on potential patient outcomes.


New publication about interferences

April 20, 2018

My article “Interferences, a neglected error source for clinical assays” has been published. This article may be viewed using the following link https://rdcu.be/L6O2


Performance specifications, lawsuits, and irrelevant statistics

March 11, 2018

Readers of this blog know that I'm in favor of specifications that account for 100% of the results. The danger of specifications that cover only 95% or 99% of results is that assays meeting specifications can still produce errors that cause serious patient harm! Large and harmful errors are rare, certainly occurring in fewer than 1% of results. But hospitals might not want specifications that account for 100% of results (and remember that hospital clinical chemists populate standards committees). A potential reason: if a large error occurs, a 95% or 99% specification can be an advantage for the hospital if there is a lawsuit.

I'm thinking of an example where I was an expert witness. Of course, I can't go into the details, but this was a case where there was a large error, the patient was harmed, and the hospital lab was clearly at fault (in this case, a user error). The hospital lab's defense was that it followed all procedures and met all standards; in effect, sorry, but stuff happens.

As for irrelevant statistics, I’ve heard two well-known people in the area of diabetes (Dr. David B Sachs and Dr. Andreas Pfützner) say in public meetings that one should not specify glucose meter performance for 100% of the results because one can never prove that the number of large errors is zero.

That one can never prove that the number of large errors is zero is true but this does not mean one should abandon a specification for 100% of the results.

Here, I’m reminded of blood gas. For blood gas, obtaining a result is critical. Hospital labs realize that blood gas instruments can break down and fail to produce a result. Since this is unacceptable, one can calculate the failure rate and reduce the risk of no result with redundancy (meaning using multiple instruments). No matter how many instruments are used, the possibility that all instruments will fail at the same time is not zero!

A final problem with not specifying 100% of the results is that it may lead labs to put less thought into procedures that minimize the risk of large errors.

And in industry (at least at Ciba-Corning) we always had specifications for 100% of the results, as did the original version of the CLSI total error document, EP21-A (this was dropped in the A2 version).