Risk based SQC – what does it really mean?

December 4, 2017

Having just read a paper on risk based SQC, here are my thoughts…

CLSI has recently adopted a risk management theme for some of their standards. The fact that Westgard has jumped on the risk management bandwagon is, as we say in Boston, wicked smaaht.

But what does this really mean and is it useful?

SQC as described in the Westgard paper is performed to prevent patient results from exceeding an allowable total error (TEa). To recall, the Westgard model estimates total error as TE = |bias| + 1.65×SD. I have previously commented that this model does not account for all error sources, especially for QC samples. But for the moment, let’s assume that the only error sources are average bias and imprecision. The remaining problem with TEa is that it is always given for a percentage of results, usually 95%. So if some SQC procedure were to just meet its quality requirement, up to 5% of patient results could exceed TEa and potentially cause medical errors. That is 1 in every 20 results! I don’t see how this is a good thing, even if one were to use a 99% TEa.
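To make the arithmetic concrete, here is a minimal sketch in Python of why just meeting a 95% TEa leaves 1 in 20 results beyond the limit. The bias and SD values are illustrative assumptions, not from any real assay.

    # A method that exactly meets TEa = |bias| + 1.65*SD leaves 5% of
    # results beyond TEa (one-sided), assuming normally distributed errors.
    from scipy.stats import norm

    bias, sd = 2.0, 3.0                # assumed average bias and imprecision
    tea = abs(bias) + 1.65 * sd        # the TEa this method just meets

    # probability that a result's total error exceeds TEa
    p_exceed = norm.sf(tea, loc=bias, scale=sd)
    print(f"TEa = {tea:.2f}; P(error > TEa) = {p_exceed:.3f} (about 1 in {1/p_exceed:.0f})")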

The problem is one of “spin.” SQC, while valuable, does not guarantee the quality of patient results. The laboratory testing process is like a factory process, and any such process, to be useful, must be in control (meaning in statistical quality control). Thus, SQC helps to guard against an out of control process. To be fair, if the process were out of control, patient sample results might exceed TEa.

The actual risk of medical errors due to lab error is a function not only of an out of control process but also of all the error sources not accounted for by QC, such as user errors with patient samples (as opposed to QC samples), patient interferences, and so on. Hence, to say that risk based SQC can address the quality of patient results is “spin.” SQC is a process control tool – nothing more and nothing less.

And the best way of running SQC would be for a manufacturer to assess results from all laboratories.

Now some people might think this is a nit-picking post, but here is an additional point. One might be lulled into thinking that with risk based SQC, labs don’t have to worry about bad results. But interferences can cause large errors that lead to medical errors. For example, in the maltose problem for glucose meters, 6 of 13 deaths occurred after an FDA warning. And recently, there have been concerns about biotin interference in immunoassays. So it’s not good to oversell SQC, since people might lose focus on other, important issues.


HbA1c – use the right model, please

August 31, 2017

I had occasion to read a paper (CCLM paper) about HbA1c goals and evaluation results. This paper refers to an earlier paper (CC paper) which says that Sigma Metrics should be used for HbA1c.

So here are some problems with all of this.

The CC paper says that TAE (which they use) is derived from bias and imprecision. Now I have many blog entries as well as peer reviewed publications going back to 1991 saying that this approach is flawed. That the authors chose to ignore this prior work doesn’t mean the prior work doesn’t exist – it does – or that it is somehow not relevant – it is.

In the CC paper, controls were used to arrive at conclusions. But real data involves patient samples so the conclusions are not necessarily transferable. And in the CCLM paper, patient samples are used without any mention as to whether the CC paper conclusions still apply.

In the CCLM paper, precision studies, a method comparison, linearity, and interference studies were carried out. This is hard to understand, since the TAE model of |average bias| + 2×imprecision accounts for neither the linearity nor the interference results.

The linearity study says it followed CLSI EP6, but there are no results to show this (e.g., no reported higher order polynomial regressions). The graphs shown do look linear.

But the interference studies are more troubling. From what I can make of it, the target values are given ±10% bands, and any candidate interfering substance whose data do not fall outside these bands is said not to clinically interfere (i.e., the bias is less than an absolute 10%). But that does not mean there is no bias! To see how silly this is, one could just as well say that if the average bias from regression were less than an absolute 10%, it should be set to zero since there was no clinical interference.

The real problem is that the authors’ chosen TAE model cannot account for interferences – such biases are not in their model. But interference biases still contribute to TAE! And what do the reported values of six sigma mean? They are valid only for samples containing no interfering substances. That’s neither practical nor meaningful.

Now one could better model things by adding an interference term to TAE and simulating various patient populations as a function of interfering substances (including the occurrence of multiple interfering substances) – see the sketch below. But Sigma Metrics, to my knowledge, cannot do this.
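Here is a minimal simulation sketch of that idea in Python. The bias, CV, interference size, and prevalence values are invented for illustration only.

    # Simulate total error when an interference adds bias to a fraction
    # of patient samples; all values below are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    bias, cv = 1.0, 2.0            # average bias (%) and imprecision (%)
    interference = 8.0             # extra bias (%) from an interfering substance
    prevalence = 0.03              # fraction of patients carrying the substance

    error = bias + rng.normal(0, cv, n)
    has_interferent = rng.random(n) < prevalence
    error[has_interferent] += interference

    # The TAE model |bias| + 2*SD describes only the uninterfered samples
    tae_model = abs(bias) + 2 * cv
    print(f"model TAE: {tae_model:.1f}%")
    print(f"all results beyond model TAE: {np.mean(np.abs(error) > tae_model):.3f}")
    print(f"interfered results beyond it: {np.mean(np.abs(error[has_interferent]) > tae_model):.3f}")

In this toy example nearly all of the interfered samples exceed the model TAE – errors that a bias-plus-imprecision TAE (and hence a sigma metric) never sees.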

Another comment: although HbA1c is not glucose, the subject matter is diabetes, and in the glucose meter world error grids are well known as a way to evaluate required clinical performance. But the term “error grid” does not appear in either paper.

Error grids account for the entire range of the assay. Sigma Metrics, it seems, are applied at only one point in the assay range.


Antwerp talk about total error

March 12, 2017

Looking at my blog stats, I see that a lot of people are reading the total analytical error vs. total error post. So, below are the slides from a talk that I gave at a conference in Antwerp in 2016, called The “total” in total error. The slides have been updated. Because they were made for a talk, the slides are not as effective as the talk itself.


TotalError


Help with sigma metric analysis

January 27, 2017


I’ve been interested in glucose meter specifications and evaluations. There are three glucose meter specifications sources:

FDA glucose meter guidance
ISO 15197:2013
glucose meter error grids

There are various ways to evaluate glucose meter performance. What I wished to look at was the combination of sigma metric analysis and the error grid. I found this article about the sigma metric analysis and glucose meters.

After looking at this, I understand how to construct these so-called method decision charts (MEDX). But here’s my problem. In these charts, the total allowable error (TEa) is a constant – this is not the case for error grids, where TEa changes with the glucose concentration. Moreover, it is not even the same on the high and low sides of a specific glucose concentration, because the “A” zone limits of an error grid (I’m using the Parkes error grid) are not symmetrical.

I have simulated data with a fixed bias and constant CV throughout the glucose meter range. But with a changing TEa, the estimated sigma also changes with glucose concentration, as the sketch below illustrates.
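Here is a minimal sketch of the difficulty in Python. The sigma metric formula, sigma = (TEa − |bias|)/CV, is the standard one, but the glucose-dependent TEa values below are hypothetical placeholders, not actual Parkes A zone limits.

    # Sigma changes across the range when TEa is concentration dependent,
    # even though bias and CV are fixed. TEa values below are hypothetical.
    import numpy as np

    bias_pct, cv_pct = 2.0, 5.0                   # assumed fixed bias and CV (%)
    glucose = np.array([50, 100, 200, 400])       # mg/dL
    tea_pct = np.array([30.0, 20.0, 15.0, 15.0])  # hypothetical A zone TEa (%)

    sigma = (tea_pct - abs(bias_pct)) / cv_pct
    for g, t, s in zip(glucose, tea_pct, sigma):
        print(f"glucose {g:3d} mg/dL: TEa {t:4.1f}% -> sigma {s:.1f}")

And because the A zone limits are asymmetrical, one would really get two sigma values at each concentration, one for each side, which a single method decision chart cannot display.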

So I’m not sure how to proceed.


The problem with the FDA standard explained

October 25, 2016


The previous blog entry criticized the updated FDA POCT glucose meter performance standard, which now allows 2% of the results to be unspecified.

What follows is an explanation of why this is wrong. My logic applies to:

  1. Total error performance standards which state that 95% (or 99%) of results should be within stated limits
  2. Measurement uncertainty performance standards which state that 95% (or 99%) of results should be within stated limits
  3. The above FDA standard which states that 98% of results should be within stated limits

One argument that surfaces for allowing results to be unspecified is that one cannot prove that 100% of results are within limits. This is of course true. But here’s the problem with using that fact to allow unspecified results.

Consider a glucose meter example with truth = 30 mg/dL. Assume the glucose meter has a 5% CV (SD = 1.5 mg/dL at this level) and that the precision results are normally distributed. One can calculate the size of glucose meter errors for various SD multiples, note their location in a Parkes error grid, and see how often an error of that size would occur from imprecision alone.

Truth (mg/dL)   SD multiple   Observed glucose (mg/dL)   Parkes grid zone   Occurs 1 in
30              2             33                          A                  20
30              3             34.5                        A                  370
30              8             42                          A                  7E+14
30              22            63                          C                  1E+106


(Getting an error in the E zone, an extremely dangerous result, would require 90 multiples of the standard deviation, and Excel refuses to tell me how rare this is; see the sketch below.) I think it’s clear that worrying about precision and/or the normal distribution does not justify leaving a portion of the results unspecified.
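For the curious, here is a minimal sketch in Python that reproduces the “Occurs 1 in” column using two-sided normal tail probabilities, including the 90 SD case that underflows Excel.

    # Reproduce the table's "Occurs 1 in" column via log tail probabilities;
    # norm.logsf stays accurate where a direct probability would underflow.
    import math
    from scipy.stats import norm

    truth, cv = 30.0, 0.05
    sd = truth * cv                          # 1.5 mg/dL

    for k in (2, 3, 8, 22, 90):              # SD multiples; 90 ~ E zone
        observed = truth + k * sd
        log10_p = (norm.logsf(k) + math.log(2)) / math.log(10)  # two-sided
        print(f"{k:>2} SD -> {observed:6.1f} mg/dL, occurs about 1 in 10^{-log10_p:.1f}")

The 2 SD line gives 10^1.3 (about 1 in 20), and the 90 SD line comes out to roughly 1 in 10^1761 – rare beyond any meaning.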

Now, errors in higher zones of the Parkes error grid do occur, including E zone errors, and clearly this has nothing to do with precision. These errors have other causes, such as interferences.

A better way to think of these errors is as “attribute” errors – they either occur or don’t occur. For more on this, see: Krouwer JS. Recommendation to treat continuous variable errors like attribute errors. Clinical Chemistry and Laboratory Medicine 2006;44(7):797–798.

Note that one cannot prove that attribute errors won’t occur. But in no other field are results allowed to be unspecified the way clinical chemistry standards committees allow. For example, you don’t hear “we want 98% of surgeries to be performed on the correct organ on the correct patient.”


MU vs TE vs EG

July 29, 2016

Picture is an aerial view from a Cirrus of Foxwoods casino in CT

MU = measurement uncertainty, TE = total error, EG = error grid

Having looked at a blog entry by the Westgards, which is always interesting, here are my thoughts.

To recall, MU is a “bottom-up” way to model error in a clinical chemistry assay, TE uses a “top-down” model, and EG has no model at all.

MU is a bad idea for clinical chemistry – Here are the problems with MU:

  1. Unless things have changed, MU doesn’t allow for bias in its modeling process. If a bias is found, it must be eliminated. Yet in the real world, there are many uncorrected biases in assays (calibration bias, interferences).
  2. The modeling required by MU is not practical for a typical clinical chemistry lab. One can view the modeling as having two major components: the biological equations that govern the assay (e.g., Michaelis Menten kinetics) and the instrumentation (e.g., the properties of the syringe that picks up the sample). Whereas clinical chemists may know the biological equations, they won’t have access to the manufacturer’s instrumentation data.
  3. The math required to perform the analysis is extremely complicated.
  4. Some of the errors that occur cannot be modeled (e.g., user errors, manufacturing mistakes, software errors).
  5. The MU result is typically reported as the location of 95% of the results. But one needs to account for 100% of the results.
  6. So some people get the SD for a bunch of controls and call this MU – a joke.

TE has been much more useful than MU, but still has problems:

  1. The Westgard model for TE doesn’t account for some important errors, such as patient interferences.
  2. Other errors that occur (e.g., user errors, manufacturing mistakes, software errors) may be captured by TE, but the potential for these errors is often excluded from experiments (e.g., users in these experiments are often more highly trained than typical users).
  3. Although both MU and TE rely on experimental data, TE relies solely on an experiment (method comparison or quality control). There are likely to be biases in the experiment which will cause TE to be underestimated. (See #2).
  4. The TE result is typically reported as the location of 95% of the results. But one needs to account for 100% of the results.
  5. TE is often overstated, e.g., the sigma value is said to provide a specific (numeric) quality for patient results. But this is untrue, since TE underestimates the true total error.
  6. TE fails to account for the importance of bias. That is, one can have results that are within TE goals but that still cause harm due to bias. Klee has shown this, as have I. For example, bias for a glucose meter can cause diabetic complications while still being within TE goals.

I favor error grids.

EG

  1. Error grids still have the problem that they rely on experimental data and hence there may be bias in the studies.
  2. But 100% of the results are accounted for.
  3. There is the notion of increasing patient harm in EG. With either MU or TE, there is only the concept of harm vs. no harm. This is not the real world. A glucose meter result of 95 mg/dL (truth = 160 mg/dL) causes much less harm than a glucose meter result of 350 mg/dL (truth = 45 mg/dL).
  4. EG simply plots test vs. reference, as sketched below. There are no models (but there is also no way to tell the origin of the error source).
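A minimal plotting sketch of the idea in Python follows. The ±15% band is illustrative only; it is not the actual Parkes zone structure, which is piecewise and asymmetric.

    # Plot test vs. reference and flag points outside an accuracy band.
    # The +/-15% band is an illustrative stand-in for real error grid zones.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    reference = rng.uniform(40, 400, 200)                  # mg/dL
    test = reference * (1 + rng.normal(0.02, 0.05, 200))   # 2% bias, 5% CV

    inside = np.abs(test - reference) <= 0.15 * reference
    plt.scatter(reference[inside], test[inside], s=10, label="within band")
    plt.scatter(reference[~inside], test[~inside], s=10, color="red", label="outside band")
    plt.plot([0, 450], [0, 450], "k--", label="identity")
    plt.xlabel("Reference glucose (mg/dL)")
    plt.ylabel("Test glucose (mg/dL)")
    plt.legend()
    plt.show()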

Published – my one man Milan Conference

March 23, 2016

RijksMuseumT

Having read the consensus statement and all the papers from the Milan conference (available without subscription), I prepared my version of this for the Antwerp conference. This talk contained the following:

  • A description of why the Westgard model for total error is incomplete (with of course Jim Westgard sitting in the audience)
  • A description of why expanded total error models are nevertheless also incomplete
  • A critique of Boyd and Bruns’ glucose meter performance simulations using the Westgard model
  • A critique of the ISO and CLSI glucose meter specifications, both based on total error
  • A description of what the companies with most of the market share in glucose meters did, when they started to lose market share
  • How Ciba Corning specified and evaluated performance
  • What I currently recommend

I submitted a written version of this talk to Clin Chem and Lab Medicine, with recommended reviewers being Milan authors with whom I disagreed. (The journal asks authors to recommend reviewers.) Now I don’t know who the reviewers were, but suffice it to say that they didn’t like my paper at all. So after several revisions, I scaled back my paper to its current version, which is here (subscription required).