HbA1c – use the right model, please

August 31, 2017

I had occasion to read a paper (CCLM paper) about HbA1c goals and evaluation results. This paper refers to an earlier paper (CC paper) which says that Sigma Metrics should be used for HbA1c.

So here are some problems with all of this.

The CC paper says that TAE (which they use) is derived from bias and imprecision. Now I have many blog entries as well as peer-reviewed publications going back to 1991 saying that this approach is flawed. That the authors chose to ignore this prior work doesn’t mean the prior work doesn’t exist – it does – or that it is somehow not relevant – it is.

In the CC paper, controls were used to arrive at conclusions. But real data involves patient samples so the conclusions are not necessarily transferable. And in the CCLM paper, patient samples are used without any mention as to whether the CC paper conclusions still apply.

In the CCLM paper, precision studies, a method comparison, a linearity study, and interference studies were carried out. This is hard to understand, since the TAE model of (absolute) average bias + 2× imprecision does not account for either linearity or interference studies.
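For concreteness, here is a minimal sketch (not the authors’ code) of the TAE and sigma metric calculations as I read them; the HbA1c numbers are made up for illustration. Note that neither expression has anywhere to put a linearity or interference term.

```python
# A minimal sketch of the TAE and sigma metric calculations; the HbA1c
# numbers below are invented for illustration.

def total_analytical_error(bias_pct, cv_pct, z=2.0):
    """TAE model: absolute average bias plus z times imprecision (all in %)."""
    return abs(bias_pct) + z * cv_pct

def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma metric: (allowable total error - |bias|) / CV (all in %)."""
    return (tea_pct - abs(bias_pct)) / cv_pct

bias, cv, tea = 1.0, 1.5, 6.0   # illustrative values, not from either paper
print(f"TAE   = {total_analytical_error(bias, cv):.1f}%")   # 4.0%
print(f"Sigma = {sigma_metric(tea, bias, cv):.1f}")          # 3.3
```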

The linearity study says it followed CLSI EP6, but there are no results to show this (e.g., no reported higher-order polynomial regressions). The graphs shown do look linear.

But the interference studies are more troubling. From what I can make of it, the target values are given ±10% bands, and any candidate interfering substance whose data do not fall outside these bands is said not to interfere clinically (i.e., the bias is less than 10% in absolute value). But that does not mean there is no bias! To see how silly this is, one could just as well say that if the average bias from regression were less than 10% in absolute value, it should be set to zero since there was no clinical interference.
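Here is a small sketch of the ±10% band check as I understand it, with invented numbers: a hypothetical interferent that shifts every result by 7% stays inside the bands and so would be declared “no clinical interference,” yet the bias is plainly not zero.

```python
# Sketch of the +/-10% band check (numbers invented): a 7% proportional shift
# stays inside the bands and "passes", yet the bias is real.

target_values = [5.0, 7.0, 9.0]   # hypothetical HbA1c targets (%)
shift = 0.07                      # hypothetical 7% interference bias

for target in target_values:
    observed = target * (1 + shift)
    lower, upper = 0.9 * target, 1.1 * target   # the +/-10% acceptance band
    bias_pct = 100 * (observed - target) / target
    print(f"target {target:.1f}: observed {observed:.2f}, "
          f"inside band = {lower <= observed <= upper}, bias = {bias_pct:.0f}%")
```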

The real problem is that the authors’ chosen TAE model cannot account for interferences – such biases are not in their model. But interference biases still contribute to TAE! And what do the reported values of six sigma mean? They are valid only for samples containing no interfering substances. That’s neither practical nor meaningful.

Now one could better model things by adding an interference term to TAE and simulating various patient populations as a function of interfering substances (including the occurrence of multiple interfering substances). But Sigma Metrics, to my knowledge, cannot do this.
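As a rough sketch of what I mean – the prevalences, effect sizes, and goal below are all invented for illustration – one could simulate a patient population in which interference biases, possibly more than one per sample, add to the usual bias and imprecision, and then count how often results exceed TEa:

```python
# Rough sketch of adding an interference term to the error model and simulating
# a patient population. Prevalences, effect sizes, and the goal are invented.
import random

random.seed(1)

BIAS_PCT = 1.0   # average calibration bias (%)
CV_PCT = 1.5     # imprecision (%)
TEA_PCT = 6.0    # allowable total error (%)

# hypothetical interferents: (prevalence in the population, bias contribution %)
INTERFERENTS = [(0.05, 4.0), (0.02, -5.0), (0.01, 8.0)]

def one_sample_error():
    """Total error (%) for one patient sample: bias + interference(s) + noise."""
    error = BIAS_PCT + random.gauss(0.0, CV_PCT)
    for prevalence, effect in INTERFERENTS:
        if random.random() < prevalence:   # multiple interferents can co-occur
            error += effect
    return error

n = 100_000
outside = sum(abs(one_sample_error()) > TEA_PCT for _ in range(n))
print(f"results outside TEa: {outside / n:.2%}")
```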

Another comment: although HbA1c is not glucose, the subject matter is diabetes, and in the glucose meter world error grids are a well known way to evaluate required clinical performance. But the term “error grid” does not appear in either paper.

Error grids account for the entire range of the assay. It seems that Sigma Metrics are chosen to apply at only one point in the assay.


Antwerp talk about total error

March 12, 2017

Looking at my blog stats, I see that a lot of people are reading the total analytical error vs. total error post. So, below are the slides from a talk that I gave at a conference in Antwerp in 2016 called The “total” in total error. The slides have been updated. Because they were written for a talk, the slides on their own are not as effective as the talk itself.

TotalError


Help with sigma metric analysis

January 27, 2017


I’ve been interested in glucose meter specifications and evaluations. There are three glucose meter specifications sources:

FDA glucose meter guidance
ISO 15197:2013
glucose meter error grids

There are various ways to evaluate glucose meter performance. What I wished to look at was the combination of sigma metric analysis and the error grid. I found this article about sigma metric analysis and glucose meters.

After looking at this, I understand how to construct these so-called method decision charts (MEDX). But here’s my problem: in these charts, the total allowable error TEa is a constant, which is not the case for the TEa of an error grid. There, TEa changes with the glucose concentration. Moreover, it is not even the same at a specific glucose concentration, because the “A” zone limits of an error grid (I’m using the Parkes error grid) are not symmetrical.

I have simulated data with a fixed bias and constant CV throughout the glucose meter range. But with a changing TEa, the estimated sigma also changes with glucose concentration.
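Here is a sketch of the issue. The A zone limit function below is a placeholder, not the actual Parkes boundaries; the point is only that when TEa depends on concentration and direction, so does sigma, even with a fixed bias and constant CV.

```python
# Sketch of the problem: fixed bias and constant CV, but an error-grid TEa that
# depends on concentration and direction. The limits below are placeholders,
# NOT the actual Parkes "A" zone boundaries.

BIAS_PCT = 2.0   # fixed bias (%)
CV_PCT = 5.0     # constant CV (%)

def a_zone_limits_pct(glucose_mg_dl):
    """Hypothetical asymmetric allowable error (%): (low side, high side)."""
    if glucose_mg_dl < 75:
        return 20.0, 25.0   # placeholder numbers
    return 15.0, 18.0       # placeholder numbers

for glucose in (50, 75, 120, 250, 400):
    tea_low, tea_high = a_zone_limits_pct(glucose)
    sigma_low = (tea_low - abs(BIAS_PCT)) / CV_PCT
    sigma_high = (tea_high - abs(BIAS_PCT)) / CV_PCT
    print(f"{glucose:3d} mg/dL: sigma = {sigma_low:.1f} (low side), "
          f"{sigma_high:.1f} (high side)")
```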

So I’m not sure how to proceed.


The problem with the FDA standard explained

October 25, 2016


The previous blog entry criticized the updated FDA POCT glucose meter performance standard, which now allows 2% of the results to be unspecified.

What follows is an explanation of why this is wrong. My logic applies to:

  1. Total error performance standards which state that 95% (or 99%) of results should be within stated limits
  2. Measurement uncertainty performance standards which state that 95% (or 99%) of results should be within stated limits
  3. The above FDA standard which states that 98% of results should be within stated limits

One argument that surfaces for allowing results to be unspecified is that one cannot prove that 100% of results are within limits. This is of course true. But here’s the problem with using that fact to allow unspecified results.

Consider a glucose meter example with truth = 30 mg/dL. Assume the meter has a 5% CV and that the precision results are normally distributed. One can then calculate glucose meter errors at various SD multiples, note where they fall in the Parkes error grid, and note how often an error of that size would occur due to precision alone.

Truth   SD multiple   Observed glucose   Parkes grid   Occurs 1 in
 30          2              33             A zone          20
 30          3              34.5           A zone          370
 30          8              42             A zone          7E+14
 30         22              63             C zone          1E+106


(To get an error in the E zone, an extremely dangerous result, would require 90 multiples of the standard deviation; Excel refuses to tell me how rare this is.) I think it’s clear that leaving a portion of the results unspecified is not justified by worrying about precision and/or the normal distribution.
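For anyone who wants to reproduce the “occurs 1 in” column, here is a short sketch using the two-sided normal tail probability; it roughly matches the numbers in the table and shows that 90 SD underflows entirely.

```python
# Sketch of where the "occurs 1 in" column comes from: the two-sided tail
# probability of a normal distribution at k standard deviations.
import math

TRUTH = 30.0       # mg/dL
SD = 0.05 * TRUTH  # 5% CV -> 1.5 mg/dL

for k in (2, 3, 8, 22, 90):
    observed = TRUTH + k * SD
    tail = math.erfc(k / math.sqrt(2))   # P(|error| > k SD)
    odds = f"1 in {1 / tail:.3g}" if tail > 0 else "underflows (effectively never)"
    print(f"{k:3d} SD -> {observed:6.1f} mg/dL, occurs {odds}")
```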

Now errors in higher zones of the Parkes error grid do occur, including E zone errors, and clearly this has nothing to do with precision. These errors have other causes, such as interferences.

A better way to think of these errors is as “attribute” errors – they either occur or they don’t. For more on this, see: Krouwer JS. Recommendation to treat continuous variable errors like attribute errors. Clinical Chemistry and Laboratory Medicine 2006;44(7):797–798.

Note that one cannot prove that attribute errors won’t occur. But outside of clinical chemistry standards committees, no one allows results to be unspecified in this way. For example, you don’t hear “we want 98% of surgeries to be performed on the correct organ on the correct patient.”


MU vs TE vs EG

July 29, 2016


Picture is aerial view from a Cirrus of Foxwoods casino in CT

MU = measurement uncertainty, TE = total error, EG = error grid

I have been looking at a blog entry by the Westgards, which is always interesting reading. Here are my thoughts.

To recall, MU is a “bottom-up” way to model error in a clinical chemistry assay (TE uses a “top-down” model) and EG has no model at all.

MU is a bad idea for clinical chemistry – here are the problems with MU:

  1. Unless things have changed, MU doesn’t allow for bias in its modeling process. If a bias is found, it must be eliminated. Yet in the real world, there are many uncorrected biases in assays (calibration bias, interferences). (See the sketch after this list – there is no bias term in the combination.)
  2. The modeling required by MU is not practical for a typical clinical chemistry lab. One can view the modeling as having two major components: the biological equations that govern the assay (e.g., Michaelis Menten kinetics) and the instrumentation (e.g., the properties of the syringe that picks up the sample). Whereas clinical chemists may know the biological equations, they won’t have access to the manufacturer’s instrumentation data.
  3. The math required to perform the analysis is extremely complicated.
  4. Some of the errors that occur cannot be modeled (e.g., user errors, manufacturing mistakes, software errors).
  5. The MU result is typically reported as the location of 95% of the results. But one needs to account for 100% of the results.
  6. So some people get the SD for a bunch of controls and call this MU – a joke.
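For reference, here is a bare-bones sketch of the bottom-up combination, with invented component uncertainties: standard uncertainties added in quadrature and expanded with k = 2 for roughly 95% coverage. Note that there is no term for an uncorrected bias, and nothing here captures user errors, manufacturing mistakes, or software errors.

```python
# Bare-bones sketch of a bottom-up (GUM-style) combination: component standard
# uncertainties (invented here) added in quadrature, then expanded with k = 2
# for roughly 95% coverage. There is no term for an uncorrected bias.
import math

components_mg_dl = {       # hypothetical standard uncertainties
    "sample volume": 0.8,
    "calibrator value": 1.2,
    "reagent lot": 0.9,
    "repeatability": 1.5,
}

u_combined = math.sqrt(sum(u ** 2 for u in components_mg_dl.values()))
U_expanded = 2 * u_combined   # coverage factor k = 2

print(f"combined standard uncertainty: {u_combined:.2f} mg/dL")
print(f"expanded uncertainty (k = 2):  {U_expanded:.2f} mg/dL")
```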

TE has been much more useful than MU, but still has problems:

  1. The Westgard model for TE doesn’t account for some important errors, such as patient interferences.
  2. Other errors that occur (e.g., user errors, manufacturing mistakes, software errors) may be captured by TE, but the potential for these errors is often excluded from experiments (e.g., users in these experiments are often more highly trained than typical users).
  3. Although both MU and TE rely on experimental data, TE relies solely on an experiment (method comparison or quality control). There are likely to be biases in the experiment which will cause TE to be underestimated. (See #2).
  4. The TE result is typically reported as the location of 95% of the results. But one needs to account for 100% of the results.
  5. TE is often overstated, e.g., the sigma value is said to provide a specific (numeric) quality for patient results. But this is untrue, since TE underestimates the true total error.
  6. TE fails to account for the importance of bias. That is, results can be within TE goals but still cause harm due to bias. Both Klee and I have shown this. For example, bias in a glucose meter can cause diabetic complications while results are still within TE goals (a numeric sketch follows this list).
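The sketch promised in #6, with invented numbers: a meter with a 10% bias and 2% CV meets a 15% TEa goal, yet every result is shifted upward by about 10%.

```python
# Invented numbers: a meter with 10% bias and 2% CV meets a 15% TEa goal,
# yet every result is shifted upward by about 10%.
TEA_PCT, BIAS_PCT, CV_PCT = 15.0, 10.0, 2.0

tae = abs(BIAS_PCT) + 2 * CV_PCT
print(f"TAE = {tae:.0f}% vs goal {TEA_PCT:.0f}% -> passes: {tae <= TEA_PCT}")

truth = 100.0   # mg/dL
reads = truth * (1 + BIAS_PCT / 100)
print(f"a true {truth:.0f} mg/dL reads about {reads:.0f} mg/dL, every time")
```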

I favor error grids.

EG

  1. Error grids still have the problem that they rely on experimental data and hence there may be bias in the studies.
  2. But 100% of the results are accounted for.
  3. There is the notion of increasing patient harm in EG. With either MU or TE, there is only the concept of harm vs no harm. This is not the real world. A glucose meter result of 95 mg/dL (truth=160 mg/dL) has much less harm than a glucose meter result of 350 mg/dl (truth=45 mg/dL).
  4. EG simply plots test vs. reference. There is no model (but then there is no way to tell the origin of an error). A small sketch of the zone idea follows this list.
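The sketch mentioned in #4: each (reference, test) pair is assigned to a zone of increasing patient harm. The percentage bands below are illustrative only – real grids such as the Parkes grid use piecewise boundaries, not simple symmetric bands.

```python
# Illustrative only: assign each (reference, test) pair to a harm zone by
# relative error. Real grids (Parkes, Clarke) use piecewise boundaries,
# not the simple symmetric bands used here.

def zone(reference_mg_dl, test_mg_dl):
    """Return an illustrative harm zone for a result pair."""
    rel_err = abs(test_mg_dl - reference_mg_dl) / reference_mg_dl
    if rel_err <= 0.15:
        return "A (no harm)"
    if rel_err <= 0.30:
        return "B (little or no harm)"
    if rel_err <= 0.50:
        return "C (moderate harm)"
    return "D/E (serious harm)"

for ref, test in [(160, 95), (45, 350), (100, 108), (250, 210)]:
    print(f"reference {ref:3d}, test {test:3d} -> zone {zone(ref, test)}")
```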

Published – my one-man Milan Conference

March 23, 2016


Having read the consensus statement and all the papers from the Milan conference (available without subscription), I prepared my version of this for the Antwerp conference. This talk contained the following:

  • A description of why the Westgard model for total error is incomplete (with of course Jim Westgard sitting in the audience)
  • A description of why expanded total error models are nevertheless also incomplete
  • A critique of Boyd and Bruns’ glucose meter performance simulations using the Westgard model
  • A critique of the ISO and CLSI glucose meter specifications, both based on total error
  • A description of what the companies with most of the market share in glucose meters did, when they started to lose market share
  • How Ciba Corning specified and evaluated performance
  • What I currently recommend

I submitted a written version of this talk to Clin Chem and Lab Medicine, with recommended reviewers being Milan authors with whom I disagreed. (The journal asks authors to recommend reviewers). Now I don’t know who the reviewers were, but suffice it to say that they didn’t like my paper at all. So after several revisions, I scaled back my paper to its current version, which is here (subscription required).


Hemoglobin A1c quality targets

March 16, 2015


There is a new article in Clinical Chemistry about a complicated (to me) analysis of quality targets for A1c, when it would seem that a simple error grid – prepared by surveying clinicians – would fit the bill.

This paper has problems. They are:

  1. The total error model is limited to average bias and imprecision. Error from interferences, user error, or other sources is not included. It is unfortunate to call this “total” error, since there is nothing total about it.
  2. A pass/fail system is mentioned, which is dichotomous, unlike an error grid, which allows for varying degrees of error with respect to severity of harm to patients.
  3. A hierarchy of possible goals is mentioned. This comes from a 1999 conference. But there is really only one way to set patient goals (listed near the top of the 1999 conference): namely, a survey of clinician opinions.
  4. Discussed in the Clinical Chemistry paper is the use of biological variation based goals for quality targets. Someone needs to explain to me how this could ever be useful.
  5. The analysis is based on proficiency survey materials, which, due to the absence of patient interferences (see #1), capture only a subset of total error.
  6. From what I could tell from their NICE reference (#11) in the paper, the authors have inferred that total allowable error should be 0.46%, but this did not come from surveying clinicians.
  7. I’m on board with six sigma in its original use at Motorola. But I don’t see its usefulness in laboratory medicine compared to an error grid.