My article “Interferences, a neglected error source for clinical assays” has been published. It may be viewed at the following link: https://rdcu.be/L6O2
There is a recent article which says that measurement uncertainty should contain a term for biological variation. The rationale is that diagnostic uncertainty is caused in part by biological variation. My concerns are with how biological variation is turned into goals.
On the Westgard web site, there are some formulas on how to convert biological variation into goals and on another page, there is a list of analytes with biological variation entries and total error goals.
Here are my concerns:
- There are three basic uses of diagnostic tests: screening, diagnosis, and monitoring. It is not clear to me what the goals refer to.
- Monitoring is an important use of diagnostic tests. It makes no sense to construct a total error goal for monitoring that takes between-patient biological variation into account. The PSA total error goal is listed at 33.7%. Example: for a patient tested every 3 months after undergoing radiation therapy, a total error goal of 33.7% is too big. Thus, for successive values of 1.03, 0.94, 1.02, and 1.33, the last value is within goals but in reality would be cause for alarm.
- The web site listing goals has only one goal per assay. Yet goals often depend on the analyte value, especially for monitoring. For example, the glucose goal is listed at 6.96%. But if one examines a Parkes glucose meter error grid, at 200 mg/dL, the error goal to separate harm from no harm is 25%. Hence, the biological goal is too small.
- The formulas on the web site are hard to believe. For example, I < 0.5 * within-person biological variation, where I is the imprecision. Why 0.5, and why is it the same for all analytes?
- Biological variation can be thought to have two sources of variation – explained and unexplained – much like in a previous entry where the measured imprecision could be not just random error, but inflated with biases. Thus, PSA could rise due to asymptomatic prostatitis (a condition that by definition has no symptoms and could be present in a “healthy” cohort). Have explained sources of variation been excluded from the databases? And there can be causes of explained variation other than diseases. For example, exercise can cause PSA to rise in an otherwise healthy person.
- Biological variation makes no sense for a bunch of analytes. For example, blood lead measures exposure to lead. Without lead in the environment, the blood lead would be zero. Similar arguments apply to drugs of abuse and infectious diseases.
- The goals are based on 95% limits from a normal distribution. This leaves up to 5% of results as unspecified. Putting things another way, up to 5% of results could cause serious problems for an assay that meets goals.
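The PSA monitoring example in the list above can be checked with a few lines of arithmetic. This is a minimal sketch: the four PSA values and the 33.7% goal come from the text, while taking the mean of the earlier values as the patient's baseline is an assumption made here for illustration.

```python
# Total error goal for PSA quoted in the text (percent).
PSA_TE_GOAL_PCT = 33.7

def percent_change_from_baseline(values):
    """Percent difference of the last value from the mean of the earlier ones.

    Using the mean of the earlier values as the baseline is an assumption.
    """
    *earlier, last = values
    baseline = sum(earlier) / len(earlier)
    return 100.0 * (last - baseline) / baseline

# Serial PSA values from the text, one every 3 months.
psa_series = [1.03, 0.94, 1.02, 1.33]

change = percent_change_from_baseline(psa_series)
within_goal = abs(change) <= PSA_TE_GOAL_PCT
print(f"change from baseline: {change:.1f}%  within goal: {within_goal}")
```

The last value sits roughly a third above the earlier results yet still falls inside the 33.7% goal, which is the sense in which the goal is too wide for monitoring.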
There has been some recent discussion about the differences between total error and measurement uncertainty, regarding which is better and which should be used. Rather than rehash the differences, let’s examine some similarities:
1. Both specifications are probability based.
2. Both are models.
Being probability based is the bigger problem. If you specify limits for a high percentage of results (say 95% or 99%), then either 5% or 1% of results are unspecified. If all of the unspecified results caused problems this would be a disaster, when one considers how many tests are performed in a lab. There are instances of medical errors due to lab test error but these are (probably?) rare (meaning much less than 5% or 1%). But the point is probability based specifications cannot account for 100% of the results because the limits would include minus infinity to plus infinity.
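To put rough numbers on this, here is a small sketch of how many results a coverage-based specification leaves unspecified, and how far the limits must move as the coverage approaches 100%. The annual volume of one million results is a hypothetical figure, and the limit multiplier assumes normally distributed errors.

```python
from statistics import NormalDist

# Hypothetical annual test volume for one lab (an assumption for illustration).
ANNUAL_RESULTS = 1_000_000

def expected_unspecified(n_results, coverage):
    """Number of results a coverage-based specification says nothing about."""
    return n_results * (1.0 - coverage)

def limit_multiplier(coverage):
    """z such that +/- z standard deviations hold `coverage` of a normal distribution."""
    return NormalDist().inv_cdf(0.5 + coverage / 2.0)

for coverage in (0.95, 0.99, 0.999999):
    n_out = expected_unspecified(ANNUAL_RESULTS, coverage)
    z = limit_multiplier(coverage)
    print(f"coverage {coverage}: about {n_out:,.0f} unspecified results, "
          f"limits at +/-{z:.2f} SD")
```

The multiplier keeps growing as the coverage is pushed toward 100%, which is why no finite probability-based limit can cover every result.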
The fact that both total error and measurement uncertainty are models is only a problem because the models are incorrect. Rather than rehash why, here’s a simple solution to both problems.
Add to the specification (either total error or measurement uncertainty) the requirement that zero results are allowed beyond a set of limits. To clarify, there are two sets of limits: an inner set to contain 95% or 99% of results, and an outer set that no results should exceed.
Without this addition, one cannot claim that meeting either a total error or measurement uncertainty specification will guarantee quality of results, where quality means that the lab result will not lead to a medical error.
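A two-tier specification of this kind could be checked along the following lines. This is only a sketch: the function name, the limit values, and the example data are hypothetical, and errors are taken to be differences from reference in whatever units the limits use.

```python
def meets_two_tier_spec(errors, inner_limit, outer_limit, inner_coverage=0.95):
    """Check errors against an inner coverage limit and an outer zero-tolerance limit."""
    n = len(errors)
    n_inner = sum(1 for e in errors if abs(e) <= inner_limit)
    n_beyond_outer = sum(1 for e in errors if abs(e) > outer_limit)
    inner_fraction = n_inner / n
    return {
        "inner_fraction": inner_fraction,
        "beyond_outer": n_beyond_outer,
        # Pass only if enough results are inside the inner limits
        # AND not a single result lies beyond the outer limits.
        "passes": inner_fraction >= inner_coverage and n_beyond_outer == 0,
    }

# Hypothetical percent errors: 19 small ones plus one larger error.
errors_ok = [2.0] * 19 + [12.0]    # largest error stays inside the outer limits
errors_bad = [2.0] * 19 + [45.0]   # one result beyond the outer limits

result_ok = meets_two_tier_spec(errors_ok, inner_limit=10.0, outer_limit=30.0)
result_bad = meets_two_tier_spec(errors_bad, inner_limit=10.0, outer_limit=30.0)
print(result_ok)
print(result_bad)
```

Both data sets have 95% of results inside the inner limits, so a purely probability-based specification would accept either one. Only the outer limit separates them.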
So here are some problems with all of this.
The CC paper says that TAE (which they use) is derived from bias and imprecision. Now I have many blog entries as well as peer reviewed publications going back to 1991 saying that this approach is flawed. That the authors chose to ignore this prior work doesn’t mean the prior work doesn’t exist – it does – or that it is somehow not relevant – it is.
In the CC paper, controls were used to arrive at conclusions. But real data involves patient samples so the conclusions are not necessarily transferable. And in the CCLM paper, patient samples are used without any mention as to whether the CC paper conclusions still apply.
In the CCLM paper, precision studies, a method comparison, linearity, and interferences were carried out. This is hard to understand since the TAE model of (absolute) average bias + 2x imprecision does not account for either linearity or interference studies.
The linearity study says it followed CLSI EP6 but there are no results to show this (e.g., no reported higher order polynomial regressions). The graphs shown do look linear.
But the interference studies are more troubling. From what I can make of it, the target values are given ± 10% bands and any candidate interfering substance whose data does not fall outside of these bands is said to not clinically interfere (e.g., the bias is less than absolute 10%). But that does not mean there is no bias! To see how silly this is, one could say if the average bias from regression was less than absolute 10%, it should be set to zero since there was no clinical interference.
The real problem is that the authors’ chosen TAE model cannot account for interferences – such biases are not in their model. But interference biases still contribute to TAE! And what do the reported values of six sigma mean? They are valid only for samples containing no interfering substances. That’s neither practical nor meaningful.
Now one could better model things by adding an interference term to TAE and simulating various patient populations as a function of interfering substances (including the occurrence of multiple interfering substances). But Sigma Metrics, to my knowledge, cannot do this.
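As a rough illustration of such a simulation, the sketch below compares the modeled TAE of (absolute) average bias + 2x imprecision with the fraction of simulated results that actually exceed it once some samples carry an interfering substance. Every number here (bias, CV, TEa, interferent prevalence, size of the interference bias) is invented for illustration. Normal imprecision and a single additive interferent are also assumptions, not something taken from either paper.

```python
import random

# Hypothetical assay characteristics, all in percent (assumptions for illustration).
BIAS_PCT = 1.0
CV_PCT = 2.0
TEA_PCT = 6.0
INTERFERENT_PREVALENCE = 0.05   # fraction of samples containing the interferent
INTERFERENCE_BIAS_PCT = 8.0     # extra bias in those samples

def modeled_tae(bias_pct, cv_pct):
    """TAE model described above: |average bias| + 2 x imprecision."""
    return abs(bias_pct) + 2.0 * cv_pct

def simulate_errors(n, bias_pct, cv_pct, prevalence, interference_bias_pct, seed=0):
    """Percent errors for n samples; a random subset gets an added interference bias."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        error = bias_pct + rng.gauss(0.0, cv_pct)
        if rng.random() < prevalence:
            error += interference_bias_pct
        errors.append(error)
    return errors

def fraction_outside(errors, limit_pct):
    """Fraction of errors whose absolute value exceeds the limit."""
    return sum(1 for e in errors if abs(e) > limit_pct) / len(errors)

tae = modeled_tae(BIAS_PCT, CV_PCT)
sigma = (TEA_PCT - abs(BIAS_PCT)) / CV_PCT   # usual sigma metric formula

errors_clean = simulate_errors(50_000, BIAS_PCT, CV_PCT, 0.0, INTERFERENCE_BIAS_PCT)
errors_mixed = simulate_errors(50_000, BIAS_PCT, CV_PCT,
                               INTERFERENT_PREVALENCE, INTERFERENCE_BIAS_PCT)

print(f"modeled TAE: {tae:.1f}%  sigma: {sigma:.1f}")
print(f"outside modeled TAE, no interferents:   {fraction_outside(errors_clean, tae):.3f}")
print(f"outside modeled TAE, with interferents: {fraction_outside(errors_mixed, tae):.3f}")
```

The modeled TAE and the sigma value are identical in both runs because neither formula contains an interference term. Only the simulated error distribution changes.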
Another comment is that whereas HbA1c is not glucose, the subject matter is diabetes and in the glucose meter world, error grids are well known as a way to evaluate required clinical performance. But the term “error grid” does not appear in either paper.
Error grids account for the entire range of the assay. It seems that Sigma Metrics are chosen to apply at only one point in the assay.
Although the blog has an eclectic range of topics, one unifying theme for many entries is specifications, how to set them and how to evaluate them.
A few years ago, I was working on a hematology analyzer, which has a multitude of reported parameters. The company was evaluating parameters with the usual means of precision studies and accuracy using regression. I asked them:
- a) what are the limits that, when differences from reference are contained within these limits, will ensure that no wrong medical decisions would be made based on the reported result (resulting in patient harm) and
- b) what are the (wider) limits that, when differences from reference are contained within these limits, will ensure that no wrong medical decisions would be made based on the reported result (resulting in severe patient harm)
This was a way of asking for an error grid for each parameter. I believe, then and now, that constructing an error grid is the best way to set specifications for any assay.
As an example of the importance of specifications, there was a case for which I was an expert witness in which the lab had produced an incorrect result that led to patient harm. The lab’s defense was that they had followed all procedures. Thus, as long as they followed procedures, they were not to blame. But procedures, which contain specifications, are not always adequate. As an example, remember the CMS program “equivalent quality control”?
Looking at my blog stats, I see that a lot of people are reading the total analytical error vs. total error post. So, below are the slides from a talk that I gave at a conference in Antwerp in 2016 called The “total” in total error. The slides have been updated. Because it is a talk, the slides are not as effective as the talk.
I’ve been interested in glucose meter specifications and evaluations. There are three glucose meter specifications sources:
FDA glucose meter guidance
glucose meter error grids
There are various ways to evaluate glucose meter performance. What I wished to look at was the combination of sigma metric analysis and the error grid. I found this article about the sigma metric analysis and glucose meters.
After looking at this, I understand how to construct these so-called method decision charts (MEDX). But here’s my problem. In these charts, the total allowable error TEa is a constant – this is not the case for TEa for error grids. The TEa changes with the glucose concentration. Moreover, it is not even the same at a specific glucose concentration because the “A” zone limits of an error grid (I’m using the Parkes error grid) are not symmetrical.
I have simulated data with a fixed bias and constant CV throughout the glucose meter range. But with a changing TEa, the estimated sigma also changes with glucose concentration.
So I’m not sure how to proceed.
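The difficulty can be seen in a small sketch. With a fixed bias and a constant CV, the usual sigma formula, (TEa - |bias|) / CV, gives a different answer at every concentration once TEa varies. The limits in the table below are placeholders invented for illustration and are not coordinates from the Parkes error grid. Treating asymmetric limits by computing a sigma toward each limit and keeping the smaller one is likewise an assumption, one possible convention rather than an established method.

```python
# Hypothetical "A" zone limits in percent, as (lower, upper), keyed by glucose
# in mg/dL. Placeholders for illustration only; NOT taken from the Parkes grid.
HYPOTHETICAL_TEA_PCT = {
    50: (40.0, 60.0),
    100: (25.0, 35.0),
    200: (20.0, 25.0),
    400: (18.0, 22.0),
}

def sigma_by_concentration(bias_pct, cv_pct, tea_table):
    """Sigma at each concentration for a fixed bias and constant CV.

    For asymmetric limits, sigma is computed toward each limit and the
    smaller (worse) value is kept. This convention is an assumption.
    """
    sigmas = {}
    for glucose, (lower_pct, upper_pct) in tea_table.items():
        sigma_upper = (upper_pct - bias_pct) / cv_pct   # room above the biased mean
        sigma_lower = (lower_pct + bias_pct) / cv_pct   # room below the biased mean
        sigmas[glucose] = min(sigma_lower, sigma_upper)
    return sigmas

# Fixed positive bias of 5% and constant CV of 5% (hypothetical values).
sigmas = sigma_by_concentration(bias_pct=5.0, cv_pct=5.0, tea_table=HYPOTHETICAL_TEA_PCT)
for glucose, sigma in sigmas.items():
    print(f"{glucose} mg/dL: sigma = {sigma:.1f}")
```

A single sigma value, or a single point on a method decision chart, therefore cannot summarize the meter unless one also states the concentration and which limit it refers to.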