Total error, EP21, and vindication

February 18, 2018

To recall, total analytical error was proposed by Westgard in 1974. The concept made a lot of sense to me, and I proposed to CLSI that a total analytical error standard be written. The proposal was approved, I formed and chaired a subcommittee, and in 2003 the CLSI standard EP21-A, which covers total analytical error, was published.

When it was time to revise the standard (all standards are periodically considered for revision), I realized that it had some flaws. Although the original Westgard article was specific to total analytical error, it seemed that to a clinician, any error that contributed to the final result was important regardless of its source. And because I often worked on blood gas evaluations, user error was, for me, an important contributor to total error.

Hence, I suggested that the revision be about total error, not total analytical error, and the EP21-A2 drafts had total error in the title. Some people within the subcommittee, and particularly one or two people not on the subcommittee but in CLSI management, hated the idea, threw me off my own subcommittee, and ultimately out of CLSI.

But recently (in 2018) a total error task force published an article which contained the statement, to which I have previously referred:

“Lately, efforts have been made to expand the TAE concept to the evaluation of results of patient samples, including all phases of the total testing process.” (I put in the bolding).

Hence, I’m hoping that the next revision, EP21-A3 will be about total error, not total analytical error.


An observation from the ATTD glucose Conference

February 14, 2018

The 11th International Conference on Advanced Technologies and Treatments for Diabetes (ATTD) is underway in Vienna, Austria. The abstracts from the conference are available here. Here’s an interesting observation: I searched for the term MARD and it was found 48 times whereas the term error grid was found only 10 times. I published a paper describing problems with the MARD statistic and offered alternatives.
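For readers who have not run into the statistic, here is a minimal sketch of how MARD (mean absolute relative difference) is typically computed for a glucose meter evaluation; the paired values are made up for illustration, and details such as the choice of denominator vary between studies.

```python
# Minimal sketch of a MARD calculation; all numbers are hypothetical.

def mard(meter, reference):
    """Mean absolute relative difference, expressed as a percent."""
    diffs = [abs(m - r) / r for m, r in zip(meter, reference)]
    return 100.0 * sum(diffs) / len(diffs)

# Hypothetical paired results (mg/dL): meter reading vs. laboratory reference.
meter_values = [102, 95, 250, 61, 180]
reference_values = [100, 90, 240, 70, 175]

print(f"MARD = {mard(meter_values, reference_values):.1f}%")
```

Note that MARD averages over all points, so an occasional large error can be hidden by many small ones, whereas an error grid classifies each result individually.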


Comments about clinical chemistry goals based on biological variation – Revised Feb. 7, 2018

February 5, 2018

There is a recent article which says that measurement uncertainty should contain a term for biological variation. The rationale is that diagnostic uncertainty is caused in part by biological variation. My concerns are with how biological variation is turned into goals.

On the Westgard web site, there are some formulas on how to convert biological variation into goals and on another page, there is a list of analytes with biological variation entries and total error goals.

Here are my concerns:

  1. There are three basic uses of diagnostic tests: screening, diagnosis, and monitoring. It is not clear to me which of these uses the goals refer to.
  2. Monitoring is an important use of diagnostic tests, and it makes no sense to construct a total error goal for monitoring that takes between-patient biological variation into account. The PSA total error goal is listed at 33.7%. Example: for a patient tested every 3 months after undergoing radiation therapy, a total error goal of 33.7% is too big. For serial values of 1.03, 0.94, 1.02, and 1.33, the last value is a roughly 30% rise, which is within the goal but in reality would be cause for alarm (a sketch working through this example appears after this list).
  3. The web site listing goals has only one goal per assay. Yet goals often depend on the analyte value, especially for monitoring. For example, the glucose goal is listed at 6.96%. But if one examines a Parkes glucose meter error grid, at 200 mg/dL the error goal that separates harm from no harm is 25%. Hence, the biological variation goal is too small.
  4. The formulas on the web site are hard to believe. For example, the imprecision goal is I < 0.5 × within-person biological variation. Why 0.5, and why is it the same for all analytes?
  5. Biological variation can be thought of as having two sources of variation, explained and unexplained, much like a previous entry in which measured imprecision could be not just random error but random error inflated with biases. Thus, PSA could rise due to asymptomatic prostatitis (a condition that by definition has no symptoms and so could be part of a “healthy” cohort). Have explained sources of variation been excluded from the databases? And there can be causes of explained variation other than diseases. For example, exercise can cause PSA to rise in an otherwise healthy person.
  6. Biological variation makes no sense for a bunch of analytes. For example, blood lead measures exposure to lead. Without lead in the environment, the blood lead would be zero. Similar arguments apply to drugs of abuse and infectious diseases.
  7. The goals are based on 95% limits from a normal distribution. This leaves up to 5% of results as unspecified. Putting things another way, up to 5% of results could cause serious problems for an assay that meets goals.
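To make items 2 and 4 concrete, here is a small sketch. The coefficients are the commonly cited Westgard-style “desirable” formulas, and the within- and between-person CVs for PSA are illustrative values chosen to reproduce a total error goal close to the 33.7% quoted above; treat both as assumptions rather than authoritative figures.

```python
import math

# Commonly cited Westgard-style "desirable" specifications derived from
# biological variation. cvi = within-person CV, cvg = between-person CV (percent).
# The 0.5, 0.25, and 1.65 coefficients are the ones questioned in item 4.

def imprecision_goal(cvi):
    return 0.5 * cvi

def bias_goal(cvi, cvg):
    return 0.25 * math.sqrt(cvi ** 2 + cvg ** 2)

def total_error_goal(cvi, cvg):
    return 1.65 * imprecision_goal(cvi) + bias_goal(cvi, cvg)

# Illustrative PSA biological variation values (assumed, roughly database-like).
cvi_psa, cvg_psa = 18.1, 72.4
te_goal = total_error_goal(cvi_psa, cvg_psa)
print(f"PSA total error goal ~ {te_goal:.1f}%")  # about 33.6%, near the quoted 33.7%

# Monitoring example from item 2: serial PSA results (ng/mL) after radiation therapy.
series = [1.03, 0.94, 1.02, 1.33]
rise = 100.0 * (series[-1] - series[-2]) / series[-2]
print(f"Last result rose {rise:.1f}% from the previous one")  # about 30%
print("within goal" if rise < te_goal else "outside goal")
# A ~30% rise sits inside the 33.7% goal, yet for this patient it is cause for alarm.
```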

A simple improvement to total error and measurement uncertainty

January 15, 2018

There has been some recent discussion about the differences between total error and measurement uncertainty, regarding which is better and which should be used. Rather than rehash the differences, let’s examine some similarities:

  1. Both specifications are probability based.
  2. Both are models.

Being probability based is the bigger problem. If you specify limits for a high percentage of results (say 95% or 99%), then 5% or 1% of results are unspecified. If all of the unspecified results caused problems, this would be a disaster given how many tests a lab performs: at 5%, a lab running a million tests a year would leave 50,000 results unaccounted for. There are instances of medical errors due to lab test error, but these are (probably?) rare, meaning much less than 5% or 1%. The point is that probability-based specifications cannot account for 100% of results, because those limits would run from minus infinity to plus infinity.

The fact that both total error and measurement uncertainty are models is only a problem because the models are incorrect. Rather than rehash why, here’s a simple solution to both problems.

Add to the specification (either total error or measurement uncertainty) the requirement that zero results are allowed beyond a second set of limits. To clarify, there are two sets of limits: an inner set that should contain 95% or 99% of results, and an outer set that no result should exceed.
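Here is a minimal sketch of what checking results against such a two-limit specification might look like; the limits and differences are hypothetical, and the point is that a data set can satisfy the 95% (inner) criterion while still containing a result beyond the outer, never-exceed limit.

```python
# Hypothetical two-limit check: an inner limit that 95% of differences must
# satisfy, plus an outer limit that no difference may exceed.

def meets_specification(differences, inner_limit, outer_limit, inner_fraction=0.95):
    """differences: observed errors (candidate minus comparison), in the units of the limits."""
    within_inner = sum(abs(d) <= inner_limit for d in differences) / len(differences)
    none_beyond_outer = all(abs(d) <= outer_limit for d in differences)
    return within_inner >= inner_fraction and none_beyond_outer

# 20 hypothetical glucose differences (mg/dL): 19 of 20 (95%) are within the
# inner limit of 15, but one difference of 38 exceeds the outer limit of 30.
diffs = [2, -5, 7, 14, -3, 1, 9, -12, 4, 6, -8, 3, 11, -2, 5, 1, -4, 7, 2, 38]
print(meets_specification(diffs, inner_limit=15, outer_limit=30))  # False
```

A probability-only specification would pass this data set; the outer limit is what catches the single result that could cause harm.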

Without this addition, one cannot claim that meeting either a total error or measurement uncertainty specification will guarantee quality of results, where quality means that the lab result will not lead to a medical error.


Calculating measurement uncertainty and GUM

October 16, 2017

A recent article (subscription required) suggests how to estimate measurement uncertainty for an assay to satisfy the requirements of ISO 15189.

As readers may know, I am a fan of neither ISO nor measurement uncertainty. The formal document, GUM (the Guide to the Expression of Uncertainty in Measurement), will make most clinical chemists' heads spin. Let's review how to estimate uncertainty according to GUM.

  1. Identify each item in an assay that can cause uncertainty and estimate its imprecision. For example, a probe picks up some patient sample; the amount of sample taken varies due to imprecision of the sampling mechanism.
  2. Any bias found must be eliminated. There is imprecision in the elimination of the bias. Hence bias has been transformed into imprecision.
  3. Combine all sources of imprecision into a BHE (big hairy equation – my term, not GUM's).
  4. The final estimate of uncertainty is governed by a coverage factor. Thus, an uncertainty interval for 99% is wider than one for 95%. Remember that an uncertainty interval for 100% is minus infinity to plus infinity. (A sketch of this combination appears after this list.)
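As a rough illustration of items 3 and 4, here is a minimal sketch, assuming independent sources of imprecision so that the "big hairy equation" reduces to a root sum of squares; real GUM budgets also carry sensitivity coefficients and correlation terms, which are omitted here.

```python
import math

# Bottom-up (GUM-style) combination: each source is expressed as a standard
# uncertainty, combined in quadrature (independence assumed), then expanded
# by a coverage factor k.

def combined_standard_uncertainty(components):
    return math.sqrt(sum(u ** 2 for u in components))

def expanded_uncertainty(components, k=2.0):
    # k ~ 2 gives roughly 95% coverage, k ~ 2.6 roughly 99% (normality assumed).
    return k * combined_standard_uncertainty(components)

# Hypothetical standard uncertainties: sample probe, reagent dispense,
# temperature, calibration (all in the units of the measurand).
sources = [0.8, 0.5, 0.3, 1.1]
print(expanded_uncertainty(sources, k=2.0))  # ~95% interval half-width
print(expanded_uncertainty(sources, k=2.6))  # the ~99% interval is wider
```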

The above Clin Chem Lab Med article calculates uncertainty by mathematically summing the imprecision of controls and the bias from external surveys. This is of course light years away from GUM. The fact that the authors call this measurement uncertainty could confuse some readers into thinking it is the same as GUM.
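By contrast, the kind of top-down calculation the article describes looks, in spirit, something like the sketch below; the exact formula in the paper may differ, and the numbers are invented, but the contrast with the bottom-up budget above should be clear.

```python
import math

# Top-down estimate: long-term QC imprecision combined with a bias estimate
# from external surveys, then expanded. No patient samples appear anywhere,
# so interferences and other patient-sample effects never enter the estimate.

def top_down_uncertainty(cv_qc_percent, bias_percent, k=2.0):
    return k * math.sqrt(cv_qc_percent ** 2 + bias_percent ** 2)

# Illustrative numbers: 2.5% QC imprecision and 1.0% bias from a survey.
print(f"{top_down_uncertainty(2.5, 1.0):.1f}%")
```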

Remember that in the authors' approach there are no patient samples, so the opportunity to observe errors due to interferences has been eliminated. Moreover, patient samples can have errors that controls do not. Measurement uncertainty must include errors from the entire measurement process, not just the analytical error.

Perhaps the biggest problem is that a clinician may look at such an uncertainty interval as truth, when the likely true interval will be wider and sometimes much wider.


Comparison of company vs. standards organization specifications

April 11, 2017

For almost all of my career, I’ve been working to determine performance specifications for assays, including the protocol and data analysis methods to see if performance has been met. This work has been performed mainly for companies but occasionally also for standards groups. There are some big differences.

Within a company, the specifications are very important:

If the product is released too soon, before the required performance has been met, the product may be recalled, patients may suffer harm, and overall the company may suffer financially.

If the product is released too late, the company will definitely suffer financially as “time to market” has been shown in financial models to be a key success factor in achieving profit goals.

Company specifications are built around two main factors: what performance is competitive, and how the company can be sure that no patients will be harmed. In my experience this has simply led to two goals: 95% of the differences between the company assay and reference should be within limits that guarantee a competitive assay, and no difference should be large enough to cause patient harm (a clinical standard).

Standards groups seem to have a different outlook. Without being overly cynical, the standards adopted are often to guarantee that no company’s assay will fail the specification. Thus, 95% of differences between the assay and reference should be within these limits. There is almost never a mention about larger errors which may cause patient harm.

Thus, it is somewhat ironic that company specifications are usually more difficult to achieve than specifications published by the standards organizations.


EFLM – after three years it’s disappointing

February 15, 2017


Thanks to Sten Westgard, whose website alerted me to an article about analytical performance specifications. Thanks also to Clin Chem Lab Med for making this article available without a subscription.

To recall, the EFLM task group was going to fill in details about performance specifications as initially described by the Milan conference held in 2014.

Basically, what this paper does is to assign analytes (not all analytes that can be measured but a subset) to one of three categories for how to arrive at analytical performance specifications: clinical outcomes, biological variation, or state of the art. Note that no specifications are provided – only which analytes are in which categories. Doesn’t seem like this should take three years.

And I don’t agree with this paper.

For one, talking about “analytical” performance specifications implies that user error or other mishaps that cause errors are not part of the deal. This is crazy because the preferred option is the effect of assay error on clinical outcomes. It makes no sense to exclude errors just because their source is not analytical.

I don’t agree with the second and third options ever playing a role (biological variation and state of the art). My reasoning follows:

If a clinician orders an assay, the test must have some use for the clinician to decide on treatment. If this is not the case, the only reason a clinician would order such an assay is that he has to make a boat payment and needs the funds.

So, for example, say the clinician will provide treatment A (often no treatment) if the result falls within X1-X2, and treatment B if the result is greater than X2. Of course this is oversimplified, since other factors besides the assay result are involved. But if the assay result is 10 times X2 while the truth is between X1 and X2, the clinician will make the wrong treatment decision based on laboratory error. I submit that this model applies to all assays and that, if one assembles clinician opinion, one can construct error specifications (see the last sentence at the bottom).
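A minimal sketch of that decision model, with hypothetical thresholds and results, shows how a large laboratory error flips the treatment decision:

```python
# Hypothetical decision model: treatment A (often no treatment) if the result
# falls within (x1, x2), treatment B if the result is above x2.

def treatment(result, x1, x2):
    if x1 <= result <= x2:
        return "A"
    if result > x2:
        return "B"
    return "other work-up"  # below x1; not part of the example above

x1, x2 = 1.0, 4.0   # hypothetical decision limits
truth = 3.0         # true value lies within (x1, x2), so treatment A is correct
reported = 10 * x2  # grossly erroneous laboratory result

print(treatment(truth, x1, x2))     # "A"
print(treatment(reported, x1, x2))  # "B": the wrong treatment, caused by laboratory error
```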

Other comments:

In the event that outcome studies do not exist, the authors encourage double-blind randomized controlled trials. Get real, people – these studies (e.g., feeding clinicians the wrong answer to see what happens) would never be approved!

The authors also suggest simulation studies; I have previously commented that the premier simulation study they cite, the Boyd and Bruns glucose meter simulations, is flawed.

The Milan 2014 conference rejected the use of clinician opinion to establish performance specifications. I don’t see how clinical chemists and pathologists trump clinicians.