Antwerp talk about total error

March 12, 2017

Looking at my blog stats, I see that a lot of people are reading the total analytical error vs. total error post. So below are the slides from a talk called The “total” in total error, which I gave at a conference in Antwerp in 2016. The slides have been updated. Because they were made to accompany a talk, the slides on their own are not as effective as the talk itself.




Test error and healthcare costs

December 7, 2016


Conventional wisdom says that regulatory authorities approve assays of the highest quality, meaning that the errors are small enough that little or no harm will arise from a clinician making a wrong medical decision based on test error.

It is also true, although not talked about, that in most countries healthcare is rationed – the cost of treating everyone with every possible treatment is too high.

So here’s a hypothetical example using glucose meters.

First, start with the status quo for existing glucose meter quality, and assume that on average, across all tests, there will be some harm due to glucose meter error. The percentage of tests that harm people is unknown, as is the range of harm, but assume that both can be ascertained and that the harm does occur.

As for the hypothetical part…

There are two new glucose meters seeking approval:

Meter A costs 100 times as much as current meters and is guaranteed to have zero error, as it is a breakthrough technology. Its use will reduce patient harm due to test error to zero.

Meter B costs one hundredth as much as current meters but isn’t quite as accurate or reliable. Patient harm will increase with the use of meter B.

If meter A is approved, because of healthcare rationing, costs will have to be transferred from other parts of healthcare to pay for meter A.

If meter B is approved, costs can be transferred from glucose meter testing to other parts of healthcare.
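The cost side of the example is simple arithmetic once a fixed (rationed) budget is assumed. The sketch below is hypothetical: the function name `budget_transfer` and the baseline spend are made up for illustration, and only the 100-times multipliers come from the example.

```python
# Hypothetical sketch: under a fixed (rationed) healthcare budget, any change in
# glucose meter spending must be offset elsewhere. All numbers are made up.

def budget_transfer(current_spend: float, cost_multiplier: float) -> float:
    """Extra money meter testing needs (positive, taken from other care) or
    frees up (negative, available for other care) after switching meters."""
    return current_spend * cost_multiplier - current_spend

BASELINE_SPEND = 1_000_000.0  # hypothetical annual meter spend, arbitrary units

meter_a = budget_transfer(BASELINE_SPEND, 100.0)      # 100 times as much
meter_b = budget_transfer(BASELINE_SPEND, 1 / 100.0)  # one hundredth as much

print(f"Meter A: {meter_a:,.0f} must come from other parts of healthcare")
print(f"Meter B: {-meter_b:,.0f} becomes available for other parts of healthcare")
```

With these placeholder numbers, meter A pulls 99 times the current spend out of other care, while meter B frees up 99% of it. The sketch says nothing about the harm side of the trade, which is the part that is rarely quantified.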

The point is not to try to answer whether meter A or meter B should be approved, but to illustrate that the cost issues associated with healthcare policy always exist but are rarely discussed.

Update on Blood Lead Goals

May 4, 2016


I have updated the section on blood lead goals in the previous post. The updated limits are also listed here.

Blood lead lowest allowable limit:

1960s: 60 µg/dL
1978: 30 µg/dL
1985: 25 µg/dL
1991: 10 µg/dL
2012: 5 µg/dL

Source: Markowitz G, Rosner D. Lead Wars: The Politics of Science and the Fate of America’s Children.

Why do performance goals change – has human physiology changed?

May 3, 2016


[Photo is Cape Cod Canal] Ok, the title was a rhetorical question. Some examples of the changes:

Blood lead lowest allowable limit:

1960s: 60 µg/dL
1978: 30 µg/dL
1985: 25 µg/dL
1991: 10 µg/dL
2012: 5 µg/dL


Glucose meters:

2003: ISO 15197 standard is ±20% above 75 mg/dL
2013: ISO 15197 standard is ±15% above 100 mg/dL
2014: proposed FDA standard is ±10% above 70 mg/dL
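To see what the tightening means for the same meter, the sketch below applies only the percentage part of each criterion to a set of made-up (reference, meter) pairs. It is a hypothetical illustration: the function name `fraction_within` and the data are invented, and the absolute limits below each threshold and the required proportion of results (which the standards also specify) are not modeled.

```python
# Hypothetical sketch: what fraction of (reference, meter) glucose pairs, in mg/dL,
# meets the percentage part of each criterion listed above. The absolute limits
# that apply below each threshold, and the required proportion of results, are
# not modeled. The data pairs are made up.

CRITERIA = {
    "ISO 15197:2003": (20.0, 75.0),     # ±20% at or above 75 mg/dL
    "ISO 15197:2013": (15.0, 100.0),    # ±15% at or above 100 mg/dL
    "FDA 2014 proposal": (10.0, 70.0),  # ±10% at or above 70 mg/dL
}

PAIRS = [(80, 93), (120, 134), (150, 160), (200, 212),
         (250, 238), (300, 350), (90, 98), (60, 75)]

def fraction_within(pairs, pct_limit, threshold):
    """Fraction of pairs with reference >= threshold whose percent error is within pct_limit."""
    evaluable = [(ref, meas) for ref, meas in pairs if ref >= threshold]
    if not evaluable:
        return float("nan")
    ok = sum(1 for ref, meas in evaluable
             if abs(meas - ref) / ref * 100.0 <= pct_limit)
    return ok / len(evaluable)

for name, (pct, threshold) in CRITERIA.items():
    share = fraction_within(PAIRS, pct, threshold)
    print(f"{name}: {share:.0%} of results at or above {threshold:g} mg/dL within ±{pct:g}%")
```

With these made-up pairs, every evaluable result meets the 2003 criterion, 80% meet the 2013 criterion, and 57% meet the 2014 proposal, even though the meter has not changed.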

The players:

Industry – Regulatory affairs professionals participate in standards committees and support each other through their trade organization, AdvaMed. The default position of industry is no standards; when standards are inevitable, their position is to make the standard as little burdensome to industry as possible.

Lab – Clinical chemists and pathologists are knowledgeable about assay performance. Alert: pathologists are not clinicians. Also, lab people are often beholden to industry, since clinical trials are paid for by industry and conducted in hospitals by clinical chemists or pathologists.

Clinicians – Sometimes clinicians are part of standards groups, but less often than one might think.

Regulators – People from FDA, CDC, and other organizations have to decide whether to approve or reject assays and are often part of standards groups.

Patients – Patients sometimes have a voice; diabetes is an example.

Medical Knowledge – As the title implies, the medical knowledge related to performance goals is probably of little consequence. For example, the harm of lead exposure is not a recent discovery.

Technology – Improved assay performance resulting from technical advances probably does play a role in standards. All of a sudden, the performance standard is tighter and, coincidentally, assay performance has improved.

Cost – Healthcare is rationed in most countries so cost is always an issue, but it is rarely discussed.

Note that for both of these assays, the earliest standard is at least 100% more lenient (that is, at least twice as wide) than the current standard.
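The “100% or more” figure can be checked directly from the limits listed above. This small sketch (the function name is illustrative) compares the earliest and current limits; for glucose it compares the percentage limits only and ignores the different concentration thresholds.

```python
# Sketch: how much more lenient the earliest goal is than the current one, using
# only the limits listed above. Glucose compares percentage limits and ignores
# the different concentration thresholds (75 vs 70 mg/dL).

def percent_more_lenient(earliest_limit: float, current_limit: float) -> float:
    """Percent by which the earliest limit is wider than the current limit."""
    return (earliest_limit / current_limit - 1.0) * 100.0

lead = percent_more_lenient(60.0, 5.0)      # 1960s vs 2012, µg/dL
glucose = percent_more_lenient(20.0, 10.0)  # ISO 15197:2003 vs 2014 FDA proposal, %

print(f"Blood lead: {lead:.0f}% more lenient")  # 1100%
print(f"Glucose: {glucose:.0f}% more lenient")  # 100%
```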

IQCP – It’s about the money

April 22, 2016


There is an article in CAP Today about IQCP (Individualized Quality Control Plan). I was struck by a quote near the beginning of the article:

“I didn’t stop to calculate what it would cost to do liquid quality control on all the i-Stat cartridge types every eight hours because the number would have been through the roof”

Now I understand that cost is a real issue, but so is harm to patients.

The original idea of EQC (equivalent quality control) was to reduce the frequency of QC if an experiment showed good QC results for 10 days. This was, of course, without merit and had the potential to cause patient harm.

The current notion of IQCP is to perform a risk analysis and then reduce the frequency of QC. This also makes no sense. Risk analysis should always be performed, and so should QC, at a frequency that allows questionable results to be repeated before patients are harmed.
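Why QC frequency matters can be sketched with one assumption: when QC fails, every patient result reported since the last acceptable QC is questionable. Under a constant testing rate (a simplifying assumption), the worst-case number of results to recheck is the rate times the QC interval. The workload and the function name `results_at_risk` are hypothetical.

```python
# Hypothetical sketch: when QC fails, every patient result reported since the
# last acceptable QC is questionable and may need to be repeated. Assuming a
# constant testing rate, the worst-case number of such results is simply the
# rate times the QC interval. The workload below is made up.

def results_at_risk(tests_per_hour: float, qc_interval_hours: float) -> float:
    """Worst-case number of patient results reported between two QC events."""
    return tests_per_hour * qc_interval_hours

TESTS_PER_HOUR = 6  # hypothetical workload for one cartridge type

for hours in (8, 24, 24 * 7, 24 * 30):
    n = results_at_risk(TESTS_PER_HOUR, hours)
    print(f"QC every {hours:>3} h: up to {n:.0f} results to recheck")
```

The longer the interval, the more results are reported, and the more time clinicians have to act on them, before a QC failure is detected. Reducing QC frequency to save money moves the tradeoff in exactly that direction.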

Decision analysis? – where are the details?

November 15, 2015


At the Milan conference (1st EFLM Strategic Conference: Defining Analytical Performance Goals), one of the papers (1) suggests that analytical performance specifications should be derived from indirect outcome studies using decision analysis. The only example presented is a simulation, which is not decision analysis. Decision analysis is also discussed in this section, but only at an abstract level.

I have performed decision analysis and discuss it in my book (2). Decision analysis requires a quantitative variable that is either maximized or minimized. In my case, we performed a financial decision analysis, and the parameter to be maximized was the net present value (NPV) of future cash flows. The Milan paper never identifies a quantitative parameter to be optimized.
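To show what “a quantitative parameter to be optimized” looks like in practice, here is a generic sketch of a financial decision analysis in which expected NPV is computed for each alternative and the largest is chosen. It is not the analysis from the book: the alternatives, probabilities, cash flows, discount rate, and function names are all hypothetical.

```python
# Hypothetical sketch of a financial decision analysis. Each alternative has
# uncertain outcomes, given as (probability, yearly cash flows); the quantity to
# maximize is expected net present value (NPV). All numbers are made up.

def npv(cash_flows, rate):
    """NPV where cash_flows[t] occurs at the end of year t (t = 0 is today)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

def expected_npv(outcomes, rate):
    """Probability-weighted NPV over a list of (probability, cash_flows) outcomes."""
    if abs(sum(p for p, _ in outcomes) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * npv(cfs, rate) for p, cfs in outcomes)

ALTERNATIVES = {
    # strong vs. weak sales after an up-front investment of 100
    "develop assay": [(0.7, [-100, 40, 60, 80]), (0.3, [-100, 10, 10, 10])],
    "do not develop": [(1.0, [0, 0, 0, 0])],
}
DISCOUNT_RATE = 0.10

scores = {name: expected_npv(o, DISCOUNT_RATE) for name, o in ALTERNATIVES.items()}
best = max(scores, key=scores.get)  # the alternative with the highest expected NPV
print(best, {k: round(v, 1) for k, v in scores.items()})
```

Every step depends on having one number to compare across alternatives; without an equivalent of expected NPV for analytical performance specifications, there is nothing to maximize.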

I don’t understand how decision analysis can be recommended without any known examples or details about how one would go about it.


  1. Horvath AR, Bossuyt PMM, Sandberg S. Setting analytical performance specifications based on outcome studies – is it possible? Clin Chem Lab Med 2015;53(6):841–848.
  2. Krouwer JS. Assay Development and Evaluation: A Manufacturer’s Perspective. Washington DC: AACC Press; 2002. See Chapter 3.

Product performance acceptance limits

February 17, 2013


During my consulting career, both while working full time within a company and as an external consultant, one of the most common issues has been the failure of companies to provide useful acceptance limits for a product’s performance parameters.

These included:

  • No limits whatsoever
  • Clearly unachievable limits
  • Limits that could not be evaluated (often non-quantifiable limits)

Limits are important because matching observed performance to limits (assuming the limits are correct) prevents releasing a product too late or too early:

  • Late – Delaying product release
  • Attempted early – Having the product rejected or delayed by the FDA
  • Early – Having the product rejected by the marketplace

Any of the above is bad for the financial health of a company.
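Matching observed performance to limits only works when the limits are quantitative. The sketch below is hypothetical: the parameter names, limits, observed values, and the `evaluate` function are invented, and each limit is treated as an upper bound on the absolute observed value.

```python
# Hypothetical sketch: matching observed performance to quantitative acceptance
# limits. Parameter names, limits, and observed values are made up; each limit
# is treated as an upper bound on the absolute observed value.

from typing import Dict, Optional

def evaluate(limits: Dict[str, Optional[float]], observed: Dict[str, float]) -> Dict[str, str]:
    """Return 'pass', 'fail', or 'no limit' for each observed parameter."""
    status = {}
    for name, value in observed.items():
        limit = limits.get(name)
        if limit is None:
            status[name] = "no limit"  # cannot be evaluated against anything
        elif abs(value) <= limit:
            status[name] = "pass"
        else:
            status[name] = "fail"
    return status

LIMITS = {"within-run CV (%)": 3.0, "bias (%)": 5.0, "interference (%)": None}
OBSERVED = {"within-run CV (%)": 2.1, "bias (%)": -6.4, "interference (%)": 4.0}

status = evaluate(LIMITS, OBSERVED)
ready_to_release = all(s == "pass" for s in status.values())
print(status, "ready to release:", ready_to_release)
```

In this design, a missing limit blocks release just as a failure does, because without a limit there is no way to tell whether the product is being released too early or too late.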

So why is it so difficult for this fundamental market requirement to be established? I’m not sure.

Ioannidis is Wrong

September 29, 2012


For some time, I have been a follower of John P.A. Ioannidis, but I don’t agree with his recent analysis of PSA as a screening tool. He says that PSA is a failure and “largely useless— or even harmful—and therefore needs to be abandoned.” He offers as evidence the recent USPSTF recommendation, which recommends against PSA screening altogether. Of course, PSA does have false-positive problems, and overtreatment is an issue. But …

An update from the ERSPC Trial states that “The European Randomized Study of Screening for Prostate Cancer has published its 11-year follow-up results (New England Journal of Medicine, March 15 2012). Once again, they demonstrate that screening does significantly reduce death from prostate cancer. The latest study confirms that a man who undergoes PSA testing will have his risk of dying from prostate cancer reduced by 29%.”

And one can listen to an oncologist who had metastatic prostate cancer, recovered, and now treats prostate cancer patients.

The USPSTF is empowered by the Affordable Care Act. It’s clear that healthcare spending by the government must be reduced. There would be considerable cost savings from:

  • the population of men over 50 in the US (or between 50 and 75) that would no longer receive a PSA test
  • the number of men that would have had an elevated PSA that would not receive a biopsy
  • the number of men that would have been diagnosed with early prostate cancer via a PSA test and biopsy (~200,000 per year) that would not receive treatment (surgery or radiation)

Isn’t it likely that cost had a role in the USPSTF decision? But this is not covered in the Ioannidis article.

If performance goals were decided rationally

November 25, 2011

One of the key questions asked about a diagnostic assay is “Is the performance good enough?” This question takes forms such as: What should the performance goals be? How should they be evaluated? From a regulator’s point of view, two decisions can be made:

Accept the assay – with the risk that patients may be harmed due to assay error
Reject the assay – with the risk that patients may be harmed due to the lack of information that would have been obtained by the assay

Now, for any good assay, the number of patients harmed by assay error is always extremely low (i.e., self-justifying, because it’s a good assay).

And a good assay provides important information to a clinician (i.e., self-justifying, because it’s a good assay). This also means that the lack of information from a rejected assay would likely cause great harm.

Hence the rational decision is always to approve an assay. Why doesn’t the FDA always approve assays? Perhaps because assays that harm patients are like a plane crash and no one likes plane crashes.
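The argument can be sketched as a comparison of the expected number of patients harmed under each choice. This is a minimal, hypothetical sketch: the function names and the per-test probabilities are made up to represent a “good” assay, and it deliberately leaves out the extra weight given to highly visible harms (the plane crash problem).

```python
# Hypothetical sketch: the regulator's two choices compared by expected number of
# patients harmed. The per-test probabilities are made-up placeholders chosen to
# represent a "good" assay, not estimates for any real assay.

def expected_harmed(n_tests: int, p_harm_per_test: float) -> float:
    """Expected number of patients harmed over n_tests."""
    return n_tests * p_harm_per_test

def regulator_decision(n_tests: int, p_harm_if_accepted: float, p_harm_if_rejected: float) -> str:
    """Approve when accepting the assay is expected to harm fewer patients than rejecting it."""
    accept = expected_harmed(n_tests, p_harm_if_accepted)  # harm from assay error
    reject = expected_harmed(n_tests, p_harm_if_rejected)  # harm from missing information
    return "approve" if accept < reject else "reject"

print(regulator_decision(100_000, p_harm_if_accepted=0.0001, p_harm_if_rejected=0.05))
```

For a good assay, the first probability is small and the second large almost by definition, so this comparison always favors approval.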

CLSI Guidelines – the importance of real examples

May 15, 2011

CLSI Evaluation Protocol guidelines often contain statistical procedures, and statistics is challenging for most people. One can think of CLSI documents as having three parts: the explanatory text, examples, and the appendices. The text is often lacking in spite of many revisions, simply because statistical explanations are hard to follow. The justifications for some of the statistics are in the appendices, which are even harder to follow.

This leaves the examples as an important part of these guidelines. If one understands the examples, one can perform the procedure, even if some of the text can’t be followed. This is somewhat less important now that CLSI has introduced its StatisPro software, but some users might choose not to buy StatisPro, and StatisPro doesn’t cover all guidelines.

Examples can be completely made up, or they can use real data, which is much more useful. EP21 (total error) has two examples. One uses real data (LDL cholesterol) and has a few outliers. During the comment period for EP21, several people wanted to change or delete the example because of the outliers, but outliers happen in the real world. The second example is made up because I wanted normally distributed data, and although I worked for a manufacturer at the time, I couldn’t find a real example of normally distributed data.
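Outliers matter because a total error estimate depends on how the tails of the differences are estimated. The sketch below contrasts a parametric estimate of the central 95% of differences (mean ± 1.96 SD, which assumes normality) with a nonparametric one (2.5th and 97.5th percentiles). Neither function is claimed to be the EP21 procedure, and the differences are made up, with two outliers.

```python
# Hypothetical sketch: two ways to estimate the central 95% of differences
# (candidate minus comparison method). Neither is claimed to be the EP21
# procedure; the data are made up, with two outliers.

import statistics

def parametric_limits(diffs):
    """Mean ± 1.96 SD: assumes the differences are normally distributed."""
    m = statistics.mean(diffs)
    s = statistics.stdev(diffs)
    return m - 1.96 * s, m + 1.96 * s

def nonparametric_limits(diffs):
    """2.5th and 97.5th percentiles of the observed differences."""
    cuts = statistics.quantiles(diffs, n=40, method="inclusive")  # cut points every 2.5%
    return cuts[0], cuts[-1]

BASE = [-3, -2, -2, -1, -1, -1, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
DIFFS = BASE * 2 + [30, -25]  # 34 well-behaved differences plus two outliers

for label, (lo, hi) in (("parametric", parametric_limits(DIFFS)),
                        ("nonparametric", nonparametric_limits(DIFFS))):
    print(f"{label:>13}: {lo:6.2f} to {hi:6.2f}")
```

With normally distributed data, the two approaches tend to agree, which a made-up example can show neatly. With these outliers, the parametric interval is more than twice as wide as the nonparametric one, and deciding what to report takes judgment.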

So the tradeoff is between a made-up example that neatly illustrates the statistical method with no-brainer conclusions, and a real example, warts and all, that also illustrates the statistical method but doesn’t look very appealing or leads to conclusions that require judgment.

This issue also occurs in EP27 (error grids), but it is much more intense there. That is because error grids require judgment in their creation, and this judgment can seem (and often is) arbitrary. But this is the real world. For example, with glucose, clinicians are still debating the location of the innermost zone of the error grid. The error grids in EP27 are real (blood lead, prothrombin time, and urine albumin). So now the comments complain that the error grids seem arbitrary, and the commentators would rather have a made-up, neat example that is an abstraction of an error grid, with clearly defined clinical consequence zones. But that is not the real world and won’t help anyone.
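Where the judgment enters can be seen in a toy error grid. In this hypothetical sketch, each (reference, measured) result is assigned to a zone by its percent difference from the reference; the zone boundaries, labels, and the `classify` function are invented placeholders, not the grids in EP27.

```python
# Hypothetical sketch of an error grid: each (reference, measured) result is
# placed in a zone according to its percent difference from the reference.
# The zone boundaries and labels are invented placeholders, not EP27's grids.

from typing import List, Tuple

# (zone label, maximum percent difference), checked from innermost outward
ZONES: List[Tuple[str, float]] = [
    ("A: no effect on clinical action", 15.0),
    ("B: altered action, little or no effect on outcome", 40.0),
]
OUTER_ZONE = "C: altered action likely to affect outcome"

def classify(reference: float, measured: float) -> str:
    """Return the error grid zone for one result."""
    pct = abs(measured - reference) / reference * 100.0
    for label, max_pct in ZONES:
        if pct <= max_pct:
            return label
    return OUTER_ZONE

for ref, meas in [(100, 110), (100, 130), (100, 160)]:
    print(ref, meas, "->", classify(ref, meas))
```

Moving the 15% boundary of zone A changes how many results count as harmless, which is the same kind of decision clinicians are still debating for glucose. Real grids often have boundaries that vary with concentration and differ for high and low errors, which a single percentage cannot capture.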