Why do performance goals change – has human physiology changed?

May 3, 2016


[Photo is Cape Cod Canal] OK, the title was a rhetorical question. Some examples of the changes:

Blood lead lowest allowable limit:

1960s  60 ug/dL
1978   30 ug/dL
1985   25 ug/dL
1991   10 ug/dL
2012    5 ug/dL


Glucose meters:

2003 ISO 15197 standard: ±20% above 75 mg/dL
2013 ISO 15197 standard: ±15% above 100 mg/dL
2014 proposed FDA standard: ±10% above 70 mg/dL
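To make the tightening concrete, here is a small sketch of what each percentage criterion allows at a single reading. The numbers come from the list above; the sketch deliberately ignores the separate absolute limits that these standards apply at low glucose concentrations.

```python
# Allowable error at a reading of 150 mg/dL under each standard's
# percentage criterion (low-glucose absolute limits are ignored here).
standards = {
    "ISO 15197:2003": 0.20,   # +/-20% above 75 mg/dL
    "ISO 15197:2013": 0.15,   # +/-15% above 100 mg/dL
    "FDA draft 2014": 0.10,   # +/-10% (proposed)
}

reading = 150.0  # mg/dL, above every standard's percentage threshold

for name, tolerance in standards.items():
    allowed = reading * tolerance
    print(f"{name}: +/-{allowed:.1f} mg/dL at {reading:.0f} mg/dL")
```

At 150 mg/dL the allowable error shrinks from 30 to 22.5 to 15 mg/dL across the three standards, a halving over roughly a decade.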

The players:

Industry – Regulatory affairs professionals participate in standards committees and support each other through their trade organization, AdvaMed. The default position of industry is no standards – when standards are inevitable, their position is to make the standard as least burdensome to industry as possible.

Lab – Clinical chemists and pathologists are knowledgeable about assay performance. ALERT: pathologists are not clinicians. Also, lab people are often beholden to industry, since clinical trials are paid for by industry and conducted in hospitals by clinical chemists or pathologists.

Clinicians – Sometimes, clinicians are part of standards committees, but less often than one might think.

Regulators – People from FDA, CDC, and other organizations have to decide to approve or reject assays and are often part of standards groups.

Patients – Patients have a voice sometimes – diabetes is an example.

Medical Knowledge – As the title implies, the medical knowledge related to performance goals is probably of little consequence. For example, the harm of lead exposure is not a recent discovery.

Technology – Improving assay performance due to technical improvements probably does play a role in standards. All of a sudden the performance standard is tighter and, coincidentally, assay performance has improved.

Cost – Healthcare is rationed in most countries so cost is always an issue, but it is rarely discussed.

Note that for both of these assays, the earliest standard is 100% or more lenient than the current standard.


The revised total error standard EP21, an example of manufacturers dominating CLSI

May 18, 2015


I had a chance to look at the revision of EP21 – the document about total error that I proposed and chaired. So after 12 years, here are the major changes.

In the original EP21, I realized that even if 95% of the results met goals, the remaining 5% might not, so there was a table which accounted for this: an acceptable assay had to have 100% of its results within goals. The revised EP21 – call it A2 – only talks about 95% of results (similar to the 2003 ISO glucose meter standard). There is no longer any mention of the remaining 5% – these results are unspecified. This goes along with my thinking that manufacturers will refuse to talk about assay results that can cause severe injury or death. Thus, if 95% of the results just meet goals, a portion of the remaining 5% could cause severe injury or death, and even a small percentage of that portion could be a big number in absolute terms (as one example, there are 8 billion glucose meter results each year in the US).
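The back-of-the-envelope arithmetic is worth spelling out. The 8 billion figure is from the paragraph above; the 1-in-1,000 dangerous-error rate below is a made-up illustration, not an estimate of any real assay's failure rate.

```python
# Even a tiny dangerous fraction of the unspecified 5% is a large count.
annual_results = 8_000_000_000        # US glucose meter results per year
unspecified = annual_results * 0.05   # the 5% the revised standard ignores
dangerous_fraction = 0.001            # hypothetical: 1 in 1,000 of those

print(f"Unspecified results per year:   {unspecified:,.0f}")
print(f"Potentially dangerous per year: {unspecified * dangerous_fraction:,.0f}")
```

Under these assumed numbers, 400 million results per year fall outside the specification, and even a 0.1% dangerous-error rate among them would mean 400,000 potentially harmful results.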

The mountain plot and all references to it are gone in A2. To recall, the mountain plot is ideal for visualizing outlier observations: there could be 10,000 observations, but if there were 5 outliers, they would be clearly visible. In place of the mountain plot there is a histogram, with an example showing normal-looking results – the example that had outliers is gone. And the histogram has only 9 bins, so if there were outliers, they would disappear. So again, this is a way to minimize talking about results which can cause major problems.

Somehow, sigma metrics have become part of A2. How this happened is a mystery – perhaps someone can explain it to me. I understand the equation total error = |bias| + 2 x imprecision, but the total error in EP21 is the difference between candidate and comparison assays, and this difference can’t be separated into bias and imprecision.
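A small simulation makes the distinction concrete. The bias and SD values below are invented for illustration; the point is that the EP21-style quantity is a quantile of observed differences, which mixes the two components together.

```python
import random
import statistics

random.seed(1)

# Two notions of "total error":
#   model-based:      TE = |bias| + 2 * SD (needs bias and SD separately)
#   difference-based: a quantile of observed candidate-minus-comparison
#                     differences (EP21-style), bias and SD mixed together
bias, sd = 2.0, 3.0  # illustrative values, not from any real assay
differences = [bias + random.gauss(0, sd) for _ in range(10_000)]

model_te = abs(bias) + 2 * sd
observed = statistics.quantiles([abs(d) for d in differences], n=100)[94]

print(f"Model-based total error:        {model_te:.1f}")
print(f"~95th percentile |difference|:  {observed:.1f}")
# From the differences alone, bias and SD cannot be recovered without
# assuming a model - which is the objection raised above.
```

The two numbers need not agree, and only the model-based version supports a sigma-metric calculation.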

And then there is the section on distinguishing between total error and total analytical error. This is part of the reason I was booted out of CLSI. A2 is constrained to include only analytical error.

Total error, including all sources of variation, is the only thing that matters to clinicians. The total error experiment (e.g., EP21) will include errors from only those sources that are sampled. Practically speaking, the sources will be limited, even for analytical error. For example, even if more than one reagent is used, this is not the same as randomly sampling from the population of all reagents during the lifetime of the device – impossible, since this involves future reagents that don’t yet exist. The same is true for pre- and post-analytical error, but the point is that one should not exclude pre- and post-analytical error sources from the experiment.

There is a section on various ways to establish goals. The examples shown are the ISO, CLSI, and NACB glucose meter standards, which have performance goals for glucose meters. A2 talks about the strengths and weaknesses of using expert bodies to create these standards. Now A2 has a reference from May of 2015, but somehow they missed the FDA draft guidance on glucose meters (January 2014), which, unlike the examples cited in A2, wants evaluators to account for 100% of the data. And FDA’s opinion about the ISO glucose meter standard is pretty clear:

Although many manufacturers design their BGMS validation studies based on the International Standards Organizations document 15197, FDA believes that the criteria set forth in the ISO 15197 standard do not adequately protect patients using BGMS devices in professional settings, and does not recommend using these criteria for BGMS devices.

I have published a critique of the CLSI glucose meter standard, which is available here.

When I was chair holder of the Evaluations Protocol Committee, there were battles between the regulatory affairs people, who populated the manufacturing contingent, and the rest of the committee. For example, I remember one such battle over EP6, the linearity document. The proposed new version finally had a sensible statistical method to evaluate nonlinearity, but one regulatory affairs member insisted on having an optional procedure whereby one could just graph the data and look at it to declare whether it was linear. After many delays, this optional procedure was rejected.

By looking at the new version of EP21, my sense is that the regulatory affairs view now dominates these committees.

Blood Lead – what is the rationale for the allowable level?

January 22, 2014


I’ve been consulting for a while for a company that makes blood lead assays. It used to be that the lowest allowable level of lead was 10 ug/dL. Below this level, no action was needed, whereas above this level a repeat assay was prescribed to determine if the source of contamination was still present. The lead level that sparks chelation treatment is 45 ug/dL.

The cut-off of 10 makes one wonder. If a person (usually a child) has a level of 9.9 and another child has an undetectable lead level, do these two kids have the same risk for lead poisoning? (Note a lead assay measures lead exposure, not lead poisoning).

But now, the CDC has changed the allowable level to 5 ug/dL. This raises some strange possibilities. The parents of a child who previously had a lead level of 6 may not even have been notified of the result, but had the child just been tested, they would be.

What has changed? One thing that has not changed is the biological role for lead in humans. There is none! And since higher levels of lead cause severe problems isn’t it likely that any level of lead is undesirable?

Disagreeing with the stat books

November 3, 2012

The typical statistics book states that to evaluate something, you state a goal, perform the experiment, and determine if the goal has been met. I believe this is also what the FDA expects. Whereas this sounds reasonable, the problem is that for many companies, not much time is spent on goals during development and evaluation of methods. The result is that after the evaluation is done, people will now start to ask what the goal should really be. But changing the goal after the evaluation is done is frowned upon.

For example, an assay precision goal is set at 5% CV and the result of the evaluation is 5.1%. Say the team meets and decides 5.1% is acceptable and changes the goal to 6%. Is there something wrong with this? I say no. In my experience at Ciba Corning, this type of situation occurred periodically, which led to the team discussing what we should do. The decision was always unanimous and sometimes favored the product being withheld (or recalled) and sometimes favored product release. Ideally, we should have had such discussions for each goal before the evaluations, but it never happened.

Sometimes during the discussions, a “limited product release” was suggested. I always thought this was funny because every product release was limited by our manufacturing capacity so a “limited product release” really meant limited by as fast as you could make and ship product.

Risk Management – within financial constraints

March 2, 2011

My colleague Jim Westgard wrote a piece about risk management that deserves some comments. He dislikes the risk scoring schemes commonly in use because he says they “reflect subjective opinions and qualitative judgments.” He recommends that:

  • Defects are scored using probability of occurrence from 0 to 1.0
  • Severity is scored from 0 to 1.0
  • Probability of detection is scored from 0 to 1.0

I mention in passing that two of these items are probabilities, but severity is not a probability – it is arbitrarily ranked from 0 (no harm) to 1.0 (serious harm). Since the three items are multiplied together, I don’t know what the product means.
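To see the problem with the product, consider some hypothetical scores for a single failure mode (all numbers below are invented for illustration, not taken from any real analysis):

```python
# Hypothetical scores for one failure mode, following the proposal above.
p_occurrence = 0.01   # probability the defect occurs (a true probability)
p_detection = 0.90    # probability the defect is caught (a true probability)
severity = 0.8        # NOT a probability: an arbitrary 0-to-1 harm rank

# Combining the two probabilities can have a meaning: the chance the
# defect occurs AND escapes detection.
p_occurs_undetected = p_occurrence * (1 - p_detection)

# But the three-way product the scheme proposes is a unitless score whose
# value depends entirely on how the 0-to-1 harm scale was drawn.
risk_score = p_occurrence * p_detection * severity

print(f"P(occurs and undetected): {p_occurs_undetected:.4f}")
print(f"Product of all three:     {risk_score:.4f}")
```

Rescale the severity axis (say, 0 to 10 instead of 0 to 1) and every "risk score" changes by a factor of ten while nothing about the system has changed.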

But here are my two main points. Take probability of defect occurrence first. Say a defect is a very wrong result caused by electrical noise in a response, undetected by instrument algorithms. Westgard would like to change the probability of occurrence of this event from a scale such as extremely unlikely = 1, very unlikely = 2, and so on to a specific probability from 0 to 1.0. He wants to do this to prevent subjective opinions and qualitative judgments.

Now subjective opinions about this type of error from a person on the street would not make sense. But the opinion of a group of engineers who have developed the system would be of interest, and yes, the opinion is qualitative. But how does Westgard propose to get a quantitative probability? Who will provide this? It is possible through experiments to get an estimate for this defect, but this could involve an enormous effort, and this is only one potential defect. There could be thousands of potential defect causes, often depending on other causes and each requiring detailed experiments. Remember that a wrong result can be caused by an operator error or a pre- or post-analytical error, not just an analytical error.

My other beef is about including probability of detection (also see the reference below). The problem is that detection is a process (QC is just one means of detection). For any incorrect result, there are many detection possibilities. For most analyzers, operators examine samples, a series of instrument algorithms are programmed to detect questionable results, QC is performed, serial results are queried using delta checks, and so on. And because detection is a process, there is the opportunity for failure of detection (often from multiple causes). So, for example, QC may have some calculated probability of success, but there is the potential for failure because the control was not reconstituted properly, there was a bad vial, the control was expired, and so on.

Moreover, detection by itself will not prevent an error. One must also have a recovery. So with QC, one does not report results until troubleshooting has been completed. But troubleshooting (e.g., the recovery) is a process and it too can fail (again with multiple causes) and its potential for failure is ignored in the Westgard treatment.
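The detection-plus-recovery chain can be sketched with a few lines of arithmetic. Every probability below is invented for the illustration; the point is only that a chain of steps succeeds with probability lower than any single step.

```python
# Detection and recovery are multi-step processes; each step can fail.
p_control_ok = 0.98   # control reconstituted properly, not expired, good vial
p_qc_flags = 0.90     # QC rule flags the error, given a valid control
p_recovery = 0.95     # troubleshooting actually prevents the wrong report

# An error is prevented only if every step succeeds, so the chance is the
# product of the step probabilities - lower than any single step's figure.
p_error_prevented = p_control_ok * p_qc_flags * p_recovery
print(f"P(error prevented) = {p_error_prevented:.3f}")
```

A single "probability of detection" score hides this structure: the 0.90 QC figure overstates protection because the control handling and the recovery can each fail on their own.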

So risk management using traditional FMEA isn’t so bad after all. But if you want to do something quantitative such as quantitative fault trees, it is unlikely to be within the financial constraints of your environment.


Schmidt MW. The use and misuse of FMEA in risk analysis. Medical Device and Diagnostic Industry, March 2004, p. 56. Available at http://www.devicelink.com/mddi/archive/04/03/001.html

Resistance in CLSI standards

November 18, 2009

As a consultant, much of my work consists of projects that I suggest. As might be expected, I have to sell people on these ideas. Sometimes it is clear that the stated objections are not the real reason the client doesn’t want to do the project. This is known as resistance, and it takes many forms, such as “great idea, let’s do it on the next project,” “great idea, but our problem is different,” or “I just don’t understand.”

There have been some reject votes for the revised CLSI total error standard, EP21. Based on several discussions, the two major objections look like resistance. Here’s why.

The first objection is that EP21 is too complicated. Now the people who voted to reject EP21 are all senior people in the field of clinical chemistry. EP21 involves graphing differences between a new and comparison method. In one plot the differences are manipulated with subtractions. This is probably the simplest of all the CLSI evaluation documents.

The second objection is that the revised document includes total error from any source that is present in the evaluation protocol (not just analytical error). So what does this mean? Well, if one were going to evaluate a POC device such as a glucose meter which had fingersticks as a valid sample mode, the evaluation should employ fingersticks as the sampling method for the POC glucose meter under test. It is likely that the comparison method would be a laboratory glucose instrument using a venous sample. This means that differences between test and comparison will include errors due to improper fingerstick technique, which is a user error, not an analytical error. This is appropriate because the goal of the evaluation is to estimate performance in routine use.

So what were the objections? “We can’t be expected to evaluate all pre- and post-analytical errors, such as problems with the LIS. Let’s develop a new standard for pre- and post-analytical errors.”

But the revised EP21 does not suggest evaluating all pre- and post-analytical errors, it advises one not to exclude the opportunity for relevant errors in the protocol, such as the glucose example above.

The consultant’s task is to expose objections that have no merit.

Why you need to be your own patient advocate with lab tests

June 1, 2009

Lab tests have error, and sometimes very large errors. As the last blog entry showed, patient harm can result from certain lab errors. In this blog entry, lab error means an error large enough to result in patient harm. But it is not the error itself that causes harm; it is the clinician acting on the result. When harm occurs from lab error, one can infer that the clinician has not questioned the accuracy of the test. Thus, it’s up to the patient to question the result.

Some examples of how lab results can have error:

Interferences – The HAMA interference from the previous blog is just one example of an interference and can occur on assays other than hCG.

Known bias – Example: PSA values can differ by 22% on average depending on the manufacturer. This is due to how the assay is standardized. Say one’s PSA value was 3.3 as assayed by manufacturer ABC. If next year the assay value remained at 3.3 by manufacturer ABC, but a different manufacturer were used (one with this 22% different standardization), the reported value would be 4.0 on average. Actually, taking into account the ~5% CV of these assays, 95% of the time the value would be between 3.2 and 4.8 (e.g., half the time greater than 4). Unless questioned, this might lead to a biopsy. Often, the manufacturer is not listed on the lab report. To find this out, one must call the lab.

Problems with newer tests – Molecular testing with arrays is the newest type of testing. A recent article (subscription required for full article) showed that the reproducibility was often greater than 30% CV. If one translates this to a glucose test with a true value of 100 mg/dL, 95% of the time, values would be between 40 and 160 mg/dL! Another report showed that different generations of probe sets often had close to 0 correlation. Back to glucose, this is equivalent to running a method comparison between a newer and older machine and getting random scatter rather than a typical result of a correlation > 0.9.
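The glucose translation above is just the usual mean ± 2 SD approximation. Spelled out:

```python
# Arithmetic behind the glucose translation: at a true value of 100 mg/dL
# and a 30% CV, the approximate 95% interval is the mean +/- 2 SD.
true_value = 100.0   # mg/dL
cv = 0.30            # 30% coefficient of variation

sd = true_value * cv
low, high = true_value - 2 * sd, true_value + 2 * sd
print(f"95% of results between {low:.0f} and {high:.0f} mg/dL")  # 40 and 160
```

A span from hypoglycemic to hyperglycemic for the same true value, which is why a 30% CV would be unthinkable for a glucose assay.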

Other problems – There are many other potential problems that would give an error such as a patient sample mix-up, an undetected instrument error, sample pretreatment problems, and so on.

When to question lab tests – In principle, any lab test could be questioned, although this could be (is) impractical. Moreover, the above problems will be unknown to patients not familiar with laboratory medicine and even people who are familiar with error causes may be unaware that an error has occurred.

Two scenarios are suggested to question lab tests:

  • Before a treatment is started, especially a treatment with risks such as surgery
  • If symptoms persist and a lab test was negative

How to question lab tests – Unfortunately, simply repeating a lab test will not always help. It depends on the error source. If the error source is random, then simply repeating the test will help. If the error source is not random, such as caused by an interference, then repeating the test by the same procedure in the same lab will not help. In situations with HAMA interference, part of the problem was that serial measurements gave the same wrong answers which prompted clinicians to continue (the wrong) treatment.

The safest way then is to request a test to be repeated by a different laboratory and preferably a reference laboratory, if one exists for that assay.

And remember – A wrong lab test is a rare event – whereas a result is worth questioning, the likelihood that a lab test is wrong is extremely low.

NO MEDICAL ADVICE: Material appearing here represents opinions offered by non-medically-trained laypersons. Comments shown here should NEVER be interpreted as specific medical advice and must be used only as background information when consulting with a qualified medical professional.