Why method comparison and imprecision protocols are biased

June 24, 2011

The purpose of a typical clinical laboratory evaluation for an assay is to determine if the assay has adequate performance. What is implied by adequate performance is that the errors are small enough so that clinicians will not make incorrect medical decisions based on assay error.

The typical protocol for a method comparison experiment to assess performance is the split sample protocol, whereby a sample is processed and then divided in two to be analyzed by both systems.

In a protocol to assess imprecision, sufficient sample is processed to provide a pool that is analyzed repeatedly by the system.

Both of these protocols are biased. The reason is that for either the method comparison or imprecision experiment, the same processed sample is being analyzed rather than collecting and processing each sample separately. The opportunity for error due to processing has been excluded in these experiments.
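To make the size of the omission concrete, here is a minimal sketch. It assumes that processing error and analytical error are independent, so their variances add. The two SD values are hypothetical numbers chosen only for illustration, not data from any evaluation.

```python
import math

def observed_sd(analytical_sd, processing_sd, processing_varies):
    """SD of results, assuming independent processing and analytical errors.

    processing_varies=False mimics a pool or split sample: every replicate
    shares one processing event, so processing adds no variation between
    the results that are compared.
    """
    processing_var = processing_sd ** 2 if processing_varies else 0.0
    return math.sqrt(analytical_sd ** 2 + processing_var)

# Hypothetical SDs in assay units (assumptions for illustration only).
ANALYTICAL_SD = 3.0
PROCESSING_SD = 4.0

protocol_sd = observed_sd(ANALYTICAL_SD, PROCESSING_SD, processing_varies=False)
routine_sd = observed_sd(ANALYTICAL_SD, PROCESSING_SD, processing_varies=True)

print(f"SD seen by the evaluation protocol: {protocol_sd}")
print(f"SD when each sample is processed separately: {routine_sd}")
```

Under these assumed values the protocol reports only the analytical component, while separately collected and processed samples would show the larger combined variation.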

Why does this situation persist? Because another reason – often the main laboratory reason – for performing these evaluations is to answer the question: is this new analyzer or method as good as the existing method? In this sense, the protocols are not biased. So I don’t advocate changing the way the protocols are done, but one should realize that when one considers the other implied goal of these studies, medical utility, the protocols are biased. In fact, the implied goal of medical utility is routinely stated as a conclusion of evaluations, such as “the ABC assay is acceptable for use in monitoring patients with XYZ”.

Detection is important, but not in RPN (Risk Priority Number)

June 22, 2011

At the Westgard web site, the pros and cons of dropping detection from RPN (severity, probability of occurrence, and probability of detection) are discussed. Rather than comment on an abstract level, here is an example of why I favor not including detection.

Assume potassium is being analyzed with the potential for an erroneously high potassium result to be reported to the clinician. This sets the severity to be the same for all causes of this event. There are many, many potential failure causes which require a ranking to focus on mitigations for the top causes that have unacceptable risk. Since all causes have the same severity, and since I don’t favor using probability of detection, it remains to determine the probability of occurrence for each failure mode.

Potential failure (= error) causes are postulated for each process step, and a detailed process map has many process steps. Let’s consider one process step and one potential error cause, along with the additional process steps that deal with this error.

Sample process step – Sample processed to provide serum. A potential error in this step is cell hemolysis.
Detection process step – Technician examines sample for hemolysis.
Recovery process step – If hemolysis has been detected, technician prevents the sample from being analyzed.

For each of these process steps, one must estimate the probability of occurrence for a failure cause. Moreover, each process step can have multiple failure causes. For the sample process step, one could perhaps tally the percentage of detected hemolyzed samples. It is important to understand that both the detection and recovery steps can also fail, and the probability of occurrence of each of these failures must also be estimated. These failure rates would seem to be harder to estimate, although each would have non-cognitive error as a floor.

Once all process step failure modes have been ranked in descending order by probability of occurrence, one would consider mitigations for those process step errors with unacceptable risk.

I don’t see how the addition of classifying probability of detection would help.
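As a minimal sketch of this ranking, the three process steps from the hemolysis example can be listed as separate failure modes, each with its own probability of occurrence. The `FailureMode` class and all probability values are hypothetical and only illustrate the bookkeeping.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    step: str            # process step in the process map
    cause: str           # postulated failure cause for that step
    p_occurrence: float  # hypothetical estimate, per sample

# Severity is the same for every cause (erroneously high potassium reported),
# so probability of occurrence alone determines the ranking.
modes = [
    FailureMode("Sample processing", "cell hemolysis", 0.02),
    FailureMode("Detection", "technician misses hemolysis", 0.05),
    FailureMode("Recovery", "hemolyzed sample is analyzed anyway", 0.01),
]

# Rank in descending order by probability of occurrence.
ranked = sorted(modes, key=lambda m: m.p_occurrence, reverse=True)

for rank, m in enumerate(ranked, start=1):
    print(f"{rank}. {m.step}: {m.cause} (p = {m.p_occurrence})")
```

Here detection appears as a process step whose failure has a probability of occurrence like any other step, so no separate detection score is needed to place it in the ranking.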

CLSI continues to ignore important error sources in its evaluation protocols

June 19, 2011

The majority of errors in the laboratory are not analytical errors but pre- and post-analytical errors. These errors are due to process failures such as using too little sample, analyzing a hemolyzed sample, or failing to withhold results if QC is out.

It is unbelievable to me that EP23 – CLSI’s newest document on risk management – specifically excludes pre- and post-analytical errors (see introduction). In a previous post, I commented that EP23 ignored pre- and post-analytical errors but didn’t realize that this exclusion is actually written into the document. Crazy. There is already a slew of CLSI standards to deal with analytical error, and risk management is the perfect way to reduce the risk of pre- and post-analytical errors.

And now there is new pushback about CLSI’s revised document about total error (EP21-A2). The complaint is about EP21 dealing with “pre- and post- analytical error sources, including those which are not due to the test method”. So if a patient is harmed because a clinician made an incorrect medical decision due to an error in the lab test result, who would try to say that the error was not due to the test method (because it was pre- or post-analytical)? Crazy, again. Back in 1987, the ADA (American Diabetes Association) came out with a standard for glucose meters, whose goal was for “total error (user plus analytical)”.



Why no Pareto in EP23 is a problem

June 14, 2011

I have previously commented on deficiencies in CLSI’s risk management standard EP23. The word Pareto cannot be found in this document. Here’s why that’s a problem.

First, a Pareto chart or table is a means of ranking potential failure modes so that one can concentrate on the top failure modes. The ranking is facilitated by classifying the potential failure modes with respect to severity and probability of occurrence. The ranking is required because financial resources are limited.

In the example for EP23, 13 potential failure modes are considered for a hypothetical glucose assay. With only 13 potential failure modes, each one can be thoroughly analyzed and mitigations proposed. There is no need for Pareto analysis. But in the real world, hundreds of potential failure modes can be enumerated for an assay. This does not mean that the example must contain every potential failure mode, but without a Pareto analysis, one does not demonstrate the ranking required to focus resources on the most serious potential failures. Note that the ISO document on risk management, 14971, while not using Pareto analysis, uses something which achieves the same purpose – the limited (only six) potential failures are put into a risk matrix, which graphs severity against probability of occurrence for each failure mode, allowing one to focus on the most important failures.
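A Pareto table of this kind can be sketched in a few lines. The sketch below assumes that severity and probability of occurrence are each classified on a 1 to 5 scale and that their product is used as the ranking score; that scoring rule, the 80% cutoff, the function name `pareto_top` and the failure mode labels are all assumptions for illustration, not taken from EP23 or the ISO document.

```python
def pareto_top(modes, share_pct=80):
    """Rank failure modes by severity x occurrence (an assumed scoring rule)
    and return the ranked list plus the top modes that together account for
    at least share_pct percent of the total score."""
    ranked = sorted(modes, key=lambda m: m[1] * m[2], reverse=True)
    total = sum(sev * occ for _, sev, occ in ranked)
    top, running = [], 0
    for name, sev, occ in ranked:
        top.append(name)
        running += sev * occ
        # Integer comparison avoids floating point rounding at the cutoff.
        if running * 100 >= share_pct * total:
            break
    return ranked, top

# Hypothetical failure modes: (label, severity class, occurrence class).
modes = [
    ("FM-01", 5, 4),
    ("FM-02", 2, 2),
    ("FM-03", 4, 4),
    ("FM-04", 1, 3),
    ("FM-05", 3, 1),
    ("FM-06", 2, 2),
]

ranked, top = pareto_top(modes)
for name, sev, occ in ranked:
    print(f"{name}: severity {sev}, occurrence {occ}, score {sev * occ}")
print("Focus mitigations on:", top)
```

With hundreds of enumerated failure modes instead of six, the same table shows which few modes deserve the limited resources available for mitigation.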

Not a member of the club – part 2

June 5, 2011

The last entry noted that the NACB (National Academy of Clinical Biochemistry) guidelines for glucose meters (subscription may be required) ignored my Letter (1) and our review article on glucose meter standards (2).

But the guidelines did mention a simulation study, which I have previously critiqued. So let me expand on this critique. The simulation study provided some contour plots relating bias and imprecision to total error (or insulin dosage error rates). Now the beauty of contour plots is that they let one visualize things very nicely. The requirements for a contour plot are two variables that describe a third variable (the response) which is what these authors had. But if there are more than two variables – which is the case – then the contour plots (limited to two variables) are bogus.

I mention in my critique that the authors of the simulation study neglected to account for random interferences. This is not just to account for large errors. For example, hemoglobin is well known to cause a small interference in glucose meters, which although small adds to the overall variability.

I add here that the authors also neglected to account for pre- and post-analytical errors. Now you might ask whether such errors are relevant if someone is just interested in the meter accuracy. Well, observed errors can be an interaction between pre- and post-analytical errors and meter design. And pre- and post-analytical errors that are independent of the meter are still important in assessing insulin dosage errors, which was the purpose of the simulation. The point is that neglecting error sources renders the simulation meaningless.


  1. Krouwer JS. Wrong thinking about glucose standards. Clin Chem, 2010;56:874-875.
  2. Krouwer JS and Cembrowski GS. A review of standards and statistics used to describe blood glucose monitor performance. Journal of Diabetes Science and Technology, 2010;4:75-83.