Equivalent QC – Prevention vs. Detection of Errors

June 17, 2005

A recent article in Clinical Laboratory News summarized the issues with the EQC (Equivalent Quality Control) proposal (1). I previously wrote an essay on this topic and also presented an expert online session at AACC.

This essay deals with a statement in the Clinical Laboratory News article made by Fred Lasky, “You’re much better off working to prevent errors rather than trying to detect them. You will never produce a high quality product if your only way to assure quality is to sort out the bad cherries. That will never be 100% successful.”

This statement is based on a common thread in quality circles which favors prevention over inspection (detection), as in “do it right the first time”. On the face of it, it’s hard to argue with this logic, and it would seem crazy to build systems with problems that can only be uncovered by inspection / detection. Why wouldn’t one design systems right the first time to prevent errors? The answer was discussed previously (2) and has to do with the state of knowledge that is available to the designers. This was also mentioned in the first essay about EQC.

The quality method used depends on the state of knowledge

The states of knowledge can be thought of as

high – where knowledge can be expressed as mathematical equations based on physical properties such as Michaelis-Menten kinetics for an enzyme reaction.

medium – where knowledge can be expressed with empirical equations based on physical properties such as those derived from factorial experiments

low – where knowledge can be expressed semi-empirically, or by hit or miss methods such as trying virtually all surfactants to find one that works.

Designers of diagnostic assays are often faced with all three states of knowledge, as most assays today are a complex set of integrated technologies. When the state of knowledge is high, engineers and scientists design out errors. However, when the state of knowledge is medium or low, detection strategies become important. These may be built into the system (e.g., internal monitoring systems) or may be external (e.g., traditional QC). As illustrated in the AACC presentation (slide 15), internal monitoring systems can detect lower level system errors, which implies significant knowledge of the system. On the other hand, external quality control can detect errors without knowledge of the system. Thus, external QC provides a net to catch errors that designers missed, either because their knowledge was inadequate to design out all errors or because the model of how the system might fail (e.g., from risk analysis), on which their internal monitoring systems were based, was incorrect.

As shown in slide 7 from the AACC presentation, laboratory errors that cause harm to patients are often caused by a cascade of errors. The cascade may be broken by either preventing or detecting an error – hence detection plays an important role in quality and should not be relegated to a lesser status. One could only devalue QC if it could be shown to have no benefit. This could in principle be shown by the so-called option 4. The above discussion suggests that unless the state of knowledge is high enough, external QC will have benefit.
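The role of detection in breaking a cascade can be sketched with a little arithmetic. The example below is only an illustration: it assumes that the steps of the cascade fail independently, and every probability in it is invented for the example rather than taken from the AACC presentation.

```python
from math import prod


def harm_probability(step_failure_probs):
    """Probability that every step in the cascade fails.

    Assumes the steps fail independently, which is a simplification.
    """
    return prod(step_failure_probs)


# Hypothetical cascade: an analytical error occurs (1%), internal
# monitoring misses it (20%), and the result is acted on unquestioned (50%).
without_external_qc = harm_probability([0.01, 0.2, 0.5])

# The same cascade with external QC added as one more barrier that is
# assumed to miss 10% of such errors.
with_external_qc = harm_probability([0.01, 0.2, 0.5, 0.1])

print(f"without external QC: {without_external_qc:.6f}")
print(f"with external QC:    {with_external_qc:.6f}")
```

Under these assumptions, any detection step that catches some errors lowers the chance that the cascade runs to completion, whether or not the designers anticipated the failure mode.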

Finally, reliability growth management, a detection method of reliability improvement based on learning curve theory, was a key success factor for several instrument systems at Ciba Corning. It complemented design strategies and became part of them (3). The acronym used in mil-stds is TAAF (Test, Analyze, And Fix), which is politically incorrect in today’s quality world but nevertheless highly successful.


  1. McDowell, J. Revisiting equivalent quality control. Clinical Laboratory News, June 2005, pp 1,3,6.
  2. Assay Development and Evaluation: A Manufacturer’s Perspective. Jan S. Krouwer, AACC Press, Washington DC, 2002, pp 28,61.
  3. Assay Development and Evaluation: A Manufacturer’s Perspective. Jan S. Krouwer, AACC Press, Washington DC, 2002, pp 60-67.

Pay for performance – the missing measure

June 17, 2005

Recently it has been suggested that, to improve quality, hospitals be paid based on their performance, where performance is taken to mean, among other things, a lower rate of medical errors. This of course implies that something will be measured. The number of measures suggested is huge. An idea of the extent of these measures is shown in the following table (1).

Number of measures for selected categories

Category                                       Number of Measures
Process                                        343
Patient experience                              83
Outcomes                                       129
Cardiovascular diseases                        144
Pathological conditions, signs and symptoms    160

A global (total medical error) measure is needed

Specific measures are useful. However, what is also needed is a global or total medical error measure, e.g., the sum of all severity-weighted individual medical error rates. Total medical error measures can exist for specific services, and there can be one hospital-wide global measure. For laboratory medicine, a list of specific measures has been suggested (2). Whereas all of these measures are useful, one would still like to know the overall medical error rate for the lab.

A financial analogy

Investors look at many measures to decide whether to invest including the overall measure – profitability. It’s hard to imagine omitting this measure but that is what is being proposed in pay for performance. In financial reporting, if revenue is reported but not cost, there is no way to estimate profitability. The way profitability is reported is standardized so that companies can be compared. The same needs to happen for pay for performance, with a total medical error rate measure.

Some examples of problems when there is no global measure

In laboratory medicine, errors are often divided into the categories pre-analytical, analytical, and post-analytical. Analytical errors are often given the most attention even though their frequency is lower than that of the other categories. Part of the reason is that it is relatively easy to quantify many analytical performance parameters. Even within the analytical error category, insufficient attention is given to some important errors such as interferences (3) because their estimation is more difficult. Moreover, in estimating certain parameters such as average bias, outlier data are often discarded. While this is legitimate with respect to average bias estimation, it is possible that the discarded data (and the origin of these errors) will disappear from consideration even though their effect will still be observed. There have been attempts to model total error (for analytical errors), which have their own problems. For example, in a GUM (Guide to the Expression of Uncertainty in Measurement) like approach (4), important errors were ignored if they were infrequent. In another case, an analytical total error model was shown to be incorrect (3). The possibility of incorrect models is remedied by a direct measure of total analytical error (5), which does not rely on a model.
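As a rough illustration of what a direct, model-free estimate might look like, the sketch below takes the differences between a candidate assay and a comparison method and reports the empirical limits that contain a stated percentage of them, with no data discarded. It is one possible approach and not necessarily the method of reference 5; the function names and the data are hypothetical.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile (p from 0 to 100) of a sorted list."""
    if not sorted_values:
        raise ValueError("no data")
    k = (len(sorted_values) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def total_error_limits(candidate, comparison, coverage=95.0):
    """Empirical limits containing `coverage` percent of the differences.

    Every paired difference is kept, including outliers, so infrequent
    large errors still influence the reported limits.
    """
    if len(candidate) != len(comparison):
        raise ValueError("paired results are required")
    diffs = sorted(c - r for c, r in zip(candidate, comparison))
    tail = (100.0 - coverage) / 2.0
    return percentile(diffs, tail), percentile(diffs, 100.0 - tail)


# Hypothetical paired results; the last candidate value is an outlier
# that an average-bias calculation might have discarded.
comparison = [100.0] * 10
candidate = [100.5, 99.0, 101.0, 100.0, 98.5, 101.5, 100.0, 99.5, 100.5, 112.0]
lower, upper = total_error_limits(candidate, comparison)
print(f"95% of differences fall between {lower:.2f} and {upper:.2f}")
```

Because nothing is assumed about how the errors arise, a wrong error model cannot bias the estimate, although a realistic study would need far more than ten samples.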

In the pre-analytical area, a patient sample mix-up is an important error, but it is uncommon for this type of error to be compared to analytical errors, and it is also uncommon for attribute data to be compared to continuous variables such as bias. Without a Pareto-like analysis of all observed errors, it is possible that resources will not be allocated to provide the quickest reduction of all medical errors according to their importance. The fact that different people may deal with different errors also complicates matters in the absence of a Pareto, since the skill that people have in lobbying for funds may be out of whack with the results from a conceptual Pareto (i.e., one that could exist but doesn’t). Patient sample mix-ups would be investigated using FMEA, FRACAS, or root cause analysis, while bias would be investigated with statistical analysis from a method comparison.

There is also the problem of goals when one has a series of individual medical errors. How does one realistically set the error rate reduction for each error?

FRACAS (and FMEA) allow for a global measure

FRACAS – Failure Reporting, Analysis, and Corrective Action System

FMEA – Failure Mode and Effects Analysis

People in hospitals are familiar with the classification scheme used in FMEA to classify errors – the same one is used in FRACAS. That is, each error is classified (numerically) according to its severity and frequency of occurrence. The two numbers are multiplied together to get criticality. One can add up this criticality and, by means of a usage factor, arrive at a global medical error rate. That is, one has not only a Pareto of individual medical errors, but also a measure of the total medical error rate, which is simply the sum of all elements in the Pareto. Note that to reduce observed errors, FRACAS, and not FMEA, will apply.
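The arithmetic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the 1 to 5 severity scale, the use of observed counts for frequency, the interpretation of the usage factor as a simple denominator (here, the number of samples processed), and all of the data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorEvent:
    name: str
    severity: int   # hypothetical scale: 1 (minor) to 5 (severe)
    frequency: int  # observed occurrences in the reporting period

    @property
    def criticality(self) -> int:
        # Criticality is severity multiplied by frequency of occurrence.
        return self.severity * self.frequency


def pareto(events):
    """Events ranked from highest to lowest criticality."""
    return sorted(events, key=lambda e: e.criticality, reverse=True)


def total_error_rate(events, usage):
    """Sum of all criticalities divided by an assumed usage factor."""
    if usage <= 0:
        raise ValueError("usage must be positive")
    return sum(e.criticality for e in events) / usage


# Hypothetical errors spanning pre-analytical, analytical, and
# post-analytical categories, all on one scale.
events = [
    ErrorEvent("patient sample mix-up", severity=5, frequency=4),
    ErrorEvent("assay interference", severity=3, frequency=10),
    ErrorEvent("delayed result", severity=2, frequency=25),
]

for event in pareto(events):
    print(f"{event.name}: {event.criticality}")
print("total:", total_error_rate(events, usage=10_000))
```

Putting attribute errors such as mix-ups on the same severity and frequency scale as analytical errors is what allows them to appear in a single Pareto and a single total.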

Given the Pareto based on a FRACAS, one can apply tools such as reliability growth management which allows one to track progress and predict when a total medical error rate goal will be achieved, as was shown for medical instrument reliability (6). For this case, analytical performance problems such as bias and hardware failures which affected availability (e.g., turn-around-time) were all classified and captured in the FRACAS.
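To make the idea of tracking progress and predicting a goal date concrete, here is a minimal sketch that assumes a power-law (Duane-type) learning curve, in which the cumulative error rate falls as k * t^(-alpha). The essay does not say which learning curve model was used, so the model choice, the function names, and the monthly data are all assumptions made for illustration.

```python
import math


def fit_learning_curve(times, cumulative_rates):
    """Least-squares fit of log(rate) = log(k) - alpha * log(t).

    Returns (k, alpha) for the assumed model rate = k * t ** (-alpha).
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(r) for r in cumulative_rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


def time_to_goal(k, alpha, goal_rate):
    """Time at which the fitted cumulative rate reaches the goal."""
    if alpha <= 0:
        raise ValueError("no improvement trend, so no goal date is predicted")
    return (k / goal_rate) ** (1.0 / alpha)


# Hypothetical cumulative total error rates observed over six months.
months = [1, 2, 3, 4, 5, 6]
rates = [0.050, 0.041, 0.036, 0.033, 0.030, 0.028]

k, alpha = fit_learning_curve(months, rates)
print(f"k = {k:.4f}, alpha = {alpha:.3f}")
print(f"predicted month to reach 0.02: {time_to_goal(k, alpha, 0.02):.1f}")
```

The fitted slope alpha summarizes how quickly errors are being removed, and the prediction is only as good as the assumption that the same rate of learning continues.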

Of course, people can argue that it is difficult in hospitals to have an error reporting program, since for a variety of reasons, there can be resistance to report medical errors. This is a problem that needs to be addressed. However, this problem exists whether one has a total medical error rate measure or a selection of individual medical error rate measures.

Specific measures are still needed

One can only reduce the total medical error rate by reducing individual medical error rates. Specific individual measures that make up the total can receive focus according to their ranking in a Pareto chart, and this is the fastest way to reduce the overall error rate. Note that the top individual measures from a Pareto chart may not correspond to a pre-designated list of measures. For example, one suggested pay for performance measure is the percent of patients receiving aspirin after undergoing coronary bypass surgery. But it is possible that a hospital meets its goal for this measure while other errors that lead to morbidity and mortality are high in the Pareto chart and are not on a pay for performance schedule.


  1. National Quality Measures Clearinghouse, see http://www.qualitymeasures.ahrq.gov/
  2. Hilborne, L. Developing a Core Set of Laboratory Based Quality Indicators.
  3. Krouwer, JS. Setting Performance Goals and Evaluating Total Analytical Error for Diagnostic Assays. Clin Chem 2002;48:919-927.
  4. Krouwer, JS. A Critique of the GUM Method of Estimating and Reporting Uncertainty in Diagnostic Assays. Clin Chem 2003;49:1218-1221.
  5. Krouwer, JS and Monti, KL. A Simple Graphical Method to Evaluate Laboratory Assays. Eur J Clin Chem Clin Biochem 1995;33:525-527.
  6. Krouwer, JS. Using a Learning Curve Approach to Reduce Laboratory Error, Accred. Qual. Assur., 7: 461-467 (2002) available at http://krouwerconsulting.com/KrouwerLearningCurve.pdf

ISO Terminology – globally harmonized or gobbledygook – 6/2005

June 11, 2005

One of my hobbies is foreign languages, so if it were required to communicate clinical chemistry information in a foreign language, regardless of the language chosen, this would be entertaining for me. However, hobbies aside, one has to ask how useful is the current trend towards “global harmonization of terminology.”

Consider the following passage:

“The total error of that assay was marginal – the within-run imprecision was good but interferences caused accuracy problems. For details, see the appendix.”

Translated into an ISO globally harmonized version gives:

“The accuracy of that measurand was marginal – the repeatability was good but influence quantities caused bias problems. For details, see the annex.”

For most speakers or readers of English, the second version is less comprehensible. This is particularly troublesome since statistical concepts are difficult, and changing terms often makes their comprehension more difficult. For example, consider some sections of the ISO accuracy definition (1).

  • Accuracy is a qualitative concept
  • Accuracy cannot be given a numerical value in terms of the measurand, only descriptions such as ‘sufficient’ or ‘insufficient’ for a stated purpose.
  • Accuracy of measurement is related to both trueness of measurement and precision of measurement;

What does all of this mean? For one, in ISO terms, the phrase “The accuracy for 95% of the results was between ± 5 mmol/L.” would be incorrect. This is a little difficult to understand since, if this result met the accuracy goal, it would be correct in ISO terms to state: “The accuracy was sufficient.” Moreover, prior to global harmonization, accuracy meant how much bias there was. As an ISO term, accuracy refers to errors from all sources (e.g., bias and imprecision), so every time accuracy is used, one must try to determine which definition was intended.

How many people will use the word measurand? What’s wrong with the word assay or analyte? What does “influence quantity” offer over interference?

Consider the simple words “annex” and “appendix” (2).

Appendix is defined (definition 1) as “additional or supplemental material at the end of a book or other writing”.

Annex is defined (definition 5c, noted as archaic) as “a section added as to a document or addendum”.

Remember Esperanto?


  1. See http://www.clsi.org/ and click on harmonized terminology database.
  2. Webster’s New World Dictionary, Third College Edition. Neufeldt, Ed. 1988, Simon and Schuster, New York.