Equivalent QC

March 17, 2005

If you are unfamiliar with equivalent QC, see references 1-2. This essay explores some issues with equivalent QC.


What causes errors in assays and the role of QC

The figure above shows four generic types of error and how QC can prevent two of them. The four error types are explained below:

random patient interference – is an interfering substance or mix of substances that causes a bias in the (patient) result and is often different (e.g., apparently random) in each patient specimen. The incorrect results are repeatable on re-assay. For some or many patient specimens, there may be no observed interferences. QC does not detect this error.

random bias – is any short term bias such as a clog in an analyzer that lasts for a few samples and is not specific to a particular patient specimen. The incorrect results are often not repeatable on re-assay (because the bias has disappeared). QC probably won’t detect this error since the probability of the error occurring during a QC sample is low. Note: The “clog in analyzer” is a case of an error that may be detected by an internal monitoring system. In this example, the error has not been detected by the internal monitoring system.

long-term bias – is any bias such as most calibration error that lasts for at least a day and is thus detected by routine quality control. The “clog in an analyzer” failure could also last for more than a day. This definition is somewhat arbitrary, since some calibration error is short term (e.g., blood gas systems are calibrated more frequently than once a day).

imprecision – is the sum of all very short-term biases (those that occur in less time than one assay result and are modeled as random error) plus longer-term uncompensated biases (for example, drift). Note that imprecision as typically measured in clinical chemistry assays is apparent random error: the true random error plus uncompensated biases such as drift. QC can detect poor imprecision.
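The difference between true and apparent imprecision can be sketched with a toy simulation (the noise level, drift rate, and sample count below are invented for illustration):

```python
import random
import statistics

# Toy model: each result carries true random error (SD = 1.0) plus an
# uncompensated linear drift. The "apparent" imprecision measured from
# the results folds the drift into the random error.
random.seed(1)

n = 200                   # number of results (assumed)
true_sd = 1.0             # true random error SD (assumed)
drift_per_sample = 0.02   # uncompensated drift per result (assumed)

true_random = [random.gauss(0, true_sd) for _ in range(n)]
with_drift = [e + drift_per_sample * i for i, e in enumerate(true_random)]

sd_true = statistics.stdev(true_random)
sd_apparent = statistics.stdev(with_drift)

print(f"true random SD: {sd_true:.2f}, apparent SD with drift: {sd_apparent:.2f}")
```

The apparent SD always exceeds the true SD here, which is the sense in which measured imprecision is apparent, rather than true, random error.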

The effect of various QC schemes on detecting these errors

Error source                   Increased QC           Current QC (2 per day)   Reduced QC
Random patient interference    No effect              No effect                No effect
Short-term bias                Catches more errors    Catches fewer errors     Catches even fewer errors
Long-term bias                 No effect              No effect                Catches fewer errors1
Imprecision                    No effect              No effect                No effect

1For example, if a system is calibrated weekly, and there is calibration error, running QC monthly will frequently miss this error
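The footnote's arithmetic can be checked with a toy simulation (the model and all numbers are mine, not from the essay): an error begins at a uniformly random point in the QC cycle, persists for a fixed duration, and is caught only if a QC event falls inside that window.

```python
import random

# Toy model (assumed): a persistent error begins at a uniformly random time
# within the QC cycle and lasts `duration_days`. It is caught if the next
# QC event falls inside the error window.
def detection_rate(duration_days, qc_interval_days, trials=100_000, seed=0):
    rng = random.Random(seed)
    caught = 0
    for _ in range(trials):
        onset = rng.uniform(0, qc_interval_days)  # error onset within a QC cycle
        # the next QC event occurs at time qc_interval_days
        if onset + duration_days >= qc_interval_days:
            caught += 1
    return caught / trials

week_long_error = 7                          # e.g., a week-long calibration error
print(detection_rate(week_long_error, 0.5))  # QC twice a day: always caught
print(detection_rate(week_long_error, 30))   # QC monthly: usually missed
```

Under these assumptions, twice-daily QC catches every week-long error, while monthly QC catches only about a quarter of them.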

Internal monitoring systems

The rationale behind the reduction in QC frequency is the assertion that internal monitoring systems adequately detect and prevent incorrect results from being reported. Here are some problems with that assertion.

Calibration is hard to control through internal monitoring – It is unlikely that any internal monitoring system can detect all calibration problems. The whole basis of calibration is to associate an assay’s response (signal) with a known concentration. This sets up a calibration equation. Then, with each unknown (patient sample), the response that is found is assigned a concentration according to that equation.

Although there can be limits set on the expected calibrator’s response as well as checks on the shape of the response, there is no real way to prevent other errors and this can lead to calibration bias, which can be detected by QC.

Internal monitoring systems are models and can be wrong – An internal monitoring system is the result of a model of how the system can fail. These models are often based on fault trees and FMECAs (see reference 3). Mitigations are applied to detect and prevent errors through hardware and software. But there is no guarantee that either the model is correct (e.g., that all possible failure modes are included) or that the mitigations applied are 100% effective. In fact, experience has shown that assay development usually starts with a relatively large number of errors. Mitigations are repeatedly applied until a decision is made to release the product (see reference 4). Mitigations also are applied after product release. Of course, errors which affect patient results are classified as the most severe and are given the most attention. The process of repeatedly applying fixes (formally known as reliability growth management) is the most efficient way of developing complex instrument systems and is used because the required knowledge to “design things right the first time” doesn’t exist.

Another view of QC vs. internal monitoring systems

There is another fundamental difference between QC and internal monitoring systems. As stated above, internal monitoring systems are based on a model, whereas QC is largely observational. Observation means that, assuming one has reasonable quality control rules, one requires no knowledge about how the system can fail; one must only run QC. Putting things another way, you can forecast the weather through models (and these can be quite sophisticated) or you can go outside. In terms of the equivalent QC issue, one could suggest having the best internal monitoring systems possible and running QC to detect anything that was missed.

The problem with the validation protocol

The suggested validation protocol is 2 QC samples per day for 30 days. One failed QC that does not repeat is allowed. One can show that this proves with 95% confidence that the proportion of all QC failures is no more than 7.7% (see reference 5). This is “equivalent” – in Six Sigma terms – to a 2.9 sigma process. This is actually the best case because one is not really interested in the QC samples but in the patient samples.
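As a check on these figures, the 95% upper bound can be computed as a Clopper-Pearson interval, and the sigma level via the customary 1.5-sigma shift used in Six Sigma arithmetic. The sketch below uses only the standard library; the bisection tolerance is arbitrary.

```python
from math import comb
from statistics import NormalDist

def upper_bound(n, failures, confidence=0.95, tol=1e-9):
    """Clopper-Pearson upper confidence bound on a binomial proportion,
    found by bisection on the binomial CDF."""
    def cdf(p):  # P(X <= failures) for X ~ Binomial(n, p)
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(failures + 1))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) > 1 - confidence:
            lo = mid          # failure rate could still be larger
        else:
            hi = mid
    return (lo + hi) / 2

p = upper_bound(60, 1)                     # 2 QC/day x 30 days, 1 failure allowed
sigma = NormalDist().inv_cdf(1 - p) + 1.5  # conventional 1.5-sigma shift
print(f"upper bound {p:.3f}, about {sigma:.1f} sigma")
```

This reproduces the 7.7% bound and the roughly 2.9-sigma figure quoted above.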

Cost must always be considered

All of the above does not deal with cost. If cost did not enter into the equation, one would increase QC frequency, not decrease it. However, cost is important: running QC samples adds cost. The more lab tests cost, the fewer people can be tested, and this lack of information will increase morbidity and mortality. Yet if QC frequency is reduced, this may lead to more errors and also increase morbidity and mortality (also see reference 6).


The proposal to reduce QC implies that QC is redundant with internal monitoring systems. I have suggested why this might not be the case. The cost-benefit tradeoff of equivalent QC must be addressed with data, and this does not mean asking each lab to answer the question on its own.


  1. http://www.cms.hhs.gov/CLIA/downloads/6066bk.pdf
  2. http://www.westgard.com/cliafinalrule7.htm
  3. http://krouwerconsulting.com/IFTF.htm
  4. http://krouwerconsulting.com/KrouwerLearningCurve.pdf
  5. Hahn GJ and Meeker WQ. Statistical intervals. A guide for practitioners. Wiley: New York, 1991, p. 104
  6. Krouwer JS. Assay Development and Evaluation: A Manufacturer’s Perspective. AACC Press, Washington, DC, 2002, p. 6.

Why “latent errors” is not a good term

March 17, 2005

One occasionally hears the term “latent errors” in articles about error reduction techniques (1). The purpose of this essay is to explain problems with this term and to suggest alternatives.

Latent implies hidden. Berwick uses the term “latent failures” and equates this with “the little things that go wrong all the time”. These are misleading concepts. Consider an example. In a recent presentation, Astion presents some examples of latent errors and their effects (2).

  • Computers: A lack of 1 instrument interface is responsible for many active data entry errors.
  • Staffing: 1 latent error regarding suboptimal staffing leads to multiple active errors by staff who are forced to multitask.
  • Policy and Procedure: A bad strategy for handling phone calls can lead to multiple errors.

Before analyzing one of these examples, consider a fault tree model of lab error. This is a hierarchical (“top down”) chart of errors which has the following properties:

  • The severe errors are at the top
  • Errors are connected through parent – child connections
  • The parent errors are the “effects” of the child errors
  • The child errors are the causes of the parent errors
  • The tree uses “gates,” which include:
      o or gate means any child event in that branch that occurs will cause the parent error
      o and gate means all child events in that branch must occur to cause the parent error
      o basic gate is the end (cause) of a tree branch

  • Each error is classified as to its:
      o severity
      o probability (likelihood of occurrence)

Consider the staffing error mentioned by Astion. One could postulate one of many fault tree branches that contain this error:

Top – outlier (e.g., incorrect result) reported to clinician

AND – assay has interference to lipemia

AND – sample is lipemic

AND – (visual) detection for lipemic sample failed

OR – technician not available to perform test

OR – technician called away

BASIC – inadequate staffing

Translating the events of this tree into English (sort of), an outlier will be reported if the assay has an interference to lipemia, AND the sample is too lipemic AND the step for visually examining the sample has failed. The visual examination step failure can have several causes, one of which is the technician does not perform this step. This has several causes, one of which is the technician has been called away because the staffing is inadequate (e.g., there is a problem somewhere else that should be handled by staff but, inadequate staffing prevents this).
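As a sketch, this branch can be turned into a small probability calculation. The gate arithmetic assumes independent events, and every probability below is invented purely for illustration:

```python
from functools import reduce

def p_and(*children):
    """AND gate: all child events must occur (independence assumed)."""
    return reduce(lambda a, b: a * b, children)

def p_or(*children):
    """OR gate: any child event suffices (independence assumed)."""
    return 1 - reduce(lambda a, b: a * (1 - b), children, 1.0)

# Hypothetical basic-event probabilities (illustrative only):
p_interference   = 0.90   # assay has an interference to lipemia
p_lipemic_sample = 0.02   # sample is lipemic
p_not_available  = 0.01   # technician not available to perform the check
p_called_away    = 0.05   # technician called away (inadequate staffing)

# Visual detection fails if the technician is unavailable OR called away:
p_check_fails = p_or(p_not_available, p_called_away)

# Outlier reported: interference AND lipemic sample AND failed visual check:
p_outlier = p_and(p_interference, p_lipemic_sample, p_check_fails)
print(f"P(outlier reported) = {p_outlier:.6f}")
```

Even with made-up numbers, the structure makes the point of the next paragraph concrete: every basic event in the branch contributes to the same severe top-level event.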


An outlier that is reported to a clinician is among the most severe errors that a lab can make. Every event in this branch of the tree has the same severity classification because the effect of any of these errors is the top level error. The importance (ranking) of these errors may be different because the probability of each of these errors may be different.

Because the severity is high, there is no reason for calling any of these events “the little things that go wrong.” They are all severe events. There is also no reason to call them latent (e.g., hidden) or to call the top-level error “active,” which implies that the lower-level errors are not active. Any of these events that occurs is active. Whether these events can be detected depends on the programs in place (3) to expose such errors.

To summarize what FMEA recommends:

  1. Flowchart the process
  2. Add process steps to a fault tree
  3. Add causes to each potential process step error
  4. Add FMEA information to each event
  5. Rank the errors
  6. Propose mitigations


Before deciding that one has thought of all possible causes for an error, one should consult a list of QSEs (Quality System Essentials) (3). These are generic activities that apply to virtually all aspects of a service. These activities will typically not appear in flowcharts because they are so pervasive that they would make flowcharts too complicated.

For an additional critique of reference 1, see reference 4.


  1. Berwick DM. Errors Today and Errors Tomorrow. N Engl J Med 2003;348:2570-2572.
  2. Astion M. Developing a Patient Safety Culture in the Clinical Laboratory http://www.aacc.org/AACC/events/expert_access/2005/saftey/
  3. Application of a Quality System Model for Laboratory Services; Approved Guideline—Third Edition GP26-A3 NCCLS 2004 Wayne, PA
  4. Krouwer JS. There is nothing wrong with the concept of a root cause. Int J Qual Health Care 2004;16:263.

See also additional references at the end of the systems not people essay.


FMEA vs. FRACAS vs. RCA – 3/2005

March 11, 2005

FME(C)A – Failure Mode Effects (and Criticality) Analysis
FRACAS – Failure Reporting And Corrective Action System
RCA – Root Cause Analysis

Many people who work in hospitals have never heard of any of these reliability tools. Those that have (often on patient safety committees) have heard of FMEA and RCA, but often not FRACAS. This essay explains the differences among these techniques.

JCAHO requires:

  • FMEA to be performed once a year
  • RCA to be performed for sentinel events and near misses

The use of RCA by hospitals has been critiqued by Berwick (1) who suggests that RCA seeks to find a single cause. I have responded (2) as in my experience, RCA is not limited to seeking single causes. A more important limitation of RCA as practiced by hospitals is that it is often limited, as implied by JCAHO policy, to sentinel events and near misses. This leaves the many less severe events out of the picture.

FRACAS (3-4) is really the same thing as RCA, although FRACAS is often combined with other tools such as reliability growth management, which is based on learning curve theory. Moreover, in FRACAS all observed error events are analyzed, and in this way FRACAS is very similar to FMEA.

The term FRACAS will be used here instead of RCA. Both FRACAS and FMEA can be combined with fault trees. Some attributes of FMEA and FRACAS are shown in the following table.

Attributes of FMEA and FRACAS for a process

Attribute                 FMEA                                     FRACAS
General                   “proactive”                              “reactive”
Purpose                   affect the design before launch          correct problems after launch
Errors                    may occur – the potential errors         have occurred – observed errors
                          must be enumerated                       are simply counted
Error rate                is assumed                               is measured
Issues with technique     Is it complete? Models can be wrong.     All errors counted? Culture
                                                                   inhibits reporting errors.
Can be combined with      fault trees                              fault trees
Evaluate quality of       difficult – completeness and             simple – measure the error rate
the technique             reasonableness of mitigations
                          are qualitative

FMEA and FRACAS can inform Fault Trees

A fault tree is a “top down” structured way1 of representing causes for an undesirable event. Fault trees allow multiple causes for an event and use “AND” and “OR” gates to distinguish between error types. Fault trees can contain both potential and observed errors. Because of this, they are ideal to contain the knowledge expressed in both a FMEA and FRACAS. That is, when a process is designed, the ways it might fail are captured in a fault tree (and FMEA). After the process is launched, the ways in which the process has failed are captured through FRACAS and this knowledge is used to update the fault tree. In both the FMEA and FRACAS, the fault tree is also updated when a mitigation is implemented, since this represents a design change to the process. This is shown in Figure 1.

Figure 1 Use of FMEA, FRACAS, and Fault trees to prevent errors in processes.

Don’t neglect FRACAS

Both FMEA and FRACAS are useful. Yet the JCAHO focuses on FMEA. In a sense this is logical, because FMEA is more encompassing than FRACAS: FMEA addresses potential errors yet can also accommodate observed errors, whereas FRACAS is intended only for observed errors. The problem is that with 98,000 deaths due to medical errors each year, there are a huge number of observed errors, and one may pay insufficient attention to potential errors if one performs only FMEA.

Consider a hypothetical FMEA for a transplant service. Consider two error events:

  1. patient infection after surgery – an observed error
  2. organ selected with incorrect blood type – a potential error

If one goes through the entire service, it is likely that the number of observed error events will cause a ranking problem. Ranking is important because there are limited funds to apply to mitigations. So even though selection of an organ with the wrong blood type may never have occurred, it is possible that the selection process is flawed and could benefit from mitigations. Yet it is also possible that this will not happen because the focus is on observed errors. Hence, one should perform both FMEA and FRACAS, as indicated in Figure 1. This reduces the likelihood of ranking problems, since the FMEA will focus on potential problems and the FRACAS will focus on observed problems.
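One common way to make the ranking explicit is a risk priority number (RPN = severity × occurrence × detectability), a standard FMEA device, though not one this essay prescribes. The scores below are invented for the transplant example; the point is only that a never-observed error can rank alongside an observed one:

```python
# Hypothetical FMEA entries for the transplant-service example.
# Each score is on a 1-10 scale (10 = worst); all values are invented.
events = [
    # (event, severity, occurrence, detectability)
    ("patient infection after surgery (observed)",           9, 6, 3),
    ("organ selected with incorrect blood type (potential)", 10, 2, 8),
]

def rpn(entry):
    _, severity, occurrence, detectability = entry
    return severity * occurrence * detectability

for name, s, o, d in sorted(events, key=rpn, reverse=True):
    print(f"RPN {s * o * d:4d}  {name}")
```

With these scores the potential blood-type error (never observed, but hard to detect) ranks nearly as high as the observed infection, which is the ranking problem a FRACAS-only view would miss.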

A challenge with FMEAs

As indicated in the above table under “Purpose,” use of FMEA is intended to affect the design of a process. Yet, in the medical diagnostics industry, an FDA-required hazard analysis for instrument systems (a fault tree / FMEA for hazards) was at times merely a documentation of an existing design. The same issue exists for FMEAs in hospitals, since many FMEAs will be performed for existing processes.


  1. Berwick DM. Errors Today and Errors Tomorrow. N Engl J Med 2003;348:2570-2572.
  2. Krouwer, JS. There is nothing wrong with the concept of a root cause. Int J Qual Health Care 2004;16:263.
  3. Mil-Std 2155, available at http://www.barringer1.com/mil_files/MIL-STD-2155.pdf
  4. Krouwer, JS. Using a Learning Curve Approach to Reduce Laboratory Error, Accred. Qual. Assur., 7: 461-467 (2002) available at http://krouwerconsulting.com/KrouwerLearningCurve.pdf

1The graphical structure imposed by a fault tree increases the likelihood that an FMEA will be more complete, since an FMEA by itself is basically an unordered list in a table.