A pedigree of approaches to reduce error

June 15, 2007

James T. Reason is a psychologist who has been working in the field of human error for many years (1-2). My background comes from studying and using defense industry reliability tools such as fault trees, FMEA, FRACAS, and reliability growth management. Reason’s influence can be seen in articles and standards (3-6) that incorporate ideas from cognitive psychology.

Some comments on the Reason paper

Reason uses the “Swiss cheese” error model (this has made it into the CLSI standard GP32). Here, holes (i.e., errors) in a hunk of Swiss cheese need to line up before an actual error event is observed. Yet, this can also be represented by a fault tree, where the errors are events connected by an AND gate. The probability of an actual error event (the parent of the AND events) can be calculated by combining the probabilities of the individual (child) error events. While perhaps less colorful than the Swiss cheese model, the fault tree is more amenable to actually estimating error rates.
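As a minimal sketch of the AND gate calculation, the following assumes the child error events are independent, so that their probabilities simply multiply. The function name and the three example probabilities are illustrative assumptions, not values from Reason or GP32.

```python
from math import prod

def and_gate_probability(child_probabilities):
    """Probability that every child event occurs, assuming independence."""
    return prod(child_probabilities)

# Hypothetical probabilities that each "hole" is open for a given sample.
holes = [0.05, 0.10, 0.20]
p_error_event = and_gate_probability(holes)
print(f"{p_error_event:.4f}")
```

If the child events are correlated (for example, several checks skipped on the same busy shift), the simple product would understate the parent probability.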

Reason also uses the terms active and latent errors. If one reads the Reason article, one gets the point. However, this concept is abstracted by other authors in a confusing way. For example, latent errors are defined in GP32 as “less apparent failures of organization or design that contributed to the occurrence of errors or allowed them to cause harm to patients.” Since one would logically fix apparent errors, this definition seems to make virtually all errors “latent.” Moreover, in references 3 and 6, these authors use latent as an error classification. In FRACAS, there are two important ways to classify observed errors: by severity and by frequency of occurrence. Other classifications are secondary, albeit often useful, but classifying errors as latent is simply too confusing.

The terms cognitive error and non cognitive error are quite useful, even though they will be blank for non human errors. Non cognitive errors are usually considered to be non preventable, which implies detection recovery schemes to prevent the effects of such errors from occurring. Cognitive errors are usually considered to be preventable, which implies control measures such as better training.

Some comments on the Astion paper

The work by Astion et al. on laboratory medical errors (3) contains a wealth of information, yet some aspects of their paper prompt me to comment.

One gets the impression that a fundamental way to prevent adverse effects has been missed in the paper. This is inferred from the definition of adverse and potential adverse events.

“A potential adverse event was defined as an error or incident that produced no injury but had the clear potential to do so. Such errors may have been intercepted before producing harm or may have reached the patient but, by good fortune, produced no injury.”

The second sentence has the key words that errors may have been intercepted (by good fortune) before producing harm. This seems to ignore the hierarchical relationship of events as expressed in a fault tree. That is, higher level events are effects and lower level events are causes. For many lower level events (causes) that occur (e.g. errors), there is a process step designed to detect this error and recover from it, thereby preventing the higher level adverse event. (See transplant example below.)

One can also question the usefulness of the classification of adverse events vs. potential adverse events. The clinical laboratory is removed from the patient. In almost all cases, patient harm related to laboratory errors starts out as a potential adverse event – whether it becomes an actual adverse event often results from circumstances outside the control of the clinical laboratory.

Preventability is defined as

“A preventable problem was considered an error that was reasonably avoidable, in which the error was a mistake in performance or thought. Preventability was scored on a scale of 1, definitely not preventable, to 5, definitely preventable. A score of 3 or more indicated a preventable incident.”

It would seem that preventability is synonymous with whether the error is cognitive or non cognitive. This also neglects the fault tree model. It is not clear from the Astion et al. definition as to which events are considered for preventability. As an example, consider a chain of two events:

  1. the wrong blood type organ is selected for transplantation
  2. the wrong blood type organ is transplanted

Error 1 as a cause could be considered as non preventable, being a “slip” or non cognitive error. However, the effect of error 1, which is event 2, can be prevented by instituting checks to detect error 1 and recover from it. Explicitly calling out the chain of events adds clarity, especially since detection recovery sequences are different from preventing errors: detection recovery sequences prevent the effects of errors.

Another issue is that preventability is viewed in terms of preventing the occurrence of events without explicit reference to the control measure that would be used. Yet, the choice of the mitigation or control measure is key as it defines the expected effectiveness and cost of the control measure. As an example, assume that a medication error is caused by a pharmacist misreading difficult to read handwriting from a physician. One can envision at least two possible control measures:

  1. a process step to call the physician if the pharmacist is in doubt
  2. institute a CPOE (computerized physician order entry) system

Control measure 1 could be questioned about its effectiveness, but is low cost. Control measure 2 is highly effective but high cost. Control measure 2, as any measure that is a cost burden, depends on the financial status of the institution. Thus, any preventability ranking needs to take into account the specific control measure intended. The authors’ Table 7 (preventability) lists error causes, but the likelihood of preventability should refer to control measures, none of which are listed.

This work contains the lion’s share of the effort of a FRACAS. That is, actually collecting all of the error events is often the most difficult step. Using the data in a more traditional FRACAS would have been an improvement. Thus, table 2 is the severity and frequency of occurrence table, but is not labeled as such, nor is there a severity ranking. There is also no criticality (severity x frequency of occurrence) and no Pareto chart. A problem with the other tables is that they should really not appear as independent tables. Take cognitive and non cognitive errors. One does not want to know the overall split between these two items; one wants to know this split for each of the severity categories in table 2. Given this missing constraint, the classifications:

  • phase of laboratory testing
  • cognitive vs. non cognitive error
  • responsibility for the incident

are all valuable to help focus corrective action resources.
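The severity and frequency analysis described here can be sketched in a few lines. This is only an illustration under stated assumptions: the incident log, the severity scale, and the error descriptions are all hypothetical and do not come from the Astion paper. The sketch computes criticality (severity x frequency of occurrence), orders the error types as a Pareto chart would, and then splits cognitive vs. non cognitive errors within each severity category.

```python
from collections import Counter

# Hypothetical incident log: (description, severity 1-4, cognitive error?)
log = [
    ("specimen mislabeled", 4, False),
    ("specimen mislabeled", 4, False),
    ("wrong test ordered", 3, True),
    ("data entry error", 2, False),
    ("data entry error", 2, False),
    ("data entry error", 2, False),
    ("result misinterpreted", 3, True),
]

# Frequency of occurrence per error type, and criticality = severity x frequency.
frequency = Counter(desc for desc, _, _ in log)
severity = {desc: sev for desc, sev, _ in log}
criticality = {desc: severity[desc] * n for desc, n in frequency.items()}

# Pareto ordering: largest criticality first.
pareto = sorted(criticality.items(), key=lambda item: item[1], reverse=True)

# Cognitive vs. non cognitive split within each severity category.
split = Counter((sev, "cognitive" if cog else "non cognitive") for _, sev, cog in log)

for desc, crit in pareto:
    print(f"{desc:22s} criticality={crit}")
for (sev, kind), n in sorted(split.items()):
    print(f"severity {sev}: {kind} = {n}")
```

The same grouping key could be extended with phase of testing or responsibility, so that each classification is reported per severity category rather than as an independent table.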

The bottom line

Here’s what needs to be done. Consider for a moment what a business does. They collect and analyze a lot of data, but all businesses have one (and only) one key number that they report – profitability. All data analysis is used by the business to inform decisions as to how to improve profitability.

There’s nothing like that in these references. What one needs is an overall rate of patient safety errors. This could be one rate if one can combine (by a weighting scheme) high, moderate, and low risk errors. If not, then one would have three rates – those of high risk errors, moderate risk errors, and low risk errors. Everything else – all analyses, corrective actions, and so on – should be geared towards reducing the error rates to acceptable levels. This last statement means that the clinical laboratory requires goals for each error rate. There is no mention of goals in these references.
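One possible way to express such rates and goals is sketched below. Everything numeric here is an assumption made for illustration: the yearly counts, the weights, the goals, and the choice of a weighted sum as the combining scheme are not taken from any of the references.

```python
# Hypothetical yearly counts, weights, and goals (per 10,000 reported results).
results_reported = 100_000
error_counts = {"high": 4, "moderate": 30, "low": 200}
weights = {"high": 10.0, "moderate": 3.0, "low": 1.0}
goals_per_10k = {"high": 0.5, "moderate": 2.0, "low": 25.0}

def rate_per_10k(count, total):
    """Error rate expressed per 10,000 reported results."""
    return 10_000 * count / total

# Three separate rates, one per risk category.
rates = {risk: rate_per_10k(n, results_reported) for risk, n in error_counts.items()}

# One combined number: a weighted sum of the three rates (one scheme of many).
weighted_rate = sum(weights[risk] * rates[risk] for risk in rates)

# Compare each rate with its goal.
meets_goal = {risk: rates[risk] <= goals_per_10k[risk] for risk in rates}

for risk in rates:
    status = "meets goal" if meets_goal[risk] else "above goal"
    print(f"{risk:9s} {rates[risk]:6.2f} per 10,000 ({status})")
print(f"weighted combined rate: {weighted_rate:.1f}")
```

Whatever weighting is chosen, the point is that each rate is tracked against an explicit goal, so that corrective actions can be judged by whether they move the number.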

It is possible to have non patient safety errors as well. These could be ranked for severity as well and have their own importance. Thus errors for accreditation or cost are also important.


  1. Reason JT. Human Error. New York, NY: Cambridge University Press; 1990
  2. Reason J. Education and debate Human error: models and management. BMJ 2000;320:768-770 available at http://www.bmj.com/cgi/content/full/320/7237/768
  3. Classifying Laboratory Incident Reports to Identify Problems That Jeopardize Patient Safety Michael L. Astion, MD, PhD, Kaveh G. Shojania, MD, Tim R. Hamill, MD, Sara Kim, PhD, Valerie L. Ng, MD Am J Clin Pathol 120(1):18-26, 2003 available at http://www.medscape.com/viewarticle/458299
  4. CLSI GP32 Management of Nonconforming Laboratory Events; Proposed Guideline
  5. ISO TC212 22367 Technical Report: Medical laboratories — Reduction of error through risk management and continual improvement.
  6. Carraro, P and Plebani, M Errors in a Stat Laboratory: Changes in Type and Frequency since 1996 2007;53: . This is coming out in June.
  7. Krouwer JS. Using a Learning Curve Approach to Reduce Laboratory Error, Accred. Qual. Assur., 7: 461-467 (2002).

Customer Misuse – What it is and what to do about it

October 18, 2006

The Issue

A manufacturer designs a diagnostic assay system and after considerable testing, the assay is approved by regulators and sold. Subsequently, there are some patient harm incidents traced back to incorrect results from this assay. Upon analysis, the manufacturer maintains that the incorrect results were caused by customer misuse.

The spectrum of customer misuse

When I worked for a diagnostic company, I remember a product that had poor reliability. In meetings devoted to solving the reliability issues, the head of engineering claimed for many of the problems that he could do nothing, because the problem was caused by customer misuse. One of these problems stuck in my mind because the customer was required on a regular frequency to disassemble a valve and clean it. To frequently rebuild the valve seemed excessive and the next generation product employed a new design which obviated this maintenance.

An example on another end of the spectrum is that some instrument systems allow the user to delay a required calibration. If the user continues to delay the calibration and to either not run quality control or to ignore failed quality control, incorrect results could easily be generated.

Although in the second case, one could argue that the user had violated the policy set up by the laboratory, the same might be true for the first case.

Of course there will be customer misuse issues which are less black and white (if in fact the above examples are).

These are hypothetical examples. A real example was reported for a home glucose analyzer (1) when users did not completely insert the reagent strip and got incorrect results leading to some hospitalizations. Not completely inserting the strip did not cause any error message and also represented an example of not following the instructions (i.e., customer misuse). In this case, the government successfully brought legal action against the manufacturer. In another example, a clinical laboratory (Maryland General Hospital) grossly violated its own policies (2).

Regarding customer misuse and blame, the taxonomy for errors described by Marx (3) is of interest.

The FMEA (fault tree) approach to customer misuse

If one sets up a FMEA or fault tree, the effect “incorrect results” can be caused by a variety of events, and the cause of these events can be the customer doing something incorrectly (customer misuse). Actually, some companies divide FMEAs into separate categories, with one FMEA devoted to customer use (i.e., misuse).

There are several questions to be addressed in these customer use FMEAs (as with all FMEAs).

  • what is the severity of the event caused by misuse – e.g., will it lead to potential patient harm, such as by causing incorrect results
  • what is the estimated probability of occurrence
  • what is the best control or mitigation to prevent this misuse error from occurring
  • what is the best way to detect this misuse error and recover from it

The spectrum of solutions for preventing customer misuse

Just as there is a spectrum of customer misuse, there is a spectrum for the mitigations to prevent customer misuse. The default mitigation for customer misuse is virtually always customer training ranging from the instruction manual (and offshoots such as videos) to onsite training. One must understand that since there is always an instruction manual (or package insert), the mitigation is to improve the instruction manual. The mechanism to do this involves usability testing. The other end of the spectrum is redesign. As the reliability consultant Ralph Evans suggested “make it easy to do the right thing and hard to do the wrong thing.” A previous essay on a medical error also illustrates this spectrum.

Some of the mitigations are also not that black and white. Consider a manufacturer that has conducted extensive interference testing for an assay and has reported in the product insert that 7 drugs interfere with the assay and when any of these substances are present above the concentrations listed, the manufacturer’s assay should not be used. If the clinical laboratory is wired into the hospital’s EMR (Electronic Medical Record) assuming that an EMR exists, one could suggest that rules could be built into the LIS (Laboratory Information System) to follow the manufacturer’s recommendation. Without these computerized systems, one would have to manually inspect (potentially) each patient’s medical record, which is a daunting task.
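A rule of this kind might look like the sketch below. It assumes, and this is a strong assumption, that the EMR can supply a concentration (or an estimate) for each drug a patient is taking. The drug names, limits, units, and function names are hypothetical placeholders and do not describe any real LIS or product insert.

```python
# Hypothetical limits (drug name -> maximum concentration in mg/L) standing in
# for the interference table in a product insert.
INTERFERENCE_LIMITS = {"drug_a": 50.0, "drug_b": 5.0}

def interfering_drugs(patient_drug_levels, limits=INTERFERENCE_LIMITS):
    """Return the listed drugs whose level exceeds the insert's limit."""
    return sorted(
        drug for drug, level in patient_drug_levels.items()
        if drug in limits and level > limits[drug]
    )

def assay_allowed(patient_drug_levels):
    """True when no listed interferent is above its limit."""
    return not interfering_drugs(patient_drug_levels)

print(interfering_drugs({"drug_a": 72.0, "drug_c": 900.0}))  # drug_c is not listed
print(assay_allowed({"drug_b": 1.2}))
```

Note that such a rule only covers the substances the manufacturer tested; an unlisted interferent passes through silently.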

The current environment

Ever since the Auto Analyzer was invented, the trend has been towards instruments that are easier to use. Since clinical laboratory staff are less trained today, manufacturers design their products to be easier to use to gain competitive advantage and advertise this feature. Regulators such as the FDA also recognize the value of ease of use and require hazard analysis.

Ease of use is thus a key product attribute for many systems and fulfills Ralph Evans’s suggestion. But there will nevertheless always be customer misuse issues, and each one must be considered. Some will be shown to be the responsibility of the manufacturer, some the responsibility of the clinical laboratory, and for some, agreement on responsibility will never be reached.


  1. Assay Development and Evaluation: A Manufacturer’s Perspective. Jan S. Krouwer, AACC Press, Washington DC, 2002, pp 1-3.
  2. See: http://www.westgard.com/essay64.htm.
  3. Marx, D. Patient Safety and the “Just Culture”: A Primer for Health Care Executives http://www.mers-tm.net/support/Marx_Primer.pdf

Detection Systems – Fault Isolation, Automation, and Diagnostic Accuracy – 6/2006

June 12, 2006

First, a quick review

A clinical laboratory’s product is the report provided to clinicians, whose main element is the assay result. The result needs to be as error free as possible to prevent harm to patients. Assay performance goals can be expressed in terms of error grids such as are available for glucose. It is helpful to conceptualize clinical laboratory errors in terms of a fault tree or FMEA. The top level error one wants to prevent is providing an incorrect result to a clinician.

Another possible top level error is delay in the reporting of a result – to keep things simple that is not considered here, but could also lead to patient harm.

This top level error is the “effect” of many possible lower level errors (e.g., causes). In order to prevent the top level error, the clinical laboratory’s quality program tries to address lower level errors either by

  • preventing errors or
  • detecting and recovering from errors.

Note that detection without recovery is not useful and that these are two (separate) steps.

The use of quality control

Quality control is a means of detecting errors. The recovery part of quality control is simple – after a failed quality control result is observed, no patient results obtained since the last successful quality control are reported. This raises an immediate concern about the CMS proposal to allow quality control to be run once a month, as this makes recovery rather useless – all of these potentially incorrect patient results will have been reported to clinicians. To summarize, quality control detects lower level errors and prevents the effect of these errors. In this way, it blocks the error cascade expressed by a fault tree or FMEA.

There is another task that clinical laboratories must do after a failed quality control, and that is to determine why the quality control failed, so as to correct the problem. This is where fault isolation plays a role.

Fault Isolation – Why it’s important

Fault isolation, when it is present, refers to a detection system which points to a single root cause for the failure. To see why this is important, consider the following case, where incorrect results are generated by an assay system because of reagent degradation caused by the reagent being stored above its maximum allowable storage temperature. To prevent this error, training would be used and perhaps redundant refrigeration systems. In addition, consider two different detection systems to deal with this failure.

Fault isolation absent

Quality Control – The bad reagent can lead to a failed QC. Since failed QC can be caused by many factors, there is no fault isolation. So one must follow a troubleshooting protocol to determine the root cause of failed QC. This troubleshooting ensures that the next set of results will not fail QC – at least not for that root cause!

Fault isolation present

Temperature Sensor on Reagent – A sensor on the reagent box that indicates storage at too high a temperature by a color change does have fault isolation. Of course this relies on another detection step, where one looks at the temperature sensor.


Ideally, one would like all detection systems to have fault isolation, since no troubleshooting is then required, which returns the system more quickly to an error free state. But to design in detection systems with fault isolation for all errors, one must have a complete knowledge of all the ways a system can fail.

For the reasons this knowledge is often not the case, see the AACC expert session.

The value of quality control is that in many cases it detects errors, even though no one (the clinical laboratory or the manufacturer) has knowledge that such an error may occur. The disadvantage of quality control is that there is no fault isolation and a corrective action could involve a substantial amount of work. When this corrective action occurs before product release, it is simply part of product development, but when it occurs after product release in a clinical laboratory, it is also product development but conducted in part by the clinical laboratory.

Automated detection recovery systems:

Automated detection recovery systems are desirable and are prevalent on instrument systems. As an example, a sample’s response curve is evaluated by an algorithm. The algorithm can detect whether the response is too noisy, and if so signal the analyzer to suppress reporting that result (i.e., the recovery). Note that both the previous temperature sensor detection system and quality control are manual detection recovery systems.

There is no guarantee that an automated detection recovery system has fault isolation. In the noisy response example, there is no indication of what is causing the noise. For example, it could be a lipemic specimen or alternatively a dirty reaction chamber.
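The noisy response example can be sketched as follows. This is not how any particular analyzer works: the straight-line model, the noise limit, and the function names are assumptions chosen to keep the illustration short. Detection is the noise check, and recovery is the suppression of the result.

```python
from statistics import pstdev

def residual_noise(response):
    """Standard deviation of residuals from a straight line through the
    first and last points (a crude stand-in for a real curve model)."""
    n = len(response)
    slope = (response[-1] - response[0]) / (n - 1)
    residuals = [y - (response[0] + slope * i) for i, y in enumerate(response)]
    return pstdev(residuals)

def report_or_suppress(response, result, noise_limit=0.05):
    """Detection: compare noise to the limit. Recovery: suppress the result."""
    if residual_noise(response) > noise_limit:
        return None  # suppressed; the cause of the noise is not identified
    return result

# Hypothetical response readings over time.
smooth = [0.10, 0.20, 0.30, 0.40, 0.50]
noisy = [0.10, 0.35, 0.22, 0.55, 0.50]

print(report_or_suppress(smooth, 4.2))
print(report_or_suppress(noisy, 4.2))
```

The sketch returns only "suppressed" for the noisy curve; nothing in it distinguishes a lipemic specimen from a dirty reaction chamber, which is the absence of fault isolation described above.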

Diagnostic accuracy

The final dimension in this essay is the diagnostic accuracy of the detection system. This was also covered in the AACC expert session and relates to the number of false positives and false negatives that occur with the detection process.
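As a brief illustration, diagnostic accuracy of a detection step is commonly summarized by sensitivity and specificity computed from a 2x2 table of counts. The counts below are hypothetical and are not taken from the AACC session.

```python
def detection_accuracy(true_pos, false_pos, true_neg, false_neg):
    """Sensitivity and specificity of a detection step from a 2x2 count table."""
    sensitivity = true_pos / (true_pos + false_neg)   # bad results that are flagged
    specificity = true_neg / (true_neg + false_pos)   # good results that pass
    return sensitivity, specificity

# Hypothetical counts: 100 truly bad results and 9,900 good ones.
sens, spec = detection_accuracy(true_pos=75, false_pos=99, true_neg=9801, false_neg=25)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

False negatives let errors through to the clinician, while false positives cost repeat testing and delay, so both numbers matter when comparing detection systems.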

Final Summary

With sufficient knowledge, one would either design a system without errors or employ detection systems for all possible failures. However, one does not have this knowledge. Good detection systems have high diagnostic accuracy, are automated, and have fault isolation. The value of quality control is that in spite of not having fault isolation or being automated, it can catch errors that are missed by detection systems.

Building and quantifying fault trees – an example – 10/2005

October 13, 2005

This example will show how a fault tree helps in completing a FMEA. The example will also demonstrate some quantification.

Introduction and starting point


hCG – human chorionic gonadotropin

HAMA – human anti mouse antibodies

FMEA is a “bottom-up” approach and a fault tree is a “top-down” approach. Both approaches are useful. A difficulty with FMEA is that the entries form part of a table and, unlike a fault tree, much of the structure within the table cannot be expressed. This example is an hCG blood test. The component that is being investigated is the reagent. The starting point for this section of the FMEA is:

Failure mode – outlier result

Failure cause – HAMA interference in assay

The questions are:

  • What is the failure effect?
  • What is the severity of the failure effect?
  • What is the frequency of occurrence of the root cause?

This gives the following FMEA fragment:

Component | Function | Failure effect | Failure mode | Failure cause | Severity | Freq. | RI
Reagent | Measure hCG through immunochemical reaction | ? | outlier | HAMA interference | ? | ? | ?

A corresponding fault tree fragment is:

Outlier result

OR HAMA interference in assay

Assume that this part of the FMEA is concerned with potential harm to the patient. There could be other effects of outliers too, such as customer complaints. An outlier result by itself does not inform one about severity with respect to patient harm, because patients are not directly connected to the assay. To assess the importance of the outlier, one must know what happens with outlier results.

There are many possibilities. Consider one, where the hCG value is elevated (falsely) leading to a diagnosis and treatment of trophoblastic carcinoma, when the patient does not have this condition.

The fault tree now looks like this

Error – Patient harm

OR – Outlier result

OR HAMA interference in assay

In the VA scheme for severity (1), this would be severity 3 (severe injury, but not death).

What about frequency of occurrence of the root cause? In this case, “HAMA interference” is considered as a discrete event – the assay either interferes or doesn’t (due to its design and formulation) and this assay interferes, so in principle, the frequency of occurrence is always! But this implies that every hCG sample assayed results in an outlier and this is not the case. What’s missing is that for an outlier to occur, the patient sample must have human anti mouse antibodies in sufficient quantity. So now the fault tree looks like the one below. Note that there are two AND events and that the original root cause of HAMA interference in the assay has been changed from an OR to an AND gate. Both AND events have to occur for an outlier to result. Assume that 1% of patient samples have human anti mouse antibodies. This gives a frequency of occurrence for outliers of 1%.

Error – Patient harm

OR – Outlier result: freq. 1%

AND Human anti mouse antibodies in patient sample: freq. 1%

AND HAMA interference in assay: freq. 100%

However, in a real lab, results are reviewed before they are reported, so this step must be added. The result review can be considered as an error, detection, recovery scheme.

HAMA assay interference is a known problem with immunoassays and there are methods to detect it, which may be performed for certain assay results according to the lab’s rules. Assume that detection is successful 75% of the time (in detecting errors due to HAMA interference). Recovery means that the assay will be repeated to eliminate the interference and the new result reported to the clinician. Be aware that recovery is not always 100% effective – it can fail. Assume that in this case it is 99% effective. What is the outlier rate, given these assumptions? In this case,

  • assume there are 10,000 reported results per year
  • the 1% outlier rate gives 100 outliers
  • detection and recovery must both succeed to stop an outlier (75% x 99% ≈ 74%), so of the 100 outliers, 74 (74%) will be detected with recovery and no longer be an issue, and 26 (26%) will be reported to clinicians as incorrect results. (For this example, numbers have been rounded.) This gives:

Error – Patient harm

OR – Outlier result reported: freq.  26 per year

OR – Result review fails: EDR Sequence*

OR – Outlier result

AND Human anti mouse antibodies in patient sample

AND HAMA interferences in assay

*EDR = error, detection, recovery
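The arithmetic behind the result review step can be checked with a short sketch. It uses the assumed values from the text (10,000 results per year, 1% of samples with human anti mouse antibodies, 75% detection, 99% recovery); the function name is illustrative.

```python
def edr_outcome(outliers, p_detect, p_recover):
    """Split outliers into those caught (detected AND recovered) and those
    still reported; both steps must succeed to stop an outlier."""
    p_caught = p_detect * p_recover
    caught = outliers * p_caught
    reported = outliers - caught
    return caught, reported

results_per_year = 10_000
outliers = results_per_year * 0.01          # 1% of samples contain HAMA
caught, reported = edr_outcome(outliers, p_detect=0.75, p_recover=0.99)
print(round(caught), round(reported))
```

With these inputs, about 74 of the 100 outliers are caught and about 26 are still reported, which matches the rounded figure of 26 per year in the tree.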

One still has to take into account two more things: 1) the outlier result must fall into a specific region of a (Parkes type) error grid (2) to cause this level of patient harm and 2) the clinician must act on the incorrect result.

The error grid means that outliers (e.g., large errors) that don’t cross medical decision limits are not as dangerous as errors that do cross medical decision limits. In addition, the clinician has the opportunity to question the result and (for any reason) not act on it. If this happens, there may be no patient harm and in any case the outlier is not involved. Assume that these values are:

26 x (outlier percent in dangerous region) x (percent clinician acts on incorrect result) =

26 x (5%) x (50%) = 0.65

This gives as the final fault tree for this cause and effect:

Error – Patient harm (1) frequency of occurrence ~= slightly more than once in two years

AND – Clinician acts on incorrect result (2)

AND – Outlier falls into dangerous region of error grid (3)

AND – Outlier result reported (4)

OR – Result review fails: EDR Sequence*(5)

OR – Outlier result (6)

AND Human anti mouse antibodies in patient sample (7)

AND HAMA interferences in assay (8)

OR – Other causes

*EDR = error, detection, recovery

Note that there are other possible causes for the outlier to occur (the bottommost OR gate), which would raise the frequency of this type of patient harm, but these causes are distinct from HAMA interference. Also, the original question of the frequency of occurrence of the root cause is being addressed by the frequency of occurrence of the effect of the root cause.

Thus, the fault tree has helped to inform the FMEA. An outlier has many possible failure effects. The one studied here has a severity of 3 (serious harm to the patient) and is expected to occur slightly more than once in two years, which in the VA frequency scheme is the second highest frequency of occurrence. It’s hard to imagine this level of analysis with only a FMEA table.

Component | Function | Failure effect | Failure mode | Failure cause | Severity | Freq. | RI
Reagent | Measure hCG through immunochemical reaction | Unneeded, harmful treatment | outlier | HAMA interference | 3 | 3 | 9

Further discussion

This fault tree could still be considered simplified, and of course all of the numbers have been made up, but note that there have been 12 cases reported recently in which unnecessary treatment was carried out due to incorrect hCG results caused by HAMA interference (3).

A quantification of an entire fault tree (or a large subsection) requires algorithms which are available only in advanced (and expensive) fault tree software. This software is warranted in these cases, provided that one has good input data.

The fault tree helps to suggest risk mitigations. For this example, among the possible lab risk mitigations are:

  • One should of course try to select an assay which has been shown to have no HAMA interference, or if there is interference, only to a smaller subset of patients (e.g., with much higher levels of human anti mouse antibodies).
  • One could try to improve the detection success percentage. If this were 95%, for example, the rate of patient harm would be reduced to 0.15 events per year (once in 6.6 years).
  • Not mentioned in the fault tree is the interface between the lab and clinician, which also represents a lab risk mitigation opportunity. That is, clinicians’ focus is on patient care, and lab personnel’s focus is on lab assays. There would be a benefit to the lab being aware of clinician actions, given lab results, so that a feedback loop could be added to the detection scheme.
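The effect of improving detection can be explored with a small what-if sketch. It reuses the made-up values from this example (100 outliers per year, 99% recovery, 5% of outliers in the dangerous region, 50% of clinicians acting on the result) as defaults; the function name is illustrative. Because the sketch does not round the intermediate count of reported outliers to 26, the 75% case comes out near 0.64 rather than 0.65.

```python
def harm_per_year(p_detect, outliers=100, p_recover=0.99,
                  p_dangerous=0.05, p_acts=0.50):
    """Yearly harm events for the HAMA branch, using the text's assumed values."""
    reported = outliers * (1 - p_detect * p_recover)
    return reported * p_dangerous * p_acts

for p_detect in (0.75, 0.95):
    rate = harm_per_year(p_detect)
    print(f"detection {p_detect:.0%}: {rate:.2f} events/year")
```

The same function could be used to vary the other inputs, for example to see how much an assay with fewer affected patients (a smaller outlier count) would help compared with better detection.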

A manufacturer’s risk mitigation would require an expanded fault tree, with causes listed for the HAMA interference. This also illustrates the concept of not enumerating causes when they are not relevant. That is, a lab may know possible reagent causes for HAMA interference, but if the lab must use a manufacturer’s assay without reagent modification, these causes are not relevant.

Finally, note that whereas a risk mitigation (or initial analysis) may result in a very tiny frequency of occurrence (e.g., once in 1,000 years) it still won’t be zero.

Building fault trees using a top down approach

This example was for illustration purposes. This is because the example involved building a fault tree from a FMEA, which while possible would not be how a fault tree is typically done. If one were normally building a fault tree, one would use a top down approach. The end result would be the same. Thus, the main error types are:

Lab error
    OR Complaints
    OR hazards
        OR harm to patient
        OR harm to operator
    OR others

Expanding the harm to patient

Lab error
    OR Complaints
    OR hazards
        OR harm to patient
            OR outlier result
            OR patient ID mix up

Expanding the outlier event, with help of a process flowchart

Lab error
    OR Complaints
    OR hazards
        OR harm to patient
            OR outlier result
                OR Interference
                    OR HAMA interference
                OR Random noise

Continuing with this tree would give the same results as above. Note that the process flowchart does not help with all parts of the fault tree.
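A top down tree like this is easy to hold as a nested data structure. The sketch below is one possible representation, not the format of any fault tree software: each node is a (label, children) pair, every child is assumed to hang off an OR gate as in the outlines above, and the function names are illustrative.

```python
# Each node is (label, [children]); every child here hangs off an OR gate.
lab_error = ("Lab error", [
    ("Complaints", []),
    ("Hazards", [
        ("Harm to patient", [
            ("Outlier result", [
                ("Interference", [("HAMA interference", [])]),
                ("Random noise", []),
            ]),
            ("Patient ID mix up", []),
        ]),
        ("Harm to operator", []),
    ]),
    ("Others", []),
])

def render(node, depth=0):
    """Return the tree as indented lines, one event per line."""
    label, children = node
    prefix = "" if depth == 0 else "    " * depth + "OR "
    lines = [prefix + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

def paths_to(node, target, trail=()):
    """Yield every top-down path that ends at the named event."""
    label, children = node
    trail = trail + (label,)
    if label == target:
        yield trail
    for child in children:
        yield from paths_to(child, target, trail)

print("\n".join(render(lab_error)))
print(" > ".join(next(paths_to(lab_error, "HAMA interference"))))
```

Tracing the path to a cause, as the last line does, recovers the chain of effects that a single FMEA row cannot express.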


  1. The Basics of Healthcare Failure Mode and Effect Analysis, available at http://www.patientsafety.gov/SafetyTopics.html#HFMEA
  2. Parkes JL, Slatin SL, Pardo S, and Ginsberg BH. A new consensus error grid to evaluate the clinical significance of inaccuracies in the measurement of blood glucose. Diabetes Care 2000;23:1143-1148.
  3. Rotmensch S, Cole LA. False diagnosis and needless therapy of presumed malignant disease in women with false-positive human chorionic gonadotropin concentrations Lancet. 2000;355:712-5.

Why “latent errors” is not a good term

March 17, 2005

One occasionally hears the term “latent errors” in articles about error reduction techniques (1). The purpose of this essay is to explain problems with this term and to suggest alternatives.

Latent implies hidden. Berwick uses the term “latent failures” and equates this with “the little things that go wrong all the time”. These are misleading concepts. Consider an example. In a recent presentation, Astion presents some examples of latent errors and their effects (2).

  • Computers: A lack of 1 instrument interface is responsible for many active data entry errors.
  • Staffing: 1 latent error regarding suboptimal staffing leads to multiple active errors by staff who are forced to multitask.
  • Policy and Procedure: A bad strategy for handling phone calls can lead to multiple errors

Before analyzing one of these examples, consider a fault tree model of lab error. This is a hierarchical (“top down”) chart of errors which has the following properties:

  • The severe errors are at the top
  • Errors are connected through parent – child connections
  • The parent errors are the “effects” of the child errors
  • The child errors are the causes of the parent errors
  • The tree uses “gates” which include:
      o or gate: any child event in that branch that occurs will cause the parent error
      o and gate: all child events in that branch must occur to cause the parent error
      o basic gate: the end (cause) of a tree branch
  • Each error is classified as to its:
      o severity
      o probability (likelihood of occurrence)
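These properties map naturally onto a small data structure. The sketch below is a simplified illustration, not the algorithm of any fault tree package: it assumes all child events are independent, it tracks probability only (severity would be an extra field), and the event names and probabilities are made up.

```python
from dataclasses import dataclass, field
from math import prod

@dataclass
class Event:
    name: str
    gate: str = "BASIC"            # "AND", "OR", or "BASIC"
    probability: float = 0.0       # used only for BASIC events
    children: list = field(default_factory=list)

    def evaluate(self):
        """Probability of this event, assuming independent child events."""
        if self.gate == "BASIC":
            return self.probability
        child_p = [child.evaluate() for child in self.children]
        if self.gate == "AND":
            return prod(child_p)                      # all children must occur
        if self.gate == "OR":
            return 1 - prod(1 - p for p in child_p)   # at least one child occurs
        raise ValueError(f"unknown gate: {self.gate}")

# Hypothetical branch and probabilities, for illustration only.
top = Event("incorrect result reported", "AND", children=[
    Event("assay error occurs", "OR", children=[
        Event("interference", probability=0.01),
        Event("reagent degraded", probability=0.02),
    ]),
    Event("detection fails", probability=0.25),
])
print(f"{top.evaluate():.5f}")
```

Real trees often contain shared or dependent events, in which case this simple recursion overcounts or undercounts, and dedicated software is the safer choice.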

Consider the staffing error mentioned by Astion. One could postulate one of many branches of a fault tree to contain this error as:

Top – outlier (e.g., incorrect result) reported to clinician

AND – assay has interference to lipemia

AND – sample is lipemic

AND – (visual) detection for lipemic sample failed

OR – technician not available to perform test

OR – technician called away

BASIC – inadequate staffing

Translating the events of this tree into English (sort of), an outlier will be reported if the assay has an interference to lipemia, AND the sample is too lipemic, AND the step for visually examining the sample has failed. The visual examination step failure can have several causes, one of which is that the technician does not perform this step. This has several causes, one of which is that the technician has been called away because the staffing is inadequate (i.e., there is a problem somewhere else that should be handled by staff, but inadequate staffing prevents this).


An outlier that is reported to a clinician is among the most severe errors that a lab can make. Every event in this branch of the tree has the same severity classification because the effect of any of these errors is the top level error. The importance (ranking) of these errors may be different because the probability of each of these errors may be different.

Because the severity is high, there is no reason for calling any of these events “the little things that go wrong.” They are all severe events. There is also no reason to call them latent (i.e., hidden) or to call the top level error “active”, which implies that the lower level errors are not active. Any of these events that occurs is active. Whether these events can be detected depends on programs in place (3) to expose such errors.

To summarize what FMEA recommends:

  1. Flowchart the process
  2. Add process steps to a fault tree
  3. Add causes to each potential process step error
  4. Add FMEA information to each event
  5. Rank the errors
  6. Propose mitigations


Before deciding that one has thought of all possible causes for an error, one should consult a list of “QSEs” (Quality System Essentials) (3). These are generic activities that apply to virtually all aspects of a service. These activities will typically not appear in flowcharts because they are so pervasive that they would make flowcharts too complicated.

For an additional critique of reference 1, see reference 4.


  1. Berwick, DM. Errors Today and Errors Tomorrow N Engl J Med 2003;348: 2570-2572
  2. Astion M. Developing a Patient Safety Culture in the Clinical Laboratory http://www.aacc.org/AACC/events/expert_access/2005/saftey/
  3. Application of a Quality System Model for Laboratory Services; Approved Guideline—Third Edition GP26-A3 NCCLS 2004 Wayne, PA
  4. Krouwer JS There is nothing wrong with the concept of a root cause. Int J Qual Health Care 2004;16:263

See also additional references at the end of the systems not people essay.