Recovery in FMEA

February 17, 2005

A previous essay (near miss) describes a model of preventable medical errors as a cascade of error events, where the cascade may be terminated by an error event -> detection -> recovery sequence (in which both detection and recovery succeed). Much has been written about detection (see the detection essay) but little about recovery. One might assume that recovery is always successful – it isn’t.

Consider the following real example of a failed recovery. The wrong leg of a patient was scheduled for amputation. This error event was detected, but not all copies of the schedule were corrected, and the wrong leg was amputated (1).

While this might seem to be an exceedingly rare event, another example illustrates the need to pay attention to recovery. Every year, thousands of manufacturer recalls are sent to hospitals. To get a sense of this, visit the FDA web site for recalls. One may consider the error event -> detection -> recovery sequence as occurring across multiple sites. The original error event may or may not have occurred at a hospital, but it has likely been reproduced in some way by the manufacturer, and hence detected by the manufacturer. A recall notice is sent to all customers with ‘recovery’ instructions such as: throw away the following lot, or call service before using this instrument. As a supplier of FMEA software to hospitals, I know that my likely contact at a hospital is a purchasing agent. Hence, a successful recovery from a manufacturer recall, while simple in theory, can be difficult in practice (2) and has led to patient deaths. A standards organization has completed a project to provide a guideline for recalls.


When one prepares a flowchart of the process, recovery should be added as a process step wherever there is a detection step. One may then ask whether the recovery step is adequate or needs to be improved.
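As a sketch of this flowcharting rule, a short check can flag any detection step that lacks a following recovery step (the process steps and the Python representation here are hypothetical, chosen only for illustration):

```python
# Hypothetical process flowchart: each step is tagged as a task,
# a detection step, or a recovery step.
process = [
    {"step": "order medication", "type": "task"},
    {"step": "pharmacist review", "type": "detection"},
    {"step": "correct the order", "type": "recovery"},
    {"step": "administer medication", "type": "task"},
]

def missing_recoveries(steps):
    """Return the detection steps that are not followed by a recovery step."""
    missing = []
    for i, step in enumerate(steps):
        if step["type"] == "detection":
            nxt = steps[i + 1] if i + 1 < len(steps) else None
            if nxt is None or nxt["type"] != "recovery":
                missing.append(step["step"])
    return missing

print(missing_recoveries(process))  # [] -> every detection step has a recovery
```

Running the same check on a flowchart with the recovery step removed would name "pharmacist review" as a detection step whose recovery is missing or inadequate.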


  1. Scott D. Preventing medical mistakes. RN 2000;63:60-64.
  2. Featherly K. Product recalls: Patient safety’s neglected sibling. Healthcare Informatics 2004;21:12.

Detection in FMEA – why it should not be included in a Pareto ranking

February 13, 2005

An early FMEA guideline (1) states that numerical rankings should be given to two properties of a potential error event: 1) the severity and 2) the probability of occurrence. These two numbers are multiplied together for each event in what is called a criticality analysis, and the resulting list of potential error events is sorted by descending criticality in what is commonly known as a Pareto chart (or table).

In some recommended ways of conducting a FMEA (2), the likelihood of detection of the error event is ranked as a third property of an event. Thus, the most severe, most probable, and least likely to be detected potential error event is the highest ranked event. This ranking is called the risk priority number.

The inclusion of likelihood of detection is not recommended. First, in an error model, one can postulate that severe events are the result of a cascade of prior events (see the near miss essay). This cascade may be terminated by a detection / recovery combination. So if one were to include detection, one should also include recovery. One might think that recovery always occurs and is successful – this isn’t the case. Another reason for excluding likelihood of detection from the Pareto ranking is that detection and recovery are often themselves process steps, meaning that they too are potential error events. For any process step, one has already asked about the probability of occurrence. Hence, for a detection process step, asking about the probability of occurrence of a detection error event makes things very confusing, since one would be asking about:

  • the severity of an error in the detection event
  • the probability of occurrence of an error in the detection event and
  • the probability of occurrence of detecting an error in the detection event

There are other problems. One can envision a case where an event has a higher risk priority number due to the detection ranking. This could result in lower severity events being ranked higher than higher severity events, which is illogical (e.g., given the same probability of occurrence, injury could be ranked higher than death):

Severity   Probability   Detection likelihood   RPN
   10           2                 3              60
    8           2                 7             112
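The ranking inversion in the table can be verified with a short sketch (the numbers come from the table above; the event names and the use of Python are illustrative):

```python
# Hypothetical events: (severity, probability of occurrence, detection likelihood)
events = {
    "event A (death, easily detected)":  (10, 2, 3),
    "event B (injury, poorly detected)": (8, 2, 7),
}

# Criticality analysis: severity x probability of occurrence
criticality = {name: s * p for name, (s, p, d) in events.items()}

# Risk priority number: severity x probability x detection likelihood
rpn = {name: s * p * d for name, (s, p, d) in events.items()}

rank_by_criticality = sorted(criticality, key=criticality.get, reverse=True)
rank_by_rpn = sorted(rpn, key=rpn.get, reverse=True)

print(rank_by_criticality[0])  # the severity-10 event ranks first
print(rank_by_rpn[0])          # the severity-8 event ranks first: illogical
```

With the same probability of occurrence, the detection factor alone moves the injury event above the death event in the RPN ranking.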


All of this does not mean that detection isn’t important – it is – it just shouldn’t be included in the Pareto ranking. One can and should ask whether a potential error event is detectable. However, looking at the anatomy of a potential error event (see the near miss essay), error events have a detection / recovery sequence, and both detection and recovery can be process steps. Reducing risk means reducing the likelihood that the effect of an error event will occur. This can be accomplished by:

  • reducing the likelihood of an error event
  • adding or improving a detection step for that error event
  • adding or improving a recovery step for that error event


  1. Available at
  2. See for example:

Reference added 4/9/05

  1. Schmidt MW. The Use and Misuse of FMEA in Risk Analysis. Medical Device and Diagnostic Industry 2004 p56 (March), available at

The anatomy of a near miss – 2/2005

February 13, 2005

A near miss implies that a catastrophic event (sentinel event in JCAHO terms) has nearly occurred. A “near miss” is actually a poor name – a “near hit” would be better. Alternatively, a “good catch” could be used since a catastrophic event has been prevented. However, near miss is universally understood and will be used here.

A near miss is shown as a cascade of events whereby a sentinel event has been prevented by a detection and recovery sequence. In the figure below, if either detection or recovery fails, the sentinel event (the next event, event B in this cascade) occurs. Thus, detection and recovery play a key role in a near miss. The sentinel event is also the effect of the prior event.

Figure 1 Error event Cascade
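The cascade can be expressed as a simple probability sketch: the sentinel event occurs only if the error occurs and either detection or recovery fails (the probabilities below are invented for illustration):

```python
def p_sentinel(p_error, p_detect, p_recover):
    """Probability that the cascade reaches the sentinel event.

    The cascade is terminated only when detection AND recovery both succeed.
    """
    p_terminated = p_detect * p_recover
    return p_error * (1 - p_terminated)

# With no detection step at all, every error reaches the sentinel event:
print(p_sentinel(0.01, 0.0, 0.0))   # 0.01
# A good detection step is still undermined by unreliable recovery:
print(p_sentinel(0.01, 0.99, 0.5))  # ~0.005, half the errors still get through
```

The second example makes the essay’s point numerically: near-perfect detection buys little if recovery succeeds only half the time.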

One can further classify near miss events as follows:

Planned detection and recovery – Here, detection is a process step. Example – A lab specimen was examined for lipemia as required (planned detection). Lipemia was found and the sample underwent an ultracentrifugation step (planned, successful recovery) before analysis.

Chance detection – Detection occurred only by chance. Example – A portable BP monitor was disconnected during an MRI. The BP monitor was then incorrectly reconnected to the IV line. A family member noticed the incorrect connection (chance detection) and called a nurse who corrected the problem (unplanned, successful recovery) (1).

Unsafe situation (Accident waiting to happen) – An error event is only recognized as such after a chance detection. Example – Two similar looking medications are next to each other. If an incorrect selection is made, the result could be fatal. Placing the similar medications next to each other can be considered to be a process error event. This error event may be a cause for selection of the incorrect medication. If the wrong medication is selected and this error is detected before administering the medication (chance detection and unplanned, successful recovery), a near miss has occurred.

One may further consider these cases with respect to FMEA and RCA (Root Cause Analysis).

Planned detection and recovery – FMEA analysis seeks to add planned detection and recovery where they were absent, or to improve existing detection and recovery, by asking how an error event can be detected and what the recovery is.

Chance detection – During FMEA analysis, the addition of a detection step can be thought of as changing a chance detection to a planned detection. If an error event has occurred and been detected by chance, the addition of this detection as a planned process step would have been achieved through RCA.

Unsafe situation – An unsafe situation is an unrecognized error event. By definition if the error event is unrecognized, detection and recovery are unknown. By analyzing the process steps through FMEA, events that were previously unrecognized as potential errors could now be so recognized. Planned detection and recovery steps could then be added.

Chance detection implies an unsafe situation – If one considers the BP problem above, one could suggest that having a BP Luer connector that can attach to an IV line is an unsafe situation (e.g., an error). Starting with that premise there are several possible mitigations including training, warning labels, and different equipment which would prevent the incorrect connection.


  1. ISMP web site:

How to Write A Report That Will Be Read

February 6, 2005

Although this material can apply to any report, it is primarily intended for reports based on data. This tutorial covers the following topics:

  • Problems with reports
  • The need for written reports
  • The difference between data and information
  • Tips on converting data into information
  • Using a format to reach different audiences efficiently
  • How to evaluate your reports
  • Reports for business vs. journals

Problems with Reports

Often after reading a report, the reader has the following problems:

  • the reader doesn’t know what the recommendations are
  • the reader doesn’t know what the conclusions are
  • the report style is often a barrier to the reader even attempting to read the report

Due to their style, many reports are not really read, but skimmed – thus “reading” used above means any activity that the reader chooses.

Don’t believe it? Try the following exercise. Select a report that you have written and give it to three people. Ask them to read it (often a challenge in itself). Then, ask everyone to state the recommendations and conclusions of the report. Do these match what you intended?

The Need for Written Reports

There is sometimes a tendency to forgo writing a report and instead summarize the results of a study verbally (almost always “to save time”).

The advantages of writing a report are:

  • experience has shown that written recommendations and conclusions often differ from verbal recommendations and conclusions
    • there is a different psychology involved between preparing a verbal report (e.g., largely ephemeral even if minutes are taken) and preparing something in writing
  • documentation: written recommendations and conclusions will be available if needed in the future
  • legal, GMP, etc.

Data and Information


Data – Facts and figures

Information – Knowledge gained from data

The goal of a good report is to transform data into information or, putting it another way, have the report do the work – not the reader.

A prerequisite for a good report is to have a clear goal that is being addressed. Many problems, including unreadable reports, stem from unclear goals.

Assuming that:

  • there is a clear goal
  • meaningful data have been collected to address the goal

then the task is to transform data into information through

  • data analysis and summaries
  • and putting the information into an easy to read format

Below is a somewhat extreme example of the difference between data and information.

These are data

begin 666 burst1.gif
[uuencoded data omitted]

[Figure: the rendered image of burst1.gif]
The above data have been converted into information.

In this example, the ‘data’ have the same content as the ‘information’ (the data are the uuencoded gif file, whose alternative representation is shown in the figure above).

Data Analysis Tips (A longer version of this section is covered in actual training sessions)

Some suggestions for converting data into information.

Convert raw data into:

  • plots
  • tabular summaries
  • other summaries

The progression is data->information as one goes from the raw data to data summaries to plots.

Brainstorming about unsummarized data is often appropriate during the data analysis phase, but “discussing” (e.g., summarizing) raw data at a meeting because the report does not have summaries, is inefficient.

Use units that are meaningful to the reader. For example, use concentration rather than response units and resist the temptation to be esoteric (a glucose value of 320 mg/dL means more to most than a glucose value of 310 nanoamps).

Focus on the question being addressed. Example: Is A different than B?

If the report contains only two columns of results (A and B), then the reader must perform the subtraction implied by the question “is A different than B?” In a better report, this subtraction has already been done as a third column.
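A minimal sketch of this point (the sample names and glucose values in mg/dL are invented):

```python
# Report the implied subtraction as a third column, so the report
# does the work rather than the reader.
rows = [
    ("sample 1", 100, 104),
    ("sample 2", 250, 241),
    ("sample 3", 320, 333),
]

print(f"{'Sample':10}{'A':>6}{'B':>6}{'B - A':>8}")
for name, a, b in rows:
    print(f"{name:10}{a:>6}{b:>6}{b - a:>8}")
```

The reader can now see at a glance whether and by how much A differs from B, instead of doing mental arithmetic on every row.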

A Report Template

Here is a report format that highlights information – not data.

  1. PURPOSE
  2. BACKGROUND
  3. RECOMMENDATIONS
  4. CONCLUSIONS
  5. RESULTS
  6. DATA

Attributes of this report format are:

  • Within 3-6, and going from bottom to top
    • data are being transformed into information
    • sections get shorter
    • for a correctly written report, each section is supported by the one below it
  • People know where the recommendations and conclusions are
  • People who only want to read the recommendations can do so quickly

Descriptions of the report sections:

Purpose describes why you are writing the report, i.e., why was the experiment performed.

Background contains introductory information about the project and often contains an outline of the protocol (the full protocol is often appended).

Recommendations are actions such as: use 1 mmol/L phosphate (rather than: 1 mmol/L was found to be optimum, which is a conclusion). The purpose and background sections should be short enough so that the recommendations start on the first page.

Conclusions are a concise summary of results. Individual recommendations and conclusions should be numbered and placed in separate paragraphs.

Results are a description of the assumptions, data analysis methods, theory, etc. Results contain data summaries, tables, and plots. This is also a good place to document the system configuration, i.e., serial numbers, lot numbers, etc.

Data are the numbers or inputs to the experiment. Data can also contain summaries, but there should be a trail to the raw data.

Symptoms of problem reports and remedies


Symptom                                             Remedy
People call and ask “what’s the bottom line”        Use recommendations
You have to call a meeting to discuss the report    Use the report format
You get no response (because no one has read it)    Use the report format
You get back a marked up copy                       Use proper English and a spell checker; don’t go overboard on fonts


Reports for businesses vs. journals

I was invited to write a Letter to the Editor about this topic for the journal Clinical Chemistry. This came about because, during a review of a manuscript, I pointed out that the conclusions were not supported by the results and suggested that the report format described above might be helpful. The Letter was published (1). However, the editors decided not to adopt this suggestion.


  1. Jan S. Krouwer: Proposal to add an optional Recommendations section to Clinical Chemistry Abstracts. Clin Chem, 2002;48:2292.

The “Preventable” in Preventable Medical Errors and the role of regulation

February 2, 2005

The term “medical errors” is often preceded by the modifier “preventable.” This essay considers the meaning of preventability, how it can be measured, and the role of regulation in reducing the risk of preventable medical errors.

There are some medical errors for which preventability is rarely questioned. These include medical errors such as wrong site surgery (1), administering the wrong drug when the correct drug was ordered (2), or transplanting organs of the wrong blood type (3). Less preventable medical errors include judgment type errors such as case studies reported in journals, where one or more experts review the treatment decisions of a clinician and conclude that the clinician’s judgment was incorrect (4).

The role of FMEA

FMEA (Failure Mode Effects Analysis) is a tool that, when performed adequately, can reduce the risk of preventable medical errors. Hospitals in the US that are accredited by JCAHO are required to perform at least one FMEA each year. The main output of FMEA is a series of mitigations, each of which is a process change implemented to reduce the risk of error. Because resources are limited, implementing all mitigations is not possible, so the challenge is to find the set of mitigations that provides the highest reduction in risk for the least cost. Hence, preventability may be viewed in terms of the cost and effectiveness of a mitigation. A low cost and effective mitigation is associated with a highly preventable medical error, whereas a high cost and / or less effective mitigation is associated with a less preventable medical error.

Preventability viewed in a decision analysis framework

Decision analysis, while beyond the scope of this essay, is commonly used to evaluate research opportunities in industry, whereby cumulative profit is graphed vs. cumulative R&D cost for a series of projects (5).

Figure 1 shows a similar type of graph for mitigations. Here, the cumulative cost of medical errors is graphed against the cumulative cost of mitigations. In decision analysis language, one conducts a “portfolio” analysis in that a “basket” of mitigations is selected (from a larger set of mitigations) that has the most effectiveness in reducing risk, for the least amount of cost.

Figure 1 – Portfolio type analysis for mitigations
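The portfolio idea can be sketched as a simple greedy selection (the mitigation names, costs, and risk-reduction figures below are all invented; a real decision analysis would be considerably more elaborate):

```python
# Hypothetical mitigations: (name, implementation cost, expected reduction
# in the cost of medical errors), all in dollars.
mitigations = [
    ("barcode medication check",  50_000, 400_000),
    ("double-signature protocol",  5_000,  60_000),
    ("equipment redesign",       200_000, 500_000),
    ("warning labels",             1_000,   5_000),
]

def select_portfolio(items, budget):
    """Pick a 'basket' of mitigations: best risk reduction per dollar first,
    skipping any mitigation that would exceed the remaining budget."""
    ranked = sorted(items, key=lambda m: m[2] / m[1], reverse=True)
    chosen, spent = [], 0
    for name, cost, reduction in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen

print(select_portfolio(mitigations, budget=60_000))
```

With a $60,000 budget, the expensive equipment redesign is excluded even though it has the largest absolute risk reduction, because its reduction per dollar is the worst of the set; this is the trade-off the figure depicts.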

The role of regulation

Regulation seems to be an inevitable response to adverse events that occur early on, as the public demands protection. The concept of regulation to prevent medical errors is simple. Through inspection, facilities are certified and / or accredited for the medical service that they provide. In theory, services that are likely to commit medical errors are denied accreditation, so the public is protected. The outcome of the inspection process can be viewed as a 2×2 table, similar to a medical test (Table 1).

Table 1 Inspection outcomes

Likely outcome                  Passes inspection                         Fails inspection
Commits medical errors          Service allowed to harm public            Public protected from medical errors
Doesn’t commit medical errors   Service allowed to operate without harm   Service unnecessarily shut down

This is somewhat of a conceptual table, because if a service is shut down one does not know whether medical errors would actually have occurred. But a key point is that hospitals rarely fail inspections such that accreditation is lost (6). Hence, one can largely ignore the last column, which means among other things that accreditation is not useful as a means to test the capability of a service to prevent medical errors. One could of course suggest that more hospitals need to fail accreditation. However, the inspection process is an estimate of whether a hospital will commit errors, and estimates have uncertainty. Even if inspection judged the likelihood of medical errors to be higher, this must be traded off against the consequences of loss of accreditation, which could mean that many people lose their jobs and that some people will not have access to medical services, both of which affect morbidity and mortality (i.e., these consequences must be traded off against the possibility of medical errors, which also affect morbidity and mortality).

Although the inspection process is largely not a means to deny accreditation, it still may be viewed, at least conceptually, as a way to lower the preventable medical error rate. That is, during the inspection process, unsafe situations are documented to have been remedied, as no hospital wishes to lose accreditation. However, one may also question the effectiveness of a FMEA process that is conducted as part of a regulatory requirement.

FMEA goals and regulation

FMEA is now carried out as part of a JCAHO requirement; however, prior to 2002 it might have been performed as a hospital quality initiative unrelated to regulation. Consider the likely goals for each FMEA:

JCAHO FMEA goal – Pass inspection
Non-regulatory FMEA goal – Reduce the risk of medical errors

Now a regulatory person might argue that the goal of reducing risk of medical errors is implied in the JCAHO FMEA goal of passing inspection. Whether this actually happens depends on how the inspection is carried out. In fact, there is evidence to suggest that inspections do not achieve this goal. For example, it was reported that the incidence of wrong site surgery is rising in spite of the fact that most hospitals pass (FMEA) inspections and remain accredited (7).

As a provider of FMEA software, I have learned from feedback from potential clients that many are seeking to “streamline” the way FMEA is performed, since a hospital goal is not only to pass inspection but to do so with minimal effort. This is consistent with an analysis of quality initiatives in industry that have not been fruitful (8). This situation is not unique to FMEA. For example, most industries are required to obtain ISO 9001 certification (which is supposed to guarantee high quality), and once again, gaining and maintaining certification (with the least effort) is the main goal for industry. There is a consulting industry devoted to shepherding companies through the inspection process. Yet analysis of the ISO 9001 process used by medical diagnostics companies to gain certification showed that there is no reason to believe that quality has been improved (9). Although some organizations may try to improve quality (and succeed), these activities are often largely separate from quality programs required by regulatory agencies.

One of the main problems with regulatory required quality programs is that there is too much emphasis on complying with “horizontal” standards. These standards provide broad guidelines without supplying details. While often favored by industry trade groups, such standards allow compliance to be achieved at the discretion of the user. In fact, these are not really standards, since within the general guidelines there is too much leeway – the main requirement is to provide documentation that whatever procedure was selected was in fact carried out.

Some recommendations

There are no magic bullets. Ideally, a cultural change would affect attitudes toward quality programs, but that has not happened. Inspections and accreditation are here to stay and do fulfill a role. Within this framework, the following is recommended.

More emphasis on vertical standards – Vertical standards are needed, which prescribe how to do things. They should replace horizontal standards, which provide only generic guidelines.

More measurements and reporting – Vertical standards should require an organization to set quantitative goals. Inspections should be less concerned with documentation that shows that a procedure was followed and more concerned with documentation that shows results vs. goals (e.g., what was measured and what was the goal).

Beware of some quality “gurus”

As a quality consultant, it may seem self-serving to complain about other organizations, but one should nevertheless be aware of some shortcomings. For example, the IHI (Institute for Healthcare Improvement) suggests use of before and after Risk Priority Numbers (RPN) to demonstrate risk reduction through FMEA (10). I and others have commented that detection should not be included in a priority ranking, and that before and after rankings are also suspect: the highest severity, lowest probability classification is common, and although such an event cannot have its ranking changed, it may still be beneficial to devote mitigation effort to it (11-12). In fact, one could argue that recommendations such as the use of before and after RPNs adversely affect the culture needed for quality improvement.


  1. Scott D. Preventing medical mistakes. RN 2000;63:60-64.
  2. An omnipresent risk of morphine-hydromorphone mix-ups. From the July 2004 Institute for Safe Medication Practices web site,
  3. Molter J. Background Information on Jesica Santillan Blood Type Mismatch Feb. 17, 2003 accessed 12/5, 2003 at
  4. Lukela M, DeGuzman D, Weinberger, S and Saint S. Unfashionably Late. New Eng J Med 2005;352:64-69.
  5. Assay Development and Evaluation: A Manufacturer’s Perspective. Jan S. Krouwer, AACC Press, Washington DC, 2002 pp 18-32.
  6. Managing risk in hospitals using integrated Fault Trees / FMECAs. Jan S. Krouwer, AACC Press, Washington DC, 2004 p 11.
  7. See:
  8. See
  9. Krouwer JS. ISO 9001 has had no effect on quality in the in-vitro medical diagnostics industry. Accred. Qual. Assur. 2004;9:39-43.
  10.  See
  11. Managing risk in hospitals using integrated Fault Trees / FMECAs. Jan S. Krouwer, AACC Press, Washington DC, 2004 pp 7-8 (Also see essay on detection).
  12. Schmidt MW. The Use and Misuse of FMEA in Risk Analysis. Available at