Medical Error FMEA Risk Grids – why they are a problem

September 28, 2007

FMEA risk grids are presented as a small spreadsheet. As an example, the VA HFMEA “scoring matrix” is shown below.

[Figure: the VA HFMEA scoring matrix, a grid with probability rows (Frequent, Occasional, Uncommon, Remote) and severity columns (Catastrophic, Major, Moderate, Minor); events falling in the yellow cells require action.]

This table is similar to those in ISO 14971, the standard on risk management for medical devices. The idea is to classify all potential errors in a process by their severity and probability of occurrence. Each potential error event will fall in one of the grid cells. Events that fall in the yellow cells (designated as number+“y”) are unacceptable and require action.

So what’s wrong with this? On the face of it, there is nothing wrong – it is a standard practice from other industries. The problems emerge as one looks into the details. Each of the row and column headings is defined by the VA as follows (for the sake of brevity, severity is limited to patient outcomes only):


Frequent – Likely to occur immediately or within a short period (may happen several times in one year)
Occasional – Probably will occur (may happen several times in 1 to 2 years)
Uncommon – Possible to occur (may happen sometime in 2 to 5 years)
Remote – Unlikely to occur (may happen sometime in 5 to 30 years)


Catastrophic – Death or major permanent loss of function (sensory, motor, physiologic, or intellectual), suicide, rape, hemolytic transfusion reaction, surgery/procedure on the wrong patient or wrong body part, infant abduction or infant discharge to the wrong family
Major – Permanent lessening of bodily functioning (sensory, motor, physiologic, or intellectual), disfigurement, surgical intervention required, increased length of stay for 3 or more patients, increased level of care for 3 or more patients
Moderate – Increased length of stay or increased level of care for 1 or 2 patients
Minor – No injury, no increased length of stay, and no increased level of care
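As an illustration, the severity and probability categories above can be encoded as a lookup. The scoring rule and threshold below are assumptions for the sketch; the actual VA cell assignments are not reproduced in this post.

```python
# A minimal sketch of an FMEA risk grid as a lookup, assuming a simple
# score-threshold rule. The real VA HFMEA cell assignments differ; the
# threshold here is hypothetical, chosen only to illustrate the grid's logic.

SEVERITIES = ["Minor", "Moderate", "Major", "Catastrophic"]       # low to high
PROBABILITIES = ["Remote", "Uncommon", "Occasional", "Frequent"]  # low to high

def requires_action(severity: str, probability: str) -> bool:
    """True if the (severity, probability) cell is treated as unacceptable."""
    score = (SEVERITIES.index(severity) + 1) * (PROBABILITIES.index(probability) + 1)
    return score >= 8  # assumed threshold for the "yellow" cells

# Note the complaint below: a catastrophic-but-remote error scores 4 x 1 = 4
# and would be deemed acceptable under a rule like this.
print(requires_action("Catastrophic", "Remote"))    # False
print(requires_action("Catastrophic", "Frequent"))  # True
```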

Now the problems can be seen:

If one focuses on catastrophic errors, almost all of them will fall in the remote row – none will be frequent. The implication of the VA table is that a process whose catastrophic errors are remote needs no further examination, but this is clearly wrong. There are many catastrophic errors for which the desired occurrence rate is much less than once in 5 to 30 years. An example is the tragic case of the Mexican teenage girl who was given organs with the wrong blood type and later died, a story which made the national news.

To understand what needs to be done, one must examine a potential error in more detail. For simplicity, let’s eliminate human error and focus on machines. In the operating room, blood gas results are needed, so this hospital has a blood gas lab nearby with a blood gas system. But there is a possibility that this blood gas system will fail and blood gas results will be unavailable. One has reliable data from the manufacturer about how often a blood gas system fails. The mitigation is to have a second blood gas system. Now, blood gas results will be available unless both blood gas systems fail simultaneously. So now the probability of unavailable blood gas results is lower, but it is not zero. The hospital can keep going and put in place as many blood gas systems as it sees fit, with each additional system lowering the probability of this adverse event occurring. From the standpoint of a risk grid, one will eventually arrive at a cell that has acceptable risk.
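The redundancy argument can be sketched numerically. Assuming independent failures and an illustrative per-system failure probability (not actual manufacturer data), the chance that results are unavailable shrinks with each added system:

```python
# Sketch: probability that blood gas results are unavailable with n redundant
# systems, assuming independent failures. p_single is illustrative, not vendor data.

def p_unavailable(p_single: float, n_systems: int) -> float:
    """Results are unavailable only if all n systems fail simultaneously."""
    return p_single ** n_systems

p = 0.01  # assumed: a single system is down 1% of the time it is needed
for n in (1, 2, 3):
    print(f"{n} system(s): P(no results) = {p_unavailable(p, n):.0e}")
```

Each added system multiplies the unavailability probability by another factor of p, which is exactly the sense in which mitigations move an event toward an acceptable cell of the grid.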

The point is that for many (all?) catastrophic errors, the desired goal is to never have them. Because one is dealing with probabilities, never is unattainable, so one must put in place mitigations which lower the probability to an acceptable level.

This also means that for each potential catastrophic error, one must quantify the probability of occurrence before and after any mitigations, and this is the problem. The quantification is a monumental task, and until it is tackled, FMEAs will be performed to satisfy regulatory requirements but will do little to reduce risk.

Myths of EQC (Equivalent Quality Control)

September 27, 2007

CMS has established EQC (Equivalent Quality Control) as a way for clinical laboratories to reduce the frequency with which they perform quality control, provided they meet certain guidelines. I have previously commented on the problems with this: see my 4/20/07 blog “Beware of Equivalent Quality Control” and also an AACC expert session at

The purpose of this entry is to deal with the fact that although the expert session dealt with myths of EQC, these myths persist in comments by CMS and by people who are preparing CLSI documents about risk management. Hence, I will repeat some of the myths here.

1.       Internal QC is new – What is meant by internal QC is algorithms and associated hardware to detect and prevent incorrect lab results. Internal QC has been around since the days of SMAC. While it is continually being improved, it is not new. “Internal QC is new” is often used as a justification for implementing EQC, as in … because modern analyzers are now using new, sophisticated …

2.       External QC is redundant to internal QC – It is often implied that one is justified in reducing external QC because it is redundant to internal QC. This is not always true. As an example, say an algorithm looks at the response to determine if the sample is too noisy. If so, the sample will be rejected. But algorithms such as these do not work 100% of the time. If the algorithm fails on a calibration, the calibration will go through and all subsequent samples could have a shift. External QC is different and when run will likely detect the shift. An example of redundancy is to have five blood gas systems in the laboratory. If one system fails, the other systems can be used. (See also #5, #6).

3.       External QC doesn’t work for unit use systems – Here, it is implied that each sample run is completely unique in unit use systems so that external QC can only inform about one sample. This is not true. Unit use systems are manufactured and the manufacturing process can have drift and bad lots, so that a batch of unit use devices are bad. External QC will detect this condition.

4.       Internal QC always works (is 100% effective) – See #2 – internal QC often has the properties of a medical test – there are some false positives and false negatives. One can see that this point is missed in writings about internal QC. Thus, if one has data from an internal QC experiment, such as success achieved in 100 out of 100 tries, one must realize that whereas the point estimate is 100% (effective), the confidence interval is not. One has not proved 100% effectiveness. The value of external QC is that it uses a different mechanism and can catch errors that internal QC misses.

5.       Because one performs FMEA and other risk management tools, one has thought of everything – Of course, there will be no associated internal QC for a failure that no one has thought of. But external QC does not require knowledge of failure modes in order to work. One need only review the list of FDA recalls to see that manufacturers have not thought of all the ways a system can fail. (See also #2, #6.)

6.       No one makes human errors – In reviewing the list of FDA recalls, some errors are human errors – such as releasing a lot of reagent that has failed. Once again, external QC can catch these errors.
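The point in myths #2 and #4 about imperfect but independent checks can be sketched with assumed miss rates (the numbers are illustrative, not measured performance of any QC scheme):

```python
# Sketch: an error slips through only if BOTH checks miss it, assuming the
# internal and external QC miss events are independent. Rates are illustrative.

def p_error_undetected(p_internal_miss: float, p_external_miss: float) -> float:
    """Probability an error evades both internal and external QC."""
    return p_internal_miss * p_external_miss

# e.g., internal QC misses 5% of shifts and external QC misses 10% of them:
print(p_error_undetected(0.05, 0.10))  # about 0.005, i.e. 0.5% slip through
```

This is the same redundancy logic as multiple blood gas systems: because the two checks fail by different mechanisms, their combined miss rate is far lower than either alone.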

There will be many conditions where internal QC catches errors that would be missed by external QC, but there is no scientific evidence that one can reduce the frequency of external QC without increasing the risk of medical errors.
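Myth #4’s confidence-interval point can be checked numerically. When all n trials succeed, the exact (Clopper-Pearson) lower confidence bound on the success probability has a simple closed form, alpha raised to the power 1/n:

```python
# Sketch: exact lower confidence bound on effectiveness when all n trials
# succeed. With k == n successes, the Clopper-Pearson bound reduces to
# alpha ** (1/n), where alpha is the lower-tail probability.

def lower_bound_all_successes(n: int, alpha: float = 0.025) -> float:
    """Lower limit of a two-sided 95% interval (alpha = 0.025 per tail)."""
    return alpha ** (1.0 / n)

# 100 successes in 100 tries: the point estimate is 100%, but the bound is
# well below it.
print(round(lower_bound_all_successes(100), 3))  # about 0.964
```

So even a flawless 100-out-of-100 experiment only demonstrates, with 95% confidence, that internal QC is at least roughly 96% effective, not 100%.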

Feedback on how well you speak a foreign language

September 19, 2007

You’ve studied a foreign language and are trying to use it. As an example, say your native language is English and the foreign language is German. Here is some feedback you may encounter when you ask a person a question in German. The feedback is ranked from best to worst.

  1. No apparent reaction – your proficiency in German is not mentioned. The conversation takes place all in German.
  2. During the conversation – all in German – it is mentioned how well you speak German.
  3. Although the conversation takes place in German, occasional words are spoken in English by your talking partner. He/She figures you wouldn’t understand these words in German.
  4. Right away, your talking partner compliments your German, but in English.
  5. Your talking partner only speaks English to you.
  6. Your talking partner says something in German and you have the “glazed look”.

In situation 5, if you keep speaking only German while your talking partner keeps speaking only English, it becomes a game to see who will give up first.

A Blog (from someone else) Worth Reading

September 16, 2007

I have ranted about pay for performance (P4P). I recommend two blog contributions by DrRich.

The first has similar ideas to mine – probably one reason I like it so much. The second is about healthcare rationing and the financial aspects of P4P and is also very interesting reading.

However, I take issue with DrRich’s comparison between widgets and patients in his first blog.

“P4P also relies on the Axiom of Industry – that the standardization of any process both improves quality and reduces cost. As DrRich has described elsewhere, the Axiom of Industry does not hold when the process involves actual human patients. This is because patients are not widgets. (While everyone agrees that patients are not widgets, the implication of this fact seems to have escaped many: What happens to the individual widget on an assembly line is immaterial – discarding even a high percentage of proto-widgets may be fine – as long as the ones that come out the other end are of sufficiently high quality as to yield the optimal price point in the market. Patients not being widgets, in theory we are supposed to care about what happens to the individual patient during the process.) Nonetheless, invoking the Axiom of Industry – equating reduced cost to improved quality – allows the central authorities to choose “quality measures” in their P4P efforts that will primarily reduce cost, and then to claim that their primary concern is for quality.”

There are several problems with this comparison. In a diagnostic process, one would not throw out patients as implied by DrRich. One throws out (tentative) diagnoses that no longer fit the evidence as it is collected. That is, one is dealing with the process of diagnosis (just as the widget case deals with the process of producing widgets). This could mean that P4P would force one to accept an incorrect diagnosis, which would harm a patient.

But the main issue is not whether one is dealing with patients or widgets but the state of knowledge one has (for either process). When the state of knowledge is high, standardization* is appropriate. On DrRich’s site, a comment by bev M.D. reminds us that standardization works well for the process of transfusing blood. When the state of knowledge is not high enough, standardization does not work as well and other methods are needed and used. In reliability engineering, when the state of knowledge is insufficient, FMEA (Failure Mode and Effects Analysis, a modeling method) is unable to predict all of the ways a process can fail, and design errors will occur. Standardizing such a process would lock in design errors. Therefore FRACAS (Failure Reporting And Corrective Action System) is used: a data-driven process improvement method that corrects observed failures so they will not recur. These reliability engineering concepts are being applied to medicine and particularly to medical errors.

*P4P could be viewed as a measure of compliance to standardization.

Third time’s a charm

September 6, 2007

I had submitted an essay on Bland Altman plots to two journals, but it was rejected by both. Therefore, I put the essay on my web site. Since I wish to refer to this essay in a publication, I tried again for publication and this time the essay was accepted and will appear in Statistics in Medicine. Because of this, I will remove the essay from my web site in the near future. The title of the publication is: “Why Bland Altman plots should use X, not (Y+X)/2, when X is a reference method”.

The Statistics in Medicine Letter is available online at