Reading Quality Digest can be dangerous to your health

June 17, 2008

In the June 2008 issue of Quality Digest, there is an article by Jay Arthur entitled “Statistical Process Control for Healthcare” (1). After the usual boilerplate introduction, something caught my eye; namely, the so-called good news that there is “inexpensive Excel based software to create control charts …” This made me go to the end of the article, where sure enough the author just happens to sell such software. This might have been a good place for the author to introduce the term bias.

To understand a more serious problem with this article, consider a hospital process; namely, analyzing blood glucose in a hospital laboratory. Because such a process has error, quality control samples are run. Say such a control has a target value of 100 mg/dL. The values of the quality control samples are plotted by SPC software and rules are formulated. If the glucose control value is too high or too low, the process is said to be out of control and action is taken.
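As a rough illustration of what such a rule looks like, here is a minimal sketch. The target of 100 mg/dL comes from the example above; the standard deviation of 3 mg/dL, the 3 SD limit, and the control values are assumptions chosen only for illustration, and real laboratories typically use richer rule sets.

```python
def out_of_control(value, target=100.0, sd=3.0, n_sd=3.0):
    """Flag a QC result that falls outside target +/- n_sd * sd.

    sd and n_sd are assumed values for this sketch, not values from the article.
    """
    return abs(value - target) > n_sd * sd


# Hypothetical glucose control results in mg/dL
qc_values = [101.2, 98.7, 104.9, 110.5, 99.3]

for v in qc_values:
    status = "OUT OF CONTROL" if out_of_control(v) else "ok"
    print(f"{v:6.1f} mg/dL  {status}")
```

Note that the rule only makes sense because the control has a non-zero target with expected variation around it, which is exactly what an error rate with a target of zero lacks.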

Now, Mr. Arthur is trying to push SPC software not for a process but for errors in the process. For example, he uses the infection rate in a hospital. But the infection rate is an error rate, not a process that one wants to control. Of course one does not want it to become worse, but its target is zero.

A more useful example than the hypothetical one provided by Mr. Arthur was published recently (2). Here, the authors were faced with an undesirable hospital infection error rate and set out to observe where errors occurred in the process of placing central lines. They then provided control measures and continued to track the error rate, which was reduced to zero. This is not SPC! It is much more like a FRACAS (Failure Reporting And Corrective Action System).

In another part of the article, Mr. Arthur suggests that “never events” can be tracked by SPC. Never events – a list of 28 such events has been put forth by the National Quality Forum – have, as implied, targets of zero. One such event is wrong site surgery. One should use something like FMEA (Failure Mode and Effects Analysis) to reduce the risk of such events. It is silly to suggest SPC software for never events.

References

1.   See http://www.qualitydigest.com/currentmag/articles/03_article.shtml

2.   Pronovost P, Needham D, Berenholtz S, Sinopoli D, Chu H, Cosgrove S, Sexton B, Hyzy R, Welsh R, Roth G, Bander J, Kepros J, Goeschel C. An Intervention to Decrease Catheter-Related Bloodstream Infections in the ICU. N Engl J Med 355:2725, December 28, 2006.


Acceptable Risk – Easy to talk about, but no one knows what it means

May 4, 2008


Standards about risk management always talk about “acceptable risk.” This is a qualitative term. Unfortunately, for much of healthcare there is no matching quantitative assessment or goal. Consider two examples.

Statement                      Because
Precision is acceptable        CV is 8% and goal is 10%
Residual risk is acceptable    ?

It is possible to estimate the probability of a severe adverse event and to have an associated goal for such a probability but no one in healthcare does this. So one will see things like, “with this mitigation we have reduced the risk of the adverse event to an acceptable level” but the reality is no one knows what this really means.


Never Events – Never a meaningful goal

May 3, 2008

There has been considerable discussion about the National Quality Forum’s so-called 28 never events (1). Here are some problems with this concept.

Never is a poor goal – Adverse events can be considered within a risk management program. Risk is the combination of two items – severity and probability of occurrence. By their selection, one can gather that severity is high for the 28 events. However, probability can never be zero. Consider a simple example. The likelihood of performing wrong site surgery is X. One performs a double check to prevent wrong site surgery. Now the likelihood is 0.0001X. But the double check can fail. So one can perform a triple check. Now the probability is much lower but it is still not zero. And so on. Working with probabilities (as in fault trees) is one way to see that probabilities are never zero, nor is risk.

28 goals are too many – If one wants to manage anything, one needs a limited number of goals. There is no reason why one can’t combine events to give a single goal – the overall risk of an adverse event.

“largely preventable” is not the same as preventable – In the NQF site, the never events are said to be largely preventable. The problems with this are obvious.
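The layered-check argument in the first point can be sketched numerically. The factor of 0.0001 per check is taken from the example; the starting likelihood X and the assumption that checks fail independently of one another are mine and are only there to make the arithmetic concrete.

```python
def residual_probability(base, check_miss_rates):
    """Probability that an error occurs and every check misses it.

    Assumes the checks fail independently, which is a simplification.
    """
    p = base
    for miss in check_miss_rates:
        p *= miss
    return p


base = 1e-3   # hypothetical likelihood X of wrong site surgery with no checks
miss = 1e-4   # each check lets 1 in 10,000 errors through (the 0.0001X example)

for n_checks in range(4):
    p = residual_probability(base, [miss] * n_checks)
    print(f"{n_checks} checks: residual probability = {p:.1e}")
```

Each added check multiplies the probability by a small number, so the result shrinks quickly but stays strictly positive.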

References

1.       See http://216.122.138.39/projects/completed/sre/index.asp


Alternatives to Six Sigma

March 19, 2008


This entry continues where the entry (Six Sigma can be dangerous to your health) left off. Given the problems with six sigma, what are some solutions to estimate the quality of an assay? The hCG assay is used here as an example.

First, when total analytical error is calculated to estimate the values in zones A-C in an error grid, one should use conservative methods such as the empirical distributions suggested by the CLSI EP21A method, and where no data are deleted. Let’s say a clinical laboratory has done this evaluation with 40 patient samples for a new and reference method and found no results in zone C for an hCG assay. What can one conclude? Although 0% of the values are in zone C, the upper 95% confidence limit is 7.2%. This means that for every million hCG results performed, up to 72,000 results could be in zone C. This is not very comforting and these types of evaluations don’t prove much, although one knows that the 7.2% rate is unlikely (because if this rate occurred, it would be noticed).
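The 7.2% figure is consistent with the exact one-sided upper limit for zero observed events, 1 - 0.05^(1/n) with n = 40. That this is the method behind the number is an assumption on my part; the sketch below simply shows the calculation.

```python
def upper_limit_zero_events(n, confidence=0.95):
    """Exact one-sided upper confidence limit for a proportion when 0 of n events are seen.

    Solves (1 - p)**n = 1 - confidence for p.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


limit = upper_limit_zero_events(40)
print(f"Upper 95% limit with 0 of 40 in zone C: {limit:.1%}")
print(f"Equivalent results per million: {limit * 1_000_000:,.0f}")
```

Running the function for larger n shows how slowly the limit tightens: even several hundred samples with no zone C results leave an upper limit near 1%.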

FMEA is an approach that will provide an answer to the quality question but in its complete form, it requires considerable effort. To complete a FMEA analysis, one has to postulate all possible reasons why a result could fall into zone C. To get an idea of what is involved, take two possible failure modes, HAMA interference and a patient sample mix-up.

HAMA interference – To estimate the likelihood of a zone C result from HAMA interference, one needs to know the level of HAMA that will cause erroneous results in the assay and the probability of such levels in the population being sampled. Contacting the manufacturer might give one the level of HAMA to watch out for – I am not familiar with data about the distribution of HAMA in patient samples. Yet, one knows HAMA interference occurs (Clinical Chemistry. 2001;47:1332-1333).  

Patient sample mix-up – There are some data for patient sample mix-ups (Archives of Pathology and Laboratory Medicine: Vol. 130, No. 11, pp. 1662–1668). However, it seems that these cases are caught within the laboratory. One would need to determine how many cases actually are not caught within the laboratory. One could then model the likelihood of a zone C result by sampling from the empirical distribution of hCG results that are observed in the lab to see the likelihood of a mix-up causing a zone C result.
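The sampling idea for mix-ups could look something like the sketch below. Everything numeric here is a placeholder: the results are synthetic rather than real hCG data, the zone C rule is a made-up tenfold discrepancy rather than an actual error grid, and the rate of undetected mix-ups is invented.

```python
import random


def mixup_zone_c_rate(results, is_zone_c, n_trials=100_000, seed=0):
    """Estimate P(zone C | mix-up) by swapping in a randomly drawn other result."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Draw two different samples: the patient's true value and the one reported
        true_value, reported = rng.sample(results, 2)
        if is_zone_c(true_value, reported):
            hits += 1
    return hits / n_trials


def hypothetical_zone_c(true_value, reported):
    """Placeholder rule, NOT a real error grid: a tenfold or larger discrepancy."""
    low, high = sorted((true_value, reported))
    return high >= 10.0 * max(low, 1.0)


# Synthetic stand-in for the lab's observed hCG results; real data would go here
data_rng = random.Random(1)
observed = [data_rng.lognormvariate(4.0, 2.5) for _ in range(2000)]

p_c_given_mixup = mixup_zone_c_rate(observed, hypothetical_zone_c)
p_undetected_mixup = 1e-4  # hypothetical rate of mix-ups not caught in the lab

print(f"P(zone C | mix-up)     ~ {p_c_given_mixup:.2f}")
print(f"P(zone C from mix-ups) ~ {p_c_given_mixup * p_undetected_mixup:.1e}")
```

The wider the spread of results seen in the lab, the more likely it is that a swapped result lands far from the true value.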

Because there are so many existing data in a clinical laboratory, one may also have the opportunity to perform FRACAS types of analyses. That is, in addition to modeling probabilities, one could use existing data to count actual failures.

One must then continue:

  • with each other possible failure mode, calculate the probability of zone C results
  • calculate the overall probability of zone C results (from all failure modes) and determine if that risk is acceptable
    • special software is typically used to perform these calculations
  • construct a Pareto table if the overall probability of zone C results is too high and
  • propose control measures to lower the overall risk to an acceptable level
    • the control measures must of course be affordable
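The combining and ranking steps in the list above can be sketched as follows. The failure modes beyond the two discussed, all of the probabilities, and the goal are hypothetical, and the combination assumes the failure modes are independent, which dedicated fault tree software would not need to assume.

```python
def overall_probability(mode_probs):
    """P(at least one failure mode gives a zone C result), assuming independence."""
    p_none = 1.0
    for p in mode_probs.values():
        p_none *= 1.0 - p
    return 1.0 - p_none


def pareto_table(mode_probs):
    """Rows of (mode, probability, cumulative share), largest contributor first."""
    total = sum(mode_probs.values())
    rows, cumulative = [], 0.0
    ranked = sorted(mode_probs.items(), key=lambda kv: kv[1], reverse=True)
    for mode, p in ranked:
        cumulative += p
        rows.append((mode, p, cumulative / total))
    return rows


# Hypothetical per-result probabilities, for illustration only
modes = {
    "HAMA interference": 2e-5,
    "patient sample mix-up": 5e-5,
    "other (hypothetical) failure mode": 1e-6,
}
goal = 1e-5  # hypothetical acceptable-risk goal

overall = overall_probability(modes)
print(f"Overall probability of a zone C result: {overall:.2e} (goal {goal:.0e})")

if overall > goal:
    for mode, p, share in pareto_table(modes):
        print(f"{mode:36s} {p:.1e}  cumulative {share:5.1%}")
```

The Pareto table shows where a control measure would buy the most risk reduction, which matters when control measures must be affordable.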

At this point, one can get the idea that this level of effort is out of reach for clinical laboratories since the level of expertise and work needed just to estimate the likelihood of a zone C result is huge. Even if a clinical laboratory could perform this task, it makes no sense to require every clinical laboratory to do so.

One possibility is to have a standards group tackle such a task, although this too has limitations as was shown for a (universal) control measure to prevent wrong site surgery.

Another possibility is to perhaps leverage resources beyond the clinical laboratory. For example, one could insist that before treatment for trophoblastic carcinoma, an hCG result should be confirmed either by performing a reference assay or perhaps by treating the sample and rerunning it. This requires an interaction between the clinical laboratory and clinicians.

So there are no easy answers to preventing severe, low frequency failures (that cause patient harm) but, as discussed before, coming up with a sigma estimate for an hCG assay is also not the answer. Nor is doing nothing.


Jan gets an award

March 15, 2008


I recently spoke at the Quality in the Spotlight conference in Antwerp, Belgium and gratefully acknowledge being awarded the Westgard Quality Award. This award was presented by Jim Westgard himself. The Quality in the Spotlight conference is a two day conference in Antwerp, devoted each year to a quality theme. This year’s theme was quality tools. I spoke about FMEA on each of the two days. It wasn’t until the second day of the conference that I realized that some of the other presentations were bothering me – perhaps I had a case of brain jetlag. This is an interactive conference so had I been quicker I would have presented my concerns to the speakers. But this did not happen so my concerns are in the previous entry to this blog. Prof. Dr. Jean-Claude Libeer, who founded the conference and also spoke about me with respect to the award, said that it was my blog which impressed people. So perhaps my previous entry could be taken as an acceptance speech.

On the second day, per instructions, I attempted to do a “workshop”. This is in quotes because I had to involve the audience but was only given one hour. Had I to do this again, I would have given an award to one lady, who answered some of the questions I posed to the audience. One example – name a case of at risk behavior that you have experienced. Answer, a technician, who had trouble getting a barcode on a patient sample to register, scanned the barcode from another patient. So perhaps this is also an illustration of the need to perform a FMEA on a control measure (what can go wrong with implementing barcodes).

Another highlight of my trip was spending three days in Amsterdam and hearing that in spite of frequent mistakes, my Dutch is begrijpelijk (understandable).



Six Sigma can be dangerous to your health

March 13, 2008


At a recent conference, there were several presentations about six sigma for clinical laboratory assays. To recall, sigma is calculated as Sigma = (TEa – bias)/CV where

TEa is the total allowable error
Bias is the inaccuracy of the measurement procedure
CV is the imprecision of the measurement procedure
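For readers who want to see the arithmetic, here is a minimal sketch of the calculation. The input values are hypothetical, all three quantities are taken to be in percent, and using the magnitude of the bias is my assumption about how a signed bias would be handled.

```python
def sigma_metric(tea, bias, cv):
    """Sigma = (TEa - bias) / CV, with all inputs expressed in percent.

    The absolute value of bias is used so that a negative bias is not
    treated as a benefit (an assumption of this sketch).
    """
    return (tea - abs(bias)) / cv


# Hypothetical assay: TEa 10%, bias 2%, CV 1%
print(f"Sigma = {sigma_metric(10.0, 2.0, 1.0):.1f}")
```

The function takes only three summary numbers as input, which is worth keeping in mind for the discussion that follows.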

The problem with six sigma is that it’s taken as a sole measure of quality – that is, if you have a high sigma value (greater than 6) then your assay is assured of high quality. The rest of this entry explains why this is wrong.

First, TEa (total allowable error) is often specifically called out as medically acceptable limits. One need only read the ISO 15197 standard for glucose to see this connection. I have previously commented about this standard. The implied meaning of medically acceptable limits is shown below.

figure 1

This is simply not the real world. Taguchi long ago specified a more realistic quadratic model of worth, which is shown below, superimposed on the original figure but in green.

figure 2

Thus points A and B are similar in bias and are similar in causing (or not causing) medically unacceptable results. It is also likely then that if point A is ok, then so is point B. It is only when one gets far away from these limits that one is almost certain to have results that can cause harm. This is shown below with point C.

figure 3

This can also be expressed as an error grid such as those for glucose. So the “sigma” calculations really only express the zone A region (grey) where 95% or more of the results should be. Zone B (white) can contain up to 5% of the results and zone C (dark grey) should contain no results. The error grid contains more information since each set of limits is different for each concentration. An error grid is shown below, taken from FDA guidance. In the guidance, WM is the test method and CM is the reference method. (In the document WM=waiver method and CM=comparative method).

figure 4

So the problem is that sigma only accounts for zone A, but patients are harmed by values in zone C!

Now one might argue that there is nevertheless a relationship between sigma and the three zones, meaning that high sigma values are unlikely to have values in zone C and low sigma values are likely to have such values. This is also not true. Here is why.

1.       Often incorrect models are used to assess total error – see here.

2.       In estimating bias and CV, outliers – the very values that cause harm - are often thrown out.

3.       All sigma calculations are based on the assumption that the data are normally distributed. Most data do not fulfill this criterion. This means that there are often more values in the tails of the distribution (again, this is zone C) than expected by calculations based on the normal distribution.

4.       And maybe the biggest reason of all, values can occur in zone C that have nothing to do with the analytical process. If there is a patient sample mix-up, this can occur and these values are excluded (when detected) from virtually all analytical evaluations.
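The third reason can be illustrated with a contaminated normal distribution, in which a small fraction of results comes from a much wider distribution. The 1% outlier fraction, the threefold wider spread, and the 4 SD cutoff are all hypothetical choices for this sketch, not estimates from any assay.

```python
from statistics import NormalDist


def two_sided_tail(k, sd=1.0):
    """P(|X| > k) for X ~ Normal(0, sd)."""
    return 2.0 * (1.0 - NormalDist(0.0, sd).cdf(k))


def mixture_tail(k, outlier_fraction, outlier_sd):
    """Tail probability when a small fraction of results has a wider spread."""
    return ((1.0 - outlier_fraction) * two_sided_tail(k)
            + outlier_fraction * two_sided_tail(k, outlier_sd))


k = 4.0  # distance from the mean in units of the assay's nominal SD
normal = two_sided_tail(k)
mixed = mixture_tail(k, outlier_fraction=0.01, outlier_sd=3.0)

print(f"Normal assumption:       {normal:.1e}")
print(f"With 1% wide outliers:   {mixed:.1e}")
print(f"Ratio:                   {mixed / normal:.0f}x")
```

A small amount of contamination barely changes the estimated CV if outliers are discarded, yet it dominates the tail, which is where the harm is.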

Think of it this way. If a loved one suffered medical harm, due in part to an erroneous lab result, would it make you feel better to know that the assay had a high sigma value? And would you associate that assay with quality?

I will comment on how one can address these issues in a future entry.


At risk behavior

March 3, 2008


I am involved in risk management standards for clinical laboratories, where the focus has been on understanding how manufacturers’ devices can fail and how a clinical laboratory can put in place control measures to prevent these failures from causing harm.

My concern with these standards is that there is not enough emphasis given to the clinical laboratory’s own sources of error – its people. Among problems related to human errors are cognitive errors, non cognitive errors, reckless behavior, and at risk behavior – the topic of this entry.

At risk behavior is behavior that increases risk where risk is not recognized, or is mistakenly believed to be justified. Anyone who manages people must have had the experience of hearing (perhaps second hand) “I don’t think that’s necessary and I’m not going to do it.” And of course, parents are familiar with at risk behavior practiced by their children.

An example of healthcare at risk behavior is reusing syringes. This occurred recently at an endoscopy clinic in Nevada and has affected up to 40,000 people. In reading the patient empowerment blog, one learns about other cases of reused syringes. In a case in Long Island, the physician reused syringes only for the same patient, but the syringes were used with multi-dose vials and these vials were used across patients.

In the recent case of reducing central line infections, Dr. Peter Pronovost observed that of the steps associated with placing a central line, in a third of patients, doctors skipped at least one step. Whereas some of this could be attributed to non cognitive errors (slips), it could also be associated with at risk behavior. The control measure that worked here was a double check step, whereby another healthcare provider would check to make sure each step was followed.

Discovering at risk behavior may not be easy, hence it needs to be on one’s radar screen.


Should one focus on a failure in a procedure or the outcome of such a failure?

February 14, 2008


Withholding payment for adverse events is a financial incentive to promote patient safety. Whether this incentive makes financial sense is something I will comment on later or perhaps not at all. For now, my comments are about the policy as it recently appeared (1).

 

 

The authors suggest the following criteria to withhold payment.

  • Evidence demonstrates that the bulk of the adverse events in question can be prevented by widespread adoption of achievable practices.
  • The events can be measured accurately, in a way that is auditable.
  • The events resulted in clinically significant patient harm.
  • It is possible, through chart review, to differentiate the adverse events that began in the hospital from those that were “present on admission” (POA).

The problem is with the third bullet and can perhaps be illustrated by the following figure.

FMEA FRACAS

In this figure, FMEA events are shown by the dashed lines. The red dashed line is before FMEA. The green dashed line shows that after a successful FMEA, risk of failures has been reduced. FRACAS events are shown by the solid lines. The green solid line shows a reduction in the failure rate after FRACAS.

Keep in mind, for the dashed lines (FMEA), no failures have occurred, while for the solid lines, failures have occurred.

Now the policy defines a failure as an adverse patient outcome. One can view outcomes as the end of an event cascade, as in the next figure.

error cascade

Assume that event C is an adverse patient outcome. According to the policy, payment is withheld only when event C is observed. In the first figure, the relevant concern area is shown by the ellipse as it is assumed that these are all high severity (severe patient harm) events.

This policy therefore excludes the following cases:

All FMEA events. That is, a procedure with a correctable high risk will be excluded from this policy because the event has not yet occurred. Consider the case of the Duke transplant error (2), before it happened. One can infer that this was a high risk procedure that would have benefited from a FMEA. In essence, this policy waits for disasters to happen.

All near miss events. Consider the case of the patient who had an MRI (3). Blood pressure monitor tubing had to be disconnected for the MRI. After the procedure, the tubing was incorrectly connected to an IV line. Before air was delivered from the automated blood pressure monitor, a family member noticed that things didn’t look right and contacted a nurse, who corrected the problem. Thus, there was no adverse event.

All defective procedures that don’t result in severe patient harm. Consider a healthcare worker who violates hospital policy (at risk behavior according to Marx (4)), which results in a patient fall. In this case, the fall results in a minor injury.  This is an important case because the policy fails to properly reflect risk management principles.

For a procedure that has a problem (e.g., a failed event), one has to classify the severity of the failed event and its probability (FMEA) or frequency of occurrence (FRACAS). The severity is classified not necessarily by the failed event but by the effect of the failed event. The effect is itself an event and can be a spectrum of severities. In the case of a patient fall, there is a distribution of harm associated with the fall event – some falls will result in severe harm, some will result in minor harm. Traditionally, in risk management, if severe harm is possible, then severity is associated with severe harm, even if the probability of severe harm is low. In this sense, severity is equated with potential outcome, regardless of whether that specific outcome has occurred.

One also has to classify the probability (FMEA) or frequency of occurrence of the event (FRACAS). Here, assuming FMEA, one could choose between the probability of the failed event or the probability of the effect of the event (the adverse outcome). It is recommended to use the probability of the failed event, not the probability of the effect of the event. This is because one usually has control over the failed event and does not have control over the effect of the event.

Example: If a clinical laboratory provides a clinician with an erroneous result and the effect of that could be patient harm, the event is classified as severe. The probability is the probability of erroneous result, not the probability of patient harm, because patient harm is outside of control of the clinical laboratory (the clinician might not act on the result, might suspect it is erroneous and request it to be repeated, and so on).
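The classification rule described above can be written down compactly. This is only a sketch: the three-level severity scale, the field names, and the probability are hypothetical and do not come from any standard.

```python
from dataclasses import dataclass

# Hypothetical severity scale, for illustration only
SEVERITY_RANK = {"minor": 1, "moderate": 2, "severe": 3}


@dataclass
class FailedEvent:
    name: str
    possible_effects: list     # severities of effects that could follow the event
    event_probability: float   # probability of the failed event itself

    def severity(self):
        """Severity is that of the worst possible effect, even if it is unlikely."""
        return max(self.possible_effects, key=SEVERITY_RANK.__getitem__)


erroneous_result = FailedEvent(
    name="erroneous lab result reported to clinician",
    possible_effects=["minor", "severe"],
    event_probability=1e-4,  # hypothetical probability of the erroneous result
)

print(erroneous_result.severity(), erroneous_result.event_probability)
```

The probability attached to the event is that of the erroneous result, which the laboratory controls, and not that of the downstream harm, which it does not.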

Summary

This policy will miss many quality issues and deviates from traditional risk management.

References

  1. Wachter RM, Foster NE, Dudley RA. Medicare’s Decision to Withhold Payment for Hospital Errors: The Devil Is in the Details. The Joint Commission Journal on Quality and Patient Safety 2008;34:116-123, see http://psnet.ahrq.gov/resource.aspx?resourceID=6760
  2. See http://www.cbsnews.com/stories/2003/03/16/60minutes/main544162.shtml
  3. See http://www.ismp.org/newsletters/acutecare/articles/20030612.asp
  4. Marx, D. Patient Safety and the “Just Culture”: A Primer for Health Care Executives http://www.mers-tm.net/support/Marx_Primer.pdf


FMEA vs. FRACAS

January 4, 2008


I have previously compared FMEA and FRACAS, here. Another simple difference is:

(Successful) FMEA reduces risk.

(Successful) FRACAS reduces failure rates.

Now, one often hears about successful FMEAs. In my experience, these are not FMEAs, they are examples of FRACAS. An example is here. How can one tell that this is FRACAS and not FMEA? It’s simple - what is described is the reduction of a too high failure rate to a lower rate. With FMEA, the failure rate is zero – the event has not happened. What one does is to reduce the risk of this potential failure, from some amount to a lower amount. This is perhaps one of the reasons one does not hear too much about FMEA successes. As I said before, to say that something that has never happened is now even less likely to happen (due to FMEA) just isn’t too exciting.

To reduce failure rates is a good thing and it is not a big deal to call this FMEA when it is FRACAS. However, it is simple to use the correct terms, and if one doesn’t, one might wind up neglecting to perform FMEA when it’s needed.


Central lines and FRACAS

December 7, 2007


One hears of FRACAS success stories (like the one below) and FMEA failure stories (like the wrong blood type organs transplanted at Duke). A reason one doesn’t hear of FMEA success stories is that to say that something that has never happened is now even less likely to happen (due to FMEA) just isn’t too exciting. FMEA success stories are often not cases of FMEA, they are FRACAS, since rate improvements are discussed. FRACAS failures – we tried something, it didn’t work – are not very interesting.

A recent article in The New Yorker (1) provides an example of a FRACAS success story.

In the article, there is no mention of FRACAS but many of the steps were followed. The issue was a too frequent infection rate in central lines. It is important that one can measure this rate. One knows how many central lines are used; infections manifest themselves, and their cause can be determined by culturing the lines. Some undercounting is possible but the rate seems fairly reliable.

The man behind the work, Dr. Peter Pronovost, first observed events for a month within the context of the process of placing central lines (e.g., process mapping). Errors in the process steps were identified. Since these steps were simple, such as washing hands, one could partly view these errors as non cognitive errors. This suggests a control measure such as a double check to prevent such “slips”. Actually, besides slips, there may have been some at-risk behavior (2). This is behavior that increases risk where risk is not recognized, or is mistakenly believed to be justified. The main control measure used was a checklist, with the addition of having nurses double check to see that the checklist steps were properly done. Then the rate was measured again and found to be considerably lower. All of this was published (3).

It was mentioned that an alternative control measure had been tried; namely, using central lines coated with antimicrobials. This expensive control measure failed to provide a substantial reduction in infection rates. This illustrates that one must be open minded when selecting control measures. There is sometimes a bias towards fixing the “system” (e.g., with coated lines) rather than fixing a people issue (which often implies blame). Dr. Pronovost implemented some system control measures by getting the manufacturer of central lines to include drapes and chlorhexidine – items that should have been available at the bedside but often were not.

Another big part of this story is ongoing resistance towards implementing this control measure more widely, even after it has been shown to be effective and low cost. Any control measure can be viewed as a standard and standards are not very popular. People will argue “but our situation is different”, “ICUs are too complicated for standards”, and so on. Financial incentives (or disincentives) for standards (e.g., P4P) loom. Dr. Gawande goes on to say how complicated things are in an ICU, yet that is precisely where standards helped. A similar situation happened in anesthesiology in the late 70s and early 80s. (Here, critical incident analysis was used and is basically the same as FRACAS.) The error rate was too high, effective control measures were developed, and widespread implementation of the control measures took considerable effort. You can read about that story here.

References

1.       Gawande A. Annals of Medicine. The checklist. The New Yorker, Dec. 7th issue, 2007, see here (don’t know how long this link will work).

2.       Marx, D. Patient Safety and the “Just Culture”: A Primer for Health Care Executives http://www.mers-tm.net/support/Marx_Primer.pdf

3.       Pronovost P. et al. An Intervention to Decrease Catheter-Related Bloodstream Infections in the ICU. N Engl J Med 2006;355:2725-32.