## Pareto charts and running out of money

May 30, 2009

In the last blog entry, I said that ranking is not important* within a severity class. For example, if the class is severe patient harm, then all error causes within this class should be fixed. “Fixed” means that:

• if the error is occurring (it has a nonzero rate), then the rate must be lowered to zero
• if the error has never occurred, its risk must be low enough (it can never be zero) – low enough means that the error should not be expected to occur

However, since funds are not limitless, one might argue that the purpose of the Pareto chart is to draw a line where funds run out, so that even within a severity class such as severe patient harm, one can stop reducing risk when the money is gone. (If this were the case, ranking would be important, since one would want to limit harm.)
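
To make the "line where funds run out" argument concrete, here is a minimal sketch: rank the error causes in one severity class by risk and fund fixes in that order until the budget is exhausted. The `ErrorCause` fields, risk scores, costs, and budget are all hypothetical; this illustrates the argument, not a recommended policy.

```python
from dataclasses import dataclass

@dataclass
class ErrorCause:
    name: str          # hypothetical error cause
    risk: float        # hypothetical risk score; higher means more risk
    fix_cost: float    # hypothetical cost of the mitigation

def fund_until_budget(causes, budget):
    """Fund fixes in descending order of risk; stop at the first fix that
    no longer fits in the remaining budget (the 'line' on the Pareto chart)."""
    funded, remaining = [], budget
    for cause in sorted(causes, key=lambda c: c.risk, reverse=True):
        if cause.fix_cost > remaining:
            break
        funded.append(cause.name)
        remaining -= cause.fix_cost
    return funded, remaining

# Hypothetical causes, all in the same severity class.
causes = [
    ErrorCause("cause A", risk=0.5, fix_cost=40),
    ErrorCause("cause C", risk=0.2, fix_cost=50),
    ErrorCause("cause B", risk=0.3, fix_cost=30),
]
funded, left = fund_until_budget(causes, budget=80)
print(funded, left)  # ['cause A', 'cause B'] 10
```

Stopping at the first unaffordable fix (rather than skipping ahead to cheaper ones) mirrors drawing a single cutoff line on the ranked chart; cause C stays unfixed even though it is in the same severity class.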

An alternative is to be more creative with solutions. In the central line infection problem discussed a few blog entries ago, one attempt to prevent infections was expensive (and not as effective): buying lines coated with antimicrobials. The most effective solution was to use a checklist, which added no cost.

In the laboratory, HAMA interference is an example of an error that can cause serious harm. I have heard laboratory directors say there is nothing they can do – proposed solutions are too expensive. They need to think of an effective solution within cost constraints.

*Ranking could help to lower risk more quickly, since bigger risk items would be ranked higher. In that sense, ranking is still important.

May 26, 2009

Having a discussion with someone about the previous blog entry prompted me to make some additions to the entry. While this person argued for assigning a probability to event 3 and not event 1, it occurred to me that event 2 should also have a probability assigned to it.

For reference, the following sequence of events, a laboratory example of a patient sample mix-up, is reproduced from before.

Assigning a probability to event 2 is difficult using fault trees, since this involves estimating not just the probability of event 1 but also the probabilities of all other events that could lead to event 2. One then has to combine these probabilities mathematically (usually with software, since this is complicated). No one in a laboratory would do this. However, there is another way: follow the total error approach in CLSI EP21. Here, a method comparison experiment is performed and all "bad" results are counted, even though their cause is not known. It is important to define how bad a result must be before it causes harm. This is what CLSI EP27 will do when it is released.
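
A small sketch can contrast the two routes. The first function is a fault-tree OR gate, which assumes the causes are independent; for small probabilities its result is close to simply adding them. The second is a simplified total-error count in the spirit of EP21: the fraction of method-comparison differences beyond a harm limit, whatever their cause. It is not the EP21 procedure itself, and the probabilities, differences, and harm limit are hypothetical.

```python
import math

def or_gate(probabilities):
    """Fault-tree OR gate, assuming independent causes:
    P(at least one occurs) = 1 - product of (1 - p_i)."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)

def fraction_harmful(differences, harm_limit):
    """Simplified total-error count: the fraction of method-comparison
    differences (candidate - comparative) whose magnitude exceeds
    harm_limit, without asking what caused each one."""
    bad = sum(1 for d in differences if abs(d) > harm_limit)
    return bad / len(differences)

# Hypothetical probabilities of two independent causes of a wrong result.
print(or_gate([0.001, 0.002]))  # about 0.003, close to the simple sum

# Hypothetical differences (mg/dL) and a hypothetical harm limit.
print(fraction_harmful([1, -2, 15, 3, -12], harm_limit=10))  # 0.4
```

The fault-tree route needs a probability for every cause; the counting route needs only the data and a definition of "bad", which is why defining the harm limit matters.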

## EP18P3 – More comments on risk management

May 23, 2009

One confusing task is assigning probability and severity to (error) events. This matters because these assignments are the basis for Pareto charts, where events are ranked to decide which problems to fix.

Consider the following sequence of events, which is a laboratory example of patient sample mix-up.

There could be more events within this series, but to keep things simple, there are just these three events.

Issue 1: the severity of event 1 – The original error is event 1, which by itself does not cause patient harm. In fact, nothing in the laboratory by itself causes patient harm; it is the downstream effects of event 1 that do. Thus, the severity of event 1 is given by the severity of event 3. That is, if event 3 can cause harm, then the severity of that harm is assigned to event 1.

Issue 2: multiple outcomes – Assume the assay is glucose. Providing the wrong glucose result can have a variety of consequences. If 94 mg/dL is reported instead of 95 mg/dL, no harm is likely; this is not so if 35 mg/dL is reported instead of 420 mg/dL. Typically, one assigns the severity corresponding to the worst possible outcome.

Issue 3: probability – The question is whether one should assign the probability of occurrence to event 1 or to event 3 and, if to event 3, for which outcome.

Since the severity has been assigned to the worst outcome, if one were to assign a probability to event 3, it would be for the worst outcome. Typically, the probability for event 3 will be much lower than that for event 1. Consider two examples:

Central lab glucose – One could obtain the distribution of glucose results for the laboratory and randomly sample pairs of results from that distribution to estimate the probability that a patient mix-up is a "bad" one. One then has to speculate on the percentage of times a clinician would act on the wrong result in a way that leads to patient harm.
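
A minimal Monte Carlo sketch of that sampling idea. It assumes a "bad" mix-up means the two swapped results differ by more than a harm limit; that definition, the 150 mg/dL limit, and the simulated lognormal results are hypothetical stand-ins for a laboratory's real data and a clinically justified criterion.

```python
import math
import random

def prob_bad_mixup(results, harm_limit, n_pairs=100_000, seed=1):
    """Monte Carlo estimate of the probability that swapping two randomly
    chosen patients' results produces a 'bad' error, defined here
    (hypothetically) as an absolute difference greater than harm_limit."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(n_pairs):
        a, b = rng.sample(results, 2)  # two different patients
        if abs(a - b) > harm_limit:
            bad += 1
    return bad / n_pairs

# Hypothetical stand-in for a laboratory's glucose results (mg/dL);
# in practice one would use the laboratory's own result distribution.
data_rng = random.Random(0)
results = [data_rng.lognormvariate(math.log(110), 0.35) for _ in range(5_000)]

p_bad = prob_bad_mixup(results, harm_limit=150)
print(f"P(bad mix-up | mix-up) ~ {p_bad:.4f}")
```

This estimate covers only the chance that the swap is large; it would still have to be multiplied by the speculative fraction of times a clinician acts on the wrong result in a harmful way.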

Newborn screening – Here, a bad result would be a positive that is reported as a negative (usually worse than a false positive). Its probability could be estimated from the prevalence of the disorder. Since most newborn screening disorders have low prevalence, the most common result of a patient sample mix-up would be mixing up a negative with a negative, which causes no harm.
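
A short sketch of the prevalence argument, assuming the mix-up is a swap of two babies' samples and that each baby is affected independently with probability equal to the prevalence. The 1 in 10,000 prevalence is hypothetical.

```python
def mixup_outcomes(prevalence):
    """Outcome probabilities when two newborns' samples are swapped,
    assuming each baby is affected independently with probability
    `prevalence`."""
    p = prevalence
    return {
        "neg_neg": (1 - p) ** 2,               # both unaffected: no harm
        # Exactly one affected: that baby is reported negative (the harmful
        # case); the unaffected baby gets a false positive.
        "one_pos_one_neg": 2 * p * (1 - p),
        "pos_pos": p ** 2,                     # both affected: both still flagged
    }

# Hypothetical prevalence of 1 in 10,000.
outcomes = mixup_outcomes(1 / 10_000)
for outcome, prob in outcomes.items():
    print(f"{outcome}: {prob:.6g}")
```

With a low prevalence, nearly all swaps fall in the harmless negative/negative case, and only about 2 in 10,000 produce a missed positive.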

In either the glucose or newborn screening case, one has to multiply the probability of event 3 by the probability of event 1, which gives a very low overall probability.
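
The multiplication is a chain of probabilities: the probability of the lab error times the conditional probability of the harmful outcome given that error. Both inputs below are hypothetical and only show the order of magnitude that results.

```python
def overall_probability(p_event1, p_event3_given_event1):
    """Chain the events: P(event 1 and event 3) = P(event 1) * P(event 3 | event 1)."""
    return p_event1 * p_event3_given_event1

# Hypothetical inputs: a mix-up rate of 1 per 1,000 samples, and a
# conditional probability of a harmful outcome of 2 in 10,000.
p_harm = overall_probability(1 / 1_000, 2 / 10_000)
print(f"{p_harm:.1e}")  # 2.0e-07
```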

Why probability should be assigned to event 1 – Although it is possible to estimate a probability for event 3, the problem with using that probability is that it contains chance events, which are beyond the control of the laboratory. One cannot do anything about those chance events; one can only lower the probability of occurrence of event 1.

One might argue that this affects the ranking in a Pareto chart. This is true. For the glucose and newborn screening examples, using the probability of event 1 might give a different ranking than using the probability of event 3 (the two event 1s are likely to have different probabilities, since newborn screening is a filter paper assay and central lab glucose is a serum assay). However, this level of ranking is not important. Within a given severity class, all problems need to be addressed, since any serious harm due to laboratory error is unacceptable. Ranking matters when one moves from one severity class to another, as occurs in a larger Pareto chart with hundreds of events, especially when the events cover other areas, such as complaints, accreditation, finances, and so on.
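
A minimal sketch of that point: rank first by severity class, then by probability. Switching from the event 1 probability to the event 3 probability can reorder items within a class but cannot move an item across classes. All event names except the two examples, and every number, are hypothetical.

```python
from typing import NamedTuple

class Event(NamedTuple):
    name: str
    severity: int      # hypothetical ordinal severity class; higher is worse
    p_event1: float    # hypothetical probability of the originating lab error
    p_event3: float    # hypothetical probability of the harmful outcome

def pareto_rank(events, prob_field):
    """Rank by severity class first, then by the chosen probability field."""
    return [e.name for e in sorted(
        events, key=lambda e: (e.severity, getattr(e, prob_field)), reverse=True)]

events = [
    Event("central lab glucose mix-up", 5, 1e-3, 2e-6),
    Event("newborn screening mix-up", 5, 5e-4, 1e-5),
    Event("hypothetical lower-severity event", 2, 5e-2, 5e-2),
]
print(pareto_rank(events, "p_event1"))
print(pareto_rank(events, "p_event3"))
```

The two mix-ups swap places depending on which probability is used, while the lower-severity event stays last either way.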

## EP18P3 – New Version of laboratory risk management guideline

May 19, 2009

EP18 is the CLSI document about risk management. As it is being released, there have been some comments that merit a response.

A few comments object to devoting attention to failure modes, implying that failure modes are events that have occurred. The proposal is to restrict the document to managing the risk of events that have not yet occurred. I suspect that these commenters are familiar with the ISO risk management standard, ISO 14971, which ignores failures that have occurred.

ISO 14971 was written for manufacturers who wish to release medical devices. In that setting, the risk of potential errors is important, since the device has not yet been released. In the clinical laboratory, however, the product (reporting results of diagnostic assays to physicians) has already been released, and errors that can harm patients are occurring.

Consider HAMA interference in laboratory ABC. HAMA interference is a failure mode that has occurred in some laboratories and has caused harm. It may have already occurred in laboratory ABC, but whether it has or not, it is a known cause of patient harm. If it is occurring in laboratory ABC, then it has a rate, which needs to be reduced. If it has never occurred, then it is a risk, which needs to be controlled. In either case, harm may still occur, since the risk will never be zero and the rate may not reach zero. Note that even if the rate reaches zero, the failure mode then becomes a risk, which is never zero.

Whether harm from HAMA interference occurs in this laboratory depends on how much effort (cost) is applied to prevent it. This is a decision made by society, which ISO 14971 largely misrepresents as a risk-benefit tradeoff made by the site. What really happens is that the laboratory works at preventing errors until the money allocated for that task is used up. The amount of money is often a function of satisfying regulations, which in turn are dictated by society.

Here’s a real medical example about putting in central lines. It is an example of reducing failures (FRACAS, a failure reporting, analysis, and corrective action system), although the word FRACAS is never used. From the article:

“Still, Pronovost asked the nurses in his I.C.U. to observe the doctors for a month as they put lines into patients, and record how often they completed each step. In more than a third of patients, they skipped at least one.”

This is a key step in FRACAS: measuring a failure rate. The skipped steps had the potential to harm patients, and when controls were put in place (a checklist was used), the rate of harm to patients decreased dramatically.
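
A minimal sketch of that first FRACAS step, computing a failure rate from audit observations where a failure is any procedure with at least one skipped step. The audit data and the five-step checklist below are hypothetical; the article's actual observations are not reproduced here.

```python
def skip_rate(observations):
    """Fraction of observed procedures in which at least one checklist
    step was skipped (each observation is a list of True/False per step)."""
    failures = sum(1 for steps in observations if not all(steps))
    return failures / len(observations)

# Hypothetical audit: each inner list records whether each of five steps was done.
before = [
    [True, True, True, True, True],
    [True, False, True, True, True],
    [True, True, True, True, True],
    [True, True, True, False, False],
    [True, True, True, True, True],
    [True, True, True, True, True],
]
after = [[True] * 5 for _ in range(6)]  # hypothetical audit after the checklist

print(f"before: {skip_rate(before):.2f}, after: {skip_rate(after):.2f}")
```

Re-measuring the same rate after the corrective action (here, the checklist) is what closes the FRACAS loop.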

So risk management and EP18 need to address the rate of actual failures as well as the risk of potential failures.

## EP9 – “Bias estimation” should be called “average bias estimation”

May 15, 2009

EP9, the CLSI standard about method comparison, is being revised. What struck me is that the title of the current version includes the term “bias estimation”. This is not accurate. What EP9 estimates is average bias.

Average bias is a useful thing to estimate. For example, it’s good to know that the Beckman Coulter PSA assay is 22% higher on average than any WHO-calibrated PSA assay. However, using the word bias instead of average bias implies that any and all biases will be estimated. This is not the case for EP9, since a result with a high bias that is detached from the other results (an outlier) will be excluded from the analysis. Moreover, two assays with the same regression coefficients but different scatter will exhibit the same average bias (although their confidence intervals for average bias will differ).

Historically, EP9 was released as a Proposed document in 1986; however, it is based on an earlier CLSI (then called NCCLS) document, PSEP-4, released in 1979. The original title was “Comparison of Methods Experiment.” Later versions added the term bias estimation to the title. Note that there are two documents that estimate bias due to interference (EP7 and EP14), and only relatively recently have two documents addressed estimating all bias in an assay: EP21 (2003) and EP27 (not yet released). See the CLSI site for details about individual documents.

## Response to Dr. Rich about Comparative Effective Research (CER)

May 12, 2009

I am happy to respond to Dr. Rich’s blog entry, which refers to me. I enjoy reading Dr. Rich’s blog, value his wisdom, and thoroughly enjoyed his book. I agree with Dr. Rich that

“the controversy regarding CER has to do with how its results will be applied”

Regarding:

“DrRich suspects that Dr. Krouwer is more familiar with laboratory research than with clinical research”

Actually, what I do is analyze data. It usually relates to the laboratory, via diagnostic companies.

Dr. Rich talks about the problems with randomized clinical trials (RCTs). I agree with his points; however, consider the different treatments for prostate cancer. Here, an RCT makes no sense because, compared with a viable alternative, the study would be too complicated and expensive and would take too long.

There are at least 8 treatments for prostate cancer:

1. Active surveillance (watchful waiting)
5. Proton beam therapy
6. Brachytherapy (seeds)
7. Cryotherapy
8. HIFU (not approved in the US)

However, the number of categories increases because:

Patients are usually subdivided into low, intermediate, or high risk as a function of PSA, Gleason score, and stage. This leads to 24 treatment categories. Moreover, there are a variety of other complicating factors, such as combination treatments including androgen deprivation therapy (ADT) applied to some patients for one of several reasons, surgical skill for relevant procedures, localized therapy, salvage treatments, and so on.

The alternative to an RCT is simply to analyze existing data. Around 185,000 men are diagnosed with prostate cancer each year, so ten years provides 1,850,000 cases. That’s a lot of data! If one had a way of simply collecting the outcomes (success rates, side effects, total cost) of each case, one would have CER.

Yet existing studies that attempt to answer some of these questions have bias. Dealing with every point would take a book, so take one example: incontinence as a side effect of treatment, especially radical prostatectomy.

First, one has to take into account the pretreatment rate of incontinence, which is around 9% – see WebMD (the pretreatment rate is a bigger issue for another side effect, erectile dysfunction). It is not clear that studies take prior side effect rates into account.

According to Medscape, physician-reported studies show a lower rate of incontinence than patient surveys do. One can speculate that physicians would like to minimize the side effects of the treatments they provide. Moreover, the definition of incontinence, including when it is measured after treatment, differs among studies or is not provided at all. And of course, some of the definitions are questionable. For example, according to Zincke et al., using fewer than three pads a day qualifies as continent.

What is needed is standardization of terms and survey instruments and a reliable method to collect this data. If this doesn’t occur, it will be hard to draw conclusions about this side effect, whether or not the study is transparent (well described). One can argue that there will always be some bias, but how much matters if policies will result from such studies.

## Pay for Performance, Quality, and DB

May 2, 2009

I read the medical blog of DB, who is amazingly prolific and at times blogs about some aspect of quality. His blog entry titled “quality measurement – a delusion” was alarming, and I posted a comment on it. His next blog entry discussed (a portion of) my comment. Here is my analysis…

For any process, including medical processes, one can estimate an error rate. Some of the errors are preventable. This can be thought of as a lack of quality, where quality is defined (ASQ) as “free of deficiencies”. There is nothing inherently wrong in calling an error rate a “performance measure”. Unfortunately, programs such as pay for performance (P4P) cloud the issue. P4P rewards and penalizes physicians for their performance on various measures. However, these measures go beyond the concept of reducing errors. For example, lowering hemoglobin A1C to less than 7.0% in a diabetic patient, while desirable, is more a policy than a means to reduce preventable errors.

One of the problems is that one must view the mitigation of an error as part of the overall picture. With an error such as wrong site surgery, the error can be considered as part of the surgery process and the solution to prevent this error is a modification of the surgery process.

In lowering A1C, many of the processes are outside the control of the physician. An A1C value that has not been reduced to less than 7.0% cannot automatically be called a preventable medical error whose mitigation is the responsibility of the physician.

There are plenty of easy-to-see preventable medical errors, such as wrong site surgery, giving the wrong amount of a drug, mixing up a laboratory specimen, and so on. Tools such as FMEA and FRACAS help to reduce errors. It would appear that some physicians who are not familiar with quality tools, but have been bombarded with P4P, have overreacted and, when this has been pointed out, fall back on semantics.