A six-year series of the same medical error

June 22, 2009

The Health Care Renewal blog made me aware of a series of medical errors in brachytherapy. The original article is here. Brachytherapy is a treatment for prostate cancer in which small radioactive seeds are implanted in the prostate.

As a quick summary, at the VA hospital in Philadelphia, 92 out of 116 brachytherapy treatments performed over a six-year period were done incorrectly, often leading to serious complications. The VA had contracted out the treatments; one of the contractors was from the University of Pennsylvania and board certified in radiation oncology.

In the article:

Two days later, the Joint Commission, which helps set standards in the hospital industry, surveyed the Philadelphia V.A. and on the next day accredited the hospital. “This organization is in full compliance with applicable standards,” the Joint Commission said.

The commission said that it had no indications of the problems in the brachytherapy program when it arrived at the hospital and that its surveys are not detailed enough to have uncovered the flawed implants.

I have previously written about the fact that medical errors happen at accredited hospitals. But for the same error to recur for six years says the accreditation system is flawed.

I was also struck by:

Susan Phillips, a senior executive at Penn’s medical school and health system, said Dr. Kao had voluntarily given up his clinical privileges there, though he continues to do research on campus.

Since it was shown that Dr. Kao falsified records during several brachytherapy procedures, why is he still doing research? Dr. Kao is listed as part of the clinical faculty of the University of Pennsylvania department of radiation oncology. The University of Pennsylvania is building a proton beam therapy (PBT) facility (scheduled to open in 2009), which will be the sixth center in the US to offer PBT treatment.


What patients want

June 19, 2009

DB asked in his blog, “When you consider physician quality, what attributes do you consider?” My response was:

  1. Unbiased treatment advice
  2. Physician knows more than I do

An ideal treatment is (at least in concept) effective with minimal side effects; cost is not an issue for insured patients. Yet some physicians steer patients towards a treatment that may not be ideal for that patient; prostate cancer is one example. This leads people to use the internet and other sources to act as their own patient advocates. Although it would be silly to think such research makes one competent, it is not good when one gets the feeling that the doctor is not as knowledgeable as the patient.

Other responses were similar.

A Kano diagram is useful to frame the issue.

[Figure: Kano diagram]

The green line represents expected attributes. For example, if someone goes into a hospital for a total left hip replacement and the left hip is replaced (as opposed to the right hip), this is expected! It is not thought of as quality. Hence, there is no satisfaction for fulfilling this goal, only dissatisfaction if it is not fulfilled. This is similar to air travel. One does not ask to review the maintenance records of a flight one is taking, nor to review the pilot’s resume. One expects these attributes to be in order.

The red line represents attributes that are also not specified. For example, if someone with a chronic condition were suddenly diagnosed and cured by a new doctor, this would be unexpected, although of course it would have great value.

It is the blue line that represents attributes specified by patients. Take surgery, for example. A famous urologist once said he would love to redo his first 100 prostatectomies. Surgical skill, notwithstanding the difficulty in getting this information, is valued by patients. My response and the other responses dealt with other attributes valued by patients.

I doubt that anyone would specify outcome measures such as “Percentage of patients who received advice to quit smoking” or any of the other 25 measures.


EP23 – Again

June 10, 2009

The title of EP23 is Laboratory Quality Control Based on Risk Management. This title, and hence the document, makes no sense to me.

Risk management involves enumerating potential failure modes and implementing control measures for high risk failure modes. But quality control works whether one knows about failure modes or not. In fact, one of the values of quality control is precisely that one can know nothing about failure modes yet quality control will still detect failures.

Thus, “Laboratory Quality Control Is Not Based on Risk Management” would be a corrected title, but also not a very good one.

There are many ways to go about setting up a quality control program – the Westgard site is a good place to start.

The figure below illustrates this with a (highly abbreviated) risk management program: despite the program, the outcome can nevertheless be an incorrect result. If QC succeeds, the incorrect result is suppressed.

[Figure: abbreviated risk management program, with QC suppressing incorrect results]

Although one could talk about the total program as risk management, there is really no connection between trying to prevent or detect the many possible failure mode causes that can lead to an incorrect result, and QC, which detects (some) incorrect results regardless of the cause. And QC is not based on risk management.
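To make the point concrete, here is a minimal Python sketch of two common Westgard-style QC rules (1-3s and 2-2s). It is an illustration under assumptions, not a complete QC program: the control target mean, SD, and data are hypothetical. Note that the function looks only at control results; it knows nothing about failure modes, yet it still flags a shifted run.

```python
from typing import List


def qc_reject(controls: List[float], mean: float, sd: float) -> bool:
    """Return True if the run should be rejected.

    Rules used here (a common subset, assumed for illustration):
      1-3s: any control more than 3 SD from the target mean
      2-2s: two consecutive controls more than 2 SD from the mean on the same side
    """
    z = [(c - mean) / sd for c in controls]
    # 1-3s rule
    if any(abs(v) > 3 for v in z):
        return True
    # 2-2s rule: a consecutive pair beyond 2 SD on the same side
    for a, b in zip(z, z[1:]):
        if (a > 2 and b > 2) or (a < -2 and b < -2):
            return True
    return False


# Hypothetical control data: target mean 100, SD 2
print(qc_reject([99.5, 101.0, 100.8], mean=100, sd=2))  # False: in control
print(qc_reject([104.5, 104.8], mean=100, sd=2))        # True: shifted run
```

Whatever caused the shift (a reagent problem, a calibration error, or something no one anticipated), the rule reacts the same way.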


Why you need to be your own patient advocate with lab tests

June 1, 2009

Lab tests have error, and sometimes very large errors. As the last blog entry showed, patient harm can result from certain lab errors. In this blog entry, lab error means an error large enough to result in patient harm. But it is not the error itself that causes harm; it is the clinician acting on the result. When harm occurs from lab error, one can infer that the clinician has not questioned the accuracy of the test. Thus, it’s up to the patient to question the result.

Some examples of how lab results can have error:

Interferences – The HAMA interference from the previous blog entry is just one example of an interference, and HAMA interference can occur in assays other than hCG.

Known bias – Example: PSA values can differ by 22% on average depending on the manufacturer. This is due to how the assay is standardized. Say one’s PSA value was 3.3 as assayed by manufacturer ABC. If next year the value remained at 3.3 by manufacturer ABC, but a different manufacturer were used (one with this 22% different standardization), the reported value would be 4.0 on average. Taking into account the ~5% CV of these assays, 95% of the time the value would be between about 3.6 and 4.4 (i.e., about half the time greater than 4). Unless questioned, this might lead to a biopsy. Often, the manufacturer is not listed on the lab report; to find this out, one must call the lab.
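The PSA arithmetic can be checked with a short Python sketch. It assumes the 22% standardization difference acts as a simple multiplicative bias and that imprecision is normally distributed with a constant CV; both are simplifying assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist


def shifted_psa(value_abc: float, bias: float, cv: float, cutoff: float = 4.0):
    """Return (mean, 95% low, 95% high, P(result > cutoff)) for a result
    reported by a manufacturer whose standardization differs by `bias`.

    Assumes a multiplicative bias and normal imprecision with constant CV.
    """
    mean = value_abc * (1 + bias)          # expected value on the other assay
    sd = cv * mean                         # SD implied by the CV
    z = NormalDist().inv_cdf(0.975)        # about 1.96 for a 95% range
    p_over = 1 - NormalDist(mean, sd).cdf(cutoff)
    return mean, mean - z * sd, mean + z * sd, p_over


mean, low, high, p_over = shifted_psa(3.3, bias=0.22, cv=0.05)
print(f"mean {mean:.2f}, 95% range {low:.2f} to {high:.2f}, P(>4.0) = {p_over:.2f}")
```

Changing `cv` shows how quickly the range widens with assay imprecision.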

Problems with newer tests – Molecular testing with arrays is the newest type of testing. A recent article (subscription required for full article) showed that the imprecision often exceeded 30% CV. If one translates this to a glucose test with a true value of 100 mg/dL, 95% of the time values would be between 40 and 160 mg/dL! Another report showed that different generations of probe sets often had close to zero correlation. Back to glucose, this is equivalent to running a method comparison between a newer and an older machine and getting random scatter rather than the typical correlation of > 0.9.
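A minimal Python sketch of both points, under stated assumptions: the 95% range assumes normal imprecision with a constant CV, and the method comparison uses simulated (hypothetical) glucose values spread uniformly from 50 to 300 mg/dL. A newer method with about 3% CV scatter correlates tightly with the truth; an unrelated set of numbers does not.

```python
import random
from statistics import NormalDist, mean, pstdev


def range95(true_value: float, cv: float):
    """95% range of results for a given true value, assuming normal error with constant CV."""
    z = NormalDist().inv_cdf(0.975)
    sd = cv * true_value
    return true_value - z * sd, true_value + z * sd


def pearson(x, y):
    """Pearson correlation coefficient (population form)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


rng = random.Random(1)
truth = [rng.uniform(50, 300) for _ in range(1000)]       # hypothetical glucose, mg/dL
newer = [t + rng.gauss(0, 0.03 * t) for t in truth]        # newer method, ~3% CV
scatter = [rng.uniform(50, 300) for _ in range(1000)]      # "random scatter" result

print(range95(100, 0.30))
print(round(pearson(truth, newer), 3), round(pearson(truth, scatter), 3))
```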

Other problems – There are many other potential problems that would give an error such as a patient sample mix-up, an undetected instrument error, sample pretreatment problems, and so on.

When to question lab tests – In principle, any lab test could be questioned, although this would be impractical. Moreover, the above problems will be unknown to patients not familiar with laboratory medicine, and even people who are familiar with error causes may be unaware that an error has occurred.

Two scenarios are suggested to question lab tests:

  • Before a treatment is started, especially a treatment with risks such as surgery
  • If symptoms persist and a lab test was negative

How to question lab tests – Unfortunately, simply repeating a lab test will not always help; it depends on the error source. If the error source is random, then repeating the test will help. If the error source is not random, such as one caused by an interference, then repeating the test by the same procedure in the same lab will not help. In situations with HAMA interference, part of the problem was that serial measurements gave the same wrong answers, which prompted clinicians to continue the (wrong) treatment.

The safest approach, then, is to request that the test be repeated by a different laboratory, preferably a reference laboratory if one exists for that assay.
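The difference between random and non-random error can be sketched in a few lines of Python. Everything here is hypothetical: a true value of 5, a random SD of 0.3, and an interference that adds a fixed +20 in the first lab. It also assumes the second lab's method is not subject to the same interference, which is not guaranteed if it uses the same manufacturer's assay.

```python
import random
from statistics import mean

TRUE_VALUE = 5.0  # hypothetical true analyte concentration


def measure(rng: random.Random, bias: float = 0.0, sd: float = 0.3) -> float:
    """One result = truth + systematic bias (e.g., interference) + random error."""
    return TRUE_VALUE + bias + rng.gauss(0, sd)


rng = random.Random(7)
# Same lab, same procedure: the hypothetical interference bias repeats every time
same_lab = [measure(rng, bias=20.0) for _ in range(5)]
# Different lab, assumed free of this interference: only random error remains
other_lab = [measure(rng) for _ in range(5)]

print([round(v, 1) for v in same_lab])   # consistently wrong, serial results agree
print([round(v, 1) for v in other_lab])  # scattered around the true value
```

Serial results in the first lab agree closely with each other, which is exactly why they can be reassuring and still wrong.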

And remember: a wrong lab result is a rare event. Although a result may be worth questioning, the likelihood that any given lab test is wrong is extremely low.

NO MEDICAL ADVICE: Material appearing here represents opinions offered by non-medically-trained laypersons. Comments shown here should NEVER be interpreted as specific medical advice and must be used only as background information when consulting with a qualified medical professional.


Pareto charts and running out of money

May 30, 2009

In the last blog entry, I said that ranking is not important* within a severity class. For example, if the class is severe patient harm, then all error causes within this class should be fixed. “Fixed” means that:

  • if the error has a frequency (the error is occurring), then the error rate must be lowered to zero
  • if the error has never occurred, its risk must be low enough (it can never be zero); low enough means that the error should not be expected to occur

However, since funds are not limitless, one might argue that the purpose of the Pareto chart is to draw a line where funds will run out, so that even within a severity class such as severe patient harm, one can stop reducing risk when there is no money. (If this were the case, then ranking would be important, since one would want to limit harm.)
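The “draw a line where funds run out” view can be sketched in Python. This illustrates the argument being questioned, not a recommended method; the error causes, severity scale, risks, and costs are all hypothetical. Causes are ranked by severity class, then by risk within the class, and fixes are funded in order until the budget is exhausted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorCause:
    name: str
    severity: int     # hypothetical scale: 3 = severe patient harm, 1 = minor
    risk: float       # hypothetical estimated probability of occurrence
    fix_cost: float   # hypothetical cost of the fix


def fund_fixes(causes: List[ErrorCause], budget: float) -> List[str]:
    """Rank by severity class, then by risk within the class, and fund fixes
    in that order until the budget runs out (the 'line' on the Pareto chart)."""
    ranked = sorted(causes, key=lambda c: (c.severity, c.risk), reverse=True)
    funded = []
    for c in ranked:
        if c.fix_cost > budget:
            break  # the line is drawn here; everything below it goes unfunded
        budget -= c.fix_cost
        funded.append(c.name)
    return funded


causes = [
    ErrorCause("HAMA interference", 3, 1e-4, 50_000),
    ErrorCause("sample mix-up", 3, 5e-4, 20_000),
    ErrorCause("report typo", 1, 1e-3, 2_000),
]
print(fund_fixes(causes, budget=60_000))  # ['sample mix-up']
```

With this budget, a severe-harm cause falls below the line, which is exactly the outcome a cheaper, more creative fix would avoid.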

An alternative is to be more creative with solutions. Referring to the central line infection problem a few blog entries ago, there was an expensive (and less effective) attempt to prevent infections, namely buying lines coated with antimicrobials. The most effective solution was to use a checklist, which added no cost.

In the laboratory, HAMA interference is an example of an error that can cause serious harm. I have heard laboratory directors say there is nothing they can do because the proposed solutions are too expensive. They need to think of an effective solution within cost constraints.

*Ranking could help to lower risk more quickly, since bigger risk items would be ranked higher. In that sense, ranking is still important.


EP18P3 – More comments on risk management – addition

May 26, 2009


Having a discussion with someone about the previous blog entry prompted me to make some additions to the entry. While this person argued for assigning a probability to event 3 and not event 1, it occurred to me that event 2 should also have a probability assigned to it.

For reference, the following sequence of events, a laboratory example of patient sample mix-up, is reproduced from before.

[Figure: sequence of events 1 to 3 for a patient sample mix-up]

Assigning a probability to event 2 is difficult using fault trees, since this involves estimating not just the probability of event 1 but the probabilities of all other events that could lead to event 2. One then has to mathematically combine the probabilities (done through software, since this is complicated). No one in a laboratory would do this. However, there is another way, namely to follow the total error approach in CLSI EP21. Here, a method comparison experiment is performed and all “bad” results are counted even though their cause is not known. It is important to define how bad a result must be before it causes harm. This is what CLSI EP27 will do when it’s released.
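A small Python sketch contrasts the two routes. The first function is a fault-tree OR gate, assuming the causes of event 2 are independent; for rare causes the result is close to simply adding the probabilities. The second is the empirical, EP21-style alternative: count method-comparison differences larger than a harm limit, whatever their cause. All probabilities, differences, and the 20 mg/dL limit are hypothetical.

```python
from math import prod
from typing import Sequence


def or_gate(probabilities: Sequence[float]) -> float:
    """P(at least one cause occurs), assuming independent causes."""
    return 1 - prod(1 - p for p in probabilities)


def empirical_rate(differences: Sequence[float], limit: float) -> float:
    """Fraction of method-comparison differences larger than a harm limit,
    counted without knowing what caused them."""
    return sum(abs(d) > limit for d in differences) / len(differences)


# Hypothetical per-cause probabilities of a wrong result (event 2)
causes = [1e-4, 5e-5, 2e-5]
print(or_gate(causes), sum(causes))  # nearly equal for rare events

# Hypothetical differences (candidate minus reference), mg/dL
diffs = [1.2, -0.8, 3.1, 25.0, -2.2, 0.4, -30.5, 1.9]
print(empirical_rate(diffs, limit=20.0))
```

The hard part of the fault-tree route is not the arithmetic but enumerating every cause; the empirical route sidesteps that.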


EP18P3 – More comments on risk management

May 23, 2009


One confusing task is assigning probability and severity to (error) events. This is important since it is the basis for Pareto charts, in which events are ranked to decide which problems to fix.

Consider the following sequence of events, which is a laboratory example of patient sample mix-up.

[Figure: sequence of events 1 to 3 for a patient sample mix-up]

There could be more events within this series, but to keep things simple, there are just these three events.

Issue 1: the severity of event 1 – The original error is event 1, which by itself does not cause patient harm. In fact, nothing in the laboratory by itself causes patient harm; it is the downstream effects of event 1 that cause patient harm. Thus, the severity of event 1 is given by the severity of event 3. That is, if event 3 can cause harm, then the severity of that harm is assigned to event 1.

Issue 2: multiple outcomes – Assume the assay is glucose. Providing the wrong glucose result can have a variety of consequences. If 94 mg/dL is reported instead of 95 mg/dL, no harm is likely; not so if 35 mg/dL is reported instead of 420 mg/dL. Typically, one assigns the severity corresponding to the worst possible outcome.

Issue 3: probability – The issue is whether one should assign the probability of occurrence to event 1 or to event 3 and, if event 3, to which outcome.

Since the severity has been assigned to the worst outcome, if one were to assign a probability to event 3, it would be for the worst outcome. Typically, the probability for event 3 will be much lower than that for event 1. Consider two examples:

Central lab glucose – One could get the distribution of glucose results for the laboratory and randomly sample two results from that distribution to get a probability for a “bad” patient mismatch to occur. One then has to speculate on the percentage of times that a clinician would act on the result, leading to patient harm.
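A minimal Monte Carlo sketch of this idea in Python. In practice a laboratory would sample its own glucose results; here the distribution is assumed (hypothetically) to be lognormal with a median of about 100 mg/dL, a mismatch is called “bad” if the reported value differs from the patient’s own value by more than 50%, and the fraction of bad mismatches a clinician acts on is a pure guess.

```python
import random


def bad_mismatch_probability(n: int = 100_000, rel_limit: float = 0.5, seed: int = 42) -> float:
    """Estimate P(a swapped result differs from the patient's own result by more than rel_limit).

    Assumes lab glucose results are lognormal (median ~100 mg/dL); this distribution
    and the 'bad' threshold are hypothetical stand-ins for real lab data.
    """
    rng = random.Random(seed)
    bad = 0
    for _ in range(n):
        own = rng.lognormvariate(4.605, 0.35)       # this patient's true result
        reported = rng.lognormvariate(4.605, 0.35)  # another patient's result, reported by mistake
        if abs(reported - own) / own > rel_limit:
            bad += 1
    return bad / n


p_bad = bad_mismatch_probability()
p_act = 0.1  # hypothetical fraction of bad mismatches that a clinician acts on
print(p_bad, p_bad * p_act)
```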

Newborn screening – Here, a bad result would be a positive that is called a negative (usually worse than a false positive). This probability could be estimated from the prevalence of the disorder. Since most newborn screening disorders have low prevalence, the most common result of a patient sample mix-up would be swapping a negative with a negative, which causes no harm.

In either the glucose or the newborn screening case, one has to multiply the probability of event 3 (given event 1) by the probability of event 1, which gives a very low overall probability.
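For the newborn screening case, the multiplication can be sketched in Python. The mix-up rate and prevalence are hypothetical, and the model assumes a one-way mix-up between two randomly chosen babies, so harm requires the affected baby to be truly positive and to receive a negative baby’s result.

```python
def p_harmful_given_mixup(prevalence: float) -> float:
    """P(the affected baby is truly positive but receives a negative baby's result).

    Assumes a one-way mix-up between two randomly chosen babies.
    """
    return prevalence * (1 - prevalence)


p_event1 = 1e-4          # hypothetical rate of patient sample mix-ups (event 1)
prevalence = 1 / 5000    # hypothetical prevalence of the disorder
p_event3 = p_harmful_given_mixup(prevalence)   # P(event 3 | event 1)
overall = p_event1 * p_event3                  # P(event 1 and event 3)
print(f"P(event 3 | event 1) = {p_event3:.2e}, overall = {overall:.2e}")
```

Because the conditional probability is so small, the overall probability is orders of magnitude below the mix-up rate itself.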

Why probability should be assigned to event 1 – Although it is possible to estimate a probability for event 3, the problem with using that probability is that it contains chance events, which are beyond the control of the laboratory. One cannot do anything about those chance events; one can only lower the probability of occurrence of event 1.

One might argue that this affects the ranking in a Pareto chart. This is true. For the glucose and newborn screening examples, using the probability for event 1 might give a different ranking than using the probability for event 3 (the two event 1s are likely to have different probabilities, since newborn screening is a filter paper assay and central lab glucose is a serum assay). However, this level of ranking is not important. Within a severity class, all problems need to be addressed, since any serious harm due to laboratory error is unacceptable. Ranking becomes important when one moves from one severity class to another, as occurs in a larger Pareto chart with hundreds of events, especially when events cover other areas such as complaints, accreditation, finances, and so on.


EP18P3 – New Version of laboratory risk management guideline

May 19, 2009

EP18 is the CLSI document about risk management. As it is being released, there have been some comments that merit a response.

A few comments object to devoting attention to failure modes, implying that failure modes are events that have occurred. The proposal is to restrict the document to managing the risk of events that have not yet occurred. I suspect that these people are familiar with ISO 14971, the ISO standard on risk management, which ignores failures that have occurred.

ISO 14971 was written for manufacturers who wish to release medical devices. In that sense, risk of potential errors is important since the device has not yet been released. But in the clinical laboratory, the product, which is reporting results of diagnostic assays to physicians, has been released and errors are occurring which can harm patients.

Consider HAMA interference in laboratory ABC. HAMA interference is a failure mode that has occurred in some laboratories and has caused harm. It may have already occurred in laboratory ABC, but whether it has or hasn’t, it is a known cause of patient harm. If it is occurring in laboratory ABC, then it has a rate and needs to be reduced. If it has never occurred, then it is a risk and needs to be controlled. In either case, harm may still occur, since the risk will never be zero and the rate may not reach zero. Note that even if the rate reaches zero, the failure mode then becomes a risk, which is never zero.

Whether harm from HAMA interference occurs in this laboratory depends on how much effort (cost) is applied to prevent harm. This is a decision made by society, which ISO 14971 largely misrepresents as a risk-benefit tradeoff made by the site. What really happens is that the laboratory works at preventing errors until the money allocated for that task is used up. The amount of money is often a function of satisfying regulations, which in turn are dictated by society.

Here’s a real medical example involving putting in central lines. This is an example of reducing failures (FRACAS, a failure reporting, analysis, and corrective action system), although the word FRACAS is never used. From the article:

“Still, Pronovost asked the nurses in his I.C.U. to observe the doctors for a month as they put lines into patients, and record how often they completed each step. In more than a third of patients, they skipped at least one.”

This is a key step in FRACAS: measuring a failure rate. The steps that were skipped had the potential to harm patients, and when controls were put in place (a checklist was used), the rate of harm to patients decreased dramatically.
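Measuring the failure rate is simple arithmetic once observations are recorded. A minimal Python sketch, assuming each observed line insertion is recorded as the set of checklist steps that were completed; the step names are an illustrative assumption, and the observations are hypothetical.

```python
from typing import List, Set

# Illustrative checklist steps (an assumption for this sketch)
CHECKLIST = ["wash hands", "clean skin with chlorhexidine", "sterile drape",
             "mask, hat, gown, gloves", "sterile dressing"]


def skipped_step_rate(observations: List[Set[str]]) -> float:
    """Fraction of observed line insertions in which at least one step was skipped."""
    required = set(CHECKLIST)
    skipped = sum(1 for done in observations if not required <= done)
    return skipped / len(observations)


# Hypothetical observations: each set holds the steps actually completed
all_steps = set(CHECKLIST)
observations = [all_steps, all_steps - {"sterile drape"}, all_steps,
                all_steps - {"wash hands", "sterile dressing"}, all_steps, all_steps]
print(f"{skipped_step_rate(observations):.0%} of insertions skipped a step")
```

Re-running the same measurement after the checklist is introduced gives the before/after comparison that FRACAS relies on.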

So risk management and EP18 need to address the rate of actual failures as well as the risk of potential failures.


EP9 – “Bias estimation” should be called “average bias estimation”

May 15, 2009

EP9, the CLSI standard about method comparison, is being revised. What struck me is that the title of the current version includes the term “bias estimation”. This is not accurate. What EP9 estimates is average bias.

Average bias is a useful thing to estimate. For example, it’s good to know that the Beckman Coulter PSA assay is 22% higher on average than any WHO-calibrated PSA assay. However, using the word bias instead of average bias implies that any and all biases will be estimated. This is not the case for EP9, since a result with a high bias that is detached from the other results will be excluded from the analysis. Moreover, two assays with the same regression coefficients but with different scatter will exhibit the same average bias (although the confidence intervals for average bias will differ).
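A short Python sketch shows why “average bias” is the more accurate term. Two hypothetical candidate assays are compared against the same reference results; both have an average bias of exactly 5 units, but one has tight scatter and the other has wide scatter. The 10-unit limit used to count large individual differences is also hypothetical.

```python
from statistics import mean, stdev

x = list(range(50, 150))  # hypothetical reference results
# Both candidates add +5 on average; alternating offsets give different scatter
tight = [v + 5 + (2 if i % 2 else -2) for i, v in enumerate(x)]
loose = [v + 5 + (20 if i % 2 else -20) for i, v in enumerate(x)]


def differences(reference, candidate):
    """Individual differences (candidate minus reference)."""
    return [c - r for r, c in zip(reference, candidate)]


def fraction_beyond(diffs, limit):
    """Fraction of individual differences larger than a limit in either direction."""
    return sum(abs(d) > limit for d in diffs) / len(diffs)


d_tight, d_loose = differences(x, tight), differences(x, loose)
print(mean(d_tight), mean(d_loose))                  # same average bias
print(round(stdev(d_tight), 2), round(stdev(d_loose), 2))
print(fraction_beyond(d_tight, 10), fraction_beyond(d_loose, 10))
```

The average bias is identical, yet every individual result from the second assay is off by more than the limit.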

Historically, EP9 was released as a Proposed document in 1986; however, it is based on an earlier CLSI (then called NCCLS) document, PSEP-4, released in 1979. The original title was “Comparison of Methods Experiment.” Only later versions included the term bias estimation in the title. Note that there are two documents that estimate bias due to interference (EP7 and EP14), and it is only relatively recently that two documents estimate all bias in an assay: EP21 (2003) and EP27 (not yet released). See the CLSI site for details about individual documents.


Response to Dr. Rich about Comparative Effectiveness Research (CER)

May 12, 2009

I am happy to respond to Dr. Rich’s blog entry, which refers to me. I enjoy reading Dr. Rich’s blog, value his wisdom, and thoroughly enjoyed his book. I agree with Dr. Rich that

“the controversy regarding CER has to do with how its results will be applied”

Regarding:

“DrRich suspects that Dr. Krouwer is more familiar with laboratory research than with clinical research”

Actually, what I do is analyze data. It usually does relate to the laboratory via diagnostic companies.

Dr. Rich talks about the problems with randomized clinical trials (RCTs). I agree with his points; however, consider the different treatments for prostate cancer. Here, an RCT makes no sense, because the study would be too complicated, too expensive, and take too long compared with a viable alternative.

There are at least 8 treatments for prostate cancer:

  1. Active surveillance (watchful waiting)
  2. Open radical prostatectomy
  3. Laparoscopic radical prostatectomy
  4. IMRT radiation
  5. Proton beam therapy
  6. Brachytherapy (seeds)
  7. Cryotherapy
  8. HIFU (not approved in the US)

However, the number of categories increases because:

Patients are usually subdivided into low, intermediate, or high risk as a function of PSA, Gleason score, and stage. This leads to 24 treatment categories (8 treatments × 3 risk groups). Moreover, there are a variety of other complicating factors, such as combination treatments including androgen deprivation therapy (ADT) applied to some patients for one of several reasons, surgical skill for relevant procedures, localized therapy, salvage treatments, and so on.

The alternative to an RCT is simply to analyze existing data. Around 185,000 men are diagnosed with prostate cancer each year, so every ten years provides 1,850,000 cases. That’s a lot of data! If one had a way of simply collecting the outcomes (success rates, side effects, total cost) of each case, one would have CER.
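A back-of-the-envelope Python sketch of how the categories multiply and how many cases each would get. It assumes, unrealistically, that cases split evenly across categories, and uses ADT given or not as one example of an extra yes/no factor.

```python
from itertools import product

TREATMENTS = ["active surveillance", "open radical prostatectomy",
              "laparoscopic radical prostatectomy", "IMRT radiation",
              "proton beam therapy", "brachytherapy", "cryotherapy", "HIFU"]
RISK_GROUPS = ["low", "intermediate", "high"]

cases_per_decade = 185_000 * 10

# Every treatment crossed with every risk group
categories = list(product(TREATMENTS, RISK_GROUPS))
print(len(categories), round(cases_per_decade / len(categories)))

# One extra yes/no factor (e.g., ADT given or not) doubles the number of categories
with_adt = list(product(TREATMENTS, RISK_GROUPS, [False, True]))
print(len(with_adt), round(cases_per_decade / len(with_adt)))
```

Even with several more factors, a decade of cases leaves tens of thousands of patients per category, far more than any RCT could enroll.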

Yet existing studies that attempt to answer some of these questions have bias. Dealing with every point would take a book, so take one example: incontinence as a side effect of treatment, especially radical prostatectomy.

First, one has to take into account the pretreatment rate of incontinence, which is around 9% (see WebMD); the pretreatment rate is a bigger issue for another side effect, erectile dysfunction. It is not clear that studies take prior side effect rates into account.

According to Medscape, physician-reported studies report a lower rate of incontinence than patient surveys do. One can speculate that physicians would like to minimize the side effects of treatments that they provide. Moreover, the definition of incontinence, including when it is measured after treatment, differs among studies or is not provided at all. And of course, some of the definitions are questionable. For example, according to Zincke et al., using fewer than three pads a day qualifies as continent.
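A tiny Python sketch of why the definition matters. The same hypothetical pad counts for ten patients give very different incontinence rates under a strict definition (any pad use counts as incontinent) and the lenient one quoted above (fewer than three pads a day counts as continent).

```python
from typing import List

# Hypothetical daily pad counts for ten patients a year after treatment
pads_per_day = [0, 0, 0, 1, 0, 2, 0, 1, 3, 0]


def incontinence_rate(pads: List[int], max_pads_for_continent: int) -> float:
    """Fraction of patients classified incontinent under a given pad-count definition."""
    incontinent = sum(1 for p in pads if p > max_pads_for_continent)
    return incontinent / len(pads)


print(incontinence_rate(pads_per_day, 0))  # strict: any pad use counts as incontinent
print(incontinence_rate(pads_per_day, 2))  # lenient: fewer than three pads is continent
```

Same patients, same data: the reported rate differs by a factor of four purely because of the definition.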

What is needed is standardization of terms and survey instruments, and a reliable method to collect these data. If this doesn’t occur, it will be hard to draw conclusions about this side effect, whether or not the study is transparent (well described). One can argue that there will always be some bias, but how much bias there is does matter if policies will be based on such studies.