Six Sigma can be dangerous to your health

March 13, 2008

sigma

At a recent conference, there were several presentations about six sigma for clinical laboratory assays. To recall, sigma is calculated as Sigma = (TEa – bias)/CV where

TEa is the total allowable error
Bias is the inaccuracy of the measurement procedure
CV is the imprecision of the measurement procedure
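
As a minimal sketch of the calculation above, assuming TEa, bias, and CV are all expressed in the same units (percent here). The function name and the example numbers are illustrative only, and taking the absolute value of the bias is an assumption so that negative biases are treated like positive ones.

```python
def sigma_metric(tea, bias, cv):
    """Sigma = (TEa - |bias|) / CV, with all three in the same units (e.g. percent)."""
    if cv <= 0:
        raise ValueError("CV must be positive")
    return (tea - abs(bias)) / cv


# Hypothetical assay: TEa = 10%, bias = 1%, CV = 1.5%
print(sigma_metric(10.0, 1.0, 1.5))  # 6.0
```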

The problem with six sigma is that it’s taken as a sole measure of quality – that is, if you have a high sigma value (greater than 6) then your assay is assured of high quality. The rest of this entry explains why this is wrong.

First, TEa (total allowable error) is often specially called out as medically acceptable limits. One need only read the ISO 15197 standard for glucose to see this connection. I have previously commented about this standard. The implied meaning of medically acceptable limits is shown below.

figure 1

This is simply not the real world. Taguchi long ago specified a more realistic quadratic model of worth, which is shown below, superimposed on the original figure but in green.

figure 2

Thus points A and B are similar in bias and are similar in causing (or not causing) medically unacceptable results. It is also likely then that if point A is ok, then so is point B. It is only when one gets far away from these limits that one is almost certain to have results that can cause harm. This is shown below with point C.

figure 3
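
The contrast between the two models of worth can be sketched numerically. This is only an illustration: the limit, the errors for points A, B, and C, and the scaling of the quadratic loss (set to 1.0 at the limit) are all assumptions, not values from the figures.

```python
def step_loss(error, limit):
    """Goalpost view: no loss inside the limits, full loss outside."""
    return 0.0 if abs(error) <= limit else 1.0


def quadratic_loss(error, limit):
    """Taguchi-style view: loss grows with the square of the error.

    Scaled (an assumption for illustration) so the loss equals 1.0 at the limit.
    """
    return (error / limit) ** 2


limit = 10.0  # hypothetical allowable error
points = {"A": 9.5, "B": 10.5, "C": 30.0}  # just inside, just outside, far outside
for name, err in points.items():
    print(name, step_loss(err, limit), round(quadratic_loss(err, limit), 2))
```

Under the step model A and B land on opposite sides of a cliff, while the quadratic model gives them nearly the same loss and reserves a much larger loss for C.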

This can also be expressed as an error grid such as those for glucose. So the “sigma” calculations really only express the zone A region (grey) where 95% or more of the results should be. Zone B (white) can contain up to 5% of the results and zone C (dark grey) should contain no results. The error grid contains more information since each set of limits is different for each concentration. An error grid is shown below, taken from FDA guidance. In the guidance, WM is the test method and CM is the reference method. (In the document WM=waiver method and CM=comparative method).

figure 4
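
A rough sketch of how results could be tallied by zone follows. All zone limits in it are hypothetical placeholders, not the limits from the FDA guidance or from ISO 15197; a real error grid defines its own limits for each concentration range.

```python
def zone(reference, measured):
    """Classify one result into zone A, B, or C.

    All limits here are hypothetical placeholders; real error grids
    define their own limits for each concentration range.
    """
    error = abs(measured - reference)
    if reference < 75:  # low concentrations: absolute limits (mg/dL)
        a_limit, c_limit = 15.0, 40.0
    else:  # higher concentrations: limits relative to the reference value
        a_limit, c_limit = 0.20 * reference, 0.50 * reference
    if error <= a_limit:
        return "A"
    if error < c_limit:
        return "B"
    return "C"


def zone_fractions(pairs):
    """Fraction of (reference, measured) pairs falling in each zone."""
    counts = {"A": 0, "B": 0, "C": 0}
    for ref, meas in pairs:
        counts[zone(ref, meas)] += 1
    return {z: n / len(pairs) for z, n in counts.items()}
```

An evaluation would then check the zone A fraction against the 95% requirement and, separately, that the zone C fraction is zero.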

So the problem is that sigma only accounts for zone A, but patients are harmed by values in zone C!

Now one might argue that there is nevertheless a relationship between sigma and the three zones, meaning that high sigma values are unlikely to have values in zone C and low sigma values are likely to have such values. This is also not true. Here is why.

1.       Often incorrect models are used to assess total error – see here.

2.       In estimating bias and CV, outliers – the very values that cause harm - are often thrown out.

3.       All sigma calculations are based on the assumption that the data are normally distributed. Most data do not fulfill this criterion. This means that values in the tails of the distribution (again, this is zone C) often occur more frequently than expected from calculations based on the normal distribution.

4.       And maybe the biggest reason of all, values can occur in zone C that have nothing to do with the analytical process. If there is a patient sample mix-up, this can occur and these values are excluded (when detected) from virtually all analytical evaluations.
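
The effect of non-normal data on the tails (reason 3) can be illustrated with a small calculation. This is a sketch under made-up assumptions: errors are modeled as a mixture in which 99% of results have SD 1 and 1% have SD 5, and the zone C boundary is placed at 6 core SDs. None of these numbers come from a real assay.

```python
from math import erfc, sqrt


def two_sided_tail(limit, sd):
    """P(|X| > limit) for X ~ Normal(0, sd)."""
    return erfc(limit / (sd * sqrt(2)))


# Hypothetical error distribution: 99% of results have SD 1, 1% have SD 5
# (e.g. occasional interferences). Weights and SDs are made up.
w_wide, sd_core, sd_wide = 0.01, 1.0, 5.0
limit = 6.0  # hypothetical zone C boundary, in units of the core SD

mixture_tail = ((1 - w_wide) * two_sided_tail(limit, sd_core)
                + w_wide * two_sided_tail(limit, sd_wide))

# A single normal distribution with the same overall SD as the mixture
sd_overall = sqrt((1 - w_wide) * sd_core**2 + w_wide * sd_wide**2)
normal_tail = two_sided_tail(limit, sd_overall)

print(f"normal model:  {normal_tail:.2e}")
print(f"mixture model: {mixture_tail:.2e}")
```

With these assumed numbers, the normal model understates the fraction of results beyond the boundary by several orders of magnitude, even though both distributions have the same SD.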

Think of it this way. If a loved one suffered medical harm, due in part to an erroneous lab result, would it make you feel better to know that the assay had a high sigma value? And would you associate that assay with quality?

I will comment on how one can address these issues in a future entry.


At risk behavior

March 3, 2008

risk

I am involved in risk management standards for clinical laboratories, where the focus has been on understanding how manufacturer’s devices can fail and how a clinical laboratory can put in place control measures to prevent these failures from causing harm.

My concern with these standards is that there is not enough emphasis given to the clinical laboratory’s own sources of error – its people. Among problems related to human errors are cognitive errors, non cognitive errors, reckless behavior, and at risk behavior – the topic of this entry.

At risk behavior is behavior that increases risk where risk is not recognized, or is mistakenly believed to be justified. Anyone who manages people must have had the experience of hearing (perhaps second hand) “I don’t think that’s necessary and I’m not going to do it.” And of course, parents are familiar with at risk behavior practiced by their children.

An example of healthcare at risk behavior is reusing syringes. This occurred recently at an endoscopy clinic in Nevada and has affected up to 40,000 people. In reading the patient empowerment blog, one learns about other cases of reused syringes. In a case in Long Island, the physician reused syringes only for the same patient, but the syringes were used with multi-dose vials and these vials were used across patients.

In the recent case of reducing central line infections, Dr. Peter Pronovost observed that, of the steps associated with placing a central line, doctors skipped at least one step in a third of patients. Whereas some of this could be attributed to non cognitive errors (slips), it could also be associated with at risk behavior. The control measure that worked here was a double check step, whereby another healthcare provider would check to make sure each step was followed.

Discovering at risk behavior may not be easy, hence it needs to be on one’s radar screen.


Should one focus on a failure in a procedure or the outcome of such a failure?

February 14, 2008

money

Withholding payment for adverse events is a financial incentive to promote patient safety. Whether this incentive makes financial sense is something I will comment on later or perhaps not at all. For now, my comments are about the policy as it recently appeared (1).

 

 

The authors suggest the following criteria to withhold payment.

·         Evidence demonstrates that the bulk of the adverse events in question can be prevented by widespread adoption of achievable practices.

·         The events can be measured accurately, in a way that is auditable.

·         The events resulted in clinically significant patient harm.

·         It is possible, through chart review, to differentiate the adverse events that began in the hospital from those that were “present on admission” (POA).

The problem is with the third bullet and can perhaps be illustrated by the following figure.

FMEA FRACAS

In this figure FMEA events are shown by the dashed line.  The red dashed line is before FMEA. The green dashed line shows that after a successful FMEA, risk of failures has been reduced. FRACAS events are shown by the solid lines. The green line shows a reduction in the failure rate after FRACAS.

Keep in mind, for the dashed lines (FMEA), no failures have occurred, while for the solid lines, failures have occurred.

Now the policy defines a failure as an adverse patient outcome. One can view outcomes as the end of  an event cascade as in the next figure.

error cascade

Assume that event C is an adverse patient outcome. According to the policy, payment is withheld only when event C is observed. In the first figure, the relevant concern area is shown by the ellipse as it is assumed that these are all high severity (severe patient harm) events.

This policy therefore excludes the following cases:

All FMEA events. That is, a procedure with a correctable high risk will be excluded from this policy because the event has not yet occurred. Consider the case of the Duke transplant error (2), before it happened. One can infer that this was a high risk procedure that would have benefited from an FMEA. In essence, this policy waits for disasters to happen.

All near miss events. Consider the case of the patient who had an MRI (3). Blood pressure monitor tubing had to be disconnected for the MRI. After the procedure, the tubing was incorrectly connected to an IV line. Before air was delivered from the automated blood pressure monitor, a family member noticed that things didn’t look right and contacted a nurse, who corrected the problem. Thus, there was no adverse event.

All defective procedures that don’t result in severe patient harm. Consider a healthcare worker who violates hospital policy (at risk behavior according to Marx (4)), which results in a patient fall. In this case, the fall results in a minor injury.  This is an important case because the policy fails to properly reflect risk management principles.

For a procedure that has a problem (e.g., a failed event), one has to classify the severity of the failed event and its probability (FMEA) or frequency of occurrence (FRACAS). The severity is classified not necessarily by the failed event but by the effect of the failed event. The effect is itself an event and can be a spectrum of severities. In the case of a patient fall, there is a distribution of harm associated with the fall event – some falls will result in severe harm, some will result in minor harm. Traditionally, in risk management, if severe harm is possible, then severity is associated with severe harm, even if the probability of severe harm is low. In this sense, severity is equated with potential outcome, regardless of whether that specific outcome has occurred.

One also has to classify the probability (FMEA) or frequency of occurrence of the event (FRACAS). Here, assuming FMEA, one could choose between the probability of the failed event or the probability of the effect of the event (the adverse outcome). It is recommended to use the probability of the failed event, not the probability of the effect of the event. This is because one usually has control over the failed event and does not have control over the effect of the event.

Example: If a clinical laboratory provides a clinician with an erroneous result and the effect of that could be patient harm, the event is classified as severe. The probability is the probability of erroneous result, not the probability of patient harm, because patient harm is outside of control of the clinical laboratory (the clinician might not act on the result, might suspect it is erroneous and request it to be repeated, and so on).
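
The classification rule described above can be sketched as follows. The severity scale, the function name, and the probability value are hypothetical; the point is only that severity is taken from the worst possible effect, while probability is taken from the failed event itself.

```python
SEVERITY_ORDER = ["negligible", "minor", "severe"]  # hypothetical three-level scale


def classify(failed_event, p_failed_event, possible_effects):
    """Classify one failed event for risk ranking.

    Severity is the worst possible effect, even if that effect is unlikely.
    Probability is that of the failed event itself, not of the effect.
    """
    worst = max(possible_effects, key=SEVERITY_ORDER.index)
    return {"event": failed_event, "severity": worst, "probability": p_failed_event}


# Erroneous lab result: harm is possible but depends on what the clinician does
print(classify("erroneous result reported", 0.001, ["negligible", "minor", "severe"]))
```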

Summary

This policy will miss many quality issues and deviates from traditional risk management.

References

  1. Wachter RM, Foster NE, and Dudley RA. Medicare’s Decision to Withhold Payment for Hospital Errors: The Devil Is in the Details. The Joint Commission Journal on Quality and Patient Safety 2008;34:116-123, see http://psnet.ahrq.gov/resource.aspx?resourceID=6760
  2. See http://www.cbsnews.com/stories/2003/03/16/60minutes/main544162.shtml
  3. See http://www.ismp.org/newsletters/acutecare/articles/20030612.asp
  4. Marx, D. Patient Safety and the “Just Culture”: A Primer for Health Care Executives http://www.mers-tm.net/support/Marx_Primer.pdf


Software Verification and Validation

January 24, 2008

SW bug

In spending two sessions with groups of people who verify and validate medical device software, I got the impression that most effort is spent on testing code (to the requirements that exist). In part, I based this assessment on the number of questions (e.g., interest by the audience) when code testing was discussed vs. examining requirements. Yet, in reviewing recalls, and from my experience in the IVD industry, I suspect that most errors are caused by wrong requirements (see figure).

 

 coderequirements.jpg

 This makes me recall some definitions.

Bug – A coding error that prevents the software from meeting its stated requirement. A divide by zero error is a bug, but if the denominator can never be zero, the bug will never become a failure. Here, “can never be zero” means that the value cannot be zero even without a guarding logic statement such as If X <> 0, then … (If such a statement were present, there would be no divide by zero bug in the first place.)

Failure – Any deviation from customer expectations. This rather liberal statement is similar to the general definition of quality by ASQ. Each failure must be evaluated by the software / product development team to decide whether they agree and of course deviations have non software causes.

Example – A home glucose meter measures a value over 500 mg/dL and displays ERR1. This is a requirements error. It is known that the value is too high (it could be 501 or 1,000), so the meter should say something like HIGH.
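
To make the distinction concrete, here is a small sketch of the glucose meter example. The function names are hypothetical; the first function is correct code for a wrong requirement, so code testing alone would pass it.

```python
UPPER_LIMIT_MG_DL = 500  # over-range threshold from the example above


def display_as_specified(value_mg_dl):
    """Meets the (wrong) requirement: an over-range value shows an error code."""
    return "ERR1" if value_mg_dl > UPPER_LIMIT_MG_DL else str(value_mg_dl)


def display_as_user_expects(value_mg_dl):
    """Meets the customer expectation: an over-range value is reported as HIGH."""
    return "HIGH" if value_mg_dl > UPPER_LIMIT_MG_DL else str(value_mg_dl)


print(display_as_specified(501), display_as_user_expects(501))
```

Neither function contains a bug in the sense defined above. Only a review of the requirement itself would reveal the failure.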


FMEA vs. FRACAS

January 4, 2008

concept

I have previously compared FMEA and FRACAS, here. Another simple difference is:

(Successful) FMEA reduces risk.

(Successful) FRACAS reduces failure rates.

Now, one often hears about successful FMEAs. In my experience, these are not FMEAs, they are examples of FRACAS. An example is here. How can one tell that this is FRACAS and not FMEA? It’s simple - what is described is the reduction of a too high failure rate to a lower rate. With FMEA, the failure rate is zero – the event has not happened. What one does is to reduce the risk of this potential failure, from some amount to a lower amount. This is perhaps one of the reasons one does not hear too much about FMEA successes. As I said before, to say that something that has never happened is now even less likely to happen (due to FMEA) just isn’t too exciting.

To reduce failure rates is a good thing and it is not a big deal to call this FMEA when it is FRACAS. However, it is simple to use the correct terms and if one doesn’t one might wind up neglecting to perform FMEA when it’s needed.
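
The difference in what each method measures can be sketched in a few lines. The counts and the 1 to 5 ranking scales are made up, and severity times probability is just one common convention for ranking risk, used here as an assumption.

```python
def failure_rate(failures, opportunities):
    """FRACAS metric: observed failures per opportunity."""
    return failures / opportunities


def risk_score(severity, probability):
    """FMEA metric: a simple severity x probability ranking (an assumed convention)."""
    return severity * probability


# FRACAS: failures have been observed, and the observed rate drops (made-up counts)
print(failure_rate(30, 1000), "->", failure_rate(6, 1000))

# FMEA: zero observed failures before and after; only the estimated risk drops
# (made-up 1-5 ranking scales)
print(risk_score(5, 3), "->", risk_score(5, 1))
```

If a report quotes a before and after rate, as in the first print statement, it is describing FRACAS.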


A Different Animal

January 1, 2008

different

I have spent my career in industry in R&D in a quality role. As I continue to interact with people that deal with quality in the in vitro diagnostics industry, I get the impression that most of these people are not from R&D but rather from regulatory affairs. What’s the difference? My perception is that regulatory affairs professionals focus more on compliance – I have focused on measuring things. Compliance is often assessed through audits with documentation a large part of audits. Measuring things forces activities to focus on improving the metric of interest. Documentation is of less importance.

What’s another difference? Whenever I write an article for publication on quality, it’s reviewed by regulatory affairs professionals. I can tell by the comments (i.e., they disagree with most of what I say). R&D people agree with me.


Frequency of QC in the clinical laboratory

December 9, 2007

Lab

Kent Dooley has written an interesting essay, which is here. One of the points he makes is that not all clinical laboratory errors result in patient harm because clinicians will not always act on the erroneous result. So if an assay result doesn’t agree with other clinical data, the clinician may suspect the result might be wrong and ask to have it repeated. Dooley suggests that the minimum QC frequency should follow the time course for the likelihood of a clinician requesting a repeat sample, so that upon repeat, if the result had been in error, the new result will be correct (because now QC has been run).

Now, I am unencumbered by the knowledge and experience of working in a lab but my view of things is somewhat different. It seems to me that there are several error/detection/recovery possibilities as shown in the figure below. (Note, better pictures are here).

Error Detection Recovery

The problem with waiting for a clinician (or for that matter a patient) to question a result before running QC is that it doesn’t take advantage of the purpose of QC, which is shown below.

QC

That is, one runs the assay and at some time QC. If the QC is ok, then the results are released to the clinician. If not, one troubleshoots the assay including possibly rerunning patient samples. Using this scheme, QC frequency should not be determined by a retest time course but rather by the turn-around-time requirement for the assay.
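
The release scheme just described can be sketched as a simple gate. The function name and the QC limits are hypothetical; real QC rules are usually more elaborate than a single range check.

```python
def process_batch(patient_results, qc_value, qc_low, qc_high):
    """Hold patient results until QC is evaluated; release only if QC is in range."""
    if qc_low <= qc_value <= qc_high:
        return {"released": list(patient_results), "held": []}
    # QC failed: nothing is released; troubleshoot and possibly rerun samples
    return {"released": [], "held": list(patient_results)}


print(process_batch([98, 143, 251], qc_value=101, qc_low=95, qc_high=105))
print(process_batch([98, 143, 251], qc_value=112, qc_low=95, qc_high=105))
```

Because results wait for QC, the longest acceptable interval between QC runs is set by how long results may be held, which is the turn-around-time requirement.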

Now if the clinician requests the assay to be repeated, and QC had already been run, it is unlikely that running a second QC will detect anything. QC has limitations in its ability to detect error (see figure below). Random biases and random patient interferences will not be detected by QC.

QC properties

This figure came from previous considerations about equivalent QC, which are here, and here.

Besides suspecting assay error, many assay results are repeated because a condition is being monitored. Delta checks are a type of QC that is performed on these samples to determine whether the difference between results is expected. Exactly how the clinical laboratory could act on the knowledge that the clinician suspects that something is wrong with the assay result is a topic for clinical laboratorians to answer.
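
A delta check of the kind mentioned above can be sketched in its simplest form. The limit is a hypothetical value; in practice limits depend on the analyte and on the time between samples.

```python
def delta_check(previous, current, max_abs_change):
    """Flag a result whose change from the patient's previous result exceeds a limit."""
    return abs(current - previous) > max_abs_change


# Hypothetical limit of 20 units between consecutive results for one patient
print(delta_check(100, 105, 20))  # False: change is within the limit
print(delta_check(100, 180, 20))  # True: change is flagged for review
```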


Central lines and FRACAS

December 7, 2007

surgery

One hears of FRACAS success stories (like the one below) and FMEA failure stories (like the wrong blood type organs transplanted at Duke). A reason one doesn’t hear of FMEA success stories is that to say that something that has never happened is now even less likely to happen (due to FMEA) just isn’t too exciting. FMEA success stories are often not cases of FMEA, they are FRACAS, since rate improvements are discussed. FRACAS failures – we tried something, it didn’t work – are not very interesting.

A recent article in The New Yorker (1) provides an example of a FRACAS success story.

In the article, there is no mention of FRACAS but many of the steps were followed. The issue was a too frequent infection rate in central lines. It is important that one can measure this rate. One knows how many central lines are used, infections manifest themselves and their cause can be determined by culturing the lines. Some undercounting is possible but the rate seems fairly reliable.

The man behind the work, Dr. Peter Pronovost, first observed events for a month within the context of the process of placing central lines (e.g., process mapping). Errors in the process steps were identified. Since these steps were simple, such as washing hands, one could partly view these errors as non cognitive errors. This suggests a control measure such as a double check to prevent such “slips”. Actually, besides slips, there may have been some at-risk behavior (2). This is behavior that increases risk where risk is not recognized, or is mistakenly believed to be justified. The main control measure used was a checklist, with the addition of having nurses double check to see that the checklist steps were properly done. Then the rate was measured again and found to be considerably lower. All of this was published (3).

It was mentioned that an alternative control measure had been tried; namely, using central lines coated with antimicrobials. This expensive control measure failed to provide a substantial reduction in infection rates. This illustrates that one must be open minded when selecting control measures. There is sometimes a bias towards fixing the “system” (e.g., such as with coated lines) rather than fixing a people issue (e.g., which often implies blame). Dr. Pronovost implemented some system control measures by getting the manufacturer of central lines to include drapes and chlorhexidine – items that should have been available at the bedside but often were not.

Another big part of this story is ongoing resistance towards implementing this control measure more widely, even after it has been shown to be effective and low cost. Any control measure can be viewed as a standard and standards are not very popular. People will argue “but our situation is different”, “ICUs are too complicated for standards”, and so on. Financial incentives (or disincentives) for standards (e.g., P4P) loom. Dr. Gawande goes on to say how complicated things are in an ICU, yet that is precisely where standards helped. A similar situation happened in anesthesiology in the late 70s and early 80s. (Here, critical incident analysis was used and is basically the same as FRACAS.) The error rate was too high, effective control measures were developed, and widespread implementation of the control measures took considerable effort. You can read about that story here.

References

1.       Gawande A. Annals of Medicine. The checklist. The New Yorker, Dec. 7th issue, 2007, see here (don’t know how long this link will work).

2.       Marx, D. Patient Safety and the “Just Culture”: A Primer for Health Care Executives http://www.mers-tm.net/support/Marx_Primer.pdf

3.       Pronovost P. et al. An Intervention to Decrease Catheter-Related Bloodstream Infections in the ICU. N Engl J Med 2006;355:2725-32.


ISO 14971 authors, expertise, and potential conflicts of interest

November 28, 2007

question

I have questioned the elevated status of ISO standards claimed by some. Often, people justify this status by asserting that ISO standards are prepared by a consensus of experts. This entry explores three topics related to this assertion:

·        ISO authorship

·        Expertise of authors

·        Potential conflicts of interest for authors

The membership of an ISO committee

If you have an ISO document – I have the latest version of ISO 14971 – one thing to notice is that there is no list of authors nor even a list of the committee members. I don’t understand why it is the policy of ISO to hide this information, nor could I find such an explanation (or list of members).

Note that CLSI (formerly NCCLS) has in each standard a list of authors and subcommittee members, advisors, and observers (as well as area committee members).

What does it take to be an expert?

A simple if not flip answer to this is to be on an ISO committee, since by assertion, all committee members are experts. Of course, for ISO committees, one cannot form an opinion, since membership is unknown outside of the committee.

Potential conflicts of interest

Here are some opinions about conflict of interest regarding ISO membership (given that I don’t have a clue who the authors are). To understand conflict of interest concerns, it is helpful to understand that ISO documents have quasi regulatory status. As such, organizations can be divided into two groups: regulatory providers, and regulatory consumers (see http://krouwerconsulting.com/Essays/StandardsGroups.htm)

Manufacturers – The membership from this (regulatory consumer) group is often filled with regulatory affairs professionals. Their potential conflict of interest is to shape the documents to favor ease of compliance. They favor horizontal over vertical documents (see http://krouwerconsulting.com/Essays/StandardsGroups.htm)

Clinical laboratory or hospital professionals – Although this group would not seem to have a vested interest, one can question how many of these people serve as consultants for industry. If a standard is written for the clinical laboratory or elsewhere in the hospital, then this group has the same regulatory consumer potential conflict of interest as the manufacturer.

Regulators – As a regulatory provider group, the potential conflict of interest is the healthcare economics policy in place by the current administration.

Consultants – This group often has a high potential conflict of interest since some consultants make their living by helping companies comply with ISO standards.

Trade associations – This group is the voice of manufacturers and if represented on an ISO group has the same potential conflict of interest as for manufacturers, but with the added concern that trade groups are skilled in organizing manufacturers.

Note that for CLSI, any prospective member must fill out a conflict of interest statement. I am unaware of anyone ever being turned away from membership due to the conflict of interest statements.


ISO 14971 and Residual Risk

November 21, 2007

competition

The last entry was about FMEA goals, yet, the word “goal” isn’t in ISO 14971. Maybe “goal” suffered the same fate as the word “mitigation” – banned from ISO. There is an implied goal in ISO 14971 - the residual risk must be acceptable. To recall, residual risk is the risk that remains after control measures have been taken. Here’s where things get a little tricky.

In cases where the residual risk is unacceptable, one is supposed to perform a risk benefit analysis to determine if benefits of the medical procedure performed by the device outweigh any possible residual risk.

To frame this discussion, consider two types of residual risk:

 

 

1.       A residual risk from a known issue, such as an interference, where eliminating this risk is not “practical”.

2.       The overall residual risk from unknown issues. A certain amount of effort is used to search for risks (e.g., through FMEA, FTA, and FRACAS). At some point, more effort is considered not practical. Note: One can look at FDA recalls to see that unknown risks are often found in released products and lead to recalls (1).

Use of the word practical in ISO 14971 implies that in some cases, risk reduction is too expensive. This is not meant to be pejorative since everyone has limited resources.

In most cases in the standard, the cost benefit analysis is positioned as an analysis of the medical device’s clinical benefit to the patient vs. its risk. But ISO 14971 does point out an additional frame for the discussion.

“Those involved in making risk/benefit judgments have a responsibility to understand and take into account the technical, clinical, regulatory, economic, sociological and political context of their risk management decisions.”

To understand the issue, consider Type 1 diabetes as an example with the medical procedure being use of a home glucose meter. Because of risks 1 and 2 above, the glucose meter will fail and provide an erroneous result, albeit rarely. This is the current status and it is clear the benefit of the home glucose meter outweighs the risk (e.g., ADA recommendations to test for glucose). Yet, if one conducts a thought experiment and starts raising the frequency of (all) home glucose meter failures, simple decision analysis (2) still warrants use of the device. That is, measuring glucose, even if it occasionally (e.g., more often than rarely) gives an erroneous result, is better (clinically) than not measuring it.

If a company is working on a home glucose meter which provided an erroneous result too often (e.g., compared to existing meters), they will keep developing the meter until its failure rate is competitive. That is, there is a hierarchy of requirements for release for sale and often the competitive requirements (features needed to sell the product – including quality) are more stringent than any medical need or regulatory requirement (3).

Would you pay 2.5 million dollars to go to Cleveland?

Richard Fogoros suggests that there is a limit to what we can spend for healthcare (4). To make this point, he says that if a plane could be built that made most crashes survivable, most people would still not pay an astronomical ticket price for it.

So regulators could require lower failure rates (less risk), which would cause companies to invest more and would result in higher healthcare prices. This is not done because it is unaffordable; hence the level of risk allowed is usually driven by competition. This is risk management, but it is not the clinical benefit risk analysis described in ISO 14971 – it is financial risk management.

References

1.       See http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfRES/res.cfm

2.       Krouwer JS. Assay Development and Evaluation: A Manufacturer’s Perspective, AACC Press, Washington DC, 2002, Chapter 3.

3.       Krouwer JS. Assay Development and Evaluation: A Manufacturer’s Perspective, AACC Press, Washington DC, 2002, pp 38-39.

4.       Fogoros RN. Fixing American Healthcare. Publish or Perish Press, Pittsburgh, 2007.