CLSI EP22 EP23 Review

August 4, 2008

EP22 was created as a means of using risk management to let manufacturers recommend the frequency of external quality control run by clinical laboratories. This was the so-called option 4. Options 1-3 were part of the original CMS proposal to allow clinical laboratories to reduce the frequency of external quality control to once a month (provided certain conditions were met).

EP23 was the clinical laboratory follow-on document to EP22.

Here’s my take on these two documents.

1.       Manufacturers won’t provide the information as suggested by EP22. (This information consists of experiments to demonstrate the efficacy of internal control measures.) It would be a lot of work (i.e., cost) and there’s no regulatory requirement to do so. Moreover, if this information were provided, it would be labeling, which FDA would have to review. It is not clear that FDA has accepted this review task.

Update on 8/4/08 - During a CLSI session at the AACC meeting in Washington, Alberto Gutierrez from the FDA gave a presentation. Afterwards, I asked him if FDA would review the material about internal control experiments that manufacturers might present as part of the package insert. He said that FDA would review this material, but from what was said it seemed that the review would be superficial and that only egregious problems would be flagged by the FDA.

2.       Clinical laboratory staff does not have the expertise to review this information, were it provided. This does not mean that clinical laboratory staff is incapable of reviewing it – they could acquire the expertise – it just seems unlikely.

 

3.       Should manufacturers provide this information and clinical laboratory staff review it, there would be no benefit with respect to improving QC. This is illustrated by an example in EP22 where the failure mode of “incorrect results due to low volume sample” is examined. After presenting the results of an experiment to show how an internal system control works, the user control measure is to “ensure that adequate volume of sample is presented to instrument.” But clinical laboratory staff would (or should) do this anyway. They don’t need EP22 and EP23 to know that one should follow the manufacturer’s instructions and to refrain from doing something stupid.

 

In clinical chemistry, risk management is “in.” But there are signs that its popularity is already starting to wane. This is unfortunate, as there is a great opportunity to use risk management tools to reduce both the risk and occurrence of laboratory errors. But one must focus not just on potential system errors, as EP22 and EP23 do, but on human errors as well.


Reading Quality Digest can be dangerous to your health

June 17, 2008

In the June 2008 issue of Quality Digest, there is an article by Jay Arthur entitled “Statistical Process Control for Healthcare” (1). After the usual boilerplate type of introduction, something caught my eye; namely, the so-called good news that there is “inexpensive Excel based software to create control charts …” This made me go to the end of the article where, sure enough, the author just happens to sell such software. This may have been a good place for the author to introduce the term bias.

To understand a more serious problem with this article, consider a hospital process; namely analyzing blood glucose in a hospital laboratory. Because such a process has error, quality control samples are run. Say such a control has a target value of 100 mg/dL.  The values of the quality control samples are plotted by SPC software and rules are formulated. If the glucose control value is too high or too low, the process is said to be out of control and action is taken.

Now, Mr. Arthur is trying to push SPC software not for a process but for errors in the process. For example, he uses the infection rate in a hospital. But the infection rate is not a process that one wants to hold at a target value. Of course one does not want it to become worse, but its target is zero.

A more useful example than the hypothetical one provided by Mr. Arthur was published recently (2). Here, the authors were faced with an undesirable hospital infection error rate and set out to observe where errors occurred in the process of placing central lines. They then provided control measures and continued to track the error rate, which was reduced to zero. This is not SPC! It is much more like a FRACAS (Failure Reporting And Corrective Action System).

In another part of the article, Mr. Arthur suggests that “never events” can be tracked by SPC. Never events – a list of 28 such events has been put forth by the National Quality Forum – have, as implied, targets of zero. One such event is wrong site surgery. One should use something like FMEA (Failure Mode and Effects Analysis) to reduce the risk of such events. It is silly to suggest SPC software for never events.

References

1.   See http://www.qualitydigest.com/currentmag/articles/03_article.shtml

2.   Pronovost P, Needham D, Berenholtz S, Sinopoli D, Chu H, Cosgrove S, Sexton B, Hyzy R, Welsh R, Roth G, Bander J, Kepros J, Goeschel C. An Intervention to Decrease Catheter-Related Bloodstream Infections in the ICU. N Engl J Med 355:2725, December 28, 2006


Westgard Quality Control Workshop – Part 3

June 5, 2008

I just returned from the Westgard Quality Control Workshop, where I was a speaker, and have a few blogs’ worth of comments – this is the third.

EQC – Equivalent Quality Control

This is the CMS proposal (1) to allow clinical laboratories to reduce the frequency of quality control from twice per day to once a month given that 10 days of running QC shows no values that are out (and given some other conditions).

Let’s try to construct a hypothesis on which to base such a recommendation. For example:

given any possible error condition that could be detected by external quality control, internal quality control would detect the same error 100% of the time.

This is about the best I can think of, which would result in the recommendation:

Stop running external quality control.

What does running 10 days of external QC with no out of control results show? The answer is nothing. This is because one can assume that during these 10 days, there were either no errors or, if there were errors, external QC was not able to detect them. (It is possible that internal QC detected errors during these 10 days.) In fact, this experiment is guaranteed to be meaningless. To see this, one must realize that internal QC is always “on” and precedes external QC. So to show that external QC is redundant to internal QC for an error, internal QC would have to detect the error and either shut down the system or prevent the result – this being the external QC sample – from being reported. However, one can get different information by running external QC for a longer period, because if internal QC misses an error but external QC detects it, then one has proved that external QC is not redundant to internal QC. This was shown to me (2) as out of control results, across a range of assays, occurring 1 to 10 times per year, where these were real problems. Since controls are run only twice per day, the number of affected patient samples is larger than the number of out of control results.

So a lab that reduces external QC to once a month is risking an even larger number of patient samples, which is made worse since the clinician has probably acted on the erroneous results.
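To put rough numbers on this exposure argument, here is a minimal sketch. It assumes a persistent error that internal QC misses, that starts at a random time, and that is caught at the next external QC run. The throughput of 200 patient samples per day is a made-up figure, and `samples_at_risk` is a hypothetical helper, not something from the CMS proposal.

```python
def samples_at_risk(samples_per_day, qc_interval_days):
    """Patient samples exposed to an error that only external QC can catch.

    Assumes the error persists until the next external QC run and that its
    onset is equally likely anywhere in the interval between QC runs.
    Returns (expected, worst_case).
    """
    worst_case = samples_per_day * qc_interval_days
    expected = worst_case / 2
    return expected, worst_case


# Hypothetical throughput, for illustration only.
SAMPLES_PER_DAY = 200

for label, interval in [("twice per day", 0.5), ("once a month", 30)]:
    expected, worst = samples_at_risk(SAMPLES_PER_DAY, interval)
    print(f"QC {label}: expected {expected:.0f}, worst case {worst:.0f} samples")
```

Under these assumptions the exposure scales directly with the interval between external QC runs, which is the point of the paragraph above.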

Rather than do the experiment suggested by CMS, a lab can simply examine its external QC records for a sufficient length of time.

References

1.       To review, see http://www.aacc.org/events/expert_access/2005/eqc/Pages/default.aspx

2.       Personal communication from Greg Miller of Virginia Commonwealth University


Westgard Quality Control Workshop – Part 2

June 5, 2008

I just returned from the Westgard Quality Control Workshop, where I was a speaker, and have a few blogs’ worth of comments – this is the second.

How does one determine acceptable risk

This was one of the questions asked by a participant – are there any guidelines? I also commented recently that in spite of all the talk about risk management and putting in place control measures until one has acceptable risk, no one knows what acceptable risk means. Here are some more thoughts on this.

There are different risks (1), which can be enumerated. These include:

perception – complaints from either hospital or non hospital staff

performance – traditional quality, including errors that can affect patient safety

financial – errors that threaten the financial health of the service including lawsuits

regulatory – errors that threaten the accreditation status of the service

So first, one must say which risk one has in mind. One can envision an acceptable regulatory risk (we always pass inspections) but an unacceptable patient safety risk.  Note also, that the risks are not necessarily unique. One can have a patient safety failure with or without a lawsuit.

Assume the risk in question is the performance risk and specifically about patient safety. The Cadillac version of assessing risk would be to perform a quantitative fault tree and arrive at a numerical probability of patient risk. This is unlikely and one would probably have a qualitative assessment. Whether the assessment is quantitative or qualitative, this still hasn’t answered the acceptability question.

The problem is there is no easy answer to this question. If one had unlimited funds, one could lower the risk to whatever level was desired, but funds are limited by the economic healthcare policy of the laboratory’s country (2). So one answer to the question of acceptable risk is how this economic policy is translated into regulations (e.g., one follows existing regulations and passes inspections). Yet, this is only a quasi-legal way of stating acceptable risk.

Recommendation

I suggest that risk be assessed by traditional means (FMEA, fault tree), including a Pareto chart or table to rank the risks. Then, if one allocates the available money across control measures (mitigations) in a portfolio-type manner so as to get the most risk reduction for it, one has an acceptable risk under the imposed financial constraints.
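One simple way to picture the portfolio idea is a greedy selection: fund the control measures that buy the most risk reduction per dollar until the budget runs out. This is only a sketch. The mitigation names, costs, and risk-reduction scores below are hypothetical, and a greedy pass is a heuristic that does not guarantee the best possible allocation.

```python
from dataclasses import dataclass


@dataclass
class Mitigation:
    name: str
    cost: float            # in budget units; must be > 0
    risk_reduction: float  # in whatever units the Pareto table uses


def choose_mitigations(candidates, budget):
    """Greedy pick by risk reduction per unit cost, skipping what no longer fits."""
    chosen = []
    remaining = budget
    ranked = sorted(candidates, key=lambda m: m.risk_reduction / m.cost, reverse=True)
    for m in ranked:
        if m.cost <= remaining:
            chosen.append(m)
            remaining -= m.cost
    return chosen


# Hypothetical candidates, for illustration only.
candidates = [
    Mitigation("barcode specimen labels", cost=10, risk_reduction=50),
    Mitigation("second-person review", cost=20, risk_reduction=60),
    Mitigation("staff refresher training", cost=15, risk_reduction=30),
    Mitigation("new signage", cost=5, risk_reduction=5),
]

for m in choose_mitigations(candidates, budget=30):
    print(m.name, m.cost, m.risk_reduction)
```

With a budget of 30, the sketch funds the first two measures and leaves the rest, which is the sense in which the remaining risk is "acceptable" under the financial constraint.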

portfolio analysis

References

1.       Managing risk in hospitals using integrated Fault Trees / FMECAs. Jan S. Krouwer, AACC Press, Washington DC, 2004.

2.       See http://covertrationingblog.com/


Westgard Quality Control Workshop – Part 1

June 5, 2008

 

I just returned from the Westgard Quality Control Workshop, where I was a speaker, and have a few blogs’ worth of comments – this is the first.

What’s Missing from Clinical Laboratory Inspections

At the Westgard Workshop, most of the participants were from clinical laboratories and I was impressed with how smart these people are. I also got a sense of a tremendous regulatory burden. From the CAP CD I obtained at the Workshop:

      The mission statement of the CAP Laboratory Accreditation Program is:

“The CAP Laboratory Accreditation Program improves patient safety by advancing the quality of pathology and laboratory services through education and standard setting, and ensuring laboratories meet or exceed regulatory requirements.”

I have had mixed feelings about inspections that certify quality and have previously reported my experience with an industry quality program – ISO 9001 (1).

Here’s my assessment of clinical laboratory inspections to certify laboratories. It would seem that the premise of these inspections is that having specific policies and procedures in place and executed, as proven largely by documentation, guarantees high quality. So what’s missing? As far as I can tell – and it is difficult to read through these materials – there is no measurement of error rates. Without such measurements, quality is unknown.

Recommendation

The regulatory bodies would describe a list of errors and their associated severities. The severities would be given numerical values, as in the VA hospital system, which uses 1-4. Every clinical laboratory would record each error (failure mode) that occurs in its laboratory, its severity, and its frequency (default frequency is of course 1). It would multiply frequency x severity for each unique error (failure mode), add these up, and get a rate by dividing by the number of tests reported per year.
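The arithmetic is simple enough to sketch. The failure modes, counts, and test volume below are hypothetical; only the 1-4 severity scale and the frequency x severity sum divided by tests per year come from the description above.

```python
def weighted_error_rate(error_log, tests_per_year):
    """Sum of frequency x severity over all failure modes, per test reported."""
    total = sum(frequency * severity for _, severity, frequency in error_log)
    return total / tests_per_year


# (failure mode, severity on a 1-4 scale, frequency per year) - hypothetical values
error_log = [
    ("mislabeled specimen", 4, 3),
    ("hemolyzed sample reported", 2, 10),
    ("delayed result", 1, 25),
]

rate = weighted_error_rate(error_log, tests_per_year=100_000)
print(f"severity-weighted error rate: {rate:.5f} per test")
```

Tracking this one number year over year gives the measurement that closes the loop: if the rate is unacceptable, the laboratory changes the process and measures again.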

Failing to count errors would be a serious violation.

This would be the start of a new premise for the regulatory bodies. Measure quality – if it’s unacceptable, the clinical laboratory would suggest and implement process changes. It’s a simple closed loop process. With emphasis on measurement, reliance on documentation should decrease and inspections should be less burdensome.

closed loop

References

1.       Krouwer JS. ISO 9001 has had no effect on quality in the in-vitro medical diagnostics industry. Accred. Qual. Assur. 2004;9:39-43


Acceptable Risk – Easy to talk about, but no one knows what it means

May 4, 2008

Standards about risk management always talk about “acceptable risk.” This is a qualitative term. Unfortunately, for much of healthcare there is no matching quantitative assessment or goal. Consider two examples.

Statement | Because
Precision is acceptable | CV is 8% and goal is 10%
Residual risk is acceptable | ?

It is possible to estimate the probability of a severe adverse event and to have an associated goal for such a probability but no one in healthcare does this. So one will see things like, “with this mitigation we have reduced the risk of the adverse event to an acceptable level” but the reality is no one knows what this really means.


Alternatives to Six Sigma

March 19, 2008

This entry continues where the entry (Six Sigma can be dangerous to your health) left off. Given the problems with six sigma, what are some solutions to estimate the quality of an assay? I use hCG as an example assay.

First, when total analytical error is calculated to estimate the values in zones A-C in an error grid, one should use conservative methods such as the empirical distributions suggested by the CLSI EP21A method, and where no data are deleted. Let’s say a clinical laboratory has done this evaluation with 40 patient samples for a new and reference method and found no results in zone C for an hCG assay. What can one conclude? Although there are 0% of the values in zone C, the 95% confidence interval extends to 7.2%. This means that for every million hCG results performed, up to 72,000 results could be in zone C. This is not very comforting and these types of evaluations don’t prove much, although one knows that the 7.2% rate is unlikely (because if this rate occurred, it would be noticed).
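The 7.2% figure can be reproduced with the exact binomial bound for zero observed events. This is a sketch: the one-sided 95% bound below matches the number in the text, but that this is the calculation originally used is my assumption, and `upper_bound_zero_events` is a name made up for illustration.

```python
def upper_bound_zero_events(n, confidence=0.95):
    """Exact one-sided upper confidence bound on a proportion when 0 of n are observed.

    Solves (1 - p)**n = 1 - confidence for p.
    """
    return 1 - (1 - confidence) ** (1 / n)


p_upper = upper_bound_zero_events(40)
print(f"upper bound: {p_upper:.1%}")
print(f"per million results: {p_upper * 1_000_000:,.0f}")
```

The bound shrinks only as the sample size grows, so ruling out even a 0.1% zone C rate this way would take thousands of samples, not 40.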

FMEA is an approach that will provide an answer to the quality question but in its complete form, it requires considerable effort. To complete a FMEA analysis, one has to postulate all possible reasons why a result could fall into zone C. To get an idea of what is involved, take two possible failure modes, HAMA interference and a patient sample mix-up.

HAMA interference – To estimate the likelihood of a zone C result from HAMA interference, one needs to know the level of HAMA that will cause erroneous results in the assay and the probability of such levels in the population being sampled. Contacting the manufacturer might give one the level of HAMA to watch out for – I am not familiar with data about the distribution of HAMA in patient samples. Yet, one knows HAMA interference occurs (Clinical Chemistry. 2001;47:1332-1333).  

Patient sample mix-up – There are some data for patient sample mix-ups (Archives of Pathology and Laboratory Medicine: Vol. 130, No. 11, pp. 1662–1668). However, it seems that these cases are caught within the laboratory. One would need to determine how many cases actually are not caught within the laboratory. One could then model the likelihood of a zone C result by sampling from the empirical distribution of hCG results that are observed in the lab to see the likelihood of a mix-up causing a zone C result.

Because there are so many existing data in a clinical laboratory, one may also have the opportunity to perform FRACAS types of analyses. That is, in addition to modeling probabilities, one could use existing data to count actual failures.

One must then continue:

  • with each other possible failure mode, calculate the probability of zone C results
  • calculate the overall probability of zone C results (from all failure modes) and determine if that risk is acceptable
    • special software is typically used to perform these calculations
  • construct a Pareto table if the overall probability of zone C results is too high and
  • propose control measures to lower the overall risk to an acceptable level
    • the control measures must of course be affordable
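The combination and ranking steps in the list above can be sketched as follows. The per-result probabilities are hypothetical placeholders, and the calculation assumes the failure modes are independent, which a real fault tree would not take for granted.

```python
def overall_probability(mode_probs):
    """P(at least one failure mode yields a zone C result), assuming independence."""
    p_none = 1.0
    for p in mode_probs.values():
        p_none *= 1.0 - p
    return 1.0 - p_none


def pareto_table(mode_probs):
    """Failure modes ranked by probability, with cumulative share of the total."""
    total = sum(mode_probs.values())
    rows, cumulative = [], 0.0
    for name, p in sorted(mode_probs.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += p
        rows.append((name, p, cumulative / total))
    return rows


# Hypothetical per-result probabilities, for illustration only.
modes = {
    "patient sample mix-up": 2e-4,
    "HAMA interference": 5e-5,
    "other (placeholder)": 1e-5,
}

print(f"overall probability of a zone C result: {overall_probability(modes):.2e}")
for name, p, share in pareto_table(modes):
    print(f"{name:25s} {p:.1e} {share:6.1%}")
```

The hard part is not this arithmetic but obtaining defensible probabilities for each failure mode, which is where the effort described above goes.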

At this point, one can get the idea that this level of effort is out of reach for clinical laboratories since the level of expertise and work needed just to estimate the likelihood of a zone C result is huge. Even if a clinical laboratory could perform this task, it makes no sense to require every clinical laboratory to do so.

One possibility is to have a standards group tackle such a task, although this too has limitations, as was shown for a (universal) control measure to prevent wrong site surgery.

Another possibility is to perhaps leverage resources beyond the clinical laboratory. For example, one could insist that before treatment for trophoblastic carcinoma, an hCG result should be confirmed either by performing a reference assay or perhaps by treating the sample and rerunning it. This requires an interaction between the clinical laboratory and clinicians.

So there are no easy answers to preventing severe, low frequency failures (that cause patient harm), but as discussed before, coming up with a sigma estimate for an hCG assay is also not the answer. Nor is doing nothing.


Six Sigma can be dangerous to your health

March 13, 2008

At a recent conference, there were several presentations about six sigma for clinical laboratory assays. To recap, sigma is calculated as Sigma = (TEa – bias)/CV where

TEa is the total allowable error
Bias is the inaccuracy of the measurement procedure
CV is the imprecision of the measurement procedure
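For readers who want to see the arithmetic, here is a minimal sketch of the formula. The input values are hypothetical, all three quantities are assumed to be in percent, and taking the absolute value of the bias is my assumption so that a negative bias does not inflate sigma.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma = (TEa - bias) / CV, with all inputs expressed in percent."""
    return (tea_pct - abs(bias_pct)) / cv_pct


# Hypothetical assay: 10% allowable error, 2% bias, 1% CV.
print(sigma_metric(tea_pct=10, bias_pct=2, cv_pct=1))
```

Note that nothing in this calculation knows about outliers, interferences, or sample mix-ups.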

The problem with six sigma is that it’s taken as a sole measure of quality – that is, if you have a high sigma value (greater than 6) then your assay is assured of high quality. The rest of this entry explains why this is wrong.

First, TEa (total allowable error) is often specifically called out as medically acceptable limits. One need only read the ISO 15197 standard for glucose to see this connection. I have previously commented about this standard. The implied meaning of medically acceptable limits is shown below.

figure 1

This is simply not the real world. Taguchi long ago specified a more realistic quadratic model of worth, which is shown below, superimposed on the original figure but in green.

figure 2

Thus points A and B are similar in bias and are similar in causing (or not causing) medically unacceptable results. It is also likely then that if point A is ok, then so is point B. It is only when one gets far away from these limits that one is almost certain to have results that can cause harm. This is shown below with point C.

figure 3

This can also be expressed as an error grid such as those for glucose. So the “sigma” calculations really only express the zone A region (grey) where 95% or more of the results should be. Zone B (white) can contain up to 5% of the results and zone C (dark grey) should contain no results. The error grid contains more information since each set of limits is different for each concentration. An error grid is shown below, taken from FDA guidance. In the guidance, WM is the test method and CM is the reference method. (In the document WM=waiver method and CM=comparative method).

figure 4

So the problem is that sigma only accounts for zone A, but patients are harmed by values in zone C!

Now one might argue that there is nevertheless a relationship between sigma and the three zones, meaning that high sigma values are unlikely to have values in zone C and low sigma values are likely to have such values. This is also not true. Here is why.

1.       Often incorrect models are used to assess total error – see here.

2.       In estimating bias and CV, outliers – the very values that cause harm - are often thrown out.

3.       All sigma calculations are based on the assumption that the data are normally distributed. Most data do not fulfill this criterion. This means that often there are more frequent values in the tails of the distribution (again, this is zone C) than expected by calculations based on the normal distribution.

4.       And maybe the biggest reason of all, values can occur in zone C that have nothing to do with the analytical process. If there is a patient sample mix-up, this can occur and these values are excluded (when detected) from virtually all analytical evaluations.
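The point about tails (reason 3) can be illustrated with a contaminated normal distribution: most results follow the nominal normal distribution, but a small fraction come from a much wider one. This is a sketch only. The 1% outlier fraction and the 5x wider spread are hypothetical, chosen to show the size of the effect rather than to describe any real assay.

```python
import math


def normal_tail(k):
    """Two-sided P(|Z| > k) for a standard normal variable."""
    return math.erfc(k / math.sqrt(2))


def contaminated_tail(k, outlier_fraction, outlier_sd_ratio):
    """P(|error| > k nominal SDs) when a fraction of results has a wider spread."""
    main = (1 - outlier_fraction) * normal_tail(k)
    wide = outlier_fraction * normal_tail(k / outlier_sd_ratio)
    return main + wide


k = 6  # limits at six nominal standard deviations
print(f"pure normal:         {normal_tail(k):.2e}")
print(f"1% outliers, 5x SD:  {contaminated_tail(k, 0.01, 5):.2e}")
```

If the outliers are thrown out before the CV is estimated (reason 2), the sigma value reflects only the pure normal line, while patients experience the second one.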

Think of it this way. If a loved one suffered medical harm, due in part to an erroneous lab result, would it make you feel better to know that the assay had a high sigma value? And would you associate that assay with quality?

I will comment on how one can address these issues in a future entry.


Should one focus on a failure in a procedure or the outcome of such a failure?

February 14, 2008

Withholding payment for adverse events is a financial incentive to promote patient safety. Whether this incentive makes financial sense is something I will comment on later or perhaps not at all. For now, my comments are about the policy as it recently appeared (1).

 

 

The authors suggest the following criteria to withhold payment.

·         Evidence demonstrates that the bulk of the adverse events in question can be prevented by widespread adoption of achievable practices.

·         The events can be measured accurately, in a way that is auditable.

·         The events resulted in clinically significant patient harm.

·         It is possible, through chart review, to differentiate the adverse events that began in the hospital from those that were “present on admission” (POA).

The problem is with the third bullet and can perhaps be illustrated by the following figure.

FMEA FRACAS

In this figure FMEA events are shown by the dashed line.  The red dashed line is before FMEA. The green dashed line shows that after a successful FMEA, risk of failures has been reduced. FRACAS events are shown by the solid lines. The green line shows a reduction in the failure rate after FRACAS.

Keep in mind, for the dashed lines (FMEA), no failures have occurred, while for the solid lines, failures have occurred.

Now the policy defines a failure as an adverse patient outcome. One can view outcomes as the end of  an event cascade as in the next figure.

error cascade

Assume that event C is an adverse patient outcome. According to the policy, payment is withheld only when event C is observed. In the first figure, the relevant concern area is shown by the ellipse as it is assumed that these are all high severity (severe patient harm) events.

This policy therefore excludes the following cases:

All FMEA events. That is, a procedure with a correctable high risk will be excluded from this policy because the event has not yet occurred. Consider the case of the Duke transplant error (2), before it happened. One can infer that this was a high risk procedure that would have benefited from a FMEA. In essence, this policy waits for disasters to happen.

All near miss events. Consider the case of the patient who had an MRI (3). Blood pressure monitor tubing had to be disconnected for the MRI. After the procedure, the tubing was incorrectly connected to an IV line. Before air was delivered from the automated blood pressure monitor, a family member noticed that things didn’t look right and contacted a nurse, who corrected the problem. Thus, there was no adverse event.

All defective procedures that don’t result in severe patient harm. Consider a healthcare worker who violates hospital policy (at risk behavior according to Marx (4)), which results in a patient fall. In this case, the fall results in a minor injury.  This is an important case because the policy fails to properly reflect risk management principles.

For a procedure that has a problem (e.g., a failed event), one has to classify the severity of the failed event and its probability (FMEA) or frequency of occurrence (FRACAS). The severity is classified not necessarily by the failed event but by the effect of the failed event. The effect is itself an event and can be a spectrum of severities. In the case of a patient fall, there is a distribution of harm associated with the fall event – some falls will result in severe harm, some will result in minor harm. Traditionally, in risk management, if severe harm is possible, then severity is associated with severe harm, even if the probability of severe harm is low. In this sense, severity is equated with potential outcome, regardless of whether that specific outcome has occurred.

One also has to classify the probability (FMEA) or frequency of occurrence of the event (FRACAS). Here, assuming FMEA, one could choose between the probability of the failed event or the probability of the effect of the event (the adverse outcome). It is recommended to use the probability of the failed event, not the probability of the effect of the event. This is because one usually has control over the failed event and does not have control over the effect of the event.

Example: If a clinical laboratory provides a clinician with an erroneous result and the effect of that could be patient harm, the event is classified as severe. The probability is the probability of erroneous result, not the probability of patient harm, because patient harm is outside of control of the clinical laboratory (the clinician might not act on the result, might suspect it is erroneous and request it to be repeated, and so on).

Summary

This policy will miss many quality issues and deviates from traditional risk management.

References

  1. Wachter RM, Foster NE, Dudley RA. Medicare’s Decision to Withhold Payment for Hospital Errors: The Devil Is in the Details. The Joint Commission Journal on Quality and Patient Safety 2008;34:116-123, see http://psnet.ahrq.gov/resource.aspx?resourceID=6760
  2. See http://www.cbsnews.com/stories/2003/03/16/60minutes/main544162.shtml
  3. See http://www.ismp.org/newsletters/acutecare/articles/20030612.asp
  4. Marx, D. Patient Safety and the “Just Culture”: A Primer for Health Care Executives http://www.mers-tm.net/support/Marx_Primer.pdf


FMEA goals in healthcare

November 17, 2007

FMEA is now a common risk management tool used in healthcare. Here’s a quick test. If the words “minimal cut set” and “Petri net” don’t mean anything to you, then you probably don’t have a quantitative FMEA goal. The rest of this entry explains some things to know about goals.

A quantitative goal must also be measurable and realistic. For example, a goal for imprecision (reproducibility) for a clinical laboratory sodium assay might be 4% CV. One can measure this goal using a variety of experiments including those defined by standards such as the CLSI standard EP5A2.
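As a bare-bones illustration of what "measurable" means here, the sketch below computes a CV from replicate results and compares it with the 4% goal. The sodium values are made up, and this single-run calculation is far simpler than the multi-day design that a precision standard such as EP5A2 calls for.

```python
import statistics


def cv_percent(results):
    """Coefficient of variation: sample standard deviation over the mean, in percent."""
    return 100 * statistics.stdev(results) / statistics.mean(results)


# Hypothetical replicate sodium results (mmol/L) on one control sample.
sodium = [140, 141, 139, 142, 138]
GOAL_CV = 4.0  # percent, the example goal from the text

cv = cv_percent(sodium)
print(f"CV = {cv:.2f}%, goal met: {cv <= GOAL_CV}")
```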

FMEA deals with risk. Some common pitfalls about risk goals are:

·         A goal that an event should never happen. For example, the NQF (National Quality Forum) implies such by talking about “never events.” Risk is probabilistic and can never be zero. An estimated risk may be so low that in lay terms the event is said to never occur, but this lay usage is different from a formal quantitative assessment.

·         Too many goals. The NQF has a list of 28 “never events.” Virtually all of these cause serious patient harm. A goal could be restated in terms of patient harm, as the combination of risk from any of the 28 events.

·         The Institute for Healthcare Improvement (IHI) implies goals in terms of evaluating the RPN (risk priority number) before and after implementing control measures. Some problems here are:

o   One may improve this metric by reducing the risk of less severe events (without reducing risk of severe events)

o   A severe risk with the lowest (categorical) probability of occurrence may be ignored as a candidate for improvement, since its RPN won’t change, but there still may be a way to lower risk (and still have the same (categorical) probability of occurrence rank).
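The first of these RPN problems is easy to show with numbers. In the sketch below, RPN is taken as severity x occurrence x detection on 1-10 scales, which is the common convention and an assumption on my part. The two failure modes and their scores are hypothetical.

```python
def rpn(severity, occurrence, detection):
    """Risk priority number as the product of three categorical scores."""
    return severity * occurrence * detection


def total_rpn(modes):
    return sum(rpn(*scores) for scores in modes.values())


# (severity, occurrence, detection) - hypothetical scores on 1-10 scales
before = {"severe failure mode": (10, 1, 5), "minor failure mode": (2, 8, 5)}
after = {"severe failure mode": (10, 1, 5), "minor failure mode": (2, 2, 5)}

print("total RPN before:", total_rpn(before))
print("total RPN after: ", total_rpn(after))
print("severe mode RPN unchanged:",
      rpn(*before["severe failure mode"]) == rpn(*after["severe failure mode"]))
```

The total drops by almost half, yet the risk of the severe event is exactly what it was.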

Quantitative FMEA goals are possible and are used in the nuclear power industry although fault trees are used instead of FMEAs. Quantitative fault trees are evaluated among other ways using “minimal cut sets” and “Petri nets.”

A reasonable non quantitative goal for FMEA is to learn more about potential failure modes. However, one should realize that it is difficult to assess how much is learned.

It is easy to have a quantitative FRACAS goal because it is easy to measure failure rates from observed failures, before and after implementing control measures.