I have been interested in ways to compare different assays. The recent guest essay on the Westgard web site on this topic stimulated the following comments.
The title of the essay is: “The Quality Goal Index – Its Use in Benchmarking and Improving Sigma Quality Performance of Automated Analytic Tests”. The essay starts with:
“The long-term goal of six sigma quality management is to achieve an error rate of 3.4 or less per million opportunities for all laboratory processes. In percent terms, that’s an error rate of less than 0.001%.”
A few sentences later, the author is describing how to calculate “sigma performance” as a measure of quality performance and states that for the CV estimate:
“Data integrity can be assured only when procedures are in place and rigorously practiced to exclude erroneous quality control results due to procedural blunders and statistical outliers.”
This sentence makes me view whatever follows with suspicion. The author excludes two types of errors: blunders and statistical outliers. From a clinician’s point of view, interest is in obtaining the correct answer (meaning “correct enough”). If an incorrect answer is produced by a blunder, it is nevertheless wrong. So right off the bat, one knows that whatever the author is measuring is a subset of real quality performance, and one has no doubt heard that the majority of clinical laboratory errors come from pre- or post-analytical problems.
But perhaps the author intends to measure a subset of quality performance – that due to the analytical process. Then I have a problem with excluding statistical outliers. There is no justification for this exclusion. From a simple numbers view, assume that excluded data are greater than 3 standard deviations from the average. This means excluding about 0.3% of the data, if the data are normally distributed. Well, that’s one way to get to an error rate of less than 0.001%! One might argue that exclusion of a specific outlier is ok because the outlier must have been due to a blunder, but that is speculation and it is possible that the outlier occurred as part of the analytical process – one simply does not know.
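The "about 0.3%" figure can be checked with a few lines of Python. This is only a sketch under the same assumption the paragraph makes (normally distributed data, a two-sided 3 standard deviation exclusion rule); the function name and the constant are illustrative and do not come from the essay being discussed.

```python
import math

# Six sigma target quoted in the essay: 3.4 errors per million opportunities.
SIX_SIGMA_GOAL = 3.4e-6

def fraction_outside(k_sd: float) -> float:
    """Two-sided fraction of a normal distribution lying more than
    k_sd standard deviations from the mean: P(|Z| > k_sd)."""
    return math.erfc(k_sd / math.sqrt(2))

excluded = fraction_outside(3.0)
print(f"Fraction excluded by a 3 SD rule: {excluded:.4%}")  # roughly 0.27%
print(f"Ratio to the six sigma goal: {excluded / SIX_SIGMA_GOAL:.0f}x")
```

Under these assumptions, the fraction of data removed by the exclusion rule is several hundred times larger than the error rate the method is supposed to demonstrate.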
From another point of view, if one constructs a Clarke or Parkes type of error grid (e.g., similar to that used for glucose), then dangerous errors will be (by definition of this grid) large errors in certain regions and are likely to be outliers (in a statistical sense) and could easily be excluded. But these are the very errors that should be measured (see essay on FDA’s new waiver guidance).
Along these lines, note that serious assay errors are associated with patient harm. These rare values:
The same arguments apply to proficiency testing, which often has automated outlier rules that “clean” the data.
This same trend to exclude outliers has been prevalent in discussions to publish a CLSI standard for GUM (Guide to the Expression of Uncertainty in Measurement).
But not excluding outliers messes up our analysis
Welcome to the real world. That’s true. This is why I recommend assessing quality control data which:
Even if outliers are included
Remember that even if outliers are included, the quality performance measured is still a subset of the analytical performance, because random biases, especially those due to patient interferences, will be missed because:
This is described in more detail in another essay.
In the list of essays, I used Outlier……………..s. I first saw this in: Beckman, R. J., and R. D. Cook, (1983). Outlier…s. Technometrics, vol. 25, pp. 119-149.
FMEA and Validation – 2/2006
In conducting a FMEA, one goes through the steps of
If any of these steps is likely to be neglected, it is the last one – that of performing yet another FMEA! (sounds recursive too, since a subsequent FMEA can cause more changes). The purpose of this essay is to consider validation of a FMEA, which could be thought of as part of the task of performing a FMEA on the new process (e.g., as changed by mitigations).
Recall a model used in FMEA; namely, the error, detection, recovery model (see figure), where one is trying to prevent the effect of an error, given that an error has occurred (see also the near miss essay).
As an example, consider the process steps when a sample arrives for analysis at a hospital laboratory (1). One of the steps is to examine the sample visually for lipemia and, if this condition is observed, to perform a “recovery”, often by notifying the source that sent the sample and/or by further processing the sample. Assume that the original error occurred outside of the laboratory that is responsible for analyzing the sample. This is a common situation, although it is also possible that the hospital laboratory that analyzes the sample is also responsible for preparing it.
To put some numbers on this example, assume that the hospital laboratory receives 100,000 samples per year and that 1% of these samples should fail the criteria for lipemia. This means that 1,000 samples are lipemic. Now one may reason that all lipemic samples will be detected and a recovery performed because detection and recovery steps are in place. However, consider what would happen if these steps did not always work. Assume that the detection step was 95% effective and the recovery step was 99% effective. This means that of the 1,000 samples that are lipemic, 50 will not be detected and they will be analyzed in error. In addition, of the 950 samples that are detected, 9.5 will fail recovery, meaning that the total number of samples subject to the error effect is (on average) 50 + 9.5 = 59.5 out of 100,000, or 0.0595%.
Assume also that the fraction of samples for which lipemia would cause a result error is 2%. This means that for the original 100,000 samples, the higher level observed error effect of a wrong answer follows from the combined probability: (59.5 / 100,000) x (2,000 / 100,000) x 100,000 = 1.2 samples on average every year. This error could in turn result in anything from no patient harm to a patient death, but the point of this essay is to go back to the FMEA steps that have been put in place to detect and recover from the original error (rather than to focus on outcomes).
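The arithmetic of the error, detection, recovery example can be laid out as a short Python sketch. The input values are the ones assumed in the text; the function and variable names are made up for illustration only.

```python
def escaped_errors(n_errors: float, p_detect: float, p_recover: float) -> float:
    """Expected number of errors whose effect is not prevented:
    those never detected plus those detected but not recovered."""
    missed = n_errors * (1 - p_detect)        # never detected
    detected = n_errors * p_detect
    failed_recovery = detected * (1 - p_recover)
    return missed + failed_recovery

samples_per_year = 100_000
lipemic = samples_per_year * 0.01             # 1% of samples are lipemic

# Detection assumed 95% effective, recovery assumed 99% effective.
escaped = escaped_errors(lipemic, p_detect=0.95, p_recover=0.99)

# Assume lipemia causes a wrong result in 2% of the affected samples.
wrong_results = escaped * 0.02

print(f"Lipemic samples analyzed in error per year: {escaped:.1f}")
print(f"As a fraction of all samples: {escaped / samples_per_year:.4%}")
print(f"Expected wrong results per year: {wrong_results:.2f}")
```

Changing p_detect or p_recover shows how sensitive the escaped error count is to the effectiveness of each step, which is the quantity that validation is meant to establish.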
In this example, I arbitrarily set detection success at 95% and recovery success at 99%. The laboratory person responsible for quality might argue that both steps are failsafe and hence virtually 100% effective. If there is a valid criterion for lipemia, it might be hard to imagine how one could miss detecting it or fail to initiate a recovery. Nevertheless, validation provides objective evidence that the detection and recovery steps meet their objectives. To set up a validation experiment for detection, one might have an independent observer rate all samples for lipemia, in a way that does not interfere with the routine process in place for examining the sample, and then one can tally results as:
In this experiment, one is assuming that the independent observer is correct. An additional part of the validation experiment is the sample size. That is, say the independent observer has checked 100 consecutive samples and found no mismatches. The table might look like:
The observed error rate for each of the two possible error types is zero but the 95% confidence intervals (2) for the two mismatch error rates are:
The problem is that there has been only one opportunity to misclassify a lipemic sample, so the confidence interval actually says that this error rate could be as high as 95%! Say one goes back and rigs the experiment to include 10% lipemic samples and runs the experiment for 500 samples and gets the following results.
The observed error rate for any error is again zero but the 95% confidence intervals for the two mismatch error rates are now:
So even with all of this work, one has only “proved” (e.g., with 95% confidence) that the detection success rate for lipemic samples is about 94% or better. Of course, it is also possible that mismatch rates will be nonzero. The same arguments apply to recovery.
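The upper limits quoted above can be reproduced with a short Python sketch. It assumes a one-sided exact binomial bound for the case of zero observed mismatches, where the upper limit is 1 - (1 - confidence)^(1/n). This is an assumption on my part: the method of reference (2) may differ in detail, but this bound gives the 95% and roughly 94% figures in the text. The function name is illustrative.

```python
def upper_bound_zero_failures(n: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence limit on an error rate when
    0 errors were observed in n independent opportunities.
    Solves (1 - p)**n = 1 - confidence for p."""
    if n <= 0:
        raise ValueError("n must be a positive number of opportunities")
    return 1 - (1 - confidence) ** (1 / n)

# First experiment: 100 samples, about 1 lipemic and 99 non-lipemic.
# Second experiment: 500 samples rigged to 10% lipemic (50 and 450).
for label, n in [("1 lipemic sample", 1),
                 ("99 non-lipemic samples", 99),
                 ("50 lipemic samples", 50),
                 ("450 non-lipemic samples", 450)]:
    ub = upper_bound_zero_failures(n)
    print(f"{label}: error rate could be as high as {ub:.1%}, "
          f"success rate at least {1 - ub:.1%}")
```

With one lipemic sample the bound is 95%, and with 50 lipemic samples it drops to a little under 6%, which corresponds to the "about 94% or better" detection success rate.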
Errors and Outcomes
I assumed the initial error rate caused by missed detection and recovery to be 59.5 samples per year, but this error rate leads to a wrong result for only about 1 sample per year. This may lead the hospital laboratory into a false sense of security: their current process may be flawed yet not lead to customer complaints. Hence, one should exclude outcomes from the analysis, since the hospital laboratory can only control its detection and recovery rates as a means to control the outcome rate.
Making up examples is difficult but there are real problems
Validation should lead to a case where no errors are found, which may make one exclaim that one has been forced to do something for which the outcome was already known. However, consider the following real cases:
Detection – Detection was missed when organs of the wrong blood type were selected to be transplanted, the transplant occurred and the patient died (3).
Detection – Airline pilots repeat air traffic controller orders to detect miscommunication. Yet, miscommunication detection failed and caused one of the largest air disasters ever (4).
Recovery – It was detected that the wrong leg was scheduled to be amputated but the recovery (change the operating room schedules) failed. Not all operating room schedules were changed (5) and the wrong leg was amputated.
Hence, even though it might be hard to envision how things can go wrong, there are real cases where seemingly simple detection and recovery process steps have failed. Validation is suggested as a means to help to ensure that new or existing mitigations work – and should be considered as a tool to help with performing a FMEA on mitigations.
The quality of validation – Equivalent QC
CMS has proposed equivalent QC for clinical laboratories. In changing the QC process, CMS requires validation (of use of equivalent QC). I have commented on the inadequacy of this validation (see equivalent QC essay). This leaves the question of what is an adequate validation. In some cases, people conducting a FMEA might assume perfect detection and recovery. Some level of validation beyond this assumption is warranted but must one conduct experiments that contain thousands of samples to prove that rare events haven’t happened? This topic will be pursued in a future essay.