Outliers in Quality Control and Proficiency Testing

I have been interested in ways to compare different assays. The recent guest essay on the Westgard web site on this topic stimulated the following comments.

The title of the essay is “The Quality Goal Index – Its Use in Benchmarking and Improving Sigma Quality Performance of Automated Analytic Tests.” The essay starts with:

“The long-term goal of six sigma quality management is to achieve an error rate of 3.4 or less per million opportunities for all laboratory processes. In percent terms, that’s an error rate of less than 0.001%.”

A few sentences later, the author is describing how to calculate “sigma performance” as a measure of quality performance and states that for the CV estimate:

“Data integrity can be assured only when procedures are in place and rigorously practiced to exclude erroneous quality control results due to procedural blunders and statistical outliers.”

This sentence makes me view whatever follows with suspicion. The author excludes two types of errors: blunders and statistical outliers. From a clinician's point of view, the interest is in obtaining the correct answer (meaning “correct enough”). If an incorrect answer is produced by a blunder, it is nevertheless wrong. So right off the bat, one knows that whatever the author is measuring is only a subset of real quality performance, and one has no doubt heard that the majority of clinical laboratory errors come from pre- or post-analytical problems.

But perhaps the author intends to measure only a subset of quality performance: that due to the analytical process. Even then, I have a problem with excluding statistical outliers. There is no justification for this exclusion. From a simple numbers view, assume that the excluded data are those more than 3 standard deviations from the average. If the data are normally distributed, this means excluding about 0.3% of the data. Well, that’s one way to get to an error rate of less than 0.001%! One might argue that excluding a specific outlier is OK because the outlier must have been due to a blunder, but that is speculation. It is possible that the outlier occurred as part of the analytical process; one simply does not know.
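To put rough numbers on this, here is a minimal sketch that compares the fraction of normally distributed data removed by a 3 standard deviation rule with the six sigma goal of 3.4 errors per million. The function name `two_sided_tail_fraction` is my own for illustration, and the calculation assumes a perfectly normal distribution, which real quality control data may not follow.

```python
import math

def two_sided_tail_fraction(k):
    """Fraction of a normal distribution lying more than k SDs from the mean.

    Uses the identity 2 * (1 - Phi(k)) = erfc(k / sqrt(2)).
    """
    return math.erfc(k / math.sqrt(2))

# Fraction of results discarded by a "more than 3 SD from the average" rule,
# assuming the data are normally distributed.
excluded = two_sided_tail_fraction(3.0)

# Six sigma goal quoted in the essay: 3.4 errors per million opportunities.
six_sigma_goal = 3.4e-6

print(f"excluded by 3 SD rule: {excluded:.4%}")
print(f"six sigma goal:        {six_sigma_goal:.5%}")
print(f"excluded / goal:       {excluded / six_sigma_goal:.0f}")
```

Under these assumptions the excluded fraction is several hundred times larger than the error rate being claimed, so the exclusion rule alone can hide far more errors than the goal allows.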

From another point of view, if one constructs a Clarke or Parkes type of error grid (e.g., similar to that used for glucose), then the dangerous errors will be (by definition of this grid) large errors in certain regions. Such errors are likely to be outliers in a statistical sense and could easily be excluded. But these are the very errors that should be measured (see essay on FDA’s new waiver guidance).

Along these lines, note that serious assay errors are associated with patient harm. These rare values:

  1. are likely to do the most harm
  2. are likely to be called a statistical outlier

The same arguments apply to proficiency testing, which often has automated outlier rules that “clean” the data.

The same tendency to exclude outliers has been prevalent in discussions about publishing a CLSI standard for GUM (Guide to the Expression of Uncertainty in Measurement).

But not excluding outliers messes up our analysis

Welcome to the real world. That’s true. This is why I recommend an assessment of quality control data that:

  1. does not exclude data
  2. measures distance from target
  3. reports the number of values outside medically acceptable limits (either out low or out high)
  4. reports the estimated total analytical error
  5. reports the lower and upper 95% uncertainty intervals (non-parametric estimation)
  6. reports the contribution of bias as a percent of total analytical error
  7. reports the contribution of imprecision as a percent of total analytical error
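The list above can be sketched in a few lines of Python. This is only an illustration, not a definitive method: the function names, the example values, the target of 100, and the allowable error of 10 are all hypothetical, and the total analytical error model used here (|bias| + 1.96 × SD) is an assumption chosen for simplicity. The 95% interval is taken non-parametrically as the 2.5th and 97.5th percentiles of the differences from target.

```python
import statistics

def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of pre-sorted data."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    rank = (len(sorted_values) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)

def summarize_qc(results, target, allowable_error):
    """Summarize QC results without excluding any data (needs >= 2 results)."""
    # Distance from target for every result; nothing is dropped.
    diffs = sorted(x - target for x in results)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    # ASSUMPTION: total analytical error modeled as |bias| + 1.96 * SD.
    imprecision_term = 1.96 * sd
    total_error = abs(bias) + imprecision_term
    return {
        "n": len(diffs),
        "n_low": sum(d < -allowable_error for d in diffs),
        "n_high": sum(d > allowable_error for d in diffs),
        "bias": bias,
        "sd": sd,
        "total_error": total_error,
        # Non-parametric 95% interval of the differences from target.
        "lower_95": percentile(diffs, 2.5),
        "upper_95": percentile(diffs, 97.5),
        "bias_pct": 100.0 * abs(bias) / total_error if total_error else 0.0,
        "imprecision_pct": 100.0 * imprecision_term / total_error if total_error else 0.0,
    }

# Hypothetical QC results for a control with target 100 and a medically
# acceptable limit of +/- 10. The value 135 is kept, not "cleaned".
qc_results = [98, 101, 99, 102, 100, 97, 103, 100, 99, 135]
summary = summarize_qc(qc_results, target=100, allowable_error=10)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because the one large value stays in, it shows up in the count of values out high and widens the upper 95% limit, which is the point: the summary reflects what the assay actually produced.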

Even if outliers are included

Remember that even if outliers are included, the quality performance measured is still a subset of the analytical performance. Random biases, especially those due to patient interferences, will be missed because:

  1. quality control material, instead of patient samples, is being tested
  2. the usual frequency of quality control testing will miss some intermittent errors

This is described in more detail in another essay.
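As a rough illustration of the second point, the sketch below estimates how often quality control would miss an intermittent error entirely. It assumes a simplified model that is mine, not from any standard: a run of measurements in which a few are affected by the error, with the quality control measurements falling at random positions in the run (a hypergeometric calculation). The function name and the numbers are hypothetical.

```python
import math

def prob_qc_misses(total_measurements, affected, qc_measurements):
    """Probability that no QC measurement coincides with an affected one.

    ASSUMPTION: QC measurements occupy randomly chosen positions among all
    measurements in the run, and only `affected` positions show the error.
    """
    return (math.comb(total_measurements - affected, qc_measurements)
            / math.comb(total_measurements, qc_measurements))

# Hypothetical run: 200 measurements, 5 hit by an intermittent error,
# 2 of the 200 positions used for quality control.
p_miss = prob_qc_misses(200, 5, 2)
print(f"probability QC misses the error: {p_miss:.1%}")
```

In this hypothetical run, quality control sees none of the affected measurements most of the time, so the error would not appear in the quality control data whether or not outliers are excluded.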

In the list of essays, I used Outlier……………..s. I first saw this in: Beckman, R. J., and R. D. Cook, (1983). Outlier…s. Technometrics, vol. 25, pp. 119-149.

