## How to specify and estimate outlier rates

Outliers are often distinguished from other error sources, either because the root cause of an outlier may differ from that of other errors or because some authors recommend a different disposition of outliers once they are detected (often along the lines of “don’t worry about that result, it’s an outlier”). Unfortunately, some of these practices have led to the neglect of outliers. Outliers are errors just like all other errors, only larger. Moreover, outliers are often the source of medical errors, since a large assay result error can lead to an incorrect medical treatment (1).

Setting outlier goals

An outlier goal is met if the percentage of observations in region A in Figure 1 is below a specified rate. A total error goal is met if the percentage of observations in region B is at or above a specified percentage (often 95% or 99%); see, for example, the NCCLS standard EP21-A (2). The space between regions A and B then contains the remaining observations, that is, 100% minus the percentages in regions A and B.

Figure 1. Outlier and Total Error Limits

[Figure: differences plotted with outlier limits and total error limits; region A is labeled twice and region B once.]

Estimating outlier rates

The difficulty in estimating outlier rates is that one is trying to prove that an unlikely event does not happen. There are two possible ways to do this, and each has its advantages and disadvantages. Moreover, outliers often come from a different distribution than most of the other results, which makes it impossible to estimate outlier rates by simply assuming that all results come from a normal distribution.

| Method | Advantage | Disadvantage |
| --- | --- | --- |
| Modeling | Requires fewer samples | Modeling is difficult (and time consuming); if wrong, the estimated outlier rate will be wrong |
| Counting | No modeling is required | Requires a huge number of samples |

Modeling

There are several types of modeling methods. One is to create a cause-and-effect (fishbone) diagram of an assay and then simulate assay results: random observations are drawn from the assumed or observed distribution of each assay variable and combined to create an “assay result”, and an assumed reference value is subtracted from this result to obtain an assay error. The distribution of these differences allows one to estimate outlier rates.
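The simulation approach can be sketched as follows. This is a minimal illustration, not a validated model: the three error sources, their distributions, the reference value, and the outlier limit are all hypothetical assumptions chosen only to show the mechanics.

```python
import random

def simulate_assay_errors(n_results, seed=0):
    """Simulate assay errors by sampling each assumed error source."""
    rng = random.Random(seed)
    reference_value = 100.0  # assumed reference value (hypothetical units)
    errors = []
    for _ in range(n_results):
        # Hypothetical branches of a fishbone diagram; every number is an assumption.
        calibration = rng.gauss(0.0, 1.0)   # calibration error
        imprecision = rng.gauss(0.0, 2.0)   # random imprecision
        # Rare interference: assumed to affect 0.1% of samples with a large shift.
        interference = rng.uniform(15.0, 30.0) if rng.random() < 0.001 else 0.0
        result = reference_value + calibration + imprecision + interference
        errors.append(result - reference_value)  # assay error
    return errors

def outlier_rate_percent(errors, outlier_limit):
    """Percentage of errors whose magnitude exceeds the outlier limit."""
    n_outliers = sum(1 for e in errors if abs(e) > outlier_limit)
    return 100.0 * n_outliers / len(errors)

errors = simulate_assay_errors(200_000)
rate = outlier_rate_percent(errors, outlier_limit=10.0)
print(f"Simulated outlier rate: {rate:.3f}%")
```

In this sketch the outliers come almost entirely from the rare interference term rather than from the tails of the two normal terms, which illustrates why a single normal distribution fitted to all results would misjudge the outlier rate.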

The “GUM method” (Guide to the Expression of Uncertainty in Measurement) also starts with a cause-and-effect or fishbone diagram of an assay. In the GUM method, a mathematical model is used to link all random and systematic error sources. All systematic errors are either corrected by adjustment or, when the error is unexplained, converted into random errors. All (resulting) random errors are combined using the mathematical model, following the rules of the propagation of error, to yield a standard deviation that expresses the combined uncertainty of all error sources. A multiple of this standard deviation (the multiplier is the coverage factor) provides a range for the differences between an assay and its reference for a percentage of the population of results. By selecting a suitable multiplier, one may estimate the magnitude of this range of differences (e.g., the outlier limits) for the desired percentage of the population (e.g., the outlier rate) that corresponds to the outlier goal. A concern with use of the GUM method is that it requires modeling all known errors. If an error is unknown, it won’t be modeled and the GUM standard deviation will be underestimated (3).
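A simplified sketch of the GUM-style calculation is shown below. It assumes uncorrelated error sources combined by root-sum-of-squares and a normal distribution for choosing the coverage factor; the sensitivity coefficients and standard uncertainties are hypothetical values, not taken from any real assay.

```python
from math import sqrt
from statistics import NormalDist

def combined_standard_uncertainty(components):
    """Root-sum-of-squares of (sensitivity coefficient * standard uncertainty).

    Assumes the error sources are uncorrelated.
    """
    return sqrt(sum((c * u) ** 2 for c, u in components))

def coverage_factor(outlier_rate):
    """Two-sided normal multiplier leaving `outlier_rate` outside the limits."""
    return NormalDist().inv_cdf(1.0 - outlier_rate / 2.0)

# Hypothetical (sensitivity coefficient, standard uncertainty) pairs.
components = [(1.0, 1.0), (1.0, 2.0), (0.5, 1.5)]

u_c = combined_standard_uncertainty(components)
k = coverage_factor(0.001)  # assumed outlier goal: 0.1% of results outside the limits
print(f"u_c = {u_c:.3f}, k = {k:.3f}, outlier limits = +/-{k * u_c:.3f}")
```

The sketch inherits the concern raised above: any error source missing from `components` is simply absent from `u_c`, so the computed limits would be too narrow.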

An FMEA (Failure Mode and Effects Analysis) seeks to identify all possible failure modes; for the modes ranked as most important, mitigations are implemented to reduce risk. Thus, at the end of an FMEA, one has the potential to quantify outlier rates, although in practice in clinical chemistry the final outlier risk is rarely quantified. FMEA is important because it also assesses risk for noncontinuous variables, such as the risk of reporting an assay value for the wrong patient.

Counting

In the counting method, outliers are considered as discrete events. That is, each assay result is judged independently of every other result to be either an outlier or not, based on the magnitude of the difference between the result and the reference. Of course, the choice of reference method is important. If the reference method is not a true reference method but a comparison method (another field method), then there is no way to know whether a large difference that is being called an outlier is due to the new method or the existing method.

The rate of outliers is simply the number of outliers found divided by the total number of samples assayed, expressed as a percentage.

Outlier rate = (x/n) * 100

where x = the number of outliers found and n = the total number of samples assayed.

This rate is not exact because it is estimated from a sample. Hahn and Meeker present a method to account for this uncertainty (4). For various numbers of total observations and outliers found, the table below shows the maximum percentage outlier rate at a stated level of confidence. This gives an idea of the sample sizes required to demonstrate a given maximum outlier rate.

| Sample Size | Number of Outliers Found | Maximum Percent Outlier Rate (95% Confidence) | Maximum Percent Outlier Rate (99% Confidence) | ppm Outlier Rate (95%) | ppm Outlier Rate (99%) |
| --- | --- | --- | --- | --- | --- |
| 10 | 0 | 25.9 | 36.9 | 259,000 | 369,000 |
| 100 | 0 | 3.0 | 4.5 | 30,000 | 45,000 |
| 1,000 | 0 | 0.3 | 0.5 | 3,000 | 5,000 |
| 1,000 | 1 | 0.5 | 0.7 | 5,000 | 7,000 |
| 10,000 | 0 | 0.03 | 0.05 | 300 | 500 |
| 10,000 | 1 | 0.05 | 0.07 | 500 | 700 |
| 10,000 | 10 | 0.2 | 0.2 | 2,000 | 2,000 |
| 881,000 | 0 | 3.40037E-04 | 5.23E-04 | 3.4 | 5.2 |

The last entry (881,000 samples) is a “six sigma” process.

Understanding the table entries

Using the third row as an example, 1,000 samples have been run and no outliers have been found. The estimated outlier rate is zero. However, this is only a sample and is subject to sampling variation. Properties of the binomial distribution allow one to state with 95% confidence that the true outlier rate is no more than 0.3%. This is equivalent to saying that in 1,000,000 samples there could be no more than 3,000 outliers.
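The binomial reasoning can be sketched in code. The functions below are one way to compute a one-sided upper confidence bound on a binomial proportion (solving for the rate at which observing x or fewer outliers has probability 1 minus the confidence); whether this matches the exact procedure in reference 4 is an assumption, and the function names are illustrative.

```python
from math import ceil, exp, lgamma, log, log1p

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), with each term computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if x >= n else 0.0
    total = 0.0
    for k in range(x + 1):
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)

def max_outlier_rate(n, x, confidence=0.95):
    """One-sided upper confidence bound on the true outlier proportion."""
    if x >= n:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = x / n, 1.0
    for _ in range(100):  # bisection: the CDF decreases as p increases
        mid = (lo + hi) / 2.0
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

def samples_needed_zero_outliers(max_rate, confidence=0.95):
    """Smallest n for which zero observed outliers bounds the rate at max_rate."""
    return ceil(log(1.0 - confidence) / log1p(-max_rate))

for n, x in [(10, 0), (1_000, 0), (1_000, 1), (10_000, 10)]:
    print(n, x, f"{100 * max_outlier_rate(n, x):.2f}%")
```

When no outliers are found, the bound reduces to 1 - (1 - confidence)^(1/n), which is why roughly 1,000 outlier-free samples are needed to claim a maximum rate of 0.3% with 95% confidence.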

“Six sigma” and outliers

The popular six sigma paradigm assumes that if one has a process whose specification limits lie 6 standard deviations from the target and whose mean has shifted by 1.5 standard deviations, the number of defects will be 3.4 per million. Defects per million for 1 to 6 sigma are shown in the following table.

| Sigma level (SL) | NORMSDIST(SL) | NORMSDIST(SL+1.5) | NORMSDIST(1.5-SL) | Prob. Good | Prob. Defect | Defects per million |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.84134474 | 0.99379 | 0.691462 | 0.302328 | 0.697672 | 697672.1 |
| 2 | 0.977249938 | 0.999767 | 0.308538 | 0.69123 | 0.30877 | 308770.2 |
| 3 | 0.998650033 | 0.999997 | 0.066807 | 0.933189 | 0.066811 | 66810.6 |
| 4 | 0.999968314 | 1 | 0.00621 | 0.99379 | 0.00621 | 6209.7 |
| 5 | 0.999999713 | 1 | 0.000233 | 0.999767 | 0.000233 | 232.7 |
| 6 | 0.999999999 | 1 | 3.4E-06 | 0.999997 | 3.4E-06 | 3.4 |
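The table's arithmetic can be reproduced with a short sketch. Here `NormalDist().cdf` from the Python standard library stands in for the spreadsheet function NORMSDIST named in the table; the function name `defects_per_million` is illustrative.

```python
from statistics import NormalDist

def defects_per_million(sigma_level, shift=1.5):
    """Defects per million for a normal process with a shifted mean."""
    std_normal = NormalDist()
    # Probability that a result falls inside limits placed at +/- sigma_level
    # standard deviations when the mean has shifted by `shift` standard deviations.
    prob_good = std_normal.cdf(sigma_level + shift) - std_normal.cdf(shift - sigma_level)
    return (1.0 - prob_good) * 1_000_000

for sl in range(1, 7):
    print(sl, round(defects_per_million(sl), 1))
```

As with the table, the calculation is only as good as the assumption that results are normally distributed.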

These results assume a normal distribution. In a diagnostic assay, it would be difficult if not impossible to prove that all results are normally distributed. However, the last entry of the first table (881,000 samples with no outliers found) corresponds to a six sigma process of 3.4 defects per million without requiring that assumption.

Discussion

Laboratories are not going to run 10,000 samples (nor should they) to prove that there are no outliers. Unfortunately, there are proposals to get laboratories to perform a limited type of GUM modeling which is totally inadequate and would prove nothing (3). Manufacturers could (and do) run large numbers of samples during assay development but don’t want to include estimation of outlier rates in their product labeling.

Thus, outliers remain an ignored topic and only surface when they cause problems. One possible remedy would be a uniform way for manufacturers to report outlier studies as part of their product labeling.

References

1. Cole LA, Rinne KM, Shahabi S, Omrani A. False-positive hCG assay results leading to unnecessary surgery and chemotherapy and needless occurrences of diabetes and coma. Clin Chem 1999;45:313-314.
2. National Committee for Clinical Laboratory Standards. Estimation of total analytical error for clinical laboratory methods; approved guideline. NCCLS document EP21-A. Villanova, PA: NCCLS, 2003.
3. Krouwer JS. Critique of the Guide to the Expression of Uncertainty in Measurement method of estimating and reporting uncertainty in diagnostic assays. Clin Chem 2003;49:1818-1821.
4. Hahn GJ, Meeker WQ. Statistical intervals: a guide for practitioners. New York: Wiley, 1991, p. 104.