Outliers are often distinguished from other error sources because the root cause of the outlier may differ from other error sources or because some authors recommend different disposition of outliers once they are detected (often as in “don’t worry about that result – it’s an outlier”). Unfortunately, some of these practices have led to the neglect of outliers. Outliers are errors just like all other errors, only larger. Moreover, outliers are often the source of medical errors, since a large assay result error can lead to an incorrect medical treatment (1).
Setting outlier goals
An outlier goal is met if the proportion of observations in region A in Figure 1 is below a specified rate. A total error goal is met if the percentage of observations in region B is at or greater than a specified percentage (often 95% or 99%) – see for example the NCCLS standard EP21-A (2). The space that is between regions A and B is specified to contain the percentage of observations equal to B – A.
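Figure 1. Outlier and Total Error Limits. [Figure not reproduced. Axis: Differences. Region B lies within the total error limits; the two regions labeled A lie outside the outlier limits.]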
Estimating outlier rates
The difficulty in estimating outlier rates is that one is trying to prove that an unlikely event does not happen. There are two possible ways to do this, and each has its advantages and disadvantages. Moreover, outliers are often the result of a different distribution than most of the other results. This makes it impossible to estimate outlier rates by simply assuming that all results come from a normal distribution.
| Method | Advantage | Disadvantage |
|---|---|---|
| Modeling | Requires fewer samples | Modeling is difficult (and time consuming); if wrong, the estimated outlier rate will be wrong |
| Counting | No modeling is required | Requires a huge number of samples |
Modeling
There are several types of modeling methods. One is to create a cause and effect or fishbone diagram of an assay and simulate assay results by selecting random observations from each assumed or observed distribution of assay variables to create an “assay result” and subtracting an assumed reference value from this result to obtain an assay error. The distribution of these differences allows one to estimate outlier rates.
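The simulation approach described above can be sketched as follows. This is only an illustration: the three error sources, their distributions, the reference value of 100, and the outlier limit of 15 are all hypothetical choices, not values from any real assay. The rare interference term stands in for an error source that comes from a different distribution than the bulk of the results.

```python
import random

def simulate_errors(n, interference_prob, seed=0):
    """Simulate n assay errors from hypothetical error sources."""
    rng = random.Random(seed)
    reference = 100.0  # assumed reference value (hypothetical units)
    errors = []
    for _ in range(n):
        imprecision = rng.gauss(0.0, 2.0)      # hypothetical random error
        calibration = rng.uniform(-1.0, 1.0)   # hypothetical calibration error
        # Hypothetical rare interference that adds a large positive error.
        interference = 25.0 if rng.random() < interference_prob else 0.0
        result = reference + imprecision + calibration + interference
        errors.append(result - reference)      # assay error = result - reference
    return errors

def outlier_rate(errors, limit):
    """Fraction of simulated errors whose magnitude exceeds the outlier limit."""
    return sum(abs(e) > limit for e in errors) / len(errors)

if __name__ == "__main__":
    errors = simulate_errors(100_000, interference_prob=0.001)
    print("estimated outlier rate:", outlier_rate(errors, limit=15.0))
```

In this sketch nearly all simulated outliers come from the interference term, so the estimated rate is only as good as the assumed interference probability. That is the modeling risk noted in the table above.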
The “GUM method” (Guide to the Expression of Uncertainty in Measurement) also starts with a cause and effect or fishbone diagram of an assay. In the GUM method, a mathematical model is used to link all random and systematic error sources. All systematic errors are either corrected by adjustment or can be converted into random errors when the error is unexplained. All (resulting) random errors are combined using the mathematical model, following the rules of the propagation of error, to yield a standard deviation which expresses the combined uncertainty of all error sources. A multiple of this standard deviation (the multiplier is the coverage factor) provides a range for the differences between an assay and its reference for a percentage of the population of results. By selecting a suitable multiplier, one may estimate the magnitude of this range of differences (e.g., the outlier limits) for the desired percentage of the population (e.g., the outlier rate) that corresponds to the outlier goal. A concern with use of the GUM method is that it requires modeling all known errors. If an error is unknown, it won’t be modeled and the GUM standard deviation will be underestimated (3).
An FMEA (Failure Mode and Effects Analysis) seeks to identify all possible failure modes; for those modes that are ranked as most important, mitigations are implemented to reduce risk. Thus, at the end of an FMEA, one has the potential to quantify outlier rates, although in practice in clinical chemistry final outlier risk is rarely quantified. FMEA is important because risk is assessed for non-continuous variables, such as the risk of reporting an assay value for the wrong patient.
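For reference, the usual GUM combination rule can be written as below for the simple case of uncorrelated inputs. The symbols are common GUM notation and are not taken from the text above: y is the assay result, x_i are the input quantities, f is the measurement model, c_i are sensitivity coefficients, u_c is the combined standard uncertainty, k is the coverage factor, and U is the expanded uncertainty. Correlated inputs would need additional covariance terms.

```latex
y = f(x_1, \dots, x_N), \qquad
c_i = \frac{\partial f}{\partial x_i}, \qquad
u_c(y) = \sqrt{\sum_{i=1}^{N} c_i^{2}\, u^{2}(x_i)}, \qquad
U = k\, u_c(y)
```

The interval y ± U covers a stated percentage of results only if the distribution of the combined error is known (for example, about 95% for k = 2 under approximate normality). Outlier goals correspond to much larger coverage percentages, where that distributional assumption is hardest to defend.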
Counting
In the counting method, outliers are considered as discrete events. That is, each assay result is judged independently from every other result to be either an outlier or not, based on the magnitude of the difference between the result and reference. Of course, the choice of reference method is important. If the reference method is not a true reference method but a comparison method (another field method), then there is no way to know whether a large difference that is being called an outlier is due to the new method or the existing method.
The rate of outliers is simply the number of outliers found divided by the total number of samples assayed, converted to a percent.
Outlier rate = (x/n) * 100
where x = the number of outliers found
n = the total number of samples assayed
This rate is only an estimate because it is based on a sample. Hahn and Meeker present a method to account for this uncertainty (4). The following table shows, for various numbers of total observations and outliers found, the maximum percentage outlier rate at a stated level of confidence. This gives one an idea of the sample sizes required to prove a maximum outlier rate.
| Sample size | Number of outliers found | Maximum outlier rate, % (95% confidence) | Maximum outlier rate, % (99% confidence) | Maximum outlier rate, ppm (95%) | Maximum outlier rate, ppm (99%) |
|---|---|---|---|---|---|
| 10 | 0 | 25.9 | 36.9 | 259,000 | 369,000 |
| 100 | 0 | 3.0 | 4.5 | 30,000 | 45,000 |
| 1,000 | 0 | 0.3 | 0.5 | 3,000 | 5,000 |
| 1,000 | 1 | 0.5 | 0.7 | 5,000 | 7,000 |
| 10,000 | 0 | 0.03 | 0.05 | 300 | 500 |
| 10,000 | 1 | 0.05 | 0.07 | 500 | 700 |
| 10,000 | 10 | 0.2 | 0.2 | 2,000 | 2,000 |
| 881,000 | 0 | 3.40037E-04 | 5.23E-04 | 3.4 | 5.2 |

The last entry (881,000 samples, no outliers) is a “six sigma” process.
Understanding the table entries
Using the third row as an example, 1,000 samples have been run and no outliers have been found. The estimated outlier rate is zero. However, this is only a sample and subject to sampling variation. Using properties of the binomial distribution allows one to state with 95% confidence that there could be no more than 0.3% outliers for the true rate. This is equivalent to saying that in 1,000,000 samples there could be no more than 3,000 outliers.
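The binomial reasoning above can be sketched in code. This sketch assumes that the table entries are one-sided exact binomial upper confidence bounds, which reproduces the rows checked in the usage below; it is not necessarily the exact procedure printed in Hahn and Meeker. The function names are illustrative.

```python
import math

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), with terms computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if x >= n else 0.0
    total = 0.0
    for k in range(x + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def max_outlier_rate(x, n, confidence=0.95):
    """One-sided upper confidence bound on the true outlier proportion.

    Finds, by bisection, the largest proportion p for which observing
    x or fewer outliers in n samples still has probability 1 - confidence.
    """
    if x >= n:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if binom_cdf(x, n, mid) > alpha:
            lo = mid   # p is still plausible, so the bound lies higher
        else:
            hi = mid
    return hi

if __name__ == "__main__":
    for n, x in [(10, 0), (1000, 0), (1000, 1), (881000, 0)]:
        p95 = max_outlier_rate(x, n, 0.95)
        print(n, x, "max rate %:", p95 * 100, "ppm:", p95 * 1e6)
```

When no outliers are found, the bound has the closed form 1 - (1 - confidence)^(1/n), which for 1,000 samples at 95% confidence is about 0.3%.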
“Six sigma” and outliers
The popular six sigma paradigm assumes that if one has a process with a 1.5 standard deviation shift and variation of 6 standard deviations, the number of defects will be 3.4 per million. Defects per million for 1 to 6 sigma are shown in the following table.
| Sigma level (SL) | NORMSDIST(SL) | NORMSDIST(SL+1.5) | NORMSDIST(1.5-SL) | Prob. good | Prob. defect | Defects per million |
|---|---|---|---|---|---|---|
| 1 | 0.84134474 | 0.99379 | 0.691462 | 0.302328 | 0.697672 | 697672.1 |
| 2 | 0.977249938 | 0.999767 | 0.308538 | 0.69123 | 0.30877 | 308770.2 |
| 3 | 0.998650033 | 0.999997 | 0.066807 | 0.933189 | 0.066811 | 66810.6 |
| 4 | 0.999968314 | 1 | 0.00621 | 0.99379 | 0.00621 | 6209.7 |
| 5 | 0.999999713 | 1 | 0.000233 | 0.999767 | 0.000233 | 232.7 |
| 6 | 0.999999999 | 1 | 3.4E-06 | 0.999997 | 3.4E-06 | 3.4 |
These results assume a normal distribution. In a diagnostic assay, it would be difficult if not impossible to prove that all results are normally distributed. However, the bottom entry of the outlier rate table in the counting section (881,000 samples with no outliers) corresponds to a six sigma process of 3.4 defects per million.
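The defects-per-million column of the six sigma table can be reproduced with a short calculation. The sketch below assumes that the probability of a good result is NORMSDIST(SL + 1.5) - NORMSDIST(1.5 - SL), which matches the tabulated values; the function names are illustrative.

```python
import math

def phi(z):
    """Standard normal CDF, the equivalent of spreadsheet NORMSDIST(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def defects_per_million(sigma_level, shift=1.5):
    """Defects per million for limits at +/- sigma_level and a shifted mean.

    Prob. defect = 1 - [phi(SL + shift) - phi(shift - SL)], written here as
    the sum of the two tail probabilities so small values keep precision.
    """
    p_defect = phi(shift - sigma_level) + phi(-sigma_level - shift)
    return p_defect * 1e6

if __name__ == "__main__":
    for sl in range(1, 7):
        print(sl, round(defects_per_million(sl), 1))
```

As with the table, the result depends entirely on the normality assumption, particularly in the far tails.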
Discussion
Laboratories are not going to run 10,000 samples (nor should they) to prove that there are no outliers. Unfortunately, there are proposals to get laboratories to perform a limited type of GUM modeling which is totally inadequate and would prove nothing (3). Manufacturers could (and do) run large numbers of samples during assay development but don’t want to include estimation of outlier rates in their product labeling.
Thus, outliers remain an ignored topic and only surface when they cause problems. One possible remedy would be a uniform way for manufacturers to report outlier studies as part of their product labeling.
References
1. Cole LA, Rinne KM, Shahabi S, Omrani A. False-positive hCG assay results leading to unnecessary surgery and chemotherapy and needless occurrences of diabetes and coma. Clin Chem 1999;45:313-314.
2. National Committee for Clinical Laboratory Standards. Estimation of total analytical error for clinical laboratory methods; approved guideline. NCCLS document EP21-A. Villanova, PA: NCCLS, 2003.
3. Krouwer JS. Critique of the Guide to the Expression of Uncertainty in Measurement method of estimating and reporting uncertainty in diagnostic assays. Clin Chem 2003;49:1818-1821.
4. Hahn GJ, Meeker WQ. Statistical intervals: a guide for practitioners. New York: Wiley, 1991, p. 104.