Proficiency testing and six sigma metrics as a measure of analytical quality in laboratory testing

Proficiency testing has long been used to assess the analytical quality of laboratory testing. Hospital laboratories report results for several proficiency testing samples each year, and these results are used to assess an assay's analytical performance.

Much has been said about using six sigma calculations as a measure of quality. A recent paper by Westgard and Westgard (1) discusses the analytical quality of laboratory testing based on proficiency testing. Here are some problems with proficiency testing in general and with this use of six sigma in particular.

Proficiency testing data processing rules often throw out outliers – That is, computer program rules are set up to automatically delete data (see outliers in proficiency testing essay). The rationale often used is that there must have been a sample mix-up (or something similar that is not part of the analytical process), and it's OK to delete these values because a sample mix-up is a pre-analytical error and proficiency testing is supposed to inform about analytical quality. But what if the outlier result came from the analytical process? How does one really know? With the data coming from thousands of hospital laboratories, there is no practical way to find out.

Proficiency testing misses problems from interferences as well as shorter-term random errors – As part of a method evaluation recommendation, I looked at a year's worth of performance complaints that appeared in Clinical Chemistry (2). Most complaints (71%) were about interfering substances, a type of analytical error that would likely be missed in proficiency testing, because an interfering substance is typically present only in certain patient samples and not in the proficiency material. In fact, any shorter-term error source would likely be missed in a proficiency testing program (see equivalent QC essay). So what is being measured is not the whole analytical process; it's a subset of it. How serious is this? It can be very serious (3). To measure potential problems from interfering substances, one must conduct a method comparison using patient samples.
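Because an interference usually affects only the patient samples that contain the interferent, it tends to show up in a method comparison as a few large individual discrepancies rather than as a shift in the average. Here is a minimal sketch of screening paired patient results for such discrepancies. It assumes each sample was run on both the candidate method and a comparison method; the `PairedResult` type, the `flag_discrepant` function, and the limits in the example are hypothetical illustrations, not a published protocol.

```python
from dataclasses import dataclass


@dataclass
class PairedResult:
    sample_id: str
    candidate: float   # result from the method being evaluated
    comparison: float  # result from the comparison method, same patient sample


def flag_discrepant(pairs, abs_limit, pct_limit):
    """Return (sample_id, difference) for every sample whose difference exceeds
    max(abs_limit, pct_limit % of the comparison result).
    The limits are hypothetical placeholders, not a recommended goal."""
    flagged = []
    for p in pairs:
        diff = p.candidate - p.comparison
        limit = max(abs_limit, pct_limit / 100 * abs(p.comparison))
        if abs(diff) > limit:
            flagged.append((p.sample_id, diff))
    return flagged


# Hypothetical data: S2 might be a sample containing an interfering substance.
pairs = [PairedResult("S1", 100, 98), PairedResult("S2", 250, 180)]
print(flag_discrepant(pairs, abs_limit=6, pct_limit=10))
```

Each flagged sample is then a candidate for follow-up (for example, checking the patient's medications or testing for a known interferent), which is information a proficiency testing summary cannot provide.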

Westgard's six sigma calculations are based on a model – This model (1) assumes that the data are normally distributed (see using the wrong model essay). But what if they aren't? I've looked at thousands of datasets. Some are normally distributed, some aren't. This is important because the actual distribution could contain more data in one or both tails than a normal distribution would (an example is the lognormal distribution). This would mean more defects and a lower six sigma result than an equivalent normal distribution.

The error goals are not severity based – The goals used in reference 1 are CLIA based. As an example, the CLIA goal for glucose is 6 mg/dL or 10%, whichever is greater. This ignores the concept that different errors have different severity, as expressed in a Clarke (or Parkes) error grid. So the whole FMEA (failure mode and effects analysis) concept of ranking errors by severity is lost, since all errors are treated the same.
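The pass/fail nature of such a goal can be sketched as follows, using the glucose rule as quoted above (6 mg/dL or 10%, whichever is greater). The function names are hypothetical. The point is that, at a target of 100 mg/dL, a result that is 11 mg/dL off and one that is 100 mg/dL off receive the same verdict.

```python
def clia_glucose_limit(target_mg_dl: float) -> float:
    """Allowable error for glucose as quoted in the text:
    6 mg/dL or 10% of the target, whichever is greater."""
    return max(6.0, target_mg_dl / 10)


def clia_pass(result_mg_dl: float, target_mg_dl: float) -> bool:
    """Single pass/fail judgment; how far a failing result is off is not recorded."""
    return abs(result_mg_dl - target_mg_dl) <= clia_glucose_limit(target_mg_dl)


# Two failing results with very different potential clinical consequences
# get exactly the same outcome under a single limit.
for result in (111.0, 200.0):
    print(result, clia_pass(result, 100.0))
```

An error grid replaces the single boolean with a zone label, so the second result could be ranked as far more severe than the first.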

Some recommendations (some of these recommendations will appear in references 4 and 5).

Outliers – If there is no practical way to investigate potential outliers, then report the data two ways – one with all data, the other without data declared to be outliers.
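Reporting both ways is simple once the outlier rule is written down explicitly. In this minimal sketch the outlier rule is passed in as a function, since each program uses its own; the ±30 mg/dL window around the target and the data values in the usage lines are hypothetical.

```python
from statistics import mean, stdev


def summarize(values):
    """Count, mean and SD of a list of proficiency results (needs at least two)."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}


def report_two_ways(values, is_outlier):
    """Summarize all data and, separately, the data left after removing
    the results that the program's outlier rule (is_outlier) flags."""
    kept = [v for v in values if not is_outlier(v)]
    return {"all data": summarize(values), "outliers removed": summarize(kept)}


# Hypothetical results for one proficiency sample with a target of 200 mg/dL.
results = [198, 201, 200, 199, 202, 260]
report = report_two_ways(results, lambda v: abs(v - 200) > 30)
for label, stats in report.items():
    print(label, stats)
```

Publishing both summaries lets the reader see how much the conclusion depends on the deleted data, without having to decide whether each outlier was analytical or not.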

Goals – Use error grids to divide errors into severity categories. This means that error grids need to be developed for analytes other than glucose. The concentrations of the proficiency testing samples also have to be chosen carefully with respect to the zones of the error grid.

Six sigma estimates – Don’t use models – simply count the data in each zone of the relevant error grid. For example, in reference 1 there are 9,258 laboratories that reported data for cholesterol. If each lab submitted three specimens per year, there are 27,774 data points. That’s plenty to get accurate estimates by counting.
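Counting needs no distributional model. The sketch below assigns each result to a zone and reports the fraction of results in each zone. The three zones and their percent-error limits are hypothetical placeholders, not the Clarke or Parkes grid; a real application would substitute the boundaries of the relevant error grid, which for glucose depend on both the reference and the measured value rather than on percent error alone.

```python
from collections import Counter

# Hypothetical zone limits in percent error; NOT a published error grid.
ZONE_LIMITS = [("A", 10.0), ("B", 20.0)]  # errors beyond the last limit fall in "C"
ZONE_NAMES = ("A", "B", "C")


def zone_of(result: float, target: float) -> str:
    """Assign one result to a zone by its percent error from the target."""
    pct_error = 100 * abs(result - target) / target
    for name, limit in ZONE_LIMITS:
        if pct_error <= limit:
            return name
    return "C"


def zone_rates(pairs):
    """Fraction of (result, target) pairs in each zone, by simple counting."""
    counts = Counter(zone_of(r, t) for r, t in pairs)
    n = len(pairs)
    return {name: counts[name] / n for name in ZONE_NAMES}


# 9,258 laboratories x 3 specimens per year, as in the text.
print(9_258 * 3)
print(zone_rates([(105, 100), (115, 100), (150, 100), (101, 100)]))
```

Treating results as independent, the standard error of a counted proportion p is sqrt(p(1 − p)/n); with n = 27,774 even a zone rate of 1% is estimated to within about ±0.12 percentage points (two standard errors).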

Estimating a subset of analytical performance – There’s no solution to this problem in proficiency testing. One should not imply that all of analytical performance is being measured by proficiency testing.

How much quality control is required – See the more QC essay. All the problems in proficiency testing listed above also apply to quality control testing. Thus, one should not imply that all of analytical performance is being measured by quality control.


  1. Westgard JO, Westgard SA. The quality of laboratory testing today. Am J Clin Pathol 2006;125:343-354.
  2. Krouwer JS. Estimating Total Analytical Error and Its Sources: Techniques to Improve Method Evaluation. Arch Pathol Lab Med 1992;116:726-731.
  3. Cole LA, Rinne KM, Shahabi S, Omrani A. False positive hCG assay results leading to unnecessary surgery and chemotherapy and needless occurrences of diabetes and coma. Clin Chem 1999;45:313-314.
  4. Krouwer JS. Uncertainty intervals based on deleting data are not useful. Clin Chem 2006;52:1204-1205.
  5. Krouwer JS. Recommendation to treat continuous variable errors like attribute errors. Clin Chem Lab Med 2006;44:797-798.
