“I so anoint myself”

June 29, 2006

“The following list presents 10 persons who have made a significant impact on the IVD industry.” This is how the magazine IVD Technology begins its article, which then gives a short description of each of the 10 people (1). Two of the people listed in the top 10 happen to be on the editorial advisory board of IVD Technology (2). Hmmm…..

About half of the editorial advisory board are in regulatory affairs, and four of the top 10 are also in regulatory affairs (including the two above). In case you’re wondering, Leonard Skeggs, the inventor of the AutoAnalyzer, didn’t make the list! OK, to be fair, the text also says “Efforts were made to ensure that this list reflects contributions in both the regulatory and scientific areas,” but the title and first sentence are misleading.


  1. Top 10 Persons in the IVD Industry IVD Technology April 2005, see http://www.devicelink.com/ivdt/archive/05/04/002.html
  2. See, http://www.devicelink.com/ivdt/eab.html


“No reaction”

June 29, 2006

In November of 1998, I was invited to attend the chairholder’s council of NCCLS (now called CLSI). This is a meeting of the leaders of the committees that produce clinical laboratory standards. During the meeting, NCCLS started a quality initiative kicked off with a keynote speech and rationale for the program by David Nevalainen, listed at the time as from the Abbott Quality Institute (1). He presented a quality system quite similar to ISO 9000. I commented at the presentation that in my experience, ISO 9000 (upon which the NCCLS quality system is based) has had virtually no impact on quality in industry. (I believe this is still true). There was no reaction to my comment.

One year later, I was attending the November 1999 chairholder’s council. In the lobby of the hotel, I was reading the Wall Street Journal when I noticed that one of the top stories was about an FDA fine for Abbott quality problems. The fine was 100 million dollars, and Abbott was ordered to stop selling certain assays (2). When I tried to point out to NCCLS senior management the connection among the NCCLS quality system, Nevalainen, and Abbott, I got no reaction.


  1. See, for example: http://arpa.allenpress.com/arpaonline/?request=get-document&doi=10.1043%2F0003-9985(1999)123%3C0566:TQSA%3E2.0.CO%3B2
  2. Abbott to pay $100 million in fine to U.S. The Wall Street Journal, November 3, 1999.


“If it isn’t in ISO, it doesn’t exist”

June 29, 2006

There is a CLSI subcommittee that deals with risk management. One of the European participants had trouble with the word “mitigation,” as in the term “risk mitigation.” It was pointed out that the ISO standard on risk management, ISO 14971, does not contain the term “risk mitigation,” primarily because of translation difficulties, and that therefore the CLSI standard should not use this term.

Now this translation problem baffles me, as ISO standards are in English. Moreover, if one does a Google search for “risk mitigation,” one gets over 4 million hits.


Medical diagnostics industry participates in fake news

June 23, 2006

You may (or may not) be aware that some news stories aired by television news stations are provided by companies and the news station fails to disclose this. Hence, this is often referred to as “fake news”. The medical diagnostic industry participates in fake news. For a segment on allergy testing provided by Quest Diagnostics and aired by KABC-7 (Los Angeles), go here.

Why Bland Altman plots should use X, not (X+Y)/2 when X is a reference method

June 18, 2006

This essay has been published in Statistics in Medicine. (Jan S. Krouwer: Why Bland-Altman plots should use X, not (Y+X)/2 when X is a reference method. Statistics in Medicine, 2008;27:778-780). It is no longer available on this web site.

The Excel simulation file is still available below. 


More on GUM

June 17, 2006

I have critiqued the use of GUM (Guide to the expression of uncertainty in measurement) for commercial diagnostic assays (1) and also commented on a Letter about GUM (2).

To review why I don’t favor the use of GUM for commercial diagnostic assays:

  • GUM is an extremely complicated modeling method relative to the capabilities of most clinical laboratories
    • This leads to “simplified” versions of GUM for clinical laboratories, which are completely inadequate (3). Whereas one can argue that these simplified methods aren’t GUM, they may nevertheless be claimed as such.
  • GUM requirements won’t be met in many cases. For example:
    • many assays don’t meet the definition of a well-defined physical quantity
    • one must correct known errors, which is impractical if not impossible for users of commercial diagnostic assays: users would have to know what the errors are and how to fix them, and many assays do have problems (although most results are within medically acceptable limits)
  • GUM typically estimates the 95% limits of the error distribution. Whereas this is useful information, GUM provides no information about the remaining 5% of errors – note that the assumption that all data are Normal, or have been transformed to Normality, is a big stretch.
    • This focus on 95% of the error distribution goes against the patient safety movement’s focus on the largest errors (e.g., the remaining 5%).
  • GUM is unnecessary, as one can simply count errors in various severity categories to get rates, without complicated modeling based on assumptions that may be wrong.
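The last point can be made concrete with a short sketch (the data and limits below are hypothetical, chosen only for illustration): rather than modeling a 95% interval, one simply counts errors in each severity category and reports the observed rates.

```python
# Hypothetical illustration: classify assay errors by severity and report
# rates directly, rather than modeling a 95% uncertainty interval.

def error_rates(differences, minor_limit, major_limit):
    """Count errors in severity categories and return observed rates.

    differences  -- observed (measured - reference) values
    minor_limit  -- errors at or below this size are acceptable
    major_limit  -- errors above this size are potentially dangerous
    """
    n = len(differences)
    counts = {"acceptable": 0, "minor": 0, "major": 0}
    for d in differences:
        size = abs(d)
        if size <= minor_limit:
            counts["acceptable"] += 1
        elif size <= major_limit:
            counts["minor"] += 1
        else:
            counts["major"] += 1
    return {category: count / n for category, count in counts.items()}

# Hypothetical data: most errors are small, with one large "blunder".
diffs = [0.1, -0.3, 0.2, 0.0, -0.1, 0.4, -0.2, 0.1, 5.0, 0.3]
rates = error_rates(diffs, minor_limit=0.5, major_limit=2.0)
print(rates)  # the single large error shows up as a 10% "major" rate
```

No distributional assumption is needed; the large error is counted, not hidden in the tail beyond a 95% interval.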

Having said all this, I am still onboard for use of GUM for reference materials.

This essay is about another GUM article for which I published a Letter (4), which prompted a reply from the authors (5). Their article was about use of GUM for serological assays (6). What follows was sent as an eLetter to Clinical Chemistry.

I appreciate the response by Dr. Dimech and understand that analyzing real data is never easy. Of course, I was unaware of Dr. Dimech’s explanation – I can only react to the words in the paper, not material that is omitted for whatever reason – thus my Letter.

Here is my response to Dr. Dimech’s reply to my Letter combined with his original paper.

Right after the statement to exclude outliers comes the advice:  “It is suggested that results reported by each laboratory are checked for normality by use of a bar graph (See Fig. 1 in the online Data Supplement) or a statistical method such as Grubbs test.”

Normality is usually assessed graphically with histograms and/or normal probability plots, not bar graphs. Grubbs’ test is not a test for normality – it is a test for outliers, and it requires normal data! Statistical tests for normality include the Shapiro-Wilk, Kolmogorov-Smirnov, and Anderson-Darling tests.
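To illustrate the distinction (with hypothetical data), here is a minimal sketch of the Grubbs statistic; note that it hunts a single outlier and itself assumes the remaining data are normal, so it cannot serve as a normality check.

```python
import math

def grubbs_statistic(values):
    """Grubbs' test statistic: the largest absolute deviation from the mean,
    in units of the sample standard deviation. It looks for a single outlier
    and assumes the rest of the data are normal -- it is not a normality test."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return max(abs(v - mean) for v in values) / sd

# Hypothetical replicate data with one gross error:
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 25.0]
g = grubbs_statistic(data)
# g is compared against a tabulated critical value (derived from the t
# distribution). The statistic can never exceed (n - 1) / sqrt(n), so a
# value near that bound, as here, indicates an extreme outlier.
```

The statistic says nothing about whether the bulk of the data are Normal; that is the job of tests like Shapiro-Wilk.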

Perhaps more importantly, consider the authors’ first sentence in the paper: “Most regulatory authorities that use International Organization for Standardization (ISO) Standards to assess laboratory competence require an estimate of the uncertainty of measurement (MU) of assay test results.”

At best, this sentence is ambiguous. Perhaps the authors mean that an uncertainty interval is one component of laboratory competence, but one could also interpret the sentence as equating an uncertainty interval with laboratory competence, even though to a clinician, laboratory competence would suggest an acceptable rate of errors from all sources.

In the case of laboratory data, the distribution of errors can be of any shape and can contain large errors, which may or may not be detached from the rest of the error distribution. To a clinician, wrong answers are dangerous regardless of their source, so blunders such as typographical errors are part of the population of interest to a clinician. Now, for certain purposes, one can define a subset of the population of errors that contains only analytical error sources and excludes pre- and post-analytical error sources. However, this subset can quickly be confused with the total population, and the first sentence of this paper will add to this confusion.


  1. Krouwer JS. Critique of the Guide to the Expression of Uncertainty in Measurement Method of Estimating and Reporting Uncertainty in Diagnostic Assays. Clin Chem 2003;49:1818-1821.
  2. Stöckl D, Van Uytfanghe K, Rodríguez Cabaleiro D, Thienpont LM, Patriarca M, Castelli M, Corsetti F, Menditto A. Calculation of Measurement Uncertainty in Clinical Chemistry. Clin Chem 2005;51:276-277.
  3. White GH, Farrance I. Uncertainty of Measurement in Quantitative Medical Testing: A Laboratory Implementation Guide. Clin Biochem Rev 2004;25:Supplement ii,S1-S24. Available at http://www.aacb.asn.au/pubs/Uncertainty%20of%20measurement.pdf
  4. Krouwer JS. Uncertainty Intervals Based on Deleting Data Are Not Useful. Clin Chem 2006;52:1204-1205.
  5. Dimech W. Uncertainty Intervals Based on Deleting Data Are Not Useful: Reply. Clin Chem 2006;52:1205.
  6. Dimech W, Francis B, Kox J, Roberts G. Calculating uncertainty of measurement for serology assays by use of precision and bias. Clin Chem 2006;52:526-529.

Detection Systems – Fault Isolation, Automation, and Diagnostic Accuracy – 6/2006

June 12, 2006

First, a quick review

A clinical laboratory’s product is the report provided to clinicians, whose main element is the assay result. The result needs to be as error free as possible to prevent harm to patients. Assay performance goals can be expressed in terms of error grids such as are available for glucose. It is helpful to conceptualize clinical laboratory errors in terms of a fault tree or FMEA. The top level error one wants to prevent is providing an incorrect result to a clinician.

Another possible top level error is delay in the reporting of a result – to keep things simple that is not considered here, but could also lead to patient harm.

This top level error is the “effect” of many possible lower level errors (e.g., causes). In order to prevent the top level error, the clinical laboratory’s quality program tries to address lower level errors either by

  • preventing errors or
  • detecting and recovering from errors.

Note that detection without recovery is not useful and that these are two (separate) steps.

The use of quality control

Quality control is a means of detecting errors. The recovery part of quality control is simple: after a failed quality control result is observed, no patient results are reported since the last successful quality control. This raises an immediate concern about the CMS proposal to allow quality control to be run once a month, as this makes recovery rather useless – all of the potentially incorrect patient results will already have been reported to clinicians. To summarize, quality control detects lower level errors and prevents the effect of these errors. In this way, it blocks the error cascade expressed by a fault tree or FMEA.

There is another task that clinical laboratories must perform after a failed quality control: determining why the quality control failed, so as to correct the problem. This is where fault isolation plays a role.

Fault Isolation – Why it’s important

Fault isolation, when present, means that the detection system points to a single root cause for the failure. To see why this is important, consider the following case, in which incorrect results are generated by an assay system because of reagent degradation caused by the reagent being stored above its maximum allowable storage temperature. To prevent this error, one would use training and perhaps redundant refrigeration systems. In addition, consider two different detection systems to deal with this failure.

Fault isolation absent

Quality Control – The bad reagent can lead to a failed QC. Since failed QC can be caused by many factors, there is no fault isolation. So one must follow a troubleshooting protocol to determine the root cause of failed QC. This troubleshooting ensures that the next set of results will not fail QC – at least not for that root cause!

Fault isolation present

Temperature Sensor on Reagent – A sensor on the reagent box that indicates storage at too high a temperature by a color change does have fault isolation. Of course, this relies on another detection step, where one looks at the temperature sensor.


Ideally, one would like all detection systems to have fault isolation, since no troubleshooting is required, which returns the system to an error-free state more quickly. But to design in detection systems with fault isolation for all errors, one must have complete knowledge of all the ways a system can fail.

For the reasons this knowledge is often lacking, see the AACC expert session.

The value of quality control is that in many cases it detects errors, even though no one (the clinical laboratory or the manufacturer) has knowledge that such an error may occur. The disadvantage of quality control is that there is no fault isolation and a corrective action could involve a substantial amount of work. When this corrective action occurs before product release, it is simply part of product development, but when it occurs after product release in a clinical laboratory, it is also product development but conducted in part by the clinical laboratory.

Automated detection recovery systems:

Automated detection recovery systems are desirable and are prevalent on instrument systems. As an example, a sample’s response curve is evaluated by an algorithm. The algorithm can detect whether the response is too noisy and, if so, signal the analyzer to suppress reporting that result (e.g., the recovery). Note that both the temperature sensor detection system above and quality control are manual detection recovery systems.

There is no guarantee that an automated detection recovery system has fault isolation. In the noisy response example, there is no indication of what is causing the noise. For example, it could be a lipemic specimen or alternatively a dirty reaction chamber.

Diagnostic accuracy

The final dimension in this essay is the diagnostic accuracy of the detection system. This was also covered in the AACC expert session and relates to the number of false positives and false negatives that occur in the detection process.
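As a sketch with hypothetical counts, a detection system’s diagnostic accuracy can be summarized from a 2x2 table of detection outcomes versus true fault states:

```python
def detection_accuracy(tp, fp, fn, tn):
    """Summarize a detection system's diagnostic accuracy.

    tp -- real faults flagged       fp -- good runs falsely flagged
    fn -- real faults missed        tn -- good runs correctly passed
    """
    return {
        "sensitivity": tp / (tp + fn),          # fraction of real faults caught
        "false_positive_rate": fp / (fp + tn),  # fraction of good runs flagged
    }

# Hypothetical counts for a QC rule evaluated over many runs:
summary = detection_accuracy(tp=45, fp=20, fn=5, tn=930)
# sensitivity = 0.9; false positive rate is about 0.021
```

A false negative means an error reaches the clinician; a false positive means wasted troubleshooting. Both matter when judging a detection system.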

Final Summary

With sufficient knowledge, one would either design a system without errors or employ detection systems for all possible failures. However, one does not have this knowledge. Good detection systems have high diagnostic accuracy, are automated, and have fault isolation. The value of quality control is that in spite of not having fault isolation or being automated, it can catch errors that are missed by detection systems.

‘Sick’ Sigma and zero defects

June 11, 2006

Two recent articles in Quality Digest received a lot of comments. The articles are here:

Sick Sigma

Zero Defects

Here are my comments (1).

Re: Sick Sigma

Dr. Burns questions the origin of the 1.5 sigma drift that is part of a six sigma process and pretty much implies that having a 1.5 sigma bias is intended as part of six sigma. Does he really believe that someone will build a 1.5 sigma bias into their product to conform to six sigma? He then goes on to talk about control limits. This is irrelevant with respect to attribute defects in many cases, such as medical errors. One can have attribute control limits, say to control the proportion of bad pixels in an LCD screen. But for wrong site surgery, control limits make no sense, since the allowable number of defects is zero. Yet one can still count defects, relate them to six sigma terms, and improve the process until the observed defect rate is zero.


Dr Tony Burns responds

Unfortunately, there are hundreds of thousands of people who do build in the 1.5 sigma bias. There are millions of people using 3.4 ppm based on the 1.5 drift. A quick Google search revealed a quarter of a million sites promoting averages unavoidably drifting by 1.5 sigma. It all started with a theoretical error by Mikel Harry that no one bothered to check. The erroneous theoretical drift over 24 hours then became an empirical “long term” drift. The 1.5 drift is nonsense. It has set the quality world back by many years. World class quality can only be “on target with minimum variation”.

You are mistaken. Control limits do apply to attribute data. Attribute control charts are of 4 main types, “pn”, “p”, “c”, “u”. I suggest that you read any basic text on Statistical Process Control (SPC), such as “Analysis of Control and Variation” by John McConnell. Before speaking to your clients, I would strongly suggest reading more advanced texts such as Don Wheeler’s “Understanding SPC” and “Advanced Topics in SPC”. In the example that you quote, either a pn, p or an XmR chart might be used, depending on the details of the situation. The texts mentioned above will describe how to calculate and draw the appropriate control limits.

Jan replies

Thanks for your reply.

 As for people actually building in the bias, this is not my experience. I’m not sure how you can come to your conclusion from a Google search.

Yes, you are right about control charts for attribute data. Perhaps I got carried away. But my point was based on reading your article. Let me state it differently. If one is producing LCD screens, one can set up attribute charts to control the number (or proportion) of bad pixels in a screen. But in my area of interest – medical errors – one does not do this. For example, the proportion of wrong site surgeries has been estimated at 0.00085%. This is not an acceptable rate (the acceptable rate is of course zero), so there is no control chart that one can set up, as one does not wish to control to an acceptable level of defects (e.g., >0). One continues to measure the rate and improve the process until one is observing a rate of zero. (After which, one still measures but does not change the process.) Six Sigma is sometimes used as a benchmark in medical error opinion articles. That is, one would rather have a six sigma than a three sigma process, since fewer medical errors are implied. But for serious medical errors, a six sigma process is unacceptable.

As it turns out, I am not a fan of Six Sigma, and I am suspicious of all of these people who have no experience analyzing data suddenly becoming experts (black belts).

Dr Tony Burns responds

Regarding six sigma’s 1.5 sigma bias, perhaps I should explain further.  Anyone who is using six sigma tables to calculate a “sigma level” for a process, is making the assumption of a 1.5 sigma bias.  Anyone quoting 3.4 DPMO has assumed the erroneous 1.5 sigma drift in averages.  Not only is the bias fallacious, but the assumption of process normality used in six sigma tables is grossly in error, as is using counts at the extreme tail of any distribution as an estimator of the distribution’s dispersion (sigma).  I have drafted a second paper “Tail Wagging It’s Dog” that has been submitted to Quality Digest, which describes the latter in more detail.  You may wish to read more in the various papers referenced at our site http://www.q-skills.com/sixsigtools.htm


Attribute control charts can fortunately still come to the rescue, even with rare events such as you suggest. Chapter 11.9 “Advanced topics in SPC” gives a lovely example of how to use an XmR chart for this purpose.  Zero wrong site surgeries may be a desirable target but chaos, human error and variation will inevitably occur, even in the most ideal system.


Comparisons such as “… rather have a six sigma than a three sigma process since …” are meaningless. Six sigma relates to the specification, that is, the voice of the customer.  Three sigma relates to the voice of the process. The specification may be set at any level you wish, four, five, six, seven sigma, whatever. The voice of the process is always three sigma. 


I won’t get started on black belts.  I feel quite sorry for these unfortunate people who are grabbed from the shop floor and expected to become overnight statisticians and process magicians.  It baffles me how they can be expected to understand Student’s t and Box-Wilson experimental design when they clearly don’t understand even the meaning of sigma.

Jan replies

Well, I don’t belong to the “anyone” group, lol, cause I don’t assume a 1.5 sigma bias when I calculate DPMO, but I get it and agree with you about the origin of the 1.5 sigma bias. This has always been mysterious to me – the 1.5 bias and the Normality assumptions – because one can always calculate DPMO without knowing anything about Six Sigma (or making assumptions about the data), yet one can relate DPMO to the defect levels expressed by a table (e.g., 6 sigma = 3.4, 5 sigma = I forgot the number, etc.). So in this sense, six sigma is just a level of desirability, with six sigma = very good, five sigma = pretty good, etc.
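The table values referred to above can be reproduced from the normal tail area; this sketch assumes the conventional one-sided tail and shows the effect of the disputed 1.5 sigma shift:

```python
import math

def normal_tail(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def dpmo_from_sigma(sigma_level, shift=1.5):
    """Defects per million opportunities implied by a 'sigma level',
    using the conventional (and disputed) 1.5 sigma shift."""
    return normal_tail(sigma_level - shift) * 1_000_000

print(round(dpmo_from_sigma(6), 1))            # 3.4 -- the familiar figure
print(round(dpmo_from_sigma(6, shift=0), 3))   # about 0.001 without the shift
print(round(dpmo_from_sigma(3)))               # 66807 -- a "three sigma" process
```

Note that the familiar 3.4 DPMO exists only because of the 1.5 sigma shift; with no shift, six sigma corresponds to roughly one defect per billion.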

My other point was mainly to express the fact that one does not use quality control rules for a process whose desired defect level is zero. I realize that defects still may occur. So one uses risk analysis tools such as fault trees and calculates probability of failure events. If the probability of a failure event is low enough (the goal is never zero), then one can have both acceptable risk and zero defects (zero not theoretically, but zero for practical purposes – e.g., one failure event in the next million years).
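The fault tree arithmetic can be sketched as follows (the gate structure and probabilities are hypothetical, for illustration only):

```python
def or_gate(*probabilities):
    """Top event occurs if ANY input event occurs (independent events)."""
    p_none = 1.0
    for p in probabilities:
        p_none *= 1.0 - p
    return 1.0 - p_none

def and_gate(*probabilities):
    """Top event occurs only if ALL input events occur (independent events)."""
    product = 1.0
    for p in probabilities:
        product *= p
    return product

# Hypothetical tree: a wrong result reaches the clinician only if an error
# occurs AND the detection step misses it; either of two error sources feeds in.
p_error = or_gate(1e-4, 5e-5)   # two independent error causes
p_missed = 0.01                 # detection fails 1% of the time
p_top = and_gate(p_error, p_missed)
# p_top is about 1.5e-6: not theoretically zero, but possibly an acceptable risk
```

The point is that one manages the probability of the failure event down to an acceptably low level; the goal is never a theoretical zero.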

I will look at your web site and thanks for stimulating me to think about things more.

Dr Tony Burns responds  

Being able to compare processes by quoting a “sigma level” is appealing to management; however, it simply doesn’t work.  For example, consider comparing two processes, one of which has a histogram skewed to the left and the other to the right. They may both appear “very good” under an assumption of normality; however, one might be far more poorly controlled than the other.  The situation is even worse because defect counts and “sigma levels” give no indication of process capability.  A process can only be “capable” (of producing product or service within specification), that is, “very good”, if it is “in control”.  If the process average is drifting, as assumed in six sigma, the process is out of control and therefore unpredictable.  An unpredictable process is certainly not a “very good” thing.  Only control charts and histograms can give this information.


Thank you for your final comment.  My aim is to encourage people to question the accepted norms (no pun intended) rather than to accept them blindly.

Re: Zero Defects

Mr. Crosby would have us believe that a zero defects program is inexpensive, especially compared to six sigma. Well, I remember spending 5 days in Corning, NY as part of Total Quality training (which included zero defects, a la Crosby). Everyone in our company (Corning Medical, Medfield, MA) received training in quality. The cost of this program across Corning must have been substantial.

Mr. Crosby also talks about the zero defects concept as “Work right the first time and every time,” with the performance standard that “No defects are acceptable.” This gives one the impression that without this program, inept engineers and production staff are creating poor quality products, and if only they had this quality training … . Well, things aren’t that simple. The number of defects in a design relates to the state of knowledge of the technology. No defects are possible when the state of knowledge is high. However, for many systems, the state of knowledge is not high enough to design a product with no defects, and a common and efficient development process is to go through a test and improvement loop until the number of problems reaches an acceptable level – and this number is not zero.


Well, I realize that in my comments about “Sick Sigma” I talk about processes whose defect rate should be zero (e.g., wrong site surgery), and in the comments about “Zero Defects” I talk about processes with allowable defect rates greater than zero (e.g., the proportion of bad pixels in an LCD screen) – but that’s the real world.


  1. Sick sigma comments amended after an email from Dr. Burns, the article’s author, since I had provided comments to the journal in which his article appeared.