Blog Review

May 26, 2017

I started this blog 13 years ago, in March 2004 – the first two articles are about six sigma, here and here. This entry is my 344th.

Although the blog has an eclectic range of topics, one unifying theme for many entries is specifications, how to set them and how to evaluate them.

A few years ago, I was working on a hematology analyzer, which has a multitude of reported parameters. The company was evaluating these parameters by the usual means: precision studies and accuracy assessed by regression. I asked them:

  1. What are the limits such that, when differences from reference are contained within them, no wrong medical decisions resulting in patient harm would be made based on the reported result?
  2. What are the (wider) limits such that, when differences from reference are contained within them, no wrong medical decisions resulting in severe patient harm would be made based on the reported result?

This was a way of asking for an error grid for each parameter. I believe, then and now, that constructing an error grid is the best way to set specifications for any assay.
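As a sketch of the idea, an error grid for a single parameter reduces to nested limits around the reference value, matching the two questions above. The numeric limits below are purely hypothetical placeholders; real limits come from clinical input and may vary with concentration:

```python
# Hypothetical zone limits for one parameter -- real values would be set
# by clinicians for each assay, and may depend on the concentration.
NO_HARM_LIMIT = 0.5      # differences within +/-0.5 units: no wrong decision
SEVERE_HARM_LIMIT = 2.0  # differences within +/-2.0 units: no severe harm

def classify_difference(result, reference):
    """Classify a result's difference from reference into error-grid zones."""
    diff = abs(result - reference)
    if diff <= NO_HARM_LIMIT:
        return "no harm"
    if diff <= SEVERE_HARM_LIMIT:
        return "harm possible"
    return "severe harm possible"
```

In an evaluation, every (result, reference) pair gets classified this way, and the claim becomes a statement about how many results fall in each zone.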

As an example of the importance of specifications, there was a case in which I was an expert witness, where the lab had produced an incorrect result that led to patient harm. The lab’s defense was that it had followed all procedures; thus, as long as procedures were followed, it was not to blame. But procedures, which contain specifications, are not always adequate. As an example, remember the CMS program “equivalent quality control”?

You get what you ask for

April 13, 2007

I have written before about the difference between horizontal and vertical standards. ISO/TC212 produces standards for the clinical laboratory. The following came from a talk by Dr. Stinshoff, who has headed the ISO/TC212 effort. The red highlights are from Dr. Stinshoff.

“ISO/TC 212 Strategies:

– Select new projects using the breadth and depth of the expertise gathered in ISO/TC 212; focus on horizontal standards; address topics that are generally applicable to all IVD devices; and, limit the activities of ISO/TC 212 to a level that corresponds to the resources that are available (time and funds of the delegates).

– Assign high preference to standards for developed technologies; assign high preference to performance-oriented standards; take the potential cost of implementation of a standard into consideration; and, solicit New Work Item ideas only according to perceived needs, which should be fully explained and supported by evidence.

– Globalize regional standards that have a global impact”


What is meant by performance-oriented standards? “ISO Standardisation – Performance vs. Prescriptive Standards:

Whenever possible, requirements shall be expressed in terms of performance rather than design or descriptive characteristics. This approach leaves maximum freedom to technical development….

(Excerpt of Clause 4.2, ISO/IEC Directives, Part 2, 2004)”

So one reason ISO/TC212 produces horizontal standards is because that is their strategy.


European and US clinical laboratory quality

April 5, 2007

I am somewhat skeptical about the statement in a recent Westgard essay suggesting that Europeans who use ISO 15189 to help with accreditation are more likely to improve quality in their laboratories than US laboratories, which just try to meet minimum CLIA standards. ISO 15189 is much like ISO 9001, which is used for businesses. I have previously written that ISO 9001 certification plays no role in improving quality for diagnostic companies (1). As an example of ISO 15189 guidance – albeit in the version I have, from 2002 – under the section “Resolution of complaints,” ISO 15189 says the laboratory should have a policy and procedures for the resolution of complaints. In ISO 17025, a similar standard, virtually the identical passage occurs.

Westgard mentions that clinical laboratories need a way to estimate uncertainty that is more practical than the ISO GUM standard, and he mentions a CLSI subcommittee that is working on this. A more practical way is unlikely to emerge. I was on that subcommittee. I didn’t want to participate at first, since I don’t agree that clinical laboratories should estimate uncertainty according to GUM (2). However, the chairholder wanted me for my contrarian stance, so I joined. I must say that I enjoyed being on the subcommittee, which had a lot of smart people and an open dialog. However, I was unable to convince anyone of my point of view and therefore resigned, because it would make no sense to be both an author of this document and of reference 2. The last version of the document I saw was 80 pages long (half of it an appendix) with many equations. It will not be understood by most (any?) clinical laboratories. However, there is a CLSI document, EP21A, that allows one to estimate uncertainty intervals easily, although not according to GUM.

What is needed to improve clinical laboratory quality anywhere? Policies that emphasize measuring error rates such as FRACAS (3).


  1. Krouwer JS. ISO 9001 has had no effect on quality in the in-vitro medical diagnostics industry. Accred Qual Assur 2004;9:39-43.
  2. Krouwer JS. A critique of the GUM method of estimating and reporting uncertainty in diagnostic assays. Clin Chem 2003;49:1818-1821.
  3. Krouwer JS. Using a learning curve approach to reduce laboratory error. Accred Qual Assur 2002;7:461-467.


Better automation for clinical chemistry

March 30, 2007

I first heard Martin Hinckley speak at an AACC forum. That talk was published in Clinical Chemistry, 1997;43:873-879.

A new article is available at (I suspect this link will work for a limited time).

This article deals with automation and how it has not lived up to the expectation that it would greatly improve quality. Hinckley offers some interesting advice regarding how to improve the implementation of automation.


You’re either part of the problem or part of the solution

February 18, 2007


Westgard bemoans the current process of establishing performance claims for assays. He states that

“There is one major fault with this approach [precision, accuracy, linear range, reference range(s), etc.]. Manufacturers do not make any claim that a method or test system provides the quality needed for medical application of the test results, i.e., FDA clearance does not require a claim for quality! To do so, a manufacturer would need to state a quality specification, e.g., the allowable total error, the maximum allowable SD, or the maximum allowable bias, then demonstrate that the new method or test system has less error than specified by those allowable limits of error.”

You’re either part of the problem or part of the solution. In this case, Westgard is part of the problem. His suggestion of allowable total error as stated above sounds good, but as I have pointed out many times,

  • Westgard’s maximum allowable total error is for a specified percentage of results – often 95% – which allows for too many results to fail to meet clinical needs
  • Westgard’s suggested testing procedures as described by his quality control rules fail to include all contributions to total error

Thus, 5% of a million results means that there could be 50,000 medically unacceptable results – that’s not quality. And when one tests with control samples, one cannot detect interferences, which are often an important source of clinical laboratory error, so all of Westgard’s quality control algorithms for total error are meaningless – they inform about only a subset of total error.

Things are improving. In the FDA draft guidance for CLIA waiver applications, FDA requires use of error grids (such as those in use for glucose) and demonstration of a lack of erroneous results, as defined in those error grids, in addition to total allowable error. Many of my essays stress the need to go beyond total allowable error – as used by Westgard – and to put in place procedures to estimate erroneous results (1).


  1.  Jan S. Krouwer: Recommendation to treat continuous variable errors like attribute errors. Clin Chem Lab Med 2006;44(7):797–798.

Frequency of Medical Errors II – Where’s the Data?

May 17, 2006

In virtually any tutorial about quality improvement, one is likely to encounter something like Figure 1, which describes a “closed loop” process. The way this works is simple. One has a goal one wishes to meet, and one measures data appropriate to that goal. If the results fall short, one enters the “closed loop”: one revises the process, measures progress, and continues this cycle until the goal is met. Then one enters a different phase (not shown in Figure 1), where one ensures that the goal will continue to be met.
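The loop can be written out as a toy sketch; the names and structure here are my own rendering, not from any standard:

```python
def closed_loop(measure, revise, goal_met, max_cycles=100):
    """Figure 1 as code: measure, compare to the goal, revise the
    process, and repeat until the goal is met."""
    for _ in range(max_cycles):
        data = measure()
        if goal_met(data):
            return data        # goal met; move on to the monitoring phase
        revise(data)           # change the process and measure again
    raise RuntimeError("goal not met within max_cycles")
```

Once the loop exits, the monitoring phase (not shown in Figure 1) takes over to ensure the goal continues to be met.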

Two deficiencies in the patient safety movement are: 1) the lack of clear, quantitative goals; and 2) the lack of data from which one can measure progress. A list of problems with the way goals are often stated is available (1).

An interesting paper that appeared recently discusses wrong site surgery (2). Given the visibility of wrong site surgery, one notable aspect of this paper is that it is one of the few sources with wrong site surgery rates. The rate was 1 in 112,994, or 8.85 wrong site surgeries per million opportunities. Recall that a six sigma process has 3.4 errors per million opportunities, so this rate is about 5.8 sigma. The authors state that the rate is equivalent to an error occurring once every 5 to 10 years. This corresponds to the lowest frequency ranking in the Veterans Administration scheme, an error occurring once or less every 5 to 30 years (3).
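The sigma arithmetic can be checked in a few lines, using the conventional 1.5-sigma long-term shift that makes 3.4 defects per million correspond to six sigma:

```python
from statistics import NormalDist

def sigma_level(dpmo):
    """Convert defects per million opportunities (DPMO) to a sigma
    level, applying the conventional 1.5-sigma long-term shift."""
    return NormalDist().inv_cdf(1 - dpmo / 1_000_000) + 1.5

six_sigma_benchmark = sigma_level(3.4)        # ~6.0
wrong_site = sigma_level(1e6 / 112994)        # 8.85 DPMO -> ~5.8
```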

Another interesting aspect of the paper is the discussion of the Universal Protocol, a series of steps incorporated into the surgical process and designed to prevent wrong site surgery. One of the conclusions of the paper is that the Universal Protocol does not prevent all wrong site surgeries, even though it was implemented as the solution. The problem is that where one would hope a single process change might be sufficient to remedy an issue, often it is not. Thus, one must continue to collect data and to add remedies and/or change existing ones until the goal has been met – in other words, continue with the cycle shown in Figure 1. So one criticism of the patient safety movement is the mandated, static nature of corrective actions. The dynamic nature implied in Figure 1 seems to have been bypassed.

The authors lament that the public is likely to overreact to wrong site surgery relative to other surgical errors such as retained foreign bodies. There are several points to be made here.

In classifying the severity of an error, one must examine its effect, which means looking at the consequences of downstream events connected to the error (often facilitated by using a fault tree). Based on the authors’ discussion of actual data, retained foreign bodies are a more severe error than wrong site surgery. This is somewhat of a surprise, but is understandable.

Once one has classified all error events for criticality (severity combined with frequency of occurrence), one has the means to construct a Pareto chart. Since organizations have limited resources and cannot fix all problems, the chart guides priorities: retained foreign bodies are likely to rank higher than wrong site surgery and thus deserve more attention.

Proposed process changes need to be evaluated with respect to cost and effectiveness. The “portfolio” of proposed process changes can be viewed as a decision analysis problem, whereby the “basket” of process changes selected represents the largest cumulative reduction in medical errors (e.g., reduction in the cost associated with medical errors) for the lowest cumulative cost. See the essay on preventability.
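The ranking and selection described above can be sketched as follows. The events, scores, and costs are invented for illustration; a real analysis would use measured frequencies and modeled costs:

```python
# Hypothetical error events with a severity score, annual frequency,
# and the cost of a proposed remedy. Criticality = severity * frequency.
events = [
    {"name": "retained foreign body", "severity": 9, "freq": 12, "remedy_cost": 40},
    {"name": "wrong site surgery",    "severity": 8, "freq": 3,  "remedy_cost": 70},
    {"name": "specimen mislabeling",  "severity": 5, "freq": 50, "remedy_cost": 20},
]

def criticality(e):
    return e["severity"] * e["freq"]

# Pareto ordering: most critical events first
pareto = sorted(events, key=criticality, reverse=True)

def select_remedies(events, budget):
    """Greedy stand-in for the decision analysis: pick remedies with the
    best criticality reduction per unit cost until the budget runs out."""
    ranked = sorted(events, key=lambda e: criticality(e) / e["remedy_cost"],
                    reverse=True)
    chosen, spent = [], 0
    for e in ranked:
        if spent + e["remedy_cost"] <= budget:
            chosen.append(e["name"])
            spent += e["remedy_cost"]
    return chosen
```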

I discuss (4) a hypothetical case where two events have identical criticality with respect to patient safety but one is high profile and the other isn’t. Should the high profile event get more attention? The answer is yes, because besides patient safety, there are other error categories for which the high profile event will be more important, such as customer complaints, threat to accreditation, and threat to financial health.

There are other comments that could be made, but perhaps the most important is that studies such as this one are extremely valuable and are the heart of Figure 1: examining error events and currently implemented corrective actions, and deciding how to make further improvements.


  1. Assay Development and Evaluation: A Manufacturer’s Perspective. Jan S. Krouwer, AACC Press, Washington DC, 2002. pp 33-44.
  2. Kwaan MR, Studdert DM, Zinner MJ, Gawande AA Incidence, patterns, and prevention of wrong-site surgery. Arch Surg. 2006;141:353-7; discussion 357-8, available at
  3. Healthcare Failure Mode and Effect Analysis (HFMEA) VA National Center for Patient Safety
  4. Managing risk in hospitals using integrated Fault Trees / FMECAs. Jan S. Krouwer, AACC Press, Washington DC, 2004. pp 17-18.

Troponin I – Its Performance Doesn’t Justify Actions

October 13, 2004
  1. An article about evaluation of a troponin I assay – A recent article in Clinical Chemistry reported on the evaluation of a troponin I assay (1). The article’s conclusion was: “AccuTnI is a sensitive and precise assay for the measurement of cTnI.” Yet the article shows that the assay failed the ESC-ACC (European Society of Cardiology – American College of Cardiology) guidelines for imprecision (2). Moreover, 6% to 9% of the samples, when measured with other troponin I assays, gave values on different sides of the medical decision cutoff. As is often the case, the study was funded by the manufacturer. Note – it is still possible that this assay was the best troponin I assay on the market at that time (or today), but the conclusions presented by the authors don’t match their facts.
  2. A questionable POCT strategy – In a recent article (3), the benefit of POCT (Point-Of-Care Testing) troponin I was reported. The variables studied were clinician satisfaction (using a survey), Emergency Department Length Of Stay (ED LOS), and Turn-Around-Time (TAT). These results were also presented in a seminar – also published (4) – in which the speaker noted that the strategy for dealing with POCT troponin I results was:
  • if the POCT troponin I result was elevated, it was followed up with a lab troponin I assay
  • if the POCT troponin I result was not elevated, no follow-up assay was conducted and the POCT result was used as part of the clinician’s rationale to rule out myocardial infarction.

Problems with the clinician satisfaction survey – The clinicians were surveyed about their “confidence in the results obtained from POCT (ie, satisfaction with test accuracy of POCT)”. There are two concepts here – the performance of the assay and the perception of the assay. The survey provides information about the clinicians’ perception of the assay. It is unwise to draw conclusions about performance from a survey which the authors admit was subjective. Moreover, since the TAT was better for the POCT assay, one could argue that the clinicians were biased when answering other questions about the assay (e.g., about assay quality).

Problems with ED LOS and the assay strategy – The strategy for POCT troponin I reported above is a biased approach that could lead to a reduced ED LOS. That is, some elevated POCT troponin I results might not be confirmed, so those patients would be released sooner from the ED. Yet there is no opportunity to rerun POCT troponin I results that are below the cutoff (i.e., none of these patients stay long in the ED). One would like to know how many POCT troponin I results below the cutoff would be above the cutoff if repeated with a lab troponin I assay.
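The asymmetry is easy to see when the strategy is written out. This sketch uses hypothetical names; `lab_retest` stands for re-measuring the same specimen on the lab troponin I assay:

```python
def disposition(poct_result, cutoff, lab_retest):
    """Sketch of the reported POCT troponin I strategy. Only elevated
    POCT results are confirmed; below-cutoff results are accepted
    as-is, so a falsely low POCT result is never caught."""
    if poct_result >= cutoff:
        lab_result = lab_retest()                     # confirmation step
        return "elevated" if lab_result >= cutoff else "not confirmed"
    return "ruled out on POCT alone"                  # never rechecked
```

A patient whose POCT value falls just below the cutoff is dispositioned without the lab assay ever being consulted, which is exactly the unmeasured quantity asked about above.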

  3. A lab unwilling to consider alternatives – In another seminar, a speaker reviewed troponin I performance and noted that no assays (as of 2002) met the ESC-ACC guidelines for troponin I performance, with imprecision being the major point of failure. The implication is that incorrect clinical decisions would be made, partly based on an incorrect troponin I result. I asked: why don’t you run replicates for the assay? This was immediately rejected as being too expensive.

Whereas many people would agree that running replicates for every sample is financially impractical, one may run replicates only for values close to a medical decision limit, minimizing the increased cost. Moreover, for any proposed cost increase, one should model the financial tradeoffs, as suggested in the following table:


| Case | Benefit | Cost | Comment |
| --- | --- | --- | --- |
| No change | No increased test cost | Cost of incorrect clinical decisions | Depends on likelihood of incorrect clinical decisions |
| Increase replicates | Reduced cost from fewer incorrect clinical decisions | Increased test cost | |
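The selective-replicate idea can be sketched as follows; the window width and replicate count are hypothetical and would in practice be set from the assay’s imprecision near the cutoff:

```python
from statistics import mean

def reported_result(measure, cutoff, window_frac=0.2, n_replicates=3):
    """Measure once; if the value falls within +/- window_frac * cutoff
    of the medical decision limit, repeat the measurement and report the
    mean of the replicates. Values far from the cutoff incur no extra cost."""
    first = measure()
    if abs(first - cutoff) > window_frac * cutoff:
        return first                                  # one run suffices
    reps = [first] + [measure() for _ in range(n_replicates - 1)]
    return mean(reps)                                 # replicate near cutoff
```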


  1. Uettwiller-Geiger D, Wu AHB, Apple FS, Jevans AW, Venge P, Olson MD, Darte C, Woodrum DL, Roberts S, Chan S. A multicenter evaluation of an automated assay for troponin I. Clinical Chemistry 2002;48:869-876.
  2. The Joint European Society of Cardiology/American College of Cardiology Committee. Myocardial infarction redefined—a consensus document of the joint European Society of Cardiology/American College of Cardiology Committee for the redefinition of myocardial infarction. J Am Coll Cardiol 2000;36:959–69
  3. Lee-Lewandrowski E, Corboy D, Lewandrowski K, Sinclair J, McDermot S, Benzer TI. Implementation of a point-of-care satellite laboratory in the emergency department of an academic medical center. Archives of Pathology and Laboratory Medicine 2003;127:456-460.
  4. Kratz A, Januzzi JL, Lewandrowski K, Lee-Lewandrowski E. Positive predictive value of a point-of-care testing strategy on first-draw specimens for the emergency department–based detection of acute coronary syndromes. Archives of Pathology and Laboratory Medicine 2002;126:1487-1493.