Revisiting Bland Altman plots and a paranoia

February 13, 2017


Over 10 years ago I submitted a paper critiquing Bland Altman plots. Since the original Bland Altman paper was the most cited paper ever published in The Lancet, I submitted mine with some trepidation.

Briefly, the issue is this. When one is comparing two methods, Bland and Altman suggest plotting the difference (Y-X) vs. the average of the two methods, (Y+X)/2. Bland and Altman also stated in a later paper (1) that even if the X method is a reference method (they use the term gold standard), one should still plot the difference against the average, and that not doing so is misguided and will lead to spurious correlations. They attempted to prove this with formulas.

Not being so great in math, but doubting their premise, I did some simulations. The results are shown in the table below. Basically, this says that when you have two field methods you should plot the difference vs. (Y+X)/2 as Bland Altman suggest. But when you have a field and a reference method, you should plot the difference vs. X. The values in the table are the correlation coefficients for Y-X vs. X and Y-X vs. (Y+X)/2 (after repeated simulations where Y is always a field method and X is either a field method or a reference method).


Case                    Y-X vs. X    Y-X vs. (X+Y)/2
X = Reference method    ~0           ~0.1
X = Field method        ~-0.12       ~0
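A simulation along these lines reproduces the pattern in the table (this is a minimal sketch, not the original program from my appendix; the assumed spreads – true values with SD 5, method error with SD 1 – are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
sd_true, sd_err = 5.0, 1.0               # assumed spreads (illustrative, not the original values)

truth = rng.normal(100, sd_true, n)      # hypothetical true analyte concentrations
y = truth + rng.normal(0, sd_err, n)     # Y: always a field method
x_ref = truth                            # X: reference method (negligible error)
x_field = truth + rng.normal(0, sd_err, n)   # X: a second field method

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# X = reference method: plotting the difference vs. the average creates a correlation
print(corr(y - x_ref, x_ref))            # ~0: no spurious correlation
print(corr(y - x_ref, (y + x_ref) / 2))  # positive (~0.1 with these spreads)

# X = field method: plotting the difference vs. X creates a correlation
print(corr(y - x_field, x_field))             # negative (magnitude depends on the assumed spreads)
print(corr(y - x_field, (y + x_field) / 2))   # ~0: no spurious correlation
```

The signs match the table: whichever x-axis variable shares the error of only one of the two methods picks up a spurious correlation with the difference.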


The paranoia

I submitted my paper as a technical brief to Clin Chem and included my simulation program as an appendix. I was told to recast the paper as a Letter, and then it was rejected. I submitted it to another journal (I think it was Clin Chem Lab Med), where it was also rejected. I then submitted my letter to Statistics in Medicine (2), where it was accepted.

Now, in the lab medicine field, I am known by the other statisticians, and I have sometimes published papers not to their liking. To Statistics in Medicine I am an unknown, and lab medicine is a small part of that journal's scope. So maybe my paper was judged solely on merit, or maybe I'm just paranoid.


  1. Bland JM, Altman DG. (1995) Comparing methods of measurement – why plotting difference against standard method is misleading. Lancet, 346, 1085-1087.
  2. Krouwer JS. (2008) Why Bland-Altman plots should use X, not (Y+X)/2 when X is a reference method. Statistics in Medicine, 27, 778-780.

Theranos Board now populated with past AACC presidents

August 10, 2016


Theranos has been criticized for its board, which has two former secretaries of state (Henry Kissinger and George Shultz), two former senators, and several former high-ranking military officers, and not much in the way of scientific expertise. Now, their scientific and medical advisory board includes four former AACC presidents: Susan Evans, Ann Gronowski, Larry Kricka, and Jack Ladenson. Note that although clinical chemists have been added, the fact that past presidents have been chosen conforms to Theranos’s strategy of favoring “official” types.

So here’s a question – if you were a well-known clinical chemist, would you accept a position to serve on Theranos’s board?

Theranos – Part 2

August 3, 2016


I was among the multitudes who attended Elizabeth Holmes’s presentation about Theranos at AACC in Philadelphia. Overall, I was impressed and here are some details. First, she said she wasn’t going to address past malfeasances (not the way she put it) but focus on Theranos’s new instrument.

As an aside, she had an accent identical to that of Mira Sorvino in “Romy and Michele’s High School Reunion.” For those who haven’t seen the movie, I would call this “adult valley girl.”

Her presentation included a lot of data analysis. Terms like ANOVA, Passing-Bablok regression, weighted Deming regression, CLSI guidelines EP05-A3 and EP09-A3, ATE (allowable total error), and others were pronounced and used correctly. (The ATE corresponded to CLIA limits.) Having worked most of my career for manufacturers, I know a simple rule: manufacturers never show bad data. Hence, until these data are reproduced by others….

The instrumentation was impressive from the standpoint that so many different assay types could fit in one relatively small box, but the technologies with which I am familiar were standard – nothing new. I don’t recall her mentioning any specific reagents. When you think about assays, reagents are the ballgame – the instrument is not that special. Something that did seem new was that the software for the instrument (the minilab) resides on a central server. The advantages of this remain to be demonstrated.

IQCP – waste of time? No surprise

July 30, 2016


The Westgards’ blog is always interesting; having looked at a recent entry, here are my thoughts.

Regarding IQCP, they say it’s mostly been a “waste of time”, an exercise of paperwork to justify current practices, with very little change occurring in QC practices.

This is no surprise to me – here’s why.

There are two ways to reduce errors.

FMEA (or similar programs) reduces the likelihood of rare but severe errors.

FRACAS (or similar programs) reduces the error rate of actual errors, some of which may be severe.

Here are the challenges with FMEA

  1. It takes time and personnel. There’s no way around this. If sufficient time is not provided with all of the relevant personnel present, the results will suffer. When the Joint Commission required every hospital to perform at least one FMEA per year, people complained that performing a FMEA took too much time.
  2. Management must be committed. (I was asked to facilitate a FMEA for a company – the meetings were scheduled during lunch. I asked why and was told they had more important things to do). Management wasn’t committed. The only reason this group was doing the FMEA was to satisfy a requirement.
  3. FMEA requires a facilitator. The purpose of FMEA is to challenge the ways things are done. Often, this means challenging people in the room (e.g., who have put systems in place or manage the ways things are done). This can create an adversarial situation where subordinates will not speak up. Without a good facilitator, results will suffer.
  4. The guidance to perform a FMEA (such as EP23) is not very good. Example: Failure mode is a short sample. The mitigation is to have someone examine each tube to ensure the sample volume is adequate. The group moves on to the next failure mode. The problem is that the mitigation is not new – it’s existing laboratory practice. Thus, as the Westgards say – all that has happened is the existing process has been documented. That is not FMEA. (A FMEA would enumerate the many ways that someone examining each sample could fail to detect the short sample).
  5. Pareto charts are absent in the guidance. But real FMEAs require Pareto charts.
  6. I have seen reports where people say their error rate has been reduced after they conducted a FMEA. But there are no error rates in a FMEA (error rates are in a FRACAS). So this means no FMEA was carried out.
  7. And I don’t see how anyone could conduct a FMEA and conclude that it is ok to run QC monthly.

Here are the challenges with FRACAS

  1. FRACAS requires a process where errors are counted in a structured way (severity and frequency) and reports issued on a periodic basis. This requires knowledge and commitment.
  2. FRACAS also requires periodic meetings to review errors whereby problems are assigned to corrective action teams. Again, this requires knowledge and commitment.
  3. Absence of a Pareto chart is a flag that something is missing (no severity classification, for example).
  4. People don’t like to see their error rates.
  5. FRACAS requires a realistic (error rate) goal.

There are FRACAS success stories:

Dr. Peter Pronovost applied a FRACAS-type approach to the placement of central lines and dropped the infection rate from 10% to 0 through the use of checklists.

In the 70s, a FRACAS-type approach reduced the error rate of anesthesiology instruments.

And FMEA failures

A Mexican teenager came to the US for a heart lung transplant. The donated organs were not checked to see if they were the right type. The patient died.

What needs to be measured to ensure the clinical usefulness of an assay

July 19, 2016


I was happy to see an editorial which IMHO states the required error components that need to be understood to ensure the clinical usefulness of an assay. Of course bias and imprecision are mentioned. But in addition, the author mentions freedom from interferences and from pre- and post-analytical errors.

One can ask: don’t interferences and pre- and post-analytical errors cause bias? Since the answer is yes, why do these terms need to be mentioned if it was already stated that bias is to be measured? The reason is that the way bias is measured will, in many cases, fail to detect the biases from interferences and from pre- and post-analytical errors.

For example, if regression is used, average bias will be estimated, not the individual biases that can occur from interferences.

If σ is estimated, this usually involves bias measured either from regression or from quality control samples, so again interference biases don’t get counted.

Finally, most of these studies are done in ways in which pre- and post-analytical errors have been minimized – the studies are performed outside of the routine processing of patient samples. Hence, to ensure the clinical usefulness of an assay, one must construct protocols that measure all of the error components mentioned in the first paragraph.
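A toy example makes the regression point concrete (the 5% interference rate and the +30 unit shift are invented numbers, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
truth = rng.uniform(50, 150, n)          # hypothetical analyte range
y = truth + rng.normal(0, 2, n)          # candidate method with no systematic bias

interfered = rng.random(n) < 0.05        # assume ~5% of specimens carry an interferent
y[interfered] += 30                      # large positive bias in those specimens only

# An ordinary least-squares fit of Y on truth estimates only the *average* bias:
slope, intercept = np.polyfit(truth, y, 1)
avg_bias = (slope - 1) * truth.mean() + intercept   # bias of the fitted line at the mean
print(f"average bias estimate: {avg_bias:.2f}")     # small -- looks acceptable

# But the individual biases in the interfered specimens are large:
print(f"worst individual bias: {np.max(y - truth):.1f}")
```

The fitted line reports a small average bias even though one specimen in twenty is off by roughly 30 units – exactly the kind of error that harms an individual patient.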

No surprise that Instructions For Use (package inserts) are weak

July 16, 2016



A recent letter in Clinical Chemistry (subscription required) discusses package inserts from manufacturers, also called instructions for use (IFUs). The letter says that manufacturers’ IFUs often do not follow CLSI guidelines with respect to hemoglobin interference.

This should come as no surprise – here’s why.

The authors cite FDA regulations which state: “Limitation of the procedure: Include a statement of limitations of the procedure. State known extrinsic factors or interfering substances affecting results.”

This regulation leaves a lot of leeway as to what should appear in the IFU.

So the authors say that CLSI guidelines (C56 and EP7) are not followed. One should understand that CLSI guidelines are not regulations: no manufacturer has to follow them. Moreover, these guidelines are often manufacturer friendly, as manufacturers dominate the committees that prepare the documents. For example, the authors cite C56, which has an example of how to report when there is no hemoglobin interference for glucose. The table contains the concentration of hemoglobin tested, two glucose levels, and bias < 10%.

This is messed up! If bias found were 9%, this CLSI guideline is suggesting that it is ok to say there was no bias!

So even if manufacturers followed CLSI guidelines, maybe this wouldn’t be so good.

To understand why a CLSI document would permit the claim “no bias” when 9% bias was found…

CLSI prides itself on equal influence of “professions” (e.g., clinical chemists in hospitals), “government” (e.g., FDA), and “manufacturers” (people in industry). But the industry people are largely from regulatory affairs, and their role on committees has often been obstructionist. Basically, industry – like industries in other fields – does not want to be regulated at all, so if there has to be a standard, the regulatory people try to make it as industry friendly as possible.

As an example of the obstructionist role, consider EP7. It was initially published as a “P” (proposed) version in 1986. Only “A” (approved) versions are accepted by the FDA. So how long did it take for this standard to go from P to A? Sixteen years! (The A version appeared in 2002.) It wasn’t until I was the chair of the Evaluation Protocol Committee that this project got moving faster than a snail’s pace and was finished.

And then there was the CLSI standard EP11, Uniformity of Claims, which was intended to be a guideline for IFUs. It’s hard to say whether this standard would have helped, since it too could be ignored. It was published as a “P” document in 1996. CLSI management, pressured by industry, pressured me in turn to cancel it. I didn’t, but they did: it was never advanced and is no longer available.

Finally, I can’t speak about other companies, but in the company that I worked for, IFUs were prepared by the marketing department.

Unwarranted Conclusions

June 2, 2016


Looking at a paper about QC procedures (subscription required), I admit I was intrigued by the title: “Selecting Statistical Procedures for Quality Control Planning Based on Risk Management.”

Just reading the abstract and the first few lines informs me that the conclusions are unwarranted, because the authors claim they can estimate the probability of patient harm based on which QC procedure is chosen.

A QC procedure helps to detect problems with the assay process. Patient harm can be caused by an assay process gone astray, but it can also occur when the assay process has not gone astray. For example, a patient-specific interference can cause patient harm and will not be detected by QC. Moreover, the authors assume that an out-of-control condition persists in a constant fashion until it is detected by the next QC sample; but a transient shift affecting only a limited number of samples can also occur, and that case is eliminated from consideration. So even the QC considerations don’t include all possible errors.
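A hypothetical example of such a transient shift (the run size, shift magnitude, and bracketing QC schedule are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assume QC brackets a run of 100 patient samples and a transient shift
# hits only samples 40-59, resolving before the next QC event.
results = rng.normal(100, 5, 100)        # patient results: true mean 100, SD 5
results[40:60] += 15                     # transient 3-SD shift on 20 samples

qc_before, qc_after = 99.0, 101.0        # assumed QC recoveries; the shift is absent at both
in_control = abs(qc_before - 100) < 15 and abs(qc_after - 100) < 15   # 3-SD QC rule

print(in_control)                                   # the run passes QC
print(int(np.sum(results[40:60] > 110)))            # yet many shifted patient results went out
```

Because the shift has resolved before the next QC sample, the run is declared in control, and none of the shifted patient results are flagged – the scenario the authors’ model excludes.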

Ok, I admit that I have stopped reading but it is clear that whatever the authors estimate (assuming their logic is correct) is an underestimate of the probability of patient harm.

That also makes me wonder: of all cases of patient harm caused by wrong medical decisions due to assay error, what percentage is due to the assay process gone bad vs. other causes (e.g., interferences)? For example, searching for the word “interference” in the title field of Clinical Chemistry over the last 10 years yielded 912 results.