Commutability and déjà vu

March 18, 2018

Reading the series of articles and the accompanying editorial about commutability in the March 2018 issue of Clinical Chemistry reminds me of the job I started almost 40 years ago at Technicon Instruments. My group, under the leadership of Dr. Stan Bauer, was responsible for putting the right values on calibrators for all of our assays. Back then, when customers complained that they weren’t getting the right result, the calibrator value was often blamed. I seem to recall that the customer even had the ability to choose a different value for the calibrator (we called the calibrator values “set points”).

In any case, what we did was as follows. We occupied space at the hospital of New York Medical College in nearby Valhalla (Technicon was in Tarrytown). We acquired patient samples that were no longer needed by the hospital and ran them both on our instruments and on reference methods. Then, through data analysis, we assigned a value to the master lot of calibrator that would make the patient sample results from the Technicon method equal those obtained from the reference method. For some assays, such as bilirubin if I remember correctly, the calibrator contained a dye and thus no analyte at all! Suffice it to say that although our calibrators were not commutable, the patient samples nevertheless came out right (the same as the reference method).
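To make the idea concrete, here is a minimal sketch of how such a value assignment could work, assuming a simple one-point proportional calibration (reported result = raw signal × set point / calibrator signal). The actual Technicon calibration model and data analysis were more involved than this, and all numbers below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch only: a one-point proportional calibration is assumed,
#   reported result = raw signal * (set point / calibrator signal)
true_conc = rng.uniform(1.0, 20.0, 40)                     # patient sample concentrations
reference = true_conc + rng.normal(0, 0.1, 40)             # reference method results
field_signal = 0.85 * true_conc + rng.normal(0, 0.2, 40)   # field-method raw signals
calibrator_signal = 8.5                                    # raw signal of the master calibrator lot

# Pick the set point so that recalibrated field results agree with the
# reference method (least-squares fit through the origin).
scale = np.sum(reference * field_signal) / np.sum(field_signal ** 2)
set_point = scale * calibrator_signal

recalibrated = field_signal * (set_point / calibrator_signal)
print(f"Assigned set point: {set_point:.2f}")
print(f"Mean difference vs reference after assignment: {np.mean(recalibrated - reference):.3f}")
```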

It was this data analysis work that turned me into a statistician. I enjoyed the work, and I was finding out properties of our Technicon assays that the biostatisticians had missed, some of which were critical in calibrator value assignment.

On another note, I was at a small company a few years ago on a sales call. As I was describing my background, including Technicon, I asked the small group: had anyone heard of Technicon? No one raised their hand.


Articles accompanied by an editorial

March 16, 2018

Ever notice how in Clinical Chemistry (and other journals) an editorial accompanies an article (or series of articles) in the same issue? The editorial is saying: hey! Listen up, people, these articles are really important. And then the editorial goes on to explain what the article is about and why it’s important. It’s the book explaining the book.

Misuse of the term random error

January 31, 2018


In clinical chemistry, one often hears that there are two contributions to error: systematic error and random error. Random error is often estimated by taking the SD of a set of observations of the same sample. But does the SD estimate random error? And are repeatability and reproducibility forms of random error? (Recall that repeatability = within-run imprecision and reproducibility = long-term (or total) imprecision.)

Example 1 – An assay with linear drift and 10 observations run one after the other.

The SD of these 10 observations is 1.89. But if one sets up a regression with Y = drift + error, the residual error term is 0.81. Hence, the real random error is much less than the SD estimate of random error because the observations are contaminated with a bias (namely drift). So here is a case where taking the SD does not measure the random error component of repeatability; one has to investigate further.
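As a rough sketch of the calculation, the following simulation compares the naive SD with the residual SD after fitting the drift. The drift rate and error SD are assumed values for illustration, not the numbers behind the figure above.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 observations run one after the other, with an assumed linear drift
n = 10
run_order = np.arange(n)
drift_per_obs = 0.6                  # assumed drift per observation (concentration units)
true_random_sd = 0.8                 # assumed "true" random error
y = 100 + drift_per_obs * run_order + rng.normal(0, true_random_sd, n)

# Naive random error estimate: SD of the raw observations (contaminated by drift)
naive_sd = np.std(y, ddof=1)

# Regression estimate: fit Y = drift + error and take the residual SD
slope, intercept = np.polyfit(run_order, y, 1)
residual_sd = np.std(y - (intercept + slope * run_order), ddof=2)

print(f"SD of raw observations         : {naive_sd:.2f}")
print(f"Residual SD after fitting drift: {residual_sd:.2f}")
```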


Example 2 – An assay with calibration (drift) bias, using the same figure as above (OK, I used the same numbers, but this doesn’t matter).

Assume that in the above figure, each N is the average of a month of observations, corresponding to a calibration. Each subsequent month has a new calibration.

Clearly, the same argument applies. There is now a calibration bias which inflates the apparent imprecision, so once again the real random error is much less than what one measures by taking the SD.
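A similarly hedged sketch, treating each month as its own calibration with an assumed calibration-to-calibration bias, shows the total SD inflated well above the within-calibration random error.

```python
import numpy as np

rng = np.random.default_rng(1)

months, per_month = 12, 30
within_sd = 0.8              # assumed "true" random error within a calibration
cal_bias_sd = 1.5            # assumed month-to-month calibration bias

# Each month gets a new calibration, i.e., a bias shared by all of that month's results
data = np.array([100 + rng.normal(0, cal_bias_sd) + rng.normal(0, within_sd, per_month)
                 for _ in range(months)])

total_sd = np.std(data, ddof=1)                                    # naive "reproducibility"
pooled_within_sd = np.sqrt(np.mean(np.var(data, axis=1, ddof=1)))  # random error only

print(f"Total (long-term) SD         : {total_sd:.2f}")
print(f"Pooled within-calibration SD : {pooled_within_sd:.2f}")
```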

More commitment needed from authors

November 5, 2017

I just read an interesting paper about irreproducibility in science. The authors suggest a remedy, namely that “authors of such papers should be invited to provide a 5-year (and perhaps a 10-year) reflection on their papers”.

I suggested to Clinical Chemistry a few years ago that every paper should have a “recommendations” section. To recap, most papers have some or all of the following sections: introduction, methods, results, discussion, and conclusions. But rarely if ever is there a recommendations section, although sometimes there is a recommendation in the conclusions section.

In my company, I established a reporting format that required a recommendations section. The recommendations required action words (e.g., verbs).

So a study to evaluate an assay might have as a conclusion: “Assay XYZ has met its performance specifications.” The corresponding recommendation might be: “Release assay XYZ for sale.”

Although the recommendation might seem to be a logical consequence of the conclusion, psychologically, the recommendation requires more commitment. Were there outliers? Did the study have enough samples? Was there possible bias?

In any case, Clinical Chemistry declined to accept my suggestion.


Two examples of why interferences are important and a comment about a “novel approach” to interferences

September 29, 2017

I had occasion to read an open access paper, “Full method validation in clinical chemistry.” With that title, one expects the big picture, and that is what this paper delivers. But when it discusses analytical method validation, the concept of testing for interfering substances is missing. Precision, bias, and commutability are the topics covered. Now one can say that an interference will cause a bias, and this is true, but nowhere do these authors mention testing for interfering substances.

The problem is that eventually these papers are turned into guidelines, such as ISO 15197, which is the guideline for glucose meters. This guideline allows 1% of the results to be unspecified (it used to be 5%). This means that an interfering substance could cause a large error, resulting in serious harm, in 1% of the results. Given the frequency of glucose meter testing (e.g., a patient who tests three times a day generates roughly 90 results a month), this translates to about one potentially dangerous result per month for a glucose meter that is acceptable according to ISO 15197. If more attention were paid to interfering substances and the fact that they can be large and cause severe patient harm, the guideline might not have allowed 1% of the results to remain unspecified.

I attended a local AACC talk given by Dr. Inker about GFR. The talk, which was very good, had a slide about a paper on creatinine interferences. After the talk, I asked Dr. Inker how she dealt with creatinine interferences on a practical level. She said there was no way to deal with this issue, which was echoed by the lab people there.

Finally, there is a paper by Dr. Plebani, who cites the paper: Vogeser M, Seger C. Irregular analytical errors in diagnostic testing – a novel concept (Clin Chem Lab Med 2017, ahead of print). OK, since this is not an open access paper, I didn’t read it, but from what I can tell from Dr. Plebani’s comments, the cited authors have discovered the concept of interfering substances and think that people should devote attention to it. Duh! And particularly irksome is the suggestion by Vogeser and Seger that “we suggest the introduction of a new term called the irregular (individual) analytical error.” What’s wrong with interference?

Overinterpretation of results – bad science

June 16, 2017

A recent article (subscription required) in Clinical Chemistry suggests that in many accuracy studies the results are overinterpreted. The authors go on to say that there is evidence of “spin” in the conclusions. All of this is a euphemistic way of saying the conclusions are not supported by the study that was conducted, which means the science is faulty.

As an aside, early in the article, the authors imply that overinterpretation can lead to false positives, which can cause potential overdiagnosis. I have commented that the word overdiagnosis makes no sense.

But otherwise, I can relate to what the authors are saying – I have many posts of a similar nature. For example…

I have commented that Westgard’s total error analysis, while useful, does not live up to his claims of being able to determine the quality of a measurement procedure.

I commented that a troponin assay was declared “a sensitive and precise assay for the measurement of cTnI” in spite of the fact that, in the results section, the assay failed the ESC-ACC (European Society of Cardiology – American College of Cardiology) guidelines for imprecision.

I published observations that most clinical trials conducted to gain regulatory approval for an assay are biased.

I suggested that a recommendations section should be part of Clinical Chemistry articles. There is something about the action verbs in a recommendation that makes people think twice.

It would have been interesting if the authors had determined how many of the studies were funded by industry, but on the other hand, you don’t have to be part of industry to state conclusions that are not supported by the results.


Revisiting Bland Altman plots and a paranoia

February 13, 2017


Over 10 years ago I submitted a paper critiquing Bland Altman plots. Since the original publication of Bland Altman plots was the most cited paper ever in The Lancet, I submitted my paper with some temerity.

Briefly, the issue is this. When one is comparing two methods, Bland and Altman suggest plotting the difference (Y-X) vs. the average of the two methods, (Y+X)/2. Bland and Altman also stated in a later paper (1) that even if the X method is a reference method (they use the term gold standard), one should still plot the difference against the average, and that not doing so is misguided and will lead to spurious correlations. They attempted to prove this with formulas.

Not being so great in math, but doubting their premise, I did some simulations. The results are shown in the table below. Basically, this says that when you have two field methods you should plot the difference vs. (Y+X)/2, as Bland and Altman suggest. But when you have a field method and a reference method, you should plot the difference vs. X. The values in the table are the correlation coefficients for Y-X vs. X and for Y-X vs. (Y+X)/2 (from repeated simulations where Y is always a field method and X is either a field method or a reference method); a sketch of such a simulation follows the table.


Case                   Y-X vs. X    Y-X vs. (X+Y)/2
X = Reference method   ~0           ~0.1
X = Field method       ~-0.12       ~0
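Here is a minimal sketch of such a simulation, not my original program; the concentration range and error SDs are assumed values that happen to give correlations of roughly the same size as those in the table.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_correlations(x_is_reference, n_samples=100, n_sims=2000, error_sd=5.0):
    """Average correlation of the difference (Y - X) with X and with (X + Y)/2."""
    corr_x, corr_avg = [], []
    for _ in range(n_sims):
        true = rng.uniform(50, 150, n_samples)             # assumed analyte range
        y = true + rng.normal(0, error_sd, n_samples)      # Y is always a field method
        if x_is_reference:
            x = true                                       # reference method, taken as error-free
        else:
            x = true + rng.normal(0, error_sd, n_samples)  # X is a second field method
        d = y - x
        corr_x.append(np.corrcoef(d, x)[0, 1])
        corr_avg.append(np.corrcoef(d, (x + y) / 2)[0, 1])
    return np.mean(corr_x), np.mean(corr_avg)

for label, is_ref in [("X = reference method", True), ("X = field method", False)]:
    vs_x, vs_avg = mean_correlations(is_ref)
    print(f"{label}: corr(Y-X, X) = {vs_x:.2f}, corr(Y-X, (X+Y)/2) = {vs_avg:.2f}")
```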


The paranoia

I submitted my paper as a technical brief to Clin Chem and included my simulation program as an appendix. After I was told to recast the paper as a Letter, it was rejected. I submitted it to another journal (I think it was Clin Chem Lab Med), and it was also rejected. I then submitted my letter to Statistics in Medicine (2), where it was accepted.

Now in the lab medicine field, I am known by the other statisticians and have sometimes published papers not to their liking. At Statistics in Medicine, I am an unknown, and lab medicine is a small part of that journal. So maybe my paper was judged solely on merit, or maybe I’m just paranoid.


  1. Bland JM, Altman DG. Comparing methods of measurement – why plotting difference against standard method is misleading. Lancet 1995;346:1085-1087.
  2. Krouwer JS. Why Bland-Altman plots should use X, not (Y+X)/2 when X is a reference method. Statistics in Medicine 2008;27:778-780.