The bad and the good in lab medicine

May 5, 2020

First, the bad. Well, bad is maybe too strong a word.


If you have ever read an ISO standard, you will notice that something is missing. There is no list of authors or committee members. The people who write the standard should be listed!

ISO 9001, ISO 15189

In the 90s, companies would display banners stating they were ISO 9001 certified. From the ISO website: “Using ISO 9001 helps ensure that customers get consistent, good-quality products and services, which in turn brings many business benefits.” But accreditation success is judged by the documentation that the organization has, to show that it is following the processes that it has developed. A company could have poor quality but if they can prove through documentation that they follow their processes, they will be accredited. The same applies to ISO 15189, the clinical laboratory version.

Krouwer JS. ISO 9001 has had no effect on quality in the in-vitro medical diagnostics industry. Accred Qual Assur, 2004;9:39-43.

ISO 15197

This standard describes the required accuracy of the glucose meters that patients with diabetes use to self-monitor their glucose. It came out in 2003 and was updated in 2013. The 2003 version allowed 5% of the results to have any difference from reference (hence 5% unspecified). The 2013 version reduced the unspecified amount to 1%. People with diabetes do a lot of testing. To have 5% unspecified meant that once a week you could get a result that could kill you from an ISO-acceptable glucose meter (2003 version). For the 2013 version this was once a month. I was invited to attend an early meeting of the 15197 committee. It was not run by endocrinologists, but rather by regulatory affairs people from industry.

Krouwer JS Wrong thinking about glucose standards. Clin Chem, 2010;56:874-875.
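The once-a-week and once-a-month figures in the ISO 15197 discussion above follow from simple arithmetic. Here is a minimal sketch of that arithmetic. The testing frequency of 3 tests per day is my assumption for illustration; it is not stated in the standard or in the text above. Note that "unspecified" means the error is unbounded, not that every such result is dangerous.

```python
def days_between_unspecified(unspecified_fraction, tests_per_day):
    """Expected number of days between results that fall in the unspecified zone."""
    return 1.0 / (unspecified_fraction * tests_per_day)


# Assumption for illustration only: a person who tests 3 times per day.
TESTS_PER_DAY = 3

for version, fraction in (("2003", 0.05), ("2013", 0.01)):
    days = days_between_unspecified(fraction, TESTS_PER_DAY)
    print(f"ISO 15197 ({version}): one unspecified result about every {days:.0f} days")
```

With that assumed frequency, the 2003 allowance works out to roughly one unspecified result per week and the 2013 allowance to roughly one per month. Someone who tests more often reaches an unspecified result sooner.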


I spent many years contributing to CLSI in the area of evaluations. This group is dominated by regulatory affairs people and it was always difficult to finish any evaluation standard. For example, when I became chairholder of the committee, I finally published a bunch of standards which had been sitting around for 14 years!

I thought Jim Westgard’s original idea about total error made a lot of sense, so I established and chaired a standard about total error (EP21). When it was time to revise the standard, it occurred to me that the original document was about “total analytical error.” I suggested that in the revision, we include pre- and post-analytical error. There was strong opposition to this and not just by the regulatory affairs people but also by hospital clinical chemists. After a while, the CLSI management threw me out of CLSI.

Clinical Chemistry (Journal)

There have been a lot of good things in the journal Clinical Chemistry but here is one that is not so good. The journal will not accept for review a Letter to the Editor except if the Letter is about an original article. That means that about half of the content in the journal (case studies, opinions, editorials, and so on) is off limits. I asked the editor of the journal during a local AACC meeting and his response went along the lines – we vet our articles very carefully and don’t wish to burden the journal with useless blather. My Letter to Clinical Chemistry published in 2010 about the ISO glucose standard would not be considered today.

The good

I like all sections of the AACC Artery. I look at it every day.

Who influences CMS and CDC?

March 23, 2019

A recent editorial disagrees with the proposed CLIA limits for HbA1c provided by CMS and CDC (The Need for Accuracy in Hemoglobin A1c Proficiency Testing: Why the Proposed CLIA Rule of 2019 Is a Step Backward) online in J Diabetes Science and Technology. The proposed CLIA limits are ± 10% – the NGSP limits are 5%, and the CAP limits 6%. Reading the Federal Register, I don’t understand the basis of the 10%.

This reminds me of another CMS decree in the early 2000s – Equivalent Quality Control. Under this program, a lab director could run quality control for 10 days as well as the automated internal quality checks and decide whether the two were equivalent. If the answer was yes, the frequency of quality control could be reduced to once a month. This made no sense!

New statistics will not help bad science

November 27, 2018

An article in Clinical Chemistry (1) refers to another article by Ioannidis (2) with a recommendation to change the traditional level of statistical significance for P values from 0.05 to 0.005.

The reasons presented for the proposed change make no sense. Here’s why:

The first limitation is that P values are often misinterpreted …

If people misinterpret P values, then the remedy is better training, not a different P value threshold!

The second limitation is that P values are overtrusted, when the P value can be highly influenced by factors such as sample size or selective reporting of data. 

Any introductory statistics textbook provides guidance on how to calculate the proper sample size for an experiment. Once again, this is a training issue. The second part of this reason is more insidious. If selective reporting of data occurs, the experiment is biased and no P value is valid!
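As a reminder of what the textbook calculation looks like, here is a minimal sketch that uses the usual normal approximation for comparing two group means. The function name, the 80% power, and the standardized effect size of 0.5 are my assumptions for illustration. They do not come from either cited article.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(alpha, power=0.80, effect_size=0.5):
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * ((z_alpha + z_beta) / d)^2,
    where d is the difference in means divided by the common SD.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_beta = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


for alpha in (0.05, 0.005):
    print(f"alpha = {alpha}: about {n_per_group(alpha)} samples per group")
```

Under these assumptions the required sample size rises from about 63 to about 107 per group when the threshold moves from 0.05 to 0.005. The calculation is the same either way, and so is the training needed to do it.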

The third limitation discussed by Ioannidis is that P values are often misused to draw conclusions about the research.

Another plea for training. And how will changing the level of statistical significance prevent wrong conclusions?

Actually, I prefer using confidence limits instead of P values but they provide no guarantees either. A famous example by Youden showed that for 15 estimates of the astronomical unit made from 1895 to 1961, each confidence interval did not overlap its predecessor.


  1. Hackenmueller SA. What’s the Value of the P Value? Clin Chem 2018;64:1675.
  2. Ioannidis JPA. The proposal to lower P value thresholds to .005. JAMA 2018;319:1429-30.

Reviving an old accuracy hierarchy in clinical chemistry

September 3, 2018

Things that simplify are good and I recently had occasion to review one of these. It is an article by Tietz which is here. He describes a hierarchy of accuracy for clinical chemistry methods as follows:

Definitive method – methods that provide the highest accuracy such as isotope dilution mass spectrometry

Reference method – documented methods not quite as accurate but doable for a wider variety of sites. Often these are manual methods using protein-free filtrates

Field method – All of the commercial methods

Unfortunately, the ponderous and unhelpful metrology terminology now dominates and the clarity of Tietz has taken a backseat. For example, if one searches through VIM, the word definitive does not appear. But the word measurand is all over the place.


It’s hard to be a clinical chemist

March 25, 2018

What I mean by a clinical chemist is anyone associated with clinical chemistry which includes people who work in hospitals and anybody who works for a manufacturer.

A recent example is about blood lead, a product for which I consulted. As reported recently, the electrochemical method was at times giving the wrong answers. It was finally determined that a compound in the rubber stoppers of blood collection tubes was dissolving in blood and absorbing lead. Thus, nothing can be assumed – anything, including the blood collection tubes, can cause problems.

Commutability and déjà vu

March 18, 2018

Reading the series of articles and editorial in March 2018 Clinical Chemistry about commutability reminds me of my job that started almost 40 years ago at Technicon Instruments. My group, under the leadership of Dr. Stan Bauer, was responsible for putting the right values on calibrators for all of our assays. Back then, when customers complained that they weren’t getting the right result, the calibrator value was often blamed. I seem to recall that the customer even had the ability to choose a different value for the calibrator (we called the calibrator values “set points”).

In any case, what we did was as follows. We occupied space at the hospital of New York Medical College in nearby Valhalla (Technicon was in Tarrytown). We acquired patient samples that were no longer needed by the hospital and ran them both on our instruments and reference methods. Then, through data analysis, we assigned a calibrator value to the master lot of calibrator that would make the patient samples in the Technicon method equal what was obtained for the reference method. For some assays such as bilirubin if I remember correctly, the calibrator contained a dye and thus no analyte at all! Suffice it to say that whereas commutability of our calibrators didn’t exist, the patient samples nevertheless came out right (same as reference method).

It was this data analysis work that turned me into a statistician. I enjoyed the work and was finding out properties of our Technicon assays that the biostatisticians had missed and some of these properties were critical in calibrator value assignment.

On another note, I was at a small company a few years ago on a sales call. As I was describing my background including Technicon, I asked the small group – anyone hear of Technicon? No one raised their hand.

Articles accompanied by an editorial

March 16, 2018

Ever notice how in Clinical Chemistry (and other journals), an editorial accompanies an article (or series of articles) in the same issue? The editorial is saying – hey! listen up people, these articles are really important. And then the editorial goes on to explain what the article is about and why it’s important. It’s the book explaining the book.

Misuse of the term random error

January 31, 2018


In clinical chemistry, one often hears that there are two contributions to error – systematic error and random error. Random error is often estimated by taking the SD of a set of observations of the same sample. But does the SD estimate random error? And are repeatability and reproducibility forms of random error? (Recall that repeatability = within run imprecision and reproducibility = long term (or total) imprecision.)

Example 1 – An assay with linear drift with 10 observations run one after the other.

The SD of these 10 observations = 1.89. But if one sets up a regression with Y = drift + error, the error term is 0.81. Hence, the real random error is much less than the random error estimated by the SD, because the observations are contaminated with a bias (namely drift). So here is a case where taking the SD of repeatability data doesn’t measure random error; one has to investigate further.
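The comparison in Example 1 can be sketched in a few lines. The ten observations below are hypothetical (a made-up baseline of 100, a made-up drift of 0.5 per observation, and a fixed list of made-up noise values), so the output will not reproduce the 1.89 and 0.81 quoted above. The point is only the method: compare the plain SD with the residual SD after fitting a straight line against run order.

```python
import math
import statistics


def raw_and_residual_sd(y):
    """Return (plain SD of y, residual SD after a straight-line fit vs. run order, fitted slope)."""
    n = len(y)
    x = list(range(1, n + 1))  # run order 1..n
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_sd = math.sqrt(ss_res / (n - 2))  # n - 2: slope and intercept were estimated
    return statistics.stdev(y), residual_sd, slope


# Hypothetical data: baseline 100, drift 0.5 per observation, fixed illustrative noise.
NOISE = [0.3, -0.6, 0.9, -0.2, -0.8, 0.5, 0.1, -0.7, 0.6, -0.1]
DRIFT_PER_OBSERVATION = 0.5
observations = [100 + DRIFT_PER_OBSERVATION * i + e for i, e in enumerate(NOISE, start=1)]

raw_sd, residual_sd, slope = raw_and_residual_sd(observations)
print(f"plain SD = {raw_sd:.2f}, residual SD = {residual_sd:.2f}, fitted drift = {slope:.2f}")
```

The plain SD absorbs the drift and comes out well above the residual SD, which is the better estimate of the random error in this made-up run.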


Example 2 – An assay with calibration (drift) bias using the same figure as above (Ok I used the same numbers but this doesn’t matter).

Assume that in the above figure, each N is the average of a month of observations, corresponding to a calibration. Each subsequent month has a new calibration.

Clearly, the same argument applies. There is now calibration bias which inflates the apparent imprecision so once again, the real random error is much less than what one measures by taking the SD.

More commitment needed from authors

November 5, 2017

I just read an interesting paper about irreproducibility in science. The authors suggest a remedy, namely that “authors of such papers should be invited to provide a 5-year (and perhaps a 10-year) reflection on their papers”.

I suggested to Clinical Chemistry a few years ago that every paper should have a “recommendations” section. To recall, most papers have some or all of: an introduction, methods, results, discussion, and conclusion sections. But rarely if ever is there a recommendations section, although sometimes there is a recommendation in the conclusions section.

In my company, I established a reporting format that required a recommendations section. The recommendations required action words (e.g., verbs).

So a study to evaluate an assay might have as a conclusion: “Assay XYZ has met its performance specifications.” The corresponding recommendation might be: “Release assay XYZ for sale.”

Although the recommendation might seem to be a logical consequence of the conclusion, psychologically, the recommendation requires more commitment. Were there outliers? Did the study have enough samples? Was there possible bias?

In any case, Clinical Chemistry declined to accept my suggestion.


Two examples of why interferences are important and a comment about a “novel approach” to interferences

September 29, 2017

I had occasion to read an open access paper “full method validation in clinical chemistry.” So with that title, one expects the big picture and this is what this paper has. But when it discusses analytical method validation, the concept of testing for interfering substances is missing. Precision, bias, and commutability are the topics covered. Now one can say that an interference will cause a bias and this is true but nowhere do these authors mention testing for interfering substances.

The problem is that eventually these papers are turned into guidelines, such as ISO 15197, which is the guideline for glucose meters. And this guideline allows 1% of the results to be unspecified (it used to be 5%). This means that an interfering substance could cause a large error resulting in serious harm in 1% of the results. Given the frequency of glucose meter testing, this translates to one potentially dangerous result per month for an acceptable (according to ISO 15197) glucose meter. If one paid more attention to interfering substances and the fact that their effects can be large and cause severe patient harm, the guideline might not have allowed 1% of the results to remain unspecified.

I attended a local AACC talk given by Dr. Inker about GFR. The talk, which was very good, had a slide about a paper on creatinine interferences. After the talk, I asked Dr. Inker how she dealt with creatinine interferences on a practical level. She said there was no way to deal with this issue, which was echoed by the lab people there.

Finally, there is a paper by Dr. Plebani, who cites the paper: Vogeser M, Seger C. Irregular analytical errors in diagnostic testing – a novel concept. (Clin Chem Lab Med 2017, ahead of print). Ok, since this is not an open access paper, I didn’t read it, but from what I can tell from Dr. Plebani’s comments, the cited authors have discovered the concept of interfering substances and think that people should devote attention to it. Duh! And particularly irksome is the suggestion by Vogeser and Seger: “we suggest the introduction of a new term called the irregular (individual) analytical error.” What’s wrong with interference?