November 5, 2017
I just read an interesting paper about irreproducibility in science. The authors suggest a remedy, namely that “authors of such papers should be invited to provide a 5-year (and perhaps a 10-year) reflection on their papers”.
I suggested to Clinical Chemistry a few years ago that every paper should have a “recommendations” section. Recall that most papers have some or all of the following sections: introduction, methods, results, discussion, and conclusions. But rarely if ever is there a recommendations section, although sometimes a recommendation appears within the conclusions.
In my company, I established a reporting format that required a recommendations section, and the recommendations had to contain action verbs.
So a study to evaluate an assay might have as a conclusion: “Assay XYZ has met its performance specifications.” The corresponding recommendation might be: “Release assay XYZ for sale.”
Although the recommendation might seem to be a logical consequence of the conclusion, psychologically, the recommendation requires more commitment. Were there outliers? Did the study have enough samples? Was there possible bias?
In any case, Clinical Chemistry declined to accept my suggestion.
September 29, 2017
I had occasion to read an open access paper, “Full method validation in clinical chemistry.” With a title like that, one expects the big picture, and that is what the paper delivers. But when it discusses analytical method validation, the concept of testing for interfering substances is missing; precision, bias, and commutability are the topics covered. One can argue that an interference will cause a bias, and that is true, but nowhere do the authors mention testing for interfering substances.
The problem is that eventually these papers are turned into guidelines, such as ISO 15197, the guideline for glucose meters. That guideline allows 1% of results to be unspecified (it used to be 5%). This means that an interfering substance could cause a large error, resulting in serious harm, in 1% of results. Given the frequency of glucose meter testing, this translates to one potentially dangerous result per month for a glucose meter that is acceptable according to ISO 15197. Had more attention been paid to interfering substances, and to the fact that their effects can be large and cause severe patient harm, the guideline might not have allowed 1% of the results to remain unspecified.
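The arithmetic behind “one potentially dangerous result per month” is worth making explicit. A minimal sketch, assuming a typical self-monitoring frequency of three tests per day (my assumption, not a figure from the guideline):

```python
# Hypothetical arithmetic for the "one dangerous result per month" claim.
tests_per_day = 3                     # assumed typical self-monitoring frequency
tests_per_month = tests_per_day * 30  # ~90 glucose meter results per month

unspecified_fraction = 0.01           # ISO 15197 leaves 1% of results unspecified
unspecified_per_month = unspecified_fraction * tests_per_month  # ~0.9, i.e. about one
```

Under the older 5% allowance, the same patient would face roughly one unspecified result every week.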
I attended a local AACC talk given by Dr. Inker about GFR. The talk, which was very good, had a slide about a paper on creatinine interferences. After the talk, I asked Dr. Inker how she dealt with creatinine interferences on a practical level. She said there was no way to deal with the issue, which was echoed by the lab people there.
Finally, there is a paper by Dr. Plebani, which cites: Vogeser M, Seger C. Irregular analytical errors in diagnostic testing – a novel concept. (Clin Chem Lab Med 2017, ahead of print). Since this is not an open access paper, I didn’t read it, but from what I can tell from Dr. Plebani’s comments, the cited authors have discovered the concept of interfering substances and think that people should devote attention to it. Duh! Particularly irksome is Vogeser and Seger’s suggestion that “we suggest the introduction of a new term called the irregular (individual) analytical error.” What’s wrong with interference?
June 16, 2017
A recent article (subscription required) in Clinical Chemistry suggests that in many accuracy studies the results are overinterpreted. The authors go on to say that there is evidence of “spin” in the conclusions. All of this is a euphemistic way of saying the conclusions are not supported by the study that was conducted, which means the science is faulty.
As an aside, early in the article, the authors imply that overinterpretation can lead to false positives, which can cause potential overdiagnosis. I have commented that the word overdiagnosis makes no sense.
But otherwise, I can relate to what the authors are saying – I have many posts of a similar nature. For example…
I have commented that Westgard’s total error analysis while useful does not live up to his claims of being able to determine the quality of a measurement procedure.
I commented that a troponin assay was declared “a sensitive and precise assay for the measurement of cTnI” in spite of the fact that, in the results section, the assay failed the ESC-ACC (European Society of Cardiology – American College of Cardiology) guidelines for imprecision.
I published observations that most clinical trials conducted to gain regulatory approval for an assay are biased.
I suggested that a recommendations section should be part of Clinical Chemistry articles. There is something about the action verbs in a recommendation that makes people think twice.
It would have been interesting if the authors determined how many of the studies were funded by industry, but on the other hand, you don’t have to be part of industry to state conclusions that are not supported by the results.
February 13, 2017
Over 10 years ago I submitted a paper critiquing Bland-Altman plots. Since the original Bland-Altman publication is the most cited paper ever in The Lancet, I submitted my paper with some temerity.
Briefly, the issue is this. When one is comparing two methods, Bland and Altman suggest plotting the difference (Y−X) vs. the average of the two methods, (Y+X)/2. They also stated in a later paper (1) that even if the X method is a reference method (they use the term gold standard), one should still plot the difference against the average; not doing so, they argued, is misguided and will lead to spurious correlation. They attempted to prove this with formulas.
Not being so great at math, but doubting their premise, I ran some simulations. The results are shown in the table below. Basically, they say that when you have two field methods you should plot the difference vs. (Y+X)/2, as Bland and Altman suggest, but when you have a field method and a reference method, you should plot the difference vs. X. The values in the table are the correlation coefficients for Y−X vs. (Y+X)/2 and for Y−X vs. X (after repeated simulations in which Y is always a field method and X is either a field method or a reference method).
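The simulation is easy to reproduce. A minimal sketch, where the measuring range and error standard deviations are illustrative choices, not the values from my original program:

```python
import numpy as np

rng = np.random.default_rng(0)

def sim_corr(n=1000, reps=200, ref_x=False, sd=10.0):
    """Mean correlation of the difference (Y-X) with the average (Y+X)/2
    and with X, over repeated simulated method comparisons."""
    c_avg, c_x = [], []
    for _ in range(reps):
        true = rng.uniform(50, 150, n)       # true analyte concentrations
        y = true + rng.normal(0, sd, n)      # Y is always a field method
        # X is either a reference method (no error) or another field method
        x = true if ref_x else true + rng.normal(0, sd, n)
        d = y - x
        c_avg.append(np.corrcoef(d, (y + x) / 2)[0, 1])
        c_x.append(np.corrcoef(d, x)[0, 1])
    return float(np.mean(c_avg)), float(np.mean(c_x))

# Two field methods: plotting d vs. X induces a spurious negative
# correlation, while d vs. the average shows essentially none.
# Field vs. reference: the situation reverses -- d vs. the average is
# spuriously correlated (the average contains Y's error), d vs. X is not.
```

The intuition: when X is error-free, the difference contains only Y’s error, which is independent of X but is shared with the average (Y+X)/2, so plotting against the average manufactures a correlation.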
I submitted my paper as a technical brief to Clin Chem and included my simulation program as an appendix. After being told to recast the paper as a Letter, it was rejected. I submitted it to another journal (I think it was Clin Chem Lab Med) and it was also rejected. I then submitted my letter to Statistics in Medicine (2) where it was accepted.
Now in the lab medicine field, I am known by the other statisticians, and sometimes have published papers not to their liking. Regarding Statistics in Medicine, I am an unknown and lab medicine is a small part of Statistics in Medicine. So maybe, my paper was judged solely on merit or maybe I’m just paranoid.
1. Bland JM, Altman DG. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet 1995;346:1085-1087.
2. Krouwer JS. Why Bland-Altman plots should use X, not (Y+X)/2 when X is a reference method. Statistics in Medicine 2008;27:778-780.
August 10, 2016
Theranos has been criticized for its board, which includes two former secretaries of state (Henry Kissinger and George Shultz), two former senators, and several former high-ranking military officers, but not much in the way of scientific expertise. Now its scientific and medical advisory board includes four former AACC presidents: Susan Evans, Ann Gronowski, Larry Kricka, and Jack Ladenson. Note that although clinical chemists have been added, the choice of past presidents conforms to Theranos’s strategy of favoring “official” types.
So here’s a question – if you were a well-known clinical chemist, would you accept a position to serve on Theranos’s board?
August 3, 2016
I was among the multitudes who attended Elizabeth Holmes’s presentation about Theranos at AACC in Philadelphia. Overall, I was impressed; here are some details. First, she said she wasn’t going to address past malfeasances (not the way she put it) but would focus on Theranos’s new instrument.
As an aside, she had an accent identical to Mira Sorvino’s in “Romy and Michele’s High School Reunion.” For those who haven’t seen the movie, I would call it “adult valley girl.”
Her presentation included a lot of data analysis. Terms like ANOVA, Passing-Bablok regression, weighted Deming regression, CLSI guidelines EP05-A3 and EP09-A3, and ATE (allowable total error) were pronounced and used correctly. (The ATE corresponded to CLIA limits.) Having worked most of my career for manufacturers, I know a simple rule: manufacturers never show bad data. Hence, until these data are reproduced by others….
The instrumentation was impressive in that so many different assay types could fit in one relatively small box, but the technologies with which I am familiar were standard – nothing new. I don’t recall her mentioning any specific reagents. When you think about assays, reagents are the ballgame – the instrument is not that special. Something that did seem new was that the software for the instrument (the minilab) resides on a central server. The advantages of this remain to be demonstrated.
July 30, 2016
Having looked at a blog entry by the Westgards, which is always interesting, here are my thoughts.
Regarding IQCP, they say it’s mostly been a “waste of time”, an exercise of paperwork to justify current practices, with very little change occurring in QC practices.
This is no surprise to me – here’s why.
There are two ways to reduce errors.
FMEA (or similar programs) reduces the likelihood of rare but severe errors.
FRACAS (or similar programs) reduces the error rate of actual errors, some of which may be severe.
Here are the challenges with FMEA:
- It takes time and personnel. There’s no way around this. If sufficient time is not provided with all of the relevant personnel present, the results will suffer. When the Joint Commission required every hospital to perform at least one FMEA per year, people complained that performing a FMEA took too much time.
- Management must be committed. (I was asked to facilitate a FMEA for a company – the meetings were scheduled during lunch. I asked why and was told they had more important things to do). Management wasn’t committed. The only reason this group was doing the FMEA was to satisfy a requirement.
- FMEA requires a facilitator. The purpose of FMEA is to challenge the ways things are done. Often, this means challenging people in the room (e.g., who have put systems in place or manage the ways things are done). This can create an adversarial situation where subordinates will not speak up. Without a good facilitator, results will suffer.
- The guidance to perform a FMEA (such as EP23) is not very good. Example: Failure mode is a short sample. The mitigation is to have someone examine each tube to ensure the sample volume is adequate. The group moves on to the next failure mode. The problem is that the mitigation is not new – it’s existing laboratory practice. Thus, as the Westgards say – all that has happened is the existing process has been documented. That is not FMEA. (A FMEA would enumerate the many ways that someone examining each sample could fail to detect the short sample).
- Pareto charts are absent in the guidance. But real FMEAs require Pareto charts.
- I have seen reports where people say their error rate has been reduced after they conducted a FMEA. But there are no error rates in a FMEA (errors rates are in a FRACAS). So this means no FMEA was carried out.
- And I don’t see how anyone could conduct a FMEA and conclude that it is OK to run QC monthly.
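A real FMEA enumerates potential failure modes, scores each for severity, occurrence, and detectability, and ranks them – the Pareto step the guidance omits. A minimal sketch, where the failure modes and scores are hypothetical, extending the short-sample example above:

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (negligible) .. 10 (severe patient harm)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (always caught) .. 10 (never caught)

    @property
    def rpn(self) -> int:
        # Risk priority number, the usual FMEA ranking metric
        return self.severity * self.occurrence * self.detection

# Hypothetical entries enumerating ways the "examine each tube"
# mitigation for a short sample could itself fail:
modes = [
    FailureMode("Inspector distracted, tube not examined", 8, 4, 7),
    FailureMode("Short sample not visible through the label", 8, 3, 8),
    FailureMode("Tube bypasses the inspection station entirely", 8, 2, 9),
]

# Pareto ordering: address the highest-RPN failure modes first
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
```

The point is that the mitigation itself becomes the object of analysis; documenting existing practice and moving on is not a FMEA.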
Here are the challenges with FRACAS:
- FRACAS requires a process where errors are counted in a structured way (severity and frequency) and reports issued on a periodic basis. This requires knowledge and commitment.
- FRACAS also requires periodic meetings to review errors whereby problems are assigned to corrective action teams. Again, this requires knowledge and commitment.
- Absence of a Pareto chart is a flag that something is missing (no severity classification, for example).
- People don’t like to see their error rates.
- FRACAS requires a realistic (error rate) goal.
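The counting-and-ranking machinery a FRACAS needs can be sketched simply. The error categories and severity weights below are hypothetical, chosen only to illustrate the structure:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class ErrorEvent:
    category: str  # hypothetical error categories
    severity: int  # 1 (minor) .. 4 (severe patient harm)

def pareto_table(events):
    """Severity-weighted error counts per category, worst first --
    the periodic report a FRACAS review meeting works from."""
    weighted = Counter()
    for e in events:
        weighted[e.category] += e.severity
    return weighted.most_common()

# A toy error log of actual observed errors (unlike FMEA's
# potential failure modes, these are counted occurrences):
log = [
    ErrorEvent("mislabeled tube", 4),
    ErrorEvent("short sample", 2),
    ErrorEvent("short sample", 2),
    ErrorEvent("short sample", 2),
    ErrorEvent("QC rule ignored", 3),
]

# pareto_table(log) ranks categories so corrective-action teams
# can be assigned to the worst contributors first.
```

Note the contrast with FMEA: here the input is a log of errors that actually occurred, which is why error rates belong to FRACAS and not to FMEA.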
There are FRACAS success stories:
Dr. Peter Pronovost applied a FRACAS-type approach to placing central lines and, through the use of checklists, dropped the infection rate from 10% to zero.
In the 70s, a FRACAS-type approach reduced the error rate of anesthesiology instruments.
And an FMEA failure:
A Mexican teenager came to the US for a heart-lung transplant. The donated organs were not checked for blood-type compatibility. The patient died.