December 7, 2013
I recently took a survey and, as with many surveys, there were a few questions for which none of the answer choices seemed to fit. In this case, I knew the author of the survey and emailed him my concern. His answer was illuminating.
The survey was about prostate cancer treatment by proton beam therapy (a form of radiation), and the question was: “How would you describe the quality of your life TODAY: better than, same as, or worse than before proton treatment?” The author’s thinking was that proton beam therapy side effects are minimal compared to the – for some – life-altering side effects of surgery. Moreover, the author had received non-prostate-related health counseling that improved his life, so for him the choice was clear – his quality of life was better.
For most of us, prostate cancer has no symptoms – the only way we know we have it is an elevated PSA followed by a biopsy. Also for most of us, proton beam therapy side effects are minimal, but they are still side effects; hence the only logical way to answer the question is that quality of life is worse than before proton treatment. Of course, the quality of life for some might be better – say, if you hit the lottery – but this is unrelated to treatment.
One way of preventing such problems is to pilot the survey with a subset of the intended recipients. This should help, but another safeguard is to add a response choice to every question along the lines of: “this question cannot be answered with the above choices.”
December 7, 2013
An article has been accepted in the Journal of Diabetes Science and Technology for March that once again critiques the Westgard model of total error. In this case my critique focuses on glucose meters, where Boyd and Bruns (1) model glucose meter total error and claim that if one sets goals for (average) bias and imprecision, one knows the total error of glucose meters.
My critique will show (by simulation) that if one has two glucose meters, where one is subject to hematocrit interference and the other is not (yes, this happens), the Boyd and Bruns model fails to distinguish any performance difference between the two meters, whereas a correct way of measuring total error shows that the two meters perform quite differently.
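The flavor of this simulation can be sketched as follows. All numbers here are assumptions for illustration (a single glucose level, 3% CV, zero average bias, and a hypothetical hematocrit sensitivity of 1% bias per hematocrit unit); the point is only that a model built from specified bias and imprecision gives the same answer for both meters, while the measured error distribution does not.

```python
import random

random.seed(1)
N = 10_000
TRUE_GLUCOSE = 100.0  # mg/dL, one glucose level for simplicity

def empirical_total_error(hct_sensitivity):
    """95th percentile of |% error| from simulated measurements."""
    errors = []
    for _ in range(N):
        hct = random.gauss(43, 5)                    # patient hematocrit, %
        interference = hct_sensitivity * (hct - 43)  # % bias from hematocrit
        noise = random.gauss(0, 3)                   # 3% CV random error
        measured = TRUE_GLUCOSE * (1 + (interference + noise) / 100)
        errors.append(abs(measured - TRUE_GLUCOSE) / TRUE_GLUCOSE * 100)
    errors.sort()
    return errors[int(0.95 * N)]

# A model driven only by the specified average bias (0%) and CV (3%)
# returns the same total error for both meters:
model_total_error = abs(0) + 1.96 * 3  # 5.88%

te_meter_a = empirical_total_error(0.0)  # no hematocrit interference
te_meter_b = empirical_total_error(1.0)  # interference present (assumed size)
```

The interference-free meter's empirical total error lands near the modeled 5.88%, while the meter with hematocrit interference is markedly worse – a difference the bias-plus-imprecision model cannot see, because the interference averages to zero across patients.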
One could ask why write this, since I have already critiqued the Boyd and Bruns model and they responded that my critique was correct. The reason is that Boyd and Bruns have written subsequent papers using their model as if my critique never happened. And there is currently much emphasis on understanding how glucose meters perform.
- Boyd JC, Bruns DE. Quality specifications for glucose meters: assessment by simulation modeling of errors in insulin dose. Clinical Chemistry 2001;47:209–214.
November 30, 2013
Glucose meters are an example of unit-use devices, meaning that each time a sample is assayed, a new reagent strip (the unit) must be used. Some years ago, unit-use device manufacturers argued that QC is less important for their products because, among other reasons, a more rigorously controlled manufacturing process is used.
I have been doing some work with glucose meters and note that at least twice this year there have been recalls of reagent strips from two different manufacturers. Here are my reasons why these recalls continue to happen.
- Vendors that supply raw materials have provided different lots from those used to design and evaluate the original reagent strip.
- Vendor processes have changed.
- The glucose meter manufacturer’s processes have changed.
- The process used to release reagent strip lots is imperfect. It is not as rigorous as a full-blown method comparison, and the parameters measured may not reflect all aspects of performance.
- The process parameter limits may not be correct.
- Some key variables may not be measured.
- The sample size may not be adequate.
- And last but not least people make mistakes!!!
As someone who worked for manufacturers, I can say the recall sequence was usually: our service department received complaints from customers, these complaints were verified in-house, and a recall was initiated.
November 28, 2013
The mountain plot, created by my colleague Mike Lynch while we were at Ciba Corning, is part of the CLSI standard EP21-A. We used it extensively at Ciba Corning, but it has not been very popular since, as it is not often cited. The Bland-Altman plot, on the other hand, is frequently cited.
But this is not a competition. Sometimes one plot is better, sometimes the other, and often both should be shown. At a glucose meter conference this September in Washington DC, someone presented data for two glucose meters vs. reference using a Bland-Altman plot. He should have been using a mountain plot. I don’t have his data, but this is an example of when the mountain plot is better than the Bland-Altman plot.
With the Bland-Altman plot, the pattern of the “bad” vs. “good” assay is harder to see than with the mountain plot. Moreover, as more data gets added, the Bland-Altman plot becomes a mess of dots, whereas the mountain plot remains sharp. If there were 3 or 4 glucose meters, the mountain plot would be even better.
To construct a mountain plot in a spreadsheet:
- Calculate the differences between the candidate and reference assay
- Sort the differences from low to high
- Rank the sorted differences
- Calculate the cumulative probability as rank / (number of observations + 1)
- Calculate the adjusted cumulative probability: if the cumulative probability is greater than 0.5, use 1 − the cumulative probability; otherwise use the cumulative probability itself. Plotting the differences (x) against the adjusted cumulative probability (y) produces the mountain shape.
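The spreadsheet steps above can be sketched in a few lines; the candidate/reference values here are hypothetical, made up purely to show the mechanics.

```python
# Hypothetical candidate/reference glucose pairs (mg/dL)
candidate = [102, 98, 110, 95, 101, 99, 104, 97, 100, 103]
reference = [100, 100, 105, 100, 100, 100, 100, 100, 100, 100]

diffs = sorted(c - r for c, r in zip(candidate, reference))  # difference, sorted
n = len(diffs)
points = []
for rank, d in enumerate(diffs, start=1):                    # rank the differences
    p = rank / (n + 1)                                       # cumulative probability
    p_folded = 1 - p if p > 0.5 else p                       # fold above the median
    points.append((d, p_folded))
# Plot d (x) against p_folded (y): the curve rises to a peak near the
# median difference and falls again -- hence "mountain" plot.
```

Because the upper half of the distribution is folded down, every plotted probability is at most 0.5, and multiple assays overlaid on the same axes remain easy to tell apart.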
November 9, 2013
P4P (pay for performance) has been around for a number of years as a way to reward and punish physicians financially based on selected measures. P4P has been widely criticized, here for example and also in the NEJM.
I disagree with one P4P criticism – that the work of a physician is so complex that judging performance is impossible (it would be easier to split the atom). The problem is rather that a subset of performance measures has been chosen, and without the entire set of performance measures the result is similar to my last post, although in this case it is like having a dictionary with only the letter “C.”
An alternative would be to use a total error concept. That is, for any patient care episode, what errors have been made? Errors would include not only harm but financial waste as well. Thus, if a physician ordered unnecessary blood tests, financial waste has occurred. The value of total error is that there is no modeling – all errors would be captured by examining whether there is harm (or waste) in the care of a patient. Hence, there is no list of performance measures. And yes, there can be a physician error even when the patient presents with a complex set of symptoms. But the concept is impractical because one would need a panel of experts – see for example a NEJM case study – for each patient encounter.
Another approach would be to evaluate known cases of patient harm in an NTSB-type approach. Here the goal would be to understand causes in order to reduce error rates. In certain cases, such as incompetence, physicians would be punished, but in other cases process improvements, including better training, might be used.
November 3, 2013
A recent review article on glucose meters deserves comment. Its title is: Assessing the Analytical Performance of Systems for Self-Monitoring of Blood Glucose: Concepts of Performance Evaluation and Definition of Metrological Key Terms.
In this article, the Westgard model for total error is used. There are the usual four pictures of data superimposed on bull’s-eyes showing all combinations of high and low precision and trueness. Below is a picture of assay drift, which never makes it into these discussions. The problem is that drift does not fit into any of the four combinations of high and low precision and trueness. The authors reference an article I wrote in which I critiqued the Westgard model and suggested a different and more complete total error model. The authors of the current review say that factors such as drift and interferences “may go beyond the scope of this review.” But it makes no sense to leave out error sources and at the same time claim you have total error. It’s like publishing a dictionary of the English language but leaving out words that begin with the letter W.
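A toy calculation shows why drift escapes the four bias/imprecision pictures. The numbers are assumed (a steady 0.5% upward drift over 40 runs, random error ignored for clarity): the study-average bias looks modest, yet results late in the study carry twice that error, and no single (bias, imprecision) pair describes the assay.

```python
# Assumed numbers: calibration drifts upward 0.5% per run over a 40-run
# study; random error is omitted so only the drift is visible.
runs = 40
drift_per_run = 0.5                                  # % bias per run (assumption)
run_bias = [drift_per_run * i for i in range(runs)]  # 0.0% ... 19.5%

average_bias = sum(run_bias) / runs  # what a bias goal would report: 9.75%
worst_error = max(run_bias)          # what late-study patients see: 19.5%
```

A total error estimate built from the average bias alone would understate the worst-case error by nearly half, which is the sense in which drift is a distinct error source rather than a mix of the four pictures.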