Looking at glucose meter adverse events in the FDA database, I ran an SQL query and found that 73% of the time, manufacturers used a form of the verb "allege" to describe the event, as in "the user alleged that …" This is, I guess, one way of acknowledging that these events are unverified.
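The 73% figure is a simple proportion: reports whose narrative contains a form of "allege", divided by all reports. Here is a minimal Python sketch of the same count. It assumes the event narratives have already been pulled out of the database into a list of strings (the original used SQL). The regular expression, the function name, and the sample narratives are all hypothetical.

```python
import re

# Hypothetical pattern for the verb forms "allege", "alleges", "alleged", "alleging".
# Word boundaries keep it from matching "allegedly" or "allegation".
ALLEGE_VERB = re.compile(r"\balleg(?:e|es|ed|ing)\b", re.IGNORECASE)


def allege_fraction(narratives: list[str]) -> float:
    """Fraction of event narratives that contain a form of the verb 'allege'."""
    if not narratives:
        return 0.0
    hits = sum(1 for text in narratives if ALLEGE_VERB.search(text))
    return hits / len(narratives)


# Hypothetical narratives, for illustration only.
reports = [
    "The user alleged that the meter read high.",
    "Customer reported an error code.",
    "It is alleged the strip was damaged.",
    "Patient alleges a low reading.",
]
print(f"{allege_fraction(reports):.0%}")  # 3 of 4 narratives -> 75%
```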
Having had occasion to read the ISO 15197 standard (for glucose meters), I noticed the following statements:
One of the allowed reasons for discarding data is: "the blood-glucose monitoring system user recognizes that an error was made and documents the details".
This makes ISO 15197 a biased standard, because in the real world there will be user errors that generate outlier data.
Compounding the problem is this statement:
“Outlier data may not be eliminated from the data used in determining acceptable system accuracy, but may be excluded from the calculation of parametric statistics to avoid distorting estimates of central tendency and dispersion.”
The problem is that outliers representative of what happens in the real world should not be thrown out just to keep statistics such as regression and precision from being distorted. Rather, these statistics should not be used. An error grid is a perfectly adequate statistic that handles 100% of the data.
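A toy data set makes the contrast concrete. This minimal Python sketch uses hypothetical paired readings. Because error-grid zone boundaries are more involved and are not given here, it substitutes the ISO 15197:2013 limits of ±15 mg/dL (below 100 mg/dL) and ±15% (at or above 100 mg/dL) as a simple stand-in for a zone. With a single outlier, the standard deviation of the differences increases roughly tenfold. The percentage-within-limits summary still counts every result and drops by only one result's worth.

```python
import statistics


def within_limit(meter: float, reference: float) -> bool:
    """Stand-in for an error-grid zone: +/-15 mg/dL below 100 mg/dL, +/-15% at or above."""
    limit = 15 if reference < 100 else 0.15 * reference
    return abs(meter - reference) <= limit


def summarize(pairs):
    """Mean and SD of meter-minus-reference differences, plus % of results within limits."""
    diffs = [m - r for m, r in pairs]
    pct_within = 100 * sum(within_limit(m, r) for m, r in pairs) / len(pairs)
    return statistics.mean(diffs), statistics.stdev(diffs), pct_within


# Hypothetical data: 20 small errors at a reference of 150 mg/dL, then one outlier.
small_errors = [-6, -4, -2, 0, 2, 4, 6, -5, 5, 3, -3, 1, -1, 2, -2, 4, -4, 0, 1, -1]
clean = [(150 + d, 150) for d in small_errors]
with_outlier = clean + [(300, 150)]

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    mean, sd, pct = summarize(data)
    print(f"{label}: mean={mean:.1f} SD={sd:.1f} within limits={pct:.1f}%")
# clean: mean=0.0 SD=3.4 within limits=100.0%
# with outlier: mean=7.1 SD=32.9 within limits=95.2%
```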
The New England Journal of Medicine has finally changed its mind and now recommends confidence intervals instead of p values.
I just read a Clin Lab News article on QC that, IMHO, is misleading. Here are my thoughts.
The purpose of QC is to determine whether a process is in control or not. In clinical chemistry, the process is running an assay. An out-of-control process is undesirable because it can yield unpredictable results.
QC by itself cannot guarantee the quality of patient results, even when the process is in control. This is because QC does not detect all errors (for example, an interference).
The quality of the results from an in-control process is called its process capability (e.g., its inherent accuracy). QC cannot change this, regardless of which QC rules are used.
QC is like insurance; hence, cost should not be considered in designing a QC program. That is, no matter how low the risk of a failure mode, one should never abandon QC.
Although running more QC can detect an out-of-control process sooner, any QC program should always prevent patient results from being reported when an out-of-control condition is detected. Risk is not involved.
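Whatever QC rules are chosen, a rule violation means patient results are held, and no risk weighting is involved. The minimal Python sketch below illustrates this binary decision. It assumes two common Westgard-style rules (1-3s and 2-2s) and a hypothetical control mean and SD. The choice of rules and all names are illustrative only. They are not taken from the Clin Lab News article.

```python
from dataclasses import dataclass


@dataclass
class ControlLimits:
    """Hypothetical established mean and SD for one control material."""
    mean: float
    sd: float


def violates_1_3s(values, limits):
    """1-3s rule: any single control result more than 3 SD from the mean."""
    return any(abs(v - limits.mean) > 3 * limits.sd for v in values)


def violates_2_2s(values, limits):
    """2-2s rule: two consecutive control results beyond 2 SD on the same side."""
    z = [(v - limits.mean) / limits.sd for v in values]
    return any((a > 2 and b > 2) or (a < -2 and b < -2) for a, b in zip(z, z[1:]))


def may_report_patient_results(control_values, limits):
    """Binary decision: any rule violation holds patient results; no risk weighting."""
    out_of_control = violates_1_3s(control_values, limits) or violates_2_2s(control_values, limits)
    return not out_of_control


limits = ControlLimits(mean=100.0, sd=2.0)
print(may_report_patient_results([101.0, 99.5, 102.0], limits))   # True: in control
print(may_report_patient_results([101.0, 104.5, 104.2], limits))  # False: 2-2s violation
```

Adding more rules or running controls more often changes how quickly a problem is caught. It does not change the decision once a problem is caught.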
Performance standards are used in several ways: to gain FDA approval, to make marketing claims, and, after an assay has been released for sale, to test it in routine use.
Using glucose meters as an example…
Endocrinologists, who care for people with diabetes, would be highly suited to writing standards. They are in a position to know the magnitude of error that will cause an incorrect treatment decision.
FDA, with its statisticians, biochemists, and physicians, would also be well suited.
Companies, through their regulatory affairs people, know their systems better than anyone, although one can argue that their main goal is to create the least burdensome standard possible.
So in the case of glucose meters, at least for the 2003 ISO 15197 standard, regulatory affairs people ran the show.
The article "Getting More Information From Glucose Meter Evaluations" has just been published in the Journal of Diabetes Science and Technology.
Our article makes several points. The ISO 15197 glucose meter standard (2013 edition) calls for a table showing the percentage of system accuracy results within ±5, ±10, and ±15 mg/dL of the reference. Our recommendation is to graph these results as a mountain plot; this is a perfect example of when a mountain plot should be used.
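The ISO-style table and the mountain plot are both built from the same list of meter-minus-reference differences. Here is a minimal Python sketch, assuming differences in mg/dL. The percentiles use the (rank - 0.5)/n convention. Several conventions are in use, and this one is chosen here as an assumption. The function names and the sample differences are hypothetical. The returned points can be passed to any plotting library.

```python
def percent_within(diffs, limits=(5, 10, 15)):
    """Percentage of meter-minus-reference differences with |difference| <= each limit (mg/dL)."""
    n = len(diffs)
    return {lim: 100 * sum(abs(d) <= lim for d in diffs) / n for lim in limits}


def mountain_points(diffs):
    """Folded empirical CDF (mountain plot) as (difference, folded percentile) pairs.

    Percentiles use the (rank - 0.5) / n convention, which is an assumption; values
    above 50 are folded to 100 - percentile, so the curve peaks near the median."""
    n = len(diffs)
    points = []
    for rank, d in enumerate(sorted(diffs), start=1):
        pct = 100 * (rank - 0.5) / n
        points.append((d, pct if pct <= 50 else 100 - pct))
    return points


# Hypothetical differences (mg/dL), for illustration only.
diffs = [-12, -7, -3, -1, 0, 2, 4, 6, 9, 18]
print(percent_within(diffs))  # {5: 50.0, 10: 80.0, 15: 90.0}
for d, folded in mountain_points(diffs):
    print(d, folded)
```

The table reports only three points on the distribution. The mountain plot shows all of it, including how far the tails extend.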
Now I must confess that until we prepared this paper, I had not read ISO 15197 (2013). But based on some reviewer comments, it was clear that I had to bite the bullet, send money to ISO, and get the standard. Reading it was an eye-opener. The accuracy requirement is:
95% within ±15 mg/dL (< 100 mg/dL) and within ±15% (≥ 100 mg/dL) and
99% within the A and B zones of an error grid
I knew this. But what I didn't know until I read the standard is that user error from the intended population is excluded from this accuracy protocol. Moreover, even the healthcare professionals performing the study could exclude any result if they thought they had made an error. I can imagine how this might work: "That result can't be right…"
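Excluding results is not a neutral step. The minimal Python sketch below checks the 95% accuracy criterion listed above. It uses hypothetical paired (meter, reference) readings in mg/dL and shows that dropping a single result that "can't be right" can turn a failing data set into a passing one. The error-grid criterion is left out, because its zone boundaries are not given here. The function name and the data are illustrative assumptions.

```python
def meets_95_percent_criterion(pairs):
    """ISO 15197:2013-style criterion: at least 95% of (meter, reference) pairs within
    +/-15 mg/dL when reference < 100 mg/dL, or within +/-15% when reference >= 100 mg/dL.
    The 99% error-grid criterion is not checked here."""
    def within(meter, ref):
        limit = 15 if ref < 100 else 0.15 * ref
        return abs(meter - ref) <= limit

    passed = sum(within(m, r) for m, r in pairs)
    # Integer comparison avoids floating-point trouble exactly at 95%.
    return passed * 100 >= 95 * len(pairs)


# Hypothetical data: 19 results within limits and 2 outside (mg/dL).
good = [(80 + d, 80) for d in (-10, -5, 0, 5, 10)] + [(150 + d, 150) for d in range(-14, 14, 2)]
bad = [(110, 80), (200, 150)]

print(meets_95_percent_criterion(good + bad))      # False: 19/21 is about 90.5%
print(meets_95_percent_criterion(good + bad[:1]))  # True: one result dropped, 19/20 = 95%
```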
In any case, as previously mentioned in this blog, the requirement for 99% of the results to be within the A and B zones of an error grid was dropped from the section where users are tested.
In the section where results may be excluded, failure to obtain a result is listed, since if there's no result, you can't calculate a difference from the reference. But there's no requirement for the percentage of times a result must be obtained. This is ironic, since section 5 is devoted to reliability. How can you have a section on reliability without a failure rate metric?
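A failure rate metric need not be complicated. The minimal Python sketch below assumes a study records the number of attempts and the number of attempts that produced no result. It reports the observed failure rate with a 95% confidence interval. The Wilson score interval is used here as one reasonable choice; it is a swapped-in technique, not something ISO 15197 specifies. The counts in the example are hypothetical.

```python
import math


def failure_rate_with_ci(failures, attempts, z=1.96):
    """Observed failure rate with an approximate 95% Wilson score interval."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = failures / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half = z * math.sqrt(p * (1 - p) / attempts + z**2 / (4 * attempts**2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical study: 300 attempts, none of which failed to give a result.
rate, lower, upper = failure_rate_with_ci(0, 300)
print(f"observed {rate:.1%}, 95% CI {lower:.1%} to {upper:.1%}")  # upper bound is about 1.3%
```

Even with zero observed failures, the upper bound shows how large the true failure rate could still plausibly be. This is the kind of number a reliability requirement could be written against.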
- Always tell the truth.
- Don’t offer information that wasn’t asked for. As an example,
FDA: Your study is acceptable.
You: We have another study that also confirms that.
FDA: Oh, tell me about it… Result: a 6-week delay.
- Don’t speculate. As an example,
FDA: What caused that outlier?
You: We think it might be an interfering substance.
FDA: Oh, let's review your interference studies…
- Know when to say yes and when to say no.
Agree to change wording, graphs, and so on. Also agree to change calculation methods even when you think your original methods are correct. Challenge a finding that requires you to repeat or provide new studies, unless you agree.
- Don't submit data that doesn't meet specifications. Submitting such data doesn't sound smart, but I've seen it happen.