October 25, 2009

Virtually any evaluation protocol has a recommended sample size, or at least a procedure for calculating a sample size. This entry explores some problems with sample sizes.
Assume that one wants to calculate a sample size for an error grid evaluation (1). See reference 1 for details, but in an error grid evaluation, one performs a method comparison, plots the results in the grid, and calculates the percentage of results in each error grid zone. In an error grid, there are at least two areas of interest: the innermost zone (called “A” here), which contains most of the differences, and an outer zone (called “C” here), which should contain no results because these differences have a high potential for serious patient harm.
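As a rough illustration of the tallying step, the sketch below assigns each method-comparison pair to a zone and reports the percentage in each zone. The zone limits (15% and 50% relative difference), the intermediate zone "B", and the sample pairs are all made-up assumptions for illustration; real error grids (reference 1) are drawn from clinical judgment and are usually not simple percentage bands.

```python
from collections import Counter

# Hypothetical zone limits, expressed as relative differences from the
# comparison method. Illustrative assumptions only, not taken from EP27.
ZONE_A_LIMIT = 0.15   # |difference| up to 15%: zone A
ZONE_C_LIMIT = 0.50   # |difference| above 50%: zone C (in between: zone B)


def zone(reference, candidate):
    """Return 'A', 'B', or 'C' for one (reference, candidate) pair."""
    rel_diff = abs(candidate - reference) / reference
    if rel_diff <= ZONE_A_LIMIT:
        return "A"
    if rel_diff <= ZONE_C_LIMIT:
        return "B"
    return "C"


def zone_percentages(pairs):
    """Percentage of pairs falling in each zone."""
    counts = Counter(zone(ref, cand) for ref, cand in pairs)
    n = len(pairs)
    return {z: 100.0 * counts.get(z, 0) / n for z in "ABC"}


# Made-up method-comparison results: (comparison method, candidate assay)
pairs = [(100, 104), (100, 97), (200, 190), (50, 60), (80, 130)]
print(zone_percentages(pairs))
```

With so few pairs the zone C percentage says very little, which is the sample size problem discussed next.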
I will skip the discussion of calculating the sample size for zone A. To calculate the sample size for zone C, one needs a goal – assume that one wishes less than one result per million in zone C. It can be shown (2) that the required sample size to prove with 95% confidence that less than one result per million is in zone C is to run 371,000,000 samples and observe no results in zone C. Additionally, the candidate assay has to be run in a representative way (with respect to routine use) in the method comparison. Since this number of samples is a bit much, how can one be confident that the goal for zone C will be achieved? The answer is using risk management techniques. The clinical laboratory has to perform FMEA/fault tree analysis (3) to ensure that user errors don’t cause zone C results and the manufacturer has to perform FMEA/fault tree analysis to ensure that the system itself doesn’t cause zone C results.
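For readers who want to see the arithmetic, here is a minimal sketch of the simplest version of this calculation: assuming independent samples and a plain binomial model, it finds the smallest n for which observing zero zone C results rules out a rate of one per million or more at 95% confidence. The function name is made up for illustration. Under these simple assumptions the answer comes out near 3 million, which is lower than the figure quoted above from reference 2 (that figure may rest on additional requirements not reproduced here), but the practical conclusion is the same: the number is far too large for a method comparison.

```python
import math


def zero_event_sample_size(max_rate, confidence=0.95):
    """Smallest n such that observing 0 events in n independent samples
    rules out a true event rate >= max_rate at the given confidence.

    Solves (1 - max_rate)**n <= 1 - confidence for n.
    """
    alpha = 1.0 - confidence
    # log1p keeps precision when max_rate is tiny
    return math.ceil(math.log(alpha) / math.log1p(-max_rate))


n = zero_event_sample_size(1e-6)
print(f"{n:,} samples with no zone C results")
```

For small rates this is close to the familiar "rule of three" (n is roughly 3 divided by the rate).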
References
1. CLSI/NCCLS. How to Construct and Interpret an Error Grid for Diagnostic Assays; Proposed Guideline. CLSI/NCCLS document EP27-P. Wayne, PA: NCCLS; 2009.
2. Hahn GJ and Meeker WQ. Statistical intervals. A guide for practitioners. Wiley: New York, 1991, pp 103-105.
3. CLSI/NCCLS. Risk Management Techniques to Identify and Control Laboratory Error Sources; Proposed Guideline, Third Edition. CLSI/NCCLS document EP18-P3. Wayne, PA: NCCLS; 2009.
Clinical laboratory statistics, Medical error, fmea, risk management
Posted by jkrouwer
October 10, 2009

I had occasion to read about the suggestion that some fraction (often 50%) of biological variation should play a role in setting assay performance standards. This makes no sense to me. Here’s why.
The most fundamental measure of assay performance is diagnostic accuracy. That is, sensitivity (the percentage of people with the disease whose assay value is above the cutoff) and specificity (the percentage of people without the disease whose assay value is below the cutoff).
Biological variation serves to decrease diagnostic accuracy. If a person who does not have the disease has a spike in the assay due to biological variation and this elevates the value beyond the cutoff, a false positive is the result. The more biological variation, the more the decrease in diagnostic accuracy. Analytical error does the same thing – the more error, the lower the observed diagnostic accuracy. From a diagnostic accuracy standpoint, there is no difference between biological variation and analytical error. Thus, it makes no sense that the performance of an assay should be allowed to reach 50% of the biological variation.
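The point can be sketched numerically. The example below assumes that between-person spread, biological variation, and analytical error are independent and normally distributed, so that their variances simply add; all of the numbers and the function name are made up for illustration. Swapping the biological and analytical standard deviations leaves sensitivity and specificity unchanged, and increasing either one lowers both.

```python
from math import sqrt
from statistics import NormalDist


def diagnostic_accuracy(mean_healthy, mean_diseased, sd_between,
                        sd_biological, sd_analytical, cutoff):
    """Sensitivity and specificity when every variance source is
    independent and normally distributed (a simplifying assumption)."""
    total_sd = sqrt(sd_between**2 + sd_biological**2 + sd_analytical**2)
    healthy = NormalDist(mean_healthy, total_sd)
    diseased = NormalDist(mean_diseased, total_sd)
    specificity = healthy.cdf(cutoff)          # no disease, below cutoff
    sensitivity = 1.0 - diseased.cdf(cutoff)   # disease, above cutoff
    return sensitivity, specificity


# Illustrative numbers only (arbitrary units)
base = dict(mean_healthy=100, mean_diseased=130, sd_between=8, cutoff=115)
a = diagnostic_accuracy(sd_biological=6, sd_analytical=3, **base)
b = diagnostic_accuracy(sd_biological=3, sd_analytical=6, **base)
c = diagnostic_accuracy(sd_biological=6, sd_analytical=6, **base)
print(a)  # biological 6, analytical 3
print(b)  # swapped: identical to the line above
print(c)  # more total variation: both values drop
```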
How should performance standards be set?
Use error grids to define limits where no results should occur (e.g., errors large enough to have high potential to cause patient harm). These limits are called limits of erroneous results (LER) by the FDA. These are the most important limits and are set using clinical judgment.
The area in an error grid that should contain most of the results (often 95%) is less important and can be set using the performance achieved by existing technology, with the caveat that consideration must be given to special circumstances such as cost, turnaround time when it’s important, and so on.
Clinical laboratory statistics, healthcare economics, laboratory medicine, quality
September 21, 2009

Here are some comments (some paraphrased) and suggestions made on a blog about adapting the Aviation Safety Reporting System to reduce the rate of medical errors.
Incident reporting provides “no useful information about the true frequency of errors in an institution.” It’s too expensive, it takes too much time. There’s too much data. We should only report errors that cause temporary or serious harm.
These comments were made by Robert M. Wachter, MD. From his blog’s bio …
“He has published 200 articles and 6 books in the fields of quality, safety, and health policy. He is also a national leader in the fields of patient safety and healthcare quality. He is editor of AHRQ WebM&M, a case-based patient safety journal on the Web, and AHRQ Patient Safety Network, the leading federal patient safety portal.”
There are two ways to reduce the rate of medical errors.
1. Lower the probability of errors that have not yet occurred.
2. Lower the rate of errors that have occurred.
The Aviation Safety Reporting System is an example of tackling #2.
Many (most) errors do not directly cause harm; they have the potential to cause harm. This can be understood by mapping out each medical procedure. Suggesting that not all errors be reported will shortchange the system.
The most important errors receive focus by using a Pareto chart or table. The fastest way to reduce error rates relies on a suitable ranking system.
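A Pareto table is simple to build once the reports exist. The sketch below ranks error categories by count and adds a cumulative percentage; the categories and counts are invented for illustration and are not real data. Ranking by raw count is the simplest choice; a ranking that also weights each category by severity is another option.

```python
def pareto_table(event_counts):
    """Rank categories by count and add cumulative percentages."""
    total = sum(event_counts.values())
    rows, running = [], 0
    ranked = sorted(event_counts.items(), key=lambda item: item[1], reverse=True)
    for category, count in ranked:
        running += count
        rows.append((category, count, round(100.0 * running / total, 1)))
    return rows


# Hypothetical reported events, near misses included (not real data)
reports = {
    "patient misidentification": 42,
    "specimen mislabeled": 120,
    "wrong tube type": 18,
    "delayed critical value call": 65,
    "transcription error": 5,
}

rows = pareto_table(reports)
for category, count, cumulative in rows:
    print(f"{category:30s} {count:4d} {cumulative:6.1f}%")
```

Note that the table is only as good as the reporting behind it: if events that caused no harm are never reported, they never make it into the ranking.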
All of this takes time, training, and commitment.
Some successful medical examples: the anesthesiology improvements of the 1970s and 1980s (1), and the recent reduction of infections in placing central lines (2).
Dr. Wachter is going in the wrong direction.
References
1. Cooper JB, Newbower RS, Long CD, McPeek B. Preventable anesthesia mishaps: A study of human factors. Anesthesiology 1978;49:399-406. An online version of Paper 5 can be found at http://qshc.bmj.com/content/vol11/issue3/#CLASSIC_PAPERS
2. Pronovost P, et al. An Intervention to Decrease Catheter-Related Bloodstream Infections in the ICU. N Engl J Med 2006;355:2725-32.
PS – A commentator who agrees with Dr. Wachter offers a standard bit of resistance to using the Aviation Safety Reporting System in medicine – We’re different – medicine is more complicated.
Medical error, fracas, healthcare economics, near miss, risk management
September 11, 2009

There is (or was?) a promising marker for prostate cancer called EPCA-2 (1). The person who discovered the test is now being sued by the company that licensed the marker (2). According to the lawsuit:
“Notwithstanding the spectacular (and false) results proclaimed by defendants, the Getzenberg assay was no more accurate in distinguishing cancerous tissue from normal tissue than flipping a coin,”
Among the coauthors in reference 1 are well known clinical chemists. They may have a lot of explaining to do.
Another well known clinical chemist questioned the validity of the results (3). He may be looking pretty good when all of this gets straightened out.
We shall see.
References
1. Leman ES, Cannon GW, Trock BJ, Sokoll LJ, Chan DW, Mangold L, Partin AW, Getzenberg RH. EPCA-2: a highly specific serum marker for prostate cancer. Urology 2007;69:714-20.
2. http://www.pittsburghlive.com/x/pittsburghtrib/news/pittsburgh/s_641304.html
3. Diamandis EP. Point: EPCA-2: A promising new serum biomarker for prostatic carcinoma? Clinical Biochemistry 2007;40:1437-1439.
clinical chemistry, prostate cancer
September 2, 2009

Glucose has been in the news lately both in the New York Times and the medical literature.
A standard favoring tight glycemic control was dropped, possibly because the glucose meters used were inaccurate (1-3).
Glucose meters that use glucose dehydrogenase can give very wrong answers in dialysis patients (4).
And finally, the FDA is considering revising glucose standards (5). This blog entry is about that revision. The article mentions that the FDA is considering revising the performance standards in ISO 15197, which are: 95% of values must be within 20% of reference at 75 mg/dL or above and within 15 mg/dL below 75 mg/dL. The Boyd and Bruns modeling paper is referenced and Bruns is quoted in the article. I have previously critiqued the Boyd and Bruns paper (6).
Here is the main point which is not covered in the article. The main problem with the ISO standard is that it specifies performance for only 95% of the data. This of course leaves up to 5% of the data as unspecified and means that if up to 5% of glucose results had large enough error so that hyperglycemic patients were classified as hypoglycemic and vice versa, that assay would be acceptable according to ISO. This is equivalent to saying that up to a 5% wrong site surgery rate is acceptable! 100% of the data must be specified as is the case with glucose error grids, which predated the ISO guideline.
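To make this concrete, here is a small sketch of the ISO criterion as stated above, applied to made-up data. The function names and the data are assumptions for illustration, not part of the standard. Ninety-five accurate results plus five hypoglycemic samples reported as grossly hyperglycemic still pass.

```python
def within_iso_limit(reference, meter):
    """ISO 15197 limit as described above: within 15 mg/dL when the
    reference is below 75 mg/dL, within 20% at 75 mg/dL or above."""
    if reference < 75:
        return abs(meter - reference) <= 15
    return abs(meter - reference) <= 0.20 * reference


def passes_iso(pairs, required_fraction=0.95):
    """True if at least the required fraction of results is within limits."""
    within = sum(within_iso_limit(ref, meter) for ref, meter in pairs)
    return within / len(pairs) >= required_fraction


# Made-up data, as (reference mg/dL, meter mg/dL):
# 95 accurate results plus 5 hypoglycemic samples (50 mg/dL)
# that the meter reads as hyperglycemic (300 mg/dL).
good = [(100, 105)] * 95
dangerous = [(50, 300)] * 5
print(passes_iso(good + dangerous))   # True
```

The criterion never looks at how far outside the limits the remaining 5% fall, whereas an error grid assigns every result to a zone.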
A second problem with the ISO guideline is that the performance limits ignore user error. But user error contributes to the final result and must be part of the performance specification.
The protocol must also be part of the guideline. In a short method comparison, it is possible to observe no large errors. To supplement this, specific analytical properties of the assay must be specified as well as risk management criteria. There are recent glucose recalls where software was faulty and allowed units to be changed from mg/dL to mmol/L or vice versa without customer knowledge.
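The size of a units mix-up is easy to work out. The sketch below uses the standard conversion for glucose (molar mass about 180.16 g/mol, so 1 mmol/L is about 18.016 mg/dL); the scenario values and function names are made up for illustration.

```python
# Glucose: ~180.16 g/mol, so 1 mmol/L corresponds to ~18.016 mg/dL
MG_DL_PER_MMOL_L = 18.016


def mg_dl_to_mmol_l(value):
    return value / MG_DL_PER_MMOL_L


def mmol_l_to_mg_dl(value):
    return value * MG_DL_PER_MMOL_L


# A normal glucose of 99 mg/dL on a meter silently switched to mmol/L
true_mg_dl = 99.0
displayed = mg_dl_to_mmol_l(true_mg_dl)
print(f"meter shows {displayed:.1f}")  # a user expecting mg/dL sees about 5.5

# Whichever direction the switch goes, the number is off by a factor of ~18
print(f"misreading factor: {true_mg_dl / displayed:.1f}")
```

An error of this size would never show up in a short method comparison run with correctly configured meters, which is why risk management criteria are needed alongside the analytical specification.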
I mention in passing that the Boyd and Bruns paper referred to in the article underestimates total error because of an inadequate model that fails to account for interferences. Reference 4 is an example of an interference, one responsible for at least 13 deaths.
References
1. See: http://www.nytimes.com/2009/08/18/health/policy/18diabetes.html?_r=1&scp=1&sq=glucose&st=cse
2. Intensive versus Conventional Glucose Control in Critically Ill Patients. NEJM 2009;360:1283-1297.
3. Scott MG, Bruns DE, Boyd JC, and Sacks DB. Tight glucose control in the intensive care unit: Are glucose meters up to the task? Clin Chem 2009;55:18-20.
4. See: http://www.fda.gov/MedicalDevices/Safety/AlertsandNotices/PatientAlerts/ucm177189.htm
5. See: http://www.aacc.org/publications/cln/2009/september/Pages/inside0909.aspx
6. Krouwer JS. How to Improve Total Error Modeling by Accounting for Error Sources Beyond Imprecision and Bias. Clin Chem 2001;47:1329-30.
Clinical laboratory statistics, ISO, risk management
August 29, 2009

DB responded to my entry about “Just because it’s not easy to measure …”, which gave me mixed feelings. I am thankful that he acknowledged that his statement was not what he meant but I must admit that I am in awe of DB – hence I was uncomfortable that he would apologize.
Since I follow his blog, my first instinct was to comment that I agreed with the comment made by “Curious” in the entry that had the quote. But DB didn’t respond to Curious’s comment so I left it alone.
In my field of laboratory medicine, I will continue – having started about 20 years ago – to advocate for measuring everything that’s important. This includes not just measuring the easy things like precision and bias but also the difficult things like rare interferences or user errors.
I added the quote from DB because it’s important for guidelines. With healthcare reform, we can expect more rather than fewer guidelines, fueled in part by the $1.1 billion for comparative effectiveness research. Obama said:
“The point is we want to use science, we want doctors and medical experts to be making decisions that all too often right now are driven by skewed policies, by outdated means of reimbursement, or by insurance companies.“
I became aware of this from a Dr Rich entry. Comparative effectiveness research will be based on data, analyzed to yield measurements, which will turn into conclusions and recommendations. There is a famous statistical example about how difficult it is to measure things. Youden (1) compiled 15 different estimates of the astronomical unit from scientists who estimated that quantity over the years 1895–1961. Each scientist’s confidence interval failed to overlap the confidence interval of his predecessor. The difficulties are only greater in medicine. Just getting agreement on definitions is hard. I previously cited an example for side effects of prostatectomy in which urinary incontinence was defined as using more than 3 pads per day, implying that fewer than 3 pads per day means continence. Maybe urologists could agree with that definition, but I don’t think patients would.
References
1. Youden WJ. Enduring values. Technometrics 1972;14:1–11.
Clinical laboratory statistics, healthcare economics, laboratory medicine, prostate cancer
August 25, 2009

The American Diabetes Association (ADA) has revised its recommendation for diagnosis of diabetes and now recommends using hemoglobin A1c to diagnose diabetes (1). They also say:
A1C tests to diagnose diabetes should be performed using clinical laboratory equipment. Point-of-care instruments have not yet been shown to be sufficiently accurate or precise for diagnosing diabetes.
Maybe this is a trend to slow down the adoption of point-of-care (POC) assays.
Scott et al. speculate that one of the reasons that tight glycemic control (TGC) in ICUs has been dropped as a guideline is that the use of POC glucose meters (meaning less accurate) as opposed to laboratory assays may have contributed to the adverse findings of TGC.
This will also mean that the CLSI standard POCT09-P, Selection Criteria for Point-of-Care Testing Devices, will need to be revised. Although it suggests conducting performance evaluations, its examples of benefits of POC assays now include two where the accuracy of the POC test has been either rejected or questioned (1-2). POCT09-P also cites the benefit of a POC troponin assay whose performance was tested by surveying clinicians (3), hardly a rigorous test and not in keeping with the document’s own recommendation for a real evaluation.
References
1. International Expert Committee Report on the Role of the A1C Assay in the Diagnosis of Diabetes. Diabetes Care 2009;32:1327-1334.
2. Scott MG, Bruns DE, Boyd JC, and Sacks DB. Tight glucose control in the intensive care unit: Are glucose meters up to the task? Clin Chem 2009;55:18-20.
3. Lee-Lewandrowski E, Corboy D, Lewandrowski K, Sinclair J, McDermot S, Benzer TI. Implementation of a Point-of-Care Satellite Laboratory in the Emergency Department of an Academic Medical Center. Archives of Pathology and Laboratory Medicine 2003;127:456–460.
CLSI, Medical error, Quality control, clinical chemistry
August 24, 2009

There have been several objections to measuring errors that are not as easy as calculating a standard deviation.
One comment was – pre-analytical error is important but can’t be measured in a method comparison protocol. It needs to be handled by risk management.
Similar arguments were made during a meeting for measurement uncertainty, where it was suggested that large but rare analytical errors be handled by risk management.
DB writes:
I favor limited guidelines, but not measurement. Measurement has too many unintended consequences.
ISO 15197, a standard for home glucose meters has a specification for total analytical error which does not include errors due to pre-analytical error. User errors are to be evaluated separately and without stating any analysis procedure:
Results shall be documented in a report
Unfortunately, risk management as used by these people means sweeping these problems under the rug and is the same as DB’s advice about not measuring things or the ISO guidance.
It’s time to start measuring errors from all sources – it’s possible and necessary.
Clinical laboratory statistics, ISO, Medical error, laboratory medicine, risk management
August 17, 2009

Nonspecificity in diagnostic assays (interferences) is a problem. For example, a search of Clinical Chemistry from 2006 to date yielded 45 references. Reference 1 is one of the 45.
Interferences can be thought of in two ways:
1. A particularly bad interference will cause a huge error in a result.
2. A result can also exhibit smaller errors, either from a single interference or from a combination of interferences whose net effect is a smaller error.
Manufacturers study extensive lists of candidate interfering substances. Unfortunately, the way that many manufacturers report the results of interfering studies in package inserts is misleading with respect to case 2 above.
Manufacturers often cite a CLSI document, EP7-A2, which states that a claim can be that the following compounds “were found not to interfere at the concentrations indicated.” But this only means that a bias of less than 10% was found. Later in the document, alternative bias claims are presented, which, added to the first claim, are:
1. The substance did not interfere (< 10%)
2. The observed amount of bias due to interference
3. The maximum amount of bias due to interference that could occur
It’s not surprising that manufacturers tend to choose claim 1. Besides being misleading, claim 1 is wrong statistically, since it amounts to stating that the null hypothesis has been proved, which is impossible.
There is an opportunity for CLSI to revisit its canceled standard, EP11, Uniformity of Claims.
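A small numerical sketch shows the difference between the three kinds of claim. The replicate values are made up, the function name is hypothetical, and the interval uses a large-sample normal approximation as a simplifying assumption (with only five replicates per group, a t-based interval would be wider). The observed bias is 6%, which would qualify for a "did not interfere" claim, yet the upper confidence limit is above 10%.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def bias_interval(control, spiked, confidence=0.95):
    """Percent bias due to an interferent, with an approximate
    two-sided confidence interval (normal approximation).

    The control mean is treated as fixed when converting to percent,
    which is a simplification.
    """
    diff = mean(spiked) - mean(control)
    se = sqrt(stdev(control) ** 2 / len(control) + stdev(spiked) ** 2 / len(spiked))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    scale = 100.0 / mean(control)
    return diff * scale, (diff - z * se) * scale, (diff + z * se) * scale


# Made-up replicates: sample without and with the candidate interferent
control = [100, 96, 104, 99, 101]
spiked = [106, 99, 113, 104, 108]

observed, lower, upper = bias_interval(control, spiked)
print(f"observed bias {observed:.1f}% (claim 2)")
print(f"upper limit   {upper:.1f}% (in the spirit of claim 3)")
print("below 10% observed, so claim 1 would be allowed:", observed < 10)
```

Failing to detect a bias of 10% in a small experiment is not the same as showing that the bias is below 10%; reporting the observed bias or its upper limit tells the user what the data actually support.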
References
1. Dimeski G, Jones B, and Ungerer JPJ. Interference from Rose Bengal with Total Bilirubin Measurement. Clin Chem 2009;55:1040-1041.
CLSI, Clinical laboratory statistics, Medical error, clinical chemistry
July 29, 2009

I have been going to the AACC (American Association for Clinical Chemistry) meetings for many years. Each year, about a month before the meeting, I start getting mail from companies describing the products they will feature at the meeting, inviting me to workshops, and giving the location of their booth.
During the meeting (when I have already left home) and after I return, I continue without fail to receive more mail, again announcing products I should see. It makes one wonder what’s going on in these companies. Next year I will show the distribution of mail received vs. date.
clinical chemistry, laboratory medicine