Discrepant Analysis in Everyday Situations

January 18, 2006

Abnormal patient sample results are often repeated, whereas normal results usually are not. The same is true for QC: values that are “out” are often repeated, whereas values that are “in” usually are not. This essay considers some attributes of this practice.

Discrepant analysis for evaluations

Before considering the above cases, consider discrepant analysis for evaluations (1-2). For example, consider an assay evaluation in which one compares an assay result to a gold standard diagnosis (3). Usually, most assay results will agree with the gold standard diagnosis, but there will be a few exceptions. In discrepant analysis, assay results that don’t agree are rerun. This results in either confirmation or non-confirmation of the discrepancy. This is shown graphically below.

Initial results

                              Candidate assay positive    Candidate assay negative
  Reference method positive   Result agrees               Discrepant
  Reference method negative   Discrepant                  Result agrees

Repeat discrepants only

                              Candidate assay positive    Candidate assay negative
  Reference method positive   Result agrees               Discrepant
  Reference method negative   Discrepant                  Result agrees

In the second table, the number of discrepant results is either the same as or lower than in the first table (with the number of “result agrees” entries either the same or higher).

Before continuing, one may ask what the root cause of a discrepant result is. Whereas there can be many root causes, consider two generic top-level effects: the result upon rerun either gives essentially the same value (i.e., remains discrepant) or gives a different value (for the sake of argument, assume the new value is no longer discrepant). The first case is consistent with a fixed bias, whereas the second case is consistent with random error.

So far, there is nothing wrong with the above practice, and it is natural to explore discrepant samples. Where people get into trouble is in estimating quantities from the results of the study (which is almost always the goal, or else why do the study). In such an evaluation, one wants to estimate the analytical sensitivity and analytical specificity of the assay. The problem is that constructing these estimates using the discrepancy procedure above (i.e., the results from the second table) is biased and yields diagnostic accuracy estimates that are too optimistic (1-2). One can think of this bias intuitively. Consider a sample that agrees with the gold standard diagnosis. Were this sample rerun, it might yield a discrepant result (due to statistical variation, especially if the initial result was close to being discrepant in the first place). But there is no chance for this to occur, because in the above procedure only samples that are initially discrepant are rerun. Hence, the estimates are too optimistic.
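
To see the size of this bias, here is a minimal simulation sketch in Python (all performance values are made up for illustration, and the rerun is assumed to vary only by random error, independently of the first run):

    import random

    random.seed(1)

    # Assumed true performance and prevalence -- made-up numbers, not from any study.
    SENS, SPEC, PREV, N = 0.90, 0.95, 0.30, 100_000

    def assay(diseased):
        """One assay run: returns True for an assay-positive result."""
        if diseased:
            return random.random() < SENS    # detected with probability SENS
        return random.random() >= SPEC       # false positive with probability 1 - SPEC

    tp = fn = tn = fp = 0        # honest single-run counts
    tp2 = fn2 = tn2 = fp2 = 0    # counts after "resolving" discrepants by rerun

    for _ in range(N):
        diseased = random.random() < PREV
        result = assay(diseased)

        # Honest tally: every sample counted once, no reruns.
        if diseased:
            tp += result; fn += not result
        else:
            fp += result; tn += not result

        # Discrepant analysis: rerun only samples that disagree with the
        # reference, and let the rerun overrule the initial result.
        if result != diseased:
            result = assay(diseased)
        if diseased:
            tp2 += result; fn2 += not result
        else:
            fp2 += result; tn2 += not result

    print(f"sensitivity: true {SENS}, single run {tp/(tp+fn):.3f}, "
          f"after discrepant resolution {tp2/(tp2+fn2):.3f}")
    print(f"specificity: true {SPEC}, single run {tn/(tn+fp):.3f}, "
          f"after discrepant resolution {tn2/(tn2+fp2):.3f}")

With these made-up parameters, the resolved sensitivity climbs toward 1 − (1 − 0.90)² ≈ 0.99 and the resolved specificity toward 1 − 0.05² ≈ 0.998: a discrepant sample gets a second chance to agree, while an agreeing sample never gets a second chance to disagree.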

To summarize, one could of course run replicates for all samples, but this might add too much expense to the study. It is reasonable to resolve only the discrepant results by rerunning them, as long as one calculates sensitivity and specificity correctly (see references 1-2).

General Comment about Discrepant Analysis in Everyday Situations

In everyday situations, discrepant analysis is used to inform an action to be taken. For a patient sample result, discrepant analysis will inform the choice between two treatment alternatives (typically those associated with a “normal” or an “abnormal” result). In QC, discrepant analysis will inform whether or not to rerun a block of patient samples and troubleshoot the assay.

Discrepant analysis for patient samples

In routine practice, patient sample results are often repeated according to rules set up by the clinical laboratory; in a typical case, only abnormal results are rerun.

This is not the same as the discrepant analysis discussed in the section on evaluations, because there is no reference method. However, the same arguments apply, because only selected patient sample results are repeated. One could perhaps consider a result slated to be repeated as discrepant from a working hypothesis that the result was “normal”.

The practice of repeating only selected samples was questioned in another essay with respect to troponin I.

To recall, in that study a point-of-care assay result for troponin I was repeated only if the result was above the cutoff. The study was used to support a reduced length of stay in the emergency department.

The bottom line is the performance of the assay (e.g., its analytical sensitivity and analytical specificity) given the clinical laboratory’s specific practice of repeating only selected samples. This is likely to differ from the analytical sensitivity and analytical specificity of the assay as determined by an evaluation.
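
As a rough illustration of the difference (the numbers below are hypothetical, not from the troponin study, and reruns are assumed to vary only by independent random error), a repeat-only-abnormal policy shifts the assay’s effective operating point:

    # Hypothetical single-run performance -- not numbers from the troponin study.
    sens, spec = 0.95, 0.90

    # Policy: repeat only results above the cutoff and report the repeat value.
    # A final "positive" then requires two positive runs, while a single
    # negative run (initial or repeat) yields a final "negative".
    eff_sens = sens * sens            # diseased patient must test positive twice
    eff_spec = 1 - (1 - spec) ** 2    # healthy patient escapes with one negative

    print(f"effective sensitivity {eff_sens:.3f} vs single-run {sens}")  # ~0.902
    print(f"effective specificity {eff_spec:.3f} vs single-run {spec}")  # ~0.990

Under these assumptions the policy trades sensitivity for specificity, a shift that an evaluation which reruns nothing (or everything) would not capture.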

Discrepant analysis for QC

The same arguments apply to QC. That is, when QC is out, one of the first (troubleshooting) steps is to repeat the QC to see if it is repeatedly out (see cases 1 and 2 below). Yet a QC result that is “in” is not repeated. Note that for multi-rule QC programs nothing really changes, because if one needs three observations to fulfill a criterion before QC is considered out, one can simply consider that set of observations as a case. So, again, the bottom line is the performance of the QC procedure (e.g., the equivalent of analytical sensitivity and analytical specificity) given the practice of repeating only discrepant samples.
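
To put a number on the cost of “rerun and accept if in”, here is a small sketch, assuming a single QC observation per run, a 2 SD control limit, and a persistent 2 SD shift in the assay (all values are illustrative):

    from math import erf, sqrt

    def p_within(limit_sd, shift_sd):
        """P(a QC value falls inside +/- limit_sd) when the mean has shifted."""
        def phi(z):  # standard normal CDF
            return 0.5 * (1 + erf(z / sqrt(2)))
        return phi(limit_sd - shift_sd) - phi(-limit_sd - shift_sd)

    limit, shift = 2.0, 2.0          # 2 SD control limit, persistent 2 SD shift
    p_in = p_within(limit, shift)    # one QC value is "in" despite the shift

    p_accept_no_rerun = p_in                     # accept only if the first value is in
    p_accept_rerun = p_in + (1 - p_in) * p_in    # ... or if the rerun is in

    print(f"run accepted despite shift, no rerun:          {p_accept_no_rerun:.2f}")  # ~0.50
    print(f"run accepted despite shift, rerun-if-out rule: {p_accept_rerun:.2f}")     # ~0.75

Under these assumptions, the rerun policy raises the chance of accepting a run with a real 2 SD shift from about 50% to about 75%; that loss of error detection is the price of treating an “out” QC value as discrepant.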

QC is different from the patient case above, because the result of QC affects a block of patient samples. For example, if a patient sample result is initially abnormal but normal upon several repeats, then it is likely that random error caused the initial abnormal result, and the result is normal and can be so reported. This type of argument does not apply to QC. If a QC sample that is “out” does not repeat as “out”, one has no way of knowing whether the cause of the initial “out” result affected one or more patient sample results.

Moreover, with QC one can distinguish between the following cases:

Case 1 – QC is out. Rerun the QC sample. If the rerun is in, declare the run OK. This is the discrepant analysis case under discussion.

Case 2 – QC is out. Declare the run out and rerun the patient samples. Rerun the QC sample as part of a procedure to troubleshoot the assay. This is not discrepant analysis since the decision to take action (rerun the patient samples and troubleshoot the assay) has already been made.
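
A sketch of the distinction in code (hypothetical; `qc_in_control` stands for whatever returns True when a QC observation is within limits):

    def case_1(qc_in_control):
        """Case 1 (discrepant analysis): the rerun can overrule the initial out."""
        if qc_in_control():
            return "accept run"
        if qc_in_control():          # rerun QC; an "in" rerun reverses the decision
            return "accept run"
        return "reject run and troubleshoot"

    def case_2(qc_in_control):
        """Case 2 (not discrepant analysis): the decision precedes the rerun."""
        if qc_in_control():
            return "accept run"
        qc_in_control()              # rerun serves only as a troubleshooting step
        return "reject run, rerun patient samples, troubleshoot"

The difference is where the rerun sits relative to the decision: in case 1 it determines whether patient results are released; in case 2 it only guides troubleshooting.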

Acknowledgement: Helpful comments were provided by Sten Westgard.

References

1. Lipman HB, Astles JR. Quantifying the bias associated with use of discrepant analysis. Clinical Chemistry 1998;44:108-115.

2. CLSI. User protocol for evaluation of qualitative test performance; approved guideline. CLSI document EP12-A. Wayne, PA: CLSI; 2002.

3. There are several variations to this scheme with respect to the accuracy of the gold standard diagnosis and whether the repeat assay uses a different (e.g., better) reference procedure. These are beyond the scope of this essay.



The quality of quality initiatives

January 17, 2006
An organization starts a quality initiative. But what’s the quality of the quality initiative? Here are some examples where quality was less than ideal, and some recommendations for positioning quality with senior management so that these problems are less likely to occur.

Culture

A high-level executive, rumored to be close to retirement as he has not progressed in the last few years, is given a new assignment to lead a quality initiative for the organization. To start, each department holds several sessions during lunch to roll out the new quality program.

What’s wrong with this picture? First, the leader of the quality program sounds like an executive who no longer has much value to the organization. His next step was to be out of the organization, but he was given this assignment instead. This sends a message that an assignment in quality is for losers. This high-level message will permeate all levels of the organization.

Second, meetings are held during lunch, with lunch provided. Now lunch is supposed to be a break from work. Some people exercise, others read a book, surf the net, and so on. Having the quality meeting during lunch sends the wrong signals:

  • that “real” work should not be interrupted by the quality initiative; i.e., the quality initiative is not as important as real work and so should not be held during regular working hours
  • that the quality initiative is a form of entertainment or relaxation, as are other lunchtime activities

A successful quality initiative requires the right culture to be in place as well as the right tools. The above are examples of quality problems with the culture. Another example of a cultural quality problem could be bringing in consultants to conduct the quality initiative (OK, this is a more sensitive issue, since I am a quality consultant). How the quality consultant is brought into the organization determines whether this is a cultural problem. If there is no attempt to build in-house expertise, then quality will not be perceived as a core value.

Tools

I once participated in an all-day off-site meeting to try to improve the reliability of a new medical diagnostic instrument system whose development had just gotten underway. The previous instrument had poor reliability, yet during the meeting there was no evidence that anything was in place to prevent a recurrence of reliability problems. Around the room, each person said things like, “I’m not going to make the same mistakes again.” I couldn’t help thinking: of course not, who would? You will make new mistakes. The point is that for quality improvement to be meaningful, there need to be

  • realistic goals
  • ways of measuring progress
  • tools / processes that will help achieve goals

Yet, in this meeting, there was an unrealistic goal, no system proposed to measure progress, and no new tools.

Recommendations

Almost all organizations make decisions based on financial considerations; hence, the case for quality should be financial. Whereas this has been said before, here are my opinions.

When not to worry about quality – You are trying to get new technology to work, for example a completely noninvasive glucose assay. You don’t have to worry about quality until after the technology works (and you’ve convinced the FDA).

How not to make a case for quality – Use nonfinancial arguments such as “it’s the right thing to do”, “it’s what the customer wants”, “it’s part of our core values”, etc. These arguments may result in the culture and tools problems mentioned above.

How to make a financial case for quality – The financial case for quality typically includes two situations:

  • cost is reduced, which adds to the bottom line, and/or
  • risk is reduced

The reduction in cost is straightforward, and the case can be made using standard financial analysis. For example, for a medical diagnostic instrument, a service call for a system under warranty may cost the company $1,000. If there are 5 service calls per instrument per year and 1,000 instruments under warranty in the field, then this costs the company five million dollars per year. If you could reduce the rate to 3 service calls per instrument per year, you would save the company two million dollars per year, minus the cost of the quality program. There might also be a staff reduction in service for additional savings.
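
The arithmetic in that example, spelled out (the figures are the ones used above):

    cost_per_call = 1_000           # dollars per warranty service call
    instruments = 1_000             # instruments under warranty in the field
    calls_now, calls_goal = 5, 3    # service calls per instrument per year

    annual_cost = cost_per_call * calls_now * instruments                   # $5,000,000
    gross_savings = cost_per_call * (calls_now - calls_goal) * instruments  # $2,000,000

    print(f"current warranty service cost: ${annual_cost:,}/year")
    print(f"gross savings at the goal:     ${gross_savings:,}/year")
    # Net savings = gross savings minus the cost of the quality program.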

Unfortunately, risk reduction is extremely important but cannot be quantified using standard financial analysis. However, decision-analysis-based financial models can account for improved profitability from programs (e.g., quality programs) that reduce risk. These models are probabilistic. So, just as one can smoke and never get lung cancer, on average the risk of lung cancer is higher for smokers. Similarly, companies can quantify the average gain in profit for the amount of money spent on quality programs and the amount by which risk is reduced. However, risk reduction is a hard sell compared to the noninvasive glucose program, which might be sold as bringing in over a billion dollars in revenue when released, with solutions of course in place for the last remaining technical problem.
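
A minimal sketch of what such a probabilistic model looks like (every probability and cost below is invented for illustration):

    p_event = 0.02                 # annual probability of a costly event (e.g., a recall)
    cost_event = 50_000_000        # cost if the event occurs
    program_cost = 200_000         # annual cost of the quality program
    p_event_with_program = 0.01    # assumed event probability with the program

    expected_loss = p_event * cost_event
    expected_loss_with = p_event_with_program * cost_event + program_cost

    print(f"expected annual loss, no program:   ${expected_loss:,.0f}")       # $1,000,000
    print(f"expected annual loss, with program: ${expected_loss_with:,.0f}")  # $700,000

On these invented numbers the program pays for itself on average, even though in any single year the company may spend the program cost and see no event at all, which is part of why this case is a hard sell.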

Increased revenue was not included as a bullet. In the medical field it is a hard case to make, as most medical consumers expect quality. This may change if quality measures become available to consumers.

By the way, if you work for a nonprofit organization, all of the above still applies.


Software Validation

January 13, 2006

When I worked in R&D for diagnostic companies, part of what I did was write software. This included SAS programs to evaluate assay performance. Some of the results of this work (also performed by others in my group) ended up in the instrument and affected result accuracy. For example, method comparison to reference was used to establish calibration algorithms that were hard-coded in the instrument. I also wrote Excel programs in VBA (Visual Basic for Applications) for the scientific staff so that they could analyze data from the field in near real time. A senior developer wrote the other part of this software (a major effort): communication with field instruments and data acquisition and transfer to a large database.

For all of this, there was no “formal validation” until the FDA changed the way it inspected sites and required R&D to validate software (e.g., in the same way that manufacturing had always done). Of course, R&D complied. The purpose of this essay is to point out some misconceptions about formal validation.

Formal validation comments

Exclude the developer from the validation – Before formal validation, the developer would validate his or her own software, a practice that is avoided in formal validation. The benefit of validating one’s own software must be considered from the following perspective: we were not contract or journeyman developers; we were extremely knowledgeable in the subject matter. This meant that we could test the reasonableness of the results and spot problems on that basis. Formal validation usually has no provision for this, and often the people who validate the software have no knowledge of the subject matter.

Formal validation tends to exclude creativity – Formal validation follows a set of rules. For example, there are test protocols based on the user interface specifications, with a variety of inputs. As early users of new software, our group had a reputation for finding bugs in formally validated software. One reason is that we thought up test conditions during testing (e.g., by playing around with the software) that were not covered by formal validation protocols. There is no mechanism for this in formal validation.

One hundred percent coverage is misleading – Formal validation involves the concept of 100% coverage. This is misleading. Due to branching and the many variations of inputs, true 100% coverage of software cannot be achieved. Whereas this is known to software professionals, it is often misrepresented to or by management.
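
A back-of-the-envelope illustration: statement or branch coverage can reach 100%, but the number of distinct paths through the code explodes combinatorially.

    # With n independent branch points there are up to 2**n distinct paths,
    # so exhaustive path testing is infeasible for even modest programs.
    for n_branches in (10, 20, 30, 40):
        print(f"{n_branches} branches -> up to {2**n_branches:,} paths")
    # 40 branches -> up to 1,099,511,627,776 paths, and this still ignores
    # the space of possible input values along each path.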

Formal validation may simply be a checkmark – The quality of software validation cannot be guaranteed simply because it has been checked off upon completion, and there may be pressure for this milestone to be reached.

The ideal case

Ideally: 1) software validation is conducted by the developer, as before (although it is not called “formal validation”; it is simply a step in development); 2) people who have a talent for finding errors are given this task and proceed by informal methods; and 3) formal validation is then conducted, which has the benefit of discovering other errors.

A danger is that less effort will be spent on steps 1 and 2 because step 3 is required.