Wrong thinking about evaluating assays

February 3, 2010

I published a Letter to the editor as well as an article about wrong thinking for (glucose) standards (subscription required for both). Here is a companion piece about evaluating assays.

Ideally, the goal in evaluating an assay is to determine the population of differences between the candidate assay and truth for the analyte over the life of the candidate assay. This is not attainable directly because it would require assaying every patient sample with a definitive reference method. So one takes a small sample (say 100 patient samples) to estimate these differences, and one usually uses a comparative assay rather than a definitive reference assay.

So far there is nothing wrong with the above, but here’s where things go bad. In many cases, people run the evaluation experiment far from the way that the assay will be run routinely. To a certain extent this is unavoidable. For example, the results of an evaluation experiment are not sent to clinicians, because the assay is not yet in use. However, evaluations often depart from routine use in ways that could easily be avoided. For example, a glucose meter that is designed for nurses to perform a fingerstick might instead be run with venous samples, perhaps because the fingerstick procedure would cause more error and one wishes to observe just the “analytical” properties of the assay. But the experiment then no longer answers the question set forth in the goal, because a potential source of error has been removed from the evaluation.

Another problem is how the results will be handled. I have argued that the only meaningful analysis is an error grid analysis, yet other analyses persist such as estimating total error by adding 2 times imprecision to average bias, or calculating six sigma metrics.

However, there is an even bigger issue. Say one runs 100 patient samples and it is estimated that the candidate assay will be used for one million patient samples. This experiment samples 0.01% of the population. The issue is how to interpret the results of this 100-sample experiment. If the results are bad, then one should question the acceptability of the assay. However, if the results are good, one cannot say much. The experiment should still be done, and it is nice to know the results are good, but more is needed.
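To put a rough number on “one cannot say much”, suppose all 100 samples gave acceptable results. The sketch below computes the largest failure rate that is still consistent with seeing zero failures at 95% confidence. It assumes the samples are independent and representative of routine use, which is optimistic given the points above. The function name and the notion of a “failure” (a result in an unacceptable zone) are illustrative, not taken from any standard.

```python
def upper_bound_failure_rate(n_samples: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the failure rate when 0 failures are seen.

    Solves (1 - p) ** n_samples = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_samples)


n_evaluated = 100        # samples in the evaluation experiment
n_routine = 1_000_000    # samples expected over the life of the assay

fraction_sampled = n_evaluated / n_routine
p_upper = upper_bound_failure_rate(n_evaluated)

print(f"fraction of population sampled: {fraction_sampled:.2%}")
print(f"95% upper bound on failure rate: {p_upper:.2%}")
print(f"failures not ruled out in routine use: {p_upper * n_routine:,.0f}")
```

With 100 clean samples the bound is close to 3%, so tens of thousands of bad results among one million would not contradict the experiment.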

To understand what else is needed, consider elements that definitely or probably have not been tested in the 100-sample experiment, using glucose as an example:

  • Different interfering substances (some may have been present) including extremes of hematocrit
  • Different lots of reagents, age of reagents, storage of reagents
  • Different environmental conditions (temperature, humidity)
  • Different operators with representative skill levels
  • The software
  • The percentage of times the assay fails to provide a result
  • And so on

There are two ways this information can be assessed. The first is by the manufacturer, by performing special studies such as factorial experiments, software evaluation, FMEA, FRACAS, and so on.

Since 85% of laboratory error is due to pre- and post-analytical error and not analytical error, one should not underestimate the effect of laboratory procedures. The second way is for the clinical laboratory to perform its own FMEA and FRACAS to deal with conditions in its laboratory, since the manufacturer cannot anticipate all laboratory procedures.

To summarize:

  1. The 100-sample evaluation (often fewer samples) performed by the clinical laboratory is not much more than a cursory check to make sure nothing has gone wrong with the assay in the hands of the laboratory.
  2. The manufacturer performs most of the analytical validation of the assay and some (often simulated) user validation with the FDA evaluating the results.
  3. The laboratory performs FMEA and FRACAS in the context of their procedures.

CLSI documents that support this approach are EP27 (error grids) and EP18 (risk management).


The trifecta in flying media for instruction

January 25, 2010

I wish I had had the following while taking flying lessons. I’m putting this together now.

1) Video of the flight. I’ve followed this advice. Here’s an example from one of my flights.

2) Audio of ATC (Air Traffic Control) and other comments (such as from the instructor). The advice video above shows how to do this. My video doesn’t have this audio since I haven’t yet received everything I’ve ordered.

3) A Google Earth record of the flight. One has to have GPS software that can save an XML file of the trip. I have software on my cell phone that does this. The screen shot above is an example of what it looks like. This is pattern work at Norwood (KOWD). With Google Earth, one can replay the trip. The first pattern was a little sloppy.


Six Sigma can be dangerous to your health

January 19, 2010

Last January, this article was published. It is now available here. The citation is: Krouwer JS. Six Sigma can be dangerous to your health. Accred Qual Assur 2009;14:49-52.


Improving the quality of glucose standards

January 3, 2010

Improving glucose standards won’t improve the quality of glucose meters and patient care – at least not directly. However, the biggest problem with current glucose standards is that one can’t tell the state of quality in sufficient detail by evaluating glucose meters against current standards such as ISO 15197.

Having blogged about this for a long time, I’ve now joined forces with David Klonoff, an endocrinologist who is well known in the diabetes field and who invited me to write a review of glucose standards and statistics. This has now been published (1). A Letter to the editor is also in press (2).

References

  1. Krouwer JS and Cembrowski GS. A review of standards and statistics used to describe blood glucose monitor performance. Journal of Diabetes Science and Technology 2010;4:75-83.
  2. Krouwer JS. Wrong thinking about glucose standards. Clin Chem, online version at: http://www.clinchem.org/cgi/reprint/clinchem.2009.140277v1

And now for something completely different

December 23, 2009


This is a MapPoint image of the airports that I have landed at, mainly while taking flying lessons. To produce this or something like it (e.g. any set of locations), you need:

  1. Your own copy of MapPoint (I have the 2006 version)
  2. An Excel file with the latitudes and longitudes of each location you want to display (I got these from http://www.Airnav.com)
  3. Some name you wish to appear on the map (I used the airport names and identifiers) – save the file as the 2003 version 
  4. The following VBA code, modified to your situation
    1. To add the code in Excel, press Alt+F11
    2. Add a module
    3. Copy and paste the code
    4. Set a reference to Microsoft MapPoint Control 13.0
  5. Open MapPoint
  6. Run the module
  7. Save the picture

VBA Code

Sub OpenDataSet()
    Dim objApp As MapPoint.Application
    Dim oMap As MapPoint.Map
    Dim objDataSets As MapPoint.DataSets
    Dim objDataSet As MapPoint.DataSet
    Dim zDataSource As String
    Dim objRS As MapPoint.Recordset
    Dim ppin As MapPoint.Pushpin

    Set objApp = GetObject(, "MapPoint.Application")
    Set oMap = objApp.ActiveMap
    ' This is where your Excel file is located. Use the 2003 format
    zDataSource = "C:\Jan8100\JansData\HomeStuff\Pilot\JKAirports.xls!Sheet1!A1:F23"

    Set objDataSets = objApp.ActiveMap.DataSets
    Set objDataSet = objDataSets.ImportData(zDataSource, , _
            geoCountryDefault, _
            geoDelimiterComma, _
            geoImportFirstRowIsHeadings)
    ' This is a purple plane. For other symbols, go to
    ' http://msdn.microsoft.com/en-us/library/aa493300.aspx
    objDataSet.Symbol = 89
    Set objRS = objDataSet.QueryAllRecords
    objRS.MoveFirst
    Do While Not objRS.EOF
        Set ppin = objRS.Pushpin
        ppin.Highlight = True
        ' The first column in the Excel data set
        ' is the airport name and the fourth column
        ' is the airport identifier
        ppin.Name = objRS.Fields(1).Value & "(" & objRS.Fields(4).Value & ")"
        ppin.BalloonState = geoDisplayName
        objRS.MoveNext
    Loop
End Sub


Another EPCA-2 update

December 9, 2009

It’s time to improve assay specifications

December 7, 2009

I’ve been critiquing assay specifications for some time, including:

Assay            Standard Organization   Reference
cholesterol      NCEP                    1
glucose          ISO 15197               2
creatinine       NKDEP                   3
hemoglobin A1c   NGSP                    4

For more colorful critiques, go to http://krouwerconsulting.com/ and enter a search term or click on publications. Some of my critiques go back almost 20 years.

These standards have one or more of the following problems:

  • Limits are given for only 95% of the data, so 5% of the data are unspecified
  • The wrong model is used (often total error = bias ± 1.96 × imprecision)
  • Outliers are discarded
  • User error is excluded

The ideal specification should have:

  • Limits for 100% of the data, as exemplified by an error grid
  • A protocol for collecting method comparison data. The protocol should not exclude user error
  • An analysis method, whereby no data is thrown out. The analysis could be as simple as tallying the percentage of data in each error grid zone
  • FMEA and fault tree analysis to evaluate the risk of rare errors
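The tallying step can be sketched in a few lines. The zone limits below are hypothetical and chosen only to make the example run; a real error grid defines its zones from the clinical consequences of an error, not from these numbers. The paired results are also made up. Note that every pair is counted, including the one with a large error.

```python
from collections import Counter

# Hypothetical zone limits (upper bounds on relative error), for illustration only.
ZONE_LIMITS = [("A", 0.15), ("B", 0.40)]
WORST_ZONE = "C"  # anything beyond the last limit


def classify(candidate: float, comparison: float) -> str:
    """Assign one (candidate, comparison) pair to an error grid zone."""
    relative_error = abs(candidate - comparison) / comparison
    for zone, limit in ZONE_LIMITS:
        if relative_error <= limit:
            return zone
    return WORST_ZONE


def tally_zones(pairs):
    """Return the percentage of pairs in each zone; no pair is discarded."""
    counts = Counter(classify(c, r) for c, r in pairs)
    zones = [zone for zone, _ in ZONE_LIMITS] + [WORST_ZONE]
    return {zone: 100.0 * counts[zone] / len(pairs) for zone in zones}


# Made-up paired results (candidate, comparison) in mg/dL.
pairs = [(98, 100), (110, 100), (130, 100), (65, 100), (205, 200),
         (150, 200), (82, 80), (300, 310), (45, 90), (101, 99)]

percentages = tally_zones(pairs)
for zone, percent in percentages.items():
    print(f"zone {zone}: {percent:.0f}%")
```

The output of such a tally is the whole specification check: the percentages across all zones sum to 100, so nothing is left unspecified.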

References

  1. Krouwer JS. Problems with the NCEP (National Cholesterol Education Program) Recommendations for Cholesterol Analytical Performance. Arch Pathol Lab Med 2003;127:1249.
  2. Krouwer JS and Cembrowski GS. A review of standards and statistics used to describe blood glucose monitor performance. Journal of Diabetes Science and Technology 2010;4:75-83.
  3. Krouwer JS. A recommended improvement for specifying and estimating serum creatinine performance. Clin Chem 2007;53:1715-1716.
  4. See: http://jkrouwer.wordpress.com/2009/12/03/wrong-thinking-about-hemoglobin-a1c-standards/

Appendix – Disagreeing with so many experts

Each of the standard organizations comprises a group of experts and four groups equals a lot of experts! I know people in these groups and respect their expertise. These experts are much more knowledgeable than I am in the clinical chemistry of each analyte. However, another domain of interest is how to specify and measure the quality of these assays. I suspect that expertise in this area is underrepresented in these groups.


EPCA-2 Update

December 6, 2009

Go here for a Letter by Dr. Diamandis and the response by Dr. Getzenberg regarding the prostate cancer marker EPCA-2.


Wrong thinking about hemoglobin A1c Standards

December 3, 2009

There will be an article and editorial (subscription required) about 6 of 8 assays that fail the NGSP hemoglobin A1c standard, which is here. As an aside, the NGSP could use a little revision control so that one can understand what is new.

There are problems with this standard. Here’s why. The standard states:

“In order for a commercial method to be considered traceable to the CPRL, the 95% CI of the differences between methods (test method and SRL method) must fall within the clinically significant limits of ±0.85% GHB.”

The problem is that this is a measure of the average difference. While it is true that the 95% CI (confidence interval) will fail if there is too much scatter in the differences, reading further suggests another problem.

“All data analysis will be performed by the NETCORE following Bland and Altman Assessment of Agreement. Outliers will be analyzed for informational purposes only; an outlier is defined as > mean + 3SD of the absolute differences between pairs. All outliers will be investigated by the NETCORE to determine if the discrepancy could be due to characteristics of the specimen rather than the assay method. If results show that a discrepancy could be due to characteristics of the specimen, then the manufacturer will be asked to submit a new specimen and the data will be reanalyzed.”

This doesn’t make too much sense to me. An evaluation should try to estimate performance that will be observed under routine conditions.

1) Routine conditions don’t include a reference assay with which one can calculate differences.

2) Eliminating data will provide a biased and too favorable performance estimate.

3) Why should one throw out a result “due to characteristics of the specimen rather than the assay method”? The assay method performance is a summation of many things, including how characteristics of the specimen are handled by the assay.
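The second point is easy to demonstrate. The sketch below applies the outlier definition quoted above (greater than the mean plus 3 SD of the absolute differences) to a set of made-up differences and compares the scatter before and after removal. The data are invented, and the use of the sample standard deviation is an assumption, since the quoted text does not say which SD is meant.

```python
from statistics import mean, stdev


def split_outliers(differences):
    """Split differences using the quoted rule: |d| > mean + 3 SD of |d|."""
    abs_diffs = [abs(d) for d in differences]
    cutoff = mean(abs_diffs) + 3 * stdev(abs_diffs)  # sample SD (assumption)
    kept = [d for d in differences if abs(d) <= cutoff]
    removed = [d for d in differences if abs(d) > cutoff]
    return kept, removed


# Made-up differences (test method minus comparison method), in % GHB.
differences = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1, 0.0, -0.2,
               0.2, -0.1, 0.1, 0.3, -0.2, 0.0, 0.1, -0.1, 0.2, 2.5]

kept, removed = split_outliers(differences)

print(f"removed as outliers: {removed}")
print(f"SD of all differences:  {stdev(differences):.2f}")
print(f"SD after removal:       {stdev(kept):.2f}")
```

The one large difference is exactly the kind of result a patient could receive in routine use, yet dropping it shrinks the scatter considerably and makes the assay look better than the data support.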

The Bland-Altman approach requires normally distributed data. If the data are not normal, they must be transformed.

A simpler specification would be to use an error grid, which accounts for 100% of the data.


Resistance in CLSI standards

November 18, 2009

As a consultant, much of my work consists of projects that I suggest. As might be expected, I have to sell people on these ideas. Sometimes, it is clear that stated objections are not the real reason the client doesn’t want to do the project. This is known as resistance and takes on many forms such as “great idea, let’s do it on the next project” or “great idea, but our problem is different” or “I just don’t understand.”

There have been some reject votes for the revised CLSI total error standard, EP21. Based on several discussions, the two major objections look like resistance. Here’s why.

The first objection is that EP21 is too complicated. Now the people who voted to reject EP21 are all senior people in the field of clinical chemistry. EP21 involves graphing differences between a new and comparison method. In one plot the differences are manipulated with subtractions. This is probably the simplest of all the CLSI evaluation documents.

The second objection is that the revised document includes total error from any source that is present in the evaluation protocol (not just analytical error). So what does this mean? Well, if one were going to evaluate a POC device such as a glucose meter which had fingersticks as a valid sample mode, the evaluation should employ fingersticks as the sampling method for the POC glucose meter under test. It is likely that the comparison method would be a laboratory glucose instrument using a venous sample. This means that differences between test and comparison will include errors due to improper fingerstick technique, which is a user error, not an analytical error. This is appropriate because the goal of the evaluation is to estimate performance in routine use.

So what was the objection? That we can’t be expected to evaluate all pre- and post-analytical errors, such as problems with the LIS, and that a new standard should be developed for pre- and post-analytical errors.

But the revised EP21 does not suggest evaluating all pre- and post-analytical errors; it advises one not to exclude the opportunity for relevant errors in the protocol, such as fingerstick technique in the glucose example above.

The consultant’s task is to expose objections that have no merit.