Randomized Clinical Trials (RCTs) vs. the use of historical data to compare existing treatments

December 20, 2008

RCTs are experiments in which a similar set of patients are randomly assigned to one of two treatments. Often, one treatment is a new experimental treatment and the other is a placebo. An outcome measure is defined for the study (for example, the 5 year survival rate). Success of the new treatment means a statistically significant difference between the two treatments with respect to the outcome measure. With historical data, patients who have already been treated in two or more ways are analyzed as to the outcome measure. Since these patients were not randomly assigned to a treatment group, one attempts to find groups of patients who are similar.

RCTs are thought to be the gold standard in assessing treatments. RCTs are also advocated to compare existing treatments in addition to establishing efficacy of a new treatment. This contribution is limited to assessing RCTs vs. historical data to compare existing treatments.

To make this discussion less abstract, an example is used; namely, treatments for prostate cancer. One would like to compare the existing treatments on both success and side effect outcome measures. A typical success outcome would be the 5 year percentage of patients free from biochemical evidence of disease (as defined by PSA measurements). Side effects include incontinence, impotence, and others.

Round 1 – Treatment selection – In an RCT, typically two or perhaps three treatments would be selected. Many treatment categories can be further subdivided. For example, prostatectomy can be divided into the open procedure (which can be subdivided by where the incision is made) and the laparoscopic procedure, which may or may not be robotic. If one studies only one type of prostatectomy, then the results don’t apply to the other types. If one wishes to include all types, then the size of the trial becomes too large.

One must also consider patient eligibility. Typically, besides excluding patients for a number of reasons, patients are grouped into low, medium, and high risk categories. This again causes a strain on sample sizes.

Consider historical data. Over 200,000 men are diagnosed with prostate cancer each year. Going back ten years, that is 2 million men. Provided the data for these 2 million men are accessible (or a subset that is still large), there will be sufficient sample size to compare different treatments, including treatment subcategories and patient risk groups.

Round 2 – Randomization – Patients in an RCT are already a biased set. Consider an eligibility requirement. If the trial were to compare radical prostatectomy (RP) to external beam radiation therapy (EBRT), then each patient would have to be eligible for either treatment. But say someone who needed therapy had had a previous stroke and was therefore not a candidate for surgery. This person would be excluded from the study, so the set of patients in the study is a medical subset of the general population of patients who need treatment. This subset is biased toward healthier patients, because healthier patients are needed for surgery. The set of study patients is a subset in another way as well: some patients, after having the two treatments explained to them, may opt out of the study because they prefer one treatment. Thus, the study is limited to patients who are indifferent to which treatment they will get, and indifference vs. preference may be important.

Historical data are more relevant to the real world. Some patients will have researched different treatments and selected the one they preferred; others will have accepted whatever their physician advised. The same biases as in the RCT will be present in historical data – that is, the stroke patient won’t get RP – but the effect of these biases can be explored through data analysis.

Round 3 – Time – An RCT will by definition not provide results for 5 years (the outcome measure is the 5 year rate of no biochemical evidence of disease).

If treatments have existed for a number of years, historical data will provide results as soon as the data analysis is complete.

Round 4 – Cost – An RCT is expensive. Analysis of historical data isn’t.

Round 5 – Reliability – Reliability can be measured by sample size and bias. Assume for a moment the bias is the same. The following gives an idea of reliability.

RCT – assume 1,000 patients are treated in either of two ways and stratified into low, medium, and high risk groups. This means each treatment has 500 patients, and each risk group has about 167 patients per treatment (the numbers of course would not be exactly equal in each case).

Historical data – assume 2 million patients over ten years with half of the data usable. This leaves 100,000 patients per year. Following the same logic as for the RCT gives about 16,700 patients per risk group per treatment per year. Using 5 years of data gives 83,500.

What is the confidence interval (95%) for an 80% success rate for a treatment from either the RCT or historical data?

Trial Type         Total N    Success N    Low CI    High CI
RCT                    167          134     73.9%      86.1%
Historical data     83,500       66,800     79.7%      80.3%

(Total N is the per-risk-group, per-treatment sample size from above; Success N and the confidence limits follow from the assumed 80% success rate using the normal approximation.)
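The 95% confidence intervals asked for above can be computed with the normal approximation for a binomial proportion; a minimal sketch, using the per-risk-group, per-treatment sample sizes assumed in Round 5:

```python
import math

def ci_95(successes, n):
    """95% confidence interval for a proportion (normal approximation)."""
    p = successes / n
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# RCT: about 167 patients per risk group per treatment, 80% success rate
lo, hi = ci_95(0.8 * 167, 167)
print(f"RCT:        {lo:.1%} to {hi:.1%}")

# Historical data: 83,500 per risk group per treatment, same success rate
lo, hi = ci_95(0.8 * 83_500, 83_500)
print(f"Historical: {lo:.1%} to {hi:.1%}")
```

The interval shrinks with the square root of the sample size, which is why the historical-data interval is so much tighter.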

Round 6 – Exploring the data – Clearly, with a huge sample, one can explore the data in ways not possible with an RCT. For example, for RP, one can examine the outcome measure and each side effect as a function of the type of hospital where the surgery was performed, the number of surgeries performed by the surgeon, the time of day the surgery was performed, and so on. This can be done for all types of surgery.

The winner – In this example, historical data is the winner, provided the data exist. If they don’t, it’s worth spending money to make the data available rather than spending it on an RCT.

Added 1/4/09 – All of the above is for two treatments that are thought to be equivalent. Consider the case where treatment A is generally considered superior to treatment B based on historical or anecdotal data, but without a randomized clinical trial. Proponents of treatment B might insist that “the jury is still out” and that treatment superiority can only be settled by a randomized clinical trial. But it is unlikely that enough patients could be found who are indifferent to which treatment they receive, so the trial will never take place.

ISO 15189 Accreditation for Clinical Laboratories

December 3, 2008


I am not a big fan of the ISO standards I have seen, and I have said so. For example, I have critiqued ISO 15197 (the standard for glucose meter performance) because it doesn’t specify useful performance criteria (1). I have also said that ISO 9001 doesn’t guarantee anything about quality (2).


ISO 15189 is based on ISO 9001 but geared towards clinical laboratories. It now appears that CAP is using ISO 15189 to accredit clinical laboratories (3). My views are:

Accreditation is a good thing. I would not want my results to come from an unaccredited clinical laboratory. But it should be understood that clinical laboratories that fail the accreditation process are rare, and that when problems do occur, they usually come from accredited clinical laboratories.

So what is my beef with ISO 15189? OK, I have not read the 2007 version, but if it is anything like the 2003 version, then it is similar to ISO 9001: accreditation success is judged by the documentation the organization has to show that it is following the processes it has developed in all areas specified by ISO 15189 (which covers all important areas in the clinical laboratory).

The problem with this is that the processes themselves may not be optimal with respect to the one quality measure that is important – the error rate in the clinical laboratory – and the error rate is usually not tracked in a meaningful way, such as would occur in a FRACAS (Failure Reporting And Corrective Action System). Instead, the lab is judged on its documentation of how well processes are followed. Reference 2 gives examples of how this can go wrong.

Now there is a lot of hype in reference 3 about how wonderful things are when using ISO 15189. But the only thing that matters is the error rate.



1. Krouwer JS. Six Sigma can be dangerous to your health. Accred Qual Assur 2008; in press, see: http://www.springerlink.com/content/5t379823t3766109/fulltext.html

2. Krouwer JS. ISO 9001 has had no effect on quality in the in-vitro medical diagnostics industry. Accred Qual Assur 2004;9:39-43

3. See: http://www.cap.org/apps/cap.portal?_nfpb=true&cntvwrPtlt_actionOverride=%2Fportlets%2FcontentViewer%2Fshow&_windowLabel=cntvwrPtlt&cntvwrPtlt%7BactionForm.contentReference%7D=cap_today%2F1108%2F1108_ISO_15189_approval_02.html&_state=maximized&_pageLabel=cntvwr







PSA Cutpoints

December 2, 2008



Vickers and Lilja suggest that cutpoints in general are not a good idea and use PSA as an example (1). To summarize their point of view,

·         The rationale for cutpoints is often unknown (or irrational as with the use of normal ranges)

·         Cutpoints are invariant to patient preferences 

·         Risk prediction should be used in place of cutpoints

·         Patient preferences could be taken into account with risk prediction (but not with cutpoints)

My response to these views is as follows:

·         The value of cutpoints depends on the properties of the marker. PSA is not a very good marker. Men with a negative DRE who are biopsied when their PSA is between 4 µg/L and 10 µg/L do not have prostate cancer (according to the biopsy) 75% of the time, whereas men with a negative DRE and PSA results between 2 µg/L and 4 µg/L have prostate cancer 15% of the time. So even though there may be no rationale for a PSA cutoff, its properties are known.

·         While lab tests play an increasing role in medical care, they are not the only source of data. Primary care physicians assess history (example: 1st degree relatives with prostate cancer), physical exam (example: DRE), and lab tests (example: PSA) to arrive at an overall assessment of the need to refer the patient to a urologist. The assessment is a risk prediction but is based on more than just the PSA test. Of course, there are other sources of data beyond the ones mentioned as examples.

I don’t quite understand what the authors mean when they say “cutpoints cannot include multiple pieces of information.” OK, 4 µg/L may be the cutoff for PSA regardless of whatever other information is available, but that does not mean a physician would not use the other information. For example, if a man had a history of prostate cancer in his family, a suspicious DRE, and a PSA of 3.5 µg/L, this would prompt many physicians to recommend a referral to a urologist even though the PSA value is below 4.

·         For patient preferences to be taken into account and to be meaningful, the patient has to be informed (to have a preference). It is not clear how this would occur. If a patient had a PSA value of 1.5 µg/L, a negative DRE, no symptoms, and no 1st degree relatives with prostate cancer, it is hard to imagine a primary care physician starting a process to inform the patient about the probabilities of prostate cancer. One must realize that this type of process would apply to every medical encounter and is hardly commensurate with the limited time a patient spends with a primary care physician. Thus, by the time the physician initiates the process of informing the patient, the patient’s risk may already be higher than the patient would prefer.

·         Ideally, the patient should ask for a written report of every primary care physician visit, including the results of the physical exam and history, all lab tests, and all other procedures such as an EKG. The patient would then have to acquire sufficient knowledge to understand the implications of this information. This would be a monumental task, though it could be eased by hiring other medical professionals, which would be costly.

·         The biases of the primary care physician play a role in informing the patient. The practice of primary care physicians is affected by insurers. There may be an incentive to minimize referrals to a urologist, or to refer patients to a urologist whose biopsy frequency is lower than his peers’ (and less costly) (2). The biases of a urologist – to recommend those procedures performed by that urologist and to discount others – must also be considered.
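The risk-prediction alternative that Vickers and Lilja advocate could look like the sketch below. Everything here is hypothetical – the function name, the coefficients, and their values are invented for illustration and have no clinical validity – but it shows how multiple pieces of information (PSA, DRE, family history) combine into a single risk estimate rather than a yes/no cutpoint:

```python
import math

def referral_risk(psa, suspicious_dre, family_history):
    """Toy logistic risk model; the coefficients are invented for
    illustration only and have no clinical validity."""
    z = -4.0 + 0.45 * psa + 1.2 * suspicious_dre + 0.8 * family_history
    return 1 / (1 + math.exp(-z))  # probability between 0 and 1

# The same PSA of 3.5 ug/L (below the usual 4 ug/L cutpoint) yields very
# different risk estimates once DRE and family history are folded in:
low = referral_risk(psa=3.5, suspicious_dre=False, family_history=False)
high = referral_risk(psa=3.5, suspicious_dre=True, family_history=True)
print(f"{low:.2f} vs {high:.2f}")
```

A physician (or an informed patient) could then weigh the estimated risk against preferences, which a fixed cutpoint cannot do.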