Reproducibility in Clinical Trials is Like Trying to Catch a Rabbit by the Tail

Scientists are not routinely sloppy, clinicians don’t usually cheat, and clinical research is generally well designed. So why is it so hard to reproduce results?

In mid-August of this year, a major Alzheimer’s disease journal cleared the biotech company Cassava Sciences of charges of data manipulation in a manuscript that the company had previously published in the journal. The charges were part of a series of data manipulation allegations made against the company by multiple experts in molecular neuroscience over the past year. Such allegations of overt scientific fraud draw headlines, as this case did, but they are rare. More common in clinical research are cases of large clinical trials producing what seem like promising outcomes that subsequent trials fail to replicate.

“Reproducibility of results is what separates case reports and ‘open-label’ studies from randomized, double-blind, clinical trials,” neurologist Andrew N. Wilner said. To drive home the point, Wilner, an associate professor at the University of Tennessee, recounted the story of Paolo Zamboni, a doctor who in 2009 reported encouraging results of a novel treatment for multiple sclerosis, a neurodegenerative disease affecting an estimated 2.8 million people. Zamboni disagreed with the widely accepted idea that the disease results from the immune system attacking the myelin sheath that surrounds neurons, so he pursued his own hypothesis and treatments, and used open-label studies to substantiate his approach. Unfortunately, Zamboni’s treatment was soon abandoned because nobody other than Zamboni could reproduce the results of his study.

Wilner’s distinction between open-label studies and randomized controlled studies might shine a spotlight on Cassava, if not for the fraud scandal. The company’s therapy has been the subject of randomized controlled trials, but also of open-label studies with highly publicized results that some might consider hyped. But while randomized controlled trials are generally larger, more statistically robust, and more trustworthy than open-label trials, it’s not uncommon for randomized controlled trials to be irreproducible, even when the trials are well designed and executed by dedicated physicians and researchers.

“It’s the scientific method’s power of reproducibility that underlies the U.S. Food and Drug Administration (FDA) requirement that TWO phase 3 clinical trials produce convincing and similar results before a new medication receives FDA approval that it is safe and effective,” Wilner notes. 

But when it comes to defining the words similar and reproducible, therein lies the rub. Unlike testing the same amount of the same antibiotic on the same strain of bacteria two days in a row, double-blinded late-stage clinical trials are highly complex, often involving thousands or tens of thousands of people across multiple medical centers, and the application of knowledge that grows by the day. So reproducibility in the full sense of the word may not be reasonably achievable in the real world, and there are several reasons why.

The underlying pathology of almost every disease is complex, and the mechanisms and off-target effects of the medications or other therapies being tested can be even more complex. Demographics of the people participating in clinical trials can change over time, as can the means of generating data. Many measurements are based on the lab values in blood tests. But these values are often calculated, rather than measured directly, and over time the calculations are changed as medical knowledge expands. It’s even possible that the chosen placebo turns out not to be inert with respect to what the study is designed to measure. None of this reflects poorly on researchers, but it illustrates how challenging big clinical trials can be.

“We’re always looking for that finding that we think would be applicable across populations, across time, across geographic populations, and that’s a myth,” MIT physiologist Leo Anthony Celi said. Celi says that medicine prizes those robust studies whose conclusions can be valid indefinitely. In reality, this idea is just plain wrong, and studies simply have to be repeated every so often as medicine advances and the world changes.

If there’s any particular area within medicine where this is most true, it’s the evaluation of treatments and preventive measures against atherosclerotic cardiovascular disease. That’s the accurate term for heart problems and blood vessel problems elsewhere in the body caused by atherosclerosis, the formation of plaques triggered by an inflammatory process, which leads to blood vessel narrowing and hardening, clot formation, and ultimately blockage. The size of clinical trials for this indication, our rapidly expanding knowledge of it, and all the moving parts of the science make research so overwhelmingly complicated that the goal of having reproducible clinical trials for it should really be graded on a curve. For simplicity, we’ll call atherosclerotic cardiovascular disease “heart disease” from here on, but keep in mind that there are many other ways that the heart can get sick, and other organs that also get sick from atherosclerosis.

As one of humanity’s main causes of death, heart disease has commanded an enormous share of health care’s resources and brain power. Moreover, the underlying disease process, and therefore its prevention and treatment strategies, involve a plethora of biological pathways related to the processing of lipids (fatty, waxy, or oily biological molecules that do not dissolve in water) and the involvement of these molecules in the inflammatory process. This makes for a huge number of moving parts, whether one is studying nutritional influences on heart disease, genetic factors, or drug therapy against this disease category, including drugs derived from nutrients such as omega-3 fatty acids.

“Science is evolving, so a trial done two years later with the same therapy may have a different background medication regimen that somebody is on when being studied,” notes preventive cardiologist Ty Gluckman from his office at the Providence St. Joseph Heart Institute in Portland, Oregon, where he directs the Center for Cardiovascular Analytics, Research, and Data Science (CARDS).

In fact, Gluckman has a lot to say about serum lipids, whose concentrations in the bloodstream constitute the lipid panel that you get each time that you go for a medical checkup. Serum lipids hog the spotlight in connection with heart disease, both for its treatment and its prevention. Each particular lipid has a specific mediating role. They don’t all promote inflammation and atherosclerosis. Some, if they are abundant in the diet, do the exact opposite, preventing cardiovascular problems while also producing other health benefits, like lowering blood pressure and reducing heart rate. This is where the story gets tricky, scientifically. Teasing out the specific benefit of a particular dietary lipid may involve numerous clinical trials, all of them large, randomized, double-blind studies, meaning that neither the people enrolled in the study nor the researchers know who is receiving what. Studies aimed at lowering the concentration of particular serum lipids, especially low-density lipoprotein cholesterol (LDL-C), the notorious “bad cholesterol,” run the gamut. They test interventions across the spectrum, from different diets and new exercise regimens to novel drug therapies, or some combination of these.

The various cardiovascular clinical trials targeting LDL-C are identified with memorable acronyms like JELIS, REDUCE-IT, STRENGTH, PEGASUS, and my favorite acronym, JUPITER. But despite how easy these trial names are to recall, their results are often tough to repeat. The trials won’t show the kind of concrete reproducibility on par with, say, the effects of an antibiotic against bacteria. That kind of reproducibility is not possible with more complex clinical issues, but that’s also why researchers must design trials with all efforts to avoid bias, and to identify bias when looking back at trial results.

“Randomized clinical trials still represent the mainstay, the gold standard for assessing the efficacy and safety of a given drug,” Gluckman says. He points to two particular omega-3 fatty acids, one called eicosapentaenoic acid, the other called docosahexaenoic acid. 

Both docosahexaenoic acid and eicosapentaenoic acid are abundant in cold-water fish, such as salmon, mackerel, tuna, and sardines. Diets rich in these fish, like the Mediterranean diet, are associated with cardiovascular benefits. Since the nutritional supplement industry is aware of this association, fish oil pills containing both of these omega-3 fatty acids, plus other agents, are very popular. While diets that include docosahexaenoic acid and eicosapentaenoic acid together (fish diets) do produce various cardiovascular benefits, including lowering blood pressure and heart rate, the two fatty acids actually have opposite roles in the inflammatory process. Eicosapentaenoic acid decreases inflammation, so we should expect it to help prevent a lot of heart disease, but docosahexaenoic acid appears to promote inflammation, which may explain why some studies suggest that fish oil pills are not what they’re cracked up to be.

“If you go to your local supermarket, drug store, pharmacy and you pick up a bottle of fish oil, 1,000mg, [it] often includes some component of docosahexaenoic acid, eicosapentaenoic acid, and other things,” says Gluckman. In general, for someone who is at high risk of heart disease or for someone who doesn’t have established cardiovascular disease but may have varying degrees of risk, the oil doesn’t convey cardiovascular benefits, he says. “That’s been shown in several studies and in a large meta-analysis.”

Confusing matters more is the fact that fish diets do contain docosahexaenoic acid and eicosapentaenoic acid together and do confer benefits. This may be just a question of optimal dosing, or the optimal ratio of the two fatty acids. But that’s terra incognita when it comes to food sources, because food engineering is not yet able to control ratios of different fatty acids so precisely. However, given the unequivocal value of eicosapentaenoic acid, including its anti-inflammatory effects and mounting evidence supporting the use of anti-inflammatory drugs in combating heart disease regardless of whether they change serum lipids, purified forms of it have been developed into pharmaceuticals.


Notably, an ultra-purified form of eicosapentaenoic acid, called icosapent ethyl, was tested in the REDUCE-IT trial, where it demonstrated remarkable capability for lowering the level of another key component of the lipid profile: triglycerides. Triglycerides are molecules of fat that spell trouble when their levels are too high in the blood. They are also a major nuisance in cardiology practice and research, because they mess with the LDL value coming from clinical labs. Whereas total cholesterol, HDL cholesterol (“good cholesterol”), and triglycerides are easily measured and routinely quantified in blood, measuring LDL requires ultra-powerful centrifuges. Because these are big and expensive pieces of equipment, LDL is not routinely measured, but calculated instead, based on total cholesterol, HDL, and triglycerides. The 50-year-old equation, called the Friedewald equation, actually underestimates the level of LDL when the triglyceride level is very high and/or when LDL is very low. The newer Martin-Hopkins equation gives more accurate results, but the fact that there are more than two equations floating around, and a lot of inconsistency in which one each lab uses, adds one more complication to any clinical trial. And for huge clinical trials involving tens of thousands of patients, or ones that have been collecting data over many decades, this inconsistency complicates things a lot.
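
To make the arithmetic concrete, here is a minimal sketch of the calculation in Python. The Friedewald equation, in mg/dL, estimates LDL as total cholesterol minus HDL minus triglycerides divided by 5, and it is generally not used once triglycerides reach 400 mg/dL. The Martin-Hopkins approach keeps the same structure but swaps the fixed 5 for a divisor looked up from a table based on the patient's non-HDL cholesterol and triglycerides. That table is not reproduced here; the divisor of 7.5, the patient values, and the function names below are illustrative assumptions made up for this example.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL cholesterol (mg/dL) with the Friedewald equation."""
    if triglycerides >= 400:
        # The equation is generally considered unreliable in this range.
        raise ValueError("triglycerides too high for the Friedewald estimate")
    return total_chol - hdl - triglycerides / 5.0


def adjustable_ldl(total_chol, hdl, triglycerides, divisor):
    """Same structure as Friedewald, but with a caller-supplied divisor.

    Martin-Hopkins picks the divisor from a published lookup table;
    this sketch takes it as an argument instead of reproducing the table.
    """
    return total_chol - hdl - triglycerides / divisor


# Hypothetical patient: low LDL, high triglycerides (all values in mg/dL).
total_chol, hdl, triglycerides = 130, 40, 300

fixed = friedewald_ldl(total_chol, hdl, triglycerides)
# A divisor of 7.5 is illustrative only, not a value taken from the real table.
adjusted = adjustable_ldl(total_chol, hdl, triglycerides, divisor=7.5)

print(f"Fixed divisor of 5:   LDL estimate {fixed:.0f} mg/dL")
print(f"Illustrative divisor: LDL estimate {adjusted:.0f} mg/dL")
```

With these made-up inputs, the two estimates come out at 30 and 50 mg/dL from the same blood sample, which hints at how mixing equations across labs can add noise to a trial's LDL data.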

And then there’s the placebo issue. The REDUCE-IT trial used mineral oil as the placebo given to the control group. This seemed to make sense, since it looked a lot like icosapent ethyl oil, which was the active ingredient in the pills given to the test group. But, as it turned out, the mineral oil was not so inert. It caused elevated levels of LDL as well as of a marker for inflammation called high-sensitivity C-reactive protein (hsCRP). The subsequent STRENGTH trial replaced the mineral oil with corn oil. That was a good call, but the treatment tested was not icosapent ethyl, the highly purified formulation of eicosapentaenoic acid. Instead, it was a combination of docosahexaenoic acid and eicosapentaenoic acid, albeit a more purified combination of these two fatty acids than you can expect from standard fish oil pills.
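
Why does it matter if a placebo is not inert? Here is a minimal sketch with made-up numbers. A trial's headline result is usually a comparison between arms, so if the control arm fares slightly worse than it would on a truly inert placebo, the same treatment arm looks better by contrast. The event rates below are hypothetical and are not taken from REDUCE-IT, STRENGTH, or any other trial, and the sketch makes no claim about how much, if at all, mineral oil affected the real results.

```python
def relative_risk_reduction(control_rate, treatment_rate):
    """Fraction by which the treatment arm's event rate is lower than the control arm's."""
    return 1.0 - treatment_rate / control_rate


# Hypothetical event rates (fraction of participants who have an event).
treatment_rate = 0.17       # treatment arm
inert_control_rate = 0.20   # control arm on a truly inert placebo
placebo_harm = 0.02         # assumed extra events caused by a non-inert placebo

rrr_inert = relative_risk_reduction(inert_control_rate, treatment_rate)
rrr_non_inert = relative_risk_reduction(inert_control_rate + placebo_harm, treatment_rate)

print(f"Inert placebo:     {rrr_inert:.1%} relative risk reduction")
print(f"Non-inert placebo: {rrr_non_inert:.1%} relative risk reduction")
```

In this toy example, the apparent benefit grows from about 15 percent to about 23 percent even though nothing about the treatment arm has changed, which is why the choice of placebo can make two otherwise similar trials hard to compare.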

All of this might make studies of heart disease and nutritional agents sound like the horrid Hydra of ancient Greek mythology: the water serpent with multiple heads that grows new ones each time Hercules succeeds at cutting one off. It’s daunting and laborious to even try to tackle such a monster! So if the question is whether we’ll ever see a large clinical trial of this type reproducing findings of a previous study in all aspects, the answer is no. But perhaps modern medicine doesn’t need to attempt the impossible Herculean task of taming the Hydra. If the goal is good medical science, perhaps it’s enough to take incremental steps toward heart health characterized by an overlap of findings from one study to the next that along the way inform adjustments made to drug and lifestyle recommendations, research methodologies, and overall progress. 

Perhaps we should replace our shining-beacon-on-a-hill goal of reproducibility above all else with something different. Given the size of these trials, changes in demographics of people participating in them, and our unfolding knowledge of the underlying science, clinical trials for heart disease are most likely always going to be rather difficult and awfully messy. But the process of the randomized, double-blind study is still the best pathway ahead.
