Tuesday, September 9, 2014

Corruption of the Evidence Base in Evidence-based Medicine

"‘Published’ and ‘true’ are not synonyms" ~ Brian Nosek, psychology professor at the University of Virginia in Charlottesville

Publish or perish. Obtain outside funding for research or lose your teaching position. 

Academic medicine and psychology have always been like that to some extent, but it has been getting worse and worse lately. It’s a wonder anyone wants to become an academic these days. In academic medicine, there has also been a new push: invent something you can patent, like a new drug or device, that will make a profit for the university.

Is it any wonder that some academics start cheating when their livelihoods depend on it and they are under this sort of pressure? Or that business interests would try to take advantage of their plight to enhance their own sales and their bottom line? This sort of hanky-panky has been increasing at an alarming rate.

Now of course, I am not arguing against the practice of doing clinical research and randomized controlled studies of various treatments, or against experimental psychology. These activities remain important even in light of all the corruption that is going on. Such research is one of the major differences between real scientists and snake oil salesmen, like those we see in much of the so-called “complementary and alternative medicine” industry. And just because a study is industry-funded, that does not automatically mean that it is dishonest and not to be trusted.

What the increasing level of corruption means is that we have to pay more and more careful attention to the details of the studies that do make it into print.

First, we have to be on the lookout for outright fraud. An article published in the Proceedings of the National Academy of Sciences by Fang, Steen, and Casadevall (October 16, 2012, 109 [42], pp. 17028-17033) found that the percentage of scientific articles that have been retracted because they were found to contain outright fraudulent data has increased about tenfold since 1975!

Journals also retract articles because of problems with a study that do not involve actually faking data, but the Fang article found that only 21.3% of retractions were attributable to innocent errors. In contrast, 67.4% of retractions were attributable to misconduct, including fraud or suspected fraud (43.4%), duplicate publication (14.2%), or plagiarism (stealing other people's material) (9.8%).

The authors also found that journals often soft-pedal the reasons for any retractions that they do make. Incomplete, uninformative, or misleading retraction announcements have led to past underestimates of the role of fraud in the ongoing retraction epidemic. Zoe Corbyn of the journal Nature opined that authors and journals may use opaque retraction notices to save face or avoid libel charges.

Second, we have to pay more attention to the design of the studies, the outcome measures used, and the statistical tricks employed to arrive at the study’s conclusions. We have to look to see if the abstract of a study, which is what most practitioners read if they read anything at all, actually summarizes the findings correctly.

We have to look closely at whether the results are suspect because of the way the sample of subjects was selected and/or screened. I described in a previous post an excellent example of authors completely mischaracterizing the sample of subjects in a journal article published in the premier medical journal of our times.

Research in psychology and psychiatry has problems that are unique to those fields and which are very important. In fact, Cook and Campbell, in their book Quasi-Experimentation: Design and Analysis Issues for Field Settings, point out that randomized trials in our field are not truly experimental in the scientific sense, but are instead what they call “quasi-experimental.”

This is primarily because of a major problem in such studies that concerns the nature of subjects selected for a study.

People are by their very nature extremely complicated. True scientific experiments must assign subjects at random to various treatment or placebo groups. However, in the social sciences, subsets of research subjects are very likely to differ from each other in many ways other than the presence of the treatment whose effects are being tested.

Conclusions from studies about cause and effect are much more complicated in psychiatry and psychology than they are in, say, physics. In physics, the matter under study is isolated from external sources of influence. Obviously, most controlled studies in medicine do not keep the subject under wraps, and under the complete control of the experimenters, for months at a time. 

Second, in physics, other variables that change over time can be kept out of the experiment's environment. Not so with aspects of people’s lives. Third, measurement instruments in psychology are often based on highly subjective criteria such as self-report data or rather limited observations interpreted by the experimenter.

Cook and Campbell also show how experimenters can manipulate the subjects in ways that can determine in advance the results they are going to get. This is because experimenters are usually dealing with variables that are distributed continuously rather than classified one way or the other on the basis of some discrete characteristic. As examples, how much does someone have to drink in order to be classified as an alcoholic? How often do you have to engage in risky impulsive behavior to be classified as having poor impulse control?

Both potential causes and potential effects in psychology and psychiatry are distributed in the usual manner - in a bell-shaped distribution curve. Let's say that the variable on the X axis is how often subjects engage in risky behavior. Some people will rarely do so, others will do so often. Both extremes are seen infrequently in the population, however. Most people fall somewhere in the middle. So in determining whether one group of subjects (say people with a certain diagnosis) are more prone to risky behavior than another, where should we draw the line on the X axis in determining who has a problem with this and who does not?

As it turns out, a potential cause for any given effect can appear to be necessary (the effect never appears unless the cause does), sufficient (the presence of a cause alone is enough to produce the effect, although other causes may produce the effect as well), both, or neither in a given experiment depending on where the experimenter chooses to draw the line for both the cause and the effect in determining whether they are present or absent, as shown in the following graph:

At points A, the cause appears to be both necessary and sufficient. If points B are used, the cause appears to be necessary but not sufficient. Dichotomize the variables at points C, and the cause appears to be sufficient but not necessary! A tricky experimenter can use this knowledge to design the study in advance to get the results he or she wants to get.
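This effect is easy to demonstrate with a quick simulation. The sketch below is hypothetical (it is not Cook and Campbell's actual figure or data): it generates a continuous, roughly bell-shaped "cause" score and a correlated "effect" score, then dichotomizes both at different cutoffs. The same data can make the cause look necessary, sufficient, or neither, depending solely on where the lines are drawn.

```python
import random

random.seed(42)

# Hypothetical data: a continuous "cause" score and a correlated continuous
# "effect" score, both roughly bell-shaped, as with most traits in psychology.
n = 1000
cause_scores = [random.gauss(0, 1) for _ in range(n)]
effect_scores = [c + random.gauss(0, 0.5) for c in cause_scores]  # effect tracks cause, plus noise

def classify(cause_cut, effect_cut):
    """Dichotomize both variables at the chosen cutoffs and report whether
    the 'cause' then looks necessary and/or sufficient for the 'effect'."""
    pairs = [(c > cause_cut, e > effect_cut) for c, e in zip(cause_scores, effect_scores)]
    # Necessary: the effect never appears unless the cause does.
    necessary = all(has_cause for has_cause, has_effect in pairs if has_effect)
    # Sufficient: the presence of the cause alone is enough to produce the effect.
    sufficient = all(has_effect for has_cause, has_effect in pairs if has_cause)
    return necessary, sufficient

# Same subjects, three different places to "draw the line":
print(classify(-2.0, 2.5))  # low cause cutoff, high effect cutoff -> (True, False): looks necessary only
print(classify(2.5, -2.0))  # high cause cutoff, low effect cutoff -> (False, True): looks sufficient only
print(classify(0.0, 0.0))   # both cutoffs in the middle           -> (False, False): neither
```

Nothing about the underlying relationship changes between the three calls; only the arbitrary dichotomization points do, which is exactly the knob a tricky experimenter can turn in advance.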

In fact, there are probably no necessary or sufficient causes for most medical and psychiatric conditions, but only risk factors which increase the likelihood that a condition will appear. To steal an analogy from another field of medicine, there will always be people who smoke a lot but who do not get lung cancer, and there will always be people who never smoke who do.  

