Systematic Review claims acupuncture as effective as antidepressants: Part 1: Checking the past literature


A recent systematic review in PLOS One raised the question whether acupuncture and other alternative therapies are as effective as antidepressants and psychotherapy for depression. The authors concluded

 differences were not seen with psychotherapy compared to antidepressants, alternative therapies [and notably acupuncture] or active intervention controls

or, to put it differently,

antidepressants alone and psychotherapy alone are not significantly different from alternative therapies or active controls.

There are clear messages here. To consumers: Why take antidepressants, with their long delay and uncertain benefits but immediate side effects and potential risks, when a few sessions of acupuncture work just as well? To promoters of acupuncture and alternative therapies: you can now cite an authoritative review in the peer-reviewed PLOS One as scientific evidence that your treatments are as effective as scary antidepressants and time-consuming psychotherapy when you make appeals to consumers and to third-party payers.

The systematic review had five co-authors, three of whom have been involved in previous meta-analyses of the efficacy of antidepressants. However, fourth author Irving Kirsch will undoubtedly be the author most recognizable to consumers and policymakers, largely because of his relentless media campaign claiming that antidepressants are essentially worthless, no better than placebo. For instance, in an interview with CBS's 60 Minutes:

Irving Kirsch: The difference between the effect of a placebo and the effect of an antidepressant is minimal for most people.

Lesley Stahl: So you’re saying if they took a sugar pill, they’d have the same effect?

Irving Kirsch: They’d have almost as large an effect and whatever difference there would be would be clinically insignificant.

The article opens up a new front in the antidepressant wars. Past claims, notably from Irving Kirsch, were that any advantage of antidepressants over pill placebo was clinically trivial. Now he and his co-authors assert that antidepressants do have an advantage over placebo, but are only equivalent to psychotherapy or to alternative treatments such as exercise and acupuncture.

At first glance, this article proceeds methodically and consistently to its conclusions. I’m sure that many consumers and journalists will be persuaded unless they have strong preconceptions or an intimate familiarity with the existing literature. However, when I gave the article repeated close reads and checked citations against what was actually said in cited articles, I was struck by the

  • Inconsistencies with what is said in high quality available literature.
  • Unorthodox use of meta-analysis and voodoo statistics.
  • Idiosyncratic use of common terms that usually mean something else.
  • Ideologically driven conclusions, i.e., a persistent commitment to previously strongly stated opinions in the face of contradictory evidence.

The article lacks transparency (i.e., it omits basic details of what was done with which studies), leaving readers either to accept the authors’ conclusions or to undertake a labor-intensive independent search of the literature without sufficient clues from the article as to where to look. Exploring these problems together, we can learn something about the use of systematic reviews as propaganda and marketing tools. Journalists covering the antidepressant war, please take note.

Yet I was also left with unsettling concerns about the vulnerability of the PLOS journals to cleverly packaged poor science in editing and peer review, and about the inability of the PLOS journals to give proper recognition to post publication peer commentary and correction. Lapses in editing and peer review happen all the time, even in high impact traditional journals, but the issue is how readily and prominently they can be corrected. I am an Academic Editor of PLOS One and donate my time because I recognize the fallibility of prepublication peer review and am committed to post publication peer commentary. I think, though, that this article highlights the need to fix some key aspects of the PLOS journals’ current editorial policies so that proper recognition can be given to post publication peer review in indexed, citable commentary, not just in blogs or elsewhere. That does not happen now.

Here’s how I will build my case, over two blog posts, that this article presents a distorted view of antidepressants versus acupuncture for depression and draws unwarranted conclusions. In this first post, I will compare what the article says about what is known about antidepressants and acupuncture with what I found in key sources. In the second post, I will examine how this systematic review was conducted and reported, and the deviations from best practices by which the authors reached their foregone conclusions. There I will also draw upon points that I extract from the literature discussed in this first post.

One of the many advantages of open access journals like PLOS One is that anyone in the world with Internet access can readily obtain copies of articles without charge. This spares bloggers from having to provide lots of details so that readers can evaluate their claims independently. I will capitalize on this by encouraging readers to go directly now to the open access article and form their own opinions. In this first of two posts on this topic, I mainly summarize how the authors frame the research questions in the introduction and how, in their conclusion, they discuss the fit of their findings with the existing literature. But, please, don’t depend solely on what I say when you can readily look for yourself at the open access article and, if you’d like, follow up on the citations there.

What the article says

The Introduction claims

a number of recent articles have emphasized the inability of antidepressant medication to consistently demonstrate superiority to placebo pills.

As for psychotherapies for depression, they

have also come under scrutiny for the inability to demonstrate substantial superiority to various treatment controls as opposed to waiting-list (no treatment) controls. Similarly, “alternative therapies such as acupuncture and exercise have shown promise of individual published studies…, [But] the profile is less impressive according to independent review such as Cochrane reviews… and those conducted by the National Institute of Health and Clinical Excellence.”

Where does this leave us?

Given this level of ambiguity, it is unclear if pharmacological treatments are any better or worse and psychotherapies or psychotherapies are any better than non-traditional treatments such as exercise and acupuncture.

Thus, the groundwork is laid for the current systematic review.

The Discussion restates its context:

Much has been made of the inability of antidepressants to demonstrate clinically significant superiority to placebo in antidepressant clinical trials.

Then comes the key conclusion of the study:

Although antidepressants alone and psychotherapy alone did differ significantly from placebo controls, treatment-as-usual and waiting list controls, they did not different from alternative therapies such as exercise and acupuncture or active treatment control procedures.


Although combination therapy did not statistically separate from exercise and acupuncture in the blinded trials, these alternative therapies themselves were not statistically superior to placebo… This may be due to the small number of trials evaluating these.

And getting provocative,

Our results indicate that in acute depression trials using blinded raters the combination of psychotherapy and antidepressants may provide a slight advantage whereas antidepressants alone and psychotherapy do not significantly different [sic] from alternative therapies such as exercise and acupuncture or active intervention controls such as bibliotherapy or sham acupuncture.

Consumers, journalists, and promoters of alternative therapies, please take note of what conclusions come from a comprehensive review of the scientific evidence.
Comparison with the existing literature: Antidepressants

Let’s start with the phrase in the introduction, “inability of antidepressant medication to consistently demonstrate superiority to placebo,” and its echo in the discussion, the “inability of antidepressants to demonstrate clinically significant superiority to placebo in antidepressant clinical trials.”

The authors cite four sources, two of them by the authors of this PLOS One article. The first source that is cited, “Evidence b(i)ased medicine – selective reporting from studies sponsored by pharmaceutical industry…” was important in demonstrating publication bias in antidepressant studies submitted to the Swedish drug regulatory authority, but it’s irrelevant to evaluating the efficacy of antidepressants in clinical practice. The Swedish authority evaluates whether antidepressants achieve a 50% reduction in scores on the Hamilton Depression Rating Scale in four-to-eight week trials. In such short trials, placebo effects are maximized and drug effects often have not yet occurred, although side effects are occurring. So, this article really can’t be entered into evidence as to the inability of antidepressant medication to demonstrate efficacy.

The second cited source, “Listening to Prozac but hearing placebo…” was co-authored by Irving Kirsch. The most relevant and provocative claim in the article is

a confident estimate that the response to inert placebos is approximately 75% of the response to active antidepressants medication. Whether the remaining 25% of the drug response is a true pharmacological effect or enhance placebo effect cannot yet be determined…

Citing this article rather than others, particularly Kirsch’s later high-profile article in PLOS Medicine, is odd. I have blogged about the cited article elsewhere (1,2), and I will shamelessly draw on those blog posts in the following comments.

In one of my blog posts, I stated

The article is quite dense and difficult to follow in its twists and turns in logic and sudden declarations of definitive conclusions. Kirsch and Saperstein claimed [their] interpretations were based on rather complicated meta-analyses, which were described in a way that it is unlikely that anyone else could replicate them. Their examination of past reviews and computer searches had yielded 1500 publications, but only 20 were retained as meeting inclusion criteria. What was important technically was that calculation of within-condition effect sizes was nonstandard, and involved calculating for measures of depression the mean posttreatment score minus the mean pretreatment score, divided by the pooled standard deviation.
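To make the nonstandard calculation concrete, here is a minimal Python sketch of a within-condition effect size as described above: the mean posttreatment score minus the mean pretreatment score, divided by a pooled standard deviation. The scores are invented toy values, and pooling the pre and post standard deviations of the same arm is my assumption about the details, which the original article did not make easy to check. The last step shows how a ratio of two such separately computed effect sizes yields a "placebo response is X% of the drug response" figure of the kind Kirsch and Saperstein reported.

```python
import math
from statistics import mean, stdev


def pooled_sd(a, b):
    """Pooled standard deviation of two samples, weighted by degrees of freedom."""
    na, nb = len(a), len(b)
    var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return math.sqrt(var)


def within_condition_es(pre, post):
    """Nonstandard within-condition effect size: (mean post - mean pre) / pooled SD.

    Assumption: the SD is pooled from the same arm's pre and post scores.
    On a depression scale, a negative value means scores fell (improvement).
    No comparison group enters this calculation anywhere.
    """
    return (mean(post) - mean(pre)) / pooled_sd(pre, post)


# Hypothetical toy scores for one drug arm and one placebo arm.
drug_pre, drug_post = [24, 26, 22, 25, 23], [12, 14, 10, 15, 11]
placebo_pre, placebo_post = [25, 23, 24, 26, 22], [15, 16, 13, 17, 14]

es_drug = within_condition_es(drug_pre, drug_post)
es_placebo = within_condition_es(placebo_pre, placebo_post)

# The "placebo share of the drug response" is just a ratio of two
# uncontrolled pre-post effect sizes computed in different groups.
placebo_share = es_placebo / es_drug
print(f"drug ES={es_drug:.2f}, placebo ES={es_placebo:.2f}, share={placebo_share:.0%}")
```

Nothing in this arithmetic distinguishes improvement caused by the pill or the placebo ritual from improvement that would have happened anyway, which is exactly the gap the commentaries below seize on.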

The article was published in a soon-to-be defunct Internet journal, Prevention and Treatment, which was suffering from a lack of submissions. The action editor, psychologist Annette Stanton, later conceded on the SSCPnet listserv that she was in over her head in selecting reviewers and making editorial decisions about the article. The article was accompanied by scathing but authoritative commentaries. One of the commentaries was by psychiatrist Don Klein, who resigned as founding co-editor of Prevention and Treatment because of its editorial policies. He did not mince words in describing the Kirsch article as relying on

a miniscule group of unrepresentative, inconsistently and erroneously selected articles arbitrarily analyzed by an obscure, misleading effect size… The attempt to further segment the placebo response, by reference to psychotherapy trials incorporating waiting lists, is confounded by disparate samples, despite Kirsch and Saperstein’s claim of similarity.

He concluded by labeling the article as representing a failure of peer review.

Psychologist Robyn Dawes’ brief critique of Kirsch’s article was almost as scathing as Klein’s and zeroed in on an odd way of calculating effect sizes:

“the simple posttreatment minus pretreatment difference in an outcome variable for treatment group over placebo group does not define either a treatment effect or a placebo effect, even for groups randomly constructed. Such differences must be compared with the difference obtained from a (randomly selected) no-treatment group in order to evaluate the effect of treatment or placebo… Effect involves a comparative judgment, not just a pre-post one. And even when a legitimate effect is found for placebo, it often (almost always?) makes little sense to talk of a proportion of a treatment effect has been accounted for by a placebo effect. The logic of Kirsch and Saperstein (1998) is the seriously flawed. Science (like art and life) is not that easy.”

Dawes didn’t let up

the unusual—to use a kind word—nature of this analysis is best illustrated in the first sentence of the discussion section… “No-treatment effect sizes and effect sizes for the placebo response were calculated from different sets of studies.” There’s no such thing as a no treatment effect size.

Dawes ends with:

If knowledge were as simply attained as implied by Kirsch and Saperstein’s article – by just defining effect sizes as a pre-post comparison—it would not even be necessary to randomize. We would know enormous amounts more than we currently do. Unfortunately, we don’t.
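Dawes’ objection can be illustrated with a few lines of arithmetic. The Python sketch below uses hypothetical mean change scores (post minus pre, negative meaning improvement) for a drug arm, a placebo arm, and a randomized no-treatment arm; none of these numbers come from any trial, and the function and arm names are illustrative. The naive pre-post ratio attributes most of the drug response to placebo, whereas the between-group contrasts Dawes calls for estimate the drug effect (drug minus placebo) and the placebo effect (placebo minus no treatment) against proper comparison groups.

```python
def contrasts(change):
    """Compare a naive pre-post ratio with between-group contrasts.

    `change` maps arm name -> mean change score (post minus pre);
    negative values mean improvement. Arm names are hypothetical.
    """
    naive_share = change["placebo"] / change["drug"]
    drug_effect = change["drug"] - change["placebo"]             # drug vs. placebo
    placebo_effect = change["placebo"] - change["no_treatment"]  # placebo vs. no treatment
    return naive_share, drug_effect, placebo_effect


# Hypothetical mean changes on a depression scale; not data from any study.
arms = {"drug": -10.0, "placebo": -8.0, "no_treatment": -5.0}
share, drug_effect, placebo_effect = contrasts(arms)
print(f"naive 'placebo share' of drug response: {share:.0%}")
print(f"drug vs placebo: {drug_effect:+.1f} points")
print(f"placebo vs no treatment: {placebo_effect:+.1f} points")
```

In this toy example, only 3 of the 8 points of improvement in the placebo arm are attributable to placebo once the no-treatment arm is subtracted; the naive ratio silently folds the other 5 into the "placebo response."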

In my next blog post, in which I get into the specifics of the acupuncture versus antidepressant question, we will find that the new article is subject to the same criticisms that Don Klein and Robyn Dawes leveled at the earlier Kirsch study. Apart from studies by Kirsch, I haven’t been able to find examples in the literature in which summary effect sizes are calculated separately for active treatments and placebos and then compared. And we will be finding more flaws than just that. Now, back to the literature review….

The next article that is cited is by Fournier and colleagues and is available open access. The article is notable in claiming

The magnitude of benefit of antidepressant medication compared with placebo increases with severity of depression symptoms and may be minimal or nonexistent, on average, in patients with mild or moderate symptoms. For patients with very severe depression, the benefit of medications over placebo is substantial.

It ends with a complaint that in efforts to promote antidepressant treatment

There is little mention of the fact that efficacy data often come from studies that exclude precisely those MDD patients who derive little specific pharmacological benefit from taking medications.

The article has received considerable media attention, although a lot less than the work of Irving Kirsch. However, it has also been subject to withering criticism. Among the most telling criticisms:

  • The systematic search of the literature identified 2164 citations, of which 281 were retrieved, but only six studies were included in the analysis because of the need for individual patient-level data.
  • The analysis was restricted to trials of only two antidepressants, imipramine and paroxetine. Imipramine is not a first-line antidepressant, and use of paroxetine is restricted because of concerns about side effects and a withdrawal syndrome; its use is also discouraged in pregnant and nursing women.
  • Only a small proportion of the patients entered into these trials scored in the mild range of depression, and so even the authors concede that results may not generalize to mild depression.
  • The claim of “minimal or nonexistent” benefits for antidepressants rested on an arbitrary criterion: to count as anything more, the benefit had to reach an effect size of at least .5.

Harriet Hall has already blogged about the effect sizes of antidepressants and provided an excellent link to a discussion of effect size (ES), and so I won’t reproduce her arguments here. However, I will point out that the criterion of ES > .5 originally came from Jacob Cohen’s proposed cutoff for a medium effect size, which he warned (page 567) was quite arbitrary and ran the risk of being misunderstood. I asked psychiatrist Simon Gilbody, who has conducted key meta-analyses of enhancements of care for depression, what he thought about this criterion, and he replied:

“I never believe any study these days which finds an ES > 0.5. it usually means the science is wrong and we haven’t found the bias(es) just yet.”
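For readers who want to see where a figure like .5 sits, here is a minimal Python sketch of the standard between-group effect size, Cohen’s d (difference in group means divided by the pooled standard deviation), labeled with Cohen’s conventional benchmarks of 0.2 (small), 0.5 (medium), and 0.8 (large), which he himself described as arbitrary. The function names and sample scores are hypothetical, the "negligible" label below 0.2 is my own convenience, and published meta-analyses often apply a small-sample correction (Hedges’ g) that this sketch omits.

```python
import math
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference between two independent groups (Cohen's d)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def cohen_label(d):
    """Cohen's conventional (and, by his own account, arbitrary) benchmarks.

    "negligible" for |d| < 0.2 is a convenience label, not Cohen's term.
    """
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical improvement scores (higher = more improvement) in two groups.
treated = [10, 12, 14]
control = [8, 10, 12]
d = cohens_d(treated, control)
print(f"d = {d:.2f} ({cohen_label(d)})")
print(f"0.5 is labeled {cohen_label(0.5)}")
```

The point of the exercise is that the category boundaries are conventions: an effect size just under .5 is not meaningfully different from one just over it.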

The third of the four articles is co-authored by Erick Turner, a psychiatrist and former FDA reviewer of antidepressant data submitted by Pharma. The NEJM article involved a meta-analysis of published and unpublished antidepressant trials submitted to the FDA as part of the drug approval process.

According to analyses of data obtained from the FDA, only half of the trials were positive, but according to the published journal articles, almost all of the trials were positive. In almost a dozen cases, trials that were negative or questionable according to the FDA were published as if they were positive. Once unpublished reports were taken into account, the overall effect size (ES) for antidepressants relative to placebo fell from 0.41 to 0.31, where ES is the standardized mean difference in improvement. However, in all cases, antidepressants were significantly superior to pill placebos. So, Turner and colleagues found that antidepressants were not all they are cracked up to be, but they did not dismiss them as ineffective. As Turner was later quoted, “the glass is far from full but far from empty.”
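Why does adding unpublished trials pull the summary effect size down? A minimal Python sketch of a fixed-effect, inverse-variance pooled estimate makes the mechanism visible: each trial’s standardized mean difference is weighted by 1/SE², so adding trials with smaller effects shifts the weighted average toward them. The trial values below are invented, and this simple fixed-effect model is an assumption for illustration, not necessarily the method Turner and colleagues used.

```python
import math


def pool_fixed_effect(trials):
    """Fixed-effect inverse-variance pooling.

    `trials` is a list of (effect_size, standard_error) pairs.
    Returns (pooled_effect, pooled_standard_error).
    """
    weights = [1.0 / se ** 2 for _, se in trials]
    total = sum(weights)
    pooled = sum(w * es for w, (es, _) in zip(weights, trials)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical trials: (standardized mean difference, standard error).
published = [(0.45, 0.15), (0.40, 0.12), (0.38, 0.14)]
unpublished = [(0.10, 0.15), (0.15, 0.13)]

es_pub, se_pub = pool_fixed_effect(published)
es_all, se_all = pool_fixed_effect(published + unpublished)
print(f"published only: {es_pub:.2f} (SE {se_pub:.2f})")
print(f"all trials:     {es_all:.2f} (SE {se_all:.2f})")
```

In this toy example the pooled estimate falls but stays above zero, and its standard error shrinks because more data are included: a smaller effect, estimated more precisely, rather than no effect at all.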

The fourth source cited in the opening statements of the antidepressant/acupuncture meta-analysis is by two of its authors, Khan and Brown. It uses the FDA administrative database to examine how the severity of depression relates to response to antidepressants, and concludes that the severity of initial symptoms affects clinical trial outcomes. While it doesn’t seem to present an overall effect size for antidepressants versus placebo, it notes that 48% of clinical trials showed the superiority of investigational antidepressants over placebo, and 64% of clinical trials showed superiority for established antidepressants relative to pill placebo.

It’s noteworthy which meta-analyses by the authors of the antidepressant/acupuncture meta-analysis were not acknowledged in that article. A PLOS Medicine meta-analysis by Kirsch is among the most highly cited articles in that journal and has been widely discussed in the media. However, its claim that the relative advantage of antidepressants over placebo is minimal and insignificant would seem to contradict the claim in the present antidepressant/acupuncture article that antidepressants are significantly superior to pill placebo, although no better than acupuncture. Furthermore, a 2005 study by two of the co-authors of the antidepressant/acupuncture meta-analysis involved 329 depressed patients from their antidepressant clinical trials center who had been entered in 15 multicenter trials. Effect sizes were .51 in the low/moderate group, .54 in the high/moderate group, .77 in the moderately severe group, and 1.09 in the severe group. Effect sizes obtained in multisite trials are known to vary across specific sites, but these relatively strong effects would seem to undermine claims that antidepressants are no better than placebo or no better than acupuncture.

In sum, there seems to be a marked difference between how the antidepressant/acupuncture article portrays the past literature on antidepressants versus placebo and what that literature actually says, and even between the past work of the authors of this article and what they say here. A commentary in the British Medical Journal co-authored by Erick Turner, on the discrepancy between Kirsch’s interpretation of similar results and Turner’s own, aptly notes: “we agree that the antidepressant ‘glass’ is far from full, we disagree that it is completely empty.”


Comparison with the existing literature: Acupuncture

The PLOS article comparing acupuncture and antidepressants states:

“Alternative therapies such as acupuncture and exercise have shown promise… [But] the profiles are less impressive according to independent reviews”.

Let’s see how much less impressive. An updated Cochrane review (2010) found insufficient evidence to recommend acupuncture for depression, and insufficient evidence of a consistent beneficial effect from acupuncture compared to wait list or sham acupuncture controls. There were obvious difficulties, not readily overcome, in blinding providers and patients as to whether medication or acupuncture was received. Further, there was a high risk of bias across studies. Overall, the study of acupuncture for depression was characterized by a lack of consistently rigorous scientific study and by poor reporting. There was unexplained heterogeneity, i.e., inconsistency of results across studies that could not readily be explained. Most of the studies were from China, including all of the antidepressant/acupuncture comparisons, with participants largely recruited from inpatient populations. In contrast, very few of the American RCTs comparing antidepressants to pill placebo that have been entered into meta-analyses involved inpatient samples.
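“Unexplained heterogeneity” has a specific quantitative meaning in meta-analysis. A common way to gauge it is Cochran’s Q, the weighted sum of squared deviations of each study’s effect from the fixed-effect pooled estimate, together with I² = max(0, (Q - df) / Q), the share of variability across studies attributed to real differences rather than chance. The minimal Python sketch below computes both for invented study results; it is not a reanalysis of the Cochrane data, and the function name is hypothetical.

```python
def heterogeneity(trials):
    """Cochran's Q and I^2 for a list of (effect_size, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in trials]
    pooled = sum(w * es for w, (es, _) in zip(weights, trials)) / sum(weights)
    q = sum(w * (es - pooled) ** 2 for w, (es, _) in zip(weights, trials))
    df = len(trials) - 1
    # I^2 is truncated at zero when Q does not exceed its degrees of freedom.
    i2 = (q - df) / q if q > df else 0.0
    return q, i2


# Hypothetical studies that disagree sharply with one another.
inconsistent = [(1.20, 0.20), (0.05, 0.18), (0.90, 0.25), (-0.10, 0.22)]
# Hypothetical studies that broadly agree.
consistent = [(0.30, 0.20), (0.35, 0.18), (0.28, 0.25), (0.32, 0.22)]

for label, studies in (("inconsistent", inconsistent), ("consistent", consistent)):
    q, i2 = heterogeneity(studies)
    print(f"{label}: Q = {q:.1f}, I^2 = {i2:.0%}")
```

A high I² says the studies disagree more than chance allows; it does not say why, which is what makes such heterogeneity "unexplained" and any single pooled number hard to interpret.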

A more recent systematic review of systematic reviews of acupuncture for depression found that all of the positive reviews and most of the positive primary studies originated from China and that there was reason to believe that the reviews were less than reliable. The authors concluded that acupuncture was essentially unproven as a treatment for depression.

A systematic review and meta-analysis of Chinese randomized trials concerning the antidepressant venlafaxine concluded that all trials were of low quality and underpowered.

Yet another systematic review revealed an astonishing publication bias in randomized trials of acupuncture more generally: no trial published in China, or for that matter in Russia/USSR, had found an acupuncture treatment to be ineffective relative to a comparison treatment.

An interview study with the authors of 2235 apparently randomized trials published in China’s approximately 1100 medical journals found that fewer than 10% adhered to accepted methodology for randomization and only 6.8% could be deemed authentic randomized controlled trials.

To say the least, these reviews suggest the need for caution in accepting data from China and in synthesizing and comparing data from RCTs conducted in China with those conducted in the West without some consideration of quality of the trial and risk of bias. Given the quality of the literature, we are not even in a position to evaluate acupuncture for depression, much less make statements about its efficacy relative to antidepressants or placebo.

In summary, I don’t think that readers would anticipate what the existing literature says from how it is portrayed in the acupuncture/antidepressant meta-analysis. Furthermore, we can anticipate some formidable challenges in integrating data from antidepressant versus pill placebo trials with data from acupuncture trials and from acupuncture-versus-antidepressant comparisons. There appear to be few head-to-head comparisons, and the control conditions in acupuncture trials and antidepressant trials lack comparability. Both patients and providers are well aware of treatment assignment in ways that cannot be blinded, so the repeated references in the antidepressant/acupuncture article to controlling for blinding stand out as odd. Finally, there is good reason to doubt the quality and credibility of acupuncture studies conducted in China, where most such studies are conducted. In my next blog post, we will see how these problems were addressed or ignored.
