NB: This is a partial posting; I was up all night ‘on-call’ and too tired to continue. I’ll post the rest of the essay later…
This is the fourth and final part of a series-within-a-series* inspired by statistician Steve Simon. Professor Simon had challenged the view, held by several bloggers here at SBM, that Evidence-Based Medicine (EBM) has been mostly inadequate to the task of reaching definitive conclusions about highly implausible medical claims. In Part I, I reiterated a fundamental problem with EBM, reflected in its Levels of Evidence scheme: although it correctly recognizes basic science and other pre-clinical evidence as insufficient bases for introducing novel treatments into practice, it fails to acknowledge that they are necessary bases. I explained the difference between “plausibility” and “knowing the mechanism.”
I showed, with several examples, that in the EBM lexicon the word “evidence” refers almost exclusively to the results of clinical trials: thus, when faced with equivocal or no clinical trials of some highly implausible claim, EBM practitioners typically declare that there is “not enough evidence” to either accept or reject the claim, and call for more trials—although in many cases there is abundant evidence, other than clinical trials, that conclusively refutes the claim. I rejected Prof. Simon’s assertion that we at SBM want to “give (EBM) a new label,” making the point that we only want it to live up to its current label by considering all the evidence. I doubted Prof. Simon’s contention that “people within EBM (are) working both formally and informally to replace the rigid hierarchy with something that places each research study in context.”
In Part II I responded to the widely held assertion, shared by Prof. Simon, that there is “societal value in testing (highly implausible) therapies that are in wide use.” I made it clear that I don’t oppose simple tests of basic claims, such as the Emily Rosa experiment, but I noted that EBM reviewers, including those employed by the Cochrane Collaboration, typically ignore such tests. I wrote that I oppose large efficacy trials and public funding of such trials. I argued that the popularity gambit has resulted in human subjects being exposed to dangerous and unethical trials, and I quoted language from ethics treatises specifically contradicting the assertion that popularity justifies such trials. Finally, I showed that the alleged popularity of most “CAM” methods, as irrelevant as it may be to the question of human studies ethics, has been greatly exaggerated.
In Part III I continued to argue against trials of implausible methods. I didn’t share Prof. Simon’s optimism, expressed in another post on his blog, that “research can help limit the fraction of CAM expenditures that are inappropriate.” I argued that whatever evidence there is suggests otherwise. I argued that if existing science is sufficient to reject a method, as is the case for much of “CAM,” then the research has already been done, and the task of EBM is to explain this kind of evidence, not to pretend that the jury is still out. I argued, furthermore, that efficacy trials of highly implausible, ineffective methods inevitably yield equivocal, rather than merely disconfirming, results, and that this leads to an endless cycle of further (equivocal) trials. I offered parapsychology as a longstanding example of “an immortal field of fruitless inquiry: a pathological science.”
I promised that in this final part of the series I would mention a few more points about Cochrane Reviews of highly implausible methods, and even report some reasons for slight optimism. Finally, I promised to respond briefly to this comment by Prof. Simon:
…how can we invoke scientific plausibility in a world where intelligent people differ strongly on what is plausible and what is not? Finally, is there a legitimate Bayesian way to incorporate information about scientific plausibility into a Cochrane Collaboration systematic overview(?)
The ongoing theme is that Cochrane Reviews ignore key ‘external’ evidence, by which I mean all evidence other than what might be found in randomized, controlled trials (RCTs). I’ve previously alluded to the 2002 Cochrane Review of “Chelation Therapy for Atherosclerotic Cardiovascular Disease,” which concluded,
At present, there is insufficient evidence to decide on the effectiveness or ineffectiveness of chelation therapy in improving clinical outcomes of people with atherosclerotic cardiovascular disease.
Elsewhere we have shown that the evidence against such effectiveness is substantial, far exceeding the evidence against Laetrile for cancer or against bilateral ligation of the internal mammary arteries for coronary disease, two long-since discredited methods that no biomedical researcher in his right mind would consider resurrecting for further trials. Oops, make that almost no biomedical researcher.
Quips aside, my reason for bringing up the chelation review again is as follows. The review acknowledges that there have been RCTs, involving about 250 subjects, which
…showed no significant difference in the following outcomes: direct or indirect measurement of disease severity and subjective measures of improvement.
That those findings weren’t sufficient reasons for the Cochrane reviewers to judge chelation ineffective was apparently due first to their having ignored the abundant non-RCT evidence, and second to their having been intrigued by a single, tiny RCT that reported a positive outcome:
One of the studies, which included only 10 participants, was interrupted prematurely, because of an apparent treatment effect. However, relevant data were not available in the report and have been requested from the authors.
I had to chuckle when I read that passage, because I know a lot about that study and its authors. If you look here and scroll down to “Olszewer (1988),” you will know, too. I wonder if the Cochrane Reviewers ever got a straight story from the authors. There hasn’t been an update of that review, so whether they did or not is anyone’s guess.
I’m sorry to say that I haven’t been able to get a copy of that review in its entirety, so my comments apply only to the abstract. I have recently obtained a few other complete reviews, of which two are worth mentioning. A 2008 review of “Touch therapies for pain relief in adults” looked at Healing Touch, Therapeutic Touch, and Reiki. It doesn’t mention the Emily Rosa experiment. Its conclusion is what we’ve come to expect:
Touch therapies may have a modest effect in pain relief. More studies on HT and Reiki in relieving pain are needed. More studies including children are also required to evaluate the effect of touch on children.
In this review we are told why it is that not touching is called “touch”:
Touch Therapies are so-called as it is believed that the practitioners have touched the clients’ energy field.
For readers who are unfamiliar with such practices, which consist of waving one’s hands over a “client” (you really have to see it to believe it), the review continues:
It is believed this effect occurs by exerting energy to restore, energize, and balance the energy field disturbances using hands-on or hands-off techniques (Eden 1993). The underlying concept is that sickness and disease arise from imbalances in the vital energy field. However, the existence of the energy field of the human body has not been proven scientifically and thus the effect of such therapies, which are believed to exert an effect on one’s energy field, is controversial and lies in doubt.
Indeed. The following passages are not to be read in detail, other than by the masochistic. Their purpose is to demonstrate the elaborate wheel spinning that EBM treatments of such fanciful methods inevitably involve. The perseveration of statistics, as if they can support the house of cards that is the basis for the technique, will, I hope, ruffle Prof. Simon’s own energy field:
Types of touch therapies
The effects of different kinds of Touch Therapies were examined (Comparison 3). It appears that all three types of Touch Therapy, HT, TT and Reiki, may decrease pain to a certain extent. Substantial heterogeneity exists among the HT group and TT group. The HT group (163 participants) had an I² of 76% and a P value = 0.04 (Chi-square) and the TT group (686 participants) had an I² of 70% and a P value < 0.00001 (Chi-square). The results for both the TT and the HT group indicate that there is significant heterogeneity and that the effects were positive. There were two studies in HT included in the analyses (Cook 2004; Post-White 2003). The pooled results showed that participants exposed to HT had, on average, 0.71 units less pain, however, this was not statistically significant (95% CI: -2.27 to 0.86). The pooled estimates of TT suggested a statistically significant result of 0.81 units (95% CI: -1.19 to -0.43) less pain in the exposed group. Nonsignificant heterogeneity was detected in the Reiki group (116 participants) which consisted of three studies (Dressen 1998; Olson 2003; Tsang 2007) with an I² of 7% and a P value = 0.34 (Chi-square). The pooled estimates of the results for Reiki reported a statistically significant effect. Participants exposed to Reiki had an average of 1.24 (95% CI: -2.06 to -0.42) less pain.
Experience of practitioner
The experience of the practitioner on the effects of touch therapies was also analyzed. This helped to explore whether a less experienced touch practitioner would result in less effect and an experienced touch practitioner would result in an increased effect. Subgroup analyses were thus performed. There are usually four levels of training in HT, TT and Reiki, level I, II, III and a master level or teacher level. Studies were divided into subgroups according to the level of training. Due to a small number of studies having reported the experience of touch practitioners, studies were divided into two groups, level I and II, and level III or above, rather than four groups. Four studies were included in the subgroup of less experienced practitioners (212 participants) (Blankfield 2001; Cook 2004; Frank 2003; Redner 1991). An I² of 2% and a P value = 0.38 (Chi-square) was found. In the subgroup of more experienced practitioners (116 participants), three studies were included (Dressen 1998; Olson 2003; Tsang 2007) and an I² of 7% and a P value = 0.34 (Chi-square) was found. No significant heterogeneity was detected in the two subgroups. Participants in the subgroup of less experienced practitioners had, on average, 0.47 units (95% CI: -0.73 to -0.22) less pain. More experienced practitioners yielded higher contribution, having on average of 1.24 units reduction in pain intensity (95% CI: -2.63 to -0.23) (Comparison 4). Minor non-significant heterogeneity existed in both the experienced and less-experienced group. This might suggest that any heterogeneity calculated in other subgroups was owing to the difference in the experience of the practitioners.
However, only a small number of identified studies were included in this subgroup and the apparent small heterogeneity may be owing to the low power of the chi-square test or due to having a small numbers of studies (Higgins 2005), it would not be appropriate to make conclusions about the existence of heterogeneity at this stage with the current results.
Dose-response analyses were also conducted to investigate if there was any difference in effect due to differences in duration of treatment. The study with the shortest session had the session lasting for five minutes while studies with the longest session lasted for ninety minutes. The number of treatment sessions ranged from a single session to ten sessions. Data regarding the duration and number of treatment sessions were pooled. Data were analyzed in terms of total duration of treatment (duration of a single session multiplied by the number of treatment sessions). Three hundred and ninety six participants exposed to touch for less than an hour had an average of 1.16 units (95% CI: -1.85 to -0.47) less pain; 239 participants exposed to touch for more than one hour but less than two hours had an average of 0.75 units (95% CI: -1.81 to 0.31) less pain, but this was insignificant; 255 participants exposed to touch for between two to three hours had an average of 0.47 units (95% CI: -1.09, 0.14) less pain; 116 participants exposed for over three hours to touch therapies had an average of 1.57 units less pain (95% CI: -2.38 to -0.76). No dose-response relationship can be gained as yet from this information (Comparison 7).
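For readers who would rather check the dose-response figures than wade through them, here is a minimal Python sketch. It assumes the sign convention implied by the quoted confidence intervals (a negative number means less pain), and the names `subgroups`, `excludes_zero`, and `is_monotonic` are illustrative only, not anything taken from the review. It asks two things: which intervals exclude zero, and whether the point estimates move in one direction as total treatment time increases.

```python
# Point estimates and 95% CIs as quoted in the passage above
# (pain units; negative = less pain). The labels and the sign
# convention are assumptions made for this illustration.
subgroups = [
    ("< 1 hour", 396, -1.16, (-1.85, -0.47)),
    ("1 to 2 hours", 239, -0.75, (-1.81, 0.31)),
    ("2 to 3 hours", 255, -0.47, (-1.09, 0.14)),
    ("> 3 hours", 116, -1.57, (-2.38, -0.76)),
]


def excludes_zero(ci):
    """True when the interval lies entirely on one side of zero."""
    low, high = ci
    return low > 0 or high < 0


def is_monotonic(values):
    """True when the values never increase, or never decrease, in order."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


significant = [excludes_zero(ci) for _, _, _, ci in subgroups]
effects = [effect for _, _, effect, _ in subgroups]
monotonic = is_monotonic(effects)

for (label, n, effect, ci), sig in zip(subgroups, significant):
    print(f"{label:>13}  n={n:<4} effect={effect:+.2f}  CI excludes zero: {sig}")
print("Effects change monotonically with dose:", monotonic)
```

Under those assumptions, only the shortest and the longest exposure groups have intervals that exclude zero, and the estimates shrink and then grow again, which is consistent with the reviewers’ own statement that no dose-response relationship emerges.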
Phew! (‘Dose-response analyses’?) At the end of the monograph are the first author’s acknowledgements:
I would like to thank Dr Yan-Kit Cheung, who is my Reiki teacher and an expert in complementary and alternative medicine, in giving me valuable advice in conducting this review…Last but not least, I would like to express my gratitude to Miss Wan-Choi Patsy Lee who brought me to Reiki. Without her, I would know nothing about Reiki and would not be enlightened by this precious gift.
I don’t know who should be more embarrassed: Cochrane, for having asked a devout believer to pass judgment on the sacred object of her belief, or me, for having stumbled upon her supplication.
*The Prior Probability, Bayesian vs. Frequentist Inference, and EBM Series:
16. What is Science?