RCT Plausibility Scale
After a few intro paragraphs, I want to present a scale of probability to estimate a value of a “prior” to plug into the formula for obtaining a Bayes Factor. The scale can help to estimate a value, but will still rely on an estimate, the non-quantitative element in Bayesian simulations. However, the checklist may at least provide some objective bases on which to hang a value, and that value would actually make a semi-quantitative statement of its own. Although that value would retain some subjective quality, it would at least be backed by known quantities and laws of nature.
Begging your patience again, I became aware of this problem in 1999 when asked to moderate an online (BioMednet.com) debate on “CAM” among 4 physicians. My role soon morphed into participant-debater when I could not get all to agree on what I thought was obvious common ground to proceed with the discussion – that 1) concepts that violate scientific laws do not have to be subjected to clinical trial (RCT) and that trial results had to be interpreted in light of previous knowledge; and 2) clinical trials could not constitute adequate evidence in the absence of plausibility because their results were too varied and inconsistent. The matter was precipitated by systematic reviews (SRs) showing efficacy of acupuncture in back pain. I was truly surprised when one of the participants (Dr. Edzard Ernst) assured me that indeed, RCTs were now the gold standard for efficacy. The debate went downhill from there.
I became fascinated with the disparity raised in the debate, and continued discussing the principle in an exchange in Academic Medicine with the then Director of NCCAM, Steve Straus, who wanted to use “rigorous trials” to prove or disprove “CAM” methods. I maintained that such an approach would lead to an infinite progression of indeterminacy.
Kim Atwood’s series on homeopathy and plausibility sums up our present state of knowledge and lack thereof, or – one could put in several different subjects – what’s wrong with Evidence Based Medicine (EBM); EBM’s inability to describe accurately the state of knowledge of implausible medical proposals; and our understanding of Bayes’s theorem applied to RCT outcomes and reviews of implausible and ineffective methods. I can’t think of much to add to his summaries. BTW, he may not have stated it, but he has been mulling this also for almost ten years to my knowledge, and if anyone has a handle on what’s wrong with the medical literature regarding sectarian “CAM” it’s Kim.
By this time most of us on this blog and some other colleagues recognize that EBM methods, including those used by the Cochrane Collaboration, are necessary but insufficient to reach a realistic expression of confidence in clinical trial results. Most of us are familiar with Ioannidis’s article and the way he has gone about this using a simplified Bayes method. Kim gave an example a week or two ago.
I have in my own mind used a few shortcuts to sectarian method evaluation. I am fond of shortcuts to help extract one from brambles of disputes about efficacy using insufficient and conflicting information. One way is to ask simply, how much worse off would we be or would a patient be if the method in question did not exist? It’s sort of a steal from O. W. Holmes’s famous comment on the materia medica contents of his time – the answer was they would have been better off without them. Regarding most methods that concern us today, the “CAM” ones, the answer is either better off or no worse off without them. In the case of methods with complications, like chiropractic and herbs, we’d be better off without them. With methods lacking bad effects such as homeopathy, we’d be just no worse off. But that assumes the methods are ineffective – which is obvious to us, but apparently not to others.
I will not go into the reason review experts (Cochrane’s and others) do not conclude ineffectiveness, and keep recommending “more clinical trials.” There is a reason, but that’s for another paper.
Kim’s and our problem, then, given RCTs already done, is how to establish ineffectiveness in the presence of conflicting information without submitting every nutty idea to infinite numbers of trials.
The answer we’re coming to is to apply a Bayes Factor to the reported P values in a way similar to that used by Steven Goodman and John Ioannidis. Goodman took a range of 3-4 possible values for a prior probability and calculated the posterior for each assigned prior. So, one had several possibilities from which to choose. Looking at his charts, the several possibilities are more revealing than one would have thought; the revised P value becomes much less significant in each of the examples. Kim Atwood presented another example a week or two ago.
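To make the Goodman-style calculation concrete, here is a minimal sketch in Python. It assumes Goodman’s published “minimum Bayes factor,” exp(-z²/2), where z is the normal deviate for the reported two-sided P value. That formula is not spelled out in this post. The function names and the list of priors are hypothetical, chosen only for illustration. The minimum Bayes factor is the most generous reading of the data for the treatment, so the posteriors it yields are upper bounds.

```python
import math
from statistics import NormalDist


def min_bayes_factor(p_value):
    """Minimum Bayes factor (null vs. alternative) for a two-sided P value.

    Assumes Goodman's exp(-z^2 / 2) form; smaller means stronger evidence
    against the null hypothesis.
    """
    if not 0 < p_value < 1:
        raise ValueError("p_value must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return math.exp(-z * z / 2)


def posterior_probability(prior, bayes_factor):
    """Posterior probability that the effect is real.

    prior        -- prior probability that the effect is real (0 to 1)
    bayes_factor -- evidence for the null relative to the alternative
    """
    if prior <= 0:
        return 0.0  # a prior of zero cannot be rescued by any data
    if prior >= 1:
        return 1.0
    null_odds = bayes_factor * (1 - prior) / prior
    return 1 / (1 + null_odds)


reported_p = 0.05
bf = min_bayes_factor(reported_p)
for prior in (0.5, 0.1, 0.01, 0.001):  # illustrative priors only
    post = posterior_probability(prior, bf)
    print(f"prior={prior:<6} posterior={post:.4f}")
```

Running the loop over several priors mirrors Goodman’s tables: the same reported P value supports very different conclusions depending on where one starts.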
But if one wanted to narrow the choices to one or two prior probability estimates, here is a checklist for use in estimating prior probabilities.
First, just list the usual classification of the scientific principles from the most basic to those dealing with the most complex, and grade the current evidence about the method on a scale from 0 to 10, with 0 as the least consistent or most inconsistent and 10 as most highly consistent with principles of each.
Physics (and mathematics)
Other complex sciences (geology, botany, astronomy, etc.)
Then, apply a negative integer, 0 to -10, according to how well or how completely the phenomenon can be explained by another known science or sciences – especially experimental psychology (suggestion, misperception) and social psychology (cognitive dissonance, mass hysteria, etc.). This maneuver takes advantage of known information that offers a hidden logical reason for any observed positive effect.
From here, there are a number of ways for adding, subtracting values, and the option for multiplying by 0 in the case of a highly conflicting basis such as homeopathy, so that no matter how many plus values there are, the answer would still be zero.
Taking the most implausible example, homeopathy, one would assign a 0 for physics (violation of the 1st and 2nd laws of thermodynamics and of Boyle’s and Charles’s laws of gases (fluids)), a 0 for chemistry (violation of the law of mass action), and a 0 or 1 for pharmacology (invalidity of the “law of similars”). Add the resulting values.
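The checklist arithmetic can be sketched as below. This is only one possible reading of the scheme: the function name, the dictionary keys, and the choice to add the penalty before applying the multiply-by-zero option are all assumptions, not a settled method. The homeopathy grades follow the example in the text; the penalty of -10 is a made-up value for illustration.

```python
def plausibility_score(consistency, alt_explanation_penalty=0, veto=False):
    """Combine checklist grades into one integer score.

    consistency             -- dict mapping a science to a 0..10 grade
    alt_explanation_penalty -- 0..-10, how fully psychology and similar
                               fields already explain the observed effect
    veto                    -- True multiplies the result by 0, for methods
                               that conflict with basic laws
    """
    for science, grade in consistency.items():
        if not 0 <= grade <= 10:
            raise ValueError(f"grade for {science} must be between 0 and 10")
    if not -10 <= alt_explanation_penalty <= 0:
        raise ValueError("penalty must be between -10 and 0")
    total = sum(consistency.values()) + alt_explanation_penalty
    return 0 if veto else total


# Grades taken from the homeopathy example in the text.
homeopathy = {"physics": 0, "chemistry": 0, "pharmacology": 1}

print(plausibility_score(homeopathy))
print(plausibility_score(homeopathy, alt_explanation_penalty=-10))
print(plausibility_score(homeopathy, veto=True))
```

Note that with a penalty applied the total can go negative. How a negative total should map onto a prior is left open here, as it is in the text.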
A more direct mathematical way would be to use a numerical scale from 0 to 1, with degrees of consistency expressed as a fraction/decimal (0.001, 0.1, 0.5, etc.) whose final sum or product could be plugged directly into the formula to modify the calculated P of the report. The highest score would be 1.0, which would confirm the calculated P value. Diminishing consistency with scientific laws and principles would diminish the calculated P proportionately.
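A rough sketch of the 0-to-1 version follows, under the assumption that the per-science fractions are combined by multiplication. The product always stays between 0 and 1, whereas a plain sum of several fractions can exceed 1 and so could not serve as a probability without rescaling. The function name and the sample fractions are hypothetical.

```python
import math


def prior_from_fractions(fractions):
    """Multiply per-science consistency fractions into a single prior.

    Each fraction is a 0..1 judgment of consistency with one science.
    A single 0 forces the prior to 0; all 1.0 values leave it at 1.0.
    """
    values = list(fractions)
    for value in values:
        if not 0 <= value <= 1:
            raise ValueError("each fraction must be between 0 and 1")
    return math.prod(values)


sample = [0.001, 0.1, 0.5]  # illustrative values only
print("product:", prior_from_fractions(sample))
print("sum:    ", sum(sample))
print("sum of three 0.5 grades:", sum([0.5, 0.5, 0.5]))  # exceeds 1
```

Multiplication also reproduces the “multiply by 0” option automatically: one flat violation of a basic law zeroes the prior no matter how favorable the other grades are.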
Once the scientific scale is applied, one could have the option of applying another scale based on non-scientific credibility (consider the source):
Economic history (involvement in previous scams and schemes), marketing useless products, books on same.
Legal history (convictions, fines, licensure disciplinary actions, etc.)
Writings on other implausible claims, sectarian schemes (Scientology, etc.) vitamin
Participation in pseudoscience meetings (Whole Life Expo, etc.)
These events and characteristics reveal degrees of lack of credibility of the individual whose work is being evaluated (Wirth/Cha prayer group, advocates of mercury-autism link, raw milk promotion, Laetrile advocacy, etc.). A value between 0 and 1 for each or for the combination would further diminish the value assigned to the Prior.
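The credibility adjustment can be sketched the same way. The assumption here is that each credibility value is a 0-to-1 multiplier applied to the scientific prior; the text leaves the exact combination open. The names and numbers are hypothetical.

```python
import math


def apply_credibility(scientific_prior, credibility_values):
    """Shrink a prior by source-credibility multipliers between 0 and 1.

    A value of 1.0 means no known credibility problem for that item;
    smaller values reduce the prior further.
    """
    if not 0 <= scientific_prior <= 1:
        raise ValueError("scientific_prior must be between 0 and 1")
    values = list(credibility_values)
    for value in values:
        if not 0 <= value <= 1:
            raise ValueError("each credibility value must be between 0 and 1")
    return scientific_prior * math.prod(values)


# Illustrative only: a prior of 0.1 from the scientific scale, with
# hypothetical multipliers for economic and legal history.
print(apply_credibility(0.1, [0.5, 0.5]))
```

Because the multipliers can only lower the prior, this step never makes a claim look more plausible than the science alone allows.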
This is as far as I have been able to develop this idea. I grant its sometimes subjective quality, and that some qualities (convictions, fines) depend on other qualities in the scales, but I think they are important qualities and adding them decreases credibility. At least they should not be ignored. Implausible schemes are attractive to a certain set of mind types – psychopathic and gullible mindset patterns mean something. They can lead to a cultural and institutionalized intellectual psychopathology and a scientific-social terrorism, infiltration of academic institutions, bribery (the massive funding behind movements) and law changes (DSHEA, licensing quackery, Access to Medical Treatment Acts (AMTAs)).
These credibility indicators are thoughts that occur to us but that are often excluded from evaluations because of principles of the law, intellectual/academic political correctness, and sometimes just a sense of “fairness” however blinding that may be. I think they are significant when it comes to a “scale of credibility,” which is not a bad name for this.
Several days ago the Q’ometer blog addressed this problem with an interesting graphic using 4 quadrants and a plot of credibility vs evidence, a variation on the theme of Kim Atwood’s fugue. A graphic plot would be a welcome addition to visualize the probability scale values.
This method does not address directly the problem of “MA and SR indeterminacy.” (Gimme credit for that one too.) It addresses the RCTs that go into those SRs and MAs. Most SRs do not have single values to which one can apply a prior probability. There are other ways of handling SRs we can explore later.
I am no professional mathematician, so have at it.