## Fun With Statistics

Statistics is the essential foundation for science-based medicine. Unfortunately, it’s a confusing subject that invites errors and misunderstandings. We non-statisticians could all benefit from learning more about statistics as well as trying to get a better understanding of just how much we don’t know. Most of us are not going to read a statistics textbook, but the book *Dicing with Death: Chance, Risk, and Health* by Stephen Senn is an excellent place to start or continue our education. Statistics can be misused to lie with numbers, but when used properly it is the indispensable discipline that allows scientists:

…to translate information into knowledge. It tells us how to evaluate evidence, how to design experiments, how to turn data into decisions, how much credence should be given to whom, to what, and why, how to reckon chances and when to take them.

Senn covers the whole field of statistics, including Bayesian vs. frequentist approaches, significance tests, life tables, survival analysis, the problematic but still useful meta-analysis, prior probability, likelihood, coefficients of correlation, the generalizability of results, multivariate analysis, ethics, equipoise, and a multitude of other useful topics. He includes biographical notes about the often rather curious statisticians who developed the discipline. And while he includes some mathematics out of necessity, he helpfully stars the more technical sections and chapters so they can be skipped by readers who find mathematics painful. The book is full of examples from real-life medical applications, and it is funny enough to hold the reader’s interest.

### What a Difference a Word Makes

Statistics (and probabilities) are frequently misunderstood, even by many scientists. Even what looks simple can turn out to be complicated and counter-intuitive. Senn revisits an old question. If a man has 2 children and at least one of them is a boy, how likely is it that the other is a girl? Most people reason that there are only 2 possibilities, boy or girl, both equally likely, so there is a probability of 1 in 2, or 50%, that the other child is a girl. That’s wrong. In fact, there is a probability of 2 in 3: the other child is twice as likely to be a girl as a boy. The 50% answer is only true if you change the question slightly from “one of them is a boy” to “the firstborn is a boy.” If this doesn’t make sense to you, you really need to read the book.
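Skeptical readers can check this result for themselves. Here is a quick simulation sketch (my own, not from the book), assuming boys and girls are equally likely and ignoring the small real-world imbalance in birth rates:

```python
import random

random.seed(0)
trials = 100_000
at_least_one_boy = 0
other_is_girl = 0

for _ in range(trials):
    kids = [random.choice("BG") for _ in range(2)]
    if "B" in kids:                      # condition: at least one is a boy
        at_least_one_boy += 1
        if "G" in kids:                  # the "other" child is a girl
            other_is_girl += 1

print(other_is_girl / at_least_one_boy)  # close to 2/3, not 1/2
```

Changing the condition to "the firstborn is a boy" (keeping only families where `kids[0] == "B"`) brings the answer back to 50%, exactly as Senn's rewording of the question predicts.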

### Does Medical Research Discriminate Against Women?

He devotes a whole chapter to that question and answers it with a convincing NO. The Women’s Caucus complained that a lot of studies had only male subjects, that women were under-represented, that what we were learning about treating men might not apply equally to women. The perception that women were being neglected was not supported by the evidence; if anything, women have actually been over-represented. Nevertheless, they persuaded Congress to pass a law requiring the director of the NIH to ensure that trials be designed to examine whether the variables being studied affect women or members of minority subgroups. Senn explains how confounding factors like sex and race were already being dealt with adequately and why a strict enforcement of the new policy would be disastrous for research, requiring far greater numbers of subjects and greater expense. Fortunately, researchers have managed to continue their previous rational and appropriate practices while making a few token placatory noises to the grant-making bodies.

### Applications Beyond Medicine

Medicine is the only profession that employs randomized controlled trials to evaluate the effects of its actions and systematically encourages publications about its errors. Evidence-based medicine is a good thing, and there’s no reason the methods of statistics couldn’t be used to develop similarly evidence-based approaches to educational, managerial, economic, and social policies. He describes one intriguing application: the process of religious conversion has been studied using infectious disease modeling.

### Statistics in the Courtroom

The Dow Corning company manufactured silicone breast implants until statistical innumeracy allowed a flurry of successful lawsuits to bankrupt the company. Senn quotes the definition of dice from *The Devil’s Dictionary*:

Small polka-dotted cubes of ivory, constructed like a lawyer to lie on any side, but commonly on the wrong side.

He also includes this discouraging quote from Marcia Angell:

I am occasionally asked by lawyers why the *New England Journal of Medicine* does not publish studies “on the other side”, a concept that has no meaning in medical research… Yet science in the courtroom, no matter how inadequate, has great impact on people’s lives and fortunes.

### Statistics, MMR Vaccine, and Autism

In the last chapter he revisits the Wakefield fiasco, showing how misunderstandings of statistical principles have led to outbreaks of vaccine-preventable diseases. Statistics has an important role in determining rational public policies to protect the population, and through its sub-science of decision analysis has much to say about the ethical and economic aspects of vaccination.

### Entertainment Value

Senn has a droll wit and an endearing fondness for puns. Who would have thought that a book on a dry subject like statistics could be so entertaining and even laugh-out-loud funny? He says the *post hoc* fallacy is so obvious that “almost any human being who is not a journalist can understand it.” Since it regularly gets by the average journalist, he renames it the “post hoc, passed hack” fallacy. He says that John Donne’s “no man is an island” is true, but if you capitalize Man it becomes “false, as anyone cognizant with the geography of the Irish Sea will surely aver.” In case geography-challenged readers don’t get this, he reluctantly explains it in the endnotes. He says such punctuation would be a capital offense. His puns extend to this tour de force:

Poisson was attracted to magnetism, stretched his mind on the theory of elasticity, starred in astronomy and generally counted as one of the foremost mathematicians of his day.

His sources are wide-ranging and eclectic. He quotes Guernsey McPearson’s definition of a meta-analyst as one who thinks that if manure is piled high enough it will smell like roses. His quotes from Sinclair Lewis’s *Arrowsmith* inspired me to re-read that book (it’s available online, and it’s just as pertinent today as it was in 1925). One of his quotes from *The Phantom Tollbooth* is particularly applicable to some areas of CAM:

I only treat illnesses that don’t exist: that way if I can’t cure them, there’s no harm done.

### Conclusion

The book reinforced my conviction that I don’t know enough about statistics and never will, so I will have to continue to rely on the experts. I confess that some of the material was over my head. I couldn’t follow the formula for Russian roulette showing that the probability of death on the 5th attempt is 0.066 compared to 0.1 on the first attempt (no, the initial probability is not the 1 in 6 that you would suppose, since the weight of the bullet makes the spinning cylinder more likely to stop with the bullet at the bottom). I couldn’t follow his explanation of two different mathematical approaches to coin toss probabilities. But I do know more about statistics now than I did when I started reading. I learned much of value and was well entertained in the process.

I didn’t need any convincing, but I think even the most skeptical reader would come away convinced that:

the calculation of chances and consequences and the comprehension of contingencies are crucial to science and indeed all rational human activity.

I couldn’t agree more that statistics are vital (no pun intended, well, yes, there was) and that most of us don’t have the grasp we should of something we really need in everyday life. Off to check out the book; it sounds good.

“If a man has 2 children and at least one of them is a boy, how likely is it that the other is a girl?”

Here are some interesting variations:

(You can probably google the results, but try them without googling.)

THREE CARDS PUZZLE:

Suppose there are three cards:

A black card that is black on both sides.

A white card that is white on both sides.

A mixed card that is black on one side and white on the other.

All the cards are placed into a hat and one is pulled at random and placed on a table. The side facing up is black. What are the odds that the other side is also black?

THREE BOXES PUZZLE:

There are three boxes:

A box containing two gold coins.

A box containing two silver coins.

A box containing one silver and one gold coin.

A box is chosen at random and, without looking at its contents, one coin is withdrawn at random and placed on the table. It is a gold coin. What is the chance that the other coin in the box is also a gold coin?

MONTY HALL PUZZLE:

You are on a game show. You are given the choice of three doors, A, B, and C. Behind one door is a car. Behind the other two doors is a goat. You pick door A. The host, who knows what’s behind the doors, opens door C to reveal a goat. He then offers you a switch to door B. What do you do, stick with door A or switch to door B?
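For anyone who wants to test their answer empirically, here is a quick Monte Carlo sketch (mine, not part of the puzzle) comparing the two strategies:

```python
import random

random.seed(1)
trials = 100_000
stick_wins = switch_wins = 0

for _ in range(trials):
    doors = ["car", "goat", "goat"]
    random.shuffle(doors)
    pick = 0                                           # you pick door A
    # the host opens a goat door that isn't yours
    opened = next(i for i in (1, 2) if doors[i] == "goat")
    # switching means taking the one remaining closed door
    switch = next(i for i in (0, 1, 2) if i not in (pick, opened))
    stick_wins += doors[pick] == "car"
    switch_wins += doors[switch] == "car"

print(stick_wins / trials, switch_wins / trials)  # roughly 1/3 vs 2/3
```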

And this pretty hairy one:

THREE PRISONERS PUZZLE:

Three prisoners, A, B and C, are in separate cells and sentenced to death. The governor has selected one of them at random to be pardoned. The warden knows which one is pardoned, but is not allowed to tell. Prisoner A begs the warden to let him know the identity of one of the others who is going to be executed. “If B is to be pardoned, give me C’s name. If C is to be pardoned, give me B’s name. And if I’m to be pardoned, flip a coin to decide whether to name B or C.” The warden tells A that B is to be executed. Prisoner A secretly tells prisoner C. Has the probability that A is to be pardoned changed? Has the probability that C is to be pardoned changed?
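For readers who want to check their answer to the hairy one (spoiler ahead), here is a simulation sketch I added, assuming the warden behaves exactly as described:

```python
import random

random.seed(2)
trials = 100_000
warden_says_B = 0
a_pardoned = c_pardoned = 0

for _ in range(trials):
    pardoned = random.choice("ABC")
    if pardoned == "B":
        continue                          # warden would have named C
    if pardoned == "A" and random.random() < 0.5:
        continue                          # coin flip: warden names C instead
    # in all remaining cases the warden names B
    warden_says_B += 1
    a_pardoned += pardoned == "A"
    c_pardoned += pardoned == "C"

print(a_pardoned / warden_says_B)   # close to 1/3: A's chance is unchanged
print(c_pardoned / warden_says_B)   # close to 2/3: C's chance has doubled
```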

Hmmm,

I think the two child problem is presented incorrectly, or incompletely at least.

http://en.wikipedia.org/wiki/Boy_or_Girl_paradox

Intuitively if you have at least one boy and want to know if the other is a girl the options are:

Two boys

One boy and one girl

That’s it. No mention is made of ordering, so we don’t care. It’s either two boys, or one boy and one girl.

The Wikipedia article shows that you could add additional conditions not stipulated in the question to make it more complex, but it doesn’t seem that this particular problem is a statistics misunderstanding.

( I had to look this up because I really was confused; it seemed the answer was simple in the example given, unlike the Monty Hall problem, which is not the same. It doesn’t matter which child is a boy or a girl, since you aren’t specifying which child you started with, just that one must be a boy. )

tdxdave:

What matters under this phrasing is that the options are not equally likely. If we assume that boys and girls are born with the same probability of .5 (a close approximation), then each of the options BG, GB, GG, BB is equally likely at .25 (before we learn anything). This means that the option “two boys” is only half as likely as “one boy and one girl”, since by omitting the ordering we’ve merged two equally likely outcomes into one more likely outcome. We then drop the possibility that both are girls when we learn that at least one is a boy, so we end up with the possibilities split between two boys (1 in 3) and one girl and one boy (2 in 3).

tdxdave:

Actually, looking back, I think a different part of the problem may indeed be incorrectly phrased: “the other child is twice as likely to be a girl as a boy” should read “it is twice as likely that one child is a girl”, since no one of the two children has (necessarily) been specified yet, which means that there is no “other” to speak of. (Had we identified one child and “outed” them as a boy, then the odds would be equal that the other child is a girl or a boy.)

But, cards, boxes and doors are not subject to disease or accidents like children are and there is a very slight difference in the numbers of human males and females born living. It’s not 50 boys to 50 girls, yes? And then we have the whole issue of transgender and intersex individuals.

Isn’t it just easier to ask the man what gender his children are?

@MTR:

^_^ Excepting the cases where they might get annoyed that you can’t tell (especially fun when said children are teenagers and they’ve got some of the more bizarre haircuts/clothing/makeup styles).

It’s not just in medical decisions that statistics play a role. Take a look at how many people are probably going to be short on cash for a credit card payment or rent because they plunged that cash into the latest Powerball lottery. If we were truly aware of the astronomically small probability of winning, we’d pretty much leave it alone.

DugganSC – If the kids aren’t there, it’s okay to ask. If the kids are there and past puberty, just look for the Adam’s apple and Achilles tendons (the Achilles tendons are a personal observation; most males have much more defined, tight tendons in their legs). If the children are there and younger, it’s hopeless, but I’m not sure that the increased statistical probability of being a girl is going to be really helpful in deciding how to handle the long-haired, neutrally dressed child called Rory. Sometimes it’s best just to avoid pronouns until you have a chance to find the correct social footing.

@Epi Ren – as an aside, I’ve actually had a doctor compare a possible treatment to Vegas odds.

Stats and percentages are an enigma. I heard on the radio a few years ago that in the area, from 6pm Friday to 6pm Sunday, 80% of all drivers have been drinking. They further stated that during the same period, 60% of accidents involved a drinking driver. At first glance that makes sense, but looking at it later, it appeared that the 20% of drivers not drinking are causing 40% of the accidents. In other words, the non-drinking drivers are causing a disproportionate share of the accidents during the time period mentioned.

So, are the non-drinking drivers more dangerous?
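Taking the radio figures at face value, the per-driver comparison behind that question is simple arithmetic. This is a sketch I added (the variable names are mine), and it deliberately ignores miles driven:

```python
# Hypothetical figures from the radio claim, taken at face value.
drinking_share_of_drivers = 0.80
drinking_share_of_accidents = 0.60

sober_share_of_drivers = 1 - drinking_share_of_drivers      # 0.20
sober_share_of_accidents = 1 - drinking_share_of_accidents  # 0.40

# Accident share per unit of driver share: a crude per-driver rate
# that says nothing about exposure (trip length, miles driven).
drinking_rate = drinking_share_of_accidents / drinking_share_of_drivers
sober_rate = sober_share_of_accidents / sober_share_of_drivers

print(drinking_rate)  # 0.75
print(sober_rate)     # 2.0
```

So per driver, the sober group does look worse under these numbers, which is exactly why the exposure objection raised later in the thread (shorter trips, fewer miles) matters.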

Dr. Hall, could you say more about why the author believes that researchers should not worry about gender and race representation as a factor in study design?

Having sat on numerous NIH proposal reviews for clinical studies, I have not seen this point as an appreciable review concern. PIs do seem to take seriously whom they should include or exclude. Reviewers definitely look at this and the vast majority of times in my experience we agree with the PI and there’s no need for discussion, just the dull droning of codes (“A1, B1…”). Occasionally it does merit discussion and sometimes reviewers make an additional recommendation, but just as often after the discussion we agree with the PI. I can’t address what’s happening at the Council level, however, and maybe there is more pushback there?

At any rate, I thought the author’s comment about this was odd and it certainly didn’t match my own experiences, so I was curious to learn more.

@Angora Rabbit,

The author did not mean that researchers should not consider gender and race. He makes a good case that the common perception that women and minorities have been neglected is wrong, and that the new regulations are counterproductive. He has a whole chapter on the subject. I found it persuasive. I urge you to read it. You don’t have to buy the book to read it: your library can get it for you by interlibrary loan.

tdxdave,

“Intuitively if you have at least one boy and want to know if the other is a girl the options are:

Two boys

One boy and one girl”

As in many cases like this, intuition fails and that is the whole point of the exercise.

Consider all the men who have two children. The possibilities are: BB, GG, BG, GB, and they are all equally likely with a probability of 1/4. If you think order doesn’t matter, then you have to say that there are only three possibilities: two boys, two girls, a boy and a girl, and that these are equally likely with a probability of 1/3. This would make the probability of having a boy and a girl 1/3. This is clearly false. The probability of having a boy and a girl is clearly 1/2. The order clearly does matter.

Toss a coin twice. The possibilities are: HH, TT, HT, TH, each with a probability of 1/4. You will throw a head and a tail 1/2 of the time, not 1/3 of the time.

“No mention is made of ordering so we don’t care.”

The ordering is implied whether it is mentioned or not.

If I say I had a hair cut today, it is implied that I am not completely bald so I would assume I would not have to mention that fact.

…try the Monty Hall problem. It will destroy your intuition.

Roses *do* smell like manure if a cow smells them first.

I’ll bet you BillyJoe’s gold coins that those two surveys didn’t use the same method of data collection.

We cannot conclude that, since the risk of an accident goes not as the number of drivers but as the number of miles driven. If the drunk drivers are preferentially taking shorter trips, they can still be the more dangerous even if they account for proportionately fewer of the accidents.

Riddle:

A test is 99% effective at detecting X. i.e. If X exists, it will be detected 99% of the time. If X does not exist, it will be falsely detected 1% of the time. Out of 10,000 tests, we detected X 103 times. What is the probability that X truly exists in our sample?

“Medicine is the only profession that employs randomized controlled trials to evaluate the effects of its actions and systematically encourages publications about its errors. ”

Wow. Really? I am fairly sure that a large number of psychologists, economists, political scientists and policy analysts would disagree with this.

Interesting, the “TV show-with-goat-or-car-for-prizes” puzzle: isn’t the assumption that anyone would rather have a car? Vegans such as myself would, of course, prefer to liberate the goat. In other words, how can we take ethics into account in statistics?

UncleHoot: ~3%

rmgw,

If you want the goat, stick with your original choice but, if you want the car…to convert to cash…to free a thousand goats…switch to the other door.

@Scott, I don’t know that I can agree with you on that. There is also the consideration that the 60% of accidents involving a drinking driver don’t necessarily involve two drinking drivers; some, inevitably, would involve a drinking and a non-drinking driver. Then there is the issue of who is at fault, which was not part of the “data” given.

I object to statistics being called a dry subject, naturally. In the ’80s there’d be meetings where folks would nearly come to blows (rowdy Bayesians, ya know). I also find having a theory of learning and decision making rather interesting. Practically, it’s an effort not to be a fool, also not so dry in my view. Experimental design is so critical, and many researchers so bad at it, that I find that interesting too. People are doing experiments without analysis plans: inconceivable, but true.

What I find disturbing is the failure of some basic researchers to obtain good stats help. Perhaps there’s a worry that we want to know what is true, not just demonstrate that your ideas are correct by any means that appear to work. Paul Goldberg, main author of The Cancer Letter, gave grand rounds here last Monday about the Duke scandal, and one of our most prominent statisticians pointed out that the scandalous retracted papers were often quite statistical, but it was hard to determine who the statistician was, or if any were really involved. As for review, we know that was crappy. The suggestion was that real nerds have higher ethical standards than other lab folks with a few mathy skills at their disposal. Sorry, but I think that’s largely true. Greater independence of the analyst is perhaps the reason. It’s why we are the bad guys: you can’t get away with just any shit with us. (Or you can, but then I ask my name not be mentioned, about once per year – even if figures 1 and 2 were my results.)

I’m not sure why those would be reasons to disagree. The latter point in particular only makes it harder to draw conclusions. The former you can only get anything out of if you make a series of assumptions (e.g. two-car vs. one-car accidents).

I might be missing something.

@BillyJoe

Pretty good guess, but the correct answer is that we have absolutely no idea. Well, actually we do have some idea. The probability is somewhere between 0% and 100%, non-inclusive, assuming that X exists.

What we do not know is the base rate for the incidence of X, and we have no way of making that determination based simply on our results. In fact, our results are even feasible if X does not exist. We can make some determinations that the base rate is pretty low, but we have no idea how low. It may be 1 in 2,000 or 1 in 100,000,000,000.

@Uncle Hoot,

Your puzzle was unanswerable as phrased. We need to know the prevalence in the population as well as the specificity and the sensitivity of the test. Then the question we really want to ask is “Given a positive test result, what is the probability that the patient actually has the condition?”
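The dependence on prevalence can be made concrete with a small Bayes' theorem sketch (mine; the 99% sensitivity and specificity come from the riddle, while the prevalence values are arbitrary assumptions):

```python
def positive_predictive_value(prevalence, sensitivity=0.99, specificity=0.99):
    """P(condition | positive test), by Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# The same "99% effective" test gives wildly different answers
# depending on the assumed prevalence: PPV falls from 0.99 at a
# prevalence of 0.5 to about 1% at a prevalence of 1 in 10,000.
for prev in (0.5, 0.01, 0.001, 0.0001):
    print(prev, positive_predictive_value(prev))
```

At a prevalence of 1%, a positive result is only a coin flip, which is why the riddle has no answer until the base rate is specified.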

@Harriet

You are exactly correct. I assume that you posted before my comment made it through moderation. The riddle is a combination of both the base-rate fallacy and the false-positive paradox. And, well, made a bit more confusing, just for fun.

Many people will give answers along the lines of, “If it’s 99% accurate, and you have 103 positive tests out of 10,000, then the probability is very close to 100%.” The 3% answer is somewhat intuitive, but also incorrect. (To be fair, both could be correct depending on the underlying base rate.)

UH: “A test is 99% effective at detecting X. i.e. If X exists, it will be detected 99% of the time. If X does not exist, it will be falsely detected 1% of the time. Out of 10,000 tests, we detected X 103 times. What is the probability that X truly exists in our sample?”

BJ: ” ~3%”

UH: “Pretty good guess, but the correct answer is that we have absolutely no idea.”

I think your answer must be correct because I am aware that you do need to know about the prevalence of X to work these things out. However, because you asked and because it didn’t sound like a trick question, I thought there must be an answer. To my surprise I found one and therefore posted it.

Unfortunately, you didn’t explain why my answer was wrong. And you didn’t explain why it was a good guess.

UH: “our results are even feasible if X does not exist.”

This is how I got the result:

If X does not exist, then none of the 103 positives are true positives. Therefore all 103 positives are false positives. But the false positive rate is 1%, and 1% of 10,000 is 100. Therefore 3 of the 103 positives must actually be true positives. Which is ~3%.

The problem, as I realise now, is that the false positive rate of 1% is actually ~1%, so that ~1% of 10,000 could actually be 103, which would mean no true positives.

Well, the rate could well be exactly 1%. But just as you won’t get exactly one each of 1/2/3/4/5/6 from six rolls of a fair die, you won’t get exactly 100 false positives from 10,000 tests. It’s a distribution.
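That distribution is the binomial. A short sketch (mine) shows how spread out the false-positive count actually is, even with a true rate of exactly 1%:

```python
import math

n, p = 10_000, 0.01
mean = n * p                       # 100 expected false positives
sd = math.sqrt(n * p * (1 - p))    # standard deviation, about 10

# P(exactly 103 false positives) under the binomial distribution:
# 103 is well within one standard deviation of the mean, so it is
# an entirely unremarkable result even with zero true positives.
k = 103
prob_103 = math.comb(n, k) * p**k * (1 - p)**(n - k)
print(mean, sd, prob_103)
```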

Statistical uncertainty in the results is the key, not uncertainty in the true rate.

@BillyJoe

“Unfortunately, you didn’t explain why my answer was wrong. And you didn’t explain why it was a good guess.”

I was being magnanimous.

The 99% chance really means that every time X does not exist, there is a 1% chance of a false positive. While EXTREMELY unlikely, we could have had 10,000 false positives (a 0.01^10,000 chance of that). But certainly 103 is not unlikely at all, even if the false positive rate is exactly 1% and the actual incidence of X (the base rate) is 1 in 100 million or more. We have no way of knowing the positive predictive value, which was more or less what I was asking for.

Consider this:

If the actual incidence of X is 1 in 10,000 in a collection (e.g. the general population), the probability that our presumably random sample contains X is 63.2% (1-1/e). Are our results even relevant to answering the question?

Scott: “Well, the rate could well be exactly 1%. But just as you won’t get exactly one each of 1/2/3/4/5/6 from six rolls of a fair die, you won’t get exactly 100 false positives from 10000 tests. It’s a distribution.”

Well, the false positive rate is unlikely to be exactly 1% but, yes, whatever the rate, it is going to have error bars as well.

UncleHoot: You’re referring to “incidence” when you mean “prevalence”. Dr. Hall has it right. The distinction is important in talking about things like the “autism epidemic” where you can have increasing prevalence without increasing incidence (the prevalence of any condition that’s neither fatal nor curable nor self-limiting will increase with increased longevity even if the incidence stays constant).