Screening for disease is a real pain. I was reminded of this by the publication of a study in BMJ the very day of the Science-Based Medicine Conference a week and a half ago. Unfortunately, between The Amaz!ng Meeting and other activities, I was too busy to give this study the attention it deserved last Monday. Given the media coverage of the study, which in essence tried to paint mammography screening for breast cancer as being either useless or doing more harm than good, I thought it was imperative for me still to write about it. Better late than never, and I was further prodded by an article that was published late last week in the New York Times about screening for cancer.
If there’s one aspect of medicine that causes more confusion and contention among the public, and even among physicians, than screening for disease, be it cancer, heart disease, or whatever, I’d be hard-pressed to name it. The reason is that any screening test is by definition looking for disease in an asymptomatic population, which is very different from looking for a cause of a patient’s symptoms. In the latter case, the patient is already being troubled by something that is bothering him. There may or may not be a cause in the form of a disease or syndrome that is responsible for the symptoms, but the very existence of the symptoms clues the physician in that there may be something going on that requires treatment. The doctor can then narrow down the range of possibilities for what may be the cause of the patient’s symptoms by taking a careful history and physical examination (which will by themselves most often lead to the diagnosis). Diagnostic tests, be they blood tests, X-rays, or other tests, then tend to be more confirmatory of the suspected diagnosis than the main evidence supporting a diagnosis.
In contrast, screening for a disease involves subjecting a population, the vast majority of whom don’t have the disease and nearly all of whom are asymptomatic, to a diagnostic test in order to find disease before it has begun to cause symptoms. This is very different from most diagnostic tests used in medicine, the vast majority of which are done for specific indications. Any test to be contemplated for doing this thus has to meet a very stringent set of requirements. First, it must be safe. Remember, we’re subjecting asymptomatic patients to a test, and risky, invasive tests are rarely going to be worthwhile, except under uncommon circumstances. Second, the disease being screened for must be a disease that is curable or manageable. For instance, it makes little sense to screen for a disease like amyotrophic lateral sclerosis (Lou Gehrig’s disease) because there is very little that can be done for it; diagnosing it a few months or a few years before symptoms appear won’t change the ultimate outcome. A corollary to this principle is that acting to treat or manage a disease early, before symptoms appear, should result in a better outcome. I’ve discussed the phenomenon of lead time bias in depth before; it’s a phenomenon in which earlier diagnosis only makes it seem that survival is longer because the disease was caught earlier, when in reality it progressed at the same rate as it would have without treatment. Lead time bias is such an important concept that I’m going to republish the diagram I last used to explain it, because in this case a picture is worth a thousand words (and who more than I, other than perhaps Kim Atwood, can lay down a thousand words so easily?):
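For readers who prefer numbers to pictures, here is a minimal sketch of lead time bias. Every age in it is a made-up assumption chosen only for illustration: a hypothetical patient whose cancer is fatal at the same age whether it is found by screening or by symptoms.

```python
def survival_from_diagnosis(age_at_diagnosis: float, age_at_death: float) -> float:
    """Years lived after the diagnosis was made."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient: the disease runs the same course either way.
AGE_AT_DEATH = 70.0               # assumed, unchanged by earlier detection
AGE_SYMPTOMATIC_DIAGNOSIS = 67.0  # assumed age at which symptoms would appear
AGE_SCREEN_DIAGNOSIS = 63.0       # assumed age at which screening finds it

survival_symptomatic = survival_from_diagnosis(AGE_SYMPTOMATIC_DIAGNOSIS, AGE_AT_DEATH)
survival_screened = survival_from_diagnosis(AGE_SCREEN_DIAGNOSIS, AGE_AT_DEATH)
lead_time = AGE_SYMPTOMATIC_DIAGNOSIS - AGE_SCREEN_DIAGNOSIS

# Measured from diagnosis, the screened patient "survives" longer and even
# counts as a five-year survivor, yet dies at exactly the same age.
print(f"Survival after symptomatic diagnosis: {survival_symptomatic} years")
print(f"Survival after screen diagnosis:      {survival_screened} years")
print(f"Lead time (apparent gain only):       {lead_time} years")
print(f"Five-year survivor if screened?       {survival_screened >= 5}")
print(f"Five-year survivor if not screened?   {survival_symptomatic >= 5}")
```

The apparent gain in survival is exactly the lead time, which is why survival measured from the date of diagnosis cannot by itself show that a screening program helps.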
Another requirement for a screening test is that the disease being screened for must be relatively common. Screening for rare diseases is simply not practical from an economic standpoint, as the vast majority of “positive” test results will be false positives. Another way of saying this is that the specificity of the test must be such that, whatever the prevalence of the disease in the population, it does not produce too many false positives. In other words, for less common diseases the specificity and positive predictive value must be very high (i.e., a “positive” test result must have a very high probability of representing a true positive in which the patient actually does have the disease being tested for, and the false positive rate can’t be too high; either that, or screeners must be prepared to do a lot of confirmatory testing for a large number of false positives). For more common diseases, a lower positive predictive value is tolerable. The test must also be sufficiently sensitive that it doesn’t produce too many false negatives. Remember, one potential drawback of a screening program is a false sense of security in patients who have been screened, a drawback that will be increased if a test misses too many patients with disease. Finally, a screening test must also be relatively inexpensive. One of the reasons there is so much controversy over using MRI as a screening test for breast cancer, for example, is that an MRI easily costs over $1,000 per test. In contrast, good, old-fashioned mammography is between 10- and 20-fold less expensive. Consequently, MRI is currently not recommended for breast cancer screening in the general population and is instead reserved for younger women at high risk due to a strong family history or a known mutation in a cancer susceptibility gene, as I discussed last year.
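To make the relationship between prevalence and false positives concrete, here is a minimal sketch that applies the standard definition of positive predictive value (Bayes’ rule). The sensitivity, specificity, and prevalence figures are assumptions picked for illustration; they are not measured values for mammography or any other real test.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Probability that a person with a positive result truly has the disease."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Same hypothetical test (90% sensitive, 95% specific) applied to
# populations in which the disease is rare, uncommon, and fairly common.
ASSUMED_SENSITIVITY = 0.90
ASSUMED_SPECIFICITY = 0.95

for prevalence in (0.0001, 0.005, 0.05):
    ppv = positive_predictive_value(prevalence, ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY)
    print(f"prevalence {prevalence:.2%} -> PPV {ppv:.1%}")
```

With the test held fixed, the share of positives that are real falls steeply as the disease gets rarer, which is the arithmetic behind the requirement that a screened-for disease be reasonably common or the test extremely specific.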
Recently, there has been a bit of a kerfuffle in the literature and in the lay press reporting on the literature regarding the risks and the benefits of various screening tests, including mammography. For example, three weeks ago Harriet wrote about a study that measured the cumulative rates of false positives from screening tests for prostate, lung, colorectal, and ovarian cancer and, earlier, about ultrasound screening for prostate cancer. Prostate cancer, for example, is a particularly difficult disease to screen for because there is such a high incidence of disease that never causes a problem. In autopsy series of men over 80, at least 75% of men have evidence of at least one focus of prostate cancer somewhere in their prostate glands, but obviously it never caused them any problem and they died of something else, including old age. On the other hand, prostate cancer is consistently one of the top cancers in men, resulting in 30,000 deaths a year. Sometimes, it can be quite aggressive. Because we don’t have reliable tests to differentiate prostate cancer that will progress from prostate cancer that will never cause a problem, whenever we find prostate cancer by screening there is always the question of what to do with it, to treat or not to treat.
Since my specialty is breast cancer, I want to focus on mammography in the wake of the aforementioned New York Times article In Push for Cancer Screening, Limited Benefits and multiple news stories about a study that purports to show that 1 in 3 breast cancers detected by mammographic screening are overdiagnosed and overtreated. The study is entitled, appropriately enough, Overdiagnosis in publicly organised mammography screening programmes: systematic review of incidence trends. This study was done by Karsten Juhl Jørgensen and Peter C. Gøtzsche at The Nordic Cochrane Centre. Here’s how an AP story described the study:
LONDON – One in three breast cancer patients identified in public screening programs may be treated unnecessarily, a new study says. Karsten Jorgensen and Peter Gotzsche of the Nordic Cochrane Centre in Copenhagen analyzed breast cancer trends at least seven years before and after government-run screening programs for breast cancer started in parts of Australia, Britain, Canada, Norway and Sweden.
The research was published Friday in the BMJ, formerly known as the British Medical Journal. Jorgensen and Gotzsche did not cite any funding for their study.
Once screening programs began, more cases of breast cancer were inevitably picked up, the study showed. If a screening program is working, there should also be a drop in the number of advanced cancer cases detected in older women, since their cancers should theoretically have been caught earlier when they were screened.
However, Jorgensen and Gotzsche found the national breast cancer screening systems, which usually test women aged between 50 and 69, simply reported thousands more cases than previously identified.
Overall, Jorgensen and Gotzsche found that one third of the women identified as having breast cancer didn’t actually need to be treated.
This is more or less a halfway decent CliffsNotes version of the study, but let’s look at the study in a bit more detail, because I have–shall we say?–issues with it. First off, it’s important to remember that this is a systematic review. Actually, it is part systematic review, part meta-analysis, which is one reason why I have issues with it. Whether you call it a systematic review or meta-analysis, as a consequence it is highly dependent upon the selection of studies and how the studies are interpreted. I wasn’t encouraged by this statement in the introduction:
It is well known that many cases of carcinoma in situ in the breast do not develop into potentially lethal invasive disease. In contrast, many find it difficult to accept that screening for breast cancer also leads to overdiagnosis of invasive cancer.
While this may be true of certain advocacy groups, the implication seems to be that many physicians also find it “difficult to accept” that screening can lead to overdiagnosis. No evidence is presented to support this little dig, and overall it sets the tone of settling scores. This is not terribly surprising, because these two authors are well known for publishing articles about mammography that absolutely, positively always downplay the benefits of screening mammography and play up the risks, as a couple of quick PubMed searches show (1, 2). They’re also known for writing letters to scientific journals that dispute points of papers that find a benefit from screening mammography. This doesn’t mean that Jørgensen and Gøtzsche are wrong, of course, only that they do appear to have a definite point of view, which should be kept in mind as much as anyone else’s point of view when examining a scientific paper. While I like their work to some extent as a counterweight to too much boosterism for ever earlier screening, I do find that they tend to go a bit too far at times in downplaying the usefulness of mammography, and at times they seem to be on a mission to cast doubt on mammography, as when they led a campaign to change a British leaflet used to encourage women to undergo mammography and made statements like:
But “it has not been proven that screening saves lives,” they insist, and new evidence shows less benefit and substantially more harm from screening than previously thought.
In fact, this balance between risk and benefit has changed so much in recent years that nationwide programs of breast screening would be unacceptable, they say. “We believe that if policy makers had had the knowledge we now have when they decided to introduce screening about 20 years ago . . . we probably would not have had mammography screening.”
Personally, I think that’s a huge overstatement. However, in fairness, I must mention that Michael Baum ChM, FRCS, MD, FRCR hon., a breast surgeon in England and someone I respect, also signed on to this campaign. I don’t entirely agree with everything he said, but I do take him, as well as Jørgensen and Gøtzsche, seriously. In any case, what bothers me about Jørgensen and Gøtzsche’s systematic review the most is the assumption behind it. Let’s look at how Jørgensen and Gøtzsche estimated overdiagnosis. Basically, the assumption is that in a population that begins screening, initial cancer diagnoses should increase and there should be a compensatory decrease in cancer diagnoses in the population as women get older. The reason, if this model is valid, is that cancers that wouldn’t have been diagnosed until later ages are now being diagnosed when women are younger. So what Jørgensen and Gøtzsche did was to look for compensatory drops in breast cancer diagnoses in older women. One thing that also has to be emphasized is that this study looked only at mammographically detected breast cancers found by screening, not breast cancers detected by the palpation of a lump in the breast or other symptoms. If a woman has a lump in the breast, ruling out cancer is imperative, especially if the woman is over 40.
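The logic of the compensatory-drop model can be sketched in a few lines. To be clear, this is my own simplified illustration of the reasoning described above, not the calculation Jørgensen and Gøtzsche actually performed; the function name, the choice of denominator, and every number are assumptions made up for the example.

```python
def overdiagnosis_fraction(expected_screened: float,
                           observed_screened: float,
                           expected_older: float,
                           observed_older: float) -> float:
    """Net excess diagnoses relative to cases expected without screening.

    expected_* : cases predicted from the pre-screening trend (hypothetical)
    observed_* : cases actually recorded after screening began (hypothetical)
    """
    excess_in_screened = observed_screened - expected_screened
    # Cancers found early by screening should be "missing" later on.
    compensatory_drop = expected_older - observed_older
    net_excess = excess_in_screened - compensatory_drop
    return net_excess / expected_screened


# Made-up incidence figures (cases per 100,000 women), for illustration only.
estimate = overdiagnosis_fraction(
    expected_screened=200.0, observed_screened=260.0,
    expected_older=250.0, observed_older=240.0,
)
print(f"Estimated overdiagnosis: {estimate:.0%}")

# If the drop in older women fully offsets the excess, nothing is left over.
fully_compensated = overdiagnosis_fraction(200.0, 260.0, 250.0, 190.0)
print(f"With full compensation:  {fully_compensated:.0%}")
```

Note how sensitive the answer is to the "expected" numbers: they come from extrapolating pre-screening trends, so anything else that changes underlying incidence over the same period feeds straight into the estimate.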
One thing that needs to be understood is that the countries examined in this study (United Kingdom; Manitoba, Canada; New South Wales, Australia; Sweden; and parts of Norway) have very different screening protocols than we do in the United States. In most of these countries, for women at an average risk for breast cancer, screening with mammography doesn’t begin until age 50, and in some countries mammography is performed only once every two years, rather than every year. In addition, in many of these countries, screening ends after age 70 or so, which is one reason why Jørgensen and Gøtzsche break down the groups they look at into “too young for screening,” “screening age” (50-64 or 50-69, depending on the nation), or “too old for screening.” In the U.S., screening generally begins at age 40 and continues on a yearly basis in essence forever, a system that is, quite frankly, not well grounded in evidence and designed to maximize overdiagnosis, given that the evidence that screening between ages 40-50 reduces breast cancer mortality is rather weak, that the prospective randomized controlled trials evaluating mammography to date have excluded patients older than 74, and that few studies have rigorously looked at screening in women over 80. It’s also rather fascinating to those of us in the U.S. that mass screening programs in some of these countries didn’t start until the 1990s.
Basically, what Jørgensen and Gøtzsche found was that in these countries there was, as expected, an increase in the rate of diagnosis of both invasive breast cancer and ductal carcinoma in situ (DCIS), a lesion that is considered cancer but has not yet invaded out of the breast ducts. This is one problem that I have with this article, namely that it lumps together invasive cancers and DCIS. The reason is that DCIS is a condition of uncertain significance in that it is known that a significant percentage of DCIS never progresses to full cancer. Consequently, the inclusion of DCIS is guaranteed to inflate the estimate of overdiagnosis; from my perspective, DCIS should be analyzed separately. For one thing, it would give us a better idea of just how much overdiagnosis there might be in invasive breast cancer; more importantly it might provide evidence that “watchful waiting” might be safe in some cases of DCIS. Dumping the two together, however, confuses things. For example, what if 80% of the overdiagnosis is due to DCIS? Then overdiagnosis of invasive cancer would be much less, telling us that we should treat invasive cancers discovered by mammographic screening.
Another problem with the study was pointed out in the rapid responses on BMJ by Dr. Daniel B. Kopans, Professor of Radiology at the Harvard Medical School and the Massachusetts General Hospital. I noticed this problem, too, but couldn’t formulate it as clearly at first. The problem is that this study is not looking at a fixed cohort of women before and after screening began. Consequently, every year, new women turn 50 and begin screening, and screened women “age out” of the screening system. Finally, another huge confounder that I don’t think Jørgensen and Gøtzsche adequately account for is the use of hormone replacement therapy (HRT). The Women’s Health Initiative study found an increased risk of breast cancer, heart disease, stroke, and blood clots in women taking HRT. Its report led to a dramatic decrease in the number of women using HRT and, since then, to a decrease in the underlying incidence of breast cancer.
Don’t get me wrong. There is no doubt that mammographic screening programs produce a rate of overdiagnosis. The question is: What is the rate? Unfortunately, the most accurate way to measure the true rate of overdiagnosis would be a prospective randomized trial, in which one group of women is screened and another is not, that follows both groups for many years, preferably their entire lives. Such a study is highly unlikely ever to be done for obvious reasons, namely cost and the fact that there is sufficient evidence to show that mammographic screening reduces breast cancer-specific mortality for women between the ages of 50 and 70 at least, the latter of which would make such a study unethical. Consequently, we’re stuck with retrospective observational studies, such as the ones analyzed in this systematic review. One trial that has a stronger design, as far as I’m concerned, was reported in 2006 by Zackrisson et al. in a paper entitled Rate of over-diagnosis of breast cancer 15 years after end of Malmö mammographic screening trial: follow-up study, which found a rate of overdiagnosis of between 10% and 18%, depending upon how the data were analyzed. For one thing, the Malmö study has the advantage of looking at individual women over time randomized to screening or no screening, rather than population-level numbers, as well as having a 15-year followup. Its disadvantage is that the Malmö mammographic screening trial was one of the classic older studies, for which the screening occurred between 1976 and 1986. In the interim, mammography has become more sensitive, with improved imaging and digital mammography. Consequently, it is quite possible that the rate of overdiagnosis is higher than what was reported in the followup to the Malmö study.
There’s another wrinkle. As I discussed last December, there was recently a study that purported to show that roughly 22% of mammographically detected cancers might actually spontaneously regress. You may recall that I disputed whether the number was that high, but accepted that it’s probable that some percentage of mammographically detected cancers might regress. (Remember, though, I’m referring to mammographically detected cancers in asymptomatic women. There is no good evidence that cancers detected by palpation of a mass or other symptoms regress at an appreciable rate; they might do so rarely, but certainly not at anything approaching 20%.)
On an issue like this, I like to try to look for confluences of studies, and I think I’m starting to see one here. Overall, if you look at the literature, there appears to be a convergence of estimates of the rate of overdiagnosis for mammographically detected breast cancers of somewhere around 20-25%. Jørgensen and Gøtzsche’s work tends to represent the high end (and, indeed, seems custom-designed to me to make the estimate of overdiagnosis as high as possible), while the Malmö study represents the low end. Throw in the Norwegian study that estimates that one in five mammographically detected breast cancers spontaneously regress, and science seems to be homing in on a “true” rate of overdiagnosis that is probably in the range of one in five to one in four. The problem then becomes: What do we do with that information?
The aforementioned Michael Baum actually commented on the study under the title Attitudes to screening locked in time warp, where he advocates a complete rethink of the British breast cancer screening program. He’s right to some extent, but he goes too far in implying that the improvements in our understanding of the biology and natural history of breast cancer are such that we know enough to be highly confident in what to do about the program:
Next, as described in Jørgensen and Gøtzsche’s paper, estimates of harm through over-diagnosis have increased. At the same time our understanding of the biology of breast cancer has improved so that the idea of latent or self limiting pathology that is so counter-intuitive to the screening community, is no longer surprising to those who have bothered to keep up to date.
Finally, whilst the screening programme remains an inviolate bovine deity, treatment regimens have improved by leaps and bounds further narrowing the window of opportunity for screening to demonstrate an impact.
Actually, it’s a huge overstatement to say that treatment regimens have improved by “leaps and bounds” over the last 20 years. They have definitely improved, but not so dramatically that we can, as Baum appears to do, so blithely discount the potential benefits of detecting breast cancer earlier. Even if earlier detection, because of lead time bias, doesn’t change the overall prognosis, at the very least detecting cancer earlier makes it possible to use less disfiguring surgery in more women and to use less aggressive chemotherapy. This is not a benefit to be sneezed at, but it is one that is frequently completely ignored in discussions of this sort.
Where Baum is more on target is his suggestion for risk-adjusting our recommendations for mammographic screening. I might quibble with his exact approach (and I do, actually), but it is becoming clear that mass screening programs of the “one-size-fits-all” variety for breast cancer are probably not doing as much good as advertised or are arguably doing more harm than previously expected. The time is coming for a less dogmatic approach than has been the norm. One approach might be an evidence- and science-based discussion of the known risks and benefits of mammographic screening. In an accompanying editorial, H. Gilbert Welch tried to provide a “balance sheet” of the risks and benefits for 1000 women undergoing annual mammography for 10 years starting at the age of 50 years. These include:
- 1 woman will avoid dying from breast cancer
- 2–10 women will be overdiagnosed and treated needlessly
- 10–15 women will be told they have breast cancer earlier than they would otherwise have been told, but this will not affect their prognosis
- 100–500 women will have at least 1 “false alarm” (about half these women will undergo biopsy)
Unfortunately, the error bars around these estimates are very wide. Where Welch is correct is here:
Mammography is one of medicine’s “close calls”—a delicate balance between benefits and harms—where different people in the same situation might reasonably make different choices. Mammography undoubtedly helps some women but hurts others. No right answer exists, instead it is a personal choice.
To inform that choice, women need a simple tabular display of benefit and harms—a balance sheet of credits and debits
Equally important are the estimates themselves. Zackrisson and colleagues reported 62 fewer deaths from breast cancer and 115 women overdiagnosed—a ratio of one death avoided to two women overdiagnosed. Recently, Gøtzsche and colleagues argued in the BMJ that the ratio is one to 10. For many women, the tipping point may be within this range. Careful analyses that explicitly lay out their assumptions and methods, which will improve the precision of these estimates, are sorely needed.
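The ratios quoted above are easy to check. The sketch below uses only figures that appear in Welch’s editorial as quoted here (62 deaths avoided and 115 women overdiagnosed from Zackrisson and colleagues, and the 1 versus 2–10 range from the balance sheet); the function name is my own, and nothing in it is new data.

```python
def overdiagnosed_per_death_avoided(deaths_avoided: float, overdiagnosed: float) -> float:
    """How many women are overdiagnosed for each breast cancer death avoided."""
    return overdiagnosed / deaths_avoided


# Zackrisson and colleagues, as quoted: 62 fewer deaths, 115 overdiagnosed.
zackrisson_ratio = overdiagnosed_per_death_avoided(62, 115)

# Welch's balance sheet per 1000 women: 1 death avoided, 2 to 10 overdiagnosed.
balance_sheet_low = overdiagnosed_per_death_avoided(1, 2)
balance_sheet_high = overdiagnosed_per_death_avoided(1, 10)

print(f"Zackrisson et al.: 1 death avoided per {zackrisson_ratio:.2f} overdiagnosed")
print(f"Balance sheet:     1 death avoided per {balance_sheet_low:.0f} to "
      f"{balance_sheet_high:.0f} overdiagnosed")
```

The Malmö-based figure rounds to the “one to two” end of the range, while Gøtzsche and colleagues’ estimate sits at the “one to 10” end; a fivefold spread is what “very wide error bars” means in practice.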
The problem in changing the way that screening programs for breast cancer are done is likely to be convincing women that a more selective approach to mammographic screening is desirable. To people not educated in cancer, it makes intuitive sense that detecting cancer earlier will result in better outcomes, and it is hard to explain the downside, namely overdiagnosis, unnecessary biopsies and treatment, and the emotional distress that false positive diagnoses cause. Also, the maintenance of public and therefore political support often depends upon keeping the message as simple as possible. Including a discussion of the risks of mammography will likely be seen by many public health officials as “muddying the waters” and harming their efforts to promote screening. Moreover, great care has to be taken to make sure that the public understands that this discussion is only about the screening of asymptomatic women. Women with breast masses or other symptoms worrisome for cancer should not be led to refuse mammograms because of this discussion.
Unfortunately, well-meaning legislators who don’t have a grasp of the potential for harm that screening has are uncritically promoting ever more mammographic screening, even for conditions for which it hasn’t been shown to result in a decrease in mortality. For example, in the U.S., we screen beginning at an earlier age than nearly every other industrialized nation, and we recommend mammography every year instead of every other year. For women over 50, screening has been shown to result in a 15-25% reduction in mortality due to breast cancer, but in women from 40 to 50 years of age demonstrating a major reduction in mortality due to screening is much more difficult and in women under 40 no such reduction has been noted, mainly because of two reasons. First, as the NYT article describes, younger women tend to have denser breast tissue, which makes mammography less sensitive. Second, although there is the perception that there is a lot of breast cancer in women under 40, in reality it is relatively uncommon:
Dr. Love and other critics have also argued that a public health campaign could cause younger women to overestimate their chances of dying from breast cancer. Of the estimated 41,000 deaths a year in the United States from breast cancer, about 1 in 14 involve women younger than 45, according to the C.D.C. Only 1 in 33 breast cancer deaths — about 1,200 a year — occurs in women younger than 40.
Remember what I said about screening for uncommon diseases? The less common the disease, the harder it is to screen for, and the more specific the test has to be to have a chance of doing more good than harm. Targeting younger women for breast cancer screening not only shifts screening to a population that has a lower incidence of breast cancer but also to one in which mammography is less sensitive, two characteristics that combine to increase the potential for harm and decrease any potential for benefit. However, that’s exactly what Representative Wasserman Schultz (D-FL) is trying to do with her sponsorship of the Breast Cancer Education and Awareness Requires Learning Young Act of 2009, also known as the EARLY Act. It’s a well-intentioned bill, but it would almost certainly lead to these complications:
But critics say the House bill promotes techniques like breast self-exams that have not proved to find cancer at an earlier stage or to save lives. The concern is that the technique could cause younger women — a group for whom breast cancer is a rare disease — to find too many medically insignificant nodules that would lead doctors to perform unneeded biopsies, in which tissue is removed for testing.
Scarring from biopsies could make breast cancer harder to detect when the women are older and have a much higher risk of getting the disease, critics say. And such false alarms can also cause women to distrust the medical system and skip mammograms later in life when the tests have been proved to reduce the death toll, said Dr. Otis W. Brawley, an oncologist who is the chief medical officer of the American Cancer Society.
The problem is that what seems intuitively obvious, namely that if some screening is good more screening must be better, is neither obvious nor science-based, and it’s really, really hard to explain that in a way that neither oversells the benefits of mammography nor leads women to think that mammography is useless.
What I hope for one day, and what may well come out of the emerging sciences of oncogenomics and proteomics, is a manner of determining with more accuracy exactly which cancers will and will not progress. In that case, one potential harm of overtreatment of breast cancer, for example, would be obviated, namely treating women who don’t need it with aggressive surgery, radiation, and chemotherapy. Indeed, we already have some progress in this area with the Oncotype DX gene assay and related products, which have already changed the way oncologists practice by giving them information about which estrogen receptor-positive breast cancer patients can safely forgo adjuvant chemotherapy. The problem with such assays, of course, is that they still require tissue, which means that women with overdiagnosed breast cancers would still have to undergo a biopsy. Much better would be an imaging study that could provide the same risk information. Ten years ago, I would have thought such a system to be science fiction. Now I’m optimistic that by the end of my career we will be doing things that way, although what could scuttle any new technology is likely to be cost, which means that mammography is unlikely to be going anywhere any time soon.
In the meantime, we’ll have to muddle through with our imperfect screening systems and try to steer a middle course between screening advocates who view screening as the be-all and end-all of breast cancer diagnosis and skeptics like Jørgensen and Gøtzsche, who seem to be arguing that screening programs for breast cancer are useless.
Jørgensen, K., & Gøtzsche, P. (2009). Overdiagnosis in publicly organised mammography screening programmes: systematic review of incidence trends. BMJ, 339 (jul09 1). DOI: 10.1136/bmj.b2587