How Breast Cancer Is Like Climate Change
8th November 2011
We recently took a look at how some people do research on Climate Change. We noted that you cannot execute gold standard experimental science on Climate Change because the nature of nature does not permit it. We then looked at an observational approach that would be useful. Recall my fishing net and arrow shooting metaphor. Then we determined that no one is doing this yet; instead, researchers are accumulating huge databases of convenience data. From these biased samples, they offer a kind of science revealed in this quote:
B) the application of a quality control and “correction” framework to deal with erroneous, biased, and questionable data . . .
I asserted that only God can de-bias a biased dataset, but that clearly does not stop some people’s goddish aspirations. At Berkeley some physicists believe they can turn bias into a random or representative sample. And today, so too, with epidemiologists at Harvard.
Consider the recent news reported in JAMA that proves alcohol as a carcinogen, specifically as a proven cause of breast cancer. Researchers used data from the massive and ongoing Nurses Study, which has tracked the lifestyle and health of over 120,000 nurses since 1976 with a biennial self-report survey. From this huge database they focus on the effect of alcohol consumption on breast cancer. The researchers report that heavy drinkers have about 150% of the breast cancer risk of nondrinkers.
More importantly, they pursue an interesting biological explanation for this outcome by looking at different characteristics of the cancer cases related to hormones and suggest a plausible mechanism of effect. An editorialist underscores this idea.
The association between alcohol use and increased risk of breast cancer is not a novel finding, but the report by Chen et al provides more detail about the risks associated with different patterns of consumption. Chen et al and other investigators suggest that alcohol probably acts through the modification of the hormonal milieu.
So. We’ve got alcohol as a proven cancer causing agent in breast cancer, plus proof of the plausible biological pathway.
All of this is true the same way that all of the research on Climate Change is true. It begins with a huge, biased dataset that researchers then debias to divine the truth of nature. No experiments. No random sampling. Just convenience sampling of observations that are then de-biased.
Think like a scientist.
1. The Nurses Study is a convenience sample of biased data. People select in and out at their option, not under researcher control. The occupational group was chosen for obvious convenience reasons. Why not teachers? Or secretaries? Or WalMart-ish associates? Why not women working in the home? Why not unemployed women? As with the Berkeley climate database, a big, biased dataset is still biased and you cannot change that.
2. Since 1976 over 30% of the sample has dropped out. They started with 122,000 participants and report results from 85,000 now. Did those 37,000 people drop out for the same reasons and with the same impact on the data? What happens when you compare results from different time periods with different sample sizes and composition? Does anyone doubt that you’d get SSD (statistically significant differences) among all the various comparisons you could make here and that there would be interesting and contradictory findings?
3. Self reports of behavior contain small measurement errors. People cannot report their alcohol consumption exactly and consistently. With effects this small, even measurement error is a plausible Rival Explanation.
4. Adjusting the data reduces the error term and makes trivial effects more statistically significant and artificially changes effect sizes. Thus, a math trick makes the effect larger, not the liquor.
5. The adjusted effect size is a Small Windowpane. At 45/55, it is barely detected above random variation. And it is highly adjusted as we’ll see.
6. The tests for the plausible biological pathway are not even statistically significant even though the editorialist seems to believe them as truly true. Really. Just like the weather report. Here’s the key quote from the research.
Because one potential mechanism for alcohol’s effect on breast cancer risk involves hormonal effects, we examined the association by ER/PR status of the tumor (TABLE 3). For this analysis, we excluded 1620 cases with unknown ER status, PR status, or both. Alcohol consumption seemed to be more strongly associated with risk of ER-positive status, PR-positive status, or both, but the P value for interaction was not significant.
The presumed hormonal effect based on a huge sample is not even statistically significant, but smart, trained, and motivated editorialists misunderstand this. Here’s his key quote, again.
The association between alcohol use and increased risk of breast cancer is not a novel finding, but the report by Chen et al provides more detail about the risks associated with different patterns of consumption. Chen et al and other investigators suggest that alcohol probably acts through the modification of the hormonal milieu.
When a hypothesis is interesting, novel, and plausible, but the data aren’t even statistically significant in a sample of 85,000 cases, it appears you can call that hypothesis “probable” in JAMA. Why confuse the science with facts when you know you are right?
7. Since there is no randomization in these data, tests of sampling error, those tests of statistical significance, are not warranted and are deceptively applied. Smart people constantly misunderstand the meaning of statistical significance, not realizing that it quantifies a potential Rival Explanation – sampling error.
Simply put: The de-biased effects are Small and the Rival Explanations are plausible. These data do not support the hypothesis that alcohol causes breast cancer any more than any Climate Change database supports the hypothesis of a change in global temperature.
Now, let’s pivot from the bad science that aims at persuasion to simple science that aims at understanding. Just take the absolute outcomes from this study. The raw truth of the data is expressed in this Table, which is just their reformatted Table 1.
| Grams of alcohol per day | 0 | 0.1-4.9 | 5-9.9 | 10-19.9 | > 20 | Totals |
|---|---|---|---|---|---|---|
| cancers | 1669 | 3143 | 1063 | 1091 | 724 | 7690 |
| cases | 18967 | 37700 | 11559 | 10212 | 6192 | 84630 |
| percent | .0879 | .0833 | .0919 | .1068 | .1169 | .0908 |
Take a minute and think about it. The columns have alcohol consumption in grams. The > 20 Column translates into 2 or more drinks a day. The rows show the number of breast cancers in each drinking category and the number of cases with the percent row simply the number of cancers divided by cases. Finally, note the last column provides the totals and the grand average. Let’s start with that grand average.
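If you want to check the percent row yourself, here is a minimal sketch in Python. The counts are copied from the Table, with one assumption: the 0.1-4.9 case count is taken as 37700, the value implied by the 84630 total and the .0833 rate. The variable names are mine, not the researchers’.

```python
# Counts copied from the Table. ASSUMPTION: 37700 cases in the 0.1-4.9
# column, the value implied by the 84630 total and the .0833 rate.
categories = ["0", "0.1-4.9", "5-9.9", "10-19.9", ">20"]
cancers = [1669, 3143, 1063, 1091, 724]
cases = [18967, 37700, 11559, 10212, 6192]

# The "percent" row is simply cancers divided by cases, as a proportion.
rates = {cat: c / n for cat, c, n in zip(categories, cancers, cases)}

# The grand average pools every category together.
grand_rate = sum(cancers) / sum(cases)

for cat in categories:
    print(f"{cat:>8} g/day: {rates[cat]:.4f}")
print(f"   grand average: {grand_rate:.4f}")
```

Small differences in the last decimal place against the Table come down to rounding versus truncation.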
The average incidence of breast cancer in this biased, convenience sample is .0908 or about 9%. Around this mean we find a range from a low of 8.3% to a high of 11.7%. If you do a simple test for differences between proportions for the nondrinkers and the heavy drinkers (8.8 versus 11.7) you get a highly statistically significant difference (z = 10.74, p < .000000001!!!!) but an h effect size of .096. A Small Windowpane for h would be .2, so this obtained difference is less than half of a Small Windowpane. Small effect sizes are barely detectable over random variation and this is half of that. Sure, it is SSD out the wazoo, but that’s purely a function of that huge sample size, and given that the sample is not random, statistical significance testing is unwarranted and misleading. The raw effect size is trivial even if it is true. And, is this trivial effect even true?
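Here is a rough sketch of that kind of comparison, run for every drinking category against the nondrinkers. It assumes a pooled two-proportion z test and Cohen’s h, where h = |2·asin(√p2) − 2·asin(√p1)|. The post does not say which variant of the test produced its z values, so the z from this sketch need not match the figures quoted here. The function names are mine, and the 0.1-4.9 case count of 37700 is the value implied by the Table’s total.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

def cohens_h(p1, p2):
    """Cohen's h effect size for two proportions (arcsine transform)."""
    return abs(2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1)))

# (cancers, cases) per category. ASSUMPTION: 37700 cases for 0.1-4.9.
table = {
    "0": (1669, 18967),
    "0.1-4.9": (3143, 37700),
    "5-9.9": (1063, 11559),
    "10-19.9": (1091, 10212),
    ">20": (724, 6192),
}

x0, n0 = table["0"]
results = {}
for cat, (x, n) in table.items():
    if cat == "0":
        continue  # nondrinkers are the reference group
    z = two_proportion_z(x0, n0, x, n)
    h = cohens_h(x0 / n0, x / n)
    results[cat] = (z, h)
    print(f"{cat:>8} vs 0: z = {z:6.2f}, h = {h:.3f}")
```

Whatever the z works out to, h does not depend on sample size, which is why it is the better guide to how big the effect is.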
Think more about this. Couldn’t you get that much of a difference because of . . .
. . . convenience sampling?
. . . bias in the drop outs?
. . . error in the self report measurement?
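To make the drop out question concrete, here is a toy sketch with made-up numbers, not data from the study. It assumes two groups with exactly the same true cancer rate and asks what rate you would observe if, in one group, the people who go on to be counted as cancers drop out a bit more often than everyone else. Every rate below is a hypothetical chosen for illustration.

```python
def observed_rate(true_rate, drop_if_cancer, drop_if_healthy):
    """Cancer rate among the participants who remain after dropout."""
    stay_cancer = true_rate * (1 - drop_if_cancer)
    stay_healthy = (1 - true_rate) * (1 - drop_if_healthy)
    return stay_cancer / (stay_cancer + stay_healthy)

# HYPOTHETICAL: both groups share the same true rate of 10%.
TRUE_RATE = 0.10

# Group A: future cancer cases drop out more often (40% vs 30%).
group_a = observed_rate(TRUE_RATE, drop_if_cancer=0.40, drop_if_healthy=0.30)

# Group B: dropout is unrelated to outcome (30% for everyone).
group_b = observed_rate(TRUE_RATE, drop_if_cancer=0.30, drop_if_healthy=0.30)

print(f"group A observed rate: {group_a:.4f}")
print(f"group B observed rate: {group_b:.4f}")
print(f"gap from dropout alone: {group_b - group_a:.4f}")
```

Nothing causal separates the two groups in this sketch, yet the observed rates differ by more than a percentage point. Whether anything like this happened in the Nurses Study is exactly what the sample cannot tell you.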
Think about this analysis another way. Compare the nondrinkers to the next category, that 0.1-4.9 grams per day (about a drink a week). Note there’s a quantitative difference between 8.79% and 8.33%. Sure, the difference is only 0.46 percentage points, but the statistical test reveals z = 2.77, p < .006, a highly significant difference. Does it matter that the h effect is a minuscule 0.016? This comparison meets the same standards of judgment the researchers use for their other tests. So, why aren’t they saying that nondrinking causes cancer?
Remember that the researchers reported a 150% increase in breast cancer while the data we looked at in the Table show 120%. Why the small discrepancy? Recall again that I looked only at the headline comparison – more drinking, more cancer – but the researchers actually tested a more complex model – more drinking plus all those confounders, more cancer. When you “adjust” or “de-bias” or “correct” the headline data with all those other factors, it serves to artificially increase the effect size.
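You can get the raw, unadjusted ratios straight from the Table. This sketch divides each category’s cancer rate by the nondrinkers’ rate and prints it as a percent. Which of these comparisons the 120% figure comes from is not stated here, so the sketch prints all of them. The 0.1-4.9 case count of 37700 is the value implied by the Table’s total, and the variable names are mine.

```python
# (cancers, cases) per category. ASSUMPTION: 37700 cases for 0.1-4.9.
table = {
    "0": (1669, 18967),
    "0.1-4.9": (3143, 37700),
    "5-9.9": (1063, 11559),
    "10-19.9": (1091, 10212),
    ">20": (724, 6192),
}

# Reference rate: nondrinkers.
base_rate = table["0"][0] / table["0"][1]

# Unadjusted rate ratio of each category relative to nondrinkers.
raw_ratio = {cat: (x / n) / base_rate for cat, (x, n) in table.items()}

for cat, ratio in raw_ratio.items():
    print(f"{cat:>8} g/day: {ratio * 100:.0f}% of the nondrinker rate")
```

These are the headline numbers before any covariates go into the model, which makes them the fair baseline for judging how much the adjustments move the result.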
Please note the verbal trick here. Headline a simple hypothesis – More Drinking Causes More Cancer! But, don’t test those data. Add in other variables, like:
Additional covariates in the model were chosen to represent possible confounders and commonly accepted breast cancer risk factors and included menopausal status, age at menarche, parity, age at first birth, body mass index, family history of breast cancer in a first-degree relative, breastfeeding, cigarette smoking, and self-report of benign breast disease. All variables except age at menarche and breastfeeding were updated from follow-up questionnaires. For postmenopausal women, terms were also included for age at menopause, type of menopause, and duration and type of hormone therapy use.
Find the effect size for drinking + adjustments = cancer, but report the result as drinking = cancer. While researchers act as if “adjusting” a convenience sample makes it a random or representative sample, they are only playing math games to artificially inflate SSD and effect sizes.
Finally, realize all the false precision in this report and in most of those observational fairy tales. Researchers report things like alcohol consumed in grams per day and breast cancer cases down to decimal places as if they are counting all the pennies in the bank and reporting exact population values. All of these numbers are just estimates of Drinkingness or Cancerness or Temperatureness or whatever the construct under consideration. The numbers are never the Thing Itself, but only an estimation of the Thing Itself. Since we cannot directly and correctly measure Nature as She is, we can only estimate and then make decisions based on those estimates. This research literally eats the menu, thinking it is real food. When you realize we are dealing with estimates and not true, exact, scientific values, you realize the obtained effects are not real in themselves, but indicators and that those indicators are weak.
When you think about this, you realize the bad science here and you also see the good persuasion. The researchers look precise, objective, and deep when they are just persuasive, saying one thing while meaning something else. It also helps when you get an editorialist who asserts claims you left unsupported. These two articles literally talk out of both sides of the mouth. The original research describes the hormonal hypothesis, but then reports nonsignificant effects. The editorialist also recounts the hormonal hypothesis, but then miscounts the data to imply that hypothesis is supported. If you do not read Methods and Results you miss this and are left with a very different and mistaken conclusion.
This study, taken as a whole, offers no support for the claim that drinking causes breast cancer. No matter how you define the effect, it is small and arises from a data collection that encourages belief more in Rival Explanations than the ones provided by the researchers or the editorialist.
See the bad scientific similarities between two very different areas of study, Climate Change and Breast Cancer. In both instances, better science is available, but no one is doing it. They instead assert the ability to do what I claim is impossible. When anyone says she can de-bias the data, that means she is the Queen of Tomorrow who also should be able to corner the stock market, pick Presidential winners, and run the table at Vegas. If anyone knows how to find truth in a convenience sample of data, the world is hers for the taking, easy, ripe, and luscious to quote a charming thief.
Now. What’s a persuasion maven to do with today’s lesson?
If you are in the numbers business, you’ve got a great and enduring model of how to do sophistical statistics. All the tricks are hiding in plain sight. Just read without your ruby slippers, Dorothy.
If you are in the health business, hire these guys as consultants and away you go. You might design some kind of whiz bang decision maker app that allows women to make drinking decisions in a bar. Might even be useful for physicians. If you’ve read any pop press quotes from the clinicians, they are confused as hell over this research and an overly complex piece of software that essentially and always tells them to do nothing new would be a big seller. Get a recommendation quote from one of the study line authors or maybe that editorialist. Be careful about how you offer the money, however. Ethical considerations are important here.
If you are in the Lifestyle Police, get your drum and bugle. Harvard provides all the cover you need. Avoid any echoes of Carry Nation and Prohibition. Or, what the hell, we haven’t had a good Constitutional Amendment fight in a long time. Repeal the 21st Amendment with the 28th Amendment and reinstate the 18th Amendment! Think of the fund raising you could do on this. It could run longer than the ERA fight. Hey, we’re creating jobs here, folks! Better than Obama, Bernanke, or Congress proposals.
But, see the Health Bubble continue to inflate. And see the strain, how hard you have to work now to convince people to give you money to save their lives. You’ve got to sell trivial outcomes as science. The Bubble is so big and so taut right now, it’s really hard to make it bigger. It takes the Crimson reputation and gunfighter fast fingers pulling the statistical trigger. It also helps that no one reads the Methods and Results.
This is what you get when you try to turn bias into truth. Rumpelstiltskin tried to spin straw into gold and Harvard tries to turn wine into cancer. The persuasion lessons from these fairy tales are more interesting than the scientific ones.
P.S. Here’s a reader exercise. Some folks on this research team employed their skills looking at vitamin supplements many years ago. Using Observational Research they proved beyond the shadow of anyone’s doubt that vitamin supplements protected health, particularly with cancer. Of course, later large scale randomized controlled trials have decisively and repeatedly disconfirmed this proof. It’s your tax dollars at work. They’ve gotten millions of grant dollars and have been consistently and repeatedly proven wrong when somebody does just plain science rather than that scientific science.
P.P.S. What’s going on at Harvard? All these trivial outcomes puffed up like peacocks. Remember one of them proved that drinking pop makes kids into killers. A 109% RR. And remember that bad news with HRT. And now vitamins. As the Harvard don once put it, those who cannot remember the past are condemned to repeat it.