COMMENTARY

Is Red Meat Healthy? Multiverse Analysis Has Lessons Beyond Meat

John M. Mandrola, MD | May 09, 2024

Observational studies on red meat consumption and lifespan are prime examples of attempts to find signal in a sea of noise.
Randomized controlled trials are the best way to sort cause from mere correlation. But such trials are not feasible for most questions about food consumption. So, we look back and observe groups with different exposures.
My most frequent complaint about these nonrandom comparison studies has been the chance that the two groups differ in important ways, and it's these differences — not the food in question — that account for the disparate outcomes.
But selection biases are only one issue. There is also the matter of analytic flexibility. Observational studies are born from large databases. Researchers have many choices in how to analyze all these data.
A few years ago, Brian Nosek, PhD, and colleagues elegantly showed that analytic choices can affect results. Their Many Analysts, One Data Set study had little uptake in the medical community, perhaps because it examined a social science question.
Multiple Ways to Slice the Data
Recently, a group from McMaster University, led by Dena Zeraatkar, PhD, has confirmed the analytic choices problem, using the question of red meat consumption and mortality.
Their idea was simple: Because there are many plausible and defensible ways to analyze a dataset, we should not choose one method; rather, we should choose thousands, combine the results, and see where the truth lies.
You might wonder how there could be thousands of ways to analyze a dataset. I surely did.
The answer stems from the choices that researchers face. For instance, there is the selection of eligible participants, the choice of analytic model (logistic, Poisson, etc.), and the covariates for which to adjust. Combine these choices and the number of unique analyses grows exponentially.
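To see how fast the options multiply, consider a toy calculation in Python. Every count below is made up for illustration; none comes from an actual study:

```python
# Toy illustration: hypothetical counts for each analytic decision.
# Independent choices multiply, so a modest menu of options quickly
# becomes an astronomical number of unique analyses.
from math import prod

choices = {
    "eligible-participant definitions": 3,
    "model families (Cox, logistic, Poisson, ...)": 4,
    "exposure definitions": 5,
    "outcome definitions": 2,
    "covariate subsets (20 candidate covariates)": 2 ** 20,
}

total = prod(choices.values())
for decision, n in choices.items():
    print(f"{n:>9,} options: {decision}")
print(f"{total:,} unique analyses in total")  # more than 125 million
```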
Zeraatkar and colleagues are research methodologists, so, sadly, they are comfortable with the clunky name of this approach: specification curve analysis. Don't be deterred. It means that they analyze the data in thousands of ways using computers. Each way is a specification. In the end, the specifications give rise to a curve of hazard ratios for red meat and mortality. Another name for this approach is multiverse analysis.
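For readers who want to see the mechanics, here is a minimal sketch of a specification curve run on simulated data with no true red-meat effect. The dataset, covariates, and variable names are hypothetical stand-ins, and a real multiverse analysis varies far more than covariate subsets:

```python
# A minimal specification curve, assuming simulated survival data with a
# binary red-meat exposure and NO true effect. All names and covariates
# here are hypothetical; real multiverse analyses also vary eligibility,
# exposure/outcome definitions, and model families.
import itertools

import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "red_meat": rng.integers(0, 2, n),   # exposure, assigned at random
    "age": rng.normal(50, 10, n),
    "bmi": rng.normal(27, 4, n),
    "smoker": rng.integers(0, 2, n),
})
# Survival times depend on age and smoking but not on red meat.
hazard = 0.01 * np.exp(0.03 * (df["age"] - 50) + 0.5 * df["smoker"])
df["time"] = rng.exponential(1 / hazard)
df["event"] = (df["time"] < 15).astype(int)
df.loc[df["event"] == 0, "time"] = 15    # administrative censoring

# One specification per covariate subset: fit a Cox model, record the
# hazard ratio for red meat, then sort the estimates to form the curve.
covariates = ["age", "bmi", "smoker"]
results = []
for k in range(len(covariates) + 1):
    for subset in itertools.combinations(covariates, k):
        cols = ["red_meat", *subset, "time", "event"]
        cph = CoxPHFitter().fit(df[cols], duration_col="time", event_col="event")
        results.append({
            "adjusted_for": ", ".join(subset) if subset else "nothing",
            "HR": float(np.exp(cph.params_["red_meat"])),
            "p": float(cph.summary.loc["red_meat", "p"]),
        })

curve = pd.DataFrame(results).sort_values("HR")
print(curve.to_string(index=False))
```

Even with only three candidate covariates, the eight resulting hazard ratios scatter around 1.0; scale the decision menu up and you get the thousands of specifications the authors describe.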
For their paper in the Journal of Clinical Epidemiology, aptly named "Grilling the Data," they didn't just conjure up the many analytic ways to study the red meat–mortality question. Instead, they used a published systematic review of 15 studies on unprocessed red meat and early mortality. The studies included in this review reported 70 unique ways to analyze the association.
Is Red Meat Good or Bad?
Their first finding was that these published analyses yielded widely disparate effect estimates, from a hazard ratio of 0.63 (a reduced risk for early death) to 2.31 (a higher risk). The median hazard ratio was 1.14 with an interquartile range (IQR) of 1.02-1.23. One might conclude from this that eating red meat is associated with a slightly higher risk for early mortality.
Their second step was to count how many ways (specifications) there were to analyze the data by combining all the analytic choices found across the 70 approaches in the systematic review.
They calculated a total of 10 quadrillion possible unique analyses. A quadrillion is a 1 followed by 15 zeros. Current computing power cannot handle that many analyses. So, they generated 20 random unique combinations of covariates, which narrowed the field to about 1400 analyses. About 200 of these were excluded because they produced implausibly wide confidence intervals.
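Here is a rough sketch of that winnowing step. The covariate count is an assumption I chose only to reproduce the order of magnitude; the paper's actual decision menu differs:

```python
# Rough sketch of the winnowing step. The 47 candidate covariates are an
# assumption chosen only so that 70 published approaches x 2^47 covariate
# subsets lands near 10 quadrillion; the paper's actual inputs differ.
# Sampling 20 covariate combinations leaves 70 x 20 = 1400 analyses.
import random

n_covariates = 47                # hypothetical candidate covariates
n_approaches = 70                # analytic approaches from the review
full_space = n_approaches * 2 ** n_covariates
print(f"full specification space: {full_space:.2e}")   # ~9.9e+15

random.seed(1)
covariates = [f"cov_{i}" for i in range(n_covariates)]
# The paper drew unique combinations; this sketch does not enforce
# uniqueness, though collisions are vanishingly unlikely at this scale.
sampled_subsets = [
    sorted(random.sample(covariates, k=random.randint(0, n_covariates)))
    for _ in range(20)
]
specs = [
    (approach, tuple(subset))
    for approach in range(n_approaches)
    for subset in sampled_subsets
]
print(f"tractable specification count: {len(specs)}")  # 1400
```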
Voilà. They now had about 1200 different ways to analyze a dataset. The dataset they chose was an NHANES longitudinal cohort from 2007-2014. They deemed each of the more than 1200 approaches plausible because all were derived from peer-reviewed papers written by experts in epidemiology.
Specification Curve Analysis Results
Each analysis (or specification) yielded a hazard ratio for red meat exposure and death (Figure).
The median HR was 0.94 (IQR, 0.83-1.05) for the effect of red meat on all-cause mortality — ie, not significant.
The range of hazard ratios was large, from 0.51 (a 49% reduced risk for early mortality) to 1.75 (a 75% increased risk).
Among all analyses, 36% yielded hazard ratios above 1.0 and 64% yielded hazard ratios below 1.0.
As for statistical significance, defined as P ≤ .05, only 4% (48 specifications) met this threshold. Zeraatkar reminded me that this is roughly the 5% false-positive rate you would expect by chance alone if unprocessed red meat has no true effect on longevity.
Of the 48 analyses deemed statistically significant, 40 indicated that red meat consumption reduced early death and 8 indicated that eating red meat led to higher mortality.
Nearly half the analyses yielded unexciting point estimates, with hazard ratios between 0.90 and 1.10.
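Her point about chance is easy to check with a quick simulation. In a world where an exposure truly does nothing, about 5% of tests will cross P ≤ .05 anyway. The sketch below uses 1200 independent null comparisons, a deliberate simplification, since the real specifications share one dataset and are correlated:

```python
# Sanity check on the "4% significant" figure: with a true null effect
# and alpha = .05, about 5% of tests come up significant by chance. This
# toy uses 1200 independent comparisons; the real specifications share
# one dataset and are correlated, so treat it only as intuition.
from math import erf, sqrt

import numpy as np

rng = np.random.default_rng(42)
n_specs, n_per_group = 1200, 500

significant = 0
for _ in range(n_specs):
    a = rng.normal(0, 1, n_per_group)  # "exposed" group, null world
    b = rng.normal(0, 1, n_per_group)  # comparison group, same distribution
    # Two-sample z-test computed by hand to avoid extra dependencies.
    se = sqrt(a.var(ddof=1) / n_per_group + b.var(ddof=1) / n_per_group)
    z = (a.mean() - b.mean()) / se
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    significant += p <= 0.05

print(f"{significant} of {n_specs} null tests were significant "
      f"({significant / n_specs:.1%}), vs ~5% expected by chance")
```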
Paradigm Changing
As a user of evidence, I find this a potentially paradigm-changing study. Observational studies far outnumber randomized trials. For many medical questions, observational data are all we have.
Now think about every observational study published. The authors tell you — post hoc — which method they used to analyze the data. The key point is that it is one method.
Zeraatkar and colleagues have shown that there are thousands of plausible ways to analyze the data, and this can lead to very different findings. In the specific question of red meat and mortality, their many analyses yielded a null result.
Now imagine other cases where the researchers did many analyses of a dataset and chose to publish only the significant ones. Observational studies are rarely preregistered, so a reader cannot know how a result would vary depending on analytic choices. A specification curve analysis of a dataset provides a much broader picture. In the case of red meat, you see some significant results, but the vast majority hover around null.
What about the difficulty in analyzing a dataset 1000 different ways? Zeraatkar told me that it is harder than just choosing one method, but it's not impossible.
The main barrier to adopting this multiverse approach to data, she noted, was not the extra work but the entrenched belief among researchers that there is a best way to analyze data.
I hope you read this paper and think about it every time you read an observational study that finds a positive or negative association between two things. Ask: What if the researchers were as careful as Zeraatkar and colleagues and did multiple different analyses? Would the finding hold up to a series of plausible analytic choices?
Nutritional epidemiology would benefit greatly from this approach. But so would any observational study of an exposure and outcome. I suspect that the number of "positive" associations would diminish. And that would not be a bad thing.
John Mandrola practices cardiac electrophysiology in Louisville, Kentucky, and is a writer and podcaster for Medscape. He espouses a conservative approach to medical practice. He participates in clinical research and writes often about the state of medical evidence.
Any views expressed above are the author's own and do not necessarily reflect the views of WebMD or Medscape.