'Correlation Is Not Causation' (Wink, Wink) Is Not Good Enough for Observational Research

Robert A. Harrington, MD; Robert W. Yeh, MD, MSc


April 17, 2024

This transcript has been edited for clarity.

Robert A. Harrington, MD: Hi. This is Bob Harrington from Weill Cornell Medicine on Medscape Cardiology | theheart.org. Over the years, we've talked often on this show about randomized clinical trials. We talk about some of the latest research coming out of observational analyses, outcomes-based research, and we've had discussions about which one is a better form of evidence or a stronger form of evidence. Is it the randomized clinical trial? Is it a well-done observational analysis? The answer is that it depends.

It depends on what your question is. It depends on the datasets that you're using. It depends on how the trial was done. What we're really trying to do is to understand whether or not we have reliable information coming out of a research study that allows us to make decisions about how we might be treating our patients. This also involves things like causal inference, and we're going to talk about that today.

I can't think of a better person to have this conversation with today than my good friend and colleague, Dr Bobby Yeh from Harvard. Bobby is the Katz-Silver Family Professor of Medicine at Harvard Medical School, and he's also the director of the Richard and Susan Smith Center for Outcomes Research at Beth Israel Deaconess Medical Center in Boston. Bobby, thanks for joining us here on Medscape Cardiology.

Robert W. Yeh, MD, MSc: Thanks for having me, Bob.

Harrington: It's been a few years. You reminded me before we got on the air. If I think about it, last time you and I talked, you were having a baby. What did you say? You now have a 5-year-old, so it's been a while.

Yeh: I do. He's almost 6, so it's been a little bit.

Harrington: It's been a while. I think last time we talked, we talked about Twitter. You had written a paper about democratization of knowledge through social media.

Yeh: That's right. That's gone a little sideways since that paper.

Credible Observational Research

Harrington: Twitter has definitely gone sideways. The reason I wanted to invite you to talk about this topic is twofold, one of which is it's something that you and I have talked about often at medical meetings, at conferences — this whole notion of the strength of evidence, how we assess evidence, and what level of evidence we need.

You wrote a really important paper last summer, which is the second reason I invited you. It's called "Bringing the Credibility Revolution to Observational Research in Cardiology." I know you've talked with others about the topic, and I thought, You know what, it's still a pretty important topic. It was really important during COVID, wasn't it, as people were trying to put forward all sorts of observational analyses to try to help us gain insight rapidly into what therapies might be of value?

Thanks for joining me. Let's maybe start with the biggest picture, Bobby. Why did you write the paper?

Yeh: It's a topic that I've been thinking about for many years, obviously, and we've had many discussions over the years, and our colleagues here have as well. It's an area that we do a large amount of research in here, which is, how do we get more credible estimates out of our observational analyses for comparative effectiveness research? We have had a particular interest in the past few years working with the FDA, and more recently CMS, to understand the value of real-world evidence, and I think it's a very polarizing topic.

It's a topic where you can find people who swear that they'll never trust an observational study, and others who say that they really can believe the causal interpretation of a well-done observational study. Very smart, reasonable people can disagree strongly on this issue. I felt there has been a real challenge that we've faced in cardiology and, really, across medicine — but our field is cardiology. I felt it was time to talk about at least some of the challenges and how we might move that discussion forward.

Harrington: I think that's a really nice way to sum it up. When you invited me to the Harvard School of Public Health last year to attend a meeting on causal inference, I felt like I was the sacrificial clinical trialist walking into the lion's den. Actually, I think we all agreed on much more than we disagreed on.

Yeh: I think that's right. I think, under certain circumstances, randomized clinical trials give you the best answer. I think we'd all agree also that we can't conduct trials for every question that we have. For some questions, particularly those that relate to understudied populations or how devices and therapies are working in actual practice, we have to at least be able to understand some evidence from observational literature, and we can't just throw all of it out. We do so only at our own peril.

Harrington: I fully agree with that. While I like to think of myself as a trialist in many ways, I'm a clinical researcher who uses the methods that are available and best suited to answer the particular question. If I think about my own body of work, there's much more that we've written dealing with nonrandomized data than with a specific randomized question. Even within randomized clinical trials, there's a whole set of observational things that you do.

Yeh: Absolutely.

Harrington: Bobby, one of the things I really enjoyed about the paper is that it made me think about where people are using some of these techniques in very thoughtful, innovative ways that maybe we could learn from. You particularly talk about economics and economic policy. Talk about that a little bit, particularly for the cardiology reader who might not be thinking about the economic literature.

Yeh: When I started thinking about these issues, really, as a fellow trying to understand observational research methods — this was when I was a fellow at UCSF — I had a lecture on instrumental variable analysis, and I was really struck by the potential power of these methods.

When I looked into it a little bit more, I started to understand that, boy, instrumental variable analysis is something that all cardiologists should really have a deep understanding of because the best instrumental variable analysis is really the randomized clinical trial. Yet, very few cardiologists and cardiovascular investigators utilize observational approaches with instrumental variable analysis, and actually, very few clinical trials really apply true instrumental variable approaches to their randomized controlled trials, which is a whole different potential topic of conversation.
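
For readers who want to see the idea in miniature, here is a simulated sketch of an instrumental-variable (Wald) estimate; the instrument stands in for randomized assignment, and all variable names and numbers are illustrative assumptions, not data from any study mentioned in this conversation.

```python
import numpy as np

# Minimal simulated sketch of the instrumental-variable idea: the instrument z
# (think randomized assignment) influences which treatment x a patient receives,
# but affects the outcome y only through x, so it recovers the treatment effect
# even though the confounder u is never measured. Illustrative numbers only.
rng = np.random.default_rng(0)
n = 100_000

z = rng.integers(0, 2, size=n)                                # instrument (e.g., assignment)
u = rng.normal(size=n)                                        # unmeasured confounder
x = (rng.random(n) < 0.2 + 0.5 * z + 0.1 * u).astype(float)   # treatment actually received
y = 1.0 * x + 2.0 * u + rng.normal(size=n)                    # true treatment effect = 1.0

# Naive A-vs-B comparison is biased because u drives both x and y.
naive = y[x == 1].mean() - y[x == 0].mean()

# Wald (IV) estimator: effect of the instrument on the outcome divided by its
# effect on treatment received.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

print(f"naive estimate: {naive:.2f}")      # noticeably above 1.0
print(f"IV (Wald) estimate: {wald:.2f}")   # close to 1.0
```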

Learning From Economists

When I got back here as an interventional fellow, I was really having trouble understanding one particular question, actually a sort of esoteric question, and I sent an email to this guy who had written papers on this, Josh Angrist. He was an MIT professor, just one stop on the Red Line from Mass General, where I was at the time. He's at the Kendall Square T stop, which you know well. I had lunch with him, and the conversation we had at lunch was just really eye-opening.

Fast-forward a little more than a decade, and Josh received the Nobel Prize in economics. I should say, the Ford Professor of Economics, Joshua Angrist, received that Nobel Prize because he really discovered how to apply these quasi-experimental methods to understand labor economics and charter school policies, things that we don't really study that much but that have the confounding issue that, I think, challenges many medical studies: people who opt into charter schools or into certain economic policies don't opt in at random.

He had figured out that we can find these natural experiments, instrumental variables, and really understand the effects of policy. It was a great example of where there was very low credibility in economic studies for comparative effectiveness or causal inference back then, and he, David Card, and Guido Imbens — the three of them together — really revolutionized that field. That's the kind of revolution that we're in need of in cardiology.

Harrington: Let's unpack that a little bit. First, describe for people what makes for a good observational study if one wants to draw causal inference.

Yeh: I've spoken a little bit about the econometric methods with my epidemiology colleagues. If they listen to this, they'll be turning over because they would say I'm putting too much weight on those methods. The first starting point is common to whatever approach you take, which is actually identifying and clearly stating what the question is. It's harder than we think it is. Sometimes we think, Oh, I'm just comparing A vs B. But when you look at the details of what you actually set up in an observational analysis, you didn't quite do what you think you did.

What most investigators will say now, including Miguel Hernán, who has really pioneered much of this thinking and whom you met at our causal inference seminar last year, is to think of the theoretical clinical trial that you would want to run, the randomized clinical trial. Everything about your observational analysis should follow from how you would do that trial, including the eligibility criteria, the hypothesis, the outcomes, and how you're ascertaining those things. If you approach an observational study with the same rigor with which we design a randomized clinical trial, then we won't make silly mistakes that can cause problems. That's the starting point.
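
To make that concrete, here is a minimal, hypothetical sketch of writing down the "target trial" elements before touching any data; every field and value below is an illustrative assumption, not a study from this conversation.

```python
# Hypothetical sketch of a "target trial" protocol written down before any
# data are examined. All fields and values are illustrative assumptions.
target_trial = {
    "question": "Does device A reduce 1-year mortality compared with device B?",
    "eligibility": [
        "age >= 18",
        "index procedure between 2018 and 2022",
        "no prior use of device A or B",
    ],
    "treatment_strategies": [
        "receive device A at the index procedure",
        "receive device B at the index procedure",
    ],
    "assignment": "emulated randomization: adjust for confounders measured at time zero",
    "time_zero": "date of the index procedure",
    "outcome": "all-cause death within 365 days, ascertained from linked vital records",
    "follow_up": "from time zero until death, disenrollment, or 365 days",
    "analysis": "prespecified and registered before outcome data are examined",
}
```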

After that, I think you have to think really clearly about the dataset and whether the dataset is fit for purpose for the study that you're trying to do. For me, that means understanding whether or not the device or the treatment is well measured, the outcome is well measured, and then, maybe most importantly, whether you have all of the potential confounders. Often, the answer to that question is no, and then the answer should be that you shouldn't proceed with that analysis.

Harrington: You had a conversation on this topic with John Mandrola, and there was a fascinating set of comments after the presentation, including from someone I suspect you know well, my former Duke colleague, statistician Frank Harrell, from whom I learned over the years when I was a cardiology fellow at what was then called the Duke Databank.

What Frank said in one of his comments — and I could hear his voice — is that you ought to be getting 10 experts and asking them all the things that they think about when they're selecting a therapy, technique, or technology. If you have measured all those things, then you can proceed. That's essentially what you're saying.

Yeh: It's so interesting, too, that that's coming from Frank, a statistician, because many of us investigators think that this is a problem that can be solved with statistics. It's actually not. It's a problem that is solved by asking clinicians who make those decisions every day. A statistician can do many things, but what a statistician can't do is tell you whether or not the dataset captures the important clinical confounders. That's a clinician's job.

Harrington: It reminded me of the best part of being a fellow in the Duke Databank, which was the Tuesday research conference, when the statisticians, the clinicians, and all of us as fellows would get together to talk about work in progress, sometimes work that was already pretty far along. The conversations were always, "Tell me, Bob, why did you pick angioplasty for this patient?"

It's trying to get at your point, which is: What were the things that made you choose one therapy over another? Is there a certain set of those things that we can all agree upon and then proceed? As you've said, if we can't come up with that list, or if we come up with a list and realize that your dataset is woefully lacking, you probably ought to stop. People don't stop, do they?

Yeh: They rarely do. They usually press forward, and then they'll have it as a line in the limitations section, saying, "We can't rule out confounding as a possible explanation." The problem with that is that it's like a get-out-of-jail-free card. Sometimes you read the study and you say this is impossibly confounded, and your limitation is actually the primary conclusion of the paper. If there's a strong possibility or probability that your single greatest limitation entirely overturns your study, you ought to think twice about whether or not that's a worthwhile study.

Now, I will say also that these quasi-experimental approaches that we started with, the econometric methods, don't hinge on the assumption of no unmeasured confounding. They are useful when the dataset doesn't capture all of the relevant confounders and you can't do a direct A vs B comparison.

Harrington: That's an important observation. The really unfortunate thing, and I think Frank noted this in his comments, is that the good researchers are not the ones he's worried about, because they are doing the things that he thinks are important and that you've already laid out. It's the not-so-good researchers who are trying to get an answer out there and many times haven't prespecified what they're going to do.

In clinical trials, we go through a great deal of mental anguish about the order in which, for example, you're going to analyze your secondary endpoints to protect the type I error. You declare it. You put your nickel down, and you follow it. Particularly if you're doing things like sequential testing or hierarchical testing, you say, "Okay, I got through these three, the P values are what I said they would be to achieve statistical significance, and now I have one that's above that threshold, so everything else is exploratory." You have to say that.

Yeh: You have to declare those things, and we ought to be doing more of that prespecification, declaration, registration, and statistical analysis planning for observational studies in the same way we do for randomized controlled trials. Those practices aren't unique to trials. It just so happens that they've become the standard operating procedure for trials, and they haven't been for observational data.
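
As a small illustration of the fixed-sequence (hierarchical) testing Dr Harrington describes, here is a sketch; the endpoint names, P values, and alpha of .05 are illustrative assumptions, not results from any particular trial.

```python
# Minimal sketch of fixed-sequence (hierarchical) testing of secondary endpoints:
# test in the prespecified order at the full alpha; once one endpoint misses the
# threshold, everything after it is reported as exploratory.
ALPHA = 0.05

def hierarchical_test(ordered_endpoints, alpha=ALPHA):
    """ordered_endpoints: list of (name, p_value) pairs in prespecified order."""
    results, gate_open = [], True
    for name, p in ordered_endpoints:
        if gate_open and p < alpha:
            results.append((name, p, "statistically significant"))
        else:
            gate_open = False  # the gate closes at the first failure
            results.append((name, p, "exploratory"))
    return results

# Illustrative values: the first three succeed, the fourth misses the threshold,
# so the fifth is exploratory even though its P value is small.
for name, p, status in hierarchical_test([("endpoint 1", 0.010),
                                           ("endpoint 2", 0.031),
                                           ("endpoint 3", 0.044),
                                           ("endpoint 4", 0.080),
                                           ("endpoint 5", 0.020)]):
    print(f"{name}: P = {p:.3f} -> {status}")
```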

Harrington: Why do you think that is? We go to clinicaltrials.gov, where we're all required to put our clinical trial plans online to give people a chance to make sure that we haven't changed our minds during the course of the study, that we did what we said we were going to do. What's the saying: "You say what you're going to do and do what you said"? That's what you should be doing. How come we don't do that with observational reports?

Yeh: It's a great question. I think it has to do with how much discipline it takes to do it. It requires quite a bit of discipline to do it for randomized clinical trials, but it's become the expectation for trials. Now, if you didn't do those things, the journal might not accept that trial. Whereas for the observational study, it just hasn't been the expectation, in part because it's so much easier to conduct observational studies.

It gets into the debate about data sharing and access. Part of the challenge of having data that are so widely available — it's obviously a good thing that data are so widely available and democratized in a sense — is that it also makes it easier to get away with bad-quality studies. Large-scale randomized clinical trials are confined to a handful of investigators like yourself and to big research groups like the Duke Clinical Research Institute or the Baim Institute that Mike Gibson runs.

The downside of that is it's not really a level playing field for being able to conduct those. The positive side is that there is a common understanding of the rigor with which it takes to run those trials because there's a barrier to entry. Imagine that it became incredibly easy for every investigator to run a randomized clinical trial. You might not see the same rigor of those trials as you do now for the major trials.

Harrington: Although you've heard me say that I want it to be made much easier to do randomized clinical trials, the world I want to live in is one in which, if we don't know the answer, somehow we're being randomized to be able to get some insight into what that answer might be. I'm willing to accept a little messiness in order to have more questions exposed to the rigor of randomization.

Let me go back and ask. I was just thinking about the Duke Databank days, which really started as observational outcomes analysis with the collection of all the coronary disease patients at Duke. Do you make your fellows write a research plan as to what they're going to do when they show up at the Smith Center wanting to do what you do?

Yeh: We do. We have a form that they have to fill out. They meet with the statistician to discuss what they've written. We iterate on that, and then we decide on something. That's pretty standard for us. Do we take it to the top and register every study? No, we don't register every study, but the important ones, we do. Things that we think are impactful for regulatory science, we certainly register, and there are a couple of recent examples with the FDA where we did that.

We try to hold ourselves to as high a standard as we can, but it's easy to fall short of that for sure. It's easy to want to take just another peek under the hood and tweak the model just a little to get the result that you want. That's why it takes a large amount of planning up front. You really want to be able to go to bed at night feeling good about the study that you're putting forward.

Making Causal Inference

Harrington: There was a rigorous examination of the question. One of the things that I really enjoyed in your paper — those of us who are journal editors spend a large amount of time making sure that there's no causal inference language in some of these analyses — is that you tell me that's a cop-out. That it gives us license to say, we know that there are limitations here, so go ahead and say what you want. You think that's not good enough, to just take out the causal language? In fact, I think you said somewhere in here that maybe we should have the causal language in there, but tell us how you got there.

Yeh: That's exactly right, in my opinion. This is controversial.

Harrington: I know. That's why I brought it up.

Yeh: I was at a meeting, Transcatheter Cardiovascular Therapeutics, and I said that from the podium. [Statistician] Stuart Pocock almost stood up and interrupted me, and he said, "No, no, no, we cannot do that." This is controversial for sure. My point is that I don't think that every observational study ought to be interpreted causally. No, just the opposite. I think only a very small minority should be interpreted causally.

When the intent is causal, we ought to be up-front about that. When we say, "This is associated with that…" wink-wink, nod-nod, we know what we're getting at and what our point is. We ought to acknowledge that. If we acknowledge that our goal here was to study the causal impact of this treatment for this condition compared with some other treatment, then all of a sudden, boy, I'd better clear a high bar. I have to convince you.

Once you know that's my intent, I think the likelihood that I convince you is going to be low. If I do convince you, we ought to acknowledge that you've been convinced. Now, what that will mean, to me, is fewer observational studies being published, but it also means raising the quality of observational studies and being more explicit about their intent, and I think we'll be better off overall.

Harrington: A few months ago, I met with a couple of investigators, and we did a show, I think at the American Heart Association meeting, where we talked about "publish or perish." Part of the challenge is that there are many journals. I think you mentioned that. There are many things that people want to say. People need to start developing their research chops and their reputation, so unfortunately, in some ways, we created this mess, didn't we?

Yeh: I see many trainees, just as you and many others listening to this do, and sometimes I steer them away from those causal comparative effectiveness studies for their first projects because they're controversial. They're sometimes difficult to interpret. There are other kinds of observational studies, including descriptive epidemiology, prediction modeling, and risk factor assessment. Those are important observational studies that don't rely to quite the same extent on the assumptions that a comparative effectiveness study does, and sometimes those are really good studies for trainees to work on.

Harrington: People will often ask me, "Which one do you like for answering my question: a randomized trial or an observational analysis?" I say, "Well, tell me what your question is, the data you're going to use, and what you're trying to ultimately get at. Do you want to make causal inference? If you want to make causal inference and it's possible to do a randomized clinical trial, that's what you ought to do. If that's not your goal, there are many other ways to put forward some thoughts into the literature and use some different methods."

Bobby, if I gave you the power to do one thing to make the field better, what would you do?

Yeh: Gosh. You've got me on the spot here. I would say that if we could standardize, and even just make more widely known and understood, the modern methods for causal inference in our community, in the journals, in the scientific community, and among readers, I think many of these challenges would not go away entirely, but we'd be much better off.

Harrington: In some ways, that's what we've done in clinical trials, isn't it? We've standardized much of what we do, including the reporting of results, etc., in a way — with the CONSORT diagrams, for example — that really helps us think through what it is that we set out to do and then what we actually did. Then, we report it in as full a fashion as is humanly possible.

Bobby, thank you for joining us here on Medscape Cardiology. This has been a fun conversation with my friend and colleague, Bobby Yeh, from Beth Israel Deaconess. Bobby is a professor of medicine at Harvard Medical School, and he's the director of the Richard and Susan Smith Center for Outcomes Research at Beth Israel Deaconess. Bobby, thanks for joining us.

Yeh: Thanks so much, Bob.

Robert A. Harrington, MD, is dean of Weill Cornell Medicine, former chair of medicine at Stanford University, and former president of the American Heart Association. (The opinions expressed here are his and not those of the American Heart Association.) He cares deeply about the generation of evidence to guide clinical practice. He's also an over-the-top Boston Red Sox fan.
