Why Is Correlation Neither Necessary Nor Sufficient for Causation (in Non-Experimental Data)?

A detailed explanation with toy examples

I. Correlation is “not sufficient” for causation

(i.e., two “significantly” correlated variables may not have a causal relationship)

In non-experimental data, despite the fact that —

  1. X and Y are strongly correlated/associated (correlation coefficient close or equal to 1 in absolute value)
  2. X and Y are (statistically) significantly associated (p-value of the coefficient less than 0.05)
  3. You can almost perfectly fit a straight line through the scatter plot of X and Y
  4. The explanatory variable in your model (X) explains almost all the variation in the outcome variable (Y) (i.e., R² close to 1)

X and Y may have no causal relationship! Let’s look at two such scenarios:

Scenario #1

There are confounding variables (simply, confounders) that your model did not control for.

A famous example: ice cream sales in a month and the number of people attacked by sharks in the same month are strongly correlated, but there is no reason to believe that one of these variables causes the other.
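To see how a confounder can manufacture a correlation, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the variable names, the coefficients, and the noise levels are invented, and temperature is simply assumed to drive both ice cream sales and shark attacks, with no link between the two.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of y after a simple least-squares regression on x."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical data-generating process: temperature causes both variables,
# and neither variable appears in the other's equation.
temperature = [rng.gauss(25, 5) for _ in range(n)]
ice_cream = [3.0 * t + rng.gauss(0, 5) for t in temperature]
shark_attacks = [0.5 * t + rng.gauss(0, 1) for t in temperature]

# Correlation when the confounder is ignored.
raw_r = pearson(ice_cream, shark_attacks)

# Correlation after removing the part of each variable explained by temperature.
adjusted_r = pearson(residuals(ice_cream, temperature),
                     residuals(shark_attacks, temperature))

print(f"ignoring temperature:        r = {raw_r:.2f}")
print(f"controlling for temperature: r = {adjusted_r:.2f}")
```

In this made-up world, the raw correlation is strong, while the correlation left after accounting for temperature is close to zero.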

I have written a few articles on this; therefore, I am not going to delve deeper into this here. In case you are interested, read the following article:

Scenario #2

You controlled for a “collider”, which is the mirror image of a confounder: a confounder is a common cause of X and Y, whereas a collider is a common effect of X and Y. And guess what! Two variables that neither cause each other nor share a common cause now appear to be correlated. You induced bias by controlling for a collider, although your intention was to reduce bias by controlling for a confounder 🤦!
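The mechanism can be sketched with a small simulation. The setup is entirely hypothetical: x and y are drawn independently, the collider is assumed to be just their sum, and "controlling" is approximated by keeping only the observations with a high collider value.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 20000

# x and y are generated independently: no causal link, no common cause.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]

# Hypothetical collider: a common effect of x and y.
collider = [xi + yi for xi, yi in zip(x, y)]

# Correlation in the full sample.
r_all = pearson(x, y)

# "Controlling" for the collider by looking only at cases where it is high.
selected = [(xi, yi) for xi, yi, ci in zip(x, y, collider) if ci > 1.0]
r_selected = pearson([p[0] for p in selected], [p[1] for p in selected])

print(f"full sample:             r = {r_all:.2f}")
print(f"selected on the collider: r = {r_selected:.2f}")
```

In the full sample the correlation is near zero, but within the selected subsample a clearly negative correlation appears: among cases with a high sum, a low x has to be compensated by a high y.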

Here is an intuitive example to understand the issue (known as endogenous selection bias/collider bias):

II. Correlation is “not necessary” for causation

(i.e., two uncorrelated variables may have a causal relationship)

The first time I learned about it, I was like “Good heavens 😲”.

Perhaps this fact is even more counterintuitive than the earlier one. In fact, I have heard some experienced people say, “For one variable to cause another, there ‘must be’ a correlation between them.”

It turns out that the statement is not correct in non-experimental conditions!

Let’s look at three distinct scenarios where there can be a causal relationship between two variables although they are not correlated/statistically associated (as in a simple linear regression):

Scenario #1

You are driving uphill. The slope is steep. There comes a point when you press the gas pedal harder and harder, but the car's speed remains the same.

Let’s make up some arbitrary numbers and see the correlation between the force applied to the gas pedal and the speed of the car:

(Image by the author) Please pardon the physics behind it 🙏🏽

The correlation between force and speed is actually “undefined” here. Why? Because —

  1. Speed never changes, so the covariance between the two variables is 0
  2. Correlation between Force and Speed = (covariance between Force and Speed) / [(standard deviation of Force) × (standard deviation of Speed)]
  3. As the standard deviation of Speed = 0, the correlation formula has us dividing 0 by 0!
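The three steps above can be checked with a few lines of Python. The force and speed values below are made up for this sketch (they are an assumption, not necessarily the numbers in the figure); the only thing that matters is that speed is constant.

```python
from statistics import mean, pstdev

# Hypothetical numbers: force keeps increasing, speed stays flat.
force = [10, 20, 30, 40, 50]
speed = [40, 40, 40, 40, 40]

mean_force, mean_speed = mean(force), mean(speed)

# Step 1: covariance (population version, dividing by n).
covariance = sum((f - mean_force) * (s - mean_speed)
                 for f, s in zip(force, speed)) / len(force)

# Step 2: the denominator of the correlation formula.
sd_force = pstdev(force)
sd_speed = pstdev(speed)
denominator = sd_force * sd_speed

# Step 3: 0 divided by 0 has no value, so we report None instead of a number.
correlation = covariance / denominator if denominator != 0 else None

print("covariance :", covariance)    # 0.0
print("sd of speed:", sd_speed)      # 0.0
print("correlation:", correlation)   # None, i.e. undefined
```

Returning None here is just one way to represent "undefined"; numerical libraries typically return NaN in the same situation.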

Okay, so we have an “undefined correlation” between two variables; how can they be causally related? 🤔

Well, for identifying causation between two variables, we need to invoke “counterfactual thinking”.

Imagine yourself in two different worlds. Everything is exactly the same in the two worlds — the same car, the same steepness of the road, and the same you — except in one world you keep exerting more force on the gas pedal, and in the other, you keep the force constant.

Here are some more numbers to help us think counterfactually:

(Image by the author)

Once you invoke the counterfactual worlds, it becomes clear to you that:

Although in the real world, there was no correlation between the force you exerted on the gas pedal and the speed of the car — i.e., change in one thing, apparently, was not associated with the change in the other — the force did have a causal effect on the speed of the car!
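To make the two worlds concrete, here is a minimal sketch with invented numbers (they are assumptions for illustration, not the values from the figure). In the first world you keep pressing harder; in the second you hold the pedal steady on the same slope, and the car is assumed to slow down.

```python
# World A (what actually happened): force keeps increasing, speed holds at 40.
force_pressed = [10, 20, 30, 40, 50]
speed_pressed = [40, 40, 40, 40, 40]

# World B (the counterfactual): force stays constant, so the car slows down.
# These speeds are hypothetical.
force_held = [10, 10, 10, 10, 10]
speed_held = [40, 35, 30, 25, 20]

# Causal effect of the extra force at each moment:
# speed with the extra force minus speed without it.
effect_of_extra_force = [
    pressed - held for pressed, held in zip(speed_pressed, speed_held)
]

print(effect_of_extra_force)  # [0, 5, 10, 15, 20]
```

Within World A alone, speed never varies, so there is nothing for a correlation to pick up. The effect only becomes visible when the two worlds are compared.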

The other two scenarios are a bit more complicated!

Usually, in most social, behavioral, and health science research, we are interested in the “average causal effect (ACE)” of a treatment on an outcome, and we estimate that using a linear model. Linear models — in many circumstances — work as a useful approximation of a complicated real-world phenomenon.

Are there situations where the preference for a linear model and the reliance on ACE is not a good idea? 🤔

Scenario #2

(Image by the author) Apologies for this rather bizarre example 🙏🏽

Let’s pretend “I want someone to love me”.

More importantly, I want to know the causal effect of being loved by that someone on my suffering.

Love in this world can be measured using a scale that stretches from -5 to +5 in which -5 is a complete lack of love, 0 is a neutral point (a point of non-attached love i.e., both in love and not in love), and +5 is consummate love.

Suffering can be measured using a scale that stretches from 0 to 25 in which 0 means a complete lack of suffering (something like nirvana) and 25 means utmost suffering (something like the eternal hell).

This is how it works in this “hypothetical” world:

  1. The less they love me, the more I suffer (i.e., as love moves from 0 to -5, suffering increases from 0 to 25)
  2. The more they love me, the more I suffer (i.e., as love moves from 0 to +5, suffering increases from 0 to 25)
  3. I suffer the least (i.e., reach my Nirvana) when their love is at the neutral point (i.e., I am located at the point where love=0 and suffering =0)

Clearly, there is a causal effect of their love on my suffering. However, if the observed Love values are spread symmetrically around 0, the correlation coefficient is 0 (because the correlation coefficient measures only the strength of the linear relationship between two variables). Also, the coefficient of Love, estimated using a simple linear regression (ordinary least squares), is 0 (because a horizontal line is the best linear fit, implying slope = coefficient = 0).

If we use a simple linear model, based on the correlation coefficient/regression coefficient, we would conclude there is no relationship between their love and my suffering.

The (purported) true relationship is:

Suffering = (Love)²

But for the sake of using OLS, I estimated:

Suffering = b0 + b1*Love + u
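Here is a small sketch of what happens when the straight-line model is fitted to data generated by the squared relationship. It assumes Love is observed at every integer from -5 to +5 (my assumption for this sketch); the helper `ols_line` is a hand-written simple least-squares fit, not a library function.

```python
from statistics import mean

def ols_line(xs, ys):
    """Simple least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = mean(xs), mean(ys)
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

# Assumed observations: Love at every integer from -5 to +5.
love = list(range(-5, 6))
suffering = [x ** 2 for x in love]  # the (purported) true relationship

# Misspecified model: Suffering = b0 + b1 * Love + u
b0, b1 = ols_line(love, suffering)
print("linear fit   : intercept =", b0, " slope =", b1)

# Model that matches the true form: regress Suffering on Love squared.
love_squared = [x ** 2 for x in love]
c0, c1 = ols_line(love_squared, suffering)
print("quadratic fit: intercept =", c0, " slope =", c1)
```

With these symmetric values, the straight-line fit is flat (slope 0, intercept equal to the average suffering of 10), while the regression on Love squared recovers a slope of 1 and an intercept of 0.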

One may say, it is indeed the case that, on average, there is no relationship between the two variables. But, we may argue: why should we settle for the average and ignore the entire picture?

Scenario #3

The last scenario is quite intriguing as well!

(Image by the author)

Let’s pretend you are running an experiment. You randomly assign your study participants into two groups — a treatment group (which receives a treatment X) and a control group (which receives no treatment). You are interested in the causal effect of treatment X on an outcome Y.

The participant pool (N=8) has two types of people — green people (N=4) and red people (N=4). Because they are randomly assigned into two equally sized groups, each group — thanks to randomness — (somehow) has 2 green people and 2 red people.

(Image by the author)

This is how this hypothetical world works: treatment X has a positive effect (+2 units) on the red people; however, the same treatment has an equally negative effect (-2 units) on the green people.

As a consequence, the correlation between treatment X and outcome Y is 0. The average causal effect (ACE) — estimated by simple OLS — is also 0.

This toy example shows the danger of treatment effect heterogeneity: even though you ran an experiment and the treatment does affect the outcome, if you base your judgment only on the correlation coefficient and/or the coefficient of the simple OLS (i.e., you do not have a theory of the possible heterogeneity between groups), you will conclude that the treatment has no effect on the outcome!
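The arithmetic behind this toy example can be sketched as follows. The baseline outcome of 10 for untreated participants is an assumption I made up for this sketch; the +2 and -2 effects and the 2 + 2 split per group follow the description above. With a single binary treatment, the difference in group means is the same number that the simple OLS coefficient would give.

```python
from statistics import mean

# (color, treated, outcome); the untreated baseline of 10 is hypothetical.
participants = [
    ("red", 1, 12), ("red", 1, 12), ("green", 1, 8), ("green", 1, 8),
    ("red", 0, 10), ("red", 0, 10), ("green", 0, 10), ("green", 0, 10),
]

def difference_in_means(rows):
    """Mean outcome of the treated minus mean outcome of the controls."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

# Pooled estimate: what the simple OLS coefficient of Y on X would report.
average_effect = difference_in_means(participants)

# Estimates within each color group.
effect_by_color = {
    color: difference_in_means([p for p in participants if p[0] == color])
    for color in ("red", "green")
}

print("average effect :", average_effect)   # 0
print("effect by color:", effect_by_color)  # {'red': 2, 'green': -2}
```

The pooled estimate is exactly 0 because the two group-level effects cancel out; splitting by color is only possible if you suspected the heterogeneity in the first place.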

To conclude, the above examples are, perhaps, quite extreme; nevertheless, they serve a useful purpose by highlighting some worst-case possibilities. Rather than keeping our eyes closed and pretending everything will work out just fine, an awareness of these issues can be extremely helpful for a credible investigation of a real-world phenomenon and useful knowledge generation.

Footnotes:

  1. Can you look at the data first and then decide on whether a linear or non-linear model should be fitted? Unfortunately, if all you have is non-experimental data, going from data to model may not be a good idea. Think about the ice cream sales and shark attacks case. If you create a scatter plot, the two variables appear to be linearly related. That, just by itself, tells you either nothing or something completely wrong about how the world works. You need a theory first! For example, here is a theory: “when the temperature goes up, people buy more ice cream and flock to the beaches” (ice cream sales ← temperature → number of shark attacks on swimmers).
  2. A key difference between experimental and non-experimental data is that the former automatically generates counterfactual worlds (actually, I never realized it until I made an exodus to applied social sciences). In high school physics, we learned about Ohm’s law, which says that the current through a conductor between two points is directly proportional to the voltage across the two points. And we saw it first-hand by running experiments. We increased the voltage across two points, and the current increased proportionately, resulting in a linear relationship. But remember two key things: we (1) ran the experiment on the same conductor and (2) kept the temperature constant. This implies that we changed only one thing: the voltage across the two points. In non-experimental data, say survey data on income and years of education, we may have 10,000 observations, but crucially, these are observations of different people. These people differ not only in years of education and income but also in many other characteristics, many of which affect both education and income! This is very different from the Ohm’s law situation, in which we had repeated observations of the same conductor. I do not quite understand why so many people plot two variables from non-experimental data (say income and happiness), find a linear/quadratic relationship, and interpret the relationship as if it were as clear as Ohm’s law. 😔

Vivekananda Das

Sharing synthesized ideas on Causal Inference, Data Analysis in R, Stat Literacy, and Wellbeing | Ph.D. candidate @UW-Madison | More: https://vivekanandadas.com