How to Explore the Effect of Doing Something? (Part 2)
Applied Causal Inference 101: Non-Experimental Data
In the first part of this article, we discussed some of the core ideas of identifying the causal effect of doing something (such as receiving a treatment, implementing a policy, etc.). In case you missed it, here is the link to the first part:
How to Explore the Effect of Doing Something? (Part 1)
Causal Inference 101: The Experimental Ideal
In this second and final part of the article, we will discuss how to identify the causal effect of doing something using non-experimental data. Before we start, we will review the basics one more time:
#1. The causal effect of doing something (for example, receiving a treatment X) on an outcome (Y) is the difference in the value of the outcome (Y) when we do X and the value of the outcome (Y) when we do not do X, keeping all other things constant.
#2. To identify the causal effect of doing X, merely observing the outcome Y in the real world is not sufficient. We also have to know what the outcome Y would be in a counterfactual world where everything else stays exactly the same as in the real world, except that we do not do X. Ideally, we consider the difference in the outcome Y between the two worlds as the causal effect of doing X on the outcome Y.
#3. There is no reliable way to know the causal effect of doing something for any particular human being. However, a properly conducted randomized experiment helps us identify the causal effect of doing X on some outcome Y for the typical (average) participant from a target population.
#4. The typical (average) participant of the treatment group and the control group in a randomized experiment can be considered as counterfactuals of each other. In simpler words, randomized experiments help us make an apples-to-apples comparison. If randomization works as intended, we consider the difference between the average outcome of the treatment group and the average outcome of the control group as the estimate of the causal effect (a.k.a. average treatment effect).
#5. Three key requirements for making a valid causal inference are: 1) apples-to-apples comparison, 2) no interference, and 3) no two versions of the same treatment.
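Points #2 to #4 can be made concrete with a minimal Python simulation. It assumes hypothetical participants who each have two potential outcomes (with and without X). We can see both only because we generate them ourselves. A coin flip decides which outcome is observed, and the difference between the group averages then lands close to the true average effect. The function name `simulate_experiment` and all the numbers are illustrative assumptions, not part of the original article.

```python
import random
import statistics


def simulate_experiment(n=20_000, true_effect=5.0, seed=42):
    """Simulate a randomized experiment with hypothetical potential outcomes."""
    rng = random.Random(seed)
    treated_outcomes, control_outcomes = [], []
    individual_effects = []
    for _ in range(n):
        # Each hypothetical participant has two potential outcomes:
        # y0 (outcome if we do not do X) and y1 (outcome if we do X).
        y0 = rng.gauss(100.0, 15.0)
        y1 = y0 + true_effect + rng.gauss(0.0, 2.0)  # the effect varies a little across people
        individual_effects.append(y1 - y0)
        # Random (coin-flip) assignment: only ONE potential outcome is ever observed.
        if rng.random() < 0.5:
            treated_outcomes.append(y1)
        else:
            control_outcomes.append(y0)
    # The true average effect is knowable only because this is a simulation.
    ate = statistics.fmean(individual_effects)
    # What a real experimenter can compute: the difference in group averages.
    estimate = statistics.fmean(treated_outcomes) - statistics.fmean(control_outcomes)
    return ate, estimate


ate, estimate = simulate_experiment()
print(f"true average effect: {ate:.2f}, difference in means: {estimate:.2f}")
```

In real data, the list of individual effects is never available, which is exactly point #3. The simulation only shows why a properly randomized comparison of averages can stand in for it.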
Experiments are great, except that they are sometimes unethical, if not impossible, to conduct. For this reason, social, behavioral, and health scientists often try to make causal inferences using non-experimental (also known as observational) data.
The key issue with non-experimental data is that, because the treatment is not randomly assigned, some people are more likely than others to select into the treatment. As a result, in observational data the treatment and control groups are usually not apples-to-apples. If certain factors/characteristics affect both the treatment and the outcome, comparing the outcomes of the two groups will yield a biased estimate of the causal effect.
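The same problem can be written in the potential-outcomes notation commonly used in causal inference. The symbols are our own shorthand, not the article's: $D=1$ marks the treatment group, $Y^1$ and $Y^0$ are a person's outcomes with and without the treatment, and $E[\cdot]$ denotes an average.

$$
\underbrace{E[Y \mid D=1] - E[Y \mid D=0]}_{\text{naive comparison}}
= \underbrace{E[Y^1 - Y^0 \mid D=1]}_{\text{effect on the treated}}
+ \underbrace{E[Y^0 \mid D=1] - E[Y^0 \mid D=0]}_{\text{selection bias}}
$$

Random assignment makes the last term zero on average. With self-selection into the treatment, it generally is not.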
Using a hypothetical example, we will now contextualize this key problem and the fundamental concepts behind two methodological approaches to tackling it!
A grocery store, XYZ, offered a “shopper’s card” to all its customers in September. Some customers accepted it and others did not. The store’s marketing analyst is trying to understand whether accepting the shopper’s card led to higher spending by customers in October. As “lead to” is a causal phrase, the analyst is fundamentally asking: does accepting the shopper’s card cause customers to spend more at the grocery store? She defines the treatment as a binary variable: whether or not a customer accepts the shopper’s card. Her outcome of interest is the total amount of money that a customer spends at the grocery store in a month.
For the sake of simplicity, let’s pretend the analyst does some background research and finds that “income” is the only factor that differentiates the customers who accepted the shopper’s card from those who did not. In other words, customers with a higher annual household income were more likely to accept the shopper’s card than customers with a lower annual household income. Moreover, the analyst intuitively believes that higher-income people tend to spend more on groceries. Putting everything together, in this example, income is the only factor that affects both the treatment (i.e., whether or not a customer accepts the shopper’s card) and the outcome (i.e., the total amount of money that a customer spends at the grocery store in a month).
What if the analyst simply takes the difference between the average monthly spending of customers who accepted the shopper’s card and that of customers who did not? This naive estimate (i.e., the simple difference) of the causal effect can be broken down as follows:
Naive Estimate = Causal Effect (the green arrow) + Spurious Correlation (induced by the red arrows)
It should be evident that to recover the causal effect, we need to remove the spurious correlation. If we can successfully do that, we are more likely to get an unbiased estimate of the treatment effect.
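Here is a minimal sketch of this decomposition. It assumes a made-up grocery-store population in which income is the only confounder and the card raises monthly spending by exactly $20 for every customer. Because we invent the data, we can also compute what card holders would have spent without the card, which is something the analyst can never observe. `GROUPS`, `TRUE_EFFECT`, and `naive_decomposition` are hypothetical names.

```python
# Hypothetical toy population; every number here is made up for illustration.
# income group -> (customers who accepted, customers who did not, monthly spending without the card)
GROUPS = {
    50_000: (200, 800, 300.0),
    75_000: (500, 500, 400.0),
    100_000: (800, 200, 500.0),
}
TRUE_EFFECT = 20.0  # assumed: the card raises monthly spending by $20 for every customer


def naive_decomposition(groups, effect):
    """Return (naive estimate, spurious part) for the toy population."""
    n_treated = sum(t for t, _, _ in groups.values())
    n_control = sum(c for _, c, _ in groups.values())
    # Observed: card holders spend their baseline plus the effect.
    mean_treated = sum(t * (base + effect) for t, _, base in groups.values()) / n_treated
    # Observed: non-holders spend their baseline.
    mean_control = sum(c * base for _, c, base in groups.values()) / n_control
    # Counterfactual (never observed in real data): card holders without the card.
    mean_treated_without_card = sum(t * base for t, _, base in groups.values()) / n_treated
    naive = mean_treated - mean_control
    spurious = mean_treated_without_card - mean_control
    return naive, spurious


naive, spurious = naive_decomposition(GROUPS, TRUE_EFFECT)
print(f"naive estimate: {naive:.2f}")            # naive estimate: 100.00
print(f"spurious part: {spurious:.2f}")          # spurious part: 80.00
print(f"causal effect: {naive - spurious:.2f}")  # causal effect: 20.00
```

Most of the naive $100 gap comes from higher-income customers both spending more and accepting the card more often, not from the card itself.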
Conditional Apples-to-Apples Comparison 🍎v🍎
We can remove the spurious correlation by employing a method that does “conditional apples-to-apples comparison”.
For simplicity, let’s further pretend that there are only three income groups among the customers: customers with an annual household income of 1) $50,000, 2) $75,000, and 3) $100,000. If income is the only factor that differentiates our treatment and control groups, comparing customers within the same income group will be a kind of “apples-to-apples” comparison. Why? Because, as the analyst found in her background research, other than income, there is, on average, no other factor/characteristic that differs between the two groups.
Next, within each of the three income groups, we compare the average monthly spending of the treatment and control groups. Doing so gives us three different numbers, right? Finally, we take a weighted average of these three numbers to get our estimate of the average causal effect! Figure 2 shows the procedure using a toy example and also illustrates the “bias” in the naive estimate.
**Here, weight = proportion of all customers who fall in each income group**
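The within-group comparison and weighting can be sketched in a few lines of Python. The table below is hypothetical and does not reproduce the numbers in Figure 2. The effect is allowed to differ across income groups so that the weights actually matter. `TABLE`, `stratified_estimate`, and `naive_estimate` are illustrative names.

```python
# Hypothetical summary table (made-up numbers, not the ones in Figure 2):
# income -> (n treated, n control, mean spending of treated, mean spending of control)
TABLE = {
    50_000: (300, 1200, 310.0, 300.0),
    75_000: (500, 500, 420.0, 400.0),
    100_000: (400, 100, 540.0, 500.0),
}


def stratified_estimate(table):
    """Within-group differences, weighted by each group's share of all customers."""
    total = sum(n_t + n_c for n_t, n_c, _, _ in table.values())
    estimate = 0.0
    for n_t, n_c, mean_t, mean_c in table.values():
        weight = (n_t + n_c) / total          # proportion of all customers in this group
        estimate += weight * (mean_t - mean_c)  # apples-to-apples difference within the group
    return estimate


def naive_estimate(table):
    """Simple difference in average spending, ignoring income."""
    n_t_total = sum(n_t for n_t, _, _, _ in table.values())
    n_c_total = sum(n_c for _, n_c, _, _ in table.values())
    mean_treated = sum(n_t * m_t for n_t, _, m_t, _ in table.values()) / n_t_total
    mean_control = sum(n_c * m_c for _, n_c, _, m_c in table.values()) / n_c_total
    return mean_treated - mean_control


print(f"naive: {naive_estimate(TABLE):.2f}")            # naive: 93.61
print(f"stratified: {stratified_estimate(TABLE):.2f}")  # stratified: 18.33
```

Weighting by each group's share of all customers, as described above, targets the average effect for a typical customer. Weighting instead by each group's share of the customers who accepted the card would target the average effect on those who accepted it.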
Comparing Groups with Apples-to-Apples Trends Over Time 🍎v🍎📈
For the conditional apples-to-apples approach to work, we need to know which variables to condition on (i.e., which variables define the groups within which we compare the average outcomes of the treatment and control groups). In our example, we assumed that income is the only factor that affects both the treatment and the outcome; therefore, conditioning on income alone was enough to get an unbiased estimate of the average causal effect.
What if reality is far more complicated than the scenario we have considered so far? For instance, besides income, a customer’s materialism, impulsivity, and a whole host of other factors may affect both the treatment and the outcome. What if we do not have data on all these variables and cannot condition on all of them? In other words, how should we proceed when we know there is no way to turn the treatment and control groups into apples-to-apples?
In such a scenario, we gather data from the treatment group and a control group over an extended period before the treatment group received the treatment. We also gather data from both groups for at least one post-treatment period.
Most importantly, we have to make a key assumption, commonly called the parallel trends assumption: “had the treatment group not been treated, its average outcome in the post-treatment period would have evolved in parallel to the average outcome of the control group.”
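In potential-outcomes shorthand, the assumption can be written formally. The notation is ours, not the article's: $D=1$ is the treatment group, $Y^0$ is the outcome without the treatment, and "pre"/"post" label the two periods. The second line additionally assumes that the treatment group's pre-treatment outcomes are untouched by the treatment, so that for them $Y_{\text{pre}} = Y^0_{\text{pre}}$:

$$
\begin{aligned}
&E\left[Y^0_{\text{post}} - Y^0_{\text{pre}} \mid D=1\right] = E\left[Y^0_{\text{post}} - Y^0_{\text{pre}} \mid D=0\right] \\
&\Longrightarrow\; E\left[Y^0_{\text{post}} \mid D=1\right] = E\left[Y_{\text{pre}} \mid D=1\right] + \left(E\left[Y_{\text{post}} \mid D=0\right] - E\left[Y_{\text{pre}} \mid D=0\right]\right)
\end{aligned}
$$

In words: the treatment group's unobservable post-treatment average without the treatment equals its own pre-treatment average plus the change observed in the control group.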
In the case of our example, the analyst gathers monthly spending data for both the shoppers who accepted the shopper’s card (treatment group) and those who did not (control group) before and after the grocery store offered the card.
Let’s break down the story in Figure 3 step by step:
- The grocery store offered the shopper’s card in September. The analyst gathered average monthly spending data for both groups from May to October (i.e., before and after September).
- Before the treatment was assigned (i.e., before the shopper’s card was offered), the average monthly spending of the two groups was evolving in parallel. Please note that the average outcomes of the two groups were not exactly the same. This is expected because the two groups are not apples-to-apples: people who were spending more anyway were more likely to accept the shopper’s card.
- Most importantly for us, the average spending of the two groups moved in parallel over the months of the pre-treatment period; this is the key to causal inference here! Given these parallel pre-treatment trends, we can make a “fair” argument that in a counterfactual world where everything else stays the same except that the grocery store does not offer the shopper’s card, the average outcome of the treatment group would have evolved in parallel to the average outcome of the control group (this is shown by the dotted green line).
- If the above argument is sound, the distance between the solid green line and the dotted green line in the post-treatment period is an estimate of the causal effect of the treatment on those who received it.
- Mathematically, we want this quantity,
Average Causal Effect on the Treated = [Average outcome in the post-treatment period for the treatment group − Average outcome in the pre-treatment period for the treatment group]−[Average outcome in the post-treatment period for the control group − Average outcome in the pre-treatment period for the control group]
**This approach is called the difference-in-differences method**
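The difference-in-differences formula can be checked with a few lines of Python. This sketch assumes made-up monthly averages (not the values plotted in Figure 3), a pre-treatment window of May to August, and October as the post-treatment month. September, when the card was offered, is left out of both windows; that choice is our own for this sketch. `monthly_changes` gives a rough look at the pre-treatment trends, and `diff_in_diff` applies the formula; both names are hypothetical.

```python
from statistics import fmean

# Hypothetical average monthly spending in dollars (made-up numbers, not Figure 3's).
TREATED = {"May": 380.0, "Jun": 385.0, "Jul": 390.0, "Aug": 395.0, "Sep": 410.0, "Oct": 430.0}
CONTROL = {"May": 300.0, "Jun": 305.0, "Jul": 310.0, "Aug": 315.0, "Sep": 320.0, "Oct": 325.0}
PRE = ["May", "Jun", "Jul", "Aug"]  # assumed pre-treatment window
POST = ["Oct"]                      # outcome month from the example


def monthly_changes(series, months):
    """Month-over-month changes, used to eyeball the parallel-trends assumption."""
    return [series[later] - series[earlier] for earlier, later in zip(months, months[1:])]


def diff_in_diff(treated, control, pre, post):
    """[post - pre] for the treated minus [post - pre] for the control group."""
    treated_change = fmean([treated[m] for m in post]) - fmean([treated[m] for m in pre])
    control_change = fmean([control[m] for m in post]) - fmean([control[m] for m in pre])
    return treated_change - control_change


print(monthly_changes(TREATED, PRE), monthly_changes(CONTROL, PRE))  # [5.0, 5.0, 5.0] [5.0, 5.0, 5.0]
print(diff_in_diff(TREATED, CONTROL, PRE, POST))                     # 25.0
```

Matching month-to-month changes before September are consistent with parallel trends. However, they cannot prove that the trends would have stayed parallel afterwards. That part remains an assumption, which is why the argument is only a “fair” one.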
To conclude, when working with non-experimental data, we can try to make a valid causal inference either by comparing conditional apples-to-apples groups at a specific point in time or by comparing not-apples-to-apples groups (with apples-to-apples trends) over time. In either case, the two other key assumptions of causal inference (no interference and no two versions of the same treatment) must hold.
Thanks for following along! If you are interested in reading more beginner-level articles on the fundamental ideas in causal inference, check out some of my previous posts!
Confounding Variable and Spurious Correlation: Key Challenge in making Causal Inference
Endogenous Selection Bias: Another Key Issue in Causal Inference
Caution: Try not to condition on a collider!
Regression and Causal Inference: How Causal Graphs Helped Me Overcome 3 Key Misconceptions
Regression and Causal Inference: Which Variables Should Be Added to The Model?
Struggle and (Potential) Remedy
*To draw the figures, I used vector images from Publicdomainvectors.org, which offers copyright-free vector images in popular .eps, .svg, .ai and .cdr formats.