How to Explore the Effect of Doing Something? (Part 2)

Applied Causal Inference 101: Non-experimental data

7 min read · Mar 23, 2022

Photo by JJ Jordan on Unsplash

In the first part of this article, I explained the core concepts of identifying the causal effect of an action (such as receiving treatment or implementing a policy). If you missed it, here is the link to part one:

In this second and final part, I discuss how to identify the causal effect of an action using non-experimental data. Before diving in, let’s briefly review the basics once more:

The causal effect of doing something (e.g., receiving treatment X) on an outcome (Y) is the difference in Y when we do X versus when we do not, keeping all else constant.

Observing Y in the real world alone is insufficient to identify the causal effect. We must also consider the counterfactual outcome — what Y would be if we did not do X — while keeping everything else identical. The difference between these two outcomes defines the causal effect.

There is no reliable way to know the causal effect of an action on any individual. However, a well-conducted randomized experiment can identify the average causal effect of X on Y for the average participant in the target population.

In randomized experiments, the average participant in the treatment group and the control group serve as counterfactuals of each other, enabling an apples-to-apples comparison. The difference in average outcomes estimates the Average Causal Effect.
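To make this concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the baseline outcome distribution, the true effect of 20, and the function name `simulate_randomized_experiment` are all made up. The point is only that when a coin flip decides who gets treated, the simple difference in group averages lands close to the true average effect.

```python
import random
import statistics

def simulate_randomized_experiment(n=10_000, true_effect=20.0, seed=0):
    """Return the difference in mean outcomes between randomly assigned groups."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # Hypothetical outcome this participant would have without treatment.
        baseline = rng.gauss(200.0, 30.0)
        # Coin-flip assignment: independent of anything about the participant.
        if rng.random() < 0.5:
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    return statistics.mean(treated) - statistics.mean(control)

estimate = simulate_randomized_experiment()
print(f"estimated average causal effect: {estimate:.2f}")
```

Because assignment is random, the two groups have (on average) the same baseline, so the printed estimate should sit near the assumed true effect of 20, up to sampling noise.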

Valid causal inference requires three key assumptions: (1) apples-to-apples comparison, (2) no interference between participants, and (3) no multiple versions of the same treatment.

Experiments are ideal for causal inference, but they can sometimes be unethical or impossible to conduct. Therefore, social scientists often rely on non-experimental data to make causal inferences.

The key issue with using non-experimental data is that the treatment is not randomly assigned, so some people are more likely than others to select into the treatment. As a result, the treatment and control groups are usually not apples-to-apples. If certain factors affect both the treatment and the outcome, comparing the outcomes of the two groups will yield a biased estimate of the causal effect.

Using a hypothetical example, I now discuss this key challenge and explain the fundamental concepts behind two methodological approaches designed to address it.

Problem Context

Grocery store XYZ offered all its customers a “shopper’s card” in September. Some customers accepted it, while others did not. The marketing analyst wants to understand whether accepting the shopper’s card led customers to spend more in October.

Since “led to” implies causation, the analyst is essentially asking: Does accepting the shopper’s card cause an increase in customer spending at the grocery store?

She defines the treatment as a binary variable indicating whether a customer accepts the shopper’s card. The outcome of interest is the total amount a customer spends at the grocery store in a month.

For simplicity, suppose the analyst’s background research reveals that income is the only factor differentiating customers who accepted the shopper’s card from those who did not. In other words, customers with higher annual household incomes are more likely to accept the card than those with lower incomes. Additionally, the analyst reasonably believes that higher-income customers tend to spend more on groceries. Putting it all together, in this example, income is the sole factor influencing both the treatment (acceptance of the shopper’s card) and the outcome (monthly grocery spending).

Figure 1: Confounding Variable

What if the analyst simply compares the average monthly spending of customers who accepted the shopper’s card to those who did not? This naive estimate — the simple difference — can be decomposed as follows:

Naive Estimate = Causal Effect (represented by the green arrow) + Spurious Correlation (caused by the red arrows)

It should be clear that to recover the true causal effect, we must eliminate the spurious correlation. If we remove it completely, and income really is the only confounder, what remains is an unbiased estimate of the treatment effect.
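The decomposition of the naive estimate can be sketched with a small simulation. All numbers here are hypothetical assumptions, not values from the figures: a true effect of $20, three income groups, and made-up acceptance probabilities and baseline spending levels that both rise with income.

```python
import random
import statistics

TRUE_EFFECT = 20.0  # assumed causal effect of the card on monthly spending ($)

# Hypothetical values per income group:
# (probability of accepting the card, baseline monthly spending in $)
INCOME_GROUPS = {50_000: (0.2, 200.0), 75_000: (0.5, 300.0), 100_000: (0.8, 400.0)}

def naive_estimate(n=30_000, seed=1):
    """Simple difference in average spending: card holders minus non-holders."""
    rng = random.Random(seed)
    with_card, without_card = [], []
    for _ in range(n):
        income = rng.choice(list(INCOME_GROUPS))
        p_accept, baseline = INCOME_GROUPS[income]
        accepted = rng.random() < p_accept           # income drives the treatment
        spending = baseline + TRUE_EFFECT * accepted  # income drives the outcome too
        spending += rng.gauss(0.0, 20.0)              # unrelated noise
        (with_card if accepted else without_card).append(spending)
    return statistics.mean(with_card) - statistics.mean(without_card)

naive = naive_estimate()
spurious = naive - TRUE_EFFECT
print(f"naive: {naive:.1f} = causal effect {TRUE_EFFECT:.1f} + spurious {spurious:.1f}")
```

Under these assumed numbers, the naive estimate comes out several times larger than the true effect, because card holders are disproportionately high-income customers who would have spent more anyway.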

Conditional Apples-to-Apples Comparison 🍎v🍎

We can remove the spurious correlation by employing a conditional apples-to-apples comparison method.

For simplicity, let’s further pretend that only three income groups exist among the customers. These groups are customers with an annual household income of 1) $50,000, 2) $75,000, and 3) $100,000.

If income is the only factor differentiating our treatment and control groups, comparing customers within the same income group will be an “apples-to-apples” comparison. Why? Because, as the analyst learned from her background research, no factor other than income differs, on average, between the two groups.

Next, we compare the monthly spending between the treatment and the control groups within the three income groups. By doing so, we get three different numbers. Finally, we take a weighted average of these three numbers to get our estimate of the average causal effect. Figure 2 shows the procedure using a toy example. Also, it illustrates the “bias” in the naive estimate.

**Weight=proportion of the total customers within each income group**

Figure 2: Calculation of the Naive Estimate and the Average Causal Effect. A naive estimate means the simple difference between the average spending by the treatment group (customers with shopper’s card) and the average spending by the control group (customers without the shopper’s card). For calculating the Average Causal Effect, first, we calculate the difference in average monthly spending between customers with and without shopper’s cards within each of the three income groups. Then, we take a weighted average of these three numbers to estimate the Average Causal Effect. Note that, in this toy example, the weight of each of the three income groups is 6/18 because we have 18 customers, of whom 6 are from $50k, 6 are from $75k, and 6 are from $100k income groups.
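The within-group comparison and weighted average can be sketched in a few lines of Python. The 18 customers below are hypothetical (they are not the numbers shown in Figure 2); I only kept the same shape: three income groups of 6 customers each, so each group gets a weight of 6/18. The function names are my own.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (annual income, has shopper's card, monthly spending in $)
customers = (
    [(50_000, True, 220.0)] * 2 + [(50_000, False, 200.0)] * 4
    + [(75_000, True, 320.0)] * 3 + [(75_000, False, 300.0)] * 3
    + [(100_000, True, 420.0)] * 4 + [(100_000, False, 400.0)] * 2
)

def naive_difference(rows):
    """Average spending with the card minus average spending without it."""
    treated = [spend for _, has_card, spend in rows if has_card]
    control = [spend for _, has_card, spend in rows if not has_card]
    return mean(treated) - mean(control)

def stratified_effect(rows):
    """Weighted average of within-income-group differences."""
    by_income = defaultdict(list)
    for row in rows:
        by_income[row[0]].append(row)
    effect = 0.0
    for group in by_income.values():
        weight = len(group) / len(rows)  # share of all customers in this group
        effect += weight * naive_difference(group)
    return effect

naive = naive_difference(customers)
adjusted = stratified_effect(customers)
print(f"naive: {naive:.2f}, conditional apples-to-apples: {adjusted:.2f}")
# prints "naive: 64.44, conditional apples-to-apples: 20.00"
```

In this made-up data, the card adds $20 within every income group, so the weighted average recovers 20, while the naive difference is inflated because more card holders come from the higher-income groups.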

Comparing Groups with Apples-to-Apples Trends Over Time 🍎v🍎📈

For the conditional apples-to-apples approach to work, we need to know which variables to condition on (i.e., do a within-group comparison of the average outcome of the treatment and the control groups). In our example, we assumed that income is the only factor that affects both the treatment and the outcome. Therefore, conditioning on only income was enough to get an unbiased estimate of the average causal effect.

What if the reality is far more complicated than we have considered? For instance, maybe other than income, a customer’s materialism, impulsivity, and other factors affect the treatment and the outcome. What if we do not have data on all these variables and cannot condition on them? In other words, how should we proceed when we know there is no way to turn the treatment and the control group into apples-to-apples?

In such a scenario, we gather data from a treatment and a control group over an extended period of time before the treatment group received the treatment. Also, we gather data from both groups for at least one post-treatment time period.

Most importantly, we have to make a critical assumption that “had the treatment group not been treated, in the post-treatment period, their average outcome would have evolved in parallel to the average outcome of the control group.”

In our example, the analyst gathers monthly spending data for the shoppers who accepted the shopper’s card (treatment group) and those who did not (control group) before and after the grocery store offered the card.

Figure 3: Comparing the difference in average outcomes before and after the treatment was assigned for the not-apples-to-apples treatment and control groups (with apples-to-apples trends)

Let’s try to break apart the story in Figure 3 step-by-step:

  1. The grocery store offered the shopper’s card in September. The analyst gathered average monthly spending data for both groups from May to October (i.e., before and after September).
  2. Before the treatment was assigned (i.e., the shopper’s card was offered), the average monthly spending of the two groups was evolving in parallel. Please note that the average outcomes in the two groups were not exactly the same. This is expected because these two groups are not-apples-to-apples. People spending more anyway were more likely to accept the shopper’s card.
  3. Most importantly, the average spending over the months in the pre-treatment period was moving in parallel — this is the key to causal inference! We can make a “fair” argument that given the existence of the parallel trends in the pre-treatment period, in a counterfactual world where everything else stays the same except the grocery store does not offer the shopper’s card, the average outcome of the treatment group would have evolved in parallel to the average outcome of the control group (the dotted green line shows this).
  4. Given that the above argument is sound, the distance between the solid green line and the dotted green line estimates the treatment’s causal effect on those who received the treatment.
  5. Mathematically, we estimate:

Average Causal Effect on the Treated = [Average outcome in the post-treatment period for the treatment group − Average outcome in the pre-treatment period for the treatment group] − [Average outcome in the post-treatment period for the control group − Average outcome in the pre-treatment period for the control group]

**This approach is called a difference-in-differences method**
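Here is a minimal sketch of that calculation. The monthly averages are hypothetical (they are not read off Figure 3), and I am assuming May to August is the pre-treatment period and October is the post-treatment period, leaving out September because that is when the card was offered.

```python
from statistics import mean

# Hypothetical average monthly spending ($) for each group.
treatment_group = {"May": 260.0, "Jun": 265.0, "Jul": 270.0, "Aug": 275.0, "Oct": 305.0}
control_group = {"May": 200.0, "Jun": 205.0, "Jul": 210.0, "Aug": 215.0, "Oct": 225.0}

PRE_MONTHS = ["May", "Jun", "Jul", "Aug"]  # assumed pre-treatment period
POST_MONTHS = ["Oct"]                      # assumed post-treatment period

def change_over_time(series):
    """Average post-treatment outcome minus average pre-treatment outcome."""
    return mean(series[m] for m in POST_MONTHS) - mean(series[m] for m in PRE_MONTHS)

def difference_in_differences(treated, control):
    """Change in the treatment group minus change in the control group."""
    return change_over_time(treated) - change_over_time(control)

# A constant gap before treatment is what parallel pre-treatment trends look like.
pre_gaps = [treatment_group[m] - control_group[m] for m in PRE_MONTHS]
did = difference_in_differences(treatment_group, control_group)
print(f"pre-treatment gaps: {pre_gaps}")
print(f"difference-in-differences estimate: {did:.1f}")
# prints "pre-treatment gaps: [60.0, 60.0, 60.0, 60.0]"
# prints "difference-in-differences estimate: 20.0"
```

In this toy data the two groups differ by a steady $60 before the card is offered, so the groups are not apples-to-apples, but their trends are. The estimate of 20 is only as credible as the assumption that the treatment group would have kept moving in parallel with the control group.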

To conclude, when working with non-experimental data, we can make a valid causal inference by comparing either conditional apples-to-apples groups at a specific point in time or not-apples-to-apples groups (with apples-to-apples trends) over time. In either case, the two other key assumptions of causal inference, no interference and no multiple versions of the same treatment, must hold.

There are many other methods for causal inference using non-experimental data, but we will leave that discussion for another time.

Thanks for following along!

For drawing the figures, I used vector images from Publicdomainvectors.org, which offers copyright-free vector images in popular .eps, .svg, .ai and .cdr formats.

Vivekananda Das

Written by Vivekananda Das

Sharing synthesized ideas on data and behavior | Researcher | Educator | Connect with me: https://www.linkedin.com/in/vivekananda-das-421922385/
