Interpreting the Coefficients of a Regression with an Interaction Term: A Detailed Explanation

Vivekananda Das
May 23, 2020

Adding an interaction term to a linear model (estimated using regression) becomes necessary when the statistical association between a predictor and an outcome depends on the value or level of another predictor.

Although adding an interaction term to a model can make it a better fit with the data, it simultaneously complicates the interpretation of the coefficients of the predictors.

In this article, we explore how to interpret the coefficients of the predictors of a linear model that includes a two-way interaction term (between a continuous predictor and a binary predictor). We want to understand how the interpretation of coefficients differs between a model with an interaction term and a model without an interaction term. We use the statistical software R for estimating the models and visualizing the outcomes.

A Hypothetical Example

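Suppose a graduate admissions committee wants to explore how a student’s Bachelor’s GPA and GRE score relate to their Master’s GPA. The outcome is Master’s GPA (mgpa). The predictors are Bachelor’s GPA (bgpa), a continuous variable, and gre, a binary variable equal to 1 for students with a GRE score above 310 and 0 for students with a score of 310 or below. (Note: the dataset in this example is imaginary and used only for illustrative purposes.)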

Model 1: Without interaction between bgpa and gre

First, we estimate the following model:

mgpa = b0 + b1*bgpa + b2*gre + error

R Output

In this case, we interpret the coefficient of the continuous bgpa variable as:

“Keeping the level of gre constant, a one unit increase in bgpa is, on average, associated with 0.883 units increase in mgpa.”

Now, as gre is a binary variable (with gre=0 set as the base case), we interpret its coefficient a bit differently:

“Keeping the value of bgpa constant, the average value of mgpa is 0.35 units higher for the group with gre = 1 than the group with gre = 0.”
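The article does not show how gre was coded. Its later real-world reading implies that gre = 1 marks a GRE score above 310 and gre = 0 (the base case) a score of 310 or below. The sketch below assumes exactly that rule; the function and constant names are hypothetical.

```python
# Assumed coding rule (inferred from the article's real-world interpretation):
# gre = 1 for a GRE score above 310, gre = 0 for 310 or below (the base case).
GRE_CUTOFF = 310


def gre_indicator(gre_score: float) -> int:
    """Return the 0/1 dummy used as the binary predictor gre."""
    return 1 if gre_score > GRE_CUTOFF else 0


print([gre_indicator(s) for s in (300, 310, 311, 325)])  # [0, 0, 1, 1]
```

Because gre = 0 is the base case, the coefficient of gre compares the gre = 1 group against the gre = 0 group.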

For a clearer understanding, based on the R output, we can express the estimated model mathematically as:

mgpa = 0.258 + 0.883*bgpa + 0.35*gre   (1)

Suppose we want to compare the predicted mgpa of two students whose bgpa differs by 0.1 units (bgpa = 3.3 and bgpa = 3.4, respectively) but who have the same level of gre (gre = 1):

For the student with bgpa = 3.3 and gre = 1,

mgpa = 0.258+0.883*3.3+0.35*1 = 3.522

For the student with bgpa = 3.4 and gre = 1,

mgpa = 0.258+0.883*3.4+0.35*1 = 3.610

So, keeping the level of gre constant (gre = 1 in both cases), a 0.1-unit increase in bgpa (comparing bgpa = 3.4 with bgpa = 3.3) is, on average, associated with an increase in mgpa of 3.610 - 3.522 = 0.088 units. Scaled up, a one-unit increase in bgpa is, on average, associated with an increase of 0.88 units in mgpa, which matches the coefficient of bgpa in the R output (0.883) apart from a small discrepancy due to rounding.
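The arithmetic above can be reproduced in a few lines. This is a minimal Python sketch (not the article's R code) that hard-codes the Model 1 coefficients reported in the R output; the function name is hypothetical.

```python
# Coefficients of Model 1, as reported in the article's R output
B0, B1, B2 = 0.258, 0.883, 0.35


def predict_m1(bgpa: float, gre: int) -> float:
    """Predicted mgpa from equation (1)."""
    return B0 + B1 * bgpa + B2 * gre


low = predict_m1(3.3, gre=1)
high = predict_m1(3.4, gre=1)
diff = high - low  # change in predicted mgpa for a 0.1-unit increase in bgpa

print(round(low, 3), round(high, 3), round(diff, 3))  # 3.522 3.61 0.088
print(round(diff / 0.1, 3))  # 0.883, i.e. scaled to a one-unit increase
```

Working with the unrounded predictions recovers the coefficient 0.883 exactly (up to floating-point error); the 0.88 in the text comes from rounding the two predictions to three decimals before subtracting.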

Additionally, from equation 1, we observe that,

If gre = 0,

mgpa = 0.258 + 0.883*bgpa + 0.35*0 = 0.258 + 0.883*bgpa

And, if gre = 1,

mgpa = 0.258 + 0.883*bgpa + 0.35*1 = 0.608 + 0.883*bgpa
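The same reduction can be written as a small helper that returns the intercept and slope of the mgpa-versus-bgpa line at each level of gre. A minimal sketch using the reported Model 1 coefficients; the helper name is hypothetical.

```python
B0, B1, B2 = 0.258, 0.883, 0.35  # Model 1 coefficients from the R output


def line_for_level_m1(gre: int) -> tuple:
    """(intercept, slope) of the mgpa-vs-bgpa line at a given gre level."""
    intercept = B0 + B2 * gre  # gre only shifts the intercept
    slope = B1                 # without an interaction, the slope never changes
    return intercept, slope


for level in (0, 1):
    intercept, slope = line_for_level_m1(level)
    print(f"gre={level}: mgpa = {intercept:.3f} + {slope:.3f}*bgpa")
# gre=0: mgpa = 0.258 + 0.883*bgpa
# gre=1: mgpa = 0.608 + 0.883*bgpa
```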

So, the two levels of gre give two straight lines with the same slope (parallel lines): regardless of the level of gre, a one-unit increase in bgpa is, on average, associated with an increase of 0.883 units in mgpa. In other words, this model assumes that the association between bgpa and mgpa does not depend on the level of gre.

We visualize the two straight lines in the following graph:

Visualizing the changes in the outcome (mgpa) with changes in the continuous predictor (bgpa) at both levels of the binary predictor (gre) in the model without interaction

Similarly, we could better understand the coefficient of the gre variable by putting values into equation 1 (e.g., by keeping bgpa constant at 3.5 and considering the two levels, 0 and 1, of gre).
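As a quick check of that suggestion, this minimal Python sketch evaluates equation (1) for both levels of gre at bgpa = 3.5 and at two other illustrative bgpa values (chosen arbitrarily, not taken from the article's data). The gap is always the gre coefficient, 0.35.

```python
B0, B1, B2 = 0.258, 0.883, 0.35  # Model 1 coefficients from the R output


def predict_m1(bgpa: float, gre: int) -> float:
    """Predicted mgpa from equation (1)."""
    return B0 + B1 * bgpa + B2 * gre


for bgpa in (3.0, 3.5, 4.0):
    gap = predict_m1(bgpa, 1) - predict_m1(bgpa, 0)
    print(f"bgpa={bgpa}: gre=0 -> {predict_m1(bgpa, 0):.4f}, "
          f"gre=1 -> {predict_m1(bgpa, 1):.4f}, gap = {gap:.3f}")
# bgpa=3.0: gre=0 -> 2.9070, gre=1 -> 3.2570, gap = 0.350
# bgpa=3.5: gre=0 -> 3.3485, gre=1 -> 3.6985, gap = 0.350
# bgpa=4.0: gre=0 -> 3.7900, gre=1 -> 4.1400, gap = 0.350
```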

Note that to interpret the coefficients of the predictors of a model without any interaction term, we do not need any of these extra calculations as the R output already gives us the same information.

However, if we have an interaction term in the model, we need additional steps beyond the R output to interpret the model coefficients.

Model 2: With interaction between bgpa and gre

Now, we estimate the following model, which incorporates the interaction between bgpa and gre:

mgpa = b0 + b1*bgpa + b2*gre + b3*bgpa*gre + error
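In the design matrix, the interaction term is simply one more column whose value is bgpa multiplied by gre. In R, such a model would typically be specified as `lm(mgpa ~ bgpa * gre)`, which R expands to `bgpa + gre + bgpa:gre`. The plain-Python sketch below is not the article's code: it fits ordinary least squares through the normal equations, and the data are hypothetical, generated without noise from the Model 2 coefficients the article reports below, so the fit should recover those coefficients up to floating-point error.

```python
def design_row(bgpa, gre, interaction=True):
    """One row of the design matrix: intercept, bgpa, gre, and optionally bgpa*gre."""
    row = [1.0, float(bgpa), float(gre)]
    if interaction:
        row.append(bgpa * gre)  # the interaction term is just the product column
    return row


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical, noiseless data built from the reported Model 2 coefficients
TRUE = (0.940, 0.688, -1.477, 0.534)
students = [(b, g) for b in (2.8, 3.1, 3.4, 3.7, 4.0) for g in (0, 1)]
y = [TRUE[0] + TRUE[1] * b + TRUE[2] * g + TRUE[3] * b * g for b, g in students]

coef = fit_ols([design_row(b, g) for b, g in students], y)
print([round(c, 3) for c in coef])  # [0.94, 0.688, -1.477, 0.534]
```

With real (noisy) data the estimates would of course not match any generating values exactly; the noiseless setup is only there to make the role of the product column easy to verify.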

R Output

First, the interaction term is statistically significant at the 5% significance level (its p-value is below 0.05), which supports including the term in the model.

Second, we can no longer interpret the model coefficients by only looking at the R output.

Consequently, based on the R output, we write the estimated model mathematically as:

mgpa = 0.940 + 0.688*bgpa - 1.477*gre + 0.534*bgpa*gre   (2)

Now, if gre = 0, equation 2 reduces to:

mgpa = 0.940 + 0.688*bgpa - 1.477*0 + 0.534*bgpa*0
     = 0.940 + 0.688*bgpa

And if gre =1, equation 2 reduces to:

mgpa = 0.940 + 0.688*bgpa - 1.477*1 + 0.534*bgpa*1
     = (0.940 - 1.477) + (0.688 + 0.534)*bgpa
     = -0.537 + 1.222*bgpa
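The two reductions follow a general pattern: at a given level of gre, the intercept is b0 + b2*gre and the slope is b1 + b3*gre. A minimal Python sketch using the reported Model 2 coefficients; the helper name is hypothetical.

```python
B0, B1, B2, B3 = 0.940, 0.688, -1.477, 0.534  # Model 2 coefficients from the R output


def line_for_level_m2(gre: int) -> tuple:
    """(intercept, slope) of the mgpa-vs-bgpa line at a given gre level."""
    intercept = B0 + B2 * gre  # gre shifts the intercept...
    slope = B1 + B3 * gre      # ...and, through the interaction, the slope
    return intercept, slope


for level in (0, 1):
    intercept, slope = line_for_level_m2(level)
    print(f"gre={level}: mgpa = {intercept:.3f} + {slope:.3f}*bgpa")
# gre=0: mgpa = 0.940 + 0.688*bgpa
# gre=1: mgpa = -0.537 + 1.222*bgpa
```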

For the model in equation (2), the two levels of gre give two straight lines with different slopes (0.688 and 1.222), so the lines are not parallel. This model allows the positive association between bgpa and mgpa to depend on the level of gre. Therefore, we interpret the model coefficients as follows:

Provided that gre=0, a one unit increase in bgpa is, on average, associated with 0.688 units increase in mgpa.

Provided that gre=1, a one unit increase in bgpa is, on average, associated with 1.222 units increase in mgpa.

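The coefficient of the interaction term (bgpa:gre1 in the R output) equals the difference in slope between the two lines (1.222 - 0.688 = 0.534). Likewise, the coefficient of gre (-1.477) is now the difference in intercepts between the two lines (-0.537 - 0.940 = -1.477), that is, the predicted difference in mgpa between the two gre groups at bgpa = 0, a value far outside any realistic range of Bachelor's GPA. It therefore no longer describes an overall gap between the groups.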
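Repeating the earlier 0.1-unit comparison (bgpa = 3.3 versus 3.4) under Model 2 makes the two slopes, and their difference, concrete. A minimal Python sketch using the reported coefficients; the function name is hypothetical.

```python
B0, B1, B2, B3 = 0.940, 0.688, -1.477, 0.534  # Model 2 coefficients from the R output


def predict_m2(bgpa: float, gre: int) -> float:
    """Predicted mgpa from equation (2)."""
    return B0 + B1 * bgpa + B2 * gre + B3 * bgpa * gre


changes = {}
for level in (0, 1):
    changes[level] = predict_m2(3.4, level) - predict_m2(3.3, level)
    print(f"gre={level}: +0.1 bgpa -> {changes[level]:+.4f} mgpa")
# gre=0: +0.1 bgpa -> +0.0688 mgpa
# gre=1: +0.1 bgpa -> +0.1222 mgpa

slope_gap = changes[1] / 0.1 - changes[0] / 0.1  # difference in per-unit slopes
print(round(slope_gap, 3))  # 0.534, the interaction coefficient
```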

We can visualize these in the graph below:

Visualizing the changes in the outcome (mgpa) with changes in the continuous predictor (bgpa) at both levels of the binary predictor (gre) in the model with an interaction (bgpa*gre) term

As a real-world explanation of the model coefficients, we can say: Master’s GPA generally tends to be higher for students who have a higher Bachelor’s GPA; however, Master’s GPA apparently increases at a higher rate with Bachelor’s GPA for students with GRE scores above 310 than for students with GRE scores equal to or below 310. Putting everything together, the positive association between Master’s GPA and Bachelor’s GPA apparently depends on the level of the GRE score.
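The same interaction can be read from the other direction: under equation (2), the predicted gap in mgpa between the gre = 1 and gre = 0 groups is b2 + b3*bgpa, so it changes with Bachelor's GPA. A minimal Python sketch; the bgpa values are illustrative choices, not taken from the article's data.

```python
B2, B3 = -1.477, 0.534  # Model 2: gre and interaction coefficients from the R output


def gre_gap(bgpa: float) -> float:
    """Predicted mgpa at gre=1 minus predicted mgpa at gre=0, for a given bgpa."""
    return B2 + B3 * bgpa  # the intercept (b0) and b1*bgpa cancel out


for bgpa in (3.0, 3.5, 4.0):
    print(bgpa, round(gre_gap(bgpa), 3))
# 3.0 0.125
# 3.5 0.392
# 4.0 0.659
```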

How do we decide whether to include the interaction term or not?

We may use two techniques to decide whether to include the interaction term in the model. First, a scatterplot can help us see whether the linear relationship between a continuous predictor (bgpa) and a continuous outcome (mgpa) varies across the levels of a categorical predictor (gre).

In the plot above, the points for which gre = 1 (shown in green) appear to follow a steeper line. Such evidence of non-parallel lines across the levels of a categorical predictor suggests that we should consider adding an interaction term.

Second, after we add the interaction term to the model, if the p-value of its coefficient is lower than the chosen significance level (usually 0.05), the coefficient is statistically significantly different from 0. In that case, we should keep the interaction term in the model.
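The same decision can be framed as a nested-model F-test comparing the models with and without the interaction. In R, `anova()` applied to two fitted `lm` objects (for example, hypothetical `model1` and `model2`) reports this test, and when the models differ by a single term, the F statistic equals the square of the interaction coefficient's t statistic, so it gives the same p-value as the regression summary. The Python sketch below only computes F; the residual sums of squares and sample size in the usage line are hypothetical, and turning F into a p-value would need an F-distribution CDF (such as R's `pf`).

```python
def partial_f(rss_reduced: float, rss_full: float, q: int, n: int, k_full: int) -> float:
    """F statistic for adding q terms to a nested linear model.

    rss_reduced: residual sum of squares without the added term(s)
    rss_full:    residual sum of squares with them
    q:           number of added coefficients (1 for a single interaction term)
    n:           number of observations
    k_full:      coefficients in the full model, including the intercept (4 for Model 2)
    """
    return ((rss_reduced - rss_full) / q) / (rss_full / (n - k_full))


# Hypothetical values, for illustration only
f_stat = partial_f(rss_reduced=10.0, rss_full=8.0, q=1, n=100, k_full=4)
print(round(f_stat, 2))  # 24.0
```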

However, both approaches are purely data-driven and atheoretical. The best way to decide whether to include an interaction term is to rely on relevant theory (i.e., is there any theoretical reason to believe that the association between X and Y depends on the value of Z?).

Wrapping Up

In this article, we learned how to interpret the coefficients of a model that includes an interaction between a continuous predictor and a binary predictor. We also learned how to decide whether to include an interaction term in a model.

Before we end...

You may have noticed that throughout this article, I wrote statements such as:

“a one unit increase in bgpa is, on average, associated with 0.883 units increase in mgpa.”

And I did NOT write statements such as:

“a one unit increase in bgpa increases mgpa by 0.883 units”

“bgpa positively affects/influences/impacts mgpa”

Would you like to know why? 🤔 If you are curious, feel free to visit the following!

Vivekananda Das

Sharing synthesized ideas on Data Analysis in R, Data Literacy, Causal Inference, and Wellbeing | Ph.D. candidate @UW-Madison | More: https://vivekanandadas.com