Interpreting the Coefficients of a Regression with an Interaction Term (Part 1)
A Detailed Explanation
Adding an interaction term to a linear model — estimated using regression — becomes necessary when the statistical association between a predictor and an outcome depends on the value/level of another predictor.
Although adding an interaction term to a model can make it a better fit with the data, it simultaneously complicates the interpretation of the coefficients of the predictors.
This article explores how to interpret the coefficients of the predictors of a linear model that includes a two-way interaction between a continuous predictor and a binary predictor. In particular, I focus on how the interpretation of coefficients differs between a model with an interaction term and a model without an interaction term.
In the second part of this article, I discuss how to interpret the coefficients of the predictors of a linear model that includes a two-way interaction between two continous predictors or between two binary predictors.
A Hypothetical Example
Suppose a graduate admissions committee wants to explore how a student’s Bachelor’s GPA and GRE score relate to their Master’s GPA. (Note: the dataset in this example is imaginary and used only for illustrative purposes.)
I am going to use the statistical software R to generate some fake data and estimate the models.
If you do not code in R, ignore the code and follow the rest of the article for understanding the content.
# Set seed
set.seed(42)
# Number of observations
n <- 10000
# Generate binary predictor 'gre' (0 or 1)
gre <- rbinom(n, 1, 0.5)
# Generate continuous predictor 'bgpa' (ranging between 2 and 4)
bgpa <- runif(n, min = 2.0, max = 4.0)
# Interaction term
interaction <- gre * bgpa
# Generate continuous outcome 'mgpa' with a larger scale
# Use a higher coefficient for interaction to ensure a strong effect
error <- rnorm(n, mean = 0, sd = 0.1) # Moderate noise
# Calculate the mgpa values with a larger range
mgpa_raw <- 2 + 0.3 * bgpa + 0.5 * gre + 3 * interaction + error
# Clip mgpa values to ensure they stay within the range [2, 4]
# Applying a scaling transformation to keep the interaction effect prominent
scaled_mgpa <- mgpa_raw - min(mgpa_raw)
scaled_mgpa <- scaled_mgpa / max(scaled_mgpa) * (4 - 2) + 2
# Clip to ensure values are within the range [2, 4]
scaled_mgpa <- pmin(pmax(scaled_mgpa, 2), 4)
# Combine into a dataframe
data <- data.frame(gre, bgpa, mgpa = scaled_mgpa)
# Estimate the linear model with interaction
model_wo_interaction <- lm(mgpa ~ bgpa+gre, data = data)
# Summarize the model
summary(model_wo_interaction)
Model 1: Without interaction between bgpa and gre
First, we estimate the following model:
mgpa = b0 + b1*bgpa + b2*gre + error
First, we interpret the intercept as the predicted (average) value of mgpa when bgpa=0 and gre=0. As bgpa can’t be 0 (because we assumed it must be between 2 and 4), the intercept, all by itself, doesn’t really provide any useful information.
Next, we interpret the coefficient of bgpa as:
“Keeping the level of gre constant, a one unit increase in bgpa is, on average, associated with 0.27 units increase in mgpa.”
Now, as gre is a binary variable (with gre=0 set as the base case), we interpret its coefficient a bit differently:
“Keeping the value of bgpa constant, the average value of mgpa is 1.39 units higher for the group with gre = 1 than for the group with gre = 0.”
For a clearer understanding, based on the R output, we can express the estimated model mathematically as:
mgpa = 1.29 + 0.27*bgpa + 1.39*gre ……… (1)
Let us consider the case in which we want to compare the predicted mgpa value of two students with 0.1 units different bgpa (bgpa =3.3 and 3.4, respectively) but same level of gre (gre=0),
For the student with bgpa = 3.3 and gre = 0,
mgpa = 1.29 + 0.27*3.3 + 1.39*0 = 2.181
For the student with bgpa = 3.4 and gre = 0,
mgpa = 1.29 + 0.27*3.4 + 1.39*0 = 2.208
So, we find that, keeping the level of gre constant (as gre=0 for both cases), 0.1 units increase in bgpa (that is comparing bgpa=3.4 with bgpa=3.3) is, on average, associated with 2.208 - 2.181 = 0.027 units increase in mgpa. Scaling up, this means, a one unit increase in bgpa is, on average, associated with 0.27 units increase in mgpa, which is exactly what the coefficient of the bgpa variable in the R output shows (it actually shows 0.265734 but we ignore the slight discrepancy in values caused by rounding).
Additionally, from equation 1, we observe that,
If gre = 0,
mgpa = 1.29 + 0.27*bgpa + 1.39*0 = 1.29 + 0.27*bgpa
And, if gre = 1,
mgpa = 1.29 + 0.27*bgpa + 1.39*1= 2.68 + 0.27*bgpa
So, depending on the two different levels of gre, we get two different straight lines with the same slope (parallel lines), which suggests that regardless of the level of gre, a one unit increase in bgpa is, on average, associated with 0.27 units increase in mgpa. In other words, this model assumes that the association between bgpa and mgpa does not depend on the levels of gre.
We visualize the two straight lines by running the following code:
# Load necessary package
library(ggplot2)
# Predict mgpa for each value of bgpa at the two levels of gre using the model without interaction
data$pred_mgpa_wo_interaction <- predict(model_wo_interaction, newdata = data)
# Coefficients from the model without interaction
coef_no_interaction <- coef(model_wo_interaction)
# Extract the coefficients
intercept <- coef_no_interaction["(Intercept)"]
coef_bgpa <- coef_no_interaction["bgpa"]
coef_gre <- coef_no_interaction["gre"]
# Create the equations as strings
equation_gre_0 <- paste0("mgpa = ", round(intercept, 2), " + ", round(coef_bgpa, 2), "*bgpa")
equation_gre_1 <- paste0("mgpa = ", round(intercept + coef_gre, 2), " + ", round(coef_bgpa, 2), "*bgpa")
# Find the positions where we want to place the equations
text_positions <- data.frame(
gre = c(0, 1),
x = c(2.5, 2.5),
y = c(
predict(model_wo_interaction, newdata = data.frame(gre = 0, bgpa = 2.5)),
predict(model_wo_interaction, newdata = data.frame(gre = 1, bgpa = 2.5))
)
)
# Plot the data
ggplot(data, aes(x = bgpa, y = pred_mgpa_wo_interaction, color = as.factor(gre))) +
geom_point(alpha = 0.2) +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Relationship between bgpa and mgpa (without Interaction)",
x = "bgpa",
y = "Predicted mgpa",
color = "GRE Level") +
geom_text(data = text_positions, aes(x = x, y = y, label = ifelse(gre == 0, equation_gre_0, equation_gre_1),
color = as.factor(gre)), size = 6, hjust = -0.1, vjust = 2) + # Adjusted equation position
theme_minimal(base_size = 16) +
theme(legend.position = "none",
plot.title = element_text(size = 20, face = "bold"),
axis.title = element_text(size = 18),
axis.text = element_text(size = 16))
Similarly, we could better understand the coefficient of the gre variable by putting values into equation 1 (e.g., by keeping bgpa constant at 3.5 and considering the two levels of gre).
Model 2: With interaction between bgpa and gre
Now, we estimate the following model, which incorporates an interaction between bgpa and gre:
mgpa = b0 + b1*bgpa + b2*gre + b3*bgpa*gre + error
# Estimate the linear model with interaction
model_w_interaction <- lm(mgpa ~ bgpa*gre, data = data)
# Summarize the model
summary(model_w_interaction)
First, we see that the interaction term is statistically significant at the 5% significance level (as the p-value is <0.05), which justifies the inclusion of the term in the model.
Second, the interpretation of coefficients is not quite as simple as it was for the model without an interaction term.
Based on the R output, we write the estimated model mathematically as:
mgpa = 1.95 + 0.04*bgpa + 0.07*gre + 0.44*bgpa*gre ………… (2)
Now, if gre = 0, equation 2 reduces to:
mgpa = 1.95 + 0.04*bgpa + 0.07*0 + 0.44*bgpa*0
mgpa = 1.95+ 0.04*bgpa
And if gre =1, equation 2 reduces to:
mgpa = 1.95 + 0.04*bgpa + 0.07*1 + 0.44*bgpa*1
mgpa = 2.02 + 0.48*bgpa
For the model in equation (2), at the two levels of gre, we find two straight lines with different slopes (0.04 and 0.48), which reveals that the lines are not parallel. This model assumes that the positive association between bgpa and mgpa depends on the level of gre (and vice versa). Therefore, we interpret the model coefficients as:
Provided that gre=0, a one unit increase in bgpa is, on average, associated with 0.04 units increase in mgpa.
Provided that gre=1, a one unit increase in bgpa is, on average, associated with 0.48 units increase in mgpa.
The coefficient of the interaction term (i.e., bgpa: gre1) shows the difference in slope between the two lines (i.e., 0.48–0.04 = 0.44)
We can visualize these in the graph below:
# Predict mgpa for each value of bgpa at the two levels of gre using the model with interaction
data$pred_mgpa_w_interaction <- predict(model_w_interaction, newdata = data)
# Coefficients from the model with interaction
coef_with_interaction <- coef(model_w_interaction)
# Extract the coefficients
intercept <- coef_with_interaction["(Intercept)"]
coef_bgpa <- coef_with_interaction["bgpa"]
coef_gre <- coef_with_interaction["gre"]
coef_interaction <- coef_with_interaction["bgpa:gre"]
# Create the equations as strings
equation_gre_0 <- paste0("mgpa = ", round(intercept, 2), " + ", round(coef_bgpa, 2), "*bgpa")
equation_gre_1 <- paste0("mgpa = ", round(intercept + coef_gre, 2), " + ", round(coef_bgpa + coef_interaction, 2), "*bgpa")
# Find the positions where we want to place the equations
text_positions <- data.frame(
gre = c(0, 1),
x = c(2.5, 2.5), # Adjust x position for better visibility
y = c(
predict(model_w_interaction, newdata = data.frame(gre = 0, bgpa = 2.5)),
predict(model_w_interaction, newdata = data.frame(gre = 1, bgpa = 2.5))
)
)
# Plot the data
ggplot(data, aes(x = bgpa, y = pred_mgpa_w_interaction, color = as.factor(gre))) +
geom_point(alpha = 0.2) +
geom_smooth(method = "lm", se = FALSE) + # Add a linear trend line for each level of gre
labs(title = "Relationship between bgpa and mgpa (with Interaction)",
x = "bgpa",
y = "Predicted mgpa",
color = "GRE") +
geom_text(data = text_positions, aes(x = x, y = y, label = ifelse(gre == 0, equation_gre_0, equation_gre_1),
color = as.factor(gre)), size = 6, hjust = -0.1, vjust = 2) + # Adjusted equation position
theme_minimal(base_size = 16) +
theme(legend.position = "none", # Remove the legend since the colors are indicated by the text
plot.title = element_text(size = 20, face = "bold"),
axis.title = element_text(size = 18),
axis.text = element_text(size = 16)) + ylim (1.5,4)
As a real-world explanation of the model coefficients, we can say: Master’s GPA generally tends to be higher for students who have a higher Bachelor’s GPA; however, Master’s GPA apparently increases at a much higher rate with the increase in Bachelor’s GPA for students with GRE scores above 310 than for students with GRE scores equal or below 310. Adding everything together, the positive association between Master’s GPA and Bachelor’s GPA is apparently dependent on the level of the GRE score.
How do we decide whether to include the interaction term or not?
We may use two techniques to decide whether to include the interaction term in the model. Initially, a scatterplot can help us identify whether the linear relationship between a continuous predictor (bgpa) and a continuous outcome (mgpa) varies depending on a categorical predictor (gre).
# Create a scatter plot with lines
ggplot(data, aes(x = bgpa, y = mgpa, color = factor(gre), group = factor(gre))) +
geom_point(alpha = 0.3, size = 3) + # Lightly visible points with larger size
geom_line() + # Add lines connecting the points
labs(
x = "bgpa",
y = "mgpa",
color = "GRE"
) +
theme_minimal() +
theme(
text = element_text(size = 18), # Increase text size for better visibility
axis.title = element_text(size = 18), # Increase axis title size
axis.text = element_text(size = 16), # Increase axis text size
legend.title = element_text(size = 18), # Increase legend title size
legend.text = element_text(size = 16), # Increase legend text size
legend.position = "top" # Place legend at the top
)
From the above plot, we can clearly see a much steeper line passing through the points for which gre = 1. So, the evidence of possible non-parallel lines at different levels of a categorical predictor suggests that we should consider adding an interaction term.
Secondly, after we add the interaction term to the model, if the p-value of the coefficient of the interaction term turns out to be lower than the significance level (usually 0.05), that suggests the interaction term is significantly different from 0. In that case, we should keep the interaction term in the model.
However, both approaches are purely data-driven and atheoretical. The best way to decide whether to include an interaction term is to depend on a relevant theory (i.e., is there any theoretical reason to believe that the association between X and Y depends on the value of Z?)
Wrapping Up
In this article, we learned the interpretation of the coefficients of a model that includes interaction between a continuous predictor and a binary predictor. Also, we learned how to decide whether to include an interaction term in a model.
Before we end…..
You may have noticed that throughout this article, I wrote statements such as:
“a one unit increase in bgpa is, on average, associated with 0.883 units increase in mgpa.”
And I did NOT write statements such as:
“a one unit increase in bgpa increases mgpa by 0.883 units”
“bgpa positively affects/influences/impacts mgpa”
Would you like to know why? 🤔 If you are curious, feel free to visit the following!