Monday, January 30, 2017

Plotting relationships between two variables (with 95% confidence bands), holding other variables constant

This'll be a short post.

Question:
How do you generate a scatterplot with 95% confidence bands that have been adjusted for control variables?

Simulate 5 variables; all pairwise correlations, r = .3.


library(MASS)
# randomizer seed (so the results can be reporduced)
set.seed(1234)

dat <- mvrnorm(n = 100, mu = c(0,0,0,0,0), Sigma = diag(x = .7, nrow = 5, ncol = 5) + .3)

# add id column
dat <- data.frame(id = 1:100, dat)

Fit model where X4 is the key predictor (i.e. it's entered last, predicting X5 'over and above' the other predictors)


dat.model <- lm(X5 ~ X1 + X2 + X3 + X4, data = dat)

Now we need to estimate confidence intervals for X4 when all the controls are in the model.

This is key: predict X5 values from X4 values when all other predictors = 0.


dat.predict <- data.frame(predict(dat.model,
                            new = data.frame(X1 = 0,
                                             X2 = 0,
                                             X3 = 0,
                                             X4 = dat$X4),
                            interval = 'confidence',
                            level = .95))


The predict function outputs 3 columns: fit, lwr, upr. Below we add these new columns to our original dataframe.


dat <- data.frame(dat, dat.predict)

We should expect the slope for X4 to get smaller when controls are added.

Summarize model without controls


only.X4.model <- lm(X5 ~ X4, data = dat)
summary(only.X4.model)
## 
## Call:
## lm(formula = X5 ~ X4, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6639 -0.6288  0.0372  0.7109  2.3293 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.009052   0.095681   0.095 0.924823    
## X4          0.358364   0.092481   3.875 0.000193 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9541 on 98 degrees of freedom
## Multiple R-squared:  0.1329, Adjusted R-squared:  0.124 
## F-statistic: 15.02 on 1 and 98 DF,  p-value: 0.0001927

Summarize model with controls


summary(dat.model)
## 
## Call:
## lm(formula = X5 ~ X1 + X2 + X3 + X4, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.95272 -0.52728  0.00174  0.65839  2.18738 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -0.05916    0.09168  -0.645  0.52030   
## X1           0.19753    0.09925   1.990  0.04945 * 
## X2           0.12438    0.08579   1.450  0.15040   
## X3           0.21466    0.10055   2.135  0.03535 * 
## X4           0.24308    0.09217   2.637  0.00977 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8937 on 95 degrees of freedom
## Multiple R-squared:  0.2625, Adjusted R-squared:  0.2315 
## F-statistic: 8.455 on 4 and 95 DF,  p-value: 7.028e-06

Plot model without controls


library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
ggplot(data = dat, aes(x = X4, y = X5)) +
  geom_point(color = 'black', fill = 'cyan4', size = 2, shape = 21) +
  geom_smooth(method = 'lm', fill = 'cyan4', lty = 'dashed', color = 'cyan4', alpha = 0.2) +
  theme_minimal()

Plot model with controls

The confidence bands are based on the standard error for X4 in the full model. I draw the regression line for the model without controls (black, dashed line).


ggplot(data = dat, aes(x = X4, y = X5, ymin = lwr, ymax = upr)) +
  geom_ribbon(fill = 'cyan4', alpha = 0.2) +
  geom_point(color = 'black', size = 2, shape = 21, fill = 'cyan4') +
  geom_line(aes(y = fit), color = 'cyan4', size = 1, lty = 'dashed') +
  geom_abline(slope = only.X4.model$coefficients[[2]], intercept = only.X4.model$coefficients[[1]], lty = 'dashed', size = 1) +
  annotate(geom = 'text', x = c(-1,-1), y = c(2.5,2.0), label = c(paste('X5 =',
                                                                        round(only.X4.model$coefficients[[1]],3),'+',
                                                                        round(only.X4.model$coefficients[[2]],3),'X4',
                                                                        '(no controls)'),
                                                                  paste('X5 =',
                                                                        round(dat.model$coefficients[[1]],3),'+',
                                                                        round(dat.model$coefficients[[2]],3),'X1 +',
                                                                        round(dat.model$coefficients[[3]],3),'X2 +',
                                                                        round(dat.model$coefficients[[4]],3),'X3 +',
                                                                        round(dat.model$coefficients[[5]],3),'X4',
                                                                        '(with controls)')), color = c('black','cyan4'), fontface = 'bold') +
  theme_minimal()


Happy R,

Nick

No comments:

Post a Comment