This'll be a short post.
How do you generate a scatterplot with 95% confidence bands that have been adjusted for control variables?
Simulate 5 variables; all pairwise correlations, r = .3.
library(MASS) # randomizer seed (so the results can be reporduced) set.seed(1234) dat <- mvrnorm(n = 100, mu = c(0,0,0,0,0), Sigma = diag(x = .7, nrow = 5, ncol = 5) + .3) # add id column dat <- data.frame(id = 1:100, dat)
Fit model where X4 is the key predictor (i.e. it's entered last, predicting X5 'over and above' the other predictors)
dat.model <- lm(X5 ~ X1 + X2 + X3 + X4, data = dat)
Now we need to estimate confidence intervals for X4 when all the controls are in the model.
This is key: predict X5 values from X4 values when all other predictors = 0.
dat.predict <- data.frame(predict(dat.model, new = data.frame(X1 = 0, X2 = 0, X3 = 0, X4 = dat$X4), interval = 'confidence', level = .95))
dat <- data.frame(dat, dat.predict)
We should expect the slope for X4 to get smaller when controls are added.
Summarize model without controls
only.X4.model <- lm(X5 ~ X4, data = dat) summary(only.X4.model)
## ## Call: ## lm(formula = X5 ~ X4, data = dat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.6639 -0.6288 0.0372 0.7109 2.3293 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.009052 0.095681 0.095 0.924823 ## X4 0.358364 0.092481 3.875 0.000193 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.9541 on 98 degrees of freedom ## Multiple R-squared: 0.1329, Adjusted R-squared: 0.124 ## F-statistic: 15.02 on 1 and 98 DF, p-value: 0.0001927
Summarize model with controls
summary(dat.model)
## ## Call: ## lm(formula = X5 ~ X1 + X2 + X3 + X4, data = dat) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.95272 -0.52728 0.00174 0.65839 2.18738 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.05916 0.09168 -0.645 0.52030 ## X1 0.19753 0.09925 1.990 0.04945 * ## X2 0.12438 0.08579 1.450 0.15040 ## X3 0.21466 0.10055 2.135 0.03535 * ## X4 0.24308 0.09217 2.637 0.00977 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.8937 on 95 degrees of freedom ## Multiple R-squared: 0.2625, Adjusted R-squared: 0.2315 ## F-statistic: 8.455 on 4 and 95 DF, p-value: 7.028e-06
Plot model without controls
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
ggplot(data = dat, aes(x = X4, y = X5)) + geom_point(color = 'black', fill = 'cyan4', size = 2, shape = 21) + geom_smooth(method = 'lm', fill = 'cyan4', lty = 'dashed', color = 'cyan4', alpha = 0.2) + theme_minimal()
Plot model with controls
The confidence bands are based on the standard error for X4 in the full model. I draw the regression line for the model without controls (black, dashed line).
ggplot(data = dat, aes(x = X4, y = X5, ymin = lwr, ymax = upr)) + geom_ribbon(fill = 'cyan4', alpha = 0.2) + geom_point(color = 'black', size = 2, shape = 21, fill = 'cyan4') + geom_line(aes(y = fit), color = 'cyan4', size = 1, lty = 'dashed') + geom_abline(slope = only.X4.model$coefficients[[2]], intercept = only.X4.model$coefficients[[1]], lty = 'dashed', size = 1) + annotate(geom = 'text', x = c(-1,-1), y = c(2.5,2.0), label = c(paste('X5 =', round(only.X4.model$coefficients[[1]],3),'+', round(only.X4.model$coefficients[[2]],3),'X4', '(no controls)'), paste('X5 =', round(dat.model$coefficients[[1]],3),'+', round(dat.model$coefficients[[2]],3),'X1 +', round(dat.model$coefficients[[3]],3),'X2 +', round(dat.model$coefficients[[4]],3),'X3 +', round(dat.model$coefficients[[5]],3),'X4', '(with controls)')), color = c('black','cyan4'), fontface = 'bold') + theme_minimal()
Happy R,
Nick
No comments:
Post a Comment