Friday, July 29, 2016

Reproducing Hayes’s PROCESS Models' results in R

If you're an experimental psychologist, you likely know that journals expect more than "main effects" or "one cause fits all" explanations of psychological phenomena. That is, they expect any given phenomenon to "depend on" some circumstance or some trait. Or they expect some process by which the phenomenon occurs: first X happens, which leads to M, which in turn leads to Y.

This post is about that last kind of model, and specifically about Andrew Hayes, who had the brilliant idea to code and release macros (for free) that add user-friendly drop-down menus for conducting not 1 or 2, but 76 statistical models in SPSS and SAS. If you're a graduate student or a "younger" faculty member, you're probably familiar with at least 2 of these models: Model 1 for moderation and Model 4 for mediation.

I'm going to show you how to reproduce PROCESS output in R. I'll even throw in Model 7 (moderated mediation). It's my hope that learning how to run these models in R will give you a framework for running similar and even more complex models without a tutorial.

Here's where to find the data I use in the post (mediation_data.sav): UCLA SPSS Statistical Computing Workshop Moderation and Mediation using the Process Macro

Hayes's PROCESS Model 1: moderation with a continuous moderator (see Hayes's template for all 76 models here)


Here's how you do it in SPSS

Download the folder with the PROCESS macros and documentation. Open the file called 'process.sps', select everything in the syntax window, and click the giant green arrow to run. Then open a new syntax file.


GET
file = YOUR.FILE.PATH.HERE.
PROCESS vars = socst read math
/ x = socst
/ m = math
/ y = read
/ model = 1
/ center = 1
/ boot = 1000.

Run MATRIX procedure:




*************** PROCESS Procedure for SPSS Release 2.16 ******************



          Written by Andrew F. Hayes, Ph.D.       www.afhayes.com

    Documentation available in Hayes (2013). www.guilford.com/p/hayes3



**************************************************************************

Model = 1

    Y = read

    X = socst

    M = math



Sample size

        200



**************************************************************************

Outcome: read



Model Summary

          R       R-sq        MSE          F        df1        df2          p

      .7390      .5461    48.4421    78.6145     3.0000   196.0000      .0000



Model

              coeff         se          t          p       LLCI       ULCI

constant    51.6153      .5687    90.7626      .0000    50.4938    52.7369

math          .4807      .0637     7.5455      .0000      .3550      .6063

socst         .3738      .0555     6.7301      .0000      .2643      .4834

int_1         .0113      .0052     2.1572      .0322      .0010      .0216




Product terms key:



 int_1    socst       X     math



R-square increase due to interaction(s):

         R2-chng          F        df1        df2          p

int_1      .0108     4.6534     1.0000   196.0000      .0322



*************************************************************************



Conditional effect of X on Y at values of the moderator(s):

       math     Effect         se          t          p       LLCI       ULCI

    -9.3684      .2681      .0678     3.9574      .0001      .1345      .4018

      .0000      .3738      .0555     6.7301      .0000      .2643      .4834

     9.3684      .4795      .0799     6.0034      .0000      .3220      .6370



Values for quantitative moderators are the mean and plus/minus one SD from mean.

Values for dichotomous moderators are the two values of the moderator.



******************** ANALYSIS NOTES AND WARNINGS *************************



Level of confidence for all confidence intervals in output:

    95.00



NOTE: The following variables were mean centered prior to analysis:

 socst    math



------ END MATRIX -----


Here's how you do it in R (highlights = output to compare)

Read in the data

setwd(dir = "~/Desktop/analysis-examples/Replicating Hayes's PROCESS Models in R/")

want_packages <- c("Hmisc", "lavaan", "knitr")
have_packages   <- want_packages %in% rownames(installed.packages())
if(any(!have_packages)) install.packages(want_packages[!have_packages])
library(Hmisc)

egdata <- spss.get(file = "~/Desktop/analysis-examples/Replicating Hayes's PROCESS Models in R/mediation_data.sav",
                     lowernames = TRUE,
                     to.data.frame = TRUE)

Functions used

  • library() activates a package in your library. Think of it as taking a package off the shelf.
  • installed.packages() gives you a list of all installed packages
  • install.packages() downloads the package(s) you name (R has so many packages and people develop more every day) and puts them in your library
  • %in% checks which elements of one vector appear in another, returning TRUEs and FALSEs
  • any() asks of a logical vector (TRUEs and FALSEs): is at least one of the values TRUE?
  • if() use this for executing code with if-then logic
  • spss.get() for reading in SPSS files. I like this function more than the foreign package's read.spss() just because it has more features
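To make %in%, any(), and if() concrete, here's a tiny sketch of the install-if-missing pattern from the code above (the package names here are made up purely for illustration):

```r
want <- c("alpha", "bravo")            # hypothetical package names
have <- c("bravo", "charlie")          # pretend these are installed
needed <- !(want %in% have)            # %in% returns a logical vector: FALSE TRUE
needed                                 # TRUE FALSE: "alpha" is missing
any(needed)                            # TRUE: at least one package is missing
if (any(needed)) print(want[needed])   # prints "alpha" (the one to install)
```
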

Packages used

  • Hmisc is a grab bag of functions for data analysis and management (e.g., spss.get(), but also way more cool stuff)

Center variables

Below I create a function for centering the variables I'll use in my examples. Then I create those centered variables. We know the centering "worked" if the centered variable's mean = 0 but its standard deviation matches the non-centered variable's.

# function for centering variables
center.var <- function(x){
  centered <- as.vector(scale(x = x, center = TRUE,
                              scale = FALSE))
  return(centered)
}

# apply function to many variables and name them oldname_c (_c for centered)
egdata[,c("math_c", "read_c", "write_c", "socst_c")] <- lapply(X = egdata[,c("math", "read", "write", "socst")], FUN = function(x) center.var(x))

# means of the vars
sapply(egdata[,c("math", "math_c", "read", "read_c", "write", "write_c", "socst", "socst_c")],
       FUN = function(x){
         round(mean(x), 2)
       }, simplify = TRUE,
       USE.NAMES = TRUE)
##    math  math_c    read  read_c   write write_c   socst socst_c 
##   52.65    0.00   52.23    0.00   52.77    0.00   52.41    0.00
# sd of the vars
sapply(egdata[,c("math", "math_c", "read", "read_c", "write", "write_c", "socst", "socst_c")],
       FUN = function(x){
         round(sd(x), 2)
       }, simplify = TRUE,
       USE.NAMES = TRUE)
##    math  math_c    read  read_c   write write_c   socst socst_c 
##    9.37    9.37   10.25   10.25    9.48    9.48   10.74   10.74
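A quick sanity check of center.var() on a toy vector (my example, not from the dataset; the function definition is repeated so the block stands alone):

```r
# same centering function as above
center.var <- function(x){
  as.vector(scale(x = x, center = TRUE, scale = FALSE))
}

toy <- c(1, 2, 3, 10)           # mean = 4
centered <- center.var(toy)     # -3 -2 -1  6
mean(centered)                  # 0: centering removes the mean
sd(centered) == sd(toy)         # TRUE: the spread is unchanged
```
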

Functions used

  • lapply() takes a vector and applies a function to each element; it returns a list of the outputs
  • sapply() like lapply() above but with extra arguments such as simplify (basically tries to make the output prettier, e.g., a vector instead of a list) and USE.NAMES (names each output with the names of the input)
  • function() use this to make your own function
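The lapply()/sapply() difference in miniature (a toy example of mine, not from the analysis):

```r
# lapply() always returns a list, one element per input
squares_list <- lapply(1:3, function(x) x^2)   # list(1, 4, 9)

# sapply() tries to simplify, here down to a plain numeric vector
squares_vec  <- sapply(1:3, function(x) x^2)   # c(1, 4, 9)
```
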

Hayes's PROCESS Model 1 results for moderation (continuous moderator)

The lavaan code is tricky at first, but I think it's way easier to learn than Stata's or Mplus's (yes, SPSS's AMOS is easier; it's like Microsoft Paint for SEM). I suggest looking over the basic tutorials on lavaan's webpage. For now, fitting and running a model essentially takes 2 steps:

1. write the model using regression syntax (~ means "is regressed on", playing the role of = in a regression equation), making sure to name coefficients (a1*variablename means the coefficient for variablename will be called a1) and to define new parameters (indirect := a1 * b1 will give you an output line for a1 x b1 called indirect)

2. run that model using sem() or cfa(), making sure to give the function the name of the dataframe with your variables

The new part is writing the model as a character string and passing it as an argument to the lavaan functions. Also new are the lavaan operators. They're easy to learn, I think.

library(lavaan)

# create interaction term between X (socst) and M (math)
egdata$socst_c.math_c <- with(egdata,
                              socst_c*math_c)

# parameters
hayes1 <- '# regressions
           read ~ b1*socst_c
           read ~ b2*math_c
           read ~ b3*socst_c.math_c

           # mean of centered math (for use in simple slopes)
           math_c ~ math.mean*1

           # variance of centered math (for use in simple slopes)
           math_c ~~ math.var*math_c

           # simple slopes
           SD.below := b1 + b3*(math.mean - sqrt(math.var))
           mean := b1 + b3*(math.mean)
           SD.above := b1 + b3*(math.mean + sqrt(math.var))
           '

# fit the model (this takes some time)
sem1 <- sem(model = hayes1,
            data = egdata,
            se = "bootstrap",
            bootstrap = 1000)

# fit measures
summary(sem1,
        fit.measures = TRUE,
        standardize = TRUE,
        rsquare = TRUE)
## lavaan (0.5-20) converged normally after  17 iterations
## 
##   Number of observations                           200
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               76.103
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.000
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              233.790
##   Degrees of freedom                                 5
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.676
##   Tucker-Lewis Index (TLI)                       0.190
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3354.138
##   Loglikelihood unrestricted model (H1)      -3316.086
## 
##   Number of free parameters                          7
##   Akaike (AIC)                                6722.276
##   Bayesian (BIC)                              6745.364
##   Sample-size adjusted Bayesian (BIC)         6723.187
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.430
##   90 Percent Confidence Interval          0.351  0.516
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.180
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
##   read ~                                                                
##     socst_c   (b1)    0.374    0.054    6.974    0.000    0.374    0.437
##     math_c    (b2)    0.481    0.062    7.799    0.000    0.481    0.490
##     scst_c.m_ (b3)    0.011    0.005    2.270    0.023    0.011    0.118
## 
## Intercepts:
##                    Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
##     math_c  (mth.)   -0.000    0.648   -0.000    1.000   -0.000   -0.000
##     read             51.615    0.564   91.462    0.000   51.615    5.628
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
##     math_c  (mth.)   87.329    7.115   12.274    0.000   87.329    1.000
##     read             47.473    4.611   10.295    0.000   47.473    0.564
## 
## R-Square:
##                    Estimate
##     read              0.436
## 
## Defined Parameters:
##                    Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
##     SD.below          0.268    0.067    4.025    0.000    0.268    0.319
##     mean              0.374    0.054    6.929    0.000    0.374    0.437
##     SD.above          0.479    0.075    6.359    0.000    0.479    0.554
# bootstraps

parameterEstimates(sem1,
                   boot.ci.type = "bca.simple",
                   level = .95,
                   ci = TRUE,
                   standardized = FALSE)
##               lhs op                              rhs     label      est
## 1            read  ~                          socst_c        b1    0.374
## 2            read  ~                           math_c        b2    0.481
## 3            read  ~                   socst_c.math_c        b3    0.011
## 4          math_c ~1                                  math.mean    0.000
## 5          math_c ~~                           math_c  math.var   87.329
## 6            read ~~                             read             47.473
## 7         socst_c ~~                          socst_c            114.681
## 8         socst_c ~~                   socst_c.math_c            -88.161
## 9  socst_c.math_c ~~                   socst_c.math_c           9184.507
## 10           read ~1                                              51.615
## 11        socst_c ~1                                               0.000
## 12 socst_c.math_c ~1                                              54.489
## 13       SD.below := b1+b3*(math.mean-sqrt(math.var))  SD.below    0.268
## 14           mean :=                b1+b3*(math.mean)      mean    0.374
## 15       SD.above := b1+b3*(math.mean+sqrt(math.var))  SD.above    0.479
##       se      z pvalue ci.lower ci.upper
## 1  0.054  6.974  0.000    0.267    0.474
## 2  0.062  7.799  0.000    0.353    0.601
## 3  0.005  2.270  0.023    0.002    0.022
## 4  0.648  0.000  1.000   -1.345    1.210
## 5  7.115 12.274  0.000   74.678  102.638
## 6  4.611 10.295  0.000   39.785   59.013
## 7  0.000     NA     NA  114.681  114.681
## 8  0.000     NA     NA  -88.161  -88.161
## 9  0.000     NA     NA 9184.507 9184.507
## 10 0.564 91.462  0.000   50.514   52.725
## 11 0.000     NA     NA    0.000    0.000
## 12 0.000     NA     NA   54.489   54.489
## 13 0.067  4.025  0.000    0.125    0.402
## 14 0.054  6.929  0.000    0.268    0.476
## 15 0.075  6.359  0.000    0.323    0.625

Functions used

  • with() I use this function when I don't want to call a dataframe every time I name a variable (i.e., mydata$myvariable)
  • $ extracts an element from an object
  • sem() structural equation model fitting function
  • summary() summarises objects like linear models or dataframes and more
  • parameterEstimates() outputs parameter estimates from an sem() or cfa() model

Packages used

  • lavaan badass package for structural equation modeling; it's like MPLUS but free, easier to use, and prettier
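Because Model 1 is just one moderated regression, the unstandardized coefficients (though not the bootstrap CIs) can also be cross-checked with plain lm(). A minimal sketch with simulated data (the numbers are mine, not the UCLA dataset; variable names mirror the example). It also shows that a hand-made product term, as in the lavaan model above, matches lm()'s * shorthand:

```r
set.seed(1)
n <- 200
socst_c <- rnorm(n)
math_c  <- rnorm(n)
read    <- 50 + 0.4*socst_c + 0.5*math_c + 0.01*socst_c*math_c + rnorm(n, sd = 7)
d <- data.frame(read, socst_c, math_c, socst_c.math_c = socst_c * math_c)

# lm() with an explicit product term (the lavaan approach above)...
fit_manual <- lm(read ~ socst_c + math_c + socst_c.math_c, data = d)

# ...gives the same estimates as the * shorthand (main effects + interaction)
fit_star <- lm(read ~ socst_c * math_c, data = d)
coef(fit_manual)   # identical values to coef(fit_star), different term names
```
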

Hayes's PROCESS Model 4: "classic" mediation


Here's how you do it in SPSS

PROCESS vars = science read math
/ x = math
/ m = read
/ y = science
/ model = 4
/ center = 1
/ total = 1
/ effsize = 1
/ boot = 1000.

Run MATRIX procedure:




*************** PROCESS Procedure for SPSS Release 2.16 ******************



          Written by Andrew F. Hayes, Ph.D.       www.afhayes.com

    Documentation available in Hayes (2013). www.guilford.com/p/hayes3



**************************************************************************

Model = 4

    Y = science

    X = math

    M = read



Sample size

        200



**************************************************************************

Outcome: read



Model Summary

          R       R-sq        MSE          F        df1        df2          p

      .6623      .4386    59.3124   154.6991     1.0000   198.0000      .0000



Model

              coeff         se          t          p       LLCI       ULCI

constant    14.0725     3.1158     4.5165      .0000     7.9281    20.2170

math          .7248      .0583    12.4378      .0000      .6099      .8397



**************************************************************************

Outcome: science



Model Summary

          R       R-sq        MSE          F        df1        df2          p

      .6915      .4782    51.6688    90.2743     2.0000   197.0000      .0000



Model

              coeff         se          t          p       LLCI       ULCI

constant    11.6155     3.0543     3.8030      .0002     5.5923    17.6387

read          .3654      .0663     5.5091      .0000      .2346      .4962

math          .4017      .0726     5.5339      .0000      .2586      .5449



************************** TOTAL EFFECT MODEL ****************************

Outcome: science



Model Summary

          R       R-sq        MSE          F        df1        df2          p

      .6307      .3978    59.3280   130.8077     1.0000   198.0000      .0000



Model

              coeff         se          t          p       LLCI       ULCI

constant    16.7579     3.1162     5.3776      .0000    10.6126    22.9032

math          .6666      .0583    11.4371      .0000      .5516      .7815



***************** TOTAL, DIRECT, AND INDIRECT EFFECTS ********************



Total effect of X on Y

     Effect         SE          t          p       LLCI       ULCI

      .6666      .0583    11.4371      .0000      .5516      .7815



Direct effect of X on Y

     Effect         SE          t          p       LLCI       ULCI

      .4017      .0726     5.5339      .0000      .2586      .5449




Indirect effect of X on Y

         Effect    Boot SE   BootLLCI   BootULCI

read      .2649      .0550      .1590      .3760




Partially standardized indirect effect of X on Y

         Effect    Boot SE   BootLLCI   BootULCI

read      .0268      .0054      .0158      .0378



Completely standardized indirect effect of X on Y

         Effect    Boot SE   BootLLCI   BootULCI

read      .2506      .0519      .1482      .3537



Ratio of indirect to total effect of X on Y

         Effect    Boot SE   BootLLCI   BootULCI

read      .3973      .0962      .2090      .5989



Ratio of indirect to direct effect of X on Y

         Effect    Boot SE   BootLLCI   BootULCI

read      .6593      .3179      .2642     1.4929



R-squared mediation effect size (R-sq_med)

         Effect    Boot SE   BootLLCI   BootULCI

read      .3167      .0396      .2421      .4000



******************** ANALYSIS NOTES AND WARNINGS *************************



Number of bootstrap samples for bias corrected bootstrap confidence intervals:

     1000



Level of confidence for all confidence intervals in output:

    95.00



NOTE: Kappa-squared is disabled from output as of version 2.16.



------ END MATRIX -----

Here's how you do it in R

Reproducing Hayes's PROCESS Model 4 results for "classic" mediation

# parameters
hayes4 <- ' # direct effect
              science ~ c*math_c
              direct := c

            # regressions
              read_c ~ a*math_c
              science ~ b*read_c

            # indirect effect (a*b)
              indirect := a*b

            # total effect
              total := c + (a*b)'

# fit model
sem2 <- sem(model = hayes4,
            data = egdata,
            se = "bootstrap",
            bootstrap = 1000)
# fit measures
summary(sem2,
        fit.measures = TRUE,
        standardize = TRUE,
        rsquare = TRUE)
## lavaan (0.5-20) converged normally after  14 iterations
## 
##   Number of observations                           200
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                0.000
##   Degrees of freedom                                 0
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              245.569
##   Degrees of freedom                                 3
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -2098.582
##   Loglikelihood unrestricted model (H1)      -2098.582
## 
##   Number of free parameters                          5
##   Akaike (AIC)                                4207.164
##   Bayesian (BIC)                              4223.656
##   Sample-size adjusted Bayesian (BIC)         4207.816
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent Confidence Interval          0.000  0.000
##   P-value RMSEA <= 0.05                          1.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws             997
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
##   science ~                                                             
##     math_c     (c)    0.402    0.084    4.771    0.000    0.402    0.380
##   read_c ~                                                              
##     math_c     (a)    0.725    0.058   12.502    0.000    0.725    0.662
##   science ~                                                             
##     read_c     (b)    0.365    0.076    4.818    0.000    0.365    0.378
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
##     science          50.894    4.937   10.309    0.000   50.894    0.522
##     read_c           58.719    5.853   10.033    0.000   58.719    0.561
## 
## R-Square:
##                    Estimate
##     science           0.478
##     read_c            0.439
## 
## Defined Parameters:
##                    Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
##     direct            0.402    0.084    4.768    0.000    0.402    0.380
##     indirect          0.265    0.055    4.773    0.000    0.265    0.251
##     total             0.667    0.058   11.418    0.000    0.667    0.631
# bootstraps
parameterEstimates(sem2,
                   boot.ci.type = "bca.simple",
                   level = .95, ci = TRUE,
                   standardized = FALSE)
##        lhs op     rhs    label    est    se      z pvalue ci.lower
## 1  science  ~  math_c        c  0.402 0.084  4.771      0    0.255
## 2   read_c  ~  math_c        a  0.725 0.058 12.502      0    0.609
## 3  science  ~  read_c        b  0.365 0.076  4.818      0    0.216
## 4  science ~~ science          50.894 4.937 10.309      0   42.729
## 5   read_c ~~  read_c          58.719 5.853 10.033      0   48.068
## 6   math_c ~~  math_c          87.329 0.000     NA     NA   87.329
## 7   direct :=       c   direct  0.402 0.084  4.768      0    0.255
## 8 indirect :=     a*b indirect  0.265 0.055  4.773      0    0.156
## 9    total := c+(a*b)    total  0.667 0.058 11.418      0    0.562
##   ci.upper
## 1    0.573
## 2    0.833
## 3    0.511
## 4   63.073
## 5   71.236
## 6   87.329
## 7    0.573
## 8    0.375
## 9    0.790
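One useful cross-check (my addition, not part of Hayes's output): with complete-case OLS, total = direct + indirect holds exactly, so the pieces of Model 4 can be recovered from three lm() fits. A sketch with simulated data (the numbers are made up, not the UCLA dataset):

```r
set.seed(2)
n <- 200
x <- rnorm(n)
m <- 0.7*x + rnorm(n)            # X -> M
y <- 0.4*x + 0.35*m + rnorm(n)   # M -> Y plus direct X -> Y

a      <- coef(lm(m ~ x))["x"]   # a path (X -> M)
fit_y  <- lm(y ~ x + m)
b      <- coef(fit_y)["m"]       # b path (M -> Y)
direct <- coef(fit_y)["x"]       # c' (direct effect)
total  <- coef(lm(y ~ x))["x"]   # c (total effect)

indirect <- a * b
all.equal(unname(total), unname(direct + indirect))  # TRUE: c = c' + a*b
```
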

Hayes's PROCESS Model 7: moderated mediation with a continuous moderator


Here's how you do it in SPSS

PROCESS vars = science read math write
/ x = math
/ m = read
/ w = write
/ y = science
/ model = 7
/ center = 1
/ total = 1
/ boot = 1000.

Run MATRIX procedure:




*************** PROCESS Procedure for SPSS Release 2.16 ******************



          Written by Andrew F. Hayes, Ph.D.       www.afhayes.com

    Documentation available in Hayes (2013). www.guilford.com/p/hayes3



**************************************************************************

Model = 7

    Y = science

    X = math

    M = read

    W = write



Sample size

        200



**************************************************************************

Outcome: read



Model Summary

          R       R-sq        MSE          F        df1        df2          p

      .7048      .4968    53.7100    64.4961     3.0000   196.0000      .0000



Model

              coeff         se          t          p       LLCI       ULCI

constant    51.9853      .6362    81.7178      .0000    50.7307    53.2399

math          .5075      .0728     6.9667      .0000      .3638      .6511

write         .3403      .0720     4.7292      .0000      .1984      .4823

int_1         .0045      .0068      .6633      .5079     -.0089      .0178



Product terms key:



 int_1    math        X     write



**************************************************************************

Outcome: science



Model Summary

          R       R-sq        MSE          F        df1        df2          p

      .6915      .4782    51.6688    90.2743     2.0000   197.0000      .0000



Model

              coeff         se          t          p       LLCI       ULCI

constant    32.7641     3.5015     9.3572      .0000    25.8589    39.6693

read          .3654      .0663     5.5091      .0000      .2346      .4962

math          .4017      .0726     5.5339      .0000      .2586      .5449



******************** DIRECT AND INDIRECT EFFECTS *************************



Direct effect of X on Y

     Effect         SE          t          p       LLCI       ULCI

      .4017      .0726     5.5339      .0000      .2586      .5449



Conditional indirect effect(s) of X on Y at values of the moderator(s):



Mediator

          write     Effect    Boot SE   BootLLCI   BootULCI

read    -9.4786      .1699      .0525      .0843      .2993

read      .0000      .1854      .0435      .1099      .2847

read     9.4786      .2010      .0441      .1231      .3046




Values for quantitative moderators are the mean and plus/minus one SD from mean.

Values for dichotomous moderators are the two values of the moderator.



******************** INDEX OF MODERATED MEDIATION ************************



Mediator

          Index   SE(Boot)   BootLLCI   BootULCI

read      .0016      .0023     -.0034      .0059



******************** ANALYSIS NOTES AND WARNINGS *************************



Number of bootstrap samples for bias corrected bootstrap confidence intervals:

     1000



Level of confidence for all confidence intervals in output:

    95.00



NOTE: The following variables were mean centered prior to analysis:

 math     write



------ END MATRIX -----

Here's how you do it in R

Note: cdash = direct effect (c')

Reproducing Hayes's PROCESS Model 7 results for moderated mediation

# create interaction term between X (math) and W (write)
egdata$math_c.write_c <- with(egdata, math_c*write_c)

hayes7 <- ' # regressions
              read_c ~ a1*math_c
              science ~ b1*read_c
              read_c ~ a2*write_c
              read_c ~ a3*math_c.write_c
              science ~ cdash*math_c

            # mean of centered write (for use in simple slopes)
              write_c ~ write.mean*1

            # variance of centered write (for use in simple slopes)
              write_c ~~ write.var*write_c

            # indirect effects conditional on moderator (a1 + a3*a2.value)*b1
              indirect.SDbelow := a1*b1 + a3*-sqrt(write.var)*b1
              indirect.mean := a1*b1 + a3*write.mean*b1
              indirect.SDabove := a1*b1 + a3*sqrt(write.var)*b1'

# fit model
sem3 <- sem(model = hayes7,
            data = egdata,
            se = "bootstrap",
            bootstrap = 1000)

# fit measures
summary(sem3,
        fit.measures = TRUE,
        standardize = TRUE,
        rsquare = TRUE)
## lavaan (0.5-20) converged normally after  26 iterations
## 
##   Number of observations                           200
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic              119.179
##   Degrees of freedom                                 4
##   P-value (Chi-square)                           0.000
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              386.820
##   Degrees of freedom                                 9
##   P-value                                        0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.695
##   Tucker-Lewis Index (TLI)                       0.314
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3978.754
##   Loglikelihood unrestricted model (H1)      -3919.164
## 
##   Number of free parameters                         11
##   Akaike (AIC)                                7979.508
##   Bayesian (BIC)                              8015.789
##   Sample-size adjusted Bayesian (BIC)         7980.940
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.379
##   90 Percent Confidence Interval          0.323  0.440
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.199
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                            Bootstrap
##   Number of requested bootstrap draws             1000
##   Number of successful bootstrap draws            1000
## 
## Regressions:
##                    Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
##   read_c ~                                                              
##     math_c    (a1)    0.507    0.073    6.929    0.000    0.507    0.511
##   science ~                                                             
##     read_c    (b1)    0.365    0.076    4.832    0.000    0.365    0.358
##   read_c ~                                                              
##     write_c   (a2)    0.340    0.070    4.832    0.000    0.340    0.347
##     mth_c._   (a3)    0.004    0.006    0.737    0.461    0.004    0.039
##   science ~                                                             
##     math_c  (cdsh)    0.402    0.085    4.753    0.000    0.402    0.397
## 
## Intercepts:
##                    Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
##     write_c (wrt.)    0.000    0.680    0.000    1.000    0.000    0.000
##     read_c           -0.245    0.645   -0.379    0.705   -0.245   -0.026
##     science          51.850    0.508  102.017    0.000   51.850    5.477
## 
## Variances:
##                    Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
##     write_c (wrt.)   89.394    7.323   12.207    0.000   89.394    1.000
##     read_c           52.636    4.674   11.261    0.000   52.636    0.612
##     science          50.894    4.810   10.582    0.000   50.894    0.568
## 
## R-Square:
##                    Estimate
##     read_c            0.388
##     science           0.432
## 
## Defined Parameters:
##                    Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
##     indirect.SDblw    0.170    0.050    3.372    0.001    0.170    0.169
##     indirect.mean     0.185    0.041    4.488    0.000    0.185    0.183
##     indirect.SDabv    0.201    0.043    4.715    0.000    0.201    0.197
# bootstraps
parameterEstimates(sem3,
                   boot.ci.type = "bca.simple",
                   level = .95, ci = TRUE,
                   standardized = FALSE)
##                 lhs op                          rhs            label
## 1            read_c  ~                       math_c               a1
## 2           science  ~                       read_c               b1
## 3            read_c  ~                      write_c               a2
## 4            read_c  ~               math_c.write_c               a3
## 5           science  ~                       math_c            cdash
## 6           write_c ~1                                    write.mean
## 7           write_c ~~                      write_c        write.var
## 8            read_c ~~                       read_c                 
## 9           science ~~                      science                 
## 10           math_c ~~                       math_c                 
## 11           math_c ~~               math_c.write_c                 
## 12   math_c.write_c ~~               math_c.write_c                 
## 13           read_c ~1                                              
## 14          science ~1                                              
## 15           math_c ~1                                              
## 16   math_c.write_c ~1                                              
## 17 indirect.SDbelow := a1*b1+a3*-sqrt(write.var)*b1 indirect.SDbelow
## 18    indirect.mean :=       a1*b1+a3*write.mean*b1    indirect.mean
## 19 indirect.SDabove :=  a1*b1+a3*sqrt(write.var)*b1 indirect.SDabove
##         est    se       z pvalue ci.lower ci.upper
## 1     0.507 0.073   6.929  0.000    0.359    0.647
## 2     0.365 0.076   4.832  0.000    0.213    0.521
## 3     0.340 0.070   4.832  0.000    0.202    0.481
## 4     0.004 0.006   0.737  0.461   -0.008    0.016
## 5     0.402 0.085   4.753  0.000    0.232    0.578
## 6     0.000 0.680   0.000  1.000   -1.439    1.201
## 7    89.394 7.323  12.207  0.000   77.074  106.572
## 8    52.636 4.674  11.261  0.000   44.066   62.911
## 9    50.894 4.810  10.582  0.000   42.479   61.583
## 10   87.329 0.000      NA     NA   87.329   87.329
## 11   91.757 0.000      NA     NA   91.757   91.757
## 12 6358.530 0.000      NA     NA 6358.530 6358.530
## 13   -0.245 0.645  -0.379  0.705   -1.430    1.138
## 14   51.850 0.508 102.017  0.000   50.885   52.868
## 15    0.000 0.000      NA     NA    0.000    0.000
## 16   54.555 0.000      NA     NA   54.555   54.555
## 17    0.170 0.050   3.372  0.001    0.085    0.287
## 18    0.185 0.041   4.488  0.000    0.110    0.270
## 19    0.201 0.043   4.715  0.000    0.125    0.299

For now I'm too lazy to explore assumptions, diagnostics, or interpretations. Of course, whether I "ran these analyses right" depends on the data meeting those assumptions, but my goal here was simpler than that: reproduce the output of Hayes's macros. Win.

If you're interested in assumptions or diagnostics or interpretations of these models—I think you should be—then I recommend checking out Hayes's book. Also, if you haven't guessed, these models are SEM models; any good material on SEM = good material for learning when and how to run and interpret these kinds of models. The lavaan people recommend Alex Beaujean's book: Latent Variable Modeling using R: A Step-By-Step Guide.
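If you want a quick starting point before diving into those books: each piece of these SEMs is just an ordinary regression, so base R's `lm()` diagnostics apply. Here's a minimal sketch of that idea — the variables below are simulated stand-ins (coefficients loosely inspired by the output above, not the actual dataset), purely to show the mechanics.

```r
# Hypothetical stand-in data (NOT the dataset from this post) --
# just to demonstrate checking regression assumptions with base R.
set.seed(1)
n <- 200
math_c  <- rnorm(n, 0, 9.3)
write_c <- rnorm(n, 0, 9.5)
read_c  <- 0.51 * math_c + 0.34 * write_c + rnorm(n, 0, 7.3)
science <- 51.9 + 0.37 * read_c + 0.40 * math_c + rnorm(n, 0, 7.1)

# The SEM decomposes into a mediator model (a-paths) and an outcome
# model (b- and c'-paths); fit each as an ordinary regression.
m_path <- lm(read_c ~ math_c * write_c)
y_path <- lm(science ~ read_c + math_c)

# Residual checks: normality test, then eyeball constant variance.
shapiro.test(residuals(y_path))
plot(fitted(y_path), residuals(y_path))
```

The same residual plots and tests you'd run for any regression carry over; the bootstrap only changes how the standard errors are computed, not the model assumptions.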

I linked Hayes's template for his 76 models above, but here it is again. With some time and some gumption, you should be able to use the spirit of the code in this post to run every (or nearly every) model in that PDF. You can find even more models here. I also suggest visiting lavaan's webpage for really simple (and aesthetically pleasing) tutorials on how to use their package.

Happy R

Papers I cited

  1. Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173-1182.
  2. Aiken, L. S., West, S. G., & Reno, R. R. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
  3. Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40(3), 879-891.
  4. Hayes, A. F. (2015). An index and test of linear moderated mediation. Multivariate Behavioral Research, 50(1), 1-22.

Update: I forgot Hayes's index of moderated mediation (Hayes, 2015; paper 4 above). What's that?

    "The heart of the test is a quantification of the association between an indirect effect and a moderator—an "index of moderated mediation"—followed by an inference as to whether this index is different from zero. I show that if the outcome of this test results in a claim that an indirect effect is moderated, this means that any two conditional indirect effects estimated at different values of the moderator are significantly different from each other."

    I ran the model again and pasted abbreviated code and output below. Again, the highlights mark output to compare with the SPSS output above. One interpretation I'll offer here: you should not conclude that the indirect effect is significantly different at different values of the moderator. Notice that although the "simple slopes" are significantly different from zero at 1 standard deviation above and below the mean, they're not significantly different from each other. This is indexed by the aptly named index of moderated mediation (its CI captures zero), but you could also guess it by looking at the CIs for each simple slope: they capture each other's estimates even though they don't capture zero.


    hayes7 <- ' # regressions
                  read_c ~ a1*math_c
    
                ...
    
                # Index of moderated mediation
                  index.mod.med := a3*b1
    
                # indirect effects conditional on moderator (a1 + a3*a2.value)*b1
                  indirect.SDbelow := a1*b1 + a3*-sqrt(write.var)*b1
                  indirect.mean := a1*b1 + a3*write.mean*b1
                  indirect.SDabove := a1*b1 + a3*sqrt(write.var)*b1'
    
    # fit model
    sem3 <- sem(model = hayes7,
                data = egdata,
                se = "bootstrap",
                bootstrap = 1000)
    
    
    
    # bootstraps
    parameterEstimates(sem3,
                       boot.ci.type = "bca.simple",
                       level = .95, ci = TRUE,
                       standardized = FALSE)
    
    
    
    ##                 lhs op                          rhs            label
    ## 1            read_c  ~                       math_c               a1
    
    
                                       ...     
                                         
    
    ## 17    index.mod.med :=                        a3*b1    index.mod.med
    ## 18 indirect.SDbelow := a1*b1+a3*-sqrt(write.var)*b1 indirect.SDbelow
    ## 19    indirect.mean :=       a1*b1+a3*write.mean*b1    indirect.mean
    ## 20 indirect.SDabove :=  a1*b1+a3*sqrt(write.var)*b1 indirect.SDabove
    ##         est    se       z pvalue ci.lower ci.upper
    ## 1     0.507 0.074   6.816  0.000    0.358    0.650
    
                                       ...   
    ## 17    0.002 0.002   0.734  0.463   -0.003    0.006
    ## 18    0.170 0.052   3.290  0.001    0.089    0.297
    ## 19    0.185 0.042   4.384  0.000    0.119    0.295
    ## 20    0.201 0.042   4.739  0.000    0.129    0.304
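As a sanity check, the defined parameters above are just arithmetic on the path estimates, so you can recompute them by hand from the rounded coefficients printed earlier (a1 = 0.507, b1 = 0.365, a3 = 0.004, write.mean = 0, write.var = 89.394). Small gaps from the printed output are rounding in the coefficients, nothing more.

```r
# Recompute lavaan's defined parameters from the rounded estimates above
a1 <- 0.507; b1 <- 0.365; a3 <- 0.004
write.mean <- 0; write.var <- 89.394
w_sd <- sqrt(write.var)

index.mod.med    <- a3 * b1                          # ~ 0.002
indirect.SDbelow <- a1 * b1 + a3 * -w_sd * b1        # ~ 0.170
indirect.mean    <- a1 * b1 + a3 * write.mean * b1   # ~ 0.185
indirect.SDabove <- a1 * b1 + a3 *  w_sd * b1        # ~ 0.201
```

That's all the `:=` lines in the lavaan model are doing: defining functions of the free parameters, which lavaan then bootstraps along with everything else.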