Wednesday, June 1, 2016

Repeated measures contrasts in R and SPSS

 


 

This is my first post, so I'm going to demonstrate something simple: testing the effects of one within-subjects factor. My goal in this post and others is to provide simple step-by-step procedures for conducting analyses in R that you would have otherwise conducted in SPSS.

 

Here's where to find the data: the UCLA SPSS Repeated Measures tutorial.1

"The data consist of people who were randomly assigned to two different diets: low-fat and not low-fat and three different types of exercise: at rest, walking leisurely and running. Their pulse rate was measured at three different time points during their assigned exercise: at 1 minute, 15 minutes and 30 minutes."

 

I'll use the other factors in this data in future posts demonstrating more complex concepts and procedures.

Here's how you enter the data in SPSS syntax.

DATA LIST FREE / id exertype diet time1 time2 time3.
BEGIN DATA.
 1    1         1       85       85       88
 2    1         1       90       92       93
 3    1         1       97       97       94
 4    1         1       80       82       83
 5    1         1       91       92       91
 6    1         2       83       83       84
 7    1         2       87       88       90
 8    1         2       92       94       95
 9    1         2       97       99       96
10    1         2      100       97      100
11    2         1       86       86       84
12    2         1       93      103      104
13    2         1       90       92       93
14    2         1       95       96      100
15    2         1       89       96       95
16    2         2       84       86       89
17    2         2      103      109       90
18    2         2       92       96      101
19    2         2       97       98      100
20    2         2      102      104      103
21    3         1       93       98      110
22    3         1       98      104      112
23    3         1       98      105       99
24    3         1       87      132      120
25    3         1       94      110      116
26    3         2       95      126      143
27    3         2      100      126      140
28    3         2      103      124      140
29    3         2       94      135      130
30    3         2       99      111      150
END DATA.

In SPSS, compute a difference score (using the linear contrast -1, 0, 1) and run a one-sample t-test.

COMPUTE Linear = time1 * -1 + time2 * 0 + time3 * 1.
T-TEST
/ VAR Linear
/ TESTVAL 0.


One-Sample Statistics

         N    Mean      Std. Deviation   Std. Error Mean
Linear   30   11.3000   16.54075         3.01991

One-Sample Test (Test Value = 0)

         t       df   Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
Linear   3.742   29   .001              11.30000          5.1236         17.4764

You can get the same results using the GLM command:

GLM Linear
/ INTERCEPT = INCLUDE
/ PRINT = PARAMETER.
 
Tests of Between-Subjects Effects (Dependent Variable: Linear)

Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model   .000a                     0    .             .        .
Intercept         3830.700                  1    3830.700      14.001   .001
Error             7934.300                  29   273.597
Total             11765.000                 30
Corrected Total   7934.300                  29
a. R Squared = .000 (Adjusted R Squared = .000)

Parameter Estimates (Dependent Variable: Linear)

Parameter   B        Std. Error   t       Sig.   95% CI Lower   95% CI Upper
Intercept   11.300   3.020        3.742   .001   5.124          17.476

Commands used:

  • COMPUTE
  • T-TEST
  • GLM
  • INTERCEPT
  • PRINT

Here's how you do the same as above in R.

Read in the data.

Below I read from the web comma-separated data I prepared and uploaded to my Bitbucket account. Then I print a random subset of these data so that you can see most or all of the conditions in this dataset.

exerdiet <- read.csv(file = "https://bitbucket.org/nmmichalak/analysis-examples/raw/05db96c5c7e022b20c836e2729106e53ab239579/exercise_diet_example.csv", header = TRUE)

exerdiet[sample(x = 1:nrow(exerdiet),
                size = 10,
                replace = FALSE), ]
##    id exertype diet time1 time2 time3
## 6   6        1    2    83    83    84
## 24 24        3    1    87   132   120
## 25 25        3    1    94   110   116
## 4   4        1    1    80    82    83
## 13 13        2    1    90    92    93
## 16 16        2    2    84    86    89
## 30 30        3    2    99   111   150
## 23 23        3    1    98   105    99
## 27 27        3    2   100   126   140
## 1   1        1    1    85    85    88

Functions used:

  • read.csv() takes a .csv file from a folder on your computer or, generally, it takes comma-separated data and turns it into a data frame (i.e., R's version of an Excel spreadsheet).
  • sample() takes a random sample from some data you give it.
  • nrow() tells you how many rows are in a matrix or data frame.
  • I also use basic subsetting code. You can find an easy introductory tutorial for subsetting at Quick-R, but, essentially, the argument is structured like [ "rows you want", "columns you want" ] (I added spaces for clarity, but they don't matter). See the short sketch after this list.
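For example, here's what that subsetting pattern looks like on the exerdiet data loaded above (a minimal sketch of my own, not part of the UCLA tutorial):

# Rows 1 through 3, and only the id and time1 columns
exerdiet[1:3, c("id", "time1")]

# Leaving the row position empty means "all rows"
exerdiet[, "diet"]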

Convert exertype and diet variables into factor variables.

exerdiet[,c("exertype","diet")] <- lapply(exerdiet[,c("exertype","diet")],
                                          factor)

Functions used:

  • lapply() applies a function to an object you give it and spits out the results as a list.
  • factor() turns data you give it into a factor (R's version of SPSS's nominal variable type). A quick check that the conversion worked appears after this list.
  • c() combines objects you give it, a lot like Excel's CONCATENATE function but with more applications.
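If you want to verify the conversion worked, here's a quick check of my own (assuming the exerdiet data frame from above):

# Both columns should now report class "factor"
sapply(exerdiet[, c("exertype", "diet")], class)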

Run a one-sample t-test on difference scores.

t.test(x = as.matrix(
  exerdiet[,c("time1","time2","time3")]) %*% c(-1,0,1)
)
## 
##  One Sample t-test
## 
## data:  as.matrix(exerdiet[, c("time1", "time2", "time3")]) %*% c(-1,     0, 1)
## t = 3.7418, df = 29, p-value = 0.0008026
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   5.123581 17.476419
## sample estimates:
## mean of x 
##      11.3
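If you'd rather mirror the SPSS COMPUTE step exactly, this sketch (my own equivalent, not from the UCLA tutorial) computes the difference column first and gives identical results:

# The (-1, 0, 1) contrast is just time3 - time1
exerdiet$Linear <- exerdiet$time3 - exerdiet$time1
t.test(x = exerdiet$Linear)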

You can get the same results using the lm() function:

summary(
  lm(formula = as.matrix(
  exerdiet[,c("time1","time2","time3")]) %*% c(-1,0,1) ~ 1
)
)
## 
## Call:
## lm(formula = as.matrix(exerdiet[, c("time1", "time2", "time3")]) %*% 
##     c(-1, 0, 1) ~ 1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -24.30 -10.30  -8.30   4.95  39.70 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    11.30       3.02   3.742 0.000803 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.54 on 29 degrees of freedom

  

Instead of computing difference scores like I did in SPSS (which is a totally valid way to do it), above I multiply a matrix of the three time columns by the standard linear contrast, -1, 0, 1. Put simply, each number in column 1 is multiplied by -1, each number in column 2 is multiplied by 0, and each number in column 3 is multiplied by 1; the three products in each row are then summed. For example, the first row works out to 85 × -1 + 85 × 0 + 88 × 1 = 3. The final result is one column of difference scores. Here's a sample of what this looks like.


head(
  as.matrix(
  exerdiet[,c("time1","time2","time3")])
)
##      time1 time2 time3
## [1,]    85    85    88
## [2,]    90    92    93
## [3,]    97    97    94
## [4,]    80    82    83
## [5,]    91    92    91
## [6,]    83    83    84
head(
  as.matrix(
  exerdiet[,c("time1","time2","time3")]) %*% c(-1, 0, 1)
)
##      [,1]
## [1,]    3
## [2,]    3
## [3,]   -3
## [4,]    3
## [5,]    0
## [6,]    1

Functions used:

  • t.test() takes data you give it and outputs a t-test. You can tell it things like whether the data are paired or whether variances aren't equal (see the paired-test sketch after this list).
  • as.matrix() turns data into matrices, which are like bare-bones data arranged in rows and columns. These are nice because you can use them for matrix algebra.
  • %*% multiplies matrices if they're "conformable", which is a fancy way of saying their dimensions line up so that the rules of matrix algebra let you multiply them.
  • lm() is the basic regression function in R.
  • summary() summarizes objects in R. It's especially useful for summarizing regression models (i.e., giving you a table of the results of your regression).
  • head() prints out a sample of a data frame (i.e., R's version of an Excel spreadsheet). It defaults to the first 6 rows.
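As an aside on the paired option mentioned above: because the (-1, 0, 1) contrast reduces to time3 - time1, a paired t-test of those two columns reproduces the one-sample result. Here's a small sketch of my own:

# Paired t-test of time3 vs. time1 = one-sample t-test of their difference
t.test(x = exerdiet$time3, y = exerdiet$time1, paired = TRUE)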

 

You'll notice that I didn't compute omnibus tests. Why? Because performing omnibus tests requires "pooling" error terms, and doing so with repeated measures data requires pretty restrictive assumptions about the homogeneity of difference score variances. Put simply, if you computed the difference score between every pair of levels in your within-subjects factor, assuming "homogeneity of treatment difference variances" would mean assuming all those variances are the same (i.e., homogeneous). When they're not and you go ahead and run an omnibus test anyway, you run the risk of inflating false-positive rates to 10% or even 15% (instead of the standard 5%). If, instead, you compute contrasts like in the above analyses, you can ignore this assumption because contrasts compare only two means, and the variance of only one difference score can't be heterogeneous with itself.2
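To make that concrete, here's a quick check of my own (using the exerdiet data from above) of the variance of each pairwise difference score; if the homogeneity assumption held, these three numbers would be roughly equal:

# Variance of each pairwise difference score across the three time points
with(exerdiet, c(var(time2 - time1),
                 var(time3 - time2),
                 var(time3 - time1)))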

 

There are, of course, procedures for estimating how much your data deviate from homogeneity and adjusting the degrees of freedom associated with the omnibus F test (e.g., the Greenhouse-Geisser and Huynh-Feldt corrections). I may rant about this topic in more detail in later posts.
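In the meantime, if you want to poke at this in base R, here's a sketch of my own using Mauchly's test of sphericity (assuming the stats package's mauchly.test() method for multivariate lm fits):

# Fit an intercept-only multivariate model to the three time points,
# then test sphericity of the within-subject contrasts
fit <- lm(cbind(time1, time2, time3) ~ 1, data = exerdiet)
mauchly.test(fit, X = ~1)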

Happy R,

Nick

Footnotes

  1. By the way, UCLA's IDRE is an excellent source for in-depth statistics tutorials.
  2. See Chapters 11-13 of Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective. New York, NY: Psychology Press. Richard Gonzalez of the University of Michigan summarizes this way of thinking in his Advanced topics in ANOVA lecture notes here.
