Wednesday, June 1, 2016

Repeated measures contrasts in R and SPSS

 


 

This is my first post, so I'm going to demonstrate something simple: testing the effects of one within-subjects factor. My goal in this post and others is to provide simple step-by-step procedures for conducting analyses in R that you would have otherwise conducted in SPSS.

 

Here's where to find the data: the UCLA SPSS Repeated Measures tutorial.1

"The data consist of people who were randomly assigned to two different diets: low-fat and not low-fat and three different types of exercise: at rest, walking leisurely and running. Their pulse rate was measured at three different time points during their assigned exercise: at 1 minute, 15 minutes and 30 minutes."

 

I'll use the other factors in this data in future posts demonstrating more complex concepts and procedures.

Here's how you enter the data in SPSS syntax.

DATA LIST FREE / id exertype diet time1 time2 time3.
BEGIN DATA.
 1    1         1       85       85       88
 2    1         1       90       92       93
 3    1         1       97       97       94
 4    1         1       80       82       83
 5    1         1       91       92       91
 6    1         2       83       83       84
 7    1         2       87       88       90
 8    1         2       92       94       95
 9    1         2       97       99       96
10    1         2      100       97      100
11    2         1       86       86       84
12    2         1       93      103      104
13    2         1       90       92       93
14    2         1       95       96      100
15    2         1       89       96       95
16    2         2       84       86       89
17    2         2      103      109       90
18    2         2       92       96      101
19    2         2       97       98      100
20    2         2      102      104      103
21    3         1       93       98      110
22    3         1       98      104      112
23    3         1       98      105       99
24    3         1       87      132      120
25    3         1       94      110      116
26    3         2       95      126      143
27    3         2      100      126      140
28    3         2      103      124      140
29    3         2       94      135      130
30    3         2       99      111      150
END DATA.

In SPSS, compute a difference score (using the linear contrast -1, 0, 1) and run a one-sample t-test.

COMPUTE Linear = time1 * -1 + time2 * 0 + time3 * 1.
T-TEST
/ VAR Linear
/ TESTVAL 0.


One-Sample Statistics

         N    Mean      Std. Deviation   Std. Error Mean
Linear   30   11.3000   16.54075         3.01991

One-Sample Test (Test Value = 0)

         t       df   Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
Linear   3.742   29   .001              11.30000          5.1236         17.4764

You can get the same results using the GLM command:

GLM Linear
/ INTERCEPT = INCLUDE
/ PRINT = PARAMETER.
 
Tests of Between-Subjects Effects (Dependent Variable: Linear)

Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model   .000a                     0    .             .        .
Intercept         3830.700                  1    3830.700      14.001   .001
Error             7934.300                  29   273.597
Total             11765.000                 30
Corrected Total   7934.300                  29
a. R Squared = .000 (Adjusted R Squared = .000)

Parameter Estimates (Dependent Variable: Linear)

Parameter   B        Std. Error   t       Sig.   95% CI Lower   95% CI Upper
Intercept   11.300   3.020        3.742   .001   5.124          17.476

Commands used:

  • COMPUTE
  • T-TEST
  • GLM
  • INTERCEPT
  • PRINT

Here's how you do the same as above in R.

Read in the data.

Below I read from the web comma-separated data I prepared and uploaded to my Bitbucket account. Then I print a random subset of these data so that you can see most or all of the conditions in this dataset.

exerdiet <- read.csv(file = "https://bitbucket.org/nmmichalak/analysis-examples/raw/05db96c5c7e022b20c836e2729106e53ab239579/exercise_diet_example.csv", header = TRUE)

exerdiet[sample(x = 1:nrow(exerdiet),
                size = 10,
                replace = FALSE), ]
##    id exertype diet time1 time2 time3
## 6   6        1    2    83    83    84
## 24 24        3    1    87   132   120
## 25 25        3    1    94   110   116
## 4   4        1    1    80    82    83
## 13 13        2    1    90    92    93
## 16 16        2    2    84    86    89
## 30 30        3    2    99   111   150
## 23 23        3    1    98   105    99
## 27 27        3    2   100   126   140
## 1   1        1    1    85    85    88

Functions used:

  • read.csv() takes a .csv file from a folder on your computer or, generally, it takes comma-separated data and turns it into a data frame (i.e., R's version of an Excel spreadsheet).
  • sample() takes a random sample from some data you give it.
  • nrow() tells you how many rows are in a matrix or data frame.
  • I also use basic subsetting code. You can find an easy introductory tutorial for subsetting at Quick-R, but, essentially, the argument is structured like [ "rows you want", "columns you want" ] (I added spaces for clarity, but they don't matter). See the short sketch after this list.
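For example, here's what that subsetting pattern looks like on the exerdiet data loaded above (a minimal sketch of my own, not part of the UCLA tutorial):

# Rows 1 through 3, and only the id and time1 columns
exerdiet[1:3, c("id", "time1")]

# Leaving the row position empty means "all rows"
exerdiet[, "diet"]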

Convert exertype and diet variables into factor variables.

exerdiet[,c("exertype","diet")] <- lapply(exerdiet[,c("exertype","diet")],
                                          factor)

Functions used:

  • lapply() applies a function to an object you give it and spits out the results as a list.
  • factor() turns data you give it into a factor (R's version of SPSS's nominal variable type). A quick check that the conversion worked appears after this list.
  • c() combines objects you give it, a lot like Excel's CONCATENATE function but with more applications.
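If you want to verify the conversion worked, here's a quick check of my own (assuming the exerdiet data frame from above):

# Both columns should now report class "factor"
sapply(exerdiet[, c("exertype", "diet")], class)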

Run a one-sample t-test on difference scores.

t.test(x = as.matrix(
  exerdiet[,c("time1","time2","time3")]) %*% c(-1,0,1)
)
## 
##  One Sample t-test
## 
## data:  as.matrix(exerdiet[, c("time1", "time2", "time3")]) %*% c(-1,     0, 1)
## t = 3.7418, df = 29, p-value = 0.0008026
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   5.123581 17.476419
## sample estimates:
## mean of x 
##      11.3
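If you'd rather mirror the SPSS COMPUTE step exactly, this sketch (my own equivalent, not from the UCLA tutorial) computes the difference column first and gives identical results:

# The (-1, 0, 1) contrast is just time3 - time1
exerdiet$Linear <- exerdiet$time3 - exerdiet$time1
t.test(x = exerdiet$Linear)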

You can get the same results using the lm() function:

summary(
  lm(formula = as.matrix(
  exerdiet[,c("time1","time2","time3")]) %*% c(-1,0,1) ~ 1
)
)
## 
## Call:
## lm(formula = as.matrix(exerdiet[, c("time1", "time2", "time3")]) %*% 
##     c(-1, 0, 1) ~ 1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -24.30 -10.30  -8.30   4.95  39.70 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    11.30       3.02   3.742 0.000803 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.54 on 29 degrees of freedom

  

Instead of computing difference scores like I did in SPSS (which is a totally valid way to do it), above I multiply a matrix of the three time columns by the standard linear contrast, -1, 0, 1. Put simply, each number in column 1 is multiplied by -1, each number in column 2 is multiplied by 0, and each number in column 3 is multiplied by 1; the three products in each row are then summed. For example, the first row works out to 85 × -1 + 85 × 0 + 88 × 1 = 3. The final result is one column of difference scores. Here's a sample of what this looks like.


head(
  as.matrix(
  exerdiet[,c("time1","time2","time3")])
)
##      time1 time2 time3
## [1,]    85    85    88
## [2,]    90    92    93
## [3,]    97    97    94
## [4,]    80    82    83
## [5,]    91    92    91
## [6,]    83    83    84
head(
  as.matrix(
  exerdiet[,c("time1","time2","time3")]) %*% c(-1, 0, 1)
)
##      [,1]
## [1,]    3
## [2,]    3
## [3,]   -3
## [4,]    3
## [5,]    0
## [6,]    1

Functions used:

  • t.test() takes data you give it and outputs a t-test. You can tell it things like whether the data are paired or whether variances aren't equal (see the paired-test sketch after this list).
  • as.matrix() turns data into matrices, which are like bare-bones data arranged in rows and columns. These are nice because you can use them for matrix algebra.
  • %*% multiplies matrices if they're "conformable", which is a fancy way of saying their dimensions line up so that the rules of matrix algebra let you multiply them.
  • lm() is the basic regression function in R.
  • summary() summarizes objects in R. It's especially useful for summarizing regression models (i.e., giving you a table of the results of your regression).
  • head() prints out a sample of a data frame (i.e., R's version of an Excel spreadsheet). It defaults to the first 6 rows.
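As an aside on the paired option mentioned above: because the (-1, 0, 1) contrast reduces to time3 - time1, a paired t-test of those two columns reproduces the one-sample result. Here's a small sketch of my own:

# Paired t-test of time3 vs. time1 = one-sample t-test of their difference
t.test(x = exerdiet$time3, y = exerdiet$time1, paired = TRUE)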

 

You'll notice that I didn't compute omnibus tests. Why? Because performing omnibus tests requires "pooling" error terms, and doing so with repeated measures data requires pretty restrictive assumptions about the homogeneity of difference score variances. Put simply, if you computed the difference score between every pair of levels in your within-subjects factor, assuming "homogeneity of treatment difference variances" would mean assuming all those variances are the same (i.e., homogeneous). When they're not and you go ahead and run an omnibus test anyway, you run the risk of inflating false-positive rates to 10% or even 15% (instead of the standard 5%). If, instead, you compute contrasts like in the above analyses, you can ignore this assumption because contrasts compare only two means, and the variance of only one difference score can't be heterogeneous with itself.2
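To make that concrete, here's a quick check of my own (using the exerdiet data from above) of the variance of each pairwise difference score; if the homogeneity assumption held, these three numbers would be roughly equal:

# Variance of each pairwise difference score across the three time points
with(exerdiet, c(var(time2 - time1),
                 var(time3 - time2),
                 var(time3 - time1)))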

 

There are, of course, procedures for estimating how much your data deviate from homogeneity and adjusting the degrees of freedom associated with the omnibus F test (e.g., the Greenhouse-Geisser and Huynh-Feldt corrections). I may rant about this topic in more detail in later posts.
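In the meantime, if you want to poke at this in base R, here's a sketch of my own using Mauchly's test of sphericity (assuming the stats package's mauchly.test() method for multivariate lm fits):

# Fit an intercept-only multivariate model to the three time points,
# then test sphericity of the within-subject contrasts
fit <- lm(cbind(time1, time2, time3) ~ 1, data = exerdiet)
mauchly.test(fit, X = ~1)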

Happy R,

Nick

Footnotes

  1. By the way, UCLA's IDRE is an excellent source for in-depth statistics tutorials.
  2. See Chapters 11-13 of Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective. New York, NY: Psychology Press. Richard Gonzalez of the University of Michigan summarizes this way of thinking in his Advanced topics in ANOVA lecture notes here.
