This is my first post so I'm going to demonstrate something simple: testing the effects of one within-subjects factor. My goal in this post and others is to provide simple step-by-step procedures for conducting analyses in R that you would have otherwise conducted in SPSS.
"The data consist of people who were randomly assigned to two different diets: low-fat and not low-fat and three different types of exercise: at rest, walking leisurely and running. Their pulse rate was measured at three different time points during their assigned exercise: at 1 minute, 15 minutes and 30 minutes."
I'll use the other factors in this data in future posts demonstrating more complex concepts and procedures.
In SPSS, you'd compute a difference score (linear contrast weights: -1, 0, 1) and run a one-sample t-test on it.
COMPUTE Linear = time1 * -1 + time2 * 0 + time3 * 1.
T-TEST
/ VAR Linear
/ TESTVAL 0.
One-Sample Statistics

|        | N  | Mean    | Std. Deviation | Std. Error Mean |
|--------|----|---------|----------------|-----------------|
| Linear | 30 | 11.3000 | 16.54075       | 3.01991         |
One-Sample Test (Test Value = 0)

|        | t     | df | Sig. (2-tailed) | Mean Difference | 95% CI Lower | 95% CI Upper |
|--------|-------|----|-----------------|-----------------|--------------|--------------|
| Linear | 3.742 | 29 | .001            | 11.30000        | 5.1236       | 17.4764      |
You can get the same results using the GLM command:
GLM Linear
/ INTERCEPT = INCLUDE
/ PRINT = PARAMETER.
Tests of Between-Subjects Effects (Dependent Variable: Linear)

| Source          | Type III Sum of Squares | df | Mean Square | F      | Sig. |
|-----------------|-------------------------|----|-------------|--------|------|
| Corrected Model | .000                    | 0  | .           | .      | .    |
| Intercept       | 3830.700                | 1  | 3830.700    | 14.001 | .001 |
| Error           | 7934.300                | 29 | 273.597     |        |      |
| Total           | 11765.000               | 30 |             |        |      |
| Corrected Total | 7934.300                | 29 |             |        |      |
Parameter Estimates (Dependent Variable: Linear)

| Parameter | B      | Std. Error | t     | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|-----------|--------|------------|-------|------|--------------------|--------------------|
| Intercept | 11.300 | 3.020      | 3.742 | .001 | 5.124              | 17.476             |
- COMPUTE creates a new variable from a transformation of existing variables.
- T-TEST runs t-tests; the /TESTVAL subcommand sets the value to test the mean against (here, 0).
- GLM fits a general linear model.
- INTERCEPT tells GLM whether to include an intercept in the model.
- PRINT requests optional output; PARAMETER asks for the parameter estimates table.
Below I read comma-separated data from the web, which I prepared and uploaded to my Bitbucket account. Then I print a random subset of this data so that you can see most or all of the conditions in this dataset.
exerdiet <- read.csv(file = "https://bitbucket.org/nmmichalak/analysis-examples/raw/05db96c5c7e022b20c836e2729106e53ab239579/exercise_diet_example.csv", header = TRUE)
exerdiet[sample(x = 1:nrow(exerdiet),
size = 10,
replace = FALSE), ]
## id exertype diet time1 time2 time3
## 6 6 1 2 83 83 84
## 24 24 3 1 87 132 120
## 25 25 3 1 94 110 116
## 4 4 1 1 80 82 83
## 13 13 2 1 90 92 93
## 16 16 2 2 84 86 89
## 30 30 3 2 99 111 150
## 23 23 3 1 98 105 99
## 27 27 3 2 100 126 140
## 1 1 1 1 85 85 88
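A side note of my own: sample() draws a different random subset every time you run it, so your 10 rows will differ from mine. If you want the same rows on every run, set the random seed first (the seed value here is arbitrary):

set.seed(4321) # run this before the sample() call above to make it reproducible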
read.csv()
takes a .csv file from a folder on your computer (or, as here, from a URL) and turns the comma-separated data into a data frame (i.e., R's version of an Excel spreadsheet).
sample()
takes a random sample from some data you give it.
nrow()
tells you how many rows are in a matrix or data frame.
- I also use basic subsetting code. You can find an easy introductory tutorial for subsetting at Quick-R, but, essentially, the argument is structured like [ "rows you want", "columns you want" ] (I added spaces for clarity, but they don't matter); there's a quick sketch just below.
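Here's that bracket notation in action, a quick sketch of my own using the exerdiet data loaded above: grab the first 5 rows and 2 of the time columns.

exerdiet[1:5, c("time1", "time3")]

##   time1 time3
## 1    85    88
## 2    90    93
## 3    97    94
## 4    80    83
## 5    91    91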
Convert exertype and diet variables into factor variables.
exerdiet[,c("exertype","diet")] <- lapply(exerdiet[,c("exertype","diet")],
factor)
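To check that the conversion worked, you can look at each column's class (a quick sketch of mine; str() gives similar information):

sapply(exerdiet, class)
# exertype and diet should now be "factor"; id and the time columns stay "integer"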
lapply()
applies a function to an object you give it and spits out the results as a list.
factor()
turns data you give it into a factor (R's version of SPSS's nominal variable type).
c()
combines objects you give it, a lot like Excel's CONCATENATE function but with more applications.
t.test(x = as.matrix(
exerdiet[,c("time1","time2","time3")]) %*% c(-1,0,1)
)
##
## One Sample t-test
##
## data: as.matrix(exerdiet[, c("time1", "time2", "time3")]) %*% c(-1, 0, 1)
## t = 3.7418, df = 29, p-value = 0.0008026
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 5.123581 17.476419
## sample estimates:
## mean of x
## 11.3
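A quick aside of my own (not part of the SPSS workflow above): because the -1, 0, 1 contrast reduces to time3 - time1, a paired t-test on those two columns is the exact same test.

t.test(x = exerdiet$time3,
       y = exerdiet$time1,
       paired = TRUE)
# same t = 3.7418, df = 29, and confidence interval as the one-sample test above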
summary(
lm(formula = as.matrix(
exerdiet[,c("time1","time2","time3")]) %*% c(-1,0,1) ~ 1
)
)
##
## Call:
## lm(formula = as.matrix(exerdiet[, c("time1", "time2", "time3")]) %*%
## c(-1, 0, 1) ~ 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.30 -10.30 -8.30 4.95 39.70
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.30 3.02 3.742 0.000803 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.54 on 29 degrees of freedom
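One thing summary() doesn't print is the confidence interval that SPSS's parameter estimates table gave you. If you want it, confint() pulls it from the fitted model; a quick sketch, where linfit is just my own label for the saved fit:

linfit <- lm(formula = as.matrix(
  exerdiet[,c("time1","time2","time3")]) %*% c(-1,0,1) ~ 1
)
confint(object = linfit) # 95% interval for the intercept: 5.124 to 17.476, matching SPSS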
Instead of computing difference scores like I did in SPSS (which is a totally valid way to do it), above I multiply a matrix of the three time columns by the standard linear contrast, -1, 0, 1. Put simply, each number in column 1 is multiplied by -1, each number in column 2 is multiplied by 0, and each number in column 3 is multiplied by 1; then the three products in each row are added together. The final result is a single column of difference scores. Here's a sample of what this looks like.
head(
  as.matrix(
    exerdiet[,c("time1","time2","time3")])
)
## time1 time2 time3
## [1,] 85 85 88
## [2,] 90 92 93
## [3,] 97 97 94
## [4,] 80 82 83
## [5,] 91 92 91
## [6,] 83 83 84
head(
as.matrix(
exerdiet[,c("time1","time2","time3")]) %*% c(-1, 0, 1)
)
## [,1]
## [1,] 3
## [2,] 3
## [3,] -3
## [4,] 3
## [5,] 0
## [6,] 1
t.test()
takes data you give it and outputs a t-test. You can tell it things like whether the data are paired or whether the variances are unequal.
as.matrix()
turns data into matrices, which are like bare-bones data arranged in rows and columns. These are nice because you can use them for matrix algebra.
%*%
multiplies matrices if they're "conformable", which is a fancy way of saying, "math rules allow you to add, subtract, multiply, divide, etc. these things." (There's a toy example just after these definitions.)
lm()
is the basic regression function in R.
summary()
summarizes objects in R. It's especially useful for summarizing regression models (i.e., giving you a table of the results of your regression).
head()
prints out a sample of a data frame (i.e., R's version of an Excel spreadsheet). It defaults to the first 6 rows.
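Here's the toy conformability example I promised above: a 2 x 3 matrix times a length-3 vector works because the 3 columns match the 3 elements, and it returns a 2 x 1 matrix, just like the 30 x 3 matrix of time columns times c(-1, 0, 1) returned 30 difference scores.

matrix(data = 1:6, nrow = 2) %*% c(-1, 0, 1)
# returns a 2 x 1 matrix; a matrix with 2 columns on the left here would
# throw a "non-conformable arguments" error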
You'll notice that I didn't compute omnibus tests. Why? Because performing omnibus tests requires "pooling" error terms, and doing so with repeated measures data requires pretty restrictive assumptions about homogeneity of difference score variances. Put simply, if you computed the difference score between every pair of levels in your within-subjects factor, assuming "homogeneity of treatment difference variances" would mean assuming all those variances are the same (i.e., homogeneous). When they're not and you go ahead and run an omnibus test anyway, you run the risk of inflating false-positive rates to 10% or even 15% (instead of the standard 5%). If, instead, you compute contrasts like in the above analyses, you can ignore this assumption because contrasts compare only two means, and the variance of only one difference score can't be heterogeneous with itself.²
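If you're curious what that assumption looks like in this dataset, here's a quick sketch of my own (an eyeball check, not a formal test): compute every pairwise difference score and compare their variances. Wildly unequal variances are the warning sign for the pooled omnibus test.

with(exerdiet, sapply(X = list(t2_t1 = time2 - time1,
                               t3_t2 = time3 - time2,
                               t3_t1 = time3 - time1),
                      FUN = var))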
There are, of course, procedures for estimating how much your data deviate from homogeneity and adjusting the degrees of freedom of the omnibus F test accordingly (e.g., the Greenhouse-Geisser and Huynh-Feldt corrections). I may rant about this topic in more detail in later posts.
- ¹ By the way, UCLA's IDRE is an excellent source for in-depth statistics tutorials.
- ² See Chapters 11-13 of Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective. New York, NY: Psychology Press. Richard Gonzalez of the University of Michigan summarizes this way of thinking in his Advanced topics in ANOVA lecture notes.