I'd say most SPSS-psychologists who try to learn R can get the hang of reading in data and running some basic analyses.

Plots though. Unlike t-tests and correlations, there's no straightforward functions for plotting "psychology plots" in R.

What are psychology plots? Mean plots with errorbars. And moderation graphs that hide the points.

That's what I'm doing in this post, but I'm also going to sneak in some boxplots because they're damn useful (and underrated) in psychology.

No SPSS in this post. Maybe later.

First, here's how you fake data.

set.seed(1)
faked <- data.frame(factor1 = rep(x = LETTERS[1:2],
                                  each = 50),
                    factor2 = rep(x = LETTERS[3:4],
                                  each = 25,
                                  times = 4),
                    outcome = c(rnorm(n = 25, mean = 4, sd = 1),
                                rnorm(n = 25, mean = 2, sd = 1),
                                rnorm(n = 25, mean = 4, sd = 1),
                                rnorm(n = 25, mean = 5, sd = 1))
)

Now here's a trusty boxplot for Factor 1.

boxplot(outcome ~ factor1, data = faked,
        ylab = "Outcome",
        xlab = "Factor 1")

Now Factor 1's means (± 1 standard error).

library(ggplot2)
ggplot(data = faked, aes(x = factor1, y = outcome, fill = factor1)) +
  stat_summary(fun.data = "mean_se", geom = "bar") +
  stat_summary(fun.data = "mean_se", geom = "errorbar", width = 0.1) +
  labs(x = "Factor 1", y = "Outcome", fill = "Factor 1")

Boxplots for Factor 2.

boxplot(outcome ~ factor2, data = faked,
        ylab = "Outcome",
        xlab = "Factor 2")

Now mean plots for Factor 2 (again, ± 1 standard error).

ggplot(data = faked, aes(x = factor2, y = outcome, fill = factor2)) +
  stat_summary(fun.data = "mean_se", geom = "bar") +
  stat_summary(fun.data = "mean_se", geom = "errorbar", width = 0.1) +
  labs(x = "Factor 2", y = "Outcome", fill = "Factor 2")

Now things get a little fancier: interaction boxplots (Factor 1 x Factor 2).

boxplot(outcome ~ interaction(factor1, factor2, sep = " x "), data = faked,
        ylab = "Outcome",
        xlab = "Group")

Even fancier now: Mean bars for Factor 1 filled by levels of Factor 2.

ggplot(data = faked, aes(x = factor1, y = outcome, fill = factor2)) +
  stat_summary(fun.data = "mean_se", geom = "bar", position = position_dodge(1)) +
  stat_summary(fun.data = "mean_se", geom = "errorbar", position = position_dodge(1), width = 0.1) +
  labs(x = "Factor 1", y = "Outcome", fill = "Factor 2")

After this, the code for faking data gets really involved because I need to generate correlated data for within-subjects factors.

library(MASS)
set.seed(1)

# group A, .6 correlation between times, mean of time1 = 4 and time2 = 3
A <- mvrnorm(n = 100,
                  mu = c(4, 3),
                  Sigma = matrix(data = c(1,.5,
                                          .5,1),
                                 nrow = 2,
                                 ncol = 2,
                                 byrow = TRUE))

# group A, .6 correlation between times, mean of time1 = 4 and time2 = 5
B <- mvrnorm(n = 100,
                  mu = c(4, 5),
                  Sigma = matrix(data = c(1,.5,
                                          .5,1),
                                 nrow = 2,
                                 ncol = 2,
                                 byrow = TRUE))

time <- data.frame(rbind(A, B))
names(time) <- c("time1","time2")

Ok the data are done. Now I have to transorm the data to remove between-subject variability (thank you cognitive science and more).

# average value
outcome.avg <- mean(x = as.matrix(x = time))

# subject average
time$subj.avg <- rowMeans(time)

# remove within-subject variability from time 1
time$time1.new <- with(time, time1 - subj.avg + outcome.avg)

# remove within-subject variability from time 2
time$time2.new <- with(time, time2 - subj.avg + outcome.avg)

# long format
faked2 <- with(time, data.frame(id = c(1:200,1:200),
                                  factor1 = rep(LETTERS[1:2],
                                                each = 100,
                                                times = 2),
                                  time = rep(c("time 1","time 2"),
                                               each = 200),
                                  outcome = c(time1.new, time2.new)))

# now plot
ggplot(data = faked2,
       aes(x = time, y = outcome, fill = time)) +
  stat_summary(fun.data = "mean_se", geom = "bar") +
  stat_summary(fun.data = "mean_se", geom = "errorbar", width = 0.1) +
  labs(x = "time", y = "Outcome", fill = "time")

Notice the error bars are wicked small. That's because each "subject" serves as his or her own baseline. If you look back at my nasty code that made these data, I set the within-subject correlations to r = .5.

Boxplots for Time 1 and Time 2--why are their variances so different? Funny thing we can't see this in the mean plots above, isn't it?

boxplot(outcome ~ time, data = faked2,
        ylab = "Outcome",
        xlab = "Time",
        names = c("Time 1", "Time 2"))

Oh, that's why: there's an interaction big enough to drive a truck through it.

ggplot(data = faked2,
       aes(x = time, y = outcome, fill = factor1)) +
  stat_summary(fun.data = "mean_se", geom = "bar", position = position_dodge(1)) +
  stat_summary(fun.data = "mean_se", geom = "errorbar", position = position_dodge(1), width = 0.1) +
  labs(x = "time", y = "Outcome", fill = "Factor 1")

Now those are some damn fine looking boxplots.

boxplot(outcome ~ interaction(time, factor1, sep = " x "), data = faked2,
        ylab = "Outcome",
        xlab = "Time")

Because these data aren't real, I can re-interpret this fake data and pretend I assigned people to conditions and measured two variables. Framed this way, the correlation between Time 1 and Time 2 doesn't seem to depend on Factor 1 because the slopes are parallel.

The code below could work for a moderation plot (Surprise!).

See the points? Always plot your points :-)

# add groups to the time dataframe from before the transformations
time$factor1 <- rep(x = LETTERS[1:2], each = 100)

ggplot(time, aes(x = time1, time2, color = factor1)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "time 1", y = "time 2", color = "Factor 1")

Want to change colors? Here's my new favorite theme, five38 (maybe you're familiar?).

library(devtools)
devtools::install_github('bart6114/artyfarty', force = TRUE)

## Downloading GitHub repo bart6114/artyfarty@master
## from URL https://api.github.com/repos/bart6114/artyfarty/zipball/master

library(artyfarty)
ggplot(data = faked2,
       aes(x = time, y = outcome, fill = factor1)) +
  stat_summary(fun.data = "mean_se", geom = "bar", position = position_dodge(1)) +
  stat_summary(fun.data = "mean_se", geom = "errorbar", position = position_dodge(1), width = 0.1) +
  labs(x = "time", y = "Outcome", fill = "Factor 1") +
  theme_five38() +
  scale_fill_manual(values = pal("five38"))

Happy R.

Novum R-ganum

Thursday, October 27, 2016

Basic Plotting in R and SPSS for Psychologists