Mean difference contrasts

Input: INDEPENDENT means, variances, sample sizes, contrast weights (each row of weights should have the same number of columns as there are means, variances, etc.), contrast labels (optional), alpha (defaults to .05)

Output: Table of results, which includes estimates for each contrast under the equal variance assumption or not

sep.var.cont <- function(means, vars, ns, contrast, labels = 1:nrow(contrast),alpha = .05) {

  ihat = contrast %*% means

  # welch
  welch.denom = sqrt(contrast^2 %*% (vars/ns))
  t.welch = ihat/welch.denom

  # classic
  s.pooled = sqrt(sum((ns - 1)*vars)/(sum(ns) - length(ns)))
  classic.denom = s.pooled*sqrt(contrast^2 %*% (1/ns))
  t.classic = ihat/classic.denom

  # if dimensions of contrast are NULL, num. of contrasts = 1,
  # if not, num. of contrasts = dimensions of contrast (num. dimensions of matrix = num. of contrasts)
  num.contrast = ifelse(is.null(dim(contrast)),1,dim(contrast)[1])
  df.welch = rep(0,num.contrast)
  df.classic = rep(0,num.contrast)

  # makes rows of contrasts if contrast dimensions aren't NULL
  if (is.null(dim(contrast))) contrast = t(as.matrix(contrast))

  # calculating degrees of freedom for welch and classic
  for(i in 1:num.contrast) {
    num = (contrast[i,]^2 %*% (vars))^2
    den = sum((contrast[i,]^2 * vars)^2 / (ns-1))
    df.welch[i] <- num/den
    df.classic[i] = sum(ns) - length(ns)
    }

  # p-values
  p.welch = 2*(1- pt(abs(t.welch),df.welch))
  p.classic = 2*(1- pt(abs(t.classic),df.classic))

  # 95% confidence intervals
  lwr.95.welch = ihat - welch.denom*qt(p = 1 - (alpha/2), df = df.welch)
  upr.95.welch = ihat + welch.denom*qt(p = 1 - (alpha/2), df = df.welch)

  lwr.95.classic = ihat - classic.denom*qt(p = 1 - (alpha/2), df = df.classic)
  upr.95.classic = ihat + classic.denom*qt(p = 1 - (alpha/2), df = df.classic)

  result = data.frame(cont = rep(labels, times = 2),
                      equal.var = rep(c("assumed (classic)", "not assumed (Welch)"),each = num.contrast),
                      ihat = rep(ihat,times = 2),
                      se = c(classic.denom,welch.denom),
                      t = c(t.classic, t.welch),
                      df = c(df.classic,df.welch),
                      pval = c(p.classic,p.welch),
                      lwr.95 = c(lwr.95.welch,lwr.95.classic),
                      upr.95 = c(upr.95.welch,upr.95.classic))

  # rename confidence band columns
  names(result)[8:9] = paste0(c("lwr","upr"),
                                    substr(x = 1-alpha, start = 2, stop = 4))

  return(result)

}

Simulated data

set.seed(2)
dat1 <- data.frame(id = 1:12,
                   grp = rep(c('a','b','c','d'), each = 3),
                   dv = c(rnorm(n = 3, mean = 0, sd = 1),
                          rnorm(n = 3, mean = 1, sd = 1),
                          rnorm(n = 3, mean = 0, sd = 1),
                          rnorm(n = 3, mean = -1, sd = 1)))

Graph simulated data

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.3.2

ggplot(data = dat1, aes(x = grp, y = dv, fill = grp)) +
  stat_summary(fun.data = "mean_cl_normal", geom = "bar", color = 'black') +
  stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width = 0.1) +
  theme_minimal() +
  scale_fill_manual(values = c('red','blue','red','blue'))

Test main effects and interactions (simulated data)

sep.var.cont(means = by(data = dat1$dv, INDICES = dat1$grp, FUN = mean),
             vars = by(data = dat1$dv, INDICES = dat1$grp, FUN = var),
             ns = by(data = dat1$dv, INDICES = dat1$grp, FUN = length),
             contrast = rbind(c(1,1,-1,-1),
                              c(1,-1,1,-1),
                              c(1,-1,-1,1)),
             labels = c("main1", "main2","interact"),
             alpha = .05)

##       cont           equal.var       ihat       se          t       df
## 1    main1   assumed (classic)  0.6947419 1.090698  0.6369697 8.000000
## 2    main2   assumed (classic)  1.0487004 1.090698  0.9614943 8.000000
## 3 interact   assumed (classic) -1.7460423 1.090698 -1.6008479 8.000000
## 4    main1 not assumed (Welch)  0.6947419 1.090698  0.6369697 5.967952
## 5    main2 not assumed (Welch)  1.0487004 1.090698  0.9614943 5.967952
## 6 interact not assumed (Welch) -1.7460423 1.090698 -1.6008479 5.967952
##        pval    lwr.95    upr.95
## 1 0.5419386 -1.977579 3.3670629
## 2 0.3644602 -1.623621 3.7210214
## 3 0.1480782 -4.418363 0.9262787
## 4 0.5477843 -1.820413 3.2098971
## 5 0.3736246 -1.466455 3.5638556
## 6 0.1607937 -4.261198 0.7691129

'Real world' data

From Study 1 of Eskine, Kacinik, & Prinz (2011) A Bad Taste in the Mouth: Gustatory Disgust Influences Moral Judgment

An overall moral-judgment score was obtained for each of the remaining 54 participants (bitter condition: n = 15; sweet condition: n = 18; control condition: n = 21) by averaging his or her ratings of the six vignettes.

Results revealed a significant effect of beverage type, F(2, 51) = 7.368, p = .002, ηp2 = .224. Planned contrasts showed that participants’ moral judgments in the bitter condition (M = 78.34, SD = 10.83) were significantly harsher than judgments in the control condition (M = 61.58, SD = 16.88), t(51) = 3.117, p = .003, d = 1.09, and in the sweet condition (M = 59.58, SD = 16.70), t(51) = 3.609, p = .001, d = 1.22.

# data as vectors
ekp2011.n.s1 <- c(15,18,21)
ekp2011.mean.s1 <- c(78.34,59.58,61.58)
ekp2011.sd.s1 <- c(10.83,16.70,16.88)

# add to dataframe
ekp2011.s1 <- data.frame(cond = factor(x = c('bitter', 'sweet', 'control'),
                                       levels = c('bitter', 'sweet', 'control'),
                                       labels = c('bitter', 'sweet', 'control')),
                         n = ekp2011.n.s1,
                         mean = ekp2011.mean.s1,
                         sd = ekp2011.sd.s1)

# compute margin of error (95% confidence)
ekp2011.s1$moe <- with(data = ekp2011.s1, (sd/sqrt(n))*qt(p = 1 - .05/2, df = sum(ekp2011.n.s1) - length(ekp2011.n.s1)))

Graph 'real world' data

ggplot(data = ekp2011.s1, aes(x = cond, y = mean, fill = cond, ymin = mean - moe, ymax = mean + moe)) +
  geom_bar(stat = 'identity',color = 'black') +
  geom_errorbar(stat = 'identity', width = 0.1) +
  theme_minimal() +
  scale_fill_manual(values = c('grey','yellow','white'))

Test contrasts ('real world' data)

sep.var.cont(means = ekp2011.s1$mean,
             vars = ekp2011.s1$sd^2,
             ns = ekp2011.s1$n,
             contrast = rbind(c(1,0,-1),
                              c(0,1,-1),
                              c(1,-1,0),
                              c(2,-1,-1)),
             labels = c('bit.v.cntrl', 'swt.v.cntrl','bt.v.swt','bt.v.other'),
             alpha = .05)

##          cont           equal.var  ihat       se          t       df
## 1 bit.v.cntrl   assumed (classic) 16.76 5.203288  3.2210405 51.00000
## 2 swt.v.cntrl   assumed (classic) -2.00 4.943884 -0.4045402 51.00000
## 3    bt.v.swt   assumed (classic) 18.76 5.380925  3.4863894 51.00000
## 4  bt.v.other   assumed (classic) 35.52 9.360295  3.7947521 51.00000
## 5 bit.v.cntrl not assumed (Welch) 16.76 4.624669  3.6240430 32.08718
## 6 swt.v.cntrl not assumed (Welch) -2.00 5.390936 -0.3709931 36.81653
## 7    bt.v.swt not assumed (Welch) 18.76 4.828369  3.8853698 28.24051
## 8  bt.v.other not assumed (Welch) 35.52 7.767833  4.5727037 43.80942
##           pval     lwr.95    upr.95
## 1 2.225715e-03   7.340861 26.179139
## 2 6.875072e-01 -12.924912  8.924912
## 3 1.016338e-03   8.873327 28.646673
## 4 3.935690e-04  19.863037 51.176963
## 5 9.912286e-04   6.313964 27.206036
## 6 7.127690e-01 -11.925261  7.925261
## 7 5.646187e-04   7.957342 29.562658
## 8 3.925495e-05  16.728423 54.311577

Difference in proportion contrasts

Input: INDEPENDENT proportions, sample sizes, contrast weights (each row of weights should have the same number of columns as there are proportions and sample sizes), contrast labels (optional), alpha (defaults to .05)

Output: Table of results, where the estimates are logits

logit.test <- function(ps, ns, weights, labels = 1:ifelse(is.null(weights) == TRUE, 1, dim(weights)[1]), alpha = 0.5){

  # compute logits from proportions
  logits = log(ps/(1-ps))

  # compute their variances
  logit.vars = 1/(ns*ps*(1-ps))

  # estimate
  ihat = weights %*% logits

  # standard error
  se = sqrt(sum(logit.vars))

  # z-score test statistic
  z = ihat/se

  # two-tailed p-value
  pval = 2*(1 - pnorm(q = abs(z), mean = 0, sd = 1))

  # confidence intervals
  lower = ihat - se*qnorm(p = 1 - alpha/2, mean = 0, sd = 1)
  upper = ihat + se*qnorm(p = 1 - alpha/2, mean = 0, sd = 1)

  # table of output
  output.table = data.frame(cont = labels,
                            ihat,
                            se,
                            z,
                            pval,
                            lower,
                            upper)

# rename confidence band columns
names(output.table)[6:7] = paste0(c("lwr","upr"),
                                  substr(x = 1-alpha, start = 2, stop = 4))

  return(output.table)
}

Simulated data

set.seed(2)
dat2 <- data.frame(id = 1:12,
                  grp = rep(c('a','b','c','d'), each = 3),
                  dv = c(sample(x = c(0,1), size = 12, replace = TRUE, prob = c(.5,.5)),
                         sample(x = c(0,1), size = 12, replace = TRUE, prob = c(.25,.75)),
                         sample(x = c(0,1), size = 12, replace = TRUE, prob = c(.5,.5)),
                         sample(x = c(0,1), size = 12, replace = TRUE, prob = c(.75,.25))))

Graph simulated data

ggplot(data = dat2, aes(x = grp, y = dv, fill = grp)) +
  stat_summary(fun.data = "mean_cl_normal", geom = "bar", color = 'black') +
  stat_summary(fun.data = "mean_cl_normal", geom = "errorbar", width = 0.1) +
  theme_minimal() +
  scale_fill_manual(values = c('red','blue','red','blue'))

Test main effects and interaction (simulated data)

logit.test(ps = by(data = dat2$dv, INDICES = dat2$grp, FUN = mean),
           ns = by(data = dat2$dv, INDICES = dat2$grp, FUN = length),
           weights = rbind(c(1,1,-1,-1),
                           c(1,-1,1,-1),
                           c(1,-1,-1,1)),
           labels = c("main1","main2","interact"),
           alpha = .05)

##       cont       ihat       se          z      pval     lwr.95   upr.95
## 1    main1 -0.3566749 1.184724 -0.3010616 0.7633675 -2.6786917 1.965342
## 2    main2  1.7025639 1.184724  1.4370973 0.1506904 -0.6194529 4.024581
## 3 interact -0.3566749 1.184724 -0.3010616 0.7633675 -2.6786917 1.965342

Difference in correlation contrasts

Input: INDEPENDENT correlations, sample sizes, contrast weights (each row of weights should have the same number of columns as there are correlations and sample sizes), contrast labels (optional), null value (defaults to 0), alpha (defaults to .05)

Output: Table of results, where the estimate is the correlation difference

corr.cont <- function(r, n, weights, labels = 1:ifelse(is.null(weights) == TRUE, 1, dim(weights)[1]), null = 0, alpha = .05){
  # convert to z
  z = atanh(x = r)

  # standard error
  se.squared = ifelse(n == 0, 0, 1/(n - 3))

  # numerator z-observed
  z.obs.num = weights %*% z

  # denominator z-observed
  z.obs.denom = weights^2 %*% se.squared

  # convert null r to z
  rtoz.tran.rho = log((1 + null)/(1 - null))/2

  # test z against null z
  ztest = (z.obs.num - rtoz.tran.rho)/sqrt(z.obs.denom)

  # p-value
  pval = 2*(1 - pnorm(q = abs(ztest), mean = 0, sd = 1))

  # convert to r
    r.diff = tanh(x = z.obs.num)

  # confidence interval for ztest
    lower = z.obs.num - sqrt(z.obs.denom)*(qnorm(p = 1 - alpha/2, mean = 0, sd = 1))
    upper = z.obs.num + sqrt(z.obs.denom)*(qnorm(p = 1 - alpha/2, mean = 0, sd = 1))

  # confidence interval for ztest
    es.lwr = tanh(x = lower)
    es.upr = tanh(x = upper)

  # output table
  output = data.frame(cont = labels,
                      r.diff,
                      ztest,
                      pval,
                      es.lwr,
                      es.upr)

  # name confidence intervals
  names(output)[5:6] = paste0(c("lwr","upr"),
                              substr(x = 1-alpha, start = 2, stop = 4))

  return(output)
}

Simulated data

library(MASS)
set.seed(3)
# simulate 4 pairs of columns with defined correlations
dat3 <- do.call(what = rbind,
                args = lapply(X = c(0,.5,0,-.5), FUN = function(r){
                  mvrnorm(n = 12,
                          mu = c(0,0),
                          Sigma = matrix(data = c(1,r,r,1),
                                         nrow = 2,
                                         ncol = 2,
                                         byrow = TRUE))
}
)
)

# those are my groups of n = 12
dat3 <- data.frame(id = 1:48,
                   grp = rep(c('a','b','c','d'), each = 12),
                   dat3)

Graph simulated data

ggplot(data = dat3, aes(x = X1, y = X2, fill = grp, color = grp)) +
  geom_point(color = 'black', shape = 21) +
  geom_smooth(method = 'lm') +
  facet_wrap(~grp, nrow = 1, ncol = 4) +
  theme_minimal() +
  guides(fill = FALSE) +
  scale_fill_manual(values = c('red','blue','red','blue')) +
  scale_color_manual(values = c('red','blue','red','blue'))

Test main effects and interaction (simulated data)

corr.cont(r = by(data = dat3[,c("X1","X2")], INDICES = dat3$grp, FUN = function(x) cor(x)[1,2]),
          n = by(data = dat3[,c("X1","X2")], INDICES = dat3$grp, nrow),
          weights = rbind(c(1,1,-1,-1),
                           c(1,-1,1,-1),
                           c(1,-1,-1,1)),
          labels = c("main1","main2","interact"),
          alpha = .05)

##       cont     r.diff     ztest      pval     lwr.95    upr.95
## 1    main1  0.7091334  1.328158 0.1841260 -0.3979443 0.9753607
## 2    main2 -0.1267073 -0.191088 0.8484567 -0.8924905 0.8272152
## 3 interact -0.6499941 -1.162933 0.2448568 -0.9693813 0.4864155

'Real world' data

From results section on pp. 1097 of Oishi, Kesebir, & Diener (2011) Income Inequality and Happiness

Gini coefficients showed a strong negative association with the mean happiness level of the lowest-20% income group, r(25) = −.54, p < .01, and the mean happiness level of the next lowest (20–40%) income group, r(25) = −.63, p < .01. That is, lower-income respondents’ happiness was lower in the years with more income inequality than in the years with less income inequality. In contrast, income inequality of the year was unrelated to the mean happiness of the middle (40–60%) income group, r(25) = −.12, p = .56, the upper-middle (60–80%) income group, r(25) = −.09, p = .65, and the top-20% income group, r(25) = .03, p = .88.

okd2011.rs <- c(-.54,-.63,-.12,-.09,.03)
okd2011.ns1 <- c(26,26,26,26,26)

# looks like 27 points on the scatterplot...
okd2011.ns2 <- c(27,27,27,27,27)

# combine in dataframe
okd2011 <- data.frame(income.grp = factor(x = c('lowest-20%','lowest-20-40%','middle 40-60%','upper-middle 60-80%','top-20%'),
                                          levels = c('lowest-20%','lowest-20-40%','middle 40-60%','upper-middle 60-80%','top-20%'), ordered = TRUE),
                      r = okd2011.rs,
                      n1 = okd2011.ns1,
                      n2 = okd2011.ns2)

# compute margin of error (95% confidence)
okd2011$fisher.z <- with(data = okd2011, atanh(x = r))
okd2011$fisher.se1 <- with(data = okd2011, 1/sqrt(n1 - 3))
okd2011$fisher.se2 <- with(data = okd2011, 1/sqrt(n2 - 3))
okd2011$fisher.moe1 <- with(data = okd2011, fisher.se1*qnorm(p = 1 - .05/2, mean = 0, sd = 1))
okd2011$fisher.moe2 <- with(data = okd2011, fisher.se2*qnorm(p = 1 - .05/2, mean = 0, sd = 1))
okd2011$r.moe1.95 <- with(data = okd2011, tanh(x = fisher.moe1))
okd2011$r.moe2.95 <- with(data = okd2011, tanh(x = fisher.moe2))

Graph 'Real world' data

ggplot(data = okd2011, aes(x = income.grp, y = r, ymin = r - r.moe1.95, ymax = r + r.moe1.95, fill = income.grp)) +
  geom_bar(stat = 'identity', color = 'black') +
  geom_errorbar(stat = 'identity', width = 0.1) +
  scale_y_continuous(breaks = seq(-1,1,0.2), limits = c(-1.1,1.1)) +
  coord_cartesian(ylim = c(-1,1)) +
  theme_minimal() +
  scale_fill_brewer(palette = 'Blues')

Test polynomial contrasts ('real world' data)

# n given reported degrees of freedom = 25
corr.cont(r = okd2011$r,
          n = okd2011$n1,
          weights = t(contr.poly(5)),
          labels = c('linear','quadratic','cubic','4th.power'),
          alpha = .05)

##         cont      r.diff       ztest       pval     lwr.95    upr.95
## .L    linear  0.54201157  2.91106766 0.00360196  0.1957591 0.7681012
## .Q quadratic -0.02016769 -0.09673394 0.92293768 -0.4043608 0.3700753
## .C     cubic -0.20820746 -1.01334409 0.31089584 -0.5511124 0.1948598
## ^4 4th.power  0.23786789  1.16305105 0.24480879 -0.1646551 0.5724730

# n given 27 observations on the scatterplot
corr.cont(r = okd2011$r,
          n = okd2011$n2,
          weights = t(contr.poly(5)),
          labels = c('linear','quadratic','cubic','4th.power'),
          alpha = .05)

##         cont      r.diff       ztest        pval     lwr.95    upr.95
## .L    linear  0.54201157  2.97367843 0.002942533  0.2040200 0.7645497
## .Q quadratic -0.02016769 -0.09881448 0.921285570 -0.3971380 0.3626254
## .C     cubic -0.20820746 -1.03513893 0.300604035 -0.5450927 0.1865681
## ^4 4th.power  0.23786789  1.18806576 0.234807503 -0.1562720 0.5666597

Update

I had to make an adjustment to the sep.var.cont—the degrees of freedom for the welch tests were wrong, and so were their p-values (their ts were right right)

Here's how you know they're correct now

# test data
set.seed(1234)
test.data <- data.frame(grp = rep(letters[1:2], each = 10),
                         dv = c(rnorm(n = 10, mean = 1, sd = 1),
                                rnorm(n = 10, mean = 3, sd = 3)))

# here are the numbers we're aiming for
lapply(X = c(TRUE,FALSE), FUN = function(assume){
  if(assume == TRUE){
  t.test(formula = dv ~ grp, data = test.data, var.equal = TRUE)
  }
  else{
  t.test(formula = dv ~ grp, data = test.data, var.equal = FALSE)
  }
})

## [[1]]
## 
##  Two Sample t-test
## 
## data:  dv by grp
## t = -1.9131, df = 18, p-value = 0.07178
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.2564627  0.1991722
## sample estimates:
## mean in group a mean in group b 
##       0.6168426       2.6454879 
## 
## 
## [[2]]
## 
##  Welch Two Sample t-test
## 
## data:  dv by grp
## t = -1.9131, df = 10.725, p-value = 0.08279
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.3698997  0.3126091
## sample estimates:
## mean in group a mean in group b 
##       0.6168426       2.6454879

# test sep.var.cont
with(data = test.data, sep.var.cont(means = by(dv,grp,mean),
                                    vars = by(dv,grp,var),
                                    ns = by(dv,grp,length),
                                    contrast = c(1,-1),
                                    labels = 'test',
                                    alpha = .05))

##   cont           equal.var      ihat     se         t       df       pval
## 1 test   assumed (classic) -2.028645 1.0604 -1.913095 18.00000 0.07177914
## 2 test not assumed (Welch) -2.028645 1.0604 -1.913095 10.72471 0.08279197
##      lwr.95    upr.95
## 1 -4.369900 0.3126091
## 2 -4.256463 0.1991722

Novum R-ganum

Sunday, January 29, 2017

Contrast toolkit: Functions for testing contrasts in your own data and published reports

Mean difference contrasts

Input: INDEPENDENT means, variances, sample sizes, contrast weights (each row of weights should have the same number of columns as there are means, variances, etc.), contrast labels (optional), alpha (defaults to .05)

Output: Table of results, which includes estimates for each contrast under the equal variance assumption or not

Simulated data

Graph simulated data

Test main effects and interactions (simulated data)

'Real world' data

From Study 1 of Eskine, Kacinik, & Prinz (2011) A Bad Taste in the Mouth: Gustatory Disgust Influences Moral Judgment

Graph 'real world' data

Test contrasts ('real world' data)

Difference in proportion contrasts

Input: INDEPENDENT proportions, sample sizes, contrast weights (each row of weights should have the same number of columns as there are proportions and sample sizes), contrast labels (optional), alpha (defaults to .05)

Output: Table of results, where the estimates are logits

Simulated data

Graph simulated data

Test main effects and interaction (simulated data)

Difference in correlation contrasts

Input: INDEPENDENT correlations, sample sizes, contrast weights (each row of weights should have the same number of columns as there are correlations and sample sizes), contrast labels (optional), null value (defaults to 0), alpha (defaults to .05)

Output: Table of results, where the estimate is the correlation difference

Simulated data

Graph simulated data

Test main effects and interaction (simulated data)

'Real world' data

From results section on pp. 1097 of Oishi, Kesebir, & Diener (2011) Income Inequality and Happiness

Graph 'Real world' data

Test polynomial contrasts ('real world' data)

Update

No comments:

Post a Comment