Monday, August 15, 2016

Computing reliability coefficients (like Cronbach's Alpha) in R and SPSS

This post is about measurement reliability. I think it'll be useful to define some concepts before I start my usual tutorial.


First, we should expect that any test or scale that supposedly measures something real will contain measurement error. Formally, a measurement score boils down to "true" stuff plus error stuff:

X = T + ε

In the above equation, X is the observed score, T is the true score, and ε is measurement error.



Reliability is, in a nutshell, the proportion of observed variance that is true variance: reliability = var(T) / var(X).

We don't know the true score or the true variance, so we have to collect data and do our best maths to estimate reliability. One way to do so is pretty intuitive: give people your measurement more than once.


To me, this idea is really neat. Measure people twice, X1 and X2, and assume that the true score stays the same and that measurement error is independent between time points (i.e., ε at time 1 doesn't correlate with ε at time 2).




Design and run a study or an experiment where it's reasonable to make these assumptions and you can be confident that what X1 and X2 have in common is the true score.

That is, the correlation between X1 and X2 is reliability. Pretty sick.
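To make that concrete (this simulation is mine, not part of the original post), you can fake the whole setup in a few lines of R: generate true scores, add independent error at two time points, and check that the test-retest correlation lands near the true reliability.

```r
# toy simulation: the test-retest correlation estimates reliability
set.seed(1)
n <- 100000
true_score <- rnorm(n)            # var(T) = 1
x1 <- true_score + rnorm(n)       # time 1: independent error with var = 1
x2 <- true_score + rnorm(n)       # time 2: independent error with var = 1

# true reliability = var(T) / var(X) = 1 / (1 + 1) = .5
cor(x1, x2)
```

With this many simulated people, the correlation should come out very close to .50.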


Even cooler, in my opinion, is that you can use the correlations among many related measurement items to estimate how their composite (e.g., their sum or average) will correlate with a composite of a similar number of items, or with the same items measured at another time.


This is the gist of coefficient alpha (i.e., Cronbach's alpha), which changes as a function of the average inter-item correlation and the number of items. More items and greater inter-item correlations result in larger alphas.


Source: LECTURE NOTES #13: Reliability, Structural Equation Modeling, and Hierarchical Modeling by Rich Gonzalez
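To play with that relationship directly, here's a tiny helper (my own illustration; alpha_std isn't from the post or the psych package) implementing the standardized alpha formula, alpha = k * r.bar / (1 + (k - 1) * r.bar), where k is the number of items and r.bar is the average inter-item correlation.

```r
# standardized alpha from the number of items and the average inter-item correlation
alpha_std <- function(k, r.bar) {
  k * r.bar / (1 + (k - 1) * r.bar)
}

alpha_std(k = 5, r.bar = .39)    # about .76: the extraversion example later in this post
alpha_std(k = 10, r.bar = .39)   # same correlations, twice the items: alpha climbs to about .86
```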

To demonstrate how to compute coefficient alpha (and the intraclass correlation coefficient, and an alternative coefficient, omega), I'll use data that comes with the psych package in R. I'll load that first.


Before I get into it, keep these points in mind:



  • I've highlighted values to compare in the R and SPSS outputs.
  • SPSS doesn't compute confidence intervals for coefficient alpha directly, but when you ask for the ICC, it gives you confidence intervals around the average-measures estimate, which equals coefficient alpha.
  • SPSS hasn't implemented the alternative coefficient I'll discuss, omega.
  • There are many ways to compute omega—it's complicated.

Here's how you do it in R: Coefficient Alpha

First, I'll load the psych and knitr packages.

# set working directory
setwd(dir = "~/Desktop/analysis-examples/Cronbach's alpha and ICC in R and SPSS/")

# packages I want
want_packages <- c("psych", "knitr")

# packages that are already downloaded in my library
have_packages   <- want_packages %in% rownames(installed.packages())

# if I don't have these packages, download the ones I don't have
if(any(!have_packages)) install.packages(want_packages[!have_packages])

# load psych package from library
library(psych)
library(knitr)

25 Personality items representing 5 factors

25 personality self-report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web-based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analysis. Three additional demographic variables (sex, education, and age) are also included.
# load data
data("bfi")

# example from psych package help page on bfi
keys.list <- list(agree = c("-A1","A2","A3","A4","A5"),
                  conscientious = c("C1","C2","C3","-C4","-C5"),
                  extraversion = c("-E1","-E2","E3","E4","E5"),
                  neuroticism = c("N1","N2","N3","N4","N5"),
                  openness = c("O1","-O2","O3","O4","-O5"))

keys <- make.keys(bfi,keys.list)

# using knitr's kable function to make the output prettier
keys <- data.frame(fa.lookup(f = keys,
          dictionary = bfi.dictionary[,1:4])
)
keys$variable <- row.names(keys)
row.names(keys) <- NULL

Cronbach's alpha "from scratch"

# standardized alpha "by hand": alpha = k * mean(r) / (1 + (k - 1) * mean(r))
alpha.f <- function(mat){
  # average inter-item correlation from the lower triangle of the correlation matrix
  r11 <- mean(mat[lower.tri(mat)])
  # step up from the average correlation to a composite of k items
  alpha <- ncol(mat) * r11 / (1 + (ncol(mat) - 1) * r11)
  return(alpha)
}

alpha.f(cor(reverse.code(bfi[,11:15],keys = c(-1,-1,1,1,1)),use = "complete.obs"))
## [1] 0.7609641

Cronbach's alpha from the psych package

# select the extraversion items and save them as an object
extraversion.items <- names(bfi)[11:15]

# alpha with 5000 bootstraps
psych::alpha(x = bfi[,colnames(bfi) %in% extraversion.items],
             keys = c(-1, -1, 1, 1, 1),
             title = "Cronbach's alpha for 5 extraversion items",
             n.iter = 5000)
## 
## Reliability analysis  Cronbach's alpha for 5 extraversion items  
## Call: psych::alpha(x = bfi[, colnames(bfi) %in% extraversion.items], 
##     keys = c(-1, -1, 1, 1, 1), title = "Cronbach's alpha for 5 extraversion items", 
##     n.iter = 5000)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd
##       0.76      0.76    0.73      0.39 3.2 0.007  4.1 1.1
## 
##  lower alpha upper     95% confidence boundaries
## 0.75 0.76 0.78 
## 
##  lower median upper bootstrapped confidence intervals
##  0.75 0.76 0.78
##  Reliability if an item is dropped:
##     raw_alpha std.alpha G6(smc) average_r S/N alpha se
## E1-      0.73      0.73    0.67      0.40 2.6   0.0084
## E2-      0.69      0.69    0.63      0.36 2.3   0.0095
## E3       0.73      0.73    0.67      0.40 2.7   0.0082
## E4       0.70      0.70    0.65      0.37 2.4   0.0091
## E5       0.74      0.74    0.69      0.42 2.9   0.0078
## 
##  Item statistics 
##        n raw.r std.r r.cor r.drop mean  sd
## E1- 2777  0.72  0.70  0.59   0.52  4.0 1.6
## E2- 2784  0.78  0.76  0.69   0.61  3.9 1.6
## E3  2775  0.68  0.70  0.58   0.50  4.0 1.4
## E4  2791  0.75  0.75  0.66   0.58  4.4 1.5
## E5  2779  0.64  0.66  0.52   0.45  4.4 1.3
## 
## Non missing response frequency for each item
##       1    2    3    4    5    6 miss
## E1 0.24 0.23 0.15 0.16 0.13 0.09 0.01
## E2 0.19 0.24 0.12 0.22 0.14 0.09 0.01
## E3 0.05 0.11 0.15 0.30 0.27 0.13 0.01
## E4 0.05 0.09 0.10 0.16 0.34 0.26 0.00
## E5 0.03 0.08 0.10 0.22 0.34 0.22 0.01

Informal test of unidimensionality (principal components analysis)

# checking for dimensionality with principal component analysis
psych::principal(r = bfi[,colnames(bfi) %in% extraversion.items])
## Principal Components Analysis
## Call: psych::principal(r = bfi[, colnames(bfi) %in% extraversion.items])
## Standardized loadings (pattern matrix) based upon correlation matrix
##      PC1   h2   u2 com
## E1 -0.70 0.49 0.51   1
## E2 -0.78 0.61 0.39   1
## E3  0.69 0.48 0.52   1
## E4  0.76 0.57 0.43   1
## E5  0.64 0.41 0.59   1
## 
##                 PC1
## SS loadings    2.57
## Proportion Var 0.51
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 component is sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.13 
##  with the empirical chi square  895.79  with prob <  2.2e-191 
## 
## Fit based upon off diagonal values = 0.9
# eigen values
psych::principal(r = bfi[,colnames(bfi) %in% extraversion.items])$values
## [1] 2.5681044 0.7640221 0.6410349 0.5620838 0.4647548
# percent variance explained
psych::principal(r = bfi[,colnames(bfi) %in% extraversion.items])$values /
  sum(psych::principal(r = bfi[,colnames(bfi) %in% extraversion.items])$values)
## [1] 0.51362088 0.15280441 0.12820699 0.11241677 0.09295096

Here's how you do it in SPSS: Coefficient Alpha

Load the BFI dataset from R.


* load BFI data

GET DATA  /TYPE=TXT
  /FILE="/Users/nicholasmmichalak/Desktop/analysis-examples/Cronbach's alpha and ICC in R and SPSS/bfi.spss.csv"
  /ENCODING='Locale'
  /DELCASE=LINE
  /DELIMITERS=","
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /IMPORTCASE=ALL
  /VARIABLES=
  A1 F4.0  A2 F4.0  A3 F1.0  A4 F4.0  A5 F1.0
  C1 F4.0  C2 F4.0  C3 F4.0  C4 F1.0  C5 F1.0
  E1 F1.0  E2 F4.0  E3 F4.0  E4 F1.0  E5 F4.0
  N1 F4.0  N2 F4.0  N3 F1.0  N4 F4.0  N5 F4.0
  O1 F1.0  O2 F1.0  O3 F1.0  O4 F1.0  O5 F1.0
  gender F1.0  education F4.0  age F2.0.
MISSING VALUES A1 TO age (9999).
CACHE.
EXECUTE.
DATASET NAME DataSet1 WINDOW=FRONT.


* reverse code first two items

COMPUTE E1.rev = abs(E1 - 7).
COMPUTE E2.rev = abs(E2 - 7).
EXECUTE.


* cronbachs alpha with ICC for 95% confidence intervals

RELIABILITY
  /VARIABLES=E1.rev E2.rev E3 E4 E5
  /SCALE("Cronbach's alpha for 5 extraversion items") ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE CORR
  /SUMMARY=TOTAL MEANS VARIANCE CORR
  /ICC=MODEL(RANDOM) TYPE(CONSISTENCY) CIN=95 TESTVAL=0.



Case Processing Summary

                  N      %
Cases Valid       2713   96.9
      Excluded(a)   87    3.1
      Total       2800  100.0

a. Listwise deletion based on all variables in the procedure.

Reliability Statistics

Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
.761               .761                                           5

Item Statistics

         Mean     Std. Deviation   N
E1.rev   4.0284   1.63241          2713
E2.rev   3.8555   1.60720          2713
E3       4.0000   1.35237          2713
E4       4.4209   1.46131          2713
E5       4.4184   1.33676          2713

Inter-Item Correlation Matrix

         E1.rev   E2.rev   E3      E4      E5
E1.rev   1.000    .468     .320    .419    .298
E2.rev   .468     1.000    .377    .514    .377
E3       .320     .377     1.000   .418    .383
E4       .419     .514     .418    1.000   .316
E5       .298     .377     .383    .316    1.000

Summary Item Statistics

                          Mean    Minimum   Maximum   Range   Maximum / Minimum   Variance   N of Items
Item Means                4.145   3.856     4.421     .565    1.147               .067       5
Item Variances            2.200   1.787     2.665     .878    1.491               .169       5
Inter-Item Correlations   .389    .298      .514      .216    1.723               .004       5

Item-Total Statistics

         Scale Mean if   Scale Variance if   Corrected Item-     Squared Multiple   Cronbach's Alpha
         Item Deleted    Item Deleted        Total Correlation   Correlation        if Item Deleted
E1.rev   16.6948         18.280              .513                .278               .725
E2.rev   16.8677         17.399              .606                .380               .688
E3       16.7232         20.196              .501                .266               .728
E4       16.3022         18.678              .578                .351               .701
E5       16.3048         20.784              .455                .222               .742

Scale Statistics

Mean      Variance   Std. Deviation   N of Items
20.7232   28.113     5.30212          5

Intraclass Correlation Coefficient

                   Intraclass       95% Confidence Interval     F Test with True Value 0
                   Correlation(b)   Lower Bound   Upper Bound   Value   df1    df2     Sig
Single Measures    .389(a)          .371          .408          4.183   2712   10848   .000
Average Measures   .761             .746          .775          4.183   2712   10848   .000

Two-way random effects model where both people effects and measures effects are random.
a. The estimator is the same, whether the interaction effect is present or not.
b. Type C intraclass correlation coefficients using a consistency definition: the between-measure variance is excluded from the denominator variance.

* factor analysis for informally testing unidimensionality

FACTOR
  /VARIABLES E1.rev E2.rev E3 E4 E5
  /FORMAT SORT.


Communalities

         Initial   Extraction
E1.rev   1.000     .490
E2.rev   1.000     .608
E3       1.000     .477
E4       1.000     .575
E5       1.000     .414

Extraction Method: Principal Component Analysis.

Total Variance Explained

            Initial Eigenvalues                    Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %
1           2.565   51.298          51.298         2.565   51.298          51.298
2           .768    15.368          66.666
3           .643    12.851          79.517
4           .561    11.211          90.728
5           .464    9.272           100.000

Extraction Method: Principal Component Analysis.

Component Matrix(a)

         Component 1
E2.rev   .780
E4       .758
E1.rev   .700
E3       .691
E5       .644

Extraction Method: Principal Component Analysis.
a. 1 components extracted.

Rotated Component Matrix(a)
a. Only one component was extracted. The solution cannot be rotated.


Here's how you do it in SPSS: Intraclass Correlation Coefficient


Here's a great resource for the ICC: Computing Intraclass Correlations (ICC) as Estimates of Interrater Reliability in SPSS

* load Shrout and Fleiss data

GET DATA  /TYPE=TXT
  /FILE="/Users/nicholasmmichalak/Desktop/analysis-examples/Cronbach's alpha and ICC in R and SPSS/sf.csv"
  /ENCODING='Locale'
  /DELCASE=LINE
  /DELIMITERS=","
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /IMPORTCASE=ALL
  /VARIABLES=
subj F2.0
  J1 F2.0
  J2 F1.0
  J3 F1.0
  J4 F1.0.
CACHE.
EXECUTE.
DATASET NAME DataSet2 WINDOW=FRONT.


* ICC 2-way random effects and averages

RELIABILITY
  /VARIABLES=J1 J2 J3 J4
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE CORR
  /SUMMARY=TOTAL MEANS VARIANCE CORR
  /ICC=MODEL(RANDOM) TYPE(CONSISTENCY) CIN=95 TESTVAL=0.

Case Processing Summary

                  N   %
Cases Valid       6   100.0
      Excluded(a) 0   .0
      Total       6   100.0

a. Listwise deletion based on all variables in the procedure.

Reliability Statistics

Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
.909               .927                                           4

Item Statistics

     Mean   Std. Deviation   N
J1   7.67   1.633            6
J2   2.50   1.643            6
J3   4.33   1.633            6
J4   6.67   2.503            6

Inter-Item Correlation Matrix

     J1      J2      J3      J4
J1   1.000   .745    .725    .750
J2   .745    1.000   .894    .729
J3   .725    .894    1.000   .718
J4   .750    .729    .718    1.000

Summary Item Statistics

                          Mean    Minimum   Maximum   Range   Maximum / Minimum   Variance   N of Items
Item Means                5.292   2.500     7.667     5.167   3.067               5.414      4
Item Variances            3.575   2.667     6.267     3.600   2.350               3.220      4
Inter-Item Correlations   .760    .718      .894      .177    1.246               .004       4

Item-Total Statistics

     Scale Mean if   Scale Variance if   Corrected Item-     Squared Multiple   Cronbach's Alpha
     Item Deleted    Item Deleted        Total Correlation   Correlation        if Item Deleted
J1   13.50           28.300              .806                .651               .883
J2   18.67           27.467              .859                .824               .867
J3   16.83           27.767              .844                .812               .872
J4   14.50           20.700              .790                .635               .918

Scale Statistics

Mean    Variance   Std. Deviation   N of Items
21.17   44.967     6.706            4

Intraclass Correlation Coefficient

                   Intraclass       95% Confidence Interval     F Test with True Value 0
                   Correlation(b)   Lower Bound   Upper Bound   Value    df1   df2   Sig
Single Measures    .715(a)          .342          .946          11.027   5     15    .000
Average Measures   .909             .676          .986          11.027   5     15    .000

Two-way random effects model where both people effects and measures effects are random.
a. The estimator is the same, whether the interaction effect is present or not.
b. Type C intraclass correlation coefficients using a consistency definition: the between-measure variance is excluded from the denominator variance.

* ICC 2-way random effects and absolute agreement

RELIABILITY
  /VARIABLES=J1 J2 J3 J4
  /SCALE('ALL VARIABLES') ALL
  /ICC=MODEL(RANDOM) TYPE(ABSOLUTE) CIN=95 TESTVAL=0.

Case Processing Summary

                  N   %
Cases Valid       6   100.0
      Excluded(a) 0   .0
      Total       6   100.0

a. Listwise deletion based on all variables in the procedure.

Reliability Statistics

Cronbach's Alpha   N of Items
.909               4

Intraclass Correlation Coefficient

                   Intraclass       95% Confidence Interval     F Test with True Value 0
                   Correlation(b)   Lower Bound   Upper Bound   Value    df1   df2   Sig
Single Measures    .290(a)          .019          .761          11.027   5     15    .000
Average Measures   .620             .071          .927          11.027   5     15    .000

Two-way random effects model where both people effects and measures effects are random.
a. The estimator is the same, whether the interaction effect is present or not.
b. Type A intraclass correlation coefficients using an absolute agreement definition.

* ICC 1-way random effects and averages

RELIABILITY
  /VARIABLES=J1 J2 J3 J4
  /SCALE('ALL VARIABLES') ALL
  /ICC=MODEL(ONEWAY) TYPE(CONSISTENCY) CIN=95 TESTVAL=0.

Case Processing Summary

                  N   %
Cases Valid       6   100.0
      Excluded(a) 0   .0
      Total       6   100.0

a. Listwise deletion based on all variables in the procedure.

Reliability Statistics

Cronbach's Alpha   N of Items
.909               4

Intraclass Correlation Coefficient

                   Intraclass    95% Confidence Interval     F Test with True Value 0
                   Correlation   Lower Bound   Upper Bound   Value   df1   df2   Sig
Single Measures    .166          -.133         .723          1.795   5     18    .165
Average Measures   .443          -.884         .912          1.795   5     18    .165

One-way random effects model where people effects are random.

* ICC 1-way random effects and absolute agreement

RELIABILITY
  /VARIABLES=J1 J2 J3 J4
  /SCALE('ALL VARIABLES') ALL
  /ICC=MODEL(ONEWAY) TYPE(ABSOLUTE) CIN=95 TESTVAL=0.

Case Processing Summary

                  N   %
Cases Valid       6   100.0
      Excluded(a) 0   .0
      Total       6   100.0

a. Listwise deletion based on all variables in the procedure.

Reliability Statistics

Cronbach's Alpha   N of Items
.909               4

Intraclass Correlation Coefficient

                   Intraclass    95% Confidence Interval     F Test with True Value 0
                   Correlation   Lower Bound   Upper Bound   Value   df1   df2   Sig
Single Measures    .166          -.133         .723          1.795   5     18    .165
Average Measures   .443          -.884         .912          1.795   5     18    .165

One-way random effects model where people effects are random.

* ICC mixed effects and averages

RELIABILITY
  /VARIABLES=J1 J2 J3 J4
  /SCALE('ALL VARIABLES') ALL
  /ICC=MODEL(MIXED) TYPE(CONSISTENCY) CIN=95 TESTVAL=0.

Case Processing Summary

                  N   %
Cases Valid       6   100.0
      Excluded(a) 0   .0
      Total       6   100.0

a. Listwise deletion based on all variables in the procedure.

Reliability Statistics

Cronbach's Alpha   N of Items
.909               4

Intraclass Correlation Coefficient

                   Intraclass       95% Confidence Interval     F Test with True Value 0
                   Correlation(b)   Lower Bound   Upper Bound   Value    df1   df2   Sig
Single Measures    .715(a)          .342          .946          11.027   5     15    .000
Average Measures   .909(c)          .676          .986          11.027   5     15    .000

Two-way mixed effects model where people effects are random and measures effects are fixed.
a. The estimator is the same, whether the interaction effect is present or not.
b. Type C intraclass correlation coefficients using a consistency definition: the between-measure variance is excluded from the denominator variance.
c. This estimate is computed assuming the interaction effect is absent, because it is not estimable otherwise.

* ICC mixed effects and absolute agreement

RELIABILITY
  /VARIABLES=J1 J2 J3 J4
  /SCALE('ALL VARIABLES') ALL
  /ICC=MODEL(MIXED) TYPE(ABSOLUTE) CIN=95 TESTVAL=0.

Case Processing Summary

                  N   %
Cases Valid       6   100.0
      Excluded(a) 0   .0
      Total       6   100.0

a. Listwise deletion based on all variables in the procedure.

Reliability Statistics

Cronbach's Alpha   N of Items
.909               4

Intraclass Correlation Coefficient

                   Intraclass       95% Confidence Interval     F Test with True Value 0
                   Correlation(b)   Lower Bound   Upper Bound   Value    df1   df2   Sig
Single Measures    .290(a)          .019          .761          11.027   5     15    .000
Average Measures   .620(c)          .071          .927          11.027   5     15    .000

Two-way mixed effects model where people effects are random and measures effects are fixed.
a. The estimator is the same, whether the interaction effect is present or not.
b. Type A intraclass correlation coefficients using an absolute agreement definition.
c. This estimate is computed assuming the interaction effect is absent, because it is not estimable otherwise.

Here's how you do it in R: Intraclass Correlation Coefficient

Intraclass Correlations (ICC1, ICC2, ICC3 from Shrout and Fleiss) from the psych package


The intraclass correlation is used as a measure of association when studying the reliability of raters. Shrout and Fleiss (1979) outline 6 different estimates that depend upon the particular experimental design. All are implemented and given confidence limits.
# data matrix: column 1 is the subject ID, columns 2-5 are the four judges
sf <- matrix(c(1,  9, 2, 5, 8,
               2,  6, 1, 3, 2,
               3,  8, 4, 6, 8,
               4,  7, 1, 2, 6,
               5, 10, 5, 6, 9,
               6,  6, 2, 4, 7),
             ncol = 5,
             byrow = TRUE)

# add column names
colnames(x = sf) <- c("subj",paste("J", 1:4,sep = ""))

# run ICC
ICC(x = sf[,2:5],
    missing = TRUE,
    alpha = .05)
## Call: ICC(x = sf[, 2:5], missing = TRUE, alpha = 0.05)
## 
## Intraclass correlation coefficients 
##                          type  ICC    F df1 df2       p lower bound
## Single_raters_absolute   ICC1 0.17  1.8   5  18 0.16477      -0.133
## Single_random_raters     ICC2 0.29 11.0   5  15 0.00013       0.019
## Single_fixed_raters      ICC3 0.71 11.0   5  15 0.00013       0.342
## Average_raters_absolute ICC1k 0.44  1.8   5  18 0.16477      -0.884
## Average_random_raters   ICC2k 0.62 11.0   5  15 0.00013       0.071
## Average_fixed_raters    ICC3k 0.91 11.0   5  15 0.00013       0.676
##                         upper bound
## Single_raters_absolute         0.72
## Single_random_raters           0.76
## Single_fixed_raters            0.95
## Average_raters_absolute        0.91
## Average_random_raters          0.93
## Average_fixed_raters           0.99
## 
##  Number of subjects = 6     Number of Judges =  4

The alpha() function's Cronbach's alpha ~= ICC() function's ICC3k (average_fixed_raters)

ICC3: A fixed set of k judges rate each target. There is no generalization to a larger population of judges. ICC3 = (MSB - MSE) / (MSB + (nr - 1) * MSE)

alpha (Cronbach, 1951) is the same as Guttman's lambda3 (Guttman, 1945) and may be found by

lambda3 = (n / (n - 1)) * (1 - tr(Vx) / Vx) = (n / (n - 1)) * ((Vx - tr(Vx)) / Vx) = alpha
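To connect those formulas back to the SPSS and ICC() output, here's a sketch (mine, not from the original post) that computes ICC3 and ICC3k by hand from the two-way ANOVA mean squares on the Shrout and Fleiss ratings. If I've set it up right, the results should match the .715 (Single_fixed_raters) and .909 (Average_fixed_raters) printed above.

```r
# Shrout & Fleiss ratings: 6 targets rated by 4 judges
ratings <- matrix(c(9, 2, 5, 8,
                    6, 1, 3, 2,
                    8, 4, 6, 8,
                    7, 1, 2, 6,
                   10, 5, 6, 9,
                    6, 2, 4, 7),
                  ncol = 4, byrow = TRUE)

# long format for the two-way ANOVA (targets x judges)
long <- data.frame(rating = as.vector(ratings),
                   subj   = factor(rep(1:6, times = 4)),
                   judge  = factor(rep(1:4, each = 6)))

# mean squares, in the order: subj, judge, Residuals
ms  <- summary(aov(rating ~ subj + judge, data = long))[[1]][["Mean Sq"]]
msb <- ms[1]    # between-targets mean square
mse <- ms[3]    # residual mean square
k   <- 4        # number of judges

(msb - mse) / (msb + (k - 1) * mse)   # ICC3, single fixed rater
(msb - mse) / msb                     # ICC3k, average of k fixed raters (= alpha)
```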

# alpha on the ICC formatted data
psych::alpha(x = sf[,2:5])
## 
## Reliability analysis   
## Call: psych::alpha(x = sf[, 2:5])
## 
##   raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd
##       0.91      0.93    0.92      0.76  13 0.057  5.3 1.7
## 
##  lower alpha upper     95% confidence boundaries
## 0.8 0.91 1.02 
## 
##  Reliability if an item is dropped:
##    raw_alpha std.alpha G6(smc) average_r  S/N alpha se
## J1      0.88      0.91    0.89      0.78 10.7    0.080
## J2      0.87      0.89    0.85      0.73  8.1    0.083
## J3      0.87      0.90    0.85      0.74  8.6    0.080
## J4      0.92      0.92    0.90      0.79 11.2    0.060
## 
##  Item statistics 
##    n raw.r std.r r.cor r.drop mean  sd
## J1 6  0.88  0.89  0.83   0.81  7.7 1.6
## J2 6  0.92  0.93  0.92   0.86  2.5 1.6
## J3 6  0.91  0.92  0.91   0.84  4.3 1.6
## J4 6  0.91  0.88  0.82   0.79  6.7 2.5
## 
## Non missing response frequency for each item
##       1    2    3    4    5    6    7    8    9   10 miss
## J1 0.00 0.00 0.00 0.00 0.00 0.33 0.17 0.17 0.17 0.17    0
## J2 0.33 0.33 0.00 0.17 0.17 0.00 0.00 0.00 0.00 0.00    0
## J3 0.00 0.17 0.17 0.17 0.17 0.33 0.00 0.00 0.00 0.00    0
## J4 0.00 0.17 0.00 0.00 0.00 0.17 0.17 0.33 0.17 0.00    0

Write data to .csv files for use in SPSS


# set working directory
setwd("~/Desktop/analysis-examples/Cronbach's alpha and ICC in R and SPSS/")

# SPSS reads NAs and turns the entire variable into a string. It's dumb. So I make NAs into 9999 (a number)
# in SPSS syntax, I code 9999 as missing
bfi.spss <- bfi
bfi.spss[is.na(bfi.spss)] <- 9999

# BFI data
write.csv(x = bfi.spss,
          file = "bfi.spss.csv",
          row.names = FALSE)

# Shrout and Fleiss example
write.csv(x = sf,
          file = "sf.csv",
          row.names = FALSE)


The problem(s) with coefficient alpha


I went through all of that R code and SPSS syntax to tell you this: you probably shouldn't rely on or report coefficient alpha, or, if you do, you should run some other reliability and related analyses as well. Here's why.


Coefficient alpha tells you about the variance shared by all the items used in a composite score. The calculation of alpha depends on the spread of the inter-item correlations, so the point estimate will reflect that spread regardless of its source. The spread could come from measurement error, or more than one factor dimension could underlie the scale; coefficient alpha can't tell the difference.

Coefficient alpha will equal reliability only under conditions that are rarely met in practice:

  1. Residuals don't correlate.
  2. Items have identical loadings (tau-equivalence).
  3. The scale is unidimensional.

When these conditions are not met, coefficient alpha may underestimate or overestimate reliability.
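Here's a small simulation (mine, not the post's) of the second condition: items that load unequally on a single factor. The standardized alpha formula from earlier in the post comes out below the true reliability of the sum score.

```r
# simulate a unidimensional scale whose items have unequal loadings
set.seed(2)
n <- 50000
g <- rnorm(n)                      # one common factor
loadings <- c(.9, .7, .5, .3)      # deliberately unequal
items <- sapply(loadings, function(l) l * g + rnorm(n, sd = sqrt(1 - l^2)))

# true reliability of the unit-weighted sum score
true_rel <- sum(loadings)^2 / (sum(loadings)^2 + sum(1 - loadings^2))

# standardized alpha from the average inter-item correlation
r.bar <- mean(cor(items)[lower.tri(cor(items))])
alpha <- ncol(items) * r.bar / (1 + (ncol(items) - 1) * r.bar)

# alpha should come out around .68 here, under the true reliability of about .71
round(c(true = true_rel, alpha = alpha), 2)
```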

Here's one particularly damning quote from Zinbarg, Revelle, Yovel, & Li (2005):

Equation (10) and its reduced forms in Cases I, II and III show that α is multiply determined under all but the most highly restrictive of conditions — it often reflects not only general factor saturation but also group factor saturation and even variability in factor loadings. Given such complexity, one could question whether the meaning of α is terribly clear in many conditions.

Here's Cronbach himself in Cronbach & Shavelson (2004):

Coefficients are a crude device that does not bring to the surface many subtleties implied by variance components.

OK, but we still want a coefficient that tells us our scale is reliable (or not) and unidimensional (or not). Luckily, some smart statisticians made some more coefficients and generally agreed that one is best for unidimensional measures: omega.

Source: user chl's response to a StackExchange question


Here's how you do it in R: Omega

McDonald's omega estimates of general and total factor saturation

McDonald has proposed coefficient omega as an estimate of the general factor saturation of a test. One way to find omega is to do a factor analysis of the original data set, rotate the factors obliquely, do a Schmid Leiman transformation, and then find omega. This function estimates omega as suggested by McDonald by using hierarchical factor analysis (following Jensen).

# with psych package
psych::omega(m = bfi[,colnames(bfi) %in% extraversion.items],
             key = c(-1, -1, 1, 1, 1),
             nfactors = 3,
             n.iter = 100,
             plot = FALSE)
## Loading required namespace: GPArotation
## Omega 
## Call: omegah(m = m, nfactors = nfactors, fm = fm, key = key, flip = flip, 
##     digits = digits, title = title, sl = sl, labels = labels, 
##     plot = plot, n.obs = n.obs, rotate = rotate, Phi = Phi, option = option, 
##     covar = covar)
## Alpha:                 0.76 
## G.6:                   0.73 
## Omega Hierarchical:    0.69 
## Omega H asymptotic:    0.87 
## Omega Total            0.8 
## 
## Schmid Leiman Factor loadings greater than  0.2 
##        g   F1*  F2*   F3*   h2   u2   p2
## E1- 0.56                  0.36 0.64 0.89
## E2- 0.73  0.30            0.62 0.38 0.85
## E3  0.53       0.37       0.44 0.56 0.65
## E4  0.65             0.36 0.58 0.42 0.74
## E5  0.50       0.36       0.39 0.61 0.63
## 
## With eigenvalues of:
##    g  F1*  F2*  F3* 
## 1.81 0.14 0.28 0.16 
## 
## general/max  6.45   max/min =   1.96
## mean percent general =  0.75    with sd =  0.12 and cv of  0.15 
## Explained Common Variance of the general factor =  0.76 
## 
## The degrees of freedom are -2  and the fit is  0 
## The number of observations was  2800  with Chi Square =  0  with prob <  NA
## The root mean square of the residuals is  0 
## The df corrected root mean square of the residuals is  NA
## 
## Compare this with the adequacy of just a general factor and no group factors
## The degrees of freedom for just the general factor are 5  and the fit is  0.04 
## The number of observations was  2800  with Chi Square =  113.96  with prob <  6e-23
## The root mean square of the residuals is  0.05 
## The df corrected root mean square of the residuals is  0.07 
## 
## RMSEA index =  0.088  and the 90 % confidence intervals are  0.075 0.103
## BIC =  74.27 
## 
## Measures of factor score adequacy             
##                                                  g   F1*   F2*   F3*
## Correlation of scores with factors            0.85  0.35  0.50  0.47
## Multiple R square of scores with factors      0.72  0.12  0.25  0.22
## Minimum correlation of factor score estimates 0.44 -0.76 -0.50 -0.56
## 
##  Total, General and Subset omega for each subset
##                                                  g  F1*  F2*  F3*
## Omega total for total scores and subscales    0.80 0.65 0.58 0.56
## Omega general for total scores and subscales  0.69 0.57 0.38 0.43
## Omega group for total scores and subscales    0.07 0.08 0.20 0.13
## 
##  Estimates and bootstrapped confidence intervals
##           lower estimate upper
## omega_h    0.55     0.69  0.70
## alpha      0.68     0.76  0.76
## omega_tot  0.76     0.80  0.84
## G6         0.67     0.73  0.74
## omega_lim  0.68     0.87  0.89

Estimating omega (hierarchical and total) using ci.reliability() from the MBESS package


MBESS::ci.reliability(data = data.frame(reverse.code(items = na.omit(bfi[,colnames(bfi) %in% extraversion.items]),
                                                     keys = c(-1, -1, 1, 1, 1))),
                      type = "hierarchical",
                      interval.type = "none")
## $est
## [1] 0.7668617
## 
## $se
## [1] NA
## 
## $ci.lower
## [1] NA
## 
## $ci.upper
## [1] NA
## 
## $conf.level
## [1] 0.95
## 
## $type
## [1] "hierarchical omega"
## 
## $interval.type
## [1] "none"
MBESS::ci.reliability(data = data.frame(reverse.code(items = na.omit(bfi[,colnames(bfi) %in% extraversion.items]),
                                                     keys = c(-1, -1, 1, 1, 1))),
                      type = "omega",
                      interval.type = "none")
## $est
## [1] 0.7673336
## 
## $se
## [1] NA
## 
## $ci.lower
## [1] NA
## 
## $ci.upper
## [1] NA
## 
## $conf.level
## [1] 0.95
## 
## $type
## [1] "omega"
## 
## $interval.type
## [1] "none"



Final thoughts

I'm genuinely confused why ci.reliability() essentially returns the same estimate as alpha in other functions. If anyone knows why this is the case, please email me or post your ideas in the comments.

There is a lot of code and commentary in this post, mostly because reliability turned out to be more complicated than I thought. You can't just compute alpha and dust off your hands.


Use some sort of dimension reduction technique (e.g., principal components analysis) to check whether the first eigenvalue is large and the remaining eigenvalues aren't; that pattern suggests a unidimensional scale.


Source: SPSS FAQ What does Cronbach's alpha mean? 


If the scale you're using is well-developed, compute its alpha and omega, and treat them like point estimates (because they are). That is, report their confidence intervals.


If it's not well-developed, then you have some factor analyses and some theory-guided decisions to make that go beyond the scope of this post. In any case...


Happy R

References



  • Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64(3), 391-418.
  • Dunn, T. J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105(3), 399-412.
  • Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1), 123-133.
