For starters, here’s the Stan code. And here’s the R script: Stan code. If you are already familiar with RStan, the basic concepts you need to combine are standard multilevel models with correlated random slopes and heteroskedastic errors.

I will embed R code into the demonstration. The required packages are `lavaan`

, `lme4`

and `RStan`

.

I like to understand most statistical methods as regression models. This way, it’s easy to understand the claims underlying a large number of techniques. It’s an approach that works for multilevel, SEM, and IRT models. Here, I’ll be focusing on confirmatory factor analysis (CFA), so I’ll begin by developing CFA from a model that is easy to fit in any multilevel regression software:

$$ y_{pi} = \theta_p + \beta_i + \epsilon $$

where $y_{pi}$ is the response of person $p$ to item $i$, $\theta_p$ is the random intercept that varies by person and is assumed to be normal with mean zero and standard deviation $\alpha$ $\big(\mathcal{N}(0, \alpha^2)\big)$, $\beta_i$ are the coefficients of the item dummies, and $\epsilon$ is the residual (level 1) error assumed normal with mean zero and standard deviation $\sigma,\ \mathcal{N}(0, \sigma^2)$.

Anyone familiar with multilevel regression can fit this basic random-intercept model. Shape your person by item data to “long form” where each row contains a person ID, an item ID and the item response. Make the item response the outcome, make each item into a dummy variable as predictors (at level 1) and place a random intercept on the person. Your random intercept standard deviation is $\alpha$, residual standard deviation (or level one error) is $\sigma$ and the item dummies are $\beta_i$. If your software permits, remove the fixed intercept and use all item dummies. Finally, $\theta_p$ represents the latent score for each individual $p$.

To demonstrate, I use the HolzingerSwineford dataset. There are 9 items, items 1 to 3 load on factor 1, 4 to 6 on factor 2, and 7 to 9 on factor 3.

```
library(lavaan)
dat <- HolzingerSwineford1939
dat$sex <- dat$sex - 1 # Who is 1, no idea
dat$grade <- dat$grade - min(dat$grade, na.rm = TRUE)
dat$ID <- 1:nrow(dat)
# Make data long
dat.l <- tidyr::gather(dat, item, score, x1:x9)
dat.l$item.no <- as.integer(gsub("x", "", dat.l$item))
library(lme4)
lmer(score ~ 0 + factor(item.no) + (1 | ID), dat.l, REML = FALSE)
# Random effects:
# Groups Name Std.Dev.
# ID (Intercept) 0.5758
# Residual 0.9694
# Number of obs: 2709, groups: ID, 301
```

The model above fit with ML not REML is identical to a unidimensional CFA where we constrain all the item loadings to be the same value and the item errors to be the same value. The random intercept standard deviation, $\alpha$, is the factor loading for all items and $\sigma^2$ is the item error variance for all items. To get the standardized loading which we will call $\lambda$, use:

$$\lambda = \frac{\alpha}{\sqrt{\alpha^2+\sigma^2}} = \frac{0.5758}{\sqrt{0.5758^2+0.9694^2}}=0.5107$$

The lavaan model below should confirm this to be true. Note that in the lavaan syntax, the factor is standardized to have variance of 1 using `std.lv = TRUE`

. The item dummies (not shown) from lme4 are also the CFA item intercepts.

```
parameterEstimates(sem(
"F1 =~ a * x1 + a * x2 + a * x3 + a * x4 + a * x5 + a * x6 + a * x7 + a * x8 + a * x9\n
x1 ~~ f * x1\nx2 ~~ f * x2\nx3 ~~ f * x3\nx4 ~~ f * x4\n
x5 ~~ f * x5\nx6 ~~ f * x6\nx7 ~~ f * x7\nx8 ~~ f * x8\nx9 ~~ f * x9",
dat, std.lv = TRUE
), standardized = TRUE)[c(1:2, 10:11), c(1:5, 12)]
# lhs op rhs label est std.all
# 1 F1 =~ x1 a 0.576 0.511
# 2 F1 =~ x2 a 0.576 0.511
# 10 x1 ~~ x1 f 0.940 0.739
# 11 x2 ~~ x2 f 0.940 0.739
```

The standardized loading has a similar formula to the (residual) intraclass correlation coefficient (ICC) which is $\frac{\alpha^2}{\alpha^2+\sigma^2}$. So we have that $\lambda=\sqrt{\mathrm{ICC}}$. This is in line with thinking of standardized loadings as correlation coefficients and ICCs as $R^2$.

Now, everyone familiar with CFA knows we almost never constrain all item loadings and item errors to have the same value (for what it’s worth, this is what we do in the Rasch model with binary outcomes). But we will maintain the *CFA as regression* for as long as possible. Let us extend the model to include multiple factors. To include multiple factors, we create an indicator column in long form that uniquely identifies the factor an item belongs to. And instead of using a single random intercept, we use the factor dummies as random slopes without the random intercept. In this model, we assume:

$$ y_{pi} = \mathbf{\theta}_{pf} + \beta_i + \epsilon $$

where $\mathbf{\theta}_{pf}$ instead of being the same at all times for any individual now also depends on the specific factor $f$. Precisely, we assume the latent scores are multivariate normal, $\mathbf{\theta}_{pf}\sim\mathcal{N}_f(\mathbf{0,\Sigma})$, where $f$ is the number of factors, and $\mathbf{\Sigma}$ represents the covariance matrix of the random slopes. The square root of the diagonal of this covariance matrix (i.e. standard deviations of random slopes) will again be the unstandardized factor loadings, $\alpha_f$. The off-diagonal covariances when standardized (i.e. correlations) will be the interfactor correlations. We can again fit this model in `lme4`

:

```
# Assign item to factors
dat.l$Fs <- ((dat.l$item.no - 1) %/% 3) + 1
lmer(score ~ 0 + factor(item) + (0 + factor(Fs) | ID), dat.l, REML = FALSE)
# Random effects:
# Groups Name Std.Dev. Corr
# ID factor(Fs)1 0.7465
# factor(Fs)2 0.9630 0.41
# factor(Fs)3 0.6729 0.38 0.30
# Residual 0.7909
```

The corresponding lavaan model is:

```
parameterEstimates(sem(
"F1 =~ a * x1 + a * x2 + a * x3\nF2 =~ b * x4 + b * x5 + b * x6\nF3 =~ c * x7 + c * x8 + c * x9\n
x1 ~~ f * x1\nx2 ~~ f * x2\nx3 ~~ f * x3\nx4 ~~ f * x4\nx5 ~~ f * x5\n
x6 ~~ f * x6\nx7 ~~ f * x7\nx8 ~~ f * x8\nx9 ~~ f * x9",
dat, std.lv = TRUE
), standardized = TRUE)[c(1:10, 22:24), c(1:5, 12)]
# lhs op rhs label est std.all
# 1 F1 =~ x1 a 0.746 0.686
# 2 F1 =~ x2 a 0.746 0.686
# 3 F1 =~ x3 a 0.746 0.686
# 4 F2 =~ x4 b 0.963 0.773
# 5 F2 =~ x5 b 0.963 0.773
# 6 F2 =~ x6 b 0.963 0.773
# 7 F3 =~ x7 c 0.673 0.648
# 8 F3 =~ x8 c 0.673 0.648
# 9 F3 =~ x9 c 0.673 0.648
# 10 x1 ~~ x1 f 0.626 0.529
# 22 F1 ~~ F2 0.407 0.407
# 23 F1 ~~ F3 0.385 0.385
# 24 F2 ~~ F3 0.301 0.301
```

We see that the factor loadings in CFA are the random slope standard deviations from multilevel. And the interfactor correlation matrix matches the random slope correlations from multilevel.

The final model we can fit with some, not all, multilevel regression software is:

$$ y_{pi} = \mathbf{\theta}_{pf} + \beta_i + \epsilon_i $$

where we permit the residual error, $\epsilon_i$, to be different depending on the item. This is a heteroskedastic regression model. One would have to switch to from `lme4`

to `nlme`

or `glmmTMB`

in R to estimate it; it should be possible in HLM. In `lavaan`

, the model syntax would be:

```
# Drop the error variance constraints
"F1 =~ a * X1 + a * X2 + a * X3\nF2 =~ b * X4 + b * X5 + b * X6\nF3 =~c * X7 + c * X8 + c * X9"
```

The latest model is very close to a standard CFA model. The final change is that we need to permit the item loadings to vary by item, not by factor. Once we do this, we can no longer fit the model with multilevel regression software. The equation becomes:

$$ y_{pi} = \alpha_i\theta_{pf} + \beta_i + \epsilon_i $$

$\mathbf{\theta}_{pf}$ is still multivariate normal, but with variance $\mathbf{1},\ \mathbf{\theta}_{pf}\sim\mathcal{N}_f(\mathbf{0,1})$. The most interesting inclusion is that we multiply the latent scores by an item coefficient, $\alpha_i$.^{1} Note that $\alpha$ consistently represents the unstandardized loadings in this demonstration. Here, instead of being constant or varying by factor, it varies by item.

Bayesian software can fit a complicated model like this. We’d have to specify priors for the different components of this equation. Page 144 of the Stan manual provides some priors for the 2PL item-response theory model that we can apply here as a starting point:

- loadings will be log-normal constraining them to be positive, $\alpha_i\sim\mathrm{Lognormal}(0,\sigma_\alpha)$. For $\sigma_\alpha$, we can assume it’s half-Cauchy, $\sigma_\alpha\sim C^{+}(0, 2.5)$.
- there is usually enough information to estimate item intercepts/difficulties, we can use a uniform prior
- item errors can be assumed to be inverse gamma, $\epsilon_i\sim\mathrm{InvGamma}(1,1)$.
- the latent scores are multivariate normal, $\mathbf{\theta}_{pf}\sim\mathcal{N}_f(\mathbf{0,1})$

The one point requiring additional comments is the multivariate normal prior for $\mathbf{\theta}_{pf}$. For a positive-definite correlation matrix, $\mathbf{R}$, we can apply the Cholesky decomposition: $\mathbf{R} = \mathbf{L}\mathbf{L}^*$. In Stan, it is computationally more convenient to apply a prior on $\mathbf{L}$ and generate the multivariate normal latent scores using $\mathbf{L}$. The recommended choice of prior for $\mathbf{L}$ is the LKJ-prior.^{2}

In Stan syntax, the required data are:

```
data {
real g_alpha; // for inverse gamma
real g_beta; // for inverse gamma
int<lower = 0> N; // scalar, number of person times number of items
int<lower = 0> Ni; // scalar, number of items
int<lower = 0> Np; // scalar, number of persons
int<lower = 0> Nf; // scalar, number of factors
vector[N] response; // vector, long form of item responses
// all remaining entries are data in long form
// with consecutive integers beginning at 1 acting as unique identifiers
int<lower = 1, upper = Ni> items[N];
int<lower = 1, upper = Np> persons[N];
int<lower = 1, upper = Nf> factors[N];
}
```

The estimated parameters are:

```
parameters {
vector<lower = 0>[Ni] item_vars; // item vars heteroskedastic
real<lower = 0> sigma_alpha; // sd of loadings, hyperparm
vector<lower = 0>[Ni] alphas; // loadings
vector[Ni] betas; // item intercepts, default uniform prior
vector[Nf] thetas[Np]; // person scores for each factor
cholesky_factor_corr[Nf] L; // Cholesky decomp of corr mat of random slopes
}
```

We need some transformed parameters to capture the mean and variance of the item responses. This is where we provide the regression equation verbatim:

```
transformed parameters {
vector[N] yhat;
vector[N] item_sds_i;
for (i in 1:N) {
yhat[i] = alphas[items[i]] * thetas[persons[i], factors[i]] + betas[items[i]];
item_sds_i[i] = sqrt(item_vars[items[i]]);
}
}
```

And for the priors:

```
model {
vector[Nf] A = rep_vector(1, Nf); // Vector of random slope variances
matrix[Nf, Nf] A0;
L ~ lkj_corr_cholesky(Nf);
A0 = diag_pre_multiply(A, L);
thetas ~ multi_normal_cholesky(rep_vector(0, Nf), A0);
alphas ~ lognormal(0, sigma_alpha);
sigma_alpha ~ cauchy(0, 2.5); // hyperparm
item_vars ~ inv_gamma(g_alpha, g_beta);
response ~ normal(yhat, item_sds_i);
}
```

Finally, we can compute the standardized loadings and the interfactor correlation matrix, R:

```
generated quantities {
vector<lower = 0>[Ni] loadings_std; // obtain loadings_std
matrix[Nf, Nf] R;
R = tcrossprod(L);
for (i in 1:Ni) {
loadings_std[i] = alphas[i] / sqrt(square(alphas[i]) + item_vars[i]);
}
}
```

We could make a few modifications:

- We can standardize the item responses prior to modeling to improve computational stability
- Then apply a normal prior to the item intercepts

The syntax to run the model would then be:

```
# First, let's fit the model in lavaan:
cfa.lav.fit <- sem(
"F1 =~ x1 + x2 + x3\nF2 =~ x4 + x5 + x6\nF3 =~ x7 + x8 + x9",
dat, std.lv = TRUE, meanstructure = TRUE)
library(rstan)
options(mc.cores = parallel::detectCores()) # Use multiple cores
rstan_options(auto_write = TRUE) # One time Stan program compilation
cfa.mm <- stan_model(stanc_ret = stanc(file = "bayes_script/cfa.stan")) # Compile Stan code
cfa.stan.fit <- sampling(
cfa.mm, data = list(
N = nrow(dat.l), Ni = length(unique(dat.l$item)),
Np = length(unique(dat.l$ID)), Nf = length(unique(dat.l$Fs)),
items = dat.l$item.no, factors = dat.l$Fs,
persons = dat.l$ID, response = dat.l$score,
g_alpha = 1, g_beta = 1),
control = list(adapt_delta = .99999, max_treedepth = 15))
```

What are some loadings?

```
print(cfa.stan.fit, pars = c("alphas", "loadings_std"),
probs = c(.025, .5, .975), digits_summary = 3)
# mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
# alphas[1] 0.889 0.003 0.078 0.733 0.890 1.041 790 1.002
# alphas[4] 0.991 0.002 0.056 0.885 0.988 1.101 1263 1.002
# alphas[5] 1.102 0.002 0.062 0.980 1.102 1.224 1056 1.001
# alphas[9] 0.692 0.003 0.075 0.548 0.692 0.846 799 1.005
# loadings_std[1] 0.751 0.002 0.052 0.643 0.752 0.848 601 1.003
# loadings_std[4] 0.848 0.001 0.023 0.801 0.849 0.890 1275 1.003
# loadings_std[5] 0.851 0.001 0.023 0.803 0.852 0.891 1176 1.001
# loadings_std[9] 0.672 0.003 0.059 0.552 0.673 0.786 556 1.007
# For comparison, the lavaan loadings are:
parameterEstimates(cfa.lav.fit, standardized = TRUE)[1:9, c(1:5, 11)]
# lhs op rhs est se std.all
# 1 F1 =~ x1 0.900 0.081 0.772
# 4 F2 =~ x4 0.990 0.057 0.852
# 5 F2 =~ x5 1.102 0.063 0.855
# 9 F3 =~ x9 0.670 0.065 0.665
```

For the interfactor correlations:

```
print(cfa.stan.fit, pars = c("R[1, 2]", "R[1, 3]", "R[2, 3]"),
probs = c(.025, .5, .975), digits_summary = 3)
# mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
# R[1,2] 0.435 0.001 0.065 0.303 0.437 0.557 2019 0.999
# R[1,3] 0.451 0.003 0.081 0.289 0.450 0.607 733 1.005
# R[2,3] 0.271 0.001 0.071 0.130 0.272 0.406 2599 1.000
# From lavaan:
parameterEstimates(cfa.lav.fit, standardized = TRUE)[22:24, c(1:5, 11)]
# lhs op rhs est se std.all
# 22 F1 ~~ F2 0.459 0.064 0.459
# 23 F1 ~~ F3 0.471 0.073 0.471
# 24 F2 ~~ F3 0.283 0.069 0.283
```

For the error variances:

```
print(cfa.stan.fit, pars = c("item_vars"),
probs = c(.025, .5, .975), digits_summary = 3)
# mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
# item_vars[3] 0.829 0.003 0.095 0.652 0.828 1.026 1292 1.000
# item_vars[4] 0.383 0.001 0.049 0.292 0.381 0.481 1552 1.002
# item_vars[5] 0.459 0.001 0.059 0.351 0.456 0.581 1577 1.001
# item_vars[9] 0.575 0.004 0.085 0.410 0.575 0.739 532 1.008
# From lavaan:
parameterEstimates(cfa.lav.fit, standardized = TRUE)[10:18, 1:5]
# lhs op rhs est se
# 12 x3 ~~ x3 0.844 0.091
# 13 x4 ~~ x4 0.371 0.048
# 14 x5 ~~ x5 0.446 0.058
# 18 x9 ~~ x9 0.566 0.071
```

For the item intercepts:

```
print(cfa.stan.fit, pars = c("betas"),
probs = c(.025, .5, .975), digits_summary = 3)
# mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
# betas[2] 6.087 0.001 0.068 5.954 6.089 6.219 2540 1.001
# betas[3] 2.248 0.001 0.066 2.122 2.248 2.381 1980 1.002
# betas[6] 2.182 0.003 0.063 2.058 2.182 2.302 625 1.008
# betas[7] 4.185 0.002 0.066 4.054 4.186 4.315 1791 1.001
# From lavaan:
parameterEstimates(cfa.lav.fit, standardized = TRUE)[25:33, 1:5]
# lhs op rhs est se
# 26 x2 ~1 6.088 0.068
# 27 x3 ~1 2.250 0.065
# 30 x6 ~1 2.186 0.063
# 31 x7 ~1 4.186 0.063
```

So we are able to replicate the results from lavaan. From here, you can extend the model in interesting ways to arrive at other results.

For example, if you want to do a regression of the factors, you can use the posterior of the correlation matrix and the `solve()`

function to arrive at the coefficients for the factors in the regression. Here, I regress factor 1 on both factors 2 and 3:

```
R <- extract(cfa.stan.fit, c("R[1, 2]", "R[1, 3]", "R[2, 3]"))
R <- cbind(R$`R[1,2]`, R$`R[1,3]`, R$`R[2,3]`)
coefs <- matrix(NA, nrow(R), ncol(R) - 1)
for (i in 1:nrow(R)) {
m <- matrix(c(1, R[i, 3], R[i, 3], 1), 2, 2)
coefs[i, ] <- solve(m, R[i, 1:2])
}; rm(i, m)
t(apply(coefs, 2, function (x) {
c(estimate = mean(x), sd = sd(x), quantile(x, c(.025, .25, .5, .75, .975)))
}))
# estimate sd 2.5% 25% 50% 75% 97.5%
# [1,] 0.3362981 0.07248634 0.1918812 0.2877936 0.3387682 0.3875141 0.4725508
# [2,] 0.3605951 0.08466494 0.1996710 0.3027047 0.3594806 0.4164141 0.5308578
```

- This makes the model non-linear in its parameters showing why software for generalized linear mixed models cannot fit this model. ↩
- I’m guessing older software manuals may recommend the inverse-Wishart prior for $\mathbf{\Sigma}$. But see HERE as starting point for comments on LKJ-prior. As a funny side note, LKJ is an acronym for the researchers who recommended this approach, the last author is Harry, last name, Joe. And he developed an earlier form of the method. So in the paper by LKJ, you see references to
*Joe’s method*which seems out of place for an academic paper :). ↩

Comments powered by Talkyard.