I have read that the most common model for analyzing ordinal data is the cumulative link logistic model, coupled with the proportional odds assumption. Essentially, you treat the outcome as if it were the categorical manifestation of a continuous latent variable. The predictor variables of this outcome influence it in one way only, so you get a single regression coefficient for each predictor. But the model has several intercepts representing the points at which the variable was cut to create the observed categorical manifestation.
That each predictor variable influences the outcome in one way  as you have in commonplace regression models  is the proportional odds assumption or constraint. Alternatively, one can have each predictor variable influence the outcome differently at each cutpoint. This would be relaxing the proportional odds constraint.
In Agresti’s Categorical Data Analysis,^{1} he writes, “McCullagh (1980) and Thompson and Baker (1981) treated cumulative link models as multivariate GLMs” when discussing estimation for cumulative link models. To unpack this claim, I did some extra reading and it’s relative easy to see how. Consider an ordinal variable that has 4 levels: 1, 2, 3, 4. This ordinal variable can be expressed as three binary variables. And there are a variety of ways to do this, I will use the cumulative link parameterization as an example.
Since there are 4 levels, there are three transitions (aka thresholds, cutpoints): from 1 to 2, 2 to 3 and 3 to 4. Using the cumulative link parameterization, we generate a new variable per transition point as shown in the table below.
$$\begin{array}{cccc} & 1  2 & 2  3 & 3  4\\ \hline 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 1 & 1 & 0 \\ 4 & 1 & 1 & 1\end{array}$$
A 1 is less than all the cutpoints, so becomes three zeros; a 4 exceeds all the cutpoints and becomes three ones. Given 4 levels, each respondent now has three responses, one per cutpoint, hence the description of this ordinal model as a multivariate GLM. See the bottom of this page for the adjacent categories parameterization.
How does one model this using univariate GLM software? The UCLA idre page has this article on multivariate random coefficient models. It’s relevant here because they use nlme
(a univariate linear mixed models software) to model a multivariate outcome. The essential idea is in stacking the data, making it a kind of repeated measures, but finding a way to signal to the software that the outcomes are different, requiring different intercepts and slopes for the predictors.
So what we do is we transform the data from wide to long, model it as a regular binomial, but we need to tell the model to estimate a different intercept for each level. If we want proportional odds, then we can estimate a single coefficient for each predictor. If not, we can do otherwise using interaction terms as we will see. The unique challenge is the outcomes are correlated, so we need a model that can model this correlation.^{2} For this, I use general estimating equations (GEE) with an unstructured
working correlation structure.^{3}
Demonstration
library(ordinal) # For ordinal regression to check our results
library(geepack) # For GEE with binary data
I’ll use the soup
dataset from the ordinal package.
soup < ordinal::soup
soup$ID < 1:nrow(soup) # Create a person ID variable
str(soup)
'data.frame': 1847 obs. of 13 variables:
$ RESP : Factor w/ 185 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ PROD : Factor w/ 2 levels "Ref","Test": 1 2 1 2 1 2 2 2 2 1 ...
$ PRODID : Factor w/ 6 levels "1","2","3","4",..: 1 2 1 3 1 6 2 4 5 1 ...
$ SURENESS: Ord.factor w/ 6 levels "1"<"2"<"3"<"4"<..: 6 5 5 6 5 5 2 5 5 2 ...
$ DAY : Factor w/ 2 levels "1","2": 1 1 1 1 2 2 2 2 2 2 ...
$ SOUPTYPE: Factor w/ 3 levels "Selfmade","Canned",..: 2 2 2 2 2 2 2 2 2 2 ...
$ SOUPFREQ: Factor w/ 3 levels ">1/week","14/month",..: 1 1 1 1 1 1 1 1 1 1 ...
$ COLD : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
$ EASY : Factor w/ 10 levels "1","2","3","4",..: 7 7 7 7 7 7 7 7 7 7 ...
$ GENDER : Factor w/ 2 levels "Male","Female": 2 2 2 2 2 2 2 2 2 2 ...
$ AGEGROUP: Factor w/ 4 levels "1830","3140",..: 4 4 4 4 4 4 4 4 4 4 ...
$ LOCATION: Factor w/ 3 levels "Region 1","Region 2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
I’ll use the SURENESS
variable. It has 6 levels. I’ll model it with the DAY
and GENDER
variables.
# Select variables to work with
soup < dplyr::select(soup, ID, SURENESS, DAY, GENDER)
# I like dummy variables with recognizable names
soup$girl < ifelse(soup$GENDER == "Female", 1, 0) # Make male reference group
soup$day2 < ifelse(soup$DAY == "2", 1, 0) # Make day 1 reference group
The next step is to transform the ordinal outcome into 5 outcomes representing each of the thresholds:
soup$SURE2 < ifelse(as.numeric(as.character(soup$SURENESS)) >= 2, 1, 0)
soup$SURE3 < ifelse(as.numeric(as.character(soup$SURENESS)) >= 3, 1, 0)
soup$SURE4 < ifelse(as.numeric(as.character(soup$SURENESS)) >= 4, 1, 0)
soup$SURE5 < ifelse(as.numeric(as.character(soup$SURENESS)) >= 5, 1, 0)
soup$SURE6 < ifelse(as.numeric(as.character(soup$SURENESS)) >= 6, 1, 0)
With that done, we’re ready for the wide to long transformation of these 5 new outcome variables.
soup.long < tidyr::gather(soup, SURE, VAL, SURE2:SURE6) # New long data frame
soup.long < soup.long[order(soup.long$ID), ] # Order the data by person
head(soup.long$SURE) # Currently looks like:
[1] "SURE2" "SURE3" "SURE4" "SURE5" "SURE6" "SURE2"
# I only want 2, 3, 4, ... the final number
soup.long$SURE < as.numeric(substr(soup.long$SURE, 5, 5))
soup.long$SURE.f < factor(soup.long$SURE) # Create a factor variable
head(soup.long) # Let's look at the data
ID SURENESS DAY GENDER girl day2 SURE VAL SURE.f
1 1 6 1 Female 1 0 2 1 2
1848 1 6 1 Female 1 0 3 1 3
3695 1 6 1 Female 1 0 4 1 4
5542 1 6 1 Female 1 0 5 1 5
7389 1 6 1 Female 1 0 6 1 6
2 2 5 1 Female 1 0 2 1 2
Person 1 is a girl with day 1. Her response was a 6, hence, all her VAL
scores are 1. She is like 4 was in the transformation table above. Let’s look at someone who did not select the highest response category:
# First 5 rows in data with a four, since data ordered by person, returns an individual's record
soup.long[soup.long$SURENESS == 4, ][1:5, ]
ID SURENESS DAY GENDER girl day2 SURE VAL SURE.f
22 22 4 1 Female 1 0 2 1 2
1869 22 4 1 Female 1 0 3 1 3
3716 22 4 1 Female 1 0 4 1 4
5563 22 4 1 Female 1 0 5 0 5
7410 22 4 1 Female 1 0 6 0 6
This individual selected a SURENESS
of 4. Her first three VAL
scores are 1 because she cleared the first three thresholds. Her final two scores are 0 because 4 is less than the 4  5 threshold and 5  6 threshold.
The next step is to create dummy variables for the thresholds. These variables will be used to represent the intercept in our models.
# Taking a short cut to create dummy variables.
soup.long < cbind(soup.long, model.matrix(
~ 0 + SURE.f, data = soup.long,
contrasts.arg = list(SURE.f = "contr.treatment")) * 1)
Note that I multiplied the dummy variables by 1. In ordinal regression, doing this makes interpretation easier and is the default option in the ordinal
package. In summary, it ensures that positive coefficients increase the odds of moving from a lower category, say 3, to a higher category, 4, or responding to a higher response category.
And now, we are ready to run the models. We use GEE, remove the intercept and place a dummy variable for each threshold, and the two coefficients for day2
and girl
. The correlation structure is unstructured
.
pom.bin < geeglm( # Prop odds model
VAL ~ 0 + SURE.f2 + SURE.f3 + SURE.f4 + SURE.f5 + SURE.f6 + girl + day2,
id = ID, data = soup.long, family = binomial, corstr = "unstructured"
)
Next, I estimate the model using standard ordinal regression software:
pom.ord < clm(SURENESS ~ girl + day2, data = soup)
Let’s compare coefficients and standard errors:
round(cbind(coef(summary(pom.bin)), coef(summary(pom.ord)))[, c(1, 5, 2, 6, 3, 7, 4, 8)], 5)
Estimate Estimate.1 Std.err Std. Error Wald z value Pr(>W) Pr(>z)
SURE.f2 2.13244 2.13155 0.10454 0.10450 416.0946 20.3971 0.0000 0.0000
SURE.f3 1.19345 1.19259 0.09142 0.09232 170.4284 12.9179 0.0000 0.0000
SURE.f4 0.89164 0.89079 0.08979 0.09011 98.5995 9.8857 0.0000 0.0000
SURE.f5 0.65782 0.65697 0.08945 0.08898 54.0791 7.3833 0.0000 0.0000
SURE.f6 0.04558 0.04477 0.08801 0.08789 0.2682 0.5093 0.6046 0.6105
girl 0.04932 0.04917 0.09036 0.09074 0.2980 0.5419 0.5851 0.5879
day2 0.26172 0.26037 0.08584 0.08579 9.2954 3.0351 0.0023 0.0024
Placing the results side by side, one can see that they are very close.
However, estimation using glm()
which cannot model dependencies between the outcomes of a person yields different results. For demonstration:
round(coef(summary(glm(
VAL ~ 0 + SURE.f2 + SURE.f3 + SURE.f4 + SURE.f5 + SURE.f6 + girl + day2,
data = soup.long, family = binomial
))), 5)
Estimate Std. Error z value Pr(>z)
SURE.f2 2.15144 0.08255 26.062 0.0000
SURE.f3 1.21271 0.06736 18.004 0.0000
SURE.f4 0.91149 0.06472 14.084 0.0000
SURE.f5 0.67782 0.06327 10.713 0.0000
SURE.f6 0.06523 0.06178 1.056 0.2911
girl 0.07326 0.04961 1.477 0.1398
day2 0.26898 0.04653 5.780 0.0000
Both the estimates and standard errors are inadequate.
We can easily relax the proportional odds constraint in the pom.bin
model. Let’s run what some call a partial proportional odds model by relaxing this constraint for the day2
predictor. We do this by estimating an interaction between the threshold dummies and the day2
predictor. Note that there is no effect for the day2
variable itself, it only appears in the interaction terms with the various intercepts.
npom.bin < geeglm( # Partial prop odds, prop odds for girl only
VAL ~ 0 + SURE.f2 + SURE.f3 + SURE.f4 + SURE.f5 + SURE.f6 + girl +
# SURE.f2:girl + SURE.f3:girl + SURE.f4:girl + SURE.f5:girl + SURE.f6:girl +
SURE.f2:day2 + SURE.f3:day2 + SURE.f4:day2 + SURE.f5:day2 + SURE.f6:day2,
id = ID, data = soup.long, family = binomial, corstr = "unstructured"
)
I also run the same model in ordinal
for comparison, using the nominal argument for day2
.
npom.ord < clm( # Using ordinal package for comparison
SURENESS ~ girl, data = soup, nominal = ~ day2)
# Compare results
round(cbind(coef(summary(npom.bin))[c(1:5, 7:11, 6), ],
coef(summary(npom.ord)))[, c(1, 5, 2, 6, 3, 7, 4, 8)], 5)
Estimate Estimate.1 Std.err Std. Error Wald z value Pr(>W) Pr(>z)
SURE.f2 2.02982 2.03106 0.11800 0.11834 295.8986 17.1630 0.00000 0.00000
SURE.f3 1.22087 1.22213 0.09829 0.09857 154.2801 12.3980 0.00000 0.00000
SURE.f4 0.92773 0.92899 0.09458 0.09443 96.2112 9.8375 0.00000 0.00000
SURE.f5 0.65744 0.65870 0.09246 0.09188 50.5554 7.1693 0.00000 0.00000
SURE.f6 0.04733 0.04859 0.08955 0.08965 0.2793 0.5420 0.59714 0.58784
SURE.f2:day2 0.07359 0.07360 0.14148 0.14155 0.2705 0.5199 0.60298 0.60312
SURE.f3:day2 0.31691 0.31697 0.10607 0.10613 8.9270 2.9867 0.00281 0.00282
SURE.f4:day2 0.33301 0.33308 0.09970 0.09973 11.1551 3.3398 0.00084 0.00084
SURE.f5:day2 0.26330 0.26339 0.09618 0.09616 7.4938 2.7391 0.00619 0.00616
SURE.f6:day2 0.26741 0.26748 0.09347 0.09345 8.1842 2.8622 0.00423 0.00421
girl 0.04809 0.04994 0.09048 0.09077 0.2825 0.5502 0.59507 0.58221
The results are again comparable.
We can now compare the partial proportional odds binary model to the proportional odds binary model to test the constraint for the day2
variable. Luckily, geepack
permits anova()
which conducts a Wald test comparing both models:
anova(pom.bin, npom.bin) # Testing partial proportional odds assumption.
Analysis of 'Wald statistic' Table
Model 1 VAL ~ 0 + SURE.f2 + SURE.f3 + SURE.f4 + SURE.f5 + SURE.f6 + girl + SURE.f2:day2 + SURE.f3:day2 + SURE.f4:day2 + SURE.f5:day2 + SURE.f6:day2
Model 2 VAL ~ 0 + SURE.f2 + SURE.f3 + SURE.f4 + SURE.f5 + SURE.f6 + girl + day2
Df X2 P(>Chi)
1 4 6.94 0.14
The difference between both models was not statistically significant, suggesting the proportionality constraint for the day2
variable suffices.
We can conduct the same test in ordinal
either by comparing pom.ord
and npom.ord
models using anova()
, or using the nomimal_test()
function. Both are likelihood ratio tests, which are more adequate than the Wald test for the GEEs above.
anova(pom.ord, npom.ord)
Likelihood ratio tests of cumulative link models:
formula: nominal: link: threshold:
pom.ord SURENESS ~ girl + day2 ~1 logit flexible
npom.ord SURENESS ~ girl ~day2 logit flexible
no.par AIC logLik LR.stat df Pr(>Chisq)
pom.ord 7 5554 2770
npom.ord 11 5555 2766 6.91 4 0.14
nominal_test(pom.ord)
Tests of nominal effects
formula: SURENESS ~ girl + day2
Df logLik AIC LRT Pr(>Chi)
<none> 2770 5554
girl 4 2766 5554 8.02 0.091 .
day2 4 2766 5555 6.91 0.141

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Both tests converge to the same result, and also give the same pvalues for the Wald test comparing the GEE models. However, the Wald\(\chi^2\) test statistic was slightly higher.
Having done this, of course using a package for ordinal data is much easier. However, there may be some advantage to treating the model as binary, but all this was driven by curiosity rather than necessity. For some reason I am yet to figure out, ordinal
returns only one set of fitted probabilities when one attempts to obtain predicted probabilities from the model using the fitted()
function. Ideally, it should return a fitted probability for each threshold. With geepack
, obtaining the predicted probabilities at each level is straightforward using the fitted()
function. However, this advantage is trivial. For one, VGAM
’s vglm()
does not suffer the same problem, and calculating predicted probabilities is a relatively simple task.
Also, if one is familiar with maximum likelihood estimation, one could simply program the likelihood function. It is the product of the probability of responding to each category within each person, followed by the product of these values across persons. If we take the log likelihood instead, we arrive at sums across persons, and within persons.
Syntax for the example above in the proportional odds case would be:
library(stats4) # For maximum likelihood estimation
# The coefficients from glm are useful starting values
# Using them can remove the need for programming monotonicity of thresholds
start.coefs < coef(glm( # Reverse code the thresholds
VAL ~ 0 + SURE.f + girl + day2, binomial, soup.long))
X < cbind(soup$girl, soup$day2) # matrix of predictors
y < (soup$SURENESS) # response variable
inv.logit < function(x) 1 / (1 + exp(x)) # useful within likelihood function
negll < function(a1, a2, a3, a4, a5, bg, bd) {
betas < c(bg, bd) # Reverse coefs
# sums of probabilities, probability of 6 + prob of 5 + ...
ll < sum(log(1  inv.logit(a5 + X %*% betas))[y == 6]) +
sum(log(inv.logit(a5 + X %*% betas)  inv.logit(a4 + X %*% betas))[y == 5]) +
sum(log(inv.logit(a4 + X %*% betas)  inv.logit(a3 + X %*% betas))[y == 4]) +
sum(log(inv.logit(a3 + X %*% betas)  inv.logit(a2 + X %*% betas))[y == 3]) +
sum(log(inv.logit(a2 + X %*% betas)  inv.logit(a1 + X %*% betas))[y == 2]) +
sum(log(inv.logit(a1 + X %*% betas))[y == 1])
ll # return negative log likelihood
}
# Starting values need named list
start.coefs < sapply(start.coefs, list)
names(start.coefs) < c(paste0("a", 1:5), paste0("b", c("g", "d")))
res < mle(negll, start.coefs) # Obtain ML solution
coef(summary(res))
Estimate Std. Error
a1 2.13155603 0.10450286
a2 1.19259266 0.09232077
a3 0.89079068 0.09010891
a4 0.65697671 0.08898063
a5 0.04477565 0.08788869
bg 0.04917604 0.09073602
bd 0.26037369 0.08578617
coef(summary(pom.ord))
Estimate Std. Error z value Pr(>z)
12 2.13155281 0.10450291 20.3970663 1.775532e92
23 1.19259171 0.09232091 12.9178937 3.567748e38
34 0.89078590 0.09010896 9.8856524 4.804418e23
45 0.65697465 0.08898068 7.3833401 1.543671e13
56 0.04476553 0.08788871 0.5093434 6.105115e01
girl 0.04917245 0.09073601 0.5419287 5.878676e01
day2 0.26037360 0.08578617 3.0351465 2.404188e03
The results are pretty similar, and for a more certain way of comparing models, we can always compare loglikelihoods:
logLik(res)
'log Lik.' 2769.784 (df=7)
logLik(pom.ord)
'log Lik.' 2769.784 (df=7)
Adjacent categories parameterization:
$$\begin{array}{cccc} & 1  2 & 2  3 & 3  4\\ \hline 1 & 0 & & \\ 2 & 1 & 0 & \\ 3 & & 1 & 0 \\ 4 & & & 1\end{array}$$

Agresti, A. (2013). Categorical data analysis. WileyInterscience. ↩︎

Under normal circumstances, we would also need to inform the software that the variances are different for each outcome. However this is the binomial regression, and the variance of the latent variable is fixed at \(\frac{pi^2}{3}\). If this were an attempt at multivariate OLS regression, one would have to model the dispersion separately for each variable. I believe the
sformula =
argument in thegeese()
function ingeepack
should make this possible. ↩︎ 
I use an
unstructured
working correlation structure because it allows the estimation of the correlation between each pair of outcomes. In the example here, there are five outcomes, so we would estimate 10 \(\big(\frac{5 \times 4}{2}\big)\) correlations. Other alternatives do not permit this:independence
places 1 on diagonal, and 0 off diagonal and produces the same parameter estimates asglm()
;exchangeable
forces all off diagonal elements to be the same (compound symmetry in ANOVA terminology) and would be overly constraining; andar1
uses a specification that is more adequate for time ordered data. ↩︎
Comments powered by Talkyard.