Misclassification multi-state models in msmbayes
Christopher Jackson chris.jackson@mrc-bsu.cam.ac.uk
2025-08-28
Source: vignettes/misclass.Rmd
In a hidden Markov model, an individual moves between a set of latent, unobserved states according to a Markov process. The observed data are generated conditionally on the latent states.
Multi-state models with misclassification
Hidden Markov models can be used to account for misclassification of states in a multi-state model.
To fit a misclassification multi-state model in msmbayes, the structure of allowed misclassifications is supplied in the E argument (the “e” stands for “emission”).
This is a matrix with off-diagonal entries:
1 if the true state [row number] can be misclassified as state [column number]
0 if the true state [row number] cannot be misclassified as state [column number]
The diagonal entries of E are ignored (as for the Q argument).
The following example is discussed in the msm user guide (Section 2.14). We model progression between three states of CAV (cardiac allograft vasculopathy, a disease experienced by heart transplant recipients), and allow death from any of these states. True state 1 can be misclassified as 2, true state 2 can be misclassified as 1 or 3, and true state 3 can be misclassified as 2.
For speed in this demo, we use Stan’s "optimize" method, which uses a simple normal approximation to the posterior. MCMC would probably be more sensible in a real application.
library(msmbayes)
library(msm)
Qcav <- rbind(c(0, 1, 0, 1),
              c(0, 0, 1, 1),
              c(0, 0, 0, 1),
              c(0, 0, 0, 0))
Ecav <- rbind(c(0, 1, 0, 0),
              c(1, 0, 1, 0),
              c(0, 1, 0, 0),
              c(0, 0, 0, 0))
draws <- msmbayes(data=cav, state="state", time="years", subject="PTNUM",
                  Q=Qcav, E=Ecav, fit_method="optimize")
qmatrix(draws)
## rvar<4000>[4,4] mean ± sd:
## [,1] [,2] [,3] [,4]
## [1,] -0.143 ± 0.0087 0.096 ± 0.0078 0.000 ± 0.0000 0.047 ± 0.0048
## [2,] 0.000 ± 0.0000 -0.279 ± 0.0288 0.214 ± 0.0304 0.065 ± 0.0233
## [3,] 0.000 ± 0.0000 0.000 ± 0.0000 -0.369 ± 0.0490 0.369 ± 0.0490
## [4,] 0.000 ± 0.0000 0.000 ± 0.0000 0.000 ± 0.0000 0.000 ± 0.0000
The function edf extracts the misclassification (or “emission”) probabilities in tidy data frame form.
edf(draws)
## # A tibble: 4 × 4
## from to posterior mode
## <int> <int> <rvar[1d]> <dbl>
## 1 1 2 0.014 ± 0.0041 0.0134
## 2 2 1 0.228 ± 0.0367 0.226
## 3 2 3 0.065 ± 0.0160 0.0629
## 4 3 2 0.140 ± 0.0395 0.136
An identical non-Bayesian model can be fitted using msm(). Note: this is different from the model fitted in the msm manual, since “exact death times” are not supported in msmbayes. Also note that msm requires informative initial values for the non-zero intensities and misclassification probabilities here. For hidden Markov models, msm is not smart enough to determine good initial values automatically given the transition structure.
Qcav <- rbind(c(0, 0.148, 0, 0.0171),
              c(0, 0, 0.202, 0.081),
              c(0, 0, 0, 0.126),
              c(0, 0, 0, 0))
Ecav <- rbind(c(0, 0.1, 0, 0),
              c(0.1, 0, 0.1, 0),
              c(0, 0.1, 0, 0),
              c(0, 0, 0, 0))
cav.msm <- msm(state ~ years, subject=PTNUM, data=cav, qmatrix=Qcav, ematrix=Ecav)
qmatrix.msm(cav.msm, ci="none")
## State 1 State 2 State 3 State 4
## State 1 -0.1452746 0.09855669 0.0000000 0.04671790
## State 2 0.0000000 -0.26352026 0.2013230 0.06219724
## State 3 0.0000000 0.00000000 -0.3671024 0.36710242
## State 4 0.0000000 0.00000000 0.0000000 0.00000000
ematrix.msm(cav.msm, ci="none")
## State 1 State 2 State 3 State 4
## State 1 0.9919222 0.008077811 0.00000000 0
## State 2 0.2379674 0.710858571 0.05117404 0
## State 3 0.0000000 0.112813389 0.88718661 0
## State 4 0.0000000 0.000000000 0.00000000 1
The parameter estimates from msm are close to those from msmbayes, with any differences explainable by the influence of the weak prior.
Specifying prior distributions for misclassification probabilities
In msmbayes, normal prior distributions are assumed for the log odds of misclassification. Denote by \(e_{rs}\) the misclassification error probability: the probability that an individual in true state \(r\) is observed in state \(s\). The corresponding log odds of misclassification is \(\log(e_{rs} / e_{rr})\), the log odds of being misclassified in state \(s\), relative to no misclassification.
The default normal(0,1) prior for these log odds parameters is intended to give a roughly uniform distribution on the scale of probabilities.
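As a rough check (not part of the original example), the probabilities implied by this default can be inspected by transforming the normal quantiles, in the simple case where a state has only one potential misclassification, so that \(e_{rs}\) is the inverse logit of the log odds:

```r
## Implied prior quantiles for e_rs when true state r has a single
## potential misclassification s, so that e_rs = plogis(loe(r,s)).
## Under the default loe(r,s) ~ normal(0, 1) prior:
plogis(qnorm(c(0.025, 0.5, 0.975), mean = 0, sd = 1))
## approximately 0.12, 0.50, 0.88
```

This central 95% interval covers most of (0, 1), consistent with the intention of a weakly-informative default.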
To specify the mean and SD of these normal priors by hand, use msmprior as follows, with loe(r,s) indicating the log odds of misclassification of state \(r\) as state \(s\).
priors <- list(msmprior("loe(1,2)", mean=-2, sd=0.2))
draws_prior <- msmbayes(data=cav, state="state", time="years", subject="PTNUM",
                        Q=Qcav, E=Ecav, fit_method="optimize", priors=priors)
edf(draws_prior)
## # A tibble: 4 × 4
## from to posterior mode
## <int> <int> <rvar[1d]> <dbl>
## 1 1 2 0.039 ± 0.0043 0.0383
## 2 2 1 0.165 ± 0.0328 0.163
## 3 2 3 0.080 ± 0.0192 0.0773
## 4 3 2 0.135 ± 0.0383 0.130
If there is only one potential misclassification \(s\) for some true state \(r\), then the log odds of misclassification is just the standard logit of \(e_{rs}\). In the above model, the prior median and 95% credible interval implied by the normal(-2, 0.2) prior can be deduced by taking the inverse logit of the corresponding normal quantiles. This prior is fairly tight around a misclassification probability of 0.1, and appears to have the effect of pulling the posterior away from the value estimated from the data.
plogis(qnorm(c(0.025, 0.5, 0.975), mean=-2, sd=0.2))
## [1] 0.08378533 0.11920292 0.16686547
With multiple misclassification possibilities per true state, a multinomial logit transform is needed. To deduce the prior beliefs about probabilities implied by a particular prior mean and SD, a simple approach is to use simulation. For a particular true state \(r\), simulate from the normal priors for all potential observed states \(s\), then use an inverse multinomial logit transform to deduce the corresponding sample for the set of \(e_{rs}\), which satisfies \(\sum_s e_{rs} = 1\).
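For example, for true state 2 in the CAV model, which can be misclassified as 1 or 3, the simulation approach might look like this (a sketch, not part of the original example; the variable names are illustrative):

```r
## Simulate the prior on the misclassification probabilities for true
## state 2.  Each log odds loe(2,s) = log(e_2s / e_22) gets an
## independent normal(0, 1) prior; the inverse multinomial logit
## transform recovers the probabilities, which sum to 1 over s.
set.seed(1)
n <- 10000
loe21 <- rnorm(n, 0, 1)   # prior draws for log(e_21 / e_22)
loe23 <- rnorm(n, 0, 1)   # prior draws for log(e_23 / e_22)
denom <- 1 + exp(loe21) + exp(loe23)
e21 <- exp(loe21) / denom
e22 <- 1 / denom          # probability of correct classification
e23 <- exp(loe23) / denom
## e21 + e22 + e23 = 1 by construction; summarise the implied prior:
quantile(e21, c(0.025, 0.5, 0.975))
```

The same simulation can be repeated with candidate prior means and SDs until the implied quantiles of the \(e_{rs}\) match the available background knowledge.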
Fixed misclassification probabilities
Misclassification error probabilities in multi-state models for intermittently-observed data are often not identifiable from the data. Typically, background information about the observation process is needed. If there is good evidence about the error probabilities, it may be sufficient to fix them at constant values. In msmbayes, this can be done with the Efix argument. This is a matrix matching the dimensions of E, but with any fixed error probabilities supplied in the appropriate places, and zero elsewhere. The following model fixes the probability that true state 1 is misclassified as 2 to 0.1.
Efix <- rbind(c(0, 0.1, 0, 0),
              c(0, 0, 0, 0),
              c(0, 0, 0, 0),
              c(0, 0, 0, 0))
draws_fix <- msmbayes(data=cav, state="state", time="years", subject="PTNUM",
                      Q=Qcav, E=Ecav, Efix=Efix, fit_method="optimize")
Using a prior is a compromise between fixing these parameters and attempting to identify them from the data. An advantage of the Bayesian approach is that, as long as the computational algorithm works, we have a valid posterior. If the marginal posterior for a parameter is then the same as its prior, we can deduce that there is no information in the data about that particular parameter. As long as the prior is defensible, we still have a useful model for the data.
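One way to carry out this check is to compare draws from the marginal posterior with draws simulated from the prior. A sketch, assuming the draws_prior fit from above and using draws_of() from the posterior package (which msmbayes builds on) to extract draws from the rvar column returned by edf():

```r
## Compare the marginal posterior of e_12 with its prior.  e_12 has a
## single potential misclassification, so its prior is
## plogis(loe) with loe ~ normal(-2, 0.2).
library(posterior)
e12_post  <- as.vector(draws_of(edf(draws_prior)$posterior[1]))
e12_prior <- plogis(rnorm(length(e12_post), mean = -2, sd = 0.2))
## If these summaries are similar, the data carry little information
## about e_12 beyond the prior.
quantile(e12_post,  c(0.025, 0.5, 0.975))
quantile(e12_prior, c(0.025, 0.5, 0.975))
```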