
Extract the data from a multi-state model fitted with msm.


# S3 method for msm
model.frame(formula, agg = FALSE, ...)

# S3 method for msm
model.matrix(object, model = "intens", state = 1, ...)



formula
A fitted multi-state model object, as returned by msm.


agg
If TRUE, return the model frame in the efficient aggregated form used internally to calculate the likelihood for non-hidden Markov models. This has one row for each unique combination of from-state, to-state, time lag, covariate value and observation type. The variable named "(nocc)" counts how many observations of that combination there are in the original data.


...
Further arguments (not used).


object
A fitted multi-state model object, as returned by msm.


model
"intens" to return the design matrix for covariates on intensities, "misc" for misclassification probabilities, "hmm" for a general hidden Markov model, and "inits" for initial state probabilities in hidden Markov models.


state
State corresponding to the required covariate design matrix in a hidden Markov model.


model.frame returns a data frame with all the original variables used for the model fit, with any missing data removed (see na.action in msm). The state, time, subject, obstype and obstrue variables are named "(state)", "(time)", "(subject)", "(obstype)" and "(obstrue)" respectively (note the brackets). A variable called "(obs)" is the observation number from the original data before any missing data were dropped. The variable "(pcomb)" is used for computing the likelihood for hidden Markov models, and identifies which distinct time difference, obstype and covariate values (thus which distinct interval transition probability matrix) each observation corresponds to.
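
As a minimal sketch of inspecting the model frame, the example below assumes the cav heart transplant data shipped with msm and an initial intensity matrix (q_init) whose values are chosen here purely for illustration. fixedpars = TRUE is used only to skip optimisation so that the example runs quickly; the parameter values are not meaningful.

```r
library(msm)

# Illustrative initial intensities for a 4-state model (assumed values;
# msm ignores the diagonal entries)
q_init <- rbind(c(0,     0.25, 0,     0.25),
                c(0.166, 0,    0.166, 0.166),
                c(0,     0.25, 0,     0.25),
                c(0,     0,    0,     0))

# fixedpars = TRUE keeps all parameters at their initial values
fit <- msm(state ~ years, subject = PTNUM, data = cav,
           qmatrix = q_init, deathexact = 4,
           covariates = ~ sex, fixedpars = TRUE)

# One row per observation; special variables have bracketed names
mf <- model.frame(fit)
names(mf)
head(mf[["(state)"]])

# Aggregated form: "(nocc)" counts the observations in each combination
mf_agg <- model.frame(fit, agg = TRUE)
head(mf_agg[["(nocc)"]])
```

Because the special names contain brackets, they need [[ ]] or backticks rather than plain $ access.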

The model frame object has some other useful attributes, including "usernames" giving the user's original names for these variables (used for model refitting, e.g. in bootstrapping or cross validation) and "covnames" identifying which ones are covariates.
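
The attributes can be read with attr(). The sketch below again assumes the cav data from msm and an illustrative initial intensity matrix (q_init); the model is not optimised (fixedpars = TRUE), since only the stored data are of interest.

```r
library(msm)

# Illustrative initial intensities (assumed values; diagonal is ignored)
q_init <- rbind(c(0,     0.25, 0,     0.25),
                c(0.166, 0,    0.166, 0.166),
                c(0,     0.25, 0,     0.25),
                c(0,     0,    0,     0))

fit <- msm(state ~ years, subject = PTNUM, data = cav,
           qmatrix = q_init, deathexact = 4,
           covariates = ~ sex, fixedpars = TRUE)

mf <- model.frame(fit)

# The user's original names for the state, time, subject, ... variables
usernames <- attr(mf, "usernames")
usernames

# Which variables in the model frame are covariates
covnames <- attr(mf, "covnames")
covnames
```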

model.matrix returns a design matrix for a part of the model that includes covariates. The required part is indicated by the "model" argument.
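
A minimal sketch of extracting the design matrix for covariates on intensities, assuming the cav data from msm, a single covariate (sex) and an illustrative initial intensity matrix (q_init). The model is left unoptimised (fixedpars = TRUE) to keep the example fast.

```r
library(msm)

# Illustrative initial intensities (assumed values; diagonal is ignored)
q_init <- rbind(c(0,     0.25, 0,     0.25),
                c(0.166, 0,    0.166, 0.166),
                c(0,     0.25, 0,     0.25),
                c(0,     0,    0,     0))

fit <- msm(state ~ years, subject = PTNUM, data = cav,
           qmatrix = q_init, deathexact = 4,
           covariates = ~ sex, fixedpars = TRUE)

# Design matrix for covariates on transition intensities
X <- model.matrix(fit, model = "intens")
dim(X)
colnames(X)
```

The "misc", "hmm" and "inits" options apply only to models fitted with the corresponding hidden Markov components, so they are not shown for this non-hidden model.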


For time-inhomogeneous models fitted with "pci", these datasets will have imputed observations at each time change point, indicated where the variable "(pci.imp)" in the model frame is 1. The model matrix for intensities will have factor contrasts for the timeperiod covariate.
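
To illustrate, the sketch below fits a piecewise-constant intensity model to the cav data from msm. The change points at 5 and 10 years and the initial intensity matrix (q_init) are assumptions made for this example, and fixedpars = TRUE is used only to avoid optimisation.

```r
library(msm)

# Illustrative initial intensities (assumed values; diagonal is ignored)
q_init <- rbind(c(0,     0.25, 0,     0.25),
                c(0.166, 0,    0.166, 0.166),
                c(0,     0.25, 0,     0.25),
                c(0,     0,    0,     0))

# Intensities allowed to change at 5 and 10 years (assumed change points)
fit_pci <- msm(state ~ years, subject = PTNUM, data = cav,
               qmatrix = q_init, deathexact = 4,
               pci = c(5, 10), fixedpars = TRUE)

mf_pci <- model.frame(fit_pci)

# Rows with "(pci.imp)" equal to 1 are imputed at the change points
table(mf_pci[["(pci.imp)"]])

# The intensity design matrix includes contrasts for timeperiod
colnames(model.matrix(fit_pci))
```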


C. H. Jackson