Generate a dataset from the prior predictive distribution in a msmbayes model
Source: R/prior_sample.R
msmbayes_priorpred_sample.Rd
This generates a single sample of parameters from the prior, then
generates observed states from a multi-state model with those
parameters. The data argument should contain the time and
subject indicators at which states are to be simulated (by default),
or the maximum observation time (if complete_obs=TRUE).
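As a minimal sketch of the default mode described above, the data frame below holds only an observation schedule, and the function fills in simulated states. The subject count, observation times and two-state structure are illustrative assumptions, and it is also an assumption that no placeholder state column is needed in data. The call is guarded so that the sketch runs even where msmbayes is not installed.

```r
# Observation schedule: 3 hypothetical subjects, each observed at times 0, 1, ..., 5.
dat <- data.frame(
  subject = rep(1:3, each = 6),
  time    = rep(0:5, times = 3)
)

# Two-state structure with transitions allowed in both directions.
# Only the zero / non-zero pattern matters; the diagonal is ignored.
Q <- rbind(c(0, 1),
           c(1, 0))

# Guarded call: skipped where msmbayes is not installed.
if (requireNamespace("msmbayes", quietly = TRUE)) {
  # Default mode (complete_obs = FALSE): states are simulated at the
  # subject/time combinations in `dat`, using one draw from the default priors.
  sim <- msmbayes::msmbayes_priorpred_sample(data = dat, Q = Q)
  print(head(sim))
}
```

Because each call draws a fresh set of parameters from the prior, repeated calls will generally give quite different datasets.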
Usage
msmbayes_priorpred_sample(
data,
state = "state",
time = "time",
subject = "subject",
Q,
covariates = NULL,
pastates = NULL,
pafamily = "gamma",
panphase = NULL,
nphase = NULL,
E = NULL,
priors = NULL,
complete_obs = FALSE,
cov_format = "orig"
)
Arguments
- data
Data frame giving the observed data.
- state
Character string naming the observed state variable in the data. This variable must either be an integer in 1,2,...,K, where K is the number of states, or a factor with these integers as level labels. If omitted, this is assumed to be "state".
- time
Character string naming the observation time variable in the data. If omitted, this is assumed to be "time".
- subject
Character string naming the individual ID variable in the data. If omitted, this is assumed to be "subject".
- Q
Matrix indicating the transition structure. A zero entry indicates that instantaneous transitions from (row) to (column) are disallowed. An entry of 1 (or any other positive value) indicates that the instantaneous transition is allowed. The diagonal of Q is ignored.
There is no need to "guess" initial values and put them here, as is sometimes done in msm. Initial values for fitting are determined by Stan from the prior distributions, and the specific values supplied for positive entries of Q are disregarded.
- covariates
Specification of covariates on transition intensities. This should be a list of formulae, or a single formula.
If a list is supplied, each formula should have a left-hand side that looks like Q(r,s), and a right-hand side defining the regression model for the log of the transition intensity from state \(r\) to state \(s\).
For example, covariates = list(Q(1,2) ~ age + sex, Q(2,1) ~ age) specifies that the log of the 1-2 transition intensity is an additive linear function of age and sex, and the log 2-1 transition intensity is a linear function of age. You do not have to list all of the intensities here if some of them are not influenced by covariates.
If a single formula is supplied, it is assumed to apply to all intensities. In that case, take care: with sparse data, the covariate effects on some intensities may not be identifiable.
In models with phase-type approximated states (specified with pastates), covariates are modelled through an accelerated failure time model. The effect is a multiplier on the scale parameter of the sojourn distribution. The covariate then has an identical multiplicative effect on all rates of transition between phases for a given state. The left-hand side of the formula should contain scale instead of Q. For example, if state 1 has a phase-type approximation, but state 2 is Markov, then we might supply covariates as:
covariates = list(scale(1) ~ age + sex, Q(2,1) ~ age)
In models with phase-type approximations and competing exit states, covariates on the relative risk of different exit states are specified with a formula with rrnext on the left-hand side. For example, in a model where state 1 has a phase-type approximation, and the next state could be either 2 or 3, a linear model on the log relative risk of transition to 3 (relative to the baseline 2) might be specified as:
covariates = list(scale(1) ~ age + sex, rrnext(1,3) ~ x + time)
In phase-type models specified with nphase, or misclassification models (specified with E), covariates on transition intensities are specified with Q(), where the numbers inside Q() refer to the latent state space.
- pastates
This indicates which states (if any) are given a Weibull or Gamma sojourn distribution approximated by a phase-type model. Ignored if nphase is supplied.
- pafamily
"weibull" or "gamma", indicating the approximated sojourn distribution in the phased state. Either a vector of the same length as pastates, or just one to apply to all states.
- panphase
Number of phases to use for each state given a phase-type Gamma or Weibull approximation. Vector of same length as pastates. More phases allow a wider range of shape parameters.
- nphase
Only required for models with phase-type sojourn distributions specified directly (not through pastates). nphase is a vector with one element per state, giving the number of phases per state. This element is 1 for states that do not have phase-type sojourn distributions.
- E
By default, msmbayes fits a (non-hidden) Markov model. If E is supplied, then a Markov model with misclassification is fitted, a type of hidden Markov model. E should then be a matrix indicating the structure of allowed misclassifications, where rows are the true states, and columns are the observed states. A zero entry in row \(r\) and column \(s\) indicates that true state \(r\) cannot be observed as state \(s\). A non-zero \((r,s)\) entry indicates that true state \(r\) may be misclassified as \(s\). The diagonal of E is ignored.
- priors
A list specifying priors. Each component should be the result of a call to msmprior. Any parameters with priors not specified here are given default priors: normal with mean -2 and SD 2 for log intensities, normal with mean 0 and SD 10 for log hazard ratios, normal(0,1) for log odds parameters in misclassification models.
In phase-type approximation models, the default priors are normal with mean 2, SD 2 for scale parameters (i.e. the log inverse of the default prior for the rate), normal(0, SD=0.5) truncated on the supported region for log shape parameters, and normal(0,1) for log odds of transition (relative to first exit state) in structures with competing exit states.
See msmprior for more details.
If only one parameter is given a non-default prior, a single msmprior call can be supplied here instead of a list.
Maximum likelihood estimation can be performed by setting priors="mle", and using fit_method="optimize". This is equivalent to estimating the posterior mode with improper uniform priors on the unconstrained parameter space (i.e. positive parameters on the log scale). Uncertainty is then quantified by sampling from the multivariate normal defined by the Hessian at the mode. The sample can be summarised to produce confidence intervals, as in the ci="normal" method in the msm package. These are equivalent to credible intervals from a Laplace approximation to the posterior.
- complete_obs
If complete_obs=FALSE (the default), intermittently-observed states are generated for the subjects and times supplied in the data argument, using msm::simmulti.msm. The returned object is a data frame made by appending these states to data.
If complete_obs=TRUE, one complete state transition history is generated using msm::sim.msm. The data argument should then consist of one row, with time giving the maximum observation time, and any covariates supplied, assumed to be time-constant. The returned object is a list.
- cov_format
If "orig", the covariates are returned in the form in which they were supplied. If "design" (or any other value), the covariates are returned as a design matrix, i.e. with factors converted to numeric contrasts.
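The structural arguments described above can be built with base R alone. The following sketch assumes a hypothetical three-state model with an absorbing third state, and covariates named age and sex, chosen purely for illustration.

```r
# Allowed instantaneous transitions: 1 -> 2, 2 -> 1 and 2 -> 3.
# State 3 has no exits, so it is absorbing. The diagonal is ignored.
Q <- rbind(c(0, 1, 0),
           c(1, 0, 1),
           c(0, 0, 0))

# Allowed misclassifications (rows = true state, columns = observed state):
# true state 1 may be observed as 2, and true state 2 as 1.
E <- rbind(c(0, 1, 0),
           c(1, 0, 0),
           c(0, 0, 0))

# One formula per intensity that depends on covariates.
# Formulae are stored unevaluated, so Q(1, 2) is never called as a function here.
covariates <- list(Q(1, 2) ~ age + sex,
                   Q(2, 1) ~ age)

print(Q)
print(E)
print(covariates)
```

Only the zero / non-zero pattern of Q and E carries information, so using 1 for every allowed entry keeps the intent clear.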
Value
A data frame (if complete_obs=FALSE) or a list (if complete_obs=TRUE). See msm::simmulti.msm or msm::sim.msm respectively for the format.
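To illustrate the list return value, the sketch below requests one complete transition history. The maximum time of 10 and the two-state structure are illustrative assumptions. Including a subject column in the one-row data is also an assumption, since the text above only requires time and any covariates. The call is guarded so the sketch runs where msmbayes is not installed.

```r
# Two-state structure with transitions allowed in both directions.
Q <- rbind(c(0, 1),
           c(1, 0))

# A single row: `time` is the maximum observation time for the history.
# The `subject` column is an assumption and may not be required.
dat_max <- data.frame(subject = 1, time = 10)

# Guarded call: skipped where msmbayes is not installed.
if (requireNamespace("msmbayes", quietly = TRUE)) {
  history <- msmbayes::msmbayes_priorpred_sample(
    data = dat_max, Q = Q, complete_obs = TRUE
  )
  # The result is a list; str() shows its components without
  # assuming their names.
  str(history)
}
```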