Converts longitudinal data in the format used for an msm model fit, where observations represent the exact transition times of the process, to counting process data. This enables, for example, flexible parametric multi-state models to be fitted with flexsurvreg from the flexsurv package, or semiparametric models to be implemented with coxph and the mstate package.

Usage

msm2Surv(data, subject, time, state, covs = NULL, Q)

Arguments

data

Data frame in the format expected by a msm model fit with exacttimes=TRUE or all obstype=2. Each row represents an observation of a state, and the time variable contains the exact and complete transition times of the underlying process. This is explained in more detail in the help page for msm, section obstype=2.

subject

Name of the subject ID in the data (character format, i.e. quoted).

time

Name of the time variable in the data (character).

state

Name of the state variable in the data (character).

covs

Vector of covariate names to carry through (character). If not supplied, this is taken to be all remaining variables in the data.

Q

Transition intensity matrix. This should have number of rows and number of columns both equal to the number of states. If an instantaneous transition is not allowed from state \(r\) to state \(s\), then Q should have \((r,s)\) entry 0, otherwise it should be non-zero. The diagonal entries are ignored.
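As an illustration of how Q determines the permitted transitions, the following base R sketch derives a numbered transition matrix from the zero/non-zero pattern of Q. The row-by-row numbering is an assumption taken from the "trans" attribute printed in the Examples section, and the object names (allowed, tmat) are hypothetical; this is not the package's internal code.

```r
# Q from the Examples section: moves only to the next state up or to state 4
Q <- rbind(c(1, 1, 0, 1),
           c(0, 1, 1, 1),
           c(0, 0, 1, 1),
           c(0, 0, 0, 0))

# A transition r -> s is permitted when Q[r, s] is non-zero.
allowed <- Q != 0
diag(allowed) <- FALSE            # diagonal entries are ignored

# Number the permitted transitions row by row (assumed convention).
# Filling the transpose in column order is the same as filling by row.
tmat_t <- matrix(NA_integer_, ncol(Q), nrow(Q))
tmat_t[t(allowed)] <- seq_len(sum(allowed))
tmat <- t(tmat_t)
dimnames(tmat) <- list(from = 1:4, to = 1:4)
tmat
```

Here state 4 has no permitted exits, so its row stays entirely NA and it acts as an absorbing state.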

Value

A data frame of class "msdata", with rows representing observed or censored transitions. There will be one row for each observed transition in the original data, and additional rows for every potential transition that could have occurred out of each observed state.

The data frame will have columns called:

id

Subject ID

from

Starting state of the transition

to

Finishing state of the transition

Tstart

The starting time of the transition

Tstop

The finishing time of the transition

time

The time difference = Tstop - Tstart

status

Event or censoring indicator, with 1 indicating an observed transition, and 0 indicating censoring

trans

Transition number

and any remaining columns will represent covariates. Any covariates whose names clash with the standard variables in the returned data ("id", "from", "to", "Tstart", "Tstop", "time", "status" or "trans") have ".2" appended to their names.
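The renaming rule can be pictured with a short base R sketch. The covariate names below are made up for illustration, and the code only mimics the rule as described here rather than reproducing the package's implementation.

```r
# Standard column names in the returned data
reserved <- c("id", "from", "to", "Tstart", "Tstop", "time", "status", "trans")

# Hypothetical covariate names, one of which clashes with a standard name
covs <- c("age", "time", "treat")

# Clashing names get ".2" appended; the others are left alone
renamed <- ifelse(covs %in% reserved, paste0(covs, ".2"), covs)
renamed
```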

The transition matrix in mstate format is stored in the trans attribute of the returned object. See the example code below.

Details

For example, if the data supplied to msm look like this:

subj  days  status  age  treat
   1     0       1   66      1
   1    27       2   66      1
   1    75       3   66      1
   1    97       4   66      1
   1  1106       4   69      1
   2     0       1   49      0
   2    90       2   49      0
   2  1037       2   51      0

then the output of msm2Surv will be a data frame looking like this:

id  from  to  Tstart  Tstop  time  status  age  treat  trans
 1     1   2       0     27    27       1   66      1      1
 1     1   4       0     27    27       0   66      1      2
 1     2   3      27     75    48       1   66      1      3
 1     2   4      27     75    48       0   66      1      4
 1     3   4      75     97    22       1   66      1      5
 2     1   2       0     90    90       1   49      0      1
 2     1   4       0     90    90       0   49      0      2
 2     2   3      90   1037   947       0   49      0      3
 2     2   4      90   1037   947       0   49      0      4

At 27 days, subject 1 is observed to move from state 1 to state 2 (first row, status 1), which means that their potential transition from state 1 to state 4 is censored (second row, status 0).
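The expansion into observed and censored rows can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code used by msm2Surv: the helper expand_subject is hypothetical, covariates and the trans column are left out, and rows are generated only for states with at least one permitted exit in Q.

```r
msmdat <- data.frame(
  subj   = c(1, 1, 1, 1, 1, 2, 2, 2),
  days   = c(0, 27, 75, 97, 1106, 0, 90, 1037),
  status = c(1, 2, 3, 4, 4, 1, 2, 2)
)
Q <- rbind(c(1, 1, 0, 1),
           c(0, 1, 1, 1),
           c(0, 0, 1, 1),
           c(0, 0, 0, 0))
diag(Q) <- 0                       # diagonal entries are ignored

# Hypothetical helper: expand one subject's observations into one row
# per permitted transition out of each observed state.
expand_subject <- function(d) {
  rows <- list()
  for (i in seq_len(nrow(d) - 1)) {
    from  <- d$status[i]
    dests <- which(Q[from, ] != 0)  # states reachable from 'from'
    if (length(dests) == 0) next    # absorbing state: nothing at risk
    rows[[length(rows) + 1]] <- data.frame(
      id     = d$subj[i],
      from   = from,
      to     = dests,
      Tstart = d$days[i],
      Tstop  = d$days[i + 1],
      # 1 if this is the transition that was observed, 0 if censored
      status = as.integer(d$status[i + 1] == dests)
    )
  }
  do.call(rbind, rows)
}

sketch <- do.call(rbind, lapply(split(msmdat, msmdat$subj), expand_subject))
rownames(sketch) <- NULL
sketch$time <- sketch$Tstop - sketch$Tstart
sketch
```

For subject 2, the interval from 90 to 1037 days ends with the subject still in state 2, so both potential transitions out of state 2 appear with status 0.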

See the mstate package and the references below for more details of this data format and using it for semi-parametric multi-state modelling.

References

Putter H, Fiocco M, Geskus RB (2007). Tutorial in biostatistics: Competing risks and multi-state models. Statistics in Medicine 26: 2389-2430.

de Wreede LC, Fiocco M, Putter H (2011). mstate: An R Package for the Analysis of Competing Risks and Multi-State Models. Journal of Statistical Software 38(7): 1-30.

Jackson CH (2014). flexsurv: Flexible parametric survival and multi-state models. R package version 0.5.

See also

msprep, in mstate, which produces data in a similar format, given data in "wide" format with one row per subject.

Author

C. H. Jackson chris.jackson@mrc-bsu.cam.ac.uk

Examples


msmdat <- data.frame(
 subj = c(1, 1, 1, 1, 1, 2, 2, 2),
 days = c(0, 27, 75, 97, 1106, 0, 90, 1037),
 status = c(1, 2, 3, 4, 4, 1, 2, 2),
 age = c(66, 66, 66, 66, 69, 49, 49, 51),
 treat = c(1, 1, 1, 1, 1, 0, 0, 0)
)
# transitions only allowed to next state up or state 4
Q <- rbind(c(1, 1, 0, 1), 
           c(0, 1, 1, 1),
           c(0, 0, 1, 1),
           c(0, 0, 0, 0))
dat <- msm2Surv(data=msmdat, subject="subj", time="days", state="status", 
         Q=Q)
dat
#> An object of class 'msdata'
#> 
#> Data:
#>   id from to Tstart Tstop time status age treat trans
#> 1  1    1  2      0    27   27      1  66     1     1
#> 2  1    1  4      0    27   27      0  66     1     2
#> 3  1    2  3     27    75   48      1  66     1     3
#> 4  1    2  4     27    75   48      0  66     1     4
#> 5  1    3  4     75    97   22      1  66     1     5
#> 6  2    1  2      0    90   90      1  49     0     1
#> 7  2    1  4      0    90   90      0  49     0     2
#> 8  2    2  3     90  1037  947      0  49     0     3
#> 9  2    2  4     90  1037  947      0  49     0     4
attr(dat, "trans")
#>     to
#> from  1  2  3  4
#>    1 NA  1 NA  2
#>    2 NA NA  3  4
#>    3 NA NA NA  5
#>    4 NA NA NA NA