survextrap is an R package under development, to model survival from a combination of:
- A standard individual-level, right-censored survival dataset, e.g.

  | Survival time | Death | Predictors… |
  |---|---|---|
  | 2 years | Yes | |
  | 5 years | No | |
  | etc… | | |
- “External” data sources in the following aggregate “count” form:

  | Follow-up period: start time $t$ | Follow-up period: end time $u$ | Number alive at $t$ | Number still alive at $u$ | Predictors… |
  |---|---|---|---|---|
  | $t_{1}$ | $u_{1}$ | $n_{1}$ | $r_{1}$ | |
  | $t_{2}$ | $u_{2}$ | $n_{2}$ | $r_{2}$ | |
  | etc… | | | | |
Any number of rows can be supplied for the “external” data, and the time intervals do not have to be distinct or exhaustive.
The package has been developed under the expectation that many forms of external data that might be useful for survival extrapolation (such as population data, registry data or elicited judgements) can be manipulated into this common “count” form.
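As a rough illustration of the “count” form described above, the following base R sketch builds a small external dataset and derives the quantities each row implies. The numbers are invented, and the column names (`start`, `stop`, `n`, `r`) are assumptions made for this example; check the package documentation for the names it actually expects.

```r
# Hypothetical external data in aggregate "count" form.
# All values are made up; column names are assumed for illustration.
extdat <- data.frame(
  start = c(5, 10, 10),    # start time t of each follow-up period (years)
  stop  = c(10, 15, 20),   # end time u of each follow-up period (years)
  n     = c(100, 100, 80), # number alive at t
  r     = c(70, 60, 30)    # number of those still alive at u
)

# Each row carries information about the conditional probability of
# surviving to u given survival to t. Its crude estimate is r / n.
extdat$cond_surv <- extdat$r / extdat$n

# Average hazard over each interval, assuming (for this sketch only)
# that the hazard is constant within the interval.
extdat$avg_hazard <- -log(extdat$cond_surv) / (extdat$stop - extdat$start)

print(extdat)
```

Note that the second and third rows overlap in time, which is allowed: the intervals need not be distinct or exhaustive.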
Principles
Extrapolations from short-term individual-level data should be based on explicit data or judgements about how risk will change over time.
Extrapolations should not rely on standard parametric forms (e.g. Weibull, log-normal, gamma…) that are only used out of convention and do not have interpretations as plausible mechanisms for how risk will change over time.
Instead of selecting (or averaging) traditional parametric models, an arbitrarily flexible parametric model should be used that adapts to give the best fit to the short-term and long-term data in combination.
How it works
Bayesian multiparameter evidence synthesis is used to jointly model all sources of data and judgements.
An M-spline is used to represent how the hazard changes through time (as in rstanarm). The Bayesian fitting method automatically chooses the optimal level of smoothness and flexibility. Spline “knots” should span the period covered by the data, and any future period where there is a chance that the hazard may vary. Then if there is no data in the future period, the uncertainty will be acknowledged and the predicted hazards will have wide credible intervals.
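To make the M-spline idea concrete, here is a minimal base R sketch of a hazard built as a weighted sum of M-spline basis functions. It is not the package's implementation: the helper `mspline_basis`, the knot locations, the weights `p` and the scale `eta` are all hypothetical, and the form "scale times weighted sum of basis functions" is assumed here as one common parameterisation.

```r
library(splines)  # ships with base R

# M-spline basis: B-splines rescaled so that each basis function
# integrates to 1 over its support. (Hypothetical helper for this sketch.)
mspline_basis <- function(x, interior_knots, lower = 0, upper, degree = 3) {
  ord <- degree + 1
  # Full knot vector with boundary knots repeated 'ord' times
  kn <- c(rep(lower, ord), interior_knots, rep(upper, ord))
  B <- splineDesign(kn, x, ord = ord)
  nbasis <- length(kn) - ord
  # Width of the support of each basis function
  width <- kn[seq_len(nbasis) + ord] - kn[seq_len(nbasis)]
  # M_i(x) = B_i(x) * ord / width_i
  sweep(B, 2, ord / width, `*`)
}

times <- seq(0, 10, by = 0.1)
M <- mspline_basis(times, interior_knots = c(2, 5), upper = 10)

# Hypothetical basis weights (non-negative, summing to 1) and hazard scale
p <- c(0.05, 0.15, 0.3, 0.25, 0.15, 0.1)
eta <- 0.5

# Hazard as a non-negative weighted sum of basis functions
hazard <- eta * drop(M %*% p)
```

Because the basis functions and weights are non-negative, the resulting hazard is non-negative everywhere. Placing more knots in a period gives the curve more freedom to bend there, which is why knots in a future period without data lead to wider uncertainty about the hazard.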
A proportional hazards model or a flexible non-proportional hazards model can be used to describe the relation of survival to predictors.
Mixture cure, relative survival and treatment effect waning models are supported.
It has an R interface, designed to be friendly to those familiar with standard R modelling functions.
Stan is used under the surface to do MCMC (Hamiltonian Monte Carlo) sampling from the posterior distribution, in a similar fashion to rstanarm and survHE.
Estimates and posterior credible intervals / samples for survival, hazard, mean and restricted mean survival can easily be extracted.
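The quantities listed above are all functions of the hazard. The following base R sketch shows the relationships for a single hazard curve: cumulative hazard, survival, and restricted mean survival by numerical integration. The constant hazard and the function names are hypothetical and chosen only so that the result can be checked by hand; these are not the package's own extraction functions.

```r
# Hypothetical hazard: constant at 0.1 per year, for illustration only
hazard_fn <- function(t) rep(0.1, length(t))

# Cumulative hazard H(t): integral of the hazard from 0 to t
cum_hazard <- function(t) {
  sapply(t, function(u) integrate(hazard_fn, 0, u)$value)
}

# Survival S(t) = exp(-H(t))
survival_fn <- function(t) exp(-cum_hazard(t))

# Restricted mean survival up to tau: integral of S(t) from 0 to tau
rmst_from_hazard <- function(tau) integrate(survival_fn, 0, tau)$value

# For a constant hazard h, the closed form is (1 - exp(-h * tau)) / h
rmst_from_hazard(5)
(1 - exp(-0.1 * 5)) / 0.1
```

In a Bayesian fit, repeating a calculation like this for each posterior sample of the hazard gives a posterior sample of the summary, from which an estimate and credible interval follow.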
Development
The package is in active development. It can currently fit a large range of useful models, but it is not finished and is subject to change without warning.
Major things to do are:
Empirical work to show the impact of priors and knot choice. Can we derive more practically-meaningful default priors for changes in hazard through time, e.g. in terms of orders of magnitude? How much does knot choice matter, in particular with external data?
More experience and examples of using it with real external data, including a vignette that lists how to implement other previously-suggested approaches for extrapolation with external data.
Thorough testing, documentation and error handling.
A paper about it.
If you want to try it out, feel free to install the development version with:
install.packages("survextrap", repos=c('https://chjackson.r-universe.dev', 'https://cloud.r-project.org'))
Please give feedback and suggestions if you do. These can be posted as GitHub issues or sent by email.