
Simulate a dataset from the prior predictive distribution of survival times in an M-spline survival model. Additive hazards models are not currently supported.

Usage

prior_pred(
  n,
  fix_prior = FALSE,
  mspline,
  censtime = Inf,
  coefs_mean = NULL,
  prior_hscale = p_normal(0, 20),
  prior_hsd = p_gamma(2, 1),
  newdata = NULL,
  formula = NULL,
  prior_loghr = NULL,
  prior_hrsd = NULL,
  prior_cure = NULL
)

Arguments

n

Sample size of the simulated dataset. Whether all n observations are generated from a single draw of the parameters from the prior, or each observation uses its own draw, is controlled by fix_prior.

fix_prior

If TRUE, then one value of the parameter vector is drawn from the prior, and n individual-level times are then simulated given this common value. If FALSE (the default), a different draw from the prior is used to produce each simulated individual time.
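To make the difference between the two settings concrete, here is a minimal base-R sketch. It swaps the M-spline model for a much simpler exponential model with a Normal prior on the log rate, so it is not how prior_pred() works internally; the function sim_prior_pred_exp() and its arguments are hypothetical.

```r
# Hypothetical sketch: prior predictive simulation for an exponential
# survival model with log(rate) ~ Normal(prior_mean, prior_sd).
# This is NOT the M-spline model; it only illustrates what fix_prior changes.
sim_prior_pred_exp <- function(n, fix_prior = FALSE,
                               prior_mean = 0, prior_sd = 1) {
  if (fix_prior) {
    # One parameter draw, shared by all n individuals
    log_rate <- rep(rnorm(1, prior_mean, prior_sd), n)
  } else {
    # A fresh parameter draw for each individual
    log_rate <- rnorm(n, prior_mean, prior_sd)
  }
  data.frame(time = rexp(n, rate = exp(log_rate)), log_rate = log_rate)
}

set.seed(1)
fixed   <- sim_prior_pred_exp(1000, fix_prior = TRUE)
varying <- sim_prior_pred_exp(1000, fix_prior = FALSE)
length(unique(fixed$log_rate))    # 1
length(unique(varying$log_rate))  # 1000
```

With fix_prior = TRUE every simulated time shares one parameter value, so the dataset resembles a sample from a single plausible model. With fix_prior = FALSE the times are draws from the marginal prior predictive distribution, which mixes over parameter uncertainty and is typically more dispersed.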

mspline

A list of control parameters defining the spline model.

knots: Spline knots. If this is not supplied, then the number of knots is taken from df, and their location is taken from equally-spaced quantiles of the observed event times in the individual-level data.

add_knots: This is intended to be used when there are external data included in the model. External data are typically outside the time period covered by the individual data. add_knots would then be chosen to span the time period covered by the external data, so that the hazard trajectory can vary over that time.

If there are external data, and both knots and add_knots are omitted, then a default set of knots is chosen to span both the individual and external data, by taking the quantiles of a vector defined by concatenating the individual-level event times with the start and stop times in the external data.

df: Degrees of freedom, i.e. the number of parameters (or basis terms) intended to result from choosing knots based on quantiles of the data. The total number of parameters will then be df plus the number of additional knots specified in add_knots. df defaults to 10. A large df does not necessarily overfit, because the hazard function is smoothed through the prior.

degree: Polynomial degree used for the basis function. The default is 3, giving a cubic. This can only be changed from 3 if bsmooth is FALSE.

bsmooth: If TRUE (the default), the spline is made smoother at the highest knot by defining its first and second derivatives at this point to be zero.
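The default knot placement described under knots and add_knots can be sketched as follows. This assumes knots are taken at equally spaced probabilities from 0 to 1 inclusive, and takes the number of knots as given rather than deriving it from df; quantile_knots() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: knots at equally spaced quantiles of a set of times.
# Assumes probabilities run from 0 to 1 inclusive; the package's mapping
# from df to a number of knots is not reproduced here.
quantile_knots <- function(times, nknots) {
  probs <- seq(0, 1, length.out = nknots)
  unname(quantile(times, probs = probs))
}

# Individual-level event times only
event_times <- c(0.5, 1.2, 2.0, 3.1, 4.4, 5.0)
quantile_knots(event_times, 3)
# [1] 0.50 2.55 5.00

# With external data: concatenate the event times with the external start
# and stop times, so that the knots span both sources of data
ext_start <- c(5, 10)
ext_stop  <- c(10, 15)
quantile_knots(c(event_times, ext_start, ext_stop), 3)
# [1]  0.5  4.7 15.0
```

Including the external start and stop times stretches the highest knot out to the end of the external follow-up, so the hazard trajectory is free to vary over that later period.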

censtime

Right-censoring time to impose on the simulated event times.
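A minimal sketch of imposing a single right-censoring time, assuming that simulated times beyond censtime are truncated to censtime and flagged as censored. apply_censoring() is a hypothetical helper, and the 1/0 coding of event is an assumption.

```r
# Hypothetical helper: impose a fixed right-censoring time.
# event = 1 if the simulated time is observed, 0 if it is censored
# (the 1/0 coding is an assumption made for this sketch).
apply_censoring <- function(t, censtime = Inf) {
  data.frame(time  = pmin(t, censtime),
             event = as.integer(t <= censtime))
}

apply_censoring(c(1, 3, 7), censtime = 5)
#   time event
# 1    1     1
# 2    3     1
# 3    5     0
```

With the default censtime = Inf, every simulated time is an observed event.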

coefs_mean

Spline basis coefficients that define the prior mean for the hazard function. By default, these are set to values that define a constant hazard function (see mspline_constant_coefs). They are normalised to sum to 1 internally (if they do not already).
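The normalisation amounts to dividing the coefficients by their total. normalise_coefs() is a hypothetical stand-in written for illustration; the default values themselves come from the package's mspline_constant_coefs(), which is not reproduced here.

```r
# Hypothetical sketch: rescale non-negative basis coefficients to sum to 1.
normalise_coefs <- function(coefs) {
  stopifnot(all(coefs >= 0), sum(coefs) > 0)
  coefs / sum(coefs)
}

normalise_coefs(c(2, 2, 4))
# [1] 0.25 0.25 0.50
```

Since only the relative sizes survive, the normalised coefficients presumably determine the shape of the hazard, while its overall level is carried by the scale parameter (see prior_hscale).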

prior_hscale

Prior for the baseline log hazard scale parameter (alpha or log(eta)). This should be a call to a prior constructor function, such as p_normal(0,1) or p_t(0,2,2). Supported prior distribution families are normal (parameters mean and SD) and t distributions (parameters location, scale and degrees of freedom). The default is a normal distribution with mean 0 and standard deviation 20.

Note that eta is not in itself a hazard, but it is proportional to the hazard (see the vignette for the full model specification).

"Baseline" is defined by the continuous covariates taking a value of zero and factor covariates taking their reference level. To use a different baseline, the data should be transformed appropriately beforehand, so that a value of zero has a different meaning. For continuous covariates, it helps for both computation and interpretation to define the value of zero to denote a typical value in the data, e.g. the mean.
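For example, centring a continuous covariate at its mean before fitting makes "baseline" refer to a person of average value. This is a base-R sketch with a made-up age variable.

```r
# Made-up data: centre age so that zero denotes the mean age in the data
dat <- data.frame(age = c(45, 60, 52, 71))
dat$age_c <- dat$age - mean(dat$age)   # mean(age) is 57 here
dat$age_c
# [1] -12   3  -5  14
```

The centred variable (age_c) would then be used in the model formula in place of the raw covariate.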

prior_hsd

Gamma prior for the standard deviation that controls the variability over time (or smoothness) of the hazard function. This should be a call to p_gamma(). The default is p_gamma(2,1). See prior_haz_sd for a way to calibrate this to represent a meaningful belief.
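One quick way to see what the default p_gamma(2,1) implies is to look at its quantiles. This sketch assumes the two arguments are the shape and rate of the gamma distribution, which should be checked against the p_gamma() documentation.

```r
# Assumes p_gamma(2, 1) means shape = 2, rate = 1.
# Prior 2.5%, 50% and 97.5% quantiles of the smoothness SD:
q <- qgamma(c(0.025, 0.5, 0.975), shape = 2, rate = 1)
q   # roughly 0.24, 1.68, 5.57
```

These numbers are on the scale of the smoothness standard deviation itself; prior_haz_sd translates such a prior into implications for the hazard function.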

newdata

A data frame with one row, containing variables in the model formulae. Samples will then be drawn, for any covariate-dependent parameters, with covariates set to the values given here.

formula

A model formula with no response, defining the covariates on the hazard scale.

prior_loghr

Priors for log hazard ratios. This should be a call to p_normal() or p_t(). A list of calls can also be provided, to give different priors to different coefficients, where the name of each list component matches the name of the coefficient, e.g. list("age45-59" = p_normal(0,1), "age60+" = p_t(0,2,3)).

The default is p_normal(0,2.5) for all coefficients.
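Because p_normal() takes a mean and SD, the default prior can be translated to the hazard ratio scale by exponentiating its quantiles. This base-R sketch shows that p_normal(0,2.5) is very weakly informative.

```r
# Central 95% interval and median of a Normal(0, 2.5) prior on the log HR,
# transformed to the hazard ratio scale
hr <- exp(qnorm(c(0.025, 0.5, 0.975), mean = 0, sd = 2.5))
hr   # roughly 0.0074, 1, 134
```

A prior allowing hazard ratios anywhere from about 1/134 to 134 lets the data dominate; a smaller SD, such as p_normal(0,1), would express more scepticism about large effects.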

prior_hrsd

Prior for the standard deviation parameters that smooth the non-proportionality effects over time in non-proportional hazards models. This should be a call to p_gamma() or a list of calls to p_gamma() with one component per covariate, as in prior_loghr. See prior_hr_sd for a way to calibrate this to represent a meaningful belief.

prior_cure

Prior for the baseline cure probability. This should be a call to p_beta(). The default is a uniform prior, p_beta(1,1). Baseline is defined by the mean of continuous covariates and the reference level of factor covariates.

Value

A data frame with columns time (simulated time) and event (indicator for whether the time is an event time, as opposed to a right-censoring time). The prior parameters are returned in the prior attribute as a list with components alpha (baseline log hazard) and coefs (spline coefficients).
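The returned object can be inspected with ordinary data frame and attribute operations. The object below is a hypothetical stand-in built by hand with made-up values, purely to show the access pattern. The exact contents of the prior attribute (for example, whether alpha holds one value or one per simulated individual when fix_prior = FALSE) are not established here.

```r
# Hypothetical stand-in for a prior_pred() result, built by hand with
# made-up values; only the column and attribute names follow the text above.
sim <- data.frame(time = c(0.8, 2.5, 5), event = c(1, 1, 0))
attr(sim, "prior") <- list(alpha = -1.2, coefs = c(0.2, 0.3, 0.5))

mean(sim$event)             # proportion of observed (uncensored) times
prior <- attr(sim, "prior")
prior$alpha                 # baseline log hazard
prior$coefs                 # spline coefficients
```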

See also