Make default M-spline knot specification given a survival dataset.
Source: R/survextrap.R, mspline_spec.Rd
Choose default M-spline knot locations given a dataset and a desired number of spline parameters. By default this assumes a cubic spline, with knots placed at quantiles of the event times observed in the individual data.
Usage
mspline_spec(
  formula,
  data,
  cure = FALSE,
  nonprop = NULL,
  backhaz = NULL,
  backhaz_strata = NULL,
  external = NULL,
  df = 10,
  add_knots = NULL,
  degree = 3,
  bsmooth = TRUE
)
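A minimal sketch of a call, assuming the survextrap and survival packages are installed. The veteran lung cancer data from survival is used only for illustration (its time column is in days), and the add_knots value of 2000 is an arbitrary extrapolation horizon chosen for this example.

```r
library(survival)    # Surv() and the example `veteran` dataset
library(survextrap)  # mspline_spec()

# Default specification: cubic M-spline with 10 basis terms, with knots
# placed at quantiles of the observed event times in `veteran`
spec <- mspline_spec(Surv(time, status) ~ 1, data = veteran, df = 10)
spec$knots
spec$degree

# Same data, plus an extra knot at an (arbitrary) extrapolation horizon
# of 2000 days, beyond the end of follow-up in `veteran`
spec_ext <- mspline_spec(Surv(time, status) ~ 1, data = veteran,
                         df = 10, add_knots = 2000)
spec_ext$knots
```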
Arguments
- formula
A survival formula in standard R formula syntax, with a call to Surv() on the left-hand side. Covariates included on the right-hand side of the formula will be modelled with proportional hazards, or, if nonprop is TRUE, with non-proportional hazards.
If data is omitted, so that the model is being fitted to external aggregate data alone, without individual data, then the formula should not include a Surv() call. The left-hand side of the formula will then be empty, and the right-hand side specifies the covariates as usual. For example, formula = ~1 if there are no covariates.
- data
Data frame containing variables in formula. Variables should be in a data frame, and not in the working environment.
This may be omitted, in which case external must be supplied. This allows a model to be fitted to external aggregate data alone, without any individual-level data.
- cure
If TRUE, a mixture cure model is used, where the "uncured" survival is defined by the M-spline model, and the cure probability is estimated.
- nonprop
Non-proportional hazards model specification. This is achieved by modelling the spline basis coefficients in terms of the covariates. See the methods vignette for more details.
If TRUE, then all covariates are modelled with non-proportional hazards, using the same model formula as formula.
If this is a formula, then it is assumed to define a model for the dependence of the basis coefficients on the covariates.
If this is NULL or FALSE (the default), then any covariates are modelled with proportional hazards.
- backhaz
Background hazard, that is, for causes of death other than the cause of interest. This defines a "relative survival" model where the overall hazard is the sum of a cause-specific hazard and a background hazard. The background hazard is assumed to be known, and the cause-specific hazard is modelled with the flexible parametric model.
The background hazard can be supplied in two forms. The meaning of predictions from the model depends on which of these is used.
(a) A data frame with columns "hazard" and "time", specifying the background hazard at all times as a piecewise-constant (step) function. Each row gives the background hazard between the specified time and the next time. The first element of "time" should be 0, and the final row specifies the hazard at all times greater than the last element of "time". Predictions from the model fitted by survextrap will then include this background hazard, because it is known at all times.
(b) The (quoted) name of a variable in the data giving the background hazard. For censored cases, the exact value does not matter. The predictions from survextrap will then describe the excess hazard or survival on top of this background. The overall hazard cannot be predicted in general, because the background hazard is only specified over a limited range of time.
If there is external data, and backhaz is supplied in form (b), then the user should also supply the background survival at the start and stop points in columns of the external data named "backsurv_start" and "backsurv_stop". This should describe the same reference population as backhaz, though the package does not check for consistency between these.
If there are stratifying variables specified in backhaz_strata, then there should be multiple rows giving the background hazard value for each time period and stratifying variable.
If backhaz is NULL (the default), then no background hazard component is included in the model.
- backhaz_strata
A character vector of names of variables that appear in backhaz that indicate strata, e.g. backhaz_strata = c("agegroup","sex"). This allows different background hazard values to be used for different subgroups. These variables must also appear in the datasets being modelled, that is, in data, external, or both. Each row of those datasets should then have a corresponding row in backhaz which has the same values of the stratifying variables.
This is NULL by default, indicating no stratification of the background hazard.
If stratification is done, then backhaz must be supplied in form (a), as a data frame rather than a variable in the data.
- external
External data as a data frame of aggregate survival counts with columns named:
start: Start time
stop: Follow-up time
n: Number of people alive at start
r: Number of those people who are still alive at stop
If there are covariates in formula, then the values they take in the external data must be supplied as additional columns in external. Therefore, if there are external data, the covariates in formula and data should not be named start, stop, n or r.
- df
Desired number of basis terms, or "degrees of freedom", in the spline. The number of knots is chosen to satisfy this.
- add_knots
Additional knots, other than those determined from the quantiles of the individual data. Typically used to add a maximum knot at the time that we want to extrapolate to.
- degree
Spline polynomial degree. Can only be changed from the default of 3 if bsmooth is FALSE.
- bsmooth
If TRUE, then the function is constrained to also have zero first and second derivatives at the boundary.
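The sketch below builds external and backhaz data frames in the shapes described above, using base R only. All numbers are invented for illustration. The helper backhaz_at() is hypothetical (it is not part of survextrap) and only shows how a form (a) step function is read: each row applies from its time up to the next row's time, and the last row applies thereafter.

```r
# Hypothetical aggregate external data: two follow-up intervals.
# n = number alive at `start`; r = number of those still alive at `stop`.
external <- data.frame(
  start = c(5, 10),
  stop  = c(10, 15),
  n     = c(100, 80),
  r     = c(80, 70)
)

# Background hazard in form (a): a piecewise-constant step function.
# The first time must be 0; the last row applies to all later times.
backhaz <- data.frame(
  time   = c(0, 5, 10),
  hazard = c(0.01, 0.02, 0.04)
)
stopifnot(backhaz$time[1] == 0, !is.unsorted(backhaz$time))

# Hypothetical helper (not a survextrap function): the background hazard
# in force at times t, found by locating the row whose interval contains t
backhaz_at <- function(t, bh) bh$hazard[findInterval(t, bh$time)]

backhaz_at(c(0, 4.9, 5, 12, 100), backhaz)
# [1] 0.01 0.01 0.02 0.04 0.04

# Stratified form (a), as would be used with backhaz_strata = "sex":
# one row per time period per stratum
backhaz_sex <- data.frame(
  time   = rep(c(0, 5, 10), times = 2),
  sex    = rep(c("female", "male"), each = 3),
  hazard = c(0.010, 0.020, 0.040, 0.012, 0.024, 0.048)
)
```

In the stratified case, each row of data or external would need a sex value that matches one of the strata in backhaz_sex.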
Value
A list with components:
- knots
Knot locations. The number of knots will be equal to df + degree + 2.
- degree
Spline polynomial degree (3 by default).
- nvars
Number of basis variables (an alias for df).
Details
If there are also external data, then the knots are based on quantiles of a vector defined by concatenating the event times in the individual data with the unique start and stop times in the external data.
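To make this concrete, here is a minimal base-R sketch of quantile-based knot placement. quantile_knots() is a hypothetical helper, not survextrap's internal code: it takes the number of knots directly rather than deriving it from df, places the knots at equally spaced quantiles with the last one at the maximum, and uses R's default quantile type. The package's exact rules may differ.

```r
# Hypothetical illustration of quantile-based knot placement;
# NOT survextrap's implementation.
quantile_knots <- function(obs_time, status, nknots, external = NULL) {
  # Only observed event times (status == 1) from the individual data
  times <- obs_time[status == 1]
  # With external data, append its unique start and stop times
  if (!is.null(external)) {
    times <- c(times, unique(c(external$start, external$stop)))
  }
  # Equally spaced probabilities, dropping 0, so the last knot is the maximum
  probs <- seq(0, 1, length.out = nknots + 1)[-1]
  unname(quantile(times, probs = probs))
}

obs_time <- c(1:10, 3, 8)        # invented follow-up times
status   <- c(rep(1, 10), 0, 0)  # last two are censored, so ignored

quantile_knots(obs_time, status, nknots = 2)
# [1]  5.5 10.0

ext <- data.frame(start = c(5, 10), stop = c(10, 15),
                  n = c(100, 80), r = c(80, 70))
quantile_knots(obs_time, status, nknots = 2, external = ext)
# [1]  6 15
```

The second call shows why external data matter here: its later stop time pulls the upper knot out from 10 to 15, so the spline covers the period the external data inform.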
This is designed to have the same arguments as survextrap. It is intended for use when we want to fit a set of survextrap models with the same spline specification.
See also mspline_list_init and mspline_init, which have lower-level interfaces and are designed for use without data, e.g. when illustrating a theoretical M-spline model.