Return fitted survival, cumulative hazard or hazard at a series of times from a fitted flexsurvreg or flexsurvspline model.
Usage
# S3 method for flexsurvreg
summary(
object,
newdata = NULL,
X = NULL,
type = "survival",
fn = NULL,
t = NULL,
quantiles = 0.5,
start = 0,
cross = TRUE,
ci = TRUE,
se = FALSE,
B = 1000,
cl = 0.95,
tidy = FALSE,
na.action = na.pass,
...
)
Arguments
- object
Output from flexsurvreg or flexsurvspline, representing a fitted survival model object.
- newdata
Data frame containing covariate values to produce fitted values for. Or a list that can be coerced to such a data frame. There must be a column for every covariate in the model formula, and one row for every combination of covariates the fitted values are wanted for. These are in the same format as the original data, with factors as a single variable, not 0/1 contrasts.
If this is omitted and the model includes any continuous covariates, a single summary is provided with all covariates set to their mean values in the data; for categorical covariates, the means of the 0/1 indicator variables are taken. If there are only factor covariates in the model, then all distinct groups are used by default.
- X
Alternative way of defining covariate values to produce fitted values for. Since version 0.4, newdata is an easier way that doesn't require the user to create factor contrasts, but X has been kept for backwards compatibility.
Columns of X represent different covariates, and rows represent multiple combinations of covariate values. For example, matrix(c(1,2),nrow=2) if there is only one covariate in the model, and we want survival for covariate values of 1 and 2. A vector can also be supplied if just one combination of covariates is needed.
For "factor" (categorical) covariates, the values of the contrasts representing factor levels (as returned by the contrasts function) should be used. For example, for a covariate agegroup specified as an unordered factor with levels 20-29, 30-39, 40-49, 50-59, and baseline level 20-29, there are three contrasts. To return summaries for groups 20-29 and 40-49, supply X = rbind(c(0,0,0), c(0,1,0)), since all contrasts are zero for the baseline level, and the second contrast is "turned on" for the third level 40-49.
- type
"survival" for survival probabilities.
"cumhaz" for cumulative hazards.
"hazard" for hazards.
"rmst" for restricted mean survival.
"mean" for mean survival.
"median" for median survival (alternative to type="quantile" with quantiles=0.5).
"quantile" for quantiles of the survival time distribution.
"link" for the fitted value of the location parameter (i.e. the "linear predictor" but on the natural scale of the parameter, not on the log scale).
Ignored if "fn" is specified.
- fn
Custom function of the parameters to summarise against time. This has optional first two arguments t representing time, and start representing left-truncation points, and any remaining arguments must be parameters of the distribution. It should be vectorised, and return a vector corresponding to the vectors given by t, start and the parameter vectors.
- t
Times to calculate fitted values for. By default, these are the sorted unique observation (including censoring) times in the data - for left-truncated datasets these are the "stop" times.
- quantiles
If type="quantile", this specifies the quantiles of the survival time distribution to return estimates for.
- start
Optional left-truncation time or times. The returned survival, hazard or cumulative hazard will be conditioned on survival up to this time. Predicted times returned with "rmst", "mean", "median" or "quantile" will be times since time zero, not times since the start time.
A vector of the same length as t can be supplied to allow different truncation times for each prediction time, though this doesn't make sense in the usual case where this function is used to calculate a predicted trajectory for a single individual. This is why the default start time was changed for version 0.4 of flexsurv - this was previously a vector of the start times observed in the data.
- cross
If TRUE (the default) then summaries are calculated for all combinations of times specified in t and covariate vectors specified in newdata.
If FALSE, then the times t should be of length equal to the number of rows in newdata, and one summary is produced for each row of newdata paired with the corresponding element of t. This is used, e.g. when determining Cox-Snell residuals.
- ci
Set to FALSE to omit confidence intervals.
- se
Set to TRUE to include standard errors.
- B
Number of simulations from the normal asymptotic distribution of the estimates used to calculate confidence intervals or standard errors. Decrease for greater speed at the expense of accuracy, or set B=0 to turn off calculation of CIs and SEs.
- cl
Width of symmetric confidence intervals, relative to 1.
- tidy
If TRUE, then the results are returned as a tidy data frame instead of a list. This can help with using the ggplot2 package to compare summaries for different covariate values.
- na.action
Function determining what should be done with missing values in newdata. If na.pass (the default) then summaries of NA are produced for missing covariate values. If na.omit, then missing values are dropped, which was the behaviour of summary.flexsurvreg before flexsurv version 1.2.
- ...
Further arguments passed to or from other methods. Currently unused.
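The arguments above can be combined as in the following minimal sketch. It assumes the bc breast cancer dataset shipped with flexsurv (columns recyrs, censrec and the factor group) and a Weibull model; the object names fit, nd, surv and so on are illustrative only, and B is reduced purely to keep the example fast.

```r
library(flexsurv)  # also attaches survival, which provides Surv()

# Weibull model with a single factor covariate (assumed example data: bc)
fit <- flexsurvreg(Surv(recyrs, censrec) ~ group, data = bc, dist = "weibull")

# Covariate values in the same format as the original data:
# one factor column, not 0/1 contrasts
nd <- data.frame(group = factor(c("Good", "Poor"), levels = levels(bc$group)))

# Survival probabilities at 1, 2 and 5 years for each row of nd
surv <- summary(fit, newdata = nd, type = "survival", t = c(1, 2, 5), B = 200)

# Quartiles of the survival time distribution
quart <- summary(fit, newdata = nd, type = "quantile",
                 quantiles = c(0.25, 0.5, 0.75), B = 200)

# Survival to 5 years, conditional on survival up to 2 years
cond <- summary(fit, newdata = nd, t = 5, start = 2, ci = FALSE)

# cross = FALSE: row i of nd is paired with element i of t
paired <- summary(fit, newdata = nd, t = c(1, 5), cross = FALSE, ci = FALSE)

# Custom summary function: its arguments are the Weibull parameters.
# t is still supplied, although this particular function does not depend on it.
median_weibull <- function(shape, scale) {
  qweibull(0.5, shape = shape, scale = scale)
}
med <- summary(fit, newdata = nd, fn = median_weibull, t = 1, B = 200)
```

Because the intervals are simulation-based, results from repeated calls differ slightly unless the random seed is fixed with set.seed().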
Value
If tidy=FALSE, a list with one component for each unique covariate value (if there are only categorical covariates) or one component (if there are no covariates or any continuous covariates). Each of these components is a matrix with one row for each time in t, giving the estimated survival (or cumulative hazard, or hazard) and 95% confidence limits. These list components are named with the covariate names and values which define them.
If tidy=TRUE, a data frame is returned instead. This is formed by stacking the above list components, with additional columns to identify the covariate values that each block corresponds to.
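As a hedged illustration of the tidy output, the sketch below fits the same kind of model to the bc data shipped with flexsurv and plots one fitted curve per group. The column names time, est, lcl and ucl, and the covariate column group, are what current flexsurv versions are expected to return; check names() on your own output. The ggplot2 step is optional and skipped if that package is not installed.

```r
library(flexsurv)

fit <- flexsurvreg(Surv(recyrs, censrec) ~ group, data = bc, dist = "weibull")

# With only factor covariates and no newdata, all distinct groups are summarised
tidy_surv <- summary(fit, t = seq(0, 7, by = 0.5), tidy = TRUE, B = 200)
head(tidy_surv)

# One fitted survival curve per covariate group
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(tidy_surv,
                       ggplot2::aes(x = time, y = est, colour = group)) +
    ggplot2::geom_line()
  print(p)
}
```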
If there are multiple summaries, an additional list component named X contains a matrix with the exact values of contrasts (dummy covariates) defining each summary.
The plot.flexsurvreg function can be used to quickly plot these model-based summaries against empirical summaries such as Kaplan-Meier curves, to diagnose model fit.
Confidence intervals are obtained by sampling randomly from the asymptotic normal distribution of the maximum likelihood estimates and then taking quantiles (see, e.g. Mandel (2013)).
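The simulation approach can be sketched by hand. This is an illustration of the idea, not the internal code of summary.flexsurvreg. It assumes a covariate-free Weibull model fitted to the bc data, and uses normboot.flexsurvreg from flexsurv to draw parameter values from the asymptotic normal distribution of the estimates. The names fit0, sims, t0 and ci_manual are hypothetical.

```r
library(flexsurv)
set.seed(1)  # fix the seed so the simulated interval is reproducible

# Weibull model with no covariates (assumed example data: bc)
fit0 <- flexsurvreg(Surv(recyrs, censrec) ~ 1, data = bc, dist = "weibull")

# B draws of the parameters on their natural scale:
# a matrix with columns "shape" and "scale"
sims <- normboot.flexsurvreg(fit0, B = 1000)

# Survival probability at t0 for every simulated parameter vector
t0 <- 2
surv_sims <- pweibull(t0, shape = sims[, "shape"], scale = sims[, "scale"],
                      lower.tail = FALSE)

# 95% interval from the empirical quantiles of the simulated values
ci_manual <- quantile(surv_sims, c(0.025, 0.975))
ci_manual

# For comparison; agreement is only up to simulation error
summary(fit0, t = t0, B = 1000)
```

Increasing B reduces the Monte Carlo error in the interval limits at the cost of more computation, which is the trade-off described for the B argument.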
Details
Time-dependent covariates are not currently supported. The covariate values are assumed to be constant through time for each fitted curve.
References
Mandel, M. (2013). "Simulation based confidence intervals for functions with complicated derivatives." The American Statistician (in press).
Author
C. H. Jackson chris.jackson@mrc-bsu.cam.ac.uk