
Standardised outputs are outputs from models with covariates that are defined by marginalising (averaging) over the covariate values in a given population, rather than by conditioning on a single covariate value.

Usage

standardise_to(newdata, nstd = 1, random = FALSE)

standardize_to(newdata, nstd = 1, random = FALSE)

Arguments

newdata

Data frame describing a population.

nstd

Number of draws from the population distribution used per MCMC sample from the parameters when random=TRUE. With the default of 1, the value of the covariate vector \(X\) is essentially treated as if it were an additional parameter in the Bayesian model, drawn by Monte Carlo independently of the remaining parameters.

random

By default this is FALSE, indicating that standardised samples should be obtained by concatenating the posterior samples for each covariate value in the standard population. The sample from the standardised posterior of parameters then has size niter times the number of rows in newdata, where niter is the number of MCMC iterations used in the original survextrap fit. Computing the resulting output function (e.g. RMST, which uses numerical integration) can then be computationally intensive if this sample size is large.

A quicker alternative is to sample a random row of the standard population for each MCMC iteration. The standardised sample from the posterior then has size niter. This is specified by using random=TRUE. If this is used, then the result depends on the random number seed, and it should be checked that the results are stable to within the required number of significant figures. If not, run survextrap with more MCMC iterations or increase nstd here.
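The two sampling schemes described for random can be sketched in base R. This is only an illustration under stated assumptions, not survextrap's internal code: the "posterior" draws alpha and beta are simulated, the exponential model and the helper surv() are hypothetical, and pop stands in for newdata. It shows the difference in sample size between the two schemes and that both describe the same standardised distribution.

```r
set.seed(1)

niter <- 1000   # hypothetical number of posterior draws
nstd  <- 1      # draws from the population per posterior draw

# Toy "posterior" for an exponential model with one binary covariate.
# These draws are simulated for illustration, not taken from a survextrap fit.
alpha <- rnorm(niter, mean = -2,   sd = 0.1)  # log baseline hazard
beta  <- rnorm(niter, mean = -0.5, sd = 0.1)  # log hazard ratio

pop <- data.frame(x = c(0, 1))  # standard population with two covariate values

# Survival probability at time t for covariate value x (hypothetical helper)
surv <- function(t, alpha, beta, x) exp(-exp(alpha + beta * x) * t)

# Analogue of random = FALSE: pair every draw with every population row
grid <- expand.grid(iter = seq_len(niter), row = seq_len(nrow(pop)))
full <- surv(5, alpha[grid$iter], beta[grid$iter], pop$x[grid$row])

# Analogue of random = TRUE: pair each draw with nstd randomly chosen rows
iter <- rep(seq_len(niter), each = nstd)
rows <- sample(nrow(pop), niter * nstd, replace = TRUE)
rand <- surv(5, alpha[iter], beta[iter], pop$x[rows])

length(full)  # niter * nrow(pop)
length(rand)  # niter * nstd

# Both are samples from the same standardised distribution,
# so their summaries should agree up to Monte Carlo error
quantile(full, c(0.5, 0.025, 0.975))
quantile(rand, c(0.5, 0.025, 0.975))
```

In the package itself, going by the Usage signature above, the random scheme would be requested with a call such as standardise_to(newdata, random = TRUE, nstd = 10). Re-running with a different seed is one way to check that the summaries are stable to the precision needed.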

Value

A copy of newdata, but with attributes added to indicate that this should be used as a standard population. When this newdata is passed to survextrap's output functions, the outputs will then be presented as an average over the empirical distribution of covariate values described by newdata, rather than as one output per row of newdata (distinct covariate values).

Details

Standardised outputs are produced by generating a Monte Carlo sample from the joint distribution of parameters \(\theta\) and covariate values \(X\), \(p(X,\theta) = p(\theta|X)p(X)\), where \(p(X)\) is defined by the empirical distribution of covariates in the standard population.

Hence applying a vectorised output function \(g()\) (such as the RMST or survival probability) to this sample produces a sample from the posterior of \(\int g(\theta|X) p(X) dX\): the average RMST (say) for a heterogeneous population.
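As a worked form of this average, suppose the standard population has \(N\) rows \(x_1, \ldots, x_N\) (the symbols \(N\) and \(x_i\) are introduced here for illustration only). Reading the average as weighted by \(p(X)\), the empirical distribution puts weight \(1/N\) on each row, so for a fixed \(\theta\) the integral reduces to a simple mean over rows:

\[
\int g(\theta|X)\, p(X)\, dX = \frac{1}{N} \sum_{i=1}^{N} g(\theta|x_i)
\]

A covariate value that appears in several rows therefore receives proportionally more weight.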

See the Examples vignette for some examples and notes on computation.

Examples

rxph_mod <- survextrap(Surv(years, status) ~ rx, data=colons, fit_method="opt")
ref_pop <- data.frame(rx = c("Obs","Lev+5FU"))

# covariate-specific outputs
survival(rxph_mod, t = c(5,10), newdata = ref_pop)
#> # A tibble: 4 × 5
#>   rx          t median  lower upper
#>   <chr>   <dbl>  <dbl>  <dbl> <dbl>
#> 1 Obs         5  0.376 0.208  0.522
#> 2 Obs        10  0.228 0.0429 0.470
#> 3 Lev+5FU     5  0.608 0.407  0.736
#> 4 Lev+5FU    10  0.470 0.186  0.690

# standardised outputs
survival(rxph_mod, t = c(5,10), newdata = standardise_to(ref_pop))
#> # A tibble: 2 × 4
#>       t median  lower upper
#>   <dbl>  <dbl>  <dbl> <dbl>
#> 1     5  0.483 0.242  0.716
#> 2    10  0.345 0.0673 0.662