Limiting long-term illness. Ecological inference with one binary and one continuous exposure, and combined aggregate and individual data.
Model for limiting long-term illness among men aged from 45 to 59 years in London, in terms of individual ethnicity, household income and area deprivation. Aggregate data are taken from the UK census small area statistics, and include ngr = 255 wards in London. Individual data are taken from the Health Survey for England, including 412 individuals in London.
model {
## ecological data
for (i in 1:ngr) {
## The number of individuals per ward reporting limiting long term illness
## is modelled as binomial, with risk py[i] constant over the area.
N.llti[i] ~ dbin(py[i], N.pop[i])
## This risk is obtained by integrating the individual level risk, firstly over the binary exposure, non-white
## ethnicity
py[i] <- p.nonwhite[i] * pnw[i] + ( 1 - p.nonwhite[i] ) * pw[i]
## and secondly over the continuous variable, mean log income. Details of this calculation are given in
## Jackson et al., (2005)
Improving ecological inference using individual-level data.
logit(pw[i]) <- ( mu.base[i] + a.carstairs*carstairs[i] + a.lincome*mean.lincome[i] ) / sqrt(1 + pow(c*a.lincome, 2) / tau.x[i])
logit(pnw[i]) <- ( mu.base[i] + a.carstairs*carstairs[i] + a.nonwhite + a.lincome*mean.lincome[i] ) / sqrt(1 + pow(c*a.lincome, 2) / tau.x[i])
tau.x[i] <- 1 / pow(sd.lincome[i], 2)
## The risk includes a random area-specific effect mu.base[i], to account for heterogeneity between areas
## due to unobserved factors.
mu.base[i] ~ dnorm(base.mu, base.tau)
}
## Model for limiting long term illness in the individual-level data is just a logistic regression with the same
## coefficients
for (i in 1:n) {
llti[i] ~ dbern(pzy[i])
logit(pzy[i]) <- mu.base[ward[i]] + mu.scale +
a.carstairs*carstairs[ward[i]]
+ a.nonwhite * nonwhite[i] + a.lincome * lincome[i]
}
c <- 16*sqrt(3) / (15 * pi)
pi <- 3.141592654
## Log odds ratios for area level Carstairs deprivation index, non-white ethnicity and log income
a.carstairs ~ dnorm(0, 0.1)
a.nonwhite ~ dnorm(0, 1.48) # 95% prior belief that OR(a.nonwhite) is between 1/5 and 5
a.lincome ~ dnorm(0, 1.48)
## Odds ratios for area level Carstairs deprivation index, non-white ethnicity and log income
or.nonwhite <- exp(a.nonwhite)
or.lincome <- exp(a.lincome)
or.carstairs <- exp(a.carstairs)
base.mu ~ dlogis(0, 1)
base.tau ~ dgamma(1, 0.01)
base.sig <- 1 / sqrt(base.tau)
p.base <- exp(base.mu) / (1 + exp(base.mu))
mu.scale <- 0.7
# fixed difference in prevalence (logit scale) between HSE
and census
}
Data:
Click on the arrows to open the data
Inits for two parallel chains:
list(base.mu=0, base.tau=1, a.nonwhite=0, a.lincome=0, a.carstairs=0)
list(base.mu=1, base.tau=5, a.nonwhite=-2, a.lincome=2, a.carstairs=-1)
Results...
After 30000 iterations from two parallel chains, with the first 4000 from each discarded, we obtain the following
summary statistics:
node mean sd MC error 2.5% median 97.5% start sample
base.mu -2.108 0.03027 0.001257 -2.171 -2.107 -2.052 4001 52000
base.sig 0.1843 0.01172 1.438E-4 0.1622 0.184 0.2082 4001 52000
or.carstairs 1.071 0.004904 1.836E-4 1.062 1.072 1.081 4001 52000
or.lincome 0.5617 0.04406 0.00158 0.4785 0.5608 0.6513 4001 52000
or.nonwhite 1.432 0.1649 0.007305 1.133 1.424 1.763 4001 52000