[cervix0]     Cervix: case - control study
      with errors in covariates

Carroll, Gail and Lubin (1993) consider the problem of estimating the odds ratio of a disease d in a case-control study where the binary exposure variable is measured with error. Their example concerns exposure to herpes simplex virus (HSV) in women with invasive cervical cancer (d=1) and in controls (d=0). Exposure to HSV is measured by a relatively inaccurate western blot procedure w for 1929 of the 2044 women, whilst for 115 women, it is also measured by a refined or "gold standard'' method x. The data are given in the table below. They show a substantial amount of misclassification, as indicated by low sensitivity and specificity of w in the "complete'' data, and Carroll, Gail and Lubin also found that the degree of misclassification was significantly higher for the controls than for the cases (p=0.049 by Fisher's exact test).


They fitted a prospective logistic model to the case-control data as follows

i    ~   Bernoulli(p i )   i = 1,...,2044
i )   =    b 0C + b x i    i = 1,...,2044

b is the log odds ratio of disease. Since the relationship between d and x is only directly observable in the 115 women with "complete'' data, and because there is evidence of differential measurement error, the following parameters are required in order to estimate the logistic model

f 1,1    =   P(w=1 | x=0, d=0)
f 1,2    =   P(w=1 | x=0, d=1)
f 2,1    =   P(w=1 | x=1, d=0)
f 2,2    =   P(w=1 | x=1, d=1)
   q   =   P(x=1)

The differential probability of being exposed to HSV (x=1) for cases and controls is calculated as follows

The BUGS code is given below. The role of the variables x 1 and d 1 is to pick the appropriate value of f (the incidence of w) for any given true exposure status x and disease status d. Since x and d take the values 0 or 1, and the subscripts for f take values 1 or 2, we must first add 1 to each x[i] and d[i] in the BUGS code before using them as index values for f . BUGS does not allow subscripts to be functions of variable quantities --- hence the need to create x1 and d1 for use as subscripts. In addition, note that g 1 and g 2 were not simulated directly in BUGS, but were calculated as functions of other parameters. This is because the dependence of g 1 and g 2 on d would have led to a cycle in the graphical model which would no longer define a probability distribution.

       for (i in 1 : N) {
    x[i] ~ dbern(q) # incidence of HSV
    logit(p[i]) <- beta0C + beta * x[i]   # logistic model
    d[i] ~ dbern(p[i]) # incidence of cancer
    x1[i] <- x[i] + 1
    d1[i] <- d[i] + 1
    w[i] ~ dbern(phi[x1[i], d1[i]])   # incidence of w
    q ~ dunif(0.0, 1.0) # prior distributions
    beta0C ~ dnorm(0.0, 0.00001);
    beta ~ dnorm(0.0, 0.00001);
    for(j in 1 : 2) {
    for(k in 1 : 2){
            phi[j, k] ~ dunif(0.0, 1.0)
    # calculate gamma1 = P(x=1|d=0) and gamma2 = P(x=1|d=1)
       gamma1 <- 1 / (1 + (1 + exp(beta0C + beta)) / (1 + exp(beta0C)) * (1 - q) / q)
       gamma2 <- 1 / (1 + (1 + exp(-beta0C - beta)) / (1 + exp(-beta0C)) * (1 - q) / q)

Data ( click to open )

Inits for chain 1      Inits for chain 2    ( click to open )


A 1000 update burn in followed by a further 10000 updates gave the parameter estimates