**Dyes: variance components model**

Box and Tiao (1973) analyse data first presented by Davies (1967) concerning batch-to-batch variation in yields of dyestuff. The data (shown below) arise from a balanced experiment in which the total product yield was determined for 5 samples from each of 6 randomly chosen batches of raw material.

The object of the study was to determine the relative importance of between-batch variation versus variation due to sampling and analytic errors. On the assumption that the batches and samples vary independently, and contribute additively to the total error variance, we may assume the following model for dyestuff yield:

$$y_{ij} \sim \text{Normal}(\mu_i, \tau_{\text{within}})$$

$$\mu_i \sim \text{Normal}(\theta, \tau_{\text{between}})$$

where $y_{ij}$ is the yield for sample *j* of batch *i*, $\mu_i$ is the true yield for batch *i*, $\tau_{\text{within}}$ is the inverse of the within-batch variance $\sigma^2_{\text{within}}$ (*i.e.* the variation due to sampling and analytic error), $\theta$ is the true average yield for all batches and $\tau_{\text{between}}$ is the inverse of the between-batch variance $\sigma^2_{\text{between}}$. The total variation in product yield is thus $\sigma^2_{\text{total}} = \sigma^2_{\text{within}} + \sigma^2_{\text{between}}$ and the relative contributions of each component to the total variance are $f_{\text{within}} = \sigma^2_{\text{within}} / \sigma^2_{\text{total}}$ and $f_{\text{between}} = \sigma^2_{\text{between}} / \sigma^2_{\text{total}}$. We assume standard non-informative priors for $\theta$, $\tau_{\text{within}}$ and $\tau_{\text{between}}$.
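The relationship between the precisions and the variance fractions can be sketched in a few lines of Python. This is only an illustration: the function name `variance_components` and the numerical precisions passed to it are hypothetical and are not estimates from the dyes data.

```python
def variance_components(tau_within, tau_between):
    """Convert two precisions into variances and fractions of total variance."""
    if tau_within <= 0 or tau_between <= 0:
        raise ValueError("precisions must be positive")
    # A precision is the inverse of a variance.
    sigma2_within = 1.0 / tau_within
    sigma2_between = 1.0 / tau_between
    # Independent, additive components: the variances sum.
    sigma2_total = sigma2_within + sigma2_between
    return {
        "sigma2_within": sigma2_within,
        "sigma2_between": sigma2_between,
        "sigma2_total": sigma2_total,
        "f_within": sigma2_within / sigma2_total,
        "f_between": sigma2_between / sigma2_total,
    }


# Hypothetical precisions, chosen only to make the arithmetic easy to follow.
parts = variance_components(tau_within=0.0004, tau_between=0.0005)
print(parts["f_within"], parts["f_between"])
```

By construction the two fractions always sum to one, so reporting either one is sufficient.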

**Graphical model for dyes example**

***Bugs* language for dyes example**

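    model
    {
       for (i in 1 : batches) {
          # true yield of batch i; dnorm takes a mean and a precision
          mu[i] ~ dnorm(theta, tau.btw)
          for (j in 1 : samples) {
             # observed yield of sample j from batch i
             y[i , j] ~ dnorm(mu[i], tau.with)
             # distribution function of y[i , j] evaluated at the observed value
             cumulative.y[i , j] <- cumulative(y[i , j], y[i , j])
          }
       }
       # variances are the inverses of the precisions
       sigma2.with <- 1 / tau.with
       sigma2.btw <- 1 / tau.btw
       # vague priors on the precisions and the overall mean
       tau.with ~ dgamma(0.001, 0.001)
       tau.btw ~ dgamma(0.001, 0.001)
       theta ~ dnorm(0.0, 1.0E-10)
    }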
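To make the generative structure of the model concrete, the following Python sketch draws one synthetic data set from it. Note that `dnorm` in BUGS is parameterised by a precision, so each precision is converted to a standard deviation before sampling. The function name `simulate_yields` and all parameter values used here are hypothetical and chosen for illustration only; they are not estimates from the dyes data.

```python
import math
import random


def simulate_yields(theta, tau_btw, tau_with, batches=6, samples=5, seed=0):
    """Draw one data set from the two-level normal model (forward simulation)."""
    rng = random.Random(seed)
    # BUGS dnorm uses precision; random.gauss expects a standard deviation.
    sd_btw = 1.0 / math.sqrt(tau_btw)
    sd_with = 1.0 / math.sqrt(tau_with)
    y = []
    for _ in range(batches):
        # Batch-level mean, scattered around the overall mean theta.
        mu_i = rng.gauss(theta, sd_btw)
        # Sample-level yields, scattered around the batch mean.
        y.append([rng.gauss(mu_i, sd_with) for _ in range(samples)])
    return y


# Hypothetical parameter values (assumptions, not fitted quantities).
y_sim = simulate_yields(theta=1500.0, tau_btw=1 / 2000, tau_with=1 / 2500)
for row in y_sim:
    print([round(v) for v in row])
```

This only simulates forward from fixed parameters; it does not perform the posterior inference that the BUGS model carries out.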

__Data__

__Inits for chain 1__

__Inits for chain 2__

Results

A burn-in of 25000 updates followed by a further 100000 updates gave the parameter estimates

Note that a relatively long run was required because of the high autocorrelation between successively sampled values of some parameters. Such correlations reduce the 'effective' size of the posterior sample, and hence a longer run is needed to ensure sufficient precision of the posterior estimates. Note also that the posterior distribution for $\sigma^2_{\text{between}}$ has a very long upper tail: hence the posterior mean is considerably larger than the median. Box and Tiao estimate $\sigma^2_{\text{within}} = 2451$ and $\sigma^2_{\text{between}} = 1764$ by classical analysis of variance. Here, $\sigma^2_{\text{between}}$ is estimated by the difference of the between- and within-batch mean squares divided by the number of samples per batch (here 5). In cases where the between-batch mean square is smaller than the within-batch mean square, this leads to the unsatisfactory situation of a *negative* variance estimate. Computing a confidence interval for $\sigma^2_{\text{between}}$ is also difficult using the classical approach due to its complicated sampling distribution.
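The classical estimator can be sketched in Python as follows. For a balanced design with J samples per batch, the expected between-batch mean square is $\sigma^2_{\text{within}} + J\,\sigma^2_{\text{between}}$, so the method-of-moments estimate of $\sigma^2_{\text{between}}$ is the difference of the two mean squares divided by J. The function name `anova_variance_components` and the toy data below are hypothetical; they are not the dyes data.

```python
def anova_variance_components(y):
    """Method-of-moments variance components for a balanced one-way layout.

    y is a list of batches, each a list holding the same number of samples.
    """
    batches = len(y)
    samples = len(y[0])
    if batches < 2 or samples < 2:
        raise ValueError("need at least 2 batches and 2 samples per batch")
    if any(len(row) != samples for row in y):
        raise ValueError("design must be balanced")
    grand_mean = sum(sum(row) for row in y) / (batches * samples)
    batch_means = [sum(row) / samples for row in y]
    # Between-batch mean square, on (batches - 1) degrees of freedom.
    ms_between = (samples * sum((m - grand_mean) ** 2 for m in batch_means)
                  / (batches - 1))
    # Within-batch mean square, on batches * (samples - 1) degrees of freedom.
    ms_within = (sum((v - m) ** 2 for row, m in zip(y, batch_means) for v in row)
                 / (batches * (samples - 1)))
    return {
        "ms_between": ms_between,
        "ms_within": ms_within,
        "sigma2_within": ms_within,
        # Can be negative when ms_between < ms_within.
        "sigma2_between": (ms_between - ms_within) / samples,
    }


# Toy data with well-separated batch means: a positive estimate.
print(anova_variance_components([[1, 3], [5, 7], [9, 11]])["sigma2_between"])  # 15.0
# Toy data with identical batch means: the estimate is negative.
print(anova_variance_components([[1, 3], [1, 3], [1, 3]])["sigma2_between"])   # -1.0
```

Read in reverse, the rounded estimates quoted in the text, with 5 samples per batch, imply a between-batch mean square of about 2451 + 5 × 1764 = 11271. A Bayesian analysis with proper priors on the precisions avoids the negative estimate, because the posterior for $\sigma^2_{\text{between}}$ is confined to positive values.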