Schools: ranking school examination results using multivariate hierarchical models
Goldstein et al. (1993) present an analysis of examination results from inner London schools. They use hierarchical or multilevel models to study the between-school variation, and calculate school-level residuals in an attempt to differentiate between 'good' and 'bad' schools. Here we analyse a subset of these data and show how to calculate a rank ordering of schools and obtain credible intervals on each rank.
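As a rough illustration of the ranking idea, the sketch below assumes that posterior draws of some school-level effect are already available as a list of lists (one inner list per MCMC iteration). It ranks the schools within each draw and then summarises each school's rank by its median and a 95% interval. The helper names (`ranks_of`, `quantile`, `rank_summaries`) and the simulated draws are hypothetical and stand in for real sampler output; this is not the analysis code used for this example.

```python
import random

def ranks_of(values):
    """Return the rank (1 = smallest) of each entry in values."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0] * len(values)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks

def quantile(sorted_values, q):
    """Simple empirical quantile by nearest index (no interpolation)."""
    idx = int(round(q * (len(sorted_values) - 1)))
    return sorted_values[idx]

def rank_summaries(draws, lower=0.025, upper=0.975):
    """draws[s][j] is the sampled effect of school j at iteration s.

    Returns one (lower, median, upper) tuple of ranks per school.
    """
    n_schools = len(draws[0])
    per_school = [[] for _ in range(n_schools)]
    for draw in draws:
        # Rank the schools within this single posterior draw
        for j, r in enumerate(ranks_of(draw)):
            per_school[j].append(r)
    summaries = []
    for rs in per_school:
        rs.sort()
        summaries.append((quantile(rs, lower), quantile(rs, 0.5), quantile(rs, upper)))
    return summaries

# Simulated stand-in for MCMC output: 4 hypothetical schools, 2000 draws
random.seed(1)
assumed_effects = [-0.5, 0.0, 0.1, 0.8]
draws = [[random.gauss(m, 0.3) for m in assumed_effects] for _ in range(2000)]

for j, (lo, med, hi) in enumerate(rank_summaries(draws), start=1):
    print(f"school {j}: median rank {med}, 95% interval ({lo}, {hi})")
```

Ranking within each draw, rather than ranking posterior means, is what lets the uncertainty in the effects carry through to the ranks.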
Data
Standardized mean examination scores (Y) were available for 1978 pupils from 38 different schools. The median number of pupils per school was 48, with a range of 1 to 198. Pupil-level covariates included gender plus a standardized London Reading Test (LRT) score and a verbal reasoning (VR) test category (1, 2 or 3, where 1 represents the highest ability group) measured when each child was aged 11. Each school was classified by gender intake (all girls, all boys or mixed) and denomination (Church of England, Roman Catholic, State school or other); these were used as categorical school-level covariates.
Model
We consider the following model, which essentially corresponds to Goldstein et al.'s model 1.
$$Y_{ij} \sim \text{Normal}(\mu_{ij}, \tau_{ij})$$

$$\mu_{ij} = \alpha_{1j} + \alpha_{2j}\,\text{LRT}_{ij} + \alpha_{3j}\,\text{VR1}_{ij} + \beta_1\,\text{LRT}_{ij}^2 + \beta_2\,\text{VR2}_{ij} + \beta_3\,\text{Girl}_{ij} + \beta_4\,\text{Girls' school}_j + \beta_5\,\text{Boys' school}_j + \beta_6\,\text{CE school}_j + \beta_7\,\text{RC school}_j + \beta_8\,\text{other school}_j$$

$$\log \tau_{ij} = \theta + \phi\,\text{LRT}_{ij}$$

where i refers to pupil and j indexes school. We wish to specify a regression model for the variance components, and here we model the logarithm of $\tau_{ij}$ (the inverse of the between-pupil variance) as a linear function of each pupil's LRT score. This differs from Goldstein et al.'s model, which allows the variance $\sigma^2_{ij}$ to depend linearly on LRT. However, such a parameterization may lead to negative estimates of $\sigma^2_{ij}$.
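The difference between the two variance parameterizations can be seen numerically. The sketch below compares the variance implied by a log-linear model for the precision with a variance modelled directly as a linear function of LRT. The function names and all coefficient values are hypothetical, chosen only to show that the linear form can go negative while the log-precision form cannot.

```python
import math

def variance_log_precision(lrt, theta, phi):
    """Variance implied by log(tau) = theta + phi * lrt, with tau = 1 / variance."""
    tau = math.exp(theta + phi * lrt)  # exp() keeps the precision strictly positive
    return 1.0 / tau

def variance_linear(lrt, s0, s1):
    """Variance modelled directly as linear in LRT; not constrained to be positive."""
    return s0 + s1 * lrt

# Hypothetical coefficients, for illustration only
theta, phi = 0.5, 0.1
s0, s1 = 0.6, 0.25

lrt_values = [-3.0, -1.0, 0.0, 1.0, 3.0]
for lrt in lrt_values:
    v_log = variance_log_precision(lrt, theta, phi)
    v_lin = variance_linear(lrt, s0, s1)
    print(f"LRT={lrt:+.1f}  log-precision variance={v_log:.3f}  linear variance={v_lin:.3f}")
```

With these illustrative values the linear form gives a negative variance at LRT = -3, whereas the log-precision form stays positive for any LRT score.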
Prior distributions
The fixed effects $\beta_k$ (k = 1,...,8), $\theta$ and $\phi$ were assumed to follow vague independent Normal distributions with zero mean and low precision = 0.0001. The random school-level coefficients $\alpha_{kj}$ (k = 1,2,3) were assumed to arise from a multivariate normal population distribution with unknown mean $\gamma$ and covariance matrix $\Sigma$. A non-informative multivariate normal prior was then specified for the population mean $\gamma$, whilst the inverse covariance matrix $T = \Sigma^{-1}$ was assumed to follow a Wishart distribution. To represent vague prior knowledge, we chose the degrees of freedom for this distribution to be as small as possible (i.e. 3, the rank of $T$). The scale matrix $R$ was specified as