Schools

    Schools: ranking schoolexamination
         results using multivariate
         hierarcical models

Goldstein et al. (1993) present an analysis of examination results from inner London schools. They use hierarchical or multilevel models to study the between-school variation, and calculate school-level residuals in an attempt to differentiate between `good' and `bad' schools. Here we analyse a subset of this data and show how to calculate a rank ordering of schools and obtain credible intervals on each rank.

Data

Standardized mean examination scores (Y) were available for 1978 pupils from 38 different schools. The median number of pupils per school was 48, with a range of 1--198. Pupil-level covariates included gender plus a standardized London Reading Test (LRT) score and a verbal reasoning (VR) test category (1, 2 or 3, where 1 represents the highest ability group) measured when each child was aged 11. Each school was classified by gender intake (all girls, all boys or mixed) and denomination (Church of England, Roman Catholic, State school or other); these were used as categorical school-level covariates.

Model

We consider the following model, which essentially corresponds to Goldstein et al.'s model 1.

   Y _ij   ~   Normal( m _ij, t _ij)
    m _ij   =    a _1j + a _2j LRT _ij + a _3j VR 1 _ij + b ₁ LRT _ij² + b ₂ VR 2 _ij + b ₃ Girl _ij
         + b ₄ Girls' school _j + b ₅ Boys' school _j + b ₆ CE school _j
         + b ₇ RC school _j + b ₈ other school _j
   log t _ij   =    q + f LRT _ij
where i refers to pupil and j indexes school. We wish to specify a regression model for the variance components, and here we model the logarithm of t _ij (the inverse of the between-pupil variance) as a linear function of each pupil's LRT score. This differs from Goldstein et al.'s model which allows the variance s ²_ij to depend linearly on LRT. However, such a parameterization may lead to negative estimates of s ²_ij.

Prior distributions

The fixed effects b _k (k=1,...,8), q and f were assumed to follow vague independent Normal distributions with zero mean and low precision = 0.0001. The random school-level coefficients a _kj (k = 1,2,3) were assumed to arise from a multivariate normal population distribution with unknown mean g and covariance matrix S . A non-informative multivariate normal prior was then specified for the population mean g , whilst the inverse covariance matrix T = S ^-1 was assumed to follow a Wishart distribution. To represent vague prior knowledge, we chose the degrees of freedom for this distribution to be as small as possible (i.e. 3, the rank of T ). The scale matrix R was specified as

which represents our prior guess at the order of magnitude of S .

The BUGS code is given below:

   model
   {
      for(p in 1 : N) {
         Y[p] ~ dnorm(mu[p], tau[p])
         mu[p] <- alpha[school[p], 1] + alpha[school[p], 2] * LRT[p]
            + alpha[school[p], 3] * VR[p, 1] + beta[1] * LRT2[p]
            + beta[2] * VR[p, 2] + beta[3] * Gender[p]
            + beta[4] * School.gender[p, 1] + beta[5] * School.gender[p, 2]
            + beta[6] * School.denom[p, 1] + beta[7] * School.denom[p, 2]
            + beta[8] * School.denom[p, 3]
         log(tau[p]) <- theta + phi * LRT[p]
         sigma2[p] <- 1 / tau[p]
         LRT2[p] <- LRT[p] * LRT[p]
      }
      min.var <- exp(-(theta + phi * (-34.6193))) # lowest LRT score = -34.6193
      max.var <- exp(-(theta + phi * (37.3807))) # highest LRT score = 37.3807

   # Priors for fixed effects:
      for (k in 1 : 8) {
         beta[k] ~ dnorm(0.0, 0.0001)
      }
      theta ~ dnorm(0.0, 0.0001)
      phi ~ dnorm(0.0, 0.0001)

   # Priors for random coefficients:
      for (j in 1 : M) {
         alpha[j, 1 : 3] ~ dmnorm(gamma[1:3 ], T[1:3 ,1:3 ]);
         alpha1[j] <- alpha[j,1]
      }

   # Hyper-priors:
      gamma[1 : 3] ~ dmnorm(mn[1:3 ], prec[1:3 ,1:3 ]);
      T[1 : 3, 1 : 3 ] ~ dwish(R[1:3 ,1:3 ], 3)
   }

Data ( click to open )

Note that school is a 1978 x 3 matrix taking value 1 for all pupils in school 1, 2 for all pupils in school 2 and so on. For computational convenience, Y , mu and tau are indexed over a single dimension p = 1,...,1978 rather than as pupil i within school j as used in equations above. The appropriate school-level coefficients for pupil p are then selected using the school indicator in row p of the data array --- for example alpha[school[p],1] .

Inits for chain 1        Inits for chain 2    ( click to open )

Results

A 1000 update burn in followed by a further 10000 updates gave the parameter estimates

[schools2]

Estimating the ranks

The school-specific intercept a _j1 measures the 'residual effect' for school j after adjusting for pupil- and school-level covariates. This might represent an appropriate quantity by which to rank schools' performance. We compute the ranks in BUGS using the "rank" option of the "Statistics" menu, which we set for the variable alpha at the same time as we set the "sample monitor" option. Since the rank is a function of stochastic nodes, its value will change at every iteration. Hence we may obtain a posterior distribution for the rank of alpha[, k] which may be summarized by posterior histograms as shown below: