Impala

[impala0] Impala: distance Sampling

               (contributed by Andy Royle)

Distance sampling is a popular method for estimating the density of populations based on observed distances between objects and line or point (see Buckland et al. 2001). It can be viewed as a special type of closed population model for estimating population size, having J=1 replicate sample and with distance from line or point to individual as an "individual covariate". We adopt that view here, and thus analyze the model using data augmentation (as in the closed population examples given previously). This analysis derives from Royle and Dorazio (2008; Chapter 7).

The data set is the famous "Impala data" from Burnham (1980), consisting of distance measurements on 73 individuals along linear transects truncated to a width of 400 m. Distances were scaled here to be in units of 100 m.

Let N be the population size of individuals susceptible to sampling. If N is known, the observation model is:

      y _i | x _i ~ Bernoulli(p _i)

where

p _i = g(x _i; theta)

x _i = distance to individual i and g() is the detection probability function. Here we consider the half-normal detection function:

g(x) = exp-(x / theta) ²

Data augmentation -- We do not know N in practice. In fact, we only observe the y _i=1 observations. We handle that in the analysis using data augmentation (see Royle and Dorazio (2008, Ch. 7)). We add the "observations" y _i= 0 for i = nind +1, nind + 2,...., M for M sufficiently large. We introduce the latent indicator variables z _i for i=1,2,...., M with

      z _i ~ Bernoulli( psi ) for i = 1, 2, ...., M # note: specified for all M individuals

(this implies N ~ dunif(0, M) so choose M accordingly). For analysis of the impala data, data augmentation is based on nz=300 y _i=0 observations.

Distance, x, is only observed for individuals that are detected, and it is regarded as missing data for the augmented individuals. We assume:

x _i ~ uniform(0, B) for i = 1, 2, ...., M # note: specified for all M individuals

Where B is chosen to represent the upper-limit of observation (in practice, observations are truncated at some large distance). For the Impala data, B=4 (i.e., 400 meters). The "missing" values of x have to be passed to WinBUGS as missing values, indicated by NA (see below).

Model summary:
      y _i | x _i, z _i ~ Bernoulli(p _i * z _i) # observation model
      p _i = g(x _i; theta)
g(x) = exp((x / theta ²))
z _i ~ Bernoulli(psi) for i = 1, 2, ...., M # data augmentation
x _i ~ unif(0, B) for i = 1, 2, ...., M # distances

Prior distributions are specified as:
psi ~ uniform(0,1)
theta ~ uniform(0, U) # U chosen arbitrarily large

Under data augmentation, population size, N, and density, D are derived parameters: N = S z _i and, D = N / a, where a = area of the sample unit(s).

    model
   {
    # Prior distributions
       theta ~ dunif(0, 10)
       theta2 <- theta *theta
       psi ~ dunif(0,1)

       for(i in 1 : nind+nz){
          z[i] ~ dbern(psi) # latent indicator variables from data augmentation
          x[i] ~ dunif(0, 4) # distance is a random variable
          logp[i]<- -((x[i] * x[i]) / theta2)
          p[i] <- exp(logp[i])
          mu[i] <- z[i] * p[i]
          y[i] ~ dbern(mu[i]) # observation model
       }
       N<-sum(z[1:nind + nz])
       D<- N / 48 # 48 km*km = total area of transects
    }

Data ( click to open )

Inits for chain 1 Inits for chain 2

Note that initial values for the missing values of x are not provided. WinBUGS seems to do a good job picking those, but not always (sometimes there is a crash, so try again).

Results

Results from 2 chains of length 10000 after 1000 burn-in. Estimated density is 3.75 individuals per square kilometer