[huddersfield0]    Poisson-gamma spatial moving
                     average (convolution)model: 
                     Ecological regression of air
                     pollution and respiratory
                     illness in children


Best et al (2000b) applied the Poisson-gamma convolution model to a small-area ecological regression analysis of traffic-related air pollution and respiratory illness in children living in the Huddersfield region of northern England. They carried out two analyses, one using a partition of the study region into 427 administrative enumeration districts, and the other partitioning the region into 605 regular 750m
2 grid cells. Here we consider the latter partition of the study region. The following data were available:

Y
i = number of cases of self-reported frequent cough amongst children aged 7-9 in each of the i=1,..., 605 areas (750 m 2 grid cells)
pop
i = (estimated) population of 7-9 year old children in each area, based on pro-rated 1991 enumeration district census counts. (Note that this leads to <1 child per area in some areas, due to the pro-rata split; counts are also divided by 100, to give rates per 100 children).
NO2
i = average annual nitrogen dioxide concentration in each area, measured in m g m -3 above the baseline concentration for the study region of approximately 20 m g m -3

Best et al (2000b) fitted an indentity-link spatial moving average Poisson regression model:

          Y i ~ Poisson( m i )
          m i = pop i * l i
          l i = b 0 + b 1 NO2 i + b 2 S j k ij g j
where g j can be thought of as latent unobserved risk factors associated with locations or sub-regions of the study region indexed by j. These locations or sub-regions are typically defined by the user, and are chosen to represent a partition of the region that is appropriate for capturing unmeasured spatial variation in the disease rate. Best et al (2000b) define the latent g j parameters on a 12 x 8 grid of approx 3km 2 quadrats covering the Huddersfield study region. k ij are then the elements of a 605 x 96 Gaussian kernel matrix, with
         k
ij = 1 / (2 p r 2 ) exp( - d ij 2 / 2 r 2 )
where d
ij represents the Euclidean distance between the centroid of area i of the study region and the location of the j th latent factor. Note that if the latent factors are defined on sub-regions rather than points, it is preferable to integrate the above kernel over the latent sub-region (i.e. over all possible distances between the centroid of area i and all locations within the latent sub-region) - the Splus / R function given below impelments this integration when computing the kernel matrix. r > 0 is a distance scale governing how rapidly the kernel weights (i.e. the influence of the j th latent factor on the disease risk in the i th area) decline with increasing distance. Best et al (2000b) fix r = 1 km. By fixing r , the kernel matrix can be pre-computed outside of WinBUGS (see link in the data file for Splus / R code for doing this), thus simplifying the implementation. It is possible to treat r as uncertain (see Forest ), but this entails calculating the elements k ij at each MCMC interation within the BUGS code: this leads to big increases in computation time for all but very small study regions and latent grids. (Note that since the overall scale parameter of the Gaussian kernel is common to all elements k ij it can be factored out of the externel calculation of the kernel, and included as an uncertain parameter (here called b 2 ) in the BUGS model.

Independent gamma prior distributions are assumed for the uncertainty quantities
b 0 , b 1 , b 2 and g j as this enables the MCMC sampler to exploit conjugacy with the Poisson likelihood.

See Best et al (2000b) for full details and interpretation of the model, and
Appendix 2 for further technical information about the Poisson-gamma model.

The code for this model is given below.


Model

          model {

          # Likelihood
              for (i in 1 : I) {
          # convolution likelihood:
          # Y[i] ~ Poisson( sum_ j { mu[i, j] } )
          # where mu[i, j] = pop[i] * lambda[i, j] and
          # pop[i] = (standardised) population (in 100's) in area i
          # lambda[i, j] = disease rate in area i attributed to jth 'source', where
          # the sources include both observed and latent covariates.
                Y[i] ~ dpois.conv(mu[i, ])   

             for (j in 1 : J) {
          # rescale kernel matrix
                k1[i, j] <- ker[i, j] / prec
         # jth 'source' (jth latent spatial grid cell)
                 lambda[i, j] <- beta2 * gamma[j] * k1[i, j]
             }
          # (J+1)th 'source' (overall intercept; represents background rate)
              lambda[i,J+1] <- beta0
         # (J+2)th 'source' (effect of observed NO2 covariate in area i)
             lambda[i,J+2] <- beta1 * NO2[i]
          # Note: if additional covariates have been measured, these
          # should be included in the same way as NO2, giving terms
          # lambda[i, J+3], lambda[i, J+4],.....etc.
             for (j in 1 : J+2) {
                mu[i, j] <- lambda[i, j] * pop[i]
              }
          }
                
          # Priors
          # See Table 22.1 in Best et al. (2000b) for further details.
         
          # Priors for latent spatial grid cell (gamma) parameters:
          #
          # Assume gamma_j ~ gamma(a, t)
          #
          # Prior shape parameter for gamma distn on gamma[j]'s will change in proportion to
         # the size of the latent grid cells, to ensure aggregation consistency. The prior inverse
         # scale parameter for gamma distn on gamma[j]'s stays constant across different
         # partitions, but depends on the units used to measure the area of the latent grid cells:
         # here we are assuming # 'area' is in sq kilometres, and we take tauG = 1 (per km^2)
         # which corresponds to assuming # that our prior guess at the value of the gamma[j]'s
         # is based on 'observing' about 1 * (area of study region = 30*20km^2) = 600
          # 'prior points' over the study region.
             
         # Note: area of latent grid cells is read in from data file
             alphaG <- tauG * area
             tauG <- 1

             for (j in 1 : J) {
                gamma[j] ~ dgamma(alphaG, tauG)
             }

          # Priors for beta coefficients:
          #
          # Assume priors on each beta are of the form beta ~ gamma(a, t).
          #
          # Here, parameter of the prior are chosen by setting the prior mean to give an equal
          # number of cases allocated to each of the 3 'sources' (baseline, NO2 and latent), i.e.
         # prior mean = a / t = a * 3 * Xbar / Ybar,
          # where Ybar is the overall disease rate in number of cases per unit population,
          # and Xbar is the population-weighted mean of covariate X.
          #
          # We also take shape parameter a = 0.575, since this gives the ratio of the 90th : 10th
          # percentile of the prior distn = 100, which is suitably diffuse.
         
             YBAR <- sum(Y[])/sum(pop[])   # mean number of cases per unit population
             NO2BAR <- inprod(NO2[], pop[]) / sum(pop[])   # population-weighted mean NO2
             
         # Shape parameter for Intercept (beta0)
             alpha0<- 0.575
         # Inverse scale parameter for Intercept (beta0)
             tau0 <- 3 * alpha0 / YBAR
         # Shape parameter for effect of NO2 (beta1)
             alpha1<- 0.575
         # Inverse scale parameter for effect of NO2 (beta1)
             tau1 <- 3 * alpha1 * NO2BAR / YBAR
         # Shape parameter for Latent coefficient (beta2)
             alpha2 <- 0.575
         # Inverse scale parameter for Latent coefficient (beta2)
             tau2 <- 3 * alpha2   / YBAR

             beta0 ~ dgamma(alpha0, tau0)
             beta1 ~ dgamma(alpha1, tau1)
             beta2 ~ dgamma(alpha2, tau2)


          # Summary quantities for posterior inference
              for(i in 1:I) {
          # Overall disease rate in area i
                RATE[i] <- sum(mu[i, ])/pop[i]
          # Rate associated with latent spatial factor in area i
                LATENT[i] <- beta2 * inprod(k1[i,], gamma[])
             }

          # Expected rate (number of cases per unit population) attributed to each source
          # (see Table 22.2 in Best et al (2000b) )

          # Background (overall baseline) rate per 100 population
          rate.base <- beta0

          # Number of cases per 100 population attributed
          # to NO2 = beta1 * population-weighted mean value
          # of NO2 across the study region.
             rate.no2 <- beta1 * inprod(pop[], NO2[]) / sum(pop[])
             
          # Number of cases per 100 population attributed to
          # latent sources = beta2 * population-weighted mean
          # value of the latent spatial factor (sum_j k_ij gamma_j)
             rate.latent <- inprod(pop[], LATENT[]) / sum(pop[])

          # Percentage of cases attributed to each source:
             Total <- rate.base + rate.no2 + rate.latent
         
# % cases attributed to baseline risk
            
PC.base <- rate.base / Total * 100
         # % cases attributed to NO2 exposure
             PC.no2 <- rate.no2 / Total * 100
         # % cases attributed to latent spatial factors
             PC.latent <- rate.latent / Total * 100

          }

Data     (Click to open)

Inits for to chain 1     Inits for to chain 2     (Click to open)

R / Splus functions click here to open R  Code

Results

A 2000 iteration burn-in followed by a further 10000 updates gave the following results
   


[huddersfield1]

These differ slightly from those in Best et al (2000b) Table 22.2 since the data in this example have been randomly perturbed for confidentiality. However, the picture is broadly similar, with 1.84 (95% interval 0.046 to 4.2) cases per 100 population attributed to NO2 exposure, 7.75 (95% interval 4.2 to 10.0) cases per 100 population attributed to latent spatially varying risk factors and the remaining 0.39 (95% iterval 0.0019 to 3.4) cases per 100 population attributed to the (spatially constant) baseline risk.

Maps showing the posterior mean rate (per 100 population) of frequent cough attributed to the latent risk factors (cf Fig 22.6b in Best et al, 2000b) and the overall rate (per 100 population) of frequent cough in each area in the study region (cf Fig 22.7b in Best et al, 2000b) are show below. The relevant map file is available from the GeoBUGS 1.2 map tool and is called
Huddersfield_750m_grid.


[huddersfield2]

[huddersfield3]