In this workshop we will use a simplified version of the northern bobwhite case study from Oedekoven et al. (2014) to illustrate the methods presented in this paper. It has been simplified, by reducing both the data included and the model selection options, so that you can work through the exercises in the allocated time. Please note that you are also provided with the case study that accompanies our new distance sampling book (Buckland et al. 2015), which retains more of the originally analyzed data, covers a wider range of model choices and adds a two-stage maximum likelihood approach for comparison.
The practical part of this workshop is set up as follows: we begin by defining the integrated likelihood. Then we briefly describe the data. To introduce the Metropolis Hastings updating algorithm, we start with a simple detection function model, focusing on the likelihood component pertaining to the detection function. In the next part of the practical we use the integrated likelihood to fit both a simple detection function and a simple count model to the data and obtain all parameter estimates simultaneously. In the third part, we set up a simple reversible jump algorithm. Please note that we only use 20000 iterations for each of the exercises due to time constraints. For other analyses we recommend using a much larger number to ensure the algorithm has converged.
As you have seen in the lecture, there are various options available within distance sampling for setting up an integrated likelihood. Here we will use the following options (see also lecture slides 22: Bobwhites - exact distances and 23: Integrated likelihood):
analysing exact distance data (i.e. distances were not collected in distance intervals)
conditional likelihood (i.e. conditional on the number of animals being detected)
mixed effect model with Poisson errors for modelling the counts
effective area for adjusting observed counts for imperfect detections
Using this scenario, the integrated likelihood is given by: \[ {\mathcal L}_{n,y} = {\mathcal L}_n\times{\mathcal L}_{y}~. \nonumber \tag{1} \]
To avoid numerical problems, we use the log-likelihood which is given by (note: we use ln for natural log): \[ \ell_{n,y} = \ln\left({\mathcal L}_{n,y}\right) = \ln\left({\mathcal L}_n\right) + \ln\left({\mathcal L}_{y}\right)~. \nonumber \tag{2} \]
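To see why the log scale avoids numerical problems, consider a product of many small density values: it underflows to zero in double precision, while the corresponding sum of logs remains finite. A quick illustration in R (the values are arbitrary):

```r
# a product of many small densities underflows to 0 in double precision
f.vals <- rep(1e-3, 500)
prod(f.vals)      # underflows to 0 (the true value is 1e-1500)
sum(log(f.vals))  # remains finite on the log scale
```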
We use the case study of northern bobwhite quail coveys that forms the case study in Oedekoven et al. (2014), however, limiting the data to a single state (MS) for simplicity. The covey data stem from a designed experiment where the interest was in determining whether a conservation measure - planting herbaceous buffers around agricultural fields - had the desired effect (Figure 1). For this purpose, pairs of survey points were set up in several states in the US, one point on a buffered treatment field, one point on an unbuffered control field. Pairs of points were surveyed at least once but up to three times per year in autumn from 2006 to 2008.
Figure 1. Buffered agricultural field part of the National CP-33 Monitoring Program coordinated by the Mississippi State University (left), survey points (middle) and northern bobwhite (right).
As the interest was in determining whether covey densities were higher on treatment fields, we use count models where the observed counts are modelled as a function of covariates. Here, the parameter of interest is the coefficient for the Type covariate for the level “TREAT”. A coefficient that is significantly greater than zero would indicate that the buffers had the desired effect, i.e., increased covey densities. Due to the repeated counts and the spatial closeness of points within a pair, we include a random effect for each site (where site refers to a pair of points) in the count model (see below for details).
To complete this case study, we need to extract the contents of the zip file “CoveyCaseStudy_IBS.zip” into a designated folder on our personal computers. We begin the case study by opening a new R workspace (either in R or in RStudio) and changing the working directory to that folder, by clicking on File – Change dir… from the file menu in R or by choosing Session – Set Working Directory – Choose Directory in RStudio. Then, we use the following code to import the data:
covey<-read.csv("covey.csv")
In addition to the data, we need to load the functions necessary for the case study:
source("Functions.r")
We begin by ensuring that the data have the correct format required for using the likelihood functions in the following sections. For this purpose, we first look at the raw data contained in the covey data frame. We use the function head to print the first 6 records.
head(covey)
## gr.id smp.id vis.id Year State Type JDctr det.id distance size
## 1 263 487 2006_1 2006 MS CONTROL 20 NA NA NA
## 2 263 488 2006_1 2006 MS TREAT 20 NA NA NA
## 3 263 488 2007_1 2007 MS TREAT 12 NA NA NA
## 4 263 488 2008_1 2008 MS TREAT -11 NA NA NA
## 5 264 489 2006_1 2006 MS CONTROL 20 NA NA NA
## 6 264 489 2007_1 2007 MS CONTROL 0 NA NA NA
The columns represent the identifiers for site (gr.id), point (smp.id), visit (vis.id) and detection (det.id), as well as the distance to the detection (distance) and the group size of the detection (size). These are the columns that are required for using the data formatting and likelihood functions below. Note that for the covey data, group sizes were unknown and we model numbers of detections rather than numbers of individuals; however, the size column is required by some of the functions we will use. In addition to the required columns, we have covariates State, Type and JDctr, which were ‘measured’ for each visit to a point. Covariate JDctr represents the Julian date centred around its mean; hence, some values are negative.
Each record represents either a detection or a visit without detection in the case that no coveys were detected during the respective visit to the point. The data are already truncated at 500m, i.e., the largest distance included in the covey data is 500m or less. We can check this using the function max.
max(covey$distance,na.rm=T)
## [1] 490.72
We use the function create.data to turn the raw covey data into a format that allows us to fit the detection and count models below. For the detection model, we need a data frame (which we call dis.object) which contains all the records of detections within the truncation distance including their distances and covariates that may be included in the detection model. For the count model, we need a data frame (which we call glmm.data) where each record represents a visit to a point and the total number of detections within the truncation distance during that visit to the point is recorded. The data frame glmm.data contains some columns that are required such as a unique identifier for each visit to a point (smp.vis), smp.id, gr.id and vis.id from before, as well as the columns detections and individuals (which give the total number of detections and detected individuals for each visit to a point, respectively) and the covariates that might be used in the count model, Type, JDctr and State.
Both the dis.object and glmm.data data frames are created by the function create.data and combined in a list containing additional information. The additional information includes the type of sampler (points or lines), the units of distance measurements, whether we are using binned distance data, which covariates may be used in the detection and count models and which of these should be regarded as a factor covariate and what type of analysis we would like to use the data for. The argument conditional refers to whether we use the conditional likelihood (FALSE refers to the methods using equations 9.25 and 9.27 of (Buckland et al. 2015) and only applies to binned distance data).
# creating a data object which can be used with the functions below
covey.data<-create.data(covey,500,sampler="points","m",binned=F,
                        dis.cov=c("Type"),count.cov=c("Type","JDctr"),
                        dis.factor.cov = c("Type"), count.factor.cov = c("Type"),
                        conditional = T)
The object covey.data is also provided in the covey case study zip file and can be loaded directly into the workspace using:
# loading the data object covey.data instead of creating it with create.data()
load("covey.data.rdata")
The object covey.data is a list of 12 items containing the information described in the preceding paragraphs. To have a look at what covey.data contains you can use:
# to have a look at what covey.data contains
str(covey.data, max.level=1)
## List of 12
## $ w : num 500
## $ sampler : chr "points"
## $ dis.unit : chr "m"
## $ sampler.unit : chr "m"
## $ binned : logi FALSE
## $ dis.cov : chr "Type"
## $ dis.cov.fac : logi [1(1d)] TRUE
## $ count.cov : chr [1:2] "Type" "JDctr"
## $ count.cov.fac: logi [1:2(1d)] TRUE FALSE
## $ conditional : logi TRUE
## $ dis.object :'data.frame': 218 obs. of 11 variables:
## $ glmm.data :'data.frame': 236 obs. of 8 variables:
The two essential data frames in covey.data are covey.data$dis.object and covey.data$glmm.data which contain the distance data and count data respectively:
# a quick peek at the data
head(covey.data$dis.object)
## gr.id smp.id vis.id Year State Type JDctr det.id distance size
## 7 264 489 2008_1 2008 MS CONTROL -11 606 419.73 1
## 10 264 490 2008_1 2008 MS TREAT -11 607 186.37 1
## 14 265 492 2006_1 2006 MS TREAT 19 608 261.88 1
## 19 266 494 2007_1 2007 MS TREAT -1 609 142.10 1
## 20 266 494 2008_1 2008 MS TREAT -12 610 292.61 1
## 21 267 495 2006_1 2006 MS CONTROL -11 611 231.45 1
## smp.vis
## 7 264_489_2008_1
## 10 264_490_2008_1
## 14 265_492_2006_1
## 19 266_494_2007_1
## 20 266_494_2008_1
## 21 267_495_2006_1
head(covey.data$glmm.data)
## smp.vis smp.id gr.id vis.id detections individuals Type JDctr
## 1 263_487_2006_1 487 263 2006_1 0 0 CONTROL 20
## 2 263_488_2006_1 488 263 2006_1 0 0 TREAT 20
## 3 263_488_2007_1 488 263 2007_1 0 0 TREAT 12
## 4 263_488_2008_1 488 263 2008_1 0 0 TREAT -11
## 5 264_489_2006_1 489 264 2006_1 0 0 CONTROL 20
## 6 264_489_2007_1 489 264 2007_1 0 0 CONTROL 0
To introduce the steps involved for a Metropolis Hastings (MH) update, we use only one component of the integrated likelihood (see equations 1 and 2), the component for the detection model \({\mathcal L}_{y}\).
Setting up a Metropolis Hastings (MH) updating algorithm involves several steps. For this exercise these include:
Defining the likelihood function and parameters
Defining the priors for the parameters
Defining the proposal distributions for updating parameters in the MH algorithm
Initial values for the parameters
Setting up objects for storing the parameter values during the MH algorithm
Running the MH updating algorithm
Pilot tuning the MH updating algorithm
Inference: obtaining summary statistics of the posterior distributions of parameters and the effective area
In this section, we provide the functions and R code necessary to implement these steps. We begin with the theory for this algorithm.
We use a single-update random walk MH algorithm with a normal proposal density, where we cycle through each parameter in the likelihood. During each iteration of the MCMC algorithm, we update each of the parameters by proposing to move to a new state and accepting this move with some probability. To update, say, parameter \(\sigma_{hr}\) (from the hazard-rate detection model with two parameters: scale \(\sigma_{hr}\) and shape \(\tau\)) at iteration \(t\) with current value \(\sigma^t_{hr}\), we propose to move to a new state, \(\sigma'_{hr}\), with \(\sigma'_{hr} \sim N\left(\sigma^t_{hr},\sigma^2_{\sigma_{hr}}\right)\) (Hastings 1970; Davison 2003), where \(\sigma^2_{\sigma_{hr}}\) is the proposal variance for \(\sigma_{hr}\). This newly proposed state is accepted as the new state with probability \(\alpha(\sigma'_{hr}|\sigma_{hr}^t)\) given by: \[ \alpha(\sigma'_{hr}|\sigma_{hr}^{t})=\min\left(1,\frac{{\mathcal L}_{y}(\sigma'_{hr},\tau^t)p(\sigma'_{hr})q(\sigma_{hr}^{t}|\sigma'_{hr})}{{\mathcal L}_{y}(\sigma_{hr}^{t},\tau^t)p(\sigma_{hr}^{t})q(\sigma'_{hr}|\sigma_{hr}^{t})} \right)~, \tag{3} \] where \({\mathcal L}_{y}(\sigma'_{hr},\tau^t)\) is the likelihood calculated using the newly proposed value \(\sigma'_{hr}\) and the current value \(\tau^t\); \(p(\sigma'_{hr})\) is the prior distribution for \(\sigma_{hr}\) evaluated at \(\sigma_{hr} = \sigma'_{hr}\); and \(q(\sigma'_{hr}|\sigma_{hr}^{t})\) denotes the proposal density of the newly proposed state \(\sigma'_{hr}\) given that the current state is \(\sigma_{hr}^{t}\). We note that the terms \(q(\sigma_{hr}^{t}|\sigma'_{hr})\) and \(q(\sigma'_{hr}|\sigma_{hr}^{t})\) cancel in the acceptance probability as we use a symmetric (normal) proposal distribution. Equation 3 is equivalent to equation 9.39 from Section 9.4.1 of Buckland et al. (2015).
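The acceptance step in equation 3 can be sketched in a few lines of R. This is a schematic only (the names mh.update and log.post are placeholders, not part of the case study code): on the log scale the ratio becomes a difference, and the symmetric proposal densities are simply omitted because they cancel.

```r
# one random-walk MH update for a single parameter (schematic)
# log.post(theta): log-likelihood plus log-prior; q.sd: proposal standard deviation
mh.update <- function(theta, log.post, q.sd) {
  theta.new <- rnorm(1, mean = theta, sd = q.sd)  # symmetric normal proposal
  log.A <- log.post(theta.new) - log.post(theta)  # log of the ratio in equation 3
  # accept with probability min(1, exp(log.A))
  if (log(runif(1)) <= log.A) theta.new else theta
}
```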
In the following we describe each of the terms required to calculate the acceptance probability, i.e., the likelihood, the prior distributions and the proposal distributions.
The function covey.hr.detmodel.log.lik calculates the log-likelihood given a set of values for the parameters \(\sigma_{hr}, \tau\) for a hazard-rate detection model fitted to the distance data of the covey data. It has two arguments: p and datax (where p contains the parameter values in the order \(\sigma_{hr}, \tau\) and datax = covey.data). This function expects certain elements in the data; hence, it is essential that the covey data is in the format described above, i.e., a data object created with the create.data function.
The code in covey.hr.detmodel.log.lik can be divided into two parts: assigning the parameter values for argument p and the likelihood component for the detection function: \[ {\mathcal L}_{y} = \prod_{e=1}^n{f(y_e)} \nonumber \tag{4} \]
where, for our case study of point transect data, the \(y_e\) represent the radial distances from the point to the eth detection (for the \(e = 1,2,3,...,n\) detections) and \(f(y)\) is the probability density function (pdf) of the observed distances. Here, we assume that the observed distances \(y_e\) are independently and identically distributed with \(y_e \sim f(y)\). We first review the components of \(f(y)\) and then look at the code for this function in detail.
The pdf \(f(y)\) is generally composed of two other functions, the detection function \(g(y)\) and the function \(\pi(y)\) describing the expected distribution of animals with respect to the point (or the line in the case of line transects). For \(g(y)\), we focus on the hazard-rate model, as in preliminary analyses (fitting detection functions to the distance data using program Distance) we found that this function provided a better fit to the covey distance data than the half-normal. The hazard-rate model contains the shape parameter \(\tau\) and the scale parameter \(\sigma_{hr}\) and is given by: \[ g(y_e) = 1-\exp\left[-\left(y_e/\sigma_{hr}\right)^{-\tau}\right],~~0\leq y_e \leq w~, \tag{5} \] where \(w\) is the truncation distance.
For point transects, the function \(\pi(y)\) describing the expected distribution of animals with respect to the point is given by: \[ \pi(y) = \frac{2\pi y}{\pi w^2} = \frac{2y}{w^2}~. \nonumber \]
For point transects, the function \(f(y)\) is given by:
\[ f(y) = \frac{\pi(y)\, g(y)}{\int_0^w{\pi(y)\, g(y)\,dy}}=\frac{(2\pi y/\pi w^2)\, g(y)}{\int_0^w{(2\pi y/\pi w^2)\, g(y)\,dy}}=\frac{y\, g(y)}{\int_0^w{y\, g(y)\,dy}}~, \nonumber \] and is evaluated at each \(y_e\) for \(e=1,...,n\) to evaluate the log-likelihood of equation 4 (see below). As the constant \(2\pi/(\pi w^2)\) of \(\pi(y)\) appears in both the numerator and the denominator, we can omit it from this equation, which speeds up the MH algorithm. Using the hazard-rate function from equation 5 for \(g(y)\), we can use the following R code to calculate \(f(y_e)\).
# this function is loaded into your workspace using the source command above
f.haz.function.pt<-function(dis,sigma,shape) {
  f <- dis*(1-exp(-(dis/sigma)^(-shape)))
  f
}
Here, the argument dis represents the \(y_e\) and sigma and shape represent the parameters \(\sigma_{hr}\) and \(\tau\) from equation 5.
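To see the normalisation in action: dividing \(y\,g(y)\) by its integral over \((0, w)\) yields a proper pdf. A short check, using illustrative parameter values (\(\sigma_{hr} = 130\), \(\tau = 2.5\)):

```r
# normalising y*g(y) to a pdf over (0, w); parameter values are illustrative
w <- 500
norm.const <- integrate(f.haz.function.pt, 0, w, sigma = 130, shape = 2.5)$value
f.y <- function(y) f.haz.function.pt(y, sigma = 130, shape = 2.5) / norm.const
f.y(c(50, 150, 300))        # pdf evaluated at three radial distances
integrate(f.y, 0, w)$value  # the pdf integrates to one
```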
For illustrating the MH algorithm, we use a simple hazard-rate detection model. Here, it is essential to include all parameters in argument p and to assign the parameters correctly in the likelihood calculations shown in the following sections.
The covey detection data consists of exact distances which are stored in the data frame covey.data$dis.object in the column distance. The truncation distance \(w\) is stored in covey.data$w.
The following R code shows the function covey.hr.detmodel.log.lik which calculates the log-likelihood for the detection model \(\ln\left({\mathcal L}_{y}\right)\). Note that the truncation distance covey.data$w was set to \(500\) when formatting the data using the function create.data.
# This function is loaded into your workspace using the source command above
covey.hr.detmodel.log.lik<-function(p, datax){
  ## Part 1: assigning the parameter values p for the detection model
  scale<-exp(p[1]) # the scale parameter
  shape<-exp(p[2]) # the shape parameter
  ## Part 2: the likelihood component pertaining to the detection model
  # the normalising constant: the integral of y*g(y) over (0, w)
  norm.const<-integrate(f.haz.function.pt,0,datax$w,scale,shape)$value
  # calculating f(y) for each observed distance
  le<-nrow(datax$dis.object)
  fe<-numeric(le)
  for (e in 1:le){
    fe[e]<-f.haz.function.pt(datax$dis.object$distance[e],scale,shape)/norm.const
  }
  # the sum of the log(f(y))
  log.e<-sum(log(fe))
  return(log.e)
}
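Assuming covey.data has been created or loaded as described above, the function can be checked directly. The parameter values below are passed on the log scale, as the function exponentiates them internally:

```r
# log-likelihood of the detection model at scale = 130, shape = 2.5
covey.hr.detmodel.log.lik(c(log(130), log(2.5)), covey.data)
```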
In addition to defining the likelihood we need to set up the priors for our model parameters. If prior knowledge on the parameters exists, it may be incorporated via the prior distributions. If no prior knowledge on the parameters exists, uninformative priors, such as uniform, may be placed on all parameters. As we have no prior information on any of the parameters, we use uniform priors. The important part to consider here is that these need to be wide enough for the chain to move freely within the parameter space without hitting any of the boundaries. This may be investigated with pilot tuning (see below).
On the other hand, when using RJMCMC methods, care needs to be taken to not make the boundaries too wide as that might prevent the chain from moving freely across the model space (see RJMCMC section below for more details).
For this exercise we have two parameters in the detection model (see Section Part 1: Assigning the parameter values p above). We use the following R code to set up the lower and upper boundaries for the uniform distributions.
# lower and upper limits are given on the un-transformed scale for both parameters
lolim<-c(50,1)
uplim<-c(1000,10)
We use the following R function to calculate the log of the uniform prior density for each of the parameters. The argument coef is the input for the coefficient values and the arguments lo and up refer to the parameter-specific lower and upper limits of the uniform distribution. In contrast to log(dunif(…)), which returns -Inf when the value lies outside the boundaries and may cause the algorithm to crash, l.prior returns a large negative value for the log of the prior density in that case. This still effectively prevents the acceptance of a newly proposed parameter value that lies outside the boundaries.
# This function is loaded into your workspace using the source command above
l.prior<-function(coef,lo,up){
  l.u.int<-log(dunif(coef,lo,up))
  if(any(abs(l.u.int)==Inf)) l.u.int<- -100000
  sum(l.u.int)
}
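A quick check of this behaviour (the values are illustrative): inside the boundaries the function returns \(\ln\{1/(up-lo)\}\); outside, it returns the large negative constant instead of -Inf.

```r
l.prior(130, 50, 1000)   # log(1/950): inside the boundaries
l.prior(2000, 50, 1000)  # -1e+05: outside the boundaries (instead of -Inf)
```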
In this section, we set up the proposal distributions for updating the parameters during the MH algorithm. We use normal proposal distributions where the mean is the current value of the parameter and the standard deviation is defined for each parameter. The proposal distributions have two purposes. They are used for updating the current parameter values: if, for example, at iteration \(t\) we wish to update parameter \(\sigma_{hr}\) with current value \(\sigma_{hr}^t\), we draw a new random sample \(\sigma'_{hr} \sim N\left(\sigma^t_{hr},\sigma^2_{\sigma_{hr}}\right)\) where \(\sigma_{\sigma_{hr}}\) is the proposal standard deviation for parameter \(\sigma_{hr}\).
The proposal distributions are also used to calculate the proposal densities for equation 3 (see above). However, as normal proposal distributions are symmetric, the proposal density \(q(\sigma_{hr}^{t}|\sigma'_{hr})\) equals \(q(\sigma'_{hr}|\sigma_{hr}^{t})\) and, hence, these cancel when calculating the acceptance probability for the proposal to update parameter \(\sigma_{hr}^t\) in equation 3 above. We use pilot-tuning to define the parameter-specific proposal standard deviations where we aim to obtain appropriate acceptance rates, i.e., allowing the chain to move freely in the parameter space (Gelman, Roberts, and Gilks 1996). Increasing the variance on average results in decreasing acceptance probabilities and vice versa. See section on pilot-tuning below.
############# proposal distributions
# Standard deviations for normal proposal distributions for detection function parameters
q.sd<-c(3.5,0.1)
In this step, we set up the initial values for each of the two model parameters. The initial parameter values are stored in the array curparam. This array (in combination with the corresponding array newparam) is used to update the current parameter values throughout the chain. (See the Sections on implementing the MH and RJ algorithms below). The updated parameter values are added to a data frame at the end of each iteration of the chain (see below).
## initial values for the detection function parameters
# scale and shape (both enter the model on the log scale)
sig.0<-log(130) # scale parameter
sha.0<-log(2.5) # shape parameter
# combining initial values in a single array for input into the likelihood function
curparam<-c(sig.0,sha.0)
In this section we set up the data frames which will store the values for the parameters and the random effect coefficients for each iteration of the chain of the MH updating algorithm. These are large objects due to the length of the chain. As the chain will likely take a long time to complete, it is advisable to save these objects periodically while the chain is running. We deal with this when we set up the MH algorithm.
# number of iterations for the MC chain
nt<-20000
# param.mat holds the values for all parameters throughout the chain
param.mat.ex1<-matrix(NA,nt,length(curparam))
param.mat.ex1<-data.frame(param.mat.ex1)
names(param.mat.ex1)<-c("sig","sha")
param.mat.ex1[1,]<-curparam
This algorithm involves updating each parameter in p during each of the \(nt\) iterations, using the acceptance probability from equation 3 above. In the section Part 2: Defining the likelihood function above, we introduced the R function covey.hr.detmodel.log.lik; we use this function for updating each of the parameters in argument p. The function l.prior is used to calculate the log of the uniform prior density for each of the parameters.
########################## Metropolis Hastings updating algorithm ############
# nt is the number of iterations and is set above
# Row 1 in param.mat.ex1 contains the initial values, so we start with i = 2
set.seed(1234)
for (i in 2:nt){
  print(i)
  # updating the model parameters
  newparam<-curparam
  for (b in 1:2){
    # proposing to update the bth parameter
    u<-rnorm(1,0,q.sd[b])
    # the scale and shape parameters are on the log scale in curparam and newparam;
    # the boundaries for the priors and the proposal densities are not on the log scale
    newparam[b]<-log(exp(curparam[b])+u)
    new.l.prior<-l.prior(exp(newparam[b]),lolim[b],uplim[b])
    cur.l.prior<-l.prior(exp(curparam[b]),lolim[b],uplim[b])
    num<-covey.hr.detmodel.log.lik(newparam,covey.data) + new.l.prior
    den<-covey.hr.detmodel.log.lik(curparam,covey.data) + cur.l.prior
    A<-min(1,exp(num-den))
    V<-runif(1)
    if(V<=A) curparam[b]<-newparam[b] else newparam[b]<-curparam[b]
  }
  # storing the updated parameter values
  param.mat.ex1[i,]<-curparam
  # saving the parameter matrix every 5000 iterations
  if(i %% 5000 == 0){
    save(param.mat.ex1,file='param.mat.ex1.RData')
  }
}
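Once the algorithm has run, the realised acceptance rate for each parameter can be estimated from the stored chain: a parameter value changes between consecutive iterations only when a proposal was accepted. This is a convenient diagnostic for pilot tuning; as a rough rule of thumb (not a hard requirement), rates far outside roughly 0.2 to 0.4 suggest adjusting q.sd.

```r
# approximate acceptance rates from a completed chain:
# a value changes between iterations only if the proposal was accepted
acc.rate <- apply(param.mat.ex1, 2, function(x) mean(diff(x) != 0))
round(acc.rate, 2)
```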
For an MH updating algorithm, pilot tuning involves ensuring that the parameter space is explored freely for each parameter. This can be done using trace plots which may be produced at any iteration \(i\) with the following R code.
# The following code will only work if the MH algorithm has run for several iterations.
# We recommend running at least 100 iterations.
# producing trace plots for all parameters for iterations 1:i
par(mfrow=c(1,2))
for (b in 1:2){
  plot(param.mat.ex1[1:i,b],t='l',xlab="Iteration",ylab="")
  if(b == 1) title(main = expression("Parameter " * sigma[hr]))
  if(b == 2) title(main = expression("Parameter " * tau))
}
For our case study, we use normal proposal distributions which use the current value of the respective parameter as the mean and have parameter-specific proposal variances (defined in the section on proposal distributions above). Pilot tuning may involve adjusting the standard deviations of the proposal distributions (defined in the array q.sd above).
We show two sets of trace plots for the same parameters in p: after pilot tuning (Figure 2), showing the desired pattern of quick up-and-down movements, and before pilot tuning (Figure 3), showing the undesired pattern of a ‘skyscraper landscape’.
Figure 2. Trace plots for iterations 1:1000 for parameters \(\sigma_{hr}\) and \(\tau\) after pilot tuning (i.e. using the proposal distributions from above q.sd=c(3.5,0.1)). Note that shown parameter values are on the log-scale.
Figure 3. Trace plots for iterations 1:550 for parameters \(\sigma_{hr}\) and \(\tau\) before pilot tuning (i.e. using proposal distributions q.sd2=c(4.55, 2)). Note that shown parameter values are on the log-scale.
In addition, we need to ensure that the potential values that each parameter in p can take are not limited by the boundaries of the uniform prior distributions. This can also be checked using the trace plots. An example of what this might look like is shown in Figure 4.
Figure 4. Trace plot for parameter 1 (\(\sigma_{hr}\)) before pilot tuning when the upper boundary of the uniform prior for this parameter was set too low.
The results from the MH algorithm can also be loaded into the workspace using:
load("param.mat.ex1.rdata")
First we check whether the algorithm returned the desired results, i.e. that the chain converged and that the parameter space was explored freely. This can be confirmed using trace plots for the parameters:
# producing trace plots for all parameters for iterations 1:nt
par(mfrow=c(1,2))
for (b in 1:2){
  plot(param.mat.ex1[1:nt,b],t='l',xlab="Iteration",ylab="")
  if(b == 1) title(main = expression("Parameter " * sigma[hr]))
  if(b == 2) title(main = expression("Parameter " * tau))
}
Figure 5. Trace plots for iterations 1:20000 for parameters \(\sigma_{hr}\) and \(\tau\). Note that shown parameter values are on the log-scale.
We draw inference on parameters by obtaining summary statistics of the marginal posterior distributions. For this purpose, we use the parameter values stored in param.mat.ex1 after completing the MH algorithm above for all \(nt = 20000\) iterations.
Using param.mat.ex1, we calculate the mean, standard deviation and central 95% credible interval for each parameter. For this exercise, we treat the first 999 iterations as the burn-in phase and omit them. Note that larger numbers are usually recommended for both the burn-in and the total number of iterations. These summary statistics may be obtained using the following R code:
nt=20000
# means
round(apply(param.mat.ex1[1000:nt,1:2],2,mean),4)
## sig sha
## 5.6621 1.3220
# standard deviations
round(apply(param.mat.ex1[1000:nt,1:2],2,sd),4)
## sig sha
## 0.0758 0.1744
# 95% credible intervals
round(apply(param.mat.ex1[1000:nt,1:2],2,quantile,probs=c(0.025,0.975)),4)
## sig sha
## 2.5% 5.5052 0.9706
## 97.5% 5.7954 1.6286
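If the coda package is installed (optional; not required for this practical), the effective sample size gives a sense of how much autocorrelation in the chain reduces the information in the post-burn-in draws:

```r
# effective sample size per parameter, accounting for autocorrelation
library(coda)
effectiveSize(as.mcmc(param.mat.ex1[1000:nt, 1:2]))
```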
We can plot the distribution for each of the parameters using the following R code:
par(mfrow=c(1,2))
for (b in 1:2){
  # omitting the burn-in (the first 999 iterations)
  plot(density(param.mat.ex1[1000:nt,b]),t='l',main="",ylab="Density",xlab="Parameter value")
  if(b == 1) title(main = expression("Parameter " * sigma[hr]))
  if(b == 2) title(main = expression("Parameter " * tau))
}
Figure 6. Posterior distribution for detection function parameters \(\sigma_{hr}\) and \(\tau\).
We are more interested in the effective area rather than the parameter values themselves (here, the effective area per point). Hence, we use the parameter values from param.mat.ex1 to estimate the posterior distribution of the effective area. Note that for this example, where we use a detection function without covariates or stratification, we obtain one estimate for the effective area for each iteration of the MCMC.
# effective area per point: 2*pi times the integral of y*g(y) over (0, w)
efa.ex1<-array(NA,nt)
for (i in 1:nt){
  efa.ex1[i]<-integrate(f.haz.function.pt,0,covey.data$w,
                        exp(param.mat.ex1[i,1]),exp(param.mat.ex1[i,2]))$value*2*pi
}
We can plot the distribution and traceplot of the effective area using:
par(mfrow=c(1,2))
plot(density(efa.ex1[1000:nt]/(1000*1000)),main="Effective Area",xlab=expression(km^2))
plot(efa.ex1[1000:nt]/(1000*1000),main="Effective Area",ylab=expression(km^2),xlab="Iteration",t='l')
To obtain summary statistics for the effective area we use:
# means
round(mean(efa.ex1[1000:nt]/(1000*1000)),4)
## [1] 0.3857
# standard deviations
round(sd(efa.ex1[1000:nt]/(1000*1000)),4)
## [1] 0.0331
# 95% credible intervals
round(quantile(efa.ex1[1000:nt]/(1000*1000),probs=c(0.025,0.975)),4)
## 2.5% 97.5%
## 0.3224 0.4529
We can use the parameter estimates to produce detection function plots using the following code:
# The hazard-rate detection function
g.haz.function.pt<-function(dis,sigma,shape) {
  g <- 1-exp(-(dis/sigma)^(-shape))
  g
}
par(mfrow=c(1,2))
hist(covey.data$dis.object$distance,freq = F,xlab="m",main="PDF of observed distances",ylab="f(y)")
scale = mean(exp(param.mat.ex1[1000:nt,1]))
shape = mean(exp(param.mat.ex1[1000:nt,2]))
norm.const<-integrate(f.haz.function.pt,0,covey.data$w,scale,shape)$value
lines(f.haz.function.pt(seq(0,500,1),scale,shape)/norm.const~seq(0,500,1),col=2,lwd=2)
plot(g.haz.function.pt(seq(0,500,1),scale,shape)~seq(0,500,1),type='l',xlab="m",ylab="g(y)",main="Detection function")
Figure 7. Histogram of observed distances with probability density function (left) and detection function (right). For both functions, we used posterior means of the parameter values.
See if you understand the R code provided above and can get the code to run on your laptop. Note, you do not need to complete all 20000 iterations in order to get the summary statistics as the data frame containing the results param.mat.ex1 can be loaded into the workspace.
Replace the hazard-rate detection function with a half-normal. The half-normal detection function is given by: \[ g(y) = \exp\left(\frac{-y^2}{2\sigma_{hn}^2}\right) \nonumber \] and has one parameter, \(\sigma_{hn}\). Write the R code for \(f(y) \propto y\,g(y)\) with the half-normal detection model, equivalent to function f.haz.function.pt above, and call it, e.g., f.hn.function.pt.
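Try writing the function yourself first; for reference, one possible implementation, mirroring the structure of f.haz.function.pt, is:

```r
# y * g(y) for the half-normal detection function (one possible solution)
f.hn.function.pt <- function(dis, sigma) {
  dis * exp(-dis^2 / (2 * sigma^2))
}
```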
By modifying function covey.hr.detmodel.log.lik, write the likelihood function covey.hn.detmodel.log.lik to reflect that we now want to fit a half-normal instead of a hazard-rate detection function. Define the steps for an MH algorithm for fitting a half-normal detection function to the covey data.
Define the priors, proposal distributions and starting value for model parameters including \(\sigma_{hn}\). Set up a data frame param.mat.ex1.hn which stores the parameter values for \(nt\)=20000 iterations.
Using the MH algorithm from above as a starting point, write your MH algorithm for a half-normal detection model.
Using the methods from above for estimating the posterior distribution of the effective area, what are your mean, standard deviation and 95% credible interval for the effective area from your algorithm using the half-normal detection model?
We expand on the methods we learned in exercise 1 and will now fit both models, the detection and count models, in a single analysis. Here we are going to use the log-likelihood from equation 2 above (note that, again, we use \(ln\) to denote the natural log). \[ \ell_{n,y} = \ln\left({\mathcal L}_{n,y}\right) = \ln\left({\mathcal L}_n\right) + \ln\left({\mathcal L}_{y}\right)~. \nonumber \]
The function covey.int.log.lik calculates the log-likelihood given a set of values for parameters p and random effect coefficients raneff for a full model for the covey data. It has three arguments: p, raneff and datax (here datax = covey.data). This function expects certain elements in the data; hence, it is essential that the covey data is in the format described above, i.e., a data object created with the create.data function.
For details on the detection model component \({\mathcal L}_{y}\) see exercise 1. In comparison to function covey.hr.detmodel.log.lik, we add the count model component \({\mathcal L}_n\), including random effects for sites, and present the R code of covey.int.log.lik that corresponds to the respective components. The code in covey.int.log.lik can be divided into three parts: assigning the parameter values p, the likelihood component for the detection function and the likelihood component for the count model, which we discuss individually below.
For the count model, we use a Poisson likelihood where the expected value \(\lambda\) is modelled as a function of covariates \(x_q\) with associated coefficients \(\beta_q\). Due to the repeated counts at the two points of each site, we use the methods described in Oedekoven et al. (2014) and Section 9.2.5.1 of Buckland et al. (2015) and include a random effect \(b_l\) for each site for which we assume normality: \(b_l \sim N(0,\sigma_l^2)\). In the following subscript \(l\) refers to the different sites and subscript \(t\) to the repeated surveys. As each site also consisted of two points, a control and a treated point, we also include a subscript for point \(k\). The expected value is then given by: \[ \lambda_{lkt}=\exp\left(\sum_{q=1}^Q x_{qlkt}\beta_q + b_l + \ln(\nu_{lkt} )\right) \tag{6} \] Note that in comparison to the equations given in Section 9.2.5.1 of the book, we replaced the product of the surveyed area and the average detection probability \(a_{lkt} P_{lkt}\) with the equivalent quantity, the effective area \(\nu_{lkt}\). We can omit the subscripts from the effective area as, for this exercise, we fit a detection function without any covariates or stratification.
Including a random effect in the count model entails that, in addition to the Poisson densities given for the observed counts, we include normal densities for the random effect coefficients in the likelihood (Oedekoven et al. 2014). However, when fitting random effect models using the MH algorithm, the random effect is not integrated out analytically (see also Section 9.4.1 of Buckland et al. 2015). Instead, we use a data augmentation scheme where the individual random effect coefficients are included as parameters (or auxiliary variables) in the model and updated at each iteration of the MCMC algorithm. As a consequence, when calculating the likelihood for a given set of values for the parameters and the random effect coefficients, we can omit the integral for the random effect from the likelihood (as in slide 23 from the lecture) which is now given by: \[ {\mathcal L}_n = \prod_{l=1}^L\left\{\prod_{k=1}^K \prod_{t=1}^{T_k}\frac{\lambda_{lkt}^{n_{lkt}}\exp[-\lambda_{lkt}]}{n_{lkt}!} \times \frac{1}{\sqrt{2\pi \sigma_l^2}}\exp\left[-\frac{b_l^2}{2\sigma_l^2}\right]\right\}~, \nonumber \] where \(L\) is the total number of sites (40 for our case study), \(K\) is the total number of points per site (2 for each site for our case study) and \(T_k\) is the number of repeat visits to the kth point (ranging between 1 and 3 for our case study). Without the integral, we are able to perform a one-to-one log-transformation. Hence, after the log-transformation, our likelihood component \(\ln\left({\mathcal L}_n\right)\) for the count model is defined as: \[ \ln\left({\mathcal L}_n\right) = \sum_{l=1}^L\sum_{k=1}^K \sum_{t=1}^{T_k}\ln\left(\frac{\lambda_{lkt}^{n_{lkt}} \exp[-\lambda_{lkt}]}{n_{lkt}!}\right) + \sum_{l=1}^L\ln \left(\frac{1}{\sqrt{2\pi \sigma_l^2}}\exp\left[-\frac{b_l^2}{2\sigma_l^2}\right]\right)~. \nonumber \tag{7} \] We fit the count model to the count data which are in a data frame stored in covey.data$glmm.data.
Here, each record represents a single visit to a point, and the total number of detections within the truncation distance \(w\) is tallied for each visit in the column covey.data$glmm.data$detections. The part of the covey.int.log.lik function that calculates the count model likelihood component is given under Part 3 within the code. Note that for any observation in covey.data$glmm.data the Poisson term of equation 7 can be evaluated with the function dpois, while for each random effect coefficient the normal density term can be evaluated using the function dnorm.
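As a toy illustration (made-up counts, not the covey data), the Poisson term of equation 7 evaluated with dpois agrees with the log density written out by hand:

```r
# Toy counts and expected values (not the covey data)
n<-c(0,2,1)
lambda<-c(0.5,1.2,0.8)
# first term of equation 7 written out by hand
by.hand<-sum(n*log(lambda) - lambda - lfactorial(n))
# the same term via dpois with log=TRUE
via.dpois<-sum(dpois(n,lambda,log=TRUE))
all.equal(by.hand,via.dpois)
```

Using the log=TRUE argument of dpois and dnorm is also numerically safer than taking log(dpois(…)) when a density underflows to zero.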
For illustrating the MH algorithm, we use a hazard-rate detection model (as in exercise 1) and, for the count model, an intercept and the factor covariate Type plus the random effects for site. The parameters p include the detection function parameters, \(\sigma_{hr}\) and \(\tau\), and the count model parameters: the intercept \(\beta_0\), the coefficient \(\beta_1\) for covariate Type associated with factor level TREAT, and the standard deviation \(\sigma_l\) of the random effects.
The random effects coefficients are stored in the second argument, raneff, of the function covey.int.log.lik while the third argument datax is for the data. As in exercise 1, we use the data object covey.data as this has been formatted appropriately with the function create.data.
# This function is loaded into your workspace using the source command above
covey.int.log.lik<-function(p, raneff, datax){
  ## Part 1: setting up the parameter values p for the detection function and count models
  # the detection model
  scale<-exp(p[1])   # the scale parameter (p[1] is on the log scale)
  shape<-exp(p[2])   # the shape parameter (p[2] is on the log scale)
  # the count model
  int<-p[3]          # the intercept
  typ<-c(0,p[4])     # coefficient for Type level "TREAT" (level "CONTROL" is absorbed in the intercept)
  sd.ran<-exp(p[5])  # the random effect standard deviation (p[5] is on the log scale)
  ## Part 2: the likelihood component pertaining to the detection model
  # calculating f(y) for each observed distance (note that the truncation distance is stored in datax$w)
  le<-nrow(datax$dis.object)
  fe<-numeric(le)
  norm.const<-integrate(f.haz.function.pt,0,datax$w,scale,shape)$value
  for (e in 1:le){
    fe[e]<-f.haz.function.pt(datax$dis.object$distance[e],scale,shape)/norm.const
  }
  # the sum of the log(f(y))
  log.e<-sum(log(fe))
  ## Part 3: the likelihood component pertaining to the count model
  # setting up indices for the factor covariate Type and random effect coefficients
  Type0<-match(datax$glmm.data$Type,sort(unique(datax$glmm.data$Type)))
  gr.id<-sort(unique(datax$glmm.data$gr.id))
  Ran0<-match(datax$glmm.data$gr.id,gr.id)
  # log of effective area (here: same for each visit to a point due to global det model)
  l.efa<-log(norm.const*pi*2)
  # calculate the log-Poisson density for each count
  lambda<-exp(int + typ[Type0] + raneff[Ran0] + l.efa)
  dpois.y<-dpois(datax$glmm.data$detections,lambda)
  logdpois.y<-sum(log(dpois.y))
  # calculate the log of the normal density for each random effect coefficient
  log.dnorm.raneff<-sum(log(dnorm(raneff,0,sd.ran)))
  # adding up the likelihood components
  log.lik<-logdpois.y + log.dnorm.raneff + log.e
  # the function return
  return(log.lik)
}
In addition to defining the likelihood we need to set up the priors for our model parameters. If prior knowledge about the parameters exists, it may be incorporated via the prior distributions. If no prior knowledge exists, uninformative priors, such as uniform, may be placed on all parameters. As we have no prior information on any of the parameters, we use uniform priors. The important part to consider here is that these need to be wide enough for the chain to move freely within the parameter space without hitting any of the boundaries. This may be investigated with pilot tuning (see below).
For exercise 2, we have a total of five parameters in the combined detection and count model. (See Section Assigning the parameter values p above). We use the following R code to set up the lower and upper boundaries for the uniform distributions.
# Note: lower and upper limits are given on the un-transformed scale for all parameters including scale, shape and sd.ran
lolim<-c(50,1,rep(-5,2),0.0001)
uplim<-c(1000,10,rep(5,2),3)
We use the following R function to calculate the log of the uniform prior density for each of the parameters. The argument coef is the input for the coefficient value, and the arguments lo and up allow the input of parameter-specific lower and upper limits for the uniform distribution. If a value lies outside the boundaries of the distribution, l.prior returns a large negative value for the log of the prior density rather than the -Inf that log(dunif(…)) would return. This prevents the acceptance of a newly proposed parameter value outside the boundaries and avoids crashing the algorithm.
# This function is loaded into your workspace using the source command above
l.prior<-function(coef,lo,up){
  l.u.int<-log(dunif(coef,lo,up))
  if(any(abs(l.u.int)==Inf)) l.u.int<- -100000
  sum(l.u.int)
}
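A quick check of this behaviour, using the limits defined above for the random effect standard deviation:

```r
# l.prior as defined above
l.prior<-function(coef,lo,up){
  l.u.int<-log(dunif(coef,lo,up))
  if(any(abs(l.u.int)==Inf)) l.u.int<- -100000
  sum(l.u.int)
}
# inside the boundaries: the usual log uniform density, log(1/(up-lo))
l.prior(0.5,0.0001,3)
# outside the boundaries: a large negative value instead of -Inf
l.prior(5,0.0001,3)   # -100000
```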
In this section, we set up the proposal distributions for updating the parameters during the MH algorithm. We use normal proposal distributions where the mean is the current value of the parameter and the standard deviation is defined for each parameter. The proposal distributions have two purposes. They are used for updating the current parameter values: if, for example, at iteration \(t\) we wish to update parameter \(\beta_0\) with current value \(\beta_0^t\), we draw a new random sample \(\beta'_0 \sim N\left(\beta^t_0,\sigma^2_{\beta_0}\right)\) where \(\sigma_{\beta_0}\) is the proposal standard deviation for parameter \(\beta_0\).
The proposal distributions are also used to calculate the proposal densities for equation 3 (see above). However, as normal proposal distributions are symmetric, the proposal density, e.g. for parameter \(\beta_0\), \(q(\beta_0^{t}|\beta'_0)\) equals \(q(\beta'_0|\beta_0^{t})\) and, hence, these cancel when calculating the acceptance probability for the proposal to update parameter \(\beta_0^t\) in equation 3. We use pilot-tuning to define the parameter-specific proposal standard deviations where we aim to obtain appropriate acceptance rates, i.e., allowing the chain to move freely in the parameter space (Gelman, Roberts, and Gilks 1996). Increasing the variance on average results in decreasing acceptance probabilities and vice versa. See section on pilot-tuning below.
############# proposal distributions
# Standard deviations for normal proposal distributions for all parameters:
# detection function (1:2) and count model (3:5) including random effect standard deviation
q.sd<-c(3.5,0.1,0.02,0.03,0.02)
In this step, we set up the initial values for each of the five model parameters as well as the random effect coefficients. The initial parameter values are stored in the array curparam. This array (in combination with the corresponding array newparam) is used to update the current parameter values throughout the chain. The updated parameter values are stored in a data frame at the end of each iteration of the chain (see Section on implementing the MH algorithm below for more details).
## initial values for the detection function parameters
# scale and shape (enter the model on the log scale)
sig.0<-log(130) # scale parameter
sha.0<-log(2.5) # shape parameter
## initial values for count model parameters
int.0<- -13 # intercept
# type coefficient
typ.0<- 0.4 # level treat
# initial values for random effect standard deviation and coefficients
sd.ran.0<-log(1)
j<-length(unique(covey.data$glmm.data$gr.id)) # number of sites in covey data
set.seed(1234)
raneff<-rnorm(j,0,exp(sd.ran.0))
# combining the initial values in a single array
curparam<-c(sig.0,sha.0,int.0,typ.0,sd.ran.0)
In this section we set up the data frames which will store the values for the parameters and the random effect coefficients for each iteration of the chain of the MH updating algorithm. These are large objects due to the length of the chain and the size of the data frame holding the values for the random effect coefficients, given the number of random effect coefficients. As the chain will likely take a long time to complete, it is advisable to save these objects periodically while the chain is running. We deal with this when we set up the MH algorithm.
# number of iterations for the MC chain
nt<-20000
# param.mat.ex2 holds the values for all parameters throughout the chain
param.mat.ex2<-matrix(NA,nt,length(curparam))
param.mat.ex2<-data.frame(param.mat.ex2)
names(param.mat.ex2)<-c("sig","sha","int","typ","sd.ran")
param.mat.ex2[1,]<-curparam
# raneff.mat.ex2 holds the values for all random effect coefficients throughout the chain
raneff.mat.ex2<-matrix(NA,nt,j)
raneff.mat.ex2<-data.frame(raneff.mat.ex2)
raneff.mat.ex2[1,]<-raneff
This algorithm involves defining a number of iterations \(nt\) and updating each parameter in \(p\) from the full likelihood during each iteration using the acceptance probability from equation 3. In the section Defining the likelihood function above, we learned about the different components of the full likelihood \({\mathcal L}_{n,y}\) and how these are implemented in the R function covey.int.log.lik. We use this function for updating each of the parameters in p. The function l.prior is used to calculate the uniform prior densities for each of the parameters.
Additionally, as we have a random effect in the count model, we need to update each of the random effect coefficients \(b_l\) during each iteration. However, when updating the random effect coefficients, we can take a few shortcuts to make the algorithm more time-efficient. Firstly, we do not calculate prior densities for updating the coefficients (for our models, we do not place priors on the random effect coefficients). Also, as before we use symmetrical (normal) proposal distributions. Hence, when updating for example the random effect coefficient for location \(l = 1\), the acceptance probability from equation 3 reduces to: \[ \alpha(b'_1|b_1^{t})=\min\left(1,\frac{{\mathcal L}_{n,y}(b'_1,p^t,b^t_{-1})}{{\mathcal L}_{n,y}(b_1^{t},p^t,b^t_{-1})} \right)~. \nonumber \]
Furthermore, we no longer need to calculate the full likelihood for updating each random effect coefficient. We may now limit the calculations to those parts that make a difference when subtracting the log-likelihood with the current value of the random effect coefficient \(ln({\mathcal L}_{n,y}(b_1^{t},p^t,b^t_{-1}))\) from the log-likelihood with the updated value of the random effect coefficient \(ln({\mathcal L}_{n,y}(b'_1,p^t,b^t_{-1}))\). Here, we no longer need the likelihood contribution for the detection model (the component log.e from the overall log-likelihood log.lik in covey.int.log.lik above) as this component will remain unaffected when updating any of the \(b_l\). For updating a single random effect coefficient, we need to include the Poisson densities for all counts at the respective site as well as the normal density for this coefficient. Hence, following the example for updating the random effect coefficient \(b_1\) for site \(l = 1\), the following part for the overall likelihood is included in calculating the acceptance probability:
\[ \sum_{k=1}^K \sum_{t=1}^{T_k}\ln\left(\frac{\lambda_{1kt}^{n_{1kt}} \exp[-\lambda_{1kt}]}{n_{1kt}!}\right) + \ln \left(\frac{1}{\sqrt{2\pi \sigma_l^2}}\exp\left[-\frac{b_1^2}{2\sigma_l^2}\right]\right)~. \nonumber \]
This is set up in the function covey.int.log.lik.raneff.update which, with a single call, updates each of the random effect coefficients individually and returns all updated random effect coefficients.
covey.int.log.lik.raneff.update<-function(p,raneff,datax){
  # setting up the parameter values
  scale<-exp(p[1])
  shape<-exp(p[2])
  int<-p[3]
  typ<-c(0,p[4])
  sd.ran<-exp(p[5])
  # log of effective area (here: same for each visit to a point due to global det model)
  norm.const<-integrate(f.haz.function.pt,0,datax$w,scale,shape)$value
  l.efa<-log(norm.const*pi*2)
  # setting up indices for the factor covariate Type and random effect coefficients
  Type0<-match(datax$glmm.data$Type,sort(unique(datax$glmm.data$Type)))
  gr.id<-sort(unique(datax$glmm.data$gr.id))
  Ran0<-match(datax$glmm.data$gr.id,gr.id)
  # calculate the expected count for each visit
  lambda<-exp(int + typ[Type0] + raneff[Ran0] + l.efa)
  # update each random effect coefficient in turn
  for (r in 1:length(gr.id)){
    u<-rnorm(1,0,0.2)  # normal proposal with sd 0.2
    yr<-which(datax$glmm.data$gr.id==gr.id[r])
    dpois.y.new<-dpois(datax$glmm.data$detections[yr],lambda[yr]*exp(u))
    num<-sum(log(dpois.y.new))+log(dnorm(raneff[r]+u,0,sd.ran))
    dpois.y.cur<-dpois(datax$glmm.data$detections[yr],lambda[yr])
    den<-sum(log(dpois.y.cur))+log(dnorm(raneff[r],0,sd.ran))
    A<-min(1,exp(num-den))
    V<-runif(1)
    if(V<=A){raneff[r]<-raneff[r]+u}
  }
  # return updated coefficients
  raneff
}
The following code gives the MH updating algorithm for this example.
########################## Metropolis Hastings updating algorithm ############
# nt is the number of iterations and is set above
# row 1 in param.mat.ex2 and raneff.mat.ex2 contains the initial values, so we start with i = 2
set.seed(1234)
for (i in 2:nt){
  print(i)
  # updating the 5 model parameters
  newparam<-curparam
  for (b in 1:5){
    num<-NA
    den<-NA
    # proposing to update the bth parameter
    u<-rnorm(1,0,q.sd[b])
    # the scale, shape and RE sd are on the log scale in curparam and newparam
    # while the boundaries for the priors are not on the log scale
    if(b %in% c(1,2,5)){
      if((exp(curparam[b])+u)>=lolim[b]){
        newparam[b]<-log(exp(curparam[b])+u)
        new.l.prior<-l.prior(exp(newparam[b]),lolim[b],uplim[b])
        cur.l.prior<-l.prior(exp(curparam[b]),lolim[b],uplim[b])
        num<-covey.int.log.lik(newparam,raneff,covey.data) + new.l.prior
        den<-covey.int.log.lik(curparam,raneff,covey.data) + cur.l.prior
        A<-min(1,exp(num-den))
        V<-runif(1)
        ifelse(V<=A,curparam[b]<-newparam[b],newparam[b]<-curparam[b])
      }
    } else {
      newparam[b]<-curparam[b]+u
      new.l.prior<-l.prior(newparam[b],lolim[b],uplim[b])
      cur.l.prior<-l.prior(curparam[b],lolim[b],uplim[b])
      num<-covey.int.log.lik(newparam,raneff,covey.data) + new.l.prior
      den<-covey.int.log.lik(curparam,raneff,covey.data) + cur.l.prior
      A<-min(1,exp(num-den))
      V<-runif(1)
      ifelse(V<=A,curparam[b]<-newparam[b],newparam[b]<-curparam[b])
    }
  }
  # storing the updated parameter values
  param.mat.ex2[i,]<-curparam
  # updating the random effect coefficients
  raneff<-covey.int.log.lik.raneff.update(curparam, raneff, covey.data)
  # storing the updated random effect coefficients
  raneff.mat.ex2[i,]<-raneff
  # saving the parameter matrices every 5000 iterations
  if(i %% 5000 == 0){
    save(param.mat.ex2,file='param.mat.ex2.RData')
    save(raneff.mat.ex2,file='raneff.mat.ex2.RData')
  }
}# end of i loop
For an MH updating algorithm, pilot tuning involves ensuring that the full parameter space is explored freely for each parameter. This can be done using trace plots which may be produced at any iteration \(i\) with the following R code.
# The following code will only work if the MH algorithm has run for several iterations.
# We recommend running at least 100 iterations.
# producing trace plots for each of the 5 parameters for iterations 1:i
par(mfrow=c(3,2))
for (b in 1:5){
  plot(param.mat.ex2[1:i,b],t='l',xlab="Iteration",main="",ylab="")
  if(b == 1) {title(main = expression("Parameter " * sigma[hr]))}
  if(b == 2) {title(main = expression("Parameter " * tau))}
  if(b == 3) {title(main = expression("Parameter " * beta[0]))}
  if(b == 4) {title(main = expression("Parameter " * beta[1]))}
  if(b == 5) {title(main = expression("Parameter " * sigma[l]))}
}
As in exercise 1, we use normal proposal distributions which use the current value of the respective parameter as the mean and have parameter-specific proposal variances (defined in the section on proposal distributions above). Pilot tuning may involve adjusting the standard deviations of the proposal distributions (defined in the array q.sd above).
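Since a rejected proposal leaves the stored value unchanged, a simple way to monitor acceptance rates during pilot tuning is to count how often consecutive stored values differ. A small sketch (the helper name acc.rate is our own):

```r
# proportion of iterations in which a proposed new value was accepted
acc.rate<-function(x) mean(diff(x)!=0)
# toy chain: 5 transitions, 2 of which changed the value
acc.rate(c(1,1,2,2,2,3))   # 0.4
# for the real chain, after running the MH algorithm for i iterations:
# sapply(param.mat.ex2[1:i,],acc.rate)
```

Acceptance rates that are very low (the chain hardly moves) or very high (the chain takes only tiny steps) both suggest adjusting the corresponding entry of q.sd.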
The results from the MH algorithm can also be loaded into the workspace using:
load("param.mat.ex2.RData")
load("raneff.mat.ex2.RData")
First we check if the algorithm returned the desired results, i.e. that the chain converged and that the parameter space was explored freely. This can be confirmed using trace plots for the parameters:
# producing trace plots for all parameters for iterations 1:nt
par(mfrow=c(3,2))
for (b in 1:5){
  plot(param.mat.ex2[1:nt,b],t='l',xlab="Iteration",ylab="")
  if(b == 1) {title(main = expression("Parameter " * sigma[hr]))}
  if(b == 2) {title(main = expression("Parameter " * tau))}
  if(b == 3) {title(main = expression("Parameter " * beta[0]))}
  if(b == 4) {title(main = expression("Parameter " * beta[1]))}
  if(b == 5) {title(main = expression("Parameter " * sigma[l]))}
}
Figure 8. Trace plots for iterations 1:20000 for all model parameters, \(\sigma_{hr}\), \(\tau\), \(\beta_0\), \(\beta_1\) and \(\sigma_l\). Note that \(\sigma_{hr}\), \(\tau\) and \(\sigma_l\) are shown on the log scale.
We draw inference on parameters by obtaining summary statistics of the marginal posterior distributions. For this purpose, we use the parameter values stored in param.mat.ex2 after completing the MH algorithm above for all \(nt = 20000\) iterations.
Using param.mat.ex2, we calculate the mean, standard deviation and central 95% credible intervals. We treat the first 999 iterations as the burn-in phase and omit them. Note that usually larger numbers are recommended for both the burn-in and the total number of iterations. These summary statistics may be obtained using the following R code:
nt=20000
# means
round(apply(param.mat.ex2[1000:nt,1:5],2,mean),4)
## sig sha int typ sd.ran
## 5.6580 1.3177 -13.6605 0.7793 -0.3108
# standard deviations
round(apply(param.mat.ex2[1000:nt,1:5],2,sd),4)
## sig sha int typ sd.ran
## 0.0855 0.1797 0.1868 0.1418 0.1752
# 95% credible intervals
round(apply(param.mat.ex2[1000:nt,1:5],2,quantile,probs=c(0.025,0.975)),4)
## sig sha int typ sd.ran
## 2.5% 5.4614 0.9524 -14.0075 0.5051 -0.676
## 97.5% 5.8064 1.6513 -13.2553 1.0546 0.012
We can plot the distribution for each of the parameters using the following R code:
par(mfrow=c(3,2))
for (b in 1:5){
  plot(density(param.mat.ex2[1000:nt,b]),t='l',main="",ylab="Density",xlab="Parameter value")
  if(b == 1) {title(main = expression("Parameter " * sigma[hr]))}
  if(b == 2) {title(main = expression("Parameter " * tau))}
  if(b == 3) {title(main = expression("Parameter " * beta[0]))}
  if(b == 4) {title(main = expression("Parameter " * beta[1]))}
  if(b == 5) {title(main = expression("Parameter " * sigma[l]))}
}
Figure 9. Posterior distributions for model parameters from the integrated likelihood.
The parameter of interest for this study was the coefficient for level “TREAT” of the Type covariate in the count model. The mean estimate for this parameter was 0.78 (SD = 0.142, 95% central credible interval = {0.51, 1.05}). Remembering that this parameter entered the model for the expected value \(\lambda_{lkt}\) on the log scale (equation 6 from above: \(\lambda_{lkt}=\exp\left(\sum_{q=1}^Q x_{qlkt}\beta_q + b_l + \ln(\nu_{lkt} )\right)\)), we conclude that covey densities were 120.19% higher on treated fields compared to control fields (100 \(\times\) (2.2 − 1), with exp(0.78) \(\approx\) 2.2).
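Rather than transforming only the posterior mean, the percentage change can be computed for every retained iteration, giving a full posterior distribution for the treatment effect on the density scale. A sketch (the helper pct.change is our own; the commented lines assume param.mat.ex2 and nt from above):

```r
# percentage change in density implied by a log-scale coefficient
pct.change<-function(beta) 100*(exp(beta)-1)
# e.g. a coefficient of log(2) corresponds to a doubling of density:
pct.change(log(2))   # 100
# posterior summary for the Type effect (after omitting the burn-in):
# pct.typ<-pct.change(param.mat.ex2$typ[1000:nt])
# round(c(mean(pct.typ),quantile(pct.typ,probs=c(0.025,0.975))),2)
```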
If we wish to draw inference on plot density, we can treat plot density as a parameter and obtain similar summary statistics as for the parameters p. For this purpose, we use the parameter values from the count model stored in param.mat.ex2 and calculate the expected density for each iteration (excluding the burn-in phase). We note that while \(\lambda_{lkt}\) (equation 6) models the expected counts, \(\exp\left(\sum_{q=1}^Q x_{qlkt}\beta_q + b_l \right)\) from the \(\lambda_{lkt}\) model (i.e., without the log-effective area \(\ln(\nu_{lkt} )\)) models the densities (i.e., number of coveys per \(m^2\)).
We can obtain posterior distributions of plot density for any given combination of covariates (e.g., control vs treatment plots). We may wish, for example, to draw inference on a baseline plot density where the covariate is set to its baseline level (level for Type = “CONTROL”). We further need to add a contribution from the random effect: \(0.5 \times \sigma_l^2\). Hence, the expected baseline density \(D_{bl}\) can be expressed as: \(D_{bl} = \exp(\beta_0 + 0.5 \times \sigma_l^2)\), where \(\beta_0\) is the intercept of the count model and \(\sigma_l\) is the random effect standard deviation. We can calculate \(D_{bl}\) for each iteration either during the MH update by including the appropriate code or after the MH algorithm has completed. Either way, we need to consider that the units for distance measurements were metres (covey.data$dis.unit = m). As the effective area also enters the model for \(\lambda\) in \(m^2\), the density estimates from the count model are in numbers per \(m^2\). We can convert them from number of animals per \(m^2\) to number of animals per \(km^2\) by including the appropriate code:
Dbl.ex2<-array(NA,nt)
# calculate the baseline density
# omit the burn-in
for (i in 1000:nt){
  Dbl.ex2[i]<-exp(param.mat.ex2[i,3]+(0.5*exp(param.mat.ex2[i,5])^2))
}
Dbl.ex2<-Dbl.ex2*1000*1000
Summary statistics for the baseline density can now be obtained using:
# the posterior mean, standard deviation and central credible interval for
# baseline density after omitting the burn-in
# (rounded to 2 decimal points)
round(mean(Dbl.ex2[1000:nt]),2)
## [1] 1.58
round(sd(Dbl.ex2[1000:nt]),2)
## [1] 0.33
round(quantile(Dbl.ex2[1000:nt],prob=c(0.025,0.975)),2)
## 2.5% 97.5%
## 1.07 2.40
We conclude that the baseline density of northern bobwhite coveys was 1.58 coveys per \(km^2\) (SD = 0.33, 95% central credible interval = {1.07, 2.40}).
See if you understand the R code provided above and can get the code to run on your laptop. Note that you do not need to complete all 20000 iterations to obtain the summary statistics, as the data frame param.mat.ex2 containing the results can be loaded into the workspace.
Consider the case that we are more interested in the effects of Julian date on bird numbers rather than the type of field (buffered or unbuffered). Hence, instead of fitting a model with covariate Type, we want to fit a model with covariate JDctr which represents the centred values for this covariate. Modify the function covey.int.log.lik in such a manner that we no longer have Type as a covariate in the count model (which was fitted as a factor covariate with two levels) but JDctr instead (which should be fitted as a continuous covariate). You could call your new function e.g. covey.int.jd.log.lik. Make the equivalent changes to function covey.int.log.lik.raneff.update and call it, e.g. covey.int.jd.log.lik.raneff.update.
Define the priors, proposal distributions and starting value for model parameters including the new parameter \(\beta_{2}\). Set up a data frame param.mat.ex2.jd which stores the parameter values for \(nt\)=20000 iterations.
Using the MH algorithm from above as a starting point, write your MH algorithm for fitting this model.
Use the methods described above to check if your algorithm has converged. What are your mean, standard deviation and 95% credible intervals for the coefficient associated with covariate JDctr (parameter \(\beta_{2}\))?
Modify the methods from above for estimating the posterior distribution of the baseline density which, here, means where covariate JDctr is set to zero. What are your mean, standard deviation and 95% credible intervals for density per \(km^2\) from your algorithm? In which way is it different from your baseline densities estimated with a model containing covariate Type and why do you think this is?
In this section, we will set up a reversible jump algorithm similar to the one described in Oedekoven et al. (2014; or see methods described in Section 9.4.2 of Buckland et al. 2015). We build upon the previous example and include model selection as part of the inference. Now, instead of conditioning on the model being the correct model as in the MH algorithm above, we include a reversible jump (RJ) step at each iteration where we propose to move to a different model. For this exercise we keep it simple and only include two model choices: model 1 from exercise 2 with a hazard-rate detection function and a count model with the Type covariate and model 2 for which the count model also includes covariate JDctr (centred Julian date).
Now each iteration consists of a proposal to update the model (the RJ step) and a proposal to update the parameters of the current model (the MH step). For the MH step we use an algorithm similar to the above except that only parameters contained in the current model are updated. The RJ step is described in more detail in the following section.
Setting up an RJMCMC updating algorithm involves a few additional steps compared to the MH algorithm above. The steps for the RJMCMC algorithm of this exercise include:
Defining the likelihood function
Defining the priors for the parameters and models
Defining proposal distributions for the MH step and the RJ step
Defining initial model and parameter values for the RJMCMC algorithm
Storing the parameter values and selected model
The RJMCMC algorithm
Pilot tuning the RJMCMC algorithm
In this section, we provide the functions and R code necessary to implement these steps. In addition, we describe how to obtain summary statistics of the posterior model probabilities and posterior distributions of parameters and plot densities. We begin with the details for the RJ step.
With the two model choices in this exercise, the RJ step at each iteration involves proposing to delete or add covariate JDctr (modelled as a continuous covariate) depending on whether it is in the current model or not. If, for example, at iteration \(t\) covariate JDctr is not contained in the count model, we propose to add it. For example, say the current model \(m\) with parameters \(p\) does not include JDctr and the newly proposed model \(m'\) includes JDctr. This proposal involves drawing a random sample for the coefficient associated with JDctr, say \(\beta'_2\) from the respective proposal distribution (e.g., \(\beta'_2 \sim N(\mu_{\beta_2},\sigma^2_{\beta_2})\) where \(\mu_{\beta_2}\) and \(\sigma^2_{\beta_2}\) are the parameter-specific proposal mean and variance). We use the identity function as the bijective function (similar to equation 9.41 of Buckland et al. 2015) where all other parameters \(p\) of the current model \(m\) are set equal to the corresponding parameters in \(p'\) and the additional parameter for model \(m'\) is:
\[ \begin{aligned} u_2 = \beta'_2 ~~\\ \end{aligned} \nonumber \]
for this particular model move. We then calculate the acceptance probability \(A(m'|m)\) for this proposed move using equation 8. Note, this equation is a simplified version of A.4 in Oedekoven et al. (2014). We are able to use this simplified version due to the following circumstances. As we use the identity function as the bijective function, the Jacobian \(|J|\) equals 1. Furthermore, as we consider all model probabilities equally likely, \(P(m|m')\) and \(P(m'|m)\) as well as \(p(m)\) and \(p(m')\) cancel in equation 8. This leaves calculating the likelihoods (for the newly proposed model \({\mathcal L}_{n,y}(p',m')\) as well as the current model \({\mathcal L}_{n,y}(p,m)\)), the proposal density for the new parameter (\(q\left(u_2\right)\)) and the prior density for the new parameter (\(p(\beta'_2)\)): \[ A(m'|m) = \frac{{\mathcal L}_{n,y}(p',m')p(\beta'_2)} {{\mathcal L}_{n,y}(p,m)q\left(u_2\right)}~. \tag{8} \]
We show the R code for calculating the acceptance probabilities in the section The RJMCMC algorithm below.
For the RJMCMC algorithm, we use the same likelihood function covey.rj.int.log.lik (together with f.haz.function.pt) for both models. It differs slightly from covey.int.log.lik from exercise 2 in that covey.rj.int.log.lik requires a vector of length six for argument p, to accommodate the additional covariate JDctr. The function can be used for both models 1 and 2, even though model 1 does not contain this covariate; in that case the parameter value is set to zero.
# This function is loaded into your workspace using the source command above
covey.rj.int.log.lik<-function(p, raneff, datax){
  ## Part 1: setting up the parameter values p for the detection function and count models
  # the detection model
  scale<-exp(p[1]) # the scale parameter
  shape<-exp(p[2]) # the shape parameter
  # the count model
  int<-p[3] # the intercept
  typ<-c(0,p[4]) # coefficient for Type level "TREAT" (level "CONTROL" is absorbed in the intercept)
  day<-p[5] # coefficient for JDctr
  sd.ran<-exp(p[6]) # the random effect standard deviation on the log scale
  ## Part 2: the likelihood component pertaining to the detection model
  # calculating f(y) for each observed distance (the truncation distance is stored in datax$w)
  le<-nrow(datax$dis.object)
  fe<-numeric(le)
  norm.const<-integrate(f.haz.function.pt,0,datax$w,scale,shape)$value
  for (e in 1:le){
    fe[e]<-f.haz.function.pt(datax$dis.object$distance[e],scale,shape)/norm.const
  }
  # the sum of the log(f(y))
  log.e<-sum(log(fe))
  ## Part 3: the likelihood component pertaining to the count model
  # setting up indices for the factor covariate Type and random effect coefficients
  Type0<-match(datax$glmm.data$Type,sort(unique(datax$glmm.data$Type)))
  gr.id<-sort(unique(datax$glmm.data$gr.id))
  Ran0<-match(datax$glmm.data$gr.id,gr.id)
  # log of effective area (here: same for each visit to a point due to the global detection model)
  l.efa<-log(norm.const*pi*2)
  # calculate the log-Poisson density for each count (log=TRUE avoids underflow for small densities)
  lambda<-exp(int + typ[Type0] + datax$glmm.data$JDctr*day + raneff[Ran0] + l.efa)
  logdpois.y<-sum(dpois(datax$glmm.data$detections,lambda,log=TRUE))
  # calculate the log of the normal density for each random effect coefficient
  log.dnorm.raneff<-sum(dnorm(raneff,0,sd.ran,log=TRUE))
  # adding up the likelihood components
  log.lik<-logdpois.y + log.dnorm.raneff + log.e
  # the function return
  return(log.lik)
}
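The function relies on f.haz.function.pt, which is loaded via the source command. For reference, here is a sketch of what it presumably computes for point transects: the hazard-rate detection function multiplied by distance (this form is an assumption based on the hazard-rate model described in the lecture, not the sourced code itself; the parameter values below are illustrative):

```r
# unnormalised pdf of observed distances for point transects:
# f(y) proportional to y * g(y), with hazard-rate g(y) = 1 - exp(-(y/scale)^(-shape))
f.haz.function.pt <- function(dis, scale, shape){
  dis * (1 - exp(-(dis / scale)^(-shape)))
}
# normalising over [0, w] (here an assumed w = 500) gives the pdf f(y)
norm.const <- integrate(f.haz.function.pt, 0, 500, scale = 130, shape = 2.5)$value
f150 <- f.haz.function.pt(150, 130, 2.5) / norm.const
# the normalised f(y) integrates to 1 over [0, w]
check <- integrate(function(y) f.haz.function.pt(y, 130, 2.5) / norm.const, 0, 500)$value
```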
We consider all models equally likely. Hence, the probability \(P(m'|m)\) of proposing to move to model \(m'\) given that the chain is in model \(m\) and the probability \(P(m|m')\) for the reverse step of proposing to move to model \(m\) given that the chain is in model \(m'\) are equal and can be ignored for this exercise.
As for the MH algorithm above, we place uniform priors on all model parameters and use the same function l.prior to calculate the log of the prior densities:
# setting the lower and upper limits for the uniform prior density
lolim<-c(50,1,-25,-5,-5,0.0001)
uplim<-c(1000,10,5,5,5,3)
# This function is uploaded into your workspace using the source command above
l.prior<-function(coef,lo,up){
  l.u.int<-log(dunif(coef,lo,up))
  if(any(abs(l.u.int)==Inf))l.u.int<- -100000
  sum(l.u.int)
}
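A quick sanity check of l.prior (the definition from above is repeated so that the snippet is self-contained):

```r
l.prior<-function(coef,lo,up){
  l.u.int<-log(dunif(coef,lo,up))
  if(any(abs(l.u.int)==Inf))l.u.int<- -100000
  sum(l.u.int)
}
# inside the bounds: the log of the uniform density
l.prior(270, 50, 1000)   # log(1/950)
# outside the bounds: dunif() returns 0, log is -Inf, replaced by -100000
l.prior(2000, 50, 1000)  # -100000
# vector input: element-wise log densities are summed
l.prior(c(270, 3.5), c(50, 1), c(1000, 10))
```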
As for the MH algorithm, we use normal proposal distributions. However, for the RJMCMC algorithm, we need two sets of proposal distributions: one set for the RJ step and another set for the MH step. The proposal distributions for the RJ step are used to draw random samples for new model coefficients. They are also used to calculate proposal densities for calculating the acceptance probabilities using equation 8. We note that all parameters except one (for JDctr) are always in the model and do not need a proposal distribution for the RJ step. However, for consistency, we define these as well.
The proposal distributions for the MH step, defined with zero means and parameter-specific standard deviations, are the same as for the MH algorithm above for the corresponding parameters.
# proposal distributions for parameters in the detection and count model
# for RJ step: proposals of new parameters
prop.mean<-c(270, 3.5, -13, 0.6, .2, 0.5)
prop.sd<-c(1.41, 0.84, 0.1, 0.2, 0.2, 0.3)
# for MH step: proposals to update current parameters
q.sd<-c(3.5, 0.1, 0.02, 0.03, 0.005, 0.02)
This is more complex than for the MH algorithm, as we not only need to store the values of the parameters and random effect coefficients at each iteration, but also keep track of which model was chosen at each iteration and which parameters in p are switched on and off. For this purpose, we set up a model indicator matrix, count.list, which indicates which parameters are switched on for each model. The matrix contains combinations of \(0\) and \(1\) in each row, where the row number refers to the model number and \(1\) indicates that the parameter (given by the column name) is switched on for the respective model. This allows us to store the model choice for each iteration as an integer in the vector count.model.
# number of iterations in the RJMCMC chain
nt<-20000
# array that will keep track of the model choice (stored as an integer number referring to the model number (row number) from count.list below)
count.model<-array(NA,nt)
# we need to know which model contains which parameters
# row number indicates the model number
# column names indicate which parameters are switched on or off for the respective model
# 0: parameter is switched off
# 1: parameter is switched on
count.list<-matrix(0,2,6) # 2 count model choices:
# model 1: int + typ
# model 2: int + typ + JDctr
# columns: scale, shape, int, type level TREAT, JDctr, sd.ran
colnames(count.list)<-c("scale","shape","int","type.tr","JDctr","sd.ran")
# all detection models contain the scale and shape parameter (hazard rate)
# all count models contain the intercept, covariate type and sd.ran
count.list[1,]<-c(rep(1,4),0,1) # without JDctr
count.list[2,]<-c(rep(1,6)) # with JDctr
# data frames for storing the parameter values
param.mat.ex3<-matrix(NA,nt,6)
param.mat.ex3<-data.frame(param.mat.ex3)
# these obtain the same column names as count.list
colnames(param.mat.ex3)<-c("scale","shape","int","type.tr","JDctr","sd.ran")
# data frame for storing the random effect coefficients
j<-length(sort(unique(covey.data$glmm.data$gr.id)))
raneff.mat.ex3<-matrix(NA,nt,j) # 40 sites
raneff.mat.ex3<-data.frame(raneff.mat.ex3)
We begin our chain with model 1 which does not include JDctr. Hence, we need to set up starting values for the model parameters including the scale and shape parameter of the hazard-rate detection model and the intercept, type coefficient and random effect standard deviation of the count model.
The current model choice is stored in curmodel which is updated at each iteration. The current parameter values are stored in curparam which is also updated at each iteration. As for the MH algorithm in exercise 2, we store the current values for the random effect coefficients in raneff.
## Setting the initial models
# global detection function and intercept + RE sd count model
curmodel<-1
count.model[1]<-curmodel
# curlist will tell you which parameters are switched on for the current model
curlist<-count.list[curmodel,]
## Setting up initial values
# for detection model: scale and shape (which are both on the log-scale)
sig.0 <- log(130)
sha.0 <- log(2.5)
# initial values for count model parameters
int.0 <- -13 # intercept
typ.0 <- 0.4 # type coefficient
# day.0 <- -0.01 # needed in case you start with model 2
sd.ran.0 <- log(1) # random effect standard deviation
j<-length(sort(unique(covey.data$glmm.data$gr.id))) # number of sites
set.seed(1234)
raneff<-rnorm(j,0,exp(sd.ran.0))
# curparam and newparam always need to be of length 6, as the likelihood function covey.rj.int.log.lik expects a vector of length 6 for argument p, regardless of the model
# for model 1 we set the coefficient for JDctr to 0
curparam<-c(sig.0,sha.0,int.0,typ.0,0,sd.ran.0)
param.mat.ex3[1,]<-curparam
raneff.mat.ex3[1,]<-raneff
Similar to the MH algorithm from exercise 2, we use the function covey.rj.int.log.lik.raneff.update to update the random effect coefficients.
In the following algorithm, we begin each iteration with the RJ step where we propose to update the current model. Here, we propose to switch from model 1 to model 2 (if the current model is 1) or vice versa. A proposed move from model 1 to model 2 entails a proposal to add covariate JDctr, a proposed move from model 2 to model 1 entails a proposal to delete covariate JDctr. This completes the RJ step. Note that Type is fitted as a factor covariate with two levels and JDctr is fitted as a continuous covariate. At the completion of the RJ step, we store current model choice in the data frame count.model. Following the RJ step is the MH step where we propose to update current model parameters. Here we cycle through each of the current parameters and propose to update it. (See Section The MH updating algorithm from exercises 1 and 2 above for more details about MH updating). At the completion of the MH step, we store the current parameter values and random effect coefficients in the respective data frames (param.mat.ex3 and raneff.mat.ex3). We note that in the case JDctr is not in the current model, the parameter value will be stored as \(0\). Using the information stored in count.list in combination with the model choice stored in count.model allows us to distinguish these \(0\)s from true zeros for a parameter value.
# nt is the number of iterations and is set above
set.seed(1234)
for (i in 1:nt){
  print(i)
  ########################## RJ step ########################################
  # method: propose to add or delete covariate JDctr depending on whether it is in the current model
  newlist<-curlist
  newparam<-curparam
  ######################################### count model parameters
  # check if JDctr is in the current model:
  # propose to add it if it is not in the current model
  b <- 5
  if(curlist[b]==0){
    newlist[b]<-1
    newparam[b]<-rnorm(1,prop.mean[b],prop.sd[b])
    # the prior density of the proposed parameter
    new.l.prior<-l.prior(newparam[b],lolim[b],uplim[b])
    # likelihoods of the proposed and current models
    new.lik<-covey.rj.int.log.lik(newparam,raneff,covey.data)
    cur.lik<-covey.rj.int.log.lik(curparam,raneff,covey.data)
    # log of the proposal density q(u_2)
    prop.dens<-dnorm(newparam[b],prop.mean[b],prop.sd[b],log=TRUE)
    # numerator and denominator in the acceptance probability (equation 8)
    num<-new.lik+new.l.prior
    den<-cur.lik+prop.dens
  } else {
    # propose to delete it if it is in the current model
    newlist[b]<-0
    newparam[b]<-0
    # the prior density of the parameter to be deleted
    cur.l.prior<-l.prior(curparam[b],lolim[b],uplim[b])
    # likelihoods of the proposed and current models
    new.lik<-covey.rj.int.log.lik(newparam,raneff,covey.data)
    cur.lik<-covey.rj.int.log.lik(curparam,raneff,covey.data)
    # log of the proposal density for the reverse move
    prop.dens<-dnorm(curparam[b],prop.mean[b],prop.sd[b],log=TRUE)
    # numerator and denominator in equation 8 (reverse move)
    num<-new.lik+prop.dens
    den<-cur.lik+cur.l.prior
  }
  A<-min(1,exp(num-den))
  V<-runif(1)
  if(V<=A){
    curparam[b]<-newparam[b]
    curlist[b]<-newlist[b]
  } else {
    newparam[b]<-curparam[b]
    newlist[b]<-curlist[b]
  }
  # determining the current model and storing it for this iteration
  # (match.function is a helper loaded via the source command, not base R's match.fun;
  # it returns the row of count.list that matches curlist, i.e. the model number)
  curmodel<-match.function(curlist,count.list)
  count.model[i]<-curmodel
  ##################################### MH step ###################################
  newparam<-curparam
  # updating current parameters
  for (b in which(curlist==1)){
    # proposing to update the bth parameter
    u<-rnorm(1,0,q.sd[b])
    # remembering that scale, shape and sd.ran are on the log-scale:
    # the proposal is made on the natural scale (q.sd is small relative to the
    # parameter, so exp(curparam[b])+u stays positive in practice)
    if(b %in% c(1,2,6)){
      newparam[b]<-log(exp(curparam[b])+u)
      new.l.prior<-l.prior(exp(newparam[b]),lolim[b],uplim[b])
      cur.l.prior<-l.prior(exp(curparam[b]),lolim[b],uplim[b])
    } else {
      newparam[b]<-curparam[b]+u
      new.l.prior<-l.prior(newparam[b],lolim[b],uplim[b])
      cur.l.prior<-l.prior(curparam[b],lolim[b],uplim[b])
    }
    num<-covey.rj.int.log.lik(newparam,raneff,covey.data) + new.l.prior
    den<-covey.rj.int.log.lik(curparam,raneff,covey.data) + cur.l.prior
    A<-min(1,exp(num-den))
    V<-runif(1)
    if(V<=A) curparam[b]<-newparam[b] else newparam[b]<-curparam[b]
  }
  param.mat.ex3[i,]<-curparam
  # updating the random effect coefficients
  raneff<-covey.rj.int.log.lik.raneff.update(curparam, raneff, covey.data)
  raneff.mat.ex3[i,]<-raneff
  # saving the parameter matrices every 5000 iterations
  if(i %% 5000 == 0){
    save(param.mat.ex3,file='param.mat.ex3.RData')
    save(raneff.mat.ex3,file='raneff.mat.ex3.RData')
    save(count.model,file="model.ex3.RData")
  }
}
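The helper match.function used in the RJ step above is loaded via the source command; it is not base R's match.fun but a small utility that maps the current indicator vector to a model number. A minimal sketch of what it presumably does:

```r
# returns the row of mat (here: count.list) that equals the indicator vector x;
# the row number is the model number
match.function <- function(x, mat){
  which(apply(mat, 1, function(row) all(row == x)))
}
# example with the two count models from above
count.list <- rbind(c(1,1,1,1,0,1),  # model 1: without JDctr
                    c(1,1,1,1,1,1))  # model 2: with JDctr
match.function(c(1,1,1,1,1,1), count.list)  # 2
```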
For pilot tuning the within-model moves (the MH step), i.e. adjusting prior and proposal distributions, refer to the section Pilot tuning the MH updating algorithm above in exercise 1.
Pilot tuning the between-model move (the RJ step) may involve adjusting the proposal distributions for the RJ step and the prior distributions for the parameters. We found that, for this exercise, decreasing the standard deviation of the normal proposal distribution for a parameter and/or widening the boundaries of its uniform prior distribution may decrease the acceptance probability of a proposal to include that parameter in the current model.
After the algorithm has completed, we can use the objects that store the model choices and parameter values to draw inference. For the purpose of the exercise, these may also be loaded into the workspace using:
load("param.mat.ex3.RData")
load("model.ex3.RData")
load("raneff.mat.ex3.RData")
For the RJMCMC algorithm, we may draw inference on both the model probabilities and the model parameters by obtaining summary statistics of the respective marginal posterior distributions. As for the MH algorithm, we consider the first 999 iterations of the chain as the burn-in and omit these when obtaining summary statistics.
We use the following code to obtain model probabilities:
# to calculate model probabilities (omitting the burn-in)
round(table(count.model[1000:nt])/length(count.model[1000:nt]),6)
##
## 1 2
## 0.062576 0.937424
We conclude that the preferred model was model 2 which included the JDctr covariate. After the burn-in, this model was selected in 93.74 % of the iterations.
To assess model mixing we can plot the sequence of models selected during the algorithm:
plot(count.model,t='l',yaxt="n",ylab="Model",xlab="Iteration")
axis(side=2,at=c(1,2),labels = c(1,2))
Figure 10. Models selected during 20000 iterations of the RJMCMC algorithm.
The chain seemed to be moving freely between the two models.
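One simple numeric summary of mixing is the between-model switch rate, i.e. the proportion of iterations at which the chain changed model. The sketch below illustrates this on a short synthetic model sequence (count.model.demo is a simulated stand-in for count.model):

```r
# synthetic model-indicator sequence (values 1 or 2) standing in for count.model
set.seed(42)
count.model.demo <- cumsum(c(1, rbinom(199, 1, 0.1))) %% 2 + 1
# proportion of iterations at which the chain switched between models
switch.rate <- mean(diff(count.model.demo) != 0)
```

For the real chain, replace count.model.demo with count.model[1000:nt] to exclude the burn-in; a rate near zero would indicate the chain is stuck in one model.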
For inference on the parameters, we can use the parameter values stored in param.mat.ex3 to calculate, e.g., the mean, standard deviation and central 95% credible interval for the parameter of interest. We consider the first 999 iterations as the burn-in phase and omit these. We now have several choices about which iterations to include when presenting summary statistics for each parameter. We can use all iterations after the burn-in, taking the value of a parameter that was not in the model to be \(0\). We can present summary statistics for parameters conditional on the corresponding covariates being in the model (hence excluding those iterations during which the covariates were not in the model). We can also condition on the preferred model and present summary statistics for parameters conditional on that model choice. Note, however, that when covariates are correlated, estimates of the coefficient for one covariate with another covariate absent may not be comparable with estimates when the other covariate is present.
The parameter of interest for our case study was the coefficient for level “TREAT” of the Type covariate in the count model. From the results of the MH algorithm we concluded that the mean estimate for this parameter was {r round(mean(param.mat[1000:nt,8]),2)} (SD = {r round(sd(param.mat[1000:nt,8]),3)}, 95% central credible interval = {r round(quantile(param.mat[1000:nt,8],probs=c(0.025,0.975)),2)}).
We will now investigate whether the inference on this parameter changes depending on whether the model also contains JDctr. For this purpose, we obtain summary statistics for this parameter using the parameter values stored in param.mat.ex3. We use the following R code to produce summary statistics for two scenarios: 1. using all iterations after the burn-in, 2. using the preferred model 2 only.
# Inference on level "TREAT" for the Type coefficient in the count model
# 1. including all iterations after the burn-in
round(mean(param.mat.ex3[1000:nt,4]),2)
## [1] 0.75
round(sd(param.mat.ex3[1000:nt,4]),2)
## [1] 0.14
round(quantile(param.mat.ex3[1000:nt,4],probs=c(0.025,0.975)),2)
## 2.5% 97.5%
## 0.49 1.04
# 2. using model 2 only
x2<-which(count.model[1000:nt]==2)+999
round(mean(param.mat.ex3[x2,4]),2)
## [1] 0.75
round(sd(param.mat.ex3[x2,4]),2)
## [1] 0.14
round(quantile(param.mat.ex3[x2,4],probs=c(0.025,0.975)),2)
## 2.5% 97.5%
## 0.49 1.04
In this simple example with two model options, inference on the parameter for level “TREAT” of the Type covariate in the count model was unaffected by whether JDctr was included in the model. The mean, standard deviation (SD) and 95% central credible interval for the Type coefficient were the same for both options.
Similar to inference on the parameters, for an RJMCMC algorithm we can include all iterations from the chain or condition on the preferred model when obtaining summary statistics of plot density.
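The difference between model-averaged and conditional summaries can be illustrated with a small synthetic example (the draws below are simulated placeholders, not the case-study posterior):

```r
set.seed(1)
n <- 5000
# synthetic model indicator (1 or 2) and coefficient draws standing in for
# count.model and the JDctr column of param.mat.ex3
model <- sample(1:2, n, replace = TRUE, prob = c(0.06, 0.94))
beta.jd <- ifelse(model == 2, rnorm(n, -0.01, 0.005), 0)
# model-averaged summary: the zeros from iterations in model 1 are included
mean(beta.jd)
# conditional summary: only iterations during which JDctr was in the model
mean(beta.jd[model == 2])
```

With a high posterior probability for model 2, the two summaries are close; with more evenly balanced models they can differ noticeably.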
The national CP33 monitoring program was coordinated and delivered by the Department of Wildlife, Fisheries, and Aquaculture and the Forest and Wildlife Research Center, Mississippi State University. The national CP33 monitoring program was funded by the Multistate Conservation Grant Program (Grants MS M-1-T, MS M-2-R), a program supported with funds from the Wildlife and Sport Fish Restoration Program and jointly managed by the Association of Fish and Wildlife Agencies, U.S. Fish and Wildlife Service, USDA-Farm Service Agency, and USDA-Natural Resources Conservation Service-Conservation Effects Assessment Project.
Buckland, S. T., E. Rexstad, T. A. Marques, and C. S. Oedekoven. 2015. Distance Sampling: Methods and Applications. Methods in Statistical Ecology. Springer.
Davison, A. C. 2003. Statistical Models. Cambridge University Press.
Gelman, A., G. O. Roberts, and W. R. Gilks. 1996. “Efficient Metropolis Jumping Rules.” In Bayesian Statistics, edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, 599–608. Oxford University Press, Oxford.
Hastings, W. K. 1970. “Monte Carlo Sampling Methods Using Markov Chains and Their Applications.” Biometrika 57(1): 97–109.
Oedekoven, C. S., S. T. Buckland, M. L. Mackenzie, R. King, K. O. Evans, and L. W. Burger Jr. 2014. “Bayesian Methods for Hierarchical Distance Sampling Models.” Journal of Agricultural, Biological, and Environmental Statistics 19 (2): 219–39. doi:10.1007/s13253-014-0167-0.