Thursday, December 4, 2014

The Rock Hyrax Problem



This is the third of a series of articles about Bayesian analysis.  The previous article is here.

Earlier this semester I posed this problem to my Bayesian statistics class at Olin College:
Suppose I capture and tag 10 rock hyraxes.  Some time later, I capture another 10 hyraxes and find that two of them are already tagged.  How many hyraxes are there in this environment?
This is an example of a mark and recapture experiment, which you can read about on Wikipedia.  The Wikipedia page also includes the photo of a tagged hyrax shown above.

As always with problems like this, we have to make some modeling assumptions.

1) For simplicity, you can assume that the environment is reasonably isolated, so the number of hyraxes does not change between observations.

2) And you can assume that each hyrax is equally likely to be captured during each phase of the experiment, regardless of whether it has been tagged.  In reality, it is possible that tagged animals would avoid traps in the future, or possible that the same behavior that got them caught the first time makes them more likely to be caught again.  But let's start simple.

My solution to this problem uses the computation framework from my book, Think Bayes.  The framework is described in this notebook.  If you have read Think Bayes or attended one of my workshops, you might want to attempt this problem before you look at my solution.

If you solve this problem analytically, or use MCMC, and you want to share your solution, please let me know and I will post it here.

And when you are ready, you can see my solution in this notebook.

I will post more of the exercises from my class over the next few weeks.

UPDATE December 5, 2014: João Neto posted a solution to this problem in BUGS using a Jeffrey's prior.

5 comments:

  1. You should try Jeffrey's prior $p(N) \propto 1/N$ that can be normalized for any given M. This will make it more robust to changes of M.

    Also, you could narrow the interval to [N-n+k,M] instead of [1,M].

    ReplyDelete
  2. ok, I programmed it in BUGS using the Jeffrey's prior :-) My solution suggests there are around 40 hyraxes for this data (I tried with a maximum of 500 and 1000 hyraxes, and the results were relatively stable).

    The code and results are here http://www.di.fc.ul.pt/~jpn/r/bugs/hyraxes.html

    Cheers,

    ReplyDelete
    Replies
    1. Hi João. Thanks (again) for your solution. It looks great!

      Delete
  3. I'm useless at stats, so please let me know if my answer is correct.

    The first solution I came up with is:
    If in a sample of 10 I capture 2, then that means that the population is 5 times larger than the number I've tagged. Since I tagged 10, there would be 50 hyraxes.

    My second solution is using Python.
    I assume they'll be at most 100 hyraxes as working assumption, and see how far that takes me. The approriate pmf is the hypergeom distribution. Given a population M, with a sample size N = 10, and number of tagged hyraxes n = 10, what is the probability of observing k = 2 tagged hyraxes?
    So let me create a simple program and observe the output:
    from scipy.stats import hypergeom
    for population in range(11,100):
    prob = hypergeom.pmf(k = 2, M = population, n = 10, N = 10)
    print(population, prob)

    The output shows a maximum probability at populations of sizes 49 and 50. They're the maximum likelihoods. So, 50 it is, then.

    Regards,
    Mark Carter.

    ReplyDelete
    Replies
    1. Your solution looks good. But I encourage you to resist the temptation to collapse the posterior distribution to a point estimate (like an MLE, MAP, or posterior mean). One of the big advantages of the Bayesian approach is that you get a posterior distribution, which captures everything you know/believe about the value, and not just a point estimate or interval.

      Delete