Olin College is Hiring

I teach at Olin College, a new undergraduate engineering college with the mission to fix engineering education. If you're interested in joining our team, here is information about the Faculty Search at Olin College.

Monday, November 28, 2011

Estimating the age of renal tumors

UPDATE April 2, 2012: I wrote a paper describing this work and submitted it to arXiv. You can download it here.

Abstract: We present a Bayesian method for estimating the age of a renal tumor given its size. We use a model of tumor growth based on published data from observations of untreated tumors. We find, for example, that the median age of a 5 cm tumor is 20 years, with interquartile range 16-23 and 90% confidence interval 11-30 years.


A few weeks ago I read this post on reddit.com/r/statistics:
"I have Stage IV Kidney Cancer and am trying to determine if the cancer formed before I retired from the military. ... Given the dates of retirement and detection is it possible to determine when there was a 50/50 chance that I developed the disease? Is it possible to determine the probability on the retirement date?  My tumor was 15.5 cm x 15 cm at detection. Grade II."
I contacted the original poster and got more information; I learned that veterans get different benefits if it is "more likely than not" that a tumor formed while they were in military service (among other considerations).

Because renal tumors grow slowly, and often do not cause symptoms, they are often left untreated.  As a result, we can observe the rate of growth for untreated tumors by comparing scans from the same patient at different times.  Several papers have reported these growth rates.

I collected data from a paper by Zhang et al.  I contacted the authors to see if I could get raw data, but they refused on grounds of medical privacy.  Nevertheless, I was able to extract the data I needed by printing one of their graphs and measuring it with a ruler.  It's silly, but it works.

They report growth rates in reciprocal doubling time (RDT), which is in units of doublings per year.  So a tumor with RDT=1 doubles in volume each year; with RDT=2 it quadruples in the same time, and with RDT=-1, it halves.  The following figure shows the distribution of RDT for 53 patients:

The squares are the data points from the paper; the line is a model I fit to the data.  The positive tail fits an exponential distribution well, so I used a mixture of two exponentials.
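To make the RDT units concrete, here is a minimal Python sketch of the conversion between RDT, elapsed time, and change in volume.  The function names are mine, chosen for illustration, and the 365-day year is an assumption.

```python
DAYS_PER_YEAR = 365  # assumption: ignore leap years


def volume_factor(rdt, years):
    """Factor by which tumor volume changes after `years` at a constant
    reciprocal doubling time `rdt` (doublings per year)."""
    return 2.0 ** (rdt * years)


def doubling_time_days(rdt):
    """Volume doubling time in days for a positive RDT."""
    if rdt <= 0:
        raise ValueError("doubling time is only defined for rdt > 0")
    return DAYS_PER_YEAR / rdt


print(volume_factor(1, 1))    # doubles in one year
print(volume_factor(2, 1))    # quadruples in one year
print(volume_factor(-1, 1))   # halves in one year
```

A negative RDT describes a tumor that shrank between scans, which is why the doubling time is left undefined in that case.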

As a simple model of tumor growth, I chose the median value of RDT, which is 0.45, and used that to estimate the age of a tumor with maximum dimension 15.5 cm.  Here's what I wrote in my letter to the Veterans Benefits Administration:
  1. In the largest study I reviewed (53 patients) the median volume doubling time is 811 days. By definition of median, 50% of observed tumors grew faster and 50% slower.  By geometry, the doubling time for the maximum linear dimension is approximately (811)(3) = 2433 days or 6.7 years.  Therefore, for a tumor with maximum linear dimension 15.5 cm on [diagnosis date], it is as likely as not that the size on [discharge date] was 6 cm.
  2. If the diameter of the tumor on [discharge date] were 1 mm and it grew to 15.5 cm by [diagnosis date], the effective volume doubling time would be 150 days.  Fewer than half of the tumors in the studies I reviewed grew at this rate or faster, so it is more likely than not that the tumor grew more slowly.
Based on this analysis, I conclude that it is more likely than not that this tumor formed prior to [discharge date].
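
The arithmetic in the letter can be sketched in a few lines of Python.  The dates are redacted above, so the interval used below (9 years) is a hypothetical placeholder for illustration only, not a value from the case; the function names are also mine.

```python
import math

MEDIAN_VOLUME_DOUBLING_DAYS = 811  # from the letter
DAYS_PER_YEAR = 365                # assumption: ignore leap years

# Volume scales as diameter cubed, so one doubling of the diameter
# takes three doublings of the volume.
linear_doubling_days = 3 * MEDIAN_VOLUME_DOUBLING_DAYS  # 2433 days
linear_doubling_years = linear_doubling_days / DAYS_PER_YEAR


def diameter_before(diameter_now_cm, years_earlier):
    """Diameter `years_earlier` years ago, assuming median growth."""
    return diameter_now_cm / 2 ** (years_earlier / linear_doubling_years)


def effective_volume_doubling_days(d_start_cm, d_end_cm, interval_years):
    """Volume doubling time needed to grow from d_start to d_end
    (diameters) in the given interval."""
    volume_doublings = 3 * math.log2(d_end_cm / d_start_cm)
    return interval_years * DAYS_PER_YEAR / volume_doublings


# HYPOTHETICAL interval between discharge and diagnosis; the real
# dates are not given in this post.
HYPOTHETICAL_INTERVAL_YEARS = 9.0

print(diameter_before(15.5, HYPOTHETICAL_INTERVAL_YEARS))
print(effective_volume_doubling_days(0.1, 15.5, HYPOTHETICAL_INTERVAL_YEARS))
```

The first function corresponds to point 1 of the letter (how big was the tumor at discharge if it grew at the median rate?) and the second to point 2 (how fast would it have had to grow to start at 1 mm?).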
I think this model is sufficient to answer the question as posed, but it occurred to me later (in the shower, where all good ideas come from) that we can do better.  By sampling from the distribution of growth rates and generating simulated tumor histories, we can estimate the distribution of size as a function of time and then, using Bayes's Theorem, get the distribution of age as a function of size.

Here's how.  The simulation starts with a small tumor (0.3 cm) and runs these steps:
  1. Choose a growth rate from the distribution of RDT.
  2. Compute the size of the tumor at the end of an 8-month interval (the median interval between measurements in the data source).
  3. Repeat until the tumor is 20 cm in diameter.
This figure shows 100 simulated growth trajectories:
The line at 10 cm shows the range of ages for tumors at that size: the fastest-growing tumor gets there in 8 years; the slowest takes more than 35.
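
The simulation steps above can be sketched as follows.  This is a rough illustration rather than the code behind the figure: `sample_rdt` is a hypothetical stand-in for the fitted RDT distribution, and its mixture weights and means are placeholders I made up, not the fitted parameters.

```python
import random

INTERVAL_YEARS = 8 / 12    # median interval between measurements
START_DIAMETER_CM = 0.3    # initial size used in the simulation
END_DIAMETER_CM = 20.0     # stop once the tumor reaches this size


def sample_rdt(rng):
    """HYPOTHETICAL stand-in for the fitted RDT model: a mixture of two
    exponentials, one for growing and one for shrinking tumors.
    The weight (0.85) and means (0.8, 0.3) are placeholders."""
    if rng.random() < 0.85:
        return rng.expovariate(1 / 0.8)    # positive tail
    return -rng.expovariate(1 / 0.3)       # negative tail


def simulate_trajectory(rng, max_steps=10_000):
    """Return a list of (age in years, diameter in cm) pairs."""
    age, diameter = 0.0, START_DIAMETER_CM
    history = [(age, diameter)]
    for _ in range(max_steps):
        if diameter >= END_DIAMETER_CM:
            break
        rdt = sample_rdt(rng)
        # rdt * interval is the number of volume doublings; the
        # diameter grows by the cube root of the volume factor.
        diameter *= 2 ** (rdt * INTERVAL_YEARS / 3)
        age += INTERVAL_YEARS
        history.append((age, diameter))
    return history


rng = random.Random(17)
trajectories = [simulate_trajectory(rng) for _ in range(100)]
print(trajectories[0][-1])  # final (age, diameter) of the first tumor
```

Because a new growth rate is drawn for every interval, each trajectory is a random walk in log-size, which is what produces the spread of ages at any given size.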

By drawing the line at different sizes, we can estimate the distribution of age as a function of size.  There's an implicit use of Bayes's Theorem in there, but because I did everything discretely, I didn't have to think too hard.  This figure shows the distribution of age for a few different sizes:
Not surprisingly, bigger tumors are likely to be older.  For any size, we can generate the CDF and compute the median, interquartile range, and 90% confidence interval.  Here's what that looks like (with size on a log scale):
The points are data from simulation, which produces some variability due to discrete approximation.  The lines are fitted to the data.
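
Here is one way the percentiles could be computed from simulated histories.  It is a simplified sketch under two assumptions: it records the age at which each simulated tumor first reaches the target size (rather than binning every simulated measurement by size), and it reuses a hypothetical `sample_rdt` with made-up parameters in place of the fitted model.

```python
import random


def sample_rdt(rng):
    """HYPOTHETICAL placeholder for the fitted RDT distribution."""
    if rng.random() < 0.85:
        return rng.expovariate(1 / 0.8)
    return -rng.expovariate(1 / 0.3)


def age_at_size(rng, size_cm, start_cm=0.3, interval=8 / 12,
                max_steps=10_000):
    """Age in years at which one simulated tumor first reaches size_cm."""
    age, diameter = 0.0, start_cm
    for _ in range(max_steps):
        if diameter >= size_cm:
            return age
        diameter *= 2 ** (sample_rdt(rng) * interval / 3)
        age += interval
    raise RuntimeError("tumor did not reach the target size")


def percentile(sorted_values, p):
    """Simple percentile lookup from a sorted list (no interpolation)."""
    index = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[index]


def age_summary(size_cm, n=1000, seed=0):
    """5th, 25th, 50th, 75th and 95th percentiles of simulated age."""
    rng = random.Random(seed)
    ages = sorted(age_at_size(rng, size_cm) for _ in range(n))
    return {p: percentile(ages, p) for p in (5, 25, 50, 75, 95)}


print(age_summary(10.0))
```

The 25th and 75th percentiles give the interquartile range, and the 5th and 95th bound the 90% interval.  With placeholder parameters the numbers will not match the results reported in this post.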

With these results, doctors can look up the size of a tumor and get the distribution of ages; for example, the median age of a 15 cm tumor is 27 years, with interquartile range 22-31 and 90% confidence interval 16-39 years.

This model yields more detail than the simple model I started with, but the results are qualitatively similar; a tumor this size is more likely than not to have formed prior to the original poster's date of discharge.  It looks like there is also a good chance that it formed prior to enlistment, but I don't know what the VBA makes of that.


I think this model makes the best use of the available data, but there are several limitations:

1) The factors that limit tumor growth are different for very small tumors, so the observed data doesn't apply.  We can extrapolate back to when the tumor was small (I chose 0.3 cm, a bit smaller than the smallest tumor in the study).  That gives us a lower bound on the age of the tumor, but we can't say much about when the first cancer cell appeared.

2) The distribution of growth rates is based on a sample of 53 patients.  A different sample would yield a different distribution.  I could use resampling to characterize this source of error, but haven't.
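
For illustration, a resampling check might look like the sketch below, which bootstraps the median RDT.  I have not run this on the real data: the 53 values here are synthetic placeholders drawn from a made-up exponential distribution, and the function name is mine.

```python
import random
import statistics


def bootstrap_median_interval(sample, n_boot=2000, level=90, seed=0):
    """Percentile bootstrap interval for the median of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    medians = sorted(
        statistics.median(rng.choices(sample, k=n))  # resample with replacement
        for _ in range(n_boot)
    )
    k = n_boot * (100 - level) // 200  # resamples to trim from each tail
    return medians[k], medians[n_boot - 1 - k]


# SYNTHETIC placeholder data, NOT the measurements from Zhang et al.
placeholder_rng = random.Random(42)
placeholder_rdts = [placeholder_rng.expovariate(1 / 0.6) for _ in range(53)]

low, high = bootstrap_median_interval(placeholder_rdts)
print(low, statistics.median(placeholder_rdts), high)
```

The same idea extends to the whole analysis: refit the growth model to each resample, rerun the simulation, and look at how much the estimated ages vary.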

3) The growth model does not take into account tumor subtype or grade, which is consistent with the conclusion of Zhang et al: “Growth rates in renal tumors of different sizes, subtypes and grades represent a wide range and overlap substantially.”  

4) In our model of tumor growth, the growth rate during each interval is independent of previous growth rates.  It is plausible that, in reality, tumors that have grown quickly in the past are more likely to grow quickly.

If this correlation exists, it affects the location and spread of the results.  For example, running simulations with a serial correlation of ρ = 0.4 increases the estimated median age by about a year, and widens the interquartile range by about 3 years.  However, if there were a strong serial correlation in growth rate, there would also be a correlation between tumor volume and growth rate, and prior work has shown no such relationship.

There could still be a weak serial correlation, but since there is currently no evidence for it, I ran these simulations with ρ = 0.
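
One possible way to generate serially correlated growth rates is sketched below: draw a correlated Gaussian series, map it to uniforms with the normal CDF, and push those through the inverse CDF of the growth-rate distribution.  This is an assumption about how it could be done, not a description of the code behind the results above, and the exponential marginal with mean 0.5 is a hypothetical placeholder.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def correlated_uniforms(rho, n, rng):
    """Uniform(0, 1) values whose underlying Gaussian series has
    lag-1 correlation rho (an AR(1) process with unit variance)."""
    x = rng.gauss(0, 1)
    values = []
    for _ in range(n):
        values.append(STD_NORMAL.cdf(x))
        x = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    return values


def inverse_cdf_rdt(u):
    """HYPOTHETICAL marginal: exponential with mean 0.5 doublings/year."""
    u = min(u, 1 - 1e-12)  # guard against log(0)
    return -0.5 * math.log(1 - u)


def correlated_rdts(rho, n, rng):
    """Sequence of growth rates with serial correlation controlled by rho."""
    return [inverse_cdf_rdt(u) for u in correlated_uniforms(rho, n, rng)]


print(correlated_rdts(0.4, 5, random.Random(7)))
```

With ρ = 0 this reduces to independent draws.  Note that ρ is the correlation of the Gaussian series; the correlation of the transformed growth rates will generally be somewhat different.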


  1. I saw this reddit thread and felt like unfortunately I had neither time nor skills necessary to try to help out the guy. Thanks very much for taking on projects like that!

  2. Thanks, Pavel. I have to give some credit to my employer for this: Olin College uses a broad definition of "intellectual vitality" (as opposed to just conventional academic research). That gives me freedom to take on projects that serve a lot of different goals, including just helping someone out.

  3. I posted the original request on reddit. I am very thankful to Allen and the reddit community for the support.