## Monday, November 2, 2015

### One million is a lot

When I was in third grade, the principal of my elementary school announced a bottle cap drive with the goal of collecting one million bottle caps.  The point, as I recall, was to demonstrate that one million is a very large number.  After a few months, we ran out of storage space, the drive was cancelled, and we had to settle for the lesson that 100,000 is a lot of bottle caps.

So it is a special pleasure for me to announce that, early Sunday morning (November 1, 2015), this blog reached one million page views.  I am celebrating the occasion with a review of some of my favorite articles and, of course, some analysis of the page view statistics.

Here's a screenshot of my Blogger stats page to make it official:

And here are links to the 10 most read articles:

### Posts

| Entry (date posted) | Comments | Pageviews |
|---|---|---|
| Feb 7, 2011 | 9 | 130446 |
| Oct 27, 2011 | 56 | 47773 |
| Oct 20, 2011 | 13 | 33020 |
| Mar 25, 2015 | 5 | 32210 |
| Aug 18, 2015 | 23 | 30330 |
| Mar 14, 2012 | | 21718 |
| Aug 7, 2013 | 13 | 15035 |
| Feb 2, 2011 | 3 | 13806 |
| Jan 29, 2012 | 6 | 7904 |
| Feb 10, 2015 | | 7096 |

By far the most popular is my article about whether first babies are more likely to be late.  It turns out they are, but only by a couple of days.

Two of the top 10 are articles written by students in my Bayesian statistics class: "Bayesian survival analysis for Game of Thrones" by Erin Pierce and Ben Kahle, and "Bayesian analysis of match rates on Tinder", by Ankur Das and Mason del Rosario.  So congratulations, and thanks, to them!

Five of the top 10 are explicitly Bayesian, which is clearly the intersection of my interests and popular curiosity.  But the other common theme is the application of statistical methods (of any kind) to questions people are interested in.

According to Blogger stats, my readers are mostly in the U.S., with most of the rest in Europe.  No surprises there, with the exception of Ukraine, which is higher in the rankings than expected.  Some of those views are probably bogus; anecdotally, Blogger does not do a great job of filtering robots and fake clicks (I don't have ads on my blog, so I am not sure how anyone benefits from fake clicks, but I have to conclude that some of my million are not genuine readers).

Most of my traffic comes from Google, Reddit, Twitter, and Green Tea Press, which is the home of my free books.  It looks like a lot of people find me through "organic" search, as opposed to my attempts at publicity.  And what are those people looking for?

People who find my blog are looking for Bayesian statistics, apparently, and the answer to the eternal question, "Are first babies more likely to be late?"

Those are all the reports you can get from Blogger (unless you are interested in which browsers my readers use).  But if I let it go at that, this blog wouldn't be called "Probably Overthinking It".

I used the Chrome extension SingleFile to grab the stats for each article in a form I can process, then used the Pandas read_html function to get it all into a table.  The results, and my analysis, are in this IPython notebook.
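
As a rough illustration of the read_html step, here is a minimal sketch. It assumes the saved page contains an HTML table with "Entry" and "Pageviews" columns. The sample markup and the post titles are placeholders, not the actual Blogger output, and the notebook may do this differently. Note that read_html needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Placeholder markup standing in for a page saved with SingleFile.
# The real Blogger stats page is likely to be structured differently.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Entry</th><th>Pageviews</th></tr>
  </thead>
  <tbody>
    <tr><td>Placeholder post A</td><td>130446</td></tr>
    <tr><td>Placeholder post B</td><td>47773</td></tr>
  </tbody>
</table>
"""


def load_stats(html):
    """Parse the first <table> in an HTML string into a DataFrame."""
    # read_html returns a list with one DataFrame per table it finds.
    tables = pd.read_html(StringIO(html))
    return tables[0]


stats = load_stats(SAMPLE_HTML)
print(stats.sort_values("Pageviews", ascending=False))
```

For a file on disk, the path can be passed to `pd.read_html` directly instead of wrapping a string in `StringIO`.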

My first post, "Proofiness and Elections", was on January 4, 2011.  I've published 115 posts since then; the average time between posts is 15 days, but that includes a 180-day hiatus after "Secularization in America: Part Seven" in July 2012.  I spent the fall working on Think Bayes, and got back to blogging in January 2013.
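
The 15-day average can be checked with a little date arithmetic. This sketch assumes the most recent post is this one (November 2, 2015) and that each of the 115 posts after the first contributes one gap; the variable names are just for illustration.

```python
from datetime import date

first_post = date(2011, 1, 4)    # "Proofiness and Elections"
latest_post = date(2015, 11, 2)  # assumption: this post is the most recent
posts_since_first = 115          # one gap per post after the first

# Total time covered by the blog, divided by the number of gaps.
span_days = (latest_post - first_post).days
mean_gap = span_days / posts_since_first

print(f"{span_days} days / {posts_since_first} gaps = {mean_gap:.1f} days")
```

Under those assumptions the span is 1763 days, which works out to about 15.3 days per post, consistent with the figure above.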

Blogger provides stats on the most popular posts; I had to do my own analysis to extract the least popular posts:

Some of these deserve their obscurity, but not all.  "Will Millennials Ever Get Married?" is one of my favorite projects, and I think the video from the talk is pretty good.  And "When will I win the Great Bear Run?" is one of the best statistical models I've developed, albeit applied to a problem that is mostly silly.

Measures of popularity often follow Zipf's law, and my blog is no different.   As I suggest in Chapter 5 of Think Complexity, the most robust way to check for Zipf-like behavior is to plot the complementary CDF of frequency (for example, page views) on a log-log scale:

For articles with more than 1000 page views, the CCDF is approximately straight, in compliance with Zipf's law.
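
To make the idea concrete, here is a minimal sketch of computing a CCDF and taking logs of both axes. It uses only the ten page view counts from the table above, which is far too few points to test Zipf's law; the full analysis uses every post. The function names are illustrative, not taken from the notebook.

```python
import math

# Page views of the ten most-read posts, from the table earlier in this post.
page_views = [130446, 47773, 33020, 32210, 30330,
              21718, 15035, 13806, 7904, 7096]


def ccdf(values):
    """Return (x, P(X > x)) pairs for each distinct value, in ascending order."""
    n = len(values)
    return [(x, sum(v > x for v in values) / n) for x in sorted(set(values))]


def loglog_points(pairs):
    """Convert (x, p) pairs to (log10 x, log10 p).

    The largest value has p == 0, which has no logarithm, so it is skipped.
    """
    return [(math.log10(x), math.log10(p)) for x, p in pairs if p > 0]


points = loglog_points(ccdf(page_views))
for log_x, log_p in points:
    print(f"{log_x:.2f}  {log_p:.2f}")
```

If the data followed Zipf's law, these points would fall roughly on a straight line, and the slope of that line would estimate the exponent.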

The posts that elicited the most comments are:

Apparently, people like their veridical paradoxes!  The Girl Named Florida problem attracted the attention and wrath of JeffJo, the reader who has contributed by far the most comments.  He also accounts for many of the comments on The Sleeping Beauty Problem, along with Brian Mays.  Between them, they might have posted more words on my blog than I have.

A few of my posts have attracted attention on the social network of Google employees, Google+:

I'm glad someone appreciates The Inspection Paradox.  I submitted it for publication in CHANCE magazine, but they didn't want it.  Thirty thousand readers, 909 Google employees, and I think they blew it.

One thing I have learned from this blog is that I can never predict whether an article will be popular.  One of the most technically challenging articles, "Bayes meets Fourier", apparently found an audience of people interested in Bayesian statistics and signal processing.  At the same time, some of my favorites, like The Rock Hyrax Problem and Belly Button Biodiversity, have fallen flat.  I've given up trying to predict what will hit.

I have posts coming up in the next few weeks that I am excited about, including an analysis of Internet use and religion using data from the European Social Survey.  Watch this space.

Thanks to everyone who contributed to the first million page views.  I hope you found it interesting and learned something, and I hope you'll be back for the next million!

#### 1 comment:

1. Congrats Allen!
I'm a longtime reader but I have to admit I probably haven't been doing my part to up your hit count - I mostly read your articles on Feedly, where, their UI tells me, you have 2119 readers. Are we included in your count, do you think? Or do you need to adjust your numbers by a few thousand?
Cheers,
Jason