tag:blogger.com,1999:blog-6894866515532737257.comments2015-01-27T07:54:59.796-08:00Probably Overthinking ItAllen Downeyhttps://plus.google.com/111942648516576371054noreply@blogger.comBlogger413125tag:blogger.com,1999:blog-6894866515532737257.post-62036019555430054202015-01-27T07:54:59.796-08:002015-01-27T07:54:59.796-08:00Your solution looks good. But I encourage you to resist the temptation to collapse the posterior distribution to a point estimate (like an MLE, MAP, or posterior mean). One of the big advantages of the Bayesian approach is that you get a posterior distribution, which captures everything you know/believe about the value, and not just a point estimate or interval.Allen Downeyhttp://www.blogger.com/profile/01633071333405221858noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-87584054156710126282015-01-27T07:49:30.600-08:002015-01-27T07:49:30.600-08:00I'm useless at stats, so please let me know if my answer is correct.<br /><br />The first solution I came up with is:<br />If 2 of the 10 hyraxes I capture are tagged, then the population is 5 times larger than the number I've tagged. Since I tagged 10, there would be 50 hyraxes.<br /><br />My second solution uses Python.<br />I assume there are at most 100 hyraxes as a working assumption, and see how far that takes me. The appropriate pmf is that of the hypergeometric distribution (hypergeom in scipy). Given a population M, with a sample size N = 10, and number of tagged hyraxes n = 10, what is the probability of observing k = 2 tagged hyraxes?<br />So let me create a simple program and observe the output:<br />from scipy.stats import hypergeom<br />for population in range(11,101):<br /> prob = hypergeom.pmf(k = 2, M = population, n = 10, N = 10)<br /> print(population, prob)<br /><br />The output shows a maximum probability at populations of sizes 49 and 50. They are the maximum-likelihood estimates. 
So, 50 it is, then.<br /><br />Regards,<br />Mark Carter.Max Powerhttp://www.blogger.com/profile/04470463426170671630noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-3364746769493673162015-01-08T22:57:55.516-08:002015-01-08T22:57:55.516-08:00I assume that the distribution of Franken's votes follows a Poisson distribution with lambda = 1,212,629, and that Coleman's distribution is similar. Then I sample from the two Poisson distributions 1000 times and compare the votes. The result is about a 41% chance for Coleman to win. Is there something wrong with my assumptions? Why does my result differ from yours? Thank you.张亮http://www.blogger.com/profile/11113984241425650034noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-50543946909743024222014-12-13T11:05:29.084-08:002014-12-13T11:05:29.084-08:00Jaromir, thanks for your comments -- I'm glad this problem was worth the effort!<br /><br />I didn't actually apply Bayes's theorem in my solution; I only computed the likelihoods P(E|X) and P(E|Y). The ratio of these likelihoods is the Bayes factor, K, which indicates whether the evidence favors X or Y, and how strongly.<br /><br />Allen Downeyhttp://www.blogger.com/profile/01633071333405221858noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-21332512297249923882014-12-13T10:55:23.594-08:002014-12-13T10:55:23.594-08:00Solving this problem was a really cool experience. At first I did not accept the idea that evidence consistent with a hypothesis can make the hypothesis less likely, and I agreed with gary that we need to multiply by 2 in P(E|X). I did so just because it led to the intuitive conclusion that the evidence increases the probability that Oliver was one of the two people. However, I then imagined what would happen if Oliver was AB and not O. 
Multiplying by 2 would then lead to the impossible probability of 1.2, that is, 2*1*0.6 instead of 2*1*0.01.<br /><br />Only then did it occur to me that just one sample of O really is a little too few if we take Oliver for granted. It was definitely worth being puzzled for a while.<br /><br />Just a minor technical issue: since Bayes's theorem is P(X|E) = P(X)*P(E|X) / P(E),<br />shouldn't the number 0.012 be labelled P(E) rather than P(E|Y)? Not that it makes much difference here, but it leads to some confusion as to where to plug the number into the theorem. Also, imagine the same problem for a population of only, say, 10 people. In such a situation, P(E) would not be equal to P(E|Y), and I think what we are really interested in is P(E).<br /> Jaromír Mazákhttp://www.blogger.com/profile/11939209902309250226noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-83746985887960347122014-12-05T05:52:31.168-08:002014-12-05T05:52:31.168-08:00Hi João. Thanks (again) for your solution. It looks great!Allen Downeyhttp://www.blogger.com/profile/01633071333405221858noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-4291993094199262182014-12-05T02:01:13.322-08:002014-12-05T02:01:13.322-08:00OK, I programmed it in BUGS using the Jeffreys prior :-) My solution suggests there are around 40 hyraxes for this data (I tried with a maximum of 500 and 1000 hyraxes, and the results were relatively stable).<br /><br />The code and results are here http://www.di.fc.ul.pt/~jpn/r/bugs/hyraxes.html<br /><br />Cheers,João Netohttp://www.blogger.com/profile/05560718055133816500noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-70916318109925497812014-12-05T01:02:40.651-08:002014-12-05T01:02:40.651-08:00You should try the Jeffreys prior $p(N) \propto 1/N$, which can be normalized for any given M. 
This will make it more robust to changes in M.<br /><br />Also, you could narrow the interval to [N-n+k,M] instead of [1,M].João Netohttp://www.blogger.com/profile/05560718055133816500noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-67098678678791887352014-10-10T14:41:55.787-07:002014-10-10T14:41:55.787-07:00If only authors and publishers would meet the spirit of the discipline of scholarship as well as its letter. Just make a bibliography. I have a small collection of memorable and valuable bibliographies that I've copied. Some authors divide up their bibliographies into useful segments (for example, by primary/secondary sources, periodicals, etc.). Those authors have my deepest gratitude (and respect).Edward Carneyhttp://www.blogger.com/profile/18390516990905425802noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-51907379381499695292014-10-08T19:33:31.527-07:002014-10-08T19:33:31.527-07:00Tree diagram
<br />1st, 2nd, P<br />GF, GFB, 1/2*x*1/2 = 1/4*x<br />GF, GFGF, 1/2*x*1/2*x = 1/4*x^2<br />GF, GFG, 1/2*x*(1-x)*1/2 = 1/4*(x-x^2)<br />G, GB, 1/2*(1-x)*1/2 = 1/4*(1-x)<br />G, GGF, 1/2*(1-x)*x*1/2 = 1/4*(x-x^2)<br />G, GG, 1/2*(1-x)*(1-x)*1/2 = 1/4*(1-x)^2<br />B, BB, 1/2*1/2 = 1/4<br />B, BGF, 1/2*1/2*x<br />B, BG, 1/2*1/2*(1-x)<br /><br />Sample space={GFB, GFGF, GFG, GGF, BGF}<br /><br />P(two girls) = (P(GFGF)+P(GFG)+P(GGF)) / (P(GFB)+P(GFGF)+P(GFG)+P(GGF)+P(BGF)) = (1/4*x^2+1/4*(x-x^2)+1/4*(x-x^2))/(1/4*x+1/4*x^2+1/4*(x-x^2)+1/4*(x-x^2)+1/4*x) = (2-x)/(4-x)<br /><br />When x->0 (Florida is a rare name), P=(2-0)/(4-0)=1/2<br />When x->1/2 (half of all girls are named Florida), P=(2-1/2)/(4-1/2)=3/7<br />When x->1 (every girl is named Florida), P=(2-1)/(4-1)=1/3 (reduction to Problem 2) Vasili Gavrilovhttp://www.blogger.com/profile/13021023056247394884noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-5034894005494021692014-10-07T07:29:21.970-07:002014-10-07T07:29:21.970-07:00Good point. I would love to see some kind of typographical distinction between GD endnotes that contain VIAI and the ones that just have BI (bibliographical information).Allen Downeyhttp://www.blogger.com/profile/01633071333405221858noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-4600013712175887202014-10-07T01:51:04.270-07:002014-10-07T01:51:04.270-07:00Thanks! However, I would be happy already if the authors would write only the necessary, but (for most readers) uninteresting information in the goddamn endnotes, like bibliographical or legal stuff. 
I hate to always keep a finger or a second bookmark, which must be synchronized with the primary bookmark, in the last few pages, just because the author didn't know how to fit the very interesting additional information (VIAI) or further explications in the main text.Grüner Gimpelhttp://www.blogger.com/profile/14167074815990880487noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-87762797839331274512014-10-01T07:52:53.966-07:002014-10-01T07:52:53.966-07:00Very interesting, Allen. I think you are in the right ballpark and are right about the linear relationship between the rate (speed) and time. Another way to think about it is that it's a common negative exponential growth curve when marathon time (or percentage change in speed) is on the y-axis. Doing it that way makes it clear that we are running up against a limit at some point.DShttp://www.blogger.com/profile/18277161652273613847noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-44376154112716734392014-09-16T05:19:31.335-07:002014-09-16T05:19:31.335-07:00Thanks for these comments, and for your kind words. You asked how what I presented differs from most intro stats classes. Based on the textbooks I've seen, I get the impression that many stats classes teach hypothesis testing as a cookbook process, so students learn how to perform various tests and when to use which test. I have not seen much emphasis on the sampling distribution as the basis for standard error and confidence interval (but I am sure there are examples of books and classes that do).<br /><br />About the computational approach, you suggested that students might learn how to use tools, but not how they work. 
I don't think the computational approach prevents students from learning both, and compared to the standard mathematical approaches, it provides a lot of flexibility: students can learn how to use a black box, then learn how it works (a top-down approach) or start with building blocks and assemble the black box (bottom-up).Allen Downeyhttp://www.blogger.com/profile/01633071333405221858noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-85992281170713542152014-09-16T04:45:08.994-07:002014-09-16T04:45:08.994-07:00I am pleased to discover that the approach I provided on Reddit (as username blippage) was basically sound. Phew. It's good to know that time hasn't totally withered away my reasoning ability.<br /><br />When you say "The approach I presented here is a bit different from what's presented in most introductory stats classes", how so? Isn't there only one basic way to solve this problem: namely, by understanding that the mean of the sum/difference of two independent normally distributed variables is the sum/difference of the means, and the variance is the sum of the variances?<br /><br />Also, don't you think there is a danger that approaching problems programmatically teaches students to think like engineers rather than mathematicians; that is to say, "I know that it does work, but I'm not sure why"? Having said that, an approach that is "too" mathematical can end up looking like symbols just being pushed around the page, with the underlying concepts lost in the process.<br /><br />Your Think books look really interesting, and I think I owe it to myself to read them.<br /><br />All the best, Professor. 
You're doing a great job of educating the general public about statistical ideas.Max Powerhttp://www.blogger.com/profile/04470463426170671630noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-80834503611950525192014-09-15T05:12:16.891-07:002014-09-15T05:12:16.891-07:00I think he only had a few slides, but he has links to the data and the code. Most of his presentation was a live tutorial.Allen Downeyhttp://www.blogger.com/profile/01633071333405221858noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-54905307478450521172014-09-15T05:06:57.011-07:002014-09-15T05:06:57.011-07:00Thx for this article with the links. Is Imran Malek's presentation that short? Only 10 slides?rihttp://www.blogger.com/profile/04354229352442247479noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-70647313124428826062014-08-31T17:47:08.275-07:002014-08-31T17:47:08.275-07:00I'm glad you point out this fallacious use of p-values. What bothers me, though, is that people fail to dub erroneous p-values the pretend p-values that they are. In order for a p-value, say of .05, to be an actual and not merely a nominal (computed or pretend) p-value, it's required that<br />Prob(p-value < .05; H0) ~ .05.<br /><br />With the multiple testing in the jelly bean case, say, the probability of so impressive-seeming a p-value is ~.65. 
I will look carefully at the paper you cite, but I just wanted to note this because it drives me crazy when pretend p-values aren't immediately called out for what they are.<br /><br />Thank you for bringing out the fallacy of failing to adjust p-values.<br />errorstatistics.comMAYO:ERRORSTAThttp://www.blogger.com/profile/02967648219914411407noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-20324714342431412882014-08-29T20:38:49.005-07:002014-08-29T20:38:49.005-07:00That xkcd is a perfect analogy to the Hooker paper nonsense. Thank you for clarifying it!Cigal MDhttp://www.blogger.com/profile/14389394263265656420noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-59486269738838730972014-08-29T20:31:43.171-07:002014-08-29T20:31:43.171-07:00Thank you for this simple, useful explanation. I appreciate it. Dorit Reisshttp://www.blogger.com/profile/05606807832521443462noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-45516013698174626722014-08-21T06:43:39.037-07:002014-08-21T06:43:39.037-07:00OK, I see: total probability. Thanks!Henrihttp://www.blogger.com/profile/00434803886040541009noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-11022125910069703392014-08-19T11:13:41.387-07:002014-08-19T11:13:41.387-07:00I plugged the previous values into Bayes's theorem:<br /><br />P(A|E) = P(A) P(E|A) / P(E)<br /><br />Where the denominator P(E) is<br /><br />P(A) P(E|A) + P(B) P(E|B)<br /><br />All clear?Allen Downeyhttp://www.blogger.com/profile/01633071333405221858noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-18779576131694968062014-08-16T23:50:06.950-07:002014-08-16T23:50:06.950-07:00Hi Allen.
<br />In 3), you end up with:<br />P(A|E) = 8/54 ~ 0.15.<br /><br />How do you determine that P(E) = 0.54?<br />Henrihttp://www.blogger.com/profile/00434803886040541009noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-32602316813430833482014-08-14T07:00:04.545-07:002014-08-14T07:00:04.545-07:00That's cool. Do you have your R code on GitHub or some other public repo? I think others would like to see it. Let me know and I will add a link to it.Allen Downeyhttp://www.blogger.com/profile/01633071333405221858noreply@blogger.comtag:blogger.com,1999:blog-6894866515532737257.post-87632892469646090692014-08-07T13:10:10.487-07:002014-08-07T13:10:10.487-07:00I was thinking along those lines. So, essentially, we say (in R, sorry)<br />x <- seq(from=14000,to=64400,by=700)<br />Like <- dnorm(x-20000,mean=0,sd=sd(SC1Diff))<br />Prior <- approx(SC1PDF$x, SC1PDF$y, x)<br />Post <- Prior$y*Like<br />Post <- Post /sum(Post)<br />where SC1PDF is the kernel density approximation to the data sets. I do indeed match your chart in the book. Thanks! Love the book, but I'm translating it into R as I go, rather than using the Python framework, so it's just a bit tougher. Appreciate it.Reuben Gannhttp://www.blogger.com/profile/14813246182350894222noreply@blogger.com
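
For readers following along in Python, the grid update in the R snippet above can be sketched roughly as follows. This is only an illustration under stated assumptions: the SC1PDF density and the SC1Diff data are not included in the comment, so the prior table and the `error_sd` value below are made-up stand-ins, and the helper names (`normal_pdf`, `interpolate`, `grid_posterior`) are hypothetical. Only the grid (14000 to 64400 in steps of 700) and the guess of 20000 come from the R code.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, the counterpart of R's dnorm()."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def interpolate(xs, ys, x):
    """Linear interpolation over a tabulated curve, similar to R's approx().

    Assumption: points outside the table get 0.0 here, whereas approx()
    returns NA for them by default.
    """
    if x < xs[0] or x > xs[-1]:
        return 0.0
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]


def grid_posterior(grid, prior_xs, prior_ys, guess, error_sd):
    """Multiply prior by likelihood at each grid point, then normalize."""
    prior = [interpolate(prior_xs, prior_ys, x) for x in grid]
    like = [normal_pdf(x - guess, 0.0, error_sd) for x in grid]
    post = [p * l for p, l in zip(prior, like)]
    total = sum(post)
    return [p / total for p in post]


# Same grid as seq(from=14000, to=64400, by=700) in the R code.
grid = list(range(14000, 64401, 700))

# Hypothetical stand-in for the kernel density estimate (SC1PDF):
# an unnormalized triangular curve, NOT the data from the book.
prior_xs = [10000, 30000, 70000]
prior_ys = [0.0, 1.0, 0.0]

# error_sd is a made-up value standing in for sd(SC1Diff).
posterior = grid_posterior(grid, prior_xs, prior_ys, guess=20000, error_sd=6000)
mode = grid[posterior.index(max(posterior))]
```

The result is a full posterior over the grid rather than a single number; `mode` is just one summary of it, and a mean or credible interval could be read off the same list.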