Probably Overthinking It
Allen Downey (https://plus.google.com/111942648516576371054)
Comments, updated 2015-05-20T20:59:20.068-07:00

Posted 2015-05-20T20:55:48.705-07:00
Oh man, bad news for Jon Snow. I like him. ...which I suppose should be enough to establish the low odds.
northierthanthou (http://www.blogger.com/profile/04831362921459744537)

Posted 2015-05-01T17:31:06.813-07:00
Any good statistician knows about the 4 possibilities you speak of. Hypothesis testing is full of assumptions, and these assumptions are violated every day. The problem is not with the p-value, but with the people who misuse it, and the sheer number of 'experiments' that are performed every day. Don't mess with science!
Ralph Winters (http://www.blogger.com/profile/14548913261473484508)

Posted 2015-05-01T14:03:23.973-07:00
Thank you for this clarification. I deliberately chose this wording because it is more readable than the more pedantic version, and it is equally correct if we take "apparent effect" to include cases where the test statistic is equal to or greater than what was observed. I realize that it is not completely unambiguous, but I stand by my editorial choice.
Allen Downey (http://www.blogger.com/profile/01633071333405221858)

Posted 2015-05-01T13:35:50.355-07:00
"The p-value is the probability of the apparent effect under the null hypothesis..." No, still wrong. The p-value is the probability of the observed effect or one *more extreme* under the null hypothesis. All your tables are assuming that what you know is p<.01, not that p=.01.
Richard Morey (http://www.blogger.com/profile/11319149283079163004)

Posted 2015-04-24T03:14:23.061-07:00
This analysis seems to be working on an assumption that there is a stationary stochastic process that produces these deaths, which is frankly an absurd assumption. Bayesian or frequentist, one really cannot model a process that doesn't actually exist.

It is thinking like this that brought the financial world to its knees, when applied to the behaviour of financial derivatives contracts.

At least, if you assume a non-stationary process that has a limited rate of parameter change (which is a stretch in itself), the credible intervals should flare out widely after the range of available data.
Reino Ruusu (http://www.blogger.com/profile/10454242055654173650)

Posted 2015-03-25T09:09:07.425-07:00
There's a chapter in Think Stats (2nd edition) about survival analysis, and a chapter in Think Bayes that does two-parameter Bayesian estimation.
So you can put them together!

But yes, I will write up the details when I have a chance.
Allen Downey (http://www.blogger.com/profile/01633071333405221858)

Posted 2015-03-25T08:06:24.080-07:00
Thanks a lot for this, I love it!

Would you consider writing a blog post explaining the math behind this kind of analysis? Is it covered in your book?
Matías Guzmán Naranjo (http://www.blogger.com/profile/11557127959079200429)

Posted 2015-03-11T13:28:16.418-07:00
Allen, his/her point is that the p value is "the probability of the data given chance". It is not "the probability of chance given the data".
dustin locke (http://www.blogger.com/profile/12240156576005704547)

Posted 2015-03-10T05:11:52.847-07:00
I'm not sure I understand your objection. I was using "due to chance" as a shorthand for "under the null hypothesis", since the null hypothesis is a model of random variation if there is no actual effect.

The sentence you quoted is one of four possible explanations for an apparent effect: it might be caused by random variation in the absence of a real effect.

As you said, the p-value is the probability of the apparent effect under the null hypothesis, which is the probability of the effect under (at least a model of) random chance.

Can you clarify what you are objecting to?
Allen Downey (http://www.blogger.com/profile/01633071333405221858)

Posted 2015-03-09T21:08:36.570-07:00
GamingLifer nails it.
Allen, you appear to have made *the* mistake the editors are so concerned about re p-values.
dustin locke (http://www.blogger.com/profile/12240156576005704547)

Posted 2015-03-08T18:21:51.581-07:00
"The apparent effect might be due to chance; that is, the difference might appear in a random sample, but not in the general population."

p-values don't tell you this, either. All a p-value tells you is the probability of your data (or data more extreme) given the null is true. They say nothing about whether your data is "due to chance."
GamingLifer (http://www.blogger.com/profile/02195336931104729563)

Posted 2015-03-05T11:19:51.181-08:00
I don't understand the question. It doesn't look like you are using Bayes's theorem. Your first and third expressions are equivalent.
The middle one is not (unless P(S|~D) = 1, which would be weird).
Allen Downey (http://www.blogger.com/profile/01633071333405221858)

Posted 2015-03-05T05:56:59.667-08:00
Bayes Theorem: In Bayes' theorem, if I know the value of p(symptoms/disease), let's say 0.3, is it justifiable if I take p(~symptoms/disease)=(1-p(symptoms/disease)) p(symptoms/~disease) = (1-p(symptoms/disease))?
AKANSHA PANDEY (http://www.blogger.com/profile/00763971001212693605)

Posted 2015-03-02T15:40:40.361-08:00
You're still giving undue credit to CIs as measures of precision: http://www.ejwagenmakers.com/submitted/fundamentalError.pdf and I think your recommended approach - essentially sticking with orthodox statistical inference - is unjustifiable. We know that it's fundamentally, conceptually flawed, fosters bad science, lends itself to misuse and misinterpretation even by experts, and that there is a well-founded, good science-fostering, reason and intuition-congruent alternative. It just doesn't make sense to recommend sticking with it.
Paul (http://www.blogger.com/profile/04309125585593320043)

Posted 2015-01-27T07:54:59.796-08:00
Your solution looks good. But I encourage you to resist the temptation to collapse the posterior distribution to a point estimate (like an MLE, MAP, or posterior mean).
One of the big advantages of the Bayesian approach is that you get a posterior distribution, which captures everything you know/believe about the value, and not just a point estimate or interval.
Allen Downey (http://www.blogger.com/profile/01633071333405221858)

Posted 2015-01-27T07:49:30.600-08:00
I'm useless at stats, so please let me know if my answer is correct.

The first solution I came up with is:
If in a sample of 10 I capture 2, then that means that the population is 5 times larger than the number I've tagged. Since I tagged 10, there would be 50 hyraxes.

My second solution is using Python.
I assume there'll be at most 100 hyraxes as a working assumption, and see how far that takes me. The appropriate pmf is the hypergeom distribution. Given a population M, with a sample size N = 10, and number of tagged hyraxes n = 10, what is the probability of observing k = 2 tagged hyraxes?
So let me create a simple program and observe the output:

    from scipy.stats import hypergeom
    for population in range(11,100):
        prob = hypergeom.pmf(k = 2, M = population, n = 10, N = 10)
        print(population, prob)

The output shows a maximum probability at populations of sizes 49 and 50. They're the maximum likelihoods. So, 50 it is, then.

Regards,
Mark Carter.
Max Power (http://www.blogger.com/profile/04470463426170671630)

Posted 2015-01-08T22:57:55.516-08:00
I assume that the distribution of Franken's votes follows a Poisson distribution with lambda = 1,212,629, and Coleman's distribution is similar. Then I sample from the two Poisson distributions 1000 times and compare the votes.
The result is about 41% for Coleman to win. Is there something wrong with my assumptions? Why does my result differ from yours? Thank you.
张亮 (http://www.blogger.com/profile/11113984241425650034)

Posted 2014-12-13T11:05:29.084-08:00
Jaromir, thanks for your comments -- I'm glad this problem was worth the effort!

I didn't actually apply Bayes's theorem in my solution; I only computed the likelihoods P(E|X) and P(E|Y). The ratio of these likelihoods is the Bayes factor, K, which indicates whether the evidence favors X or Y, and how strong it is.

Allen Downey (http://www.blogger.com/profile/01633071333405221858)

Posted 2014-12-13T10:55:23.594-08:00
Solving this problem was a really cool experience. I first did not accept the idea that evidence consistent with a hypothesis can make the hypothesis less likely and agreed with gary that we need to multiply by 2 in P(E/X), just because it led to the intuitive conclusion that the evidence increases the probability that Oliver was one of the two people. However, I then imagined what if Oliver was AB and not 0. Multiplying by 2 would then lead to the impossible probability of 1.2. That is 2*1*0.6 instead of 2*1*0.01.

Only then it came to me that just one sample of 0 really is a little too few should we take Oliver for granted. It was definitely worth being puzzled for a while.

Just a minor technical issue: Since Bayes's theorem is P(X/E) = P(X)*P(E/X) / P(E)
shouldn't the number 0.012 be labelled P(E) rather than P(E/Y)? Not that it makes much difference here, but it leads to some confusion as to where to plug the number in the theorem.
Also imagine the same problem for a population of only say 10 people. In such a situation, P(E) would not be equal to P(E/Y) and I think what we really are interested in is P(E).
Jaromír Mazák (http://www.blogger.com/profile/11939209902309250226)

Posted 2014-12-05T05:52:31.168-08:00
Hi João. Thanks (again) for your solution. It looks great!
Allen Downey (http://www.blogger.com/profile/01633071333405221858)

Posted 2014-12-05T02:01:13.322-08:00
ok, I programmed it in BUGS using the Jeffreys prior :-) My solution suggests there are around 40 hyraxes for this data (I tried with a maximum of 500 and 1000 hyraxes, and the results were relatively stable).

The code and results are here http://www.di.fc.ul.pt/~jpn/r/bugs/hyraxes.html

Cheers,
João Neto (http://www.blogger.com/profile/05560718055133816500)

Posted 2014-12-05T01:02:40.651-08:00
You should try the Jeffreys prior $p(N) \propto 1/N$ that can be normalized for any given M. This will make it more robust to changes of M.

Also, you could narrow the interval to [N-n+k,M] instead of [1,M].
João Neto (http://www.blogger.com/profile/05560718055133816500)

Posted 2014-10-10T14:41:55.787-07:00
If only authors and publishers would meet the spirit of the discipline of scholarship as well as its letter. Just make a bibliography.
I have a small collection of memorable and valuable bibliographies that I've copied. Some authors divide up their bibliographies into useful segments (for example, by primary/secondary sources, periodicals, etc.). Those authors have my deepest gratitude (and respect).
Edward Carney (http://www.blogger.com/profile/18390516990905425802)

Posted 2014-10-08T19:33:31.527-07:00
Tree diagram

    1st   2nd    P
    GF    GFB    1/2*x*1/2 = 1/4*x
    GF    GFGF   1/2*x*1/2*x = 1/4*x^2
    GF    GFG    1/2*x*(1-x)*1/2 = 1/4*(x-x^2)
    G     GB     1/2*(1-x)*1/2 = 1/4*(1-x)
    G     GGF    1/2*(1-x)*x*1/2 = 1/4*(x-x^2)
    G     GG     1/2*(1-x)*(1-x)*1/2 = 1/4*(1-x)^2
    B     BB     1/2*1/2 = 1/4
    B     BGF    1/2*1/2*x = 1/4*x
    B     BG     1/2*1/2*(1-x) = 1/4*(1-x)

Sample space={GFB, GFGF, GFG, GGF, BGF}

P(two girls | at least one girl named Florida)
= (P(GFGF)+P(GFG)+P(GGF)) / (P(GFB)+P(GFGF)+P(GFG)+P(GGF)+P(BGF))
= (1/4*x^2+1/4*(x-x^2)+1/4*(x-x^2)) / (1/4*x+1/4*x^2+1/4*(x-x^2)+1/4*(x-x^2)+1/4*x)
= (2-x)/(4-x)

When x->0 (Florida is a rare name) P=(2-0)/(4-0)=1/2
When x->1/2 (half of all girls are named Florida) P=(2-1/2)/(4-1/2)=3/7
When x->1 (every girl is named Florida) P=(2-1)/(4-1)=1/3 (reduction to Problem 2)
Vasili Gavrilov (http://www.blogger.com/profile/13021023056247394884)

Posted 2014-10-07T07:29:21.970-07:00
Good point. I would love to see some kind of typographical distinction between GD endnotes that contain VIAI and the ones that just have BI (bibliographical information).
Allen Downey (http://www.blogger.com/profile/01633071333405221858)
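
The (2-x)/(4-x) result in the tree-diagram comment above can be checked by brute-force enumeration. The following is only a sketch, not part of any commenter's solution: it assumes, as that comment does, that each child is a boy or a girl with probability 1/2 and that a girl is named Florida with probability x, independently for each child. The function name `p_two_girls` is made up for this illustration.

```python
from fractions import Fraction


def p_two_girls(x):
    """P(both children are girls | at least one is a girl named Florida).

    x is the assumed probability that a girl is named Florida.
    """
    half = Fraction(1, 2)
    # Per-child outcomes: GF = girl named Florida, G = other girl, B = boy.
    child = {"GF": half * x, "G": half * (1 - x), "B": half}
    num = Fraction(0)  # two girls, at least one named Florida
    den = Fraction(0)  # at least one girl named Florida
    for first, p_first in child.items():
        for second, p_second in child.items():
            if "GF" in (first, second):
                den += p_first * p_second
                if first != "B" and second != "B":
                    num += p_first * p_second
    return num / den


# Compare the enumeration with the closed form for a few values of x.
for x in (Fraction(1, 100), Fraction(1, 2), Fraction(1)):
    print(x, p_two_girls(x), (2 - x) / (4 - x))
```

Using exact fractions avoids rounding, so the enumerated value and the closed form can be compared with `==` for any rational x in (0, 1].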