Comments on Probably Overthinking It: Many rules of statistics are wrong

Yes, good point. Time series data is particularly...

2016-03-08T06:06:25.751-08:00

Yes, good point. Time series data is particularly good at producing spurious correlations.

Hello Allen I knew "Think Python" from ...

2016-03-08T05:52:10.704-08:00

Hello Allen

I knew "Think Python" from long ago, and recently I discovered the rest of your books, which are great, thank you.

I just wanted to comment on the "correlation does not imply causation" thing. As I see it, this statement usually refers to heavily autocorrelated series, this is, series with actually a few independent points. It is very easy to find spurious correlations in this kind of series, as the global warming and number of pirates example. When you have two samples of n=1000 points each, with no autocorrelation, and find a 0.9 correlation then there is almost certainly a causal link behind.

Good point. Thanks!

2016-02-12T07:46:18.944-08:00

Good point. Thanks!

Alright, the 'old' mantra wants to warn of...

2016-02-12T05:39:40.970-08:00

Alright, the 'old' mantra wants to warn of the 'trivial' case: Correlation between A and B does not imply A causes B directly, or vice versa"

Your new version is, I believe: 'Correlation between A and B implies *something* is causing it'.

Both true, but I believe it's very important to include "directly" and "something" in the respective versions. Just saying "correlation is evidence of causation." is prone to be misinterpreted, just as the simplified "correlation does not imply causation".

Makes me wonder how the clearly coincidental corre...

2016-01-19T00:14:10.670-08:00

Makes me wonder how the clearly coincidental correlations mentioned were found. If I gather all of the data I can find and am able to pull some spurious relationships out, what hypothesis am I testing? Did I do many, many tests to get a hit? How relevant is the demonstrable existence of pure coincidence to the interpretation of a well designed experiment?

Sometimes I feel like people know just enough statistics to be a little afraid of it, so they take the hard line textbook interpretation.

Thanks for the post, and the response. I was not aware of the opium/Everest correlation. Definitely going to use that in my science literacy class. This stuff boggles my mind.

Regarding p-values, I'd recommend reading Cosm...

2015-12-14T16:15:35.122-08:00

Regarding p-values, I'd recommend reading Cosma Shalizi's blog post here: http://bactra.org/weblog/1111.html

If I'm interpreting him correctly, he would disagree with you that you can say much about H given an adequately small p-value. However, he would also say that this implies that the p-value is a very limited measure since, among other reasons, p-values tend to shrink exponentially fast as the sample size grows.

Hello Allen, Long-time reader, first-time poster. ...

2015-12-11T08:31:00.226-08:00

Hello Allen,
Long-time reader, first-time poster.

A couple of comments:
1) Correlation / causation. Even though this may sound pedantic, I think semantics make a difference here. The word "imply" is often used in the sense of "logically implying". When used in this sense, it is in fact true that the the belief that correlation implies causation is the logical fallacy of affirming the consequent. That being said, if you talk about "evidence" (and not "implication") of correlation in favor of causation you are correct. I don't know how most people typically interpret "imply" - in the logical sense, or in the sense of shifting evidence. Maybe you like this quote from XKCD: Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing "look over there."

2) Regression, matching, (and weighting) all have the same underlying causal assumptions, namely "ignorability" (sometimes also called "selection on observables" or "unconfoundedness"). In fact, Angrist and Pischke in their book "Mostly harmless econometrics" formally prove that all three estimators are in the same class. You can re-express regression as a particular weighting scheme, and you can do the same with matching. I am not sure how widespread the belief is that you cite - if you were to go to a conference like Atlantic Causal Inference Conference, I would think that all participants would know that there is nothing magical about matching, and that all these methods share the same underlying assumption. There are practical advantages and disadvantages to all these methods though.

3) Regarding reversing regressions - this is not a theoretically sound way to determine causal direction (or provide evidence for one or the other regression direction). Judea Pearl proved I think in the 80s that the models that you are suggesting are all in the same Markov equivalence class, and that parameters yielded by those models cannot be used to distinguish which one might be the true causal model. Apologies for the self-promotion but my paper on reversing arrows in mediation models also shows this point. That being said, if you are willing to make certain untestable assumptions about distributions of disturbance terms, you can use methods that you suggest (reversing regressions) to determine causal direction. The work of Bernhard Schoelkopf is important in this domain. Unfortunately, the assumptions needed to make these methods work will by definition always be untestable, and thus be subject to debate.

All the best,
Felix

Alex, I think we are agreeing. If all you know is...

2015-12-10T07:49:49.272-08:00

Alex, I think we are agreeing. If all you know is the p-value, the conclusions you can reach about H are pretty weak, and qualitative, even with my additional assumptions. That's why I say that traditional NHST is mostly useless.

But if the p-value is small, you can usually conclude that the observed effect is probably not due to random sampling, and you can turn your attention to other possible sources of error.

I don't object to making additional assumption...

2015-12-10T07:38:14.378-08:00

I don't object to making additional assumptions so that you can say something about H. If you want to compare hypotheses, then yes go ahead and compare them (with Bayes Factor or likelihood ratios, for god's sakes, something!).

You can 'say' or decide what you want, but what is the number that backs up such a statement? I would argue not the p-value. If there are tools that do exactly what you want (to compare H0 to H, for example), why not use them?

Using the p-value does not stand up to even basic additional scrutiny. For example, you get a p < .05 and you say H is 'more likely'. Then someone (a reviewer, a skeptic, an interested friend) asks, well how much more likely? 1.2 times, 20 times? How do you respond to that? Seems vital to me.

It sounds like you are agreeing with the rule that...

2015-12-10T07:23:00.811-08:00

It sounds like you are agreeing with the rule that a small p-value does not allow you to say anything at all about H, and therefore that hypothesis testing is completely useless.

I don't love NHST, but I am a slightly bigger fan than you. In this previous article, I explain why:

http://allendowney.blogspot.com/2015/05/hypothesis-testing-is-only-mostly.html

It's true that you have to make some additional assumptions in order to say anything about H, and it sounds like you object to that.

But the assumptions are very weak, and nearly always true in practice. So if you are trying to do something practical, like guide decision-making under uncertainty, why would you not accept reasonable assumptions? Especially when the alternative is to provide no guidance whatsover?

"Assuming that they are more likely under H (...

2015-12-10T07:07:21.873-08:00

"Assuming that they are more likely under H (which is almost always the case), you can conclude that the data are evidence in favor of H and against H0."

So you don't accept a rule because you choose to assume something else? Not much of an argument. And also unsubstantiated, as there are plenty of examples to the contrary. I'd say generally, merely a significant p-value in social science research where data are noisy, analysis is likely p-hacked or a garden of forking paths, etc. provides fairly weak evidence against H0; I strongly disagree that you can comment directly on H from the p-value without lifting a finger on modeling H. Please see Wagenmakers 2007 p. 792-793, where p = 0.05 can even indicate that H0 is likely to be true, or for another example Nickerson 2000, p. 249-251. There are many criticisms out there. I'd also highly recommend Schmidt and Hunter paper below as general overview.

The main thing is that you are wanting a p-value to be some kind of likelihood ratio or Bayes Factor, which it is not. A p-value is completely one-sided and only concerns the probability of the data under H0, not even the probability of H0. Overall, you are disagreeing with the interpretation of p-values by mis-interpreting them even more than the mess that brings about the current reproducibility crisis, Ionnadis' "most published research is false", etc.

Ioannidis, J. P. (2005). Why most published research findings are false. Chance, 18(4), 40-47. Accessed at:
http://robotics.cs.tamu.edu/RSS2015NegativeResults/pmed.0020124.pdf

Nickerson, R. S. (2000). Null hypothesis significance testing: a review of an old and continuing controversy. Psychological methods, 5(2), 241.
Accessed at: http://psych.colorado.edu/~willcutt/pdfs/Nickerson_2000.pdf

Schmidt, F. L., & Hunter, J. E. (1997). Eight common but false objections to the discontinuation of signiﬁcance testing in the analysis of research data. What if there were no signiﬁcance tests, 37-64.
Accessed at: http://www.phil.vt.edu/dmayo/personal_website/Schmidt_Hunter_Eight_Common_But_False_Objections.pdf

Wagenmakers, E. J. (2007). A practical solution to the pervasive problems ofp values. Psychonomic bulletin & review, 14(5), 779-804. Accessed at:
http://www.ejwagenmakers.com/2007/pValueProblems.pdf

"Quoting rules is not an argument." Exce...

2015-12-10T04:13:45.320-08:00

"Quoting rules is not an argument." Excellent point (especially re. the correlation/causation maxim you mention earlier. This speaks to a larger trend in citing uncertainty as a means of rejecting any and all evidence out there. Great post.

You are right, I should have included "just p...

2015-12-08T11:52:58.578-08:00

You are right, I should have included "just plain coincidence" on the list of explanations. Thanks for the comment and the link.

I bet the rules are quoted and interpreted in such...

2015-12-08T11:27:23.875-08:00

I bet the rules are quoted and interpreted in such an extreme way because people who have learned a little statistics are feeling smug about it, because there really is a naive tendency to make mistakes that the rules are designed to point out.

There is a funny site that you have probably seen where Tyler Vigen shows strong correlation that is coincidence. You could argue that the correlation is evidence of causation, but that would require a definition of evidence a bit more weak than I think most people would assume.

My favorite is the correlation between the production of opium in Afghanistan and a picture of Mount Everest.

https://twitter.com/tylervigen/status/603204482856591360

In this case, none of the relationships you mentioned are likely to obtain. "A might cause B, B might cause A, or any number of other factors, C, might cause both A and B." Instead, in this case C caused A and D caused B, but they still look similar on a plot.