Comments on Probably Overthinking It: Repeated tests: how bad can it be?

Looks good. Thanks for the link!

2014-05-19T08:47:34.755-07:00

Looks good. Thanks for the link!

Allen, I so liked your take on how "legitima...

2014-05-19T08:25:05.538-07:00

Allen,

I so liked your take on how "legitimate" it is to do repeated A/B tests that I quoted you in an article of mine: http://blog.analytics-toolkit.com/2014/why-every-internet-marketer-should-be-a-statistician/ I argue that online marketing should be much more science-based and I introduce some statistical concepts. Your quote was perfect for my "arbitrary stopping" part, thanks!

Sorry for the slow reply -- I missed your question...

2013-11-28T06:15:11.427-08:00

Sorry for the slow reply -- I missed your question. The smallest case I can think of with an even DoF is A/B/C testing with three possible outcomes (for example, a user might click on a pop-up ad, close it, or ignore it). Then there are 9 cells in the table and one constraint, so DoF=8. More generally, the number of test conditions and the number of outcomes must both be odd.
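
The parity argument in this reply can be checked with a short sketch. It assumes the counting convention used here (DoF = number of cells minus one constraint); the function name `cells_minus_one_dof` is made up for illustration and is not from the post.

```python
def cells_minus_one_dof(conditions, outcomes):
    """DoF under the convention in the reply above: cells minus one constraint."""
    return conditions * outcomes - 1


# List the table shapes (2x2 up to 5x5) that give an even DoF under this convention.
even_shapes = [
    (c, o)
    for c in range(2, 6)
    for o in range(2, 6)
    if cells_minus_one_dof(c, o) % 2 == 0
]

print(cells_minus_one_dof(2, 2))  # A/B test with yes/no outcomes
print(cells_minus_one_dof(3, 3))  # A/B/C test with three outcomes
print(even_shapes)
```

Under this convention the DoF is even exactly when the product of conditions and outcomes is odd, which requires both numbers to be odd.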

I appreciate the dialogue about degrees of freedom...

2013-09-26T09:51:28.173-07:00

I appreciate the dialogue about degrees of freedom prompted by Unknown's comment. I can't remember where I picked it up, but I always thought that for online A/B tests, the number of DoF was determined by the number of variants. In other words, an A/B test would have a DoF of 1, A/B/C would have a DoF of 2, etc. After reading this post, I went back through my stats textbook and realized that you're right. While I can't cite any resources off the top of my head, I'm all but positive that I'd read "one DoF per variant" in a marketing forum or blog post somewhere. I'm glad to realize the error, and I'm now updating the tools that I've built.

BTW, there is A LOT of misinformation about statistics & optimization in the marketing community. Almost every A/B testing tool available encourages letting your tests run until confidence is reached. Many of the prominent tools for MVT encourage the Taguchi method for speeding up test results. I'm sure this isn't limited to the world of online marketing, but as I learn more about statistics I'm shocked by the amount of bad math offered by otherwise reputable companies. I'm preaching against this within my company (which is why I've built my own tools to begin with).

This leads to a question. For online A/B/n tests, is there ever a circumstance where you'll have an even DoF? In other words, if version A gets no clicks, would there only be 3 outcomes (no for ver A, yes/no for ver B), leading to DoF of 2? Or would you still count 4 possible outcomes, leading to DoF of 3?

Thanks for the informative post!
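
The "run until confidence is reached" practice mentioned in this comment can be illustrated with a rough simulation. This is a sketch under stated assumptions, not the method from the post: both variants share the same true conversion rate (so every "significant" result is a false positive), the test is a pooled two-proportion z-test, and the names `z_test_p_value` and `simulate`, along with all parameter values, are hypothetical.

```python
import math
import random


def z_test_p_value(hits_a, hits_b, n):
    """Two-sided p-value for a difference in proportions, n users per arm."""
    pooled = (hits_a + hits_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation observed yet, so nothing to test
    se = math.sqrt(2 * pooled * (1 - pooled) / n)  # std. error of the difference
    z = (hits_a - hits_b) / (n * se)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(trials=500, looks=20, batch=50, rate=0.1, alpha=0.05, seed=1):
    """Return (fraction significant at ANY look, fraction significant at the last look)."""
    rng = random.Random(seed)
    any_look = 0
    last_look = 0
    for _ in range(trials):
        hits_a = hits_b = n = 0
        ever_significant = False
        for _ in range(looks):
            # Both arms draw from the same rate: the null hypothesis is true.
            hits_a += sum(rng.random() < rate for _ in range(batch))
            hits_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            p = z_test_p_value(hits_a, hits_b, n)
            ever_significant = ever_significant or p < alpha
        any_look += ever_significant
        last_look += p < alpha
    return any_look / trials, last_look / trials


peeking_rate, fixed_rate = simulate()
print("stop at first significant look:", peeking_rate)
print("single test at the end:", fixed_rate)
```

Because the final look is one of the looks, the "any look" rate can never be lower than the fixed-horizon rate; how much higher it is depends on the number of looks and the other assumed parameters.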

I think the argument I made above is correct. Ple...

2013-08-05T07:22:42.147-07:00

I think the argument I made above is correct. Please see the update I just added with an additional figure.

The d.f. for chi square is 1, not 3, as the null h...

2013-07-26T08:28:06.922-07:00

The d.f. for chi square is 1, not 3, as the null hypothesis of independence adds 2 more constraints. For IxJ 2-way tables, df for independence = (I-1)(J-1). See any stat text, e.g. A. Agresti, 2002, Categorical Data Analysis, 2nd Ed, Wiley, p. 79.
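
As a minimal sketch of the formula this comment cites, the following computes the degrees of freedom and the Pearson chi-square statistic for a test of independence on a two-way table. The function names and the example counts are invented for illustration; they are not data from the post.

```python
def independence_dof(table):
    """Degrees of freedom for a test of independence: (I-1)(J-1)."""
    rows = len(table)
    cols = len(table[0])
    return (rows - 1) * (cols - 1)


def chi_square_statistic(table):
    """Pearson chi-square statistic with expected counts from the margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical A/B test: rows are variants, columns are (clicked, did not click).
ab_table = [[10, 90],
            [20, 80]]

print(independence_dof(ab_table))      # 2x2 table -> 1
print(chi_square_statistic(ab_table))
```

For a 2x2 table this gives 1 degree of freedom, matching the comment; a 3x3 table would give 4.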

Great post. I love the visualisations. This also...

2011-10-06T16:00:52.438-07:00

Great post. I love the visualisations.

This also reminds me I need to finish that post on multi-armed bandit algorithms. Partly because they work, partly because of the name, but mainly because "Originally considered by Allied scientists in World War II, it proved so intractable that it was proposed the problem be dropped over Germany so that German scientists could also waste their time on it."

Also, thanks for saying such nice things about my post.