Monday, February 25, 2008

Top 10 Commandments of Statistical Inference: #9

OK, so in the last post we discussed the first rule (number 10 on the list) of statistical inference: Thou shalt not infer causal relationships from statistical significance. Number 9 is:

Thou shalt not apply large sample approximation in vain.

This is a trickier concept. For the most part, the larger the sample, the more closely it represents the population. Right? Most statistical tests build on that idea, and as a result, the larger your sample, the “easier” it becomes to find statistical significance. Even a first-year stats student can point this out. Just look at the t-distribution table in the back of any statistics textbook and watch what happens to the critical t-value needed for significance as the degrees of freedom increase. It goes down….
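To see the t-table effect without flipping to the back of a textbook, here is a minimal sketch that recomputes the two-sided critical t-value at the .05 level for a few degrees of freedom. The helper names (t_pdf, two_sided_p, critical_t) are my own for illustration, and the numerical integration and bisection are just one convenient way to get the numbers using only the standard library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1.0 - 2.0 * area_0_to_t

def critical_t(df, alpha=0.05):
    """Smallest |t| whose two-sided p-value is <= alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > alpha:
            lo = mid  # not yet significant, need a larger t
        else:
            hi = mid  # already significant, try a smaller t
    return hi

# Print the critical value for increasingly large degrees of freedom.
for df in (5, 10, 30, 100, 1000):
    print(df, round(critical_t(df), 3))
```

The printed values fall from roughly 2.57 at 5 degrees of freedom toward roughly 1.96 at 1000, which is the pattern the table in the back of the book shows.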

In fact, I remember one of my interview questions for my first job was:

You have one sample with a correlation of .30 that is not statistically significant and another sample with a correlation of .28 that is. Why would the smaller value be significant?

Because significance has little to do with the strength of a relationship, and a larger sample size can help “find significance.”
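The interview answer can be checked with a quick sketch. The question never gave sample sizes, so the n = 30 and n = 60 below are assumptions I picked for illustration, and the helper names are hypothetical. The test itself is the usual one for a Pearson correlation against zero: t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)

def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, given sample correlation r and size n."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return two_sided_p(t, df)

# Assumed sample sizes; the original question did not state them.
p_small = correlation_p_value(0.30, 30)  # stronger correlation, smaller sample
p_large = correlation_p_value(0.28, 60)  # weaker correlation, larger sample

print("r = .30, n = 30: significant at .05?", p_small < 0.05)
print("r = .28, n = 60: significant at .05?", p_large < 0.05)
```

Under these assumed sample sizes, the .30 correlation misses the .05 cutoff while the .28 correlation clears it, even though the second relationship is slightly weaker.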

In other words, don’t use large sample sizes just to find significance. It is important to choose a sample size that is appropriate for the statistical model you are using. Each model is different, and if you don’t know the assumptions of your model, you should find out what a large sample size does to its results. In short, know thy model!
