Monday, March 10, 2008

Top 10 Commandments of Statistical Inference: #4

Now we are really cooking with the Ten Commandments of Statistical Inference. A long time ago, I wrote an entry about how one should really look at can vs. should. The 4th commandment speaks directly to that point:

Thou Shalt honor the assumptions of thy model!

From the engineer who chooses to calculate a performance index on an unstable process to the marketer who uses an independent-samples t-test when their samples are correlated, I find this to be the most commonly broken commandment. Unfortunately, I also think it is one of the most dangerous to break of any we have talked about in this series. All models have assumptions, and it is important to make sure that you satisfy them. Otherwise, your results are suspect at best and downright worthless most of the time. Usually, at this point I get the argument “but mathematically, I can calculate it.” Sure, mathematically you CAN calculate anything. But theoretically, should you?
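To make the marketer example concrete, here is a minimal sketch in Python. The numbers are made up purely for illustration (a hypothetical before/after measurement on the same eight customers), and the function names are my own, not from any package. It computes the t statistic two ways: once pretending the two samples are independent, and once honoring the pairing.

```python
import math
import statistics

# Hypothetical data: the same eight customers measured before and after
# a campaign. Because each pair comes from one customer, the two samples
# are correlated.
before = [12.1, 14.3, 9.8, 16.0, 11.5, 13.2, 15.1, 10.4]
after = [12.9, 15.0, 10.1, 16.9, 12.0, 14.1, 15.6, 11.3]


def independent_t(x, y):
    """t statistic that ASSUMES independent samples (Welch form)."""
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return (statistics.mean(y) - statistics.mean(x)) / se


def paired_t(x, y):
    """t statistic on the within-customer differences (paired form)."""
    diffs = [b - a for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


t_ind = independent_t(before, after)
t_pair = paired_t(before, after)
print(f"independent-samples t: {t_ind:.2f}")
print(f"paired t:              {t_pair:.2f}")
```

Both calculations run without complaint, which is exactly the problem. On this made-up data the two statistics differ by roughly an order of magnitude, so which assumption you honor decides the conclusion you draw.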

So, why is this so common? Because people often misunderstand what the assumptions are in the first place. Compound that with the many software programs out there that make it easier and more user-friendly to calculate results. Now, don’t get me wrong, this is not a bad thing. I myself do not want to go back to hand calculations or Excel formulas. However, if you have never seen the formula and never spent the time understanding the assumptions, then you may not be able to tell whether what you are doing is correct. Sure, plug in numbers and you will get an answer, but is it right?

How important is this to get right? Let’s put it this way: if a doctor diagnoses you wrong, but does everything else right according to his diagnosis, is he right? No, there is no way we would let him get away with it. So, why do we let it pass in statistics?
