Wednesday, March 19, 2008

Top 10 Commandments of Statistical Inference: #3

We are currently working through the 10 commandments of Statistical Inference. The 3rd commandment is:

Thou shalt not make statistical inference in the absence of a model

The 4th commandment was to honor the assumptions of your model, and we discussed why that is important. However, some people go even further down the road of insanity and not only “misplace” the assumptions, but misplace the model itself. Any situation in which one wishes to make a statistical inference calls for the use of a model. This ensures that you are “following the rules” of the prescribed model. For some reason I am amazed at the backlash this sometimes gets: the “why can’t you just give me an answer” or “why do you need to make it so complicated” complaints. In our society of “quick hit answers,” there is often this thought that there is no time to set up the proper model.

I used to work for a person who said “it is better to ask forgiveness than permission” and forced the collection and analysis of data beyond our ability to model the data correctly. What happened? Well, sure, we got an answer, and he went on his merry little way…only to have to come back and redo the exact same experiment because the results of the first one proved to be invalid. The result was lost time, money, and resources. If the design had been set up correctly in the beginning, it would have taken 3 days to run. He wanted it in 1. He got it in 1, and then spent months trying to backtrack to get to the answer he would have gotten in 3 days…

So, the next time someone tells you, “I can model it, and it will take x amount of time,” you have every right to ask why, but make sure your push for a faster initial result doesn’t cause long-term problems.

Monday, March 10, 2008

Top 10 Commandments of Statistical Inference: #4

Now, we are really cooking with the Ten Commandments of Statistical Inference. A long time ago, I wrote an entry that talked about how one should really look at can vs. should. The 4th commandment really speaks to that point:

Thou Shalt honor the assumptions of thy model!

From the engineer who chooses to calculate a performance index on an unstable process to the marketer who uses an independent-samples t-test when their samples are correlated, I find this to be the most commonly broken commandment. Unfortunately, I also think it is one of the most dangerous that we have talked about in this series. All models have assumptions, and it is important to make sure that you satisfy these assumptions. Otherwise, your results are suspect at best and downright worthless most of the time. Usually, at this point I get the argument “but mathematically, I can calculate it.” Sure, mathematically you CAN calculate anything. But theoretically, should you?
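To make the marketer example concrete, here is a minimal sketch of how much the choice matters. The data are made up, and the two helper functions are my own illustration using only Python's standard library, not anything from a particular stats package. Both statistics can be "calculated" on the same numbers, but only the paired version respects the fact that each customer was measured twice.

```python
import math
import statistics

def independent_t(a, b):
    """Welch-style t statistic that treats the two samples as unrelated."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se

def paired_t(a, b):
    """Paired t statistic computed on the within-unit differences."""
    diffs = [y - x for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Hypothetical data: the same six customers measured before and after a campaign.
before = [10, 12, 14, 16, 18, 20]
after = [11, 13, 15, 17.5, 19, 21.5]

t_ind = independent_t(before, after)
t_pair = paired_t(before, after)
print(f"independent t = {t_ind:.2f}, paired t = {t_pair:.2f}")
```

With these made-up numbers the independent-samples statistic is small because the customer-to-customer spread swamps the change, while the paired statistic is large because every customer moved in the same direction. Same data, very different conclusions, and only one of them matches how the data were collected.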

So, why is this so common? Because there is often a misunderstanding of what the assumptions are in the first place. Compound that with the advent of many software programs out there that make it easier and more user-friendly to calculate results. Now, don’t get me wrong, this is not a bad thing. I myself do not want to go back to hand calculations or Excel formulas. However, if you have never seen the formula and never spent the time understanding the assumptions, then you may not have a grasp of whether what you are doing is correct. Sure, plug in numbers and you will get an answer, but is it right?

How important is this to get right? Let’s put it this way: if a doctor diagnoses you wrong but does everything else right according to his diagnosis, is he right? No, there is no way we would let him get away with it. So, why do we let it pass in statistics?

Friday, March 7, 2008

Top 10 Commandments of Statistical Inference: #5

Well, I am glad the last post was well received, so now I think it is safe for me to start counting down the last 5 commandments of Statistical Inference. Number five:

Thou shalt not adulterate thy model to obtain statistical significance.

Now, when you first look at this you think…Adulterate? But it does make sense. It comes down to some of our previous discussions, and that is to make sure you do not knowingly (or unknowingly) allow extraneous variables or inferior ingredients into your model. Make sure you take steps to control for the things you can control for. Sometimes, it is as easy as excluding certain people or certain parameters. Other times you may have to really think practically about what CAN affect your model and control for that. In the marketing world it might be to control the time of day when your sends go out. In an experimental design in the lab you may want to run tests at separate times and add a blocking variable. Use common sense and your own knowledge of the situation. This is not a “math” problem per se. Of course there are some techniques available to help, but this typically requires you to sit down, map out your process, brainstorm about all the things that can affect your design, and control the heck out of them when you can. This is always my favorite part of designing experiments. It’s when you can be creative. Next post we will discuss how to understand and fit your situation into a needed model (rather than the other way around!).
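For anyone wondering what "add a blocking variable" looks like in practice, here is a rough sketch of a randomized block assignment for the email example. Everything in it is hypothetical: the function name, the unit labels, the two blocks ("morning" and "evening"), and the two treatments ("A" and "B"). The idea is simply to randomize within each block so that time of day cannot masquerade as a treatment effect.

```python
import random
from collections import defaultdict

def randomize_within_blocks(units, treatments, seed=0):
    """Assign treatments at random, balanced within each block.

    `units` is a list of (unit_id, block) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    by_block = defaultdict(list)
    for unit_id, block in units:
        by_block[block].append(unit_id)

    assignment = {}
    for block, members in by_block.items():
        rng.shuffle(members)  # random order within the block
        for i, unit_id in enumerate(members):
            # Cycle through treatments so each block gets a balanced split.
            assignment[unit_id] = (block, treatments[i % len(treatments)])
    return assignment

# Hypothetical example: 8 email sends, blocked by time of day.
units = [(f"send_{i}", "morning" if i < 4 else "evening") for i in range(8)]
plan = randomize_within_blocks(units, ["A", "B"])
for unit_id, (block, treatment) in sorted(plan.items()):
    print(unit_id, block, treatment)
```

Because each block receives the same number of A and B sends, any morning-versus-evening difference affects both treatments equally, and the block label can then go into the analysis as its own variable.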

Wednesday, March 5, 2008

Statistical Humor Gone Awry!

So, we have gone through the first 5 commandments of Statistical Inference. I am going to move to the next 5 in the next few posts, but before I do, it has come to my attention that by presenting these, I have gotten myself into a little bit of a dilemma. My attempt at statistics humor may have gone awry!

I get the feeling that people are wondering when it is OK to do statistical analysis, or whether it is EVER correct to do statistical analysis. Worse yet, they may believe I think statistics can only be done in a laboratory with white coats!

First and foremost, let me assure you I do believe wholeheartedly in statistical analysis ;-) If I didn’t, well, I wouldn’t have worked for the last decade plus in the arena. Secondly, I am not the anti-layman stats guy in the ivory tower throwing pennies and guessing the probability of one hitting someone. Far from it: I am actually a trained psychologist, not a statistics major, so I wholeheartedly believe that “statistician” can encompass people who are not “classically” trained but apply statistics to solve problems. Lastly, the 10 commandments are a tongue-in-cheek attempt at humor that some of us stats geeks really enjoy. Sadly, when I first received “the list” from one of my co-workers, all the stats geeks I was working with at the time stopped whatever they were doing and ran the Xerox copier out of paper.

Bottom line, I think anyone can perform valid statistical testing, but it must be valid, and it must follow the rules of what the models were designed for. If you can do this, you will have a wonderful design and results. If not, you are going to find yourself in a mess and you won’t really know why! If you need help, find someone who does understand all the nuances; that’s why they are there!

Friday, February 29, 2008

Top 10 Commandments of Statistical Inference: #6

So, the 6th commandment of Statistical Inference is:

Thou shalt not covet thy colleague’s data.

This sounds like a pretty easy one, but I am amazed to see even now people who view other people’s data and “want what they have.” Why is this so bad? This can drive people to reach beyond what they should in an effort to find “statistical significance” to keep up with the Joneses. How do they do this? Maybe when the numbers don’t match up they use a less stringent model. Perhaps they refuse to use an adjustment when the situation calls for it. This leads them to travel in and out of the grey that is statistics. These tactics are egregious enough, but then there are those who understand statistics even less and wonder why their data doesn’t look like someone else’s. Many times I have been questioned, and sometimes almost blamed or considered a bad statistician, when the data doesn’t look like someone else’s. On a few occasions I have been prodded to make it look more favorable. I refused, much to the person’s chagrin, as they did not understand that much more than just data integrity was on the line.
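As one illustration of what "refusing to use an adjustment" can do, here is a small sketch using the Bonferroni correction for multiple comparisons. That particular adjustment is my choice for the example, not the only one the situation might call for, and the p-values are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (raw p, adjusted p, significant?) for each test.

    The Bonferroni adjustment multiplies each p-value by the number of
    tests, capped at 1.0.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p, adj, adj <= alpha) for p, adj in zip(p_values, adjusted)]

# Hypothetical p-values from four comparisons run on the same data set.
raw_p = [0.004, 0.03, 0.04, 0.20]
results = bonferroni(raw_p)

for p, adj, significant in results:
    print(f"raw p = {p:.3f}  adjusted p = {adj:.3f}  significant: {significant}")
```

Three of these made-up results clear the 0.05 bar if the adjustment is skipped, but only one does once it is applied. That gap is exactly where the temptation to "look the other way" lives.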

Bottom line, data is data. It can be made into anything that you want, but only those who truly understand it and use it correctly will learn and help improve the situation. Wishing to have the results of others is not an issue in itself if it helps you drive improvement towards that goal. It’s when it drives you to look the other way in the analysis that it causes problems.