Wednesday, October 3, 2007

Data from Multiple Sources

I was just working on a presentation for Innotech next week, and I got to thinking about something that has always concerned me. Technology has afforded us the ability to capture droves of data, and it has also given us a lot more user-friendly software with which to analyze that data. Some of this is really good for us, and some of it is bad for us. In the wrong hands, bad data and bad assumptions can bring a company to its knees pretty quickly. In order to choose the correct model, one must know what the model's assumptions are. I have seen many analyses completed on data that do not meet the assumptions of the model being used. Unfortunately, the software now available compounds the problem. In the good old days, one really had to at least understand the make-up of the data in order to run an analysis. That wouldn't stop anyone from doing the wrong thing, but it was a decent barrier. Now, one can just push and pull data through systems without really knowing whether what they are doing is right or not. Some software systems have built in barriers, but these still do not stop some weird things (someone once asked me why a software system would not allow him to do a multiple regression with over 500 variables!). Am I saying everyone needs to be a statistician? Well, no... But what I am saying is, if you don't know some of the basics, beware of your results.
