Thursday, September 20, 2007

A Kilo no Longer a Kilo

http://www.cnn.com/2007/TECH/science/09/12/shrinking.kilogram.ap/index.html

I read this article this morning, and my first thought was, wow, what are the drug kingpins going to do now? They're getting shortchanged!

Actually, my mind immediately went to the importance of having good artifacts to measure and control against. It also occurred to me that people who, unlike me, don't muse about weird stuff (i.e., regular people) may think... so what? That's what they get for using the metric system! So this piece of metal is losing weight (and perhaps, if a piece of metal can do it, so can I!).

Well, it is much more complicated than that. Most people know that there are universal standards throughout our world, such as the kilogram. Because of variance, a kilo is never exactly a kilo, so we need to make sure that we all trace back to the same artifact. Here in the U.S. we tend to use NIST (http://www.nist.gov/), the National Institute of Standards and Technology. The idea is that every measurement is traceable back to a standard, and although no two items ever measure or weigh exactly the same (due to variance), they fall within statistically calculated specifications. Now, if the artifact itself is degrading, imagine how hard it is to hit a moving target! 50 micrograms sounds small, but its impact depends on the distribution, and it could wreak havoc. Maybe we will just have to go back to "stones."
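For scale, here is the 50 micrograms expressed relative to the 1 kg artifact. This is just unit arithmetic added for illustration (1 µg = 10^-9 kg); whether a change this size matters for a given process depends on how tight its specification is compared with its spread.

```latex
\frac{50\ \mu\text{g}}{1\ \text{kg}}
  = \frac{50 \times 10^{-9}\ \text{kg}}{1\ \text{kg}}
  = 5 \times 10^{-8}
  = 50\ \text{parts per billion}
```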

Wednesday, September 19, 2007

Talk like a Pirate

Ahoy Mateys,

Today is the official Talk Like a Pirate Day. For those of you who are interested, check out what it's all about at http://www.talklikeapirate.com/

I was thinking about writing this post like a pirate, but when I tried, I realized how brutal that would be. I would like you to check out Seth Godin's blog today (http://sethgodin.typepad.com/). He makes some good points; unfortunately, he falls a little short, so I want to clear things up.

He states that you should focus on your real distribution when looking at web traffic (or visits to McDonald's). This is true. He also says you should look not only at the mean but at the median as well. Again, a salient point. However, in the example he gives, the median can also fail to tell the full story. The median is the middle value. So, let's say you had 4 visitors. If these visitors came to the site 1, 1, 9, and 10 times, your mean would be 5.25 and your median would be 5. Not much of a difference... However, your mode would be 1! This may be important: it tells me that although I sometimes get a high number of visits from one person, my most frequently occurring count is 1.
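A quick way to check the four-visitor numbers is Python's standard `statistics` module. The visit counts are the hypothetical ones from the paragraph above; nothing here comes from Godin's post.

```python
from statistics import mean, median, mode

# Number of times each of the four hypothetical visitors came to the site.
visits = [1, 1, 9, 10]

print("mean:  ", mean(visits))    # (1 + 1 + 9 + 10) / 4
print("median:", median(visits))  # average of the two middle values, (1 + 9) / 2
print("mode:  ", mode(visits))    # most frequently occurring value
```

This prints a mean of 5.25, a median of 5.0, and a mode of 1. If several values tie for most frequent, `statistics.multimode` returns all of them rather than just the first.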

Of course, different scenarios call for different measures of central tendency. So yes, he is correct: make sure you measure more than the mean. But if you are going to go in the right direction, make sure you go all the way!

Tuesday, September 18, 2007

The Roe Effect: How NOT to perform a Study

Wow. Please read http://www.opinionjournal.com/extra/?id=110005277.

This is an awful study and a great example of how one can twist numbers into a "good story." There are so many things wrong with it, I do not know where to begin, so I am only going to point out a few... In fact, this would probably be a study I would hand to students to dissect and tell me what they think is wrong. Its issues are typical of bad statistical analysis. You could point out the mathematical flaws for eons, but the very construct is wrong: they assume a cause-and-effect relationship.

1) They mention that children "tend" to absorb the ideals of their parents... yet they analyze it as if it were a cause-and-effect relationship, i.e., that children WILL absorb the ideals of their parents.
2) N=1. "Hey, I know this guy who thought everyone was like their parents, so they MUST all share the same political views."
3) Wow, they really cleaned up the issue that always arises of getting the right demographics by asking people if they "knew of anyone..." What a great and cheap way to control for the demographics factor. I wish I had thought of this in the past. It would have saved me oodles of time! Then again, they assumed cause and effect, reasoning that those who answered yes had certain political leanings, so the people they answered yes about MUST have had the same political views... they even go so far as to say that 1/3 of liberals are having more abortions... hmm, based on... "I know someone"?
4) The "significant difference" badge of honor. Love that one... since there was a "significant difference," it must be true, even though everything they did before that was wrong.
5) Finally, they also assume that the aborted children would have voted. They did not even take into account the percentage of people who actually vote!

Bottom line... asking people if they know someone who had an abortion, assuming that the person who had the abortion shared the respondent's political leanings, and then assuming that the aborted children would A) most definitely vote and B) most definitely share those political leanings is one of the biggest stretches I have seen in a very long time.
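One way to see why stacking assumptions is such a stretch: if each link in the chain only holds some of the time, the chance that all of them hold shrinks with every link you add. The sketch below uses made-up probabilities (NOT from the study) and assumes, for simplicity, that the links are independent, so the joint probability is just their product.

```python
from math import prod


def chain_probability(link_probabilities) -> float:
    """Probability that every link in a chain of assumptions holds,
    assuming (for simplicity) that the links are independent."""
    return prod(link_probabilities)


# Hypothetical probabilities, NOT taken from the study, that each assumption holds.
links = {
    "the acquaintance shares the respondent's politics": 0.6,
    "the aborted child would have shared the parents' politics": 0.6,
    "the child would have grown up to vote": 0.6,
}

p = chain_probability(links.values())
print(f"All {len(links)} assumptions hold: {p:.3f}")
```

Even with each link being more likely than not, all three hold together only about 22% of the time (0.6 × 0.6 × 0.6 = 0.216).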

Monday, September 17, 2007

Data Mining and Web 2.0

My first job out of grad school in 1998 was for a statistical software company. I enjoyed it greatly and found myself working with General Linear Models and Neural Networks quite a lot. Around that time, there seemed to be a big push for “data mining” solutions. Not that these did not exist already, but a lot of people started throwing their hats into the ring. It always bothered me (and many others) that statistical analysis was starting to be taken…lightly (or so it seemed). There was no clear-cut definition, but ‘everyone was doing it!’ I once interviewed for a data mining position where, after about 30 minutes, I stopped the interview and asked them to define exactly what they thought data mining was. The admission? They had no clue, but knew they needed someone!

In 2001 I jumped over to the manufacturing world and began working with Statistical Process Control. In 2006 I re-entered the world of more complicated analysis. In that time, what had been a few “data mining solutions” exploded. I did a very quick search this morning and, by eyeballing it, came up with 30-40 different vendors. Look into any one of these vendors and you will find a large number of case studies, each touting grand success stories.

What does this mean? Well, it means that a large market exists for these packages (obviously), a market much larger than the number of skilled analysts available (note, I did not say statisticians, because you don’t HAVE to be a statistician, but you must be skilled!). It also means that a lot of these packages lack the capabilities to do proper data mining. Combine these together and you have people who lack the skills to do proper analysis using tools that lack the capabilities. For each success out there, I wonder how many very expensive failures there are…

So, what does this have to do with Web 2.0? I tend to think Web 2.0 is following a similar path. There is a huge amount of buzz and a lot of people trying to get into the fray…but ask for a definition and good luck! Of course, just like data mining, there will be a good amount of success, but I wonder how many failures there will be as well…

Friday, September 14, 2007

Can I? Sure... But SHOULD I?

I can't tell you how many times people ask me, "Can you do this analysis?" This question always makes me laugh. When I was in 2nd grade, I was browbeaten by a very large, intimidating teacher on the difference between can, may, and should.

ME: "Mrs. Brown, can I go to the bathroom?"
Mrs. Brown: "I don't know, can you?"
ME: "Mrs. Brown, MAY I go to the bathroom?"
Mrs. Brown: "Yes you may!"

Kids in classrooms all over the country learn the difference between can, may, and should. Yet it amazes me how many don't remember these lessons. I would think they would, considering that if you didn't, there was no way you were going to the bathroom, except maybe right there in the room. I say this because I still always get the "Can you do the analysis?" Sure, of course I can... but SHOULD I? Many times, I find myself arguing this very point. People mistake my answer of "You should not do it" to mean "I can't do it," and then wonder why they hired a stats guy who can't do math. Well, let's get this straight: yes, I CAN do it. I can average two numbers together... but SHOULD I average those two numbers together? That's a whole different question.

Several years ago, I would get into this argument with someone above me, and over and over again it was the same argument with the same result. He would argue that I could do the calculation, I would argue that I should not do it (to the point of giving references), and eventually I would have to do it anyway. You see, it was never a question of can't... but should.

I see this happen even more with the advent of all these new, slick stats packages "made easy." Check my next post for more details on this piece. Needless to say, with these packages... can just about anyone do complicated stats? Yes... Should they? Now that's a whole different discussion!

Thursday, September 13, 2007

Smoking Dieters

http://news.yahoo.com/s/nm/20070913/hl_nm/diets_smokers_dc

Really, this is what they thought they found? That female teenagers who initiate dieting appear at risk for beginning regular smoking?

Hmmm, 21% of the girls were actually overweight, yet 55% were dieters... could there be something underlying this? Body-image issues, perhaps? Did they look at this? It does not look like it to me. Perhaps the same image issues that drove their desire to lose weight also made them more likely to light up? Just a thought...
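To show how a hidden factor can manufacture this kind of finding, here is a minimal sketch with entirely hypothetical numbers (none come from the study). It assumes a single hidden trait, called "image concern" here, that raises the chance of BOTH dieting and smoking, while dieting itself has no effect on smoking at all.

```python
from typing import Optional

# Hypothetical numbers only (not from the study): a hidden trait, "image concern",
# raises the chance of BOTH dieting and smoking; dieting itself has no effect.
P_CONCERN = 0.5                     # share of girls with image concerns
P_DIET = {True: 0.7, False: 0.3}    # P(diet | concern)
P_SMOKE = {True: 0.3, False: 0.1}   # P(smoke | concern), identical for dieters and non-dieters


def smoking_rate(dieting: bool, concern: Optional[bool] = None) -> float:
    """P(smoke | dieting status), over everyone or within one concern stratum."""
    strata = [True, False] if concern is None else [concern]
    joint = 0.0  # P(smokes and has this dieting status)
    total = 0.0  # P(has this dieting status)
    for c in strata:
        p_c = P_CONCERN if c else 1 - P_CONCERN
        p_d = P_DIET[c] if dieting else 1 - P_DIET[c]
        total += p_c * p_d
        joint += p_c * p_d * P_SMOKE[c]
    return joint / total


for label, c in [("all girls", None), ("with concern", True), ("without concern", False)]:
    print(f"{label:>15}: dieters {smoking_rate(True, c):.2f}, "
          f"non-dieters {smoking_rate(False, c):.2f}")
```

Overall, the dieters smoke at 24% versus 16% for non-dieters, which looks like dieting puts girls "at risk." Within each image-concern group, though, the rates are identical (30% and 10%), so the entire gap comes from the hidden trait.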

These are some of the things that really irk me and give researchers a bad name: always looking for a simple explanation and ignoring everything else. You can find a statistically significant correlation with almost anything, but whether it is practically significant is another thing entirely!
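A small sketch of the statistical-versus-practical point, using the standard t statistic for a correlation coefficient, t = r·√(n−2)/√(1−r²). The correlation of 0.01 and the sample sizes are hypothetical, and the p-value uses a normal approximation to the t distribution (an assumption that is reasonable for large n).

```python
import math


def correlation_test(r: float, n: int) -> tuple[float, float, float]:
    """Return (t statistic, approximate two-sided p-value, r squared) for a
    sample correlation r from n pairs. The p-value uses a normal approximation
    to the t distribution, which is only reasonable for large n."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail area of a standard normal
    return t, p, r * r


# Same (hypothetical) tiny correlation, two very different sample sizes.
for n in (100, 1_000_000):
    t, p, r2 = correlation_test(0.01, n)
    print(f"n={n:>9,}: t={t:6.2f}, p~{p:.1e}, variance explained={r2:.2%}")
```

With 100 subjects, r = 0.01 is nowhere near significant; with a million, t is about 10 and the p-value is vanishingly small. Either way, the correlation explains only 0.01% of the variance, which is the gap between "statistical" and "practical."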

Welcome

Welcome to the Stats Geek! In the last 10 years, I have worked as a statistician in many different industries. I served as a Sr. Statistical Consultant for a statistical software company, where I worked with many different companies on a wide range of projects. I worked in the semiconductor business for 5 years as the corporate statistician for global operations, and I was published several times in the reticle industry.

Currently I work for a B2B Lead Generation Company as the Sr. Product Manager of Analytics.

This blog will cover the general use and misuse of data within business and our daily lives. I will focus mostly on B2B lead generation, but will also talk in general terms about the struggle between data integrity and real-life situations.