Wednesday, November 14, 2007

Why Statisticians Shouldn't Watch Sports

Check out this link. I got a kick out of this. First off, I really liked his reasoning. For the most part, he controlled for all the variables he needed to control for and kept things simple yet elegant (I assume that when he controlled for defensive points, he also controlled only for yards allowed by the defense). Secondly, I understand this man's pain. I can't even watch a game without trying to analyze some mundane fact that only other people like him would care about, which in turn causes my wife some pain as well, having to hear it.

So, for all of you out there who want to check whether your team has a "bend but don't break" defense, one bit of warning: he focused on the Pac-10. It would be interesting to know whether his ratio would hold up in other conferences. I would guess possibly not. If not, then to get an accurate ratio, you may need to work conference by conference, and then within Division I football.

Friday, November 2, 2007

Deming

Earlier today I sat through a meeting in which the presenter mentioned Deming. Wow, this was the first time I had heard Deming mentioned in about 2 years, roughly the time I have spent in Marketing. Of course my interest was piqued. The presenter went on to explain how important it is to experiment every day. I totally agree with this. However, I was a little disappointed that he never spoke about how to ensure you adequately measure those results. I am giving him a pass, though, as I assume it was because of the audience he was speaking to. Regardless, I think this is an extremely important point. Experiment all you want in this world. Tweak things and be curious...but always make sure you can adequately measure the results of the test. If not, then you have no idea which "experiment" really worked. To do this, you need to make sure your data is accurate and accessible, and that you have control of the variables. Otherwise, you can test all you want, but you will have little understanding of whether your manipulation affected your metrics or something else did.
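To make that concrete, here is a minimal sketch of one way to measure whether a tweak actually moved a conversion rate: a two-proportion z-test on a control group versus a test group. The counts and the helper function are hypothetical, purely for illustration, not anything from the presentation.

# A minimal sketch of measuring an experiment's result: a two-proportion
# z-test comparing a control group and a "tweak" group. The counts below
# are made up purely for illustration.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: 120/2000 conversions for control, 150/2000 for the tweak.
z, p = two_proportion_z_test(120, 2000, 150, 2000)
print(f"z = {z:.2f}, p = {p:.3f}")   # only then decide whether the tweak "worked"

The point is not the particular test; it is that the groups, the metric, and the comparison are decided up front, so the result can actually be attributed to the tweak.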

Wednesday, October 17, 2007

Email Diva? hmm, stick with email

I was just handed an article written by the Email Diva (it was a hard copy, otherwise I would have the link). To summarize, the author stated that since there is no standardization in email metrics (citing an EEC whitepaper), one should seek out benchmarks from their Email Service Provider or from Marketing Sherpa, which she considers close to apples to apples. However, because of non-standardization and other issues, “Comparing your results to industry standards will never tell you whether the effort is worthwhile for your company….The only standard is: did I make money/was I able to acquire new customers at an acceptable cost?”

Well, in regard to the standardization issue, I whole-heartedly agree that there is currently a problem. Going to your provider is a great option. Indices such as The Bulldog Index ensure everything is calculated and treated the same way. However, I do not agree that the Marketing Sherpa Guide is close to apples to apples. I think it is a good guide, and important, but by its very nature it is a survey, and therefore fraught with non-standardization. Again, another reason for indices like the Bulldog Index!

The main concern I have is the claim that there is no reason to use industry standards (even when they are standardized). The only standard is money? What IS an acceptable cost? An industry benchmark helps you decide what your standard SHOULD be and helps you compare yourself to competitors. If your CPL is $35.00 one month and $33.00 the next, great, you improved; but if your industry average is $25.00, you have a lot of work to do and your standard needs to be raised, otherwise you are losing out to your competitors. The landscape changes dramatically if the industry CPL standard is $45.00. Then you can decide between continuing to improve what you are currently doing, or taking those resources and going after something else.
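For illustration only, here is a tiny sketch (using the made-up dollar figures above and a hypothetical helper) of how a benchmark changes the read on a CPL trend:

# A hypothetical cost-per-lead (CPL) comparison: track your own trend,
# but also gauge it against an industry benchmark. The numbers are the
# illustrative figures from the post, not real data.
def cpl_position(current_cpl, previous_cpl, industry_cpl):
    trend = "improving" if current_cpl < previous_cpl else "worsening"
    gap = current_cpl - industry_cpl
    position = "above the industry average" if gap > 0 else "at or below the industry average"
    return trend, position, gap

trend, position, gap = cpl_position(current_cpl=33.00, previous_cpl=35.00, industry_cpl=25.00)
print(f"Trend: {trend}; {position} by ${abs(gap):.2f}")
# Trend: improving; above the industry average by $8.00 -> still work to do

Swap the benchmark to $45.00 and the exact same internal trend reads completely differently, which is the whole point of having the external standard.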

Friday, October 12, 2007

Right is Right

An individual once came into my office after a particularly upbeat argument about statistics and wrote on my board “Right is Right.” I kept that on my board until I moved out of the office. For a while, I thought, yes, I must be that theorist, because I am right, I know theory. It is my job to remain strong in theory. I am not so sure about that anymore.

Basically, sometimes you will have a person on one side arguing the conservative, statistical route, and on the other side a person explaining that you are thinking too analytically and need to focus on the overall goal. In the end, both are right and both are needed. You need a “theory guy” to ensure that what is being done follows the correct models and assumptions. However, a lot of the time that theorist can be too involved in theory and not involved enough in delivery. That’s where the other person comes into play: the “strategy guy.” While it is the theorist’s job to bring the right assumptions to the table, it is the strategist’s job to bring the theorist more into the real world. If this can be done well, it can be a great synergy.

Look, it is about being right, statistically. Because if it is not, then no model will work. But it is also about delivery and getting things done. Sometimes you don’t have the correct data to build the perfect model, and if you wait too long, you and your company lose out. It is the organization with a good synergy between the two that will be the most successful!

Wednesday, October 3, 2007

Data from Multiple Sources

I was just working on a presentation for Innotech next week and got to thinking about something that has always concerned me. Technology has afforded us the ability to capture droves of data, and has also given us a lot of user-friendly software with which to analyze it. Some of this is really good for us, and some of it is bad. In the wrong hands, bad data and bad assumptions can bring a company to its knees pretty quickly. In order to choose the correct model, one must know what the model assumptions are. I have seen many analyses run on data that do not satisfy those assumptions. Unfortunately, the software now available compounds this. In the good old days, one really had to at least understand the make-up of the data in order to run an analysis. It wouldn't stop anyone from doing the wrong thing, but it was a decent barrier. Now, one can just push and pull data through systems without knowing much about whether what they are doing is really right or not. Some software systems have developed barriers, but this still does not stop some weird things (I was once asked by someone why a software system would not allow him to do a multiple regression with over 500 variables!). Am I saying everyone needs to be a statistician? Well, no.... But what I am saying is, if you don't know some of the basics, beware of your results.
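As a rough illustration of the kind of basic checks I mean, here is a minimal sketch (simulated data and a hypothetical helper, not any particular package) that refuses to fit an ordinary regression when there are more predictors than observations, and looks at the residuals before trusting anything else:

# A minimal sketch of the sanity checks that user-friendly software
# happily skips. The data and the helper are illustrative assumptions.
import numpy as np

def basic_regression_checks(X, y):
    n, p = X.shape
    if p >= n:
        raise ValueError(f"{p} predictors but only {n} observations: "
                         "an ordinary multiple regression is not identifiable.")
    # Fit OLS and inspect the residuals before trusting any coefficient.
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    print(f"n = {n}, p = {p}, residual std = {resid.std(ddof=p + 1):.3f}")
    return coef, resid

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 + X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=200)
basic_regression_checks(X, y)

It is not sophisticated, and that is the point: the 500-variable regression question gets answered before any output ever gets produced.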

Tuesday, October 2, 2007

Probability of Eye Injuries

Wow,

OK, sorry everyone. I am back. I had a little mishap. I was out running and I actually got stung in the eye by a wasp. Now, I was thinking, what is the probability of THAT happening? So I tried to do a little research on wasp stings and the likelihood of getting stung in the eye. Unfortunately, there is little usable information out there. I did find that there are about 9,000 fireworks-related mishaps a year, with 30% of these affecting the eye. Hmm, I am a little curious to find out where in the country this happens the most! I also found that there were about 42,286 work-related injuries to the face in 2002, and 70% involved the eye. Dang! Unless I was working with beekeepers, that won't help me.

Do you ever feel like this when trying to calculate what seems to be a simple issue? You can't find the correct data, and you end up chasing the wrong information. Sometimes, as in the case of the wasp sting, you may just have to cut your losses and try another day. Otherwise you can reach too hard for honey, which turns out to be just jam.

Thursday, September 20, 2007

A Kilo no Longer a Kilo

http://www.cnn.com/2007/TECH/science/09/12/shrinking.kilogram.ap/index.html

I read this article this morning, and my first thought was, wow, what are the drug kingpins going to do now! They are getting shortchanged!

Actually, my mind immediately went to the importance of having good artifacts to measure and control against. It also occurred to me that people unlike myself who muse about weird stuff (i.e., regular people) may think...so what? That's what they get for using the metric system! So this piece of metal is losing weight (and perhaps, if a piece of metal can do it, so can I!).

Well, it is much more complicated than that. Most people know that there are universal standards throughout our world, such as the kilogram. Because of variance, a kilo is never exactly a kilo, so we need to make sure that we all trace back to that artifact. Here in the U.S. we tend to use NIST http://www.nist.gov/, the National Institute of Standards and Technology. The idea is that all measurements are traceable back to a standard, and although they never measure or weigh exactly the same (due to variance), they fall within statistically calculated specifications. Now, if the artifact itself is degrading, imagine how hard it is to hit a moving target! 50 micrograms sounds small, but it depends on the distribution. It could wreak havoc. Maybe we will just have to go back to "stones."
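Here is a small, purely illustrative sketch (made-up weighings, not NIST data) of the idea: repeated measurements of a reference are judged against statistically derived limits, and a 50 microgram drift in the artifact can easily be larger than the measurement spread itself.

# Illustrative only: hypothetical calibration weighings of a nominal 1 kg
# reference, in grams, and the +/- 3-sigma band built around them.
import statistics

weighings = [1000.000012, 999.999991, 1000.000004, 999.999987, 1000.000009]

mean = statistics.mean(weighings)
sd = statistics.stdev(weighings)
lower, upper = mean - 3 * sd, mean + 3 * sd    # statistically calculated spec around the artifact

drift_g = 50e-6                                # a 50 microgram drift, expressed in grams
print(f"limits: ({lower:.6f}, {upper:.6f}) g; artifact drift = {drift_g:.6f} g")
print("drift exceeds the 3-sigma spread:", drift_g > 3 * sd)

With these made-up numbers the 3-sigma spread is roughly 33 micrograms, so a 50 microgram shift in the target would swamp it. That is why a degrading artifact is a genuinely moving target.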

Wednesday, September 19, 2007

Talk like a Pirate

Ahoy Mateys,

Today is the official talk like a pirate day....For those of you who are interested, go check out what this really is http://www.talklikeapirate.com/

I was thinking about writing like a pirate, but when I tried, I realized how brutal this would be. I would like you to check out Seth Godin's blog today http://sethgodin.typepad.com/. He makes some good points; unfortunately he falls a little short, so I want to clear things up.

He states that you should be focusing on your real distribution when looking at web traffic (or visits to McDonald's). This is true. And that you should not focus only on the mean, but on the median as well. Again, another salient point. However, in the example he gives, the median could still fail to tell the full story. The median is the middle value. So, let's say you had 4 visitors. If these visitors came to the site 1, 1, 9, and 10 times, your mean would be 5.25 and your median would be 5. Not much of a difference...However, your mode would be 1! This may be important, as in: although I sometimes get a high number of visits, my most frequently occurring visit count is 1.
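Here is that same example recomputed, just to show how the three measures tell different stories:

# The example above, recomputed: four visitors with 1, 1, 9, and 10 visits.
from statistics import mean, median, mode

visits = [1, 1, 9, 10]
print(mean(visits))    # 5.25
print(median(visits))  # 5.0 (average of the two middle values)
print(mode(visits))    # 1   (the most frequently occurring visit count)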

Of course different scenarios call for different measures of central tendency. So yes, he is correct: make sure you measure more than the mean, but if you are going to go in the right direction, make sure you go all the way!

Tuesday, September 18, 2007

The Roe Effect: How NOT to perform a Study

Wow. Please read http://www.opinionjournal.com/extra/?id=110005277.

This is an awful study and a great example of how one can twist numbers into a "good story." There are so many things wrong with it, I do not know where to begin, so I am only going to point out a few... In fact, this is probably a study I would hand to students to dissect and tell me what they think is wrong. Its issues are typical of bad statistical analysis. You could point out the mathematical flaws for eons. However, the very construct is wrong: they assume a cause-and-effect relationship.

1) They mention that children "tend" to absorb the ideals of their parents...yet they analyze it as if it were a cause-and-effect relationship...that children WILL absorb the ideals of their parents.
2) N=1. "Hey, I know this guy who thought everyone was like their parents so they MUST all share the same political views"
3) Wow, they really cleaned up the perennial issue of getting the right demographics by asking people if they "knew of anyone..." What a great and cheap way to control for the demographics factor. I wish I had thought of this in the past; it would have saved me oodles of time! Then again, they assumed cause and effect, reasoning that those who answered yes must have had the same political leanings, so the people they answered yes about MUST have shared those political views...they even go so far as to say that 1/3 of liberals are having more abortions...hmm, based off of..."I know someone"?
4) The "significant difference" badge of honor. Love that one...since there was a "significant difference" it must be true...even though everything they did was wrong before that.
5) Finally, they also assume that these people would even vote. They did not take voter turnout into account at all!

Bottom line...asking people if they know someone who had an abortion, assuming that the person who had the abortion shared the same political leanings, and then further assuming that, if so, these individuals would A) most definitely vote and B) most definitely share the same political leanings is one of the biggest stretches I have seen in a very long time.

Monday, September 17, 2007

Data Mining and Web 2.0

My first job out of grad school in 1998 was for a statistical software company. I enjoyed it greatly and found myself working with General Linear Models and Neural Networks quite a lot. Around that time, there seemed to be a big push for “data mining” solutions. Not that these did not exist already, but a lot of people started throwing their hats into the ring. It always bothered me (and many others) that statistical analysis was starting to be taken…lightly (or so it seemed). There was no real clear cut definition but ‘everyone was doing it!’ I once interviewed for a data mining position in which after about 30 minutes, I stopped the interview and asked them to define exactly what they thought data mining was….the admission? They had no clue, but knew they needed someone!

In 2001 I jumped over to the manufacturing world and began working with Statistical Process Control. In 2006 I re-entered the world of more complicated analysis. In that time, what had been a few “data mining solutions” has exploded. I did a very quick search this A.M. and by eyeballing it, I came up with 30-40 different vendors. Search each one of these vendors and you can find a large amount of case studies, each touting grand success stories.

What does this mean? Well, it means that a large market exists for these packages (obviously). A market much larger than the amount of skilled analysts (note, I did not say statisticians, because you don’t HAVE to be a statistician, but you must be skilled!) available. It also means that a lot of these packages lack the capabilities to do proper data mining. Combine these together and you have people who lack the skills to do proper analysis, with tools that lack the capabilities. For each success out there, I wonder about how many very expensive failures there are…

So, what does this have in common with Web 2.0? I tend to think Web 2.0 is following a similar path. There is a huge amount of buzz and a lot of people trying to get into the fray…but ask for a definition and good luck! Of course, just like data mining there will be a good amount of success, but with that I wonder how many failures there will be as well…

Friday, September 14, 2007

Can I..Sure...But SHOULD I?

I can't tell you how many times I get asked by people...can you do this analysis? This question always makes me laugh. When I was in 2nd grade, I was browbeaten by a very large, intimidating teacher about the difference between can, may, and should.

ME: "Mrs. Brown, can I go to the bathroom"
Mrs. Brown: "I don't know, can you?"
ME: "Mrs. Brown, MAY I got to the bathroom?
Mrs. Brown: "Yes you may!"

This happens all over the country in classrooms: learning the difference between can, may, and should. Yet it amazes me how many people don't remember these lessons. I would think they would, considering that if you didn't, there was no way you were going to the bathroom, except maybe right there in the room. I say this because I still always get the "Can you do the analysis?" Sure, of course I can...but SHOULD I? Many times, I find myself arguing this very point. They mistake my answer of "I should not do it" to mean "I can't do it," and then wonder why they hired a stats guy who can't do math....Well, let's get this straight...yes, I CAN do it. I can average 2 numbers together...but SHOULD I average two numbers together? That's a whole different question. Several years ago, I would get into this argument with someone above me, and over and over again it was the same argument, same result. He would argue that I could do the calculation, I would argue that I should not do it (to the point where I would give references), and eventually I would have to do it anyway. You see, it was never a question of can't...but should.

I see this happen even more with the advent of all these new, slick stats packages "made easy." Check my next post for more details on this piece. Needless to say, with these packages...can just about anyone do complicated stats? Yes....Should they? Now that's a whole different discussion!

Thursday, September 13, 2007

Smoking Dieters

http://news.yahoo.com/s/nm/20070913/hl_nm/diets_smokers_dc

Really, this is what they thought they found? That female teenagers who initiate dieting appear at risk for beginning regular smoking?

Hmmm, 21% of the girls were actually overweight, yet 55% were dieters....could there perhaps be something underlying this? Like body-image issues, maybe? Did they look at this? It does not look like it to me. Perhaps it was the desire to lose weight over image issues that also increased the chances of lighting up? Just a thought....

These are some of the things that really irk me and give researchers a bad name: always looking for a simple explanation and ignoring everything else. You can find a statistically significant correlation with almost anything, but whether it is practically significant is another thing!
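As a quick, simulated illustration (made-up data, nothing to do with the study): with a large enough sample, even a near-zero correlation comes out "statistically significant."

# Simulated example: a trivially weak relationship, a huge sample, and a
# tiny p-value. Statistical significance is not practical significance.
import numpy as np
from math import sqrt
from statistics import NormalDist

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(size=n)
y = 0.01 * x + rng.normal(size=n)        # the true relationship is tiny

r = np.corrcoef(x, y)[0, 1]
z = np.arctanh(r) * sqrt(n - 3)          # large-sample test via Fisher's z
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"r = {r:.4f}, p = {p:.4f}")       # r is around 0.01, yet p is essentially zero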

Welcome

Welcome to the Stats Geek! In the last 10 years, I have worked as a statistician in many different industries. I served as a Sr. Statistical Consultant for a statistical software company, where I worked with many different companies on a wide range of projects. I worked in the semiconductor business for 5 years as the corporate statistician for global operations. I was published several times within the reticle industry.

Currently I work for a B2B Lead Generation Company as the Sr. Product Manager of Analytics.

This blog will cover the general use and misuse of data within business and our daily lives. I will focus more on the B2B Lead Generation, but will also talk in general terms about the struggle between data integrity and real-life situations.