Monday, September 17, 2007

Data Mining and Web 2.0

My first job out of grad school, in 1998, was at a statistical software company. I enjoyed it greatly and found myself working with General Linear Models and Neural Networks quite a lot. Around that time, there seemed to be a big push for “data mining” solutions. Not that these did not exist already, but a lot of people started throwing their hats into the ring. It always bothered me (and many others) that statistical analysis was starting to be taken…lightly (or so it seemed). There was no clear-cut definition of data mining, but ‘everyone was doing it!’ I once interviewed for a data mining position and, about 30 minutes in, stopped the interview and asked the interviewers to define exactly what they thought data mining was. The admission? They had no clue, but they knew they needed someone!

In 2001 I jumped over to the manufacturing world and began working with Statistical Process Control. In 2006 I re-entered the world of more complicated analysis. In that time, what had been a few “data mining solutions” had exploded into many. I did a very quick search this morning and, by eyeballing it, came up with 30-40 different vendors. Search for any one of these vendors and you can find a large number of case studies, each touting grand success stories.

What does this mean? Well, it means that a large market exists for these packages (obviously), a market much larger than the number of skilled analysts available (note, I did not say statisticians, because you don’t HAVE to be a statistician, but you must be skilled!). It also means that a lot of these packages lack the capabilities to do proper data mining. Combine the two and you have people who lack the skills to do proper analysis using tools that lack the capabilities. For each success out there, I wonder how many very expensive failures there are…

So, what does this have in common with Web 2.0? I tend to think Web 2.0 is following a similar path. There is a huge amount of buzz and a lot of people trying to get into the fray…but ask for a definition and good luck! Of course, just like data mining, there will be a good number of successes, but I wonder how many failures there will be as well…
