The Other Big Data
Big Data and its co-conspirator, the data scientist, are grabbing a lot of headlines these days. To be sure, the piles of data available for analysis and for mining real value are enormous. Petabytes, exabytes, whatever. Search for articles on the subject, and you'll find mentions of Twitter, Facebook, Yahoo, LinkedIn, and the other usual suspects. The other day I had a conversation about the amount of data in daily extracts from DNS servers. Lots of fascinating activity.
I have Big Data challenges of my own, but they aren't related to social media. I'm not going to mine Twitter for sentiment analysis, at least not within the confines of my current employment. I do, however, have big opportunities to take advantage of the data I have access to. And that data isn't web logs, URL requests, or "likes."
One statistician described big data as "a teeming multitude of variables and an army of observations." That's actually much closer to my experience, especially the "army of observations" part. Previously, we would make comments like, "How great would it be to have actual data on <insert question here>?" Now we struggle with, "What is our highest-priority question to answer, and are we collecting enough data to effectively answer it?"
My Big Data focus is prioritizing the questions that will drive the most business value. Defining the parameters of business value is a topic for another day.
For additional reading on the topic, I really enjoyed this post, which discusses how to turn Big Data into Big Analytics.