Thursday, May 02, 2013

Calling Bullshit on Big Data

This article has a decent list of ways to call bullshit on data-driven analyses.  Click the link for context, but here are the top points:

  1. Focus on how robust a finding is, meaning that different ways of looking at the evidence point to the same conclusion. 
  2. Data mavens often make a big deal of their results being statistically significant, which is a statement that it’s unlikely their findings simply reflect chance. Don’t confuse this with something actually mattering. 
  3. Be wary of scholars using high-powered statistical techniques as a bludgeon to silence critics who are not specialists. 
  4. Don’t fall into the trap of thinking about an empirical finding as “right” or “wrong.” 
  5. Don’t mistake correlation for causation. 
  6. Always ask “so what?” 

As often occurs with an emerging technology theme, the glitz and glam of the shiny new thing that is big data often overshadow the real value.  The above list is a great start toward making sure that the data product or opportunity being pitched can truly add value to your mission.
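
To make point 2 in the list concrete, here is a minimal sketch of a result that is statistically significant but practically negligible. The group means, standard deviation, and sample size are made-up numbers for illustration, and the helper two_sample_z is a hypothetical name, not something from the linked article. It assumes two equal-sized groups with a shared, known standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_a, mean_b, sd, n_per_group):
    """Return (z, two-sided p-value, Cohen's d) for two equal-sized groups.

    Assumes both groups share the same known standard deviation `sd`.
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = (mean_b - mean_a) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    cohens_d = (mean_b - mean_a) / sd  # effect size in standard deviations
    return z, p_value, cohens_d


# Illustrative numbers only: a 0.1-point difference on a scale with sd = 15,
# measured on one million records per group.
z, p_value, cohens_d = two_sample_z(
    mean_a=100.0, mean_b=100.1, sd=15.0, n_per_group=1_000_000
)

print(f"z = {z:.2f}, p = {p_value:.2g}, Cohen's d = {cohens_d:.4f}")
```

With a large enough dataset, the p-value falls far below the usual 0.05 threshold even though the difference is less than one hundredth of a standard deviation. The significance test says the gap is probably not chance; it says nothing about whether the gap is big enough to act on.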

#3 is an interesting one - I see a trend in the emerging big data space: vendors and others seeking to exploit big data too often reach for high-end, overly complex mathematics when more basic, easier-to-understand models would suffice.  This is especially true when building out new applications on top of large datasets.  You will often get to a productive answer faster by building simple prototypes before investing more expensive resources.  Data modeling is no different.

#5 above is a particularly important point.  My sense is that most people find it difficult to logically separate the concepts of correlation and causation.  I find myself jumping too far too often, reading too much import into a basic correlation that lacks any evidence of causation.
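
One way to see how a correlation can appear without any causation is to simulate a hidden common cause. The sketch below is an illustration under invented assumptions: the variable names (temperature, ice_cream, sunburn) and the data-generating process are made up, and the helpers pearson and residuals are hypothetical names written here with the standard library only. Neither simulated outcome influences the other; both are driven by temperature.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, zs):
    """Residuals of a least-squares regression of ys on zs (with intercept)."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    slope = sum((z - mean_z) * (y - mean_y) for y, z in zip(ys, zs)) / sum(
        (z - mean_z) ** 2 for z in zs
    )
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hidden common cause, plus two outcomes that each depend only on it.
temperature = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [t + rng.gauss(0, 1) for t in temperature]
sunburn = [t + rng.gauss(0, 1) for t in temperature]

raw_r = pearson(ice_cream, sunburn)
partial_r = pearson(
    residuals(ice_cream, temperature), residuals(sunburn, temperature)
)

print(f"raw correlation:            {raw_r:.3f}")
print(f"after removing temperature: {partial_r:.3f}")
```

The raw correlation comes out clearly positive (about 0.5 in expectation for this setup), while the correlation between the residuals is close to zero once the shared driver is regressed out. In real data the confounder is rarely handed to you, which is exactly why a bare correlation deserves suspicion.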

At the end of the day, high-end mathematics does not negate basic economic theory.  Be smart - don't forget your wits when digging into big data...
