Friday, May 31, 2013

Data in Context

Garance Franke-Ruta clarifies the context surrounding the "discovered" data that supposedly show former IRS Commissioner Douglas Shulman visiting the White House 157 times during his tenure.  The short story is that the data used to make that claim are imperfect.  A large majority of the supposed visits were in fact unfulfilled invites.  Another claim made from the same dataset is that he visited more often than any cabinet member.  That is another falsehood: the system referenced is used primarily to grant access to those walking into the complex, while cabinet members, given their seniority, are able to drive onto it.

This is yet another example of how important it is to leverage data thoughtfully.  Intelligence requires deliberate thought, not quick assertions and grandiose conclusions.  Minimal effort would have revealed the imperfections of the data referenced.  (The system used was built to track appointments within the White House complex, but only for meetings and "typical" events.  Access lists for larger events often forgo the use of this system, as do appointments involving more senior government officials cleared to drive into the complex.)

To avoid falling into this trap, there are three questions you must answer before acting on the information gleaned from a particular dataset:

  1. How was the data collected?
  2. What specific data is included in the dataset?
  3. And, most importantly, what specific data is NOT included in the dataset?
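
As a rough illustration of why these questions matter, here is a minimal Python sketch.  Everything in it is an assumption made for the example: the `Appointment` record, its field names, and the sample rows are invented and do not reflect the layout or contents of the actual White House visitor logs.  It only shows how a count of scheduled appointments can differ from a count of logged arrivals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Appointment:
    # Hypothetical record layout, invented for this example.
    visitor: str
    scheduled: str           # date the appointment was scheduled for
    arrived: Optional[str]   # logged arrival time, or None if none was logged


def summarize(records, visitor):
    """Count scheduled appointments and logged arrivals for one visitor."""
    mine = [r for r in records if r.visitor == visitor]
    confirmed = [r for r in mine if r.arrived is not None]
    return {"scheduled": len(mine), "confirmed": len(confirmed)}


# Made-up rows, for illustration only.
records = [
    Appointment("A. Example", "2012-01-10", "09:02"),
    Appointment("A. Example", "2012-01-17", None),
    Appointment("A. Example", "2012-01-24", None),
    Appointment("B. Sample", "2012-01-10", "13:30"),
]

summary = summarize(records, "A. Example")
print(summary)  # {'scheduled': 3, 'confirmed': 1}
```

Reporting only the "scheduled" figure would overstate the visits in this toy data.  Note also that no query can recover what the dataset never recorded, such as anyone who entered without passing through the system, which is why the third question still has to be answered separately.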

Only with such context can you begin to understand the information available...
