Integrity in data analysis

Posted by HR Analytics on Sunday, December 16, 2012
I've recently been tasked with replicating a couple of data tables produced last year to see whether there is any noticeable trend. Last year's datasets were prepared by another group and they're pretty straightforward, mostly averages and aggregated data. But here I am, trying to trace the methodology and make sure the steps are repeatable. The methodology I deploy has to be exactly the same as last year's, which means getting last year's datasets and repeating the calculations to reproduce the published numbers. I find myself asking questions such as which variables were included in calculating the averages and what the justification was for including (or excluding) particular data points. Reproducing the numbers from last year's datasets is not always as easy as one would imagine. There are many factors at play and not everybody calculates the same way. That's why I always appreciate it when analysts send along the formulas embedded in the Excel cells, or the PROCs and steps saved in the SAS file. That way I can see, step by step, which variables were included, which values were omitted, how decimals were rounded, and so on.
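To make those questions concrete, here is a minimal Python sketch of how the same data can yield different "averages". The scores, the function names and the two ways of handling missing values are all made up for illustration; they are not last year's actual data or method. The rounding part contrasts Python's built-in round(), which rounds halves to the nearest even digit, with the round-half-up rule that many people expect from spreadsheet work.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import mean

# Hypothetical survey scores; None marks a missing response.
scores = [2.0, 2.5, None, 3.0, 1.5]


def mean_excluding_missing(values):
    """Average only the responses that are present."""
    return mean(v for v in values if v is not None)


def mean_missing_as_zero(values):
    """Average all rows, counting a missing response as 0."""
    return mean(0.0 if v is None else v for v in values)


def round_half_up(value, places="0.1"):
    """Round halves away from zero (for positive values), to one decimal by default."""
    return float(Decimal(str(value)).quantize(Decimal(places), rounding=ROUND_HALF_UP))


excluded = mean_excluding_missing(scores)   # 9.0 / 4 = 2.25
as_zero = mean_missing_as_zero(scores)      # 9.0 / 5 = 1.8

print(excluded, as_zero)
print(round(excluded, 1), round_half_up(excluded))  # half-to-even vs half-up
```

In this made-up case the choice of how to treat one missing value moves the average from 2.25 to 1.8, and the rounding rule alone turns 2.25 into either 2.2 or 2.3. Neither choice is wrong in itself, but mixing them between years would make the comparison meaningless.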

As I worked through the formulas to reproduce the numbers, it dawned on me that this is an excellent opportunity to blog about data integrity. Sure, it would be easy to run my own calculations on this year's dataset, stand the results side by side with the previous year's numbers, however they were calculated, and assume that the differences are the trend. If there is a jump of a couple of percentage points, even better: the numbers look plausible, so the methodology must have been more or less the same, right? Well, not so. A difference of that size can come from the calculation method rather than from any real change. This is one of the many holes through which human error can compromise data integrity, undermining the reliability and consistency of the data. It is very easy to assume that crunching averages is a straightforward process and that everyone does it the same way. But this example shows that the extra step of validating the accuracy and consistency of the data is critical to maintaining its integrity. Just keep in mind how much executives and policy-makers base their decisions on that one number.
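One way to build that validation step into the workflow is to recompute last year's figures from last year's raw data and compare them with what was published before touching this year's dataset. The sketch below assumes the figures sit in two plain dictionaries. The metric names, values and tolerance are hypothetical placeholders, not numbers from the actual tables.

```python
import math


def find_mismatches(published, recomputed, tolerance=0.05):
    """Return metrics whose recomputed value differs from the published value.

    A metric that is missing from `recomputed` is also reported.
    """
    mismatches = {}
    for name, old_value in published.items():
        new_value = recomputed.get(name)
        if new_value is None or not math.isclose(
            old_value, new_value, rel_tol=0.0, abs_tol=tolerance
        ):
            mismatches[name] = (old_value, new_value)
    return mismatches


# Hypothetical figures: what last year's report says vs. what I get
# when I rerun my method on last year's raw data.
published_last_year = {"avg_tenure_years": 6.3, "avg_training_hours": 21.5}
recomputed_from_raw = {"avg_tenure_years": 6.3, "avg_training_hours": 19.8}

mismatches = find_mismatches(published_last_year, recomputed_from_raw)
for name, (old_value, new_value) in mismatches.items():
    print(f"{name}: published {old_value}, recomputed {new_value}")
```

Any metric that shows up here means my method does not yet match last year's, so a year-over-year comparison for that metric should wait until the difference is explained.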

 


Tags: data integrity  accuracy  reliability  validity  consistency 

Copyright © 2012 Analytics-HR. All Rights Reserved