2014 is supposed to be the year of big data in corporate IT. We all (supposedly) have massive amounts of captured transaction information in the data centre that can be mined somehow to glean some strategic business insight. It’s a great hypothesis, and there’s a lot to be said for making use of information that you already have.
After last year’s sensational run of stories about how major retailers will know that a customer is pregnant before she herself knows, there’s been a lot of interest in the business world in trying to find new ways to draw conclusions about customers’ wants and needs. That makes sense. I’m all for pursuing better customer service. However…
One word of warning: in your quest to leverage your so-called big data reserves, resist the temptation to reward data initiatives until you can conclusively prove that their results actually realise meaningful, independently verifiable value for the company. There’s a very good chance that your early forays into big data processing will be rife with mistaken conclusions. Worse, the pursuit of the “big” component of big data can readily lead to inaccurate or misleading data.
In 1976, the American social scientist Donald Campbell coined what’s become known as Campbell’s Law when he wrote: “The more any quantitative social indicator (or even some qualitative indicator) is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” When you reward your employees for acting on any given metric or process, the employees gain a powerful incentive to “game” the metric or process in order to maintain or to increase their expected reward, whether or not the metric or process has any actual relevance to the business or to its success. The pursuit of reward is inherently and inevitably corrupting, whether people realise that they’re being corrupted by it or not.
In simple human terms, we’re seeing the results of Campbell’s Law in primary education here in the USA. We’ve grown enraptured in recent years with measuring student, teacher, and school performance in terms of standardised exam results. In theory, comparing two schools’ scores on the same set of exams should tell you which of the two is more effective.
Logically, the better-performing school should then receive more resources than the poorer-performing school. That’s proven to be a counterproductive approach. In practice, the obsession with improving one’s data has incentivised teachers and schools to stop teaching anything other than test questions – and has encouraged many to blatantly cheat on the students’ exams in order to boost their schools’ performance data.
For the business owner, this means that the pursuit of big data for decision-making can have undesired negative effects for the business. At each step of the analysis process (classifying, storing, managing, analysing, archiving and reporting of data), you run the risk of rewarding counterproductive behaviour at the level where direct employee action is required. For example, if you decide to reward employees that capture certain customer data (like customers’ addresses or ages), some employees will feel pressure to plug such data into transactions even when it isn’t available – basically, they’ll make something up in order to maximise their chances of being rewarded for having successfully captured the desired information.
In order to make big data work for you in a safe and responsible manner, you have to set clear and unambiguous standards for data capture, handling, storage, and disclosure. You need to know (in general terms, at least) what it is that you’re looking for and (more importantly) what you don’t need. Sequester, discard, or otherwise protect all of the content that isn’t germane to your analysis. Test your hypotheses rigorously, and be sceptical about believing any causal relationships within your data until you’ve been able to dispassionately prove that a causal relationship actually exists.
More important is to be absolutely clear with everyone involved in the capture of information that bogus or adulterated data will sour the entire analysis, and will not be tolerated. For a big data project to yield practical results, the data has to be reliable. Look for every opportunity in the data capture and handling processes where someone might be inadvertently incentivised to bulk up the records with bogus data, and then put verifiable controls in place to prevent data corruption from happening.
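As one illustration of what a verifiable control might look like, here is a minimal Python sketch that flags employees whose captured entries for a field are dominated by a single repeated value, which can be a sign of made-up data. Everything in it is an assumption for the sake of the example: the record layout, the `employee` and `age` field names, the function name `flag_repeated_entries`, and the thresholds are all hypothetical, and a flag is only a prompt for a human to look closer, not proof of wrongdoing.

```python
from collections import Counter, defaultdict


def flag_repeated_entries(records, field, max_share=0.5, min_records=5):
    """Return {employee: (value, share)} where one value dominates that
    employee's captured entries for `field`.

    The thresholds are illustrative assumptions, not recommended settings.
    """
    # Group the captured values by the employee who entered them.
    by_employee = defaultdict(list)
    for rec in records:
        value = rec.get(field)
        if value is not None:
            by_employee[rec["employee"]].append(value)

    flagged = {}
    for employee, values in by_employee.items():
        if len(values) < min_records:
            continue  # too few entries to judge either way
        value, count = Counter(values).most_common(1)[0]
        share = count / len(values)
        if share > max_share:
            flagged[employee] = (value, share)
    return flagged


# Hypothetical sample data: employee A enters the same age most of the time.
records = (
    [{"employee": "A", "age": 30} for _ in range(8)]
    + [{"employee": "A", "age": 41}, {"employee": "A", "age": 27}]
    + [{"employee": "B", "age": a} for a in (22, 35, 35, 48, 51, 63, 29, 44)]
)

suspects = flag_repeated_entries(records, "age")
print(suspects)
```

A check like this only catches the crudest kind of padding. Its value lies less in the arithmetic than in the fact that the people capturing the data know that what they enter will be examined.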