I’m embarrassed to admit that when I first read about business intelligence products in 1997, I thought that BI pertained to covert surveillance kit, like the kind you’d see in a James Bond feature.
I was disappointed to learn that it was actually about trying to find meaningful trends and indicators within massive reservoirs of stored transaction information. How… boring. I just couldn’t picture Daniel Craig hunched over a laptop trying to interpret vodka-buying trends among 17 to 26-year-old single urban university graduates.
By now, everyone’s likely to have read Charles Duhigg’s riveting story published in the New York Times two years ago, in which he explained in layman’s terms how US retail giant Target leveraged its treasure trove of big data to know things about specific consumers before the consumers themselves did. In addition to being a gripping read, rife with questions about privacy, invasiveness, ethics, and business impact, the article also touched briefly on a problem I’ve encountered on every big data and BI project I’ve ever worked on: you don’t get any value from processing your horking gobs of stored data if the data itself isn’t any good.
I was doing IT project management for a large company that had been contracted to install an online analytical processing (OLAP) engine onto a bunch of mainframe systems that controlled cost data for some aircraft maintenance facilities. The original project team had promised the client this new OLAP capability would allow upper management to (and this is a quote) “manipulate an n-dimensional data cube” to determine whether a rainstorm in Oklahoma on a Tuesday would increase the cost of spanners bought for a wing replacement in Ohio. It sounded miraculous. Since the head implementer was more of an evangelical prophet than an engineer, he had promised the client’s executives they could divine new places and ways to cut costs in the darkest corners of their sprawling industrial empire.
It took me less than two weeks to realise the project was never going to accomplish anything. Once I chatted with the people who actually worked at the client’s depots, I discovered that the data that was supposedly being manipulated by our new cube-thingy was actually worthless. Historically, the various sub-elements within the umbrella company had always managed their profit margins by obfuscating their actual costs and issues from upper management. The data they entered into the cost-control mainframes represented an agreed-to interpretation of reality: what they wanted the auditors to see, not what was actually transpiring. The actual cost data existed separately – on paper – locked securely in the line managers’ offices. The big bosses could play with their new OLAP all day long, and they’d never come close to implementing any sort of meaningful change – it was the IT equivalent of giving your toddler a toy steering wheel while they’re buckled into their car seat and telling them that they’re driving.
If you’re pouring money into a big data project, put the spadework in up-front to ensure the data that you’ll be analysing is actually valid. Audit your core business processes to ensure that what employees are entering into your systems reflects what they’re doing, and not what they want you to think they’re doing. And when you find a discrepancy, invest in fixing it before you try to perform any analysis on it.
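That spadework can start with something as simple as a reconciliation check. Here’s a minimal sketch of the idea – all the names, figures, and the 5% tolerance are hypothetical – comparing each cost centre’s system-of-record total against an independently audited figure (say, from the paper records in those line managers’ offices) and flagging anything that doesn’t line up:

```python
# Hypothetical reconciliation check: before running any analysis, compare
# what the system of record says against an independently verified figure
# for each cost centre, and flag the ones that disagree.

def find_discrepancies(system_totals, audited_totals, tolerance=0.05):
    """Return cost centres whose recorded total deviates from the audited
    total by more than `tolerance` (a fraction of the audited figure)."""
    flagged = {}
    for centre, audited in audited_totals.items():
        recorded = system_totals.get(centre)
        if recorded is None:
            flagged[centre] = "missing from system of record"
            continue
        drift = abs(recorded - audited) / audited
        if drift > tolerance:
            flagged[centre] = f"{drift:.0%} off audited figure"
    return flagged

# Made-up example figures: one depot's books agree, the other's don't.
system = {"Ohio wing shop": 120_000, "Oklahoma depot": 95_000}
audited = {"Ohio wing shop": 119_500, "Oklahoma depot": 140_000}

print(find_discrepancies(system, audited))
```

Anything this kind of check flags is a business-process problem to fix first, not a data point to feed into the cube.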