In a big-data world, it takes an exponentially rising curve of statistics to bring home just how subjugated we now are to the data cruncher's powers. Each day, according to IBM, we collectively generate 2.5 quintillion bytes—a tsunami of structured and unstructured data that's growing, in IDC's reckoning, at 60 per cent a year. Walmart drags a million hourly retail transactions into a database that long ago passed 2.5 petabytes; Facebook processes 2.5 billion pieces of content and 500 terabytes of data each day; and Google, whose YouTube division alone gains 72 hours of new video every minute, accumulates 24 petabytes of data in a single day. No wonder the rock star of Silicon Valley is no longer the genius software engineer, but rather the analytically inclined, ever more venerated data scientist.
Certainly there are vast public benefits in the smart processing of these zetta- and yottabytes of previously unconstrained zeroes and ones. Low-cost genomics allows oncologists to target tumours ever more accurately using the algorithmic magic of personalised medicine; real-time Bayesian analysis lets counterintelligence forces identify the bad guys, or at least attempt to, in new data-mining approaches to fighting terrorism. And let's not forget the commercial advantages accruing to businesses that turn raw numbers into actionable information: according to the Economist Intelligence Unit, companies that use effective data analytics typically outperform their peers on stockmarkets by 250 per cent.
Yet as our lives are swept unstoppably into the data-driven world, such benefits are being denied to a fast-emerging data underclass. Any citizen lacking a basic understanding of, and at least minimal access to, the new algorithmic tools will increasingly be disadvantaged in vast areas of economic, political and social participation. The data disenfranchised will find it harder to establish personal creditworthiness or political influence; they will be discriminated against by stockmarkets and by social networks. We need to start seeing data literacy as a requisite, fundamental skill in a 21st-century democracy, and to campaign—and perhaps even to legislate—to protect the interests of those being left behind.
The data disenfranchised suffer in two main ways. First, they face systemic disadvantages in markets which are nominally open to all. Take stockmarkets. Any human traders today bold enough to compete against the algorithms of high-frequency and low-latency traders should be made aware how far the odds are stacked against them. As Andrei Kirilenko, the chief economist at the US Commodity Futures Trading Commission, along with researchers from Princeton and the University of Washington, found recently, the most aggressive high-frequency traders tend to make the greatest profits—suggesting that it would be wise for the small investor simply to leave the machines to it. It's no coincidence that power in a swathe of other sectors is accruing to those who control the algorithms—whether the Obama campaign's electoral "microtargeters" or the yield-raising strategists of data-fuelled precision agriculture.
Second, absolute power is accruing to a small number of data-superminers whose influence is matched only by their lack of accountability. Your identity is increasingly what the data oligopolists say it is: credit agencies, employers, prospective dates, even the US National Security Agency have a fixed view of you based on your online datastream as channelled via search engines, social networks and "influence" scoring sites, however inaccurate or outdated the results. And good luck trying to correct the errors or false impressions that are damaging your prospects: as disenfranchised users of services such as Instagram and Facebook have increasingly come to realise, it's up to the services, not you, how your personal data shall be used. The customer may indeed be the product, but there should at least be a duty for such services clearly to inform and educate the customer about their lack of ownership in their digital output.
Data, as we know, is power, and as our personal metrics become ever easier to amass and store, that power needs rebalancing strongly towards us as individuals and citizens. We impeded medical progress by letting pharmaceutical companies selectively, and on occasion misleadingly, control the release of clinical-trials data. In the emerging yottabyte age, let's ensure the sovereignty of the people over the databases by holding to account those with the keys to the machine.