Personal data mining

From the dawn of civilisation until 2003, Eric Schmidt is fond of saying, humankind generated five exabytes of data. Now we produce five exabytes every two days — and the pace is accelerating. In our post-privacy world of pervasive social-media sharing, GPS tracking, cellphone-tower triangulation, wireless sensor monitoring, browser-cookie targeting, face-recognition detecting, consumer-intention profiling, and endless other means by which our personal presence is logged in databases far beyond our reach, citizens are largely failing to benefit from the power of all this data to help them make smarter decisions. It's time to reclaim the concept of data mining from the marketing industry's microtargeting of consumers, the credit-card companies' anti-fraud profiling, the intrusive surveillance of state-sponsored Total Information Awareness. We need to think more about mining our own output to extract patterns that turn our raw personal datastream into predictive, actionable information. All of us would benefit if the idea of personal data mining were to enter popular discourse.

Microsoft saw the potential back in September 2006, when it filed United States Patent application number 20080082393 for a system of "personal data mining". Having been fed personal data provided by users themselves or gathered by third parties, the technology would then analyse it to "enable identification of opportunities and/or provisioning of recommendations to increase user productivity and/or improve quality of life". You can decide for yourself whether you trust Redmond with your lifelog, but it's hard to fault the premise: the personal data mine, the patent states, would be a way "to identify relevant information that otherwise would likely remain undiscovered".

Both I as a citizen and society as a whole would gain if individuals' personal datastreams could be mined to extract patterns upon which we could act. Such mining would turn my raw data into predictive information that can anticipate my mood and improve my efficiency, make me healthier and more emotionally intuitive, reveal my scholastic weaknesses and my creative strengths. I want to find the hidden meanings, the unexpected correlations that reveal trends and risk factors of which I had been unaware. In an era of oversharing, we need to think more about data-driven self-discovery.

A small but fast-growing self-tracking movement is already showing the potential of such thinking, inspired by Kevin Kelly's quantified self and Gary Wolf's data-driven life. With its mobile sensors and apps and visualisations, this movement is tracking and measuring exercise, sleep, alertness, productivity, pharmaceutical responses, DNA, heartbeat, diet and financial expenditure, then sharing and displaying its findings for greater collective understanding. It is using its tools for clustering, classifying and discovering rules in raw data, but mostly it is simply quantifying that data to extract signals, or information, from the noise.

The cumulative rewards of such thinking will be altruistic rather than narcissistic, whether in pooling personal data for greater scientific understanding (23andMe) or in propagating user-submitted data to motivate behaviour change in others (Traineo). Indeed, as the work of Daniel Kahneman, Daniel Gilbert, and Christakis and Fowler demonstrates so powerfully, accurate individual-level data-tracking is key to understanding how human happiness can be quantified, how our social networks affect our behaviour, how diseases spread through groups.

The data is already out there. We just need to encourage people to tap it, share it, and corral it into knowledge.