I started thinking about this first in the area of open-source software, or for that matter, the Web. You think about how much value Tim Berners-Lee created and how he didn't actually capture very much of it. It was captured by companies like Google, Apple, Twitter, and Facebook. You also think about the other extreme, where companies like Goldman Sachs managed to extract a great deal of value from the economy, but as the 2008 financial crisis demonstrated, they did so while actually destroying value for the overall economy. So that got me thinking about how value creation and value capture are not the same thing. Our economics tends to measure value capture. If we're going to get 21st century economic policy right, or even just correctly model what's working and why, we have to start moving to a model that measures value creation rather than value capture.
One really great example of the distinction between value creation and value capture comes from open-source software. A few months ago, I had a conversation with Hari Ravichandran, the founder and CEO of Endurance International Group, the largest web hosting firm in the US. They include brands like Bluehost. Hari said, "Our business is built on open-source software, and I'd like to give something back."
In the course of our conversation I realized that most people don't even think of the Web-hosting industry or ISPs as being dependent on open-source software. But when you think about it, of course Web domain hosting is a simple business model wrapped around the open-source Internet Domain Name System. They're essentially offering DNS, Apache, MySQL, and WordPress to their customers.
Hari said something that really struck me, which is that there was a McKinsey study that showed that small businesses that have a Web presence have ten percent greater productivity than those without. In the course of our conversation, we realized that when you look at the value of the Web, when you look at the value created by Linux and MySQL, you don't just look at Red Hat, you don't just look at the value of MySQL's acquisition by Oracle, you don't even just look at companies like Google. You look at these downstream effects in the economy, and the ability of a small business to reach customers.
Obviously, there are problems there with double counting and the like, so we really have to get much better at understanding and thinking about these kinds of Clothesline Paradox economics.
Here's another good example that affects Internet policy. We hear a lot about "free content" on the Web, and the idea that users are getting something for nothing. "They don't want to pay for their content." And yet, most people access the Internet by paying an Internet service provider $60, $70, $80 a month. You think of a company like Comcast. The user pays them $80 a month and watches television, and we say, "Oh. They're paying for content." They pay $80 a month for access to the Internet and we say, "Oh. They're getting their content for free." Something is clearly wrong with that picture!
In fact, the reverse is actually true. The free rider is Comcast. If people watch television, the cable company has to pay money downstream to content providers. When people watch YouTube, or use Facebook or Twitter, or just surf the Web, the cable company pays nothing for content. So it's not users who are getting the free ride, it's these big companies. That's just one of many implications that come to light when you start thinking about the Clothesline Paradox.
But then I started asking myself, what might be some other examples of Clothesline Paradox economies, hidden economies of value creation without value capture? A pretty clear one is YouTube: user-generated content, people making videos for each other. I think of my grandson, who is three years old, and sure, he sometimes watches professionally produced content, but he watches "Thomas the Tank Engine" train crash videos that are made by other three-, four-, and five-year-olds, and their parents. There are ones that have millions of views. It's a whole sub-genre that's been created by users for users, a niche that would never be made accessible without something like YouTube. And of course, it's an interesting kind of derivative work, since it builds on copyrighted characters. So there's a lot of complexity to be studied.
But what's really interesting, when I dug into YouTube, is that it turns out that the monetary economy there is about to explode. It is exploding. It's going to be one of the big Internet stories of the next year or two. YouTube has come to be taken for granted and while it really exploded as a medium, it was written off by most people as a money-maker. But I was really struck when I went and did a little bit of research. I was told anecdotally about a major pop star who actually makes more money on YouTube than on iTunes, and more of the money comes from ads run against videos that are uploaded by users than from the ones that are put up there by the music companies themselves.
The trick is that YouTube auto-detects the musical signature of a song, and so when a user, for example, puts up their wedding video and it has a pop song as the soundtrack, the music company gets paid, not the person who uploaded the video. There is an emerging set of business models by which a peer economy, a sharing economy, actually gets monetized. Often the value creation is only partly perceived by the people who created the content, and sometimes it's received by other people downstream.
In a session this morning at Science Foo Camp, we were talking about the role, for example, of PubMed in the scientific enterprise. In this case it's government that initiated this program, where research funded by public money is creating a whole lot of "free content." There are scientific journal publishers who are making the claim that this is destroying their business. They're framing it as "free" fighting with their business model, and claiming that this will destroy scientific societies, which will no longer have a way to make revenue, and so on and so forth.
But the reality is that there are about 300 companies that are actually reusing PubMed data as part of their commercial offering, and the entire scientific enterprise in biomedical research is enabled by this free content. Once again we have a story where free is actually a big part of economic value that is realized elsewhere in the economy.
If value creation and value capture are not the same thing, how would we start to model those systems? How do we do the basic research that lets us understand how "free" is being monetized?
I did, for example, a recent study with EIG's Bluehost unit of the economic impact of open source on their small business customers, estimating the economic activity that's attributable to those customers, based on surveys and so on. That's just a small example of how to start studying this.
If we're going to get science policy right, it's really important for us to study the economic benefit of open access and not just accept the arguments of incumbents, as we did in mainstream media. We have all the media companies saying "This will destroy us," and meanwhile, new free ecosystems, like the Web, have actually led to enormous wealth creation, enormous new opportunities for social value, and yes, they did in fact lead to in some cases the destruction of incumbents, but that's the kind of creative destruction that we celebrate in the economy. We have to accept that, particularly in the area of science, there's an incredible opportunity for open access to enable new business models.
I was talking to one researcher here who was talking about how they built the first set of comprehensive computer models of everything that's going on in the cell. He was just saying how great it would be to be able to load in all possible data from the scientific corpus and be able to see that play out in their model. Of course, if they have to go pay for access to all that possible data, they're going to be much more directive in the small number of things they can investigate. That is, the data they load will be determined by the questions they already know to ask. But we're really at the beginnings of a new kind of computational science, where you can have computers, which can process data so much faster than humans can, do a lot of the work of sorting through the data and looking for meaning. Humans build the models, and then we say, let's load in all the data and extract possible meaning from it.
That is really the core of this new field that is being called data science. How do we extract meaning from vast corpuses of data? It's been going on in science for a long time, but where it really caught fire as an economic engine is on the consumer internet, with companies like Google, or Facebook, or Amazon. They are using various kinds of sophisticated computational techniques to extract meaning from massive amounts of data. We are now on the cusp of being able to apply what we've learned there in so many other areas.
We have sessions here at Science Foo Camp on, for example, the MIT City Science initiative, and what you can learn from mining massive amounts of cell phone data or traffic data to figure out what works in cities and what doesn't. We have it in life sciences, and in a materials science session I was in earlier about predictive models for new kinds of materials that you could build. All of this is enabled by open data. People used to pull out the data and say, "Well, we're going to make this valuable." What's increasingly going to be valuable in the future are creative new algorithms, creative new ways to use open data in new application areas to improve outcomes.
That brings me to one of the biggest areas where we need to apply this thinking, which is in healthcare. In the US, we have a healthcare system that's creaking under its own weight and on the point of collapse, and yet, it's pretty clear that if we could do for healthcare what Google does for searching the web, you'd have this enormous revolution, because you start to be able to say, "Oh, yeah, that procedure works," or "These patients are driving most of the costs. Let's intervene." Or "This therapy, we've been saying it works only half the time. No. It actually works 100 percent of the time for half of the patients. Now that we understand their genome, we can figure out which patients it works for and use it only on those, and not use it on the others." Diagnostics starts to shift from something you do once at the front end of a health situation to something you're doing repeatedly throughout the process of treatment, so that you're seeing what works and modifying it. We're really at the beginning of a whole revolution in the way that data can be used in science.
I'm jumping from my economic ruminations to these ruminations about open data in science, but I think the connection here is that what starts with open data and what appears to be uneconomic and free actually is the foundation for the next generation of businesses. There's something that Clayton Christensen once called "The Law of Conservation of Attractive Profits." When something that used to be valuable becomes commoditized, something that's adjacent suddenly becomes valuable.
This is the thread that ties together my thinking about open source software and what I called "Web 2.0." I was fascinated with the parallels between commodity PC hardware and open source software. When IBM made PC hardware a commodity, Microsoft figured out how to make PC software proprietary and valuable. As the Internet and open-source software made software more of a commodity, companies like Google figured out how to make data and algorithms into something that was proprietary and very valuable. I think we're going to see the same thing in the world of open access.