Scientific Knowledge Should Be Structured as "Literature"

In my view, what's most outmoded within science, most badly in need of retirement, is the way we structure and organize scientific knowledge itself. Academic literature, even as it moves online, is a relic of the era of typesetting, modeled on static, irrevocable, toothpaste-out-of-the-tube publication. Just as the software industry has moved from a "waterfall" process to an "agile" process—from monolithic releases shipped from warehouses of mass-produced disks to over-the-air differential updates—so must academic publishing move from its current read-only model and embrace a process as dynamic, up-to-date, and collaborative as science itself.

It amazes me how poorly the academic and scientific literature is configured to handle even retraction, even at its most clear-cut—to say nothing of subtler species like revision. It is typical, for example, that even when the journal editors and the authors fully retract a paper, the paper continues to be available at the journal's website, amazingly, without any indication that a retraction exists elsewhere, let alone on the same site, penned by the same authors and vetted by the same editor. (Imagine, for instance, if the FDA allowed a drug maker to continue manufacturing a drug known to be harmful, so long as they also manufactured a warning label—but were under no obligation to put the label on the drug.)

A subtler question is how and in what manner ("caveat lector"?) to flag studies that depend on the discredited study—let alone studies that depend on those studies.

Citation is the obvious first answer, though it's not quite enough. In academic journals, all citations attest to the significance of the works they cite, regardless of whether their results are being presumed, strengthened or challenged; even theories used as punching bags, for example, are accorded the respect of being worthy or significant punching bags.

But academic literature makes no distinction between citations merely considered significant and ones additionally considered true. What academic literature needs goes deeper than the view of citations as kudos and shout-outs. It needs what software engineers have used for decades: dependency management.

A dependency graph would tell us, at a click, which of the pillars of scientific theory are truly load-bearing. And it would tell us, at a click, which other ideas are likely to get swept away with the rubble of a particular theory. An academic publisher worth their salt would, for instance, not only be able to flag articles that have been retracted—that this is not currently standard practice is, again, inexcusable—but would be able to flag articles that depend in some meaningful way on the results of retracted work.

An academic publisher worth their salt would also accommodate another pillar of modern software development: revision control. Code repositories, like wikis, are living documents, open not only for scrutiny, censure and approbation, but for modification.

In a revision control system like Git (and its wildly successful open-source community on GitHub), users can create "issues" that flag problems and require the author's response, they can create "pull requests" that propose answers and alterations, and they can "fork" a repository if they want to steward their own version of the project and take it in a different direction. (Sometimes forked repositories serve a niche audience; sometimes they wither from neglect or disuse; sometimes they fully steal the audience and userbase from the original; sometimes the two continue to exist in parallel or continue to diverge; and sometimes they are reconciled and reunited downstream.) A Git repository is the best of top-down and bottom-up, of dictatorship and democracy: its leaders set the purpose and vision, have ultimate control and final say—yet any citizen has an equal right to complain, propose reform, start a revolt, or simply pack their bags and found a new nation next door.

The "Accept," "Reject," and "Revise and Resubmit" ternary is anachronistic, a relic of the era of metal type. Even peer review itself, with its anonymity and bureaucracy, may be ripe for reimagining. The behind-closed-doors, anonymous review process might be replaced, for instance, with something closer to a "beta" period. The article need not be held up for months—at least, not from other researchers—while it is considered by a select few. One's critics need not be able to clandestinely delay one's work by months. Authors need not thank "anonymous readers who spotted errors and provided critical feedback" when those readers' corrections are directly incorporated (with attribution) as differential edits. Those readers need not offer their suggestions as an act of obligation or charity, and they need not go unknown.

Some current rumblings of revolution seem promising. Wide circulation among academics of "working papers" challenges the embargo and lag in the peer review process. PLOS ONE insists on top-down quality assurance, but lets importance emerge from the bottom-up. Cornell's arXiv project offers a promising alternative to more traditional journal models, including versioning (and its "endorsement" system has since 2004 suggested a possible alternative to traditional peer reviews). However, its interface by design limits its participatory and collaborative potential.

On that front, a massive international collaboration via the Polymath Project website in 2013 successfully extended the work of Yitan Zhang on the Twin Primes Conjecture (and I understand that the University of Montreal's James Maynard has subsequently gone even further). Amazingly, this groundbreaking collaborative work was done primarily in a comment thread.

The field is crying out for better tools; meanwhile better tools already exist in the adjacent field of software development.

It is time for science to go agile.

The scientific literature, taken as content, is stronger than it's ever been—as, of course, it should be. As a form, the scientific literature has never been more inadequate or inept. What is in most dire need of revision is revision itself.