How Much is that Journal Article in the Window?

On 9 January 2013, JSTOR, a bedrock resource for countless academic researchers around the globe, launched a new framework that will allow the unschooled masses limited access to a portion of its archive. The occasion marks the first time in the organization’s nearly twenty-year history that anyone with a web browser can view the full text of scholarly journal articles normally locked away behind an institutional paywall. Two days later, Aaron Swartz, facing criminal charges for allegedly exploiting the guest network at MIT to download millions of JSTOR articles, hanged himself in his apartment in Brooklyn.

Despite the close temporal proximity of these two events, there is probably no direct connection between them. JSTOR opened access to its entire collection of public domain articles in 2011 and had been working on a pilot of its new “Register & Read” program for some time. Swartz, who “faced millions of dollars in fines and decades in prison,” had a strong network of supporters but also a long history of depression. The eerie confluence of these two landmarks does, however, offer a chance to assess the current state of open access for professional academic work.

Swartz, who co-wrote the first iteration of the RSS protocol and was characterized as a “hacktivist” by the press, was indicted in the summer of 2011 on multiple felony counts. The indictment, as well as the alleged criminal activity, is complex and technical and best summarized elsewhere. But, in essence, Swartz was charged with infiltrating a wiring closet on the MIT campus and using a laptop and a script to download large swaths of JSTOR content. Or, as government agents put it, he used computers “to steal…millions of articles.” (An exact line between merely downloading a bunch of articles and stealing them was not established in the indictment.) Swartz had a penchant for scooping large data sets and making them freely available for algorithmic analysis, among other things. His role in liberating taxpayer-funded court documents several years ago resulted in an FBI investigation. It would be difficult to overestimate the significance of such data for scholars and researchers. Access to similarly massive data sets has fueled several high-profile research initiatives, including the recent Culturomics project. These attempts at comprehensive, macroscopic interpretation, which stretch back to the Cliometrics fad of the 1960s, have benefits and limitations that I have discussed in other contexts. But Swartz’s intentions (so far as we can know them now) to expand access to critically important data seem laudable.

JSTOR, a modest non-profit organization, is acutely conscious of its public role as gatekeeper of valuable knowledge. It waives or reduces its access fees for certain territories, including all of Africa. Its alumni access program puts it miles ahead of other highly restricted scholarly databases. It also provides a special platform for running large-scale, algorithmic experiments on its entire corpus of academic material. The organization declined to prosecute Swartz for his shenanigans and discouraged the government from taking action against him (you can read JSTOR’s courteous memorial here). And it now offers limited access to a portion of its content for free. Although free access is restricted to three articles every two weeks and does not include material published within the last three to five years, anyone with a browser and a curious mind can peruse the complete back catalog of the American Historical Review, the William and Mary Quarterly, or any one of thousands of prestigious journals. This is an enormous step forward for democracy, the open web, and the diffusion of human wisdom. Ironically, Swartz’s actions tended, in the short term, to have the opposite effect. After he exploited public library access to the PACER system to liberate millions of court records, federal officials decided to close down their public library program. Likewise, his automated requests for journal content allegedly crashed JSTOR’s servers and resulted in prolonged outages on the MIT network, during which researchers were locked out of the material they needed.

Whatever his personal faults, Swartz and his ilk make a compelling case against the paywall model for the dissemination of scholarly material. Authors and editors contribute content to academic journals largely for free. They receive no payment in advance for their labor. Nor do they receive royalties or any other mode of compensation when their content is published and viewed. In many cases, and I include myself as the author of numerous journal articles and reviews, our research is funded either directly or indirectly by the public and we want our results to be distributed and read by as many people as possible. And yet, as Nick Shockey and Jonathan Eisen point out in the above video, the average subscription rate for a single academic journal (spread across fifteen different disciplines) is over $1,000 per year. Some journals range into quintuple digits for a yearly subscription. Those inflated charges do not go to the authors, editors, or even JSTOR, but accrue directly to the publisher. And exactly what added value the publishing house provides to the journal in exchange for this windfall is not entirely clear.

Most scholarly journals these days provide some form of copyright proviso for “self-archiving,” whereby authors can post an earlier, pre-copy-edited, pre-peer-reviewed, or pre-typeset iteration of their scholarship for free. But the process is not always straightforward. Although there have been some valiant attempts to clarify and centralize the procedures involved in self-archiving and open access (the SHERPA/RoMEO database is an excellent example), the details can be confusing or difficult to manage. And even after wading through a dizzying array of policies and procedures and reverse edits, there is not always a clear path to self-publication. While some colleges and universities provide a managed space for faculty publications, graduate students, adjuncts, and independent scholars do not always have the time or the skill to launch and maintain their own Apache server or to ferret out which of the many third-party digital repositories is best suited for depositing their academic work. And what happens when a hosting service goes down or you move to a new institution and have to begin the process all over again? What about all of the authors who are unable or unwilling to format, upload, and promote their material? Since scholarship (at its best) is accumulative and progressive, access to a single article without any of its references or antecedents can be like trying to make sense of a jigsaw puzzle using only one piece.

Organizations such as JSTOR that negotiate with publishers to collect, organize, and facilitate access to scholarly material perform a tremendous public service. They offer an imperfect solution to a thorny problem. And that problem, as even Swartz’s critics realize, is a dysfunctional publication model that does more to lock away knowledge than to enable access to it. Subscription costs have gotten so far out of control that last year Harvard took the unusually bold step of asking faculty “to make their research freely available through open access journals and to resign from publications that keep articles behind paywalls.” Moving beyond scholarly journal publishing to the world of digital primary source material, the problem becomes even more severe. Newspaper banks such as Readex and multi-modal databases such as Slavery and Anti-Slavery (to name just two that I use on a regular basis) charge libraries tens of thousands of dollars in annual subscription fees for ongoing access to collections that consist almost entirely of public domain documents. Most of these databases are too embarrassed to post their subscription fees on their websites, so I have to admit that my data in this regard was gleaned from anecdotal conversations with archivists and librarians.

The disarray and confusion over journal articles mirror, to a small extent, the ongoing content wars being waged by major media conglomerates around the world. Until very recently, the principal tactics used by music and movie studios concerned by the proliferating amount of digital content were ham-fisted lawsuits against 9-year-old girls and draconian legislation. The response to Swartz, who faced a fine of $1 million and 35 years in prison for downloading scholarly articles, appears to fit this mold. Since there was little evidence of any malicious criminal behavior, the severity of the charges suggests that the government wanted to send a terrifying message to would-be hacktivists. Indeed, Lawrence Lessig makes a compelling case that prosecutorial bullying contributed directly to Swartz’s death. At the same time, beginning with iTunes and Hulu, content producers have begun to shift from criminalizing consumers of their content to enticing them with free samples and affordable subscription plans. JSTOR’s bid to open up a segment of its archive in order to gather more information about its users and invite them to explore further seems to follow in this vein – making it perhaps the Hulu of digital scholarly content.

I am not persuaded that this model is the best for scholarship. What works well for Hollywood and TV producers will not necessarily work equally well for academics. Perhaps the latter group have something to learn from musicians who have eschewed top-down publication models for a more grassroots, social media approach to production and distribution, or those like Trent Reznor who combine the two. What is clear is that a variety of open access models need to be explored, innovated, and challenged. The future that Aaron Swartz stood, and eventually died, for is predicated on the free flow of information – especially when that information is funded by the public and is meant to serve the public good. It is a future where the partial unlocking of JSTOR can be celebrated as a crucial step forward and at the same time lambasted as radically incomplete.

UPDATE: Researchers have memorialized Swartz over the past few days by tweeting copies of their papers and by helping others locate open access repositories and upload PDFs. Others have started posting and sharing content from JSTOR. The folks at Archive Team have established a cheeky, and potentially legally dubious, Aaron Swartz Memorial JSTOR Liberator. While not a long-term solution, these tributes show an energy and urgency that demonstrate a real momentum for change. Jonathan Eisen provides further details.

Meanwhile, a group of sixty French academics has published a powerful open access manifesto covering humanistic as well as scientific research. The statement points to the critical role of open access in bridging the global digital divide. “Knowledge behind barriers, which only the happy few working in the richest universities can access,” the authors argue, “is barren knowledge.” An English translation is available here.