Category Archives: Site Reviews

History Leaks

I am involved in a new project called History Leaks. The purpose of the site is to publish historically significant public domain documents and commentaries that are not available elsewhere on the open web. The basic idea is that historians and others often digitize vast amounts of information that remains locked away in their personal files. Sharing just a small portion of this information helps to increase access and draw attention to otherwise unknown or underappreciated material. It also supports the critically important work of archives and repositories at a time when these institutions face arbitrary cutbacks and other challenges to their democratic mission.

I hope that you will take a moment to explore the site and that you will check back often as it takes shape, grows, and develops. Spread the word to friends and colleagues. Contributions are warmly welcomed and encouraged. Any feedback, suggestions, or advice would also be of value. A more detailed statement of purpose is available here.

Globalizing the Nineteenth Century

Nineteenth-century Americans viewed themselves through an international lens. Among the most important artifacts of this global consciousness is William Channing Woodbridge’s “Moral and Political Chart of the Inhabited World.” First published in 1821 and reproduced in various shapes and sizes in the decades prior to the Civil War, Woodbridge’s chart was a central and popular component of classroom instruction. I use it in my research and teaching. It forms a key part of my argument about the abolitionist encounter with Africa. And every time I look at it, I see something new or unexpected.

Like basketball and jazz, the moral chart is an innovation unique to the United States. The earliest iterations depart from the Eurocentric and Atlantic focus with which modern readers are most familiar. Reflecting the early American obsession with westward expansion, they gaze out over the Pacific Ocean to East Asia and the Polynesian Islands. The chart features a plethora of statistical and critical data. Nations and territories are ranked according to their “Degrees of Civilisation,” form of government, and religion. Darker colored regions are “savage” or “barbarous” while rays of bright light pour out from the Eastern United States and Northern Europe.

Thematic mapping of this sort was nothing radically new. John Wyld’s “Chart of the World Shewing the Religion, Population and Civilization of Each Country,” published in London in 1815, graded national groups on a progressive scale, from I to V. Wyld gave his own country a V and the United States a mix of I, II, and IV. Woodbridge may have been inspired by this example, but he also took it to a new level. Drawing on the climatological charts developed by German explorer Alexander von Humboldt, he used complex shading and mathematical coordinates to give an air of scientific precision. And he placed the United States on a civilized par with Europe. Given its sophisticated detail and colorful imagery, it is easy to see why Woodbridge’s chart became a runaway success. Comparing its pattern of light and darkness with recent NASA maps of the global electrical grid is deeply disturbing.

Countless men and women stared at similar maps and reports from foreign lands and dreamed and imagined and schemed about their futures. Some experienced dramatic revelations. Visiting friends in 1837, itinerant minister Zilpha Elaw heard the voice of God: “I have a message for her to go with upon the high seas and she will go.” Others were simply bored. Prior to his arrival in Monrovia that same year, medical student David Francis Bacon daydreamed about Africa, “torrid, pestilential, savage, mysterious.” George Thompson, a prisoner in Missouri in the 1840s, read articles from the Union Missionary aloud to his fellow inmates. “We quickly pass from Mendi to Guinea, Gaboon, Natal, Ceylon, Bombay, Madura, Siam, China, Palestine, Turkey, The Islands, the Rocky Mountains, Red Lake,” he wrote in his journal, “from tribe to tribe – from nation to nation – from continent to continent, and round the world we go.”

Woodbridge’s chart and others like it inspired a slew of “moral maps” illustrated by antislavery activists, in which the slave states were usually colored the darkest black. One of the most explicit, published by British ophthalmologist John Bishop Estlin, used blood red to symbolize the “blighting influence” of the South oozing out into the rest of the country. An 1848 broadside showed slavery poised to swallow the entire hemisphere, from Cuba to Central America to the Pacific Rim. Another used a black arrow to trace the “curse of slavery” from Virginia to war, treason, murder, and hell (which is located in Texas). The most famous of the Woodbridge descendants were the elaborate “free soil” charts and diagrams used in electoral campaigns. Crammed with statistics correlating slaveholding with illiteracy and political tyranny, these charts became crucial organizing tools both before and during the Civil War.

The most unusual map I unearthed in the course of my research reversed the logic of the typical moral chart by shining a bright light on the African continent. Published by the American Anti-Slavery Society in 1842 and reprinted many times thereafter, this map reveals the movement’s Afrocentric global vision. Europe and North America recede into darkness as Africa takes center stage. The United States, flanked by the term SLAVERY, is almost falling off the map at the edge of the world. Most editions coupled this image with a moral map of the U.S. South, which colored the slaveholding states, and even the waterways surrounding them, as darkly savage, the lowest of the low on the Woodbridge scale. The juxtaposition of these two images significantly complicates historians’ assumptions about Africa as “the dark continent.” We now know that the human race, language, culture, and civilization all began in Africa, and comparably Afrocentric views were not uncommon in the middle decades of the nineteenth century. Contemporary ideas about African cultures were complex and often mixed condescension with respect. Most surprising of all, I know of no historian who has given sustained attention to this map. With the exception of outstanding books by Martin Brückner and Susan Schulten, I know of few historians who have engaged the legacies of William Woodbridge’s various moral charts.

The past five or ten years have witnessed an explosion of scholarship on the global dimensions of American history and the birth of a new field, sometimes referred to as “The United States in the World.” Nineteenth-century history is very much a part of this trend, but progress has been slow and uneven. The nineteenth century was America’s nationalist century, with the Civil War serving as its fulcrum in both classrooms and books. Perhaps understandably, there is a tendency to look inward during times of national crisis. Yet as I and others have argued, nationalism, like racism, sexism, classism, and other related isms, is a fundamentally international process. Woodbridge’s Moral and Political Chart is the perfect example. Simultaneously nationalist and international, it depicts the United States embedded in a world of turmoil and change. Two recent conferences in South Carolina and Germany are evidence of a rising momentum that seeks to re-situate the U.S. Civil War era as part of a much broader global conflict. But a great deal of work remains to be done.

To get a sense of where the field is heading, its strengths as well as its weaknesses, it is necessary to map the terrain. To my knowledge, no one has attempted an organized and comprehensive database of the rapidly growing literature on the international dimensions of nineteenth-century American history. So, not too long ago, I launched a Zotero library to see what could be done. Because it is based on the bibliography for my dissertation, it is decidedly biased and impressionistic. Aside from brilliant entries by Gerald Horne and Robert Rosenstone, the Pacific World and Asia are underrepresented. The same could be said for Mexico and the rest of Latin America. Since the nineteenth century, like all historical periods, is essentially an ideological construction, I have been flexible with the dates. I think anything from the early national period (circa 1783) through U.S. entry into World War I (circa 1917) should be fair game. This roughly corresponds to the limits set a decade ago by C. A. Bayly, although he is not chiefly concerned with the United States. I also subdivided the material by publication medium (book, chapter, article, dissertation, etc.). This system can and probably should be refined in the future to allow sorting by geographic focus and time frame.
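The refinement suggested above, sorting by medium and by time frame, can be sketched in a few lines of Python. This is a minimal illustration, not how the Zotero group is actually organized: the record fields (`medium`, `covers_from`, `covers_to`) and the sample entries are hypothetical placeholders, and the 1783 to 1917 window simply encodes the flexible dates described here. An entry is kept if the years it covers overlap the window at all.

```python
from collections import defaultdict

# Flexible window for "the nineteenth century": roughly the early national
# period (c. 1783) through U.S. entry into World War I (c. 1917).
PERIOD_START, PERIOD_END = 1783, 1917


def overlaps_period(entry, start=PERIOD_START, end=PERIOD_END):
    """True if the span of years an entry covers overlaps the window at all."""
    return entry["covers_from"] <= end and entry["covers_to"] >= start


def group_by_medium(entries):
    """Group in-scope entries by publication medium (book, chapter, article, ...)."""
    groups = defaultdict(list)
    for entry in entries:
        if overlaps_period(entry):
            groups[entry["medium"]].append(entry["title"])
    return dict(groups)


# Hypothetical placeholder records, not real bibliography entries.
library = [
    {"title": "Example book", "medium": "book", "covers_from": 1820, "covers_to": 1865},
    {"title": "Example article", "medium": "article", "covers_from": 1898, "covers_to": 1920},
    {"title": "Example dissertation", "medium": "dissertation", "covers_from": 1700, "covers_to": 1760},
]

print(group_by_medium(library))
```

A geographic focus could be handled the same way, by adding a hypothetical `region` key to each record and filtering on it before grouping.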

Zotero is admired by researchers and teachers alike. Over the past seven years, it has developed a robust set of features, including the ability to collaborate on group projects. The Zotpress plugin for WordPress, which generates custom citations for blog posts, is another really neat addition. As a content management system, it still has its flaws. The web interface can be sluggish for lower bandwidth users, and compared to Drupal or Omeka, the member roles and permissions are downright archaic. If an admin wants a user to be able to create content but not edit or delete other users’ content, for example, there is no real solution. Admins are able to close membership, so that users must request an invitation to join the group. This allows tight control over the community of contributors. But it arguably kills a good deal of the spontaneity and anonymity that energizes the most successful crowdsourcing experiments. At the same time, the Zotero API and its various branches are fully open source and customizable, so I really can’t complain.

The biggest problem is the uneven quality of metadata on the semantic web. Primarily a browser plugin, Zotero allows users to surf to a site, book, or journal article and add that item to their bibliography with a single click. Sites do not always have the best metadata, however, so manual fixes are usually required. Several of the books I added from Google Books had an incorrect publication date. Others had very little or no descriptive data at all. Without delving into complicated debates about GRDDL or Dublin Core, I will just say that a catalog is only as good as its metadata. None of this has anything to do with Zotero, of course, which still gives the 3×5 index card a run for its money.
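A simple script can at least catch the most obvious gaps before they spread through a shared library. The sketch below assumes each item is a plain dictionary with `title`, `creators`, and `date` fields, loosely modeled on, but not identical to, Zotero’s own item data; the checks (missing fields, unreadable dates, dates in the future) are illustrative heuristics. It cannot detect a plausible but wrong date, such as a reprint year attached to a first edition, which still requires a human eye.

```python
import re
from datetime import date

# Assumed minimal fields; real Zotero items carry many more.
REQUIRED_FIELDS = ("title", "creators", "date")


def metadata_problems(item, this_year=None):
    """Return a list of human-readable problems found in one bibliography item."""
    if this_year is None:
        this_year = date.today().year
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not item.get(field)]
    raw_date = item.get("date", "")
    if raw_date:
        # Look for a four-digit year anywhere in the date string.
        match = re.search(r"\b(\d{4})\b", raw_date)
        if match is None:
            problems.append(f"unreadable date: {raw_date!r}")
        elif int(match.group(1)) > this_year:
            problems.append(f"date in the future: {raw_date!r}")
    return problems


# Hypothetical records, including the kind of gaps a one-click import can leave.
items = [
    {"title": "Example book", "creators": ["Doe, Jane"], "date": "2009"},
    {"title": "Example scan", "creators": [], "date": "n.d."},
    {"title": "", "creators": ["Roe, Richard"], "date": "2099-01-01"},
]
for item in items:
    print(item["title"] or "(untitled)", metadata_problems(item, this_year=2013))
```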

Although I admit I am not a heavy user, Zotero struck me as the ideal platform for an historiographical potluck. My Nineteenth-Century U.S. History in International Perspective group is now live. Anyone can view the library, and anyone who signs on as a member can add and edit information (I just ask that members not delete others’ content or make major changes without consulting the group). As of right now, I have not added any substantive notes to the source material. But it might be neat to do this and compile the database as an annotated bibliography. I will try to update the library as I’m able. At the very least, it will be an interesting experiment. A large part of the battle for history is just knowing what material is out there.

Cross-posted at The Historical Society

Follow the Money

This Wednesday, the Institute on Assets and Social Policy at Brandeis University released a new study showing that the wealth gap between white and black households has nearly tripled over the past 25 years. From 1984 to 2009, the median net worth of white families rose to $265,000, while that of black families remained at just $28,500. This widening disparity is not due to individual choices, the authors discovered, but to the cumulative effect of “historical wealth advantages” as well as past and ongoing discrimination. It does not take a rocket scientist to realize that wealth generates more wealth and that centuries of unpaid labor – from chattel slavery to the chain gang – have given white families a greater reserve of inherited equity.

The very same day, 3,000 miles to the east, a team of researchers at University College London launched a major new database entitled Legacies of British Slave-ownership. At its heart is an encyclopedia “containing information about every slave-owner in the British Caribbean, Mauritius or the Cape at the moment of abolition in 1833.” Beyond this, the database records how much individual slaveholders received as compensation for their human property and hints as to what they did with their money. The results illustrate the tremendous significance of slave-generated wealth for the British economic and political elite. The families of former Prime Minister William Gladstone and current Prime Minister David Cameron, for example, were direct beneficiaries. At the same time, the site makes it possible to trace many of the smaller-scale slaveholders scattered throughout the empire and to speculate about the impact of all that capital accumulation. Although still in its early stages, the site promises to be an outstanding resource for digital research and teaching.

In part because it is so new, the level of detail in the database can be uneven. Some individuals have elaborate biographies and reams of supporting material. Others have an outline sketch or a placeholder. To help correct this, the authors welcome new information from the public. All of the biographies must have taken a tremendous amount of time and effort to compile, and all claims are meticulously documented with links to both traditional and online sources. While there are few images and maps at this stage, the site features an excellent short essay that helps to place the project and its raw data in historical context. The focus is almost entirely on metropolitan Britain, and there is good reason for this. Nearly half of the £20 million paid to former slaveholders went directly to absentee planters residing in the homeland. Still, it might be useful to place this information in wider perspective.

A significant number of nineteenth-century emancipations involved some sort of compensation to erstwhile slaveholders or their agents. Throughout the Atlantic World, abolitionists occasionally raised funds to liberate individual slaves. This was how celebrity authors, such as Frederick Douglass, Harriet Jacobs, and Juan Francisco Manzano, acquired their free papers. In some cases, enslaved families were required to pay slaveholders directly. Under Connecticut’s gradual emancipation law, for example, male slaves born after a certain date were mandated to work for free until their 25th birthday (unless, of course, their enslavers attempted to smuggle them to the South beforehand). Even Haiti, which successfully abolished slavery while fighting off multiple European invasions, was extorted into a massive reparations payment to its former colonial masters, helping to generate a cycle of debt and poverty that continues to this day.

The United States Civil War is something of an exception in this regard. Although slaveholders in Washington, D.C., received government compensation when the District abolished slavery in 1862, rebel slaveholders received little in exchange for the loss of their human property, thanks to the logic of the war, the actions of abolitionists, and above all the determination of the enslaved. According to recent estimates, that property was among the most valuable investments in the nation. By 1860, the aggregate value of all slaves was in the neighborhood of $10 trillion (in 2011 dollars), or 70% of current GDP. The sudden loss of this wealth represents what is very likely the most radical and widespread seizure of private capital until the Russian Revolution of 1917. But even in this case, emancipated slaves were left to fend for themselves, their pleas for land largely unanswered.

Although there have been a number of successful attempts to trace the influence of slavery within American institutions, especially universities and financial firms, the haphazard and piecemeal nature of emancipation left no comprehensive record. And this is what makes the compensation windfall included in the British Abolition Act of 1833 so fascinating. By scouring government records, researchers have been able to construct a fairly accurate picture of slavery beneficiaries and to trace their influence across a range of activities – commercial, cultural, historical, imperial, physical, and political. A cursory glance at the data reveals 222 politicians and 459 commercial firms among the recipients. A targeted search for railway investments yields over 500 individual entries totaling hundreds of thousands of pounds. According to the database, over 150 history books and pamphlets were made possible, at least in part, by slavery profits. That a sizable chunk of nineteenth-century historiography, as well as its modern heirs, owes its existence to the blood, sweat, and tears of millions of slaves is extremely consequential. And this fact alone deserves careful attention by every practicing historian.

Slaveholder compensation, which equals about £16.5 billion or $25 billion in present terms, was seen as a necessary measure for social stability. The British planter class was deemed, in short, too big to fail. The funds, as Nicholas Draper explains, were provided by a government loan. And it is worth noting that this loan was repaid in large part through sugar duties, protectionist tariffs that drove up the price of imported goods. Since the poorest Britons relied on the cheap calories provided by sugar, they bore a disproportionate share of the cost. Meanwhile, former slaves were coerced into an “apprenticeship” system for a limited number of years, during which they would provide additional free labor for their erstwhile owners. So the wealth generated by this event, if you’ll pardon the dry economic jargon, was concentrated and regressive, taking from the poor and the enslaved and giving to the rich.

As its authors point out, the encyclopedia of British slaveholders carries interesting implications for the reparations debate. Although it does not dwell on this aspect, the site also carries significance for the ongoing historical debate about the relationship between capitalism and slavery. Recent work by Dale Tomich, Anthony Kaye, and Sven Beckert and Seth Rockman has placed nineteenth-century slavery squarely at the center of modern capitalism. While historians may quibble about the specifics, it is clear that the profits of slavery fueled large swaths of what we now call the Industrial Revolution and helped propel Great Britain and the United States into the forefront of global economic development. The database makes it possible, really for the first time, to glimpse the full extent of that impact.

Legacies of British Slave-ownership is refreshingly honest about the limitations of its data. Unlike most digital history projects of which I am aware, the authors have engaged their critics directly. One critique is that the project team is white and that the project focuses largely on the identities of white slaveholders. Yet, as the authors point out, it is difficult to relate the experience of the enslaved in a vacuum, hermetically sealed and separate from the actions and reactions of their oppressors. If I have learned anything from my study of the subject, it is that it is impossible to understand the history of slavery apart from the history of abolition, and it is impossible to understand the history of abolition apart from the history of slavery. The two are fundamentally intertwined.

So what about the other side to this story? What about all the slaves and abolitionists who called for immediate, uncompensated emancipation? What about the alternative visions they called into being through their actions and their imaginations? What about the different models they offered, however flawed or fleeting, for a world without slaveholders?

Writing to his “Old Master” in the summer of 1865, in one of the great masterworks of world literature, Jordan Anderson gave his thoughts on the matter:

I served you faithfully for thirty-two years, and Mandy twenty years. At twenty-five dollars a month for me, and two dollars a week for Mandy, our earnings would amount to eleven thousand six hundred and eighty dollars. Add to this the interest for the time our wages have been kept back, and deduct what you paid for our clothing, and three doctor’s visits to me, and pulling a tooth for Mandy, and the balance will show what we are in justice entitled to. Please send the money by Adams’s Express, in care of V. Winters, Esq., Dayton, Ohio.

Anderson’s descendants, in Ohio and elsewhere, are still waiting.

How Much is that Journal Article in the Window?

On 9 January 2013, JSTOR, a bedrock resource for countless academic researchers around the globe, launched a new framework that will allow the unschooled masses limited access to a portion of its archive. The occasion marks the first time in the organization’s nearly twenty-year history that anyone with a web browser can view the full text of scholarly journal articles normally locked away behind an institutional paywall. Two days later, Aaron Swartz, facing criminal charges for allegedly exploiting the guest network at MIT to download millions of JSTOR articles, hanged himself in his apartment in Brooklyn.

Despite their close temporal proximity, there is probably not a direct connection between these two events. JSTOR opened access to its entire collection of public domain articles in 2011 and had been working on a pilot of its new “Register & Read” program for some time. Swartz, who “faced millions of dollars in fines and decades in prison,” had a strong network of supporters, but a long history of depression. The eerie confluence of these two landmarks does, however, offer a chance to assess the current state of open access for professional academic work.

Swartz, who helped write an early version of the RSS specification and was characterized as a “hacktivist” by the press, was indicted in the summer of 2011 on multiple felony counts. The indictment, as well as the alleged criminal activity, is complex and technical and best summarized elsewhere. But, in essence, Swartz was charged with infiltrating a wiring closet on the MIT campus and using a laptop and a script to download large swaths of JSTOR content. Or, as government agents put it, he used computers “to steal…millions of articles.” (An exact line between merely downloading a bunch of articles and stealing them was not established in the indictment.) Swartz had a penchant for scooping up large data sets and making them freely available for algorithmic analysis, among other things. His role in liberating taxpayer-funded court documents several years ago resulted in an FBI investigation. It would be difficult to overestimate the significance of such data for scholars and researchers. Access to similarly massive data sets has fueled several high-profile research initiatives, including the recent Culturomics project. These attempts at comprehensive, macroscopic interpretation, which stretch back to the Cliometrics fad of the 1960s, have benefits and limitations that I have discussed in other contexts. But Swartz’s intentions (so far as we can know them now) to expand access to critically important data seem laudable.

JSTOR, a modest non-profit organization, is acutely conscious of its public role as gatekeeper of valuable knowledge. It waives or reduces its access fees for certain territories, including all of Africa. Its alumni access program puts it miles ahead of other highly restricted scholarly databases. It also provides a special platform for running large-scale, algorithmic experiments on its entire corpus of academic material.  The organization declined to prosecute Swartz for his shenanigans and discouraged the government from taking action against him (you can read JSTOR’s courteous memorial here). And it now offers limited access to a portion of its content for free. Although free access is restricted to three articles every two weeks and does not include material published within the last three to five years, anyone with a browser and a curious mind can peruse the complete back catalog of the American Historical Review, the William and Mary Quarterly, or any one of thousands of prestigious journals. This is an enormous step forward for democracy, the open web, and the diffusion of human wisdom. Ironically, Swartz’s actions tended, in the short term, to have the opposite effect. After he exploited public library access to the PACER system to liberate millions of court records, federal officials decided to close down their public library program. Likewise, his automated requests for journal content allegedly crashed JSTOR’s servers and resulted in prolonged outages on the MIT network, during which researchers were locked out of the material they needed.

Whatever his personal faults, Swartz and his ilk make a compelling case against the paywall model for the dissemination of scholarly material. Authors and editors contribute content to academic journals largely for free. They receive no payment in advance for their labor. Nor do they receive royalties or any other mode of compensation when their content is published and viewed. In many cases (and I include myself here, as the author of numerous journal articles and reviews), our research is funded either directly or indirectly by the public, and we want our results to be distributed and read by as many people as possible. And yet, as Nick Shockey and Jonathan Eisen point out in their video on open access, the average subscription rate for a single academic journal, across fifteen different disciplines, is over $1,000 per year. Some journals run to five figures for a yearly subscription. Those inflated charges do not go to the authors, editors, or even JSTOR, but accrue directly to the publisher. And exactly what added value the publishing house provides to the journal in exchange for this windfall is not entirely clear.

Most scholarly journals these days provide some form of copyright proviso for “self-archiving,” whereby authors can post an earlier, pre-copy-edited, pre-peer-reviewed, or pre-typeset iteration of their scholarship for free. But the process is not always straightforward. Although there have been some valiant attempts to clarify and centralize the procedures involved in self-archiving and open access (the SHERPA/RoMEO database is an excellent example), the details can be confusing or difficult to manage. And even after wading through a dizzying array of policies and procedures and reverse edits, there is not always a clear path to self-publication. While some colleges and universities provide a managed space for faculty publications, graduate students, adjuncts, and independent scholars do not always have the time or the skill to launch and maintain their own Apache server or to ferret out which of the many third-party digital repositories are the best places to deposit their academic work. And what happens when a hosting service goes down or you move to a new institution and have to begin the process all over again? What about all of the authors who are unable or unwilling to format, upload, and promote their material? Since scholarship (at its best) is cumulative and progressive, access to a single article without any of its references or antecedents can be like trying to make sense of a jigsaw puzzle using only one piece.

Organizations such as JSTOR that negotiate with publishers to collect, organize, and facilitate access to scholarly material perform a tremendous public service. They offer an imperfect solution to a thorny problem. And that problem, as even Swartz’s critics realize, is a dysfunctional publication model that does more to lock away knowledge than to enable access to it. Subscription costs have gotten so far out of control that last year Harvard took the unusually bold step of asking faculty “to make their research freely available through open access journals and to resign from publications that keep articles behind paywalls.” Moving beyond scholarly journal publishing to the world of digital primary source material, the problem becomes even more severe. Newspaper banks such as Readex and multi-modal databases such as Slavery and Anti-Slavery (to name just two that I use on a regular basis) charge libraries tens of thousands of dollars in annual subscription fees for ongoing access to collections that consist almost entirely of public domain documents. Most of these databases are too embarrassed to post their subscription fees on their websites, so I have to admit that my data in this regard was gleaned from anecdotal conversations with archivists and librarians.

The disarray and confusion over journal articles mirrors, to a small extent, the ongoing content wars being waged by major media conglomerates around the world. Until very recently, the principal tactics used by music and movie studios concerned by the proliferating amount of digital content were ham-fisted lawsuits against 9-year-old girls and draconian legislation. The response to Swartz, who faced a fine of $1 million and 35 years in prison for downloading scholarly articles, appears to fit this mold. Since there was little evidence of any malicious criminal behavior, the severity of the charges suggests that the government wanted to send a terrifying message to would-be hacktivists. Indeed, Lawrence Lessig makes a compelling case that prosecutorial bullying contributed directly to Swartz’s death. At the same time, starting with iTunes and Hulu, content producers have begun to shift from criminalizing consumers of their content to enticing them with free samples and affordable subscription plans. JSTOR’s bid to open up a segment of its archive in order to gather more information about its users and invite them to explore further seems to follow in this vein, making it perhaps the Hulu of digital scholarly content.

I am not persuaded that this model is the best for scholarship. What works well for Hollywood and TV producers will not necessarily work equally well for academics. Perhaps academics have something to learn from musicians who have eschewed top-down publication models for a more grassroots, social media approach to production and distribution, or those like Trent Reznor who combine the two. What is clear is that a variety of open access models need to be explored, innovated, and challenged. The future that Aaron Swartz stood, and eventually died, for is predicated on the free flow of information, especially when that information is funded by the public and is meant to serve the public good. It is a future where the partial unlocking of JSTOR can be celebrated as a crucial step forward and at the same time lambasted as radically incomplete.

UPDATE: Researchers have memorialized Swartz over the past few days by tweeting copies of their papers and by helping others locate open access repositories and upload pdfs. Others have started posting and sharing content from JSTOR. The folks at Archive Team have established a cheeky, and potentially legally dubious, Aaron Swartz Memorial JSTOR Liberator. While not a long-term solution, the energy and urgency behind these tributes demonstrate a real momentum for change. Jonathan Eisen provides further details.

Meanwhile, a group of sixty French academics have published a powerful open access manifesto covering humanistic as well as scientific research. The statement points to the critical role of open access in bridging the global digital divide. “Knowledge behind barriers, which only the happy few working in the richest universities can access,” the authors argue, “is barren knowledge.” An English translation is available here.

Ahead in the Clouds

The Chronicle published a lengthy review article last week on the science of brain mapping. The article focuses on Ken Hayworth, a researcher at Harvard who specializes in mapping the brain’s neural connections (complete maps of these networks are called connectomes). Hayworth believes, among other things, that we will one day be able to upload and replicate an individual human consciousness on a computer. It sounds like a great film plot. Certainly, it speaks to our ever-evolving obsession with our own mortality. Whatever the value of Hayworth’s prediction, many of us are already storing our consciousness on our computers. We take notes, download source material, write drafts, save bookmarks, edit content, post blogs and tweets and status updates. No doubt the amount of our intellectual life that unfolds in front of a screen varies greatly from person to person. But there are probably not too many modern writers like David McCullough, who spends most of his time clacking away on an antique typewriter in his backyard shed.

Although I still wade through stacks of papers and books and handwritten notes, the vast majority of my academic work lives on my computer, and that can be a scary prospect. I have heard horror stories of researchers who lose years of diligent work in the blink of an eye. I use Carbon Copy Cloner to mirror all of my data to an external hard drive next to my desk. Others might prefer Time Machine (for Macs) or Backup and Restore (for Windows). But what if I lose both my computer and my backup? Enter the wide world of cloud storage. Although it may be some time before we can back up our entire neural net to the cloud, it is now fairly easy to mirror the complicated webs of source material, notes, and drafts that live on our computers. Services like Dropbox, Google Drive, SpiderOak, and SugarSync offer between 2 and 5 GB of free space and various options for syncing local files to the cloud and across multiple computers and mobile devices. Most include the ability to share and collaborate on documents, which can be useful in classroom and research environments.

These free services work great for everyday purposes, but longer research projects require more space and organizational sophistication. The collection of over 10,000 manuscript letters at the heart of my dissertation, which I spent three years digitizing, organizing, categorizing, and annotating, consumes about 30 GB, not to mention the reams of digital photos, PDFs, and TIFFs spread across dozens of project folders. It is not uncommon these days to pop into a library or an archive and snap several gigs of photos in a few hours. Whether this kind of speed-research is a boon or a curse is subject to debate. In any event, although they impose certain limits, ADrive, MediaFire, and Box (under a special promotion) offer 50 GB of free space in the cloud. Symform offers up to 200 GB if you contribute to their peer-to-peer network, but their interface is not ideal, and when I gave the program a test drive it ate up almost 90% of my bandwidth. If you are willing to pay an ongoing monthly fee, there are countless options, including JustCloud's unlimited backup. I decided to take advantage of the Box deal to back up my various research projects, and since the process was far from straightforward, I thought I would share my solution with the world (or add it to the universal hive mind).

Below are the steps I used to hack together a free, cloud-synced backup of my research. Although this process is designed to sync academic work, it could be modified to mirror other material or even your entire operating system (more or less). While these instructions are aimed at Mac users, the general principles should remain the same across platforms. I can make no promises regarding the security or longevity of material stored in the cloud. Although most services tout 256-bit SSL encryption, vulnerabilities are inevitable, and the ephemeral nature of the online market makes it difficult to predict how long you will have access to your files. The proprietary structure of the cloud and government policing efforts are critical issues that deserve more attention. Finally, I want to reiterate that this process is for those looking to back up a fairly large amount of material. For backups under 5 GB, it is far easier to use one of the free syncing services mentioned above.

Step 1: Sign up for Box (or another service that offers more than a few GB of cloud storage). I took advantage of a limited-time promotion for Android users and scored 50 GB of free space.

Step 2: Make sure you can WebDAV into your account. From the Mac Finder, click Go –> Connect to Server (or hit command-k). Enter "https://www.box.com/dav" as the server address. When prompted, enter the e-mail address and password that you chose when you set up your Box account. Your root directory should mount on the desktop as a network drive. Not all services offer WebDAV access, so your mileage may vary.
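If you would rather test the connection outside the Finder, a WebDAV server should answer a PROPFIND request with status 207 (Multi-Status) when your credentials are accepted. The sketch below is a minimal illustration using only Python's standard library; it assumes the Box endpoint quoted above still accepts HTTP Basic authentication with your e-mail and password, which may no longer be true. The function names are my own, not part of any Box API.

```python
import base64
import urllib.error
import urllib.request

# Endpoint as given in Step 2; an assumption that it is still live.
BOX_DAV_URL = "https://www.box.com/dav"

# Minimal PROPFIND body asking for all properties of the root collection.
PROPFIND_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<propfind xmlns="DAV:"><allprop/></propfind>'
)


def build_propfind_request(url, email, password):
    """Build a Depth: 0 PROPFIND request carrying HTTP Basic credentials."""
    token = base64.b64encode(f"{email}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={
            "Depth": "0",  # describe only the root folder, not its children
            "Content-Type": "application/xml; charset=utf-8",
            "Authorization": f"Basic {token}",
        },
    )


def can_connect(url, email, password, timeout=30):
    """Return True if the server answers PROPFIND with 207 Multi-Status."""
    request = build_propfind_request(url, email, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 207
    except OSError:  # covers URLError, HTTPError (e.g. 401), and timeouts
        return False
```

Calling can_connect(BOX_DAV_URL, "you@example.com", "your-password") should return True if the account is reachable over WebDAV; False means either bad credentials or no WebDAV access at all.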

Step 3: Install Transmit (or a similar client that allows synced uploads). The full version costs $34, which may be worth it if you decide you want to continue using this method. Create a favorite for your account and make sure it works. The protocol should be WebDAV HTTPS (port 443), the server should be www.box.com, and the remote path should be /dav. Since Box imposes a 100 MB limit for a single file, I also created a rule that excludes all files larger than 100 MB. Click Transmit –> Preferences –> Rules to establish what files to skip. Since only a few of my research documents exceeded 100 MB, I was fine depositing these with another free cloud service. I realize not everyone will be comfortable with this.
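Before setting up the size rule, it helps to know which files it will actually skip, since those need a home somewhere else. Here is a rough sketch in Python that walks a research folder and splits its files into those under the limit and those over it. Whether Box counts 100 MB as 100 × 1024 × 1024 bytes or 100,000,000 bytes is an assumption on my part, so adjust MAX_BYTES to match your service.

```python
import os

# Assumed per-file limit: 100 MB read as 100 * 1024 * 1024 bytes.
MAX_BYTES = 100 * 1024 * 1024


def split_by_size(root, max_bytes=MAX_BYTES):
    """Walk root and return (uploadable, too_large) lists of relative paths."""
    uploadable, too_large = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            # Files over the limit would be rejected by the server, so list
            # them separately for deposit elsewhere.
            if os.path.getsize(full) > max_bytes:
                too_large.append(rel)
            else:
                uploadable.append(rel)
    return sorted(uploadable), sorted(too_large)
```

Running split_by_size on your research folder and printing the second list gives you a checklist of the oversized files that Transmit's rule will leave behind.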

Step 4: Launch Automator and create a workflow that runs an upload through Transmit. Select "iCal Alarm" as your template and find the Transmit actions. Select the action named "Synchronize" and drag it to the right. You should now be able to enter your upload parameters. Select the favorite you created in Step 3 and add any rules that are necessary. Select "delete orphaned destination items" to ensure an accurate mirror of your local file structure, but make sure the Local Path and the Remote Path point to corresponding folders (your research folder and its counterpart on Box). Otherwise, the sync will delete or overwrite whatever is in the remote folder to make it match the local one, and create a mess. I also recommend disabling the option to "determine server time offset automatically."

Step 5: Save your alarm. This will generate a new event in iCal, in your Automator calendar (if you don’t have a calendar for automated tasks, the system should create one for you). Double-click the event to modify the timing. Set repeat to “every day” and adjust the alarm time to something innocuous, like 4am. Click “Done” and you should be all set.
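The repeat setting boils down to a simple rule: the next run is today at 4am if that time has not yet passed, otherwise tomorrow at 4am. The Python sketch below spells that rule out. It is only an illustration of the schedule, not how iCal computes alarms, and it ignores time zones and daylight saving changes.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, at: time = time(4, 0)) -> datetime:
    """Return the first datetime strictly after `now` whose clock time is `at`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed (or is right now), so use tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For example, a sync set up at 11:30pm would first run at 4am the following morning.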

Automator will launch Transmit every day at your appointed time and run a synchronization on the folder containing your research. The first time it runs, it should replicate the entire structure and contents of your folder. On subsequent occasions, it should only update those files that have been modified since the last sync. There is a lot that can go wrong with this particular workflow, and I did not include every contingency here, so please feel free to chime in if you think I’ve left out something important.
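To see why the first run copies everything and later runs only touch changed files, and why a mismatched Remote Path can wreck the remote folder, it may help to look at the decision a one-way mirror has to make. The Python sketch below compares two hypothetical snapshots (relative path mapped to size and modification time) and returns what to upload and what to delete. I do not know exactly how Transmit compares files; size plus modification time is my assumption, and it is also why a skewed server clock, the reason for disabling the automatic time offset in Step 4, can trigger unnecessary uploads.

```python
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical snapshot format: relative path -> (size in bytes, mtime in seconds).
Snapshot = Dict[str, Tuple[int, float]]


class SyncPlan(NamedTuple):
    upload: List[str]  # files that are new or modified locally
    delete: List[str]  # "orphaned" files that exist only on the remote side


def plan_sync(local: Snapshot, remote: Snapshot, delete_orphans: bool = True) -> SyncPlan:
    """Plan a one-way local-to-remote mirror by comparing size and mtime."""
    upload = []
    for path, (size, mtime) in local.items():
        remote_entry = remote.get(path)
        if remote_entry is None:
            upload.append(path)  # never uploaded before
            continue
        remote_size, remote_mtime = remote_entry
        if size != remote_size or mtime > remote_mtime:
            upload.append(path)  # changed since the last sync
    # Anything on the remote side with no local counterpart is an orphan.
    delete = sorted(set(remote) - set(local)) if delete_orphans else []
    return SyncPlan(sorted(upload), delete)
```

With an empty remote folder, every local file lands in upload, which is the first full run. If the Remote Path points at the wrong folder, every file already there has no local counterpart and lands in delete, which is exactly the mess described in Step 4.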

If, like me, you are a Unix nerd at heart, you can write a shell script to replicate most of this using something like cadaver or mount_webdav, rsync, and cron. I might post some more technical instructions later, but I thought I should start out with basic point-and-click. If you have any comments or suggestions – other cloud services, a different process, different outcomes – please feel free to share them.
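As a rough starting point for the command-line route, the Python sketch below builds an rsync command and a matching crontab entry. It assumes the Box share is already mounted as a local folder (for example through the Finder, as in Step 2); the paths are placeholders. The rsync flags are standard: -a preserves the folder tree and timestamps, --delete plays the role of "delete orphaned destination items," and --max-size skips files over the 100 MB limit. Writing files to a WebDAV mount with rsync can be slow, so treat this as a sketch rather than a tested replacement for the Transmit workflow.

```python
import shlex


def rsync_command(src: str, dest: str, max_size: str = "100M") -> list:
    """Build an rsync invocation that mirrors the contents of src into dest."""
    if not src.endswith("/"):
        src += "/"  # trailing slash: copy the folder's contents, not the folder itself
    return ["rsync", "-a", "--delete", f"--max-size={max_size}", src, dest]


def crontab_line(command: list, hour: int = 4, minute: int = 0) -> str:
    """Return a crontab entry that runs command every day at hour:minute."""
    return f"{minute} {hour} * * * {shlex.join(command)}"
```

For example, crontab_line(rsync_command("/Users/me/Research", "/Volumes/box/Research")) yields a line you could paste into crontab -e to run the mirror every day at 4am, with both paths standing in for your own.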

UPDATE: Konrad Lawson over at ProfHacker has posted a succinct guide to scripting rsync on Mac OS X. It’s probably better than anything I could come up with, so if you’re looking for a more robust solution and you’re not afraid of the command line, you should check it out.

Cross-posted at HASTAC