Random Language Generation, Part 1

Debate continues over the usefulness of the so-called Digital Humanities. See this skirmish in the Los Angeles Review of Books for a recent example. As a graduate student in the Department of English Language and Literature (and let this serve as my introduction to this blog–hello!), I often encounter skepticism about whether computational methods can reveal anything we don’t already know about texts. In many cases, I tend to agree.

But there’s a more obvious reason that scholars should be engaged with the digital: an increasing number of contemporary cultural objects are born digital. I’m talking about artists such as Jason Salavon, whose practice involves taking the mean average of a series of photographic portraits and displaying the results. Other artists in the MET’s “After Photoshop” exhibit (up until Spring 2013) are similarly worth checking out. Salavon’s tech notes on his “amalgamation” work are especially fascinating.

In literature, Nick Montfort’s “Taroko Gorge” is a truly born-digital creation. Written in Python and ported into JavaScript for the web, it has inspired a series of imitations (which Montfort hilariously strikes out in the hyperlinks to them in the right margin of the “Taroko Gorge” webpage). The poem implements a basic vocabulary and set of syntactical rules–then it simply runs them forever in random combinations. Or until you quit your browser window!

I leave it to the reader (or another post!) to take apart Montfort’s actual code.  What I want to suggest is that humanists can speak to a number of issues surrounding pieces like “Taroko Gorge.” Most obviously, there is the question of authorship. How can a poem that is different each time you load it “belong” to a unitary author? Is Montfort the author of the poem that is endlessly scrolling in my next tab, or does he merely possess some rights with respect to his code? And when others grab that code and just switch out the old words for some new ones, are they plagiarizing? Appropriating?

But “problematizing” the conception of “authorship” is, to my mind, the low-hanging fruit. FWIW, Montfort welcomes the remixes of his poem but prefers remixes of the code, rather than the vocabulary. As he puts it:

As I see it, these remixes say nothing about the poetic quality of my original program. However, they speak endlessly of the poetic potential of a page of code. I would be delighted, of course, to see many more remixes of “Taroko Gorge.” But it would be even better if it becomes impossible to discern that the new poetic programs being developed are related to my original program. It would be even better if ten other programs are written that work in completely different ways to generate poems, and if each of these are themselves remixed ten times. The idea of developing computational art in a short amount of time, in an hour or in a day – an idea that I’ve ripped off and remixed from many others – is what really deserves to get picked up and reworked, again and again.

This kind of random language generation has a history, distinct methods, and an ongoing social impact. Our work as humanists can only improve if we understand these processes and effects. As it so happens, I’m currently writing a random sentence generator for a linguistics seminar homework. Like Montfort, I’m also writing in Python. In today’s post, I’ll outline one step of random sentence generation. Subsequent posts will touch on the other components needed to create a simplified version of something like “Taroko Gorge.”

The first step is to implement a context-free grammar (CFG). Ideally, the grammar should follow the conventions of Chomsky Normal Form (CNF), named for work Noam Chomsky published in the late 1950s. But let’s leave aside CNF for the moment. Here are some sample rules from our CFG:

  1. ROOT --> S .
  2. S --> NP VP
  3. NP --> DET N
  4. VP --> V NP
  5. DET --> the | a
  6. N --> gorge | mountain | crag | stone
  7. V --> roams | sweeps

What this tells us is that we start out with a ROOT and can then generate an S (sentence) followed by a period. S expands to a Noun Phrase and a Verb Phrase. NP expands to a Determiner and a Noun; VP can become a Verb followed by another Noun Phrase. All the upper-case symbols are called non-terminals, because they can expand into other constituents (that is, they don’t “terminate” in the English words that will end up making the sentence).

Rules 5, 6, and 7 introduce the terminals. These are the words that will actually appear in the sentence (thus “terminating” the expansion of non-terminals). When we see a DET or determiner, we can select either “the” or “a,” and likewise for the nouns and verbs. So let’s start with ROOT and generate a pseudo-random sentence by expanding each non-terminal from left to right.

  1. S .
  2. NP VP .
  3. DET N VP .
  4. The N VP .
  5. The gorge VP .
  6. The gorge V NP .
  7. The gorge sweeps NP .
  8. The gorge sweeps DET N .
  9. The gorge sweeps a N .
  10. The gorge sweeps a crag .
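For readers who want to see the walk-through above in code, here is a minimal Python sketch. It is not Montfort’s code and not my homework solution; the names GRAMMAR and derive are my own placeholders, and I am assuming the seven sample rules can simply be typed into a dictionary by hand. It always expands the leftmost non-terminal, just as in the ten steps above.

```python
import random

# The seven sample rules, typed in by hand (an assumption for illustration).
# Each non-terminal maps to a list of alternatives; each alternative is a
# list of symbols. Anything that is not a key here counts as a terminal.
GRAMMAR = {
    "ROOT": [["S", "."]],
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"]],
    "DET": [["the"], ["a"]],
    "N": [["gorge"], ["mountain"], ["crag"], ["stone"]],
    "V": [["roams"], ["sweeps"]],
}


def derive(start="ROOT", rng=random):
    """Yield each step of a leftmost derivation as a list of symbols."""
    symbols = [start]
    while True:
        # Find the leftmost symbol that still has a rule (a non-terminal).
        index = next((i for i, s in enumerate(symbols) if s in GRAMMAR), None)
        if index is None:
            return  # only terminals remain, so the sentence is finished
        # Pick one alternative at random and splice it in place.
        expansion = rng.choice(GRAMMAR[symbols[index]])
        symbols = symbols[:index] + expansion + symbols[index + 1:]
        yield list(symbols)


if __name__ == "__main__":
    for number, step in enumerate(derive(), start=1):
        print(f"{number}. {' '.join(step)}")
```

Each run prints a numbered derivation in the same shape as the list above, ending in a different sentence depending on the random choices.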

Now obviously these rules are not complex enough to represent English. There’s no way to generate a prepositional phrase yet, or even an adjective. Much less things like singular-plural agreement between nouns and verbs. But by grabbing a few words from Montfort’s vocabulary and pairing them with very basic rules about English sentences, we can start to see how we might generate random language. Can you think of (or comment on!) other possible sentences our grammar can make at this early stage? 
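If you would rather let the machine answer that question, here is a small Python sketch that lists every sentence the sample grammar can produce. The names GRAMMAR and expansions are placeholders of mine, and the approach only works because this toy grammar has no recursive rules; a grammar that could loop back on itself would never finish enumerating.

```python
from itertools import product

# The seven sample rules again, typed in by hand (an assumption for
# illustration). Anything that is not a key is treated as a terminal.
GRAMMAR = {
    "ROOT": [["S", "."]],
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"]],
    "DET": [["the"], ["a"]],
    "N": [["gorge"], ["mountain"], ["crag"], ["stone"]],
    "V": [["roams"], ["sweeps"]],
}


def expansions(symbol):
    """Return every sequence of terminal words that `symbol` can become."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # a terminal expands only to itself
    results = []
    for alternative in GRAMMAR[symbol]:
        # Combine every way of expanding each symbol in this alternative.
        for parts in product(*(expansions(s) for s in alternative)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expansions("ROOT")]
print(len(sentences))  # 128
print(sentences[0])    # the gorge roams the gorge .
```

The count follows from the rules: 2 determiners times 4 nouns gives 8 noun phrases, 2 verbs times 8 noun phrases gives 16 verb phrases, and 8 times 16 gives 128 sentences.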

Next time, we’ll talk about storing a set of these rules in a file and then writing a program that loads them into a convenient in-memory data structure so that it can randomly select different expansion options.

The Rail Splitter Awakes

I wonder what bell hooks would say about Michonne’s “oppositional gaze”?

Longtime readers will remember that last year I pondered the connection between zombies and the slave trade. So it was pleasing to see the New York Times following a similar thread this week. As the author points out, though rooted in African tradition, zombies are a distinctly New World development, and the relationship between the undead and the enslaved is almost too obvious to mention. The evidence is ubiquitous, lurking just below the surface of our mass cultural consciousness. The wildly popular TV adaptation of The Walking Dead now stars Michonne, a ninja warrior who, among other things, sports her own zombie slave coffle. The two black men she pulls behind her are defanged and controlled by large metal collars and long metal chains that clank and clatter as they stumble along. Whether intentional or not, the echoes of racial slavery are conspicuous and searing – the show itself takes place in the Deep South. And the image of a powerful black woman dragging docile, neutered, and chained zombies across the southern landscape is stunningly poetic.

The living dead are more than just a metaphor for slavery and alienation. As I tried to suggest in my earlier post, they are an enduring artifact of the slave trade, a trade steeped in violence and death, whose legacies continue to haunt us to this day. Zombie folklore is complex and malleable. The ghostly return of the “Zombi” terrified New World slave societies as early as the eighteenth century. As Francine Saillant and Ana Araujo show, the zombie myth can even serve as a form of empowerment. The seventeenth-century maroon warrior Zumbi is remembered in modern Brazil as a hero, a figure whose quest for autonomy transcends death itself. I’m still waiting for a long-form treatment of this pressing and endlessly fascinating topic. In the meantime, the undead continue to beckon.

The 2012 blockbuster Abraham Lincoln: Vampire Hunter, based on the equally popular mashup novel by Seth Grahame-Smith, does not attempt to present itself as serious history, which is probably a good thing. Like the forthcoming Tarantino flick Django Unchained, it eschews documentary realism in favor of highly stylized violence (Vampire Hunter director Timur Bekmambetov also directed the underrated minor masterpiece Night Watch). Still, there is something oddly compelling about the film’s portrayal of southern slaveholders as voracious vampires who literally drain the life force from their human property. As W. Scott Poole points out in his delightful review, there was “at least one case in Louisiana [in] which newly imported slaves became convinced that [their purported] masters were witches and vampires (after watching them drink red wine).” Although the real Lincoln was hardly a staunch egalitarian, the film offers up a more soothing alternate reality in which (SPOILER ALERT) Harriet Tubman rescues an axe-wielding Abe on the Underground Railroad and the two work together to save the entire Union cause. Despite her pivotal role in the story, I would have liked to see more of Tubman, who was among the first American action heroes (just compare this sketch of her in battle fatigues to this image of Michonne). But I guess Harriet Tubman: Vampire Hunter would have been a little too transgressive.

There has been a spate of Lincoln-related movies lately, including The Conspirator, Saving Lincoln, and the forthcoming Spielberg epic Lincoln, which is based in part on Doris Kearns Goodwin’s Team of Rivals. It will be interesting to see how the emotionally earnest realism of the Spielberg film compares with Django, a spaghetti-western-inspired revenge story with shades of the Murrell Conspiracy. Grahame-Smith’s interpretation of the Civil War era offers something different insofar as it openly parodies Lincoln’s heroic mythology. At the same time, it could be read as reinforcing the image of the sainted leader. Whether slaying vampires or emancipating slaves, Honest Abe is always at the center of the action.

“The Great Man,” wrote historian Thomas Carlyle, “is a Force of Nature.” Carlyle probably hated Lincoln. The only thing he despised more than elective democracy was slave emancipation, and he was a Confederate partisan. Even so, the nature analogy may be apt. Abraham Lincoln is something like the black hole of nineteenth-century American history, an irresistible gravitational force pulling in anything and everything around it. Once you cross the event horizon of 1861, the beginning of the United States Civil War, it is nearly impossible to escape. Many of the giants in my field began their careers studying slaveholders or abolitionists and ended up writing Big Books on Lincoln and/or the Civil War. While these are certainly worthy topics, I have never found them particularly enthralling. The real war, I would argue, began in the 1770s and seethed for nearly a century, sometimes expressing itself culturally, sometimes politically, sometimes breaking out into open violence. The events of 1861-65 were important, but they were also the manifest symptom of a more extensive conflict, the final breaking out into the open of an ongoing, decades-long war over slavery.

The cult of Lincoln conceals the extent to which he was controlled by events running much deeper in the national and international political landscape and how his own deft strategizing intersected with those events to shape the outcome of the struggle. Although politicians and generals played a crucial role in the Civil War, there is plenty of evidence to suggest that grassroots developments were just as significant in determining the logic and pace of the radical changes sweeping the country. Data from the Visualizing Emancipation project, for example, clearly show that emancipation events – especially events classified as “African Americans Helping the Union” and “Fugitive Slaves/Runaways” – drastically increased in the ten months prior to the signing of the Emancipation Proclamation. The proclamation cemented this momentum and allowed it to expand. Lincoln’s fictional alliance with Tubman in Vampire Hunter hints at this dynamic. Ultimately, however, the depiction of the sixteenth president as a flawless force of nature, almost single-handedly responsible for the Union victory, obscures a lot more than it reveals.

Abraham Lincoln as the Rail Splitter, a campaign newspaper published in Cincinnati, Ohio, October 3, 1860.

The larger-than-life image of Lincoln as a world-historical figure, as the “Great Emancipator,” as the free laboring “Rail Splitter,” which provides the grist for Grahame-Smith’s revision, did not just appear out of thin air following his martyrdom. It was actively disseminated during his lifetime by editors, politicians, and paramilitary organizations such as the Wide Awakes. The latter group, which numbered in the hundreds of thousands, saturated the northern and border states with Lincoln’s image and served as shock troops for the Union cause. In other words, Lincoln had a pretty efficient public relations machine. And this brings me to the digital humanities (how’s that for an overwrought segue?).

In a classic post on the Social Contract of Scholarly Publishing, Dan Cohen argues, among other things, that academic authors need to do a better job cultivating an audience for their work. This can be done in the digital realm, he suggests, by pioneering new curatorial frameworks, by developing new ways to disseminate, promote, and review scholarship online. Common-Place, Digital Humanities Now, and the American Historical Review prize for Best Digital Article represent promising steps in this direction. The last of these seems especially significant, since it will only accept work that is “impossible in print.” But offering up innovative work in a trusted and easily accessible format, carving out new spaces for the play of ideas, is only half the battle. As any Hollywood producer will tell you, films like Abraham Lincoln: Vampire Hunter, Lincoln, and Django Unchained are only as successful as their attempts to present a recognizable brand, stimulate public interest, and build an audience. Hollywood marketing is notoriously bloated and avaricious – sometimes far exceeding the size of a film’s actual budget. So I do not think academics would benefit from this model. But I wonder what would happen if professional historians had that kind of publicity? It might make the inevitable sequel, in which Frederick Douglass teams up with Frankenstein’s Monster to fight the Wolf Man, somewhat more palatable.

Archival Fragment of the Amistad Revolt

Sometimes the best cure for archive fever is to share it with the world.

“Pa Raymond,” Sierra Leone Mission Album, box 2, p. 122, Records of the United Brethren in Christ Foreign Missionary Society, United Methodist Archives, Drew University.

I was reminded of the mundane joys of the archive again several months ago when, thanks to a tip from a colleague, I located an extremely rare photograph of one of the survivors of the Amistad slave revolt in the United Methodist Archives in New Jersey. It is difficult to tell whether the old man, called “Pa Raymond” on the reverse of the photo, is the real deal, but circumstantial evidence suggests that he might be Kale Walu, or “Little Kale,” who was just a boy when he was abducted and enslaved in West Africa in 1839. Kale (also spelled Kali or Carly) was the author of the famous “crazy dolts” letter, addressed to John Quincy Adams on the eve of the captives’ trial before the United States Supreme Court. He assumed the name George Lewis when he returned to Africa in 1842, part of an ongoing project to reinvent former slaves as anglicized Christians. As one of the youngest among the returning group, he was something of a surrogate son for abolitionist missionary William Raymond and may have taken his surname later in life. Pa is Krio for “father,” an honorific title for village elders.

The photo was probably taken sometime in the early 20th century by the United Brethren in Christ, who had inherited an abolitionist outpost, called the “Mendi Mission,” in what is now southwestern Sierra Leone. Almost all of the photos in the collection date from after the rebellion of 1898. When Canadian missionary Alexander Banfield encountered a man claiming to be an Amistad veteran during a tour of Sierra Leone in 1917 (likely the same man in this photo), he estimated the man was about 100 years old. Although my work does not really focus on the Amistad captives (I’m interested in the larger story of American abolitionists in Africa), it is bracing to look into the eyes of this man. Sole survivor. Adopted son of the missionary, traveling barefoot through the bush. White-haired patriarch, holding something mysterious with his right hand. What have those eyes seen? Where are they looking now?

Thanks to the generous (and underpaid and understaffed) archivists in New Jersey and the embattled public domain laws of the United States, I am able to share this treasure with the world (I think) for the first time. It belongs to the world. I am just returning it.

Mal d’Archive

You know you’re a pretentious academic blogger when you start titling your posts in French, and if you can quote one of the most notoriously abstruse French philosophers at the same time, well that’s just a bonus. Jacques Derrida is not much in style these days (if he ever was). His ideas, and especially his prose, have been the butt of many jokes over the past half-century, but his 1994 lecture series Mal d’Archive (later published and translated as “Archive Fever”) is a significant artifact of the early days of the digital revolution. Although I don’t quite agree with everything its author says, the book makes an earnest attempt to grapple with the intersection of technology and memory and offers some worthwhile insight.

An archivist works feverishly.

The idiomatic en mal de does not have a direct analogue in English, but for Derrida it means both a sickness and “to burn with a passion.” It is an aching, a compulsive drive (in the Freudian sense) to “return to the origin.” It is the sort of fever rhapsodized by Peggy Lee, the kind of unquenchable desire that can only be remedied by more cowbell. Whatever Derrida means by archive fever (and I think he leaves its precise meaning deliberately ambiguous), it is a concept that has some resonance for historians. As a profession, we tend to privilege primary sources, or archival documents, over secondary sources, or longer works that analyze and interpret an archive. Yet even the most rudimentary archival fragment contains within it a narrative, a story, an argument. Every document is aspirational; every archive is also an interpretation. There is no such thing as a primary source. There are only secondary sources. We build our histories based on other histories. The archive, Derrida reminds us, is forever expanding and revising, preserving some things and excluding others. The archive, as both subject and object of interpretation, is always open-ended; it is “never closed.”

Of course, in a few weeks, in what can only be described as a stunning disregard for French philosophy, the Georgia State Archives will literally shut its doors. Citing budget cuts, the state announced it will close its archives to the public and restrict access to special appointments (and those appointments will be “limited” due to layoffs). For now, researchers can access a number of collections through the state’s Virtual Vault, but it is not clear whether more material will be added in the future. The closure comes at the behest of Governor Nathan Deal, whose recent political career has been beset by ethics violations. The cutbacks are the latest in a string of controversial decisions by the Georgia governor, including the rejection of billions of dollars in Medicare funds and a $30 million tax break for Delta Air Lines, and will have a negative impact on government transparency. Coming on the heels of the ban on ethnic studies in Arizona, the campaign against “critical thinking” in Texas, attacks on teachers in Illinois and Wisconsin, and deep cuts in public support for higher education across the country, the news from Georgia seems a portent of dark times.

Archives are so essential to our understanding of the past, and our memory of the past is so important to our identity, that it can feel as if we have lost a little part of ourselves when one is suddenly closed, restricted, or destroyed. Historian Leslie Harris calls public archives “the hallmarks of civilization.” Although I don’t entirely agree (are groups that privilege oral tradition uncultivated barbarians?), Harris points to a fundamental truth. The archive is an integral component of a society’s self-perception. Without open access to archival collections, who could corroborate accusations that the government was conducting racist medical experiments? Who would discover the lost masterpiece of a brilliant author? Who would provide the census data to revise wartime death tolls? Who would locate the final key to unlock the gates of Hell? All of the boom and bluster about digitization and the democratization of knowledge notwithstanding, it is easy to forget that archival work is a material process. It takes place in actual physical locations and requires real workers. What does it mean for the vaunted Age of Information when states restrict or close access to public repositories?

However troubling the news from Georgia, all hope is not lost. This is not the end of days. Knowledge workers are fighting to preserve access to the archive. At the same time, efforts by historians to crowdsource the past offer a fascinating and potentially momentous expansion of archive fever. Several high profile projects are now underway to enlist “citizen archivists” to help build, organize, and transcribe documentary collections. Programmers at the always-innovative Roy Rosenzweig Center for History and New Media have just released a “community transcription tool” that will (hopefully) streamline the process of collaborative archiving, transcribing, and tagging across platforms. The potential for public engagement and the production of new knowledge is stupendous. Because they rely on the same volunteer ethos as Wikipedia, however, it is likely that part-time hobbyists will be more interested in parsing obscure Civil War missives than the correspondence of Jeremy Bentham. A citizen archivist with a passion for Iroquois genealogy might have little interest in, let’s say, the municipal records of East St. Louis. And this is precisely where major repositories and their well-trained staff can help supervise, guide, and even lead the public. What if every historian could upload all of their primary sources to a central repository when they finished a project? What if there was a universal queue where researchers could submit manuscripts for public transcription, along the lines of the now-ubiquitous reCAPTCHA service? Perhaps administrators could implement some sort of badge or other incentive program in exchange for transcribing important material? As all manner of documents are digitized, uploaded, and transcribed in a lopsided, haphazard, and ad-hoc fashion, in vastly disparate quality, in myriad formats, in myriad locations, physical archives and their staff are needed more than ever – if only to help level the playing field. 
Among the most important functions of the professional archivist is to remind us that there is much that is not yet online.

Note recording the arrival of the Amistad survivors in Freetown, Sierra Leone, Jan. 1842. Liberated African Register, Sierra Leone Public Archives, Freetown.

One of the best experiences I’ve ever had as a researcher was in the national archives of Sierra Leone. Despite a century and a half of colonialism, a decades-long civil war, and other challenges that come with occupying a bottom rung on the global development index, the collections remain open to the public and continue to grow and improve. They have even started to go digital thanks to some help from the British Library and the Harriet Tubman Resource Centre. Sitting in the Sierra Leone archives, with its maggot-bitten manuscripts, holes in the windows, and sweltering heat, suddenly the much-discussed global digital divide seems very real. Peering out of the window one day, as I did, to see a mass of students drumming and chanting, then chased by soldiers in riot gear, the screams from the crowd as you shield yourself from gun fire behind a bookshelf thick with papers, it is difficult to look at knowledge work the same way again. When I enter a private archive in the United States, with its marbled columns and leather chairs, its rows of computers and sophisticated security cameras, I am grateful and angry – grateful that this is offered to some, angry that it is denied to others. The archivists and their support team in Freetown are heroes. Full stop. I worry about them when I read about the conflict in Libya, which continues to spill across borders and has led indirectly to the destruction of priceless archives and religious monuments in Mali.

Compared to the situation in West Africa, the more modest efforts to preserve and teach the past across the United States seem like frivolous first world problems. On the other hand, all information is precious. Whether physical or digital, access to our shared heritage should not be held hostage to political agendas or economic ultimatums. Archives are a right, not a privilege. I like to think that Derrida, who grew up under a North African colonial regime, would appreciate this. If Sierra Leone can keep its archives open to the public, why can’t the state of Georgia?

Cross-posted at HASTAC

One man’s trash . . . is another man’s archive

“The most difficult thing about collecting is discarding.”
– Albert Köster

photo by @jmhuculak

The photo above was taken outside Sterling Memorial Library at Yale University. Those long rectangular drawers you see are what’s left of that pre-digital archive known as the card catalog. The genealogy of this “universal paper machine” has been detailed by Markus Krajewski in his delightful book Paper Machines: About Cards & Catalogs, 1548–1929. Far from being the first form of reference technology, this system is only one in a long series of attempts to discover, store and classify knowledge. Yet the transition from the painstakingly compiled paper archive to the extended technological networks which are replacing that archive is more than a simple change in office furniture. The dumpster’s contents signal a change far more dramatic than replacing an index card with a DOI, or swapping the cabinetry for a computer.

With a card catalog, the index card would convey a wide range of information to its reader. This exchange between the reader and the material read was relatively unproblematic, unless of course the information on the card was written in an unfamiliar alphabet or language, or the reader lacked the basic literacy required to grasp it. Thanks to this information, our reader might have been able to find the location of the books in the library, some broad subject headings, and other bibliographical information. The reader would have acted on that information by either requesting the volume at the circulation desk, or moving on to a different bibliographic record altogether.

In a similar vein, the Resource Description Framework (RDF) represents information about resources in the World Wide Web, but instead of using natural language on index cards to communicate sufficient meaning to our curious reader, it communicates that information in a machine-readable form. It represents similar metadata about Web resources, such as the title, author, and modification date of a Web page, in addition to any copyright or licensing information. Yet unlike the card catalog, RDF allows this information to be processed by applications, rather than merely being displayed to people. It provides a common framework for expressing this information, so it can be exchanged between applications without loss of meaning.
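To make the contrast with the index card concrete, here is a rough Python sketch of what such machine-readable metadata might look like. RDF states each fact as a subject-predicate-object triple, and N-Triples is one of its plainest text serializations. The page URL, the author name, and the date below are made up for illustration, and the helper names are my own; the predicates come from the Dublin Core terms vocabulary. A real project would more likely use an RDF library than hand-rolled strings.

```python
# Hypothetical page and metadata values, invented for this illustration.
PAGE = "http://example.org/posts/card-catalog"

# Namespace of the Dublin Core metadata terms (title, creator, modified).
DCTERMS = "http://purl.org/dc/terms/"

# Each fact is one (subject, predicate, object) triple.
TRIPLES = [
    (PAGE, DCTERMS + "title", "One man's trash is another man's archive"),
    (PAGE, DCTERMS + "creator", "A. Blogger"),
    (PAGE, DCTERMS + "modified", "2012-10-01"),
]


def escape_literal(text):
    """Escape the characters that N-Triples string literals cannot hold raw."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples):
    """Serialize triples whose objects are plain string literals."""
    return "\n".join(
        f'<{s}> <{p}> "{escape_literal(o)}" .' for s, p, o in triples
    )


def lookup(triples, subject, predicate):
    """What an application might do: find the values for one property."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(to_ntriples(TRIPLES))
print(lookup(TRIPLES, PAGE, DCTERMS + "creator"))  # ['A. Blogger']
```

Because “creator” is spelled out as a full, shared identifier rather than a label on one library’s cards, a second application can ask the same question of the same data without anyone explaining what the field means.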

photo by @jmhuculak

The photo on the left provides a good example. The data stored in the card catalogs could be compared to an application running on a single machine. The only people needing to understand the meaning of a given variable such as “author” or “date of publication” are those who consulted that card catalog directly. In the case of an application running on a single machine, those people would be the programmers reading the source code. But if we want the data contained in this card catalog to participate in a larger network, such as the world wide web, the meanings of the messages the applications exchange (“author,” “date of publication,” etc.) need to be explicit.

In fact, currently far too much of the data fueling web applications is prevented from being shared and integrated into other Internet applications. The compartmentalization of the card catalog has carried over into web applications and data transmission becomes entangled in stovepipe systems, or “systems procured and developed to solve a specific problem, characterized by a limited focus and functionality, and containing data that cannot be easily shared with other systems.” (DOE 1999) These applications, instead of allowing users to combine data in new ways to make powerful and compelling connections, risk becoming the digital equivalent of an abandoned card catalog in a dumpster.

As more and more digital humanists share and distribute their work via the world wide web, a working knowledge of the importance of programming for the semantic web becomes essential. Simple mechanisms such as RDF play a key role in transmitting semantic data between machines while allowing applications to combine data in new ways. Much like The Fantastic Flying Books of Mr. Morris Lessmore, RDF allows meaningful data transmission to rejoin the many applications hiding behind web interfaces. It transforms what might have been discarded into data-rich applications. It also enables digital humanists to join their work to a larger ocean in “the stream of stories.”

Different parts of the Ocean contained different sorts of stories, and as all the stories that had ever been told and many that were still in the process of being invented could be found here, the Ocean of the Streams of Story was in fact the biggest library in the universe. And because the stories were held here in fluid form, they retained the ability to change, to become new versions of themselves, to join up with other stories and so become yet other stories; so that unlike a library of books the Ocean of the Streams of Story was much more than a storeroom of yarns. It was not dead but alive.

Salman Rushdie, Haroun and the Sea of Stories

cross posted at HASTAC