Category Archives: Yale Projects

On Being Quoted by the Simpsons

I will always have a soft spot for The Simpsons. Although I no longer watch the show on a regular basis, it was an important part of my childhood, and I paid homage to it in one of my first academic journal articles. During its heyday in the mid-90s (when I had to sneak around my parents’ prohibition in order to watch it), the show developed a reputation for unusually intelligent, iconoclastic humor. It offered smart, politically conscious satire, served up by smart, politically conscious writers. The Simpsons seems to employ more Harvard graduates than McKinsey & Company – the writers’ room is essentially a jobs program for the Harvard Lampoon. Naturally, therefore, its central villain, the inimitable C. Montgomery Burns, is a Yale man (class of 1914). With his absurd anachronisms and ruthless, mustache-twirling embodiment of corporate capitalism, Burns is easily one of my favorite characters. So when I heard that the show had put out an episode about his return to campus, I could not resist.

A clip from The Simpsons episode “Caper Chase,” season 28, episode 19.

To my surprise, the episode makes direct reference to an essay I published two years ago, establishing that Elihu Yale was a slave trader. The piece, which I wrote in a few hours in response to a conference, probably receives more attention than all of my more traditional scholarship combined. During the debate over the renaming of Calhoun College, it appeared on Reddit and in the Wall Street Journal, made its way onto Wikipedia, and was tweeted out by historian colleagues and celebrities such as Ann Coulter (which, I will admit, made me throw up a little in my mouth). In the episode, it forms part of a larger joke about liberalism gone mad at Yale. On a tour of campus, Burns encounters teachers who were fired for celebrating Columbus Day, students who call him “worse than Hitler,” and signs that read: “Shakespeare is murder” and “Eli Yale was a profiteering slave trader.” Aghast, Burns wonders aloud if Yale is “still a coven of capitalism, where evil money can acquire a patina of virtue” – and he gets in a good crack about “ruthless media disruptor Samuel F. B. Morse.”

When Burns attempts to endow a Department of Nuclear Plant Management, he is thwarted by students, who are described as “highly-entitled wusses.” Instead, school administrators point out that they “need to hire more deans to decide which Halloween costumes are appropriate.” The latter refers to an actual incident, sparked by a memo about racist or offensive Halloween costumes, which made national news. Although the episode does not mention the recent rebirth of Calhoun College as Hopper College, the subtext is clear. This is neither the time nor the place to rehash those debates. (You can read my thoughts on Grace Hopper here.) But the implication that students seeking redress for social injustice are “wusses” left me feeling deeply uneasy.

Students on campuses across the country face unprecedented economic pressures, shameful levels of sexual assault, and administrators eager to capitalize on “diversity” while doing very little to support underprivileged students. These same students have every right to demand a space free from racism or discrimination, and those of us on faculty and staff have a moral obligation to stand with them. By ridiculing and dismissing student protestors, the writers of The Simpsons are doing exactly the opposite. Instead of using their position of tremendous privilege and authority to question the status quo, they use it to attack a vulnerable population. Instead of speaking truth to power, as in some of the show’s greatest episodes, they seek to undermine the relatively powerless.

Of course, no group or idea should be exempt from parody. The ability to laugh at oneself demonstrates humility, strength, and self-awareness. And to be fair, the episode attempts a muddled critique of Trump University and the for-profit education industry. Yet the whole tone of the campus visit feels hackneyed and mean-spirited rather than fresh or funny. The writers’ clear desire to be the next William F. Buckley results in a ham-fisted diatribe that stands at odds with the show’s subversive tradition. (PCU did a better job with similar material over twenty years ago. David Spade’s tirade at the film’s end about “whiny crybaby minorities” shows that not much has changed.)

I understand the argument of some pundits who think that student protests about costumes and memorials are misguided or absurd. With the Trump administration rolling back protections for disadvantaged groups and the environment at a blistering pace, fussing over the names of buildings seems like rearranging deck chairs on the Titanic. But justice comes in all forms, and our colleges and universities should embody the change that we want to see in the world. As historian Craig Wilder argued recently: “Campuses are not museums for the emotional and psychological bigotries of the alumni.” Reckoning with that truth is an important victory and will set the stage for other victories in the future. That Elihu Yale or Mr. Burns would no longer feel at home on a college campus is a good thing. Although it can be a long and arduous process, successful student movements prove that evil money can indeed become virtuous.

The Long Goodbye

“What goes on the Internet, stays on the Internet,” seems to be conventional wisdom these days. To quote the definitely-not-hysterical-or-hyperbolic headline used by the New York Times: “The Web Means the End of Forgetting.” Like much conventional wisdom, though, these slogans turn out to be complete nonsense when you actually investigate them. The Internet is not some kind of perpetual memory machine. The “right to be forgotten” laws in the EU and the 1 million+ URLs Google has evaluated for removal over the past year alone are only the most visible tip of a vast subterranean system of digital decay. Consider all of the missing pages, broken links, and buggy websites that we encounter on a daily basis. Some of this misplaced material can be found with a few minutes of dedicated sleuthing, but a great deal of it will be lost forever. Consider, too, all of the hand-wringing, discussions, and lengthy policy studies within the digital humanities community over the issues of preservation and sustainability. If the Internet is forever, why is maintaining a digital project such a big problem?

Online artifacts are a lot like biological organisms: they are born, they live, and they die. Even the terms used to describe their transmission, such as “meme” and “going viral,” draw on evolutionary theory. And in this new Darwinian frontier, survival is not automatic. The web is littered with half-finished, abandoned, lost, or outdated digital humanities projects. Part of the reason for this state of affairs is poor institutional support. IT departments are often loath to commit any resources to maintain sites or databases created by students and faculty. (I have encountered some pretty egregious examples of this.) And academic institutions have been slow to realize that support for digital projects is a core part of their basic educational mission. A handful of exceptional sites have endured, with strong institutional commitments. It took two years, a $100,000 grant, and a team of staff and student workers to resuscitate the Valley of the Shadow project, which ended in 2007. The Trans-Atlantic Slave Trade Database, which debuted in 2008, just won a major grant to recode the entire site and add new data and foreign language translations. Most projects are not so lucky. And when scholars are forced to strike out on their own, with little or no long-term support structure, we all suffer.

I had some fun with the problem of sustainability in my Digital History course last year. One of our core texts, Digital History: A Guide... by Dan Cohen and Roy Rosenzweig, has an entire chapter on preservation. Although the book itself is a decade old, the concepts and analysis remain remarkably fresh, and it is still a popular touchstone that is widely assigned in undergraduate and graduate DH courses. Given the rapid pace of technological development, however, a new edition is badly needed. DH staples, such as Omeka and WordPress, or even Twitter and Wikipedia, were not yet on the radar when the book was published, and many of its examples are comically outdated. The authors spend several paragraphs discussing Jim Zwick’s famous website: “Anti-Imperialism in the United States, 1898–1935.” So I asked my students to evaluate the site for themselves. Of course, this was an impossible task, since the site no longer exists.

I have found that, if done carefully and respectfully, tricking my students with phony texts or assignments like this can be both fun and enlightening. In this case, I offered to treat the entire class to an expensive meal if any one of them could find me a copy of Zwick’s site, in any format. Despite vigorous efforts on the Wayback Machine, Google, and even the Darknet, no one could find any trace of it. I gave them the entire semester to complete the challenge, and they still turned up nothing. Aside from a few tantalizing reviews, this huge, popular, exemplary DH project had been completely wiped from the face of the earth. Zwick’s death in 2008 means that the site, and all of its related data, guides, and analysis, will probably never return. At least in its original form, it has been lost forever. What goes on the Internet does not always stay on the Internet. (My students received a pizza lunch at the end of the semester anyway.)

Another reason for the diminishing returns of existing DH projects is the publish-or-perish cult that dominates mainstream academic work. It is more advantageous to your CV, and way more exciting, to move on to a sexy new project than to spend all of your time and energy updating or preserving work that you did years ago. Although I try my best to maintain my legacy projects, I am as guilty of this as anyone. I did not update the underlying infrastructure or theme for this blog for years, and only did so recently because the server that hosted it was scheduled for demolition. Fortunately, Yale has committed to supporting my digital projects for the long haul, even after my graduation. They gave me a new domain for this site and an updated platform with admin privileges and server-side access. So I will be able to continue to maintain the site now that I have moved on to another position at some podunk school in New Jersey. I will be able to update the overall feel and UX for the site, make it more responsive and mobile friendly, and adapt to any new changes on the tech horizon. (A big shout out to Pam Patterson, Trip Kirkpatrick, and the rest of the ITG staff for offering a wonderful legacy support structure. I could not have kept this site alive without them.) Meanwhile, archived snapshots of earlier versions of the site are available on the Wayback Machine. Unfortunately, it is not a complete archive. Contrary to popular belief, not everything is automatically indexed by the Wayback Machine, or Google, or other similar services. When I realized this a few years ago, I had to go back and manually request the Machine to crawl and capture this site, page by page. If any folks out there know a better way to ensure that your site is archived on a comprehensive and regular basis, please get in touch.
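For anyone facing the same page-by-page chore, the requests can at least be batched. What follows is a minimal sketch, not a tested archiving tool: it assumes that the Wayback Machine's public Save Page Now endpoint still accepts a plain GET request to https://web.archive.org/save/ followed by the page URL (the Internet Archive also offers an authenticated API with more options, and anonymous captures may be throttled). The page list and the User-Agent string are hypothetical, and the function defaults to a dry run so that nothing is sent until you ask.

```python
import time
import urllib.request

# Assumption: Save Page Now captures whatever URL is appended to this prefix.
SAVE_PREFIX = "https://web.archive.org/save/"


def build_save_urls(page_urls):
    """Turn a list of page URLs into Save Page Now request URLs, skipping blanks."""
    return [SAVE_PREFIX + url.strip() for url in page_urls if url.strip()]


def request_captures(page_urls, delay_seconds=15, dry_run=True):
    """Ask the Wayback Machine to capture each page, pausing between live requests.

    With dry_run=True (the default) nothing is sent; each result is
    (save_url, None). Otherwise each result is (save_url, HTTP status or error text).
    """
    results = []
    for save_url in build_save_urls(page_urls):
        if dry_run:
            results.append((save_url, None))
            continue
        # Hypothetical User-Agent string; identify your own script here.
        req = urllib.request.Request(save_url, headers={"User-Agent": "site-archiver-sketch/0.1"})
        try:
            with urllib.request.urlopen(req, timeout=120) as resp:
                results.append((save_url, resp.status))
        except OSError as err:  # URLError and HTTPError both subclass OSError
            results.append((save_url, str(err)))
        time.sleep(delay_seconds)  # pause so the service is not hammered
    return results


# Hypothetical page list; in practice this could come from the site's sitemap.
pages = ["https://example.edu/blog/", "https://example.edu/blog/about/"]
for url, status in request_captures(pages):
    print(url, status)
```

Setting dry_run=False sends the requests one at a time. A long delay is the safer choice, since the goal is a complete set of captures rather than a fast one.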

A post about preservation and sustainability seems like a fitting way to close out this blog (or at least this iteration of it). When I started this site about five years ago, I was one of maybe two or three grad students at my university who had any interest in digital tools and methodologies for research and teaching. There seemed to be a pressing need for a voice in the History Department, or in the humanities more generally, to raise the profile of DH work and show something of the potential and excitement of this new area of scholarship. Material on this blog has been featured on Digital Humanities Now, HASTAC, the Historical Society, and other places, and has hopefully contributed in some small way to that goal. This year, Yale launched a multi-million dollar Digital Humanities Lab, which has been many years in the making. Carol Chiodo, my co-blogger and longtime associate, is one of the founding staff members. I can’t really claim that this site contributed much to the establishment of the DH Lab at Yale, but I hope that it at least helped to add to the groundwork or general milieu that made the Lab a logical possibility. Certainly, the site has benefited my own career, building bridges to new topics, allowing me to meet new colleagues from all over the world, providing conference invites, interview requests, and job offers. It allowed me to preview some of my more serious scholarly work, or respond to pressing issues, or just vent some of my pent up silliness. Even if I do not have much time to continue to grow this site in the years ahead, I plan to keep it alive for as long as possible, and I promise to make sure that it is archived and accessible for future generations. Because in the rapidly evolving digital world, permanency is not something that can be taken for granted.

Elihu Yale was a Slave Trader

Next week, the Gilder Lehrman Center for the Study of Slavery, Resistance, and Abolition and the Yale Center for British Art are co-hosting a major international conference on slavery and British culture in the eighteenth century. The art exhibit associated with the conference is remarkable for many reasons, not least because it features a portrait of Elihu Yale being waited upon by a collared slave (euphemized as a “page” in the original listing). The painting is related to one held by the University Art Gallery, showing the same scene from a different perspective. And it is similar to another portrait of Yale with yet another collared slave (this time euphemized as a “servant”). This latter portrait, even more ominous and imperial than the first, is not a part of the exhibit. And that is a shame, because these paintings, and the larger conference of which they are a part, offer an opportunity to revisit the controversial and entangled history of slavery and universities.

Historians have long pointed out that Yale (the University) is deeply implicated in the institution of slavery. Many of its prominent buildings are named after slaveholders or slavery apologists. It housed so many southern students that it briefly seceded from the Union at the start of the Civil War.[1] Craig Wilder’s wonderful book Ebony & Ivy, published last year, shows that Yale is not alone in this regard. All of early America’s leading universities, both north and south, promoted and profited from slavery, racism, and colonialism.[2] At the same time, college campuses were battlegrounds where antislavery students and faculty engaged in dramatic confrontations with their opponents and developed new political movements.[3] Oddly enough, none of the scholarship on these issues mentions that Elihu Yale, the namesake of this august and venerable institution, was himself an active and successful slave trader.

As an official for the East India Company in Madras (present-day Chennai), Yale presided over an important node of the Indian Ocean slave trade. Much larger in duration and scope than its Atlantic counterpart, the Indian Ocean trade linked southeast Asia with the Middle East, the Indonesian archipelago, and the African littoral. On the subcontinent, it connected with and drew upon traditions of slavery and servitude that had flourished for generations.[4] In the 1680s, when Yale served on the governing council at Fort St. George on the Madras coast, a devastating famine led to an uptick in the local slave trade. As more and more bodies became available on the open market, Yale and other company officials took advantage of the labor surplus, buying hundreds of slaves and shipping them to the English colony on Saint Helena. Yale participated in a meeting that ordered a minimum of ten slaves sent on every outbound European ship.[5] In just one month in 1687, Fort St. George exported at least 665 individuals.[6] As governor and president of the Madras settlement, Yale enforced the ten-slaves-per-vessel rule. On two separate occasions, he sentenced “black Criminalls” accused of burglary to suffer whipping, branding, and foreign enslavement.[7] Although he probably did not own any of these people – the majority were held as the property of the East India Company – he certainly profited both directly and indirectly from their sale.

Some sources (including Wikipedia) portray Elihu Yale as a heroic abolitionist, almost single-handedly ending the slave trade in Madras.[8] This is incredibly misleading. During his tenure as governor, Yale made an effort to curb the stealing of children and others for the purpose of export. But a close reading of company documents reveals that it was anything but an act of humanitarian altruism. It was, in fact, the local Mughal government, which held more power than the tenuous English merchants, that insisted on abolition. Yale’s decree of May 1688 curbing the transport of slaves from Madras argued that the trade had become more trouble than it was worth. The surfeit of slaves from the previous year’s famine had dried up, and the indigenous government had “brought great complaints & troubles…for the loss of their Children & Servants Sperited and Stoln from them.”[9] With no profit left for the company and a hostile Mughal overlord demanding abolition, Yale was happy to comply.

Little more than a year later, in October 1689, Yale had no problem issuing orders for a company ship to travel to Madagascar, buy slaves, and transport them to the English colony on Sumatra. When they arrived by the hundreds, these unfortunate individuals were put to work as masons, carpenters, smiths, cooks, maids, gardeners, and porters. A select few even served as soldiers. In addition to free labor, they provided a strategic buffer against European rivals and further consolidated the company’s political and economic power.[10] African slaves in India and Indonesia, Indian slaves on St. Helena, rival empires jostling for control – the Indian Ocean trade was a complicated and convoluted melange. And Elihu Yale was right in the thick of it, directing it, turning it to his own advantage, and growing fat and rich from its spoils. This wealth, in the form of diamonds, textiles, and other luxury goods, enticed the founders of Yale College to pursue the famous merchant and to name their school in his honor.[11]

Apologists might counter that Yale was a man of his time. Slavery was impossible to avoid, nobody opposed it, and most rich and successful people had a hand in it. None of that is true. In April 1688, less than a year after Yale became governor of Madras, a group of Quakers in Germantown, Pennsylvania, issued a statement condemning slavery in the colony: “There is a saying that we shall doe to all men licke as we will be done ourselves; macking no difference of what generation, descent or Colour they are. and those who steal or robb men, and those who buy or purchase them, are they not all alicke?” Quakers shed their ties to slavery during the eighteenth century while building a reputation as profitable and successful merchants. And they were hardly the only ones to protest the institution. In 1712, a major slave rebellion erupted in New York City, in which at least nine Europeans and twenty-seven Africans lost their lives. Several years later, when Yale College took its present name, opposition to slavery was endemic across the British Empire.[12] This was the broader world in which Elihu Yale worked, schemed, and built his fortune.

The evidence establishing Yale’s involvement in the slave trade is clear and compelling. Thanks to the Internet Archive, HathiTrust, and Duke University, almost all of the official records of Fort St. George are available online, and even more documents await future researchers. Those looking for further information can follow my footnotes. Hopefully other scholars will build on this record to paint a more complete picture of the stoic British gentleman and his dark, diminutive servants, forever bound together in those disturbing oil portraits.



  1. Antony Dugdale, J.J. Fueser, and J. Celso de Castro Alves, Yale, Slavery and Abolition (New Haven: The Amistad Committee, 2001); Frank Leslie’s Illustrated Newspaper, Feb. 2, 1861. See also
  2. Craig Steven Wilder, Ebony and Ivy: Race, Slavery, and the Troubled History of America’s Universities (New York: Bloomsbury Press, 2013). See also
  3. Wilder touches on this subject briefly in his final chapter, and I have been working on an article that will (hopefully) expand the narrative.
  4. Gwyn Campbell (ed.), The Structure of Slavery in Indian Ocean Africa and Asia (Portland, OR: Frank Cass, 2004); Indrani Chatterjee and Richard M. Eaton (eds.), Slavery and South Asian History (Bloomington: Indiana University Press, 2006); Richard B. Allen, European Slave Trading in the Indian Ocean, 1500–1850 (Athens, OH: Ohio University Press, 2015).
  5. Records of Fort St. George: Diary and Consultation Book of 1686 (Madras: Superintendent Government Press, 1913), 48; Records of Fort St. George: Diary and Consultation Book of 1687 (Madras: Superintendent Government Press, 1916), 8.
  6. Henry Davison Love, Vestiges of Old Madras, 1640-1800: Traced from the East India Company’s Records Preserved at Fort St. George and the India Office, and from Other Sources, vol. 1 (London: John Murray, 1913), 545.
  7. Records of Fort St. George: Diary and Consultation Book of 1688 (Madras: Superintendent Government Press, 1916), 30, 137; Records of Fort St. George: Diary and Consultation Book of 1689 (Madras: Superintendent Government Press, 1916), 99.
  8. See, for example, Hiram Bingham, Elihu Yale: The American Nabob of Queen Square (New York: Dodd, Mead & Company, 1939), 167.
  9. Diary and Consultation Book of 1688, 19, 78-79.
  10. Records of Fort St. George: Letters from Fort St. George for 1689 (Madras: Superintendent Government Press, 1916), 58-59; Records of Fort St. George: Letters from Fort St. George for 1693-94 (Madras: Superintendent Government Press, 1921), 12. On imperial rivalry, especially as it developed over the next century, see Andrea Major, Slavery, Abolitionism and Empire in India, 1772-1843 (Liverpool: Liverpool University Press, 2012), 49-84.
  11. Gauri Viswanathan, “The Naming of Yale College: British Imperialism and American Higher Education,” in Cultures of United States Imperialism, ed. Amy Kaplan and Donald E. Pease (Durham: Duke University Press, 1993), 85-108.
  12. Maurice Jackson, Let This Voice Be Heard: Anthony Benezet, Father of Atlantic Abolitionism (Philadelphia: University of Pennsylvania Press, 2009); Kenneth Scott, “The Slave Insurrection in New York in 1712,” New-York Historical Society Quarterly, 45 (Jan. 1961), 43-74; Peter Linebaugh and Marcus Rediker, The Many-Headed Hydra: Sailors, Slaves, Commoners, and the Hidden History of the Revolutionary Atlantic (Boston: Beacon Press, 2000).

WordPress as a Course Management System

I am a big fan of the WordPress publishing platform. It’s robust and intuitive with an elegant user interface, and best of all, it’s completely open source. Content management heavyweights such as Drupal or MediaWiki may be better equipped when it comes to highly complex, multimodal databases or custom scripting, but for small-scale, quick and dirty web publishing, I can think of few rivals to the WordPress dynasty. About 20% of all websites currently run on some form of WordPress. Considering that Google’s popular Blogger platform accounts for a measly 1.2% of the total, this is a staggering statistic. Like many digital humanists, I use WordPress for my personal blogging as well as for the courses that I teach. Yet I often wonder if I am using this wonderfully diverse free software to its full potential. Instead of an experimental sideshow or an incidental component of a larger course, what if I made digital publishing the core element, the central component of my research and teaching?

Jack Black as a course management system

What follows are my suggestions for using a WordPress blog as a full-fledged course management system for a small discussion seminar. These days almost all colleges and universities have a centralized course management system of some sort. In the dark ages of IT, a proprietary and much-derided software package called Blackboard dominated the landscape. More recently, there is the free and open source Moodle, the Sakai Project, and many others (Yale uses a custom rendition of Sakai called Classes*v2). These platforms, sometimes called learning management systems, collaboration and learning environments, or virtual learning environments, are typically quite powerful. Historically, they have played an important role in bridging analog and digital pedagogy. Compared to WordPress, however, they can seem arcane and downright unfriendly. Although studies of course management systems are sporadic and anecdotal, one of the most common complaints is “the need for a better user interface.” Such systems are built around administrative imperatives, such as quizzing, grading, and paper submission, that either subvert or stifle creative pedagogy. Instead of working to improve these old methods, perhaps it is time to embrace a new paradigm. Why waste time training students and teachers on idiosyncratic in-house systems, based on rote administrative functions, when you can give them more valuable experience on a major web publishing platform? Why let technology determine the limits of our scholarship and teaching, when we can use our scholarship and teaching to push the boundaries of emerging technologies?

Before getting started, I should point out that there is already a wide variety of plugins that aim to transform WordPress into a more robust collaborative learning tool. Sensei and BuddyPress Courseware are good examples. The ScholarPress project was an early innovator and still shows great promise, but it has not been updated in several years and no longer works with the latest versions of WordPress. The majority of these systems are more appropriate for large lectures, distance learning, or MOOCs (massive open online courses). There is no one-size-fits-all approach. For smaller seminars and discussion sections, however, a custom assortment of plugins and settings is usually all that is required. I have benefited from previous conversations about this topic. I also collaborate closely with my colleagues at Yale’s Instructional Technology Group when designing a new course. It is worth repeating that the digital humanities are, at their heart, a community enterprise.

Step 1: Install WordPress. An increasing number of colleges and universities offer custom course blogs along with different levels of IT support. For faculty and students here, Yale Academic Commons serves as a one-stop shop for scholarly web publishing. Other options include building your own WordPress site or signing up for free hosting.

Step 2: Find a good theme. There is an endless sea of WordPress themes out there, many of them free. For my course blogs, I prefer something that is both minimalist and intuitive, like the best academic blogs. The simpler the better. I also spend a lot of time choosing and editing an appropriate and provocative banner image. This will be the first thing that your students see every time they log in to the site, and it should reflect some of the central themes or problems of your course. It should be something worth pondering. Write a bit about the significance of the banner on the “About” page or as a separate blog post, but do not clutter your site with media. As Dan Cohen pointed out last year, effective design is all about foregrounding the content.

Step 3: Load up on plugins. Andrew Cullison provides a good list of course management plugins for WordPress. Although almost all of them are out of date now, many have newer counterparts that are easily discoverable in the official WordPress plugin directory. Among the more useful plugins are those that allow you to embed interactive polls, create tag clouds, sync calendars, and selectively hide sensitive content. ShareThis offers decent social media integration. WPtouch is a great way to streamline your site for mobile devices. Footnote and annotation plugins are helpful for posting and workshopping assignments. I also recommend typography plugins to do fancy things like pull quotes and drop caps. A well-configured WYSIWYG editor, such as TinyMCE, is essential.

Step 4: Upload content. Post an interactive version of the syllabus, links to the course readings, films, image galleries, and any other pertinent data. Although your institution probably has a centralized reserves system, posting short reading assignments directly to your course site is generally permissible under fair use guidelines, as long as they are only available to registered students. In some cases, this might actually be preferable to library reserves that jumble all of your documents together with missing endnotes and abstruse titles. Most WordPress installs do not have massive amounts of media storage space, but there is usually enough for a modest amount of data. If you need more room, use Google Drive or a similar cloud storage service.

Step 5: Configure settings and metadata. Make sure your students are assigned the proper user roles when they are added to the blog. Also be sure to establish a semantic infrastructure, with content categories for announcements, news, reading responses, primary documents, project prospectuses, etc. Your WYSIWYG editor should be configured so that both you and your students can easily embed YouTube videos, cite sources, and create tables. Depending on the level of interaction you would like to encourage on your site, the discussion settings are worth going over carefully.
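If you set up a new course blog every semester, the category scheme can be scripted rather than clicked through. The following is a minimal sketch, assuming the site exposes the core WordPress REST API under /wp-json/wp/v2/ and that you authenticate with an Application Password over HTTP Basic auth (available in WordPress 5.6 and later, though institutional installs may disable it). The site URL, username, password, and category names are all hypothetical. The function only builds the requests; passing each one to urllib.request.urlopen would actually send it.

```python
import base64
import json
import urllib.request

# Hypothetical category scheme mirroring the "semantic infrastructure" above.
CATEGORIES = ["Announcements", "News", "Reading Responses",
              "Primary Documents", "Project Prospectuses"]


def category_requests(site_url, username, app_password, names=CATEGORIES):
    """Build one POST request per category for the core REST categories endpoint."""
    endpoint = site_url.rstrip("/") + "/wp-json/wp/v2/categories"
    # Application Passwords are sent as HTTP Basic credentials.
    token = base64.b64encode(f"{username}:{app_password}".encode("utf-8")).decode("ascii")
    requests = []
    for name in names:
        body = json.dumps({"name": name}).encode("utf-8")
        requests.append(urllib.request.Request(
            endpoint,
            data=body,
            method="POST",
            headers={"Authorization": f"Basic {token}",
                     "Content-Type": "application/json"},
        ))
    return requests


# Hypothetical site and credentials; nothing is sent until urlopen is called.
for req in category_requests("https://course.example.edu/", "instructor", "abcd efgh ijkl"):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Student accounts and their user roles can be handled the same way through the users endpoint, but it is worth checking what your institution's install allows before automating anything.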

Step 6: Figure out how you’re going to grade. After a good deal of experimentation, I settled on a plugin called Grader. It allows instructors to post comments that are viewable only to them and the student. Check out Mark Sample’s rubric for evaluating student blogs. Rather than grade each individual post, I prefer to evaluate work in aggregate at certain points during the semester. I also tend to prefer the 0-100 or A-F scale to the alternatives. Providing substantial feedback on blog posts is probably better than the classic √ or √+. You should treat each post as a miniature essay and award extra points for creativity, interactivity, and careful deliberation. If you are serious about digital publishing, it should account for at least 30-50% of the final grade for the course. Although I have not experimented with them yet, there are gradebook plugins that purport to allow students to track their progress throughout the semester.
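The arithmetic behind aggregate grading is simple enough to sketch. In the example below, the component names, weights, and scores are hypothetical: blog work graded at three checkpoints is averaged, then weighted at 40% of the final grade, which falls inside the 30-50% range suggested above.

```python
from statistics import mean


def aggregate_blog_score(checkpoint_scores):
    """Average the blog grades given at a few checkpoints (0-100 scale)."""
    return mean(checkpoint_scores)


def final_grade(components):
    """Weighted average of {name: (score, weight)} pairs; weights must sum to 1."""
    total_weight = sum(weight for _, weight in components.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, not 1")
    return sum(score * weight for score, weight in components.values())


# Hypothetical checkpoint grades and course components.
blog = aggregate_blog_score([88, 92, 95])
grade = final_grade({
    "blog": (blog, 0.4),
    "participation": (90, 0.2),
    "final paper": (85, 0.4),
})
print(round(grade, 1))  # 88.7
```

Grading in aggregate like this keeps any single weak post from dominating, while the weight makes it clear to students that the blog is a central part of the course.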

Step 7: Be clear about your expectations. It can be difficult to strike the correct balance between transparency and simplicity, but I usually prefer to spell out exactly what I want from my students. For a course blog, that probably means posting regular reading responses and commentaries. In addition to response papers, primary documents, and bibliographies, I ask students to post recent news items and events pertaining to the central themes of the course. I encourage them to embed relevant images, films, and documents and to link to both internal and external material. I also require students to properly title, categorize, and tag their posts. Because what good is a blog if you are not making full use of the medium?

Step 8: Publish. Although there are good reasons for keeping course blogs behind an institutional firewall, there are equally good reasons for publishing them to the world. An open blog encourages students to put their best foot forward, teaches them to speak to a broader audience, and leaves a lasting record of their collective efforts. If making your blog publicly accessible, allow your students to post using just their first names or a pseudonym. This will allow them to remain recognizable to class members but relatively anonymous to the rest of the world. It is also a good idea to restrict access to certain pages and posts, such as the course readings and gradebook, to comply with FERPA and Fair Use guidelines.

I always review my course blogs on the first day of class, and I spend a fair amount of time explaining how to navigate the backend and post content. I also find it useful to reinforce these lessons periodically during the semester. It takes only a few minutes to review proper blogging protocol: how to embed images and videos, annotate documents, and so on. If possible, project the course site in the background during class discussions and refer back to it frequently. Make it a constant and normal presence. Depending on the class, discussing more advanced digital publishing techniques, such as SEO, CSS, and wikis, can be both challenging and exciting. It is also important to remember that course management systems, like all emerging technologies, are embedded in larger social structures, with all of their attendant histories, politics, and inequalities. So it is worth researching and supporting initiatives, such as Girl Develop It or the Center for Digital Inclusion, that seek to confront and redress these issues.

Please feel free to chime in if you’ve tried something similar with your courses, or if you have any questions, suggestions, or comments about my process.

Scraping Samuel Richardson

It’s hard enough to read Samuel Richardson’s Pamela. It’s even harder to finish his later, longer epistolary novel: Clarissa, or, the History of a Young Lady (1748) [984,870 words]. Having toiled through both books, I was resting easy until I came across a curious volume that I had volunteered to present on in a graduate seminar on the 18C novel. The title? A collection of the moral and instructive sentiments, maxims, cautions, and reflexions, contained in the histories of Pamela, Clarissa, and Sir Charles Grandison (1755). The CMIS, as I’ll abbreviate it, consists of several hundred topics, each with multiple entries that consist of a short summary and a page reference. Here’s an example from the Clarissa section, thanks to ECCO.

Without going into the meaning or importance of these references, I want to focus on a practical problem: how could we extract every single citation of the eight-volume “Octavo Edition” of 1751? Our most basic data structure should be able to capture the volume and page numbers and associate them with the correct topic. While Richardson may very well have used some kind of card index, I can safely say that no subsequent reader or critic has bothered to count anything in the CMIS. But its very structure demands a database!

I’m a novice Python user, so it is somewhat embarrassing to share the script I wrote to “scrape” the page numbers from an e-text of the CMIS (subscription required), helpfully prepared by the wonderful people and machines at the Text Creation Partnership (TCP). The TCP’s version was essential, since the OCR-produced text (made with ABBYY FineReader 8.0) at the Internet Archive is riddled with errors.

I started by cutting and pasting two things into text files in my Python directory: (1) the full contents of the Clarissa section of the CMIS and (2) a list of all 136 topics (from “Adversity. Affliction. Calamity. Misfortune.” to “Youth”) pulled from the TCP table of contents page.

import sys
import re
from collections import defaultdict
from rome import Roman

The first step is to import the modules we’ll need. “sys” and “re” (regular expressions) are part of the standard library; defaultdict, from the collections module, is a super helpful way to give every new key a default value of 0 (or anything you choose) and so avoid key errors; rome is a third-party package that converts Roman numerals to Arabic numerals.
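If you would rather not depend on a third-party package for the Roman numerals, the conversion is short enough to write with the standard library alone. The sketch below is an assumption-laden stand-in, not the rome package’s API: roman_to_int is a hypothetical helper name, and it does no validation of malformed numerals. It also shows the defaultdict behavior described above, where a missing key silently starts at 0.

```python
from collections import defaultdict

# Values of the individual Roman numeral letters.
ROMAN_VALUES = {'i': 1, 'v': 5, 'x': 10, 'l': 50, 'c': 100, 'd': 500, 'm': 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'viii' to an int (case-insensitive).

    Hypothetical stdlib-only stand-in for rome; it does not reject
    malformed numerals like 'iiii'.
    """
    values = [ROMAN_VALUES[ch] for ch in numeral.lower()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value placed before a larger one is subtracted (the 'i' in 'iv').
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


# defaultdict(int) supplies 0 for any missing key, so counting needs no KeyError check.
counts = defaultdict(int)
for page in [406, 658, 406]:
    counts[page] += 1

print(roman_to_int('viii'))  # 8
print(dict(counts))          # {406: 2, 658: 1}
```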

# Read in two files: (1) digitized 'Sentiments' (2) TOC of topics
f1 = open(sys.argv[1], 'r')
f2 = open(sys.argv[2], 'r')

# Create topics list, filtering out alphabetical headings
topics = [line.strip() for line in f2 if len(line) > 3]

# Dictionary for converting volumes into one series of pages
volume = {1:348, 2:355, 3:352, 4:385, 5:358, 6:431, 7:442, 8:399}
startPage = {1:0, 2:348, 3:703, 4:1055, 5:1440, 6:1798, 7:2229, 8:2671}

This section of code reads in the two files as ‘f1’ and ‘f2.’ I’ll grab the contents of f2 and write them to a list called ‘topics,’ doing a little cleanup along the way. Essentially, the list comprehension filters out the alphabetical headings like “A.” or “Z.”, since these are no more than three characters long once the trailing newline is counted. Now I have a list of all 136 topics, which I can use to check whether a line in my main file is a topic heading. You probably noticed that the references in the CMIS are formatted by volume and page. I’d like to get rid of the volume number and convert all citations to a ‘global’ page number. The first dictionary lists each volume and its total number of pages; the second gives the number of pages that precede any given volume, which is added to a page number within that volume. Thus, the final volume begins right after global page 2,671.
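The second dictionary is just a running total of the first, so it can be derived rather than typed by hand, which removes one place for arithmetic slips. A minimal sketch using itertools.accumulate; global_page is a hypothetical helper name that does not appear in the script itself.

```python
from itertools import accumulate

# Pages per volume of the 1751 Octavo Edition, copied from the dictionary above.
volume = {1: 348, 2: 355, 3: 352, 4: 385, 5: 358, 6: 431, 7: 442, 8: 399}

# Offset of each volume = total pages in all earlier volumes (0 for volume 1).
vols = sorted(volume)
offsets = [0] + list(accumulate(volume[v] for v in vols))[:-1]
startPage = dict(zip(vols, offsets))


def global_page(vol, page):
    """Map a (volume, page) reference onto one continuous page series."""
    return startPage[vol] + page


print(startPage[8])         # 2671
print(global_page(2, 310))  # 658
```

Deriving the offsets this way also makes it easy to check them: the last offset plus the final volume’s 399 pages gives 3,070 pages in all.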

counter = 0
match = ''

# Core dictionaries: (1) citations ranked by frequency and (2) sorted by location
frequency = defaultdict(int)
location = {}

# Loop over datafile
for line in f1:
    if line.strip() in topics:
        match = line.strip()
        counter += 1
        location[match] = [counter, []]

OK, the hardest thing for me was making sure the extracted references got tossed in the right topic bin. So I initialized a counter that increments each time the code hits a new topic. The blank string ‘match’ keeps track of the current topic name. The loop goes through each line in the main file, f1. The first if statement checks whether the line (with whitespace stripped off) is present in the topics list. If it is, then counter and match update, and a key with the topic name (e.g. “Youth.”) is created in the location dictionary. The value for this key is a two-item list: the topic’s position in the sequence and an empty list that will collect its page references. So location[“Youth.”][0] equals 136, since this is the last topic.
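To see why ‘match’ has to persist between iterations, here is a toy run of the same binning logic. It is a sketch on hypothetical input: the two topic names come from the post, but the entry lines are placeholders, and for illustration the bins collect raw entry text rather than page numbers. The elif match guard is an addition that skips any lines appearing before the first heading.

```python
# Hypothetical miniature input: real topic names, placeholder entry lines.
topics = ['Adversity. Affliction. Calamity. Misfortune.', 'Youth.']
lines = [
    'Adversity. Affliction. Calamity. Misfortune.\n',
    'entry A1\n',
    'entry A2\n',
    'Youth.\n',
    'entry Y1\n',
]

counter = 0
match = ''
location = {}

for line in lines:
    if line.strip() in topics:
        # A new heading: bump the ordinal and open an empty bin.
        match = line.strip()
        counter += 1
        location[match] = [counter, []]
    elif match:
        # Every later line is filed under the most recent heading until the
        # next heading appears (raw text here, page numbers in the real script).
        location[match][1].append(line.strip())

print(location['Youth.'])  # [2, ['entry Y1']]
```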

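    elif re.search(r'[iv]+\..*(?=\[)', line):
        # Grab the citation; the lookahead stops just before the '['
        citation = re.search(r'[iv]+\..*(?=\[)', line).group()
        # Keep only the Roman volume numerals and Arabic page numbers
        process = [x for x in re.split(r'\W', citation) if re.match(r'(\d|[iv]+)', x)]
        current = ''
        for token in process:
            if re.match(r'[iv]+', token):
                # A Roman numeral sets the volume for the pages that follow it
                current = token
            else:
                # Alternative: key by (volume, page) instead of a global page
                #frequency[(int(Roman(current)), token)] += 1
                page = startPage[int(Roman(current))] + int(token)
                frequency[page] += 1
                location[match][1].append(page)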

This is the heart of the code. The else-if statement handles every line that is NOT a topic heading AND matches the regular expression I have specified. Let’s break down the regex:

[iv]+\..*(?=\[)

Brackets define a character class, a kind of disjunction: either ‘i’ OR ‘v’ is what we’re looking for, and those two letters are enough because the Octavo Edition only runs to eight volumes (‘viii’). The Kleene plus (‘+’) says we need at least one instance of the immediately preceding pattern, i.e. the ‘[iv]’. Then we escape the period with a backslash so that it matches a literal period rather than any character. The second period is the special wildcard, and the Kleene star right after it means we can have as many wildcard characters as we want up until the parentheses, which contain a lookahead assertion. The lookahead checks for a left bracket (remember that the citations always include the duodecimo references in brackets) without including the bracket in the match; because the star is greedy, the match runs up to the last left bracket on the line. In English, then, the regex checks for some combination of i’s and v’s followed by a period that is followed, at some point, by a bracket.
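A quick way to convince yourself of each piece of the pattern is to run it on the sample entry quoted in the post. This is a minimal sketch; the compiled name CITATION and the ‘six [’ test string are illustrative inventions, used only to show what the backslash before the period buys us.

```python
import re

# The citation pattern, as a raw string so each backslash reaches the regex engine.
CITATION = re.compile(r'[iv]+\..*(?=\[)')

entry = ('People in Adversity should endeavour to preserve laud|able customs, '
         'that so, if sun-shine return, they may not be losers by their trials, '
         'ii. 58. 310. [149. iii. 44].')

found = CITATION.search(entry)
# The lookahead tests for '[' without consuming it, so the bracket is left out.
print(repr(found.group()))  # 'ii. 58. 310. '

# Escaping matters: unescaped, '.' is a wildcard, so the 'ix' in 'six' would pass.
print(re.search(r'[iv]+..*(?=\[)', 'six [') is not None)  # True
print(CITATION.search('six [') is None)                     # True

# A line with no bracketed duodecimo reference produces no match at all.
print(CITATION.search('no bracketed reference here, ii. 58.'))  # None
```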

The process variable takes the string matched by the regex, splits it on every non-word character (spaces and periods alike), and keeps only the pieces that begin with a digit or with an ‘i’ or ‘v’, leaving a list of Roman and Arabic numerals. The string “People in Adversity should endeavour to preserve laud|able customs, that so, if sun-shine return, they may not be losers by their trials, ii. 58. 310. [149. iii. 44].” would be returned as “ii. 58. 310. ” by the regex (the lookahead stops just short of the bracket) and then turned into [ii, 58, 310] by process. Current is an empty string designed to hold the current Roman numeral so we know, for instance, which volume to match up page 310 with. In the final lines, each Arabic page number is added to the startPage offset of the current volume. Then the frequency dictionary for that specific page is incremented, and the newly extracted page number is appended to the current topic’s list in the location dictionary.
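Here is the same split-and-convert step as a self-contained sketch, ending with the “ranked by frequency” view that the frequency dictionary is meant to support. Several things are assumptions: record is a hypothetical helper, a lookup table replaces rome (enough for volumes i through viii), the sample entry is assumed to sit under the first topic, and the second citation is a hypothetical entry added only so that the ranking has something to sort.

```python
import re
from collections import defaultdict

# Offsets copied from the startPage dictionary above.
startPage = {1: 0, 2: 348, 3: 703, 4: 1055, 5: 1440, 6: 1798, 7: 2229, 8: 2671}
# Only eight volumes, so a lookup table stands in for rome here.
VOLUMES = {'i': 1, 'ii': 2, 'iii': 3, 'iv': 4, 'v': 5, 'vi': 6, 'vii': 7, 'viii': 8}

TOPIC = 'Adversity. Affliction. Calamity. Misfortune.'  # assumed topic for the example
frequency = defaultdict(int)
location = {TOPIC: [1, []]}


def record(citation, topic):
    """Split a matched citation string and file its global page numbers under topic."""
    tokens = [x for x in re.split(r'\W', citation) if re.match(r'(\d|[iv]+)', x)]
    current = ''
    for token in tokens:
        if token in VOLUMES:
            current = token  # a Roman numeral sets the volume for the pages after it
        else:
            page = startPage[VOLUMES[current]] + int(token)
            frequency[page] += 1
            location[topic][1].append(page)
    return tokens


print(record('ii. 58. 310. ', TOPIC))  # ['ii', '58', '310']
record('ii. 58. ', TOPIC)              # hypothetical second entry citing the same page

# Rank pages by how often they are cited, most-cited first.
ranked = sorted(frequency.items(), key=lambda item: item[1], reverse=True)
print(ranked)              # [(406, 2), (658, 1)]
print(location[TOPIC][1])  # [406, 658, 406]
```

Note that a token such as ‘ibid’ would pass the filter but fail the int() conversion, which is one concrete form of the ‘ibid.’ problem mentioned below.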

Obviously, this is a rather crude method. It’d be fun to optimize it (and I do need to fix it up so that it can deal with the handful of citations marked by ‘ibid.’), but scraping is supposed to be quick-and-dirty because it really only works with the specific document or webpage that you’re encountering. I doubt this code would do anything useful for other concordance-like texts in the TCP. But I would love to hear suggestions for how it could be better.

In a later post, I’ll talk about the problems I’ve faced in visualizing the data extracted from the CMIS.