Combine JPEGs and PDFs with Automator

Like most digital historians, my personal computer is packed to the gills with thousands upon thousands of documents in myriad formats and containers: JPEG, PDF, PNG, GIF, TIFF, DOC, DOCX, TXT, RTF, EPUB, MOBI, AVI, MP3, MP4, XLSX, CSV, HTML, XML, PHP, DMG, TAR, BIN, ZIP, OGG. Well, you get the idea. The folder for my dissertation alone contains almost 100,000 discrete files. As I mentioned last year, managing and preserving all of this data can be somewhat unwieldy. One solution to this dilemma is to do our work collaboratively on the open web. My esteemed colleague and fellow digital historian Caleb McDaniel is running a neat experiment in which he and his student assistants publish all of their research notes, primary documents, drafts, presentations, and other material online in a wiki.

Although I think there is a great deal of potential in projects like these, most of us remain hopelessly mired in virtual reams of data files spread across multiple directories and devices. A common issue is a folder with 200 JPEGs from some archival box or a folder with 1,000 PDFs from a microfilm scanner. One of my regular scholarly chores is to experiment with different ways to sort, tag, manipulate, and combine these files. This time around, I would like to focus on a potential solution for the latter task. So if, like most people, you have been itching for a way to compile your entire communist Christmas card collection into a single handy document, today is your lucky day. Now you can finally finish that article on why no one ever invited Stalin over to their house during the holidays.

Combining small numbers of image files or PDFs into larger, multipage PDFs is a relatively simple point-and-click operation using Preview (for Macs) or Adobe Acrobat. But larger, more complex operations can become annoying and repetitive pretty quickly. Since I began my IT career on Linux and since my Mac runs on a similar Unix core, I tend to fall back on shell scripting for exceptionally complicated operations. The venerable, if somewhat bloated, PDFtk suite is a popular choice for the programming historian, but there are plenty of other options as well. I’ve found the pdfsplit and pdfcat tools included in the latter package to be especially valuable. At the same time, I’ve been trying to use the Mac OS X Automator more often, and I’ve found that it offers what is arguably an easier, more user-friendly interface, especially for folks who may be a bit more hesitant about shell scripting.

What follows is an Automator workflow that takes an input folder of JPEGs (or PDFs) and outputs a single combined PDF with the same name as the containing folder. It can be saved as a service, so you can simply right-click any folder and run the operation within the Mac Finder. I’ve used this workflow to combine thousands of research documents into searchable digests.

Step 1: Open Automator, create a new workflow and select the “Service” template. At the top right, set it to receive selected folders in the Finder.

Step 2: Insert the “Set Value of Variable” action from the library of actions on the left. Call the variable “Input.” Below this, add a “Run AppleScript” action and paste in the following commands:

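on run {input}
    -- input is the list of Finder items handed to the service
    tell application "Finder"
        -- look up the folder that contains the first selected item
        set FilePath to (container of (first item of input)) as alias
    end tell
    return FilePath
end run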

Add another “Set Value of Variable” action below this and call it “Path.” This will establish the absolute path to the containing folder of your target folder for use later in the script. If this is all getting too confusing, just hang in there. It will probably make more sense by the end.

Step 3: Add a “Get Value of Variable” action and set it to “Input.” Click on “Options” on the bottom of the action and select “Ignore this action’s input.” This part is crucial, as you are starting a new stage of the process.

Step 4: Add the “Run Shell Script” action. Set the shell to Bash and pass input “as arguments.” Then paste the following code:

echo "${1##*/}"

I admit that I am cheating a little bit here. This Bash command will retrieve the title of the target folder so that your output file is named properly. There is probably an easier way to do this using AppleScript, but to be honest I’m just not that well versed in AppleScript. Add another “Set Value of Variable” action below the shell script and call it “FolderName” or whatever else you want to call the variable – it really doesn’t matter.
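To see what the parameter expansion in Step 4 does outside of Automator, here is a minimal sketch you can run in Terminal. The helper names and the sample path are hypothetical, and the second function is only a rough shell analogue of what the AppleScript step stores in “Path.”

```shell
#!/bin/bash
# Sketch only: the helper names and the sample path are hypothetical.

# Print the last path component (what the workflow stores as "FolderName").
folder_name() {
  local path="${1%/}"          # drop one trailing slash, if present
  printf '%s\n' "${path##*/}"  # strip everything up to the last slash
}

# Print the parent directory (roughly what the workflow stores as "Path").
parent_folder() {
  local path="${1%/}"
  printf '%s\n' "${path%/*}"   # strip the last slash and everything after it
}

folder_name "/Users/me/Research/Box 12 Letters"
parent_folder "/Users/me/Research/Box 12 Letters"
```

The quotes around each expansion matter: without them, a folder name containing runs of spaces would be split into separate words before it is printed.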

Step 5: Add another “Get Value of Variable” action and set it to “Input.” Click on “Options” on the bottom of the action and select “Ignore this action’s input.” Once again, this step is crucial, as you are starting a new stage of the process.

Step 6: Add the action to “Get Folder Contents,” followed by the action to “Sort Finder Items.” Set the latter to sort by name in ascending order. This will ensure that the pages of your output PDF are in the correct order, the same order in which they appeared in the source folder.
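Sorting by name only gives the right page order if the names themselves sort the way you expect, and tools differ in how they treat numbers inside a name. The sketch below shows the problem with a plain byte-wise sort and one possible fix, zero-padding the number. The `padded_name` helper, the four-digit width, and the assumed `<prefix><digits>.<extension>` naming pattern are all illustrative choices, not part of the workflow.

```shell
#!/bin/bash
# Sketch only: padded_name is a hypothetical helper, and the four-digit width
# and the <prefix><digits>.<extension> naming pattern are assumptions.

padded_name() {
  local name="$1"
  local re='^(.*[^0-9])?([0-9]+)\.([^.]+)$'
  if [[ $name =~ $re ]]; then
    # 10# forces base 10 so that a number like "08" is not read as octal.
    printf '%s%04d.%s\n' "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]}))" "${BASH_REMATCH[3]}"
  else
    printf '%s\n' "$name"   # leave names that do not match untouched
  fi
}

# A plain byte-wise sort puts page10.jpg ahead of page2.jpg ...
printf '%s\n' page2.jpg page10.jpg | LC_ALL=C sort

# ... but the padded names come out in page order.
for f in page2.jpg page10.jpg; do padded_name "$f"; done | LC_ALL=C sort
```

The function only prints the new name; actually renaming files is left out on purpose, so you can check the results before touching anything.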

Step 7: Add the “New PDF from Images” action. This is where the actual parsing of the JPEGs will take place. Save the output to the “Path” variable. If you don’t see this option on the list, go to the top menu and click on View –> Variables. You should now see a list of variables at the bottom of the screen. At this point, you can simply drag and drop the “Path” variable into the output box. Set the output file name to something arbitrary like “combined.” If you want to combine individual PDF files instead of images, skip this step and scroll down to the end of this list for alternative instructions.

Step 8: Add the “Rename Finder Items” action and select “Replace Text.” Set it to find “combined” in the basename and replace it with the “FolderName” variable. Once again, you can drag and drop the appropriate variable from the list at the bottom of the screen. Save the workflow as something obvious like “Combine Images into PDF,” and you’re all set. When you right-click on a folder of JPEGs (or other images) in the Finder, you should be able to select your service. Try it out on some test folders with a small number of images to make sure all is working properly. The workflow should deposit your properly named output PDF in the same directory as the source folder.

To combine PDFs rather than image files, follow steps 1-6 above. After retrieving and sorting the folder contents, add the “Combine PDF Pages” action and set it to combine documents by appending pages. Next add an action to “Rename Finder Items” and select “Name Single Item” from the pull-down menu. Set it to name the “Basename only” and drag and drop the “FolderName” variable into the text box. Lastly, add the “Move Finder Items” action and set the location to the “Path” variable. Save the service with a name like “Combine PDFs” and you’re done.
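For anyone who prefers the shell route mentioned earlier, the PDF version of this workflow can be approximated with PDFtk. This is a rough sketch, not a drop-in replacement: the function names are hypothetical, it assumes an absolute folder path like the ones the Finder hands to a service, and it assumes `pdftk` has been installed separately.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical, and pdftk must be installed
# separately for combine_pdf_folder to do anything.

# Print where the combined file should go: next to the source folder and
# named after it (mirrors the "Path" and "FolderName" variables).
# Assumes an absolute path such as /Users/me/Research/Box 12 Letters.
combined_pdf_path() {
  local dir="${1%/}"
  printf '%s/%s.pdf\n' "${dir%/*}" "${dir##*/}"
}

# Concatenate every PDF in the folder, in glob (name) order.
combine_pdf_folder() {
  local dir="${1%/}"
  local files=("$dir"/*.pdf)
  if [ ! -e "${files[0]}" ]; then
    echo "no PDFs found in $dir" >&2
    return 1
  fi
  if ! command -v pdftk >/dev/null 2>&1; then
    echo "pdftk is not installed" >&2
    return 1
  fi
  pdftk "${files[@]}" cat output "$(combined_pdf_path "$dir")"
}

combined_pdf_path "/Users/me/Research/Box 12 Letters"
```

Calling `combine_pdf_folder` with a folder path would then write the combined file alongside that folder, just as the Automator service does.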

This procedure can be modified relatively easily to parse individually selected files rather than entire folders. Working on whole folders suited me best, though, so that’s what I did. Needless to say, the containing folder has to be labeled appropriately for this to work. I find that I’m much better at properly naming my research folders than I am at naming all of the individual files within them. So, again, this process worked best for me. A lot can go wrong with this workflow. Automator can be fickle, and scripting protocols are always being updated and revised, so I disavow any liability for your personal filesystem. I also welcome any comments or suggestions to improve or modify this process.

My Runaway Class

Over a decade ago, the world began to hear about the “digital native” – a new breed of young person reared on computers for whom Google, Wikipedia, Facebook, and Twitter are second nature. Digital natives thrive in an online universe where knowledge is democratized, authority is decentralized, and media is everywhere. And they are most comfortable in an environment that is fast-paced, interactive, and immediate. It reminds me of a line from Hedwig and the Angry Inch:

all our feelings and thoughts
expressed in ones and in oughts
in endless spiraling chains
you can’t decode or explain
cause you are so analog

There is a large and growing body of excellent material on the use of technology to engage digital natives in the classroom. But one thing I have learned over the past few years is that a student who is very comfortable with digital technology is not necessarily digitally literate. A student can spend twelve hours a day online but still not know how to run a sophisticated Google search or post a video, not to mention build a website or script an algorithm. A student who knows how to update her Facebook status does not necessarily know how to navigate the back end of a blog or find an article on JSTOR.

This does not mean that the high-tech classroom is a misguided endeavor – exactly the opposite. It means that educators have to work especially hard to guide students through the digital realm. We have an obligation to teach digital literacy. And since the best way to learn is by doing, I’ve been experimenting with new technologies for a while. I’d like to share the results of some recent tinkering. This is the story of my runaway class.

Last year I taught a course entitled “Slavery and Freedom in Early America.” The course is designed to be both chronological and accumulative. Beginning with Pre-Columbian slavery, it dwells on the wide spectrum of captivity and servitude under colonialism, the transition to African chattel slavery, the rise of antislavery movements, and revolutionary politics. It ends in 1830 with the third edition of David Walker’s Appeal…to the Coloured Citizens of the World. It is not so much a supplement to the traditional early American survey as an attempt to re-narrate the entire period from a substantially different perspective. Each week students are exposed to original documents coupled with the work of a professional historian. And each reading highlights different themes and interpretive strategies. The goal is to be able to marshal these different modes of interpretation to build a multifaceted view of a particular topic, culminating in a final research project.

Drawing on various active learning techniques, I attempted to make the course as dynamic as possible. We had a group blog for weekly reading responses, research prospectuses, and commentary. The blog also served as a centralized space for announcements, follow-ups, and detailed instructions for assignments (at the end of the semester I used the Anthologize plugin for WordPress to compile the entire course proceedings in book form). There was a plethora of digital images and videos, student presentations, peer instruction, and peer editing. We had a really fun, if somewhat chaotic, writing workshop speed date. We used Skype to video conference with the author of one of the required textbooks. We dug through various digital databases and related sites. We even grappled with present-day slavery through Slavery Footprint (an abolitionist social network not unlike the Quaker networks of the eighteenth century). Almost every week I asked the class about their definitions of slavery, and it was fascinating to see how they changed over time. Things really got interesting one day when I surprised them by asking them to define “freedom.” Their answers gave me a lot to think about long after the course had ended. I’ve posted the full syllabus here.

Aware of all of the discussions brewing around digital pedagogy, I gave special attention to the role of technology in the classroom. This culminated in an activity where students used their database skills to find runaway ads in colonial newspapers. Runaway wives, runaway servants, runaway children, runaway slaves – it was all fair game. I was more than a little nervous about giving the students such free rein. But the results were spectacular. The ads they unearthed were wide-ranging and rich, and no two students focused on the same thing. The sheer diversity of the material reminded me of Cathy Davidson’s musings on the brain science of attention. There is much benefit, Davidson argues, in harnessing myriad perspectives on a single topic. It is, in essence, a controlled form of crowdsourcing. Edward Ayers, the doyen of digital history, calls it “generative scholarship.”

One student found an ad for an escaped slave named Romeo, “about twenty-four years old, five feet six inches high, and well proportioned; his complexion a little of the yellowish cast.” Romeo was literate and “exercised his talents in giving passes and certificates of freedom to run-away slaves.” He ran off with a woman from a different county, “a small black girl named Juliet.” Another student found a convict with “a great many Letters and Figures on his Breast and Left Arm, some in red and some in black.” He was imprisoned in England, shipped to Virginia as a bond slave, escaped, traveled back to London, was recaptured, convicted, sent back to Virginia, and escaped again. Some students found notices of hapless travelers who had been captured and deposited in prison on suspicion of being a runaway, such as Thomas Perry, a Welshman, who could provide “no certificate of his freedom.” I also shared one of my personal favorites, a servant who eloped with his master’s wife on a pair of horses.

The students posted their ads to the course blog, and when they arrived for the following class I divided them into small groups. After some preliminary remarks, I asked them to choose an ad among the ones they had found and to write that person’s biography. This was an experiment in generative scholarship, not unlike Visualizing Emancipation or the super-neat History Harvests at the University of Nebraska. But my class was much more narrowly defined in time and scope. The students had to use their wits, their laptops, and all of the contextual information they had accrued from the readings and discussions in previous weeks. They had to build a plausible narrative for their runaway on demand, with no warning, no excuses, and no template. I circulated among the groups to monitor progress and occasionally offered questions or assistance.

The questions we asked were the typical ones employed by historians. What can you find out about Romeo and Juliet’s purported owners? What does the date tell you? What was going on in that location at that time? How many women ran away from their husbands in New York City in 1757? Was it unusual for servants to escape in groups of three or more? Did the time of year matter? How does the price offered for one runaway compare to others? What can you learn from their detailed physical descriptions? What about their profession? What about the lists of items they took with them on their journey? Is this information reliable? What governed decisions to escape or to stay? What, if anything, does this tell you about the relationship between petit marronage and grand marronage? How does this information comport with what we know about slavery in a particular place and time?

It’s shocking how much information you can glean about a person’s life after just a few minutes online, even persons who have been dead and gone for hundreds of years. The various newspaper databases – Readex, Accessible Archives, ProQuest – and specialized projects, such as The Geography of Slavery in Virginia, proved invaluable. I directed students to the large collection of external databases featured on the Slavery Portal. Genealogy sites and historical map collections also came in handy. One student discovered that his subject had escaped from the same slaveholder multiple times at different points in his life. Using the Trans-Atlantic Slave Trade Database, we were able to locate the name of the ship that had carried an individual and their likely point of origin in Africa.

Students from different groups helped each other, which created a nice collaborative atmosphere. Sometimes there were dead ends, a common name or a paucity of leads. But even then the student could surmise, could use her imagination based on what she already knew about a particular time and place. And this was one of the goals of the exercise – to expose the central role of the imagination in historical practice. At the end of class, we shared what we had discovered and were able (briefly) to engage some big sociological questions about the lives and labors of colonial runaways. When I polled the students at the end of the semester about the most memorable moments of the course, the runaway class was their favorite by a wide margin. The final evaluations were among the best I have ever received.

There are aspects of this crowdsourcing experiment that I regret. I had hoped at least some students would take inspiration from the material for their final projects, and I’m sure some of the lessons from that day improved their papers. But because I scheduled the runaway class late in the semester, the students were reluctant, I think, to radically revise their project proposals. Of course, if I had run the class too early in the semester, the students would not have had the necessary background to make educated inferences about their subject. There were other snags. Because most students were not familiar or comfortable with the vast range of digital research tools out there, I had to do some hand-holding and gentle nudging. It was clear that my students needed more experience finding, using, and interpreting large online databases, not to mention Google Books, Wikipedia, Zotero, and other tools historians use every day. It might even make sense to run in-class tutorials on what researchers can do with a database like Colonial State Papers, Fold3, or Visualizing Emancipation. A large part of being an historian is just knowing what source materials are out there and how to turn them to your advantage.

I also regret not taking more detailed notes. In part because everything moved so fast, I was left without a finalized version of the students’ many fascinating discoveries. There was a lot of research and sharing going on, but not a lot of synthesis and reflection. I suppose asking the students to follow through and actually write their speculative biographies would help. Maybe that would be a good midterm assignment? If I ran this course for years, I could easily see building a massive online database of runaways and their worlds, on a national or even international scale.

In the end, the runaway class was an object lesson in the raw energy and potential of digital history. It was interactive, immediate, and exciting. I would be interested to know if anyone has run a similar experiment or has suggestions for different ways to liven up the classroom.

Cross-posted at HASTAC

De nostri temporis studiorum ratione: Giambattista Vico and digital ecosystems

It might seem anachronistic to call on the work of an eighteenth-century philosopher to elucidate some of the issues at play in the debates swirling around the digital humanities, but Giambattista Vico has been on my mind lately as we prepare for a conference on his work today and tomorrow at the Beinecke Rare Book and Manuscript Library.

Vico (1668-1744) was an Italian philosopher, rhetorician and jurist. He worked in relative obscurity during his lifetime teaching rhetoric at the University of Naples. His succinct De nostri temporis studiorum ratione (1709) provides a useful lens through which we might consider the digital humanities. The work, known in the English translation as On the Study Methods of Our Time, was Vico’s first foray into philosophy, and was the seventh in a series of inaugural lectures given at the University of Naples in his position as professor of rhetoric.

In his lecture, Vico took aim at the inadequacy of the critical and pedagogical methods of his contemporaries while weighing the comparative merits of classical and modern culture. In order to discern just how current “study methods” might be superior or inferior to those of the Ancients, Vico sets up a distinction between the new arts, sciences and inventions – the constituent material of learning – and the new instruments and aids to knowledge – the ways and means of learning.

Vico’s critique of the Moderns took issue with the logicians of Port-Royal, and their Cartesian method of compartmentalizing knowledge. For Vico, this reductive method of study precludes the human, and is inferior to that of the Ancients: “We devote all of our efforts to the investigation of physical phenomena, because their nature seems unambiguous; but we fail to inquire into human nature which, because of the freedom of man’s will, is difficult to determine.” The result, Vico warns, is that students “because of their training, which is focused on these studies, are unable to engage in the life of the community, to conduct themselves with sufficient wisdom and prudence.” By reducing what there is to know, he argues, we limit our ability to engage with the world on a broader scale. Diminished by our learning, we will be incapable of dealing practically with issues of change or transformation, which require the ability to recognize and follow the most suitable or sensible course of action.

For Vico, the methods of logicians such as Antoine Arnauld and Pierre Nicole and their followers established the constituent material of learning through a process of narrowing the domain of knowledge. Much in the same way that Francis Bacon took issue with the syllogisms of the scholastics to argue that knowledge of the world should be grounded in carefully verified facts, Vico doesn’t limit himself to providing a new method to achieve old-fashioned knowledge. He redefines what it means to know. And, since the instruments (including logic) used by his contemporaries, those ways and means to that material which constitutes knowledge, were antecedent to the task of learning, the knowledge they yielded was determined by the premises of their creation. In the technology they harnessed, and in the aims they fulfilled, these instruments were restricted by the discourses which produced them. In reducing knowledge to the unambiguous, the logicians of Port-Royal reduced knowledge to what their brains and their technology enabled them to master.

For the digital humanities, it is this category — the ways and means of learning — that carries within it a transformative potential for the constituent material of learning, but in a radically different way from that to which Vico directed his critique. In a scribal or a print environment, the constituent material of learning is often shaped and transformed by its means of transmission (see for example the work of Roger Chartier and Peter Stallybrass); that which is yielded instrumentally will “speak” a language inherent in the design of the instrument. The critique Vico levelled at Cartesian method, that critics had placed “their fundamental truths before, outside, and above every bodily image of reality,” illustrates an extreme case of how the instrument of logic could override the countenance of real life. In other words, if the only instrument available to us is a hammer, by constraint of circumstances, everything looks like a nail.

In our current digital environment, we are only beginning to see how the constituent material of learning is radically transformed by the ways and means in which it is transmitted. Rather than dealing with a reduction of knowledge, this time technology has allowed us to expand knowledge into a boundless domain, one whose complexity trumps theory and whose scale defies our individual and physiological capacity to grasp it. The digital humanities is currently grappling with this conundrum: by transforming what it means to know something, particularly in a boundless domain of culture, a discipline is emerging which attempts to come to terms with the current interaction of millions of different pieces of human culture, past and present, digital and analog, while critically reflecting on the very nature of human knowledge itself.

David Weinberger has addressed this very problem in his book Too Big to Know. Weinberger makes a pithy call for a “rethinking of knowledge now that the facts aren’t the facts, experts are everywhere, and the smartest person in the room is the room.” One wonders what Giambattista Vico would have made of such a room. . .

Cross-posted at HASTAC