One man’s trash . . . is another man’s archive

“The most difficult thing about collecting is discarding.”
– Albert Köster

photo by @jmhuculak

The photo above was taken outside Sterling Memorial Library at Yale University. Those long rectangular drawers you see are what’s left of that pre-digital archive known as the card catalog. The genealogy of this “universal paper machine” has been detailed by Markus Krajewski in his delightful book Paper Machines: About Cards & Catalogs, 1548-1929. Far from being the first form of reference technology, this system is only one in a long series of attempts to discover, store and classify knowledge. Yet the transition from the painstakingly compiled paper archive to the extended technological networks now replacing it is more than a simple change in office furniture. The dumpster’s contents signal a change far more dramatic than replacing an index card with a DOI, or swapping the cabinetry for a computer.

With a card catalog, the index card would signal a wide range of information to its reader. This exchange between reader and the material read was relatively unproblematic, unless of course the card was written in an unfamiliar alphabet or language, or the reader lacked the basic literacy required to grasp it. Thanks to this information, our reader would have been able to find the location of a book in the library, some broad subject headings, and other bibliographic details. The reader would then have acted on that information either by requesting the volume at the circulation desk or by moving on to a different bibliographic record altogether.

In a similar vein, the Resource Description Framework (RDF) represents information about resources on the World Wide Web, but instead of using natural language on index cards to communicate meaning to our curious reader, it communicates that information in a machine-readable form. It represents similar metadata about Web resources, such as the title, author, and modification date of a Web page, along with any copyright or licensing information. Yet unlike the card catalog, RDF allows this information to be processed by applications rather than only displayed to people. It provides a common framework for expressing this information, so that it can be exchanged between applications without loss of meaning.
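To make the comparison with the index card concrete, here is a minimal sketch of what such a machine-readable description might look like. RDF expresses information as simple statements (triples) of subject, predicate, and object; the sketch writes a few of them about a web page in the N-Triples format, using the Dublin Core terms vocabulary for title, creator, modification date and license. The page URL, author name, date and license are hypothetical placeholders, and the helper functions are illustrative only; a real project would more likely rely on an RDF library than format the strings by hand.

```python
# A minimal sketch (standard library only): describing one web page with
# RDF triples and writing them out as N-Triples. The page URL, creator,
# date and license below are HYPOTHETICAL placeholder values.

DCTERMS = "http://purl.org/dc/terms/"               # Dublin Core terms namespace
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"  # datatype for date literals


def uri(u):
    """Format an IRI the way N-Triples expects: <...>."""
    return f"<{u}>"


def literal(text, datatype=None):
    """Format a literal, escaping characters N-Triples does not allow raw."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


page = uri("http://example.org/blog/card-catalog")  # hypothetical resource
triples = [
    (page, uri(DCTERMS + "title"), literal("One man's trash . . . is another man's archive")),
    (page, uri(DCTERMS + "creator"), literal("Example Author")),
    (page, uri(DCTERMS + "modified"), literal("2012-01-01", XSD_DATE)),
    (page, uri(DCTERMS + "license"), uri("https://creativecommons.org/licenses/by/4.0/")),
]

print(to_ntriples(triples))
```

Unlike the index card, nothing here depends on a human reading the words: an application that knows the Dublin Core vocabulary can tell which string is the title and which is the creator, whatever language the values are written in.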

photo by @jmhuculak

The photo on the left provides a good example. The data stored in the card catalog could be compared to an application running on a single machine. The only people who needed to understand the meaning of a given variable such as “author” or “date of publication” were those who consulted that card catalog directly. In the case of an application running on a single machine, those people would be the programmers reading the source code. But if we want the data in this card catalog to participate in a larger network, such as the World Wide Web, the meanings of the terms in the messages that applications exchange (“author,” “date of publication,” etc.) need to be explicit.
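The difference between a private variable name and an explicit, shared meaning can be sketched in a few lines. In this illustration, two hypothetical catalogs name the same idea differently (“author” in one, “written_by” in the other). Once each maps its local field names onto the same Dublin Core term IRIs, their records can be merged and queried together. The catalogs, field names, records and the `to_triples` helper are all invented for the example; only the Dublin Core term IRIs are real.

```python
# A sketch of why explicit meaning matters. Both catalogs and all of their
# records and field names are HYPOTHETICAL; only the Dublin Core term IRIs
# (dcterms:creator, dcterms:issued) are real vocabulary.

DCTERMS = "http://purl.org/dc/terms/"

# Each stand-alone system uses its own field names, which only its own
# programmers need to understand.
catalog_a = [{"id": "http://example.org/a/1", "author": "Author One", "pub_date": "1999"}]
catalog_b = [{"uri": "http://example.org/b/7", "written_by": "Author Two", "year": "2005"}]

# Each system makes its meanings explicit by mapping local fields to shared terms.
mapping_a = {"author": DCTERMS + "creator", "pub_date": DCTERMS + "issued"}
mapping_b = {"written_by": DCTERMS + "creator", "year": DCTERMS + "issued"}


def to_triples(records, id_field, mapping):
    """Turn local records into (subject, predicate, object) triples using shared IRIs."""
    triples = set()
    for rec in records:
        subject = rec[id_field]
        for field, term in mapping.items():
            if field in rec:
                triples.add((subject, term, rec[field]))
    return triples


# With shared meanings, merging the two datasets is just a set union.
graph = to_triples(catalog_a, "id", mapping_a) | to_triples(catalog_b, "uri", mapping_b)

# A question neither system could answer alone: every creator in both catalogs.
creators = sorted(o for s, p, o in graph if p == DCTERMS + "creator")
print(creators)  # ['Author One', 'Author Two']
```

Without the two mappings, a program combining these catalogs would have to guess that “author” and “written_by” mean the same thing, which is exactly the knowledge that otherwise lives only in the heads of each system’s programmers.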

In fact, far too much of the data fueling web applications is currently prevented from being shared with, and integrated into, other Internet applications. The compartmentalization of the card catalog has carried over into web applications, and data transmission becomes entangled in stovepipe systems, or “systems procured and developed to solve a specific problem, characterized by a limited focus and functionality, and containing data that cannot be easily shared with other systems” (DOE 1999). These applications, instead of allowing users to combine data in new ways to make powerful and compelling connections, risk becoming the digital equivalent of an abandoned card catalog in a dumpster.

As more and more digital humanists share and distribute their work via the World Wide Web, an understanding of why programming for the Semantic Web matters becomes essential. Simple mechanisms such as RDF play a key role in transmitting semantic data between machines while allowing applications to combine data in new ways. Much like the books in The Fantastic Flying Books of Mr. Morris Lessmore, RDF allows meaningful data to rejoin the many applications hiding behind web interfaces. It transforms what might have been discarded into data-rich applications. It also enables digital humanists to join their work to a larger ocean, “the stream of stories.”

Different parts of the Ocean contained different sorts of stories, and as all the stories that had ever been told and many that were still in the process of being invented could be found here, the Ocean of the Streams of Story was in fact the biggest library in the universe. And because the stories were held here in fluid form, they retained the ability to change, to become new versions of themselves, to join up with other stories and so become yet other stories; so that unlike a library of books the Ocean of the Streams of Story was much more than a storeroom of yarns. It was not dead but alive.

Salman Rushdie, Haroun and the Sea of Stories

Cross-posted at HASTAC.