October 29, 2012

Emerging Media Digital Feed - The Applications of Digital History Research and Topic Modeling

On Friday, October 19th, Dr. Douglas Seefeldt (Assistant Professor of History and Emerging Media Fellow) and Dr. James Connolly (Director, Center for Middletown Studies, and Professor of History) showcased their digital history research projects, which use multiple techniques; the most interesting is topic modeling. Topic modeling is a statistical method, carried out with computer programs, for discovering and extracting 'topics' and vocabulary patterns within one or many electronic documents. The extracted data can then be analyzed and applied. Through this analysis, one could more easily verify the validity of a text, narrow down a large list of possible research sources based on key words and phrases, or check for plagiarism and duplicate entries.

Dr. Seefeldt's projects focus primarily on historical texts, like newspaper articles, and on the analysis and comparison of those documents to find correlative truth. The same incident is often reported differently by various sources; analyzing and comparing the sources yields evidence of how rhetoric was shaped and how it was conveyed in discourse. Some of these projects include:
In the presentation, Dr. Seefeldt stated that the original focus of the project had been best practices for converting historical texts into a digital format, but that it has since shifted to exploring what can be done with the texts after they have been digitized and encoded. The terminology used to describe the nature and function of projects like these has evolved from the mid-1990s to the present: from digital archive, digital scholarly edition, and thematic research archive, to new model scholarship. One notable feature of this kind of scholarship is that it is what Dino Buzzetti and Jerome McGann have described as a “special, highly sophisticated type of self-reflexive communication” that requires a digital environment.[1] The scholarly objective of these projects is not simply to create collections of “descriptive” text markup (encoding electronic texts to identify or describe parts of a document) the way a library or archive might, but rather to build upon that foundation by adding the “interpretive” markup that is the hallmark of scholarship.

In addition to the “concept highlighting” feature for visualizing the text markup, Voyant is one tool Seefeldt used to illustrate what topic modeling can accomplish. This tool allows the user to input any text and analyze the 'topics' within it by Word Count, Word Trends, Word Clouds, and Keywords in Context. Voyant is openly available to anyone and is currently in a beta version. It was shown to illustrate how this technology helps reveal patterns in small bodies of text, like the approximately 150 items in Seefeldt’s Mountain Meadows Massacre project, or in much larger ones, like the tens of thousands of texts that will eventually comprise the William F. Cody Archive.
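To give a rough sense of what two of those views compute, here is a minimal Python sketch of a word count and a keywords-in-context listing. It is not Voyant's code and does not reflect how Voyant is implemented; the function names (tokenize, word_counts, keywords_in_context), the stopword list, and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text, stopwords=frozenset()):
    """Count how often each word appears, skipping any stopwords."""
    return Counter(tok for tok in tokenize(text) if tok not in stopwords)


def keywords_in_context(text, keyword, window=3):
    """Return (left, keyword, right) triples for every hit of keyword,
    with up to `window` words of context on each side."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Invented sample text, not taken from any of the projects described here.
sample = (
    "The library opened in the morning. "
    "Readers visited the library and borrowed books from the library."
)

counts = word_counts(sample, stopwords={"the", "in", "and", "from"})
hits = keywords_in_context(sample, "library", window=2)

print(counts.most_common(1))
for left, kw, right in hits:
    print(f"{left} [{kw}] {right}")
```

On a real corpus the same two functions would be run over each digitized document, and a much longer stopword list would be needed before the counts say anything useful.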

Dr. Connolly's scholarly work also utilizes electronic texts and topic modeling, but with a very different application. The "What Middletown Read" project is a database and search engine built upon the circulation records of the Muncie Public Library (Muncie, IN) from 1891 to 1902. This relational database is broken down by demographics (gender, race, social class, etc.) and is completely searchable. Through this database, one can assess the kinds of people who were checking out specific books or authors. Borrowing patterns can be extracted and analyzed to determine which books and authors were most popular with each demographic. This information would be nearly impossible to assess manually. Topic modeling is used in this project to search for themes in popularly borrowed books to gain insight into what the reading experience might have been like for borrowers in the specified time period. The applications for this technology are nearly endless. Currently, it is being applied to digital audio and video by other researchers.
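The kind of demographic query described above can be sketched with a small in-memory relational database. This is only an illustration: the table names, columns, and rows below are hypothetical and are not the actual "What Middletown Read" schema or data.

```python
import sqlite3

# Hypothetical schema and invented rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patrons (id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE books   (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE loans   (patron_id INTEGER REFERENCES patrons(id),
                      book_id   INTEGER REFERENCES books(id));
INSERT INTO patrons VALUES (1, 'F'), (2, 'M'), (3, 'F');
INSERT INTO books   VALUES (1, 'Book A'), (2, 'Book B');
INSERT INTO loans   VALUES (1, 1), (3, 1), (2, 2), (1, 2), (2, 2);
""")

# Count loans per (gender, title) pair, most-borrowed titles first
# within each gender.
rows = conn.execute("""
SELECT p.gender, b.title, COUNT(*) AS n
FROM loans AS l
JOIN patrons AS p ON p.id = l.patron_id
JOIN books   AS b ON b.id = l.book_id
GROUP BY p.gender, b.title
ORDER BY p.gender, n DESC, b.title
""").fetchall()

for gender, title, n in rows:
    print(gender, title, n)
```

Grouping by a different patron column (for example, a hypothetical occupation or age field) would answer the same question for another demographic without changing the rest of the query.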

Dr. Connolly's “What Middletown Read” project was externally funded by the National Endowment for the Humanities, the Gladys Krieble Delmas Foundation, and Ball State University’s Office of the Provost. Dr. Seefeldt's projects at the University of Nebraska–Lincoln were externally funded as well.

[1] Dino Buzzetti and Jerome McGann, “Electronic Textual Editing: Critical Editing in a Digital Horizon,” in L. Burnard, K. O'Brien, and J. Unsworth, eds., Electronic Textual Editing (New York: Modern Language Association of America, 2006).
