The Project: Digital Humanities and Data Visualization

It is vitally important to acknowledge the inherent nature of the data contained in morning report books and the limits of what it can accurately be used to reconstruct or represent. The process of converting statistical data into meaningful visualizations is fraught with challenges and epistemological assumptions that have been identified and critically examined by digital humanities theorists. Most influentially, theorist Johanna Drucker cautions practitioners against losing sight of their professional training as humanists when converting hard numbers into digital visualizations. The quantitative data researchers uncover in statistical compilations is “constructed,” not “given,” she explains. She argues for referring to such material as “capta” instead of “data” in an attempt to re-conceptualize the very nature of information and its relationship to knowledge production.1 Not simply a pedantic reform, Drucker’s concern foregrounds the importance of “acknowledg[ing] the situated, partial, and constitutive character of knowledge production” in humanist study, and thus the importance of exploring new ways of data visualization that highlight these factors instead of relying on visualization tools (i.e., charts and graphs) from the “hard sciences.”2

Taking Drucker’s advice to heart when approaching morning report data means, in part, acknowledging the inherently constructed and artificial nature of the data itself. For example, although the Orderly Sergeant’s blank included categorical columns labeled “Present for Duty, Privates,” “Present, Extra and Daily Duty, Privates,” and “Present, Sick, Privates,” the resulting integers provide no insight into who the soldiers so counted were, what the nature of their duty was if on “extra duty,” or, if sick, with what illness they were afflicted (or even whether they were physically injured instead). It is, of course, equally possible that the Orderly miscounted or deliberately placed a favorite subordinate in the “Present for Duty” category even though the soldier was still at home overstaying his leave. Nor does the data account for changes in condition over the course of a given day (e.g., soldiers who became ill after the morning report was submitted or men wounded in action later that day). But for all of its limitations, the company- and regimental-level morning report data does provide considerably more detail than higher-level consolidated reports, making it the highest-definition source available to the historian. More problematic than shortcomings in detail is the reality that little consensus seems to have existed among orderlies or regimental adjutant officers as to what ought to constitute inclusion within any given category. Whereas one Orderly might have mercifully marked a soldier who was absent visiting a brother in an adjacent regiment as “Present for Duty,” another might have angrily accounted for the same individual as “Absent without Leave.” These very human decisions disappear when reduced to bare integers, but it is to qualitative sources that the historian must look to fill the gaps and breathe life into the data (or “capta,” if so inclined). Visualizations of statistical data can provide the structural foundation for the exploration of these historical organizations, but that structure must be built upon with traditional qualitative research.

The real thrust of Drucker’s concerns turns these considerations toward a reevaluation of how humanist scholars use visualizations to represent such constructed data. In sum, she explains, “an information visualization allows complex datasets to be perceived in an efficient manner, rendering patterns legible across sets of numbers whose relation is almost impossible to discern in spreadsheet form.”3 Indeed, the long strings of “7, 7, 7, 7, 8, 8, 6, 7…” that fill the columns of morning report books offer little in the way of insight in their raw form. Whereas the data was invaluable to the company commander, or even to the historian evaluating how many men were available for a given battle, considering and evaluating change over longer periods of time remains difficult. Moreover, as reports are consolidated across regiments, the numbers of men grow to an extent that strains the reader’s ability to imagine the scale of such organizations. Although envisioning twenty men standing generally in line is not an onerous cognitive task, envisioning the difference between 600 and 900 can be challenging. Even when plotted on a chart or graph, the abstract nature of the raw numbers restricts their easy translation back into groups of soldiers and makes it difficult to extract meaning at a single glance.

None of this is to say that charts and graphs have no place in humanistic data visualization or in The Digital Adjutant project. Indeed, because such media hold a hegemonic position within intellectual culture, they can play an important role in helping users track change over time within these organizations quickly and easily. For this reason, the backbone of the project’s data visualizations relies on interactive and manipulable charts of transcribed raw data from extant company- and regimental-level morning reports, along with corresponding graphs tracking change over time. Developed with multiple WordPress plugins and offline data manipulation and analysis software, these charts allow the user to sort the data by date and organization, and all graphs automatically adapt to the parameters the user selects. This feature gives the user a degree of flexibility in working with the data and fosters experimentation and individual research into these organizations, a chief aim of the project as a whole.

In the end, though, as Drucker explains, the creation of “accurate and appropriate representations of quantitative information is an art.”4 Discovering ways to represent and visualize statistical data requires a consideration of what such data represents and how its various meanings can best be portrayed to the researcher or reader. In this way, Drucker’s concerns are similar to those of theorists Julia Flanders and Fotis Jannidis, who consider the relationships between knowledge organization and “data modeling” within the digital humanities. Broadly defined, “data modeling” refers to the production of “formalized perspectives” of information or data that are “expressed in a way that makes it possible to gather specific information about the subject.”5 In other words, the process of modeling data is similar to that of data visualization insofar as both require the digital humanist to consider how to use quantitative data most effectively and accurately to represent (or model) a past “constructed” reality.

Historian John Theibault and others have considered these questions through a professional historiographical lens. Digital visualizations for historians, he explains, can offer both “a means of quickly identifying patterns in large datasets…which can open new lines of research and test qualitative assumptions” and “a way to enhance the presentation of arguments, moving beyond what it is possible to display in two dimensions on paper.”6 The Digital Adjutant seeks to accomplish both of these objectives. Interactive charts and graphs that allow the user to manipulate raw data achieve the first of Theibault’s considerations: the quick identification of temporal patterns within large datasets. While these graphs alone can “enhance the presentation of arguments,” they do not move “beyond what it is possible to display in two dimensions on paper.” Instead, The Digital Adjutant relies on a second arm of data visualization to achieve this end.

Together, Theibault and Drucker both outline what they consider to be “the key dimensions” of an effective data visualization.7 Drucker argues that all visualizations are based on a sequence of “parameterization (assigning a metric), quantification (counting or measuring what has been parameterized), and translating this captured, constructed information into a graphic.”8 In many cases, the first two stages were accomplished by the Orderly Sergeant himself. In others, the researcher is required to re-organize some of the data in order to increase its capacity to offer insight into broader trends within the organization. For example, in order to ascertain the number of soldiers within a given company actually engaged in active duty on a given day, it is important to combine those listed as “Present for Duty” with those listed as “On Extra or Daily Duty.” The last of Drucker’s stages, “translating this captured, constructed information into a graphic,” is the real challenge.9
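The re-organization described above, combining the two duty columns into a single active-duty figure, can be sketched in a few lines of Python. This is only an illustrative sketch and not the project's actual tooling: the field names, the `active_duty_strength` helper, the day labels, and the counts are all hypothetical placeholders rather than transcribed morning report data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorningReportRow:
    """One day's counts of privates for a single company.

    Field names are hypothetical stand-ins for the printed column
    headings; they are not taken from the project's dataset.
    """
    day: str
    present_for_duty: int
    extra_daily_duty: int
    present_sick: int


def active_duty_strength(row: MorningReportRow) -> int:
    """Combine the two duty columns into one active-duty count.

    Men reported sick are deliberately left out of the total.
    """
    return row.present_for_duty + row.extra_daily_duty


# Made-up counts for illustration only; not historical data.
rows = [
    MorningReportRow("day-1", present_for_duty=7, extra_daily_duty=2, present_sick=1),
    MorningReportRow("day-2", present_for_duty=8, extra_daily_duty=1, present_sick=1),
    MorningReportRow("day-3", present_for_duty=6, extra_daily_duty=2, present_sick=2),
]

# A derived series of this kind is what a change-over-time graph would plot.
series = [(row.day, active_duty_strength(row)) for row in rows]

for day, strength in series:
    print(f"{day}: {strength} men on active duty")
```

The derived figure remains “capta” in Drucker’s sense: the sum inherits every judgment the Orderly made when sorting men into the two original columns.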

Theibault’s prescriptions for effective visualizations match Drucker’s. Moreover, he argues that the “key dimensions” of such visualizations “are the density and the transparency of [their] information.” By “density,” he means “the sheer amount of useful information the visualization conveys,” whereas “transparency” refers to “the ease with which that information can be understood by the reader.”10 Because the numbers recorded within morning report books are meant to be statistical representations of men counted in accountability formations, it makes the most sense, in accordance with the “density” and “transparency” prescriptions of Theibault and the translation considerations of Drucker, to design visualizations that most accurately recreate these accountability formations. In short, the most effective data visualizations are visual, three-dimensional reconstructions of the formations of men that were the original source of the morning report data.

To accomplish this, The Digital Adjutant presents rudimentary three-dimensional digital models of accountability formations composed of symbolically modeled “soldiers” arrayed in regulation dress parade formation according to the exact number recorded in the company or regimental morning report book. The 3D formation models, designed in Tinkercad and manipulated with Sketchfab applications, are navigable and manipulable by users, complete with interactive annotations for each individual in the formation that allow the viewer both (1) to identify which model represents which soldier and (2) to maneuver the camera to the viewpoint of the individual in formation to see the formation from his perspective. This degree of accuracy is only possible because of the constraints of military regulations governing the structure of these formations. Because company formations were organized by rank and then (from right to left) by height, the exact position of each individual (with the assistance of “height” recordings for each soldier within the company “descriptive book”) can be more or less positively ascertained. Granted, there can be no certainty that a more informal process of taking roll call did not prevail in a given company on any given morning, but the recorded data is an expression of a formation that was supposed to exist according to regulation, and thus such a visualization expresses the original intent of the data.
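The positioning logic described above, ordering by rank and then by height from right to left, can be approximated with a simple two-key sort. The following Python sketch rests on several loudly stated assumptions: the `Soldier` fields, the numeric `rank_order` encoding, and the `assign_positions` helper are hypothetical; the tallest man is assumed to stand furthest right within each rank group; and the labels and heights are invented placeholders, not entries from any descriptive book. It illustrates the idea only and does not describe how the project's models were actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Soldier:
    label: str            # placeholder identifier, not a real name
    rank_order: int       # hypothetical encoding: lower values are placed first
    height_inches: float  # as a descriptive book might record it


def assign_positions(soldiers):
    """Return (position, label) pairs, with position 1 at the right of the line.

    Sorts by rank group first, then by height within each group.
    Assumption: the tallest man in a group stands furthest to the right.
    """
    ordered = sorted(soldiers, key=lambda s: (s.rank_order, -s.height_inches))
    return [(index + 1, s.label) for index, s in enumerate(ordered)]


# Invented example company; not historical data.
company = [
    Soldier("Soldier A", rank_order=2, height_inches=66.0),
    Soldier("Soldier B", rank_order=2, height_inches=70.5),
    Soldier("Soldier C", rank_order=1, height_inches=68.0),
    Soldier("Soldier D", rank_order=2, height_inches=68.25),
]

positions = assign_positions(company)

for position, label in positions:
    print(f"position {position} from the right: {label}")
```

Each resulting position could then be mapped to a coordinate in a 3D model and to an annotation identifying the soldier it represents. Men of identical rank and height keep their input order, which is one of many points where a researcher's judgment, rather than the record, decides the outcome.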


1 Drucker, “Graphical Approaches,” 244.
2 Johanna Drucker, “Humanities Approaches to Graphical Display,” Digital Humanities Quarterly 5 (2011), accessed October 5, 2016, http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html.
3 Drucker, “Graphical Approaches,” 239.
4 Ibid., 239.
5 Julia Flanders and Fotis Jannidis, “Data Modeling,” in A New Companion to Digital Humanities, ed. Susan Schreibman et al. (Chichester: John Wiley & Sons, Ltd., 2015), 229-230.
6 John Theibault, “Visualizations and Historical Arguments,” in Writing History in the Digital Age, ed. Kristen Nawrotzki and Jack Dougherty (Ann Arbor: University of Michigan Press, 2013), accessed October 4, 2016, http://dx.doi.org/10.3998/dh.12230987.0001.001.
7 Theibault, “Visualizations and Historical Arguments.”
8 Drucker, “Graphical Approaches,” 245.
9 Ibid.
10 Theibault, “Visualizations and Historical Arguments.”