Monday, April 13, 2015

A Short(ish) History of Digital Curation at the Bentley Historical Library

Our inaugural post introduced readers to the Bentley Historical Library and University of Michigan Library's "ArchivesSpace-Archivematica-DSpace Workflow Integration" project.  In this post, I'd like to give some background and context for the Bentley's involvement in this project.

For close to two decades, the Bentley Historical Library has actively collected, processed, preserved, and provided access to born-digital archives from the University of Michigan as well as from private individuals and organizations from around the state.  These experiences have provided a strong foundation for the planning and implementation work already underway with the grant.

Laying the Foundation (1997-2008)

As early as 1979, archivists in the Bentley's University Archives and Records Program (UARP) were discussing the challenges posed by new technologies and machine readable records.  A 1991 NHPRC grant, “Study on the Uses of Electronic Communication to Document an Academic Community,” provided an opportunity for the library to explore the topic in more depth.  It was not until 1997, however, that the Bentley received its first significant collection of born-digital archives: the Macintosh personal computer of former University of Michigan President James J. Duderstadt.

At the time, Electronic Records Archivist Nancy Deromedi developed a preservation strategy for the approximately 2,100 files in the accession that included running virus scans, documenting file and folder naming conventions, and migrating content from the original MORE 3.1 and Microsoft Word 6.0 file formats to the Word 97 (and later PDF/A) format.  Through the late 1990s and early 2000s, the Bentley continued to collect born-digital archives, and Deromedi later published accounts of her strategies in a series of SAA Campus Case Studies.
Deromedi also initiated a web archiving program at the Bentley in 2000, using desktop applications such as HTTrack and Teleport Pro to capture snapshots of the websites of key academic and administrative units and to document events such as the university's response to the Y2K bug and the Grutter v. Bollinger Supreme Court case on the use of affirmative action in Law School admission decisions.

These early efforts were instrumental in capturing historical and administrative records of long-term value, but each involved developing unique preservation strategies and relied upon heavily manual procedures.  As the university's production of electronic records with archival value increased, the library faced challenges of scalability and sustainability: the Bentley lacked in-house IT staff and extensive technical expertise, and Deromedi balanced numerous responsibilities in addition to her work with digital archives.

MeMail (2009-2011)

Given the above issues and UARP's interest in developing a more proactive approach to documenting the history of the modern university, the Bentley launched the “MeMail” Project (formally titled "Email Archiving at the University of Michigan") in 2009 to explore strategies to collect and preserve the email of key administrators.  Email was selected as the project's focus because (a) the archives was no longer receiving correspondence with the same volume and regularity as in earlier decades and (b) the myriad complexities posed by email (unique platforms, proprietary formats, relationships of attachments to messages and messages to threads of correspondence, etc.) would help the Bentley enhance its overall capacity to preserve and provide access to digital content of unique, essential, and enduring value. 

A generous two-year grant from the Andrew W. Mellon Foundation in January 2010 allowed UARP to partner with the university’s Information and Technology Services (ITS), bringing both archival and IT expertise to bear on digital curation.  The grant also enabled UARP to hire two full-time archivists to serve as the project’s functional and technical leads.  Working with ITS, project staff developed a system of 'archival mailboxes' that participating administrators used to collect email of long-term value (by dragging/dropping, forwarding, or CC'ing).  Having administrators conduct the appraisal and selection of their correspondence proved to be difficult due to the cumulative value of email threads and participants' concerns over third-party privacy.  The university's decision in December 2010 to adopt Google collaborative tools further complicated the project by making the 'archival mailbox' strategy impractical.  For more information on the lessons learned from these efforts, see Functional Lead Aprille McKay's two SAA campus case studies.

The planning, development, and implementation work associated with MeMail laid the foundations for the Bentley's current digital curation program.  As Technical Lead, I explored software, procedures, and workflows required to ingest and preserve email and attachments (Office files, images, audio, video, etc.).  I worked closely with McKay and others to identify rights and access issues associated with acquiring digital content and making it accessible and also developed policies and procedures to address sensitive personal information (SSNs, credit card numbers, etc.).  In addition, the project gave us an impetus to review and enhance our infrastructure: we acquired secure server space to store our backlog and conduct ingest procedures and also negotiated for expanded use of Deep Blue, the University of Michigan's DSpace repository (with another copy of material stored in a local dark archive managed by ITS).

Digital Curation Division (2011-2014)

One of the most valuable legacies of the MeMail Project was that it helped the Bentley document the needs and demands of administrative and academic units for the preservation of University of Michigan digital assets.  With this information, then-Director Fran Blouin successfully lobbied for the creation of a new Digital Curation Division (headed by Nancy Deromedi) and the addition of a permanent position (yours truly).  Based upon the research and extensive testing from earlier phases of MeMail, we defined functional and technical requirements for digital archives ingest and processing procedures appropriate for our local needs and resources.  This work permitted us to draft a workflow diagram (which has since been updated a number of times) and accompanying guidelines for the manual processing of born-digital materials.  Progress on this manual workflow was tracked on a checklist that included more than 40 discrete steps and required staff to operate some twenty applications and command line utilities, follow strict naming conventions for directories and log files, and generate or record preservation metadata by hand.  While effective, this approach was highly labor-intensive, posed challenges for training staff, and presented numerous opportunities for user error.

Hoping to overcome these constraints and enable more of our staff to work with digital content, I started to explore the possibility of automating workflow steps.  In setting out, I was particularly influenced by the Archivematica digital preservation system and its 'microservice' design, whereby a specific tool is implemented to perform a specific function (and may be swapped out or replaced by another without impacting the rest of the system).  After a successful proof of concept in automating our format migration procedures (a step that creates preservation copies of content based upon migration pathways that reflect professional standards and best practices), I set about revising other steps.  By early 2012, I had produced the AutomatedProcessor (or AutoPro), a collection of 33 Visual Basic and Windows CMD.EXE shell scripts that moved content through an 11-step workflow.
[Image: AutoPro splash screen]
Nancy Deromedi and I presented a poster on this work at the 2012 iPRES conference and I have continually refined and streamlined features in the intervening years to make procedures more efficient and user friendly. A comparison between earlier versions of the AutoPro user manual and our current procedures for digital processing reveals some alterations in the number and order of workflow steps and significant changes in the interface for adding descriptive and administrative metadata to content.  For more information on the basic AutoPro workflow and related procedures, see the overview in our manual.
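To make the microservice idea described above concrete, here is a minimal, hypothetical sketch in Python: a lookup table routes each file format to a single external conversion tool, so any one pathway can be swapped out without touching the rest of the workflow.  The specific pathways and tools shown (LibreOffice for Word documents, ffmpeg for audio) are illustrative assumptions, not the Bentley's actual migration table.

```python
from pathlib import Path

# Illustrative migration pathways (assumed, not the Bentley's actual table):
# file extension -> (target preservation format, conversion command prefix)
PATHWAYS = {
    ".doc": ("pdf", ["libreoffice", "--headless", "--convert-to", "pdf"]),
    ".wav": ("mp3", ["ffmpeg", "-i"]),
}

def choose_pathway(path):
    """Return (target_format, full_command) for a file, or None if no
    migration pathway applies.  The command is built but not executed,
    so each tool remains an independent, swappable 'microservice'."""
    suffix = Path(path).suffix.lower()
    if suffix not in PATHWAYS:
        return None
    target, tool = PATHWAYS[suffix]
    return target, tool + [str(path)]

# A .doc file routes to the (assumed) LibreOffice converter:
print(choose_pathway("minutes.doc"))
```

Because the dispatcher only selects and assembles the command, replacing one converter with another is a one-line change to the table, which is the property that made the microservice design attractive.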

Working Smarter (2014-)

Since its introduction, AutoPro has been used to prepare more than 230 accessions of digital content (approx. 1.2 TB) for deposit in our Deep Blue repository.  In helping us to address a growing backlog of digital archives in a standardized manner, the tool has been a smashing success.  At the same time, AutoPro was never intended to be a final solution for the Bentley: the command line interface is not particularly intuitive or user friendly, the CMD.EXE scripts have poor error-handling functionality, and maintaining and updating the scripts and software on individual workstations often takes an inordinate amount of time.  We also realized that we were entering the same descriptive and administrative metadata in numerous locations: once in our finding aids, again in our processing workflow (so that descriptions of content could be stored alongside materials in the Archival Information Package), and a third time when we manually uploaded material to the Deep Blue DSpace repository.

Given these complications and inefficiencies, Nancy Deromedi and I considered options for more than a year before deciding to explore integrating the functionality of Archivematica and ArchivesSpace into a single workflow and automating the deposit of material into DSpace.  While the idea of bringing together these systems (especially the former two) has been discussed in various circles for years, the Bentley was fortunate enough to secure grant funding to push development work forward.  I've already described our basic goals and strategy in our first post; in the next one, I'll discuss the challenges we've encountered and the progress we've made thus far.  Stay tuned!
