Friday, September 16, 2016

On Square Pegs, Round Holes, PREMIS Rights Statements and Apollo 13

As mentioned in a previous post on PREMIS and PREMIS Rights Statements, we've been exploring ways to create rights statements as we process SIPs in Archivematica and then use those rights statements to set access profiles for the AIPs in our DSpace repository.

At that point, our thinking was mostly theoretical. Since then, we've had some time to think about it, to confer with our MLibrary colleagues as well as those at the Rockefeller Archive Center and even to reflect on Ed Pinsent's comments on the last post (thanks, everyone!). In this post, I'd like to give an update on how we plan (yes, still just a plan--things could change!) to actually do it. Before I dive in, though, I should remind our readers that what we're proposing here is a bit like trying, as the expression goes, to fit a "square peg in a round hole." Here's a quote from the PREMIS Data Dictionary for Preservation Metadata:
PREMIS primarily defines characteristics of Rights and permissions concerned with preservation activities, not those associated with access and/or distribution.

Yikes. "Not those associated with access and/or distribution." hrm...

The Access Profiles

Let's start at the end. In DeepBlue, our DSpace repository, we have some amount of control over both a digital object--or, to use DSpace-speak, bitstream(s)--and its associated metadata--or item. We can associate each of them (independently of one another) with one of four (or actually as many as we care to create) of what are called groups. In practice, we apply a handful of common combinations of item and bitstream(s) groups when we deposit AIPs in our DSpace repository:
  • Open: Both items and bitstreams are open to be viewed/downloaded by anyone in the whole world.
  • Bentley Reading Room users: While items can be viewed by anyone, downloading of bitstreams must be done from within the Bentley's IP range. This can be from a wired or wireless connection.
  • University of Michigan users: Items can be viewed by anyone, but only University of Michigan affiliates may download bitstreams.
  • Totally restricted/embargoed items: Nobody (except Bentley archivists, who may also fulfill reference requests) can view or download anything, item or bitstream(s). Typically, these types of things are embargoed until a particular date (based on local policies), at which point in time both item and bitstream(s) will become open.[1]
  • Audio/visual items with copyright or other types of concerns: Items can be viewed by anyone, but only Bentley archivists can download bitstreams. So far, this profile has consisted mostly of audio/visual material that is preserved in DSpace but made available for streaming (not downloading) in the Bentley Digital Media Library.
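
To make those combinations concrete, here's a minimal sketch in Python of the five profiles as a lookup table. The profile keys and group labels ("anyone", "bentley_ip", "umich", "archivists") are illustrative placeholders, not the actual DeepBlue group names.

```python
from typing import NamedTuple


class AccessProfile(NamedTuple):
    item_group: str       # who can view the item (the metadata)
    bitstream_group: str  # who can download the bitstream(s)


# Hypothetical labels; the real DSpace group names will differ.
ACCESS_PROFILES = {
    "open": AccessProfile("anyone", "anyone"),
    "reading_room": AccessProfile("anyone", "bentley_ip"),
    "umich": AccessProfile("anyone", "umich"),
    "restricted": AccessProfile("archivists", "archivists"),
    "av_streaming": AccessProfile("anyone", "archivists"),
}


def can_download(profile_name: str, user_group: str) -> bool:
    """Return True if a user in user_group may download bitstreams."""
    allowed = ACCESS_PROFILES[profile_name].bitstream_group
    return allowed == "anyone" or user_group == allowed
```

For example, can_download("reading_room", "umich") is False under this sketch, while can_download("open", "umich") is True.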

A Quick Refresher on Act[ion]s in PREMIS Rights Statements

As a reminder, PREMIS Rights Statements are made up of one basis (the raison d'être of the rights statement, something like copyright or policy) and one or more actions associated with that basis (these are very specific actions the repository is or isn't allowed to do). Since the basis won't have an impact on its associated actions, I won't go into much detail about bases here.

Actions come from a controlled vocabulary, made up of things like:
  • replicate: make an exact copy
  • migrate: make a copy identical in content in a different file format
  • modify: make a version different in content
  • use: read without copying or modifying
  • disseminate: create a copy or version for use outside of the preservation repository
  • delete: remove from the repository

As you can see, these have a very "digital preservation" feel (in the most narrow sense of the word[2]). Hence the data dictionary's warning above.

Actions may be allowed in all cases or, of course, they may have restrictions. These express situations where, for instance, dissemination is permitted, but only to a specific type of person (say, one that's affiliated with your institution), or, taken to the extreme, that dissemination is not permitted, period. At least in Archivematica's case, you've got three choices to express such restrictions: allow, disallow or conditional. This may sound like it covers a lot, but as you'll see, we had to get a little creative with this, as we end up using "conditional" to describe a number of different conditions.

Other than that, there are some begin and end dates associated with that action, and a note containing a textual description of the right granted if additional description is needed.
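
Here's a rough sketch in Python of that structure: one basis, one or more granted actions, each with a restriction, optional dates and a note. The class and field names are our own shorthand for illustration, not the PREMIS element names or Archivematica's internal model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The three restriction choices Archivematica offers, as described above.
RESTRICTIONS = {"allow", "disallow", "conditional"}


@dataclass
class RightsGranted:
    act: str                          # e.g. "disseminate"
    restriction: str                  # "allow", "disallow" or "conditional"
    start_date: Optional[str] = None
    end_date: Optional[str] = None    # None stands in for an open end date
    note: str = ""                    # free-text description of the right

    def __post_init__(self):
        if self.restriction not in RESTRICTIONS:
            raise ValueError(f"unknown restriction: {self.restriction!r}")


@dataclass
class RightsStatement:
    basis: str                        # e.g. "copyright" or "policy"
    grants: List[RightsGranted] = field(default_factory=list)


# A hypothetical statement: dissemination allowed only under some condition.
statement = RightsStatement(
    basis="policy",
    grants=[RightsGranted("disseminate", "conditional", note="Reading Room")],
)
```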

Mapping PREMIS Rights Statements in Archivematica to DSpace Groups

SIP rights template--second page

Now on to mapping the PREMIS Rights Statements implementation in Archivematica to the groups in DSpace. There are a couple of different ways we might have approached this.

One way might have been to try to use an Act to tell the repository exactly what it was allowed to do with both the item and the bitstream for a particular AIP. While this approach would have given us the granularity we'd need for machine-actionable PREMIS Rights Statements, we worried that it would be overly cumbersome for our human processors, who would be, for the most part, manually adding them to SIPs and keying in the data.

Another way might have been to use a local controlled vocabulary for the Act field, something like "disseminate-bentley", "disseminate-umich", etc. However, associating a particular target with an action seemed, in the words of one of our DSpace gurus, "somewhat contrary to the spirit of the allowed actions" (see this sample controlled vocabulary for the 'act' element, some of which I listed above). You'll also notice if you click that link that "disseminate-bentley", "disseminate-umich", etc. are not on that list, and for good reason! We even thought briefly about using the Restriction field to specify the target audience before realizing that it too has a controlled vocabulary (one that's actually enforced by Archivematica and then used later on in some logic).

In the end, we settled on using the Note field to specify audience. Now, we know this isn't the most elegant solution--in general, notes are specifically intended not to be machine-actionable. But since this PREMIS Rights Statement will ultimately be preserved in the AIP (in the METS!), and since there's a chance someone might run across it outside of our repository environment, we felt this was the way to go.

So here's our plan, at least an overview:

DeepBlue Groups

Archivematica PREMIS Rights Statements





Restriction note

Reading Room only
Reading Room
University of Michigan only
University of Michigan
Archivists only
Archivists only
Archivists only
Executive records (ER)
Personnel records (PR)
Student records (SR)
Patient/client records (CR)

A couple of notes here:
  • We will not use PREMIS Rights Statements (at least those that apply to access/distribution) for AIPs that don't have restrictions.
  • When we do use PREMIS Rights Statements, they will be as minimal as we can make them with the intention that they will only be used by machines, not humans. Human-readable rights statements will be recorded elsewhere, like ArchivesSpace Conditions Governing Access and Use notes.
  • Most of the time, the End Date field will be OPEN, except when a Bentley policy is involved (ER, PR, SR and CR above). In those cases, an end date will let the repository know when a particular restriction expires.

Once an AIP with some sort of restriction is ready to go to DeepBlue, we'll park it somewhere temporarily[3], parse the METS file in the AIP, determine (based on the rights statements) the item and bitstream permissions, convert the AIP to the DSpace Simple Archive Format and upload it in batch to DeepBlue from there. It sounds like the identifier for the Digital Object in ArchivesSpace will be in the AIP, so we're pretty confident we'll also be able to add the Handle back to ArchivesSpace fairly easily.
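
The "parse the METS, determine the permissions" step might look something like the sketch below. It's only a sketch under some assumptions: that the rights statements in the METS use the PREMIS 2.x namespace and element names (rightsGranted, act, restriction, rightsGrantedNote), and that the note values and group labels are the hypothetical ones shown here. The sample XML is made up for the example and is far simpler than a real Archivematica METS file.

```python
import xml.etree.ElementTree as ET

# PREMIS 2.x namespace; assumed to be what the AIP's METS uses.
NS = {"premis": "info:lc/xmlns/premis-v2"}

# Hypothetical mapping from note text to (item group, bitstream group).
NOTE_TO_GROUPS = {
    "Reading Room": ("anyone", "bentley_ip"),
    "University of Michigan": ("anyone", "umich"),
    "Archivists only": ("anyone", "archivists"),
}
OPEN = ("anyone", "anyone")
CLOSED = ("archivists", "archivists")

# Made-up sample; a real METS nests this much deeper.
SAMPLE = """
<mets xmlns:premis="info:lc/xmlns/premis-v2">
  <premis:rightsStatement>
    <premis:rightsBasis>Policy</premis:rightsBasis>
    <premis:rightsGranted>
      <premis:act>disseminate</premis:act>
      <premis:restriction>Conditional</premis:restriction>
      <premis:rightsGrantedNote>Reading Room</premis:rightsGrantedNote>
    </premis:rightsGranted>
  </premis:rightsStatement>
</mets>
"""


def groups_from_mets(mets_xml: str):
    """Return (item_group, bitstream_group) for the first restricted
    'disseminate' grant found, or OPEN if there is none."""
    root = ET.fromstring(mets_xml)
    for grant in root.iterfind(".//premis:rightsGranted", NS):
        act = grant.findtext("premis:act", default="", namespaces=NS)
        restriction = grant.findtext(
            "premis:restriction", default="", namespaces=NS)
        note = grant.findtext(
            "premis:rightsGrantedNote", default="", namespaces=NS)
        if act.strip().lower() != "disseminate":
            continue
        if restriction.strip().lower() == "allow":
            continue
        # Notes we don't recognize fail closed rather than open.
        return NOTE_TO_GROUPS.get(note.strip(), CLOSED)
    return OPEN
```

Failing closed on an unrecognized note is a design choice in this sketch: a typo should never accidentally open up restricted material.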

We also think (hope!) that this approach, as long as we're consistent, would allow us to change our minds relatively easily in the future, say, if we decided after all that a more granular approach was the way to go.

But Wait! "disseminate" is Hard to Spell!

It occurred to us that in order for this approach to work, our processors can never make typos. We've all been there... this is a pretty unrealistic expectation.
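
As an aside, a typo like this is easy to catch after the fact as well. Here's a small sketch using Python's standard difflib module to flag a keyed-in act value that isn't in the vocabulary and suggest the closest match. The list holds the suggested act values from the PREMIS data dictionary; the function name and the 0.6 cutoff are arbitrary choices for this example.

```python
import difflib

# Suggested act values from the PREMIS data dictionary.
ACTS = ["replicate", "migrate", "modify", "use", "disseminate", "delete"]


def check_act(value: str):
    """Return (is_valid, suggestion) for a keyed-in act value."""
    cleaned = value.strip().lower()
    if cleaned in ACTS:
        return True, cleaned
    # Look for the single closest vocabulary term, if any is close enough.
    matches = difflib.get_close_matches(cleaned, ACTS, n=1, cutoff=0.6)
    return False, (matches[0] if matches else None)
```

Something like check_act("dissemminate") would flag the value as invalid and suggest "disseminate".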

For the time being, we're planning to use Greasemonkey (in Firefox) and Tampermonkey (in Chrome) to help us out with this particular problem. These are browser extensions that customize the way a web page displays or behaves using small bits of JavaScript.

We've written a fairly basic script (you can see our draft here) that looks for URL patterns that match the Add Act pages in Archivematica (as you can see in that script, */rights/grants/*/ and */rights/grants/*/). When it finds one, it adds an additional dropdown, like so...

It even has a nice logo!

When an option is chosen (Reading Room was chosen above), it automatically fills out the rest of the form, just like we need it. When a Bentley policy is involved (that requires an end date), it asks a processor for a creation or accession date (still working on a nice datepicker option for this), does some math, and calculates the appropriate end date. It's not the most elegant solution but we think it works for now!
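
The "does some math" part is simple enough to sketch. The actual script is JavaScript; the version below is Python, just to show the logic. The restriction periods are placeholder numbers to make the example runnable, NOT the actual Bentley policy values, and the function name is made up.

```python
from datetime import date

# Placeholder restriction periods in years. These are NOT the actual
# Bentley policy values, just numbers to make the sketch runnable.
RESTRICTION_YEARS = {"ER": 20, "PR": 30, "SR": 75, "CR": 100}


def restriction_end_date(policy: str, start: date) -> date:
    """Return the date on which a policy-based restriction expires."""
    years = RESTRICTION_YEARS[policy]
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # start was Feb 29 and the target year is not a leap year
        return start.replace(year=start.year + years, day=28)
```

With these placeholder numbers, an "ER" restriction starting on September 16, 2016 would end on September 16, 2036.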


In the end, it's perhaps a little clearer as to why PREMIS wasn't really meant for this kind of thing. Still, maybe square pegs sometimes do fit into round holes...

Seriously, though, let us know what you think!

[1] Although these types of things are not viewable, downloadable or even searchable in DSpace, typically we still provide a link to them in the collection's finding aid.
[2] Philosophically, I'd argue that access and distribution is a fundamental part of digital preservation... maybe the most fundamental part.
[3] At the end of the grant, AIPs without restrictions will be automatically uploaded to DSpace and recorded in ArchivesSpace without any more human intervention!

Tuesday, September 6, 2016

This One Time, At ArchivematiCamp...

While it's been a bit over a week since the inaugural ArchivematiCamp (or, as my colleagues Max and Dallas prefer, "Archivematica Camp") was held here in Ann Arbor, we're still basking in the afterglow... 36 campers and 5 counselors braved the rain and mosquitoes to gather at the University of Michigan's School of Information for two and a half days of discussions on microservices, metadata, and the mechanics of our favorite digital preservation system.  The camp's full agenda will give you some idea of the variety of topics covered in the 'Curator' and 'Technologist' streams—or maybe you were following on Twitter:

While I would be hard-pressed to summarize all the events and discussions, I did want to talk a little bit about Dallas and Max's demonstration of the new Appraisal Tab functionality we've developed as part of our grant project (and which is slated for release in version 1.6 of Archivematica). In the Q and A period following the demo, counselors Ben Fino-Radin and Kari Smith helped kickstart a conversation about how the functionality in the Appraisal Tab could be complemented and supplemented by additional external tools/platforms.

As one example, Ben noted that his work with audiovisual materials requires advanced technical metadata extraction and codec characterization that has not always been available in Archivematica.  (As I understand from my notes, the MediaTrace report produced through a collaboration between MoMA and MediaArea is now available in Archivematica.)

Kari brought up the possibility of integrating an email processing tool like ePADD into a workflow that also involves Archivematica.  Given the unique functionality (and awesome interface) of this platform, it doesn't really make sense to replicate it in Archivematica or to cram another full-featured external tool into the Appraisal Tab.

Instead, as we discussed in our previous post on the Archivematica Users' Group meeting at SAA, we should look at ways of establishing/facilitating 'handshakes' between platforms so that the data and any associated metadata (especially preservation or technical) can be passed along and incorporated into the Archivematica METS or maybe even acted upon by Archivematica.  For instance, if you ran bulk_extractor on a disk image in the BitCurator environment, it would be nice to reuse those scanner reports in Archivematica instead of having to run them again.

We're really excited that other members of the archives and digital preservation communities are thinking about how the work we've done with the Appraisal Tab can be adapted or extended to satisfy local needs and workflows! In the same spirit, Kari also asked if DIPs could be produced and likewise tied back to ArchivesSpace (yes, by extending the code!) and Ben (was this Fino-Radin or Goldman?  I'm leaning towards the latter...) asked about the possibility of creating ArchivesSpace event records based upon actions in Archivematica (totally feasible--just need some coding!).

We're hoping to blog a bit more about camp in some upcoming posts, so I'll wrap things up here by noting that my only regret from camp was the absence of the long-promised 'goodbye song':

Thank heavens the Internet can fix anything!