PREMIS is the de facto standard for preservation metadata. The standard, which is hosted by the Library of Congress and maintained by the PREMIS Editorial Board, is currently at version 2.2 of its Data Dictionary. It models the characteristics of archival entities (intellectual entities) within a repository, and the actions performed on them, in terms of agents, events, objects and rights, together with sets of relationships between these main entities.
The PREMIS Implementation Fair is a yearly event held in conjunction with iPRES, the largest digital preservation conference. It brings together PREMIS implementers and the PREMIS Editorial Board in a joint effort to steer the development of the open standard for preservation metadata. This year’s event was held as part of iPRES 2014 at the State Library of Victoria in Melbourne, Australia, on October 6th, 2014. It brought together an international group of about 25 participants whose institutions’ digital preservation repositories are either already using PREMIS or planning to implement it.
The DURAARK project seized this opportunity to talk about its plans to implement PREMIS within the DURAARK workbench and about the questions that have come up during the implementation efforts. In particular, three issues were presented:
- The PREMIS Data Dictionary defines preservation metadata as “the information a repository uses to support the digital preservation process”. While PREMIS is not entirely repository-centric by definition, known implementations are usually very specific to the needs of the repository in question. The DURAARK workbench, however, is meant to be a service to a wide variety of repositories. As such, it is meant to run outside a repository without any knowledge of the system it will pass the information to.
The question here is whether the DURAARK workbench should describe itself only as an agent to a repository or as a stand-alone ecosystem in its own right.
- As the DURAARK workbench is a pre-ingest system which covers multiple tasks and wraps separate tools, e.g., for file format identification, metadata extraction and semantic enrichment, the status of a single agent would hardly cover all preservation metadata output of the pre-ingest workbench. Pre-ingest workflows are comparatively new to the digital preservation domain. Earlier efforts focused mainly on the needs of the organization or repository responsible for the long-term stewardship of objects, and questions around earlier processes have arisen only recently. Because of this, pre-ingest dependencies and implications are not explicitly covered in standards like OAIS or PREMIS. The question is how information about the external pre-ingest service can be described meaningfully to the repositories and what level of granularity is called for.
- The last point raised by the project pertains to the way in which the Intellectual Entity is handled. Put plainly, an intellectual entity is “the thing we want to archive”. According to the PREMIS Data Dictionary, intellectual entities can be nested, meaning that one intellectual entity can contain further intellectual entities. However, no reference implementation is known. Within DURAARK, “a building / structure” is considered an intellectual entity. Representations of the entity, such as a point-cloud scan from September 2014 and an IFC plan from August 2013, always stand in temporal / spatial relationships to one another, as the building changes for various reasons or as a representation may describe only a part of the building. They are in some ways different representations of the intellectual entity “building”, but they may differ significantly in content, which makes them intellectual entities in themselves.
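The nesting described in the last point can be pictured with a minimal Python sketch. It is purely illustrative and is not the PREMIS schema or a DURAARK implementation: the class name, field names and identifiers are all hypothetical, and the only details taken from the text are the building, the IFC plan from August 2013 and the point-cloud scan from September 2014.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class IntellectualEntity:
    """Hypothetical stand-in for an intellectual entity (not the PREMIS schema)."""
    identifier: str                 # made-up identifier, for illustration only
    description: str
    captured: Optional[str] = None  # year-month the representation was created
    parts: List["IntellectualEntity"] = field(default_factory=list)

    def add(self, child: "IntellectualEntity") -> "IntellectualEntity":
        """Nest another intellectual entity inside this one."""
        self.parts.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "IntellectualEntity"]]:
        """Yield (depth, entity) pairs, parent first, then nested entities."""
        yield depth, self
        for part in self.parts:
            yield from part.walk(depth + 1)


# The building is the top-level entity; each representation is nested
# beneath it as an intellectual entity in its own right.
building = IntellectualEntity("ie-building", "a building / structure")
building.add(IntellectualEntity("ie-ifc-2013-08", "IFC plan", captured="2013-08"))
building.add(IntellectualEntity("ie-scan-2014-09", "point-cloud scan", captured="2014-09"))

for depth, entity in building.walk():
    print("  " * depth + entity.identifier)
```

Keeping the capture date on each nested entity is one possible way to retain the temporal relationship between representations; spatial relationships (a scan covering only part of the building) would need an additional field that this sketch leaves out.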
Discussion with the editorial board representatives during the implementation fair put forth the possibility of using the “environment” entity, which will be newly introduced in the forthcoming PREMIS v3, as a good place to describe the DURAARK workbench. While the existing use cases for the “environment” entity include rendering environments such as library reading rooms or emulation environments, the DURAARK project has now contributed a new use case in the form of the pre-ingest workbench. The entity will allow for a detailed description of the different agents involved in the workbench’s processes and will enable the pre-ingest workbench to capture information at different levels of granularity.
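As a rough illustration of what "different levels of granularity" could mean here, the following Python sketch groups several tool-level agents under one environment and lets a caller choose a coarse or a detailed description. This is an assumption-laden sketch, not PREMIS v3 syntax: the `Agent` and `Environment` classes, the `describe` method and the tool names are hypothetical, and only the three tasks come from the list of workbench tasks mentioned earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """Hypothetical record for one tool wrapped by the workbench."""
    name: str  # made-up tool name
    task: str  # task the tool performs


@dataclass
class Environment:
    """Hypothetical environment grouping the agents of a pre-ingest system."""
    name: str
    agents: List[Agent] = field(default_factory=list)

    def describe(self, detailed: bool = False) -> List[str]:
        """Return one line for the whole environment, or one line per agent."""
        if not detailed:
            return [self.name]
        return [f"{self.name} / {a.name}: {a.task}" for a in self.agents]


workbench = Environment(
    "DURAARK workbench",
    [
        Agent("format-identifier", "file format identification"),
        Agent("metadata-extractor", "metadata extraction"),
        Agent("semantic-enricher", "semantic enrichment"),
    ],
)

print(workbench.describe())               # coarse: the workbench as a whole
for line in workbench.describe(detailed=True):
    print(line)                           # fine: one line per wrapped tool
```

A receiving repository could then record either the single coarse line or the per-tool lines, depending on how much provenance detail it wants to keep.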
While it was confirmed that nested intellectual entity structures are possible by design, no reference implementation was found, and the search for one continues.
The slides of Michelle Lindlar’s presentation at the PREMIS Implementation Fair are available here: