Designing research repositories using automated workflows and machine actionable data management plans / by Asztrik Bakos
Keywords (EN): digital preservation / digital repository / machine actionability / OAIS / workflow / data management / data management plan / BPMN / Alfresco / Archivematica
Abstract (English)

Using a digital repository is an effective way to share research results. The task is not only to publish, but also to provide clear information on metadata, provenance and licenses. Repositories facilitate the reuse of published scientific material, and digital preservation techniques enable long-term access to the stored data. A repository requires a data management plan, which describes how the data is to be maintained. Researchers, however, may perceive uploading research material as yet another bureaucratic step. They tend to deposit data at a very late stage of the research project, when some of the earlier outputs are no longer available, so the uploaded metadata and provenance information is incomplete. Depositing research results requires knowledge of digital preservation and assistance with the technical infrastructure, both of which cost time and effort.

The aim of this work is to offer a method that automates preservation as much as possible and lets researchers concentrate on the scientific aspects of a project. To achieve that, we analyzed how research data management policies influence data management plans and proposed a template that makes the plans machine-actionable. We built an executable workflow model for the data ingest processes using Business Process Model and Notation (BPMN) and set up a working demonstration on an Alfresco server. We also extended Alfresco with a plugin that can run arbitrary preservation tools. As a basis for comparison, we configured an Archivematica instance, a classical repository that implements the OAIS reference model with a fixed preservation workflow. By comparing Alfresco and Archivematica, we showed that Alfresco not only preserves the files exactly as Archivematica does, but can also use a more complex preservation workflow.
We concluded that properly depositing research files, in accordance with the data management plan, is possible during the project with minimal effort from the researchers. User interaction can be reduced to uploading the files and starting the workflow; the rest of the preservation is carried out safely and silently in the background.

