Day 3, part I: Machine Actionable Data Management Plans at the University of Queensland
I spent the third and fourth days of my trip together with Danny Kingsley, Scholarly Communication Consultant and former head of Scholarly Communication at the University of Cambridge (my former boss). Together we met with Fei Yu, Jan Wisgerhof and Kathleen Smeaton from the University of Queensland.
Machine Actionable DMPs: theory and practice
In Europe, there is now a lot of discussion about machine-actionable data management plans (maDMPs). In the European context, traditional DMPs were created mostly as a result of funders’ requirements. Funders wished to have assurance that researchers would manage their research data responsibly. Typically, the information in data management plans was not structured and was rarely re-used. The goal of machine-actionable DMPs is to make the information recorded in DMPs structured and actionable by machines. For example, if a researcher needed a certain amount of data storage space, the appropriate storage request would be made straight from the DMP. Under the auspices of the Research Data Alliance, a lot of important theoretical work has already been done to agree on a data model for maDMPs. However, at least in the European context, there has not yet been a fully functional implementation of the maDMP concept (or at least I am not aware of one).
It was therefore very interesting for me to visit the University of Queensland, where colleagues from the Library’s research data management team have developed a dedicated tool, the Research Data Manager. The tool, while it is not called a DMP tool, is a beautifully working and functional implementation of maDMPs.
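To make the idea of a storage request coming straight from the DMP a bit more concrete, here is a minimal Python sketch. It is neither the Research Data Alliance data model nor the Queensland implementation: the DatasetPlan fields, the storage_request_gb function and the 20% headroom are all assumptions made up for illustration. The point is only that once the answers are structured, a machine can compute with them.

```python
from dataclasses import dataclass


@dataclass
class DatasetPlan:
    """One structured entry in a DMP (hypothetical fields, for illustration only)."""
    title: str
    expected_size_gb: int
    contains_personal_data: bool


def storage_request_gb(datasets, headroom_percent=20):
    """Derive a storage request from the plan: total expected size plus headroom.

    Integer arithmetic with ceiling division avoids floating-point rounding.
    """
    total = sum(d.expected_size_gb for d in datasets)
    scaled = total * (100 + headroom_percent)
    return -(-scaled // 100)  # ceiling division by 100


# Example plan with two datasets (made-up values).
plan = [
    DatasetPlan("Interview recordings", 400, True),
    DatasetPlan("Survey responses", 100, False),
]
print(storage_request_gb(plan))
```

With a free-text DMP, somebody would have to read the plan and file the storage request by hand; with structured fields the request can be generated automatically.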
So how did it all start?
Back in 2015, the research data team spoke with some University of Queensland researchers and asked them about the research data underlying their published papers. In most cases, the data were either no longer findable or very difficult to find.
Therefore, the research data team created a tool intended to capture an initial, basic metadata layer about research projects and research data created by university researchers. That’s how the Research Data Manager tool (RDM tool) came to life.
The tool went through several iterations. Initially, it started as something similar to DMPs in the European context: researchers were asked to describe their plans for data management at the beginning of their projects.
This approach, however, wasn’t very popular among researchers. They weren’t motivated to respond to long questions about their data management strategy, where a lot of information had to be copied from somewhere else, and they didn’t see the real benefit of doing this.
To respond to this feedback, colleagues from the RDM team made a lot of changes in the tool to make it more useful for researchers:
- Substantially limited the number of questions asked in a DMP
- Changed all the open text fields into lookup, multiple-choice or checkbox questions, in order to allow for structured responses
- Used the structured responses to build integrations, which brought actionability to DMPs and provided tangible benefits to researchers.
So what does the Research Data Manager Tool do?
The tool has 20 very easy-to-answer data management questions (all are lookup fields, checkboxes or radio buttons). By answering these questions, researchers get 1 TB of free storage (capacity can be extended through the tool), which is backed up and maintained by the University. The game-changer was that the storage requested through the tool allowed researchers to collaborate online easily with other researchers (authentication through eduGAIN allows easy collaboration with people from 300+ universities worldwide).
As soon as the researcher responds to the questions, the request for project storage space is immediately pushed to their supervisor for approval, and subsequently, a dedicated project space is created. Altogether, it only takes about 15 minutes for researchers to receive their allocated storage.
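The flow described above (answers submitted, supervisor approves, project space created) can be sketched as a small state machine. This is only my own illustration of the workflow, not the actual code of the RDM tool: the StorageRequest class, the Status values and the approve and provision functions are hypothetical names, and only the 1 TB default comes from what we were told.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    PROVISIONED = "provisioned"


@dataclass
class StorageRequest:
    """A storage request generated from the DMP answers (hypothetical model)."""
    researcher: str
    supervisor: str
    quota_tb: int = 1  # default allocation mentioned in the text
    status: Status = Status.SUBMITTED


def approve(request, approver):
    """Only the named supervisor may approve a submitted request."""
    if approver != request.supervisor:
        raise PermissionError("only the supervisor can approve this request")
    if request.status is not Status.SUBMITTED:
        raise ValueError("request is not awaiting approval")
    request.status = Status.APPROVED
    return request


def provision(request):
    """Create the project space once the request has been approved."""
    if request.status is not Status.APPROVED:
        raise ValueError("request has not been approved")
    request.status = Status.PROVISIONED
    return f"{request.researcher}-project: {request.quota_tb} TB"


request = StorageRequest(researcher="student", supervisor="supervisor")
approve(request, "supervisor")
print(provision(request))
```

Keeping the approval as an explicit step is what gives supervisors visibility: no project space exists that they have not seen.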
It is all in integrations
The key principles behind the tool are simplicity and integrations. I was impressed to see how many integrations the tool already had in place. In addition to integrations with storage and authentication systems, the tool also has a direct connection with the university finance and ethics application systems. This means that when a researcher logs in to the tool and indicates that their project is externally funded, they can look up their project information from the finance system and auto-populate the relevant fields in the RDM tool. Similarly, if a researcher indicates that they will be working with personal research data, some additional questions appear, including a question about ethics approval. But again, information does not have to be duplicated, because the ethics tool and the RDM tool are connected. If a researcher has already started an ethics application, they can look it up in the RDM tool instead of copy-pasting the content.
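The two behaviours described above, pre-filling fields from another system and showing follow-up questions only when they are relevant, can be sketched in a few lines of Python. Everything here is an assumption for illustration: the in-memory FINANCE_RECORDS dictionary stands in for the finance system, and the field names, grant identifier and function names are hypothetical, not taken from the RDM tool.

```python
# Stand-in for the university finance system (hypothetical data).
FINANCE_RECORDS = {
    "GRANT-001": {"project_title": "Example project", "funder": "Example funder"},
}


def prefill_from_finance(form, grant_id, finance_records):
    """Return a copy of the form with fields filled in from the finance record.

    Values the researcher has already entered are left untouched.
    """
    filled = dict(form)
    record = finance_records.get(grant_id, {})
    for field, value in record.items():
        filled.setdefault(field, value)
    return filled


def follow_up_questions(form):
    """Ask about ethics approval only if personal data is involved."""
    questions = []
    if form.get("personal_data"):
        questions.append("ethics_application_id")
    return questions


form = {"externally_funded": True, "personal_data": True}
form = prefill_from_finance(form, "GRANT-001", FINANCE_RECORDS)
print(form["funder"])
print(follow_up_questions(form))
```

The researcher only picks the grant from a lookup; the title and funder then arrive from the system that already holds them, which is exactly why the open text fields had to go.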
Digital research notebooks
Another interesting feature of the tool is that it prompts researchers to use digital research notebooks (aka electronic lab notebooks, or ELNs). In the form, researchers are asked if they would like to use a digital lab notebook for their project. If they tick the box, an account is created for them with LabArchives, which is the institutional digital research notebook product.
At TU Delft we are currently piloting ELNs and it seems that researchers from various disciplines have different requirements for an ELN. Therefore, I was curious to know whether the one-size-fits-all approach was a problem for researchers at the University of Queensland. “It is used across all disciplines. Various researchers use it in different ways. In arts and humanities, researchers simply use it as a digital replacement of their paper notebooks or sketchbooks with the advantage that they can access it anywhere on their laptop or mobile, and are fully backed up” – explained Fei.
Jan told us about the latest developments in the tool. One of the biggest successes of the team was the integration between the RDM tool and the institutional eSPACE repository. This allows researchers to easily publish selected datasets in the university repository and get a DOI for them, without the need to populate all the metadata fields, because these are auto-populated based on the information in the RDM tool. During our visit, the RDM team was just celebrating the first dataset published in the repository through the integration (there are of course more datasets hosted in the repository, which were published earlier through the direct deposit route).
The two new developments that the team is currently working on are integrations with the thesis submission system and with popular scientific instruments. The integration with the thesis submission system means not only that theses could be automatically uploaded to the university repository, but also that students will be asked to publish their data at the same time. The integration with instruments allows instrument data to be added directly to the RDM tool, with baseline metadata, which makes the data flow much easier for researchers to manage and also allows for easy data publication. In addition, this could enable facility managers to get statistics on the usage of tools and facilities.
All data and metadata in the RDM tool are version controlled. In addition, researchers can easily export and submit their DMPs from the system to funding bodies (although Australian funders don’t have strict requirements for DMPs).
The usefulness of the tool and the tangible benefits of transforming it into a ‘maDMP’ meant that researchers didn’t need to be convinced to start using it. In the first 10 months, 1000+ researchers started using the tool. At the moment it has 10,000 users. “People create a DMP and see the immediate benefits, without even knowing this is a DMP” – Fei explained.
In addition, seeing the usefulness of the tool, the graduate school made it mandatory for all PhD students. This made PhD supervisors very happy. Many of them were worried that PhD students would leave the University without leaving their data behind, or would leave the data in poor order. Because every student request needs to be approved by the supervisor, supervisors are now aware of where and how students store their research data and have gained better oversight of data management practices in their research groups.
An unintended benefit of these integrations was also much closer cooperation with other university services. Thanks to the joint work on the RDM tool, colleagues from other departments now all see how good data management practices are embedded within their workflows: from grant applications, through ethics approval and finishing with publication.
Back to TU Delft
Our DMP template at TU Delft has a lot of questions with simple, multiple-choice responses. However, we do not yet have integrations in place with various university tools and systems. Visiting colleagues from the University of Queensland was therefore very inspirational and, well, we have a lot of work still to do at TU Delft to transform our DMPs into maDMPs. The work done by Fei, Jan and Kathleen certainly provided us with lots of useful lessons learnt and examples we could try to adapt in our institutional setting.