Research data should be available long-term… but who is going to pay?
Written by Marta Teperek and Alastair Dunning
There are lots of drivers pushing for preservation of research data long-term and to make them Findable Accessible Interoperable are Re-usable. There is a consensus that sharing and preserving data makes research more efficient (no need to generate same data all over again), more innovative (data re-use across disciplines) and more reproducible (data supporting research findings are available for scrutiny and validation). Consequently, most funding bodies require that research data are stored, preserved and made available for at least 10 years.
For example, the European Commission requires that projects “develop a Data Management Plan (DMP), in which they will specify what data will be open: detailing what data the project will generate, whether and how it will be exploited or made accessible for verification and re-use, and how it will be curated and preserved.”
But who should pay for that long-term data storage and preservation?
Given that most funding bodies now require that research data is preserved and made available long-term, it is perhaps natural to think that funding bodies would financially support researchers in meeting these new requirements. Coming back to the previous example, the funding guide for the European Commission’s Horizon 2020 funding programme says that “costs associated with open access to research data, including the creation of the data management plan, can be claimed as eligible costs of any Horizon 2020 grant.”
So one would think that the problem is solved and that funding for making data available long-term can be obtained. But then… why would we be writing this blog post?… As is usually the case, the devil is in the detail. European Commission’s financial rules require that grant money can only be spent during the timeline of the project (and only for the duration of the project).
Naturally, long-term preservation of research data occurs only after datasets have been created and curated, and most of the time only starts at the time when the project finishes. In other words, the costs of long-term data preservation are not eligible costs on grants funded by the European Commission*.
Importantly, the European Commission’s funding is just an example. Most funding bodies do not consider the costs of long-term data curation as eligible costs on grants. In fact, the author is not aware of any funding body which would consider these costs eligible**.
So what’s the solution?
Funding bodies suggest that long-term data preservation should be offered to researchers as one of the standard institutional support services. The costs of these should be recovered within overhead/indirect funding allocation on grant applications. Grants from the European Commission have a flat 25% rate overhead allocation. Which is already generous compared with some other funding bodies which do not allow any overhead cost allocation at all. The problem is that at larger, research-intensive institutions, the overhead costs are at around 50% of the original grant value.
This means that for every 1 mln Euro which researchers receive to spend on their research projects, research institutions need to find an extra 0.5 mln Euro from elsewhere to support these projects (facilities costs, administration support, IT support, etc.). Therefore, given that institutions are already not recovering their full economic costs from research grants, it is difficult to imagine how the new requirements for long-term data preservation can be absorbed within the existing overhead/indirect costs stream.
The problems described above are not new. In fact, these were discussed with funding bodies already on several occasions (see here and here for some examples). But not much has changed so far. There were no new streams of money made available: nor through direct grant funding, nor through increased overhead caps for institutions providing long-term preservation services for research data.
Meantime, researchers (those creating large datasets in particular) continue to struggle to find financial support for long-term preservation and curation of their research data, as nicely illustrated in a blog post by our colleagues at Cambridge.
Since the discussions with funding bodies held by individual institutions did not seem to be fruitful, perhaps the time has come for some joint up national (or international) efforts. Could this be an interesting new project to tackle by the Dutch National Coordination Point Research Data Management (LCRDM)?
* – Some suggest that the costs are eligible if the invoices for long-term data preservation are paid during the lifetime of the project. However, this is only true if the invoice itself does not specify that the costs are for long-term preservation (i.e. says that the invoice is simply for ‘storage charges’, without indicating the long-term aspects of it). Which only confirms the fact that funders are not willing to pay for long-term preservation and forces some to use more creative tactics and measure to finance long-term preservation.
** – Two funding bodies in the UK, NERC (Natural Environment Research Council) and ESRC (Economic and Social Research Council), pay for the costs of long-term data preservation by financing their own data archives (NERC Data Centres and the UK Data Service, respectively) where the grantees are required to deposit any data resulting from the awarded funding.
Maybe data archives (RDNL) could have a role in the discussion with funding bodies as well.
What about a solution where you would pay for premium access and deposition of data during the grant, while the cost of access would actually be calculated to allow the preservation of the database for these 10 years ? Another “creative tactics and measure to finance long-term preservation”, I presume. (access at lower speed would be allowed for free to keep the open data standard).
I take for granted that the actual price of data storage is not so much the hardware, but paying the people making sure the data is stored safely and the infrastructure to make it accessible, right? That would make such a business model quite realistic ???
The solution is to treat data as a publication.
APC for OA includes costs for long term preservation of the paper, which os effectively pating for costs post-grant.