New Landscapes on the Road of Open Science: 6 key issues to address for research data management in the Netherlands
Marta Teperek, Wilma van Wezenbeek, Han Heijmans, Alastair Dunning
The road to Open Science is not a short one. As the chairman of the Executive Board of the European Open Science Cloud, Karel Luyben, is keen to point out, it will take at least 10 or 15 years of travel until we reach a point where Open Science is simply absorbed into ordinary, everyday science.
Within the Netherlands, and for research data in particular, we have made many strides towards that final point. We have knowledge networks such as LCRDM, a suite of archives covered by the Research Data Netherlands umbrella, and the groundbreaking work done by the Dutch Techcentre for Life Sciences.
But there is still much travel to be done; many new landscapes to be traversed. Data sharing is still far from being the norm (see here for a visualisation).
The authors of this blog post have put together six areas that, in their opinion, deserve attention on our Open Science journey.
1. Cultural and Technical Infrastructure for Confidential Data
A recent workshop on data privacy ended with a doctor stating that “all data is personal”. This goes too far – much technical data is free from any personal details. Nevertheless, there are many reasons to see personal data everywhere: the increasing quantity of interdisciplinary work that makes use of sensor or social media data; legal mechanisms such as the GDPR; the growing possibilities for retrospective de-anonymisation; and the accumulation and analysis of personal data via machine learning. Increasingly, researchers need sophisticated mechanisms for sharing and publishing data based on humans.
And it’s not just personal data. Increasing engagement with third parties (at TU Delft roughly a third of all research funding is with commercial partners) means that we need to consider how best to safeguard data with a commercial aspect. We need an infrastructure for sharing commercial data with our industrial partners and protecting potentially economically valuable resources from bad actors.
The amount of work (tools and services, advice, standards) to be done is huge. We need:
- trusted infrastructures for sharing data between universities, medical centres, research units and commercial entities; similar infrastructures for publishing personal data (with different access levels)
- a national network of disciplinary access committees who can approve requests for access to restricted data; and perhaps a national body that can act as an access point for researchers for sensitive data from third parties (e.g. similar to the role the CBS has for government statistical data)
- a national consent service for handling and accessing consent forms
- national advice (or even specific tools) for anonymising data
- nationally agreed terms for data access (perhaps a colour coded system from green for open access to black for closed archive)
- a network of trainers and research data supporters across the country who can guide and advise researchers tiptoeing down the path of personal data
- agreed principles by which higher education and private companies should abide when co-creating research outputs (articles, data etc)
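To make the colour-coded idea above concrete, here is a minimal sketch of how such a scheme might be modelled. The tier names and rules are hypothetical illustrations, not an agreed national standard:

```python
# Hypothetical sketch of a colour-coded data access scheme.
# Tier names and rules are illustrative only, not an agreed standard.
ACCESS_TIERS = {
    "green":  {"description": "open access",               "requires_approval": False},
    "yellow": {"description": "registered access",         "requires_approval": False},
    "orange": {"description": "committee-approved access", "requires_approval": True},
    "black":  {"description": "closed archive",            "requires_approval": True},
}

def may_download(tier: str, approved: bool = False) -> bool:
    """Return True if a dataset in this tier may be released to the requester."""
    if tier == "black":                        # closed archive: never released
        return False
    if ACCESS_TIERS[tier]["requires_approval"]:
        return approved                        # needs an access committee's sign-off
    return True

# Usage: an unapproved request for restricted data is refused,
# an approved one goes through, open data always goes through.
open_ok = may_download("green")
restricted_refused = may_download("orange")
restricted_ok = may_download("orange", approved=True)
```

The point of such an agreed vocabulary is that every repository and access committee would interpret “orange” or “black” identically, rather than each institution inventing its own labels.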
In many cases, individual research organisations are developing their own solutions, and these issues are being partially discussed within LCRDM groups. But these are generally exploratory discussions. To create a systematic infrastructure (of both digital tools and human expertise), we need a clear plan and a broad nation-wide coalition of partners, all of whom have clearly defined roles and responsibilities. And of course we need to embed this in the wider international context.
2. Encouragement for discipline-specific guidance and standards
Early analysis of the usage of the FAIR principles focussed on how FAIR repositories are. How FAIR were DANS, 4TU.ResearchData or subject-based repositories?
But the FAIR principles apply not just to metadata and repositories but to the data itself. Above all, we need to make datasets interoperable, using harmonious standards, terminologies, ontologies etc. so that researchers from all over the world can immediately reuse data without having to interpret and reconfigure each discovered dataset.
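A minimal sketch of what interoperability means in practice: two providers map their local field names onto a shared vocabulary, after which their records can be combined without manual reinterpretation. All field names and mappings below are hypothetical, purely to illustrate the idea:

```python
# Minimal sketch: harmonising two datasets onto a shared vocabulary.
# Field names and mappings are hypothetical, for illustration only.
SHARED_VOCAB = {"temp_c", "site_id", "timestamp"}

# Each data provider documents how its local names map to the shared terms.
MAPPING_LAB_A = {"Temperature(C)": "temp_c", "Station": "site_id", "Time": "timestamp"}
MAPPING_LAB_B = {"t_celsius": "temp_c", "location": "site_id", "ts": "timestamp"}

def harmonise(record: dict, mapping: dict) -> dict:
    """Rename a record's local field names to the shared vocabulary."""
    out = {mapping[k]: v for k, v in record.items() if k in mapping}
    missing = SHARED_VOCAB - out.keys()
    if missing:
        raise ValueError(f"record lacks shared fields: {missing}")
    return out

a = harmonise({"Temperature(C)": 18.2, "Station": "NL-01", "Time": "2021-06-01"}, MAPPING_LAB_A)
b = harmonise({"t_celsius": 17.9, "location": "NL-02", "ts": "2021-06-01"}, MAPPING_LAB_B)
combined = [a, b]  # now directly comparable: same keys, same meaning
```

In real disciplines the “shared vocabulary” is an internationally agreed standard or ontology rather than a small dict, which is exactly why such standards must be developed at a disciplinary, international level.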
In some fields, this is already happening (microscopy data, materials science, the life sciences, hydrology). But in many sub-disciplines, there is no real momentum. Developing this momentum is important, but it is a tricky task, because such standards need to be developed at a disciplinary, international level.
Nevertheless, we can start to make some small steps. Encouraging disciplinary communities to come together and start discussing the challenges and possibilities for FAIR data would be a great start. This is not just a technical discussion; it is about building networks of engaged people to discuss these topics. Workshops, discussion papers, critical engagement will all help push the discussion into first gear – something that can be accelerated by international collaboration via RDA, CODATA and, crucially, international subject societies.
3. Creating a Web of Incentives
The University of Bristol recently revised its promotion criteria to include open research practices. This is obviously great news for those who believe in Open Science.
But it’s worth looking at how this came about. The decision has not been taken unilaterally. Rather:
“Including data sharing in promotion criteria is a requirement of institutions signing the Concordat on Open Research Data. Including open research practices in its promotion criteria allows the University of Bristol to sign the Concordat, which will in turn enhance the environment component of its submission to the Research Excellence Framework. There is a web of incentives.”
So change here came about because universities worked together at a national level. Strategic leadership collaborated to create the principles behind the Concordat.
Bristol is not the only example. The University of Ghent has made a broader overhaul of rewards and recognition, while the Swiss Academies of Arts and Sciences see the broader ecosystem effects of Open Science.
Within the Netherlands, we need more innovative, nation-wide tactics from our national bodies to create the ‘web of incentives’ needed to implement Open Science. It’s more than funding bodies simply demanding that projects share their data.
4. Building Capacity for Training
Barend Mons’ claim that we need 500,000 data stewards may have had a touch of hyperbole, but it should not mask a key fact: the path to data-intensive science requires new roles (data stewards, managers, research software engineers) as well as data-savvy researchers themselves. This creates an immediate pressure. How do we find such people? How do we train them? How do we get researchers up to speed? From a TU Delft perspective, we have published our Vision on Research Data Management training, but how do we scale up to train the c.500 new PhD students per year so that they are in a position to publish their data along with their final thesis?
To deal with this common problem, we need to work out a way to train-the-trainer, make use of existing materials, share workshops and generally be smart. This won’t happen through institutions working by themselves. Rather, as Celia van Gelder suggested in a recent presentation, we need serious investment in capacity building programmes and a network of digital research support desks throughout the Netherlands and Europe.
5. Transparent Governance / Coordinated Action
The responsibilities for research data management are often shared between different departments within a university – the library, ICT, legal, research support. These existing silos make it difficult for universities to provide frictionless support for their research communities. All of us working in support services should be collaborating to see how we can make workable connections between these silos.
But these institutional boundaries also manifest themselves at a national level. Many of the librarians congregate around LCRDM, the Surf CSC group rounds up the ICT managers, while the big decisions at NPOS are taken by senior policy players. Nevertheless, these stakeholders are still dealing with the same fundamental concerns about Open Science and research data – all of them are travelling the same road.
So we need much better coordination, and smarter routes of governance. We can start by being more transparent. What is each organisation doing, what is its role and responsibilities, where is it going? This is the first milestone in openness. And once we have that we can move on with the coordination and governance issues. Do we look to leadership from our government (OCW), or at least make firm proposals to them, perhaps in exchange for more financial stimulus? Or do we develop grass-roots communities of governance that move more quickly but risk leaving some stakeholders behind?
6. Open Infrastructures for Research
In recent years we have seen numerous acquisitions of various elements of scholarly communication infrastructure by two major commercial players: Elsevier and Digital Science. This allows these two companies to offer fully integrated workflows supporting almost the entire research lifecycle (reference management tools, electronic lab notebooks, data repositories, current research information systems, various research analytics tools). A dream come true! No need for universities to develop unsustainable local solutions themselves; no need to constantly struggle to recruit and retain talented developers and system administrators; bags of money saved, with better quality products.
But is that really so? Outsourcing the most crucial pieces of scholarly communication infrastructure to commercial providers is risky. Among other risks, institutions face vendor lock-in: once investment has been made in an integrated infrastructure (the effort of the tender process, of integrating the provider within university systems, and of communicating the change to various stakeholders), who would want to change things? That’s despite companies often promising that customers own their data and can cancel their contracts at any time.
Also, commercial providers are often excellent at providing integration, but only within their own plethora of services. Just try to integrate services offered by different big players! Then there is the obvious threat of market domination: it is difficult for smaller businesses to compete against the big players, and lack of competition paves the way to higher prices and lower quality.
Finally, by handing over crucial assets (research outputs), academia loses control: not only over the actual development of products and services, but, more crucially, over what happens with the data and metadata (commercial companies tend to be particularly eager to lock down and monetise the latter), and over measurement, citation, analytics, discovery and more.
Meanwhile, due to a lack of alternative options, more and more Dutch institutions are subscribing to services offered by the two big players. For example, “subscriptions [to Pure – Elsevier’s current research information system] amount to an annual €2.3 million nationwide as compared to €14 million for [Elsevier] journal subscriptions”.
So we desperately need viable, sustainable open source alternatives: Open Scholarly Infrastructures, ideally developed by consortia of academic institutions. There are already some efforts, such as the Invest in Open Infrastructure initiative. But better coordination, more strategic support, and more resources and investment are needed to make it happen and to make these efforts a priority – not only nationally, but also internationally.