EUA-FAIRsFAIR focus group meeting – addressing the development of competences for (FAIR) data management and stewardship.

By Paula Martinez-Lavanchy

On the 19th of November I joined the meeting of the EUA-FAIRsFAIR focus group “Teaching (FAIR) data management and stewardship” at the University of Amsterdam. In this post I summarise my key reflections on what happened during the meeting.

For those who are not yet familiar with FAIRsFAIR, it is a European project that started in March 2019 with the aim “to supply practical solutions for the use of the FAIR data principles throughout the research data life cycle. Emphasis is on fostering FAIR data culture and the uptake of good practices in making data FAIR.” The project has four main areas of work: ‘Data Practices’, ‘Data Policy’, ‘Certification’ (repositories) and ‘Training, Education and Support’. The meeting in Amsterdam was part of the activities of this last area, and specifically part of Work Package 7 of the project, “FAIR Data Science and Professionalisation”. The main organizer of the event was the European University Association (EUA).

The FAIRsFAIR project aims to be deeply connected with the European Open Science Cloud (EOSC) through a dedicated Synchronisation Force, which will offer coordination and interaction opportunities between various stakeholders, including the EOSC. It was not clear to me how exactly the input of the project will be used/adopted by the EOSC in practice. However, the EOSCpilot work on skills was part of the presentations we saw, which suggests that the deliverables of the FAIRsFAIR project are meant to become a building block of the EOSC, and not yet another layer of the FAIR cake.

The various initiatives related to RDM training and FAIR data skills

The meeting started with five presentations that introduced the audience to different initiatives regarding or related to Research Data Management (RDM) training and/or FAIR data skills. Since we already talked about layers, I would divide the presentations into two groups: framework initiatives and implementation initiatives.

Framework initiatives: where the goal is to define the skills/competences that data scientists, data stewards and researchers should acquire around data management, and to build up training curricula. There were dedicated presentations about the EDISON project (Yuri Demchenko – University of Amsterdam) and FAIR4S (Angus Whyte – Digital Curation Centre – DCC). However, many other initiatives related to RDM skills and competences were mentioned: the RDA Education & Training in Data Handling IG, the Skills Framework for the Information Age (SFIA), the Competency Matrix for Data Management Skills (Sapp Nelson, M. – Purdue), the Open Science Careers Assessment Matrix, and “Towards FAIR Data Steward as a profession for the Lifesciences”. It was impressive, and somewhat overwhelming, to see the number of groups working in the RDM training field.

Implementation initiatives: I call them implementation initiatives because these initiatives are already providing training, or are planning to create an education programme.

Photo Credit: Lennart Stoy. Original Tweet here.

It was very interesting to hear about the work done by ELIXIR (Celia van Gelder – DTL/ELIXIR-NL), which runs training events for researchers, developers, infrastructure operators and trainers in the Life Sciences. ELIXIR also has a consolidated train-the-trainer programme that provides training skills, and has developed a really nice platform (TESS) where they announce training events, make training materials available and provide guidance on how to build training.

We also had the opportunity to hear about the “National Coordination of Data Steward Education in Denmark” (Michael Svendsen – Danish Royal Library). They used a survey approach to investigate the landscape of skills that Data Stewards are expected to have (results to be published soon). Based on this, the Danish Royal Library, together with the University of Copenhagen, is planning to design a Data Steward education curriculum (to launch in 2021) and to draft a specific training module for the study programme of librarians.

In summary, the terms ‘training’ and ‘education’ were used across the different presentations, but so were many target groups and many types of skills, with varying degrees of relevance depending on the project or initiative working on them. While this diversity was impressive, it was somewhat difficult to understand the rationale for all these parallel projects and approaches, and how they will all lead to a coherent, agreed, pan-European framework for RDM skills and competences.

Advantages, disadvantages, challenges and opportunities

In the afternoon session we had breakout discussions for which four topics were proposed:

  1. Teaching RDM/FAIR at Bachelor/Master level
  2. Addressing RDM/FAIR at Doctoral/Early-career researcher level
  3. Generic Data Stewardship and FAIR data competences
  4. Disciplinary/Domain-specific Data Stewardship and FAIR data competences

Photo Credit: Lennart Stoy. Original Tweet here.

We had two discussion sessions, so each of us had the opportunity to join two different topics. For each topic we discussed advantages/disadvantages, good practices, missed opportunities, challenges, target audiences, possible synergies, etc. I joined topic 1 (Teaching RDM/FAIR at Bachelor/Master level) and topic 4 (Disciplinary/Domain-specific Data Stewardship and FAIR data competences). In both breakout groups we had rather broad discussions and exchanges of knowledge, with more or less structure, but I found them very interesting and valuable. The organizers promised to report on the discussion results, so I will not duplicate their efforts. A follow-up post will share my own overall reflections on education and training in RDM. So, to be continued.

Summary

What are the next steps for the FAIRsFAIR project with regard to skills and competences? The organizers intend to use the results of this meeting, together with the results collected in the “Consultation on EUA-FAIRsFAIR survey on research data and FAIR data principles”, a survey they recently ran, to define the activities of the project in the training and education track. So hopefully more on this soon.

The new 3mE Research Data Management Policy: what is it and why was it introduced?

A shortened version of this article was originally published in the Newsletter of 3mE Faculty.

Research Data Management (RDM) is gaining more attention due to changes in funders’ and publishers’ policies regarding data availability and the emergence of the FAIR data principles. Last year, a central TU Delft Research Data Framework Policy, outlining the roles of the Library, ICT Department, University Services and CvB at TU Delft, was accepted at CvB level (the paper describing the development of this policy is available at http://doi.org/10.5334/dsj-2019-045). Accordingly, every faculty was asked to use this framework to develop its own policy to address disciplinary needs and define roles and responsibilities. The faculties of 3mE and TNW worked closely together on the development of this policy, due to disciplinary similarities. Every 3mE and TNW department has been consulted, and the 3mE policy was accepted by the dean on 24 September 2019.

Do you want to learn more about the data policy and what is in it for you? Read this interview with Yasemin Türkyilmaz-van der Velden, Data Steward of 3mE, and with two so-called “Data Champions” of 3mE:

Data Steward Yasemin Türkyilmaz-van der Velden, 3mE

What kind of data are addressed in this policy?

Any research data that underpins answers to research questions and is necessary to validate research findings. Data can come in various forms and types. Data can be quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, modelling, interview or other methods, or information derived from existing evidence. Research data also includes elements that make the data reusable or re-workable, e.g. documentation of the research process (e.g. in lab- or notebooks), or underlying software.

What plusses and minuses do the researchers of 3mE see?

We have visited every 3mE department to discuss this policy and ask for feedback. In general, 3mE researchers understand the need for such a policy, yet many have questions and concerns when it comes to its implementation. This project is not about compliance, but rather about increasing awareness to achieve incremental improvements in data management and sharing practices.

What are the next steps in Research Data Policy within 3mE?

To address the concerns mentioned above, RDM training is offered. In January 2019, we introduced Software Carpentry workshops to teach basic computing skills. From 2020, an RDM Essentials course will be offered. Additionally, we will stay in close contact with 3mE researchers to provide the support and resources they need to follow the requirements in this policy. The policy will be reviewed biannually and adjusted to the needs of our researchers.

Can scientists ask for help, advice or even financial support in this matter?

Definitely! It is the Faculty Data Steward’s responsibility to help researchers with any questions related to data collection, management and publication. When required, I can direct researchers to other service providers, such as Legal Services, ICT and the Human Research Ethics Committee. Currently, any TU Delft researcher can request 5 TB or more of data storage space per project from TU Delft ICT, and deposit up to 1 TB of data per year at the 4TU Centre for Research Data, free of charge. For projects exceeding these numbers, I can help with cost planning at the proposal stage.

Data Champions

Increasing awareness of research data management is only possible by closely engaging with the research community. That is why we have Data Champions: local advocates for good RDM practices who share their tips and tricks with their colleagues.

Data Champion Poulumi Dey, Materials Science and Engineering, 3mE

Data management and sharing are essential, since researchers in the present-day world deal with huge volumes of data. As a data champion, I get the opportunity not only to share my experiences but also to learn from other data champions about efficient ways of managing data. I share the insights gained from such an exchange of ideas with my department members, which is valuable for them.

The benefits that I get from data management surpass the investments that I need to make in terms of time, money and effort. For instance, with proper management of pre-existing data, I prevent misuse of computational resources (by avoiding duplication of data), which also saves my time and effort.

I would like to share with my fellow tenure trackers my intention of building a common platform within my department where important data of my research community members can be safely stored for future use. We also have a highly supportive and dedicated data management team in our faculty from whom we can get more insights into good data management practices.

Data Champion Joost de Winter, Cognitive Robotics, 3mE

The amount of data generated in (our) research is enormous; it seems to be increasing at an exponential rate. Currently, so much data is generated in different (student) projects that it requires active coordination to keep track of those data and the corresponding analysis scripts. Within my group, I find it very important that the data are made transparent and shared amongst the co-authors. If this does not happen, or if it is unclear to others how a table or figure in a research paper was generated from the raw data, I get quite uncomfortable. Being a data champion allows me to be visible and to motivate others to share data in the same way. I already notice that other groups within 3mE are picking up the same principles.

Managing data is an active and time-consuming process. It certainly takes more effort to publish a paper together with annotated data and scripts than without them. The benefits occur in the longer term. For example, if a follow-up student wants to continue the research, a lot of time is saved by giving the student access to the data archive of the previous project.

I would say that reproducibility is an important criterion in science, and data management can aid in the process. Perfect reproducibility is achieved when all tables and figures of the paper roll out at the press of a button.
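That press-of-a-button ideal boils down to having a single entry-point script that recomputes every reported number from the raw data. A minimal, purely illustrative sketch in Python (the file contents, names and toy values below are assumptions for demonstration, not from any 3mE project):

```python
# A minimal "one press of the button" pipeline: one entry point recomputes
# every summary number from the raw data. All names and values here are
# illustrative placeholders.
import csv
import io
import statistics

RAW_DATA = """participant,reaction_time_ms
p01,412
p02,388
p03,455
"""  # stands in for a raw data file archived alongside the paper


def build_summary_table(raw_csv: str) -> dict:
    """Recompute the summary statistics reported in the paper."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    times = [float(row["reaction_time_ms"]) for row in rows]
    return {
        "n": len(times),
        "mean_ms": round(statistics.mean(times), 1),
        "sd_ms": round(statistics.stdev(times), 1),
    }


if __name__ == "__main__":
    # The single "button": running this script rebuilds the table.
    print(build_summary_table(RAW_DATA))
```

Because every figure and table flows from one command, a co-author or follow-up student can verify the paper simply by re-running the script on the archived data.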

Related documents:

FAIR data principles

TU Delft Research Data Framework Policy

3mE Research Data Management Policy

Policy Needs to Go Hand in Hand with Practice: The Learning and Listening Approach to Data Management.

Hacking for reproducible science.

This blog is originally written and posted by Mateusz Kuzak on his website and is reposted here with his permission.

First ReproHack in the Netherlands

Last Saturday, I visited Leiden University Library to join the first ReproHack in the Netherlands. ReproHack is short for Reproducibility Hackathon, an event at which participants hack away at reproducing existing published research based on the journal article. In the lead-up to the event, researchers were encouraged to nominate their paper(s) for a reproducibility “test”. The organizers thoughtfully printed out all the articles in advance and posted them on the walls around the venue.

Participants

Almost 60 people showed up, despite it being a quite chilly weekend day. The participants’ backgrounds encompassed a whole range of research domains, with the majority being PhD candidates and post-docs. What struck me most about the audience was the level of energy, excitement and willingness to make Open Science happen, as well as the eagerness to learn.

Here we go

The event started with a short introduction by the organiser, followed by a talk by ReproHack founder Anna Krystalli. Anna brought a personal perspective to research reproducibility and introduced a few packages that can be used to make it easier to write reusable code for research.

Anna also described the research compendium concept, which I had not come across before. The concept was first introduced by Gentleman and Temple Lang in 2004:

“…We introduce the concept of a compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data, …), and as a means for distributing, managing and updating the collection.”

Check out Anna’s slides too, and my tweet thread explaining the compendium.
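To make the compendium idea concrete: in practice it is often a self-describing directory that packages text, code and data together. A hypothetical minimal layout (the directory and file names here are illustrative, not from Anna’s talk) might look like:

```
my-compendium/
├── README.md          # what the project is and how to reproduce it
├── LICENSE
├── data/
│   └── raw/           # the unmodified input data
├── analysis/
│   └── paper.Rmd      # the text of the paper, with embedded code
├── R/                 # reusable functions called by the analysis
└── DESCRIPTION        # declares the dependencies needed to run everything
```

The point is that one folder holds everything needed to regenerate the document, which is exactly what makes a compendium a unit for "distributing, managing and updating" research.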

There was another invited speaker, John Boy, who shared with us his perspective on Open Science.

Getting hands dirty

After a welcome coffee and Anna’s talk, participants walked around to pick the research they would like to review and formed small groups to work together. The goal of this exercise was to collect experience on the reproducibility, transparency and reusability of available content.

What I found special about the event is the positive atmosphere of collaborative learning and sharing. It is by no means an attempt to criticise or discredit work. On the contrary, it helps us to gain a better understanding of the challenges behind research reproducibility and to come up with solutions that will have a positive impact on our scholarly publications.

It was a really busy day. We ultimately managed to evaluate 21 out of 31 submitted articles.

Improvised hands-on session

While enjoying a delicious lunch, the discussions meandered into the practical aspects of making research code more findable and citable. I gave a short presentation on how to connect GitHub with Zenodo and on the ways to make your software more FAIR.
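The GitHub–Zenodo route works by enabling the repository in Zenodo and then making a GitHub release, after which Zenodo archives a snapshot and mints a DOI for it. A common companion step is adding a citation metadata file to the repository so the archived software is citable; a hypothetical minimal CITATION.cff (all names, the version and the DOI below are placeholders, not a real record):

```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "my-analysis-code"       # placeholder repository name
authors:
  - family-names: "Doe"
    given-names: "Jane"
version: "1.0.0"
doi: "10.5281/zenodo.0000000"   # placeholder; Zenodo mints the real DOI
date-released: "2019-11-01"
```

With a DOI and machine-readable citation metadata in place, the software is both findable and citable – two of the main ingredients of making it more FAIR.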

Takeaways

I believe the event achieved more than its initial aims. It brought together like-minded people, helped them to exchange ideas, share solutions and build relationships in this growing Open Science Community.

For me as a community manager who is less directly involved in research and who doesn’t have that many opportunities to write code, the event provided a window on researchers’ laudable efforts to make their research more accessible to others.

Research Data Management Survey 2019: the results are here!

Data. We advise researchers on how to manage theirs, but we are not averse to gathering and sharing some of our own.

The problem

The Data Stewardship Project started over 2 years ago. Much has been done, but given that our activities are scattered over eight TU Delft Faculties, and cover numerous issues (data ownership, data storage, data management plans, personal data and GDPR, programming training and support…), it is difficult to know how effective these activities are in supporting the researchers with their daily data management practice.

The solution

A survey! Over the summer (May-June) of 2019 we ran a survey, which was advertised in all faculties. We based this survey on the one that was run in 2017, but, as the challenges evolved, so did research data support services and therefore the content of the survey. This year, 937 staff members involved in research answered our call (from PhD students to full professors, including lecturers and lab assistants).

The results

The results can be browsed here!

The results are presented in terms of the percentage of respondents in each faculty. If you hover your mouse over each number, you will get a bit more information about who replied (the total number of answers for that faculty, and across all faculties).

If you are curious, you can find the results of the previous survey here!

We also provide here a direct comparison between the results of the first and the second survey for some of the questions (not available for all faculties, because not all of them participated in the 2018 survey – pay attention when looking at the results!)

A few takeaways from the survey

We are happy to see that awareness of the FAIR principles has increased since last year (+8.3 percentage points on average across the faculties)

Data loss was (slightly) less frequent in this survey than reported during the previous one (-3 percentage points on average)

We are also pleased that data stewards are becoming known across all faculties, with an increase of 25 percentage points on average.

It should be noted that, in the Faculty of Aerospace, awareness of data steward support increased from just 19% of the respondents to 63% this year – the largest increase in all faculties! (Great job Heather!)

The number of researchers relying on manual backup solutions is quite high (25% to 53% of respondents). This implies a lot of tedious and error-prone work. We intend to communicate more about the backup solutions offered by the University.

Publishing data is still not common! Despite our efforts, only about 10% (8% to 12% depending on the faculty) of respondents indicated that they published data in the last year. We may use this information as an indicator of progress in the future. However, for now, this indicates that the culture change toward data sharing is a work in progress.

Next steps

As last year, we will now proceed with carefully analysing the data. This is essential for us to understand what the key community needs are and where we should focus our support efforts. Similarly to what we did with the results of the 2017/2018 survey, we aim to publish the outcomes openly, as a peer-reviewed article. So stay tuned!

We might call on you for yet another survey in a year or so… We are here to help you, so your opinion matters! 

Until that time, if you have any questions, please contact us at datastewards@tudelft.nl.

Acknowledgement

TABLEAU IS HARD! Many thanks to Bonnie van Huik (FIC for TPM and AS) for her help. Without her, this would have never been possible ☺

Related resources

  • Andrews Mancilla, H., Teperek, M., van Dijck, J., den Heijer, K., Eggermont, R., Plomp, E., Turkyilmaz-van der Velden, Y. and Kurapati, S., 2019. On a Quest for Cultural Change – Surveying Research Data Management Practices at Delft University of Technology. LIBER Quarterly, 29(1), pp.1–27. DOI: http://doi.org/10.18352/lq.10287

New Landscapes on the Road of Open Science: 6 key issues to address for research data management in the Netherlands

Marta Teperek, Wilma van Wezenbeek, Han Heijmans, Alastair Dunning

The road to Open Science is not a short one. As the chairman of the Executive Board of the European Open Science Cloud, Karel Luyben, is keen to point out, it will take at least 10 or 15 years of travel until we reach a point where Open Science is simply absorbed into ordinary, everyday science.

Within the Netherlands, and for research data in particular, we have made many strides towards that final point. We have knowledge networks such as LCRDM, a suite of archives covered by the Research Data Netherlands umbrella, and the groundbreaking work done by the Dutch Techcentre for Life Sciences.

But there is still much travel to be done; many new landscapes to be traversed. Data sharing is still far from being the norm (see here for a visualisation of these results).

The authors of this blog post have put together six areas that, in their opinion, deserve attention on our Open Science journey.

1. Cultural and Technical Infrastructure for Confidential Data

A recent workshop on data privacy ended with a doctor stating that “all data is personal”. This goes too far – much technical data is free of any personal details. Nevertheless, there are many reasons to see personal data everywhere: the increasing quantity of interdisciplinary work that makes use of sensor or social media data; legal mechanisms such as the GDPR; the growing possibilities for retrospective de-anonymisation; and the accumulation and analysis of personal data via machine learning. Increasingly, researchers need sophisticated mechanisms for sharing and publishing data based on humans.

And it’s not just personal data. Increasing engagement with third parties (at TU Delft, roughly a third of all research funding involves commercial partners) means that we need to consider how best to safeguard data with a commercial aspect. We need an infrastructure for sharing commercial data with our industrial partners and for protecting potentially economically valuable resources from bad actors.

The amount of work (tools and services, advice, standards) to be done is huge. We need:  

  • trusted infrastructures for sharing data between universities, medical centres, research units and commercial entities, and similar infrastructures for publishing personal data (with different access levels)
  • a national network of disciplinary access committees who can approve requests for access to restricted data, and perhaps a national body that can act as an access point for researchers seeking sensitive data from third parties (e.g. similar to the role the CBS has for government statistical data)
  • a national consent service for handling and accessing consent forms
  • national advice (or even specific tools) for anonymising data
  • nationally agreed terms for data access (perhaps a colour-coded system, from green for open access to black for closed archive)
  • a network of trainers and research data supporters across the country who can guide and advise researchers tiptoeing down the path of personal data
  • agreed principles by which higher education and private companies should abide when co-creating research outputs (articles, data, etc.)

In many cases, individual research organisations are developing their own solutions, and these issues are being partially discussed within LCRDM groups. But these are generally exploratory discussions. To create a systematic infrastructure (of both digital tools and human expertise) we need a clear plan and a broad nation-wide coalition of partners, all of whom have clearly defined roles and responsibilities. And, of course, we need to embed this in the wider international context.

2. Encouragement for discipline-specific guidance and standards

Early analysis of the usage of the FAIR principles focussed on how FAIR repositories are. How FAIR was DANS, or 4TU.ResearchData, or subject-based repositories?

But the FAIR principles apply not just to metadata and repositories but to the data itself. Above all, we need to make datasets interoperable, using harmonious standards, terminologies, ontologies etc. so that researchers from all over the world can immediately reuse data without having to interpret and reconfigure each discovered dataset. 

In some fields, this is already happening (microscopy data, material science, the life sciences, hydrology). But in many sub disciplines, there is no real momentum. Developing this momentum is important, but it is a tricky task, because such standards need to be developed at a disciplinary, international level. 

Nevertheless, we can start to make some small steps. Encouraging disciplinary communities to come together and start discussing the challenges and possibilities for FAIR data would be a great start. This is not just a technical discussion; it is about building networks of engaged people to discuss these topics.  Workshops, discussion papers, critical engagement will all help push the discussion into first gear – something that can be accelerated by international collaboration via RDA, CODATA and, crucially, international subject societies.  

3. Creating a Web of Incentives 

The University of Bristol recently revised its promotion criteria to include open research practices. This is obviously great news for those who believe in Open Science.

But it’s worth looking at how this came about. The decision has not been taken unilaterally. Rather: 

“Including data sharing in promotion criteria is a requirement of institutions signing the Concordat on Open Research Data. Including open research practices in its promotion criteria allows the University of Bristol to sign the Concordat, which will in turn enhance the environment component of its submission to the Research Excellence Framework. There is a web of incentives.”

So change here came about because universities worked together at a national level. Strategic leadership collaborated to create the principles behind the Concordat.

Bristol is not the only example.  The University of Ghent has made a broader overhaul of rewards and recognition,  while the Swiss Academies of Arts and Sciences see the broader ecosystem effects of Open Science.

Within the Netherlands, we need more innovative, nation-wide tactics from our national bodies to implement the ‘web of incentives’ needed to implement Open Science. It’s more than funding bodies simply demanding that projects share their data.

4. Building Capacity for Training

Barend Mons’ claim that we need 500,000 data stewards may have had a touch of hyperbole, but it should not mask a key fact: the path to data-intensive science requires new roles (data stewards, data managers, research software engineers) as well as data-savvy researchers themselves. This creates an immediate pressure. How do we find such people? How do we train them? How do we get researchers up to speed? From a TU Delft perspective, we have published our Vision on Research Data Management training, but how do we scale up to train the c. 500 new PhD students per year so that they are in a position to publish their data along with their final thesis?

To deal with this common problem, we need to work out a way to train the trainers, make use of existing materials, share workshops and generally be smart. This won’t work with institutions acting alone. Rather, as Celia van Gelder suggested in a recent presentation, we need serious investment in capacity-building programmes and a network of digital research support desks throughout the Netherlands and Europe.

5. Transparent Governance / Coordinated Action

The responsibilities for research data management are often shared between different departments within a university – the library, ICT, legal, research support. These existing silos make it difficult for universities to provide frictionless support for their research communities. All of us working in support services should be collaborating to see how we can make workable connections between these silos.

But these institutional boundaries also manifest themselves at a national level. Many of the librarians congregate around LCRDM, the Surf CSC group rounds up the ICT managers, while the big decisions at NPOS are taken by senior policy players. Nevertheless, these stakeholders are still dealing with the same fundamental concerns about Open Science and research data – all of them are travelling the same road.  

So we need much better coordination, and smarter routes of governance. We can start by being more transparent. What is each organisation doing, what is its role and responsibilities, where is it going? This is the first milestone in openness. And once we have that we can move on with the coordination and governance issues. Do we look to leadership from our government (OCW), or at least make firm proposals to them, perhaps in exchange for more financial stimulus? Or do we develop grass-roots communities of governance that move more quickly but risk leaving some stakeholders behind?

6. Open Infrastructures for Research

In recent years we have seen numerous acquisitions of various elements of scholarly communication infrastructure by two major commercial players: Elsevier and Digital Science. This allows these two companies to offer fully integrated workflows to support researchers in almost the entire research lifecycle (reference management tools, electronic lab notebooks, data repositories, current research information systems, various research analytics tools). A dream come true! No need for universities to develop unsustainable local solutions themselves; no need to constantly struggle to recruit and retain talented developers and system administrators; bags of money saved, with better quality products.

But is that so? Outsourcing the most crucial pieces of scholarly communication infrastructure to commercial providers is risky. Among others, institutions are under threat of vendor lock-in: once investment has been made in an integrated infrastructure (both in terms of the actual effort of the tender process, integrating the provider within the university system, but also communication efforts to various stakeholders) who would want to change things? That’s despite companies often promising that customers own their data and can cancel their contracts anytime. 

Also, commercial providers are often excellent at providing integration, but only within their own plethora of services. Just try to integrate services offered by different big players! Then there is the obvious threat of market domination: it is difficult for smaller businesses to compete against the big players, and lack of competition is a sure route to price increases and reduced quality.

Finally, by handing over crucial assets (research outputs), academia loses control: not only over the actual development of products and services but, more crucially, over what happens with the data and metadata (commercial companies tend to be very eager to lock down and monetise the latter in particular), as well as over measurement, citation, analytics, discovery, etc.

Meanwhile, due to a lack of alternative options, more and more Dutch institutions are subscribing to services offered by the two big players. For example, “subscriptions [to Pure – Elsevier’s current research information system] amount to an annual €2.3 million nationwide as compared to €14 million for [Elsevier] journal subscriptions”.

So we desperately need viable, sustainable open source alternatives: Open Scholarly Infrastructures. Ideally developed in collaboration between consortia of academic institutions. There are already some efforts, such as the Invest in Open Infrastructure initiative. However, we desperately need better coordination, more strategic support, resources and investment to make it happen and to make these efforts a priority – not only nationally, but also internationally.