Authors: Marta Teperek, Yasemin Turkyilmaz-van der Velden, Shalini Kurapati, Esther Plomp, Heather Andrews, Robbert Eggermont
TU Delft has been leading the way in fostering a good research data management culture to uphold the quality, transparency and reproducibility of research. Since 2017, TU Delft has piloted the Data Stewardship programme with the aim of providing discipline-specific data management support to TU Delft researchers. The focus on disciplinary support is motivated by the belief that in research data management (RDM), there are no one-size-fits-all solutions.
TU Delft has eight faculties with a wide range of research topics. In order to provide dedicated disciplinary support to researchers, a Data Steward was appointed at every faculty. Each Data Steward has a PhD degree in a research area relevant to their faculty.
This is a condensed 2018 annual report describing the progress, activities, achievements and future prospects of the project.
Team building and laying the groundwork for the programme
In 2017 the majority of work focused on the recruitment of Data Stewards at three faculties: Electrical Engineering, Mathematics and Computer Sciences (EEMCS), Aerospace Engineering (AE) and Civil Engineering and Geosciences (CEG), and on laying the groundwork for the programme. In 2018 Data Stewards were appointed at the remaining faculties, which concluded the team-building work and brought the programme to full speed. Since the beginning of 2019, the team of Data Stewards has been at full capacity, with a dedicated Data Steward per faculty.
The Data Stewards meet weekly for training, information sessions, and knowledge and practice exchange. The weekly meetings focus on the RDM needs of TU Delft researchers and on keeping up to date with the most recent trends in RDM, such as the FAIR principles, the General Data Protection Regulation (GDPR), and research and software reproducibility. Dedicated experts from TU Delft, as well as from the national and international scene, are regularly invited to these meetings. Communication channels and information-sharing spaces have also been created and are now effectively used by all team members. To increase the visibility of the programme and to openly share its progress, a Data Stewardship webpage and a dedicated section on the Open Working blog were launched. While the Data Stewards are embedded at each faculty, the Research Data Services (RDS) team operates centrally at the TU Delft Library. To establish strong links between these two teams, a joint Away Day is organised once a year. Additionally, members of the RDS team also attend the weekly Data Stewards meetings and participate in some of the joint projects and undertakings (e.g. the roll-out of a new data management plan template). In addition, connections with faculty secretaries were developed through dedicated meetings about Data Stewardship, hosted by the Library and attended by all faculty secretaries. All of these activities were overseen and coordinated by the Data Stewardship Coordinator, who is located at the TU Delft Library.
Day to day activities of the Data Stewards
The role of the Data Steward at TU Delft is relatively new, so one of the first tasks of the Data Stewards was to become visible to researchers and gather intelligence on the type of support and advice researchers require within the faculty. In the first couple of months, Data Stewards engaged with researchers during faculty meetings, interviews, graduate school seminars, open science roadshows and by sending out a survey on the data management needs (see below for more details).
After researchers were sufficiently aware of the help they could receive, Data Stewards started receiving questions and requests for data management support. The requests varied across the eight faculties, but there were a few common topics on which Data Stewards were regularly consulted, such as: advice on data management plans, information about data archiving options, data sharing options, GDPR concerns, cross-border data transfers, commercially sensitive data, and data licensing.
Data stewards are also the linking pin to the broader TU Delft research support ecosystem. Pragmatically speaking, Data Stewards act as general practitioners to all data related questions and issues. If there is a need for a specific intervention from a university wide legal, ethics or ICT specialist, Data Stewards know where to direct the researcher to get the most specific and useful answers.
In addition to advice and consultation, Data Stewards provide and/or facilitate on-request training and workshops on data management topics for researchers and PhD students. Agreements are made with faculty graduate schools to allocate credit points for participation.
At the moment all the Data Stewards are involved in leading RDM policy development at their respective faculties.
Although embedding Data Stewards at each faculty is a prerequisite for creating awareness and achieving cultural change in RDM, community building efforts are essential to fully accomplish these goals. Additionally, it is impossible for a single Data Steward to have all the necessary disciplinary background to understand and support all types of research carried out in one faculty. Therefore the Data Champions programme was launched in September 2018.
Data Champions are researchers who voluntarily act as local community-based advocates for good data management and sharing practices. In return, they are provided with opportunities to showcase their activities during meetings at the department, faculty and TU Delft level, as well as at (inter)national conferences, which offers increased impact and visibility. Additionally, the Data Champions are offered travel grants to join meetings and conferences to showcase their Data Champion activities, and training sessions and workshops to learn new RDM skills to share with their local community members.
Suitable candidates for the programme are identified by faculty Data Stewards and are encouraged to become Data Champions. The general communication with the Data Champions is carried out by the Data Steward at the Faculty of Mechanical, Maritime and Materials Engineering (3mE), who took on the role of the Data Champions Community Manager. The first meeting to officially kick off the programme was on 14 December 2018. This meeting took place in an informal setting to encourage interactive discussions, knowledge exchange and networking. Overall, it was very well received by the Data Champions as well as the research support professionals. As of December 2018 we already had 27 Data Champions (at least one Data Champion per faculty) and this number is still growing. The AE Faculty, as well as the Faculty of Technology, Policy and Management (TPM), already have at least one Data Champion at every department.
The Dean of the Faculty of Applied Sciences (AS) has recognised the importance of Data Champions for advocating for good data management and sharing practices and aims to also have at least one Data Champion per department. The AS faculty already has six Data Champions and two of them, Anton Akhmerov and Gary Steele, took the lead in creating a dedicated policy on Open Data for their department (Quantum Nanoscience). The importance of the Data Champions programme has been recognised also at a strategic level at TU Delft, evidenced by the wish of Prof. Rob Mudde, the Vice Rector Magnificus of TU Delft, to attend the next meeting of the Data Champions.
To be able to offer dedicated RDM support, it is necessary to first define the problems and the needs of the researchers. Our survey on research data management needs, which was initiated in 2017 at three faculties (EEMCS, CEG and AE), has been extended and completed in three other faculties in 2018 (TPM, 3mE, AS). The survey gathered 680 responses in total and the data visualisation is publicly available. The survey provided important information on the state of data management practices at TU Delft. The survey will be repeated yearly and this way the results will serve as a benchmark to indicate the effects of the work of Data Stewards on data management awareness and practices at the faculties.
The joint presentation summarising the survey results, given at the LIBER conference in July 2018 by the Data Stewards from the LT and 3mE faculties, was very positively received by the community and downloaded 187 times. Based on this presentation, we were invited to submit a paper about the survey results to LIBER Quarterly. The survey will be run at the two remaining faculties (Architecture and the Built Environment – ABE, and Industrial Design Engineering – IDE) and re-run at the other faculties in 2019.
Data Stewardship in numbers
In summary, in 2018 the Data Stewards received at least 245 requests for help with data management (note that not all requests are recorded, given that recording involves manually copy-pasting requests received by email). In addition, in 2018 Data Stewards conducted 68 dedicated interviews with researchers about their data management practices. Notably, the Data Steward at the AE Faculty met with all the full professors at the faculty, which was positively received by TU Delft’s ex-Rector Magnificus Karel Luyben.
In addition, Data Stewards adhere to the principle “practice what you preach” and therefore share their work as openly as possible. In 2018 the team published 29 blog posts and other publications on the Open Working blog. Our top-viewed blog post in 2018 was by the Data Steward at EEMCS, describing the results of the RDM survey (viewed 844 times).
Furthermore, the team attended 46 national and international conferences and meetings in 2018, including 33 occasions where Data Stewards presented as invited or keynote speakers. The Data Steward from the 3mE Faculty was awarded the competitive Research Data Alliance Early Career Researcher Grant to attend the International Data Week 2018 conference in Botswana in November 2018. Again, in adherence with the openness principles, all presentations are publicly shared in a dedicated Data Stewardship at TU Delft community on Zenodo.
Data Stewardship event
On 24 May 2018 the team organised a dedicated event, “Engaging researchers with research data – Data Stewardship in practice”, to showcase the work of Data Stewards at TU Delft and to exchange views and practices on Data Stewardship with other universities. The event was attended by over 120 individuals (with 35% of the participants from countries other than the Netherlands). All participants judged the event as “good” or “excellent” and responses to open questions were overwhelmingly positive.
All the photos (taken by Jan van der Heul from the RDS team, our Chief Photographer), videos and presentations from the event are publicly available. In addition, three participants wrote blog posts with their reflections and take-home messages (Marjan Grootveld, Danny Kingsley and Martin Donnelly).
Data stewards have also been involved in many diverse projects. For example, the Data Stewards from the AE and CEG faculties took part in developing domain data protocols, which aim to provide researchers with disciplinary standards for data management in their research domains. The Data Stewards from the 3mE and AS faculties are part of the Electronic Lab Notebooks working group, which, following up on the successful Electronic Lab Notebooks event in March 2018, is now setting up a pilot to test Electronic Lab Notebooks at TU Delft in 2019.
Data Stewards from the faculties of TPM, 3mE, AS and CEG have been involved in providing support for researchers working with software, in order to improve code management practices and to make software more reproducible. Several workshops on software sustainability were organised, which resulted in a dedicated research paper that was accepted for presentation at the IEEE eScience 2018 conference and published in the conference proceedings. The preprint of this paper has already been downloaded 227 times.
These efforts eventually resulted in 4TU.Center for Research Data joining The Carpentries in December 2018, a non-profit organisation that teaches foundational coding and data science skills to researchers worldwide. On 29 and 30 November, the first Software Carpentry workshop took place at TU Delft. Tickets sold out in a matter of days, and we had around 30 researchers participating and another 45 on the waiting list, showing the huge interest in and need for such training. Two more Carpentry workshops will take place at TU Delft in 2019. In addition, the Data Steward from the CEG faculty took the lead in organising walk-in coding consultations for researchers wishing to get tailored support on their code management practices, which, due to their success and positive feedback from researchers, will continue to be organised on a regular basis. Moreover, a meeting with TU Delft researchers took place to discuss community-building efforts for good programming practices. A representative from The Carpentries and a researcher from the University of Amsterdam were invited to this meeting to share lessons from their community-building efforts.
Data Stewards have also been instrumental in driving forward the Open Science agenda. Dedicated Open Science roadshows (information sessions on research data management and on Open Access) have taken place at the AE, TPM, IDE and CEG faculties. In addition, the TPM faculty organised a dedicated workshop on Open Science for its PhD students. The presentation “Open Science in a nutshell: what’s in it for me?”, which was uploaded to Zenodo, has been downloaded 324 times and viewed 1,815 times.
In the current changing funding landscape, where researchers are expected to publish their papers and data openly, it is not feasible to evaluate researchers for funding and promotion based on high-impact journal publications alone. This is why the TPM Faculty was also actively involved in discussions about academic rewards and how to make open science count in academic careers. Prof. Bartel Van De Walle was the keynote speaker at the event on Open Science skills, which was co-organised by the Data Stewards, the 4TU.Centre for Research Data and the EOSCPilot. There were two separate blog posts highlighting the key aspects of the event (one about the event as a whole and another about the interactive workshop).
Following the principle that good data management should start as early as possible, the Data Steward from the AE Faculty piloted the use of Dataverse for keeping research data of master students. Valuable and curated datasets can subsequently be easily published with 4TU.Center for Research Data.
Recognising the need for disciplinary support and for community building, Data Stewards from the ABE and IDE faculties identified the need for a Digital Humanities community at TU Delft and are currently talking with researchers across TU Delft to scope their interests and needs. A bottom-up approach is taken to encourage researchers to take the lead in forming their own communities and exchanging research ideas, resources and challenges. The first community-driven meeting will take place in early January 2019 at the ABE faculty.
On 25 May 2018, the GDPR came into effect in Europe. In August, two events dedicated to the GDPR and its implications for research data were co-organised by the Data Stewards and Research Data Netherlands. An important aspect of these two events was that representatives from multiple institutions and countries were present to talk about their individual approaches and considerations.
On 26 June 2018, the TU Delft Research Data Framework Policy was approved by TU Delft’s Executive Board. The Framework Policy is an overarching policy on research data management for TU Delft as a whole, and it defines the roles and responsibilities at the University level. In addition, the Framework provides templates for faculty-specific data management policies. It is important to develop the faculty policies according to the discipline-specific RDM needs of researchers, so that these policies can serve as a roadmap for good RDM practices.
Currently, the deans and the faculty management teams, together with the Data Stewards, are busy with the development of faculty-specific policies on data management, which will define faculty-level responsibilities. Any interested researcher and research supporter will be invited to give feedback and thereby contribute to the development of the faculty policy. In the AS and 3mE faculties, which have around 1,000 researchers each, a single meeting would not be feasible; therefore the Data Stewards of these faculties will join the meetings of every individual department to introduce the policy and ask for feedback. The Data Champions are particularly encouraged to get involved in the development of the policy in their faculties, in order to fine-tune the policy based on their disciplinary needs.
As can be seen in this report, 2018 was a very fruitful year for the TU Delft Data Stewardship programme, and with a full team of Data Stewards from the beginning of 2019, we expect 2019 to be even more productive. The faculty policies are expected to be rolled out and published in 2019. As the policy requires all PhD candidates starting from 2019 to attend data management training, the Data Stewards are currently busy developing dedicated training suitable for the disciplinary needs of the PhD candidates. For this, the Data Stewards are in close contact with the central and faculty graduate schools, PhD councils and colleagues from the TU Delft Library.
We already have three events planned in 2019: a seminar titled “Limits of Reproducibility: Strategies for Transparent Qualitative Research”, which will be followed by a hands-on workshop on managing qualitative data for sharing and transparency on 28 January; the kick-off of the open science seminars on 27 February; and a seminar on publishing reproducible research on 16 May.
Additionally, we will have a one-day event for all TU Delft’s Data Champions, a workshop on working with software and High Performance Computing (HPC), a conference on collaboration with industry and open science, and two more Software Carpentry workshops.
In addition, a dedicated blog post about our plans for 2019 is going to be published soon, so watch this space!
Written by: Maria Cruz and Julien Colomb
A new RDA project, under the umbrella of the Libraries for Research Interest Group and with the help of 29 volunteers from three continents, seeks to collect case studies from organisations around the world on how to engage researchers with research data management.
Collectively, our group has put together a survey, now open for contributions, which allows participants to share their stories and approaches for increasing engagement with research data management among researchers. The results from this survey, including the data, will be shared widely with the community in the form of an open book. The goal is to assemble a wealth of information and resources that institutions can use to select the methods most suitable for their settings.
The importance of research data management has been well emphasized over the last few years, particularly by research funding agencies, universities, and other research and academic institutions. However, the discussions around this topic have often been led by librarians and data professionals, and researcher engagement has been largely limited to those researchers who are already interested in the topic. In order to achieve global cultural change in data management, researchers need to be motivated and properly recognised for good data stewardship efforts. This is not an easy task.
Many organisations have developed dedicated programmes aiming at greater researcher engagement with research data. Examples include the Data Champions initiative at the University of Cambridge, Data Conversations at the University of Lancaster, the Data Stewardship programme at TU Delft, and the Open Data Champions initiative of SPARC Europe. In addition, some institutions, such as the University Medical Centre Utrecht and the Berlin Institute of Health, decided to change the way in which researchers are rewarded.
However, do we know how successful these programmes are in achieving cultural change? And what about their costs and benefits? Are some programmes more suitable than others for certain types of institutions? Are there other strategies out there that achieve similar results with less effort? These are some of the questions this project is trying to address.
Research data management professionals spend a considerable amount of their time doing outreach, teaching, and otherwise engaging with researchers about research data management. Understanding what we can learn from each other and how to exchange practices more effectively are two very important goals of the project.
The case study collection, review and editing are being led by Iza Witkowska, a Data Consultant from Utrecht University in the Netherlands, together with Andrea Medina-Smith from the USA and Elli Papadopoulou from Greece. They are assisted by 15 enthusiastic volunteers. The first project update will be presented at the RDA Thirteenth Plenary Meeting in Philadelphia in April 2019.
This blog post is distributed under a CC-BY 4.0 licence.
RDA Researcher Engagement Project
Lauren Cadwallader, Julien Colomb, University of Jena, Maria Cruz, Mary Donaldson, Lambert Heller, Rosie Higman, Elli Papadopoulou, Vanessa Proudman, James Savage, Marta Teperek
Project group members
Helene N. Andreassen, Daniel Bangert, Miriam Braskova, Lauren Cadwallader, John Chodacki, Julien Colomb, Philipp Conzett, Maria Cruz, Mary Donaldson, Biswanath Dutta, Esther Fernandez, Joshua Finnell, Raman Ganguly, Patricia Henning, Amy Hodge, Stein Høydalsvik, Greg Janée, Lynda Kellam, Gabor Kismihok, Iryna Kuchma, Narendra Kumar Bhoi, Young-Joo Lee, Leif Longva, Andrea Medina-Smith, Solomon Mekonnen, Remedios Melero, Rising Osazuwa, Elli Papadopoulou, Fernanda Peset, Josiline Phiri, Piyachat Ratana, Gerry Ryder, James Savage, Souleymane Sogoba, Magdalena Szuflita-Żurawska, Ralf Toepfer, Ellen Verbakel, Irena Vipavc Brvar, Jacquelynne Waldron, Anna Wałek, Yan Wang, Iza Witkowska, Joanne Yeomans
Written by Shalini Kurapati and Marta Teperek
Training needs: research computing skills for open science
In addition to good data management, software sustainability is important for open science.
According to a survey conducted by the Software Sustainability Institute in 2014, 7 out of 10 researchers rely on code for their research. Sharing research data without the supporting code often makes research impossible to reproduce. Good documentation and version control have been highlighted as major contributors to sustainable software. In addition, earlier workshops and survey results indicated that researchers need training on good code writing, code management practices and version control.
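In practice, “good documentation” can be as simple as a docstring that records what a function expects, what it returns, and when it fails. As a purely illustrative sketch (the function and its behaviour are our own example, not taken from any TU Delft material), a documented research function might look like this:

```python
def rolling_mean(values, window):
    """Return the rolling mean of `values` over `window` samples.

    Parameters
    ----------
    values : sequence of float
        The measurements to smooth, in order.
    window : int
        Number of consecutive samples averaged per output value.

    Returns
    -------
    list of float
        One mean per full window: len(values) - window + 1 entries.

    Raises
    ------
    ValueError
        If `window` is smaller than 1 or larger than len(values).
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

A reader (including the author, a year later) can use or review this function without re-deriving its behaviour from the code, which is precisely what makes software easier to sustain and reuse.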
Similarly, the TU Delft-wide survey on data management needs revealed that 32% of researchers were interested in training on version control and 18% specifically in Software Carpentry workshops.
What are The Carpentries?
The Carpentries “teach foundational coding, and data science skills to researchers worldwide.” It is a community-based organisation which maintains and develops curricula for three different types of workshops: Software Carpentry, Data Carpentry, and Library Carpentry. Detailed and structured lesson plans are available on GitHub, and they are delivered by a network of Carpentry instructors.
An important element of The Carpentries is that in order to deliver a workshop, instructors need to be certified. The certification process puts a particular emphasis on the pedagogical skills of the instructors.
First software carpentry at TU Delft
TU Delft hosted the first Software Carpentry workshop on 29 and 30 November 2018, as a pilot before officially joining The Carpentries. We had around 30 researchers participating (and another 45 on the waiting list!). The participants were from four faculties at TU Delft: Civil Engineering and Geosciences, Applied Sciences, Technology Policy & Management, and Architecture and the Built Environment. We had three instructors and four helpers in the room.
The GitHub pages with the lesson materials are publicly available and can be found here: https://mariekedirk.github.io/2018-11-29-Delft/. All participants were asked to bring their laptops and to install some specific software. No prior programming knowledge was required. Collaborative notes were taken with Etherpad.
During the workshop, participants downloaded a prepared dataset and worked with that dataset throughout the two days. They learnt task automation using the Unix shell, version control using Git, and Python programming using Jupyter notebooks.
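The actual exercises are in the lesson materials linked above; as a flavour of the task-automation idea the workshop teaches, consider the common chore of counting records in a folder full of CSV files. Done by hand this is repetitive and error-prone; a short Python function (our own hypothetical sketch, not a workshop exercise) automates it:

```python
import csv
from pathlib import Path


def summarise(folder):
    """Count the data rows (excluding the header) in every CSV file
    in `folder` -- the kind of repetitive task the workshop teaches
    participants to automate rather than do by hand."""
    counts = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as f:
            # Total rows minus one header row per file.
            counts[path.name] = sum(1 for _ in csv.reader(f)) - 1
    return counts
```

Once the chore is a function, it runs identically on 3 files or 3,000, and the script itself can go under Git version control, tying together the shell, Git and Python threads of the curriculum.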
The Carpentries have a special way of organising feedback. Participants receive red and green post-it notes and use them to indicate problems or completion of tasks throughout the course. Similarly, at the end of each day, participants are asked to note the positives and negatives of the workshop on green and red post-it notes, respectively.
The feedback from the participants after the workshop helped us evaluate the training. The participants were overwhelmingly appreciative of the instructors and helpers and seemed to have enjoyed the training. Some participants felt that the pace of the workshop was fast and that they did not have time to experiment with the dataset. Others wished for a more personal approach and an opportunity to work with their own disciplinary datasets.
Plans for the future
The waiting list for the workshop was very long, and we had to disappoint the more than 45 researchers who did not manage to get a spot. In addition, faculty graduate schools have been willing to give course credits to PhD students who attend this workshop, which made the course even more attractive for them. Therefore, to meet the demand, we are planning to organise four more workshops in 2019: two at TU Delft, one in Eindhoven and one in Twente. We will continue to monitor the number of interested researchers and, if the need arises, we might schedule additional courses.
In addition, to increase our capacity in delivering carpentry training, some of the TU Delft’s data stewards and data champions will attend the training to become instructors. We hope to have this instructor training organised in April.
To address the feedback about the pace of the course, we will be more selective and include fewer exercises in our future workshops to ensure that the participants get the chance to experiment and play with their datasets and scripts.
In order to provide more tailored support to researchers who have started to code but need some additional help to make it work, or who might have attended a Carpentry workshop but are not sure how to put what they learned into practice, we will host dedicated coding walk-in consultation hours starting in January 2019.
So… watch out for the next carpentry workshop – scheduled for Spring 2019!
This blog post was written and originally published by Loek Brinkman on his own blog.
On the 26th of September, I participated in the event “Time for open science skills to count in academic careers!”, organised by the European Open Science Cloud Pilot (EOSCPilot) and the 4TU.Centre for Research Data. The goal was to define open science skills that we thought should be endorsed (more) in academic career advancement.
The setting was nice: we were divided in four groups, representing different stages of academic careers (from PhD to full professor) and discussed which open science skills are essential for each career stage. What I liked about the event was that the outcomes of the discussion were communicated to representatives of EOSCpilot and the European Commission. So I’m optimistic that some of the recommendations will, in time, affect European research policies regarding career advancement.
On the other hand, I think we might be skipping a step here. Open science is often talked about as a good thing that we should all strive for (in line with the (in)famous sticker present on many laptops of open science advocates: “Open Science: just science done right”), as though open science were a goal in itself. To me, this doesn’t make a lot of sense. There is no clear definition of open science. It is an umbrella term covering many aspects, e.g. open access, open data, open code, citizen science and many more. So, in practice, people use various definitions of open science that in- or exclude some of the aforementioned aspects, and differ in how these aspects should be prioritised. That means that while many people are in favour of open science, they may disagree greatly on what should be addressed first and how.
I don’t see open science as a goal. I see open science as a means to achieve a goal. I think we should first agree on the goal: specify what we want to change or improve. The way I see it, the goal is to make science more efficient – to achieve more, faster. Starting from this goal, several sub-goals can be defined, such as:
(1) making science more accessible,
(2) making science more transparent & robust,
(3) making science more inclusive.
Open science can be a means to achieve these subgoals. Depending on how you prioritise the subgoals, you might be more interested in (1) open access, (2) open data and code, or (3) citizen science, respectively.
It is not too difficult to come up with a list of open science skills for academics, and it would be awesome if those skills would be endorsed more in academic career advancement. But we first need to define the goals we want to achieve, before we can start to prioritise the means by which these can be achieved. If the endorsement of open science skills can be aligned with the overall goals, then we are well on our way to make science more efficient.
This blog post was originally published by the LSE Impact Blog.
Recommendations on how to better support researchers in good data management and sharing practices are typically focused on developing new tools or improving infrastructure. Yet research shows the most common obstacles are actually cultural, not technological. Marta Teperek and Alastair Dunning outline how appointing data stewards and data champions can be key to improving research data management through positive cultural change.
By now, it’s probably difficult to find a researcher who hasn’t heard of journal requirements for sharing research data supporting publications. Or a researcher who hasn’t heard of funder requirements for data management plans. Or of institutional policies for data management and sharing. That’s a lot of requirements! Especially considering data management is just one set of guidelines researchers need to comply with (on top of doing their own competitive research, of course).
All of these requirements are in place for good reasons. Those who are familiar with the research reproducibility crisis and understand that missing data and code is one of the main reasons for it need no convincing of this. Still, complying with the various data policies is not easy; it requires time and effort from researchers. And not all researchers have the knowledge and skills to professionally manage and share their research data. Some might even wonder what exactly their research data is (or how to find it).
Therefore, it is crucial for institutions to provide their researchers with a helping hand in meeting these policy requirements. This is also important in ensuring policies are actually adhered to and aren’t allowed to become dry documents which demonstrate institutional compliance and goodwill but are of no actual consequence to day-to-day research practice.
The main obstacles to data management and sharing are cultural
But how to best support researchers in good data management and sharing practices? The typical answers to this question are "let's build some new tools" or "let's improve our infrastructure". When thinking about how to provide data management support to researchers at Delft University of Technology (TU Delft), we decided to resist this initial temptation and do some research first.
Several surveys asking researchers about barriers to data sharing indicated that the main obstacles are cultural, not technological. For example, in a recent survey by Houtkoop et al. (2018), psychology researchers were given a list of 15 different barriers to data sharing and asked which ones they agreed with. The top three reasons preventing researchers from sharing their data were:
- “Sharing data is not a common practice in my field.”
- “I prefer to share data upon request.”
- “Preparing data is too time-consuming.”
Interestingly, the only two technological barriers – "My dataset is too big" and "There is no suitable repository to share my data" – were among the three at the very bottom of the list. Similar observations can be made based on survey results from Van den Eynden et al. (2016) (life sciences, social sciences, and humanities disciplines) and Johnson et al. (2016) (all disciplines).
At TU Delft, we already have infrastructure and tools for data management in place. The ICT department provides safe storage solutions for data (with regular backups at different locations), while the library offers dedicated support and templates for data management plans and hosts the 4TU.Centre for Research Data, a certified and trusted archive for research data. In addition, dedicated funds are made available for researchers wishing to deposit their data into the archive. This being the case, we thought researchers might already receive adequate data management support and that no additional resources were required.
To test this, we conducted a survey among the research community at TU Delft. To our surprise, the results indicated that despite all the services and tools already available to support researchers in data management and sharing activities, their practices needed improvement. For example, only around 40% of researchers at TU Delft backed up their data automatically. This was striking, given the fact that all data storage solutions offered by TU Delft ICT are automatically backed up. Responses to open questions provided some explanation for this:
- “People don’t tell us anything, we don’t know the options, we just do it ourselves.”
- “I think data management support, if it exists, is not well-known among the researchers.”
- “I think I miss out on a lot of possibilities within the university that I have not heard of. There is too much sparsely distributed information available and one needs to search for highly specific terminology to find manuals.”
It turns out, again, that the main obstacles preventing people from using existing institutional tools and infrastructure are cultural – data management is not embedded in researchers’ everyday practice.
How to change data management culture?
We believe the best way to help researchers improve data management practices is to invest in people. We have therefore initiated the Data Stewardship project at TU Delft. We appointed dedicated, subject-specific data stewards in each faculty at TU Delft. To ensure the support offered by the data stewards is relevant and specific to the actual problems encountered by researchers, data stewards have (at least) a PhD qualification (or equivalent) in a subject area relevant to the faculty. We also reasoned that it was preferable to hire data stewards with a research background, as this allows them to better relate to researchers and their various pain points as they are likely to have similar experiences from their own research practice.
Vision for data stewardship
There are two main principles of this project. Crucially, the research must stay central. Data stewards are not there to educate researchers on how to do research, but to understand their research processes and workflows and help identify small, incremental improvements in their daily data management practices.
Consequently, data stewards act as consultants, not as police (the objective of the project is to improve cultures, not compliance). The main role of the data stewards is to talk with researchers: to act as the first contact point for any data-related questions researchers might have (be it storage solutions, tools for data management, data archiving options, data management plans, advice on data sharing, budgeting for data management in grant proposals, etc.).
Data stewards should be able to answer around 80% of questions. For the remaining 20%, they ask internal or external experts for advice. But most importantly, researchers no longer need to wonder where to look for answers or who to speak with – they have a dedicated, local contact point for any questions they might have.
Data Champions are leading the way
So has the cultural change happened? This is, and most probably always will be, a work in progress. However, allowing data stewards to get to know their research communities has already had a major positive effect. They were able to identify researchers who are particularly interested in data management and sharing issues. Inspired by the University of Cambridge initiative, we asked these researchers if they would like to become Data Champions – local advocates for good data management and sharing practices. To our surprise, more than 20 researchers have already volunteered as Data Champions, and this number is steadily growing. Having Data Champions teaming up with the data stewards allows for the incorporation of peer-to-peer learning strategies into our data management programme and also offers the possibility to create tailored data management workflows, specific to individual research groups.
Technology or people?
Our case at TU Delft might be quite special, as we were privileged to already have the infrastructure and tools in place which allowed us to focus our resources on investing in the right people. At other institutions circumstances may be different. Nonetheless, it’s always worth keeping in mind that even the best tools and infrastructures, without the right people to support them (and to communicate about them!), may fail to be widely adopted by the research community.
Yasemin Turkyilmaz-van der Velden and Marta Teperek are very privileged to represent TU Delft at the International Data Week 2018 in Gaborone, Botswana. Yasemin has been awarded a very competitive grant for Early Career researchers to attend the conference.
We are working hard to make sure that we get the most out of our attendance. Yasemin is presenting:
- Poster: Data Stewardship at Delft University of Technology
- As of 7 November 2018, already downloaded 35 times
- Presentation with the same title: Data Stewardship at Delft University of Technology
- Presentation: Research Data Management in Engineering disciplines
- Presentation: Libraries for Research Data: Engaging Researchers with Research Data: what works?
- Session proposal and chairing: Motivations and recognition for good data stewardship
The full programme of the International Data Week can be accessed online.
Authors (in alphabetical order): Maria Cruz (VU), Marc Galland (UvA), Carlos Martinez (NL eScience Center), Raúl Ortiz (TU Delft), Esther Plomp (VU), Anita Schürch (UMCU), Yasemin Türkyilmaz-van der Velden (TU Delft)
Based on the contributions from workshop participants (in alphabetical order): Joke Bakker (University of Groningen), Jochem Bijlard (The Hyve), Mattias de Hollander (NIOO-KNAW), Joep de Ligt (UMCU), Albert Gerritsen (Radboud UMC), Thierry Janssens (RIVM), Victor Koppejan (TU Delft), Brett Olivier (Vrije Universiteit Amsterdam), Raúl Ortiz (TU Delft), Esther Plomp (Vrije Universiteit Amsterdam), Jorrit Posthuma (ENPICOM), Anita Schürch (UMCU)
On 2 October 2018, Maria Cruz (VU), Marc Galland (UvA), Carlos Martinez (NL eScience Center), and Yasemin Türkyilmaz-van der Velden (TU Delft) facilitated a workshop titled “Software Reproducibility – The Nuts and Bolts”, as part of the DTL Communities@Work 2018 event held in Utrecht, the Netherlands.
Besides the four organisers, there were 24 workshop participants, including researchers, research software engineers/developers, data stewards and others in research support roles.
Below we summarise the background and rationale for the workshop, key discussions and insights, and recommendations. The description of the workshop setup, including information about the participants gathered via Mentimeter, can be found at the end of this report.
The listed authors include the four organisers and the workshop participants who actively contributed to the report. Workshop participants who agreed to be acknowledged for their contributions are also listed.
Rationale for the workshop
The starting point for the workshop was a paper published in Water Resources Research by Hut, van de Giesen and Drost (2017), which argues that carefully documenting and archiving code and research data may not be enough to guarantee the reproducibility of computational results. Alongside the use of the current best practices in scientific software development, these authors recommend close collaboration between scientists and research software engineers (RSEs) to ensure scientists are aware of the latest computational advances, most notably the use of containers (e.g. Docker) and open interfaces.
As happened in a previous similar workshop held at TU Delft on 24 May 2018, the participants discussed the merits of these recommendations and how they could be put into practice; and also what role the various stakeholders (researchers, research software engineers, research institutions, data stewards and other research support staff) could play in this regard.
In this second edition of the workshop, the participants also made recommendations for actions that could be taken at the national level in the Netherlands to raise awareness of software sustainability and reproducibility and to implement the advice from the paper and the workshop. The key discussion points and insights from these discussions, and the ensuing recommendations, are summarised below, based on information recorded during the workshop in a collaborative Google Document.
In this report we define reproducibility and reusability of software as follows. Reproducibility is focused on being able to reproduce results obtained in the past – that is, using the same data and the same software to reach the same result (a Docker image may be good enough for this). Reusability is concerned with using the software in a different context than it was used before; this could be as simple as running the same software on different data, or it may require modification of the original software (Docker images may or may not be sufficient for software reusability).
Key discussion points and insights on the advice by Hut, van de Giesen and Drost
Sound but too technical advice
Overall, the groups felt that the advice was sound but too technically focussed, particularly if it is aimed at researchers. Researchers should not need to concern themselves with containers and open APIs, which are too technical to implement. The advice also fails to consider and recognise deeper cultural issues, such as: the lack of awareness on the topic of reproducibility and reusability of research software; the lack of relevant training, tools and support; and the diversity of code.
Concerns regarding the use of containers
Docker may not necessarily be easy to use if you are not a software developer or research software engineer. There was also concern that containers, although helpful, should not be used to mask bad coding practices. The use of containers also makes it difficult to upgrade the software: containers make it easier to distribute software in the short term, but to make software sustainable someone needs to understand how to update it and build a new container. This is a role for research software engineers, not researchers, as there are no easy-to-use tools that allow for the reuse of software in different containers. Another issue raised was whether Docker and other platforms would still exist in 20 years’ time.
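To make the container discussion concrete, a recipe for such an image can be as short as the sketch below (the script and file names are hypothetical). The concern raised above still applies: someone must understand this file well enough to update the pinned base image and dependencies when they age.

```dockerfile
# Pin the base image version so a rebuild years later uses the same environment.
FROM python:3.11-slim

# Install the exact dependency versions recorded for the published analysis.
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

# Add the analysis code itself.
COPY analysis.py /app/analysis.py
WORKDIR /app

# Running the container re-runs the computation.
CMD ["python", "analysis.py"]
```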
Not all code is equal
Not all software is meant to be maintained or reused. High software quality, version management, code review, etc. will all help with reproducibility and reusability, but at some point in time the software might not be sustainable anymore. Is this necessarily bad? Code from 10 years ago will probably need to be rewritten in newer languages. Defining the scope of code will help determine the level of reproducibility and reusability requirements. In particular, it is important to differentiate between single-use scripts and pipelines that are used repeatedly and/or by different people. While the former do not need to be highly maintained, the latter need to be extensively reviewed and tested. Commercial software is also an issue. In some fields of research, many scientists use Excel or MATLAB. Commercial software is often closed source, making it difficult to test, review and publish it; and sometimes the publication of the code is also not possible for IP or confidentiality reasons.
Training and raising awareness
How much are researchers aware of the reproducibility crisis? Researchers need to be aware of the key features and concepts behind reproducibility and reusability of research software. These concepts are more important than any particular techniques. The first step should thus be raising awareness of these issues. People who are already aware of the reproducibility crisis and of practices conducive to reproducibility and their practical benefits have a responsibility to raise awareness within their department/group/colleagues. Researchers also need to be aware of the possibilities and best practices in order to apply them. Training is important in this regard. Having the right tools and support is also essential. Researchers need to know who to contact for help and support and how to find the right tools.
There should be code review sessions involving all the interested parties. Code review could be similar to peer review and be done at the institutional or departmental level. Working together on software increases the quality of code, particularly if it is reviewed by multiple stakeholders. Sharing the experience and the knowledge gained from these code review sessions more widely would provide a way to advertise and advocate for the best practices in software development.
Building a community behind a particular tool or piece of software was also seen as a good way to ensure that code is maintained and upgraded. If the software is out there and there is interest in it, people will maintain it. Being part of such a community may not necessarily require specific expertise and technical involvement. A user of a tool can very well contribute to the community by raising issues, without needing to have specific knowledge about the code.
Good practices in scientific software development
Good coding practices should be publicly available and widely advertised. Building software should start with clearly documented use cases, and these use cases should define the entry points for the code. Materials and methods should include parameters for any executable. The environment configuration should also be added alongside the code to make it reproducible. For software to be redeployable on different platforms (also through time), it needs to be well documented, including open data and workflows. You need to be able to understand what the purpose of the experiment was and how it was done, and how the data was processed, if that is relevant. Version control and releases with DOIs are also important. Testing with proper positive and negative controls, integration, and validation are also critical to re-using software.
The roles of data stewards, RSEs and researchers
RSEs as ambassadors for software reproducibility
While researchers should lead when it comes to reproducibility, data stewards could help raise awareness of this important issue and of the best practices for software reproducibility. RSEs – often in support roles, standing between the researchers and their software – have a key role to play as ambassadors and should be part of the driving force behind efforts towards software reproducibility. In particular, they should be creating and maintaining software development guidelines. Research support roles, including those of data stewards and RSEs, should be more clearly defined and rewarded; these roles should not be seen or performed as just a side activity. RSEs should be actively involved in the research design and publication process, and should not be seen solely as supporters of the researcher, but as collaborators. Unfortunately, the current funding schemes do not reward these activities.
Communication and interaction between the three key stakeholders (researchers, RSEs and data stewards) was seen as a shared responsibility. However, setting up cross-expertise speed-networking events could be an easy way to connect researchers, data stewards and RSEs, and to encourage collaboration. This type of initiative could be implemented at the institutional, national and/or even international level. At the institutional level, a central service desk could work as a hub to connect researchers to research support experts. Encouraging collaboration by helping researchers connect with available experts provides a way to avoid redundant solutions to similar problems. For collaborations to be fruitful, however, researchers need to understand the perspective of RSEs and data stewards, and vice versa. Domain-specificity is another barrier that can block the collaboration between data stewards, RSEs and researchers.
How to encourage reproducibility in computational research?
As said earlier, researchers should lead when it comes to reproducibility. However, they may not always be interested in reproducibility, as reproducibility does not always guarantee good science. Researchers need to be intrinsically stimulated to document and review their code and to follow the best practices in software management and development. Publishing a methods or software paper that includes easy-to-reuse, high-quality software will help researchers get more citations. User friendly tools that help with software management and reproducibility will also stimulate use by researchers.
Reproducibility should be enforced from the top down
Journals and funders, in particular NWO, should enforce their policies. There should be funding for reproducibility; there should also be standards and requirements and appropriate audits. Data management plans as well as software sustainability plans are essential to ensure best practices. The funders need to become more aware of software sustainability and the needs for software management. For FAIR data there are funding opportunities, but these are not available for FAIR software. There is a need to make good practices in science the de facto standard. FAIR (both for data and software) should be the rule and no longer the exception. There should also be more recognition for publishing data and code, not only papers.
A leading role for national platforms
National platforms, such as the Netherlands eScience Center, should also be responsible and lead the research community into making software and data sustainability a recognised element of the research process. There is also a need among the research community for more knowledge and awareness about the NL eScience Center and the possibilities for collaborations between researchers and RSEs. In this respect, the Netherlands eScience Center should also take the lead in promoting collaboration between RSEs and researchers.
Community building as a bottom-up approach
Besides a top-down approach, building communities from the bottom up was also recommended as a way to connect researchers with relevant research support experts. The Dutch Techcentre for Life Sciences (DTL), for example, could set up a platform to connect individual researchers with software experts. This could be in the form of national cross-expertise speed-networking events or a forum. The NL-RSE initiative could also play a role in this regard and could help raise awareness of the issues around software reproducibility and sustainability.
It is crucial to educate early career researchers, who have the time and interest. Courses and training are needed at universities and at the national level. Researchers should be made aware of good practices for software development and software engineering at the earliest stages of their careers, including at the bachelor and master level.
The workshop session lasted two hours. It started with the organisers introducing themselves, followed by a short survey of the audience using Mentimeter, led by Yasemin Türkyilmaz-van der Velden. Maria Cruz then gave a presentation setting the scene, providing information on reproducibility and summarising the paper and the suggestions by Hut, van de Giesen & Drost (2017). Marc Galland gave a short presentation on software sustainability from the researcher’s point of view, and Carlos Martinez Ortiz gave his perspective on the same subject from the research software engineer’s point of view.
The audience was then split into four groups, with the organisers each joining a group to help facilitate the discussion. Each group was allotted 45 minutes to answer the following questions within a collaborative Google Document:
- How can the advice by Hut, van de Giesen & Drost be put into practice?
- Any additional advice?
- How can researchers, RSEs, and data stewards work together towards implementing the advice?
- What needs to happen at the national level in the Netherlands to raise awareness of research software reproducibility and help implement the above or any of your ideas and recommendations?
About the participants
We asked a few questions to the audience, using Mentimeter, to get familiar with their background and their experiences with research software. As seen in the responses below, we had a mixed audience of researchers, research software engineers, data stewards, and people in other research support positions. As expected from a DTL conference, which focussed on the life sciences, most participants had a research background within this area, ranging from biomedical sciences to bioprocess engineering and plant breeding. All participants had experience with research software.
Almost all participants agreed that there is a reproducibility crisis in science, reflecting the high level of awareness among the audience of this important issue. Before moving to the presentation about software reproducibility, we asked the participants what came to their mind about this topic. The answers, which ranged from version control, documentation and persistent identifiers to Git, containers, and Docker, clearly show that the audience was already very familiar with the topic of software reproducibility. In line with this, when we asked what they were doing themselves in terms of software reproducibility, we received very similar answers, with version control taking the lead among the answers to both questions.