Written by Shalini Kurapati and Marta Teperek
Training needs: research computing skills for open science
In addition to good data management, software sustainability is important for open science.
According to a survey conducted by the Software Sustainability Institute in 2014, 7 out of 10 researchers rely on code for their research. Sharing research data without the supporting code often makes research impossible to reproduce. Good documentation and version control have been highlighted as major contributors to sustainable software. In addition, earlier workshops and survey results indicated that researchers need training on good coding practices, code management and version control.
Similarly, a TU Delft-wide survey on data management needs revealed that 32% of researchers were interested in training on version control and 18% specifically in software carpentry workshops.
What are The Carpentries?
The Carpentries “teach foundational coding, and data science skills to researchers worldwide.” It is a community-based organisation that maintains and develops curricula for three types of workshops: software carpentry, data carpentry, and library carpentry. Detailed and structured lesson plans are available on GitHub, and the workshops are delivered by a network of carpentry instructors.
An important element of The Carpentries is that in order to deliver a workshop, instructors need to be certified. The certification process puts a particular emphasis on the pedagogical skills of the instructors.
First software carpentry at TU Delft
TU Delft hosted the first software carpentry workshop on 29 November 2018 as a pilot before officially joining The Carpentries. We had around 30 researchers participating (and another 45 on the waiting list!). The participants were from four faculties at TU Delft: Civil Engineering and Geosciences, Applied Sciences, Technology Policy & Management, and Architecture and Built Environment. We had three instructors and four helpers in the room.
The GitHub pages with the lesson materials are publicly available and can be found here: https://mariekedirk.github.io/2018-11-29-Delft/ All participants were asked to bring their laptops along and to install some specific software. No prior programming knowledge was required. Collaborative notes were taken with Etherpad.
During the workshop, participants downloaded a prepared dataset and worked with it throughout the two days. They learnt task automation using the Unix shell, version control using Git, and Python programming using Jupyter notebooks.
The Carpentries have a special way of organising feedback. Participants receive red and green post-it notes and use them to indicate problems / completion of tasks during the whole course. Similarly, after the end of each day, the participants are asked to indicate all the plus sides and negatives of the workshop on green and red post-it notes, respectively.
The feedback from the participants after the workshop helped us evaluate the training. The participants were overwhelmingly appreciative of the instructors and helpers and seem to have enjoyed the training. Some of the participants felt that the pace of the workshop was fast and they did not have time to experiment with the data set. Some others wished to get a more personal approach and to actually get an opportunity to work with their own disciplinary datasets.
Plans for the future
The waiting list for the workshop was very long and we had to disappoint more than 45 researchers who didn’t manage to get their spot on the day. In addition, faculty graduate schools have been willing to give course credits for PhD students who attend this workshop, which made the course even more attractive to attend for PhD students. Therefore, to meet the demand, we are planning to organise four more workshops in 2019: two workshops at TU Delft, one in Eindhoven and one in Twente. We will continue to monitor the number of interested researchers and if the need arises, we might consider scheduling some additional courses.
In addition, to increase our capacity in delivering carpentry training, some of TU Delft’s data stewards and data champions will attend the training to become instructors. We hope to have this instructor training organised in April.
To address the feedback about the pace of the course, we will be more selective and include fewer exercises in our future workshops to ensure that the participants get the chance to experiment and play with their datasets and scripts.
To provide more tailored support to researchers who have started to code but need some help to make it work, or who have attended a carpentry workshop but are not sure how to put what they learnt into practice, we will host dedicated walk-in coding consultation hours starting in January 2019.
So… watch out for the next carpentry workshop – scheduled for Spring 2019!
Authors: Heather Andrews, Maria Cruz, Angus Whyte, Yasemin Turkyilmaz-van der Velden, Shalini Kurapati
To read Part 1 of the blog post follow this link.
Researchers at all levels should be equipped with skills relevant to open science and FAIR data, and the practice of these skills should be effectively rewarded and recognised. This much is clear and has been recently highlighted in the “Turning FAIR into reality” Report of the European Commission FAIR Data Expert Group. Indeed, that report states “there is an urgent need to develop skills in relation to FAIR data” and “metrics and indicators for research contributions need to be reconsidered and enriched to ensure they act as compelling incentives for Open Science and FAIR.”
To change the academic rewards system, it is necessary to define and agree upon the skills researchers need to have at different stages of their careers. This was the goal of the workshop “It’s time for open science skills to count in academic careers” held at TU Delft on 26 September 2018. This post forms the second part of the report of the workshop. Here we present the outcomes of the workshop, based on the interactive group work described below. The results of this work will be applied in EOSCpilot, which is laying the groundwork for skills development in the European Commission’s European Open Science Cloud.
Overview of the hands-on workshop
The participants were divided into four groups. Each group focused on a specific career level according to the European Commission’s framework for research careers:
- R1: First Stage Researcher (up to the point of PhD)
- R2: Recognized Researcher (PhD holders or equivalent who are not yet fully independent)
- R3: Established Researcher (researchers who have developed a level of independence)
- R4: Leading Researcher (researchers leading their research area or field)
The above-mentioned groups were led by Yasemin Turkyilmaz, Ellen Verbakel, Maria Cruz and Alastair Dunning, respectively. The registered participants of the one-day event included researchers from all career levels, librarians, data stewards, and policymakers. However, R4 researchers were unavailable for the hands-on workshop.
Each group received a list of nine pre-defined open science skills together with a detailed explanation for each skill. The group activity for each of the groups was divided into two parts. In the first part, the groups were asked to shortlist a maximum of 4 skills most relevant to their respective career level (R1-R4). Subsequently, for each skill, they wrote down on post-it notes: 1. Why is this skill relevant to researchers at this career level? 2. What would be the evidence that researchers at this career level have this skill and can apply it in practice? (in other words: what does a person applying the skill do?) 3. What support (from support staff, service providers) will researchers need to apply this skill?
After displaying their ideas on whiteboards, the groups were given 5 minutes each to present a summary of their main findings to the other groups. This was followed by a short discussion with questions from all the groups. Below is the summary of the findings of each of the four groups.
R1 group activity summary
The group focused on early career researchers. This group of researchers is dependent on the researchers at higher positions in the academic ladder. At the same time, this group of researchers is often perceived as being the ‘bold’ ones able to take initiatives and amenable to change. It was also recognised that R1 researchers can benefit greatly from more training and more information on open science possibilities. The four skills shortlisted by this group were:
1) Adherence to the FAIR data and code principles during and after research. The group added the term ‘code’ to this skill as well. The group reasoned that the FAIR principles represent what is necessary to have sustainable sharing and archiving of both data and code, which is important for verifiability of research. The group recognised that in order to adhere to the FAIR principles, early career researchers have to receive training as an inherent part of the curriculum. They also stressed that good supervision and getting good examples would facilitate this skill.
2) Securing funding for open science/support. The group reasoned that receiving specific training on awareness of funding opportunities for Open Science (e.g. funding for Open Access publishing) as well as preparation to secure such grants would support early career researchers to be more independent from their respective supervisors and practise open science more freely. They proposed that this could be achieved in collaboration with grant support offices and graduate schools.
3) Awareness and adherence to relevant ethical and legal policies. This skill was found to be important to overcome fear and uncertainties about ethical and legal requirements. This can make early career researchers more confident when discussing with senior colleagues the parameters these requirements set around sharing their research outputs. The group proposed Q&A catalogues which could help early career researchers better understand complex terms such as codes of conduct, legal terms, informed consent etc.
4) Recognizing and acknowledging the contribution of others. The group found it important for early career researchers to know how to properly cite data, code and methods; and how to acknowledge collaborators, technicians in the lab, etc. The group argued that if people are properly acknowledged, then they are more willing to contribute again, which is a great source of motivation for early career researchers. They expect University and Faculty policies and training to be instrumental for this skill. These should also promote the standards for using persistent identifiers, and the CRediT taxonomy for acknowledging who has contributed what to a publication.
R2 group activity summary
This group saw R2 researchers as typically those at the postdoctoral level. The group saw postdocs as researchers who, unlike R3 and R4 researchers, are neither in charge of funding nor leading projects. Postdocs were seen as researchers focused on making effective collaborations, and working on building up their reputation. The four skills proposed were:
1) Recognising and acknowledging the contribution of others. Recognition was perceived as the main driver for postdoctoral researchers, as well as for researchers working with postdocs. The group thought that researchers need a policy framework that enforces proper recognition, as well as training on how to get recognition (e.g. setting up and using ORCID).
2) Making use of open data from others. The group considered this skill as important for verifiability, and as an effective way to start collaborations with other researchers. However, to put this skill into practice, researchers need to know how to search for datasets and assess their quality. Support staff should define the standards for high quality open data, provide support in data curation and give training to researchers on best open data practices.
3) Adherence to good code management practices. This skill was also considered important for verifiability purposes and to stimulate others to reuse the code. It is seen as a quality stamp for the respective code creator, which improves the researcher’s reputation. In order to get this skill, researchers would benefit from training on version control and on writing proper code documentation. It was also suggested that researchers could learn more about these matters from research software engineers.
4) Using or developing research tools open for reuse by others. The group felt that standardisation is crucial to enable effective collaborations. Thus, researchers should receive training on the use of platforms such as GitHub and training on metadata standards.
R3 group activity summary
This group proposed 3 skills, unlike the other groups, which used the maximum of 4 skills set by the workshop organising committee, largely because it was felt that the skill ‘being a role model in practicing open science’ would by definition cover most of the practical skills on the list. The group considered this to be the most important skill for R3 researchers. These researchers are already established in their careers and their fields (already developing and leading some projects), but are still very much involved in the day-to-day practice of research (e.g. they still acquire data or write software). As such, they can lead by example and influence not only earlier career researchers, particularly those they directly supervise, but also more senior colleagues. R3 researchers are very aware of the obstacles which researchers encounter when trying to change how research groups work. The proposed skills were the following:
1) Being a role model in practising open science. Stage 3 researchers were seen as researchers who are still very active in research, but who also have a close relation with senior colleagues, the researchers in higher positions. This gives them the opportunity to establish how researchers are evaluated within their team. Stage 3 researchers have an influential role within their team, e.g. in the hiring and promotion decisions within the team. In order to do this, it was acknowledged that stage 3 researchers need support from the R4 researchers. This was a key point of discussion, because even though stage 3 researchers are seen as big influencers in the academic ladder, they still depend on the stage 4 researchers and funding committees. In relation to this skill, it was also felt that R3 researchers should not only lead by example in the way they practise open science, but should also directly influence others by speaking about it. In short, practise what they preach, and preach what they practise.
2) Securing funding for open science/support. Stage 3 researchers are involved in hiring people and applying for funding. When applying for funding for example, they should be explicit about how open science will be carried out throughout the project. When hiring, they should include open science requirements in the hiring criteria. The group also recognised that for this to happen effectively, funders need to be willing to provide funding to pay for the costs of open science activities associated with projects, and research grant offices need to advise researchers on how to include these costs in their grant applications.
3) Recognizing and acknowledging the contribution of others. The group felt this is an important area where R3 researchers can lead by example. In addition, R3 researchers are usually still building up a network and collaborations, and to do this effectively, recognition is always necessary.
R4 group activity summary
The group considered project leaders as researchers who are usually less involved in the day-to-day practice of research. As project leaders they may be in charge of project management and involved in designing research projects, policies and regulations, vision and strategy. Nevertheless, it was acknowledged that researchers at this senior career stage still needed substantial support from their institution in order to put their vision into practice. Having this in mind, the group shortlisted the following skills:
1) Being a role model in practising open science. As project leaders, stage 4 researchers can influence a broader community (not only the researchers in their project, but also funding committees, other project leaders, executive boards, etc.). They can promote change in daily practices, but also in research policies. Stage 4 researchers could become open science role models by promoting and discussing it with their network, advocating for open science during meetings and conferences, participating in policy development, and changing the practices within their respective groups. In order to do this, they need platforms and tools at their institutions, recognition for advocating for open science, and support from their team members.
2) Recognising and acknowledging the contribution of others. Just like in the other groups, this group found it relevant for researchers to recognise everyone’s contribution in a project, recognising not only the scientific staff but also the support staff (laboratory technicians, data managers, etc.). In order to do this, the careers of the support staff should also be recognised, for example by creating new job profiles for data managers, data stewards, etc.
3) Developing a vision and strategy on how to integrate OS practices in the normal practice of doing research. This skill was found to help create the link between principles and actual practice. Stage 4 researchers usually work on the ‘big pictures’ of research, and thus, they have to have a vision and strategy to steer their research and project members. In doing so, they should have the advice of support staff, to ensure the feasibility of their vision. They should also have clear information about who does/can-do what within their institution, and about financial possibilities for them to turn their vision into reality.
4) Awareness and adherence to relevant ethical and legal policies. This skill was seen as relevant because senior researchers are accountable for their project team’s behaviour. Any risk of ethical and/or legal infringement will jeopardise the reputation not only of the project leader, but also of their entire group and, quite likely, the institution they belong to. Thus, it is important for stage 4 researchers to establish procedures dealing with ethical and legal issues. In order to properly do this, the institution should provide researchers with integrated support. The role of each support staff member should be well-defined (and clearly communicated to the researcher), there should be effective communication within the support staff, and the workflows through which the researchers can receive support need to be clearly stated.
Overall workshop summary
Overall, all groups stressed the importance of peer to peer learning: everyone can contribute to changing cultures and daily practices. All groups also agreed that proper infrastructure and policy support from institutions is required for researchers to truly implement open science practices.
Finally, recognition was seen as one of the main drivers for both scientific and non-scientific staff participating in a research project and all groups stressed the importance of proper recognition of open science practices.
The ideas and discussion generated during the workshop have given us a rich corpus of information to reflect on the workshop objectives and to envision a road map for the future to implement these ideas and discussions. The workshop outputs will be applied in EOSCpilot to help focus its Skills Framework on the key skills identified, the rationale for these, and the mapping of skills to researcher career stages together with the support requirements. Watch this space for progress and updates.
And finally a motto for everyone: change is in your hands! Everyone can contribute to change of practice in their own spheres of influence.
Authors: Shalini Kurapati, Marta Teperek, Maria Cruz, Angus Whyte
Disclaimer: In the spirit of openness and transparency, we would like to share that Shalini Kurapati wrote parts of this blog post based on the zenodo record of the presentations even though she wasn’t present during the event. Her account was verified by the remaining authors who were present.
To read Part 2 of this blog post follow this link.
Open Science is not always easy – skills are urgently needed
Open science is becoming a ubiquitous and recurring theme in the current academic environment. Researchers are increasingly expected to publicly share their research outputs (data, code, models etc.) as well as their publications. This often requires considerable effort from researchers to manage and curate their research outputs to make them shareable. But are these efforts appropriately rewarded? Emphasising the number of publications in high impact factor journals as the only valuable metric for academic promotion and hiring won’t motivate researchers to practise open science.
There is a lot of interest in changing the reward system to better align it with the actions researchers are expected to take towards more open research practices, for instance, the OSPP Rewards WG. Making sure that researchers have the right skills to do that is the other side of that coin. To change the rewards system, we have to understand and identify the skills researchers should be rewarded for, and recognise that these may change at different stages of their academic careers. This was precisely the goal of the workshop on 26 September 2018 that we organised at TU Delft jointly with the EOSCpilot. The EOSCpilot project is laying the groundwork for the European Open Science Cloud, and wants to offer a framework for institutions and others to develop the skills needed for researchers, data stewards, and others who support research to help put open science into practice.
The workshop was aptly titled “It’s time for open science skills to count in academic careers” (#openskills18). The workshop format combined presentations on related topics with interactive group work in the afternoon. In this post, we summarise the presentations, and in a separate blog post we’ll present the outcomes of the workshop and reflect on the key findings and thoughts on future steps.
The aim and format of the workshop were presented by Valentino Cavalli of LIBER and EOSCpilot. In his welcome note, Mr. Cavalli explained barriers to open science in the European and wider academic context. These barriers include a culture of disincentives, fragmentation between infrastructures, interoperability issues and access to computational resources. He highlighted that the workshop would focus on the culture of disincentives, which has to be changed such that researchers at all career levels are equipped with relevant skills and suitably rewarded for putting them into practice.
The opening talks were delivered by Ms. Anne de Vries (PhD students Network Netherlands), Prof. Bartel Van de Walle (TU Delft) and Mr. Rinze Benedictus (UMC Utrecht).
Ms. Anne de Vries shared the perspectives of Eurodoc, the European Council of Doctoral Candidates and Junior Researchers, on open science policy and practices. She stated that it is important to identify and train open science skills for early career researchers based on their disciplines. Early career researchers should also be made aware of how to make their outputs FAIR and how open science skills will not only take science forward but also positively influence their careers. She also reflected that senior staff should support early career researchers in practising open science and thus also need appropriate education and training.
Prof. Bartel van de Walle spoke on open science policies and practical examples from his domain of information management during humanitarian crisis response. He also presented the challenges of implementing open science due to the inertia in research institutions that are often resistant to change. He insisted that open science is not just a requirement of funding agencies but is the right way forward to democratise science and achieve the UN’s sustainable development goals. He also pointed out the waves of change and gave an example of the successful implementation of open science policies in practice: McGill University’s Neurological Institute and Hospital. He concluded his speech by saying that open science is just science done right.
Mr. Rinze Benedictus delivered a powerful message with his talk: institutions should not equate the impact of a research work to the impact factor of a journal. He displayed the reinforcing loop in which authorship in high impact journals is an incentive for researchers to receive more funding and recognition and to continue with the cycle of publishing to increase citation scores. He showed the damning evidence from The Lancet and Nature about the reproducibility crisis in science due to the aforementioned focus on publishing to establish scholarship. While referring to global initiatives such as DORA to change attitudes among institutions and individual researchers, he gave a concrete example of how UMC Utrecht is implementing good practices in rewarding researchers. For instance, at evaluation meetings at UMC, a researcher would be asked “How did you arrive at your research question and what are your next steps?” rather than the traditional “What is your measurable output?”
Dr. Simon Kerridge (CASRAI) gave a talk on the CRediT taxonomy. The problem that CRediT tries to solve is that the current authorship criteria in publications don’t give sufficient recognition for the various contributions of researchers. In addition, authorship credit alone doesn’t support accountability for the research results. He stated that since science is increasingly a team effort, credit needs to be given where due to incentivise researchers for their unique contribution. He explained that the CRediT taxonomy aims to offer a role-based credit system, where the contributors can assign themselves credit for 14 roles: writing, supervision, review, data analysis, project management and so on. Finally, he presented the vision for the future: increasing the awareness of the CRediT taxonomy, creating a feedback mechanism to evaluate future versions, and linking it to platforms like ORCID and Crossref.
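To make the idea of role-based credit more concrete, here is a minimal sketch in Python. It is only an illustration and not an official CRediT tool: the role names follow the published CRediT taxonomy, while the function name `roles_to_contributors` and the contributors "Researcher A", "Researcher B" and "Technician C" are hypothetical.

```python
# The 14 contributor roles defined by the CRediT taxonomy.
CREDIT_ROLES = frozenset({
    "Conceptualization",
    "Data curation",
    "Formal analysis",
    "Funding acquisition",
    "Investigation",
    "Methodology",
    "Project administration",
    "Resources",
    "Software",
    "Supervision",
    "Validation",
    "Visualization",
    "Writing – original draft",
    "Writing – review & editing",
})


def roles_to_contributors(contributions):
    """Turn {contributor: [roles]} into {role: [contributors]}.

    Raises ValueError if a contributor claims a role that is not
    part of the CRediT taxonomy.
    """
    by_role = {}
    for person, roles in contributions.items():
        unknown = set(roles) - CREDIT_ROLES
        if unknown:
            raise ValueError(f"{person}: not CRediT roles: {sorted(unknown)}")
        for role in roles:
            by_role.setdefault(role, []).append(person)
    # Sort roles and names so the statement is stable and easy to read.
    return {role: sorted(people) for role, people in sorted(by_role.items())}


# Hypothetical contributors to a single publication.
example = {
    "Researcher A": ["Conceptualization", "Writing – original draft"],
    "Researcher B": ["Software", "Formal analysis"],
    "Technician C": ["Investigation", "Data curation"],
}

for role, people in roles_to_contributors(example).items():
    print(f"{role}: {', '.join(people)}")
```

A statement like this makes visible who did what, including contributions (such as data curation or software) that a plain author list would hide.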
The closing remarks of the workshop were provided by Ms. Anette Björnsson (European Commission) and by Mr. Kevin Ashley (Digital Curation Centre & EOSCpilot).
Ms. Anette Björnsson reflected on the current initiatives within the European Commission towards changing academic rewards. She highlighted the importance of several recent reports produced by EC Working Groups: Evaluation of Research Careers fully acknowledging Open Science Practices, the report on Next Generation Metrics and Turning FAIR Data into reality. All of them influence the current thinking at the European Commission and also help shape the mission and vision of the European Open Science Cloud. She also stressed that large collaborative efforts at the European level require cooperation and consensus between all EU Member States, which often requires time and patience. The situation is no different when it comes to the implementation of policies and changing practices in open science: individual Member States are at different stages of implementation and have varying levels of infrastructure and personnel currently available to them. However, Ms. Björnsson assured us that while sometimes slower than desired, change is coming. Given that EOSC is a collaborative, pan-European endeavour, the chances are that changes brought with the EOSC will also be more effective and sustainable long-term.
Mr. Kevin Ashley then continued reflecting on the discussions which took place throughout the day, and in particular, the points raised by researchers during the interactive workshop part. He stressed that the common priority for all researchers, regardless of their career stage, seems to be to get the recognition they deserve for Open Science activities. He reflected that there are (numerous) barriers to practical implementation of Open Science and to rewarding those practising Open Science appropriately, but that these barriers should not stop anyone from changing the status quo. As Dr. Maria Cruz beautifully summarised in her tweet, based on Mr. Ashley’s words: It’s possible to change the academic rewards system. It’s possible for PhD students. It’s possible for senior researchers. And it’s possible for institutions.
The format, content and outcomes of the hands-on workshop during the event, together with some reflections and thoughts on next steps are published in a separate blog post.
- All presentations of the speakers can be viewed and downloaded here.
This blog post was written and originally published by Loek Brinkman on his own blog.
On the 26th of September, I participated in the event “Time for open science skills to count in academic careers!”, organised by the European Open Science Cloud Pilot (EOSCPilot) and the 4TU.Centre for Research Data. The goal was to define open science skills that we thought should be endorsed (more) in academic career advancement.
The setting was nice: we were divided into four groups, representing different stages of academic careers (from PhD to full professor) and discussed which open science skills are essential for each career stage. What I liked about the event was that the outcomes of the discussion were communicated to representatives of EOSCpilot and the European Commission. So I’m optimistic that some of the recommendations will, in time, affect European research policies regarding career advancement.
On the other hand, I think we might be skipping a step here. Open science is often talked about as a good thing that we should all strive for (in line with the (in)famous sticker present on many laptops of open science advocates: “Open Science: just science done right”), as though open science is a goal in itself. To me, this doesn’t make a lot of sense. There is no clear definition of open science. It is an umbrella term covering many aspects, e.g. open access, open data, open code, citizen science and many more. So, in practice, people use various definitions of open science that in- or exclude some of the aforementioned aspects of open science, and differ in how these aspects should be prioritised. That means that while many people are in favour of open science, they may disagree greatly on what they think should be addressed first and how.
I don’t see open science as a goal. I see open science as a means to achieve a goal. I think we should first agree on the goal: specify what we want to change or improve. The way I see it, the goal is to make science more efficient – to achieve more, faster. Starting from this goal, several subgoals can be defined, such as:
(1) making science more accessible,
(2) making science more transparent & robust,
(3) making science more inclusive.
Open science can be a means to achieve these subgoals. Depending on how you prioritise the subgoals, you might be more interested in (1) open access, (2) open data and code, or (3) citizen science, respectively.
It is not too difficult to come up with a list of open science skills for academics, and it would be awesome if those skills would be endorsed more in academic career advancement. But we first need to define the goals we want to achieve, before we can start to prioritise the means by which these can be achieved. If the endorsement of open science skills can be aligned with the overall goals, then we are well on our way to make science more efficient.
Authors (in alphabetical order): Maria Cruz (VU), Marc Galland (UvA), Carlos Martinez (NL eScience Center), Raúl Ortiz (TU Delft), Esther Plomp (VU), Anita Schürch (UMCU), Yasemin Türkyilmaz-van der Velden (TU Delft)
Based on the contributions from workshop participants (in alphabetical order): Joke Bakker (University of Groningen), Jochem Bijlard (The Hyve), Mattias de Hollander (NIOO-KNAW), Joep de Ligt (UMCU), Albert Gerritsen (Radboud UMC), Thierry Janssens (RIVM), Victor Koppejan (TU Delft), Brett Olivier (Vrije Universiteit Amsterdam), Raúl Ortiz (TU Delft), Esther Plomp (Vrije Universiteit Amsterdam), Jorrit Posthuma (ENPICOM), Anita Schürch (UMCU)
On 2 October 2018, Maria Cruz (VU), Marc Galland (UvA), Carlos Martinez (NL eScience Center), and Yasemin Türkyilmaz-van der Velden (TU Delft) facilitated a workshop titled “Software Reproducibility – The Nuts and Bolts”, as part of the DTL Communities@Work 2018 event held in Utrecht, the Netherlands.
Besides the four organisers, there were 24 workshop participants, including researchers, research software engineers/developers, data stewards and others in research support roles.
Below we summarise the background and rationale for the workshop, key discussions and insights, and recommendations. The description of the workshop setup, including information about the participants gathered via Mentimeter, can be found at the end of this report.
The listed authors include the four organisers and the workshop participants who actively contributed to the report. Workshop participants who agreed to be acknowledged for their contributions are also listed.
Rationale for the workshop
The starting point for the workshop was a paper published in Water Resources Research by Hut, van de Giesen and Drost (2017), which argues that carefully documenting and archiving code and research data may not be enough to guarantee the reproducibility of computational results. Alongside the use of the current best practices in scientific software development, these authors recommend close collaboration between scientists and research software engineers (RSEs) to ensure scientists are aware of the latest computational advances, most notably the use of containers (e.g. Docker) and open interfaces.
As happened in a previous similar workshop held at TU Delft on 24 May 2018, the participants discussed the merits of these recommendations and how they could be put into practice; and also what role the various stakeholders (researchers, research software engineers, research institutions, data stewards and other research support staff) could play in this regard.
In this second edition of the workshop, the participants also made recommendations for actions that could be taken at the national level in the Netherlands to raise the awareness of software sustainability and reproducibility and to implement the advice from the paper and the workshop. The key discussion points and insights from these discussions and the ensuing recommendations are summarised below, based on information recorded during the workshop within a collaborative Google document.
In this report we define reproducibility and reusability of software as follows. Reproducibility is focused on being able to reproduce results obtained in the past – that is, use the same data and the same software to reach the same result (a Docker image may be good enough for this). Reusability is concerned with using the software in a different context than it was used before; this could be as simple as using the same software with different data or it may require modification of the original software (Docker images may or may not be sufficient for software reusability).
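As an illustration of the reproducibility definition above, here is a minimal sketch of how one might check whether a rerun reached the same result. It assumes that results are written to files and that byte-for-byte identical output is the criterion; both are assumptions made for this example (numerical results often need a tolerance instead), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            sha.update(chunk)
    return sha.hexdigest()


def is_reproduced(original: Path, rerun: Path) -> bool:
    """True if the rerun produced exactly the same bytes as the original."""
    return file_digest(original) == file_digest(rerun)
```

Publishing the digest of the original output alongside the code would let others run the same comparison without needing access to the original result file.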
Key discussion points and insights on the advice by Hut, van de Giesen and Drost
Sound but too technical advice
Overall, the groups felt that the advice was sound but too technically focussed, particularly if it is aimed at researchers. Researchers should not need to concern themselves with containers and open APIs, which are too technical to implement. The advice also fails to consider and recognise deeper cultural issues, such as: the lack of awareness on the topic of reproducibility and reusability of research software; the lack of relevant training, tools and support; and the diversity of code.
Concerns regarding the use of containers
Docker may not necessarily be easy to use if you are not a software developer or research software engineer. There was also the concern that containers, although helpful, should not be used to mask bad coding practices. The use of containers also makes it difficult to upgrade the software. Containers make it easier to distribute software in the short term, but to make software sustainable someone needs to understand how to update and build a new container. This is a role for the research software engineers, not the researchers, as there are no easy-to-use tools that allow for re-use of software in different containers. The other issue that was raised was whether Docker and other platforms would still exist in 20 years’ time.
Not all code is equal
Not all software is meant to be maintained or reused. High software quality, version management, code review, etc. will all help with reproducibility and reusability, but at some point in time the software might not be sustainable anymore. Is this necessarily bad? Code from 10 years ago will probably need to be rewritten in newer languages. Defining the scope of code will help determine the level of reproducibility and reusability requirements. In particular, it is important to differentiate between single-use scripts and pipelines that are used repeatedly and/or by different people. While the former do not need to be highly maintained, the latter need to be extensively reviewed and tested. Commercial software is also an issue. In some fields of research, many scientists use Excel or MATLAB. Commercial software is often closed source, making it difficult to test, review and publish it; and sometimes the publication of the code is also not possible for IP or confidentiality reasons.
Training and raising awareness
How much are researchers aware of the reproducibility crisis? Researchers need to be aware of the key features and concepts behind reproducibility and reusability of research software. These concepts are more important than any particular techniques. The first step should thus be raising awareness of these issues. People who are already aware of the reproducibility crisis and of practices conducive to reproducibility and their practical benefits have a responsibility to raise awareness within their department/group/colleagues. Researchers also need to be aware of the possibilities and best practices in order to apply them. Training is important in this regard. Having the right tools and support is also essential. Researchers need to know who to contact for help and support and how to find the right tools.
There should be code review sessions involving all the interested parties. Code review could be similar to peer review and be done at the institutional or departmental level. Working together on software increases the quality of code, particularly if it is reviewed by multiple stakeholders. Sharing the experience and the knowledge gained from these code review sessions more widely would provide a way to advertise and advocate for the best practices in software development.
Building a community behind a particular tool or piece of software was also seen as a good way to ensure that code is maintained and upgraded. If the software is out there and there is interest in it, people will maintain it. Being a part of such community may not necessarily require specific expertise and technical involvement. A user of a tool can very well contribute to the community by raising issues without needing to have specific knowledge about the code.
Good practices in scientific software development
Good coding practices should be publicly available and widely advertised. Building software should start with clearly documented use cases, and these use cases should define the entry points for the code. Materials and methods should include parameters for any executable. The environment configuration should also be added alongside the code to make it reproducible. For software to be redeployable on different platforms (also through time), it needs to be well documented, including open data and workflows. You need to be able to understand what the purpose of the experiment was and how it was done, and how the data was processed, if that is relevant. Version control and releases with DOIs are also important. Testing with proper positive and negative controls, integration, and validation are also critical to re-using software.
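To make the points about parameters and environment configuration concrete, the following is a minimal sketch, not a practice prescribed at the workshop, of recording the parameters of a run together with the Python version, platform and installed package versions. The function name `describe_run` and the example parameters are hypothetical; only standard-library calls are used.

```python
import json
import platform
import sys
from importlib import metadata


def describe_run(parameters: dict) -> dict:
    """Collect the parameters and software environment of a run."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            packages[name] = dist.version
    return {
        "parameters": parameters,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    record = describe_run({"threshold": 0.05, "seed": 42})
    print(json.dumps(record, indent=2))
```

Saving such a record next to each set of results is one lightweight way to document the environment; it complements, rather than replaces, a proper environment file or container.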
The roles of data stewards, RSEs and researchers
RSEs as ambassadors for software reproducibility
While researchers should lead when it comes to reproducibility, data stewards could help raise the awareness of this important issue and of the best practices for software reproducibility. RSEs – often in support roles, standing between the researchers and their software – have a key role to play as ambassadors and should be part of the driving force behind efforts towards software reproducibility. In particular, they should be creating and maintaining software development guidelines. Research support roles, including those of data stewards and RSEs, should be more clearly defined and rewarded; these roles should not be seen or performed as just a side activity. RSEs should be actively involved in the research design and publication process, and should not be seen solely as supporters of the researcher, but as collaborators. Unfortunately, the current funding schemes do not reward these activities.
Communication and interaction between the three key stakeholders (researchers, RSEs and data stewards) was seen as a shared responsibility. Setting up cross-expertise speed-networking events could be an easy way to connect researchers, data stewards and RSEs, and to encourage collaboration. This type of initiative could be implemented at the institutional, national and/or even international level. At the institutional level, a central service desk could work as a hub to connect researchers to research support experts. Encouraging collaboration by helping researchers connect with available experts provides a way to avoid redundant solutions to similar problems. For collaborations to be fruitful, however, researchers need to understand the perspective of RSEs and data stewards, and vice versa. Domain-specificity is another barrier that can block the collaboration between data stewards, RSEs and researchers.
How to encourage reproducibility in computational research?
As said earlier, researchers should lead when it comes to reproducibility. However, they may not always be interested in reproducibility, as reproducibility does not always guarantee good science. Researchers need to be intrinsically stimulated to document and review their code and to follow the best practices in software management and development. Publishing a methods or software paper that includes easy-to-reuse, high-quality software will help researchers get more citations. User-friendly tools that help with software management and reproducibility will also stimulate use by researchers.
Reproducibility should be enforced from the top down
Journals and funders, in particular NWO, should enforce their policies. There should be funding for reproducibility; there should also be standards and requirements and appropriate audits. Data management plans as well as software sustainability plans are essential to ensure best practices. The funders need to become more aware of software sustainability and the needs for software management. For FAIR data there are funding opportunities, but these are not available for FAIR software. There is a need to make good practices in science the de facto standard. FAIR (both for data and software) should be the rule and no longer the exception. There should also be more recognition for publishing data and code, not only papers.
A leading role for national platforms
National platforms, such as the Netherlands eScience Center, should also be responsible and lead the research community into making software and data sustainability a recognised element of the research process. There is also a need among the research community for more knowledge and awareness about the NL eScience Center and the possibilities for collaborations between researchers and RSEs. In this respect, the Netherlands eScience Center should also take the lead in promoting collaboration between RSEs and researchers.
Community building as a bottom-up approach
Besides a top-down approach, building communities from the bottom up was also recommended as a way to connect researchers with relevant research support experts. The Dutch Techcentre for Life Sciences (DTL), for example, could set up a platform to connect individual researchers with software experts. This could be in the form of national cross-expertise speed-networking events or a forum. The NL-RSE initiative could also play a role in this regard and could help raise awareness of the issues around software reproducibility and sustainability.
It is crucial to educate early career researchers, who have the time and interest. Courses and trainings are needed at the universities and at the national level. Researchers should be made aware of good practices for software development and software engineering at the earliest stages of their careers, including at the bachelor and master level.
The workshop session lasted two hours. It started with the organisers introducing themselves, followed by a short survey of the audience using Mentimeter, led by Yasemin Türkyilmaz-van der Velden. Maria Cruz then gave a presentation setting the scene, providing information on reproducibility and summarising the paper and the suggestions by Hut, van de Giesen & Drost (2017). Marc Galland gave a short presentation on software sustainability from the researcher’s point of view, and Carlos Martinez Ortiz gave his perspective on the same subject from the research software engineer’s point of view.
The audience was then split into four groups, with the organisers each joining a group to help facilitate the discussion. Each group was allotted 45 minutes to answer the following questions within a collaborative Google document:
- How can the advice by Hut, van de Giesen & Drost be put in practice?
- Any additional advice?
- How can researchers, RSEs, and data stewards work together towards implementing the advice?
- What needs to happen at the national level in the Netherlands to raise awareness of research software reproducibility and help implement the above or any of your ideas and recommendations?
About the participants
We asked a few questions to the audience, using Mentimeter, to get familiar with their background and their experiences with research software. As seen in the responses below, we had a mixed audience of researchers, research software engineers, data stewards, and people in other research support positions. As expected from a DTL conference, which focussed on the life sciences, most participants had a research background within this area, ranging from biomedical sciences to bioprocess engineering and plant breeding. All participants had experience with research software.
Almost all participants agreed that there is a reproducibility crisis in science, reflecting the high level of awareness among the audience of this important issue. Before moving to the presentation about software reproducibility, we asked the participants what came to their mind about this topic. The answers, which ranged from version control, documentation and persistent identifiers to Git, containers, and Docker, clearly show that the audience was already very familiar with the topic of software reproducibility. In line with this, when we asked what they were doing themselves in terms of software reproducibility, we received very similar answers, with version control taking the lead among the answers to both questions.
On Thursday 30 August and on Friday 31 August TU Delft Library hosted two events dedicated to the new European General Data Protection Regulation (GDPR) and its implications for research data. Both events were organised by Research Data Netherlands, a collaboration between the 4TU.Center for Research Data, DANS and SURF (represented by the National Research Data Management Coordination Point).
First: do no harm. Protecting personal data is not against data sharing
On the first day, we heard case studies from experts in the field, as well as from various institutional support service providers. Veerle Van den Eynden from the UK Data Service kicked off the day with her presentation, which clearly stated that the need to protect personal data is not against data sharing. She outlined the framework provided by the GDPR which makes sharing possible, and explained that when it comes to data sharing one should always adhere to the principle “do no harm”. However, she reflected that too often, both researchers and research support services (such as ethics committees) prefer to avoid any possible risks rather than to carefully consider them and manage them appropriately. She concluded by providing a compelling case study from the UK Data Service, where researchers were able to successfully share data from research on vulnerable individuals (asylum seekers and refugees).
From a one-stop shop solution to privacy champions
We subsequently heard case studies from four Dutch research institutions: Tilburg University, TU Delft, VU Amsterdam and Erasmus University Rotterdam about their practical approaches to supporting researchers working with personal research data. Jan Jans from Tilburg explained their “one stop shop” form, which, when completed by researchers, sorts out all the requirements related to GDPR, ethics and research data management. Marthe Uitterhoeve from TU Delft said that Delft was developing a similar approach, but based on data management plans. Marlon Domingus from Erasmus University Rotterdam explained their process based on defining different categories of research and determining the types of data processing associated with them, rather than trying to list every single research project at the institution. Finally, Jolien Scholten from VU Amsterdam presented their idea of appointing privacy champions who receive dedicated training on data protection and who act as the first contact points for questions related to GDPR within their communities.
There were lots of inspiring ideas, and there was a consensus in the room that it would be worth re-convening in a year’s time to evaluate the different approaches and to share lessons learned.
How to share research data in practice?
Next, we discussed three different models for helping researchers share their research data. Emilie Kraaikamp from DANS presented their strategy for providing two different access levels to data: open access data and restricted access data. Open datasets consist mostly of research data which are fully anonymised. Restricted access data need to be requested (via an email to the depositor) before the access can be granted (the depositor decides whether access to data can be granted or not).
Veerle Van den Eynden from the UK Data Service discussed their approach based on three different access levels: open data, safeguarded data (equivalent to “restricted access data” in DANS) and controlled data. Controlled datasets are very sensitive and researchers who wish to get access to such datasets need to undergo a strict vetting procedure. They need to complete training, their application needs to be supported by a research institution, and typically researchers access such datasets in safe locations, on safe servers and are not allowed to copy the data. Veerle explained that only a relatively small number of sensitive datasets (usually from governmental agencies) are shared under controlled access conditions.
The last case study was from Zosia Beckles from the University of Bristol, who explained that at Bristol, a dedicated Data Access Committee has been created to handle requests for controlled access datasets. Researchers responsible for the datasets are asked for advice on how to respond to requests, but it is the Data Access Committee which ultimately decides whether access should be granted or not, and, if necessary, can overrule the researcher’s advice. The procedure relieves researchers of the burden of dealing with data access requests.
DataTags – decisions about sharing made easy(ier)
Ilona von Stein from DANS continued the discussion about data sharing and the means by which sharing could be facilitated. She described an online tool developed by DANS (based on a concept initially developed by colleagues from Harvard University, but adapted to European GDPR needs). Researchers answer simple questions about their datasets and the tool returns a tag, which defines whether the data is suitable for sharing and what the most suitable sharing options are. The prototype of the tool is now available for testing and DANS plans to develop it further to see if it could also be used to assist researchers working with data across the whole research lifecycle (not only at the final, data sharing stage).
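The idea of turning a few answers into a sharing tag can be sketched as a small decision function. This is purely illustrative and is not the logic of the DANS tool: the three questions, the rules and the class and function names are all assumptions made for this example, and the tag names simply reuse the access levels mentioned in the presentations above (open, restricted, controlled).

```python
from dataclasses import dataclass


@dataclass
class DatasetAnswers:
    """Hypothetical questionnaire answers about a dataset."""
    contains_personal_data: bool
    fully_anonymised: bool
    highly_sensitive: bool


def suggest_tag(answers: DatasetAnswers) -> str:
    """Map answers to a suggested access level (illustrative rules only)."""
    if not answers.contains_personal_data or answers.fully_anonymised:
        return "open"
    if answers.highly_sensitive:
        return "controlled"
    return "restricted"
```

For example, `suggest_tag(DatasetAnswers(True, False, False))` returns `"restricted"` under these assumed rules. A real tool would need many more questions and legally reviewed rules.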
What are the most impactful & effortless tactics to provide controlled access to research data?
The final interactive part of the workshop was led by Alastair Dunning, the Head of 4TU.Center for Research Data. Alastair used Mentimeter to ask attendees to judge the impact and effort of fourteen different tactics and solutions which can be used at research institutions to provide controlled access to research data. More than forty people engaged with the online survey and this allowed Alastair to shortlist five tactics which were deemed the most impactful/effort-efficient:
- Create a list of trusted archives where researchers can deposit personal data
- Publish an informed consent template for your researchers
- Publish on university website a list of FAQs concerning personal data
- Provide access to a trusted Data Anonymisation Service
- Create categories to define different types of personal data at your institution
Alastair concluded that these should probably be the priorities to work on for research institutions which don’t yet have the above in place.
How to put all the learning into practice?
The second event was dedicated to putting all the learning and concepts developed during the first day into practice. Researchers working with personal data, as well as those directly supporting researchers, brought their laptops and followed practical exercises led by Veerle Van den Eynden and Cristina Magder from the UK Data Service. We started by looking at a GDPR-compliant consent form template. Subsequently, we practised data encryption using VeraCrypt. We then moved to data anonymisation strategies. First, Veerle explained possible tactics (again, with nicely illustrated examples) for de-identification and pseudonymisation of qualitative data. This was then followed by a comprehensive hands-on training delivered by Cristina Magder on disclosure review and de-identification of quantitative data using sdcMicro.
Altogether, the practical exercises allowed one to clearly understand how to effectively work with personal research data from the very start of the project (consent, encryption) all the way to data de-identification to enable sharing and data re-use (whilst protecting personal data at all stages).
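To give a flavour of what pseudonymisation can mean in code, here is a minimal sketch that replaces a direct identifier with a keyed-hash pseudonym. It is not the method taught in the session (the hands-on part used sdcMicro, an R package); the records, the key and the function name are invented for illustration.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a stable keyed-hash pseudonym."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return "P-" + digest.hexdigest()[:12]


# Invented example records; the key must be stored apart from the data.
records = [
    {"name": "Alice Example", "age": 34},
    {"name": "Bob Example", "age": 51},
    {"name": "Alice Example", "age": 34},
]
key = b"replace-with-a-secret-kept-apart-from-the-data"
pseudonymised = [
    {"participant": pseudonymise(r["name"], key), "age": r["age"]} for r in records
]
```

The same person always receives the same pseudonym, so records can still be linked. Note that pseudonymised data generally still counts as personal data under the GDPR, and that remaining variables such as age can act as indirect identifiers, which is where disclosure review comes in.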
Conclusion: GDPR as an opportunity
I think that the key conclusion of both days was that the GDPR, while challenging to implement, provides an excellent opportunity both to researchers and to research institutions to review and improve their research practices. The key to this is collaboration: across the various stakeholders within the institution (to make workflows more coherent and improve collaboration), but also between different institutions. An important aspect of these two events was that representatives from multiple institutions (and countries!) were present to talk about their individual approaches and considerations. Practice exchange and lessons learned can be invaluable to allow institutions to avoid similar mistakes and to decide which approaches might work best in particular settings.
We will definitely consider organising a similar meeting in a year’s time to see where everyone is and which workflows and solutions tend to work best.
Presentations from both events are available on Zenodo:
Authors (in alphabetical order): Maria Cruz, Shalini Kurapati, Yasemin Türkyilmaz-van der Velden
With contributions from workshop participants (in alphabetical order): Patrick Aerts (Netherlands eScience Center + DANS), Kees den Heijer (TU Delft), Jelle de Plaa (SRON), Jordi Domingo (KNMI), Martin Donnelly (University of Edinburgh), Raman Ganguly (University of Vienna), Rolf Hut (TU Delft), Karsten Kryger Hansen (Aalborg University), Carlos Martinez (Netherlands eScience Center), Joakim Philipson (Stockholm University), Wessel Sloof (University Medical Center Groningen), Martijn Staats (Wageningen University & Research), Michael Svendsen (Royal Danish Library), Jan van der Ploeg (University Medical Center Groningen), Ronald van Haren (Netherlands eScience Center), Egbert Westerhof (DIFFER).
How to cite: A citable version of this report has been available since July 06, 2018 through the Open Science Framework. DOI: 10.31219/osf.io/z48cm.
On 24 May 2018, Maria Cruz, Shalini Kurapati, and Yasemin Türkyilmaz-van der Velden led a workshop titled “Software Reproducibility: How to put it into practice?”, as part of the event Towards cultural change in data management – data stewardship in practice held at TU Delft, the Netherlands. There were 17 workshop participants, including researchers, data stewards, and research software engineers. Here we describe the rationale of the workshop, what happened on the day, key discussions and insights, and suggested next steps.
Rationale for the workshop
There is no denying that there is a reproducibility crisis in science. In some fields, over half of published studies fail reproducibility tests. A survey of 1576 scientists conducted by Nature in 2016 revealed that over 90% of the respondents agreed that there was some level of crisis and over 70% said they had tried and failed to reproduce another group’s experiments. Given the ubiquity of software in many areas of contemporary scientific research, it could be argued that there can’t be reproducibility in science without reproducible software.
In a recent Comment in Water Resources Research, in response to “Most computational hydrology is not reproducible, so is it really science?”, Hut, Van de Giesen and Drost (2017) argue that documenting and archiving code and data is not enough to guarantee the reproducibility of computational results. They suggest the use of software containers and open interfaces, and that researchers work more closely with research software engineers (RSEs) to learn best practices in software design. This advice is presented in the context of hydrology, but it could be applied more generally.
Inspired by the article and its advice, the workshop aimed to explore the various topics of software reproducibility: how some of the advice could be put into practice, and what role institutions, data stewards, and research software engineers could play in this regard.
What happened on the day
The workshop session lasted one hour. It started with the moderators introducing themselves, followed by a short survey of the audience using Mentimeter, led by Yasemin Türkyilmaz-van der Velden. Maria Cruz then gave a presentation setting the scene, providing information on reproducibility, and summarising the paper and the suggestions by Hut, Van de Giesen and Drost (2017). One of the authors of the paper, Rolf Hut, attended the session and also said a few words about his paper and his ideas. Shalini Kurapati then moderated the main activity described below.
Using Mentimeter, we asked a few questions to the audience to get familiar with their background and their experiences with research software. As seen in the responses below, there was an almost perfectly balanced audience formed by researchers, research software engineers, data stewards, and people in other research support positions.
There was also a very good balance in terms of the participants’ research backgrounds, which ranged from various disciplines in the physical sciences and medical research to intellectual history and information science. Almost all participants had experience with research software.
The majority (65%) of the participants agreed that there is a reproducibility crisis in science. The reproducibility crisis was a hot topic during the main event (Towards cultural change in data management – data stewardship in practice) and had been already discussed comprehensively earlier in the programme, by the keynote speaker Danny Kingsley. Therefore, a potential bias in the responses of the participants cannot be excluded. Regardless, it was interesting to see that there is an increasing awareness of this important issue.
Before moving to the presentation about software reproducibility, we asked the participants what came to their mind about this topic. The answers, which ranged from sustainability, preservation, and integrity to GitHub, Zenodo, containers, and Docker, clearly show that the audience was already very familiar with the topic of software reproducibility.
Prior to the workshop, we hoped to have a group of participants with diverse backgrounds and interests. Fortunately, that turned out to be the case, and we could form groups with the ideal representation from all stakeholders of interest. We divided the participants into 4 groups, each containing at least one data steward or research support staff member, one research software engineer, and one researcher. The groups were invited to answer the following questions within a collaborative Google document:
- What do you think about the advice of Hut, Van de Giesen and Drost, i.e., use containers (e.g. Docker), use open interfaces, and closely collaborate with Research Software Engineers to improve software reproducibility?
- Any additional advice to Hut et al., to improve software reproducibility?
- How can researchers, RSEs and data stewards work together towards implementing the above advice?
The groups were allotted 20 minutes to discuss answers to the questions and record them in the Google document. The workshop moderators were able to actively monitor the Google document to steer the groups towards timely conclusion of their activity. After the activity was concluded, a representative from each group pitched their activity summary and their key findings for a minute. The contents of the Google document and the pitches, which were recorded live in the workshop slides, provide insights into the challenges and corresponding solutions for software sustainability and reproducibility, which are reported in the next section.
Key discussion points and insights on the advice by Hut, van de Giesen & Drost
Lack of funding for Research Software Engineers
Lack of (sustainable) funding for hiring RSEs is one of the obstacles to putting the advice of Hut, Van de Giesen and Drost into practice. Larger projects typically already have RSEs on board, but for smaller projects this is not always possible. It is difficult to recruit and hire RSEs across disciplines. However, the Netherlands eScience Center is a good example of a way to centrally fund research software development and to pool developer expertise across disciplines.
Open source software is not always an option
Because of scientific competition and commercial and IP interests, it is not always an option to make research software available as open source software. Docker containers are also not an option for commercial software.
High-level documentation is very important. A good README file does part of the job, but documentation and a user manual are also important. Any information (e.g. equations, model) behind the software also needs to be shared.
Lack of support for software validation is also a problem. As an addition to the advice by Hut, van de Giesen and Drost, one of the groups suggested that support should also be provided for software validation (in-house code review). In cases where professional software support is limited, it would already be helpful if researchers reviewed each other’s code, just as they do with papers. If the goal is to make code understandable to other researchers, then their feedback will be paramount. Organizing code reviews in a research group could improve the quality of the code significantly with only a small time investment.
The role of data stewards, RSEs and researchers
Data stewards – the link between researchers and RSEs?
Two groups saw the role of data stewards as brokers between researchers and RSEs. It was acknowledged that researchers and RSEs should interact more to improve research code (e.g. through code review). Data stewards could be the link between the two. Data stewards could monitor possible synergy between projects and link researchers with specialist RSE expertise. One group felt that data stewards should provide the toolbox, with principles (e.g. FAIR principles) and guidance, and RSEs should help implement those principles, because they have the knowledge to do so.
Could RSEs do more to promote best practices?
Two groups thought that RSEs could take a more proactive role in providing training for researchers, promoting best practices, and generally propagating their knowledge. Without assigning roles, one of the groups felt that implementing the advice of Hut, Van de Giesen and Drost required programming courses, support staff to help out researchers at departmental level, and the breakdown of problems into smaller problems that could be solved with up-to-date techniques based on expert knowledge. Could RSEs also help with this?
Opportunities and barriers, and the role of institutions
Integrated teams working across university faculties, departments, and institutes, with a single point of contact, could provide a way for researchers, data stewards, and RSEs to work together. Fear of stepping into others’ “working areas” and different working cultures may create barriers, as may a potential lack of scientific/research expertise among RSEs and software developers.
Sustainable funding is a challenge, as is the lack of recognition for developing research software in the current academic rewards system. There also needs to be a persuasive driver beyond just doing the right thing. This can come from funders, publishers and possibly institutions. Any driver will be most persuasive when it comes from the research community itself.
Universities and institutes should promote good practice for software engineering as part of open science.
The short-term goal of this workshop was to start a conversation on the topic of software reproducibility between researchers, research support staff (data stewards and others with a similar role), and research software engineers, and to make the results of this discussion public via this forum.
The immediate next step is to bring the results of this interaction to the attention of the community Working towards Sustainable Software for Science: Practice and Experiences (WSSSPE), which includes researchers and research software engineers, but lacks a strong connection with data stewards and research data support. We plan to submit a paper for the 9th International Workshop on Sustainable Software for Science: Practice and Experiences, to be held in Amsterdam on 29 October 2018.
The time available for the workshop was limited, and not all the issues were discussed, or discussed in enough depth. For example, it would be interesting to discuss in more detail what training and resources researchers and research support staff need to help software reproducibility become more of a reality, and what role data stewards and research software engineers could play in this regard.
Institutions could certainly do more in terms of funding and rewards for software development, and in promoting best practices. How can this be made to happen in a global and concerted manner?
In the long term, we will continue to engage with the necessary stakeholders to keep the discussion alive and to define operational solutions towards improving software reproducibility and sustainability.
- Workshop slides
- Water Resources Research paper by Hut, van de Giesen & Drost, in response to “Most computational hydrology is not reproducible, so is it really science?”
- Participants’ contributions through the collaborative Google document