Written by Shalini Kurapati and Marta Teperek
Training needs: research computing skills for open science
In addition to good data management, software sustainability is important for open science.
According to a survey conducted by the Software Sustainability Institute in 2014, 7 out of 10 researchers rely on code for their research. Sharing research data without the supporting code often makes research impossible to reproduce. Good documentation and version control have been highlighted as major contributors to sustainable software. In addition, earlier workshops and survey results indicated that researchers need training on good coding and code management practices, and on version control.
Similarly, a TU Delft-wide survey on data management needs revealed that 32% of researchers were interested in training on version control and 18% specifically in software carpentry workshops.
What are The Carpentries?
The Carpentries “teach foundational coding, and data science skills to researchers worldwide.” It is a community-based organisation that maintains and develops curricula for three different types of workshops: software carpentry, data carpentry, and library carpentry. Detailed and structured lesson plans are available on GitHub, and the lessons are delivered by a network of carpentry instructors.
An important element of The Carpentries is that in order to deliver a workshop, instructors need to be certified. The certification process puts a particular emphasis on the pedagogical skills of the instructors.
First software carpentry at TU Delft
TU Delft hosted the first software carpentry workshop on 29 November 2018 as a pilot before officially joining The Carpentries. We had around 30 researchers participating (and another 45 on the waiting list!). The participants were from four faculties at TU Delft: Civil Engineering and Geosciences, Applied Sciences, Technology Policy & Management, and Architecture and Built Environment. We had three instructors and four helpers in the room.
The GitHub pages with the lesson materials are publicly available and can be found here: https://mariekedirk.github.io/2018-11-29-Delft/ All participants were asked to bring their laptops along and to install some specific software. No prior programming knowledge was required. Collaborative notes were taken with Etherpad.
During the workshop, participants downloaded a prepared dataset and worked with it throughout the two days. They learnt task automation using the Unix shell, version control using Git, and Python programming using Jupyter notebooks.
The Carpentries have a special way of organising feedback. Participants receive red and green post-it notes and use them to indicate problems / completion of tasks during the whole course. Similarly, after the end of each day, the participants are asked to indicate all the plus sides and negatives of the workshop on green and red post-it notes, respectively.
The feedback from the participants after the workshop helped us evaluate the training. The participants were overwhelmingly appreciative of the instructors and helpers and seem to have enjoyed the training. Some of the participants felt that the pace of the workshop was fast and they did not have time to experiment with the data set. Some others wished to get a more personal approach and to actually get an opportunity to work with their own disciplinary datasets.
Plans for the future
The waiting list for the workshop was very long and we had to disappoint more than 45 researchers who didn’t manage to get their spot on the day. In addition, faculty graduate schools have been willing to give course credits for PhD students who attend this workshop, which made the course even more attractive to attend for PhD students. Therefore, to meet the demand, we are planning to organise four more workshops in 2019: two workshops at TU Delft, one in Eindhoven and one in Twente. We will continue to monitor the number of interested researchers and if the need arises, we might consider scheduling some additional courses.
In addition, to increase our capacity in delivering carpentry training, some of TU Delft’s data stewards and data champions will attend the training to become instructors. We hope to have this instructor training organised in April.
To address the feedback about the pace of the course, we will be more selective and include fewer exercises in our future workshops to ensure that the participants get the chance to experiment and play with their datasets and scripts.
To provide more tailored support to researchers who have started to code but need some additional help to make it work, or who have attended a carpentry workshop but are not sure how to put what they learnt into practice, we will host dedicated walk-in coding consultation hours starting in January 2019.
So… watch out for the next carpentry workshop – scheduled for Spring 2019!
Authors (in alphabetical order): Maria Cruz (VU), Marc Galland (UvA), Carlos Martinez (NL eScience Center), Raúl Ortiz (TU Delft), Esther Plomp (VU), Anita Schürch (UMCU), Yasemin Türkyilmaz-van der Velden (TU Delft)
Based on the contributions from workshop participants (in alphabetical order): Joke Bakker (University of Groningen), Jochem Bijlard (The Hyve), Mattias de Hollander (NIOO-KNAW), Joep de Ligt (UMCU), Albert Gerritsen (Radboud UMC), Thierry Janssens (RIVM), Victor Koppejan (TU Delft), Brett Olivier (Vrije Universiteit Amsterdam), Raúl Ortiz (TU Delft), Esther Plomp (Vrije Universiteit Amsterdam), Jorrit Posthuma (ENPICOM), Anita Schürch (UMCU)
On 2 October 2018, Maria Cruz (VU), Marc Galland (UvA), Carlos Martinez (NL eScience Center), and Yasemin Türkyilmaz-van der Velden (TU Delft) facilitated a workshop titled “Software Reproducibility – The Nuts and Bolts”, as part of the DTL Communities@Work 2018 event held in Utrecht, the Netherlands.
Besides the four organisers, there were 24 workshop participants, including researchers, research software engineers/developers, data stewards and others in research support roles.
Below we summarise the background and rationale for the workshop, key discussions and insights, and recommendations. The description of the workshop setup, including information about the participants gathered via Mentimeter, can be found at the end of this report.
The listed authors include the four organisers and the workshop participants who actively contributed to the report. Workshop participants who agreed to be acknowledged for their contributions are also listed.
Rationale for the workshop
The starting point for the workshop was a paper published in Water Resources Research by Hut, van de Giesen and Drost (2017), which argues that carefully documenting and archiving code and research data may not be enough to guarantee the reproducibility of computational results. Alongside the use of the current best practices in scientific software development, these authors recommend close collaboration between scientists and research software engineers (RSEs) to ensure scientists are aware of the latest computational advances, most notably the use of containers (e.g. Docker) and open interfaces.
As happened in a previous similar workshop held at TU Delft on 24 May 2018, the participants discussed the merits of these recommendations and how they could be put into practice; and also what role the various stakeholders (researchers, research software engineers, research institutions, data stewards and other research support staff) could play in this regard.
In this second edition of the workshop, the participants also made recommendations for actions that could be taken at the national level in the Netherlands to raise the awareness of software sustainability and reproducibility and to implement the advice from the paper and the workshop. The key discussion points and insights from these discussions and the ensuing recommendations are summarised below, based on information recorded during the workshop within a collaborative google document.
In this report we define reproducibility and reusability of software as follows. Reproducibility is focused on being able to reproduce results obtained in the past – that is, use the same data and the same software to reach the same result (a Docker image may be good enough for this). Reusability is concerned with using the software in a different context than it was used before; this could be as simple as using the same software with different data or it may require modification of the original software (Docker images may or may not be sufficient for software reusability).
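The distinction between the two terms can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: `run_analysis` is a hypothetical stand-in for a research pipeline, and comparing SHA-256 digests of the serialised output is one possible way to check that a rerun matches an archived result. Neither was discussed at the workshop.

```python
import hashlib
import json


def run_analysis(data, scale=1.0):
    """Hypothetical stand-in for a research pipeline: scaled mean of the data."""
    return {"n": len(data), "scaled_mean": scale * sum(data) / len(data)}


def fingerprint(result):
    """Return a SHA-256 digest of a result, serialised with sorted keys."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


original_data = [1.0, 2.0, 3.0, 4.0]

# Reproducibility: same data and same software should give the same result,
# so the fingerprint of a rerun should match the archived fingerprint.
archived = fingerprint(run_analysis(original_data))
rerun = fingerprint(run_analysis(original_data))
is_reproducible = archived == rerun

# Reusability: the same software applied in a different context,
# here simply different data and a different parameter value.
new_data = [10.0, 20.0, 30.0]
reused = run_analysis(new_data, scale=0.5)
```

In practice the archived fingerprint would be stored next to the published result, so that a later rerun, possibly inside a container, can be compared against it.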
Key discussion points and insights on the advice by Hut, van de Giesen and Drost
Sound but too technical advice
Overall, the groups felt that the advice was sound but too technically focussed, particularly if it is aimed at researchers. Researchers should not need to concern themselves with containers and open APIs, which are too technical to implement. The advice also fails to consider and recognise deeper cultural issues, such as: the lack of awareness on the topic of reproducibility and reusability of research software; the lack of relevant training, tools and support; and the diversity of code.
Concerns regarding the use of containers
Docker may not necessarily be easy to use if you are not a software developer or research software engineer. There was also the concern that containers, although helpful, should not be used to mask bad coding practices. The use of containers also makes it difficult to upgrade the software. Containers make it easier to distribute software in the short term, but to make software sustainable someone needs to understand how to update and build a new container. This is a role for the research software engineers, not the researchers, as there are no easy-to-use tools that allow software to be re-used in different containers. The other issue that was raised was whether Docker and other platforms would still exist in 20 years’ time.
Not all code is equal
Not all software is meant to be maintained or reused. High software quality, version management, code review, etc. will all help with reproducibility and reusability, but at some point in time the software might not be sustainable anymore. Is this necessarily bad? Code from 10 years ago will probably need to be rewritten in newer languages. Defining the scope of code will help determine the level of reproducibility and reusability requirements. In particular, it is important to differentiate between single-use scripts and pipelines that are used repeatedly and/or by different people. While the former do not need to be highly maintained, the latter need to be extensively reviewed and tested. Commercial software is also an issue. In some fields of research, many scientists use Excel or MATLAB. Commercial software is often closed source, making it difficult to test, review and publish it; and sometimes the publication of the code is also not possible for IP or confidentiality reasons.
Training and raising awareness
How much are researchers aware of the reproducibility crisis? Researchers need to be aware of the key features and concepts behind reproducibility and reusability of research software. These concepts are more important than any particular techniques. The first step should thus be raising awareness of these issues. People who are already aware of the reproducibility crisis and of practices conducive to reproducibility and their practical benefits have a responsibility to raise awareness within their department/group/colleagues. Researchers also need to be aware of the possibilities and best practices in order to apply them. Training is important in this regard. Having the right tools and support is also essential. Researchers need to know who to contact for help and support and how to find the right tools.
There should be code review sessions involving all the interested parties. Code review could be similar to peer review and be done at the institutional or departmental level. Working together on software increases the quality of code, particularly if it is reviewed by multiple stakeholders. Sharing the experience and the knowledge gained from these code review sessions more widely would provide a way to advertise and advocate for the best practices in software development.
Building a community behind a particular tool or piece of software was also seen as a good way to ensure that code is maintained and upgraded. If the software is out there and there is interest in it, people will maintain it. Being a part of such a community may not necessarily require specific expertise and technical involvement. A user of a tool can very well contribute to the community by raising issues without needing to have specific knowledge about the code.
Good practices in scientific software development
Good coding practices should be publicly available and widely advertised. Building software should start with clearly documented use cases, and these use cases should define the entry points for the code. Materials and methods should include parameters for any executable. The environment configuration should also be added alongside the code to make it reproducible. For software to be redeployable on different platforms (also through time), it needs to be well documented, including open data and workflows. You need to be able to understand what the purpose of the experiment was and how it was done, and how the data was processed, if that is relevant. Version control and releases with DOIs are also important. Testing with proper positive and negative controls, integration, and validation are also critical to re-using software.
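As a minimal sketch of what recording parameters and the environment configuration alongside the code might look like, the Python example below gathers both into a single record. The function name `capture_run_record`, the example parameters, and the choice of JSON are assumptions made for illustration; they are not taken from the workshop.

```python
import json
import platform
import sys
from importlib import metadata


def capture_run_record(parameters, packages=()):
    """Collect the parameters and environment details of one run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "parameters": dict(parameters),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
    }


# Hypothetical parameters of an analysis run; "pip" is only an example package.
record = capture_run_record({"threshold": 0.05, "seed": 42}, packages=["pip"])

# Serialise the record so it can be archived next to the code and the results.
record_json = json.dumps(record, indent=2, sort_keys=True)
```

A record like this could be written to a file in the same repository or archive as the code, so that the parameters reported in the materials and methods can be traced back to an actual run.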
The roles of data stewards, RSEs and researchers
RSEs as ambassadors for software reproducibility
While researchers should lead when it comes to reproducibility, data stewards could help raise the awareness of this important issue and of the best practices for software reproducibility. RSEs – often in support roles, standing between the researchers and their software – have a key role to play as ambassadors and should be part of the driving force behind efforts towards software reproducibility. In particular, they should be creating and maintaining software development guidelines. Research support roles, including those of data stewards and RSEs, should be more clearly defined and rewarded; these roles should not be seen or performed as just a side activity. RSEs should be actively involved in the research design and publication process, and should not be seen solely as supporters of the researcher, but as collaborators. Unfortunately, the current funding schemes do not reward these activities.
Communication and interaction between the three key stakeholders (researchers, RSEs and data stewards) was seen as a shared responsibility. However, setting up cross-expertise speed-networking events could be an easy way to connect researchers, data stewards and RSEs, and to encourage collaboration. This type of initiative could be implemented at the institutional, national and/or even international level. At the institutional level, a central service desk could work as a hub to connect researchers to research support experts. Encouraging collaboration by helping researchers connect with available experts provides a way to avoid redundant solutions to similar problems. For collaborations to be fruitful, however, researchers need to understand the perspective of RSEs and data stewards, and vice versa. Domain-specificity is another barrier that can block the collaboration between data stewards, RSEs and researchers.
How to encourage reproducibility in computational research?
As said earlier, researchers should lead when it comes to reproducibility. However, they may not always be interested in reproducibility, as reproducibility does not always guarantee good science. Researchers need to be intrinsically stimulated to document and review their code and to follow the best practices in software management and development. Publishing a methods or software paper that includes easy-to-reuse, high-quality software will help researchers get more citations. User friendly tools that help with software management and reproducibility will also stimulate use by researchers.
Reproducibility should be enforced from the top down
Journals and funders, in particular NWO, should enforce their policies. There should be funding for reproducibility; there should also be standards and requirements and appropriate audits. Data management plans as well as software sustainability plans are essential to ensure best practices. The funders need to become more aware of software sustainability and the needs for software management. For FAIR data there are funding opportunities, but these are not available for FAIR software. There is a need to make good practices in science the de facto standard. FAIR (both for data and software) should be the rule and no longer the exception. There should also be more recognition about publishing data and code, not only papers.
A leading role for national platforms
National platforms, such as the Netherlands eScience Center, should also be responsible and lead the research community into making software and data sustainability a recognised element of the research process. There is also a need among the research community for more knowledge and awareness about the NL eScience Center and the possibilities for collaborations between researchers and RSEs. In this respect, the Netherlands eScience Center should also take the lead in promoting collaboration between RSEs and researchers.
Community building as a bottom-up approach
Besides a top-down approach, building communities from the bottom up was also recommended as a way to connect researchers with relevant research support experts. The Dutch Techcentre for Life Sciences (DTL), for example, could set up a platform to connect individual researchers with software experts. This could be in the form of national cross-expertise speed-networking events or a forum. The NL-RSE initiative could also play a role in this regard and could help raise awareness of the issues around software reproducibility and sustainability.
It is crucial to educate early career researchers, who have the time and interest. Courses and trainings are needed at the universities and at the national level. Researchers should be made aware of good practices for software development and software engineering at the earliest stages of their careers, including at the bachelor and master level.
The workshop session lasted two hours. It started with the organisers introducing themselves, followed by a short survey of the audience using Mentimeter, led by Yasemin Türkyilmaz-van der Velden. Maria Cruz then gave a presentation setting the scene, providing information on reproducibility and summarising the paper and the suggestions by Hut, van de Giesen & Drost (2017). Marc Galland gave a short presentation on software sustainability from the researcher’s point of view, and Carlos Martinez Ortiz gave his perspective on the same subject from the research software engineer’s point of view.
The audience was then split into four groups, with the organisers each joining a group to help facilitate the discussion. Each group was allotted 45 minutes to answer the following questions within a collaborative google document:
- How can the advice by Hut, van de Giesen & Drost be put in practice?
- Any additional advice?
- How can researchers, RSEs, and data stewards work together towards implementing the advice?
- What needs to happen at the national level in the Netherlands to raise awareness of research software reproducibility and help implement the above or any of your ideas and recommendations?
About the participants
We asked a few questions to the audience, using Mentimeter, to get familiar with their background and their experiences with research software. As seen in the responses below, we had a mixed audience of researchers, research software engineers, data stewards, and people in other research support positions. As expected from a DTL conference, which focussed on the life sciences, most participants had a research background within this area, ranging from biomedical sciences to bioprocess engineering and plant breeding. All participants had experience with research software.
Almost all participants agreed that there is a reproducibility crisis in science, reflecting the high level of awareness among the audience of this important issue. Before moving to the presentation about software reproducibility, we asked the participants what came to their mind about this topic. The answers, which ranged from version control, documentation and persistent identifiers to Git, containers, and Docker, clearly show that the audience was already very familiar with the topic of software reproducibility. In line with this, when we asked what they were doing themselves in terms of software reproducibility, we received very similar answers, with version control taking the lead among the answers to both questions.
Written by Julie Beardsell and originally published on the ICT innovation blog.
Responding to the challenge
Navigating the often complex legal landscape of software licensing can be a genuine challenge for researchers, particularly when starting up a research project for the first time.
Today’s researchers, when starting out on a PhD, typically need not only to be competent scientists and programmers, but also to be sufficiently knowledgeable to make the right choices for the licensing of the software that they build. Without the latter, they risk a number of potentially undesirable situations.
To help researchers navigate their way, a working group at TU Delft has put together a set of guidelines for researchers, which can be downloaded here.
In addition, the working group is drafting a document to provide more detailed information and links to related documents and useful sources.
Open, reproducibility, peer-review and building upon others’ work
The very nature of the research itself may be to create or improve upon software, which might be worked on openly and collaboratively with others from institutions other than the one by which the researcher is employed.
In addition, the task of creating scientific software as a research output does not end with the publication of the results generated with that software. Making that software available for inspection and use by other scientists is essential to reproducibility, peer-review, and the ability to build upon others’ work.
Importance of licenses
Licenses are important for setting out the terms on which software may be used, modified, or distributed and by whom. Without a license agreement, software may be left in a state of legal uncertainty in which potential users may not know which limitations owners may want to enforce, and owners may leave themselves vulnerable to legal claims or have difficulty controlling how their work is used. Licenses can also be used to facilitate access to software as well as restrict it.
The working group consists of Julie Beardsell, Merlijn Bazuine, Susan Branchett, Maria Marques de Barros Cruz and Marta Teperek and the group would like to thank those researchers across the faculties who have contributed so far and encouraged the development of this initiative at TU Delft.
About the Author
“Open Source Software Guidelines for Researchers” by Julie Beardsell is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
Authors (in alphabetical order): Maria Cruz, Shalini Kurapati, Yasemin Türkyilmaz-van der Velden
With contribution from workshop participants (in alphabetical order): Patrick Aerts (Netherlands eScience Center + DANS), Kees den Heijer (TU Delft), Jelle de Plaa (SRON), Jordi Domingo (KNMI), Martin Donnelly (University of Edinburgh), Raman Ganguly (University of Vienna), Rolf Hut (TU Delft), Karsten Kryger Hansen (Aalborg University), Carlos Martinez (Netherlands eScience center), Joakim Philipson (Stockholm University), Wessel Sloof (University Medical Center Groningen), Martijn Staats (Wageningen University & Research), Michael Svendsen (Royal Danish Library), Jan van der Ploeg (University Medical Center Groningen), Ronald van Haren (Netherlands eScience Center), Egbert Westerhof (DIFFER).
How to cite: A citable version of this report is available since July 06, 2018 through the Open Science Framework. DOI: 10.31219/osf.io/z48cm.
On 24 May 2018, Maria Cruz, Shalini Kurapati, and Yasemin Türkyilmaz-van der Velden led a workshop titled “Software Reproducibility: How to put it into practice?”, as part of the event Towards cultural change in data management – data stewardship in practice held at TU Delft, the Netherlands. There were 17 workshop participants, including researchers, data stewards, and research software engineers. Here we describe the rationale of the workshop, what happened on the day, key discussions and insights, and suggested next steps.
Rationale for the workshop
There is no denying that science faces a reproducibility crisis. In some fields, over half of published studies fail reproducibility tests. A survey of 1576 scientists conducted by Nature in 2016 revealed that over 90% of the respondents agreed that there was some level of crisis and over 70% said they had tried and failed to reproduce another group’s experiments. Given the ubiquity of software in many areas of contemporary scientific research, it could be argued that there can’t be reproducibility in science without reproducible software.
In a recent Comment in Water Resources Research, in response to “Most computational hydrology is not reproducible, so is it really science?”, Hut, Van de Giesen and Drost (2017) argue that documenting and archiving code and data is not enough to guarantee the reproducibility of computational results. They suggest the use of software containers and open interfaces, and that researchers work more closely with research software engineers (RSEs) to learn best practices in software design. This advice is presented in the context of hydrology, but it could be applied more generally.
Inspired by the article and its advice, the workshop aimed to explore the various topics of software reproducibility: how some of the advice could be put in practice, and what role institutions, data stewards, and research software engineers could play in this regard.
What happened on the day
The workshop session lasted one hour. It started with the moderators introducing themselves, followed by a short survey of the audience using Mentimeter, led by Yasemin Türkyilmaz-van der Velden. Maria Cruz then gave a presentation setting the scene, providing information on reproducibility, and summarising the paper and the suggestions by Hut, Van de Giesen and Drost (2017). One of the authors of the paper, Rolf Hut, attended the session and also said a few words about his paper and his ideas. Shalini Kurapati then moderated the main activity described below.
Using Mentimeter, we asked a few questions to the audience to get familiar with their background and their experiences with research software. As seen in the responses below, there was an almost perfectly balanced audience formed by researchers, research software engineers, data stewards, and people in other research support positions.
There was also a very good balance in terms of the participants’ research backgrounds, which ranged from various disciplines in the physical sciences and medical research to intellectual history and information science. Almost all participants had experience with research software.
The majority (65%) of the participants agreed that there is a reproducibility crisis in science. The reproducibility crisis was a hot topic during the main event (Towards cultural change in data management – data stewardship in practice) and had been already discussed comprehensively earlier in the programme, by the keynote speaker Danny Kingsley. Therefore, a potential bias in the responses of the participants cannot be excluded. Regardless, it was interesting to see that there is an increasing awareness of this important issue.
Before moving to the presentation about software reproducibility, we asked the participants what came to their mind about this topic. The answers, which ranged from sustainability, preservation, and integrity to GitHub, Zenodo, containers, and Docker, clearly show that the audience was already very familiar with the topic of software reproducibility.
Prior to the workshop, we hoped to have a group of participants with diverse backgrounds and interests. Fortunately, that turned out to be the case, and we could form groups with the ideal representation from all stakeholders of interest. We divided the participants into 4 groups, each containing at least a data steward/research support staff, a research software engineer, and a researcher. The groups were invited to answer the following questions within a collaborative google document:
- What do you think about the advice of Hut, Van de Giesen and Drost, i.e., use containers (e.g. Docker), use open interfaces, and closely collaborate with Research Software Engineers to improve software reproducibility?
- Any additional advice to Hut et al., to improve software reproducibility?
- How can researchers, RSEs and data stewards work together towards implementing the above advice?
The groups were allotted 20 minutes to discuss answers to the questions and record them in the google document. The workshop moderators were able to actively monitor the google document to steer the groups towards timely conclusion of their activity. After the activity was concluded, a representative from each group gave a one-minute pitch summarising their activity and key findings. The contents of the google document and the pitches, which were recorded live in the workshop slides, provided insights into the challenges and corresponding solutions for software sustainability and reproducibility, which are reported in the next section.
Key discussion points and insights on the advice by Hut, van de Giesen & Drost
Lack of funding for Research Software Engineers
Lack of (sustainable) funding for hiring RSEs is one of the obstacles to putting the advice of Hut, Van de Giesen and Drost into practice. Larger projects typically already have RSEs on board, but for smaller projects this is not always possible. It is difficult to recruit and hire RSEs across disciplines. However, the Netherlands eScience Center is a good example of a way to centrally fund research software development and to pool developer expertise across disciplines.
Open source software is not always an option
Because of scientific competition and commercial and IP interests, it is not always an option to make research software available as open source software. Docker containers are also not an option for commercial software.
High-level documentation is very important. A good README file does part of the job, but documentation and a user manual are also important. Any information (e.g. equations, model) behind the software also needs to be shared.
Lack of support for software validation is also a problem. As an addition to the advice by Hut, van de Giesen and Drost, one of the groups suggested that support should also be provided for software validation (in-house code review). In cases where professional software support is limited, it would already be helpful if researchers reviewed each other’s code, just like they would do with papers. If the goal is to make code understandable to other researchers, then their feedback will be paramount. Organizing code reviews in a research group could improve the quality of the code significantly with only a small time investment.
The role of data stewards, RSEs and researchers
Data stewards – the link between researchers and RSEs?
Two groups saw the role of data stewards as brokers between researchers and RSEs. It was acknowledged that researchers and RSEs should interact more to improve research code (e.g. through code review). Data stewards could be the link between the two. Data stewards could monitor possible synergy between projects and link researchers with specialist RSE expertise. One group felt that data stewards should provide the toolbox, with principles (e.g. FAIR principles) and guidance, and RSEs should help implement those principles, because they have the knowledge to do so.
Could RSEs do more to promote best practices?
Two groups thought that RSEs could take a more proactive role in providing training for researchers, promoting best practices, and generally propagating their knowledge. Without assigning roles, one of the groups felt that implementing the advice of Hut, Van de Giesen and Drost required programming courses, support staff to help out researchers at departmental level, and the breakdown of problems into smaller problems that could be solved with up-to-date techniques based on expert knowledge. Could RSEs also help with this?
Opportunities and barriers, and the role of institutions
Integrated teams working across university faculties, departments, and institutes, with a single point of contact, could provide a way for researchers, data stewards, and RSEs to work together. Fear of stepping into others’ “working areas” and different working cultures may create barriers, as well as the potential lack of scientific/research expertise from RSEs and software developers.
Sustainable funding is a challenge, as is the lack of recognition for developing research software in the current academic rewards system. There also needs to be a persuasive driver beyond just doing the right thing. This can come from funders, publishers and possibly institutions. Any driver will be most persuasive when it comes from the research community itself.
Universities and institutes should promote good practice for software engineering as part of open science.
The short-term goal of this workshop was to start a conversation on the topic of software reproducibility between researchers, research support staff (data stewards and others with a similar role), and research software engineers, and to make the results of this discussion public via this forum.
The immediate next step is to bring the results of this interaction to the attention of the community Working towards Sustainable Software for Science: Practice and Experiences (WSSSPE), which includes researchers and research software engineers, but lacks a strong connection with data stewards and research data support. We plan to submit a paper for the 9th International Workshop on Sustainable Software for Science: Practice and Experiences, to be held in Amsterdam on 29 October 2018.
The time available for the workshop was limited, so not all the issues were discussed, and some were not discussed in enough depth. For example, it would be interesting to discuss in more detail what training and resources researchers and research support staff need to help software reproducibility become more of a reality, and what role data stewards and research software engineers could play in this regard.
Institutions could certainly do more in terms of funding and rewards for software development, and promoting best practices. How can this be made to happen in a global and concerted manner?
In the long-term we will continue to engage with the necessary stakeholders to keep the discussion alive and to define operational solutions towards improving software reproducibility and sustainability.
- Workshop slides
- Water Resources Research paper by Hut, van de Giesen & Drost, in response to “Most computational hydrology is not reproducible, so is it really science?”
- Participants’ contributions through the collaborative Google document
The TU Delft Library website now hosts a webpage with information on the process the Library has initiated to get a better understanding of the ideas and needs regarding open source software at TU Delft.
One aspect of this initiative was a series of sandbox sessions, “open to all who want to contribute to the dialogue to bring together a community with the purpose of engaging on open source software.”
The presentations from the sandbox are now available on the new Open Source Software webpage.
The 4TU.Centre for Research Data announces its report on research data management within the 4TU Research Centres.
Over the last few months, the 4TU.Centre for Research Data had the chance to make contact and to speak with several of the Scientific Directors of the 4TU Research Centres about research data management. The report published today highlights the findings from these contacts and conversations.
A citable version of the report is available on OSF Preprints (DOI: 10.17605/OSF.IO/SGFTW).
1. Research data management is not addressed at a strategic level by the 4TU Research Centres, but left to individual research groups or to individual researchers connected to the Centres.
2. Within the 4TU Research Centres, there is a broad range of attitudes towards data and a broad range of data types and characteristics, including large datasets; commercially sensitive datasets; privacy and ethical concerns regarding data; software and its sustainability.
3. Software sustainability is an important and much discussed topic, but there are currently no standards or systematic way of looking after software.
4. Research on human subjects and datasets including personally identifiable information or sensitive personal information are more prominent than might be expected in engineering and the technical sciences. Lack of transparency and reproducibility of scientific results can be an issue in these areas because the underlying datasets are often not available.
An Opportunity to Collaborate
Research data management is increasingly viewed as an important part of high-quality research. International and national funding bodies now mandate institutions and researchers to make data available. Data sharing is predicated on good research data management and has the potential to make scientific research more transparent, open, and efficient. In view of these principles and developments, the 4TU.Centre for Research Data wishes to maintain and deepen its links with the 4TU Research Centres and to support the Centres in various aspects of research data management.
The framework is entitled “Impact for a better society” and “openness” is listed as one of the four major guiding principles. The principle of openness was apparent already during the consultation phase of the framework: “more than 600 internal and external stakeholders have been actively participating” in the process.
The purpose of the strategic framework is “to serve as a high-level compass that will guide decision-making bodies at all levels within our university in the years ahead”. But what does the framework really mean for Open Science? In this blog post, I highlight the key quotations from the strategic framework which are likely to have the highest impact on future Open Science developments at TU Delft.
Impact for a better society
First, Open Science fits neatly with the overall title of the framework “Impact for a better society”. The framework states in the preface that “societal impact and academic excellence can be mutually reinforcing”. And this is indeed the case. Open Science means that research results can be accessed and re-used by everyone in society, including members of the public. TU Delft also wishes to increase its societal engagement by “promoting public participation in scientific research (‘citizen science’)”. All of this is deeply in line with the principles of Open Science.
Open Access publishing
Within Open Access publishing, TU Delft wishes to first develop a stronger awareness among its researchers. Second, the strategic framework also emphasises the need for a sustainable transition to Open Access publishing and it thus includes the commitment to “reducing costs for Open Access publishing by negotiating journal subscriptions with publishers.” At the same time, TU Delft will explore “new ways to present and disseminate knowledge”, which will not necessarily rely on publishing via the traditional scientific journals. Finally, researchers are encouraged “to serve on relevant Editorial Boards”, suggesting that TU Delft researchers take an active part in shaping publishers’ policies.
The importance of good data management and sharing is also stipulated in the strategic framework. TU Delft wishes to stimulate the sharing of research data, and it realises that in order to achieve this, researchers need to be provided “with the necessary support, for example by appointing data stewards and data engineers within all faculties who advise researchers in managing their data.”
In addition, TU Delft will implement a “policy for research data, and enable researchers to control their own research data in accordance with this policy.” And, quite importantly, the strategic framework states that TU Delft wants to “involve researchers in contributing to TU Delft’s policy for research data management.”
Finally, the strategic framework recognises the importance of the new EU General Data Protection Regulation, and will “set up an integrity policy that protects scientific data and personal data in line with the EU directives.”
Software is an integral part of research and is necessary for research reproducibility. It is therefore not surprising that the commitment to open source software has been stated in several locations in the strategic framework. First, TU Delft will develop “best practices for working with open source software, for example in relation to copyright and archiving of source code” and “facilitate a central place of support for researchers who want to use open source software.” Furthermore, TU Delft stresses the importance of communities in raising awareness and reinforcing good practice. It will, therefore, create “an open source software community with active ambassadors.”
Rewards for Open Science
The Strategic Framework is aiming at recognising the engagement with Open Science by changing the ways in which researchers are evaluated. TU Delft wants to include a more explicit recognition of “engagement with Open Science and Open Education” in yearly R&O evaluation cycles. To facilitate this, TU Delft supports “(inter) national initiatives aimed at finding alternative indicators that positively value open access publications” and is “collaborating with (inter)national leaders in the field of non-traditional metrics.”
Supporting researchers in their transition to Open Science
Importantly, TU Delft recognises that researchers need to be professionally supported in order to ensure that the objectives of the strategic framework can be successfully met. Therefore, it aims to “improve the quality of [its] professional services” and wants to provide researchers with clear ‘one-stop-shop’ contacts for requests, which should be “simple and effective”, “digital where possible, and personalised where needed”.
TU Delft also plans to appropriately recognise and reward those supporting researchers in their transition to Open Science. TU Delft will “take the lead in national initiatives aimed at extending the job classification for support staff with positions that support recent developments, such as data stewards that advise researchers in managing their (open) research data”.
The strategy for Open Education was also widely mentioned in the framework. The one-page summary outlines TU Delft’s commitment to “promote and facilitate Open Education”, which is then followed by a declaration: “we wholeheartedly support Open Education and want to make Open Educational Resources part of our educational policy”. To achieve this, TU Delft will support lecturers and students in the use of open education resources and will encourage “lecturers to publish their educational material under an open license”.
Importantly, TU Delft also wishes to appropriately reward those engaged in Open Education activities. It wishes to strengthen a culture “in which education and teaching receive more appreciation and recognition” and “will refine [its] HR policy so that it will offer further scope for professional development and career opportunities within education”. In addition, as part of its educational policy, TU Delft wants to make “open education part of the basic teaching qualification programme and the evaluation criteria of courses.”
Last, the framework also states that TU Delft has the ambition to replace “commercial textbooks by open resources in all BSc programmes as much as possible.”
How important is the strategic framework?
So how important is the framework? Will the statements really be implemented?
To answer these questions I will conclude with the final quotation from the framework: “this framework is more than a formal requirement; it is our moral responsibility”.
As part of the TU Delft Open Science initiative, the Library together with ICT are looking at the issues related to open source software created by researchers at TU Delft: sustainability, career recognition, training, archiving, licensing and copyright. As part of the process that will help deliver a strategic framework for Open Science at TU Delft, we have been interviewing researchers about their ideas and needs regarding open source software. The interview with Hugo Ledoux, an associate professor in the 3D geoinformation research group, part of the Department of Urbanism of the Faculty of Architecture & the Built Environment, has been published in the last edition of the Library Online Magazine.
On Thursday 26 October, TU Delft Library together with ICT hosted the third, but possibly not the last, of a series of Innovation Sandbox Sessions on Open Source Software at TU Delft. The topic of the session was training and support.
There were three very interesting presentations and lots of engagement from the audience.
- Carlos Martinez Ortiz – Netherlands eScience Center. He talked about open source software and software sustainability.
- Julian Kooij – TU Delft, Assistant Professor “Visual Sensing and Learning” in the Intelligent Vehicles group, part of the Biomechanical Engineering department, 3mE faculty. Julian talked about the use of GitLab (and Robot Operating System, ROS) in the Prius Demonstrator vehicle.
- Rob van Laarhoven – TU Delft, Manager of the Data Management department, ICT. Rob talked about the same GitLab project and the broader ambition of setting up a TU Delft-wide GitLab.
How to keep research software alive?
This discussion caught my attention. Software decays over time because it depends on other code or technology (operating systems, browsers, etc.) that change over time. To keep up with these changes, software needs to be maintained and updated, but this takes time, skill and resources.
10 Ways to keep your successful scientific software alive
This is the title of a blog post by Vincent van Hees, an eScience Research Engineer at the Netherlands eScience Center. One of his recommendations is to build a community of developers. Building a large community may only be possible for generic research software, but it may still be worth the effort for more domain-specific pieces of software if that leads to a reduction of the maintenance workload.
There is no magic recipe for how to build a community. The Netherlands Research Software Engineer community is a recent initiative “to bring together the community of research software engineers from Dutch universities, knowledge institutes, companies and other relevant organizations to share knowledge, to organize meetings and raise awareness for the scientific recognition of research software.”