Archive Now! Research Data and Openness, Presentation to VU Open Science Festival, October 2018
Presentation by Shalini Kurapati and Michiel de Jong for PhD students at TPM faculty at TU Delft (presented on 7 September 2018): https://doi.org/10.5281/zenodo.1409027
Authors (in alphabetical order): Maria Cruz, Shalini Kurapati, Yasemin Türkyilmaz-van der Velden
With contribution from workshop participants (in alphabetical order): Patrick Aerts (Netherlands eScience Center + DANS), Kees den Heijer (TU Delft), Jelle de Plaa (SRON), Jordi Domingo (KNMI), Martin Donnelly (University of Edinburgh), Raman Ganguly (University of Vienna), Rolf Hut (TU Delft), Karsten Kryger Hansen (Aalborg University), Carlos Martinez (Netherlands eScience center), Joakim Philipson (Stockholm University), Wessel Sloof (University Medical Center Groningen), Martijn Staats (Wageningen University & Research), Michael Svendsen (Royal Danish Library), Jan van der Ploeg (University Medical Center Groningen), Ronald van Haren (Netherlands eScience Center), Egbert Westerhof (DIFFER).
How to cite: A citable version of this report is available since July 06, 2018 through the Open Science Framework. DOI: 10.31219/osf.io/z48cm.
On 24 May 2018, Maria Cruz, Shalini Kurapati, and Yasemin Türkyilmaz-van der Velden led a workshop titled “Software Reproducibility: How to put it into practice?”, as part of the event Towards cultural change in data management – data stewardship in practice held at TU Delft, the Netherlands. There were 17 workshop participants, including researchers, data stewards, and research software engineers. Here we describe the rationale of the workshop, what happened on the day, key discussions and insights, and suggested next steps.
Rationale for the workshop
There is no denying that science faces a reproducibility crisis. In some fields, over half of published studies fail reproducibility tests. A survey of 1576 scientists conducted by Nature in 2016 revealed that over 90% of the respondents agreed that there was some level of crisis and over 70% said they had tried and failed to reproduce another group’s experiments. Given the ubiquity of software in many areas of contemporary scientific research, it could be argued that there can’t be reproducibility in science without reproducible software.
In a recent Comment in Water Resources Research, in response to “Most computational hydrology is not reproducible, so is it really science?”, Hut, Van de Giesen and Drost (2017) argue that documenting and archiving code and data is not enough to guarantee the reproducibility of computational results. They suggest the use of software containers and open interfaces, and that researchers work more closely with research software engineers (RSEs) to learn best practices in software design. This advice is presented in the context of hydrology, but it could be applied more generally.
Inspired by the article and its advice, the workshop aimed to explore the various topics of software reproducibility: how some of the advice could be put into practice, and what role institutions, data stewards, and research software engineers could play in this regard.
What happened on the day
The workshop session lasted one hour. It started with the moderators introducing themselves, followed by a short survey of the audience using Mentimeter, led by Yasemin Türkyilmaz-van der Velden. Maria Cruz then gave a presentation setting the scene, providing information on reproducibility, and summarising the paper and the suggestions by Hut, Van de Giesen and Drost (2017). One of the authors of the paper, Rolf Hut, attended the session and also said a few words about his paper and his ideas. Shalini Kurapati then moderated the main activity described below.
Using Mentimeter, we asked the audience a few questions to learn about their backgrounds and their experience with research software. The responses showed an almost perfectly balanced audience of researchers, research software engineers, data stewards, and people in other research support positions.
There was also a very good balance in terms of the participants’ research backgrounds, which ranged from various disciplines in the physical sciences and medical research to intellectual history and information science. Almost all participants had experience with research software.
The majority (65%) of the participants agreed that there is a reproducibility crisis in science. The reproducibility crisis was a hot topic during the main event (Towards cultural change in data management – data stewardship in practice) and had already been discussed comprehensively earlier in the programme by the keynote speaker, Danny Kingsley. Therefore, a potential bias in the participants’ responses cannot be excluded. Regardless, it was interesting to see the increasing awareness of this important issue.
Before moving to the presentation about software reproducibility, we asked the participants what came to their mind about this topic. The answers, which ranged from sustainability, preservation, and integrity to GitHub, Zenodo, containers, and Docker, clearly show that the audience was already very familiar with the topic of software reproducibility.
Prior to the workshop, we hoped to have a group of participants with diverse backgrounds and interests. Fortunately, that turned out to be the case, and we could form groups in which all stakeholders of interest were represented. We divided the participants into 4 groups, each containing at least one data steward or research support staff member, one research software engineer, and one researcher. The groups were invited to answer the following questions in a collaborative Google document:
- What do you think about the advice of Hut, Van de Giesen and Drost, i.e., use containers (e.g. Docker), use open interfaces, and closely collaborate with Research Software Engineers to improve software reproducibility?
- Any additional advice to Hut et al., to improve software reproducibility?
- How can researchers, RSEs and data stewards work together towards implementing the above advice?
The groups were allotted 20 minutes to discuss answers to the questions and record them in the Google document. The workshop moderators actively monitored the document to steer the groups towards a timely conclusion. After the activity, a representative from each group gave a one-minute pitch summarising their discussion and key findings. The contents of the Google document and the pitches, which were recorded live in the workshop slides, provide the insights into the challenges and corresponding solutions for software sustainability and reproducibility that are reported in the next section.
Key discussion points and insights on the advice by Hut, van de Giesen & Drost
Lack of funding for Research Software Engineers
Lack of (sustainable) funding for hiring RSEs is one of the obstacles to putting the advice of Hut, Van de Giesen and Drost into practice. Larger projects typically already have RSEs on board, but for smaller projects this is not always possible. It is difficult to recruit and hire RSEs across disciplines. However, the Netherlands eScience Center is a good example of a way to centrally fund research software development and to pool developer expertise across disciplines.
Open source software is not always an option
Because of scientific competition and commercial and IP interests, it is not always an option to make research software available as open source software. Containers (e.g. Docker) are also not an option for commercial software.
High-level documentation is very important. A good README file does part of the job, but documentation and a user manual are also important. Any information (e.g. equations, model) behind the software also needs to be shared.
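One piece of information that can be shared alongside a README is a record of the environment the code ran in. The following is a minimal sketch of that idea, assuming the research code is written in Python; the function name `environment_manifest` is hypothetical and was not discussed at the workshop. It records less than a container would, so it complements rather than replaces the advice on containers.

```python
import json
import platform
from importlib import metadata


def environment_manifest():
    """Return a dictionary describing the Python environment in use."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be archived next to the code.
    print(json.dumps(environment_manifest(), indent=2))
```

Saving the printed JSON next to the code and data gives later users a starting point for rebuilding a comparable environment.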
Lack of support for software validation is also a problem. As an addition to the advice by Hut, van de Giesen and Drost, one of the groups suggested that support should also be provided for software validation (in-house code review). In cases where professional software support is limited, it would already be helpful if researchers reviewed each other’s code, just as they do with papers. If the goal is to make code understandable to other researchers, then their feedback will be paramount. Organizing code reviews in a research group could improve the quality of the code significantly with only a small time investment.
The role of data stewards, RSEs and researchers
Data stewards – the link between researchers and RSEs?
Two groups saw the role of data stewards as brokers between researchers and RSEs. It was acknowledged that researchers and RSEs should interact more to improve research code (e.g. through code review). Data stewards could be the link between the two. Data stewards could monitor possible synergy between projects and link researchers with specialist RSE expertise. One group felt that data stewards should provide the toolbox, with principles (e.g. FAIR principles) and guidance, and RSEs should help implement those principles, because they have the knowledge to do so.
Could RSEs do more to promote best practices?
Two groups thought that RSEs could take a more proactive role in providing training for researchers, promoting best practices, and generally propagating their knowledge. Without assigning roles, one of the groups felt that implementing the advice of Hut, Van de Giesen and Drost required programming courses, support staff to help out researchers at departmental level, and the breakdown of problems into smaller problems that could be solved with up-to-date techniques based on expert knowledge. Could RSEs also help with this?
Opportunities and barriers, and the role of institutions
Integrated teams working across university faculties, departments, and institutes, with a single point of contact, could provide a way for researchers, data stewards, and RSEs to work together. Fear of stepping into others’ “working areas” and different working cultures may create barriers, as well as the potential lack of scientific/research expertise from RSEs and software developers.
Sustainable funding is a challenge, as is the lack of recognition for developing research software in the current academic rewards system. There also needs to be a persuasive driver beyond just doing the right thing. This can come from funders, publishers and possibly institutions. Any driver will be most persuasive when it comes from the research community itself.
Universities and institutes should promote good practice for software engineering as part of open science.
Next steps
The short-term goal of this workshop was to start a conversation on the topic of software reproducibility between researchers, research support staff (data stewards and others with a similar role), and research software engineers, and to make the results of this discussion public via this forum.
The immediate next step is to bring the results of this interaction to the attention of the community Working towards Sustainable Software for Science: Practice and Experiences (WSSSPE), which includes researchers and research software engineers, but lacks a strong connection with data stewards and research data support. We plan to submit a paper for the 9th International Workshop on Sustainable Software for Science: Practice and Experiences, to be held in Amsterdam on 29 October 2018.
The time available for the workshop was limited, and not all the issues were discussed, or discussed in enough depth. For example, it would be interesting to discuss in more detail what training and resources researchers and research support staff need to help software reproducibility become more of a reality, and what role data stewards and research software engineers could play in this regard.
Institutions could certainly do more in terms of funding and rewards for software development, and promoting best practices. How to make this happen in a global and concerted manner?
In the long term, we will continue to engage with the necessary stakeholders to keep the discussion alive and to define operational solutions towards improving software reproducibility and sustainability.
- Workshop slides
- Water Resources Research paper by Hut, van de Giesen & Drost, in response to “Most computational hydrology is not reproducible, so is it really science?”
- Participants’ contribution through collaborative google document
This presentation is available via Zenodo 10.5281/zenodo.1252925.
Presentation at TU Delft as part of Erasmus+ Programme, May 2018
This report is also available in a pdf version on the Open Science Framework: https://doi.org/10.17605/OSF.IO/JR9U2
On 15 and 16 March 2018, two events dedicated to Electronic Lab Notebooks (ELNs) took place at TU Delft Library: “Digital Notebooks – productivity tools for researchers” and “Digital Notebooks – how to provide solutions for researchers?”. The events were organized by the Research Data Services, TU Delft Library. Both events attracted a lot of attention nationally and internationally, and tickets sold out quickly. We were very happy to see the amount of interest in these events, and the inspiring discussions initiated by the participants. During my PhD study in molecular biology and genetics, I always felt the need for a digital tool to manage my research data. As the current Data Steward at the TU Delft Faculties of Applied Sciences and Mechanical, Maritime and Materials Engineering, my responsibility is to address the data management needs of the researchers at these faculties. Therefore, it was especially interesting for me to join these events and explore the currently available tools. Below is a report of the first day.
The need for digital notebooks
Many academic researchers use paper notebooks to document all sorts of experimental details, ranging from date, purpose, methodology and raw/analyzed data to conclusions. The main problem with paper-based notebooks is that they are not searchable, especially considering that each researcher typically leaves behind a shelf full of such notebooks. As a result, it often becomes very difficult to find the results and details of experiments performed by previous lab members, or even just to read and understand the related handwritten notes. Moreover, paper notebooks can mostly store only a printed copy of the finalized dataset, which is not reusable. Furthermore, in a paper notebook it is impossible to directly link the experimental details to the raw, intermediate and final datasets, which are mostly digital. Together, these issues not only decrease research efficiency but also present challenges to research reproducibility, which is a particularly important issue in the light of the current reproducibility crisis in science.
Digital notebooks provide a searchable alternative to traditional paper notebooks, and additionally offer many time-saving integrations with various cloud storage platforms, calendars and project management tools.
Digital Notebooks – productivity tools for researchers on 15th of March
This full-day event was aimed at researchers, students, and supervisors who are interested in making their research digital, and research support staff who want to learn more about ELNs and how ELNs could meet the needs of researchers. All of the presentations in this event can be found here: DOI 10.5281/zenodo.1247390.
Image from the presentation by Esther Maes
Esther Maes from TU Delft Library opened the event by stressing the importance of archiving, which is required not only to minimize the risk of losing data but also to avoid fraud. She continued by asking intriguing questions: “What happens when you leave? How can people access the correct version of your data? Is it even easily accessible for you?”
Then Alastair Dunning, the head of TU Delft Research Data Services and 4TU.Centre for Research Data, took the lead and emphasized that data documentation is a time-consuming process involving many disjointed steps such as experimenting, analyzing, indexing and publishing. There is therefore a need to make data documentation smoother. He concluded his talk with a valuable remark: a new digital solution cannot have poorer usability than the existing paper ones.
The rest of the morning sessions focused on case studies from researchers who not only use digital notebooks in daily practice but also took the lead in implementing ELNs in their research groups and institutes.
Image from the presentation by Alastair Dunning
Case study 1: Let’s go digital; keeping track of your research using eLABJournal by Evelien Stouten from the Department of Biology, Utrecht University
Evelien Stouten explained that researchers expect an ELN to be not only well-organized and searchable but also suitable for integration with other tools and software packages, for adding literature references, and for sharing data with collaborators. She also highlighted that an ELN is expected to provide safe data storage and be fraud-proof, meaning that everything that is documented remains traceable, even if it is deleted or changed.
Image from the presentation by Evelien Stouten
The Faculty of Science at Utrecht University started discussing ELNs in 2013, and researchers were invited to take part in a test phase from 2014. Her research group found that eLABJournal meets their expectations and provides an additional application suited to their needs, namely eLABInventory. This application enables digital documentation and categorization of samples such as strains, plasmids, cell lines, chemicals, antibodies, RNA and DNA samples, and linking of these samples to the experimental data. She stressed that they are obliged by law to keep records of all genetically modified organisms (GMOs), and that the use of eLABInventory is currently obligatory for all Utrecht University labs using GMOs. She also mentioned that they find the mobile app useful, since it lets researchers use eLABJournal on their phones or tablets while working in the lab.
She concluded her talk by pointing out that some people are really attached to their paper lab journals and it might take some effort to convince them to start using an ELN, even when it is made obligatory.
Case study 2: From paper to screen: What users really think about electronic lab notebooks by Katharina Hanika, Department of Plant Sciences, Wageningen University
Image from the presentation by Katharina Hanika
Katharina Hanika shared with the audience her experience with eLABJournal and her insights into using ELNs. She addressed the question of why to switch from paper to screen by listing the pros and cons of ELNs. As pros, she named readable, structured and searchable information, digital storage of samples, and easy collaboration with colleagues, not only for sharing or discussing data but also for version control. As for cons, she pointed out that the startup was time-intensive, since it takes time to figure out how the program works. Moreover, a good internet connection is required, as eLABJournal is web-based. Although eLABJournal is still being improved, she sees that as an advantage, since the company provides support and adapts the product to the needs of the researchers.
She went on to discuss how to achieve department-wide implementation of ELNs. She suggested that it is best to start with volunteers, since it is challenging to convince the “creatures of habit” to change their ways of working. She pointed out that researchers who try ELNs on their own can get frustrated and give up, and that it is therefore a good idea to start with online demonstrations and hands-on exercises. It would also be beneficial to appoint experienced ELN users as contact persons for questions. Moreover, creating an ELN user group would enable researchers to help each other.
She concluded her talk by stating that any (electronic) lab notebook is only as good as its user and what it takes is time, commitment and adaptability.
Case study 3: Enabling connectivity in electronic laboratory notekeeping – a pilot approach in biomedical sciences by Harald Kusch, University Medical Center Göttingen
Image from the presentation by Harald Kusch
Harald Kusch talked about the pilot implementation of RSpace at the CRC 1002 Research Data Platform. He highlighted that using an ELN enables linking of experimental data to other relevant elements, such as catalogs for cell lines, mouse lines, and antibodies, as well as databases. He explained the possible ways of structuring data in an ELN: chronological, project-oriented and method-oriented. Deciding which is best is a challenge, whereas a paper lab journal offers only the chronological option. He explained that RSpace allows both structured and unstructured documentation. Structured documentation is very handy, especially for new people in the lab, as it allows the use of centralized protocols and facilitates metadata recording. Meanwhile, unstructured documentation offers room for creativity and is especially suitable for new lab protocols. He also stressed that all versions of each document are saved, which prevents fraud. He explained that the data can be exported in different formats, such as PDF, HTML, and XML. Moreover, RSpace offers interfaces for easy transfer of datasets to data repositories such as Dataverse. He concluded his talk by emphasizing that the start-up phase takes time.
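The idea that every version of an entry stays traceable, even after a change or deletion, can be illustrated with an append-only history. The sketch below is a generic illustration, not a description of how RSpace or eLABJournal implement versioning; the class names `Entry` and `Version` and the hash-chaining scheme are assumptions made for this example.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    number: int
    author: str
    content: Optional[str]  # None marks a deletion
    previous_hash: str
    digest: str


def _digest(number, author, content, previous_hash):
    """Hash one version together with the hash of its predecessor."""
    payload = json.dumps([number, author, content, previous_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Entry:
    """A notebook entry whose history can only grow."""
    versions: List[Version] = field(default_factory=list)

    def _append(self, author, content):
        previous_hash = self.versions[-1].digest if self.versions else ""
        number = len(self.versions) + 1
        digest = _digest(number, author, content, previous_hash)
        self.versions.append(
            Version(number, author, content, previous_hash, digest)
        )

    def write(self, author, content):
        self._append(author, content)

    def delete(self, author):
        # A deletion is recorded as a new version; older ones remain.
        self._append(author, None)

    def current(self):
        return self.versions[-1].content if self.versions else None

    def is_intact(self):
        """Check that no stored version was altered after the fact."""
        previous_hash = ""
        for expected_number, v in enumerate(self.versions, start=1):
            if v.number != expected_number or v.previous_hash != previous_hash:
                return False
            if v.digest != _digest(v.number, v.author, v.content, v.previous_hash):
                return False
            previous_hash = v.digest
        return True
```

Because each version stores the hash of the one before it, silently rewriting an old version makes `is_intact()` return `False`, while ordinary edits and deletions simply add to the history.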
Interactive questions from the audience
During this interactive session, the audience had the chance to ask their questions to the presenters of the case studies. Most questions were focused on the following topics:
Where is the data stored? Is institutional data storage an option?
- Both eLABJournal and RSpace give the institutional data storage option to their users.
How to use an ELN in a lab environment without going back and forth between the lab and the office to write down notes?
- Katharina: There are fixed tablets available in the lab; some people type directly on the tablet, while others make handwritten notes and go back to their PCs.
- Harald: Not every lab can afford a tablet per lab member, but it may also not be necessary.
- Evelien: Not everyone types right away, some prefer to make small notes and then type it in the ELN in the office.
What happens to hyperlinks added in an ELN if folders are moved? Do the links still work?
- If the name or location is changed, the link would indeed break but at least it is possible to trace back to the previous link. There is no direct solution available yet.
Does setting up an ELN in a department need a fully dedicated staff member?
- To be able to implement an ELN, an ideal way would be that a lab member who knows the research type and needs takes the lead to implement it.
Keynote by Alastair Downie, University of Cambridge: Choosing an Electronic Lab Notebook
Alastair Downie said that the first ELN appeared in 1997 and that industry was quick to adopt ELNs, while academia was not. He explained that industry has a variety of incentives to use ELNs, such as the requirement for absolutely consistent processes, protection of intellectual property, and other commercial and corporate responsibilities. He answered the question “What is holding universities back?” by saying that there are so many different types of ELNs, and so many different types of research and research needs, that it is difficult to find the ideal solution. To make it easier for researchers to choose an ELN, he prepared a valuable resource with an overview of the available solutions. This resource provides information about a variety of issues:
- What is an electronic lab notebook and why should I use one?
- A note about DIY systems
- ELN vs LIMS
- Disengagement – what if I want to change systems?
- Narrowing the scope, creating a shortlist
- Evaluating ELN products
- Table of 25 current ELN products
- Discussion forum
As an alternative option to the available ELNs, he introduced Do-It-Yourself (DIY) ELNs which could be made by using tools such as EVERNOTE, OneNote, asana, Basecamp, Dropbox, OneDrive. He emphasized that using one of these tools as a DIY ELN still requires a very disciplined approach; however, without any ELN, one needs to be even more structured. He also stressed that these tools are not designed to be used as an ELN and therefore do not provide custom solutions.
Image from the presentation by Alastair Downie
He also focused on the question “What if you chose the wrong product?”. It is possible that, after an ELN has been implemented, the software changes and no longer suits the research needs of its users. If you stop using an ELN, in most cases all you can export is PDF, HTML or XML files; on the other hand, at least such files are easily accessible and searchable, and can be backed up and securely stored.
Then he focused on creating a shortlist to find the ideal option:
- Do you have a budget?
  - Free or a paid ELN? Is a paid ELN worth the money?
- Will you use the software as an individual or a group?
  - Collaborative vs self-contained, comprehensive vs lightweight
- Do you need team collaboration and supervisor features?
  - Group activity dashboard, commenting & discussions
  - Constant discussion, even if the group leader is away
- Departmental or institutional deployment?
  - Please everyone? Or focus on stability, accessibility, and universal relevance?
- Do you need multi-operating system (OS) compatibility?
  - Browser-based & OS agnostic, or application-based
- What devices will be used to operate the software?
  - Tablets on bench? Voice recognition? Phones? Paper?
- Data security and compliance requirements?
  - GDPR compliance? Local storage?
He further explained how to evaluate the shortlisted products:
- Interface design: Is the look and feel user-friendly, intuitive and efficient?
- Workflow suitability: Does the ELN workflow match your own workflow?
- Content creation tools: Writing, drawing, annotation, markup, equations, chemical structures…
- Data management & storage features: Upload typical file types/sizes? Larger files? Display/operation? Backed-up?
- Integration with other software and/or cloud services: Office apps, Statistics, Institutional storage, Community repositories…
- Collaboration features: Share data and comments in a group? Invite external collaboration?
- Group leader/Supervisor features: Sufficient oversight and feedback tools? Team/account management?
- Export features: Pages, sections, entire ELN? Data in original formats?
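One way to compare shortlisted products against criteria like these is a simple weighted score. This is a minimal sketch and was not part of the keynote; the function names, the weights, the scores and the products "Product A" and "Product B" are all hypothetical placeholders rather than evaluations of real ELNs.

```python
def weighted_score(scores, weights):
    """Return the weighted average of per-criterion scores."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


def rank(products, weights):
    """Return (name, score) pairs sorted from best to worst."""
    scored = [(name, weighted_score(s, weights)) for name, s in products.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weights: how much each criterion matters to one group.
    weights = {"interface": 3, "workflow": 3, "export": 2}
    # Hypothetical scores on a 1 to 5 scale for two made-up products.
    products = {
        "Product A": {"interface": 4, "workflow": 3, "export": 5},
        "Product B": {"interface": 5, "workflow": 4, "export": 1},
    }
    for name, score in rank(products, weights):
        print(f"{name}: {score:.2f}")
```

The weights make a group's priorities explicit. For example, a group that worries about changing systems later would give export features a higher weight.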
More detailed information can be found at: www.gurdon.cam.ac.uk/eln
Info from ELN providers about afternoon workshops
There were four ELN providers present at the event:
- RSpace – presentation by Richard Adams
- eLABJournal – presentation by Ulrike Dijkman, Florian Studener
- labfolder – presentation by Yannick Skop
- Hivebench – presentation by Wouter Haak and Julien Therier
Before the interactive demonstration sessions, each ELN provider was given the opportunity to give a pitch about their ELN product. The presentations in this session and the morning session can be found here: DOI 10.5281/zenodo.1247390.
Hands-on workshops and opportunity to test tools offered by various ELN providers
In this session, the participants were given the opportunity to try out the ELNs listed above and ask their questions directly to the providers. Here is the feedback that was given by the participants about each ELN at the end of the hands-on workshops:
After this event, we were contacted by researchers from various TU Delft departments to discuss the possibilities of implementing an ELN. Currently, we are in contact with researchers to determine what they expect and require from an ELN, and we are planning to start a pilot study afterwards.
I would like to conclude this report by sharing the feedback given by the participants about this event:
First of all, I would like to thank the Research Data Services, TU Delft Library for organizing this very informative event. We also thank all the speakers for the informative presentations and all the participants for the fruitful discussions. Finally, I would like to give special thanks to Marta Teperek for her critical reading and inspiring suggestions during the preparation of this report.