On Thursday 30 August and Friday 31 August, TU Delft Library hosted two events dedicated to the new European General Data Protection Regulation (GDPR) and its implications for research data. Both events were organised by Research Data Netherlands, a collaboration between the 4TU.Center for Research Data, DANS and SURF (represented by the National Research Data Management Coordination Point).
First: do no harm. Protecting personal data is not against data sharing
On the first day, we heard case studies from experts in the field, as well as from various institutional support service providers. Veerle Van den Eynden from the UK Data Service kicked off the day with her presentation, which clearly stated that the need to protect personal data does not conflict with data sharing. She outlined the framework provided by the GDPR which makes sharing possible, and explained that when it comes to data sharing one should always adhere to the principle “do no harm”. However, she reflected that too often both researchers and research support services (such as ethics committees) prefer to avoid any possible risks rather than carefully consider and manage them. She concluded by providing a compelling case study from the UK Data Service, where researchers were able to successfully share data from research on vulnerable individuals (asylum seekers and refugees).
From a one-stop shop solution to privacy champions
We subsequently heard case studies from four Dutch research institutions (Tilburg University, TU Delft, VU Amsterdam and Erasmus University Rotterdam) about their practical approaches to supporting researchers working with personal research data. Jan Jans from Tilburg explained their “one stop shop” form, which, when completed by researchers, sorts out all the requirements related to GDPR, ethics and research data management. Marthe Uitterhoeve from TU Delft said that Delft was developing a similar approach, but based on data management plans. Marlon Domingus from Erasmus University Rotterdam explained their process based on defining different categories of research and determining the types of data processing associated with them, rather than trying to list every single research project at the institution. Finally, Jolien Scholten from VU Amsterdam presented their idea of appointing privacy champions who receive dedicated training on data protection and who act as the first contact points for questions related to GDPR within their communities.
There were lots of inspiring ideas, and there was a consensus in the room that it would be worth re-convening in a year’s time to evaluate the different approaches and to share lessons learned.
How to share research data in practice?
Next, we discussed three different models for helping researchers share their research data. Emilie Kraaikamp from DANS presented their strategy for providing two different access levels to data: open access data and restricted access data. Open datasets consist mostly of research data which are fully anonymised. Restricted access data need to be requested (via an email to the depositor) before the access can be granted (the depositor decides whether access to data can be granted or not).
Veerle Van den Eynden from the UK Data Service discussed their approach based on three different access levels: open data, safeguarded data (equivalent to “restricted access data” in DANS) and controlled data. Controlled datasets are very sensitive and researchers who wish to get access to such datasets need to undergo a strict vetting procedure. They need to complete training, their application needs to be supported by a research institution, and typically researchers access such datasets in safe locations, on safe servers, and are not allowed to copy the data. Veerle explained that only a relatively small number of sensitive datasets (usually from governmental agencies) are shared under controlled access conditions.
The last case study was from Zosia Beckles from the University of Bristol, who explained that at Bristol, a dedicated Data Access Committee has been created to handle requests for controlled access datasets. Researchers responsible for the datasets are asked for advice on how to respond to requests, but it is the Data Access Committee which ultimately decides whether access should be granted or not, and which can, if necessary, overrule the researcher’s advice. The procedure relieves researchers of the burden of dealing with data access requests.
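The three-tier model described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the UK Data Service's actual procedure: the class names, the `Requester` fields and the `may_access` function are all hypothetical, and the conditions are simplified from the description in the talk.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    OPEN = "open"
    SAFEGUARDED = "safeguarded"  # comparable to "restricted access" at DANS
    CONTROLLED = "controlled"


@dataclass
class Requester:
    # Hypothetical stand-ins for the checks mentioned in the talk.
    request_approved: bool = False
    completed_training: bool = False
    institution_backed: bool = False
    uses_safe_location: bool = False


def may_access(level: AccessLevel, requester: Requester) -> bool:
    """Return True if the requester meets the (simplified) conditions."""
    if level is AccessLevel.OPEN:
        return True
    if level is AccessLevel.SAFEGUARDED:
        # Access is granted only after a request has been approved.
        return requester.request_approved
    # Controlled data: every vetting condition must hold.
    return (
        requester.completed_training
        and requester.institution_backed
        and requester.uses_safe_location
    )
```

For example, a requester who has completed training and has institutional backing, but does not work in a safe location, would still be refused access to a controlled dataset in this sketch.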
DataTags – decisions about sharing made easy(ier)
Ilona von Stein from DANS continued the discussion about data sharing and the means by which sharing could be facilitated. She described an online tool developed by DANS (based on a concept initially developed by colleagues from Harvard University, but adapted to European GDPR needs) in which researchers answer simple questions about their datasets and the tool returns a tag, which defines whether the data are suitable for sharing and what the most suitable sharing options are. The prototype of the tool is now available for testing and DANS plans to develop it further to see if it could also be used to assist researchers working with data across the whole research lifecycle (not only at the final, data sharing stage).
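To give a feel for how a question-to-tag tool of this kind might work, here is a minimal sketch. It is not the DANS DataTags tool: the three questions, the tag names and the `suggest_tag` function are invented for illustration, and a real tool would ask many more questions.

```python
def suggest_tag(contains_personal_data: bool,
                fully_anonymised: bool,
                has_consent_for_sharing: bool) -> str:
    """Map yes/no answers to an illustrative sharing tag (hypothetical labels)."""
    # Data without personal information, or fully anonymised data,
    # is treated here as suitable for open sharing.
    if not contains_personal_data or fully_anonymised:
        return "open"
    # Personal data with consent for sharing: share under restrictions.
    if has_consent_for_sharing:
        return "restricted"
    # Personal data without consent: flag for further advice.
    return "do-not-share"
```

A researcher answering "yes, personal data; not anonymised; consent obtained" would get the tag `"restricted"` in this toy version.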
What are the most impactful & effortless tactics to provide controlled access to research data?
The final interactive part of the workshop was led by Alastair Dunning, the Head of 4TU.Center for Research Data. Alastair used Mentimeter to ask attendees to judge the impact and effort of fourteen different tactics and solutions which can be used at research institutions to provide controlled access to research data. More than forty people engaged with the online survey and this allowed Alastair to shortlist five tactics which were deemed the most impactful/effort-efficient:
- Create a list of trusted archives where researchers can deposit personal data
- Publish an informed consent template for your researchers
- Publish a list of FAQs concerning personal data on the university website
- Provide access to a trusted Data Anonymisation Service
- Create categories to define different types of personal data at your institution
Alastair concluded that these should probably be the priorities to work on for research institutions which don’t yet have the above in place.
How to put all the learning into practice?
The second event was dedicated to putting all the learning and concepts developed during the first day into practice. Researchers working with personal data, as well as those directly supporting researchers, brought their laptops and followed practical exercises led by Veerle Van den Eynden and Cristina Magder from the UK Data Service. We started by looking at a GDPR-compliant consent form template. Subsequently, we practised data encryption using VeraCrypt. We then moved to data anonymisation strategies. First, Veerle explained possible tactics (again, with nicely illustrated examples) for de-identification and pseudonymisation of qualitative data. This was followed by comprehensive hands-on training delivered by Cristina Magder on disclosure review and de-identification of quantitative data using sdcMicro.
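One idea behind disclosure review of quantitative data is to look for records that are unique, or nearly unique, on a combination of indirectly identifying variables. The sketch below illustrates that idea in plain Python. It is not sdcMicro (which is an R package) and not the workshop's exercise: the function name `risky_records`, the threshold `k=3` and the sample records are all made up for illustration.

```python
from collections import Counter


def risky_records(records, quasi_identifiers, k=3):
    """Return records whose quasi-identifier combination occurs fewer than k times."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    # Count how often each combination of quasi-identifiers appears.
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]


# Invented sample data: three respondents share a profile, one stands out.
sample = [
    {"age_band": "30-39", "region": "North", "income": 41000},
    {"age_band": "30-39", "region": "North", "income": 38000},
    {"age_band": "30-39", "region": "North", "income": 45000},
    {"age_band": "70-79", "region": "South", "income": 22000},
]

flagged = risky_records(sample, ["age_band", "region"], k=3)
```

Records flagged this way are candidates for further de-identification, for example by using broader age bands or larger regions before the data are shared.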
Altogether, the practical exercises allowed one to clearly understand how to effectively work with personal research data from the very start of the project (consent, encryption) all the way to data de-identification to enable sharing and data re-use (whilst protecting personal data at all stages).
Conclusion: GDPR as an opportunity
I think that the key conclusion of both days was that the GDPR, while challenging to implement, provides an excellent opportunity both to researchers and to research institutions to review and improve their research practices. The key to this is collaboration: across the various stakeholders within the institution (to make workflows more coherent and improve collaboration), but also between different institutions. An important aspect of these two events was that representatives from multiple institutions (and countries!) were present to talk about their individual approaches and considerations. Practice exchange and lessons learned can be invaluable to allow institutions to avoid similar mistakes and to decide which approaches might work best in particular settings.
We will definitely consider organising a similar meeting in a year’s time to see where everyone is and which workflows and solutions tend to work best.
Presentations from both events are available on Zenodo:
Authors (in alphabetical order): Maria Cruz, Shalini Kurapati, Yasemin Türkyilmaz-van der Velden
With contributions from workshop participants (in alphabetical order): Patrick Aerts (Netherlands eScience Center + DANS), Kees den Heijer (TU Delft), Jelle de Plaa (SRON), Jordi Domingo (KNMI), Martin Donnelly (University of Edinburgh), Raman Ganguly (University of Vienna), Rolf Hut (TU Delft), Karsten Kryger Hansen (Aalborg University), Carlos Martinez (Netherlands eScience Center), Joakim Philipson (Stockholm University), Wessel Sloof (University Medical Center Groningen), Martijn Staats (Wageningen University & Research), Michael Svendsen (Royal Danish Library), Jan van der Ploeg (University Medical Center Groningen), Ronald van Haren (Netherlands eScience Center), Egbert Westerhof (DIFFER).
How to cite: A citable version of this report has been available since July 06, 2018 through the Open Science Framework. DOI: 10.31219/osf.io/z48cm.
On 24 May 2018, Maria Cruz, Shalini Kurapati, and Yasemin Türkyilmaz-van der Velden led a workshop titled “Software Reproducibility: How to put it into practice?”, as part of the event Towards cultural change in data management – data stewardship in practice held at TU Delft, the Netherlands. There were 17 workshop participants, including researchers, data stewards, and research software engineers. Here we describe the rationale of the workshop, what happened on the day, key discussions and insights, and suggested next steps.
Rationale for the workshop
There is no denying that science faces a reproducibility crisis. In some fields, over half of published studies fail reproducibility tests. A survey of 1576 scientists conducted by Nature in 2016 revealed that over 90% of the respondents agreed that there was some level of crisis and over 70% said they had tried and failed to reproduce another group’s experiments. Given the ubiquity of software in many areas of contemporary scientific research, it could be argued that there can’t be reproducibility in science without reproducible software.
In a recent Comment in Water Resources Research, in response to “Most computational hydrology is not reproducible, so is it really science?”, Hut, Van de Giesen and Drost (2017) argue that documenting and archiving code and data is not enough to guarantee the reproducibility of computational results. They suggest the use of software containers and open interfaces, and that researchers work more closely with research software engineers (RSEs) to learn best practices in software design. This advice is presented in the context of hydrology, but it could be applied more generally.
Inspired by the article and its advice, the workshop aimed to explore the various topics of software reproducibility: how some of the advice could be put into practice, and what role institutions, data stewards, and research software engineers could play in this regard.
What happened on the day
The workshop session lasted one hour. It started with the moderators introducing themselves, followed by a short survey of the audience using Mentimeter, led by Yasemin Türkyilmaz-van der Velden. Maria Cruz then gave a presentation setting the scene, providing information on reproducibility, and summarising the paper and the suggestions by Hut, Van de Giesen and Drost (2017). One of the authors of the paper, Rolf Hut, attended the session and also said a few words about his paper and his ideas. Shalini Kurapati then moderated the main activity described below.
Using Mentimeter, we asked a few questions to the audience to get familiar with their background and their experiences with research software. As seen in the responses below, there was an almost perfectly balanced audience formed by researchers, research software engineers, data stewards, and people in other research support positions.
There was also a very good balance in terms of the participants’ research backgrounds, which ranged from various disciplines in the physical sciences and medical research to intellectual history and information science. Almost all participants had experience with research software.
The majority (65%) of the participants agreed that there is a reproducibility crisis in science. The reproducibility crisis was a hot topic during the main event (Towards cultural change in data management – data stewardship in practice) and had already been discussed comprehensively earlier in the programme by the keynote speaker, Danny Kingsley. Therefore, a potential bias in the responses of the participants cannot be excluded. Regardless, it was interesting to see that there is an increasing awareness of this important issue.
Before moving to the presentation about software reproducibility, we asked the participants what came to their mind about this topic. The answers, which ranged from sustainability, preservation, and integrity to GitHub, Zenodo, containers, and Docker, clearly show that the audience was already very familiar with the topic of software reproducibility.
Prior to the workshop, we hoped to have a group of participants with diverse backgrounds and interests. Fortunately, that turned out to be the case, and we could form groups with the ideal representation from all stakeholders of interest. We divided the participants into 4 groups, each containing at least one data steward or research support staff member, one research software engineer, and one researcher. The groups were invited to answer the following questions within a collaborative Google document:
- What do you think about the advice of Hut, Van de Giesen and Drost, i.e., use containers (e.g. Docker), use open interfaces, and closely collaborate with Research Software Engineers to improve software reproducibility?
- Is there any additional advice, beyond that of Hut et al., to improve software reproducibility?
- How can researchers, RSEs and data stewards work together towards implementing the above advice?
The groups were allotted 20 minutes to discuss answers to the questions and record them in the Google document. The workshop moderators were able to actively monitor the Google document to steer the groups towards a timely conclusion of their activity. After the activity was concluded, a representative from each group spent a minute pitching their activity summary and key findings. The contents of the Google document and the pitches, which were recorded live in the workshop slides, provided us with insights into the challenges and corresponding solutions for software sustainability and reproducibility, which are reported in the next section.
Key discussion points and insights on the advice by Hut, van de Giesen & Drost
Lack of funding for Research Software Engineers
Lack of (sustainable) funding for hiring RSEs is one of the obstacles to putting the advice of Hut, Van de Giesen and Drost into practice. Larger projects typically already have RSEs on board, but for smaller projects this is not always possible. It is difficult to recruit and hire RSEs across disciplines. However, the Netherlands eScience Center is a good example of a way to centrally fund research software development and to pool developer expertise across disciplines.
Open source software is not always an option
Because of scientific competition and commercial and IP interests, it is not always an option to make research software available as open source software. Docker containers are also not an option for commercial software.
High-level documentation is very important. A good README file does part of the job, but documentation and a user manual are also important. Any information (e.g. equations, model) behind the software also needs to be shared.
Lack of support for software validation is also a problem. As an addition to the advice by Hut, van de Giesen and Drost, one of the groups suggested that support should also be provided for software validation (in-house code review). In cases where professional software support is limited, it would already be helpful if researchers reviewed each other’s code, just as they do with papers. If the goal is to make code understandable to other researchers, then their feedback will be paramount. Organizing code reviews in a research group could improve the quality of the code significantly with only a small time investment.
The role of data stewards, RSEs and researchers
Data stewards – the link between researchers and RSEs?
Two groups saw the role of data stewards as brokers between researchers and RSEs. It was acknowledged that researchers and RSEs should interact more to improve research code (e.g. through code review). Data stewards could be the link between the two. Data stewards could monitor possible synergy between projects and link researchers with specialist RSE expertise. One group felt that data stewards should provide the toolbox, with principles (e.g. FAIR principles) and guidance, and RSEs should help implement those principles, because they have the knowledge to do so.
Could RSEs do more to promote best practices?
Two groups thought that RSEs could take a more proactive role in providing training for researchers, promoting best practices, and generally propagating their knowledge. Without assigning roles, one of the groups felt that implementing the advice of Hut, Van de Giesen and Drost required programming courses, support staff to help out researchers at departmental level, and the breakdown of problems into smaller problems that could be solved with up-to-date techniques based on expert knowledge. Could RSEs also help with this?
Opportunities and barriers, and the role of institutions
Integrated teams working across university faculties, departments, and institutes, with a single point of contact, could provide a way for researchers, data stewards, and RSEs to work together. Fear of stepping into others’ “working areas” and different working cultures may create barriers, as well as the potential lack of scientific/research expertise from RSEs and software developers.
Sustainable funding is a challenge, and so is the lack of recognition for developing research software in the current academic rewards system. There also needs to be a persuasive driver beyond just doing the right thing. This can come from funders, publishers and possibly institutions. Any driver will be most persuasive when it comes from the research community itself.
Universities and institutes should promote good practice for software engineering as part of open science.
The short-term goal of this workshop was to start a conversation on the topic of software reproducibility between researchers, research support staff (data stewards and others with a similar role), and research software engineers, and to make the results of this discussion public via this forum.
The immediate next step is to bring the results of this interaction to the attention of the community Working towards Sustainable Software for Science: Practice and Experiences (WSSSPE), which includes researchers and research software engineers, but lacks a strong connection with data stewards and research data support. We plan to submit a paper for the 9th International Workshop on Sustainable Software for Science: Practice and Experiences, to be held in Amsterdam on 29 October 2018.
The time available for the workshop was limited and not all the issues were discussed, or discussed in enough depth. For example, it would be interesting to discuss in more detail what training and resources researchers and research support staff need to help software reproducibility become more of a reality, and what role data stewards and research software engineers could play in this regard.
Institutions could certainly do more in terms of funding and rewards for software development, and promoting best practices. How to make this happen in a global and concerted manner?
In the long term, we will continue to engage with the necessary stakeholders to keep the discussion alive and to define operational solutions towards improving software reproducibility and sustainability.
- Workshop slides
- Water Resources Research paper by Hut, van de Giesen & Drost, in response to “Most computational hydrology is not reproducible, so is it really science?”
- Participants’ contribution through collaborative google document
Late last month, I took a day trip to the Netherlands to attend an event at TU Delft entitled “Towards cultural change in data management – data stewardship in practice”. My Software Sustainability Institute Fellowship application “pitch” last year had been based around building bridges and sharing strategies and lessons between advocacy approaches for data and software management, and encouraging more holistic approaches to managing (and simply thinking about) research outputs in general. When I signed up for the event I expected it to focus exclusively on research data, but upon arrival at the venue (after a distressingly early start, and a power-walk from the train station along the canal) I was pleasantly surprised to find that one of the post-lunch breakout sessions was on the topic of software reproducibility, so I quickly signed up for that one.
I made it into the main auditorium just in time to hear TU Delft’s Head of Research Data Services, Alastair Dunning, welcome us to the event. Alastair is a well-known face in the UK, hailing originally from Scotland and having worked at Jisc prior to his move across the North Sea. He noted the difference between managed and Open research data, a distinction that translates to research software too, and noted the risk of geographic imbalance between countries which are able to leverage openness to their advantage while simultaneously coping with the costs involved – we should not assume that our northern European privilege is mirrored all around the globe.
Danny Kingsley during her keynote presentation
The first keynote came from Danny Kingsley, Deputy Director of Scholarly Communication and Research Services at the University of Cambridge, whom I also know from a Research Data Management Forum event I organised last year in London. Danny’s theme was the role of research data management in demonstrating academic integrity, quality and credibility in an echo-chamber/social media world where deep, scholarly expertise itself is becoming (largely baselessly) distrusted. Obviously as more and more research depends upon software driven processing, what’s good for data is just as important for code when it comes to being able to reproduce or replicate research conclusions; an area currently in crisis, according to at least one high profile survey. One of Danny’s proposed solutions to this problem is to distribute and reward dissemination across the whole research lifecycle, not only attaching credit and recognition/respect to traditional publications, but also to datasets, code and other types of outputs.
Questions from the audience
After a much-appreciated coffee break, Marta Teperek introduced TU Delft’s Vision for data stewardship, which, again, has repercussions and relevance beyond just data. The broad theme of “Openness”, for example, is one of the four major principles in the current TU Delft strategic plan, indicating the degree of institutional support it has as an underpinning philosophy. Marta was keen to emphasise that the cohort of data stewards which Delft has recently hired are intended to be consultants, not police! Their aim is to shift scholarly culture, not to check or enforce compliance, and the effectiveness of their approach is being measured by regular surveys. It will be interesting to see how they have got on in a year or two years’ time: already they are looking to expand from one data steward per faculty to one per department.
There followed a number of case studies from the Delft data stewards themselves. My main takeaways from these were the importance of mixing top-down and bottom-up approaches (culture change has to be driven from the grassroots, but via initiatives funded by the budget holders at the top), and the importance of driving up engagement and making people care about these issues.
Data Stewards answering questions from the audience
After lunch we heard from a couple of other European universities. From Martine Pronk, we learned that Utrecht University stripes its research support across multiple units and services, including the library and the academic departments themselves, in order to address institutional, departmental, and operational needs and priorities. In common with the majority of UK universities, Utrecht’s library is the main driving and coordinating force, with specific responsibility for research data management being part of the Research IT programme. From Stockholm University’s Joakim Philipson we heard about the Swedish context, which again seemed similar to the UK’s development path and indeed my own home institution’s. Sweden now has a national data services consortium (the SND), analogous to the DCC in the UK, and Stockholm, like Edinburgh, was the first university in its country to have a dedicated RDM policy.
We then moved into our breakout groups, in my case the one titled “Software reproducibility – how to put it into practice?”, which had a strange gender distribution with the coordinators all female, but the other participants all male. One of the coordinators noted that this reminded her of being an Engineering undergraduate again. We began by exploring our own roles and levels of experience/understanding of research software. The group comprised a mixture of researchers, software engineers, data stewards and ‘other’ (I fell into this last category), and in terms of hands-on experience with research software roughly two-thirds of participants were actively developing software, and another third used it. Participants came from a broad range of research backgrounds, as well as a smaller number of research support people such as myself. We then voted on how serious we felt the aforementioned reproducibility crisis actually was, with a two-thirds/one-third split between “crisis” and “what-crisis?” We explored the types of issues that come to mind when we think about software preservation, with the most popular responses being terms such as “open source”, “GitHub” and “workflows”. We then moved on to the main business of the group, which was to consider a recent article by Hut, van de Giesen and Drost. In a nutshell, this says that archiving code and data is not sufficient to enable reproducibility, therefore collaboration with dedicated Research Software Engineers (RSEs) should be encouraged and facilitated. We broke into smaller groups to discuss this from our various standpoints, and presented back in the room. The various notes and pitches are more detailed than this blog post requires, but those interested can check out the collaboratively-authored Google Doc to see what we came up with. The breakout session will also be written up as a blog post and an IEEE proposal, so keep an eye out for that.
After returning to the main auditorium for reports from each of the groups, including an interesting-looking one from my friend and colleague Marjan Grootveld on “Why Is This A Good Data Management Plan?”, the afternoon concluded with two more keynote presentations. First up, Kim Huijpen from VSNU (the Association of Universities in the Netherlands) spoke about “Giving scientists more of the recognition they deserve”, followed by Ingeborg Verheul of LCRDM (the Dutch national coordination point for research data management), whose presentation was titled “Data Stewardship? Meet your peers!” Both of these national viewpoints were very interesting from my current perspective as a member of a nationally-oriented organisation. From my coming perspective as manager of an institutional support service – I’m in the process of changing roles at the moment – Kim’s emphasis on Team Science struck a chord, and relates to what we’re always saying about research data: it’s a hybrid activity, and takes a village to raise a child, etc. Ingeborg spoke about the dynamics involved between institutional and national level initiatives, and emphasised the importance of feeling like part of a community network, with resources and support which can be drawn upon as needed.
Closing the event, TU Delft Library Director Wilma van Wezenbeek underlined the necessity of good data management in enabling reproducible research, just as the breakout group emphasised the necessity of software preservation, and in effect confirming a view of mine that has been developing recently: that boundaries between managing data and managing software (or other types of research output) are often artificially created, and not always helpful. We need to enable and support more holistic approaches to this, acting in sympathy and harmony with actual research practices. (We also need to put our money where our mouth is, and fund it!)
After all that there was just enough time for a quick beer in downtown Delft before catching the train and plane back to Edinburgh. Many thanks to TU Delft for hosting a most enjoyable and interesting event, and to the Software Sustainability Institute whose support covered the costs of my attendance.
Several resources from the event are now available:
- Presentations and other materials (via Zenodo)
- Recorded presentations (via TU Delft website)
- Recorded presentations (via YouTube)
- Photos (via Flickr)
- Tweets (via Twitter)
- Blog posts:
This blog post reports from a workshop session led by Marjan Grootveld and Ellen Leenarts from DANS. The workshop was part of a larger event “Towards cultural change in data management – data stewardship in practice” organised by TU Delft Library on 24th of May 2018.
This blog post was written by Marjan Grootveld from DANS and was previously published on the OpenAIRE blog.
It’s not just Colonel Hannibal Smith who loves it when a plan comes together. Don’t we all? On a more serious note, this also holds for Data Management Plans or DMPs. In a DMP a researcher or research team describes what data goes into a project (reuse) and comes out of it (potential reuse), How the team takes care of the data, and Who is allowed to do What with the data When.
Just like a project plan a DMP undergoes a reviewing process. Often, however, researchers share their draft version and questions with research support staff and data stewards (see the results of this survey by OpenAIRE and the FAIR Data Expert Group). About twenty data stewards shared their review and pre-view experiences in a lively session at the Technical University Delft on May 24th. During the day the organisers and speakers highlighted various aspects of data stewardship with a welcome focus on practice situations, especially in the break-out sessions. (When the presentations are available we will add a link to this blog post.)
In the session called “Why is this a good Data Management Plan?” Marjan Grootveld (DANS, OpenAIRE) and Ellen Leenarts (DANS, EOSC-hub) presented text samples taken from DMPs. By raising their hands – or not! – and in the subsequent discussion, the participants gave their view on the quality of the sample DMP texts. For instance, the majority gave a thumbs-up for “A brief description of each dataset is provided in table 2, including the data source, file formats and estimated volume to plan for storage and sharing”. In contrast, the quote “Both the collected and the generated data, anonymised or fictional, are not envisioned to be made openly accessible.” drew a good laugh and the thumbs went down. Similarly, the information that the length of time for which the data will remain re-usable “may vary for the type of data and <is> difficult to specify at this stage of the project” was found not acceptable; the plan should at least explain why it is difficult, and how and when the project team will nevertheless provide a specific answer. And is it really more difficult than for other projects, whose DMPs do provide this information?
Although it can be hard to be specific in the first version of a DMP, it’s essential to demonstrate that you know what Data Management is about, and that you will deliver FAIR and maximally Open data. Does the DMP, for instance, tell what kind of metadata and documentation will be shared to provide the necessary context for others to interpret the data correctly? Does it distinguish between storing the data during the project and sustainably archiving them afterwards? (Yes, we had a sample text neatly describing the file formats during the data processing stage versus the file formats for sharing and preservation.)
There was consensus in the group on the quality of most of the quotes. Where opinions differed, this was mainly because the quotes were brief and therefore open to a more lenient or a more picky interpretation. In other cases, a sample text had both positive and negative aspects. For instance, “The source code will be released under an open source licensing scheme, whenever IPR of the partners is not infringed.” was found rather hedging (“whenever”) and unspecific (which licensing scheme?), but the plan to make the source code available as well is good; too often this seems to be forgotten when the notion of “data” is understood in a limited way.
The session participants agreed that a plan with many phrases like “where suitable/ where appropriate/ should/ possibly” is too vague and doesn’t inspire much trust. On the other hand, information on who is responsible for particular data management activities is valuable, and so is planning like “The work package leaders will evaluate and update the DMP at least in months 12, 24 and 36”. Reviewers prefer explicit information and commitment to good intentions – which may be something to keep in mind for your “Open A-Team”.
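For readers who like to check their own drafts, the point about vague wording can be turned into a quick self-test. The snippet below is only a minimal sketch, not a tool used in the session: it counts the hedging phrases quoted above in a piece of DMP text. The function names `count_hedges` and `hedge_density` and the sample sentence are made up for illustration, and a high count is a prompt to reread the text, not a verdict on the plan.

```python
import re

# Hedging phrases mentioned by the session participants; extend as needed.
HEDGES = ["where suitable", "where appropriate", "should", "possibly"]


def count_hedges(text, phrases=HEDGES):
    """Return a dict mapping each hedging phrase to its number of occurrences."""
    lowered = text.lower()
    counts = {}
    for phrase in phrases:
        # Word boundaries avoid matching inside longer words.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, lowered))
    return counts


def hedge_density(text, phrases=HEDGES):
    """Return the number of hedging phrases per 100 words (0.0 for empty text)."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return 100.0 * sum(count_hedges(text, phrases).values()) / len(words)


# Hypothetical sample sentence, written for this example only.
sample = ("Data should be shared where appropriate and will possibly "
          "be archived where suitable.")
print(count_hedges(sample))
print(round(hedge_density(sample), 1))
```

A plan that states who does what and when, like the work package example above, would score close to zero on such a check.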
This report is also available in a pdf version on the Open Science Framework: https://doi.org/10.17605/OSF.IO/JR9U2
On 15 and 16 March 2018, two events dedicated to Electronic Lab Notebooks (ELNs) took place at TU Delft Library: “Digital Notebooks – productivity tools for researchers” and “Digital Notebooks – how to provide solutions for researchers?”. The events were organized by the Research Data Services, TU Delft Library. Both events attracted a lot of attention nationally and internationally, and the tickets sold out quickly. We were very happy to see the amount of interest in these events, and the inspiring discussions initiated by the participants. During my PhD study in molecular biology and genetics, I always felt the need for a digital tool to manage my research data. As the Data Steward at the TU Delft Faculties of Applied Sciences and Mechanical, Maritime and Materials Engineering, I am now responsible for addressing the data management needs of the researchers at these faculties. Therefore, it was especially interesting for me to join these events and explore the currently available tools. Below is a report of the first day.
The need for digital notebooks
Many academic researchers use paper notebooks to document all sorts of experimental details ranging from date, purpose, methodology and raw/analyzed data to conclusions. The main problem with paper-based notebooks is that they are not searchable, especially considering that each researcher typically leaves behind a shelf full of such notebooks. As a result, it often becomes very difficult to find the results and details of experiments performed by previous lab members or even just to read and understand the related handwritten notes. Moreover, paper notebooks can mostly store only a printed copy of the finalized dataset, which is not reusable. Furthermore, in a paper notebook, it is impossible to directly link the experimental details to all of the raw, intermediate and final datasets, which are mostly digital. Together, these issues not only decrease research efficiency but also present challenges to research reproducibility, which is a particularly important issue in the light of the current reproducibility crisis in science.
Digital notebooks provide a searchable alternative to traditional paper-based notebooks, and additionally offer lots of time-saving integrations – with various cloud storage platforms, calendars and project management tools.
Digital Notebooks – productivity tools for researchers on 15th of March
This full-day event was aimed at researchers, students, and supervisors who are interested in making their research digital, and research support staff who want to learn more about ELNs and how ELNs could meet the needs of researchers. All of the presentations in this event can be found here: DOI 10.5281/zenodo.1247390.
Image from the presentation by Esther Maes
Esther Maes from TU Delft Library opened the event by stressing the importance of archiving, which is required not only to minimize the risk of losing data but also to avoid fraud. She continued by asking intriguing questions: “What happens when you leave? How can people access the correct version of your data? Is it even easily accessible for you?”
Then Alastair Dunning, the head of TU Delft Research Data Services and 4TU.Centre for Research Data, took the lead and emphasized that data documentation is a time-consuming process, involving many disjointed jumps such as experimenting, analyzing, indexing and publishing. There is therefore a need to make data documentation smoother. He concluded with a valuable remark: a new digital solution cannot have poorer usability than the existing paper ones.
The rest of the morning sessions focused on case studies from researchers who not only use digital notebooks in daily practice but also took the lead in the implementation of ELNs in their research groups and institutes.
Image from the presentation by Alastair Dunning
Case study 1: Let’s go digital; keeping track of your research using eLABJournal by Evelien Stouten from the Department of Biology, Utrecht University
Evelien Stouten explained that researchers expect an ELN to be not only well organized and searchable but also suitable for integration with other tools and software packages, for adding literature references and for sharing data with collaborators. She also highlighted that an ELN is expected to provide safe data storage and be fraud-proof, meaning that everything that is documented remains traceable, even if it is deleted or changed.
Image from the presentation by Evelien Stouten
The Faculty of Science at Utrecht University started discussing ELNs in 2013 and the researchers were invited to take part in a test phase from 2014. Her research group found out that eLABJournal meets their expectations and provides an additional application suitable for their needs, namely eLABInventory. This application enables digital documentation and categorization of samples such as strains, plasmids, cell lines, chemicals, antibodies, RNA and DNA samples, and linking of these samples to the experimental data. She stressed that they are obligated by law to keep records of all genetically modified organisms (GMO) and usage of eLABInventory is currently obligatory for all Utrecht University labs using GMOs. She also mentioned that they find the mobile app useful since it enables the researchers to use eLABJournal also on their phones or tablets when they are working in the lab.
She concluded her talk by pointing out that some people are really attached to their paper lab journals and it might take some effort to convince them to start using an ELN, even when it is made obligatory.
Case study 2: From paper to screen: What users really think about electronic lab notebooks by Katharina Hanika, Department of Plant Sciences, Wageningen University
Image from the presentation by Katharina Hanika
Katharina Hanika shared with the audience her experience with eLABJournal and her insights into using ELNs. She focused on why to switch from paper to screen by listing the pros and cons of ELNs. As pros, she listed the readable, structured and searchable information, digital storage of samples, and easy collaboration with colleagues not only for sharing or discussing data but also for version control. As for cons, she pointed out that the startup was time-intensive since it takes time to figure out how the program works. Moreover, a good internet connection is required as eLABJournal is web-based. Although eLABJournal is still being improved, she sees that as an advantage, since the company provides support and adjusts the product to the needs of the researchers.
She went on to discuss how to achieve department-wide implementation of ELNs. She suggested that it is best to start with volunteers since it is challenging to convince the “creatures of habit” to change their ways of working. She pointed out that if researchers try ELNs on their own, they can get frustrated and give up, and therefore it is a good idea to start with online demonstrations and hands-on exercises. It would also be beneficial to assign experienced ELN users as contact persons for questions. Moreover, creating an ELN user group would enable researchers to help each other.
She concluded her talk by stating that any (electronic) lab notebook is only as good as its user and what it takes is time, commitment and adaptability.
Case study 3: Enabling connectivity in electronic laboratory notekeeping – a pilot approach in biomedical sciences by Harald Kusch, University Medical Center Göttingen
Image from the presentation by Harald Kusch
Harald Kusch talked about the pilot implementation of RSpace at CRC 1002 Research Data Platform. He highlighted that using an ELN enables linking of experimental data to other relevant elements, such as catalogs for cell lines, mouse lines, and antibodies, as well as databases. He explained the possible ways of structuring data in an ELN, which are chronological, project-oriented and method-oriented. Although it is a challenge to decide which is the best option, the chronological option is the only option in a paper lab journal. He described that RSpace allows both structured and unstructured documentation. Structured documentation is very handy, especially for new people in the lab, as it allows using centralized protocols and facilitates metadata recording. Meanwhile, unstructured documentation offers room for creativity and is especially suitable for new lab protocols. He also stressed that all versions of each document are saved, which prevents fraud. He explained that the data can be exported in different formats, such as PDF, HTML, and XML. Moreover, RSpace offers interfaces for easy transfer of datasets to data repositories such as Dataverse. He concluded his talk by emphasizing that the start-up phase takes time.
Interactive questions from the audience
During this interactive session, the audience had the chance to put their questions to the presenters of the case studies. Most questions focused on the following topics:
Where is the data stored? Is institutional data storage an option?
- Both eLABJournal and RSpace give the institutional data storage option to their users.
How to use an ELN in a lab environment without going back and forth between the lab and the office to write down notes?
- Katharina: There are fixed tablets available in the lab, some people directly type in the tablet, some make handwritten notes and go back to their PCs.
- Harald: Not every lab can afford a tablet per lab member, but it may also not be necessary.
- Evelien: Not everyone types right away, some prefer to make small notes and then type it in the ELN in the office.
What happens to the hyperlinks added in an ELN if folders are moved? Do the links still work?
- If the name or location is changed, the link would indeed break but at least it is possible to trace back to the previous link. There is no direct solution available yet.
Does setting up an ELN in a department need a fully dedicated staff member?
- Ideally, a lab member who knows the type of research and its needs takes the lead in implementing the ELN.
Keynote by Alastair Downie, University of Cambridge: Choosing an Electronic Lab Notebook
Alastair Downie said that the first ELN appeared in 1997 and that industry was quick to adopt it, while this was not the case in academia. He explained that industry has a variety of incentives to use ELNs, such as the requirement for absolutely consistent processes, protection of intellectual property and other commercial and corporate responsibilities. He answered the question “What is holding universities back?” by saying that there are so many different types of ELNs and so many different types of research and research needs that it is difficult to find the ideal solution. To make it easier for researchers to choose an ELN, he prepared a valuable resource with an overview of the available solutions. This resource covers the following topics:
- What is an electronic lab notebook and why should I use one?
- A note about DIY systems
- ELN vs LIMS
- Disengagement – what if I want to change systems?
- Narrowing the scope, creating a shortlist
- Evaluating ELN products
- Table of 25 current ELN products
- Discussion forum
As an alternative to the available ELNs, he introduced Do-It-Yourself (DIY) ELNs, which can be built with tools such as Evernote, OneNote, Asana, Basecamp, Dropbox and OneDrive. He emphasized that using one of these tools as a DIY ELN still requires a very disciplined approach; however, without any ELN, one needs to be even more structured. He also stressed that these tools are not designed to be used as an ELN and therefore do not provide custom solutions.
Image from the presentation by Alastair Downie
He also focused on the question “What if you chose the wrong product?”. It is possible that after implementing an ELN, the ELN software changes and no longer suits the research needs of the users. If you stop using an ELN, in most cases all you can export is PDF, HTML or XML files, but on the other hand at least such files are easily accessible and searchable and can be backed up and securely stored.
Then he focused on creating a shortlist to find the ideal option:
- Do you have a budget?
  - Free or a paid ELN? Is a paid ELN worth the money?
- Will you use the software as an individual or a group?
  - Collaborative vs self-contained, comprehensive vs lightweight
- Do you need team collaboration and supervisor features?
  - Group activity dashboard, commenting & discussions
  - Constant discussion, even if the group leader is away
- Departmental or institutional deployment?
  - Please everyone? Or focus on stability, accessibility, and universal relevance?
- Do you need multi-operating system (OS) compatibility?
  - Browser-based & OS agnostic, or application-based
- What devices will be used to operate the software?
  - Tablets on bench? Voice recognition? Phones? Paper?
- Data security and compliance requirements?
  - GDPR compliance? Local storage?
He further explained how to evaluate the shortlisted products:
- Interface design: Is the look and feel user-friendly, intuitive and efficient?
- Workflow suitability: Does the ELN workflow match your own workflow?
- Content creation tools: Writing, drawing, annotation, markup, equations, chemical structures…
- Data management & storage features: Upload typical file types/sizes? Larger files? Display/operation? Backed-up?
- Integration with other software and/or cloud services: Office apps, Statistics, Institutional storage, Community repositories…
- Collaboration features: Share data and comments in a group? Invite external collaboration?
- Group leader/Supervisor features: Sufficient oversight and feedback tools? Team/account management?
- Export features: Pages, sections, entire ELN? Data in original formats?
More detailed information can be found at: www.gurdon.cam.ac.uk/eln
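One way to put such a checklist to work in a group is a simple weighted scoring sheet. The snippet below is a minimal sketch under stated assumptions, not something presented in the keynote: the criterion labels are shortened from the list above, while the 0 to 5 scores, the weights, the product names and the function names `weighted_score` and `rank` are hypothetical placeholders that a group would replace with its own judgments.

```python
# Criterion labels shortened from the evaluation list above.
CRITERIA = [
    "interface design",
    "workflow suitability",
    "content creation tools",
    "data management & storage",
    "integration",
    "collaboration",
    "supervisor features",
    "export",
]


def weighted_score(scores, weights):
    """Return the weighted average of the scores; weights need not sum to 1."""
    missing = [c for c in weights if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


def rank(products, weights):
    """Return (name, score) pairs sorted from best to worst."""
    scored = [(name, weighted_score(s, weights)) for name, s in products.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: this imaginary group cares twice as much about export.
weights = {c: 1.0 for c in CRITERIA}
weights["export"] = 2.0

# Hypothetical products and scores (0 = poor, 5 = excellent).
products = {
    "Product A": {c: 3 for c in CRITERIA},
    "Product B": {**{c: 3 for c in CRITERIA}, "export": 5},
}

for name, score in rank(products, weights):
    print(f"{name}: {score:.2f}")
```

The numbers matter less than the conversation they force: agreeing on the weights makes a group spell out which of the criteria above it actually depends on.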
Info from ELN providers about afternoon workshops
There were four ELN providers present at the event:
- RSpace – presentation by Richard Adams
- eLABJournal – presentation by Ulrike Dijkman, Florian Studener
- labfolder – presentation by Yannick Skop
- Hivebench – presentation by Wouter Haak and Julien Therier
Before the interactive demonstration sessions, each ELN provider was given the opportunity to give a pitch about their ELN product. The presentations in this session and the morning session can be found here: DOI 10.5281/zenodo.1247390.
Hands-on workshops and opportunity to test tools offered by various ELN providers
In this session, the participants were given the opportunity to try out the ELNs listed above and ask their questions directly to the providers. Here is the feedback that was given by the participants about each ELN at the end of the hands-on workshops:
After this event, we were contacted by researchers from various TU Delft departments to discuss the possibilities of implementing an ELN. Currently, we are in contact with researchers to determine what they expect and require from an ELN and we are planning to start a pilot study afterwards.
I would like to conclude this report by sharing the feedback given by the participants about this event:
First of all, I would like to thank the Research Data Services, TU Delft Library for organizing this very informative event. We also thank all the speakers for the informative presentations and all the participants for the fruitful discussions. Finally, I would like to give special thanks to Marta Teperek for her critical reading and inspiring suggestions during the preparation of this report.
Authors (in alphabetical order; underlined are the main authors of the blog post): Charlotte Buus Jensen, Valentino Cavalli, Maria Cruz, Raman Ganguly, Madeleine Huber, Mojca Kotar, Iryna Kuchma, Peter Löwe, Inge Rutsaert, Melanie Stummvoll, Gintare Tautkeviciene, Marta Teperek, Hannelore Vanhaverbeke
On 1 December 2017 Maria Cruz and Marta Teperek facilitated a workshop titled Evaluation of Research Careers fully acknowledging Open Science Practices. This was part of a larger conference – Digital Infrastructures for Research 2017 in Brussels, Belgium. The workshop was attended by about 15 people from various backgrounds: library professionals, repository managers, research infrastructure providers, members of international networks for research organisations and others. Below is the summary of what happened at the workshop, key discussions and suggested next steps.
Rationale for the workshop
The workshop was inspired by a report published by the European Commission’s Working Group on Rewards under the Open Science Policy Platform “Evaluation of Research Careers fully acknowledging Open Science Practices”. Noting that “exclusive use of bibliometric parameters as proxies for excellence in assessment (…) does not facilitate Open Science”, the report concludes that “a more comprehensive recognition and reward system incorporating Open Science must become part of the recruitment criteria, career progression and grant assessment procedures…” However, in order to make this a reality, multiple stakeholders need to be involved and take appropriate steps to recognise and implement open science practices. The workshop aimed at developing roadmaps for some of these stakeholders, identifying ways of effectively engaging with them, and discussing their possible goals and actions.
What happened on the day
The initial plan was to look into four different stakeholder groups: research institutions, funding bodies and governments, principal investigators, and publishers. However, in order to ensure group work and interaction between workshop participants given that only about 15 people attended the workshop, it was decided the focus would be solely on the first two stakeholder groups: research institutions and funding bodies and governments. These stakeholders were also identified in the original EC’s report.
The participants split into two teams, each trying to create a roadmap for a different stakeholder group using collaborative Google documents. To start with, the teams tried to address the following four questions for their stakeholders:
- What methods could be used to effectively engage with this stakeholder group and to ensure that they are willing to implement Open Science Practices in their research evaluation?
- What should be the goals for this stakeholder to fully implement Open Science Practices in research evaluation? What are the key milestones?
- What will be the main barriers to implementation of these goals and how to overcome them?
- Propose metrics which could be used to assess this stakeholder’s progress towards implementation of Open Science Practices in their research evaluation practices.
Subsequently, the groups swapped stakeholders, and reviewed and commented on the work of the other group, enriching the roadmaps and adding a broader perspective. The workshop concluded with a reporting session which brought the two groups together and allowed the attendees to engage in discussion.
Key observations about successfully engaging with research institutions
The participants identified internal and external drivers that are important for engaging with research institutions and encouraging them to change their academic rewards systems to ones based on open science practices. Not surprisingly, requirements for open science from funding bodies and governments were at the very top of the list of external drivers. If funders start identifying commitment to open science practices as funding criteria, institutions will have no choice but to reward researchers for open science in order to keep securing funding.
One of the most appealing internal drivers discussed was lobbying within institutions by prominent researchers who are themselves committed to open science: this could not only help institutions roll out policy changes, but also demonstrate to younger researchers that commitment to open science might be valuable for their careers.
Key observations about successfully engaging with funding bodies and governments
Quite interestingly, external drivers were also seen as important factors for engaging with funding bodies and governments. Joint statements from several academic institutions were mentioned as tangible ways to establish effective collaborations with funding bodies. Therefore, there seems to be a need for synergy between institutions and funding bodies/governments. In addition, it was stressed that better networks between international funding agencies and governments might also lead to cross-fertilisation of ideas and good practice exchange. For example, the European Commission could advise Member States to develop national policies on open science.
Lack of credible metrics to measure the commitment to open science practices was discussed as one of the main barriers, which might discourage funders and governments from changing academic rewards systems.
Can quality be measured with quantitative metrics?
The initial discussion about the lack of credible evaluation metrics as a potential barrier preventing funding bodies and governments from changing their academic rewards systems led to a longer debate about the usefulness of metrics in open science in general. One of the participants mentioned that a new metric, analogous to the journal impact factor but tailored to research data, could potentially offer a solution to the problem. However, others felt that it might be simply inappropriate to measure qualitative outcomes with quantitative metrics, and that such an approach risks replicating all the flaws of metrics based on the journal impact factor. It was proposed that high-quality peer review of selected outputs should be emphasised and promoted (and rewarded as well) instead.
The short-term aim is to share the outcomes of this workshop with the authors of the report “Evaluation of Research Careers fully acknowledging Open Science Practices” by the European Commission’s Working Group on Rewards under the Open Science Policy Platform.
In addition, roadmaps for the two remaining stakeholder groups (publishers and principal investigators) need to be drafted. Moreover, as pointed out by participants of this workshop, even though it could be impossible (or not desirable) to create metrics for commitment to open science practices, it would still be valuable to develop frameworks for the different stakeholders to provide them with broad guidelines as to what kind of achievements could be rewarded. The same frameworks could also be used by researchers as a source of inspiration and motivation for open science.
Finally, one of the key drivers for change identified during the workshop was funding bodies and pilot funding schemes to which only researchers able to demonstrate commitment to openness could apply. Such funding schemes would not only allow the community to learn about suitable ways of assessing open science practices, but would also provide researchers practising open science with immediate benefits and much needed recognition.
- Slides in support of the workshop
- Roadmaps prepared by the workshop participants for the two different stakeholder groups
- European Commission’s Working Group on Rewards under Open Science “Evaluation of Research Careers fully acknowledging Open Science Practices”
Written by: Marta Teperek and Madeleine de Smaele
On 3 November 2017 Madeleine de Smaele from TU Delft Library was invited by Scott Cunningham, Associate Professor at the Faculty of Technology, Policy & Management, to deliver a workshop to his Data Science students. Marta Teperek attended the workshop as an observer and, given that she only started working at TU Delft on 15 August, it was also a good opportunity for her to learn more about data management support available to researchers.
Below are our key reflections on that session.
Structure and content
Madeleine’s session was divided into two parts, each lasting 45 minutes, with a 15-minute break in between. The session was a mixture of Madeleine’s presentation and some interactive exercises.
Part I – finding datasets
The first part introduced:
- The key elements of research data management: data backup, file organisation and data management plans
- Information about data licensing, with the CC kiwi video explaining the different types of Creative Commons licences
- Information about research data repositories and finding existing research datasets
The first part concluded with an interactive exercise where participants were asked to find a repository and a dataset of interest for their research, by using re3data.org. Afterwards, we had a roundtable discussion about the datasets found by the participants and what was good and what not so good about them (e.g. clear licence, citation, DOI).
Part II – publishing own datasets
In the second part of the workshop, we discussed the benefits and ways of publishing one’s own research data. We thought this was relevant to the course participants as they had been working on a dataset for their data science course. We thought that they might be interested in sharing their study results in a repository, and thus getting credit for their work. We spoke about DOIs, visibility and tracking citations.
The second part finished with an exercise as well, in which participants could practise depositing research data into the 4TU.Centre for Research Data.
This was the first time that TU Delft Library had delivered such a presentation to students, so we thought it was necessary to ask the participants for feedback afterwards to see how the session could be improved in the future.
What went well
We were happy to see that participants valued the interactive exercise on finding existing datasets and that they liked the information we provided about data sharing possibilities. Many participants were also happy to learn about the various repositories available for them to use (not only for datasets), as well as about the dedicated support available to them at TU Delft.
We were also happy to see that students liked the slides and they valued the presenter.
What could be improved
It was also extremely useful for us to learn how our sessions could be improved in the future.
The primary suggestion was to tailor the content to the level of knowledge of the students. It turned out that students were already familiar with the principles behind good data management and the benefits of data sharing, and therefore wished for a faster pace and more focus on the parts they were not yet familiar with. In addition, the participants wanted to see more examples tailored to their discipline and types of research.
The other suggestion was to make the session more interactive: to ask more questions and to facilitate more discussion throughout the session. This could also allow the presenter to present the right content to the participants during the presentation.
In the future, we will want to find out more about the audience in advance of the workshop to ensure that we can tailor the messages, examples and pace of the session better. We will also revise the content of the workshop to make it more interactive and to facilitate more discussions with the participants throughout the session.
In addition, we also had issues with accessing the live version of the 4TU.Centre for Research Data during the demo, which was quite unfortunate. To future-proof ourselves, we will prepare some screenshots of a deposit process and always have the slides with us during similar presentations.
Overall, it was a very useful exercise for us and provided us with a lot of ideas on how we could improve the workshop in the future. We are very grateful to both Dr Scott Cunningham and his students for the opportunity.
- Madeleine’s presentation: https://doi.org/10.6084/m9.figshare.5579731.v1