Late last month, I took a day trip to the Netherlands to attend an event at TU Delft entitled “Towards cultural change in data management – data stewardship in practice”. My Software Sustainability Institute Fellowship application “pitch” last year had been based around building bridges and sharing strategies and lessons between advocacy approaches for data and software management, and encouraging more holistic approaches to managing (and simply thinking about) research outputs in general. When I signed up for the event I expected it to focus exclusively on research data, but upon arrival at the venue (after a distressingly early start, and a power-walk from the train station along the canal) I was pleasantly surprised to find that one of the post-lunch breakout sessions was on the topic of software reproducibility, so I quickly signed up for that one.
I made it in to the main auditorium just in time to hear TU Delft’s Head of Research Data Services, Alastair Dunning, welcome us to the event. Alastair is a well-known face in the UK, hailing originally from Scotland and having worked at Jisc prior to his move across the North Sea. He noted the difference between managed and Open research data, a distinction that translates to research software too, and noted the risk of geographic imbalance between countries which are able to leverage openness to their advantage while simultaneously coping with the costs involved – we should not assume that our northern European privilege is mirrored all around the globe.
Danny Kingsley during her keynote presentation
The first keynote came from Danny Kingsley, Deputy Director of Scholarly Communication and Research Services at the University of Cambridge, whom I also know from a Research Data Management Forum event I organised last year in London. Danny’s theme was the role of research data management in demonstrating academic integrity, quality and credibility in an echo-chamber/social media world where deep, scholarly expertise itself is becoming (largely baselessly) distrusted. Obviously as more and more research depends upon software driven processing, what’s good for data is just as important for code when it comes to being able to reproduce or replicate research conclusions; an area currently in crisis, according to at least one high profile survey. One of Danny’s proposed solutions to this problem is to distribute and reward dissemination across the whole research lifecycle, not only attaching credit and recognition/respect to traditional publications, but also to datasets, code and other types of outputs.
Questions from the audience
After a much-appreciated coffee break, Marta Teperek introduced TU Delft’s Vision for data stewardship, which, again, has repercussions and relevance beyond just data. The broad theme of “Openness”, for example, is one of the four major principles in current TU Delft strategic plan, indicating the degree of institutional support it has as an underpinning philosophy. Marta was keen to emphasise that the cohort of data stewards which Delft have recently hired are intended to be consultants, not police! Their aim is to shift scholarly culture, not to check or enforce compliance, and the effectiveness of their approach is being measured by regular surveys. It will be interesting to see how they have got on in a year or two years’ time: already they are looking to expand from one data steward per faculty to one per department.
There followed a number of case studies from the Delft data stewards themselves. My main takeaways from these were the importance of mixing top-down and bottom-up approaches (culture change has to be driven from the grassroots, but via initiatives funded by the budget holders at the top), and the importance of driving up engagement and making people care about these issues.
Data Stewards answering questions from the audience
After lunch we heard from a couple of other European universities. From Martine Pronk, we learned that Utrecht University stripes its research support across multiple units and services, including library and the academic departments themselves, in order to address institutional, departmental, and operational needs and priorities. In common with the majority of UK universities, Utrecht’s library is main driving and coordination force, with specific responsibility for research data management being part of the Research IT programme. From Stockholm University’s Joakim Philipson we heard about the Swedish context, which again seemed similar to the UK’s development path and indeed my own home institution’s. Sweden now has a national data services consortium (the SND), analogous to the DCC in the UK, and Stockholm, like Edinburgh, was the first university in its country to have a dedicated RDM policy.
We then moved into our breakout groups, in my case the one titled “Software reproducibility – how to put it into practice?”, which had a strange gender distribution with the coordinators all female, but the other participants all male. One of the coordinators noted that this reminded her of being an Engineering undergraduate again. We began by exploring our own roles and levels of experience/understanding of research software. The group comprised a mixture of researchers, software engineers, data stewards and ‘other’ (I fell into this last category), and in terms of hands-on experience with research software roughly two-thirds of participants were actively developing software, and another third used it. Participants came from a broad range of research backgrounds, as well as a smaller number of research support people such as myself. We then voted on how serious we felt the aforementioned reproducibility crisis actually was, with a two-thirds/one-third split between “crisis” and “what-crisis?” We explored the types of issues that come to mind when we think about software preservation, with the most popular responses being terms such as “open source”, “GitHub” and “workflows”. We then moved on to the main business of the group, which was to consider a recent article by Hut, van de Giesen and Drost. In a nutshell, this says that archiving code and data is not sufficient to enable reproducibility, therefore collaboration with dedicated Research Software Engineers (RSEs) should be encouraged and facilitated. We broke into smaller groups to discuss this from our various standpoints, and presented back in the room. The various notes and pitches are more detailed than this blog post requires, but those interested can check out the collaboratively-authored Google Doc to see what we came up with. The breakout session will also be written up as a blog post and an IEEE proposal, so keep an eye out for that.
After returning to the main auditorium for reports from each of the groups, including an interesting-looking one from my friend and colleague Marjan Grootveld on “Why Is This A Good Data Management Plan?”, the afternoon concluded with two more keynote presentations. First up, Kim Huijpen from VSNU (the Association of Universities in the Netherlands) spoke about “Giving scientists more of the recognition they deserve”, followed by Ingeborg Verheul of LCRDM (the Dutch national coordination point for research data management), whose presentation was titled “Data Stewardship? Meet your peers!” Both of these national viewpoints were very interesting from my current perspective as a member of a nationally-oriented organisation. From my coming perspective as manager of an institutional support service – I’m in the process of changing roles at the moment – Kim’s emphasis on Team Science struck a chord, and relates to what we’re always saying about research data: it’s a hybrid activity, and takes a village to raise a child, etc. Ingeborg spoke about the dynamics involved between institutional and national level initiatives, and emphasised the importance of feeling like part of a community network, with resources and support which can be drawn upon as needed.
Closing the event, TU Delft Library Director Wilma van Wezenbeek underlined the necessity of good data management in enabling reproducible research, just as the breakout group emphasised the necessity of software preservation, and in effect confirming a view of mine that has been developing recently: that boundaries between managing data and managing software (or other types of research output) are often artificially created, and not always helpful. We need to enable and support more holistic approaches to this, acting in sympathy and harmony with actual research practices. (We also need to put our money where our mouth is, and fund it!)
After all that there was just enough time for a quick beer in downtown Delft before catching the train and plane back to Edinburgh. Many thanks to TU Delft for hosting a most enjoyable and interesting event, and to the Software Sustainability Institute whose support covered the costs of my attendance.
Several resources from the event are now available:
- Recorded presentations (via YouTube)
- Blog posts:
This blog post reports from a workshop session led by Marjan Grootveld and Ellen Leenarts from DANS. The workshop was part of a larger event “Towards cultural change in data management – data stewardship in practice” organised by TU Delft Library on 24th of May 2018.
This blog post was written by Marjan Grootveld from DANS it was published before on the OpenAIRE blog.
It’s not just colonel Hannibal Smith, who loves it when a plan comes together. Don’t we all? On a more serious note, this also holds for Data Management Plans or DMPs. In a DMP a researcher or research team describes what data goes into a project (reuse) and comes out of it (potential reuse), How the team takes care of the data, and Who is allowed to do What with the data When.
Just like a project plan a DMP undergoes a reviewing process. Often, however, researchers share their draft version and questions with research support staff and data stewards (see the results of this survey by OpenAIRE and the FAIR Data Expert Group). About twenty data stewards shared their review and pre-view experiences in a lively session at the Technical University Delft on May 24th. During the day the organisers and speakers highlighted various aspects of data stewardship with a welcome focus on practice situations, especially in the break-out sessions. (When the presentations are available we will add a link to this blog post.)
In the session called “Why is this a good Data Management Plan?” Marjan Grootveld (DANS, OpenAIRE) and Ellen Leenarts (DANS, EOSC-hub) presented text samples taken from DMPs. By raising their hands – or not! – and subsequent discussion the participants gave their view on the quality of the sample DMP texts. For instance, the majority gave a thumbs-up for “A brief description of each dataset is provided in table 2, including the data source, file formats and estimated volume to plan for storage and sharing”. In contrast, the quote “Both the collected and the generated data, anonymised or fictional, are not envisioned to be made openly accessible.” drew a good laugh and the thumbs went down. Similarly, the information that the length of time for which the data will remain re-usable “may vary for the type of data and <is> difficult to specify at this stage of the project” was found not acceptable; the plan should a least explain why it is difficult, and how and when the project team nevertheless will provide a specific answer. And is it really more difficult than for other projects, whose DMPs do provide this information?
Although it can be hard to be specific in the first version of a DMP, it’s essential to demonstrate that you know what Data Management is about, and that you will deliver FAIR and maximally Open data. Does the DMP, for instance, tell what kind of metadata and documentation will be shared to provide the necessary context for others to interpret the data correctly? Does it distinguish between storing the data during the project and sustainably archiving them afterwards? (Yes, we had a sample text neatly describing the file formats during the data processing stage versus the file formats for sharing and preservation.)
There was consensus in the group on the quality of most of the quotes. Where opinions differed, this had mainly to do with the fact that the quotes were brief and therefore open to more lenient or more picky interpretation. In other cases, a sample text had both positive and negative aspects. For instance, “The source code will be released under an open source licensing scheme, whenever IPR of the partners is not infringed.” was found rather hedging (“whenever”) and unspecific (which licensing scheme?), but the plan to make also source code available is good; too often this seems to be forgotten, when the notion of “data” is understood in a limited way.
The session participants agreed that a plan with many phrases like “where suitable/ where appropriate/ should/ possibly” is too vague and doesn’t inspire much trust. On the other hand, information on who is responsible for particular data management activities is valuable, and so is planning like “The work package leaders will evaluate and update the DMP at least in months 12, 24 and 36”. Reviewers prefer explicit information and commitment to good intentions – which may be something to keep in mind for your “Open A-Team“.
This report is also available in a pdf version on the Open Science Framework: https://doi.org/10.17605/OSF.IO/JR9U2
On 15th and 16th of March 2018, two events dedicated to Electronic Lab Notebooks (ELNs) took place at TU Delft Library: “Digital Notebooks – productivity tools for researchers” and “Digital Notebooks – how to provide solutions for researchers?”. The events were organized by the Research Data Services, TU Delft Library. Both events attracted a lot of attention nationally and internationally, and the tickets got quickly sold out. We were very happy to see the amount of interest in these events, and the inspiring discussions initiated by the participants. During my PhD study in molecular biology and genetics, I have always felt the need for a digital tool to manage my research data. Currently being the Data Steward at the TU Delft Faculties of Applied Sciences and Mechanical, Maritime and Materials Engineering, my responsibility is to address the data management needs of the researchers at these faculties. Therefore, it was especially interesting for me to join these events and explore the currently available tools. Below is a report of the first day.
The need for digital notebooks
Many academic researchers use paper notebooks to document all sorts of experimental details ranging from date, purpose, methodology and raw/analyzed data to conclusions. The main problem with paper-based notebooks is that they are not searchable, especially considering that each researcher typically leaves behind a shelf full of such notebooks. As a result, it often becomes very difficult to find the results and details of experiments performed by previous lab members or even just to read and understand the related handwritten notes. Moreover, paper notebooks mostly could store only a printed copy of the finalized dataset, which is not reusable. Furthermore, in a paper notebook, it is impossible to directly link the experimental details to all of the raw, intermediate and final datasets which are mostly digital. Together all of these do not only decrease research efficiency but also presents challenges to research reproducibility, which is a particularly important issue in the light of the current reproducibility crisis in science.
Digital notebooks provide a searchable alternative to paper-based traditional notebooks, and additionally offer lots of efficiency-saving integrations – with various cloud storage platforms, calendars and project management tools.
Digital Notebooks – productivity tools for researchers on 15th of March
This full-day event was aimed at researchers, students, and supervisors who are interested in making their research digital, and research support staff who want to learn more about ELNs and how could ELNs meet the needs of the researchers. All of the presentations in this event can be found here: DOI 10.5281/zenodo.1247390.
Image from the presentation by Esther Maes
Esther Maes from TU Delft Library opened the event stressing the importance of archiving and that archiving is required not only to minimize the risk of losing data but also to avoid fraud. She continued with asking intriguing questions: “What happens when you leave? How can people access the correct version of your data? Is it even easily accessible for you?”
Then Alastair Dunning, the head of TU Delft Research Data Services and 4TU.Centre for Research Data, took the lead and emphasized that data documentation is a time-consuming process, involving many disjointed jumps such as experimenting, analyzing, indexing and publishing, therefore there is a need for making data documentation smoother. He finalized his speech with a valuable remark stating that a new digital solution cannot have poorer usability than the existing paper ones.
The rest of morning sessions focused on case studies from researchers who not only use digital notebooks in daily practice but also took the lead in the implementation of the ELNs in their research groups and institutes.
Image from the presentation by Alastair Dunning
Case study 1: Let’s go digital; keeping track of your research using eLABJournal by Evelien Stouten from the Department of Biology, Utrecht University
Evelien Stouten described that the researchers expect an ELN to be not only well-organized and searchable but also suitable for integration with other tools and software packages, adding literature references and data sharing with collaborators. She also highlighted that an ELN is expected to provide safe data storage and be fraud-proof, meaning that everything that is documented remains traceable, even if it is deleted or changed.
Image from the presentation by Evelien Stouten
The Faculty of Science at Utrecht University started discussing ELNs in 2013 and the researchers were invited to take part in a test phase from 2014. Her research group found out that eLABJournal meets their expectations and provides an additional application suitable for their needs, namely eLABInventory. This application enables digital documentation and categorization of samples such as strains, plasmids, cell lines, chemicals, antibodies, RNA and DNA samples, and linking of these samples to the experimental data. She stressed that they are obligated by law to keep records of all genetically modified organisms (GMO) and usage of eLABInventory is currently obligatory for all Utrecht University labs using GMOs. She also mentioned that they find the mobile app useful since it enables the researchers to use eLABJournal also on their phones or tablets when they are working in the lab.
She concluded her talk by pointing out that some people are really attached to their paper lab journals and it might take some effort to convince them to start using it, even though it is made obligatory.
Case study 2: From paper to screen: What users really think about electronic lab notebooks by Katharina Hanika, Department of Plant Sciences, Wageningen University
Image from the presentation by Katharina Hanika
Katharina Hanika shared with the audience her experience with eLABJournal and her insights into using ELNs. She focused on why to switch from paper to screen by listing the pros and cons of ELNs. For pros, she indicated the readable, structured and searchable information, digital storage of samples, and easy collaboration with colleagues not only for sharing or discussing data but also for version control. As for cons, she pointed out that the startup was time-intensive since it takes time to figure out how the program works. Moreover, a good internet connection is required as eLABJournal is web-based. Although eLABJournal is still under improvement, she sees that as an advantage, since the company provides support and adjusts accordingly to needs of the researchers.
She further continued with discussing how to achieve department-wide implementation of ELNs. She suggested that it is best to start with volunteers since it is challenging to convince the “creatures of habit” to change their ways of working. She pointed out that if researchers try ELNs themselves, they can get frustrated and give up, and therefore it is a good idea to first start with online demonstrations and hands-on exercises. It would be also beneficial to assign the experienced ELN users as contact persons to be reached for questions. Moreover, creating an ELN user group would enable researchers to help each other.
She concluded her talk by stating that any (electronic) lab notebook is only as good as its user and what it takes is time, commitment and adaptability.
Case study 3: Enabling connectivity in electronic laboratory notekeeping – a pilot approach in biomedical sciences by Harald Kusch, University Medical Center Göttingen
Image from the presentation by Harald Kusch
Harald Kusch talked about the pilot implementation of RSpace at CRC 1002 Research Data Platform. He highlighted that using an ELN enables linking of experimental data to other relevant elements, such as catalogs for cell lines, mouse lines, and antibodies, as well as databases. He explained the possible ways of structuring data in an ELN, which are chronological, project-oriented and method-oriented. Although it is a challenge to decide which is the best option, the chronological option is the only option in a paper lab journal. He described that RSpace allows both structured and unstructured documentation. Structured documentation is very handy, especially for new people in the lab, as it allows using centralized protocols and facilitated metadata recording. Meanwhile, unstructured documentation offers room for creativity and is especially suitable for new lab protocols. He also stressed that all versions of each document are saved, which prevents fraud. He explained that the data can be exported in different formats, such as PDF, HTML, and XML. Moreover, RSpace offers interfaces for easy transfer of datasets to data repositories such as Dataverse. He finalized his talk emphasizing that start-up phase takes time.
Interactive questions from the audience
During this interactive session, the audience had the chance to ask their questions to the presenters of the case studies. Most questions were focused on the following topics:
Where is the data stored? Is institutional data storage an option?
- Both eLABJournal and RSpace give the institutional data storage option to their users.
How to use an ELN in a lab environment without going up and down between the lab and the office to write down notes?
- Katharina: There are fixed tablets available in the lab, some people directly type in the tablet, some make handwritten notes and go back to their PCs.
- Harald: Not every lab can afford a tablet per lab member, but it may also not be necessary.
- Evelien: Not everyone types right away, some prefer to make small notes and then type it in the ELN in the office.
What happens to the added hyperlinks in the ELNs if folders are moved, do links work still?
- If the name or location is changed, the link would indeed break but at least it is possible to trace back to the previous link. There is no direct solution available yet.
Does setting up an ELN in a department need a fully dedicated staff member?
- To be able to implement an ELN, an ideal way would be that a lab member who knows the research type and needs takes the lead to implement it.
Keynote by Alastair Downie, University of Cambridge: Choosing an Electronic Lab Notebook
Alastair Downie told that the first ELN came in 1997 and the industry was quick to adopt, while this was not the case with academia. He explained that the industry has a variety of incentives to use ELNs, such as the requirement for absolutely consistent processes, protection of intellectual property and other commercial and corporate responsibilities. He answered the question “What is holding universities back?” saying that there are so many different types of ELNs and so many different types of research and research needs which altogether makes it difficult to find the ideal solution. To make it easier for the researchers to choose an ELN, he prepared a valuable resource with an overview of the available solutions. In this source, information about a variety of issues are provided:
- What is an electronic lab notebook and why should I use one?
- A note about DIY systems
- ELN vs LIMS
- Disengagement – what if I want to change systems?
- Narrowing the scope, creating a shortlist
- Evaluating ELN products
- Table of 25 current ELN products
- Discussion forum
As an alternative option to the available ELNs, he introduced Do-It-Yourself (DIY) ELNs which could be made by using tools such as EVERNOTE, OneNote, asana, Basecamp, Dropbox, OneDrive. He emphasized that using one of these tools as a DIY ELN still requires a very disciplined approach; however, without any ELN, one needs to be even more structured. He also stressed that these tools are not designed to be used as an ELN and therefore do not provide custom solutions.
Image from the presentation by Alastair Downie
He also focused on the question “What if you chose the wrong product?”. It is possible that after implementing an ELN, the ELN software can change and may not be really suitable for the research needs of the users. If you stop using an ELN, in most cases all you can export is a PDF, HTML or XML file(s), but on the other hand at least such files are easily accessible and searchable and can be backed-up and securely stored.
Then he focused on creating a shortlist to find the ideal option:
- Do you have a budget?
- Free or a paid ELN? Is a paid ELN worth the money?
- Will you use the software as an individual or a group?
- Collaborative vs self-contained, comprehensive vs lightweight
- Do you need team collaboration and supervisor features?
- Group activity dashboard, commenting & discussions
- Constant discussion, even if the group leader is away
- Departmental or institutional deployment?
- Please everyone? Or focus on stability, accessibility, and universal relevance?
- Do you need multi-operating system (OS) compatibility?
- Browser-based & OS agnostic, or application-based
- What devices will be used to operate the software?
- Tablets on bench? Voice recognition? Phones? Paper?
- Data security and compliance requirements?
- GDPR compliance? Local storage?
He further explained how to evaluate the shortlisted products:
- Interface design: Look and feel user-friendly, intuitive and efficient?
- Workflow suitability: Does ELN workflow match your own workflow?
- Content creation tools: Writing, drawing, annotation, markup, equations, chemical structures…
- Data management & storage features: Upload typical file types/sizes? Larger files? Display/operation? Backed-up?
- Integration with other software and/or cloud services: Office apps, Statistics, Institutional storage, Community repositories…
- Collaboration features: Share data and comments in a group? Invite external collaboration?
- Group leader/Supervisor features: Sufficient oversight and feedback tools? Team/account management?
- Export features: Pages, sections, entire ELN? Data in original formats?
More detailed information can be found at: www.gurdon.cam.ac.uk/eln
Info from ELN providers about afternoon workshops
There were four ELN providers present at the event:
- RSpace – presentation by Richard Adams
- eLABJournal – presentation by Ulrike Dijkman, Florian Studener
- labfolder – presentation by Yannick Skop
- Hivebench – presentation by Wouter Haak and Julien Therier
Before the interactive demonstration sessions, each ELN provider was given the opportunity to give a pitch about their ELN product. The presentations in this session and the morning session can be found here: DOI 10.5281/zenodo.1247390.
Hands-on workshops and opportunity to test tools offered by various ELN providers
In this session, the participants were given the opportunity to try out the ELNs listed above and ask their questions directly to the providers. Here is the feedback that was given by the participants about each ELN at the end of the hands-on workshops:
After this event, we got contacted by researchers from various TU Delft departments to discuss the possibilities of implementing an ELN. Currently, we are in contact with researchers to determine what they expect and require from an ELN and we are planning to start a pilot study afterwards.
I would like to finalize this report by sharing the feedbacks given by the participants about this event:
This report is available in a pdf version on the Open Science Framework: https://doi.org/10.17605/OSF.IO/JR9U2
First of all, I would like to thank the Research Data Services, TU Delft Library for organizing this very informative event. We also thank all the speakers for the informative presentations and all the participants for the fruitful discussions. Finally, I would like to give special thanks to Marta Teperek for her critical reading and inspiring suggestions during the preparation of this report.
The TU Delft Research Data Framework Policy is currently going through the necessary wheels before its final ratification
Below is the current version currently being reviewed by stakeholders.
- 15th May – Discussion by Heads of University Department
- 5th June – Discussion with The Graduate School over implementation for PhDs
- 26th June – Final presentation (and hopefully adoption) to the College van Bestuur.
- July onwards – Faculties begin work on their own faculty policies
The 4TU.Centre for Research Data announces its report on research data management within the 4TU Research Centres.
Over the last few months, the 4TU.Centre for Research Data had the chance to make contact and to speak with several of the Scientific Directors of the 4TU Research Centres about research data management. The report published today highlights the findings from these contacts and conversations.
A citable version of the report is available on OSF Preprints (DOI: 10.17605/OSF.IO/SGFTW).
1. Research data management is not addressed at a strategic level by the 4TU Research Centres, but left to individual research groups or to individual researchers connected to the Centres.
2. Within the 4TU Research Centres, there is a broad range of attitudes towards data and a broad range of data types and characteristics, including large datasets; commercially sensitive datasets; privacy and ethical concerns regarding data; software and its sustainability.
3. Software sustainability is an important and much discussed topic, but there are currently no standards or systematic way of looking after software.
4. Research on human subjects and datasets including personally identifiable information or sensitive personal information are more prominent than might be expected in engineering and the technical sciences. Lack of transparency and reproducibility of scientific results can be an issue in these areas because the underlying datasets are often not available.
An Opportunity to Collaborate
Research data management is increasingly viewed as an important part of high-quality research. International and national funding bodies now mandate institutions and researchers to make data available. Data sharing is predicated on good research data management and has the potential to make scientific research more transparent, open, and efficient. In view of these principles and developments, the 4TU.Centre for Research Data wishes to maintain and deepen its links with the 4TU Research Centres and to support the Centres in various aspects of research data management.
We have an exciting job opening for a Data Steward at TU Delft at the Faculty of Architecture & Built Environment and the Faculty of Industrial Design (joint appointment): https://www.academictransfer.com/employer/TUD/vacancy/45483/lang/en/
- Closing date: 15 March 2018
- Salary: up to € 4084/month
- We are looking for individuals enthusiastic about data management and who have a PhD degree in the relevant subject area (or equivalent experience).
This is a great chance to join the dynamically growing team of Data Stewards at TU Delft and to contribute to a cultural change in research data management in a disciplinary manner. The job is really about inspiring the research community and improving day to day practices, and not about policy compliance.
All informal inquiries can be directed to me: M.Teperek@tudelft.nl
Author: Heather Andrews, Data Steward at the Faculty of Aerospace Engineering