Written by Shalini Kurapati and Marta Teperek
Training needs: research computing skills for open science
In addition to good data management, software sustainability is important for open science.
In accordance with the survey conducted by the Software Sustainability Institute in 2014, 7 out of 10 researchers rely on code for their research. Sharing research data without the supporting code often makes research impossible to reproduce. Good documentation and version control have been highlighted as major contributors to sustainable software. In addition, earlier workshops and survey results indicated that researchers need training on good code writing and code management practices and version control.
Similarly, TU Delft-wide survey on data management needs revealed that 32% of researchers were interested in training on version control and 18% specifically in software carpentry workshops.
What are The Carpentries?
The Carpentries “teach foundational coding, and data science skills to researchers worldwide.” That’s a community-based organisation, which maintains and develops curricula for three different types of workshops: software carpentry, data carpentry, and library carpentry. Detailed and structured lesson plans are available on GitHub and they are delivered by a network of carpentry instructors.
An important element of The Carpentries is that in order to deliver a workshop, instructors need to be certified. The certification process puts a particular emphasis on the pedagogical skills of the instructors.
First software carpentry at TU Delft
TU Delft hosted the first software carpentry workshop on 29 November 2018 as a pilot before officially joining The Carpentries. We had around 30 researchers participating (and another 45 on the waiting list!). The participants were from four faculties at TU Delft: Civil Engineering and Geosciences, Applied Sciences, Technology Policy & Management, and Architecture and Built Environment. We had three instructors and four helpers in the room.
The GitHub pages with the lesson materials are publicly available and can be found here: https://mariekedirk.github.io/2018-11-29-Delft/ All participants were asked to bring their laptops along and to install some specific software. No prior programming knowledge was required. Collaborative notes were taken with Etherpad.
During the workshop, participants downloaded a prepared dataset and they worked with that dataset through the two days. They learnt task automation using Unix shell, version control using git, and python programming using jupyter notebooks.
The Carpentries have a special way of organising feedback. Participants receive red and green post-it notes and use them to indicate problems / completion of tasks during the whole course. Similarly, after the end of each day, the participants are asked to indicate all the plus sides and negatives of the workshop on green and red post-it notes, respectively.
The feedback from the participants after the workshop helped us evaluate the training. The participants were overwhelmingly appreciative of the instructors and helpers and seem to have enjoyed the training. Some of the participants felt that the pace of the workshop was fast and they did not have time to experiment with the data set. Some others wished to get a more personal approach and to actually get an opportunity to work with their own disciplinary datasets.
Plans for the future
The waiting list for the workshop was very long and we had to disappoint more than 45 researchers who didn’t manage to get their spot on the day. In addition, faculty graduate schools have been willing to give course credits for PhD students who attend this workshop, which made the course even more attractive to attend for PhD students. Therefore, to meet the demand, we are planning to organise four more workshops in 2019: two workshops at TU Delft, one in Eindhoven and one in Twente. We will continue to monitor the number of interested researchers and if the need arises, we might consider scheduling some additional courses.
In addition, to increase our capacity in delivering carpentry training, some of the TU Delft’s data stewards and data champions will attend the training to become instructors. We hope to have this instructor training organised in April.
To address the feedback about the pace of the course, we will be more selective and include fewer exercises in our future workshops to ensure that the participants get the chance to experiment and play with their datasets and scripts.
In order to provide some more tailored support to researchers who have started to code but need some additional support to make it work, or who might have attended a carpentry workshop but are not sure how to apply the learning into practice, we will host dedicated coding walk-in hours consultations starting in January 2019.
So… watch out for the next carpentry workshop – scheduled for Spring 2019!
Post by Amit Gal, Alastair Dunning and Nicole Will
The research organisastion Ithaka S+R recently issued the report “Scholars ARE Collectors: A Proposal for Re-thinking Research Support”. The report takes a user-centred approach when trying to understand what would be a good way to support researchers in the future, and outline possible places to invest.
It makes the case that researchers are, in fact, collectors, and that their (often massive) collections vary widely in form across different disciplines. All of these collections however, are not properly managed – which is quite understandable, as “collecting” requires a different set of skills and tools than “researching”.
From the context of our own research support services at TU Delft, we made some specific points in observation :
1. The report has a great focus on the right point of view – the user’s point of view. If we at TU Delft want to support the researcher better, we must understand her better. That means more than just knowing what she does, it means having an empathetic understanding of why she does it and who she is. Understanding is more than just talking.
3. The report points to four different stakeholders that support scholarly collecting – funders, open data advocacy groups, external tool and service providers, and academic institutions. It might be useful to realize that we, as the TU Delft library, represent two of these stakeholders – we are the academic institution, naturally, but we are also the FAIR data advocacy group. Is it possible that these two sometimes clash? Could one role impede the other, and if so – how should we address it?
3. The journey of understanding our users better, improving our services and creating new, better ones – is a journey we cannot be taking on our own. At the very least, ICT and the researcher groups must be partners here. So we should get better at collaborating with these, and other, parties around us.
4. Some of the language (eg, ‘scholars’, ‘personal collections’) and evidence here is drawn from the humanities and doesn’t feel right in the context of a technical university. The report misses some of the language and developments occurring in a technical university (eg., there is no mention of data science, data stewards etc, and the importance of writing code or running simulations is underplayed)
5. Our instinct is that scientists (as opposed to humanities scholars) have fewer ‘personal collections’ and more ‘group collections’. E.g. A team gets access to data, or a department collects data, or a consortium writes a proposal, or a group writes a paper. While individual roles always play a part, access to these different outputs is managed at a team level.
6. Many of the key points are similar to what we know here at TU Delft, eg about fear of being scooped or the time taken to document data. The metaphor of collection is also important, as it emphasises the emotional ownership scientists feel about their outputs.
7. The conclusions of the final page is definitely worth holding on how do we (and by that I mean not just the library but all the relevant support service) offer the kind of support the researcher needs throughout her workflow (not just the start and end). The goal is not Open Science per se, but getting to Open Science by responding to specific user needs.
Presentation given at European Open Science Cloud (EOSC) Stakeholder’s Forum, Vienna, November 22 2018
This blog post was originally published by the LSE Impact Blog.
Recommendations on how to better support researchers in good data management and sharing practices are typically focused on developing new tools or improving infrastructure. Yet research shows the most common obstacles are actually cultural, not technological. Marta Teperekand Alastair Dunning outline how appointing data stewards and data champions can be key to improving research data management through positive cultural change.
By now, it’s probably difficult to find a researcher who hasn’t heard of journal requirements for sharing research data supporting publications. Or a researcher who hasn’t heard of funder requirements for data management plans. Or of institutional policies for data management and sharing. That’s a lot of requirements! Especially considering data management is just one set of guidelines researchers need to comply with (on top of doing their own competitive research, of course).
All of these requirements are in place for good reasons. Those who are familiar with the research reproducibility crisis and understand that missing data and code is one of the main reasons for it need no convincing of this. Still, complying with the various data policies is not easy; it requires time and effort from researchers. And not all researchers have the knowledge and skills to professionally manage and share their research data. Some might even wonder what exactly their research data is (or how to find it).
Therefore, it is crucial for institutions to provide their researchers with a helping hand in meeting these policy requirements. This is also important in ensuring policies are actually adhered to and aren’t allowed to become dry documents which demonstrate institutional compliance and goodwill but are of no actual consequence to day-to-day research practice.
The main obstacles to data management and sharing are cultural
But how to best support researchers in good data management and sharing practices? The typical answers to these questions are “let’s build some new tools” or “let’s improve our infrastructure”. When thinking how to provide data management support to researchers at Delft University of Technology (TU Delft), we decided to resist this initial temptation and do some research first.
Several surveys asking researchers about barriers to data sharing indicated that the main obstacles are cultural, not technological. For example, in a recent survey by Houtkoop at el. (2018), psychology researchers were given a list of 15 different barriers to data sharing and asked which ones they agreed with. The top three reasons preventing researchers from sharing their data were:
- “Sharing data is not a common practice in my field.”
- “I prefer to share data upon request.”
- “Preparing data is too time-consuming.”
Interestingly, the only two technological barriers – “My dataset is too big” and “There is no suitable repository to share my data” – were among three at the very bottom of the list. Similar observations can be made based on survey results from Van den Eynden et al. (2016) (life sciences, social sciences, and humanities disciplines) and Johnson et al. (2016) (all disciplines).
At TU Delft, we already have infrastructure and tools for data management in place. The ICT department provides safe storage solutions for data (with regular backups at different locations), while the library offers dedicated support and templates for data management plans and hosts 4TU.Centre for Research Data, a certified and trusted archive for research data. In addition, dedicated funds are made available for researchers wishing to deposit their data into the archive. This being the case, we thought researchers may already receive adequate data management support and no additional resources were required.
To test this, we conducted a survey among the research community at TU Delft. To our surprise, the results indicated that despite all the services and tools already available to support researchers in data management and sharing activities, their practices needed improvement. For example, only around 40% of researchers at TU Delft backed up their data automatically. This was striking, given the fact that all data storage solutions offered by TU Delft ICT are automatically backed up. Responses to open questions provided some explanation for this:
- “People don’t tell us anything, we don’t know the options, we just do it ourselves.”
- “I think data management support, if it exists, is not well-known among the researchers.”
- “I think I miss out on a lot of possibilities within the university that I have not heard of. There is too much sparsely distributed information available and one needs to search for highly specific terminology to find manuals.”
It turns out, again, that the main obstacles preventing people from using existing institutional tools and infrastructure are cultural – data management is not embedded in researchers’ everyday practice.
How to change data management culture?
We believe the best way to help researchers improve data management practices is to invest in people. We have therefore initiated the Data Stewardship project at TU Delft. We appointed dedicated, subject-specific data stewards in each faculty at TU Delft. To ensure the support offered by the data stewards is relevant and specific to the actual problems encountered by researchers, data stewards have (at least) a PhD qualification (or equivalent) in a subject area relevant to the faculty. We also reasoned that it was preferable to hire data stewards with a research background, as this allows them to better relate to researchers and their various pain points as they are likely to have similar experiences from their own research practice.
Vision for data stewardship
There are two main principles of this project. Crucially, the research must stay central. Data stewards are not there to educate researchers on how to do research, but to understand their research processes and workflows and help identify small, incremental improvements in their daily data management practices.
Consequently, data stewards act as consultants, not as police (the objective of the project is to improve cultures, not compliance). The main role of the data stewards is to talk with researchers: to act as the first contact point for any data-related questions researchers might have (be it storage solutions, tools for data management, data archiving options, data management plans, advice on data sharing, budgeting for data management in grant proposals, etc.).
Data stewards should be able to answer around 80% of questions. For the remaining 20%, they ask internal or external experts for advice. But most importantly, researchers no longer need to wonder where to look for answers or who to speak with – they have a dedicated, local contact point for any questions they might have.
Data Champions are leading the way
So has the cultural change happened? This is, and most probably always be, a work in progress. However, allowing data stewards to get to know their research communities has already had a major positive effect. They were able to identify researchers who are particularly interested in data management and sharing issues. Inspired by the University of Cambridge initiative, we asked these researchers if they would like to become Data Champions – local advocates for good data management and sharing practices. To our surprise, more than 20 researchers have already volunteered as Data Champions, and this number is steadily growing. Having Data Champions teaming up with the data stewards allows for the incorporation of peer-to-peer learning strategies into our data management programme and also offers the possibility to create tailored data management workflows, specific to individual research groups.
Technology or people?
Our case at TU Delft might be quite special, as we were privileged to already have the infrastructure and tools in place which allowed us to focus our resources on investing in the right people. At other institutions circumstances may be different. Nonetheless, it’s always worth keeping in mind that even the best tools and infrastructures, without the right people to support them (and to communicate about them!), may fail to be widely adopted by the research community.
Yasemin Turkyilmaz-van der Velden and Marta Teperek are very privileged to represent TU Delft at the International Data Week 2018 in Gaborone, Botswana. Yasemin has been awarded a very competitive grant for Early Career researchers to attend the conference.
We are working hard to make sure that we get the most of our attendance. Yasemin is presenting:
- Poster: Data Stewardship at Delft University of Technology
- As of 7 November 2018, downloaded already 35 times
- Presentation with the same title: Data Stewardship at Delft University of Technology
- Presentation: Research Data Management in Engineering disciplines
- Presentation: Libraries for Research Data: Engaging Researchers with Research Data: what works?
- Session proposal and chairing: Motivations and recognition for good data stewardship
The full programme of the International Data Week can be accessed online.
Written by Julie Beardsell and originally published on the ICT innovation blog.
Responding to the challenge
Navigating the often complex legal landscape of software licensing can be a genuine challenge for researchers, particularly when starting up a research project for the first time.
Today’s researchers, when starting out on a PhD, may typically need to be competent scientists and programmers, but also understand and be sufficiently knowledgeable to make the right choices for the licensing of the software that they build. Without the latter, they risk a number of potentially undesirable situations.
To help researchers navigate their way, a working group at TU Delft has put together a set of guidelines for researchers, which can be downloaded here.
In addition, the working group is drafting a document to provide more detailed information and links to related documents and useful sources.
Open, reproducibility, peer-review and building upon others’ work
The very nature of the research itself, may be to create or improve upon software, which might be worked on openly and collaboratively with others, from institutions other than those of the institution by which the researcher is employed.
In addition, the task of creating scientific software as output of the researcher does not end with the publication of results which will have been generated as a result of the developed software. Making that software available for inspection and use by other scientists is essential to reproducibility, peer-review, and the ability to build upon others’ work.
Importance of licenses
Licenses are important for setting out the terms on which software may be used, modified, or distributed and by whom. Without a license agreement, software may be left in a state of legal uncertainty in which potential users may not know which limitations owners may want to enforce, and owners may leave themselves vulnerable to legal claims or have difficulty controlling how their work is used. Licenses can also be used to facilitate access to software as well as restrict it.
The working group consists of Julie Beardsell, Merlijn Bazuine, Susan Branchett, Maria Marques de Barros Cruz and Marta Teperek and the group would like to thank those researchers across the faculties who have contributed so far and encouraged the development of this initiative at TU Delft.
About the Author
“Open Source Software Guidelines for Researchers” by Julie Beardsell is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.