Managing and Sharing Data in 2021

Authors: Esther Plomp

NWO/Horizon Europe projects have the following requirements:

Data management should follow the FAIR principles to maximise the effectiveness and reproducibility of the research undertaken. 

  • The FAIR principles recommend that scientific data are: 
    • ‘Findable’ thanks to the persistent identifier that is assigned to your dataset when it is shared using a data repository (see below).
    • ‘Accessible’ so that the data and metadata can be examined; FAIR data is not necessarily open data, and when the data itself cannot be opened up, sharing the metadata keeps the dataset FAIR.
    • ‘Interoperable’ so that comparable data can be analysed and integrated through the use of common vocabulary and formats.
    • ‘Reusable’ as a result of appropriate documentation and provision of a license that tells others what they can do with the data.
Illustration by The Turing Way/Scriberia
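
As a rough illustration, the four principles can be turned into a small checklist for a dataset's metadata record. The sketch below is hypothetical: the field names (identifier, metadata_public, file_formats, license, readme) and the set of "common" formats are assumptions made for this example and are not taken from the NWO or Horizon Europe requirements.

```python
# Hypothetical FAIR checklist; field names and OPEN_FORMATS are illustrative assumptions.
OPEN_FORMATS = {"csv", "json", "txt", "netcdf"}


def fair_gaps(record):
    """Return a list of FAIR elements that appear to be missing from a metadata record."""
    gaps = []
    if not record.get("identifier"):
        gaps.append("Findable: no persistent identifier")
    if not record.get("metadata_public", False):
        gaps.append("Accessible: metadata is not publicly available")
    formats = {f.lower() for f in record.get("file_formats", [])}
    if not formats or not formats <= OPEN_FORMATS:
        gaps.append("Interoperable: files are not all in common open formats")
    if not record.get("license") or not record.get("readme"):
        gaps.append("Reusable: license or documentation is missing")
    return gaps


# Example record with placeholder values (not a real dataset).
example = {
    "identifier": "doi:10.xxxx/placeholder",
    "metadata_public": True,
    "file_formats": ["CSV", "xlsx"],
    "license": "CC-BY-4.0",
    "readme": "",
}

for gap in fair_gaps(example):
    print(gap)
```

A check like this only flags obvious gaps; whether a dataset is actually FAIR still depends on the quality of its documentation and the repository it is shared through.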

Proposals

Data Management Plans

Data Management Plans (DMPs) are required for projects funded by NWO (within four months of the grant being awarded) and Horizon Europe (within six months of the project’s start).

  • DMP templates are available on the websites of NWO and Horizon Europe, as well as on the platform DMPonline that you can use with your netID. 
    • DMPonline has TU Delft specific guidance that will help you to set up your DMP more efficiently. 
    • You can also use the TU Delft template available through DMPonline (although this will need some additions for Horizon Europe, as the Horizon Europe template is more extensive). It is especially efficient to use the TU Delft template if your project needs HREC approval (for example, when working with personal data).
  • A DMP should be a living document, which is updated as the project evolves. Horizon Europe expects you to update the template throughout the project.

Data sharing

Data should be shared through a trusted repository, for example: 4TU.ResearchData.

  • Data underpinning a scientific publication should be deposited at the latest at the time of publication.
  • Data is in principle open, unless restricted access is needed for legitimate reasons.
    • Access can be restricted when it concerns aspects such as privacy, public security, ethical limitations, property rights and commercial interest.
  • For Horizon Europe, the data should be licensed using CC-BY or CC0 (or an equivalent license), and metadata of the datasets should be CC0 licensed. 
    • For CC-BY it means that others should cite the work when they reuse the data. 
    • CC0 waives any rights, with citation still being expected as this follows best scientific practices.
    • Note that some Horizon Europe calls may impose additional obligations for the validation of scientific publications.
  • These requirements are in line with the TU Delft Research Data Framework Policy, stating that “research data, code and any other materials needed to reproduce research findings are appropriately documented and shared in a research data repository in accordance with the FAIR principles for at least 10 years from the end of the research project, unless there are valid reasons not to do so.” (NWO also expects data preservation for at least ten years, unless legal provisions or discipline-specific guidelines dictate otherwise). 
Illustration by The Turing Way/Scriberia

Software

Software is seen as a separate research output from data:

  • Horizon Europe recommends sharing software (under an Open Source license).
  • NWO expects that software that is needed to access and interpret the data is made available, following the Five Recommendations for FAIR Software.
  • See the TU Delft Research Software Policy for more information on how to share your research software. 
    • TU Delft encourages you to share your code/software through 4TU.ResearchData choosing one of the TU Delft approved licenses (Apache, MIT, BSD, EUPL, AGPL, LGPL, GPL, CC0). You can also choose another data repository, such as Zenodo, but then you have to ensure that the output is correctly registered in PURE yourself. See the TU Delft Guidelines on Research Software or this recording for more information on sharing your software/code.
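
To illustrate the license list above, the following sketch checks whether a license identifier falls into one of the approved families. It assumes SPDX-style identifiers such as "Apache-2.0" or "GPL-3.0-or-later" and treats the part before the first hyphen as the family name; both the function name and this matching rule are our own assumptions, not part of the TU Delft policy.

```python
# License families listed in the text above as TU Delft approved.
APPROVED_FAMILIES = {"APACHE", "MIT", "BSD", "EUPL", "AGPL", "LGPL", "GPL", "CC0"}


def is_approved(license_id):
    """Check whether an SPDX-style identifier belongs to an approved family.

    Assumption: the family is the part before the first hyphen,
    e.g. 'Apache-2.0' -> 'APACHE', 'BSD-3-Clause' -> 'BSD'.
    """
    family = license_id.strip().split("-")[0].upper()
    return family in APPROVED_FAMILIES


for candidate in ["MIT", "Apache-2.0", "GPL-3.0-or-later", "MPL-2.0"]:
    print(candidate, is_approved(candidate))
```

This is a simple heuristic: the text above lists only license families, so it says nothing about which versions of a license are accepted.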

Need any help? 

Resources

Horizon Europe Programme Guide (pages 41 – 46)
Annotated Horizon Europe Grant Agreement (Annex 5, pages 152-153)
NWO: Research Data Management
TU Delft: Research Data Management
TU Delft & Faculty policies (data/software)

Towards user-driven design for an institutional software catalogue

Software is a crucial part of most researchers’ and educators’ daily lives: we use software to collect and analyse data, curate literature to read, write and edit papers, prepare teaching materials, grade assignments, etc. ICT (Information and Communications Technologies) and support teams at institutions recognised these needs and began making software available to everyone in their organisations by purchasing, serving and maintaining institutional copies of software. This saves researchers the hassle and cost of buying their own licenses and updating the software, and at the same time lowers the risk of data and security breaches.

As the number and diversity of software packages used by researchers grow, these “institutional software catalogues” become bigger and harder to maintain and navigate. With modern search engines, researchers would in some cases rather search for and buy their own copy of the software than try to find an institutional license. There is a need to rethink how these catalogues can be designed to offer unique value to their users (beyond what a search engine can) and to lower the workload for the maintenance and support teams.

As a first part of our design process, it is vital to understand and map users’ needs: how do researchers discover, select, install and use software for their research tasks? What are the pain points within that process, and how could it be better? To clarify our understanding of these needs, we set up a value proposition design workshop with researchers, based on Strategyzer’s value proposition design canvas and the accompanying book.

The Value Proposition Canvas from Strategyzer AG and Strategyzer.com – download and see terms of use.

We worked with 8 participants from different faculties. In the 1-hour workshop, we first asked them to individually write down their jobs, pains and gains with regards to discovering, selecting, installing and using software in their research. They were also asked to rank their most important job, most extreme pain and most essential gain.  

As they wrote down their experiences and thoughts and shared them with the group, it became clear that some of these experiences, pains and gains were common among the researchers. We share a summary of the workshop output below.

Researchers’ jobs, pains and gains in finding, selecting, installing and using software for their research.

Our user profile: jobs, pains and gains 

Finding the right software to use is time-consuming and difficult 

Most workshop participants find it difficult to find and compare software alternatives effectively, particularly for discipline-specific activities. Researchers often rely on recommendations from colleagues and field experts, and in many cases they resort to trial and error to find the most suitable software. There is no clear institutional point of contact for software-related issues and advice.

Compatibility issues complicate this task: researchers must make sure that the chosen software is compatible with their collaborators’ operating systems as well as their own, and that the collaborators also have the software license. Other considerations include cost and whether the software is open source – it would be nice to have easily accessible information on free and/or open-source alternatives.

Finally, some participants have trouble finding up-to-date information on the institutional licenses that TU Delft has, while others are not sure how the licenses work.  

Installation is not always straightforward

Remote work during the pandemic means that some researchers cannot access equipment in their offices and need to use their private laptops at home. This makes installing software more challenging as researchers can no longer just go to ICT service-points directly for support. It is also unclear whether or how one can use institutional licenses on private machines.

Additionally, in a quick show of hands during the workshop, half the researchers (4 out of 8) indicated that they regularly use Linux. Participants reflected that it is typically more difficult to obtain information about Linux-compatible software alternatives or support – there is a need to evaluate how to make our resources and services more friendly for Linux users.

A strong need for reliable data storage and handling solutions 

While TU Delft provides and supports multiple solutions for data storage, we need to explore how we can demonstrate the reliability and user-friendliness of our data storage solutions and build trust. 

Researchers also want to be able to collaborate and share data and findings, and use software that can facilitate these collaborations.

Our solutions should relieve pains and create gains

This user profile that we’ve co-built with our users helps us prioritise the development of features and services, to focus on tackling the most extreme pains and creating essential gains. For example: 

  • Provide clear, easily comparable information about operating system compatibility, available licenses or their pricing, data security and privacy, etc.
  • Make sure researchers can access help quickly when needed
  • Offer ways to provide feedback, e.g. when users come across outdated information
The workshop’s output is summarised in the user profile on the left, the right shows a possible value proposition design for our solution, with direct reference to the user profile. The value proposition canvas used is based on the original from Strategyzer AG and Strategyzer.com.
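
One way to picture “clear, easily comparable information” is a catalogue in which every entry carries the same structured fields. The sketch below is purely illustrative: the CatalogueEntry fields, the compatible_with helper and the two placeholder entries are all invented for this example and do not describe an existing TU Delft system.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One software item in a hypothetical institutional catalogue."""
    name: str
    operating_systems: set = field(default_factory=set)
    license_type: str = "unknown"   # e.g. "campus" or "open source"
    open_source: bool = False
    last_reviewed: str = ""         # ISO date of the last information check


def compatible_with(entries, required_systems):
    """Return the entries that run on every operating system in required_systems."""
    required = set(required_systems)
    return [e for e in entries if required <= e.operating_systems]


# Placeholder entries; names and values are invented for illustration.
catalogue = [
    CatalogueEntry("ToolA", {"windows", "macos"}, "campus", False, "2021-01-15"),
    CatalogueEntry("ToolB", {"windows", "macos", "linux"}, "open source", True, "2021-03-02"),
]

# A researcher on Linux collaborating with someone on Windows.
for entry in compatible_with(catalogue, ["linux", "windows"]):
    print(entry.name, entry.license_type)
```

Recording a last-reviewed date per entry would also give users something concrete to point at when they report outdated information.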

This is only the beginning! We would like to run a few more of these workshops, perhaps with researchers from other faculties and career stages, to build a more complete user profile.   

To create a sustainable solution, we need to make sure not only that users’ needs are met, but also that everyone involved in designing, implementing and maintaining the solution has the expertise, capacity and resources to do so effectively. For this, we hope to run similar workshops with the different stakeholders involved, to understand their tasks, challenges and wants, and ultimately to design a user-driven, sustainable solution.

Lessons learnt 

Overall, we have succeeded in taking a first step towards the goal of bringing end users closer to our research support work. Here, we share a few lessons that we learnt throughout the design and planning of this workshop: 

  • It is crucial to ask the right questions, so that participants can recall specific experiences when thinking about jobs, pains and gains. 
    • We discussed both what we want and do not want to help us scope the question, e.g. we wanted to focus on participants’ experience with finding technological solutions (e.g. data transfer, storage and compute), but not with human solutions (e.g. someone not replying to their emails) 
    • We gave examples of what we wanted for each of the jobs, pains and gains; we asked participants to be as specific as possible 
  • Researchers’ time is extremely valuable – we thought hard about what they would get in return: 
    • We primarily approached members of faculty PhD boards and councils; part of their role is to represent the PhD students within their faculties, so this opportunity fits within that role
    • We also promised to keep them in the loop and acknowledge their help in the output 
    • Moving onwards, we think it is important to explore more, different and better ways to reward contributors 
  • Stakeholders’ expectation management is challenging 
    • Users are not the only stakeholders: we need to understand the concerns and challenges faced by solution providers in order to design a sustainable solution
    • This design approach and type of exercise can be foreign to many; we still need to learn how to get our stakeholders’ buy-in to maximise the value we can get from this process and output  

Please let us know in the comments section if you have similar experiences – we would love to hear and learn from you. 

Workshop contributors

  • Participants: 
    • Agnes Broer (LR) 
    • Stephan de Hoop (CiTG) 
    • Mariska Koning (CiTG) 
    • Annika Krieger (TNW) 
    • Yuxin Liu (LR) 
    • Sven Pfeiffer (LR) 
    • Stefania Usai (TNW) 
    • Hongpeng Zhou (3mE) 
  • Design and implementation:
    • Meta Keijzer-de Ruijter 
    • Masha Rudneva 
    • Emmy Tsang 

Coding problems? Just pop over!

Launch of code walk-in consultations at TU Delft

Authors: Nicolas Dintzner, Kees den Heijer, Marta Teperek

On Wednesday 24th of January, the data stewards at TU Delft organised the first “code walk-in consultation” (the name might change in the future), hosted at the Faculty of Civil Engineering.

The main objective of this event was to provide support to researchers facing software and/or data processing related issues. To this end, we gathered data stewards (Esther, Kees, Nicolas) and data champions (Joseph Weston, Victor Koppejan) and got ready for… whatever software issue troubled people on that day!

Several people turned up ranging from MSc students to a full professor (Mark van Koningsveld, one of our data champions). The participants came in with rather interesting and diverse problems. From data plots in Python, to Fortran compiler behavior, we had our hands full for a little while! Code was reviewed, some of it was compiled (more than once), tests were run and some participants saw their problems being solved on the spot, while others only got some ideas for resolutions.

Everything happened in a relaxed atmosphere. People came in and were greeted by a member of the team. They described their issue(s) and, based on this, we decided who among the stewards and champions had the most experience in that domain or was the most likely to be able to help. Then, we opened the laptop of the problem-giver and started hacking away.

Here are a few take-away points from this first session:

  • Bring-your-laptop is a great practice: having working code to play with is really valuable to get started quickly and get to the core problem
  • An external point of view is always useful: we did not manage to solve all issues, but at least, we provided some insights on what could be the possible causes and a course of action to move forward.
  • Minimum working examples are welcome: having a small size example of the issue at hand (when relevant) is quite useful to get to the core of the problem quickly. While not necessary for walk-in sessions (we’ll help you with what you have!), such test cases are useful when the error scenario involves remote code execution, or complex setups.
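
To give an idea of what such a minimum working example can look like, here is a small sketch in Python. The scenario is invented for illustration (it is not one of the problems brought to the session): a larger analysis script fails somewhere while reading measurements, and the example reduces it to a few hard-coded values and the one step that goes wrong.

```python
# Tiny, hard-coded stand-in for the real dataset (hypothetical values).
readings = ["1.5", "2.5", "NA", "4.0"]


def to_floats(raw):
    """The single step under suspicion: convert raw strings to numbers."""
    return [float(value) for value in raw]


# Running the step on the tiny input reproduces the failure in seconds,
# without the original data files, cluster setup or plotting code.
try:
    to_floats(readings)
except ValueError as err:
    print("Reproduced the problem:", err)
```

With the problem isolated like this, whoever is helping can see at a glance that the missing-value marker is the likely cause, and can try a fix without needing access to the full setup.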

From a pure data stewardship perspective, such sessions are quite valuable as well. We get to see what researchers work on, what tools are used and what kind of issues that brings. For instance, we had no idea that people were still working with Fortran 77 code.

So far, we have received little feedback, but the little we have is quite encouraging:

Thank you! That is very helpful to see. I also really appreciated all your help this morning at the coding consultation.

So, we’ll keep organizing those code walk-ins, but most likely with a cooler name. We will start to do so on a monthly basis.

In the meantime be aware that you can get in touch with your faculty data steward at any time for a bit of help regarding your software/data issues!