On 7 December 2017, the Data Stewards who started at TU Delft in 2017 gave presentations summarising their achievements so far.
It was an internal meeting held at the TU Delft Library and its main focus was to provide constructive feedback to the Data Stewards on their work. Apart from the Data Stewards, the meeting was attended by the TU Delft Research Data Services team and by Alexey Pristupa, the Data Steward at the Amsterdam Institute for Advanced Metropolitan Solutions.
Here are the presentations given by the Data Stewards:
- Faculty of Electrical Engineering, Mathematics and Computer Science
- Faculty of Civil Engineering and Geosciences
- Faculty of Aerospace Engineering
We are planning to open up future meetings to anyone interested, so if you would like to be invited to come, please get in touch with the Data Stewardship Coordinator.
Faculty of Civil Engineering and Geosciences
A PDF version of the report can also be downloaded here: 2017-11_data-steward-ceg-status
Faculty of Aerospace Engineering
Data Steward and author: Heather Andrews
The Data Steward at the Faculty of Aerospace Engineering (AE), Heather Andrews, started working in October 2017. So far her main duties have been:
- undergoing training in research data management;
- performing qualitative interviews of staff members to understand the current data management practices and issues within the different research groups in the Faculty;
- working on strategies to introduce the Data Stewardship project to the staff members and create awareness about Open Science and responsible data management;
- monitoring and promoting the Research Data Management (RDM) survey that has been launched in each Faculty that has a Data Steward.
In the following sections we will cover:
- Regular meetings
- Training sessions
- Qualitative interviews with researchers
- RDM survey results
- Scheduled appointments
- Plans for 2018
The Data Steward has weekly meetings with the other Data Stewards and the Data Stewardship Coordinator to exchange practices on data stewardship work at Faculty level and to discuss matters regarding the Data Stewardship Project at university level (1.5 hours weekly).
In addition to that, the Data Steward meets the secretary of the AE Faculty every 2 weeks to discuss progress within the Faculty itself (1 hour every two weeks).
The Data Steward must also attend the meetings of the Library Committee of the AE Faculty, which take place once every 3 months.
Aside from these regular meetings, other meetings were held in the first weeks to get to know important support staff members of the Faculty and to gain a good overview of how the AE Faculty works.
The Data Steward is undergoing two types of training on research data management: one internal and one external. The main goal of these training sessions is to get a broad overview of data management topics and to learn the essential skills needed to support researchers with their data questions.
As part of the internal training, the Data Steward has attended 2 training sessions delivered by local experts and organised specifically for the Data Stewards at TU Delft: one on Data Management Plans, and the other on central ICT solutions for researchers (2 hours each).
As part of the external training, the Data Steward has been enrolled in the Essentials 4 Data Support course organised by SURFsara, 4TU.Centre for Research Data and DANS. This course consists of 16 hours of in-person training (on 5 October and 23 November), plus an average of 4 hours a week spent on assignments and studying the online material provided.
Aside from these training sessions, the Data Steward has also spent time on self-training, as she arrived 1.5 months later than the other Data Stewards.
Qualitative Interviews with Researchers
During this 1.5-month period, the Data Steward has carried out several interviews with researchers within the AE Faculty (from all four departments) to get to know the data management practices within the different research groups. As of 16 November 2017, 13 interviews had been completed (1 Postdoc, 1 Associate Professor, 8 Assistant Professors and 3 Professors). Of these, 3 researchers are from the Space Systems Engineering (SpE) department, 2 are from Control & Operations (C&O), 4 are from Aerospace Structures and Materials (ASM), and 4 are from the Aerodynamics, Wind Energy, Flight performance & Propulsion (AWEP) department. There are 3 more interviews scheduled in the upcoming weeks (with Professors only).
The main issues discussed during the interviews were data storage, archiving, and sharing. From talking to the researchers, it is clear that every research group is different and has specific data management needs. For example, some groups need continuous access to 10-year-old data, while others might only require access to research data from the past year. Some groups generate so much data that they have to process it on supercomputers and reduce it to sizes that can be retrieved relatively quickly for work on their local disks. Other groups keep all their data on their own external hard drives, at a cost of more than 10-15k euros a year.
Overall trends can be summarised as follows:
- There is a significant lack of information about the services provided by TU Delft. Either researchers have never heard of the available services, or they have heard something but have not spent the time needed to learn more about them.
- All interviewees are willing to share their data once an article is published (as long as this is allowed when working with companies). Most researchers are also open to the idea of sharing unpublished data. However, the main problem is finding the time to put such datasets into a comprehensible format that would be useful for the community. Most researchers say that this is not viewed as a ‘high priority’. Unless researchers see a direct benefit, they will not invest time in making all their data “human-readable”. Therefore, most researchers currently share data on request: whoever is interested in the data sends an email explaining why they want it, and the researcher then evaluates whether it is appropriate to share the data.
- Some groups are now starting to have standard procedures regarding data storage. This has been motivated by the fact that whenever someone leaves the group, the data is either lost or delivered in a non-understandable format.
- There are groups interested in using a repository to save all their data during research.
- SURFdrive is not always the preferred choice for data sharing. Some staff members claim there are technical problems with SURFdrive and it is not very user-friendly when it comes to accessing the data from abroad or sharing the data with colleagues outside of academia.
- Some groups save all their research data files on external hard drives.
- Researchers would appreciate more awareness and training about good data management practices, which was also confirmed in responses to the quantitative RDM survey (see below).
Research Data Management Survey
Preliminary results of the RDM survey conducted at the AE Faculty (answers gathered between 26/10/2017 and 14/11/2017) can be summarized as follows:
- More than 27% of the staff members who received the survey filled it in.
- Among the people who replied to the survey, 3% are MSc students, ~48% are PhDs, 14% are Postdocs/Researchers, 17% are Assistant Professors, 10% are Associate Professors and 8% are Full Professors. Considering the total number of PhD students who received the email, the response rate among PhDs is about 29%. The estimated response rate for Full Professors is 52%.
- Most of the people who replied to the survey (78%) are unaware or not sure of what FAIR data is.
- 67% of the people who replied to the survey have not heard about the Data Stewardship project and the dedicated support for data management at the Faculty (9% of the surveyed people are ‘not sure’ whether they have heard of them).
- 53% of the people who replied to the survey claim to know who owns the data they work on, while 33% do not know who owns it and the rest are not sure.
- 44% of the PhDs who replied to the survey claim to know who owns their data. Of these, 16% claim to have full or partial ownership of the data. In the case of partial ownership, PhDs think that ownership is shared between them and either TU Delft or their promoter.
- Most of the people who replied to the survey do not have data management plans for their research projects (63%), or are not sure of having one (20%).
- 35% of the people who replied have their data automatically backed up.
- 85% of the surveyed people are interested in training on research data management. Only 17% of the PhDs and 14% of the Postdocs/Researchers that replied to the survey claimed to be ‘Not interested in any training’.
- 51% of the people who replied to the survey are not aware of the 4TU.Centre for Research Data.
A more in-depth analysis will be performed once the survey is closed. It is worth mentioning that the interviewees mentioned in Section 3 have also all expressed great interest in knowing what comes out of this survey.
Plans for 2018
During the first 3 months of 2018, the plan is to make sure the Data Stewardship project has been properly introduced to the community at the AE Faculty. In addition, the results of the RDM survey will be presented to the staff members, to create awareness of the current data management practices across the Faculty, and also to show how they compare to the practices in the other 2 Faculties that were surveyed. From the Data Stewardship project perspective, the results of this survey will also be used to benchmark the impact of the project.
The Data Stewardship project will set up new guidelines and policies regarding research data management; this work started this year and is expected to be implemented from 2018 onwards.
The Data Champions program will start in 2018. This program aims to create a community of researchers who are actively engaged with responsible data management practices. The Data Champions will work at department level within the Faculty, coordinated by the Data Steward, and will help implement discipline-specific policies. First, the program will be presented to researchers to find interested candidates. Candidates will be assessed at the beginning of 2018, so that the selected ones (the ‘Data Champions’) can start work in March-April 2018.
Given that the training sessions described above will be completed by the end of 2017, it is expected that throughout 2018 the Data Steward can start actively helping researchers with their data management issues, focusing on the problems mentioned by researchers during the qualitative interviews, but also on new projects across the Faculty, to make sure that their data management is set up well from the start.
Aside from providing data management support, the Data Steward will also focus on creating awareness about Open Science among researchers and will provide training for the community (e.g. for PhD students) on responsible research data management practices and on the data management services provided by TU Delft (e.g. the 4TU.Centre for Research Data).
Finally, at the end of 2018, the RDM survey will be run again to see what has changed since the beginning of the Data Stewardship Project.
Faculty of Electrical Engineering, Mathematics and Computer Science
Data Stewards and authors: Munire van der Kruyk, Robbert Eggermont, Jasper van Dijck
From 2016 to 2017
In 2016, eleven researchers from the faculty of electrical engineering, mathematics and computer science (EEMCS) were interviewed about their current research data management (RDM) practices and their opinion of RDM. These interviews helped lay the foundation for the data stewardship programme at TU Delft, since they made clear that even within one faculty, the RDM methods employed varied significantly. This called for a discipline-specific approach to data stewardship and eventually led to the start of the data stewards in 2017.
Start of Data Stewards in 2017
During the first half of 2017, data stewardship took shape. EEMCS chose to divide the data stewardship tasks among current personnel rather than adding new personnel, for two reasons: to ensure that the newfound knowledge would not be lost due to the temporary nature of the programme, and to make use of the data management knowledge already available within EEMCS. With that, EEMCS started data stewardship with the goal of supporting researchers in their RDM and of creating a culture where topics such as RDM, reproducibility, and open data are high on the agenda.
We again started interviewing researchers, this time focussing more on current RDM practices and challenges relating to specific EEMCS projects. We also sent out a faculty-wide news item informing researchers of the data stewardship programme as well as inviting them to talk to us if they have any RDM related questions. We gave several training sessions and presentations, such as a workshop at the AMS institute, presentations for a national data stewardship meeting organised by the LCRDM, and a presentation, panel discussion, and closing remarks at the Cambridge OpenCon 2017. During this time we also started receiving questions from researchers and it quickly became clear that the faculty-specific knowledge was valuable when engaging researchers. However, for each of the data stewards the broader scope of RDM was missing. This gap is currently being bridged by training in general RDM aspects.
To summarise 2017 in numbers so far:
- 6 interviews with researchers;
- 1 department presentation (DIAM), 2 upcoming (INSY, ST);
- 6 researchers reached out with questions.
EEMCS Opinions on RDM and the RDM Survey 2017 Results
So what have we learned from talking to EEMCS researchers? Most researchers we talked to are in favour of more standardised methods for RDM. They agree that their research would benefit from clearer guidelines on how to handle research data. However, whether several aspects, such as data management plans, should be made mandatory is a topic of debate. Researchers do not want what they see as yet another administrative obligation keeping them from doing actual research. That is why a key topic for 2018 will be to find out how to implement guidelines and procedures that enable standardised RDM without, if possible, forcing the issue.
In addition to the interviews, the RDM survey conducted in October-November 2017 also yielded interesting results for EEMCS. First, EEMCS had the lowest response rate of the three participating faculties, even though it is not much smaller than CEG and is larger than LR (Aerospace Engineering).
This might be explained by looking at the response rates of the different departments.
Of the 6 EEMCS departments, the applied mathematics department (DIAM) and the electrical engineering (EE) departments (ESE, ME, QE) had a much lower response rate than the computer science (CS) departments (ST, INSY). Since CS is heavy on big data and software sustainability, we believe that the perceived usefulness of RDM is higher there, but this does not explain the apparent lack of awareness or interest in EE. Although EE research methodology generally differs from CS in that it is more focused on modelling, simulation results, and measurement data, we believe that EE research could benefit from standardised RDM, since EE researchers often invest considerable time in preserving their models.
The most striking difference between faculties is in the use of dedicated tools for research data management. In the faculty of EEMCS, 62% of the respondents use such tools. This is easily explained, as using version control tools (such as Git, often together with hosting services such as GitHub) is a daily work practice for most computer scientists.
We also found a significant gap between researchers feeling responsible for their data and actually protecting it. When asked if they felt responsible for their research data, 76% (79) of the EEMCS respondents considered themselves responsible. However, when asked if their research data was automatically backed up, only 51% (40) of this group responded “yes.” We suspect that most people are either unaware of the backup facilities provided by the central IT department or are unwilling to use them, possibly preferring services such as GitHub or external cloud storage.
What also became clear from the results is that the data stewards have quite a way to go towards informing the faculty about the data stewardship project: only 19% of the respondents from EEMCS have even heard of the project. PhDs in particular are not yet aware of it.
Lastly, the answers to the open questions also indicate a need for specific information or training. We see:
- A general need for easy-to-access information on RDM practices as well as on available facilities;
- A wish for RDM training among PhD students who recently started;
- A general desire for more information on using version control software.
Challenges for EEMCS
Based on the interviews so far, as well as the RDM survey, we identified the following main topics of interest for data stewardship at EEMCS:
- Software sustainability;
- Privacy for sensitive data;
- Intellectual property rights;
- Archiving of high volume datasets (“big data”).
We believe the most important of these topics is software sustainability. Software plays a major role in research within EEMCS. Since applied mathematics is the most common research methodology (with varying applications), most researchers write their own software to analyse their research data. For example, big data research would be impossible without software. Ensuring that software written by researchers remains accessible in the future will be of vital importance for the reproducibility of scientific research at EEMCS.
In addition to these topics, we think that our biggest challenge will be to create an RDM culture within EEMCS, specifically within EE. So far we have seen that the CS departments, and to an extent DIAM as well, are more inclined towards RDM than the EE departments, which is mirrored by the RDM survey results. Our effort for 2018 will therefore focus more actively on reaching researchers from all disciplines within EEMCS.
Set-Up for 2018
After our training is completed, we can start effectively engaging the faculty. During the last 4 months we have laid the foundation to start engaging researchers across the entire faculty. The financial administration as well as the contract managers will notify us of new projects (of which we already have a backlog) and all departments are aware of the data stewardship programme.
In 2018, we believe our work will genuinely start. We will proactively talk to PIs from as many projects as possible, and we will have the time to assist researchers with RDM-specific cases. Talking to PIs will help spread awareness across the faculty and help identify possible mismatches between supply and demand for RDM-related resources, such as storage or RDM software (GitHub/GitLab, Docker). Helping researchers with specific cases will enable us to apply our newly acquired general RDM knowledge in practical situations, which in turn will expand our discipline-specific RDM knowledge.
We will also start actively building a data champions community. We already have several possible candidates from the CS (and DIAM) departments, but we will have to start looking in the EE departments as well. At the end of 2018, we hope to have a better grip on how data champions will be able to help inform and train their fellow researchers, and provide feedback on current practices and challenges. We think a prime topic for data champions at EEMCS will be software sustainability.
Using this bottom-up approach we aim to gather enough input and support to start creating faculty-specific guidelines, policies, and training during 2018.
Authors: Alastair Dunning, Marta Teperek
Thanks to strategic funding from the TU Delft Executive Board, the Data Stewardship project is now up and running. Data Stewards have been in place at EWI (EEMCS) and CiTG (CEG) since the summer, and at LR (AE) since October. A Coordinator has been appointed at the library. Processes for embedding Stewards at faculties and for fostering collaboration and good practice between the Stewards (and Faculty Secretaries) have been established. A training programme for Data Stewards has been developed. Interviews are currently taking place for Data Stewards at the other five faculties, who will start in early 2018.
Since August 2017, the team has published five blog posts about Data Stewardship and presented at four external conferences.
During the first three months of the project, 34 independent interviews with researchers were completed. Together with the results of the ongoing quantitative survey (which has received more than 420 responses, with response rates varying from 10% to almost 30%), this has allowed the Stewards to get a better understanding of the data management needs at their Faculties.
In addition to some discipline-specific challenges, common data management problems were identified across the three faculties: a lack of training in data and software management, a need for guidance on data documentation and backup, and a need for better support in general. Importantly, as a result of the outreach work done in the first few months, one in five staff members are now aware of the Data Stewards and of the need to improve research data practices, and more than 70% of respondents wish to be kept informed about the results of the survey, confirming their interest in the topic.
Critical data management issues
However, the survey also reveals how much work is to be done. Over half of those questioned in our survey did not know who actually owned their data, only 40% automatically backed up their data, and 15% had lost research data in the past year. Over 230 researchers at TU Delft (from 420 responses) requested introductory training on research data management.
See also the report from the Faculty of Electrical Engineering, Mathematics and Computer Science for a more in-depth analysis of the preliminary results.
Strategic importance of data stewardship in future funding schemes
The wider research environment continues to be influenced by research data. All NWO and H2020 projects starting from 2017 onwards must now create a Data Management Plan and make their data open. The European Open Science Cloud promises new tools, related EU calls point to new data infrastructure, and related EU strategy papers reveal new rewards and recognition systems to benefit those practising open science.
The beginning of 2018 will see the project moving to the implementation phase. Data Stewards will carry out outreach, training and advisory work for the research community and will start addressing the Critical data management issues outlined above. This will happen in parallel with developing Research Data Policies for each faculty. In addition, the Stewards will initiate a Data Champions programme at faculties to promote further engagement with researchers and to address subject-specific needs in data management.
There are two main challenges to the project. First, 0.5 FTE of a Data Steward per faculty is insufficient to address the needs described above, and in particular the Critical data management issues. In addition, much time is currently spent on the Data Stewardship training programme. This is essential to bring all Data Stewards to the same level, but it leaves the Stewards less time to interact with researchers. Second, funding for the programme currently covers only 1 year, which might not give faculties enough time to evaluate the current model and make informed decisions about taking data stewardship forward. 1-year contracts also present staff retention risks to TU Delft, given that other institutions are now interested in rolling out similar programmes and our Data Stewards are already trained and experienced. We will therefore be looking into possibilities for extending the project's funding.
Faculty-specific reports on Data Stewardship
- Faculty of Electrical Engineering, Mathematics and Computer Science report by Munire van der Kruyk, Robbert Eggermont and Jasper van Dijck
- Faculty of Civil Engineering and Geosciences report by Kees den Heijer
- Faculty of Aerospace Engineering report by Heather Andrews
Written by: Marta Teperek
On 23 November 2017 I participated in the launch of the Austrian chapter of the Research Data Alliance (RDA Austria). It was an all-day event organised and hosted jointly by colleagues from the University of Vienna, TU Wien and RDA. Fourteen national and international presenters spoke about various aspects of research data management: from infrastructure support, through data publication and citation, all the way to data management and data stewardship. In my opinion, the event was highly successful, and I would like to highlight two interesting discussions which took place during the conference:
- Discussion about engaging with researchers and what’s important – this blog post (below)
- Discussion about the challenges for data services and the outlook for the future – separate blog post
Is engagement with researchers important?
In my talk about Data Stewardship at TU Delft, I mentioned that it was important to work closely with local Data Champions in order to deliver tailored disciplinary solutions for research data. Data Champions are researchers who are good at managing and sharing their research data and who voluntarily act as advocates for their local communities. This idea sparked an interesting discussion.
Some people thought that those offering research data support do not have sufficient time to look for Data Champions and to engage with them. They thought that involvement in bottom-up activities drains energy from service providers and offers little return on investment. It was suggested that efforts should instead be invested in doing the actual work and collaborating with other service providers. Others questioned the idea of Data Champions and suggested that unless researchers are paid for this, they will not be willing to spend their time advocating for the benefits of data management and sharing.
To be relevant, service providers should not forget about their users
The argument of time investment is an important one. Research data support services are quite new and most of the time they are not yet fully embedded within institutions and the exact scope and priorities are often not well-defined. Therefore, pressures on data service providers tend to be high and the choices on where to invest precious time can be quite hard.
However, the following ‘why’ questions can be asked: “Why do you provide data support services? Why do you want to help researchers manage their data better?”. After some discussion, everyone usually agrees that the core mission of data support services is to help researchers do better research and to improve research integrity. So everyone agrees that the primary users (customers) of the data support services at research institutions are researchers (data creators) themselves.
Now, I might be personally biased as I worked for 6 months for a start-up company and I appreciate the Lean Startup methodology, but I strongly agree with one of the attendees who stated: if service providers wish to develop useful services, they need to talk to their customers. There are numerous examples of companies which failed because they developed products which were not of interest to their customers. Therefore, in my opinion, it is key that as data service providers, we speak with researchers, understand their needs, their problems, and provide solutions which solve these problems. This is essential not only to develop products which are useful to the research community but also to tailor the language used in promoting these services to what the community is indeed interested in hearing.
Of course, one cannot spend all one's time just engaging, and it is key to strike a balance between reaching out to end users and doing the actual work. In addition, it needs to be stated that engagement work at research institutions is not easy. One needs good communication skills and a high level of empathy, and at the same time an understanding of the researcher, the research they are doing and the research data they are collecting. The latter is often necessary to find a common language with the researcher and even to start communicating. Such a mixture of skills might not be easy to find and might mean hiring dedicated people. In fact, when looking to appoint Data Stewards at TU Delft, we primarily searched for people with a relevant research background (at least a PhD degree) and with good communication skills, reasoning that as long as they are interested in data management, we can teach them the information and skills they require.
Searching for Data Champions
Another question was about the effort required to find Data Champions and how to motivate them to do this extra work. Kevin Ashley said that sometimes, to find Data Champions, it is sufficient to simply know the research community a little. Kevin explained that many researchers are already doing excellent data management work and advocate for better practice within their communities anyway. Those researchers might appreciate formal recognition of their efforts by being officially named Champions.
So are financial rewards necessary to motivate researchers to practise good data management? This is another important question. Nicole Janz, who champions good research data management, explained that her commitment to reproducible working helped her secure a lectureship position. In addition, some people say that researchers are already paid to do research and that doing research equals caring about research integrity and good data management. When (in my previous job) I interviewed David Savage, Principal Investigator at the University of Cambridge, he said that it was “the responsibility of every researcher to the profession to try to produce data which is robust”.
I personally think that the lack of dedicated financial rewards should not justify the lack of good data management practices. Yet it is one thing to adhere to data management practices, and another to advocate for them within one's own research community, which requires time and resources. I think that the latter needs to be formally recognised and appropriately rewarded. However, for this to be sustainable, the whole academic rewards system should be drastically revised. Currently, researchers tend to be rewarded for the number of publications in high impact factor journals, rather than for quality research and good data management.
Thankfully, the European Commission’s Working Group on Open Science Rewards published the report “Evaluation of Research Careers fully acknowledging Open Science Practices”, which proposes that researchers should be rewarded for their commitment to practising Open Science. The report also contains a very useful matrix with evaluation criteria for assessing Open Science activities and suggests that the commitment to practising Open Science should be implemented in the FP9 funding scheme (the successor of H2020). In addition, the report proposes that research performing organisations should use these criteria in their hiring and promotion practices.
So I strongly believe that change is coming (and hopefully in not too long).
I would like to thank Raman Ganguly, Paolo Budroni, Kevin Ashley, Barbara Sanchez Solis, Aude Dieude and Rainer Stotzka for inspiring discussions on the topic.
For those interested in rolling out Data Champions programmes at their institutions, the University of Cambridge published the materials used to promote their Data Champions initiative: https://doi.org/10.17863/CAM.7417