EEMCS – 4 months’ Data Stewardship Progress Report
Faculty of Electrical Engineering, Mathematics and Computer Science
Data Stewards and authors: Munire van der Kruyk, Robbert Eggermont, Jasper van Dijck
From 2016 to 2017
In 2016, eleven researchers from the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) were interviewed about their current research data management (RDM) practices and their opinions on RDM. These interviews were part of the foundation for the data stewardship programme at TU Delft: they made clear that, even within one faculty, RDM methods varied significantly. This called for a discipline-specific approach to data stewardship and eventually led to the start of the data stewards in 2017.
Start of Data Stewards in 2017
During the first half of 2017, data stewardship took shape. EEMCS chose to divide the data stewardship tasks among current personnel rather than hiring new personnel, for two reasons: to ensure that the newly acquired knowledge would not be lost due to the temporary nature of the programme, and to draw on the data management knowledge already available within EEMCS. With that, EEMCS started data stewardship with the goal of supporting researchers in their RDM and creating a culture where topics such as RDM, reproducibility, and open data are high on the agenda.
We again started interviewing researchers, this time focussing more on current RDM practices and challenges relating to specific EEMCS projects. We also sent out a faculty-wide news item informing researchers of the data stewardship programme and inviting them to talk to us if they had any RDM-related questions. We gave several training sessions and presentations, such as a workshop at the AMS institute, presentations at a national data stewardship meeting organised by the LCRDM, and a presentation, panel discussion, and closing remarks at the Cambridge OpenCon 2017. During this time we also started receiving questions from researchers, and it quickly became clear that faculty-specific knowledge was valuable when engaging researchers. However, each of the data stewards lacked knowledge of the broader scope of RDM. This gap is currently being bridged by training in general RDM aspects.
To summarise 2017 in numbers so far:
- 6 interviews with researchers;
- 1 department presentation (DIAM), 2 upcoming (INSY, ST);
- 6 researchers reached out with questions.
EEMCS Opinions on RDM and the RDM Survey 2017 Results
So what have we learned from talking to EEMCS researchers? Most researchers we talked to are in favour of more standardised methods for RDM. They agree that their research would benefit from clearer guidelines on how to handle research data. However, whether certain aspects, such as data management plans, should be made mandatory is a topic of debate. Researchers do not want what they see as another administrative obligation preventing them from doing actual research. That is why a key topic for 2018 will be to find out how to implement guidelines and procedures that enable standardised RDM without, if possible, forcing the issue.
In addition to the interviews, the RDM survey conducted in October and November 2017 also yielded interesting results for EEMCS. First, EEMCS had the lowest response rate of the three participating faculties, even though it is not much smaller than CEG and is larger than LR.
This might be explained by looking at the response rates of the different departments.
Of the six EEMCS departments, the applied mathematics department (DIAM) and the electrical engineering (EE) departments (ESE, ME, QE) had a much lower response rate than the computer science (CS) departments (ST, INSY). Since CS relies heavily on big data and software sustainability, we believe that the perceived usefulness of RDM is higher there, but that still fails to explain the apparent lack of awareness or interest from EE. Although EE research methodology generally differs from CS in that it is more focused on modelling, simulation results, and measurement data, we believe that EE research could benefit from standardised RDM, since EE researchers often invest much time in preserving their models.
The most striking difference between faculties is in the use of dedicated tools for research data management. In the faculty of EEMCS, 62% of the respondents use such tools. This is easily explained, as using version control tools, of which GitHub is a prime example, is daily practice for most computer scientists.
We also found a significant disconnect between researchers feeling responsible for their data and actually protecting it. When asked if they felt responsible for their research data, 76% (79) of the respondents considered themselves responsible. However, when asked if their research data was automatically backed up, only 51% (40) of this group of EEMCS respondents answered “yes.” Our assessment is that most people are either unaware of the backup facilities provided by the central IT department or unwilling to use them, possibly preferring services such as GitHub or external cloud storage.
What also became clear from the results is that the data stewards have quite a way to go in informing the faculty about the data stewardship project: only 19% of the respondents from EEMCS had even heard of the project. PhD students in particular are not yet aware of it.
Lastly, the answers to the open questions also indicate a need for specific information or training. We see:
- A general need for easy-to-access information on RDM practices as well as available facilities;
- A wish among recently started PhD students for RDM training;
- A general desire for more information on using version control software.
Challenges for EEMCS
Based on the interviews so far, as well as the RDM survey, we identified the following main topics of interest for data stewardship at EEMCS:
- Software sustainability;
- Privacy for sensitive data;
- Intellectual property rights;
- Archiving of high volume datasets (“big data”).
We believe the most important of these topics is software sustainability. Software plays a major role in research within EEMCS. Since applied mathematics is the most common research methodology (with varying applications), most researchers write their own software to analyse their research data. For example, big data research would be impossible without software. Ensuring that software written by researchers remains accessible in the future will be vital for the reproducibility of scientific research at EEMCS.
In addition to these topics, we think that our biggest challenge will be to create an RDM culture within EEMCS, specifically within EE. So far we have seen that the CS departments, and to an extent DIAM as well, are more inclined towards RDM than the EE departments, which is mirrored by the RDM survey results. Our effort for 2018 will therefore focus more actively on reaching researchers from all disciplines within EEMCS.
Set-Up for 2018
Once our training is completed, we can start engaging the faculty effectively. During the last four months we have laid the foundation for engaging researchers across the entire faculty: the financial administration and the contract managers will notify us of new projects (of which we already have a backlog), and all departments are aware of the data stewardship programme.
In 2018, we believe our work will genuinely start. We will start proactively talking to PIs from as many projects as possible, and we will have the time to assist researchers with RDM-specific cases. Talking to PIs will help spread awareness across the faculty and help identify possible discrepancies between supply and demand for RDM-related issues, such as storage or RDM software (GitHub/GitLab, Docker). Helping researchers with specific cases will enable us to apply our newly acquired general RDM knowledge in practical situations, which in turn will expand our discipline-specific RDM knowledge.
We will also start actively building a data champions community. We already have several possible candidates from the CS (and DIAM) departments, but we will have to start looking in the EE departments as well. By the end of 2018, we hope to have a better grip on how data champions can help inform and train their fellow researchers, and provide feedback on current practices and challenges. We think a prime topic for data champions at EEMCS will be software sustainability.
Using this bottom-up approach we aim to gather enough input and support to start creating faculty-specific guidelines, policies, and training during 2018.