Date: 29 January 2018
Author: Marta Teperek, Data Stewardship Coordinator, TU Delft Library
Qualitative interviews with nine researchers at the Faculty of Technology, Policy and Management (TPM) at TU Delft were undertaken to understand data management needs at the faculty in advance of appointing a dedicated Data Steward. The purpose was to aid the recruitment of the Data Steward, to define the skills and experience of an ideal candidate, and to help decide on the priority work areas for the Data Steward. The results of this research can also be used as a point-in-time reference to monitor changes in data management practice at the faculty.
The main data management challenges identified were: handling personal sensitive research data; working with big data; managing and sharing commercially confidential information; and software management issues. Despite the diversity of problems, some common issues were identified as well: the need to improve daily data management practice and the need to revise workflows for students’ research data. With the exception of one researcher, who opposed the Data Stewardship project, all researchers expressed their support for the project and welcomed the idea of having a dedicated Data Steward at the faculty.
Additionally, several follow-up actions have already been undertaken as a result of these interviews:
- the Data Stewardship Coordinator was invited to give two talks about Data Stewardship to two different groups of researchers;
- a member of the Research Data Support team from the Library was asked to deliver a training course for students;
- the Data Stewardship Coordinator was asked to discuss the best way of rolling out data management training for PhD students at TPM in coordination with the TPM Graduate School.
Given that the financial allocation for the Data Steward at TPM faculty is currently at 0,5 FTE for the first year and 1,0 FTE for the two subsequent years (until December 2020), it is recommended that the first year is spent on continuing and extending this research to better understand the needs of the faculty. It is suggested that at the same time, the Data Steward starts addressing the most urgent data management needs at TPM faculty, in particular, the development of a data management policy, as well as the development of solutions and recommendations for working with personal sensitive research data.
The two subsequent years could be devoted to developing resources and solutions for the remaining problems and for critical evaluation of the project and its effect on data management practice at the faculty. This approach should provide the faculty with enough resources and information to decide on the best strategy for Data Stewardship beyond December 2020.
Data stewardship has been recognised internationally as a key foundation of future science. Carlos Moedas from the European Commission (EC) said that Open Science “is a move towards better science, to get more value out of our investment in science and to make research more reproducible and transparent. (…) Recent advances such as the discovery of the Higgs boson and gravitational waves, decoding of complex genetic schemas, climate change models, all required thousands of scientists to collaborate (…) on data. And that implies that research data are findable and accessible and that they are interoperable and reusable”. In support of this, the EC anticipated that about 5% of research expenditure should be spent on properly managing and stewarding data. Barend Mons, the Chair of the EC High Level Expert Group on the European Open Science Cloud, estimated that 500.000 Data Stewards will be needed in Europe to ensure effective research data management. Consequently, all NWO and H2020 projects starting from 2017 onwards must create a Data Management Plan and are required to make their data open. In addition, the European Open Science Cloud promises new tools, and related EC strategy papers suggest new rewards and grant funding schemes (such as FP9) to benefit those practising open science.
TU Delft’s College van Bestuur (CvB) made a strategic decision to be a frontrunner in this global move, and a dedicated Data Stewardship programme was initiated. The long-term goal of this programme is to comprehensively address research data management needs across the whole campus in a disciplinary manner. To achieve this, subject-specific Data Stewards are to be appointed at every TU Delft faculty. Strategic funding from the CvB was allocated to support 0,5 FTE of a Data Steward per Faculty until December 2018, and 1,0 FTE of a Data Steward per Faculty for two years from January 2019 to December 2020. Subsequently, faculties are to decide how best to address their research data management needs.
In 2017 the first Data Stewards were appointed at three TU Delft faculties: the Faculty of Electrical Engineering, Mathematics and Computer Science; the Faculty of Civil Engineering and Geosciences; and the Faculty of Aerospace Engineering. At the beginning of 2018, Data Stewards are to be appointed at the five remaining faculties, including the Faculty of Technology, Policy and Management (TPM).
To inform the recruitment of a Data Steward at TPM faculty, the Data Stewardship Coordinator set out to investigate the faculty’s research data management needs. Qualitative interviews were undertaken with TPM researchers in autumn 2017, which led to the identification of four main data management issues, specific to the types of research done, and revealed some common data problems for the faculty overall. The report below describes the key findings of this research and makes some recommendations for the future work of a Data Steward at TPM Faculty.
Semi-structured qualitative interviews were conducted with four full professors, three associate professors and two assistant professors in September and October 2017. Initial interviewees were selected and approached by the Data Stewardship Coordinator based on their online profile content to ensure representation of the different research methodologies used across the faculty as well as of all three of TPM’s departments: Engineering Systems and Services; Multi-Actor Systems; and Values, Technology and Innovation. In addition, one researcher was interviewed as a result of a recommendation from one of the initial interviewees, and two other interviewees were suggested by the Secretary General of the faculty.
All interviewees were informed that interview findings would be used to create a preliminary report on data management needs at the faculty and that the report might be made publicly available. Interviewees were assured that no information would be directly attributed to them and that they would not be named in the report. Interviewees agreed to the interview notes, including personal information, being shared internally with the Secretary General of the faculty.
Interviews lasted for 30 – 60 minutes. Interviews were not recorded, and instead, notes of key discussion points were taken by the interviewer during the interview.
Categories of data management issues
The diverse nature of research topics at TPM suggested that researchers could have different data management needs. The nine interviews conducted so far revealed that this was indeed the case and identified four top data management issues: handling personal sensitive research data; working with big data; software management issues; and managing and sharing commercially confidential information.
Handling personal sensitive research data
Questions about handling personal sensitive research data spanned the whole research lifecycle, starting with experimental design and ensuring that only the minimum necessary data about people were collected and the right consent forms were in place, all the way through to data anonymisation and deciding which parts of data could be made publicly available, which could be shared only under managed access conditions, and which datasets should never be shared. Researchers also mentioned the difficulties of working with sensitive data on a daily basis: the need to use secure servers, to encrypt data for sharing and to ensure that only authorised partners have access to data. Some discussed the challenges of working with sensitive information in fieldwork conditions, especially if the data was politically contentious.
Interviewees wished to have more guidance about recommended workflows and policies, as well as practical support for finding the right storage solutions and means for sharing data with collaborators. In addition, better support was required at the experimental design stage: deciding on the minimal amount of personal information to be collected and drafting the right consent forms. Finally, many expressed the need for resources which could help them anonymise data and manage the risks and benefits of making datasets publicly available.
All these concerns seemed particularly pressing in light of the new EC Data Protection Regulation, coming into force in May 2018. Some interviewees feared that they were unprepared for the new regulation and felt they had not received sufficient information about the impact of the new regulation on their research.
Challenges of working with big data
Challenges of working with big data were mainly related to infrastructure limitations. For researchers working with very large files, even simple aspects of data management become difficult. For example, due to ever-increasing storage requirements for big datasets, many researchers were unable to back up their data. This led to occasional irretrievable data loss. Due to their large volumes, big datasets were rarely archived, raising reproducibility concerns. In addition, many researchers had to use third-party computing services in order to process their data effectively. This often resulted in very slow data transfer.
Working with big datasets, especially those which needed to be dynamically updated, also meant challenges for data publishing. Many data repository providers did not offer options for big data sharing and had strict limitations on the maximum size of files. In addition, publishing big datasets often meant substantial costs, and it was often more cost-effective to simply re-generate the data when needed.
Software management issues
The third issue was software management. In general, researchers did not have policies within their research groups on how software should be managed, annotated and shared. Often the very platforms for software management differed within the same research group. Some researchers felt they did not have sufficient time to annotate their software properly and that their colleagues, especially students, did not have the right skills to work effectively with tools which could help them manage their software better. One researcher mentioned a missed commercialisation opportunity: the software developed by the group was not understandable to anyone outside the group, including the third party interested in commercialisation.
Interviewees mentioned that due to lack of appropriate skills amongst researchers, there was a need for professional service support in data science. In addition, many suggested that training on the use of software management tools (such as Git, Subversion or Jupyter Notebooks) would be useful, in particular for students. Several wished to receive more information about methods for software archiving and for getting citation credit for code publishing.
Managing and sharing commercially confidential information
Working with commercially confidential data also proved problematic. First, there were tensions between sharing data for the sake of reproducibility and the need to protect third parties’ commercial interests. Interviewees mentioned that navigating between the different contractual clauses could be difficult. One researcher admitted that the inability to share research data obtained from commercial partners made it more difficult to publish papers, because some journals now required that research data supporting publications be made publicly available. Another researcher felt that collaborating with industry negatively affected the progress of his academic career because commercial clauses meant fewer papers published. That researcher thought that when it came to academic promotions, commercial collaborations were valued less than the number of published articles.
Common data management problems
In addition to data management issues related to the type of research conducted, some common problems mentioned by almost all the interviewees were identified as well. These were related to improving daily data management practice, and to better data management procedures for students.
Daily data management practice
Problems related to daily data management practice concerned issues such as designing a data backup strategy and adhering to it, good file and folder naming, and version control. These problems were also shared by researchers who based their research primarily on literature reviews. Overall, very few interviewees had established workflows for good data management which would be followed by entire research groups. Most of the time it was down to individuals whether data was properly managed or not. Many researchers expressed the wish to improve their data management practice and to attend appropriate training.
Students’ data management practice
Almost all interviewees said that data management practice amongst students needed to be improved and that data management training should be part of the Graduate School’s curriculum. Training needs were related to both awareness of general principles, such as data backup, as well as knowledge of specific techniques and practices, such as data science skills and software management tools.
In addition, one interviewee expressed his concern about the fact that PhD students were not required to archive their research data at the time of graduation. This, he believed, led to research reproducibility concerns and potential reputational damage. The researcher suggested that all PhD students should be required to archive their research data before leaving TU Delft. This view was shared by researchers from the TPM Policy Analysis section (see ‘Follow up actions undertaken’).
An additional concern regarding students’ data was raised during the meeting with researchers from the Engineering Systems and Services department (see the section ‘Follow up actions undertaken’). When discussing research data ownership, researchers mentioned that according to TU Delft regulations, research data collected by Master students belonged to the students, and not to TU Delft. As a result, in several cases, Master students left TU Delft and took all their research data with them, without leaving a copy with their TU Delft supervisors. Researchers believed that this was a serious issue from the research integrity and research continuity point of view. To avoid similar issues in the future and to work around the unfavourable regulation, supervisors now avoided offering Master students participation in valuable, larger projects.
Views on Data Stewardship
With the exception of one researcher, who was in strong opposition to the Data Stewardship project, all other researchers welcomed the project and thought that there were data management needs at the faculty which could be addressed by the Data Steward.
The researcher with negative views on the Data Stewardship project thought that appointing a dedicated staff member to support researchers in data management was counterproductive. That researcher believed that a Data Steward would “develop guidelines (…) and hold meetings to raise awareness etc.” instead of solving “any actual operational issue”. He also suggested that a quantitative survey should be done to define the common practices and to decide whether any corrective steps needed to be taken. Interestingly, despite the negative attitude in general, the researcher agreed that there were issues with data management which needed to be solved and thought that training in data management for all PhD students was particularly needed.
Another researcher, who welcomed the overall idea of the Data Stewardship project, raised his concern about the amount of resources allocated to the project and suggested that care be taken to ensure that the project would not result in new compliance expectations.
All remaining researchers were enthusiastic about the project and identified numerous data management issues with which they hoped that a Data Steward could help. These included:
- Advice on data management workflows and best practices (such as data backup, version control, file and folder naming)
- Advice on data sharing and citation
- Advice on working with different types of confidential data (such as personal sensitive and commercially sensitive data)
- Support in designing strategies for sustainable code management
- Advice on code sharing and citation
- Help with managing funders’ and publishers’ expectations
- Training on data and software management, in particular for PhD students
Follow up actions undertaken
As a result of the initial interviews with researchers at TPM, several actions were undertaken, which might suggest that interviewed researchers were genuinely interested in data management issues. First, the Data Stewardship Coordinator was invited to give two presentations about the Data Stewardship project: to researchers from the Department of Engineering Systems and Services, and to researchers from the Policy Analysis section of the Multi-Actor Systems Department. Second, one of the interviewed researchers asked members of the Research Data Services team to deliver a workshop to his students about using data repositories. Third, one of the interviewees made a suggestion to connect with the TPM’s Graduate School to discuss the possibilities of rolling out data management training for PhD students.
In addition, the Data Stewardship Coordinator initiated discussions with other faculties to determine whether the issues around Master students’ research data ownership were also problematic at other faculties and whether the problem should be tackled centrally or not. Furthermore, the Research Data Services team started liaising with the Human Research Ethics Committee to ensure alignment between research ethics and data management guidelines and policies.
This preliminary report identifies several areas where data management practices at TPM faculty could be improved with the help of a Data Steward. However, given the preliminary nature of these findings and the risk that they might not be representative of the whole faculty, it is recommended that the work of the newly appointed Data Steward is initially focused on a more in-depth investigation of data management needs. While qualitative interviews should be continued, a quantitative survey at the faculty is also needed, in agreement with the advice of the interviewee who was negative about the Data Stewardship project. Indeed, results of quantitative surveys conducted at the three faculties that already have Data Stewards proved to be valuable for measuring the scale of data management issues and deciding on priority actions. The thorough investigation of data management needs will allow the faculty to decide how to prioritise them. Finally, understanding the faculty-specific requirements will inform the development of a faculty data management policy.
In addition, given the fact that many researchers interviewed expressed uncertainties about the recommended procedures for working with personal sensitive data and that the new EC Data Protection Regulation becomes legally binding in May 2018, it is suggested that development of recommendations and training for working with personal sensitive data is also prioritised. This work should be done in collaboration with other teams at TU Delft: the Data Protection Officer, the Research Data Support team at the Library, the ICT team and the Human Research Ethics Committee.
The subsequent two years, during which the Data Steward will be appointed at 1,0 FTE, could be devoted solely to developing solutions for the remaining priority data management needs and to evaluating the project. A comprehensive evaluation of the project should help the faculty make an informed decision on how to take Data Stewardship forward after the end of 2020.
I would like to thank: all researchers who agreed to participate in my interviews for their time and valuable feedback; Martijn Blaauw for interviewee suggestions and introduction to the faculty; Alastair Dunning and Heather Andrews for comments on this report.
A citable version of this report is available on the Open Science Framework: https://osf.io/8ce5v