A PDF (and citable) version of this document is available via Zenodo. DOI: https://doi.org/10.5281/zenodo.1316938
On 26 June 2018, the new TU Delft Research Data Framework Policy was approved by TU Delft’s Executive Board. The Framework Policy is an overarching policy on research data management for TU Delft as a whole and it defines the roles and responsibilities at the University level. In addition, the Framework provides templates for faculty-specific data management policies.
From now on, the deans and the faculty management teams, together with the Data Stewards, will lead the development of faculty-specific policies on data management which will define faculty-level responsibilities.
If you are working at TU Delft and if you would like to be involved in the development of faculty-specific policies, please do get in touch with the relevant Data Steward.
The full text of the policy (pdf) is available below.
The 4TU.Centre for Research Data announces its report on research data management within the 4TU Research Centres.
Over the last few months, the 4TU.Centre for Research Data had the chance to make contact and to speak with several of the Scientific Directors of the 4TU Research Centres about research data management. The report published today highlights the findings from these contacts and conversations.
A citable version of the report is available on OSF Preprints (DOI: 10.17605/OSF.IO/SGFTW).
1. Research data management is not addressed at a strategic level by the 4TU Research Centres, but left to individual research groups or to individual researchers connected to the Centres.
2. Within the 4TU Research Centres, there is a broad range of attitudes towards data and a broad range of data types and characteristics, including large datasets; commercially sensitive datasets; privacy and ethical concerns regarding data; software and its sustainability.
3. Software sustainability is an important and much discussed topic, but there are currently no standards or systematic ways of looking after software.
4. Research on human subjects and datasets including personally identifiable information or sensitive personal information are more prominent than might be expected in engineering and the technical sciences. Lack of transparency and reproducibility of scientific results can be an issue in these areas because the underlying datasets are often not available.
An Opportunity to Collaborate
Research data management is increasingly viewed as an important part of high-quality research. International and national funding bodies now mandate institutions and researchers to make data available. Data sharing is predicated on good research data management and has the potential to make scientific research more transparent, open, and efficient. In view of these principles and developments, the 4TU.Centre for Research Data wishes to maintain and deepen its links with the 4TU Research Centres and to support the Centres in various aspects of research data management.
Authors: Alastair Dunning, Marta Teperek, Anke Versteeg, Wilma van Wezenbeek
This is a joint response from TU Delft Library to the public consultation on the draft version of the VSNU Code of Conduct for Research Integrity.
- The focus on the relationship of research data to research integrity is welcomed.
- There are some inconsistencies in good practice in data management that need to be ironed out.
- The VSNU may need to consider the implications for researchers who frequently work with companies that do not have equivalents to this code.
- Reporting on research is increasingly done via other channels than traditional journals, e.g. via platforms, preprint servers or blogs. The paragraphs on assessment and reporting are still very much focused on the traditional ways of communicating about research.
- With the upcoming GDPR, is enough being said about the need for researchers to be more aware of their own role in how they share their own personal information, and of the tools or applications they use and share this information with?
- The document currently does not discuss the importance of managing and sharing the source code used to create, process or analyse the data. However, most research projects now have a computational element, and the ability to validate and reproduce research results often relies on the availability of the supporting source code. Results of a recent survey revealed that 92% of academics use research software in their research. Therefore, if the Code of Conduct is to be relevant and applicable to current research practices, the issues associated with managing and sharing software/code need to be addressed.
- Terms such as “data”, “research material” and “sources” need to be defined.
1. Preamble; paragraph 4 Remove “large” in “the growing importance of the way large data files are used and managed” – this is applicable to all data files, and not only to big data.
2.2.7 Many private institutions will not have subscribed to the code, and may not even have these guarantees in place. It is good that this issue is mentioned in the code, but it will have implications for universities and their researchers that work with private companies, particularly smaller companies.
2.3.9 The reference to Citizen Science seems rather cavalier, and perhaps deserves more detail. Research projects can involve thousands of citizen scientists; sometimes they may come from non-western countries, with different ethical expectations/norms etc.
2.4, footnote 7 – Students (such as master’s students) are excluded from this code. But what happens if work done by a master’s student (e.g. preliminary data collection) is integrated into research?
4.2 Overarching comments to the section on “design” standards:
- Place extra emphasis on transparency by design and the need for planning data management and sharing from the start.
- The ethical and societal issues of fair use and access to research results need to be addressed at the design stage of research experiments. As discussed in the recent issue of Science magazine, research should aim to “ensure that those societies providing and collecting the data, particularly in resource-limited settings, benefit from their contributions”.
4.2.9 What is meant by “joint research”? Should this not apply to any funded or commissioned research?
4.2.12 The research should not be accepted if agreements outlined in point 4.2.9 are not defined and signed by all partners.
4.3.21 “To your discipline” is a bit weak. Consider “appropriate for your discipline and methodology” instead.
4.3.22 Emphasise that all data underpinning an article should be FAIR. The statement is weak at the moment.
4.4 Given that source code is often necessary for validation and reproducibility of research results, it is crucial that availability of source code used to create, process or analyse data is also discussed.
4.4.26 Given the fact that we want all contributors to be acknowledged properly (“author” is not always the right word for this), could we add “and processing” before the data?
4.4.30 Methods and protocols necessary to verify and reproduce research results should be made available.
4.4.37 Rephrase to “Always provide references and attribution when reusing research materials, including research data and code”. It is crucial that any reused research outputs are properly cited and the original authors properly attributed. In addition, the phrasing “that can be used for meta-analysis or the analysis of pooled data” limits the scope of reuse and should be omitted.
4.4.40 More emphasis is needed to ensure that research data and code supporting your findings are available for scrutiny. In addition, emphasise that research outputs should be made as open as possible, as closed as necessary.
5.4 As discussed before, ensure that good practices for managing and sharing research software are also discussed.
5.4.12 Research Infrastructure is the wrong phrase. Rather: “Ensure that proper data management is embedded in the research lifecycle and that the necessary support is provided.”
5.4.13 & 5.4.14 These need clarification. At present they are contradictory (i.e. should data be stored permanently, or for a period appropriate for the discipline?). Again, the appeal to disciplinary practice might be incorrect, e.g. one can have very different data in the same discipline. “Archived in the long term” would be a better phrase than “stored permanently”.
5.4.15 There is an appeal to the FAIR principles earlier in the document. It should be repeated here.
5.5.17 Does this refer to commercial funders/industry partners as well?
6.3 Under “other measures”, if a retraction is a valid measure, this should also apply to the underlying data.
Author: Heather Andrews, Data Steward at the Faculty of Aerospace Engineering
Date: 29 January 2018
Author: Marta Teperek, Data Stewardship Coordinator, TU Delft Library
Qualitative interviews with nine researchers at the Faculty of Technology, Policy and Management (TPM) at TU Delft were undertaken in order to understand data management needs at the faculty in advance of appointing a dedicated Data Steward. The purpose was to aid the recruitment of the Data Steward, to define the skills and experience of an ideal candidate, and to help decide on the priority work areas for the Data Steward. The results of this research can also be used as a point-in-time reference to monitor changes in data management practice at the faculty.
The main data management challenges identified were: handling personal sensitive research data; working with big data; managing and sharing commercially confidential information; and software management issues. Despite the diversity of problems, some common issues were identified as well: the need to improve daily data management practice, and the need to revise workflows for students’ research data. With the exception of one researcher, who was in opposition to the Data Stewardship project, all other researchers expressed their support for the project and welcomed the idea of having a dedicated Data Steward at the faculty.
Additionally, several actions have already been undertaken as a follow-up to these interviews:
- the Data Stewardship Coordinator was invited to give two talks about Data Stewardship to two different groups of researchers;
- a member of the Research Data Support team from the Library was asked to deliver a training course for students;
- the Data Stewardship Coordinator was asked to discuss the best way of rolling out data management training for PhD students at TPM in coordination with the TPM Graduate School.
Given that the financial allocation for the Data Steward at TPM faculty is currently at 0,5 FTE for the first year and 1,0 FTE for the two subsequent years (until December 2020), it is recommended that the first year is spent on continuing and extending this research to better understand the needs of the faculty. It is suggested that at the same time, the Data Steward starts addressing the most urgent data management needs at TPM faculty, in particular, the development of a data management policy, as well as the development of solutions and recommendations for working with personal sensitive research data.
The two subsequent years could be devoted to developing resources and solutions for the remaining problems and for critical evaluation of the project and its effect on data management practice at the faculty. This approach should provide the faculty with enough resources and information to decide on the best strategy for Data Stewardship beyond December 2020.
Data stewardship has been recognised internationally as a key foundation of future science. Carlos Moedas from the European Commission (EC) said that Open Science “is a move towards better science, to get more value out of our investment in science and to make research more reproducible and transparent. (…) Recent advances such as the discovery of the Higgs boson and gravitational waves, decoding of complex genetic schemas, climate change models, all required thousands of scientists to collaborate (…) on data. And that implies that research data are findable and accessible and that they are interoperable and reusable”. In support of this, the EC anticipated that about 5% of research expenditure should be spent on properly managing and stewarding data. Barend Mons, the Chair of the EC high-level expert group on the European Open Science Cloud, estimated that 500.000 Data Stewards will be needed in Europe to ensure effective research data management. Consequently, all NWO and H2020 projects starting from 2017 onwards must create a Data Management Plan and are required to make their data open. In addition, the European Open Science Cloud promises new tools, and related EC strategy papers suggest new rewards and grant funding schemes (such as FP9) to benefit those practising open science.
TU Delft’s College van Bestuur (CvB) made a strategic decision to be a frontrunner of this global move and a dedicated Data Stewardship programme was initiated. The long-term goal of this programme is to comprehensively address research data management needs across the whole campus in a disciplinary manner. To achieve this, subject-specific Data Stewards are to be appointed at every TU Delft faculty. Strategic funding from the CvB was allocated to support 0,5 FTE of a Data Steward per Faculty until December 2018, and 1,0 FTE of a Data Steward per Faculty for two years from January 2019 to December 2020. Subsequently, faculties are to decide how to best address their research data management needs.
In 2017 the first Data Stewards were appointed at three TU Delft faculties: Faculty of Electrical Engineering, Mathematics and Computer Science, Faculty of Civil Engineering and Geosciences and Faculty of Aerospace Engineering. At the beginning of 2018, Data Stewards are to be appointed at the five remaining faculties, including the Faculty of Technology, Policy and Management (TPM).
In order to inform the recruitment decision for the appointment of a Data Steward at TPM faculty, the Data Stewardship Coordinator set out to investigate the faculty’s research data management needs. Qualitative interviews were undertaken with TPM researchers in autumn 2017, which led to the identification of four main data management issues, specific to the types of research done, and revealed some common data problems for the faculty overall. The report below describes the key findings of this research and makes some recommendations for the future work of a Data Steward at TPM Faculty.
Semi-structured qualitative interviews were conducted with four full professors, three associate professors and two assistant professors in September and October 2017. Initial interviewees were selected and approached by the Data Stewardship Coordinator based on their online profile content to ensure a representation of the different research methodologies used across the faculty as well as representation of all three of TPM’s departments: Engineering Systems and Services, Multi-Actor Systems and Values, Technology and Innovation. In addition, one researcher was interviewed as a result of a recommendation from an initial interviewee, and two other interviewees were suggested by the Secretary General of the faculty.
All interviewees were informed that interview findings would be used to create a preliminary report on data management needs at the faculty and that the report might be made publicly available. Interviewees were assured that no information would be directly attributed to them and that they would not be named in the report. Interviewees agreed to the interview notes, including personal information, being shared internally with the Secretary General of the faculty.
Interviews lasted for 30 – 60 minutes. Interviews were not recorded, and instead, notes of key discussion points were taken by the interviewer during the interview.
Categories of data management issues
The diverse nature of research topics at TPM suggested that researchers could have different data management needs. The nine interviews conducted so far revealed that this was indeed the case and identified four top data management issues: handling personal sensitive research data; working with big data; software management issues; and managing and sharing commercially confidential information.
Handling personal sensitive research data
Questions about handling personal sensitive research data spanned the whole research lifecycle: starting with experimental design and ensuring that only the minimum necessary data about people were collected and the right consent forms were in place, all the way through to data anonymisation and deciding which parts of data could be made publicly available, which could be shared only under managed access conditions, and which datasets should never be shared. Researchers also mentioned difficulties of working with sensitive data on a daily basis – the need to use secure servers and encryption to share the data, and to ensure that only authorised partners have access to data. Some discussed the challenges of working with sensitive information in fieldwork conditions, especially if the data was politically contentious.
Interviewees wished to have more guidance about recommended workflows and policies, as well as practical support for finding the right storage solutions and means for sharing data with collaborators. In addition, better support was required at the experimental design stage: deciding on the minimal amount of personal information to be collected and drafting the right consent forms. Finally, many expressed the need for resources which could help them with data anonymisation and with managing the risks and benefits of making datasets publicly available.
All these concerns seemed particularly pressing in light of the new EU General Data Protection Regulation (GDPR), which applies from May 2018. Some interviewees feared that they were unprepared for the new regulation and felt they had not received sufficient information about the impact of the new regulation on their research.
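As an illustration of the kind of resource interviewees asked for, the sketch below shows one possible first step towards the data anonymisation discussed above: replacing a direct identifier with a keyed hash and dropping other identifying fields. It is a minimal sketch under stated assumptions, not a recommended TU Delft workflow. The field names (`email`, `name`, `phone`), the function names and the record layout are hypothetical, and pseudonymised data of this kind generally still counts as personal data, so it does not by itself make a dataset safe to publish.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records
    can still be linked, but the pseudonym cannot be recomputed without the key.
    """
    normalised = identifier.strip().lower()
    digest = hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_direct_identifiers(record, secret_key, id_field="email",
                             drop_fields=("name", "phone")):
    """Return a copy of the record without direct identifiers.

    The field names are hypothetical; adapt them to the actual dataset.
    """
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in drop_fields and key != id_field
    }
    cleaned["participant_id"] = pseudonymise(record[id_field], secret_key)
    return cleaned


if __name__ == "__main__":
    # Hypothetical key and record; a real key must be stored separately from the data.
    key = b"replace-with-a-long-random-secret"
    raw = {"name": "A. Example", "email": "a.example@example.org",
           "phone": "000", "answer": 4}
    print(strip_direct_identifiers(raw, key))
```

Using a keyed hash rather than a plain hash is a deliberate choice in this sketch: without the key, someone who holds a list of likely e-mail addresses cannot simply hash them and match the results.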
Challenges of working with big data
Challenges of working with big data were mainly related to infrastructure limitations. For researchers working with very large files, simple aspects of data management become a difficulty. For example, due to ever-increasing storage requirements for big datasets, many researchers were unable to back up their data. This consequently led to occasional irretrievable data loss. Due to large volumes, big datasets were rarely archived, raising reproducibility concerns. In addition, many researchers had to use third-party computing services in order to effectively process their data. This often resulted in issues associated with very slow data transfer.
Working with big datasets, especially those which needed to be dynamically updated, also meant challenges for data publishing. Many data repository providers did not offer options for big data sharing and had strict limitations on the maximum size of files. In addition, publishing of big datasets often meant substantial costs and it was often more cost-effective to simply re-generate the data when needed.
Software management issues
The third issue was with software management. In general, researchers did not have policies within their research groups on how software should be managed, annotated and shared. Often the very platforms for software management differed within the same research group. Some researchers felt they did not have sufficient time to annotate their software properly and that their colleagues, especially students, did not have the right skills to effectively work with tools which could help them manage their software better. One researcher mentioned a missed commercialisation opportunity due to the fact that the software developed by the group was not understandable to anyone outside the group, including the third party interested in commercialisation.
Interviewees mentioned that due to lack of appropriate skills amongst researchers, there was a need for professional service support in data science. In addition, many suggested that training on the use of software management tools (such as Git, Subversion or Jupyter Notebooks) would be useful, in particular for students. Several wished to receive more information about methods for software archiving and for getting citation credit for code publishing.
Managing and sharing commercially confidential information
Working with commercially confidential data also proved problematic. There were tensions between sharing data for the sake of reproducibility and the need to protect third parties’ commercial interests. Interviewees mentioned that navigating between the different contractual clauses could be difficult. One researcher admitted that the inability to share research data obtained from commercial partners made it more difficult to publish papers, because some journals now required that research data supporting publications be made publicly available. Another researcher felt that collaborating with industry negatively affected the progress of his academic career because commercial clauses meant fewer papers published. That researcher thought that when it came to academic promotions, commercial collaborations were valued less than the number of published articles.
Common data management problems
In addition to data management issues related to the type of research conducted, some common problems mentioned by almost all the interviewees were identified as well. These were related to improving daily data management practice, and to better data management procedures for students.
Daily data management practice
Problems related to daily data management practice concerned issues such as designing a data backup strategy and adhering to it, good file and folder naming, as well as issues with version control. These problems were shared also by researchers who based their research primarily on literature reviews. Overall, very few interviewees established workflows for good data management which would be followed by entire research groups. Most of the time it was down to individuals as to whether data was properly managed or not. Many researchers expressed the wish to improve their data management practice and to attend appropriate training.
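To make the practices mentioned above more concrete, the following sketch shows one way a research group might combine a consistent file naming convention with a verified backup copy. It is only an illustration under stated assumptions: the naming pattern (project, description, ISO date, two-digit version) and the function names are hypothetical choices, not a convention prescribed by TU Delft or reported by the interviewees.

```python
import hashlib
import re
import shutil
from datetime import date
from pathlib import Path


def versioned_name(project: str, description: str, day: date,
                   version: int, suffix: str) -> str:
    """Build a file name of the form project_description_YYYY-MM-DD_vNN.ext."""
    def slug(text: str) -> str:
        # Lower-case the text and replace runs of other characters with '-'.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(project)}_{slug(description)}_{day.isoformat()}_v{version:02d}{suffix}"


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy a file into backup_dir and check that the copy is byte-identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)
    return sha256_of(source) == sha256_of(target)


if __name__ == "__main__":
    # Hypothetical project and description, used only to show the pattern.
    print(versioned_name("TPM Survey", "Interview notes", date(2017, 10, 2), 3, ".csv"))
```

Putting the date in ISO format and zero-padding the version number means that sorting files alphabetically also sorts them chronologically. Comparing checksums after copying is a simple way to notice a corrupted or incomplete backup before the original is lost.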
Students’ data management practice
Almost all interviewees said that data management practice amongst students needed to be improved and that data management training should be part of the Graduate School’s curriculum. Training needs were related both to awareness of general principles, such as data backup, and to knowledge of specific techniques and practices, such as data science skills and software management tools.
In addition, one interviewee expressed his concern about the fact that PhD students were not required to archive their research data at the time of graduation. This, he believed, led to research reproducibility concerns and potential reputational damages. The researcher suggested that all PhD students should be required to archive their research data before leaving TU Delft. This view was shared by researchers from the TPM Policy Analysis section (see ‘Follow up actions undertaken’).
An additional concern regarding students’ data was raised during the meeting with researchers from the Engineering Systems and Services department (see the section ‘Follow up actions undertaken’). When discussing research data ownership, researchers mentioned that according to TU Delft regulations, research data collected by Master students belonged to the students, and not to TU Delft. As a result, in several cases, Master students left TU Delft and took all their research data with them, without leaving a copy with their TU Delft supervisors. Researchers believed that this was a concerning and a serious issue from the research integrity and research continuity point of view. To avoid similar issues occurring in the future and to overcome the unfavourable regulation, supervisors now avoided offering participation in valuable, larger projects to Master students.
Views on Data Stewardship
With the exception of one researcher, who was in strong opposition to the Data Stewardship project, all other researchers welcomed the project and thought that there were data management needs at the faculty which could be addressed by the Data Steward.
The researcher with negative views on the Data Stewardship project thought that appointing a dedicated staff member to support researchers in data management was counterproductive. That researcher believed that a Data Steward would “develop guidelines (…) and hold meetings to raise awareness etc.” instead of solving “any actual operational issue”. He also suggested that a quantitative survey should be done to define the common practices and to decide whether any corrective steps needed to be taken. Interestingly, despite the negative attitude in general, the researcher agreed that there were issues with data management which needed to be solved and thought that training in data management for all PhD students was particularly needed.
Another researcher, who welcomed the overall idea of the Data Stewardship project, raised his concern about the amount of resources allocated to the project and suggested that care be taken to ensure that the project would not result in new compliance expectations.
All remaining researchers were enthusiastic about the project and identified numerous data management issues with which they hoped that a Data Steward could help. These included:
- Advice on data management workflows and best practices (such as data backup, version control, file and folder naming)
- Advice on data sharing and citation
- Advice on working with different types of confidential data (such as personal sensitive and commercially sensitive data)
- Support in designing strategies for sustainable code management
- Advice on code sharing and citation
- Help with managing funders’ and publishers’ expectations
- Training on data and software management, in particular for PhD students
Follow up actions undertaken
As a result of the initial interviews with researchers at TPM, several actions were undertaken, which might suggest that interviewed researchers were genuinely interested in data management issues. First, the Data Stewardship Coordinator was invited to give two presentations about the Data Stewardship project: to researchers from the Department of Engineering Systems and Services, and to researchers from the Policy Analysis section of the Multi-Actor Systems Department. Second, one of the interviewed researchers asked members of the Research Data Services team to deliver a workshop to his students about using data repositories. Third, one of the interviewees suggested connecting with TPM’s Graduate School to discuss the possibilities of rolling out data management training for PhD students.
In addition, the Data Stewardship Coordinator initiated discussions with other faculties to determine whether the issues around Master students’ research data ownership were also problematic at other faculties and whether the problem should be tackled centrally or not. Furthermore, the Research Data Services team started liaising with the Human Research Ethics Committee to ensure alignment between research ethics and data management guidelines and policies.
This preliminary report identifies several areas where data management practices at TPM faculty could be improved with the help of a Data Steward. However, given the preliminary nature of these findings and the risk that they might not be representative of the whole faculty, it is recommended that the work of the newly appointed Data Steward is initially focused on a more in-depth investigation of data management needs. While qualitative interviews should be continued, a quantitative survey at the faculty is also needed, in agreement with the advice of the interviewee who was negative about the Data Stewardship project. Indeed, results of quantitative surveys conducted at the three faculties that already have Data Stewards proved to be valuable for measuring the scale of data management issues and deciding on priority actions. The thorough investigation of data management needs will allow the faculty to decide how to prioritise them. Finally, understanding the faculty-specific requirements will inform the development of a faculty data management policy.
In addition, given that many of the researchers interviewed expressed uncertainties about the recommended procedures for working with personal sensitive data and that the new EU General Data Protection Regulation (GDPR) becomes legally binding in May 2018, it is suggested that development of recommendations and training for working with personal sensitive data is also prioritised. This work should be done in collaboration with other teams at TU Delft: the Data Protection Officer, the Research Data Support team at the Library, the ICT team and the Human Research Ethics Committee.
The subsequent two years, during which the Data Steward will be appointed at 1,0 FTE, could be solely devoted to developing solutions for the remaining priority data management needs and to evaluating the project. Comprehensive evaluation of the project should help the faculty make an informed decision on how to take Data Stewardship forward after the end of 2020.
I would like to thank: all researchers who agreed to participate in my interviews for their time and valuable feedback; Martijn Blaauw for interviewee suggestions and introduction to the faculty; Alastair Dunning and Heather Andrews for comments on this report.
A citable version of this report is available on the Open Science Framework: https://osf.io/8ce5v