Authors: Heather Andrews, Nicolas Dintzner, Alastair Dunning, Kees den Heijer, Santosh Ilamparuthi, Jeff Love, Esther Plomp, Marta Teperek, Yasemin Turkyilmaz-van der Velden, Yan Wang
From February 2019 onwards and with the appointment of the data steward at the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS), the team of data stewards is complete: there is a dedicated data steward for every faculty at TU Delft. Therefore, the work in 2019 focuses on embedding the data stewards within their faculties, on policy development, and on making the project sustainable beyond the current funding allocation.
The document below outlines high-level plans for the data stewardship project in 2019.
Engagement with researchers
In 2019, the data stewards will, among other things, apply the following new tactics to increase researchers’ engagement with research data management:
Meeting with all full professors
Inspired by the successful case study at the faculty of Aerospace Engineering, data stewards will aim to meet with all full professors at their respective faculties.
Development of training resources for PhD students and supervisors
Ensure that appropriate training recommendations and online data management resources are available for PhD students to help them comply with the requirements of the TU Delft Research Data Framework Policy. These should include:
- Appropriate resources for PhD students, e.g. support for data management plan preparation, and/or data management training for PhD students
- Support for PhD supervisors, e.g. data management guidance and data management plan checklists for PhD supervisors
- Online manuals/checklists for all researchers, e.g. information on TU Delft storage facilities, how to request a project drive, how to make data FAIR
Support for data management plan preparation
Ensure that researchers at the faculty are appropriately supported in writing data management plans:
- At the proposal stage of projects, researchers are notified about available support for writing the data paragraph by the contract managers and/or project officers of their department
- All new grantees are contacted by the data stewards with an offer of data management and data management plan writing support
- Training resources on the use of DMPonline, which will be used by TU Delft for writing Data Management Plans, are available and known to faculty researchers
Coding Lunch & Data Crunch
Organise monthly 2h walk-in sessions for code and data management questions for faculty researchers. Researchers will be supported by all data stewards and the sessions will rotate between the 8 faculties.
The Electronic Lab Notebooks trial
Following up on the successful Electronic Lab Notebooks event in March 2018, a pilot is being set up to test Electronic Lab Notebooks at TU Delft in 2019. The data stewards from the faculties of 3mE and TNW are part of the Electronic Lab Notebooks working group and are in contact with interested researchers who will be invited to get involved in the pilot.
Further develop the data champions network at TU Delft:
- Ensure that every department at every faculty has at least one data champion
- Develop a community of faculty data champions by organising a meeting every two months on average
- Organise two joint events for all data champions at TU Delft and explore the possibility of organising an international event for data champions in collaboration with other universities
Faculty policies and workflows
In 2019, all faculties are expected to develop their own policies on research data management. However, successful implementation of these policies will depend on creating effective workflows for supporting researchers across the research lifecycle. Therefore, the following objectives are planned for 2019:
- Draft, consult on and publish faculty policies on research data management.
- Develop a strategy for faculty policy implementation
- Develop effective connections and workflows to support researchers throughout the research lifecycle (e.g. contacting every researcher who was successfully awarded a grant)
A survey on research data management needs was completed at 6 TU Delft Faculties (EWI, LR, CiTG, TPM, 3mE and TNW). In 2019, the following activities are planned:
- Publish the results of the survey conducted in the 6 faculties in a peer-reviewed journal
- Conduct the survey at BK and IDE – first quarter of 2019
- Re-run the survey at EWI, LR, CiTG, TPM, 3mE and TNW – September 2019
- Compare the results of the survey in 2017/2018 with the results from 2019 of the re-run survey and publish faculty-specific reports with their key reflections on the Open Working blog
- Survey data visualisation in R or Python
The visualisation of 2017/2018 RDM survey results was available in Tableau, which is proprietary software. To adhere to the openness principle, and also to practice data carpentry skills (see below), the 2019 data visualisation will be conducted in R.
Training and professional development
On top of specific training on data management, in 2019 data stewards will invest in training in the following areas:
Software carpentry skills
Code management is now an integral part of research and is likely to become even more important in the coming years. Therefore, as a minimum, every data steward should complete the full software carpentry training as an attendee in order to be able to effectively communicate with researchers about their code management and sharing needs. In addition, data stewards are strongly encouraged to complete training for carpentry instructors to further develop their skills and capabilities.
Participation in disciplinary meetings
In order to keep up with the research fields they are supporting, data stewards will also participate in at least one meeting specific to researchers from their discipline. Giving talks about data stewardship / open science during disciplinary meetings is strongly encouraged.
In addition to dedicated events for the Data Champions, the following activities are planned for 2019:
- 28 January 2019:
- Talk by Sebastian Karcher: Limits of Reproducibility: Strategies for Transparent Qualitative Research
- Workshop by Sebastian Karcher: Managing Qualitative Data for Sharing and Transparency
- 16 May 2019: Afternoon seminar on publishing reproducible research
- Bi-monthly seminar series “Future Forward: Science in the Open Era”, starting on 27 February with a talk by Dr Tim Smith, “Research Markets or Research Commons”
In addition, the team is planning to organise the following events (no dates yet)
- Software Carpentry workshops
- March & November 2019 – at TU Delft
- May 2019: at Eindhoven
- October 2019: at Twente
- Workshop on preserving social media data – workshop which will feature presentations from experts in the field of social media preservation, as well as investigative journalists (e.g. Bellingcat)
- Conference on effectively collaborating with the industry (managing the tensions between open science and commercial collaborations)
Individual roles and responsibilities
Some data stewards have also undertaken additional roles and responsibilities:
- Yasemin: Electronic Lab Notebooks, Data Champions
- Esther: Electronic Lab Notebooks, DMP registry
- Kees: Software Consultancy Lead
Sustainable funding for data stewardship
The current funding for the data stewardship project (salaries for the data stewards) comes from the University’s Executive Board and runs until the end of 2020. However, the importance of the support offered to the research community by the data stewards has already been recognised, not only by the academic community at TU Delft but also by support staff.
In order to ensure the continuation of the data stewardship programme and for TU Delft not to lose these highly skilled, trained and sought-after professionals, it is crucial that a source of sustainable funding is identified in 2019.
Written by Maria Cruz, VU Community Manager Research Data Management on 15 November 2018.
This blog post was originally published in the Vrije Universiteit Amsterdam Research Support Newsletter (re-blogged with permission).
This is an interview between Maria Cruz and Prof. Bas Teusink, Scientific Director of the Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), about his experience with having dedicated data management support for his research group.
“I hired the right person at the right time”, says Prof. Bas Teusink, Scientific Director of the Amsterdam Institute for Molecules, Medicines and Systems (AIMMS). His institute was founded in 2010 on the back of major breakthroughs in the fields of molecular, cellular and systems biology. Recently, rapid changes in the pace of data acquisition and data volume in this field called for the hiring of a dedicated Research Data Manager.
Why has data management become so important in your field?
“At AIMMS our focus is on molecular life sciences – the study of molecules in living systems, of how molecules affect living systems, and of the molecular mechanisms of how drugs work, how toxic compounds work, and how cells work. For biologists, the generation of data is getting less and less labour-intensive, and the interpretation of the data is getting more and more complicated.”
Does this mean that researchers need to acquire new skills?
“Yes, bioinformatics, data analysis, and data science are becoming more and more prominent in biology and also in chemistry. It would be a good idea for any bachelor programme in the life sciences to include proper data management, data science, and a little bit of programming and maybe bioinformatics in the curriculum. We’re developing such courses for the bachelor students of the Faculty (of Science).”
Why did you think a dedicated research data manager was needed?
“People in the life sciences community have been talking a lot about the importance of Research Data Management (RDM). When you think about biobanks and other types of big data collections, it is obvious that you have to sort out your data management, but what about a PhD student doing simple experiments in the lab using Excel to process data? How do we help them? As a Principal Investigator, I have no idea how to instruct my students in RDM. I’m not an expert. So I needed support. I needed somebody who actually has the time to look up what tools are available and who can translate general policies and general infrastructure into daily practical solutions that fit our local needs. There’s a huge gap between policy and implementation for people doing the daily work. We need discipline-specific support and we need hands-on help.”
What skills did you look for in a data manager?
“I wanted somebody who understands our field of work, who understands the data management side of things, and who also understands the technologies.”
Was it difficult to find the right person for the job?
“I happened to have Brett Olivier in my group and I could convince management that research data support was worth the investment. Brett is a biochemist with a strong theoretical background, but he also knows how to do experiments, so he can talk with everybody. He has also moved into programming and writing scientific software. Having this technical background means he can talk with people in IT. So he is the perfect guy.”
How is this position financed?
“We have found a pragmatic way of financing Brett’s position. And that is by project money. When we write a project proposal, if the funders find data management important, we budget a certain amount for data management, say 20K. If we get 5 projects, then we can afford a data manager just from project money. So far I’ve been able to fund Brett almost completely from my own projects.”
Is this funding model sustainable?
“I think it shouldn’t be difficult to finance somebody with this model for the long term. The university or the institute will have to take the risk, of course. If the money doesn’t come in, if the projects are not funded, then somebody has to pay the salary of the data manager. What is interesting with this model is that the chance of getting your project funded increases, because research data management is being taken more and more seriously by the funding agencies.”
What is Brett doing in concrete terms?
“He writes the Data Management Plans (DMPs) for project proposals and supports their implementation. He has been actively involved in the piloting and implementation of a new data management platform with AIMMS researchers. Brett has developed encoding standards for computational models of biological systems. Because of that, he knows how important it is to annotate data using appropriate ontologies, thereby making them more FAIR (Findable, Accessible, Interoperable and Reusable). Many scientists don’t know what an ontology is, let alone use one. Brett will address this and related RDM issues by providing advice on what the current best standards, tools and practices are in the field.”
“Well implemented data strategies can contribute to the quality and efficiency of a research project.”
On 26 June 2018, the new TU Delft Research Data Framework Policy was approved by TU Delft’s Executive Board. The Framework Policy is an overarching policy on research data management for TU Delft as a whole and it defines the roles and responsibilities at the University level. In addition, the Framework provides templates for faculty-specific data management policies.
From now on, the deans and the faculty management teams, together with the Data Stewards, will lead the development of faculty-specific policies on data management which will define faculty-level responsibilities.
If you are working at TU Delft and if you would like to be involved in the development of faculty-specific policies, please do get in touch with the relevant Data Steward.
The full text of the policy (pdf) is available below.
This blog post reports from a workshop session led by Marjan Grootveld and Ellen Leenarts from DANS. The workshop was part of a larger event, “Towards cultural change in data management – data stewardship in practice”, organised by TU Delft Library on 24 May 2018.
This blog post was written by Marjan Grootveld from DANS and was previously published on the OpenAIRE blog.
It’s not just Colonel Hannibal Smith who loves it when a plan comes together. Don’t we all? On a more serious note, this also holds for Data Management Plans or DMPs. In a DMP a researcher or research team describes what data goes into a project (reuse) and comes out of it (potential reuse), How the team takes care of the data, and Who is allowed to do What with the data When.
Just like a project plan, a DMP undergoes a reviewing process. Often, however, researchers share their draft version and questions with research support staff and data stewards (see the results of this survey by OpenAIRE and the FAIR Data Expert Group). About twenty data stewards shared their review and pre-view experiences in a lively session at the Technical University Delft on May 24th. During the day the organisers and speakers highlighted various aspects of data stewardship with a welcome focus on practice situations, especially in the break-out sessions. (When the presentations are available we will add a link to this blog post.)
In the session called “Why is this a good Data Management Plan?” Marjan Grootveld (DANS, OpenAIRE) and Ellen Leenarts (DANS, EOSC-hub) presented text samples taken from DMPs. By raising their hands – or not! – and through subsequent discussion, the participants gave their view on the quality of the sample DMP texts. For instance, the majority gave a thumbs-up for “A brief description of each dataset is provided in table 2, including the data source, file formats and estimated volume to plan for storage and sharing”. In contrast, the quote “Both the collected and the generated data, anonymised or fictional, are not envisioned to be made openly accessible.” drew a good laugh and the thumbs went down. Similarly, the information that the length of time for which the data will remain re-usable “may vary for the type of data and <is> difficult to specify at this stage of the project” was found not acceptable; the plan should at least explain why it is difficult, and how and when the project team will nevertheless provide a specific answer. And is it really more difficult than for other projects, whose DMPs do provide this information?
Although it can be hard to be specific in the first version of a DMP, it’s essential to demonstrate that you know what Data Management is about, and that you will deliver FAIR and maximally Open data. Does the DMP, for instance, tell what kind of metadata and documentation will be shared to provide the necessary context for others to interpret the data correctly? Does it distinguish between storing the data during the project and sustainably archiving them afterwards? (Yes, we had a sample text neatly describing the file formats during the data processing stage versus the file formats for sharing and preservation.)
There was consensus in the group on the quality of most of the quotes. Where opinions differed, this had mainly to do with the fact that the quotes were brief and therefore open to more lenient or more picky interpretation. In other cases, a sample text had both positive and negative aspects. For instance, “The source code will be released under an open source licensing scheme, whenever IPR of the partners is not infringed.” was found rather hedging (“whenever”) and unspecific (which licensing scheme?), but the plan to also make source code available is good; too often this seems to be forgotten when the notion of “data” is understood in a limited way.
The session participants agreed that a plan with many phrases like “where suitable/ where appropriate/ should/ possibly” is too vague and doesn’t inspire much trust. On the other hand, information on who is responsible for particular data management activities is valuable, and so is planning like “The work package leaders will evaluate and update the DMP at least in months 12, 24 and 36”. Reviewers prefer explicit information and commitment to good intentions – which may be something to keep in mind for your “Open A-Team”.
Written by: Mary Donaldson and Vessela Ensberg
On 21 February 2018, a Birds of a Feather session on ‘Data management costing in grants’ was held as part of the 13th International Digital Curation Conference in Barcelona. The session was proposed and chaired by Marta Teperek of TU Delft.
The session proposal recognised that ‘many research funders now require that research data is properly managed and shared. Consequently, many agree for the costs of data management to be budgeted in grant proposals. This is necessary for the sustainability of data management activities. So why is this not a normality yet?’
Identifying the problems
We identified two main reasons why data management is not included in grant proposal budgets: lack of awareness among researchers as to what funds they can request, and lack of available support at research institutions.
The usual reasons why Research Data Management (RDM) activities are not costed into grant proposals include:
- researchers prefer to ask for money for other purposes
- researchers are not aware which costs are eligible
- researchers believe that RDM costs should come from award overhead
Identifying solutions to researchers’ issues
Some of the RDM activities we identified as eligible for funding are
- transcription of interviews
- data anonymization
- data curation assistance (outside of existing central posts)
We acknowledge that some of these activities are already included in research proposals as parts of the normal research process, and a specialist, such as a data curator, may be difficult to hire for a less than full-time post. Growing the list of examples and viable options is likely key to having data management included in grant budgets.
As we moved on from discussing why grants don’t often contain data management costing, we strayed into the related territory of institutional issues. These included:
- worries about ‘double dipping’ for RDM costs, especially when trying to recover staffing costs
- need for training for research admin staff who are directly involved in application processes; high staff turn-over in these positions
- lack of a centralised system which tracks all grant applications or lack of communication between the Office coordinating the grant awards and RDM services
- preservation costs being incurred after the award has been closed
- lack of a pool of ‘expert’ staff which can be hired out to research projects
Identifying solutions to institutional issues
Institutional issues can be addressed by investment in the processes. In particular, Utrecht University and the University of Glasgow gave examples of addressing communication and training of research support staff. The RDM team at Glasgow is investigating the possibility of adding a check-box to the central grant review system to indicate that funding for RDM has been costed and included in the application. Utrecht also provides consultations on data management costs and is experimenting with a pool of data managers who can be hired from the library for a certain amount of time to work on specific projects. The library is funding these positions but hopes to be able to recover up to 75% of the cost of each position from research projects in the future.
We also looked for lessons learned from Open Access for publications. In recent years, funders have experimented with different models to pay for the more mature requirement for open access to publications. We explored whether these models could be adapted to help with the requirement for data management and sharing of research data. The first model we discussed was the FP7 pilot for open access where eligible projects were entitled to apply to a central pot of money, provided certain conditions were met. This pilot is due to end this week (28th Feb 2018), and has encountered administrative issues. In the UK, Research Councils UK (RCUK) have provided large research-intensive institutions with a block grant award to pay for Open Access charges for eligible articles. At the end of the pilot, RCUK will accept longer embargo periods. While we felt that centralized pots of money might work to support data management, the administrative burden of this funding is high.
To summarize, institutions can consider the following options to boost the inclusion of data management in grant budgets.
- An institution should have a centralized grant administration system. These systems can be adapted to ensure data management is included in the budget.
- RDM should provide more advocacy with researchers using vocabulary the researchers understand and relate to. RDM should match researchers with resources to support costing of RDM activities.
- Provide seed funding to researchers for legacy projects. This might help researchers engage better with RDM and consider their needs earlier in the process on subsequent projects.
- Institutions should consider having a core team of RDM specialists (data curators, statisticians etc) whose time can be bought out by grants, in the way that technicians already are in the life sciences.
- Provide in-depth training for technical or other support staff to enable them to deliver data management for a project. This would provide regular subject-specific RDM support for projects and help build capacity in departments.
However, despite all the ways in which institutions could help improve and support costing for RDM activities, we felt that tackling funders to better support this process would be more effective than each institution having to develop its own solutions. We also thought that funders should be alerted that, in cases in which they only require an outline plan at the time of application, the opportunity to identify and cost data management activities has passed by the time the award is made and a more detailed plan is developed.
Proposed funder interventions
- Improve review process for data management plans. Check for discrepancies between the RDM activities promised and the resources requested.
- Provide a clear statement with examples about acceptable and fundable data management activities.
- Indicate the proportion of each grant award expected to be spent on RDM activities.
This could be expressed as a percentage, or a range (to prevent the figure itself from becoming a point of argument) and would signal to researchers that funders don’t see RDM as a waste of money that could better be spent on generating more research data.
- Make it clear who in the funding body is the person/role to contact to discuss RDM issues. RDM requirements are still new enough that clarification is regularly required.
- Fund more data re-use.
For researchers, the cost/benefit analysis of making research data available is difficult to assess. Issuing calls specifically to encourage re-use of datasets would improve the understanding of data re-use and drive demand for shared datasets, helping tip the scales in favour of sharing data.
Ultimately, better alignment of funder RDM requirements would make it simpler for researchers to comply. It was mentioned that the Research Data Alliance (RDA) had tried to get a funder working group together. Perhaps this is something Science Europe could also be involved with.
Jisc have funded a project in the UK to produce centralised guidance by July on the following:
- What do different funders require in terms of RDM?
- What do different funders require in terms of data sharing?
- What are different funders willing to pay for?
- How should funding for RDM be justified in grant applications?
- How can funds for RDM be used by institutions?
- Utrecht University data management costing guidance: https://www.uu.nl/en/research/research-data-management/guides/costs-of-data-management
- RDM: http://www.uu.nl/rdm
- Data management costing tool and checklist: http://www.data-archive.ac.uk/media/247429/costingtool.pdf
- ‘Funder requirements for datasets’ project (Jisc-funded): https://rdmfunderrequirements.wordpress.com/
Written by: Marta Teperek and Madeleine de Smaele
On 3 November 2017 Madeleine de Smaele from TU Delft Library was invited by Scott Cunningham, Associate Professor at the Faculty of Technology, Policy & Management, to deliver a workshop to his Data Science students. Marta Teperek attended the workshop as an observer and, given that she only started working at TU Delft on 15 August, it was also a good opportunity for her to learn more about data management support available to researchers.
Below are our key reflections on that session.
Structure and content
Madeleine’s session was divided into two parts, each lasting 45 minutes, with a 15-minute break in between. The session was a mixture of Madeleine’s presentation and some interactive exercises.
Part I – finding datasets
The first part introduced:
- The key elements of research data management: data backup, file organisation and data management plans
- Information about data licensing, with the CC kiwi video explaining the different types of Creative Commons licences
- Information about research data repositories and finding existing research datasets
The first part concluded with an interactive exercise where participants were asked to find a repository and a dataset of interest for their research, by using re3data.org. Afterwards, we had a roundtable discussion about the datasets found by the participants and what was good and what not so good about them (e.g. clear licence, citation, DOI).
Part II – publishing your own datasets
In the second part of the workshop, we discussed the benefits and ways of publishing one’s own research data. We thought this was relevant to the course participants as they had been working on a dataset for their data science course. We thought that they might be interested in sharing their study results in a repository, and thus getting credit for their work. We spoke about DOIs, visibility and tracking citations.
The second part finished with an exercise as well, where participants were allowed to practise depositing research data into the 4TU.Centre for Research Data.
This was the first time that TU Delft Library had delivered such a presentation to students, so we thought it was necessary to ask the participants for feedback afterwards to see how the session could be improved in the future.
What went well
We were happy to see that participants valued the interactive exercise on finding existing datasets and that they liked the information we provided about data sharing possibilities. Many participants were also happy to learn about the various repositories available for them to use (not only for datasets), as well as about the dedicated support available to them at TU Delft.
We were also happy to see that students liked the slides and they valued the presenter.
What could be improved
It was also extremely useful for us to learn how our sessions could be improved in the future.
The primary suggestion was to tailor the content to the level of knowledge of the students. It turned out that students were already familiar with the principles behind good data management and the benefits of data sharing, and therefore wished the pace of the session to be increased and more focused on the parts they were not aware of. In addition, the participants wanted to see more examples tailored to their discipline and types of research.
The other suggestion was to make the session more interactive: to ask more questions and to facilitate more discussion throughout the session. This could also allow the presenter to expose the right content to the participants during the presentation.
In the future, we will want to find out more about the audience in advance of the workshop to ensure that we can better tailor the messages, examples and pace of the session. We will also revise the content of the workshop to make it more interactive and to facilitate more discussion with the participants throughout the session.
In addition, we also had issues with accessing the live version of the 4TU.Centre for Research Data during the demo, which was quite unfortunate. To future-proof ourselves, we will prepare some screenshots of a deposit process and always have the slides with us during similar presentations.
Overall, it was a very useful exercise for us and provided us with a lot of ideas on how we could improve the workshop in the future. We are very grateful to both Dr Scott Cunningham and his students for the opportunity.
- Madeleine’s presentation: https://doi.org/10.6084/m9.figshare.5579731.v1
Slides for presentation including active links at Open Science Days 2017 in Berlin hosted by Max Planck Digital Library (MPDL) on 17th October 2017.