By Marta Teperek, Paula Martinez-Lavanchy and Yan Wang
The Research Data Alliance (RDA) is an international organisation dedicated to everything about research data. It has over 9,000 members worldwide and holds a plenary meeting twice a year at various locations around the globe. Marta Teperek, Paula Martinez-Lavanchy, Yan Wang and Esther Plomp* represented TU Delft Research Data Services and Data Stewards at the RDA Plenary 14 meeting in Helsinki on 23-25 October 2019 and are sharing their key contributions and take-away messages.
Research Data Alliance Plenary meetings are always very rich with new content, innovative ideas, and offer plenty of opportunities for networking and collaboration – all revolving around research data management. The meeting consists not only of three days full of meetings of the various working and interest groups (multiple parallel sessions last from the very morning until late in the afternoon/evening!), but also other events: knowing that 500+ data experts attend each plenary meeting, there are always numerous co-located events. In addition, most people stay at a few hotels located close to the main conference venue, facilitating additional networking opportunities. Discussions often start very early in the morning (breakfast working meetings) and last until late in the evening (networking dinners).
Marta, Paula and Yan decided to share a small selection of what we thought were our key contributions and take-away messages and explain why these are relevant to TU Delft.
Libraries for Research Data Interest Group
Marta is the co-chair of the Libraries for Research Data Interest Group (L4RD IG) and co-organised the group’s meeting during Plenary 14. Yan presented the work looking at Engaging Researchers with Research Data.
What does this group do?
Libraries are actively developing new services in the digital environment, and in research data management in particular. The purpose of this interest group is to provide a forum for international data management experts to share practice and experience as RDM library services mature and to keep each other posted about the latest developments on data management.
Why is this important for TU Delft?
TU Delft wishes to be at the forefront of innovation when it comes to research data management. Being part of the group means not only being part of the network of international library experts on data management but also being at the centre of the newest developments and initiatives in RDM.
Research Data Management in Engineering
Paula is one of the co-chairs of Research Data Management in Engineering IG and co-organised the group’s kick-off meeting during Plenary 14.
What does this group do?
Engineering comprises a vast span of sub-disciplines including for example chemical, civil, electrical, and mechanical engineering. The research in Engineering is highly multidisciplinary, often involves close collaboration with industry and it generates a vast range of outputs from innovative materials to the production of software. Research data management practices within these sub-disciplines tend to be unaligned with e.g. open data initiatives/requirements and the implementation of the FAIR data principles. This group aims at changing the culture of handling data, creating awareness and bridging (sub-)communities and existing initiatives. It also aims at providing a platform for developing consensus on RDM best practices for engineering and to actively collaborate with other groups at RDA for adopting and/or adapting their outputs to be used by researchers in engineering.
Why is this important for TU Delft?
TU Delft has a strong focus on engineering and innovation. Our researchers are engaging more and more in Open Science and incorporating Research Data Management best practices within their workflows thanks to the work of our Data Stewards. However, there are still some barriers to full compliance with the FAIR data principles, e.g. the lack of metadata standards in engineering disciplines. Some concerns on how to balance the collaboration with industry and open science practices also exist. Through the involvement in this group, TU Delft hopes to overcome those barriers by working together with the research communities and institutions which are RDA members. After Plenary 14, the work of the group will focus on metadata standards, re-use of data and open science within engineering.
RDA also provides a great opportunity to link and disseminate the work done by TU Delft in the Conference of European Schools for Advanced Engineering Education and Research (CESAER), especially through the Task Force Open Science.
Birds of a Feather: Engaging Researchers with Research Data: What Works?
Marta organised a dedicated Birds of a Feather session: Engaging Researchers with Research Data: What Works? Yan presented a case study during the session.
What was this session about?
The objective of this meeting was to receive community feedback on the proposal to create the Engaging Researchers With Research Data Interest Group. So far, the group acted informally and gathered case studies on how academic institutions engage with researchers about research data. Selected case studies were published in the book “Engaging Researchers with Data Management: The Cookbook”. The group now wished to formalise its activities with the aim to continue information exchange about innovative activities, which facilitate researcher engagement with research data and to welcome other RDA members to join the group.
Why is this important to TU Delft?
In order to implement good RDM practice within research communities, a cultural shift is necessary. TU Delft has been actively investing in various strategies aiming to achieve cultural change, and has already benefited from advice and lessons learnt from other universities. For example, TU Delft’s Data Champions programme was directly inspired by the University of Cambridge. Through active participation in this group, TU Delft hopes not only to exchange practice and lessons learnt with other RDA members but also to stay up to date with new tactics which could help us further increase engagement with our research communities.
- Engaging Researchers with Data Management: The Cookbook – book with case studies on how institutions can effectively engage with researchers about research data.
- Collaborative session notes with links to presentations
Birds of a Feather: Professionalizing data stewardship
Marta also co-organised another Birds of a Feather session on Professionalizing data stewardship.
What was this session about?
During this session, we discussed the various models used by institutions worldwide to provide data stewardship support to research communities. This was followed by a discussion about the need to professionalise data stewardship: to create job profiles, to agree on career progression, on the skills which data stewards need to have and many others. The participants agreed that these issues go beyond individual institutions or countries and are better addressed by a dedicated international group, such as RDA.
Why is this important to TU Delft?
TU Delft is at the forefront of data stewardship, which also means it is often the first one to deal with issues related to lack of recognition of data stewards as a dedicated profession: questions about remuneration, career progression, agreed set of skills and tasks are daily problems experienced by the data stewards and those coordinating the programme. Therefore, it is essential for TU Delft to find solutions to these problems, ideally through collaboration with a group of international experts on the topic.
- Data (Stewardship) Makes The Difference: Towards A Community-Endorsed Data Stewardship Profession – blog post by Connie Clare reporting from “Professionalizing Data Stewardship” session
- Collaborative notes from the session
- Link to slides
Education and Training on handling of research data IG
Paula is a member of the Education and Training on handling of research data IG and participated in the group’s meeting during Plenary 14.
What does this group do?
Research has become a highly data-intensive activity. Education programmes at universities have recognized that new skills for data analysis are needed and have started to include e.g. programming skills within their curricula. However, there is less focus on skills related to good management, documentation and preservation of data. The objective of this IG is the exchange of information about existing developments and initiatives and promotion of training/education to manage research data throughout the data lifecycle.
Why is this important for TU Delft?
TU Delft is working intensively on preparing its Open Science Strategic plan 2020-2024. One of the cross-cutting themes for all the project lines (Open Access; Open Publishing; FAIR Data; Open Software and Open Education) is ‘skills’. In line with this, the Research Data Services team has recently published its Vision for Research Data Management Training at TU Delft. To implement this training vision in a sustainable way, TU Delft must collaborate with those communities, organizations and projects that are already providing training and those who would like to get started. Besides exchanging knowledge with those relevant stakeholders, the involvement of TU Delft in this group is very relevant for working collaboratively on the development of training curricula and materials.
The International Research Data Community contributing to EOSC
One of the events which were co-located with RDA was the European Open Science Cloud (EOSC) meeting. The purpose of this meeting was to engage with the international research data community in the development of the EOSC. As a member of the EOSC FAIR Working Group, Marta was one of the co-organisers of one session of this meeting – discussion about FAIR practices across disciplines. Yan was the facilitator and rapporteur of the multidisciplinary research breakout group.
What was this session about?
During this session, members of various disciplinary communities within the EOSC split into separate discussion groups. In addition to covering the well-recognised disciplines like engineering, social science and medical science, there was also a dedicated group focusing on FAIR practices in multidisciplinary research.
The discussion groups provided feedback on FAIR practices within their disciplines: what do the communities already do to put FAIR into reality (e.g. do they have disciplinary standards, do they use disciplinary repositories for their research outputs), what are the societal and technical barriers preventing full implementation of FAIR principles and what could be done to overcome these. They also discussed what services/components within the EOSC require FAIR certification, and what metrics would be the most suitable for the different components identified.
Why is this important to TU Delft?
All members of the EOSC Working groups act as impartial advisors and do not represent the interests of the organisations they are affiliated with. The mission of the FAIR Working Group is to provide recommendations on the implementation of FAIR (Findable, Accessible, Interoperable, Reusable) practices within the EOSC. EOSC is a pan-European initiative and its success is dependent on the practical implementation of FAIR practices across all European research institutions. Given that TU Delft wants to be the frontrunner in the culture change towards better and FAIRer data management, the activities of the group are very important and relevant: it is essential that TU Delft researchers are able to fully participate in the EOSC.
The above are just snapshots of what happened during Plenary 14. Marta, Paula, Yan and Esther returned to TU Delft exhausted, but also very enthusiastic and full of new ideas on how to continue working on further improvement of research data management services at TU Delft. Plenary 14 is now over and the next one is not for another six months. However, this does not mean that nothing happens in the meantime. On the contrary, the hard work has just begun. Members of working and interest groups continue their work in the period between plenaries – through teleconferences and other types of online collaboration. In-person group meetings at plenaries are important milestones used to review the work progress and jointly agree on new priorities. So Marta, Paula, Yan and Esther will now be busy contributing to the activities mentioned above.
What is really inspiring about RDA, and what makes it unique compared with other conferences and meetings, is that it focuses on collaborative working – it’s not just another conference where people gather to listen to presentations. In RDA anyone can come up with an important problem which needs to be solved. If there are enough people who wish to work together to find a solution to this problem, a new working group or interest group can be created to look for one. Collectively, RDA is a community of 9,000+ data experts worldwide, who are part of a myriad of working and interest groups. This setup truly allows the international community of data management professionals to solve their data management challenges collaboratively – and jointly come up with the best solutions.
Being part of RDA and actively participating in the working/interest groups allows TU Delft to stay close to the state of the art in research data management. It helps maintain the professionalism and a high level of expertise of TU Delft research data services, not only at the international RDM community level but also, more importantly, at our local level – when supporting our own researchers.
* – Esther of course also actively participated in the RDA meeting and contributed to several interest and working groups. Her contribution could not be included here due to conflicting annual leave schedules. This information might be added later.
This post is written by: Marta Teperek, with contributions by Neil Chue Hong, Stefano Cozzini, Marta Hoffman Sommer, Rob Hooft (Chair of the FAIR-practice team), Liisi Lembinen, Juuso Marttila, and it was originally published on the EOSC Secretariat blog.
Let us know how you are implementing the FAIR principles in practice by filling in a brief survey
On the 4th of July 2019, we had a kick-off meeting in Brussels of the FAIR Working Group of the European Open Science Cloud (EOSC) governance. Members of this group have been nominated by the EOSC Governance Board and Executive Board. The aim of the group is to provide recommendations on the implementation of FAIR (Findable, Accessible, Interoperable, Reusable) practices within the EOSC, largely inspired by the action plan outlined in the report Turning FAIR into reality. Given that the FAIR Working Group consists of almost 30 members, we split into 4 teams to enable efficient and effective working: PID Policy, FAIR Practice, Interoperability and Metrics & Certification.
We, the authors of this blog post, are the FAIR Practice team. The key objective of our team is to understand what the current practices in different (research) communities are and how FAIR they are. After the FAIR principles were published, they rapidly gained a lot of traction and interest, including among the advocates of good data management practices and open data. National and international funding bodies ask researchers to make all their data FAIR as one of their funding conditions. Now, FAIR is to be at the core of the EOSC. Barend Mons even remarked on Twitter that attitudes to FAIR have changed in the last four years – it has become embarrassing to admit in public that one hasn’t heard of FAIR.
FAIR principles – reality check
Many communities, however, still seem to be far from putting FAIR into their daily practices. The 2018 State of Open Data Report found that just 15% of researchers were “familiar with FAIR principles”. Unsurprisingly, out of the 4 classes of FAIR principles, Interoperability and Reusability were the least understood by the respondents. On 26 June 2019, Marta Teperek attended the Carpentry Connect Conference in Manchester and asked the attendees (around 80 people) if they had heard about the FAIR principles. Almost all of them replied positively. However, when Marta asked the follow-up question: “Who would feel comfortable explaining what FAIR data really means in practice?”, only 4 out of 80 people replied “yes”. This is quite revealing given that the participants of the Carpentry Connect conference are typically very well aware of interoperability and reusability issues. Similar reflections were made by Maria Cruz on 19 June 2019 at the OAI 11 – The CERN-UNIGE Workshop on Innovations in Scholarly Communication.
What are the community practices?
So how do we bridge that gap? That’s exactly what the FAIR Practice team will be investigating and making recommendations on to the European Commission. To develop these recommendations we first need to understand the current community practices. This will allow us to identify both the best practices, which might serve as a source of inspiration for others, and the barriers preventing communities from implementing FAIR practices. Understanding the barriers will help us to make recommendations to overcome those challenges. Awareness of the current practices and the ability to set realistic expectations are also essential for two other teams of our WG: Interoperability and Metrics & Certification. These teams need to ensure that the recommendations they propose are fit for purpose for the diverse communities they are to serve.
How are we going to do that?
The plan for the group is not to reinvent the wheel, but instead to identify and flag up existing valuable resources which investigate practices in various disciplines (such as the State of Open Data Report 2018, FAIR Data case studies in Engineering, FAIR Data Advanced Use Cases, FAIR in practice report by Jisc, the FAIR Implementation Matrix), and also to liaise with other projects, such as FAIRsFAIR, which are already investigating these practices. This will allow us to gather a body of knowledge and evidence, based on which recommendations will be made.
How can you get involved?
In order to better understand the community practices, we would be delighted to hear from you. You can get involved in numerous ways:
- Tell us what FAIR principles mean to you in practice by filling in this short (less than 3 minutes!) survey: https://forms.gle/c6TbS42ndvatZbn8A
- Let us know (email@example.com) if you are aware of any existing reports or case studies about FAIR practices in different communities – we will add them to our body of evidence
- Don’t have the time to contribute, but would like to be informed about the work of the FAIR WG? Sign up to the mailing list at firstname.lastname@example.org
If you have any additional questions or comments, don’t hesitate to get in touch with us at any stage by emailing email@example.com.
For more info, read the blog post from the inaugural meeting.
On 27 and 28 February 2019, I attended the NSF FAIR Hackathon Workshop for Mathematics and the Physical Sciences research communities held in Alexandria, Virginia, USA. I travelled to the event at the invitation of the TU Delft Data Stewards and with the generous support of the Hackathon organisers, Natalie Meyers and Mike Hildreth from the University of Notre Dame.
Participants were encouraged to register and assemble as duos of researchers and/or students along with a data scientist and/or research data librarian. I was invited, as a data librarian with a research background in the physical sciences, to form a duo with Joseph Weston, a theoretical physicist by background and a scientific software developer at TU Delft, who is also one of the TU Delft Data Champions.
I presented about the Hackathon at the last TU Delft Data Champions meeting. The presentation is available via Zenodo. All the presentations and materials from the FAIR Hackathon are also publicly available. The FAIR data principles are defined and explained here. This blog post aims to offer some of my views and reflections on the workshop, as an addition to the presentation I gave at the Data Champions meeting on 21 May 2019.
The grand vision of FAIR
The workshop’s keynote presentation, given by George Strawn, was one of the highlights of the event for me. His talk set out clearly and authoritatively the vision behind FAIR and the challenges ahead. Strawn’s words still ring in my head: “FAIR data may bring a revolution on the same magnitude as the science revolution of the 17th century, by enabling reuse of all science outputs – not just publications.” Drawing parallels between the development of the internet and FAIR data, Strawn explained: “The internet solved the interoperability of heterogeneous networks problem. FAIR data’s aspiration is to solve the interoperability of heterogeneous data problem.” One computer (“the network is the computer”) was the result of the internet; one dataset will be FAIR’s achievement. FAIR data will be a core infrastructure as much as the internet is today.
Strawn warned that it isn’t going to be easy. The challenge of FAIR data is ten times harder to solve intellectually than that of the internet, and it has to be solved with fewer resources. Strawn has strong credentials and a strong track record in this matter. He was part of the team that transitioned the experimental ARPAnet (the precursor to today’s internet) into the global internet and he is part of the global efforts trying to bring about an Internet of FAIR Data and Services. In his view, “scientific revolution will come because of FAIR data, but likely not in a couple of years but in a couple decades.”
Researchers do not know about FAIR
Strawn referred mainly to technical and political challenges in his presentation. One of the challenges I encounter in my daily job as a research data community manager is not technical in nature but rather cultural and sociological: how to get researchers engaged with FAIR data and how to make them enthusiastic to join the road ahead? Many researchers are not aware of the FAIR principles, and those who are do not always understand how to put the principles into practice, or are not always willing to do so. As reported in a recent news item in Nature Index, the 2018 State of Open Data report, published by Digital Science, found that just 15% of researchers were “familiar with FAIR principles”. Of the respondents to this survey who were familiar with FAIR, only about a third said that their data management practices were very compliant with the principles.
The workshop tried to address this particular challenge by bringing together researchers in the physical sciences, experts in data curation and data analysts, FAIR service providers and FAIR experts. About half of the participants were researchers, mainly in the areas of experimental high energy physics, chemistry, and materials science research, at different stages in their careers. Most were based in the US and funded by NSF.
These researchers were knowledgeable about data management and for the most part familiar with the FAIR principles. However, the answers to a questionnaire sent to all participants in preparation for the Hackathon show that even a very knowledgeable and interested group of participants, such as this one, struggled when answering detailed questions about the FAIR principles. For example, when asked specific questions about provenance metadata and ontologies and/or vocabularies, many respondents answered they didn’t know. As highlighted in the 2018 State of Open Data report, interoperability, and to a lesser extent re-usability, are the least understood of the FAIR principles. Interoperability, in particular, is the one that causes most confusion.
There were many opportunities during the workshop to exchange ideas with the other participants and to learn from each other. There was much optimism and enthusiasm among the participants, but also some words of caution, especially from those who are trying to apply the FAIR principles in practice. The PubChem use case “Making Data Interoperable”, presented by Evan Bolton from the U.S. National Center for Biotechnology Information, was a case in point. It could be said, as noted by one of the participants, that the chemists “seem to really have their house in order” when it comes to metadata standards. Not all communities have such standards. However, when it comes to “teaching chemistry to computers” – or, put in other words, making it possible for datasets to be interrogated automatically, as intended by the FAIR principles – Bolton’s closing slide struck a more pessimistic note. “Annotating and FAIR-ifying scientific content can be difficult to navigate”, Bolton noted, and it can feel like chasing windmills. “Everything [is] a work in-progress” and “what you can do today may be different from tomorrow”.
What can individual researchers do?
If service providers, such as PubChem, are struggling, what are individual researchers to do? The best and most practical thing a researcher can do is to obtain a persistent identifier (e.g. a DOI) by uploading data to a trusted repository such as the 4TU.Centre for Research Data archive, hosted at TU Delft, or a more general archive such as Zenodo. This will make datasets at the very least Findable and Accessible. Zenodo conveniently lists on its website how it helps datasets comply with the FAIR principles. The 4TU.Centre for Research Data, and many other repositories, offer similar services when it comes to helping make data FAIR.
I am grateful to the University of Notre Dame for covering my travel costs to the MPS FAIR Hackathon. Special thanks to Natalie Meyers from the University of Notre Dame, and Marta Teperek, Yasemin Turkyilmaz-van der Velden and the TU Delft Data Stewards for making it possible for me to attend.
Maria Cruz is Community Manager Research Data Management at the VU Amsterdam.
In the autumn of 2018 I took up the post of Data Steward in the Faculty of Industrial Design Engineering (IDE). As I am not a designer myself (my academic background is in historical literature), a significant portion of my time is dedicated to understanding how research is conducted in the realm of design, in particular trying to compose an overview of the types of data collected & used by designers, as well as how current and upcoming ideas & tools for research data management might potentially benefit their activities. This is no mean feat, and at present I cannot lay claim to more than a superficial understanding of the inner workings of design research. Through day-to-day data steward activities – attending events, reading papers and, perhaps most revealing, conversations with individual researchers, to name but a few – the landscape of design research data gradually becomes more intelligible to me. Cobbling together a coherent picture from these disparate sources requires a modicum of dedicated thought, so it was my good fortune to have recently been invited to an event arranged by the Faculty of Health, Ethics & Society (HES) at Maastricht University to present my experiences with design data thus far. Here we discussed and compared research data practices, and my preparation for this discussion afforded me the opportunity to reflect a bit on what research data means in the field of design, how design methodology relates to other academic fields and what kinds of challenges and opportunities exist for handling data and making it more impactful within the discipline and beyond.
The HES workshop, organized early in February of this year, was a forum for the group to discuss how their work and the data they produce intersect with some of the issues currently being debated within academic communities. A specific goal was to evaluate some of the arguments originating in the (at times competing) discourses of Open Science and personal privacy. Topics of discussion included how one should make sociological and healthcare data FAIR, especially given that the materials collected in HES are often predominantly qualitative in nature: personal interviews, ethnographic field notes, etc. Questions surrounding these topics are broadly applicable to some qualitative types of data in design as well, e.g. the extent to which data should be shared, in what format and under what conditions. The slides from my talk are available here: https://doi.org/10.5281/zenodo.2592280, and this blog post is intended to give them some context.
Research Data in Design
Maintaining an overview of the various types and amounts of data produced, analyzed and re-used within the Faculty of Industrial Design Engineering is a core aspect of my work as a data steward, but it is an ongoing challenge due to the heterogeneity of data used by designers and the quantity of different projects simultaneously active. Some designers do market research involving i.a. surveys, others take sensor readings and yet others develop algorithms for improving the manufacturing process. Each of these, along with the many other efforts within IDE, merits its own suite of questions and concerns when it comes to openness and privacy. The more we understand data types and usage in a field, the better we can judge the impact of present and future actions germane to research data – open access initiatives, legislation (esp. the GDPR), shifts in policy or practice, etc. More importantly, we can predict how we might turn some of these to our advantage.
For instance, TU Delft recently instituted a policy that all PhD students will be required to deposit the data underlying their thesis. For new PhD students, this will simply be a part of the process, one step among the many novel activities they experience on the way to earning their PhD. The real challenge lies with members of my faculty, the experienced researchers and teachers, as well as myself, who will have to identify the value in applying this new policy to research data in their field. To do this we must ask ourselves a series of questions. In addition to the aforementioned ‘what kind of data do we have and use?’, we must determine what should be made public as well as to what degree. Underlying all of this is, of course, a more fundamental question: how does sharing this information improve the production of knowledge in design and the fields which it touches? Some of these queries have clear answers, but the majority require further discussion and reflection.
Data Sharing and Data Publishing
One common question I receive in various forms is why designers and design researchers should share their data more widely than they presently do. In many instances I find this returns to the aforementioned issue of diverse types of data. For some designers who have a clear definition of what their data is, why it is collected and how others can use the data, such as the DINED anthropometrics group, a conversation on what data to share and how can be fairly straightforward. But what are the actual benefits of sharing design notes or other types of context-bound qualitative data? In the data management community we have a set of commonly purveyed answers to this query, and I have been trying to see how they match up to existing practice in design.
The first is idealistic, that publishing data will further the field, improve science through increased transparency, accuracy and integrity. Reactions to this argument often take the form of a slow nod, a sign I take to be cautious optimism (one which I happen to share). This outcome is difficult to measure. I was once asked who would be interested in seeing the transcripts of x number of their interviews. A legitimate question, and one with an inscrutable answer – it is difficult to tell who will use your data if they do not know it exists in the first place. A corollary to this is that we ask people to weigh the requisite time investment in making materials publishable (sometimes substantial if working with qualitative and/or sensitive data) against this unpredictable benefit. I believe we need more evidence of the positive impact of making design data FAIR, whether this be figures of dataset citations (currently a desideratum) or anecdotal evidence of new contacts and collaborations resulting from data sharing. Essentially this means a few interested volunteers willing to learn the tools, put in some extra time and test the waters. Will sharing my sensor data attract the attention of a new commercial partner? Will my model be taken up and improved upon by the community using the product or service we design? These are certainly possibilities, but at present they remain a future less vivid.
For PhD students and early career researchers I frequently posit the possibility that publishing data, making their publications Open Access and other actions to make their work more transparent could yield direct career opportunities. This ties into efforts promoting expansion in the interpretation of research assessment such as DORA. In my current position, I feel that designers may be ahead of the curve when it comes to evaluating research impact. In addition to research papers published in journals boasting various impact factors, desirable results from design projects include engagement tools, reflections from projects, and prototypes to name only a few. The weighting of these outputs is unclear to me when it comes to, e.g. obtaining a research position, but I suspect there is room here for allotting credit to demonstrations of open working. This is certainly the case in some fields where lectureship advertisements include explicit language supporting Open Science. As far as I have been able to determine (in my extremely casual browsing of job postings) this is not yet an element of the narrative designers weave to present their work to potential employers nor one sought by employers themselves. However, data publications as part of CVs attached to grant applications may indeed have some cachet, as funding agencies such as the NWO and ZonMw presently stress the importance of such activities in the pursuit of maximizing investment returns in the grants they award. Here is an opportunity to serve the interests of many.
Food for Thought
One of my takeaway messages from these debates is that there is a need for a community – in design, in many research areas – an opportunity to convene and discuss issues and test some of the options being afforded or demanded under the umbrella of Open Science. Some design research shares a number of data issues in common with social sciences – questions of consent, of data collection and access – while others are more aligned with mathematics or medicine. Furthermore I’d be interested to hear whether any RDA outputs have an application in design, as well as whether repositories for design materials would be desirable and how they should be arranged. From my admittedly biased position, I believe there is much that designers stand to gain from picking up versioning tools or sharing data more widely, and I think designers’ methods and the iterative nature of design thinking, as I understand them, could in turn only benefit Open Science communities.
This blog post was originally published by the LSE Impact Blog.
Recommendations on how to better support researchers in good data management and sharing practices are typically focused on developing new tools or improving infrastructure. Yet research shows the most common obstacles are actually cultural, not technological. Marta Teperek and Alastair Dunning outline how appointing data stewards and data champions can be key to improving research data management through positive cultural change.
By now, it’s probably difficult to find a researcher who hasn’t heard of journal requirements for sharing research data supporting publications. Or a researcher who hasn’t heard of funder requirements for data management plans. Or of institutional policies for data management and sharing. That’s a lot of requirements! Especially considering data management is just one set of guidelines researchers need to comply with (on top of doing their own competitive research, of course).
All of these requirements are in place for good reasons. Those who are familiar with the research reproducibility crisis and understand that missing data and code is one of the main reasons for it need no convincing of this. Still, complying with the various data policies is not easy; it requires time and effort from researchers. And not all researchers have the knowledge and skills to professionally manage and share their research data. Some might even wonder what exactly their research data is (or how to find it).
Therefore, it is crucial for institutions to provide their researchers with a helping hand in meeting these policy requirements. This is also important in ensuring policies are actually adhered to and aren’t allowed to become dry documents which demonstrate institutional compliance and goodwill but are of no actual consequence to day-to-day research practice.
The main obstacles to data management and sharing are cultural
But how to best support researchers in good data management and sharing practices? The typical answers to this question are “let’s build some new tools” or “let’s improve our infrastructure”. When thinking about how to provide data management support to researchers at Delft University of Technology (TU Delft), we decided to resist this initial temptation and do some research first.
Several surveys asking researchers about barriers to data sharing indicated that the main obstacles are cultural, not technological. For example, in a recent survey by Houtkoop et al. (2018), psychology researchers were given a list of 15 different barriers to data sharing and asked which ones they agreed with. The top three reasons preventing researchers from sharing their data were:
- “Sharing data is not a common practice in my field.”
- “I prefer to share data upon request.”
- “Preparing data is too time-consuming.”
Interestingly, the only two technological barriers – “My dataset is too big” and “There is no suitable repository to share my data” – were among three at the very bottom of the list. Similar observations can be made based on survey results from Van den Eynden et al. (2016) (life sciences, social sciences, and humanities disciplines) and Johnson et al. (2016) (all disciplines).
At TU Delft, we already have infrastructure and tools for data management in place. The ICT department provides safe storage solutions for data (with regular backups at different locations), while the library offers dedicated support and templates for data management plans and hosts 4TU.Centre for Research Data, a certified and trusted archive for research data. In addition, dedicated funds are made available for researchers wishing to deposit their data into the archive. This being the case, we thought researchers may already receive adequate data management support and no additional resources were required.
To test this, we conducted a survey among the research community at TU Delft. To our surprise, the results indicated that despite all the services and tools already available to support researchers in data management and sharing activities, their practices needed improvement. For example, only around 40% of researchers at TU Delft backed up their data automatically. This was striking, given the fact that all data storage solutions offered by TU Delft ICT are automatically backed up. Responses to open questions provided some explanation for this:
- “People don’t tell us anything, we don’t know the options, we just do it ourselves.”
- “I think data management support, if it exists, is not well-known among the researchers.”
- “I think I miss out on a lot of possibilities within the university that I have not heard of. There is too much sparsely distributed information available and one needs to search for highly specific terminology to find manuals.”
It turns out, again, that the main obstacles preventing people from using existing institutional tools and infrastructure are cultural – data management is not embedded in researchers’ everyday practice.
How to change data management culture?
We believe the best way to help researchers improve data management practices is to invest in people. We have therefore initiated the Data Stewardship project at TU Delft. We appointed dedicated, subject-specific data stewards in each faculty at TU Delft. To ensure the support offered by the data stewards is relevant and specific to the actual problems encountered by researchers, data stewards have (at least) a PhD qualification (or equivalent) in a subject area relevant to the faculty. We also reasoned that it was preferable to hire data stewards with a research background, as this allows them to better relate to researchers and their various pain points as they are likely to have similar experiences from their own research practice.
Vision for data stewardship
There are two main principles of this project. Crucially, the research must stay central. Data stewards are not there to educate researchers on how to do research, but to understand their research processes and workflows and help identify small, incremental improvements in their daily data management practices.
Consequently, data stewards act as consultants, not as police (the objective of the project is to improve cultures, not compliance). The main role of the data stewards is to talk with researchers: to act as the first contact point for any data-related questions researchers might have (be it storage solutions, tools for data management, data archiving options, data management plans, advice on data sharing, budgeting for data management in grant proposals, etc.).
Data stewards should be able to answer around 80% of questions. For the remaining 20%, they ask internal or external experts for advice. But most importantly, researchers no longer need to wonder where to look for answers or who to speak with – they have a dedicated, local contact point for any questions they might have.
Data Champions are leading the way
So has the cultural change happened? This is, and most probably always will be, a work in progress. However, allowing data stewards to get to know their research communities has already had a major positive effect. They were able to identify researchers who are particularly interested in data management and sharing issues. Inspired by the University of Cambridge initiative, we asked these researchers if they would like to become Data Champions – local advocates for good data management and sharing practices. To our surprise, more than 20 researchers have already volunteered as Data Champions, and this number is steadily growing. Having Data Champions teaming up with the data stewards allows for the incorporation of peer-to-peer learning strategies into our data management programme and also offers the possibility to create tailored data management workflows, specific to individual research groups.
Technology or people?
Our case at TU Delft might be quite special, as we were privileged to already have the infrastructure and tools in place which allowed us to focus our resources on investing in the right people. At other institutions circumstances may be different. Nonetheless, it’s always worth keeping in mind that even the best tools and infrastructures, without the right people to support them (and to communicate about them!), may fail to be widely adopted by the research community.
A PDF (and citable) version of this document is available via Zenodo. DOI: https://doi.org/10.5281/zenodo.1316938
SURF – as the national collaborative ICT organisation for the Dutch education and research environment – has joined the effort to support the implementation and application of the FAIR data principles in the Netherlands. The first product of this endeavor is a report on six case studies conducted by Melanie Imming. The interviewed institutions range from the support services of various universities, through research institutions, to the national health care institute.
The purpose of this report is to build and share expertise on the implementation of FAIR data policy in the Netherlands. The six use cases included in this report describe developments in FAIR data, and different approaches taken, within different domains. For SURF, it is important to gain a better picture of the best way to support researchers who want to make their data FAIR. – Melanie Imming. (2018, April 23). FAIR Data Advanced Use Cases: from principles to practice in the Netherlands (Version Final). Zenodo. http://doi.org/10.5281/zenodo.1250535
On 22nd May 2018 the report was officially launched, accompanied by a lovely workshop in the SURF venue in Utrecht.
This week, we are presenting at the International Digital Curation Conference 2018 in Barcelona.
This presentation can be downloaded from Zenodo.
The pre-print version of the practice paper accepted for the conference is available on OSF Preprints.
Title: From Passive to Active, From Generic to Focused: How Can an Institutional Data Archive Remain Relevant in a Rapidly Evolving Landscape?
Authors: Maria J. Cruz, Jasmin K. Böhmer, Egbert Gramsbergen, Marta Teperek, Madeleine de Smaele, Alastair Dunning
Abstract: Founded in 2008 as an initiative of the libraries of three of the four technical universities in the Netherlands, the 4TU.Centre for Research Data (4TU.Research Data) provides since 2010 a fully operational, cross-institutional, long-term archive that stores data from all subjects in applied sciences and engineering. Presently, over 90% of the data in the archive is geoscientific data coded in netCDF (Network Common Data Form) – a data format and data model that, although generic, is mostly used in climate, ocean and atmospheric sciences. In this practice paper, we explore the question of how 4TU.Research Data can stay relevant and forward-looking in a rapidly evolving research data management landscape. In particular, we describe the motivation behind this question and how we propose to address it.
Slides (including active links) for the presentation at Open Science Days 2017 in Berlin, hosted by Max Planck Digital Library (MPDL) on 17th October 2017.