In the autumn of 2018 I took up the post of Data Steward in the Faculty of Industrial Design Engineering (IDE). As I am not a designer myself (my academic background is in historical literature), a significant portion of my time is dedicated to understanding how research is conducted in the realm of design, in particular trying to compose an overview of the types of data collected & used by designers, as well as how current and upcoming ideas & tools for research data management might potentially benefit their activities. This is no mean feat, and at present I cannot lay claim to more than a superficial understanding of the inner workings of design research. Through day-to-day data steward activities – attending events, reading papers and, perhaps most revealing, conversations with individual researchers, to name but a few – the landscape of design research data gradually becomes more intelligible to me. Cobbling together a coherent picture from these disparate sources requires a modicum of dedicated thought, so it was my good fortune to have recently been invited to an event arranged by the Faculty of Health, Ethics & Society (HES) at Maastricht University to present my experiences with design data thus far. Here we discussed and compared research data practices, and my preparation for this discussion afforded me the opportunity to reflect a bit on what research data means in the field of design, how design methodology relates to other academic fields and what kinds of challenges and opportunities exist for handling data and making it more impactful within the discipline and beyond.
The HES workshop, organized early in February of this year, was a forum for the group to discuss how their work and the data they produce intersect with some of the issues currently being debated within academic communities. A specific goal was to evaluate some of the arguments originating in the (at times competing) discourses of Open Science and personal privacy. Topics of discussion included how one should make sociological and healthcare data FAIR, especially given that the materials collected in HES are often predominantly qualitative in nature: personal interviews, ethnographic field notes, etc. Questions surrounding these topics are broadly applicable to some qualitative types of data in design as well, e.g. the extent to which data should be shared, in what format and under what conditions. The slides from my talk are available here: https://doi.org/10.5281/zenodo.2592280, and this blog post is intended to give them some context.
Research Data in Design
Maintaining an overview of the various types and amounts of data produced, analyzed and re-used within the Faculty of Industrial Design Engineering is a core aspect of my work as a data steward, but it is an ongoing challenge due to the heterogeneity of data used by designers and the number of different projects simultaneously active. Some designers do market research involving, inter alia, surveys, others take sensor readings and yet others develop algorithms for improving the manufacturing process. Each of these, along with the many other efforts within IDE, merits its own suite of questions and concerns when it comes to openness and privacy. The more we understand data types and usage in a field, the better we can judge the impact of present and future actions germane to research data – open access initiatives, legislation (esp. the GDPR), shifts in policy or practice, etc. More importantly, we can predict how we might turn some of these to our advantage.
For instance, TU Delft recently instituted a policy that all PhD students will be required to deposit the data underlying their thesis. For new PhD students, this will simply be a part of the process, one step among the many novel activities they experience on the way to earning their PhD. The real challenge lies with the experienced researchers and teachers of my faculty, as well as myself, who will have to identify the value in applying this new policy to research data in their field. To do this we must ask ourselves a series of questions. In addition to the aforementioned ‘what kind of data do we have and use?’, we must determine what should be made public as well as to what degree. Underlying all of this is, of course, a more fundamental question: how does sharing this information improve the production of knowledge in design and the fields which it touches? Some of these queries have clear answers, but the majority require further discussion and reflection.
Data Sharing and Data Publishing
One common question I receive in various forms is why designers and design researchers should share their data more widely than they presently do. In many instances I find this returns to the aforementioned issue of diverse types of data. For some designers who have a clear definition of what their data is, why it is collected and how others can use the data, such as the DINED anthropometrics group, a conversation on what data to share and how can be fairly straightforward. But what are the actual benefits of sharing design notes or other types of context-bound qualitative data? In the data management community we have a set of commonly purveyed answers to this query, and I have been trying to see how they match up to existing practice in design.
The first is idealistic, that publishing data will further the field, improve science through increased transparency, accuracy and integrity. Reactions to this argument often take the form of a slow nod, a sign I take to be cautious optimism (one which I happen to share). This outcome is difficult to measure. I was once asked who would be interested in seeing the transcripts of x number of their interviews. A legitimate question, and one with an inscrutable answer – it is difficult to tell who will use your data if they do not know it exists in the first place. A corollary to this is that we ask people to weigh the requisite time investment in making materials publishable (sometimes substantial if working with qualitative and/or sensitive data) against this unpredictable benefit. I believe we need more evidence of the positive impact of making design data FAIR, whether this be figures of dataset citations (currently a desideratum) or anecdotal evidence of new contacts and collaborations resulting from data sharing. Essentially this means a few interested volunteers willing to learn the tools, put in some extra time and test the waters. Will sharing my sensor data attract the attention of a new commercial partner? Will my model be taken up and improved upon by the community using the product or service we design? These are certainly possibilities, but at present they remain a future less vivid.
For PhD students and early career researchers I frequently posit the possibility that publishing data, making their publications Open Access and other actions to make their work more transparent could yield direct career opportunities. This ties into efforts to broaden the interpretation of research assessment, such as DORA. In my current position, I feel that designers may be ahead of the curve when it comes to evaluating research impact. In addition to research papers published in journals boasting various impact factors, desirable results from design projects include engagement tools, reflections from projects, and prototypes, to name only a few. The weighting of these outputs is unclear to me when it comes to, e.g. obtaining a research position, but I suspect there is room here for allotting credit to demonstrations of open working. This is certainly the case in some fields where lectureship advertisements include explicit language supporting Open Science. As far as I have been able to determine (in my extremely casual browsing of job postings) this is not yet an element of the narrative designers weave to present their work to potential employers nor one sought by employers themselves. However, data publications as part of CVs attached to grant applications may indeed have some cachet, as funding agencies such as the NWO and ZonMw presently stress the importance of such activities in the pursuit of maximizing investment returns in the grants they award. Here is an opportunity to serve the interests of many.
Food for Thought
One of my takeaway messages from these debates is that there is a need for a community – in design, as in many research areas – a venue to convene, discuss issues and test some of the options being afforded or demanded under the umbrella of Open Science. Some design research shares a number of data issues with the social sciences – questions of consent, of data collection and access – while other work is more closely aligned with mathematics or medicine. Furthermore, I’d be interested to hear whether any RDA outputs have an application in design, as well as whether repositories for design materials would be desirable and how they should be arranged. From my admittedly biased position, I believe there is much that designers stand to gain from picking up versioning tools or sharing data more widely, and I think designers’ methods and the iterative nature of design thinking, as I understand them, could in turn only benefit Open Science communities.
This blog post was originally published by the LSE Impact Blog.
Recommendations on how to better support researchers in good data management and sharing practices are typically focused on developing new tools or improving infrastructure. Yet research shows the most common obstacles are actually cultural, not technological. Marta Teperek and Alastair Dunning outline how appointing data stewards and data champions can be key to improving research data management through positive cultural change.
By now, it’s probably difficult to find a researcher who hasn’t heard of journal requirements for sharing research data supporting publications. Or a researcher who hasn’t heard of funder requirements for data management plans. Or of institutional policies for data management and sharing. That’s a lot of requirements! Especially considering data management is just one set of guidelines researchers need to comply with (on top of doing their own competitive research, of course).
All of these requirements are in place for good reasons. Those who are familiar with the research reproducibility crisis and understand that missing data and code is one of the main reasons for it need no convincing of this. Still, complying with the various data policies is not easy; it requires time and effort from researchers. And not all researchers have the knowledge and skills to professionally manage and share their research data. Some might even wonder what exactly their research data is (or how to find it).
Therefore, it is crucial for institutions to provide their researchers with a helping hand in meeting these policy requirements. This is also important in ensuring policies are actually adhered to and aren’t allowed to become dry documents which demonstrate institutional compliance and goodwill but are of no actual consequence to day-to-day research practice.
The main obstacles to data management and sharing are cultural
But how to best support researchers in good data management and sharing practices? The typical answers to this question are “let’s build some new tools” or “let’s improve our infrastructure”. When thinking how to provide data management support to researchers at Delft University of Technology (TU Delft), we decided to resist this initial temptation and do some research first.
Several surveys asking researchers about barriers to data sharing indicated that the main obstacles are cultural, not technological. For example, in a recent survey by Houtkoop et al. (2018), psychology researchers were given a list of 15 different barriers to data sharing and asked which ones they agreed with. The top three reasons preventing researchers from sharing their data were:
- “Sharing data is not a common practice in my field.”
- “I prefer to share data upon request.”
- “Preparing data is too time-consuming.”
Interestingly, the only two technological barriers – “My dataset is too big” and “There is no suitable repository to share my data” – were among the three at the very bottom of the list. Similar observations can be made based on survey results from Van den Eynden et al. (2016) (life sciences, social sciences, and humanities disciplines) and Johnson et al. (2016) (all disciplines).
At TU Delft, we already have infrastructure and tools for data management in place. The ICT department provides safe storage solutions for data (with regular backups at different locations), while the library offers dedicated support and templates for data management plans and hosts the 4TU.Centre for Research Data, a certified and trusted archive for research data. In addition, dedicated funds are made available for researchers wishing to deposit their data into the archive. This being the case, we thought researchers might already receive adequate data management support and that no additional resources were required.
To test this, we conducted a survey among the research community at TU Delft. To our surprise, the results indicated that despite all the services and tools already available to support researchers in data management and sharing activities, their practices needed improvement. For example, only around 40% of researchers at TU Delft backed up their data automatically. This was striking, given the fact that all data storage solutions offered by TU Delft ICT are automatically backed up. Responses to open questions provided some explanation for this:
- “People don’t tell us anything, we don’t know the options, we just do it ourselves.”
- “I think data management support, if it exists, is not well-known among the researchers.”
- “I think I miss out on a lot of possibilities within the university that I have not heard of. There is too much sparsely distributed information available and one needs to search for highly specific terminology to find manuals.”
It turns out, again, that the main obstacles preventing people from using existing institutional tools and infrastructure are cultural – data management is not embedded in researchers’ everyday practice.
How to change data management culture?
We believe the best way to help researchers improve data management practices is to invest in people. We have therefore initiated the Data Stewardship project at TU Delft. We appointed dedicated, subject-specific data stewards in each faculty at TU Delft. To ensure the support offered by the data stewards is relevant and specific to the actual problems encountered by researchers, data stewards have (at least) a PhD qualification (or equivalent) in a subject area relevant to the faculty. We also reasoned that it was preferable to hire data stewards with a research background, as this allows them to better relate to researchers and their various pain points as they are likely to have similar experiences from their own research practice.
Vision for data stewardship
There are two main principles of this project. Crucially, the research must stay central. Data stewards are not there to educate researchers on how to do research, but to understand their research processes and workflows and help identify small, incremental improvements in their daily data management practices.
Consequently, data stewards act as consultants, not as police (the objective of the project is to improve cultures, not compliance). The main role of the data stewards is to talk with researchers: to act as the first contact point for any data-related questions researchers might have (be it storage solutions, tools for data management, data archiving options, data management plans, advice on data sharing, budgeting for data management in grant proposals, etc.).
Data stewards should be able to answer around 80% of questions. For the remaining 20%, they ask internal or external experts for advice. But most importantly, researchers no longer need to wonder where to look for answers or who to speak with – they have a dedicated, local contact point for any questions they might have.
Data Champions are leading the way
So has the cultural change happened? This is, and most probably always will be, a work in progress. However, allowing data stewards to get to know their research communities has already had a major positive effect. They were able to identify researchers who are particularly interested in data management and sharing issues. Inspired by the University of Cambridge initiative, we asked these researchers if they would like to become Data Champions – local advocates for good data management and sharing practices. To our surprise, more than 20 researchers have already volunteered as Data Champions, and this number is steadily growing. Having Data Champions teaming up with the data stewards allows for the incorporation of peer-to-peer learning strategies into our data management programme and also offers the possibility to create tailored data management workflows, specific to individual research groups.
Technology or people?
Our case at TU Delft might be quite special, as we were privileged to already have the infrastructure and tools in place which allowed us to focus our resources on investing in the right people. At other institutions circumstances may be different. Nonetheless, it’s always worth keeping in mind that even the best tools and infrastructures, without the right people to support them (and to communicate about them!), may fail to be widely adopted by the research community.
A PDF (and citable) version of this document is available via Zenodo. DOI: https://doi.org/10.5281/zenodo.1316938
SURF – as the national collaborative ICT organisation for the Dutch education and research environment – has joined the effort to support the implementation and application of the FAIR data principles in the Netherlands. The first product of this endeavor is a report on six case studies conducted by Melanie Imming. The interviewed institutions range from the support services of various universities to research institutions and the national health care institute.
The purpose of this report is to build and share expertise on the implementation of FAIR data policy in the Netherlands. The six use cases included in this report describe developments in FAIR data, and different approaches taken, within different domains. For SURF, it is important to gain a better picture of the best way to support researchers who want to make their data FAIR. – Melanie Imming. (2018, April 23). FAIR Data Advanced Use Cases: from principles to practice in the Netherlands (Version Final). Zenodo. http://doi.org/10.5281/zenodo.1250535
On 22nd May 2018 the report was officially launched, accompanied by a lovely workshop in the SURF venue in Utrecht.
This week, we are presenting at the International Digital Curation Conference 2018 in Barcelona.
This presentation can be downloaded from Zenodo.
The pre-print version of the practice paper accepted for the conference is available on OSF Preprints.
Title: From Passive to Active, From Generic to Focused: How Can an Institutional Data Archive Remain Relevant in a Rapidly Evolving Landscape?
Authors: Maria J. Cruz, Jasmin K. Böhmer, Egbert Gramsbergen, Marta Teperek, Madeleine de Smaele, Alastair Dunning
Abstract: Founded in 2008 as an initiative of the libraries of three of the four technical universities in the Netherlands, the 4TU.Centre for Research Data (4TU.Research Data) has, since 2010, provided a fully operational, cross-institutional, long-term archive that stores data from all subjects in applied sciences and engineering. Presently, over 90% of the data in the archive is geoscientific data coded in netCDF (Network Common Data Form) – a data format and data model that, although generic, is mostly used in climate, ocean and atmospheric sciences. In this practice paper, we explore the question of how 4TU.Research Data can stay relevant and forward-looking in a rapidly evolving research data management landscape. In particular, we describe the motivation behind this question and how we propose to address it.
Slides for presentation including active links at Open Science Days 2017 in Berlin hosted by Max Planck Digital Library (MPDL) on 17th October 2017.
On a rainy Thursday a couple of weeks ago, 14th September 2017, the national Platform for eScience/Data Research (ePLAN) invited participants to exchange the latest news about FAIR data at the eScience Centre in the Amsterdam Science Park. Close to 30 people from different Dutch universities, research support services, research institutions, and ventures followed the workshop appeal. Thus the recaps of Wilco Hazelger (ePlan), Barend Mons (GoFAIR), Peter Doorn (DANS) and Gareth O’Neill (Eurodoc) reached the ears of a quite diverse group of attendees.
For me this event was a good chance to refresh my knowledge about current FAIR processes here in the Netherlands, and to receive some confirmations or contradictions of my interpretation of the FAIR data principles. After nearly half a year of absence from my own FAIR project at TU Delft Library, I hoped to get some inspiration out of the conversation with like-minded people on how to implement these principles in everyday research (support) life.
Before I briefly rehash the discussion and consensus of the break-out session, I want to share some brain teasers I noted down from the key speakers’ insights:
Aspects of FAIRness by Barend Mons
∴ Much to my relief Barend confirmed that FAIR is not a binary measure but rather a spectrum.
∴ TCP / IPv4 protocols are the current bottlenecks of the hourglass design of the soon-to-be ‘internet of FAIR’.
∴ Interoperability can never exist without a purpose. It should therefore be assessed that way: interoperability with what, not just interoperability on its own.
∴ The origin of FAIR emphasizes the machine action-ability of (meta)data.
∴ When talking about a FAIRness evaluation, declare the assessed matter as “re-useless” rather than calling it “unfair”.
∴ The goal of FAIR is the R. Technically, however, the I is the key element of FAIR. “I without F+A makes no sense for R”.
∴ FAIR data can be achieved with FAIR metadata and closed data files.
∴ New perspective on data sharing: establish data visitation instead of data sharing, i.e. your workflow visits the data instead of you receiving data files that were sent to you. To me that is a thrilling shift of perspective: forget sending data files directly via whatever channel; rather, establish a platform where interested people are redirected to the landing page of the dataset. Don’t get me wrong, of course this is what we are doing with our archive already. But I also still hear researchers saying that they share their data via email on request.
∴ A new GoFAIR website is currently under construction and will be launched by the end of the year, with a complete makeover and more functionality, as the future European platform for FAIR work. I am intrigued and will keep an eye out for its launch!
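Two of the points above – that FAIR stresses the machine-actionability of (meta)data, and that FAIR data can be achieved with FAIR metadata over closed data files – can be made concrete with a small sketch. The record below is a hypothetical illustration only: the field names are loosely modelled on DataCite-style metadata, and the check function is a toy, not any real validator.

```python
import json

# A minimal, hypothetical metadata record: the dataset is findable and
# citable at the metadata level even though the data files stay closed.
# Field names are illustrative, not a real schema.
record = {
    "identifier": "10.5281/zenodo.0000000",   # placeholder DOI
    "title": "Interview transcripts (closed access)",
    "creators": ["A. Researcher"],
    "publicationYear": 2017,
    "accessRights": "closed",                  # data files are not public
    "contact": "data-steward@example.org",     # route for access requests
    "subjects": ["design research", "qualitative data"],
}

def is_machine_actionable(rec):
    """Toy check: can a machine find, cite and request this dataset?"""
    required = {"identifier", "title", "creators", "accessRights", "contact"}
    return required.issubset(rec)

# What a repository landing page would expose to harvesters:
metadata_json = json.dumps(record, indent=2)
```

The point of the sketch is that every element a machine needs – a persistent identifier, a rights statement, a contact route – lives in the openly harvestable metadata, while the files behind it can remain closed.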
The I in FAIR by Peter Doorn
∴ DANS has 2.6 million pictures as top data category (65% of the archive). Therefore, interoperability of images needs to be tackled. Unfortunately, interoperability of images is hard to determine.
∴ Side note: 4TU.Centre for Research Data has nearly 6,500 datasets in netCDF format as its top data category (>90% of the archive). Perhaps this data format has more advantages in terms of interoperability? Want to know more about our current work with netCDF? Leave a bookmark for the category on this blog.
∴ Barend’s remark about the image interoperability threshold mentioned by Peter: the rich metadata of images makes the interoperability of pictures possible.
∴ The self-assessment tool for FAIR data created by DANS is also connected to the FAIR metrics group.
The Open Science Survey 2017 by Gareth O’Neill
∴ My conclusion from the open science survey by Gareth: the need to improve awareness about open science /access /data /education etc. and the already existing support services will very likely never decrease.
∴ But who is responsible for increasing the awareness? The university board? The faculties? The research support staff from e.g. the library?
∴ ‘Research visibility’ seems to be the main driver for complying with open science.
∴ The final report and survey analysis will be published in the next 3-6 months. Keep an eye on the Eurodoc website.
A few bits from the group session
∴ What’s the incentive to re-use existing data (where the origin might be untrustworthy) vs. regenerating the data oneself?
∴ Is metadata sufficient for reusability or is there a need for linked data?
∴ Incentives for researchers to create FAIR data need to be improved as soon as possible.
∴ Better distinctions between “data stewards”, “data managers” and “data scientists”; and improved appreciation for researchers doing these jobs.
∴ Biggest nut to crack: what does FAIR data mean in terms of data quality? The dataset (metadata, documentation, and data files) could be perfectly FAIR, but the actual content of the data files could be rubbish. My thoughts on this: first, establish a certified and trusted data archive / repository that enables FAIR datasets; secondly, gather a critical mass of FAIR research data; lastly, enable peer review of these datasets to get an actual evaluation of the data quality.
Current FAIR work in the Netherlands, September 2017
∴ The Dutch Tech Centre for Life Science (DTLS) in Utrecht provides a lot of valuable information about FAIR in the life science context. In more detail, DTLS is also focussing on the semantic side of the FAIR data principles and how to implement them.
∴ Data Archiving and Network Services (DANS) in Den Haag are covering the work on these principles predominantly for the humanities and social sciences. One of their practical approaches is a FAIR data assessment tool with subsequent rating of each FAIR facet.
∴ TU Delft Library and 4TU.Centre for Research Data are concentrating on FAIR data guidance for technological data. A first practical approach was an evaluation of Dutch data repositories and data archives to determine their maturity with respect to the FAIR data demands of funding bodies. The subsequent work is investigating researchers’ sentiment towards the FAIR data principles in relation to their research subject.
∴ In reaction to the individual developments of research support and research institutions regarding the FAIR data principles, the European Commission set up an Expert Group on FAIR data to review these developments and gather feedback. The report produced by this Expert Group will be delivered in the first quarter of 2018.
∴ The Conference of European Schools for advanced engineering education and research (CESAER) features a task force for Open Science, including a research data management group that also explores FAIR data.
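The DANS assessment tool mentioned above rates each FAIR facet separately rather than declaring a dataset “FAIR” or “unFAIR” outright, which fits Barend’s point that FAIRness is a spectrum. The sketch below is a toy illustration of that idea only – the scores, star scale and averaging are invented for this post and are not the actual DANS or FAIR metrics algorithm.

```python
# Toy facet-based FAIR self-assessment: each facet gets a score in [0, 1],
# and the profile is summarized per facet instead of collapsed into a
# single FAIR / unFAIR verdict. All numbers here are invented examples.
FACETS = ("findable", "accessible", "interoperable", "reusable")

def assess(scores):
    """Return per-facet star ratings (0-5) plus an overall average rating."""
    stars = {facet: round(scores[facet] * 5) for facet in FACETS}
    overall = sum(scores[facet] for facet in FACETS) / len(FACETS)
    return stars, round(overall * 5, 1)

# A hypothetical dataset: easy to find, but weak on interoperability.
example = {"findable": 0.9, "accessible": 0.6,
           "interoperable": 0.4, "reusable": 0.5}
stars, overall = assess(example)
```

A profile like this makes the discussion above actionable: a low “interoperable” star points to a concrete improvement (better formats, richer metadata) rather than a blanket judgement of the whole dataset.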