Presentation given at TU Delft International Privacy Day, 28th January 2020. You can also download the PowerPoint slides here – Making Research Easier_ The Personal Research Data Workflow, Jan 2020
Presentation given by Alastair Dunning as part of OECD Global Science Forum: Digital Skills for Data Intensive Science Workshop, Cologne, 28th October 2019.
Presentation given in TU Delft Library, October 2019
On 27 and 28 February 2019, I attended the NSF FAIR Hackathon Workshop for Mathematics and the Physical Sciences research communities held in Alexandria, Virginia, USA. I travelled to the event at the invitation of the TU Delft Data Stewards and with the generous support of the Hackathon organisers, Natalie Meyers and Mike Hildreth from the University of Notre Dame.
Participants were encouraged to register and assemble as duos of researchers and/or students along with a data scientist and/or research data librarian. I was invited, as a data librarian with a research background in the physical sciences, to form a duo with Joseph Weston, a theoretical physicist by background and a scientific software developer at TU Delft, who is also one of the TU Delft Data Champions.
I presented about the Hackathon at the last TU Delft Data Champions meeting. The presentation is available via Zenodo. All the presentations and materials from the FAIR Hackathon are also publicly available. The FAIR data principles are defined and explained here. This blog post aims to offer some of my views and reflections on the workshop, as an addition to the presentation I gave at the Data Champions meeting on 21 May 2019.
The grand vision of FAIR
The workshop’s keynote presentation, given by George Strawn, was one the highlights of the event for me. His talk set clearly and authoritatively what is the vision behind FAIR and the challenges ahead. Strawn’s words still ring in my head: “FAIR data may bring a revolution on the same magnitude as the science revolution of the 17th century, by enabling reuse of all science outputs – not just publications.” Drawing parallels between the development of the internet and FAIR data, Strawn explained: “The internet solved the interoperability of heterogeneous networks problem. FAIR data’s aspiration is to solve the interoperability of heterogeneous data problem.” One computer (“the network is the computer”) was the result of the internet, one dataset will be FAIR’s achievement. FAIR data will be a core infrastructure as much as the internet is today.
“The internet solved the interoperability of heterogeneous networks problem. FAIR data’s aspiration is to solve the interoperability of heterogeneous data problem.” — George Strawn
Strawn warned that it isn’t going to be easy. The challenge of FAIR data is ten times harder to solve than that of the internet, intellectually but also with fewer resources. Strawn has strong credentials and track record in this matter. He was part of the team that transitioned the experimental ARPAnet (the precursor to today’s internet) into the global internet and he is part of the global efforts trying to bring about an Internet of FAIR Data and Services. In his view, “scientific revolution will come because of FAIR data, but likely not in a couple of years but in a couple decades.”
Researchers do not know about FAIR
Strawn referred mainly to technical and political challenges in his presentation. One of the challenges I encounter in my daily job as a research data community manager is not technical in nature but rather cultural and sociological: how to get researchers engaged with FAIR data and how to make them enthusiastic to join the road ahead? Many researchers are not aware of the FAIR principles, and those who are, do not always understand how, or are willing, to put the principles into practice. As reported in a recent news item in Nature Index, the 2018 State of Open Data report, published by Digital Science, found that just 15% of researchers were “familiar with FAIR principles”. Of the respondents to this survey who were familiar with FAIR, only about a third said that their data management practices were very compliant with the principles.
The workshop tried to address this particular challenge by bringing together researchers in the physical sciences, experts in data curation and data analysts, FAIR service providers and FAIR experts. About half of the participants were researchers, mainly in the areas of experimental high energy physics, chemistry, and materials science research, at different stages in their careers. Most were based in the US and funded by NSF.
These researchers were knowledgeable about data management and for the most part familiar with the FAIR principles. However, the answers to a questionnaire sent to all participants in preparation for the Hackathon, shows that even a very knowledgeable and interested group of participants, such as this one, struggled when answering detailed questions about the FAIR principles. For example, when asked specific questions about provenance metadata and ontologies and/or vocabularies, many respondents answered they didn’t know. As highlighted in the 2018 State of Open Data report, interoperability, and to a lesser extent re-usability, are the least understood of the the FAIR principles. Interoperability, in particular, is the one that causes most confusion.
There were many opportunities during the workshop to exchange ideas with the other participants and to learn from each other. There was much optimism and enthusiasm among the participants, but also some words of caution, especially from those who are trying to apply the FAIR principles in practice. The PubChem use case “Making Data Interoperable”, presented by Evan Bolton from the U.S. National Center for Biotechnology Information, was a case in point. It could be said, as noted by one of the participants, that the chemists “seem to really have their house in order” when it comes to metadata standards. Not all communities have such standards. However, when it comes to “teaching chemistry to computers” – or put in other words, to make it possible for datasets to be interrogated automatically, as intended by the FAIR principles – Bolton’s closing slide hit a more pessimistic note. “Annotating and FAIR-ifying scientific content can be difficult to navigate”, Bolton noted, and it can feel like chasing windmills. “Everything [is] a work in-progress” and “what you can do today may be different from tomorrow”.
What can individual researchers do?
If service providers, such as PubChem, are struggling, what are individual researchers to do? The best and most practical thing a researcher can do is to obtain a persistent identifier (e.g. a DOI) by uploading data to a trusted repository such as the 4TU.Centre for Research Data archive, hosted at TU Delft, or a more general archive such as Zenodo. This will make datasets at the very least Findable and Accessible. Zenodo conveniently lists on its website how it helps datasets comply with the FAIR principles. The 4TU.Centre for Research Data, and many other repositories, offer similar services when it comes to helping make data FAIR.
I am grateful to the University of Notre Dame for covering my travel costs to the MPS FAIR Hackathon. Special thanks to Natalie Meyers from the University of Notre Dame, and Marta Teperek, Yasemin Turkyilmaz-van der Velden and the TU Delft Data Stewards for making it possible for me to attend.
Maria Cruz is Community Manager Research Data Management at the VU Amsterdam.
Presentation given at TU Delft, 30th April 2019.
In the autumn of 2018 I took up the post of Data Steward in the Faculty of Industrial Design Engineering (IDE). As I am not a designer myself (my academic background is in historical literature), a significant portion of my time is dedicated to understanding how research is conducted in the realm of design, in particular trying to compose an overview of the types of data collected & used by designers, as well as how current and upcoming ideas & tools for research data management might potentially benefit their activities. This is no mean feat, and at present I cannot lay claim to more than a superficial understanding of the inner workings of design research. Through day-to-day data steward activities – attending events, reading papers and, perhaps most revealing, conversations with individual researchers, to name but a few – the landscape of design research data gradually becomes more intelligible to me. Cobbling together a coherent picture from these disparate sources requires a modicum of dedicated thought, so it was my good fortune to have recently been invited to an event arranged by the Faculty of Health, Ethics & Society (HES) at Maastricht University to present my experiences with design data thus far. Here we discussed and compared research data practices, and my preparation for this discussion afforded me the opportunity to reflect a bit on what research data means in the field of design, how design methodology relates to other academic fields and what kinds of challenges and opportunities exist for handling data and making it more impactful within the discipline and beyond.
The HES workshop, organized early in February of this year, was a forum for the group to discuss how their work and the data they produce intersect with some of the issues currently being debated within academic communities. A specific goal was to evaluate some of the arguments originating in the (at times competing) discourses of Open Science and personal privacy. Topics of discussion included how one should make sociological and healthcare data FAIR, especially given that the materials collected in HES are often predominantly qualitative in nature: personal interviews, ethnographic field notes, etc. Questions surrounding these topics are broadly applicable to some qualitative types of data in design as well, e.g. the extent to which data should be shared, in what format and under what conditions. The slides from my talk are available here: https://doi.org/10.5281/zenodo.2592280, and this blog post is intended to give them some context.
Research Data in Design
Maintaining an overview of the various types and amounts of data produced, analyzed and re-used within the Faculty of Industrial Design Engineering is a core aspect of my work as a data steward, but it is an ongoing challenge due to the heterogeneity of data used by designers and the quantity of different projects simultaneously active. Some designers do market research involving i.a. surveys, others take sensor readings and yet others develop algorithms for improving the manufacturing process. Each of these, along with the many other efforts within IDE, merit their own suite of questions and concerns when it comes to openness and privacy. The more we understand data types and usage in a field, the better we can judge the impact of present and future actions germane to research data – open access initiatives, legislation (esp. the GDPR), shifts in policy or practice, etc. More importantly, we can predict how we might turn some of these to our advantage.
For instance, TU Delft recently instituted a policy that all PhD students will be required to deposit the data underlying their thesis. For new PhD students, this will simply be a part of the process, one step among the many novel activities they experience on the way to earning their PhD. The real challenge lies with members of my faculty, the experienced researchers and teachers, as well as myself, who will have to identify the value in applying this new policy to research data in their field. To do this we must ask ourselves a series of questions. In addition to the aforementioned ‘what kind of data do we have and use?’, we must determine what should be made public as well as to what degree. Underlying all of this is a more fundamental question is, of course: how does sharing this information improve the production of knowledge in design and the fields which it touches? Some of these queries have clear answers, but the majority require further discussion and reflection.
Data Sharing and Data Publishing
One common question I receive in various forms is why designers and design researchers should share their data more widely than they presently do. In many instances I find this returns to the aforementioned issue of diverse types of data. For some designers who have a clear definition of what their data is, why it is collected and how others can use the data, such as the DINED anthropometrics group, a conversation on what data to share and how can be fairly straightforward. But what are the actual benefits of sharing design notes or other types of context-bound qualitative data? In the data management community we have a set of commonly purveyed answers to this query, and I have been trying to see how they match up to existing practice in design.
The first is idealistic, that publishing data will further the field, improve science through increased transparency, accuracy and integrity. Reactions to this argument often take the form of a slow nod, a sign I take to be cautious optimism (one which I happen to share). This outcome is difficult to measure. I was once asked who would be interested in seeing the transcripts of x number of their interviews. A legitimate question, and one with an inscrutable answer – it is difficult to tell who will use your data if they do not know it exists in the first place. A corollary to this is that we ask people to weigh the requisite time investment in making materials publishable (sometimes substantial if working with qualitative and/or sensitive data) against this unpredictable benefit. I believe we need more evidence of the positive impact of making design data FAIR, whether this be figures of dataset citations (currently a desideratum) or anecdotal evidence of new contacts and collaborations resulting from data sharing. Essentially this means a few interested volunteers willing to learn the tools, put in some extra time and test the waters. Will sharing my sensor data attract the attention of a new commercial partner? Will my model be taken up and improved upon by the community using the product or service we design? These are certainly possibilities, but at present they remain a future less vivid.
For PhD students and early career researchers I frequently posit the possibility that publishing data, making their publications Open Access and other actions to make their work more transparent could yield direct career opportunities. This ties into efforts promoting expansion in the interpretation of research assessment such as DORA. In my current position, I feel that designers may be ahead of the curve when it comes to evaluating research impact. In addition to research papers published in journals boasting various impact factors, desirable results from design projects include engagement tools, reflections from projects, and prototypes to name only a few. The weighting of these outputs is unclear to me when it comes to, e.g. obtaining a research position, but I suspect there is room here for alloting credit to demonstrations of open working. This is certainly the case in some fields where lectureship advertisements include explicit language supporting Open Science. As far as I have been able to determine (in my extremely casual browsing of job postings) this is not yet an element of the narrative designers weave to present their work to potential employers nor one sought by employers themselves. However, data publications as part of CVs attached to grant applications may indeed have some cache, as funding agencies such as the NWO and ZonMw presently stress the importance of such activities in the pursuit of maximizing investment returns in the grants they award. Here is an opportunity to serve the interests of many.
Food for Thought
One of my takeaway messages from these debates is that there is a need for a community – in design, in many research areas – an opportunity to convene and discuss issues and test some of the options being afforded or demanded under the umbrella of Open Science. Some design research shares a number of data issues in common with social sciences – questions of consent, of data collection and access – while others are more aligned with mathematics or medicine. Furthermore I’d be interested to hear whether any RDA outputs have an application in design, as well as whether repositories for design materials would be desirable and how they should be arranged. From my admittedly biased position, I believe there is much that designers stand to gain from picking up versioning tools or sharing data more widely, and I think designers’ methods and the iterative nature of design thinking, as I understand them, could in turn only benefit Open Science communities.