Event report: Towards cultural change in data management – data stewardship in practice

This blog post is written by Martin Donnelly from the University of Edinburgh and was originally published on the Software Sustainability Institute blog.



Late last month, I took a day trip to the Netherlands to attend an event at TU Delft entitled “Towards cultural change in data management – data stewardship in practice”. My Software Sustainability Institute Fellowship application “pitch” last year had been based around building bridges and sharing strategies and lessons between advocacy approaches for data and software management, and encouraging more holistic approaches to managing (and simply thinking about) research outputs in general. When I signed up for the event I expected it to focus exclusively on research data, but upon arrival at the venue (after a distressingly early start, and a power-walk from the train station along the canal) I was pleasantly surprised to find that one of the post-lunch breakout sessions was on the topic of software reproducibility, so I quickly signed up for that one.

I made it into the main auditorium just in time to hear TU Delft’s Head of Research Data Services, Alastair Dunning, welcome us to the event. Alastair is a well-known face in the UK, hailing originally from Scotland and having worked at Jisc prior to his move across the North Sea. He noted the difference between managed and Open research data, a distinction that translates to research software too, and highlighted the risk of geographic imbalance between countries which are able to leverage openness to their advantage while simultaneously coping with the costs involved – we should not assume that our northern European privilege is mirrored all around the globe.

Danny Kingsley during her keynote presentation

The first keynote came from Danny Kingsley, Deputy Director of Scholarly Communication and Research Services at the University of Cambridge, whom I also know from a Research Data Management Forum event I organised last year in London. Danny’s theme was the role of research data management in demonstrating academic integrity, quality and credibility in an echo-chamber/social media world where deep, scholarly expertise itself is becoming (largely baselessly) distrusted. As more and more research depends upon software-driven processing, what’s good for data is just as important for code when it comes to being able to reproduce or replicate research conclusions, an area currently in crisis according to at least one high-profile survey. One of Danny’s proposed solutions to this problem is to distribute and reward dissemination across the whole research lifecycle, attaching credit and recognition/respect not only to traditional publications, but also to datasets, code and other types of outputs.

Questions from the audience

After a much-appreciated coffee break, Marta Teperek introduced TU Delft’s Vision for data stewardship, which, again, has repercussions and relevance beyond just data. The broad theme of “Openness”, for example, is one of the four major principles in the current TU Delft strategic plan, indicating the degree of institutional support it has as an underpinning philosophy. Marta was keen to emphasise that the cohort of data stewards which Delft have recently hired are intended to be consultants, not police! Their aim is to shift scholarly culture, not to check or enforce compliance, and the effectiveness of their approach is being measured by regular surveys. It will be interesting to see how they have got on in a year or two years’ time: already they are looking to expand from one data steward per faculty to one per department.

There followed a number of case studies from the Delft data stewards themselves. My main takeaways from these were the importance of mixing top-down and bottom-up approaches (culture change has to be driven from the grassroots, but via initiatives funded by the budget holders at the top), and the importance of driving up engagement and making people care about these issues.

Data Stewards answering questions from the audience

After lunch we heard from a couple of other European universities. From Martine Pronk, we learned that Utrecht University stripes its research support across multiple units and services, including the library and the academic departments themselves, in order to address institutional, departmental, and operational needs and priorities. In common with the majority of UK universities, Utrecht’s library is the main driving and coordinating force, with specific responsibility for research data management being part of the Research IT programme. From Stockholm University’s Joakim Philipson we heard about the Swedish context, which again seemed similar to the UK’s development path and indeed my own home institution’s. Sweden now has a national data services consortium (the SND), analogous to the DCC in the UK, and Stockholm, like Edinburgh, was the first university in its country to have a dedicated RDM policy.

We then moved into our breakout groups, in my case the one titled “Software reproducibility – how to put it into practice?”, which had a strange gender distribution with the coordinators all female, but the other participants all male. One of the coordinators noted that this reminded her of being an Engineering undergraduate again. We began by exploring our own roles and levels of experience/understanding of research software. The group comprised a mixture of researchers, software engineers, data stewards and ‘other’ (I fell into this last category), and in terms of hands-on experience with research software roughly two-thirds of participants were actively developing software, and another third used it. Participants came from a broad range of research backgrounds, as well as a smaller number of research support people such as myself. We then voted on how serious we felt the aforementioned reproducibility crisis actually was, with a two-thirds/one-third split between “crisis” and “what-crisis?” We explored the types of issues that come to mind when we think about software preservation, with the most popular responses being terms such as “open source”, “GitHub” and “workflows”. We then moved on to the main business of the group, which was to consider a recent article by Hut, van de Giesen and Drost. In a nutshell, this says that archiving code and data is not sufficient to enable reproducibility, therefore collaboration with dedicated Research Software Engineers (RSEs) should be encouraged and facilitated. We broke into smaller groups to discuss this from our various standpoints, and presented back in the room. The various notes and pitches are more detailed than this blog post requires, but those interested can check out the collaboratively-authored Google Doc to see what we came up with. The breakout session will also be written up as a blog post and an IEEE proposal, so keep an eye out for that.

Workshop on Software Reproducibility

After returning to the main auditorium for reports from each of the groups, including an interesting-looking one from my friend and colleague Marjan Grootveld on “Why Is This A Good Data Management Plan?”, the afternoon concluded with two more keynote presentations. First up, Kim Huijpen from VSNU (the Association of Universities in the Netherlands) spoke about “Giving scientists more of the recognition they deserve”, followed by Ingeborg Verheul of LCRDM (the Dutch national coordination point for research data management), whose presentation was titled “Data Stewardship? Meet your peers!” Both of these national viewpoints were very interesting from my current perspective as a member of a nationally-oriented organisation. From my coming perspective as manager of an institutional support service – I’m in the process of changing roles at the moment – Kim’s emphasis on Team Science struck a chord, and relates to what we’re always saying about research data: it’s a hybrid activity, and takes a village to raise a child, etc. Ingeborg spoke about the dynamics involved between institutional and national level initiatives, and emphasised the importance of feeling like part of a community network, with resources and support which can be drawn upon as needed.

Closing the event, TU Delft Library Director Wilma van Wezenbeek underlined the necessity of good data management in enabling reproducible research, just as the breakout group emphasised the necessity of software preservation, in effect confirming a view of mine that has been developing recently: that boundaries between managing data and managing software (or other types of research output) are often artificially created, and not always helpful. We need to enable and support more holistic approaches to this, acting in sympathy and harmony with actual research practices. (We also need to put our money where our mouth is, and fund it!)

After all that there was just enough time for a quick beer in downtown Delft before catching the train and plane back to Edinburgh. Many thanks to TU Delft for hosting a most enjoyable and interesting event, and to the Software Sustainability Institute whose support covered the costs of my attendance.

Several resources from the event are now available:

Launch of the ‘FAIR Data Advanced Use Cases’ report by SURF

SURF – as the national collaborative ICT organisation for the Dutch education and research environment – has joined the effort to support the implementation and application of the FAIR data principles in the Netherlands. The first product of this endeavour is a report on six case studies conducted by Melanie Imming. The interviewed institutions range from support services at various universities, through research institutions, to the national health care institute.

The purpose of this report is to build and share expertise on the implementation of FAIR data policy in the Netherlands. The six use cases included in this report describe developments in FAIR data, and different approaches taken, within different domains. For SURF, it is important to gain a better picture of the best way to support researchers who want to make their data FAIR.  – Melanie Imming. (2018, April 23). FAIR Data Advanced Use Cases: from principles to practice in the Netherlands (Version Final). Zenodo. http://doi.org/10.5281/zenodo.1250535

On 22nd May 2018 the report was officially launched, accompanied by a lovely workshop in the SURF venue in Utrecht.

Finding Dynamic Statistics about 4TU.ResearchData

Search queries in the 4TU.ResearchData browser can be used to uncover statistics about the archive.

Some examples

How many archived datasets come from TU Delft

Number of datasets from U Twente

Number of datasets from TU Eindhoven

Number of datasets from Wageningen University 

Number of datasets deposited in a particular year (for example, how many datasets were deposited in 2016, 2017 or 2018?)

Why is this a good Data Management Plan?

This blog post reports from a workshop session led by Marjan Grootveld and Ellen Leenarts from DANS. The workshop was part of a larger event “Towards cultural change in data management – data stewardship in practice” organised by TU Delft Library on 24th of May 2018.

This blog post was written by Marjan Grootveld from DANS and was previously published on the OpenAIRE blog.


It’s not just Colonel Hannibal Smith who loves it when a plan comes together. Don’t we all? On a more serious note, this also holds for Data Management Plans or DMPs. In a DMP a researcher or research team describes what data goes into a project (reuse) and comes out of it (potential reuse), How the team takes care of the data, and Who is allowed to do What with the data When.

Just like a project plan a DMP undergoes a reviewing process. Often, however, researchers share their draft version and questions with research support staff and data stewards (see the results of this survey by OpenAIRE and the FAIR Data Expert Group). About twenty data stewards shared their review and pre-view experiences in a lively session at the Technical University Delft on May 24th. During the day the organisers and speakers highlighted various aspects of data stewardship with a welcome focus on practice situations, especially in the break-out sessions. (When the presentations are available we will add a link to this blog post.)

In the session called “Why is this a good Data Management Plan?” Marjan Grootveld (DANS, OpenAIRE) and Ellen Leenarts (DANS, EOSC-hub) presented text samples taken from DMPs. By raising their hands – or not! – and through subsequent discussion, the participants gave their view on the quality of the sample DMP texts. For instance, the majority gave a thumbs-up for “A brief description of each dataset is provided in table 2, including the data source, file formats and estimated volume to plan for storage and sharing”. In contrast, the quote “Both the collected and the generated data, anonymised or fictional, are not envisioned to be made openly accessible.” drew a good laugh and the thumbs went down. Similarly, the information that the length of time for which the data will remain re-usable “may vary for the type of data and [is] difficult to specify at this stage of the project” was found not acceptable; the plan should at least explain why it is difficult, and how and when the project team nevertheless will provide a specific answer. And is it really more difficult than for other projects, whose DMPs do provide this information?

Although it can be hard to be specific in the first version of a DMP, it’s essential to demonstrate that you know what Data Management is about, and that you will deliver FAIR and maximally Open data. Does the DMP, for instance, tell what kind of metadata and documentation will be shared to provide the necessary context for others to interpret the data correctly? Does it distinguish between storing the data during the project and sustainably archiving them afterwards? (Yes, we had a sample text neatly describing the file formats during the data processing stage versus the file formats for sharing and preservation.)

There was consensus in the group on the quality of most of the quotes. Where opinions differed, this had mainly to do with the fact that the quotes were brief and therefore open to more lenient or more picky interpretation. In other cases, a sample text had both positive and negative aspects. For instance, “The source code will be released under an open source licensing scheme, whenever IPR of the partners is not infringed.” was found rather hedging (“whenever”) and unspecific (which licensing scheme?), but the plan to also make source code available is good; too often this seems to be forgotten when the notion of “data” is understood in a limited way.

The session participants agreed that a plan with many phrases like “where suitable/ where appropriate/ should/ possibly” is too vague and doesn’t inspire much trust. On the other hand, information on who is responsible for particular data management activities is valuable, and so is planning like “The work package leaders will evaluate and update the DMP at least in months 12, 24 and 36”. Reviewers prefer explicit information and commitment to good intentions – which may be something to keep in mind for your “Open A-Team”.
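
For anyone who wants a quick first pass over a draft DMP before a human review, the point about vague phrasing can be sketched in a few lines of Python. This is only an illustration, not a tool used in the session: the function names `count_hedges` and `hedge_density` are made up for this example, and the phrase list contains just the four phrases quoted above, so it is an assumption that these are the right ones to look for in any given plan.

```python
import re

# The four hedging phrases quoted by the session participants.
# Assumption: this list is illustrative, not an agreed standard.
HEDGES = ["where suitable", "where appropriate", "should", "possibly"]


def count_hedges(text, phrases=HEDGES):
    """Return a dict mapping each phrase to how often it occurs in text.

    Matching is case-insensitive and restricted to whole words, so
    "shoulder" does not count as "should".
    """
    lowered = text.lower()
    counts = {}
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, lowered))
    return counts


def hedge_density(text, phrases=HEDGES):
    """Return hedging phrases per word of text (0.0 for empty text)."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return sum(count_hedges(text, phrases).values()) / len(words)


if __name__ == "__main__":
    # Hypothetical DMP sentence, written for this example only.
    draft = "Data will be shared where appropriate. Metadata should possibly be added."
    print(count_hedges(draft))
    print(round(hedge_density(draft), 2))
```

A high count does not make a plan bad by itself, and a low count does not make it good; the numbers only point a reviewer at sentences that may need a concrete commitment instead.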