Monash University campus
Marta Teperek, the Data Stewardship Coordinator at TU Delft Library, is now in Australia: doing a short study trip to exchange practice with Australian colleagues, and subsequently taking part in the RDA Plenary 15 Meeting and associated events in Melbourne.
She aims to post short updates from her trip.
On the second day of my study trip, I met with David Groenewegen and his colleagues at Monash University: colleagues from the central Monash University Library (David, Neil Dickson, Beth Pearson, Patrick Splawa-Neyman and Andrew Harrison) and from the Monash e-Research Centre (Anitha Kannan, Stephen Dart and Nicholas McPhee).
In summary: I wish I had more time to spend with colleagues from Monash University – truly impressive work, despite a very lean team and a massive university (spread around several campuses… and continents) – there is really a lot to learn from their approach to research data.
While I am sure that my poor head wasn’t able to contain everything, below are some key highlights and the points which I think are most important for us at TU Delft (and for any other institution which aspires to have world-leading data management services).
Data Fluency and the importance of perseverance
At Monash University the main problem when it comes to providing digital skills training is the size of the institution: not only ~80,000 students, ~8,000 staff and ~5,000 doctoral students (source: Wikipedia), but also multiple campuses to take care of. To address the demand for training, the Library is now the hub of the Data Fluency initiative. The aim of this initiative is to empower researchers with the skills to transform their own research data and workflows as they see fit, and to make them more effective.
As with Melbourne’s Research Computing Services, the Data Fluency initiative has community engagement at its core. Courses are mostly based on Carpentry-style lessons and are taught by academics and professional staff. In addition, postgraduate students are often hired as professional instructors (this means a workforce of a dozen people paid by the hour for their work). This allows the Library to run courses every week. Last year they trained an impressive 1,200 people (this year’s target is even higher, at 1,500!). The instructors are part of the Monash University Community of Practice – people who attended previous workshops and want to continue learning through sharing, exchange, and regular interactions. In addition to frequent workshops, the community is sustained by afternoon data seminars (monthly) and Friday drop-in sessions (weekly).
Perseverance can sometimes be very important
At TU Delft we also tried running our Coding Lunch and Data Crunch sessions (monthly) as drop-in opportunities for any code and data questions. However, not many researchers attended these sessions and as a result, they are now on hold. Interestingly, colleagues from Monash explained to me that in their case perseverance was key. They also experienced sessions where no researchers turned up. Instead of stopping the sessions, they experimented with locations. It turned out that rooms with glass windows (so that passersby can see that the inside is not scary), proximity to cafeterias, and locations which don’t require researchers to make a lot of effort to get to, worked best.
Very interestingly, the successful approach to training seems to have brought an immense reputational gain for the Library. Other offices and departments (research platforms, graduate offices, others running specialist training) now partner with the Library, as the central hub where valuable, quality training on digital skills is made available to researchers across the campus.
Show me your data
We also discussed the need for engagement with schools and faculties. Patrick Splawa-Neyman briefly introduced his approach to raising data management awareness at the School of Public Health and Preventive Medicine. He decided to interview individual researchers and ask them about their research data. He had 25 questions and he interviewed 25 people, which not only helped him to map out areas where data management practices needed improvement but also raised awareness about the benefits of good data management within the school. As a result, Patrick was able to introduce REDCap as a tool for research data management. The tool collects basic information about the authors and about the data, and contains a built-in GDPR checklist. The tool and the workflow were endorsed by the School and have been used so successfully that a separate instance of the tool (with specific customisations) has now been introduced specifically for PhD students.
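For teams scripting against REDCap, records like these can be pulled out programmatically via REDCap’s ‘Export Records’ API action. Here is a minimal sketch in Python – the API URL, token and field names are all placeholders, not Monash’s actual setup:

```python
import json
import urllib.parse
import urllib.request

def build_redcap_export_body(token, fields):
    """Form-encoded body for REDCap's 'Export Records' API action."""
    return urllib.parse.urlencode({
        "token": token,        # project-specific API token
        "content": "record",   # export records
        "format": "json",
        "type": "flat",        # one row per record
        "fields": ",".join(fields),
    })

def export_records(api_url, token, fields):
    """POST the request to the project's API endpoint and decode the records."""
    body = build_redcap_export_body(token, fields).encode()
    with urllib.request.urlopen(urllib.request.Request(api_url, data=body)) as resp:
        return json.load(resp)

# Hypothetical usage (URL, token and field names are made up):
# records = export_records("https://redcap.example.edu/api/", "MY_TOKEN",
#                          ["record_id", "data_owner", "gdpr_checklist_complete"])
```

The nice property of this kind of access is that the data management information researchers enter once can be re-used in reporting, without asking anyone to fill in another form.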
BRIDGES for online research presence
4TU.Research Data will soon be moving to figshare for its repository platform software. Monash was one of the first universities worldwide to use an institutional instance of figshare. Thus, it was very timely that Beth, Andrew and Neil shared with me some useful thoughts about running BRIDGES (which is the name of Monash’s figshare instance) as a repository solution.
Community again: let researchers own it
When rolling out BRIDGES, Monash decided to give researchers full responsibility for the content they upload into the repository. Nobody checks what researchers upload into the repository. Everything gets a DOI and goes live straight away. Researchers appreciate this solution: there are no delays when it comes to publishing research outputs (so it is possible to get a DOI on a Friday night when preparing a last-minute grant proposal). In addition, workflows are very simple and intuitive, and researchers feel responsible for their own research outputs.
In addition, researchers have the freedom to decide which research outputs they perceive as valuable and worth sharing. BRIDGES is thus not framed as a data repository: it hosts collections of various research outputs. This not only resonates better with researchers from arts and humanities (to whom the word ‘data’ is sometimes a bit nebulous) but gives researchers the freedom to share all valuable components of their research.
This approach has been particularly valued by PhD students, who thanks to BRIDGES gain the possibility to promote their own work and establish their own academic profile in the way they want. Before they start publishing serious academic papers, they can already promote their conference presentations, posters, blog articles, reports and theses.
Hands off = opportunity to do other things
Simplifying the workflow and giving researchers the freedom to use BRIDGES as they want not only empowered the research community and encouraged them to explore BRIDGES, but also freed up a lot of time for many colleagues at the Library. The simplicity of the workflow and the intuitive upload process meant that suddenly there was no need for training and PowerPoints. Also, researchers no longer needed to be convinced about the benefits of using BRIDGES – they simply saw them.
In addition, Monash’s staff have worked very closely with figshare’s team to maximise the use of APIs and ensure integration of various workflows. One of the biggest successes in this field was the integration between the University’s system for digital submission of PhD theses and BRIDGES. This meant that theses could be automatically pushed to BRIDGES and that the cataloguing work was no longer necessary. The cataloguers’ team could instead join and strengthen the research data team, which needed to increase its capacity.
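I don’t know the details of Monash’s integration, but figshare’s public v2 API does allow items to be created with a simple authenticated POST, so a push of thesis metadata could look roughly like this sketch (the thesis record shape and the token are assumptions, and real use would also need the file-upload steps, omitted here):

```python
import json
import urllib.request

def thesis_to_article_payload(thesis):
    """Map a (hypothetical) thesis record to a figshare article payload."""
    return {
        "title": thesis["title"],
        "description": thesis["abstract"],
        "authors": [{"name": name} for name in thesis["authors"]],
        "defined_type": "thesis",   # one of figshare's item types
        "keywords": thesis.get("keywords", []),
    }

def create_article(payload, token):
    """Create a draft article via figshare's v2 API (requires a personal token)."""
    req = urllib.request.Request(
        "https://api.figshare.com/v2/account/articles",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"token {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)   # includes the new item's location URL
```

Once a mapping like this is in place, every approved thesis can flow into the repository without anyone re-typing its metadata – which is exactly where the cataloguing time was saved.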
Other useful notes
There were many other useful tips which colleagues from Monash shared with me:
- Colleagues from Monash were overall very satisfied with their relationship with figshare – they were very positive about figshare’s responsiveness, regular (fortnightly) releases, lack of downtime, clear roadmaps, and transparency about development goals
- APIs and integrations seem to be working very well:
  - figshare has now enabled data export to Pure through dedicated Pure import files (users are matched by their unique university IDs)
  - Colleagues from Monash are now considering options for importing user profile information from existing systems (such as Pure) into figshare
- The active data management space proved to be rather confusing and not very useful for active collaborations – the Library is now downplaying this functionality
MeRC and reverse DMPs
The Monash e-Research Centre (MeRC) has been launched to help researchers with 21st-century digital research challenges. The MeRC supports researchers working with big data, complex analytics workflows, projects requiring streamlined interfaces between instruments and data, and other research projects whose needs for data management and processing are out of scope for corporate IT support. Researchers who wish to benefit from the support and facilities offered by the MeRC need to make an application in which they make a case for their project: explain the rationale for requesting bespoke support and outline the potential impact of their work.
How to avoid orphan data?
Providing bespoke solutions for complex digital research projects (often involving very large datasets – Monash currently stores 16 PB of data) means that good data governance is essential. Colleagues at Monash learnt their lesson when decommissioning old storage and attempting to move data to a newer storage solution. It turned out that lots of data were ‘orphaned’: the researchers who produced them had left the university, and their undocumented data was left behind on the server. It proved very difficult to find someone ready to decide what to do with such orphaned assets (makes me think of the ‘bulk data’ situation at TU Delft 😉 – thankfully Nick agreed to share the write-up of their decommissioning workflow).
To tackle the problem of ‘orphan’ data, Monash is currently working on a solution which could ensure that information about data creators, data provenance, governance, legal and ethical aspects, as well as information about the datasets themselves could be recorded together with the data files. The goal is to make this process as painless for researchers as possible thanks to system integrations.
When a researcher comes to use a piece of equipment or a facility, they should be automatically recognised – the system should know what they are doing and why, and record appropriate metadata with this information, instead of asking researchers to endlessly re-enter their credentials or information about their project.
Monash colleagues refer to this vision as a ‘reverse DMP’ – a Data Management Plan which isn’t created for compliance purposes at the start of the project, but is continuously (and automatically) created by all the systems with which the researcher interacts (by recording metadata about all these interactions, data provenance, data storage, access etc.).
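As a thought experiment (this is my own sketch, not Monash’s actual implementation), the ‘reverse DMP’ idea could be modelled as an event log that systems append to, which is then folded into a DMP-like summary per dataset:

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    """One logged touch-point between a researcher and a system (all names illustrative)."""
    researcher_id: str
    system: str      # e.g. an instrument, a storage platform, an HPC cluster
    action: str      # e.g. "data_written", "data_accessed"
    dataset: str
    timestamp: str   # ISO 8601

def assemble_reverse_dmp(project_id, interactions):
    """Fold logged interactions into a DMP-like record, built up as the project runs."""
    dmp = {"project": project_id, "datasets": {}}
    for ev in interactions:
        entry = dmp["datasets"].setdefault(
            ev.dataset, {"provenance": [], "systems": set()})
        entry["provenance"].append(
            {"who": ev.researcher_id, "what": ev.action, "when": ev.timestamp})
        entry["systems"].add(ev.system)
    for entry in dmp["datasets"].values():   # sets -> sorted lists for serialisation
        entry["systems"] = sorted(entry["systems"])
    return dmp
```

The point of the sketch is that the researcher never fills in a form: every provenance row comes from a system they were using anyway, so the plan writes itself as the project runs.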
The value of data
Asking researchers to make a case for their projects when they apply for the MeRC’s support, together with the emphasis on better data governance (and the pain points of decommissioning the old storage system), brought people’s attention to the value of research data.
The introduction of the GDPR (Monash decided to comply with the GDPR because one of their campuses is based in Italy, because they have a lot of international collaborations, and also because they saw the GDPR as an opportunity to improve data management practices) was an important stick: the potential risks and liabilities associated with working with personal data meant more care and responsibility when processing them. The offer of artificial intelligence tools to interrogate data and reveal completely new qualities in them provided an important carrot – data became the crucial asset for the development of new tools and insights. The need to make a case for the use of resources and the need to make decisions about data custodianship increased the feeling of ownership of research data among the research community.
As Anitha explained, realising the value of data is a cultural change and a journey which takes time, but Monash clearly seems to be on the right path.
Other blog posts from my trip to Australia: