
Survey report on long-term impact of Carpentry workshops at TU Delft

Written by: Meta Keijzer-de Ruijter

Image ‘Survey’ by Nick Youngson, CC BY-SA 3.0, Alpha Stock Images

Introduction

Digitalisation offers many opportunities for researchers to boost their research: automation of repetitive tasks, collection of more (and more complex) data, increased computing capacity, re-use of data and code, plus new ways to collaborate with colleagues within and outside of the research group.

At TU Delft, Library Research Data Services and ICT/Innovation are working closely together to define a vision and strategy to improve the support for developing digital competencies amongst the research community. In this process, we are looking at workshops and activities that are already being organised to assess their impact and possible improvements. With that aim in mind, a survey was developed to be sent to former participants of Data and Software Carpentry workshops. Our survey was based on the standard questions of the long-term impact survey that The Carpentries use, but we added questions on the actual usage of the tools taught and additional learning needs on the topics that were addressed in the workshops.

The survey was sent to 315 former participants of Data and Software Carpentry workshops held by TU Delft between 2018 and the summer of 2021. We received 45 responses. In addition, the feedback from both instructors/helpers and participants over the years was considered in this report.

Survey Outcomes

Demographics and Scope

Most of the respondents were PhD candidates (37 out of 45), most of whom attended the workshops during their first or second year (21 out of 37). Respondents came from all faculties, with the largest numbers from Applied Sciences (8), Civil Engineering and Geosciences (6), Architecture and the Built Environment (6) and Mechanical, Maritime and Materials Engineering (6).

Most respondents provided feedback on the Software Carpentry workshops (30 of 45).

Main Takeaways from Carpentries

The top three takeaways from the carpentry workshops, according to the respondents, were:

  • Using programming languages including R, Python, or the Command Line to automate repetitive tasks (16)
  • Making code reusable (12)
  • Using scripts and queries to manage large data sets (10)

The workshop(s) helped them improve their efficiency and their data analysis and data management skills.
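The top takeaway, automating repetitive tasks with a programming language, can be illustrated with a minimal Python sketch. This is not workshop material, just an illustration; the file pattern and the per-file processing step are hypothetical stand-ins for whatever a researcher would otherwise do by hand:

```python
from pathlib import Path

def count_lines(text: str) -> int:
    """A stand-in for any per-file processing step (counting lines here)."""
    return len(text.splitlines())

def process_folder(folder: Path) -> dict[str, int]:
    """Apply the same step to every .csv file instead of opening each by hand."""
    return {p.name: count_lines(p.read_text()) for p in sorted(folder.glob("*.csv"))}
```

Once a task is written down like this, running it on ten or ten thousand files takes the same effort, which is exactly the efficiency gain respondents reported.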

It should be noted that eight respondents wrote that they are not using any of the tools. The main reason for that was that they use alternative tools that better match their practice or that are easier to use on an ad hoc basis.

Relevance of the content

Software Carpentry

Most of the respondents found programming with Python (partly or very) relevant. There was, however, considerable debate about the level of Python being taught. Although it was communicated that the level is for absolute beginners, a recurrent comment (also in the surveys held right after the Software Carpentry workshops) was that the topics addressed were too basic and the pace too slow for those who only wanted a refresher. They still attended because, in order to receive the full Graduate School credits for the course, everyone had to attend all parts of the Carpentry. In an online setting this seemed to result in less engagement in the breakout rooms.

Figure 1: Response to the question ‘How relevant was the content of the Software Carpentry for your research?’ (n=22).

Data Carpentry Genomics

In the Genomics Carpentry, the command line part was found most relevant by the participants, closely followed by data wrangling.

Figure 2: Response to the question ‘How relevant was the content of Data Carpentry Genomics for your research?’ (n=3).

Data Carpentry Social Sciences

In this Data Carpentry, the data analysis and visualisation with R was most valued. The relevance (and eventual use) of OpenRefine was questioned. None of the respondents (n=5) in the long-term survey reported using OpenRefine in their current work.

Figure 3: Response to the question ‘How relevant was the content of Data Carpentry Social Sciences for your research?’ (n=5).

Use of tools that are taught in the carpentries

In the survey, respondents were asked about their usage of the tools taught in the workshops: did they start using the tools, and to what extent, or did they stop using them or never start at all? For each tool they were also asked whether and how they would like to advance their skills.

Unix

The use of Unix was picked up by a good number of respondents; those who did not use Unix mentioned that it did not serve their purpose. Two respondents stated that they did not feel confident enough to use it. Additional materials for self-paced learning, or help with applying it in their work, might help them get started.

Python

It appeared that respondents who already used Python increased their usage from occasional to more frequent. Those who reported not using Python stated that they use an alternative tool.
Both frequent and occasional Python users would like to improve their skills, either by attending more advanced workshops, using self-paced materials or talking with peers. Examples of topics they are interested in include data analysis, (big) data management, packaging, Jupyter Notebooks, libraries (including scikit and numba), handling large databases and automating tasks.
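The kind of data analysis respondents want to grow into often starts with small, standard-library steps before reaching for pandas or scikit. As a hedged sketch (the column name and values below are hypothetical), summarising one numeric column of tabular data can look like this:

```python
import csv
from statistics import mean

def column_summary(rows: list[dict[str, str]], column: str) -> dict[str, float]:
    """Compute a minimal summary (count, mean, max) of one numeric column,
    skipping rows where the value is missing or empty."""
    values = [float(r[column]) for r in rows if r.get(column)]
    return {"n": len(values), "mean": mean(values), "max": max(values)}

# Typical usage with a CSV file read via the standard library:
# with open("measurements.csv", newline="") as f:
#     summary = column_summary(list(csv.DictReader(f)), "temperature")
```

From here, a natural next step is swapping the standard library for pandas once datasets grow, which matches the ‘handling large databases’ interest mentioned above.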

R

Only three people responded to the question on R usage; one of them started using R from that moment on, while the other two reported using a different tool for their data management. The respondent who used R said that additional training, learning materials and talking to peers would be appreciated to improve their skills.

Git

Amongst the respondents the use of Git increased. Those who didn’t use the tool (7 respondents) said either that it didn’t fit their purpose (4 respondents) or that they did not feel confident enough to start using it (3 respondents). This last group would like access to additional learning materials, guided practice or consultation on how to apply Git in their own work.
Most of the Git users would like to improve their skills in various ways. Topics mentioned include dealing with complex files, comparing different versions of code, collaborative use of Git and building more confidence in using Git.
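Comparing different versions of code is something Git itself handles with `git diff`, but as a purely illustrative sketch (not part of the workshop materials), the underlying idea of a line-based comparison can be shown with Python's standard `difflib` module:

```python
import difflib

# Two hypothetical versions of the same small function.
old = ["def area(r):", "    return 3.14 * r * r"]
new = ["def area(r):", "    import math", "    return math.pi * r ** 2"]

# unified_diff yields header lines ('--- v1', '+++ v2') followed by lines
# prefixed with '-', '+' or ' ' (context) -- the same presentation that
# 'git diff' uses when comparing two versions of a file.
diff = list(difflib.unified_diff(old, new, fromfile="v1", tofile="v2", lineterm=""))
```

Seeing that a diff is just a readable list of removed and added lines can take some of the mystery out of Git's output for learners still building confidence.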

OpenRefine

None of the respondents (n=5) used OpenRefine after the workshop. Three of them stated that the tool doesn’t fit their purpose, while two respondents would like to have additional materials or consultation on how to apply it in their work.

Spreadsheets

The use of spreadsheets remained about the same after the workshop, and the respondents did not feel the need to improve their skills.

Cloud Computing

None of the respondents (n=3) had been using this prior to the workshop. After the workshop two of them started using it and they would like to learn more. No specific topics were mentioned.
The respondent who did not start using cloud computing stated that it did not fit their purpose.

Additional training on research data management or software development

Respondents could tick multiple boxes in this question about additional training needs. The top five training topics were:

  • Basic programming – 14 votes
  • Modular code development – 13 votes
  • RDM workflows for specific data types – 13 votes
  • Software versioning, documentation and sharing – 11 votes
  • Software testing – 10 votes

Impact of attending the carpentry workshops

Most respondents agreed that attending the carpentries had an impact on their work. They gained confidence in working with data, made their analysis more reproducible, were motivated to seek more knowledge about the tools, advanced their career, improved their research productivity and improved their coding practices.
The only topic on which responses were widely spread between agreement and disagreement was the impact of the Carpentry workshops on the professional recognition of their work.

Recommendation of the Carpentry workshops

Most respondents recommended or would recommend the carpentry workshops to their colleagues.

Our Conclusions

From the survey we learned that the Carpentries fulfil an important role in introducing tools that help members of the research community carry out their work more efficiently, but additional means and support are necessary. Translating the Carpentry materials to daily practice can be challenging, causing some to quit or never start using the tools. In discussions with observers, helpers and instructors of the Software Carpentry workshop, we identified the need to link the Carpentry topics more clearly to the research workflow, in order to increase the understanding of why we teach these topics and tools.

And of course…

We (TU Delft) would like to thank all those who participated in the survey!

A Data Steward journey 

Author: Esther Plomp 

When I started as a Data Steward at the Faculty of Applied Sciences, I attended the Essentials 4 Data Support course to learn more about research data management support. I was therefore happy to accept Eirini Zormpa’s invitation to discuss my Data Steward journey with the participants of the Essentials 4 Data Support course. Together with Zafer Öztürk from the University of Twente, we shared our experiences during the data supporter panel on the 14th of April. This blog post is a summary of what I highlighted during the panel.

The Essentials 4 Data Support course is an introductory course about how to support researchers with the management and sharing of research data. The course materials are helpful to gain an overview of what research data management support entails. The course also provided an opportunity to link up with peers (such as Lena Karvovskaya) and meet experts (such as Petra Overveld).

The role of a Data Steward visualised by The Turing Way. A Data Steward can facilitate the exchange of data, identify gaps in services, provide insights in best practices and point researchers to existing tools that they can use. This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence.

In December 2018 I started as the Data Steward at the Faculty of Applied Sciences. In my first couple of months I had the privilege to be peer-mentored by Yasemin Türkyilmaz-van der Velden, who showed me the ropes of data management support. Initially, I had to get to know the position, the workings of the faculty, my new colleagues and the researchers I was now supporting. 

In this first year I worked together with Yasemin on our faculties’ Research Data Management Policies, based on the TU Delft Research Data Framework Policy. This was an arduous process, as we visited all departments of our faculties. The policy was discussed with many stakeholders, including PhD candidates. At the beginning of 2020, the Applied Sciences Policy on Research Data Management was officially released! Yasemin and I also worked together on the Electronic Lab Notebook pilot that took place at TU Delft, resulting in TU Delft licences for RSpace and eLABjournal.

In 2019 I followed a Software Carpentry workshop to learn basic programming skills, so I could better support researchers with software-related questions. I later took the train-the-trainer course and became a Carpentries instructor myself, which allows me to teach the basic programming lessons set up by the Carpentries community. With the pandemic we had to shift these workshops online, and I coordinated them for a year (2020–2021).

Over the years, I also increasingly supported researchers with Open Science questions. This is an aspect of the role that I very much enjoy and currently try to expand upon. My role differs somewhat from the other Data Stewards at TU Delft: we each have our own preferences and areas of expertise next to data support (such as software, ethics, or personal data). Another difference is my involvement in a side project focused on PhD duration. At TU Delft and at my faculty we try to reduce the amount of time that PhD candidates take to finish their PhD project. While the official duration for a Dutch PhD is four years, the majority of PhD candidates take much longer. This often means that they have to finish the project in their unpaid free time. As someone who has spent seven years on a PhD project I can say that finishing your PhD next to a full time job is no joke. 

As a Data Steward I’m also a connection point in the university network. This allows me to address researchers’ questions myself or to connect them with the expert that they need.

  • My position at the Faculty itself allows for close contact with researchers. Before the pandemic I regularly hopped between their offices to help them with any questions. At the Faculty I’m embedded in the support team where I work together with the Faculty Graduate School and the Information Coordinator. I’m in regular contact with project officers, managers and researchers from all levels at the faculty. 
  • As part of the Data Stewards team I meet the other Data Stewards once a week (virtually) and we communicate through Slack/Teams. 
  • I’m also in contact with colleagues from the Library and the Digital Competence Center, either through collaborative work or because they are the experts that can address questions from researchers. 
  • Sometimes I reach out to central experts from the Human Research Ethics Committee, the Privacy Team and ICT Security when needed. 

Next to my activities as a Data Steward at TU Delft, I’m also involved in several other initiatives revolving around data and open research.

Visualisation of mentoring, where you help each other in taking a step up the ladder. Image by Esther Plomp, created for an Open Life Science Programme blogpost on mentoring.

Over the years I very much enjoyed writing blogs like this one, summarising my experiences of conferences, activities and learnings. 

I very much enjoy the Data Steward role, for various reasons: 

  • I support researchers in making their research more transparent.
  • I work with amazing colleagues and collaborators.
  • I meet new people interested in similar topics.
  • I can continuously develop and learn new skills.
  • I have a lot of autonomy over my working activities and schedule.

A lot of this is made possible by a supportive manager, and many individuals that I learned from along the way. 

“Create the world you want, and fill it with the opportunities that matter to you.”

– Alicia Keys

My tips for people just starting in a data support role:

  • Accept that things can take more time than you originally anticipated. Starting in a new role will take some time to adjust and achieving cultural change in university processes will not happen overnight. 
  • The downside of being able to create your own opportunities is that there might be a lot of things that you want to do. Even if everything seems important or fun to do, it could mean that you will end up with too much on your plate. Sometimes it is good to say no to shiny opportunities. 
  • Whatever you do, I would recommend not taking the road alone: seek out others to collaborate with, or ask feedback from. Exchanging expertise and experience will not only be more efficient, it will also make the road more worthwhile to walk.

Publishing a Data Article

Author: Esther Plomp. Contributions from Vicky Hellon, Achintya Rao, Yanina Bellini Saibene, and Lora Armstrong. 

This blog post has been adapted from the Turing Way (The Turing Way Community, The Turing Way: A Handbook for Reproducible Data Science. 10.5281/zenodo.3233853) under a CC-BY 4.0 licence.

A Data Article (also known as a Data Paper/Note/Release, or Database article) is a publication that is focused on the description of a dataset. It uses the traditional journal article structure, but focuses on the data collection and methodological aspects, generally not on the interpretation or discussion of the results. Data articles are in line with the FAIR principles, especially since most publishers will encourage you to share the data through a data repository. The benefit of a Data Article is that your output will be peer reviewed, which is generally not the case for datasets that are archived in data repositories. It also facilitates recognition for datasets through research assessment procedures that are more traditionally focused on publication output. Publishing a data paper will therefore increase the visibility, credibility and usability of the data, as well as giving you credit as a data producer (The Turing Way Community 2022).

Options to publish a Data Article

Below you can find some journals that publish data articles. The costs information was collected in February 2022.

| Discipline | Publisher/Journal | Cost estimate | Deals for TU Delft |
| --- | --- | --- | --- |
| All | Experimental Results | £775 / €928 | 100% APC discount for TUD authors |
| All | Scientific Data | €1790 | No, but eligible for the TU Delft OA Fund |
| All | Data in Brief | USD 500 / €440 | 100% APC discount for TUD authors |
| All | China Scientific Data | RMB 3000 / €416 | No |
| All | Data Science Journal | £650 / €778 | No, but eligible for the TU Delft OA Fund |
| All | Data | CHF 1400 | 100% APC discount for TUD authors |
| All | GigaScience | €1089 | No, but eligible for the TU Delft OA Fund |
| All | Gigabyte | USD 350 / €308 | No, but eligible for the TU Delft OA Fund |
| All | F1000Research | USD 800 / €704 | No, but eligible for the TU Delft OA Fund |
| Archaeology | Journal of Open Archaeology Data | £100 / €120 | No, but eligible for the TU Delft OA Fund |
| Archaeology | Open Quaternary | £300 / €359 | No, but eligible for the TU Delft OA Fund |
| Chemistry | Journal of Cheminformatics | 0 | NA |
| Computer Science | Jaiio | variable | No |
| Earth Sciences | Geoscience Data Journal | €1450 | No, but eligible for the TU Delft OA Fund |
| Earth Sciences | Earth System Science Data | 0 | NA |
| Earth Sciences | Big Earth Data | €910 | No, but eligible for the TU Delft OA Fund |

For more journals that offer the Data Article format you can see the ‘Data Article’ section in The Turing Way.

Research Data Management Survey 2019: the results are here!

Data. We advise researchers on how to manage theirs, but we are not averse to gathering and sharing some of our own.

The problem

The Data Stewardship Project started over two years ago. Much has been done, but given that our activities are scattered over eight TU Delft faculties and cover numerous issues (data ownership, data storage, data management plans, personal data and the GDPR, programming training and support…), it is difficult to know how effective these activities are in supporting researchers in their daily data management practice.

The solution

A survey! Over the summer (May-June) of 2019 we ran a survey, which was advertised in all faculties. We based this survey on the one that was run in 2017, but, as the challenges evolved, so did research data support services and therefore the content of the survey. This year, 937 staff members involved in research answered our call (from PhD students to full professors, including lecturers and lab assistants).

The results

The results can be browsed here!

The results are presented as the percentage of respondents in each faculty. If you hover your mouse over a number, you will see a bit more information about who replied (the total number of answers for that faculty, and across all faculties).

If you are curious, you can find the results of the previous survey here!

We also provide here a direct comparison between the results of the first and the second survey for some of the questions (not available for all faculties, because not all of them participated in the 2018 survey – pay attention when looking at the results!).

A few takeaways from the survey

We are happy to see that awareness of the FAIR principles has increased since last year (+8.3 percentage points on average across the faculties).

Data loss was (slightly) less frequent in this survey than reported in the previous one (-3 percentage points on average).

We are also pleased to see that Data Stewards are becoming known across all faculties, with an increase of 25 percentage points on average.

It should be noted that, in the Faculty of Aerospace, awareness of data steward support increased from just 19% of the respondents to 63% this year – the largest increase in all faculties! (Great job Heather!)

The number of researchers relying on manual backup solutions is quite high (25% to 53% of respondents). This implies a lot of tedious and error-prone work. We intend to communicate more about the backup solutions offered by the University.

Publishing data is still not common! Despite our efforts, only about 10% (8% to 12% depending on the faculty) of respondents indicated that they published data in the last year. We may use this information as an indicator of progress in the future. However, for now, this indicates that the culture change toward data sharing is a work in progress.

Next steps

As we did last year, we will now proceed with carefully analysing the data. This is essential for us to understand what the key community needs are and where we should focus our support efforts. Similarly to what we did with the results of the 2017/2018 survey, we will aim to publish the outcomes openly, as a peer-reviewed article. So stay tuned!

We might call on you for yet another survey in a year or so… We are here to help you, so your opinion matters! 

Until that time, if you have any questions, please contact us at datastewards@tudelft.nl.

Acknowledgement

TABLEAU IS HARD! Many thanks to Bonnie van Huik (FIC for TPM and AS) for her help. Without her, this would have never been possible ☺

Related resources

  • Andrews Mancilla, H., Teperek, M., van Dijck, J., den Heijer, K., Eggermont, R., Plomp, E., Turkyilmaz-van der Velden, Y. and Kurapati, S., 2019. On a Quest for Cultural Change – Surveying Research Data Management Practices at Delft University of Technology. LIBER Quarterly, 29(1), pp.1–27. DOI: http://doi.org/10.18352/lq.10287