Do as you preach: results of 2017/2018 data management survey now published
Author: Jasper van Dijck, Data Steward at the Faculty of Electrical Engineering, Mathematics and Computer Science
Data. We advise researchers on how to manage theirs, but we are not averse to gathering and sharing some of our own.
As data stewards at TU Delft we were asked how we are going to keep track of our progress. After some discussion amongst ourselves, we concluded that we could count the number of researchers we helped with their data management (plans), and that we would love to measure the number of data sets TU Delft researchers share publicly. Presumably, an increase in the former would lead to an increase in the latter. That did not seem quite enough though, since there is a time lag between our usual first point of contact with a researcher, at the beginning of a project, and the archiving/sharing of research data, usually at the end. Since most research projects last a few years, we would have to be quite patient to find out whether our ventures had paid off. So we felt we also needed to know how researchers currently think about research data management (RDM), since one of the focus points of being a data steward at TU Delft is creating awareness and facilitating a change in culture.
That is why we set up a survey. Nothing fancy, just a simple survey asking researchers a couple of questions on their (attitude towards) research data management. If you are Dutch, this would be our infamous “nulmeting” (baseline measurement). It gives us a starting point for measuring the change in attitude and behaviour over time (yes, we are planning to re-do the survey regularly), and so some insight into what effect our presence and actions have had.
So, we would like to present to you the results of the TU Delft “Quantitative assessment of research data management practice 2017-2018,” or RDM survey 2017/2018 for short. This survey was set up in cooperation with EPFL and Cambridge University. EPFL has already finished its survey and Cambridge is currently completing its own. Our goal is to compare the results across the institutions to see what we can learn from each other’s approach.
You can find a visualisation of the survey here: https://public.tableau.com/profile/jasper.van.dijck#!/vizhome/20180809TUDelftResearchDataManagementSurvey2017-2018/TUDelftRDMsurvey2017-2018.
And yes, the anonymised(!) data is in the public domain. You can find it here: https://zenodo.org/record/1164398. We practise what we preach.
Feel free to explore the results of the survey in the visualisation or download the data yourself. We will learn a lot from it and we are looking forward to finding out what has changed in the next survey.
If you are a researcher at TU Delft and you are reading this: we are counting on you to fill out the RDM survey 2018, somewhere near the end of the year. Until that time, if you have any questions, please contact us at firstname.lastname@example.org.
Invitation to collaborate
If you are interested in research data management and would like to do a similar survey at your institution, you are most welcome to join TU Delft, EPFL and the University of Cambridge in our efforts. The survey itself is available on the Open Science Framework: https://osf.io/mz3fx/wiki/home/
So, just drop us an email at email@example.com
The data could have been tidier…
I ran some analyses, and it seems that people who answered that they have no backup and people who did not know whether they have one report similar data loss.
(the code can be found at https://github.com/open-science-promoters/RDM_promotion in the outreach folder)
Here are some new plots: https://osf.io/f8v7j. I plotted the amount of data loss against the different answers. Data loss is coded either as roughly the mean of the time range indicated in the survey, or as 1 for every reported loss (the second approach seems to give results that are easier to interpret, because so little data loss was reported).
The plots seem to show:
1. !! There is not enough data for a statistically sound analysis.
2. The presence of a backup seems to have an effect, but the effect seems to be smaller than I would have expected.
3. Whether or not people use an RDM tool makes no difference, but when people are not sure whether a tool exists, the amount of data loss seems to rise.
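For readers who want to try the two ways of coding data loss mentioned above, here is a minimal Python sketch. It is not the code from the repository: the answer labels, the midpoint values, the field names (`backup`, `data_loss`) and the toy records are all assumptions invented for illustration, and the real survey columns may look different.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical answer labels mapped to the midpoint of each time range (days).
# These labels and values are assumptions, not the survey's actual wording.
LOSS_MIDPOINT_DAYS = {
    "none": 0.0,
    "less than a day": 0.5,
    "1-7 days": 4.0,
    "8-30 days": 19.0,
}


def loss_midpoint(answer):
    """First approach: code a loss as roughly the mean of the reported range."""
    return LOSS_MIDPOINT_DAYS[answer]


def loss_binary(answer):
    """Second approach: code every reported loss as 1, no loss as 0."""
    return 0.0 if answer == "none" else 1.0


def mean_loss_by_group(records, group_key, encode):
    """Average the encoded data loss for each answer to `group_key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(encode(record["data_loss"]))
    return {group: mean(values) for group, values in groups.items()}


# Toy records made up for this example; they are NOT survey results.
toy_records = [
    {"backup": "yes", "data_loss": "none"},
    {"backup": "yes", "data_loss": "1-7 days"},
    {"backup": "no", "data_loss": "8-30 days"},
    {"backup": "do not know", "data_loss": "less than a day"},
]

print(mean_loss_by_group(toy_records, "backup", loss_midpoint))
print(mean_loss_by_group(toy_records, "backup", loss_binary))
```

With the binary coding, each group mean is simply the share of respondents in that group who reported any loss, which is probably why it is easier to read when losses are rare.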