The Second TU Delft Data Champions meeting
Authors in alphabetical order: Maria Cruz, Santosh Ilamparuthi, Marta Teperek, Yasemin Turkyilmaz-van der Velden
On the 21st of May, we held our second Data Champion meeting, with interesting talks given by our Data Champions. The agenda and all the presentations can be found here. The blog post summarizing the first TU Delft Data Champion meeting can be found here.
Yasemin Turkyilmaz-van der Velden – Welcome Slides
The slides of this presentation can be found here.
The meeting started with an introductory presentation by Yasemin (Data Steward of 3mE and the Data Champion Community Manager). Yasemin highlighted that the TU Delft Data Champions community has already reached 44 members, and she focussed on examples of Data Stewards and Data Champions working together. So far, two TU Delft Software Carpentry workshops have been organized, made possible by the collaborative efforts of the TU Delft Data Stewards (Kees den Heijer, Heather Andrews, Nicolas Dintzer and Santosh Ilamparuthi) and the Data Champions (Victor Koppejan, Raúl A. Ortiz Merino, Elena Zhebel, Susan Branchett and Marcel van den Broek). Both workshops, which are free of charge but require registration, received very positive feedback and were fully booked within 24-48 hours, leading to a waiting list of ~70 TU Delft researchers. Because of this high demand, 5 Data Stewards and 1 Data Champion have recently been trained as certified Carpentries instructors, and 2 more Data Champions will be trained this year, so that more Software Carpentry workshops can be organized at TU Delft. The next TU Delft Software Carpentry workshop will take place on 8-9 July and was fully booked within 2 hours.
Additionally, Coding Lunch & Data Crunch sessions take place on the last Thursday of every month, again thanks to the collaborative efforts of the TU Delft Data Stewards and Data Champions. During these sessions, any TU Delft researcher is welcome to walk in with data- and/or code-related questions. The TU Delft Software Carpentry workshops and the Coding Lunch & Data Crunch sessions are coordinated by Paula Martinez Lavanchy, the Research Data Officer from 4TU.ResearchData, who also leads the development of TU Delft Research Data Management training.
Inspired by the success of the TU Delft Software Carpentry workshops, two of our Data Champions, Raúl A. Ortiz Merino and Marcel van den Broek, who have also contributed to the previous Software Carpentry workshops, organized the first TU Delft Genomics Data Carpentry workshop together with the Data Stewards Santosh Ilamparuthi, Esther Plomp, Kees den Heijer and Nicolas Dintzer. A dedicated blog post can be found here.
Yasemin also reminded the audience of the Data Champion travel fund, which can be used to attend conferences, workshops and meetings in order to learn new skills and/or present Data Champion activities. So far, the fund has been awarded to Raúl A. Ortiz Merino, to cover his costs for joining a recent Genomics Data Carpentry workshop in Ghent, Belgium, in preparation for the first TU Delft Genomics Data Carpentry workshop, and to Victor Koppejan, to enable him to join the Collaborations Workshop 2019 (CW19) in Loughborough, UK, which he also mentioned during his talk.
Finally, the Data Champions were asked to contact the Data Steward Esther Plomp if they are interested in getting involved in a TU Delft pilot for the premium version of protocols.io. Protocols.io is an online platform for managing and sharing detailed experimental protocols. It also assigns DOIs to protocols, which makes them citable.
Victor Koppejan – Just do it! How embracing the Carpentries made me a happier researcher
The slides of this presentation can be found here.
Victor Koppejan’s presentation covered his personal experience as a researcher, how that experience shaped his views on good data management practices, and why he feels the need to share his expertise. He recounted losing data when the bulk server on which he stored his data was destroyed and it turned out that the server did not have adequate redundancy.
Victor described how, in his own research, he migrated from proprietary software to open source tools to model fluid flow and reaction in a fluidized bed. The degree to which this made research output easier to share and improved the prospects for reproducibility was a huge positive. His experience with better version control through Git, and the difficulty of maintaining good code quality while working towards a deadline, offered a glimpse into the real process of research code development. His interactions with ICT illustrated how important good communication between researchers and research support services is, but also showed some of the mismatches that arise when researchers’ resource needs go beyond the services that are commonly provided.
Victor also spoke about his motivations for getting involved in the Software Carpentry program, which resulted in him becoming an instructor. Having experienced the benefits of good code management, he wishes to spread coding literacy and enable researchers to become better programmers. In this vein, he was both an instructor and a helper during the last Software Carpentry event that the Data Stewards helped organize. He hopes to continue this work in the future as a research software engineer after completing his PhD.
He also shared his experience of attending his first unconference, the Collaborations Workshop 2019, and the collaborative atmosphere it enabled. One of his takeaways from the event on creating a more inclusive, interactive space (the Pacman principle of always leaving room for a new person to join a circle of people who are in conversation) was put into practice during the post-event drinks.
Susan Branchett – Results of the TU Delft Research Data Storage survey
The slides of this presentation can be found here, with all the results publicly available on GitHub.
TU Delft ICT sent out a survey at the beginning of this year in order to come up with storage solutions that are safe as well as accessible. The survey asked which storage solutions researchers are using, how they would like to use them and what their needs are. It received 625 responses. It became clear that ~75% of the respondents have to share their data with students. As students cannot use group drives or SharePoint, the project drive may provide a suitable solution. Over 75% need to share their data with researchers outside of TU Delft. The current options for this are not optimal, but the results of the survey suggest that email and SURFfilesender may already provide an adequate solution for sharing data with these researchers. More than half of the respondents need to share data outside of the Netherlands. Only 25% of the respondents did not work with personal data at all, with researchers from IDE and TPM usually (always?) working with personal data. The majority of the respondents (70%) would like to be able to track changes to data, especially in collaborations or in terms of software versions. To meet this need, TU Delft offers Subversion, which you can request through TOPdesk.
TU Delft does not have a contract with Dropbox or Google Drive for processing data, although it is now in the process of setting up these agreements. This process has not yet started for OneDrive, as the survey showed that it is not used by many of the respondents.
Gary Steele – Open Science: What is it? Do we want it? (And if so, how do we get there?)
The slides of this presentation can be found here.
Earlier this year, at the NanoFront Winter Retreat 2019 in Courchevel, France, Gary Steele gave a presentation about Open Science, during which he ran a survey with the aim of stimulating awareness, discussion, and dialogue about open science. During this session, he repeated both the presentation and the survey. The results of the survey can be found here.
Gary Steele started his talk by addressing questions such as “What is Open Science?” and “What is Open Data?” and continued with “How does it work?” and “What does it look like?”. He commented that FAIR is a great philosophy, but that it is not clear what it means in practice and that it is not something you can easily apply in the lab. This is why he and Anton Akhmerov set out to provide some simple definitions for openly publishing data by developing an Open Data Policy for their Department of Quantum Nanoscience. This policy has been approved by their department and is now being adopted by the rest of the Applied Sciences Faculty. Gary Steele predicts that open data will become the norm in the coming 5 years, and that it is therefore better to lead the dialogue and define the standards.
Gary Steele continued by asking the audience the following questions, using Google Forms: “What type of open data would you be willing to publish?”, “How do you feel about open data? Good idea or bad idea? Do you think it’s worth it? How far should we go?” and “How do you feel about sharing the code you write? Would you be happy to do it? Why / why not?”.
In the second part of his talk, Gary Steele focused on the scholarly publishing system, listed the differences between the gold and green open access models and asked the audience “What percentage of your papers are on the arxiv or a similar preprint server?”. He highlighted that academic publishing is an extremely profitable business by showing that publishing giants such as Springer, Elsevier and Wiley have profit margins similar to, or even higher than, those of Apple. He then asked the audience “In the modern era, with the arxiv and google, what do we need journals for?” After this question, he highlighted SciPost, which is an “ArXiv overlay journal that coordinates peer review”.
Gary Steele then moved to his next question, “Do you want to publish in high impact journals like Nature / Science / Cell / Baby Nature (or the equivalent in your field)?”, and focused on the motivations for publishing in these journals under the current academic recognition and rewards system. This was followed by his last question, “Who do you want recognition from?”. He showed that most of the audience at the NanoFront Winter Retreat had answered this question with “our peers”. To achieve this kind of recognition, he proposed that a panel of scientists from the field could make a selection of the “key” arxiv postings each month. He concluded his presentation by asking whether, through transparency and effort from people in the field, such a system could establish trust and recognition.
Juan Duran – “Dark Data and the Scientific Data Officer in the era of Big Data Science”
The publication presented during Juan’s talk can be found here.
Juan Duran’s presentation was on “Dark Data” and how to make use of it going forward, facilitated by a proposed new position, that of “Scientific Data Officer”. Duran focussed specifically on dark data at high performance computing (HPC) facilities: data that researchers are unaware of and that therefore remain unused despite being accessible. The presentation was based on a publication, with research carried out at the High-Performance Computing Centre Stuttgart (HLRS).
Juan talked about the need to look at dark data, as it is a source of information whose neglect wastes storage space, as well as money and computational time spent on repeating experiments or gathering data that are already available. He emphasised that dark data are not created solely by careless researchers, and that the problem also arises because research facilities put the onus on researchers to curate, manage and store data.
To address the problems with dark data at HPC facilities, Juan Duran advocated the creation of the position of scientific data officer (SDO). The SDO would help deal with the large amount of dark data (estimated at a possible 691 TB for HLRS, or just under three and a half percent of the total storage capacity of the facility) by promoting responsible use of data and implementing FAIR standards.
After the presentation, Juan Duran continued the conversation with the Data Champions and the Data Stewards, discussing whether various kinds of data need to be stored at all and whether simulation conditions could be stored instead of the simulation output itself.
Maria Cruz – The MPS FAIR Hackathon attended by Maria and Joseph Weston
The slides of this presentation can be found here. A blog post written by Maria Cruz can also be found here. A shortened version of that blog post is provided below.
During this presentation, Maria Cruz (Community Manager Research Data Management, VU Amsterdam) shared her views and reflections from the NSF MPS FAIR Hackathon held in Alexandria, Virginia, USA, on 27-28 February 2019. For this meeting, participants were encouraged to register and assemble as duos of researchers and/or students along with a data scientist and/or research data librarian. Maria Cruz formed a duo with the Data Champion Joseph Weston.
Maria Cruz highlighted that while there are great ambitions behind FAIR data, many researchers are not aware of the FAIR principles, and those who are do not always understand how to put the principles into practice, or are not always willing to do so. As reported in a recent news item in Nature Index, the 2018 State of Open Data report, published by Digital Science, found that just 15% of researchers were “familiar with FAIR principles”. Of the respondents to this survey who were familiar with FAIR, only about a third said that their data management practices were very compliant with the principles.
The workshop tried to address this particular challenge by bringing together researchers in the physical sciences, experts in data curation and data analysts, FAIR service providers and FAIR experts. The researchers were knowledgeable about data management and for the most part familiar with the FAIR principles. However, the answers to a questionnaire sent to all participants in preparation for the Hackathon show that even a very knowledgeable and interested group such as this one struggled when answering detailed questions about the FAIR principles. For example, when asked specific questions about provenance metadata and ontologies and/or vocabularies, many respondents answered that they didn’t know. As highlighted in the 2018 State of Open Data report, interoperability, and to a lesser extent re-usability, are the least understood of the FAIR principles. Interoperability, in particular, is the one that causes the most confusion.
Maria concluded her presentation by suggesting that the best and most practical thing a researcher can do is to obtain a persistent identifier (e.g. a DOI) by uploading data to a trusted repository such as the 4TU.Centre for Research Data archive, hosted at TU Delft, or a more general archive such as Zenodo. This will make datasets at the very least Findable and Accessible. Zenodo conveniently lists on its website how it helps datasets comply with the FAIR principles. The 4TU.Centre for Research Data, and many other repositories, offer similar services when it comes to helping make data FAIR.
Closing Remarks
Marta Teperek and Anke Versteeg gave the closing remarks of the session. The slides of this presentation can be found here. Marta announced that on the 17th of June Connie Clare, a PhD student from the University of Nottingham, will start her internship at TU Delft. The main aim of her internship is to promote the activities of the Data Champions at TU Delft. She will therefore be interviewing Data Champions and writing short blog posts about these discussions.
Anke Versteeg thanked all Data Champions who participated in the consultations on the TU Delft Vision on Open Science. The vision will consist of five separate sections: Open Access, Open Publishing, FAIR data, Open Software and Open Education. There will also be three parallel themes which will be addressed in all five sections: Rewards and Recognition, Intellectual Property & Transparent Working with Industry, and Skills.
Finally, Anke mentioned that the Executive Board of TU Delft has expressed interest in the Data Champions initiative and is willing to join the meeting in the autumn, and with that she invited everyone to the networking drinks.