In his novel The Kingdom, which concerns the origin of the Biblical Gospels, French novelist Emmanuel Carrère makes a sudden detour into the difficulties of digital preservation:
In the more than twenty years that I’ve been using computers, everything I’ve written by hand is still in my possession, for example the notebooks I base this book on, while without exception everything I typed directly onto the screen has disappeared.
Of course, I made all kinds of backups, and backups of my backups, just like everyone said I should, but only the ones I printed out on paper have survived.
The others were saved on floppy disks, sticks, external drives—all supposedly much safer but ultimately obsolete one after the next—and are now as inaccessible as the tapes we listened to in our youth. (p. 56)
In his book The Universe: A Biography, Paul Murdin charts the history of the universe via the astronomers who have explored and researched it.
In a passage on page 55, he explains how the open data from the Sloan Digital Sky Survey (SDSS) helped nourish a community of researchers who could understand more about the development of the universe.
The Open Science Festival at Vrije Universiteit Amsterdam inspired many ideas. Here are a few that struck me (mainly from a strategic rather than a research-practice perspective):
- To do open science properly, we need to break down the boundary between research and research support staff. The new roles required sit between these old boundaries (read https://www.nature.com/articles/d41586-022-01081-8).
- There’s a young, bubbling ecosystem of platforms and ideas that seek to fundamentally change the process of publication. At the core of this is updating the position of peer review (good slides at https://zenodo.org/record/7040997). But many outside the open science system are unaware of this, assuming the journey is complete once 100% of journals are open access.
- How to achieve change? There are discussions over visions, policies, strategies, seed funding and sustainability. The words used differ, but the notion of fusing top-down and bottom-up approaches is popular. (One good example at https://zenodo.org/record/7025049)
- Change is not just about the world of science. For the benefits of open science to blossom, much better dialogue is needed with the broader public. Without dialogue, the risk of misunderstanding of (open) science will grow. (See https://zenodo.org/record/7038872 for more)
- And nothing will really change without changing the incentives – how do we reward and recognise all staff involved in (open) science? (Dutch approach: https://recognitionrewards.nl/)
Written by: Meta Keijzer-de Ruijter
Digitalisation offers many opportunities for researchers to boost their research: automation of repetitive tasks, collection of more (complex) data, increased computing capacity, re-use of data and code, and new ways to collaborate with colleagues within and outside of the research group.
At TU Delft, Library Research Data Services and ICT/Innovation are working closely together to define a vision and strategy to improve the support for developing digital competencies amongst the research community. In this process, we are looking at workshops and activities that are already being organised to assess their impact and possible improvements. With that aim in mind, a survey was developed to be sent to former participants of Data and Software Carpentry workshops. Our survey was based on the standard questions of the long-term impact survey that The Carpentries use, but we added questions on the actual usage of the tools taught and additional learning needs on the topics that were addressed in the workshops.
The survey was sent to 315 former participants of Data and Software Carpentry workshops held by TU Delft between 2018 and the summer of 2021. We received 45 responses. In addition, the feedback from both instructors/helpers and participants over the years was considered in this report.
Demographics and Scope
Most of the respondents were PhD candidates (37 out of 45) who attended the workshops during their first or second year (21 out of 37). Respondents came from all faculties, but the majority from Applied Sciences (8), Civil Engineering and Geosciences (6), Architecture and the Built Environment (6) and Mechanical, Maritime and Materials Engineering (6).
Most respondents provided feedback on the Software Carpentry workshops (30 of 45).
Main Takeaways from Carpentries
The top three takeaways from the carpentry workshops, according to the respondents, were:
- Using programming languages including R, Python, or the Command Line to automate repetitive tasks (16)
- Making code reusable (12)
- Using scripts and queries to manage large data sets (10)
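As an illustration of that first takeaway, here is a minimal sketch of the kind of repetitive chore — summarising a folder of measurement files — that a few lines of Python can automate. The file layout and the column name `value` are invented for the example:

```python
from pathlib import Path
import csv

def summarise_results(folder):
    """Compute the mean of the 'value' column in every CSV file in a folder.

    Doing this by hand for dozens of files is exactly the kind of chore
    a short script removes.
    """
    summaries = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as f:
            values = [float(row["value"]) for row in csv.DictReader(f)]
        summaries[path.name] = sum(values) / len(values)
    return summaries
```

Pointing this at a results folder returns one mean per file, replacing the spreadsheet copy-paste routine the workshops aim to move people away from.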
The workshop(s) helped them to improve their efficiency, data analysis and data management skills.
It should be noted that eight respondents wrote that they are not using any of the tools. The main reason for that was that they use alternative tools that better match their practice or that are easier to use on an ad hoc basis.
Relevance of the content
Most of the respondents found programming with Python (partly or very) relevant, but there is considerable debate about the level at which Python is taught. Although it was communicated that the level is very much beginner, a recurrent comment (also in the surveys held right after the Software Carpentry workshops) was that the topics are too basic and the pace too slow for those who only want to refresh their skills. They attended anyway because, in order to receive full Graduate School credits for the course, everyone must attend all parts of the carpentry. In an online setting this seemed to result in less engagement in the breakout rooms.
Figure 1 Response to the question ‘How relevant was the content of the Software Carpentry for your research’? (n=22).
Data Carpentry Genomics
In the Genomics carpentry, the command-line part was found most relevant by the participants, closely followed by data wrangling.
Figure 2 Response to the question ‘How relevant was the content of Data Carpentry Genomics for your research?’ (n=3)
Data Carpentry Social Sciences
In this data carpentry, data analysis and visualization with R were most valued. The relevance (and eventual use) of OpenRefine was questioned: none of the respondents (n=5) in the long-term survey reported using OpenRefine in their current work.
Figure 3 Response to the question ‘How relevant was the content of Data Carpentry Social Sciences for your research?’ (n=5)
Use of tools that are taught in the carpentries
In the survey, respondents were asked about their usage of the tools that were taught in the workshops. Did they start using the tools, and to what extent? Or did they stop using them, or never start at all? For each tool they were also asked if and how they would like to advance their skills.
The use of Unix was picked up by a good number of respondents; those who didn’t use Unix mentioned that it did not serve their purpose. Two respondents stated that they did not feel confident enough to use it. Additional materials for self-paced learning, or help with how to apply it in their work, might help them get started.
It appeared that respondents who already used Python increased their usage from occasional to more frequent. Those who reported not using Python stated that they use an alternative tool.
Both frequent and occasional Python-users would like to improve their skills either by attending more advanced workshops, self-paced materials or talking with peers. Examples of topics they are interested in include data analysis, (big) data management, packaging, Jupyter Notebooks, libraries (including scikit and numba), handling large databases and automating tasks.
Only three people responded to the question on R usage. One of them started using R from that moment on; the other two reported using a different tool for their data management. The respondent who used R said that additional training, learning materials and talking to peers would help to improve their skills.
Amongst the respondents the use of Git increased. Those who didn’t use the tool (7 respondents) said either that it didn’t fit their purpose (4 respondents) or that they did not feel confident enough to start using it (3 respondents). This last group would like access to additional learning materials, guided practice or consultation on how to apply Git in their own work.
Most of the Git users would love to improve their skills in various ways. Topics that are mentioned are dealing with complex files, comparing different versions of code, collaborative use of Git and building more confidence using Git.
None of the respondents (n=5) used OpenRefine after the workshop. Three of them stated that the tool doesn’t fit their purpose, while two respondents would like to have additional materials or consultation on how to apply it in their work.
The use of spreadsheets remained about the same after the workshop, and the respondents did not feel the need to improve their skills.
None of the respondents (n=3) had used cloud computing prior to the workshop. Afterwards, two of them started using it and would like to learn more, though no specific topics were mentioned. The respondent who did not start using cloud computing stated that it did not fit their purpose.
Additional training on research data management or software development
Respondents could tick multiple boxes in this question about additional training needs. The top five training topics were:
- Basic programming – 14 votes
- Modular code development – 13 votes
- RDM workflows for specific data types – 13 votes
- Software versioning, documentation and sharing – 11 votes
- Software testing – 10 votes
Impact of attending the carpentry workshops
Most respondents agreed that attending the carpentries had an impact on their work. They gained confidence in working with data, made their analysis more reproducible, were motivated to seek more knowledge about the tools, advanced their career, improved their research productivity and improved their coding practices.
The only topic where there was a wide spread of (dis)agreement was on the impact of the carpentry workshops on the professional recognition of their work.
Recommendation of the Carpentry workshops
Most respondents recommended or would recommend the carpentry workshops to their colleagues.
From the survey we learned that the carpentries fulfill an important role in introducing tools that help members of the research community carry out their work more efficiently, but additional means and support are necessary. Translating the carpentry materials to daily practice can be challenging, causing some to quit or never start using the tools. In discussions with observers, helpers and instructors of the Software Carpentry workshop, we identified the need to link the carpentry topics more clearly to the research workflow, in order to increase understanding of why we teach these topics and tools.
And of course…
We (TU Delft) would like to thank all those who participated in the survey!
There was considerable focus on AI, and its implications for research and research support, at the Surf Research Week in Utrecht this week (10 May 2022). Here’s a brief set of bullet points on the discussions I noted:
- There’s too much focus on the negative examples of AI. How can we do more to demonstrate the positive benefits? And ally that to greater transparency in how AI functions.
- Also, how can we intelligently reflect on the applied uses of AI? AI should not be viewed as a panacea for all social and research challenges; we need the critical insight to understand when it is (and is not) the right methodology to apply.
- AI changes the nature of how research projects are organised. Also: if AI permits researchers to make discoveries about things they were not even aware of, should we think of AI itself as a collaborator in research projects, rather than as a methodology that we use?
- In any case, the need for collaboration between people with different knowledge becomes even stronger with AI. Wide-ranging skills are needed to deploy AI within a project – not just technical skills, but also data science and management, and embedding the ethical context. In particular, ensuring an ethical footing for AI projects with potentially profound social implications should involve the right philosophical expertise. Related is the need for management and archival skills in curating and documenting the datasets that underpin AI.
- The efficiencies and expertise provided by high-quality software engineering can drastically reduce the time and money needed to deploy the computing power required for AI. A familiar discussion arose – should AI expertise be situated locally or nationally? Should it be generic or focused on specific subject areas?
- There is tension within the AI community between acceleration and regulation. On the one hand, global challenges desperately require the novel ideas that AI can provide – let’s move quickly! On the other hand, we need standardisation to deploy AI sustainably, and regulation to provide the necessary ethical context. Let’s get this sorted out.
- Are there distinctions between AI algorithms and datasets developed within the research community, and buying AI as a service from third parties? What does the latter mean for issues such as reproducibility and ethics? What are the implications for university procurement departments if a whole university wants to use AI as a platform?
(Thanks to SURF for organising the event, and to the panellists and workshop speakers who contributed so many ideas: Emily Sullivan (TU/e), Nanda Piersma (HvA), Maarten de Rijke (University of Amsterdam), Damian Podareanu (SURF), Antal van den Bosch (Meertens Institute), Sascha Caron (Nikhef) and Matthieu Laneuville (SURF).)
Author: Esther Plomp
When I started as a Data Steward at the Faculty of Applied Sciences I attended the Essentials 4 Data Support course to learn more about research data management support. I was therefore happy to accept Eirini Zormpa’s invitation to discuss my Data Steward journey with the participants of the Essentials 4 Data Support course. Together with Zafer Öztürk from Twente University we shared our experiences during the data supporter panel on the 14th of April. This blog post is a summary of what I highlighted during the panel.
The Essentials 4 Data Support course is an introductory course about how to support researchers with the management and sharing of research data. The course materials are helpful to gain an overview of what research data management support entails. The course also provided an opportunity to link up with peers (such as Lena Karvovskaya) and meet experts (such as Petra Overveld).
In December 2018 I started as the Data Steward at the Faculty of Applied Sciences. In my first couple of months I had the privilege to be peer-mentored by Yasemin Türkyilmaz-van der Velden, who showed me the ropes of data management support. Initially, I had to get to know the position, the workings of the faculty, my new colleagues and the researchers I was now supporting.
In this first year I worked together with Yasemin on our Faculty’s Research Data Management Policy, based on the TU Delft Research Data Framework Policy. This was an arduous process, as we visited all departments of our faculty. The policy was discussed with many stakeholders, including PhD candidates. In the beginning of 2020 the Applied Sciences Policy on Research Data Management was officially released! Yasemin and I also worked together on the Electronic Lab Notebook pilot that took place at TU Delft, resulting in TU Delft licences for RSpace and eLABjournal.
In 2019 I followed a Software Carpentry Workshop to learn basic programming skills so I could better support researchers with any software support questions. I later took the train-the-trainer course and became a Carpentries Instructor myself. By being a Carpentries instructor I can teach basic programming lessons set up by the Carpentries community. With the pandemic we had to shift these trainings online, and I coordinated these workshops for a year (2020-2021).
Over the years, I also increasingly supported researchers with Open Science questions. This is an aspect of the role that I very much enjoy and currently try to expand upon. My role differs somewhat from the other Data Stewards at TU Delft: we each have our own preferences and areas of expertise next to data support (such as software, ethics, or personal data). Another difference is my involvement in a side project focused on PhD duration. At TU Delft and at my faculty we try to reduce the amount of time that PhD candidates take to finish their PhD project. While the official duration for a Dutch PhD is four years, the majority of PhD candidates take much longer. This often means that they have to finish the project in their unpaid free time. As someone who has spent seven years on a PhD project I can say that finishing your PhD next to a full time job is no joke.
As a Data Steward I’m also a connection point in the university network. This allows me to address researchers’ questions myself or to connect them with the experts that they need.
- My position at the Faculty itself allows for close contact with researchers. Before the pandemic I regularly hopped between their offices to help them with any questions. At the Faculty I’m embedded in the support team where I work together with the Faculty Graduate School and the Information Coordinator. I’m in regular contact with project officers, managers and researchers from all levels at the faculty.
- As part of the Data Stewards team I meet the other Data Stewards once a week (virtually) and we communicate through Slack/Teams.
- I’m also in contact with colleagues from the Library and the Digital Competence Center, either through collaborative work or because they are the experts that can address questions from researchers.
- Sometimes I reach out to central experts from the Human Research Ethics Committee, the Privacy Team and ICT Security when needed.
Next to my activities as a Data Steward at TU Delft, I’m also involved in several other initiatives revolving around data and open research:
- Since 2020 I’ve been a contributor to The Turing Way. I have primarily written about Research Data Management and contributed a Data Steward case study.
- I am also part of the team that is behind the Open Research Calendar.
- Since 2021 I’ve been a mentor in the Open Life Science programme, which is now also offered for credit to the PhD candidates of my Faculty. In this 16-week mentoring programme, participants learn about open science practices and apply them to their own projects.
- I’ve written an essay on the importance of physical samples in data management and I’m one of the co-chairs of the Research Data Alliance group on physical samples and collections.
- I’m the Open Research Ambassador and Secretary General of IsoArcH, a disciplinary specific data repository for isotope data.
Over the years I have very much enjoyed writing blog posts like this one, summarising my experiences of conferences, activities and lessons learned.
I very much enjoy the Data Steward role, for various reasons:
- I support researchers in making their research more transparent.
- I work with amazing colleagues and collaborators.
- I meet new people interested in similar topics.
- I can continuously develop and learn new skills.
- I have a lot of autonomy over my working activities and schedule.
A lot of this is made possible by a supportive manager, and many individuals that I learned from along the way.
“Create the world you want, and fill it with the opportunities that matter to you.”– Alicia Keys
My tips for people just starting in a data support role:
- Accept that things can take more time than you originally anticipated. Starting in a new role will take some time to adjust and achieving cultural change in university processes will not happen overnight.
- The downside of being able to create your own opportunities is that there might be a lot of things that you want to do. Even if everything seems important or fun to do, it could mean that you will end up with too much on your plate. Sometimes it is good to say no to shiny opportunities.
- In whatever you do I would recommend you to not take the road alone and seek out others to collaborate with, or ask feedback from. Exchanging expertise and experience will not only be more efficient, it will make the road more worthwhile to walk.
Author: Esther Plomp. Contributions from Vicky Hellon, Achintya Rao, Yanina Bellini Saibene, and Lora Armstrong.
A Data Article (also known as a Data Paper/Note/Release, or Database article) is a publication that is focused on the description of a dataset. It uses the traditional journal article structure, but focuses on the data-collection and methodological aspects and generally not on the interpretation or discussion of the results. Data articles are in line with the FAIR principles, especially since most publishers will encourage you to share the data through a data repository. The benefit of a Data Article is that your output will be peer reviewed, something which is generally not the case for datasets that are archived on data repositories. It also facilitates recognition for datasets through research assessment procedures that are more traditionally focused on publication output. Publishing a data paper will therefore increase the visibility, credibility and usability of the data, as well as giving you credit as a data producer (The Turing Way Community 2022).
Below you can find some journals that publish data articles. The costs information was collected in February 2022.
| Discipline | Publisher/Journal | Cost estimate | Deals for TU Delft |
| --- | --- | --- | --- |
| All | Experimental Results | £775 / €928 | 100% APC discount for TUD authors |
| All | Scientific Data | €1790 | No, but eligible for the TU Delft OA Fund |
| All | Data in Brief | USD 500 / €440 | 100% APC discount for TUD authors |
| All | China Scientific Data | RMB 3000 / €416 | No |
| All | Data Science Journal | £650 / €778 | No, but eligible for the TU Delft OA Fund |
| All | Data | CHF 1400 | 100% APC discount for TUD authors |
| All | GigaScience | €1089 | No, but eligible for the TU Delft OA Fund |
| All | Gigabyte | USD 350 / €308 | No, but eligible for the TU Delft OA Fund |
| All | F1000Research | USD 800 / €704 | No, but eligible for the TU Delft OA Fund |
| Archaeology | Journal of Open Archaeology Data | £100 / €120 | No, but eligible for the TU Delft OA Fund |
| Archaeology | Open Quaternary | £300 / €359 | No, but eligible for the TU Delft OA Fund |
| Chemistry | Journal of Cheminformatics | 0 | NA |
| Earth Sciences | Geoscience Data Journal | €1450 | No, but eligible for the TU Delft OA Fund |
| Earth Sciences | Earth System Science Data | 0 | NA |
| Earth Sciences | Big Earth Data | €910 | No, but eligible for the TU Delft OA Fund |
For more journals that offer the Data Article format you can see the ‘Data Article’ section in The Turing Way.
Authors: Esther Plomp, Emmy Tsang, Emma Henderson and Delwen Franzen
This blogpost summarises a discussion session held during the AIMOS2021 conference (1 Dec – 08:30-9:30 AM UTC). During the discussion, we focused on what our institutes and departments could do to improve the awareness of Open Science practices and support the change towards a more open research culture. We started our session with some of the questions that the participants were currently struggling with, and some of our (not so) success stories:
- The Delft University of Technology (the Netherlands) already has many policies and support roles in place for Open Science practices. There is an Open Science Programme with a dedicated Community Manager who also supports the building and growth of the TU Delft Open Science Community. At the faculty level, Data Stewards provide support for research data and software management and sharing. Thanks to these Data Stewards, the faculties each have their own Data Management policy.
- The Flinders University (Adelaide, Australia) is working on policy changes and has an Open Science training in place.
- The BIH QUEST Center (Charité – Universitätsmedizin Berlin, Germany) has developed a pilot dashboard that provides an up-to-date overview of several metrics of open and responsible research at the Charité.
Having dedicated roles or policies for Open Science and Data Management is crucial to drive effective change in research practices, but not every institute has these resources. While the uptake of Open Science practices has increased in the last five to ten years, there is still a lot of frustration at the local level. Not everyone has the time to pay attention to, or is enthusiastic about, Open Science developments, and participants indicated that some principal investigators did not care about replicability in research. If bachelor’s/master’s students follow training on open research practices, they are equipped to take this aspect into account when selecting a supervisor for their PhD research (see also Emily Sena’s contributions in the AIMOS 2021 Panel Discussion on “How to start a revolution in your discipline”). While some institutions offer Open Science training, uptake is sometimes low. During the session we grappled with some of these obstacles and discussed the following four questions in more detail:
How can you make the case for hiring professionals that support Open Science practices?
It helps if other institutes have examples of professional support roles, especially if there is visible impact in the uptake of Open Science practices. A great example of this is the UK Reproducibility Network (UKRN). The UKRN is actively involved in supporting institutions in setting up roles that focus on increasing reproducibility of research, by connecting stakeholders to share best practices and by providing expert advice.
To build the case for the institution to prioritise investment in Open Science, it is often helpful to illustrate to institutional leadership the effects of (inter)national funders’ commitments to Open Science. Funder mandates on data management planning and sharing are now commonplace (for example, the European Commission, NWO, NIH) and are directly impacting the institution’s researchers.
It was also noted that support from institutional/faculty leadership alone is often not sufficient: the establishment of these roles should also be driven by the needs of researchers. Ideally, these bottom-up needs align with top-down strategic decisions.
How do you set up an Open Science policy at your institute?
To set up an Open Science policy, you may be more successful if you tackle the different aspects of Open Science separately. Open Science is a very broad concept, and it may be complicated to address Open Access, Data, Software, Education and Engagement in a single policy.
Stakeholder engagement is essential when setting up a policy. You should make sure that the policy represents the various interests at your institution. Stakeholder mapping is a helpful exercise for understanding who to talk to, how, and at what stage of policy development. While it may take time to actively engage all of your stakeholders, in the end your policy will be more practically applicable and better supported. At the same time, it is also an opportunity to engage your stakeholders in conversation on this topic, as an upcoming policy that affects them creates a sense of urgency. It is also helpful to run your policy past procedural checkpoints (such as Human Research Ethics committees).
How do we incentivise/reward researchers practising Open Science?
One way to incentivise researchers to practise Open Science is setting up Awards:
- The BIH QUEST Center offers several awards, including an Open Data Award and an Open Data Reuse Award.
- The University of Surrey organised a showcase event on Open Research and Transparency, where researchers from any discipline could present their case studies in 20 minutes. The presentations were followed by an award ceremony and afterwards the case studies were listed on the website.
- The Health and Technology Open Research Awards involved scoring people on their Open Science practices as objectively as possible. This award benefitted from the UKRN primer on Open Research Awards.
- The University of Bristol and University of Groningen also awarded Open Research Awards.
- There is the Parasite Award for rigorous secondary data analysis.
While it is important to recognise the efforts of individual researchers in practising Open Science, there are discussions on whether incentivising them with awards is the best approach (see Lizzie Gadd’s post ‘How (not) to incentivise open research’ and Anton Akhmerov’s Twitter thread).
How do you get more people on board with practising Open Science?
In order to gain more support for Open Science practices, it helps if there are practical examples. It is not always clear from hypothetical or abstract statements what can be done on a daily basis to make research practices more open.
It was noted that it is easier to start learning about open research practices at the beginning of a research career, for example during undergraduate or early graduate school training. Once students have gained more knowledge, they can also demonstrate to their supervisors that these practices are beneficial. However, it cannot just be up to PhD candidates to drive these changes, as they are in a hierarchical relationship with their supervisors. Supervisors should also receive training and support to adjust their practices.
Useful links and resources
- Open Science communities (for example, TU Delft)
- See the Starter Kit
- Open Life Science programme
- Utrecht University Rewards and Recognition model: TRIPLE
- Utrecht University (NL): Open Science Monitor survey
- Academic job offers that mentioned open science
- Open Scholarship Grassroots Community Networks
- UK Reproducibility Network
- Surrey Open Research and Transparency Showcase
- Promoting Open Science: A holistic approach to changing behaviour
- Open Research Toolkit
- CARL Institutional Policy Template
- ODDPub: a text-mining algorithm to detect data sharing statements in biomedical publications by Riedel et al. (2020) (BIH QUEST Center, Charité – Universitätsmedizin Berlin)
- The Seven Deadly Sins of Psychology: A Manifesto for Reforming the Culture of Scientific Practice by Chris Chambers
- Science Fictions by Stuart Ritchie
- Bad Pharma by Ben Goldacre
This blogpost is written based on contributions by the session participants: Peter Neish (The University of Melbourne, @peterneish), Delwen Franzen (BIH QUEST Center for Responsible Research, @DelwenFranzen), Jen Beaudry (Flinders University, @drjbeaudry), Emma Henderson (University of Surrey, @EmmaHendersonRR), Fiona Fidler (University of Melbourne, @fidlerfm), Nora Vilami and Pranali Patil.
Authors: Meta Keijzer-de Ruijter, Masha Rudneva
Cooking class for researchers
Some people think that digital skills in research amount to learning how to program (Python, R, C++, MATLAB, etc.) or to use digital tools to automate recurring tasks, but they entail a lot more.
Becoming a digitally-skilled researcher requires more than ‘just’ learning to use individual tools. It is like becoming a star chef: it does not suffice to know how to use the different cooking appliances (knife, mixer, oven, stove, etc.). You also need to know how to run a kitchen efficiently, making sure all prepared ingredients for the dish come together on the plate at the right time, without mixing up steps in the recipe in ways that affect the final quality of the dish. To summarize, it is essential to consider, plan and prepare all steps and aspects of the research workflow at the beginning of each research project.
The potential drawbacks
Implementing best practices for digital tools requires a significant change in workflow, but it pays off in efficiency and quality. Without them, code and other scientific outputs can be lost or become unusable by others in the future. Think of a master's student who did great research and successfully graduated; after the student leaves, the successors cannot find or reuse the code that was developed and have to start from scratch. The valuable contribution to the project is lost, and the continuity of the work is disrupted.
Likewise, if researchers do not document their actions and steps during the project, they may have to figure things out twice when it comes to publication. The reproducibility of the results depends to a large extent on good digital practices. So, how can you make sure that your research artefacts remain useful to society and to your successors?
The Open Science community has formulated four guiding principles for this: the “FAIR” principles, which stand for “Findable, Accessible, Interoperable and Reusable”. Let us walk through a typical life cycle of research software and see how to build the FAIR principles and essential digital skills into your research project.
What “digital skill” ingredients do you need during your project?
At the start of a research project, it helps to have a good overview of the elements in the workflow in relation to the various stages of your project. In this section, we provide a roadmap that could help you to plan your work.
Step 1 – Preparation Phase:
Defining what you need to build and what tools will be used is essential.
- Analysis of the project requirements – What research questions do you need to answer, and what results do you want to achieve? Breaking down big questions into smaller problems/deliverables often helps in building a more modular code in the long run.
- Investigation of the available codebases – Can you build your project on an existing platform or codebase, or use available algorithms, or do you have to “start from scratch”? It is often more efficient to re-use available resources, but you always need to check the licenses and conditions before using, copying or modifying someone else’s code.
- Learning about the best practices for Research Software development – a high-level understanding of the best practices allows you to avoid the common pitfalls and makes your work more efficient.
- Choosing the platform, programming language and design concepts for your code – this is the last preparation step. You are now ready to start the actual work.
Step 2 – Research project:
Creating the code or research software is only one piece in the whole story. The other essential aspects one should consider are:
- Data Management – consider the best practices for data management at the beginning of the project by drawing up a Data Management Plan, which will detail how the data is structured, stored, and archived. Having a Data Management Plan will save you a lot of work and time later. Data Stewards at the faculties are available to provide you with all the support, training and information required (https://www.tudelft.nl/library/research-data-management/r/support/data-stewardship/contact)
- Backing up the code – choose storage with periodic automated backups, or set up a backup routine yourself. If you use TU Delft Research Drive or SURFdrive, your code and data are automatically backed up for you. If you rely on your laptop or an external hard drive, you can lose everything if the drive is damaged. When storing in the cloud, make sure your credentials are secure and that you can recover them if forgotten.
- Documentation – code by itself can be great, but it is not (re)usable by others if no documentation is attached. It can be challenging to remember and understand what the code is doing even a year later. Proper documentation is therefore a valuable step in making your code reusable, by yourself and by others.
- Metadata to describe your code and results – metadata should be as descriptive as possible. It may contain information about the code’s creation (author, date, OS, configuration) and describe when and how to use it. Adding appropriate metadata makes your code more findable.
- Use of version control – this is an essential part of any research project. A version control system lets you see and manage changes to files over time, keep track of those modifications, and ease collaboration and co-creation of the code with your colleagues. Using Git with a platform such as GitLab or GitHub ensures that you can always go back to a previous version of your code if something goes wrong. It also enhances the reproducibility of your research.
- Testing / Distribution – you should build tests into the code at various stages to make debugging easier, mitigate potential errors and ensure that you and others can use your code without errors and reproduce results.
- Security and Privacy – you often need to build some security features or choose the framework with the built-in security to keep classified and sensitive data well-protected and keep vulnerabilities out of your system.
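The “Backing up the code” point above can itself be automated with a few lines of code. Below is a minimal sketch in Python; the `backup_project` name and the directory layout are our own illustrative choices, not a prescribed TU Delft workflow.

```python
import shutil
import time
from pathlib import Path

def backup_project(project_dir: str, backup_dir: str) -> Path:
    """Create a timestamped zip archive of a project directory."""
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = Path(backup_dir) / f"{Path(project_dir).name}-{stamp}"
    # shutil.make_archive appends the ".zip" extension itself
    return Path(shutil.make_archive(str(base), "zip", project_dir))
```

A script like this can be scheduled (e.g. with cron or Task Scheduler) to write archives to institutional storage, so a damaged laptop drive does not take your work with it.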
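To illustrate the “Documentation” point above: even a short docstring stating what a function expects and returns makes code far easier to pick up a year later. The function itself is just an invented example.

```python
def moving_average(values, window):
    """Return the simple moving average of `values`.

    Parameters
    ----------
    values : list of float
        The time series to smooth.
    window : int
        Number of consecutive samples per average; must be >= 1.

    Returns
    -------
    list of float
        One average per full window, so the result has
        len(values) - window + 1 entries.
    """
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```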
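For the “Metadata” point, even a small provenance record saved next to each result already helps. The sketch below writes one as JSON; the field names are our own informal choice (a formal vocabulary such as CodeMeta is the more standard route).

```python
import json
import platform
import sys
from datetime import date

def write_metadata(path, author, description):
    """Save basic provenance metadata next to a result file."""
    meta = {
        "author": author,
        "created": date.today().isoformat(),
        "description": description,
        "python_version": sys.version.split()[0],  # e.g. "3.11.4"
        "operating_system": platform.platform(),
    }
    with open(path, "w") as fh:
        json.dump(meta, fh, indent=2)
    return meta
```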
Step 3 – Publishing and Sharing
Now the code has been built and the first results are in. It is a perfect moment to celebrate, but this is not the end of the story: now think about sharing and archiving your results. If you would like the community to use them, your code should have a license, be stored where others can find it, have explicit metadata attached and possess unique identifiers. But don't worry: if you have followed the FAIR principles, you are well covered.
- Licensing – Whether you want others to (re)use your code or you are thinking about patenting your software, you should choose a license for it. The most common software license models are Public domain, Permissive, LGPL, Copyleft and Proprietary; these range from completely open to fully restrictive.
Often if you are developing software openly, e.g. on GitHub/GitLab, the advice is to choose a license at the beginning. This also has implications for registering the software as per the Research Software policy.
- Citations – Citing the sources you used acknowledges and gives credit to their authors. It also allows others to learn more about the previous work your project builds upon. To make your own code more citable, it is worth adding a citation file in the Citation File Format (CFF) to your repository (https://citation-file-format.github.io/)
- Publishing – there are many platforms on which you can share and publish your code, e.g. GitHub or SourceForge. Publishing and sharing your project on these platforms can attract collaboration and increase visibility. Please remember that the code or any digital object should have a Digital Object Identifier (DOI) to make it easier to find or cite. If the data/code cannot be shared, you can still share the metadata in a repository so that others can find your project and request access to it.
- Archiving – when the project is over, you may want to archive your code in a repository to access it in the future. Code can be archived at, for example, 4TU.ResearchData or at Zenodo.
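As an illustration of the citation file mentioned above, here is a minimal `CITATION.cff` with placeholder title, author and dates (the full schema is documented at the Citation File Format site linked above):

```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "My Analysis Toolkit"   # placeholder project name
version: 1.0.0
date-released: 2022-09-01
authors:
  - family-names: "Doe"
    given-names: "Jane"
```

Platforms such as GitHub recognise this file and offer visitors a ready-made citation for your repository.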
Digitization gives researchers many opportunities to do more advanced research and to collaborate with others. But it requires adjusting your workflow, developing a common language, and learning to use the new tools that become available effectively. The good news is that at TU Delft we have training courses and excellent support available through the Digital Competence Center (DCC, https://dcc.tudelft.nl/) and the Data Stewards, who can help you run your kitchen like a star chef in the digital age.
Who are we?
Meta Keijzer-de Ruijter:
Meta has a background in Chemical Engineering and Corporate Education. She spent more than 10 years in the ICT Innovation department developing digital assessment in education at TU Delft. Recently, she became a project manager of the FAIR Software project within the Open Science Program. Together with colleagues in ICT Innovation and Research Support at the Library, she set up the Digital Competence Center (DCC) support team. She currently investigates the needs for digital skills for researchers.
Masha Rudneva:
Masha received her PhD in Physics at TU Delft and recently joined the ICT department as an Innovation Analyst. She focuses on supporting researchers with challenging ICT-related requests.
This is the first in a series of blog posts in which we, the Innovation Department and the DCC team, talk about the work we do to support researchers and reflect on the things we come across.