Five W’s of Open Data.

TU Delft Library met Open Data Expert and Data Champion, Anneke Zuiderwijk, to learn about the ‘who, what, where, when and why’ of Open Data.

Written and illustrated by Connie Clare

Anneke Zuiderwijk is a Data Champion working within the Faculty of Technology, Policy and Management. As an Assistant Professor in the Department of Engineering, Systems and Services, she has dedicated her academic career to investigating the complex rationale behind open data. In Zuiderwijk’s unique case, data sharing isn’t just a research consideration; it’s her research domain.

Zuiderwijk believes that great value can be created through open data sharing and use and that it can be revolutionary for scientific advancement. Her mission is to develop theory for designing infrastructures and institutional arrangements that incentivize open data sharing so that it eventually becomes the standard rather than the exception.

In order to realise her mission, her academic research and responsibilities as a Data Champion address the following ‘5W’s of open data’:

  1. Who are the actors involved?
  2. What are their motivations to share data? And, what can infrastructure do to motivate data sharing?
  3. Where can data be shared?
  4. When should they prepare to share data?
  5. Why consider data sharing?

About Anneke

Before joining TU Delft in 2011, Zuiderwijk worked for the Dutch Ministry of Justice and Security where she gleaned insight into the real world of open judicial data. “The Ministry were collecting data on crime statistics and were faced with many challenges of sharing such sensitive information with the wider community.”

Motivated to mitigate the challenges associated with sharing and using government data, Zuiderwijk began her PhD under the supervision of Professor Marijn Janssen at TU Delft. Her doctoral research involved the design of a socio-technical infrastructure that enhanced the coordination of open government data use. By creating a metadata model with interaction and data quality mechanisms, Zuiderwijk aimed to stimulate interaction between data providers, data users and policy makers, thereby creating a feedback loop to better inform policy about government data use.

After obtaining her PhD with distinction, Zuiderwijk assumed a postdoctoral researcher position working on the 3-year VRE4EIC project, funded by the European Union’s Horizon 2020 research and innovation programme. ‘VRE4EIC’ stands for ‘Virtual Research Environment to Empower multidisciplinary research communities and accelerate Innovation and Collaboration’. Essentially, this online platform draws professionals from various scientific domains to exchange data, resources and knowledge within a multidisciplinary data environment. Data can be made open at different levels within the platform, meaning that there is control over data sharing and data can be protected if necessary.

Zuiderwijk emphasises the importance of embracing diversity and fostering inclusion for scientific progression. “Many research environments focus on discrete scientific disciplines with limited convergence, yet, as scientific problems are related across all disciplines, it’s of tremendous value to bring disciplines together to solve problems collectively. After all, societal problems must be studied from multiple perspectives.”

1. Who are the actors involved?

The field of open data involves a number of actors. “Researchers, governments, companies, citizens, journalists, librarians and archivists are all concerned with open data to some extent,” informs Zuiderwijk. A key objective of her research is to understand the needs of these different actors and how to motivate them to openly share their data and use the data of others. “In order to build a sociotechnical infrastructure that facilitates effective data sharing across multiple disciplines, it’s necessary to understand the perspectives, behaviours and motivations of each actor involved.”

2. What are their motivations to share data?
And, what can infrastructure do to motivate data sharing?

She explains that their motivations are diverse and discipline-specific. “Researchers may be motivated to openly share data if it means greater visibility and more citations to boost their career development.” However, Zuiderwijk admits, “There are still obstacles that prevent researchers from sharing data, such as a lack of the required skills.”

Whilst the majority of obstacles cannot be mitigated completely, Zuiderwijk explains that implementing suitable infrastructure and related institutional arrangements can help. “It’s important to improve the ease of use of data infrastructures, of which open data portals are one element, and to provide institutional training for using such portals so that researchers acquire the necessary skills and become motivated to share data.”

Data sharing is more common in some academic disciplines than others and there are obvious disparities across them. Her recent publication, ‘Sharing and re-using open data: A case study of motivations in astrophysics’, explores the complex interaction of factors influencing open data sharing and use in astrophysics, a single scientific discipline where data is already extensively shared and re-used. Her case study demonstrates that benefits associated with data sharing involve intrinsic motivations, such as more reproducible science and faster rates of scientific advancement. Zuiderwijk hopes that insights gained from her study of astrophysics can be transferred to disciplines where data sharing is less common.

She discusses the motivations of other actors. “Whilst governmental organisations are typically motivated to share data for public value, it’s more difficult for corporate organisations to do so.” She continues, “Commercial enterprises are often restricted by proprietary interests, trade secrets, and concerns over legal ramifications of privacy and security breaches.” Zuiderwijk is ambitious to help overcome the barriers of data sharing and use within the private sector and build trust through the creation of mechanisms for sharing open business data. This is an objective she wishes to explore in the near future.

3. Where can data be shared?

Data is typically shared via online platforms. Zuiderwijk uses the 4TU.ResearchData repository to openly publish her data. She believes that one of her most important contributions as a Data Champion is motivating others to follow her practice. “I actively encourage my students to publish their data on the 4TU repository.” She adds, “Working transparently gives them more credibility as early career researchers. If a reviewer can see that their underlying data is openly accessible it instils confidence that the findings presented within their manuscript are robust and reliable.”

4. When should they prepare to share data?

As soon as Zuiderwijk’s students begin their research, she educates them on the principles of Open Science and proper research data management. She ensures that they each receive relevant training to openly publish their data upon completion of their research project. “From the point of data collection, students are trained on how to cleanse and curate their data so that it’s understandable to others when they openly share it. Formatting data correctly saves time and effort when it comes to uploading data to the repository.”

Zuiderwijk coordinates with Faculty Data Steward, Nicolas Dintzner, to help students create their data management plans prior to conducting research. Wherever necessary, she also mandates that students apply to the research ethics committee so that they understand the regulations and requirements of collecting and sharing confidential data.

“Reluctance to share confidential data often arises because researchers believe that data anonymisation costs extra time and effort.” She argues against this belief, using her own research methodology as an example. “When I systematically analyse personal data, such as interview transcripts, I assign codes to the data to describe the content. This means the codebook, an Excel spreadsheet containing analysed data, is already anonymised. With no extra labour, the coded version can be openly published on the repository to benefit the wider community.”
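The coding approach Zuiderwijk describes can be illustrated with a minimal sketch. It is not her actual workflow: the segments, participant labels and code names below are all hypothetical, and a plain Python counter stands in for the Excel codebook. The idea is simply that the published artefact keeps the assigned codes and drops the names and verbatim quotes.

```python
from collections import Counter

# Hypothetical interview segments: (participant, verbatim quote, assigned code).
# None of these values come from a real study.
segments = [
    ("Participant A", "I never learned how to use the portal.", "lack_of_skills"),
    ("Participant B", "Sharing got my paper more citations.", "visibility"),
    ("Participant A", "My supervisor expects me to share.", "institutional_pressure"),
    ("Participant C", "I don't know how to document my data.", "lack_of_skills"),
]


def build_codebook(segments):
    """Count how often each code occurs, leaving out names and quotes."""
    return Counter(code for _participant, _quote, code in segments)


codebook = build_codebook(segments)

# Only the codes and their frequencies would be shared.
for code, count in sorted(codebook.items()):
    print(f"{code}: {count}")
```

Whether such a codebook is anonymous enough in practice depends on how specific the codes are, which is a judgement for the researcher and the data steward.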

5. Why consider data sharing?

Despite important considerations for confidentiality, there’s no doubt that the arguments for sharing data are powerful. Individuals have an opportunity to advance their scientific knowledge, and improve their visibility and credibility, which in turn raises their professional profile. They have more scope to build international, multidisciplinary connections and foster a collaborative community wherein resources can be shared for social and economic gain.

Whilst the benefits are obvious, Zuiderwijk admits that the culture change towards open data is slow. “It’s a long-term transition. It took me several years to appreciate why I should share my data,” she says. “I’m fortunate that in my field of research I get to experience the benefits of data sharing and re-use. Unfortunately, actors in other disciplines rarely get to witness the advantages.”

Passionate about educating others about the importance of data sharing, Zuiderwijk has instructed the ProfEd course on Open Data Governance: From Policy to Use and the Massive Open Online Course (MOOC), Open Science: Sharing Your Research with the World, together with TU Delft Library instructors, Michiel de Jong and Nicole Will. Alongside the leader of TU Delft’s Knowledge Centre Open Data (Kenniscentrum Open Data), Bastiaan van Loenen, she will lecture during the upcoming MOOC on Open Government organised by Marijn Janssen, which takes place in September this year.

Starting small and thinking big: A Quantum Tinkerer’s quest to mentor Open Science

We hear from Data Champion and Associate Professor in Quantum Nanoscience, Anton Akhmerov, about his journey towards becoming an ambassador for Open Science at TU Delft.

Written and illustrated by Connie Clare

An individual who volunteers their discipline-specific expertise, promotes FAIR data principles and advocates proper research data management.

An individual who uses their passion for knowledge exchange and their desire to build a collaborative and researcher-led community to drive the uptake of Open Science within their faculties and departments.

By definition, Anton Akhmerov leads by example as an outstanding TU Delft Data Champion.

Inspired by open-source

Akhmerov is a theoretical physicist at the Kavli Institute of Nanoscience and QuTech at TU Delft. His expertise in quantum mechanics and nano-superconductors involves numerical simulations to explore complex phenomena, such as topological quantum computation. His research uses various open-source software, including SciPy (a Python-based software for STEM subjects) and Jupyter (a project that exists to develop open-source software, standards and services for interactive computing). 

Akhmerov explains why open-source software is integral to his work. “Many of my ideas aren’t radically new; I see something good and I want to make it even more awesome.” He adds, “Sharing code increases the impact of such work, and makes it possible to work on topics as a community.” To this end, we learned about Akhmerov’s contribution to Jupyter whereby he created a library for incorporating code output in documentation using Jupyter-Sphinx together with collaborators. His example demonstrates how incremental improvements can take place through open-source. He also feels compelled to make his research open for altruistic reasons. “Since my software development is achieved within the open-source domain, it seems natural to share my code so that others can use it.”

Starting small

Starting small with his ambitions, Akhmerov began by perpetuating his culture of sharing amongst colleagues of the Quantum Tinkerer research group at TU Delft. After back-and-forth discussion about how they could make their research more open, it was agreed that the group’s work in progress would be shared internally on their Kwant GitLab server whilst finalised code would be published openly on Zenodo.

Anton Akhmerov and his colleagues of the Quantum Tinkerer research group at TU Delft

Thinking big 

In collaboration with fellow Data Champion, Gary Steele, Akhmerov worked to develop and implement an Open Data Policy to engender a culture change across the entire Quantum Nanoscience Department. The policy provides guidelines on preparing and openly publishing data for the department but is to be adopted across the Faculty of Applied Sciences in the hope that it will eventually become the norm at TU Delft.

He also actively engages with the wider scientific community to stimulate a discourse about making research data and software open. His talk, ‘Time to share is now’, was delivered at the Data Champions kick-off meeting in December 2018 to emphasise the importance of software in contemporary research. He was also an author of the presentation, ‘Making Research Software a First-Class Citizen in Research’, that was delivered at a meeting with The Netherlands Organisation for Scientific Research (NWO) in March. The presentation advances the idea that if we are to produce transparent, reliable and reproducible research in the name of Open Science, then publications, data and software must be treated on equal footing at the policy level.

The Spirit of Secrecy

Encouraging researchers to make their software open isn’t easy. “Many researchers see sharing code as a radical idea and may object for several reasons: they may feel that their code is of no value to others or that it takes too much time to clean up and make readable. Alternatively, they may have dedicated a lot of time to develop a piece of code that gives them a competitive advantage and, therefore, feel reluctant to share it,” says Akhmerov.

Whilst he understands such objections, his personal perspective is somewhat different. “I don’t particularly appreciate this spirit of secrecy and I hope the prominence of this viewpoint will eventually diminish.” He claims, “Researchers who want to publish their papers but refuse to publish their code aren’t helping to solve the problem of reproducibility.” 

Reviewers, rethink! Do something for Open Science that requires zero effort.

Akhmerov realised that he could do something to help solve the problem with no extra effort. As a reviewer, he began requesting all of the underlying data and code for each paper that he reviews. “It’s the responsibility of the reviewer to ensure that the publication is based on sound results. By examining the data and code used in the paper, one can verify whether or not the findings are indeed true.” His article, ‘I need your data, your code and your DOI’, relays the message that reviewers are in a position of power and can mentor Open Science by ensuring quality standards are met. 

He is a member of the editorial board of the New Journal of Physics and, more recently, SciPost, an online, open access, community-run journal managed by scientists. Akhmerov highlights the benefits of SciPost’s guiding principles and is in favour of making the peer-review process more visible to the scientific community. “SciPost assigns each manuscript review a DOI, meaning that the review process can be witnessed and is no longer confidential. This leads to a stricter peer-review process as reviewers are held accountable. What’s more, reviewers can be credited for their time and effort invested in evaluating submissions.” Using publication portals where public-funded science is freely and openly accessible to anyone is an effective way to advance Open Science worldwide.

Open Education

Akhmerov leads by example as a mentor for Open Science. He teaches the undergraduate Solid State Physics course at TU Delft and has created publicly open online lecture notes with simulations. The source code is openly available so that students can use and modify the lecture material. He explains how this aids the learning process: “Students are encouraged to engage with the lecture material. Since they can access the code, they can correct it, develop it and can even design their own lectures.” Akhmerov’s lectures are a great example of self-directed, researcher-led learning. Moreover, making lecture material open means that it can be used by many course lecturers simultaneously, making teaching more efficient. To this effect, he also conducted a Massive Open Online Course (MOOC) to educate thousands of students about topology in quantum mechanics.

In collaboration with software developer, Joseph Weston, Akhmerov developed ‘Zesje’, a web app where written exam manuscripts are scanned and systematically graded on a question-by-question basis. The app, named after the Dutch term ‘zesjecultuur’ which means ‘Grade C-’, was devised to streamline the grading process by assessing exam manuscripts electronically, rather than on paper. Akhmerov came up with the idea when faced with the daunting task of marking around 300 physics undergraduate exam papers. “Whilst an examiner may have a predefined idea of how they want to grade exam answers, it’s extremely difficult to ensure consistency when assessing a large cohort of students on paper.” He adds, “Zesje saves time and effort. An examiner can mark more consistently and the grading can be distributed throughout the entire course team.” Akhmerov’s enthusiasm for low effort exam grading has spread across the university. Now, around 20 courses use Zesje, including those within the Computer Science department that comprises approximately 900 students. Akhmerov was awarded the 2018 Delft Educational Fellowship for his innovation.  

Dedicated to advocating good practice in open-source software development, Akhmerov organised a one-week course, inspired by Software Carpentry, to teach basic programming skills to PhD students of the Casimir Research School.

Future ideas 

As a member of the Young Academy, he is part of a dynamic team of scientists who share a broad interest in science practice, policy and communication. His group project within the academy aims to reduce the carbon footprint of academic travel through the organisation of virtual conferences. “Aside from reducing carbon footprint, online conferences have several advantages over offline conferences,” says Akhmerov. “They reach a broader audience and are more inclusive since they accommodate individuals who are limited by funding or prohibitive travel logistics. Online conferences are also less administrative, easier to organise and cheaper to host.” We look forward to hearing how his idea to run virtual conferences at TU Delft progresses.

We are truly inspired by the positive energy and enthusiasm Akhmerov brings to TU Delft as a Data Champion and mentor for Open Science. Learn more about his latest ideas and perspectives by following his blog and Twitter accounts, @AkhmerovAnton and @QuantumTinkerer, to keep up to date!

Empowering people with data.

Data Champion from the Faculty of Industrial Design and Engineering, Natalia Romero Herrera, explores innovative ways of giving data back to the user for social good.

Written and illustrated by Connie Clare

Chilean computer scientist, Natalia Romero Herrera, has worked in the field of human-computer interaction for almost 20 years. Her research within the Design Conceptualisation and Communication section at TU Delft focuses on the development and application of user-centred design methods to understand people’s daily life practices and, in particular, the impact of environmental and health issues.

As an active member of the idStudioLab, Romero Herrera uses living labs to contextualise complex behavioural data. In a living lab, innovative technological applications and tools are designed, and tested by participants to determine how people engage with specific products and services. Romero Herrera’s technological designs are ‘experience-centred’, meaning that they aim to improve user experience as a whole. Therefore, a key element of her research is to learn about the needs and values of users; to engage, empathise and empower them through data.

A ‘give and take’ perspective on data

“Technology has always used data to direct people towards a desired goal,” explains Romero Herrera. “For example, environmental technologies use data to encourage people to reduce their energy consumption at home. Likewise, healthcare technologies use data to encourage people to walk 10,000 steps a day because it’s good for their health.”

Whilst data plays a central role to drive change in societal behaviour, Romero Herrera has different ideas for its use. “Rather than simply taking data from people, we can give data back to people to influence their actions.” She enlightens us on her mission to empower citizens and communities by giving them data. “My research aims to grant people access to the information they need to make autonomous decisions and change their own behaviour.”

To this effect, we discussed Romero Herrera’s current international projects that give data back to people for personal and wider social benefit.  

Getting students involved with ‘ENERGE’

The EU has set itself the goal of reducing greenhouse gas (GHG) emissions to zero by 2050. ENERGE (ENergizing Education to Reduce Greenhouse Gas Emissions) is a project funded by InterregNWE that helps to achieve this goal by facilitating the implementation of low carbon, energy and climate protection strategies to reduce GHG emissions in North-Western Europe.

The project focuses on secondary schools, which are typically housed in pre-war buildings and are, therefore, energy inefficient. “If such schools are to meet the zero emissions target, a significant financial investment will be required to renovate their infrastructure.” Romero Herrera continues, “As these renovations will take a long time to plan, there is a demand for immediate low-cost solutions that enable long-term resource efficiency and reduced GHG emissions.”

ENERGE aims to achieve a 15% reduction in total energy consumption within 12 demonstration site schools in the UK, Ireland, France, Germany, Luxembourg and the Netherlands over a four-year period. Here’s how…

Quantity and quality with mixed methods

“We’re collaborating with research institutes and industry partners to design mixed method tools to capture quantitative and qualitative data from schools. This will involve targeted physical interventions, such as a web-based platform, energy meters and building sensors that measure temperature, humidity, carbon dioxide, sound and light.” Romero Herrera has used sensors to collect indoor climate data from buildings in past research projects.

Objective (quantitative) data collected from sensors will be combined with subjective (qualitative) data collected from behavioural studies involving students. Using similar tools to those developed for her previous SusLabNWE project, Romero Herrera can learn more about the comfort of the classroom by asking students how they are feeling. For instance, students can report their thermal comfort using a dial. A digital diary presents the data from the dial in a daily timeline and invites users to indicate what they did to manage their comfort… Did they have to open a window? Or, turn on the heater?

Mixed method tools used in Romero Herrera’s research to collect integrated indoor climate data.
(SusLabNWE and Building Occupant Certificate System (BOCS) Climate-KIC)
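The way sensor measurements and students’ self-reports come together in a daily timeline can be sketched as follows. This is only an illustration under stated assumptions, not the SusLabNWE or ENERGE tooling: the readings, the reports, the 30-minute matching window and the function names are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical objective data: (timestamp, classroom temperature in °C).
sensor_readings = [
    (datetime(2020, 1, 15, 9, 0), 18.5),
    (datetime(2020, 1, 15, 10, 0), 21.0),
    (datetime(2020, 1, 15, 11, 0), 24.5),
]

# Hypothetical subjective data: (timestamp, reported comfort, action taken).
comfort_reports = [
    (datetime(2020, 1, 15, 9, 10), "too cold", "turned on the heater"),
    (datetime(2020, 1, 15, 11, 5), "too warm", "opened a window"),
]


def nearest_reading(when, readings, max_gap=timedelta(minutes=30)):
    """Return the reading closest in time to `when`, or None if too far away."""
    closest = min(readings, key=lambda reading: abs(reading[0] - when))
    return closest if abs(closest[0] - when) <= max_gap else None


def build_timeline(reports, readings):
    """Attach the nearest temperature reading to each comfort report."""
    timeline = []
    for when, feeling, action in sorted(reports):
        match = nearest_reading(when, readings)
        timeline.append({
            "time": when,
            "feeling": feeling,
            "action": action,
            "temperature_c": match[1] if match else None,
        })
    return timeline


for entry in build_timeline(comfort_reports, sensor_readings):
    print(entry["time"].strftime("%H:%M"), entry["feeling"],
          entry["temperature_c"], entry["action"])
```

Matching on the nearest timestamp is just one simple choice; a real study might instead average readings over the period before each report.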

Education to empower

Ultimately, the data collected from ENERGE is given back to the students, along with various data enabled tools, workshops and a hackathon. Students are educated on the data lifecycle as they gain first-hand experience of data capture, processing, analysis and visualisation.

Romero Herrera adds that students also learn the importance of collaboration. “Taking a holistic and multidisciplinary approach involves many stakeholders within the school ecosystem. Students, teachers and managers collectively engage with data, learn about the impacts of their actions on energy consumption and can experiment to develop new strategies to solve problems together.”

She also highlights an important future impact of ENERGE. “The project raises awareness about environmental sustainability amongst our future generation. It gives students agency to reduce their energy consumption and help to mitigate GHG emissions. These teenagers are our planet’s future decision-makers and it gives them a voice to express their ideas and opinions.”

The long-term impacts of ENERGE will be consolidated by the addition of revised educational material to supplement existing school curricula for secondary school students. Moreover, the project will monitor the effects of its initiatives beyond the school environment. Data captured in staff and student homes will emphasise the importance of sustainable energy efficiency within the domestic environment.

A healthy relationship with data

Food Sampler is another of Romero Herrera’s current projects. Funded by the ZonMw Create Health programme, this project uses mixed methods to monitor food intake in overweight or obese adult patients.  

Romero Herrera describes the problems of existing methods for monitoring food intake. “Paper-based questionnaires are laborious, time-consuming and are often not applicable to real life. Patients are, therefore, reluctant to complete questionnaires.” She advises, “If we are to be successful in changing food consumption habits of overweight individuals, we must develop better reporting techniques that engage participation and extend data collection with contextual aspects of dietary practice.”

Persuasive engagement with patients

Food Sampler integrates objective data from tools and subjective data from patient self-reports to evaluate complex dietary behaviour.

Upon interviewing patients, Romero Herrera found that their main objection to completing questionnaires was not the labour or time involved, but their fear of judgement. Design plays a major role in redefining the qualities of current reporting practices to reduce these negative experiences.

She explains the importance of designing non-intrusive in-situ mixed methods to persuade patients to engage with reporting tools. “Future methods must identify not what individuals are overeating but why they are overeating in order to understand the ecology of food intake.”

In the project’s preliminary stages, Romero Herrera is trialling several prototypes in living labs. “Inspired by the generation of e-health prevention apps, I’m developing ideas for tools that can be used in people’s homes. I want to design a tool with a user-friendly interface that encourages patients to confide. Like a secret diary, patients can report on their mood, emotion and other contextual factors that influence their food intake in a way that is non-judgemental and non-confrontational.”

As with all of Romero Herrera’s research projects, the resulting data will be given back to patients in a way that they can understand and relate to. She hopes that Food Sampler will give patients the relevant knowledge to reflect and take action to improve their personal health and wellbeing.

Publishing personal data

Romero Herrera discusses her ambitions to openly publish her datasets. “My datasets are rich. The data is not only relevant to designers but many other professional fields. I’m interested to see how making my data accessible and referable can benefit other scientific research communities.”

She talks about the challenges of sharing personal data. “Of course, the data I have collected during my research is confidential and extremely sensitive. Therefore, if it is made public, it must be anonymised.” After guidance and support from faculty Data Steward, Jeff Love, Romero Herrera has investigated ways to anonymise her data by clustering patients into categories so that individuals cannot be identified. She is now confident to deposit a coded version of her Food Sampler dataset on the 4TU.ResearchData repository.
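Grouping patients into categories so that no individual stands out can be illustrated with a small sketch. It does not reproduce Romero Herrera’s actual procedure: the records are invented, and the ten-year age bands and the BMI cut-offs of 25 and 30 are assumptions chosen for the example. The check at the end reports the size of the smallest group, since a category containing a single patient would still point to one person.

```python
from collections import Counter

# Hypothetical patient records; none of these values come from a real dataset.
patients = [
    {"age": 34, "bmi": 27.5},
    {"age": 38, "bmi": 28.9},
    {"age": 52, "bmi": 31.2},
    {"age": 57, "bmi": 33.0},
    {"age": 36, "bmi": 26.1},
    {"age": 55, "bmi": 34.8},
]


def age_band(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def bmi_category(bmi):
    """Replace an exact BMI with a coarse category (assumed cut-offs: 25, 30)."""
    if bmi >= 30:
        return "obese"
    if bmi >= 25:
        return "overweight"
    return "other"


def generalise(patient):
    """Map one record to the category that would be published."""
    return (age_band(patient["age"]), bmi_category(patient["bmi"]))


def smallest_group(patients):
    """Size of the least-populated category after generalisation."""
    groups = Counter(generalise(p) for p in patients)
    return min(groups.values())


print(Counter(generalise(p) for p in patients))
print("smallest group size:", smallest_group(patients))
```

If the smallest group turns out too small, wider bands or fewer attributes are the usual remedies, at the cost of detail in the published dataset.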

Back to Chile!

As an editor-in-chief of the open access journal, EAI Endorsed Transactions on Pervasive Health and Technology, Romero Herrera shares her interest in empowering people through data with the wider scientific community. Next year she will chair the first edition of the EAI International Conference on Digitalising Healthcare, ‘DigiCare 2020’, in Santiago (Chile), inviting researchers, designers, developers and policy makers to further explore the role of technological innovation in solving societal health challenges.

Thank you Natalia for showing us a new side to citizen science… “Rather than simply taking data from people, we can give data back to people to influence their actions.”

Carve your niche with ‘The Carpentries’

Written and illustrated by Connie Clare

TU Delft Library met Data Champions from the Department of Biotechnology, Victor Koppejan and Raúl A. Ortiz Merino, to celebrate their Software and Data Carpentry workshop success.

Victor Koppejan is a PhD student in computational Bioprocess Engineering. His doctoral research involves the purification of proteins from biological matrices for industrial use using expanded bed adsorption. Koppejan employs open source computer simulation tools to model the dynamics of fluid and particle flow in a fluidized bed using the Dutch National Super Computer.

Raúl Ortiz Merino is a Postdoctoral researcher in Industrial Microbiology. He works within the field of comparative genomics to characterise microbial gene sequences of commercial relevance. After years spent training as a wet lab biochemist, Ortiz Merino made his transition to dry lab computational science and is now an experienced bioinformatician.

Both Data Champions have made a significant contribution to their local research community by sharing their knowledge and expertise during Software and Data Carpentry workshops.

What are ‘The Carpentries’?

The Carpentries are a non-profit project, formed in January 2018, to teach basic computing skills to researchers worldwide. The aim is to train and foster an active, inclusive and diverse community of learners and instructors that promotes efficient, open and reproducible computational research.

During two-day Carpentry workshops, instructors and helpers share their mission to teach foundational coding and data skills using openly-available lesson material and evidence-based teaching practices. Anyone can register to attend, no matter their skill level, and what’s more, at TU Delft it’s free to join!

Bringing The Carpentries to TU Delft

The 4TU.Centre for Research Data became a Gold Member of The Carpentries to bring instructor training and workshops to TU Delft. Since piloting Software Carpentry for the first time in November 2019, the university has hosted two Software Carpentry workshops. After helping in the first workshop, Koppejan decided to become a certified Carpentries instructor in February and successfully held his own Software Carpentry workshop in March this year! The most recent Software Carpentry workshop took place on 8-9th July.

Ortiz Merino has volunteered his help during all workshops at TU Delft and has conducted his own whilst undertaking his certified instructor training. To build upon the experience he gained during the Software Carpentries, Ortiz Merino was financially supported by The Data Champion travel fund to join the Introduction to Reproducible Genomics: Data Carpentry in Ghent (Belgium). He used this valuable experience to collaborate with fellow Bioinformatician and Data Champion, Marcel van den Broek, and organise the first TU Delft Data Carpentry workshop in June. You can read more about TU Delft’s Data Carpentry workshop here.

Researchers attend TU Delft’s first Genomics Carpentry to ‘shape up’ their data science skills!

Software or Data Carpentry? What’s the difference?

Software Carpentry workshops are designed for researchers who want to learn how to programme more effectively. Typically, three core topics are taught: the Unix shell, version control with Git, and a programming language (Python or R).

Data Carpentry workshops are designed for researchers who are dealing with domain-specific data. The workshops are centred around a single dataset and teach participants project organisation and management, introduction to the command line, data wrangling and processing, and introduction to cloud computing for genomics.

Why invest in The Carpentries?

We heard why both Data Champions elect to use Carpentry workshops as a means of disseminating knowledge amongst a wider audience.

“I became inspired by open online training on high performance computing provided by Argonne National Laboratory (USA),” says Koppejan. “After undertaking online tutorials, I was enthusiastic to share the knowledge I’d gained with my colleagues but didn’t have sufficient time to train them all on a one-to-one basis.” He continues, “I believed that becoming a Carpentries instructor would help me spread the word of good code management amongst a larger research community.”

Ortiz Merino shared similar motivations. His research section comprises 4 principal investigators, 4 postdoctoral researchers, 16 PhD students and 10 technicians, not to mention the constant flux of Master’s and Bachelor’s degree students that can reach as many as 50 individuals. He also uses Carpentry workshops to reach more people. “Most members of my section encounter similar research problems. I thought The Carpentries workshops would make it easier to gather together to answer queries, explain common concepts and learn as a group.”

Carve your niche, be your own bioinformatician

Bench scientist turned computational biologist, Ortiz Merino, understands the challenges of moving from the wet to dry lab environment. “Nowadays, it’s difficult for biologists to avoid computational approaches altogether. Most modern scientists will have to learn computer programming at some point during their career.”

He reflects on his personal experience. “Making the switch is not easy. It took me several years to learn the specialist data science skills required to make my transition from experimental to computational biology. Working as an intermediary between the two spheres, I want to bring the wet and dry lab closer together and I believe Data Carpentry workshops can help me to achieve this.”

The workshops introduce wet lab scientists to computational tools in an approachable way, bridging the gap between generating and analysing data. “Participants receive all of the basic information they need in a structured two-day workshop so that they can start learning how to become their own bioinformatician.” Ortiz Merino assures us that Data Carpentry workshops are the best way to learn.

Sculpt your Soft Skills

Aside from teaching technical skills, The Carpentries teach soft skills that enhance personal and professional development. Koppejan explains how training to become a certified instructor aided in the development of his interpersonal skills. “The instructor training programme taught me how to communicate more effectively and interact harmoniously with workshop participants. I became more conscious of listening and teaching with empathy.”

Koppejan emphasised the importance of creating an inclusive, interactive and collaborative learning environment. He recounted his positive experience of attending the Collaborations Workshop 2019 (CW19), an ‘un-conference’ that brought multidisciplinary personnel together at Loughborough University (UK) to explore best practices and the future of research software in a relaxed, social setting. The rules of an ‘un-conference’ are simple:

#1. Whoever shows up are the right people.
#2. Whatever happens is fine.
#3. Whenever it starts is the right time.
#4. It’s over when it’s over. 

And, not forgetting ‘The Pac-Man Rule’: When standing in a social circle, always leave enough space to encourage new people to join the group conversation!

We wish our Data Champions good luck in their bright future!

Koppejan’s personal attributes, and the transferable skills he has developed during his time as a Data Champion and Carpentries instructor at TU Delft, have led to exciting career prospects. It’s pleasing to hear that after expressing his interest in Open and FAIR data in a recent job interview, he has secured a position as a Data Scientist at DSM and will start his new role in October!

We send our grateful thanks and best wishes to Victor Koppejan as he enters the next chapter of his career, and also to Raúl Ortiz Merino as he becomes a certified Carpentries instructor at TU Delft.

Book sprint success: A team writing exercise for the win.

Written and illustrated by Connie Clare

Following in the footsteps of the Open Science Training Handbook, we share our book sprint success story and some ideas to help with your collaborative writing.

On 10th July our multidisciplinary team of dedicated volunteers checked in at a hotel in The Hague, Netherlands, to participate in a three-day book sprint. For those unfamiliar with the concept, a ‘book sprint’ is an exercise of writing a book collaboratively in a short period of time, usually less than a week. Yet, together we learned that a book sprint is much more than a mere writing exercise; for us it was a truly rewarding and memorable experience that we can all reflect on with pride. Here, we share our book sprint success story and the reasons why we advocate collaborative writing.

Book sprint motivation

We wrote the open book, ‘Engaging researchers with research data: The Cookbook’, to inform and inspire members of the wider research community who are interested in good research data management (RDM) practice. As an RDA project, under the umbrella of the Libraries for Research Interest Group, the book presents a variety of case studies that demonstrate innovative approaches taken by international institutions to effectively engage researchers with RDM. As the title implies, the book is analogous to a cookbook in the sense that each case study is presented in a similar format to that of a recipe; each comprises a list of ‘key ingredients’, i.e. the essential constituents required to successfully implement the initiative. We wanted to create a resource that can be used by institutions to select suitable initiatives that promise to drive cultural change towards better RDM.

Planning, preparation and positivity

Undoubtedly, drafting a book in three days is an ambitious task that requires thorough planning and preparation. Our book sprint preparation began in January 2019 when the survey ‘Research Engagement with Data Management – What works?’ was circulated among 60 funding organisations, 80 scientific institutions and 28 mailing lists to invite case study contributors to share their stories. From 90 complete survey responses, the most interesting case studies were shortlisted and following successive rounds of selection, 24 case studies were finally chosen for publication based on the novelty and innovation criteria agreed by the book sprint team.

The selected case studies were divided between six authors from different European institutions: Connie Clare (TU Delft), Elli Papadopoulou (Athena Research Center), Marta Teperek (TU Delft), Yan Wang (TU Delft), Iza Witkowska (Utrecht University) and Joanne Yeomans (Leiden University). In advance of the book sprint, authors conducted hour-long interviews with their respective contributors to collect all of the content required to write each case study. After collating interview transcripts, photographs, quotes and various other supplementary material, we each arrived at the book sprint equipped with all of the essentials to co-write the book, including a positive mindset!

On your marks, get set, go!

The book sprint began with a warm welcome from facilitators, Marta Teperek (author) and Maria Cruz (editor), to introduce the aims, objectives and overarching vision for the event. Marta explains that “it’s important that everyone involved in the book sprint understands the intentions and expectations of the exercise in order to work efficiently towards the common goal.” Maria shared some handy writing ‘tips and tricks’ with us to stimulate ideas and boost our creative thinking as we embarked on our first book sprint together. The room filled with anticipation as we grew eager to start writing. 

Sticking to schedule 

In order to assemble the wealth of information and write a book under strict time constraints, it’s important to maximise content creation from the very beginning of the book sprint. Adhering to a daily itinerary helped authors to plan their writing goals, gather momentum and focus on writing for prolonged time periods. With time allocated for writing blocks, group discussion and scheduled breaks, each day was varied to retain interest and enthusiasm amongst the team.

Freeing your creative mind

Whilst the itinerary offered structure, our facilitators ensured that it was flexible, permitting team members to take breaks whenever they needed to recharge. “It’s important that individuals look after themselves during the book sprint,” says Maria. “If anyone needs a change of scenery or to go for a walk in the fresh air they should have the freedom to do so.”

In this regard, the hotel provided the perfect setting to free our creative minds. Most of the time, authors could be found working together in the cosy, shared conference room with an unlimited supply of coffee, chocolates and pastries. Occasionally, it was nice to wander the spacious grounds in search of a quiet location to write alone, or to retire to the comfort of your relaxing hotel room to eliminate distractions.

The book sprint team hard at work in the conference room.
Note: The green and red post-it notes!
A red post-it note on your laptop means ‘Engaged – Don’t bother me!’
A green post-it note on your laptop means ‘Vacant – Open for conversation!’

The entire team would convene in the conference room before lunch and at the end of each day to assess progress. This offered an opportunity to collectively reflect on the day’s achievements and adjust goals accordingly in preparation for the following day. During the evening, our remote editor, James Savage (University of Cambridge), logged in to the book sprint from the UK to edit case study drafts. This perpetuated a state of creative flow as authors felt encouraged to complete their drafts before the end of each day so that James could edit them overnight. Case studies could then be further improved the next day in response to his editorial comments and suggestions.

Team writing for the win 

Our collaborative efforts were facilitated by Google Docs, which worked for us as the online authoring tool. With options to highlight, edit and leave comments within the text, all team members were able to work and communicate simultaneously within the same document. Being able to work transparently built confidence among team members; it was inspiring to read other authors’ case studies whilst writing your own. What’s more, since each author brought their own unique style of writing to the book sprint, we could all learn valuable lessons from one another.

Interactive activities also helped us to write as a team. One tactic, ‘Pitch it to your partner’, employed the traditional method of writing our names on slips of paper and drawing them from a hat to be randomly partnered with a fellow author. We then ‘pitched’ our case study to our partner and discussed how we planned to approach writing it up as a story. This was an ideal chance to provide guidance and support for one another. 

Team Spirit 

Our book sprint success clearly demonstrates the power of teamwork. “Writing a book on my own would have taken forever,” admits Marta. “In only three days we’ve produced a solid draft of a book! I’m so proud of the team and will now always advocate collaborative writing. Thanks to all team members for your hard work, feedback and encouragement.”

Yan shares similar sentiments, “I had a joyous few days working on the book! I could never have imagined myself working so efficiently. We inspired each other, worked hard and laughed together.”

It’s true to say that the team established a rapport instantly and we had so much fun. If you don’t believe that three days of writing a book can be fun we have photo evidence to prove it!

Excitement takes hold on Day 2 of the book sprint as the team explore the hotel grounds.

The next steps: Crossing the book sprint finish line 

In the final hour of the book sprint, roles and responsibilities were assigned to team members so that the draft could be finalised and made publicly open for comments as soon as possible.

Final versions of case studies have now been sent to contributors for their approval and the appropriate revisions are underway. Team members are mutually reviewing chapters of the book, and regular communication via Slack is keeping the team up-to-date with progress as the book takes shape.

We hope that our book sprint success story inspires you to consider collaborative writing as a new way of academic thinking. We look forward to taking the next steps to publishing our book and ‘crossing the finish line’ as part of a team!

We need you! Our open book is a work in progress as the team wanted to make it available for your comments as soon as possible. We welcome you to provide feedback on the document in order to help us improve the first draft. Thank you!

We would also like to express our thanks to TU Delft Library, and Alastair Dunning in particular, for sponsoring the book sprint.

How to move from FAIR principles to FAIR practice?


This post is written by: Marta Teperek, with contributions by Neil Chue Hong, Stefano Cozzini, Marta Hoffman Sommer, Rob Hooft (Chair of the FAIR-practice team), Liisi Lembinen, Juuso Marttila, and it was originally published on the EOSC Secretariat blog.

Let us know how you are implementing the FAIR principles in practice by filling in a brief survey

On the 4th of July 2019, we had a kick-off meeting in Brussels of the FAIR Working Group of the European Open Science Cloud (EOSC) governance. Members of this group have been nominated by the EOSC Governance Board and Executive Board. The aim of the group is to provide recommendations on the implementation of FAIR (Findable, Accessible, Interoperable, Reusable) practices within the EOSC, largely inspired by the action plan outlined in the report Turning FAIR into reality. Given that the FAIR Working Group consists of almost 30 members, we split into 4 teams to enable efficient and effective working: PID Policy, FAIR Practice, Interoperability and Metrics & Certification.

We, the authors of this blog post, are the FAIR Practice team. The key objective of our team is to understand what the current practices are in different (research) communities and what their levels of FAIRness are. After the FAIR principles were published, they rapidly gained a lot of traction and interest, including among the advocates of good data management practices and open data. National and international funding bodies ask researchers to make all their data FAIR as one of their funding conditions. Now, FAIR is to be at the core of the EOSC. Barend Mons even remarked on Twitter that attitudes to FAIR have changed in the last four years – it started to be embarrassing to admit in public that one hadn’t heard of FAIR.


FAIR principles – reality check

Many communities, however, still seem to be far from putting FAIR into their daily practices. The 2018 State of Open Data Report found that just 15% of researchers were “familiar with FAIR principles”. Unsurprisingly, out of the 4 classes of FAIR principles, Interoperability and Reusability were the least understood by the respondents. On 26 June 2019, Marta Teperek attended the Carpentry Connect Conference in Manchester and asked the attendees (around 80 people) if they had heard of the FAIR principles. Almost all of them replied positively. However, when Marta asked the follow-up question: “Who would feel comfortable explaining what FAIR data really means in practice?”, only 4 out of 80 people replied “yes”. This is quite revealing given that the participants of the Carpentry Connect conference are typically very well aware of interoperability and reusability issues. Similar reflections were made by Maria Cruz on 19 June 2019 at the OAI 11 – The CERN-UNIGE Workshop on Innovations in Scholarly Communication.

What are the community practices?

So how to bridge that gap? That’s exactly what the FAIR practice team will be investigating and making recommendations on to the European Commission. To develop these recommendations we first need to understand the current community practices. This will allow us to identify both the best practices, which might serve as a source of inspiration for others, as well as barriers preventing communities from implementing FAIR practices. Understanding the barriers will help us to make recommendations to overcome those challenges. The awareness of the current practices and the ability to make realistic expectations is also essential for two other teams of our WG: Interoperability and Metrics & Certification. These teams need to ensure that the recommendations they propose are fit for purpose for the diverse communities they are to serve.

How are we going to do that?

So how are we going to do that? The plan for the group is not to reinvent the wheel, but to instead identify and flag up existing valuable resources which investigate practices in various disciplines (such as the State of Open Data Report 2018, FAIR Data case studies in Engineering, FAIR Data Advanced Use Cases, the FAIR in practice report by Jisc, and the FAIR Implementation Matrix), and also to liaise with other projects, such as FAIRsFAIR, which are already investigating these practices. This will allow us to gather a body of knowledge and evidence, based on which recommendations will be made.

How can you get involved?

In order to better understand the community practices, we would be delighted to hear from you. You can get involved in numerous ways:

Questions? Comments?

If you have any additional questions or comments, don’t hesitate to get in touch with us at any stage by emailing

For more info, read the blog post from the inaugural meeting.