On 25-29 September 2017 I participated in the “Open Science in Practice” summer school in Lausanne, Switzerland, which was organised by the EPFL. I gave two talks on research data management: one to PhD students who attended the summer school, and another to the wider community of people interested in Open Science. For the rest of the event I was learning together with the participants about exciting developments in Open Science, while also observing the event in the hope of bringing it to TU Delft in the future.
The summer school was intense and split into five days focusing on different themes:
- Philosophy and history of Open Science (Day 1)
- Novel ways of publishing (Day 2)
- Research data management and sharing (Day 3)
- Code and tools for software management (Day 4)
- Examples and best practice from researchers working openly (Day 5)
Each day started with a series of morning talks from guest speakers, who set the scene for the different topics and presented the most recent developments on the subjects. The afternoon consisted of hands-on workshops, where attendees could put their learning into practice.
Below are some highlights of the event and a couple of my personal reflections.
Philosophy and history of Open Science
Luc Henry, the Scientific Advisor to the EPFL’s President and the main driving force behind the event, gave a very inspiring opening speech. He provided the rationale for this event and stressed that “Today’s innovation is tomorrow’s norm”. Attendees who came to the event and who pioneered Open Science now would become the frontrunners of the future.
Open Science might be older than it seems
After this motivating (but still futuristic!) vision, we learned about the history of Open Science from Benedikt Fecher. Benedikt made an important reference to Mertonian Norms of Democratic Science – 4 principles announced in 1942 by Robert K. Merton, which were very similar to ideas postulated nowadays by Open Science movements:
- Disinterestedness – researchers should not be striving for their own good
- Communalism – all outputs and results belong to the community, and not to individuals
- Organised scepticism – community criticism, ability to scrutinise results
- Universalism – results are communicated to everyone
A sad reflection was that today, 75 years later, these principles were still far from being implemented in practice.
Benedikt’s reflections were reinforced by the subsequent speaker, Arnaud Vaganay, who discussed the consequences of not adhering to Mertonian Norms in science. He started by asking the attendees what percentage of scientific research was reproducible. Interestingly, most attendees thought that between 5% and 15% of published research was reproducible, which was consistent with data coming from a report recently published by Nature.
Arnaud then reflected on the reasons for lack of reproducibility and concluded that one of the main reasons was that many researchers misunderstood (if not misused) statistics and the properties of the p-value. Many researchers perceived a p-value lower than 0.05 as their Holy Grail. But, as Arnaud reflected, citing Goodhart’s law, “when a measure becomes a target, it ceases to be a good measure”. This leads to all sorts of reproducibility issues, many of which have significant negative effects on society, for example when results backed up by misused statistics drive policy changes or inform decisions about bringing new drugs to market.
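To make the point about p-values more concrete, here is a minimal sketch of one common form of misuse: running many tests and reporting whichever one falls below 0.05. This is my own illustration rather than an example from Arnaud's talk, and it assumes the tests are independent and that there is no real effect in any of them. The function name is hypothetical.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests
    gives p < alpha when every null hypothesis is actually true."""
    # Each test stays above the threshold with probability (1 - alpha),
    # so all of them stay above it with probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


for num_tests in (1, 5, 20):
    print(num_tests, round(chance_of_false_positive(num_tests), 3))
```

Under these assumptions, a single test has a 5% chance of a spurious "significant" result, but twenty tests have roughly a 64% chance of producing at least one, which is why chasing the 0.05 threshold can undermine it as a measure.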
How to motivate researchers for Open Science?
Arnaud suggested that to tackle the reproducibility crisis researchers needed to skill up to use statistics with greater awareness, and also be more reflective about their projects and experimental design. He suggested that pre-registering studies on platforms such as the Open Science Framework was an excellent way to ensure that analyses and results were reported accurately and in an unbiased way. But how to motivate researchers to do this?
Martin Vetterli, the President of EPFL, quoted Max Planck and said that “science advances one funeral at a time”, and that the pace of one funeral at a time was too slow to support an effective transition to Open Science. Instead, he insisted that rewards systems in science needed to change to acknowledge researchers who practised greater openness. He mentioned that the change to reward systems was discussed with the human resources division at EPFL, with the aim that Open Science criteria would be used both in hiring and in academic promotion decisions. This was in line with the new European Commission’s proposal on rewards in academia, of which TU Delft’s Research Data Services were supportive as well.
The desire for change was also stressed by Laurent Gatto, who gave a brilliant, refreshing talk about the honest early career researcher’s view on open science. He started by reflecting on his own scientific career, saying that being a principal investigator and a white male at the University of Cambridge put him in a very privileged position and that, as such, he had an obligation to stand up for Open Science. He wished that one day we would stop needing to use the term ‘Open Science’ – it should become simply “science”. He finished his talk with a call for greater engagement in openness from every researcher – every person could make a difference and contribute to change. Laurent mentioned various initiatives in which Early Career Researchers could get involved – from ensuring fair and thorough peer-review, all the way through to participating in larger campaigns such as Bullied into Bad Science.
The power of pre-prints
The second day of the summer school focused on exploring new ways of publishing. The day kicked off with an inspiring talk from Jessica Polka, who introduced the participants to pre-prints: what they were, what their goal was and how researchers could use them to accelerate discovery and improve the quality of research overall.
An idea I had not heard about before, and which I liked a lot, was to base journal clubs on pre-prints, similarly to what Prachee Avasthi did in her research group. This meant reviewing unpublished work, which not only accelerated discovery, but also improved the quality of research by sharing feedback with authors before the corresponding papers were published. In addition, if coupled with tools such as prepubmed to search for pre-print content, one could really look for valuable content, and counteract the addiction to the journal’s brand and impact factor.
Stories vs science; sexiness vs substance
Lawrence Rajendran explored in greater detail the idea that the content and quality of research should matter more in science. He started from a sad reflection that due to the flawed reward systems in academia, many researchers worked on the assumption that it was better to be first than to be right. This was, of course, a huge problem for scientific integrity. He suggested that journals which published sexy stories instead of important, but orphan data and results, were one of the main reasons for this.
For this reason, Lawrence Rajendran co-founded Science Matters – a journal which published single observations, as long as the methodology was sound. Single observations could then be merged into larger stories as new observations were published. In addition, to combat peer-review bias, Science Matters supported a triple blind peer-review process, where even the handling editor did not know the identity of the authors.
Reproducible, Replicable, Robust and Generalisable
But what does it mean to actually conduct reproducible experiments and use sound methodology to conduct research? Kirstie Whitaker started explaining the issue by defining several terms:
- Reproducible: getting the same result using the same data and the same code (this is usually considered as the minimum requirement for scientific findings)
- Replicable: getting the same result using different data, but the same code
- Robust: ending up with the same outcome by using the same data, but different code
- Generalisable: ending up with the same results despite using different data and different code – this should be the aim of all research
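The four definitions above form a simple two-by-two grid over "same or different data" and "same or different code". As a minimal sketch (the function name is my own, hypothetical choice, and it assumes the same result was obtained in every case):

```python
def classify_result(same_data: bool, same_code: bool) -> str:
    """Name the property demonstrated when the same result is obtained
    again, following the four definitions listed above."""
    if same_data and same_code:
        return "reproducible"   # same data, same code
    if same_code:
        return "replicable"     # different data, same code
    if same_data:
        return "robust"         # same data, different code
    return "generalisable"      # different data, different code


print(classify_result(same_data=True, same_code=True))
print(classify_result(same_data=False, same_code=False))
```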
Kirstie acknowledged that writing a reproducible paper took more time than writing papers which were not reproducible. However, the time it takes to write reproducibly decreases with practice. She suggested that every researcher should start by making small steps towards improving their day to day research workflows in order to make their work reproducible and easily shareable. Some of her tips included publishing protocols, using dedicated software to support working with code and effective commenting, such as Jupyter Notebooks, and managing the code itself in tools which support version control, such as GitHub.
Tailoring the message
I personally loved all three talks in this session. But I wonder whether the attendees, who were mostly PhD students who came to learn about Open Science, would not have benefitted from getting to know first about the current publishing systems and their flaws. Otherwise, I would imagine that it might have been quite difficult for them to fully embrace important messages in Jessica Polka’s talk about the benefits of using pre-prints, or her comments about prepubmed selectively excluding pre-prints from BioRN.
Something worth keeping in mind for future summer schools!
Good data management and sharing is a prerequisite for Open Science
The third day of the summer school focused on good data management, as a necessary prerequisite for Open Science.
It started with a talk by Sunje Dallmeier-Tiessen. Sunje emphasised that too often the discussion about data management centers around Open Data and data sharing, whereas the focus should be data accessibility. There might be many valid reasons, such as commercial constraints or privacy concerns, which would not allow for public sharing of research data. On the other hand, datasets should always be accessible, in accordance with FAIR principles, so the emphasis should be on the “R” words, such as reproducibility, reusability, replicability, repurposability. Sunje reflected on the fact that current policies and mandates for data sharing allowed for irreproducible research to be published, as long as supporting research data were shared (= tick box checked).
The licensing conundrum
Sunje also spoke about licensing research data and postulated that because the research was mostly publicly funded, results belonged to the public – researchers should share their data publicly under a CC0 licence. In fact, all CERN’s datasets were published under CC0 licence. CC0 licence means that anyone can re-use, modify, repurpose, and basically do everything they want with research data, without the need for attributing the original author.
This sparked an interesting discussion with the attendees and other speakers, who thought that for many researchers the attribution requirement, but sometimes also the fact that data won’t be used for commercial purposes, were important and thus a CC0 licence was not always an acceptable choice.
My personal view on this is that we should always listen to what researchers have to say and that they might have valid reasons for selecting licences more restrictive than CC0. My feeling is that people who support researchers in their data management and sharing should encourage best behaviours, but should also respect all the little steps that researchers take towards greater openness and sharing. Sometimes being too prescriptive and too demanding might be discouraging for people who otherwise could be nudged in the right direction.
Selfish benefits of good data management and sharing
My talk was also on the third day and it focused on the selfish benefits of data management and sharing. The rationale behind my talk was that policies and mandates alone were unlikely to lead to more reproducible research. I previously worked at the University of Cambridge, which, similarly to many other UK institutions, received research funding from the Engineering and Physical Sciences Research Council (EPSRC). EPSRC had a strict policy on data sharing, saying that all papers resulting from their funding needed to have a data sharing statement and that research data had to be available. The policy resulted in a big spike in research data shared in data repositories, but many of these datasets were of poor quality and were not reusable.
If data policies and mandates are not accompanied by proper training and advocacy, they might lead to tick box checking activities, and not necessarily to real improvements of behaviours. Therefore, my talk focused on the selfish reasons which would motivate researchers to improve their day to day data management practice and make them consider sharing of their research data due to internal incentives. My presentation is publicly available, so you can check out the arguments yourself.
Make a real difference with Open Hardware
The final talk of the session went beyond research data. Lucia Prieto spoke about her work for TReND in Africa and about the real-life importance of open hardware. She explained that while trying to teach researchers in Africa scientific methods, the barrier was often the expensive hardware. She gave a shocking example of a simple plastic comb used by researchers to cast gels to separate molecules of different size. These combs were sold for over $70 per comb! This prompted her and her colleagues to turn to DIY and to construct their own tools and equipment with the use of 3D printing and open hardware projects. Lucia provided several examples of extremely successful open hardware projects and how they could provide viable alternatives to their very expensive market analogues, such as Open PCR, qualitative PCR machines or a microscope.
Lucia finished her talk by referring to the story of the Boy Who Harnessed The Wind – a Malawian boy who built his own windmill in order to provide irrigation to his home village and to end famine. A wonderful inspiration and a great example stressing the real life importance of knowledge sharing.
Research will be impossible without openness and transparency
Software management and sharing is an integral part of Open Science, and the fourth day of the summer school was devoted to this topic. Victoria Stodden started her talk by saying that virtually all of today’s discoveries had a computational component. Unfortunately, there was a mismatch between the traditional research communication and the desire to present everything in a pdf of a paper, and the needs of computational research, which led to reproducibility concerns. However, Victoria was positive about the future. She believed that change would happen. She thought it would not be driven by policies or ethical thinking, but because computational models and the connections between data and code would be so deep and complex, that there would not be any other possibility for doing research other than to be transparent from the very start of the research process.
Paper should become simply the advertisement for research
Victoria hoped that soon the traditional pdfs of papers would become merely advertisements for reproducible research, which would be available and documented elsewhere. Gaël Varoquaux further explored the complex issue of computational reproducibility and started by saying that it was not trivial: it was not simple to make one’s computations reproducible, and lots of effort was needed. He then offered the participants a series of tips, and my favourite one was to try making the code understandable. Gaël quoted Martin Fowler: “Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” One example was to use meaningful variable names instead of “a”, “b”, “c” and “d”. Another one was to simply ask people for feedback, which would result in an improvement to coding practices. He said that in his research community, researchers would read each other’s code and help improve it. I thought this was an excellent step not only to improve computational reproducibility but also to facilitate new collaborations.
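The advice about meaningful variable names is easy to illustrate. The snippet below is a hypothetical example of my own, not one shown in the talk: both functions compute the same dilution (C1 × V1 = C2 × V2), but only the second tells a reader what is going on. The function and parameter names are assumptions made for illustration.

```python
# Hard to follow: the reader has to guess what a, b, c and d stand for.
def f(a, b, c):
    d = a * b / c
    return d


# Same calculation, readable without any extra explanation.
def diluted_concentration(stock_concentration_mM, stock_volume_ml, final_volume_ml):
    """Concentration after diluting a stock solution to a final volume,
    using C1 * V1 = C2 * V2."""
    return stock_concentration_mM * stock_volume_ml / final_volume_ml


# 1 ml of a 100 mM stock made up to 10 ml in total.
print(f(100.0, 1.0, 10.0))
print(diluted_concentration(100.0, 1.0, 10.0))
```

Both calls return the same number; the difference is only in how much a colleague (or your future self) has to reconstruct when reading the code.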
Open Science in practice – case studies
The last day of the summer school was my favourite. It consisted of short presentations by researchers from EPFL and around, who tried to implement Open Science practices in their day to day workflows and who shared their experiences. Case studies presented by researchers were excellent and brought a lot of authenticity to the event.
Open data does not lead to scooping
Katrin Beyer mentioned that on top of the ‘usual’ motivations for data sharing, one of her drivers was that she would have to share research data underpinning her findings anyway if someone asked for it. So why not do it right from the start and share by default? She also mentioned that her initial concern, when she decided to start publishing her data openly, was that perhaps as a result she would publish fewer papers (because someone could do additional analyses and ask interesting questions interrogating her datasets before her). I personally hear this argument from researchers too often! Katrin said that the outcome was quite the opposite: not only did she publish more papers, but sharing also led to many new collaborations due to improved visibility of her datasets.
For some disciplines “Open Science” is simply “Science”
Bart Deplancke started his talk by saying that he was surprised to be invited to give a talk because, according to him, he did not do anything special with his research practice. He explained that doing genomics research implied openness and sharing. In genomics, Open Science and Open Data were simply “science” and “data”. Coming from the genomics field myself, I completely agree with this, and I love to use this example when talking to researchers from other disciplines to demonstrate that change is possible and has already taken place in some communities.
Open Source and long-term sustainability of the product
Bart also raised an important point that when it comes to software, open source might not always be the best solution. He mentioned that most of his projects were entirely open source, but that he also created a company called Genohm – a commercial entity, with a flagship product called SLIMS – an Electronic Lab Notebook platform. He argued that commercialising the product made it self-sustainable long-term. In addition, his company was hiring people, paying their salaries and thus contributed to economic growth. He stressed that “open source” was not a philosophy, but simply a feature of the software, and one should think carefully when deciding whether commercial or open source options were better for the use and long-term sustainability of the product. Which I thought were interesting arguments and perhaps worth exploring in a dedicated event on the topic.
Global problems can’t be solved without public sharing
The last talk in the series that I wish to highlight was the talk by Dasaraden Mauree. Dasaraden is a climate change researcher. He started by saying that before coming to EPFL, he studied in Paris, where he often struggled to get access to scientific literature. How come outcomes of research funded by public money are not available to taxpayers? He argued that climate change is a global problem and therefore his research has to be globally and openly available. He reflected that everyone needed to pick their battles; his battle was to publish all his research in fully Open Access journals, which he was winning so far. Dasaraden also highlighted issues with single-blind peer-review systems, which often put early career researchers at a disadvantage. He suggested that the best solution for him was to publish his work with Frontiers – which not only was a fully Open Access journal but also supported open peer-review.
Workshops – putting learning into practice
The summer school not only had excellent speakers, but each day also had great, interactive workshops in the afternoon, which allowed the participants to put their learning into practice. These were delivered by local support services from EPFL and I must admit I was truly impressed by their quality. All workshops were very interactive and effectively engaged the participants. As an example, workshops on data management and sharing are usually quite theoretical and typically have only a few interactive exercises. I was impressed by the clever introduction of Zenodo Sandbox, which allowed participants to experience the workflow of data publication (and understand how easy it was).
Another beautiful example of supposedly boring topics made engaging and interactive was the workshop on copyright. Participants got real publisher contracts to work on, tried to crack the jargon used by lawyers and to understand what the different clauses meant to them in practice. All with Post-it notes and flip chart sheets to support effective visualisation of the conclusions made.
Overall I think that the summer school was a great success and the organisers should be really proud of what they managed to achieve. Excellent talks, interactive workshops and communication between participants supported by the use of Slack for chatting, Authorea for note-taking and Twitter for social media sharing, allowed for effective network building between the participants. It was amazing to observe how in five days a group of people who did not know each other and did not know much about Open Science became fully connected and started asking truly deep and important questions about openness, transparency, choice of licensing and disciplinary approaches.
And I thought that the best feedback was that on the last day participants were brainstorming about future regular meetings to keep their community alive and to keep the momentum going. In addition, some participants started implementing open working practices straight after the event. For example, Remy Joseph started his own blog.
I would now really like to organise something similar for researchers at TU Delft! So if you are reading this blog post, if you think alike and would like to collaborate on this, please do get in touch with me!
- My presentation for PhD students: https://doi.org/10.5281/zenodo.997313
- My evening presentation for the wider EPFL community: https://doi.org/10.5281/zenodo.997574
- The programme of the event (all presentations are available and hyperlinked from talk titles): https://osip2017.epfl.ch/page-146300-en.html
- Summary slides of the Summer School presented at the weekly TU Delft Library catch up meeting: Google Slides