Open Science Barcamp Berlin 2019
Authors: Esther Plomp, Nicolas Dintzer, Heather Andrews, Yan Wang
On 18 March, the TU Delft Data Stewards Esther Plomp (Applied Sciences), Nicolas Dintzer (Technology, Policy and Management), Yan Wang (Architecture and the Built Environment) and Heather Andrews (Aerospace Engineering) attended their first Open Science Barcamp at Wikimedia in Berlin. A typical barcamp session does not have lengthy talks; instead, participants look for advice or initiate an interactive discussion. The barcamp offered five sessions of 45 min each (with strict time keeping done with the latest equipment), each followed by a 15 min break, with four parallel choices per slot. In total 20 sessions took place. All sessions were recorded in an etherpad. Some sessions were prepared before the start of the barcamp using the etherpad; others were proposed during the barcamp. The sessions were scheduled during the barcamp in a morning and an afternoon planning session, which encouraged the proposal of new sessions to further discuss topics raised during the morning sessions. Below, some of the sessions are outlined and briefly summarised.
Session #1 Help the trainer train a trainer
Helene Brinken and Gwen Frank from the FOSTER project wanted feedback on the train-the-trainer approach, which identifies the gaps in current knowledge/skills and provides materials and training for trainers who promote the use of Open Science practices (bootcamps, the Open Science Training Handbook). As the FOSTER project ends, the created materials, which are under an open licence, will likely become available through OpenAIRE and through the projects of partners.
Among the recommendations discussed during this session, it was agreed that Open Science trainings should be short (to allow attendance of people with crowded schedules) and take place in a context where they connect with the participants (e.g., tailored to a specific field or addressing problems from daily practice). Lengthy talks are not advisable (max 20 min), but at the same time the audience cannot be expected to be familiar with the basics and/or be at the same level. If you do not want to go over the basics, ask participants to go through existing courses (FOSTER, Open Science MOOC) beforehand so that they are on the same page. Open Science trainings should not be too evangelistic: sometimes the term ‘Open Science’ can scare people off from attending the training/workshop. Other phrasing, such as ‘good scientific practices in the digital age’, may be more attractive to participants. Trainings should provide an environment where participants can learn from each other: for example, the Open Science communities in the Netherlands, at the universities of Utrecht, Leiden, Amsterdam and Eindhoven, follow a bottom-up strategy to promote Open Science practices among researchers.
Conclusions of the session were that 1) it is difficult to train people in Open Science practices, as we are still in a transitional phase, 2) there is no training on how to inspire the cultural change required for the transition to Open Science, and 3) although there is plenty of course material available, it is not organised in a register. Such a register should include interactive exercises which are properly licensed and attributed with metadata, so that materials can be found, re-used and adjusted for specific purposes/trainings.
Session #2 Participatory research
This session on participatory research, or “citizen science”, followed up on the inspirational talk in the morning by Claudia Göbel (Museum für Naturkunde Berlin) during the kick-off session. Participatory research involves the engagement of people who are not employed to do research in the production of scientific knowledge. Claudia argued that Open Science discussions usually focus on developments within scientific institutions, but that science should also be open to participation and cooperation of volunteers. This requires research results to be made available to everyone, and everyone should be welcome to comment on, improve and build on those materials.
During this session, the participants listed what they perceived as challenges to be overcome in order to make “citizen science” possible and accelerate knowledge dissemination. Some of the key challenges identified are the scientific literacy of non-researchers, ethical constraints, ownership of and access to data, delayed dissemination of knowledge in academic research, funding for lasting solutions, and lack of awareness of citizen initiatives on the part of researchers. Participatory research is often not clearly defined and can be very narrow in its application (limited to crowdsourcing), which does not explore the full potential of citizen science. It is very important to work together across disciplines and boundaries to improve worldwide public engagement: to transform what is thought of as ‘scientific knowledge’.
Session #5 Humanities = black sheep?
This session was chaired by Erzsébet Tóth-Czifra and Ulrike Wuttke. The discussion focused on disciplinary characteristics in humanities regarding digitalisation and Open Science: are ‘digital humanities’ actually the ‘new’ Open Science practices or the ‘black sheep’ of the humanities research communities?
For researchers in Humanities, Open Science poses specific challenges. More than researchers in other fields, they publish their findings in books, which are rarely open access. The data is often personal and sensitive, and is therefore difficult to share: anonymisation may lead to significant losses of information, and the conditions for re-using data are case-specific.
However, there are also opportunities for humanities to move towards Open Science. Some initiatives in humanities include DARIAH, DARIAH-DE, CLARIN, CESSDA, OPERAS, and the SSH Open Cloud project. The open methods platform highlights all kinds of content types (blogs, preprints, research articles, podcasts etc.) about digital humanities methods and tools. There are also many attempts to promote Open Access publishing without asking for Article Processing Charges (APCs), such as Open Library of Humanities and Language Science Press. We also see digital leaders in different fields moving the discipline forward, such as German Galleries, Libraries, Archives and Museums (GLAM) institutions, which collaborate and share data in the Coding da Vinci project.
Session #6 Federated Databases for sustainability
This session was chaired by Christian Busse from the German Cancer Research Center (DKFZ), who raised some challenges of sharing research data through repositories. Although repositories strive to make their data FAIR (Findable, Accessible, Interoperable and Reusable), the repositories themselves may be difficult to find. As a result, data ends up concentrated in a small number of large repositories.
The proposed solution is an API (Application Programming Interface) which could be implemented in local repositories (the ones researchers actively work with). This API would expose basic information about the repository and offer a unified way of discovering data. This approach would allow researchers to keep their data close (the data could be copied from one repository to another) and facilitate backups, while the unified API would support the creation of a meta-search engine. This solution requires a proof of concept and long-term funding for its sustainable development and deployment. How such a project can be funded in the long run is still unclear.
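To make the idea more concrete, here is a minimal Python sketch of what such a unified interface and a meta-search on top of it could look like. It is not the design presented in the session: the class and function names (`Dataset`, `LocalRepository`, `describe`, `search`, `meta_search`) and the example records are all hypothetical, and a real implementation would expose these operations over the network rather than in memory.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Dataset:
    """A minimal dataset description (hypothetical fields)."""
    identifier: str
    title: str
    keywords: List[str] = field(default_factory=list)


class LocalRepository:
    """A local repository offering the same small interface as every other one."""

    def __init__(self, name: str, datasets: List[Dataset]):
        self.name = name
        self._datasets = datasets

    def describe(self) -> dict:
        """Expose basic information about the repository itself."""
        return {"name": self.name, "dataset_count": len(self._datasets)}

    def search(self, term: str) -> List[Dataset]:
        """Return datasets whose title or keywords contain the term (case-insensitive)."""
        needle = term.lower()
        return [
            d for d in self._datasets
            if needle in d.title.lower()
            or any(needle in k.lower() for k in d.keywords)
        ]


def meta_search(repositories: List[LocalRepository], term: str) -> List[Tuple[str, str]]:
    """Query every repository through the shared interface and merge the hits."""
    hits = []
    for repo in repositories:
        for dataset in repo.search(term):
            hits.append((repo.name, dataset.identifier))
    return hits


# Two made-up local repositories with made-up records.
lab_a = LocalRepository("lab-a", [
    Dataset("a-001", "Tumour imaging series", ["cancer", "MRI"]),
    Dataset("a-002", "Soil samples", ["geology"]),
])
lab_b = LocalRepository("lab-b", [
    Dataset("b-001", "Cancer genomics panel", ["sequencing"]),
])

print(meta_search([lab_a, lab_b], "cancer"))
# [('lab-a', 'a-001'), ('lab-b', 'b-001')]
```

Because every repository answers the same two calls, the meta-search does not need to know anything about how each repository stores its data, which is the point of a federated approach.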
Session #7 Open Science and IP
This session, organised by Elisabeth Eppinger and Viola Prifti from the IPACST project, focused on Open Science discussions in research where patentable inventions are involved. Increased patenting can potentially harm science as it forces the system to remain closed. During the session it became apparent that there is no ‘one size fits all’ solution, as the degree of openness when collaborating with commercial partners varies between and within institutes, depending on the commercial companies and (inter/national) funders involved. Researchers often think that you can either commercialise the data or open it up. This is a matter of perception, however, as there are usually ‘in between options’ available, such as publishing the results using mock data instead of the original data (e.g. when the original data belongs to a commercial partner and has to remain closed). Publishing the methods may be more important in some cases than publishing the underlying (commercial) data, as the innovative methodology can be applied to independent datasets and is therefore still a verifiable research output. Another solution is to only allow patent-free research (as done by e.g. Aarhus University and the Montreal Neurological Institute and Hospital).
Issues were raised during this session that also directly concern researchers at TU Delft, such as publishing code on GitHub and the ownership of data when master's students are involved in research projects. Researchers at other universities also have to go through forms to publish code on GitHub. This is seen as an administrative burden, and researchers have some fear of being sued if they do not follow the university's or company's valorisation rules. A powerful argument for convincing researchers to make software open source (and publish it on platforms like GitHub) is that by doing so, they will be able to avoid questions about who owns the data/software when they switch institutes. With respect to what happens when students are involved in research projects with commercial partners, master's students do not fall under the same regulations as university employees. This has implications when discussing data ownership. Master's students are generally not aware/notified of who owns the data, which can lead to problematic situations where students take the data with them when they finish their project. At TU Delft a way to prevent this is to let the student sign a waiver before they participate in commercial research projects.
Involvement of commercial partners is not necessarily bad for scientific progress! These collaborations increase the awareness of IP rights (data ownership); patents can boost researchers’ visibility; and patents can be used as a form of ‘risk management’ by patenting the highly valuable research outcomes before companies do so (and then opening the outcomes up to the public under a Creative Commons licence).
Session #10 Knowledge repo
In this session Michael Rustler from the FAKIN project presented their pilot of a ‘knowledge repository’ which collects information from different sources (e.g. DataCamp, GitHub, Zenodo and EndNote) and establishes and stores links between different objects (e.g. code, projects, people, publications and tools) in one place. The repository has been built using R, Hugo and GitHub/GitLab, and contents can be added in the form of text file templates. This knowledge repository allows for standardised workflows and can help to implement a reproducible research process, avoiding the loss of important knowledge as a result of informal workflows that have insufficient documentation of procedures and poor description of decision-making processes by multiple stakeholders.
This 20-day-old ‘Knowledge Repo’ has great potential for establishing greater transparency across staff working within the same institutions and increasing collaboration opportunities. One current challenge for end users is that the platform requires considerable technical skill and is not yet open to user contributions. One participant shared his experience of creating a knowledge repository at the ZB MED institute using XWiki. Discussions among the participants also arose about other forms of knowledge exchange, such as writing minutes, providing links and so on.
Session #11 Exchange between less + more experienced OS people, advice giving – Melanie Imming
This session, organised by Melanie Imming, aimed to establish a way to easily connect people who are looking for advice on Open Science matters, for example by using a Twitter hashtag. In the Netherlands the Open Science speaker registry attempts to connect people for this purpose, but Melanie argued that this platform does not make it easier to approach people and that it is not desirable to pin this connection on a single registry or organisation. Melanie introduced the OSNL meet-up event as an example of an alternative form of connection.
The participants argued that this connection could be established by attaching it to an existing initiative to make implementation easier (such as within the university’s course curriculum), preferably at a local level. At this point the Data Champion programme at TU Delft was introduced, demonstrating that it works well to have individuals in an organisation who set an example for others and are approachable for questions. Other mentoring initiatives with institutional coordination that were brought up in the discussion were the Wikimedia Open Science Fellowships and Mozilla Open Leadership Fellows. Examples of more international approaches are OpenAIRE, Open Science MOOC, Open Science Reddit, and the Moving MOOC.
Session #12 How to find Research Software for Open Science
Software has become a powerful tool for research in an era where machine readable data is gathered by the second. Increased automation, high performance computation and high resolution modelling have become a crucial aspect of research, and have led to an exponential development of research software. At the same time, this has increased the demand for more re-use and interoperability of software developed by the community. However, finding the necessary piece of code to re-use or build upon can become a tricky business, and citability of research software is not a well-established practice yet.
During this session it was discussed how researchers look for and discover software to re-use or build upon. Is it necessary to have a single place where all software can be searched for? Attendees pointed out that software discovery happens mostly by word-of-mouth (through colleagues), by looking at references in research papers, and via services like GitLab/GitHub, Google, and Twitter. It was then proposed that instead of creating a mega repository for all available software, discoverability is a matter of making code citable and linking it to datasets with which it has been used. To make code citable, a persistent identifier can be assigned to a GitHub repository by linking it to Zenodo. The University of Bielefeld has implemented a similar system linked to their institutional repository. This type of approach would facilitate the link between papers, software and data.
The participants also raised concerns regarding the quality of software tools developed by academics (which is very difficult to assess without trying the software out) and the lack of support when relying on open source solutions. Instead of measuring the quality of software it was proposed that more software metadata should be provided when publishing software, so that re-users are better informed about what the software can/cannot do, and also about the datasets the software has been used on.
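As an illustration of the metadata idea, the Python sketch below models a software record that carries a persistent identifier, a statement of what the software can and cannot do, and links to the datasets it has been used on. This is only a sketch under assumptions: the field names, the `missing_fields` and `citation` helpers, and the placeholder identifiers are hypothetical and do not follow any particular metadata standard discussed at the session.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoftwareRecord:
    """Descriptive metadata published alongside a piece of research software."""
    name: str
    version: str
    authors: List[str]
    identifier: str = ""        # persistent identifier, e.g. a DOI
    capabilities: str = ""      # what the software can do
    limitations: str = ""       # what the software cannot do
    datasets_used: List[str] = field(default_factory=list)  # identifiers of datasets


def missing_fields(record: SoftwareRecord) -> List[str]:
    """List the descriptive fields that are still empty, in a fixed order."""
    checks = {
        "identifier": record.identifier,
        "capabilities": record.capabilities,
        "limitations": record.limitations,
        "datasets_used": record.datasets_used,
    }
    return [name for name, value in checks.items() if not value]


def citation(record: SoftwareRecord) -> str:
    """Build a simple citation string from the record."""
    authors = ", ".join(record.authors)
    return f"{authors}. {record.name} (version {record.version}). {record.identifier}"


# Placeholder values only; the identifier is not a real DOI.
record = SoftwareRecord(
    name="example-tool",
    version="1.0.0",
    authors=["A. Researcher"],
    identifier="doi:10.xxxx/example-software",
    capabilities="Resamples time series to a regular grid.",
)

print(missing_fields(record))  # ['limitations', 'datasets_used']
print(citation(record))        # A. Researcher. example-tool (version 1.0.0). doi:10.xxxx/example-software
```

A check like `missing_fields` could be run before publishing, so that re-users find both the limitations and the linked datasets filled in rather than having to try the software out first.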
Session #13 Vive la Open Science revolution!
This session dealt with an evaluation of the current movements in Open Science. Participants agreed that it is difficult to change current practices, as it feels like you have to completely overhaul the current system. The Open Science movement was compared to feminism, where intersectionality plays a key role. This means that a single problem, such as ‘closed science’, cannot be tackled alone as it is interconnected with other areas such as the hierarchy and exclusivity in science. The Open Science movement is thus complex and it remains difficult to find a common ground and shared understanding, which is required to revolutionise Open Science.
Session #15 Including OS in current PM activities
This session was attended mainly by support staff of universities and research institutes. Hence the focus was on sharing the problems faced when dealing with project management while complying with Open Science requirements. Most common issues included researchers not allocating project money for Open Science (e.g., costs related to data publishing); researchers not foreseeing data sharing protocols/services when exchanging data with collaborators (particularly when dealing with sensitive data); and the persistent view of data management planning as an administrative task rather than a research-related deliverable.
In order to tackle such issues, participants agreed that a proper approach would be to establish communication with the researcher(s) during the proposal phase. Contacting researchers at an early stage would help to solve the unforeseen issues (e.g., budget, data sharing) and can also help in changing the researcher’s culture about data management planning (and Open Science). In this respect, the Data Stewards of TU Delft present at the session shared their experiences of talking to researchers to create awareness and trying to change the culture from within. This is definitely a difficult task to accomplish, and constant (and consistent!) communication between the different support services from the respective university/institute is crucial.
Aside from what is mentioned above, the lack of support staff and the lack of Open Science infrastructure were acknowledged to be a growing problem: projects need more specific support on how to manage their data and practise Open Science. This type of support is not uniquely related to ‘project management’ nor uniquely related to ‘research’. It is in between, and this is what needs to be understood by both the support staff and the researchers (and the big bosses of course!).
Session #16 Open vs. closed citations
This session discussed opening up citations (the references in articles) to make them openly available. A practical guide is provided in the etherpad.
Session #17 Researchers Engagement in Open Science
During our own TU Delft Data Stewards session we aimed to bring the researcher’s perspective on Open Science and Research Data Management (RDM) to the barcamp table. We were interested in knowing how people from various institutions interact with researchers and try to accomplish a ‘change in attitude’ in researchers: a move away from practising Open Science because funders mandate it, towards Open Science as the new scientific norm.
At the start of the session we briefly explained the Data Stewardship programme at TU Delft, where the Data Stewards are based at the faculties and are coordinated by the TU Delft Library. Each Data Steward has a research background related to their faculty, in order to support researchers with discipline-specific RDM practices. Each Data Steward then acts as a liaison to other services within the university, such as the legal office, ICT and the Library, whenever necessary. This connection with researchers is vital to promote Open Science, as is evident from a summary recording by Open Science Radio from another session.
Communication between different support services was a key point of the session. Local and central support should work together rather than around each other to provide high quality support, and it should be made clear where the boundaries between the two support services are. Local support gradually builds up relationships by providing discipline-specific support that will lead to a better understanding of researchers’ needs and earn their trust. Local support can then connect the researcher to the more general services that are provided by the central/library support, while at the same time increasing the awareness of these services amongst researchers.
As researchers are the main players in Open Science, we argued that they should be at the centre of the movement and should be supported when implementing best practices. This opinion was shared by the workshop participants, who argued that scientists have to take too many things into account: they are expected to have several skills and be aware of all the regulations. Their job should therefore be made as easy as possible, and tools/services for scientists should be made interoperable by the platforms/services themselves. When Open Science guidelines are aligned with the daily practices of researchers, researchers gradually become more aware of the Open Science movement and they can amend and amplify these practices within their own field. New professional profiles are now emerging to facilitate support with Open Science practices, but not every institution will be able to hire a Data Steward per faculty as TU Delft has done. The workload of the Data Stewards at TU Delft will also likely expand with the increasing awareness of their presence at the faculties, and the progressively important role of RDM practices at institutional and national levels.
During our session three other sessions took place: Open Science Strategy on Error Culture (#18), Future barcamps (#19) and Feelings and Open Science reflections (#20). Short summary recordings of these sessions made by Open Science Radio are available.
During the wrap up, the meeting was briefly summarised, raising positive and negative aspects. Participants expressed distrust of how Open Science progress is monitored, and noted a lack of awareness among researchers. However, it is now possible to copy the best practices in Open Science from others as long as their materials are available (such as those of the FOSTER project and the Open Science MOOC). The barcamp team offered to organise barcamps at other institutions, so perhaps we will meet again soon in the Netherlands!