Staying open to open options

Authors: Sarah Jones, Tim Smith, Tracy Teal, Marta Teperek, Laurian Williamson


Public institutions that wish to provide top-quality data services, but that lack the capacity to develop or maintain the necessary infrastructures in-house, often end up procuring solutions from providers of proprietary, commercial services. This happens even though the very same institutions frequently have strong policies and invest substantial effort in promoting Open Science.

Why does this happen and are there alternative scenarios? What are the challenges from the institutional perspective? Why is it difficult for open source software providers to participate and successfully compete in tenders? And how can we ensure we always keep a range of service options open?

Our intention is to highlight some of the inherent issues of using tender processes to identify open solutions, to discuss alternative routes, and to suggest possible next steps for the community.

Open competition – unintentionally closed?

Procurement is often the preferred route for selecting new service providers at big public institutions. For example, the European Commission’s public procurement strategy determines thresholds above which it is obligatory to use open procedures for identifying new service providers. This is justified by the principles of transparency, equal treatment, open competition, sound procedural management and the need to put public funds to good use. Hence, the legal teams at public institutions often perceive public procurement as the default option. Public procurement, however, often unintentionally blocks pathways to open solutions, favouring corporate providers of proprietary software.

First, to ensure an equal and fair process, everything needs to be measured. For example, what does usability mean and what level is good enough? What counts as sufficient service availability, and how will it be measured? With the emphasis on numbers and legal frameworks, there is little room for open science values or for the importance of aligning with missions and visions.

In addition, to facilitate competition, legal teams at public institutions sometimes question requirements or preferences that seem too specific to them, or that might limit the number of parties able to respond to a tender. This can put smaller initiatives with innovative or niche solutions at a disadvantage.

Teams preparing a tender are often faced with confidentiality clauses, intended to make the process fair and equal for everyone. These, however, can make communication with prospective providers for clarification and scoping (or sometimes even with colleagues within the same department!) challenging. They also mean that it might not be possible to tell unsuccessful applicants why their bids failed and which areas of their applications could have been improved. And they might prevent the sharing of lessons across the sector, which is hugely valuable in preventing other institutions from falling into the same pitfalls.

Finally, the small teams at libraries or IT departments who are tasked with finding new services for research data often lack experience and expertise in procuring solutions. Suddenly they are faced with discussions with legal experts, legal jargon and lengthy documents they are unfamiliar with, unsure how to tackle them or how to effectively explain what is needed.

Balancing values, costs and requirements

Providers of open source software, or of open services built on open software, are usually focused on, and resourced for, exactly that! They are rarely embedded in a larger unit that can market, tender, or legally draft and validate responses. They rely either on upfront agreements for expanded functionality or scope, where the resources are provided to effect the change, or on third parties to sell and instantiate the service for specific needs. Hence, when they see the needs of a new institute expressed in a tender document, they can often spot an easy match to their current or slightly extended functionality, but cannot afford to gamble resources on competing in an administrative process.

The odds are low, since they often lack the documentation and proof required in a typical tender process, particularly in an international context. They are unlikely to have the minimum income or turnover, reference sites, or certifications typically demanded. They may be excluded from tenders merely for not having a VAT number in a given country, or turnover in a given currency, for not having been in existence for sufficient years, or for not charging enough for the service. They are focused on what they do well, often performing well above the level tendered for, but without the means to guarantee it. Hence, providers of open source software, or of open services built on open software, perceive tenders as stacked against them.

Theatre of risk

Much of the challenge simply comes from open source projects being smaller organisations without dedicated personnel for compliance and legal work. They are also less able to absorb risk. Tender processes often involve several types of statements intended to insure against certain types of risk. While bigger organisations can absorb such risk, or litigate if needed, smaller organisations do not have that capacity.

However, this does not at all mean that they are riskier. The paperwork required does not in fact insure the organisation issuing the tender against risk; it merely gives it something to show that it tried. Big organisations can default on their obligations as often as smaller ones. In fact, large organisations may even choose to do so without significant negative impact, or decide to change focus. Smaller organisations, on the other hand, are committed to their primary purpose as the core of their operations, and can be more responsive and connected to the client.

There is always risk involved in any relationship or process, but the requirements of the tender process do not in fact alleviate that risk, creating more risk-mitigation theatre than actual risk reduction.

Alternative models

There are many different service delivery models that can be explored. Some of these may not fit a tender exercise, so it is best to consider all routes first and talk to potential service providers before deciding which avenue to pursue.

  • Many companies run open source software on a commercial basis. Atmire, Cosector, Haplo and others can install and maintain services like DSpace and ePrints. They may not be able to respond to procurement exercises as they don’t own the solution, so take care in how you frame the specification if you go down this route.
  • Some open infrastructure is run on memberships or subscription models. DMPonline, for example, has an annual or three-year subscription for institutions and funders who wish to customise the tool. Dryad’s model is based on membership fees and individual data publishing charges.
  • Providers like Jisc and GÉANT may broker sector-wide deals that help institutions procure services more easily. Recently Jisc launched a dynamic procurement framework for research data repositories which pre-approves common terms and conditions so institutions can do a lightweight mini-competition based on required functionality. This approach prevents tender exercises from being too heavyweight for smaller service providers, and helps institutions access a wider range of options.

One challenge may be in convincing institutional boards that the university’s typical model for engaging external contractors may not be suitable and could limit the options of who can respond. Exploring some of these alternative models and the relative costs and benefits (e.g. supporting open scholarly infrastructures) is worthwhile.

How to change the status quo?

There are clearly a number of challenges facing research institutions and service providers alike. Everybody wants an open competition in which everyone is fairly evaluated on their relative strengths; however, the prevalent methods for assessing service options and choosing a provider do not always facilitate this. How can we change the status quo and ensure we keep all options open?

  • Can we provide a forum for research organisations to share lessons learned from running procurement exercises so others have a place to seek advice?
  • Are we able to adjust the de facto institutional procedures, or consult with providers before defining tenders to ensure the framing doesn’t exclude certain groups or service delivery models? For example, consider the weighting of the functional and non-functional requirements. Should the final deciding criterion be cost or alignment with values?
  • Can we share tactics for helping institutional boards to consider alternative options and challenge preconceptions about what will be cheaper, easier and more sustainable?
  • Can sector-wide deals be brokered to enable a broader range of providers to engage, or how can smaller service providers be helped to compete with larger operations better placed to respond to tenders?
  • Can collective bargaining help the sector to secure better terms for education that embody our core values of openness, or can these factors be weighted more heavily in the evaluation criteria?
  • How can the scholarly community work collectively to invest in and sustain open infrastructure?
  • How do we ensure one institution’s investment in a platform (e.g. to develop a new feature) benefits the sector at large?
  • What is the role of user groups to help direct development roadmaps?

Much discussion between institutions and service providers is needed to align needs and visions, especially as tender processes involve a far wider range of stakeholders who may not be aware of the service being procured and what matters in terms of delivery. We hope to provide a forum to explore some of these points in the “Delivering RDM services” workshop which will run adjacent to the RDA plenary in November.

If we want to keep our options open, we need to share experiences and collectively define a more flexible procedure for commissioning our scholarly infrastructure.

Celebrating 4TU.ResearchData’s Role in Fostering Open Science

Author: Esther Plomp

The 29th of September 2020 was a big day for 4TU.ResearchData and its team, as we celebrated its 10th anniversary and the launch of the renewed data repository with a two-hour online event under the hashtag #10years4TUResearchData.

The event was opened by Marjolein Drent (University Librarian at the University of Twente), who welcomed us and chaired the event. Marjolein introduced us to Alastair Dunning (Head of Research Services at TU Delft Library and previous Head of 4TU.ResearchData), who delivered the opening speech in which he took us back to the start of 4TU.ResearchData, or rather, 3TU Datacentrum (see also the blogpost here). Alastair found that much of what was written in the 2006 report on the user requirements of researchers who would use the archive is still relevant today. Back in 2006, concerns were already expressed about the time investment and lack of incentives for practising good data management, as well as a lack of standardisation in these practices. To overcome these challenges, it is important to keep building connections between people and to work on discipline-specific networks in the future.

The event’s keynote speaker, Sarah Jones (EOSC Engagement Manager at GÉANT), shared the do’s and don’ts of supporting Open Science. She praised 4TU.ResearchData’s approach as both an institutional and discipline-specific repository. Rather than reinventing the wheel themselves, the three technical universities set up a network to tackle the challenge of data preservation. To support researchers in practising open science, data archives and support staff should listen to researchers in order to understand their needs and user requirements. According to Sarah, we should not insist on the ‘open’ too much or be evangelical about it, as we may push researchers away rather than engage with them in a meaningful way. Sarah highlighted that we should not reinvent the wheel: if there is already a solution, adopt and adapt it, and only develop a new solution as a last resort. It is crucial to incentivise the practices that we would like to see and to practise what we preach.

Unfortunately, the pressure to publish is still the main stressor for researchers, as this is usually what they are evaluated on when they want to progress their careers. Unless we make open science practices valuable to researchers, why would they engage in this work? It is also important to build career paths that focus on data and software stewardship. Innovation, and space to fail at innovating, are important, and Sarah thinks we can learn from businesses, which are usually more flexible in their approach to change. Commercial partners should also be engaged in open science: we should highlight the benefits of working openly to them rather than shutting everything down from the start by signing closed agreements. According to Sarah, it is important to share lessons learned with the wider community, as is done for example on the OpenWorking blog. Sarah also highlighted some of the issues in procuring services for universities, a concern that will be addressed in one of GÉANT’s workshops on Delivering Research Data Management Services in November. The main takeaway from Sarah’s keynote is that the problems we face in open science are not limited to a specific institution or country: they apply globally and across disciplines. We need a collective approach to supporting Open Science, and this can only be achieved if we work more closely together.

Tweet by Deirdre Casella (Communications Officer 4TU.ResearchData)

The keynote was followed by an interactive pop quiz that was led by Yan Wang (Data Stewardship Coordinator at TU Delft). The quiz consisted of some very tough questions on data repositories, linked open data, computer programmers, and open source sharing. Participants could refuel their energy in the short break after the quiz.

The pop quiz participants visualised through Menti.

After the break we had three live interviews! Qian Zhang (Data Steward at University of Twente) interviewed Arnd Hartmanns (Assistant Professor at University of Twente), Natalia Romero (Assistant Professor at TU Delft), and Mathias Funk (Associate Professor at Eindhoven University of Technology). Arnd thinks that we should share research data in order to be able to compare and reproduce research. Natalia finds it useful to share her research data, as she collects costly data consisting primarily of in-depth interviews. She deposited her data with 4TU.ResearchData because the Data Steward of her faculty, Jeff Love, pointed her towards the repository. Mathias has not shared his data yet, but said that “data sharing is important for replication, community building and education.” Both Arnd and Natalia stated that they chose 4TU.ResearchData because of the community behind the repository, through which they feel supported in archiving their data. According to Arnd, “4TU.ResearchData is a natural choice if you work at a 4TU University. It is free, local and it is not some anonymous entity: there are people nearby that care.” Natalia thinks that the Data Stewards and Data Champions contribute to the community and said that the 4TU.ResearchData data funds made it an attractive repository.

4TU.ResearchData is a natural choice if you work at a 4TU University. It is free, local and it is not some anonymous entity: there are people nearby that care. – Arnd Hartmanns

The interviews were followed by five parallel breakout sessions (download the slides here):

  1. Dedicated services for Environmental Researchers (led by Egbert Gramsbergen and Kees den Heijer)
  2. Restricted access for confidential/personal data (led by Santosh Ilamparuthi)
  3. ‘Reproducibility’ serious game (led by Nicolas Dintzner)
  4. Expert curation services for FAIR Data (led by Jan van der Heul & Eric Rumondor)
  5. 4TU.ResearchData: Demonstration of new features and functionalities (led by Mark Hahnel)

Tweet from the breakout room led by Jan van der Heul & Eric Rumondor

The breakout sessions were followed by the formal relaunch of 4TU.ResearchData by Madeleine de Smaele (4TU.ResearchData Repository Manager):

Tweet by 4TU.ResearchData

4TU.ResearchData transferred its infrastructure from Fedora to figshare over the summer. Madeleine stressed that 4TU.ResearchData remains in full control of the repository. Users can now publish data under restricted access, which is useful for data that is confidential or contains personal data. When data is archived under restricted access, re-users first have to ask the uploader of the dataset for permission to access the data. You can also place a temporary embargo on a dataset: if you do so, only the metadata is available until the embargo ends. This feature is useful, for example, for papers that are still under review. 4TU.ResearchData now also has an integration with GitHub, which makes it possible to assign DOIs to software and code. Statistics for individual datasets are now also publicly visible: you can see the number of views, downloads and citations of a dataset, as well as its Altmetric score!
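
For readers who like to script against the repository, the metadata of public datasets is also retrievable programmatically. The snippet below is a minimal sketch, assuming the endpoint and field names of the public figshare v2 API as documented; the article ID is a made-up placeholder.

```python
# A minimal sketch, assuming the public figshare v2 API endpoint and field
# names; the article ID below is a made-up placeholder.
import requests

BASE = "https://api.figshare.com/v2"
ARTICLE_ID = 12345678  # hypothetical dataset ID

resp = requests.get(f"{BASE}/articles/{ARTICLE_ID}", timeout=30)
resp.raise_for_status()
record = resp.json()

print(record["title"], record["doi"])
# Public records list their files, each with a name, size and download URL.
for f in record.get("files", []):
    print(f["name"], f["size"], f["download_url"])
```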

After the relaunch of the repository, the directorship of the 4TU.ResearchData (and a literal scepter!) was officially handed over from Alastair to Marta Teperek (Director of 4TU.ResearchData – see this interview). 

Marta shared her vision for the future of 4TU.ResearchData, in which she would like to expand the 4TU.ResearchData community. Having good infrastructure alone is not enough to make data FAIR: researchers should be supported with guidance, training and disciplinary standards. In doing this, we need to work together with these communities and take into account disciplinary differences and practices. All 4TU.ResearchData partners now have Data Stewards to support researchers in archiving their data and code. These Data Stewards form a network across the Science, Engineering and Design disciplines, which makes it possible to move much faster together and develop better solutions! Marta invited all the participants to collaborate in our journey towards FAIR data.

Tweet by Hannah Blyth (PhD student from Nottingham University and intern at 4TU.ResearchData)

The meeting was closed by Merle Rodenburg (Director of Data Management and Library, TU Eindhoven), who asked the participants two questions: one about the key takeaway points of the event, and one about the future direction of data repositories. Keywords that came out of the answers were collaboration, FAIR, and community. Collaboration and discipline-specific support were found to be very important throughout the meeting, and we are looking forward to working together with you on this!

Special thanks to: Madeleine, Marjolein, Sarah, Merle, Ardi, Berjan, Egbert, Arie, Jan, Eric, Mark, Alastair, Marta, Deirdre, Femke, Santosh, Yasemin, Esther, Ellen, Kees, Nicolas, Yan, Jeff and everyone who joined the event to celebrate 4TU.ResearchData’s anniversary and relaunch!

All the slides from the event are available on Zenodo.

Online Genomics Workshop @TUDelft

Author: Esther Plomp

This blog provides a short summary of the recent online Genomics Workshop at TU Delft. Due to the coronavirus regulations implemented at TU Delft, all of 4TU.ResearchData’s Carpentry workshops are now taking place online, and the Genomics Workshop was no exception. The original plan was to hold the workshop in person in June, just like the first Genomics Workshop we organised in 2019. Due to the increased workload at that time, the decision was made to move the workshop to September (16, 17, 23 & 24, 9:00-13:00).

Moving a beginner-friendly interactive programming course online was quite the challenge, but thanks to all the efforts of the team of instructors and helpers, it was still successful! Our instructors for this course were Raúl A. Ortiz Merino, Marcel van den Broek, Santosh Ilamparuthi, Esther Plomp and Carlos Teijeiro Barjas (SURF). Our helpers were Mario Beck, Nicolas Dintzner, Wijb Dekker, Maurits Kok, Sam Nooij (LUMC) and Carien Hilvering (Maastricht University).

The 22 participants, who tuned in from several different countries and time zones, received a crash course in project organisation for genomics from Esther, cloud genomics from Carlos, the shell for genomics from Santosh, and data wrangling and processing from Raúl and Marcel (see also the programme of the workshop on GitHub). Raúl and Marcel had set up a local virtual environment at TU Delft for the previous workshop, but this was not possible this time. Instead, the Carpentries provided us with virtual Amazon Web Services (AWS) instances which contained the dataset and software that we needed for the workshop.

The course was held online through Zoom. Zoom is a great tool for these workshops, as it provides the option to open up mini-meetings within your online meeting (breakout rooms). Breakout rooms can be used for more interaction between the participants when they are doing exercises, or for helping participants who are stuck. We used the breakout rooms primarily for exercises on the first two days. During the ‘data wrangling and processing’ days we only used breakout rooms if a participant got stuck. In general, participants, helpers and instructors were happy with this format. There were some pacing problems when a participant was stuck for a long time. We tried to resolve longer issues during breaks, and kept people on track by documenting when individuals left for a breakout room and came back, so that it was clear what they had missed.

Screenshot of Zoom during the Workshop (Day 4).

Collaborative notes were taken in a Google Doc instead of the Etherpad that we normally use. In a Google Doc you can share screenshots when participants are stuck, a functionality that is not available in the Etherpad. In the end we mostly solved issues by screen sharing in Zoom rather than exchanging screenshots. For the Google Doc we used the same format as the FAIR software workshop (held on the 8th of September). The Google Doc provided the participants with an overview of the programme of the day and the materials used, a way to introduce themselves and connect with the other participants, a platform for questions, and a list of the commands that were used during the day. We also used Google Docs to ask the participants for feedback, rather than the sticky notes that we normally use during physical workshops. Questions were also asked through the Zoom chat, and we used the participant list buttons (yes and no) as a replacement for the sticky notes that previously indicated whether participants had finished their exercise or needed help.

Screenshot of the Google Docs that we used during the workshop.

Most of the participants were graduate students. We offered a certificate to anyone who wanted to take the course for credits in their graduate school programme. Out of the 22 participants, 15 requested such a certificate.

Despite being held online, we still had a lot of interaction with each other and we managed to get through most of the materials of the lessons! Onward to our next online basic programming workshop in October! 

Celebrating 4TU.ResearchData’s Role in Fostering Open Science – Opening Remarks

(Notes written for the event celebrating 10 years of 4TU.ResearchData)

In 2006, research was undertaken to test the feasibility of a 3TU datacentrum. It was commissioned by the head librarians of Delft, Eindhoven and Twente (Wageningen was not part of the federation at the time).

Photo by ian dooley on Unsplash

The report (available here in Dutch) spoke to the groups you would expect (researchers, support staff, international stakeholders such as the National Science Foundation) and asked the familiar questions. What happens to data in your discipline? Is there much re-use of data? What role did universities have in preserving such data? Was there a need for a Dutch-based archive in the technical sciences?

When I recently read the report I hoped to find some laughably bad predictions, the type of thing you could share for easy comedy at an event such as this one.

However, the statements made were largely accurate. There were laments over how much time good data management took; the lack of reward and recognition for publishing data; how the complete lack of standardisation made data sharing between projects difficult. Much of what was written around 15 years ago is familiar to us now.

What was more striking was what was not discussed either by the consultant or the researchers. It was assumed that technical universities would not have personal data and that the licensing of data was not an issue.

There was plenty of discussion about practical things (PhD candidates staying up late at night to burn data onto CDs) and standards (different lab apparatus throwing out data in completely non-interoperable formats). Software was not mentioned at all.

What was also entirely absent was any concept of the type of people needed to engage in data management. It was just assumed that PhD candidates would continue to carry the data burden for a lab. So while many of the concepts have stayed the same (interoperability, recognition, lack of time), we can also see entirely new staff and types of expertise inhabiting our research data landscape.

We reflect that in our programme today, with Yan Wang, Data Stewardship Coordinator at TU Delft, and Qian Zhang, Research Data Officer at Twente. I am sure that everyone in the audience can look at their own institutions and see the changes there. The excellent LCRDM counts nearly 140 in its Pool of Data Experts.

We’ve also reflected this in our 2020-3 strategy for 4TU.ResearchData. While our service still allows all scientists to publish data and code, get metadata reviews or query complex gridded data, we’ve made connections between people much more important.

Building discipline-specific networks to stimulate the creation and re-use of FAIR research data – this will be much more important to us in the future. How can we create groups of interested scientists and related staff to build standards that work within their specific research communities?

As Eva Méndez from Carlos III University of Madrid pointed out in her report for the European Commission: “Key to the success of the EOSC is that the research community is making their data FAIR.”

This is true for everyone, not just EOSC. And it is not just a technical issue; it requires people to get together and make innovative and sometimes challenging decisions about how to structure data within their disciplines, and how to maintain that structure over time; and it requires groups like 4TU.ResearchData (and DANS, DTL, eScience Centre, LCRDM) to manage and assist in that process. While we are a repository for the technical sciences, it is still the human connections that make the thing run.

A final note: according to the consultant, Maurits van de Graaf of Pleiade Management and Consultancy, who wrote the report, there was some critical commentary on the decision to create the data archive, not all of which was represented in the report.

But after the archive was launched, the Diederik Stapel case came to light – widespread fabrication of research data that underpinned internationally recognised research. Dissenting voices against the archive were no longer to be heard. This was the bad news we needed, and it came not because of a technical issue, but because of a most human fallibility.

How to make research software a first-class citizen in the Netherlands?

Image credit: Clive Warneford / CC BY-SA

This blog post was originally published on the NL-RSE website and is re-posted here.


TL;DR – Read our recommendations to make research software a first-class citizen.

Antoni van Leeuwenhoek is considered the father of microbiology. His discovery of microbes, using lenses he made himself, created an entirely new field of research. He was at the same time a researcher and a tool maker: his research would not have been possible without the tools he built. Leeuwenhoek was well known both for his discoveries in microbiology and for the unmatched quality of his lenses.

Four centuries after Leeuwenhoek, research tools have only gained importance, and recently a new type of tool has become critically important: research software. The 2020 COVID-19 pandemic has brought to the public eye just how important research software is.

However, research software does not receive the recognition it deserves. A group of members of the NL-RSE network, together with software-minded data specialists, got together in an attempt to raise the profile of research software. Our position paper provides further details.

Back in March 2019, we had a meeting with NWO about the role of software in research. Following that meeting, we wrote a position paper with recommendations for funding agencies and research institutions to raise the profile of research software. In August 2019, we made it publicly available for comments from the RSE community. In November 2019, we also held a feedback session during the NL-RSE conference. The author group got together again in January 2020 to integrate the community feedback. After a long revision process, the “final” version is ready. This paper focuses on the Netherlands, but the issues and recommendations could be adapted and adopted by other countries.

These recommendations have already been broadly commented on; however, if you would like to comment on them, feel free to reach out to any of the authors or contact us via the NL-RSE network.

Why figshare? Choosing a new technical infrastructure for 4TU.ResearchData

Written by Marta Teperek & Alastair Dunning

4TU.ResearchData is an international repository for research data in science, engineering and design. After over 10 years of using Fedora, an open source repository system, to run 4TU.ResearchData, we made the decision to migrate a significant part of our technical infrastructure to a commercial solution offered by figshare. Why did we decide to do this? And why now, at a time of increasing concern about relying on proprietary solutions, particularly those associated with large publishing houses, to run scholarly communication infrastructures? (See, for example, In pursuit of open science, open access is not enough and the SPARC Landscape Analysis.)

We anticipate that members of our community, as well as colleagues who use or manage scholarly communications infrastructures, might be wondering the same. We are therefore explaining our thinking in this blogpost, hoping it will facilitate more discussion about such developments in scholarly communications infrastructure.

Why not continue with Fedora?

So, first, why not continue with Fedora? Any software, but open source software in particular, needs to be maintained. It’s a tough process. Maintenance means developers, who are often difficult to retain within academic environments (industry offers more competitive salaries), while other developers, approaching retirement, are irreplaceable. We also faced the challenge of migrating to the next version of Fedora – a significant undertaking simply to keep the repository running.

At the same time, researchers started requesting additional functionality: better statistics, restricted access to confidential datasets, and integration with GitHub, among many other things. With insufficient development capacity it proved increasingly challenging to keep up with these demands. Undertaking a public tender for a managed repository platform, where development efforts could be outsourced to the entity providing the repository solution, looked like the best way to deal with these twin challenges.

Why not Zenodo or Dryad? Or another open source repository?

Open source advocates may ask why we did not try open source repository solutions. We tried hard. We were in discussion with Zenodo (who are working on the Invenio out-of-the-box repository solution), but the product was still at the pilot stage when we had to start our tender. We had discussions with Dryad, but Dryad’s offering at the time did not give us the functionality we required. Another university running an open source repository platform contacted us, but withdrew in the end – the tender process required too much bureaucracy.

We received no interest from other open source repository tool providers, despite utilising several data management and repository networks to share information about the tender and solicit broader participation.

Tender

The next step was to start the public tender process. Within the EU, this is a compulsory step for transparency and accountability purposes at any public institution making purchases over a certain threshold. The tender process is an exhausting hurdle. But it does offer the opportunity to describe exactly the services and guarantees which are essential. This is very useful for building some security against vendor lock-in.

We had already made the decision to retain servers at TU Delft for data storage. Additional requirements within the tender included the guarantee that all metadata would be CC0; that import and export formats (e.g. JSON, XML) and protocols (a good API) would be available and well documented; and that an escape strategy would be supplied by the winning bidder, demonstrating the measures that would be enacted for DOIs, data, user information, metadata and webpages should either party wish to leave the contract. The winning bidder also offered to make its code open source should it ever cease development. Such arrangements provide some flexibility for us; if conditions change in the future, we are in a position to change our technical infrastructure.
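
To make the escape strategy concrete: with documented export formats and a good API in place, a periodic off-platform metadata export can be scripted. The sketch below is illustrative only, assuming the listing endpoint and parameters of the public figshare v2 API; the group ID is a made-up placeholder.

```python
# Illustrative sketch of a periodic metadata export ("escape hatch"),
# assuming the public figshare v2 listing endpoint and its page/page_size
# parameters. The group ID identifying a repository is a made-up value.
import json
import requests

BASE = "https://api.figshare.com/v2"
GROUP_ID = 99999  # hypothetical group ID for the repository
records, page = [], 1

while True:
    resp = requests.get(
        f"{BASE}/articles",
        params={"group": GROUP_ID, "page": page, "page_size": 100},
        timeout=30,
    )
    resp.raise_for_status()
    batch = resp.json()
    if not batch:  # an empty page means we have everything
        break
    records.extend(batch)
    page += 1

# Keep a dated, off-platform copy of all metadata.
with open("metadata-export.json", "w") as fh:
    json.dump(records, fh, indent=2)
print(f"Exported {len(records)} records")
```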

Why figshare?

There were two main bidders in the tender process; both coming from the commercial sector. Figshare turned out to have the better and more mature solution, and was the winner of the tender process.

Collaboration with figshare

We have now started working together with figshare and are just about to complete the migration process from the old Fedora repository to figshare.

What we have already noticed is the professionalism and responsiveness of figshare colleagues. They have a team of developers devoted to the product. We are pleased with figshare’s integration capability – data in and out via APIs, which enables opportunities to connect our repository with other tools and products used by our research community.

We are also pleased to see that figshare are interested in receiving and considering user feedback. As a result of such feedback, they are now rethinking the whole metadata structure offered by the platform and considering potential future support for RDF-format datasets. Such a move could enable greater interoperability of data.

But. Figshare is not an open source tool and it is one of the products offered by a technology company called Digital Science. Digital Science is part of Holtzbrinck Publishing group, which also owns the publishing giant Springer Nature. As mentioned before, there are concerns within the community about publishers getting a strong grip on scholarly communication infrastructure and data infrastructure in particular.

Future

Short-term (our contract with figshare is for three years), figshare promises to deliver functionality for which our end-user communities have been waiting a long time. We are pleased to be working with them.

But long-term, we are still interested in the alternatives. There are a number of initiatives, for example Invest in Open Infrastructure, which aim to develop collaborative, viable and sustainable open source infrastructures for scholarly communications. These are crucial strategic investments for research institutions, and time, money and expertise should be devoted to such activities.

The broader research community is still in need of open source alternatives, which can be developed and sustained in a collaborative manner.

Capacity needed

However, someone needs to develop, organise and sustain long-term maintenance of such open source alternatives. Who will that be? There are many organisations facing this challenge. 

So we should really invest in our capacity to collaborate on open source projects. Only then will we be able to co-develop much-needed open source alternatives to proprietary products.

Short-term savings and long-term strategic plans are different matters and both require careful planning.

Lessons learnt

Finally, we also wanted to share some lessons learnt. 

More transparency and more evidence needed

Our first lesson learnt is that more transparency, and more evidence comparing the costs of running open source versus commercial infrastructures, are needed. Many say that commercial, managed infrastructures are cheaper. However, implementing such infrastructures does not happen at no cost: the efforts involved in migration, customisation, communication etc. are not negligible, and apply to both open source software and commercial platforms. One recent publication suggests that the effort needed to sustain one’s own open source infrastructure is comparable to that involved in implementing a third-party solution in an institutional setting.

We need more evidence-based comparisons of running such infrastructures in scholarly communications settings. 

Easy to criticise. Easy to demand. But we need working, sustainable solutions.

Finally, we have received some criticism over our decision to migrate to figshare, in particular from Open Science advocates. 

While we acutely appreciate, understand and wholeheartedly support the strategic preference for Open Source infrastructures at academic institutions and in information management in particular, viable alternatives to commercial products are not always available in the short term.

We need to talk more and share much-needed evidence and experience. We also need to invest in a skilled workforce and join forces to develop viable open source infrastructures for scholarly communications, which hopefully will be coordinated by umbrella organisations such as Invest in Open Infrastructure.

Running tender processes makes different workforce demands

While outsourcing solves the problem of a lack of developers, running an EU tender process creates other challenges. Tender processes are slow, cumbersome and require dedicated legal and procurement support. Discussions are no longer with in-house developers but with legal advisers. The procurement process requires numerous long documents, a forensic eye for detail, and an ability to explain and justify even the simplest functional demands. To ensure an equal and fair process, everything needs to be quantified. For example, one cannot simply require that an interface shows ‘good usability’ – the tender documents need to define good usability and indicate how it will be judged in the marking process.

If others are undertaking the same process, they may wish to consult the published version of the tender document.

We hope that the published tender document, as well as this blog post, might initiate greater discussion within the community about infrastructures for scholarly communication and encourage more sharing of evidence and experience.

Clarification

Added on 20 September 2020

We are grateful for all the comments and reactions we have received on our recent blog post “Why figshare? Choosing a new technical infrastructure for 4TU.ResearchData”.

Our intention behind the original post was to explain the processes behind our decision as honestly as we possibly could. However, some of the comments we received made us realise that we unfairly portrayed our colleagues from the Invenio and Dryad teams, as well as other colleagues supporting open source infrastructures. This is explained in the blog post “Sustainable, Open Source Alternatives Exist”, which was published as a reaction to our post. We apologise for this.

We did not mean to imply in our post that sustainable open source alternatives do not exist. That is not what we think or believe. We also did not mean to imply that open source and hosted are mutually exclusive. 

We wholeheartedly agree with the remark that tenders are bureaucratic hurdles. However, tender processes are often favoured by big public institutions, and the fact that open source infrastructure providers are rarely able to compete successfully in such processes is an issue.

In the future, we would like to be involved in discussions about making tender processes accessible and fair to open source providers, or how to make alternatives to tender processes acceptable at large public institutions.

Remote ReproHacking

Author: Esther Plomp

The first Remote ReproHack was held on the 14th of May 2020. About 30 participants joined the online party with the mission to learn more about reproducibility and to reproduce some papers! A ReproHack is a one-day event where participants aim to reproduce papers of their choice from a list of proposed papers whose authors have indicated that they would like to receive feedback. The ReproHack aims to provide a safe space for constructive feedback, so that it is a valuable learning experience for the participants and the authors.

Recent studies and surveys have indicated that scientific papers often cannot be reproduced because supporting data and code are inaccessible or incorrect (see for example the Nature survey results here). In computational research, only 26% of papers are reproducible (Stodden 2018). To learn more about how these numbers can be improved, I joined the first ReproHack in the Netherlands last year. During that ReproHack I managed to reproduce the figures from a physics paper on Majorana bound states by André Melo and colleagues. I must admit that most of the work was done by Sander, who was very patient with my beginner Python skills. This year, I was set on trying to reproduce a paper that made use of R, a language I have learned to appreciate more since attending the Repro2020 course earlier this year.

The Remote ReproHack started with welcoming the participants through signing in on an online text document (HackMD) where we could list our names, affiliations and Twitter/GitHub information. This way we could learn more about the other participants. The check-in document also provided us with the schedule of the day, the list of research papers from which we could choose to reproduce, and the excellent code of conduct. After this digital check-in and words of welcome, Daniel Nüst gave a talk about his work on improving the reproducibility of software and code. Next, Anna Krystalli, one of the organisers, took us through the process of reproducing and reviewing the papers during the ReproHacking breakout sessions. During these breakout sessions the participants were split up into smaller groups to work on the papers that they had selected to reproduce. It was also possible to try to reproduce a paper by yourself.

Slide on CODECHECK from the presentation by Daniel Nüst

10:00 – Welcome and Intro to Blackboard Collaborate
10:10 – Ice breaker session in groups
10:20 – TALK: Daniel Nüst – Research compendia enable code review during peer review (slides)
10:40 – TALK: Anna Krystalli – Tips and Tricks for Reproducing and Reviewing (slides)
11:00 – Select Papers
11:15 – Round I of ReproHacking (break-out rooms)
12:15 – Re-group and sharing of experiences
12:30 – LUNCH
13:30 – TALK: Daniel Piqué – How I discovered a missing data point in a paper with 8000+ citations
13:45 – Round II of ReproHacking (break-out rooms)
14:45 – COFFEE
15:00 – Round III of ReproHacking (break-out rooms) – Complete Feedback form
16:00 – Re-group and sharing of experiences
16:30 – TALK: Sarah Gibson – Sharing Reproducible Computational Environments with Binder (slides) (see also here for materials from a Binder Workshop)
16:45 – Feedback and Closing

We had ~15 minutes to decide which paper we would like to reproduce from a list of almost 50 papers! The group that I joined set out to reproduce the preprint by Eiko Fried et al. on mental health and social contact during the COVID-19 pandemic. Our group consisted of Linda Nab, one of the organisers of the ReproHack, Alessandro Gasparini (check out his work on INTEREST here if you work with simulations!), Anna Lohmann, Ciu and myself. The first session was spent finding out how we could download all the data and code from the Open Science Framework. After we had retrieved all the files, we had to download packages (or update R). During the second session we were able to do more reproducing rather than just getting set up. The work by Eiko Fried was well structured and documented, so after the initial setup problems, the process of reproducing the work went quite smoothly. In the end, we managed to reproduce the majority of the paper!
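
That first step, fetching a paper’s data and code from the Open Science Framework, can also be scripted. Below is a minimal sketch using the osfclient Python package (pip install osfclient); the project ID is a made-up placeholder (it is the short code in the project’s osf.io URL).

```python
# A minimal sketch using osfclient to download a project's files from the
# Open Science Framework. The project ID "abcde" is a made-up placeholder.
import os
from osfclient import OSF

project = OSF().project("abcde")  # hypothetical OSF project ID
storage = project.storage("osfstorage")

for f in storage.files:
    local_path = f.path.lstrip("/")
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    with open(local_path, "wb") as fh:
        f.write_to(fh)  # stream the file contents from OSF
    print("downloaded", local_path)
```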

Tweet by Eiko Fried on his experiences on submitting a paper for feedback to the Remote ReproHack.

In the third session, feedback was provided to the authors of the papers being reproduced, using the feedback form that the ReproHack team had set up. This form contained questions about which paper was chosen, whether the participants were able to reproduce it, and how much of the paper was reproduced. In more detail, we could describe which procedures, tools, operating systems and software we used to reproduce the paper, and how familiar we were with these. We also had to rate the reusability of the material, and indicate whether the material had a licence. A very important section of the feedback form asked which challenges we ran into while trying to reproduce the paper, and what the positive features were. A separate section was dedicated to how well the data and code were documented. Additional suggestions and comments to improve the reproducibility were also welcomed.

After everyone returned from the last breakout sessions and filled in their feedback forms, the groups took turns discussing whether they had been able to reproduce the papers they had chosen and, if not, which challenges they had faced. Most of the selected papers were reproduced by the participants. It was noted that proper documentation, such as readme files, manuals and comments in the scripts themselves explaining the correct operating instructions, was especially helpful in reproducing someone else’s work.

Another way of improving the quality and reproducibility of research is to ask your colleagues to reproduce your findings and offer them a co-author position (see this paper by Reimer et al. (2019) for more details on the ‘co-pilot system’). Some universities have dedicated services for checking code and data before they are published (see this service at Cornell University).

There are also several tools available to check and clean your data.

If you would like to learn more about ReproHacks, the Dutch ReproHack team wrote a paper on the Dutch ReproHack in November 2019. If you would like to participate, organise your own ReproHack, or contribute to the ReproHack work, the ReproHack team invites contributions on GitHub.

Anna Krystalli also provided the Remote ReproHack participants with additional resources for improving the reproducibility of our own papers.

FAIRsharing: how to contribute to standards?

Contributors in order of chronological contribution: Esther Plomp, Paula Martinez Lavanchy, Marta Teperek, Santosh Ilamparuthi, and Yasemin Turkyilmaz-van der Velden.

FAIRsharing organised a workshop for the Data Stewards and Champions at TU Delft on the afternoons of the 11th and 12th of June. We were joined by colleagues from University of Stuttgart, RWTH Aachen University, Technical University of Denmark (DTU), and the Swiss Federal Institute of Technology Lausanne (EPFL).

FAIRsharing is a cross-disciplinary platform that houses manually curated metadata on standards, databases and data policies. FAIRsharing works together with a large community that can add their metadata standards, policies and databases to the platform. You can view the introduction presentation here (see here for the slides).

During the first day of the workshop, which was led by Peter McQuilton, there was a demonstration of how to search FAIRsharing and how to apply the standards found there. The curation activities around the standards and databases in FAIRsharing were also explained in detail. On the second day, the participants discussed how to develop standards when no community-endorsed standards are available, and how to contribute a standard to FAIRsharing. You can view a recording of the second day here (slides available here).

Day 1: FAIR and FAIRsharing

For anyone that has never heard of the FAIR principles (Findable, Accessible, Interoperable and Reusable), a short explanation is outlined below:

Findable

  • For your information/data to be findable, it needs to be discoverable on the web
  • It needs to be accompanied by a unique persistent identifier (e.g., DOI)

Accessible

  • For your information/data to be accessible, it needs to be clearly defined how this would be possible, and appropriate security protocols need to be in place (especially important for sensitive data which contains personal information)

Interoperable

  • For your information/data to be interoperable, it needs to be machine-actionable; it needs to be structured in a way that not only humans but also software/machines can interact with (a toy example follows this list)
  • Your data can be more easily integrated with the data of other researchers when you use community-adopted standards (formats and guidelines, such as for a report or publication)
  • You should link your information/data to other relevant resources

Reusable

  • For your information/data to be reusable, it needs to be clearly licensed, well documented and the provenance needs to be clear (for example, found in a community repository)
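
To make ‘machine-actionable’ concrete, here is a toy example of a dataset description that software, not just humans, can act on. The field names loosely follow the DataCite metadata schema; all values are invented for the example.

```python
# Illustrative only: a dataset description structured so that software, not
# just humans, can act on it. Field names loosely follow the DataCite
# metadata schema; every value here is invented.
import json

record = {
    "identifier": {"identifier": "10.4121/000000", "identifierType": "DOI"},  # fake DOI
    "creators": [{"creatorName": "Doe, Jane", "affiliation": "TU Delft"}],
    "titles": [{"title": "Example sensor readings"}],
    "publicationYear": "2020",
    "rightsList": [{"rights": "CC BY 4.0"}],  # Reusable: an explicit licence
    "relatedIdentifiers": [  # Interoperable: explicit links to other resources
        {"relatedIdentifier": "10.1000/xyz123", "relationType": "IsSupplementTo"}
    ],
}

# The serialised form is what repositories expose and harvesters parse.
print(json.dumps(record, indent=2))
```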

During our workshop, the FAIRsharing team highlighted that in order to make data truly FAIR, we need data standards! FAIRsharing helps people find disciplinary standards and provides support on applying them. Delphine Dauga highlighted that it is important for communities to share vocabularies in order to communicate effectively with each other, as well as with machines. You can view the recording of her talk on the curation process of standards on FAIRsharing.org on YouTube.

You can contribute to FAIRsharing by adding standards. During the workshop we were guided by Allyson Lister through this process.

FAIRsharing also allows one to see the relationships between objects, which can be used to see how widely adopted a standard is.

Example graph of the repositories recommended by PLOS.

Day 2: How to contribute to, or develop, a standard?

To start off the day, a definition of “a standard” was given by Susanna Sansone: a standard is an agreed-upon convention for doing ‘something’, established by community consensus or an authority. For example, nuts and bolts currently follow international standards that outline their sizes, but this was not always the case (see below)!

Image from the slides by Susanna Sansone.

When you cannot find an applicable standard and you are ready to work on a new one, you should set up a community of governance for the standard. This means establishing a group of individuals with specific roles and tasks to work on the standard. Groups that develop standards should have a code of conduct to operate successfully (for example, see The Turing Way code of conduct). There are different directions the group can take: one is to work under established or formal organisations that produce standards which might be adopted by industry (think of the standards that govern the specifications of a USB drive); the other is grass-roots groups that form bottom-up communities. There are advantages and limitations to both. Formal organisations already have development processes in place, which may not be flexible but can engender greater trust in end users. Grass-roots groups, while not having an established community to begin with, provide greater flexibility and are often the route taken when developing research-level standards.

Development of a standard requires time and commitment

The standard needs to be tested and open to feedback, possibly multiple times over a long period. The group needs to generate a web presence and share the different versions of the standard, ideally in a place where people can contribute to these versions (e.g., GitHub). It is desirable to use multiple communication channels to facilitate broad and inclusive contributions. These contributions do not stop when the standard is developed: it will need to be maintained, and new requests for changes and contributions will have to be implemented. To maintain momentum, one should set clear timelines and ensure that there are moments where more intensive discussions can take place. The governance group also needs to be sustainable. Sustainability can be ensured by dedicated funding, or by identifying other ways to guarantee the maintenance of the group.

Community engagement

When working on new standards, it is good to first look at existing standards, such as those from ISO/TC 276, ISA, IEEE, ASTM or ANSI, and to release any technical documentation that you have, with practical examples, so that all community members can understand what needs to be done and contribute effectively. It also helps to create educational materials for diverse stakeholders to make it easier for them to engage with the development of the standard.

The success of grass-roots governance groups depends on their ability to sustain the work in all phases, reward and incentivise all contributors, and deliver a standard that is fit for purpose. Success is thus not primarily a matter of technical development: it also depends on how well you are able to set up and maintain a community that contributes to the standard. After all, a standard is not going to adopt itself!

If you need more information on how you can maintain an (online) community, you can see this blog for some more pointers. 

FAIRsharing continues to grow and work with the community to ensure the metadata captured therein is as comprehensive and accurate as possible. To help with this, FAIRsharing is looking for users with specific domain experience to help with the curation of appropriate resources into FAIRsharing. This new initiative, to recruit Community Curators, will roll out over the summer. Please contact them (contact@fairsharing.org) to find out more!

Latest developments

The recommendation filter in the advanced search options on FAIRsharing.org.

FAIRsharing is in the process of being integrated with DMPonline. The team is also setting up a collection, on FAIRassist.org, of all the available tools for assessing whether digital objects are FAIR. FAIRsharing is also working on standard criteria for recommending data repositories (see below) so that publishers can assess whether they should endorse a certain data repository.

FAIRsharing is currently being redesigned, with a new version being released by the end of 2020, and they are always happy to hear from you (through email, Facebook or Twitter) what is still missing!