Written by Marta Teperek & Alastair Dunning
4TU.ResearchData is an international repository for research data in science, engineering and design. After more than 10 years of using Fedora, an open source repository system, to run 4TU.ResearchData, we have decided to migrate a significant part of our technical infrastructure to a commercial solution offered by figshare. Why did we decide to do it? And why now, at a time of increasing concern about relying on proprietary solutions, particularly those associated with large publishing houses, to run scholarly communication infrastructures? (See, for example, “In pursuit of open science, open access is not enough” and the SPARC Landscape Analysis.)
We anticipate that members of our community, as well as colleagues who use or manage scholarly communications infrastructures, might be wondering the same. We are therefore explaining our thinking in this blog post, hoping it will facilitate more discussion about such developments in the scholarly communications infrastructure.
Why not continue with Fedora?
So, first, why not continue with Fedora? Any software, but open source software in particular, needs to be maintained. It’s a tough process. Maintenance means developers, who are often difficult to retain within academic environments because industry offers more competitive salaries; other developers, approaching retirement, are irreplaceable. We also faced the challenge of migrating to the next version of Fedora, a significant undertaking simply to keep the repository running.
At the same time, researchers started requesting additional functionality: better statistics, restricted access to confidential datasets, integration with GitHub, among many others. With insufficient development capacity, it proved increasingly challenging to keep up with these demands. Undertaking a public tender for a managed repository platform, where development efforts could be outsourced to the entity providing the repository solution, looked like the best way to deal with these twin challenges.
Why not Zenodo or Dryad? Or another open source repository?
Open source advocates may ask why we did not opt for an open source repository solution. We tried hard. We were in discussion with Zenodo (who are working on the Invenio out-of-the-box repository solution), but the product was still at the pilot stage when we had to start our tender. We had discussions with Dryad, but Dryad’s offering at the time did not give us the functionality we required. Another university running an open source repository platform contacted us, but they withdrew in the end because the tender process required too much bureaucracy.
We received no interest from other providers of open source repository tools, despite using several data management and repository networks to share information about the tender and solicit broader participation.
The next step was to start the public tender process. Within the EU, this is a compulsory step for transparency and accountability purposes at any public institution making purchases over a certain threshold. The tender process is an exhausting hurdle. But it does offer the opportunity to describe exactly the services and guarantees which are essential. This is very useful for building some security against vendor lock-in.
We had already made the decision to retain servers at TU Delft for data storage. Additional requirements within the tender included a guarantee that all metadata would be CC0; that import and export formats (e.g. JSON, XML) and protocols (a good API) would be available and well documented; and that the winning bidder would supply an escape strategy, demonstrating the measures that would be enacted for DOIs, data, user information, metadata and webpages should either party wish to leave the contract. The winning bidder also offered to make its code open source should it ever cease development. Such arrangements provide some flexibility for us: if conditions change in the future, we are in a position to change our technical infrastructure.
There were two main bidders in the tender process, both from the commercial sector. Figshare turned out to have the better and more mature solution, and was the winner of the tender process.
Collaboration with figshare
We have now started working together with figshare and are just about to complete the migration process from the old Fedora repository to figshare.
What we have already noticed is the professionalism and responsiveness of figshare colleagues. They have a team of developers devoted to the product. We are pleased with figshare’s integration capability – data in and out via APIs, which enables opportunities to connect our repository with other tools and products used by our research community.
We are also pleased to see that figshare are interested in receiving and considering user feedback. As a result of such feedback, they are rethinking the whole metadata structure offered by the platform and are considering potential future support for RDF format datasets. Such a move could enable greater interoperability of data.
But. Figshare is not an open source tool and it is one of the products offered by a technology company called Digital Science. Digital Science is part of Holtzbrinck Publishing Group, which also owns the publishing giant Springer Nature. As mentioned before, there are concerns within the community about publishers getting a strong grip on scholarly communication infrastructure, and data infrastructure in particular.
In the short term (our contract with figshare is for three years), figshare promises to deliver functionalities that our end user communities have long been waiting for. We are pleased to be working with them.
But in the long term we are still interested in alternatives. There are a number of initiatives, for example Invest in Open Infrastructure, which aim to develop collaborative, viable and sustainable open source infrastructures for scholarly communications. These are crucial strategic investments for research institutions, and time, money and expertise should be devoted to such activities.
The broader research community is still in need of open source alternatives, which can be developed and sustained in a collaborative manner.
However, someone needs to develop, organise and sustain long-term maintenance of such open source alternatives. Who will that be? There are many organisations facing this challenge.
So we should really invest in our capacity to collaborate on open source projects. Only then will we be able to co-develop much-needed open source alternatives to proprietary products.
Short-term savings and long-term strategic plans are different matters and both require careful planning.
Finally, we also wanted to share some lessons learnt.
More transparency and more evidence needed
Our first lesson learnt is that more transparency and more evidence comparing the costs of running open source versus commercial infrastructures are needed. Many say that commercial, managed infrastructures are cheaper. However, implementing such infrastructures is not free. The efforts involved in migration, customisation, communication etc. are not negligible, and they apply to both open source software and commercial platforms. One recent publication suggests that the effort needed to sustain one's own open source infrastructure is comparable to that involved in implementing a third-party solution in an institutional setting.
We need more evidence-based comparisons of running such infrastructures in scholarly communications settings.
Easy to criticise. Easy to demand. But we need working, sustainable solutions.
We have also received some criticism over our decision to migrate to figshare, in particular from Open Science advocates.
While we acutely appreciate, understand and wholeheartedly support the strategic preference for Open Source infrastructures at academic institutions and in information management in particular, viable alternatives to commercial products are not always available in the short term.
We need to talk more and share much-needed evidence and experience. We also need to invest in a skilled workforce and join forces to develop viable solutions for open source infrastructures for scholarly communications, which hopefully will be coordinated by umbrella organisations such as Invest in Open Infrastructure.
Running tender processes makes different workforce demands
While outsourcing solves the problem of lack of developers, running an EU tender process creates other challenges. Tender processes are slow, cumbersome and require dedicated legal and procurement support. Discussions are no longer with in-house developers but with legal advisers. The procurement process requires numerous long documents, a forensic eye for detail, and an ability to explain and justify even the simplest functional demands. To ensure an equal and fair process, everything needs to be quantified. For example, one cannot simply require that an interface shows ‘good usability’ – the tender documents need to define good usability and indicate how it will be judged in the marking process.
If others are undertaking the same process, they may wish to consult the published version of the tender document.
We hope that the published tender document, as well as this blog post, might initiate greater discussion within the community about infrastructures for scholarly communication and encourage more sharing of evidence and experience.
Added on 20 September 2020
We are grateful for all the comments and reactions we have received on our recent blog post “Why figshare? Choosing a new technical infrastructure for 4TU.ResearchData”.
Our intention behind the original post was to explain the processes behind our decision, as honestly as we possibly could. However, some of the comments we received made us realise that we unfairly portrayed our colleagues from the Invenio and Dryad teams, as well as other colleagues supporting open source infrastructures. This is explained in the blog post “Sustainable, Open Source Alternatives Exist”, which was published as a reaction to our post. We apologise for this.
We did not mean to imply in our post that sustainable open source alternatives do not exist. That is not what we think or believe. We also did not mean to imply that open source and hosted are mutually exclusive.
We wholeheartedly agree with the remark that tenders are bureaucratic hurdles. However, tender processes are often favoured by big public institutions. The lack of open source infrastructure providers able to compete successfully in such processes is an issue.
In the future, we would like to be involved in discussions about making tender processes accessible and fair to open source providers, or about how to make alternatives to tender processes acceptable at large public institutions.