August 18, 2020

Why figshare? Choosing a new technical infrastructure for 4TU.ResearchData

Written by Marta Teperek & Alastair Dunning

4TU.ResearchData is an international repository for research data in science, engineering and design. After over 10 years of using Fedora, an open source repository system, to run 4TU.ResearchData, we have decided to migrate a significant part of our technical infrastructure to a commercial solution offered by figshare. Why did we decide to do it? And why now, at a time of increasing concerns about relying on proprietary solutions, particularly those associated with large publishing houses, to run scholarly communication infrastructures? (See, for example, “In pursuit of open science, open access is not enough” and the SPARC Landscape Analysis.)

We anticipate that members of our community, as well as colleagues who use or manage scholarly communications infrastructures, might be wondering the same. We are therefore explaining our thinking in this blog post, in the hope that it will facilitate more discussion about such developments in scholarly communications infrastructure.

Why not continue with Fedora?

So, first, why not continue with Fedora? Any software, but open source software in particular, needs to be maintained. It’s a tough process. Maintenance means developers, who are often difficult to retain within academic environments because industry offers more competitive salaries; other developers, approaching retirement, are irreplaceable. We also faced the challenge of migrating to the next version of Fedora, a significant effort simply to keep the repository running.

At the same time, researchers started requesting additional functionality: better statistics, restricted access to confidential datasets and integration with GitHub, among many others. With insufficient development capacity, it proved increasingly challenging to keep up with these demands. Undertaking a public tender for a managed repository platform, where development efforts could be outsourced to the entity providing the repository solution, looked like the best way to deal with these twin challenges.

Why not Zenodo or Dryad? Or another open source repository?

Open source advocates may ask why we did not try open source repository solutions. We tried hard. We were in discussion with Zenodo (who are working on the Invenio out-of-the-box repository solution), but the product was still at the pilot stage when we had to start our tender. We had discussions with Dryad, but Dryad’s offering at the time did not give us the functionality we required. Another university running an open source repository platform contacted us, but they withdrew in the end because the tender process required too much bureaucracy.

We received no interest from other providers of open source repository tools, despite using several data management and repository networks to publicise the tender and solicit broader participation.

Tender

The next step was to start the public tender process. Within the EU, this is a compulsory step for transparency and accountability purposes at any public institution making purchases over a certain threshold. The tender process is an exhausting hurdle. But it does offer the opportunity to describe exactly the services and guarantees which are essential. This is very useful for building some security against vendor lock-in.

We had already decided to retain servers at TU Delft for data storage. Additional requirements within the tender included the guarantee that all metadata was CC0; that import and export formats (e.g. JSON, XML) and protocols (a good API) would be available and well documented; and that an escape strategy would be supplied by the winning bidder, demonstrating the measures that would be enacted for DOIs, data, user information, metadata and webpages should either party wish to leave the contract. The winning bidder also offered to make its code open source should it ever cease development. Such arrangements provide some flexibility for us: if conditions change in the future, we are in a position to change our technical infrastructure.

Why figshare?

There were two main bidders in the tender process, both from the commercial sector. Figshare turned out to have the better and more mature solution, and won the tender.

Collaboration with figshare

We have now started working together with figshare and are just about to complete the migration process from the old Fedora repository to figshare.

What we have already noticed is the professionalism and responsiveness of figshare colleagues. They have a team of developers devoted to the product. We are pleased with figshare’s integration capability – data in and out via APIs, which enables opportunities to connect our repository with other tools and products used by our research community.

We are also pleased to see that figshare are interested in receiving and considering user feedback. As a result of such feedback, they are rethinking the whole metadata structure offered by the platform and are considering potential future support for RDF format datasets. Such a move could enable greater interoperability of data.

But figshare is not an open source tool. It is one of the products offered by a technology company called Digital Science. Digital Science is part of Holtzbrinck Publishing Group, which also owns the publishing giant Springer Nature. As mentioned before, there are concerns within the community about publishers getting a strong grip on scholarly communication infrastructure, and data infrastructure in particular.

Future

In the short term (our contract with figshare is for three years), figshare promises to deliver functionality for which our end user communities have been waiting for a long time. We are pleased to be working with them.

But in the long term we are still interested in the alternatives. There are a number of initiatives, for example Invest in Open Infrastructure, which aim to develop collaborative, viable and sustainable open source infrastructures for scholarly communications. These are crucial strategic investments for research institutions, and time, money and expertise should be devoted to such activities.

The broader research community is still in need of open source alternatives, which can be developed and sustained in a collaborative manner.

Capacity needed

However, someone needs to develop, organise and sustain long-term maintenance of such open source alternatives. Who will that be? There are many organisations facing this challenge.

So we should really invest in our capacity to collaborate on open source projects. Only then will we be able to co-develop much needed open source alternatives to proprietary products.

Short-term savings and long-term strategic plans are different matters and both require careful planning.

Lessons learnt

Finally, we also wanted to share some lessons learnt.

More transparency and more evidence needed

Our first lesson learnt is that we need more transparency and more evidence comparing the costs of running open source versus commercial infrastructures. Many say that commercial, managed infrastructures are cheaper. However, implementing such infrastructures is not free. The efforts involved in migration, customisation, communication etc. are not negligible and apply to both open source software and commercial platforms. One recent publication suggests that the effort needed to sustain one’s own open source infrastructure is comparable to that involved in implementing a third party solution in an institutional setting.

We need more evidence-based comparisons of running such infrastructures in scholarly communications settings.

Easy to criticise. Easy to demand. But we need working, sustainable solutions.

Finally, we have received some criticism over our decision to migrate to figshare, in particular from Open Science advocates.

While we acutely appreciate, understand and wholeheartedly support the strategic preference for Open Source infrastructures at academic institutions and in information management in particular, viable alternatives to commercial products are not always available in the short term.

We need to talk more and share much needed evidence and experience. We also need to invest in a skilled workforce and join forces to work together on developing viable solutions for open source infrastructures for scholarly communications, which hopefully will be coordinated by umbrella organisations such as Invest in Open Infrastructure.

Running tender processes makes different workforce demands

While outsourcing solves the problem of lack of developers, running an EU tender process creates other challenges. Tender processes are slow, cumbersome and require dedicated legal and procurement support. Discussions are no longer with in-house developers but with legal advisers. The procurement process requires numerous long documents, a forensic eye for detail, and an ability to explain and justify even the simplest functional demands. To ensure an equal and fair process, everything needs to be quantified. For example, one cannot simply require that an interface shows ‘good usability’ – the tender documents need to define good usability and indicate how it will be judged in the marking process.

If others are undertaking the same process, they may wish to consult the published version of the tender document.

We hope that the published tender document, as well as this blog post, might initiate greater discussion within the community about infrastructures for scholarly communication and encourage more sharing of evidence and experience.

Clarification

Added on 20 September 2020

We are grateful for all the comments and reactions we have received on our recent blog post “Why figshare? Choosing a new technical infrastructure for 4TU.ResearchData”.

Our intention behind the original post was to explain the processes behind our decision, as honestly as we possibly could. However, some of the comments we received made us realise that we unfairly portrayed our colleagues from the Invenio and Dryad teams, as well as other colleagues supporting open source infrastructures. This is explained in the blog post “Sustainable, Open Source Alternatives Exist”, which was published as a reaction to our post. We apologise for this.

We did not mean to imply in our post that sustainable open source alternatives do not exist. That is not what we think or believe. We also did not mean to imply that open source and hosted are mutually exclusive.

We wholeheartedly agree with the remark that tenders are bureaucratic hurdles. However, tender processes are often favoured by big public institutions. The lack of open source infrastructure providers able to compete successfully in such processes is an issue.

In the future, we would like to be involved in discussions about making tender processes accessible and fair to open source providers, or how to make alternatives to tender processes acceptable at large public institutions.

15 comments

August 19, 2020 - 3:31 pm Ingeborg Verheul

Thanks for the open sharing of your considerations during this process. Since we are also using Figshare at the Library UvA/HvA I would be happy to stay in touch with you – also on thinking about future solutions

August 19, 2020 - 5:46 pm Mariëtte van Selm

Welcome to the Dutch figshare family! I recognise all you write about the tender process being long, cumbersome, tiring and, at times, plain frustrating – we went through the same experience in Amsterdam, and let’s just say I’ve learned a lot 😉 We’ve just decided to continue with figshare after our first four-year contract; your signing up with figshare certainly helped that decision, as did our experiences with figshare colleagues so far: professional, responsive and feedback-friendly, they indeed check all those boxes!

One thing though: you write “Figshare is not an open source tool and its mother company, Digital Science, is part of Holtzbrinck Publishing group, which also owns the publishing giant Springer Nature.” That’s not completely accurate. Yes, Digital Science is part of the Holtzbrinck group, but it’s *not* figshare’s “mother company” – Digital Science is figshare’s main investor and figshare uses some of its infrastructure (billing and administration), but it’s *not* a majority shareholder in figshare LLP. Holtzbrinck or Digital Science can decide whatever they want, but that doesn’t mean figshare will do or has to do it.

As for the open source tool: no, figshare’s not, and figshare’s founder Mark Hahnel – in a meeting in Amsterdam when asked why figshare’s source code wasn’t open – gives a compelling reason for that: to prevent abuse of the code by shady commercial parties. I believe him, because up until now I haven’t been able to catch him or his figshare colleagues – and be sure, I’ve been looking – out on anything that goes against Open Science.

- August 23, 2020 - 8:23 pm martateperek
  
  Thank you Mariëtte for your comments and for flagging up the mistake – I have now corrected to say that figshare is one of the products offered by Digital Science 🙂
  
August 25, 2020 - 10:22 am KU Leuven

Thanks Marta and Alastair for sharing your experience in such great detail. Very relevant for the academic community!

August 27, 2020 - 1:50 pm Pingback: Sustainable, Open Source Alternatives Exist | Dryad news and views
August 31, 2020 - 3:21 pm John Chodacki (@chodacki)

It seems the main takeaway is that you created a tender process knowing that open solutions wouldn’t enter, so you ended up choosing a closed platform. That in no way reflects poorly on open solutions though.

I am glad Zenodo and Dryad have posted a response: https://blog.zenodo.org/2020/08/27/2020-08-28-4tu-response/ Too often these synopses are left without context.

August 31, 2020 - 4:07 pm David Nicholson

I would encourage people making similar decisions to read this reply post from Dryad and Zenodo
https://blog.datadryad.org/2020/08/26/sustainable-infrastructure-exists/

September 3, 2020 - 8:29 am Nico Poppelier

Just curious: why doesn’t this article mention Dataverse, which is offered in The Netherlands by DANS as a (paid) service to all academic institutions?

September 4, 2020 - 2:14 pm Pingback: Weekly digest: what’s happening in open science? – Open Pharma
September 5, 2020 - 8:28 am nemobis

Thank you for discussing this in the open. It seems that the tender failed to attract open source offers because it was written in a way that assumed only closed-source commercial providers existed.

For instance, «8.1 Price component 1: Yearly License fee […] The price quoted by the tenderer for price component 1 is restricted to a minimum price of € 100.000,-excl. VAT and a maximum price of € 175.000,-excl. VAT».

Already the idea that the cost of the software is the “license” (although actually this item includes maintenance and support) is a dogmatic article of faith of the proprietary software providers. It’s also not clear how the price component accurately reflects the real costs of the supply (as in Total Cost of Ownership/TCO).

I cannot tell whether some of the other requirements (such as an insurance for 3 M€, or past similar projects) are mandated by law or not and whether they artificially restricted the competition. I can see however that no preference for open source solutions was reflected in the scoring; this might be legal or not, but for sure it determined the result. Finally, the section on GDPR doesn’t ask where there is data transfer outside the EU, so the entire edifice can easily collapse after the recent EU Court of Justice rulings.

Whoever causes his own misfortune should weep for himself.

September 15, 2020 - 9:06 am Heila Pienaar

Pienaar, H. et al. 2017. Criteria and evaluation of research data repository platforms @ the University of Pretoria, South Africa. Research Data Alliance 9th Plenary Meeting, Repository Platforms for Research Data IG session, 7 May 2017, Barcelona, Spain. https://www.rd-alliance.org/system/files/documents/3%20RDAevaluationrepositoriesUP.pdf

We also went through an extensive process and we did evaluate quite a few open source solutions.

September 28, 2020 - 1:44 pm Pingback: Bulletin de veille - Septembre 2020 - DATACC
October 6, 2020 - 11:57 am Pingback: Celebrating 4TU.ResearchData’s Role in Fostering Open Science | Open Working
October 20, 2020 - 11:11 am Pingback: Promoting an inclusive market place with the Research Repositories Dynamic Purchasing System - Research infrastructure and data