Comments on the European Open Science Cloud Rules of Participation
Authors: Wilma van Wezenbeek, Alastair Dunning, Marta Teperek
Date: 3 August 2018
- Report: Prompting EOSC in Practice (Rules of Participation are on page 28): https://www.eudat.eu/sites/default/files/prompting_an_eosc_in_practice_eosc_hleg_interim_report.pdf
- EOSC Open Consultation website: https://eoscpilot.eu/open-consultation
Structure of our comments
- Overarching comments
- Specific comments on the Report Prompting EOSC in Practice
- Comments on specific Rules of Participation
Overarching comments
- Overall, the report is a useful outline of the vision for the European Open Science Cloud (EOSC), with concrete proposals for the rules of participation.
- Recommendations for working with commercial partners need to be carefully thought through. In particular, mechanisms need to be in place to ensure that commercial partners participating in the EOSC do so with the same rights as non-commercial partners.
- Care needs to be taken when deciding on recommendations for Member States and for research communities. The latter are always international in nature.
- It is not clear why additional intermediaries are required for managing financial contributions to the EOSC partners. Couldn’t transactions be arranged without additional intermediaries, which increase the costs and complexity of the ecosystem?
- In some places the recommendations are unclear and lack structure and alignment. It feels as if the founding vision for the EOSC is not clear enough to create harmonised rules for participation.
- The chapters on business model and financing feel “far away” from today’s practice.
Specific comments on the Report Prompting EOSC in Practice:
Pages 8 and 9 (Executive Summary)
“To help drive forward and implement the EOSC, the main thread of the report is to understand how the EOSC can effectively interlink People, Data, Services and Training, Publications, Projects and Organisations.”
- Should “labs and instruments” and “places” be added there as well?
“The EOSC should implement “whatever works” and do “whatever it takes” to increase the availability and volume of quality & user-friendly scientific information on-line.”
- Why should the EOSC focus on “volume”? This is an odd phrase to use.
“Define an EOSC Quality of Service (QoS) standards, separate for all elements of the ecosystem (data, data access services, software, etc.), to develop a trustable ecosystem.”
- Publications should also be part of the EOSC ecosystem.
“Introduce, as part of EOSC’s mission, that a state of the art analysis is carried out on a national level within the Member States for assessing statistics and key assets around the composition and relevant clustering of the community of users, with the respective eInfrastructures & research infrastructures & scientific communities.”
- Why do this on a national (Member State) level? That is not the way these communities are constructed, or how science works.
“The universal entry point to the EOSC should provide access to a marketplace of efficient and effective services, with lightweight integration (authentication and authorization infrastructure, order management, etc…) where service providers can find service users and vice versa. Nothing is wrong with a number of multiple entry points which should be seen as a plus rather than a negative fragmentation.”
- This recommendation is unclear. We want a universal entry point, but promote decentralised ones?
“Introduce a regular assessment of EOSC against other alternatives, including commercial providers. This could be made to either enhance an EOSC Service, or to support new Services;”
- Alternatives to EOSC? This recommendation is unclear.
“Build a workforce able to execute the vision of the EOSC by ensuring data stewards, data and infrastructure technologists and scientific data experts who are trained and supported adequately.”
- Here, and also in the six action lines on page 14, we notice an expansion of what the EOSC should be or become, and of what it envisions. The question is whether that will actually help pace and clarity. Is the EOSC not slowly taking over all the topics laid down in the OSPP?
“All activities mentioned above have a stronger focus on research data as opposed to services for research data management.” (page 17)
- Why would “research data” and “research data management” be presented as opposing activities?
“Flexible ways to access and share data and direct access to fast networks to do so are at the top of the agenda for researchers.” (3.2, page 19)
- This is not true at all. What about the inability to find datasets because of a lack of interoperability and integrated resources? Or the lack of recognition for good data management? Or not being rewarded for doing thorough and reproducible research?
3.4. Governance (page 21)
“Cooperation is needed between end user, service providers, and funding agencies / policy makers.”
- And what about the organisations that represent the end users?
- How will the depicted layers and other existing temporary or structural governing bodies (e.g. OSPP, Science Europe, ALLEA, EUA) work together?
4.2 Business model (page 23)
“The EOSC Business model is a critical non-technical element that will determine the success of the EOSC vision.”
- Scoping principles around the business model requirements are also needed, to outline governance structure (of the infrastructure or service itself), community involvement, sustainability (see the paper by Bilder and Neylon), ownership and openness.
“The currents model for provisioning access to Research Infrastructures is based on the guidelines contained in the Charter for Access, where three main models are described” and “a model based on the Wide Access mode modulated by a negotiated, agreeable Access restriction, is the pragmatic way to start moving with the EOSC. Private providers willing to provide resources within the EOSC framework will envision a Market-Driven approach to support users.”
- Also with reference to the guiding principles and the business model, it seems good to set Wide Access as the default, and to decide jointly where exceptions are allowed. Simply saying that private providers will envision a Market-Driven approach seems to go against Rule of Participation 5.1, which states that “Private sector users should be considered stakeholders in the EOSC as well as participants from the start, not added after (…). By participating, private sector may want to invest in the long-term development and sustainability of the EOSC, along with the public sector and not just serve to exploit public data for free.”
- Also, while the Excellence-Driven and Market-Driven Access models are well defined, the principles behind the Wide Access model need to be better articulated.
“To coordinate acquirement, the EOSC and member states would also certify one or more brokers to manage the acquisition, distribution and payment for EOSC vouchers. These brokers could be government agencies in member states, entities within member states, transnational governments or private firms” (p.25)
- Why would it be necessary to involve brokers in the process?
4.3 Funding Model and Payment Mechanisms (page 25)
“…similarly to how YouTube pays people who upload videos based on how many times they are viewed.”
- To get paid by YouTube you need to register an account and be compliant, so payment is not the default situation. We would plead for a baseline or initial situation based on reciprocity, rather than immediately starting with payments.
- It is difficult to judge which model would be the best fit. Also, guiding principles are needed here, e.g. transparency, efficiency and simplicity. Is there a way to avoid the giant profit margins being made by some players in the scientific publications industry? What principle should be used to achieve this? What have we learned from the big deals? We want researchers (end users) to be cost-aware, without stressing them with workflow troubles and micropayments.
- In Direct Support: the disadvantage “Resources can have internal foci, reducing access from outside stakeholders” could easily be overcome by establishing clear funding rules that demand equal access rights for internal and external stakeholders.
- In Direct Support: the disadvantage “Burdensome for commercial entities, even where they could provide significant cost savings and be incentivized to innovate.” is unclear. Why burdensome, and why specifically for commercial entities?
Comments on specific Rules of Participation (from p. 28 of the report Prompting EOSC in Practice)
5.1 Federating the existing infrastructures
“Private sector users should be considered stakeholders in the EOSC as well as participants from the start, not added after (…). By participating, private sector may want to invest in the long-term development and sustainability of the EOSC, along with the public sector and not just serve to exploit public data for free.
Brokers would be obliged to behave in a disinterested fashion with all providers. Entities that establish brokers must require that the broker does not establish a monopoly, or fall under the control of a service provider that then uses their influence to exclude other service providers from the marketplace.”
- How is this going to be achieved in practice?
5.2 Eligibility criteria for actors
“Key rules for participants therefore will include”
- These rules for actors are also interlinked with the eligibility criteria for data and for service providers. Perhaps it would be valuable to map them.
- Identifiers and Metadata:
“While maintenance of this metadata is fundamentally the responsibility of the submitter of data or other digital objects…”
- Why would maintenance of metadata be the responsibility of the submitter of data and not of the data service provider/repository?
5.3 Participation according to the business model
“The development of novel capabilities, long-term storage/maintenance of data resources and fixed cost capabilities are likely to be provided using direct payments to organisations setting up nodes in the EOSC. By contrast, numerous research activities by individual investigators may be supported via EOSC vouchers. Nodes in the EOSC will have to be able to engage with the business model. This will probably imply a business arrangement with the brokers set up by funding agencies in order to accept these vouchers as payment.”
- Why are brokers needed? Couldn’t transactions be arranged without additional intermediaries, which increase the costs and complexity of the ecosystem?
“As the submitters control access, they retain liability for data leakage and to ensure that relevant individuals accessing information meet the necessary requirements.”
- Why would submitters, and not the service providers, be responsible for access control and be liable for data leakage?
“As regards to data quality and warranties as to fitness for purpose, the EOSC MVE would need to operate under the principle of caveat emptor. That is, while submitters may be liable for outright fraudulent data, the nature of scientific research data determines that EOSC data should probably be provided with no warranties for any particular purpose, although Section 5.5 section below, on assessing data quality, should be also taken into consideration.”
- What does “the nature of scientific research data determines that EOSC data should probably be provided with no warranties for any particular purpose” mean? Does it not contradict the statements that follow straight after: “Data should be: processed lawfully, fairly and in a transparent manner in relation to the data subject (principle of ‘lawfulness, fairness and transparency’); collected for specified, explicit and legitimate purposes;”?
- The GDPR is about data “processing”, not only about data “collection”. Data re-use is also a form of data processing. The statements above therefore seem to contradict “no warranties for any particular purpose”.
5.5 Data quality
- Suggestions are made that search results for datasets could be arranged based on reviews, views etc., and a comparison is made to TripAdvisor. We find this rather worrying: 1. Isn’t there a high risk that this would lead to data/score manipulation? 2. Wouldn’t it lead to the self-perpetuation of certain objects/datasets (similar to what happened with journal impact factors)? 3. This could also be very detrimental to certain disciplines.