On Thursday 30 August and Friday 31 August, TU Delft Library hosted two events dedicated to the new European General Data Protection Regulation (GDPR) and its implications for research data. Both events were organised by Research Data Netherlands, a collaboration between the 4TU.Centre for Research Data, DANS and SURF (represented by the National Research Data Management Coordination Point).
First, do no harm: protecting personal data is not at odds with data sharing
On the first day, we heard case studies from experts in the field, as well as from various institutional support service providers. Veerle Van den Eynden from the UK Data Service kicked off the day with a presentation which made clear that the need to protect personal data does not preclude data sharing. She outlined the framework provided by the GDPR which makes sharing possible, and explained that when it comes to data sharing one should always adhere to the principle of “do no harm”. However, she reflected that too often both researchers and research support services (such as ethics committees) prefer to avoid any possible risk rather than carefully consider and manage it. She concluded with a compelling case study from the UK Data Service, in which researchers were able to successfully share data from research on vulnerable individuals (asylum seekers and refugees).
From a one-stop shop solution to privacy champions
We then heard case studies from four Dutch research institutions (Tilburg University, TU Delft, Erasmus University Rotterdam and VU Amsterdam) about their practical approaches to supporting researchers working with personal research data. Jan Jans from Tilburg explained their “one stop shop” form which, when completed by researchers, sorts out all the requirements related to the GDPR, ethics and research data management. Marthe Uitterhoeve from TU Delft said that Delft was developing a similar approach, but based on data management plans. Marlon Domingus from Erasmus University Rotterdam explained their process of defining different categories of research and determining the types of data processing associated with each, rather than trying to list every single research project at the institution. Finally, Jolien Scholten from VU Amsterdam presented their idea of appointing privacy champions, who receive dedicated training on data protection and act as the first contact points for GDPR-related questions within their communities.
There were lots of inspiring ideas, and there was consensus in the room that it would be worth reconvening in a year’s time to evaluate the different approaches and share lessons learned.
How to share research data in practice?
Next, we discussed three different models for helping researchers share their research data. Emilie Kraaikamp from DANS presented their strategy of providing two access levels: open access data and restricted access data. Open datasets consist mostly of research data which are fully anonymised. Restricted access data must be requested via an email to the depositor, who decides whether or not to grant access.
Veerle Van den Eynden from the UK Data Service discussed their approach, based on three access levels: open data, safeguarded data (equivalent to DANS’s “restricted access data”) and controlled data. Controlled datasets are very sensitive, and researchers who wish to access them must undergo a strict vetting procedure: they need to complete training, their application needs to be supported by a research institution, and they typically access such datasets in safe locations, on safe servers, and are not allowed to copy the data. Veerle explained that only a relatively small number of sensitive datasets (usually from governmental agencies) are shared under controlled access conditions.
The last case study came from Zosia Beckles of the University of Bristol, who explained that Bristol has created a dedicated Data Access Committee to handle requests for controlled access datasets. Researchers responsible for the datasets are asked for advice on how to respond to requests, but it is the Data Access Committee which ultimately decides whether access should be granted and which can, if necessary, overrule the researcher’s advice. The procedure relieves researchers of the burden of dealing with data access requests.
DataTags – decisions about sharing made easy(ier)
Ilona von Stein from DANS continued the discussion about data sharing and the means by which it could be facilitated. She described an online tool developed by DANS (based on a concept initially developed by colleagues at Harvard University, but adapted to the needs of the European GDPR) which allows researchers to answer simple questions about their datasets and receive a tag defining whether the data are suitable for sharing and which sharing options are most suitable. A prototype of the tool is now available for testing, and DANS plans to develop it further to see whether it could also assist researchers working with data across the whole research lifecycle (not only at the final, data sharing stage).
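A minimal sketch of how such question-and-tag logic might work is given below. The questions, answer options and tag names are hypothetical illustrations, not the actual DANS DataTags questionnaire:

```python
def data_tag(contains_personal_data: bool,
             fully_anonymised: bool = False,
             consent_for_sharing: bool = False) -> str:
    """Map answers about a dataset to an illustrative sharing tag."""
    if not contains_personal_data or fully_anonymised:
        return "open access"        # nothing (left) to protect
    if consent_for_sharing:
        return "restricted access"  # personal data, but sharing consented
    return "no sharing"             # personal data without consent

# A fully anonymised interview dataset can be tagged for open sharing:
print(data_tag(contains_personal_data=True, fully_anonymised=True))  # open access
```

A real tool would chain many more questions (legal basis, special categories of data, retention periods) and return finer-grained tags, but the principle of walking researchers through simple questions to reach a sharing decision is the same.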
What are the most impactful & effortless tactics to provide controlled access to research data?
The final, interactive part of the workshop was led by Alastair Dunning, Head of the 4TU.Centre for Research Data. Alastair used Mentimeter to ask attendees to judge the impact and effort of fourteen different tactics and solutions which research institutions can use to provide controlled access to research data. More than forty people engaged with the online survey, which allowed Alastair to shortlist the five tactics deemed most impactful and effort-efficient:
- Create a list of trusted archives where researchers can deposit personal data
- Publish an informed consent template for your researchers
- Publish a list of FAQs concerning personal data on the university website
- Provide access to a trusted Data Anonymisation Service
- Create categories to define different types of personal data at your institution
Alastair concluded that these should probably be the priorities for research institutions which do not yet have them in place.
How to put all the learning into practice?
The second event was dedicated to putting all the learning and concepts developed during the first day into practice. Researchers working with personal data, as well as those directly supporting researchers, brought their laptops and followed practical exercises led by Veerle Van den Eynden and Cristina Magder from the UK Data Service. We started by looking at a GDPR-compliant consent form template. Subsequently, we practised data encryption using VeraCrypt. We then moved on to data anonymisation strategies. First, Veerle explained possible tactics (again, with nicely illustrated examples) for de-identification and pseudonymisation of qualitative data. This was followed by comprehensive hands-on training delivered by Cristina Magder on disclosure review and de-identification of quantitative data using sdcMicro.
Altogether, the practical exercises gave attendees a clear understanding of how to work effectively with personal research data from the very start of a project (consent, encryption) all the way to data de-identification to enable sharing and re-use (whilst protecting personal data at all stages).
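One of the pseudonymisation tactics covered, replacing direct identifiers with irreversible codes, can be sketched in a few lines of Python. The key and field names below are purely illustrative (sdcMicro itself is an R package offering much richer disclosure-control methods):

```python
import hmac
import hashlib

# The key must be stored separately from the data (or destroyed for
# full anonymisation); this value is purely illustrative.
SECRET_KEY = b"keep-this-key-away-from-the-dataset"

def pseudonymise(identifier: str) -> str:
    """Return a stable pseudonym: the same person always receives the
    same code, but the mapping cannot be reversed without the key."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return "P-" + digest.hexdigest()[:10]

records = [
    {"name": "Alice Jones", "age_band": "30-39", "answer": "yes"},
    {"name": "Bob de Vries", "age_band": "40-49", "answer": "no"},
]
# Replace the direct identifier with a pseudonym before sharing:
for record in records:
    record["pid"] = pseudonymise(record.pop("name"))
```

Note that removing direct identifiers is only the first step: indirect identifiers (precise ages, locations, rare combinations of attributes) still need banding or suppression, which is exactly what the disclosure review training addressed.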
Conclusion: GDPR as an opportunity
I think that the key conclusion of both days was that the GDPR, while challenging to implement, provides an excellent opportunity both to researchers and to research institutions to review and improve their research practices. The key to this is collaboration: across the various stakeholders within the institution (to make workflows more coherent and improve collaboration), but also between different institutions. An important aspect of these two events was that representatives from multiple institutions (and countries!) were present to talk about their individual approaches and considerations. Practice exchange and lessons learned can be invaluable to allow institutions to avoid similar mistakes and to decide which approaches might work best in particular settings.
We will definitely consider organising a similar meeting in a year’s time to see where everyone is and which workflows and solutions tend to work best.
Presentations from both events are available on Zenodo:
Presentation by Shalini Kurapati and Michiel de Jong for PhD students at TPM faculty at TU Delft (presented on 7 September 2018): https://doi.org/10.5281/zenodo.1409027
Written by Marta Teperek and Alastair Dunning
There are many drivers pushing for long-term preservation of research data and for making them Findable, Accessible, Interoperable and Re-usable (FAIR). There is a consensus that sharing and preserving data makes research more efficient (no need to generate the same data all over again), more innovative (data re-use across disciplines) and more reproducible (data supporting research findings are available for scrutiny and validation). Consequently, most funding bodies require that research data are stored, preserved and made available for at least 10 years.
For example, the European Commission requires that projects “develop a Data Management Plan (DMP), in which they will specify what data will be open: detailing what data the project will generate, whether and how it will be exploited or made accessible for verification and re-use, and how it will be curated and preserved.”
But who should pay for that long-term data storage and preservation?
Given that most funding bodies now require that research data is preserved and made available long-term, it is perhaps natural to think that funding bodies would financially support researchers in meeting these new requirements. Coming back to the previous example, the funding guide for the European Commission’s Horizon 2020 funding programme says that “costs associated with open access to research data, including the creation of the data management plan, can be claimed as eligible costs of any Horizon 2020 grant.”
So one would think that the problem is solved and that funding for making data available long-term can be obtained. But then… why would we be writing this blog post? As is usually the case, the devil is in the detail. The European Commission’s financial rules state that grant money can only be spent during the lifetime of the project.
Naturally, long-term preservation of research data occurs only after datasets have been created and curated, and most of the time only starts at the time when the project finishes. In other words, the costs of long-term data preservation are not eligible costs on grants funded by the European Commission*.
Importantly, the European Commission’s funding is just an example. Most funding bodies do not consider the costs of long-term data curation as eligible costs on grants. In fact, the author is not aware of any funding body which would consider these costs eligible**.
So what’s the solution?
Funding bodies suggest that long-term data preservation should be offered to researchers as one of the standard institutional support services, with the costs recovered through the overhead/indirect funding allocation on grant applications. Grants from the European Commission carry a flat 25% overhead allocation, which is already generous compared with some other funding bodies that do not allow any overhead allocation at all. The problem is that at larger, research-intensive institutions, overhead costs run at around 50% of the original grant value.
This means that for every 1 mln Euro which researchers receive to spend on their research projects, research institutions need to find an extra 0.5 mln Euro from elsewhere to support these projects (facilities costs, administration support, IT support, etc.). Therefore, given that institutions are already not recovering their full economic costs from research grants, it is difficult to imagine how the new requirements for long-term data preservation can be absorbed within the existing overhead/indirect costs stream.
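As a rough worked example (using the indicative rates above; actual cost structures vary by institution and by grant), the gap per grant looks like this:

```python
def overhead_gap(direct_costs, real_rate=0.50, funder_rate=0.25):
    """Compare real institutional support costs with what a flat-rate
    funder overhead recovers. Rates are the indicative figures from
    the text, not official numbers."""
    real_overhead = direct_costs * real_rate   # true support costs
    recovered = direct_costs * funder_rate     # flat-rate allocation
    return real_overhead, recovered, real_overhead - recovered

real, recovered, shortfall = overhead_gap(1_000_000)
print(f"support costs: {real:,.0f}, recovered: {recovered:,.0f}, "
      f"shortfall: {shortfall:,.0f}")
# For each 1 mln Euro of direct costs: 500,000 in support costs, of
# which only 250,000 is recovered, leaving 250,000 to be found elsewhere.
```

Any new service such as long-term data preservation has to be squeezed into that already negative margin.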
The problems described above are not new. In fact, they have already been discussed with funding bodies on several occasions (see here and here for some examples). But not much has changed so far. No new streams of money have been made available, neither through direct grant funding nor through increased overhead caps for institutions providing long-term preservation services for research data.
Meanwhile, researchers (those creating large datasets in particular) continue to struggle to find financial support for the long-term preservation and curation of their research data, as nicely illustrated in a blog post by our colleagues at Cambridge.
Since the discussions that individual institutions have held with funding bodies did not seem to be fruitful, perhaps the time has come for some joined-up national (or international) effort. Could this be an interesting new project for the Dutch National Coordination Point Research Data Management (LCRDM) to tackle?
* – Some suggest that the costs are eligible if the invoices for long-term data preservation are paid during the lifetime of the project. However, this is only true if the invoice itself does not specify that the costs are for long-term preservation (i.e. the invoice simply states ‘storage charges’, without indicating the long-term aspect). This only confirms that funders are not willing to pay for long-term preservation, and forces some to use more creative tactics and measures to finance it.
** – Two funding bodies in the UK, NERC (Natural Environment Research Council) and ESRC (Economic and Social Research Council), pay for the costs of long-term data preservation by financing their own data archives (NERC Data Centres and the UK Data Service, respectively) where the grantees are required to deposit any data resulting from the awarded funding.
- Django has been updated
- We’ve added some explanatory text on the login page to give more clarity on our main conditions for accepting datasets
- 4TU.Centre for Research Data will automatically be added as publisher to all new datasets
- Provisioning updates are implemented
Authors: Wilma van Wezenbeek, Alastair Dunning, Marta Teperek
Date: 3 August 2018
- Interim version of the FAIR Data Action Plan: https://doi.org/10.5281/zenodo.1285290
- Interim version of the FAIR Data Report: https://doi.org/10.5281/zenodo.1285272
- TU Delft welcomes these two very thorough reports, with plenty of valid recommendations. TU Delft is particularly pleased to see the importance of long-term curation, data stewardship and disciplinary frameworks for FAIR highlighted in the Action Plan
- Careful attention needs to be paid to roles and responsibilities, particularly with regard to the role of data stewards. Data stewards should not be the only keepers of research data; researchers must assume individual responsibility as well.
- There should be an explicit action on funders to recognise the costs of long-term data curation and preservation as eligible costs on grants (this is a particular problem for current EU funding mechanisms)
- The general public are not addressed in the document. The Action Plan should address how the FAIR principles can be understood to help maintain broad trust in science
- There also needs to be attention on the implication of implementing FAIR data for those working between academia and industry.
- There should be greater clarity between tasks that occur at national level, and those that occur at international level.
- There is nothing about the governance of the FAIR Principles. Do the Principles require a broader governance model so they can represent all the necessary stakeholders?
- Consideration should be given to renaming DMPs as Output Management Plans to allow for the management of code, models, etc.
- There is overlap between many of the recommendations; they should be merged to provide a more concise document
Specific comments on the FAIR Data report:
Page 10, “The key arguments can be summarised in three categories”
- Agreed that these are important, though the “peer advantage” is missing: sharing your data not per se because you want or need to act with integrity, or want to transform research, but because immediate and open sharing helps research advance (perhaps this is just semantics?)
Page 23, “…stewardship in the wider science community”.
- In this context, ‘science’ could be more clearly defined. The LERU open science roadmap provides a good example
“..and the international scientific unions”.
- It is indeed preferable not to limit discussions to data-related groups or venues; data-related discussions should also happen at normal “community” conferences and venues.
Page 24, “It would be useful to define use cases taking advantage of FAIR beyond their current data sharing capacities to convince such communities to engage more fully with a FAIR ecosystem.”
- Perhaps there is room for a clear recommendation here?
Page 28, “..motivation to join the movement”
- The word “movement” might have some negative connotations
Page 32, “A set of case study examples should be developed and maintained to demonstrate that providing FAIR data can increase the impact of facilities by increasing data reuse and thereby return on investment in the facility.”
- Perhaps there is room for a clear recommendation here?
Page 47, “Researchers should also preferably deposit in certified repositories”
- Perhaps Rec. 18 on this page could be combined with Rec. 10, so that researchers know which certification is the “right” one (although this is challenging), and related to what is said in the following section about researcher awareness. To know whether a journal was eligible, researchers use(d) the impact factor; now they need something to identify the right certified data repository (e.g. CoreTrustSeal)
Comments on FAIR Data Action Plan
Rec.1, page 3:
The FAIR principles should be consulted on and clarified to ensure they are understood to include appropriate openness, timeliness of sharing, assessability, data appraisal, long-term stewardship and legal interoperability.
Stakeholders: Global coordination fora; Research communities; Data services.
- We feel that policy makers & research funders should be listed as a stakeholder here as well. In the end their policy requirements (and their definitions) will be the key drivers for researchers and academic institutions (key content producers)
Rec. 2: Mandates and boundaries for Open
- Perhaps it would be worth adding that, with the use of the principle “as open as possible”, the balance shifts once other criteria have been made explicit (e.g. national security or endangered species)
- A further recommendation could be to create specific rights statements for data, along the lines of http://rightsstatements.org/
Concrete and accessible guidance should be provided to researchers in relation to sharing sensitive and commercial data as openly as possible.
Stakeholders: Data stewards; Data services; Institutions; Publishers.
- Much more needs to be done to help the sharing of sensitive and commercial data, for instance development of shared practices and protocols for how academia and industry share and manage research data. Best practice should be documented.
- Policy makers & research funders should be part of this as well => their endorsement of the guidance is necessary
Rec. 4: Components of a FAIR data ecosystem
- While these are all infrastructure-related elements, it is strange that Research Communities are not identified as key stakeholders in every action
Rec. 8: Cross-disciplinary FAIRness
Case studies for cross-disciplinary data sharing and reuse should be collected. Based on these case studies, mechanisms that facilitate the development of frameworks for interoperability and reuse should be developed.
Stakeholders: Global coordination fora; Data stewards
- Research communities are also key stakeholders here: they are the key actors and need to be willing to work with the data stewards and the global coordination fora for the case studies to be identified and developed.
Rec. 9: Develop robust FAIR data metrics
- It should be clarified who the data metrics are for – different stakeholders require different metrics
- Any reference to rewarding FAIR practices is missing – perhaps add this reference to Rec. 14
Rec. 10: Trusted Digital Repositories
At an appropriate point, the language of the CoreTrustSeal requirements should be reviewed and adapted to reference the FAIR data principles more explicitly (e.g. in sections on levels of curation, discoverability, accessibility, standards and reuse).
Stakeholders: Global coordination fora; Data services; Institutions
- This is a very important point. Perhaps, in line with previous valid points in the action plan, it would be useful to say that the metrics of FAIRness need to be developed with the research communities, and that the language (and the requirements themselves) might have to take disciplinary differences into account. This might be particularly important when it comes to very confidential datasets, very large datasets, etc.
Rec. 11: Develop metrics to assess and certify data services
“Certification schemes are needed to assess all components of the FAIR data ecosystem. ….”
- More elements need to be taken into account here indeed – see the report by Bilder and Neylon
Rec. 12: Data Management via DMPs
- The introduction text to this recommendation reads: “…The DMP should be regularly updated to provide a hub of information on the FAIR data objects.”. Given that recommendations 3 and 16 discuss the dependencies between datasets and other types of research outputs (crucially, code), and that the very introduction talks about ‘FAIR data objects’, shouldn’t DMPs be rebranded to “Output Management Plans”?
- While this is of course a useful recommendation, reading the associated questions and seeing the reference “This applies to the relatively small, informal projects of individual scientists…”, there is a worry about over-administering the process. This section is called “Creating a culture of FAIR data”. Perhaps the focus should be on a transition to a culture in which researchers understand autonomously that their data should be FAIR. That has a slightly different angle to it.
Data Management Plans should be living documents that are implemented throughout the project. A lightweight data management and curation statement should be assessed at project proposal stage, including information on costs and the track record in FAIR. A sufficiently detailed DMP should be developed at project inception. Project end reports should include reporting against the DMP.
Stakeholders: Funders; Institutions; Data stewards; Research communities
- The statement at the project proposal stage should also outline any reasons for not making data available.
Rec. 13: Professionalise data science and data stewardship roles
Key data roles need to be recognised and rewarded, in particular, the data scientists who will assist research design and data analysis, visualisation and modelling; and data stewards who will inform the process of data curation and take responsibility for data management.
Stakeholders: Funders; Institutions; Publishers; Research communities.
- Researchers, as the creators of content, share responsibility for data management with data stewards. Data stewards support researchers in data management, but since they are not the creators of the data, they can only help researchers who are willing to be helped; they cannot enforce good data management on researchers who are unwilling to cooperate. In addition, data stewards should definitely be included as stakeholders in this point.
Professional bodies for these roles should be created and promoted. Accreditation should be developed for training and qualifications for these roles.
Stakeholders: Institutions; Data services; Research communities.
- Again, given that the discussion is also about data stewards, they should be also recognised as key stakeholders for this action point.
Rec. 14: Recognise and reward FAIR data and data stewardship
Credit should be given for all roles supporting FAIR data, including data analysis, annotation, management, curation and participation in the definition of disciplinary interoperability frameworks.
Stakeholders: Funders; Publishers; Institutions.
- Research communities and data stewards are key stakeholders for this action. Crediting FAIR data is dependent on those creating and re-using data outputs. Do they correctly credit resources they re-used? Do they appropriately credit those who helped them collate, analyse or manage the data?
Evidence of past practice in support of FAIR data should be included in assessments of research contribution. Such evidence should be required in grant proposals (for both research and infrastructure investments), for career advancement, for publication and conference contributions, and other evaluation schemes.
Stakeholders: Funders; Institutions; Publishers; Research communities.
- One element which would be worth mentioning here is hiring criteria. Best practice could be shared
The contributions of organisations and collaborations to the development of certified and trusted infrastructures that support FAIR data should be recognised, rewarded and appropriately incentivised.
Stakeholders: Funders; Institutions.
- This is particularly important for institutions which provide necessary elements of the FAIR ecosystem but cannot get funding for these through traditional EC funding mechanisms (institutional costs are considered overheads, which are capped at 25% and cannot be spent post-project, e.g. on long-term archiving)
Rec. 15: Policy harmonisation
Concerted work is needed to update policies to incorporate and align with the FAIR principles to ensure that policy properly supports the FAIR data Action Plan.
A funders’ forum at a European and global level should do concrete work to align policies, DMP requirements and principles governing recognition and rewards.
- These two points are interdependent and need to be coordinated: policymakers at national and institutional levels need to take recognition and rewards into account, and need to do so jointly with the funding bodies; institutional requirements and funder requirements should be aligned as well. These two points should be merged into one, with both funders and policy makers listed as key stakeholders.
Rec. 16: Broad application of FAIR
- This recommendation is vague and repeats ideas elsewhere.
- Why not try to apply the same principles to publications (research articles), so that publications are also seen as part of the “EOSC ecosystem”?
Rec. 17: Selection and prioritisation of FAIR Data Objects
- These actions should be put high on institutional priority lists; they would help both researchers and research leaders
When data are to be deleted as part of selection and prioritisation efforts, metadata about the data and about the deletion decision should be kept.
Stakeholders: Research communities; Data stewards; Data services.
- This feels like an unnecessary burden on the community. Attention should be focused on archiving data that are needed to validate research.
Rec. 18: Deposit in Trusted Digital Repositories
Concrete steps need to be taken to ensure the development of domain repositories and data services for interdisciplinary research communities so the needs of all researchers are covered.
Stakeholders: Data services; Funders; Institutions.
- Researchers should be recognised as key stakeholders here, as they are the ones whose needs have to be addressed.
Rec. 19: Encourage and incentivise data reuse
- This is a weak section and could be absorbed into others.
- It misses a reference to academic reward systems: if researchers are encouraged to re-use data, they should also be appropriately recognised for this in academic evaluation criteria (projects might not necessarily be “novel”). There should be a cross-reference to Rec. 14
Rec. 21: Use information held in Data Management Plans
DMPs should be explicitly referenced in systems containing information about research projects and their outputs (CRIS). Relevant standards and metadata profiles should consider adaptations to include DMPs as a specific project output entity (rather than including them in the general category of research products). The same should apply to FAIR Data Objects.
Stakeholders: Standards bodies; Global coordination fora; Data services.
- Logically, these should be appropriately rewarded (Rec. 14) and consequently, Funders and Policy makers should be considered as stakeholders here.
DMPs themselves should conform to FAIR principles and be Open where possible.
Stakeholders: Data services; Research communities; Policymakers.
- Funders are a key stakeholder here as well, as they can mandate that DMPs are openly available
Skills and roles for FAIR
- The introduction says: “Data stewards who manage data, ensure that it is FAIR and prepare it for long term curation are also essential.” Data stewards support researchers in data management, but researchers themselves are the ultimate data producers. Data stewards cannot enforce good data management without the will and cooperation of researchers. Please rephrase to: “Data stewards who support researchers in data management and help ensure that it is FAIR and prepare it for long term curation are also essential.”
Rec. 29: Implement FAIR metrics
- Seems repetitive with Rec. 9
- Please provide reference to Rec. 31
Rec 30: Monitor FAIR
- Here too, we feel that we are overstretching ourselves slightly. Funders need to monitor a great deal, and yes, any requirements or compliance issues need to be tracked, but researchers and funders have more pressing related issues to tackle under the open science or “just science” umbrella. Is all of that now captured under FAIR? Or should it be the other way round?
Rec. 31: Support data citation and next generation metrics
- Provide reference to software citation recommendations.
- There should be an additional action on publishers to scrap limits on the number of citations. For example, in Science reports the maximum number of citations is 30. As a result, researchers not only struggle to cite all relevant literature within this limit, but are also disinclined to cite datasets. Citation limits might have been justified in the print era; in the digital era they are anachronistic and only serve the interests of publishers, who wish to further boost the impact factor of their most prominent venues.
Rec. 32: Costing data management
- There should be an explicit action on funders to recognise the costs of long-term data curation and preservation as eligible costs on grants.
Authors: Wilma van Wezenbeek, Alastair Dunning, Marta Teperek
Date: 3 August 2018
- Report: Prompting an EOSC in Practice (Rules of Participation are on page 28): https://www.eudat.eu/sites/default/files/prompting_an_eosc_in_practice_eosc_hleg_interim_report.pdf
- EOSC Open Consultation website: https://eoscpilot.eu/open-consultation
Structure of our comments
- Overarching comments
- Specific comments on the report Prompting an EOSC in Practice
- Comments on specific Rules of Participation
- Overall, the report is a useful outline of the vision for the European Open Science Cloud (EOSC), with concrete proposals for the rules of participation.
- Recommendations for working with commercial partners need to be carefully thought through. In particular, mechanisms need to be in place to ensure that commercial partners participating in the EOSC do so under the same terms as non-commercial partners.
- Care needs to be taken when deciding on recommendations for Member States and for research communities. The latter are always international in nature.
- It is not clear why additional intermediaries are required for managing financial contributions to the EOSC partners. Couldn’t transactions be arranged without additional intermediaries, which increase the costs and complexity of the ecosystem?
- In some places the recommendations are unclear, lacking structure and alignment. It feels as if the founding vision for the EOSC is not clear enough to create harmonised rules for participation.
- The chapters on the business model and financing feel far removed from today’s practice.
Specific comments on the Report Prompting EOSC in Practice:
Page 8 and 9 (Executive Summary)
“To help drive forward and implement the EOSC, the main thread of the report is to understand how the EOSC can effectively interlink People, Data, Services and Training, Publications, Projects and Organisations.”
- Should “labs and instruments” and “places” be added there as well?
“The EOSC should implement “whatever works” and do “whatever it takes” to increase the availability and volume of quality & user-friendly scientific information on-line.”
- Why should the EOSC focus on “volume”? This is an odd phrase to use.
“Define an EOSC Quality of Service (QoS) standards, separate for all elements of the ecosystem (data, data access services, software, etc.), to develop a trustable ecosystem.”
- Publications should also be part of the EOSC ecosystem
“Introduce, as part of EOSC’s mission, that a state of the art analysis is carried out on a national level within the Member States for assessing statistics and key assets around the composition and relevant clustering of the community of users, with the respective eInfrastructures & research infrastructures & scientific communities.”
- Why do this on a national (Member State) level? That is not the way these communities are constructed, or how science works.
“The universal entry point to the EOSC should provide access to a marketplace of efficient and effective services, with lightweight integration (authentication and authorization infrastructure, order management, etc…) where service providers can find service users and vice versa. Nothing is wrong with a number of multiple entry points which should be seen as a plus rather than a negative fragmentation.”
- This recommendation is unclear. We want a universal entry point, but promote decentralised ones?
“Introduce a regular assessment of EOSC against other alternatives, including commercial providers. This could be made to either enhance an EOSC Service, or to support new Services;”
- Alternatives to EOSC? This recommendation is unclear.
“Build a workforce able to execute the vision of the EOSC by ensuring data stewards, data and infrastructure technologists and scientific data experts who are trained and supported adequately.”
- I notice here, and also in the six action lines on page 14, an expansion of what the EOSC should be or become. The question is whether that will actually help pace and clarity. Is the EOSC not slowly taking over all the topics laid down in the OSPP?
“All activities mentioned above have a stronger focus on research data as opposed to services for research data management.” (page 17)
- Why would “research data” and “research data management” be presented as opposing activities?
“Flexible ways to access and share data and direct access to fast networks to do so are at the top of the agenda for researchers.” (3.2, page 19)
- This is not true at all. What about the inability to find datasets because of a lack of interoperability and integrated resources? Or the lack of recognition for good data management? Or not being rewarded for doing thorough and reproducible research?
3.4. Governance (page 21)
“Cooperation is needed between end user, service providers, and funding agencies / policy makers.”
- And what about the organisations that represent the end users?
- It is also unclear how the depicted layers and other existing temporary or structural governing bodies (e.g. OSPP, Science Europe, ALLEA, EUA) will work together.
4.2 Business model (page 23)
“The EOSC Business model is a critical non-technical element that will determine the success of the EOSC vision.”
- Scoping principles around the business model requirements are also needed, to outline the governance structure (of the infrastructure or service itself), community involvement, sustainability (see the paper by Bilder and Neylon), ownership and openness.
“The currents model for provisioning access to Research Infrastructures is based on the guidelines contained in the Charter for Access, where three main models are described” and “a model based on the Wide Access mode modulated by a negotiated, agreeable Access restriction, is the pragmatic way to start moving with the EOSC. Private providers willing to provide resources within the EOSC framework will envision a Market-Driven approach to support users.”
- Also in reference to the guiding principles, along with the business model, it seems good to set Wide Access as the default, and to jointly decide where exceptions are allowed. Simply saying that private providers will envision a Market-Driven approach seems to contradict Rule of Participation 5.1, which states that “Private sector users should be considered stakeholders in the EOSC as well as participants from the start, not added after (…). By participating, private sector may want to invest in the long-term development and sustainability of the EOSC, along with the public sector and not just serve to exploit public data for free.”
- Also, while the Excellence-Driven and Market-Driven Access models are well defined, the principles behind the Wide Access model need to be better articulated.
“To coordinate acquirement, the EOSC and member states would also certify one or more brokers to manage the acquisition, distribution and payment for EOSC vouchers. These brokers could be government agencies in member states, entities within member states, transnational governments or private firms” (p.25)
- Why would it be necessary to involve brokers in the process?
4.3 Funding Model and Payment Mechanisms (page 25)
“…similarly to how YouTube pays people who upload videos based on how many times they are viewed.”
- You need to register your account and be compliant to get paid by YouTube, so that is not the default situation. We would plead for a baseline situation based on reciprocity, rather than immediately starting with payments.
- It is difficult to judge what would be the best fit. Guiding principles are also needed here, e.g. transparency, efficiency and simplicity. Is there a way to avoid the giant profit margins being made by some players in the scientific publications industry? What principle should be used to achieve this? What have we learned from the big deals? We want researchers (end users) to be cost-aware, without burdening them with workflow troubles and micropayments.
- In Direct Support: the disadvantage “Resources can have internal foci, reducing access from outside stakeholders” could easily be overcome by establishing clear funding rules demanding equal access rights for internal and external stakeholders.
- In Direct Support: the disadvantage “Burdensome for commercial entities, even where they could provide significant cost savings and be incentivized to innovate.” is unclear to me. Why burdensome, and why specifically for commercial entities?
Comments on specific Rules of Participation (from p. 28 of the report Prompting EOSC in Practice)
5.1 Federating the existing infrastructures
“Private sector users should be considered stakeholders in the EOSC as well as participants from the start, not added after (…). By participating, private sector may want to invest in the long-term development and sustainability of the EOSC, along with the public sector and not just serve to exploit public data for free.
Brokers would be obliged to behave in a disinterested fashion with all providers. Entities that establish brokers must require that the broker does not establish a monopoly, or fall under the control of a service provider that then uses their influence to exclude other service providers from the marketplace.”
- How is this going to be achieved in practice?
5.2 Eligibility criteria for actors
“Key rules for participants therefore will include”
- These rules for actors are also interlinked with the eligibility criteria for data and for service providers. Perhaps it would be valuable to map them.
- Identifiers and Metadata:
“While maintenance of this metadata is fundamentally the responsibility of the submitter of data or other digital objects…”
- Why would maintenance of metadata be the responsibility of the submitter of data and not of the data service provider/repository?
5.3 Participation according to the business model
“The development of novel capabilities, long-term storage/maintenance of data resources and fixed cost capabilities are likely to be provided using direct payments to organisations setting up nodes in the EOSC. By contrast, numerous research activities by individual investigators may be supported via EOSC vouchers. Nodes in the EOSC will have to be able to engage with the business model. This will probably imply a business arrangement with the brokers set up by funding agencies in order to accept these vouchers as payment.”
- Why need brokers? Couldn’t transactions be arranged without additional intermediaries, which increase the costs and complexity of the ecosystem?
“As the submitters control access, they retain liability for data leakage and to ensure that relevant individuals accessing information meet the necessary requirements.”
- Why would submitters, and not the service providers, be responsible for access control and be liable for data leakage?
“As regards to data quality and warranties as to fitness for purpose, the EOSC MVE would need to operate under the principle of caveat emptor. That is, while submitters may be liable for outright fraudulent data, the nature of scientific research data determines that EOSC data should probably be provided with no warranties for any particular purpose, although Section 5.5 section below, on assessing data quality, should be also taken into consideration.”
- What does “the nature of scientific research data determines that EOSC data should probably be provided with no warranties for any particular purpose” mean? Is it not contradictory with the statements which follow straight after that: “Data should be: »» processed lawfully, fairly and in a transparent manner in relation to the data subject (principle of ‘lawfulness, fairness and transparency’); »» collected for specified, explicit and legitimate purposes;”
- The GDPR is about data “processing”, not about data “collection” only. Data re-use is also a form of data processing. The statements above seem contradictory to “no warranties for any particular purpose”.
5.5 Data quality
- Suggestions are made that search results for datasets could be ranked based on reviews, views etc., and a comparison is made to TripAdvisor. I find this rather worrying: 1. Wouldn’t this create a high risk of data/score manipulation? 2. Wouldn’t it lead to the self-perpetuation of certain objects/datasets (similar to what happened with journal impact factors)? 3. It could also be very detrimental to certain disciplines.