Between 14 and 17 October 2019 I attended the Beilstein Open Science Symposium. As always, excellent, inspiring talks. This year’s talks related to openness and commercialisation were particularly interesting to me, so I would like to share some of my thoughts and observations.
Collaboration with industry is at the core of many research projects at Delft University of Technology. However, working with industry and commercialisation often entails secrecy and close protection of knowledge. At the same time, the University is also a public body, and a substantial proportion of its funding comes from taxpayers’ money. Research funded by the public should be shared as broadly as possible with the public. So how do these two come together? Is openness inherently antagonistic to commercialisation? Can there be a middle ground?
Industry, academia and the public as allies
Chas Bountra, the Pro-Vice-Chancellor for Innovation at the University of Oxford and the Chief Scientist at the Structural Genomics Consortium (SGC), provided a compelling example of how industry and academia can work together to find new medicines and address some of the most pressing healthcare problems in society.
Bringing a new drug to market typically costs pharma companies several billion dollars. To ensure a return on investment, pharmaceutical companies need to price their successful drugs accordingly. This, in turn, might make life-saving medicines unaffordable to patients and healthcare providers. Why does it cost so much money to make new drugs? Chas explained that everyone seems to be working on similar drug targets: both industry and academia read the same papers, attend the same conferences, and come up with the same ideas in parallel. Secrecy of the research process means that no one shares negative outcomes of their studies (true for both academia and industry). As a result, only about 7.5% of potential cancer drugs which enter Phase I of clinical trials make it to the market. This also means that successful drugs need to compensate in their price for all the unsuccessful ones.
The Structural Genomics Consortium was created as a collaboration between academia, public funders and industry (nine big pharma companies) out of a desire to accelerate the search for new medicines and to improve the discovery of new drug targets. Resources from all partners are being pooled to make these two processes more efficient. In addition, the consortium works only on novel ideas – novel targets, which are not explored elsewhere. The consortium purifies human proteins, builds assays, works out 3D structures and creates tools: highly specific inhibitors against these new targets. And how are these novel targets identified? Members of the SGC work with committees composed of experts from academia and industry, as well as clinicians, who donate their free time to help SGC decide which new targets to work on. Patient groups not only provide precious human material to work on (patient tissue) but also help identify the experts, as they know well which labs all over the world work on cures for their disease.
Why would all these stakeholders do all this work for the consortium? Because all the results, all the tools and molecules developed by SGC are made available for free to anyone willing to work on them. For academics this means new, robust research tools enabling innovative research. Pharma companies benefit because they get the chance to take these novel, highly specific molecules and turn them into successful drugs. Clinicians and patients are motivated by the collaboration as it brings hope for new medicines.
In the end, everyone benefits from openness and collaboration. By now over 70 molecules have been generated by SGC, which are made available to anyone interested in working on them.
Collaboration and openness at any scale speed up innovation
The example of SGC is certainly inspiring. At the same time, it is perhaps a bit intimidating for others to follow. Establishing an open collaboration with nine big pharma companies and numerous academics and clinicians is certainly not an easy task, and it must require a lot of trust and relationship building. What if you don’t yet have such connections, for example because you are an early career researcher?
I was greatly inspired by the talk of Lori Ferrins from Northeastern University. Lori is part of Michael Pollastri’s lab, which is working on neglected tropical diseases (NTDs). NTDs are a group of parasitic diseases, such as malaria or sleeping sickness, that disproportionately affect those living in poverty. Pharmaceutical companies are not interested in developing drugs for these diseases because there is no commercial incentive (a return on investment is rather unlikely). To address this issue, Lori and her colleagues collaborate with pharma companies and with other academic labs. Pharma companies provide access to their existing molecules, and the researchers then try to repurpose these molecules into effective parasite growth inhibitors. Academics join in, driven by their research interest.
However, not everyone in such a collaboration is comfortable with going fully open. To address this issue and to enable cooperation nonetheless, the lab developed a shared database where all data and results are shared within the group of collaborators. In addition, various levels of sharing and collaboration are allowed to ensure that investigators are comfortable working together. Lori’s story is, therefore, a beautiful example of how flexibility can be essential and how sharing can occur at various levels and scales. What’s most important is that collaboration and information exchange happen. This helps reduce duplication of effort (collaboration and division of labour instead of competition) and speeds up innovation.
Open source and commercialisation
Lastly, Frank Schuhmacher spoke about his impressive open hardware endeavour, which is to create an automated oligosaccharide synthesizer. An automated oligosaccharide synthesizer is a machine able to automate the multi-step synthesis reaction of longer saccharide molecules. A self-made synthesizer offers researchers a lot of flexibility: they can add and remove various components of the synthesizer, as necessary for a particular reaction. In addition, researchers are also fully in control if anything goes wrong (without relying on obscure black box mechanisms provided by commercial companies). Moreover, the automation of chemical reactions means more reproducible research.
Frank’s talk sparked a discussion about whether open hardware projects can become self-sustainable and whether they offer any commercialisation potential. And here inspiration from my TU Delft colleague Jerry de Vos, who is involved in Precious Plastics, came in very handy. Precious Plastics started as a collaboration between people who wanted to help recycle the ever-growing amount of plastic waste. They have built a series of machines, which are all modular and consist of simple components. Designs for these machines are available openly – meaning that anyone interested can re-use the design, build their own machines and contribute to plastic recycling. So where’s the money? The fact that everything is open means that money can be anywhere one can think of. Some businesses might be started by making the machines needed to process plastics commercially available (in the end, not everyone will be interested in building them themselves). Others might want to create products for sale made from recycled plastics. In fact, lots of businesses have been started with this very idea, and the Precious Plastics website already has its own Bazaar where a myriad of pretty things made from recycled plastics are sold to customers worldwide.
The philosophy behind this is that the more people join in (driven by commercial prospects or not), the more plastic is recycled.
Mix and match
In conclusion, while the view that commercialisation must entail secrecy still seems to dominate in academia, the three examples above clearly demonstrate that sharing and openness do not have to go against commercialisation. On the contrary, collaboration can speed up and facilitate innovation and provide new commercial opportunities. What is needed, therefore, is perhaps a willingness to experiment and to be flexible in order to come up with a value proposition interesting enough for all partners to join in.
And importantly, effective sharing does not mean that everything must be made publicly available – any collaboration, at any level, is better than competition.
On Thursday 30 August and on Friday 31 August TU Delft Library hosted two events dedicated to the new European General Data Protection Regulation (GDPR) and its implications for research data. Both events were organised by Research Data Netherlands, a collaboration between the 4TU.Center for Research Data, DANS and SURF (represented by the National Research Data Management Coordination Point).
First: do no harm. Protecting personal data is not against data sharing
On the first day, we heard case studies from experts in the field, as well as from various institutional support service providers. Veerle Van den Eynden from the UK Data Service kicked off the day with her presentation, which clearly stated that the need to protect personal data is not against data sharing. She outlined the framework provided by the GDPR which makes sharing possible, and explained that when it comes to data sharing one should always adhere to the principle “do no harm”. However, she reflected that too often both researchers and research support services (such as ethics committees) prefer to avoid any possible risks rather than to carefully consider them and manage them appropriately. She concluded by providing a compelling case study from the UK Data Service, where researchers were able to successfully share data from research on vulnerable individuals (asylum seekers and refugees).
From a one-stop shop solution to privacy champions
We subsequently heard case studies from four Dutch research institutions (Tilburg University, TU Delft, VU Amsterdam and Erasmus University Rotterdam) about their practical approaches to supporting researchers working with personal research data. Jan Jans from Tilburg explained their “one stop shop” form, which, when completed by researchers, sorts out all the requirements related to GDPR, ethics and research data management. Marthe Uitterhoeve from TU Delft said that Delft was developing a similar approach, but based on data management plans. Marlon Domingus from Erasmus University Rotterdam explained their process based on defining different categories of research and determining the types of data processing associated with them, rather than trying to list every single research project at the institution. Finally, Jolien Scholten from VU Amsterdam presented their idea of appointing privacy champions who receive dedicated training on data protection and who act as the first contact points for questions related to GDPR within their communities.
There were lots of inspiring ideas, and there was a consensus in the room that it would be worth re-convening in a year’s time to evaluate the different approaches and to share lessons learned.
How to share research data in practice?
Next, we discussed three different models for helping researchers share their research data. Emilie Kraaikamp from DANS presented their strategy for providing two different access levels to data: open access data and restricted access data. Open datasets consist mostly of research data which are fully anonymised. Restricted access data need to be requested (via an email to the depositor) before the access can be granted (the depositor decides whether access to data can be granted or not).
Veerle Van den Eynden from the UK Data Service discussed their approach based on three different access levels: open data, safeguarded data (equivalent to “restricted access data” in DANS) and controlled data. Controlled datasets are very sensitive and researchers who wish to get access to such datasets need to undergo a strict vetting procedure. They need to complete training, their application needs to be supported by a research institution, and typically researchers access such datasets in safe locations, on safe servers, and are not allowed to copy the data. Veerle explained that only a relatively small number of sensitive datasets (usually from governmental agencies) are shared under controlled access conditions.
The last case study was from Zosia Beckles from the University of Bristol, who explained that at Bristol, a dedicated Data Access Committee has been created to handle requests for controlled access datasets. Researchers responsible for the datasets are asked for advice on how to respond to requests, but it is the Data Access Committee that ultimately decides whether access should be granted or not, and, if necessary, it can overrule the researcher’s advice. The procedure relieves researchers of the burden of dealing with data access requests.
DataTags – decisions about sharing made easy(ier)
Ilona von Stein from DANS continued the discussion about data sharing and the means by which sharing could be facilitated. She described an online tool developed by DANS (based on a concept initially developed by colleagues from Harvard University, but adapted to European GDPR needs). Researchers answer simple questions about their datasets and the tool returns a tag, which defines whether the data is suitable for sharing and what the most suitable sharing options are. The prototype of the tool is now available for testing and DANS plans to develop it further to see if it could also be used to assist researchers working with data across the whole research lifecycle (not only at the final, data sharing stage).
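To make the idea of a question-based tagging tool more concrete, here is a minimal Python sketch. It is not the DANS or Harvard implementation: the questions, the tag names ("open", "restricted", "controlled", "do-not-share") and the decision order are all assumptions made up for illustration, loosely borrowing the access levels mentioned in the talks above.

```python
from dataclasses import dataclass


@dataclass
class DatasetAnswers:
    """Hypothetical yes/no questions a researcher might answer about a dataset."""
    contains_personal_data: bool
    is_fully_anonymised: bool
    has_consent_for_sharing: bool
    is_highly_sensitive: bool


def suggest_tag(answers: DatasetAnswers) -> str:
    """Return an illustrative sharing tag (assumed names, not an official scheme)."""
    # No personal data, or fully anonymised data, can be shared openly.
    if not answers.contains_personal_data or answers.is_fully_anonymised:
        return "open"
    # Personal data without consent for sharing should not be shared at all.
    if not answers.has_consent_for_sharing:
        return "do-not-share"
    # Very sensitive personal data would need strict vetting of requesters.
    if answers.is_highly_sensitive:
        return "controlled"
    # Remaining personal data: access on request, decided case by case.
    return "restricted"


# Example: interview data with consent for sharing, not highly sensitive.
example = DatasetAnswers(
    contains_personal_data=True,
    is_fully_anonymised=False,
    has_consent_for_sharing=True,
    is_highly_sensitive=False,
)
print(suggest_tag(example))
```

A real tool would ask many more questions and would need legal review of each decision rule; the sketch only shows how a handful of answers can be mapped to a single tag.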
What are the most impactful & effortless tactics to provide controlled access to research data?
The final interactive part of the workshop was led by Alastair Dunning, the Head of 4TU.Center for Research Data. Alastair used Mentimeter to ask attendees to judge the impact and effort of fourteen different tactics and solutions which can be used at research institutions to provide controlled access to research data. More than forty people engaged with the online survey and this allowed Alastair to shortlist five tactics which were deemed the most impactful/effort-efficient:
- Create a list of trusted archives where researchers can deposit personal data
- Publish an informed consent template for your researchers
- Publish a list of FAQs concerning personal data on the university website
- Provide access to a trusted Data Anonymisation Service
- Create categories to define different types of personal data at your institution
Alastair concluded that these should probably be the priorities to work on for research institutions which don’t yet have the above in place.
How to put all the learning into practice?
The second event was dedicated to putting all the learning and concepts developed during the first day into practice. Researchers working with personal data, as well as those directly supporting researchers, brought their laptops and followed practical exercises led by Veerle Van den Eynden and Cristina Magder from the UK Data Service. We started by looking at a GDPR-compliant consent form template. Subsequently, we practised data encryption using VeraCrypt. We then moved to data anonymisation strategies. First, Veerle explained possible tactics (again, with nicely illustrated examples) for de-identification and pseudonymisation of qualitative data. This was then followed by a comprehensive hands-on training delivered by Cristina Magder on disclosure review and de-identification of quantitative data using sdcMicro.
Altogether, the practical exercises allowed one to clearly understand how to effectively work with personal research data from the very start of the project (consent, encryption) all the way to data de-identification to enable sharing and data re-use (whilst protecting personal data at all stages).
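For readers who did not attend, the kind of disclosure review done for quantitative data can be illustrated with a small Python sketch. This is not sdcMicro (an R package) and not the workshop material. It only shows one common disclosure-risk measure, k-anonymity: how many records share the same combination of indirectly identifying variables. The records and column names below are invented for illustration.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group; a value of 1 means someone is unique."""
    if not records:
        return 0
    return min(group_sizes(records, quasi_identifiers).values())


def risky_records(records, quasi_identifiers, k):
    """Records whose quasi-identifier combination occurs fewer than k times."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] < k
    ]


# Invented survey records; age band and postcode area act as quasi-identifiers.
records = [
    {"age_band": "20-29", "postcode_area": "2628", "income": 31000},
    {"age_band": "20-29", "postcode_area": "2628", "income": 28000},
    {"age_band": "30-39", "postcode_area": "2628", "income": 45000},
    {"age_band": "30-39", "postcode_area": "2611", "income": 52000},
    {"age_band": "30-39", "postcode_area": "2611", "income": 39000},
]
keys = ["age_band", "postcode_area"]

print(k_anonymity(records, keys))       # smallest group size
print(risky_records(records, keys, 2))  # records needing further treatment
```

Records flagged this way are candidates for further de-identification, for example by using coarser age bands or postcode areas, before the dataset is shared.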
Conclusion: GDPR as an opportunity
I think that the key conclusion of both days was that the GDPR, while challenging to implement, provides an excellent opportunity both to researchers and to research institutions to review and improve their research practices. The key to this is collaboration: across the various stakeholders within the institution (to make workflows more coherent and improve collaboration), but also between different institutions. An important aspect of these two events was that representatives from multiple institutions (and countries!) were present to talk about their individual approaches and considerations. Practice exchange and lessons learned can be invaluable to allow institutions to avoid similar mistakes and to decide which approaches might work best in particular settings.
We will definitely consider organising a similar meeting in a year’s time to see where everyone is and which workflows and solutions tend to work best.
Presentations from both events are available on Zenodo:
On 30 January 2018, I attended the Research Data Alliance EU Data Innovation Forum in Brussels. The meeting was focused on innovation, but the common issue discussed in every session was trust: the lack of it sometimes prevented the implementation of innovative solutions, and sometimes was the main driver for innovation. Below you will find some of my key reflections on the topics discussed.
Lack of trust as a barrier to innovation
First, Marta Pont Guixa introduced the results of her research into the practices of business-to-business data sharing. Not surprisingly, her findings revealed that many businesses relied on access to data and on the exchange of data with other businesses for the development of new products and services and for efficiency gains. However, what was quite interesting to hear is that sharing was typically happening between businesses operating within the same sector, and usually a very small proportion of data was shared. Some companies complained about frequent denials of access requests.
Apart from technical challenges, the main obstacles preventing more widespread data sharing were the issues of awareness and trust. The absence of a legal framework and uncertainties about the practical meaning of the new EU GDPR meant that businesses were often unsure if they were allowed to share data and under what conditions. In addition, the risk of competition led to a lack of trust over how the data would be re-used and for what purposes. During the coffee break, one of the attendees mentioned that effective sharing between businesses is often enabled by knowing the right people and gave an example of sharing facilitated by someone who had known the company’s CEO since primary school. Could the presence of effective legal frameworks increase the efficiency of sharing?
The discussion about sharing in a business context was extended to a wider discussion about trust: who do we trust? One attendee hypothesised that perhaps academic institutions are more trustworthy. However, while it did seem that commercial competitiveness is not as fierce in academia as in business environments, the pressure to publish and to be the first to report on academic findings resulted in a research reproducibility crisis, which in turn led to distrust in academic research.
In addition, one attendee mentioned that the lack of legal frameworks and adequate legal support for data sharing led to problems at academic institutions. Another reflected that overworked research support officers often preferred to be risk-averse rather than to help researchers decide on appropriate risk management strategies. This, in turn, resulted in datasets not being shared, or in researchers entering into agreements with third parties on their own, bypassing the institutional regulations. As a consequence of the latter, discussions about sharing become extremely convoluted and, with no legal agreements in place, the release of datasets becomes cumbersome. This, of course, not only raises further questions about the transparency and reproducibility of research but also prevents the wider society from benefiting from research discoveries.
Innovation leading to distrust
In the second session of the conference, Andreas Rauber reflected on the popular analogy that data is the new oil and, consequently, the notion that Data Scientist is the “sexiest job of the 21st century”. However, Andreas also mentioned that with the vast amounts of data available, data analysis becomes more and more challenging. Decision makers are served the end results and pretty pictures, which often influence not only business decisions, but also policy changes. Do these decision makers see the algorithms used, do they bother to understand the input data and the inherent biases before they decide to include the results in reports and recommendations?
So how can we trust the data? How can we trust the individuals who make their recommendations and policy changes based on “data-driven” decisions? Who scrutinises the process and who takes the ownership and accountability?
Lack of trust as a driver for innovation?
The final session was about possible blockchain applications for data. Monique Morrow from Humanized Internet discussed cases where the use of blockchain technologies could provide solutions in extreme environments where lack of trust is an everyday reality. She introduced the drama of people in humanitarian crisis situations, where a basic human right, the right to one’s own identity, is denied. How can warzone refugees with no documentation prove their identity? How can they prove their education, degree certificates, and job qualifications? There are numerous hopeful examples of what could be achieved with the use of blockchain in situations of distrust.
It was also interesting to talk about the potential use of blockchain in scholarly communication and in the management of (research) data. Edwin Morley-Fletcher introduced the concept developed by the consortium My Health My Data, which aims to create a novel blockchain-based platform for handling medical data transactions. There have recently been numerous discussions about the potential of blockchain technologies for scholarly communication, including the extensive report “Blockchain for Research”, but it is not yet known whether blockchain will be the game changer. As Peter Wittenburg summarised, blockchain, like any novel technology, is not yet perfect, and there will certainly be issues which will have to be addressed.
And my final reflection: is the distrust in academia enough of a problem to create the need for blockchain-based solutions? Or, perhaps, the technology will develop such attractive offerings that trust issues won’t be the main drivers for adoption anymore. I hope for the latter!
Date: 29 January 2018
Author: Marta Teperek, Data Stewardship Coordinator, TU Delft Library
Qualitative interviews with nine researchers at the Faculty of Technology, Policy and Management (TPM) at TU Delft were undertaken in order to get an understanding of data management needs at the faculty in advance of appointing a dedicated Data Steward. The purpose of this was to aid the recruitment of the Data Steward and to define the skills and experience of an ideal candidate, as well as to help decide on the priority work areas for the Data Steward. The results of this research can also be used as a point-in-time reference to monitor changes in data management practice at the faculty.
The main data management challenges identified were: handling personal sensitive research data; working with big data; managing and sharing commercially confidential information; and software management issues. Despite the diversity of problems, some common issues were identified as well: the need for improving daily data management practice, as well as the need for revising workflows for students’ research data. With the exception of one researcher, who was in opposition to the Data Stewardship project, all researchers expressed their support for the project and welcomed the idea of having a dedicated Data Steward at the faculty.
Additionally, several actions have already been undertaken as a follow-up to these interviews:
- the Data Stewardship Coordinator was invited to give two talks about Data Stewardship to two different groups of researchers;
- a member of the Research Data Support from the Library team was asked to deliver a training course for students;
- the Data Stewardship Coordinator was asked to discuss the best way of rolling out data management training for PhD students at TPM in coordination with the TPM Graduate School.
Given that the financial allocation for the Data Steward at TPM faculty is currently at 0,5 FTE for the first year and 1,0 FTE for the two subsequent years (until December 2020), it is recommended that the first year is spent on continuing and extending this research to better understand the needs of the faculty. It is suggested that at the same time, the Data Steward starts addressing the most urgent data management needs at TPM faculty, in particular, the development of a data management policy, as well as the development of solutions and recommendations for working with personal sensitive research data.
The two subsequent years could be devoted to developing resources and solutions for the remaining problems and for critical evaluation of the project and its effect on data management practice at the faculty. This approach should provide the faculty with enough resources and information to decide on the best strategy for Data Stewardship beyond December 2020.
Data stewardship has been recognised internationally as a key foundation of future science. Carlos Moedas from the European Commission (EC) said that Open Science “is a move towards better science, to get more value out of our investment in science and to make research more reproducible and transparent. (…) Recent advances such as the discovery of the Higgs boson and gravitational waves, decoding of complex genetic schemas, climate change models, all required thousands of scientists to collaborate (…) on data. And that implies that research data are findable and accessible and that they are interoperable and reusable”. In support of this, the EC anticipated that about 5% of research expenditure should be spent on properly managing and stewarding data. Barend Mons, the Chair of the EC high level expert group on the European Open Science Cloud, estimated that 500,000 Data Stewards will be needed in Europe to ensure effective research data management. Consequently, all NWO and H2020 projects starting from 2017 onwards must create a Data Management Plan and are required to make their data open. In addition, the European Open Science Cloud promises new tools, and related EC strategy papers suggest new rewards and grant funding schemes (such as FP9) to benefit those practising open science.
TU Delft’s College van Bestuur (CvB) made a strategic decision to be a frontrunner of this global move and a dedicated Data Stewardship programme was initiated. The long-term goal of this programme is to comprehensively address research data management needs across the whole campus in a disciplinary manner. To achieve this, subject-specific Data Stewards are to be appointed at every TU Delft faculty. Strategic funding from the CvB was allocated to support 0,5 FTE of a Data Steward per Faculty until December 2018, and 1,0 FTE of a Data Steward per Faculty for two years from January 2019 to December 2020. Subsequently, faculties are to decide how to best address their research data management needs.
In 2017 the first Data Stewards were appointed at three TU Delft faculties: Faculty of Electrical Engineering, Mathematics and Computer Science, Faculty of Civil Engineering and Geosciences and Faculty of Aerospace Engineering. At the beginning of 2018, Data Stewards are to be appointed at the five remaining faculties, including the Faculty of Technology, Policy and Management (TPM).
In order to facilitate the decision on the appointment of a Data Steward at TPM faculty, the Data Stewardship Coordinator set out to investigate the faculty’s research data management needs. Qualitative interviews were undertaken with TPM researchers in autumn 2017, which led to the identification of four main data management issues, specific to the types of research done, and revealed some common data problems for the faculty overall. The report below describes the key findings of this research and makes some recommendations for the future work of a Data Steward at TPM Faculty.
Semi-structured qualitative interviews were conducted with four full professors, three associate professors and two assistant professors in September and October 2017. Initial interviewees were selected and approached by the Data Stewardship Coordinator based on their online profile content to ensure a representation of the different research methodologies used across the faculty as well as representation of all three of TPM’s departments: Engineering Systems and Services; Multi-Actor Systems; and Values, Technology and Innovation. In addition, one researcher was interviewed as a result of a recommendation from an initial interviewee, and two other interviewees were suggested by the Secretary General of the faculty.
All interviewees were informed that interview findings would be used to create a preliminary report on data management needs at the faculty and that the report might be made publicly available. Interviewees were assured that no information would be directly attributed to them and that they would not be named in the report. Interviewees agreed for the interview notes, including personal information, to be shared internally with the Secretary General of the faculty.
Interviews lasted for 30 – 60 minutes. Interviews were not recorded, and instead, notes of key discussion points were taken by the interviewer during the interview.
Categories of data management issues
The diverse nature of research topics at TPM suggested that researchers could have different data management needs. The nine interviews conducted so far revealed that this was indeed the case and identified four top data management issues: handling personal sensitive research data; working with big data; software management issues; and managing and sharing commercially confidential information.
Handling personal sensitive research data
Questions about handling of personal sensitive research data were from across the whole research lifecycle: starting with experimental design and ensuring that only the minimum necessary data about people were collected and the right consent forms were in place, all the way through to data anonymisation and deciding which parts of data could be made publicly available, which could be shared only under managed access conditions, and which datasets should never be shared. Researchers also mentioned difficulties of working with sensitive data on a daily basis – the need to use secure servers, encryption to share the data and to ensure that only authorised partners have access to data. Some discussed the challenges of working with sensitive information in fieldwork conditions, especially if the data was politically contentious.
Interviewees wished to have more guidance about recommended workflows and policies, as well as practical support for finding the right storage solutions and means for sharing data with collaborators. In addition, better support was required at the experimental design stage: deciding on the minimal amount of personal information to be collected and drafting the right consent forms. Finally, many expressed the need for resources which could help them with data anonymisation and with managing the risks and benefits of making datasets publicly available.
All these concerns seemed particularly pressing in light of the new EU General Data Protection Regulation, coming into force in May 2018. Some interviewees feared that they were unprepared for the new regulation and felt they had not received sufficient information about the impact of the new regulation on their research.
Challenges of working with big data
Challenges of working with big data were mainly related to infrastructure limitations. For researchers working with very large files, even simple aspects of data management become difficult. For example, due to ever-increasing storage requirements for big datasets, many researchers were unable to back up their data. This occasionally led to irretrievable data loss. Due to their large volumes, big datasets were rarely archived, raising reproducibility concerns. In addition, many researchers had to use third-party computing services in order to process their data effectively. This often resulted in issues associated with very slow data transfer.
Working with big datasets, especially those which needed to be dynamically updated, also meant challenges for data publishing. Many data repository providers did not offer options for big data sharing and had strict limitations on the maximum size of files. In addition, publishing big datasets often meant substantial costs, and it was often more cost-effective to simply re-generate the data when needed.
Software management issues
The third issue was with software management. In general, researchers did not have policies within their research groups on how software should be managed, annotated and shared. Often the very platforms for software management differed within the same research group. Some researchers felt they did not have sufficient time to annotate their software properly and that their colleagues, especially students, did not have the right skills to work effectively with tools which could help them manage their software better. One researcher mentioned a missed commercialisation opportunity: the software developed by the group was not understandable to anyone outside the group, including the third party interested in commercialisation.
Interviewees mentioned that due to lack of appropriate skills amongst researchers, there was a need for professional service support in data science. In addition, many suggested that training on the use of software management tools (such as Git, Subversion or Jupyter Notebooks) would be useful, in particular for students. Several wished to receive more information about methods for software archiving and for getting citation credit for code publishing.
Managing and sharing commercially confidential information
Working with commercially confidential data also proved problematic. First, there were tensions between sharing data for the sake of reproducibility and the need to protect third parties’ commercial interests. Interviewees mentioned that navigating between the different contractual clauses could be difficult. One researcher admitted that the inability to share research data obtained from commercial partners made it more difficult to publish papers, because some journals now required that research data supporting publications be made publicly available. Another researcher felt that collaborating with industry negatively affected the progress of his academic career because commercial clauses meant fewer papers published. That researcher thought that when it came to academic promotions, commercial collaborations were valued less than the number of published articles.
Common data management problems
In addition to data management issues related to the type of research conducted, some common problems mentioned by almost all the interviewees were identified as well. These were related to improving daily data management practice, and to better data management procedures for students.
Daily data management practice
Problems related to daily data management practice concerned issues such as designing a data backup strategy and adhering to it, good file and folder naming, as well as version control. These problems were also shared by researchers who based their research primarily on literature reviews. Overall, very few interviewees had established workflows for good data management which would be followed by entire research groups. Most of the time it was down to individuals whether data was properly managed or not. Many researchers expressed the wish to improve their data management practice and to attend appropriate training.
Students’ data management practice
Almost all interviewees said that data management practice amongst students needed to be improved and that data management training should be part of the Graduate School’s curriculum. Training needs were related to both awareness of general principles, such as data backup, as well as knowledge of specific techniques and practices, such as data science skills and software management tools.
In addition, one interviewee expressed his concern about the fact that PhD students were not required to archive their research data at the time of graduation. This, he believed, led to research reproducibility concerns and potential reputational damage. The researcher suggested that all PhD students should be required to archive their research data before leaving TU Delft. This view was shared by researchers from the TPM Policy Analysis section (see ‘Follow up actions undertaken’).
An additional concern regarding students’ data was raised during the meeting with researchers from the Engineering Systems and Services department (see the section ‘Follow up actions undertaken’). When discussing research data ownership, researchers mentioned that according to TU Delft regulations, research data collected by Master students belonged to the students, and not to TU Delft. As a result, in several cases, Master students left TU Delft and took all their research data with them, without leaving a copy with their TU Delft supervisors. Researchers believed that this was a serious issue from the point of view of research integrity and research continuity. To avoid similar issues occurring in the future and to work around the unfavourable regulation, supervisors now avoided offering participation in valuable, larger projects to Master students.
Views on Data Stewardship
With the exception of one researcher, who was in strong opposition to the Data Stewardship project, all other researchers welcomed the project and thought that there were data management needs at the faculty which could be addressed by the Data Steward.
The researcher with negative views on the Data Stewardship project thought that appointing a dedicated staff member to support researchers in data management was counterproductive. That researcher believed that a Data Steward would “develop guidelines (…) and hold meetings to raise awareness etc.” instead of solving “any actual operational issue”. He also suggested that a quantitative survey should be done to define the common practices and to decide whether any corrective steps needed to be taken. Interestingly, despite the negative attitude in general, the researcher agreed that there were issues with data management which needed to be solved and thought that training in data management for all PhD students was particularly needed.
Another researcher, who welcomed the overall idea of the Data Stewardship project, raised his concern about the amount of resources allocated to the project and suggested that care be taken to ensure that the project would not result in new compliance expectations.
All remaining researchers were enthusiastic about the project and identified numerous data management issues with which they hoped that a Data Steward could help. These included:
- Advice on data management workflows and best practices (such as data backup, version control, file and folder naming)
- Advice on data sharing and citation
- Advice on working with different types of confidential data (such as personal sensitive and commercially sensitive data)
- Support in designing strategies for sustainable code management
- Advice on code sharing and citation
- Help with managing funders’ and publishers’ expectations
- Training on data and software management, in particular for PhD students
Follow up actions undertaken
As a result of the initial interviews with researchers at TPM, several actions were undertaken, which might suggest that interviewed researchers were genuinely interested in data management issues. First, the Data Stewardship Coordinator was invited to give two presentations about the Data Stewardship project: to researchers from the Department of Engineering Systems and Services, and to researchers from the Policy Analysis section of the Multi-Actor Systems Department. Second, one of the interviewed researchers asked members of the Research Data Services team to deliver a workshop to his students about using data repositories. Third, one of the interviewees made a suggestion to connect with the TPM’s Graduate School to discuss the possibilities of rolling out data management training for PhD students.
In addition, the Data Stewardship Coordinator initiated discussions with other faculties to determine whether the issues around Master students’ research data ownership were also problematic at other faculties and whether the problem should be tackled centrally or not. Furthermore, the Research Data Services team started liaising with the Human Research Ethics Committee to ensure alignment between research ethics and data management guidelines and policies.
This preliminary report identifies several areas where data management practices at TPM faculty could be improved with the help of a Data Steward. However, given the preliminary nature of these findings and the risk that they might not be representative of the whole faculty, it is recommended that the work of the newly appointed Data Steward is initially focused on a more in-depth investigation of data management needs. While qualitative interviews should be continued, a quantitative survey at the faculty is also needed, in agreement with the advice of the interviewee who was negative about the Data Stewardship project. Indeed, results of quantitative surveys conducted at the three faculties that already have Data Stewards proved to be valuable for measuring the scale of data management issues and deciding on priority actions. The thorough investigation of data management needs will allow the faculty to decide how to prioritise them. Finally, understanding the faculty-specific requirements will inform the development of a faculty data management policy.
In addition, given the fact that many researchers interviewed expressed uncertainties about the recommended procedures for working with personal sensitive data and that the new EC Data Protection Regulation becomes legally binding in May 2018, it is suggested that development of recommendations and training for working with personal sensitive data is also prioritised. This work should be done in collaboration with other teams at TU Delft: the Data Protection Officer, the Research Data Support team at the Library, the ICT team and the Human Research Ethics Committee.
The subsequent two years, during which the Data Steward will be appointed at 1.0 FTE, could be solely devoted to developing solutions for the remaining priority data management needs and also to evaluating the project. Comprehensive evaluation of the project should help the faculty make an informed decision on how to take the Data Stewardship forward after the end of 2020.
I would like to thank: all researchers who agreed to participate in my interviews for their time and valuable feedback; Martijn Blaauw for interviewee suggestions and introduction to the faculty; Alastair Dunning and Heather Andrews for comments on this report.
A citable version of this report is available on the Open Science Framework: https://osf.io/8ce5v
On 25 May 2018 the new General Data Protection Regulation (GDPR) for Europe will come into effect.
In the introductory paragraph of the official website it quite frankly states: “[…] at which time those organizations in non-compliance will face heavy fines.”
But let’s take a step back and have a quick recap: the GDPR replaces the Data Protection Directive 95/46/EC from 1995. Since then the European Union has changed quite a bit and, among other things, opened its borders to many economic flows and developments, not least digital ones. The underlying ideas go back further still: in 1980 the OECD issued guidelines, endorsed by both the US and Europe, that set out eight principles for the Protection of Privacy and Transborder Flows of Personal Data:
- Collection Limitation Principle;
- Data Quality Principle;
- Purpose Specification Principle;
- Use Limitation Principle;
- Security Safeguards Principle;
- Openness Principle;
- Individual Participation Principle;
- Accountability Principle.
You can find the detailed principles on the eugdpr overview web-page.
Because these principles were non-binding and the degree of compliance varied across EU member states, a new data protection regulation was proposed by the European Commission in 2012. Skipping forward to December 2015, the European Parliament and Council agreed on the final version of the GDPR. The regulation was subsequently adopted, starting a “2-year post-adoption grace period” that ends on the 25th of May this year.
The GDPR not only applies to organisations located within the EU but it will also apply to organisations located outside of the EU if they offer goods or services to, or monitor the behaviour of, EU data subjects. It applies to all companies processing and holding the personal data of data subjects residing in the European Union, regardless of the company’s location.
The key changes of the new GDPR in comparison with the previous regulations are:
Increased Territorial Scope (extra-territorial applicability)
… “Non-Eu businesses processing the data of EU citizens will also have to appoint a representative in the EU.”
Penalties
… “Under GDPR organizations in breach of GDPR can be fined up to 4% of annual global turnover or €20 Million (whichever is greater).”
Consent
… “Consent must be clear and distinguishable from other matters and provided in an intelligible and easily accessible form, using clear and plain language. It must be as easy to withdraw consent as it is to give it.”
Data Subject Rights
… Breach Notification within 72 hours of first becoming aware of the breach.
… Right to Access personal data free of charge in an electronic format.
… Right to be Forgotten enables the deletion of data, the cessation of further dissemination, and a halt to processing.
… Data Portability gives data subjects the right to receive their personal data in a ‘commonly use and machine readable format’ and to have it transmitted from one data controller to another.
… Privacy by Design requires security and privacy measures to be built into system designs and architectures from the outset, instead of being added later on request.
Data Protection Officers
… “[…] appointment will be mandatory only for those controllers and processors whose core activities consist of processing operations which require regular and systematic monitoring of data subjects on a large scale […].”
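To make the “whichever is greater” rule for penalties concrete, here is a minimal sketch of the arithmetic in Python. It only encodes the two figures quoted above (4% of annual global turnover, €20 million); the function name and the use of whole euros are my own assumptions for illustration, and the result is the upper limit of a fine, not a prediction of what a regulator would actually impose.

```python
# Illustrative only: upper limit of a GDPR fine based on the quoted rule
# "up to 4% of annual global turnover or EUR 20 million (whichever is greater)".
# The names below are hypothetical and not taken from any official tool.

FLAT_CAP_EUR = 20_000_000      # EUR 20 million
TURNOVER_PERCENT = 4           # 4% of annual global turnover


def max_gdpr_fine(annual_global_turnover_eur: int) -> int:
    """Return the maximum possible fine in whole euros (rounded down)."""
    if annual_global_turnover_eur < 0:
        raise ValueError("turnover cannot be negative")
    turnover_based_cap = annual_global_turnover_eur * TURNOVER_PERCENT // 100
    # "Whichever is greater": the larger of the two caps applies.
    return max(FLAT_CAP_EUR, turnover_based_cap)


if __name__ == "__main__":
    for turnover in (100_000_000, 500_000_000, 1_000_000_000):
        print(turnover, max_gdpr_fine(turnover))
```

The two caps meet at a turnover of €500 million: below that, the flat €20 million is the larger figure; above it, the 4% figure takes over.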
There are some helpful infographics, guidelines and coalitions available online. Here are some pointers:
1. The UK Information Commissioner’s Office published a checklist with 12 easy steps to take in preparation for the GDPR:
2. Bird&Bird GDPR legislation implementation tracker:
3. Privacy For Academic Research Cookbook by Marlon Domingus (Erasmus University Rotterdam) for the Digital Curation Centre (DCC) UK:
4. GDPR Awareness Coalition: