Using Jupyter to study Earth

This blog is originally written and posted by Susan Branchett here under a CC-BY-4.0 international license.


How TU Delft’s ICT-Innovation department is providing hands-on help to researchers in order to understand their IT requirements better.

Jupiter

How did it come about?

Earlier this year I was reading through the ‘TU Delft Strategic Framework 2018-2024’ and buried deep within its pages I found this hidden gem:

We strengthen the social cohesion and interaction within the organisation, by:

  • Supporting mobility across the campus. For example through interfaculty micro-sabbaticals.
  • Stimulating joint activities and knowledge exchange across the various faculties and service departments.
  • Strengthening relations between academic staff members and support staff.

Hidden Gem

This seemed especially relevant to our ICT-Innovation department. We are continually on the look-out for ways to support the primary processes of the university, research and education, by applying IT solutions. I decided to find myself a suitable micro-sabbatical.

Since October 2018 I’ve been spending one day a week in the group of Prof.dr.ir. Nick van de Giesen and Dr.ir. Rolf Hut, working with their bright, new Ph.D. student, Jerom Aerts, on the eWaterCycle II project.

What’s it about?

eWaterCycle II aims to understand water movement on a global scale in order to predict floods, droughts and the effect of land use on water. You can read more about it here https://www.ewatercycle.org/ or here https://www.esciencecenter.nl/project/ewatercycle-ii.

Sacramento River Delta

Hydrologists are encouraged to use their own local models within a global hydrological model.

In order to test whether their model is working properly, the project team is developing a Python Jupyter notebook that makes it easy for hydrologists to produce the graphs and statistics that they are familiar with.

During my micro-sabbatical, I am contributing to the development of this Jupyter notebook.
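The post does not say which statistics the notebook produces. Purely as an illustration, here is a minimal sketch of one statistic hydrologists commonly use to compare modelled and observed discharge, the Nash-Sutcliffe efficiency (NSE). The function name and the sample values are assumptions made for this example, not code from the eWaterCycle II project.

```python
from statistics import mean


def nash_sutcliffe_efficiency(observed, simulated):
    """Return the Nash-Sutcliffe efficiency of a simulated series.

    1.0 means a perfect match; 0.0 means the model is no better than
    always predicting the mean of the observations.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    obs_mean = mean(observed)
    # Sum of squared model errors.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    variance = sum((o - obs_mean) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - residual / variance


# Invented discharge values (m^3/s), for illustration only.
observed = [10.0, 12.0, 15.0, 11.0]
simulated = [9.0, 13.0, 14.0, 12.0]
nse = nash_sutcliffe_efficiency(observed, simulated)
print(f"NSE = {nse:.3f}")
```

In a notebook, a cell like this would typically sit next to a plot of the two time series, so that the number and the graph can be read together.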

What did I learn?

  • Wi-Fi is an essential service for researchers and needs to be reliable
  • Standard TU Delft laptops are not adequate for research
  • Data for this project is hosted in Poland due to the collaboration with many partners and funding from EOSC
  • The team initially hosted their forecasting site on AWS, because AWS is quick to set up and it works in all the countries involved. For the minimum viable product of the global hydrology model they moved to the SURFsara HPC Cloud
  • If data is not open, then researchers are hesitant to use it. Their work can’t be reproduced easily, leading to fewer quality checks and less publicity
  • In the face of bureaucracy, cramped conditions and an ever-growing number of extra required activities, our researchers’ determination and passion for their field of expertise are truly magnificent

TU Delft light bulb

I shall be using these insights to guide my work within the ICT-Innovation department and to feed our conversations with the Shared Service Center.

What next?

From 1st April 2019 I’ll be moving on to my next micro-sabbatical at the Chemical Engineering department of the Applied Sciences faculty. There I shall be installing molecular simulation software on a computer cluster and getting it up and running.

My ambition is to cover all 8 faculties of the TU Delft within 4 years. In October 2019 I shall be available for the next micro-sabbatical. If you have any suggestions, please do not hesitate to get in touch.

About the Author

Susan Branchett is Expert Research Data Innovation in the ICT-Innovation department of the TU Delft. She has a Ph.D. in physics and many years’ experience in software development and IT. Find her at TU Delft or LinkedIn or Twitter or github.

This blog expresses the views of the author.

Acknowledgements

The image of Jupiter is from here. Image credit: NASA/JPL-Caltech/SwRI/MSSS/Kevin M. Gill. License.

The hidden gem image is from here and is reproduced by kind permission of Macintyres.

The Sacramento River Delta image is from here and is reproduced under a CC-BY-2.0 license.

Except where otherwise noted this blog is available under a CC-BY-4.0 international license.

VU Library Live talk show and podcast on the academic reward system

Authors: Esther Plomp, Maria Cruz, Anke Versteeg

On the 14th of March 2019 the fourth VU Library Live talk show and podcast took place at the Vrije Universiteit Amsterdam (VU). By choosing topics that appeal to researchers and that are at the forefront of scholarly communications and research policy, this podcast series aims to bring researchers back to the library in the current age of digitalisation, where university libraries are becoming increasingly invisible to researchers. The topic of this show was the academic reward system of the future.


We need to stop being lazy with just counting numbers of papers and citations and actually start reading stuff – Vinod Subramaniam

Vinod Subramaniam, Rector Magnificus of the VU, opened the show and claimed that we have lost sight of “the core business of the university, which is education”. Vinod stated that the academic reward system should not be based solely on research activities, let alone the traditional impact factor. As a researcher you can also have impact by communicating your results through newspapers and by providing guidelines that are used in society (such as medical guidelines or political policies). “We need to stop being lazy with just counting numbers of papers and citations and actually start reading stuff.” Vinod added that universities are at a turning point: “it is a perfect storm that is converging now, where I think a lot of things are happening where we as universities, but also grant agencies and other stakeholders have to start thinking about how do we reshape the reward system in academics”. Vinod also highlighted the timely occurrence of the meeting, right before the demonstration for education in The Hague on the 15th of March (WOinActie) against the increasing funding reductions in higher education and the very high workload of university staff.

it is a perfect storm that is converging now, where I think a lot of things are happening where we as universities, but also grant agencies and other stakeholders have to start thinking about how do we reshape the reward system in academics – Vinod Subramaniam

Vinod Subramaniam opening the VU Library Live show

Maria Cruz opened the podcast with the phrase ‘publish or perish’, a strategy that is increasingly affecting the academic system, leading to high levels of workload and skewing research priorities. The current focus on publications as the gold standard in the evaluation process at universities and research institutes decreases the value that education and valorisation activities have in scientific careers. The academic performance evaluation system furthermore barely takes any other academic outputs into account, such as software, data and other forms of communicating scientific research. Maria wondered if there is a way out of this system and asked: “Can we change this system to facilitate research that is open and transparent and contributes to solving key societal issues?”


Can we change this system to facilitate research that is open and transparent and contributes to solving key societal issues? – Maria Cruz

To address this question, Maria had four guest speakers around the table with her: Barbara Braams, Stan Gielen, Jutka Halberstadt, and Frank Miedema.

Maria twittering on Barbara’s view on open science and data/software sharing.

Barbara Braams, Assistant Professor at the VU, who recently wrote an opinion piece for the Volkskrant on the topic, agreed that we should move to a more transparent scientific system, but said that currently the number of publications on your CV is more important when you write a grant, particularly the number of first author publications and the journal in which they are published. Barbara wants to change how researchers are evaluated and place the focus on not just sharing the articles but also scripts and data sets. “But to do so and to make sure that your data is actually usable by someone else, it takes a lot of time and effort…”, said Barbara. “I think when you’re moving towards an open science system, we should also think about how can we reward these type of efforts, because it means that if I put a lot of effort in making my data understandable for someone else … it takes time away from my publications.” She argued that it should be clear for Early Career Researchers (ECR) in particular what it is that is expected of them and for what practices they are rewarded, as they have to deal with small time frames in which they have to write grant proposals. This will be important in the coming year, as NWO aims to sign DORA in 2019 but will implement the new indicators from 2020 onwards.


we should also think about how can we reward these type of efforts, because it means that if I put a lot of effort in making my data understandable for someone else … it takes time away from my publications. – Barbara Braams

Maria twittering on Stan’s view on seeking agreement between European funding agencies on how to evaluate scientists.

Stan Gielen, President of The Netherlands Organisation for Scientific Research (NWO), agrees with Barbara that the evaluation of researchers needs to be revisited: “I don’t care where you publish, I want to know what was your contribution to science.” Stan claims that NWO wants to change the system, but that scientists are working in the international scene. This means that, according to Stan, if we want to change the system, this will have to be in cooperation with other funding agencies, at least in Europe. As a member of the Science Europe Governing Board, Stan will seek agreement between EU funding agencies and will come up with general guidelines on how to evaluate scientists: “I expect that we will have a document available by the end of this year, which will be open for public consultation…. I hope that in let’s say, first half 2020, we will have some level of agreement among all the funding agencies in Europe about the basic criteria that should be used for evaluating the scientists,” he added. NWO also agreed to sign the San Francisco Declaration on Research Assessment (DORA, a set of recommendations to move away from assessing researchers using the impact factor) in September 2018, but has not yet signed the declaration. This will happen before the 23rd of May, when a meeting will take place to discuss the evaluation of researchers: “We decided to sign DORA and also to make statements what we are going to do before that meeting,” said Stan. “…We will also indicate at that time what [NWO] will do to implement DORA.”

On the question of how ECRs can be more actively involved in the process of change and the evaluation of researchers, Stan answers that The Young Academy is invited to participate actively in the consultation processes in the Netherlands and Europe. Stan added, “we are in close contact with VSNU, the Society of Dutch universities, because they’re talking now with the other communities about academic evaluation system. [NWO] should make sure that our criteria are overlap or are the same as those that are used by universities to evaluate the research component.”


Maria twittering on Jutka’s view on the impact factor.

Jutka Halberstadt, Assistant Professor at the VU, is of the opinion that the impact factor should not matter but that we should focus on what benefits society. Her work centres around valorisation. Jutka describes valorisation as “using our knowledge from teaching and research to help build a better society, to have societal impact.” She thinks that research should be available to society, and said that “we should make it understandable, usable to society… we should be in close contact with society to see what the societal needs are so we can translate those back to relevant research questions.” She adds that “in an ideal world [we should] also do the research in close collaboration with partners in society because I think that will make the research better. And for me having societal relevance is a vital aspect of being an employee of the VU University.”

Jutka developed a national standard for obesity in collaboration with healthcare professionals and patient organisations in a project called ‘Care for Obesity’, funded by the Dutch Ministry of Health. No one knows how to measure the impact of these standards, even though they have enormous societal impact. Instead of prioritising publications in academic journals, the project produces other outputs such as blogs, questionnaires, guidelines and workshops for health care professionals: tools that are scientifically based and practical for people to use. Yet these research outputs are not valued at the same level as scientific publications. She thinks that universities should focus more on collaborative efforts and use altmetrics, e.g. how many times a name is mentioned in the news. “It won’t be a perfect system,” said Jutka, “but it’s something we can develop and work with.” She added that researchers should not be obliged to excel in education, research and valorisation at the same time, as this is “really a lot to ask” from them.

Maria twittering on Stan’s view on revising the academic reward system.

Barbara also highlights the current focus of the reward systems on the excellence of the individual. “I think one of the great things that we can also use from this transition to open science is make more space for other type of scientists and work as a team. But how are we going to do that, for instance, in a grant system?” Stan answers that NWO is not going to implement separate grants for different types of expertise, but recognises the worth of team effort. For example, the Spinoza Prize (the highest award in Dutch science) is awarded to an individual, but as Stan puts it: “every person who gets the Spinoza Prize spends quite a lot of time explaining and thanking everyone who helped [them]”. Stan mentioned that the Spinoza Prize should therefore become a team award but did not comment on whether this change would actually be implemented. Instead, he thinks that “it’s up to the HRM policy of universities to make sure that they have excellent teams.” This may be difficult for universities to implement when the funders decide where the money flows.

Maria twittering on Frank’s view on the impact factor.

Frank Miedema, Vice Rector Research and chair of the Open Science Programme Utrecht University and UMC Utrecht, thinks we are on the way to recognising the many forms of excellence. “[As a scientist] you want to produce real significant knowledge”, said Frank. According to him there are frustrations because this work is now not rewarded. “…a couple of years ago, this was still considered as taboo, especially at NWO,” he said, “But now we have it on the table”. When Maria asked if we should break free of the bibliometric mind-set, he said that this dependence on the impact factor has to stop, but that there are “many people who are addicted, especially of course, the people who do well in the system.” Frank is also involved in the evaluation of the current research evaluation protocol and thinks that more excellences need to be rewarded. He warned that the road is long and difficult and it may scare some people off, as scientists are very insecure when it comes to changes in the academic system. Frank raised the question whether the review committees of NWO were trustworthy.


It will take I think, 10, maybe 20 years until the mindsets of the reviewers have really changed – Stan Gielen

Stan indicates that it will take multiple years, perhaps even decades, before the new evaluation system is fully in place because the review panels have to incorporate these new instructions. Stan indicated that instructing reviewers has a positive effect, but said that “it will take I think, 10, maybe 20 years until the mindsets of the reviewers have really changed.” Stan indicates that the transitional phase will not restrict the rewards for more traditional scientists, because it would be too soon. Instead they will be implemented as the primary criteria in the next two years. Stan thinks that researchers will be triggered by these questions in the grant proposals: “we ask you to come up with a short narrative explaining first of all, why this is an important problem, what the impact will be scientifically, but also societal impact and why you or your team is qualified to pursue this project.” This can be done by listing your open science track record and shared data sets, next to the publications. Stan said that it is up to the researcher “to explain what you have done for open science and how you will pursue these activities when your grant application will be funded…” Barbara shed some more light on the complexities of the transitional phase. She needs to ask for informed consent to share the data of her research, and after this it will still take 4-5 years before the data is available. Barbara explained that “there are so many differences in different fields. Some disciplines need more time to open up their data sets and panellists should be made aware of these differences,” she added.

We ask you to come up with a short narrative explaining first of all, why this is an important problem, what the impact will be scientifically, but also societal impact and why you or your team is qualified to pursue this project. – Stan Gielen

When Barbara asks how universities are going to support their staff in the transition to open science, Stan answered that “we need data stewards, because we should not …. put the burden on the scientist.”

We need data stewards, because we should not put the burden on the scientist – Stan Gielen

The answer to Maria’s final question on what they would advise young scientists to do with traditional supervisors is to take matters into their own hands. Young scientists should follow their hearts, as Jutka puts it. Barbara agreed and says that the young generation will move this forward. Stan thinks supervisors should let young people bring in their own expertise: “if you stick to your own principles, you will be lost in four years.” Frank, however, has a different view. “I think to put the burden on the young academics … [is] not really fair, because they have the least power in the system,” he said. Frank thinks that deans and rectors need to set steps in this “power game” and decide on the right incentives for the academic leaders. The move towards a more transparent, open and societally relevant way of practising science thus requires the effort of all stakeholders: researchers at all academic career levels as well as support staff, funding agencies, universities, libraries and the involvement of the general public.

Links to the VU Library Live podcasts:

I need your data, your code, and your DOI

This blog is originally written and posted by one of our Data Champions.

I love open science. Since you are reading a scientific blog, I believe it is likely that you also support many open science ideas. Indeed, easy access to publications, code, and research data makes research easier to reuse, while also ensuring transparency of the process and better quality control. Unfortunately the academic community is extremely conservative and it just takes forever for new standards to become commonplace.

The push for change in scientific practice comes from many directions.

  • Many funding agencies now require that all publications funded by them are publicly accessible. The upcoming Plan S would go further and only allow open access publications for all publicly funded research.
  • Frequently when submitting a grant proposal these days one also must include a data management plan [1].
  • The glossy journals in our field tighten their data publication requirements (see Nature and Science).
  • At the same time there are multiple grassroots initiatives for setting up open access community-run journals: SciPost [2] and Quantum.

Also as individual researchers we can do a lot. For example, our group routinely publishes the source code and data for our projects. Recently Gary Steele and I proposed to our department that every group pledges to publish at least the processed data with every single publication. This is miles away from the long-term vision of publishing FAIR data, but it is a step in the right direction that does not cost too much effort and that we can do right now. We were extremely pleased when our colleagues agreed with our reasoning and accepted the proposal.

The policy changes and initiatives help improve the practice, but policy changes are slow and grassroots initiatives require extra work and might require convincing skeptically minded colleagues. Interestingly I realized that there is another way to promote open science, which doesn’t have any of those drawbacks. Instead it is awesome from all points of view:

  • It does not require any effort on your side.
  • It has an immediate effect.
  • It helps researchers to do better what they are doing anyway.

Almost too good to be true, isn’t it? I am talking about one situation where every researcher is in a position of power: reviewing papers. The job of a reviewer is to ensure that the paper is correct, and that it meets a quality standard. As soon as the manuscript is even a bit complex, one cannot assert its correctness without examining the data and the code that are used in it. Likewise, if the data and the code comprise a significant part of the research output, the manuscript quality is directly improved if the code and the data are published as well.

Therefore I have decided that a part of my job as a reviewer is to ensure that the code and the data are available for review as soon as they are sufficiently nontrivial. I have requested the code and the data on several occasions, following this request with a suggestion to also publish the code and the data.

I was pleasantly surprised with the outcome. Firstly, nobody wants to argue against a reasonable request by a referee. Secondly, often the authors are happy to share their work results and do a really decent job. Finally, on more than one occasion already requesting the data was enough for the authors to find a minor error in their manuscript and fix it. In the current system where publishing this supplementary information does not bring any benefit, the authors are seldom motivated to make their code understandable and data accessible. Once a review requests the data and the code, the situation changes: now whether the paper gets published also depends on the result of this additional evaluation.

So from now on, whenever I review a manuscript, in addition to any other topics relevant to the review, I am going to write the following [3]:

The obtained data as well as the code used in its generation and analysis constitute a significant part of the research output. Therefore in order to establish its correctness I request that the authors submit both for review. Additionally, for the readers to be able to perform the same validation I request that the authors upload the data and the code to an established data repository (e.g. Zenodo/figshare/datadryad) or as supplementary material for this submission.

I hope you join me and do the same [4].

 

[1] One has to note that the data management plans are mostly overlooked during the review.
[2] Full disclosure: I’m a member of SciPost editorial college.
[3] Obviously, I’ll adjust this bit if the paper doesn’t have code or data to speak of.
[4] Consider that bit of text public domain and use it as you see fit.

New features in the 4TU.ResearchData archive!

We are happy to announce two new metadata elements added to the 4TU.ResearchData archive:

  1. Funder information;

To link datasets in a more structured way to funding, we have made funding information available in dedicated metadata fields. Depositors are asked to submit the name(s) of the funder and grant number as part of the standard metadata deposit when they submit their dataset.

The funding information is displayed on the public dataset landing page, which includes the funder identifier from the Funder Registry.

Benefits:

  • Funding organizations are able to better track the published results of their grants
  • Research institutions are able to monitor the published output of their employees
  • Greater transparency on who funded the research
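To make the shape of this metadata concrete, here is a minimal sketch of how a funding entry could be represented and checked before deposit. The class name, field names and sample values are hypothetical and chosen for illustration; they are not the archive's actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class FundingReference:
    """Hypothetical funding entry attached to a dataset record."""
    funder_name: str
    grant_number: str
    # Identifier from the Funder Registry; left empty here rather than guessed.
    funder_identifier: str = ""


def missing_funding_fields(reference):
    """Return the names of the funding fields that are still empty."""
    missing = []
    if not reference.funder_name.strip():
        missing.append("funder_name")
    if not reference.grant_number.strip():
        missing.append("grant_number")
    return missing


# Invented example values, for illustration only.
complete = FundingReference("Example Funder", "GRANT-0001")
incomplete = FundingReference("Example Funder", "  ")
print(missing_funding_fields(complete))
print(missing_funding_fields(incomplete))
```

Keeping the funder name and grant number in separate structured fields, rather than in a free-text description, is what lets funders and institutions query datasets by grant.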

  2. Subject;

In addition to ‘Keyword’ that tells what the topic of the dataset is, we have recently added a new metadata element Subject to be able to expose datasets according to their field of research.

When submitting their dataset, depositors are required to choose one or more subject categories (or fields of research) from a list, which originates from the Australian and New Zealand Standard Research Classification (ANZSRC).

The classification schema contains 22 Fields of research:  https://data.4tu.nl/repository/cat:1 and 17 Socio-economic Objectives: https://data.4tu.nl/repository/cat:2

Each main subject category consists of sub-categories to make the field of research more specific, e.g. when selecting the main subject category ‘Biological Sciences’, depositors are offered the following sub-categories from which they can choose:

 

The Subject metadata element is added as search facet to allow users to refine their search results by subject category:

Another notable feature is that every subject category shows all datasets that belong to that category using the relation type ‘is subject of’, and all related subject categories that share datasets with the current subject category.

See for example: https://data.4tu.nl/repository/cat:0806
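The two views described above (the datasets in a category, and the categories related to it through shared datasets) can be sketched as follows. This is only an illustration of the idea: the dataset identifiers and the assignment of subject codes are invented, and the function names are hypothetical rather than part of the archive's software.

```python
from collections import defaultdict

# Hypothetical records: dataset identifier -> set of subject category codes.
DATASETS = {
    "dataset-a": {"0806", "0801"},
    "dataset-b": {"0806"},
    "dataset-c": {"0801", "0905"},
}


def datasets_by_subject(datasets):
    """Build an index from subject code to the datasets filed under it."""
    index = defaultdict(set)
    for dataset_id, subjects in datasets.items():
        for code in subjects:
            index[code].add(dataset_id)
    return dict(index)


def related_subjects(datasets, code):
    """Return the other subject codes that share a dataset with `code`."""
    related = set()
    for subjects in datasets.values():
        if code in subjects:
            related |= subjects - {code}
    return related


index = datasets_by_subject(DATASETS)
print(sorted(index["0806"]))
print(sorted(related_subjects(DATASETS, "0806")))
```

The same index is what a search facet needs: filtering results by subject amounts to intersecting the result set with `index[code]`.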

Datasets that have been deposited before we had this new metadata element in place, have been updated with one or more subject categories by our moderators. Although we have been very careful, we could have made a mistake. Should this be the case, please don’t hesitate to contact us at researchdata@4tu.nl.

A Subjective Assessment of Research Data in Design


van Leeuwenhoek’s microscopes by Henry Baker (Source: Wikimedia Commons)

In the autumn of 2018 I took up the post of Data Steward in the Faculty of Industrial Design Engineering (IDE). As I am not a designer myself (my academic background is in historical literature), a significant portion of my time is dedicated to understanding how research is conducted in the realm of design, in particular trying to compose an overview of the types of data collected & used by designers, as well as how current and upcoming ideas & tools for research data management might potentially benefit their activities. This is no mean feat, and at present I cannot lay claim to more than a superficial understanding of the inner workings of design research. Through day-to-day data steward activities – attending events, reading papers and, perhaps most revealing, conversations with individual researchers, to name but a few – the landscape of design research data gradually becomes more intelligible to me. Cobbling together a coherent picture from these disparate sources requires a modicum of dedicated thought, so it was my good fortune to have recently been invited to an event arranged by the Faculty of Health, Ethics & Society (HES) at Maastricht University to present my experiences with design data thus far. Here we discussed and compared research data practices, and my preparation for this discussion afforded me the opportunity to reflect a bit on what research data means in the field of design, how design methodology relates to other academic fields and what kinds of challenges and opportunities exist for handling data and making it more impactful within the discipline and beyond.

The HES workshop, organized early in February of this year, was a forum for the group to discuss how their work and the data they produce intersect with some of the issues currently being debated within academic communities. A specific goal was to evaluate some of the arguments originating in the (at times competing) discourses of Open Science and personal privacy. Topics of discussion included how one should make sociological and healthcare data FAIR, especially given that the materials collected in HES are often predominantly qualitative in nature: personal interviews, ethnographic field notes, etc. Questions surrounding these topics are broadly applicable to some qualitative types of data in design as well, e.g. the extent to which data should be shared, in what format and under what conditions. The slides from my talk are available here: https://doi.org/10.5281/zenodo.2592280, and this blog post is intended to give them some context.

Research Data in Design

Maintaining an overview of the various types and amounts of data produced, analyzed and re-used within the Faculty of Industrial Design Engineering is a core aspect of my work as a data steward, but it is an ongoing challenge due to the heterogeneity of data used by designers and the quantity of different projects simultaneously active. Some designers do market research involving i.a. surveys, others take sensor readings and yet others develop algorithms for improving the manufacturing process. Each of these, along with the many other efforts within IDE, merits its own suite of questions and concerns when it comes to openness and privacy. The more we understand data types and usage in a field, the better we can judge the impact of present and future actions germane to research data – open access initiatives, legislation (esp. the GDPR), shifts in policy or practice, etc. More importantly, we can predict how we might turn some of these to our advantage.

For instance, TU Delft recently instituted a policy that all PhD students will be required to deposit the data underlying their thesis. For new PhD students, this will simply be a part of the process, one step among the many novel activities they experience on the way to earning their PhD. The real challenge lies with members of my faculty, the experienced researchers and teachers, as well as myself, who will have to identify the value in applying this new policy to research data in their field. To do this we must ask ourselves a series of questions. In addition to the aforementioned ‘what kind of data do we have and use?’, we must determine what should be made public as well as to what degree. Underlying all of this is a more fundamental question, of course: how does sharing this information improve the production of knowledge in design and the fields which it touches? Some of these queries have clear answers, but the majority require further discussion and reflection.

Data Sharing and Data Publishing

One common question I receive in various forms is why designers and design researchers should share their data more widely than they presently do. In many instances I find this returns to the aforementioned issue of diverse types of data. For some designers who have a clear definition of what their data is, why it is collected and how others can use the data, such as the DINED anthropometrics group, a conversation on what data to share and how can be fairly straightforward. But what are the actual benefits of sharing design notes or other types of context-bound qualitative data? In the data management community we have a set of commonly purveyed answers to this query, and I have been trying to see how they match up to existing practice in design.

The first is idealistic, that publishing data will further the field, improve science through increased transparency, accuracy and integrity. Reactions to this argument often take the form of a slow nod, a sign I take to be cautious optimism (one which I happen to share). This outcome is difficult to measure. I was once asked who would be interested in seeing the transcripts of x number of their interviews. A legitimate question, and one with an inscrutable answer – it is difficult to tell who will use your data if they do not know it exists in the first place. A corollary to this is that we ask people to weigh the requisite time investment in making materials publishable (sometimes substantial if working with qualitative and/or sensitive data) against this unpredictable benefit. I believe we need more evidence of the positive impact of making design data FAIR, whether this be figures of dataset citations (currently a desideratum) or anecdotal evidence of new contacts and collaborations resulting from data sharing. Essentially this means a few interested volunteers willing to learn the tools, put in some extra time and test the waters. Will sharing my sensor data attract the attention of a new commercial partner? Will my model be taken up and improved upon by the community using the product or service we design? These are certainly possibilities, but at present they remain a future less vivid.

For PhD students and early career researchers I frequently posit the possibility that publishing data, making their publications Open Access and other actions to make their work more transparent could yield direct career opportunities. This ties into efforts promoting expansion in the interpretation of research assessment such as DORA. In my current position, I feel that designers may be ahead of the curve when it comes to evaluating research impact. In addition to research papers published in journals boasting various impact factors, desirable results from design projects include engagement tools, reflections from projects, and prototypes to name only a few. The weighting of these outputs is unclear to me when it comes to, e.g. obtaining a research position, but I suspect there is room here for allotting credit to demonstrations of open working. This is certainly the case in some fields where lectureship advertisements include explicit language supporting Open Science. As far as I have been able to determine (in my extremely casual browsing of job postings) this is not yet an element of the narrative designers weave to present their work to potential employers nor one sought by employers themselves. However, data publications as part of CVs attached to grant applications may indeed have some cachet, as funding agencies such as the NWO and ZonMw presently stress the importance of such activities in the pursuit of maximizing investment returns in the grants they award. Here is an opportunity to serve the interests of many.

Food for Thought

One of my takeaway messages from these debates is that there is a need for a community – in design, in many research areas – an opportunity to convene and discuss issues and test some of the options being afforded or demanded under the umbrella of Open Science. Some design research shares a number of data issues in common with social sciences – questions of consent, of data collection and access – while others are more aligned with mathematics or medicine. Furthermore I’d be interested to hear whether any RDA outputs have an application in design, as well as whether repositories for design materials would be desirable and how they should be arranged. From my admittedly biased position, I believe there is much that designers stand to gain from picking up versioning tools or sharing data more widely, and I think designers’ methods and the iterative nature of design thinking, as I understand them, could in turn only benefit Open Science communities.

Data Stewardship – goals for 2019


Authors: Heather Andrews, Nicolas Dintzner, Alastair Dunning, Kees den Heijer, Santosh Ilamparuthi, Jeff Love, Esther Plomp, Marta Teperek, Yasemin Turkyilmaz-van der Velden, Yan Wang

From February 2019 onwards, with the appointment of the data steward at the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS), the team of data stewards is complete: there is a dedicated data steward for every faculty at TU Delft. Therefore, the work in 2019 focuses on embedding the data stewards within their faculties, policy development, and also on making the project sustainable beyond the current funding allocation.

The document below outlines high-level plans for the data stewardship project in 2019.


Engagement with researchers

In 2019, the data stewards will (among others) apply the following new tactics to increase researchers’ engagement with research data management:

Meeting with all full professors

Inspired by the successful case study at the faculty of Aerospace Engineering, data stewards will aim to meet with all full professors at their respective faculties.

Development of training resources for PhD students and supervisors

Ensure that appropriate training recommendations and online data management resources are available for PhD students to help them comply with the requirements of the TU Delft Research Data Framework Policy. These should include:

  1. Appropriate resources for PhD students, e.g. support for data management plan preparation, and/or data management training for PhD students
  2. Support for PhD supervisors, e.g. data management guidance and data management plan checklists for PhD supervisors
  3. Online manuals/checklists for all researchers, e.g. information on TU Delft storage facilities, how to request a project drive, how to make data FAIR

Support for data management plan preparation

Ensure that researchers at the faculty are appropriately supported in writing data management plans:

  1. At the proposal stage of projects, researchers are notified about available support for writing the data paragraph by the contract managers and/or project officers of their department
  2. All new grantees are contacted by the data stewards with an offer of data management and data management plan writing support
  3. Training resources on the use of DMPonline, which will be used by TU Delft for writing Data Management Plans, are available and known to faculty researchers

Coding Lunch & Data Crunch

Organise monthly 2h walk-in sessions for code and data management questions for faculty researchers. Researchers will be supported by all data stewards and the sessions will rotate between the 8 faculties.

The Electronic Lab Notebooks trial

Following up on the successful Electronic Lab Notebooks event in March 2018, a pilot is being set up to test Electronic Lab Notebooks at TU Delft in 2019. The data stewards from the faculties of 3mE and TNW are part of the Electronic Lab Notebooks working group and are in contact with interested researchers who will be invited to get involved in the pilot.

Data Champions

Further develop the data champions network at TU Delft:

  1. Ensure that every department at every faculty has at least one data champion
  2. Develop a community of faculty data champions by organising a meeting every two months on average
  3. Organise two joint events for all data champions at TU Delft and explore the possibility of organising an international event for data champions in collaboration with other universities

Faculty policies and workflows

In 2019, all faculties are expected to develop their own policies on research data management. However, successful implementation of these policies will depend on creating effective workflows for supporting researchers across the research lifecycle. Therefore, the following objectives are planned for 2019:

  1. Draft, consult on and publish faculty policies on research data management.
  2. Develop a strategy for faculty policy implementation
  3. Develop effective connections and workflows to support researchers throughout the research lifecycle (e.g. contacting every researcher who was successfully awarded a grant)

RDM survey

A survey on research data management needs was completed at 6 TU Delft Faculties (EWI, LR, CiTG, TPM, 3mE and TNW). In 2019, the following activities are planned:

  1. Publish the results of the survey conducted in the 6 faculties in a peer-reviewed journal
  2. Conduct the survey at BK and IDE  – first quarter of 2019
  3. Re-run the survey at EWI, LR, CiTG, TPM, 3mE and TNW – September 2019
  4. Compare the results of the survey in 2017/2018 with the results from 2019 of the re-run survey and publish faculty-specific reports with their key reflections on the Open Working blog
  5. Survey data visualisation in R or Python
    The visualisation of 2017/2018 RDM survey results was available in Tableau, which is proprietary software. To adhere to the openness principle, and also to practice data carpentry skills (see below), the 2019 data visualisation will be conducted in R.
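
As a rough illustration of the kind of open, scriptable summary that could replace a proprietary dashboard, here is a minimal Python sketch (the plan above names R for the actual 2019 visualisation, so this is only one possible starting point). It tallies the answers to a single survey question and prints a plain-text bar chart using the standard library alone. The answers, the function names `tally` and `text_bar_chart`, and the bar width are all hypothetical and are not taken from the RDM survey.

```python
from collections import Counter

# Hypothetical placeholder answers to a made-up survey question;
# these are NOT results from the TU Delft RDM survey.
responses = ["yes", "no", "yes", "not sure", "yes", "no"]


def tally(answers):
    """Count how often each answer occurs, most frequent first."""
    return Counter(answers).most_common()


def text_bar_chart(counts, width=20):
    """Render (answer, count) pairs as a plain-text horizontal bar chart."""
    total = sum(count for _, count in counts)
    lines = []
    for answer, count in counts:
        # Scale each bar to the answer's share of all responses.
        bar = "#" * round(width * count / total)
        lines.append(f"{answer:<10} {bar} {count}")
    return "\n".join(lines)


counts = tally(responses)
print(text_bar_chart(counts))
```

Because the script is plain text, it can be versioned and shared alongside the survey data, which is the main point of moving away from a proprietary tool.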

Training and professional development

On top of specific training on data management, in 2019 data stewards will invest in training in the following areas:

Software carpentry skills

Code management is now an integral part of research and is likely to become even more important in the coming years. Therefore, as a minimum, every data steward should complete the full software carpentry training as an attendee in order to be able to effectively communicate with researchers about their code management and sharing needs. In addition, data stewards are strongly encouraged to complete training for carpentry instructors to further develop their skills and capabilities.

Participation in disciplinary meetings

In order to keep up with the research fields they are supporting, data stewards will also participate in at least one meeting specific to researchers from their discipline. Giving talks about data stewardship / open science during disciplinary meetings is strongly encouraged.

Events

In addition to dedicated events for the Data Champions, the following activities are planned for 2019:

  • Software Carpentry workshops
    • March & November 2019: at TU Delft
    • May 2019: at Eindhoven
    • October 2019: at Twente

In addition, the team is planning to organise the following events (no dates yet):

  • Workshop on preserving social media data, which will feature presentations from experts in the field of social media preservation, as well as investigative journalists (e.g. Bellingcat)
  • Conference on effectively collaborating with the industry (managing the tensions between open science and commercial collaborations)

Individual roles and responsibilities

Some data stewards have also undertaken additional roles and responsibilities:

  • Yasemin: Electronic Lab Notebooks, Data Champions
  • Esther: Electronic Lab Notebooks, DMP registry
  • Kees: Software Consultancy Lead

Sustainable funding for data stewardship

The current funding for the data stewardship project (salaries for the data stewards) comes from the University’s Executive Board and runs until the end of 2020. However, the importance of the support offered to the research community by the data stewards has already been recognised not only by the academic community at TU Delft but also by support staff.

In order to ensure the continuation of the data stewardship programme and for TU Delft not to lose these highly skilled, trained and sought-after professionals, it is crucial that a source of sustainable funding is identified in 2019.

Take research data management seriously and organize discipline-specific support


Written by Maria Cruz, VU Community Manager Research Data Management on 15 November 2018.

This blog post was originally published in the Vrije Universiteit Amsterdam Research Support Newsletter (re-blogged with permission).

This is an interview between Maria Cruz and Prof. Bas Teusink, the Scientific Director of the Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), about his experience with having dedicated data management support for his research group.


“I hired the right person at the right time”, says Prof. Bas Teusink, Scientific Director of the Amsterdam Institute for Molecules, Medicines and Systems (AIMMS). His institute was founded in 2010 on the back of major breakthroughs in the fields of molecular, cellular and systems biology. Recently, rapid changes in the pace of data acquisition and data volume in this field called for the hiring of a dedicated Research Data Manager.

Why has data management become so important in your field?
“At AIMMS our focus is on molecular life sciences – the study of molecules in living systems, of how molecules affect living systems, and of the molecular mechanisms of how drugs work, how toxic compounds work, and how cells work. For biologists, the generation of data is getting less and less labour-intensive, and the interpretation of the data is getting more and more complicated.”

Does this mean that researchers need to acquire new skills?
“Yes, bioinformatics, data analysis, and data science are becoming more and more prominent in biology and also in chemistry. It would be a good idea for any bachelor programme in the life sciences to include proper data management, data science, and a little bit of programming and maybe bioinformatics in the curriculum. We’re developing such courses for the bachelor students of the Faculty (of Science).”

Why did you think a dedicated research data manager was needed?
“People in the life sciences community have been talking a lot about the importance of Research Data Management (RDM). When you think about biobanks and other types of big data collections, it is obvious that you have to sort out your data management, but what about a PhD student doing simple experiments in the lab using Excel to process data? How do we help them? As a Principal Investigator, I have no idea how to instruct my students in RDM. I’m not an expert. So I needed support. I needed somebody who actually has the time to look up what tools are available and who can translate general policies and general infrastructure into daily practical solutions that fit our local needs. There’s a huge gap between policy and implementation for people doing the daily work. We need discipline-specific support and we need hands-on help.”

What skills did you look for in a data manager?
“I wanted somebody who understands our field of work, who understands the data management side of things, and who also understands the technologies.”

Was it difficult to find the right person for the job?
“I happened to have Brett Olivier in my group and I could convince management that research data support was worth the investment. Brett is a biochemist with a strong theoretical background, but he also knows how to do experiments, so he can talk with everybody. He has also moved into programming and writing scientific software. Having this technical background means he can talk with people in IT. So he is the perfect guy.”

How is this position financed?
“We have found a pragmatic way of financing Brett’s position. And that is by project money. When we write a project proposal, if the funders find data management important, we budget a certain amount for data management, say 20K. If we get 5 projects, then we can afford a data manager just from project money. So far I’ve been able to fund Brett almost completely from my own projects.”

Is this funding model sustainable?
“I think it shouldn’t be difficult to finance somebody with this model for the long term. The university or the institute will have to take the risk, of course. If the money doesn’t come in, if the projects are not funded, then somebody has to pay the salary of the data manager. What is interesting with this model is that the chance of getting your project funded increases, because research data management is being taken more and more seriously by the funding agencies.”

What is Brett doing in concrete terms?
“He writes the Data Management Plans (DMPs) for project proposals and supports their implementation. He has been actively involved in the piloting and implementation of a new data management platform with AIMMS researchers. Brett has developed encoding standards for computational models of biological systems. Because of that, he knows how important it is to annotate data using appropriate ontologies, thereby making them more FAIR (Findable, Accessible, Interoperable and Reusable). Many scientists don’t know what an ontology is, let alone use one. Brett will address this and related RDM issues by providing advice on what the current best standards, tools and practices are in the field.”

“Well implemented data strategies can contribute to the quality and efficiency of a research project.”