What is an Open Knowledge Base anyway?

The recent contract signed between the Dutch research institutions and the publisher Elsevier mentions the possibility of an Open Knowledge Base (OKB), but the details are vague. This blog post looks in more detail at definitions of an OKB within the context of scholarly communications, and at elements that need to be taken into account in building one.

Readers may also be interested in contributing to the consultation that is being run as part of the Dutch Taskforce on Responsible Management of Research Information and Data. The VSNU will also be commissioning a feasibility study on the topic.

Authors: Alastair Dunning, Maurice Vanderfeesten, Sarah de Rijcke, Magchiel Bijsterbosch, Darco Jansen (all members of above taskforce)

Definition of an Open Knowledge Base

An Open Knowledge Base is a relatively new term, and liable to multiple interpretations. For clarification, we have listed some of the common features of an Open Knowledge Base (OKB):

  • it hosts collections of metadata (descriptive data) as opposed to large collections of data (spreadsheets, images etc) 

  • the metadata is structured as triples of subject, predicate and object (e.g. The Milkmaid (subject) is painted by (predicate) Vermeer (object))

  • each element of the triple is usually linked to an identifier elsewhere; for example, Vermeer in the OKB could be linked to the entry for Vermeer in the Getty Art and Architecture Thesaurus

  • The highly structured nature of the metadata makes it easier for other computers to incorporate that data; OKBs have an important role to play for search engines such as Google, as well as providing a basis for far-reaching analysis

  • All the data (whether source or derived) is open for others to access and reuse, whether via an API, a SPARQL endpoint, a data dump or a simple interface, typically under a CC0 licence

  • The data is described according to existing standards, identifiers, ontologies and thesauri

  • the rules for who can upload and edit the data will vary between OKBs. All OKBs need to deal with a tension between data extent, richness and quality

  • The technical infrastructure is usually hosted in one place – however, the OKB will link to other OKBs to make a larger network of open metadata. In essence, this creates a federated infrastructure 

  • In some, but not all, cases, the OKB is not an end in itself but supplies the data that other services can build upon; thus there is a deliberate split between the underlying data and the services and tools that use that data   

  • An OKB shares some aspects with a Knowledge Base of Metadata on Scholarly Communication, but is broader both in terms of content and in its commitment to openness
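
The triple structure and identifier linking described in the bullets above can be sketched in a few lines of code. This is a minimal illustration, not a real OKB data model; all of the URIs below are made up for the example.

```python
# A minimal sketch of OKB-style metadata as subject-predicate-object
# triples. Every URI here is hypothetical, for illustration only.

# Each element of a triple is a label that resolves to an identifier
# in an external vocabulary or authority file.
identifiers = {
    "The Milkmaid": "https://example.org/artwork/milkmaid",  # hypothetical
    "painted by": "https://example.org/vocab/creator",       # hypothetical
    "Vermeer": "https://example.org/agent/vermeer",          # hypothetical
}

triples = [
    ("The Milkmaid", "painted by", "Vermeer"),
]

def resolve(triple):
    """Replace each label in a triple with its external identifier."""
    return tuple(identifiers[part] for part in triple)

resolved = [resolve(t) for t in triples]
print(resolved[0])
```

Because every element resolves to an identifier rather than a free-text string, another knowledge base using the same identifier for Vermeer can be joined with this one automatically; that is what makes the federated network of OKBs possible.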

The best current example of an Open Knowledge Base is Wikidata. An example of a service built on top of Wikidata is Histropedia. Library communities around the globe also contribute journal titles to the Global Open Knowledgebase (GOKb).

Open Knowledge Bases and Scholarly Communication

Traditionally, metadata related to scholarly communications has been managed in discrete, unconnected, closed, commercial systems. Such collections of data have been closely tied to the interfaces used to query them. This restricts the power of the data – whoever creates the interface determines what types of questions can be asked.

An Open Knowledge Base counters this. Firstly, it separates the interface from the data. Secondly, it opens up and connects the underlying metadata to other sources of metadata. Such an approach allows much greater freedom – users are no longer restricted by the specific manner in which the interface was designed nor restricted to querying one set of metadata. Such openness makes the OKB flexible about the type of data it incorporates and when – other data providers with different datasets can connect or incorporate their data at a date that suits them. The openness also allows third parties to build specific interfaces and different services on top of the OKB.
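
As a rough sketch of what separating the data from the interface means in practice: because the data sits behind a standard endpoint, any third party can pose questions the original interface designers never anticipated. The endpoint URL and the vocabulary URIs below are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical SPARQL endpoint of an OKB. Any third party could query it
# directly, without going through a vendor-supplied interface.
ENDPOINT = "https://okb.example.org/sparql"

# A question a vendor's interface might never have offered:
# which articles by a given author cite a given dataset?
# (All URIs are illustrative placeholders.)
query = """
SELECT ?article WHERE {
  ?article <https://example.org/vocab/author> <https://example.org/agent/a1> .
  ?article <https://example.org/vocab/cites>  <https://example.org/data/d1> .
}
"""

# The request URL a client would fetch (no network call is made here).
url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
print(url[:60])
```

The point is not the specific query, but that the query is chosen by the user, not by whoever built the front end.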

A representation of the Open Knowledge Base, with an idea of how metadata is provided, enriched and then re-used

For the field of scholarly communication, an ambitious federated metadata infrastructure would connect all sorts of entities, each with clear identifiers. Researchers, articles, books, datasets, research projects, research grants, organisations, organisational units, citations etc could all form part of a national OKB that connects to other OKBs. It would also help create enriched data, which could then be fed back into the OKB.
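
One way to picture such a federated record is as a set of persistent identifiers linking out to other systems (DOI for articles and datasets, ORCID for researchers, ROR for organisations). Every specific value below is a made-up placeholder.

```python
# A made-up example of one article record in a national OKB, linking
# out to other identifier systems (all values are illustrative).
article = {
    "doi": "10.0000/example.2020.1",                       # hypothetical DOI
    "authors": ["https://orcid.org/0000-0000-0000-0000"],  # placeholder ORCID
    "organisation": "https://ror.org/00000000",            # placeholder ROR ID
    "dataset": "10.0000/example-dataset.1",                # hypothetical DOI
    "grant": "NWO-0000",                                   # hypothetical grant ID
}

# Because every value is an identifier, records from another OKB that
# describe the same organisation can be joined on the shared identifier.
def join_on(key, records_a, records_b):
    """Pair up records from two sources that share a value for `key`."""
    index = {r[key]: r for r in records_b}
    return [(r, index[r[key]]) for r in records_a if r[key] in index]
```

Enrichment then amounts to adding further identifier-valued statements to such records and feeding them back into the OKB.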

Such a richness of metadata would be a springboard for an array of services and tools to provide new analyses and insights on the evolution of scholarly communication in the Netherlands.

The best current example of an Open Knowledge Base for scholarly communication is developed by OpenAIRE.

The OpenAIRE Research Graph draws on data from many different scholarly communications tools

TIB Hannover is also developing an Open Research Knowledge Graph. Wikidata also holds plenty of metadata relating to scientific articles. Good examples of enrichment services built on top of Open Knowledge Bases are Scholia, Semantic Scholar and Lens.org. OpenCitations provides both a collection of aggregated data (on scholarly citations) and some basic tools to query it. The Global Open Knowledgebase is another example, with a focus on data needed by libraries to undertake collections management. The study by Ludo Waltman looks at further collections of open metadata.

Issues in constructing an Open Knowledge Base for the Netherlands (OKB-NL)

A well-constructed open knowledge base can play a significant role in innovation and efficiency in the scholarly communications ecosystem. Given the breadth of data it can contain, it could be the engine for sophisticated research analysis tools. But it requires significant long-term engagement from multiple stakeholders, who will be both providing and consuming data. It is imperative that such stakeholders work in a collaborative fashion, according to an agreed set of principles.

The Dutch taskforce on Responsible Management of Research Information and Data has opened a consultation on these principles; readers of this blog are invited to contribute until Monday 8th of June 2020.

Whatever principles are used to underlie an OKB, there also needs to be serious thought given to practical concerns. How would an OKB be created and sustained? An OKB is an ambitious project; if it is to succeed it requires strong foundations. The following issues would all need to be addressed:


Governance

Who would steer the direction of the OKB? How would any board reflect the multiple research institutions contributing to the OKB? To make an OKB effective, it would require the ongoing participation of every research institution in the Netherlands – how would the business model ensure that? And who would actually do the day-to-day management of the OKB? What should be the role of commercial organisations contributing to the OKB and its underlying principles? Should they have a stake in the governance of an OKB?


Funding

Who would pay the initial costs of establishing an OKB? How would the ongoing costs be paid? Via institutional membership? Via consortium costs? Via government subsidy? Via public-private partnerships? Would all institutions gain equal benefit from the OKB? Would they pay different rates?


Technical Architecture

What kind of technical architecture does the OKB require – centralised, with all the data in one place, or distributed, with data residing in multiple locations? If the latter, how can we ensure that the data is open and interoperable? Or some kind of clever hybrid? Given its role as the foundation of other services, how can it be guaranteed that the OKB has as close to 100% uptime as possible? And how can it be as responsive as possible, providing instantaneous responses to user demand?

Scope of Metadata Collection 

The potential scope of an OKB is huge. Each content type has its own specific metadata schemes, and these schemes evolve over time. How are different metadata types incorporated over time? Article metadata first? Then datasets, code, funding grants, projects, organisations, authors, journals? What about different versions of metadata schemes: do all backlog records need to be converted?

Quality, Provenance and Trust

Would the metadata in the OKB be sufficient to underpin high-quality services? What schemas would need to be created for the different sorts of metadata? What critical mass of metadata would be required to create engaging services? What kind of metadata alignment and enrichment would need to be undertaken? Would that be done centrally, or by institutions and publishers? What costs would be associated with that? Would the costs be ongoing? Should provenance be attributed to the original suppliers of the metadata and of metadata enrichments?

Service development and Commercial engagement

What incentives would there be for commercial partners to a) provide metadata and b) build services on top of the OKB? Would the investment to develop such services simply lead to one or two big companies dominating the service offer? Would they compete with services not relying on the OKB? What would happen to enriched data created by commercial companies? Would it be returned to the OKB? 

Would the resulting services be of use to all contributing members? Could the members develop their own services independent of commercial offerings?  

Implementation timeline: Lean or Big Bang

When implementing the OKB, should we first carefully design the full stack of the infrastructure and solve all the questions within the grand information architecture? Or should we let it grow organically, starting with collecting the metadata in the formats that are already legally available according to the publishing contracts? Or can we do both in parallel: start collecting, and start designing?

As mentioned above, the VSNU will be commissioning a feasibility study of an Open Knowledge Base. In the meantime, Maurice Vanderfeesten has written a further blog on Solutions for constructing an Open Knowledge Base for the Netherlands (OKB-NL).


  1. Pingback: Solutions for constructing an Open Knowledge Base for the Netherlands (OKB-NL) | Open Working
  2. nemobis

    Thank you for the write-up! It’s a bit confusing to me why such a knowledge base would need to be country-specific. Would it be “enough” to go for option 4 of the proposed WikiCite roadmap https://www.wikidata.org/wiki/Wikidata:WikiCite/Roadmap, with a federated Wikibase?

    I’m a bit obtuse and some requirements appear too abstract to me; I prefer to have some clear use cases in mind. My preferred use cases are “check OA rates for an institution” and “find all researchers in an institution or country who did not yet deposit their work in OA, and send them a notification”. Would your proposal help with either?


    • Alastair Dunning

      A federated Wikibase might well be enough and you’re right – this would be better as an international service rather than a country-specific one. But it’s good for the Netherlands to define some principles and think about how it would like to organise the collection and alignment of data – the difficult task is not building the system technically, but organising the stakeholders to contribute.

      As for the use cases, it’s very much dependent on what data goes into the OKB. But I could very much imagine that the two use cases you mention could be incorporated. Indeed, it would be useful to collect use cases as a rationale for an Open Knowledge Base… especially to ask questions that cannot easily be asked at the moment.


      • nemobis

        Agreed on both counts. Collecting use cases would be a very useful exercise. All too often (and the recent contract with Elsevier is a prime example) we start from “solutions” rather than from the specific needs of the target audience. The result is we end up with whatever is convenient for whoever comes first with a proposal (usually the largest for-profit market player).


  3. kclavel11

    TIB’s Open Research Knowledge Graph, which is given as an example, goes beyond metadata. It analyses the actual content of scholarly publications (at this moment, papers), extracting information and structuring it into linked data, making it machine-readable and actionable. See the current example of extracting the COVID-19 reproduction number from papers. Use cases would be research questions rather than questions about publications or publishing behaviour.


  4. nemobis

    I wonder if what is desired here is similar to the new portal of the research of Finland, just published this week. https://research.fi/en/
    It relies on a relatively complicated workflow of metadata across several national databases, but maybe something can be reused. The discoverability features are impressive, even though they’re still being worked on.


    • Alastair Dunning

      The excellent Finnish portal mentioned by nemobis is, to my eyes, more than an Open Knowledge Base. It provides an interface to the underlying data, and this data will, according to the organisers of the site, soon be downloadable. Perhaps you could call research.fi an Open Knowledge Base Plus (OKB+). An Open Knowledge Base is much more austere than this: it provides access to the data but lets other people build the interfaces, using an API, SPARQL or data downloads.

      However, focussing on an OKB+ might be a good decision. An austere, machine-friendly but human-unfriendly Open Knowledge Base is a more difficult proposition to sell to supporters who are unfamiliar with the world of data. Having an interface that allows core questions about research output to be answered is a great first way to exploit the underlying data.

      Incidentally, we already have http://narcis.nl as a similar idea in the Netherlands. But this was developed before the idea of an Open Knowledge Base had much traction.


  5. Pingback: Meet Chris Hartgerink, a.k.a. ‘the Bernie Sanders of Open Science’: Scientific Objectivity, Inclusiveness and Socio-technological Frameworks – Open Science Community Utrecht
  6. Pingback: Open Science Community Utrecht | Meet Chris Hartgerink, a.k.a. 'the Bernie Sanders of Open Science': Scientific Objectivity, Inclusiveness and Socio-technological Frameworks
  7. Erik Flikkenschild

    This looks to me like a FAIR project: you need to establish a FAIR community (implementation network) with the objective of adopting the FAIR principles, making your data FAIR, connecting all repositories to a FAIR Data Point, and developing a national portal for this purpose.

