Quality is key

Authors: Esther Plomp, Antonio Schettino and Emmy Tsang

The quality of research is important for advancing knowledge in any field. Evaluating this quality is a complicated task, however, since the research community has not agreed on what high-quality research is, and there is no objective set of criteria to quantify the quality of science.

While some argue that quantitative metrics, such as the ‘journal impact factor’ and the ‘h-index’, are objective measurements of research quality, plenty of scientific literature provides evidence for the exact opposite. This blog post delves into some of that literature and questions the objectivity of these metrics.

Journal Impact Factor

The Journal Impact Factor (JIF) was originally established to help librarians identify the most influential journals, based on the number of citations the journal’s publications received over the two preceding years. If this was the intended purpose, why is the JIF now embraced as an indicator of the importance of a single publication within that journal? Or, even further removed, as a measure of the quality of the work of individual scientists?
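For reference, the commonly used two-year definition of the JIF for a given year Y can be written as a single journal-level ratio (a simplified sketch: which items count as "citable" is decided by Clarivate and is not captured here):

```latex
\mathrm{JIF}_{Y} = \frac{C_{Y}(Y-1) + C_{Y}(Y-2)}{N(Y-1) + N(Y-2)}
```

Here C_Y(y) is the number of citations received in year Y by items the journal published in year y, and N(y) is the number of citable items it published in year y. Nothing in this ratio refers to an individual paper or author.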

Multiple studies¹ have raised concerns about using the Journal Impact Factor for research assessment because the numbers, to put it bluntly, do not add up:

  • Because it relies on the mean rather than the median, the JIF is inflated by 30-50%. Journals with high impact factors appear to depend on a minority of very highly cited papers, so the mean overestimates the citation rate of a typical paper.
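The effect of averaging over a skewed distribution can be illustrated with a toy example. The citation counts below are invented for illustration and describe no real journal; the sketch simply compares a JIF-style mean with the median for the same set of papers.

```python
from statistics import mean, median

# Hypothetical citation counts for the 20 citable items a journal published
# over the two preceding years (invented, deliberately right-skewed numbers).
citations = [0, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8, 10, 15, 31]

jif_style = mean(citations)   # JIF-style average: total citations / items
typical = median(citations)   # citations received by the "middle" paper
below_mean = sum(c < jif_style for c in citations) / len(citations)

print(f"mean (JIF-style): {jif_style:.1f}")            # 6.0
print(f"median:           {typical:.1f}")              # 4.0
print(f"mean / median:    {jif_style / typical:.2f}")  # 1.50
print(f"papers below the mean: {below_mean:.0%}")      # 65%
```

In this made-up case the mean is 50% higher than the median, and 65% of the papers receive fewer citations than the "average" suggests.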

As if that isn’t enough, the JIF is also heavily affected by gatekeeping and human biases. Citation metrics reflect biases and exclusionary networks that systemically disadvantage women² and the global majority (see, for example, racial disparities in NIH grant funding). Citations themselves are also biased towards positive results. Some reviewers and editors have tried to increase their own citation counts by requesting references to their work during peer review. Authors can also needlessly cite their own papers, or agree with others to cite each other’s papers, artificially inflating citation counts.

Commercial interests

Beyond the inaccuracies and biases of using the JIF for quality assessment, it should be noted that the JIF is a commercial product managed by a private company: Clarivate Analytics. This raises further concerns, as the interests of commercial companies do not necessarily align with the missions of universities.

h-index

The h-index is another metric that tracks citations. A researcher has an h-index of x when x is the largest number such that they have published x papers that have each been cited at least x times. The metric was developed to quantify the productivity and research impact of individual researchers, and it is now routinely used in research assessment. This is problematic, as it suffers from the same underlying issues as other citation-based metrics (described above). Furthermore:

  • Publication and citation practices are field-dependent, which makes this metric unsuitable for comparing researchers from different disciplines. For example, some disciplines publish more papers or cite more extensively than others, which inflates the h-index of researchers in those fields.
  • The h-index does not take into account the individual’s position in the author list. This may not matter in some disciplines, but it makes a big difference in others, where the first and last authors carry more weight.
  • The h-index can never exceed the total number of papers published, so it rewards quantity over quality.
  • Moreover, the h-index is a cumulative metric, which typically favours senior male researchers, as they tend to have published more.
  • The h-index has spawned numerous alternatives (37 as of 2011) that attempt to counteract these shortcomings. Unfortunately, most of these alternatives are highly correlated with each other, which makes them redundant.
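The definition of the h-index given above translates into a short function. This is a minimal sketch; the function name `h_index` and the two example researchers are hypothetical. It also illustrates the quantity point: the h-index is capped by the number of papers, however often they are cited.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break     # every later paper has even fewer citations
    return h


# Hypothetical researcher A: five papers, each cited 100 times.
researcher_a = [100] * 5
# Hypothetical researcher B: twenty papers, each cited 10 times.
researcher_b = [10] * 20

print(h_index(researcher_a))  # 5
print(h_index(researcher_b))  # 10
```

Researcher A has far more citations in total (500 versus 200), yet B’s h-index is twice as high, because A’s h-index cannot exceed A’s five papers.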

Warnings against the use of these metrics

Many individuals and institutions have warned against the use of these metrics for research assessment, as this has a profound impact on the way research is conducted. Even Springer Nature, the publisher of Nature, has signed the San Francisco Declaration on Research Assessment (DORA), thereby taking a stance against the use of the impact factor for research assessment. The creator of the JIF, Eugene Garfield, also stated that the JIF was not appropriate for research assessment. Clarivate Analytics, the company that generates the JIF, has itself stated that “What the Journal Impact Factor is not is a measure of a specific paper, or any kind of proxy or substitute metric that automatically confers standing on an individual or institution that may have published in a given journal.” Likewise, the creator of the h-index, Jorge Hirsch, warned that using the h-index as a measurement of scientific achievement could have severe negative consequences.

Tweet by Michael Merrifield on the use of citations as a proxy for quality in research. The image in the tweet shows his citations increasing after he joined a large consortium project.

Consequences

The focus on citations has severe consequences for scientific research, as it creates a research culture that values the quantity of what is achieved rather than its quality. For example, the use of the JIF and h-index reinforces the tendency for individuals who have been successful in the past to be more successful in the future, an effect known as the Matthew effect. A focus on the quantity of outputs also discourages high-risk research that is likely to fail, as well as research that only yields interesting results in the long term. The focus on short-term successes therefore reduces the likelihood of unexpected discoveries that could be of immense value to scientific research and society.

When a measure becomes a target, it ceases to be a good measure. – Goodhart’s Law

So what now?

It may be impossible to assess the quality of research objectively, or to reach universal agreement on how to assess it. Rather than relying on citation metrics to evaluate scientific outputs or researchers, we could start judging research on its content, by reading the scientific article or the research proposal. In an increasingly interdisciplinary world, this means that researchers will have to communicate their findings or proposals in ways that are, to a certain extent, understandable to peers in other fields. If that sounds too simplistic, there are also other great initiatives, listed below, that serve as alternatives to citation-based metrics in assessing research quality:

Alternative methods and examples of research assessment

See also:
1) Brito and Rodríguez-Navarro 2019, Seglen 1997, Brembs et al. 2013, Callaham et al. 2002, Glänzel and Moed 2002, Rostami-Hodjegan and Tucker 2001, and Lozano et al. 2012
2) Caplar et al. 2017, Chakravartty et al. 2018, King et al. 2017, and Macaluso et al. 2016
