In his seminal article on the h-index, physicist Jorge E. Hirsch wrote, “In a world of limited resources… quantification (even if potentially distasteful) is often needed for evaluation and comparison purposes.” And with that, the h-index, a bibliometric for direct comparison of researchers, was born.
The h-index was rapidly taken up as a tool for assessing researchers, perhaps in part due to its simplicity: a scientist has an h-index of n if she has n papers with n or more citations each. Take, for example, Dr. Wheatus, who has published six papers with 256, 8, 8, 7, 5, and 2 citations. Dr. Wheatus has an h-index of five, since she has five articles with a citation count of five or higher, but not six articles with six or more.
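To make the definition concrete, here is a minimal sketch in Python of how an h-index might be computed from a list of citation counts (the function name and the data are our own illustration, not from Hirsch’s paper):

```python
def h_index(citations):
    """Return the largest n such that n papers have at least n citations."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:  # the paper at this rank still clears the bar
            h = rank
        else:
            break  # counts only decrease from here, so no later rank can qualify
    return h

# Dr. Wheatus's six papers from the example above
print(h_index([256, 8, 8, 7, 5, 2]))  # -> 5
```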
At first glance this algorithm looks a little arbitrary. Surely a more obvious metric, such as ‘number of papers’ or ‘number of citations’, would suffice? Hirsch argues that the former cannot measure impact, and the latter does not account for ‘big hits’, or outliers in an author’s citation count. These outliers can in some instances be caused by co-authorship in which the author played a minor role. For example, Dr. Wheatus’s 256-citation manuscript is clearly not representative of the impact of most of her work.
The effect of ‘one-hit wonders’ could be somewhat ameliorated by simply dividing total citation count by number of articles, but this would reward low productivity: Dr. Wheatus’s mean of roughly 48 citations per paper (286 citations over six papers) says far more about her single hit than about her typical output. The h-index, by comparison, rewards both productivity and impact, since a high number of highly cited articles is required to push up the value.
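The contrast is easy to see with two hypothetical researchers (invented numbers, purely for illustration): a one-hit wonder scores highly on mean citations per paper but poorly on the h-index, while a steadily productive author does the reverse.

```python
from statistics import mean

def h_index(citations):
    """Largest n such that n papers have at least n citations."""
    ranked = sorted(citations, reverse=True)
    return sum(c >= rank for rank, c in enumerate(ranked, start=1))

one_hit = [100]     # a single highly cited paper
steady = [20] * 10  # ten papers with 20 citations each

for name, cites in [("one-hit wonder", one_hit), ("steady author", steady)]:
    print(f"{name}: mean = {mean(cites):.0f}, h-index = {h_index(cites)}")
# one-hit wonder: mean = 100, h-index = 1
# steady author: mean = 20, h-index = 10
```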
In the wake of Hirsch’s proposal, a number of articles were published interrogating the relationship between a scientist’s h-index and their performance. It was found that, on average, scientists with a higher h-index were more successful when applying for post-doctoral positions. The h-index was also found to correlate well with other standard bibliometric tools and with peer assessment.
While the measure does seem to correlate well with other indicators of scientific success, Hirsch cautioned against over-reliance on it when comparing scientists, saying, “Obviously, a single number can never give more than a rough approximation to an individual’s multifaceted profile, and many other factors should be considered in combination in evaluating an individual.”
In this vein, a range of commonly identified caveats apply to the use of this metric. The first is that it should not be used to compare scientists from different fields, owing to differences in citation behavior across disciplines.
The second is that the h-index should not be used to directly compare scientists at different stages of their careers. As Hirsch predicted, and as has subsequently been borne out, the index naturally rises by approximately one for every year that a researcher is active.
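A toy simulation makes this career effect plain. Assuming, purely for illustration, a researcher who publishes a fixed number of papers each year, with every paper steadily accruing citations as it ages (the rates below are invented, not Hirsch’s figures), the h-index climbs almost exactly linearly with career length:

```python
def h_index(citations):
    """Largest n such that n papers have at least n citations."""
    ranked = sorted(citations, reverse=True)
    return sum(c >= rank for rank, c in enumerate(ranked, start=1))

PAPERS_PER_YEAR = 3       # hypothetical steady output
CITES_PER_PAPER_YEAR = 4  # hypothetical citations each paper gains per year

for career_years in (5, 10, 15, 20):
    citations = []
    for year in range(career_years):
        age = career_years - year  # papers from the first year are the oldest
        citations += [CITES_PER_PAPER_YEAR * age] * PAPERS_PER_YEAR
    print(f"{career_years} years -> h = {h_index(citations)}")
# 5 years -> h = 9, 10 -> 18, 15 -> 27, 20 -> 36: a steady linear rise
```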
Finally, the index has been found to be susceptible to manipulation by self-citation. Work by Bartneck and Kokkelmans demonstrates the ease with which such manipulation could be undertaken. However, they concluded that this method of “h-index padding” is unlikely to be as effective as pursuing productive and impactful research questions.
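To see why padding is cheap at the margin, here is a sketch (our own illustration, not Bartneck and Kokkelmans’ method) of the minimum number of self-citations needed to nudge an h-index up by one; only the papers just below the new threshold need topping up.

```python
def self_citations_to_bump(citations):
    """Minimum self-citations needed to raise the h-index by one.

    To reach h + 1, at least h + 1 papers must each have at least
    h + 1 citations, so we top up the h + 1 best papers wherever
    they fall short. Assumes at least h + 1 papers exist to pad.
    """
    ranked = sorted(citations, reverse=True)
    h = sum(c >= rank for rank, c in enumerate(ranked, start=1))
    target = h + 1
    return sum(max(0, target - c) for c in ranked[:target])

# Dr. Wheatus (h = 5) needs only five self-citations to reach h = 6:
# one for the 5-citation paper and four for the 2-citation paper.
print(self_citations_to_bump([256, 8, 8, 7, 5, 2]))  # -> 5
```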
While the h-index is certainly not a perfect metric for comparing researchers, it is one that has gained enormous popularity in a relatively short period. As long as the metric is used with an eye toward its limitations, its rapid take-up need not be viewed with trepidation.
See our previous article in the “Research Metrics” series on Journal Impact Factors, and the next in the series, a future discussion on altmetrics!