Bibliometrics and preference modelling

Thierry Marchant, Ghent University

Some academic rankings

[Table: Top 5% Authors, as of April 2008; column: Average Rank Score]

Outline
• Why rank?
• Which attributes?
• Some popular rankings
• How can we motivate a ranking?
• The axiomatic approach
• Comparing peers and apples

Why rank?

Why rank universities?
• To choose one for studying (bachelor student).
• To attract good students (good university).
• To obtain subsidies (good university).
• To allocate subsidies (government).
• To allocate students to various universities according to their exam scores (government).
• ...

Why rank departments?
• To choose one for studying (doctoral student).
• To attract good students (good department).
• To obtain subsidies (good department).
• To allocate subsidies (government).
• To allocate students to various departments according to their exam scores (government).
• ...

Why rank scientists?
• To determine a salary (university).
• To award a scientific distinction (scientific society).
• To hire a new scientist (university).
• To choose a thesis supervisor (student).
• To evaluate a department or university (...).
• To evaluate a journal (...).
• To allocate subsidies (government).
• ...

Why rank journals?
• To choose one for publishing (scientist).
• To maximize the dissemination of one’s results.
• To maximize one’s value.
• To evaluate a scientist (...).
• To evaluate a department (...).
• To evaluate a university (...).
• To improve one’s image (good publisher).
• ...

Why rank articles?
• To select articles (scientist).
• To evaluate a scientist (...).
• To evaluate a department (...).
• To evaluate a university (...).
• To evaluate a journal (...).
• ...

Focus in this talk
• Rankings of scientists
• Rankings of departments
• Rankings of universities
• Rankings of journals
• Rankings of articles

Which attributes?

Many relevant attributes
• Quality
  • Evaluation by peers
  • Quality of the journals
  • Citations (#, authors, journals, +/-)
  • Coauthors
  • Patents
  • Awards
  • Budget
• Quantity
  • Number of papers
  • Number of books
  • Number of pages
  • Coauthors (#)
  • Number of patents
  • Citations (#)
  • Awards
  • Budget
  • Number of thesis students
• Various
  • Age
  • Career length
  • Country
  • Nationality
  • Discipline
  • Century
  • University

Bibliometric attributes
• Why use bibliometric attributes?
  • Cheap
  • Objective?
  • Reliable?

Some popular rankings of scientists

Some popular rankings
• Number of publications
• Total number of citations
• Maximal number of citations
• Number of publications with at least α citations (α a fixed threshold)
• Average number of citations
• The same ones weighted by
  • Number of authors
  • Number of pages
  • Impact factor
• The same ones corrected for age
• h-index, g-index, hc-index, hI-index, R-index, A-index, …

The h-index
• [Figure: example with h-index = 6]
• Published in 2005 by physicist J. E. Hirsch.
• 462 citations in March 2009, 1267 in May 2013.
• Adopted by Web of Science (ISI, Thomson).
• The h-index of a scientist is the largest natural number x such that at least x of his/her papers have at least x citations each.
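
The definition above can be turned into a few lines of code. The following is a minimal sketch, not taken from the talk: the function name `h_index` and the citation counts in `example` are hypothetical, chosen so that the result is 6 as in the slide's example.

```python
def h_index(citations):
    """Largest x such that at least x papers have at least x citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position  # the first `position` papers all have >= position citations
        else:
            break         # counts only decrease from here on, so we can stop
    return h


# Hypothetical citation record of one scientist (one number per paper).
example = [25, 12, 9, 8, 7, 6, 3, 1]
print(h_index(example))  # 6
```

Six papers have at least 6 citations each, but there are not seven papers with at least 7 citations, hence the value 6.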

How to justify a ranking?
• Two departments:
  • 50 scientists with 2000 citations
  • 3 scientists with 180 citations
• THE true and universal ranking does not exist.
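
A quick computation shows why the two departments are hard to compare. The slide does not say which criteria are meant; the sketch below assumes two plausible ones (total citations and citations per scientist), and the labels "A" and "B" are hypothetical.

```python
# (number of scientists, total citations) for the two departments on the slide.
departments = {"A": (50, 2000), "B": (3, 180)}


def total_citations(name):
    return departments[name][1]


def citations_per_scientist(name):
    size, citations = departments[name]
    return citations / size


# Rank the departments from best to worst under each (assumed) criterion.
by_total = sorted(departments, key=total_citations, reverse=True)
by_average = sorted(departments, key=citations_per_scientist, reverse=True)

print(by_total)    # ['A', 'B']  (2000 vs 180 citations)
print(by_average)  # ['B', 'A']  (60.0 vs 40.0 citations per scientist)
```

Both criteria are defensible, yet they order the departments in opposite ways.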

How to justify a ranking?
• If one knows the true ranking, one may compute some correlation between the true one and another one.
  • Example: Assessing the Accuracy of the h- and g-Indexes for Measuring Researchers’ Productivity, Journal of the American Society for Information Science and Technology, 64(6):1224–1234, 2013. “The analysis quantifies the shifts in ranks that occur when researchers’ productivity rankings by simple indicators such as the h- or g-indexes are compared with those by more accurate FSS.”
• THE true and universal ranking does not exist.
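
As an illustration of this first approach, the sketch below compares a reference ranking with a ranking by some indicator. The slide does not say which correlation is meant; Spearman's rank correlation is assumed here, in its simple form for rankings without ties. The scientists `s1` to `s5` and both rankings are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation of two rankings without ties.

    Each ranking maps an item to its rank position (1 = best).
    """
    n = len(rank_a)
    squared_shifts = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return 1 - 6 * squared_shifts / (n * (n * n - 1))


# Hypothetical data: a reference ("true") ranking and a ranking by an indicator.
reference = {"s1": 1, "s2": 2, "s3": 3, "s4": 4, "s5": 5}
by_indicator = {"s1": 2, "s2": 1, "s3": 3, "s4": 5, "s5": 4}

rho = spearman_rho(reference, by_indicator)
print(rho)  # 0.8
```

A value of 1 would mean identical rankings and -1 a complete reversal. The approach only makes sense if the reference ranking can itself be trusted.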

How to justify a ranking?
• If one knows the true ranking, one may compute some correlation between the true one and another one.
• Assume a law linking the numbers of papers and citations to the quality of the scientist (an unobserved variable) and his or her age. This law may be probabilistic. Then derive an estimate of the scientist's quality from his or her data (papers and citations).
• Analyze the mathematical properties of rankings.
• THE true and universal ranking does not exist.

Characterization of scoring rules

Definitions
• Set of journals: J = {j, k, l, …}
• Paper: a paper in journal j with x citations and a coauthors is represented by the triplet (j,x,a).
• Scientist: mapping f from J × N × N to N. The number f(j,x,a) represents the number of publications of author f in journal j with x citations and a coauthors.
• Set of scientists: set X of all mappings from J × N × N to N such that Σ_{j∈J} Σ_{x∈N} Σ_{a∈N} f(j,x,a) is finite.
• Bibliometric ranking: weak order ≥ on X (complete and transitive relation).

Scoring rules
• Scoring rule: a bibliometric ranking is a scoring rule if there exists a real-valued mapping u defined on J × N × N such that f ≥ g iff Σ_j Σ_x Σ_a f(j,x,a) u(j,x,a) ≥ Σ_j Σ_x Σ_a g(j,x,a) u(j,x,a).
• Examples:
  • u(j,x,a) = 1 (# papers)
  • u(j,x,a) = x (# citations)
  • u(j,x,a) = x/(a+1) (# citations weighted by # authors)
  • u(j,x,a) = IF(j) (# papers weighted by impact factor)
  • …
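
The four example scoring rules can be sketched as follows. A scientist is stored as a `Counter` mapping each triplet (j, x, a) to a number of papers, mirroring the mapping f of the definitions. The journals "j" and "k", their impact factors, and the two scientists `f` and `g` are hypothetical.

```python
from collections import Counter

# Hypothetical impact factors for two journals.
IMPACT_FACTOR = {"j": 2.0, "k": 0.5}

# The example mappings u(j, x, a) from the slide.
SCORING = {
    "papers": lambda j, x, a: 1,
    "citations": lambda j, x, a: x,
    "citations_per_author": lambda j, x, a: x / (a + 1),
    "impact_factor": lambda j, x, a: IMPACT_FACTOR[j],
}


def score(f, u):
    """Sum over all triplets (j, x, a) of f(j, x, a) * u(j, x, a)."""
    return sum(count * u(j, x, a) for (j, x, a), count in f.items())


def at_least_as_good(f, g, u):
    """f >= g under the scoring rule induced by u."""
    return score(f, u) >= score(g, u)


# f: two single-author papers in j with 10 citations each,
#    plus one paper in k with 4 citations and 1 coauthor.
f = Counter({("j", 10, 0): 2, ("k", 4, 1): 1})
# g: five papers in k with 3 citations and 2 coauthors each.
g = Counter({("k", 3, 2): 5})

for name, u in SCORING.items():
    print(name, score(f, u), score(g, u), at_least_as_good(f, g, u))
```

With these data, g is ahead of f only when u counts papers (5 against 3); the three other rules put f first.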

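Axioms
• Independence: for all f, g in X, all j in J, all x, a in N, we have f ≥ g iff f + 1_{j,x,a} ≥ g + 1_{j,x,a}. Here 1_{j,x,a} denotes the scientist whose only publication is one paper in journal j with x citations and a coauthors.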

Axioms
• Independence: for all f, g in X, all j in J, all x, a in N, we have f ≥ g iff f + 1_{j,x,a} ≥ g + 1_{j,x,a}.
• [Figure: f > g; after each of them receives 1 more paper in j, with x citations and a coauthors, still f + 1_{j,x,a} > g + 1_{j,x,a}]

Axioms
• Archimedeanness: for all f, g, h, e in X with f > g, there is a natural number n such that e + nf ≥ h + ng.

Axioms
• Archimedeanness: for all f, g, h, e in X with f > g, there is a natural number n such that e + nf ≥ h + ng.
• [Figure: initially e < h; f (10 papers with 20 citations) is added repeatedly to e while g (1 paper with 1 citation) is added repeatedly to h, until e + nf ≥ h + ng]

Axioms
• Independence: for all f, g in X, all j in J, all x, a in N, we have f ≥ g iff f + 1_{j,x,a} ≥ g + 1_{j,x,a}.
  • Not satisfied by the max # of citations or h-index.
  • Reversal with the h-index when adding 2 papers.
• Archimedeanness: for all f, g, h, e in X with f > g, there is a natural number n such that e + nf ≥ h + ng.
  • Not satisfied by the max # of citations, h-index, lexicographic ranking.
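
Both failures can be reproduced with small examples. The sketch below is an illustration under simplifying assumptions: journals and coauthors are ignored, a scientist is just a list of citation counts, and adding scientists is list concatenation. All data are hypothetical.

```python
def h_index(citations):
    """Largest x such that at least x papers have at least x citations each."""
    ranked = sorted(citations, reverse=True)
    # In a descending list, count >= position holds exactly for the first h papers.
    return sum(1 for position, count in enumerate(ranked, start=1) if count >= position)


# Independence and the h-index: f is ahead of g, then both get the same 2 papers.
f = [4, 4, 4, 4]     # 4 papers with 4 citations each
g = [6, 6, 6]        # 3 papers with 6 citations each
two_papers = [6, 6]  # the same two papers added to both scientists

h_before = (h_index(f), h_index(g))                           # f ahead
h_after = (h_index(f + two_papers), h_index(g + two_papers))  # g ahead
print(h_before, h_after)  # (4, 3) (4, 5)

# Archimedeanness and the max # of citations: f2 > g2, but adding f2 to e
# never lets e catch up with h, because the maximum of e + n*f2 stays at 5.
f2, g2 = [5], [3]
e, h = [1], [100]
never_catches_up = all(max(e + n * f2) < max(h + n * g2) for n in range(1, 1001))
print(never_catches_up)  # True
```

The loop over n only checks finitely many values, but the argument is general: the maximum of e + n·f2 is 5 for every n, while the maximum of h + n·g2 is 100.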

Result
• Theorem: A bibliometric ranking satisfies Independence and Archimedeanness iff it is a scoring rule. Furthermore, u is unique up to multiplication by a positive constant (adding a constant to u would reward the mere number of papers and could change the ranking).
• Proof:
  • (X, +, ≥) is an extensive measurement structure as in [Luce, 2000].
  • (X, +) is a cancellative (f + g = f + h ⇒ g = h) monoid. It can be extended to a group (X’, +) by the Grothendieck construction. (X’, +, ≥) is an Abelian and Archimedean linearly ordered group. It is isomorphic to a subgroup of the ordered group of real numbers (Hölder).

Special case: u(j,x,a) = x/(a+1).
• Transfer: for all j in J, all x, y, a in N, we have 1_{j,x,a} + 1_{j,y+1,a} ~ 1_{j,x+1,a} + 1_{j,y,a} (u affine in # citations).
• Condition Zero: for all j in J, all a in N, there is f in X such that f + 1_{j,0,a} ~ f (u linear in # citations).
• Journals Do Not Matter: for all j, j’ in J, all a, x in N, 1_{j,x,a} ~ 1_{j’,x,a} (u independent of journal).
• No Reward for Association: for all j in J, all m, x in N with m > 1, 1_{j,x,0} ~ m 1_{j,x,m-1} (u inversely proportional to # authors).
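
One direction is easy to check numerically: the scoring rule with u(j,x,a) = x/(a+1) satisfies the four conditions. The sketch below tests them on a small grid of values, which is a sanity check and not a proof. The journal names and the grid bounds are arbitrary choices, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import product


def u(j, x, a):
    """Citations divided by the number of authors (a coauthors + the scientist)."""
    return Fraction(x, a + 1)


journals = ["j", "k"]  # hypothetical journal names
values = range(6)      # small grid for x, y and a

# Transfer: moving one citation from one paper to another leaves the score unchanged.
transfer = all(
    u(j, x, a) + u(j, y + 1, a) == u(j, x + 1, a) + u(j, y, a)
    for j, x, y, a in product(journals, values, values, values)
)

# Condition Zero: an uncited paper adds nothing, so f + 1_{j,0,a} ~ f for any f.
condition_zero = all(u(j, 0, a) == 0 for j, a in product(journals, values))

# Journals Do Not Matter: the score of a paper does not depend on its journal.
journals_do_not_matter = all(
    u("j", x, a) == u("k", x, a) for x, a in product(values, values)
)

# No Reward for Association: one single-author paper is worth as much as
# m papers written by m authors (m - 1 coauthors), with the same citations.
no_reward = all(
    u(j, x, 0) == m * u(j, x, m - 1)
    for j, x, m in product(journals, values, range(2, 6))
)

print(transfer, condition_zero, journals_do_not_matter, no_reward)
```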

Characterization of conjugate scoring rules for scientists and departments

Introduction
• Consider two departments, each consisting of two scientists. The scientists in department A both have 4 papers, each one cited 4 times. The scientists in department B both have 3 papers, each one cited 6 times.
• Both scientists in department A have an h-index of 4 and are therefore better than both scientists in department B, with an h-index of 3. Yet, department A has an h-index of 4 and is therefore worse than department B with an h-index of 6. Hence, the “best” department contains the “worst” scientists.
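
The numbers in this example can be reproduced with a short script. It assumes that the h-index of a department is computed on the pooled papers of its members, which is what the figures on the slide imply; the function names are hypothetical.

```python
def h_index(citations):
    """Largest x such that at least x papers have at least x citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for position, count in enumerate(ranked, start=1) if count >= position)


def department_h_index(department):
    """h-index of the pooled papers of all scientists (assumed definition)."""
    pooled = [count for scientist in department for count in scientist]
    return h_index(pooled)


# Each scientist is a list of citation counts, one per paper.
dept_a = [[4] * 4, [4] * 4]  # two scientists, 4 papers cited 4 times each
dept_b = [[6] * 3, [6] * 3]  # two scientists, 3 papers cited 6 times each

print([h_index(s) for s in dept_a], department_h_index(dept_a))  # [4, 4] 4
print([h_index(s) for s in dept_b], department_h_index(dept_b))  # [3, 3] 6
```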

Definitions
• Scientist: mapping f from N to N. The number f(x) represents the number of publications of scientist f with x citations.
• Set of scientists: set X of all mappings from N to N such that Σ_{x∈N} f(x) is finite.
• Ranking of scientists: weak order ≥_s on X.
• Department: vector of scientists.
• Set of all departments: denoted by Y.
• Ranking of departments: weak order ≥_d on Y.

Scoring rules
• Scoring rule (scientists): a ranking of scientists is a scoring rule if there exists a real-valued mapping u defined on N such that f ≥_s g iff Σ_x f(x) u(x) ≥ Σ_x g(x) u(x).
• Scoring rule (departments): a ranking of departments is a scoring rule if there exists a real-valued mapping v defined on N such that (f_1, …, f_k) ≥_d (g_1, …, g_l) iff Σ_i Σ_x f_i(x) v(x) ≥ Σ_j Σ_x g_j(x) v(x).
• Conjugate scoring rules: ≥_s and ≥_d are conjugate scoring rules if u = v.

Axioms
• Totality: if (f_1, …, f_k) and (g_1, …, g_l) are such that Σ_i f_i = Σ_j g_j, then (f_1, …, f_k) ~_d (g_1, …, g_l).
• Dummy: (f_1, …, f_k) ~_d (f_1, …, f_k, 0), where 0 is the scientist without any publication.
• Consistency: if f_i ≥_s g_i for i = 1, …, k, then (f_1, …, f_k) ≥_d (g_1, …, g_k). In addition, if f_i >_s g_i for some i, then (f_1, …, f_k) >_d (g_1, …, g_k).

Result
• Theorem: ≥_s and ≥_d satisfy Consistency, Totality, Dummy and Archimedeanness of ≥_s iff they are conjugate scoring rules. Furthermore, u is unique up to multiplication by a positive constant.
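
A pair of conjugate scoring rules avoids the paradox of the introductory example. The sketch below assumes u(x) = x (each paper scores its number of citations), which is only one possible choice. A scientist is a `Counter` mapping a number of citations x to a number of papers f(x), and a department is a list of scientists.

```python
from collections import Counter


def u(x):
    """Assumed scoring function: a paper is worth its number of citations."""
    return x


def scientist_score(f):
    """Sum over x of f(x) * u(x)."""
    return sum(count * u(x) for x, count in f.items())


def department_score(department):
    """Same u applied to every paper of every member (conjugate rule, v = u)."""
    return sum(scientist_score(f) for f in department)


dept_a = [Counter({4: 4}), Counter({4: 4})]  # 4 papers cited 4 times, twice
dept_b = [Counter({6: 3}), Counter({6: 3})]  # 3 papers cited 6 times, twice

# Consistency: B's scientists are better (18 > 16) and so is department B (36 > 32).
print([scientist_score(f) for f in dept_a], department_score(dept_a))
print([scientist_score(f) for f in dept_b], department_score(dept_b))

# Dummy: adding a scientist without publications changes nothing.
with_dummy = dept_a + [Counter()]

# Totality: only the pooled publications matter, not how they are split.
regrouped = [Counter({4: 1}), Counter({4: 7})]

print(department_score(with_dummy), department_score(regrouped))  # 32 32
```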

Discussion

Discussion
• Axiomatic analysis of more rankings is needed.
• Consistency is important (e.g. h-index for scientists and IF for journals).
• Axiomatic analysis of indices is different but also relevant.

Literature
• Scientometrics
• Journal of Informetrics
• Journal of the American Society for Information Science and Technology

Comparing peers and apples

Comparing scientists of different ages
• [Figure: two scientists of different ages, one with h-index = a and one with h-index = b, where a > b]
