
  1. Research Metrics: What was proposed … what might work. Jonathan Adams

  2. Overview • RAE was seen as burdensome and distorting • Treasury proposed a metrics-based QR allocation system • The outline metric model is inadequate, unbalanced and provides no quality assurance • A basket of metrics might nonetheless provide a workable way of reducing the peer review load • Research is a complex process so no assessment system sufficient to purpose is going to be completely “light touch”

  3. The background • RAE introduced in 1986 • ABRC and UGC consensus to increase selectivity • Format settled by 1992 • Progressive improvement in UK impact • Dynamic change and improvement at all levels

  4. The RAE period is linked to an increase in UK share of world citations

  5. UK performance gain is seen across all RAE grades (Data are core sciences, grade at RAE96)

  6. Treasury proposals • RAE peer review produced a grade • Weighting factor in QR allocation model • Quality assurance • But there were doubters • Community said the RAE was onerous • Peer review was opaque • Funding appeared [too] widely distributed • Treasury wanted transparent simplification of the allocation side

  7. The ‘next steps’ model • Noted correlation between QR and earned income (RC or total) • Evidence drew attention to statistical link in work on dual support for HEFCE and UUK in 2001 & 2002 • Treasury hard-wired the model as an allocation system • So RC income determines QR • But … • Statistical correlation is not a sufficient argument • Income is not a measure of quality and should not be used as a driver for evaluation and reward

  8. QR and RC income scale together, but the residual variance would have an impact. HEPI produced additional analyses in its report.
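
To make the residual-variance point concrete, here is a minimal sketch using simulated figures (not real HEFCE or HEPI data) of how a fitted QR-versus-RC-income line can show a high sector-level correlation while still implying large shifts for individual institutions:

```python
# Minimal sketch (hypothetical figures): fit QR allocation against Research
# Council income and inspect the residuals.  A strong correlation at the
# sector level can still hide large per-institution residuals, which is why
# a hard-wired RC-income -> QR formula would perturb individual HEIs.
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data for 20 institutions (GBP millions), NOT real figures.
rc_income = rng.lognormal(mean=2.5, sigma=1.0, size=20)
qr = 0.8 * rc_income * rng.lognormal(mean=0.0, sigma=0.3, size=20)

# Ordinary least-squares fit of QR on RC income.
slope, intercept = np.polyfit(rc_income, qr, deg=1)
predicted_qr = slope * rc_income + intercept
residuals = qr - predicted_qr

r = np.corrcoef(rc_income, qr)[0, 1]
print(f"correlation r = {r:.2f}")                      # looks reassuringly high
print(f"largest residual = {np.abs(residuals).max():.1f} (GBP m)")
# Even with a high r, switching each HEI to predicted_qr would move some
# institutions by a large fraction of their current allocation.
```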

  9. Unmodified outcomes of the outline metrics model perturb the current system unduly. A new model might produce reasonable change, but few would accept that the current QR allocations are as erroneous as these outcomes suggest.

  10. The problem • The Treasury model over-simplifies • Outcomes are unpredictable • There are confounding factors such as subject mix • Even within subjects there are complex cost patterns • The outcome does not inspire confidence and would affect morale • There are no checks and balances • Risk of perverse outcomes, drift from original model • Drivers might affect innovation, emerging fields, new staff • There is no quality assurance

  11. What are we trying to achieve? We want to lighten the peer review burden, so we need ‘indicators’ to evaluate ‘research performance’, but not simplistic mono-metrics. [Diagram: the research ‘black box’. What we have to use are the inputs (funding, staff numbers) and outputs (publications) over time; what we want to know is research quality.]

  12. Informed assessment comes from an integrated picture of research, not single metrics

  13. Data options for metrics and indicators • Primary data from a research phase • Input, activity, output, impact • Secondary data from combinations of these • e.g. money or papers per FTE • Three attributes for every datum • Time, place, discipline • This limits possible sources of valid data • Build up a picture • Weighted use of multiple indicators • Balance adjusted for subject • Balance adjusted for policy purpose
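
As an illustration of the ‘weighted use of multiple indicators’ idea, the following sketch combines a few subject-normalised indicators with adjustable weights. The indicator names and weights are hypothetical, not a proposed specification:

```python
# A minimal sketch of a 'basket of metrics': combine several normalised
# indicators with weights that can be rebalanced by subject or by policy
# purpose.  Names and weights here are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Indicator:
    name: str        # e.g. "income per FTE", "papers per FTE"
    value: float     # already normalised against a subject benchmark (1.0 = par)

def basket_score(indicators: list[Indicator], weights: dict[str, float]) -> float:
    """Weighted combination of subject-normalised indicators."""
    total_weight = sum(weights[i.name] for i in indicators)
    return sum(weights[i.name] * i.value for i in indicators) / total_weight

dept = [
    Indicator("income per FTE", 1.10),
    Indicator("papers per FTE", 0.95),
    Indicator("citation impact", 1.30),
]

# Balance adjusted for subject: a lab-based science might weight income more
# heavily than a mathematics unit would.
lab_weights  = {"income per FTE": 0.4, "papers per FTE": 0.2, "citation impact": 0.4}
math_weights = {"income per FTE": 0.1, "papers per FTE": 0.4, "citation impact": 0.5}

print(f"lab-weighted score:  {basket_score(dept, lab_weights):.2f}")
print(f"math-weighted score: {basket_score(dept, math_weights):.2f}")
```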

  14. We need assured data sourcing • Where the data comes from • Indicator data must emerge naturally from the process being evaluated • Artificial PIs are just that, artificial • Who collects and collates the data • This affects accessibility, quality and timeliness • HESA • Data quality and validation • Discipline structure • Game playing

  15. We need to agree discipline mapping. What is Chemistry?

  16. We have to agree how to account for the distribution of data values, e.g. income. [Chart: distribution of income values, from minimum to maximum.]

  17. Distribution of data values: impact. The variables for which we have metrics are skewed and therefore difficult to picture in a simple way.
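
A small simulation, using an illustrative lognormal sample rather than real citation data, shows why such skewed profiles resist simple summary: the mode, median and mean all fall in different places.

```python
# Minimal sketch with a simulated lognormal sample (not real citation data):
# on a skewed profile the mode, median and mean land in different places,
# so no single summary statistic pictures the distribution fairly.
import numpy as np

rng = np.random.default_rng(1)
citations = np.rint(rng.lognormal(mean=1.0, sigma=1.2, size=10_000))

values, counts = np.unique(citations, return_counts=True)
mode = values[counts.argmax()]             # the most common value (often low)

print(f"mode   = {mode:.0f}")
print(f"median = {np.median(citations):.0f}")
print(f"mean   = {citations.mean():.1f}")  # dragged upwards by the long tail
```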

  18. Agree purpose for data usage • Data are only indicators • So we need some acceptable reference system • Skewed profiles are difficult to interpret • We need simple, transparent descriptions • Benchmarks • Make comparisons • Track changes • Use metrics to monitor performance • Set baseline against RAE2008 outcomes • Check thresholds to trigger fuller reassessment

  19. Example: categorising impact data. This grouping is equivalent to a log2 transformation. There is no place for zero values on a log scale.
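
A minimal sketch of such a log2 grouping might look like this; the bin boundaries are illustrative, not the categories used in the presentation’s own analysis, and uncited papers get their own bin because zero has no place on a log scale:

```python
# Log2 categorisation of citation impact data: bin boundaries double at each
# step, and uncited papers take a separate category since log2(0) is undefined.
import math

def impact_category(citations: int) -> int:
    """Return 0 for uncited papers, else 1 + floor(log2(citations))."""
    if citations == 0:
        return 0                      # separate bin for the uncited spike
    return 1 + int(math.log2(citations))

for c in [0, 1, 2, 3, 4, 7, 8, 16, 100]:
    print(f"{c:>3} citations -> category {impact_category(c)}")
# Categories cover 0, 1, 2-3, 4-7, 8-15, ... so a heavily skewed
# distribution collapses into a short, readable profile.
```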

  20. UK ten-year profile: 680,000 papers. [Chart: impact profile marking the mode, the mode of cited papers, the median, the average (RBI = 1.24) and a queried ‘threshold of excellence?’.]

  21. Subject profiles and UK reference

  22. HEIs – 10 year totals – 4.1. Smoothing the lines would reveal the shape of the profile.

  23. HEIs – 10 year totals – 4.2. Absolute volume would add a further element for comparisons.

  24. Conclusions • We can reduce the peer review burden by increased use of metrics • But the transition won’t be simple • Research is a complex, expert system • Assessment needs to produce • Confidence among the assessed • Quality assurance among users • Transparent outcome for funding bodies • Light touch is possible, but not featherweight • Initiate a metrics basket linked to RAE2008 peer review • Set benchmarks & thresholds, then track the basket • Invoke panel reviews to evaluate change, but only where variance exceeds band markers across multiple metrics
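
As a sketch of the threshold-trigger idea in these conclusions, the following code flags a unit for panel review only when several metrics drift outside an agreed band around a baseline. The band width and the two-metric rule are assumptions for illustration, not figures from the presentation:

```python
# Track each unit's metrics basket against an RAE2008-style baseline and
# invoke a panel review only when several metrics drift outside the band.
BAND = 0.20            # hypothetical +/- 20% tolerance around the baseline
MIN_FLAGS = 2          # hypothetical rule: require drift on >1 metric

def needs_panel_review(baseline: dict[str, float],
                       current: dict[str, float]) -> bool:
    """True when at least MIN_FLAGS metrics move outside the band."""
    flags = 0
    for name, base in baseline.items():
        drift = abs(current[name] - base) / base
        if drift > BAND:
            flags += 1
    return flags >= MIN_FLAGS

baseline = {"income per FTE": 1.00, "papers per FTE": 1.00, "impact": 1.00}
current  = {"income per FTE": 1.35, "papers per FTE": 0.70, "impact": 1.05}

print(needs_panel_review(baseline, current))  # True: two metrics out of band
```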

  25. Overview (reprise) • RAE was seen as burdensome and distorting • Treasury proposed a metrics-based QR allocation system • The outline model is inadequate, unbalanced and provides no quality assurance • A basket of metrics might nonetheless provide a workable way of reducing the peer review load • But research is a complex process so no assessment system sufficient to purpose is going to be completely “light touch”
