The Impact of Impact

I wrote the following article to explore how Impact in the Research Excellence Framework 2014 (REF2014) affected the average scores of departments (and hence rankings). This produced a “league table” of how strongly impact affected different subjects. Some of the information in this article was used in a THE article by Paul Jump due to come out 00:00 on 19th Feb 2015. I’ve now also produced ranking tables for each UoA using the standardised weighting I advocate below (see Standardised Rankings).

UoA	Unit of Assessment	Effective Weight of GPA ranking in each sub-profile as %
		Outputs	Impact	Environment
9	Physics	37.9	38.6	23.5
23	Sociology	34.1	38.6	27.3
10	Mathematical Sciences	37.6	37.5	24.9
24	Anthropology and Development Studies	40.2	35.0	24.8
6	Agriculture, Veterinary and Food Science	42.0	33.0	25.0
31	Classics	43.3	32.6	24.0
16	Architecture, Built Environment and Planning	48.6	31.1	20.3
22	Social Work and Social Policy	44.3	31.1	24.7
27	Area Studies	45.8	30.5	23.6
14	Civil and Construction Engineering	49.0	30.2	20.8
32	Philosophy	47.2	30.2	22.7
26	Sport and Exercise Sciences, Leisure and Tourism	50.2	29.7	20.1
36	Communication, Cultural and Media Studies, …	48.4	29.3	22.3
15	General Engineering	47.6	29.1	23.3
25	Education	45.1	29.0	25.9
20	Law	49.3	28.8	21.9
13	Electrical and Electronic Engineering, Metallurgy …	45.9	28.7	25.4
29	English Language and Literature	42.9	28.6	28.5
30	History	47.6	28.5	23.9
1	Clinical Medicine	41.1	28.3	30.6
28	Modern Languages and Linguistics	49.7	28.0	22.3
3	Allied Health Professions, Dentistry, Nursing and …	46.7	27.9	25.4
11	Computer Science and Informatics	51.5	27.9	20.6
17	Geography, Environmental Studies and …	46.9	27.4	25.8
18	Economics and Econometrics	54.3	27.3	18.3
21	Politics and International Studies	48.4	27.0	24.6
34	Art and Design: History, Practice and Theory	50.3	26.9	22.8
5	Biological Sciences	50.7	26.7	22.6
4	Psychology, Psychiatry and Neuroscience	51.1	26.6	22.3
12	Aeronautical, Mechanical, Chemical and …	45.9	26.6	27.5
33	Theology and Religious Studies	48.7	26.6	24.7
7	Earth Systems and Environmental Sciences	51.0	25.6	23.4
19	Business and Management Studies	52.5	24.4	23.0
8	Chemistry	45.8	23.9	30.3
35	Music, Drama, Dance and Performing Arts	56.4	23.5	20.1
2	Public Health, Health Services and Primary Care	56.9	19.6	23.4
	Average	47.1	29.0	23.9

Table 1: Effective weights on the rankings of the Grade Point Averages (GPAs), expressed as a percentage, in each of the three sub-profiles (Outputs, Impact and Environment). The effective weight conveys how much the relative position in each sub-profile contributes to the relative position in the Overall GPA, and is the product of the nominal weight (65%, 20% or 15%) and the standard deviation of the GPAs in that sub-profile. The weights have been normalised to sum to 100%. The table is ranked by effective weight in the Impact sub-profile.

The Research Excellence Framework 2014 (REF2014) is the latest assessment of the quality of research in UK universities. These assessments occur roughly every six years and have important consequences, both directly for funding and indirectly, for individual departments and for universities as a whole. A new feature of the 2014 exercise was the introduction of a measure of the socioeconomic Impact that could be attributed to research. Its novelty will ensure that it is the subject of much scrutiny. Many people have explored, and will continue to explore, what the measurements tell us about the extent to which UK research has an impact on wider society. I am going to tackle the simpler task of looking at the impact of the Impact measurement on the overall assessment of research quality, and hence on league tables.

As Director of Research and Knowledge Exchange for the School of Mathematical and Physical Sciences at the University of Sussex I oversaw the REF 2014 submissions for the Departments of Mathematics, and Physics and Astronomy. On seeing our result for Physics and Astronomy I was initially surprised by how much our poorer ranking in the Impact measurement had affected our overall position. My immediate intuition was that the Impact scores were spread over a larger range and that this caused them to have a greater influence on the overall score than I would have naïvely expected.

For any subject area (Unit of Assessment, UoA) and any individual department, the REF 2014 results are aggregated into a Grade Point Average (GPA) for each of three sub-profiles: Outputs, Impact and Environment. These are then combined with relative weightings of 65%, 20%, 15% to give an Overall GPA. Naïvely one might then expect that the Impact would affect the Overall rankings by quite a lot less than the Outputs and a little bit more than the Environment. However, that is not necessarily the case.

To consider the contribution of each component to an aggregate score we need to consider both the weights and the intrinsic spread in each measurement. For example, in many grant or fellowship panels the proposals are graded by a number of panel members and the scores are averaged together. It is well known that a panel member who uses the full range of scores (1-5, say) will have a bigger influence on the final rankings than a panel member whose scores span a more conservative range (say 2-4). Suppose one panel member gives candidate A a score of 1 and B a score of 5, while two other panel members both give A a score of 4 and B a score of 3. The equally weighted averages are then 3.00 for A and 3.67 for B. So, even though the three panel members have the same nominal weight, the final ordering of the candidates is that of the minority panel member, whose scores were more widely spread and therefore carried a larger effective weight.
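The panel example can be checked with a few lines of Python. This is only an illustrative sketch: the labels "wide", "cons_1" and "cons_2" are names I have made up for the three panel members, and the spread is measured with the population standard deviation.

```python
from statistics import mean, pstdev

# Scores from the panel example: one member uses the full range,
# the other two score conservatively. Member labels are hypothetical.
scores = {
    "wide":   {"A": 1, "B": 5},
    "cons_1": {"A": 4, "B": 3},
    "cons_2": {"A": 4, "B": 3},
}

def average(candidate):
    """Equally weighted average score for one candidate."""
    return mean(member[candidate] for member in scores.values())

avg_a = average("A")  # (1 + 4 + 4) / 3
avg_b = average("B")  # (5 + 3 + 3) / 3

# Which candidate has the higher average score?
higher = "A" if avg_a > avg_b else "B"

# Spread of each member's scores across the two candidates.
spread = {name: pstdev(list(member.values())) for name, member in scores.items()}

print(f"A: {avg_a:.2f}, B: {avg_b:.2f}, higher average: {higher}")
print(spread)
```

The wide-ranging member has a spread four times that of each conservative member, which is why a single vote outweighs the other two combined.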

We can estimate these effective weights for each sub-profile in REF 2014. The ranking of a department in one sub-profile (e.g. the GPA for Environment, written g_e) is determined by comparison with its peers. We can characterise the performance of the peers by their mean GPA, µ_e, and their variety by the standard deviation, σ_e. The rank of the department is then related to its difference from the mean compared with the deviation of the other departments from the mean, i.e. the rank is related to Δ_e = (g_e - µ_e)/σ_e [1]. When the three sub-profiles are combined with nominal weights w_o, w_i, w_e, i.e. g = w_o g_o + w_i g_i + w_e g_e, we can see that Δ, which governs the Overall rank, is proportional to [2] w_o σ_o Δ_o + w_i σ_i Δ_i + w_e σ_e Δ_e. The effective weights (i.e. the impact of each sub-profile on the overall ranking) are thus proportional to the nominal weights scaled by the standard deviation of the measure. If all the standard deviations were the same then the effective weights would be equal to the nominal weights. However, if the standard deviation in one sub-profile is higher than in another then it will acquire a higher effective weight. These effective weights for each UoA are shown in Table 1.
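A minimal sketch of this calculation for a single UoA is given below. The function name `effective_weights` and the four departments' GPAs are invented for illustration and are not REF data. The population standard deviation is assumed; using the sample standard deviation would give the same normalised weights, because the correction factor is common to all three sub-profiles.

```python
from statistics import pstdev

# Nominal REF 2014 weights for Outputs, Impact and Environment.
NOMINAL = (0.65, 0.20, 0.15)

def effective_weights(gpas, nominal=NOMINAL):
    """Return effective weights (in %) for one UoA.

    gpas: list of (outputs, impact, environment) GPAs, one tuple per department.
    Each effective weight is nominal weight x standard deviation, normalised to 100.
    """
    columns = list(zip(*gpas))  # one sequence of GPAs per sub-profile
    raw = [w * pstdev(col) for w, col in zip(nominal, columns)]
    total = sum(raw)
    return [100 * r / total for r in raw]

# Hypothetical GPAs for four departments (made up, not REF results).
example = [
    (3.0, 3.5, 3.2),
    (3.1, 2.5, 3.0),
    (2.9, 3.0, 2.6),
    (3.2, 2.0, 3.4),
]

weights = effective_weights(example)
print([round(w, 1) for w in weights])
```

In this made-up example the Impact GPAs are spread much more widely than the Outputs GPAs, so Impact ends up with the largest effective weight despite its 20% nominal weight.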

It is immediately clear from this table that the effective weights for Outputs are always less than their nominal weights. For two units of assessment (UoA 23, Sociology, and UoA 9, Physics) Impact has a higher effective weight than Outputs. The average effective weights are 47%, 29% and 24%, in contrast to the nominal weights of 65%, 20% and 15%. There is also a wide range, nearly a factor of two, in the effective weights for Impact, from 19.6% to 38.6%.
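Because each effective weight is proportional to the nominal weight times the standard deviation, dividing one by the other gives a rough indication of the relative spread in each sub-profile. The sketch below does this for the "Average" row of Table 1. It is only indicative, since that row averages already-normalised weights across UoAs; the variable names are my own.

```python
# Nominal weights and the 'Average' row of Table 1, both in %.
nominal = {"Outputs": 65.0, "Impact": 20.0, "Environment": 15.0}
effective = {"Outputs": 47.1, "Impact": 29.0, "Environment": 23.9}

# Effective / nominal is proportional to the standard deviation.
ratio = {k: effective[k] / nominal[k] for k in nominal}

# Express each spread relative to the spread in Outputs.
relative_spread = {k: ratio[k] / ratio["Outputs"] for k in nominal}

for name, value in relative_spread.items():
    print(f"{name}: {value:.2f}")
```

On these numbers the typical spread in Impact GPAs is about twice that in Outputs, and the spread in Environment is a little larger still.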

So, the variations in scores in Impact (and Environment) are generally much larger than the variations in the scores in Outputs. Why should this be? There are probably a number of factors:

Although the same numerical scale is used in each of the sub-profiles, they are measuring different things (e.g. 4* means “Quality that is world-leading in terms of originality, significance and rigour” in the Outputs sub-profile, but means “Outstanding impacts in terms of their reach and significance” in the Impact sub-profile). So, a weighted Overall is somewhat meaningless.
There is less variety in the Outputs sub-profile because departments typically select the best research to be considered.
There is more variety in the Impact sub-profile because this is the first time this profile has been used and UoAs didn’t know how best to present or select their best Impact.
There is less variation in the Outputs sub-profile because it is based on many more measurements (4 outputs per faculty) than e.g. Impact (~1 impact case study for 10 faculty), so has a smaller “error”.
There is more intrinsic variety e.g. in the environment of departments than there is in the research outputs of individual researchers.

Before the guidelines for REF 2014 were finalised there was some discussion about whether the nominal weight for Impact should be 25% or 20% and there are indications that the weight will increase in the next exercise. This analysis shows that the effective weight is already well above 25% in most cases but with a wide variety of effective weights across different Units of Assessment. Given this variety, and the different definitions of the criteria used in each sub-profile, policy makers should consider carefully how the Overall profiles are constructed in future. They might want to combine standardized statistics rather than raw statistics.
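One way to combine standardised rather than raw statistics is to convert each sub-profile GPA to a standardised variable within its UoA before applying the nominal weights, so that the effective weights equal the nominal ones. The sketch below is an illustration under that assumption, not necessarily the exact method used for any published table. The function names and the four departments' GPAs are invented, and it assumes every sub-profile has a non-zero standard deviation.

```python
from statistics import mean, pstdev

# Nominal REF 2014 weights for Outputs, Impact and Environment.
NOMINAL = (0.65, 0.20, 0.15)

def standardise(column):
    """Convert GPAs to standardised variables: (g - mean) / standard deviation."""
    mu, sigma = mean(column), pstdev(column)  # assumes sigma > 0
    return [(g - mu) / sigma for g in column]

def standardised_overall(gpas, nominal=NOMINAL):
    """Weighted sum of standardised sub-profile GPAs, one score per department."""
    z_columns = [standardise(col) for col in zip(*gpas)]
    return [sum(w * z for w, z in zip(nominal, row)) for row in zip(*z_columns)]

def raw_overall(gpas, nominal=NOMINAL):
    """Conventional Overall GPA: weighted sum of the raw sub-profile GPAs."""
    return [sum(w * g for w, g in zip(nominal, row)) for row in gpas]

def ranking(scores):
    """Department indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical GPAs (outputs, impact, environment) for four departments.
example = [
    (3.0, 3.5, 3.2),
    (3.1, 2.5, 3.0),
    (2.9, 3.0, 2.6),
    (3.2, 2.0, 3.4),
]

raw_rank = ranking(raw_overall(example))
std_rank = ranking(standardised_overall(example))
print("raw:", raw_rank, "standardised:", std_rank)
```

With these made-up numbers the two orderings differ: the department with the best Impact but middling Outputs tops the raw table, while the department with the best Outputs tops the standardised one.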

The most obvious conclusion is that care should be taken in interpreting the published Overall scores and rankings. The different sub-profiles do not have the influence you might expect from the nominal weights and, in particular, Impact has a higher impact.

[1] In statistics, Δ is known as a standardized variable.

[2] The constant of proportionality is the standard deviation, σ, of the Overall GPA. This depends both on the standard deviations of the sub-profiles and their correlations. However, the rank order in GPA is the same in either Δ or Δσ so we don’t need to look at the correlations to understand the weights in the ranking.

Seb Oliver, University of Sussex, 20 January 2015

5 Responses to The Impact of Impact

telescoper says:

February 18, 2015 at 20:03

Reblogged this on In the Dark and commented:
Interesting analysis of the 2014 REF results by my colleague Seb Oliver. Among other things, it shows that Physics was the subject in which “Impact had the greatest impact”.

ian smail says:

February 19, 2015 at 11:02

At Astroforum the HEFCE speaker said that the weights would be finessed in the calculation of the funding to ensure the relative contributions from outputs/impact/environment were as originally proposed.

- sebboyd says:
  
  February 19, 2015 at 11:53
  
  I wasn’t at Astroforum, but I’d heard that too (after I’d written this). Of course the funding is a different issue from the one I was looking at. The funding is determined by the absolute level of grades, in particular by the percentage of 4*, whereas the rankings are determined by the relative grades. I was only looking at the GPA rankings. As I said in my follow-up piece, it would be good if we could ignore rankings entirely. The problem is that these rankings feed into the overall league tables that students use to decide which university to study at, and so can affect our teaching income, which is more significant than our research income. If we can persuade the Guardian, Times, etc., who produce the league tables for school students, to use a sensible weighting too, that would be fantastic!
  