The Impact of Impact
I wrote the following article to explore how Impact in the Research Excellence Framework 2014 (REF2014) affected the average scores of departments (and hence their rankings). The analysis produced a “league table” of how strongly Impact affected different subjects. Some of the information in this article was used in a THE article by Paul Jump due to be published at 00:00 on 19 February 2015. I have now also produced ranking tables for each UoA using the standardised weighting I advocate below (see Standardised Rankings).
Unit of Assessment | Effective weight of GPA ranking in each sub-profile (%)
Anthropology and Development Studies
Agriculture, Veterinary and Food Science
Architecture, Built Environment and Planning
Social Work and Social Policy
Civil and Construction Engineering
Sport and Exercise Sciences, Leisure and Tourism
Communication, Cultural and Media Studies, …
Electrical and Electronic Engineering, Metallurgy …
English Language and Literature
Modern Languages and Linguistics
Allied Health Professions, Dentistry, Nursing and …
Computer Science and Informatics
Geography, Environmental Studies and …
Economics and Econometrics
Politics and International Studies
Art and Design: History, Practice and Theory
Psychology, Psychiatry and Neuroscience
Aeronautical, Mechanical, Chemical and …
Theology and Religious Studies
Earth Systems and Environmental Sciences
Business and Management Studies
Music, Drama, Dance and Performing Arts
Public Health, Health Services and Primary Care
Table 1: Effective weights on the rankings of the Grade Point Averages (GPAs), expressed as percentages, for each of the three sub-profiles (Outputs, Impact and Environment). The effective weight measures how much a department's relative position in each sub-profile contributes to its relative position in the Overall GPA. It is the product of the nominal weight (65%, 20% or 15%) and the standard deviation of that sub-profile's GPAs. The effective weights have been normalised to sum to 100%. The table is ranked by the effective weight of the Impact sub-profile.
The Research Excellence Framework 2014 (REF2014) is the latest assessment of the quality of research in UK universities. These assessments occur roughly every six years and have important consequences, both direct (through funding) and indirect, for individual departments and for universities as a whole. A new feature of the 2014 exercise was a measure of the socioeconomic Impact that could be attributed to research, and its novelty will ensure that it is the subject of much scrutiny. Many people have explored, and will continue to explore, what these measurements tell us about the extent to which UK research has an impact on wider society. I am going to tackle the simpler task of looking at the impact of the Impact measurement on the overall assessment of research quality, and hence on league tables.
As Director of Research and Knowledge Exchange for the School of Mathematical and Physical Sciences at the University of Sussex I oversaw the REF 2014 submissions for the Departments of Mathematics, and Physics and Astronomy. On seeing our result for Physics and Astronomy I was initially surprised by how much our poorer ranking in the Impact measurement had affected our overall position. My immediate intuition was that the Impact scores were spread over a larger range and that this caused them to have a greater influence on the overall score than I would have naïvely expected.
For any subject area (Unit of Assessment, UoA) and any individual department, the REF 2014 results are aggregated into a Grade Point Average (GPA) for each of three sub-profiles: Outputs, Impact and Environment. These are then combined with relative weightings of 65%, 20% and 15% to give an Overall GPA. Naïvely, one might then expect Impact to affect the Overall rankings much less than Outputs and slightly more than Environment. However, that is not necessarily the case.
To understand the contribution of each component to an aggregate score, we need to consider both the weights and the intrinsic spread of each measurement. For example, in many grant or fellowship panels the proposals are graded by a number of panel members and the scores are averaged. It is well known that a panel member who uses the full range of scores (say 1-5) will have a bigger influence on the final rankings than a panel member whose scores span a more conservative range (say 2-4). Suppose one panel member gives candidate A a score of 1 and B a score of 5, while two other panel members both give A a score of 4 and B a score of 3. The equally weighted averages are then 3.00 for A and 3.67 for B. So, even though the three panel members have the same nominal weight, the final ranking of the candidates is that of the minority panel member: because their scores varied more, they had a larger effective weight.
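The panel arithmetic can be checked with a few lines of Python. This is only a sketch of the worked example in the paragraph above; the variable names are illustrative.

```python
from statistics import mean

# Scores from the worked example: panel member 1 uses the full range,
# panel members 2 and 3 use a narrow range. All have equal nominal weight.
scores = {
    "A": [1, 4, 4],
    "B": [5, 3, 3],
}

# Equal-weighted average for each candidate.
averages = {name: mean(s) for name, s in scores.items()}

# Each member's preference for B over A (positive means they prefer B).
margins = [b - a for a, b in zip(scores["A"], scores["B"])]

# Highest average first.
ranking = sorted(averages, key=averages.get, reverse=True)

for name in ranking:
    print(f"{name}: {averages[name]:.2f}")
print("per-member margins (B - A):", margins)
```

The single wide margin of +4 outweighs the two narrow margins of -1, so B ranks above A even though two of the three members preferred A.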
We can estimate these effective weights for each sub-profile in REF 2014. The ranking of a department in one sub-profile (e.g. Environment, with GPA $g_e$) is determined by comparison with its peers. We can characterise the performance of the peers by their mean GPA, $\mu_e$, and their variety by the standard deviation, $\sigma_e$. The rank of the department is then related to its difference from the mean compared with the deviation of other departments from the mean, i.e. the rank is related to $\Delta_e = (g_e - \mu_e)/\sigma_e$. When the three sub-profiles are combined with nominal weights $w_o$, $w_i$, $w_e$, i.e. $g = w_o g_o + w_i g_i + w_e g_e$, we can see that $\Delta$, which governs the Overall rank, is proportional to $w_o \sigma_o \Delta_o + w_i \sigma_i \Delta_i + w_e \sigma_e \Delta_e$. The effective weights (i.e. the impact of each sub-profile on the overall ranking) are thus proportional to the nominal weights scaled by the standard deviation of the measure. If all the standard deviations were the same, the effective weights would equal the nominal weights. However, if the standard deviation in one sub-profile is higher than in another, that sub-profile acquires a higher effective weight. These effective weights for each UoA are shown in Table 1.
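The proportionality step can be written out explicitly. This uses the notation of the paragraph above: $g$, $\mu$, $\sigma$ and $\Delta$, with subscripts $o$, $i$ and $e$ for Outputs, Impact and Environment, and no subscript for the Overall GPA. The Overall GPA is a fixed linear combination of the sub-profile GPAs, so its mean over departments is the same combination of the sub-profile means. The last line shows the normalisation to 100% used in Table 1.

```latex
\mu = w_o\mu_o + w_i\mu_i + w_e\mu_e

g - \mu = w_o(g_o - \mu_o) + w_i(g_i - \mu_i) + w_e(g_e - \mu_e)
        = w_o\sigma_o\Delta_o + w_i\sigma_i\Delta_i + w_e\sigma_e\Delta_e

\Delta = \frac{g - \mu}{\sigma}
       = \frac{1}{\sigma}\left(w_o\sigma_o\Delta_o + w_i\sigma_i\Delta_i + w_e\sigma_e\Delta_e\right)

\tilde{w}_k = 100\% \times \frac{w_k\sigma_k}{w_o\sigma_o + w_i\sigma_i + w_e\sigma_e},
  \qquad k \in \{o, i, e\}
```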
It is immediately clear from this table that the effective weight of Outputs is always less than its nominal weight. For two Units of Assessment (UoA 23, Sociology, and UoA 9, Physics), Impact has a higher effective weight than Outputs. The average effective weights for Outputs, Impact and Environment are 47%, 29% and 24%, in contrast to the nominal weights of 65%, 20% and 15%. The effective weight of Impact also varies by nearly a factor of two across UoAs, from 19.7% to 38.6%.
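The normalised effective weights can be computed from the nominal weights and the standard deviation of each sub-profile's GPAs across the departments in a UoA. Below is a minimal Python sketch of that calculation. The function name `effective_weights` and the four departments' GPAs are invented for illustration and are not REF data.

```python
from statistics import pstdev

# Nominal REF 2014 weights for Outputs, Impact and Environment.
NOMINAL = {"outputs": 0.65, "impact": 0.20, "environment": 0.15}

def effective_weights(gpas):
    """Return effective weights (%) for one UoA.

    gpas maps each sub-profile name to the list of department GPAs in that
    UoA. Each effective weight is the nominal weight times the standard
    deviation, normalised so the three sum to 100. Using pstdev or stdev makes
    no difference after normalisation, because every sub-profile has the same
    number of departments and the correction factor cancels.
    """
    raw = {k: w * pstdev(gpas[k]) for k, w in NOMINAL.items()}
    total = sum(raw.values())
    return {k: 100 * v / total for k, v in raw.items()}

# Hypothetical GPAs for four invented departments: the Impact and
# Environment spreads are twice the Outputs spread.
example = {
    "outputs":     [2.85, 2.95, 3.05, 3.15],
    "impact":      [2.70, 2.90, 3.10, 3.30],
    "environment": [2.70, 2.90, 3.10, 3.30],
}

for name, weight in effective_weights(example).items():
    print(f"{name}: {weight:.1f}%")
```

With Impact and Environment spreads twice that of Outputs, this invented example gives roughly 48.1%, 29.6% and 22.2%. If all three spreads are equal, the function returns the nominal 65%, 20% and 15%.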
So, the spread of scores in Impact (and Environment) is generally much larger than the spread of scores in Outputs. Why should this be? There are probably several factors:
- Although the same numerical scale is used in each of the sub-profiles, they are measuring different things (e.g. 4* means “Quality that is world-leading in terms of originality, significance and rigour” in the Outputs sub-profile, but “Outstanding impacts in terms of their reach and significance” in the Impact sub-profile), so a weighted Overall score is somewhat meaningless.
- There is less variety in the Outputs sub-profile because departments typically select their best research to be considered.
- There is more variety in the Impact sub-profile because this is the first time this profile has been used, and UoAs did not know how best to select and present their strongest impact.
- There is less variation in the Outputs sub-profile because it is based on many more measurements (4 outputs per member of staff) than, e.g., Impact (roughly 1 impact case study per 10 staff), so it has a smaller “error”.
- There is more intrinsic variety in, e.g., the environment of departments than in the research outputs of individual researchers.
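The fourth factor, that more measurements give a smaller “error”, follows from the familiar $1/\sqrt{n}$ scaling of the standard error of a mean. The Python sketch below makes crude assumptions: individual scores are independent and have the same spread in both sub-profiles. The department size of 20 and the per-item spread of 1.0 are hypothetical. The item counts use only the approximate ratios quoted in the list above, not the detailed REF submission rules.

```python
import math

def standard_error(item_sd, n_items):
    """Standard error of a mean of n_items independent scores."""
    return item_sd / math.sqrt(n_items)

# Hypothetical department of 20 staff, using the ratios quoted above:
# 4 outputs per member of staff, roughly 1 case study per 10 staff.
staff = 20
n_outputs = 4 * staff           # 80 outputs
n_case_studies = staff // 10    # 2 case studies

item_sd = 1.0  # assumed spread of individual star ratings, same for both

se_outputs = standard_error(item_sd, n_outputs)
se_impact = standard_error(item_sd, n_case_studies)

print(f"Outputs GPA noise: {se_outputs:.3f}")
print(f"Impact GPA noise:  {se_impact:.3f}")
print(f"ratio: {se_impact / se_outputs:.2f}")
```

Under these assumptions, the sampling noise in an Impact GPA is about $\sqrt{40} \approx 6.3$ times that of an Outputs GPA, whatever the department size.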
Before the guidelines for REF 2014 were finalised there was some discussion about whether the nominal weight for Impact should be 25% or 20% and there are indications that the weight will increase in the next exercise. This analysis shows that the effective weight is already well above 25% in most cases but with a wide variety of effective weights across different Units of Assessment. Given this variety, and the different definitions of the criteria used in each sub-profile, policy makers should consider carefully how the Overall profiles are constructed in future. They might want to combine standardized statistics rather than raw statistics.
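One possible way to combine standardized statistics is sketched below. Each sub-profile GPA is converted into its standardized variable $\Delta$ within the UoA, and the nominal weights are then applied, so each sub-profile's effective weight equals its nominal weight. This approach is an assumption: it is not necessarily how the Standardised Rankings mentioned at the top were produced. The departments X, Y and Z and their GPAs are invented to show that the two methods can disagree.

```python
from statistics import mean, pstdev

# Nominal REF 2014 weights for the three sub-profiles.
NOMINAL = {"outputs": 0.65, "impact": 0.20, "environment": 0.15}

# Hypothetical sub-profile GPAs for three invented departments in one UoA.
GPAS = {
    "outputs":     {"X": 3.10, "Y": 3.00, "Z": 3.05},
    "impact":      {"X": 2.50, "Y": 3.50, "Z": 3.00},
    "environment": {"X": 3.05, "Y": 2.95, "Z": 3.00},
}

def raw_overall(gpas):
    """Nominally weighted Overall GPA, combining raw sub-profile GPAs."""
    depts = gpas["outputs"].keys()
    return {d: sum(w * gpas[k][d] for k, w in NOMINAL.items()) for d in depts}

def standardised_overall(gpas):
    """Weight each sub-profile's standardized variable instead of its raw GPA.

    Assumes every sub-profile has a non-zero spread across departments.
    """
    z = {}
    for k, by_dept in gpas.items():
        values = list(by_dept.values())
        mu, sd = mean(values), pstdev(values)
        z[k] = {d: (g - mu) / sd for d, g in by_dept.items()}
    depts = gpas["outputs"].keys()
    return {d: sum(w * z[k][d] for k, w in NOMINAL.items()) for d in depts}

def ranking(scores):
    """Department names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

raw_rank = ranking(raw_overall(GPAS))
std_rank = ranking(standardised_overall(GPAS))
print("raw:", raw_rank)
print("standardised:", std_rank)
```

In this invented UoA, Impact has a much wider spread than Outputs. Y's strong Impact therefore carries the raw ranking (Y, Z, X). After standardisation, X's lead in Outputs and Environment dominates (X, Z, Y).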
The most obvious conclusion is that care should be taken in interpreting the published Overall scores and rankings. The different sub-profiles do not have the influence you might expect from the naïve weights and, in particular, Impact has a higher impact.
In statistics, Δ is known as a standardized variable (or z-score).
The weighted sum equals σΔ, where σ is the standard deviation of the Overall GPA, so the constant of proportionality is 1/σ. σ depends on both the standard deviations of the sub-profiles and their correlations. However, the rank order is the same whether we use Δ or σΔ, so we do not need to look at the correlations to understand the weights in the ranking.
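For reference, the standard formula for the variance of a weighted sum gives σ in terms of the sub-profile standard deviations and their correlations. Here ρ_kl is the correlation between sub-profiles k and l across departments, with ρ_kk = 1. Dividing by σ rescales every department's Δ by the same factor, which is why the rank order does not depend on it.

```latex
\sigma^2 = \sum_{k \in \{o,i,e\}} \sum_{l \in \{o,i,e\}} w_k w_l \, \rho_{kl} \, \sigma_k \sigma_l
```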
Seb Oliver, University of Sussex, 20 January 2015