The Impact of Impact

I wrote the following article to explore how Impact in the Research Excellence Framework 2014 (REF2014) affected the average scores of departments (and hence rankings). This produced a “league table” of how strongly impact affected different subjects. Some of the information in this article was used in a THE article by Paul Jump due to come out 00:00 on 19th Feb 2015.  I’ve now also produced ranking tables for each UoA using the standardised weighting I advocate below (see Standardised Rankings).

Effective weight of GPA ranking in each sub-profile (%)

UoA Unit of Assessment Outputs Impact Envir.
9 Physics 37.9 38.6 23.5
23 Sociology 34.1 38.6 27.3
10 Mathematical Sciences 37.6 37.5 24.9
24 Anthropology and Development Studies 40.2 35.0 24.8
6 Agriculture, Veterinary and Food Science 42.0 33.0 25.0
31 Classics 43.3 32.6 24.0
16 Architecture, Built Environment and Planning 48.6 31.1 20.3
22 Social Work and Social Policy 44.3 31.1 24.7
27 Area Studies 45.8 30.5 23.6
14 Civil and Construction Engineering 49.0 30.2 20.8
32 Philosophy 47.2 30.2 22.7
26 Sport and Exercise Sciences, Leisure and Tourism 50.2 29.7 20.1
36 Communication, Cultural and Media Studies, … 48.4 29.3 22.3
15 General Engineering 47.6 29.1 23.3
25 Education 45.1 29.0 25.9
20 Law 49.3 28.8 21.9
13 Electrical and Electronic Engineering, Metallurgy … 45.9 28.7 25.4
29 English Language and Literature 42.9 28.6 28.5
30 History 47.6 28.5 23.9
1 Clinical Medicine 41.1 28.3 30.6
28 Modern Languages and Linguistics 49.7 28.0 22.3
3 Allied Health Professions, Dentistry, Nursing and … 46.7 27.9 25.4
11 Computer Science and Informatics 51.5 27.9 20.6
17 Geography, Environmental Studies and … 46.9 27.4 25.8
18 Economics and Econometrics 54.3 27.3 18.3
21 Politics and International Studies 48.4 27.0 24.6
34 Art and Design: History, Practice and Theory 50.3 26.9 22.8
5 Biological Sciences 50.7 26.7 22.6
4 Psychology, Psychiatry and Neuroscience 51.1 26.6 22.3
12 Aeronautical, Mechanical, Chemical and … 45.9 26.6 27.5
33 Theology and Religious Studies 48.7 26.6 24.7
7 Earth Systems and Environmental Sciences 51.0 25.6 23.4
19 Business and Management Studies 52.5 24.4 23.0
8 Chemistry 45.8 23.9 30.3
35 Music, Drama, Dance and Performing Arts 56.4 23.5 20.1
2 Public Health, Health Services and Primary Care 56.9 19.6 23.4
Average 47.1 29.0 23.9

Table 1: Effective weights on the rankings of the Grade Point Averages, expressed as a percentage, in each of the three measures (Outputs, Impact and Environment). The effective weight conveys how much the relative position in each sub-profile contributes to the relative position in the Overall GPA, and is the product of the nominal weight (65%, 20% or 15%) and the standard deviation. The weights have been normalised to sum to 100%. The table is ranked by effective weight in the Impact sub-profile.

The Research Excellence Framework 2014 (REF2014) is the latest assessment of the quality of research in UK universities. These assessments occur roughly every six years and have important direct funding consequences and indirect consequences for individual departments and universities as a whole. A new feature of the 2014 exercise was the introduction of a measure of the socioeconomic Impact that could be attributed to research. Its novelty will ensure that it is the subject of much scrutiny. Many people have explored, and will continue to explore, in detail what the measurements tell us about the extent to which UK research has an impact on wider society. I am going to tackle the simpler task of looking at the impact of the Impact measurement on the overall assessment of research quality, and hence on league tables.

As Director of Research and Knowledge Exchange for the School of Mathematical and Physical Sciences at the University of Sussex I oversaw the REF 2014 submissions for the Departments of Mathematics, and Physics and Astronomy. On seeing our result for Physics and Astronomy I was initially surprised by how much our poorer ranking in the Impact measurement had affected our overall position. My immediate intuition was that the Impact scores were spread over a larger range and that this caused them to have a greater influence on the overall score than I would have naïvely expected.

For any subject area (Unit of Assessment, UoA) and any individual department, the REF 2014 results are aggregated into a Grade Point Average (GPA) for each of three sub-profiles: Outputs, Impact and Environment. These are then combined with relative weightings of 65%, 20%, 15% to give an Overall GPA. Naïvely one might then expect that the Impact would affect the Overall rankings by quite a lot less than the Outputs and a little bit more than the Environment. However, that is not necessarily the case.

To consider the contribution of each component to an aggregate score we need to consider both the weights and the intrinsic spread in each measurement. For example, in many grant or fellowship panels the proposals are graded by a number of panel members and the scores are averaged together. It is well known that a panel member who uses the full range of scores (1-5, say) will have a bigger influence on the final rankings than a panel member whose scores tend to span a more conservative range (say 2-4). Suppose one panel member gives candidate A a score of 1 and B a score of 5, while two other panel members both give A a score of 4 and B a score of 3. The equally weighted averages will then be 3.00 for A and 3.67 for B. So, even though the three panel members have the same nominal weight, the final ranking of the candidates is that of the minority panel member, whose scores had a greater variety and thus a larger effective weight.
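The panel example can be checked with a few lines of Python. This is only a sketch of the arithmetic in the paragraph above; the variable names are mine.

```python
from statistics import mean

# Scores from the example: the first panel member uses the full range,
# the other two score conservatively.
scores = {
    "A": [1, 4, 4],
    "B": [5, 3, 3],
}

# Equally weighted average for each candidate.
averages = {candidate: mean(marks) for candidate, marks in scores.items()}
for candidate, avg in averages.items():
    print(f"{candidate}: {avg:.2f}")

# Count how many panel members individually preferred each candidate.
prefer_a = sum(a > b for a, b in zip(scores["A"], scores["B"]))
prefer_b = sum(b > a for a, b in zip(scores["A"], scores["B"]))

# The candidate with the higher average wins overall.
winner = max(averages, key=averages.get)
print(f"{prefer_a} panel members prefer A, {prefer_b} prefers B, winner: {winner}")
```

Two of the three panel members prefer A, yet B comes out on top, because the single wide-ranging scorer dominates the average.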

We can estimate these effective weights for each sub-profile in REF 2014. The ranking of a department in one sub-profile (e.g. the GPA of Environment, scored as g_e) is determined by comparison with its peers. We can characterise the performance of the peers by their mean GPA, µ_e, and their variety by the standard deviation, σ_e. The rank of the department is then related to its difference from the mean compared with the deviation of other departments from the mean, i.e. the rank is related to Δ_e = (g_e − µ_e)/σ_e [1]. When the three sub-profiles are combined with nominal weights w_o, w_i, w_e, i.e. g = w_o g_o + w_i g_i + w_e g_e, we can see that Δ, which governs the Overall rank, is proportional to [2] w_o σ_o Δ_o + w_i σ_i Δ_i + w_e σ_e Δ_e. The effective weights (i.e. the impact of each sub-profile on the overall ranking) are thus proportional to the nominal weights scaled by the standard deviation of the measure. If all the standard deviations were the same then the effective weights would be equal to the nominal weights. However, if the standard deviation in one sub-profile is higher than in another then it will acquire a higher effective weight. These effective weights for each UoA are shown in Table 1.
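As a rough illustration of that recipe (nominal weight times standard deviation, normalised to 100%), here is a minimal Python sketch. The function name and the GPA values are invented for illustration; they are not REF data and this is not necessarily the exact code used to produce Table 1. I use the population standard deviation; the sample version would give the same normalised weights, since every sub-profile covers the same number of departments.

```python
from statistics import pstdev

# Nominal REF 2014 weights quoted in the text.
NOMINAL = {"outputs": 0.65, "impact": 0.20, "environment": 0.15}

def effective_weights(gpa):
    """Return effective weights (%) for one UoA.

    gpa maps each sub-profile name to the list of department GPAs in it.
    """
    # Nominal weight times the spread of that sub-profile across departments.
    raw = {name: NOMINAL[name] * pstdev(values) for name, values in gpa.items()}
    total = sum(raw.values())
    # Normalise so the three weights add up to 100%.
    return {name: 100.0 * value / total for name, value in raw.items()}

# Invented GPAs for four hypothetical departments (not REF data).
example = {
    "outputs": [3.0, 3.1, 2.9, 3.2],
    "impact": [3.5, 2.5, 3.0, 2.0],
    "environment": [3.0, 3.4, 2.6, 3.2],
}
weights = effective_weights(example)
for name, value in weights.items():
    print(f"{name}: {value:.1f}%")

# With identical spreads the effective weights reduce to the nominal ones.
same_spread = effective_weights({name: [2.5, 3.0, 3.5] for name in NOMINAL})
```

In the invented example the Impact GPAs are spread much more widely than the Outputs GPAs, so Impact ends up with an effective weight well above its nominal 20%.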

It is immediately clear from this table that the effective weights for Outputs are always less than their nominal weights. For two units of assessment (UoA 23 – Sociology and UoA 9 – Physics) Impact has a higher effective weight than Outputs. The average effective weights are 47%, 29% and 24%, in contrast to the nominal weights of 65%, 20% and 15%. There is also a wide spread, of nearly a factor of two, in the effective weights for Impact, which range from 19.6% to 38.6%.
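These statements can be checked directly against the numbers in Table 1. The sketch below simply re-enters the table, keyed by UoA number; the variable names are mine.

```python
# Effective weights (%) from Table 1, keyed by UoA number:
# (Outputs, Impact, Environment).
TABLE1 = {
    9: (37.9, 38.6, 23.5), 23: (34.1, 38.6, 27.3), 10: (37.6, 37.5, 24.9),
    24: (40.2, 35.0, 24.8), 6: (42.0, 33.0, 25.0), 31: (43.3, 32.6, 24.0),
    16: (48.6, 31.1, 20.3), 22: (44.3, 31.1, 24.7), 27: (45.8, 30.5, 23.6),
    14: (49.0, 30.2, 20.8), 32: (47.2, 30.2, 22.7), 26: (50.2, 29.7, 20.1),
    36: (48.4, 29.3, 22.3), 15: (47.6, 29.1, 23.3), 25: (45.1, 29.0, 25.9),
    20: (49.3, 28.8, 21.9), 13: (45.9, 28.7, 25.4), 29: (42.9, 28.6, 28.5),
    30: (47.6, 28.5, 23.9), 1: (41.1, 28.3, 30.6), 28: (49.7, 28.0, 22.3),
    3: (46.7, 27.9, 25.4), 11: (51.5, 27.9, 20.6), 17: (46.9, 27.4, 25.8),
    18: (54.3, 27.3, 18.3), 21: (48.4, 27.0, 24.6), 34: (50.3, 26.9, 22.8),
    5: (50.7, 26.7, 22.6), 4: (51.1, 26.6, 22.3), 12: (45.9, 26.6, 27.5),
    33: (48.7, 26.6, 24.7), 7: (51.0, 25.6, 23.4), 19: (52.5, 24.4, 23.0),
    8: (45.8, 23.9, 30.3), 35: (56.4, 23.5, 20.1), 2: (56.9, 19.6, 23.4),
}

# UoAs where Impact carries more effective weight than Outputs.
impact_beats_outputs = sorted(u for u, (o, i, e) in TABLE1.items() if i > o)

# Largest effective weight for Outputs (nominal weight is 65%).
max_outputs = max(o for o, i, e in TABLE1.values())

# Range of the effective weight for Impact across UoAs.
impacts = [i for o, i, e in TABLE1.values()]
impact_ratio = max(impacts) / min(impacts)

# Column averages, rounded to one decimal place as in the table.
averages = [round(sum(col) / len(TABLE1), 1) for col in zip(*TABLE1.values())]

print("Impact > Outputs in UoAs:", impact_beats_outputs)
print("Largest Outputs weight:", max_outputs)
print(f"Impact range: {min(impacts)} to {max(impacts)} (ratio {impact_ratio:.2f})")
print("Averages (Outputs, Impact, Environment):", averages)
```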

So, the variations in scores in Impact (and Environment) are generally much larger than the variations in the scores in Outputs. Why should this be? There are probably a number of factors:

  • Although the same numerical scale is used in each of the sub-profiles, they are measuring different things (e.g. 4* means “Quality that is world-leading in terms of originality, significance and rigour” in the Outputs sub-profile, but means “Outstanding impacts in terms of their reach and significance” in the Impact sub-profile). So, a weighted Overall is somewhat meaningless.
  • There is less variety in the Outputs sub-profile because departments typically select the best research to be considered.
  • There is more variety in the Impact sub-profile because this is the first time this profile has been used and UoAs didn’t know how best to present or select their best Impact.
  • There is less variation in the Outputs sub-profile because it is based on many more measurements (4 outputs per faculty) than e.g. Impact (~1 impact case study for 10 faculty), so has a smaller “error”.
  • There is more intrinsic variety e.g. in the environment of departments than there is in the research outputs of individual researchers.

Before the guidelines for REF 2014 were finalised there was some discussion about whether the nominal weight for Impact should be 25% or 20% and there are indications that the weight will increase in the next exercise. This analysis shows that the effective weight is already well above 25% in most cases but with a wide variety of effective weights across different Units of Assessment. Given this variety, and the different definitions of the criteria used in each sub-profile, policy makers should consider carefully how the Overall profiles are constructed in future. They might want to combine standardized statistics rather than raw statistics.
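To make the suggestion about standardised statistics concrete, here is a minimal sketch of one way it could be done: convert each sub-profile GPA to a standardised variable across departments, then apply the nominal weights. The GPAs and helper names are invented for illustration (they are not REF data), and this is only one possible scheme, not a description of any official procedure.

```python
from statistics import mean, pstdev

# Nominal REF 2014 weights quoted in the text.
NOMINAL = {"outputs": 0.65, "impact": 0.20, "environment": 0.15}

def standardise(values):
    """Convert GPAs to standardised variables: (g - mean) / standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def overall(gpa, standardised):
    """Weighted Overall score per department, from raw or standardised GPAs."""
    cols = {k: standardise(v) if standardised else list(v) for k, v in gpa.items()}
    n = len(cols["outputs"])
    return [sum(NOMINAL[k] * cols[k][d] for k in NOMINAL) for d in range(n)]

# Invented GPAs for three hypothetical departments X, Y, Z (not REF data).
departments = ["X", "Y", "Z"]
gpa = {
    "outputs": [3.2, 3.0, 3.1],
    "impact": [2.0, 3.6, 3.0],
    "environment": [3.0, 3.2, 2.8],
}

def ranking(scores):
    """Department names ordered from highest to lowest score."""
    return [d for _, d in sorted(zip(scores, departments), reverse=True)]

raw_rank = ranking(overall(gpa, standardised=False))
std_rank = ranking(overall(gpa, standardised=True))
print("raw GPAs:         ", raw_rank)
print("standardised GPAs:", std_rank)
```

In this invented data the Impact GPAs are far more spread out than the Outputs GPAs, so the raw ranking follows Impact. After standardising, each sub-profile contributes in proportion to its nominal weight and the department with the best Outputs moves to the top.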

The most obvious conclusion is that care should be taken in interpreting the published Overall scores and rankings. The different sub-profiles do not have the influence you might expect from the nominal weights and, in particular, Impact has a higher impact.


[1] In statistics Δ is known as a standardized variable

[2] The constant of proportionality is the standard deviation, σ, of the Overall GPA. This depends both on the standard deviations of the sub-profiles and their correlations. However, the rank order in GPA is the same in either Δ or Δσ so we don’t need to look at the correlations to understand the weights in the ranking.
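
For readers who want the algebra behind notes [1] and [2] spelled out, the steps can be sketched as follows. The symbols ρ_kl (the correlation between sub-profiles k and l) and W_k (the normalised effective weight) are labels I introduce here for illustration; the index k runs over Outputs, Impact and Environment.

```latex
\begin{align}
g - \mu &= \sum_{k \in \{o,i,e\}} w_k \,(g_k - \mu_k)
         = \sum_{k} w_k \sigma_k \Delta_k,
  \qquad \mu = \sum_{k} w_k \mu_k, \\
\Delta &= \frac{g - \mu}{\sigma}
        = \frac{1}{\sigma} \sum_{k} w_k \sigma_k \Delta_k, \\
\sigma^2 &= \sum_{k} \sum_{l} w_k w_l \,\sigma_k \sigma_l \,\rho_{kl},
  \qquad \rho_{kk} = 1, \\
W_k &= \frac{w_k \sigma_k}{\sum_{l} w_l \sigma_l}.
\end{align}
```

Since σ is the same positive number for every department in a UoA, dividing by it does not change the rank order, which is why the correlations ρ_kl do not enter the effective weights W_k.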


Seb Oliver, University of Sussex, 20 January 2015

About sebboyd

By day...Professor Seb Oliver, Professor of Astrophysics and Director of Research and Knowledge Exchange for School of Mathematical and Physical Sciences. By night...Seb Boyd, father of 3, husband of one and dabbler in board-game design

5 Responses to The Impact of Impact

  1. Pingback: Revised REF rankings tables | Seb Boyd

  2. telescoper says:

    Reblogged this on In the Dark and commented:
    Interesting analysis of the 2014 REF results by my colleague Seb Oliver. Among other things, it shows that Physics was the subject in which “Impact had the greatest impact”..

  3. ian smail says:

    At Astroforum the HEFCE speaker said that the weights would be finessed in the calculation of the funding to ensure the relative contributions from outputs/impact/environment were as originally proposed.

    • sebboyd says:

      I wasn’t at Astroforum, but I’d heard that too (after I’d written this). Of course the funding is a different issue from the one I was looking at. The funding is determined by the absolute level of grades, in particular by the percentage of 4*, whereas the rankings are determined by the relative grades. I was only looking at the GPA rankings. As I said in my follow-up piece, it would be good if we could ignore rankings entirely. The problem is that these rankings feed into overall league tables that students use to decide which university to study at, and so can affect our teaching income, which is more significant than our research income. If we can persuade the Guardian, Times, etc., who produce the league tables for school students, to use a sensible weighting too, that would be fantastic!

  4. Pingback: What’s the difference between ‘game-playing’ and ‘strategizing’? More on the REF. | A HEAD OF DEPARTMENT’S BLOG
