Measuring What You Change: The Impact of Evaluation on Research Systems
1. RESEARCH EVALUATION: WHEN YOU MEASURE A SYSTEM, YOU CHANGE THE SYSTEM
Giorgio Sirilli
IRCrES-CNR; ROARS editorial board
2. ROARS
Start: 2011
Members of the Editorial board: 14
Collaborators: 250
Contacts: 10.6 million (November 2011 – May 2015)
Average daily contacts: 500 (November 2011); 8,000 (2014)
Articles published: 2,000
Comments by readers: 30,000
ROARS is ranked 8th among the top national cultural blogs
ROARS, a genuine expression of democracy and participation, has
become a very important player in the policy debate and in policy
making
3. Evaluation
Evaluation may be defined as an objective process aimed at the critical analysis of the relevance, efficiency, and effectiveness of policies, programmes, projects, institutions, groups and individual researchers in the pursuit of their stated objectives.
Evaluation consists of a set of coordinated activities of a comparative nature, based on formalised methods and techniques applied through codified procedures, aimed at formulating an assessment of intentional interventions with reference to their implementation and their effectiveness.
Internal/external evaluation
4. The first evaluation (Genesis)
In the beginning God created the heaven and the earth.
And God saw everything that He had made. “Behold”, God said, “it is very
good”. And the evening and morning were the sixth day.
And on the seventh day God rested from all His work. His Archangel came
then unto Him asking, “God, how do you know that what You have
created is ‘very good’? What are Your criteria? On what data do You
base Your judgement? Aren’t You a little close to the situation to make
a fair and unbiased evaluation?”
God thought about these questions all that day and His rest was greatly
disturbed.
On the eighth day, God said, “Lucifer, go to hell!”
(From Halcolm’s “The Real Story of Paradise Lost”)
5. A brief history of evaluation
Research Assessment Exercise (RAE)
Research Excellence Framework (REF) (impact)
“The REF will over time doubtless become more
sophisticated and burdensome. In short we are
creating a Frankenstein monster” (Ben Martin)
Italy, a latecomer
Evaluation in Italy: yes or no?
Yes, but … good evaluation
7. The value of science
William Gladstone, then British Chancellor of the Exchequer (minister of finance), asked Michael Faraday about the practical value of electricity.
Gladstone’s only comment was: “But, after all, what use is it?”
Faraday replied: “Why, sir, there is every probability that you will soon be able to tax it.”
8. The case of physicists
Bruno Maksimovič Pontekorvo
9. The case of physicists
“Physics is a single discipline, but unfortunately nowadays physicists belong to two different groups: the theoreticians and the experimentalists. If a theoretician does not possess extraordinary ability, his work does not make sense… As for experimentation, instead, even a person of average ability can do useful work.”
(Enrico Fermi, 1931)
(Original Italian: “La fisica è una sola ma disgraziatamente oggi i fisici sono divisi in due categorie: i teorici e gli sperimentatori. Se un teorico non possiede straordinarie capacità il suo lavoro non ha senso… Per quanto riguarda la sperimentazione invece anche una persona di medie capacità ha la possibilità di svolgere un lavoro utile.”)
10. The case of graphene
Graphene is an allotrope of carbon in the form of a two-dimensional, atomic-scale, hexagonal lattice.
Graphene has many extraordinary properties. It is about 100 times
stronger than steel by weight, conducts heat and electricity with great
efficiency and is nearly transparent.
It was first measurably produced and isolated in the lab in 2003.
Andre Geim and Konstantin Novoselov at the University of Manchester
won the Nobel Prize in Physics in 2010 "for groundbreaking experiments
regarding graphene."
The global market for graphene is reported to have reached $9 million by
2014 with most sales in the semiconductor, electronics, battery energy
and composites industries.
11. The case of graphene (continued)
The famous paper by Andre Geim and Konstantin Novoselov was published in 2004, and by 2007 it was indeed widely known and cited.
The point is whether a committee would have selected their project and awarded them an ERC Starting Grant in 2004.
Looking at their citation and publication records in 2004, it is very improbable that they would have been ranked among the top 10%.
16. The new catchwords
New public management
Value for money
Accountability
Relevance
Excellence
17. The neo-conservative wave in Italy
Letizia Moratti
Italian minister of education and research
“First show that you use public money efficiently and effectively, then we will loosen the purse strings.” It never happened!
18. A model of management borrowed from the firm, based on the principles of competitiveness and customer satisfaction (the market)
The catchwords:
competitiveness
excellence
meritocracy
The “evaluative state” as the “minimum state”, in which the government gives up its political responsibility, avoids the democratic debate in search of consensus, and relies on the “automatic pilot” of techno-administrative control.
Contro l’ideologia della valutazione. L’ANVUR e l’arte della rottamazione dell’università (Against the ideology of evaluation: ANVUR and the art of scrapping the university)
19. Contro l’ideologia della valutazione. L’ANVUR e l’arte della rottamazione dell’università (Against the ideology of evaluation: ANVUR and the art of scrapping the university)
“ANVUR is much more than an
administrative branch. It is the outcome
of a cultural and political project aimed
at reducing the range of alternatives
and hampering pluralism.”
Sergio Benedetto
20. Changes in university life
The university is now at the mercy of:
- increasing bibliometric measurement
- quality standards
- blind refereeing (the referee sees you, but you cannot see the referee)
- bibliometric medians
- journal classifications (A, B, C, …)
- opportunistic citing
- academic tourism
- administrative burden
- …….
21. The epistemic consequences of bibliometrics-based evaluation
Interviews with Italian researchers (aged 40-65)
Main results:
A drastic change in researchers’ attitudes due to the introduction of bibliometrics-based evaluation
Bibliometrics-based evaluation exerts an extremely strong normative function on scientific practices, which deeply impacts the epistemic status of the disciplines
(T. Castellani, E. Pontecorvo, A. Valente, “Epistemological consequences of bibliometrics: Insights from the scientific community”, Social Epistemology Review and Reply Collective, vol. 3, no. 11, 2014)
22. Results
1. The bibliometrics-based evaluation criteria changed the way scientists choose their research topics:
- choosing a fashionable theme
- placing the article in the tail of an important discovery (bandwagon effect)
- choosing short empirical papers
2. Haste: the pressure to publish quickly
3. Interdisciplinary topics are hindered; bibliometric evaluation systems encourage researchers not to change topic during their career
4. Repetition of experiments is discouraged; only new results are considered interesting
(T. Castellani, E. Pontecorvo, A. Valente, “Epistemological consequences of bibliometrics: Insights from the scientific community”, Social Epistemology Review and Reply Collective, vol. 3, no. 11, 2014)
24. Research evaluation
Indicators used
- bibliometrics
- R&D
- peer review
- students
- graduates
- patents
- spin-offs
- contracts and other funding
- other
26. Use of publications for decision making
The case of China (SCI)
The case of Russia
27. The h-index (Jorge Eduardo Hirsch)
In 2005, the physicist Jorge Hirsch suggested a new
index to measure the broad impact of an individual
scientist’s work, the h-index.
A scientist has index h if h of his or her Np papers have
at least h citations each and the other (Np − h) papers
have ≤ h citations each.
In plain terms, a researcher has an h-index of 20 if he
or she has published 20 articles receiving at least 20
citations each.
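A minimal sketch of this definition in Python (the citation counts below are illustrative, not real data):

def h_index(citations):
    # Largest h such that h papers have at least h citations each.
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # prints 4: four papers with at least 4 citations each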
28. The impact factor (Eugene Garfield)
The impact factor (IF) of an academic journal is a measure
reflecting the average number of citations to recent articles
published in that journal. It is frequently used as a proxy for
the relative importance of a journal within its field.
In any given year, the impact factor of a journal is the average
number of citations received per paper published in that
journal during the two preceding years.
For example, if a journal has an impact factor of 3 in 2008,
then its papers published in 2006 and 2007 received 3 citations
each on average in 2008.
("Citable items" for this calculation are usually articles,
reviews, proceedings, or notes; not editorials or letters to the
editor).
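A minimal sketch of this two-year calculation in Python (the counts are illustrative, not real journal data):

# Citations received in 2008 by items the journal published in 2006-2007.
citations_2008_to_2006_07_items = 600
# Number of citable items (articles, reviews, proceedings, notes) in 2006-2007.
citable_items_2006_07 = 200

impact_factor_2008 = citations_2008_to_2006_07_items / citable_items_2006_07
print(impact_factor_2008)  # prints 3.0, matching the example above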
29. Nobel laureates and bibliometrics (the Higgs boson, 2013)
Peter Ware Higgs
13 works, mostly in “minor” journals, h-index = 6
Francois Englert
89 works, both in prestigious and minor journals, h-index = 10
W. S. Boyle
h-index = 7
G. E. Smith
h-index = 5
C. K. Kao
h-index = 1
T. Maskawa
h-index = 1
Y. Nambu
h-index = 17
30. Science and ideology: the impact on citations
[Chart: number of citations per year (NRCITES) to MARX and LENIN, 1980-2006, with the fall of the Berlin Wall (Berlin, November 1989) marked on the time axis.]
31. San Francisco Declaration on Research Assessment
The Journal Impact Factor, as calculated by Thomson Reuters, was
originally created as a tool to help librarians identify journals to
purchase, not as a measure of the scientific quality of research in an
article.
With that in mind, it is critical to understand that the Journal Impact
Factor has a number of well-documented deficiencies as a tool for
research assessment. These limitations include:
A) citation distributions within journals are highly skewed;
B) the properties of the Journal Impact Factor are field-specific: it is a
composite of multiple, highly diverse article types, including primary
research papers and reviews;
C) Journal Impact Factors can be manipulated (or “gamed”) by editorial
policy; and
D) data used to calculate the Journal Impact Factors are neither
transparent nor openly available to the public.
32. San Francisco Declaration on Research Assessment
General Recommendation
Do not use journal-based metrics, such as Journal Impact Factors,
as a surrogate measure of the quality of individual research articles,
to assess an individual scientist’s contributions, or in hiring,
promotion, or funding decisions.
34. The Leiden Manifesto
Bibliometrics: The Leiden Manifesto for research metrics
“Data are increasingly used to govern science. Research
evaluations that were once bespoke and performed by peers are
now routine and reliant on metrics. The problem is that
evaluation is now led by the data rather than by judgement.
Metrics have proliferated: usually well intentioned, not always
well informed, often ill applied. We risk damaging the system
with the very tools designed to improve it, as evaluation is
increasingly implemented by organizations without knowledge
of, or advice on, good practice and interpretation.”
35. The Leiden Manifesto – Ten principles
1) Quantitative evaluation should support qualitative, expert
assessment.
2) Measure performance against the research missions of the
institution, group or researcher.
3) Protect excellence in locally relevant research.
4) Keep data collection and analytical processes open, transparent
and simple.
5) Allow those evaluated to verify data and analysis.
36. The Leiden Manifesto – Ten principles (continued)
6) Account for variation by field in publication and citation practices.
7) Base assessment of individual researchers on a qualitative
judgment of their portfolio.
8) Avoid misplaced concreteness and false precision.
9) Recognize the systemic effects of assessment and indicators.
10) Scrutinize indicators regularly and update them.
39. Ranking of universities
Four major sources of rankings:
ARWU (Shanghai Jiao Tong University)
QS World University Rankings
THE World University Rankings (Times Higher Education)
US News & World Report (Best Global Universities)
40. Criteria selected as the key pillars of what makes a world-class university:
•Research
•Teaching
•Employability
•Internationalisation
•Facilities
•Social Responsibility
•Innovation
•Arts & Culture
•Inclusiveness
•Specialist Criteria
(Source: TopUniversities – worldwide university rankings, guides & events)
41. Global rankings cover no more than 3-5% of the world’s universities
[Chart: number of universities by performance level – the Top 20, the Top 500, the next 500, and the other 16,500 universities.]
42. Ranking of universities: the case of Italy
ARWU (Shanghai):
Bologna 173, Milano 186, Padova 188, Pisa 190, Sapienza 191
QS World University Rankings:
Bologna 182, Sapienza 202, Politecnico di Milano 229
World University Ranking (SA):
Sapienza 95, Bologna 99, Pisa 184, Milano 193
US News & World Report:
Sapienza 139, Bologna 146, Padova 146, Milano 155
44. Rank-ism (De Nicolao)
The vice-rector of the University of Pavia declared: “There are various rankings in the world: in each of them the University of Pavia ranks in the first 1%.”
But it is not true. According to three agencies, Pavia holds the following positions:
371: QS World University Rankings
251-275: Times Higher Education
401-500: Shanghai Ranking (ARWU)
45. Evaluation is an expensive exercise
Rule of thumb: less than 1% of R&D budget devoted to
its evaluation
Evaluation of the Quality of Research (VQR)
300 million Euro (ROARS)
182 million Euro (Geuna)
Research Assessment Exercise (RAE)
540 million Euro
Research Excellence Framework (REF)
1 billion pounds (other estimates: 500 million)
46. Evaluation is an expensive exercise (continued)
National Scientific Habilitation: 126 million Euro
- Cost per application: 2,300 euro
- Cost per job assigned: 32,000 euro
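(Implied orders of magnitude, derived from the figures above: 126 million euro / 2,300 euro per application ≈ 55,000 applications evaluated; 126 million euro / 32,000 euro per job ≈ 3,900 positions assigned.)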
47. Cost of evaluation: the saturation effect
[Figure. Source: Geuna and Martin]
49. Evaluation of the Quality of Research by ANVUR
Researchers’ products to be evaluated
- journal articles
- books and book chapters
- patents
- designs, exhibitions, software, manufactured items,
prototypes, etc.
University teachers: 3 “products” over the period 2004-2010
Public Research Agencies researchers: 6 “products” over the period
2004-2010
Scores: from 1 (excellent) to -1 (missing)
50. Evaluation of the Quality of Research by ANVUR
Indicators linked to research (weights):
quality (0.5)
ability to attract resources (0.1)
mobility (0.1)
internationalisation (0.1)
high-level education (0.1)
own resources (0.05)
improvement (0.05)
51. Evaluation of the Quality of Research by ANVUR
Indicators of the “third mission” (weights):
fund raising (0.2)
patents (0.1)
spin-offs (0.1)
incubators (0.1)
consortia (0.1)
archaeological sites (0.1)
museums (0.1)
other activities (0.2)
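As an illustration only, the weights above can be read as coefficients of a weighted sum. A minimal Python sketch under that assumption (the dictionary layout and function names are hypothetical, not ANVUR’s actual procedure):

# Research-indicator weights from slide 50; the third-mission weights from
# slide 51 could be plugged in the same way.
RESEARCH_WEIGHTS = {
    "quality": 0.50,
    "attract_resources": 0.10,
    "mobility": 0.10,
    "internationalisation": 0.10,
    "high_level_education": 0.10,
    "own_resources": 0.05,
    "improvement": 0.05,
}

def composite_score(values, weights):
    # Weighted sum of normalised indicator values (each assumed in [0, 1]).
    return sum(weights[name] * values[name] for name in weights)

# Hypothetical institution: strong on quality, average on everything else.
values = {name: 0.5 for name in RESEARCH_WEIGHTS}
values["quality"] = 0.8
print(round(composite_score(values, RESEARCH_WEIGHTS), 3))  # prints 0.65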
52. Call for Papers for Philosophy and Technology’s special issue: Toward a Philosophy of Impact
There was a time when serendipity played a central role in knowledge
policy. Scientific advancement was viewed as essential for social
progress, but this was paired with the assumption that it was generally
impossible to steer research directly toward desired outcomes.
Attempts to guide the course of research or predict its societal impacts
were seen as impeding the advancement of science and thus of social
welfare.
Driven in part by budgetary constraints, and in part by ideology, the
age of serendipity is being eclipsed by the age of accountability.
Society increasingly requires academics to give an account of the value
of their research. The ‘audit culture’ now permeates the university
from STEM (science, technology, engineering, and math) through HASS
(humanities, arts, and social sciences). Academics are being asked to
consider not just how their work influences their disciplines, but also
other disciplines and society more generally.
53. A warning
“Science today is riven with perverse incentives:
Researchers judge one another not by the quality of their science —
who has time to read all that? — but by the pedigree of their journal
publications.
High-profile journals pursue flashy results, many of which won’t pan
out on further scrutiny.
Universities reward researchers on those publication records.
Financing agencies, reliant on peer review, direct their grant money
back toward those same winners.
Graduate students, dependent on their advisers and neglected by their
universities, receive minimal, ad hoc training on proper experimental
design, believing the system of rewards is how it always has been and
how it always will be.”
The Chronicle of Higher Education (March 16, 2015), “Amid a Sea of False Findings, the NIH Tries Reform”, by Paul Voosen
54. Lessons from Research Evaluation
Evaluation in Italy is here to stay
The system has been measured and has changed
Awareness of the limitations of metrics
The challenge: preventing evaluation from becoming a Frankenstein monster
Main problems:
League tables
Competition vs cooperation of scientists
Peer review vs bibliometrics
NSE (natural sciences and engineering) vs SSH (social sciences and humanities)
Opportunistic behaviour
The split of the academic community (the good and the bad guys)
The equilibrium amongst the teaching, research and third mission
Bureaucratisation
The use of evaluation for policy purposes