What is Analytics Labs?
We have worked with partners to create a business intelligence shared service for UK education
Runner up in the 2018 National Technology Awards
http://nationaltechnologyawards.co.uk/
• Unique CPD opportunity
• Teams of analysts from across UK HE
• Expertise in policy, data, and visualisation
• One day a week for 13 weeks
• Access to a range of data sources including HESA
• Aim is to produce proof-of-concept dashboards
• Remote working using agile project management methods
• Secure data processing environment
• 289 participants from 109 UK universities (including 11 APs)
What is Analytics Labs?
www.jisc.ac.uk/analytics-labs
Analytics Labs – Working in an Agile way – Activity, Team Roles and Approach
Makeup of an Analytics Labs team
• Product Owner: brings an understanding of the policy context and the needs of customers
• Data Analyst: expertise in data and analysis, especially from an HE perspective
• Scrum Master: keeps the project on track and removes impediments
• Data & Viz Support: specialist knowledge in tools such as Alteryx and Tableau
• Meta Product Owner: provides expertise and guidance in the specific theme
Lab – the environment and tools
Plus Prep, Power BI, PowerPoint, Word, Python, R, Pentaho, Firefox, Chrome, Sublime Text
Secure data processing environment with team and shared data prep areas
GDPR
User Stories, Sprint Goals, Data Sources
Backlog, In Progress, Blocked, Done
Lab environment and tools used
The Analytics Labs curriculum focuses on 2 of our 5 competencies:
• Participating in agile development
• Visualising data
• Transforming data
• Digital collaboration
• Understanding policy and the data landscape
Curriculum
Team: Conquest
Research Analytics
Theme: Evolve
Downstream effects of research funding
Theme: Reproducibility
Bias in experimental research
REPRODUCIBILITY
The reproducibility and transparency of published preclinical research involving animal models of human disease are alarmingly low
Aim
Understand where improvements might be made
Evaluate the current state of published preclinical research and explore what initiatives by the scientific community might have an impact on this
Focused on from the perspective of: journal, funder, institution
Provide a tool to allow service users to
1. Benchmark to other users
2. See where targets for improvement might be set
3. Track this progress
How do we measure reproducibility?
Threats to reproducibility are thought to include
• Lack of scientific rigour
• Low statistical power
• Questionable research integrity
Evaluated the reporting of 5 key quality measures
• Random allocation of animals to group
• Blinded assessment of outcome in these animals
• Performance of a sample size calculation
• Compliance with animal welfare regulations
• Potential conflicts of interest
Data Sources
• CORE: the world’s largest collection of open access full texts, containing aggregated content of all research disciplines
• Animal model studies: selected using machine learning, then text mining
• TOP Guidelines for journals: promote Open Research Culture, and alignment of scientific ideals with practices
• UKRIO: supports researchers and organisations to further good practice and promote integrity and high ethical standards in research
• ARRIVE guidelines: intended to improve the reporting of research using animals
• Concordat on Openness on Animal Research in the UK: encourages organisations to be clear about their use of animals in research and enhance their communications
• Journal impact factor: proxy for relative importance of journal within its field
• TRAC: provides the ‘full economic cost’ of activities including how much institutions spend on research
Data Source
More information on the machine learning carried out by the Edinburgh team:
The algorithm was developed by James Thomas at UCL. It starts with a dataset of studies, a subset of which is classified manually. That subset is fed to the machine as a training set so that it can “learn” by identifying patterns between the data and the manual decisions (i.e. whether a study should be included because it reports on an animal model of human disease, or excluded from the dataset because it reports nothing of relevance).
The more studies are classified manually and fed into the algorithm, the better the machine can detect patterns, and the closer its performance should come to that of a human screener.
The method is not 100% accurate, as the data are noisy, but it is a very good tool when there are thousands of papers to screen. Screening that volume manually can take months, or in some cases years (made worse by the fact that the gold standard is for two independent people to screen and then a third to resolve disagreements). The method therefore saves time and is well suited to situations where resources are limited.
More technical information about the algorithm: it uses a bag-of-words model to turn each text into features and a support vector machine trained with stochastic gradient descent to classify the texts, so that publications on animal studies can be picked out. More on this: https://www.biorxiv.org/content/10.1101/280131v1
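As a rough illustration of the approach described above (not the code used by the Edinburgh team or by UCL), the sketch below builds bag-of-words count vectors and trains a linear support vector machine by stochastic gradient descent on the hinge loss, using only the Python standard library. The toy texts, labels and hyperparameters are invented for illustration.

```python
import random
import re
from collections import Counter


def bag_of_words(text):
    """Map a text to token counts; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def score_of(weights, bias, features):
    """Linear decision value for one bag-of-words vector."""
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in features.items())


def train_linear_svm(docs, labels, epochs=100, lr=0.05, reg=0.01, seed=0):
    """Fit a linear SVM by SGD on the L2-regularised hinge loss.

    Labels must be +1 (include) or -1 (exclude).
    """
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    examples = [(bag_of_words(d), y) for d, y in zip(docs, labels)]
    for _ in range(epochs):
        rng.shuffle(examples)
        for features, y in examples:
            margin = y * score_of(weights, bias, features)
            # Regularisation step: shrink every weight slightly.
            for tok in weights:
                weights[tok] *= 1.0 - lr * reg
            # Hinge-loss step: only examples inside the margin push the weights.
            if margin < 1.0:
                for tok, n in features.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * y * n
                bias += lr * y
    return weights, bias


def predict(weights, bias, text):
    """Return +1 (include) or -1 (exclude) for a new text."""
    return 1 if score_of(weights, bias, bag_of_words(text)) >= 0 else -1


# Hypothetical, manually labelled training subset.
DOCS = [
    "mice were treated in a mouse model of stroke",
    "rats received the drug in a rat model of epilepsy",
    "transgenic mice model of alzheimer disease",
    "survey of student satisfaction in universities",
    "a new algorithm for sorting large arrays",
    "economic analysis of housing markets",
]
LABELS = [1, 1, 1, -1, -1, -1]

weights, bias = train_linear_svm(DOCS, LABELS)
```

Calling `predict(weights, bias, some_text)` then screens an unseen record. With a real corpus the training subset would be far larger, and performance would be checked on records held back from training.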
Performance of the machine learning algorithm for selecting our animal studies: sensitivity 95.5%, specificity 83.5% and accuracy 84.7%
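To make the three figures concrete, the sketch below computes them from a confusion matrix. The counts are hypothetical, chosen only so that they reproduce the reported percentages; the source does not give the underlying numbers. Assuming all three figures were measured on the same validation set, accuracy is a prevalence-weighted average of sensitivity and specificity, so the share of true animal studies in that set can be back-calculated.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true animal studies that were kept
    specificity = tn / (tn + fp)   # share of irrelevant records that were excluded
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def implied_prevalence(sensitivity, specificity, accuracy):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p."""
    return (accuracy - specificity) / (sensitivity - specificity)


# Hypothetical counts (2,000 screened records) consistent with the reported figures.
sens, spec, acc = screening_metrics(tp=191, fp=297, tn=1503, fn=9)

# Under the same-validation-set assumption, roughly 10% of records were animal studies.
prevalence = implied_prevalence(0.955, 0.835, 0.847)
```

A high sensitivity with a lower specificity is a common trade-off in screening: few relevant studies are lost, at the cost of some irrelevant records passing through to later steps.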
More information on the text mining used by the Edinburgh-based team:
“Text mining is a method used to explore and analyse large amounts of unstructured text to identify concepts, patterns, keywords and phrases in the data. The team used regular expressions, which are essentially strings of rules that tell the computer which combinations of words to look for when searching a piece of text.
It’s fairly simple in the sense that I tell the computer to find the expression ‘animals randomly allocated to group’ and, if it does, to class the publication as having reported random allocation of animals to group. It’s slightly more sophisticated in the sense that when this statement is preceded by ‘not’, for example, the computer should not class it as a match.
Unfortunately these are still a work in progress and, like the machine learning, are not 100% accurate. But reading these publications manually and classing them like this is an incredibly time-consuming process, so automating it can be very useful, and the fact that it’s not 100% accurate doesn’t affect the overall conclusions that much. In fact, we have found that in some cases the computer identifies publications that should be classed as TRUE where the human has wrongly classed them as FALSE, so there is error in both directions.”
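The idea in the quotation can be sketched with Python's `re` module. The pattern below is an illustrative assumption, not the team's actual expression: it looks for a statement of random allocation and captures an optional preceding "not" so that negated statements are not counted as a match.

```python
import re

# Illustrative pattern (an assumption): species noun, optional "were",
# optional captured "not", then the random-allocation phrase.
RANDOMISATION = re.compile(
    r"\b(?:animals|mice|rats)\s+(?:were\s+)?(not\s+)?"
    r"randomly\s+(?:allocated|assigned)\s+to\s+(?:\w+\s+)?groups?\b",
    re.IGNORECASE,
)


def reports_randomisation(text):
    """True if the text states random allocation without a preceding 'not'."""
    for match in RANDOMISATION.finditer(text):
        if match.group(1) is None:  # no negation captured
            return True
    return False
```

For example, `reports_randomisation("Animals were randomly allocated to group.")` is classed as a match, while the same sentence with "were not randomly allocated" is not. Real reporting language varies far more than this, which is one reason such rules remain a work in progress.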
Tableau visualisations and potential dashboards
Note: findings are for illustrative purposes only, because the research areas explored were small example prototypes
Journal
Anonymised

Reproducibility Analytics Lab


Editor's Notes

  1. What is Analytics Labs?
  2. Who has Jisc worked with?
  3. What is Labs and what are our aims?
  4. In a nutshell, Analytics Labs provides a CPD opportunity. Teams work on commonly felt problem spaces, explore the wider national data landscape, acquire HESA and non-HESA data, and cleanse, link and transform it to create new proof-of-concept dashboards.
  5. How do we run these analytics labs?
  6. Makeup of a team
  7. Secure processing environment
  8. Secure data processing environment. Within the secure environment we have a number of cutting-edge data manipulation tools for team members to use. These change over time, but as of May 2019 they included Tableau Desktop and Server, Excel, Alteryx, Pentaho, R, Microsoft Power BI and several others.
  9. Curriculum. The Analytics Labs curriculum was developed in response to participant feedback. It is designed to help you get up to speed with Alteryx and Tableau quickly by signposting some of the many resources available online.
  10. Feb to May 2019: Research analytics lab. Two key themes: (1) downstream effects of research funding; (2) reproducibility.
  11. Team
  12. In recent years it has become increasingly clear that the reproducibility and transparency of published preclinical research involving animal models of human disease are alarmingly low. It has been shown time and time again, especially in recent years, that research has major reproducibility issues. We focused on animal models of human disease because research shows that reproducibility in this domain is especially low. In terms of transparency, research (i.e. systematic reviews) has shown that researchers don’t report enough detail about their experiments for other researchers to reproduce them. This contributes to what the media now recognise as the “reproducibility crisis”, which plays a role in the translational failure between animal models of human disease and the clinic: drugs tested in animals often don’t work when they are taken forward to human studies, which arguably wastes money and resources.
  13. The team’s aim was to understand where improvements might be made. To do this, we can evaluate the current state of published preclinical research and explore which initiatives by the scientific community might have an impact on it at the level of institution, funder and journal. The team wanted to design a tool that these service users could use to benchmark themselves against their competitors, see where they might set targets for improvement and, ultimately, track their progress in relation to changes in practice over time.
  14. Threats to reproducibility are thought to include a lack of scientific rigour, low statistical power and questionable research integrity, among other things. We can attempt to measure these threats by looking at published research papers themselves and assessing the reporting of concepts intended to reduce these threats in the design, performance or reporting of research studies. The 5 measures we focused on were:
      • Random allocation of animals to group: whether researchers describe treatment as randomly allocated to animal subjects, so that each subject has an equal likelihood of receiving the intervention. Without it, selection bias is introduced into the experiment.
      • Blinded assessment of outcome: whether the investigator performing the experiment, collecting data and/or assessing the outcome was unaware of which subjects received the treatment and which did not. Without it, detection bias is introduced into the experiment.
      • Sample size calculation: how the total number of animals used in the study was determined, so that we can check that the study was adequately powered to detect a true biological effect.
      • Compliance with animal welfare regulations: whether the research investigators complied with animal welfare regulations.
      • Reporting of any conflicts of interest: whether the investigator(s) disclosed any conflict of interest, for example a financial one.
      A useful paper for more explanation of these measures and why they are important: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3764080/
  15. For our project we started with the CORE dataset as our main data source. CORE is a Jisc library described as the world’s largest collection of open access full texts, containing aggregated content of published research articles from a wide range of disciplines.
      1. Using the CORE API we extracted 3 years of data, covering publications from 2016, 2017 and 2018.
      2. We filled in missing fields where possible by linking articles to CrossRef. We then used machine learning to narrow the studies down to animal model studies specifically. This provided the backbone of our dataset, which we supplemented with various other pieces of information.
      3. As we were interested in reproducibility, we brought in information about journals, institutions and funders on a number of initiatives that promote an open and transparent research culture and encourage robust performance and reporting of science involving animal models of human disease. These included:
         • TOP Guidelines for journals, which promote Open Research Culture
         • UKRIO, which supports high ethical standards in research
         • ARRIVE guidelines, which encourage improved reporting of animal research
         • Concordat, which encourages good communication about research
      Alongside these we used journal impact factor, a proxy measure of how prestigious a journal is in its field, and the TRAC groups for institutions, an indicator of how much institutions spend on research out of their total funding. We also wanted to look at things such as the training offered by institutions, but these data were difficult to find and therefore fell outside the scope of this project.
  16. Data loss
  17. Team Tableau Outputs
  18. Overview. 5 key performance factors: sample size, blinding, compliance with regulations, conflict of interest, randomness. Modifiers: TRAC group, policies and endorsements.
  19. Provider comparison tool (benchmarking)
  20. Which policies and endorsements are associated with improvements in the research we fund?
  21. Overall impact factor (all measures combined into a single measure), by journal
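
Note 15 describes filling in missing fields by linking articles to CrossRef. The sketch below shows one way such a linking step might look, assuming records are matched on DOI. It works on in-memory dictionaries and does not call the CORE or CrossRef APIs; the field names, DOIs and records are hypothetical.

```python
def fill_missing_fields(articles, reference_by_doi, fields):
    """Return copies of the articles with empty fields filled from a reference lookup.

    Matching is on lower-cased DOI (an assumption); values already present are kept.
    """
    completed = []
    for article in articles:
        record = dict(article)  # copy, so the input list is left untouched
        doi = (record.get("doi") or "").lower()
        reference = reference_by_doi.get(doi, {})
        for field in fields:
            if record.get(field) is None and reference.get(field) is not None:
                record[field] = reference[field]
        completed.append(record)
    return completed


# Hypothetical extracted records with gaps.
articles = [
    {"doi": "10.1234/EXAMPLE.1", "title": "Study A", "journal": None, "year": 2017},
    {"doi": "10.1234/example.2", "title": "Study B", "journal": "Journal Y", "year": None},
    {"doi": None, "title": "Study C", "journal": None, "year": 2018},
]

# Hypothetical reference metadata keyed by lower-cased DOI.
reference_by_doi = {
    "10.1234/example.1": {"journal": "Journal X", "year": 2017},
    "10.1234/example.2": {"journal": "Journal Z", "year": 2016},
}

completed = fill_missing_fields(articles, reference_by_doi, ["journal", "year"])
```

Records without a DOI, or without a match in the reference data, keep their gaps, which is consistent with fields being filled only "where possible".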