A Comparison of Statistical Significance Tests for Information Retrieval Evaluation, CIKM '07, November 2007
Summary
Motivation
Significance Testing
General Approach
Randomization test p-value = 0.0138
Wilcoxon Test p-value = 0.0560
Sign Test p-value = 0.3222 p-value = 0.3604
Bootstrap Test p-value = 0.0107
Student’s Paired t-test p-value = 0.0153
Results
Discussion
Conclusion


Editor's Notes

  1. Problems with using MAP: there is noise in the evaluation of retrieval systems. Some topics are easier than others. The people hired to create the relevance judgments for the topics are only human, so they make mistakes. Finally, the choice of document collection also influences the evaluation result, for obvious reasons. Why significance tests?
  2. General Approach: Two actual runs from TREC 3, 5-8 were used. The MAP of each run is as shown in the Excel table (accounting for every topic). Five significance tests were used to measure whether the difference in MAP between System A and System B was statistically significant, that is, whether System A is in fact better than System B. For every significance test the p-value was calculated according to the test statistic. That value is then compared with the significance level, which states the maximum value that a p-value can have for the null hypothesis to be rejected. Finally, the null hypothesis is accepted or rejected. Significance Testing: 1. A test statistic or criterion by which to judge the two systems. IR researchers commonly use the difference in mean average precision (MAP) or the difference in the mean of another IR metric. 2. A distribution of the test statistic given a null hypothesis. A typical null hypothesis is that there is no difference between our two systems. 3. A significance level (p-value) that is computed by taking the value of the test statistic for our experimental systems and determining how likely a value that large or larger could have occurred under the null hypothesis.
  3. Null hypothesis = System A and System B have the same distribution. Test statistic = difference in Mean Average Precision (MAP). P-value = (number of times MAP(A) - MAP(B) <= -0.052 + number of times MAP(A) - MAP(B) >= 0.052) / total number of permutations (100,000). Characteristics: distribution-free and does not assume random sampling.
  4. It can be used as an alternative to the paired Student's t-test when the population cannot be assumed to be normally distributed. When N (the number of samples) is bigger than 25, however, the distribution of the Wilcoxon test statistic approximates a normal distribution. Null hypothesis = System A and System B have the same distribution. Test statistic = the sum of the ranks. p-value = the minimum value of the test statistic.
  5. Null hypothesis = System A and System B have the same distribution. Test statistic = the number of pairs for which System A is better than System B. p-value = the number of pairs in which System A is better than System B, divided by the total number of pairs in the permutation. Tied cases = when there is a tie, that is, when System A got the same score as System B, and given that numerical precision can vary from computer to computer, a "minimum difference" measure can be defined by which ties can be broken. IMPORTANT = the p-value decreases substantially (0.0987) when we increase the "minimum difference", because that means the tied cases turn into successes for System A.
  6. Null hypothesis = the scores of System A and System B are random samples from the same distribution (different from the randomization test, Wilcoxon test and sign test). Test statistic = difference in Mean Average Precision (MAP). P-value = fraction of samples in the shifted distribution that have an absolute value as large as or larger than our experiment's difference. Sampling with replacement: sampling schemes may be without replacement ('WOR', no element can be selected more than once in the same sample) or with replacement ('WR', an element may appear multiple times in the one sample). Characteristics: distribution-free and assumes random sampling.
  7. Null hypothesis = System A and System B are random samples from the normal distribution. Test statistic = difference in Mean Average Precision (MAP). P-value = fraction of samples in the shifted distribution that have an absolute value as large as or larger than our experiment's difference. Characteristics: normal distribution and assumes random sampling. IMPORTANT: it only works with populations that follow a normal distribution, so it may not be suitable for every null hypothesis. Example??
  8. In this section we report the amount of agreement among the p-values produced by the various significance tests. Table 1 shows the RMSE of each of the tests on a subset of the TREC run pairs. We formed this subset by removing all pairs for which all tests agreed on the p-value. If the tests agree with each other, there is little practical difference among them. The randomization test, bootstrap test and t-test largely agree with each other. The RMSE between these three tests is approximately 0.01, which is an error of 20% for a p-value of 0.05. The Wilcoxon and sign tests do not agree with any of the other tests. Compared to the randomization test, and thus to the t-test and bootstrap, the Wilcoxon and sign tests will result in both failure to detect significance and false detection of significance. Root Mean Square Error (RMSE) of an estimator is one of many ways to quantify the difference between an estimator and the true value of the quantity being estimated.
  9. Wilcoxon and sign tests: these were appropriate before affordable computation existed, but are inappropriate today. Random sampling versus non-random sampling: an IR researcher may argue that the assumption of random samples from a population is required to draw an inference from the experiment to the larger world. This cannot be the case. IR researchers have long understood that inferences from their experiments must be carefully drawn given the construction of the test setup. Using significance tests based on the assumption of random sampling is not warranted for most IR research.
  10. A researcher using the Wilcoxon test or the sign test is likely to spend a lot longer searching for methods that improve retrieval performance than a researcher using the randomization, bootstrap or t-test.
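
Note 3 describes the randomization test only in words, so a small sketch may help. The Python below is an illustration under stated assumptions, not the code behind the slides: the function name `randomization_test`, its parameters, and the default of 10,000 random permutations are all hypothetical choices. It treats the per-topic scores of the two systems as paired and, for each permutation, swaps the system labels on a random subset of topics, which is the same as flipping the sign of that topic's score difference.

```python
import random


def randomization_test(scores_a, scores_b, num_permutations=10_000, seed=0):
    """Two-sided paired randomization test on the difference in mean score.

    scores_a and scores_b hold one score per topic (for example average
    precision), in the same topic order for both systems.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need one score per topic for both systems")

    rng = random.Random(seed)
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / n)  # |MAP(A) - MAP(B)| actually measured

    extreme = 0
    for _ in range(num_permutations):
        # Swapping the labels A and B on a topic flips the sign of its difference.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= observed:
            extreme += 1

    # Fraction of permutations at least as extreme as the observed difference.
    return extreme / num_permutations
```

Sampling permutations at random, rather than enumerating all of them, is an assumption of this sketch; the note only states that 100,000 permutations were counted.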
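
Note 6 talks about sampling with replacement and a "shifted distribution", which is easier to follow in code. The following is a minimal sketch, assuming the shift is done by centring the bootstrap distribution of mean differences on zero; the name `bootstrap_shift_test` and the default of 10,000 bootstrap samples are hypothetical and not taken from the slides.

```python
import random


def bootstrap_shift_test(scores_a, scores_b, num_samples=10_000, seed=0):
    """Two-sided paired bootstrap test on the difference in mean score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need one score per topic for both systems")

    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    boot_means = []
    for _ in range(num_samples):
        # Sampling with replacement: a topic may appear several times.
        sample = rng.choices(diffs, k=n)
        boot_means.append(sum(sample) / n)

    # Shift the bootstrap distribution so that its mean is zero, which is
    # the value the mean difference takes under the null hypothesis.
    centre = sum(boot_means) / num_samples
    extreme = sum(1 for m in boot_means if abs(m - centre) >= observed)
    return extreme / num_samples
```

Unlike the randomization sketch, this one resamples whole topics, which is where the random-sampling assumption mentioned in note 6 enters.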
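
Note 8 compares tests by the RMSE between their p-values. As a rough illustration, the helper below computes that quantity for two lists of p-values obtained on the same run pairs. The function name `rmse` and the example values in the usage lines are made up for this sketch and are not results from the slides.

```python
import math


def rmse(p_values, reference_p_values):
    """Root mean square error between two tests' p-values on the same run pairs."""
    if len(p_values) != len(reference_p_values) or not p_values:
        raise ValueError("need the same non-zero number of p-values from both tests")
    squared = [(p - q) ** 2 for p, q in zip(p_values, reference_p_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical p-values: one test is off by 0.01 on every pair.
example_error = rmse([0.06, 0.04], [0.05, 0.05])
# Relative to a p-value of 0.05, an RMSE of 0.01 is 0.01 / 0.05 = 20%.
relative_error = example_error / 0.05
```

This is the arithmetic behind the statement in note 8 that an RMSE of about 0.01 corresponds to an error of 20% at a p-value of 0.05.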