SlideShare una empresa de Scribd logo
1 de 34
Instituto Superior Técnico
Universidade Técnica de Lisboa




Learning To Rank
Academic Experts
        Catarina Moreira
Outline
✓ Introduction
✓ State of the Art Problems
✓ Features to Estimate Expertise
✓ Datasets
✓ Approaches and Results
 ✓ Rank Aggregation Framework
 ✓ Learning to Rank Framework
✓ Conclusions and Future Work
                                   2/34
Expert Finding
                Information
                  Retrieval




Gerard Salton                    Ricardo Baeza-Yates




                   Bruce Croft
                                                3/34
State of the Art Problems

Usage of Generative Probabilistic Models



Heuristics are too simple and do not reflect expertise

Heuristics only based on the documents’ textual contents


                                                        4/34
Contributions

1. Different Sets of Features to Estimate Expertise

2. Rank Aggregation Framework for Expert Finding

3. Learning to Rank (L2R) Framework for Expert Finding



                                                      5/34
Outline
✓ Introduction
✓ State of the Art Problems
✓ Features to Estimate Expertise
✓ Datasets
✓ Approaches and Results
 ✓ Rank Aggregation Framework
 ✓ Learning to Rank Framework
✓ Conclusions and Future Work
                                   6/34
Features: Hypothesis

Multiple estimators of expertise, based on
different sources of evidence, will enable the
construction of more accurate and reliable
ranking models!




                                                 7/34
Textual Similarity
Term Frequency



Inverse Document Frequency   TF.IDF



BM25


                                      8/34
Profile Information

✓ Number of Publications with(out) query topics

✓ Number of Journals with(out) query topics

✓ Years Between Publications with(out) query topics

✓ Average Number of Publications per year



                                                      9/34
Graphs

✓ Total/Max/Avg citations of the authors’ papers

✓ Total Number of Unique Collaborators

✓ Publications’ PageRank

✓ Academic Indexes


                                                   10/34
Hirsch Index


   Hirsch	
  Index




                     11/34
Other Indexes
a-Index



Contemporary h-Index (extension of h Index)



Trend h-Index (extension of h Index)

                                              12/34
Datasets
DBLP - Computer Science Dataset
- Covers journal and conference publications
- Contains abstracts and citation links
- All this information was processed and stored in a database




                                                                13/34
Datasets
Arnetminer - Validation
- Contains experts for 13 query topics
- Experts collected from important Program Committees
 related to the query topics




                                                        14/34
Outline
✓ Introduction
✓ State of the Art Problems
✓ Features to Estimate Expertise
✓ DataSets
✓ Approaches and Results
 ✓ Rank Aggregation Framework
 ✓ Learning to Rank Framework
✓ Conclusions and Future Work
                                   15/34
Question


How can we combine these
        features?


                           16/34
Answer
Traditional IR techniques use
frameworks inspired in
traditional search engines to
combine different sources of
evidence!
                            17/34
Rank Aggregation Framework
     for Expert Finding




                             18/34
Data Fusion Algorithms
✓ Positional

  ✓ Based on the position that a candidate occupies in a ranked list
  ✓ Algorithms: Borda Fuse and Reciprocal Rank Fuse
✓ Score Aggregation

  ✓ Based on the score that a candidate achieved in a ranked list
  ✓ Algorithms: CombSUM, CombMNZ and CombANZ
✓ Majoritarian

  ✓ Based on pairwise comparisons between candidates
  ✓ Algorithms: Condorcet Fusion

                                                                       19/34
Results Rank
       Aggregation (MAP)*
  CombMNZ                                  48,43% [-10,25%]
 Cond. Fusion                             43,82%
  CombSUM                               41,34% [+6,00%]
   Borda Fuse                          39,99% [+9,58%]
Rec. Rank Fuse                        39,99% [+9,58%]
   CombANZ*                          35,61% [+23,06%]

  *Sig. Tests of 0.95 conf.   *Mean Average Precision   20/34
Impact of the Features with
   Condorcet Fusion(MAP)*
                Graph                       43,86% [- 0,09%]
Text + Profile + Graph                        43,82%
      Profile + Graph                       41,65% [+4,95%]
       Text + Graph*                      39,08% [+10,82%]
              Profile*                    36,87% [+15,86%]
       Text + Profile*                  32,67% [+25,45%]
                 Text*                29,75% [+32,11%]
  *Sig. Tests of 0.95 conf.   *Mean Average Precision    21/34
Outline
✓ Introduction
✓ State of the Art Problems
✓ Features to Estimate Expertise
✓ Datasets
✓ Approaches and Results
 ✓ Rank Aggregation Framework
 ✓ Learning to Rank Framework
✓ Conclusions and Future Work
                                   22/34
Question


  How can we combine these
features in an optimal way?


                          23/34
Answer
   IR literature focuses on
Machine learning techniques,
They enable the combination
 of multiple estimators in an
          optimal way!
                                24/34
The L2R Framework
 For Expert Finding




         25           25/34
L2R Algorithms
✓ Pointwise
 ✓ Input: single candidate
 ✓ Goal: use scoring functions to predict relevance
 ✓ Algorithms: Additive Groves

✓ Pairwise
 ✓ Input: pair of candidates
 ✓ Goal: loss function to minimize number of misclassified candidate pairs
 ✓ Algorithms: RankBoost, SVMrank and RankNet

✓ Listwise
 ✓ Input: list of candidates
 ✓ Goal: loss function which directly optimizes an IR metric
 ✓ Algorithm: SVMmap, Coordinate Ascent and AdaRank
                                                                            26/34
Results Learning to
        Rank (MAP)*
Additive Groves                           89,40%
      SVMmap                             87,02% [+2,66%]
      SVMrank                            83,11[+7,04%]
    RankBoost*                         78,40 [+12,30%]
 Coord. Ascent*                        75,77 [+15,25%]
      RankNet*                      65,30% [+26,96%]
      AdaRank*                      64,78% [+27,54%]
*Sig. Tests of 0.95 conf.   *Mean Average Precision    27/34
Impact of the Features with
      Additive Groves(Map)*
Text + Profile + Graph                          89,40%
        Text + Graph*                       88,25% [+1,29%]
                Profile                    87,28% [+2,37%]
        Text + Profile*                    87,14% [+2,53%]
                 Text                   86,60% [+3,13%]
               Graph                  85,26% [+4,63%]
      Profile + Graph*         82,37% [+7,86%]

  *Sig. Tests of 0.95 conf.   *Mean Average Precision     28/34
Comparison with State
      of the Art (MAP)*
    Balog’s Model 2                 39,15% [+56,21%]

 Deng’s AuthorRank                     49,06% [+45,12%]

     Yang’s SVMrank                        63,56% [+28,90%]

Moreira’s Add. Groves                             89,40%
                        *Mean Average Precision           29/34
Prototype
Outline
✓ Introduction
✓ State of the Art Problems
✓ Features to Estimate Expertise
✓ Datasets
✓ Approaches and Results
 ✓ Rank Aggregation Framework
 ✓ Learning to Rank Framework
✓ Conclusions and Future Work
                                   31/34
Conclusions
✓ Effectiveness of the Learning to Rank Framework

  ✓ Best algorithms: Additive Groves, SVMmap and SVMrank

✓ Effectiveness of the Rank Aggregation Approach

  ✓ Best algorithms: CombMNZ and Condorcet Fusion

✓ Effectiveness of the Proposed Features

  ✓ Set of full features are the best


                                                           32/34
Future Work
✓ Feature Selection Techniques (ex: PCA)

✓ Expert Finding in an organizational environment (TREC dataset)

✓ Tasks beyond expert finding

  ✓ Natural Language Processing

  ✓ Geographic Information Retrieval




                                                              33/34
Publications
✓ C. Moreira, P. Calado and B. Martins, Learning to Rank for Expert
  Search in Digital Libraries of Academic Publications, In proceedings of the
  15th portuguese conference on Artificial Intelligence, 2011

✓ C. Moreira, B. Martins and P. Calado, Using Rank Aggregation for Expert
  Search in Academic Digital Libraries, In Simpósio de Informática,
  INFORUM, 2011

✓ C. Moreira, A. Mendes, L. Coheur and B. Martins, Towards the Rapid
  Development of a Natural Language Understanding Module, In proceedings
  of the 11th conference on intelligent virtual agents, 2011


                                                                          34/34

Más contenido relacionado

Similar a Slides

how to build a Length of Stay model for a ProofOfConcept project
how to build a Length of Stay model for a ProofOfConcept projecthow to build a Length of Stay model for a ProofOfConcept project
how to build a Length of Stay model for a ProofOfConcept projectZenodia Charpy
 
deep_Visualization in Data mining.ppt
deep_Visualization in Data mining.pptdeep_Visualization in Data mining.ppt
deep_Visualization in Data mining.pptPerumalPitchandi
 
Can ML help software developers? (TEQnation 2022)
Can ML help software developers? (TEQnation 2022)Can ML help software developers? (TEQnation 2022)
Can ML help software developers? (TEQnation 2022)Maurício Aniche
 
Performance analysis of machine learning approaches in software complexity pr...
Performance analysis of machine learning approaches in software complexity pr...Performance analysis of machine learning approaches in software complexity pr...
Performance analysis of machine learning approaches in software complexity pr...Sayed Mohsin Reza
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Algorithm Identification In Programming Assignments
Algorithm Identification In Programming AssignmentsAlgorithm Identification In Programming Assignments
Algorithm Identification In Programming AssignmentsKarin Faust
 
data structures and its importance
 data structures and its importance  data structures and its importance
data structures and its importance Anaya Zafar
 
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...Institute of Contemporary Sciences
 
Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?Michaela Greiler
 
Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Greg Makowski
 
Creating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyCreating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyRTTS
 
Multi variate presentation
Multi variate presentationMulti variate presentation
Multi variate presentationArun Kumar
 
(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collections(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collectionsBIOVIA
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code OptimizationIRJET Journal
 
Smart like a Fox: How clever students trick dumb programming assignment asses...
Smart like a Fox: How clever students trick dumb programming assignment asses...Smart like a Fox: How clever students trick dumb programming assignment asses...
Smart like a Fox: How clever students trick dumb programming assignment asses...Nane Kratzke
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
Goal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and RefinementGoal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and RefinementEmil Lupu
 
Ranking The Refactoring Techniques Based on The External Quality Attributes
Ranking The Refactoring Techniques Based on The External Quality AttributesRanking The Refactoring Techniques Based on The External Quality Attributes
Ranking The Refactoring Techniques Based on The External Quality AttributesIJRES Journal
 

Similar a Slides (20)

how to build a Length of Stay model for a ProofOfConcept project
how to build a Length of Stay model for a ProofOfConcept projecthow to build a Length of Stay model for a ProofOfConcept project
how to build a Length of Stay model for a ProofOfConcept project
 
deep_Visualization in Data mining.ppt
deep_Visualization in Data mining.pptdeep_Visualization in Data mining.ppt
deep_Visualization in Data mining.ppt
 
Can ML help software developers? (TEQnation 2022)
Can ML help software developers? (TEQnation 2022)Can ML help software developers? (TEQnation 2022)
Can ML help software developers? (TEQnation 2022)
 
Performance analysis of machine learning approaches in software complexity pr...
Performance analysis of machine learning approaches in software complexity pr...Performance analysis of machine learning approaches in software complexity pr...
Performance analysis of machine learning approaches in software complexity pr...
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Algorithm Identification In Programming Assignments
Algorithm Identification In Programming AssignmentsAlgorithm Identification In Programming Assignments
Algorithm Identification In Programming Assignments
 
data structures and its importance
 data structures and its importance  data structures and its importance
data structures and its importance
 
MDE in Practice
MDE in PracticeMDE in Practice
MDE in Practice
 
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
 
Can we induce change with what we measure?
Can we induce change with what we measure?Can we induce change with what we measure?
Can we induce change with what we measure?
 
Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Production model lifecycle management 2016 09
Production model lifecycle management 2016 09
 
Creating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyCreating a Data validation and Testing Strategy
Creating a Data validation and Testing Strategy
 
Multi variate presentation
Multi variate presentationMulti variate presentation
Multi variate presentation
 
(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collections(ATS6-PLAT03) What's behind Discngine collections
(ATS6-PLAT03) What's behind Discngine collections
 
IRJET- Machine Learning Techniques for Code Optimization
IRJET-  	  Machine Learning Techniques for Code OptimizationIRJET-  	  Machine Learning Techniques for Code Optimization
IRJET- Machine Learning Techniques for Code Optimization
 
Smart like a Fox: How clever students trick dumb programming assignment asses...
Smart like a Fox: How clever students trick dumb programming assignment asses...Smart like a Fox: How clever students trick dumb programming assignment asses...
Smart like a Fox: How clever students trick dumb programming assignment asses...
 
Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Goal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and RefinementGoal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
Goal Decomposition and Abductive Reasoning for Policy Analysis and Refinement
 
Ranking The Refactoring Techniques Based on The External Quality Attributes
Ranking The Refactoring Techniques Based on The External Quality AttributesRanking The Refactoring Techniques Based on The External Quality Attributes
Ranking The Refactoring Techniques Based on The External Quality Attributes
 
Final
FinalFinal
Final
 

Slides

  • 1. Instituto Superior Técnico Universidade Técnica de Lisboa Learning To Rank Academic Experts Catarina Moreira
  • 2. Outline ✓ Introduction ✓ State of the Art Problems ✓ Features to Estimate Expertise ✓ Datasets ✓ Approaches and Results ✓ Rank Aggregation Framework ✓ Learning to Rank Framework ✓ Conclusions and Future Work 2/34
  • 3. Expert Finding Information Retrieval Gerard Salton Ricardo Baeza-Yates Bruce Croft 3/34
  • 4. State of the Art Problems Usage of Generative Probabilistic Models Heuristics are too simple and do not reflect expertise Heuristics only based on the documents’ textual contents 4/34
  • 5. Contributions 1. Different Sets of Features to Estimate Expertise 2. Rank Aggregation Framework for Expert Finding 3. Learning to Rank (L2R) Framework for Expert Finding 5/34
  • 6. Outline ✓ Introduction ✓ State of the Art Problems ✓ Features to Estimate Expertise ✓ Datasets ✓ Approaches and Results ✓ Rank Aggregation Framework ✓ Learning to Rank Framework ✓ Conclusions and Future Work 6/34
  • 7. Features: Hypothesis Multiple estimators of expertise, based on different sources of evidence, will enable the construction of more accurate and reliable ranking models! 7/34
  • 8. Textual Similarity Term Frequency Inverse Document Frequency TF.IDF BM25 8/34
  • 9. Profile Information ✓ Number of Publications with(out) query topics ✓ Number of Journals with(out) query topics ✓ Years Between Publications with(out) query topics ✓ Average Number of Publications per year 9/34
  • 10. Graphs ✓ Total/Max/Avg citations of the authors’ papers ✓ Total Number of Unique Collaborators ✓ Publications’ PageRank ✓ Academic Indexes 10/34
  • 11. Hirsch Index Hirsch  Index 11/34
  • 12. Other Indexes a-Index Contemporary h-Index (extension of h Index) Trend h-Index (extension of h Index) 12/34
  • 13. Datasets DBLP - Computer Science Dataset - Covers journal and conference publications - Contains abstracts and citation links - All this information was processed and stored in a database 13/34
  • 14. Datasets Arnetminer - Validation - Contains experts for 13 query topics - Experts collected from important Program Committees related to the query topics 14/34
  • 15. Outline ✓ Introduction ✓ State of the Art Problems ✓ Features to Estimate Expertise ✓ DataSets ✓ Approaches and Results ✓ Rank Aggregation Framework ✓ Learning to Rank Framework ✓ Conclusions and Future Work 15/34
  • 16. Question How can we combine these features? 16/34
  • 17. Answer Traditional IR techniques use frameworks inspired in traditional search engines to combine different sources of evidence! 17/34
  • 18. Rank Aggregation Framework for Expert Finding 18/34
  • 19. Data Fusion Algorithms ✓ Positional ✓ Based on the position that a candidate occupies in a ranked list ✓ Algorithms: Borda Fuse and Reciprocal Rank Fuse ✓ Score Aggregation ✓ Based on the score that a candidate achieved in a ranked list ✓ Algorithms: CombSUM, CombMNZ and CombANZ ✓ Majoritarian ✓ Based on pairwise comparisons between candidates ✓ Algorithms: Condorcet Fusion 19/34
  • 20. Results Rank Aggregation (MAP)* CombMNZ 48,43% [-10,25%] Cond. Fusion 43,82% CombSUM 41,34% [+6,00%] Borda Fuse 39,99% [+9,58%] Rec. Rank Fuse 39,99% [+9,58%] CombANZ* 35,61% [+23,06%] *Sig. Tests of 0.95 conf. *Mean Average Precision 20/34
  • 21. Impact of the Features with Condorcet Fusion(MAP)* Graph 43,86% [- 0,09%] Text + Profile + Graph 43,82% Profile + Graph 41,65% [+4,95%] Text + Graph* 39,08% [+10,82%] Profile* 36,87% [+15,86%] Text + Profile* 32,67% [+25,45%] Text* 29,75% [+32,11%] *Sig. Tests of 0.95 conf. *Mean Average Precision 21/34
  • 22. Outline ✓ Introduction ✓ State of the Art Problems ✓ Features to Estimate Expertise ✓ Datasets ✓ Approaches and Results ✓ Rank Aggregation Framework ✓ Learning to Rank Framework ✓ Conclusions and Future Work 22/34
  • 23. Question How can we combine these features in an optimal way? 23/34
  • 24. Answer IR literature focuses on Machine learning techniques, They enable the combination of multiple estimators in an optimal way! 24/34
  • 25. The L2R Framework For Expert Finding 25 25/34
  • 26. L2R Algorithms ✓ Pointwise ✓ Input: single candidate ✓ Goal: use scoring functions to predict relevance ✓ Algorithms: Additive Groves ✓ Pairwise ✓ Input: pair of candidates ✓ Goal: loss function to minimize number of misclassified candidate pairs ✓ Algorithms: RankBoost, SVMrank and RankNet ✓ Listwise ✓ Input: list of candidates ✓ Goal: loss function which directly optimizes an IR metric ✓ Algorithm: SVMmap, Coordinate Ascent and AdaRank 26/34
  • 27. Results Learning to Rank (MAP)* Additive Groves 89,40% SVMmap 87,02% [+2,66%] SVMrank 83,11[+7,04%] RankBoost* 78,40 [+12,30%] Coord. Ascent* 75,77 [+15,25%] RankNet* 65,30% [+26,96%] AdaRank* 64,78% [+27,54%] *Sig. Tests of 0.95 conf. *Mean Average Precision 27/34
  • 28. Impact of the Features with Additive Groves(Map)* Text + Profile + Graph 89,40% Text + Graph* 88,25% [+1,29%] Profile 87,28% [+2,37%] Text + Profile* 87,14% [+2,53%] Text 86,60% [+3,13%] Graph 85,26% [+4,63%] Profile + Graph* 82,37% [+7,86%] *Sig. Tests of 0.95 conf. *Mean Average Precision 28/34
  • 29. Comparison with State of the Art (MAP)* Balog’s Model 2 39,15% [+56,21%] Deng’s AuthorRank 49,06% [+45,12%] Yang’s SVMrank 63,56% [+28,90%] Moreira’s Add. Groves 89,40% *Mean Average Precision 29/34
  • 31. Outline ✓ Introduction ✓ State of the Art Problems ✓ Features to Estimate Expertise ✓ Datasets ✓ Approaches and Results ✓ Rank Aggregation Framework ✓ Learning to Rank Framework ✓ Conclusions and Future Work 31/34
  • 32. Conclusions ✓ Effectiveness of the Learning to Rank Framework ✓ Best algorithms: Additive Groves, SVMmap and SVMrank ✓ Effectiveness of the Rank Aggregation Approach ✓ Best algorithms: CombMNZ and Condorcet Fusion ✓ Effectiveness of the Proposed Features ✓ Set of full features are the best 32/34
  • 33. Future Work ✓ Feature Selection Techniques (ex: PCA) ✓ Expert Finding in an organizational environment (TREC dataset) ✓ Tasks beyond expert finding ✓ Natural Language Processing ✓ Geographic Information Retrieval 33/34
  • 34. Publications ✓ C. Moreira, P. Calado and B. Martins, Learning to Rank for Expert Search in Digital Libraries of Academic Publications, In proceedings of the 15th portuguese conference on Artificial Intelligence, 2011 ✓ C. Moreira, B. Martins and P. Calado, Using Rank Aggregation for Expert Search in Academic Digital Libraries, In Simpósio de Informática, INFORUM, 2011 ✓ C. Moreira, A. Mendes, L. Coheur and B. Martins, Towards the Rapid Development of a Natural Language Understanding Module, In proceedings of the 11th conference on intelligent virtual agents, 2011 34/34