Usage-Based vs. Citation-Based
Recommenders in a Digital Library

                   André Vellino 	

          School of Information Studies 
               University of Ottawa
       blog: http://synthese.wordpress.com
                 twitter: @vellino	

          e-mail: avellino@uottawa.ca
Context
—  Canada Institute for Scientific and Technical Information
  (aka Canada’s National Science Library)
—  Has a full-text digital collection (Scientific, Technical,
  Medical) with text-mining rights for research purposes only
  —  Elsevier and Springer (mostly)
      —  ~8M articles
      —  ~2800 journals
      —  ~ 3TB
—  Plan: a Hybrid, Multi-Dimensional
  —  Usage-based (CF)
  —  Content-based (CBF)
  —  User-Context
Sparsity of Usage Data is a Problem in
Digital Libraries
[Figure: number of users vs. number of items]
Amazon: ~70 M users, ~93 M items
Digital libraries: ~70,000 users, ~7 M items
Data is Sparse Too

—  Sparseness of a dataset: $S = \frac{\text{number of edges in the user-item graph}}{\text{total number of possible edges}}$

—  Mendeley data           $S = 2.66 \times 10^{-5}$
—  Netflix                 $S = 1.18 \times 10^{-2}$

—  But also, Mendeley data isn’t “highly connected”
   —  83.6% of Mendeley articles were referenced by only 1 user
   —   6% of the articles were referenced by 3 or more users.
(2009)	
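To make the ratio above concrete, here is a minimal sketch in Python. The toy edge list and the helper names `sparseness` and `single_user_share` are assumptions for illustration only; they are not taken from the Mendeley or Netflix data sets.

```python
from collections import Counter

def sparseness(edges, n_users, n_items):
    """Fraction of all possible user-item edges that are actually present."""
    possible = n_users * n_items
    if possible == 0:
        raise ValueError("need at least one user and one item")
    return len(edges) / possible

def single_user_share(edges):
    """Share of referenced items that were referenced by exactly one user."""
    users_per_item = Counter(item for _, item in edges)
    singles = sum(1 for n in users_per_item.values() if n == 1)
    return singles / len(users_per_item)

# Hypothetical toy graph: each pair means "this user saved this article".
toy_edges = {("u1", "a1"), ("u1", "a2"), ("u2", "a2"), ("u3", "a4")}

S = sparseness(toy_edges, n_users=3, n_items=4)
print(f"S = {S:.3f}")
print(f"single-user share = {single_user_share(toy_edges):.3f}")
```

The second helper mirrors the "referenced by only 1 user" statistic: a data set can have a given S and still be poorly connected if most items hang off a single user.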




Ex Libris bX solution to data sparsity:
   Harvest lots of usage (co-download)
   behaviour from world-wide SFX (Ex
   Libris OpenURL resolver) logs and
   apply collaborative filtering to
   correlate articles.




     Johan Bollen and Herbert Van de Sompel. An architecture for the
     aggregation and analysis of scholarly usage data. (in JCDL2006)
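As a rough illustration of correlating articles from co-download behaviour, the sketch below counts how often two articles appear in the same session and normalises by their individual popularity. This is a generic item-item scheme under assumed toy data, not the actual bX algorithm; the function names and session lists are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def co_download_similarity(sessions):
    """Cosine-style similarity between articles from session co-occurrence."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for session in sessions:
        items = sorted(set(session))          # ignore repeats within a session
        for item in items:
            item_count[item] += 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1
    sims = defaultdict(dict)
    for (a, b), n in pair_count.items():
        score = n / sqrt(item_count[a] * item_count[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims

def recommend(sims, article, k=3):
    """Top-k co-downloaded articles, ties broken by identifier."""
    ranked = sorted(sims.get(article, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]

# Hypothetical resolver-log sessions (one list of article ids per session).
toy_sessions = [["a1", "a2", "a3"], ["a1", "a2"], ["a2", "a4"], ["a1", "a3"]]
sims = co_download_similarity(toy_sessions)
print(recommend(sims, "a1"))
```

With very sparse logs most pairs never co-occur, which is why pooling logs from many institutions helps this kind of method.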
TechLens+ Citation-Based Recommendation
[Figure: articles (p2, p3, p5) and their references]
        R. Torres, S. McNee, M. Abel, J. Konstan, and J. Riedl. Enhancing Digital
        Libraries with TechLens+. (in JCDL 2004)
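The citation-based idea can be sketched as collaborative filtering in which articles play the role of users and the references they cite play the role of items. The code below is a minimal illustration under that assumption, with made-up citation data and helper names; it is not the TechLens+ implementation.

```python
from math import sqrt

# Hypothetical citation data: article -> set of references it cites.
citations = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r1", "r2", "r4"},
    "p3": {"r2", "r5"},
    "p4": {"r6"},
}

def cosine(a, b):
    """Cosine similarity between two binary reference sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend_references(seed, citations, k=5):
    """Score references cited by similar articles but not by the seed."""
    seed_refs = citations[seed]
    scores = {}
    for other, refs in citations.items():
        if other == seed:
            continue
        sim = cosine(seed_refs, refs)
        if sim == 0.0:
            continue                      # no shared references, no evidence
        for ref in refs - seed_refs:
            scores[ref] = scores.get(ref, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ref for ref, _ in ranked[:k]]

print(recommend_references("p1", citations))
```

Because every article ships with its reference list, this matrix is populated even when no user has ever touched the article.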
Do “Rated” Citations w/ PageRank Help?
                       p1 p2 p3 p4 p5 p6 p7 p8                         citations
                  p1                         0.4         
                  p2             0.5         0.4
   articles
                  p3   0.2                         0.6

                  p4         0.7 0.5                          
                  u1             0.5 0.3           0.6        
   users
                  u2   0.2             0.3                        = constant

Answer:
    Using PageRank to “rate” citations is not significantly
    better than using a constant (0/1)

Note:
    There is ongoing work w/ NRC on a machine learning method
    for extracting “most important references” – that might help more
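For readers who want to see what "rating" citations with PageRank could look like, here is a small power-iteration sketch. The toy citation graph, the damping factor of 0.85, and the way ranks are copied into the matrix cells are assumptions for illustration; the slides do not specify how the weights were derived.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict: node -> list of cited nodes."""
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical citation graph: article -> articles it cites.
links = {"p1": ["p3"], "p2": ["p3", "p4"], "p3": ["p4"], "p4": []}
ranks = pagerank(links)

# "Rated" matrix: each citation weighted by the cited article's PageRank.
rated = {a: {c: ranks[c] for c in cited} for a, cited in links.items()}
# Constant matrix: every citation weighted 1.
constant = {a: {c: 1.0 for c in cited} for a, cited in links.items()}
print(rated["p2"], constant["p2"])
```

Either matrix can then be fed to the same recommender, which is what makes the constant-versus-rated comparison straightforward.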
Sarkanto (NRC Article Recommender)
—  Uses TechLens+ strategy of replacing User-Item matrix with
    Article-Article matrix from citation data
—  Uses TASTE recommender (now the recommendation
    component of Mahout)
—  Is now decoupled from user-based recommender
—  Compare side by side w/ ‘bX’ recommendations
Try it here:

     http://lab.cisti-icist.nrc-cnrc.gc.ca/Sarkanto/
Sarkanto compared w/ bX




Sarkanto: “These are articles whose co-citations are similar to this one.”
bX: “Users who viewed this article also viewed these articles.”
Experiments
—  Sarkanto generated ~ 1.9 million citation-based
    recommendations (statically)
—  Experimental comparison done on 1886 randomly selected
    articles from a subset of ~ 1.2M articles (down from ~ 8M)
—  Questions asked in the experiment:
  —  How many recommendations produced by each recommender
  —  Coverage (how often does a seed article generate a
      recommendation)
  —  How semantically diverse are the recommendations
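The coverage question can be expressed in a few lines. The sketch below assumes each recommender's output is available as a mapping from seed article to a list of recommendations; the seed ids, the two dictionaries, and the function names are hypothetical.

```python
def coverage(recs, seeds):
    """Fraction of seed articles for which at least one recommendation exists."""
    covered = sum(1 for s in seeds if recs.get(s))
    return covered / len(seeds)

def covered_by_both(recs_a, recs_b, seeds):
    """Fraction of seeds for which both recommenders return something."""
    both = sum(1 for s in seeds if recs_a.get(s) and recs_b.get(s))
    return both / len(seeds)

# Hypothetical outputs for four seed articles.
seeds = ["a1", "a2", "a3", "a4"]
citation_recs = {"a1": ["a7"], "a2": ["a8", "a9"], "a3": []}
usage_recs = {"a2": ["a5"], "a4": ["a6"]}

print(coverage(citation_recs, seeds))                     # 0.5
print(coverage(usage_recs, seeds))                        # 0.5
print(covered_by_both(citation_recs, usage_recs, seeds))  # 0.25
```

An empty list and a missing key are both treated as "no recommendation", so a recommender that answers with nothing does not count as covering the seed.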
Measuring Semantic Diversity




—  Question: what is the semantic distance between the source-
    article and the recommendations?
—  In this setup it was not possible to compare the semantic distance
    without the full-text for both sets of recommendations
—  Full-text is available for the Sarkanto recommendations but not for
    the bX recommendations
Journal-Journal Semantic Distance
  —  Concatenate the full-text of all the articles in each journal
  —  From a Lucene index of the full text in each journal, use
     Dominic Widdows’ Semantic Vectors package to create
     —  a term-journal matrix,
     —  reduced dimensionality term-vectors (512) for each journal
        using random projections
  —  Apply multidimensional scaling (MDS) in R to the journal-journal
     distance matrix (2300 x 2300) to obtain a 2-D map
G. Newton, A. Callahan, and M. Dumontier. Semantic journal mapping for
search visualization in a large scale article digital library. (in Second Workshop
on Very Large Digital Libraries, ECDL 2009)
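The projection and distance steps of the pipeline above can be sketched in pure Python as follows. This is a toy stand-in, not the Semantic Vectors API: the random ±1 term vectors, the 16-dimensional target space, and the three tiny "journals" are all assumptions, and the MDS step is left out.

```python
import math
import random

def random_projection(term_journal, dim=16, seed=0):
    """Map {journal: {term: count}} to {journal: dim-length vector}."""
    rng = random.Random(seed)
    terms = sorted({t for counts in term_journal.values() for t in counts})
    # One fixed random +/-1 vector per term.
    basis = {t: [rng.choice((-1.0, 1.0)) for _ in range(dim)] for t in terms}
    projected = {}
    for journal, counts in term_journal.items():
        vec = [0.0] * dim
        for term, count in counts.items():
            for i, b in enumerate(basis[term]):
                vec[i] += count * b
        projected[journal] = vec
    return projected

def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

# Hypothetical term counts for three journals (J2 duplicates J1 on purpose).
term_journal = {
    "J1": {"protein": 4, "gene": 2, "cell": 1},
    "J2": {"protein": 4, "gene": 2, "cell": 1},
    "J3": {"galaxy": 4, "star": 2, "orbit": 1},
}
vectors = random_projection(term_journal)
names = sorted(vectors)
distance = {a: {b: cosine_distance(vectors[a], vectors[b]) for b in names}
            for a in names}
print(distance["J1"]["J2"], distance["J1"]["J3"])
```

The resulting square matrix is the kind of input a classical MDS routine (for example `cmdscale` in R) would take to produce 2-D coordinates.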
2-D Journal Distance Map
[Figure: 2-D journal map. Coloured clusters represent journal subject headings (from publisher metadata)]





http://cuvier.cisti.nrc.ca/~gnewton/torngat/applet.2009.07.22/index.html
Results: Diversity of Recommendations

—  ~13% of seed articles generated recommendations for both
    bX and Sarkanto (i.e. not much overlap!)
—  Citation-based recommendations appear to be more
    semantically diverse than User-based.
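One way such a diversity comparison could be computed is to use journal positions on the 2-D map as a proxy and average the distance between the seed article's journal and the journals of its recommendations. The sketch below assumes that proxy; the coordinates, journal labels, and function name are hypothetical and do not reproduce the experiment's data.

```python
from math import dist
from statistics import mean

# Hypothetical 2-D map coordinates for four journals.
journal_xy = {"J1": (0.0, 0.0), "J2": (3.0, 4.0), "J3": (0.0, 1.0), "J4": (6.0, 8.0)}

def mean_seed_distance(seed_journal, rec_journals, coords):
    """Average map distance from the seed's journal to each recommendation's journal."""
    return mean(dist(coords[seed_journal], coords[j]) for j in rec_journals)

# Journals of the recommendations returned for one seed article in J1.
citation_based = ["J2", "J4"]
usage_based = ["J3", "J3"]

print(mean_seed_distance("J1", citation_based, journal_xy))  # 7.5
print(mean_seed_distance("J1", usage_based, journal_xy))     # 1.0
```

A larger average means the recommendations land in journals further from the seed on the map, which is the sense of "more semantically diverse" used here.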
Conclusions
—  Citation-based and User-based recommendations are
    complementary
—  Different kinds of data sources (users vs. citations) produce
    different kinds of (non-overlapping) results
—  Citation-based recommendations are more semantically diverse
    —  Hypothesis: “user-based recommendations may be biased by the semantic
        similarity of search-engine results”

 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

Usage-Based vs. Citation-Based Recommenders in a Digital Library

  • 1. Usage-Based vs. Citation-Based Recommenders in a Digital Library André Vellino School of Information Studies  University of Ottawa blog: http://synthese.wordpress.com twitter: @vellino e-mail: avellino@uottawa.ca
  • 2. Context
      —  Canada Institute for Scientific and Technical Information (aka Canada’s National Science Library)
      —  Has a full-text digital collection (Scientific, Technical, Medical) with text-mining rights for research purposes only
          —  Elsevier and Springer (mostly)
              —  ~8M articles
              —  ~2800 journals
              —  ~3 TB
      —  Plan: a hybrid, multi-dimensional recommender
          —  Usage-based (CF)
          —  Content-based (CBF)
          —  User-Context
  • 3. Sparsity of Usage Data is a Problem in Digital Libraries
      —  Amazon: ~70 M users, ~93 M items
      —  Digital libraries: ~70,000 users, ~7 M items
  • 4. Data is Sparse Too
      —  Sparseness of a dataset: $S = \frac{\text{number of edges in the user-item graph}}{\text{total number of possible edges}}$
      —  Mendeley data: $S = 2.66 \times 10^{-5}$
      —  Netflix: $S = 1.18 \times 10^{-2}$
      —  But also, Mendeley data isn’t “highly connected”
          —  83.6% of Mendeley articles were referenced by only 1 user
          —  6% of the articles were referenced by 3 or more users
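The sparseness ratio on this slide can be computed directly from a user-item edge list. The following is a minimal sketch, assuming the bipartite graph is given as a set of (user, item) pairs; the function name `sparseness` and the toy data are hypothetical and not taken from the Mendeley or Netflix datasets.

```python
def sparseness(n_users: int, n_items: int, n_edges: int) -> float:
    """Observed user-item edges divided by all possible user-item edges."""
    possible = n_users * n_items
    if possible == 0:
        raise ValueError("need at least one user and one item")
    return n_edges / possible


# Hypothetical toy data: 4 users, 5 items, 3 observed (user, item) pairs.
edges = {("u1", "a"), ("u1", "b"), ("u2", "a")}
s = sparseness(n_users=4, n_items=5, n_edges=len(edges))
print(f"S = {s:.2e}")
```

With this definition a smaller S means fewer observed edges per possible edge, which is the sense in which the Mendeley data is sparser than the Netflix data.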
  • 5. (2009) Ex Libris bX solution to data sparsity
      —  Harvest large amounts of usage (co-download) behaviour from world-wide SFX (Ex Libris OpenURL resolver) logs and apply collaborative filtering to correlate articles.
      —  Johan Bollen and Herbert Van de Sompel. An architecture for the aggregation and analysis of scholarly usage data. (in JCDL 2006)
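To illustrate what correlating articles from co-download behaviour can look like, here is a rough sketch that counts how often two articles appear in the same session and ranks neighbours by that count. This is a generic co-occurrence approach, not the bX algorithm; the session format and the names `co_download_counts` and `recommend` are assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_download_counts(sessions):
    """Count, for each pair of articles, the sessions in which both appear."""
    counts = defaultdict(Counter)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, article, k=3):
    """Top-k co-downloaded articles, ties broken alphabetically."""
    neighbours = counts.get(article, Counter())
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical resolver sessions, each a list of downloaded article ids.
sessions = [["a1", "a2", "a3"], ["a1", "a2"], ["a2", "a4"]]
counts = co_download_counts(sessions)
print(recommend(counts, "a1"))
```

Pooling sessions from many institutions increases the number of observed pairs, which is how aggregation addresses the sparsity problem described on the earlier slides.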
  • 6. TechLens+ Citation-Based Recommendation
      —  [Figure: articles and their references (labels p2, p3, p5)]
      —  R. Torres, S. McNee, M. Abel, J. Konstan, and J. Riedl. Enhancing Digital Libraries with TechLens+. (in JCDL 2004)
  • 7. Do “Rated” Citations w/ PageRank Help?
      —  [Figure: rating matrix with citations p1 to p8 as columns and articles p1 to p4 and users u1 to u2 as rows; cells show fractional ratings (0.2 to 0.7), and a legend marks a symbol as “= constant”]
      —  Answer: Using PageRank to “rate” citations is not significantly better than using a constant (0/1)
      —  Note: There is ongoing work w/ NRC on a machine learning method for extracting the “most important references”, which might help more
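The two alternatives compared on this slide can be sketched as follows: a citation matrix whose cells are either a constant or a PageRank score of the cited paper. The slides do not say exactly how PageRank scores were mapped to ratings, so using the raw score of the cited paper is an assumption, as are the function names and the toy citation graph.

```python
def pagerank(cites, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; cites maps a paper to the papers it cites."""
    papers = sorted(set(cites) | {q for refs in cites.values() for q in refs})
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers with no references spread their rank evenly over all papers.
        dangling = sum(rank[p] for p in papers if not cites.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for p in papers:
            refs = cites.get(p, [])
            for q in refs:
                new[q] += damping * rank[p] / len(refs)
        rank = new
    return rank


def rating_rows(cites, weights=None):
    """One row per citing paper; each cell is weights[cited] or the constant 1.0."""
    return {
        p: {q: (weights[q] if weights else 1.0) for q in refs}
        for p, refs in cites.items()
    }


# Hypothetical citation graph.
cites = {"p1": ["p3"], "p2": ["p3", "p4"], "p3": ["p4"], "p4": []}
rank = pagerank(cites)
constant_matrix = rating_rows(cites)
rated_matrix = rating_rows(cites, rank)
```

Either matrix can then be handed to the same collaborative filtering step, which makes the constant versus PageRank comparison a change of input only.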
  • 8. Sarkanto (NRC Article Recommender)
      —  Uses TechLens+ strategy of replacing User-Item matrix with Article-Article matrix from citation data
      —  Uses TASTE recommender (now the recommendation component of Mahout)
      —  Is now decoupled from user-based recommender
      —  Compare side by side w/ ‘bX’ recommendations
      —  Try it here: http://lab.cisti-icist.nrc-cnrc.gc.ca/Sarkanto/
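The idea of replacing the user-item matrix with an article-article matrix can be sketched by letting each citing article play the role of a user and each cited article the role of an item. The sketch below scores two articles by the cosine similarity of the sets of papers citing them; it is an illustration under those assumptions, not Sarkanto's code or the Mahout API, and all names and data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def cited_by(cites):
    """Invert a citing -> references mapping into cited -> set of citing papers."""
    index = defaultdict(set)
    for citing, refs in cites.items():
        for ref in refs:
            index[ref].add(citing)
    return index


def cocitation_similarity(index, a, b):
    """Cosine similarity of the binary 'who cites me' vectors of a and b."""
    citers_a, citers_b = index.get(a, set()), index.get(b, set())
    if not citers_a or not citers_b:
        return 0.0
    return len(citers_a & citers_b) / sqrt(len(citers_a) * len(citers_b))


def similar_articles(index, seed, k=3):
    """Top-k articles co-cited with the seed, ties broken alphabetically."""
    scored = [(cocitation_similarity(index, seed, o), o) for o in index if o != seed]
    scored = [(s, o) for s, o in scored if s > 0]
    scored.sort(key=lambda so: (-so[0], so[1]))
    return [o for _, o in scored[:k]]


# Hypothetical citation data: citing paper -> papers it references.
cites = {"c1": ["a", "b"], "c2": ["a", "b", "d"], "c3": ["a", "d"], "c4": ["e"]}
index = cited_by(cites)
print(similar_articles(index, "a"))
```

Because the matrix is built from citations alone, recommendations of this kind need no usage logs, which is what allows the recommender to be decoupled from user data.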
  • 9. Sarkanto compared w/ bX
      —  Sarkanto: “These are articles whose co-citations are similar to this one.”
      —  bX: “Users who viewed this article also viewed these articles.”
  • 10. Experiments
      —  Sarkanto generated ~1.9 million citation-based recommendations (statically)
      —  Experimental comparison done on 1886 randomly selected articles from a subset of ~1.2M articles (down from ~8M)
      —  Questions asked in the experiment:
          —  How many recommendations are produced by each recommender?
          —  Coverage (how often does a seed article generate a recommendation?)
          —  How semantically diverse are the recommendations?
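The first two questions reduce to simple counts over the seed articles. A minimal sketch, assuming each recommender's output is a mapping from seed article id to a list of recommended ids; the names and the toy data are hypothetical and do not reproduce the experiment's figures.

```python
def coverage(recs_by_seed, seeds):
    """Fraction of seed articles that produced at least one recommendation."""
    if not seeds:
        raise ValueError("no seed articles given")
    return sum(1 for s in seeds if recs_by_seed.get(s)) / len(seeds)


def mean_recommendations(recs_by_seed, seeds):
    """Average number of recommendations per seed article."""
    if not seeds:
        raise ValueError("no seed articles given")
    return sum(len(recs_by_seed.get(s, [])) for s in seeds) / len(seeds)


def joint_coverage(recs_a, recs_b, seeds):
    """Fraction of seeds for which both recommenders returned something."""
    if not seeds:
        raise ValueError("no seed articles given")
    return sum(1 for s in seeds if recs_a.get(s) and recs_b.get(s)) / len(seeds)


# Hypothetical outputs of two recommenders on four seed articles.
seeds = ["s1", "s2", "s3", "s4"]
citation_recs = {"s1": ["x", "y"], "s2": ["z"]}
usage_recs = {"s2": ["q"], "s3": [], "s4": ["r", "t", "u"]}
print(coverage(citation_recs, seeds), joint_coverage(citation_recs, usage_recs, seeds))
```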
  • 11. Measuring Semantic Diversity
      —  Question: what is the semantic distance between the source article and the recommendations?
      —  In this setup it was not possible to compare the semantic distance without the full text for both sets of recommendations
      —  Full text is available for the Sarkanto recommendations but not for the bX recommendations
  • 12. Journal-Journal Semantic Distance
      —  Concatenate the full text of all the articles in each journal
      —  From a Lucene index of the full text in each journal, use Dominic Widdows’ Semantic Vectors package to create
          —  a term-journal matrix,
          —  reduced-dimensionality term-vectors (512) for each journal using random projections
      —  Apply multidimensional scaling (MDS) in R to obtain a 2-D distance matrix (2300 x 2300)
      —  G. Newton, A. Callahan, and M. Dumontier. Semantic journal mapping for search visualization in a large scale article digital library. In Second Workshop on Very Large Digital Libraries, ECDL 2009
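The dimensionality-reduction step can be illustrated with a plain random projection followed by a cosine distance between journals. This is a sketch of the general technique in pure Python, not the Semantic Vectors package or its API; the projection uses random +1/-1 directions, the target dimension is 4 instead of 512 to keep the example small, and the term counts are made up. The MDS step is not shown.

```python
import math
import random


def random_projection(vectors, out_dim, seed=0):
    """Project each journal's term vector onto out_dim random +1/-1 directions."""
    rng = random.Random(seed)
    in_dim = len(next(iter(vectors.values())))
    basis = [[rng.choice((-1.0, 1.0)) for _ in range(in_dim)] for _ in range(out_dim)]
    return {
        name: [sum(b * x for b, x in zip(row, vec)) for row in basis]
        for name, vec in vectors.items()
    }


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical term counts per journal over a six-term vocabulary.
term_counts = {
    "J_chem": [9, 7, 0, 1, 0, 2],
    "J_chem_letters": [8, 6, 1, 0, 0, 3],
    "J_astro": [0, 1, 9, 8, 7, 0],
}
reduced = random_projection(term_counts, out_dim=4)
names = sorted(reduced)
distances = {(a, b): cosine_distance(reduced[a], reduced[b]) for a in names for b in names}
```

The resulting journal-by-journal distance table is the kind of input that a multidimensional scaling routine would turn into 2-D coordinates.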
  • 13. 2-D Journal Distance Map
      —  Coloured clusters represent journal subject headings (from publisher metadata)
      —  http://cuvier.cisti.nrc.ca/~gnewton/torngat/applet.2009.07.22/index.html
  • 14. Results: Diversity of Recommendations
      —  ~13% of seed articles generated recommendations for both bX and Sarkanto (i.e. not much overlap!)
      —  Citation-based recommendations appear to be more semantically diverse than user-based ones.
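One way to turn the 2-D journal map into a diversity score is to average the distance between the seed article's journal and the journals of its recommendations. The slides do not spell out the exact measure used, so the following is only an illustrative sketch; the function name, coordinates, and recommendation lists are invented for the example and are not experimental data.

```python
import math


def mean_journal_distance(seed_journal, rec_journals, coords):
    """Average Euclidean distance on the 2-D map from the seed's journal
    to the journals of the recommended articles."""
    if not rec_journals:
        raise ValueError("no recommendations to score")
    sx, sy = coords[seed_journal]
    total = sum(math.hypot(coords[j][0] - sx, coords[j][1] - sy) for j in rec_journals)
    return total / len(rec_journals)


# Invented 2-D map coordinates for four journals.
coords = {"J1": (0.0, 0.0), "J2": (3.0, 4.0), "J3": (0.0, 1.0), "J4": (6.0, 8.0)}
spread_far = mean_journal_distance("J1", ["J2", "J4"], coords)
spread_near = mean_journal_distance("J1", ["J3", "J1"], coords)
print(spread_far, spread_near)
```

A larger average means the recommendations land in journals further from the seed's journal on the map, which is the sense of "more semantically diverse" used here.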
  • 15. Conclusions
      —  Citation-based and user-based recommendations are complementary
      —  Different kinds of data sources (users vs. citations) produce different kinds of (non-overlapping) results
      —  Citation-based recommendations are more semantically diverse
      —  Hypothesis: “user-based recommendations may be biased by the semantic similarity of search-engine results”