SlideShare a Scribd company logo
1 of 12
Download to read offline
Vector Space Model & Lantent Semantic
              Indexing

              Ryan Reck


           November 18, 2008
1 Introduction

2 Vector Space Model

3 Lantent Semantic Indexing

4 Applications of VSM & LSI

5 Comparison: VSM vs. LSI

6 Conclusion

7 References
Introduction
What are VSM & LSI?




    VSM & LSI are techniques from information retrievel for managing
    documents based on their content.
Vector Space Model




      Models documents as a vector in a multi-dimensional space.
      Similar documents are closer together, angle between vectors
      can be interpretted as similarity of two documents.
      Queries are translated into the vector space, and the nearest
      documents (point in space, or vector angle) are the desired
      documents.
      Originated from the SMART Information Retrieval project at
      Cornell University. First published paper in 1975 [2].
Vector Space Model
Example




          doc1 =< tf1 , tf2 , tf3 , . . . , tfn >
          doc2 =< tf1 , tf2 , tf3 , . . . , tfn >
          sim(doc1 , doc2 ) = cos(θ) = v0 · v1
Vector Space Model
Calcuating Term Weights




         VSM introduced the Term Frequency - Inverse Document
         Frequency method of calculating term weights.
         TF-IDF gives greater weight to less common terms, and less
         weight to common ones, since rare terms will better
         distinguish documents than common terms.
                              |D|
         Wf ,d = tft · log ( |t∈d|
Lantent Semantic Indexing




      Built off of Vector Space Model.
      Extracts concepts from the term-document matrix.
          Combines corelated dimensions into a single aggrgate
          dimension.
      This allows the documents to be indexed by concept instead
      of simple terms.
Lantent Semantic Indexing
Example




    Good Example
    {computer , laptop} − >      {1.2 ∗ computer + 0.9 ∗ laptop}

    Realistic Example
    {computer , elevator } − >    {1.2 ∗ computer + 0.9 ∗ elevator }
Applications of VSM & LSI




      VSM, or variations of it, are almost universal.
      Search Engines
          Apache Lucene
Comparison: VSM vs. LSI


  Advantages of LSI

      Handles synonymy and polysemy directly
      Can match documents using differing vocabularies.
      Can even match across different languages, after some
      translated documents have been handled[1].


  Advantages of VSM

      Much simpler, but still performs well
      Handles new documents more easily, LSI’s dimension
      reduction can cause problems with this.
Conslusion




   VSM and LSI are both good ways to index and compare
   documents. VSM is pretty basic but still gets the job done. LSI
   provides a more complex system, but it can do a very good job,
   even under extreme circumstances, like multi-language datasets.
Refeences

      Dumais, S. T., Letsche, T. A., Littman, M. L., and
      Landauer, T. K.
      Automatic cross-language retrieval using latent semantic
      indexing.
      In AAAI Symposium on CrossLanguage Text and Speech
      Retrieval. American Association for Artificial Intelligence,
      March 1997. (March 1997).
      Salton, G., Wong, A., and Yang, C. S.
      A vector space model for automatic indexing.
      Commun. ACM 18, 11 (1975), 613–620.
      Latent semantic indexing, 2008.
      http://en.wikipedia.com/wiki/Latent semantic indexing.
      Vector space model, 2008.
      http://en.wikipedia.com/wiki/Vector space model.

More Related Content

What's hot

The science behind predictive analytics a text mining perspective
The science behind predictive analytics  a text mining perspectiveThe science behind predictive analytics  a text mining perspective
The science behind predictive analytics a text mining perspectiveankurpandeyinfo
 
Introduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisIntroduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisNYC Predictive Analytics
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Innovation Quotient Pvt Ltd
 
5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological dataKrish_ver2
 
Advanced topics in artificial neural networks
Advanced topics in artificial neural networksAdvanced topics in artificial neural networks
Advanced topics in artificial neural networksswapnac12
 
Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)DheerajPachauri
 
Papers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON Datatype
Papers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON DatatypePapers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON Datatype
Papers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON DatatypeMax Klymyshyn
 
Distributed Coordination
Distributed CoordinationDistributed Coordination
Distributed CoordinationLuis Galárraga
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsSharath TS
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clusteringguest0edcaf
 
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...Sharath TS
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...Madan Golla
 
xSDN - An Expressive Simulator for Dynamic Network Flows
xSDN - An Expressive Simulator for Dynamic Network FlowsxSDN - An Expressive Simulator for Dynamic Network Flows
xSDN - An Expressive Simulator for Dynamic Network FlowsPradeeban Kathiravelu, Ph.D.
 
Strings in c langauge
Strings in c langaugeStrings in c langauge
Strings in c langaugeYash Thakkar
 

What's hot (19)

The science behind predictive analytics a text mining perspective
The science behind predictive analytics  a text mining perspectiveThe science behind predictive analytics  a text mining perspective
The science behind predictive analytics a text mining perspective
 
Introduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisIntroduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic Analysis
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...
 
4 Cliques Clusters
4 Cliques Clusters4 Cliques Clusters
4 Cliques Clusters
 
5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data5.4 mining sequence patterns in biological data
5.4 mining sequence patterns in biological data
 
Clustering ppt
Clustering pptClustering ppt
Clustering ppt
 
P229 godfrey
P229 godfreyP229 godfrey
P229 godfrey
 
Advanced topics in artificial neural networks
Advanced topics in artificial neural networksAdvanced topics in artificial neural networks
Advanced topics in artificial neural networks
 
Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)
 
Papers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON Datatype
Papers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON DatatypePapers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON Datatype
Papers We Love Kyiv, July 2018: A Conflict-Free Replicated JSON Datatype
 
Distributed Coordination
Distributed CoordinationDistributed Coordination
Distributed Coordination
 
Svv
SvvSvv
Svv
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into Texts
 
15 82-87
15 82-8715 82-87
15 82-87
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive S...
 
Document clustering for forensic analysis an approach for improving compute...
Document clustering for forensic   analysis an approach for improving compute...Document clustering for forensic   analysis an approach for improving compute...
Document clustering for forensic analysis an approach for improving compute...
 
xSDN - An Expressive Simulator for Dynamic Network Flows
xSDN - An Expressive Simulator for Dynamic Network FlowsxSDN - An Expressive Simulator for Dynamic Network Flows
xSDN - An Expressive Simulator for Dynamic Network Flows
 
Strings in c langauge
Strings in c langaugeStrings in c langauge
Strings in c langauge
 

Viewers also liked

Topic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and ApplicationsTopic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and ApplicationsAyush Jain
 
ECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERINGECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERINGGeorge Simov
 
Topic extraction using machine learning
Topic extraction using machine learningTopic extraction using machine learning
Topic extraction using machine learningSanjib Basak
 
An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"sandinmyjoints
 
Latent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalLatent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalSudarsun Santhiappan
 
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
word2vec, LDA, and introducing a new hybrid algorithm: lda2vecword2vec, LDA, and introducing a new hybrid algorithm: lda2vec
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec👋 Christopher Moody
 

Viewers also liked (7)

Topic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and ApplicationsTopic Modelling: Tutorial on Usage and Applications
Topic Modelling: Tutorial on Usage and Applications
 
ECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERINGECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERING
 
Topic extraction using machine learning
Topic extraction using machine learningTopic extraction using machine learning
Topic extraction using machine learning
 
An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"An Introduction to gensim: "Topic Modelling for Humans"
An Introduction to gensim: "Topic Modelling for Humans"
 
Latent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalLatent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information Retrieval
 
NLP and LSA getting started
NLP and LSA getting startedNLP and LSA getting started
NLP and LSA getting started
 
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
word2vec, LDA, and introducing a new hybrid algorithm: lda2vecword2vec, LDA, and introducing a new hybrid algorithm: lda2vec
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
 

Similar to Vsm lsi

Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 
NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationEugene Nho
 
Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...csandit
 
The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learningfridolin.wild
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 
Extractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachExtractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachFindwise
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNsGrigory Sapunov
 
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE ijdms
 
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...ijsc
 
Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...ijsc
 
Mapping Subsets of Scholarly Information
Mapping Subsets of Scholarly InformationMapping Subsets of Scholarly Information
Mapping Subsets of Scholarly InformationPaul Houle
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
A Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia ArticlesA Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia Articlesijma
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSijseajournal
 
Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationRichard Littauer
 

Similar to Vsm lsi (20)

Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic Classification
 
call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...call for papers, research paper publishing, where to publish research paper, ...
call for papers, research paper publishing, where to publish research paper, ...
 
Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...Improvement wsd dictionary using annotated corpus and testing it with simplif...
Improvement wsd dictionary using annotated corpus and testing it with simplif...
 
TEXT CLUSTERING.doc
TEXT CLUSTERING.docTEXT CLUSTERING.doc
TEXT CLUSTERING.doc
 
The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learning
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
Extractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachExtractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised Approach
 
L0261075078
L0261075078L0261075078
L0261075078
 
L0261075078
L0261075078L0261075078
L0261075078
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNs
 
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
 
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
 
Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...
 
Mapping Subsets of Scholarly Information
Mapping Subsets of Scholarly InformationMapping Subsets of Scholarly Information
Mapping Subsets of Scholarly Information
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
A Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia ArticlesA Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia Articles
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
 
Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentation
 

Recently uploaded

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Vsm lsi

  • 1. Vector Space Model & Lantent Semantic Indexing Ryan Reck November 18, 2008
  • 2. 1 Introduction 2 Vector Space Model 3 Lantent Semantic Indexing 4 Applications of VSM & LSI 5 Comparison: VSM vs. LSI 6 Conclusion 7 References
  • 3. Introduction What are VSM & LSI? VSM & LSI are techniques from information retrievel for managing documents based on their content.
  • 4. Vector Space Model Models documents as a vector in a multi-dimensional space. Similar documents are closer together, angle between vectors can be interpretted as similarity of two documents. Queries are translated into the vector space, and the nearest documents (point in space, or vector angle) are the desired documents. Originated from the SMART Information Retrieval project at Cornell University. First published paper in 1975 [2].
  • 5. Vector Space Model Example doc1 =< tf1 , tf2 , tf3 , . . . , tfn > doc2 =< tf1 , tf2 , tf3 , . . . , tfn > sim(doc1 , doc2 ) = cos(θ) = v0 · v1
  • 6. Vector Space Model Calcuating Term Weights VSM introduced the Term Frequency - Inverse Document Frequency method of calculating term weights. TF-IDF gives greater weight to less common terms, and less weight to common ones, since rare terms will better distinguish documents than common terms. |D| Wf ,d = tft · log ( |t∈d|
  • 7. Lantent Semantic Indexing Built off of Vector Space Model. Extracts concepts from the term-document matrix. Combines corelated dimensions into a single aggrgate dimension. This allows the documents to be indexed by concept instead of simple terms.
  • 8. Lantent Semantic Indexing Example Good Example {computer , laptop} − > {1.2 ∗ computer + 0.9 ∗ laptop} Realistic Example {computer , elevator } − > {1.2 ∗ computer + 0.9 ∗ elevator }
  • 9. Applications of VSM & LSI VSM, or variations of it, are almost universal. Search Engines Apache Lucene
  • 10. Comparison: VSM vs. LSI Advantages of LSI Handles synonymy and polysemy directly Can match documents using differing vocabularies. Can even match across different languages, after some translated documents have been handled[1]. Advantages of VSM Much simpler, but still performs well Handles new documents more easily, LSI’s dimension reduction can cause problems with this.
  • 11. Conslusion VSM and LSI are both good ways to index and compare documents. VSM is pretty basic but still gets the job done. LSI provides a more complex system, but it can do a very good job, even under extreme circumstances, like multi-language datasets.
  • 12. Refeences Dumais, S. T., Letsche, T. A., Littman, M. L., and Landauer, T. K. Automatic cross-language retrieval using latent semantic indexing. In AAAI Symposium on CrossLanguage Text and Speech Retrieval. American Association for Artificial Intelligence, March 1997. (March 1997). Salton, G., Wong, A., and Yang, C. S. A vector space model for automatic indexing. Commun. ACM 18, 11 (1975), 613–620. Latent semantic indexing, 2008. http://en.wikipedia.com/wiki/Latent semantic indexing. Vector space model, 2008. http://en.wikipedia.com/wiki/Vector space model.