SlideShare una empresa de Scribd logo
1 de 41
Near-Neighbor Search Applications Matrix Formulation Minhashing
Example Problem  --- Face Recognition ,[object Object],[object Object],[object Object]
Face Recognition --- (2) ,[object Object],[object Object]
Face Recognition --- (3) ,[object Object],[object Object]
Simple Solution ,[object Object],[object Object],[object Object],[object Object]
Multidimensional Indexes Don’t Work New face: [6,14,…] 0-4 5-9 10-14 . . . Dimension 1 = Surely we’d better look here. Maybe look here too, in case of a slight error. But the first dimension could be one of those that is not close.  So we’d better look everywhere!
Another Problem : Entity Resolution ,[object Object],[object Object],[object Object],[object Object]
Entity Resolution --- (2) ,[object Object],[object Object],[object Object],[object Object]
Simple Solution ,[object Object],[object Object],[object Object],[object Object],[object Object]
Yet Another Problem : Finding Similar Documents ,[object Object],[object Object]
Complexity of Document Similarity ,[object Object],[object Object],[object Object]
Complexity --- (2) ,[object Object],[object Object]
Representing Documents for Similarity Search ,[object Object],[object Object],[object Object],[object Object]
Shingles ,[object Object],[object Object],[object Object]
Shingles:  Aside ,[object Object],[object Object],[object Object]
Shingles:  Compression Option ,[object Object],[object Object],[object Object]
MinHashing Data as Sparse Matrices Jaccard Similarity Measure Constructing Signatures
Roadmap Boolean matrices Market baskets Other apps Signatures Locality- Sensitive Hashing Minhashing Face- recognition Entity- resolution Other apps Shingling Documents
Boolean Matrix Representation ,[object Object],[object Object],[object Object],[object Object]
Matrix Representation of Item/Basket Data ,[object Object],[object Object],[object Object],[object Object]
In Matrix Form ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Documents in Matrix Form ,[object Object],[object Object],[object Object],[object Object]
Aside ,[object Object],[object Object],[object Object],[object Object]
Assumptions ,[object Object],[object Object],[object Object],[object Object]
Similarity of Columns ,[object Object],[object Object],[object Object]
Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],* * * * * * *
Outline of Algorithm ,[object Object],[object Object],[object Object],[object Object],[object Object]
Warnings ,[object Object],[object Object],[object Object]
Signatures ,[object Object],[object Object],[object Object]
An Idea That Doesn’t Work ,[object Object],[object Object]
Four Types of Rows ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Minhashing ,[object Object],[object Object],[object Object]
Minhashing Example Input matrix  0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1  5 2 1 6 7 4 3 Signature matrix  M 1 2 1 2 5 7 6 3 1 2 4 1 4 1 2 4 5 2 6 7 3 1 2 1 2 1
Surprising Property ,[object Object],[object Object],[object Object],[object Object],[object Object]
Similarity for Signatures ,[object Object],[object Object]
Min Hashing – Example Similarities: 1-3  2-4  1-2  3-4 Col/Col 0.75  0.75  0  0 Sig/Sig 0.67  1.00  0  0 Input matrix 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1  5 2 1 6 7 4 3 Signature matrix  M 1 2 1 2 5 7 6 3 1 2 4 1 4 1 2 4 5 2 6 7 3 1 2 1 2 1
Minhash Signatures ,[object Object],[object Object],[object Object]
Implementation --- (1) ,[object Object],[object Object],[object Object],[object Object],[object Object]
Implementation --- (2) ,[object Object],[object Object]
Implementation --- (3) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example Row C1 C2 1  1  0 2  0  1 3  1  1 4  1  0 5  0  1 h ( x ) =  x  mod 5 g ( x ) = 2 x +1 mod 5 h (1) = 1 1 - g (1) = 3 3 - h (2) = 2 1 2 g (2) = 0 3 0 h (3) = 3 1 2 g (3) = 2 2 0 h (4) = 4 1 2 g (4) = 4 2 0 h (5) = 0 1 0 g (5) = 1 2 0 Sig1 Sig2

Más contenido relacionado

La actualidad más candente

Arrays In General
Arrays In GeneralArrays In General
Arrays In General
martha leon
 
Arrays and library functions
Arrays and library functionsArrays and library functions
Arrays and library functions
Swarup Kumar Boro
 
String Handling in c++
String Handling in c++String Handling in c++
String Handling in c++
Fahim Adil
 

La actualidad más candente (20)

Managing I/O & String function in C
Managing I/O & String function in CManaging I/O & String function in C
Managing I/O & String function in C
 
Strings
StringsStrings
Strings
 
Monad Fact #2
Monad Fact #2Monad Fact #2
Monad Fact #2
 
Go Course Day2
Go Course Day2Go Course Day2
Go Course Day2
 
COm1407: Character & Strings
COm1407: Character & StringsCOm1407: Character & Strings
COm1407: Character & Strings
 
Arrays In General
Arrays In GeneralArrays In General
Arrays In General
 
R for Statistical Computing
R for Statistical ComputingR for Statistical Computing
R for Statistical Computing
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
 
Arrays and library functions
Arrays and library functionsArrays and library functions
Arrays and library functions
 
The string class
The string classThe string class
The string class
 
Character Array and String
Character Array and StringCharacter Array and String
Character Array and String
 
Module 5-Structure and Union
Module 5-Structure and UnionModule 5-Structure and Union
Module 5-Structure and Union
 
Strings IN C
Strings IN CStrings IN C
Strings IN C
 
string in C
string in Cstring in C
string in C
 
strings
stringsstrings
strings
 
String Handling in c++
String Handling in c++String Handling in c++
String Handling in c++
 
C programming - String
C programming - StringC programming - String
C programming - String
 
C string
C stringC string
C string
 
14 ruby strings
14 ruby strings14 ruby strings
14 ruby strings
 
Strings
StringsStrings
Strings
 

Destacado

Assigment 2 group 9
Assigment 2 group 9Assigment 2 group 9
Assigment 2 group 9
javerde
 
Airport Development Overview
Airport Development OverviewAirport Development Overview
Airport Development Overview
Rithesh Swamy
 
Wiki linguistics
Wiki linguisticsWiki linguistics
Wiki linguistics
mattriley
 
Amendment To State Support Agreement
Amendment To State Support AgreementAmendment To State Support Agreement
Amendment To State Support Agreement
Rithesh Swamy
 

Destacado (17)

Princeton Community Works 16 - Social Media
Princeton Community Works 16 - Social MediaPrinceton Community Works 16 - Social Media
Princeton Community Works 16 - Social Media
 
Assigment 2 group 9
Assigment 2 group 9Assigment 2 group 9
Assigment 2 group 9
 
Cs345 cl
Cs345 clCs345 cl
Cs345 cl
 
Airport Development Overview
Airport Development OverviewAirport Development Overview
Airport Development Overview
 
Lecture20
Lecture20Lecture20
Lecture20
 
Social Media Master Class for Artists
Social Media Master Class for ArtistsSocial Media Master Class for Artists
Social Media Master Class for Artists
 
Schizophrenia
SchizophreniaSchizophrenia
Schizophrenia
 
Tipe Reading
Tipe ReadingTipe Reading
Tipe Reading
 
V2 groop presentation with audio
V2   groop presentation with audio V2   groop presentation with audio
V2 groop presentation with audio
 
Wiki linguistics
Wiki linguisticsWiki linguistics
Wiki linguistics
 
Amendment To State Support Agreement
Amendment To State Support AgreementAmendment To State Support Agreement
Amendment To State Support Agreement
 
Building Maintainable Android Apps (DroidCon NYC 2014)
Building Maintainable Android Apps (DroidCon NYC 2014)Building Maintainable Android Apps (DroidCon NYC 2014)
Building Maintainable Android Apps (DroidCon NYC 2014)
 
Tips Of Presentation.
Tips Of Presentation.Tips Of Presentation.
Tips Of Presentation.
 
The Fab Four: Beginning Social Media (Twitter, Facebook, Google+, LinkedIn)
The Fab Four: Beginning Social Media (Twitter, Facebook, Google+, LinkedIn)The Fab Four: Beginning Social Media (Twitter, Facebook, Google+, LinkedIn)
The Fab Four: Beginning Social Media (Twitter, Facebook, Google+, LinkedIn)
 
Biblical Worldview
Biblical  WorldviewBiblical  Worldview
Biblical Worldview
 
Tips Structure
Tips StructureTips Structure
Tips Structure
 
Steering 01 10
Steering 01 10Steering 01 10
Steering 01 10
 

Similar a Nn

Lectures r-graphics
Lectures r-graphicsLectures r-graphics
Lectures r-graphics
etyca
 
Real World Haskell: Lecture 2
Real World Haskell: Lecture 2Real World Haskell: Lecture 2
Real World Haskell: Lecture 2
Bryan O'Sullivan
 
Type header file in c++ and its function
Type header file in c++ and its functionType header file in c++ and its function
Type header file in c++ and its function
Frankie Jones
 
C interview-questions-techpreparation
C interview-questions-techpreparationC interview-questions-techpreparation
C interview-questions-techpreparation
sonu sharma
 
C interview-questions-techpreparation
C interview-questions-techpreparationC interview-questions-techpreparation
C interview-questions-techpreparation
Kgr Sushmitha
 

Similar a Nn (20)

similarity1 (6).ppt
similarity1 (6).pptsimilarity1 (6).ppt
similarity1 (6).ppt
 
Finding similar items in high dimensional spaces locality sensitive hashing
Finding similar items in high dimensional spaces  locality sensitive hashingFinding similar items in high dimensional spaces  locality sensitive hashing
Finding similar items in high dimensional spaces locality sensitive hashing
 
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
Дмитрий Селиванов, OK.RU. Finding Similar Items in high-dimensional spaces: L...
 
presentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptxpresentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptx
 
Data Mining Lecture_6.pptx
Data Mining Lecture_6.pptxData Mining Lecture_6.pptx
Data Mining Lecture_6.pptx
 
I1
I1I1
I1
 
Programming Hp33s talk v3
Programming Hp33s talk v3Programming Hp33s talk v3
Programming Hp33s talk v3
 
Lectures r-graphics
Lectures r-graphicsLectures r-graphics
Lectures r-graphics
 
Real World Haskell: Lecture 2
Real World Haskell: Lecture 2Real World Haskell: Lecture 2
Real World Haskell: Lecture 2
 
Bigdata analytics
Bigdata analyticsBigdata analytics
Bigdata analytics
 
Type header file in c++ and its function
Type header file in c++ and its functionType header file in c++ and its function
Type header file in c++ and its function
 
Lecft3data
Lecft3dataLecft3data
Lecft3data
 
Hashing
HashingHashing
Hashing
 
CPP Homework Help
CPP Homework HelpCPP Homework Help
CPP Homework Help
 
Statistics lab 1
Statistics lab 1Statistics lab 1
Statistics lab 1
 
Mining of massive datasets
Mining of massive datasetsMining of massive datasets
Mining of massive datasets
 
3 - Finding similar items
3 - Finding similar items3 - Finding similar items
3 - Finding similar items
 
C Interview Questions for Fresher
C Interview Questions for FresherC Interview Questions for Fresher
C Interview Questions for Fresher
 
C interview-questions-techpreparation
C interview-questions-techpreparationC interview-questions-techpreparation
C interview-questions-techpreparation
 
C interview-questions-techpreparation
C interview-questions-techpreparationC interview-questions-techpreparation
C interview-questions-techpreparation
 

Nn

  • 1. Near-Neighbor Search Applications Matrix Formulation Minhashing
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. Multidimensional Indexes Don’t Work New face: [6,14,…] 0-4 5-9 10-14 . . . Dimension 1 = Surely we’d better look here. Maybe look here too, in case of a slight error. But the first dimension could be one of those that is not close. So we’d better look everywhere!
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17. MinHashing Data as Sparse Matrices Jaccard Similarity Measure Constructing Signatures
  • 18. Roadmap Boolean matrices Market baskets Other apps Signatures Locality- Sensitive Hashing Minhashing Face- recognition Entity- resolution Other apps Shingling Documents
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33. Minhashing Example Input matrix 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 5 2 1 6 7 4 3 Signature matrix M 1 2 1 2 5 7 6 3 1 2 4 1 4 1 2 4 5 2 6 7 3 1 2 1 2 1
  • 34.
  • 35.
  • 36. Min Hashing – Example Similarities: 1-3 2-4 1-2 3-4 Col/Col 0.75 0.75 0 0 Sig/Sig 0.67 1.00 0 0 Input matrix 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 5 2 1 6 7 4 3 Signature matrix M 1 2 1 2 5 7 6 3 1 2 4 1 4 1 2 4 5 2 6 7 3 1 2 1 2 1
  • 37.
  • 38.
  • 39.
  • 40.
  • 41. Example Row C1 C2 1 1 0 2 0 1 3 1 1 4 1 0 5 0 1 h ( x ) = x mod 5 g ( x ) = 2 x +1 mod 5 h (1) = 1 1 - g (1) = 3 3 - h (2) = 2 1 2 g (2) = 0 3 0 h (3) = 3 1 2 g (3) = 2 2 0 h (4) = 4 1 2 g (4) = 4 2 0 h (5) = 0 1 0 g (5) = 1 2 0 Sig1 Sig2