Branch-and-bound nearest neighbor searching over
             unbalanced trie-structured overlays


                           Master’s Thesis Presentation
                           Technical University of Crete
                                     4.2.2013




Author:       Michail Argyriou
Supervisor:   Asst. Prof. Vasilis Samoladas
P2P Evolution
Timeline figure, 1998 to 2002: Centralized, Semi-distributed, Fully-distributed; DHTs
2
Distributed Hash Table (DHT)




                               3
DHT Frameworks Evolution
2003: PGrid
  • Rectangular queries support
  • Peers only on leaves
  • High-dimensional queries support with space-filling curves
2006: VBI
  • Height-balanced search tree limitation
2008: GRaSP
  • No height-balanced search tree limitation
  • Abstract types of data and queries
    • Data: point, rectangular
    • Queries: point, 3-sided, n-d rectangular


                                                                               4
Nearest neighbor search




                          5
Given a distributed data set, how can we
 find the k data items most similar to a query?


     “k-Nearest Neighbor Search”



                                            6
Applications

• GIS
• Distributed Databases
• Statistical Classification
• Recommendation Systems
• Cluster analysis
• Similarity Scores
                                       7
Related Work
1. Naïve algorithm: a central peer collects the data and
   performs the k-NN search
2. k-NN search algorithm over CAN
3. Distributed quad-based index → each quadtree
   block is uniquely identified by its centroid →
   mapped to Chord → k-NN search algorithm




                                                     8
Contents

GRaSP

              k-NN
 Evaluation
                     Conclusions




                                   9
GRaSP




        10
GRaSP
Building the trie ...
Hierarchical space partition:

1. Peer p joins
2. Finds a bootstrapping peer q
3. Space region s(q) splits into s(q0) and s(q1)
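The join steps above can be sketched in a few lines. This is only an illustration under stated assumptions: regions are axis-aligned boxes keyed by the binary trie ID of their owner, the split halves the region along a chosen axis, and the name `join` is hypothetical (the slides do not say which of the two halves the joining peer receives).

```python
def join(regions, q, axis=0):
    """Split s(q) in half along `axis` into s(q0) and s(q1).

    `regions` maps a trie ID (bit string) to a box (lo, hi), where lo and hi
    are coordinate tuples. Halving is an assumption made for illustration.
    """
    lo, hi = regions.pop(q)
    mid = (lo[axis] + hi[axis]) / 2.0
    left_hi = tuple(mid if i == axis else v for i, v in enumerate(hi))
    right_lo = tuple(mid if i == axis else v for i, v in enumerate(lo))
    regions[q + "0"] = (lo, left_hi)    # s(q0)
    regions[q + "1"] = (right_lo, hi)   # s(q1)
    return q + "0", q + "1"


regions = {"": ((0.0, 0.0), (1.0, 1.0))}  # one peer owns the whole space
join(regions, "", axis=0)                  # a second peer joins
join(regions, "0", axis=1)                 # a third peer joins via q = "0"
print(regions)
```

Because only the contacted region is split, repeated joins in one area produce long IDs there and short IDs elsewhere, which is how the trie becomes unbalanced.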
                                                          11
GRaSP
Space Partition
• Volume-balanced
• Data-balanced
13
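The two partition strategies can be contrasted with a small sketch. The slides name the strategies but do not define them, so the reading here is an assumption: a volume-balanced split cuts the region at its midpoint, while a data-balanced split cuts at the median of the keys already stored. Both function names are hypothetical, and the example is one-dimensional for brevity.

```python
from statistics import median


def volume_balanced_split(lo, hi, keys):
    """Cut the interval [lo, hi] into two halves of equal volume."""
    return (lo + hi) / 2.0


def data_balanced_split(lo, hi, keys):
    """Cut [lo, hi] so that each side holds about half of the stored keys."""
    return median(keys) if keys else (lo + hi) / 2.0


keys = [0.1, 0.2, 0.3, 0.9]  # skewed data inside the region [0, 1]
print(volume_balanced_split(0.0, 1.0, keys))
print(data_balanced_split(0.0, 1.0, keys))
```

With skewed data the two cut points differ: the midpoint leaves three keys on one side and one on the other, whereas the median leaves two on each side.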
GRaSP
Space Partition for a 3-sided query




                                      14
GRaSP
                  Data Insertion



A key k is inserted into every peer whose region
                  contains k
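A minimal sketch of this insertion rule, assuming point keys and closed axis-aligned regions (so a key lying on a shared boundary is stored by both neighbours). The names `contains` and `insert` are hypothetical, and the loop over all regions stands in for the routing a real overlay would perform.

```python
def contains(region, key):
    """True if the point `key` lies inside the closed box `region`."""
    lo, hi = region
    return all(l <= x <= h for l, x, h in zip(lo, key, hi))


def insert(regions, store, key):
    """Store `key` at every peer whose region contains it; return the owners."""
    owners = [pid for pid, region in regions.items() if contains(region, key)]
    for pid in owners:
        store.setdefault(pid, []).append(key)
    return owners


regions = {"0": ((0.0, 0.0), (0.5, 1.0)), "1": ((0.5, 0.0), (1.0, 1.0))}
store = {}
print(insert(regions, store, (0.2, 0.3)))  # one owner
print(insert(regions, store, (0.5, 0.5)))  # on the boundary: two owners
```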




                                                   17
GRaSP
                           Routing Tables


    Each peer knows a peer in
each complementary subtrie ...

    Complementary subtries of peer 0100: 1, 00, 011, 0101
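The complementary subtries of a peer can be derived from its ID alone: keep the first i bits and flip bit i+1. A short sketch (the function name is hypothetical):

```python
def complementary_prefixes(peer_id):
    """Return the prefixes of the subtries complementary to `peer_id`.

    For each position i, keep the first i bits and flip the next bit.
    """
    return [
        peer_id[:i] + ("1" if bit == "0" else "0")
        for i, bit in enumerate(peer_id)
    ]


print(complementary_prefixes("0100"))
```

For peer 0100 this yields 1, 00, 011 and 0101, matching the slide. A routing table therefore needs one entry per bit of the peer's ID.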

                                            18
GRaSP
                             Routing

  “In order to route a message from peer p to peer q, the message is
forwarded from p to a neighbor peer r included in a known subtrie closer
           to peer q. From r it is recursively forwarded to q.”
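The routing rule can be simulated locally as follows. This is a sketch under assumptions, not the GRaSP implementation: peer IDs are the leaves of a binary trie in which every internal node has two children, each routing-table entry is a randomly chosen peer from the corresponding complementary subtrie, and all function names are hypothetical.

```python
import random


def complementary_prefixes(peer_id):
    """Prefixes of the subtries complementary to `peer_id`."""
    return [peer_id[:i] + ("1" if b == "0" else "0") for i, b in enumerate(peer_id)]


def build_routing_tables(peer_ids, seed=0):
    """For every peer, pick one known peer in each complementary subtrie."""
    rng = random.Random(seed)
    tables = {}
    for p in peer_ids:
        table = {}
        for prefix in complementary_prefixes(p):
            members = sorted(q for q in peer_ids if q.startswith(prefix))
            if members:
                table[prefix] = rng.choice(members)
        tables[p] = table
    return tables


def route(tables, source, target):
    """Forward hop by hop; each hop shares a longer prefix with `target`."""
    path = [source]
    current = source
    while current != target:
        # Exactly one complementary subtrie of `current` contains `target`.
        current = next(
            peer for prefix, peer in tables[current].items()
            if target.startswith(prefix)
        )
        path.append(current)
    return path


peers = ["00", "0100", "0101", "011", "1"]
tables = build_routing_tables(peers)
print(route(tables, "1", "0100"))
```

Each hop extends the common prefix with the target by at least one bit, so the number of hops is bounded by the length of the target's ID.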




                                                                  19
Contents

GRaSP

              k-NN
 Evaluation
                     Conclusions




                                   20
Searching Algorithm
           Branch-and-bound algorithm



Priority queue PQ of candidate peers that may hold an answer
  better than the k-th answer found so far → Fringe


            1. Branch Step: expand PQ
            2. Bound Step: prune PQ
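The branch and bound steps can be illustrated with a centralized best-first search over a space-partitioning trie. This is a sketch under assumptions, not the distributed algorithm of the thesis: the whole trie lives in one process, regions are axis-aligned boxes, leaves play the role of peers, and the names `Node`, `min_dist` and `knn` are hypothetical.

```python
import heapq
import itertools
import math
from dataclasses import dataclass


@dataclass
class Node:
    lo: tuple             # lower corner of the region
    hi: tuple             # upper corner of the region
    points: tuple = ()    # data held by a leaf (a peer)
    children: tuple = ()  # two sub-regions for an internal node


def min_dist(q, node):
    """Smallest possible distance from q to any point of the node's region."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for l, x, h in zip(node.lo, q, node.hi)))


def knn(root, q, k):
    tie = itertools.count()                    # tie-breaker for equal bounds
    fringe = [(min_dist(q, root), next(tie), root)]
    best = []                                  # max-heap: (-distance, point)
    visited = 0
    while fringe:
        bound, _, node = heapq.heappop(fringe)
        # Bound step: nothing left in the fringe can beat the k-th answer.
        if len(best) == k and bound >= -best[0][0]:
            break
        if node.children:
            # Branch step: expand the fringe with the sub-regions.
            for child in node.children:
                heapq.heappush(fringe, (min_dist(q, child), next(tie), child))
            continue
        visited += 1                           # a leaf, i.e. a peer, is visited
        for p in node.points:
            d = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    answers = [p for _, p in sorted((-nd, p) for nd, p in best)]
    return answers, visited
```

In this sketch the fringe is ordered by the lower bound `min_dist`, so the search stops as soon as the closest remaining region is no closer than the current k-th answer.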



                                                  21
Searching Algorithm
          Parallel Searching vs Iterative Searching




      Parallel Searching requires huge message state!

Iterative Searching prunes larger regions of the data space!




                                                               22
Searching Algorithm




                      23
Searching Algorithm
Branch-and-bound algorithm

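             1?      d(q,s(1)) < d(q,a)
             00?     d(q,s(00)) > d(q,a)
             011?    d(q,s(011)) > d(q,a)
             0101?   d(q,s(0101)) < d(q,a)

A subtrie stays in the fringe only if its region is closer to q than the current k-th answer a.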




                                          24
Latency Complexity Theorem
                 Latency = |T| · O(log n)




Support Set T:




                                        25
Latency Complexity Theorem
                                    Proof




Peers visited:


Peers in T:

                                            |T| peers


                 Find peer in the
                 complementary
                 subtrie: O(log n)
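The proof outline can be summarized as a one-line bound. This is a sketch, assuming that each peer in the support set T is visited once and that reaching each one costs O(log n) routing hops, as the slide states:

```latex
\[
  \text{Latency} \;=\; \sum_{t \in T} O(\log n) \;=\; |T| \cdot O(\log n)
\]
```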



                                                        26
Contents

GRaSP

              k-NN
 Evaluation
                     Conclusions




                                   27
Performance Evaluation
Taking into account the number of dimensions




Low          Medium               High



                                           28
Performance Evaluation
                   Metrics




•   Data Fairness Index
•   Latency
•   Max Throughput
•   Fringe Size (mean, max)
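The slides do not define the Data Fairness Index. A common choice for such a metric is Jain's fairness index, so the sketch below assumes that definition; treat it as an illustration rather than the formula used in the thesis. The function name is hypothetical.

```python
def jain_fairness_index(loads):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Returns 1.0 when every peer stores the same amount of data and
    approaches 1/n when a single peer stores everything.
    """
    total = sum(loads)
    squares = sum(x * x for x in loads)
    if squares == 0:
        return 1.0  # no data at all: trivially even
    return total * total / (len(loads) * squares)


print(jain_fairness_index([5, 5, 5, 5]))   # perfectly even load
print(jain_fairness_index([10, 0, 0, 0]))  # one peer holds everything
```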



                               29
Low dimensions




Low   Medium       High


                          30
Low dimensions
                  Workloads

Datasets

• Greece, data-balanced partition,
  k=1/10/100
• Greece, volume-balanced partition, k=1

Querysets

• Synthetic queries
• For a network of n peers, we issued n/3
  queries
                                               31
Low dimensions
Which space partition is the best?




     Volume-          Data-
     balanced        balanced




                                     32
Low dimensions
Data FI vs Space Partition
Which space partition is the best?
Greece ...
Plot panels: Data-balanced partition, Volume-balanced partition
33

Low dimensions
Latency vs Space Partition
Which space partition is the best?
Greece, k=1 ...
Plot panels: Data-balanced partition, Volume-balanced partition
34

Low dimensions
Fringe Size vs Space Partition
Which space partition is the best?
Greece, k=1 ...
Plot panels: Data-balanced partition, Volume-balanced partition
35

Low dimensions
Max Throughput vs Space Partition
Which space partition is the best?
Greece, k=1 ...
Plot panels: Data-balanced partition, Volume-balanced partition
36
Low dimensions
Which space partition is the best?

 Volume-               Data-
 balanced             balanced




                                     37
Low dimensions
      k?




                 38
Low dimensions
Fringe Size vs k
How is the size of the fringe affected?
Greece, data-balanced partition ...
Plot panels: k=1, k=10, k=100
39

Low dimensions
Latency vs k
How is the latency affected?
Greece, data-balanced partition ...
Plot panels: k=1, k=10, k=100
40

Low dimensions
Max Throughput vs k
How is the Max. Throughput affected?
Greece, data-balanced partition ...
Plot panels: k=1, k=10, k=100
41
Low dimensions … efficient routing!




                                  42
Medium dimensions




Low    Medium        High


                            43
Medium dimensions
                 Workloads

Datasets

• Uniform, volume-balanced partition, k=1
• ColorMoments, data-balanced partition,
  k=1

Querysets

• Synthetic queries
• For a network of n peers, we issued
  n/3 queries
                                            44
Medium dimensions
How is the size of the fringe affected?
45

Medium dimensions
How is the size of the fringe affected?
ColorMoments, data-balanced, k=1
46

Medium dimensions
How is the size of the fringe affected?
Uniform, volume-balanced, k=1
Plot panels: Mean Fringe Size, Max. Fringe Size
47
Medium dimensions
Data Fairness Index
48

Medium dimensions
Data Fairness Index
ColorMoments, data-balanced, k=1
49

Medium dimensions
Data Fairness Index
Uniform, volume-balanced, k=1
50
Medium dimensions
Latency
51

Medium dimensions
Latency
ColorMoments, data-balanced, k=1
52

Medium dimensions
Latency
Uniform, volume-balanced, k=1
53
Medium dimensions
                 Latency



Latency is high but close to the optimum!




                                      54
Medium dimensions
Max. Throughput
55

Medium dimensions
Max. Throughput
ColorMoments, data-balanced, k=1
56

Medium dimensions
Max. Throughput
Uniform, volume-balanced, k=1
57
Medium dimensions … routing is not efficient,
    but it is close to the optimum!

It is still good enough for practical
             applications!


                                     58
High dimensions




Low    Medium       High


                           59
High dimensions
                 Curse of dimensionality




          “When the dimensionality increases,
                 the volume of the space
increases so fast that the available data becomes sparse.”
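One way to see this effect numerically is to compare the volume of the ball inscribed in the unit hypercube with the volume of the cube itself. The sketch below uses the standard formula for the volume of a d-dimensional ball; the function name is hypothetical and the example is an illustration, not part of the thesis evaluation.

```python
import math


def ball_fraction(d):
    """Fraction of the unit hypercube [0, 1]^d covered by its inscribed ball.

    Uses V_d(r) = pi^(d/2) * r^d / Gamma(d/2 + 1) with r = 0.5.
    """
    return math.pi ** (d / 2) * 0.5 ** d / math.gamma(d / 2 + 1)


for d in (2, 5, 10, 20):
    print(d, ball_fraction(d))
```

The fraction shrinks rapidly as d grows, so a fixed-radius neighborhood around a query covers a vanishing share of the space and distance-based pruning loses its power.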




                                                       60
Contents

GRaSP

              k-NN
 Evaluation
                     Conclusions




                                   61
Conclusions

API
Searching (k-NN) | Trie | Data Ins/Rem
Query Types | Data Types | Space Partition
Metric Space
                                       62
Future Work
• Approximate k-NN searching for high dimensions
• Redundancy


                      63
THANK YOU
 QUESTIONS ?




               64

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 

Branch-and-bound nearest neighbor searching over unbalanced trie-structured overlays

  • 1. Branch-and-bound nearest neighbor searching over unbalanced trie-structured overlays. Master’s Thesis Presentation, Technical University of Crete, 4.2.2013. Author: Michail Argyriou. Supervisor: Ass’t Prof. Vasilis Samoladas
  • 2. P2P Evolution (timeline figure, 1998 to 2002): Centralized, Semi-distributed, Fully-distributed; DHTs appear in the 2001 row
  • 4. DHT Frameworks Evolution. 2003: PGrid (rectangular queries support; peers only on leaves; high-dimensional queries support with space-filling curves). 2006: VBI (height-balanced search tree limitation). 2008: GRaSP (no height-balanced search tree limitation; abstract types of data and queries; data: point, rectangular; queries: point, 3-sided, n-d rectangular)
  • 6. Given a distributed data set, how can we find the k most similar data items to a query? “k-Nearest Neighbor Search”
  • 7. Applications: GIS; Distributed Databases; Statistical Classification; Recommendation Systems; Cluster analysis; Similarity Scores
  • 8. Related Work: 1. Naïve algorithm: a central peer collects the data and performs k-NN searching. 2. k-NN search algorithm over CAN. 3. Distributed quad-based index → each quadtree block is uniquely identified by its centroid → mapped to Chord → k-NN search algorithm
  • 9. Contents: GRaSP, k-NN, Evaluation, Conclusions
  • 10. GRaSP
  • 11. GRaSP: Building the trie ... Hierarchical space partition: (1) peer p joins; (2) p finds a bootstrapping peer q; (3) space region s(q) splits into s(q0) and s(q1)
  • 12. GRaSP Space Partition: Volume-balanced (Before); Data-balanced (Before)
  • 13. GRaSP Space Partition for a 3-sided query
  • 14. GRaSP Space Partition for a 3-sided query
  • 15. GRaSP Space Partition for a 3-sided query
  • 16. GRaSP Data Insertion: a key k is inserted into all peers whose regions contain k
  • 17. GRaSP Routing Tables: each peer knows a peer in each complementary subtrie ... For peer 0100, the complementary subtries are 1, 00, 011, and 0101
  • 18. GRaSP Routing: “In order to route a message from peer p to peer q, the message is forwarded from p to a neighbor peer [r] included in a known subtrie closer to peer q. From r it is recursively forwarded to q.”
  • 19. Contents: GRaSP, k-NN, Evaluation, Conclusions
  • 20. Searching Algorithm: Branch-and-bound algorithm. Priority queue PQ of candidate peers holding an answer better than the k-th answer found so far → Fringe. 1. Branch step: expand PQ. 2. Bound step: prune PQ
  • 21. Searching Algorithm: Parallel Searching vs Iterative Searching. Parallel searching requires huge message state! Iterative searching prunes larger regions of the data space!
  • 23. Searching Algorithm: Branch-and-bound algorithm. 1? d(q,s(1)) < d(q,a); 00? d(q,s(00)) > d(q,a); 011? d(q,s(011)) > d(q,a); 0101? d(q,s(0101)) < d(q,a)
  • 24. Latency Complexity Theorem: Latency = |T| · O(log n). Support Set T:
  • 25. Latency Complexity Theorem, Proof: peers visited are the peers in T, i.e. |T| peers; finding a peer in the complementary subtrie takes O(log n)
  • 26. Contents: GRaSP, k-NN, Evaluation, Conclusions
  • 27. Performance Evaluation: taking into account the number of dimensions (Low, Medium, High)
  • 28. Performance Evaluation Metrics: Data Fairness Index; Latency; Max Throughput; Fringe Size (mean, max)
  • 29. Low dimensions (Low, Medium, High)
  • 30. Low dimensions Workloads. Datasets: Greece, data-balanced partition, k=1/10/100; Greece, volume-balanced partition, k=1. Querysets: synthetic queries; for a network size of n peers we asked n/3 queries
  • 31. Low dimensions: Which space partition is the best? Volume-balanced or data-balanced
  • 32. Low dimensions: Data FI vs Space Partition. Which space partition is the best? Greece ... (panels: data-balanced partition, volume-balanced partition)
  • 33. Low dimensions: Latency vs Space Partition. Which space partition is the best? Greece, k=1 ... (panels: data-balanced partition, volume-balanced partition)
  • 34. Low dimensions: Fringe Size vs Space Partition. Which space partition is the best? Greece, k=1 ... (panels: data-balanced partition, volume-balanced partition)
  • 35. Low dimensions: Max Throughput vs Space Partition. Which space partition is the best? Greece, k=1 ... (panels: data-balanced partition, volume-balanced partition)
  • 36. Low dimensions: Which space partition is the best? Volume-balanced or data-balanced
  • 37. Low dimensions: k?
  • 38. Low dimensions: Fringe Size vs k. How is the size of the fringe affected? Greece, data-balanced partition ... (panels: k=1, k=10, k=100)
  • 39. Low dimensions: Latency vs k. How is the latency affected? Greece, data-balanced partition ... (panels: k=1, k=10, k=100)
  • 40. Low dimensions: Max Throughput vs k. How is the Max. Throughput affected? Greece, data-balanced partition ... (panels: k=1, k=10, k=100)
  • 41. Low dimensions: … efficient routing!
  • 42. Medium dimensions (Low, Medium, High)
  • 43. Medium dimensions Workloads. Datasets: Uniform, volume-balanced partition, k=1; ColorMoments, data-balanced partition, k=1. Querysets: synthetic queries; for a network size of n peers we asked n/3 queries
  • 44. Medium dimensions: How is the size of the fringe affected?
  • 45. Medium dimensions: How is the size of the fringe affected? ColorMoments, data-balanced, k=1
  • 46. Medium dimensions: How is the size of the fringe affected? Uniform, volume-balanced, k=1 (panels: Mean Fringe Size, Max. Fringe Size)
  • 47. Medium dimensions: Data Fairness Index
  • 48. Medium dimensions: Data Fairness Index. ColorMoments, data-balanced, k=1
  • 49. Medium dimensions: Data Fairness Index. Uniform, volume-balanced, k=1
  • 50. Medium dimensions: Latency
  • 51. Medium dimensions: Latency. ColorMoments, data-balanced, k=1
  • 52. Medium dimensions: Latency. Uniform, volume-balanced, k=1
  • 53. Medium dimensions: Latency. Latency is high but close to the optimum!
  • 54. Medium dimensions: Max. Throughput
  • 55. Medium dimensions: Max. Throughput. ColorMoments, data-balanced, k=1
  • 56. Medium dimensions: Max. Throughput. Uniform, volume-balanced, k=1
  • 57. Medium dimensions: … routing is not efficient, but it is near the optimum! It is still good enough for practical applications!
  • 58. High dimensions (Low, Medium, High)
  • 59. High dimensions: Curse of dimensionality. “When the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse.”
  • 60. Contents: GRaSP, k-NN, Evaluation, Conclusions
  • 61. Conclusions (API diagram): Searching (k-NN); Data Ins/Rem; Trie; Query Types; Data Types; Space Partition; Metric Space
  • 62. Future Work: approximate k-NN searching for high dimensions; redundancy
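
The routing-table and branch-and-bound slides can be illustrated with a small sketch. It is a centralized, single-process simulation (Python 3.8+), not the distributed algorithm from the thesis. It assumes a 2-d unit square that is halved at the midpoint along alternating axes (a volume-balanced-style partition), a search that starts at the trie root rather than at the querying peer, and a `leaves` dictionary that forms a complete prefix-free cover of the trie. All function names (`complementary_subtries`, `region`, `min_dist`, `knn`) and the sample points are hypothetical.

```python
import heapq
import math


def complementary_subtries(peer_id):
    """Prefixes of the subtries that do NOT contain peer_id (flip one bit at each depth)."""
    return [peer_id[:i] + ("1" if bit == "0" else "0") for i, bit in enumerate(peer_id)]


def region(prefix, dims=2):
    """Assumed partition: halve the unit cube at the midpoint, cycling through the axes."""
    lo, hi = [0.0] * dims, [1.0] * dims
    for depth, bit in enumerate(prefix):
        axis = depth % dims
        mid = (lo[axis] + hi[axis]) / 2
        if bit == "0":
            hi[axis] = mid  # bit 0 keeps the lower half
        else:
            lo[axis] = mid  # bit 1 keeps the upper half
    return lo, hi


def min_dist(q, box):
    """Smallest Euclidean distance from point q to an axis-aligned box, i.e. d(q, s(prefix))."""
    lo, hi = box
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def knn(q, k, leaves):
    """Best-first branch-and-bound k-NN over a binary trie whose leaves hold the points."""
    best = []             # max-heap of the k best answers so far, stored as (-distance, point)
    fringe = [(0.0, "")]  # min-heap of (lower bound, prefix); the root has bound 0
    while fringe:
        bound, prefix = heapq.heappop(fringe)
        if len(best) == k and bound >= -best[0][0]:
            break  # bound step: no remaining region can beat the k-th answer
        if prefix in leaves:
            for p in leaves[prefix]:
                d = math.dist(q, p)
                if len(best) < k:
                    heapq.heappush(best, (-d, p))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, p))
        else:
            # branch step: expand the fringe with both child subtries
            for bit in "01":
                child = prefix + bit
                heapq.heappush(fringe, (min_dist(q, region(child, len(q))), child))
    return [p for _, p in sorted(best, key=lambda e: -e[0])]


# Unbalanced trie with the peer ids used on the routing-table slide.
leaves = {
    "1": [(0.9, 0.5), (0.6, 0.1)],
    "00": [(0.1, 0.1), (0.3, 0.4)],
    "011": [(0.4, 0.9)],
    "0100": [(0.1, 0.6)],
    "0101": [(0.2, 0.8)],
}
print(complementary_subtries("0100"))  # ['1', '00', '011', '0101']
print(knn((0.2, 0.65), 2, leaves))     # [(0.1, 0.6), (0.2, 0.8)]
```

In this sketch the fringe is ordered by the lower bound d(q, s(prefix)), so the search stops as soon as the closest unexplored region is no closer than the current k-th answer. In the distributed setting each leaf visit would instead cost a routed message.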