MANIFOLDS IN SEMI-SUPERVISED LEARNING
Monojit Basu
Director, TechYugadi IT Solutions & Consulting, Bangalore
EXTENDED
Outline
● Semi-supervised Learning and Graph-based Algorithms
● Data Distribution on Manifold and Multi-manifold
● Classification Algorithms with Manifold Regularization
● Implementation Hints
● Closing Remarks
Semi-supervised Learning: Overview
● Training samples consist of data with and without class labels
● Images with and without captions
● Text with and without tags, ...
● Model is built with both labeled and unlabeled data
● Prob(y|x) is estimated from labeled samples
● Prob(x) is estimated from both labeled and unlabeled samples
● Smoothness Property: If two data points are close, their labels
should be similar
Graph-based Algorithms For SSL
● There are many ways of exploiting the smoothness property
● A simplistic baseline approach is self-training (not graph-based)
● Graph-based Algorithms are particularly effective
● Label Propagation
● Random-Walk
● Min-Cut
● Density-based Distances
● Local and Global Consistency
● Using Graph Kernels, ...
Label Propagation
● Generates a weighted graph where edges between similar
neighbours have higher weights (Zhu and Ghahramani, 2002)
● Defines a transition matrix:
● Tij = probability of node i ‘jumping’ into node j, that is, taking up j’s label
● Repeatedly multiplies the current label matrix with the transition
matrix (the label matrix itself gets updated; known labels are reset after each step)
● Until labels on all nodes stabilize (convergence)
● In effect labels propagate from labeled to unlabeled nodes
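The iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the Gaussian edge weights, the `sigma` bandwidth, the row-normalised transition matrix and the convention that `-1` marks an unlabeled point are all assumptions made for the example.

```python
import numpy as np

def label_propagation(X, y, sigma=1.0, max_iter=1000, tol=1e-6):
    """Propagate labels over a fully connected Gaussian-weighted graph.

    X: (n, d) data points; y: integer labels, with -1 for unlabeled points
    (an assumed convention for this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = y >= 0
    classes = np.unique(y[labeled])

    # Edge weights: closer points get higher weights.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))

    # Transition matrix: T[i, j] is the share of node j's label taken up by node i.
    T = W / W.sum(axis=1, keepdims=True)

    # Label matrix: one-hot rows for labeled nodes, uniform rows otherwise.
    clamp = (y[labeled, None] == classes[None, :]).astype(float)
    Y = np.full((len(X), len(classes)), 1.0 / len(classes))
    Y[labeled] = clamp

    for _ in range(max_iter):
        Y_new = T @ Y
        Y_new /= Y_new.sum(axis=1, keepdims=True)  # keep rows as distributions
        Y_new[labeled] = clamp                     # reset the known labels
        converged = np.abs(Y_new - Y).max() < tol
        Y = Y_new
        if converged:
            break
    return classes[Y.argmax(axis=1)]

# Two well-separated 1-D groups with one labeled point each.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, -1, -1, 1, -1, -1])
pred = label_propagation(X, y)
```

Each unlabeled point ends up with the label of the labeled point in its own group, because the weights between the two groups are negligible.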
(Figure: a graph with nodes labeled 1, nodes labeled 0, and unlabeled nodes)
Manifold Structures
● Data (nodes) are distributed over low and high density regions
● Two nodes that are geometrically close may not be similar
● Or equivalently, the geometry / distance measure should be redefined
● Euclidean distances and weights based on them may not work
● Such data is said to lie on a manifold
● Although not necessary, manifold structures are often
observed with high-dimensional data
● More complex scenario: data may not lie on a single manifold
● This is called a multi-manifold structure
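To make the point about Euclidean distances concrete, here is a small sketch on a 2-D spiral (a flattened stand-in for the Swiss roll). The spiral, the `radius` of the neighbourhood graph and the use of Floyd-Warshall shortest paths are illustrative assumptions, not part of the talk.

```python
import numpy as np

# Points along a 2-D Archimedean spiral.
t = np.linspace(2 * np.pi, 6 * np.pi, 201)
X = np.column_stack((t * np.cos(t), t * np.sin(t)))

# Pairwise Euclidean distances.
diff = X[:, None, :] - X[None, :, :]
euclid = np.sqrt((diff ** 2).sum(axis=2))

# Neighbourhood graph: keep only short edges (radius is an assumed parameter).
radius = 2.0
graph = np.where(euclid <= radius, euclid, np.inf)

# All-pairs shortest paths (Floyd-Warshall) approximate distances along the curve.
geodesic = graph.copy()
for k in range(len(X)):
    geodesic = np.minimum(geodesic, geodesic[:, [k]] + geodesic[[k], :])

# Points 0 and 100 sit at the same angle, one full turn apart.
i, j = 0, 100
straight_line = euclid[i, j]     # short: cuts across the gap between turns
along_manifold = geodesic[i, j]  # long: has to follow the spiral
```

The two points look close in the ambient space but are far apart when the path has to stay on the data, which is why weights built directly from Euclidean distances can connect dissimilar nodes.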
Single Manifold Structures
(Figures: Swiss roll; two moons)
Multi-manifold Structures
(Figures: dollar symbol ($); surface sphere)
Manifold Regularization
● This is the technical term for semi-supervised classification of
data distributed on a (single) manifold (Belkin et al., 2006)
● Key is to establish connectivity between similar nodes by
staying along a high-density region
● Mathematically it involves
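● Computing a matrix L (the graph Laplacian) derived from the ordinary weight matrix W
● Taking the first n eigenvectors of L (those with the smallest eigenvalues)
● Computing an indicator function using the dot product of a data point's
eigenvector coordinates with a coefficient vector fitted on the labeled points
● It is based on the theory of Reproducing Kernel Hilbert Spaces (RKHS)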
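The steps listed above can be sketched as follows. This is a simplified eigenvector-based classifier, not the full Laplacian-regularized algorithms of Belkin et al.: the symmetric k-nearest-neighbour weights, the least-squares fit, the parameter names and the convention that `0` marks an unlabeled point are all assumptions for illustration.

```python
import numpy as np

def laplacian_eigen_classifier(X, y, n_neighbors=5, n_eig=2):
    """Binary classifier from Laplacian eigenvectors.

    y holds +1 / -1 for labeled points and 0 for unlabeled ones
    (an assumed convention for this sketch).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(X)

    # Weight matrix W: symmetric k-nearest-neighbour graph with 0/1 weights.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    nbrs = np.argsort(sq_dist, axis=1)[:, 1:n_neighbors + 1]  # skip the point itself
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)

    # Graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # Eigenvectors for the smallest eigenvalues (eigh sorts them ascending).
    _, eigvecs = np.linalg.eigh(L)
    E = eigvecs[:, :n_eig]

    # Fit coefficients on the labeled points, then take the sign of the
    # dot product between each point's eigenvector coordinates and them.
    labeled = y != 0
    coef, *_ = np.linalg.lstsq(E[labeled], y[labeled], rcond=None)
    return np.where(E @ coef > 0, 1, -1)

# Noise-free two moons with a single labeled point on each moon.
s = np.linspace(0.0, np.pi, 50)
upper = np.column_stack((np.cos(s), np.sin(s)))
lower = np.column_stack((1.0 - np.cos(s), 0.5 - np.sin(s)))
X = np.vstack((upper, lower))
y = np.zeros(100)
y[0], y[50] = 1, -1
pred = laplacian_eigen_classifier(X, y)
```

With only two labels, the classifier separates the moons because the neighbourhood graph never links one moon to the other, so the leading eigenvectors are constant on each moon.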
Manifold Regularization (Schematic)
DATA → W → L = D - W → Eigen(L) → dot product with data point > 0 ? → +ve / -ve → CLASS LABELS
Multi-manifold Regularization
● This is the technical term for semi-supervised classification of
data distributed on a multi-manifold (Goldberg et al., 2009)
● Single manifold algorithm still starts with Euclidean distances,
but reformulates steps based on the derived matrix L
● Multi-manifold algorithm straight away changes distance
metrics
● It is based on Hellinger distances H, and
● A Mahalanobis k-nearest neighbor graph computed from H
● Complete algorithm is much longer, involving spectral
clustering and self-training on each cluster
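As a rough illustration of the first two stages, the sketch below estimates a local sample covariance around each point and compares two points through the Hellinger distance between zero-mean Gaussians with those covariances. The neighbourhood size `k`, the `ridge` term and the helper names are hypothetical; the remaining stages (Mahalanobis kNN graph, spectral clustering, self-training) are not shown.

```python
import numpy as np

def local_covariance(X, i, k=10, ridge=1e-6):
    """Sample covariance of the k points nearest to X[i] (including X[i]).

    The ridge term keeps the matrix invertible when the local data is
    degenerate; both k and ridge are assumed parameters.
    """
    d2 = ((X - X[i]) ** 2).sum(axis=1)
    nbrs = np.argsort(d2)[:k]
    cov = np.cov(X[nbrs], rowvar=False)
    return cov + ridge * np.eye(X.shape[1])

def hellinger_gaussian(S1, S2):
    """Hellinger distance between N(0, S1) and N(0, S2)."""
    S = 0.5 * (S1 + S2)
    ld1 = np.linalg.slogdet(S1)[1]
    ld2 = np.linalg.slogdet(S2)[1]
    ld = np.linalg.slogdet(S)[1]
    # Bhattacharyya coefficient, computed with log-determinants for stability.
    bc = np.exp(0.25 * ld1 + 0.25 * ld2 - 0.5 * ld)
    return float(np.sqrt(max(0.0, 1.0 - bc)))

# Two line segments with different orientations.
u = np.linspace(-1.0, 1.0, 40)
horizontal = np.column_stack((u, np.full(40, 5.0)))
vertical = np.column_stack((np.full(40, -5.0), u))
X = np.vstack((horizontal, vertical))

h_same = hellinger_gaussian(local_covariance(X, 0), local_covariance(X, 1))
h_diff = hellinger_gaussian(local_covariance(X, 0), local_covariance(X, 40))
```

Points on the same segment share a local orientation and get a distance near zero, while points on differently oriented segments get a distance near one, which is the kind of signal used to tell manifolds apart.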
Multi-manifold Regularization (Schematic)
DATA → Σs (sample covariance matrices) → H → kNN graph → Spectral Clustering → Self-trained Clusters
Multi-view Semi-supervised Learning
● Multi-view learning involves two or more independent
projections for each data point
● Classic Example: web-page classification using
● Bag of words
● Links to other web-pages
● Instead of representing data as (X, y) where y is class label, it
may be represented as (X1, X2, y), where Xi are views
● Somewhat related to multimodal learning (like video and
audio)
Multi-view Manifold Regularization
● Can manifold regularization be extended to multi-view data?
● Yes, algorithms exist, based on strong mathematical
foundations, such as Sindhwani and Rosenberg, 2008
● There is actually a generic pattern for multi-view
semi-supervised learning, called co-training
● Sindhwani and Rosenberg extend co-training with an algorithm called
co-regularization
● It reduces the problem to a convex optimization to minimize a
loss function
● The total loss function depends on individual class predictors
for each view, and a couple of regularization hyperparameters
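One common form of such a loss, written here for linear predictors, combines a fit term on the labeled data, a norm penalty per view and a penalty on disagreement between the views over the unlabeled data. The sketch below is an assumption-laden illustration: Sindhwani and Rosenberg work in an RKHS, not with linear models, and the names `gamma1`, `gamma2` and `mu` are hypothetical labels for the hyperparameters.

```python
import numpy as np

def coreg_loss(w1, w2, A1, A2, y, B1, B2, gamma1, gamma2, mu):
    """Co-regularized least-squares loss for two linear view predictors.

    A1, A2: the two views of the labeled points; B1, B2: the two views
    of the unlabeled points; y: labels of the labeled points.
    """
    fit = y - 0.5 * (A1 @ w1 + A2 @ w2)   # averaged prediction vs. labels
    disagree = B1 @ w1 - B2 @ w2          # views should agree on unlabeled data
    return (fit @ fit
            + gamma1 * (w1 @ w1) + gamma2 * (w2 @ w2)
            + mu * (disagree @ disagree))

def coreg_fit(A1, A2, y, B1, B2, gamma1=0.1, gamma2=0.1, mu=1.0):
    """Minimise the convex loss above by solving its normal equations."""
    d1, d2 = A1.shape[1], A2.shape[1]
    A = 0.5 * np.hstack((A1, A2))
    B = np.hstack((B1, -B2))
    G = np.diag(np.r_[np.full(d1, gamma1), np.full(d2, gamma2)])
    w = np.linalg.solve(A.T @ A + G + mu * (B.T @ B), A.T @ y)
    return w[:d1], w[d1:]

# Synthetic random views, purely to exercise the code.
rng = np.random.default_rng(0)
A1, A2 = rng.normal(size=(10, 3)), rng.normal(size=(10, 2))
B1, B2 = rng.normal(size=(30, 3)), rng.normal(size=(30, 2))
y = np.sign(A1[:, 0])
w1, w2 = coreg_fit(A1, A2, y, B1, B2)
best = coreg_loss(w1, w2, A1, A2, y, B1, B2, 0.1, 0.1, 1.0)
```

Because the loss is a strictly convex quadratic in the stacked weights, the solution of the linear system is its unique minimiser; any perturbation of `w1` or `w2` increases the loss.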
Python Implementation
● An implementation of some of these algorithms in Python 3.x is
published on GitHub:
https://github.com/techyugadi/manifold_ssl
● These algorithms offer an interface similar to scikit-learn
● There are some programs to generate synthetic data and also
to use the MNIST handwritten digits data
● Note: as of this writing, scikit-learn supports only the label propagation
algorithm for semi-supervised learning
● An R package has more algorithms but not manifold regularization
● This is an early-access release; more algorithms are to be published!
Summary
● Manifold regularization is an improvement over the standard
label propagation algorithm for semi-supervised learning
● It may lead to better results when data is distributed over a
manifold or multi-manifold
● This class of algorithms covers a wide range of scenarios,
including multi-view datasets
● These algorithms can be implemented in Python using
NumPy and other common linear algebra packages (see GitHub)
References
● Zhu and Ghahramani, 2002: Learning from Labeled and
Unlabeled Data with Label Propagation
● Belkin, Niyogi and Sindhwani, 2006: Manifold Regularization:
A Geometric Framework for Learning from Labeled and
Unlabeled Examples
● Sindhwani and Rosenberg, 2008: An RKHS for Multi-View
Learning and Manifold Co-Regularization
● Goldberg, Zhu, Singh, Xu and Nowak, 2009: Multi-Manifold
Semi-Supervised Learning
THANK YOU
monojit@techyugadi.com

NODES 2020 extended - Manifolds in semi-supervised learning
