Pattern Recognition with Semi-Supervised Learning Algorithm
Presented by:
Anurodh Kumar Sinha
2nd Year MSLIS Student
DRTC, ISI Bangalore
2012-2013
12/3/2012 1
Agenda
• What is Pattern Recognition?
• What is Machine Learning and why do we need it?
• Types of Learning Algorithm
• Need for Semi-Supervised Learning
• Conclusion
What is a Pattern?
• An entity, vaguely defined, that could be
given a name,
• e.g.:
– handwritten word,
– human face,
– fingerprint image,
– speech signal,
What is a Feature?
• A Feature is an individual measurable heuristic property
of a phenomenon being observed
• Examples
• In speech recognition, features for recognizing
phonemes can include noise ratios, length of sounds,
relative power, filter matches and many others.
• In spam detection algorithms, features may include
whether certain email headers are present or absent,
whether they are well formed, what language the email
appears to be in, and the grammatical correctness of the text.
What is Pattern Recognition?
• Pattern recognition is the study of how
machines can:
– observe the environment,
– learn to distinguish patterns of interest,
– make sound and reasonable decisions about
the categories of the patterns.
“The assignment of a physical object or event to
one of several prespecified categories” -- Duda
& Hart
What is Pattern Recognition? (cont…)
• Some Applications:
Motivation for the Study of Pattern Recognition
The motivation is threefold:
• It is part of Artificial Intelligence, which is concerned with techniques that enable
computers to do things that seem intelligent when done by people.
• It is an important aspect of applying computers to the analysis and
classification of measurements taken from observed data.
• Pattern recognition techniques provide a unified framework, drawing on
mathematics and computer science, for studying a variety of techniques that
help machines make decisions.
Methodology of Pattern Recognition
It consists of the following:
1. We observe patterns.
2. We study the relationships between the various
patterns.
3. We study the relationships between patterns and
ourselves and thus arrive at situations.
4. We study the changes in situations and come to know
about the events.
5. We study events and thus find the rule behind the events.
6. Using the rule, we can predict future events.
An Example
• Suppose that:
– A fish packing plant
wants to automate the
process of sorting
incoming fish on a
conveyor belt according
to species,
– There are two species:
• Sea bass,
• Salmon.
An Example
How to distinguish one species from the other?
(length, width, weight, number and shape of fins,
tail shape, etc.)
An Example
• Suppose we also know that:
– Sea bass are typically wider than salmon.
– But it may happen that a decision can't be
made on a single feature.
• We can use more than one feature for our
decision:
– Lightness (x1) and width (x2)
Pattern Recognition Systems
[Figure: components of a typical pattern recognition system]
Examples of applications
• Optical Character Recognition (OCR)
– Handwritten: sorting letters by postal code,
input device for PDAs.
– Printed texts: reading machines for blind
people, digitization of text documents.
• Biometrics
– Face recognition, verification, retrieval.
– Fingerprint recognition.
– Speech recognition.
• Diagnostic systems
– Medical diagnosis: X-ray, EKG analysis.
– Machine diagnostics, waster detection.
• Military applications
– Automated Target Recognition (ATR).
– Image segmentation and analysis (recognition
from aerial or satellite photographs).
What is Machine Learning….?
• Machine Learning algorithms discover the relationships
between the variables of a system (input, output and
hidden) from direct samples of the system
• These algorithms originate from many fields:
– Statistics, mathematics, theoretical computer science,
physics, neuroscience, etc.
Why are Learning Algorithms Needed?
• Learning algorithms would not be needed if the relationships between all
system variables (input, output, and hidden) were completely understood.
• This is NOT the case for almost any real system!
• Growing flood of online data
• Computational power is available
• Progress in algorithms and theory
Learning Algorithm Application
• Data mining: using historical data to improve decisions
– medical records ⇒ medical knowledge
– log data ⇒ model of the user
• Software applications we can't program by hand
– autonomous driving
– speech recognition
• Self customizing programs
– Newsreader that learns user interests
Typical Example
Given:
• 9714 patient records, each describing a pregnancy and birth
• Each patient record contains 215 features
Learn to predict:
• Classes of future patients at high risk for Emergency Cesarean
Section
The Sub-Fields of Machine Learning
• Supervised Learning
• Unsupervised Learning
• Semi-Supervised Learning
Supervised Learning
• Supervised learning is the machine learning task of inferring
a function from labeled training data.
• The training data consist of pairs, each made up of an input object
(typically a vector) and a desired output value (also called the
supervisory signal).
• A supervised learning algorithm analyzes the training data
and produces an inferred function, which is called a classifier
(if the output is discrete) or a regression function (if the output
is continuous).
• The inferred function should predict the correct output value
for any valid input object. This requires the learning algorithm
to generalize from the training data to unseen situations in a
"reasonable" way.
Supervised Learning Process: Two Steps
Learning (training): Learn a model using the training data
Testing: Test the model using unseen test data to assess the model accuracy
$$\text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of test cases}}$$
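The accuracy measure used in the testing step can be computed directly from predicted and true labels. The following is a minimal sketch; the function name `accuracy` and the five loan decisions are hypothetical and only illustrate the ratio.

```python
def accuracy(predicted, actual):
    """Fraction of test cases whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one test case")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical decisions of a model on five unseen test cases.
predicted = ["Yes", "No", "Yes", "Yes", "No"]
actual = ["Yes", "No", "No", "Yes", "No"]
print(accuracy(predicted, actual))  # 4 of 5 correct -> 0.8
```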
Example
• A credit card company receives thousands of
applications for new cards. Each application
contains information about an applicant,
– age
– Job
– House
– credit rating
– etc.
• Problem: to decide whether an application should be
approved, or to classify applications into two
categories, approved and not approved.
An example: Data (Loan Application)
[Table: loan application data]
An example: The Learning Task
• Learn a classification model from the data
• Use the model to classify future loan applications
into
– Yes (approved) and
– No (not approved)
• What is the class for the following case/instance?
Bayesian Classifier
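• The Simple Bayesian Classifier (SBC) uses probabilistic
methods for classification
• The basis of the Bayesian classifier is: the probability of document
'd' being in class 'c' is computed as
$$P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$$
where P(t_k|c) is the conditional probability of term t_k occurring in a
document of class c, P(c) is the prior probability of class c, and n_d is
the number of terms in d.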
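A minimal sketch of such a classifier, assuming the multinomial naive Bayes form in which a document's score for class c combines the class prior with the conditional probabilities of its terms given c. The function names, the add-one smoothing, and the four toy documents are assumptions made for illustration, not taken from the slides.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns priors, term counts, vocabulary."""
    class_docs = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        term_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(docs) for c, n in class_docs.items()}
    return priors, term_counts, vocab


def classify_nb(tokens, priors, term_counts, vocab):
    """Return the class c maximizing log P(c) + sum_k log P(t_k | c)."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(term_counts[c].values())
        score = math.log(prior)
        for t in tokens:
            if t not in vocab:
                continue  # ignore terms never seen in training
            # Add-one (Laplace) smoothing: an assumption, not from the slides.
            score += math.log((term_counts[c][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy training set.
training = [
    ("cheap pills buy now".split(), "spam"),
    ("buy cheap watches".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("lunch on monday".split(), "ham"),
]
model = train_nb(training)
print(classify_nb("buy cheap pills".split(), *model))  # spam
```

Working with sums of logarithms instead of products of probabilities avoids numerical underflow on long documents.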
Simple Bayes Classifier
Unsupervised Learning
• Organizing data into classes such that:
– Inter-cluster distance is maximized
– Intra-cluster distance is minimized
• Finding the class labels and the number of classes directly from the data
(in contrast to classification).
• More informally, finding natural groupings among objects.
What is Unsupervised Learning?
• Unsupervised learning refers to the problem of trying to
find hidden structure in unlabeled data
• It is sometimes also referred to as clustering
What is a natural grouping among these objects?
Clustering is subjective
[Figure: the same objects can be grouped as Simpson's Family and School Employees, or as Females and Males]
What is clustering for?
Let us see some real-life examples
• Example 1: Group people of similar sizes together to
make “small”, “medium” and “large” T-Shirts.
– Tailor-made for each person: too expensive
– One-size-fits-all: does not fit all.
• Example 2: Given a collection of text documents, we
want to organize them according to their content
similarities,
– To produce a topic hierarchy
What is clustering for? (cont…)
In fact, clustering is one of the most utilized
data mining techniques
– It has a long history and is used in almost every field,
e.g., medicine, psychology, botany, sociology, biology,
archeology, marketing, insurance, libraries, etc.
– In recent years, due to the rapid increase of online
documents, text clustering has become important.
K-means algorithm
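The slide body for this algorithm did not survive, so the following is only a sketch of the standard K-means iteration (Lloyd's algorithm), which may differ in detail from the variant presented. The function name `kmeans`, the explicit initial centroids, and the six sample points are assumptions for illustration.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate an assignment step and a centroid update."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


# Hypothetical 2-D data with two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Each iteration can only lower the total intra-cluster distance, which matches the clustering objective of small intra-cluster and large inter-cluster distances.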
An example
[Figure: K-means clustering example, shown over two slides]
Semi-Supervised learning
Supervised Learning versus Unsupervised Learning
• Unsupervised clustering: group similar objects together
to find clusters
– Minimize intra-class distance
– Maximize inter-class distance
• Supervised classification: class label for each training
sample is given
– Build a model from the training data
– Predict class label on unseen future data points
However, for many problems, labeled
data can be rare or expensive:
– Need to pay someone to do it, requires special testing, …
Unlabeled data is much cheaper. Examples:
– Speech
– Images
– Medical outcomes
– Customer modeling
– Protein sequences
– Web pages
Why Semi-Supervised Learning?
• Why not clustering?
– The clusters produced may not be the ones
required.
– Sometimes there are multiple possible
groupings.
• Why not classification?
– Sometimes there is insufficient labeled data.
Semi-Supervised Learning
• Combines labeled and unlabeled data
during training to improve performance:
– Semi-supervised classification: Training on labeled data exploits
additional unlabeled data, frequently resulting in a more accurate
classifier.
– Semi-supervised clustering: Uses small amount of labeled data to
aid and bias the clustering of unlabeled data.
[Figure: semi-supervised learning sits between unsupervised clustering and supervised classification]
Semi-Supervised Classification
• An initial classifier is designed using the labeled data set D(l).
This classifier is then used to assign class labels to the examples
in the unlabeled set D(u). The classifier is then re-trained using D(l) ∪ D(u).
• The last two steps are usually repeated a given number of
times or until some stopping criterion is satisfied.
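The loop described above can be sketched as follows. The base classifier here is a nearest-centroid model, chosen only to keep the example short; it is an assumption, as are the function names, the stopping rule (predicted labels stop changing), and the sample points.

```python
import math


def fit_centroids(X, y):
    """Base classifier (an assumption): one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: tuple(v / counts[c] for v in s) for c, s in sums.items()}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))


def self_train(X_l, y_l, X_u, max_rounds=10):
    """Train on the labeled set, label the unlabeled set, retrain on the union."""
    model = fit_centroids(X_l, y_l)  # initial classifier from labeled data only
    pseudo = None
    for _ in range(max_rounds):
        new_pseudo = [predict(model, x) for x in X_u]  # label the unlabeled set
        if new_pseudo == pseudo:
            break  # stopping criterion: assigned labels are stable
        pseudo = new_pseudo
        model = fit_centroids(X_l + X_u, y_l + pseudo)  # retrain on the union
    return model, pseudo


# Hypothetical data: two labeled points and five unlabeled ones.
X_l = [(0.0, 0.0), (10.0, 10.0)]
y_l = ["a", "b"]
X_u = [(1.0, 1.0), (2.0, 1.0), (9.0, 9.0), (8.0, 9.5), (4.0, 4.0)]
model, pseudo = self_train(X_l, y_l, X_u)
print(pseudo)  # ['a', 'a', 'b', 'b', 'a']
```

A common refinement, not shown here, is to add only the unlabeled examples the classifier is most confident about in each round.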
Semi-Supervised Classification Example
[Figure: scatter plots of data points, shown over two slides]
Semi-Supervised Classification
• Algorithms:
– Semi-supervised EM
[Ghahramani:NIPS94, Nigam:ML00].
– Co-training [Blum:COLT98].
– Transductive SVMs [Vapnik:98, Joachims:ICML99].
– Graph-based algorithms
• Assumptions:
– Known, fixed set of categories given in the labeled
data.
– Goal is to improve classification of examples into
these known categories.
Semi-Supervised clustering
• Input:
– A set of unlabeled objects, each described by a set of attributes
(numeric and/or categorical)
– A small amount of domain knowledge
• Output:
– A partitioning of the objects into k clusters (possibly with some
discarded as outliers)
• Objective:
– Maximum intra-cluster similarity
– Minimum inter-cluster similarity
– High consistency between the partitioning and the domain
knowledge
How Is Semi-Supervised Clustering Done?
• In addition to the similarity information used by unsupervised
clustering, in many cases a small amount of knowledge is available
concerning either pairwise (must-link or cannot-link) constraints
between data items or class labels for some items.
• Instead of simply using this knowledge for the external validation of
the results of clustering, one can imagine letting it “guide” or “adjust”
the clustering process, i.e. provide a limited form of supervision. The
resulting approach is called semi-supervised clustering
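One way pairwise constraints can guide the clustering process is to reject any cluster assignment that would break a must-link or cannot-link constraint, in the spirit of constrained K-means variants. The sketch below shows only that check; the function name, the dictionary-based `assignment`, and the sample items are hypothetical.

```python
def violates_constraints(i, cluster, assignment, must_link, cannot_link):
    """Return True if placing item i in `cluster` breaks a pairwise constraint.

    assignment maps already-placed item indices to cluster ids;
    must_link and cannot_link are lists of (item, item) index pairs.
    """
    for a, b in must_link:
        if i in (a, b):
            other = b if i == a else a
            if other in assignment and assignment[other] != cluster:
                return True  # must-link partner sits in a different cluster
    for a, b in cannot_link:
        if i in (a, b):
            other = b if i == a else a
            if assignment.get(other) == cluster:
                return True  # cannot-link partner is already in this cluster
    return False


# Hypothetical state: items 0 and 1 are placed, items 2 and 3 are not.
assignment = {0: "red", 1: "blue"}
must_link = [(0, 2)]    # item 2 must share a cluster with item 0
cannot_link = [(1, 3)]  # item 3 must not share a cluster with item 1

print(violates_constraints(2, "blue", assignment, must_link, cannot_link))  # True
print(violates_constraints(2, "red", assignment, must_link, cannot_link))   # False
print(violates_constraints(3, "blue", assignment, must_link, cannot_link))  # True
```

In a constrained clustering loop, each item would be assigned to the nearest cluster for which this check returns False.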
Illustration
[Figure: a must-link constraint is used to determine the label of point x, which is assigned to the red class]
[Figure: a cannot-link constraint is used to determine the label of point x, which is assigned to the red class]
Semi-Supervised Clustering
• According to different given domain knowledge:
– Users provide class labels (seeded points) a priori to
some of the documents
– Users know which few documents are related
(must-link) or unrelated (cannot-link)
[Figure: seeded points, must-link and cannot-link constraints]
Semi-supervised Clustering Algorithm
• Semi-supervised clustering with labels (partial label
information is given):
– SS-Seeded-Kmeans (Sugato Basu et al., ICML 2002)
– SS-Constrained-Kmeans (Sugato Basu et al., ICML 2002)
• Semi-supervised clustering with constraints (pairwise
constraints, must-link and cannot-link, are given):
– SS-COP-Kmeans (Wagstaff et al., ICML 2001)
– SS-HMRF-Kmeans (Sugato Basu et al., ACM SIGKDD 2004)
– SS-Kernel-Kmeans (Brian Kulis et al., ICML 2005)
– SS-Spectral-Normalized-Cuts (X. Ji et al., ACM SIGIR 2006)
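For the label-based variants listed above, a rough sketch of the idea is to compute the initial centroids from the seeded points and then run ordinary K-means; optionally the seed labels are also kept fixed during every assignment step. This is a simplified illustration and not the published implementation; the function name, the flag `keep_seed_labels`, and the sample data are hypothetical.

```python
import math


def seeded_kmeans(points, seeds, keep_seed_labels=False, max_iter=100):
    """points: list of tuples; seeds: {point index: cluster id}.

    Every cluster id must appear among the seeds at least once.
    """
    def mean(members):
        return tuple(sum(x) / len(members) for x in zip(*members))

    clusters = sorted(set(seeds.values()))
    # Initialization: each centroid is the mean of that cluster's seed points.
    centroids = {
        c: mean([points[i] for i, s in seeds.items() if s == c]) for c in clusters
    }
    assignment = None
    for _ in range(max_iter):
        new_assignment = []
        for i, p in enumerate(points):
            if keep_seed_labels and i in seeds:
                new_assignment.append(seeds[i])  # seed label stays fixed
            else:
                new_assignment.append(
                    min(clusters, key=lambda c: math.dist(p, centroids[c]))
                )
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in clusters:
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return centroids, assignment


# Hypothetical data: one seed per cluster.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = seeded_kmeans(points, seeds={0: "A", 3: "B"})
print(labels)  # ['A', 'A', 'A', 'B', 'B', 'B']
```

With `keep_seed_labels=True` the seeded points never change cluster, which is useful when the seed labels are trusted; with the default they only influence the starting centroids.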
Co-Training Algorithm
Conclusions
• Semi-supervised learning is an area of increasing
importance in Machine Learning.
• Automatic methods of collecting data make it more
important than ever to develop methods to make use
of unlabeled data.
• Several promising algorithms exist (only a few were discussed here),
along with new theoretical frameworks to help guide further
development.
References
• Duda, Hart: Pattern Classification and Scene Analysis. J. Wiley & Sons, New York, 1982. (2nd edition 2000).
• Fukunaga: Introduction to Statistical Pattern Recognition. Academic Press, 1990.
• Sergios Theodoridis, Konstantinos Koutroumbas: Pattern Recognition. Elsevier (USA), 1982.
• K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In Proceedings of the Ninth International Conference on Information and Knowledge Management, pages 86–93. ACM, 2000.
• http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
• http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html
Any Questions, Suggestions, or Feedback?
Thank You
5712/3/2012

More Related Content

What's hot

PPT on BRAIN TUMOR detection in MRI images based on IMAGE SEGMENTATION
PPT on BRAIN TUMOR detection in MRI images based on  IMAGE SEGMENTATION PPT on BRAIN TUMOR detection in MRI images based on  IMAGE SEGMENTATION
PPT on BRAIN TUMOR detection in MRI images based on IMAGE SEGMENTATION khanam22
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
Neural Network Based Brain Tumor Detection using MR Images
Neural Network Based Brain Tumor Detection using MR ImagesNeural Network Based Brain Tumor Detection using MR Images
Neural Network Based Brain Tumor Detection using MR ImagesAisha Kalsoom
 
Artificial intelligence Pattern recognition system
Artificial intelligence Pattern recognition systemArtificial intelligence Pattern recognition system
Artificial intelligence Pattern recognition systemREHMAT ULLAH
 
Pattern recognition
Pattern recognitionPattern recognition
Pattern recognitionSwarnava Sen
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRahul Jain
 
pattern classification
pattern classificationpattern classification
pattern classificationRanjan Ganguli
 
Face recognition using neural network
Face recognition using neural networkFace recognition using neural network
Face recognition using neural networkIndira Nayak
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisJaclyn Kokx
 
Linear discriminant analysis
Linear discriminant analysisLinear discriminant analysis
Linear discriminant analysisBangalore
 
Brain tumor detection using convolutional neural network
Brain tumor detection using convolutional neural network Brain tumor detection using convolutional neural network
Brain tumor detection using convolutional neural network MD Abdullah Al Nasim
 
Image processing fundamentals
Image processing fundamentalsImage processing fundamentals
Image processing fundamentalsA B Shinde
 
Object recognition
Object recognitionObject recognition
Object recognitionsaniacorreya
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
Brain tumor detection using image segmentation ppt
Brain tumor detection using image segmentation pptBrain tumor detection using image segmentation ppt
Brain tumor detection using image segmentation pptRoshini Vijayakumar
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data miningDataminingTools Inc
 

What's hot (20)

PPT on BRAIN TUMOR detection in MRI images based on IMAGE SEGMENTATION
PPT on BRAIN TUMOR detection in MRI images based on  IMAGE SEGMENTATION PPT on BRAIN TUMOR detection in MRI images based on  IMAGE SEGMENTATION
PPT on BRAIN TUMOR detection in MRI images based on IMAGE SEGMENTATION
 
Computer Vision
Computer VisionComputer Vision
Computer Vision
 
Pattern recognition
Pattern recognitionPattern recognition
Pattern recognition
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Neural Network Based Brain Tumor Detection using MR Images
Neural Network Based Brain Tumor Detection using MR ImagesNeural Network Based Brain Tumor Detection using MR Images
Neural Network Based Brain Tumor Detection using MR Images
 
Artificial intelligence Pattern recognition system
Artificial intelligence Pattern recognition systemArtificial intelligence Pattern recognition system
Artificial intelligence Pattern recognition system
 
Pattern recognition
Pattern recognitionPattern recognition
Pattern recognition
 
Medical image analysis
Medical image analysisMedical image analysis
Medical image analysis
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
pattern classification
pattern classificationpattern classification
pattern classification
 
Face recognition using neural network
Face recognition using neural networkFace recognition using neural network
Face recognition using neural network
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant Analysis
 
Linear discriminant analysis
Linear discriminant analysisLinear discriminant analysis
Linear discriminant analysis
 
Brain tumor detection using convolutional neural network
Brain tumor detection using convolutional neural network Brain tumor detection using convolutional neural network
Brain tumor detection using convolutional neural network
 
Image processing fundamentals
Image processing fundamentalsImage processing fundamentals
Image processing fundamentals
 
Object recognition
Object recognitionObject recognition
Object recognition
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Data mining
Data miningData mining
Data mining
 
Brain tumor detection using image segmentation ppt
Brain tumor detection using image segmentation pptBrain tumor detection using image segmentation ppt
Brain tumor detection using image segmentation ppt
 
Data Mining: Application and trends in data mining
Data Mining: Application and trends in data miningData Mining: Application and trends in data mining
Data Mining: Application and trends in data mining
 

Viewers also liked

Pattern Recognition and its Applications
Pattern Recognition and its ApplicationsPattern Recognition and its Applications
Pattern Recognition and its ApplicationsSajida Mohammad
 
Patterns of organization of speech, and how to lead discussions and seminars
Patterns of organization of speech, and how to lead discussions and seminarsPatterns of organization of speech, and how to lead discussions and seminars
Patterns of organization of speech, and how to lead discussions and seminarsMukalele Rogers
 
Data mining slides
Data mining slidesData mining slides
Data mining slidessmj
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
Digital Image Processing
Digital Image ProcessingDigital Image Processing
Digital Image ProcessingSahil Biswas
 
Machine_Learning_Project_Report
Machine_Learning_Project_ReportMachine_Learning_Project_Report
Machine_Learning_Project_ReportAditya Hendra
 
Abstract of the Presentation
Abstract of the PresentationAbstract of the Presentation
Abstract of the Presentationbutest
 
Feature Extraction for High Resolution Remote Sensing Image Classification us...
Feature Extraction for High Resolution Remote Sensing Image Classification us...Feature Extraction for High Resolution Remote Sensing Image Classification us...
Feature Extraction for High Resolution Remote Sensing Image Classification us...Simone Rossi
 
IEEE_RFIC 2007
IEEE_RFIC 2007 IEEE_RFIC 2007
IEEE_RFIC 2007 wence00
 
IEEE_RFIC 2007 (2)
IEEE_RFIC 2007 (2) IEEE_RFIC 2007 (2)
IEEE_RFIC 2007 (2) wence00
 
Navigli sssw
Navigli ssswNavigli sssw
Navigli ssswSSSW
 
Tutorial Cognition - Irene
Tutorial Cognition - IreneTutorial Cognition - Irene
Tutorial Cognition - IreneSSSW
 

Viewers also liked (18)

Introduction to pattern recognition
Introduction to pattern recognitionIntroduction to pattern recognition
Introduction to pattern recognition
 
Pattern Recognition and its Applications
Pattern Recognition and its ApplicationsPattern Recognition and its Applications
Pattern Recognition and its Applications
 
Pattern Recognition
Pattern RecognitionPattern Recognition
Pattern Recognition
 
Patterns of organization of speech, and how to lead discussions and seminars
Patterns of organization of speech, and how to lead discussions and seminarsPatterns of organization of speech, and how to lead discussions and seminars
Patterns of organization of speech, and how to lead discussions and seminars
 
Data mining slides
Data mining slidesData mining slides
Data mining slides
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Data mining
Data miningData mining
Data mining
 
Image processing ppt
Image processing pptImage processing ppt
Image processing ppt
 
Digital Image Processing
Digital Image ProcessingDigital Image Processing
Digital Image Processing
 
Machine_Learning_Project_Report
Machine_Learning_Project_ReportMachine_Learning_Project_Report
Machine_Learning_Project_Report
 
Abstract of the Presentation
Abstract of the PresentationAbstract of the Presentation
Abstract of the Presentation
 
Thesis defense
Thesis defenseThesis defense
Thesis defense
 
Feature Extraction for High Resolution Remote Sensing Image Classification us...
Feature Extraction for High Resolution Remote Sensing Image Classification us...Feature Extraction for High Resolution Remote Sensing Image Classification us...
Feature Extraction for High Resolution Remote Sensing Image Classification us...
 
IEEE_RFIC 2007
IEEE_RFIC 2007 IEEE_RFIC 2007
IEEE_RFIC 2007
 
IEEE_RFIC 2007 (2)
IEEE_RFIC 2007 (2) IEEE_RFIC 2007 (2)
IEEE_RFIC 2007 (2)
 
Navigli sssw
Navigli ssswNavigli sssw
Navigli sssw
 
Tutorial Cognition - Irene
Tutorial Cognition - IreneTutorial Cognition - Irene
Tutorial Cognition - Irene
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 

Similar to Seminar(Pattern Recognition)

STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUESTUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUEIJDKP
 
CodeLess Machine Learning
CodeLess Machine LearningCodeLess Machine Learning
CodeLess Machine LearningSharjeel Imtiaz
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
 
chalenges and apportunity of deep learning for big data analysis f
 chalenges and apportunity of deep learning for big data analysis f chalenges and apportunity of deep learning for big data analysis f
chalenges and apportunity of deep learning for big data analysis fmaru kindeneh
 
An Intelligent Career Guidance System using Machine Learning
An Intelligent Career Guidance System using Machine LearningAn Intelligent Career Guidance System using Machine Learning
An Intelligent Career Guidance System using Machine LearningIRJET Journal
 
Data mining an introduction
Data mining an introductionData mining an introduction
Data mining an introductionDr-Dipali Meher
 
التنقيب في البيانات - Data Mining
التنقيب في البيانات -  Data Miningالتنقيب في البيانات -  Data Mining
التنقيب في البيانات - Data Miningnabil_alsharafi
 
The Survey of Data Mining Applications And Feature Scope
The Survey of Data Mining Applications  And Feature Scope The Survey of Data Mining Applications  And Feature Scope
The Survey of Data Mining Applications And Feature Scope IJCSEIT Journal
 
EDR 8204 Week 3 Assignment: Analyze Action Research
EDR 8204 Week 3 Assignment: Analyze Action ResearchEDR 8204 Week 3 Assignment: Analyze Action Research
EDR 8204 Week 3 Assignment: Analyze Action Researcheckchela
 
Scalable Learning Analytics and Interoperability – an assessment of potential...
Scalable Learning Analytics and Interoperability – an assessment of potential...Scalable Learning Analytics and Interoperability – an assessment of potential...
Scalable Learning Analytics and Interoperability – an assessment of potential...LACE Project
 
Predicting students' performance using id3 and c4.5 classification algorithms
Predicting students' performance using id3 and c4.5 classification algorithmsPredicting students' performance using id3 and c4.5 classification algorithms
Predicting students' performance using id3 and c4.5 classification algorithmsIJDKP
 
An Empirical Study of the Applications of Classification Techniques in Studen...
An Empirical Study of the Applications of Classification Techniques in Studen...An Empirical Study of the Applications of Classification Techniques in Studen...
An Empirical Study of the Applications of Classification Techniques in Studen...IJERA Editor
 
Data Science Course In Pune
Data Science Course In Pune Data Science Course In Pune
Data Science Course In Pune APT
 
data science institute in bangalore
data science institute in bangaloredata science institute in bangalore
data science institute in bangaloredevipatnala1
 
Data Science Course Pune
Data Science Course PuneData Science Course Pune
Data Science Course PuneAPT
 
Data science course pdf
Data science course pdfData science course pdf
Data science course pdfAPT
 

Similar to Seminar(Pattern Recognition) (20)

STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUESTUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
 
CodeLess Machine Learning
CodeLess Machine LearningCodeLess Machine Learning
CodeLess Machine Learning
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
chalenges and apportunity of deep learning for big data analysis f
 chalenges and apportunity of deep learning for big data analysis f chalenges and apportunity of deep learning for big data analysis f
chalenges and apportunity of deep learning for big data analysis f
 
Lecture - Data Mining
Lecture - Data MiningLecture - Data Mining
Lecture - Data Mining
 
An Intelligent Career Guidance System using Machine Learning
An Intelligent Career Guidance System using Machine LearningAn Intelligent Career Guidance System using Machine Learning
An Intelligent Career Guidance System using Machine Learning
 
Data mining
Data miningData mining
Data mining
 
Data mining an introduction
Data mining an introductionData mining an introduction
Data mining an introduction
 
التنقيب في البيانات - Data Mining
التنقيب في البيانات -  Data Miningالتنقيب في البيانات -  Data Mining
التنقيب في البيانات - Data Mining
 
The Survey of Data Mining Applications And Feature Scope
The Survey of Data Mining Applications  And Feature Scope The Survey of Data Mining Applications  And Feature Scope
The Survey of Data Mining Applications And Feature Scope
 
EDR 8204 Week 3 Assignment: Analyze Action Research
EDR 8204 Week 3 Assignment: Analyze Action ResearchEDR 8204 Week 3 Assignment: Analyze Action Research
EDR 8204 Week 3 Assignment: Analyze Action Research
 
Scalable Learning Analytics and Interoperability – an assessment of potential...
Scalable Learning Analytics and Interoperability – an assessment of potential...Scalable Learning Analytics and Interoperability – an assessment of potential...
Scalable Learning Analytics and Interoperability – an assessment of potential...
 
Predicting students' performance using id3 and c4.5 classification algorithms
Predicting students' performance using id3 and c4.5 classification algorithmsPredicting students' performance using id3 and c4.5 classification algorithms
Predicting students' performance using id3 and c4.5 classification algorithms
 
An Empirical Study of the Applications of Classification Techniques in Studen...
An Empirical Study of the Applications of Classification Techniques in Studen...An Empirical Study of the Applications of Classification Techniques in Studen...
An Empirical Study of the Applications of Classification Techniques in Studen...
 
Data Processing
 Data Processing Data Processing
Data Processing
 
Data Science Course In Pune
Data Science Course In Pune Data Science Course In Pune
Data Science Course In Pune
 
data science institute in bangalore
data science institute in bangaloredata science institute in bangalore
data science institute in bangalore
 
Data Science Course Pune
Data Science Course PuneData Science Course Pune
Data Science Course Pune
 
Data science course pdf
Data science course pdfData science course pdf
Data science course pdf
 
data science certification
data science certificationdata science certification
data science certification
 



Seminar(Pattern Recognition)

  • 1. Pattern Recognition with Semi-Supervised Learning Algorithm. Presented by: Anurodh Kumar Sinha, 2nd Year MSLIS Student, DRTC, ISI Bangalore, 2012-2013. 12/3/2012
  • 2. Agenda • What is Pattern Recognition? • What is Machine Learning and why do we need it? • Types of Learning Algorithm • Need for Semi-Supervised Learning • Conclusion
  • 3. What is a Pattern? • An entity, vaguely defined, that could be given a name, e.g.: – handwritten word, – human face, – fingerprint image, – speech signal.
  • 4. What is a Feature? • A feature is an individual measurable heuristic property of a phenomenon being observed. • Examples: • In speech recognition, features for recognizing phonemes can include noise ratios, length of sounds, relative power, filter matches and many others. • In spam detection algorithms, features may include whether certain email headers are present or absent, whether they are well formed, what language the email appears to be in, and the grammatical correctness of the text.
  • 5. What is Pattern Recognition? • Pattern recognition is the study of how machines can: – observe the environment, – learn to distinguish patterns of interest, – make sound and reasonable decisions about the categories of the patterns. “The assignment of a physical object or event to one of several prespecified categories” -- Duda & Hart
  • 6. What is Pattern Recognition? • Some Applications:
  • 7. Motivation for the Study of Pattern Recognition. It is threefold: • It is part of Artificial Intelligence, which is concerned with techniques that enable computers to do things that seem intelligent when done by people. • It is an important aspect of applying computers to the analysis and classification of measurements taken from observed data. • Pattern recognition techniques provide a unified framework, built on mathematics and computer science, for studying a variety of techniques that help the machine make decisions.
  • 8. Methodology of Pattern Recognition. It consists of the following: 1. We observe patterns. 2. We study the relationships between the various patterns. 3. We study the relationships between patterns and ourselves and thus arrive at situations. 4. We study the changes in situations and come to know about the events. 5. We study events and thus find the rule behind the events. 6. Using the rule, we can predict future events.
  • 9. An Example • Suppose that: – A fish packing plant wants to automate the process of sorting incoming fish on a conveyor belt according to species. – There are two species: • Sea bass, • Salmon.
  • 11. An Example: How to distinguish one species from the other? (length, width, weight, number and shape of fins, tail shape, etc.)
  • 12. An Example • Suppose we also know that: – Sea bass are typically wider than salmon. – But it may happen that a decision can't be made on a single feature. • We can use more than one feature for our decision: – Lightness (x1) and width (x2)
  • 13. Components of a typical Pattern Recognition System
  • 14. Examples of applications • Optical Character Recognition (OCR): – Handwritten: sorting letters by postal code, input device for PDAs. – Printed texts: reading machines for blind people, digitization of text documents. • Biometrics: – Face recognition, verification, retrieval. – Fingerprint recognition. – Speech recognition. • Diagnostic systems: – Medical diagnosis: X-ray, EKG analysis. – Machine diagnostics, waster detection. • Military applications: – Automated Target Recognition (ATR). – Image segmentation and analysis (recognition from aerial or satellite photographs).
  • 15. What is Machine Learning? • Machine learning algorithms discover the relationships between the variables of a system (input, output and hidden) from direct samples of the system. • These algorithms originate from many fields: – Statistics, mathematics, theoretical computer science, physics, neuroscience, etc.
  • 16. Why are Learning Algorithms Needed? • They would be unnecessary only if the relationships between all system variables (input, output, and hidden) were completely understood. • This is NOT the case for almost any real system! • Growing flood of online data • Computational power is available • Progress in algorithms and theory
  • 17. Learning Algorithm Applications • Data mining: using historical data to improve decisions – medical records ⇒ medical knowledge – log data to model users • Software applications we can't program by hand – autonomous driving – speech recognition • Self-customizing programs – Newsreader that learns user interests
  • 18. Typical Example • Given: – 9714 patient records, each describing a pregnancy and birth – Each patient record contains 215 features • Learn to predict: – Classes of future patients at high risk for Emergency Cesarean Section
  • 19. The Sub-Fields of Machine Learning • Supervised Learning • Unsupervised Learning • Semi-Supervised Learning
  • 21. Supervised Learning • Supervised learning is the machine learning task of inferring a function from labeled training data. • The training data consist of pairs, each made up of an input object (typically a vector) and a desired output value (also called the supervisory signal). • A supervised learning algorithm analyzes the training data and produces an inferred function, which is called a classifier (if the output is discrete) or a regression function (if the output is continuous). • The inferred function should predict the correct output value for any valid input object. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.
  • 22. Supervised Learning Process: Two Steps • Learning (training): learn a model using the training data. • Testing: test the model using unseen test data to assess the model accuracy: $\text{Accuracy} = \frac{\text{Number of correct classifications}}{\text{Total number of test cases}}$
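The accuracy measure from the testing step can be sketched in a few lines of Python. The function name `accuracy` and the toy label lists are illustrative assumptions, not part of the original slides.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of test cases whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one test case")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Hypothetical test set: 4 of the 5 predictions match the true labels.
y_true = ["yes", "no", "yes", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "no"]
print(accuracy(y_true, y_pred))
```

The test labels must come from data the model did not see during training; otherwise the number overstates how well the model generalizes.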
  • 23. Example • A credit card company receives thousands of applications for new cards. Each application contains information about an applicant: – age – job – house – credit rating – etc. • Problem: to decide whether an application should be approved, or to classify applications into two categories, approved and not approved.
  • 24. An example: Data (Loan Application)
  • 25. An example: The Learning Task • Learn a classification model from the data • Use the model to classify future loan applications into – Yes (approved) and – No (not approved) • What is the class for the following case/instance?
  • 26. Bayesian Classifier • The Simple Bayesian Classifier (SBC) uses probabilistic methods for classification. • The basis of the Bayesian classifier is: the probability of document 'd' being in class 'c' is computed as $P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$, where $P(t_k \mid c)$ is the conditional probability of term $t_k$ occurring in a document of class c, $P(c)$ is the prior probability of class c, and $n_d$ is the number of terms in d.
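A minimal sketch of such a classifier for text follows: it scores each class by a class prior combined with the per-term conditional probabilities P(tk|c), working in log space to avoid underflow. The add-one (Laplace) smoothing, the function names, and the four-document toy corpus are assumptions made for illustration; the slides do not specify them.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Estimate log P(c) and add-one-smoothed log P(t|c) from (tokens, label) pairs."""
    class_counts = Counter(label for _, label in labeled_docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        term_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(labeled_docs)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(term_counts[c].values())
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + 1) / (total + len(vocab))) for t in vocab
        }
    return log_prior, log_likelihood


def classify(tokens, log_prior, log_likelihood):
    """Return the class maximizing log P(c) + sum over k of log P(t_k|c)."""
    scores = {}
    for c in log_prior:
        # Terms never seen in training are skipped (a simplifying assumption).
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in tokens if t in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus: three documents of class "china", one of class "other".
training = [
    ("chinese beijing chinese".split(), "china"),
    ("chinese chinese shanghai".split(), "china"),
    ("chinese macao".split(), "china"),
    ("tokyo japan chinese".split(), "other"),
]
prior, likelihood = train_naive_bayes(training)
print(classify("chinese chinese chinese tokyo japan".split(), prior, likelihood))
```

Here the three occurrences of "chinese" outweigh "tokyo" and "japan", so the document is assigned to class "china".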
  • 30. What is Unsupervised Learning? • Unsupervised learning refers to the problem of trying to find hidden structure in unlabeled data. • It is sometimes also referred to as clustering. • Organizing data into classes such that inter-cluster distance is maximized and intra-cluster distance is minimized. • Finding the class labels and the number of classes directly from the data (in contrast to classification). • More informally, finding natural groupings among objects.
  • 31. What is a natural grouping among these objects?
  • 32. What is a natural grouping among these objects? Clustering is subjective: the same objects can be grouped as Simpson's family and school employees, or as females and males.
  • 33. What is clustering for? Let us see some real-life examples • Example 1: Group people of similar sizes together to make “small”, “medium” and “large” T-shirts. – Tailor-made for each person: too expensive – One-size-fits-all: does not fit all. • Example 2: Given a collection of text documents, we want to organize them according to their content similarities, – To produce a topic hierarchy
  • 34. What is clustering for? (cont.) In fact, clustering is one of the most utilized data mining techniques – It has a long history and is used in almost every field, e.g., medicine, psychology, botany, sociology, biology, archeology, marketing, insurance, libraries, etc. – In recent years, due to the rapid increase of online documents, text clustering has become important.
  • 37. An example (cont.)
  • 39. Supervised Learning versus Unsupervised Learning • Unsupervised clustering: group similar objects together to find clusters – Minimize intra-class distance – Maximize inter-class distance • Supervised classification: the class label for each training sample is given – Build a model from the training data – Predict class labels on unseen future data points
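As one concrete instance of unsupervised clustering, the sketch below runs plain k-means (Lloyd's algorithm), which repeatedly assigns points to the nearest center and moves each center to the mean of its points, thereby reducing intra-cluster distance. The choice of k-means, the helper names, the use of the first k points as initial centers, and the toy points are all assumptions for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, iterations=20):
    """Plain k-means; initial centers are the first k points (for determinism only)."""
    centers = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            new_centers.append(mean_point(members) if members else centers[j])
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, assignment


# Hypothetical 2-D data with two visually separate groups.
data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centers, assignment = kmeans(data, k=2)
print(assignment)
```

No labels are used anywhere: the grouping comes only from the distances between points, which is why a different distance measure or a different k can produce a different, equally "natural" grouping.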
  • 40. However, for many problems, labeled data can be rare or expensive (need to pay someone to do it, requires special testing, …). Unlabeled data is much cheaper. Examples: speech, images, medical outcomes, customer modeling, protein sequences, web pages.
  • 41. Why Semi-Supervised Learning? • Why not clustering? – The clusters produced may not be the ones required. – Sometimes there are multiple possible groupings. • Why not classification? – Sometimes there is insufficient labeled data.
  • 42. Semi-Supervised Learning • Combines labeled and unlabeled data during training to improve performance: – Semi-supervised classification: training on labeled data exploits additional unlabeled data, frequently resulting in a more accurate classifier. – Semi-supervised clustering: uses a small amount of labeled data to aid and bias the clustering of unlabeled data. • Semi-supervised learning sits between unsupervised clustering and supervised classification.
  • 43. Semi-Supervised Classification • An initial classifier is designed using the labeled data set D(l). This classifier is then used to assign class labels to the examples in the unlabeled data set D(u). Then the classifier is re-trained using D(l) ∪ D(u). • The last two steps are usually repeated a given number of times or until some criterion is satisfied.
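The loop described on this slide can be sketched as follows. The base learner here is a nearest-centroid classifier, chosen only because it fits in a few lines; the slides do not name a base classifier, and the function names, the stopping rule, and the 1-D toy data are likewise assumptions.

```python
def fit_centroids(samples):
    """Train a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(s / counts[y] for s in sums[y]) for y in sums}


def predict(centroids, x):
    """Return the label of the centroid closest to x."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def self_train(labeled, unlabeled, rounds=5):
    """Label D(u) with the current model, retrain on D(l) plus D(u), and repeat."""
    centroids = fit_centroids(labeled)  # initial classifier from D(l) only
    for _ in range(rounds):
        pseudo = [(x, predict(centroids, x)) for x in unlabeled]
        updated = fit_centroids(labeled + pseudo)
        if updated == centroids:  # stopping criterion: the model no longer changes
            break
        centroids = updated
    return centroids


# Hypothetical data: one labeled point per class plus six unlabeled points.
labeled = [((0.0,), "a"), ((4.0,), "b")]
unlabeled = [(0.5,), (1.0,), (1.5,), (5.0,), (6.0,), (7.0,)]
model = self_train(labeled, unlabeled)
print(predict(fit_centroids(labeled), (3.0,)), predict(model, (3.0,)))
```

With the labeled points alone, the query 3.0 falls on the "b" side of the boundary; after the unlabeled points pull the centroids to 0.75 and 5.5, the same query is assigned to "a". Practical variants often add only the most confidently labeled examples in each round, since early mistakes are otherwise reinforced.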
  • 46. Semi-Supervised Classification • Algorithms: – Semi-supervised EM [Ghahramani:NIPS94, Nigam:ML00]. – Co-training [Blum:COLT98]. – Transductive SVMs [Vapnik:98, Joachims:ICML99]. – Graph-based algorithms. • Assumptions: – Known, fixed set of categories given in the labeled data. – Goal is to improve classification of examples into these known categories.
  • 47. Semi-Supervised Clustering • Input: – A set of unlabeled objects, each described by a set of attributes (numeric and/or categorical) – A small amount of domain knowledge • Output: – A partitioning of the objects into k clusters (possibly with some discarded as outliers) • Objective: – Maximum intra-cluster similarity – Minimum inter-cluster similarity – High consistency between the partitioning and the domain knowledge
  • 48. How is Semi-Supervised Clustering Done? • In addition to the similarity information used by unsupervised clustering, in many cases a small amount of knowledge is available concerning either pairwise (must-link or cannot-link) constraints between data items or class labels for some items. • Instead of simply using this knowledge for the external validation of the results of clustering, one can imagine letting it “guide” or “adjust” the clustering process, i.e. provide a limited form of supervision. The resulting approach is called semi-supervised clustering.
  • 51. Semi-Supervised Clustering • According to the domain knowledge given: – Users provide class labels (seeded points) a priori for some of the documents. – Users know which few documents are related (must-link) or unrelated (cannot-link). • Illustrated cases: seeded points, must-link, cannot-link.
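Pairwise constraints are typically enforced by checking, before an item is placed in a cluster, whether that placement would break a must-link or cannot-link pair. The sketch below shows such a check; the function name, the index-based representation of items, and the toy constraints are assumptions for illustration, not a specific published algorithm.

```python
def violates_constraints(index, cluster, assignment, must_link, cannot_link):
    """Check whether putting item `index` into `cluster` breaks a pairwise constraint.

    `assignment` maps already-assigned item indices to their clusters;
    `must_link` and `cannot_link` are lists of (index, index) pairs.
    """
    for a, b in must_link:
        if index in (a, b):
            other = b if index == a else a
            # A must-link partner already sits in a different cluster.
            if other in assignment and assignment[other] != cluster:
                return True
    for a, b in cannot_link:
        if index in (a, b):
            other = b if index == a else a
            # A cannot-link partner already sits in this cluster.
            if other in assignment and assignment[other] == cluster:
                return True
    return False


# Hypothetical state: items 0 and 1 are already placed in clusters "A" and "B".
current = {0: "A", 1: "B"}
must = [(0, 2)]    # item 2 must share a cluster with item 0
cannot = [(1, 3)]  # item 3 must not share a cluster with item 1
print(violates_constraints(2, "B", current, must, cannot))
```

A constrained clustering loop would call this check for each candidate cluster, from nearest to farthest, and take the first cluster that does not violate any constraint.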
  • 52. Semi-Supervised Clustering Algorithms • Semi-supervised clustering with labels (partial label information is given): – SS-Seeded-Kmeans (Sugato Basu, et al. ICML 2002) – SS-Constraint-Kmeans (Sugato Basu, et al. ICML 2002) • Semi-supervised clustering with constraints (pairwise must-link and cannot-link constraints are given): – SS-COP-Kmeans (Wagstaff et al. ICML 2001) – SS-HMRF-Kmeans (Sugato Basu, et al. ACM SIGKDD 2004) – SS-Kernel-Kmeans (Brian Kulis, et al. ICML 2005) – SS-Spectral-Normalized-Cuts (X. Ji, et al. ACM SIGIR 2006)
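To illustrate clustering with partial label information, the sketch below initializes k-means from the means of a few user-labeled seed points and then iterates as usual. It is written in the spirit of seeded k-means and is not a reproduction of any of the cited algorithms; the function names, the seed dictionary format, and the toy data are assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def seeded_kmeans(points, seeds, iterations=20):
    """k-means whose initial centers are the means of labeled seed points.

    `seeds` maps a cluster label to the list of points the user labeled with it.
    After initialization the seed labels are not enforced (an assumption here).
    """
    labels = sorted(seeds)
    centers = {c: mean_point(seeds[c]) for c in labels}
    assignment = []
    for _ in range(iterations):
        # Assign every point to the label of its nearest center.
        assignment = [
            min(labels, key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Recompute each center from the points currently assigned to it.
        updated = {}
        for c in labels:
            members = [p for p, a in zip(points, assignment) if a == c]
            updated[c] = mean_point(members) if members else centers[c]
        if updated == centers:
            break
        centers = updated
    return centers, assignment


# Hypothetical data: six points, two of which the user has labeled as seeds.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
seeds = {"low": [(1.0, 1.0)], "high": [(8.0, 8.0)]}
centers, assignment = seeded_kmeans(data, seeds)
print(assignment)
```

Compared with random initialization, seeding gives the clusters meaningful names from the start and steers the result toward the grouping the user has in mind.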
  • 54. Conclusions • Semi-supervised learning is an area of increasing importance in Machine Learning. • Automatic methods of collecting data make it more important than ever to develop methods that make use of unlabeled data. • There are several promising algorithms (only a few were discussed), as well as new theoretical frameworks to help guide further development.
  • 55. References • Duda, Hart: Pattern Classification and Scene Analysis. J. Wiley & Sons, New York, 1982. (2nd edition 2000). • Fukunaga: Introduction to Statistical Pattern Recognition. Academic Press, 1990. • Sergios Theodoridis, Konstantinos Koutroumbas: Pattern Recognition. Elsevier (USA), 1982. • K. Nigam and R. Ghani: Analyzing the effectiveness and applicability of co-training. In Proceedings of the Ninth International Conference on Information and Knowledge Management, pages 86-93. ACM, 2000. • http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html • http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html