SlideShare una empresa de Scribd logo
1 de 48
Supervised and semi-supervised learning for NLP John Blitzer 自然语言计算组 http://research.microsoft.com/asia/group/nlc/
Why should I know about machine learning?  This is an NLP summer school.  Why should I care about machine learning? ACL 2008:  50 of 96 full papers mention learning, or statistics in their titles 4 of 4 outstanding papers propose new learning or statistical inference methods
Example 1:   Review classification Output: Labels Input: Product Review 	   Running with Scissors: A Memoir  	   Title: Horrible book, horrible. This book was horrible.  I read half of it, suffering from a headache the entire time, and eventually i lit it on fire.  One less copy in the world...don't waste your money.  I wish i had the time spent reading this book back so i could use it for better purposes.  This book wasted my life Positive Negative
From the MSRA 机器学习组  http://research.microsoft.com/research/china/DCCUE/ml.aspx
Example 2:   Relevance Ranking Ranked List Un-ranked List . . .  . . .
Example 3: Machine Translation Input: English sentence The national track & field championships concluded Output: Chinese sentence 全国田径冠军赛结束
Course Outline 1) Supervised Learning [2.5 hrs] 2) Semi-supervised learning [3 hrs] 3) Learning bounds for domain adaptation [30 mins]
Supervised Learning Outline 1) Notation and Definitions [5 mins] 2) Generative Models [25 mins] 3) Discriminative Models [55 mins] 4) Machine Learning Examples [15 mins]
Training and testing data Training data: labeled pairs .  .  . Use training data to learn a function Use this function to label unlabeled testing data ?? ?? .  .  . ??
Feature representations of . . . . . . 2 1 0 0 3 0 0 waste horrible read_half . . . . . . 0 0 2 1 0 0 0 loved_it excellent horrible
Generative model
Graphical Model Representation Encode a multivariate probability distribution Nodes indicate random variables Edges indicate conditional dependency
Graphical Model Inference p(y = -1) p(horrible | -1) p(read_half | -1) waste read_half horrible
Inference at test time Given an unlabeled instance, how can we find its label? ?? Just choose the most probable label y
Estimating parameters from training data Back to labeled training data: .  .  .
Multiclass Classification Query classification Travel  Technology News Entertainment . . . . Input query:     “自然语言处理” Training and testing same as in binary case
Maximum Likelihood Estimation Why set parameters to counts?
MLE – Label marginals
Problems with Naïve Bayes Predicting broken traffic lights Lights are broken:  both lights are red always Lights are working: 1 is red & 1 is green
Problems with Naïve Bayes 2 Now, suppose both lights are red.  What will our model predict? We got the wrong answer.  Is there a better model? The MLE generative model is not the best model!!
More on Generative models We can introduce more dependencies This can explode parameter space Discriminative models minimize error -- next Further reading K. Toutanova. Competitive generative models with structure learning for NLP classification tasks.  EMNLP 2006. 	A. Ng and M. Jordan.  On Discriminative vs. Generative Classifiers:  A comparison of logistic regression and naïve Bayes. NIPS 2002
Discriminative Learning We will focus on linear models Model training error
Upper bounds on binary training error 0-1 loss (error):  NP-hard to minimize over all data points Exp loss:  exp(-score): Minimized by AdaBoost Hinge loss:  Minimized by support vector machines
Binary classification: Weak hypotheses In NLP, a feature can be a weak learner  Sentiment example:
The AdaBoost algorithm
A small example – – + + Excellent  read Terrible:  The_plot was boring and opaque Awful book.  Couldn’t follow the_plot. Excellent book.  The_plot was riveting – – – – + + + – – – – + – – – – + – –
Bound on training error [Freund & Schapire 1995] We greedily minimize error by minimizing
For proofs and a more complete discussion 	Robert Schapire and YoramSinger. Improved Boosting Algorithms Using Confidence-rated Predictions.   Machine Learning Journal 1998.
Exponential convergence of error in t Plugging in our solution for, we have We chose      to minimize         .  Was that the right choice? This gives
AdaBoost drawbacks What happens when an example is mis-labeled or an outlier? Exp loss exponentially penalizes incorrect scores. Hinge loss linearly penalizes incorrect scores.
Support Vector Machines Linearly separable Non-separable + + + + + – + + – – – – + – – –
Margin + + + + + + + + – – – – – – – – Lots of separating hyperplanes.  Which should we choose? Choose the hyperplane with largest margin
Max-margin optimization  greater than margin  score of correct label Why do we fix norm of w to be less than 1? Scaling the weight vector doesn’t change the optimal hyperplane
Equivalent optimization problem Minimize the norm of the weight vector With fixed margin for each example
Back to the non-separable case + + We can’t satisfy the margin constraints But some hyperplanes are better than others – + – – + –
Soft margin optimization Add slack variables to the optimization Allow margin constraints to be violated But minimize the violation as much as possible
Optimization 1: Absorbing constraints
Optimization 2:  Sub-gradient descent Max creates a non-differentiable point, but there is a subgradient Subgradient:
Stochastic subgradient descent Subgradient descent is like gradient descent. Also guaranteed to converge, but slow Pegasos [Shalev-Schwartz and Singer 2007] Sub-gradient descent for a randomly selected subset of examples. Convergence bound: Objective after T iterations Best objective value Linear convergence
SVMs for NLP We’ve been looking at binary classification But most NLP problems aren’t binary Piece-wise linear decision boundaries We showed 2-dimensional examples But NLP is typically very high dimensional Joachims [2000] discusses linear models in high-dimensional spaces
Kernels and non-linearity Kernels let us efficiently map training data into a high-dimensional feature space Then learn a model which is linear in the new space, but non-linear in our original space But for NLP, we already have a high-dimensional representation! Optimization with non-linear kernels is often super-linear in number of examples
More on SVMs John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis.  Cambridge University Press 2004. Dan Klein and Ben Taskar. Max Margin Methods for NLP:  Estimation, Structure, and Applications.  ACL 2005 Tutorial. Ryan McDonald. Generalized Linear Classifiers in NLP.  Tutorial at the Swedish Graduate School in Language Technology.  2007.
SVMs vs. AdaBoost SVMs with slack are noise tolerant AdaBoost has no explicit regularization Must resort to early stopping AdaBoost easily extends to non-linear models Non-linear optimization for SVMs is super-linear in the number of examples Can be important for examples with hundreds or thousands of features
More on discriminative methods Logistic regression:  Also known as Maximum Entropy Probabilistic discriminative model which directly models p(y | x) A good general machine learning book On discriminative learning and more Chris Bishop.  Pattern Recognition and Machine Learning.  Springer 2006.
Learning to rank (1) (2) (3) (4)
Features for web page ranking Good features for this model?   (1) How many words are shared between the query and the web page? (2) What is the PageRank of the webpage? (3) Other ideas?
Optimization Problem Loss for a query and a pair of documents  Score for documents of different ranks must be separated by a margin MSRA 互联网搜索与挖掘组 http://research.microsoft.com/asia/group/wsm/
Come work with us at Microsoft! http://www.msra.cn/recruitment/

Más contenido relacionado

La actualidad más candente

Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...Madhav Mishra
 
Machine Learning in NLP
Machine Learning in NLPMachine Learning in NLP
Machine Learning in NLPVijay Ganti
 
Tips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsTips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsDarius Barušauskas
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingTed Xiao
 
Ensemble Learning Featuring the Netflix Prize Competition and ...
Ensemble Learning Featuring the Netflix Prize Competition and ...Ensemble Learning Featuring the Netflix Prize Competition and ...
Ensemble Learning Featuring the Netflix Prize Competition and ...butest
 
ensemble learning
ensemble learningensemble learning
ensemble learningbutest
 
Machine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsMachine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsAndrew Ferlitsch
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learningHaris Jamil
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models ananth
 
Generating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksGenerating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksJonathan Mugan
 
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
Deep Learning For Practitioners,  lecture 2: Selecting the right applications...Deep Learning For Practitioners,  lecture 2: Selecting the right applications...
Deep Learning For Practitioners, lecture 2: Selecting the right applications...ananth
 
Ensemble modeling and Machine Learning
Ensemble modeling and Machine LearningEnsemble modeling and Machine Learning
Ensemble modeling and Machine LearningStepUp Analytics
 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesPier Luca Lanzi
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .pptbutest
 

La actualidad más candente (20)

Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
 
Machine Learning in NLP
Machine Learning in NLPMachine Learning in NLP
Machine Learning in NLP
 
Overfitting and-tbl
Overfitting and-tblOverfitting and-tbl
Overfitting and-tbl
 
Ensemble methods
Ensemble methodsEnsemble methods
Ensemble methods
 
Tips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsTips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitions
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Ensemble Learning Featuring the Netflix Prize Competition and ...
Ensemble Learning Featuring the Netflix Prize Competition and ...Ensemble Learning Featuring the Netflix Prize Competition and ...
Ensemble Learning Featuring the Netflix Prize Competition and ...
 
ensemble learning
ensemble learningensemble learning
ensemble learning
 
Machine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsMachine Learning - Ensemble Methods
Machine Learning - Ensemble Methods
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
 
boosting algorithm
boosting algorithmboosting algorithm
boosting algorithm
 
Generating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksGenerating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural Networks
 
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
Deep Learning For Practitioners,  lecture 2: Selecting the right applications...Deep Learning For Practitioners,  lecture 2: Selecting the right applications...
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
 
L4. Ensembles of Decision Trees
L4. Ensembles of Decision TreesL4. Ensembles of Decision Trees
L4. Ensembles of Decision Trees
 
Ensemble modeling and Machine Learning
Ensemble modeling and Machine LearningEnsemble modeling and Machine Learning
Ensemble modeling and Machine Learning
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers Ensembles
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
 
LR2. Summary Day 2
LR2. Summary Day 2LR2. Summary Day 2
LR2. Summary Day 2
 

Destacado

Deep Learning for NLP Applications
Deep Learning for NLP ApplicationsDeep Learning for NLP Applications
Deep Learning for NLP ApplicationsSamiur Rahman
 
Neural Network Applications In Machining: A Review
Neural Network Applications In Machining: A ReviewNeural Network Applications In Machining: A Review
Neural Network Applications In Machining: A ReviewAshish Khetan
 
Logica | Intelligent Self learning - a helping hand in financial crime
Logica | Intelligent Self learning - a helping hand in financial crimeLogica | Intelligent Self learning - a helping hand in financial crime
Logica | Intelligent Self learning - a helping hand in financial crimeCGI
 
An Optimal Iterative Algorithm for Extracting MUCs in a Black-box Constraint ...
An Optimal Iterative Algorithm for Extracting MUCs in a Black-box Constraint ...An Optimal Iterative Algorithm for Extracting MUCs in a Black-box Constraint ...
An Optimal Iterative Algorithm for Extracting MUCs in a Black-box Constraint ...Philippe Laborie
 
Black Box Methods for Inferring Parallel Applications' Properties in Virtual ...
Black Box Methods for Inferring Parallel Applications' Properties in Virtual ...Black Box Methods for Inferring Parallel Applications' Properties in Virtual ...
Black Box Methods for Inferring Parallel Applications' Properties in Virtual ...Ashish Gupta
 
Application of machine learning and cognitive computing in intrusion detectio...
Application of machine learning and cognitive computing in intrusion detectio...Application of machine learning and cognitive computing in intrusion detectio...
Application of machine learning and cognitive computing in intrusion detectio...Mahdi Hosseini Moghaddam
 
Cognitive Modeling & Intelligent Tutors
Cognitive Modeling & Intelligent TutorsCognitive Modeling & Intelligent Tutors
Cognitive Modeling & Intelligent TutorsCody Ray
 
Ai and neural networks
Ai and neural networksAi and neural networks
Ai and neural networksNikhil Kansari
 
Home Automation: Design and Construction of an intelligent design for Cooling...
Home Automation: Design and Construction of an intelligent design for Cooling...Home Automation: Design and Construction of an intelligent design for Cooling...
Home Automation: Design and Construction of an intelligent design for Cooling...Adedamola Wuraola
 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryInderjeet Singh
 
Application of machine learning in industrial applications
Application of machine learning in industrial applicationsApplication of machine learning in industrial applications
Application of machine learning in industrial applicationsAnish Das
 
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic)  : Dr. Purnima PanditSoft computing (ANN and Fuzzy Logic)  : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima PanditPurnima Pandit
 
Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...IOSR Journals
 
Applications of Artificial Neural Network and Wavelet Transform For Conditio...
Applications of Artificial Neural Network and Wavelet  Transform For Conditio...Applications of Artificial Neural Network and Wavelet  Transform For Conditio...
Applications of Artificial Neural Network and Wavelet Transform For Conditio...IJMER
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications Ahmed_hashmi
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningLior Rokach
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...Sebastian Raschka
 

Destacado (20)

Deep Learning for NLP Applications
Deep Learning for NLP ApplicationsDeep Learning for NLP Applications
Deep Learning for NLP Applications
 
Neural Network Applications In Machining: A Review
Neural Network Applications In Machining: A ReviewNeural Network Applications In Machining: A Review
Neural Network Applications In Machining: A Review
 
Logica | Intelligent Self learning - a helping hand in financial crime
Logica | Intelligent Self learning - a helping hand in financial crimeLogica | Intelligent Self learning - a helping hand in financial crime
Logica | Intelligent Self learning - a helping hand in financial crime
 
An Optimal Iterative Algorithm for Extracting MUCs in a Black-box Constraint ...
An Optimal Iterative Algorithm for Extracting MUCs in a Black-box Constraint ...An Optimal Iterative Algorithm for Extracting MUCs in a Black-box Constraint ...
An Optimal Iterative Algorithm for Extracting MUCs in a Black-box Constraint ...
 
Black Box Methods for Inferring Parallel Applications' Properties in Virtual ...
Black Box Methods for Inferring Parallel Applications' Properties in Virtual ...Black Box Methods for Inferring Parallel Applications' Properties in Virtual ...
Black Box Methods for Inferring Parallel Applications' Properties in Virtual ...
 
Application of machine learning and cognitive computing in intrusion detectio...
Application of machine learning and cognitive computing in intrusion detectio...Application of machine learning and cognitive computing in intrusion detectio...
Application of machine learning and cognitive computing in intrusion detectio...
 
Cognitive Modeling & Intelligent Tutors
Cognitive Modeling & Intelligent TutorsCognitive Modeling & Intelligent Tutors
Cognitive Modeling & Intelligent Tutors
 
Project based learning methodologies for Embedded Systems and Intelligent Sys...
Project based learning methodologies for Embedded Systems and Intelligent Sys...Project based learning methodologies for Embedded Systems and Intelligent Sys...
Project based learning methodologies for Embedded Systems and Intelligent Sys...
 
Ai and neural networks
Ai and neural networksAi and neural networks
Ai and neural networks
 
Neural
NeuralNeural
Neural
 
Home Automation: Design and Construction of an intelligent design for Cooling...
Home Automation: Design and Construction of an intelligent design for Cooling...Home Automation: Design and Construction of an intelligent design for Cooling...
Home Automation: Design and Construction of an intelligent design for Cooling...
 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance Industry
 
Application of machine learning in industrial applications
Application of machine learning in industrial applicationsApplication of machine learning in industrial applications
Application of machine learning in industrial applications
 
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic)  : Dr. Purnima PanditSoft computing (ANN and Fuzzy Logic)  : Dr. Purnima Pandit
Soft computing (ANN and Fuzzy Logic) : Dr. Purnima Pandit
 
Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...Digital Implementation of Artificial Neural Network for Function Approximatio...
Digital Implementation of Artificial Neural Network for Function Approximatio...
 
Applications of Artificial Neural Network and Wavelet Transform For Conditio...
Applications of Artificial Neural Network and Wavelet  Transform For Conditio...Applications of Artificial Neural Network and Wavelet  Transform For Conditio...
Applications of Artificial Neural Network and Wavelet Transform For Conditio...
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
 
9 black box model 2015
9 black box model 20159 black box model 2015
9 black box model 2015
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
 

Similar a Query Linguistic Intent Detection

Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basicsNeeleEilers
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine LearningSri Ambati
 
Lessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsXavier Amatriain
 
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Hayim Makabee
 
17- Kernels and Clustering.pptx
17- Kernels and Clustering.pptx17- Kernels and Clustering.pptx
17- Kernels and Clustering.pptxssuser2023c6
 
EssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdfEssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdfAnkita Tiwari
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfDatacademy.ai
 
Demystifying Machine Learning
Demystifying Machine LearningDemystifying Machine Learning
Demystifying Machine LearningAyodele Odubela
 
Download It
Download ItDownload It
Download Itbutest
 
LNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine LearningLNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine Learningbutest
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9Roger Barga
 
Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies Dori Waldman
 
Machine learning4dummies
Machine learning4dummiesMachine learning4dummies
Machine learning4dummiesMichael Winer
 
How to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxHow to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxKnoldus Inc.
 
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019Rebecca Bilbro
 
WIA 2019 - Steering Model Selection with Visual Diagnostics
WIA 2019 - Steering Model Selection with Visual DiagnosticsWIA 2019 - Steering Model Selection with Visual Diagnostics
WIA 2019 - Steering Model Selection with Visual DiagnosticsWomen in Analytics Conference
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning台灣資料科學年會
 
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language ModelsDataScienceConferenc1
 
ML Study Jams - Session 3.pptx
ML Study Jams - Session 3.pptxML Study Jams - Session 3.pptx
ML Study Jams - Session 3.pptxMayankChadha14
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018HJ van Veen
 

Similar a Query Linguistic Intent Detection (20)

Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basics
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
 
Lessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systems
 
Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)Explainable Machine Learning (Explainable ML)
Explainable Machine Learning (Explainable ML)
 
17- Kernels and Clustering.pptx
17- Kernels and Clustering.pptx17- Kernels and Clustering.pptx
17- Kernels and Clustering.pptx
 
EssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdfEssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdf
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdf
 
Demystifying Machine Learning
Demystifying Machine LearningDemystifying Machine Learning
Demystifying Machine Learning
 
Download It
Download ItDownload It
Download It
 
LNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine LearningLNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine Learning
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9
 
Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies
 
Machine learning4dummies
Machine learning4dummiesMachine learning4dummies
Machine learning4dummies
 
How to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxHow to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptx
 
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
 
WIA 2019 - Steering Model Selection with Visual Diagnostics
WIA 2019 - Steering Model Selection with Visual DiagnosticsWIA 2019 - Steering Model Selection with Visual Diagnostics
WIA 2019 - Steering Model Selection with Visual Diagnostics
 
林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning林守德/Practical Issues in Machine Learning
林守德/Practical Issues in Machine Learning
 
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
[DSC Europe 23] Dmitry Ustalov - Design and Evaluation of Large Language Models
 
ML Study Jams - Session 3.pptx
ML Study Jams - Session 3.pptxML Study Jams - Session 3.pptx
ML Study Jams - Session 3.pptx
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 

Más de butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

Más de butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Query Linguistic Intent Detection

  • 1. Supervised and semi-supervised learning for NLP John Blitzer 自然语言计算组 http://research.microsoft.com/asia/group/nlc/
  • 2. Why should I know about machine learning? This is an NLP summer school. Why should I care about machine learning? ACL 2008: 50 of 96 full papers mention learning, or statistics in their titles 4 of 4 outstanding papers propose new learning or statistical inference methods
  • 3. Example 1: Review classification Output: Labels Input: Product Review Running with Scissors: A Memoir Title: Horrible book, horrible. This book was horrible. I read half of it, suffering from a headache the entire time, and eventually i lit it on fire. One less copy in the world...don't waste your money. I wish i had the time spent reading this book back so i could use it for better purposes. This book wasted my life Positive Negative
  • 4. From the MSRA 机器学习组 http://research.microsoft.com/research/china/DCCUE/ml.aspx
  • 5. Example 2: Relevance Ranking Ranked List Un-ranked List . . . . . .
  • 6. Example 3: Machine Translation Input: English sentence The national track & field championships concluded Output: Chinese sentence 全国田径冠军赛结束
  • 7. Course Outline 1) Supervised Learning [2.5 hrs] 2) Semi-supervised learning [3 hrs] 3) Learning bounds for domain adaptation [30 mins]
  • 8. Supervised Learning Outline 1) Notation and Definitions [5 mins] 2) Generative Models [25 mins] 3) Discriminative Models [55 mins] 4) Machine Learning Examples [15 mins]
  • 9. Training and testing data Training data: labeled pairs . . . Use training data to learn a function Use this function to label unlabeled testing data ?? ?? . . . ??
  • 10. Feature representations of . . . . . . 2 1 0 0 3 0 0 waste horrible read_half . . . . . . 0 0 2 1 0 0 0 loved_it excellent horrible
  • 12. Graphical Model Representation Encode a multivariate probability distribution Nodes indicate random variables Edges indicate conditional dependency
  • 13. Graphical Model Inference p(y = -1) p(horrible | -1) p(read_half | -1) waste read_half horrible
  • 14. Inference at test time Given an unlabeled instance, how can we find its label? ?? Just choose the most probable label y
  • 15. Estimating parameters from training data Back to labeled training data: . . .
  • 16. Multiclass Classification Query classification Travel Technology News Entertainment . . . . Input query: “自然语言处理” Training and testing same as in binary case
  • 17. Maximum Likelihood Estimation Why set parameters to counts?
  • 18. MLE – Label marginals
  • 19. Problems with Naïve Bayes Predicting broken traffic lights Lights are broken: both lights are red always Lights are working: 1 is red & 1 is green
  • 20. Problems with Naïve Bayes 2 Now, suppose both lights are red. What will our model predict? We got the wrong answer. Is there a better model? The MLE generative model is not the best model!!
  • 21. More on Generative models We can introduce more dependencies This can explode parameter space Discriminative models minimize error -- next Further reading K. Toutanova. Competitive generative models with structure learning for NLP classification tasks. EMNLP 2006. A. Ng and M. Jordan. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naïve Bayes. NIPS 2002
  • 22. Discriminative Learning We will focus on linear models Model training error
  • 23. Upper bounds on binary training error 0-1 loss (error): NP-hard to minimize over all data points Exp loss: exp(-score): Minimized by AdaBoost Hinge loss: Minimized by support vector machines
  • 24. Binary classification: Weak hypotheses In NLP, a feature can be a weak learner Sentiment example:
  • 26. A small example – – + + Excellent read Terrible: The_plot was boring and opaque Awful book. Couldn’t follow the_plot. Excellent book. The_plot was riveting – – – – + + + – – – – + – – – – + – –
  • 27. Bound on training error [Freund & Schapire 1995] We greedily minimize error by minimizing
  • 28. For proofs and a more complete discussion Robert Schapire and YoramSinger. Improved Boosting Algorithms Using Confidence-rated Predictions. Machine Learning Journal 1998.
  • 29. Exponential convergence of error in t Plugging in our solution for, we have We chose to minimize . Was that the right choice? This gives
  • 30. AdaBoost drawbacks What happens when an example is mis-labeled or an outlier? Exp loss exponentially penalizes incorrect scores. Hinge loss linearly penalizes incorrect scores.
  • 31. Support Vector Machines Linearly separable Non-separable + + + + + – + + – – – – + – – –
  • 32. Margin + + + + + + + + – – – – – – – – Lots of separating hyperplanes. Which should we choose? Choose the hyperplane with largest margin
  • 33. Max-margin optimization greater than margin score of correct label Why do we fix norm of w to be less than 1? Scaling the weight vector doesn’t change the optimal hyperplane
  • 34. Equivalent optimization problem Minimize the norm of the weight vector With fixed margin for each example
  • 35. Back to the non-separable case + + We can’t satisfy the margin constraints But some hyperplanes are better than others – + – – + –
  • 36. Soft margin optimization Add slack variables to the optimization Allow margin constraints to be violated But minimize the violation as much as possible
  • 38. Optimization 2: Sub-gradient descent Max creates a non-differentiable point, but there is a subgradient Subgradient:
  • 39. Stochastic subgradient descent Subgradient descent is like gradient descent. Also guaranteed to converge, but slow Pegasos [Shalev-Schwartz and Singer 2007] Sub-gradient descent for a randomly selected subset of examples. Convergence bound: Objective after T iterations Best objective value Linear convergence
  • 40. SVMs for NLP We’ve been looking at binary classification But most NLP problems aren’t binary Piece-wise linear decision boundaries We showed 2-dimensional examples But NLP is typically very high dimensional Joachims [2000] discusses linear models in high-dimensional spaces
  • 41. Kernels and non-linearity Kernels let us efficiently map training data into a high-dimensional feature space Then learn a model which is linear in the new space, but non-linear in our original space But for NLP, we already have a high-dimensional representation! Optimization with non-linear kernels is often super-linear in number of examples
  • 42. More on SVMs John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press 2004. Dan Klein and Ben Taskar. Max Margin Methods for NLP: Estimation, Structure, and Applications. ACL 2005 Tutorial. Ryan McDonald. Generalized Linear Classifiers in NLP. Tutorial at the Swedish Graduate School in Language Technology. 2007.
  • 43. SVMs vs. AdaBoost SVMs with slack are noise tolerant AdaBoost has no explicit regularization Must resort to early stopping AdaBoost easily extends to non-linear models Non-linear optimization for SVMs is super-linear in the number of examples Can be important for examples with hundreds or thousands of features
  • 44. More on discriminative methods Logistic regression: Also known as Maximum Entropy Probabilistic discriminative model which directly models p(y | x) A good general machine learning book On discriminative learning and more Chris Bishop. Pattern Recognition and Machine Learning. Springer 2006.
  • 45. Learning to rank (1) (2) (3) (4)
  • 46. Features for web page ranking Good features for this model? (1) How many words are shared between the query and the web page? (2) What is the PageRank of the webpage? (3) Other ideas?
  • 47. Optimization Problem Loss for a query and a pair of documents Score for documents of different ranks must be separated by a margin MSRA 互联网搜索与挖掘组 http://research.microsoft.com/asia/group/wsm/
  • 48. Come work with us at Microsoft! http://www.msra.cn/recruitment/

Notas del editor

  1. Say something good about HIT !!! Also advertise for Microsoft
  2. Say there’s no alpha, so weights are approximate
  3. Say: This is the SVM formulation that we observe most commonly