Smoothing N-gram Models
K.A.S.H. Kulathilake
B.Sc.(Sp.Hons.)IT, MCS, MPhil, SEDA(UK)
Smoothing
• What do we do with words that are in our vocabulary (they are not
unknown words) but appear in a test set in an unseen context (for
example they appear after a word they never appeared after in
training)?
• To keep a language model from assigning zero probability to these
unseen events, we’ll have to shave off a bit of probability mass from
some more frequent events and give it to the events we’ve never
seen.
• This modification is called smoothing or discounting.
• There are a variety of ways to do smoothing:
– Add-1 smoothing
– Add-k smoothing
– Good-Turing Discounting
– Stupid backoff
– Kneser-Ney smoothing and many more
Laplace Smoothing / Add 1 Smoothing
• The simplest way to do smoothing is to add one to all
the bigram counts, before we normalize them into
probabilities.
• All the counts that used to be zero will now have a
count of 1, the counts of 1 will be 2, and so on.
• This algorithm is called Laplace smoothing.
• Laplace smoothing does not perform well enough to be
used in modern N-gram models, but it usefully
introduces many of the concepts that we see in other
smoothing algorithms, gives a useful baseline, and is
also a practical smoothing algorithm for other tasks like
text classification.
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• Let’s start with the application of Laplace smoothing to
unigram probabilities.
• Recall that the unsmoothed maximum likelihood estimate
of the unigram probability of the word $w_i$ is its count $c_i$
normalized by the total number of word tokens $N$:

$$P(w_i) = \frac{c_i}{N}$$
• Laplace smoothing merely adds one to each count.
• Since there are V words in the vocabulary and each one
was incremented, we also need to adjust the denominator
to take into account the extra V observations.
$$P_{\text{Laplace}}(w_i) = \frac{c_i + 1}{N + V}$$
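• As an illustration (a minimal sketch, not part of the original slides; the toy corpus is an assumption made purely for the example), the two estimates can be computed as follows:

```python
from collections import Counter

# Toy corpus, purely illustrative.
tokens = "i want to eat chinese food i want food".split()

counts = Counter(tokens)  # c_i for each word type
N = len(tokens)           # total number of word tokens
V = len(counts)           # vocabulary size (here: observed types only)

def p_mle(word):
    """Unsmoothed maximum likelihood estimate: c_i / N."""
    return counts[word] / N

def p_laplace(word):
    """Laplace (add-one) estimate: (c_i + 1) / (N + V)."""
    return (counts[word] + 1) / (N + V)

print(p_mle("food"))      # 2/9  ~ 0.222
print(p_laplace("food"))  # 3/15 = 0.200
print(p_laplace("to"))    # 2/15 ~ 0.133, never zero even for rare words
```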
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• Instead of changing both the numerator and
denominator, it is convenient to describe how a
smoothing algorithm affects the numerator, by defining
an adjusted count C*.
• This adjusted count is easier to compare directly with
the MLE counts and can be turned into a probability
like an MLE count by normalizing by N.
• To define this count, since we are only changing the
numerator, in addition to adding 1 we'll also need to
multiply by a normalization factor $\frac{N}{N+V}$:

$$c_i^* = (c_i + 1)\,\frac{N}{N + V}$$
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• A related way to view smoothing is as discounting
(lowering) some non-zero counts in order to get
the probability mass that will be assigned to the
zero counts.
• Thus, instead of referring to the discounted
counts $c^*$, we might describe a smoothing
algorithm in terms of a relative discount $d_c$, the
ratio of the discounted counts to the original
counts:

$$d_c = \frac{c^*}{c}$$
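• A small sketch tying the last two ideas together, using assumed values for $N$ and $V$ (not taken from the slides); note how larger counts are discounted ($d_c < 1$) while zero and very low counts actually gain mass:

```python
def adjusted_count(c, N, V):
    """Laplace adjusted count: c* = (c + 1) * N / (N + V)."""
    return (c + 1) * N / (N + V)

# Assumed, illustrative sizes: N = 10,000 tokens, V = 1,446 types.
N, V = 10_000, 1_446
for c in (0, 1, 10, 100):
    c_star = adjusted_count(c, N, V)
    if c > 0:
        print(f"c={c}: c*={c_star:.2f}, d_c={c_star / c:.2f}")
    else:
        print(f"c=0: c*={c_star:.2f} (mass given to an unseen event)")
```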
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• Let's smooth our Berkeley Restaurant Project
bigrams, shown in the following table;
Laplace Smoothing / Add 1 Smoothing
(Cont…)
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• Recall that normal bigram probabilities are computed by
normalizing each row of counts by the unigram count:
$$P(w_n \mid w_{n-1}) = \frac{C(w_{n-1} w_n)}{C(w_{n-1})}$$
• For add-one smoothed bigram counts, we need to augment the
unigram count by the number of total word types in the vocabulary
V:
$$P^*_{\text{Laplace}}(w_n \mid w_{n-1}) = \frac{C(w_{n-1} w_n) + 1}{C(w_{n-1}) + V}$$
• Thus, each of the unigram counts given in the previous table will
need to be augmented by V = 1446.
• Ex:

$$P^*_{\text{Laplace}}(\text{to} \mid \text{I}) = \frac{C(\text{I, to}) + 1}{C(\text{I}) + V} = \frac{0 + 1}{2500 + 1446} = \frac{1}{3946} = 0.000253$$
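• The arithmetic of this example can be checked directly; the sketch below just re-evaluates the slide's own numbers:

```python
# Checking the slide's worked example: P*_Laplace(to | I).
c_bigram = 0   # C(I, to), from the slide
c_prev = 2500  # C(I), from the slide
V = 1446       # vocabulary size, from the slide
p = (c_bigram + 1) / (c_prev + V)
print(p)  # 0.0002534..., i.e. 1/3946, matching the slide's 0.000253
```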
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• The following table shows the add-one smoothed
probabilities for the bigrams:
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• It is often convenient to reconstruct the count
matrix so we can see how much a smoothing
algorithm has changed the original counts.
• These adjusted counts can be computed
using the following equation:

$$C^*(w_{n-1} w_n) = \frac{[C(w_{n-1} w_n) + 1] \times C(w_{n-1})}{C(w_{n-1}) + V}$$
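• As a quick check of this equation (a sketch, not from the slides): taking C(want to) = 608 from the discussion below and assuming the BeRP unigram count C(want) = 927 (the count table itself is not reproduced in this text), the reconstructed count works out to about 238, matching the change quoted on the next slide:

```python
# Reconstructed add-one count C*(want to).
# C(want to) = 608 comes from the slides; C(want) = 927 is the BeRP
# unigram count assumed here.
c_bigram, c_prev, V = 608, 927, 1446
c_star = (c_bigram + 1) * c_prev / (c_prev + V)
print(round(c_star, 1))  # ~237.9, i.e. the "608 to 238" change noted below
```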
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• The following table shows the reconstructed
counts:
Laplace Smoothing / Add 1 Smoothing
(Cont…)
• Note that add-one smoothing has made a very big change to the
counts.
• C(want to) changed from 608 to 238!
• We can see this in probability space as well: P(to|want) decreases
from .66 in the unsmoothed case to .26 in the smoothed case.
• Looking at the discount d (the ratio between new and old counts)
shows us how strikingly the counts for each prefix word have been
reduced; the discount for the bigram want to is .39, while the
discount for Chinese food is .10, a factor of 10!
• The sharp change in counts and probabilities occurs because too
much probability mass is moved to all the zeros.
Add-K Smoothing
• One alternative to add-one smoothing is to move a bit
less of the probability mass from the seen to the
unseen events.
• Instead of adding 1 to each count, we add a fractional
count k (.5? .05? .01?).
• This algorithm is therefore called add-k smoothing.
$$P^*_{\text{Add-k}}(w_n \mid w_{n-1}) = \frac{C(w_{n-1} w_n) + k}{C(w_{n-1}) + kV}$$
• Add-k smoothing requires that we have a method for
choosing k; this can be done, for example, by
optimizing on a development set (devset).
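• A minimal sketch of the add-k estimator (the counts and the candidate values of k below are assumptions for illustration; in practice k would be tuned on a devset as described above):

```python
def p_add_k(c_bigram, c_prev, V, k):
    """Add-k estimate: (C(w_{n-1} w_n) + k) / (C(w_{n-1}) + k * V)."""
    return (c_bigram + k) / (c_prev + k * V)

# For an unseen bigram (counts assumed, not from the slides), smaller k
# moves less probability mass to the unseen event:
for k in (1.0, 0.5, 0.05, 0.01):
    print(k, p_add_k(0, 2500, 1446, k))
```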
Good-Turing Discounting
• The basic insight of Good-Turing smoothing is to re-estimate the amount of
probability mass to assign to N-grams with zero or low counts by looking at the
number of N-grams with higher counts.
• In other words, we examine Nc, the number of N-grams that occur c times.
• We refer to the number of N-grams that occur c times as the frequency of
frequency c.
• So applying the idea to smoothing the joint probability of bigrams, N0 is the
number of bigrams b of count 0, N1 the number of bigrams with count 1, and so
on:
$$N_c = \sum_{b \,:\, c(b) = c} 1$$
• The Good-Turing estimate gives a smoothed count c* based on the set of Nc for all
c, as follows:
$$c^* = (c + 1)\,\frac{N_{c+1}}{N_c}$$
https://www.youtube.com/watch?v=z1bq4C8hFEk
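• A minimal sketch of this count re-estimation (the $N_c$ table below is assumed, purely for illustration; practical systems such as Simple Good-Turing also smooth the $N_c$ values themselves, since $N_{c+1}$ can be zero):

```python
# Toy frequency-of-frequency table N_c (illustrative, not from the slides):
# 100 bigram types seen once, 40 seen twice, 20 seen three times.
N = {1: 100, 2: 40, 3: 20}

def gt_count(c):
    """Good-Turing smoothed count: c* = (c + 1) * N_{c+1} / N_c."""
    return (c + 1) * N[c + 1] / N[c]

print(gt_count(1))  # 2 * 40 / 100 = 0.8 -> singletons are discounted
print(gt_count(2))  # 3 * 20 / 40  = 1.5
# The mass shaved off in this way is what gets reassigned to unseen bigrams.
```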
Backoff and Interpolation
• In the backoff model, like the deleted interpolation model,
we build an N-gram model based on an (N-1)-gram model.
• The difference is that in backoff, if we have non-zero
trigram counts, we rely solely on the trigram counts and
don’t interpolate the bigram and unigram counts at all.
• We only ‘back off’ to a lower-order N-gram if we have zero
evidence for a higher-order N-gram.
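• The earlier list of methods mentions stupid backoff; a minimal sketch of that particular scheme is shown below (it produces scores rather than true probabilities; the fixed factor 0.4 is the value commonly used with it, and the toy counts are assumptions):

```python
def stupid_backoff(w1, w2, w3, tri, bi, uni, total, alpha=0.4):
    """'Stupid backoff' score for w3 given context (w1, w2).

    Uses the trigram count when it is non-zero; otherwise backs off to
    the bigram, then the unigram, scaling by alpha at each step.
    Returns a score, not a normalized probability.
    """
    if tri.get((w1, w2, w3), 0) > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi.get((w2, w3), 0) > 0:
        return alpha * bi[(w2, w3)] / uni[w2]
    return alpha * alpha * uni.get(w3, 0) / total

# Toy counts, purely illustrative:
tri = {("i", "want", "to"): 5}
bi = {("i", "want"): 6, ("want", "to"): 8, ("to", "eat"): 2}
uni = {"i": 10, "want": 9, "to": 12, "eat": 4}
print(stupid_backoff("i", "want", "to", tri, bi, uni, total=100))    # trigram used
print(stupid_backoff("want", "to", "eat", tri, bi, uni, total=100))  # backs off to bigram
```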
Reference
• https://www.youtube.com/watch?v=z1bq4C8hFEk