A machine learning classifier can learn the probabilities needed for classification automatically by estimating them from labeled training data. Specifically, a naïve Bayesian classifier estimates probabilities such as P(category|feature) by using Bayes' rule to compute P(feature|category) * P(category) / P(feature) from the observed frequencies of features in each labeled category. This makes it possible to move from an initial rule-based classifier built on word lists to a probabilistic naïve Bayesian approach that learns its probabilities automatically instead of requiring them to be specified by hand.
From rule engines to machine learning
1. From Linguistic Rules to Machine Learning
Cohan Sujay Carlos
Aiaioo Labs
Bangalore, India
cohan@aiaioo.com
2. What is a Classifier?
A machine learning tool used to apply a label to data.
3. Classification used in Text Categorization
Politics: The UN Security Council adopts its first clear condemnation of Syria for its continuing crackdown on protests, as the army continues its advance into Hama.
Sports: Warwickshire's Clarke equalled the first-class record of seven catches for an outfielder in an innings but Lancashire took control on day three.
4. How to Build a Classifier for Text Categorization
How do you tell which label (Politics/Sports) is suitable?
The UN Security Council adopts its first clear condemnation of Syria for its continuing crackdown on protests, as the army continues its advance into Hama.
Warwickshire's Clarke equalled the first-class record of seven catches for an outfielder in an innings but Lancashire took control on day three.
6. Classification used in Text Categorization
See the words?
The UN Security Council adopts its first clear condemnation of Syria for its continuing crackdown on protests, as the army continues its advance into Hama.
Warwickshire's Clarke equalled the first-class record of seven catches for an outfielder in an innings but Lancashire took control on day three.
7. Rule-Based Text Categorization
Gazetteers (word lists)
Politics: UN Security Council, Adopts, Condemnation, Syria, Crackdown, Protests, Army, Hama
Sports: Warwickshire, Clarke, First-class, Record, Catches, Outfielder, Innings, Lancashire
So you can just use word lists for classification?
Yeah, but they won’t work very well.
Can you see why word lists alone won’t work very well?
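As a concrete sketch of the word-list approach (the counting scheme and the `classify_by_word_lists` name are my own illustration, not from the slides), a rule-based classifier can simply label a text with the category whose gazetteer matches the most words:

```python
# Minimal rule-based (gazetteer) classifier: count word-list hits per category.
GAZETTEERS = {
    "Politics": {"un", "adopts", "condemnation", "syria", "crackdown",
                 "protests", "army", "hama"},
    "Sports": {"warwickshire", "clarke", "first-class", "record",
               "catches", "outfielder", "innings", "lancashire"},
}

def classify_by_word_lists(text):
    words = text.lower().split()
    # Score each category by how many of its gazetteer words appear.
    scores = {cat: sum(w in wordlist for w in words)
              for cat, wordlist in GAZETTEERS.items()}
    return max(scores, key=scores.get)

print(classify_by_word_lists("The army continues its advance into Hama"))  # Politics
```

One reason this won't work very well: every matched word counts the same, so a weak cue like "record" pulls as hard as a strong cue like "Hama".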
8. Rule-Based to Naïve Bayesian
How can you go:
from the starting point (word lists)
to a really cool classification algorithm
All you need is weights!
9. Rule-Based with Weights
Let’s improve the gazetteers with weights
Politics Sports
UN 1.0 Warwickshire 0.3
Adopts 0.1 Clarke 0.1
Condemnation 0.2 First-class 0.6
Syria 0.3 Record 0.3
Crackdown 0.8 Catches 0.6
Protests 1.0 Outfielder 1.0
Army 0.8 Innings 0.9
Hama 1.0 Lancashire 0.5
These weights are nothing but P(Category|Word).
10. Rule-Based with Weights
Let’s improve the gazetteers with weights
Politics Sports
UN 1.0 Warwickshire 0.3
Adopts 0.1 Clarke 0.1
Condemnation 0.2 First-class 0.6
Syria 0.3 Record 0.3
Crackdown 0.8 Catches 0.6
Protests 1.0 Outfielder 1.0
Army 0.8 Innings 0.9
Hama 1.0 Lancashire 0.5
P(Politics|Word) P(Sports|Word)
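One simple way to use these weights (my own assumption; the slides don't fix a combination rule) is to sum the weights of the words each category's list recognises and pick the highest-scoring category:

```python
# Weighted gazetteers: each word carries a weight interpreted as P(Category | word).
WEIGHTS = {
    "Politics": {"un": 1.0, "adopts": 0.1, "condemnation": 0.2, "syria": 0.3,
                 "crackdown": 0.8, "protests": 1.0, "army": 0.8, "hama": 1.0},
    "Sports": {"warwickshire": 0.3, "clarke": 0.1, "first-class": 0.6,
               "record": 0.3, "catches": 0.6, "outfielder": 1.0,
               "innings": 0.9, "lancashire": 0.5},
}

def classify_weighted(text):
    words = text.lower().split()
    # Sum the weights of the words each category's list recognises.
    scores = {cat: sum(wts.get(w, 0.0) for w in words)
              for cat, wts in WEIGHTS.items()}
    return max(scores, key=scores.get)

print(classify_weighted("crackdown on protests in Hama"))  # Politics
```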
11. Rule-Based with Weights
Politics
UN 1.0 P(Politics|UN)
Adopts 0.1 P(Politics|Adopts)
Condemnation 0.2 P(Politics|Condemnation)
Syria 0.3 P(Politics|Syria)
Crackdown 0.8 P(Politics|Crackdown)
Protests 1.0 P(Politics|Protests)
Army 0.8 P(Politics|Army)
Hama 1.0 P(Politics|Hama)
12. Rule-Based with Weights
Politics
UN 1.0 P(Politics | “UN”)
Adopts 0.1 P(Politics | “Adopts”)
How can you learn these probabilities automatically?
13. Rule-Based with Weights
Politics
UN 1.0 P(Politics | “UN”)
Adopts 0.1 P(Politics | “Adopts”)
How can you learn these probabilities automatically?
Estimation
P(Politics | “UN”) = C(“UN” in Politics) / C(“UN” in any category) = 20/20
Statistically, this is not a very accurate estimator: the denominator (the total number of occurrences of “UN”) is small.
14. Rule-Based with Weights
Politics
UN 1.0 P(Politics | “UN”)
Adopts 0.1 P(Politics | “Adopts”)
How can you learn these probabilities automatically?
Instead you estimate
P(“UN” | Politics) = C(“UN” in Politics) / C(all words in Politics) = 20/40000
Statistically, this is a better estimator: the denominator (the total word count of the Politics category) is large.
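The estimation step above can be sketched as follows (the tiny corpus and function name are my own illustration; this is the plain maximum-likelihood estimate with no smoothing):

```python
from collections import Counter

def estimate_word_given_category(docs_by_category):
    """MLE estimate: P(word | category) = C(word in category) / C(all words in category)."""
    probs = {}
    for category, docs in docs_by_category.items():
        counts = Counter(w for doc in docs for w in doc.lower().split())
        total = sum(counts.values())
        probs[category] = {w: c / total for w, c in counts.items()}
    return probs

# Tiny illustrative corpus (made up for this sketch):
corpus = {"Politics": ["the un adopts condemnation", "crackdown on protests"],
          "Sports": ["clarke catches record innings", "innings record"]}
probs = estimate_word_given_category(corpus)
print(probs["Sports"]["record"])  # count 2 out of 6 Sports words
```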
15. Rule-Based with Weights
Politics
UN 1.0 P(Politics | “UN”)
Adopts 0.1 P(Politics | “Adopts”)
How can you learn these probabilities automatically?
A Naïve Bayesian classifier uses a P(Politics | “UN”)
estimate calculated from P(“UN”|Politics).
That’s so cool! Time to learn how to do that!
16. Use Bayesian Inversion!
In other words, we are looking to turn P(F|E) into P(E|F), where E is the category (e.g. Politics) and F is the feature (e.g. the word “UN”).
There is an equation to do this :
P(E|F) = P(F|E) * P(E) / P(F) [Bayesian Inversion]
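Plugging numbers into Bayesian inversion looks like this (the prior and P(“UN”) values are my own illustrative choices, echoing the 20/40000 style of estimate above):

```python
# Bayesian inversion: P(E|F) = P(F|E) * P(E) / P(F).
p_un_given_politics = 20 / 40000   # P("UN" | Politics), estimated from counts
p_politics = 0.5                   # P(Politics), the category prior (assumed)
p_un = 20 / 80000                  # P("UN") across all categories (assumed)

p_politics_given_un = p_un_given_politics * p_politics / p_un
print(p_politics_given_un)  # 1.0
```

With these numbers, every observed “UN” occurred in a Politics document, so the inverted probability comes out as 1.0, matching the 20/20 estimate earlier, but built from statistically sturdier ingredients.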
17. So finally … you have …
Politics
UN 1.0 P(“UN”|Politics)* P(Politics)/P(“UN”)
Adopts 0.1 P(“Adopts”|Politics)*P(Politics)/P(“Adopts”)
That was easy, wasn’t it?!
These don’t have to be only words. They can be ANY sort of feature (word pairs, syntax).
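Putting the whole pipeline together, here is a minimal naïve Bayesian classifier sketch (the corpus, the uniform word features, and the add-one smoothing are my own assumptions; the slides only cover the unsmoothed estimates):

```python
import math
from collections import Counter

def train_naive_bayes(docs_by_category):
    """Estimate P(category) and per-category word counts from labeled documents."""
    total_docs = sum(len(docs) for docs in docs_by_category.values())
    priors, likelihoods, vocab = {}, {}, set()
    for cat, docs in docs_by_category.items():
        priors[cat] = len(docs) / total_docs
        counts = Counter(w for doc in docs for w in doc.lower().split())
        vocab |= set(counts)
        likelihoods[cat] = counts
    return priors, likelihoods, vocab

def classify(text, priors, likelihoods, vocab):
    """Pick argmax over categories of P(c) * prod_w P(w|c), computed in log space."""
    best, best_score = None, -math.inf
    for cat, counts in likelihoods.items():
        total = sum(counts.values())
        score = math.log(priors[cat])
        for w in text.lower().split():
            # Laplace (add-one) smoothing so unseen words don't zero the product.
            score += math.log((counts[w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = cat, score
    return best

corpus = {"Politics": ["un adopts condemnation of syria", "army crackdown on protests"],
          "Sports": ["clarke catches record innings", "lancashire took control"]}
priors, likelihoods, vocab = train_naive_bayes(corpus)
print(classify("crackdown on protests", priors, likelihoods, vocab))  # Politics
```

Swapping the word features for word pairs or syntactic features only changes what the `Counter` counts; the Bayesian machinery stays the same.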
18. You have just Learnt
How to Build
A Naïve Bayesian Classifier
Starting from Linguistic Rules (Word Lists)
Cohan Sujay Carlos
Aiaioo Labs
Bangalore, India
cohan@aiaioo.com