Fuzzy Matching/Logic Explained
What is Fuzzy Matching?
Fuzzy Matching, also called Approximate String Matching, is a technique for identifying two pieces of text, strings, or entries that are approximately similar but not exactly the same.
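
As a minimal sketch of the idea, assuming Python's standard-library difflib as the similarity measure (the slides do not prescribe one) and a hypothetical cutoff of 0.85, two strings can be treated as a match when their similarity score clears the cutoff, even though an exact comparison fails:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity score in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()

def is_fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical helper: treat two strings as the same entry above a cutoff."""
    return similarity(a, b) >= threshold

print("Jon Smith" == "John Smith")                      # exact comparison: False
print(round(similarity("Jon Smith", "John Smith"), 3))  # 0.947
print(is_fuzzy_match("Jon Smith", "John Smith"))        # approximate match: True
```

The cutoff is the key design decision: lower values catch more variants but also admit more unrelated strings.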
How does Fuzzy Matching help in the real world?
Fuzzy Matching comes in handy in many situations. Here are some real-world examples.
1. Creating a Single Customer View: A large organization is likely to store customer records across many different tables, which it can join to obtain a single customer view. Because the same customer is often recorded slightly differently in each table, this join often requires fuzzy string matching.
2. Fraud Detection: A good fuzzy string matching algorithm can help detect fraud within an organization. The FAA used fuzzy string matching to single out several pilots for fraudulent behavior.
3. Data Accuracy: Fuzzy string matching can improve data quality and accuracy through data deduplication, identification of false positives, and so on.
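
A minimal deduplication sketch for the data-accuracy case, assuming hypothetical customer names and Python's standard difflib.get_close_matches (the 0.85 cutoff is an illustrative choice, not a recommendation). Each record is kept only if no sufficiently similar record has already been kept:

```python
from difflib import get_close_matches

def deduplicate(names: list[str], cutoff: float = 0.85) -> list[str]:
    """Keep the first occurrence of each group of near-duplicate names."""
    kept: list[str] = []
    for name in names:
        key = name.lower().strip()  # normalise case and surrounding spaces
        existing = [k.lower().strip() for k in kept]
        # n=1: we only need to know whether any kept name is close enough
        if not get_close_matches(key, existing, n=1, cutoff=cutoff):
            kept.append(name)
    return kept

print(deduplicate(["John Smith", "Jon Smith", "john smith ", "Jane Doe"]))
# ['John Smith', 'Jane Doe']
```

Abbreviations such as "Acme Corp." versus "Acme Corporation" score well below this cutoff, so a real pipeline would likely need extra normalisation rules for them.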
How does Fuzzy Matching work?
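Traditional logic is binary in nature, i.e. a statement is either true or false. Fuzzy logic, by contrast, indicates the degree to which a statement is true. Applied to strings, this means a fuzzy matcher returns a similarity score (how closely two strings match) rather than a plain yes or no.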
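
To make the contrast concrete, the sketch below maps a similarity score in [0, 1] to a degree of truth for the statement "these two strings match", using a piecewise-linear (ramp) membership function. The breakpoints 0.6 and 0.9 and the crisp cutoff 0.75 are hypothetical values chosen for illustration:

```python
def crisp_match(score: float, cutoff: float = 0.75) -> bool:
    """Traditional (binary) logic: the statement is either true or false."""
    return score >= cutoff

def fuzzy_match_degree(score: float, low: float = 0.6, high: float = 0.9) -> float:
    """Fuzzy logic: how true "these strings match" is, from 0.0 to 1.0.

    Scores at or below `low` are fully false, scores at or above `high`
    are fully true, and scores in between rise linearly.
    """
    if score <= low:
        return 0.0
    if score >= high:
        return 1.0
    return (score - low) / (high - low)

for s in (0.5, 0.75, 0.95):
    print(s, crisp_match(s), round(fuzzy_match_degree(s), 2))
```

A score of 0.75 is simply "true" for the crisp rule, but only "half true" for the fuzzy one.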
How does Fuzzy Name Matching work?
One of the most important use cases of fuzzy matching arises when we want to join tables on a name field. The same name is often recorded with slight variations (for example "Jon Smith", "John Smith", and "Smith, John"), so matching these records requires a set of rules that can tolerate those variations. These rules are called fuzzy rules, and the process is called Fuzzy Name Matching.
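
A minimal sketch of fuzzy name matching for a table join, assuming a few hypothetical fuzzy rules (case-folding, stripping punctuation, ignoring word order) followed by a difflib similarity check. The tables, keys, and 0.85 threshold are invented for illustration; real systems usually add rules such as nickname lookups, not shown here:

```python
import re
from difflib import SequenceMatcher

def normalise_name(name: str) -> str:
    """Hypothetical fuzzy rules: lowercase, drop punctuation, ignore word order."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())  # "O'Neil" -> "oneil"
    return " ".join(sorted(cleaned.split()))         # "smith john" -> "john smith"

def fuzzy_join(left: dict[int, str], right: dict[str, str],
               threshold: float = 0.85) -> list[tuple[int, str, float]]:
    """Pair each left row with its most similar right row, if similar enough."""
    matches = []
    for left_id, left_name in left.items():
        best_key, best_score = None, 0.0
        for right_key, right_name in right.items():
            score = SequenceMatcher(
                None, normalise_name(left_name), normalise_name(right_name)
            ).ratio()
            if score > best_score:
                best_key, best_score = right_key, score
        if best_key is not None and best_score >= threshold:
            matches.append((left_id, best_key, round(best_score, 2)))
    return matches

customers = {1: "Smith, John", 2: "Mary-Ann O'Neil", 3: "Li Wei"}
orders = {"A-17": "jon smith", "B-02": "Maryann ONeil", "C-33": "Peter Parker"}
print(fuzzy_join(customers, orders))
# [(1, 'A-17', 0.95), (2, 'B-02', 1.0)]
```

This nested loop compares every pair of rows, which is fine for a sketch but slow on large tables; production systems typically narrow candidates first (blocking).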
How to perform Fuzzy Name Matching?
As with many computing techniques, there are popular algorithms that can be used to perform Fuzzy Name Matching. The following are some of the most common.
1. Levenshtein Distance: The Levenshtein distance is a metric that measures the difference between two strings. It is the minimum number of single-character insertions, deletions, or substitutions required to change one string into the other (for example, "kitten" and "sitting" are 3 edits apart).
2. The Soundex Algorithm: Soundex is a phonetic algorithm used to search for names that sound similar but are spelled differently. It is most commonly used for genealogical database searches.
3. The Metaphone and Double Metaphone Algorithms: The Metaphone algorithm is an improvement over the original Soundex algorithm, and the Double Metaphone algorithm builds on Metaphone. Double Metaphone returns two keys for words that have more than one pronunciation.
4. Cosine Similarity: The cosine similarity between two non-zero vectors is equal to the cosine of the angle between them. To compare strings this way, each string is first turned into a vector, for example by counting its words or character n-grams.
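
The sketch below implements three of these algorithms in plain Python as a minimal illustration, not a production library: a dynamic-programming Levenshtein distance, the common American Soundex variant, and cosine similarity over character bigram counts (using bigrams is an assumption; words or longer n-grams also work). Metaphone and Double Metaphone involve many more rules and are omitted:

```python
from collections import Counter
from math import sqrt

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

# American Soundex digit for each coded consonant; vowels, h, w and y get none.
_SOUNDEX = {ch: digit
            for digit, group in [("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                                 ("4", "l"), ("5", "mn"), ("6", "r")]
            for ch in group}

def soundex(name: str) -> str:
    """Four-character Soundex key, e.g. 'Robert' and 'Rupert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    key, last = letters[0].upper(), _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX.get(c, "")
        if digit and digit != last:  # skip repeats of the previous code
            key += digit
        if c not in "hw":  # h/w do not separate equal codes; vowels do
            last = digit
    return (key + "000")[:4]  # pad with zeros or truncate to 4 characters

def cosine_similarity(a: str, b: str, n: int = 2) -> float:
    """Cosine of the angle between the character n-gram count vectors of a and b."""
    va = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    vb = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0  # zero vector: similarity undefined, report 0

print(levenshtein("kitten", "sitting"))     # 3
print(soundex("Smith"), soundex("Smyth"))   # S530 S530
print(cosine_similarity("night", "nacht"))  # 0.25
```

"Smith" and "Smyth" share a Soundex key even though a plain string comparison treats them as different, which is exactly the kind of variation phonetic algorithms target.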
Implementing Fuzzy Matching...
Fuzzy Matching algorithms can be implemented in various programming languages.
1. Fuzzy String Matching Using Python: Fuzzywuzzy is a Python library for fuzzy string matching. The basic comparison metric used by the Fuzzywuzzy library is the Levenshtein distance.
2. Fuzzy String Matching Using Java: Fuzzy matching takes a little more work in Java, as the language isn't specifically designed for data science. However, many GitHub repositories provide fuzzy string matching in Java.
3. Fuzzy String Matching Using Microsoft Excel: Microsoft provides a Fuzzy Lookup Add-In for the desktop version of Excel that performs fuzzy matching between columns.
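
For the Python route, a minimal sketch of calling Fuzzywuzzy's fuzz.ratio, which returns an integer score from 0 to 100. Because the package may not be installed, the example falls back, as an assumption, to scaling the standard library's difflib ratio to the same 0 to 100 range; as far as we know, Fuzzywuzzy itself uses difflib when the optional python-Levenshtein package is absent:

```python
from difflib import SequenceMatcher

try:
    # Third-party package (pip install fuzzywuzzy); returns an int from 0 to 100.
    from fuzzywuzzy import fuzz
    ratio_score = fuzz.ratio
except ImportError:
    # Assumed stand-in when fuzzywuzzy is missing: scale difflib's 0..1 ratio to 0..100.
    def ratio_score(a: str, b: str) -> int:
        return round(100 * SequenceMatcher(None, a, b).ratio())

print(ratio_score("Jon Smith", "John Smith"))
print(ratio_score("Jon Smith", "Mary Jones"))
```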
Fuzzy Matching best practices
1. Fuzzy string matching is a widely researched area, and new algorithms and software are released periodically, so it pays to keep an eye out for new developments.
2. Even after rigorous testing, you are likely to end up with a few false positives, so avoid using fuzzy matching software to process sensitive data.
3. Fuzzy string matching pays the highest dividends when you have a lot of data, correct matches bring a large upside, and false positives don't matter as much.
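
Best practices 2 and 3 both come down to choosing a similarity threshold you can live with. One way to do that, sketched here under the assumption that you have a small hand-labelled sample of candidate pairs (the pairs below are hypothetical), is to count false positives and false negatives at several thresholds before running the matcher on real data:

```python
from difflib import SequenceMatcher

# Hypothetical hand-labelled sample: (name_a, name_b, truly_the_same_entity)
LABELLED_PAIRS = [
    ("John Smith", "Jon Smith", True),
    ("Catherine Lee", "Katherine Lee", True),
    ("Robert Brown", "Rob Brown", True),
    ("Mark Jones", "Mary Jones", False),
    ("Peter Parker", "Peter Porter", False),
]

def confusion_at(threshold: float) -> tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) at a threshold."""
    tp = fp = fn = 0
    for a, b, same in LABELLED_PAIRS:
        predicted = SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
        if predicted and same:
            tp += 1
        elif predicted:
            fp += 1  # matched, but the names belong to different entities
        elif same:
            fn += 1  # missed a genuine match
    return tp, fp, fn

for t in (0.7, 0.8, 0.9):
    tp, fp, fn = confusion_at(t)
    print(f"threshold={t}: TP={tp} FP={fp} FN={fn}")
```

Raising the threshold trades false positives for false negatives; which side to favour depends on how costly a wrong match is for your data.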
Learn more about Fuzzy Matching:
https://nanonets.com/blog/fuzzy-matching-fuzzy-logic/

More from OliviaSmith160
