SlideShare una empresa de Scribd logo
1 de 16
Descargar para leer sin conexión
Named Entity Recognition (NER) from a business point of view :
coupling a rule-based approach with Machine Learning algorithms
Sotiria Bampatzani
NLP Data Engineer - QWAM Content Intelligence
Paris, France
CONTENTS
• Presentation
• Named Entity Recognition (NER)
• The rule-based approach
• Adding Machine Learning to the mix
• Use case example
• Not stopping there…
• Conclusion
PRESENTATION
QWAM Content Intelligence
• QWAM Content Intelligence is a solutions editor, who provides innovative software
solutions for analyzing textual data and extracting insights with its AI and Semantics
technologies.
Search engine to manage textual and press/media content
SaaS solution for real-time web information monitoring
Analytics platform for extracting key information from
textual data
3
NAMED ENTITY RECOGNITION
4
Brief introduction
• 1987 : first studies on information extraction (IE)
• 1991 : first study on Named Entity Recognition (NER)
• 1995 : NER becomes one of the basic Natural Language Processing (NLP) tasks
Named Entity Recognition
Entity Identification Entity Classification
How…
• Rule-based approach : Annotation rules
• Learning approach : word embeddings, statistical models, neural networks, etc.
• Hybrid approach : combination of the rule-based and learning approaches
THE RULE-BASED APPROACH
5
Advantages of this approach
• Robust
• Accurate results
• Adaptable to new types of entities
Drawbacks of this approach
• Based on non-contextual grammars and lexicon (gazetteer) lists, whose maintenance
and update is costly
• Impossible to treat all spelling variants and the resulting ambiguity
• Discovering new entities is very difficult, if not impossible
ADDING ML TO THE MIX…
6
Hybrid approach
• Creation of a dataset containing over 40M news articles with the use of one of
QWAM’s solutions, Ask’n’Read
• Annotation of the aforementioned dataset with the annotation rules developed by our
Text Analytics team
• Use of this annotated dataset in order to train ML models
o RNN/LSTM, word embeddings (word2vec), BERT…
However…
Data preprocessing and filtering do not result in a 100% “clean” dataset.
The training set also contains errors or missing annotations !
DATASET EXAMPLE
7
source : https://www.phonandroid.com/samsung-annonce-arrivee-smartphones-ecran-enroulable-coulissant.html
ADDING ML TO THE MIX…
8
Evaluation
• The ML model correctly identified and classified new entities, that are added to our
gazetteer lists.
• Following statistical evaluation, it appeared that a number of errors resulted from
specific annotation rules. These annotation rules were later improved.
• The dataset is then reannotated with the enhanced annotation rules and the cycle
starts anew…
…And what of client data ?
USE CASE - EXAMPLE
9
Client data
• The need to identify new types of entities arises. Extracting key information, not
limited to predefined categories (person names, locations, organizations, etc.) is
crucial in order to thoroughly analyze the data.
• The size, oftentimes sensitive nature of the dataset, as well as the time allocated to
the project, may not allow for machine learning.
QWAM’s solution…
• Preprocessing and annotating of the data with the “standard” application.
• Identification of a priori “interesting” entities in the data, thanks to an annotation rule
used for “discovering” potentially interesting information.
• Use of these annotations to build a dedicated ontology.
USE CASE - EXAMPLE
10
QWAM Ontology Manager
USE CASE - EXAMPLE
11
USE CASE - EXAMPLE
12
How ML is a part of
Ontology Manager
• Suggestions of a machine
learning algorithm are
incorporated in the platform,
and proposed to users in
order to promote and
facilitate ontology evolution
NOT STOPPING THERE…
13
Establishing relations between entities
• Once named entity recognition and concept recognition are in place, the next step is
to establish a link between them.
o “Atos finalise le rachat de la société canadienne In Fidem”
Company-buys-Company
o TESSI signe un partenariat stratégique avec NEHS DIGITAL”
Company-partners with-Company
• Like named entity and concept recognition, the same methods are implemented.
o A “standard” gazetteer with these expressions already exists, which allows for an
initial annotation and recognition.
o Another annotation rules is used in order to discover new expressions and
relations between different types of entities.
o A ML model is trained for further exploration.
USE CASE - EXAMPLE
14
• Unlike entities or concepts, suggestions are calculated and proposed
based on the content of the whole category.
CONCLUSION
15
Conclusion
• A rule-based approach is not sufficient if we wish to discover new entities “on the fly”.
• The size, oftentimes sensitive nature of a client’s dataset, as well as the time allocated
to the project, may not allow for machine learning in a business setting.
• A hybrid approach seems to be the most efficient method
o Adaptable to different client’s data
• When doing an NLP task such as named entity recognition, more often than not,
errors are cumulated at every step.
• The biggest advantage of the method we use at QWAM, is an NLP Engineer’s rapid
and active involvement at every step of the way.
THANK YOU
16

Más contenido relacionado

Similar a Sotiria bampatzani wi_mlds_presentation_20200203

Prescriptive Analytics-1.pptx
Prescriptive Analytics-1.pptxPrescriptive Analytics-1.pptx
Prescriptive Analytics-1.pptxKarthik132344
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learningJohnson Ubah
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxShanmugasundaram M
 
Analytics demystified
Analytics demystifiedAnalytics demystified
Analytics demystifiedMarc Moreau
 
artificggggggggggggggialintelligence.pdf
artificggggggggggggggialintelligence.pdfartificggggggggggggggialintelligence.pdf
artificggggggggggggggialintelligence.pdftt4765690
 
Text Analytics for Non-Experts
Text Analytics for Non-ExpertsText Analytics for Non-Experts
Text Analytics for Non-ExpertsSynaptica, LLC
 
CC TEL- Simulation-based co-design of algorithms
CC TEL- Simulation-based co-design of algorithmsCC TEL- Simulation-based co-design of algorithms
CC TEL- Simulation-based co-design of algorithmsSebastian Dennerlein
 
Top 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfTop 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfShaikSikindar1
 
Mining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxMining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxARIV4
 
Implementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White PaperImplementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White Papershashanksalunkhe12
 
Introduction to data science.pdf
Introduction to data science.pdfIntroduction to data science.pdf
Introduction to data science.pdfalsaid fathy
 
Artificial Intelligence
Artificial Intelligence  Artificial Intelligence
Artificial Intelligence Nishmi Suresh
 
McKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docxMcKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docxandreecapon
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAjaved75
 

Similar a Sotiria bampatzani wi_mlds_presentation_20200203 (20)

Pattern recognition
Pattern recognitionPattern recognition
Pattern recognition
 
Machine learning
Machine learningMachine learning
Machine learning
 
Prescriptive Analytics-1.pptx
Prescriptive Analytics-1.pptxPrescriptive Analytics-1.pptx
Prescriptive Analytics-1.pptx
 
Eckovation Machine Learning
Eckovation Machine LearningEckovation Machine Learning
Eckovation Machine Learning
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Analytics demystified
Analytics demystifiedAnalytics demystified
Analytics demystified
 
artificggggggggggggggialintelligence.pdf
artificggggggggggggggialintelligence.pdfartificggggggggggggggialintelligence.pdf
artificggggggggggggggialintelligence.pdf
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Text Analytics for Non-Experts
Text Analytics for Non-ExpertsText Analytics for Non-Experts
Text Analytics for Non-Experts
 
CC TEL- Simulation-based co-design of algorithms
CC TEL- Simulation-based co-design of algorithmsCC TEL- Simulation-based co-design of algorithms
CC TEL- Simulation-based co-design of algorithms
 
Top 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfTop 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdf
 
Mining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxMining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docx
 
Implementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White PaperImplementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White Paper
 
Ai in finance
Ai in financeAi in finance
Ai in finance
 
Introduction to data science.pdf
Introduction to data science.pdfIntroduction to data science.pdf
Introduction to data science.pdf
 
Artificial Intelligence
Artificial Intelligence  Artificial Intelligence
Artificial Intelligence
 
McKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docxMcKinsey Global Institute Big data The next frontier for innova.docx
McKinsey Global Institute Big data The next frontier for innova.docx
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 

Más de Paris Women in Machine Learning and Data Science

Más de Paris Women in Machine Learning and Data Science (20)

Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
How and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe DaudierHow and why AI should fight cybersexism, by Chloe Daudier
How and why AI should fight cybersexism, by Chloe Daudier
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Managing international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha DimbanManaging international tech teams, by Natasha Dimban
Managing international tech teams, by Natasha Dimban
 
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria KnorpsOptimizing GenAI apps, by N. El Mawass and Maria Knorps
Optimizing GenAI apps, by N. El Mawass and Maria Knorps
 
Perspectives, by M. Pannegeon
Perspectives, by M. PannegeonPerspectives, by M. Pannegeon
Perspectives, by M. Pannegeon
 
Evaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled dataEvaluation strategies for dealing with partially labelled or unlabelled data
Evaluation strategies for dealing with partially labelled or unlabelled data
 
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
Combinatorial Optimisation with Policy Adaptation using latent Space Search, ...
 
An age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-PierreAn age-old question, by Caroline Jean-Pierre
An age-old question, by Caroline Jean-Pierre
 
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle LautréApplying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
Applying Churn Prediction Approaches to the Telecom Industry, by Joëlle Lautré
 
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure SoulierHow to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
How to supervise a thesis in NLP in the ChatGPT era? By Laure Soulier
 
Global Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna AbreuGlobal Ambitions Local Realities, by Anna Abreu
Global Ambitions Local Realities, by Anna Abreu
 
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie DelonPlug-and-Play methods for inverse problems in imagine, by Julie Delon
Plug-and-Play methods for inverse problems in imagine, by Julie Delon
 
Sales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca IannuzziSales Forecasting as a Data Product by Francesca Iannuzzi
Sales Forecasting as a Data Product by Francesca Iannuzzi
 
Identifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta BinkyteIdentifying and mitigating bias in machine learning, by Ruta Binkyte
Identifying and mitigating bias in machine learning, by Ruta Binkyte
 
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...“Turning your ML algorithms into full web apps in no time with Python" by Mar...
“Turning your ML algorithms into full web apps in no time with Python" by Mar...
 
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
Nature Language Processing for proteins by Amélie Héliou, Software Engineer @...
 
Sandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI projectSandrine Henry presents the BechdelAI project
Sandrine Henry presents the BechdelAI project
 
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
Anastasiia Tryputen_War in Ukraine or how extraordinary courage reshapes geop...
 
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdfKhrystyna Grynko WiMLDS - From marketing to Tech.pdf
Khrystyna Grynko WiMLDS - From marketing to Tech.pdf
 

Último

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...HyderabadDolls
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...HyderabadDolls
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...ThinkInnovation
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...kumargunjan9515
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxAniqa Zai
 

Último (20)

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 

Sotiria bampatzani wi_mlds_presentation_20200203

  • 1. Named Entity Recognition (NER) from a business point of view : coupling a rule-based approach with Machine Learning algorithms Sotiria Bampatzani NLP Data Engineer - QWAM Content Intelligence Paris, France
  • 2. CONTENTS • Presentation • Named Entity Recognition (NER) • The rule-based approach • Adding Machine Learning to the mix • Use case example • Not stopping there… • Conclusion
  • 3. PRESENTATION QWAM Content Intelligence • QWAM Content Intelligence is a solutions editor, who provides innovative software solutions for analyzing textual data and extracting insights with its AI and Semantics technologies. Search engine to manage textual and press/media content SaaS solution for real-time web information monitoring Analytics platform for extracting key information from textual data 3
  • 4. NAMED ENTITY RECOGNITION 4 Brief introduction • 1987 : first studies on information extraction (IE) • 1991 : first study on Named Entity Recognition (NER) • 1995 : NER becomes one of the basic Natural Language Processing (NLP) tasks Named Entity Recognition Entity Identification Entity Classification How… • Rule-based approach : Annotation rules • Learning approach : word embeddings, statistical models, neural networks, etc. • Hybrid approach : combination of the rule-based and learning approaches
  • 5. THE RULE-BASED APPROACH 5 Advantages of this approach • Robust • Accurate results • Adaptable to new types of entities Drawbacks of this approach • Based on non-contextual grammars and lexicon (gazetteer) lists, whose maintenance and update is costly • Impossible to treat all spelling variants and the resulting ambiguity • Discovering new entities is very difficult, if not impossible
  • 6. ADDING ML TO THE MIX… 6 Hybrid approach • Creation of a dataset containing over 40M news articles with the use of one of QWAM’s solutions, Ask’n’Read • Annotation of the aforementioned dataset with the annotation rules developed by our Text Analytics team • Use of this annotated dataset in order to train ML models o RNN/LSTM, word embeddings (word2vec), BERT… However… Data preprocessing and filtering do not result in a 100% “clean” dataset. The training set also contains errors or missing annotations !
  • 7. DATASET EXAMPLE 7 source : https://www.phonandroid.com/samsung-annonce-arrivee-smartphones-ecran-enroulable-coulissant.html
  • 8. ADDING ML TO THE MIX… 8 Evaluation • The ML model correctly identified and classified new entities, that are added to our gazetteer lists. • Following statistical evaluation, it appeared that a number of errors resulted from specific annotation rules. These annotation rules were later improved. • The dataset is then reannotated with the enhanced annotation rules and the cycle starts anew… …And what of client data ?
  • 9. USE CASE - EXAMPLE 9 Client data • The need to identify new types of entities arises. Extracting key information, not limited to predefined categories (person names, locations, organizations, etc.) is crucial in order to thoroughly analyze the data. • The size, oftentimes sensitive nature of the dataset, as well as the time allocated to the project, may not allow for machine learning. QWAM’s solution… • Preprocessing and annotating of the data with the “standard” application. • Identification of a priori “interesting” entities in the data, thanks to an annotation rule used for “discovering” potentially interesting information. • Use of these annotations to build a dedicated ontology.
  • 10. USE CASE - EXAMPLE 10 QWAM Ontology Manager
  • 11. USE CASE - EXAMPLE 11
  • 12. USE CASE - EXAMPLE 12 How ML is a part of Ontology Manager • Suggestions of a machine learning algorithm are incorporated in the platform, and proposed to users in order to promote and facilitate ontology evolution
  • 13. NOT STOPPING THERE… 13 Establishing relations between entities • Once named entity recognition and concept recognition are in place, the next step is to establish a link between them. o “Atos finalise le rachat de la société canadienne In Fidem” Company-buys-Company o TESSI signe un partenariat stratégique avec NEHS DIGITAL” Company-partners with-Company • Like named entity and concept recognition, the same methods are implemented. o A “standard” gazetteer with these expressions already exists, which allows for an initial annotation and recognition. o Another annotation rules is used in order to discover new expressions and relations between different types of entities. o A ML model is trained for further exploration.
  • 14. USE CASE - EXAMPLE 14 • Unlike entities or concepts, suggestions are calculated and proposed based on the content of the whole category.
  • 15. CONCLUSION 15 Conclusion • A rule-based approach is not sufficient if we wish to discover new entities “on the fly”. • The size, oftentimes sensitive nature of a client’s dataset, as well as the time allocated to the project, may not allow for machine learning in a business setting. • A hybrid approach seems to be the most efficient method o Adaptable to different client’s data • When doing an NLP task such as named entity recognition, more often than not, errors are cumulated at every step. • The biggest advantage of the method we use at QWAM, is an NLP Engineer’s rapid and active involvement at every step of the way.