Cognate or False Friend? Ask the Web!
A Workshop on Acquisition and Management of Multilingual Lexicons
Introduction
Cognates and False Friends
The Paper in One Slide
Contextual Web Similarity
Sample text:
Same day delivery of fresh flowers, roses, and unique gift baskets from our online boutique. Flower delivery online by local florists for birthday flowers.
Sample search results:
  • Flowers, plants, roses, & gifts. Flowers delivery with fewer ... Flowers, roses, plants and gift delivery. Order flowers from ProFlowers once, and you will never use flowers delivery from florists again.
  • Margarita Flowers - Delivers in Bulgaria for you! - gifts, flowers, roses ... Wide selection of BOUQUETS, FLORAL ARRANGEMENTS, CHRISTMAS DECORATIONS, PLANTS, CAKES and GIFTS appropriate for various occasions. CREDIT cards acceptable.
  • Flowers, Plants, Gift Baskets - 1-800-FLOWERS.COM - Your Florist ... Flowers, balloons, plants, gift baskets, gourmet food, and teddy bears presented by 1-800-FLOWERS.COM, Your Florist of Choice for over 30 years.
Context word counts:
word: flower
word | count
fresh | 217
order | 204
rose | 183
delivery | 165
gift | 124
welcome | 98
red | 87
... | ...
word: computer
word | count
Internet | 291
PC | 286
technology | 252
order | 185
new | 174
Web | 159
site | 146
... | ...
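Counts like those listed for "flower" and "computer" can be gathered by tallying the words that appear near the target word in each snippet. The following is only a minimal sketch under stated assumptions: the function name `context_counts`, the window of 3 tokens, the explicit set of word forms (a stand-in for lemmatization), and the tiny stop-word list are all hypothetical and are not taken from the slides.

```python
import re
from collections import Counter


def context_counts(snippets, forms, window=3, stop_words=frozenset()):
    """Count words found within `window` tokens of any form of the target word.

    `forms` is a set of surface forms treated as the target word
    (an assumed stand-in for real lemmatization).
    """
    counts = Counter()
    for snippet in snippets:
        tokens = re.findall(r"\w+", snippet.lower())
        for i, token in enumerate(tokens):
            if token not in forms:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                neighbour = tokens[j]
                # Skip the target itself and any stop words.
                if neighbour in forms or neighbour in stop_words:
                    continue
                counts[neighbour] += 1
    return counts


snippet = (
    "Same day delivery of fresh flowers, roses, and unique gift baskets "
    "from our online boutique. Flower delivery online by local florists "
    "for birthday flowers."
)
counts = context_counts(
    [snippet],
    forms={"flower", "flowers"},
    window=3,
    stop_words=frozenset({"of", "and", "by", "for", "our", "from"}),
)
print(counts.most_common(3))
```

Passing an empty `stop_words` set keeps the stop words, and `window` controls the context size.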
Contextual Web Similarity
Frequency vectors (v1: flower, v2: computer):
# | word | freq. in v1 (flower) | freq. in v2 (computer)
0 | alias | 3 | 7
1 | alligator | 2 | 0
2 | amateur | 0 | 8
3 | apple | 5 | 133
... | ... | ... | ...
4999 | zap | 0 | 3
5000 | zoo | 6 | 0
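Two frequency vectors of this kind can be compared with a vector similarity measure. The sketch below uses cosine similarity, which is an assumption here: the surviving slide text does not say which measure is applied. The toy vectors contain only the six entries visible on the slide, and the name `cosine_similarity` is illustrative.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse word-frequency vectors (dicts)."""
    dot = sum(freq * v2.get(word, 0) for word, freq in v1.items())
    norm1 = math.sqrt(sum(freq * freq for freq in v1.values()))
    norm2 = math.sqrt(sum(freq * freq for freq in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty context carries no evidence of similarity
    return dot / (norm1 * norm2)


# Only the entries visible on the slide; the real vectors have 5001 dimensions.
flower = {"alias": 3, "alligator": 2, "amateur": 0, "apple": 5, "zap": 0, "zoo": 6}
computer = {"alias": 7, "alligator": 0, "amateur": 8, "apple": 133, "zap": 3, "zoo": 0}

similarity = cosine_similarity(flower, computer)
print(round(similarity, 4))
```

Storing the vectors as dictionaries keeps them sparse, which helps when most of the vocabulary never co-occurs with a given word.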
Cross-Lingual Similarity
Reverse Context Lookup
Web Similarity Using Seed Words
* P. Fung and L. Y. Yee. An IR approach for translating from nonparallel, comparable texts. In Proceedings of ACL, volume 1, pages 414–420, 1998.
Evaluation Data Set
Experiments
Resources
Evaluation
Results (11pt Average Precision): comparing the BASELINE, LCSR, MEDR, SEED and WEB3 algorithms
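LCSR and MEDR are not spelled out in the surviving slide text. The sketch below assumes the usual orthographic measures behind these acronyms: longest common subsequence ratio (LCS length divided by the longer word's length) and minimum edit distance ratio (one minus the Levenshtein distance divided by the longer word's length). The function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def edit_distance(a, b):
    """Levenshtein distance with unit costs for insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        cur = [i]
        for j, ch_b in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ch_a != ch_b)))  # substitution
        prev = cur
    return prev[-1]


def lcsr(a, b):
    """Longest common subsequence ratio (assumed definition)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def medr(a, b):
    """Minimum edit distance ratio (assumed definition)."""
    longest = max(len(a), len(b))
    return 1 - edit_distance(a, b) / longest if longest else 1.0


# A Bulgarian/Russian pair that differs by one inserted letter.
print(lcsr("сребро", "серебро"), medr("сребро", "серебро"))
```

Both measures look only at spelling, so they cannot on their own separate cognates from false friends that are written identically.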
Results (11pt Average Precision): comparing different context sizes, keeping the stop words
Results (11pt Average Precision): comparing different improvements of the WEB3 algorithm
Results (Precision-Recall Graph): comparing the precision-recall graphs of the evaluated algorithms
Results: The Ordering for WEB3
r | Candidate | BG Sense | RU Sense | Sim. | Cogn.? | P@r | R@r
1 | муфта | gratis | muff | 0.0085 | no | 100.00% | 1.00%
2 | багрене / багренье | mottle | gaff | 0.0130 | no | 100.00% | 2.00%
3 | добитък / добыток | livestock | income | 0.0143 | no | 100.00% | 3.00%
4 | мраз / мразь | chill | crud | 0.0175 | no | 100.00% | 4.00%
5 | плет / плеть | hedge | whip | 0.0182 | no | 100.00% | 5.00%
… | … | … | … | … | … | … | …
99 | вулкан | volcano | volcano | 0.2099 | yes | 81.82% | 81.00%
100 | година | year | time | 0.2101 | no | 82.00% | 82.00%
101 | бут | leg | rubble | 0.2130 | no | 82.18% | 83.00%
… | … | … | … | … | … | … | …
196 | финанси / финансы | finance | finance | 0.8017 | yes | 51.28% | 100.00%
197 | сребро / серебро | silver | silver | 0.8916 | yes | 50.76% | 100.00%
198 | наука | science | science | 0.9028 | yes | 50.51% | 100.00%
199 | флора | flora | flora | 0.9171 | yes | 50.25% | 100.00%
200 | красота | beauty | beauty | 0.9684 | yes | 50.00% | 100.00%
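The precision and recall columns of the WEB3 ordering, and the 11pt average precision reported on the results slides, can be computed from a ranked list of labels. This is a minimal sketch assuming the standard interpolated definition of 11-point average precision, and assuming that the positive class is "false friend" (Cogn.? = no) with pairs ranked by ascending similarity, as the listed rows suggest. The function names are hypothetical.

```python
def precision_recall_at_ranks(is_positive, total_positives=None):
    """Return a (precision, recall) pair for every rank r = 1..n.

    `is_positive[r - 1]` is True when the item at rank r belongs to the
    positive class (assumed here: a false friend).
    """
    if total_positives is None:
        total_positives = sum(is_positive)
    hits = 0
    points = []
    for r, positive in enumerate(is_positive, start=1):
        hits += positive
        points.append((hits / r, hits / total_positives))
    return points


def eleven_point_average_precision(points):
    """Mean of interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= level.
        reachable = [p for p, rec in points if rec >= level]
        total += max(reachable, default=0.0)
    return total / 11


# The first five ranked pairs are all false friends, out of 100 in the data set.
top5 = precision_recall_at_ranks([True] * 5, total_positives=100)
print(top5[-1])

# A small toy ranking to exercise the averaging step.
toy = precision_recall_at_ranks([True, True, False, True, False])
print(eleven_point_average_precision(toy))
```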
Discussion
Conclusion and Future Work
Questions? Cognate or False Friend? Ask the Web!

Svetlin Nakov - Cognate or False Friend? Ask the Web!
