AI-Powered Linguistics & Search
ROSETTE FOR FUSION
2
Today’s Speakers
Radu Miclaus
Director of Product, AI and Cloud
Lucidworks
Robert Lucarini
Senior Software Engineer
Lucidworks
Nick Belanger
Solutions Engineer
Basis Technology
3
Agenda
Challenges with Languages in Search Applications
How Fusion uses Rosette to address these Challenges
Deeper dive into Entities Customization
4
Personalization through Search
experience
Documents, Search, Curation, Personalization
Text Interpretation
Data Enrichment
Relevancy Tuning
Exactly what I am searching for
Guide me to other interesting things
Recommendations
5
Challenges with Languages in Search Applications
• LANGUAGE IDENTIFICATION
• CHARACTER NORMALIZATION
• GREATER RECALL WITHOUT LOSING PRECISION
• METADATA EXTRACTION/ENTITIES/FACETS/FILTERS
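To make the character normalization challenge concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `normalize_for_search` is an assumption for illustration. It is not how Fusion or Rosette normalize text, only a demonstration of why visually identical strings can fail to match without normalization.

```python
import unicodedata

def normalize_for_search(text: str) -> str:
    """Fold Unicode compatibility forms (NFKC) and case so that
    equivalent strings compare equal at index and query time."""
    return unicodedata.normalize("NFKC", text).casefold()

# Full-width Latin letters, common in CJK text, fold to plain ASCII.
print(normalize_for_search("Ｆｕｓｉｏｎ") == normalize_for_search("fusion"))

# "é" as a single code point vs. "e" followed by a combining acute accent.
print(normalize_for_search("caf\u00e9") == normalize_for_search("cafe\u0301"))
```

Applying the same function to both documents and queries is what makes the match work; normalizing only one side would not help.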
6
Fusion + Rosette
Best-in-Class Search using Best-in-Class Linguistics
7
Boosting Global Search Quality with Rosette
Essential Elements of Multilingual Search
8
Lemmatization
What is it?
Associates the inflected forms of a word
with its dictionary form (child/children;
beau/belle/beaux/belles). This is an
alternative to stemming, which strips
endings and so can associate unrelated
words that merely look alike
(arsen|ic -- arsen|al).
Why it matters
Important for European languages,
where adjective agreement of
gender/number and verb conjugation
create multiple word forms.
Associating the forms of a single word
increases search recall.
Impact on search
Increases recall of relevant results,
especially for European languages.
French examples:
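The contrast between lemmatization and stemming can be sketched in a few lines. This is a toy illustration only: `LEMMAS` is a hypothetical hand-written lexicon and `naive_stem` is a deliberately crude suffix stripper. Neither reflects how Rosette performs lemmatization.

```python
# Hypothetical mini-lexicon mapping inflected forms to a lemma.
LEMMAS = {
    "child": "child", "children": "child",
    "beau": "beau", "belle": "beau", "beaux": "beau", "belles": "beau",
}

def lemmatize(word: str) -> str:
    """Look the word up in the lexicon; unknown words are returned unchanged."""
    w = word.lower()
    return LEMMAS.get(w, w)

def naive_stem(word: str, suffixes=("ic", "al", "s", "x")) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    w = word.lower()
    for suffix in suffixes:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

# Stemming conflates two unrelated words that merely look alike.
print(naive_stem("arsenic"), naive_stem("arsenal"))

# Stemming fails to connect "belles" with "beau"; the lemma lookup succeeds.
print(naive_stem("belles"), naive_stem("beau"))
print(lemmatize("belles"), lemmatize("beau"))
```

The stemmer hurts precision (arsenic matches arsenal) and recall (belles does not match beau), while the lexicon-based lookup avoids both problems for the words it knows.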
9
Tokenization
What is it?
Divides sentences into words
for languages written without
spaces between words.
Why it matters
The common alternative, the bigram
method, ignores meaning and
essentially does substring matching
of one or two characters. Chinese is
highly ambiguous: any one
character could be a single
word, but often isn’t.
Impact on search
Greater precision of Chinese,
Japanese, Korean searches.
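The difference between bigram indexing and word-level tokenization can be sketched as follows. Forward maximum matching against a tiny hypothetical dictionary stands in here for a real segmenter; Rosette's tokenizer is not described in the slides, so this is an illustration of the idea and not its algorithm.

```python
def bigrams(text: str) -> list[str]:
    """Overlapping two-character substrings, ignoring word boundaries."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def segment(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

DICTIONARY = {"我", "喜欢", "北京"}  # hypothetical: "I", "like", "Beijing"
text = "我喜欢北京"                  # "I like Beijing"

print(bigrams(text))              # includes non-words such as "我喜" and "欢北"
print(segment(text, DICTIONARY))  # word-level tokens
```

The bigram index contains fragments that are not words, so a query can match documents by accident. Word-level tokens remove those spurious matches, which is where the precision gain comes from.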
10
Chinese Script Conversion
What is it?
Converts records or queries
between simplified and
traditional Chinese.
Why it matters
Without conversion, searching all
Chinese documents requires
two queries: one in traditional
and one in simplified
Chinese.
Impact on search
With one query, you can search
both simplified and traditional
Chinese documents
simultaneously and see results
in your preferred script.
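A minimal sketch of the idea: convert both documents and queries to one script before matching. The four-character table `T2S` is a hypothetical sample. Real traditional/simplified conversion is not a simple one-to-one character mapping, so this only illustrates the search behaviour, not a production converter.

```python
# Hypothetical sample of traditional -> simplified character mappings.
T2S = str.maketrans({"漢": "汉", "語": "语", "電": "电", "腦": "脑"})

def to_simplified(text: str) -> str:
    """Map known traditional characters to their simplified forms."""
    return text.translate(T2S)

DOCS = {1: "漢語", 2: "汉语", 3: "電腦"}  # mixed-script toy collection

def search(query: str, docs: dict[int, str]) -> list[int]:
    """Match after converting both sides to the same script."""
    q = to_simplified(query)
    return [doc_id for doc_id, text in docs.items() if q in to_simplified(text)]

print(search("漢語", DOCS))  # traditional query finds both scripts
print(search("电脑", DOCS))  # simplified query finds the traditional document
```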
11
Decompounding
What is it?
Splits compound nouns.
Why it matters
A search for a compound word
like Jugendarbeitslosigkeit
(German: “youth unemployment”)
misses results where the two
concepts (“youth” and
“unemployment”) are separated
(“20% more youth were
unemployed this month.”).
Impact on search
Greater recall of German, Dutch,
Korean searches.
German examples:
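As a rough illustration of decompounding, the sketch below splits a compound into known parts using a small hypothetical lexicon and indexes the parts alongside the full word. Real decompounders also handle linking elements and ambiguous splits; this toy version does neither and is not Rosette's method.

```python
def split_compound(word: str, lexicon: set[str], min_len: int = 3) -> list[str]:
    """Return the lexicon words that make up `word`, or [] if no full
    segmentation exists. Longer heads are tried first."""
    word = word.lower()
    if word in lexicon:
        return [word]
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = split_compound(tail, lexicon, min_len)
            if rest:
                return [head] + rest
    return []

def index_terms(word: str, lexicon: set[str]) -> set[str]:
    """Index the full compound plus any parts that were found."""
    return {word.lower(), *split_compound(word, lexicon)}

LEXICON = {"jugend", "arbeitslosigkeit"}  # hypothetical mini-lexicon

print(split_compound("Jugendarbeitslosigkeit", LEXICON))
print(index_terms("Jugendarbeitslosigkeit", LEXICON))
```

Indexing the parts as well as the compound lets a query for either component find the document, which is the recall gain described above.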
12
Named Entity Recognition (NER)
What is it?
Adds structure to your
unstructured, multilingual text by
automatically identifying people,
organizations, locations,
dates, products, and much more.
Why it matters
Filter results for the ones
containing the entities most
pertinent to your search.
Impact on search
More quickly refine your search,
remove noise, and increase
search relevance.
13
How Does Fusion Use Rosette?
14
Rosette Enhancing Fusion
SOLR and Fusion
- SOLR support for multilingual tokenization
- 35 languages supported
- 7 entities supported with OpenNLP integration
SOLR/Fusion/Rosette
Base Linguistics:
- 32 supported languages
- Sentence tagging
- Tokenization
- Lemmatization
- Part-of-speech tagging
- Decompounding
- Chinese/Japanese readings
Rosette Entity Extractor:
- 21 supported languages
- 29 entity types and 450+ sub-types detected
15
Rosette is enhancing Fusion’s capabilities to enrich data for search and
personalization. Besides language interpretation, robust Entity Extraction can
enhance Search through the usage of Facets.
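How extracted entities turn into facets can be sketched with plain Python. The documents, entity type names, and values below are invented sample data, and `facet_counts` and `filter_by` are hypothetical helpers. They do not represent the Fusion or Solr faceting API, only the underlying idea.

```python
from collections import Counter

# Hypothetical documents with entities already attached by an extractor.
DOCS = [
    {"id": 1, "entities": {"ORGANIZATION": ["Lucidworks"], "LOCATION": ["Boston"]}},
    {"id": 2, "entities": {"ORGANIZATION": ["Lucidworks", "Basis Technology"]}},
    {"id": 3, "entities": {"ORGANIZATION": ["Basis Technology"], "LOCATION": ["Boston"]}},
    {"id": 4, "entities": {}},
]

def facet_counts(docs, entity_type):
    """Count how many documents mention each entity of the given type."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc["entities"].get(entity_type, [])))
    return counts

def filter_by(docs, entity_type, value):
    """Return ids of documents that mention the selected facet value."""
    return [d["id"] for d in docs if value in d["entities"].get(entity_type, [])]

print(facet_counts(DOCS, "ORGANIZATION"))
print(filter_by(DOCS, "LOCATION", "Boston"))
```

The counts drive the facet sidebar, and selecting a value narrows the result set to the matching documents.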
16
Fusion Entities Demo
17
Entity Extraction Workflow
REX engine for Entity Extraction and Fusion Pipelines
18
Fusion 5 Sample Architecture
19
Deeper Dive
Entities Customization
BASIS TECHNOLOGY
The Rosette Entity Extraction Workflow.
20
The Rosette Entity Extractor:
● comes with expertly crafted models.
● can extract 18 different kinds of entities in more than 20 different languages.
● is made with high-quality data.
● is curated by our dedicated data team.
● is backed by 25 years of NLP expertise.
The Rosette Entity Extraction Workflow.
21
● Machine or deep learned statistical models that identify entities based on context
● A high performance gazetteer that is dynamically updatable
● Rules based extraction based on REGEX style patterns
Configuration and Customization.
22
Configuration:
● Quick and easy
● Leverages pre-defined capabilities
● Primarily file manipulation
Customization:
● Drastically change REx capabilities
● Allows for truly custom approaches
● More time-intensive
Configuration: Gazetteer and Regex.
23
Gazetteer
● Easy to create/modify/maintain
● Create lists of entities to extract
● Great when set is limited/defined
● Accept and reject
Regex
● Match any pattern, simple or complex
● Extract all entities following a pattern
● Requires technical resources
● Accept and reject
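The interplay of gazetteers, regex patterns, and accept/reject lists can be sketched as below. Everything here is hypothetical: the entity types, the `TICKET_ID` pattern, and the data structures are invented for illustration and do not reflect REx configuration file formats.

```python
import re

# Hypothetical accept gazetteer: exact terms mapped to an entity type.
ACCEPT_TERMS = {"Fusion": "PRODUCT", "Rosette": "PRODUCT"}

# Hypothetical accept regex: anything shaped like a ticket identifier.
ACCEPT_PATTERNS = {"TICKET_ID": re.compile(r"\b[A-Z]{2,5}-\d+\b")}

# Hypothetical reject list: matches to suppress even though a rule fired.
REJECT = {("TICKET_ID", "UTF-8")}

def extract(text: str) -> list[tuple[str, str]]:
    """Collect gazetteer and regex mentions, drop rejected ones,
    and return (type, text) pairs in document order."""
    mentions = []
    for term, etype in ACCEPT_TERMS.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            mentions.append((m.start(), etype, m.group()))
    for etype, pattern in ACCEPT_PATTERNS.items():
        for m in pattern.finditer(text):
            mentions.append((m.start(), etype, m.group()))
    kept = [m for m in mentions if (m[1], m[2]) not in REJECT]
    return [(etype, value) for _, etype, value in sorted(kept)]

sample = "Fusion ticket FUS-42 covers UTF-8 handling in Rosette."
print(extract(sample))
```

The gazetteer suits a limited, known set of names, the regex catches an open-ended family of strings, and the reject entry removes a known false positive without weakening the pattern.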
Customization: Model Training and Custom Processors.
24
Model Training
● Customize the ML models directly
● Train on your genre of text
● Teach it to recognize new entities
● Requires training process
Custom Processors
● Execute custom code in a sandbox
● Validation, redaction, transformation
● Create more complex extraction rules
● Accept and reject
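A custom processor can be thought of as a function that runs over extracted mentions after the main extraction step. The sketch below shows validation and redaction on a hypothetical `CREDIT_CARD` mention, using the standard Luhn checksum. The mention format and function names are assumptions; this is not the REx custom processor interface and it does not model the sandbox.

```python
def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum over the digits in `number`."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def validate(mentions):
    """Validation: keep card mentions only if the checksum passes."""
    return [m for m in mentions
            if m["type"] != "CREDIT_CARD" or luhn_valid(m["text"])]

def redact(text, mentions, types=("CREDIT_CARD",)):
    """Redaction: replace selected mentions, working right to left
    so earlier offsets stay valid."""
    for m in sorted(mentions, key=lambda m: m["start"], reverse=True):
        if m["type"] in types:
            text = text[:m["start"]] + "[REDACTED]" + text[m["end"]:]
    return text

text = "Card 4111 1111 1111 1111 was charged."
card = "4111 1111 1111 1111"  # widely used test number
start = text.index(card)
mentions = [{"type": "CREDIT_CARD", "text": card,
             "start": start, "end": start + len(card)}]

print(redact(text, validate(mentions)))
```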
25
Take Away
● Text interpretation and enrichment are crucial to personalization.
● Robust language and entity support technology is essential for text interpretation and enrichment.
● The Fusion and Rosette technology stacks are now integrated to provide the best of AI-Powered Search and AI-Powered Linguistics.
● Visit the BasisTech Booth at Activate.
26
Questions & Answers
27

AI-Powered Linguistics and Search with Fusion and Rosette
