Based in San Francisco
Offices in Bangalore, Bangkok, New York City, Raleigh, Munich
Over 300 customers across the Fortune 1000
Fusion, a Solr-powered platform for search-driven apps
Consulting and support for organizations using Solr
Produces the world’s largest open source user conference dedicated to Lucene/Solr
Lucidworks is the primary sponsor of the Apache Solr project
Employs over 40% of the active committers on the Solr project
Contributes over 70% of Solr's open source codebase
If you can’t find it, then you can’t…
R&D
Online Retail
Customer Insights
Fraud & Compliance
Customer Service
What’s New in Fusion 3.1?
• Over 50 connectors to integrate all your data
• Robust parsing framework to seamlessly ingest all your document types
• Point-and-click indexing configuration and iterative simulation of results for full control over your ETL process
• Your security model enforced end-to-end from ingest to search across your different data sources
• Relevancy tuning: Point-and-click query pipeline configuration allows fine-grained control of results.
• Machine-driven relevancy: Signals aggregation learns and automatically tunes relevancy and drives recommendations out of the box.
• Powerful pipeline stages: Customize fields, stages, synonyms, boosts, facets, machine learning models, your own scripted behavior, and dozens of other powerful search stages.
• Turnkey search UI (Lucidworks AppStudio): Build a sophisticated end-to-end search application in just hours.
Operational Simplicity
Typical Apache Solr Deployment Architecture
Optional
Worker Worker Cluster Manager
Spark/Hadoop
Shards Shards
Solr
HDFS
Shared Config Management
Leader Election
Load Balancing
ZK 1
Zookeeper
ZK N
Nutch/Heritrix
Log Proc.
Mahout (Recommender)
ManifoldCF* (Connectors)
Security (Roll your own)
Roll your own
*only 12 connectors available, compared w/ 60+ in Fusion
SiLK
Scheduling (cron?)
Admin UI
Deployment (Roll your own)
Monitoring (Roll your own)
Relevance Tools (Roll your own)
Tika ships w/ Solr, but can’t be scaled independently
NLP tools
SECURITY BUILT-IN
Shards Shards
Apache Solr
Apache Zookeeper
ZK 1
Leader Election
Load Balancing
Shared Config Management
Worker Worker
Apache Spark
Cluster Manager
REST API
Admin UI
Lucidworks View
LOGS FILE WEB DATABASE CLOUD
HDFS (Optional)
Core Services
• • •
ETL and Query Pipelines
Recommenders/Signals
NLP
Machine Learning
Alerting and Messaging
Security
Connectors
Scheduling
Fusion Simplifies the Deployment
SECURITY BUILT-IN
REST API
Admin UI
Lucidworks View
LOGS FILE WEB DATABASE CLOUD
HDFS (Optional)
Fusion abstracts the open source components so you don't have to worry about them.
Core Services
• • •
ETL and Query Pipelines
Recommenders/Signals
NLP
Machine Learning
Alerting and Messaging
Security
Connectors
Scheduling
Fusion Abstracts Open Source Bits
Core Services
• • •
NLP
Recommenders / Signals
Blob Storage
Pipelines
Scheduling
Alerting / Messaging
Connectors
REST API
Admin UI
Lucidworks View
LOGS FILE WEB DATABASE CLOUD
• Seamless integration of your entire search & analytics platform
• All capabilities exposed through secured APIs, so you can use our UI or build your own.
• End-to-end security policies can be applied out of the box to every aspect of your search ecosystem.
• Distributed, fault-tolerant scaling and supervision of your entire search application
So…what’s new already??
Component version upgrades
• Solr 6.5.1
• Zookeeper 3.4.6
• Spark 2.1.1
Connectors à la carte
Object Explorer
Object Explorer
Query Explorer Jobs
• Collection Analysis
• Levenshtein Spell Checking
• Statistically Interesting Phrases
• http://lucidworks.com/2017/06/21/query-explorer-jobs-in-fusion-3-1/
Query Explorer
Jobs
Blob Manager
SharePoint Online Connected!!
New APIs
• Links API - explore the links between Fusion objects.
• Jobs API - replaces the Scheduler API
• Tasks API - Jobs schedule Tasks; Tasks are the units of work that get run (e.g., make this REST call)
• Groups API - Tag groups of Fusion Objects with identifiers
Improved APIs
• Jobs and Schedules are now separate (a general change, but one that also shows up in the APIs)
• Spark Jobs API
• Objects API
Distributed Indexing
Index pipelines can now be invoked on a different Fusion instance than the one on which the connector is running.
Connector Enhancements
• JIRA Connector - security trimming, parsers and performance, settings for field, timeout and retry
• Web Connector - crawl JavaScript-powered sites, authentication and timeout settings
• Jive Connector - lists and maps
• Box.com Connector - incremental crawling improvements, more depth and exclusion settings
• Google Drive Connector - batch incremental crawling, security trimming, timeouts, index trash
New Parsers
• XML Parser - parse XML separately into new documents
• HTML Parser - not Tika-based; uses jsoup selectors to extract HTML or CSS into fields/documents
Query Pipeline
• Recommendation -> Boost With Signals
• User Recommendation Boosting -> Recommend Items for User
• Recommend Similar -> Recommend More Like This
• SSL to Solr, with Kerberos or Basic Auth
Enabling Cognitive Search with Fusion 3.1 Machine Learning
Machine Learning Functionalities with Fusion 3.1
Fusion machine learning capabilities include:
• Recommendation and Personalization.
• Query intent and document classification.
• Learning to rank.
• Automatic doc clustering, anomaly detection, cluster labeling and topic detection.
• NLP: synonym discovery and expansion via Word2Vec (w2v); phrase detection using log likelihood.
• Clickstream analysis and automatic relevance tuning.
• Multi-armed bandit experimentation.
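To make the bandit experimentation item concrete, here is a minimal epsilon-greedy sketch. The slides do not say which bandit strategy Fusion uses, so the strategy, the class name `EpsilonGreedy`, and the arm names are illustrative assumptions only.

```python
import random


class EpsilonGreedy:
    """Hypothetical epsilon-greedy bandit over named arms (e.g. query pipeline variants)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.pulls = {arm: 0 for arm in arms}      # times each arm was served
        self.rewards = {arm: 0.0 for arm in arms}  # total reward (e.g. clicks) per arm
        self.rng = random.Random(seed)

    def mean(self, arm):
        """Average observed reward for an arm; 0.0 if it has never been served."""
        return self.rewards[arm] / self.pulls[arm] if self.pulls[arm] else 0.0

    def choose(self):
        """Explore a random arm with probability epsilon, otherwise exploit the best arm."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))
        return max(self.pulls, key=self.mean)

    def update(self, arm, reward):
        """Record the reward observed after serving an arm."""
        self.pulls[arm] += 1
        self.rewards[arm] += reward


# Assumed example: two pipeline variants; reward 1.0 means the user clicked a result.
bandit = EpsilonGreedy(["pipeline_a", "pipeline_b"], epsilon=0.1, seed=42)
bandit.update("pipeline_a", 0.0)
bandit.update("pipeline_b", 1.0)
print(bandit.choose())
```

Unlike a fixed A/B split, this shifts most traffic toward the better-performing variant while still sampling the others occasionally.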
Recommendation and Personalization:
Out-of-the-box, multi-modal recommenders based on collaborative filtering and content.
Use cases:
• Recommend items for user: select items that the user is likely to buy based on past click/purchase behavior.
• Recommend items for query: boost items based on query history.
• Item-to-item similarity: users who were interested in items like X were also interested in items like Y and Z.
• Query-to-query similarity: suggest similar queries to provide query expansion.
Built-in pipeline, Spark job, job schedules, and visualizations are all provided.
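The item-to-item and item-for-user use cases can be sketched with a small co-occurrence recommender. This is not Fusion's Spark job; the `(user, item)` signal format, the cosine scoring, and the function names are assumptions chosen to illustrate collaborative filtering.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarity(signals):
    """Cosine similarity between items, computed from (user, item) interaction pairs."""
    users_by_item = defaultdict(set)
    for user, item in signals:
        users_by_item[item].add(user)
    sims = defaultdict(dict)
    for a, b in combinations(sorted(users_by_item), 2):
        shared = len(users_by_item[a] & users_by_item[b])
        if shared:
            score = shared / math.sqrt(len(users_by_item[a]) * len(users_by_item[b]))
            sims[a][b] = score
            sims[b][a] = score
    return sims


def recommend_for_user(user, signals, sims, top_n=3):
    """Rank unseen items by summed similarity to the items the user already touched."""
    seen = {item for u, item in signals if u == user}
    scores = defaultdict(float)
    for item in seen:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical click/purchase signals.
signals = [("u1", "X"), ("u1", "Y"), ("u2", "X"), ("u2", "Y"),
           ("u2", "Z"), ("u3", "X"), ("u3", "Z"), ("u4", "Y")]
sims = item_similarity(signals)
print(recommend_for_user("u4", signals, sims))
```

The same structure gives query-to-query similarity if the pairs are `(session, query)` instead of `(user, item)`.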
Query relevancy and intent:
Query intent classification using Random Forest or Logistic Regression:
Use Cases:
• Predict product category of an incoming query to help reduce ambiguity.
• Predict the category of a set of new products to be incorporated into the catalog.
Learning to Rank:
• Use internal and external features of content and query to influence ranking.
• Leverage Solr’s re-ranking (&rq) capability.
• Support for: linear and non-linear models (libSVM, liblinear, RankLib)
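As a rough illustration of the re-ranking hook, the sketch below builds a Solr select URL whose `rq` parameter uses the `{!ltr ...}` query parser from Solr's Learning to Rank module. It assumes a model has already been uploaded to Solr's model store; the host, the collection name `products`, and the model name `myModel` are hypothetical, and the slides do not state that Fusion issues exactly this request.

```python
from urllib.parse import urlencode, parse_qs


def build_rerank_url(base_url, collection, user_query, model, rerank_docs=100):
    """Build a Solr /select URL that re-ranks the top documents with an LTR model."""
    params = {
        "q": user_query,
        # rq re-scores only the top `rerank_docs` hits of the main query.
        "rq": f"{{!ltr model={model} reRankDocs={rerank_docs}}}",
        "fl": "id,score",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and model name.
url = build_rerank_url("http://localhost:8983/solr", "products", "ipad case", "myModel")
print(url)
print(parse_qs(url.split("?", 1)[1])["rq"][0])
```

Re-ranking only the top N documents keeps the cost of a more expensive model bounded, because the main query still does the initial retrieval.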
Automatic Doc Clustering and Anomaly Detection:
Features:
• Automatic outlier detection.
• Automatic selection of the number of clusters K (in-house algorithm).
• Automatic cluster topic detection (in-house algorithm).
• Ranking of documents based on relevance to the topic.
• Minimal data science knowledge needed to use the module: extensive research went into choosing the best set of models; good default parameter settings that generalize well; flexible pipeline.
Use Cases:
• Financial: group emails, news, research articles.
• Ecommerce: group product reviews, product descriptions.
NLP using Machine Learning:
Automatic phrase detection using log likelihood:
• Spark job to generate a table of phrases (frequently co-occurring terms) such as “area rug”, “interior paint” and “ipad case”.
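A small single-machine sketch of log-likelihood phrase scoring is shown here, using Dunning's log-likelihood ratio on adjacent word pairs. It only illustrates the statistic; the actual Spark job, its tokenization, and its thresholds are not described in the slides, and the sample queries are made up.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table."""
    total = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    cells = ((k11, row1, col1), (k12, row1, col2),
             (k21, row2, col1), (k22, row2, col2))
    score = 0.0
    for observed, row, col in cells:
        if observed > 0:  # empty cells contribute nothing
            score += observed * math.log(observed * total / (row * col))
    return 2.0 * score


def score_phrases(queries):
    """Score every adjacent word pair in the queries; higher means more phrase-like."""
    bigrams, firsts, seconds = Counter(), Counter(), Counter()
    for query in queries:
        tokens = query.lower().split()
        for w1, w2 in zip(tokens, tokens[1:]):
            bigrams[(w1, w2)] += 1
            firsts[w1] += 1
            seconds[w2] += 1
    total = sum(bigrams.values())
    scores = {}
    for (w1, w2), k11 in bigrams.items():
        k12 = firsts[w1] - k11          # w1 followed by something other than w2
        k21 = seconds[w2] - k11         # w2 preceded by something other than w1
        k22 = total - k11 - k12 - k21   # pairs involving neither position
        scores[f"{w1} {w2}"] = llr(k11, k12, k21, k22)
    return scores


# Hypothetical query log.
queries = ["area rug", "blue area rug", "area rug pad", "interior paint",
           "white interior paint", "ipad case", "blue ipad case", "blue paint"]
for phrase, score in sorted(score_phrases(queries).items(), key=lambda kv: -kv[1])[:3]:
    print(phrase, round(score, 2))
```

Pairs that co-occur far more often than their individual frequencies predict get high scores, which is what separates “area rug” from an incidental pair like “blue area”.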
Spell checking using edit distance:
• Match a tail query to a head query or product name based on edit distance to correct typos, e.g., “wireless motam” -> “wireless modem”.
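The tail-to-head matching idea can be sketched with a plain Levenshtein distance. The head query list, the `max_distance` threshold, and the function names are illustrative assumptions, not Fusion's job configuration.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct_tail_query(tail, head_queries, max_distance=2):
    """Return the closest head query within max_distance edits, or None."""
    best = min(head_queries, key=lambda head: levenshtein(tail, head))
    return best if levenshtein(tail, best) <= max_distance else None


# Hypothetical list of frequent (head) queries.
head_queries = ["wireless modem", "wireless mouse", "wireless router"]
print(correct_tail_query("wireless motam", head_queries))
```

The threshold matters: too small and real typos are missed, too large and distinct queries get rewritten into each other.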
Synonym detection/query expansion using Word2Vec:
• e.g., “refrigerator” -> “freezer”, “phone” -> “cellular”.
Thank You

Webinar: Fusion 3.1 - What's New
