Webinar: Fusion 3.1 - What's New

• Based in San Francisco
• Offices in Bangalore, Bangkok, New York City, Raleigh, Munich
• Over 300 customers across the Fortune 1000
• Fusion, a Solr-powered platform for search-driven apps
• Consulting and support for organizations using Solr
• Produces the world’s largest open source user conference dedicated to Lucene/Solr
• Lucidworks is the primary sponsor of the Apache Solr project
• Employs over 40% of the active committers on the Solr project
• Contributes over 70% of Solr's open source codebase

If you can’t find it, then you can’t…
• R&D
• Online Retail
• Customer Insights
• Fraud & Compliance
• Customer Service

• Over 50 connectors to integrate all your data
• Robust parsing framework to seamlessly ingest all your document types
• Point-and-click indexing configuration and iterative simulation of results for full control over your ETL process
• Your security model enforced end-to-end, from ingest to search, across your different datasources

• Relevancy tuning: Point-and-click query pipeline configuration allows fine-grained control of results.
• Machine-driven relevancy: Signals aggregation learns from user behavior, automatically tunes relevancy, and drives recommendations out of the box.
• Powerful pipeline stages: Customize fields, stages, synonyms, boosts, facets, machine learning models, your own scripted behavior, and dozens of other powerful search stages.
• Turnkey search UI (Lucidworks AppStudio): Build a sophisticated end-to-end search application in just hours.

Typical Apache Solr Deployment Architecture
[Diagram: Solr shards, optionally on HDFS, coordinated by ZooKeeper nodes ZK 1 … ZK N (shared config management, leader election, load balancing), with an optional Spark/Hadoop cluster (cluster manager and workers). Several surrounding pieces are marked "roll your own".]
• Nutch/Heritrix
• Log processing
• Mahout (recommender)
• ManifoldCF (connectors): only 12 connectors available, compared with 60+ in Fusion
• Security (roll your own)
• SiLK
• Scheduling (cron?)
• Admin UI
• Deployment (roll your own)
• Monitoring (roll your own)
• Relevance tools (roll your own)
• NLP tools
• Tika ships with Solr, but can’t be scaled independently

Fusion Simplifies the Deployment
[Diagram: Fusion architecture, with security built in]
• Interfaces: REST API, Admin UI, Lucidworks View
• Core services: ETL and query pipelines, recommenders/signals, NLP, machine learning, alerting and messaging, security, connectors, scheduling, …
• Data sources: logs, file, web, database, cloud
• Apache Solr: shards
• Apache ZooKeeper (ZK 1 …): leader election, load balancing, shared config management
• Apache Spark: cluster manager, workers
• HDFS (optional)

Fusion Abstracts Open Source Bits
[Diagram: the same Fusion architecture, with security built in and the open source layer hidden behind Fusion]
Fusion abstracts the open source components so you don't have to worry about them.
• Interfaces: REST API, Admin UI, Lucidworks View
• Core services: ETL and query pipelines, recommenders/signals, NLP, machine learning, alerting and messaging, security, connectors, scheduling, …
• HDFS (optional)

[Diagram: Fusion core services behind the REST API, Admin UI, and Lucidworks View]
• Core services: NLP, recommenders/signals, blob storage, pipelines, scheduling, alerting/messaging, connectors, …
• Seamless integration of your entire search & analytics platform
• All capabilities exposed through secured APIs, so you can use our UI or build your own.
• End-to-end security policies can be applied out of the box to every aspect of your search ecosystem.
• Distributed, fault-tolerant scaling and supervision of your entire search application

Component version upgrades
• Solr 6.5.1
• ZooKeeper 3.4.6
• Spark 2.1.1

Query Explorer Jobs
• Collection Analysis
• Levenshtein Spell Checking
• Statistically Interesting Phrases
• http://lucidworks.com/2017/06/21/query-explorer-jobs-in-fusion-3-1/

New APIs
• Links API - explore the links between Fusion objects.
• Jobs API - replaces the Scheduler API.
• Tasks API - Jobs schedule Tasks; tasks are the things you do (e.g., make this REST call).
• Groups API - tag groups of Fusion objects with identifiers.
Improved APIs
• Jobs and Schedules are now separate (in general, though this separation also leaks into the APIs)
• Spark Jobs API
• Objects API

Distributed Indexing
Index pipelines can now be invoked on a different Fusion instance than the one on which the connector is running.

Connector Enhancements
• JIRA Connector - security trimming, parsers and performance, settings for field, timeout and retry
• Web Connector - crawl JavaScript-powered sites, authentication and timeout settings
• Jive Connector - lists and maps
• Box.com Connector - incremental crawling improvements, more depth and exclusion settings
• Google Drive Connector - batch incremental crawling, security trimming, timeouts, index trash

New Parsers
• XML Parser - parse XML separately into new documents
• HTML Parser - not Tika based, uses JSoup Selectors to extract HTML or CSS into fields/documents

Query Pipeline
• Recommendation -> Boost With Signals
• User Recommendation Boosting -> Recommend Items for User
• Recommend Similar -> Recommend More Like This

• SSL to Solr
• with Kerberos
• or Basic Auth

Enabling Cognitive Search with Fusion 3.1 Machine Learning

Machine Learning Functionality in Fusion 3.1
Fusion machine learning capabilities include:
• Recommendation and personalization.
• Query intent and document classification.
• Learning to rank.
• Automatic document clustering, anomaly detection, cluster labeling and topic detection.
• NLP: synonym discovery and expansion via Word2Vec (w2v); phrase detection using log likelihood.
• Clickstream analysis and auto-tuning of relevance.
• Multi-armed bandit experimentation.

Recommendation and Personalization:
Out-of-the-box, multi-modal recommenders based on collaborative filtering and content.
Use cases:
• Recommend items for user: select items that a user is likely to buy based on past click/purchase behavior.
• Recommend items for query: help boost items based on query history.
• Item-to-item similarity: users who were interested in items like X were also interested in items like Y and Z.
• Query-to-query similarity: suggest similar queries to provide query expansion.
Built-in pipeline, Spark job, job schedules and visualizations are all provided.
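
As a rough illustration of the item-to-item similarity use case above, the sketch below scores item pairs by how many users interacted with both (cosine similarity over user sets). It is a minimal in-memory example, not Fusion's Spark job: the function names, the (user, item) signal format and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarity(signals):
    """Cosine similarity between items, based on the users they share.

    signals: iterable of (user_id, item_id) click/purchase events
    (a hypothetical format, not Fusion's signal schema).
    """
    users_by_item = defaultdict(set)
    for user, item in signals:
        users_by_item[item].add(user)

    sims = defaultdict(dict)
    for a, b in combinations(sorted(users_by_item), 2):
        shared = len(users_by_item[a] & users_by_item[b])
        if shared:
            score = shared / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
            sims[a][b] = score
            sims[b][a] = score
    return sims


def similar_items(sims, item, k=3):
    """Top-k most similar items, highest score first (ties broken by item id)."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Made-up sample signals: users u1..u4 interacting with items W, X, Y, Z.
signals = [
    ("u1", "X"), ("u1", "Y"),
    ("u2", "X"), ("u2", "Y"), ("u2", "Z"),
    ("u3", "X"), ("u3", "Z"),
    ("u4", "W"),
]
sims = item_similarity(signals)
print(similar_items(sims, "X"))
```

Item W shares no users with any other item, so it gets no recommendations; a production recommender would also weight signal types and decay old events, which this sketch leaves out.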

Query relevancy and intent:
Query intent classification using Random Forest or Logistic Regression.
Use cases:
• Predict the product category of an incoming query to help reduce ambiguity.
• Predict the category of a set of new products to be incorporated into the catalog.
Learning to Rank:
• Use internal and external features of content and query to influence ranking.
• Leverage Solr’s re-ranking (&rq) capability.
• Support for linear and non-linear models (libSVM, liblinear, RankLib).
Automatic Doc Clustering and Anomaly Detection:
Features:
• Automatic outlier detection.
• Automatic choice of the number of clusters K (in-house algorithm).
• Automatic cluster topic detection (in-house algorithm).
• Ranking of documents based on relevance to the topic.
• Minimal data science knowledge needed to use the module: extensive research has been done to help choose the best set of models; good default parameter settings that generalize well; flexible pipeline.
Use cases:
• Financial: group emails, news, research articles.
• Ecommerce: group product reviews, product descriptions.

NLP using Machine Learning:
Automatic phrase detection using log likelihood:
• Spark job to generate a table of phrases (frequently co-occurring terms) such as “area rug”, “interior paint” and “ipad case”.
Spell checking using edit distance:
• Match a tail query to a head query or product name based on edit distance to correct typos, e.g., “wireless motam” -> “wireless modem”.
Synonym detection/query expansion using Word2Vec:
• e.g., “refrigerator” -> “freezer”, “phone” -> “cellular”.
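
The edit-distance spell checking described above can be sketched in a few lines. This is a plain Levenshtein implementation with a hypothetical correct() helper and a made-up list of head queries; the threshold of 2 edits is an assumption, and this is not Fusion's Spark job.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct(tail_query, head_queries, max_distance=2):
    """Map a rare (tail) query to the closest frequent (head) query.

    Returns the tail query unchanged when no head query is within
    max_distance edits (the threshold is an illustrative assumption).
    """
    best = min(head_queries, key=lambda h: (edit_distance(tail_query, h), h))
    return best if edit_distance(tail_query, best) <= max_distance else tail_query


# Made-up head queries for illustration.
heads = ["wireless modem", "wireless mouse", "area rug"]
print(correct("wireless motam", heads))
```

Comparing every tail query against every head query is quadratic, so a real job would usually restrict candidates first, for example by shared tokens or length.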
