NetDocuments - Journey from FAST to Solr

Journey from FAST to Solr

Presented By:
David Hamson, Mou Nandi

Goal of the Session
•  NetDocuments

•  Why move to Solr from FAST

•  Architecting Solr to work as a core module of a cloud document management product, for user interface building and document discovery

•  Testing and benchmarking Solr to scale and perform for billions of documents with 200 QPS and 200 DPS

•  Lessons learned and shortcuts found migrating from FAST to Solr
2/14

Who We Are

A leading cloud content management and collaboration service for small to medium businesses (SMB) and professional services firms

Who We Serve
We serve more than 1,000 customers across 128 countries worldwide and host over 250 million documents.

Why Migrate to Solr

•  Product roadmap does not fit with company roadmap

•  Large hardware footprint, expensive to scale

•  High indexing latency

•  Unpredictable and untraceable document loss

•  A black-box search engine, with a dependency on the Microsoft FAST support team

•  No control over new features

•  Expensive license

•  Solr supports massive indexes

•  Active, hardworking development community

•  Access to what's happening under the hood

•  Improved hardware footprint

•  Reduced licensing cost

Migration to Solr

[Diagram: ND Document feeds FAST Instance 1, FAST Instance 2, and more FAST instances. Each instance contains Fast Doc Processors, FIXML, and a Fast Indexer that builds a combined MDI + FTI index.]

•  95% of searches are metadata searches, and the metadata index does not need rich text processing

•  Flexibility to implement different architectures for MDI and FTI

•  Even the highest level of logging cannot trace the document loss during heavy feeding traffic

Migration to Solr – Solr Indexing
[Diagram: ND Document enters the ND Pipeline, which produces two feeds. Metadata feed: MD XML, Solr MD, Solr MDI (Solr MD instances). Full-text feed: Aspire, FT XML, Solr FT, Solr FTI (Solr FT instances).]

The Migration Project

Phase 1 - MDI
•  Only create MDI
•  Use FAST data to prototype Solr
•  Use the FIXMLs to build the Solr index
•  Use 100% filter queries

Phase 2 – FTI
•  Build a robust feeding pipeline to handle both MD and FT
•  Build a text processing pipeline

Phase 3
•  Implement new Solr features

Some ft. view of NetDocuments Search Architecture

[Diagram. Components shown: Web Apps; Queue; NDPipeline, containing an MD Handler Pool (MDH1 to MDH5), an FT Queue, an FT Processor pool (FTP1 to FTP5), a Dispatcher queue, a Dispatcher pool (D1 to D5), and Administration (monitoring, debugging, stats); File System; Solr MDI; Solr FTI; Query Distributor.]

Benchmarking Solr Config Parameter for indexing
•  Created Solr index from FIXMLs with different RAM buffer, merge factor, and auto commit configurations
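
A benchmark matrix of this kind can be enumerated mechanically. The sketch below is only an illustration: the candidate values are hypothetical (the slides do not list the values that were tested), and the helper names `benchmark_grid` and `solrconfig_fragment` are made up for this example. `ramBufferSizeMB`, `mergeFactor`, and `autoCommit`/`maxTime` are the `solrconfig.xml` settings the bullet refers to.

```python
from itertools import product

# Hypothetical candidate values; the talk does not say which values were tried.
RAM_BUFFER_MB = [32, 128, 512]
MERGE_FACTOR = [10, 25]
AUTO_COMMIT_MS = [7_000, 60_000]


def benchmark_grid():
    """Return one dict per combination of indexing settings to benchmark."""
    return [
        {"ramBufferSizeMB": ram, "mergeFactor": mf, "autoCommitMaxTimeMs": ac}
        for ram, mf, ac in product(RAM_BUFFER_MB, MERGE_FACTOR, AUTO_COMMIT_MS)
    ]


def solrconfig_fragment(cfg):
    """Render one combination as solrconfig.xml elements.

    In a real solrconfig.xml the first two elements belong to the index
    configuration section and autoCommit belongs to the update handler.
    """
    return (
        f"<ramBufferSizeMB>{cfg['ramBufferSizeMB']}</ramBufferSizeMB>\n"
        f"<mergeFactor>{cfg['mergeFactor']}</mergeFactor>\n"
        f"<autoCommit><maxTime>{cfg['autoCommitMaxTimeMs']}</maxTime></autoCommit>"
    )


if __name__ == "__main__":
    for cfg in benchmark_grid():
        print(solrconfig_fragment(cfg))
        print("---")
```

Each rendered fragment would be applied to a fresh core before re-running the same indexing job, so that only one combination changes per run.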

Testing with HDD and SSD

•  We did not see any performance difference between HDD (15k rpm) and the ioDrive2 with ND documents

•  15 threads running at a time from the client feeder application
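
A feeder that keeps 15 requests in flight can be sketched with a fixed-size thread pool. This is a minimal sketch, not the NetDocuments feeder: `feed_all` and the `send` callable are hypothetical stand-ins for whatever client code actually posts a document to Solr.

```python
from concurrent.futures import ThreadPoolExecutor

FEEDER_THREADS = 15  # thread count stated on the slide


def feed_all(docs, send, workers=FEEDER_THREADS):
    """Send every document through a fixed pool of worker threads.

    `send` is a hypothetical callable that posts one document and returns
    True on success. Returns a (succeeded, failed) pair of counts.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(send, docs))
    succeeded = sum(1 for ok in results if ok)
    return succeeded, len(results) - succeeded


if __name__ == "__main__":
    # Stand-in sender that accepts every document.
    print(feed_all(range(100), lambda doc: True))
```

Counting failures on the client side gives an independent tally to compare against the index when checking for lost documents.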

Testing using different file systems

•  We did not see a huge performance difference between ext3 and xfs on HDD or SSD with ND documents

•  We chose to use ext3 for FTI with 15K HDD on RAID10

•  We are using xfs on the ioDrive for MDI, as suggested by Fusion-io

Benchmarking Solr Indexing and Query Process

Implemented and compared multi-core index processing and query performance against a single-core index.

                         Search going to 5 shards    Search going to 10 shards
SolrMeter instances      5                           10
Queries per shard        3000 queries/min            1500 queries/min
Total queries            15000 queries/min           15000 queries/min
Avg response time        8 ms                        12 ms
CPU                      20%                         32%
RAM                      52 G                        53 G
Cache warmup time        2.5 s                       2.7 s
Cache hit ratio          0.98                        0.98
Cache size               2276                        2276
Evictions                none                        none
Index updated            every 7 sec                 every 7 sec
Test ran                 5 min                       8 min
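
The per-shard rates can be checked against the session's 200 QPS goal with simple arithmetic. In the sketch below, `total_qps` is a helper name invented for this example; the input figures are the ones reported on the slide.

```python
def total_qps(shards, queries_per_min_per_shard):
    """Aggregate queries per second across all shards."""
    return shards * queries_per_min_per_shard / 60


if __name__ == "__main__":
    # Both runs total 15000 queries/min, i.e. 250 queries per second.
    print(total_qps(5, 3000))
    print(total_qps(10, 1500))
```

Both configurations were therefore driven at the same aggregate load, above the 200 QPS target, so the response time and CPU figures compare like with like.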

Benchmark qtime increase as Solr scales and start row increases

qTime does not vary much with start row increase.
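
One way to take such a measurement is to issue the same query with increasing `start` values and read `QTime` from the response header. This is a rough sketch under stated assumptions: the base URL, the `*:*` query, and the helper names are hypothetical, and it assumes the JSON response writer (`wt=json`), whose `responseHeader` carries `QTime` in milliseconds.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base, start, rows=500, q="*:*"):
    """Build a select URL for one page of results."""
    params = {"q": q, "start": start, "rows": rows, "wt": "json"}
    return f"{base}/select?" + urlencode(params)


def qtime_ms(response_text):
    """Extract QTime (milliseconds) from a Solr JSON response body."""
    return json.loads(response_text)["responseHeader"]["QTime"]


def measure(base, starts):
    """Return {start: QTime} for each start offset; needs a reachable Solr."""
    timings = {}
    for start in starts:
        with urlopen(select_url(base, start)) as resp:
            timings[start] = qtime_ms(resp.read().decode("utf-8"))
    return timings
```

A call such as `measure("http://solrserver:8983/solrSearch/core0", [0, 1000, 10000])` (host and port are placeholders) would produce the data points for a QTime versus start row plot.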

Tuning System queries for Solr
•  System searches are metadata searches

•  Thousands of real-life queries were extracted from the FAST query log

•  Extensive use of filter queries and the filter cache gives excellent response time for complex queries

•  Example queries:

FAST Query:

ANDNOT(ANDNOT(ANDNOT(AND(AND(ndcabinets:string("cab1", mode="and"),ndcredate:range(2011-09-26T00:00:00,2012-04-13T23:59:59)),FILTER(ndacl:string("acl1 acl2 acl3", mode="OR"))),nddeletedcabs:string("cab1", mode="and")),ndexten:string("ndws", mode="and")),ndexten:string("ndflt", mode="and"))

Solr Query:

http://solrserver:port/solrSearch/core0/select?shards=solrserver:port/solrSearch/core0,1solrserver:port/solrSearch/core1&start=0&rows=500&fl=ndenvurl,nddocmodnum_s_std,ndtitle_t_idx_std&sort=ndlastmoddate_tdt_idx+desc&q=ndenvurl:*&fq=ndcabinets_smulti_idx:cab1&fq=ndcredate_tdt_idx:[2011-09-26T00:00:00Z TO 2012-04-13T23:59:59Z]&fq={!cache=false cost=100}(ndacl_smulti_idx:acl1 OR ndacl_smulti_idx:acl2 OR ndacl_smulti_idx:acl3)&fq=-nddeletedcabs_smulti_idx:cab1&fq=-ndexten_s_idx:ndws&fq=-ndexten_s_idx:ndflt
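
The translation above maps each FAST operator to its own `fq` parameter: AND terms become positive filters, ANDNOT terms become negated filters, and the ACL filter is sent with `cache=false`. A minimal sketch of assembling such a request follows. The function name `build_select_url` and its parameters are hypothetical; the field names and the `{!cache=false cost=100}` local params are taken from the Solr query above, and the `shards` and `fl` parameters are omitted for brevity.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base, cabinet, date_from, date_to, acls,
                     deleted_cabinet, excluded_exts):
    """Build a metadata search URL that puts every restriction in an fq."""
    acl_clause = " OR ".join(f"ndacl_smulti_idx:{acl}" for acl in acls)
    params = [
        ("q", "ndenvurl:*"),
        ("start", 0),
        ("rows", 500),
        ("sort", "ndlastmoddate_tdt_idx desc"),
        # AND terms: one cached filter per restriction.
        ("fq", f"ndcabinets_smulti_idx:{cabinet}"),
        ("fq", f"ndcredate_tdt_idx:[{date_from} TO {date_to}]"),
        # ACL filter: sent uncached with an explicit cost, as on the slide.
        ("fq", f"{{!cache=false cost=100}}({acl_clause})"),
        # ANDNOT terms: negated filters.
        ("fq", f"-nddeletedcabs_smulti_idx:{deleted_cabinet}"),
    ]
    params += [("fq", f"-ndexten_s_idx:{ext}") for ext in excluded_exts]
    return f"{base}/select?" + urlencode(params)


if __name__ == "__main__":
    url = build_select_url(
        "http://solrserver:8983/solrSearch/core0",  # placeholder host and port
        "cab1", "2011-09-26T00:00:00Z", "2012-04-13T23:59:59Z",
        ["acl1", "acl2", "acl3"], "cab1", ["ndws", "ndflt"],
    )
    for fq in parse_qs(urlsplit(url).query)["fq"]:
        print(fq)
```

Keeping each restriction in a separate `fq` lets the filter cache reuse the common ones (cabinet, extension exclusions) across queries, while the per-user ACL clause stays out of the cache.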
