This document discusses a system for scalable text retrieval that uses randomized clustering. Key aspects include randomly assigning documents to shards and shards to nodes for balanced load and simplified management. It uses Zookeeper for coordination between components like the retrieval engines, indexer, and master, which monitors the cluster and rebalances as needed. The approach aims to provide scalability, reliability, and easy rebalancing through randomization and coordination.
2. Text Retrieval Task
• Text viewed as a sequence of terms in fields
• Document and position for each term are indexed
• Query is a sequence of terms (typically many more than the user actually types)
3. Text Retrieval
• Scores computed by merging occurrences of terms in the query
• Only top-scoring documents are kept
• Deletion and document edits done by adding new documents and keeping a deletion list
• (this is all standard ... Lucene is the best-known example)
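The merge-and-keep-top-k scoring above can be sketched in a few lines. Everything here is a hypothetical toy (a bare term-frequency scorer with an in-memory postings map), not Lucene's actual ranking:

```python
import heapq

# Hypothetical in-memory postings: term -> {doc_id: term_frequency}.
postings = {
    "scalable": {1: 2, 3: 1},
    "retrieval": {1: 1, 2: 3, 3: 1},
}
deleted = {2}  # deletion list: edited/removed docs are masked, never erased in place

def search(query_terms, k=10):
    """Merge per-term occurrences into document scores, keep only the top k."""
    scores = {}
    for term in query_terms:
        for doc_id, tf in postings.get(term, {}).items():
            if doc_id in deleted:
                continue  # the deletion list filters results at query time
            scores[doc_id] = scores.get(doc_id, 0) + tf
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

print(search(["scalable", "retrieval"], k=2))  # → [(1, 3), (3, 2)]; doc 2 is masked
```

The deletion-list trick is what makes the index itself append-only: an edit writes a fresh document and masks the old one, so no index file is ever mutated.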
6. Problems
• Presumes objects can be moved individually
• Has very high insertion/deletion rate
• Has disordered access patterns
• Often exhibits content/placement correlations
10. Scenario: Node Start
● Node starts, tells ZK it exists and has no shards
● Master notified by ZK, looks at shard placement
● Imbalance exists, so Master assigns shards to the new node
● Node notified by ZK, downloads shard, tells ZK
● Master notified by ZK, looks at shard placement, unassigns the shard somewhere else
11. Scenario: Node Crash
● ZK detects node connection loss and session expiration
● Master is notified by ZK that the node's ephemeral file has vanished, looks at shard placements
● If under-replication exists, Master assigns shards to other nodes
● Nodes are notified by ZK, download shards, tell ZK
● Master is notified by ZK, no action needed
12. Summary of Master
● Master is notified of a node-set or shard-set change
● Master examines current state of cluster
● If shards are under-replicated, add assignments
● If shards are over-replicated, delete assignments
● If cluster is imbalanced, add assignments
● Rinse, repeat
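One pass of that master loop can be sketched as below. `REPLICATION`, the node names, and the least/most-loaded placement heuristic are assumptions for illustration, and the final balancing step for a healthy-but-skewed cluster is omitted for brevity:

```python
from collections import defaultdict

REPLICATION = 2  # assumed target replica count per shard

def rebalance(nodes, assignments):
    """One master pass: fix under- and over-replication.

    `assignments` maps shard -> set of nodes serving it. The master only
    records the change; nodes download or drop shards when ZK notifies them.
    Assumes len(nodes) >= REPLICATION.
    """
    # Current per-node load (number of shard copies held).
    load = defaultdict(int)
    for holders in assignments.values():
        for n in holders:
            load[n] += 1
    for n in nodes:
        load.setdefault(n, 0)

    for shard, holders in assignments.items():
        # Under-replicated: add a copy on the least-loaded eligible node.
        while len(holders) < REPLICATION:
            target = min((n for n in nodes if n not in holders),
                         key=lambda n: load[n])
            holders.add(target)
            load[target] += 1
        # Over-replicated: drop the copy held by the most-loaded node.
        while len(holders) > REPLICATION:
            victim = max(holders, key=lambda n: load[n])
            holders.discard(victim)
            load[victim] -= 1
    return assignments

# A new cluster state: "s1" lost a replica, "s2" gained one too many.
print(rebalance(["a", "b", "c"], {"s1": {"a"}, "s2": {"a", "b", "c"}}))
```

Note that the master never touches shard data itself; it only edits the assignment map, and the notification machinery does the rest.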
13. Quick Results
• No deletion/insertion in indexes at runtime
• Reloading micro-shards allows large sequential transfers
• Multiple shards allow very simple threading of search
• Random placement guided by a balancing policy gives near-optimal motion
• Node addition and failure are simple, reliable
• Random sharding is also near optimal
  local = global statistics, 2x query-time improvement
  load balancing
  uniform management
14. Building Blocks
• EC2 - elastic compute
• Zookeeper - reliable coordination
• Katta - shard and query management
• Hadoop - map-reduce, RPC for Katta
• Lucene - candidate set retrieval, index file storage
• Deepdyve search algorithms - segment scoring
16. Zookeeper
• Replicated key-value in-memory store
• Minimal semantics
  create, read, replace specified version
  sequential and ephemeral files
  notifications
• Very strict correctness guarantees
  strict ordering
  quorum writes
  no blocking operations
  no race conditions
• High speed
  50,000 updates per second, 200,000 reads per second
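The "replace specified version" primitive is what rules out race conditions: a write names the version it read, and loses cleanly if anything changed in between. A toy in-memory sketch of that semantics (this is not the real ZooKeeper client API, which talks to a replicated quorum; `MiniZK` and its methods are made up):

```python
class MiniZK:
    """Toy store mimicking ZooKeeper's versioned-write and watch semantics."""

    def __init__(self):
        self.data = {}      # path -> (value, version)
        self.watches = {}   # path -> callbacks, fired once then discarded

    def create(self, path, value):
        if path in self.data:
            raise KeyError(f"{path} already exists")
        self.data[path] = (value, 0)
        self._notify(path)

    def read(self, path):
        return self.data[path]  # returns (value, version)

    def replace(self, path, value, expected_version):
        """Compare-and-swap: succeeds only if nobody wrote since our read."""
        _, version = self.data[path]
        if version != expected_version:
            return False  # lost the race; caller re-reads and retries
        self.data[path] = (value, version + 1)
        self._notify(path)
        return True

    def watch(self, path, callback):
        self.watches.setdefault(path, []).append(callback)

    def _notify(self, path):
        for cb in self.watches.pop(path, []):
            cb(path)

zk = MiniZK()
zk.create("/shards/s1", "node-a")
zk.replace("/shards/s1", "node-b", 0)   # succeeds: version matches
zk.replace("/shards/s1", "node-c", 0)   # fails: stale version, no race
```

The master/node protocol in the scenarios above is built entirely from these few primitives plus ephemeral files, which vanish when a node's session expires.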
18. Katta Interface
• Simple Interface
  Client - horizontal broadcast for query, vertical broadcast for update
  INodeManaged - add/removeShard
• Pluggable Application Interface
• Pluggable Return Policy
  Given current return state:
  return < 0 => done
  return 0 => return result, allow updates
  return n => wait at most n milliseconds
• Comprehensive Results
  Results, exceptions, arrival times
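The return-policy contract can be exercised with a small simulation. `first_k` and `collect` are hypothetical names, the arrival events are made up, and the `return 0` (return-now-but-allow-updates) case is omitted for brevity:

```python
def first_k(k):
    """Policy factory: done once k shard results have arrived,
    otherwise wait up to 50 ms more (hypothetical timeout)."""
    def policy(results):
        # < 0 => done; n > 0 => wait at most n more milliseconds
        return -1 if len(results) >= k else 50
    return policy

def collect(events, policy):
    """Simulated broadcast gather.

    `events` is a list of (arrival_ms, shard, result) tuples; the loop
    consults the policy before accepting each arrival, mirroring the
    contract above.
    """
    results = {}
    elapsed = 0
    for arrival_ms, shard, result in sorted(events):
        wait = policy(results)
        if wait < 0:
            break  # policy satisfied; stop collecting
        if arrival_ms - elapsed > wait:
            break  # next result arrives later than the policy will wait
        elapsed = arrival_ms
        results[shard] = result
    return results

# Two fast shards and one straggler: a first-2-of-3 policy ignores the straggler.
print(collect([(5, "s1", [10]), (12, "s2", [20]), (400, "s3", [30])], first_k(2)))
```

Separating the policy from the gather loop is what makes partial results, strict quorums, and latency budgets all expressible without changing the broadcast machinery.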
21. Impact of Cloud Approach
• Scale-free programming
• Deployed in EC2 (test) or in private farm (production)
• No single point of failure
• Real-time scale up/down
• Extensible to real-time index updates
22. Lessons
• Random is good
  no correlations in document-to-shard assignments
  ⇒ strong bounds on node variations in search time
  ⇒ local statistics are as good as global statistics
  no structure in shard-to-node assignments
  ⇒ node failure is not correlated with documents
  ⇒ load balancing and rebalancing are trivial
  ⇒ threaded search is trivial
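The load-balance claim is easy to sanity-check: when documents pick shards uniformly at random, shard sizes concentrate tightly around the mean, so no shard (and hence no node) is a hot spot. The document and shard counts below are made up for illustration:

```python
import random

random.seed(42)  # fixed seed so the illustration is deterministic

N_DOCS, N_SHARDS = 100_000, 50  # hypothetical sizes

# No content/placement correlation: every document picks a shard at random.
shard_sizes = [0] * N_SHARDS
for _ in range(N_DOCS):
    shard_sizes[random.randrange(N_SHARDS)] += 1

mean = N_DOCS / N_SHARDS
print(min(shard_sizes), max(shard_sizes), mean)  # every shard within a few % of the mean
```

The same concentration argument is why local term statistics match global ones: each shard is a uniform random sample of the corpus, so per-shard scoring needs no global statistics exchange.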
23. More Lessons
• Randomized clustering requires good coordination
  Zookeeper makes that easy
• Good coordination means not having to say you’re sorry
  Masters coordinate but don’t participate