This document discusses a system for scalable text retrieval that uses randomized clustering. Key aspects include randomly assigning documents to shards and shards to nodes for balanced load and simplified management. It uses Zookeeper for coordination between components like the retrieval engines, indexer, and master, which monitors the cluster and rebalances as needed. The approach aims to provide scalability, reliability, and easy rebalancing through randomization and coordination.
2. Text Retrieval Task
• Text viewed as a sequence of terms in fields
• Document and position for each term are indexed
• Query is a sequence of terms (typically many more than the user actually types)
3. Text Retrieval
• Scores computed by merging occurrences of terms in the query
• Only top-scoring documents are kept
• Deletion and document edits done by adding new documents and keeping a deletion list
• (this is all standard ... Lucene is the best-known example)
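The merge-and-keep-top-k scoring above can be sketched in a few lines. Everything here is a hypothetical toy (a bare term-frequency scorer with an in-memory postings map), not Lucene's actual ranking:

```python
import heapq

# Hypothetical in-memory postings: term -> {doc_id: term_frequency}.
postings = {
    "scalable": {1: 2, 3: 1},
    "retrieval": {1: 1, 2: 3, 3: 1},
}
deleted = {2}  # deletion list: edited/removed docs are masked, never erased in place

def search(query_terms, k=10):
    """Merge per-term occurrences into document scores, keep only the top k."""
    scores = {}
    for term in query_terms:
        for doc_id, tf in postings.get(term, {}).items():
            if doc_id in deleted:
                continue  # the deletion list filters results at query time
            scores[doc_id] = scores.get(doc_id, 0) + tf
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

print(search(["scalable", "retrieval"], k=2))  # → [(1, 3), (3, 2)]; doc 2 is masked
```

The deletion-list trick is what makes the index itself append-only: an edit writes a fresh document and masks the old one, so no index file is ever mutated.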
6. Problems
• Presumes objects can be moved individually
• Has very high insertion/deletion rate
• Has disordered access patterns
• Often exhibits content/placement correlations
10. Scenario: Node Start
● Node starts, tells ZK it exists and has no shards
● Master notified by ZK, looks at shard placement
● Imbalance exists, so Master assigns shards to the new node
● Node notified by ZK, downloads shard, tells ZK
● Master notified by ZK, looks at shard placement, unassigns the shard somewhere else
11. Scenario: Node Crash
● ZK detects node connection loss and session expiration
● Master is notified by ZK that the node's ephemeral file has vanished, looks at shard placements
● If under-replication exists, Master assigns shards to other nodes
● Nodes are notified by ZK, download shards, tell ZK
● Master is notified by ZK, no action needed
12. Summary of Master
● Master is notified of a node-set or shard-set change
● Master examines current state of cluster
● If shards are under-replicated, add assignments
● If shards are over-replicated, delete assignments
● If cluster is imbalanced, add assignments
● Rinse, repeat
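One pass of that master loop can be sketched as below. `REPLICATION`, the node names, and the least/most-loaded placement heuristic are assumptions for illustration, and the final balancing step for a healthy-but-skewed cluster is omitted for brevity:

```python
from collections import defaultdict

REPLICATION = 2  # assumed target replica count per shard

def rebalance(nodes, assignments):
    """One master pass: fix under- and over-replication.

    `assignments` maps shard -> set of nodes serving it. The master only
    records the change; nodes download or drop shards when ZK notifies them.
    Assumes len(nodes) >= REPLICATION.
    """
    # Current per-node load (number of shard copies held).
    load = defaultdict(int)
    for holders in assignments.values():
        for n in holders:
            load[n] += 1
    for n in nodes:
        load.setdefault(n, 0)

    for shard, holders in assignments.items():
        # Under-replicated: add a copy on the least-loaded eligible node.
        while len(holders) < REPLICATION:
            target = min((n for n in nodes if n not in holders),
                         key=lambda n: load[n])
            holders.add(target)
            load[target] += 1
        # Over-replicated: drop the copy held by the most-loaded node.
        while len(holders) > REPLICATION:
            victim = max(holders, key=lambda n: load[n])
            holders.discard(victim)
            load[victim] -= 1
    return assignments

# A new cluster state: "s1" lost a replica, "s2" gained one too many.
print(rebalance(["a", "b", "c"], {"s1": {"a"}, "s2": {"a", "b", "c"}}))
```

Note that the master never touches shard data itself; it only edits the assignment map, and the notification machinery does the rest.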
13. Quick Results
• No deletion/insertion in indexes at runtime
• Reloading micro-shards allows large sequential transfers
• Multiple shards allow very simple threading of search
• Random placement guided by a balancing policy gives near-optimal motion
• Node addition and failure are simple, reliable
• Random sharding is also near optimal
  local = global statistics, 2x query-time improvement
  load balancing
  uniform management
14. Building Blocks
• EC2 - elastic compute
• Zookeeper - reliable coordination
• Katta - shard and query management
• Hadoop - map-reduce, RPC for Katta
• Lucene - candidate set retrieval, index file storage
• Deepdyve search algorithms - segment scoring
16. Zookeeper
• Replicated key-value in-memory store
• Minimal semantics
  create, read, replace specified version
  sequential and ephemeral files
  notifications
• Very strict correctness guarantees
  strict ordering
  quorum writes
  no blocking operations
  no race conditions
• High speed
  50,000 updates per second, 200,000 reads per second
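The "replace specified version" primitive is what rules out race conditions: a write names the version it read, and loses cleanly if anything changed in between. A toy in-memory sketch of that semantics (this is not the real ZooKeeper client API, which talks to a replicated quorum; `MiniZK` and its methods are made up):

```python
class MiniZK:
    """Toy store mimicking ZooKeeper's versioned-write and watch semantics."""

    def __init__(self):
        self.data = {}      # path -> (value, version)
        self.watches = {}   # path -> callbacks, fired once then discarded

    def create(self, path, value):
        if path in self.data:
            raise KeyError(f"{path} already exists")
        self.data[path] = (value, 0)
        self._notify(path)

    def read(self, path):
        return self.data[path]  # returns (value, version)

    def replace(self, path, value, expected_version):
        """Compare-and-swap: succeeds only if nobody wrote since our read."""
        _, version = self.data[path]
        if version != expected_version:
            return False  # lost the race; caller re-reads and retries
        self.data[path] = (value, version + 1)
        self._notify(path)
        return True

    def watch(self, path, callback):
        self.watches.setdefault(path, []).append(callback)

    def _notify(self, path):
        for cb in self.watches.pop(path, []):
            cb(path)

zk = MiniZK()
zk.create("/shards/s1", "node-a")
zk.replace("/shards/s1", "node-b", 0)   # succeeds: version matches
zk.replace("/shards/s1", "node-c", 0)   # fails: stale version, no race
```

The master/node protocol in the scenarios above is built entirely from these few primitives plus ephemeral files, which vanish when a node's session expires.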
18. Katta Interface
• Simple Interface
  Client - horizontal broadcast for query, vertical broadcast for update
  INodeManaged - add/removeShard
• Pluggable Application Interface
• Pluggable Return Policy
  Given current return state:
  return < 0 => done
  return 0 => return result, allow updates
  return n => wait at most n milliseconds
• Comprehensive Results
  Results, exceptions, arrival times
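The return-policy contract can be exercised with a small simulation. `first_k` and `collect` are hypothetical names, the arrival events are made up, and the `return 0` (return-now-but-allow-updates) case is omitted for brevity:

```python
def first_k(k):
    """Policy factory: done once k shard results have arrived,
    otherwise wait up to 50 ms more (hypothetical timeout)."""
    def policy(results):
        # < 0 => done; n > 0 => wait at most n more milliseconds
        return -1 if len(results) >= k else 50
    return policy

def collect(events, policy):
    """Simulated broadcast gather.

    `events` is a list of (arrival_ms, shard, result) tuples; the loop
    consults the policy before accepting each arrival, mirroring the
    contract above.
    """
    results = {}
    elapsed = 0
    for arrival_ms, shard, result in sorted(events):
        wait = policy(results)
        if wait < 0:
            break  # policy satisfied; stop collecting
        if arrival_ms - elapsed > wait:
            break  # next result arrives later than the policy will wait
        elapsed = arrival_ms
        results[shard] = result
    return results

# Two fast shards and one straggler: a first-2-of-3 policy ignores the straggler.
print(collect([(5, "s1", [10]), (12, "s2", [20]), (400, "s3", [30])], first_k(2)))
```

Separating the policy from the gather loop is what makes partial results, strict quorums, and latency budgets all expressible without changing the broadcast machinery.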
21. Impact of Cloud Approach
• Scale-free programming
• Deployed in EC2 (test) or in private farm (production)
• No single point of failure
• Real-time scale up/down
• Extensible to real-time index updates
22. Lessons
• Random is good
  no correlations in document-to-shard assignments
  ⇒ strong bounds on node variations in search time
  ⇒ local statistics are as good as global statistics
  no structure in shard-to-node assignments
  ⇒ node failure is not correlated with documents
  ⇒ load balancing and rebalancing are trivial
  ⇒ threaded search is trivial
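The load-balance claim is easy to sanity-check: when documents pick shards uniformly at random, shard sizes concentrate tightly around the mean, so no shard (and hence no node) is a hot spot. The document and shard counts below are made up for illustration:

```python
import random

random.seed(42)  # fixed seed so the illustration is deterministic

N_DOCS, N_SHARDS = 100_000, 50  # hypothetical sizes

# No content/placement correlation: every document picks a shard at random.
shard_sizes = [0] * N_SHARDS
for _ in range(N_DOCS):
    shard_sizes[random.randrange(N_SHARDS)] += 1

mean = N_DOCS / N_SHARDS
print(min(shard_sizes), max(shard_sizes), mean)  # every shard within a few % of the mean
```

The same concentration argument is why local term statistics match global ones: each shard is a uniform random sample of the corpus, so per-shard scoring needs no global statistics exchange.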
23. More Lessons
• Randomized clustering requires good coordination
  Zookeeper makes that easy
• Good coordination means not having to say you’re sorry
  Masters coordinate but don’t participate