2. Questions we want to answer
• What is the purpose of a leader in SolrCloud?
• How is a leader selected?
• What happens when a leader dies?
3. Purpose of Leader
• Shards: for scaling. A collection of documents
can be divided into multiple shards.
• Shard replicas: for failover (high availability)
and load balancing. Each shard can be replicated
into multiple shard replicas.
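The collection / shard / replica layout can be sketched as below. This is only an illustration: the hash-modulo routing, the function names and the replica naming are assumptions made for the example, not Solr's actual document router.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard index by hashing the document id (illustrative scheme only)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_collection(num_shards: int, replicas_per_shard: int) -> dict:
    """Return {shard_name: [replica_name, ...]} for a hypothetical collection."""
    return {
        f"shard{s + 1}": [
            f"shard{s + 1}_replica{r + 1}" for r in range(replicas_per_shard)
        ]
        for s in range(num_shards)
    }


# Two shards for scale, three replicas per shard for failover and load balancing.
collection = build_collection(num_shards=2, replicas_per_shard=3)
target = shard_for("doc-42", num_shards=len(collection))
print(collection)
print("doc-42 goes to shard index", target)
```

The same document id always maps to the same shard, which is what allows a later update of that document to reach the shard that already holds it.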
4. Purpose of Leader
• Collection – multiple shards – multiple replicas
• How is a request served?
– Types of requests:
• Read – search query, no consistency issue between
replicas, so any replica can serve it
• Write – index a document, consistency issue, so writes
should have a single source – hence the leader
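A minimal sketch of the read/write split described above. The `Shard` class, the in-memory indexes and the choice of the first replica as leader are all assumptions for illustration; real leader selection is covered on the following slides.

```python
import random


class Shard:
    """Toy shard: every replica holds a copy of the index, one replica leads."""

    def __init__(self, name, replicas):
        self.name = name
        self.replicas = list(replicas)
        # Assumption for the sketch: the first replica acts as leader.
        self.leader = self.replicas[0]
        self.index = {replica: {} for replica in self.replicas}

    def read(self, doc_id, rng=random):
        # Reads can go to any replica (load balancing).
        replica = rng.choice(self.replicas)
        return replica, self.index[replica].get(doc_id)

    def write(self, doc_id, doc):
        # Writes go through the leader first, which then forwards the
        # document to the other replicas so every copy stays consistent.
        self.index[self.leader][doc_id] = doc
        for replica in self.replicas:
            if replica != self.leader:
                self.index[replica][doc_id] = doc
        return self.leader


shard = Shard("shard1", ["replica1", "replica2", "replica3"])
accepted_by = shard.write("1", {"title": "hello"})
served_by, doc = shard.read("1")
print("write accepted by", accepted_by, "| read served by", served_by, doc)
```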
6. Leader selection
• ZooKeeper: SolrCloud uses ZooKeeper to track
which nodes are active and which are not, to
manage config files, etc.
• ZooKeeper helps in leader selection
• ZooKeeper comes embedded in SolrCloud, but
can also be run externally
7. Leader selection
• SolrCloud += new node
– The new node registers itself with ZooKeeper
– And creates znodes:
• session – with a timeout, refreshed regularly by the
client node
• ephemeral node: removed automatically when the
session ends
• sequence node: when created, it gets a unique sequence
no. assigned and suffixed to its name
– The clusterstate.json file gets updated (by the
Overseer)
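The registration steps above can be imitated with a small in-memory stand-in. `FakeZooKeeper`, its methods and the `/election/n_` prefix are hypothetical names invented for this sketch, not the ZooKeeper client API; only the zero-padded 10-digit sequence suffix follows what ZooKeeper itself does.

```python
class FakeZooKeeper:
    """In-memory stand-in for the ephemeral + sequence znode behaviour."""

    def __init__(self):
        self._next_seq = 0
        self.znodes = {}  # znode path -> id of the session that owns it

    def create_ephemeral_sequential(self, prefix, session_id):
        # The sequence number is unique and appended to the znode name.
        path = f"{prefix}{self._next_seq:010d}"
        self._next_seq += 1
        self.znodes[path] = session_id
        return path

    def expire_session(self, session_id):
        # Ephemeral znodes disappear together with their session.
        removed = [p for p, owner in self.znodes.items() if owner == session_id]
        for path in removed:
            del self.znodes[path]
        return removed


zk = FakeZooKeeper()
node_a = zk.create_ephemeral_sequential("/election/n_", session_id="session-a")
node_b = zk.create_ephemeral_sequential("/election/n_", session_id="session-b")
gone = zk.expire_session("session-a")  # node A stopped refreshing its session
print(node_a, node_b, gone)
```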
11. Leader dies
• The leader is the node whose znode has the
lowest sequence no.; when the leader dies, that
znode goes away
• All election znodes are watched through ZooKeeper
• The znode with the next sequence no. is elected
as the leader
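The lowest-sequence-number rule can be written in a few lines. The znode names and helper functions below are made up for the example; the sketch only shows the ordering rule, not how watches are delivered.

```python
def sequence_no(znode: str) -> int:
    """Extract the numeric suffix from a name such as 'n_0000000002'."""
    return int(znode.rsplit("_", 1)[1])


def elect_leader(znodes):
    """The candidate with the lowest sequence number wins; None if nobody is left."""
    return min(znodes, key=sequence_no) if znodes else None


live = ["n_0000000002", "n_0000000000", "n_0000000001"]
leader = elect_leader(live)

# The leader dies: its ephemeral znode vanishes and the next one takes over.
live.remove(leader)
new_leader = elect_leader(live)
print(leader, "->", new_leader)
```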
12. Leader dies
• The new leader candidate starts a sync process
with each replica. If every replica has the same
version, it registers as the active leader.
• The old leader might have sent docs to some
replicas but not all.
• If a replica is too far behind, it tries to
replay the log or asks for full replication.
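A rough sketch of the decision made during the sync step. Representing each node's state as a single integer version, the `MAX_REPLAY_GAP` threshold and the action names are simplifying assumptions for illustration; the slides do not say how Solr measures how far behind a replica is.

```python
MAX_REPLAY_GAP = 100  # hypothetical threshold, not a Solr setting


def plan_sync(candidate_version, replica_versions, max_gap=MAX_REPLAY_GAP):
    """Decide, per replica, what has to happen before the candidate can lead."""
    plan = {}
    for replica, version in replica_versions.items():
        gap = candidate_version - version
        if gap == 0:
            plan[replica] = "in_sync"
        elif gap < 0:
            # The old leader reached this replica but not the candidate.
            plan[replica] = "candidate_behind"
        elif gap <= max_gap:
            plan[replica] = "replay_log"
        else:
            plan[replica] = "full_replication"
    return plan


def can_register_as_leader(plan):
    """The candidate becomes the active leader only once everyone agrees."""
    return all(action == "in_sync" for action in plan.values())


plan = plan_sync(1000, {"replica2": 1000, "replica3": 990, "replica4": 10})
print(plan, can_register_as_leader(plan))
```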
14. Code Flow of write requests
Rough sketch:
org.apache.solr.handler.UpdateRequestHandler -> one of multiple
org.apache.solr.handler.loader.ContentStreamLoader implementations (CSV, XML, JSON)
For each write request, the loader is identified and its load method is
called.
Within the loader, an org.apache.solr.update.UpdateCommand is created for
each type of write request and passed to
org.apache.solr.update.processor.UpdateRequestProcessor.process<Add/Commit/...>
For SolrCloud, DistributedUpdateProcessor is used.
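The flow above, mirrored in a small Python sketch. The Solr classes are Java; every class and function here (`JsonLoader`, `CsvLoader`, `RecordingProcessor`, `handle_update`) is a hypothetical stand-in that only imitates the handler -> loader -> command -> processor hand-off, and the content-type lookup is an assumption about how a loader might be chosen.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass
class UpdateCommand:
    kind: str              # "add", "commit", ...
    payload: object = None


class RecordingProcessor:
    """Stand-in for an update processor: one method per command type."""

    def __init__(self):
        self.seen = []

    def process_add(self, cmd):
        self.seen.append(("add", cmd.payload))

    def process_commit(self, cmd):
        self.seen.append(("commit", None))


class JsonLoader:
    def load(self, body, processor):
        for doc in json.loads(body):
            processor.process_add(UpdateCommand("add", doc))


class CsvLoader:
    def load(self, body, processor):
        for row in csv.DictReader(io.StringIO(body)):
            processor.process_add(UpdateCommand("add", dict(row)))


# Assumption: the loader is picked by content type.
LOADERS = {"application/json": JsonLoader(), "text/csv": CsvLoader()}


def handle_update(content_type, body, processor, commit=False):
    """Identify the loader, let it build commands, hand them to the processor."""
    LOADERS[content_type].load(body, processor)
    if commit:
        processor.process_commit(UpdateCommand("commit"))


json_processor = RecordingProcessor()
handle_update("application/json", '[{"id": "1"}]', json_processor, commit=True)
csv_processor = RecordingProcessor()
handle_update("text/csv", "id,title\n2,b\n", csv_processor)
print(json_processor.seen, csv_processor.seen)
```

In a clustered setup, the processor at the end of this chain would be the place where a write is forwarded to the shard leader and its replicas.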