Distributed Search in Riak - Integrating Search in a NoSQL Database: Presented by Fred Dushin, Basho Technologies
1. October 13-16, 2016 • Austin, TX
2. Distributed Search in Riak
Integrating search in a NoSQL database
Fred Dushin
Member of Technical Staff
Basho Technologies
3. About Me
CORBA -> Web Services -> MoM
Joined Basho Jan 2015
Reach out!
github://fadushin
lr2015@dushin.net
4. What I want to talk about
How is Query even possible in a distributed
NoSQL database?
What happens when things break?
How does Riak distribute data?
How does Riak repair divergence?
What is Riak? What is Riak Search?
What does Solr bring to Riak?
What does Riak bring to Solr?
5. What is Riak?
A Distributed key-value store
Prioritizes availability over consistency
Provides elasticity without downtime
6. A Riak Glossary
• Key
... any sequence of bytes
• Value
... any opaque blob of data
• Bucket
... an organizing namespace for keys
• Bucket Type
... an organizing namespace for buckets
{{BucketType, Bucket}, Key} -> Value
"BKey"
16. What is Yokozuna?
An extension of Riak which provides search
capability over values stored in Riak
Data stored and replicated in Riak is
automatically indexed in Solr
Solr queries are distributed across the Riak
cluster
http://github.com/basho/yokozuna
20. Riak Query
All Solr queries are made on the Riak endpoint
Riak uses distributed (legacy) Solr to route queries to
nodes in the Riak cluster using the shards parameter
Solr aggregates the results and returns them through Riak
Riak supports all query features supported in distributed
Solr*
* Protobuf interfaces currently have some limitations.
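As a rough illustration of the routing step, a distributed Solr request carries a `shards` parameter listing the cores to fan out to. The sketch below only builds such a query string. The host names, port, path, and index name are hypothetical placeholders, not values taken from Riak, and the function name is made up for this example.

```python
from urllib.parse import urlencode


def distributed_query_string(query, shard_addresses, rows=10):
    """Build a Solr query string that fans out to the given shards.

    `shard_addresses` is a list of "host:port/path/core" strings. Solr
    expects them comma-separated in the `shards` parameter.
    """
    params = {
        "q": query,
        "wt": "json",
        "rows": rows,
        "shards": ",".join(shard_addresses),
    }
    return urlencode(params)


# Hypothetical node addresses, for illustration only.
shards = [
    "node1.example:8093/solr/my_index",
    "node3.example:8093/solr/my_index",
]
query_string = distributed_query_string("name_s:perry", shards)
print(query_string)
```

The node that receives the query would send this to its local Solr, which contacts each listed shard and merges the responses.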
21. Covering Sets
[Figure: partitions numbered 1 through 8 distributed across Node 1 through Node 5. The same object appears in three of the partitions:]
bucket: agents
key: agentp
value: {"name_s": "perry", "type_s": "reptile"}
A Covering Set is a subset of all partitions
such that for all BKeys in the keyspace, there is
exactly one partition in the covering set in
which that BKey can be found.
Covering Sets are not unique!
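The definition above can be checked mechanically on a toy model. The sketch below assumes a simplified ring in which a BKey whose primary partition is p is also stored on the next n_val - 1 partitions, and it uses a ring size of 9 with n_val = 3, chosen only so that exact covers exist. These values and the function names are assumptions for illustration, not Riak's actual coverage planner.

```python
from itertools import combinations


def replica_partitions(primary, ring_size, n_val):
    """Partitions holding a BKey whose primary partition is `primary`."""
    return {(primary + i) % ring_size for i in range(n_val)}


def is_covering_set(candidate, ring_size, n_val):
    """True if every BKey is found in exactly one partition of `candidate`."""
    candidate = set(candidate)
    return all(
        len(replica_partitions(p, ring_size, n_val) & candidate) == 1
        for p in range(ring_size)
    )


def covering_sets(ring_size, n_val):
    """Enumerate covering sets by brute force (fine for toy ring sizes).

    Each replica window must contain exactly one chosen partition, so a
    cover in this model has ring_size // n_val members.
    """
    size = ring_size // n_val
    return [
        set(c)
        for c in combinations(range(ring_size), size)
        if is_covering_set(c, ring_size, n_val)
    ]


for cover in covering_sets(ring_size=9, n_val=3):
    print(sorted(cover))
```

In this toy model more than one set passes the check, which matches the point that covering sets are not unique.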
26. YZ AAE
[Figure: Node 2, with the Solr index my_index holding a document with name_s: ["perry"] and type_s: ["mammal"], next to the Riak object:]
bucket: agents
key: agentp
value: {"name_s": "perry", "type_s": "mammal"}
Yokozuna maintains its own set of AAE (active anti-entropy) trees
for data stored in Solr.
Hashtrees are stored persistently on disk
Updated on indexing operations
Periodically exchanged with the K/V AAE
tree on the same node
If a value is missing in Solr, it is reindexed; if a
value is indexed when it shouldn't be, it is
deleted. Riak K/V is canonical.
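The repair rule can be sketched as a comparison of two hash maps keyed by BKey. This is a deliberate simplification: real hashtrees compare hashes of whole segments before drilling down to individual keys, whereas the flat dictionaries, the SHA-256 hash, and the function names here are assumptions made for the example. Treating a hash mismatch as a reindex is also an assumption, following from K/V being canonical.

```python
import hashlib


def object_hash(value: bytes) -> str:
    """Hash of a stored value (hash function chosen for illustration)."""
    return hashlib.sha256(value).hexdigest()


def plan_repairs(kv_hashes, index_hashes):
    """Compare K/V hashes with index hashes and decide what to repair.

    Returns (reindex, delete):
      reindex: BKeys in K/V that are missing from the index or whose
               indexed hash differs (assumption: K/V wins on mismatch)
      delete:  BKeys present in the index but absent from K/V
    """
    reindex = sorted(
        bkey for bkey, h in kv_hashes.items() if index_hashes.get(bkey) != h
    )
    delete = sorted(bkey for bkey in index_hashes if bkey not in kv_hashes)
    return reindex, delete


# BKeys follow the {{BucketType, Bucket}, Key} shape from the glossary.
perry = (("default", "agents"), "agentp")
other = (("default", "agents"), "agentx")
stale = (("default", "agents"), "agentz")

kv = {
    perry: object_hash(b'{"name_s": "perry", "type_s": "mammal"}'),
    other: object_hash(b'{"name_s": "x"}'),
}
index = {
    perry: object_hash(b'{"name_s": "perry", "type_s": "reptile"}'),
    stale: object_hash(b'{"name_s": "z"}'),
}

reindex, delete = plan_repairs(kv, index)
print("reindex:", reindex)
print("delete:", delete)
```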