1. ... In which I tell a story of building a CMS on top of ‘NoSQL’ (*)
(*) HBase and SOLR
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
2. ... and hopefully warn you about what YOU will encounter in the near future.
3. /usr/bin/whoami
» co-founder of Outerthought
» scalable content applications
» content management & publishing
» Java, REST and now NoSQL
» open source product portfolio
4. » Daisy: content- and knowledge management
www.daisycms.org
» Lily: scalable store and search
www.lilycms.org
» Kauri: REST-centric internet app development
www.kauriproject.org
12. Hitting the scale spot
» Sweet spot for # of documents: hundreds of thousands, not millions
» Not everything could be solved with increasing
heap size
» cold cache at startup
» OOMEs (OutOfMemoryErrors)
» we didn’t want to step into the PHP/RDBMS trap (of dynamic database schemas)
» The cost of flexibility
13. What we found hard to scale
» access control (dynamically evaluated against rule set)
» facet browsing (compute facet counts in RAM)
» all the nifty stuff people were using our
software for
» ... anything that required random access to in-memory cached data for computations
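Facet browsing means counting, for the current result set, how many documents carry each value of a field. A minimal Java sketch, assuming documents are held in memory as field-to-value maps (the class `FacetCounter` and method `countFacet` are hypothetical, not Daisy code), shows why this work grows with the number of documents kept in RAM:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetCounter {
    // Counts how many documents carry each value of the given facet field.
    // Every document in the result set is visited, so time and memory both
    // grow with the number of documents held in RAM.
    static Map<String, Integer> countFacet(List<Map<String, String>> docs, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("type", "article", "lang", "en"),
                Map.of("type", "article", "lang", "nl"),
                Map.of("type", "manual", "lang", "en"));
        System.out.println(countFacet(docs, "type")); // {article=2, manual=1}
    }
}
```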
14. Beyond the ‘scaling’ problem
» three-prong data layer (MySQL, Lucene, fs)
» result set merging (between MySQL & Lucene)
» happened in app code/memory
» ‘transactions’, set operations = hard
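A minimal sketch of merging result sets in application memory, assuming the full-text query yields document ids ordered by Lucene relevance and the structured query yields an unordered set of ids from MySQL (`ResultMerger` and `intersect` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ResultMerger {
    // Intersects a relevance-ordered Lucene hit list with the set of ids that
    // match the structured (SQL) criteria, keeping Lucene's ordering.
    // Both result sets must be fully materialized in memory first.
    static List<String> intersect(List<String> luceneHits, Set<String> sqlMatches) {
        List<String> merged = new ArrayList<>();
        for (String id : luceneHits) {
            if (sqlMatches.contains(id)) {
                merged.add(id);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> lucene = List.of("doc7", "doc2", "doc9");
        Set<String> sql = new LinkedHashSet<>(List.of("doc2", "doc7", "doc4"));
        System.out.println(intersect(lucene, sql)); // [doc7, doc2]
    }
}
```

Other set operations (union, difference) need the same in-memory treatment, and nothing here is transactional across the two stores.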
15. Beyond the three-prong problem
» errrr..... “Failover” ..... ?
» = symptom of enterprise success
16. If we could add more nodes ...
» True distribution: scalability, availability, performance
... in the line of fire
17. Solution 1
» do MORE inside the database
18. Infrastructural (master/slave)
» more database!
19. even more database!
20. let’s add message buses!
21. RMI! JMS! w00t! ... over! JDBC! stuff!
25. NoSQL
» the era of Polyglot Persistence
» the Tower of Babel
» the (B)Le(e|a)ding Edge
27. NoSQL tool selection
» the luxury of choice
(but remember polyglot persistence)
» survival of the fittest
» inflated expectations + nifty marketing
NOTE: If your data fits in the RAM of a single node, DON’T go NoSQL (just yet)
28. Requirements, phase I
» automatic scaling to large data sets
» fault-tolerance: replication, automatic handling of failing nodes
» a flexible data model supporting sparse data
» runs on commodity hardware
» efficient random access to data
» open source, with the ability to participate in development and thus help drive the direction of the project
» some preference for a Java-based solution
29. Requirements, phase II
» After careful consideration, we realized these were also important:
» consistency: no chance of having two conflicting
versions of a row
» atomic updates of a single row, single-row
transactions
» bonus points for MapReduce integration
» e.g. full-text index rebuilding
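The consistency and single-row atomicity requirements can be illustrated with a compare-and-set on one row. The sketch below is an in-memory stand-in, not HBase code: `SingleRowStore` and its methods are hypothetical, although HBase offers a comparable per-row operation in `checkAndPut`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class SingleRowStore {
    // Hypothetical stand-in for a table: row key -> (column -> value).
    private final Map<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    private Map<String, String> rowFor(String rowKey) {
        return rows.computeIfAbsent(rowKey, k -> new HashMap<>());
    }

    // All columns of one row are updated under that row's lock, so a reader
    // sees either none or all of the changes (a single-row transaction).
    public void put(String rowKey, Map<String, String> columns) {
        Map<String, String> row = rowFor(rowKey);
        synchronized (row) {
            row.putAll(columns);
        }
    }

    // Compare-and-set: apply the update only if `column` still holds
    // `expected` (null meaning absent). Two writers that read the same
    // version can therefore never both overwrite it.
    public boolean checkAndPut(String rowKey, String column, String expected,
                               Map<String, String> columns) {
        Map<String, String> row = rowFor(rowKey);
        synchronized (row) {
            if (!Objects.equals(row.get(column), expected)) {
                return false;
            }
            row.putAll(columns);
            return true;
        }
    }

    public String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        synchronized (row) {
            return row.get(column);
        }
    }

    public static void main(String[] args) {
        SingleRowStore store = new SingleRowStore();
        store.put("doc1", Map.of("version", "1", "title", "Draft"));
        // Both writers read version 1; only the first update is accepted.
        boolean first = store.checkAndPut("doc1", "version", "1", Map.of("version", "2", "title", "Final"));
        boolean second = store.checkAndPut("doc1", "version", "1", Map.of("version", "2", "title", "Other"));
        System.out.println(first + " " + second + " " + store.get("doc1", "title")); // true false Final
    }
}
```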
30. That brought us to HBase, which bought us:
» a data model where some column families keep all versions and others do not, which fits our CMS document model very well
» ordered tables with the ability to do range scans on them, which allows building scalable indexes on top of them
» HDFS, a convenient place to store large blobs
» Apache license and community, a familiar
environment for us
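HBase keeps rows sorted by row key, and a scan is bounded by a start row (inclusive) and a stop row (exclusive). The sketch below mimics that contract with a `TreeMap`, assuming `String` keys in lexicographic order (HBase actually compares raw bytes); `OrderedTable` and its methods are illustrative names only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class OrderedTable {
    // Rows sorted by key (String order here; HBase sorts raw bytes).
    private final NavigableMap<String, String> rows = new TreeMap<>();

    public void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Keys in [startRow, stopRow): start inclusive, stop exclusive.
    public List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // Keys sharing a prefix are contiguous in sorted order, so a prefix
    // query is a scan that starts at the prefix and stops at the first miss.
    public List<String> prefixScan(String prefix) {
        List<String> result = new ArrayList<>();
        for (String key : rows.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break;
            }
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        OrderedTable t = new OrderedTable();
        t.put("com.cnn.www", "a");
        t.put("com.cnn.www/sports", "b");
        t.put("com.example", "c");
        t.put("org.apache.hbase", "d");
        System.out.println(t.prefixScan("com.cnn.")); // [com.cnn.www, com.cnn.www/sports]
        System.out.println(t.scan("com", "org"));     // [com.cnn.www, com.cnn.www/sports, com.example]
    }
}
```

Because a range scan only touches the rows inside its bounds, its cost depends on the size of the result rather than the size of the table, which is what makes indexes built on top of it scale.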
31. HBase
» hbase.apache.org + Cloudera CDH distro
» Open Source (Google) BigTable
implementation
» HDFS as underlying DFS (≈GFS)
» ZooKeeper as lock service (≈Chubby)
» Integration with Hadoop MapReduce
32. BigTable
» Figure: a slice of an example table that stores Web pages (labels: row key, column family, cell)
» row key "com.cnn.www"
» column "contents:" = "<html>..." (three versions, at t3, t5, t6)
» column "anchor:cnnsi.com" = "CNN" (t9)
» column "anchor:my.look.ca" = "CNN.com" (t8)
Figure 1 (from the Bigtable paper): A slice of an example table that stores Web pages. The row name is a reversed URL. The contents column family contains the page contents, and the anchor column family contains the text of any anchors that reference the page. CNN’s home page is referenced by both the Sports Illustrated and the MY-look home pages, so the row contains columns named anchor:cnnsi.com and anchor:my.look.ca. Each anchor cell has one version; the contents column has three versions, at timestamps t3, t5, and t6.
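The Bigtable paper describes its data model as a sorted map from (row, column, timestamp) to value. A minimal Java sketch of that model using nested `TreeMap`s, loaded with the Figure 1 example. `BigtableSketch` is a hypothetical name, the timestamps t3 to t9 are written as plain numbers, and column families are folded into the column string ("family:qualifier") for brevity:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class BigtableSketch {
    // row -> column ("family:qualifier") -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> data =
            new TreeMap<>();

    public void put(String row, String column, long timestamp, String value) {
        data.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Newest value written at or before `asOf`, or null if there is none.
    public String get(String row, String column, long asOf) {
        NavigableMap<String, NavigableMap<Long, String>> columns = data.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null) {
            return null;
        }
        // Versions are sorted newest first, so the ceiling of asOf in that
        // ordering is the largest timestamp <= asOf.
        Map.Entry<Long, String> entry = versions.ceilingEntry(asOf);
        return entry == null ? null : entry.getValue();
    }

    public String getLatest(String row, String column) {
        return get(row, column, Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        BigtableSketch t = new BigtableSketch();
        t.put("com.cnn.www", "contents:", 3, "<html>...v3");
        t.put("com.cnn.www", "contents:", 5, "<html>...v5");
        t.put("com.cnn.www", "contents:", 6, "<html>...v6");
        t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN");
        t.put("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com");
        System.out.println(t.getLatest("com.cnn.www", "contents:"));        // <html>...v6
        System.out.println(t.get("com.cnn.www", "contents:", 5));           // <html>...v5
        System.out.println(t.getLatest("com.cnn.www", "anchor:cnnsi.com")); // CNN
    }
}
```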
45. » OK, so now we have a data store !
46. » However, content repository =
store + search !
» ouch
47. That was easy! (however ...)
48. Search ponderings
» CMS = two types of search
» structured, ‘logic’ search
» numbers, strings
» based on logic (SQL, anyone?)
» information retrieval (or: full-text search)
» text
» based on statistics
49. Search ponderings
» All of that, at scale
50. Structured Search
» HBase Indexing Library
» idea from Google App Engine datastore indexes
» http://code.google.com/appengine/articles/index_building.html
content table:
rowkey | col | col
A | val3 | foo6
B | val2 | foo7
index table (rowkey, col; rows kept in key order):
val2-B
val3-A
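Following the idea in the speaker notes (use values as the basis for key computation and rely on HBase's naturally ordered rows + scans), a value index can be sketched as a sorted table whose keys are value + separator + row key. This is a minimal in-memory illustration, not the HBase Indexing Library's API; `ValueIndex`, the `\u0000` separator and the method names are assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ValueIndex {
    // Assumed separator: it sorts before every other character and must
    // never occur inside an indexed value.
    private static final char SEP = '\u0000';

    // The "index table": key = value + SEP + row key (drawn as val2-B and
    // val3-A on the slide), stored value = row key of the content row.
    private final NavigableMap<String, String> index = new TreeMap<>();

    static String indexKey(String value, String rowKey) {
        return value + SEP + rowKey;
    }

    public void add(String value, String rowKey) {
        index.put(indexKey(value, rowKey), rowKey);
    }

    public void remove(String value, String rowKey) {
        index.remove(indexKey(value, rowKey));
    }

    // Rows whose value is in [from, to): one range scan over the index,
    // without touching the content table.
    public List<String> query(String from, String to) {
        return new ArrayList<>(index.subMap(from, true, to, false).values());
    }

    // Rows whose value equals `value`: a prefix scan on value + SEP.
    public List<String> equalTo(String value) {
        String prefix = value + SEP;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        ValueIndex idx = new ValueIndex();
        idx.add("val3", "A");
        idx.add("val2", "B");
        System.out.println(idx.query("val2", "val4")); // [B, A]
        System.out.println(idx.equalTo("val3"));       // [A]
    }
}
```

Every change to an indexed value in the content table requires a `remove` of the old index row and an `add` of the new one; keeping those two tables in sync reliably is the problem the later slides on queuing address.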
51. Full-text / IR search
» Lucene?
» no sharding (for scale)
» no replication (for availability)
» batched index updates (not real-time)
52. Beyond Lucene
» Katta
» scalable architecture, however only search, no indexing
» Elastic Search
» very young (sorry)
» hbasene et al.
» stores inverted index in HBase, might not scale all features
» SOLR
» widely used, schema, facets, query syntax, cloud branch
53. ... + ... = ?
Easy! Or ?
54. Remember distribution ?
Remember secondary indexes ?
➙ Need for reliable queuing
55. Connecting things
» we needed a reliable bridge between our
main storage (HBase) and our index/search
server(s) (SOLR)
» indexing, reindexing, mass reindexing (M/R)
» we need a reliable method of updating
HBase secondary indexes
» all of that must eventually run distributed
» distribution means coping with failure
56. Solution
» ... a QUEUE ! Meh.
» ACMEMessageQueue ? Bzzzzzt.
We wanted the queues persisted in HBase, for fault tolerance and for ease of administration.
» ➙ WAL & Queue implemented on top of
HBase tables
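A queue on top of an HBase table can be sketched as a sorted table whose row keys encode arrival order: a consumer reads the first row and deletes it only after processing succeeds. In this in-memory sketch a `TreeMap` stands in for the table and a local counter for the row-key generator; a distributed setup would need a globally ordered key (for example a timestamp plus a unique suffix). `TableBackedQueue` and its methods are hypothetical, not Lily's implementation:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

public class TableBackedQueue {
    // Stand-in for an HBase table: rows sorted by key, so the first row is
    // always the oldest message.
    private final NavigableMap<String, String> table = new TreeMap<>();
    private long sequence = 0;

    // Row key = zero-padded sequence number, so lexicographic order = FIFO order.
    public synchronized void enqueue(String message) {
        table.put(String.format("%020d", sequence++), message);
    }

    // Hands the oldest message to the handler. The row is deleted only if the
    // handler reports success; otherwise it stays queued for a retry.
    // Returns false if the queue is empty or processing failed.
    public synchronized boolean processOne(Predicate<String> handler) {
        Map.Entry<String, String> oldest = table.firstEntry();
        if (oldest == null) {
            return false;
        }
        if (handler.test(oldest.getValue())) {
            table.remove(oldest.getKey());
            return true;
        }
        return false;
    }

    public synchronized int size() {
        return table.size();
    }

    public static void main(String[] args) {
        TableBackedQueue q = new TableBackedQueue();
        q.enqueue("reindex record-1");
        q.enqueue("reindex record-2");
        q.processOne(msg -> false); // back-end unavailable: message stays queued
        q.processOne(msg -> true);  // succeeds: record-1 is removed
        System.out.println(q.size()); // 1
    }
}
```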
57. WAL / Queue
» WAL
  » guaranteed execution of synchronous actions
  » call doesn’t return before the secondary action finishes
  » e.g. update secondary indexes
  » if all goes well, size = #concurrent ops
» Queue
  » triggering of async actions
  » e.g. (re)index an (updated) record with the SOLR back-end
  » size depends on the speed of the back-end process
» useful outside of the Lily context as well!
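The WAL behaviour described above (log the action, apply the update, run the secondary action, clear the entry) can be sketched as follows. This is an illustration under stated assumptions rather than Lily's WAL: the log is an in-memory map where Lily would use an HBase table, and `WriteAheadLog`, `execute` and `recover` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class WriteAheadLog {
    // Stand-in for the WAL table: entry id -> pending secondary action.
    // A real WAL would persist a serializable description of the action,
    // not an in-memory Runnable.
    private final Map<Long, Runnable> pending = new LinkedHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    // 1. log the secondary action, 2. apply the primary update,
    // 3. run the secondary action, 4. remove the log entry.
    // The call does not return before the secondary action has finished;
    // if anything throws, the entry stays in the log.
    public void execute(Runnable primary, Runnable secondary) {
        long id = ids.incrementAndGet();
        synchronized (pending) {
            pending.put(id, secondary);
        }
        primary.run();
        secondary.run();
        synchronized (pending) {
            pending.remove(id);
        }
    }

    // Intended to run at startup: replays entries left behind by failures.
    // Secondary actions must be idempotent, since they may already have run.
    public int recover() {
        List<Long> toReplay;
        synchronized (pending) {
            toReplay = new ArrayList<>(pending.keySet());
        }
        int replayed = 0;
        for (Long id : toReplay) {
            Runnable action;
            synchronized (pending) {
                action = pending.get(id);
            }
            if (action == null) {
                continue; // completed normally in the meantime
            }
            action.run(); // if this throws, the entry stays for the next attempt
            synchronized (pending) {
                pending.remove(id);
            }
            replayed++;
        }
        return replayed;
    }

    // If all goes well, this stays around the number of concurrent operations.
    public int size() {
        synchronized (pending) {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        WriteAheadLog wal = new WriteAheadLog();
        List<String> index = new ArrayList<>();
        wal.execute(() -> { }, () -> index.add("title=Foo -> doc1"));

        // A secondary action that fails once (simulated crash), then succeeds.
        int[] attempts = {0};
        Runnable flaky = () -> {
            if (attempts[0]++ == 0) {
                throw new IllegalStateException("simulated crash");
            }
            index.add("title=Bar -> doc2");
        };
        try {
            wal.execute(() -> { }, flaky);
        } catch (IllegalStateException e) {
            // the caller sees the failure; the WAL entry remains
        }
        System.out.println(wal.size());    // 1
        System.out.println(wal.recover()); // 1
        System.out.println(index);         // [title=Foo -> doc1, title=Bar -> doc2]
    }
}
```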
58. The Sum
» Lily model (records & fields)
» mapped onto HBase (=storage)
» indexed and searchable through
SOLR
» using a WAL/Queue mechanism
implemented in HBase
» runtime based on Kauri
» with client/server comms via Avro
(and a REST interface with JSON)
59. Lily Content Model
» Records > Fields
» Field types: the usual base types + blobs + link
fields
» ... so we can model relationships again
(and have free versioning while at it)
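A minimal sketch of the record/field model with link fields and versioning, assuming each update stores a full new version of the record's fields. `ContentRecord`, `Link` and their methods are hypothetical illustrations, not Lily's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ContentModelSketch {
    // Value of a link field: a reference to another record by id.
    static final class Link {
        final String recordId;

        Link(String recordId) {
            this.recordId = recordId;
        }
    }

    // A record is an id plus its versions; each version maps field names to values.
    static final class ContentRecord {
        final String id;
        private final List<Map<String, Object>> versions = new ArrayList<>();

        ContentRecord(String id) {
            this.id = id;
        }

        // Each update creates a new version holding the previous fields plus
        // the changes. Returns the new version number (starting at 1).
        int update(Map<String, Object> changedFields) {
            Map<String, Object> next = new TreeMap<>();
            if (!versions.isEmpty()) {
                next.putAll(versions.get(versions.size() - 1));
            }
            next.putAll(changedFields);
            versions.add(next);
            return versions.size();
        }

        Object get(String field, int version) {
            return versions.get(version - 1).get(field);
        }

        Object latest(String field) {
            return versions.isEmpty() ? null : get(field, versions.size());
        }
    }

    public static void main(String[] args) {
        ContentRecord author = new ContentRecord("author-1");
        author.update(Map.of("name", "A. Writer"));

        ContentRecord article = new ContentRecord("article-1");
        article.update(Map.of("title", "Draft", "author", new Link(author.id)));
        article.update(Map.of("title", "Final"));

        System.out.println(article.get("title", 1));                    // Draft
        System.out.println(article.latest("title"));                    // Final
        System.out.println(((Link) article.latest("author")).recordId); // author-1
    }
}
```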
62. Roadmap
» Available now = learning material
(architecture, model, API, Javadoc)
+ developer playground ‘proof of architecture’
➥ www.lilycms.org
» End of October = fully distributed release
» from there on, ca. 3-monthly releases leading up to Lily 1.0
Nearly there!
66. Thanks for your
hospitality and
attention !
» stevenn@outerthought.org
» @stevenn
Speaker notes
+ merging search results across MySQL & Lucene
consistency?? We’re a content repository, after all: people rely on us
MapReduce for index rebuilding
This can be used instead of Lucene for indexes which are structured, large, and should be immediately up to date. For example, we use this to keep an index of the links that exist between records.
use values as the basis for key computation and rely on HBase’s naturally ordered rows + scans