2. Apache Jackrabbit Oak
• Scalable content repository
• JCR 2.0
• Designed for concurrent access (MVCC)
• Pluggable components (storage, indexes)
• Powering AEM 6.0
18/11/14
3. Oak Architecture
• Oak-JCR
• Oak-Core
– MVCC (node states and immutable trees)
– Core components (Security, Query engine, …)
– Plugins
• Oak-MK
– Pluggable storage
4. Oak – the Query Engine
• Query languages
– XPath
– SQL-2
• Selects the index(es) expected to perform best
– The search is delegated to the underlying indexes
– No index? The repository is traversed
• ACLs applied afterwards
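
The selection step above can be sketched as a lowest-cost pick with a traversal fallback. This is a minimal illustration, not Oak's implementation: the class name, the index names, and the idea of passing costs in a plain map are all assumptions made for the example.

```java
import java.util.Map;

// Hypothetical model of cost-based index selection; not Oak's actual classes.
final class IndexSelectionSketch {

    static final String TRAVERSAL = "traversal";

    /**
     * Picks the index with the lowest estimated cost for one query.
     * An infinite cost is taken to mean "this index cannot answer the query".
     * When no index qualifies, the repository is traversed instead.
     */
    static String select(Map<String, Double> costByIndex) {
        String best = TRAVERSAL;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> entry : costByIndex.entrySet()) {
            double cost = entry.getValue();
            // Strict comparison: an infinite cost never beats the fallback.
            if (cost < bestCost) {
                best = entry.getKey();
                bestCost = cost;
            }
        }
        return best;
    }
}
```

With costs of 20.0 for a property index and 5.0 for a Lucene index, `select` returns the Lucene index; with an empty map it returns the traversal fallback.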
5. Indexing – the IndexEditor API
• NodeState before = builder.getNodeState();
• builder.child("a").setProperty("foo", "bar");
• NodeState after = builder.getNodeState();
• NodeState indexed = editorHook.processCommit(before, after, …); // who said MVCC?
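
The before/after pattern above can be illustrated without Oak on the classpath. The sketch below is a simplified stand-in, not the IndexEditor API: states are flattened to path-to-value maps for a single indexed property, and the method name only echoes the slide. It shows the MVCC-friendly part: the diff yields a new index and leaves every input untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified stand-in for an index editor; Oak's real editors walk NodeState trees.
final class DiffIndexerSketch {

    /**
     * before/after map a node path to the value of one indexed property.
     * Returns a new index (value -> paths) without mutating any argument.
     */
    static Map<String, Set<String>> processCommit(Map<String, String> before,
                                                  Map<String, String> after,
                                                  Map<String, Set<String>> index) {
        // Deep-copy the current index so the previous revision stays readable.
        Map<String, Set<String>> updated = new HashMap<>();
        index.forEach((value, paths) -> updated.put(value, new TreeSet<>(paths)));

        // Drop entries whose value changed or whose node disappeared.
        before.forEach((path, oldValue) -> {
            if (!oldValue.equals(after.get(path))) {
                Set<String> paths = updated.get(oldValue);
                if (paths != null) {
                    paths.remove(path);
                    if (paths.isEmpty()) {
                        updated.remove(oldValue);
                    }
                }
            }
        });

        // Add entries that are new or carry a changed value.
        after.forEach((path, newValue) -> {
            if (!newValue.equals(before.get(path))) {
                updated.computeIfAbsent(newValue, v -> new TreeSet<>()).add(path);
            }
        });
        return updated;
    }
}
```

Committing `{"/a": "bar"}` on top of an empty state and an empty index yields an index that maps `bar` to `/a`; the maps passed in are not modified.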
6. Searching – the QueryIndex API
• Filter filter = … ; // "select * from [nt:folder]"
• filter.restrictPath("/somenode", Filter.PathRestriction.DIRECT_CHILDREN);
• Cursor cursor = queryIndex.query(filter, nodeState); // search against a state
• IndexRow row = cursor.next(); // results
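
The idea of searching "against a state" can be sketched in plain Java. Everything here is a hypothetical model rather than the QueryIndex API: the state is a map from path to node type, the filter is reduced to a node type plus a direct-children path restriction, and the cursor is an ordinary `Iterator`.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a query index evaluated against one repository state.
final class SnapshotQuerySketch {

    /**
     * Returns a cursor over the paths that have the given node type and are
     * direct children of parent. The state is copied first, so later changes
     * to the caller's map do not affect a cursor that is already open.
     */
    static Iterator<String> query(Map<String, String> state, String nodeType, String parent) {
        String prefix = parent.equals("/") ? "/" : parent + "/";
        return new TreeMap<String, String>(state).entrySet().stream()
                .filter(e -> e.getValue().equals(nodeType))
                .map(Map.Entry::getKey)
                // Direct child: starts with the prefix and has no further slash.
                .filter(p -> p.startsWith(prefix)
                        && p.length() > prefix.length()
                        && p.indexOf('/', prefix.length()) < 0)
                .iterator();
    }
}
```

Calling `query(state, "nt:folder", "/somenode")` and then `next()` on the result plays the role of `cursor.next()` in the slide: each call hands back one matching row.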
7. Searching – Filters
• Full text expressions
• Property restrictions
• Path restrictions
– Exact
– Parent
– Child
– Descendant
• Node type restrictions
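
The four path restrictions can be made concrete with a small matcher. This is a sketch under assumptions: the enum below is defined locally for the example (its constant names are chosen for readability), paths are assumed to be absolute and normalized, and "parent" is read as "the node that is the parent of the restricted path".

```java
// Hypothetical model of path restrictions; paths are absolute and normalized.
final class PathRestrictionSketch {

    enum Kind { EXACT, PARENT, DIRECT_CHILDREN, ALL_CHILDREN }

    /** Returns the parent path, or null for the root node. */
    static String parentOf(String path) {
        if (path.equals("/")) {
            return null;
        }
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    /** True when candidate satisfies the restriction of the given kind. */
    static boolean matches(Kind kind, String restriction, String candidate) {
        switch (kind) {
            case EXACT:
                return candidate.equals(restriction);
            case PARENT:
                // The candidate must be the parent of the restricted path.
                return candidate.equals(parentOf(restriction));
            case DIRECT_CHILDREN:
                return restriction.equals(parentOf(candidate));
            case ALL_CHILDREN:
                String prefix = restriction.equals("/") ? "/" : restriction + "/";
                return candidate.startsWith(prefix) && !candidate.equals(restriction);
            default:
                throw new IllegalArgumentException("Unknown kind: " + kind);
        }
    }
}
```

For the restriction `/somenode`, `/somenode/a` is a direct child and a descendant, `/somenode/a/b` is only a descendant, and `/somenodeX` matches neither because the prefix check includes the trailing slash.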
8. Configuring indexes
• Indexes are declared by adding “query index configuration” nodes in the repository
– Type
– Asynchronous
– Reindex
– Index-specific properties
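
As an illustration of what such a configuration node carries, the sketch below assembles the properties of a property-index definition as a plain map. The helper and its parameters are hypothetical; the property names (`type`, `async`, `reindex`, `propertyNames`) and the `oak:QueryIndexDefinition` node type follow Oak's usual conventions, but treat the exact set as an assumption to check against the Oak version in use.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that models an index definition node as a property map.
final class IndexDefinitionSketch {

    /** Builds the properties for a property index on a single property name. */
    static Map<String, Object> propertyIndex(String propertyName, boolean async) {
        Map<String, Object> definition = new LinkedHashMap<>();
        definition.put("jcr:primaryType", "oak:QueryIndexDefinition");
        definition.put("type", "property");                     // Type
        if (async) {
            definition.put("async", "async");                   // Asynchronous
        }
        definition.put("reindex", true);                        // Reindex on next commit
        definition.put("propertyNames", List.of(propertyName)); // Index-specific property
        return definition;
    }
}
```

In a real repository these properties would be set on a node, conventionally below `/oak:index`, rather than held in a map.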
9. In-repository indexes
• Data structures designed as content
– Property index
– Ordered property index
– Node type index
– Reference index
10. Lucene index
• Full text and (sorted) property restrictions
• Stored in repository
• Tika for indexing binaries
• Configurable indexing rules (boost), codec, analyzers
19/11/14
11. Lucene index
• Interesting facts
– DocValues for sorted property restrictions
– Uncompressed stored fields
– Property exists queries
• TermRange vs Wildcard vs Term vs MatchAll + FieldExistsFilter
12. Solr index
• Full text, property, path restrictions
• Embedded or remote Solr(Cloud)
• Configurable
– Mapping restriction / fields
– Page size
– Commit policy
• Most is configured on the Solr side
13. Problems
• Hard to express complex queries
• Cannot leverage the advanced capabilities of the underlying indexes
14. Native language support
• Leverage underlying index capabilities
– Multiple query languages/parsers
• More accurate full text queries (and results)
– … where native('lucene', 'name:(hello world) "hello world"^3')
• Advanced index capabilities (e.g. MLT)
– … where native('solr', 'mlt?q=path:/content/sample1&mlt.fl=jcr:title')
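
When native expressions are assembled in code, the main pitfall is quoting: the expression sits inside a single-quoted SQL-2 string literal, so any single quote in it has to be doubled. The helper below is a hypothetical sketch of that step; the `select * from [...]` prefix and the node type are supplied by the caller and are assumptions, since the slides elide that part of the statement.

```java
// Hypothetical helper for building a SQL-2 statement with a native() condition.
final class NativeQuerySketch {

    /** Wraps a value in a SQL-2 string literal, doubling embedded single quotes. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds a statement that passes expression to the named index untouched. */
    static String nativeQuery(String nodeType, String language, String expression) {
        return "select * from [" + nodeType + "] where native("
                + quote(language) + ", " + quote(expression) + ")";
    }
}
```

For example, `nativeQuery("nt:base", "lucene", "name:(hello world) \"hello world\"^3")` produces a statement whose condition matches the Lucene example above; double quotes pass through unchanged because only single quotes delimit the literal.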
15. Adding more indexes
• Create an IndexEditor
– Turn a diff into an “indexable”
• Create a QueryIndex
– Turn a Filter into an index-specific query
• “Declare” the index
16. Looking forward
• Results aggregation features (e.g. facets)
• More configuration options (Lucene, Solr)
• Smarter index selection
• Covering indexes