Apache Jackrabbit is just about to reach the 3.0 milestone based on a new architecture called Oak. Based on concepts like eventual consistency and multi-version concurrency control, and borrowing ideas from distributed version control systems and cloud-scale databases, the Oak architecture is a major leap ahead for Jackrabbit. This presentation describes the Oak architecture and shows what it means for the scalability and performance of modern content applications. Changes to existing Jackrabbit functionality are described and the migration process is explained.
4. Outline
• Tree model
• Updating the tree
• Refresh and garbage collection
• Concurrency and conflicts
• Interlude: Implementations
• Replicas and sharding
• Access control
• Comparing revisions
• Commit hooks
• Observers
• Search
• Big picture
18. Conflict handling strategies
a. Fully serialized commits
• fail on conflict, no concurrent updates
b. Partially serialized commits
• fail on conflict, concurrent conflict-free updates
c. Partial merge logic
• conflict markers, manual conflict resolution
d. Full merge logic
• conflicting changes may be lost
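As an illustration of strategy (b), here is a minimal sketch in which a commit prepared against an older revision succeeds as long as no later revision touched the same paths. The class name, the revision numbering, and the rule that a conflict means "same exact path" are all assumptions made for this example, not the way any Oak backend actually detects conflicts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of partially serialized commits; not the Oak implementation. */
final class PartialSerializationSketch {
    // Paths changed by each committed revision; index r holds the changes of revision r + 1.
    private final List<Set<String>> changedPathsByRevision = new ArrayList<>();

    /** Returns the latest committed revision number (0 means nothing committed yet). */
    synchronized int headRevision() {
        return changedPathsByRevision.size();
    }

    /**
     * Commits changes that were prepared against baseRevision.
     * Fails if any revision committed after baseRevision touched one of the same paths
     * (assumption: a conflict is an overlap of exact paths).
     */
    synchronized int commit(int baseRevision, Set<String> changedPaths) {
        for (int r = baseRevision; r < changedPathsByRevision.size(); r++) {
            if (!Collections.disjoint(changedPathsByRevision.get(r), changedPaths)) {
                throw new IllegalStateException("conflict with revision " + (r + 1));
            }
        }
        changedPathsByRevision.add(Set.copyOf(changedPaths));
        return changedPathsByRevision.size();
    }
}
```

Two writers that both start from revision 0 and change "/a" and "/b" respectively can both commit; a third writer that also starts from revision 0 and changes "/a" fails.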
20. MicroKernel/NodeStore
• Implementation of the tree/revision model
• Responsible for: clustering, sharding, caching, conflict handling, etc.
• Not responsible for: type validation, access control, search, versioning, etc.
21. Current implementations
                        | DocumentMK                                   | TarMK (SegmentMK)
Persistence backends    | MongoDB, JDBC (WIP)                          | Local FS (tar files)
Conflict handling       | Partial serialization                        | Full serialization
Clustering              | MongoDB clustering                           | Simple failover
Sharding                | MongoDB sharding                             | N/A
Single-node performance | Moderate                                     | High
Key use cases           | Large deployments (>1 TB), concurrent writes | Small/medium deployments, mostly read
27. Existentialism
• All (syntactically valid) paths can be traversed
• But the identified node might not exist
• For example:
root.getChildNode("a").exists() -> false
root.getChildNode("a").getChildNode("b").exists() -> true!
• Implemented as a decorator over the MK
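The behaviour above can be sketched with a small decorator. The sketch assumes that the reason a parent does not exist while its child does is that the decorator hides the parent from the current reader; the names Node, StoredNode, HidingNode and the hidden-path set are hypothetical and only mirror the method names on the slide.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: names mirror the slide, not the actual Oak API. */
final class ExistenceSketch {

    /** Minimal tree node: child lookup never returns null. */
    interface Node {
        boolean exists();
        Node getChildNode(String name);
    }

    /** Placeholder returned for any path with nothing stored under it. */
    static final Node MISSING = new Node() {
        public boolean exists() { return false; }
        public Node getChildNode(String name) { return this; }
    };

    /** A stored node backed by an immutable map of children. */
    static final class StoredNode implements Node {
        private final Map<String, Node> children;
        StoredNode(Map<String, Node> children) { this.children = children; }
        public boolean exists() { return true; }
        public Node getChildNode(String name) {
            Node child = children.get(name);
            return child != null ? child : MISSING;
        }
    }

    /** Decorator that reports hidden paths as non-existent but still allows traversal. */
    static final class HidingNode implements Node {
        private final Node delegate;
        private final String path;
        private final Set<String> hiddenPaths;
        HidingNode(Node delegate, String path, Set<String> hiddenPaths) {
            this.delegate = delegate;
            this.path = path;
            this.hiddenPaths = hiddenPaths;
        }
        public boolean exists() {
            return delegate.exists() && !hiddenPaths.contains(path);
        }
        public Node getChildNode(String name) {
            return new HidingNode(delegate.getChildNode(name), path + "/" + name, hiddenPaths);
        }
    }

    /** Builds /a/b and hides /a (assumed example data). */
    static Node demoRoot() {
        Node b = new StoredNode(Map.of());
        Node a = new StoredNode(Map.of("b", b));
        Node root = new StoredNode(Map.of("a", a));
        return new HidingNode(root, "", Set.of("/a"));
    }
}
```

With demoRoot(), getChildNode("a").exists() is false while getChildNode("a").getChildNode("b").exists() is true, and a path with nothing stored under it can still be traversed without null checks.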
30. Content diff
• Tells what changed between two content trees
• Cornerstone of most higher-level functionality
• validation
• indexing
• observation
• etc.
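A content diff can be sketched as a recursive comparison of two immutable trees. The example below is a simplified illustration, assuming each node carries a single value and that unchanged subtrees are shared by reference between revisions; the Node class and the "+", "-", "^" prefixes are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch of a content diff between two immutable trees. */
final class ContentDiffSketch {

    /** Simplified node: one value plus named children. */
    static final class Node {
        final String value;
        final Map<String, Node> children;
        Node(String value, Map<String, Node> children) {
            this.value = value;
            this.children = children;
        }
    }

    /** Returns "+path" for added, "-path" for removed, "^path" for changed nodes. */
    static List<String> diff(Node before, Node after) {
        List<String> changes = new ArrayList<>();
        diff("", before, after, changes);
        return changes;
    }

    private static void diff(String path, Node before, Node after, List<String> out) {
        if (before == after) {
            return; // same instance in both revisions: nothing changed below this point
        }
        if (!before.value.equals(after.value)) {
            out.add("^" + (path.isEmpty() ? "/" : path));
        }
        // Visit the union of child names in sorted order for deterministic output.
        TreeSet<String> names = new TreeSet<>(before.children.keySet());
        names.addAll(after.children.keySet());
        for (String name : names) {
            Node b = before.children.get(name);
            Node a = after.children.get(name);
            String childPath = path + "/" + name;
            if (b == null) {
                out.add("+" + childPath);
            } else if (a == null) {
                out.add("-" + childPath);
            } else {
                diff(childPath, b, a, out);
            }
        }
    }
}
```

The reference check at the top is what keeps the diff cheap in this sketch: work is proportional to the changed part of the tree, not to its total size.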
34. Commit hooks
• Based on given before and after states, a hook can:
• fail the commit, or
• pass the commit unmodified, or
• pass the commit with modifications
• Key plugin mechanism in Oak
• All configured hooks are applied in sequence
• Used for much of the higher-level functionality
• Often implemented using a content diff
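The three possible outcomes and the sequential application of hooks can be sketched as follows. This is a simplified illustration, not the Oak interface: the Hook type, the use of a flat map as the content state, the unchecked CommitRejected exception, and the two example hooks are all assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a commit hook chain; not the actual Oak API. */
final class CommitHookSketch {

    /** Thrown by a hook to fail the commit (unchecked to keep the sketch short). */
    static final class CommitRejected extends RuntimeException {
        CommitRejected(String message) { super(message); }
    }

    /** A hook returns the state to commit, possibly modified, or throws to fail the commit. */
    interface Hook {
        Map<String, String> process(Map<String, String> before, Map<String, String> after);
    }

    /** Applies all hooks in sequence; each hook sees the output of the previous one. */
    static Map<String, String> applyAll(List<Hook> hooks,
                                        Map<String, String> before,
                                        Map<String, String> after) {
        Map<String, String> current = after;
        for (Hook hook : hooks) {
            current = hook.process(before, current);
        }
        return current;
    }

    /** Validation-style hook: fails the commit if "title" is missing or empty. */
    static final Hook REQUIRE_TITLE = (before, after) -> {
        String title = after.get("title");
        if (title == null || title.isEmpty()) {
            throw new CommitRejected("title is required");
        }
        return after; // pass the commit unmodified
    };

    /** Trigger-style hook: adds a default "status" when none was supplied. */
    static final Hook DEFAULT_STATUS = (before, after) -> {
        if (after.containsKey("status")) {
            return after;
        }
        Map<String, String> modified = new HashMap<>(after);
        modified.put("status", "draft");
        return modified; // pass the commit with modifications
    };
}
```

Calling applyAll with both hooks on a state that has a title returns the same state plus the default status; the same call on a state without a title throws CommitRejected and nothing is committed.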
35. Examples
• All kinds of validation
• node types, access control, references, etc.
• Trigger-like functionality
• autocreated content, default values, etc.
• In-content index updates
• etc.
36. Types of hooks
                   | CommitHook | Editor    | Validator
Content diff       | Optional   | Always    | Always
Can modify commit  | Yes        | Yes       | No
Programming model  | Simple     | Callbacks | Callbacks
Performance impact | High       | Medium    | Low
38. Observers
• Based on given before and after states, an observer can:
• observe what changed in the content tree
• Invoked after the commit, unlike commit hooks
• Always asynchronous for changes from other cluster nodes
• Depending on backend, can be synchronous for changes on the local cluster node
• Often implemented using a content diff
42. Query processing steps
1. Parsing
a. Select matching parser
b. Parse the query string
2. Execution
a. Estimate cost per index
b. Select index with the least cost estimate
c. Execute the query against the index
3. Post-processing
a. Filter results on access control and additional constraints
b. Apply sorting, grouping, faceting, etc.
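Steps 2a and 2b can be sketched as picking the index with the lowest cost estimate. The Index interface, the fixedCost helper, and the cost values used in the usage note are hypothetical and exist only for this illustration; how a real index computes its estimate is not covered here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of cost-based index selection; not the actual Oak API. */
final class IndexSelectionSketch {

    /** An index that can estimate how expensive it would be to answer a query. */
    interface Index {
        String name();
        double estimateCost(String query);
    }

    /** Test helper: an index that reports the same cost for every query. */
    static Index fixedCost(String name, double cost) {
        return new Index() {
            public String name() { return name; }
            public double estimateCost(String query) { return cost; }
        };
    }

    /** Asks each index for a cost estimate and returns the cheapest one, if any. */
    static Optional<Index> selectIndex(List<Index> indexes, String query) {
        return indexes.stream()
                .min(Comparator.comparingDouble((Index index) -> index.estimateCost(query)));
    }
}
```

For example, given indexes named "property", "lucene" and "solr" with made-up costs of 10, 2.5 and 40, selectIndex returns the "lucene" index; with no indexes configured it returns an empty Optional.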
43. Index implementations
• Property index
• Reference index
• Lucene index
• in-content
• local file system
• Solr index
• embedded
• external