1. www.day.com
CRX 1.4
TarPersistenceManager and Clustering
with TarPersistenceManager
Speaker: Honwai Wong, SSE
Duration: 45 min
Feedback: techsummit@day.com
Day Technical Summit 2008 1
Agenda
TarPM (TarPersistenceManager)
    Functionality
    Configuration
    Optimization
    Hot Backup
    Migration
TarPM (TarPersistenceManager) Clustering
    Architecture
    Global Data Store
    Setup
    Configuration
TarPM
Functionality
Disk-based PersistenceManager
Uses standard Tar file format (POSIX standard)
Append-only write operations, thus extremely efficient
Particularly suitable for use cases with a high rate of data creation and modification
Takes advantage of key-value pair data structure of CRX
Maintains index files for fast access
Hot backup capability
TarPM
Configuration
TarPM is configured per workspace
e.g. <crx_home>/workspaces/crx.default/workspace.xml
No mandatory parameters; all parameters are preset with default values
<PersistenceManager class="com.day.crx.persistence.tar.TarPersistenceManager" />
TarPM
Configuration - Parameters
Parameter | Description | Default
localPath | The directory where local files are stored. This can be an absolute or relative path. | base directory of workspace
maxFileSize | If the current data file grows larger than this number (in MB), a new data file is created. | 64
maxIndexBuffer | After an abnormal termination, at most this much data (in MB) needs to be scanned to re-create the tar entry index. | 32
optimizeSleep | Number of milliseconds to wait after each optimization step. | 1
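As an illustration of the parameters above, a workspace.xml entry that sets each of them explicitly might look like the following sketch. The values simply repeat the defaults from the table, so the result should behave like the parameterless form. Using ${wsp.home} for localPath is an assumption that this variable resolves to the workspace base directory.

```xml
<PersistenceManager class="com.day.crx.persistence.tar.TarPersistenceManager">
    <!-- Assumption: ${wsp.home} points at the base directory of the workspace -->
    <param name="localPath" value="${wsp.home}" />
    <!-- Start a new data file once the current one exceeds 64 MB -->
    <param name="maxFileSize" value="64" />
    <!-- Scan at most 32 MB to rebuild the tar entry index after a crash -->
    <param name="maxIndexBuffer" value="32" />
    <!-- Wait 1 ms after each optimization step -->
    <param name="optimizeSleep" value="1" />
</PersistenceManager>
```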
TarPM
Optimization
Append-only operation leads to increased disk usage
Data in the tar files is never overwritten
A delete appends a zero-length entry
The optimization task copies active data from old tar files into new ones and then deletes the old tar files
Different modes of operation supported
Recommended to run during times of low system usage
TarPM
Optimization - Modes
Manually trigger optimization from CRX Explorer
Place a file called optimize.tar in the data directory
    TarPM detects this file and starts optimization
    optimize.tar is renamed to optimizeNow.tar
    After optimization has finished, optimizeNow.tar is deleted automatically
    Stop a running optimization by deleting optimizeNow.tar
    Automate using a cron job
Offline optimization using the command-line tool
    java -cp <jars> com.day.crx.persistence.tar.TarUtils -optimize <directory>
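The marker-file mode described above could be scripted, for example from a cron job. The following is a minimal sketch, not a tool shipped with CRX: the function name, the data directory argument and the timeout are assumptions made for this example. It creates optimize.tar and then waits until neither optimize.tar nor optimizeNow.tar exists any more.

```shell
#!/bin/sh
# Sketch: trigger a TarPM optimization run through the marker file and
# wait for it to finish. Function name and timeout are hypothetical.

trigger_tar_optimization() {
    data_dir=$1          # directory that holds the data_*.tar files
    timeout=${2:-3600}   # give up after this many seconds (assumed default)

    # Placing optimize.tar asks TarPM to start optimizing.
    touch "$data_dir/optimize.tar" || return 1

    waited=0
    # TarPM renames optimize.tar to optimizeNow.tar while it works and
    # deletes optimizeNow.tar when it is done.
    while [ -e "$data_dir/optimize.tar" ] || [ -e "$data_dir/optimizeNow.tar" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "optimization still running after ${timeout}s" >&2
            return 2
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "optimization finished"
}
```

A cron entry could source this file and call the function with the workspace data directory during a period of low system usage. The timeout only stops the waiting; it does not cancel the optimization itself.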
TarPM
Hot Backup
Reminder: tar files are append-only
Backup is possible at any time, including at runtime
Place a file named stopdelete.tar to prevent the TarPM from deleting old files while backing up
For a consistent backup, copy the data_*.tar files sorted by modification date, newest last
e.g. ls -tr data_*.tar | xargs -n1 -J % cp -v % /backup
(-J is a BSD xargs option; with GNU xargs, use xargs -I % cp -v % /backup)
When restoring, incomplete transactions are rolled back
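The backup steps above can be sketched as a small shell function. This is an illustration under stated assumptions rather than an official procedure: the function name is made up, stopdelete.tar is assumed to belong in the same directory as the data files, and removing it again after the copy is assumed to be what re-enables deletion of old files.

```shell
#!/bin/sh
# Sketch: hot backup of TarPM data files, oldest first.
# Function name and the location of stopdelete.tar are assumptions.

hot_backup_tar_files() {
    data_dir=$1      # directory that holds the data_*.tar files
    backup_dir=$2    # destination directory for the copies

    mkdir -p "$backup_dir" || return 1

    # Ask TarPM not to delete old tar files while the copy is running.
    touch "$data_dir/stopdelete.tar" || return 1

    status=0
    # ls -tr lists the oldest file first, so the newest is copied last.
    # Word splitting is safe here because data_*.tar names contain no spaces.
    for name in $(cd "$data_dir" && ls -tr data_*.tar); do
        cp -p "$data_dir/$name" "$backup_dir/$name" || { status=1; break; }
    done

    # Assumption: removing the marker lets TarPM delete old files again.
    rm -f "$data_dir/stopdelete.tar"
    return "$status"
}
```

Copying file by file in a loop avoids the BSD-specific xargs option, so the same sketch should work with both BSD and GNU userlands.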
TarPM
Migration
Migration of workspaces using CRX Console
Low-level copy of existing workspace to new TarPM workspace
Tool is provided with CRX
Comprehensive documentation on docs.day.com
see section CQ 4.2 / Setup / Migration
Part of migration presentation from Tech Summit 2007
http://daycare.day.com/home/day_public/tech_summit_2007.html
Author: Dominique Jaeggi, SSE, Day
TarPM Clustering
Architecture
Master/slave relationship between the CRX nodes participating in a cluster
Consists of 2 or more CRX cluster nodes with TarPM
Synchronization via file-based Cluster Journal
Direct communication between cluster nodes via HTTP over TCP/IP
Only Master CRX node writes data
Master is elected automatically
Automatic fail-over
TarPM Clustering
Architecture - Overview
[Diagram: Cluster Node A (Master) and Cluster Node B, each running CRX on a TAR PM. Labels: "Holds Lock", "Journal posts to master", and "master write" / "read" on the connections to the shared Master Data TAR and the Cluster Journal (FS).]
TarPM Clustering
Global Data Store
Central storage for binary data, even beyond repository boundaries
Only one copy per unique object is kept
Storing and reading does not block other users; both happen outside the Persistence Manager
Objects in the Data Store are immutable
The Persistence Manager stores only the unique data identifiers of objects that exist in the Data Store
Transactional semantics guaranteed
Hot Backup by simply copying all files :)
TarPM Clustering
Global Data Store - Configuration
Configured in repository.xml of CRX
e.g. <crx_home>/server/runtime/0/_crx/WEB-INF/repository.xml
File-based or database-backed
org.apache.jackrabbit.core.data.FileDataStore
org.apache.jackrabbit.core.data.db.DbDataStore
TarPM Clustering
FileDataStore - Config Parameters
Parameter | Description | Default
path | The directory where to store binary objects. | repository.home/repository/datastore
minRecordLength | Binary objects bigger than this value (in bytes) are stored in the Data Store. | 100
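For orientation, a file-based Data Store entry in repository.xml might look like the sketch below. It follows the Apache Jackrabbit convention of a DataStore element inside the Repository element. The values repeat the defaults from the table, and writing the default path with the ${rep.home} variable is an assumption.

```xml
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
    <!-- Assumption: ${rep.home} corresponds to repository.home in the table -->
    <param name="path" value="${rep.home}/repository/datastore" />
    <!-- Binaries larger than 100 bytes go to the Data Store -->
    <param name="minRecordLength" value="100" />
</DataStore>
```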
TarPM Clustering
DbDataStore - Config Parameters
Parameter | Description | Default
url | The database URL used to access the database. | -
user | Name of the database user. | -
password | Password of the user. | -
minRecordLength | Binary objects bigger than this value (in bytes) are stored in the Data Store. | 100
maxConnections | The maximum number of open connections. | 3
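A database-backed Data Store entry using only the parameters listed above could be sketched as follows. The JDBC URL, user name and password are placeholders invented for this example, and depending on the database further parameters (such as the JDBC driver class) may be required.

```xml
<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <!-- Placeholder connection settings; replace with real values -->
    <param name="url" value="jdbc:postgresql://dbhost/crx" />
    <param name="user" value="crx" />
    <param name="password" value="secret" />
    <!-- Defaults from the table, written out explicitly -->
    <param name="minRecordLength" value="100" />
    <param name="maxConnections" value="3" />
</DataStore>
```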
TarPM Clustering
Architecture
[Diagram: Cluster Node A (Master) and Cluster Node B, each running CRX on a TAR PM. Labels: "Holds Lock", "Journal posts to master", and "master write" / "read" on the connections to the shared Master Data TAR, the Cluster Journal (FS) and the Global Data Store (FS).]
TarPM Clustering
Setup
Install CRX
Configure clustering in repository.xml
Configure TarPM to run in cluster mode
Set up additional CRX cluster nodes by copying the complete instance
Delete the repository-local revision counter, if present
On startup, a CRX cluster node syncs up with the master data based on the journal
TarPM Clustering
Repository Configuration
Enable clustering on a repository-wide level
e.g. <crx_home>/runtime/0/_crx/WEB-INF/repository.xml
Unique cluster id
Cluster Journal
<Cluster id="cluster-node-1" syncDelay="1">
    <Journal class="org.apache.jackrabbit.core.journal.FileJournal">
        <param name="revision" value="${rep.home}/revision.log" />
        <param name="directory" value="/data/shared/journal" />
    </Journal>
</Cluster>
TarPM Clustering
Repository Configuration - Parameters
Cluster
Parameter | Description
id | This is required to be a unique literal id of the cluster node.
syncDelay | Delay in milliseconds before changes to the journal are automatically detected. Default: 5000
Journal
Parameter | Description
class | FQN of the org.apache.jackrabbit.core.journal.Journal interface implementation.
revision | Location and filename of the repository-local revision counter.
directory | Shared directory of journal entries and the global revision counter.
TarPM Clustering
Workspace Configuration
enable clustering on the TarPM
e.g. <crx_home>/crx/workspaces/crx.default/workspace.xml
set cluster flag
configure local and shared paths
<PersistenceManager class="com.day.crx.persistence.tar.TarPersistenceManager">
    <param name="cluster" value="true" />
    <param name="localPath" value="${wsp.home}" />
    <param name="sharedPath" value="/data/shared" />
</PersistenceManager>
TarPM Clustering
TarPM Configuration - Parameters
Parameter | Description | Default
cluster | Enables clustering. | FALSE
localPath | Path where to store local tar files and index files. | workspace.home
sharedPath | Path where to store shared data, i.e. tar files. | workspace.home