SlideShare una empresa de Scribd logo
1 de 25
Content Storage with Apache
Jackrabbit
2009-03-25, Jukka Zitting
Introducing JCR
Content Repository for Java™ Technology API
-> Version 1.0 defined in JSR 170, final in 2006
-> Version 2.0 defined in JSR 283, final (hopefully) in 2009
-> Open source RI and TCK developed in Apache Jackrabbit
Flexible, hierarchically ordered content store with features like
full text search, versioning, transactions, observation, etc.
Combines and extends features often found in file systems and
databases; a "best of both worlds" approach.
Introducing Apache Jackrabbit
Entered Apache Incubator
in 2004, graduated in 2006,
now a TLP with 24
committers
Version 1.0 (and JCR RI) in
2006, currently at 1.5.3.
Version 2.0 (and JCR 2.0 RI)
planned for 2009.
Also: JCR Commons, Sling
Components:
jackrabbit-api
jackrabbit-core
jackrabbit-standalone
jackrabbit-webapp
jackrabbit-jca
jackrabbit-ocm
jackrabbit-jcr-rmi
jackrabbit-webdav
jackrabbit-jcr-server
etc.
Structure of a content repository
Consists of one or more workspaces, one of which is the
default workspace.
A workspace consists of a tree of nodes, each of which can
have zero or more child nodes. Each workspace has a single
root node.
Nodes have properties. Properties are typed (string,
integer, date,  binary, reference, etc.) and can be either
single- or multivalued.
Structure, cont.
Each node or property has a name, and can be accessed
using a path. Names are namespaced.
A referenceable node has a UUID, through which it can be
accessed or referenced. Referential integrity is guaranteed.
Each node has a primary node type and zero or more mixin
types. Node types define the structure of a node. A node
can also be unstructured.
Content repository features
Read/write
Search (XPath and SQL)
XML import/export
Observation
Versioning
Node types
Locking
Access control
Atomic changes
XA transactions
Getting started with Jackrabbit
Download and run the standalone server:
java -jar jackrabbit-standalone-1.5.3.jar
-> Web interface at http://localhost:8080/
-> WebDAV at http://localhost:8080/repository/default
-> JCR access over RMI at http://localhost:8080/rmi
-> Repository data in ./jackrabbit
-> Configuration in ./jackrabbit/repository.xml
If you have a servlet container, use the webapp
If you have a J2EE application server, use jca
Remote access
JCR-RMI layer available
since Jackrabbit 0.9.
Good functional coverage,
not so good performance.
-> administrative tools
Look at clustering for
better performance.
Jackrabbit 1.6: spi2dav
o.a.j.rmi.repository:
new URLRemoteRepository(
    "http://.../rmi");
new RMIRemoteRepository(
    "//.../repository");
Classpath:
jcr-1.0.jar
jackrabbit-jcr-rmi-1.5.3.jar
jackrabbit-api-1.5.3.jar
(also in rmiregistry!)
Jackrabbit as a shared resource
JCA adapter in an
application server
-> accessed through JNDI
Jackrabbit webapp in a
servlet container
-> JNDI or servlet context
-> JNDI: complex setup
-> cross-context access
JNDI configuration with
jackrabbit-servlet:
<servlet>
  <servlet-name>Repository</servlet-name>
  <servlet-class>
  org.apache.jackrabbit.servlet.JNDIRepositoryServlet
  </servlet-class>
  <init-param>
    <param-name>location</param-name>
    <param-value<javax/jcr/Repository</param-value>
  </init-param>
</servlet>
Accessing the repository:
public class MyServlet extends HttpServlet {
    private final Repository repository =
        new ServletRepository(this);
}
Jackrabbit in embedded mode
RepositoryConfig config = RepositoryConfig.create(
      "/path/to/repository", "/path/to/repository.xml");
RepositoryImpl repository = RepositoryImpl.create(config);
try {
    // Use the javax.jcr interfaces to access the repository
} finally {
    repository.shutdown();
}
Embedded mode, cont.
Only a single instance can be
running at a time.
For concurrent access:
- multiple sessions
- RMI for remote access
- clustering
Also: TransientRepository
Maven coordinates
<dependency>
  <groupId>org.apache.jackrabbit</groupId>
  <artifactId>jackrabbit-core</artifactId>
  <version>1.5.3</version>
  <exclusions>
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.5.3</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>jcl-over-slf4j</artifactId>
  <version>1.5.3</version>
</dependency>
Logging - what's this SLF4J thing?
Jackrabbit uses the SLF4J
logging facade for logging.
Benefits: Great for
embedded uses, can adapt
to 
Drawbacks: What do I put
in my classpath?
SLF4J in practice
SLF4J API is automatically
included as a dependency of
jackrabbit-core.
You need to explicitly add
the SLF4J implementation.
Jackrabit webapp, jca and
standalone use slf4j-log4j,
so you can use normal log4j
configuration.
Classpath with log4j:
slf4j-api-1.5.3.jar
slf4j-log4j-1.5.3.jar
log4j-1.2.14.jar
log4j.properties
Classpath with no logging:
slf4j-api-1.5.3.jar
slf4j-nop-1.5.3.jar
Repository configuration
Configuration in a repository.xml file. Default configuration
shipped with Jackrabbit. Structure defined in a DTD.
Contains global settings and a workspace configuration
template. The template is instantiated to a workspace.xml
configuration file for each new workspace.
Main elements: clustering, workspace, versioning, security,
persistence, search index, data store, file system
Persistence managers
Use one of the bundle
persistence managers.
-> select one for your db
Other persistence managers
mostly for backwards
compatibility.
Default based on embedded
Apache Derby.
Configuration:
driver,url,user,password
schema
schemaObjectPrefix
minBlobSize
bundleCacheSize
consistencyCheck/Fix
blockOnConnectionLoss
Needs CREATE TABLE
permissions!
Search index
Per-workspace search
indexes based on Apache
Lucene.
By default everything is
indexed. Use index
configuration to customize.
Clustering: Each cluster
node maintains local search
indexes.
Configuration:
min/maxMergeDocs
mergeFactor
analyzer
textFilterClasses
respectDocumentOrder
resultFetchSize
Performance:
property, type, ft -> fast
path -> slow
Text extraction
Full text indexing of file
contents based on various
parser libraries (POI,
PDFBox, etc.).
Currently only for the
jcr:data property with
correct jcr:mimeType.
Jackrabbit 2.0: indexing of
all binary properties.
textFilterClasses:
PlainTextExtractor
MsWordTextExtractor
MsExcelTextExtractor
MsPowerPointTextExtractor
PdfTextExtractor
OpenOfficeTextExtractor
RTFTextExtractor
HTMLTextExtractor
XMLTextExtractor
Data store - dealing with lots of data
Data store feature available
since Jackrabbit 1.4
Content-addressed storage
of large binary properties.
Completely transparent to
client applications.
Uses garbage collection to
remove unused data.
Implementations:
FileDataStore
DbDataStore
sandbox: S3DataStore
Garbage collection:
gc = si.createDSGC();
gc.scan();
gc.stopScan();
gc.deleteUnused();
Content modeling: David's model
1. Data First. Structure Later. Maybe.
2.Drive the content hierarchy, don't let it happen.
3.Workspaces are for clone(), merge() and update()
4.Beware of Same Name Siblings
5.References considered harmful
6.Files are Files are Files
7.ID's are evil
Content modeling: Example
CREATE TABLE author (
    id INTEGER,
    name VARCHAR,
);
CREATE TABLE post (
    id INTEGER,
    author INTEGER,
    posted DATETIME,
    title VARCHAR,
    body TEXT
);
/blog
  /jukka
    @name = Jukka Zitting
    /2009
      /03
        /25
          /hello
            @author = jukka
            @posted
            @title
            @body
Example, cont.
CREATE TABLE  comment (
    id INTEGER,
    post INTEGER,
    title VARCHAR,
    body TEXT
);
CREATE TABLE media (
    id INTEGER,
    post INTEGER,
    data BLOB,
    caption VARCHAR
);
/hello
  /comments
    /salut
      @title = Salut!
      @body
  /media [nt:folder]
    /image.jpg [nt:file]
    /code [nt:folder]
      /Example.java [nt:file]
Example, cont.
Versioning:
/hello [mix:versionable]
  @jcr:versionHistory
  @jcr:baseVersion
node.checkout();
// ...
node.save();
node.checkin();
Locking:
/hello [mix:lockable]
node.lock();
// ...
node.unlock();
Common issues: Content hierarchy
Jackrabbit doesn't support very flat content hierarchies.
You'll start seeing problems when you put more than 10k
child nodes under a single parent.
Solution: Add more depth to your hierarchy. Divide entries
by date, category, author, etc. If nothing else, use the first
letter(s) of a title, a content hash, or even a random number
to distribute the nodes.
Note: You can still access the entries as a single flat set
through search. The hierarchy is for browsing.
Common issues: Concurrent edits
Three ways to handle concurrent edits:
1. Merge changes
2. Fail conflicting changes 
3. Block concurrent changes
Jackrabbit does 1 by default, and falls back to 2 when
merge fails. You can explicitly opt for 3 by using the JCR
locking feature.
Estimate: How often conflicts would happen? Will the
benefits of locking be worth the overhead.
Questions?
 

Más contenido relacionado

La actualidad más candente

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMike Dirolf
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producerconfluent
 
OSGi, Scripting and REST, Building Webapps With Apache Sling
OSGi, Scripting and REST, Building Webapps With Apache SlingOSGi, Scripting and REST, Building Webapps With Apache Sling
OSGi, Scripting and REST, Building Webapps With Apache SlingCarsten Ziegeler
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Sling models by Justin Edelson
Sling models by Justin Edelson Sling models by Justin Edelson
Sling models by Justin Edelson AEM HUB
 
Alfresco DevCon 2019 Performance Tools of the Trade
Alfresco DevCon 2019   Performance Tools of the TradeAlfresco DevCon 2019   Performance Tools of the Trade
Alfresco DevCon 2019 Performance Tools of the TradeLuis Colorado
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger InternalsNorberto Leite
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisArnab Mitra
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDBMongoDB
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in AlfrescoAngel Borroy López
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
Alfresco search services: Now and Then
Alfresco search services: Now and ThenAlfresco search services: Now and Then
Alfresco search services: Now and ThenAngel Borroy López
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpHostedbyConfluent
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
Oracle WebLogic Diagnostics & Perfomance tuning
Oracle WebLogic Diagnostics & Perfomance tuningOracle WebLogic Diagnostics & Perfomance tuning
Oracle WebLogic Diagnostics & Perfomance tuningMichel Schildmeijer
 

La actualidad más candente (20)

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
OSGi, Scripting and REST, Building Webapps With Apache Sling
OSGi, Scripting and REST, Building Webapps With Apache SlingOSGi, Scripting and REST, Building Webapps With Apache Sling
OSGi, Scripting and REST, Building Webapps With Apache Sling
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Sling models by Justin Edelson
Sling models by Justin Edelson Sling models by Justin Edelson
Sling models by Justin Edelson
 
Alfresco DevCon 2019 Performance Tools of the Trade
Alfresco DevCon 2019   Performance Tools of the TradeAlfresco DevCon 2019   Performance Tools of the Trade
Alfresco DevCon 2019 Performance Tools of the Trade
 
Tomcat Server
Tomcat ServerTomcat Server
Tomcat Server
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger Internals
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Alfresco search services: Now and Then
Alfresco search services: Now and ThenAlfresco search services: Now and Then
Alfresco search services: Now and Then
 
Alfresco Certificates
Alfresco Certificates Alfresco Certificates
Alfresco Certificates
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
 
Spring Boot
Spring BootSpring Boot
Spring Boot
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
Oracle WebLogic Diagnostics & Perfomance tuning
Oracle WebLogic Diagnostics & Perfomance tuningOracle WebLogic Diagnostics & Perfomance tuning
Oracle WebLogic Diagnostics & Perfomance tuning
 

Destacado

The new repository in AEM 6
The new repository in AEM 6The new repository in AEM 6
The new repository in AEM 6Jukka Zitting
 
The architecture of oak
The architecture of oakThe architecture of oak
The architecture of oakMichael Dürig
 
The Zero Bullshit Architecture
The Zero Bullshit ArchitectureThe Zero Bullshit Architecture
The Zero Bullshit ArchitectureLars Trieloff
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache TikaJukka Zitting
 
Open source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorOpen source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorJukka Zitting
 
File System On Steroids
File System On SteroidsFile System On Steroids
File System On SteroidsJukka Zitting
 
Apache development with GitHub and Travis CI
Apache development with GitHub and Travis CIApache development with GitHub and Travis CI
Apache development with GitHub and Travis CIJukka Zitting
 
Java Content Repository avec Jackrabbit
Java Content Repository avec JackrabbitJava Content Repository avec Jackrabbit
Java Content Repository avec JackrabbitEmmanuel Hugonnet
 
Hosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRHosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRWoonsan Ko
 
Rapid JCR Applications Development with Sling
Rapid JCR Applications Development with SlingRapid JCR Applications Development with Sling
Rapid JCR Applications Development with SlingFelix Meschberger
 
恐るべきApache, Web勉強会@福岡
恐るべきApache, Web勉強会@福岡恐るべきApache, Web勉強会@福岡
恐るべきApache, Web勉強会@福岡Aya Komuro
 
Building Content Applications with JCR and OSGi
Building Content Applications with JCR and OSGiBuilding Content Applications with JCR and OSGi
Building Content Applications with JCR and OSGiCédric Hüsler
 
Demystifying Oak Search
Demystifying Oak SearchDemystifying Oak Search
Demystifying Oak SearchJustin Edelson
 
Into the TarPit: A TarMK Deep Dive
Into the TarPit: A TarMK Deep DiveInto the TarPit: A TarMK Deep Dive
Into the TarPit: A TarMK Deep DiveMichael Dürig
 
Shooting rabbits with sling
Shooting rabbits with slingShooting rabbits with sling
Shooting rabbits with slingTomasz Rękawek
 
Content Management Standards
Content Management StandardsContent Management Standards
Content Management StandardsDavid Nuescheler
 
Introducing Apricot, The Eclipse Content Management Platform
Introducing Apricot, The Eclipse Content Management PlatformIntroducing Apricot, The Eclipse Content Management Platform
Introducing Apricot, The Eclipse Content Management PlatformNuxeo
 

Destacado (20)

The new repository in AEM 6
The new repository in AEM 6The new repository in AEM 6
The new repository in AEM 6
 
The architecture of oak
The architecture of oakThe architecture of oak
The architecture of oak
 
The Zero Bullshit Architecture
The Zero Bullshit ArchitectureThe Zero Bullshit Architecture
The Zero Bullshit Architecture
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache Tika
 
Open source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorOpen source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache Incubator
 
File System On Steroids
File System On SteroidsFile System On Steroids
File System On Steroids
 
Apache development with GitHub and Travis CI
Apache development with GitHub and Travis CIApache development with GitHub and Travis CI
Apache development with GitHub and Travis CI
 
Java Content Repository avec Jackrabbit
Java Content Repository avec JackrabbitJava Content Repository avec Jackrabbit
Java Content Repository avec Jackrabbit
 
Hosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRHosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCR
 
Apache Jackrabbit
Apache JackrabbitApache Jackrabbit
Apache Jackrabbit
 
Rapid JCR Applications Development with Sling
Rapid JCR Applications Development with SlingRapid JCR Applications Development with Sling
Rapid JCR Applications Development with Sling
 
恐るべきApache, Web勉強会@福岡
恐るべきApache, Web勉強会@福岡恐るべきApache, Web勉強会@福岡
恐るべきApache, Web勉強会@福岡
 
Apache Hive 紹介
Apache Hive 紹介Apache Hive 紹介
Apache Hive 紹介
 
Building Content Applications with JCR and OSGi
Building Content Applications with JCR and OSGiBuilding Content Applications with JCR and OSGi
Building Content Applications with JCR and OSGi
 
Demystifying Oak Search
Demystifying Oak SearchDemystifying Oak Search
Demystifying Oak Search
 
Into the TarPit: A TarMK Deep Dive
Into the TarPit: A TarMK Deep DiveInto the TarPit: A TarMK Deep Dive
Into the TarPit: A TarMK Deep Dive
 
Shooting rabbits with sling
Shooting rabbits with slingShooting rabbits with sling
Shooting rabbits with sling
 
Content Management Standards
Content Management StandardsContent Management Standards
Content Management Standards
 
Introducing Apricot, The Eclipse Content Management Platform
Introducing Apricot, The Eclipse Content Management PlatformIntroducing Apricot, The Eclipse Content Management Platform
Introducing Apricot, The Eclipse Content Management Platform
 
2008-12 OJUG JCR Demo
2008-12 OJUG JCR Demo2008-12 OJUG JCR Demo
2008-12 OJUG JCR Demo
 

Similar a Content Storage With Apache Jackrabbit

Jackrabbit setup configuration
Jackrabbit setup configurationJackrabbit setup configuration
Jackrabbit setup configurationprabakaranbrick
 
Introduction to Apache Tomcat 7 Presentation
Introduction to Apache Tomcat 7 PresentationIntroduction to Apache Tomcat 7 Presentation
Introduction to Apache Tomcat 7 PresentationTomcat Expert
 
Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018CodeOps Technologies LLP
 
香港六合彩
香港六合彩香港六合彩
香港六合彩iewsxc
 
Rollin onj Rubyv3
Rollin onj Rubyv3Rollin onj Rubyv3
Rollin onj Rubyv3Oracle
 
Dynamic Languages Web Frameworks Indicthreads 2009
Dynamic Languages Web Frameworks Indicthreads 2009Dynamic Languages Web Frameworks Indicthreads 2009
Dynamic Languages Web Frameworks Indicthreads 2009Arun Gupta
 
The Java EE 6 platform
The Java EE 6 platformThe Java EE 6 platform
The Java EE 6 platformLorraine JUG
 
The Java Content Repository
The Java Content RepositoryThe Java Content Repository
The Java Content Repositorynobby
 
How to start using Scala
How to start using ScalaHow to start using Scala
How to start using ScalaNgoc Dao
 

Similar a Content Storage With Apache Jackrabbit (20)

Jackrabbit setup configuration
Jackrabbit setup configurationJackrabbit setup configuration
Jackrabbit setup configuration
 
Introduction to Apache Tomcat 7 Presentation
Introduction to Apache Tomcat 7 PresentationIntroduction to Apache Tomcat 7 Presentation
Introduction to Apache Tomcat 7 Presentation
 
GlassFish and JavaEE, Today and Future
GlassFish and JavaEE, Today and FutureGlassFish and JavaEE, Today and Future
GlassFish and JavaEE, Today and Future
 
Jira Rev002
Jira Rev002Jira Rev002
Jira Rev002
 
Hibernate
HibernateHibernate
Hibernate
 
bjhbj
bjhbjbjhbj
bjhbj
 
Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018Java is Container Ready - Vaibhav - Container Conference 2018
Java is Container Ready - Vaibhav - Container Conference 2018
 
香港六合彩
香港六合彩香港六合彩
香港六合彩
 
Java se7 features
Java se7 featuresJava se7 features
Java se7 features
 
Arquillian in a nutshell
Arquillian in a nutshellArquillian in a nutshell
Arquillian in a nutshell
 
Hibernate java and_oracle
Hibernate java and_oracleHibernate java and_oracle
Hibernate java and_oracle
 
.NET RDF APIs
.NET RDF APIs.NET RDF APIs
.NET RDF APIs
 
Rollin onj Rubyv3
Rollin onj Rubyv3Rollin onj Rubyv3
Rollin onj Rubyv3
 
Dynamic Languages Web Frameworks Indicthreads 2009
Dynamic Languages Web Frameworks Indicthreads 2009Dynamic Languages Web Frameworks Indicthreads 2009
Dynamic Languages Web Frameworks Indicthreads 2009
 
The Java EE 6 platform
The Java EE 6 platformThe Java EE 6 platform
The Java EE 6 platform
 
Java Cloud and Container Ready
Java Cloud and Container ReadyJava Cloud and Container Ready
Java Cloud and Container Ready
 
The Java Content Repository
The Java Content RepositoryThe Java Content Repository
The Java Content Repository
 
How to start using Scala
How to start using ScalaHow to start using Scala
How to start using Scala
 
Jsp and jstl
Jsp and jstlJsp and jstl
Jsp and jstl
 
Practical JRuby
Practical JRubyPractical JRuby
Practical JRuby
 

Más de Jukka Zitting

MicroKernel & NodeStore
MicroKernel & NodeStoreMicroKernel & NodeStore
MicroKernel & NodeStoreJukka Zitting
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tikaJukka Zitting
 
Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Jukka Zitting
 
OSGifying the repository
OSGifying the repositoryOSGifying the repository
OSGifying the repositoryJukka Zitting
 
Repository performance tuning
Repository performance tuningRepository performance tuning
Repository performance tuningJukka Zitting
 
The return of the hierarchical model
The return of the hierarchical modelThe return of the hierarchical model
The return of the hierarchical modelJukka Zitting
 
Text and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaText and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaJukka Zitting
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache TikaJukka Zitting
 
Design and architecture of Jackrabbit
Design and architecture of JackrabbitDesign and architecture of Jackrabbit
Design and architecture of JackrabbitJukka Zitting
 

Más de Jukka Zitting (11)

MicroKernel & NodeStore
MicroKernel & NodeStoreMicroKernel & NodeStore
MicroKernel & NodeStore
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tika
 
Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011
 
OSGifying the repository
OSGifying the repositoryOSGifying the repository
OSGifying the repository
 
Repository performance tuning
Repository performance tuningRepository performance tuning
Repository performance tuning
 
The return of the hierarchical model
The return of the hierarchical modelThe return of the hierarchical model
The return of the hierarchical model
 
Text and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaText and metadata extraction with Apache Tika
Text and metadata extraction with Apache Tika
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache Tika
 
NoSQL Oakland
NoSQL OaklandNoSQL Oakland
NoSQL Oakland
 
Design and architecture of Jackrabbit
Design and architecture of JackrabbitDesign and architecture of Jackrabbit
Design and architecture of Jackrabbit
 
Apache Tika
Apache TikaApache Tika
Apache Tika
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Último (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Content Storage With Apache Jackrabbit

  • 2. Introducing JCR Content Repository for Java™ Technology API -> Version 1.0 defined in JSR 170, final in 2006 -> Version 2.0 defined in JSR 283, final (hopefully) in 2009 -> Open source RI and TCK developed in Apache Jackrabbit Flexible, hierarchically ordered content store with features like full text search, versioning, transactions, observation, etc. Combines and extends features often found in file systems and databases; a "best of both worlds" approach.
  • 3. Introducing Apache Jackrabbit Entered Apache Incubator in 2004, graduated in 2006, now a TLP with 24 committers Version 1.0 (and JCR RI) in 2006, currently at 1.5.3. Version 2.0 (and JCR 2.0 RI) planned for 2009. Also: JCR Commons, Sling Components: jackrabbit-api jackrabbit-core jackrabbit-standalone jackrabbit-webapp jackrabbit-jca jackrabbit-ocm jackrabbit-jcr-rmi jackrabbit-webdav jackrabbit-jcr-server etc.
  • 4. Structure of a content repository Consists of one or more workspaces, one of which is the default workspace. A workspace consists of a tree of nodes, each of which can have zero or more child nodes. Each workspace has a single root node. Nodes have properties. Properties are typed (string, integer, date,  binary, reference, etc.) and can be either single- or multivalued.
  • 5. Structure, cont. Each node or property has a name, and can be accessed using a path. Names are namespaced. A referenceable node has a UUID, through which it can be accessed or referenced. Referential integrity is guaranteed. Each node has a primary node type and zero or more mixin types. Node types define the structure of a node. A node can also be unstructured.
  • 6. Content repository features Read/write Search (XPath and SQL) XML import/export Observation Versioning Node types Locking Access control Atomic changes XA transactions
  • 7. Getting started with Jackrabbit Download and run the standalone server: java -jar jackrabbit-standalone-1.5.3.jar -> Web interface at http://localhost:8080/ -> WebDAV at http://localhost:8080/repository/default -> JCR access over RMI at http://localhost:8080/rmi -> Repository data in ./jackrabbit -> Configuration in ./jackrabbit/repository.xml If you have a servlet container, use the webapp If you have a J2EE application server, use jca
  • 8. Remote access JCR-RMI layer available since Jackrabbit 0.9. Good functional coverage, not so good performance. -> administrative tools Look at clustering for better performance. Jackrabbit 1.6: spi2dav o.a.j.rmi.repository: new URLRemoteRepository(     "http://.../rmi"); new RMIRemoteRepository(     "//.../repository"); Classpath: jcr-1.0.jar jackrabbit-jcr-rmi-1.5.3.jar jackrabbit-api-1.5.3.jar (also in rmiregistry!)
  • 9. Jackrabbit as a shared resource JCA adapter in an application server -> accessed through JNDI Jackrabbit webapp in a servlet container -> JNDI or servlet context -> JNDI: complex setup -> cross-context access JNDI configuration with jackrabbit-servlet: <servlet>   <servlet-name>Repository</servlet-name>   <servlet-class>   org.apache.jackrabbit.servlet.JNDIRepositoryServlet   </servlet-class>   <init-param>     <param-name>location</param-name>     <param-value<javax/jcr/Repository</param-value>   </init-param> </servlet> Accessing the repository: public class MyServlet extends HttpServlet {     private final Repository repository =         new ServletRepository(this); }
  • 10. Jackrabbit in embedded mode RepositoryConfig config = RepositoryConfig.create(       "/path/to/repository", "/path/to/repository.xml"); RepositoryImpl repository = RepositoryImpl.create(config); try {     // Use the javax.jcr interfaces to access the repository } finally {     repository.shutdown(); }
  • 11. Embedded mode, cont. Only a single instance can be running at a time. For concurrent access: - multiple sessions - RMI for remote access - clustering Also: TransientRepository Maven coordinates <dependency>   <groupId>org.apache.jackrabbit</groupId>   <artifactId>jackrabbit-core</artifactId>   <version>1.5.3</version>   <exclusions>     <exclusion>       <groupId>commons-logging</groupId>       <artifactId>commons-logging</artifactId>     </exclusion>   </exclusions> </dependency> <dependency>   <groupId>org.slf4j</groupId>   <artifactId>slf4j-log4j12</artifactId>   <version>1.5.3</version> </dependency> <dependency>   <groupId>org.slf4j</groupId>   <artifactId>jcl-over-slf4j</artifactId>   <version>1.5.3</version> </dependency>
  • 12. Logging - what's this SLF4J thing? Jackrabbit uses the SLF4J logging facade for logging. Benefits: Great for embedded uses, can adapt to  Drawbacks: What do I put in my classpath?
  • 13. SLF4J in practice SLF4J API is automatically included as a dependency of jackrabbit-core. You need to explicitly add the SLF4J implementation. Jackrabit webapp, jca and standalone use slf4j-log4j, so you can use normal log4j configuration. Classpath with log4j: slf4j-api-1.5.3.jar slf4j-log4j-1.5.3.jar log4j-1.2.14.jar log4j.properties Classpath with no logging: slf4j-api-1.5.3.jar slf4j-nop-1.5.3.jar
  • 14. Repository configuration Configuration in a repository.xml file. Default configuration shipped with Jackrabbit. Structure defined in a DTD. Contains global settings and a workspace configuration template. The template is instantiated to a workspace.xml configuration file for each new workspace. Main elements: clustering, workspace, versioning, security, persistence, search index, data store, file system
  • 15. Persistence managers Use one of the bundle persistence managers. -> select one for your db Other persistence managers mostly for backwards compatibility. Default based on embedded Apache Derby. Configuration: driver,url,user,password schema schemaObjectPrefix minBlobSize bundleCacheSize consistencyCheck/Fix blockOnConnectionLoss Needs CREATE TABLE permissions!
  • 16. Search index Per-workspace search indexes based on Apache Lucene. By default everything is indexed. Use index configuration to customize. Clustering: Each cluster node maintains local search indexes. Configuration: min/maxMergeDocs mergeFactor analyzer textFilterClasses respectDocumentOrder resultFetchSize Performance: property, type, ft -> fast path -> slow
  • 17. Text extraction Full text indexing of file contents based on various parser libraries (POI, PDFBox, etc.). Currently only for the jcr:data property with correct jcr:mimeType. Jackrabbit 2.0: indexing of all binary properties. textFilterClasses: PlainTextExtractor MsWordTextExtractor MsExcelTextExtractor MsPowerPointTextExtractor PdfTextExtractor OpenOfficeTextExtractor RTFTextExtractor HTMLTextExtractor XMLTextExtractor
  • 18. Data store - dealing with lots of data Data store feature available since Jackrabbit 1.4 Content-addressed storage of large binary properties. Completely transparent to client applications. Uses garbage collection to remove unused data. Implementations: FileDataStore DbDataStore sandbox: S3DataStore Garbage collection: gc = si.createDSGC(); gc.scan(); gc.stopScan(); gc.deleteUnused();
  • 19. Content modeling: David's model 1. Data First. Structure Later. Maybe. 2.Drive the content hierarchy, don't let it happen. 3.Workspaces are for clone(), merge() and update() 4.Beware of Same Name Siblings 5.References considered harmful 6.Files are Files are Files 7.ID's are evil
  • 20. Content modeling: Example CREATE TABLE author (     id INTEGER,     name VARCHAR, ); CREATE TABLE post (     id INTEGER,     author INTEGER,     posted DATETIME,     title VARCHAR,     body TEXT ); /blog   /jukka     @name = Jukka Zitting     /2009       /03         /25           /hello             @author = jukka             @posted             @title             @body
  • 21. Example, cont. CREATE TABLE  comment (     id INTEGER,     post INTEGER,     title VARCHAR,     body TEXT ); CREATE TABLE media (     id INTEGER,     post INTEGER,     data BLOB,     caption VARCHAR ); /hello   /comments     /salut       @title = Salut!       @body   /media [nt:folder]     /image.jpg [nt:file]     /code [nt:folder]       /Example.java [nt:file]
  • 22. Example, cont. Versioning: /hello [mix:versionable]   @jcr:versionHistory   @jcr:baseVersion node.checkout(); // ... node.save(); node.checkin(); Locking: /hello [mix:lockable] node.lock(); // ... node.unlock();
  • 23. Common issues: Content hierarchy Jackrabbit doesn't support very flat content hierarchies. You'll start seeing problems when you put more than 10k child nodes under a single parent. Solution: Add more depth to your hierarchy. Divide entries by date, category, author, etc. If nothing else, use the first letter(s) of a title, a content hash, or even a random number to distribute the nodes. Note: You can still access the entries as a single flat set through search. The hierarchy is for browsing.
  • 24. Common issues: Concurrent edits Three ways to handle concurrent edits: 1. Merge changes 2. Fail conflicting changes  3. Block concurrent changes Jackrabbit does 1 by default, and falls back to 2 when merge fails. You can explicitly opt for 3 by using the JCR locking feature. Estimate: How often conflicts would happen? Will the benefits of locking be worth the overhead.