Introducing Hibernate OGM: porting JPA applications to NoSQL, Sanne Grinovero (JBoss by RedHat)

OpenBlend Ljubljana
September 15th, 2011

Sanne Grinovero
Software Engineer at Red Hat

About me
• Hibernate
• Hibernate Search
• Hibernate OGM
• Infinispan
  • Lucene Directory
  • Infinispan Query
Blog: in.relation.to/Bloggers/Sanne
Twitter: @SanneGrinovero

What is Hibernate OGM?

JPA for NoSQL
• initially Key/Value store
• in particular Infinispan

Relational Databases
• Transactions
• Referential integrity
• Simple Types
• Well understood
  - tuning, backup, resilience

Relational Databases
But scaling is hard!
- Replication
- Multiple instances with a shared disk
- Sharding

Relational Databases on a cloud
Master/replicas: which master?

A single master? I was promised elasticity

Less reliable “disks”

IP in configuration files? DNS update times?

Who coordinates this? How does that failover?

¬SQL
Being a “not only that one”
basically makes it a definition of
“everything else too”:
a “no-category”

NoSQL goals
Very different:
• Large datasets
• High availability
• Low latency / higher throughput
• Specific data access pattern
• Specific data structures
• ...

Not Only SQL
• Document-based stores
• Column-based
• Graph-oriented databases
• Key / value stores
• Full-Text Search

Flexibility at a cost

• Programming model
  • one per product :-(
  • no schema => app-driven schema
  • query (Map/Reduce, specific DSL, ...)
  • the data structure transpires into the application
• Transactions
  • durability / consistency

Quick Infinispan introduction
Distributed Key/Value store
•(or replicated, efficient local-only cache, invalidating cache)
Each node is equal
•Just start more nodes, or kill some
No bottlenecks
•by design
Cloud-network friendly
•JGroups
•And “cloud storage” friendly too!
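
Infinispan's distribution mode is commonly described as based on consistent hashing, which is what allows every node to be equal. As a rough sketch under that assumption, a toy hash ring shows how keys can be assigned to owners without any master, and why killing a node only moves the keys that node owned. `HashRing`, the CRC32 hash and the 50 virtual points per node are illustrative choices, not Infinispan's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Toy consistent-hash ring: an illustration of masterless key ownership,
// NOT Infinispan's actual distribution code.
final class HashRing {
    private final SortedMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(List<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        nodes.forEach(this::addNode);
    }

    // "Just start more nodes": each node claims several points on the ring.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // "...or kill some": only the points still owned by this node disappear.
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // A key belongs to the first node point at or after its own hash, wrapping around.
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the cluster");
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String value) {
        CRC32 crc = new CRC32();
        crc.update(value.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        HashRing cluster = new HashRing(List.of("node-A", "node-B", "node-C"), 50);
        System.out.println("user-34 lives on " + cluster.ownerOf("user-34"));
        cluster.removeNode("node-B");
        System.out.println("user-34 now lives on " + cluster.ownerOf("user-34"));
    }
}
```

No node is special: any member can compute the owner of any key locally, so there is no coordinator to fail over.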

Infinispan ABC

map.put( "user-34", userInstance );

map.get( "user-34" );

map.remove( "user-34" );

It's a ConcurrentMap!

map.putIfAbsent( "user-38", another );
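
Since the cache is a `java.util.concurrent.ConcurrentMap`, the same calls can be tried against any `ConcurrentMap`. In this minimal sketch a `ConcurrentHashMap` stands in for an Infinispan cache (an assumption made so the example runs without a cluster), and `User` is a hypothetical value type.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ConcurrentMapBasics {
    // Hypothetical value type standing in for "userInstance".
    record User(String name) {}

    // Assumption: a ConcurrentHashMap stands in for an Infinispan cache here;
    // only java.util.concurrent.ConcurrentMap methods are used.
    static ConcurrentMap<String, User> newStore() {
        return new ConcurrentHashMap<>();
    }

    public static void main(String[] args) {
        ConcurrentMap<String, User> map = newStore();
        User userInstance = new User("first");

        map.put("user-34", userInstance);   // store (overwrites any previous value)
        User found = map.get("user-34");    // read back by key
        map.remove("user-34");              // delete the mapping
        System.out.println(found + " removed: " + !map.containsKey("user-34"));

        // putIfAbsent is an atomic check-then-put: it only writes when the key is
        // free, and returns the value that was already there (or null).
        User another = new User("second");
        User previous = map.putIfAbsent("user-38", another);
        User existing = map.putIfAbsent("user-38", new User("third"));
        System.out.println(previous + " / " + existing + " / " + map.get("user-38"));
    }
}
```

The atomicity of `putIfAbsent` is what makes it useful when several nodes may try to create the same key concurrently.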

Something more about Infinispan
• Support for transactions (XA)
• CacheLoaders
  • Cassandra, JDBC, Amazon S3 (jclouds), ...
• Tree API for JBossCache compatibility
• Lucene integration
  • two-fold: Lucene Directory and Infinispan Query
• Some Hibernate integrations
  • second-level cache
  • Hibernate Search indexing backend

Cloud-hack experiments
Let's abuse Hibernate's second-level cache
design, using Infinispan's implementation:
- it is usually configured in the INVALIDATION
clustering mode. Let's use DIST instead.
- Disable expiry/timeouts.

What's the effect on your cloud-deployed database?


Now introduce Hibernate Search:
- full-text queries should be handled by
Lucene, NOT by the database.

Hibernate Search identifies hits from the
Lucene index, but (by default) loads them by PK.

Load by PK -> second-level cache -> Key/Value store

FullText query -> Hibernate Search -> Lucene Indexes

Load by PK -> Key/Value store

FullText query -> Hibernate Search -> Lucene Indexes
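
This second flow can be sketched with two plain maps: a toy inverted index plays the role of the Lucene index and only yields identifiers, and a key/value map plays the role of Infinispan, from which each hit is then loaded by primary key. `SearchThenLoad`, `Tweet` and the naive tokenizer are hypothetical simplifications; real Lucene analysis and Hibernate Search loading are far richer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SearchThenLoad {
    // Hypothetical entity.
    record Tweet(long id, String text) {}

    // Key/value store: primary key -> entity (stand-in for Infinispan).
    private final Map<Long, Tweet> store = new HashMap<>();
    // Toy inverted index: term -> ids of matching entities (stand-in for Lucene).
    private final Map<String, Set<Long>> index = new HashMap<>();

    void save(Tweet tweet) {
        store.put(tweet.id(), tweet);
        for (String term : tweet.text().toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(tweet.id());
            }
        }
    }

    List<Tweet> fullTextQuery(String term) {
        // Step 1: the index answers with identifiers only.
        Set<Long> ids = index.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
        // Step 2: every hit is loaded by primary key from the key/value store.
        List<Tweet> hits = new ArrayList<>();
        for (Long id : ids) {
            hits.add(store.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        SearchThenLoad app = new SearchThenLoad();
        app.save(new Tweet(1, "Hibernate OGM on Infinispan"));
        app.save(new Tweet(2, "Lucene indexes stored in Infinispan"));
        System.out.println(app.fullTextQuery("infinispan"));
    }
}
```

Neither step needs SQL, which is why the relational database becomes optional for these reads.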

So what if you shut down the database?

•No relational/SQL queries
•You won't be able to write!

Goals

•Encourage new data usage patterns
•Familiar environment
•Ease of use
•easy to jump in
•easy to jump out
•Push NoSQL exploration in enterprises
•“PaaS for existing API” initiative

What it does

• JPA front end to key/value stores
• Object CRUD (including polymorphism and associations)
• OO queries (JP-QL)
• Reuses
• Hibernate Core
• Hibernate Search (and Lucene)
• Infinispan
• Is not a silver bullet
• not for all NoSQL use cases

Schema or no schema?

• Schema-less
• move to new schema very easy
• the app must deal with both the old and the new structure, or migrate all data
• needs strict development guidelines
• Schema
• reduces the likelihood of corruption by a rogue developer
• can be shared with other apps
• “didn’t think about that” bugs reduced
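
To make the schema-less trade-off concrete, here is a hypothetical case where an older release stored a `name` column and a newer one stores `firstName`/`lastName`. The sketch shows both options from the list: reading code that copes with both shapes, or a one-off migration. All names are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: an old release stored "name"; a newer release stores
// "firstName" and "lastName". Nothing in the store enforces either shape.
final class SchemaEvolution {
    // Option 1: the application copes with both structures on every read.
    static String displayName(Map<String, Object> tuple) {
        if (tuple.containsKey("firstName")) {
            return tuple.get("firstName") + " " + tuple.get("lastName");
        }
        return (String) tuple.get("name");
    }

    // Option 2: migrate each old tuple once; afterwards only the new shape exists.
    static Map<String, Object> migrate(Map<String, Object> tuple) {
        if (!tuple.containsKey("name")) {
            return tuple; // already in the new shape
        }
        Map<String, Object> migrated = new HashMap<>(tuple);
        String name = (String) migrated.remove("name");
        int space = name.indexOf(' ');
        migrated.put("firstName", space < 0 ? name : name.substring(0, space));
        migrated.put("lastName", space < 0 ? "" : name.substring(space + 1));
        return migrated;
    }

    public static void main(String[] args) {
        Map<String, Object> oldTuple = Map.of("name", "Sanne Grinovero");
        Map<String, Object> newTuple = Map.of("firstName", "Sanne", "lastName", "Grinovero");
        System.out.println(displayName(oldTuple) + " | " + displayName(newTuple));
        System.out.println(migrate(oldTuple));
    }
}
```

Option 1 spreads the knowledge of every historical shape through the code base, which is where the need for strict guidelines comes from.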

Entities as serialized blobs?
• Serialize objects into the value stored under a key
• store the whole graph?

• maintain consistency with duplicated objects
• guaranteed identity a == b
• concurrency / latency
• structure changes and (de)serialization: class definition changes
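
The identity and consistency problems of blobs can be reproduced with plain Java serialization, assumed here as the blob format. Two hypothetical `Person` objects share one `Address`; once each is serialized into its own value, they come back with two separate `Address` copies, so `a == b` no longer holds and an update to one copy is invisible through the other.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class BlobIdentity {
    static final class Address implements Serializable {
        // Changing the class shape without keeping this compatible breaks old blobs.
        private static final long serialVersionUID = 1L;
        String city;
        Address(String city) { this.city = city; }
    }

    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final Address address;
        Person(String name, Address address) { this.name = name; this.address = address; }
    }

    // Serializes the whole reachable graph into one value, then reads it back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Address shared = new Address("Ljubljana");
        Person alice = new Person("alice", shared);
        Person bob = new Person("bob", shared);
        System.out.println("same address before: " + (alice.address == bob.address));

        // Each person is stored as its own blob, so each blob carries its own Address copy.
        Person storedAlice = roundTrip(alice);
        Person storedBob = roundTrip(bob);
        System.out.println("same address after: " + (storedAlice.address == storedBob.address));

        storedAlice.address.city = "Paris";         // update one copy...
        System.out.println(storedBob.address.city); // ...the other copy is stale
    }
}
```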

OGM’s approach to schema

• Keep what’s best from relational model
• as much as possible
• tables / columns / pks
• Decouple the object structure from the data structure
• Data stored as (self-described) tuples
• Limited set of core types
  • for portability

OGM’s approach to schema

• Store metadata for queries
• Lucene index
• CRUD operations are key lookups

How does it work?
• Entities are stored as tuples (Map<String,Object>)
• The key is composed of:
  • table name
  • entity id
• Collections are represented as a list of tuples
• The key is composed of:
  • name of the table hosting the collection information
  • column names representing the FK
  • column values representing the FK
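
A minimal sketch of this layout, assuming a plain `HashMap` as the key/value store; Java records provide the value-based `equals`/`hashCode` a composite key needs. `EntityKey`, `AssociationKey`, the `User`/`Order` tables and the `user_id` column are hypothetical names chosen for illustration, not Hibernate OGM's actual classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical key types for illustration; Hibernate OGM's own classes may differ.
final class TupleStore {
    // Entity key: table name + entity id.
    record EntityKey(String table, Object id) {}

    // Collection key: table hosting the collection + FK column names + FK values.
    record AssociationKey(String table, List<String> fkColumns, List<Object> fkValues) {}

    static Map<String, Object> orderRow(long id, long userId) {
        return Map.of("id", id, "user_id", userId);
    }

    static Map<Object, Object> buildStore() {
        Map<Object, Object> store = new HashMap<>();

        // A User row becomes a self-described tuple: column name -> value.
        Map<String, Object> user = new HashMap<>();
        user.put("id", 1L);
        user.put("name", "Sanne");
        store.put(new EntityKey("User", 1L), user);

        // The orders of user 1, reached through the FK column "user_id" of table
        // "Order", are stored under a single key as a list of tuples.
        List<Map<String, Object>> orders = List.of(orderRow(10L, 1L), orderRow(11L, 1L));
        store.put(new AssociationKey("Order", List.of("user_id"), List.of(1L)), orders);
        return store;
    }

    public static void main(String[] args) {
        Map<Object, Object> store = buildStore();
        // Every CRUD read is a key lookup with a freshly built, value-equal key.
        System.out.println(store.get(new EntityKey("User", 1L)));
        System.out.println(store.get(new AssociationKey("Order", List.of("user_id"), List.of(1L))));
    }
}
```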

Queries

• Hibernate Search indexes entities
• Store Lucene indexes in Infinispan
• JP-QL to Lucene query transformation

• Works for simple queries
• Lucene is not a relational SQL engine

select a from Animal a where a.size > 20

> animalQueryBuilder
    .range().onField("size").above(20).excludeLimit()
    .createQuery();

select u from Order o join o.user u where o.price > 100 and u.city = 'Paris'

> orderQB.bool()
    .must( orderQB.range()
        .onField("price").above(100).excludeLimit().createQuery() )
    .must( orderQB.keyword()
        .onField("user.city").matching("Paris").createQuery() )
    .createQuery();
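
Lucene has no joins, so `join o.user u` has to be resolved at indexing time, for example by copying the user's city into each order document under the `user.city` path (the dotted field name in the query suggests this; it is an assumption here). The toy model below, which is not the Hibernate Search API, mimics the three building blocks used above: a range with an excluded limit, a keyword match, and a boolean query whose `must` clauses all have to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of the two queries above; NOT the Hibernate Search or Lucene API.
final class FlattenedQuery {
    // Assumption: when an Order is indexed, its user's city is copied into the
    // order's own document under "user.city", so no join is needed at query time.
    static Map<String, Object> orderDocument(long id, double price, String userCity) {
        return Map.of("id", id, "price", price, "user.city", userCity);
    }

    // Like range().onField(f).above(limit).excludeLimit(): strictly greater than.
    static Predicate<Map<String, Object>> above(String field, double limit) {
        return doc -> doc.get(field) instanceof Number n && n.doubleValue() > limit;
    }

    // Like keyword().onField(f).matching(v); this toy does exact matching, no analysis.
    static Predicate<Map<String, Object>> matching(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    // Like bool().must(a).must(b): a document matches only if every clause does.
    @SafeVarargs
    static Predicate<Map<String, Object>> mustAll(Predicate<Map<String, Object>>... clauses) {
        return doc -> Arrays.stream(clauses).allMatch(clause -> clause.test(doc));
    }

    static List<Long> search(List<Map<String, Object>> docs, Predicate<Map<String, Object>> query) {
        List<Long> ids = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (query.test(doc)) {
                ids.add((Long) doc.get("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> orders = List.of(
                orderDocument(1, 150.0, "Paris"),
                orderDocument(2, 100.0, "Paris"),
                orderDocument(3, 300.0, "Ljubljana"));
        System.out.println(search(orders, mustAll(above("price", 100), matching("user.city", "Paris"))));
    }
}
```

Order 2 is excluded because `excludeLimit()` turns the range into a strict `>`, matching the JP-QL comparison.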

Why Infinispan?
• We know it well
• Supports transactions (!)
• Research is going on to provide “cloud transactions” on more platforms
• It supports Lucene indexes distribution
• Easy to manage in clouds
• It's a key/value store with support for Map/Reduce
• Simple
• Likely a common point for many other “databases”

Why Infinispan?

•Map/Reduce as an alternative to indexed queries
  •might be chosen by a clever JP-QL engine
•Supports, experimentally, distributed Lucene queries
  •since ISPN-200, merged last week
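
Map/Reduce answers a query by scanning every entry instead of consulting an index. The single-JVM sketch below illustrates the idea only; it is not Infinispan's distributed Map/Reduce API. It counts orders above 100 per city, the kind of question an index-based query would otherwise answer; `MiniMapReduce` and the order tuples are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

// A single-JVM illustration of the Map/Reduce idea; NOT Infinispan's distributed API.
final class MiniMapReduce {
    // Map phase: every entry may emit (key, value) pairs.
    // Reduce phase: values emitted under the same key are folded with an associative reducer.
    static <K, V, K2, V2> Map<K2, V2> run(Map<K, V> data,
                                          BiFunction<K, V, List<Map.Entry<K2, V2>>> mapper,
                                          BinaryOperator<V2> reducer) {
        Map<K2, V2> result = new HashMap<>();
        for (Map.Entry<K, V> entry : data.entrySet()) {
            for (Map.Entry<K2, V2> emitted : mapper.apply(entry.getKey(), entry.getValue())) {
                result.merge(emitted.getKey(), emitted.getValue(), reducer);
            }
        }
        return result;
    }

    static Map<String, Object> order(String city, double price) {
        return Map.of("city", city, "price", price);
    }

    // Mapper answering "how many orders above 100 per city?" with a full scan, no index.
    static List<Map.Entry<String, Integer>> expensiveOrdersByCity(String key, Map<String, Object> order) {
        double price = ((Number) order.get("price")).doubleValue();
        if (price <= 100) {
            return List.of();
        }
        return List.of(Map.entry((String) order.get("city"), 1));
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> orders = Map.of(
                "order-1", order("Paris", 150.0),
                "order-2", order("Paris", 100.0),
                "order-3", order("Paris", 250.0),
                "order-4", order("Ljubljana", 300.0));
        System.out.println(run(orders, MiniMapReduce::expensiveOrdersByCity, Integer::sum));
    }
}
```

In a distributed store the map phase would run on each node over its local entries, with only the small per-key results sent back for reduction; that locality is what makes it a credible alternative to an index.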

Why all this?
Developers will only need to think about
• JPA models
• JP-QL queries

Everything else is performance tuning, including:
•Move to/from different NoSQL implementations
•Move to/from a SQL implementation
•Move to/from clouds/laptops
•JPA is a well known standard: move to/from Hibernate :-)

Summary
•JPA for NoSQL
•Reusing mature projects
•Keep the good of the relational model
•Query via Hibernate Search

•JP-QL support on its way
•Still early in the project
•Only Infinispan is integrated: contributions welcome!

Summary

•Performance / scalability is different
•Isolation is different

http://www.hibernate.org/subprojects/ogm.html

http://www.jboss.org/jbw2011keynote.html
https://github.com/Sanne/tweets-ogm
