Elastic consistent
NoSQL data storage with
ModeShape 3
NoSQL Matters 2013
Cologne, Germany
April 26, 2013
Randall Hauch
Principal Software Engineer at Red Hat
@rhauch
@modeshape
SQL databases
BLOB or CLOB
recursive JOINs and queries
SQL types (CHAR, VARCHAR, etc.)
NoSQL databases
Document
Key/Value
Column-oriented
Graph
Others, including hierarchical...
ModeShape
An open source
elastic in-memory hierarchical database
with queries, transactions, events & more
Hierarchical
• Organize the data into a tree structure
– A lot of data has natural hierarchies
– Conceptually similar to a file system
– Nodes with properties
– References enable graphs (not limited to parent/child)
• Navigate or query
– Quickly navigate to related (or contained) data
– Use queries to find data independently of location
Nodes and names
• Node names
– consist of a local part and a namespace (like XML names)
– need not be unique within a parent node (but it is recommended)
• Namespaces
– are URIs that are registered and can be assigned a prefix
– prefixes are repository-wide, but can be permanently changed or
overridden locally by clients
Each node has a name.
Namespace prefix: “” (empty string)
Local part: “equipment”
Namespace prefix: “jcr”
Local part: “system”
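The prefix/local-part split described above can be sketched in plain Java. This is an illustrative model only: `QualifiedName` and its methods are hypothetical names, not JCR or ModeShape classes, and the registry is an ordinary map standing in for the repository's namespace registry.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of a namespace-qualified node name (hypothetical, not a JCR class). */
final class QualifiedName {
    final String prefix;     // "" when the name carries no prefix
    final String localPart;

    private QualifiedName(String prefix, String localPart) {
        this.prefix = prefix;
        this.localPart = localPart;
    }

    /** Splits "jcr:system" into prefix "jcr" and local part "system". */
    static QualifiedName parse(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return new QualifiedName("", name);
        }
        return new QualifiedName(name.substring(0, colon), name.substring(colon + 1));
    }

    /** Replaces the prefix with its registered URI, giving "{uri}localPart". */
    String expanded(Map<String, String> registry) {
        String uri = registry.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("Unregistered prefix: " + prefix);
        }
        return "{" + uri + "}" + localPart;
    }

    public static void main(String[] args) {
        Map<String, String> registry = new HashMap<>();
        registry.put("", "");                               // empty prefix, empty namespace
        registry.put("jcr", "http://www.jcp.org/jcr/1.0");  // standard JCR namespace URI
        System.out.println(parse("equipment").expanded(registry));
        System.out.println(parse("jcr:system").expanded(registry));
    }
}
```

Because the URI, not the prefix, identifies the namespace, a client can remap a prefix locally without changing which node a name refers to.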
Node paths
• Absolute paths
– the sequence of names from the root to the node in question
– always start with a ‘/’ signifying the root node
– may use a 1-based same-name-sibling positional index (which can change if the order of children is changed)
Each node is identified by a path
These paths are equivalent:
/facilities/San Francisco/Eastford Plaza
/facilities[1]/San Francisco[1]/Eastford Plaza[1]
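Why the two paths are equivalent can be shown with a small sketch: a segment without an index is treated as having index 1. `PathNormalizer` is a hypothetical helper written for this illustration and is not part of the JCR API; it assumes segment names never contain `/`, `[` or `]`.

```java
/** Illustrative helper (hypothetical, not part of the JCR API). */
final class PathNormalizer {

    /** Adds the implicit [1] index to every segment that has no explicit index. */
    static String withExplicitIndexes(String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("Not an absolute path: " + absolutePath);
        }
        if (absolutePath.equals("/")) {
            return "/";   // the root node has no segments
        }
        StringBuilder result = new StringBuilder();
        for (String segment : absolutePath.substring(1).split("/")) {
            result.append('/').append(segment);
            if (!segment.endsWith("]")) {
                result.append("[1]");
            }
        }
        return result.toString();
    }

    /** Two absolute paths address the same node if they match once indexes are explicit. */
    static boolean samePath(String a, String b) {
        return withExplicitIndexes(a).equals(withExplicitIndexes(b));
    }

    public static void main(String[] args) {
        System.out.println(samePath(
                "/facilities/San Francisco/Eastford Plaza",
                "/facilities[1]/San Francisco[1]/Eastford Plaza[1]"));
    }
}
```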
Node paths (cont’d)
• Relative paths
– the sequence of names from one node to another
– never start with a ‘/’
– similar to file system relative paths
Paths can be relative and can use “.” and “..”
From the “passenger” node to the “Eastford Plaza” node:
../../facilities/San Francisco/Eastford Plaza
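Resolving a relative path works much like it does on a file system. The sketch below assumes, purely for illustration, that the “passenger” node lives at `/equipment/passenger`; the slides do not give its absolute path, and `RelativePaths` is a hypothetical helper rather than a JCR class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative relative-path resolver (hypothetical, not part of the JCR API). */
final class RelativePaths {

    /** Resolves relativePath against the absolute path of the starting node. */
    static String resolve(String basePath, String relativePath) {
        Deque<String> segments = new ArrayDeque<>();
        for (String segment : basePath.split("/")) {
            if (!segment.isEmpty()) {
                segments.addLast(segment);
            }
        }
        for (String segment : relativePath.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue;                      // "." stays on the current node
            }
            if (segment.equals("..")) {
                if (segments.isEmpty()) {
                    throw new IllegalArgumentException("Path climbs above the root node");
                }
                segments.removeLast();         // ".." moves to the parent
            } else {
                segments.addLast(segment);
            }
        }
        return "/" + String.join("/", segments);
    }

    public static void main(String[] args) {
        // "/equipment/passenger" is an assumed location for the "passenger" node.
        System.out.println(resolve("/equipment/passenger",
                "../../facilities/San Francisco/Eastford Plaza"));
    }
}
```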
Node identifier
• Used to look up that node directly
– no navigation is required
– will never change after a new node is created, even if moved
(unlike paths)
– behaves as a “unique key” within the workspace
(shared nodes behave differently)
– fast
• Used within reference properties
– both REFERENCE and WEAKREFERENCE
• Can be used by applications
Each node also has an opaque string identifier
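The difference between an identifier and a path can be sketched with a toy in-memory index. `TinyWorkspace` is hypothetical, and generating identifiers with `UUID` is an assumption made here; the slides only say the identifier is an opaque string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model of lookup by identifier (hypothetical, not the ModeShape implementation). */
final class TinyWorkspace {
    private final Map<String, String> pathById = new HashMap<>();   // identifier -> current path

    /** Creates a node at the given path and returns its identifier. */
    String add(String path) {
        String id = UUID.randomUUID().toString();   // assumption: any opaque unique string would do
        pathById.put(id, path);
        return id;
    }

    /** Moving a node changes its path but never its identifier. */
    void move(String id, String newPath) {
        if (!pathById.containsKey(id)) {
            throw new IllegalArgumentException("No such node: " + id);
        }
        pathById.put(id, newPath);
    }

    /** Direct lookup: no navigation from the root is needed. */
    String pathOf(String id) {
        return pathById.get(id);
    }

    public static void main(String[] args) {
        TinyWorkspace workspace = new TinyWorkspace();
        String id = workspace.add("/facilities/San Francisco");
        workspace.move(id, "/archive/San Francisco");
        System.out.println(id + " -> " + workspace.pathOf(id));
    }
}
```

An application that stores the identifier keeps a valid handle after the move, whereas a stored path would have gone stale.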
Properties
• Nodes can have 0+ properties
– each property must have a unique name in a node
• Properties have values
– single-valued: exactly 1 non-null value
– multi-valued: 0 or more possibly null values
• Values
– are immutable
– have an implicit type
– are accessed by desired type with auto-conversion; e.g., value.getString(), value.getDate(), property.getNode(), etc.
The only place to store data on the nodes
Property Type    Java type
STRING           java.lang.String
NAME             java.lang.String
PATH             java.lang.String
BOOLEAN          java.lang.Boolean
LONG             java.lang.Long
DOUBLE           java.lang.Double
DATE             java.util.Calendar
BINARY           javax.jcr.Binary
REFERENCE        javax.jcr.Node
WEAKREFERENCE    javax.jcr.Node
DECIMAL          java.math.BigDecimal
URI              java.lang.String
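The idea of an immutable value that is read "by desired type" can be sketched as follows. `SimpleValue` is a hypothetical stand-in for `javax.jcr.Value`, covers only a few types, and does not claim to reproduce the conversion rules of the JCR specification.

```java
import java.math.BigDecimal;
import java.util.Objects;

/** Minimal immutable value with on-demand conversion (hypothetical stand-in for a JCR value). */
final class SimpleValue {
    private final Object raw;   // never changes after construction

    SimpleValue(Object raw) {
        this.raw = Objects.requireNonNull(raw, "values are never null");
    }

    String getString() {
        return raw.toString();
    }

    long getLong() {
        // Numbers convert directly; anything else is parsed from its string form.
        return raw instanceof Number ? ((Number) raw).longValue()
                                     : Long.parseLong(raw.toString().trim());
    }

    double getDouble() {
        return raw instanceof Number ? ((Number) raw).doubleValue()
                                     : Double.parseDouble(raw.toString().trim());
    }

    BigDecimal getDecimal() {
        return raw instanceof BigDecimal ? (BigDecimal) raw
                                         : new BigDecimal(raw.toString().trim());
    }

    public static void main(String[] args) {
        SimpleValue year = new SimpleValue("2006");   // stored as a string
        System.out.println(year.getLong() + 1);       // read as a long
        System.out.println(new SimpleValue(42L).getString());
    }
}
```

A conversion that makes no sense, such as reading "Toyota" as a long, fails with an exception at read time rather than at write time.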
BINARY property values
• Any size binary content
– read/written via streams
• Separate storage
– content keyed by SHA-1
– property value stored with node contains SHA-1 and resolved when stream is read
– streamed content always buffered
– all this is transparent to applications
• Automatic text extraction
– text is used for full-text searching
• Choices for binary storage
– File, DBMS, MongoDB, data grid (out of the box)
– Custom
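Keying content by its SHA-1 hash can be sketched with the JDK's `MessageDigest`. `ContentAddressedStore` is a hypothetical in-memory stand-in for a binary store; the deduplication it shows follows from hash-based keys and is not something the slides state about ModeShape.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of a binary store keyed by SHA-1 (hypothetical, not ModeShape code). */
final class ContentAddressedStore {
    private final Map<String, byte[]> contentByKey = new HashMap<>();

    /** Returns the lowercase hexadecimal SHA-1 digest of the content. */
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    /** Stores the content and returns the key a node property would hold. */
    String store(byte[] content) {
        String key = sha1Hex(content);
        contentByKey.putIfAbsent(key, content.clone());   // identical content is kept once
        return key;
    }

    /** Resolves a key back to the content, as happens when the stream is read. */
    byte[] read(String key) {
        byte[] content = contentByKey.get(key);
        if (content == null) {
            throw new IllegalArgumentException("Unknown key: " + key);
        }
        return content.clone();
    }

    int size() {
        return contentByKey.size();
    }

    public static void main(String[] args) {
        ContentAddressedStore store = new ContentAddressedStore();
        String key = store.store("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(key + " (" + store.read(key).length + " bytes)");
    }
}
```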
Workspace
• Comprised of
– a single root node
– the “/jcr:system” branch containing the system-wide information
– other nodes that have child nodes and properties
Named segments of a repository
Putting the pieces together
• Repository contains
– named workspaces
– namespaces, node types, version storage, etc.
• Workspaces have
– hierarchy of nodes
– access to the shared system area
• Nodes have
– name (can change)
– identifier (doesn’t change)
– path (can change)
– properties (can change)
• Properties have values
– single-valued: exactly 1 non-null value
– multi-valued: 0 or more possibly null values
• Values
– are immutable & can be reused
– have an implicit type
– are accessed by desired type with auto-conversion; e.g., value.getString()
Session
• Authenticated and authorized
– only sees content authorized by credentials
– only changes content authorized by credentials
– use the built-in auth service or integrate with your own
• Stateful
– changes are kept in the session’s transient state until the session is saved
– changes can be dropped without saving (e.g., “refreshing the session”)
• Lightweight
– intended to be created, used, then closed
– pooling sessions is more trouble than it’s worth
• Self-contained
– exposed objects are tied to the session; can’t be shared w/ others
An authenticated connection to a repository, used to access a single workspace
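The "stateful" behaviour can be sketched as a session that buffers changes until it is saved. `TransientSession` is a hypothetical model over a plain map, not the JCR `Session`; its `discardChanges()` plays the role that `Session.refresh(false)` plays in JCR.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a session's transient state (hypothetical, not the JCR Session). */
final class TransientSession {
    private final Map<String, String> persisted;                 // shared, saved content
    private final Map<String, String> pending = new HashMap<>(); // this session's unsaved changes

    TransientSession(Map<String, String> persisted) {
        this.persisted = persisted;
    }

    /** Changes are visible to this session only until save() is called. */
    void setProperty(String path, String value) {
        pending.put(path, value);
    }

    /** Reads see this session's pending changes layered over the saved content. */
    String getProperty(String path) {
        return pending.containsKey(path) ? pending.get(path) : persisted.get(path);
    }

    /** Writes all pending changes to the shared content. */
    void save() {
        persisted.putAll(pending);
        pending.clear();
    }

    /** Drops pending changes without saving them. */
    void discardChanges() {
        pending.clear();
    }

    public static void main(String[] args) {
        Map<String, String> repository = new HashMap<>();
        TransientSession session = new TransientSession(repository);
        session.setProperty("/cars/prius/year", "2006");
        System.out.println("before save: " + repository);
        session.save();
        System.out.println("after save:  " + repository);
    }
}
```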
With or without schema
• Choose how much schema is enforced
– define patterns for values and structure
– use different patterns for different parts of the database
– change the patterns over time
– use the “best” levels of schema validation
– evolve as necessary
[Diagram: a scale running between STRICT ENFORCEMENT and NO ENFORCEMENT]
Queries
• Find the data independently of the hierarchy
• SQL-like language (including full-text search)
SELECT * FROM [car:Car] WHERE [car:model] LIKE '%Toyota%' AND [car:year] >= 2006
SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
WHERE PATH() LIKE $path
SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
WHERE PATH() IN (
SELECT [vdb:originalFile] FROM [vdb:virtualDatabase]
WHERE [vdb:version] <= $maxVersion
AND CONTAINS([vdb:description],'xml OR xml maybe')
)
SELECT file.*,content.* FROM [nt:file] AS file
JOIN [nt:resource] AS content ON ISCHILDNODE(content,file)
WHERE file.[jcr:path] LIKE '/files/q*.2.vdb'
Sequencing
• Automatically extract structured content
– just write BINARY or STRING property values on nodes, then save
– sequencers run asynchronously based upon path rules & MIME types
– output stored in repository at configurable location
• Sequencers
– DDL (variety)
– text (fixed width, delimited)
– Microsoft Office™
– Java (source & class)
– ZIP (and JAR/WAR/EAR)
– XML, XSD, and WSDL
– Teiid VDBs
– audio (MP3)
– images
– CND
– custom
1) upload
2) notify
3) derive and store (sequencers)
4) navigate or query
Federation
• Access data in external systems
– external data projected as nodes with properties and node types
– supports read and optional write with same validation rules
– transparent to applications
• Connector options
– File system
– Local git
– CMIS repository
– custom
– (more are planned)
[Diagram: external source A and external source B projected into the repository]
Other features
• Events
– register listeners to be notified of changes in content
– optional criteria limits what listeners are interested in
• Versioning
– checkin/checkout nodes & subtrees
– branch, merge, restore
• Locking
– short-lived locks (longer than transaction scope)
• Namespace management
– programmatically (un)register namespaces
• Node type management
– programmatically/declaratively define or update node types
• Monitoring
– statistics for a variety of metrics
Public APIs
Java API
• Standard Java API (JSR-283)
– javax.jcr packages
– programmatically access, find, update, query content
– commonly needed features: events, versioning, etc.
– 95% of API
• ModeShape extensions
– additional node type management methods
– additional event types
– additional Binary value methods (hash)
– additional JCR-QOM language objects
– cancel queries
– sequencer and text extraction SPIs
– monitoring API
Other APIs
• JDBC driver
– connect to local or remote repository
– execute queries
– access database metadata
– enables existing applications to access content
• RESTful API
– POST, PUT, GET, DELETE methods
– JSON representations of one or multiple nodes
– Streams large binary values
– Execute queries
• WebDAV API
– Exposes content as files and directories
– Mount repository using file system
Elastic
• Add more processes to increase storage capacity and/or throughput
– Transparent to applications!
– No master, no slaves
– Data is rebalanced as needed
– Optionally separate database engine from storage processes
• Fault tolerant
– Processes can fail without loss of data
– Cross-data center distribution (in near future)
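One common way to rebalance data when processes join or leave, without a master, is consistent hashing. The sketch below illustrates that general technique; the slides do not describe how the underlying data grid assigns data to processes, so `HashRing`, its virtual-node count, and the use of MD5 are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Consistent-hash ring sketch (generic technique, not the data grid's actual algorithm). */
final class HashRing {
    private static final int VIRTUAL_NODES = 64;                 // ring points per process
    private final TreeMap<Long, String> ring = new TreeMap<>();  // ring position -> process name

    /** Maps a string to a ring position using the first 8 bytes of its MD5 digest. */
    private static long hash(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            long position = 0;
            for (int i = 0; i < 8; i++) {
                position = (position << 8) | (digest[i] & 0xff);
            }
            return position;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    void addProcess(String name) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(name + "#" + i), name);
        }
    }

    void removeProcess(String name) {
        ring.values().removeIf(name::equals);
    }

    /** The owner is the first process at or after the key's position, wrapping around. */
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("No processes in the cluster");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing cluster = new HashRing();
        cluster.addProcess("A");
        cluster.addProcess("B");
        System.out.println("owner before: " + cluster.ownerOf("node-42"));
        cluster.addProcess("C");   // only keys that now fall to C change owner
        System.out.println("owner after:  " + cluster.ownerOf("node-42"));
    }
}
```

The useful property is that adding a process moves only the keys that the new process takes over; every other key stays where it was.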
In-memory
• Memory is really fast (and cheap)
• Why not keep all data in application memory?
– practical limits to memory on particular machines
– memory isn’t shared between machines
– data stored in memory isn’t durable
– no queries, structure, or transactions
• ModeShape
– distributes multiple copies of data across the combined memory of many machines
– persists data to disk or DB (if really needed)
– transparent to applications
Large single- or multi-site cluster
[Diagram: four ModeShape processes exchanging events, all using a separate Infinispan data grid that holds the data]
Strongly consistent
• ACID
– Atomic, Consistent, Isolated, Durable
– Already familiar to most developers
– Easy to reason about code
– Writes don’t block reads (MVCC)
– Writes to one node don’t block writes to others
• JTA
– Will participate in user transactions
– Works with Java EE
Why not eventually-consistent?
• In eventually-consistent databases
– changes made by one client will eventually (but not immediately) be propagated to all processes
– other clients won’t see latest data right away, yet can still make other changes
– there may be multiple versions of a particular piece of data
• Can be ideal for some scenarios
– read-heavy and/or best-effort
• Applications that update data may need to
– expect inconsistencies (and/or multiple versions)
– specify conflict strategies
– resolve conflicts (inconsistencies)
Clustering topologies
Single process
[Diagram: one ModeShape process with a local Infinispan cache, backed by a persistent store]
Small cluster
[Diagram: three ModeShape processes, each with a replicated Infinispan cache, exchanging data and events and sharing a persistent store]
Moderate single- or multi-site cluster
[Diagram: four ModeShape processes, each with a distributed Infinispan cache, exchanging data and events]
Best Practices
Best practices (1 of 2)
• Build structure first, then node types
– most important to get your node structure right
– it will change over time anyway, so don’t define the node types too soon
• Prefer hierarchies
– moderate numbers of child nodes, use multiple levels if necessary
• Limit use of same-name-siblings
– useful when required, but can be expensive and difficult to use (i.e., paths change)
• Use mixin node types and mixins
– where possible define sets of properties as mixins
– use in primary types and dynamically add to nodes
• Store files and folders with ‘nt:file’ and ‘nt:folder’
– use it wherever appropriate; not for all binary data, though!
• Verify which JCR features are enabled
– improves portability and safety with configuration changes
• Import and export
– avoid document view; use system view wherever possible
Best practices (2 of 2)
• Prefer JCR-SQL2 and JCR-QOM over other query languages
– by far the richest and most useful
– do this even when it appears the queries are more complicated
• Only Repository is thread-safe; no other APIs are
– don’t share sessions
– don’t share anything between sessions
• Register all listeners in special long-lived sessions
– do nothing else with these sessions, however (Session is not thread-safe)
– get off the notification thread ASAP, using work queues where necessary
• Create new sessions rather than reusing a pool of sessions
– Sessions are intended to be as lightweight as possible
– Create a session, use it, log out (even web applications and services!)
• Avoid deprecated APIs
– either perform poorly or are a bad idea; besides, they’ll be removed eventually
• Use Session.save() not Node.save()
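The advice to get off the notification thread quickly can be sketched with a standard executor. `OffloadingListener` is hypothetical and takes a plain path string; a real JCR listener implements `javax.jcr.observation.EventListener` and receives an `EventIterator`, which is left out here to keep the example free of external dependencies.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Listener sketch that hands work to a queue-backed worker thread (hypothetical class). */
final class OffloadingListener {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    final List<String> processed = new CopyOnWriteArrayList<>();

    /** Runs on the notification thread: enqueue the work and return immediately. */
    void onEvent(String changedPath) {
        worker.submit(() -> handle(changedPath));
    }

    /** Runs on the worker thread, where slow processing is acceptable. */
    private void handle(String changedPath) {
        processed.add("handled " + changedPath);
    }

    /** Stops accepting events and waits for queued work to finish. */
    boolean shutdownAndWait(long seconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        OffloadingListener listener = new OffloadingListener();
        listener.onEvent("/facilities/San Francisco");
        listener.onEvent("/equipment/passenger");
        listener.shutdownAndWait(5);
        System.out.println(listener.processed);
    }
}
```

A worker that needs to read repository content would open its own session rather than borrow the listener's session, in line with the thread-safety advice above.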
Want more ModeShape?
• Project: http://modeshape.org
• Blog: http://modeshape.wordpress.com
• Twitter: @modeshape
• IRC: #modeshape (irc.freenode.org)
• Code: http://github.com/modeshape
Questions?

Más contenido relacionado

Último

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Último (20)

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Destacado

PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 

Destacado (20)

Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 

Consistent NoSQL data storage with ModeShape (NoSQL Matters 2013)

  • 1. Elastic consistent NoSQL data storage with ModeShape 3 NoSQL Matters 2013 Cologne, Germany April 26, 2013 Randall Hauch Principal Software Engineer at Red Hat @rhauch @modeshape
  • 2. SQL databases 2 BLOB or CLOB recursive JOINs and queries SQL types (CHAR, VARCHAR, etc.)
  • 3. SQL databases 3 BLOB or CLOB recursive JOINs and queries SQL types (CHAR, VARCHAR, etc.)
  • 6. ModeShape An open source elastic in-memory hierarchical database with queries, transactions, events & more 6
  • 7. Hierarchical • Organize the data into a tree structure – A lot of data has natural hierarchies – Conceptually similar to a file system – Nodes with properties – References enable graphs (not limited to parent/child) • Navigate or query – Quickly navigate to related (or contained) data – Use queries to find data independently of location 7
  • 8. Nodes and names
      Each node has a name.
      • Node names
          – consist of a local part and a namespace (like XML names)
          – need not be unique within a parent node (but uniqueness is recommended)
      • Namespaces
          – are URIs that are registered and can be assigned a prefix
          – prefixes are repository-wide, but can be permanently changed or overridden locally by clients
      • Examples
          – Namespace prefix: “” (empty string), local part: “equipment”
          – Namespace prefix: “jcr”, local part: “system”
  • 9. Node paths
      Each node is identified by a path.
      • Absolute paths
          – the sequence of names from the root to the node in question
          – always start with a ‘/’ signifying the root node
          – may use a 1-based same-name-sibling positional index (which can change if the order of the children is changed)
      • These paths are equivalent:
          /facilities/San Francisco/Eastford Plaza
          /facilities[1]/San Francisco[1]/Eastford Plaza[1]
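The equivalence of the indexed and unindexed forms can be illustrated with a small, self-contained sketch. It is a toy model rather than ModeShape or JCR code: the class `PathIndexSketch` and its methods are hypothetical names, and the only rule it assumes is the one stated on the slide, namely that a segment without a same-name-sibling index means index 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of absolute paths with 1-based same-name-sibling indexes (hypothetical, not the JCR API). */
final class PathIndexSketch {

    /** Returns the path with an explicit [1] on every segment that has no index. */
    static String withExplicitIndexes(String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("Absolute paths must start with '/'");
        }
        if (absolutePath.equals("/")) {
            return "/"; // the root node has no name segment
        }
        List<String> segments = new ArrayList<>();
        for (String segment : absolutePath.substring(1).split("/")) {
            // A segment that already ends in "[n]" keeps its index; all others default to [1].
            boolean hasIndex = segment.matches(".*\\[\\d+\\]");
            segments.add(hasIndex ? segment : segment + "[1]");
        }
        return "/" + String.join("/", segments);
    }

    /** Two absolute paths are equivalent when their explicit-index forms are equal. */
    static boolean equivalent(String first, String second) {
        return withExplicitIndexes(first).equals(withExplicitIndexes(second));
    }

    public static void main(String[] args) {
        System.out.println(equivalent(
                "/facilities/San Francisco/Eastford Plaza",
                "/facilities[1]/San Francisco[1]/Eastford Plaza[1]"));
    }
}
```

The sketch compares strings only. In a real repository the index also depends on the current order of the children, which is why the slide warns that indexed paths can change.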
  • 10. Node paths (cont’d)
      Paths can be relative and can use “.” and “..”.
      • Relative paths
          – the sequence of names from one node to another
          – never start with a ‘/’
          – similar to file system relative paths
      • From the “passenger” node to the “Eastford Plaza” node:
          ../../facilities/San Francisco/Eastford Plaza
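Resolving a relative path works much like it does in a file system. The following is a minimal sketch under stated assumptions: `RelativePathSketch` and `resolve` are hypothetical names, same-name-sibling indexes are ignored, and the starting path `/equipment/passenger` is invented for illustration because the slide does not say where the “passenger” node lives.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy resolver for relative paths using "." and ".." (hypothetical, not the JCR API). */
final class RelativePathSketch {

    /** Resolves a relative path against the absolute path of the starting node. */
    static String resolve(String basePath, String relativePath) {
        if (!basePath.startsWith("/")) {
            throw new IllegalArgumentException("The base path must be absolute");
        }
        if (relativePath.startsWith("/")) {
            throw new IllegalArgumentException("Relative paths never start with '/'");
        }
        Deque<String> segments = new ArrayDeque<>();
        for (String segment : basePath.split("/")) {
            if (!segment.isEmpty()) {
                segments.addLast(segment);
            }
        }
        for (String segment : relativePath.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue; // "." refers to the current node
            }
            if (segment.equals("..")) {
                if (segments.isEmpty()) {
                    throw new IllegalArgumentException("Path goes above the root node");
                }
                segments.removeLast(); // ".." moves up to the parent
            } else {
                segments.addLast(segment);
            }
        }
        return "/" + String.join("/", segments);
    }

    public static void main(String[] args) {
        // "/equipment/passenger" is an assumed location for the "passenger" node.
        System.out.println(resolve("/equipment/passenger",
                "../../facilities/San Francisco/Eastford Plaza"));
    }
}
```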
  • 11. Node identifier
      Each node also has an opaque string identifier.
      • Used to look up that node directly
          – no navigation is required
          – will never change after a new node is created, even if moved (unlike paths)
          – behaves as a “unique key” within the workspace (shared nodes behave differently)
          – fast
      • Used within reference properties
          – both REFERENCE and WEAKREFERENCE
      • Can be used by applications
  • 12. Properties
      The only place to store data on the nodes.
      • Nodes can have 0+ properties
          – each property must have a unique name in a node
      • Properties have values
          – single-valued: exactly 1 non-null value
          – multi-valued: 0 or more possibly null values
      • Values
          – are immutable
          – have an implicit type
          – are accessed by desired type with auto-conversion; e.g., value.getString(), getDate(), value.getNode(), etc.
      • Property types
          Property type    Java type
          STRING           java.lang.String
          NAME             java.lang.String
          PATH             java.lang.String
          BOOLEAN          java.lang.Boolean
          LONG             java.lang.Long
          DOUBLE           java.lang.Double
          DATE             java.util.Calendar
          BINARY           javax.jcr.Binary
          REFERENCE        javax.jcr.Node
          WEAKREFERENCE    javax.jcr.Node
          DECIMAL          java.math.BigDecimal
          URI              java.lang.String
  • 13. BINARY property values
      • Any size binary content
          – read/written via streams
      • Separate storage
          – content keyed by SHA-1
          – property value stored with node contains SHA-1 and resolved when stream is read
          – streamed content always buffered
          – all this is transparent to applications
      • Automatic text extraction
          – text is used for full-text searching
      • Choices for binary storage
          – File, DBMS, MongoDB, data grid (out of the box)
          – Custom
      [Diagram label: Binary Storage]
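The idea of keeping binary content in a separate store keyed by its SHA-1 can be sketched as follows. This is a toy, in-memory illustration and not ModeShape's binary store: `BinaryStoreSketch` and its methods are hypothetical names, and it works on whole byte arrays where the slide describes streamed, buffered content.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Toy content-addressed store: content is keyed by its SHA-1 (hypothetical, in-memory only). */
final class BinaryStoreSketch {
    private final Map<String, byte[]> contentByKey = new HashMap<>();

    /** Stores the content and returns the key that a node's property value would hold. */
    String store(byte[] content) {
        String key = sha1Hex(content);
        contentByKey.putIfAbsent(key, content.clone());
        return key;
    }

    /** Resolves a key back to the content, as happens when the value is read. */
    byte[] resolve(String key) {
        byte[] content = contentByKey.get(key);
        if (content == null) {
            throw new IllegalArgumentException("No content stored for key " + key);
        }
        return content.clone();
    }

    int storedCount() {
        return contentByKey.size();
    }

    /** Computes the SHA-1 digest of the content as a lowercase hex string. */
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        BinaryStoreSketch store = new BinaryStoreSketch();
        String key = store.store("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(key + " -> " + store.resolve(key).length + " bytes");
    }
}
```

One consequence of keying by hash in this sketch is that identical content stored twice occupies a single entry, because both writes produce the same key.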
  • 14. Workspace
      Named segments of a repository.
      • Consists of
          – a single root node
          – the “/jcr:system” branch containing the system-wide information
          – other nodes that have child nodes and properties
  • 15. Putting the pieces together
      • Repository contains
          – named workspaces
          – namespaces, node types, version storage, etc.
      • Workspaces have
          – hierarchy of nodes
          – access to the shared system area
      • Nodes have
          – name (can change)
          – identifier (doesn’t change)
          – path (can change)
          – properties (can change)
      • Properties have values
          – single-valued: exactly 1 non-null value
          – multi-valued: 0 or more possibly null values
      • Values
          – are immutable & can be reused
          – have an implicit type
          – are accessed by desired type with auto-conversion; e.g., value.getString()
  • 16. Session
      An authenticated connection to a repository, used to access a single workspace.
      • Authenticated and authorized
          – only sees content authorized by credentials
          – only changes content authorized by credentials
          – use the built-in auth service or integrate with your own
      • Stateful
          – changes are kept in the session’s transient state until the session is saved
          – changes can be dropped without saving (e.g., “refreshing the session”)
      • Lightweight
          – intended to be created, used, then closed
          – pooling sessions is more trouble than it’s worth
      • Self-contained
          – exposed objects are tied to the session; can’t be shared with others
  • 17. With or without schema
      • Choose how much schema is enforced
          – define patterns for values and structure
          – use different patterns for different parts of the database
          – change the patterns over time
          – use the “best” levels of schema validation
          – evolve as necessary
      [Diagram labels: STRICT ENFORCEMENT, NO ENFORCEMENT]
  • 18. Queries
      • Find the data independently of the hierarchy
      • SQL-like language (including full-text search)
      • Examples

          SELECT * FROM [car:Car]
          WHERE [car:model] LIKE '%Toyota%' AND [car:year] >= 2006

          SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
          WHERE PATH() LIKE $path

          SELECT [jcr:primaryType],[jcr:created],[jcr:createdBy] FROM [nt:file]
          WHERE PATH() IN (
              SELECT [vdb:originalFile] FROM [vdb:virtualDatabase]
              WHERE [vdb:version] <= $maxVersion
              AND CONTAINS([vdb:description],'xml OR xml maybe')
          )

          SELECT file.*,content.*
          FROM [nt:file] AS file JOIN [nt:resource] AS content ON ISCHILDNODE(content,file)
          WHERE file.[jcr:path] LIKE '/files/q*.2.vdb'
  • 19. Sequencing
      • Automatically extract structured content
          – just write BINARY or STRING property values on nodes, then save
          – sequencers run asynchronously based upon path rules & MIME types
          – output stored in repository at configurable location
      • Sequencers
          – DDL (variety)
          – text (fixed width, delimited)
          – Microsoft Office™
          – Java (source & class)
          – ZIP (and JAR/WAR/EAR)
          – XML, XSD, and WSDL
          – Teiid VDBs
          – audio (MP3)
          – images
          – CND
          – custom
      [Diagram steps: 1) upload; 2) notify Sequencers; 3) derive and store; 4) navigate or query]
  • 20. Federation
      • Access data in external systems
          – external data projected as nodes with properties and node types
          – supports read and optional write with same validation rules
          – transparent to applications
      • Connector options
          – File system
          – Local git
          – CMIS repository
          – custom
          – (more are planned)
      [Diagram labels: External source A, External source B]
  • 21. Other features
      • Events
          – register listeners to be notified of changes in content
          – optional criteria limits what listeners are interested in
      • Versioning
          – checkin/checkout nodes & subtrees
          – branch, merge, restore
      • Locking
          – short-lived locks (longer than transaction scope)
      • Namespace management
          – programmatically (un)register namespaces
      • Node type management
          – programmatically/declaratively define or update node types
      • Monitoring
          – statistics for a variety of metrics
  • 23. Java API
      • Standard Java API (JSR-283)
          – javax.jcr packages
          – programmatically access, find, update, query content
          – commonly needed features: events, versioning, etc.
          – 95% of API
      • ModeShape extensions
          – additional node type management methods
          – additional event types
          – additional Binary value methods (hash)
          – additional JCR-QOM language objects
          – cancel queries
          – sequencer and text extraction SPIs
          – monitoring API
  • 24. Other APIs
      • JDBC driver
          – connect to local or remote repository
          – execute queries
          – access database metadata
          – enables existing applications to access content
      • RESTful API
          – POST, PUT, GET, DELETE methods
          – JSON representations of one or multiple nodes
          – Streams large binary values
          – Execute queries
      • WebDAV API
          – Exposes content as files and directories
          – Mount repository using file system
  • 25. ModeShape
      An open source, elastic, in-memory, hierarchical database with queries, transactions, events & more
  • 26. Elastic
      • Add more processes to increase storage capacity and/or throughput
          – Transparent to applications!
          – No master, no slaves
          – Data is rebalanced as needed
          – Optionally separate database engine from storage processes
      • Fault tolerant
          – Processes can fail without loss of data
          – Cross-data center distribution (in near future)
  • 27. In-memory
      • Memory is really fast (and cheap)
      • Why not keep all data in application memory?
          – practical limits to memory on particular machines
          – memory isn’t shared between machines
          – data stored in memory isn’t durable
          – no queries, structure, or transactions
      • ModeShape
          – distributes multiple copies of data across the combined memory of many machines
          – persists data to disk or DB (if really needed)
          – transparent to applications
  • 28. Large single- or multi-site cluster
      [Diagram labels: ModeShape (×4), events, Infinispan data grid, data]
  • 29. Strongly consistent
      • ACID
          – Atomic, Consistent, Isolated, Durable
          – Already familiar to most developers
          – Easy to reason about code
          – Writes don’t block reads (MVCC)
          – Writes to one node don’t block writes to others
      • JTA
          – Will participate in user transactions
          – Works with Java EE
  • 30. Why not eventually-consistent?
      • In eventually-consistent databases
          – changes made by one client will eventually (but not immediately) be propagated to all processes
          – other clients won’t see latest data right away, yet can still make other changes
          – there may be multiple versions of a particular piece of data
      • Can be ideal for some scenarios
          – read-heavy and/or best-effort
      • Applications that update data may need to
          – expect inconsistencies (and/or multiple versions)
          – specify conflict strategies
          – resolve conflicts (inconsistencies)
  • 33. Small cluster
      [Diagram labels: ModeShape with Infinispan cache (replicated) (×3), Persistent Store, data, events]
  • 34. Moderate single- or multi-site cluster
      [Diagram labels: ModeShape with Infinispan (distributed) (×4), data, events]
  • 36. Best practices (1 of 2)
      • Build structure first, then node types
          – most important to get your node structure right
          – it will change over time anyway, so don’t define the node types too soon
      • Prefer hierarchies
          – moderate numbers of child nodes, use multiple levels if necessary
      • Limit use of same-name-siblings
          – useful when required, but can be expensive and difficult to use (i.e., paths change)
      • Use mixin node types and mixins
          – where possible define sets of properties as mixins
          – use in primary types and dynamically add to nodes
      • Store files and folders with ‘nt:file’ and ‘nt:folder’
          – use them wherever appropriate; not for all binary data, though!
      • Verify which JCR features are enabled
          – improves portability and safety with configuration changes
      • Import and export
          – avoid document view; use system view wherever possible
  • 37. Best practices (2 of 2)
      • Prefer JCR-SQL2 and JCR-QOM over other query languages
          – by far the richest and most useful
          – do this even when it appears the queries are more complicated
      • Only Repository is thread-safe; no other APIs are
          – don’t share sessions
          – don’t share anything between sessions
      • Register all listeners in special long-lived sessions
          – do nothing else with these sessions, however (Session is not thread-safe)
          – get off the notification thread ASAP, using work queues where necessary
      • Create new sessions rather than reusing a pool of sessions
          – Sessions are intended to be as lightweight as possible
          – Create a session, use it, log out (even web applications and services!)
      • Avoid deprecated APIs
          – they either perform poorly or are a bad idea; besides, they’ll be removed eventually
      • Use Session.save() not Node.save()
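The advice to get off the notification thread quickly can be sketched with a work queue. This is a minimal, stdlib-only illustration and not JCR code: `QueuedListenerSketch`, its `onEvent(String)` callback and the string payload are hypothetical stand-ins for a real `javax.jcr.observation.EventListener`, and the single worker thread is an assumption chosen so that events are handled in arrival order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy listener that hands each event to a work queue instead of processing it inline. */
final class QueuedListenerSketch {
    // One worker thread: queued events are processed in the order they arrived.
    private final ExecutorService workers = Executors.newSingleThreadExecutor();
    private final List<String> processed = new CopyOnWriteArrayList<>();

    /** Called on the notification thread: enqueue the work and return immediately. */
    void onEvent(String changedPath) {
        workers.execute(() -> process(changedPath));
    }

    /** Runs on the worker thread; slow work belongs here, not in onEvent. */
    private void process(String changedPath) {
        processed.add("handled " + changedPath);
    }

    /** Stops accepting events, waits for queued work to finish, and returns the results. */
    List<String> shutdownAndGetProcessed() {
        workers.shutdown();
        try {
            if (!workers.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Queued events did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for queued events", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        QueuedListenerSketch listener = new QueuedListenerSketch();
        listener.onEvent("/facilities");
        listener.onEvent("/equipment");
        System.out.println(listener.shutdownAndGetProcessed());
    }
}
```

In a real application the worker would open its own session for any repository access, since the slide notes that sessions must not be shared between threads.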
  • 38. Want more ModeShape?
      • Project: http://modeshape.org
      • Blog: http://modeshape.wordpress.com
      • Twitter: @modeshape
      • IRC: #modeshape (irc.freenode.org)
      • Code: http://github.com/modeshape