SlideShare a Scribd company logo
1 of 69
TRANSACTIONS OVER HBASE
Alex Baranau @abaranau	

Gary Helmling @gario	

!
Continuuity
WHO WE ARE
• We’ve built Continuuity Reactor: the world’s first scale-out
application server for Hadoop
• Fast, easy development, deployment and management of
Hadoop and HBase apps
• Continuuity team has years of experience in using and contributing
to Open Source, and we intend to continue doing so.
AGENDA
• Transactions over HBase: Why? What?
• Implementation: How?
• The approach
• Transaction Manager
• Transaction in batch jobs
THE REACTOR
• Continuuity Reactor is an app platform built on Hadoop and HBase
• Collect, Process, Store, and Query data.
• A Flow is a real-time processor with exactly-once guarantee
• A flow is composed of flowlets, connected via queues
• All processing happens with ACID guarantees in transactions
HBase
Table
PROCESSING IN A FLOW
...Queue ...
...
Flowlet
... ...
HBase
Table
PROCESSING IN A FLOW
...Queue ...
...
Flowlet
... ...
HBase
Table
WRAP IN TRANSACTION!
...Queue ...
...
Flowlet
TRANSACTIONS: WHAT?
• Atomic - Entire transaction is committed as one
• Consistent - No partial state change due to failure
• Isolated - No dirty reads, transaction is only visible
after commit
• Durable - Once committed, data is persisted reliably
HBASE: QUICK “WHAT?”
• Open-source non-relational distributed column-
oriented database modeled after Google’s BigTable
• Data is stored in named Tables of Rows
• Row consists of Cells:

(row key, column family, column, timestamp) -> value
• Table is split into Regions, each served by a single node
in HBase cluster
WHAT ABOUT HBASE?
• Atomic operations on cell value: 

checkAndPut, checkAndDelete, increment, append
• Atomic batch of operations on rows within region
• No cross region atomic operations support
• No cross table atomic operations support
• No multi-RPC atomic operations support
TRANSACTIONS: WHY ELSE?
• Complex data access patterns
• Secondary Indexes
• Design for greater performance
• Efficient batching of data operations
• Client-side buffer reliability fix
IMPLEMENTATION
• Multi-Version Concurrency Control
• Cell version (timestamp) = transaction ID
• All writes in the same transaction use the transaction ID as timestamp
• Reads exclude other, uncommitted transactions (for isolation)
• Optimistic Concurrency Control
• Conflict detection at commit of transaction
• Write Conflict: two overlapping transactions write the same row
• Rollback of one transaction in case of conflict (whichever commits later)
OPTIMISTIC CONCURRENCY
CONTROL
• Avoids cost of locking rows and tables
• No deadlocks or lock escalations
• Cost of conflict detection and possible rollback
is higher
• Good if conflicts are rare: short transaction,
disjoint partitioning of work
ZooKeeper
TRANSACTIONS IN CONTEXT
Tx Manager	

(standby)
HBase
Master 1
Master 2	

RS 1
RS 2 RS 4
RS 3
Client 1
Client 2
Client N
Tx Manager	

(active)
TRANSACTION LIFE CYCLE
time
out
try abort
failed
roll back
in HBase
write
to
HBase
do work
Client Tx Manager
none
complete V
abortsucceeded
in progress
start tx
start
start tx
commit
try commit check conflicts
RPC API
invalid X
invalidate
failed
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
Client 2
write = 1002
read = 1001
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
start
write = 1002
read = 1001
Client 2
write = 1002
read = 1001
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
start
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
Tx Manager
Client 1
increment
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
Tx Manager
Client 1
increment
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
Tx Manager
Client 1 start
write = 1002
read = 1001
Client 2
write = 1003
read = 1001
inprogress=[1002]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
Tx Manager
Client 1 start
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002, 1003]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
increment
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002, 1003]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
commit
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002, 1003]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
commit
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
write = 1003
read = 1001
excluded=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
commit
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
conflict!
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1 rollback
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
abort
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
abort
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
abort
write = 1002
read = 1001
Client 2
write = 1004
read = 1003
inprogress=[]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1 start
Client 2
write = 1005
read = 1003
inprogress=[]
write = 1004
read = 1003
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1003 11
Tx Manager
Client 1
read
Client 2
write = 1005
read = 1003
inprogress=[]
write = 1004
read = 1003
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
conflict!
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback failed
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
invalidate
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
invalidate
write = 1002
read = 1001
Client 2
write = 1004
read = 1003
inprogress=[]
invalid=[1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 start
Client 2
write = 1005
read = 1003
inprogress=[]
invalid=[1002]
write = 1004
read = 1003
exclude = [1002]
HBase
CLIENT SIDE: TX AWARE
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1
read
Client 2
write = 1005
read = 1003
inprogress=[]
invalid=[1002]
write = 1004
read = 1003
exclude = [1002]
invisible!
TRANSACTION MANAGER
• Create new transactions
• Provides monotonically increasing write pointers
• Maintains all in-progress, committed, and invalid transactions
• Detect conflicts
• Transaction =
	 	 	 Write Pointer: Timestamp for HBase writes
	 	 	 Read pointer: Upper bound timestamp for reads
	 	 	 Excludes: List of timestamps to exclude from reads
TRANSACTION MANAGER
• Simple & Fast
• All required state is in-memory
• Single point of failure?
• Persist all state to a write-ahead log
• Secondary Tx Manager watches for failure of Primary
• Failover can happen quickly
TRANSACTION MANAGER
Tx Manager
Current State
in progress
committed
invalid
read point
write point
start()
TRANSACTION MANAGER
Tx Manager
Current State
in progress (+)
committed
invalid
read point
write point ++
start()
Tx Log
started, <write pt>
HDFS
TRANSACTION MANAGER
Tx Manager
Current State
in progress (-)
committed (+)
invalid
read point
write point
commit()
Tx Log
start, <write pt>
commit, <write pt>
HDFS
TRANSACTION SNAPSHOTS
• Write-ahead log provides persistence
• Guarantees point-in-time recovery
• Longer the log grows, longer recovery takes
• Periodically write snapshot of full transaction state
• Snapshot + all new logs provides full state
Tx Manager
Current State
TRANSACTION SNAPSHOTS
Tx Log A
in progress
committed
invalid
read point
write point
HDFS
Tx Manager
Current State
TRANSACTION SNAPSHOTS
Tx Log A
in progress
committed
invalid
read point
write point Tx Log B1
HDFS
TRANSACTION SNAPSHOTS
Tx Log ATx Manager
in progress
committed
invalid
read point
write point
Current State
State Snapshot
in progress
committed
invalid
read point
write point
Tx Log B2
HDFS
TRANSACTION SNAPSHOTS
Tx Log ATx Manager
in progress
committed
invalid
read point
write point
Current State
State Snapshot
in progress
committed
invalid
read point
write point
Tx Log B
Tx Snapshot
in progress
committed
invalid
read point
write point
3
HDFS
TRANSACTION SNAPSHOTS
Tx Log ATx Manager
in progress
committed
invalid
read point
write point
Current State
State Snapshot
in progress
committed
invalid
read point
write point
Tx Log B
Tx Snapshot
in progress
committed
invalid
read point
write point
4
HDFS
HBase
TRANSACTION CLEANUP
Cell TS Value
row1:col1 1001 10
row1:col1 1002 11
row1:col1 1003 11
Tx Manager
Client 1 rollback failed
write = 1002
read = 1001
Client 2
write = 1004
read = 1001
inprogress=[1002]
TRANSACTION CLEANUP:
DATA JANITOR
• RegionObserver coprocessor
• Maintains in-memory snapshot of recent invalid &
in-progress sets
• Periodically updates from transaction snapshot in
HDFS
• Purges data from invalid transactions and older
versions on flush & compaction
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
refresh
Data Janitor	

(RegionObserver)
MemStore
preFlush()
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
Custom RegionScanner
HFile
Cell TS Value
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
HFile
Cell TS Value
Custom RegionScanner
MemStore
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
HFile
Cell TS Value
row1:col1 1004 12Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
1003 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
Custom RegionScanner
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
MemStore
preFlush()
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
1003 11
HBase
TRANSACTION CLEANUP:
DATA JANITOR
Data Janitor	

(RegionObserver)
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Tx Snapshot
read point = 1003
write point = 1005
in progress = [1004]
committed = []
invalid = [1002]
Custom RegionScanner
preFlush()
MemStore
Cell TS Value
row1:col1 1004 12
1003 11
1002 11
HFile
Cell TS Value
row1:col1 1004 12
1003 11
REDUNDANT VERSIONS CLEANUP
• Every write is new version of a cell

• Heavily updated values (e.g. counters) result
in lots of versions
• Cleanup: keep at most one version of a
cell visible to all transactions
• Plug-in cleanup into existing Data Janitor CP
TTL
• Timestamp of a value is overwritten
with tx id
• Breaks support for “native TTL”
feature in HBase
• Tx ids and timestamps cannot be
aligned: more than 1K tx/sec wanted
TTL
• Encode timestamp in tx id
• tx id = (current ts) * 1,000,000 + counter-
within-ms
• 1B txs/sec max
• long value overflow in ~200 years
• Plug-in cleanup into existing Data Janitor CP
• Apply TTL logic on reads
TRANSACTIONS IN BATCH
• Batch jobs are long running
• No changes should be visible before job is done
• Batch jobs are multi-step
• Individual step can be restarted
• All steps succeed or all fail semantics needed
• Every step retry should be isolated
• Uncommitted changes should be visible within same
step but not to other steps
TRANSACTIONS IN BATCH
• Batch jobs are long-running
• No fine-grained conflict detection: last started
among successfully committed wins
• Delayed rollback execution
• Batch jobs are multi-step
• For now: batch job uses single transaction, every
step re-uses it
• For future: nested transactions for steps
TX OVERHEAD
• 2 RPC calls, each faster than Put in HBase
• Can be improved with batching
• To avoid overhead of transactions where
appropriate the ACID guarantees can be relaxed
• No need to change data format
• When reading: read latest version of a cell
• When writing: use ts = (current ts) * 1M + counter
WHAT’S NEXT?
• Open Source
• Continue Scaling Tx Manager
• Transaction Groups?
• Integration across other transactional stores
QS?
Looking for the chance to work with a team that is defining
a new category within Big Data?
!
We are hiring!
http://continuuity.com/careers
careers[at]continuuity.com
!
!
Alex Baranau @abaranau alexb[at]continuuity.com

More Related Content

Similar to Transactions Over Apache HBase

HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon
 
LeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityLeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityRicardo Jimenez-Peris
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementRicardo Jimenez-Peris
 
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docxRunning Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docxjoellemurphey
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast DataPatrick McFadin
 
Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013Prosanta Ghosh
 
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...Ishan Thakkar
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0HBaseCon
 
Multi-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocketMulti-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocketStéphane Maldini
 
Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Eran Harel
 
Databases: Concurrency Control
Databases: Concurrency ControlDatabases: Concurrency Control
Databases: Concurrency ControlDamian T. Gordon
 
transactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptxtransactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptxssusereced02
 
Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0StreamNative
 
5 application serversforproject
5 application serversforproject5 application serversforproject
5 application serversforprojectashish61_scs
 
VMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the UniverseVMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the UniverseVMworld
 

Similar to Transactions Over Apache HBase (20)

HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
 
LeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityLeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo University
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional Management
 
Transaction
TransactionTransaction
Transaction
 
basil.pptx
basil.pptxbasil.pptx
basil.pptx
 
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docxRunning Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
Running Head CLIENT SERVER AUTHENTICATIONHANDLING CONCURRENT CL.docx
 
Consul scale
Consul scaleConsul scale
Consul scale
 
Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
 
Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013Dbms ii mca-ch9-transaction-processing-2013
Dbms ii mca-ch9-transaction-processing-2013
 
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
Massed Refresh: An Energy-Efficient Technique to Reduce Refresh Overhead in H...
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
Multi-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocketMulti-service reactive streams using Spring, Reactor, RSocket
Multi-service reactive streams using Spring, Reactor, RSocket
 
Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)Service discovery like a pro (presented at reversimX)
Service discovery like a pro (presented at reversimX)
 
Databases: Concurrency Control
Databases: Concurrency ControlDatabases: Concurrency Control
Databases: Concurrency Control
 
transactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptxtransactions-advanced for automatic payment.pptx
transactions-advanced for automatic payment.pptx
 
19.TRANSACTIONs.ppt
19.TRANSACTIONs.ppt19.TRANSACTIONs.ppt
19.TRANSACTIONs.ppt
 
Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0Transaction Support in Pulsar 2.5.0
Transaction Support in Pulsar 2.5.0
 
5 application serversforproject
5 application serversforproject5 application serversforproject
5 application serversforproject
 
Data (1)
Data (1)Data (1)
Data (1)
 
VMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the UniverseVMworld 2013: Extreme Performance Series: vCenter of the Universe
VMworld 2013: Extreme Performance Series: vCenter of the Universe
 

Recently uploaded

Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsResearcher Researcher
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionSneha Padhiar
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTSneha Padhiar
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming languageSmritiSharma901052
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxStephen Sitton
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptJohnWilliam111370
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewsandhya757531
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfBalamuruganV28
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Romil Mishra
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmDeepika Walanjkar
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 

Recently uploaded (20)

Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending Actuators
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based question
 
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming language
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overview
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdf
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 

Transactions Over Apache HBase

  • 1. TRANSACTIONS OVER HBASE Alex Baranau @abaranau Gary Helmling @gario ! Continuuity
  • 2. WHO WE ARE • We’ve built Continuuity Reactor: the world’s first scale-out application server for Hadoop • Fast, easy development, deployment and management of Hadoop and HBase apps • Continuuity team has years of experience in using and contributing to Open Source, and we intend to continue doing so.
  • 3. AGENDA • Transactions over HBase: Why? What? • Implementation: How? • The approach • Transaction Manager • Transaction in batch jobs
  • 4. THE REACTOR • Continuuity Reactor is an app platform built on Hadoop and HBase • Collect, Process, Store, and Query data. • A Flow is a real-time processor with exactly-once guarantee • A flow is composed of flowlets, connected via queues • All processing happens with ACID guarantees in transactions
  • 5. HBase Table PROCESSING IN A FLOW ...Queue ... ... Flowlet ... ...
  • 6. HBase Table PROCESSING IN A FLOW ...Queue ... ... Flowlet ... ...
  • 8. TRANSACTIONS: WHAT? • Atomic - Entire transaction is committed as one • Consistent - No partial state change due to failure • Isolated - No dirty reads, transaction is only visible after commit • Durable - Once committed, data is persisted reliably
  • 9. HBASE: QUICK “WHAT?” • Open-source non-relational distributed column- oriented database modeled after Google’s BigTable • Data is stored in named Tables of Rows • Row consists of Cells:
 (row key, column family, column, timestamp) -> value • Table is split into Regions, each served by a single node in HBase cluster
  • 10. WHAT ABOUT HBASE? • Atomic operations on cell value: 
 checkAndPut, checkAndDelete, increment, append • Atomic batch of operations on rows within region • No cross region atomic operations support • No cross table atomic operations support • No multi-RPC atomic operations support
  • 11. TRANSACTIONS: WHY ELSE? • Complex data access patterns • Secondary Indexes • Design for greater performance • Efficient batching of data operations • Client-side buffer reliability fix
  • 12. IMPLEMENTATION • Multi-Version Concurrency Control • Cell version (timestamp) = transaction ID • All writes in the same transaction use the transaction ID as timestamp • Reads exclude other, uncommitted transactions (for isolation) • Optimistic Concurrency Control • Conflict detection at commit of transaction • Write Conflict: two overlapping transactions write the same row • Rollback of one transaction in case of conflict (whichever commits later)
  • 13. OPTIMISTIC CONCURRENCY CONTROL • Avoids cost of locking rows and tables • No deadlocks or lock escalations • Cost of conflict detection and possible rollback is higher • Good if conflicts are rare: short transaction, disjoint partitioning of work
  • 14. ZooKeeper TRANSACTIONS IN CONTEXT Tx Manager (standby) HBase Master 1 Master 2 RS 1 RS 2 RS 4 RS 3 Client 1 Client 2 Client N Tx Manager (active)
  • 15. TRANSACTION LIFE CYCLE time out try abort failed roll back in HBase write to HBase do work Client Tx Manager none complete V abortsucceeded in progress start tx start start tx commit try commit check conflicts RPC API invalid X invalidate failed
  • 16. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 Client 2 write = 1002 read = 1001
  • 17. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1002 read = 1001
  • 18. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002]
  • 19. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 Tx Manager Client 1 increment write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002]
  • 20. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 Tx Manager Client 1 increment write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002]
  • 21. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1003 read = 1001 inprogress=[1002] write = 1003 read = 1001 excluded=[1002]
  • 22. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 Tx Manager Client 1 start write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002, 1003] write = 1003 read = 1001 excluded=[1002]
  • 23. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 increment write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002, 1003] write = 1003 read = 1001 excluded=[1002]
  • 24. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 commit write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002, 1003] write = 1003 read = 1001 excluded=[1002]
  • 25. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 commit write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002] write = 1003 read = 1001 excluded=[1002]
  • 26. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 commit write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 27. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 conflict! write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 28. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 29. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 rollback write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 30. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 abort write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 31. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 abort write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[]
  • 32. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 abort write = 1002 read = 1001 Client 2 write = 1004 read = 1003 inprogress=[]
  • 33. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 start Client 2 write = 1005 read = 1003 inprogress=[] write = 1004 read = 1003
  • 34. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1003 11 Tx Manager Client 1 read Client 2 write = 1005 read = 1003 inprogress=[] write = 1004 read = 1003
  • 35. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 conflict! write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 36. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 37. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback failed write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 38. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 invalidate write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 39. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 invalidate write = 1002 read = 1001 Client 2 write = 1004 read = 1003 inprogress=[] invalid=[1002]
  • 40. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 start Client 2 write = 1005 read = 1003 inprogress=[] invalid=[1002] write = 1004 read = 1003 exclude = [1002]
  • 41. HBase CLIENT SIDE: TX AWARE Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 read Client 2 write = 1005 read = 1003 inprogress=[] invalid=[1002] write = 1004 read = 1003 exclude = [1002] invisible!
  • 42. TRANSACTION MANAGER • Create new transactions • Provides monotonically increasing write pointers • Maintains all in-progress, committed, and invalid transactions • Detect conflicts • Transaction = Write Pointer: Timestamp for HBase writes Read pointer: Upper bound timestamp for reads Excludes: List of timestamps to exclude from reads
  • 43. TRANSACTION MANAGER • Simple & Fast • All required state is in-memory • Single point of failure? • Persist all state to a write-ahead log • Secondary Tx Manager watches for failure of Primary • Failover can happen quickly
  • 44. TRANSACTION MANAGER Tx Manager Current State in progress committed invalid read point write point start()
  • 45. TRANSACTION MANAGER Tx Manager Current State in progress (+) committed invalid read point write point ++ start() Tx Log started, <write pt> HDFS
  • 46. TRANSACTION MANAGER Tx Manager Current State in progress (-) committed (+) invalid read point write point commit() Tx Log start, <write pt> commit, <write pt> HDFS
  • 47. TRANSACTION SNAPSHOTS • Write-ahead log provides persistence • Guarantees point-in-time recovery • Longer the log grows, longer recovery takes • Periodically write snapshot of full transaction state • Snapshot + all new logs provides full state
  • 48. Tx Manager Current State TRANSACTION SNAPSHOTS Tx Log A in progress committed invalid read point write point HDFS
  • 49. Tx Manager Current State TRANSACTION SNAPSHOTS Tx Log A in progress committed invalid read point write point Tx Log B1 HDFS
  • 50. TRANSACTION SNAPSHOTS Tx Log ATx Manager in progress committed invalid read point write point Current State State Snapshot in progress committed invalid read point write point Tx Log B2 HDFS
  • 51. TRANSACTION SNAPSHOTS Tx Log ATx Manager in progress committed invalid read point write point Current State State Snapshot in progress committed invalid read point write point Tx Log B Tx Snapshot in progress committed invalid read point write point 3 HDFS
  • 52. TRANSACTION SNAPSHOTS Tx Log ATx Manager in progress committed invalid read point write point Current State State Snapshot in progress committed invalid read point write point Tx Log B Tx Snapshot in progress committed invalid read point write point 4 HDFS
  • 53. HBase TRANSACTION CLEANUP Cell TS Value row1:col1 1001 10 row1:col1 1002 11 row1:col1 1003 11 Tx Manager Client 1 rollback failed write = 1002 read = 1001 Client 2 write = 1004 read = 1001 inprogress=[1002]
  • 54. TRANSACTION CLEANUP: DATA JANITOR • RegionObserver coprocessor • Maintains in-memory snapshot of recent invalid & in-progress sets • Periodically updates from transaction snapshot in HDFS • Purges data from invalid transactions and older versions on flush & compaction
  • 55. HBase TRANSACTION CLEANUP: DATA JANITOR Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] refresh Data Janitor (RegionObserver) MemStore preFlush() read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Cell TS Value row1:col1 1004 12 1003 11 1002 11 Custom RegionScanner HFile Cell TS Value
  • 56. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) HFile Cell TS Value Custom RegionScanner MemStore read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11
  • 57. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) HFile Cell TS Value row1:col1 1004 12Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11
  • 58. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12
  • 59. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12 1003 11
  • 60. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) Custom RegionScanner read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] MemStore preFlush() Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12 1003 11
  • 61. HBase TRANSACTION CLEANUP: DATA JANITOR Data Janitor (RegionObserver) read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Tx Snapshot read point = 1003 write point = 1005 in progress = [1004] committed = [] invalid = [1002] Custom RegionScanner preFlush() MemStore Cell TS Value row1:col1 1004 12 1003 11 1002 11 HFile Cell TS Value row1:col1 1004 12 1003 11
  • 62. REDUNDANT VERSIONS CLEANUP • Every write is new version of a cell • Heavily updated values (e.g. counters) result in lots of versions • Cleanup: keep at most one version of a cell visible to all transactions • Plug-in cleanup into existing Data Janitor CP
  • 63. TTL • Timestamp of a value is overwritten with tx id • Breaks support for “native TTL” feature in HBase • Tx ids and timestamps cannot be aligned: more than 1K tx/sec wanted
  • 64. TTL • Encode timestamp in tx id • tx id = (current ts) * 1,000,000 + counter- within-ms • 1B txs/sec max • long value overflow in ~200 years • Plug-in cleanup into existing Data Janitor CP • Apply TTL logic on reads
  • 65. TRANSACTIONS IN BATCH • Batch jobs are long running • No changes should be visible before job is done • Batch jobs are multi-step • Individual step can be restarted • All steps succeed or all fail semantics needed • Every step retry should be isolated • Uncommitted changes should be visible within same step but not to other steps
  • 66. TRANSACTIONS IN BATCH • Batch jobs are long-running • No fine-grained conflict detection: last started among successfully committed wins • Delayed rollback execution • Batch jobs are multi-step • For now: batch job uses single transaction, every step re-uses it • For future: nested transactions for steps
  • 67. TX OVERHEAD • 2 RPC calls, each faster than Put in HBase • Can be improved with batching • To avoid overhead of transactions where appropriate the ACID guarantees can be relaxed • No need to change data format • When reading: read latest version of a cell • When writing: use ts = (current ts) * 1M + counter
  • 68. WHAT’S NEXT? • Open Source • Continue Scaling Tx Manager • Transaction Groups? • Integration across other transactional stores
  • 69. QS? Looking for the chance to work with a team that is defining a new category within Big Data? ! We are hiring! http://continuuity.com/careers careers[at]continuuity.com ! ! Alex Baranau @abaranau alexb[at]continuuity.com