Relaxed Transactions for HBase
Francis Liu, Software Engineer
5/22/12
Mutable Data

• Writes are effective immediately

[Figure: Read_Job's map tasks (Map1, Map2, Map3) read Table1, which holds C1=1, C2=2, C3=3. While Read_Job is running, Write_Job (Job1, with its own Map1, Map2, Map3) changes C3 to 4, and Read_Job can observe the new value as soon as it is written.]
Mutable Data

• Partial writes in the midst of failures

[Figure: Write_Job's map tasks (Map1, Map2, Map3) write to Table1. C1=1 and C2=2 have been written, but after a failure the state of C3 is unknown (C3=?). A Read_Job started afterwards reads this partially written table.]
Revision Manager

• Optimized for batch processing
  › Large numbers of writes (e.g. data ingestion, batch updates)
• Cross-row write transactions within a table
• Coprocessor Endpoint
  › Leverages HBase security
• ZooKeeper for persistence
  › Table revision information
• Experimental feature in HCatalog 0.4
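Architecture

[Figure: InputFormat/OutputFormat use the Revision Mgr Client, which calls the Revision Mgr Service, a coprocessor running on the RegionServer; the service persists table revision information in ZooKeeper.]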
API

• For reads
  › RevisionManager.createSnapshot(tableName)
  › SnapshotFilter.filter(result)

• For writes
  › RevisionManager.beginWriteTransaction(table, families)
  › RevisionManager.commitWriteTransaction(transaction)
  › RevisionManager.abortWriteTransaction(transaction)
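The slide lists method names only. The Java sketch below mirrors them in an interface and shows the intended call pattern for a batch write: begin a transaction, perform the writes, then commit on success or abort on failure so that partial writes never become visible to readers. The parameter and return types, the placeholder `Transaction` and `TableSnapshot` classes, `WriteJobs.runInTransaction` and `RecordingRevisionManager` are all assumptions for illustration, not HCatalog's published signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Placeholder types (assumed); the real classes carry more state.
class Transaction {
    final String table;
    final List<String> families;
    final long revision;

    Transaction(String table, List<String> families, long revision) {
        this.table = table;
        this.families = families;
        this.revision = revision;
    }
}

class TableSnapshot {
    final String table;

    TableSnapshot(String table) {
        this.table = table;
    }
}

// Method names taken from the API slide; the signatures are assumed.
interface RevisionManager {
    TableSnapshot createSnapshot(String tableName);
    Transaction beginWriteTransaction(String table, List<String> families);
    void commitWriteTransaction(Transaction transaction);
    void abortWriteTransaction(Transaction transaction);
}

final class WriteJobs {
    // Hypothetical helper: commit only if every write succeeded, otherwise abort.
    static void runInTransaction(RevisionManager rm, String table, List<String> families,
                                 Consumer<Transaction> writes) {
        Transaction txn = rm.beginWriteTransaction(table, families);
        try {
            writes.accept(txn); // each Put would use txn.revision as its cell version
        } catch (RuntimeException e) {
            rm.abortWriteTransaction(txn); // the revision ends up in the aborted list
            throw e;
        }
        rm.commitWriteTransaction(txn);
    }
}

// Hypothetical stub that only records the calls it receives.
class RecordingRevisionManager implements RevisionManager {
    final List<String> calls = new ArrayList<>();
    private long nextRevision = 1;

    public TableSnapshot createSnapshot(String tableName) {
        calls.add("snapshot");
        return new TableSnapshot(tableName);
    }

    public Transaction beginWriteTransaction(String table, List<String> families) {
        calls.add("begin");
        return new Transaction(table, families, nextRevision++);
    }

    public void commitWriteTransaction(Transaction t) {
        calls.add("commit:" + t.revision);
    }

    public void abortWriteTransaction(Transaction t) {
        calls.add("abort:" + t.revision);
    }
}
```

Running one job whose writes succeed and then one whose writes throw records the calls `begin`, `commit:1`, `begin`, `abort:2`.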
Concepts

• Revision
  › Monotonically increasing number
  › All "Puts" of a job are written with the same revision number as the cell version

• TableSnapshot
  › Point-in-time consistent view of a table
  › Used for reading
  › Latest committed revision
  › List of aborted revisions
  › Upper bound on visible revision per column family (CF)

• Transaction
  › Write transaction
  › Revision number
  › List of column families being written to
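The TableSnapshot fields above suggest a simple visibility test for a cell version. A minimal Java sketch, assuming the snapshot keeps the per-family bound and the aborted list as plain collections; the class layouts, the `isVisible` method, and the fallback to the latest committed revision for families without an explicit bound are assumptions, not HCatalog's actual classes.

```java
import java.util.Map;
import java.util.Set;

// Sketch of a TableSnapshot: per-CF upper bound plus the table's aborted revisions.
final class TableSnapshot {
    private final long latestCommittedRevision;
    private final Map<String, Long> maxVisibleRevisionPerFamily;
    private final Set<Long> abortedRevisions;

    TableSnapshot(long latestCommittedRevision,
                  Map<String, Long> maxVisibleRevisionPerFamily,
                  Set<Long> abortedRevisions) {
        this.latestCommittedRevision = latestCommittedRevision;
        this.maxVisibleRevisionPerFamily = Map.copyOf(maxVisibleRevisionPerFamily);
        this.abortedRevisions = Set.copyOf(abortedRevisions);
    }

    // A version is visible if it is at or below its family's bound and was not aborted.
    boolean isVisible(String family, long revision) {
        Long bound = maxVisibleRevisionPerFamily.get(family);
        // Assumption: a family without an explicit bound uses the latest committed revision.
        long limit = (bound != null) ? bound : latestCommittedRevision;
        return revision <= limit && !abortedRevisions.contains(revision);
    }
}

// Sketch of a write Transaction: one revision shared by every Put of the job,
// plus the column families the job writes to.
final class Transaction {
    final long revision;
    final Set<String> families;

    Transaction(long revision, Set<String> families) {
        this.revision = revision;
        this.families = Set.copyOf(families);
    }
}
```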
Relaxed Transaction Properties

• Immutable Input
• Change After Commit
• Precedence Preservation
Immutable Input

[Figure: Consistent read. Read_Job1 takes Snapshot1 while CellA=1. Write_Job1 then begins transaction t1, writes CellA=2 and commits. Every read by Read_Job1, before and after t1 commits, returns CellA=1.]
Change After Commit

• Revisions are only viewable after commit
  › A job cannot see its own writes
• Aborted revisions are added to the table's aborted list
• Timed-out revisions are aborted
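The lifecycle rules above can be sketched with an in-memory tracker. This is not the ZooKeeper-backed implementation; `LocalRevisionTracker`, its methods, and the explicit `nowMillis` clock argument (used so the timeout behaviour is deterministic) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory tracker; the real Revision Manager persists this state in ZooKeeper.
final class LocalRevisionTracker {
    private final long timeoutMillis;
    private long nextRevision = 1;                                // revisions increase monotonically
    private long latestCommitted = 0;
    private final Map<Long, Long> activeSince = new HashMap<>(); // revision -> begin time
    private final Set<Long> aborted = new TreeSet<>();           // the table's aborted list

    LocalRevisionTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    long begin(long nowMillis) {
        long revision = nextRevision++;
        activeSince.put(revision, nowMillis);
        return revision;
    }

    void commit(long revision, long nowMillis) {
        expireTimedOut(nowMillis);
        if (activeSince.remove(revision) == null) {
            throw new IllegalStateException("revision " + revision + " is not active");
        }
        latestCommitted = Math.max(latestCommitted, revision);
    }

    void abort(long revision) {
        if (activeSince.remove(revision) != null) {
            aborted.add(revision);
        }
    }

    // Timed-out revisions are aborted: moved from the active set to the aborted list.
    void expireTimedOut(long nowMillis) {
        activeSince.entrySet().removeIf(entry -> {
            boolean expired = nowMillis - entry.getValue() > timeoutMillis;
            if (expired) {
                aborted.add(entry.getKey());
            }
            return expired;
        });
    }

    // Only committed revisions are viewable; an active revision, including the
    // caller's own, is never visible.
    boolean isViewable(long revision) {
        return revision <= latestCommitted
                && !aborted.contains(revision)
                && !activeSince.containsKey(revision);
    }

    Set<Long> abortedRevisions() {
        return new TreeSet<>(aborted);
    }
}
```

In this sketch a timeout is detected lazily, when `commit` or `expireTimedOut` runs; a long-running service would more likely sweep expired transactions on a schedule.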
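Change After Commit

[Figure: Write_Job1 begins transaction t1 and writes CellA=2. Read_Job1 takes Snapshot1 before t1 commits and reads CellA=1. After t1 commits, Read_Job2 takes Snapshot2 and reads CellA=2: t1's change becomes readable only after its commit.]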
Precedence Preservation

• Snapshot isolation
  › A transaction is aborted when a write conflict is detected
• Conflicts
  › Concurrent transactions writing to the same column family
  › Aborting is inefficient for large batch jobs
• Resolved at read time instead
  › For every CF:
    – find min_rev = min(active revisions)
    – return only the newest revision older than min_rev
  › min_rev is what is stored in a snapshot
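The read-time rule can be sketched as a per-family bound computed when a snapshot is created. This sketch caps each family at `min_rev - 1`, so only revisions older than the oldest active transaction can be returned; `SnapshotBounds` and its arguments are hypothetical. Replaying the Precedence Preservation example that follows, with t1 as revision 1 (still active), t2 as revision 2 (committed), and the initial CellA=1/CellB=1 assumed to be at revision 0, the bound for the shared family is 0, so the reader sees only the initial values.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: computes the per-column-family upper bound stored in a snapshot.
final class SnapshotBounds {
    static Map<String, Long> compute(long latestCommitted,
                                     Map<String, List<Long>> activeRevisionsByFamily,
                                     List<String> families) {
        Map<String, Long> bounds = new HashMap<>();
        for (String family : families) {
            List<Long> active = activeRevisionsByFamily.getOrDefault(family, List.of());
            long bound = latestCommitted;
            if (!active.isEmpty()) {
                // min_rev = min(active revisions) for this family
                long minRev = active.stream().mapToLong(Long::longValue).min().getAsLong();
                // Later revisions, even committed ones, stay hidden until min_rev finishes.
                bound = Math.min(bound, minRev - 1);
            }
            bounds.put(family, bound);
        }
        return bounds;
    }
}
```

Once t1 commits, the family has no active revisions and its bound returns to the latest committed revision, so t1's and t2's writes become visible together, with t2's newer CellA=3 taking precedence over t1's CellA=2.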
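Precedence Preservation

[Figure: Initially CellA=1 and CellB=1. Write_Job1 begins t1 and writes CellA=2 and CellB=2, but commits only later. Meanwhile Write_Job2 begins t2, writes CellA=3 and commits. Read_Job1 then takes Snapshot1 and reads CellA=1, CellB=1: t2's committed change is not visible because the earlier transaction t1 is still active.]

* CellA and CellB are members of the same column family.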
Snapshot Filter

• Consumes a TableSnapshot
• Filters at read time:
  › Aborted revisions
  › Revisions written after the snapshot was taken
  › Conflicting/blocked revisions
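The three filtering rules can be sketched over a row represented as a list of cells. `Cell` and `SnapshotFilterSketch` are hypothetical stand-ins for HBase's result types and HCatalog's SnapshotFilter; the sketch assumes a family missing from the snapshot is hidden and that a fully filtered row is reported as `null`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an HBase cell; version holds the writer's revision.
final class Cell {
    final String family;
    final String qualifier;
    final long version;
    final String value;

    Cell(String family, String qualifier, long version, String value) {
        this.family = family;
        this.qualifier = qualifier;
        this.version = version;
        this.value = value;
    }
}

// Sketch of read-time filtering against a snapshot (names are hypothetical).
final class SnapshotFilterSketch {
    private final Map<String, Long> familyBounds; // per-CF upper bound from the snapshot
    private final Set<Long> abortedRevisions;

    SnapshotFilterSketch(Map<String, Long> familyBounds, Set<Long> abortedRevisions) {
        this.familyBounds = familyBounds;
        this.abortedRevisions = abortedRevisions;
    }

    // Keeps the newest allowed version of each column; returns null if nothing survives.
    List<Cell> filter(List<Cell> row) {
        Map<String, Cell> newest = new LinkedHashMap<>();
        for (Cell cell : row) {
            Long bound = familyBounds.get(cell.family);
            boolean hidden = bound == null                      // family not in the snapshot
                    || cell.version > bound                     // written after the snapshot, or blocked
                    || abortedRevisions.contains(cell.version); // aborted revision
            if (hidden) {
                continue;
            }
            String column = cell.family + ":" + cell.qualifier;
            Cell current = newest.get(column);
            if (current == null || cell.version > current.version) {
                newest.put(column, cell);
            }
        }
        return newest.isEmpty() ? null : new ArrayList<>(newest.values());
    }
}
```

For example, with a bound of 5 on `info` and revision 4 aborted, a row holding `info:gpa` at versions 3, 4 and 6 plus `info:name` at version 2 keeps only `gpa` version 3 and `name` version 2.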
Flow - Read

• User/Client
  › RevisionManager.createSnapshot()
    – The TableSnapshot instance is serialized into the JobConf

• RecordReader
  › Filters each result using SnapshotFilter.filter(result)
Flow - Read
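
[Sequence diagram: participants SnapshotRecordReader, SnapshotFilter and ScannerIterator. On next(key, value), SnapshotRecordReader loops while result != null and filtered == null: it calls next() on ScannerIterator to get the next result, then calls filter(result) on SnapshotFilter to get the filtered result. It then returns the next record.]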
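The loop in the sequence diagram can be sketched as follows. The scanner is modelled as a plain `Iterator` and the snapshot filter as a function that returns `null` for rows with no visible cells; `SnapshotRecordReaderSketch` and both stand-ins are assumptions, not HCatalog types.

```java
import java.util.Iterator;
import java.util.function.UnaryOperator;

// Sketch of the SnapshotRecordReader read loop (names and types are hypothetical).
final class SnapshotRecordReaderSketch<R> {
    private final Iterator<R> scanner;
    private final UnaryOperator<R> snapshotFilter;

    SnapshotRecordReaderSketch(Iterator<R> scanner, UnaryOperator<R> snapshotFilter) {
        this.scanner = scanner;
        this.snapshotFilter = snapshotFilter;
    }

    // Returns the next record that survives filtering, or null when the scan is exhausted.
    R next() {
        while (scanner.hasNext()) {                    // loop while result != null
            R result = scanner.next();                 // ScannerIterator.next()
            R filtered = snapshotFilter.apply(result); // SnapshotFilter.filter(result)
            if (filtered != null) {
                return filtered;                       // filtered != null ends the loop
            }
        }
        return null;
    }
}
```

Skipping fully filtered rows inside the reader keeps the map task unaware of the snapshot: it simply never receives rows that contain only invisible revisions.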
Flow - Write

• User/Client
  › HBaseOutputFormat.checkOutputSpecs(FileSystem, JobConf)
    – Starts a write transaction by calling RevisionManager.beginWriteTransaction(table, families)
    – The Transaction instance is serialized into the JobConf

• RecordWriter
  › Puts use the transaction's revision number as the cell version

• OutputCommitter
  › OutputCommitter.commitJob(JobContext)
    – Calls RevisionManager.commitWriteTransaction(Transaction)
  › OutputCommitter.abortJob(JobContext)
    – Calls RevisionManager.abortWriteTransaction(Transaction)
Usage

• When HCatalog is used, the Revision Manager is invoked under the covers.
• Work is being done to decouple HCatalog from HBaseInputFormat/HBaseOutputFormat.
• Other frameworks can make use of the RevisionManager API.
Usage: HCatalog
Create Table
hcat -e "create table my_table(key string, gpa string) STORED BY
'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
TBLPROPERTIES ('hbase.columns.mapping'=':key,info:gpa');"

Using Pig
A = LOAD 'table1' USING org.apache.hcatalog.pig.HCatLoader();
STORE A INTO 'table1' USING org.apache.hcatalog.pig.HCatStorer();

Using MapReduce
HCatInputFormat.setInput(job,…)
HCatOutputFormat.setOutput(job,…)
Future Work

• Compaction of aborted transactions
• Server-side filtering using HBase Filters
• Compatibility with Hive
Further Info

• hcatalog-user@incubator.apache.org
• http://incubator.apache.org/hcatalog/
• toffer@apache.org
Questions?

Paul Turovsky - Real Estate Professional
 
Introducing the AI ShillText Generator A New Era for Cryptocurrency Marketing...
Introducing the AI ShillText Generator A New Era for Cryptocurrency Marketing...Introducing the AI ShillText Generator A New Era for Cryptocurrency Marketing...
Introducing the AI ShillText Generator A New Era for Cryptocurrency Marketing...
 
Welding Electrode Making Machine By Deccan Dynamics
Welding Electrode Making Machine By Deccan DynamicsWelding Electrode Making Machine By Deccan Dynamics
Welding Electrode Making Machine By Deccan Dynamics
 
Healthcare Feb. & Mar. Healthcare Newsletter
Healthcare Feb. & Mar. Healthcare NewsletterHealthcare Feb. & Mar. Healthcare Newsletter
Healthcare Feb. & Mar. Healthcare Newsletter
 
digital marketing , introduction of digital marketing
digital marketing , introduction of digital marketingdigital marketing , introduction of digital marketing
digital marketing , introduction of digital marketing
 
14680-51-4.pdf Good quality CAS Good quality CAS
14680-51-4.pdf  Good  quality CAS Good  quality CAS14680-51-4.pdf  Good  quality CAS Good  quality CAS
14680-51-4.pdf Good quality CAS Good quality CAS
 
Fundamentals Welcome and Inclusive DEIB
Fundamentals Welcome and  Inclusive DEIBFundamentals Welcome and  Inclusive DEIB
Fundamentals Welcome and Inclusive DEIB
 

HBaseCon 2012 | Relaxed Transactions for HBase - Francis Liu, Yahoo!

  • 1. Relaxed Transactions for HBase. Francis Liu, Software Engineer, 5/22/12
  • 2. Mutable Data  Writes are effective immediately Read_Job Table1 Map1 C1=1 C2=2 Map2 C3=3 Map3
  • 3. Mutable Data  Writes are effective immediately Job1 Read_Job Table1 Map1 Map1 C1=1 Map2 C2=2 Write_Job Map2 C3=4 Map3 Map3
  • 4. Mutable Data  Partial writes in the midst of failures Write_Job Table1 Map1 C1=1 Map2 C2=2 C3=? Map3
  • 5. Mutable Data  Partial writes in the midst of failures Write_Job Table1 Map1 C1=1 Map2 Read_Job C2=2 C3=? Map3
  • 6. Revision Manager  Optimized for batch processing › Large number of writes (ie Data Ingestion, Batch updates)  Cross row write transactions within a table  Coprocessor Endpoint › Leverage HBase Security  Zookeeper for persistence › table revision information  Experimental feature in Hcatalog 0.4
  • 7. Architecture. (Diagram components: Revision Mgr Client; Revision Mgr Service (Coprocessor) on the RegionServer; InputFormat/OutputFormat; ZooKeeper.)
  • 8. API  For reads › RevisionManager.createSnapshot(tableName) › SnapshotFilter.filter(result)  For writes › RevisionManager.beginWriteTransaction(table, families) › RevisionManager.commitWriteTransaction(transaction) › RevisionManager.abortWriteTransaction(transaction)
  • 9. Concepts  Revision › Monotonically increasing number › All “Puts” of a job are written with the same revision number as the cell version  TableSnapshot › Point-in-time consistent view of a table › Used for reading › Latest committed revision › List of aborted revisions › Upper bound on visible revision per CF  Transaction › Write transaction › Revision Number › List of column Families being written to
  • 10. Relaxed Transaction Properties  Immutable Input  Change After Commit  Precedence Preservation
  • 11. Immutable Input Consistent Read Write CellA=1 Read Snapshot1 CellA=1 CellA=1 CellA=1 Read_Job1 Begin t1 CellA=2 Commit Write_Job1
  • 12. Change After Commit  Revisions are only viewable after commit › A job cannot see it‟s own writes  Aborted revisions are added to a table‟s aborted list  Timed out revisions are aborted
  • 13. Change After Commit. (Diagram: Read_Job1 reads Snapshot1 and sees CellA=1 while Write_Job1 runs t1, which writes CellA=2; after t1 commits, its change is read by Read_Job2 through Snapshot2, which sees CellA=2.)
  • 14. Precedence Preservation  Snapshot Isolation › Transaction is aborted when a write conflict is detected  Conflicts › Concurrent transactions to the same Column Family › Inefficient to abort  Resolved during read time • For every CF – find: min_rev = min(active_revision) – Only return closest revision to min_rev • min_rev is what‟s stored in a snapshot
  • 15. Precedence Preservation. (Diagram: starting from CellA=1 and CellB=1, Write_Job1 (t1) writes CellA=2 and CellB=2 and commits; Write_Job2 (t2) writes CellA=3 and commits. Read_Job1 reads Snapshot1 and sees CellA=1 and CellB=1; the changes are not visible due to t1. CellA and CellB are members of the same column family.)
  • 16. Snapshot Filter  Consumes TableSnapshot  Read time filtering › Aborted revisions › Revisions written after snapshot was taken › Conflicting/Blocked revisions
  • 17. Flow - Read  User/Client › RevisionManager.createSnapshot() • TableSnapshot instance is serialized into JobConf  RecordReader › Using SnapshotFilter.filter(result)
  • 18. Flow - Read SnapshotRecordReader SnapshotFilter ScannerIterator next(key,value) Loop result != null and filtered == null next() next result filter(result) filtered result next record
  • 19. Flow - Write  User/Client › HBaseOutputFormat.checkOutputSpecs(FileSystem, JobConf) • Write transaction is started by calling beginWriteTransaction(Transaction) • Transaction instance is serialized into JobConf  RecordWriter › Puts make use of the revision number as the version  OutputCommitter › OutputCommitter.commitJob(JobContext) • RevisionManager.commitWriteTransaction(Transaction) › OutputCommitter.abortJob(JobContext) • RevisionManager.abortWriteTransaction(Transaction)
  • 20. Usage  Using HCatalog Revision Manager usage is done under the covers.  Work is being done to decouple HCatalog from HBaseInputFormat/HBaseOutputFormat  Other frameworks can make use of the RevisionManager API
  • 21. Usage: HCatalog
      Create table: hcat -e "create table my_table(key string, gpa string) STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler' TBLPROPERTIES ('hbase.columns.mapping'=':key,info:gpa');"
      Using Pig: A = LOAD 'table1' USING org.apache.hcatalog.pig.HCatLoader(); STORE A INTO 'table1' USING org.apache.hcatalog.pig.HCatStorer();
      Using MapReduce: HCatInputFormat.setInput(job,…) and HCatOutputFormat.setOutput(job,…)
  • 22. Future Work  Compaction of aborted transactions  Server-side filtering using HBase Filters  Compatibility with Hive
  • 23. Further Info  hcatalog-user@incubator.apache.org  http://incubator.apache.org/hcatalog/  toffer@apache.org
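The revision bookkeeping from the Concepts, Change After Commit and Precedence Preservation slides (9, 12, 14 and 15) can be modeled in a few dozen lines. The following is a hypothetical in-memory sketch, not the HCatalog RevisionManager API. The class names RevisionSketch, Manager and Snapshot are invented, revisions are plain longs, and each column family gets an exclusive upper bound. That bound is the minimum active revision on the family, or the next unassigned revision when no writer is active. All of these choices are assumptions made for illustration. Nothing is persisted to ZooKeeper, and timeouts are not modeled.

```java
import java.util.*;

// Hypothetical in-memory model of the revision bookkeeping; not the HCatalog API.
class RevisionSketch {

    // Point-in-time view: aborted revisions plus an exclusive upper bound per column family.
    static final class Snapshot {
        private final Set<Long> aborted;
        private final Map<String, Long> boundPerFamily; // min active revision per family
        private final long defaultBound;                 // next unassigned revision

        Snapshot(Set<Long> aborted, Map<String, Long> boundPerFamily, long defaultBound) {
            this.aborted = aborted;
            this.boundPerFamily = boundPerFamily;
            this.defaultBound = defaultBound;
        }

        // Given one cell's versions (version == revision), return the newest value that is
        // below the family's bound and not aborted, or null if nothing is visible.
        String filter(String family, NavigableMap<Long, String> versions) {
            long bound = boundPerFamily.getOrDefault(family, defaultBound);
            for (Map.Entry<Long, String> e : versions.headMap(bound, false).descendingMap().entrySet()) {
                if (!aborted.contains(e.getKey())) {
                    return e.getValue();
                }
            }
            return null;
        }
    }

    static final class Manager {
        private long nextRevision = 1;                                  // monotonically increasing
        private final Map<Long, Set<String>> active = new HashMap<>();  // revision -> families written
        private final Set<Long> aborted = new HashSet<>();

        long beginWriteTransaction(Set<String> families) {
            long revision = nextRevision++;
            active.put(revision, new HashSet<>(families));
            return revision;
        }

        void commitWriteTransaction(long revision) {
            active.remove(revision);
        }

        void abortWriteTransaction(long revision) {
            if (active.remove(revision) != null) {
                aborted.add(revision);
            }
        }

        Snapshot createSnapshot() {
            // For every family with an active writer, the bound is the smallest active revision.
            Map<String, Long> bounds = new HashMap<>();
            for (Map.Entry<Long, Set<String>> e : active.entrySet()) {
                for (String family : e.getValue()) {
                    bounds.merge(family, e.getKey(), Long::min);
                }
            }
            // Copy the aborted set so later aborts do not change an existing snapshot.
            return new Snapshot(new HashSet<>(aborted), bounds, nextRevision);
        }
    }
}
```

Replaying the slide 15 scenario against this model shows how it behaves. While t1 is still active, a snapshot returns the pre-t1 value of CellA even though t2 has already committed a newer one. Once t1 commits, a fresh snapshot returns t2's value, while the older snapshot keeps returning the original value.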
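The read loop on slide 18 keeps pulling results from the scanner until one survives the snapshot filter or the scan ends. It looks roughly like the following hypothetical, generic sketch. The class name SnapshotRecordReaderSketch is invented, and a plain Iterator stands in for the HBase scanner. The filter is assumed to be any function that returns null when nothing in a row is visible under the snapshot.

```java
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical sketch of the read loop; a plain Iterator stands in for the HBase scanner.
class SnapshotRecordReaderSketch<R> {
    private final Iterator<R> scanner;
    private final Function<R, R> snapshotFilter; // returns null when nothing in the row is visible

    SnapshotRecordReaderSketch(Iterator<R> scanner, Function<R, R> snapshotFilter) {
        this.scanner = scanner;
        this.snapshotFilter = snapshotFilter;
    }

    // Returns the next result that survives the snapshot filter, or null when the scan is exhausted.
    R next() {
        while (scanner.hasNext()) {
            R filtered = snapshotFilter.apply(scanner.next());
            if (filtered != null) {
                return filtered;
            }
        }
        return null;
    }
}
```

Because rows whose every cell is filtered out are skipped inside next(), the caller only ever sees rows with at least one visible cell, matching the "result != null and filtered == null" loop condition on the slide.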
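The write flow on slide 19 ties one transaction to the lifetime of a job. The transaction begins before the tasks run, every Put is stamped with the transaction's revision, and the output committer then commits or aborts. The hypothetical sketch below illustrates why this addresses the partial-write problem from slides 4 and 5. WriteFlowSketch and runWriteJob are invented names, and a single in-memory map stands in for an HBase column. A failed job's cells remain physically present, but its revision lands on the aborted list and readers skip it. For brevity, its read method ignores active-transaction bounds.

```java
import java.util.*;

// Hypothetical sketch of the write flow; one in-memory map stands in for an HBase column.
class WriteFlowSketch {
    private long nextRevision = 1;
    final Set<Long> aborted = new HashSet<>();
    // row key -> (cell version -> value); the version is always the writing job's revision
    final Map<String, TreeMap<Long, String>> column = new HashMap<>();

    // Runs a simulated job that writes `puts` in iteration order and fails just before
    // put number `failAt` (0-based); pass a negative value for a job that succeeds.
    long runWriteJob(Map<String, String> puts, int failAt) {
        long revision = nextRevision++;           // like beginWriteTransaction at checkOutputSpecs
        int written = 0;
        try {
            for (Map.Entry<String, String> put : puts.entrySet()) {
                if (written == failAt) {
                    throw new IllegalStateException("task failed");
                }
                column.computeIfAbsent(put.getKey(), k -> new TreeMap<>())
                      .put(revision, put.getValue()); // revision used as the cell version
                written++;
            }
            // Like commitJob: in this model, any revision that is not aborted counts as committed.
        } catch (IllegalStateException e) {
            aborted.add(revision);                // like abortJob: the revision joins the aborted list
        }
        return revision;
    }

    // Newest value for a row whose revision was not aborted (active-transaction bounds ignored).
    String read(String row) {
        TreeMap<Long, String> versions = column.get(row);
        if (versions == null) {
            return null;
        }
        for (Map.Entry<Long, String> e : versions.descendingMap().entrySet()) {
            if (!aborted.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

Consider a first job that writes C1=1, C2=2 and C3=3 and succeeds, followed by a second job that fails after writing only C1 and C2. Reads of C1 and C2 still return the first job's values, even though the failed job's cells remain in the column.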