HBaseCon, May 2012

HBase Filters
Lars George, Solutions Architect
Agenda

1    Introduction
2    Comparison Filters
3    Dedicated Filters
4    Decorating Filters
5    Combining Filters
6    Custom Filters




2                   ©2012 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction
                         or redistribution without written permission is prohibited.
About Me

    •  Solutions Architect @ Cloudera
    •  Apache HBase & Whirr Committer
    •  Author of
           HBase – The Definitive Guide
    •  Working with HBase since the end
       of 2007
    •  Organizer of the Munich OpenHUG
    •  Speaker at conferences (FOSDEM,
       Hadoop World)

Introduction to Filters

    •  Used in combination with get() and scan()
       API calls
    •  Steps:
      –  Create Filter instance
      –  Create Get or Scan instance
      –  Assign Filter to Get or Scan
      –  Call API and enjoy
    •  More fine-grained control over what is
       returned to the client

Filter Features

    •  Allow the client to further narrow down what
       is retrieved
      –  Not just per row or column key, or per column
         family
    •  Predicate pushdown
      –  Move filtering from the client to the server to
         reduce network traffic
    •  Performance implications vary, depending on
       the use case
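To make the pushdown idea concrete, here is a simplified plain-Java model (not HBase code; `PushdownModel` and its methods are hypothetical names). It compares which cells would cross the network with and without evaluating the predicate on the server. The final result the application sees is the same either way; only the traffic differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified model of predicate pushdown, not HBase code.
// Cells are plain strings; "shipped" means sent over the network.
class PushdownModel {
  // Without pushdown the server ships every stored cell; with pushdown
  // the predicate runs next to the data and only matches are shipped.
  static List<String> shippedCells(List<String> stored, Predicate<String> keep,
                                   boolean pushdown) {
    List<String> shipped = new ArrayList<>();
    for (String cell : stored) {
      if (!pushdown || keep.test(cell)) {
        shipped.add(cell);
      }
    }
    return shipped;
  }

  // The client applies the predicate to whatever arrives, so the final
  // result is identical in both cases.
  static List<String> clientResult(List<String> shipped, Predicate<String> keep) {
    List<String> result = new ArrayList<>();
    for (String cell : shipped) {
      if (keep.test(cell)) {
        result.add(cell);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> stored = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      stored.add("row-" + i);
    }
    Predicate<String> keep = cell -> cell.endsWith("7");
    System.out.println(shippedCells(stored, keep, false).size()); // 100
    System.out.println(shippedCells(stored, keep, true).size());  // 10
  }
}
```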


Filter Pushdown




Filter Features (cont.)

    •  Filters have access to the entire row to
       decide its fate
      –  Access to KeyValue instances to check row keys,
         column qualifiers, timestamps, or values
    •  Scan batching might conflict with the above
       and can trigger an IncompatibleFilterException
      –  Example: DependentColumnFilter
    •  There is no state shared across invocations
      –  Rows cannot be filtered based on other,
         dependent rows


Available Filters

    •  Many filters are supplied by HBase
      –  Based on row key, column family, or column
         qualifier
      –  Paging through rows and columns
      –  Based on dependencies

    •  Write your own filters
      –  Extend the FilterBase class to get a no-op
         skeleton and fill in the gaps


Comparison Filters

 •  Based on the CompareFilter class
 •  Adds the compare() method to
    FilterBase
 •  Takes an operator that defines how the
    comparison is performed
     –  Predefined by the client API
 •  Also needs a comparator to do the actual
    check
     –  HBase supplies a large set
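The interplay of operator and comparator can be sketched in plain Java. This is a simplified model, not the HBase API: the operator names mirror `CompareFilter.CompareOp`, but `ValueComparator`, `BinaryValueComparator` and `ComparisonModel` are hypothetical, and the model answers "include this cell?" directly, whereas HBase's internal bookkeeping may be expressed differently.

```java
import java.nio.charset.StandardCharsets;

// Simplified model of a comparison filter, not HBase code. The operator
// names mirror HBase's CompareFilter.CompareOp.
enum CompareOp {
  LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER, NO_OP
}

// Hypothetical comparator contract for this model: negative, zero, or
// positive when the cell value sorts before, equal to, or after the
// comparator's reference value.
interface ValueComparator {
  int compare(byte[] cellValue);
}

// Binary comparator: unsigned, byte-by-byte lexicographic comparison.
class BinaryValueComparator implements ValueComparator {
  private final byte[] reference;

  BinaryValueComparator(byte[] reference) {
    this.reference = reference;
  }

  @Override
  public int compare(byte[] cellValue) {
    int n = Math.min(cellValue.length, reference.length);
    for (int i = 0; i < n; i++) {
      int a = cellValue[i] & 0xff;
      int b = reference[i] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    return cellValue.length - reference.length; // a shorter prefix sorts first
  }
}

class ComparisonModel {
  // The operator turns the comparator's answer into "include this cell?".
  static boolean include(CompareOp op, ValueComparator comparator, byte[] cellValue) {
    if (op == CompareOp.NO_OP) {
      return false; // in this model, NO_OP includes nothing
    }
    int r = comparator.compare(cellValue);
    switch (op) {
      case LESS:             return r < 0;
      case LESS_OR_EQUAL:    return r <= 0;
      case EQUAL:            return r == 0;
      case NOT_EQUAL:        return r != 0;
      case GREATER_OR_EQUAL: return r >= 0;
      case GREATER:          return r > 0;
      default: throw new IllegalArgumentException("Unknown operator: " + op);
    }
  }

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }
}
```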

Comparison Operators




Comparators




Comparison Filters (cont.)

 •  Not all combinations of operator and
    comparator make sense
     –  For example, the SubstringComparator
        returns only 0 (match) or 1 (no match)
     –  Only EQUAL and NOT_EQUAL are useful
     –  Using other operators is allowed, but will most
        likely yield unexpected results
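A minimal model of why ordering operators misbehave with such a comparator. It assumes, as stated above, that the comparator answers only 0 or 1; the class and its sign convention are hypothetical, so the exact way each operator degenerates in HBase may differ from this sketch.

```java
import java.nio.charset.StandardCharsets;

// Simplified model of a substring comparator, not HBase code: it only
// ever answers 0 (the value contains the substring) or 1 (it does not).
class SubstringModel {
  enum Op { LESS, EQUAL, NOT_EQUAL, GREATER }

  static int compare(String substring, byte[] cellValue) {
    String value = new String(cellValue, StandardCharsets.UTF_8);
    return value.contains(substring) ? 0 : 1;
  }

  // This model's convention: include the cell if the comparator's
  // answer satisfies the operator.
  static boolean include(Op op, String substring, byte[] cellValue) {
    int r = compare(substring, cellValue);
    switch (op) {
      case LESS:      return r < 0;  // can never be true
      case EQUAL:     return r == 0; // value contains the substring
      case NOT_EQUAL: return r != 0; // value does not contain it
      case GREATER:   return r > 0;  // collapses into NOT_EQUAL here
      default: throw new IllegalArgumentException("Unknown operator: " + op);
    }
  }

  public static void main(String[] args) {
    byte[] hit = "val-5".getBytes(StandardCharsets.UTF_8);
    byte[] miss = "val-7".getBytes(StandardCharsets.UTF_8);
    for (Op op : Op.values()) {
      System.out.println(op + ": hit=" + include(op, "-5", hit)
          + ", miss=" + include(op, "-5", miss));
    }
  }
}
```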




Comparison Filters (cont.)

 •  Most HBase filters define what data to filter out
 •  Comparison filters work the other way around:
    they include matching data
     –  Be mindful when selecting the comparison
        operator!




Available Comparison Filters

 •  Row Filter
     –  Based on row key comparisons
 •  Family Filter
     –  Based on column family names
 •  Qualifier Filter
     –  Based on column names, aka qualifiers
 •  Value Filter
     –  Based on the actual value of a column
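The four filters share the same comparison logic and differ only in which part of a cell they compare. A simplified sketch, not HBase code: `Cell` is a hypothetical stand-in for `KeyValue` (without timestamps), and `ComparisonTargets` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Simplified model, not HBase code: the comparison filters differ only
// in which component of a cell they hand to the comparator.
class ComparisonTargets {
  enum Target { ROW, FAMILY, QUALIFIER, VALUE }

  // Minimal stand-in for a KeyValue (timestamps omitted).
  static final class Cell {
    final String row, family, qualifier, value;

    Cell(String row, String family, String qualifier, String value) {
      this.row = row;
      this.family = family;
      this.qualifier = qualifier;
      this.value = value;
    }
  }

  static String extract(Target target, Cell cell) {
    switch (target) {
      case ROW:       return cell.row;
      case FAMILY:    return cell.family;
      case QUALIFIER: return cell.qualifier;
      case VALUE:     return cell.value;
      default: throw new IllegalArgumentException("Unknown target: " + target);
    }
  }

  // Keeps the cells whose selected component passes the test.
  static List<Cell> filter(List<Cell> cells, Target target, Predicate<String> test) {
    List<Cell> kept = new ArrayList<>();
    for (Cell cell : cells) {
      if (test.test(extract(target, cell))) {
        kept.add(cell);
      }
    }
    return kept;
  }

  static List<Cell> sample() {
    return Arrays.asList(
        new Cell("row-1", "colfam1", "col-0", "val-1"),
        new Cell("row-1", "colfam2", "col-1", "val-2"),
        new Cell("row-2", "colfam1", "col-1", "val-3"));
  }
}
```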


Available Comparison Filters (cont.)

 •  Dependent Column Filter
     –  Based on the timestamp of a reference column
     –  Includes all columns that have the same
        timestamp
     –  Requires access to the entire row, since a
        batch might not contain the reference
        column
        •  No scanner batching allowed!
     –  Useful for loading interdependent changes
        within a row
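A rough sketch of the dependent column idea under simplifying assumptions: `DependentColumnModel` is hypothetical, and cells are reduced to a qualifier and a timestamp. The second call in `main` illustrates why batching conflicts with this filter: a batch that lacks the reference column cannot match anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of a dependent column filter, not HBase code.
class DependentColumnModel {
  static final class Cell {
    final String qualifier;
    final long timestamp;

    Cell(String qualifier, long timestamp) {
      this.qualifier = qualifier;
      this.timestamp = timestamp;
    }

    @Override
    public String toString() {
      return qualifier + "@" + timestamp;
    }
  }

  // Keeps every cell whose timestamp matches a version of the reference column.
  static List<Cell> filterRow(List<Cell> cells, String referenceQualifier) {
    // Pass 1: the reference column must be among 'cells' to be seen at all
    Set<Long> referenceTimestamps = new HashSet<>();
    for (Cell c : cells) {
      if (c.qualifier.equals(referenceQualifier)) {
        referenceTimestamps.add(c.timestamp);
      }
    }
    // Pass 2: keep the cells written at one of those timestamps
    List<Cell> kept = new ArrayList<>();
    for (Cell c : cells) {
      if (referenceTimestamps.contains(c.timestamp)) {
        kept.add(c);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Cell> row = Arrays.asList(new Cell("ref", 100), new Cell("a", 100),
        new Cell("b", 200), new Cell("c", 100));
    System.out.println(filterRow(row, "ref"));               // [ref@100, a@100, c@100]
    // A batch holding only part of the row misses the reference column
    System.out.println(filterRow(row.subList(2, 4), "ref")); // []
  }
}
```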


Example Code
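import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Assumes "table" is an open table instance
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-0"));

// Include only rows whose key sorts at or before "row-22"
Filter filter = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
    new BinaryComparator(Bytes.toBytes("row-22")));
scan.setFilter(filter);

ResultScanner scanner = table.getScanner(scan);
try {
  for (Result res : scanner) {
    System.out.println(res);
  }
} finally {
  scanner.close(); // always release the server-side scanner
}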

Example Output
 keyvalues={row-1/colfam1:col-0/1301043190260/Put/vlen=7}
 keyvalues={row-10/colfam1:col-0/1301043190908/Put/vlen=8}
 keyvalues={row-100/colfam1:col-0/1301043195275/Put/vlen=9}
 keyvalues={row-11/colfam1:col-0/1301043190982/Put/vlen=8}
 keyvalues={row-12/colfam1:col-0/1301043191040/Put/vlen=8}
 keyvalues={row-13/colfam1:col-0/1301043191172/Put/vlen=8}
 keyvalues={row-14/colfam1:col-0/1301043191318/Put/vlen=8}
 keyvalues={row-15/colfam1:col-0/1301043191429/Put/vlen=8}
 keyvalues={row-16/colfam1:col-0/1301043191509/Put/vlen=8}
 keyvalues={row-17/colfam1:col-0/1301043191593/Put/vlen=8}
 keyvalues={row-18/colfam1:col-0/1301043191673/Put/vlen=8}
 keyvalues={row-19/colfam1:col-0/1301043191771/Put/vlen=8}
 keyvalues={row-2/colfam1:col-0/1301043190346/Put/vlen=7}
 keyvalues={row-20/colfam1:col-0/1301043191841/Put/vlen=8}
 keyvalues={row-21/colfam1:col-0/1301043191933/Put/vlen=8}
 keyvalues={row-22/colfam1:col-0/1301043191998/Put/vlen=8}
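The output is sorted byte by byte, which is why `row-100` appears between `row-10` and `row-11`, and why `row-3` through `row-9` are missing even though they are numerically smaller than 22. A small sketch reproduces this order, assuming (hypothetically) that the table holds the keys `row-1` to `row-100`; `RowKeyOrder` is an illustrative name.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Reproduces the ordering of the example output: row keys sort byte by
// byte, not numerically.
class RowKeyOrder {
  // Unsigned, byte-by-byte lexicographic comparison
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Keys row-1..row-<count>, sorted as bytes, keeping those <= stopKey
  static List<String> rowsUpTo(int count, String stopKey) {
    List<String> keys = new ArrayList<>();
    for (int i = 1; i <= count; i++) {
      keys.add("row-" + i);
    }
    keys.sort((x, y) -> compareBytes(bytes(x), bytes(y)));
    List<String> result = new ArrayList<>();
    for (String key : keys) {
      if (compareBytes(bytes(key), bytes(stopKey)) <= 0) {
        result.add(key);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // 16 keys: row-1, row-10, row-100, row-11, ..., row-19, row-2, row-20, row-21, row-22
    System.out.println(rowsUpTo(100, "row-22"));
  }
}
```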



Dedicated Filters

 •  Based directly on the FilterBase class
 •  Often less useful for get() calls, since
    they filter entire rows




Available Dedicated Filters

 •  Single Column Value Filter
     –  Filters rows based on one specific column
     –  Extra features
       •  “Filter if missing”
       •  “Get latest version only”
     –  Column must be part of the scan selection
       •  Otherwise, the result is all or nothing
     –  Also needs a compare operator, plus a value
        or a comparator
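A simplified model of the row-level decision, not HBase code: `SingleColumnValueModel` is hypothetical, and each row is reduced to its latest value per column. Filtering on a column that no row contains, which is what the filter sees when the column is not part of the scan selection, shows the all-or-nothing behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Simplified model of a single column value filter, not HBase code.
// Each row maps column -> latest value ("latest version only").
class SingleColumnValueModel {
  static List<String> filter(Map<String, Map<String, String>> rows, String column,
                             Predicate<String> test, boolean filterIfMissing) {
    List<String> passed = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
      String value = row.getValue().get(column);
      // A row without the column passes unless filterIfMissing is set
      boolean include = (value == null) ? !filterIfMissing : test.test(value);
      if (include) {
        passed.add(row.getKey());
      }
    }
    return passed;
  }

  private static Map<String, String> columns(String column, String value) {
    Map<String, String> cols = new LinkedHashMap<>();
    cols.put(column, value);
    return cols;
  }

  static Map<String, Map<String, String>> sample() {
    Map<String, Map<String, String>> rows = new LinkedHashMap<>();
    rows.put("row-1", columns("col-a", "x"));
    rows.put("row-2", columns("col-a", "y"));
    rows.put("row-3", columns("col-b", "x")); // has no col-a at all
    return rows;
  }
}
```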


Available Dedicated Filters (cont.)

 •  Single Column Value Exclude Filter
     –  Same as above, but excludes the reference
        column from the result
 •  Prefix Filter
     –  Based on the prefix of row keys
     –  Can end the scan early!
       •  Combine with a start row for best performance
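Why a start row helps a prefix scan can be sketched with a sorted key list. In this hypothetical model (`PrefixScanModel`, not HBase code), keys are ASCII strings, so `String` order matches byte order, and `examined` counts how many keys the scan had to look at.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified model of a prefix scan, not HBase code.
class PrefixScanModel {
  static final class Result {
    final List<String> rows = new ArrayList<>();
    int examined; // how many keys the scan looked at
  }

  static Result scan(List<String> sortedKeys, String prefix, boolean useStartRow) {
    Result result = new Result();
    int start = 0;
    if (useStartRow) {
      // Start row = prefix: jump straight to the first key >= prefix
      int pos = Collections.binarySearch(sortedKeys, prefix);
      start = pos >= 0 ? pos : -(pos + 1);
    }
    for (int i = start; i < sortedKeys.size(); i++) {
      String key = sortedKeys.get(i);
      result.examined++;
      if (key.startsWith(prefix)) {
        result.rows.add(key);
      } else if (key.compareTo(prefix) > 0) {
        break; // early out: every later key sorts past the prefix range
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("a1", "a2", "b1", "b2", "b3", "c1");
    System.out.println(scan(keys, "b", false).examined); // 6
    System.out.println(scan(keys, "b", true).examined);  // 4
  }
}
```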




Available Dedicated Filters (cont.)
 •  Page Filter
     –  Allows paginating through rows
     –  Needs to be combined with setting the start row on
        subsequent scans
     –  Can end the scan early once the limit is reached
 •  Key Only Filter
     –  Drops the value of every column
 •  First Key Only Filter
     –  Returns only the first column (KeyValue) of each row
     –  Useful for row counters or “get newest post”-type
        applications
     –  Can skip the rest of the row early
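Paging with a start row can be sketched as follows. This is a plain-Java model, not HBase code (`PagingModel` is hypothetical); it assumes each new scan starts at the last returned row plus a zero byte, which is the smallest key that sorts after it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Simplified model of paging with a page filter, not HBase code.
class PagingModel {
  // One "scan": up to pageSize keys, starting at startRow (inclusive)
  static List<String> page(NavigableSet<String> keys, String startRow, int pageSize) {
    List<String> rows = new ArrayList<>();
    for (String key : keys.tailSet(startRow, true)) {
      if (rows.size() == pageSize) {
        break; // early out: limit reached
      }
      rows.add(key);
    }
    return rows;
  }

  // Each new scan starts just after the last row of the previous page
  static List<List<String>> allPages(NavigableSet<String> keys, int pageSize) {
    List<List<String>> pages = new ArrayList<>();
    String startRow = "";
    while (true) {
      List<String> rows = page(keys, startRow, pageSize);
      if (rows.isEmpty()) {
        break;
      }
      pages.add(rows);
      startRow = rows.get(rows.size() - 1) + "\0"; // last row plus a zero byte
    }
    return pages;
  }

  public static void main(String[] args) {
    NavigableSet<String> keys = new TreeSet<>(
        Arrays.asList("row-1", "row-2", "row-3", "row-4", "row-5"));
    System.out.println(allPages(keys, 2)); // [[row-1, row-2], [row-3, row-4], [row-5]]
  }
}
```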


Available Dedicated Filters (cont.)

 •  Inclusive Stop Filter
     –  Unlike the exclusive stop row of a scan, this
        filter includes the final row
 •  Timestamp Filter
     –  Takes a list of timestamps to include in the result
 •  Column Count Get Filter
     –  Limits the number of columns returned by a
        get() call
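The difference between an exclusive stop row and an inclusive stop filter, sketched with a sorted set. This is a hypothetical model (`StopRowModel`), not HBase code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Simplified model, not HBase code: a scan's stop row is exclusive,
// while an inclusive stop filter also returns the stop row itself.
class StopRowModel {
  static List<String> scan(NavigableSet<String> keys, String startRow, String stopRow,
                           boolean inclusiveStop) {
    // The start row is always inclusive; inclusiveStop decides about the stop row
    return new ArrayList<>(keys.subSet(startRow, true, stopRow, inclusiveStop));
  }

  public static void main(String[] args) {
    NavigableSet<String> keys =
        new TreeSet<>(Arrays.asList("row-1", "row-2", "row-3", "row-4"));
    System.out.println(scan(keys, "row-1", "row-3", false)); // [row-1, row-2]
    System.out.println(scan(keys, "row-1", "row-3", true));  // [row-1, row-2, row-3]
  }
}
```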


Available Dedicated Filters (cont.)

 •  Column Pagination Filter
     –  Allows paginating through the columns within a
        row
     –  Skips to the offset and returns up to limit
        columns
 •  Column Prefix Filter
     –  Analogous to PrefixFilter, but matches
        column qualifiers
 •  Random Row Filter
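Column pagination within a single row boils down to skipping `offset` columns and returning at most `limit`. A hypothetical sketch, not the HBase implementation; `ColumnPaginationModel` is an illustrative name and the `(limit, offset)` parameter order is chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of column pagination within one row, not HBase code.
class ColumnPaginationModel {
  static List<String> paginate(List<String> sortedColumns, int limit, int offset) {
    int from = Math.min(offset, sortedColumns.size());     // skip 'offset' columns
    int to = Math.min(from + limit, sortedColumns.size()); // then take up to 'limit'
    return new ArrayList<>(sortedColumns.subList(from, to));
  }

  // Helper for the example: column qualifiers col-0 .. col-(n-1)
  static List<String> columns(int n) {
    List<String> cols = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      cols.add("col-" + i);
    }
    return cols;
  }

  public static void main(String[] args) {
    List<String> cols = columns(10);
    // Walk through the row three columns at a time
    for (int offset = 0; offset < cols.size(); offset += 3) {
      System.out.println(paginate(cols, 3, offset));
    }
  }
}
```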

Decorating Filters

 •  Wrap other filters to gain additional control
    over the returned data
 •  Skip Filter
     –  Skips the entire row when the wrapped filter
        filters out any of its columns
     –  Not all filters are compatible
 •  While Match Filter
     –  Aborts the entire scan once the wrapped filter
        indicates that a row or column is omitted
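The two decorators can be modeled with a per-cell predicate standing in for the wrapped filter. This is a simplified sketch (hypothetical `DecoratorModel`); in particular, dropping the partial row in `whileMatch` is a modeling choice, not a statement about HBase's exact behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Simplified model of the two decorating filters, not HBase code.
// 'cellPasses' plays the role of the wrapped filter.
class DecoratorModel {
  // Skip filter: a row survives only if the wrapped filter passes all its cells
  static List<String> skip(Map<String, List<String>> rows, Predicate<String> cellPasses) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, List<String>> row : rows.entrySet()) {
      if (row.getValue().stream().allMatch(cellPasses)) {
        out.add(row.getKey());
      }
    }
    return out;
  }

  // While match filter: return rows until the wrapped filter rejects a cell,
  // then abort the whole scan (this model also drops that partial row)
  static List<String> whileMatch(Map<String, List<String>> rows, Predicate<String> cellPasses) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, List<String>> row : rows.entrySet()) {
      if (!row.getValue().stream().allMatch(cellPasses)) {
        return out; // no further rows are scanned
      }
      out.add(row.getKey());
    }
    return out;
  }

  static Map<String, List<String>> sample() {
    Map<String, List<String>> rows = new LinkedHashMap<>();
    rows.put("row-1", Arrays.asList("val-1", "val-2"));
    rows.put("row-2", Arrays.asList("val-1", "bad"));
    rows.put("row-3", Arrays.asList("val-3"));
    return rows;
  }
}
```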


Combining Filters

 •  Implemented by the FilterList class
     –  Wraps a list of filters into a Filter-compatible
        class
     –  Takes an optional operator to decide how to
        combine the results of the wrapped filters
        (default: MUST_PASS_ALL)




Combining Filters (cont.)

 •  Filter lists can contain other filter lists
 •  The operator is fixed per list, but nesting lists
    allows creating arbitrary combinations
 •  Using the proper List implementation
    helps control the filter execution order
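A filter list can be modeled as a filter that wraps other filters, which is what makes nesting possible. A plain-Java sketch with hypothetical `KeyFilter` and `FilterListModel` types operating on row keys only, mirroring the two operator names; the `main` method builds `(row-03 <= key <= row-06) OR key == row-09`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of filter lists, not HBase code. A filter decides
// whether a row key is included.
interface KeyFilter {
  boolean include(String rowKey);
}

// A filter list is itself a filter, so lists can be nested.
class FilterListModel implements KeyFilter {
  enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

  private final Operator operator;
  private final List<KeyFilter> filters;

  FilterListModel(Operator operator, List<KeyFilter> filters) {
    this.operator = operator;
    this.filters = new ArrayList<>(filters); // keeps the caller's order
  }

  // Filters run in list order and short-circuit: in this model the order
  // changes how much work is done, not the result.
  @Override
  public boolean include(String rowKey) {
    for (KeyFilter f : filters) {
      boolean passed = f.include(rowKey);
      if (operator == Operator.MUST_PASS_ALL && !passed) {
        return false;
      }
      if (operator == Operator.MUST_PASS_ONE && passed) {
        return true;
      }
    }
    return operator == Operator.MUST_PASS_ALL;
  }

  public static void main(String[] args) {
    KeyFilter from03 = k -> k.compareTo("row-03") >= 0;
    KeyFilter upTo06 = k -> k.compareTo("row-06") <= 0;
    KeyFilter is09 = k -> k.equals("row-09");
    FilterListModel range =
        new FilterListModel(Operator.MUST_PASS_ALL, Arrays.asList(from03, upTo06));
    FilterListModel nested =
        new FilterListModel(Operator.MUST_PASS_ONE, Arrays.asList(range, is09));
    for (int i = 1; i <= 10; i++) {
      String key = String.format("row-%02d", i);
      System.out.println(key + " -> " + nested.include(key));
    }
  }
}
```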




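import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

// ArrayList keeps insertion order, so the filters run in this order
List<Filter> filters = new ArrayList<Filter>();
// Rows from "row-03" (inclusive) ...
Filter filter1 = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
    new BinaryComparator(Bytes.toBytes("row-03")));
filters.add(filter1);
// ... up to "row-06" (inclusive)
Filter filter2 = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
    new BinaryComparator(Bytes.toBytes("row-06")));
filters.add(filter2);
// Only qualifiers matching the regular expression col-0[03]
Filter filter3 = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
    new RegexStringComparator("col-0[03]"));
filters.add(filter3);
// Default operator MUST_PASS_ALL: every filter must pass
FilterList filterList1 = new FilterList(filters);
…
// MUST_PASS_ONE: passing any single filter is enough
FilterList filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE, filters);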


Custom Filter

 •  Allows users to add missing filters
 •  Either implement the Filter interface or extend
    the FilterBase skeleton
 •  Provides hooks that are called at different
    stages of the read process




Filter Interface
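import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.io.Writable;

public interface Filter extends Writable {
  public enum ReturnCode {
    INCLUDE, SKIP, NEXT_COL, NEXT_ROW, SEEK_NEXT_USING_HINT }
  // Resets the filter state between rows
  public void reset();
  // Returning true excludes the entire row, based on its key
  public boolean filterRowKey(byte[] buffer, int offset, int length);
  // Returning true ends the scan
  public boolean filterAllRemaining();
  // Decides what to do with each KeyValue
  public ReturnCode filterKeyValue(KeyValue v);
  // Last chance to modify the list of KeyValues of a row
  public void filterRow(List<KeyValue> kvs);
  // True if the filter actively uses filterRow(List)
  public boolean hasFilterRow();
  // Returning true vetoes the row after all its KeyValues were checked
  public boolean filterRow();
  // Next key to seek to when filterKeyValue() returned SEEK_NEXT_USING_HINT
  public KeyValue getNextKeyHint(KeyValue currentKV);
}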
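How a scanner might react to each return code within one row can be sketched as follows. This is a simplified model, not HBase's scanner code (`ReturnCodeModel` is hypothetical): cells are sorted by column with the newest version first, and `SEEK_NEXT_USING_HINT` is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Simplified model, not HBase code, of reacting to per-cell return
// codes within one row. Cells are sorted by column, newest version first.
class ReturnCodeModel {
  enum ReturnCode { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }

  static final class Cell {
    final String column;
    final long timestamp;

    Cell(String column, long timestamp) {
      this.column = column;
      this.timestamp = timestamp;
    }

    @Override
    public String toString() {
      return column + "@" + timestamp;
    }
  }

  static List<Cell> applyToRow(List<Cell> row, Function<Cell, ReturnCode> filter) {
    List<Cell> result = new ArrayList<>();
    int i = 0;
    while (i < row.size()) {
      Cell cell = row.get(i);
      ReturnCode code = filter.apply(cell);
      if (code == ReturnCode.INCLUDE) {
        result.add(cell);
        i++;
      } else if (code == ReturnCode.SKIP) {
        i++; // next cell, possibly an older version of the same column
      } else if (code == ReturnCode.NEXT_COL) {
        // jump past all remaining versions of this column
        while (i < row.size() && row.get(i).column.equals(cell.column)) {
          i++;
        }
      } else {
        break; // NEXT_ROW: ignore the rest of this row
      }
    }
    return result;
  }

  static List<Cell> sampleRow() {
    return Arrays.asList(new Cell("a", 3), new Cell("a", 2), new Cell("a", 1),
        new Cell("b", 2), new Cell("c", 1));
  }
}
```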


Filter Return Codes




Merge Reads




Filter Flow

 •  Filter hooks are called at
    different stages
 •  Seeks are done initially to
    find the next KeyValue
     –  A hint from the previous filter
        invocation might help
 •  Early-out checks improve
    performance
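The order of the hook calls and the early-out checks can be sketched with a reduced interface. This is a hypothetical model (`FilterHooks`, `ScanFlowModel`, `ValueMatchHooks`; `includeValue()` stands in for `filterKeyValue()`), not HBase's actual scan loop. `ValueMatchHooks` mirrors the logic of the custom filter example in this deck: a row is kept only if one of its values equals the target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model, not HBase code, of the per-row hook call order.
// Hook names mirror the Filter interface; signatures are reduced.
interface FilterHooks {
  void reset();                        // called before each row in this model
  boolean filterAllRemaining();        // true: end the whole scan
  boolean filterRowKey(String rowKey); // true: drop the row up front
  boolean includeValue(String value);  // stand-in for filterKeyValue()
  boolean filterRow();                 // true: drop the row at the end
}

class ScanFlowModel {
  static List<String> scan(Map<String, List<String>> rows, FilterHooks f) {
    List<String> returned = new ArrayList<>();
    for (Map.Entry<String, List<String>> row : rows.entrySet()) {
      f.reset();
      if (f.filterAllRemaining()) {
        break; // early out for the entire scan
      }
      if (f.filterRowKey(row.getKey())) {
        continue; // early out before any cell is read
      }
      boolean anyIncluded = false;
      for (String value : row.getValue()) {
        anyIncluded |= f.includeValue(value);
      }
      if (anyIncluded && !f.filterRow()) {
        returned.add(row.getKey());
      }
    }
    return returned;
  }

  static Map<String, List<String>> sampleRows() {
    Map<String, List<String>> rows = new LinkedHashMap<>();
    rows.put("row-1", Arrays.asList("val-1", "val-2"));
    rows.put("row-2", Arrays.asList("val-3", "match"));
    rows.put("row-3", Arrays.asList("val-4"));
    return rows;
  }

  public static void main(String[] args) {
    System.out.println(scan(sampleRows(), new ValueMatchHooks("match"))); // [row-2]
  }
}

// Keeps a row only if one of its values equals the target, while
// including every cell of that row.
class ValueMatchHooks implements FilterHooks {
  private final String target;
  private boolean filterRow = true;

  ValueMatchHooks(String target) { this.target = target; }

  public void reset() { filterRow = true; }
  public boolean filterAllRemaining() { return false; }
  public boolean filterRowKey(String rowKey) { return false; }
  public boolean includeValue(String value) {
    if (value.equals(target)) {
      filterRow = false; // remember the match
    }
    return true;
  }
  public boolean filterRow() { return filterRow; }
}
```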


Example Code
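import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

// Includes a row only if at least one of its columns holds the given value
public class CustomFilter extends FilterBase {
  private byte[] value = null;
  private boolean filterRow = true;

  // No-argument constructor, needed for deserialization
  public CustomFilter() { super(); }

  public CustomFilter(byte[] value) { this.value = value; }

  @Override
  public void reset() { this.filterRow = true; } // new row: no match seen yet

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    // Remember a match, but include every column of the row
    if (Bytes.compareTo(value, kv.getValue()) == 0) {
      filterRow = false;
    }
    return ReturnCode.INCLUDE;
  }

  @Override
  public boolean filterRow() { return filterRow; } // true drops the row

  ...
}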
Deploying Custom Filters

 •  Provide a JAR file containing the filter class
 •  Deploy the JAR to the RegionServers
 •  Add the JAR to HBASE_CLASSPATH
 •  Restart the RegionServers

 •  Tip: Testing on a cluster is more involved, so
    test on a local machine first


Summary




Summary (cont.)





 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

HBaseCon 2012 | HBase Filtering - Lars George, Cloudera

  • 1. HBaseCon, May 2012
      HBase Filters
      Lars George, Solutions Architect
  • 2. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters
  • 3. About Me
      •  Solutions Architect @ Cloudera
      •  Apache HBase & Whirr Committer
      •  Author of HBase: The Definitive Guide
      •  Working with HBase since the end of 2007
      •  Organizer of the Munich OpenHUG
      •  Speaker at conferences (FOSDEM, Hadoop World)
  • 4. Introduction to Filters
      •  Used in combination with the get() and scan() API calls
      •  Steps:
           –  Create a Filter instance
           –  Create a Get or Scan instance
           –  Assign the Filter to the Get or Scan
           –  Call the API and enjoy
      •  More fine-grained control over what is returned to the client
  • 5. Filter Features
      •  Allow the client to narrow down further what is retrieved
           –  Not just per row or column key, or per column family
      •  Predicate pushdown
           –  Move filtering from the client to the server to reduce network traffic
      •  Performance implications vary, depending on the use case
  • 6. Filter Pushdown
  • 7. Filter Features (cont.)
      •  Filters have access to the entire row to decide its fate
           –  Access to KeyValue instances to check row keys, column qualifiers, timestamps, or values
      •  Scan batching might conflict with the above and can trigger an IncompatibleFilterException
           –  Example: DependentColumnFilter
      •  There is no cross-invocation state
           –  Rows cannot be filtered based on other, dependent rows
  • 8. Available Filters
      •  Many filters are supplied by HBase
           –  Based on row key, column family, or column qualifier
           –  Paging through rows and columns
           –  Based on dependencies
      •  Write your own filters
           –  Use the FilterBase class to get a no-op skeleton and fill in the gaps
  • 9. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters
  • 10. Comparison Filters
      •  Based on the CompareFilter class
      •  Adds the comparison logic (doCompare()) to FilterBase
      •  Takes an operator that defines how the comparison is performed
           –  Predefined by the client API (CompareFilter.CompareOp)
      •  Also needs a comparator to do the actual check
           –  HBase supplies a large set
  • 11. Comparison Operators
  • 12. Comparators
  • 13. Comparison Filters (cont.)
      •  Not all combinations of operator and comparator make sense
           –  For example, the SubstringComparator returns only 0 (match) or 1 (no match)
           –  Only EQUAL and NOT_EQUAL are useful with it
           –  Other operators are allowed but will most likely yield unexpected results
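To make slides 10 to 13 concrete, here is a small JDK-only model of how an operator and a comparator combine. It is a sketch based on our reading of CompareFilter in HBase 0.92, not HBase code. The class `CompareModel`, the `ByteComparator` interface, and the helper names are hypothetical, and `binary()` and `substring()` are only stand-ins for BinaryComparator and SubstringComparator. The key assumption is that the comparator compares its reference value against the cell data, and the operator then decides whether the cell is filtered *out*.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical JDK-only model of a comparison filter's decision (not HBase code).
// Assumption: the comparator compares its reference value against the cell data,
// and the operator turns that result into "filter this cell OUT?".
class CompareModel {
  // The six ordering operators of CompareFilter.CompareOp (NO_OP is left out).
  enum CompareOp { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

  interface ByteComparator { int compareTo(byte[] data); }

  // Stand-in for BinaryComparator: unsigned, lexicographic byte comparison.
  static ByteComparator binary(String reference) {
    byte[] ref = reference.getBytes(StandardCharsets.UTF_8);
    return data -> Arrays.compareUnsigned(ref, data);
  }

  // Stand-in for SubstringComparator: 0 on a case-insensitive match, else 1.
  static ByteComparator substring(String needle) {
    String lower = needle.toLowerCase();
    return data -> new String(data, StandardCharsets.UTF_8).toLowerCase().contains(lower) ? 0 : 1;
  }

  // Returns true when the cell would be filtered out.
  static boolean filterOut(CompareOp op, ByteComparator cmp, String cellData) {
    int r = cmp.compareTo(cellData.getBytes(StandardCharsets.UTF_8));
    switch (op) {
      case LESS:             return r <= 0;
      case LESS_OR_EQUAL:    return r < 0;
      case EQUAL:            return r != 0;
      case NOT_EQUAL:        return r == 0;
      case GREATER_OR_EQUAL: return r > 0;
      case GREATER:          return r >= 0;
      default:               throw new IllegalArgumentException("unknown op " + op);
    }
  }

  public static void main(String[] args) {
    ByteComparator sub = substring("abc");
    for (CompareOp op : CompareOp.values()) {
      System.out.println(op + ": keeps 'xabcx'=" + !filterOut(op, sub, "xabcx")
          + ", keeps 'xyz'=" + !filterOut(op, sub, "xyz"));
    }
  }
}
```

Running `main` with the substring stand-in shows the kind of surprise slide 13 warns about. LESS_OR_EQUAL keeps both values and GREATER keeps neither. LESS behaves like NOT_EQUAL, and GREATER_OR_EQUAL behaves like EQUAL.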
  • 14. Comparison Filters (cont.)
      •  HBase filters usually filter data out
      •  Comparison filters work the other way round: they include matching data
           –  Be mindful when selecting the comparison operator!
  • 15. Available Comparison Filters
      •  Row Filter
           –  Based on row key comparisons
      •  Family Filter
           –  Based on column family names
      •  Qualifier Filter
           –  Based on column names, aka qualifiers
      •  Value Filter
           –  Based on the actual value of a column
  • 16. Available Comparison Filters (cont.)
      •  Dependent Column Filter
           –  Based on the timestamp of a reference column
           –  Includes all columns that have the same timestamp
           –  Requires access to the entire row, since a batch might not contain the reference column
                •  No scanner batching allowed!
           –  Useful for loading interdependent changes within a row
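The dependent column rule, and the reason it needs the whole row, can be sketched without a cluster. This is a hypothetical model, not the HBase class. `Cell`, `filterRow`, and the `dropReference` flag are our own names, with `dropReference` modelled on the constructor option for not returning the reference column. The optional value check on the reference column is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model (not HBase code) of the DependentColumnFilter rule: collect
// the timestamps of the reference column within one row, then keep only the
// cells of that row that carry one of those timestamps.
class DependentColumnModel {
  record Cell(String family, String qualifier, long ts) {}

  static boolean isReference(Cell c, String family, String qualifier) {
    return c.family().equals(family) && c.qualifier().equals(qualifier);
  }

  // 'row' is expected to hold every cell of the row; 'dropReference' mimics the
  // option of not returning the reference column itself.
  static List<Cell> filterRow(List<Cell> row, String family, String qualifier,
                              boolean dropReference) {
    Set<Long> stamps = new HashSet<>();
    for (Cell c : row) {
      if (isReference(c, family, qualifier)) stamps.add(c.ts());
    }
    List<Cell> kept = new ArrayList<>();
    for (Cell c : row) {
      if (dropReference && isReference(c, family, qualifier)) continue;
      if (stamps.contains(c.ts())) kept.add(c);
    }
    return kept;
  }
}
```

A scanner batch might hand `filterRow` only part of a row. If that part lacks the reference cell, the timestamp set stays empty and nothing is returned, which is why batching is ruled out.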
  • 17. Example Code

        import org.apache.hadoop.hbase.client.*;   // HTable, Scan, Result, ResultScanner
        import org.apache.hadoop.hbase.filter.*;   // Filter, RowFilter, CompareFilter, BinaryComparator
        import org.apache.hadoop.hbase.util.Bytes;

        // "table" is assumed to be an open HTable for the test table
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-0"));
        // Keep only rows whose key sorts at or before "row-22"
        Filter filter = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
            new BinaryComparator(Bytes.toBytes("row-22")));
        scan.setFilter(filter);

        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result res : scanner) {
            System.out.println(res);
          }
        } finally {
          scanner.close(); // release the server-side scanner
        }
  • 18. Example Output
        keyvalues={row-1/colfam1:col-0/1301043190260/Put/vlen=7}
        keyvalues={row-10/colfam1:col-0/1301043190908/Put/vlen=8}
        keyvalues={row-100/colfam1:col-0/1301043195275/Put/vlen=9}
        keyvalues={row-11/colfam1:col-0/1301043190982/Put/vlen=8}
        keyvalues={row-12/colfam1:col-0/1301043191040/Put/vlen=8}
        keyvalues={row-13/colfam1:col-0/1301043191172/Put/vlen=8}
        keyvalues={row-14/colfam1:col-0/1301043191318/Put/vlen=8}
        keyvalues={row-15/colfam1:col-0/1301043191429/Put/vlen=8}
        keyvalues={row-16/colfam1:col-0/1301043191509/Put/vlen=8}
        keyvalues={row-17/colfam1:col-0/1301043191593/Put/vlen=8}
        keyvalues={row-18/colfam1:col-0/1301043191673/Put/vlen=8}
        keyvalues={row-19/colfam1:col-0/1301043191771/Put/vlen=8}
        keyvalues={row-2/colfam1:col-0/1301043190346/Put/vlen=7}
        keyvalues={row-20/colfam1:col-0/1301043191841/Put/vlen=8}
        keyvalues={row-21/colfam1:col-0/1301043191933/Put/vlen=8}
        keyvalues={row-22/colfam1:col-0/1301043191998/Put/vlen=8}
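The output above may look oddly ordered at first, with row-100 before row-11. The JDK-only sketch below reproduces it. Assumption: the table holds rows "row-1" through "row-100", which we infer from the output rather than from the talk. HBase orders row keys by their raw bytes, and for ASCII keys Java's `String` ordering gives the same result. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical JDK-only reproduction of the example output (not HBase code).
// Assumption: rows "row-1" .. "row-100"; ASCII String order == byte order.
class RowOrderDemo {
  static List<String> rowsUpTo(String inclusiveLimit, int rowCount) {
    TreeSet<String> rows = new TreeSet<>();
    for (int i = 1; i <= rowCount; i++) rows.add("row-" + i);
    // RowFilter(LESS_OR_EQUAL, BinaryComparator("row-22")) keeps keys <= "row-22"
    return new ArrayList<>(rows.headSet(inclusiveLimit, true));
  }

  public static void main(String[] args) {
    List<String> hits = rowsUpTo("row-22", 100);
    System.out.println(hits.size() + " rows: " + hits);
  }
}
```

Running `main` lists the same 16 rows in the same order as the slide. Rows row-3 through row-9 and row-23 onward are absent because they sort after "row-22" byte by byte.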
  • 19. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters
  • 20. Dedicated Filters
      •  Based directly on the FilterBase class
      •  Often less useful for get() calls, since entire rows are filtered
  • 21. Available Dedicated Filters
      •  Single Column Value Filter
           –  Filters rows based on the value of one specific column
           –  Extra features
                •  “Filter if missing” (setFilterIfMissing())
                •  “Get latest version only” (setLatestVersionOnly())
           –  The column must be part of the scan selection
                •  Otherwise the result is all or nothing
           –  Also needs a compare operation and an optional comparator
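The two switches and the "all or nothing" remark can be illustrated with a hypothetical row-level model. It is not HBase code. `Cell`, the column name format "f:c", and `keepRow` are illustrative assumptions, and a plain `Predicate` stands in for the operator/comparator pair. To our understanding, HBase's defaults are filterIfMissing = false and latestVersionOnly = true.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical JDK-only model of SingleColumnValueFilter's row decision
// (not HBase code). 'matches' stands in for the operator/comparator pair.
class SingleColumnValueModel {
  record Cell(String column, long ts, String value) {}

  static boolean keepRow(List<Cell> row, String column, Predicate<String> matches,
                         boolean filterIfMissing, boolean latestVersionOnly) {
    List<Cell> versions = new ArrayList<>();
    for (Cell c : row) {
      if (c.column().equals(column)) versions.add(c);
    }
    // Column absent (or not part of the scan selection): every row lands
    // here, hence "all or nothing".
    if (versions.isEmpty()) return !filterIfMissing;
    versions.sort(Comparator.comparingLong(Cell::ts).reversed()); // newest first
    if (latestVersionOnly) return matches.test(versions.get(0).value());
    for (Cell c : versions) {
      if (matches.test(c.value())) return true; // any matching version is enough
    }
    return false;
  }
}
```

If the tested column is missing from every row, for example because it was not added to the scan, every call returns the same answer. You then get either all rows or none.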
  • 22. Available Dedicated Filters (cont.)
      •  Single Column Value Exclude Filter
           –  Same as the one before, but excludes the selection column from the result
      •  Prefix Filter
           –  Based on the prefix of row keys
           –  Can early out the scan!
                •  Combine with a start row for best performance
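Why a prefix filter can "early out", and why a start row helps, follows from keys being sorted. This hypothetical JDK-only model counts how many rows a scan has to examine. Once a key sorts past the prefix without starting with it, no later key can match. The class, the `ScanStats` record, and the sample keys are assumptions for illustration only.

```java
import java.util.*;

// Hypothetical model of PrefixFilter over a sorted key space (not HBase code).
class PrefixScanModel {
  record ScanStats(int matches, int examined) {}

  static ScanStats scan(NavigableSet<String> rows, String prefix, boolean useStartRow) {
    // With a start row, the scan begins at the prefix instead of the first key.
    Iterable<String> source = useStartRow ? rows.tailSet(prefix, true) : rows;
    int matches = 0;
    int examined = 0;
    for (String row : source) {
      examined++;
      if (row.startsWith(prefix)) {
        matches++;
      } else if (row.compareTo(prefix) > 0) {
        break; // past the prefix range: nothing later can match (early out)
      }
    }
    return new ScanStats(matches, examined);
  }
}
```

With keys a1, a2, b1, b2, b3, c1, c2 and prefix "b", both variants find three rows. Without a start row six keys are examined, and with one only four. In both cases c2 is never read.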
  • 23. Available Dedicated Filters (cont.)
      •  Page Filter
           –  Allows pagination through rows
           –  Needs to be combined with setting the start row on subsequent scans
           –  Can early out the scan when the limit is reached
      •  Key Only Filter
           –  Drops the value of every column
      •  First Key Only Filter
           –  Returns only the first column key of each row
           –  Useful for row counters or “get newest post” type applications
           –  Can early out the rest of the row scan
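The page-filter pattern of restarting each scan just after the last row seen can be sketched with a sorted set standing in for a table. This is a hypothetical model, not the HBase client. It assumes the usual trick of appending a zero byte to the last row key, which yields the smallest key that sorts after it. It also enforces the page size on the client side, because PageFilter is applied separately on each region server and may return more rows than requested.

```java
import java.util.*;

// Hypothetical JDK-only sketch of paging through sorted row keys (not HBase code).
class PagingModel {
  static List<List<String>> pages(NavigableSet<String> rows, int pageSize) {
    List<List<String>> pages = new ArrayList<>();
    String startRow = "";                        // empty start row = start of table
    while (true) {
      List<String> page = new ArrayList<>();
      for (String row : rows.tailSet(startRow, true)) {
        // The client re-checks the limit, since PageFilter may over-deliver.
        if (page.size() == pageSize) break;
        page.add(row);
      }
      if (page.isEmpty()) break;                 // no more rows: done
      pages.add(page);
      // Smallest key strictly greater than the last row: append a zero byte.
      startRow = page.get(page.size() - 1) + "\0";
    }
    return pages;
  }

  public static void main(String[] args) {
    NavigableSet<String> rows = new TreeSet<>();
    for (int i = 1; i <= 7; i++) rows.add("row-" + i);
    System.out.println(pages(rows, 3));
  }
}
```

Each page starts strictly after the previous page's last row, so no row is returned twice and none is skipped.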
  • 24. Available Dedicated Filters (cont.)
      •  Inclusive Stop Filter
           –  As opposed to the exclusive stop row of a scan, this filter includes the final row
      •  Timestamp Filter
           –  Takes a list of timestamps to include in the result
      •  Column Count Get Filter
           –  Used to limit the number of columns returned by a get() call
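The three filters on slide 24 reduce to simple set operations once keys and columns are sorted. The JDK-only sketch below models that behaviour. It is not HBase code. `rowRange`, `keepTimestamps`, and `firstColumns` are hypothetical helpers, and `firstColumns` assumes the columns of a row arrive in sorted order.

```java
import java.util.*;

// Hypothetical JDK-only models of three dedicated filters (not HBase code).
class RangeAndVersionModel {
  // A scan's stop row is exclusive; InclusiveStopFilter effectively includes it.
  static NavigableSet<String> rowRange(NavigableSet<String> rows, String start, String stop,
                                       boolean inclusiveStop) {
    return rows.subSet(start, true, stop, inclusiveStop);
  }

  // TimestampsFilter-like: keep only cells whose timestamp is in the given list.
  static List<Long> keepTimestamps(List<Long> cellTimestamps, Collection<Long> wanted) {
    Set<Long> lookup = new HashSet<>(wanted);
    List<Long> kept = new ArrayList<>();
    for (long ts : cellTimestamps) {
      if (lookup.contains(ts)) kept.add(ts);
    }
    return kept;
  }

  // ColumnCountGetFilter-like: return at most 'limit' columns of a row.
  static List<String> firstColumns(SortedSet<String> columns, int limit) {
    List<String> out = new ArrayList<>();
    for (String c : columns) {
      if (out.size() == limit) break;
      out.add(c);
    }
    return out;
  }
}
```

For rows row-1 to row-5, a start row of row-2 and a stop row of row-4 return two rows with the exclusive stop and three with the inclusive one.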
  • 25. Available Dedicated Filters (cont.)
      •  Column Pagination Filter
           –  Allows paginating through the columns within a row
           –  Skips to the offset parameter and returns up to limit columns
      •  Column Prefix Filter
           –  Analogous to PrefixFilter, but matches column qualifiers
      •  Random Row Filter
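The column-level filters on slide 25 can be modelled on a sorted list of qualifiers. This is a hypothetical JDK-only sketch, not HBase code. The helper names are ours, the `(limit, offset)` argument order mirrors our recollection of the ColumnPaginationFilter constructor, and `includeRow` assumes a row is kept when a random float falls below the configured chance.

```java
import java.util.*;

// Hypothetical JDK-only models of column pagination, column prefix matching,
// and random row sampling (not HBase code).
class ColumnFiltersModel {
  // ColumnPaginationFilter-like: skip 'offset' columns, then return up to 'limit'.
  static List<String> paginate(List<String> sortedQualifiers, int limit, int offset) {
    int size = sortedQualifiers.size();
    int from = Math.min(offset, size);
    int to = (int) Math.min((long) from + limit, size); // long math avoids overflow
    return sortedQualifiers.subList(from, to);
  }

  // ColumnPrefixFilter-like: keep qualifiers that start with the prefix.
  static List<String> withPrefix(List<String> qualifiers, String prefix) {
    List<String> out = new ArrayList<>();
    for (String q : qualifiers) {
      if (q.startsWith(prefix)) out.add(q);
    }
    return out;
  }

  // RandomRowFilter-like: include a row with probability 'chance' (0.0 to 1.0).
  static boolean includeRow(Random random, float chance) {
    return random.nextFloat() < chance;
  }
}
```

Paging a wide row then becomes a loop that increases the offset by the limit until an empty page comes back.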
  • 26. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters
  • 27. Decorating Filters
      •  Wrap other filters to gain additional control over the returned data
      •  Skip Filter
           –  Skips the entire row when a column is filtered
           –  Not all filters are compatible
      •  While Match Filter
           –  Aborts the entire scan once the wrapped filter indicates that a row or column is omitted
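The difference between the two decorators is easiest to see side by side. Below is a hypothetical row-level model, not HBase code. Each row is a list of cell values, and `wrapped` stands in for the wrapped filter's per-cell verdict (true means keep). How WhileMatchFilter treats the partially read row in which it aborts is deliberately not modelled.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical row-level model of SkipFilter and WhileMatchFilter (not HBase code).
class DecoratorModel {
  // SkipFilter: if the wrapped filter rejects any cell, drop the whole row.
  static List<List<Integer>> skip(List<List<Integer>> rows, Predicate<Integer> wrapped) {
    List<List<Integer>> out = new ArrayList<>();
    for (List<Integer> row : rows) {
      if (row.stream().allMatch(wrapped)) out.add(row);
    }
    return out;
  }

  // WhileMatchFilter: the first rejection ends the whole scan.
  static List<List<Integer>> whileMatch(List<List<Integer>> rows, Predicate<Integer> wrapped) {
    List<List<Integer>> out = new ArrayList<>();
    for (List<Integer> row : rows) {
      if (!row.stream().allMatch(wrapped)) break;
      out.add(row);
    }
    return out;
  }
}
```

With rows [1, 2], [3, -1], [4, 5] and a "value > 0" check, the skip variant returns the first and last rows. The while-match variant stops after the first row.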
  • 28. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters
  • 29. Combining Filters
      •  Implemented by the FilterList class
           –  Wraps a list of filters into a Filter-compatible class
           –  Takes an optional operator that decides how to combine the results of the wrapped filters (default: MUST_PASS_ALL)
  • 30. Combining Filters (cont.)
      •  Filter lists can contain other filter lists
      •  The operator is fixed per list, but nesting lists allows creating combinations
      •  Using the proper List implementation helps control the order in which filters execute
  • 31.

        import java.util.ArrayList;
        import java.util.List;
        import org.apache.hadoop.hbase.filter.*;
        import org.apache.hadoop.hbase.util.Bytes;

        List<Filter> filters = new ArrayList<Filter>(); // ArrayList keeps insertion order

        // Rows from "row-03" up to and including "row-06" ...
        Filter filter1 = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL,
            new BinaryComparator(Bytes.toBytes("row-03")));
        filters.add(filter1);
        Filter filter2 = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
            new BinaryComparator(Bytes.toBytes("row-06")));
        filters.add(filter2);
        // ... and only qualifiers matching the regular expression col-0[03]
        Filter filter3 = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
            new RegexStringComparator("col-0[03]"));
        filters.add(filter3);

        // Default operator MUST_PASS_ALL: data must pass all three filters
        FilterList filterList1 = new FilterList(filters);
        …
        // MUST_PASS_ONE: passing any one of the filters is enough
        FilterList filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE, filters);
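To see what the two filter lists on slide 31 select, here is a JDK-only model that treats each filter as a per-cell predicate. This is a simplification. The real FilterList also works at the row-key level, so edge cases may differ. The test data (rows row-01 to row-10, qualifiers col-00 to col-09), the `Cell` record, and the helper names are hypothetical, and a regex search stands in for RegexStringComparator.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical per-cell model of FilterList operators (not HBase code).
class FilterListModel {
  enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }
  record Cell(String row, String qualifier) {}

  // A "list" is itself a predicate, so lists can be nested.
  static Predicate<Cell> list(Operator op, List<Predicate<Cell>> filters) {
    return cell -> op == Operator.MUST_PASS_ALL
        ? filters.stream().allMatch(f -> f.test(cell))
        : filters.stream().anyMatch(f -> f.test(cell));
  }

  // The three filters from the slide, as plain predicates.
  static List<Predicate<Cell>> exampleFilters() {
    Pattern qualifier = Pattern.compile("col-0[03]");
    Predicate<Cell> filter1 = c -> c.row().compareTo("row-03") >= 0;
    Predicate<Cell> filter2 = c -> c.row().compareTo("row-06") <= 0;
    Predicate<Cell> filter3 = c -> qualifier.matcher(c.qualifier()).find();
    return List.of(filter1, filter2, filter3);
  }

  // Hypothetical test data: rows row-01..row-10, qualifiers col-00..col-09.
  static List<Cell> cells() {
    List<Cell> cells = new ArrayList<>();
    for (int r = 1; r <= 10; r++) {
      for (int q = 0; q <= 9; q++) {
        cells.add(new Cell(String.format("row-%02d", r), String.format("col-%02d", q)));
      }
    }
    return cells;
  }

  static long count(List<Cell> cells, Predicate<Cell> filter) {
    return cells.stream().filter(filter).count();
  }
}
```

Under this model MUST_PASS_ALL keeps 8 of the 100 cells (four rows times two qualifiers). MUST_PASS_ONE keeps all 100, because every row passes at least one of the two row filters. Nesting the two row filters in a MUST_PASS_ALL list inside a MUST_PASS_ONE list with the qualifier filter keeps 52 cells.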
  • 32. Agenda: 1 Introduction, 2 Comparison Filters, 3 Dedicated Filters, 4 Decorating Filters, 5 Combining Filters, 6 Custom Filters
  • 33. Custom Filter
      •  Allows users to add missing filters
      •  Either implement the Filter interface or extend the FilterBase skeleton
      •  Provides hooks that are called at different stages of the read process
  • 34. Filter Interface

        import java.util.List;
        import org.apache.hadoop.hbase.KeyValue;
        import org.apache.hadoop.io.Writable;

        public interface Filter extends Writable {
          public enum ReturnCode {
            INCLUDE, SKIP, NEXT_COL, NEXT_ROW, SEEK_NEXT_USING_HINT
          }
          public void reset();                             // clear per-row state
          public boolean filterRowKey(byte[] buffer,
              int offset, int length);                     // true = skip this row
          public boolean filterAllRemaining();             // true = end the scan
          public ReturnCode filterKeyValue(KeyValue v);    // per-cell decision
          public void filterRow(List<KeyValue> kvs);       // may modify the row's cells
          public boolean hasFilterRow();
          public boolean filterRow();                      // true = drop the row
          public KeyValue getNextKeyHint(KeyValue currentKV); // used with SEEK_NEXT_USING_HINT
        }
  • 35. Filter Return Codes
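The return codes in the Filter interface on slide 34 steer the scanner within a row. The following is a hypothetical model of that reaction, not HBase code. It assumes cells arrive in storage order (by column, newest version first) and leaves out SEEK_NEXT_USING_HINT. The `Cell` record and `applyToRow` are illustrative names.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical model of how a scanner could react to filterKeyValue() return
// codes within one row (not HBase code). Cells must be in storage order.
class ReturnCodeModel {
  enum ReturnCode { INCLUDE, SKIP, NEXT_COL, NEXT_ROW }
  record Cell(String column, long ts) {}

  static List<Cell> applyToRow(List<Cell> cells, Function<Cell, ReturnCode> filter) {
    List<Cell> out = new ArrayList<>();
    int i = 0;
    while (i < cells.size()) {
      Cell cell = cells.get(i);
      ReturnCode code = filter.apply(cell);
      if (code == ReturnCode.INCLUDE) {          // emit the cell, move on
        out.add(cell);
        i++;
      } else if (code == ReturnCode.SKIP) {      // drop the cell, move on
        i++;
      } else if (code == ReturnCode.NEXT_COL) {  // drop remaining versions of this column
        while (i < cells.size() && cells.get(i).column().equals(cell.column())) i++;
      } else {                                   // NEXT_ROW: ignore the rest of the row
        break;
      }
    }
    return out;
  }
}
```

NEXT_COL and NEXT_ROW save filter invocations. A filter that always answers NEXT_COL is consulted once per column, not once per version.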
  • 36. Merge Reads
  • 37. Filter Flow
      •  Filter hooks are called at different stages
      •  Seeks are done initially to find the next KeyValue
           –  A hint from the previous filter invocation might help
      •  Early-out checks improve performance
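The order in which the hooks fire, and where the early outs sit, can be sketched with a heavily simplified scanner loop. This is a hypothetical model, not HBase's region scanner. `MiniFilter` and `filterCell` are our own names, with `filterCell` standing in for filterKeyValue(). Return codes, seek hints, batching, and filterRow(List) are omitted.

```java
import java.util.*;

// Hypothetical, simplified model of the order of filter hooks (not HBase code).
class FilterFlowModel {
  interface MiniFilter {
    default void reset() {}                                     // clear per-row state
    default boolean filterAllRemaining() { return false; }      // true = end the scan
    default boolean filterRowKey(String row) { return false; }  // true = skip the row
    default boolean filterCell(String row, String value) { return false; } // true = drop cell
    default boolean filterRow() { return false; }               // true = drop the row
  }

  static Map<String, List<String>> scan(SortedMap<String, List<String>> table, MiniFilter f) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> row : table.entrySet()) {
      if (f.filterAllRemaining()) break;          // early out for the whole scan
      f.reset();
      if (f.filterRowKey(row.getKey())) continue; // early out: cells are never read
      List<String> kept = new ArrayList<>();
      for (String value : row.getValue()) {
        if (!f.filterCell(row.getKey(), value)) kept.add(value);
      }
      if (!f.filterRow() && !kept.isEmpty()) result.put(row.getKey(), kept);
    }
    return result;
  }
}
```

The cheapest checks come first. Stopping the scan or skipping a row by its key avoids reading any cells of that row.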
  • 38. Example Code

        import org.apache.hadoop.hbase.KeyValue;
        import org.apache.hadoop.hbase.filter.FilterBase;
        import org.apache.hadoop.hbase.util.Bytes;

        // Returns only rows that contain at least one cell whose value equals "value"
        public class CustomFilter extends FilterBase {
          private byte[] value = null;
          private boolean filterRow = true;

          public CustomFilter() { super(); } // no-arg constructor, used when deserializing
          public CustomFilter(byte[] value) { this.value = value; }

          @Override
          public void reset() { this.filterRow = true; } // clear per-row state between rows

          @Override
          public ReturnCode filterKeyValue(KeyValue kv) {
            if (Bytes.compareTo(value, kv.getValue()) == 0) {
              filterRow = false; // found a matching value: keep this row
            }
            return ReturnCode.INCLUDE; // never drop individual cells
          }

          @Override
          public boolean filterRow() { return filterRow; } // true = drop the row
          ...
        }
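The row-level logic of the CustomFilter on slide 38 can be exercised without a cluster. The harness below replays the same state machine in plain Java: `reset()` before each row, one call per cell, then a check of `filterRow()`. It is a hypothetical sketch. `CustomFilterModel`, `onCell`, `matchingRows`, and the sample table are our own names, and it is not the HBase read path.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hypothetical JDK-only harness replaying the CustomFilter logic (not HBase code):
// cells are always included, but a row is dropped unless one of its values matches.
class CustomFilterModel {
  private final byte[] value;
  private boolean filterRow = true;

  CustomFilterModel(byte[] value) { this.value = value; }

  void reset() { filterRow = true; }

  // Mirrors filterKeyValue(): remember a match, but always include the cell.
  void onCell(byte[] cellValue) {
    if (Arrays.equals(value, cellValue)) filterRow = false;
  }

  boolean filterRow() { return filterRow; }

  static List<String> matchingRows(SortedMap<String, List<String>> table, String wanted) {
    CustomFilterModel f = new CustomFilterModel(wanted.getBytes(StandardCharsets.UTF_8));
    List<String> kept = new ArrayList<>();
    for (Map.Entry<String, List<String>> row : table.entrySet()) {
      f.reset();
      for (String v : row.getValue()) f.onCell(v.getBytes(StandardCharsets.UTF_8));
      if (!f.filterRow()) kept.add(row.getKey());
    }
    return kept;
  }
}
```

This matches the slide's tip to test locally first. The filter's decision logic can be unit-tested this way before the JAR is deployed to the RegionServers.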
  • 39. Deploying Custom Filters
      •  Provide a JAR file with the filter class
      •  Deploy the JAR to the RegionServers
      •  Add the JAR to HBASE_CLASSPATH (e.g., in hbase-env.sh)
      •  Restart the RegionServers
      •  Tip: testing on a cluster is more involved, so test on a local machine first
  • 40. Summary
  • 41. Summary (cont.)