SlideShare una empresa de Scribd logo
1 de 49
Descargar para leer sin conexión
  Petabyte Scale Data Warehouse System on Hadoop Ning Zhang Data Infrastructure
Overview ,[object Object],[object Object],[object Object],[object Object],[object Object]
Facebook is a Set of Web Services …
…  at Large Scale ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Under the Hook ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data-driven Business/R&D/Science … ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Biggest Challenge at Facebook – growth!  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why not Existing Data Warehousing Systems? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Lets try Hadoop… ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What is HIVE? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why SQL on Hadoop? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive Architecture
Hive: Familiar Schema Concepts Name HDFS Directory Table pvs /wh/pvs Partition ds = 20090801, ctry = US /wh/pvs/ds=20090801/ctry=US Bucket user into 32 buckets HDFS file for user hash 0 /wh/pvs/ds=20090801/ctry=US/part-00000
Column Data Types ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive Query Language ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive: Making Optimizations Transparent  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive: Making Optimizations Transparent ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Optimizations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive: Open & Extensible ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
MapReduce Scripts Examples ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Usage in Facebook
Hive & Hadoop Usage @ Facebook ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hadoop & Hive Cluster @ Facebook ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Flow Architecture at Facebook Web Servers Scribe MidTier Filers Production Hive-Hadoop Cluster Oracle RAC Federated MySQL Scribe-Hadoop Cluster Adhoc Hive-Hadoop Cluster Hive replication
Scribe-HDFS: 101 Scribed Scribed Scribed Scribed Scribed <category, msgs> HDFS Data Node HDFS Data Node HDFS Data Node Append to  /staging/<category>/<file> Scribe-HDFS
Scribe-HDFS: Near real time Hadoop ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Warehousing at Facebook ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
More Real-World Use Cases ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object]
Data Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Join in Map Reduce page_view user pv_users Map Reduce key value 111 < 1, 1> 111 < 1, 2> 222 < 1, 1> pageid uhash time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 uhash age_bkt gender 111 B3 female 222 B4 male key value 111 < 2, B3> 222 < 2, B4> key value 111 < 1, 1> 111 < 1, 2> 111 < 2, B3> key value 222 < 1, 1> 222 < 2, B4> Shuffle Sort Pageid age_bkt 1 B3 2 B3 pageid age_bkt 1 B4
Join Optimizations ,[object Object],[object Object],[object Object],[object Object],[object Object]
Hive QL – Group By ,[object Object],[object Object],[object Object]
Hive QL – Group By in Map Reduce pv_users Map Reduce pageid age_bkt 1 B3 1 B3 pageid age_bkt count 1 B3 3 pageid age_bkt 2 B4 1 B3 key value <1,B3> 2 key value <1,B3> 1 <2,B4> 1 key value <1,B3> 2 <1,B3> 1 key value <2,B4> 1 Shuffle Sort pageid age_bkt count 2 B4 1
Group by Optimizations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hive Extensibility Features
Hive is an open system ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Storage Format Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Existing File Formats *  Splitable: Capable of splitting the file so that a single huge file can be processed by multiple mappers in parallel. TEXTFILE SEQUENCEFILE RCFILE Data type text only  text/binary text/binary Internal Storage order Row-based Row-based Column-based Compression File-based Block-based Block-based Splitable* YES YES YES Splitable* after compression NO YES YES
Serialization Formats ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
SerDe Examples ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Existing SerDes * LazyObjects: deserialize the columns only when accessed. * Binary Sortable: binary format preserving the sort order. LazySimpleSerDe LazyBinarySerDe (HIVE-640) BinarySortable SerDe serialized format delimited proprietary binary proprietary binary sortable* deserialized format LazyObjects* LazyBinaryObjects* Writable ThriftSerDe (HIVE-706) RegexSerDe ColumnarSerDe serialized format Depends on the Thrift Protocol Regex formatted proprietary column-based deserialized format User-defined Classes, Java Primitive Objects ArrayList<String> LazyObjects*
UDF Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
UDAF Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Comparison of UDF/UDAF v.s. M/R scripts UDF/UDAF M/R scripts language Java any language data format in-memory objects serialized streams 1/1 input/output supported via UDF supported n/1 input/output supported via UDAF supported 1/n input/output supported via UDTF supported Speed faster Slower
Hive Interoperability
Interoperability: Interfaces ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Interoperability: Microstrategy ,[object Object],[object Object],[object Object]
Future ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Más contenido relacionado

La actualidad más candente

The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemCloudera, Inc.
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component rebeccatho
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Casesnzhang
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoopChirag Ahuja
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce introGeoff Hendrey
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigMilind Bhandarkar
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map ReduceEdureka!
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Analytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table FunctionsAnalytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table FunctionsDataWorks Summit
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop TechnologyManish Borkar
 
Hive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly CompetitionHive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly CompetitionXplenty
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoopjoelcrabb
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Simplilearn
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
 

La actualidad más candente (20)

The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop Ecosystem
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoop
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce intro
 
Hadoop
HadoopHadoop
Hadoop
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
 
Session 14 - Hive
Session 14 - HiveSession 14 - Hive
Session 14 - Hive
 
XML Parsing with Map Reduce
XML Parsing with Map ReduceXML Parsing with Map Reduce
XML Parsing with Map Reduce
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Analytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table FunctionsAnalytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table Functions
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
 
Hadoop Presentation
Hadoop PresentationHadoop Presentation
Hadoop Presentation
 
6.hive
6.hive6.hive
6.hive
 
Hive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly CompetitionHive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly Competition
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
 

Similar a Hive @ Hadoop day seattle_2010

Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And HiveCloudera, Inc.
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookAmr Awadallah
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Yahoo Developer Network
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesDataWorks Summit
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BIPrasad Prabhu (PP)
 
Big dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlBig dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlKhanderao Kand
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big pictureJ S Jodha
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Big Data Analytics 2014
Big Data Analytics 2014Big Data Analytics 2014
Big Data Analytics 2014Stratebi
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune amrutupre
 

Similar a Hive @ Hadoop day seattle_2010 (20)

Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! Perspectives
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BI
 
Big dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlBig dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosql
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big picture
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Big Data Analytics 2014
Big Data Analytics 2014Big Data Analytics 2014
Big Data Analytics 2014
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
 

Hive @ Hadoop day seattle_2010

  • 1. Petabyte Scale Data Warehouse System on Hadoop Ning Zhang Data Infrastructure
  • 2.
  • 3. Facebook is a Set of Web Services …
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 13. Hive: Familiar Schema Concepts Name HDFS Directory Table pvs /wh/pvs Partition ds = 20090801, ctry = US /wh/pvs/ds=20090801/ctry=US Bucket user into 32 buckets HDFS file for user hash 0 /wh/pvs/ds=20090801/ctry=US/part-00000
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 22.
  • 23.
  • 24. Data Flow Architecture at Facebook Web Servers Scribe MidTier Filers Production Hive-Hadoop Cluster Oracle RAC Federated MySQL Scribe-Hadoop Cluster Adhoc Hive-Hadoop Cluster Hive replication
  • 25. Scribe-HDFS: 101 Scribed Scribed Scribed Scribed Scribed <category, msgs> HDFS Data Node HDFS Data Node HDFS Data Node Append to /staging/<category>/<file> Scribe-HDFS
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31. Hive QL – Join in Map Reduce page_view user pv_users Map Reduce key value 111 < 1, 1> 111 < 1, 2> 222 < 1, 1> pageid uhash time 1 111 9:08:01 2 111 9:08:13 1 222 9:08:14 uhash age_bkt gender 111 B3 female 222 B4 male key value 111 < 2, B3> 222 < 2, B4> key value 111 < 1, 1> 111 < 1, 2> 111 < 2, B3> key value 222 < 1, 1> 222 < 2, B4> Shuffle Sort Pageid age_bkt 1 B3 2 B3 pageid age_bkt 1 B4
  • 32.
  • 33.
  • 34. Hive QL – Group By in Map Reduce pv_users Map Reduce pageid age_bkt 1 B3 1 B3 pageid age_bkt count 1 B3 3 pageid age_bkt 2 B4 1 B3 key value <1,B3> 2 key value <1,B3> 1 <2,B4> 1 key value <1,B3> 2 <1,B3> 1 key value <2,B4> 1 Shuffle Sort pageid age_bkt count 2 B4 1
  • 35.
  • 37.
  • 38.
  • 39. Existing File Formats * Splitable: Capable of splitting the file so that a single huge file can be processed by multiple mappers in parallel. TEXTFILE SEQUENCEFILE RCFILE Data type text only text/binary text/binary Internal Storage order Row-based Row-based Column-based Compression File-based Block-based Block-based Splitable* YES YES YES Splitable* after compression NO YES YES
  • 40.
  • 41.
  • 42. Existing SerDes * LazyObjects: deserialize the columns only when accessed. * Binary Sortable: binary format preserving the sort order. LazySimpleSerDe LazyBinarySerDe (HIVE-640) BinarySortable SerDe serialized format delimited proprietary binary proprietary binary sortable* deserialized format LazyObjects* LazyBinaryObjects* Writable ThriftSerDe (HIVE-706) RegexSerDe ColumnarSerDe serialized format Depends on the Thrift Protocol Regex formatted proprietary column-based deserialized format User-defined Classes, Java Primitive Objects ArrayList<String> LazyObjects*
  • 43.
  • 44.
  • 45. Comparison of UDF/UDAF v.s. M/R scripts UDF/UDAF M/R scripts language Java any language data format in-memory objects serialized streams 1/1 input/output supported via UDF supported n/1 input/output supported via UDAF supported 1/n input/output supported via UDTF supported Speed faster Slower
  • 47.
  • 48.
  • 49.

Notas del editor

  1. Polls: How many of you are working or have worked on DW/BI in your organization? How many of you are satisfied with your current solution? How many of your have been using open source solutions in your organization?
  2. List of apps, news feed, ads/notifications Dynamic web site What boils down to is a set of web services, not a big deal