Shivram Mani (Pivotal)
Unified Framework for
Big Data Foreign Data Wrappers
@ FOSDEM PGDay 2016
Agenda
● Introduction to Hadoop Ecosystem
● Why Postgres SQL on Hadoop
● Current state of SQL on Hadoop (FDW/Big data wrappers)
● PXF - Design & Architecture
● Demo
● Benefits of using PXF with FDW
● Q&A
➢ Introduction to Hadoop Ecosystem
What is Hadoop/Big Data
Apache Hadoop is an open source framework for distributed processing of large data sets across clusters of computers.
● Commodity Hardware
● Scale out
● Fault tolerance
● Support multiple file formats
[Diagram: the Hadoop stack, bottom to top]
● Clustered File System: Hadoop Distributed File System (HDFS)
● Distributed Data Processing: MapReduce, HBase
● Top level Abstractions: Hive, Pig
● Top level Interfaces: ETL Tools, BI Tools, RDBMS
➢ Why Postgres SQL on Hadoop
Motivations: SQL on Hadoop
[Diagram: RDBMS -> ? -> various formats and storages supported on HDFS]
RDBMS:
● ANSI SQL
● Cost-based optimizer
● Transactions
● Indexes
Foreign Tables!
➢ Current state of SQL on External Hadoop - FDW/Big data wrappers
Foreign Data Wrappers (FDW)
Foreign tables and foreign data wrappers are the Postgres way to read external data.
1. Create FDW (compiled C functions in the handler)
2. Declare the extension (FDW)
3. Create server that uses the wrapper
4. Create table that uses the server
CREATE FOREIGN DATA WRAPPER hadoop_fdw
  HANDLER hadoop_fdw_handler
  NO VALIDATOR;
CREATE EXTENSION hadoop_fdw;
CREATE SERVER hadoop_server
  FOREIGN DATA WRAPPER hadoop_fdw
  OPTIONS (address '127.0.0.1', port '10000');
CREATE FOREIGN TABLE retail_history (
  name text,
  price double precision )
  SERVER hadoop_server
  OPTIONS (table 'example.retail_history');
Foreign Data Wrappers - Implementation
Creating a new foreign data wrapper consists of implementing the FDW API as C-language functions.
Scanning a foreign table requires implementation of the following:
● GetForeignRelSize - Estimate of the relation size
● GetForeignPaths - Get access paths for the foreign data
● GetForeignPlan - Plan the foreign paths of this table
● BeginForeignScan - Start scan. Open connections, etc
● IterateForeignScan - Perform scan and return tuples
● EndForeignScan - End scan. Close connection, etc
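The order in which the three execution-time callbacks are driven can be sketched with a toy model. This is Python, not the real C API: the class and method names below are hypothetical stand-ins, and the actual callbacks take and return PostgreSQL executor structures. The planning callbacks (GetForeignRelSize, GetForeignPaths, GetForeignPlan) run earlier, at plan time, and are left out.

```python
from typing import Iterator, Optional


class ToyForeignScan:
    """Toy stand-in for the execution-time FDW callbacks (not the C API)."""

    def __init__(self, rows: list) -> None:
        self._source_rows = rows          # pretend this is the external data
        self._cursor: Optional[Iterator] = None
        self.log: list = []               # records the order of callback calls

    def begin_foreign_scan(self) -> None:
        # Counterpart of BeginForeignScan: open connections, set up state.
        self.log.append("begin")
        self._cursor = iter(self._source_rows)

    def iterate_foreign_scan(self) -> Optional[tuple]:
        # Counterpart of IterateForeignScan: one tuple per call, None when done.
        self.log.append("iterate")
        return next(self._cursor, None)

    def end_foreign_scan(self) -> None:
        # Counterpart of EndForeignScan: release resources.
        self.log.append("end")
        self._cursor = None


def run_scan(scan: ToyForeignScan) -> list:
    """Drive the callbacks in begin / iterate-until-empty / end order."""
    scan.begin_foreign_scan()
    out = []
    while (row := scan.iterate_foreign_scan()) is not None:
        out.append(row)
    scan.end_foreign_scan()
    return out


rows = run_scan(ToyForeignScan([("apple", 1.5), ("pear", 2.0)]))
print(rows)
```

Note that the iterate step is called once more than there are rows: the final call is the one that reports the end of the scan.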
Big Data Wrappers (Multicorn, BigSQL, EnterpriseDB)
1. Create a Hive table corresponding to the HDFS file/HBase table
2. Create Extension, Server & Foreign Table (schema and necessary options)
3. Query Foreign Table
4. Query connects to HiveServer via thrift client
5. Hive server executes mapreduce jobs
6. Results mapped to postgres table
Big Data Wrapper - Communication
[Diagram: FDW, libthrift, MetaStore]
➢ PXF - Design & Architecture
HAWQ Extension Framework - PXF
● HAWQ is an MPP SQL engine on HDFS (evolved from Greenplum Database)
● PXF is an extensible framework that allows HAWQ to query external data.
● PXF includes built-in connectors for accessing data in HDFS files, Hive & HBase tables.
● Users can create custom connectors to other parallel data stores or processing engines.
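PXF - Communication
[Diagram: HAWQ segments read native tables through libhdfs3 (written in C) and external tables over HTTP (port 51200) through the REST API of the PXF webapp running in Apache Tomcat; the PXF webapp reaches the data stores through their Java APIs]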
Architecture - Deployment
● HAWQ Master Node
● NN: pxf
● HBase Master
● DN1: pxf, HAWQ seg1, HBase Region Server1
● DN2: pxf, HAWQ seg2, HBase Region Server2
● DN3: pxf, HAWQ seg3, HBase Region Server3
● DN4: pxf, HAWQ seg4
* PXF needs to be installed on all DNs (DataNodes)
* Installing PXF on the NN (NameNode) is recommended
Design - Components (PXF)
● Fragmenter: Get the locations of fragments for an external table. Implicitly provides stats to the query optimizer.
● Accessor: Understand and read/write the fragment, return records.
● Resolver: Convert records to a HAWQ-consumable format (data types).
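How the three components divide the work can be sketched with a toy pipeline. This is an illustration in Python under stated assumptions, not the PXF Java API: the `Fragment` fields, the function names, and the comma-separated sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    host: str      # where the fragment lives (hypothetical field)
    lines: tuple   # raw content of the fragment, here comma-separated text


def fragmenter(dataset: dict) -> list:
    """Fragmenter role: list the fragments of a data source and their locations."""
    return [Fragment(host, tuple(lines)) for host, lines in dataset.items()]


def accessor(fragment: Fragment):
    """Accessor role: read one fragment and yield its raw records."""
    for line in fragment.lines:
        yield line.split(",")


def resolver(record: list) -> tuple:
    """Resolver role: convert a raw record into typed fields (text, float)."""
    name, price = record
    return (name, float(price))


# Hypothetical data set spread over two data nodes.
dataset = {"dn1": ["apple,1.5", "pear,2.0"], "dn2": ["plum,0.75"]}
rows = [resolver(rec) for frag in fragmenter(dataset) for rec in accessor(frag)]
print(rows)
```

Keeping the three roles separate is what lets a new connector reuse parts of an existing one, for example a new file format that only needs its own accessor and resolver.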
DDL Comparison
FDW:
CREATE EXTENSION hadoop_fdw;
CREATE SERVER hadoop_server
  FOREIGN DATA WRAPPER hadoop_fdw
  OPTIONS (address '127.0.0.1', port '10000');
CREATE FOREIGN TABLE retail_history (
  name text,
  price double precision )
  SERVER hadoop_server
  OPTIONS (table 'example.retail_history');
PXF:
CREATE PROTOCOL PXF;
CREATE EXTERNAL TABLE retail_history (
  name text,
  price double precision )
  LOCATION ('pxf://127.0.0.1:51200/example.retail_history?PROFILE=Hive')
  FORMAT 'CUSTOM' (formatter='pxfwritable_import');
* On the slide, items with the same color have a similar action
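The PXF location string in the DDL above packs the host, port, data source, and profile into one URI. A minimal sketch of assembling it, assuming the usual layout of host and port, then the data source name, then options in the query string; the helper name `pxf_location` is hypothetical and not part of HAWQ or PXF.

```python
from urllib.parse import urlencode


def pxf_location(host: str, port: int, path: str, **options: str) -> str:
    """Assemble a pxf:// location string from its parts (illustrative helper)."""
    query = urlencode(options)   # e.g. PROFILE=Hive
    return f"pxf://{host}:{port}/{path}?{query}"


location = pxf_location("127.0.0.1", 51200, "example.retail_history", PROFILE="Hive")
print(location)
```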
Architecture - Data Flow: Query (HDFS)
Query: select * from ext_table0, where ext_table0 is located at pxf://<namenode>:<port>/path/to/data
1. HAWQ Master Node -> pxf on NN: getFragments() (REST), served by the Fragmenter
2. pxf on NN -> HAWQ Master Node: Fragments (JSON)
3. HAWQ Master Node: Split mapping (fragment -> segment)
4. Query dispatched to Segment 1,2,3… (Interconnect)
5. HAWQ segment -> pxf on its DN: Read() (REST)
6. pxf on DN -> HAWQ segment: records, produced by the Accessor and Resolver
7. HAWQ segments -> HAWQ Master Node: records (stream)
8. Query result
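The split mapping step (fragment -> segment) can be sketched as follows. The slides do not say which policy HAWQ uses, so the policy here is an assumption for illustration only: prefer a segment on the same host as the fragment, and fall back to round-robin over all segments otherwise. The function name and the data layout are hypothetical.

```python
from collections import defaultdict
from itertools import cycle


def map_fragments(fragments, segments):
    """Assign each fragment to a segment, preferring a segment on the same host.

    fragments: list of (fragment_id, host); segments: list of (segment_id, host).
    The locality-then-round-robin policy is an assumption, not HAWQ's algorithm.
    """
    by_host = defaultdict(list)
    for seg_id, host in segments:
        by_host[host].append(seg_id)
    fallback = cycle([seg_id for seg_id, _ in segments])
    next_local = defaultdict(int)   # per-host rotation among local segments

    plan = defaultdict(list)
    for frag_id, host in fragments:
        local = by_host.get(host)
        if local:
            seg = local[next_local[host] % len(local)]
            next_local[host] += 1
        else:
            seg = next(fallback)    # no local segment: spread round-robin
        plan[seg].append(frag_id)
    return dict(plan)


fragments = [("f1", "dn1"), ("f2", "dn2"), ("f3", "dn1"), ("f4", "dn9")]
segments = [("seg1", "dn1"), ("seg2", "dn2")]
print(map_fragments(fragments, segments))
```

With PXF installed on every DN next to a HAWQ segment, a locality-first policy of this kind would let most Read() calls stay on the local host.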
PXF Plugins, Profiles
• Built-in with HAWQ (Profiles)
  • HDFS: HDFSTextSimple(R/W), HDFSTextMulti(R), Avro(R)
  • Hive(R): Hive, HiveRC, HiveText
  • HBase(R): HBase
• Community (https://bintray.com/big-data/maven/pxf-plugins/view)
  • JSON (HAWQ-178)
  • Cassandra
  • Accumulo
  • ...
➢ Demo
Demo
https://github.com/shivzone/pxf_demo
PXF as Big Data Wrapper Abstraction
● Implement FDW callback functions that will interact with PXF.
● Use the enhanced libcurl library - libchurl
[Diagram: the FDW calls the REST API of the PXF webapp (Apache Tomcat) over HTTP, port 51200; the PXF webapp reaches the data stores through their Java APIs]
➢ Benefits of using PXF with FDW
Benefits of using PXF with FDW
● FDW isolated from underlying Hadoop ecosystem APIs
● Direct access to HDFS data
● Access Hive data without overhead of underlying execution framework
● Access HBase data without mapped Hive table
● Supports Single node & parallel execution
● Extensibility/ease of building extensions
● Support for multiple versions of underlying distributions
● Built in filter push down and support for stats
Resources
● Github
https://github.com/apache/incubator-hawq/tree/master/pxf
● Documentation
http://hawq.docs.pivotal.io/docs-hawq/topics/PivotalExtensionFrameworkPXF.html
● Wiki
https://cwiki.apache.org/confluence/display/HAWQ/PXF
Q & A
