Tajo: Big Data Warehouse System
(What's New Tajo 0.10 and its Beyond)
Big Data Day LA 2015
Hyunsik Choi, Gruter Inc.
(hschoi @ gruter.com)
Agenda
• Tajo Overview
• Milestones and 0.10 Features
• What's Next
Tajo: A Big Data Warehouse System
• Apache Top-level project
• Distributed and scalable data warehouse system on various data sources (e.g., HDFS, S3, HBase, …)
• Low-latency and long-running batch queries in a single system
• Features
  • ANSI SQL compliance
  • Mature SQL features
  • Partitioned table support
  • Java/Python UDF support
  • JDBC driver and Java-based asynchronous API
  • Read/write support for CSV, JSON, RCFile, SequenceFile, Parquet, ORC
Tajo Overall Architecture
[Figure: Clients (JDBC, TSql, Web UI) submit a query to the TajoMaster on the master server. The TajoMaster manages metadata through the CatalogStore (DBMS, HCatalog) and allocates the query to a QueryMaster. Each slave server runs a TajoWorker containing a QueryMaster, a Local Query Engine, and a StorageManager on top of HDFS and HBase. The QueryMaster sends tasks to the workers and monitors them.]
Common Scenarios
• Extraction, Transformation, Loading (ETL)
• Interactive BI/analytics on web-scale big data
• Data discovery/exploratory analysis with R and existing SQL tools
Use Cases: Replacement of Commercial DW
• Example: 1st telco company in South Korea
• Goal:
  • Replace slow ETL workloads on datasets of several TB
  • Generate many daily reports on user behavior
  • Ad-hoc analysis on terabyte-scale data sets
• Key Benefits of Tajo:
  • Simplification of DW ETL, OLAP, and Hadoop ETL into a unified system
  • Saved license costs compared with a commercial DW
  • Much lower cost, and more data analysis within the same SLA
Use Cases: Data Discovery
• Example: Music streaming service (26 million users)
• Goal:
  • Analysis of purchase history for targeted marketing
• Benefits:
  • Query interactivity on large data sets
  • Ability to use existing BI visualization tools
When Is Tajo the Right Choice?
• You want a unified system for batch and interactive queries on Hadoop, Amazon S3, or HBase.
• You want mixed use of a Hadoop-based DW and an RDBMS-based DW, or want to replace an existing RDBMS DW.
• You want to use existing SQL tools on a Hadoop DW.

Milestones
• 0.8 (2014.5): More features, SQL compatibility
• 0.9 (2014.10): Stability, analytical functions
• 0.10 (2015.3): Ecosystem expansion
• 0.11 (2015.7): More features
  • Python UDF
  • Nested schema
  • Tablespace support
  • Query federation
  • Better query scheduler
HBase Storage Support
• You can use SQL to access HBase tables.
• Tajo supports HBase storage
  • CREATE (EXTERNAL)/DROP/INSERT (OVERWRITE)/SELECT
  • Bulk insertion through direct HFile writing

    CREATE TABLE hbase_t1 (key TEXT, col1 TEXT, col2 INT)
    USING hbase
    WITH (
      'table' = 't1',
      'columns' = ':key,cf1:col1,cf2:col2',
      'hbase.zookeeper.quorum' = 'host1:2181,host2:2181'
    )

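The 'columns' option in the statement above pairs each table column with an HBase column. The following Python sketch shows one way to read such a mapping string. It is an illustration only, not Tajo's parser: the function name parse_hbase_columns is hypothetical, and the sketch assumes that entries correspond positionally to the table columns and that an entry with an empty family (':key') denotes the row key.

```python
from typing import List, Optional, Tuple


def parse_hbase_columns(mapping: str) -> List[Tuple[Optional[str], str]]:
    """Split ':key,cf1:col1' into (column family, qualifier) pairs.

    An empty family (as in ':key') is returned as None, which this
    sketch treats as the row key. This is an assumption, not Tajo code.
    """
    pairs = []
    for entry in mapping.split(","):
        family, sep, qualifier = entry.strip().partition(":")
        if not sep:
            raise ValueError(f"expected 'family:qualifier', got {entry!r}")
        pairs.append((family or None, qualifier))
    return pairs


# Column names taken from the CREATE TABLE statement on the slide.
table_columns = ["key", "col1", "col2"]
mapping = parse_hbase_columns(":key,cf1:col1,cf2:col2")
if len(mapping) != len(table_columns):
    raise ValueError("mapping must have one entry per table column")

# Positional pairing of table columns with HBase columns (assumed).
column_to_hbase = dict(zip(table_columns, mapping))
```

With the slide's mapping, col1 resolves to family cf1 and col2 to family cf2, while key has no family.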
Better AWS Support
• Optimized for S3 and EMR environments
• Fixed many bugs related to S3
• EMR bootstrap supported in the AWS Labs GitHub repo
• A quick guide for Tajo on EMR
  • http://www.gruter.com/blog/setting-up-a-tajo-cluster-on-amazon-emr/
• EMR bootstrap for Tajo on EMR
  • https://github.com/awslabs/emr-bootstrap-actions/tree/master/tajo
Better SQL Tool Support via Thin JDBC
[Figure: ETL tools, BI tools, and reporting tools connect through Tajo JDBC to a Tajo cluster running over HDFS, HBase, S3, and Swift.]
Improved Performance and Stability
• Off-heap sort operator for ORDER BY (TAJO-907)
• Hash shuffle I/O improvement (TAJO-374, TAJO-987)
• Skew handling for hash shuffle
• Automatic choice of the degree of parallelism at runtime
• Many query optimizer improvements
• Added master HA (TAJO-704)
• More error-tolerant shuffle fetch (TAJO-789, TAJO-953)
What's New in Tajo 0.11
Nested Data and JSON Support
• Nested data is becoming common
  • JSON, BSON, XML, Protocol Buffers, Avro, Parquet, …
  • Many web applications commonly use JSON.
  • MongoDB uses JSON documents by default.
  • Many HBase users also store JSON documents in a cell.
• Flattening causes a lot of data and computation overhead.
• Tajo 0.11 natively supports nested data formats.
How to Create a Nested Schema Table
Use the 'RECORD' keyword to define a complex data type.
Loose Schema for Self-Describing Formats
You can handle schema evolution with ALTER TABLE ADD COLUMN!
How to Retrieve Nested Fields
[Figure: input data, table definition, and SQL]
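To make the contrast between native nested access and flattening concrete, here is a small Python sketch. It is not Tajo code: the input record, the field names, and the helper names get_nested and flatten are all hypothetical, and it assumes nested fields are addressed by a dotted path such as name.first_name.

```python
import json
from typing import Any


def get_nested(record: dict, path: str) -> Any:
    """Follow a dotted path such as 'name.first_name' through nested dicts."""
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # in this sketch a missing field reads as NULL
        value = value[part]
    return value


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one level, joining keys with '.'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# Hypothetical input line; it is not taken from the slides.
line = '{"title": "Hand of the King", "name": {"first_name": "Eddard", "last_name": "Stark"}}'
record = json.loads(line)

first_name = get_nested(record, "name.first_name")  # direct nested access
missing = get_nested(record, "name.middle_name")    # absent field
flat = flatten(record)                              # the extra copy flattening requires
```

Direct access reads only the requested field, whereas flattening rewrites every record into a second, wider structure before any query runs. That rewrite is the kind of overhead the slide refers to.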
Query Federation and Tablespace Support
• Query support across multiple data sources
  • You can perform joins or unions among tables on different systems.
• Benefits:
  • Data offload from RDBMS to Hadoop and vice versa
  • Mixed use of an existing RDBMS and Hadoop
  • Access to NoSQL and various storage systems through SQL
  • A unified interface for SQL tools
[Figure: Apache Tajo on top of HDFS, NoSQL, S3, and Swift]
Tablespace Concept
• Tablespace
  • Registered storage space
  • A tablespace is identified by a unique URI
  • Configuration and policy are shared by all tables in the same tablespace
  • It allows users to reuse registered storage and its configuration.
Create Table on a Specified Tablespace

    CREATE TABLE uptodate (key TEXT, …) TABLESPACE hbase1;

    CREATE TABLE archive (l_orderkey bigint, …) TABLESPACE warehouse
    USING text WITH ('text.delimiter' = '|');

(hbase1 and warehouse are tablespace names; text is the format name.)
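The tablespace idea (a named, registered storage location with a unique URI whose configuration its tables share) can be modeled in a few lines of Python. This is a conceptual sketch, not Tajo's implementation: the class names, the URIs, and the rule that per-table WITH options override tablespace defaults are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Tablespace:
    name: str
    uri: str
    config: Dict[str, str] = field(default_factory=dict)


class TablespaceRegistry:
    """Keeps registered tablespaces and enforces unique URIs."""

    def __init__(self) -> None:
        self._spaces: Dict[str, Tablespace] = {}

    def register(self, space: Tablespace) -> None:
        for existing in self._spaces.values():
            if existing.uri == space.uri:
                raise ValueError(f"URI already registered: {space.uri}")
        self._spaces[space.name] = space

    def table_options(
        self, tablespace: str, overrides: Optional[Dict[str, str]] = None
    ) -> Dict[str, str]:
        # Start from the shared tablespace configuration, then apply
        # per-table options (assumed to take precedence).
        options = dict(self._spaces[tablespace].config)
        options.update(overrides or {})
        return options


registry = TablespaceRegistry()
# Hypothetical URIs and defaults, chosen only for the example.
registry.register(
    Tablespace("warehouse", "hdfs://namenode:8020/warehouse", {"text.delimiter": ","})
)
registry.register(Tablespace("hbase1", "hbase:zk://host1:2181/hbase"))

archive_options = registry.table_options("warehouse", {"text.delimiter": "|"})
uptodate_options = registry.table_options("hbase1")
```

A table created in a tablespace only names the tablespace. The storage location and shared settings come from the registry, which is what lets users reuse a registered storage system across many tables.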
Operation Push Down

    SELECT X, SUM(Y) FROM table1
    WHERE x 100 GROUP BY x

Filter, projection, or group-by can be pushed down into the underlying storage (such as an RDBMS, HBase, Elasticsearch, …)
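The effect of pushing a filter and a projection into the storage layer can be shown with a toy Python model. Everything here is hypothetical: the Storage class, the sample rows, and the predicate x > 100 are assumptions chosen for illustration and do not reflect Tajo's storage API.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, int]


class Storage:
    """Toy storage that counts how many rows it hands to the engine."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows = list(rows)
        self.rows_returned = 0

    def scan(
        self,
        predicate: Optional[Callable[[Row], bool]] = None,
        columns: Optional[List[str]] = None,
    ) -> Iterator[Row]:
        for row in self.rows:
            if predicate is not None and not predicate(row):
                continue  # filtered inside the storage layer
            self.rows_returned += 1
            yield {c: row[c] for c in columns} if columns else dict(row)


def sum_by_x(rows: Iterable[Row]) -> Dict[int, int]:
    """Engine-side GROUP BY x with SUM(y)."""
    totals: Dict[int, int] = {}
    for row in rows:
        totals[row["x"]] = totals.get(row["x"], 0) + row["y"]
    return totals


data = [
    {"x": 50, "y": 1, "z": 9},
    {"x": 150, "y": 2, "z": 9},
    {"x": 150, "y": 3, "z": 9},
    {"x": 200, "y": 4, "z": 9},
]


def keep(row: Row) -> bool:
    return row["x"] > 100  # hypothetical predicate


# Without pushdown: the engine receives every row and filters afterwards.
plain = Storage(data)
without_pushdown = sum_by_x(r for r in plain.scan() if keep(r))

# With pushdown: the storage applies the filter and returns only x and y.
pushed = Storage(data)
with_pushdown = sum_by_x(pushed.scan(predicate=keep, columns=["x", "y"]))
```

Both plans produce the same totals, but the pushed-down scan hands fewer and narrower rows to the engine, which is the benefit when the storage system sits across a network.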
Current Status of Storages
• Storages:
  • HDFS support
  • Amazon S3 and OpenStack Swift
  • HBase scanner and writer (HFile and Put modes)
  • JDBC-based scanner and writer (in progress)
  • Kafka scanner (patch available)
  • Elasticsearch (patch available)
• Data Formats
  • Text, JSON, RCFile, SequenceFile, Avro, Parquet, and ORC (patch available)
Python UDF
• Python UDF and UDAF are supported in Tajo
  • http://tajo.apache.org/docs/devel/functions/python.html

    @output_type('int4')
    def return_one():
        return 1

    @output_type('text')
    def helloworld():
        return 'Hello, World'

    @output_type('int4')
    def sum_py(a, b):
        return a + b

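A UDF body is plain Python, so it can be exercised locally before it is registered with Tajo. The sketch below does this with a stand-in output_type decorator that only records the declared return type. The stand-in and the percent function are hypothetical, and in a real deployment the decorator is supplied by Tajo as described in the documentation linked above.

```python
def output_type(type_name):
    """Stand-in decorator: records the declared SQL return type on the function.

    Hypothetical replacement for local testing only; it is not Tajo's decorator.
    """
    def decorate(func):
        func.output_type = type_name
        return func
    return decorate


@output_type('float8')
def percent(part, total):
    # Hypothetical UDF: share of `part` in `total`, as a percentage.
    return 100.0 * part / total


# Local check of the function body and its declared type.
result = percent(1, 4)
declared = percent.output_type
```

Keeping UDF logic free of engine-specific calls, as in the slide's examples, is what makes this kind of local check possible.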
Get Involved!
• We are recruiting contributors!
• General
  • http://tajo.apache.org
• Getting Started
  • http://tajo.apache.org/docs/0.10.0/getting_started.html
• Downloads
  • http://tajo.apache.org/downloads.html
• Jira (issue tracker)
  • https://issues.apache.org/jira/browse/TAJO
• Join the mailing lists
  • dev-subscribe@tajo.apache.org
  • issues-subscribe@tajo.apache.org
Q&A