SlideShare una empresa de Scribd logo
1 de 12
Descargar para leer sin conexión
Basic Configurations in
Apache Tajo 0.11
Jihoon Son / Gruter Inc.
Preliminary
● This slides assume that your cluster is installed in
fully distributed mode
○ For cluster setup, please see http://tajo.apache.
org/docs/current/configuration/cluster_setup.
html.
2
Configurations
● Cluster resource
● Concurrent disk access
● Garbage collection
3
Cluster Resource
● How efficiently utilize cluster resources
○ Ideally ...
■ In every machine
● Every core does some computation
● The whole memory is used
● Disk bandwidth is fully utilized
4
Cluster Resource
● Configurations
○ tajo-env.sh
■ TAJO_WORKER_HEAPSIZE
● Heap size allocated for each worker
● Default: 5000 MB (0.11.1 ~)
○ tajo-site.xml
■ tajo.worker.resource.disks
● # of disks per node
■ tajo.task.resource.min.memory-mb
● Minimum amount of allocatable memory per task
● Default: 1000 MB 5
http://www.cs.odu.edu/~cs471w/spring11/lectures/MassStorage_files/image002.jpg
Concurrent Disk Access
● Frequent random access might degrade
performance
6
Concurrent Disk Access
● Configurations
○ tajo-site.xml
■ tajo.worker.resource.disk.parallel-execution.num
● # of tasks assigned per disk
● Default: 2
○ Increase for SSD
■ tajo.worker.tmpdir.locations
● Temporal directories which are used for query
execution
● Recommended to use all available disks
7
Garbage Collection
● GC option in tajo-env.sh
○ -XX:+UseParallelOldGC
■ Parallel old GC
■ Default (0.11.1 ~)
○ -XX:+UseG1GC
■ G1 GC
8
Garbage Collection
● Migrate ParallelOldGC to G1GC when
○ Full GC durations are too long or too frequent
○ The rate of object allocation rate or promotion
varies significantly
○ Undesired long garbage collection or compaction
pauses (longer than 0.5 to 1 second)
● For further information, please see http://www.
oracle.com/technetwork/tutorials/tutorials-
1876574.html.
9
Conclusion
● More configurations are found on http://tajo.
apache.org/docs/current/configuration/tajo-site-
xml.html
● But, Tajo works well with default
configurations!
10
Get Involved!
● We are recruiting contributors!
● General
○ http://tajo.apache.org/
● Getting Started
○ http://tajo.apache.org/docs/current/getting_started.html
● Downloads
○ http://tajo.apache.org/downloads.html
● Issue tracker
○ http://issues.apache.org/jira/browse/TAJO
● Join the mailing list
○ dev-subscribe@tajo.apache.org
○ issues-subscribe@tajo.apache.org
11
Q & A
12

Más contenido relacionado

La actualidad más candente

Availability and scalability in mongo
Availability and scalability in mongoAvailability and scalability in mongo
Availability and scalability in mongo
Md. Khairul Anam
 

La actualidad más candente (20)

BlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year InBlueStore, A New Storage Backend for Ceph, One Year In
BlueStore, A New Storage Backend for Ceph, One Year In
 
Intorduce to Ceph
Intorduce to CephIntorduce to Ceph
Intorduce to Ceph
 
Indexes don't mean slow inserts.
Indexes don't mean slow inserts.Indexes don't mean slow inserts.
Indexes don't mean slow inserts.
 
What is new in MariaDB 10.6?
What is new in MariaDB 10.6?What is new in MariaDB 10.6?
What is new in MariaDB 10.6?
 
An introduction to Big-Data processing applying hadoop
An introduction to Big-Data processing applying hadoopAn introduction to Big-Data processing applying hadoop
An introduction to Big-Data processing applying hadoop
 
Bluestore
BluestoreBluestore
Bluestore
 
Ceph Day KL - Bluestore
Ceph Day KL - Bluestore Ceph Day KL - Bluestore
Ceph Day KL - Bluestore
 
PostgreSQL performance archaeology
PostgreSQL performance archaeologyPostgreSQL performance archaeology
PostgreSQL performance archaeology
 
Availability and scalability in mongo
Availability and scalability in mongoAvailability and scalability in mongo
Availability and scalability in mongo
 
OSBConf 2015 | Scale out backups with bareos and gluster by niels de vos
OSBConf 2015 | Scale out backups with bareos and gluster by niels de vosOSBConf 2015 | Scale out backups with bareos and gluster by niels de vos
OSBConf 2015 | Scale out backups with bareos and gluster by niels de vos
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
Strategies for Backing Up MongoDB
Strategies for Backing Up MongoDBStrategies for Backing Up MongoDB
Strategies for Backing Up MongoDB
 
Gluster as Block Store in Containers
Gluster as Block Store in ContainersGluster as Block Store in Containers
Gluster as Block Store in Containers
 
Setting up mongodb sharded cluster in 30 minutes
Setting up mongodb sharded cluster in 30 minutesSetting up mongodb sharded cluster in 30 minutes
Setting up mongodb sharded cluster in 30 minutes
 
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsJava one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
 
MongoDB Backup & Disaster Recovery
MongoDB Backup & Disaster RecoveryMongoDB Backup & Disaster Recovery
MongoDB Backup & Disaster Recovery
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSH
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networks
 
Vmfs
VmfsVmfs
Vmfs
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
 

Similar a Apache tajo configuration

Алексей Лесовский "Тюнинг Linux для баз данных. "
Алексей Лесовский "Тюнинг Linux для баз данных. "Алексей Лесовский "Тюнинг Linux для баз данных. "
Алексей Лесовский "Тюнинг Linux для баз данных. "
Tanya Denisyuk
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems
Baruch Osoveskiy
 
Zararfa SummerCamp 2012 - Performing fast backups in large scale environments...
Zararfa SummerCamp 2012 - Performing fast backups in large scale environments...Zararfa SummerCamp 2012 - Performing fast backups in large scale environments...
Zararfa SummerCamp 2012 - Performing fast backups in large scale environments...
Zarafa
 

Similar a Apache tajo configuration (20)

Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuning
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
Tuning Linux for Databases.
Tuning Linux for Databases.Tuning Linux for Databases.
Tuning Linux for Databases.
 
Алексей Лесовский "Тюнинг Linux для баз данных. "
Алексей Лесовский "Тюнинг Linux для баз данных. "Алексей Лесовский "Тюнинг Linux для баз данных. "
Алексей Лесовский "Тюнинг Linux для баз данных. "
 
MariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & OptimizationMariaDB Server Performance Tuning & Optimization
MariaDB Server Performance Tuning & Optimization
 
#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems
 
Percona XtraBackup - New Features and Improvements
Percona XtraBackup - New Features and ImprovementsPercona XtraBackup - New Features and Improvements
Percona XtraBackup - New Features and Improvements
 
Optimizing Linux Servers
Optimizing Linux ServersOptimizing Linux Servers
Optimizing Linux Servers
 
Zararfa SummerCamp 2012 - Performing fast backups in large scale environments...
Zararfa SummerCamp 2012 - Performing fast backups in large scale environments...Zararfa SummerCamp 2012 - Performing fast backups in large scale environments...
Zararfa SummerCamp 2012 - Performing fast backups in large scale environments...
 
PuppetConf 2016: An Introduction to Measuring and Tuning PE Performance – Cha...
PuppetConf 2016: An Introduction to Measuring and Tuning PE Performance – Cha...PuppetConf 2016: An Introduction to Measuring and Tuning PE Performance – Cha...
PuppetConf 2016: An Introduction to Measuring and Tuning PE Performance – Cha...
 
Apache Geode Offheap Storage
Apache Geode Offheap StorageApache Geode Offheap Storage
Apache Geode Offheap Storage
 
MongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-SetMongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-Set
 
SiteGround Tech TeamBuilding
SiteGround Tech TeamBuildingSiteGround Tech TeamBuilding
SiteGround Tech TeamBuilding
 
Loadays MySQL
Loadays MySQLLoadays MySQL
Loadays MySQL
 
Backing up thousands of containers
Backing up thousands of containersBacking up thousands of containers
Backing up thousands of containers
 
linux monitoring and performance tunning
linux monitoring and performance tunning linux monitoring and performance tunning
linux monitoring and performance tunning
 
Bringing up Android on your favorite X86 Workstation or VM (AnDevCon Boston, ...
Bringing up Android on your favorite X86 Workstation or VM (AnDevCon Boston, ...Bringing up Android on your favorite X86 Workstation or VM (AnDevCon Boston, ...
Bringing up Android on your favorite X86 Workstation or VM (AnDevCon Boston, ...
 
Bcache and Aerospike
Bcache and AerospikeBcache and Aerospike
Bcache and Aerospike
 
Open Source Data Deduplication
Open Source Data DeduplicationOpen Source Data Deduplication
Open Source Data Deduplication
 

Más de Jihoon Son

Más de Jihoon Son (7)

Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data Warehouse
 
Query optimization in Apache Tajo
Query optimization in Apache TajoQuery optimization in Apache Tajo
Query optimization in Apache Tajo
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
 
Apache Tajo on Swift: Bringing SQL to the OpenStack World
Apache Tajo on Swift: Bringing SQL to the OpenStack WorldApache Tajo on Swift: Bringing SQL to the OpenStack World
Apache Tajo on Swift: Bringing SQL to the OpenStack World
 
Apache Tajo on Swift
Apache Tajo on SwiftApache Tajo on Swift
Apache Tajo on Swift
 
Apache Tajo 프로젝트 소개 및 최신 기술동향
Apache Tajo 프로젝트 소개 및 최신 기술동향Apache Tajo 프로젝트 소개 및 최신 기술동향
Apache Tajo 프로젝트 소개 및 최신 기술동향
 
대학과 오픈소스
대학과 오픈소스대학과 오픈소스
대학과 오픈소스
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Apache tajo configuration

  • 1. Basic Configurations in Apache Tajo 0.11 Jihoon Son / Gruter Inc.
  • 2. Preliminary ● This slides assume that your cluster is installed in fully distributed mode ○ For cluster setup, please see http://tajo.apache. org/docs/current/configuration/cluster_setup. html. 2
  • 3. Configurations ● Cluster resource ● Concurrent disk access ● Garbage collection 3
  • 4. Cluster Resource ● How efficiently utilize cluster resources ○ Ideally ... ■ In every machine ● Every core does some computation ● The whole memory is used ● Disk bandwidth is fully utilized 4
  • 5. Cluster Resource ● Configurations ○ tajo-env.sh ■ TAJO_WORKER_HEAPSIZE ● Heap size allocated for each worker ● Default: 5000 MB (0.11.1 ~) ○ tajo-site.xml ■ tajo.worker.resource.disks ● # of disks per node ■ tajo.task.resource.min.memory-mb ● Minimum amount of allocatable memory per task ● Default: 1000 MB 5
  • 7. Concurrent Disk Access ● Configurations ○ tajo-site.xml ■ tajo.worker.resource.disk.parallel-execution.num ● # of tasks assigned per disk ● Default: 2 ○ Increase for SSD ■ tajo.worker.tmpdir.locations ● Temporal directories which are used for query execution ● Recommended to use all available disks 7
  • 8. Garbage Collection ● GC option in tajo-env.sh ○ -XX:+UseParallelOldGC ■ Parallel old GC ■ Default (0.11.1 ~) ○ -XX:+UseG1GC ■ G1 GC 8
  • 9. Garbage Collection ● Migrate ParallelOldGC to G1GC when ○ Full GC durations are too long or too frequent ○ The rate of object allocation rate or promotion varies significantly ○ Undesired long garbage collection or compaction pauses (longer than 0.5 to 1 second) ● For further information, please see http://www. oracle.com/technetwork/tutorials/tutorials- 1876574.html. 9
  • 10. Conclusion ● More configurations are found on http://tajo. apache.org/docs/current/configuration/tajo-site- xml.html ● But, Tajo works well with default configurations! 10
  • 11. Get Involved! ● We are recruiting contributors! ● General ○ http://tajo.apache.org/ ● Getting Started ○ http://tajo.apache.org/docs/current/getting_started.html ● Downloads ○ http://tajo.apache.org/downloads.html ● Issue tracker ○ http://issues.apache.org/jira/browse/TAJO ● Join the mailing list ○ dev-subscribe@tajo.apache.org ○ issues-subscribe@tajo.apache.org 11