2. Preliminary
● These slides assume that your cluster is installed in
fully distributed mode
○ For cluster setup, please see http://tajo.apache.org/docs/current/configuration/cluster_setup.html.
4. Cluster Resource
● How to utilize cluster resources efficiently
○ Ideally ...
■ In every machine
● Every core does some computation
● The whole memory is used
● Disk bandwidth is fully utilized
5. Cluster Resource
● Configurations
○ tajo-env.sh
■ TAJO_WORKER_HEAPSIZE
● Heap size allocated for each worker
● Default: 5000 MB (since 0.11.1)
○ tajo-site.xml
■ tajo.worker.resource.disks
● # of disks per node
■ tajo.task.resource.min.memory-mb
● Minimum amount of allocatable memory per task
● Default: 1000 MB
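As a rough back-of-the-envelope check of how the two memory settings above interact, the sketch below divides the worker heap by the minimum per-task memory. The function name `memory_bound_tasks` is hypothetical, and treating memory alone as the cap on concurrency is a simplifying assumption; Tajo's scheduler may also account for disks and other resources.

```python
def memory_bound_tasks(worker_heap_mb: int, min_task_memory_mb: int) -> int:
    """Estimate how many tasks fit in a worker if each task
    receives at least the minimum allocatable memory.

    This is an illustrative estimate, not Tajo's actual scheduling logic.
    """
    if worker_heap_mb <= 0 or min_task_memory_mb <= 0:
        raise ValueError("sizes must be positive")
    # Integer division: a partial slot cannot host a task.
    return worker_heap_mb // min_task_memory_mb


# Defaults from the slide: TAJO_WORKER_HEAPSIZE = 5000 MB,
# tajo.task.resource.min.memory-mb = 1000 MB.
print(memory_bound_tasks(5000, 1000))
```

Under this assumption, raising the heap size or lowering the per-task minimum both increase the number of tasks a worker can hold in memory at once.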
7. Concurrent Disk Access
● Configurations
○ tajo-site.xml
■ tajo.worker.resource.disk.parallel-execution.num
● # of tasks assigned per disk
● Default: 2
○ Increase for SSD
■ tajo.worker.tmpdir.locations
● Temporary directories used for query
execution
● Recommended to use all available disks
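To make the two disk settings above concrete, the sketch below renders them as a `tajo-site.xml` fragment in the Hadoop-style `property`/`name`/`value` layout. The helper `render_site_properties`, the value `4`, the directory paths, and the comma-separated path format are all assumptions for illustration; check the Tajo configuration documentation before using them.

```python
import xml.etree.ElementTree as ET


def render_site_properties(props: dict) -> str:
    """Render name/value pairs as a <configuration> XML document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Illustrative values only: a higher per-disk task count (e.g. for SSDs)
# and one hypothetical temporary directory per physical disk.
print(render_site_properties({
    "tajo.worker.resource.disk.parallel-execution.num": 4,
    "tajo.worker.tmpdir.locations": "/disk1/tajo/tmp,/disk2/tajo/tmp",
}))
```

Listing one temporary directory per disk follows the recommendation above to spread intermediate data across all available disks.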
9. Garbage Collection
● Migrate from ParallelOldGC to G1GC when
○ Full GC durations are too long or too frequent
○ The object allocation rate or promotion rate
varies significantly
○ Garbage collection or compaction pauses are
undesirably long (longer than 0.5 to 1 second)
● For further information, please see http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html.
10. Conclusion
● More configuration options are listed at http://tajo.apache.org/docs/current/configuration/tajo-site-xml.html
● But Tajo works well with the default
configuration!
11. Get Involved!
● We are recruiting contributors!
● General
○ http://tajo.apache.org/
● Getting Started
○ http://tajo.apache.org/docs/current/getting_started.html
● Downloads
○ http://tajo.apache.org/downloads.html
● Issue tracker
○ http://issues.apache.org/jira/browse/TAJO
● Join the mailing list
○ dev-subscribe@tajo.apache.org
○ issues-subscribe@tajo.apache.org