SlideShare una empresa de Scribd logo
1 de 25
Descargar para leer sin conexión
NoSQL Now!
Aug 21, 2013
Ben Wen, Joyent
Renat Khasanshyn, Altoros
About Joyent
 The high-performance public cloud
infrastructure provider
 Cloud IaaS Virtual Machines:
 Linux, Windows, BSD, SmartOS
(fka Solaris) with Zones
 Core founding sponsors of Node.js
 Four global datacenters
 Key markets:
 Big data, mobile, e-commerce,
finsvc, SaaS
 Open Source contributions:
 Node.js, KVM, DTrace, ZFS,
SmartOS
4
 Running bare-metal only practical for some organizations
 Performance varies significantly across various job types
 In fact, for many jobs, less = more
 Utilization of most clusters in production is low
 Optimizing Hadoop/MapReduce performance is hard
5
 Get upset when truth comes out!
 Biased (to the shiny side of the coin)
 Often add controversy and confusion
6
- For Hadoop, what is the impact of Container-based virtualization vs Hardware
emulation (KVM)*
- What are the Hadoop optimization strategies? Is there a “rule of thumb” when it
comes to determining the optimization approach?
- What are the optimal Hadoop cluster settings for 1TB TeraSort benchmark on
100 and 400 node clusters running Linux and SmartOS on the Joyent Public
Cloud?
7
Physical (disks, cpu, network)
OS/Hypervisor (especially for virtualized environments)
Hadoop/MapReduce (tons of settings)
Algorithmic (data structures, join strategies, big-O…)
Implementation (code efficiency, architecture decisions that fit all other factors)
8
Open source Unix operating system based on the active
fork of Open Solaris technology (illumos) for the cloud.
Uses containerized OS virtualization, called Zones (think a
mature LXC with secure RBAC and auditing)
operating system based on the Debian
Linux distribution and distributed as free
and open source software.
Apache Hadoop is an open-source software framework that
supports data-intensive distributed applications, licensed
under the Apache v2 license. Derived from Google's
MapReduce and Google File System (GFS) papers, Hadoop
enables applications to work with thousands of computation-
independent computers and petabytes of data.
9
Written by Opscode and released as open source under the
Apache License 2.0., Chef is a DevOps tool used for configuring
cloud services or to streamline the task of configuring a
company's internal servers. Chef automatically sets up and
tweaks the operating systems and programs that run in massive
data centers.
Developed by creators of the Starfish project from Duke
University, Unravel brings run-time profiling of Hadoop jobs
followed by a cost-based database query optimization. Unravel
connects to streams of Hadoop and system instrumentation
data, and applies statistical machine learning to optimize cost of
Hadoop jobs and increase cluster utilization.
1
0
Comparing I/O Path on
Bare Metal Unix Vs Zones Vs KVM
• Code path is essentially
the same as bare metal
• Zones partition at the OS
level
• Performance is higher
• KVM is encapsulated
by hypervisor
• Code path is much
more circuitous in a
KVM process.
• Performance is
impacted
Bare-metal OS Virtualization Kernel Virtualization
1
1
No over
head for
Zones:
Stack traces
show how a
network
packet is
transmitted
from:
Bare Metal
vs
Joyent Zone
vs
Fedora VM
on KVM
Bare Metal Joyent Zone (aka SmartMachine) Fedora VM on KVM VM
Start Start Start
1 kernel`start_xmit
2 kernel`dtrace_int3_handler+0xd2
3 kernel`kmem_cache_free+0x2f
4 kernel`dtrace_int3+0x3a
5 kernel`eth_header
6 kernel`__kfree_skb+0x47
7 kernel`start_xmit+0x1
8 kernel`dev_hard_start_xmit+0x322
9 kernel`sch_direct_xmit+0xef
10 kernel`dev_queue_xmit+0x184
11 kernel`eth_header+0x3a
12 kernel`neigh_resolve_output+0x11e
13 kernel`nf_hook_slow+0x75
14 kernel`ip_finish_output
15 kernel`ip_finish_output+0x17e
16 kernel`ip_output+0x98
17 kernel`__ip_local_out+0xa4
18 kernel`ip_local_out+0x29
19 kernel`ip_queue_xmit+0x14f
20 kernel`tcp_transmit_skb+0x3e4
21 kernel`__kmalloc_node_track_caller+0x185
22 kernel`sk_stream_alloc_skb+0x41
23 kernel`tcp_write_xmit+0xf7
24 kernel`__alloc_skb+0x8c
25 kernel`__tcp_push_pending_frames+0x26
26 kernel`tcp_sendmsg+0x895
27 kernel`inet_sendmsg+0x64
28 kernel`sock_aio_write+0x13a
29 kernel`do_sync_write+0xd2
30 kernel`security_file_permission+0x2c
31 kernel`rw_verify_area+0x61
32 kernel`vfs_write+0x16d
33 kernel`sys_write+0x4a
34 kernel`sys_rt_sigprocmask+0x84
35 kernel`system_call_fastpath+0x16
36 igb`igb_tx_ring_send+0x33
37 mac`mac_hwring_tx+0x1d
38 mac`mac_tx_send+0x5dc
39 mac`mac_tx_single_ring_mode+0x6e
mac`mac_tx+0xda mac`mac_tx+0xda mac`mac_tx+0xda
dld`str_mdata_fastpath_put+0x53 dld`str_mdata_fastpath_put+0x53 dld`str_mdata_fastpath_put+0x53
ip`ip_xmit+0x82d ip`ip_xmit+0x82d ip`ip_xmit+0x82d
ip`ire_send_wire_v4+0x3e9 ip`ire_send_wire_v4+0x3e9 ip`ire_send_wire_v4+0x3e9
ip`conn_ip_output+0x190 ip`conn_ip_output+0x190 ip`conn_ip_output+0x190
ip`tcp_send_data+0x59 ip`tcp_send_data+0x59 ip`tcp_send_data+0x59
ip`tcp_output+0x58c ip`tcp_output+0x58c ip`tcp_output+0x58c
ip`squeue_enter+0x426 ip`squeue_enter+0x426 ip`squeue_enter+0x426
ip`tcp_sendmsg+0x14f ip`tcp_sendmsg+0x14f ip`tcp_sendmsg+0x14f
sockfs`so_sendmsg+0x26b sockfs`so_sendmsg+0x26b sockfs`so_sendmsg+0x26b
sockfs`socket_sendmsg+0x48 sockfs`socket_sendmsg+0x48 sockfs`socket_sendmsg+0x48
sockfs`socket_vop_write+0x6c sockfs`socket_vop_write+0x6c sockfs`socket_vop_write+0x6c
genunix`fop_write+0x8b genunix`fop_write+0x8b genunix`fop_write+0x8b
genunix`write+0x250 genunix`write+0x250 genunix`write+0x250
genunix`write32+0x1e genunix`write32+0x1e genunix`write32+0x1e
unix`_sys_sysenter_post_swapgs+0x14 unix`_sys_sysenter_post_swapgs+0x14 unix`_sys_sysenter_post_swapgs+0x149
Skips stepping
through
39 functions
required
when Fedora
is running on
KVM/qemu
Note that
a Joyent Zone
is exactly the
same as “Bare
Metal”
Three identical Apache Hadoop 1.0.4 clusters were provisioned on Joyent
infrastructure using Joyent REST API and Opscode Chef
Each cluster was tweaked for optimal performance following best practices for
TeraSort benchmark.
13
A custom script launches virtual machines using Joyent API and stores information
about them in a json file.
14
Each machine in cluster is being configured according to its role in cluster using
Chef cookbooks.
15
As part of TeraSort benchmark a dataset is generated using TeraGen utility
included in Apache Hadoop.
16
On one of the nodes a Hadoop TeraSort job using previously generated dataset is
submitted.
17
See: Hadoop job_201210261134_0010 on hadoop-smartos-r-1.html
The key difference between the two clusters was unveiled when monitoring I/O and
CPU utilization. Ubuntu cluster was spending too much time in OS kernel while
performing I/O operations as demonstrated on Figure 1.
SmartOS cluster was using CPU much more efficiently and was able to utilize larger
number of Hadoop mappers and reducers, key configuration parameters for Hadoop:
20
21
22
The key difference
between the clusters was
unveiled when monitoring
I/O and CPU utilization.
Ubuntu cluster was
spending too much time in
OS kernel while performing
I/O (for copies of config
files and job reports –
email
renat.k@altoros.com)
24
1) Basic cluster configuration is key (one time effort for typical workloads)
DATA DISK SCALING
COMPRESSION
JVM REUSE POLICY
HDFS BLOCK SIZE
MAP-SIDE SPILLS
COPY/SHUFFLE PHASE TUNING
REDUCE-SIDE SPILLS
2) Tune the number of map and reduce tasks appropriately
3) Consider GPU for some workloads
25
• Forthcoming in October
• Includes cloud performance
• Co-author DTrace book
• More here on his techniques:
• http://dtrace.org/blogs/brendan/
26
Thank you!
Ben Wen: benwen@joyent.com
Renat Khasanshyn: renat.k@altoros.com
@renatco (650) 395-7002

Más contenido relacionado

La actualidad más candente

Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
Optimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopOptimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopMike Pittaro
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceCloudera, Inc.
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best PracticesCloudera, Inc.
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)mundlapudi
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performanceDataWorks Summit
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryCloudera, Inc.
 
White paper hadoop performancetuning
White paper hadoop performancetuningWhite paper hadoop performancetuning
White paper hadoop performancetuningAnil Reddy
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jkEdureka!
 
Hadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityHadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityEdureka!
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Ryu Kobayashi
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedInAllen Wittenauer
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersAmal G Jose
 
Hadoop Operations at LinkedIn
Hadoop Operations at LinkedInHadoop Operations at LinkedIn
Hadoop Operations at LinkedInDataWorks Summit
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.finalHortonworks
 

La actualidad más candente (20)

Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
Optimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopOptimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for Hadoop
 
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job PerformanceHadoop Summit 2012 | Optimizing MapReduce Job Performance
Hadoop Summit 2012 | Optimizing MapReduce Job Performance
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Optimizing MapReduce Job performance
Optimizing MapReduce Job performanceOptimizing MapReduce Job performance
Optimizing MapReduce Job performance
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
ha_module5
ha_module5ha_module5
ha_module5
 
White paper hadoop performancetuning
White paper hadoop performancetuningWhite paper hadoop performancetuning
White paper hadoop performancetuning
 
Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
 
Hadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityHadoop Cluster With High Availability
Hadoop Cluster With High Availability
 
Hadoop Internals
Hadoop InternalsHadoop Internals
Hadoop Internals
 
Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014Treasure Data on The YARN - Hadoop Conference Japan 2014
Treasure Data on The YARN - Hadoop Conference Japan 2014
 
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedInHadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop Clusters
 
10c introduction
10c introduction10c introduction
10c introduction
 
Hadoop Operations at LinkedIn
Hadoop Operations at LinkedInHadoop Operations at LinkedIn
Hadoop Operations at LinkedIn
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.final
 

Destacado

Cloud Computing Assignment 3
Cloud Computing Assignment 3Cloud Computing Assignment 3
Cloud Computing Assignment 3Gurpreet singh
 
Ceph Day SF 2015 - SysAdmin's Toolbox: Tools for Running Ceph in Production
Ceph Day SF 2015 - SysAdmin's Toolbox: Tools for Running Ceph in Production Ceph Day SF 2015 - SysAdmin's Toolbox: Tools for Running Ceph in Production
Ceph Day SF 2015 - SysAdmin's Toolbox: Tools for Running Ceph in Production Ceph Community
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Assignment on Cloud Computing
Assignment on Cloud ComputingAssignment on Cloud Computing
Assignment on Cloud ComputingAl Shahriar
 
Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016Alluxio, Inc.
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 

Destacado (7)

Cloud Computing Assignment 3
Cloud Computing Assignment 3Cloud Computing Assignment 3
Cloud Computing Assignment 3
 
Ceph Day SF 2015 - SysAdmin's Toolbox: Tools for Running Ceph in Production
Ceph Day SF 2015 - SysAdmin's Toolbox: Tools for Running Ceph in Production Ceph Day SF 2015 - SysAdmin's Toolbox: Tools for Running Ceph in Production
Ceph Day SF 2015 - SysAdmin's Toolbox: Tools for Running Ceph in Production
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Assignment on Cloud Computing
Assignment on Cloud ComputingAssignment on Cloud Computing
Assignment on Cloud Computing
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 

Similar a NoSQL Now! Virtualization Impact on Hadoop Performance

The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...OpenStack
 
Gluecon Preso: Hybrid Container Infrastructure
Gluecon Preso: Hybrid Container InfrastructureGluecon Preso: Hybrid Container Infrastructure
Gluecon Preso: Hybrid Container Infrastructurerhirschfeld
 
2016 - Open Mic - IGNITE - Open Infrastructure = ANY Infrastructure
2016 - Open Mic - IGNITE - Open Infrastructure = ANY Infrastructure2016 - Open Mic - IGNITE - Open Infrastructure = ANY Infrastructure
2016 - Open Mic - IGNITE - Open Infrastructure = ANY Infrastructuredevopsdaysaustin
 
OpenStack Preso: DevOps on Hybrid Infrastructure
OpenStack Preso: DevOps on Hybrid InfrastructureOpenStack Preso: DevOps on Hybrid Infrastructure
OpenStack Preso: DevOps on Hybrid Infrastructurerhirschfeld
 
Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)Krishna-Kumar
 
Kubernetes for Serverless - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless  - Serverless Summit 2017 - Krishna KumarKubernetes for Serverless  - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless - Serverless Summit 2017 - Krishna KumarCodeOps Technologies LLP
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSSteve Wong
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Jim Dowling
 
My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)Robert Grossman
 
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...OpenShift Origin
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
Linux one vs x86 18 july
Linux one vs x86 18 julyLinux one vs x86 18 july
Linux one vs x86 18 julyDiego Rodriguez
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarArun Kumar
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarArun Kumar
 
Making clouds: turning opennebula into a product
Making clouds: turning opennebula into a productMaking clouds: turning opennebula into a product
Making clouds: turning opennebula into a productCarlo Daffara
 
Making Clouds: Turning OpenNebula into a Product
Making Clouds: Turning OpenNebula into a ProductMaking Clouds: Turning OpenNebula into a Product
Making Clouds: Turning OpenNebula into a ProductNETWAYS
 

Similar a NoSQL Now! Virtualization Impact on Hadoop Performance (20)

The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 
Gluecon Preso: Hybrid Container Infrastructure
Gluecon Preso: Hybrid Container InfrastructureGluecon Preso: Hybrid Container Infrastructure
Gluecon Preso: Hybrid Container Infrastructure
 
2016 - Open Mic - IGNITE - Open Infrastructure = ANY Infrastructure
2016 - Open Mic - IGNITE - Open Infrastructure = ANY Infrastructure2016 - Open Mic - IGNITE - Open Infrastructure = ANY Infrastructure
2016 - Open Mic - IGNITE - Open Infrastructure = ANY Infrastructure
 
OpenStack Preso: DevOps on Hybrid Infrastructure
OpenStack Preso: DevOps on Hybrid InfrastructureOpenStack Preso: DevOps on Hybrid Infrastructure
OpenStack Preso: DevOps on Hybrid Infrastructure
 
Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)Why kubernetes for Serverless (FaaS)
Why kubernetes for Serverless (FaaS)
 
Kubernetes for Serverless - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless  - Serverless Summit 2017 - Krishna KumarKubernetes for Serverless  - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless - Serverless Summit 2017 - Krishna Kumar
 
getFamiliarWithHadoop
getFamiliarWithHadoopgetFamiliarWithHadoop
getFamiliarWithHadoop
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning
 
My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)
 
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Linux one vs x86
Linux one vs x86 Linux one vs x86
Linux one vs x86
 
Linux one vs x86 18 july
Linux one vs x86 18 julyLinux one vs x86 18 july
Linux one vs x86 18 july
 
Rhel7 vs rhel6
Rhel7 vs rhel6Rhel7 vs rhel6
Rhel7 vs rhel6
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumar
 
Cloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumarCloudexpowest opensourcecloudcomputing-1by arun kumar
Cloudexpowest opensourcecloudcomputing-1by arun kumar
 
Making clouds: turning opennebula into a product
Making clouds: turning opennebula into a productMaking clouds: turning opennebula into a product
Making clouds: turning opennebula into a product
 
Making Clouds: Turning OpenNebula into a Product
Making Clouds: Turning OpenNebula into a ProductMaking Clouds: Turning OpenNebula into a Product
Making Clouds: Turning OpenNebula into a Product
 

Más de Altoros

Maturing with Kubernetes
Maturing with KubernetesMaturing with Kubernetes
Maturing with KubernetesAltoros
 
Kubernetes Platform Readiness and Maturity Assessment
Kubernetes Platform Readiness and Maturity AssessmentKubernetes Platform Readiness and Maturity Assessment
Kubernetes Platform Readiness and Maturity AssessmentAltoros
 
Journey Through Four Stages of Kubernetes Deployment Maturity
Journey Through Four Stages of Kubernetes Deployment MaturityJourney Through Four Stages of Kubernetes Deployment Maturity
Journey Through Four Stages of Kubernetes Deployment MaturityAltoros
 
SGX: Improving Privacy, Security, and Trust Across Blockchain Networks
SGX: Improving Privacy, Security, and Trust Across Blockchain NetworksSGX: Improving Privacy, Security, and Trust Across Blockchain Networks
SGX: Improving Privacy, Security, and Trust Across Blockchain NetworksAltoros
 
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...Altoros
 
A Zero-Knowledge Proof: Improving Privacy on a Blockchain
A Zero-Knowledge Proof:  Improving Privacy on a BlockchainA Zero-Knowledge Proof:  Improving Privacy on a Blockchain
A Zero-Knowledge Proof: Improving Privacy on a BlockchainAltoros
 
Crap. Your Big Data Kitchen Is Broken.
Crap. Your Big Data Kitchen Is Broken.Crap. Your Big Data Kitchen Is Broken.
Crap. Your Big Data Kitchen Is Broken.Altoros
 
Containers and Kubernetes
Containers and KubernetesContainers and Kubernetes
Containers and KubernetesAltoros
 
Distributed Ledger Technology for Over-the-Counter Trading
Distributed Ledger Technology for Over-the-Counter TradingDistributed Ledger Technology for Over-the-Counter Trading
Distributed Ledger Technology for Over-the-Counter TradingAltoros
 
5-Step Deployment of Hyperledger Fabric on Multiple Nodes
5-Step Deployment of Hyperledger Fabric on Multiple Nodes5-Step Deployment of Hyperledger Fabric on Multiple Nodes
5-Step Deployment of Hyperledger Fabric on Multiple NodesAltoros
 
Deploying Kubernetes on GCP with Kubespray
Deploying Kubernetes on GCP with KubesprayDeploying Kubernetes on GCP with Kubespray
Deploying Kubernetes on GCP with KubesprayAltoros
 
UAA for Kubernetes
UAA for KubernetesUAA for Kubernetes
UAA for KubernetesAltoros
 
Troubleshooting .NET Applications on Cloud Foundry
Troubleshooting .NET Applications on Cloud FoundryTroubleshooting .NET Applications on Cloud Foundry
Troubleshooting .NET Applications on Cloud FoundryAltoros
 
Continuous Integration and Deployment with Jenkins for PCF
Continuous Integration and Deployment with Jenkins for PCFContinuous Integration and Deployment with Jenkins for PCF
Continuous Integration and Deployment with Jenkins for PCFAltoros
 
How to Never Leave Your Deployment Unattended
How to Never Leave Your Deployment UnattendedHow to Never Leave Your Deployment Unattended
How to Never Leave Your Deployment UnattendedAltoros
 
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and LogsCloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and LogsAltoros
 
Smart Baggage Tracking: End-to-End Sensor-Based Solution
Smart Baggage Tracking: End-to-End Sensor-Based SolutionSmart Baggage Tracking: End-to-End Sensor-Based Solution
Smart Baggage Tracking: End-to-End Sensor-Based SolutionAltoros
 
Navigating the Ecosystem of Pivotal Cloud Foundry Tiles
Navigating the Ecosystem of Pivotal Cloud Foundry TilesNavigating the Ecosystem of Pivotal Cloud Foundry Tiles
Navigating the Ecosystem of Pivotal Cloud Foundry TilesAltoros
 
AI as a Catalyst for IoT
AI as a Catalyst for IoTAI as a Catalyst for IoT
AI as a Catalyst for IoTAltoros
 
Over-Engineering: Causes, Symptoms, and Treatment
Over-Engineering: Causes, Symptoms, and TreatmentOver-Engineering: Causes, Symptoms, and Treatment
Over-Engineering: Causes, Symptoms, and TreatmentAltoros
 

Más de Altoros (20)

Maturing with Kubernetes
Maturing with KubernetesMaturing with Kubernetes
Maturing with Kubernetes
 
Kubernetes Platform Readiness and Maturity Assessment
Kubernetes Platform Readiness and Maturity AssessmentKubernetes Platform Readiness and Maturity Assessment
Kubernetes Platform Readiness and Maturity Assessment
 
Journey Through Four Stages of Kubernetes Deployment Maturity
Journey Through Four Stages of Kubernetes Deployment MaturityJourney Through Four Stages of Kubernetes Deployment Maturity
Journey Through Four Stages of Kubernetes Deployment Maturity
 
SGX: Improving Privacy, Security, and Trust Across Blockchain Networks
SGX: Improving Privacy, Security, and Trust Across Blockchain NetworksSGX: Improving Privacy, Security, and Trust Across Blockchain Networks
SGX: Improving Privacy, Security, and Trust Across Blockchain Networks
 
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
Using the Cloud Foundry and Kubernetes Stack as a Part of a Blockchain CI/CD ...
 
A Zero-Knowledge Proof: Improving Privacy on a Blockchain
A Zero-Knowledge Proof:  Improving Privacy on a BlockchainA Zero-Knowledge Proof:  Improving Privacy on a Blockchain
A Zero-Knowledge Proof: Improving Privacy on a Blockchain
 
Crap. Your Big Data Kitchen Is Broken.
Crap. Your Big Data Kitchen Is Broken.Crap. Your Big Data Kitchen Is Broken.
Crap. Your Big Data Kitchen Is Broken.
 
Containers and Kubernetes
Containers and KubernetesContainers and Kubernetes
Containers and Kubernetes
 
Distributed Ledger Technology for Over-the-Counter Trading
Distributed Ledger Technology for Over-the-Counter TradingDistributed Ledger Technology for Over-the-Counter Trading
Distributed Ledger Technology for Over-the-Counter Trading
 
5-Step Deployment of Hyperledger Fabric on Multiple Nodes
5-Step Deployment of Hyperledger Fabric on Multiple Nodes5-Step Deployment of Hyperledger Fabric on Multiple Nodes
5-Step Deployment of Hyperledger Fabric on Multiple Nodes
 
Deploying Kubernetes on GCP with Kubespray
Deploying Kubernetes on GCP with KubesprayDeploying Kubernetes on GCP with Kubespray
Deploying Kubernetes on GCP with Kubespray
 
UAA for Kubernetes
UAA for KubernetesUAA for Kubernetes
UAA for Kubernetes
 
Troubleshooting .NET Applications on Cloud Foundry
Troubleshooting .NET Applications on Cloud FoundryTroubleshooting .NET Applications on Cloud Foundry
Troubleshooting .NET Applications on Cloud Foundry
 
Continuous Integration and Deployment with Jenkins for PCF
Continuous Integration and Deployment with Jenkins for PCFContinuous Integration and Deployment with Jenkins for PCF
Continuous Integration and Deployment with Jenkins for PCF
 
How to Never Leave Your Deployment Unattended
How to Never Leave Your Deployment UnattendedHow to Never Leave Your Deployment Unattended
How to Never Leave Your Deployment Unattended
 
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and LogsCloud Foundry Monitoring How-To: Collecting Metrics and Logs
Cloud Foundry Monitoring How-To: Collecting Metrics and Logs
 
Smart Baggage Tracking: End-to-End Sensor-Based Solution
Smart Baggage Tracking: End-to-End Sensor-Based SolutionSmart Baggage Tracking: End-to-End Sensor-Based Solution
Smart Baggage Tracking: End-to-End Sensor-Based Solution
 
Navigating the Ecosystem of Pivotal Cloud Foundry Tiles
Navigating the Ecosystem of Pivotal Cloud Foundry TilesNavigating the Ecosystem of Pivotal Cloud Foundry Tiles
Navigating the Ecosystem of Pivotal Cloud Foundry Tiles
 
AI as a Catalyst for IoT
AI as a Catalyst for IoTAI as a Catalyst for IoT
AI as a Catalyst for IoT
 
Over-Engineering: Causes, Symptoms, and Treatment
Over-Engineering: Causes, Symptoms, and TreatmentOver-Engineering: Causes, Symptoms, and Treatment
Over-Engineering: Causes, Symptoms, and Treatment
 

Último

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Último (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

NoSQL Now! Virtualization Impact on Hadoop Performance

  • 1. NoSQL Now! Aug 21, 2013 Ben Wen, Joyent Renat Khasanshyn, Altoros
  • 2. About Joyent  The high-performance public cloud infrastructure provider  Cloud IaaS Virtual Machines:  Linux, Windows, BSD, SmartOS (fka Solaris) with Zones  Core founding sponsors of Node.js  Four global datacenters  Key markets:  Big data, mobile, e-commerce, finsvc, SaaS  Open Source contributions:  Node.js, KVM, DTrace, ZFS, SmartOS
  • 3. 4  Running bare-metal only practical for some organizations  Performance varies significantly across various job types  In fact, for many jobs, less = more  Utilization of most clusters in production is low  Optimizing Hadoop/MapReduce performance is hard
  • 4. 5  Get upset when truth comes out!  Biased (to the shiny side of the coin)  Often add controversy and confusion
  • 5. 6 - For Hadoop, what is the impact of Container-based virtualization vs Hardware emulation (KVM)* - What are the Hadoop optimization strategies? Is there a “rule of thumb” when it comes to determining the optimization approach? - What are the optimal Hadoop cluster settings for 1TB TeraSort benchmark on 100 and 400 node clusters running Linux and SmartOS on the Joyent Public Cloud?
  • 6. 7 Physical (disks, cpu, network) OS/Hypervisor (especially for virtualized environments) Hadoop/MapReduce (tons of settings) Algorithmic (data structures, join strategies, big-O…) Implementation (code efficiency, architecture decisions that fit all other factors)
  • 7. 8 Open source Unix operating system based on the active fork of Open Solaris technology (illumos) for the cloud. Uses containerized OS virtualization, called Zones (think a mature LXC with secure RBAC and auditing) operating system based on the Debian Linux distribution and distributed as free and open source software. Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. Derived from Google's MapReduce and Google File System (GFS) papers, Hadoop enables applications to work with thousands of computation- independent computers and petabytes of data.
  • 8. 9 Written by Opscode and released as open source under the Apache License 2.0., Chef is a DevOps tool used for configuring cloud services or to streamline the task of configuring a company's internal servers. Chef automatically sets up and tweaks the operating systems and programs that run in massive data centers. Developed by creators of the Starfish project from Duke University, Unravel brings run-time profiling of Hadoop jobs followed by a cost-based database query optimization. Unravel connects to streams of Hadoop and system instrumentation data, and applies statistical machine learning to optimize cost of Hadoop jobs and increase cluster utilization.
  • 9. 1 0 Comparing I/O Path on Bare Metal Unix Vs Zones Vs KVM • Code path is essentially the same as bare metal • Zones partition at the OS level • Performance is higher • KVM is encapsulated by hypervisor • Code path is much more circuitous in a KVM process. • Performance is impacted Bare-metal OS Virtualization Kernel Virtualization
  • 10. 1 1 No over head for Zones: Stack traces show how a network packet is transmitted from: Bare Metal vs Joyent Zone vs Fedora VM on KVM Bare Metal Joyent Zone (aka SmartMachine) Fedora VM on KVM VM Start Start Start 1 kernel`start_xmit 2 kernel`dtrace_int3_handler+0xd2 3 kernel`kmem_cache_free+0x2f 4 kernel`dtrace_int3+0x3a 5 kernel`eth_header 6 kernel`__kfree_skb+0x47 7 kernel`start_xmit+0x1 8 kernel`dev_hard_start_xmit+0x322 9 kernel`sch_direct_xmit+0xef 10 kernel`dev_queue_xmit+0x184 11 kernel`eth_header+0x3a 12 kernel`neigh_resolve_output+0x11e 13 kernel`nf_hook_slow+0x75 14 kernel`ip_finish_output 15 kernel`ip_finish_output+0x17e 16 kernel`ip_output+0x98 17 kernel`__ip_local_out+0xa4 18 kernel`ip_local_out+0x29 19 kernel`ip_queue_xmit+0x14f 20 kernel`tcp_transmit_skb+0x3e4 21 kernel`__kmalloc_node_track_caller+0x185 22 kernel`sk_stream_alloc_skb+0x41 23 kernel`tcp_write_xmit+0xf7 24 kernel`__alloc_skb+0x8c 25 kernel`__tcp_push_pending_frames+0x26 26 kernel`tcp_sendmsg+0x895 27 kernel`inet_sendmsg+0x64 28 kernel`sock_aio_write+0x13a 29 kernel`do_sync_write+0xd2 30 kernel`security_file_permission+0x2c 31 kernel`rw_verify_area+0x61 32 kernel`vfs_write+0x16d 33 kernel`sys_write+0x4a 34 kernel`sys_rt_sigprocmask+0x84 35 kernel`system_call_fastpath+0x16 36 igb`igb_tx_ring_send+0x33 37 mac`mac_hwring_tx+0x1d 38 mac`mac_tx_send+0x5dc 39 mac`mac_tx_single_ring_mode+0x6e mac`mac_tx+0xda mac`mac_tx+0xda mac`mac_tx+0xda dld`str_mdata_fastpath_put+0x53 dld`str_mdata_fastpath_put+0x53 dld`str_mdata_fastpath_put+0x53 ip`ip_xmit+0x82d ip`ip_xmit+0x82d ip`ip_xmit+0x82d ip`ire_send_wire_v4+0x3e9 ip`ire_send_wire_v4+0x3e9 ip`ire_send_wire_v4+0x3e9 ip`conn_ip_output+0x190 ip`conn_ip_output+0x190 ip`conn_ip_output+0x190 ip`tcp_send_data+0x59 ip`tcp_send_data+0x59 ip`tcp_send_data+0x59 ip`tcp_output+0x58c ip`tcp_output+0x58c ip`tcp_output+0x58c ip`squeue_enter+0x426 ip`squeue_enter+0x426 ip`squeue_enter+0x426 ip`tcp_sendmsg+0x14f ip`tcp_sendmsg+0x14f ip`tcp_sendmsg+0x14f sockfs`so_sendmsg+0x26b sockfs`so_sendmsg+0x26b sockfs`so_sendmsg+0x26b sockfs`socket_sendmsg+0x48 sockfs`socket_sendmsg+0x48 sockfs`socket_sendmsg+0x48 sockfs`socket_vop_write+0x6c sockfs`socket_vop_write+0x6c sockfs`socket_vop_write+0x6c genunix`fop_write+0x8b genunix`fop_write+0x8b genunix`fop_write+0x8b genunix`write+0x250 genunix`write+0x250 genunix`write+0x250 genunix`write32+0x1e genunix`write32+0x1e genunix`write32+0x1e unix`_sys_sysenter_post_swapgs+0x14 unix`_sys_sysenter_post_swapgs+0x14 unix`_sys_sysenter_post_swapgs+0x149 Skips stepping through 39 functions required when Fedora is running on KVM/qemu Note that a Joyent Zone is exactly the same as “Bare Metal”
  • 11. Three identical Apache Hadoop 1.0.4 clusters were provisioned on Joyent infrastructure using Joyent REST API and Opscode Chef Each cluster was tweaked for optimal performance following best practices for TeraSort benchmark.
  • 12. 13 A custom script launches virtual machines using Joyent API and stores information about them in a json file.
  • 13. 14 Each machine in cluster is being configured according to its role in cluster using Chef cookbooks.
  • 14. 15 As part of TeraSort benchmark a dataset is generated using TeraGen utility included in Apache Hadoop.
  • 15. 16 On one of the nodes a Hadoop TeraSort job using previously generated dataset is submitted.
  • 16. 17 See: Hadoop job_201210261134_0010 on hadoop-smartos-r-1.html The key difference between the two clusters was unveiled when monitoring I/O and CPU utilization. Ubuntu cluster was spending too much time in OS kernel while performing I/O operations as demonstrated on Figure 1.
  • 17. SmartOS cluster was using CPU much more efficiently and was able to utilize larger number of Hadoop mappers and reducers, key configuration parameters for Hadoop:
  • 18.
  • 19. 20
  • 20. 21
  • 21. 22
  • 22. The key difference between the clusters was unveiled when monitoring I/O and CPU utilization. Ubuntu cluster was spending too much time in OS kernel while performing I/O (for copies of config files and job reports – email renat.k@altoros.com)
  • 23. 24 1) Basic cluster configuration is key (one time effort for typical workloads) DATA DISK SCALING COMPRESSION JVM REUSE POLICY HDFS BLOCK SIZE MAP-SIDE SPILLS COPY/SHUFFLE PHASE TUNING REDUCE-SIDE SPILLS 2) Tune the number of map and reduce tasks appropriately 3) Consider GPU for some workloads
  • 24. 25 • Forthcoming in October • Includes cloud performance • Co-author DTrace book • More here on his techniques: • http://dtrace.org/blogs/brendan/
  • 25. 26 Thank you! Ben Wen: benwen@joyent.com Renat Khasanshyn: renat.k@altoros.com @renatco (650) 395-7002