Benchmarking Sahara-based
Big-Data-as-a-Service Solutions
Zhidong Yu, Weiting Chen (Intel)
Trevor McKay (Red Hat)
May 2015
o Why Sahara
o Sahara introduction
o Deployment considerations
o Performance testing and results
o Future envisioning
o Summary and Call to Action
Agenda
o You or someone at your company is using AWS, Azure, or Google
o You’re probably doing it for easy access to OS instances, but also
the modern application features, e.g. AWS’ EMR or RDS or Storage
o Migrating to, or even using, OpenStack infrastructure for workloads
means having application features, e.g. Sahara & Trove
o Writing applications is complex enough without having to manage
supporting (non-value-add) infrastructure
Why Sahara: Cloud features
o Writing data analysis applications is especially hard
o Complexity in acquiring data
o Complexity in organizing (ETL) data
o Complexity in analyzing data
o Complexity in integrating analysis into applications (bonus!)
o This compounded complexity does not even include the
necessary tooling and infrastructure
Why Sahara: Data analysis
o Repeatable cluster provisioning and management
operations
o Data processing workflows (EDP)
o Cluster scaling (elasticity), Storage integration
(Swift, Cinder, HCFS)
o Network and security group (firewall) integration
o Service anti-affinity (fault domains & efficiency)
Sahara features
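A minimal sketch of what "repeatable provisioning", scaling, and anti-affinity could look like as data. The field names (plugin_name, hadoop_version, node_groups, node_processes, anti_affinity) are modeled loosely on Sahara's cluster template JSON and are assumptions here, not a verified API reference. The flavor ID and the helper functions are hypothetical.

```python
import copy
import json


def build_cluster_template(name, plugin_name, hadoop_version, workers, flavor_id):
    """Return a dict describing a cluster; field names are assumptions."""
    return {
        "name": name,
        "plugin_name": plugin_name,
        "hadoop_version": hadoop_version,
        # Ask the scheduler to spread datanodes across hosts (fault domains).
        "anti_affinity": ["datanode"],
        "node_groups": [
            {
                "name": "master",
                "count": 1,
                "flavor_id": flavor_id,
                "node_processes": ["namenode", "resourcemanager"],
            },
            {
                "name": "worker",
                "count": workers,
                "flavor_id": flavor_id,
                "node_processes": ["datanode", "nodemanager"],
            },
        ],
    }


def scale(template, group_name, new_count):
    """Return a copy of the template with one node group resized."""
    scaled = copy.deepcopy(template)
    for group in scaled["node_groups"]:
        if group["name"] == group_name:
            group["count"] = new_count
    return scaled


template = build_cluster_template(
    "demo", "vanilla", "2.6.0", workers=3, flavor_id="hypothetical-flavor-id"
)
bigger = scale(template, "worker", 5)
print(json.dumps(bigger, indent=2))
```

Because the template is plain data, the same description can be reused to create identical clusters, and scaling is a change to one count rather than a manual procedure.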
Sahara architecture
o Users get a choice of integrated data processing
engines
o Vendors get a way to integrate with OpenStack and
access users
o Upstream - Apache Hadoop (Vanilla), Hortonworks,
Cloudera, MapR, Apache Spark, Apache Storm
o Downstream - depends on your OpenStack vendor
Sahara plugins
Storage Architecture
[Figure: four storage scenarios, #1 to #4. Hosts run VMs with computing tasks; HDFS is placed in the same VMs, on the host, or in separate VMs. Remote storage options shown: legacy NFS, GlusterFS, Ceph, external HDFS, Swift.]
Scenario #1: compute and data services are collocated in the same VMs
Scenario #2: the data service runs at the host level
Scenario #3: the data service runs in separate VMs
Scenario #4: the data service is located on the remote network
o Tenant provisioned (in VM)
o HDFS in the same VMs of computing tasks vs. in
the different VMs
o Ephemeral disk vs. Cinder volume
o Admin provided
o Logically disaggregated from computing tasks
o Physical collocation is a matter of deployment
o For network-remote storage, Neutron DVR is a very
useful feature
o A disaggregated (and centralized) storage system has
significant value
o No data silos, more business opportunities
o Could leverage the Manila service
o Allows advanced solutions to be built (e.g. an in-memory
overlay)
o More vendor specific optimization opportunities
o Containers look promising but still need better support
o Determining the appropriate cluster size is always a challenge for tenants
o e.g. a small flavor with more nodes or a large flavor with fewer nodes
Compute Engine
VM
Pros:
• Best support in OpenStack
• Strong security
Cons:
• Slow to provision
• Relatively high runtime performance overhead
Container
Pros:
• Light-weight, fast provisioning
• Better runtime performance than VM
Cons:
• Nova-docker readiness
• Cinder volume support is not ready yet
• Weaker security than VM
• Not the ideal way to use container
Bare-Metal
Pros:
• Best performance and QoS
• Best security isolation
Cons:
• Ironic readiness
• Worst efficiency (e.g. consolidation of workloads with
different behaviors)
• Worst flexibility (e.g. migration)
• Worst elasticity due to slow provisioning
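The sizing question raised on this slide (a small flavor with more nodes vs. a large flavor with fewer nodes) can be sketched as a simple capacity calculation. The flavors and the target capacity below are hypothetical values chosen for illustration, not figures from the benchmark; real sizing would also have to account for the per-node overhead and failure-domain effects discussed above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    name: str
    vcpus: int
    ram_gb: int
    disk_gb: int


def nodes_needed(flavor, need_vcpus, need_ram_gb, need_disk_gb):
    """Smallest node count that satisfies every resource dimension."""
    return max(
        math.ceil(need_vcpus / flavor.vcpus),
        math.ceil(need_ram_gb / flavor.ram_gb),
        math.ceil(need_disk_gb / flavor.disk_gb),
    )


# Hypothetical flavors and target capacity (assumptions for the example).
FLAVORS = [
    Flavor("small-hypothetical", vcpus=2, ram_gb=4, disk_gb=40),
    Flavor("large-hypothetical", vcpus=8, ram_gb=32, disk_gb=160),
]
TARGET = dict(need_vcpus=32, need_ram_gb=96, need_disk_gb=800)

for flavor in FLAVORS:
    count = nodes_needed(flavor, **TARGET)
    print(f"{flavor.name}: {count} nodes")
```

With these assumed numbers the small flavor is limited by memory and the large flavor by disk, which shows why the answer depends on the workload's dominant resource.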
o Direct Cluster Operations
o Sahara is used as a provisioning engine
o Tenants expect to have direct access to the virtual cluster
o e.g. directly SSH into the VMs
o May use whatever APIs come with the distro
o e.g. Oozie
o EDP approach
o Sahara’s EDP is designed to be an abstraction layer for tenants to consume the services
o Ideally should be vendor neutral and plugin agnostic
o Limited job types are supported at present
o 3rd party abstraction APIs
o Not supported yet
o e.g. Cask CDAP
Data Processing API
Deployment Considerations Matrix
Storage: Tenant vs. Admin provisioned | Disaggregated vs. Collocated | HDFS vs. other options
Compute: VM | Container | Bare-metal
Distro/Plugin: Vanilla | CDH | HDP | MapR | Spark | Storm
Data Processing API: Traditional | EDP (Sahara native) | 3rd party APIs
Performance results in the next section
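To get a feel for the size of this matrix, the dimensions can be enumerated. This is only an illustrative sketch: the dictionary keys are labels made up for the example, and it does not claim that every combination is valid or supported (for instance, the deck notes that 3rd party APIs are not supported yet).

```python
from itertools import product

# Dimensions taken from the matrix; key names are illustrative labels.
MATRIX = {
    "storage provisioning": ["tenant", "admin"],
    "storage placement": ["disaggregated", "collocated"],
    "storage backend": ["HDFS", "other"],
    "compute": ["VM", "container", "bare-metal"],
    "distro/plugin": ["Vanilla", "CDH", "HDP", "MapR", "Spark", "Storm"],
    "data processing API": ["traditional", "EDP", "3rd party"],
}


def combinations(matrix):
    """Yield one dict per point in the deployment matrix."""
    keys = list(matrix)
    for values in product(*(matrix[key] for key in keys)):
        yield dict(zip(keys, values))


all_combos = list(combinations(MATRIX))
print(len(all_combos))  # 432
```

Even this coarse enumeration yields hundreds of candidate deployments, which is why the benchmark covers only a few representative points.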
● Host OS: CentOS7
● Guest OS: CentOS7
● Hadoop 2.6.0
● 4-Nodes Cluster
● Bare-metal
● OpenStack using KVM
○ qemu-kvm v1.5
● OpenStack using Docker
○ Docker v1.6
Testing Environment
Ephemeral Disk Performance
[Figure: HDFS running directly on hosts vs. HDFS running in VMs backed by the Nova instance store on RAID]
● 1.3x read overhead, 2.1x overhead
○ disk access pattern change: 10%
○ virtualization overhead: 90%
■ 60% due to I/O overhead
■ 30% due to memory efficiency in virtualization
Heavy tuning is required.
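A worked breakdown of the numbers above, as a sketch under two assumptions that the slide does not state explicitly: that "Nx overhead" means runtime relative to a 1x bare-metal baseline, and that the percentages are shares of the extra time over that baseline.

```python
# Shares of the extra time, as listed on the slide.
SHARES = {
    "disk access pattern change": 0.10,
    "I/O virtualization": 0.60,
    "memory efficiency in virtualization": 0.30,
}


def split_overhead(relative_runtime, shares):
    """Split the time above the 1x baseline according to the given shares."""
    extra = relative_runtime - 1.0
    return {name: extra * share for name, share in shares.items()}


for label, relative_runtime in (("1.3x", 1.3), ("2.1x", 2.1)):
    print(label)
    for name, part in split_overhead(relative_runtime, SHARES).items():
        print(f"  {name}: +{part:.2f}x of baseline runtime")
```

Under these assumptions, most of the extra time in the 2.1x case is attributable to I/O virtualization, which is consistent with the slide's remark that heavy tuning is required.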
o The performance difference comes from several factors:
o Free of I/O virtualization overhead
o DVR brings a large performance improvement
o Location awareness (HVE) is not enabled
External HDFS Performance
[Figure: external HDFS on a host with RAID, accessed from VMs, vs. HDFS running in VMs backed by the Nova instance store on RAID; chart labels: 2x improvement, 1.3x overhead]
Swift Performance
o Similar to external HDFS but even worse
o Without location awareness enabled, all traffic goes through the Swift
proxy node
[Figure: Swift accessed from VMs vs. HDFS running in VMs backed by the Nova instance store on RAID; chart label: 1.35x overhead]
● For an I/O-intensive workload, a 2.19x overhead is large but consistent with the previous
results.
● Containers show fair performance compared to KVM
○ Considering that OpenStack services also consume resources, a 0.46x overhead is not that bad.
Bare-metal vs. Container vs. VM
[Figure: HDFS on a bare-metal host, in a container, and in a VM; relative results: bare-metal 1X, container 1.46X, VM 2.19X]
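The three figures on this slide can be compared pairwise with a little arithmetic. This sketch assumes the values are runtimes normalized to bare-metal (1X), with 1.46X for the container and 2.19X for the VM, which is how the bullets above read; the function name is made up for the example.

```python
# Normalized runtimes from the slide (bare-metal = 1X).
RELATIVE_RUNTIME = {"bare-metal": 1.00, "container": 1.46, "vm": 2.19}


def overhead_vs(baseline, target, runtimes=RELATIVE_RUNTIME):
    """Extra runtime of `target` expressed as a fraction of `baseline`."""
    return runtimes[target] / runtimes[baseline] - 1.0


for baseline, target in (
    ("bare-metal", "container"),
    ("bare-metal", "vm"),
    ("container", "vm"),
):
    print(f"{target} vs. {baseline}: +{overhead_vs(baseline, target):.0%}")
```

Read this way, the container adds 46% over bare-metal and the VM adds a further 50% on top of the container.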
o Architected for disaggregated computing and storage
o Supporting more storage backends
o Integration with Manila
o Better support for container and bare-metal (Nova-docker and Ironic)
o Magnum as an alternative?
o EDP as a PaaS like layer for Sahara core provisioning engine
o Data connector abstraction
o Workflow management
o Policy engine for resource and SLA management
o Auto-scale, auto-tune
o Sahara needs to offer broader vendor integration opportunities (not just engines)
o A complete big data stack may have many options at each layer
o e.g. acceleration libraries, analytics, developer oriented application framework (e.g.
CDAP)
o Requires a more generic plugin/driver framework to support it
Future of Sahara? (NOT a roadmap)
o Upgrade HDP plugin to HDP 2.2
o Hadoop HA for CDH and HDP
o Enhancements to provisioning through Heat
o Bring Sahara to python-openstackclient
o Bare-metal clusters with Ironic
o Security enhancements
o EDP enhancements
o Job scheduling
o Coordination
o Log retrieval
o Improved parameter specification
Liberty Roadmap Highlights
o Sahara improved greatly in the Kilo release and is production ready, with
real customer deployments.
o A complete Big-Data-as-a-Service solution requires more
considerations than simply adding a Sahara service to the
existing OpenStack deployment
o Preliminary benchmark results show the performance gap with
bare-metal is still huge. Tuning and optimizations are required.
o Many features could be added to enhance Sahara. Opportunities
exist for various types of vendors.
Summary and Call-to-Action
Join in the Sahara community and make it even more vibrant!
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware,
software or service activation. Performance varies depending on system configuration. No computer system can
be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com].
Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions. Any change to any of those factors may cause the
results to vary. You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products.
Include as footnote in appropriate slide where performance data is shown:
o §Configurations: [describe config + what test used + who did testing]
o §For more information go to http://www.intel.com/performance.
Intel, the Intel logo, {List the Intel trademarks in your document} are trademarks of Intel Corporation in the U.S.
and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2015 Intel Corporation.
Disclaimer

SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Último (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Benchmarking Sahara-based Big Data Solutions

  • 1. Benchmarking Sahara-based Big-Data-as-a-Service Solutions
      Zhidong Yu, Weiting Chen (Intel); Trevor McKay (Red Hat)
      May 2015
  • 2. Agenda
      o Why Sahara
      o Sahara introduction
      o Deployment considerations
      o Performance testing and results
      o Future envisioning
      o Summary and Call to Action
  • 3. Why Sahara: Cloud features
      o You or someone at your company is using AWS, Azure, or Google
      o You’re probably doing it for easy access to OS instances, but also for the modern application features, e.g. AWS’ EMR or RDS or Storage
      o Migrating to, or even using, OpenStack infrastructure for workloads means having application features, e.g. Sahara & Trove
      o Writing applications is complex enough without having to manage supporting (non-value-add) infrastructure
  • 4. Why Sahara: Data analysis
      o Writing data analysis applications is especially hard
      o Complexity in acquiring data
      o Complexity in organizing (ETL) data
      o Complexity in analyzing data
      o Complexity in integrating analysis into applications (bonus!)
      o This compounded complexity does not even include the necessary tooling and infrastructure
  • 6. Sahara features
      o Repeatable cluster provisioning and management operations
      o Data processing workflows (EDP)
      o Cluster scaling (elasticity), storage integration (Swift, Cinder, HCFS)
      o Network and security group (firewall) integration
      o Service anti-affinity (fault domains & efficiency)
  • 8. Sahara plugins
      o Users get a choice of integrated data processing engines
      o Vendors get a way to integrate with OpenStack and access users
      o Upstream - Apache Hadoop (Vanilla), Hortonworks, Cloudera, MapR, Apache Spark, Apache Storm
      o Downstream - depends on your OpenStack vendor
  • 10. Storage Architecture
      [Figure: the four scenarios listed below; the remote storage options shown are legacy NFS, GlusterFS, Ceph, external HDFS and Swift]
      Scenario #1: computing and data service are collocated in the same VMs
      Scenario #2: data service is located in the host world
      Scenario #3: data service is located in a separate VM world
      Scenario #4: data service is located on the remote network
      o Tenant provisioned (in VM)
          o HDFS in the same VMs as the computing tasks vs. in different VMs
          o Ephemeral disk vs. Cinder volume
      o Admin provided
          o Logically disaggregated from computing tasks
          o Physical collocation is a matter of deployment
          o For network remote storage, Neutron DVR is a very useful feature
      o A disaggregated (and centralized) storage system has significant value
          o No data silos, more business opportunities
          o Could leverage the Manila service
          o Allows advanced solutions to be created (e.g. an in-memory overlayer)
          o More vendor-specific optimization opportunities
  • 11. Compute Engine
      o Containers seem promising but still need better support
      o Determining the appropriate cluster size is always a challenge for tenants
          o e.g. a small flavor with more nodes or a large flavor with fewer nodes
      Pros and cons by compute engine:
      VM
          Pros: best support in OpenStack; strong security
          Cons: slow to provision; relatively high runtime performance overhead
      Container
          Pros: light-weight, fast provisioning; better runtime performance than VM
          Cons: Nova-docker readiness; Cinder volume support is not ready yet; weaker security than VM; not the ideal way to use containers
      Bare-metal
          Pros: best performance and QoS; best security isolation
          Cons: Ironic readiness; worst efficiency (e.g. consolidation of workloads with different behaviors); worst flexibility (e.g. migration); worst elasticity due to slow provisioning
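The sizing question on slide 11 (many small nodes or a few large ones) can be made concrete with a little arithmetic. The following is a minimal Python sketch; the flavor names and sizes are hypothetical and chosen only so that both cluster shapes have the same aggregate capacity. None of these values come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterShape:
    """A hypothetical cluster layout: `nodes` instances of a single flavor."""
    flavor: str
    nodes: int
    vcpus_per_node: int
    ram_gb_per_node: int

    def totals(self) -> tuple[int, int]:
        """Return (total vCPUs, total RAM in GB) across the whole cluster."""
        return (self.nodes * self.vcpus_per_node,
                self.nodes * self.ram_gb_per_node)


# Illustrative flavors only; real flavor definitions are deployment-specific.
many_small = ClusterShape("small", nodes=16, vcpus_per_node=2, ram_gb_per_node=4)
few_large = ClusterShape("large", nodes=4, vcpus_per_node=8, ram_gb_per_node=16)

for shape in (many_small, few_large):
    vcpus, ram = shape.totals()
    print(f"{shape.nodes:>2} x {shape.flavor}: {vcpus} vCPUs, {ram} GB RAM")
```

Both shapes add up to the same vCPU and RAM totals, so the choice rests on factors the totals do not capture, such as per-instance overhead and how much capacity is lost when a single node fails.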
  • 12. Data Processing API
      o Direct cluster operations
          o Sahara is used as a provisioning engine
          o Tenants expect to have direct access to the virtual cluster
              o e.g. SSH directly into the VMs
          o May use whatever APIs come with the distro
              o e.g. Oozie
      o EDP approach
          o Sahara’s EDP is designed to be an abstraction layer through which tenants consume the services
          o Ideally it should be vendor neutral and plugin agnostic
          o Only a limited set of job types is supported at present
      o 3rd party abstraction APIs
          o Not supported yet
          o e.g. Cask CDAP
  • 13. Deployment Considerations Matrix
      Storage: tenant vs. admin provisioned; disaggregated vs. collocated; HDFS vs. other options
      Compute: VM; container; bare-metal
      Distro/Plugin: Vanilla; CDH; HDP; MapR; Spark; Storm
      Data Processing API: traditional; EDP (Sahara native); 3rd party APIs
      Performance results in the next section
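To give a sense of how large the matrix on slide 13 is, the sketch below enumerates every combination of the listed options in Python. Splitting storage into three independent two-way choices is an interpretation of the slide, and the variable names are made up for illustration.

```python
from itertools import product

# Option lists as read from the slide; the grouping is an interpretation.
storage_provisioning = ["tenant provisioned", "admin provisioned"]
storage_placement = ["disaggregated", "collocated"]
storage_backend = ["HDFS", "other"]
compute = ["VM", "container", "bare-metal"]
plugin = ["Vanilla", "CDH", "HDP", "MapR", "Spark", "Storm"]
api = ["traditional", "EDP (Sahara native)", "3rd party APIs"]

# Every point in the matrix is one tuple of choices, one per dimension.
combinations = list(product(storage_provisioning, storage_placement,
                            storage_backend, compute, plugin, api))

print(len(combinations))  # 2 * 2 * 2 * 3 * 6 * 3 = 432
print(combinations[0])
```

Not every combination is meaningful or supported, but the count suggests why a benchmark can only cover a small slice of the space.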
  • 15. Testing Environment
      ● Host OS: CentOS7
      ● Guest OS: CentOS7
      ● Hadoop 2.6.0
      ● 4-node cluster
      ● Bare-metal
      ● OpenStack using KVM
          ○ qemu-kvm v1.5
      ● OpenStack using Docker
          ○ Docker v1.6
  • 16. Ephemeral Disk Performance
      [Figure: host HDFS vs. VM HDFS on the Nova instance store (RAID); labels: 1.3x overhead, 2.1x overhead]
      ● 1.3x read overhead, 2.1x overhead
          ○ disk access pattern change: 10%
          ○ virtualization overhead: 90%
              ■ 60% due to I/O overhead
              ■ 30% due to memory efficiency in virtualization
      Heavy tuning is required.
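One way to read the breakdown on slide 16 is as a split of the extra run time across causes. The Python sketch below works that through under two assumptions that the slide does not state: that "2.1x overhead" means the virtualized run takes 2.1 times as long as bare-metal, and that the percentages apply to the extra time in that case. The 100-second baseline is hypothetical.

```python
def split_overhead(baseline_s: float, slowdown: float,
                   shares: dict[str, float]) -> dict[str, float]:
    """Split the extra run time, (slowdown - 1) * baseline, across named causes."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    extra_s = (slowdown - 1.0) * baseline_s
    return {cause: extra_s * share for cause, share in shares.items()}


# Shares as listed on the slide (10% + 60% + 30%).
shares = {
    "disk access pattern change": 0.10,
    "I/O virtualization overhead": 0.60,
    "memory efficiency in virtualization": 0.30,
}

# Hypothetical 100 s bare-metal run with the 2.1x figure from the slide.
breakdown = split_overhead(baseline_s=100.0, slowdown=2.1, shares=shares)
for cause, seconds in breakdown.items():
    print(f"{cause}: {seconds:.1f} s")
```

Under these assumptions the 110 extra seconds split into 11 s, 66 s and 33 s, which puts I/O virtualization first among the tuning targets.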
  • 17. External HDFS Performance
      [Figure: VMs using external HDFS on hosts (RAID) vs. VM HDFS on the Nova instance store (RAID); labels: 2x improvement, 1.3x overhead]
      o The performance difference comes from several factors:
          o Free of I/O virtualization overhead
          o DVR brings a huge performance enhancement
          o Location awareness (HVE) is not enabled
  • 18. Swift Performance
      [Figure: VMs using Swift on hosts (RAID) vs. VM HDFS on the Nova instance store (RAID); label: 1.35x overhead]
      o Similar to external HDFS but even worse
      o Without location awareness enabled, all the traffic goes through the Swift proxy node
  • 19. Bare-metal vs. Container vs. VM
      [Figure: HDFS on a bare-metal host (1X), in a container (1.46X) and in a VM (2.19X)]
      ● For an I/O-intensive workload, 2.19x overhead is big but is consistent with previous results.
      ● Containers demonstrate fair performance compared to KVM
          ○ considering that OpenStack services also consume resources, 0.46x is not that bad.
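The three figures on slide 19 can be turned into pairwise comparisons with a few lines of Python. This sketch assumes the numbers are run times normalized to bare-metal (lower is better) and that 1X, 1.46X and 2.19X belong to bare-metal, container and VM respectively, which is how the slide text reads.

```python
# Relative run times from the slide, normalized so that bare-metal = 1.0.
# Assumption: these are run-time ratios, so lower is better.
relative_time = {"bare-metal": 1.0, "container": 1.46, "vm": 2.19}


def overhead_pct(platform: str) -> float:
    """Extra run time over bare-metal, as a percentage."""
    return (relative_time[platform] - 1.0) * 100.0


def speedup(faster: str, slower: str) -> float:
    """Ratio of the slower platform's run time to the faster platform's."""
    return relative_time[slower] / relative_time[faster]


for name in relative_time:
    print(f"{name}: {overhead_pct(name):.0f}% overhead vs. bare-metal")
print(f"container vs. vm: {speedup('container', 'vm'):.2f}x")
```

On this reading the container run finishes about 1.5 times faster than the KVM run while still taking roughly 46% longer than bare-metal.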
  • 21. Future of Sahara? (NOT a roadmap)
      o Architected for disaggregated computing and storage
          o Supporting more storage backends
          o Integration with Manila
      o Better support for container and bare-metal (Nova-docker and Ironic)
          o Magnum as an alternative?
      o EDP as a PaaS-like layer for the Sahara core provisioning engine
          o Data connector abstraction
          o Workflow management
          o Policy engine for resource and SLA management
          o Auto-scale, auto-tune
      o Sahara needs to offer broader vendor integration opportunities (not just engines)
          o A complete big data stack may have many options at each layer
              o e.g. acceleration libraries, analytics, developer-oriented application frameworks (e.g. CDAP)
          o Requires a more generic plugin/driver framework to support it
  • 22. Liberty Roadmap Highlights
      o Upgrade HDP plugin to HDP 2.2
      o Hadoop HA for CDH and HDP
      o Enhancements to provisioning through Heat
      o Bring Sahara to python-openstackclient
      o Bare-metal clusters with Ironic
      o Security enhancements
      o EDP enhancements
          o Job scheduling
          o Coordination
          o Log retrieval
          o Improved parameter specification
  • 23. Summary and Call-to-Action
      o Great improvement in the Sahara Kilo release. Production ready, with real customer deployments.
      o A complete Big-Data-as-a-Service solution requires more considerations than simply adding a Sahara service to an existing OpenStack deployment
      o Preliminary benchmark results show that the performance gap with bare-metal is still huge. Tuning and optimizations are required.
      o Many features could be added to enhance Sahara. Opportunities exist for various types of vendors.
      Join the Sahara community and make it even more vibrant!
  • 24. Disclaimer
      Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com].
      Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
      Include as footnote in appropriate slide where performance data is shown:
          o §Configurations: [describe config + what test used + who did testing]
          o §For more information go to http://www.intel.com/performance.
      Intel, the Intel logo, {List the Intel trademarks in your document} are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2015 Intel Corporation.