© 2014 IBM Corporation
Best Practices Building a
Multi-tenant Big Data Infrastructure
STAC Summit 2014 - NYC
Gord Sissons, gsissons@ca.ibm.com @GJSissons
Agenda
What do we mean by multi-tenancy?
Our evolving view - from HPC to HPA
Enter Big Data
Client example – multi-tenant Hadoop
New frameworks & Benchmarking Hadoop
Closing thoughts
Multi-tenancy is an overloaded term
Virtualization
Multiple users, lines-of-business
Multiple application instances & versions
Multi-tenant datastores – security isolation
Multiple distributed frameworks
Multiple instances of the same framework
Our viewpoint shaped by managing scaled-out cluster
infrastructure for the Financial Services Community
Means different things to different people
Our evolving view of multi-tenancy
From a shared infrastructure for risk analytics to born-in-the-cloud frameworks
HPC, HPA: IBM Platform Symphony
• Low latency scheduling
• Dynamic resource sharing
• ISV applications
• Extensive APIs
• High-performance SOA
A high-performance, shared grid infrastructure for risk analytics
Batch: IBM Platform LSF
• Multi-headed configurations
• Batch workloads
On a shared infrastructure, sharing resources according to policy - a broad set of workloads
Client requirements
Need for guaranteed service levels, notion of ownership
Time-variant, directed sharing policies
Dynamic, transparent service orchestration
Support for multiple concurrent applications
Agile flexing & resource reclaim
A simple value proposition to the business – sign on to a shared
infrastructure and have guaranteed resource ownership, and a better
quality of service than you could realize on dedicated infrastructure
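The ownership and reclaim requirements above can be illustrated with a small sketch. It is a toy model, not IBM Platform Symphony's scheduler or API: the `Tenant` class, the `allocate` function, the round-robin lending rule and the slot counts are all hypothetical, chosen only to show guaranteed ownership, lending of idle resources, and reclaim when the owner's demand returns.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    owned: int       # slots this tenant is guaranteed (hypothetical unit)
    demand: int = 0  # slots the tenant currently wants


def allocate(tenants):
    """Return {tenant name: slots granted} for one scheduling cycle.

    Step 1: every tenant gets min(demand, owned), which honours ownership.
    Step 2: slots left idle by their owners are lent, one at a time in
    round-robin order, to tenants that still have unmet demand.
    """
    grants = {t.name: min(t.demand, t.owned) for t in tenants}
    idle = sum(t.owned for t in tenants) - sum(grants.values())
    unmet = {t.name: t.demand - grants[t.name] for t in tenants}
    while idle > 0 and any(v > 0 for v in unmet.values()):
        for name in unmet:
            if idle == 0:
                break
            if unmet[name] > 0:
                grants[name] += 1
                unmet[name] -= 1
                idle -= 1
    return grants


risk = Tenant("risk", owned=60)
fraud = Tenant("fraud", owned=40)

# Only fraud is busy, so it borrows the slots that risk is not using.
fraud.demand = 100
print(allocate([risk, fraud]))   # {'risk': 0, 'fraud': 100}

# Risk demand returns: recomputing the allocation reclaims its owned slots.
risk.demand = 60
print(allocate([risk, fraud]))   # {'risk': 60, 'fraud': 40}
```

In this model a tenant never receives less than it would on a dedicated cluster of its owned size, and it can receive more whenever other tenants are idle, which is the value proposition stated on the slide.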
[Figure: MapReduce data flow with a client (C) and a master (M). Input files (splits 0-5) feed the Map phase, which writes intermediate files; the Reduce phase reads them and writes the output files (outputs 0-2).]
Enter Hadoop - much attention for new workloads
• Data warehouse modernization
• Fraud analytics
• Audit & compliance
• Social media analytics
• 360 view of the customer
• Machine data analytics
• Text analytics
• Tick analytics
• Trade visibility
• Click-stream analytics
• Vehicle telematics
History repeating itself - much as distributed systems dominate large-scale HPC, the same is becoming true in data management
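As a rough illustration of the Map phase, intermediate grouping and Reduce phase shown on this slide, here is a minimal single-process word count. It is a teaching sketch only and does not use Hadoop; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` and the sample input are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped):
    """Group intermediate pairs by key, as a framework does between phases."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    mapped = [map_phase(s) for s in splits]   # one map task per split
    return reduce_phase(shuffle(mapped))


splits = ["tick trade tick", "trade fraud", "tick"]
print(word_count(splits))   # {'tick': 3, 'trade': 2, 'fraud': 1}
```

In a real cluster each map task and each reduce task would run on a different node, and the grouping step would move intermediate files across the network; here everything runs in one process to keep the data flow visible.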
Our evolving view of multi-tenancy
From a shared infrastructure for risk analytics to born-in-the-cloud frameworks
HPC, HPA: IBM Platform Symphony
• Low latency scheduling
• Dynamic resource sharing
• ISV applications
• Extensive APIs
• High-performance SOA
A high-performance, shared grid infrastructure for risk analytics
Batch: IBM Platform LSF
• Multi-headed configurations
• Batch workloads
On a shared infrastructure, sharing resources according to policy
Big Data: IBM Platform Symphony Advanced Edition
• MapReduce
• Multitenancy
• Agile Scheduling
• Hadoop MapReduce
Advanced, high-performance MapReduce framework with Hadoop compatibility and multitenancy
Cluster Sprawl – The Elephant in the Room
• Diverse applications with different dependencies
• Different distributions, versions & tools
• Life cycle management challenges - dev, QA, test, production
• Big Data is more than just Hadoop - multiple projects and frameworks
Our evolving view of multi-tenancy
From a shared infrastructure for risk analytics to born-in-the-cloud frameworks
HPC, HPA: IBM Platform Symphony
• Low latency scheduling
• Dynamic resource sharing
• ISV applications
• Extensive APIs
• High-performance SOA
A high-performance, shared grid infrastructure for risk analytics
Batch: IBM Platform LSF
• Multi-headed configurations
• Batch workloads
On a shared infrastructure, sharing resources according to policy
Big Data: IBM Platform Symphony Advanced Edition
• Low latency MapReduce
• Multitenancy
• Agile Scheduling
• Hadoop MapReduce
Advanced, high-performance MapReduce framework with 100% Hadoop compatibility and sophisticated multitenancy
Application Frameworks: IBM Application Services Controller
• Complex Service Orchestration
• Advanced Services
“Born in the cloud” application frameworks
Customer example
US financial institution, approx. 9M customers
• Retail banking, credit cards, insurance, portfolio management, real estate, retirement planning & more
Began Hadoop journey in ~2010
• Deliver new services, reduce costs, off-load warehouse, provide timely data access to analysts & data scientists
Target application areas
• CRM, click-stream analytics, fraud alerting, actuarial underwriting, social data analytics, vehicle telematics / geo-spatial analytics
Rapid success, internal demand & security requirements drove the need for an architecture re-think in ~2012
• Deployed IBM Platform Symphony MapReduce + Elastic Storage (based on IBM GPFS), realizing a shared, multi-tenant analytics grid
[Figure: applications App #1 through App #n, each paired with its own user group (User Group #1 through User Group #n), running on the shared infrastructure.]
Shared infrastructure – current state
• More than two dozen lines of business sharing the production cluster
• 1 PB deployed, rapid growth trajectory - ~40% reduction in storage requirement
• Security isolation, guaranteed service levels, show-back accounting
• Significant performance & operational gains, higher infrastructure utilization
• Avoided the need for additional production clusters
InfoSphere BigInsights - Enterprise-grade Hadoop
Platform Symphony MapReduce – Multi-tenancy, high-performance, service level guarantees
IBM Elastic Storage (based on IBM GPFS) - HDFS compatible, POSIX, enterprise-features
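Show-back accounting, mentioned above, amounts to attributing shared-cluster usage back to each line of business. The sketch below is a hypothetical illustration only: the record layout `(line_of_business, slots, hours)`, the `show_back` function and the rate are assumptions for the example and do not describe the reporting in any IBM product.

```python
from collections import defaultdict


def show_back(usage_records, rate_per_slot_hour):
    """Sum slot-hours per line of business and price them at a flat rate.

    usage_records: iterable of (line_of_business, slots, hours) tuples.
    Returns {line_of_business: cost}, rounded to 2 decimal places.
    """
    slot_hours = defaultdict(float)
    for line_of_business, slots, hours in usage_records:
        slot_hours[line_of_business] += slots * hours
    return {lob: round(total * rate_per_slot_hour, 2)
            for lob, total in slot_hours.items()}


records = [
    ("cards", 10, 2.0),      # 20 slot-hours
    ("insurance", 4, 5.0),   # 20 slot-hours
    ("cards", 5, 1.0),       # 5 slot-hours
]
print(show_back(records, rate_per_slot_hour=0.5))
# {'cards': 12.5, 'insurance': 10.0}
```

Show-back reports the cost to each group without billing it; the same totals could feed charge-back if the organization chose to bill.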
Planned cluster expansion – early 2015
Expanding the Hadoop infrastructure
Deploying Spark to support new applications
Big R deployment serving the data scientist community
Pilot Hadoop-as-a-service on cloud
SQL-on-Hadoop deployment to serve demand from analysts
Hadoop-DS Benchmark – October 2014
• IBM-developed benchmark reflecting growing interest in SQL-on-Hadoop
• Showcases IBM’s Big SQL capability
• Big Data DS benchmark - based on TPC-DS
• Fully complies with the TPC-DS schema requirement
• Uses all 99 queries
• Meets the multi-user requirement
• Has been audited by a TPC-DS auditor but as a non-TPC benchmark
• Select deviations from TPC-DS due to Hadoop limitations:
  • No data maintenance operations, referential integrity enforcement, or ACID property validation as these are not feasible with HDFS
  • Additional statistics used
  • Metric adjustments
• No price/performance measures included
• Not an official TPC benchmark result
Benchmarking SQL language compatibility
Key points
• With competing solutions, many queries needed to be rewritten
• Owing to various restrictions, some queries could not be rewritten or failed at run time
• Rewriting queries in a benchmark scenario where results are known is one thing; doing this against real production databases is another
• Minimum 3.6x speed advantage across the common set of 46 queries
InfoSphere BigInsights runs all queries with 12 allowable modifications
Detailed presentation on SlideShare: http://www.slideshare.net/IBM_IM/hadoop-ds-benchmark-results
Audited by InfoSizing, certified TPC auditors – letter of attestation available
What about YARN?
Yet Another Resource Negotiator
Resource manager included in Hadoop 2.x and later
Decouples Hadoop workload & resource management
Introduces a general purpose application container
Enjoys broad industry support
By all means use it, but understand current limitations
• Missing flexible resource sharing policies, not yet widely deployed outside Hadoop contexts, limited application service orchestration capabilities
Closing thoughts
Be clear on what you mean by multi-tenancy
The right approach to building a shared infrastructure will depend on what you have
Consider the need for policy management and the ability to orchestrate services for a wide variety of distributed frameworks
http://ibm.com/platformcomputing
http://ibm.com/hadoop
© 2014 IBM Corporation17

Más contenido relacionado

La actualidad más candente

MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR Technologies
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopBrock Noland
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicDataWorks Summit
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiFelicia Haggarty
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningMapR Technologies
 
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Data Con LA
 
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonUsing Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonMapR Technologies
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Technologies
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Revolution Analytics
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationAbdelkrim Hadjidj
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainMapR Technologies
 
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm DataWorks Summit/Hadoop Summit
 
FOD Paris Meetup - Global Data Management with DataPlane Services (DPS)
FOD Paris Meetup -  Global Data Management with DataPlane Services (DPS)FOD Paris Meetup -  Global Data Management with DataPlane Services (DPS)
FOD Paris Meetup - Global Data Management with DataPlane Services (DPS)Abdelkrim Hadjidj
 

La actualidad más candente (20)

Hadoop and other animals
Hadoop and other animalsHadoop and other animals
Hadoop and other animals
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache Hadoop
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in Production
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
Production Grade Data Science for Hadoop
Production Grade Data Science for HadoopProduction Grade Data Science for Hadoop
Production Grade Data Science for Hadoop
 
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
 
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad AnsersonUsing Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data Platform
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant Presentation
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
Show me the Money! Cost & Resource  Tracking for Hadoop and Storm Show me the Money! Cost & Resource  Tracking for Hadoop and Storm
Show me the Money! Cost & Resource Tracking for Hadoop and Storm
 
FOD Paris Meetup - Global Data Management with DataPlane Services (DPS)
FOD Paris Meetup -  Global Data Management with DataPlane Services (DPS)FOD Paris Meetup -  Global Data Management with DataPlane Services (DPS)
FOD Paris Meetup - Global Data Management with DataPlane Services (DPS)
 
MapR & Skytree:
MapR & Skytree: MapR & Skytree:
MapR & Skytree:
 

Destacado

Rigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceRigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceCloudera, Inc.
 
Multi tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafkaMulti tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafkaTodd Palino
 
Solving Multi-tenancy and G1GC in Apache HBase
Solving Multi-tenancy and G1GC in Apache HBase Solving Multi-tenancy and G1GC in Apache HBase
Solving Multi-tenancy and G1GC in Apache HBase HBaseCon
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionNGDATA
 
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Sematext Group, Inc.
 
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...Behar Veliqi
 
The Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data ArchitectureThe Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data ArchitectureDataWorks Summit/Hadoop Summit
 
Hadoop meets Cloud with Multi-Tenancy
Hadoop meets Cloud with Multi-TenancyHadoop meets Cloud with Multi-Tenancy
Hadoop meets Cloud with Multi-TenancyTreasure Data, Inc.
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Kai Sasaki
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTCloudera, Inc.
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseJosh Elser
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadooplucenerevolution
 
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHarmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHBaseCon
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
 

Destacado (20)

Rigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceRigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase Performance
 
Multi tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafkaMulti tier, multi-tenant, multi-problem kafka
Multi tier, multi-tenant, multi-problem kafka
 
Solving Multi-tenancy and G1GC in Apache HBase
Solving Multi-tenancy and G1GC in Apache HBase Solving Multi-tenancy and G1GC in Apache HBase
Solving Multi-tenancy and G1GC in Apache HBase
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC edition
 
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
 
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
 
The Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data ArchitectureThe Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data Architecture
 
Hadoop meets Cloud with Multi-Tenancy
Hadoop meets Cloud with Multi-TenancyHadoop meets Cloud with Multi-Tenancy
Hadoop meets Cloud with Multi-Tenancy
 
NoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBaseNoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBase
 
Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Multi-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BTMulti-Tenant Operations with Cloudera 5.7 & BT
Multi-Tenant Operations with Cloudera 5.7 & BT
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices Workshop
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHarmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 

Similar a STAC Summit 2014 - Building a multitenant Big Data infrastructure

2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014Wilfried Hoge
 
Accelerating Innovation with Hybrid Cloud
Accelerating Innovation with Hybrid CloudAccelerating Innovation with Hybrid Cloud
Accelerating Innovation with Hybrid CloudJeff Jakubiak
 
IMS integration 2017
IMS integration 2017IMS integration 2017
IMS integration 2017Helene Lyon
 
Cloud Computing Introduction - 2018
Cloud Computing Introduction - 2018Cloud Computing Introduction - 2018
Cloud Computing Introduction - 2018Lucas Lopez
 
7 steps to Enterprise PaaS
7 steps to Enterprise PaaS7 steps to Enterprise PaaS
7 steps to Enterprise PaaSVMware vFabric
 
IBM APM for Hybrid Applications
IBM APM for Hybrid ApplicationsIBM APM for Hybrid Applications
IBM APM for Hybrid ApplicationsMatthew Cheah
 
The intersection of Traditional IT and New-Generation IT
The intersection of Traditional IT and New-Generation ITThe intersection of Traditional IT and New-Generation IT
The intersection of Traditional IT and New-Generation ITKangaroot
 
High Value Business Intelligence for IBM Platform compute environments
High Value Business Intelligence for IBM Platform compute environmentsHigh Value Business Intelligence for IBM Platform compute environments
High Value Business Intelligence for IBM Platform compute environmentsGabor Samu
 
Software Technology Trends in 2013-2014
Software Technology Trends in 2013-2014Software Technology Trends in 2013-2014
Software Technology Trends in 2013-2014KMS Technology
 
Accelerating the Path to Digital with a Cloud Data Strategy
Accelerating the Path to Digital with a Cloud Data StrategyAccelerating the Path to Digital with a Cloud Data Strategy
Accelerating the Path to Digital with a Cloud Data StrategyMongoDB
 
Towards Application Portability in Platform as a Service
Towards Application Portability in Platform as a ServiceTowards Application Portability in Platform as a Service
Towards Application Portability in Platform as a ServiceStefan Kolb
 
Gartner EA Architecting for DevOps and Hybrid Cloud
Gartner EA Architecting for DevOps and Hybrid CloudGartner EA Architecting for DevOps and Hybrid Cloud
Gartner EA Architecting for DevOps and Hybrid CloudRosalind Radcliffe
 
Cloud adoption patterns
Cloud adoption patternsCloud adoption patterns
Cloud adoption patternsKyle Brown
 
Cloud adoption patterns April 11 2016
Cloud adoption patterns April 11 2016Cloud adoption patterns April 11 2016
Cloud adoption patterns April 11 2016Kyle Brown
 
Build end-to-end solutions with BlueMix, Avi Vizel & Ziv Dai, IBM
Build end-to-end solutions with BlueMix, Avi Vizel & Ziv Dai, IBMBuild end-to-end solutions with BlueMix, Avi Vizel & Ziv Dai, IBM
Build end-to-end solutions with BlueMix, Avi Vizel & Ziv Dai, IBMCodemotion Tel Aviv
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceIBM Cloud Data Services
 

Similar a STAC Summit 2014 - Building a multitenant Big Data infrastructure (20)

2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014
 
Accelerating Innovation with Hybrid Cloud
Accelerating Innovation with Hybrid CloudAccelerating Innovation with Hybrid Cloud
Accelerating Innovation with Hybrid Cloud
 
IMS integration 2017
IMS integration 2017IMS integration 2017
IMS integration 2017
 
Cloud Computing Introduction - 2018
Cloud Computing Introduction - 2018Cloud Computing Introduction - 2018
Cloud Computing Introduction - 2018
 
7 steps to Enterprise PaaS
7 steps to Enterprise PaaS7 steps to Enterprise PaaS
7 steps to Enterprise PaaS
 
IBM APM for Hybrid Applications
IBM APM for Hybrid ApplicationsIBM APM for Hybrid Applications
IBM APM for Hybrid Applications
 
The intersection of Traditional IT and New-Generation IT
The intersection of Traditional IT and New-Generation ITThe intersection of Traditional IT and New-Generation IT
The intersection of Traditional IT and New-Generation IT
 
Upmc tpdev3
Upmc tpdev3Upmc tpdev3
Upmc tpdev3
 
High Value Business Intelligence for IBM Platform compute environments
High Value Business Intelligence for IBM Platform compute environmentsHigh Value Business Intelligence for IBM Platform compute environments
High Value Business Intelligence for IBM Platform compute environments
 
Software Technology Trends in 2013-2014
Software Technology Trends in 2013-2014Software Technology Trends in 2013-2014
Software Technology Trends in 2013-2014
 
Overview of SaaS
Overview of SaaSOverview of SaaS
Overview of SaaS
 
Adopting the Cloud
Adopting the CloudAdopting the Cloud
Adopting the Cloud
 
Accelerating the Path to Digital with a Cloud Data Strategy
Accelerating the Path to Digital with a Cloud Data StrategyAccelerating the Path to Digital with a Cloud Data Strategy
Accelerating the Path to Digital with a Cloud Data Strategy
 
Towards Application Portability in Platform as a Service
Towards Application Portability in Platform as a ServiceTowards Application Portability in Platform as a Service
Towards Application Portability in Platform as a Service
 
Gartner EA Architecting for DevOps and Hybrid Cloud
Gartner EA Architecting for DevOps and Hybrid CloudGartner EA Architecting for DevOps and Hybrid Cloud
Gartner EA Architecting for DevOps and Hybrid Cloud
 
Cloud adoption patterns
Cloud adoption patternsCloud adoption patterns
Cloud adoption patterns
 
Hadoop in the Cloud
Hadoop in the CloudHadoop in the Cloud
Hadoop in the Cloud
 
Cloud adoption patterns April 11 2016
Cloud adoption patterns April 11 2016Cloud adoption patterns April 11 2016
Cloud adoption patterns April 11 2016
 
Build end-to-end solutions with BlueMix, Avi Vizel & Ziv Dai, IBM
Build end-to-end solutions with BlueMix, Avi Vizel & Ziv Dai, IBMBuild end-to-end solutions with BlueMix, Avi Vizel & Ziv Dai, IBM
Build end-to-end solutions with BlueMix, Avi Vizel & Ziv Dai, IBM
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
 

STAC Summit 2014 - Building a multitenant Big Data infrastructure

  • 1. © 2014 IBM Corporation Best Practices Building a Multi-tenant Big Data Infrastructure STAC Summit 2014 - NYC Gord Sissons, gsissons@ca.ibm.com @GJSissons
  • 2. Agenda
      • What do we mean by multi-tenancy?
      • Our evolving view - from HPC to HPA
      • Enter Big Data
      • Client example – multi-tenant Hadoop
      • New frameworks & Benchmarking Hadoop
      • Closing thoughts
  • 3. Multi-tenancy is an overloaded term: it means different things to different people
      • Virtualization
      • Multiple users, lines-of-business
      • Multiple application instances & versions
      • Multi-tenant datastores – security isolation
      • Multiple distributed frameworks
      • Multiple instances of the same framework
      • Our viewpoint is shaped by managing scaled-out cluster infrastructure for the Financial Services Community
  • 4. Our evolving view of multi-tenancy: from a shared infrastructure for risk analytics to born-in-the-cloud frameworks
      • Batch (IBM Platform LSF): Multi-headed Configurations, Batch workloads. On a shared infrastructure, sharing resources according to policy – a broad set of workloads
      • HPC, HPA (IBM Platform Symphony): Low latency scheduling, Dynamic resource sharing, ISV applications, Extensive APIs, High-performance SOA. A high-performance, shared grid infrastructure for risk analytics
  • 5. Client requirements
      • Need for guaranteed service levels, notion of ownership
      • Time-variant, directed sharing policies
      • Dynamic, transparent service orchestration
      • Support for multiple concurrent applications
      • Agile flexing & resource reclaim
      • A simple value proposition to the business: sign on to a shared infrastructure and have guaranteed resource ownership, and a better quality of service than you could realize on dedicated infrastructure
  • 6. Enter Hadoop - much attention for new workloads
      • [Diagram: MapReduce data flow with a client (C) and a master (M): input files (split 0 to split 5), map phase, intermediate files, reduce phase, output files (output 0 to output 2)]
      • Data warehouse modernization
      • Fraud analytics
      • Audit & compliance
      • Social media analytics
      • 360 view of the customer
      • Machine data analytics
      • Text analytics
      • Tick analytics
      • Trade visibility
      • Click-stream analytics
      • Vehicle telematics
      • History repeating itself: much as distributed systems dominate large-scale HPC, the same is becoming true in data management
  • 7. Our evolving view of multi-tenancy: from a shared infrastructure for risk analytics to born-in-the-cloud frameworks
      • Batch (IBM Platform LSF): Multi-headed Configurations, Batch workloads. On a shared infrastructure, sharing resources according to policy
      • HPC, HPA (IBM Platform Symphony): Low latency scheduling, Dynamic resource sharing, ISV applications, Extensive APIs, High-performance SOA. A high-performance, shared grid infrastructure for risk analytics
      • Big Data (IBM Platform Symphony Advanced Edition): MapReduce, Multitenancy, Agile Scheduling, Hadoop MapReduce. Advanced, high-performance MapReduce framework with Hadoop compatibility and multitenancy
  • 8. Cluster Sprawl – The Elephant in the Room
      • Diverse applications with different dependencies
      • Different distributions, versions & tools
      • Life cycle management challenges – dev, QA, test, production
      • Big Data is more than just Hadoop – multiple projects and frameworks
  • 9. Our evolving view of multi-tenancy: from a shared infrastructure for risk analytics to born-in-the-cloud frameworks
      • Batch (IBM Platform LSF): Multi-headed Configurations, Batch workloads. On a shared infrastructure, sharing resources according to policy
      • HPC, HPA (IBM Platform Symphony): Low latency scheduling, Dynamic resource sharing, ISV applications, Extensive APIs, High-performance SOA. A high-performance, shared grid infrastructure for risk analytics
      • Big Data (IBM Platform Symphony Advanced Edition): Low latency MapReduce, Multitenancy, Agile Scheduling, Hadoop MapReduce. Advanced, high-performance MapReduce framework with 100% Hadoop compatibility and sophisticated multitenancy
      • Application Frameworks (IBM Application Services Controller): Complex Service Orchestration, Advanced Services. “Born in the cloud” application frameworks
  • 10. Customer example
      • US financial institution, approx 9M customers
          • Retail banking, credit cards, insurance, portfolio mgmt, real-estate, retirement planning & more
      • Began Hadoop journey in ~2010
          • Deliver new services, reduce costs, off-load warehouse, provide timely data access to analysts & data scientists
      • Target application areas
          • CRM, click-stream analytics, fraud alerting, actuarial underwriting, social data analytics, vehicle telematics / geo-spatial analytics
      • Rapid success, internal demand & security requirements drove the need for an architecture re-think in ~2012
          • Deployed IBM Platform Symphony MapReduce + Elastic Storage (based on IBM GPFS), realizing a shared, multi-tenant analytics grid
  • 11. Shared infrastructure – current state
      • [Diagram: App #1 / User Group #1 through App #n / User Group #n running on the shared infrastructure]
      • Over two dozen lines of business sharing the production cluster
      • 1 PB deployed, rapid growth trajectory - ~40% reduction in storage requirement
      • Security isolation, guaranteed service levels, show-back accounting
      • Significant performance & operational gains, higher infrastructure utilization
      • Avoided the need for additional production clusters
      • InfoSphere BigInsights - Enterprise-grade Hadoop
      • Platform Symphony MapReduce – Multi-tenancy, high-performance, service level guarantees
      • IBM Elastic Storage (based on IBM GPFS) - HDFS compatible, POSIX, enterprise features
  • 12. Planned cluster expansion – early 2015
      • Expanding the Hadoop infrastructure
      • Deploying Spark to support new applications
      • Big R deployment serving the data scientist community
      • Pilot Hadoop-as-a-service on cloud
      • SQL-on-Hadoop deployment to serve demand from analysts
  • 13. Hadoop-DS Benchmark – October 2014
      • IBM developed benchmark reflecting growing interest in SQL-on-Hadoop
          • Showcase IBM’s Big SQL capability
      • Big Data DS benchmark - based on TPC-DS
          • Fully complies with the TPC-DS schema requirement
          • Uses all 99 queries
          • Meets the multi-user requirement
          • Has been audited by a TPC-DS auditor but as a non-TPC benchmark
      • Select deviations from TPC-DS due to Hadoop limitations:
          • No data maintenance operations, referential integrity enforcement, or ACID property validation as these are not feasible with HDFS
          • Additional statistics used
          • Metric adjustments
          • No price/performance measures included
          • Not an official TPC benchmark result
  • 14. Benchmarking SQL language compatibility
      • Key points
          • With competing solutions, many queries needed to be re-written
          • Owing to various restrictions, some queries could not be re-written or failed at run-time
          • Re-writing queries in a benchmark scenario where results are known is one thing – doing this against real production databases is another
          • Minimum 3.6x speed advantage across the set of 46 common queries
      • InfoSphere BigInsights runs all queries with 12 allowable modifications
      • Detailed presentation on SlideShare: http://www.slideshare.net/IBM_IM/hadoop-ds-benchmark-results
      • Audited by InfoSizing, certified TPC auditors – letter of attestation available
  • 15. © 2014 IBM Corporation15 Resource manager included in Hadoop 2.x and later Decouples Hadoop workload & resource management Introduces a general purpose application container Enjoys broad industry support By all means use it, but understand current limitations  Missing flexible resource sharing policies, not yet widely deployed outside Hadoop contexts, limited application service orchestration capabilities What about YARN? Yet Another Resource Negotiator
  • 16. Closing thoughts
      • Be clear on what you mean by multi-tenancy
      • The right approach to building a shared infrastructure will depend on what you have
      • Consider the need for policy management and the ability to orchestrate services for a wide variety of distributed frameworks
      • http://ibm.com/platformcomputing
      • http://ibm.com/hadoop
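
The MapReduce data flow pictured on slide 6 (input splits, map phase, intermediate files, reduce phase, output files) can be illustrated with a minimal single-process sketch in Python. This is not Hadoop or Platform Symphony code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`), the word-count workload, and the byte-sum partitioner are all assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(split: str) -> list[tuple[str, int]]:
    """Map task: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped: list[list[tuple[str, int]]], num_reducers: int) -> list[dict[str, list[int]]]:
    """Group intermediate pairs by key and route each key to exactly one reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped:
        for key, value in pairs:
            # Hypothetical partitioner: a stable byte sum, used here because
            # Python's built-in hash() of a str varies between processes.
            index = sum(key.encode("utf-8")) % num_reducers
            partitions[index][key].append(value)
    return partitions


def reduce_phase(partition: dict[str, list[int]]) -> dict[str, int]:
    """Reduce task: sum the values collected for each key in one partition."""
    return {key: sum(values) for key, values in partition.items()}


def run_job(splits: list[str], num_reducers: int = 3) -> list[dict[str, int]]:
    """Run map, shuffle, and reduce in sequence; return one output per reducer."""
    mapped = [map_phase(split) for split in splits]       # map phase
    partitions = shuffle(mapped, num_reducers)            # intermediate files
    return [reduce_phase(part) for part in partitions]    # reduce phase


if __name__ == "__main__":
    # Six splits and three outputs, mirroring the counts shown in the diagram.
    example_splits = [
        "tick trade tick", "fraud tick", "trade audit",
        "audit audit", "fraud", "tick",
    ]
    for number, output in enumerate(run_job(example_splits)):
        print(f"output {number}: {output}")
```

In a real cluster the map and reduce tasks run on different nodes and the intermediate data is written to disk and moved over the network; the sketch keeps everything in memory so that the three phases are easy to follow.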