Self-Service Provisioning and
Hadoop Management with Apache Ambari
Anant Chintamaneni
June 9th, 2015
1:45pm to 2:25pm
This session is on self-service Hadoop for:
✓ ON-PREMISES
✓ IN YOUR DATA CENTER
✓ USING YOUR INFRASTRUCTURE
NOT
X PUBLIC CLOUD (e.g. AMAZON EMR, AZURE, etc.)
About me
• VP of Products at BlueData
• @AnantCman on Twitter
• Former Head of Hadoop Products at Pivotal
• Championed Ambari at Pivotal
• Introduced Hadoop at Merced Systems (now NICE Systems)
Personal
• Soccer dad
• Sports fan – go Niners!
Talk Track
• Self-service Hadoop – what is it, why now?
• Key building blocks for self-service Hadoop
• Why Apache Ambari
• Delivering self-service with Ambari
• Demo
• Q&A
Self-service is the need of the hour for Hadoop
“…while Hadoop can handle huge data sets and make them useable, the
capabilities needed to set up and run Hadoop remain scarce and expensive…”
Self-service models are proven to simplify and drive usage
Self-service Hadoop defined
Make it work the way users want to work today…
Self-service analytics: from idea to insights in minutes
• Analytics/visualization idea!
• “I can access my desktop analysis / BI tool of choice”
• Point at data (files, NFS, RDBMS) and analyze
Self-service Hadoop defined
Make it work the way users want to work today…
Self-service Hadoop: from idea to infrastructure to insights in minutes
• Big Data analytics/visualization idea!
• “I can provision my own Hadoop ‘cluster’ so I have Hive, Pig, BI tool, etc.”
• Point at data (NFS, RDBMS) and analyze, extract insights
Self-service Hadoop examples
• Ad-hoc data exploration → can I blend this data with that data?
• Fail-fast experimentation → you don’t know what you don’t know
• Test multiple predictive analytics models → get a dedicated sandbox
• Bursty workload → your boss needs you to do an analytics drill
Without self-service Hadoop
It may not work the way your users want to work today…
From idea to infrastructure to insights in weeks
• Big Data analytics/visualization idea!
• Provision cluster: meet … wait … email … “why isn’t my cluster ready?”
• Checkpoint: Hadoop cluster ready? (YES / NO)
• Copy data to cluster
• Checkpoint: Is my data there? (YES / NO)
• Checkpoint: Code/query review (YES / NO)
• Wait!
• Run Hadoop analytics jobs
• Result: lost business opportunity, insights no longer relevant
Key building blocks for self-service Hadoop
• End user experience (easy access): agility, elasticity and easy access
• Enterprise IT (tech support): operational support and oversight
Why Apache Ambari
• RESTful APIs to automate provisioning of Apache Hadoop clusters
– Capture basic cluster parameters from user and leverage Ambari APIs
• Granular control on deployment of services (e.g. Hive, Pig)
– Only deploy ‘compute’ services (e.g. Hive, BI tool) requested by user
– Speeds up availability of cluster by eliminating overhead
• Enterprise-grade security, management and monitoring capabilities
– IT admins can support user-created clusters with familiar mgmt console
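As a rough illustration of "capture basic cluster parameters and leverage Ambari APIs", the sketch below turns a cluster name and a list of requested services into one "create service" call per service. It is a dry run that only prints the curl commands. The host name, cluster name, and the helper names `ambari_url` and `plan_services` are assumptions made for this example, and authentication (cookie jar or `-u`) is left out for brevity.

```shell
#!/bin/sh
# Hypothetical values for illustration only; not taken from the talk.
AMBARI_HOST="ambari.example.local"
AMBARI_PORT="8080"

# Build the REST URL for a resource path under a given cluster.
ambari_url() {
  printf 'http://%s:%s/api/v1/clusters/%s%s\n' "$AMBARI_HOST" "$AMBARI_PORT" "$1" "$2"
}

# Dry run: print one "create service" call per service the user asked for.
# Only the requested services appear, so nothing extra gets deployed.
plan_services() {
  cluster="$1"
  shift
  for svc in "$@"; do
    printf "curl -H 'X-Requested-By: ambari' -X POST -d '%s' %s\n" \
      "{\"ServiceInfo\":{\"service_name\":\"$svc\"}}" \
      "$(ambari_url "$cluster" /services)"
  done
}

plan_services demo_cluster HIVE PIG
```

Printing the commands first makes it easy to review what a self-service request would trigger before anything is sent to the Ambari server.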
Delivering self-service with Ambari
Your physical servers
VIRTUALIZED INFRASTRUCTURE
• Big Data VMs/Containers
• Self-service web UI
• Tenant/User Management
• DataTap (HDFS abstraction)
SELF-SERVICE HDP CLUSTERS
• HDP Virtual Hadoop clusters
• Ambari management console
• ‘Compute’ services (e.g. Hive)
Delivering self-service with Ambari
Self-service web interface – define cluster with a few mouse clicks
* Example screenshot from BlueData integration with Apache Ambari
Delivering self-service with Ambari
Creating virtual Hadoop clusters within minutes
* Example screenshot from BlueData integration with Apache Ambari
Delivering self-service with Ambari
Hadoop cluster provisioning using Ambari API
Phase 1: VMs
• Self-service request
• VMs provisioned
• Ambari server & agents pre-deployed
• HDFS dependency removed
Phase 2: Core Stack
• Agent registration with server
• REST API call to deploy HDP stack
• REST API to create core-site.xml to use BlueData HDFS abstraction
• Start YARN/MRv2
• Shut down HDFS service
Phase 3: Services
• Add specific services requested by end user via REST API calls
• Start ‘compute’ services (e.g. Hive, Pig) requested by user
• Update status of cluster
Design optimized for cluster creation speed and user feedback
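The "start ‘compute’ services" step in Phase 3 can be sketched as two state changes per service: Ambari is first asked to bring the service to INSTALLED, then to STARTED. The example below is a dry run that prints the PUT calls rather than sending them. `BASE`, the cluster name, and the helper names are placeholders chosen for this sketch, and authentication is omitted.

```shell
#!/bin/sh
# Placeholder endpoint for illustration; not a value from the talk.
BASE="http://ambari.example.local:8080/api/v1/clusters/demo_cluster"

# JSON body asking Ambari to move a service to the given desired state.
state_body() {
  printf '{"ServiceInfo":{"state":"%s"}}' "$1"
}

# Dry run: print the two state transitions for one service,
# INSTALLED first, then STARTED.
service_lifecycle() {
  svc="$1"
  for state in INSTALLED STARTED; do
    printf "curl -H 'X-Requested-By: ambari' -X PUT -d '%s' %s/services/%s\n" \
      "$(state_body "$state")" "$BASE" "$svc"
  done
}

service_lifecycle HIVE
```

Client-only services such as Pig have nothing to start, so in practice the STARTED step would matter only for services with running daemons, such as Hive.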
Delivering self-service with Ambari
REST API example to deploy specific service (Pig)

# Flags: -k skips TLS verification, -i prints response headers, -b reads the session cookie from the given file.
# JSON payloads are single-quoted so the shell passes them to curl unchanged.

# Service: register the PIG service and its PIG component with the cluster
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d '{"ServiceInfo":{"service_name":"PIG"}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG/components/PIG

# Configs: upload the static configuration files, then make each one the desired config
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-env.json http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-properties.json http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-log4j.json http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-env","tag":"bluedata"}}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-properties","tag":"bluedata"}}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-log4j","tag":"bluedata"}}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7

# Install: map the PIG component to a host, request the INSTALLED state, then check the service status
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d '{"host_components":[{"HostRoles":{"component_name":"PIG"}}]}' 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/hosts?Hosts/host_name=bluedata-71.openstacklocal'
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"ServiceInfo":{"state":"INSTALLED"}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG
Delivering self-service with Ambari
Design choices and considerations
• Used Apache Ambari v1.7 for this example
• BlueData mgmt services orchestrate Ambari REST API calls
• Ambari Blueprints used to bring up HDFS only
– Post cluster creation, services added using individual REST APIs for better control
– Blueprints/Stack Advisor do not provide REST API to track intermediate progress
• Used individual REST API calls with static configuration files
– Could not leverage Stack Advisor for individual services
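To make "individual REST API calls with static configuration files" more concrete, the sketch below prints the two JSON bodies involved for one config type: the configuration payload itself (the kind of content a file such as pig-env.json might hold) and the body that marks that type and tag as the cluster's desired config. The helper names and the `example.key` property are placeholders for illustration; the actual file contents were not shown in the talk.

```shell
#!/bin/sh
# Sketch of a static configuration payload for one config type.
# "example.key"/"example-value" is a placeholder, not a real Pig setting.
config_payload() {
  printf '{"type":"%s","tag":"%s","properties":{"example.key":"example-value"}}' "$1" "$2"
}

# Body that makes a (type, tag) pair the cluster's desired configuration.
desired_config_body() {
  printf '{"Clusters":{"desired_configs":{"type":"%s","tag":"%s"}}}' "$1" "$2"
}

# Print both bodies for the pig-env type with the "bluedata" tag.
config_payload pig-env bluedata
echo
desired_config_body pig-env bluedata
echo
```

Keeping the payloads static trades flexibility for predictability: the values are fixed ahead of time instead of being computed per cluster, which matches the note above that Stack Advisor could not be used for individual services.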
Self-Service with Ambari:
Live Demo
Q&A
Contact me directly at …
Email: anant@bluedata.com
Twitter: @AnantCman
BlueData + Apache Ambari 1.7 Integration
Benefit: Infrastructure agility, elasticity, and efficiency – virtual HDP clusters with the functionality and performance of physical clusters
Features:
• Auto-provisioning of VM hosts with Ambari server and agent components
• Automated, transparent deployment of HDP using REST API for Stacks and Services
Benefit: Time savings for Data Scientists and Big Data administrators
Features:
• Self-service virtual cluster creation by data scientists or business analysts
• Troubleshooting and management by Big Data admins using Apache Ambari
Benefit: Administrator productivity & flexibility
Features:
• Apache Ambari for monitoring, fine-grained configuration, and enterprise support

Más contenido relacionado

La actualidad más candente

Is Cloud a right Companion for Hadoop
Is Cloud a right Companion for HadoopIs Cloud a right Companion for Hadoop
Is Cloud a right Companion for Hadoop
DataWorks Summit
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Cloudera, Inc.
 

La actualidad más candente (20)

Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environment
 
Qubole - Big data in cloud
Qubole - Big data in cloudQubole - Big data in cloud
Qubole - Big data in cloud
 
Ravi Namboori 's Open stack framework introduction
Ravi Namboori 's Open stack framework introductionRavi Namboori 's Open stack framework introduction
Ravi Namboori 's Open stack framework introduction
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
 
Is Cloud a right Companion for Hadoop
Is Cloud a right Companion for HadoopIs Cloud a right Companion for Hadoop
Is Cloud a right Companion for Hadoop
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Hadoop in the Cloud: Common Architectural Patterns
Hadoop in the Cloud: Common Architectural PatternsHadoop in the Cloud: Common Architectural Patterns
Hadoop in the Cloud: Common Architectural Patterns
 
Unified Data Access with Gimel
Unified Data Access with GimelUnified Data Access with Gimel
Unified Data Access with Gimel
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of Data
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
 
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing HubIMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
Spark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattleSpark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattle
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slides
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsight
 
Optimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public CloudOptimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public Cloud
 
Introducing the Hub for Data Orchestration
Introducing the Hub for Data OrchestrationIntroducing the Hub for Data Orchestration
Introducing the Hub for Data Orchestration
 

Similar a Self-Service Provisioning and Hadoop Management with Apache Ambari

Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructure
harendra_pathak
 

Similar a Self-Service Provisioning and Hadoop Management with Apache Ambari (20)

Building a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStackBuilding a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStack
 
Hortonworks Technical Workshop: Apache Ambari
Hortonworks Technical Workshop:   Apache AmbariHortonworks Technical Workshop:   Apache Ambari
Hortonworks Technical Workshop: Apache Ambari
 
Building a cloud based managed BigData platform for the enterprise
Building a cloud based managed BigData platform for the enterpriseBuilding a cloud based managed BigData platform for the enterprise
Building a cloud based managed BigData platform for the enterprise
 
Making Data Scientists Productive in Azure
Making Data Scientists Productive in AzureMaking Data Scientists Productive in Azure
Making Data Scientists Productive in Azure
 
Simplified Cluster Operation and Troubleshooting
Simplified Cluster Operation and TroubleshootingSimplified Cluster Operation and Troubleshooting
Simplified Cluster Operation and Troubleshooting
 
Simplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & TroubleshootingSimplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & Troubleshooting
 
Building a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStackBuilding a Dev/Test Cloud with Apache CloudStack
Building a Dev/Test Cloud with Apache CloudStack
 
Navigating the turbulence on takeoff: Setting up SharePoint on Azure IaaS the...
Navigating the turbulence on takeoff: Setting up SharePoint on Azure IaaS the...Navigating the turbulence on takeoff: Setting up SharePoint on Azure IaaS the...
Navigating the turbulence on takeoff: Setting up SharePoint on Azure IaaS the...
 
Tear It Down, Build It Back Up: Empowering Developers with Amazon CloudFormation
Tear It Down, Build It Back Up: Empowering Developers with Amazon CloudFormationTear It Down, Build It Back Up: Empowering Developers with Amazon CloudFormation
Tear It Down, Build It Back Up: Empowering Developers with Amazon CloudFormation
 
Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
Apache Ambari: Simplified Hadoop Cluster Operation & TroubleshootingApache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
Apache Ambari: Simplified Hadoop Cluster Operation & Troubleshooting
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
2015 zData Inc. - Apache Ambari Overview
2015 zData Inc. - Apache Ambari Overview2015 zData Inc. - Apache Ambari Overview
2015 zData Inc. - Apache Ambari Overview
 
Ambari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.finalAmbari hadoop-ops-meetup-2013-09-19.final
Ambari hadoop-ops-meetup-2013-09-19.final
 
Cloud Platforms for Java
Cloud Platforms for JavaCloud Platforms for Java
Cloud Platforms for Java
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructure
 
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
 
Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...
Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...
Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...
 
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPSimpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
 
Intro to Service Worker API and its use cases
Intro to Service Worker API and its use casesIntro to Service Worker API and its use cases
Intro to Service Worker API and its use cases
 

Más de DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Self-Service Provisioning and Hadoop Management with Apache Ambari

  • 1. Self-Service Provisioning and Hadoop Management with Apache Ambari Anant Chintamaneni June 9th, 2015 1:45pm to 2:25pm
  • 2. This session is on self-service Hadoop for:  ON-PREMISES  IN YOUR DATA CENTER  USING YOUR INFRASTRUCTURE NOT X PUBLIC CLOUD (e.g. AMAZON EMR, AZURE etc.)
  • 3. About me • VP of Products at BlueData • @AnantCman on Twitter • Former Head of Hadoop Products at Pivotal • Championed Ambari at Pivotal • Introduced Hadoop at Merced Systems (now NICE Systems) Personal • Soccer dad  • Sports fan – go Niners!
  • 4. • Self-service Hadoop – what is it, why now? • Key building blocks for self-service Hadoop • Why Apache Ambari • Delivering self-service with Ambari • Demo • Q&A Talk Track
  • 5. Self-service is the need of the hour for Hadoop “……while Hadoop can handle huge data sets and make them useable, the capabilities needed to set up and run Hadoop remain scarce and expensive…..” Self-service models are proven to simplify and drive usage
  • 6. Self-service Hadoop defined Make it work the way users want to work today… Files NFS RDBMS I can access my desktop analysis / BI tool of choice Analytics /visualization idea! Point at data and analyze Self-service analytics: from idea to insights in minutes
  • 7. Self-service Hadoop defined Make it work the way users want to work today… Self-service Hadoop: from idea to infrastructure to insights in minutes I can provision my own Hadoop ‘cluster’ so I have Hive, Pig, BI tool, etc. Big Data Analytics /visualization idea! Point at data and analyze, extract insights NFS RDBMS
  • 8. Self-service Hadoop examples • Ad-hoc data exploration  can I blend this data with that data? • Fail fast experimentation  you don’t know what you don’t know • Test multiple predictive analytics models  get a dedicated sandbox • Bursty workload  your boss needs you do an analytics drill
  • 9. Without self-service Hadoop It may not work the way your users want to work today… From idea to infrastructure to insights in weeks YES NO NO Provision cluster Copy data to cluster NO Wait! Run Hadoop analytics jobs Meet … wait … email … why isn’t my cluster ready? Big Data Analytics /visualization idea! Lost business opportunity, insights no longer relevant YES YESHadoop cluster ready Is my data there? Code/q uery review
  • 10. Key building blocks for self-service Hadoop End user experience Agility, elasticity and easy access Enterprise IT Operational support and oversight Easy Access Tech Support
  • 11. Why Apache Ambari • RESTful APIs to automate provisioning of Apache Hadoop clusters – Capture basic cluster parameters from user and leverage Ambari APIs • Granular control on deployment of services (e.g. Hive, Pig) – Only deploy ‘compute’ services (e.g. Hive, BI tool) requested by user – Speeds up availability of cluster by eliminating overhead • Enterprise-grade security, management and monitoring capabilities – IT admins can support user-created clusters with familiar mgmt console
  • 12. Delivering self-service with Ambari: Your physical servers + VIRTUALIZED INFRASTRUCTURE (• Big Data VMs/Containers • Self-service web UI • Tenant/User Management • DataTap (HDFS abstraction)) = SELF-SERVICE HDP CLUSTERS (• HDP Virtual Hadoop clusters • Ambari management console • ‘Compute’ services (e.g. Hive))
  • 13. Delivering self-service with Ambari Self-service web interface – define cluster with a few mouse clicks * Example screenshot from BlueData integration with Apache Ambari
  • 14. Delivering self-service with Ambari Creating virtual Hadoop clusters within minutes * Example screenshot from BlueData integration with Apache Ambari
  • 15. Delivering self-service with Ambari Creating virtual Hadoop clusters within minutes * Example screenshot from BlueData integration with Apache Ambari
  • 16. Delivering self-service with Ambari Hadoop cluster provisioning using Ambari API Phase 1: VMs • Self-service request • VMs provisioned • Ambari server & agents pre-deployed • HDFS dependency removed Phase 2: Core Stack • Agent registration with server • REST API call to deploy HDP stack • REST API to create core-site.xml to use BlueData HDFS abstraction • Start YARN/MRv2 • Shutdown HDFS service Phase 3: Services • Add specific services requested by end user via REST API calls • Start ‘compute’ services (e.g. Hive, Pig) requested by user • Update status of cluster Design optimized for cluster creation speed and user feedback
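The ordering at the end of Phase 2 and through Phase 3 can be sketched as a dry-run plan. This is a minimal illustration, not BlueData's orchestration code: `state_body`, `service_body` and `plan` are hypothetical helper names, the paths are relative to the cluster resource, and the sketch leaves out the component, host-component and configuration calls that a real deployment also needs. It relies on Ambari's convention that setting a service's state to `INSTALLED` installs it (or stops it if it is running) and `STARTED` starts it.

```shell
#!/bin/sh
# JSON body that moves a service to a target state (INSTALLED or STARTED).
state_body() {
  printf '{"ServiceInfo":{"state":"%s"}}\n' "$1"
}

# JSON body that registers a new service with the cluster.
service_body() {
  printf '{"ServiceInfo":{"service_name":"%s"}}\n' "$1"
}

# Print the ordered calls for the end of Phase 2 and for Phase 3.
# Arguments: the 'compute' services the end user asked for.
plan() {
  echo "PUT /services/YARN $(state_body STARTED)"
  echo "PUT /services/MAPREDUCE2 $(state_body STARTED)"
  # Setting a running service to INSTALLED stops it.
  echo "PUT /services/HDFS $(state_body INSTALLED)"
  for svc in "$@"; do
    echo "POST /services $(service_body "$svc")"
    echo "PUT /services/$svc $(state_body INSTALLED)"
    echo "PUT /services/$svc $(state_body STARTED)"
  done
}

# Example: the user requested Hive only.
plan HIVE
```

Printing the plan rather than executing it keeps the sequencing visible and makes it easy to report each step back to the user as it completes.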
  • 17. Delivering self-service with Ambari REST API example to deploy specific service (Pig). Steps: Service, Configs, Install.
    curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d '{"ServiceInfo":{"service_name":"PIG"}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services
    curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG/components/PIG
    curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d@pig-env.json http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
    curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d@pig-properties.json http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
    curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d@pig-log4j.json http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
    curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-env","tag":"bluedata"}}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7
    curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-properties","tag":"bluedata"}}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7
    curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-log4j","tag":"bluedata"}}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7
    curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d '{"host_components":[{"HostRoles":{"component_name":"PIG"}}]}' 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/hosts?Hosts/host_name=bluedata-71.openstacklocal'
    curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"ServiceInfo":{"state":"INSTALLED"}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG
    curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG
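The calls on this slide repeat the same cookie jar, header and base URL, so they lend themselves to a small wrapper. The sketch below is one possible way to factor them, not the script used in the talk: `cluster_url`, `desired_config_body` and `ambari_call` are hypothetical names, the defaults are simply the host, cluster and cookie-jar path shown on the slide, and `DRY_RUN` is an invented switch that defaults to printing the request instead of sending it.

```shell
#!/bin/sh
# Defaults copied from the slide's example; override via the environment.
AMBARI_HOST="${AMBARI_HOST:-bluedata-71.openstacklocal}"
AMBARI_CLUSTER="${AMBARI_CLUSTER:-Ambari_vm7}"
COOKIE_JAR="${COOKIE_JAR:-/root/BD_Setup/cookie_jar}"

# Build the cluster-scoped API URL for a sub-path (may be empty).
cluster_url() {
  printf 'http://%s:8080/api/v1/clusters/%s%s\n' "$AMBARI_HOST" "$AMBARI_CLUSTER" "$1"
}

# Build the JSON body that selects a config type at a given tag.
desired_config_body() {
  printf '{"Clusters":{"desired_configs":{"type":"%s","tag":"%s"}}}\n' "$1" "$2"
}

# Send one request, or only print it when DRY_RUN=1 (the default here).
# Arguments: HTTP method, sub-path, optional JSON body.
ambari_call() {
  method="$1"; path="$2"; body="$3"
  url="$(cluster_url "$path")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$method $url $body"
  elif [ -n "$body" ]; then
    curl -kib "$COOKIE_JAR" -H 'X-Requested-By: ambari' -X "$method" -d "$body" "$url"
  else
    curl -kib "$COOKIE_JAR" -H 'X-Requested-By: ambari' -X "$method" "$url"
  fi
}

# Select the three Pig config types at the tag used on the slide.
for type in pig-env pig-properties pig-log4j; do
  ambari_call PUT "" "$(desired_config_body "$type" bluedata)"
done
```

Passing the JSON body as a single quoted argument matters: left unquoted on a shell command line, the double quotes inside the JSON are consumed by the shell and the server receives an invalid document.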
  • 18. Delivering self-service with Ambari Design choices and considerations • Used Apache Ambari v1.7 for this example • BlueData mgmt services orchestrate Ambari REST API calls • Ambari Blueprints used to bring up HDFS only – Post cluster creation, services added using individual REST APIs for better control – Blueprints/Stack Advisor do not provide a REST API to track intermediate progress • Used individual REST API calls with static configuration files – Could not leverage Stack Advisor for individual services
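One way individual calls make progress visible: a state-changing call can be followed by polling the resulting request resource and reading its status. The sketch below only shows the parsing side, on a canned response. The field names `request_status` and `progress_percent` are assumptions about the shape of Ambari's request resource and should be checked against the Ambari version in use; `json_string_field` and `json_number_field` are hypothetical helpers that use `sed` instead of a JSON parser, which is adequate only for simple responses like this one.

```shell
#!/bin/sh
# Extract a string-valued field from JSON on stdin (field name in $1).
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Extract a numeric field from JSON on stdin (field name in $1).
json_number_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*\([0-9.]*\).*/\1/p'
}

# Canned response standing in for a GET on a request resource (assumed shape).
sample='{"Requests":{"id":12,"request_status":"IN_PROGRESS","progress_percent":45.0}}'

status=$(printf '%s\n' "$sample" | json_string_field request_status)
percent=$(printf '%s\n' "$sample" | json_number_field progress_percent)
echo "request 12: $status ($percent%)"
```

In a real loop the canned string would be replaced by the output of a `curl` GET, repeated until the status leaves the in-progress state, with each reading pushed to the self-service UI.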
  • 20. Q&A Contact me directly at … Email: anant@bluedata.com Twitter: @AnantCman
  • 21. BlueData + Apache Ambari 1.7 Integration: Benefits and Features. Benefit: Infrastructure agility, elasticity, and efficiency – virtual HDP clusters with the functionality and performance of physical clusters. Features: • Auto-provisioning of VM hosts with Ambari server and agent components • Automated, transparent deployment of HDP using REST API for Stacks and Services. Benefit: Time savings for data scientists and Big Data administrators. Features: • Self-service virtual cluster creation by data scientists or business analysts • Troubleshooting and management by Big Data admins using Apache Ambari. Benefit: Administrator productivity & flexibility. Features: • Apache Ambari for monitoring, fine-grained configuration, and enterprise support

Editor's notes

  1. “…Skills gaps continue to be a major adoption inhibitor for 57 percent of respondents, while figuring out how to get value from Hadoop was cited by 49 percent of respondents. The absence of skills has long been a key blocker. Tooling vendors claim their products also address the skills gap. While tools are improving, they primarily support highly skilled users rather than elevate the skills already available in most enterprises.”
  2. Extends Ambari Stacks to include a “Stack Advisor”. Provides recommendations for, and performs validation on, component layout and configuration. Improves Stack pluggability. Exposes new REST endpoints: /recommendations and /validations. These REST endpoints are used during the Cluster Install Wizard and in the Configs UI.