2. This session is on self-service Hadoop for:
ON-PREMISES
IN YOUR DATA CENTER
USING YOUR INFRASTRUCTURE
NOT PUBLIC CLOUD (e.g. AMAZON EMR, AZURE, etc.)
3. About me
• VP of Products at BlueData
• @AnantCman on Twitter
• Former Head of Hadoop Products at Pivotal
• Championed Ambari at Pivotal
• Introduced Hadoop at Merced Systems (now NICE Systems)
Personal
• Soccer dad
• Sports fan – go Niners!
4. Talk Track
• Self-service Hadoop – what is it, why now?
• Key building blocks for self-service Hadoop
• Why Apache Ambari
• Delivering self-service with Ambari
• Demo
• Q&A
5. Self-service is the need of the hour for Hadoop
“……while Hadoop can handle huge data sets and make them useable, the
capabilities needed to set up and run Hadoop remain scarce and expensive…..”
Self-service models are proven to simplify and drive usage
6. Self-service Hadoop defined
Make it work the way users want to work today…
[Diagram: analytics/visualization idea → point at data (files, NFS, RDBMS) and analyze. User caption: "I can access my desktop analysis / BI tool of choice."]
Self-service analytics: from idea to insights in minutes
7. Self-service Hadoop defined
Make it work the way users want to work today…
Self-service Hadoop: from idea to infrastructure to insights in minutes
[Diagram: Big Data analytics/visualization idea → point at data (NFS, RDBMS) and analyze, extract insights. User caption: "I can provision my own Hadoop ‘cluster’ so I have Hive, Pig, BI tool, etc."]
8. Self-service Hadoop examples
• Ad-hoc data exploration: can I blend this data with that data?
• Fail-fast experimentation: you don’t know what you don’t know
• Test multiple predictive analytics models: get a dedicated sandbox
• Bursty workload: your boss needs you to do an analytics drill
9. Without self-service Hadoop
It may not work the way your users want to work today…
From idea to infrastructure to insights in weeks
[Flowchart: Big Data analytics/visualization idea → provision cluster → copy data to cluster → run Hadoop analytics jobs, gated by three yes/no checks ("Hadoop cluster ready?", "Is my data there?", "Code/query review"); each NO leads to "Wait!"]
Meet … wait … email … why isn’t my cluster ready?
Lost business opportunity, insights no longer relevant
10. Key building blocks for self-service Hadoop
End user experience
Agility, elasticity and easy access
Enterprise IT
Operational support and oversight
Easy Access
Tech Support
11. Why Apache Ambari
RESTful APIs to automate provisioning of Apache Hadoop clusters
• Capture basic cluster parameters from user and leverage Ambari APIs
Granular control on deployment of services (e.g. Hive, Pig)
• Only deploy ‘compute’ services (e.g. Hive, BI tool) requested by user
• Speeds up availability of cluster by eliminating overhead
Enterprise-grade security, management and monitoring capabilities
• IT admins can support user-created clusters with familiar mgmt console
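As a rough illustration of the first two points, the sketch below turns a user's service selection into one Ambari service-creation call per requested service. It is a dry run that only prints the curl commands. The `AMBARI_URL` default, the cluster name `demo_cluster`, and the helper names are assumptions made for this example, not values from the talk, and authentication is left out.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints curl commands instead of executing them.
# AMBARI_URL and the cluster name are placeholders (assumptions).
AMBARI_URL="${AMBARI_URL:-http://ambari.example.com:8080}"

# Print the JSON body that registers one service with a cluster.
service_payload() {
  printf '{"ServiceInfo":{"service_name":"%s"}}' "$1"
}

# Print the services collection URL for a cluster.
services_url() {
  printf '%s/api/v1/clusters/%s/services' "$AMBARI_URL" "$1"
}

# Print one curl command per service the user asked for.
# Usage: plan_services <cluster> <service>...
plan_services() {
  local cluster="$1"; shift
  local svc
  for svc in "$@"; do
    printf "curl -H 'X-Requested-By: ambari' -X POST -d '%s' '%s'\n" \
      "$(service_payload "$svc")" "$(services_url "$cluster")"
  done
}

# Example: the user ticked Hive and Pig in the self-service UI.
plan_services demo_cluster HIVE PIG
```

Only the services named on the command line produce a call, which mirrors the idea of deploying just the ‘compute’ services the user requested.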
12. Delivering self-service with Ambari
Your physical servers
+ =
VIRTUALIZED INFRASTRUCTURE
• Big Data VMs/Containers
• Self-service web UI
• Tenant/User Management
• DataTap (HDFS abstraction)
SELF-SERVICE HDP CLUSTERS
• HDP Virtual Hadoop clusters
• Ambari management console
• ‘Compute’ services (e.g. Hive)
+ =
13. Delivering self-service with Ambari
Self-service web interface – define cluster with a few mouse clicks
* Example screenshot from BlueData integration with Apache Ambari
14. Delivering self-service with Ambari
Creating virtual Hadoop clusters within minutes
* Example screenshot from BlueData integration with Apache Ambari
15. Delivering self-service with Ambari
Creating virtual Hadoop clusters within minutes
* Example screenshot from BlueData integration with Apache Ambari
16. Delivering self-service with Ambari
Hadoop cluster provisioning using Ambari API
Phase 1: VMs
• Self-service request
• VMs provisioned
• Ambari server & agents pre-deployed
• HDFS dependency removed
Phase 2: Core Stack
• Agent registration with server
• REST API call to deploy HDP stack
• REST API call to create core-site.xml to use BlueData HDFS abstraction
• Start YARN/MRv2
• Shut down HDFS service
Phase 3: Services
• Add specific services requested by end user via REST API calls
• Start ‘compute’ services (e.g. Hive, Pig) requested by user
• Update status of cluster
Design optimized for cluster creation speed and user feedback
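The start and shutdown steps in Phases 2 and 3 come down to asking Ambari to move a service to a target state. The sketch below is a dry run that prints the corresponding curl commands. `AMBARI_URL`, `demo_cluster`, and the helper names are hypothetical, authentication is omitted, and the exact order BlueData uses is not stated in the slides beyond the bullets above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints curl commands instead of executing them.
# AMBARI_URL and "demo_cluster" are placeholders (assumptions).
AMBARI_URL="${AMBARI_URL:-http://ambari.example.com:8080}"

# JSON body asking Ambari to move a service to the given state.
state_payload() {
  printf '{"ServiceInfo":{"state":"%s"}}' "$1"
}

# Print the curl command that sets <service> in <cluster> to <state>.
set_state_cmd() {
  local cluster="$1" service="$2" state="$3"
  printf "curl -H 'X-Requested-By: ambari' -X PUT -d '%s' '%s/api/v1/clusters/%s/services/%s'\n" \
    "$(state_payload "$state")" "$AMBARI_URL" "$cluster" "$service"
}

# Starting a service is requested with state STARTED.
start_service_cmd() { set_state_cmd "$1" "$2" STARTED; }
# Stopping a service is requested by setting its state back to INSTALLED.
stop_service_cmd() { set_state_cmd "$1" "$2" INSTALLED; }

# End of Phase 2: start YARN/MRv2, then shut down HDFS.
start_service_cmd demo_cluster YARN
start_service_cmd demo_cluster MAPREDUCE2
stop_service_cmd demo_cluster HDFS
# Phase 3: start a 'compute' service the user requested.
start_service_cmd demo_cluster HIVE
```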
17. Delivering self-service with Ambari
REST API example to deploy specific service (Pig)
# Service: register the PIG service and its component with the cluster
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d '{"ServiceInfo":{"service_name":"PIG"}}' 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services'
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG/components/PIG'
# Configs: upload the static config files, then make them the desired configs
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-env.json 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations'
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-properties.json 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations'
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-log4j.json 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations'
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-env","tag":"bluedata"}}}' 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7'
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-properties","tag":"bluedata"}}}' 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7'
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-log4j","tag":"bluedata"}}}' 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7'
# Install: map the component to a host, install it, then check the service
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d '{"host_components":[{"HostRoles":{"component_name":"PIG"}}]}' 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/hosts?Hosts/host_name=bluedata-71.openstacklocal'
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"ServiceInfo":{"state":"INSTALLED"}}' 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG'
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG'
18. Delivering self-service with Ambari
Design choices and considerations
• Used Apache Ambari v1.7 for this example
• BlueData mgmt services orchestrate Ambari REST API calls
• Ambari Blueprints used to bring up HDFS only
– Post cluster creation, services added using individual REST APIs for better control
– Blueprints/Stack Advisor do not provide REST API to track intermediate progress
• Used individual REST API calls with static configuration files
– Could not leverage Stack Advisor for individual services
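One way to read the "better control" point: an asynchronous Ambari call such as a service install answers with a request resource, and that resource can be polled for status and percent complete. The sketch below extracts the request id from a response body and builds the polling URL, without contacting a server. The sample response is illustrative only, and `AMBARI_URL`, `demo_cluster`, and the helper names are assumptions. The slides do not say how BlueData tracks progress.

```shell
#!/usr/bin/env bash
# Offline sketch: parses a sample response and prints the polling URL.
# AMBARI_URL and "demo_cluster" are placeholders (assumptions).
AMBARI_URL="${AMBARI_URL:-http://ambari.example.com:8080}"

# Read an Ambari response body on stdin and print the numeric request id.
request_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print the URL that reports status and percent complete for a request.
# Usage: progress_url <cluster> <request-id>
progress_url() {
  printf '%s/api/v1/clusters/%s/requests/%s?fields=Requests/request_status,Requests/progress_percent' \
    "$AMBARI_URL" "$1" "$2"
}

# Illustrative response body for an accepted asynchronous call.
sample_response='{
  "href" : "http://ambari.example.com:8080/api/v1/clusters/demo_cluster/requests/12",
  "Requests" : {
    "id" : 12,
    "status" : "InProgress"
  }
}'

id="$(printf '%s\n' "$sample_response" | request_id)"
echo "poll: $(progress_url demo_cluster "$id")"
```

A management layer could poll that URL after each individual call and pass the percentage on to the end user, which fits the stated goal of user feedback during cluster creation.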
21. BlueData + Apache Ambari 1.7 Integration
Benefit: Infrastructure agility, elasticity, and efficiency – virtual HDP clusters with the functionality and performance of physical clusters
Features:
• Auto-provisioning of VM hosts with Ambari server and agent components
• Automated, transparent deployment of HDP using REST API for Stacks and Services
Benefit: Time savings for data scientists and Big Data administrators
Features:
• Self-service virtual cluster creation by data scientists or business analysts
• Troubleshooting and management by Big Data admins using Apache Ambari
Benefit: Administrator productivity & flexibility
Features:
• Apache Ambari for monitoring, fine-grained configuration, and enterprise support
Editor's notes
“……Skills gaps continue to be a major adoption inhibitor for 57 percent of respondents, while figuring out how to get value from Hadoop was cited by 49 percent of respondents. The absence of skills has long been a key blocker.
Tooling vendors claim their products also address the skills gap. While tools are improving, they primarily support highly skilled users rather than elevate the skills already available in most enterprises.
Extends Ambari Stacks to include a “Stack Advisor”
Provides recommendations for and performs validation on component layout & configuration
Improves Stack pluggability
Exposes new REST endpoints:
/recommendations
/validations
REST endpoints used during Cluster Install Wizard and Configs UI