SlideShare una empresa de Scribd logo
1 © Hortonworks Inc. 2011–2018. All rights reserved
Hadoop Operations –
Past, Present, and Future
Santhosh B Gowda
Feb 2019
2 © Hortonworks Inc. 2011–2018. All rights reserved
Disclaimer
This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software
Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from
inception to release through Apache, however, technical feasibility, market demand, user
feedback and the overarching Apache Software Foundation community development process can
all effect timing and final delivery.
This document’s description of these features and technology directions does not represent a
contractual commitment, promise or obligation from Hortonworks to deliver these features in
any generally available product.
Product features and technology directions are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should
not rely upon it when making purchasing decisions.
3 © Hortonworks Inc. 2011–2018. All rights reserved
Agenda
• Hadoop Operations: Ambari
• Hadoop Operations: Data Challenge
• Cloud Key Considerations
• Cloudbreak
• What is Cloudbreak ?
• Custom Images
• Kerberos Security
• Recipes
• Auto Scaling
4 © Hortonworks Inc. 2011–2018. All rights reserved
What Is Apache Ambari?
A completely open source
management platform for
provisioning, managing,
monitoring and securing
Apache Hadoop clusters.
Apache Ambari takes the
guesswork out of operating
Hadoop.
5 © Hortonworks Inc. 2011–2018. All rights reserved
Hadoop Operations - Ambari
Simplified Installation,
Configuration and Management
Centralized Security Setup
Full Visibility into Cluster Health
Highly Extensible and
Customizable
• Wizard-driven and automated cluster provisioning
• Smart Configurations and Cluster Recommendations
• Automated Rolling and Express cluster upgrades
• Reduce complexity to administer security across the
platform
• Automate setup Kerberos
• Simplify the configuration of Apache Ranger
• Predefined alerts based on operational best practices
• Advanced metrics visualization with Grafana
• Integrated with SmartSense for proactive issues prevention
• Seamlessly fit into your enterprise environment
• Bring custom Services under management via Ambari
Stacks
• Customize the UI with Ambari Views
6 © Hortonworks Inc. 2011–2018. All rights reserved
Early Adopters
Ambari
HDFS
Atlas, Ranger,
Metastore, Knox
Hive Spark
YARN
10101
10101010101
010101010101010
Public Cloud Storage
Public Cloud
Compute
Large Shared Workloads, supported by Shared
Services, On-Premise
Ambari
HDFS
Atlas, Ranger,
Metastore, Knox
Hive Spark
YARN
10101
10101010101
01010101010101
01010101010101010
10
Long-Running Cluster on Cloud IaaS
10101
10101010101
01010101010101
0101010101010101010
7 © Hortonworks Inc. 2011–2018. All rights reserved
Hadoop Operations: Data Challenge
• Data is becoming more and more distributed…
• Across data center and cloud environments…
• Accessed using multi- and single-workload clusters…
• But must be discoverable and accessed by all who seek it.
Cluster
Cluster
Cluster
Cluster
ClusterCluster ClusterClusterClusterClusterCluster ClusterClusterCluster
DATA
CENTER CLOUDS
The Virtual Data Lake
Business User
Very difficult to find data
(leading to inefficient use of time)
Platform Operator
Hard to secure and hard to
operate (can be time consuming
and prone to error)
8 © Hortonworks Inc. 2011–2018. All rights reserved
Cloud: Key Considerations
• Cloud is infrastructure… need a Data Strategy
• Hybrid (on-premise & cloud) requirements are real.
• Multi-Cloud (i.e. portability) is a key emerging requirement
• Logistics & Physics
• Regulatory & Compliance
• Economic arbitrage
• Consistent and familiar Security & Governance across on-premise & cloud environments
• Free movement of data, regardless of origin or destination
• Global data catalog, regardless of location
9 © Hortonworks Inc. 2011–2018. All rights reserved
Data Management across On-Prem & Multi-Cloud
Large Shared Workloads, supported by
Shared Services, On-Premise
Ambari
HDFS
Atlas, Ranger,
Metastore, Knox
Hive Spark
YARN
10101
10101010101
01010101010101
010101010101010
Multiple Ephemeral Workloads,
supported by Shared Services, Multi-
Cloud.
Hortonworks DataPlane Service
Public Cloud A
Storage
Public Cloud A
Compute
Atlas, Ranger, Metastore, Knox
Hive LLAP
Ambari Ambari Ambari
NiFi
Spark
Cloudbreak
YARN YARN
Public Cloud B
Storage
Public Cloud B
Compute
Atlas, Ranger, Metastore, Knox
Hive LLAP
Ambari Ambari Ambari
NiFi
Spark
Cloudbreak
YARN YARN
Multiple Ephemeral Workloads,
supported by Shared Services, Multi-
Cloud.
10101
10101010101
01010101010101
010101010101010
1010
10101
10101010101
01010101010101
010101010101010
1010
10 © Hortonworks Inc. 2011–2018. All rights reserved
Hortonworks: Architecting and Optimizing for the Cloud
CLOUD STORAGE WORKLOADS
Durable Ephemeral
When data resides in cloud object
stores (e.g. Amazon S3), Hadoop
optimizes reads/writes and acts as
an intermediate cache to increase
performance and decrease latency.
Metastore
SCHEMA
Long Running
Security access to workload
clusters via a Protected Gateway
enabled for AuthN and HTTPS.
Define your data schema, security
policies, and metadata catalog
once for your ephemeral and
always-on workloads.
Atlas
CATALOG
Ranger
POLICY
SHARED DATA LAKE SERVICES
11 © Hortonworks Inc. 2011–2018. All rights reserved
Cloudbreak
12 © Hortonworks Inc. 2011–2018. All rights reserved
What Is Cloudbreak ?
Cloudbreak is a tool for provisioning Hadoop
clusters on any cloud infrastructure
Simplified cluster provisioning - prescriptive
setup, simple automation
13 © Hortonworks Inc. 2011–2018. All rights reserved
Cloudbreak: Harness the Agility of Cloud with Ease
Cloudbreak
• Declarative workload
provisioning across
multiple cloud providers
• Flexible topologies and
security configuration
options
• DevOps friendly, easy setup
and simple to automate
• Built-in elasticity and auto-
scaling
• Prescriptive integration
with cloud services
14 © Hortonworks Inc. 2011–2018. All rights reserved
Cloudbreak Building Blocks
• Cloud Credentials
• Ambari Blueprints
• Auto Scaling
• Custom Recipes
• Custom Images
• Network
• Gateway
• Kerberos Security
• Dynamic Blueprints
• Cloud Storage
Simple and Flexible Prescriptive Secure
15 © Hortonworks Inc. 2011–2018. All rights reserved
Custom Images
16 © Hortonworks Inc. 2011–2018. All rights reserved
Background: Cloudbreak
1. Cloudbreak creates VM instances using a default base image.
2. Cloudbreak installs Ambari on a VM instance.
3. Cloudbreak instructs Ambari to install a cluster on the remaining VM instances.
Cloudbreak
Node
VM
Node
VM
Node
VM
Node
VM
Node
VM
Node
VM
Cluster
17 © Hortonworks Inc. 2011–2018. All rights reserved
Custom Images Overview
Create the
Custom Image
Register the
Custom Image
Use the
Custom Image
when Creating
a Cluster
1 2 3
18 © Hortonworks Inc. 2011–2018. All rights reserved
Recipes
19 © Hortonworks Inc. 2011–2018. All rights reserved
Background: Recipes
• Cloudbreak lets you provision cluster using Ambari Blueprint however not all use-cases
can be addressed.
• Install additional software.
• System config changes.
• A recipe is a script that runs on all nodes of a selected node group at a specific time.
• Support for bash and python scripts.
• Available hooks
• Pre-ambari-start
• Post-ambari-start
• Post-cluster-install
• Pre-termination
20 © Hortonworks Inc. 2011–2018. All rights reserved
Cloudbreak: Add Recipes
• Cluster Extensions > Recipes > Create
• Add recipe as File, Url or Text
21 © Hortonworks Inc. 2011–2018. All rights reserved
Cloudbreak: Add Recipes
• Clusters > Create Cluster >
Cluster Extensions
22 © Hortonworks Inc. 2011–2018. All rights reserved
Kerberos Security
23 © Hortonworks Inc. 2011–2018. All rights reserved
Background: Kerberos
• Strongly authenticating and establishing a user’s identity is the basis for secure access in
Hadoop. Users need to be able to reliably “identify” themselves and then have that
identity propagated throughout the Hadoop cluster.
• Once this is done, those users can access resources (such as files or directories) or
interact with the cluster (like running MapReduce jobs).
• Besides users, Hadoop cluster resources themselves (such as Hosts and Services) need
to authenticate with each other to avoid potential malicious systems or daemon’s
“posing as” trusted components of the cluster to gain access to data.
25 © Hortonworks Inc. 2011–2018. All rights reserved
Cloudbreak: Enable Kerberos Security
• Create Cluster > Security > Advanced
• [ ] Enable Kerberos Security
26 © Hortonworks Inc. 2011–2018. All rights reserved
Options: Use Existing KDC or Use Test KDC
Use Existing
KDC
Use Test KDC
Advanced
Basic
- Not for production use. For testing and
evaluation purposes only.
- Installs and configures an MIT KDC on the
master node.
- Configures the cluster to leverage that
KDC.
- Provide basic information
about your existing KDC.
- Ambari Kerberos descriptors
are generated automatically.
- Provide basic information
about your existing KDC.
- Provide your own Ambari
Kerberos descriptors.
27 © Hortonworks Inc. 2011–2018. All rights reserved
Auto Scaling
28 © Hortonworks Inc. 2011–2018. All rights reserved
Auto-Scaling
• Alerts: Create metric or time-based alerts for cluster scaling
• Policies: Scaling policies adjust cluster size based on activity and workload alerts
• General Configurations: Boundaries and cooldown period
29 © Hortonworks Inc. 2011–2018. All rights reserved
Auto-Scaling Time-Based Alert
Fire at 10:15 am everyday
30 © Hortonworks Inc. 2011–2018. All rights reserved
Auto-Scaling Metric-Based Alert
Fire after NodeManagers are
CRITICAL for 10 minutes
31 © Hortonworks Inc. 2011–2018. All rights reserved
Auto-Scaling Policies
• Define the Scale Adjustment (Node Count, Percentage, Exact)
• Select the Host Group (to Scale)
• Select Alert (which when fired, executes the Policy)
37 © Hortonworks Inc. 2011–2018. All rights reserved
Learn More
• Try Ambari
• https://docs.hortonworks.co
m/HDPDocuments/Ambari/A
mbari-2.7.0.0/index.html
• Try Cloudbreak 2.8 (TP)
• https://docs.hortonworks.co
m/HDPDocuments/Cloudbre
ak/Cloudbreak-
2.8.0/index.html
38 © Hortonworks Inc. 2011–2018. All rights reserved
Questions ?
39 © Hortonworks Inc. 2011–2018. All rights reserved
Thank you !

Más contenido relacionado

La actualidad más candente

Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Hortonworks
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Hortonworks
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Hortonworks
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
Hortonworks
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache Hadoop
Hortonworks
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
Hortonworks
 
The Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricThe Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data Centric
DataWorks Summit
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Hortonworks
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
Hortonworks
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Hortonworks
 
Machine Learning Everywhere
Machine Learning EverywhereMachine Learning Everywhere
Machine Learning Everywhere
DataWorks Summit
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
Hortonworks
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
DataWorks Summit
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
Cécile Poyet
 
Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...
DataWorks Summit
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
DataWorks Summit/Hadoop Summit
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
Hortonworks
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Hortonworks
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical Enterprise
Hortonworks
 

La actualidad más candente (20)

Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache Hadoop
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
 
The Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data CentricThe Car of the Future - Autonomous, Connected, and Data Centric
The Car of the Future - Autonomous, Connected, and Data Centric
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
Machine Learning Everywhere
Machine Learning EverywhereMachine Learning Everywhere
Machine Learning Everywhere
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
 
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...Designing data pipelines for analytics and machine learning in industrial set...
Designing data pipelines for analytics and machine learning in industrial set...
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical Enterprise
 

Similar a Hadoop Operations – Past, Present, and Future

Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
DataWorks Summit
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
DataWorks Summit
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
DataWorks Summit
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARN
DataWorks Summit
 
Lessons Learned Running a Container Cloud on Apache Hadoop YARN
Lessons Learned Running a Container Cloud on Apache Hadoop YARNLessons Learned Running a Container Cloud on Apache Hadoop YARN
Lessons Learned Running a Container Cloud on Apache Hadoop YARN
Billie Rinaldi
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
Sean Roberts
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
DataWorks Summit
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
Shivaji Dutta
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
DataWorks Summit
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
DataWorks Summit
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Hortonworks
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
Shravan (Sean) Pabba
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
DataWorks Summit
 
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
DataWorks Summit
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 

Similar a Hadoop Operations – Past, Present, and Future (20)

Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARN
 
Lessons Learned Running a Container Cloud on Apache Hadoop YARN
Lessons Learned Running a Container Cloud on Apache Hadoop YARNLessons Learned Running a Container Cloud on Apache Hadoop YARN
Lessons Learned Running a Container Cloud on Apache Hadoop YARN
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 

Más de DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
Federico Razzoli
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 

Último (20)

HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 

Hadoop Operations – Past, Present, and Future

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved Hadoop Operations – Past, Present, and Future Santhosh B Gowda Feb 2019
  • 2. 2 © Hortonworks Inc. 2011–2018. All rights reserved Disclaimer This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache, however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all effect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  • 3. 3 © Hortonworks Inc. 2011–2018. All rights reserved Agenda • Hadoop Operations: Ambari • Hadoop Operations: Data Challenge • Cloud Key Considerations • Cloudbreak • What is Cloudbreak ? • Custom Images • Kerberos Security • Recipes • Auto Scaling
  • 4. 4 © Hortonworks Inc. 2011–2018. All rights reserved What Is Apache Ambari? A completely open source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. Apache Ambari takes the guesswork out of operating Hadoop.
  • 5. 5 © Hortonworks Inc. 2011–2018. All rights reserved Hadoop Operations - Ambari Simplified Installation, Configuration and Management Centralized Security Setup Full Visibility into Cluster Health Highly Extensible and Customizable • Wizard-driven and automated cluster provisioning • Smart Configurations and Cluster Recommendations • Automated Rolling and Express cluster upgrades • Reduce complexity to administer security across the platform • Automate setup Kerberos • Simplify the configuration of Apache Ranger • Predefined alerts based on operational best practices • Advanced metrics visualization with Grafana • Integrated with SmartSense for proactive issues prevention • Seamlessly fit into your enterprise environment • Bring custom Services under management via Ambari Stacks • Customize the UI with Ambari Views
  • 6. 6 © Hortonworks Inc. 2011–2018. All rights reserved Early Adopters Ambari HDFS Atlas, Ranger, Metastore, Knox Hive Spark YARN 10101 10101010101 010101010101010 Public Cloud Storage Public Cloud Compute Large Shared Workloads, supported by Shared Services, On-Premise Ambari HDFS Atlas, Ranger, Metastore, Knox Hive Spark YARN 10101 10101010101 01010101010101 01010101010101010 10 Long-Running Cluster on Cloud IaaS 10101 10101010101 01010101010101 0101010101010101010
  • 7. 7 © Hortonworks Inc. 2011–2018. All rights reserved Hadoop Operations: Data Challenge • Data is becoming more and more distributed… • Across data center and cloud environments… • Accessed using multi- and single-workload clusters… • But must be discoverable and accessed by all who seek it. Cluster Cluster Cluster Cluster ClusterCluster ClusterClusterClusterClusterCluster ClusterClusterCluster DATA CENTER CLOUDS The Virtual Data Lake Business User Very difficult to find data (leading to inefficient use of time) Platform Operator Hard to secure and hard to operate (can be time consuming and prone to error)
  • 8. 8 © Hortonworks Inc. 2011–2018. All rights reserved Cloud: Key Considerations • Cloud is infrastructure… need a Data Strategy • Hybrid (on-premise & cloud) requirements are real. • Multi-Cloud (i.e. portability) is a key emerging requirement • Logistics & Physics • Regulatory & Compliance • Economic arbitrage • Consistent and familiar Security & Governance across on-premise & cloud environments • Free movement of data, regardless of origin or destination • Global data catalog, regardless of location
  • 9. 9 © Hortonworks Inc. 2011–2018. All rights reserved Data Management across On-Prem & Multi-Cloud Large Shared Workloads, supported by Shared Services, On-Premise Ambari HDFS Atlas, Ranger, Metastore, Knox Hive Spark YARN 10101 10101010101 01010101010101 010101010101010 Multiple Ephemeral Workloads, supported by Shared Services, Multi- Cloud. Hortonworks DataPlane Service Public Cloud A Storage Public Cloud A Compute Atlas, Ranger, Metastore, Knox Hive LLAP Ambari Ambari Ambari NiFi Spark Cloudbreak YARN YARN Public Cloud B Storage Public Cloud B Compute Atlas, Ranger, Metastore, Knox Hive LLAP Ambari Ambari Ambari NiFi Spark Cloudbreak YARN YARN Multiple Ephemeral Workloads, supported by Shared Services, Multi- Cloud. 10101 10101010101 01010101010101 010101010101010 1010 10101 10101010101 01010101010101 010101010101010 1010
  • 10. 10 © Hortonworks Inc. 2011–2018. All rights reserved Hortonworks: Architecting and Optimizing for the Cloud CLOUD STORAGE WORKLOADS Durable Ephemeral When data resides in cloud object stores (e.g. Amazon S3), Hadoop optimizes reads/writes and acts as an intermediate cache to increase performance and decrease latency. Metastore SCHEMA Long Running Security access to workload clusters via a Protected Gateway enabled for AuthN and HTTPS. Define your data schema, security policies, and metadata catalog once for your ephemeral and always-on workloads. Atlas CATALOG Ranger POLICY SHARED DATA LAKE SERVICES
  • 11. 11 © Hortonworks Inc. 2011–2018. All rights reserved Cloudbreak
  • 12. 12 © Hortonworks Inc. 2011–2018. All rights reserved What Is Cloudbreak ? Cloudbreak is a tool for provisioning Hadoop clusters on any cloud infrastructure Simplified cluster provisioning - prescriptive setup, simple automation
  • 13. 13 © Hortonworks Inc. 2011–2018. All rights reserved Cloudbreak: Harness the Agility of Cloud with Ease Cloudbreak • Declarative workload provisioning across multiple cloud providers • Flexible topologies and security configuration options • DevOps friendly, easy setup and simple to automate • Built-in elasticity and auto- scaling • Prescriptive integration with cloud services
  • 14. 14 © Hortonworks Inc. 2011–2018. All rights reserved Cloudbreak Building Blocks • Cloud Credentials • Ambari Blueprints • Auto Scaling • Custom Recipes • Custom Images • Network • Gateway • Kerberos Security • Dynamic Blueprints • Cloud Storage Simple and Flexible Prescriptive Secure
  • 15. 15 © Hortonworks Inc. 2011–2018. All rights reserved Custom Images
  • 16. 16 © Hortonworks Inc. 2011–2018. All rights reserved Background: Cloudbreak 1. Cloudbreak creates VM instances using a default base image. 2. Cloudbreak installs Ambari on a VM instance. 3. Cloudbreak instructs Ambari to install a cluster on the remaining VM instances. Cloudbreak Node VM Node VM Node VM Node VM Node VM Node VM Cluster
  • 17. 17 © Hortonworks Inc. 2011–2018. All rights reserved Custom Images Overview Create the Custom Image Register the Custom Image Use the Custom Image when Creating a Cluster 1 2 3
  • 18. 18 © Hortonworks Inc. 2011–2018. All rights reserved Recipes
  • 19. 19 © Hortonworks Inc. 2011–2018. All rights reserved Background: Recipes • Cloudbreak lets you provision cluster using Ambari Blueprint however not all use-cases can be addressed. • Install additional software. • System config changes. • A recipe is a script that runs on all nodes of a selected node group at a specific time. • Support for bash and python scripts. • Available hooks • Pre-ambari-start • Post-ambari-start • Post-cluster-install • Pre-termination
  • 20. 20 © Hortonworks Inc. 2011–2018. All rights reserved Cloudbreak: Add Recipes • Cluster Extensions > Recipes > Create • Add recipe as File, Url or Text
  • 21. 21 © Hortonworks Inc. 2011–2018. All rights reserved Cloudbreak: Add Recipes • Clusters > Create Cluster > Cluster Extensions
  • 22. 22 © Hortonworks Inc. 2011–2018. All rights reserved Kerberos Security
  • 23. 23 © Hortonworks Inc. 2011–2018. All rights reserved Background: Kerberos • Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop. Users need to be able to reliably “identify” themselves and then have that identity propagated throughout the Hadoop cluster. • Once this is done, those users can access resources (such as files or directories) or interact with the cluster (like running MapReduce jobs). • Besides users, Hadoop cluster resources themselves (such as Hosts and Services) need to authenticate with each other to avoid potential malicious systems or daemon’s “posing as” trusted components of the cluster to gain access to data.
  • 24. 25 © Hortonworks Inc. 2011–2018. All rights reserved Cloudbreak: Enable Kerberos Security • Create Cluster > Security > Advanced • [ ] Enable Kerberos Security
  • 25. 26 © Hortonworks Inc. 2011–2018. All rights reserved Options: Use Existing KDC or Use Test KDC Use Existing KDC Use Test KDC Advanced Basic - Not for production use. For testing and evaluation purposes only. - Installs and configures an MIT KDC on the master node. - Configures the cluster to leverage that KDC. - Provide basic information about your existing KDC. - Ambari Kerberos descriptors are generated automatically. - Provide basic information about your existing KDC. - Provide your own Ambari Kerberos descriptors.
  • 26. 27 © Hortonworks Inc. 2011–2018. All rights reserved Auto Scaling
  • 27. 28 © Hortonworks Inc. 2011–2018. All rights reserved Auto-Scaling • Alerts: Create metric or time-based alerts for cluster scaling • Policies: Scaling policies adjust cluster size based on activity and workload alerts • General Configurations: Boundaries and cooldown period
  • 28. 29 © Hortonworks Inc. 2011–2018. All rights reserved Auto-Scaling Time-Based Alert Fire at 10:15 am everyday
  • 29. 30 © Hortonworks Inc. 2011–2018. All rights reserved Auto-Scaling Metric-Based Alert Fire after NodeManagers are CRITICAL for 10 minutes
  • 30. 31 © Hortonworks Inc. 2011–2018. All rights reserved Auto-Scaling Policies • Define the Scale Adjustment (Node Count, Percentage, Exact) • Select the Host Group (to Scale) • Select Alert (which when fired, executes the Policy)
  • 31. 37 © Hortonworks Inc. 2011–2018. All rights reserved Learn More • Try Ambari • https://docs.hortonworks.co m/HDPDocuments/Ambari/A mbari-2.7.0.0/index.html • Try Cloudbreak 2.8 (TP) • https://docs.hortonworks.co m/HDPDocuments/Cloudbre ak/Cloudbreak- 2.8.0/index.html
  • 32. 38 © Hortonworks Inc. 2011–2018. All rights reserved Questions ?
  • 33. 39 © Hortonworks Inc. 2011–2018. All rights reserved Thank you !