Partha Seetala
CTO, Robin Systems
Containerized Hadoop beyond Kubernetes
Who am I?
CTO of Robin Systems; before that, Distinguished Engineer at Veritas/Symantec
We have solved some fundamental problems to enable containers and Kubernetes for running complex Big Data, NoSQL, Database and AI/ML workloads
Robin is The Kubernetes platform for big data, databases and AI/ML
SAMPLE CUSTOMER DEPLOYMENTS
› 11 billion security events ingested and analyzed a day (Elasticsearch, Logstash, Kibana, Kafka)
› 6 Petabytes under active management in a single Robin cluster (Cloudera, Impala, Kafka, Druid)
› 400 Oracle RAC databases managed by a single Robin cluster (Oracle, Oracle RAC)
Why containerize Big Data, NoSQL, and Databases?
Why containerize: the developer's perspective
1. Developers dislike opening an IT ticket and waiting weeks for their apps to be ready for use
They want their apps to be available now
2. They want to experiment with tools, but dislike the complexity of setting them up
For example, which of RDBMS, NoSQL, DocumentDB, or GraphDB is better for the app?
3. They want to run apps where it makes the most sense
Their laptop, on-prem datacenter, public cloud, etc.
Containers offer deployment agility and infrastructure independence
Why containerize: infrastructure perspective?
40%
Resource utilization is pretty low on Big Data clusters
Utilization worsens with every hardware refresh

          4 years ago   Today
CPU       20 Cores      40 Cores
Memory    128 GiB       512 GiB
Storage   48 TiB        144 TiB
Network   2x10 GbE      2x40 GbE

Additional server configurations shown:
› CPU 24 Cores, Memory 256 GiB, Storage 540 TiB, Network 4x40 GbE
› CPU 36 Cores, Memory 768 GiB, Storage 122 TiB, Network 2x100 GbE, GPU 8x NVIDIA V100

Price points shown: $34 K, $141 K, $25 K
Modern hardware offers a lot more resources per rack unit, which must be kept busy to realize ROI
Containers allow you to maximize infrastructure utilization
Why Containers, not Virtualization?
› Get the benefits of virtualization without any of its overhead
› Containers run applications directly on baremetal without virtualizing hardware
› A resource given to a hypervisor is a resource that is taken away from your Big Data application
› Applications are being packaged and shipped as container images, not VMs
› You must leverage and adapt to this shift in application packaging
› Containers avoid the need for specialized storage stacks for deduplicating VM images
What are the challenges with containerizing Big Data, NoSQL and Databases?
Challenges with containers
Incomplete cgroups virtualization causes many Big Data applications and databases to misbehave
CPU:
› Contiguous core IDs, CPU ID mapping (Kudu), accurate threads:cores mapping (DB)
› NUMA-aware assignment (HANA)
Memory:
› JVM sees the entire host memory even if you cap the memory for the container (any JVM app)
› Memory allocation inconsistencies (hugepages, shared page cache) (Oracle)
Storage:
› Apps that need raw block devices need correct WWN management (e.g., Oracle, MapR)
› The blkio cgroup setting is ineffective at avoiding noisy-neighbor problems (all apps)
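To illustrate the memory point above: a process that sizes itself from host-wide values such as /proc/meminfo can overshoot its container's cap. The following is a minimal sketch, not Robin's implementation, of deriving the memory an application should size itself against. The function names are hypothetical, and it assumes the caller passes in the text of /proc/meminfo and of the cgroup limit file (cgroup v2 `memory.max` or cgroup v1 `memory.limit_in_bytes`).

```python
def parse_meminfo_total(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo-style text, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       16384256 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def effective_memory_bytes(meminfo_text: str, cgroup_limit_text: str) -> int:
    """Memory a containerized process should size itself against.

    Hypothetical helper: takes the smaller of the host total and the
    cgroup limit, treating "no limit" as the host total.
    """
    host_total = parse_meminfo_total(meminfo_text)
    raw = cgroup_limit_text.strip()
    if raw == "max":
        # cgroup v2 writes the literal string "max" when no limit is set.
        return host_total
    # cgroup v1 reports a very large number when unlimited, so clamp to host.
    return min(int(raw), host_total)
```

On a real host the two inputs would typically be read from `/proc/meminfo` and `/sys/fs/cgroup/memory.max` (or `/sys/fs/cgroup/memory/memory.limit_in_bytes` under cgroup v1); keeping the parsing separate from the file reads makes the logic easy to check.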
Confidential – Restricted Distribution
Challenges with container orchestration platforms
Very opinionated and architected with a microservices-oriented philosophy
› Expects that apps can be brought up trivially within milliseconds during crash recovery
› Scales by adding more containers and registering them with a load balancer to spread load around
› Recommends modeling your app as a collection of stateless containers, each serving a single service
But you are dealing with applications that have decades of built-in assumptions
› Big Data applications and databases are not written as microservices
› You can’t stop and restart them rapidly
› You have to worry about both storage and network state to ensure high availability
› There is significant investment in custom scripting that assumes SSH access to the hosts running the apps
Storage and Networking challenges
2018 CNCF survey says Storage and Networking are the biggest challenges in Kubernetes
https://www.cncf.io/blog/2017/06/28/survey-shows-kubernetes-leading-orchestration-platform
› Latest 2018 CNCF: 48% say Storage is a big challenge, 44% say Networking is a challenge in Kubernetes
› There are 27 Storage vendors and 21 Network vendors providing Storage & Networking solutions for containers and Kubernetes [1]
[1] https://github.com/cncf/landscape
Despite so many vendor solutions, why is it still a challenge for so many people?
Operational challenges to overcome
Storage
› Unpredictable performance when consolidating Big Data and Database apps
› Data locality requirements (both performance and datacenter network bandwidth constraints)
› Anti/affinity and isolation constraints
Networking
› Services running inside K8S are often consumed by applications running in different L3 subnets
› Putting a load balancer between apps and services is unnecessary and less performant for most Big Data, NoSQL, and Database applications
› 90% of the apps used in real life require their IP address to be preserved across restarts
Spending time setting up Storage and Networking is a drag on user productivity
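The data-locality and anti/affinity constraints listed above can be made concrete with a small placement routine. This is a minimal sketch under stated assumptions, not how Kubernetes or Robin schedules: the function name, the node-to-rack mapping, and the "preferred nodes first" rule for locality are all hypothetical.

```python
from typing import Dict, List, Optional


def place_with_rack_anti_affinity(
    node_racks: Dict[str, str],
    replicas: int,
    preferred: Optional[List[str]] = None,
) -> List[str]:
    """Pick `replicas` nodes so that no two chosen nodes share a rack.

    `preferred` lists nodes that already hold the data (locality hint);
    they are considered first, then the remaining nodes in name order.
    """
    if replicas <= 0:
        return []
    preferred = preferred or []
    ordered = [n for n in preferred if n in node_racks]
    ordered += sorted(n for n in node_racks if n not in ordered)

    chosen: List[str] = []
    used_racks = set()
    for node in ordered:
        rack = node_racks[node]
        if rack in used_racks:
            continue  # anti-affinity: at most one replica per rack
        chosen.append(node)
        used_racks.add(rack)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough racks to satisfy anti-affinity")
```

For example, with nodes `n1` and `n2` in rack `r1`, `n3` in `r2`, and `n4` in `r3`, asking for three replicas with `n2` preferred selects `n2`, `n3`, and `n4`; asking for four fails because only three racks exist.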
Don’t miss the forest for the trees
Layers: Users, Applications, Infrastructure
Most vendors are looking at the problem in this direction (starting from Infrastructure):
› Focus is on Infra components: StatefulSets, Deployments, Persistent Volume Claims, Services, CSI, CNI, HPA
Whereas we should be looking at it in this direction (starting from Users):
› Focus on User-App Interaction
› Let apps drive infrastructure to meet user requirements
Can Kubernetes alone get us to the promised land?
› MANAGEMENT (kubectl, helm)
› SERVICES (Ingress, Proxy, LB)
› STORAGE (CSI)
› NETWORKING (CNI, Overlay)
› MONITORING (Heapster, HPA)
› CONFIGURATION (ConfigMap, Secrets)
› UI
› SERVER INFRA (Baremetal, On-prem VM, AWS, Azure, GCP)
› DATA (PV, PVC)
› TROUBLESHOOTING (Logging, Events)
› CONTAINERS (docker, LXC)
Time to reframe our thinking
Let applications drive infrastructure to meet user requirements
(in this model application workflows configure Kubernetes, Networking and Storage)
Robin is The Kubernetes platform for big data, databases and AI/ML
Docker, LXC, Kubernetes + Integrated App-aware Storage + Integrated Networking + Application-aware Workflow Manager
Application workflows configure Kubernetes, Networking and Storage
When you elevate your thinking to Applications
You do less of this
› Deployments, ReplicaSets, and StatefulSets
› Persistent Volumes and Claims
› Service endpoints, proxy
› Ingress and Egress routes
› Secrets and Configmaps
› Heapster, CSI, CNI, and CRI
And do more of this
› Time-travel application states
› Clone entire Applications with their data
› Backup and restore entire apps, any app
› Upgrade applications in a failsafe manner
› Control QoS of apps to meet performance SLAs
› Make applications and data mobile across clouds
Give your users a managed service experience
› SPECIFY COMPUTE
› SPECIFY STORAGE
› SPECIFY SCALE
› ENABLE SERVICE COMPONENTS
› SPECIFY DATA-LOCALITY, ANTI/AFFINITY CONSTRAINTS AND PLACEMENT HINTS
Just minutes from click to use
› A 64-node Hadoop cluster with 1408 CPU cores, 4.5 TB of memory, and 1.5 PB of storage takes just 23 minutes
› Services enabled: Atlas, Spark, Hive, Kerberos, Sentry, HDFS, NameNode HA
› K8S components auto-created (StatefulSets, PVC, Services, …)
› Data-locality, anti/affinity policies enforced
› Any Big Data, NoSQL, Database, AI/ML app
Adjust resources to meet changing priorities
› Application priorities change with time
› Faster ingest during daytime
› Faster querying for end-of-quarter reporting
› Trade resources between adjacent applications dynamically
› Adjust CPU, Memory, GPU, Network and IOPS dynamically
› Scaling resources vs scaling entire service
› K8S’ Horizontal Pod Autoscaler (HPA) is not suitable for data applications:
› Works by adding more Pods to scale horizontally
› Great for stateless apps
› Not so good for Big Data, NoSQL and Databases
› Results in data rebalancing, which is costly and permanent. Scaling down is very hard.
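To see why adding nodes to a stateful service triggers rebalancing, consider a deliberately simple model. This sketch assumes data is placed by hashing each key modulo the node count; that placement rule and the function names are assumptions for illustration only, and real systems use other schemes that move less data.

```python
from typing import Iterable


def owner(key_hash: int, node_count: int) -> int:
    """Node index that owns a key under modulo placement (assumed scheme)."""
    return key_hash % node_count


def fraction_moved(key_hashes: Iterable[int], old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose owning node changes when the cluster is resized."""
    hashes = list(key_hashes)
    moved = sum(1 for h in hashes if owner(h, old_nodes) != owner(h, new_nodes))
    return moved / len(hashes)


if __name__ == "__main__":
    # Growing from 4 to 5 nodes relocates most of the data in this model.
    print(fraction_moved(range(1000), 4, 5))  # 0.8
```

In this model a single extra node moves 80% of the keys, which is the kind of cost a stateless service never pays when a Pod is added behind a load balancer.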
(Diagram: Kafka, Hadoop1, Hadoop2, and Druid sharing a cluster over a day; time axis: 12 AM, 8 AM, 3 PM, 11 PM)
› Assign each app its own resource quota (CPU, Mem, IOPS)
› Shift resources from Hadoop2 to Hadoop1 with 1-Click
› Shift resources from Hadoop1 to Hadoop2 with 1-Click
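Shifting resources between two applications, as opposed to adding Pods, can be modeled as moving quota from one app to another while the cluster total stays constant. The sketch below is hypothetical: the `Quota` type, the `shift` function, and the numbers in the usage note are illustrative and do not reflect Robin's API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Quota:
    cpu_cores: int
    memory_gib: int
    iops: int


def shift(
    quotas: Dict[str, Quota],
    src: str,
    dst: str,
    cpu_cores: int = 0,
    memory_gib: int = 0,
    iops: int = 0,
) -> None:
    """Move quota from `src` to `dst` in place; the cluster total is unchanged."""
    if min(cpu_cores, memory_gib, iops) < 0:
        raise ValueError("amounts must be non-negative")
    s, d = quotas[src], quotas[dst]
    if cpu_cores > s.cpu_cores or memory_gib > s.memory_gib or iops > s.iops:
        raise ValueError(f"{src} does not have that much quota to give")
    s.cpu_cores -= cpu_cores
    s.memory_gib -= memory_gib
    s.iops -= iops
    d.cpu_cores += cpu_cores
    d.memory_gib += memory_gib
    d.iops += iops
```

For instance, calling `shift(quotas, "Hadoop2", "Hadoop1", cpu_cores=8)` during the day and the reverse at night changes who gets the cores without adding or removing any nodes, so no data has to move.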
Application-centric resource management and QoS
› We enhanced K8S’ cgroups management capabilities
› More comprehensive procfs and sysfs virtualization
› Virtualize sysinfo(2) system call
› We implemented an application-topology-aware MIN and MAX storage QoS
› Predictable performance for mission-critical workloads
› Eliminate noisy-neighbor challenges when consolidating workloads
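The procfs/sysfs point above matters because an application that counts host cores will size its thread pools for the whole machine rather than for its container. As a minimal sketch (not Robin's mechanism), the function below derives an effective CPU count from the contents of the cgroup v2 `cpu.max` file, whose format is a quota and a period in microseconds, or the word `max` for the quota when unlimited. The function name is hypothetical.

```python
def effective_cpus(cpu_max_text: str, host_cpus: int) -> float:
    """CPUs a container may actually use, given cgroup v2 cpu.max contents.

    cpu.max holds "<quota> <period>", e.g. "200000 100000" for two CPUs,
    or "max <period>" when no quota is set.
    """
    quota, period = cpu_max_text.split()
    if quota == "max":
        return float(host_cpus)
    # Never report more CPUs than the host physically has.
    return min(int(quota) / int(period), float(host_cpus))
```

A container capped at `200000 100000` on a 40-core host should therefore plan for 2 CPUs, not 40.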
Robin Application-aware Storage
(Diagram: Hadoop, MongoDB, Kafka, MySQL, and Postgres sending IO to Robin Application-aware Storage)
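A MIN and MAX storage QoS policy can be read as a two-step allocation: first guarantee each app its minimum, then hand out spare capacity up to each app's maximum. The sketch below illustrates that idea only; the `IopsPolicy` type, the `allocate_iops` function, and the name-ordered handout of spare capacity are assumptions, not a description of Robin's scheduler.

```python
from typing import Dict, NamedTuple


class IopsPolicy(NamedTuple):
    min_iops: int  # guaranteed floor
    max_iops: int  # hard ceiling
    demand: int    # what the app is currently asking for


def allocate_iops(capacity: int, apps: Dict[str, IopsPolicy]) -> Dict[str, int]:
    """Allocate IOPS so every app gets its floor before anyone gets extra."""
    # Step 1: guarantee the minimum (or the demand, if it is lower).
    alloc = {name: min(p.demand, p.min_iops) for name, p in apps.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("guaranteed minimums exceed device capacity")
    # Step 2: distribute what is left, in name order, up to each ceiling.
    for name in sorted(apps):
        p = apps[name]
        want = min(p.demand, p.max_iops) - alloc[name]
        extra = max(0, min(want, remaining))
        alloc[name] += extra
        remaining -= extra
    return alloc
```

The floor is what gives a mission-critical workload predictable performance, and the ceiling is what stops a busy neighbor from taking the whole device.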
Operational challenges for Big Data, NoSQL, Databases extend beyond just provisioning and scaling
Elevating experience to Applications
› Time machine for applications: time travel across multiple application states
› Clone and share entire applications for running reports, tests, and what-if analysis
› Backup and restore entire applications: avoid fear of app+data loss
› Safely upgrade applications without fear of service disruption due to version incompatibilities
› Migrate entire applications with data to the cloud
1-click Application-consistent Snapshots
Snapshot 1 Snapshot 2 Snapshot 3 Snapshot 4 Current
Snapshot 1 (4 months ago), Snapshot 2 (2 weeks ago), Snapshot 3 (3 days ago), Snapshot 4 (yesterday), Current (now)
1-click Ready-to-use Clones
RoW-based cloning (ultra fast)
Clone gets a different network identity
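Reading RoW as redirect-on-write, the reason such a clone is fast is that it copies only the block map, not the data; later writes go to fresh blocks and leave shared ones untouched. The toy model below sketches that idea. The `BlockStore` and `Volume` classes are hypothetical and are not Robin's storage layer.

```python
import itertools
from typing import Dict, Optional


class BlockStore:
    """Append-only pool of physical blocks shared by all volumes."""

    def __init__(self) -> None:
        self._blocks: Dict[int, bytes] = {}
        self._ids = itertools.count()

    def put(self, data: bytes) -> int:
        block_id = next(self._ids)
        self._blocks[block_id] = data
        return block_id

    def get(self, block_id: int) -> bytes:
        return self._blocks[block_id]


class Volume:
    """Logical volume: a map from logical block address to physical block."""

    def __init__(self, store: BlockStore, block_map: Optional[Dict[int, int]] = None) -> None:
        self.store = store
        self.block_map: Dict[int, int] = dict(block_map or {})

    def write(self, lba: int, data: bytes) -> None:
        # Redirect-on-write: never overwrite in place, point at a new block.
        self.block_map[lba] = self.store.put(data)

    def read(self, lba: int) -> bytes:
        return self.store.get(self.block_map[lba])

    def clone(self) -> "Volume":
        # Only metadata is copied, so cost is independent of data size.
        return Volume(self.store, self.block_map)
```

A clone created this way shares every unmodified block with its parent, and a write to either side is invisible to the other.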
See demos at booth G3
Robin is The Kubernetes platform for big data, databases and AI/ML
www.RobinSystems.com
1-click Provision
1-click Scale
1-click QoS Control
1-click Snapshots
1-click Clones
1-click Backup
1-click Upgrade
1-click Migrate

  • 3. Why containerize Big Data, NoSQL, and Databases?
  • 4. Why containerize: developer's perspective? 1. Dislike opening an IT ticket and waiting weeks for their apps to be ready for use They want their apps to be available now 2. Want to experiment with tools, but dislike the complexity of setting them up For example, which of RDBMS, NoSQL, DocumentDB, or GraphDB is better for the app? 3. Want to run apps where it makes the most sense Their laptop, on-prem datacenter, public cloud, etc. Containers offer deployment agility and infrastructure independence
  • 5. Why containerize: infrastructure perspective? 40% Resource utilization is pretty low on Big Data clusters
  • 6. Why containerize – infrastructure perspective? Utilization worsens with every hardware refresh. Prices shown: $34 K, $141 K, $25 K. 4 years ago vs. today: CPU 20 cores vs. 40 cores; memory 128 GiB vs. 512 GiB; storage 48 TiB vs. 144 TiB; network 2x10 GbE vs. 2x40 GbE. Other configurations shown: CPU 24 cores, memory 256 GiB, storage 540 TiB, network 4x40 GbE; and CPU 36 cores, memory 768 GiB, storage 122 TiB, network 2x100 GbE, GPU 8x NVIDIA V100. Modern hardware offers a lot more resources per rack unit, which must be kept busy to realize ROI. Containers allow you to maximize infrastructure utilization
  • 7. Why Containers, not Virtualization? › Get the benefits of virtualization without any of its overhead › Containers run applications directly on baremetal without virtualizing hardware › A resource given to a hypervisor is a resource that is taken away from your Big Data application › Applications are being packaged and shipped as container images, not VMs › You must leverage and adapt to this shift in application packaging › Containers avoid the need for specialized storage stacks for deduplicating VM images
  • 8. What are the challenges with containerizing Big Data, NoSQL and Databases?
  • 9. Challenges with containers Incomplete cgroups virtualization causes many Big Data and Databases to misbehave CPU › Contiguous core IDs, CPU ID mapping (Kudu), accurate threads:cores mapping (DB) › NUMA aware assignment (HANA) Memory: › JVM sees entire host memory even if you cap the memory for container (Any JVM app) › Memory allocation inconsistencies (hugepages, shared page cache) (Oracle) Storage › Apps that need raw block devices need correct WWNs management (e.g., Oracle, MapR) › blkio cgroups setting is useless to avoid noisy neighbor problems (All apps) Confidential – Restricted Distribution
  • 10. Challenges with container orchestration platforms Very opinionated and architected with a microservices-oriented philosophy › Expects that apps can be brought up trivially within milliseconds during crash recovery › Scale by adding more containers and registering with a load balancer to spread load around › Recommend modeling your app as a collection of stateless containers, each serving a single service But you are dealing with applications that have decades of built-in assumptions › Big Data and databases are not written as microservices applications › You can’t stop and restart them rapidly › You have to worry about both storage and network state for ensuring high availability › There is significant investment in custom scripting that assumes SSH access to hosts running apps
  • 11. Storage and Networking challenges 2018 CNCF survey says Storage and Networking are the biggest challenges in Kubernetes https://www.cncf.io/blog/2017/06/28/survey-shows-kubernetes-leading-orchestration-platform 48% 44%
  • 12. Storage and Networking challenges › Latest 2018 CNCF: 48% say Storage is a big challenge, 44% say Networking is a challenge in Kubernetes › There are 27 Storage vendors and 21 Network vendors providing Storage & Networking solutions for containers and Kubernetes (source: https://github.com/cncf/landscape) Despite so many vendor solutions, why is it still a challenge for so many people? [Figure: Storage vendors, Network vendors]
  • 13. Operational challenges to overcome Storage › Performance unpredictability when consolidating Big Data and Database apps › Data locality requirements (both performance and datacenter network bandwidth constraints) › Affinity/anti-affinity and isolation constraints Networking › Services running inside K8S are often consumed by applications running in different L3 subnets › Putting a load balancer in between apps and services is unnecessary and less performant for most Big Data, NoSQL, and Database applications › 90% of the apps being used in real life require the IP address to be preserved during restarts Spending time setting up Storage and Networking is a drag on user productivity
  • 14. Don’t miss the forest for the trees Users Applications Infrastructure Most vendors are looking at the problem in this direction Whereas we should be looking at it in this direction Focus on User- App Interaction Let apps drive infrastructure to meet user requirements Focus is on Infra components StatefulSets, Deployments, Persistent Volume Claims, Services, CSI, CNI, HPA
  • 15. Can Kubernetes alone get us to the promised land? MANAGEMENT (kubectl, helm) SERVICES (Ingress, Proxy, LB) STORAGE (CSI) NETWORKING (CNI, Overlay) MONITORING (Heapster, HPA) CONFIGURATION (ConfigMap, Secrets) UI SERVER INFRA (Baremetal, On-prem VM, AWS, Azure, GCP) DATA (PV, PVC) TROUBLESHOOTING (Logging, Events) CONTAINERS (docker, LXC)
  • 16. Time to reframe our thinking Let applications drive infrastructure to meet user requirements (in this model application workflows configure Kubernetes, Networking and Storage)
  • 17. Robin is The Kubernetes platform for big data, databases and AI/ML Integrated App-aware Storage Docker, LxC, Kubernetes Integrated Networking Application-aware Workflow Manager + + + Application workflows configure Kubernetes, Networking and Storage
  • 18. When you elevate your thinking to Applications You do less of this › Deployments, ReplicaSets, and StatefulSets › Persistent Volumes and Claims › Service endpoints, proxy › Ingress and Egress routes › Secrets and Configmaps › Heapster, CSI, CNI, and CRI And do more of this › Time-travel application states › Clone entire Applications with their data › Backup and restore entire apps, any app › Upgrade applications in a failsafe manner › Control QoS of apps to meet performance SLAs › Make applications and data mobile across clouds
  • 19. Give your users a managed service experience SPECIFY DATA-LOCALITY, ANTI/AFFINITY CONSTRAINTS AND PLACEMENT HINTS ENABLE SERVICE COMPONENTS SPECIFY COMPUTE SPECIFY STORAGE SPECIFY SCALE Just minutes from click to use 64 node Hadoop Cluster with 1408 CPU Cores, 4.5 TB of Memory, 1.5 PB of Storage takes just 23 mins Services enabled: Atlas, Spark, Hive, Kerberos, Sentry, HDFS, namenode HA K8S components auto created (StatefulSets, PVC, Services, …) Data-locality, anti/affinity policies enforced Any Big Data, NoSQL, Database, AI/ML app
  • 20. Adjust resources to meet changing priorities › Application priorities change with time › Faster ingest during daytime › Faster querying for end-of-quarter reporting › Trade resources between adjacent applications dynamically › Adjust CPU, Memory, GPU, Network and IOPS dynamically › Scaling resources vs scaling entire service › K8S’ Horizontal Pod Autoscaler (HPA) is not suitable for data applications: › Works by adding more Pods to scale horizontally › Great for stateless apps › Not so good for Big Data, NoSQL and Databases › Results in data rebalancing, which is costly and permanent. Scaling down is very hard. Kafka Hadoop1 Hadoop2 Druid Assign each app its own resource quota (CPU, Mem, IOPS) Shift resources from Hadoop2 to Hadoop1 with 1-Click Shift resources from Hadoop1 to Hadoop2 with 1-Click 8 AM 3 PM 12 AM 11 PM
  • 21. Application-centric resource management and QoS › We enhanced K8S’ cgroups management capabilities › More comprehensive procfs, and sysfs virtualization › Virtualize sysinfo(2) system call › We implemented an application-topology aware MIN and MAX storage QoS › Predictable performance for mission-critical workloads › Eliminate noisy-neighbor challenges when consolidating workloads Robin Application-aware Storage Hadoop Mongo DB Kafka MySQL Postgres Mongo DB IO IO IO Postgres
  • 22. Operational challenges for Big Data, NoSQL, Databases extend beyond just provisioning and scaling
  • 23. Elevating experience to Applications › Time machine for applications: time travel across multiple application states › Clone and share entire applications for running reports, tests, and what-if analysis › Back up and restore entire applications to avoid fear of app+data loss › Safely upgrade applications without fear of service disruption due to version incompatibilities › Migrate entire applications with data to the cloud
  • 24. Elevating experience to Applications › Time machine for applications: time travel across multiple application states › Clone and share entire applications for running reports, tests, and what-if analysis › Back up and restore entire applications to avoid fear of app+data loss › Safely upgrade applications without fear of service disruption due to version incompatibilities › Migrate entire applications with data to the cloud 1-click Application-consistent Snapshots Snapshot 1 Snapshot 2 Snapshot 3 Snapshot 4 Current
  • 25. Elevating experience to Applications › Time machine for applications: time travel across multiple application states › Clone and share entire applications for running reports, tests, and what-if analysis › Back up and restore entire applications to avoid fear of app+data loss › Safely upgrade applications without fear of service disruption due to version incompatibilities › Migrate entire applications with data to the cloud Snapshot 1 (4 months ago), Snapshot 2 (2 weeks ago), Snapshot 3 (3 days ago), Snapshot 4 (yesterday), Current (now) 1-click Ready-to-use Clones RoW based Cloning (Ultra fast) Clone gets a different network identity
  • 26. Elevating experience to Applications › Time machine for applications: time travel across multiple application states › Clone and share entire applications for running reports, tests, and what-if analysis › Back up and restore entire applications to avoid fear of app+data loss › Safely upgrade applications without fear of service disruption due to version incompatibilities › Migrate entire applications with data to the cloud
  • 27. Elevating experience to Applications › Time machine for applications: time travel across multiple application states › Clone and share entire applications for running reports, tests, and what-if analysis › Back up and restore entire applications to avoid fear of app+data loss › Safely upgrade applications without fear of service disruption due to version incompatibilities › Migrate entire applications with data to the cloud
  • 28. See demos at booth G3 Robin is The Kubernetes platform for big data, databases and AI/ML www.RobinSystems.com 1-click Provision 1-click Scale 1-click QoS Control 1-click Snapshots 1-click Clones 1-click Backup 1-click Upgrade 1-click Migrate
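
Slide 9 notes that a process which reads host-wide memory figures can ignore the cap placed on its container. The following is a minimal Python sketch of that gap, not anything from the talk: it compares MemTotal from /proc/meminfo with the cgroup memory limit and returns the smaller value. The function names are hypothetical, and the file paths assume a standard Linux cgroup v2 (memory.max) or v1 (memory.limit_in_bytes) mount.

```python
from typing import Optional


def parse_meminfo_total(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo-style text, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       131072000 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def parse_cgroup_limit(limit_text: str) -> Optional[int]:
    """Return the cgroup memory limit in bytes, or None when unlimited.

    cgroup v2 writes the literal "max" for no limit. cgroup v1 writes a very
    large number instead, which effective_memory() filters out.
    """
    value = limit_text.strip()
    if value == "max":
        return None
    return int(value)


def effective_memory(host_total: int, cgroup_limit: Optional[int]) -> int:
    """Memory an application should size itself against."""
    if cgroup_limit is None or cgroup_limit >= host_total:
        return host_total
    return cgroup_limit


def read_text(path: str) -> Optional[str]:
    """Read a file, returning None if it is missing or unreadable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return None


if __name__ == "__main__":
    meminfo = read_text("/proc/meminfo")
    # Assumed mount points: cgroup v2 first, then cgroup v1.
    limit = read_text("/sys/fs/cgroup/memory.max") or read_text(
        "/sys/fs/cgroup/memory/memory.limit_in_bytes"
    )
    if meminfo is not None:
        host = parse_meminfo_total(meminfo)
        cap = parse_cgroup_limit(limit) if limit else None
        print("host MemTotal:", host, "effective:", effective_memory(host, cap))
```

Run inside a memory-capped container without procfs virtualization, the two printed numbers would be expected to differ; an application that sizes its heap or buffers from the first number risks exceeding the cap.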

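Slide 20 contrasts adding Pods (HPA) with shifting resources between adjacent applications that share a fixed pool. A small Python sketch of the second idea follows, under the assumption that each application holds a simple CPU and memory quota. `Quota` and `shift_quota` are hypothetical names for illustration and do not represent Robin's or Kubernetes' API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Quota:
    cpu_cores: int
    memory_gib: int


def shift_quota(
    quotas: Dict[str, Quota], src: str, dst: str, cpu_cores: int, memory_gib: int
) -> None:
    """Move resources from src to dst in place; the pool total is unchanged."""
    if cpu_cores < 0 or memory_gib < 0:
        raise ValueError("amounts must be non-negative")
    donor, receiver = quotas[src], quotas[dst]
    if cpu_cores > donor.cpu_cores or memory_gib > donor.memory_gib:
        raise ValueError("donor quota is too small")
    donor.cpu_cores -= cpu_cores
    donor.memory_gib -= memory_gib
    receiver.cpu_cores += cpu_cores
    receiver.memory_gib += memory_gib


# Example: favor Hadoop1 during the day, then hand the resources back at night.
quotas = {"Hadoop1": Quota(16, 64), "Hadoop2": Quota(16, 64)}
shift_quota(quotas, "Hadoop2", "Hadoop1", cpu_cores=8, memory_gib=32)
daytime = {name: (q.cpu_cores, q.memory_gib) for name, q in quotas.items()}
shift_quota(quotas, "Hadoop1", "Hadoop2", cpu_cores=8, memory_gib=32)
```

Because no instances are added or removed, nothing in this model triggers data rebalancing, which is the cost the slide attributes to horizontal scaling of data applications.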
Editor's notes

  1. Containers are taking the world by storm. Everyone seems to be doing them. They are the next big thing since virtualization. Most software vendors are now releasing their software as a Docker image. Heck, even Microsoft has released SQL Server for Linux as a Docker image. It seems that the industry has accepted that going forward software will be shipped and run inside containers. So it is only natural to ask: how about running Hadoop inside containers?