MetaZeta Clusters
Background of Paul Baclace
2005-2006 Internet Archive with Doug Cutting on Hadoop/Nutch
2008-2010 AT&T interactive
2010-2012 Euclid Elements, Yoterra, Zettaset, GroupAngle.com, ProductSignals.com, ThirdEye, Hortonworks


July 13, 2012   MetaZeta.com           2
Hadoop Clusters for Training
• Generate pre-configured clusters
• Identical and independent
• Hadoop, HDFS, HBase, Hive, Pig
• Spawn N clusters for deadline
• Minimize setup needed by student




Cluster Requirements
• Access cluster via a single meta-page
• Avoid need for browser proxy or plugins
• No installation required for student laptop
• ssh is optional




Per-Cluster Logical View




Web UI Map




Whirr + jclouds




Challenges
• Slow Package Installation Process
• Amazon EC2 throttling
• Failures after configuration changes
• Occasional failures of EC2 nodes
     Boot failure
     DNS server failure
     Package repo availability

Slow Package Installation Process
TotalTime = Nclusters * installLatency

installLatency = Npackages * repoLatency

Typical case repoLatency = 10-20sec
Worst case repoLatency = ∞
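
As a rough worked example of the two formulas above, the sketch below plugs in the typical and worst-case repoLatency values from this slide. The cluster and package counts (20 and 30) are made-up illustration values, not figures from the talk, and the model assumes installs run one after another, as the formula is written.

```python
import math


def install_latency(n_packages: int, repo_latency_sec: float) -> float:
    """installLatency = Npackages * repoLatency (seconds per cluster)."""
    return n_packages * repo_latency_sec


def total_time(n_clusters: int, n_packages: int, repo_latency_sec: float) -> float:
    """TotalTime = Nclusters * installLatency (seconds for all clusters)."""
    return n_clusters * install_latency(n_packages, repo_latency_sec)


# Hypothetical workload sizes, chosen only for illustration.
N_CLUSTERS = 20
N_PACKAGES = 30

# repoLatency values taken from the slide: 10-20 sec typical, unbounded worst case.
for label, latency in [("typical (low)", 10.0),
                       ("typical (high)", 20.0),
                       ("worst case", math.inf)]:
    seconds = total_time(N_CLUSTERS, N_PACKAGES, latency)
    print(f"{label}: {seconds} sec")
```

Even at the low end of the typical range, per-package latency multiplied across packages and clusters adds up to hours, which is the motivation for removing the repository from the critical path.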


Slow Package Installation Process
Solution:
• Pre-install everything on custom AMI
• Custom AMI can be slower to load




Amazon EC2 throttling
   EC2 API Request Rate
    At human speeds:
           • 100-2000msec latency
           • Short sleep in between
    Remove sleep time:
           • 2-20sec latency
    Overlap requests in parallel:
           • HTTP 500 (no donut for you)
Amazon EC2 throttling
Solution:
• Avoidance by rate-limiting all requests
• Use heuristics to estimate lead-time needed to spawn N clusters
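
One way to picture the two points above is a minimum-interval limiter that every API call passes through, plus a lower bound on the time the requests alone will take. This is a minimal sketch, not the implementation used in the talk: the `RateLimiter` class, the `min_request_seconds` helper, and all the numbers are hypothetical, and a real lead-time heuristic would also need to account for node allocation, configuration, and testing time.

```python
import time


class RateLimiter:
    """Space calls at least `min_interval` seconds apart (single-threaded)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._next_allowed = float("-inf")

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = max(now, self._next_allowed) + self.min_interval


def min_request_seconds(n_clusters, requests_per_cluster, min_interval):
    """Lower bound on lead time: total requests times the enforced spacing."""
    return n_clusters * requests_per_cluster * min_interval


if __name__ == "__main__":
    limiter = RateLimiter(min_interval=0.01)  # tiny interval for the demo
    for i in range(3):
        limiter.wait()
        print("request", i)  # a real cloud API call would go here
    # Hypothetical sizing: 20 clusters, 15 API requests each, 0.5 sec apart.
    print(min_request_seconds(20, 15, 0.5), "sec spent on requests alone")
```

Routing all requests through one limiter trades some speed for predictability: the spawn takes longer than an unthrottled burst would, but the start time needed to hit a deadline can be estimated in advance.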




EC2 or Config Failures
Solution:
• Acceptance Testing of
     HDFS
     Map-Reduce
     Hive
     HBase
     Hive + HBase
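
The acceptance step above could be organized as an ordered list of named checks, where a cluster is handed to a student only if every check passes. The sketch below shows that structure only; the `run_acceptance` function is hypothetical and the checks are placeholder stubs, since the slides do not say how each component was actually exercised.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], bool]]


def run_acceptance(checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run every check in order; return (all_passed, names_of_failed_checks).

    A check that raises is counted as a failure rather than aborting the run,
    so one broken component does not hide the status of the others.
    """
    failed = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return (not failed, failed)


# Placeholder stubs: real checks would exercise the live cluster
# (for example, write and read back a file, run a small job, query a table).
def stub_ok() -> bool:
    return True


CHECKS: List[Check] = [
    ("HDFS", stub_ok),
    ("Map-Reduce", stub_ok),
    ("Hive", stub_ok),
    ("HBase", stub_ok),
    ("Hive + HBase", stub_ok),
]

if __name__ == "__main__":
    passed, failed = run_acceptance(CHECKS)
    print("cluster accepted" if passed else f"cluster rejected: {failed}")
```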

Results
Node Allocation: 287 sec median, 467 sec 95th percentile
Config: 94 sec median, 134 sec 95th percentile
Testing: 147 sec median, 155 sec 95th percentile
Tagging: 79 sec median, 155 sec 95th percentile

Overall: 520 sec median, 777 sec 95th percentile



Credits
Thank you to:
• Tom White for starting Whirr
• Adrian Cole for starting jclouds
• All the contributors to each project




Pointers
• http://metazeta.com/
• http://www.jclouds.org/
• http://whirr.apache.org/





Top 10 Most Downloaded Games on Play Store in 2024
 

MetaZeta Clusters Overview

  • 2. Background of Paul Baclace (July 13, 2012, MetaZeta.com)
      • 2005-2006: Internet Archive with Doug Cutting on Hadoop/Nutch
      • 2008-2010: AT&T interactive
      • 2010-2012: Euclid Elements, Yoterra, Zettaset, GroupAngle.com, ProductSignals.com, ThirdEye, Hortonworks
  • 3. Hadoop Clusters for Training
      • Generate pre-configured clusters
      • Identical and independent
      • Hadoop, HDFS, HBase, Hive, Pig
      • Spawn N clusters for deadline
      • Minimize setup needed by student
  • 4. Cluster Requirements
      • Access cluster via a single meta-page
      • Avoid need for browser proxy or plugins
      • No installation required for student laptop
      • ssh is optional
  • 5. Per-Cluster Logical View
  • 6. Web UI Map
  • 7. Whirr + jclouds
  • 8. Whirr + jclouds
  • 9-16. (Image-only slides; no text in the transcript)
  • 17. Challenges
      • Slow Package Installation Process
      • Amazon EC2 throttling
      • Failures after configuration changes
      • Occasional failures of EC2 nodes
          • Boot failure
          • DNS server failure
          • Package repo availability
  • 18. Slow Package Installation Process
      TotalTime = Nclusters * installLatency
      installLatency = Npackages * repoLatency
      Typical case: repoLatency = 10-20 sec
      Worst case: repoLatency = ∞
  • 19. Slow Package Installation Process
      Solution:
      • Pre-install everything on custom AMI
      • Custom AMI can be slower to load
  • 20. Amazon EC2 throttling
      EC2 API Request Rate
      At human speeds:
      • 100-2000 msec latency
      • Short sleep in between
      Remove sleep time:
      • 2-20 sec latency
      Overlap requests in parallel:
      • HTTP 500 (no donut for you)
  • 21. Amazon EC2 throttling
      Solution:
      • Avoidance by rate-limiting all requests
      • Use heuristics to estimate lead-time needed to spawn N clusters
  • 22. EC2 or Config Failures
      Solution:
      • Acceptance Testing of
          • HDFS
          • Map-Reduce
          • Hive
          • HBase
          • Hive + HBase
  • 23. Results
      Node Allocation: 287 sec median, 467 sec 95th percentile
      Config: 94 sec median, 134 sec 95th percentile
      Testing: 147 sec median, 155 sec 95th percentile
      Tagging: 79 sec median, 155 sec 95th percentile
      Overall: 520 sec median, 777 sec 95th percentile
  • 24. Credits
      Thank you to:
      • Tom White for starting Whirr
      • Adrian Cole for starting jclouds
      • All the contributors to each project
  • 25. Pointers
      • http://metazeta.com/
      • http://www.jclouds.org/
      • http://whirr.apache.org/
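
The install-time model on slide 18 and the rate-limiting approach on slide 21 can be illustrated with a small Python sketch. This is not the MetaZeta implementation: the names `install_time_sec`, `RateLimiter`, and `lead_time_sec` are hypothetical, and the lead-time formula (serialized API calls plus one cluster's setup time) is only one plausible heuristic, since the slides do not say which heuristics were used.

```python
import time


def install_time_sec(n_clusters, n_packages, repo_latency_sec):
    """Slide 18 model: TotalTime = Nclusters * (Npackages * repoLatency)."""
    return n_clusters * n_packages * repo_latency_sec


class RateLimiter:
    """Enforces a minimum interval between successive requests.

    Requests are issued one at a time rather than in parallel, which is
    the avoidance strategy the slides describe for EC2 throttling.
    """

    def __init__(self, min_interval_sec, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_sec = min_interval_sec
        self._clock = clock          # injectable so the limiter can be tested
        self._sleep = sleep
        self._next_allowed = None    # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = now + self.min_interval_sec


def lead_time_sec(n_clusters, requests_per_cluster, min_interval_sec, per_cluster_sec):
    """Assumed heuristic: all API calls are serialized by the rate limiter,
    then the last cluster still needs its own setup time."""
    api_time = n_clusters * requests_per_cluster * min_interval_sec
    return api_time + per_cluster_sec


if __name__ == "__main__":
    # 10 clusters, a hypothetical 20 packages, 15 sec per package fetch.
    print(install_time_sec(10, 20, 15))
    # 10 clusters, a hypothetical 5 API calls each, 2 sec apart, plus the
    # 777 sec 95th-percentile overall time from the Results slide.
    print(lead_time_sec(10, 5, 2.0, 777))
```

Calling `RateLimiter.wait()` before each EC2 API request keeps the request rate bounded. The clock and sleep functions are injectable only so the behavior can be checked without real delays.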

Editor's Notes

  1. Photo Credit: Paul Baclace
     * Hadoop and Cloud Computing Synergy
       ** Open Source means no license fee per node
       ** Cloud computing enables anyone to use Hadoop