A Tool for Practical Garbage Collection Analysis
                       In the Cloud


                            Arun Kejariwal

                                March 2013




1                 International Conference on Cloud Engineering 2013   © Arun Kejariwal
Overview

      Cloud computing becoming ubiquitous
       o  SaaS, PaaS, IaaS
       o  Market size of 65 to 85 billion by 2015 [McKinsey]


      IaaS
       o  Large adoption
                Higher scalability, Lower cost, Reduced time-to-market
       o  Examples
                Zynga, Netflix, PBS, Foursquare, …
       o  Growing vendors
                AWS, Google Compute Engine, Azure, Rackspace



      Java-based web applications
       o  GC impacts application performance in a significant way
                For example: [Zhao et al., OOPSLA’09]
                100s of papers published on memory management in languages such as Java
                  [“The Garbage Collection Bibliography,” http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbib.pdf]

GC Analysis in the Cloud: Why Bother?

      User Experience
        o  Latency, Throughput

      Application-driven selection of GC Type

      Performance evaluation of new JVM
        o  JVM 7
              G1 collector, New optimizations such as escape analysis


      Capacity Planning
        o  Operational Efficiency
        o  For example, on AWS




Key Contributions

      Tool – called Shrek – for GC analysis in the cloud

        o  Cluster with over 100 nodes
      Features
        o  Driven by actual needs of the various application teams
        o  Focus on simplicity
               Deployed in production
               Solution of the winner of the Netflix Prize was very academic and not deployable in production
        o  Outlier detection
               Detecting “bad” nodes via unsupervised learning
        o  Detect performance regressions via time series analysis
               Performance impact of new features
               Red/Black deployments
        o  Characterize performance during A/B (bucket) testing
        o  Detect memory “leaks”

GC: Quick review

      Generational garbage collector




        o  Objects are first allocated to Young Gen (YG)
        o  Objects whose age exceeds a given threshold are promoted to Old Gen (OG)
      GC Type
        o  Parallel
        o  CMS
        o  Recent: G1
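
      The promotion rule in the review above can be sketched as a toy model. This is only an illustration, assuming that an object's age is the number of minor collections it has survived and that promotion depends on age alone; the names minor_gc and tenuring_threshold are hypothetical, and real collectors can also promote earlier, for example when survivor space fills up.

```python
def minor_gc(young_ages, survivors, tenuring_threshold):
    """Toy minor collection (illustrative model, not a real JVM algorithm).

    young_ages: dict mapping object name -> minor GCs survived so far.
    survivors: names that are still reachable at collection time.
    Returns (new_young_ages, promoted_names).
    """
    new_young, promoted = {}, []
    for name in survivors:
        age = young_ages[name] + 1          # survived one more collection
        if age > tenuring_threshold:
            promoted.append(name)           # moves to Old Gen
        else:
            new_young[name] = age           # stays in Young Gen
    # Objects not in `survivors` are garbage and are simply dropped.
    return new_young, sorted(promoted)


young = {"a": 0, "b": 2, "c": 1}
remaining, promoted = minor_gc(young, survivors={"a", "b"}, tenuring_threshold=2)
```

      Here "c" is unreachable and is collected, "a" stays in Young Gen with age 1, and "b" crosses the threshold and is promoted.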

What About Using Existing Tools?

      AppDynamics
      GCHisto, GCViewer, Printgcstats, Jconsole

      Common limitations
       o  Absence of support for analyzing GC performance of a cluster of nodes
              Tailored for a single Java process
       o  Lack of statistical analysis
              Mean
              Standard deviation
              Trend analysis
              k-Nearest Neighbor for outlier detection
       o  Lack of support for G1 GC
       o  Most tools are no longer maintained




Shrek: Analyzing Heap Usage

      Why bother?
       o  High performance variability in the cloud [Iosup et al., CCG, 2011]

       o  Potential reasons
            o  Nodes going bad [Hoelzle and Barroso 2009], [Dai et al.], [Vishwanath and Nagappan, SoCC, 2010]

            o  Multi-tenancy

            o  Load balancer issues
                    AWS ELB issues on Dec 24, 2012 [http://aws.amazon.com/message/680587/]


            o  A/B Testing

            o  Cascading effects in a SOA

            o  Failover from another availability zone



Shrek: Analyzing Heap Usage (contd.)

      Detect “bad”/outlier nodes
        o  Terminate them and spin up replacements
        o  Early detection results in minimum customer impact
        o  Example total heap usage time series output obtained via Shrek




Shrek: Analyzing Heap Usage (contd.)

      Detect outliers
        o  k-NN unsupervised learning
        [Figure: per-node score 10^-4 / (Avg Young Generation Use * Std Dev) on the y-axis (0 to 4), plotted against Node on the x-axis (0 to 30)]
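
      One way a k-NN based outlier check on a per-node score could look is sketched below. The slides do not give the exact rule, so the choice of k, the cutoff (a multiple of the median k-NN distance), and the function names are all assumptions; the scores are illustrative values, not a node-by-node reading of the plot.

```python
from statistics import median


def knn_distance(values, k):
    """Distance from each value to its k-th nearest neighbour (1-D data)."""
    out = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        out.append(dists[k - 1])
    return out


def knn_outliers(values, k=3, factor=5.0):
    """Indices whose k-NN distance exceeds factor * median k-NN distance.

    The cutoff rule is an assumption made for this sketch.
    """
    d = knn_distance(values, k)
    cutoff = factor * median(d)
    return [i for i, di in enumerate(d) if di > cutoff]


# Illustrative per-node scores: eight similar nodes and two that stand apart.
scores = [3.951, 3.953, 3.764, 3.772, 3.731, 3.697, 3.581, 3.574, 0.332, 0.294]
bad_nodes = knn_outliers(scores, k=3)
```

      Using the k-th neighbour rather than the nearest one keeps a small group of similar "bad" nodes from hiding each other, as long as k is larger than the group.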
Shrek: Analyzing Heap Usage (contd.)

       Old Gen usage
         o  Driven by promotion rate
         o  Promotion rate may vary across nodes
               A/B testing




       Shrek also reports the YG usage time series
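
       As a rough sketch of how promotion could be derived from the occupancy figures of a minor GC: the drop in Young Gen occupancy is garbage freed plus objects promoted, while the drop in total heap occupancy is garbage freed only, so the difference is the promoted volume. The function names and the KB units are assumptions for illustration, and the sketch assumes Old Gen is not collected during the same event.

```python
def promoted_kb(young_before, young_after, heap_before, heap_after):
    """Volume promoted to Old Gen by one minor GC, in KB.

    young drop = freed + promoted; heap drop = freed; hence the difference.
    """
    return (young_before - young_after) - (heap_before - heap_after)


def average_promotion_kb(records):
    """Mean promotion per minor GC over (yb, ya, hb, ha) occupancy tuples."""
    if not records:
        return 0.0
    return sum(promoted_kb(*r) for r in records) / len(records)


# Young Gen drops by 900 KB while the whole heap drops by only 700 KB,
# so 200 KB moved into Old Gen.
example = promoted_kb(1000, 100, 5000, 4300)
```

       Comparing this average across nodes is one way to see the node-to-node variation in promotion rate mentioned above.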
Shrek: Analyzing Pause Times

       Pause time analysis
        o  Data distribution of GC pause times
        o  Histogram plots supported by Shrek
              Initial Mark
              Remark
              Full GC Times
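
        A minimal sketch of the per-phase pause histograms follows. It assumes the pauses have already been extracted from the GC logs as (phase, pause in milliseconds) pairs; the phase labels, the function name, and the 50 ms bin width are hypothetical choices, not taken from Shrek.

```python
from collections import Counter, defaultdict


def pause_histograms(events, bin_ms=50):
    """Bucket pause times per phase into fixed-width bins.

    events: iterable of (phase, pause_ms) pairs.
    Returns {phase: {bin_lower_edge_ms: count}} with bins in ascending order.
    """
    hist = defaultdict(Counter)
    for phase, pause_ms in events:
        lower = int(pause_ms // bin_ms) * bin_ms   # lower edge of the bin
        hist[phase][lower] += 1
    return {phase: dict(sorted(c.items())) for phase, c in hist.items()}


events = [
    ("initial-mark", 12), ("initial-mark", 48), ("initial-mark", 61),
    ("remark", 95), ("remark", 130),
    ("full-gc", 2400),
]
histograms = pause_histograms(events)
```

        Keeping the phases separate matters because a single long Full GC would otherwise flatten the much shorter Initial Mark and Remark pauses into one bin.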




Shrek: Summary Report

       Metrics reported for each node
         o  Minor GC count
         o  # Failures (concurrent mode failures) and Failure Time
                 Not reported by any existing tool
         o    Initial Mark and Remark
         o    Average and Max YG (s)
         o    Average and Max Full GC (s)
         o    Average Promotion (MB)
                 Not reported by any existing tool

       Summary report integrated with the in-house alerting system
         o  Assist in triaging production issues
       Recap
         o  Existing tools do not support GC analysis across an entire cluster
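
       The per-node metrics listed on this slide could be aggregated roughly as follows. This is a sketch under stated assumptions: the GcEvent record, its field names, and the "minor" / "full" / "cmf" labels are hypothetical, and parsing of the actual GC logs is left out.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GcEvent:
    kind: str                 # "minor", "full", or "cmf" (concurrent mode failure)
    pause_s: float            # pause duration in seconds
    promoted_mb: float = 0.0  # only meaningful for minor GCs


def summarize(events):
    """Summary metrics for one node's GC events."""
    minor = [e for e in events if e.kind == "minor"]
    full = [e for e in events if e.kind == "full"]
    cmf = [e for e in events if e.kind == "cmf"]

    def avg(xs):
        return mean(xs) if xs else 0.0

    def peak(xs):
        return max(xs) if xs else 0.0

    return {
        "minor_gc_count": len(minor),
        "failure_count": len(cmf),
        "failure_time_s": sum(e.pause_s for e in cmf),
        "avg_yg_s": avg([e.pause_s for e in minor]),
        "max_yg_s": peak([e.pause_s for e in minor]),
        "avg_full_gc_s": avg([e.pause_s for e in full]),
        "max_full_gc_s": peak([e.pause_s for e in full]),
        "avg_promotion_mb": avg([e.promoted_mb for e in minor]),
    }


node_events = [
    GcEvent("minor", 0.25, promoted_mb=2.0),
    GcEvent("minor", 0.75, promoted_mb=4.0),
    GcEvent("full", 1.5),
    GcEvent("cmf", 3.0),
]
report = summarize(node_events)
```

       Running summarize once per node yields rows that can be compared across a cluster, which is the gap in single-process tools that the recap points out.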



Shrek: Detecting Memory “Leaks”

       Time series analysis of heap usage
        o  Upward sloping over multiple days
              Potential memory “leak”
        o  Predict heap usage trend
              Holt Winters method for prediction


       Example from production
        o  Upward sloping
            o  Verified “leak” with the application team
        o  Orange region
              80% prediction interval
        o  Yellow region
              95% prediction interval
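
        The trend-based check can be sketched with Holt's linear trend method, which is the non-seasonal special case of Holt-Winters; it is swapped in here to keep the example short. The smoothing parameters, the function names, and the slope cutoff are assumptions, and the shaded prediction regions from the slide are not computed.

```python
def holt_linear(series, alpha=0.5, beta=0.5):
    """Holt's linear trend method; returns (level, trend) after the last point."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Blend the newly observed slope with the previous trend estimate.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


def forecast(series, steps, alpha=0.5, beta=0.5):
    """Point forecasts for the next `steps` samples."""
    level, trend = holt_linear(series, alpha, beta)
    return [level + h * trend for h in range(1, steps + 1)]


def looks_like_leak(series, min_slope):
    """Flag a series whose smoothed trend exceeds min_slope per sample."""
    _, trend = holt_linear(series)
    return trend > min_slope


# Heap usage (MB) sampled at a fixed interval, growing by 5 MB per sample.
growing = [100 + 5 * t for t in range(10)]
steady = [200] * 10
next_three = forecast(growing, steps=3)
```

        A positive trend alone does not prove a leak, which is consistent with the slide's note that the example was verified with the application team.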




Wrapping up …

       Shrek – Tool for GC analysis in the cloud
         o    Statistical analysis
         o    Detect performance regression
         o    “Bad”/outlier nodes detection
         o    Characterize performance of Red/Black deployments
         o    Memory “leak” detection




       Future work
         o  Integrate with Hive/… so that GC logs are pulled from production nodes only once
         o  Support advanced analytics to guide tuning of GC parameters




Q&A





Más contenido relacionado

Destacado

Data Science with Elastic MapReduce (EMR) at Netflix
Data Science with Elastic MapReduce (EMR) at NetflixData Science with Elastic MapReduce (EMR) at Netflix
Data Science with Elastic MapReduce (EMR) at Netflix
Kurt Brown
 
NetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmapNetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmap
Ruslan Meshenberg
 
Soical studies s.b.a
Soical studies s.b.aSoical studies s.b.a
Soical studies s.b.a
bigbellyninja
 
Basic Garbage Collection Techniques
Basic  Garbage  Collection  TechniquesBasic  Garbage  Collection  Techniques
Basic Garbage Collection Techniques
An Khuong
 
Netflix oss season 2 episode 1 - meetup Lightning talks
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talks
Ruslan Meshenberg
 

Destacado (20)

Mapa ilustrado de Estados Unidos
Mapa ilustrado de Estados Unidos Mapa ilustrado de Estados Unidos
Mapa ilustrado de Estados Unidos
 
Mitigating User Experience from 'Breaking Bad': The Twitter Approach [Velocit...
Mitigating User Experience from 'Breaking Bad': The Twitter Approach [Velocit...Mitigating User Experience from 'Breaking Bad': The Twitter Approach [Velocit...
Mitigating User Experience from 'Breaking Bad': The Twitter Approach [Velocit...
 
A Systematic Approach to Capacity Planning in the Real World
A Systematic Approach to Capacity Planning in the Real WorldA Systematic Approach to Capacity Planning in the Real World
A Systematic Approach to Capacity Planning in the Real World
 
re:Invent 2012 Optimizing Cassandra
re:Invent 2012 Optimizing Cassandrare:Invent 2012 Optimizing Cassandra
re:Invent 2012 Optimizing Cassandra
 
Mistery box
Mistery boxMistery box
Mistery box
 
Data Science with Elastic MapReduce (EMR) at Netflix
Data Science with Elastic MapReduce (EMR) at NetflixData Science with Elastic MapReduce (EMR) at Netflix
Data Science with Elastic MapReduce (EMR) at Netflix
 
山水
山水山水
山水
 
Formació en competències
Formació en competènciesFormació en competències
Formació en competències
 
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
 
St patricks day
St patricks daySt patricks day
St patricks day
 
NetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmapNetflixOSS meetup lightning talks and roadmap
NetflixOSS meetup lightning talks and roadmap
 
AWS Re:Invent - Optimizing Costs with AWS
AWS Re:Invent -  Optimizing Costs with AWSAWS Re:Invent -  Optimizing Costs with AWS
AWS Re:Invent - Optimizing Costs with AWS
 
Soical studies s.b.a
Soical studies s.b.aSoical studies s.b.a
Soical studies s.b.a
 
Basic Garbage Collection Techniques
Basic  Garbage  Collection  TechniquesBasic  Garbage  Collection  Techniques
Basic Garbage Collection Techniques
 
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
 
RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...
RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...
RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...
 
Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)Devops at Netflix (re:Invent)
Devops at Netflix (re:Invent)
 
Amalan terbaik bandar mampan
Amalan terbaik bandar mampanAmalan terbaik bandar mampan
Amalan terbaik bandar mampan
 
AWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at NetflixAWS Re:Invent - High Availability Architecture at Netflix
AWS Re:Invent - High Availability Architecture at Netflix
 
Netflix oss season 2 episode 1 - meetup Lightning talks
Netflix oss   season 2 episode 1 - meetup Lightning talksNetflix oss   season 2 episode 1 - meetup Lightning talks
Netflix oss season 2 episode 1 - meetup Lightning talks
 

Similar a A Tool for Practical Garbage Collection Analysis In the Cloud

3 d wm monasolyman_10nov_ainshames
3 d wm monasolyman_10nov_ainshames3 d wm monasolyman_10nov_ainshames
3 d wm monasolyman_10nov_ainshames
Aboul Ella Hassanien
 
Research Design Report Tagore
Research Design Report TagoreResearch Design Report Tagore
Research Design Report Tagore
Vinoth Kanna
 
Mohan_Dissertation (1)
Mohan_Dissertation (1)Mohan_Dissertation (1)
Mohan_Dissertation (1)
Mohan Bhargav
 
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptxPresentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
AkimPardede2
 

Similar a A Tool for Practical Garbage Collection Analysis In the Cloud (20)

Business Benefits of Cloud Computing to Indian IT Service
Business Benefits of Cloud Computing to Indian IT ServiceBusiness Benefits of Cloud Computing to Indian IT Service
Business Benefits of Cloud Computing to Indian IT Service
 
451\'s Conducting The Cloud Orchestration With A Focus On Test & Development
451\'s Conducting The Cloud Orchestration With A Focus On Test & Development451\'s Conducting The Cloud Orchestration With A Focus On Test & Development
451\'s Conducting The Cloud Orchestration With A Focus On Test & Development
 
Globus Toolkit 3 Core – A Grid Service Container Framework: Thomas Sandholm J...
Globus Toolkit 3 Core – A Grid Service Container Framework: Thomas Sandholm J...Globus Toolkit 3 Core – A Grid Service Container Framework: Thomas Sandholm J...
Globus Toolkit 3 Core – A Grid Service Container Framework: Thomas Sandholm J...
 
3 d wm monasolyman_10nov_ainshames
3 d wm monasolyman_10nov_ainshames3 d wm monasolyman_10nov_ainshames
3 d wm monasolyman_10nov_ainshames
 
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...
 
Research Design Report Tagore
Research Design Report TagoreResearch Design Report Tagore
Research Design Report Tagore
 
IRJET- Cost Effective Workflow Scheduling in Bigdata
IRJET-  	  Cost Effective Workflow Scheduling in BigdataIRJET-  	  Cost Effective Workflow Scheduling in Bigdata
IRJET- Cost Effective Workflow Scheduling in Bigdata
 
Mohan_Dissertation (1)
Mohan_Dissertation (1)Mohan_Dissertation (1)
Mohan_Dissertation (1)
 
Enhanced Integrity Preserving Homomorphic Scheme for Cloud Storage
Enhanced Integrity Preserving Homomorphic Scheme for Cloud StorageEnhanced Integrity Preserving Homomorphic Scheme for Cloud Storage
Enhanced Integrity Preserving Homomorphic Scheme for Cloud Storage
 
2011 keesvan gelder
2011 keesvan gelder2011 keesvan gelder
2011 keesvan gelder
 
thesis
thesisthesis
thesis
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
 
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptxPresentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
Presentasi Dedy Hartama Icosnicom 10 11 2023 finish.pptx
 
International Journal on Cloud Computing: Services and Architecture (IJCCSA)
International Journal on Cloud Computing: Services and Architecture (IJCCSA)International Journal on Cloud Computing: Services and Architecture (IJCCSA)
International Journal on Cloud Computing: Services and Architecture (IJCCSA)
 
Guaranteed Availability of Cloud Data with Efficient Cost
Guaranteed Availability of Cloud Data with Efficient CostGuaranteed Availability of Cloud Data with Efficient Cost
Guaranteed Availability of Cloud Data with Efficient Cost
 
Secure Cloud Storage
Secure Cloud StorageSecure Cloud Storage
Secure Cloud Storage
 
Cloud Computing: A Perspective on Next Basic Utility in IT World
Cloud Computing: A Perspective on Next Basic Utility in IT World Cloud Computing: A Perspective on Next Basic Utility in IT World
Cloud Computing: A Perspective on Next Basic Utility in IT World
 

Más de Arun Kejariwal

Anomaly Detection At The Edge
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The Edge
Arun Kejariwal
 
Serverless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the EnterpriseServerless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the Enterprise
Arun Kejariwal
 
Designing Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data Applications
Arun Kejariwal
 
Data Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action UponData Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action Upon
Arun Kejariwal
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impact
Arun Kejariwal
 
Days In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceDays In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy service
Arun Kejariwal
 
Gimme More! Supporting User Growth in a Performant and Efficient Fashion
Gimme More! Supporting User Growth in a Performant and Efficient FashionGimme More! Supporting User Growth in a Performant and Efficient Fashion
Gimme More! Supporting User Growth in a Performant and Efficient Fashion
Arun Kejariwal
 

Más de Arun Kejariwal (20)

Anomaly Detection At The Edge
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The Edge
 
Serverless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the EnterpriseServerless Streaming Architectures and Algorithms for the Enterprise
Serverless Streaming Architectures and Algorithms for the Enterprise
 
Sequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time Series
 
Sequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time SeriesSequence-to-Sequence Modeling for Time Series
Sequence-to-Sequence Modeling for Time Series
 
Model Serving via Pulsar Functions
Model Serving via Pulsar FunctionsModel Serving via Pulsar Functions
Model Serving via Pulsar Functions
 
Designing Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data Applications
 
Correlation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data Streams
 
Deep Learning for Time Series Data
Deep Learning for Time Series DataDeep Learning for Time Series Data
Deep Learning for Time Series Data
 
Correlation Analysis on Live Data Streams
Correlation Analysis on Live Data StreamsCorrelation Analysis on Live Data Streams
Correlation Analysis on Live Data Streams
 
Live Anomaly Detection
Live Anomaly DetectionLive Anomaly Detection
Live Anomaly Detection
 
Modern real-time streaming architectures
Modern real-time streaming architecturesModern real-time streaming architectures
Modern real-time streaming architectures
 
Anomaly detection in real-time data streams using Heron
Anomaly detection in real-time data streams using HeronAnomaly detection in real-time data streams using Heron
Anomaly detection in real-time data streams using Heron
 
Data Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action UponData Data Everywhere: Not An Insight to Take Action Upon
Data Data Everywhere: Not An Insight to Take Action Upon
 
Real Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and SystemsReal Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and Systems
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impact
 
Velocity 2015-final
Velocity 2015-finalVelocity 2015-final
Velocity 2015-final
 
Statistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterStatistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ Twitter
 
Days In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceDays In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy service
 
Gimme More! Supporting User Growth in a Performant and Efficient Fashion
Gimme More! Supporting User Growth in a Performant and Efficient FashionGimme More! Supporting User Growth in a Performant and Efficient Fashion
Gimme More! Supporting User Growth in a Performant and Efficient Fashion
 
Isolating Events from the Fail Whale
Isolating Events from the Fail WhaleIsolating Events from the Fail Whale
Isolating Events from the Fail Whale
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

A Tool for Practical Garbage Collection Analysis In the Cloud

  • 1. A Tool for Practical Garbage Collection Analysis In the Cloud
      Arun Kejariwal, March 2013
      International Conference on Cloud Engineering 2013 © Arun Kejariwal
  • 2. Overview
      Cloud computing becoming ubiquitous
        o  SaaS, PaaS, IaaS
        o  Market size of 65 to 85 billion by 2015 [McKinsey]
      IaaS
        o  Large adoption
             Higher scalability, lower cost, reduced time-to-market
        o  Examples
             Zynga, Netflix, PBS, Foursquare, …
        o  Growing vendors
             AWS, Google Compute Engine, Azure, Rackspace
      Java-based web applications
        o  GC impacts application performance in a significant way
             For example: [Zhao et al., OOPSLA’09]
             100s of papers published on memory management in languages such as Java
               [“The Garbage Collection Bibliography,” http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbib.pdf]
  • 3. GC Analysis in the Cloud: Why Bother?
      User Experience
        o  Latency, Throughput
      Application-driven selection of GC Type
      Performance evaluation of new JVM
        o  JVM 7
             G1 collector, new optimizations such as escape analysis
      Capacity Planning
        o  Operational Efficiency
        o  For example, on AWS
  • 4. Key Contributions
      Tool, called Shrek, for GC analysis in the cloud
        o  Cluster with over 100 nodes
      Features
        o  Driven by actual needs of the various application teams
        o  Focus on simplicity
             Deployed in production
             Solution of the winner of the Netflix Prize was very academic and not deployable in production
        o  Outlier detection
             Detecting “bad” nodes via unsupervised learning
        o  Detect performance regressions via time series analysis
             Performance impact of new features
             Red/Black deployments
        o  Characterize performance during A/B (bucket) testing
        o  Detect memory “leaks”
  • 5. GC: Quick review
      Generational garbage collector
        o  Objects are first allocated in Young Gen (YG)
        o  Objects whose age exceeds a given threshold are promoted to Old Gen (OG)
      GC Type
        o  Parallel
        o  CMS
        o  Recent: G1
  • 6. What About Using Existing Tools?
      AppDynamics
      GCHisto, GCViewer, PrintGCStats, JConsole
      Common limitations
        o  Absence of support for analyzing GC performance of a cluster of nodes
             Tailored for a single Java process
        o  Lack of statistical analysis
             Mean k-Nearest Neighbor for outlier detection
             Standard deviation
             Trend analysis
        o  Lack of support for G1 GC
        o  Most tools are no longer maintained
  • 7. Shrek: Analyzing Heap Usage
      Why bother?
        o  High performance variability in the cloud [Iosup et al., CCG, 2011]
        o  Potential reasons
        o  Nodes going bad [Hoelzle and Barroso 2009], [Dai et al.], [Vishwanath and Nagappan, SoCC, 2010]
        o  Multi-tenancy
        o  Load balancer issues
             AWS ELB issues on Dec 24, 2012 [http://aws.amazon.com/message/680587/]
        o  A/B Testing
        o  Cascading effects in a SOA
        o  Failover from another availability zone
  • 8. Shrek: Analyzing Heap Usage (contd.)
      Detect “bad”/outlier nodes
        o  Terminate them and spin up new ones
        o  Early detection results in minimum customer impact
        o  Example total heap usage time series output obtained via Shrek
  • 9. Shrek: Analyzing Heap Usage (contd.)
      Detect outliers
        o  k-NN unsupervised learning
      [Figure: per-node bar chart. y-axis: 10−4/(Avg Young Generation Use * Std Dev); x-axis: Node]
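The slide names k-NN unsupervised learning for outlier detection but does not spell out the scoring rule. The following is a minimal sketch, not Shrek's implementation: it assumes one scalar metric per node, scores each node by the mean distance to its k nearest neighbours, and flags nodes whose score exceeds a multiple of the median score. The function names, the cutoff rule, the `factor` default, and the sample values are all hypothetical.

```python
from statistics import median


def mean_knn_distance(values, k):
    """For each value, return the mean absolute distance to its k nearest neighbours."""
    if not 0 < k < len(values):
        raise ValueError("k must be between 1 and len(values) - 1")
    scores = []
    for i, v in enumerate(values):
        # Distances from this node to every other node, closest first.
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(values, k=3, factor=5.0):
    """Return indices whose k-NN score exceeds factor * median score (assumed rule)."""
    scores = mean_knn_distance(values, k)
    cutoff = factor * median(scores)
    return [i for i, s in enumerate(scores) if s > cutoff]


# Illustrative per-node metric values (not measurements); the last two nodes
# sit far away from the rest of the cluster.
node_metric = [3.95, 3.76, 3.77, 3.73, 3.70, 3.58, 3.57, 3.56, 0.33, 0.29]
print(flag_outliers(node_metric, k=3))  # prints [8, 9]
```

In this sketch k should be larger than the number of bad nodes expected to cluster together: with k=1 the two low nodes above would count each other as a close neighbour and neither would be flagged.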
  • 10. Shrek: Analyzing Heap Usage (contd.)
      Old Gen usage
        o  Driven by promotion rate
        o  Promotion rate may vary across nodes
             A/B testing
      Shrek also reports the YG usage time series
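To make the link between promotion and Old Gen growth concrete, the sketch below derives the promoted volume of a minor collection from occupancy figures, assuming total heap occupancy is the sum of Young Gen and Old Gen occupancy. The `MinorGC` record, its field names, and the sample numbers are hypothetical; how Shrek parses GC logs is not described in the slides.

```python
from dataclasses import dataclass


@dataclass
class MinorGC:
    """Occupancy before and after one minor collection, in KB (hypothetical record)."""
    young_before_kb: int
    young_after_kb: int
    heap_before_kb: int
    heap_after_kb: int


def promoted_kb(event):
    """Old Gen growth across the collection, i.e. the volume promoted out of Young Gen."""
    old_before = event.heap_before_kb - event.young_before_kb
    old_after = event.heap_after_kb - event.young_after_kb
    return old_after - old_before


def average_promotion_mb(events):
    """Mean promoted volume per minor collection, in MB; 0.0 for an empty list."""
    if not events:
        return 0.0
    return sum(promoted_kb(e) for e in events) / len(events) / 1024


# Made-up events for one node.
events = [
    MinorGC(629120, 69888, 1048576, 499584),
    MinorGC(629120, 69888, 1058816, 520064),
]
print(average_promotion_mb(events))  # prints 15.0
```

Computing this average per node gives one number that can be compared across a cluster, for example between the two buckets of an A/B test.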
  • 11. Shrek: Analyzing Pause Times
      Pause time analysis
        o  Data distribution of GC pause times
        o  Histogram plots supported by Shrek
             Initial Mark
             Remark
             Full GC Times
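A pause-time histogram of the kind listed above can be sketched with fixed-width bins. This is an illustration under assumptions, not Shrek's plotting code: pauses are taken as integer milliseconds for a single phase (for example Remark), and the function name, bin width, and sample values are hypothetical.

```python
def pause_histogram(pauses_ms, bin_width_ms):
    """Count pauses per fixed-width bin; returns {bin_start_ms: count}, lowest bin first."""
    if bin_width_ms <= 0:
        raise ValueError("bin_width_ms must be positive")
    counts = {}
    for p in pauses_ms:
        idx = p // bin_width_ms  # integer bin index
        counts[idx] = counts.get(idx, 0) + 1
    return {idx * bin_width_ms: counts[idx] for idx in sorted(counts)}


# Made-up pause times for one phase on one node, in milliseconds.
pauses_ms = [12, 18, 25, 31, 47, 52, 58, 110, 640]
for start, count in pause_histogram(pauses_ms, 50).items():
    # One text bar per non-empty bin, e.g. "   0-49   ms #####".
    print(f"{start:4d}-{start + 49:<4d} ms {'#' * count}")
```

Keeping one histogram per phase separates the long tail of Full GC times from the much shorter Initial Mark and Remark pauses.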
  • 12. Shrek: Summary Report
      Metrics reported for each node
        o  Minor GC count
        o  # Failures (concurrent mode failures) and Failure Time
             Not reported by any existing tool
        o  Initial Mark and Remark
        o  Average and Max YG (s)
        o  Average and Max Full GC (s)
        o  Average Promotion (MB)
             Not reported by any existing tool
      Summary report integrated with the in-house alerting system
        o  Assist in triaging production issues
      Recap
        o  Existing tools do not support GC analysis across an entire cluster
  • 13. Shrek: Detecting Memory “Leaks”
      Time series analysis of heap usage
        o  Upward sloping over multiple days
             Potential memory “leak”
        o  Predict heap usage trend
             Holt-Winters method for prediction
      Example from production
        o  Upward sloping
        o  Verified “leak” with the application team
        o  Orange region
             80% prediction level
        o  Yellow region
             95% prediction level
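The slide cites the Holt-Winters method for predicting the heap usage trend. The sketch below uses Holt's linear trend method (double exponential smoothing), which is the non-seasonal special case of Holt-Winters, and it does not compute the 80% and 95% prediction regions mentioned above. The function name, the smoothing parameters, and the sample series are assumptions for illustration only.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Forecast `horizon` future points with Holt's linear trend method.

    alpha smooths the level and beta smooths the trend; both lie in (0, 1].
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Initialise the level with the first point and the trend with the first difference.
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Made-up daily heap usage (MB) that creeps upward by about 10 MB per day.
heap_mb = [500, 510, 520, 530, 540]
forecast = holt_forecast(heap_mb, alpha=0.5, beta=0.3, horizon=3)
print([round(v) for v in forecast])
```

A forecast that keeps rising over several days is only a hint of a potential "leak"; as on the slide, the finding still has to be confirmed with the application team.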
  • 14. Wrapping up …
      Shrek: tool for GC analysis in the cloud
        o  Statistical analysis
        o  Detect performance regression
        o  “Bad”/outlier node detection
        o  Characterize performance of Red/Black deployments
        o  Memory “leak” detection
      Future work
        o  Integrate with Hive/… to limit pulling GC logs from production nodes to once only
        o  Support advanced analytics to guide tuning of GC parameters
  • 15. Q&A