Challenges and Uniqueness of QE and RE Processes in Hadoop
Jayant Mahajan
Grid Computing, Yahoo! Bangalore
Feb 2010

Agenda



 •   Quality Checks for a Patch in Hadoop
 •   Additional QE at Yahoo!
 •   Tools used for Hadoop QE and RE
 •   Challenges




Quality checks for a patch commit in Hadoop



  • Static Quality Analysis – Patch attached to Jira
     – Verify Findbugs warnings
     – Verify Javadoc warnings
     – Verify ReleaseAudit warnings
     – Verify whether unit tests were added
  • Committer Review
  • Unit Tests
     – JUnit
     – Mini MR Tests
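
The static checks on this slide act as a gate on a patch. A minimal sketch in Python of one way such a gate could be expressed, assuming the rule is "no check may report more warnings after the patch than before, and tests must be included". `PatchReport` and `failed_checks` are hypothetical names for illustration, not part of Hadoop's patch-testing scripts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatchReport:
    """Hypothetical summary of one patch run; all field names are illustrative."""
    findbugs_before: int
    findbugs_after: int
    javadoc_before: int
    javadoc_after: int
    release_audit_before: int
    release_audit_after: int
    tests_added: bool


def failed_checks(report: PatchReport) -> List[str]:
    """Return the names of the static checks that the patch fails.

    Assumption: a check fails when the warning count grows after the patch.
    """
    failures = []
    if report.findbugs_after > report.findbugs_before:
        failures.append("findbugs")
    if report.javadoc_after > report.javadoc_before:
        failures.append("javadoc")
    if report.release_audit_after > report.release_audit_before:
        failures.append("release audit")
    if not report.tests_added:
        failures.append("unit tests")
    return failures
```

An empty list means the patch can move on to committer review; anything else names the checks to fix first.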



Quality checks for a patch commit in Hadoop (Contd.)

  • COMMUNITY patch flow, in order
     – Jira raised (Development)
     – Patch attached to JIRA
     – JIRA status set to “Patch Available”
     – Patch picked up for testing – HUDSON
     – Committer Review
     – Commit to SVN
  • Patch testing on HUDSON
     – Static analysis – Findbugs
     – ReleaseAudit warning
     – Fast unit tests – TestNG
     – Fast contrib unit tests (if patching contrib)
  • Secondary Build
     – Static analysis – Findbugs
     – Jdiff
     – All Core unit tests with code coverage
     – All Contrib unit tests with code coverage



Additional QE @ Y! for Hadoop


  • We are the largest test team for Hadoop
  • More than 1000 nodes dedicated to QE
  • Hadoop testing at Yahoo
    – Patch testing
    – Automated Testing
    – Manual Testing




Additional QE @ Y! for Hadoop (Contd.)

  • COMMUNITY patch flow, in order
     – Jira raised (Development)
     – Patch attached to JIRA
     – JIRA status set to “Patch Available”
     – Patch picked up for testing – HUDSON
     – Commit to SVN
     – GIT
  • YAHOO! Test Environment
     – Manual Patch Testing
     – Manual Functional Testing
     – HUDSON – Benchmark and Automation
     – HUDSON Release Build
     – GIT Y!Hadoop



Tools used for Hadoop QE and RE

  •   Hudson            – Build automation
  •   SVN and GIT       – Source Code Mgmt (SCM)
  •   Ant & Ivy         – Build and Dependency Mgmt
  •   Checkstyle        – Coding-standard checker
  •   Clover            – Code coverage
  •   Forrest           – Documentation
  •   Jdiff             – Track API changes
  •   Findbugs          – Static analysis to find bugs
  •   JUnit             – Unit tests
  •   Bugzilla & Jira   – Issue Tracking

Hudson



 •   Hudson is a Continuous Integration server used to
     execute and monitor jobs (Hudson jobs)
 •   Used for:
     – Build
     – Unit Tests
     – Deployment
     – Validation Jobs
     – Automated tests
 •   http://hudson-ci.org/


Challenges in Hadoop QE and RE

  • Reliability
     – Loss of nodes
     – Data corruption
     – Loss of data blocks
  • Scale
     – Network issues
     – Disk issues
  • Performance
  • Corner cases
  • Repeatability
     – Deployment
     – Continuous Integration


Reliability


   • MapReduce Reliability
     – Failing tasks
     – Lost TaskTrackers (TTs)

   • HDFS Reliability
     – Bringing a rack down
     – Corrupting data blocks
     – Loss of data blocks
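
The "bringing a rack down" test can be reasoned about with a toy model. The Python sketch below is only an illustration: the placement map and the `lost_blocks` helper are hypothetical and do not reflect HDFS's real replica placement policy or any HDFS API. It shows the question such a test answers: after a rack fails, which blocks have no surviving replica?

```python
from typing import Dict, List, Set, Tuple

# A replica location as (rack, node); both are plain illustrative labels.
Replica = Tuple[str, str]


def lost_blocks(placement: Dict[str, List[Replica]],
                failed_racks: Set[str]) -> List[str]:
    """Return the ids of blocks with no surviving replica once the racks fail."""
    lost = []
    for block_id, replicas in placement.items():
        survivors = [r for r in replicas if r[0] not in failed_racks]
        if not survivors:
            lost.append(block_id)
    return sorted(lost)


# Toy placement: blk_1 spans two racks, blk_2 lives on a single rack.
placement = {
    "blk_1": [("rack1", "n1"), ("rack2", "n4"), ("rack2", "n5")],
    "blk_2": [("rack1", "n1"), ("rack1", "n2")],
}
print(lost_blocks(placement, {"rack1"}))  # only blk_2 has no replica left
```

In this toy setup a block survives a rack failure only if at least one replica sits on another rack, which is the property a rack-down test is meant to exercise.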




Scale


  • Testing at scale when hardware resources are limited
  • If we want more nodes for testing, what will we do?
    – Use simulation
        ▪ DataNode simulation
        ▪ TaskTracker simulation
    – For example
        ▪ We need a 3000-node cluster environment
        ▪ Run 3 instances each of TT and DN per node on a 1000-node cluster
        ▪ This simulates an environment equivalent to a 3000-node cluster
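
The arithmetic behind the example is simple enough to sketch. The two helpers below are hypothetical, written only to make the sizing rule explicit: simulated nodes equal physical nodes times TT/DN instances per node, and the instance count needed is the target size divided by the physical size, rounded up.

```python
import math


def simulated_cluster_size(physical_nodes: int, instances_per_node: int) -> int:
    """Number of simulated TT/DN pairs when each node runs several instances."""
    return physical_nodes * instances_per_node


def instances_needed(target_nodes: int, physical_nodes: int) -> int:
    """Instances per physical node required to reach the target size."""
    return math.ceil(target_nodes / physical_nodes)


# The slide's example: 3 instances per node on 1000 nodes.
print(simulated_cluster_size(1000, 3))
print(instances_needed(3000, 1000))
```

Rounding up matters when the target is not a multiple of the physical size: a 2500-node target on 1000 physical nodes still needs 3 instances per node.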




Performance


  • Benchmark execution on 20 and 500 nodes
    – E.g.: Sort, Shuffle, DFSIO

  • GridMix
    – V1 - A standard mix of MR jobs of varying types and sizes,
      measuring throughput on a cluster
    – V2 - Customized mix of MR jobs where the number of
      small/medium/large jobs can be controlled
    – V3
       ▪ It simulates user load patterns.
       ▪ Workload is generated from job history trace analysis
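
To illustrate the idea of a controllable job mix (the V2 bullet), here is a small Python sketch. It is not GridMix code and does not use GridMix's configuration format; `build_job_mix` is a hypothetical helper that turns per-size job counts into a shuffled submission order, with a fixed seed so a run can be repeated.

```python
import random
from collections import Counter
from typing import Dict, List


def build_job_mix(counts: Dict[str, int], seed: int = 0) -> List[str]:
    """Expand per-size job counts into a reproducibly shuffled job list."""
    jobs = [size for size, n in counts.items() for _ in range(n)]
    random.Random(seed).shuffle(jobs)  # seeded, so the order is repeatable
    return jobs


mix = build_job_mix({"small": 5, "medium": 3, "large": 1})
print(Counter(mix))
```

Changing the counts changes the load profile, while keeping the seed fixed keeps two benchmark runs comparable.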



Corner Cases

  • Challenges in reproducing a problem related to
    –   Timing issues
    –   Race conditions
    –   Out-of-memory issues
    –   The exact environment where it occurred

  • AspectJ
    –   AspectJ taps into source code and can run simulated scenarios
        before/after/during a method.
    –   It can reproduce timing issues by introducing sleep statements.
    –   It can reproduce out-of-memory issues by reducing the memory
        available during run time.
    –   Exact environments can be reproduced by changing the configs of
        the jobs on the go, when the exact configuration is not possible
        to replicate.
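
The "sleep before/after a method" idea can be sketched without AspectJ. The example below swaps in a plain Python decorator in place of an AspectJ aspect, purely to show the shape of the technique; `inject_delay` and `heartbeat` are hypothetical names, and real fault injection in Hadoop would weave advice into Java code instead.

```python
import functools
import time


def inject_delay(before: float = 0.0, after: float = 0.0):
    """Wrap a function so it sleeps before and/or after the real call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            time.sleep(before)       # widen the window before the call
            try:
                return func(*args, **kwargs)
            finally:
                time.sleep(after)    # delay the return to the caller
        return wrapper
    return decorator


@inject_delay(before=0.05)
def heartbeat(tracker_id: str) -> str:
    """Stand-in for a method whose timing we want to disturb."""
    return f"heartbeat from {tracker_id}"


print(heartbeat("tt-1"))
```

The wrapped method keeps its behavior and return value; only its timing changes, which is what makes an otherwise rare ordering easier to hit in a test.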

Repeatability - Deployment



  •   Deployment Challenges
      –   Deploying on a multi-node cluster
      –   Deciding on a JTNode and NameNode
      –   Building configurations for a variety of clusters


  •   Solution
      –   YUM repo for deployment
      –   Backup host for JTNode and NameNode
      –   Source code build & configuration build



Repeatability - CI



  • Continuous Integration, aka CI
     – Software development process where members of the team
       integrate their work frequently, usually daily
     – Every integration is verified by an automated build (including
       tests) to detect integration errors as quickly as possible.


  • CI @ Y!
     – Commit build
     – Secondary build
     – Secondary smoke test build
     – Automated deployment

Thank you



Hadoop Summit 2010 Challenges And Uniqueness Of Qe And Re Processes In Hadoop
