Continuous Integration
on top of Hadoop
Wisely Chen & Neal Lee
Tuesday, June 11, 13
Agenda
• Who I am
• Problem
• Solution
• Demo
• Q&A
Who I am
• Wisely Chen ( thegiive@gmail.com )
• Release manager for the Yahoo![Taiwan] shopping and data team
• Loves promoting open source tech in Taiwan
• Ruby and Rails: Coscup 2006, Ubisunrise 2007, OSDC 2007
• Puppet: PHPConf 2012, RubyConf 2012
• Release Practice: Webconf 2013, Coscup 2012
Who I am
• Neal Lee (@neal_lee)
• Data Engineer at Yahoo![Taiwan]
• Aiming to build an easy-to-use, self-service BI platform connected to Hadoop.
Story 1
Another Story
Yet Another Story
Solution
One click
• Manually commit code to SCM
• And DONE
• Automatic unit testing
• Automatic push to beta for performance testing
• Automatic push to the production grid
• Automatic triggering of the code
This feels great!
Continuous Integration
Continuous Integration
• A software engineering practice
• Maintain a code repository
• Automate the build
• Make the build self-testing
• Everyone commits to the baseline every day
• Every commit should be built
• Test in a clone of the production environment
• Make it easy to get the latest deliverables
• Everyone can see the results of the latest build
• Automate deployment
We focus on
• A software engineering practice
• Maintain a code repository
• Automate the build
• Make the build self-testing
• Everyone commits to the baseline every day
• Every commit should be built
• Test in a clone of the production environment
• Make it easy to get the latest deliverables
• Everyone can see the results of the latest build
• Automate deployment
CI flow
(Diagram: People, SCM, CI Master, and CI slave, across the DEV, Alpha, Beta Grid, and Prod Grid environments)
1. Commit code
2. Notify CI
3. Call
4. CI slave exec local UnitTest
5. Deploy
6. Call
7. CI slave exec Performance test
8. Deploy
9. git tag
10. Call
11. CI exec pig
12. Notify user
CI flow
(Same diagram, numbered only up to step 5)
1. Commit code
2. Notify CI
3. Call
4. CI slave exec local UnitTest
5. Notify user
Unnumbered on this slide: CI slave exec Performance test, CI exec pig
CI flow
(Same diagram, numbered only up to step 8)
1. Commit code
2. Notify CI
3. Call
4. CI slave exec local UnitTest
5. Deploy
6. Call
7. CI slave exec Performance test
8. Notify user
Unnumbered on this slide: CI exec pig
CI flow
(Same diagram, numbered only up to step 9)
1. Commit code
2. Notify CI
3. Call
4. CI slave exec local UnitTest
5. Deploy
6. Call
7. CI slave exec Performance test
8. Deploy
9. Notify user
Unnumbered on this slide: CI exec pig
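A minimal Python sketch of the flow on these slides. It assumes that the shorter variants, which end with "Notify user" at step 5, 8, or 9, show the pipeline stopping early when a stage fails; the slides do not state this. `Stage`, `run_pipeline`, `build_stages`, and the stage names are hypothetical, and real stages would call the CI server, the grid, and git instead of returning fixed booleans.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage succeeds


def run_pipeline(stages: List[Stage],
                 notify: Callable[[str], None]) -> Tuple[bool, List[str]]:
    """Run stages in order, stop at the first failure, and always notify the user."""
    executed: List[str] = []
    for stage in stages:
        executed.append(stage.name)
        if not stage.run():
            notify(f"FAILED at {stage.name}")
            return False, executed
    notify("SUCCESS")
    return True, executed


def build_stages(unit_ok: bool = True, perf_ok: bool = True) -> List[Stage]:
    """Hypothetical stages in the order shown on the full 12-step slide."""
    return [
        Stage("unit test (local)", lambda: unit_ok),
        Stage("deploy to Beta Grid", lambda: True),
        Stage("performance test", lambda: perf_ok),
        Stage("deploy to Prod Grid", lambda: True),
        Stage("git tag", lambda: True),
        Stage("exec pig", lambda: True),
    ]
```

Calling `run_pipeline(build_stages(perf_ok=False), print)` would run the first three stages, print a failure message naming the performance test, and skip both the production deploy and the Pig run.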
Unit Test
PigUnit
• A simple xUnit framework
• No cluster setup is required in local mode
• Unit testing, regression testing, and rapid
prototyping on the fly
Using PigUnit
• Coding
• Write a PigUnit test case
• Run the PigUnit test locally
• Push to the grid
• Run Pig on the grid
• Get the right result!
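The "write a test case, run it locally" steps can be illustrated with a small Python stand-in. This is not PigUnit, which is a Java library that runs real Pig scripts; the sketch only mirrors the pattern of feeding a few inline input rows to a transformation and asserting the expected output rows without a cluster. `top_queries` is a hypothetical function playing the role of a Pig script that groups, sums, orders, and limits.

```python
import unittest
from collections import defaultdict


def top_queries(rows, n):
    """Stand-in for a Pig script: GROUP BY query, SUM(count), ORDER BY total DESC, LIMIT n."""
    totals = defaultdict(int)
    for query, count in rows:
        totals[query] += count
    # Sort by total descending, then by name so that ties are deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


class TopQueriesTest(unittest.TestCase):
    def test_top_two(self):
        # Small inline input, in the spirit of a local-mode test: no cluster needed.
        rows = [
            ("yahoo", 10), ("twitter", 7), ("facebook", 10),
            ("yahoo", 15), ("facebook", 5), ("twitter", 2),
        ]
        self.assertEqual(top_queries(rows, 2), [("yahoo", 25), ("facebook", 15)])
```

Saved as a module, the test can be run with `python -m unittest`, which is the kind of fast local check a CI slave would execute before anything is pushed to the grid.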
Unit tests are living documentation
• A unit test is runnable, living documentation
• Passing the test cases shows that previous
requirements are still met
Performance Test
Vaidya
• Rule-based performance diagnosis of M/R jobs
• Extensible framework
• You can add your own rules
• Write complex rules using existing rules
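The idea of rule-based diagnosis with composable rules can be sketched in Python. This is not Vaidya's API (Vaidya is a Java tool that reads Hadoop job history and configuration); the `Rule` class, the counter names, the formulas, and the thresholds below are all hypothetical and chosen only to show how a complex rule can be built from existing ones.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Counters = Dict[str, float]


def clamp(x: float) -> float:
    """Keep an impact score inside the range 0.0 to 1.0."""
    return max(0.0, min(1.0, x))


@dataclass
class Rule:
    name: str
    impact: Callable[[Counters], float]  # 0.0 = no problem, 1.0 = severe
    threshold: float                     # the rule fires when impact >= threshold
    advice: str


def all_of(name: str, rules: List[Rule], threshold: float, advice: str) -> Rule:
    """Composite rule built from existing rules: its impact is the weakest member impact."""
    return Rule(name, lambda c: min(r.impact(c) for r in rules), threshold, advice)


def diagnose(rules: List[Rule], counters: Counters) -> List[Tuple[str, float, str]]:
    """Return (name, impact, advice) for every rule that fires."""
    findings = []
    for rule in rules:
        score = rule.impact(counters)
        if score >= rule.threshold:
            findings.append((rule.name, score, rule.advice))
    return findings


# Hypothetical rules over made-up counter names.
spill = Rule(
    "map_side_spill",
    lambda c: clamp((c["spilled_records"] / c["map_output_records"] - 1.0) / 2.0),
    0.5,
    "Map output is spilled to disk repeatedly; consider a larger sort buffer.",
)
skew = Rule(
    "reduce_skew",
    lambda c: clamp(1.0 - c["avg_reduce_input"] / c["max_reduce_input"]),
    0.5,
    "One reducer receives far more input than average; check the partitioning.",
)
both = all_of("spill_and_skew", [spill, skew], 0.6,
              "Spill and skew occur together; fix the partitioning first.")
RULES = [spill, skew, both]
```

In a CI performance stage, a non-empty result from `diagnose(RULES, counters)` could be treated as a failed build, so a slow job is caught on the beta grid before it reaches production.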
CI toolset
(Diagram: the CI flow, annotated with the tools used at each step. Tool names that survive as text: Vaidya, BASH.)
CI is flexible
• MapReduce can use MRUnit
• Hive can use hive_test
• Pig can use PigUnit
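One way to keep the same pipeline while swapping the unit test step is a small lookup from job type to framework. The sketch below is hypothetical: the file-extension mapping is an assumption, and the MapReduce entry uses Apache MRUnit on the assumption that this is the tool the slide refers to.

```python
import os

# Framework per job type, following the slide (MRUnit is assumed for MapReduce).
FRAMEWORK_BY_JOB_TYPE = {
    "mapreduce": "MRUnit",
    "hive": "hive_test",
    "pig": "PigUnit",
}

# Assumption: the job type is inferred from the file extension of the job source.
JOB_TYPE_BY_EXTENSION = {
    ".java": "mapreduce",
    ".q": "hive",
    ".hql": "hive",
    ".pig": "pig",
}


def framework_for(path: str) -> str:
    """Return the unit test framework name for a job source file."""
    ext = os.path.splitext(path)[1].lower()
    job_type = JOB_TYPE_BY_EXTENSION.get(ext)
    if job_type is None:
        raise ValueError(f"no unit test framework registered for {path!r}")
    return FRAMEWORK_BY_JOB_TYPE[job_type]
```

The rest of the flow (deploy, performance test, tag, notify) does not need to know which framework ran, which is what makes the process reusable across job types.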
Github trigger CI
CI testing build pipeline
Testing Trend
DEMO
Conclusion
• Automated testing will save your life
• CI will boost your productivity
• This process can be applied to any platform
Thank you, everyone
Hadoop Summit 2013: Continuous Integration on top of Hadoop
