SlideShare una empresa de Scribd logo
1 de 15
Descargar para leer sin conexión
Hadoop Testing
Workshop
Ophir Cohen
Data Platform Leader,
ophirc@liveperson.com
July 2013
Agenda
1. Connection Before Content
2. Testing Fundamental
3. Unit Tests
4. Integration Tests
5. Try it out
6. Performance
7. Diagnostics
Why Testing
1. Catch bugs early in the developing cycle
2. Transparency of current project status
3. Easy developing / refactoring: immediate feedback
4. Push developer to provide better and stable code
5. Decrease developing cycle times
Why Automatic Testing?
It isn't real question right?
Testing Fundamental
1. Unit testing - functional verification of each 'unit' (method /
class in Java)
2. Integration testing - verifies that the system works as a
whole
3. Performance testing - test the efficiency of the program.
Deepened by code AND cluster architecture
4. Diagnostic - the way to find problems in production.
--> 1 + 2 should be done BEFORE production
Unit Tests
Key Features
1. Simple (up to 10 lines)
2. Isolation (no DB connection, no cluster dependency etc...)
3. Deterministics - PASS or FAIL
4. Automated (of course)
Why Unit Tests
1. Prevent regression
2. Fast - no need of full MR env
3. Help in refactoring and updates
Unit Tests - MR jobs
Best Practices
1. Extract the tested code into isolated method/class
2. Do not test MR framework but pure Java
3. Use the same package for tests
MRUnit
1. Lib for MR unit tests
2. Apache project
3. Supports testing of mappers, reducers and full job (without full
cluster)
4. Supports counters testing (nice!)
Unit Tests - Examples
Unit Tests Code Example
Integration Tests - background
1. Unit tests test each unit (Mapper/Reducer), integration
test the integrated work
2. Test the integration with the framework
3. Does not limited by data volumes
Integration Tests - tips and tricks
Tips and tricks
1. Use MiniMRCluster / MiniDFSCluster for tests
2. Use Linux
3. Make dev == production
4. Use data sampling:
a. Random sampling
b. Biased sampling
5. Apache BigTop (never try that)
6. Use Cloudera CDH
Lets play a bit
1. Checkout the code:
git clone https://github.com/ophchu/mapreduce-tutorials.git
2. Make sure you manage to run the mapper test
3. Complete the MRUnit tests for the reducer and full job
4. Play with the MiniMRCluster/MiniDFSCluster test
Performance
Profiling (at a glance...)
1. Profile your code
2. Measure and tune what's matters to you
3. Benchmarking: micro and macro
4. Hadoop has a built-in profiler (e.g. using hprof)
Cluster Performance
1. Terasort test
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples-2.0.0-mr1-cdh4.
1.2.jar teragen 1000 /user/dataint/terasort/input
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples-2.0.0-mr1-cdh4.
1.2.jar terasort /user/dataint/terasort/input /user/dataint/terasort/output
2. MRBench - MR benchmarking
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-test.jar mrbench -numRuns 2
-maps 10 -reduces 10 -inputLines 100 -inputType random
3. NNBench - Name Node benchmarking
4. TestDFSIO - write and read performance
Diagnostics
1. Check web API (http://your_server:50030/jobtracker.jsp):
a. Nodes: how many up, how many down, check slots
b. Jobs: logs, failures, exceptions
c. Counters: expected
2. Configuration:
a. check job conf (job.xml)
b. Check env conf (http://your_server:50030/conf)
3. Jobs history (http://your_server:50030/jobhistory.jsp)
4. Log dirs:
a. Job tracker (http://your_server:50030/logs/)
b. Task trakcers
Thanks
● ophchu@gmail.com
● @ophchu
Thanks

Más contenido relacionado

La actualidad más candente

QuerySurge for DevOps
QuerySurge for DevOpsQuerySurge for DevOps
QuerySurge for DevOpsRTTS
 
How to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest GroupHow to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest GroupQualitest
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessRTTS
 
QuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solutionQuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solutionRTTS
 
Implementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing ProjectImplementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing ProjectRTTS
 
Query Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingQuery Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingRTTS
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastImpetus Technologies
 
Leveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaLeveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaRTTS
 
Testing Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of HadoopTesting Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of HadoopRTTS
 
Whitepaper: Volume Testing Thick Clients and Databases
Whitepaper:  Volume Testing Thick Clients and DatabasesWhitepaper:  Volume Testing Thick Clients and Databases
Whitepaper: Volume Testing Thick Clients and DatabasesRTTS
 
Data Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryData Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryRTTS
 
Creating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyCreating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyRTTS
 
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...RTTS
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurgeRTTS
 
A data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madisonA data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madisonTerry Bunio
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarRTTS
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World DistilledRTTS
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?RTTS
 
RTTS - the Software Quality Experts
RTTS - the Software Quality ExpertsRTTS - the Software Quality Experts
RTTS - the Software Quality ExpertsRTTS
 

La actualidad más candente (20)

QuerySurge for DevOps
QuerySurge for DevOpsQuerySurge for DevOps
QuerySurge for DevOps
 
How to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest GroupHow to Test Big Data Systems | QualiTest Group
How to Test Big Data Systems | QualiTest Group
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = Success
 
QuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solutionQuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solution
 
Implementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing ProjectImplementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing Project
 
Query Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingQuery Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programming
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus Webcast
 
Leveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaLeveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE Vertica
 
Testing Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of HadoopTesting Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of Hadoop
 
Whitepaper: Volume Testing Thick Clients and Databases
Whitepaper:  Volume Testing Thick Clients and DatabasesWhitepaper:  Volume Testing Thick Clients and Databases
Whitepaper: Volume Testing Thick Clients and Databases
 
Data Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryData Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical Industry
 
Creating a Data validation and Testing Strategy
Creating a Data validation and Testing StrategyCreating a Data validation and Testing Strategy
Creating a Data validation and Testing Strategy
 
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
 
A data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madisonA data driven etl test framework sqlsat madison
A data driven etl test framework sqlsat madison
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World Distilled
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?
 
RTTS - the Software Quality Experts
RTTS - the Software Quality ExpertsRTTS - the Software Quality Experts
RTTS - the Software Quality Experts
 
Test Automation for Data Warehouses
Test Automation for Data Warehouses Test Automation for Data Warehouses
Test Automation for Data Warehouses
 

Similar a Hadoop testing workshop - july 2013

ScalaUA - distage: Staged Dependency Injection
ScalaUA - distage: Staged Dependency InjectionScalaUA - distage: Staged Dependency Injection
ScalaUA - distage: Staged Dependency Injection7mind
 
Unit testing using Munit Part 1
Unit testing using Munit Part 1Unit testing using Munit Part 1
Unit testing using Munit Part 1Anand kalla
 
Automated Software Testing Framework Training by Quontra Solutions
Automated Software Testing Framework Training by Quontra SolutionsAutomated Software Testing Framework Training by Quontra Solutions
Automated Software Testing Framework Training by Quontra SolutionsQuontra Solutions
 
JAVASCRIPT Test Driven Development & Jasmine
JAVASCRIPT Test Driven Development & JasmineJAVASCRIPT Test Driven Development & Jasmine
JAVASCRIPT Test Driven Development & JasmineAnup Singh
 
Developers Testing - Girl Code at bloomon
Developers Testing - Girl Code at bloomonDevelopers Testing - Girl Code at bloomon
Developers Testing - Girl Code at bloomonIneke Scheffers
 
2014 Joker - Integration Testing from the Trenches
2014 Joker - Integration Testing from the Trenches2014 Joker - Integration Testing from the Trenches
2014 Joker - Integration Testing from the TrenchesNicolas Fränkel
 
SynapseIndia drupal presentation on drupal info
SynapseIndia drupal  presentation on drupal infoSynapseIndia drupal  presentation on drupal info
SynapseIndia drupal presentation on drupal infoSynapseindiappsdevelopment
 
Drupalcamp Simpletest
Drupalcamp SimpletestDrupalcamp Simpletest
Drupalcamp Simpletestlyricnz
 
Testing In Drupal
Testing In DrupalTesting In Drupal
Testing In DrupalRyan Cross
 
[ENGLISH] TDC 2015 - PHP Trail - Tests and PHP Continuous Integration Enviro...
[ENGLISH] TDC 2015 - PHP  Trail - Tests and PHP Continuous Integration Enviro...[ENGLISH] TDC 2015 - PHP  Trail - Tests and PHP Continuous Integration Enviro...
[ENGLISH] TDC 2015 - PHP Trail - Tests and PHP Continuous Integration Enviro...Bruno Tanoue
 
Agile Engineering Sparker GLASScon 2015
Agile Engineering Sparker GLASScon 2015Agile Engineering Sparker GLASScon 2015
Agile Engineering Sparker GLASScon 2015Stephen Ritchie
 
(Agile) engineering best practices - What every project manager should know
(Agile) engineering best practices - What every project manager should know(Agile) engineering best practices - What every project manager should know
(Agile) engineering best practices - What every project manager should knowRichard Cheng
 
TDD for joomla extensions
TDD for joomla extensionsTDD for joomla extensions
TDD for joomla extensionsRoberto Segura
 
Agile Engineering Best Practices by Richard Cheng
Agile Engineering Best Practices by Richard ChengAgile Engineering Best Practices by Richard Cheng
Agile Engineering Best Practices by Richard ChengExcella
 
Simple test drupal7_presentation_la_drupal_jul21-2010
Simple test drupal7_presentation_la_drupal_jul21-2010Simple test drupal7_presentation_la_drupal_jul21-2010
Simple test drupal7_presentation_la_drupal_jul21-2010Miguel Hernandez
 
Testing & should i do it
Testing & should i do itTesting & should i do it
Testing & should i do itMartin Sykora
 

Similar a Hadoop testing workshop - july 2013 (20)

ScalaUA - distage: Staged Dependency Injection
ScalaUA - distage: Staged Dependency InjectionScalaUA - distage: Staged Dependency Injection
ScalaUA - distage: Staged Dependency Injection
 
Unit testing using Munit Part 1
Unit testing using Munit Part 1Unit testing using Munit Part 1
Unit testing using Munit Part 1
 
Automated Software Testing Framework Training by Quontra Solutions
Automated Software Testing Framework Training by Quontra SolutionsAutomated Software Testing Framework Training by Quontra Solutions
Automated Software Testing Framework Training by Quontra Solutions
 
JAVASCRIPT Test Driven Development & Jasmine
JAVASCRIPT Test Driven Development & JasmineJAVASCRIPT Test Driven Development & Jasmine
JAVASCRIPT Test Driven Development & Jasmine
 
TDD Workshop UTN 2012
TDD Workshop UTN 2012TDD Workshop UTN 2012
TDD Workshop UTN 2012
 
Developers Testing - Girl Code at bloomon
Developers Testing - Girl Code at bloomonDevelopers Testing - Girl Code at bloomon
Developers Testing - Girl Code at bloomon
 
2014 Joker - Integration Testing from the Trenches
2014 Joker - Integration Testing from the Trenches2014 Joker - Integration Testing from the Trenches
2014 Joker - Integration Testing from the Trenches
 
SynapseIndia drupal presentation on drupal info
SynapseIndia drupal  presentation on drupal infoSynapseIndia drupal  presentation on drupal info
SynapseIndia drupal presentation on drupal info
 
Drupalcamp Simpletest
Drupalcamp SimpletestDrupalcamp Simpletest
Drupalcamp Simpletest
 
Testing In Drupal
Testing In DrupalTesting In Drupal
Testing In Drupal
 
The Test way
The Test wayThe Test way
The Test way
 
[ENGLISH] TDC 2015 - PHP Trail - Tests and PHP Continuous Integration Enviro...
[ENGLISH] TDC 2015 - PHP  Trail - Tests and PHP Continuous Integration Enviro...[ENGLISH] TDC 2015 - PHP  Trail - Tests and PHP Continuous Integration Enviro...
[ENGLISH] TDC 2015 - PHP Trail - Tests and PHP Continuous Integration Enviro...
 
Agile Engineering Sparker GLASScon 2015
Agile Engineering Sparker GLASScon 2015Agile Engineering Sparker GLASScon 2015
Agile Engineering Sparker GLASScon 2015
 
(Agile) engineering best practices - What every project manager should know
(Agile) engineering best practices - What every project manager should know(Agile) engineering best practices - What every project manager should know
(Agile) engineering best practices - What every project manager should know
 
TDD for joomla extensions
TDD for joomla extensionsTDD for joomla extensions
TDD for joomla extensions
 
Python and test
Python and testPython and test
Python and test
 
Agile Engineering Best Practices by Richard Cheng
Agile Engineering Best Practices by Richard ChengAgile Engineering Best Practices by Richard Cheng
Agile Engineering Best Practices by Richard Cheng
 
Testing 101
Testing 101Testing 101
Testing 101
 
Simple test drupal7_presentation_la_drupal_jul21-2010
Simple test drupal7_presentation_la_drupal_jul21-2010Simple test drupal7_presentation_la_drupal_jul21-2010
Simple test drupal7_presentation_la_drupal_jul21-2010
 
Testing & should i do it
Testing & should i do itTesting & should i do it
Testing & should i do it
 

Último

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Último (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Hadoop testing workshop - july 2013

  • 1. Hadoop Testing Workshop Ophir Cohen Data Platform Leader, ophirc@liveperson.com July 2013
  • 2. Agenda 1. Connection Before Content 2. Testing Fundamental 3. Unit Tests 4. Integration Tests 5. Try it out 6. Performance 7. Diagnostics
  • 3. Why Testing 1. Catch bugs early in the developing cycle 2. Transparency of current project status 3. Easy developing / refactoring: immediate feedback 4. Push developer to provide better and stable code 5. Decrease developing cycle times
  • 4. Why Automatic Testing? It isn't real question right?
  • 5. Testing Fundamental 1. Unit testing - functional verification of each 'unit' (method / class in Java) 2. Integration testing - verifies that the system works as a whole 3. Performance testing - test the efficiency of the program. Deepened by code AND cluster architecture 4. Diagnostic - the way to find problems in production. --> 1 + 2 should be done BEFORE production
  • 6. Unit Tests Key Features 1. Simple (up to 10 lines) 2. Isolation (no DB connection, no cluster dependency etc...) 3. Deterministics - PASS or FAIL 4. Automated (of course) Why Unit Tests 1. Prevent regression 2. Fast - no need of full MR env 3. Help in refactoring and updates
  • 7. Unit Tests - MR jobs Best Practices 1. Extract the tested code into isolated method/class 2. Do not test MR framework but pure Java 3. Use the same package for tests MRUnit 1. Lib for MR unit tests 2. Apache project 3. Supports testing of mappers, reducers and full job (without full cluster) 4. Supports counters testing (nice!)
  • 8. Unit Tests - Examples Unit Tests Code Example
  • 9. Integration Tests - background 1. Unit tests test each unit (Mapper/Reducer), integration test the integrated work 2. Test the integration with the framework 3. Does not limited by data volumes
  • 10. Integration Tests - tips and tricks Tips and tricks 1. Use MiniMRCluster / MiniDFSCluster for tests 2. Use Linux 3. Make dev == production 4. Use data sampling: a. Random sampling b. Biased sampling 5. Apache BigTop (never try that) 6. Use Cloudera CDH
  • 11. Lets play a bit 1. Checkout the code: git clone https://github.com/ophchu/mapreduce-tutorials.git 2. Make sure you manage to run the mapper test 3. Complete the MRUnit tests for the reducer and full job 4. Play with the MiniMRCluster/MiniDFSCluster test
  • 12. Performance Profiling (at a glance...) 1. Profile your code 2. Measure and tune what's matters to you 3. Benchmarking: micro and macro 4. Hadoop has a built-in profiler (e.g. using hprof)
  • 13. Cluster Performance 1. Terasort test hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples-2.0.0-mr1-cdh4. 1.2.jar teragen 1000 /user/dataint/terasort/input hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples-2.0.0-mr1-cdh4. 1.2.jar terasort /user/dataint/terasort/input /user/dataint/terasort/output 2. MRBench - MR benchmarking hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-test.jar mrbench -numRuns 2 -maps 10 -reduces 10 -inputLines 100 -inputType random 3. NNBench - Name Node benchmarking 4. TestDFSIO - write and read performance
  • 14. Diagnostics 1. Check web API (http://your_server:50030/jobtracker.jsp): a. Nodes: how many up, how many down, check slots b. Jobs: logs, failures, exceptions c. Counters: expected 2. Configuration: a. check job conf (job.xml) b. Check env conf (http://your_server:50030/conf) 3. Jobs history (http://your_server:50030/jobhistory.jsp) 4. Log dirs: a. Job tracker (http://your_server:50030/logs/) b. Task trakcers