Hadoop Testing Workshop - July 2013


Ophir Cohen

Presentation from the July 2013 workshop on how to test, monitor and profile MapReduce jobs


Hadoop Testing
Workshop
Ophir Cohen
Data Platform Leader,
ophirc@liveperson.com
July 2013

Agenda
1. Connection Before Content
2. Testing Fundamentals
3. Unit Tests
4. Integration Tests
5. Try it out
6. Performance
7. Diagnostics

Why Testing
1. Catch bugs early in the development cycle
2. Transparency of current project status
3. Easier development / refactoring: immediate feedback
4. Pushes developers to write better, more stable code
5. Shorter development cycles

Why Automatic Testing?
It isn't really a question, right?

Testing Fundamentals
1. Unit testing - functional verification of each 'unit' (method / class in Java)
2. Integration testing - verifies that the system works as a whole
3. Performance testing - tests the efficiency of the program. Depends on code AND cluster architecture
4. Diagnostics - the way to find problems in production
--> 1 + 2 should be done BEFORE production

Unit Tests
Key Features
1. Simple (up to 10 lines)
2. Isolation (no DB connection, no cluster dependency etc...)
3. Deterministic - PASS or FAIL
4. Automated (of course)
Why Unit Tests
1. Prevent regressions
2. Fast - no need for a full MR env
3. Help with refactoring and updates

Unit Tests - MR jobs
Best Practices
1. Extract the tested code into an isolated method/class
2. Test the pure Java code, not the MR framework
3. Use the same package for tests
MRUnit
1. Library for MR unit tests
2. Apache project
3. Supports testing of mappers, reducers and the full job (without a full cluster)
4. Supports counters testing (nice!)

Unit Tests - Examples
Unit Tests Code Example
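
A minimal sketch of the "extract the tested code" practice, assuming a word-count job. It is plain Java with no Hadoop or MRUnit dependency. The class `WordCountLogic`, its `tokenize` and `sum` methods, and the word-count scenario are all hypothetical names chosen for illustration; they are not taken from the workshop repository. The idea is that a mapper's `map()` would delegate to `tokenize` and a reducer's `reduce()` to `sum`, so both can be checked without a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical example: logic pulled out of a word-count Mapper/Reducer
// so it can be tested as plain Java, without any Hadoop classes.
final class WordCountLogic {

    private WordCountLogic() { }

    // What a Mapper.map() would call for each input line:
    // lower-case the line and split it into non-empty words.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        if (line == null) {
            return words;
        }
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // What a Reducer.reduce() would call for the values of one key.
    static long sum(Iterable<Long> counts) {
        long total = 0;
        for (long count : counts) {
            total += count;
        }
        return total;
    }
}

// A short, isolated, deterministic check: no cluster, no DB, PASS or FAIL.
final class WordCountLogicCheck {
    public static void main(String[] args) {
        List<String> words = WordCountLogic.tokenize("Hello, hadoop! Hello");
        if (!words.equals(List.of("hello", "hadoop", "hello"))) {
            throw new AssertionError("unexpected tokens: " + words);
        }
        if (WordCountLogic.sum(List.of(1L, 1L, 3L)) != 5L) {
            throw new AssertionError("unexpected sum");
        }
        System.out.println("PASS");
    }
}
```

In a real project the check class would normally be a JUnit test living in the same package as the code under test, and MRUnit drivers would cover the mapper and reducer wiring.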

Integration Tests - background
1. Unit tests test each unit (Mapper/Reducer); integration tests test the units working together
2. Test the integration with the framework
3. Not limited by data volumes

Integration Tests - tips and tricks
Tips and tricks
1. Use MiniMRCluster / MiniDFSCluster for tests
2. Use Linux
3. Make dev == production
4. Use data sampling:
a. Random sampling
b. Biased sampling
5. Apache BigTop (never try that)
6. Use Cloudera CDH

Let's play a bit
1. Check out the code:
git clone https://github.com/ophchu/mapreduce-tutorials.git
2. Make sure you manage to run the mapper test
3. Complete the MRUnit tests for the reducer and full job
4. Play with the MiniMRCluster/MiniDFSCluster test

Cluster Performance
1. Terasort test
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples-2.0.0-mr1-cdh4.1.2.jar teragen 1000 /user/dataint/terasort/input
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples-2.0.0-mr1-cdh4.1.2.jar terasort /user/dataint/terasort/input /user/dataint/terasort/output
2. MRBench - MR benchmarking
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-test.jar mrbench -numRuns 2 -maps 10 -reduces 10 -inputLines 100 -inputType random
3. NNBench - NameNode benchmarking
4. TestDFSIO - write and read performance

Diagnostics
1. Check the web UI (http://your_server:50030/jobtracker.jsp):
a. Nodes: how many up, how many down, check slots
b. Jobs: logs, failures, exceptions
c. Counters: do they match the expected values?
2. Configuration:
a. Check job conf (job.xml)
b. Check env conf (http://your_server:50030/conf)
3. Jobs history (http://your_server:50030/jobhistory.jsp)
4. Log dirs:
a. Job tracker (http://your_server:50030/logs/)
b. Task trackers
Thanks
● ophchu@gmail.com
● @ophchu


Performance Profiling (at a glance...)
1. Profile your code
2. Measure and tune what matters to you
3. Benchmarking: micro and macro
4. Hadoop has a built-in profiler (e.g. using hprof)
