SlideShare una empresa de Scribd logo
1 de 27
Testing Hadoop jobs
    with MRUnit

 Boulder/Denver Hadoop Users Group
                        05.12.2010


                     © 2010 Eric Wendelin
Eric Wendelin
Hadooper @returnpath
Blog: eriwen.com
Twitter: @eriwen
What is MRUnit?

• Testing library for MapReduce
• Developed by Cloudera
• Easy integration between MapReduce
  and standard testing tools (e.g. JUnit)

  cloudera.com/hadoop-mrunit
Why do I need that?
Testing without MRUnit
• Write tests that create JobConf or
  Configuration   objects
 •   conf.set(‘mapred.job.tracker’, ‘local’)

• Developing new test input files stored
  alongside MapReduce test code
• Lots of work to validate output files
 • External file I/O makes tests slooooow
MRUnit makes testing
Hadoop jobs easier
Testing with MRUnit

• No external test input or output files
 • Programmatically specified
• Less test harness code (but also perhaps
  less control)
• Concise, fast tests
Example
class ExampleTest() {
  private Example.MyMapper mapper
  private Example.MyReducer reducer
  private MapReduceDriver driver

    @Before void setUp() {
      mapper = new Example.MyMapper()
      reducer = new Example.MyReducer()
      driver = new MapReduceDriver(mapper, reducer)
    }

    @Test void testMapReduce() {
      driver.withInput(new Text(‘a’), new Text(‘b’))
      driver.withOutput(new Text(‘c’), new Text(‘d’))
      driver.runTest()
    }
}
Example
class ExampleTest() {
  private Example.MyMapper mapper
  private Example.MyReducer reducer
  private MapReduceDriver driver

    @Before void setUp() {
      mapper = new Example.MyMapper()
      reducer = new Example.MyReducer()
      driver = new MapReduceDriver(mapper, reducer)
    }

    @Test void testMapReduce() {
      driver.withInput(new Text(‘a’), new Text(‘b’))
          .withOutput(new Text(‘c’), new Text(‘d’))
          .runTest()
    }
}
Test map and reduce
    separately
class ExampleTest() {
  private Example.MyMapper mapper
  private MapDriver driver

    @Before void setUp() {
       mapper = new Example.MyMapper()
       driver = new MapDriver(mapper)
     }

    @Test void testMap() {
      driver.withInput(new Text(‘a’), new Text(‘b’))
      driver.withOutput(new Text(‘c’), new Text(‘d’))
      driver.runTest()
    }
}
class ExampleTest() {
  private Example.MyReducer reducer
  private ReduceDriver driver

    @Before void setUp() {
       reducer = new Example.MyReducer()
       driver = new ReduceDriver(reducer)
     }

    @Test void testReduce() {
      driver.withInput(new Text(‘a’),
          [new Text(‘foo’), new Text(‘bar’)])
      driver.withOutput(new Text(‘c’), new Text(‘d’))
      driver.runTest()
    }
}
Counters!
driver.withInput(...)
driver.run()

def counters = driver.getCounters()

assertEquals(1, counters.findCounter
    (‘foo’, ‘bar’).getValue())
Verifying logging
def messages = []
def appender = [
    append: { messages.add(it) },
    requiresLayout: { false }
  ] as AppenderSkeleton
Logger.getRootLogger().addAppender(appender)

driver.runTest()

assertTrue messages.find {
    it.getLevel.toString() == ‘WARN’ &&
    it.getMessage().contains(‘My err’) }

Logger.getRootLogger().removeAppender(appender)
Cool stuff I haven’t
         tried...
• The   PipelineMapReduceDriver  - allows
  testing a series of MapReduce passes
 • Just call addMapReduce(mapper, reducer)
• Mock objects - MockReporter,
  MockInputSplit, and MockOutputCollector

• Test combiners with
  myMapReduceDriver.setCombiner(myCombiner)
Problems with MRUnit
Not useful for
streaming jobs
shell$ ./myMapper.py < test.input |
sort | ./myReducer.py > actual.out

shell$ diff expected.out actual.out
runTest()  does not
    give meaningful
information on failure
Better to use run() and
      then assert
driver.setInput(new Text(‘foo’),
    new Text(‘bar’))

def output = driver.run()

assertEquals ‘baz’, output[0].first
assertEquals ‘jy’, output[0].second
Documentation is
 severely lacking
runXxx()   calls setup()
called for new Hadoop
 API, but not old API
Tests are not executed
 in a distributed way
In Summary, MRUnit...

• Makes testing your Hadoop jobs easier
• Abstracts away a lot of the boilerplate test
  setup you need
• Has it’s problems
 • but they are outweighed by the benefits
?
cloudera.com/hadoop-mrunit


Blog: eriwen.com
Twitter: @eriwen
Email:
eric.wendelin@returnpath.net
                   © 2010 Eric Wendelin

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Hadoop
HadoopHadoop
Hadoop
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATA
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Version Stamps in NOSQL Databases
Version Stamps in NOSQL DatabasesVersion Stamps in NOSQL Databases
Version Stamps in NOSQL Databases
 
Parallel Database
Parallel DatabaseParallel Database
Parallel Database
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
Deductive databases
Deductive databasesDeductive databases
Deductive databases
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
NoSQL Data Architecture Patterns
NoSQL Data ArchitecturePatternsNoSQL Data ArchitecturePatterns
NoSQL Data Architecture Patterns
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop
 
Consistency in NoSQL
Consistency in NoSQLConsistency in NoSQL
Consistency in NoSQL
 

Destacado

Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
AgileNCR2013
 
Error Detection and Correction
Error Detection and CorrectionError Detection and Correction
Error Detection and Correction
TechiNerd
 
UNIT TESTING PPT
UNIT TESTING PPTUNIT TESTING PPT
UNIT TESTING PPT
suhasreddy1
 

Destacado (14)

Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
 
Groovy-er desktop applications with Griffon
Groovy-er desktop applications with GriffonGroovy-er desktop applications with Griffon
Groovy-er desktop applications with Griffon
 
Javascript Stacktrace Ignite
Javascript Stacktrace IgniteJavascript Stacktrace Ignite
Javascript Stacktrace Ignite
 
Cn lec-06
Cn lec-06Cn lec-06
Cn lec-06
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
Gradle 3.0: Unleash the Daemon!
Gradle 3.0: Unleash the Daemon!Gradle 3.0: Unleash the Daemon!
Gradle 3.0: Unleash the Daemon!
 
Apache Avro and You
Apache Avro and YouApache Avro and You
Apache Avro and You
 
Piggybacking
PiggybackingPiggybacking
Piggybacking
 
JavaScript + Jenkins = Winning!
JavaScript + Jenkins = Winning!JavaScript + Jenkins = Winning!
JavaScript + Jenkins = Winning!
 
Test your Javascript! v1.1
Test your Javascript! v1.1Test your Javascript! v1.1
Test your Javascript! v1.1
 
Gradle by Example
Gradle by ExampleGradle by Example
Gradle by Example
 
Error Detection and Correction
Error Detection and CorrectionError Detection and Correction
Error Detection and Correction
 
CRC Error coding technique
CRC Error coding techniqueCRC Error coding technique
CRC Error coding technique
 
UNIT TESTING PPT
UNIT TESTING PPTUNIT TESTING PPT
UNIT TESTING PPT
 

Similar a Testing Hadoop jobs with MRUnit

An introduction to Test Driven Development on MapReduce
An introduction to Test Driven Development on MapReduceAn introduction to Test Driven Development on MapReduce
An introduction to Test Driven Development on MapReduce
Ananth PackkilDurai
 
How and why i roll my own node.js framework
How and why i roll my own node.js frameworkHow and why i roll my own node.js framework
How and why i roll my own node.js framework
Ben Lin
 
Background Jobs - Com BackgrounDRb
Background Jobs - Com BackgrounDRbBackground Jobs - Com BackgrounDRb
Background Jobs - Com BackgrounDRb
Juan Maiz
 
NetBeans Plugin Development: JRebel Experience Report
NetBeans Plugin Development: JRebel Experience ReportNetBeans Plugin Development: JRebel Experience Report
NetBeans Plugin Development: JRebel Experience Report
Anton Arhipov
 
Your task is to implement an informed search algorithm that will cal.pdf
Your task is to implement an informed search algorithm that will cal.pdfYour task is to implement an informed search algorithm that will cal.pdf
Your task is to implement an informed search algorithm that will cal.pdf
amie1085
 
33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good Tests33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good Tests
Tomek Kaczanowski
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
Jerome Eteve
 

Similar a Testing Hadoop jobs with MRUnit (20)

Amazon elastic map reduce
Amazon elastic map reduceAmazon elastic map reduce
Amazon elastic map reduce
 
An introduction to Test Driven Development on MapReduce
An introduction to Test Driven Development on MapReduceAn introduction to Test Driven Development on MapReduce
An introduction to Test Driven Development on MapReduce
 
Testing multi outputformat based mapreduce
Testing multi outputformat based mapreduceTesting multi outputformat based mapreduce
Testing multi outputformat based mapreduce
 
Shooting the Rapids
Shooting the RapidsShooting the Rapids
Shooting the Rapids
 
Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
 
How and why i roll my own node.js framework
How and why i roll my own node.js frameworkHow and why i roll my own node.js framework
How and why i roll my own node.js framework
 
Background Jobs - Com BackgrounDRb
Background Jobs - Com BackgrounDRbBackground Jobs - Com BackgrounDRb
Background Jobs - Com BackgrounDRb
 
NetBeans Plugin Development: JRebel Experience Report
NetBeans Plugin Development: JRebel Experience ReportNetBeans Plugin Development: JRebel Experience Report
NetBeans Plugin Development: JRebel Experience Report
 
AngularJS Testing Strategies
AngularJS Testing StrategiesAngularJS Testing Strategies
AngularJS Testing Strategies
 
Functional Programming in Java 8
Functional Programming in Java 8Functional Programming in Java 8
Functional Programming in Java 8
 
R console
R consoleR console
R console
 
Load Testing with PHP and RedLine13
Load Testing with PHP and RedLine13Load Testing with PHP and RedLine13
Load Testing with PHP and RedLine13
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examples
 
Your task is to implement an informed search algorithm that will cal.pdf
Your task is to implement an informed search algorithm that will cal.pdfYour task is to implement an informed search algorithm that will cal.pdf
Your task is to implement an informed search algorithm that will cal.pdf
 
Testing in airflow
Testing in airflowTesting in airflow
Testing in airflow
 
33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good Tests33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good Tests
 
AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...
AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...
AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Testing Hadoop jobs with MRUnit

  • 1. Testing Hadoop jobs with MRUnit Boulder/Denver Hadoop Users Group 05.12.2010 © 2010 Eric Wendelin
  • 2. Eric Wendelin Hadooper @returnpath Blog: eriwen.com Twitter: @eriwen
  • 3. What is MRUnit? • Testing library for MapReduce • Developed by Cloudera • Easy integration between MapReduce and standard testing tools (e.g. JUnit) cloudera.com/hadoop-mrunit
  • 4. Why do I need that?
  • 5. Testing without MRUnit • Write tests that create JobConf or Configuration objects • conf.set(‘mapred.job.tracker’, ‘local’) • Developing new test input files stored alongside MapReduce test code • Lots of work to validate output files • External file I/O makes tests slooooow
  • 7. Testing with MRUnit • No external test input or output files • Programmatically specified • Less test harness code (but also perhaps less control) • Concise, fast tests
  • 8. Example class ExampleTest() { private Example.MyMapper mapper private Example.MyReducer reducer private MapReduceDriver driver @Before void setUp() { mapper = new Example.MyMapper() reducer = new Example.MyReducer() driver = new MapReduceDriver(mapper, reducer) } @Test void testMapReduce() { driver.withInput(new Text(‘a’), new Text(‘b’)) driver.withOutput(new Text(‘c’), new Text(‘d’)) driver.runTest() } }
  • 9. Example class ExampleTest() { private Example.MyMapper mapper private Example.MyReducer reducer private MapReduceDriver driver @Before void setUp() { mapper = new Example.MyMapper() reducer = new Example.MyReducer() driver = new MapReduceDriver(mapper, reducer) } @Test void testMapReduce() { driver.withInput(new Text(‘a’), new Text(‘b’)) .withOutput(new Text(‘c’), new Text(‘d’)) .runTest() } }
  • 10. Test map and reduce separately
  • 11. class ExampleTest() { private Example.MyMapper mapper private MapDriver driver @Before void setUp() { mapper = new Example.MyMapper() driver = new MapDriver(mapper) } @Test void testMap() { driver.withInput(new Text(‘a’), new Text(‘b’)) driver.withOutput(new Text(‘c’), new Text(‘d’)) driver.runTest() } }
  • 12. class ExampleTest() { private Example.MyReducer reducer private ReduceDriver driver @Before void setUp() { reducer = new Example.MyReducer() driver = new ReduceDriver(reducer) } @Test void testReduce() { driver.withInput(new Text(‘a’), [new Text(‘foo’), new Text(‘bar’)]) driver.withOutput(new Text(‘c’), new Text(‘d’)) driver.runTest() } }
  • 13. Counters! driver.withInput(...) driver.run() def counters = driver.getCounters() assertEquals(1, counters.findCounter (‘foo’, ‘bar’).getValue())
  • 14. Verifying logging def messages = [] def appender = [ append: { messages.add(it) }, requiresLayout: { false } ] as AppenderSkeleton Logger.getRootLogger().addAppender(appender) driver.runTest() assertTrue messages.find { it.getLevel.toString() == ‘WARN’ && it.getMessage().contains(‘My err’) } Logger.getRootLogger().removeAppender(appender)
  • 15. Cool stuff I haven’t tried... • The PipelineMapReduceDriver - allows testing a series of MapReduce passes • Just call addMapReduce(mapper, reducer) • Mock objects - MockReporter, MockInputSplit, and MockOutputCollector • Test combiners with myMapReduceDriver.setCombiner(myCombiner)
  • 18. shell$ ./myMapper.py < test.input | sort | ./myReducer.py > actual.out shell$ diff expected.out actual.out
  • 19. runTest() does not give meaningful information on failure
  • 20. Better to use run() and then assert
  • 21. driver.setInput(new Text(‘foo’), new Text(‘bar’)) def output = driver.run() assertEquals ‘baz’, output[0].first assertEquals ‘jy’, output[0].second
  • 23. runXxx() calls setup() called for new Hadoop API, but not old API
  • 24. Tests are not executed in a distributed way
  • 25. In Summary, MRUnit... • Makes testing your Hadoop jobs easier • Abstracts away a lot of the boilerplate test setup you need • Has it’s problems • but they are outweighed by the benefits
  • 26. ?

Notas del editor