SlideShare una empresa de Scribd logo
1 de 15
© Hortonworks Inc. 2014
Designing and Testing
Accumulo Iterators
Josh Elser
Member of Technical Staff
PMC, Apache Accumulo
10, November 2015
Page 1
Apache, Accumulo, and Apache Accumulo are trademarks of the Apache Software Foundation.
© Hortonworks Inc. 2014
Design
Page 2
How do I know if my Iterator works?
What can I do in an Iterator?
How are these methods even called?!
© Hortonworks Inc. 2014
Common Patterns
Only a certain subset of algorithms fit into Accumulo
Iterators well.
(Avoid shoving a square peg into a round hole.)
• Filtering
• Reduction
• Bounded aggregations
–Keep an upper bound on the number of elements being
aggregated to avoid memory issues
• Transformations
–Key sort-order must be retained
–Best limited to the Value only
Page 3
© Hortonworks Inc. 2014
Design
Josh’s Iterator Design Principles:
•Always make forward-progress
•Think functional – Avoid unnecessary state
•Operate only on the data you have
•Do one thing and do it efficiently
Page 4
© Hortonworks Inc. 2014
Design
Page 5
Make
Forward
Progress
Start
End
© Hortonworks Inc. 2014
Think about your Iterator like a function
Unnecessary State
Page 6
def sum(list):
sum = 0
for entry in list:
sum += entry
return sum
• Avoid holding onto state when
at all possible.
• Think in terms of a stream
rather than chunks of data.
• Beware of memory implications
when performing aggregations.
© Hortonworks Inc. 2014
Operate locally
Daily Reminder: Iterators have no calls
for implementing a safe cleanup.
• Iterators cannot properly handle I/O-related issues to
external systems.
• Slow-external calls result in slow Accumulo.
• Some problems are more-safely implemented outside of
an Accumulo Iterators. Not a Coprocessor/Container.
Page 7
© Hortonworks Inc. 2014
Simplicity
Avoid doing multiple things in a
single Iterator.
•Object Oriented Design 101
•Iterators can be tricky to debug on their own
•Configuring multiple iterators are a feature
Page 8
© Hortonworks Inc. 2014
Testing
You should always test your code
before running it in any environment
to ensure that it functions as intended.
Page 9
© Hortonworks Inc. 2014
Testing
HOW?
Page 10
© Hortonworks Inc. 2014
Testing
A framework designed for testing Iterators given input,
a Range, options, and expected output.
Page 11
© Hortonworks Inc. 2014
Testing
Page 12
Test
Test
Test
Test
Test
Iterator Class
Range
Iterator Options
Sorted Input Data
Verification of
output records
OR
True/False check
User-Provided
Framework
© Hortonworks Inc. 2014
Features
•Auto-Discovery of test cases
•JUnit Parameterized test integration
•Provided Generic Tests
–Default Constructor
–Re-Seek (teardown)
–Deep Copy Verification
Page 13
© Hortonworks Inc. 2014
Future Work
•More Iterator tests!
•A final resting place for the code
•Documentation
•Usability testing
Page 14
https://issues.apache.org/jira/browse/ACCUMULO-626
© Hortonworks Inc. 2014
Thanks!
jelser@hortonworks.com
Page 15

Más contenido relacionado

La actualidad más candente

(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impala
NAVER D2
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
alanfgates
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
OReillyStrata
 
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Cloudera, Inc.
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
DataWorks Summit
 

La actualidad más candente (19)

Impala presentation
Impala presentationImpala presentation
Impala presentation
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impala
 
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0
 
Cloudera Impala technical deep dive
Cloudera Impala technical deep diveCloudera Impala technical deep dive
Cloudera Impala technical deep dive
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impala
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
 
All of the Performance Tuning Features in Oracle SQL Developer
All of the Performance Tuning Features in Oracle SQL DeveloperAll of the Performance Tuning Features in Oracle SQL Developer
All of the Performance Tuning Features in Oracle SQL Developer
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
 
Cloudera impala
Cloudera impalaCloudera impala
Cloudera impala
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentation
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
 
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
 

Similar a Designing and Testing Accumulo Iterators

The View - Lotusscript coding best practices
The View - Lotusscript coding best practicesThe View - Lotusscript coding best practices
The View - Lotusscript coding best practices
Bill Buchan
 
Driving application development through behavior driven development
Driving application development through behavior driven developmentDriving application development through behavior driven development
Driving application development through behavior driven development
Einar Ingebrigtsen
 

Similar a Designing and Testing Accumulo Iterators (20)

The View - Lotusscript coding best practices
The View - Lotusscript coding best practicesThe View - Lotusscript coding best practices
The View - Lotusscript coding best practices
 
Integration strategies best practices- Mulesoft meetup April 2018
Integration strategies   best practices- Mulesoft meetup April 2018Integration strategies   best practices- Mulesoft meetup April 2018
Integration strategies best practices- Mulesoft meetup April 2018
 
Oracle Database Lifecycle Management
Oracle Database Lifecycle ManagementOracle Database Lifecycle Management
Oracle Database Lifecycle Management
 
Coherence 12.1.3 hidden gems
Coherence 12.1.3 hidden gemsCoherence 12.1.3 hidden gems
Coherence 12.1.3 hidden gems
 
Oracle Open World Exadata Monitoring and Management with EM12c
Oracle Open World Exadata Monitoring and Management with EM12cOracle Open World Exadata Monitoring and Management with EM12c
Oracle Open World Exadata Monitoring and Management with EM12c
 
Using MySQL Enterprise Monitor for Continuous Performance Improvement
Using MySQL Enterprise Monitor for Continuous Performance ImprovementUsing MySQL Enterprise Monitor for Continuous Performance Improvement
Using MySQL Enterprise Monitor for Continuous Performance Improvement
 
Developer day v2
Developer day v2Developer day v2
Developer day v2
 
Agile Software Testing the Agilogy Way
Agile Software Testing the Agilogy WayAgile Software Testing the Agilogy Way
Agile Software Testing the Agilogy Way
 
Hadoop engineering bo_f_final
Hadoop engineering bo_f_finalHadoop engineering bo_f_final
Hadoop engineering bo_f_final
 
WMS Overview
WMS OverviewWMS Overview
WMS Overview
 
Improving the Design of Existing Software
Improving the Design of Existing SoftwareImproving the Design of Existing Software
Improving the Design of Existing Software
 
Practical Software Testing Tools
Practical Software Testing ToolsPractical Software Testing Tools
Practical Software Testing Tools
 
Load testing with Visual Studio and Azure - Andrew Siemer
Load testing with Visual Studio and Azure - Andrew SiemerLoad testing with Visual Studio and Azure - Andrew Siemer
Load testing with Visual Studio and Azure - Andrew Siemer
 
QA standup - workload analysis
QA standup  - workload analysisQA standup  - workload analysis
QA standup - workload analysis
 
EARS: The Easy Approach to Requirements Syntax
EARS: The Easy Approach to Requirements SyntaxEARS: The Easy Approach to Requirements Syntax
EARS: The Easy Approach to Requirements Syntax
 
Driving application development through behavior driven development
Driving application development through behavior driven developmentDriving application development through behavior driven development
Driving application development through behavior driven development
 
So You Think You Can Write a Test Case - XBOSoft Webinar
So You Think You Can Write a Test Case - XBOSoft WebinarSo You Think You Can Write a Test Case - XBOSoft Webinar
So You Think You Can Write a Test Case - XBOSoft Webinar
 
Putting Compilers to Work
Putting Compilers to WorkPutting Compilers to Work
Putting Compilers to Work
 
Leandro Melendez - Switching Performance Left & Right
Leandro Melendez - Switching Performance Left & RightLeandro Melendez - Switching Performance Left & Right
Leandro Melendez - Switching Performance Left & Right
 
Improving The Quality of Existing Software
Improving The Quality of Existing SoftwareImproving The Quality of Existing Software
Improving The Quality of Existing Software
 

Más de Josh Elser

RPInventory 2-25-2010
RPInventory 2-25-2010RPInventory 2-25-2010
RPInventory 2-25-2010
Josh Elser
 

Más de Josh Elser (10)

Practical Kerberos with Apache HBase
Practical Kerberos with Apache HBasePractical Kerberos with Apache HBase
Practical Kerberos with Apache HBase
 
Apache Phoenix Query Server
Apache Phoenix Query ServerApache Phoenix Query Server
Apache Phoenix Query Server
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
 
Effective Testing of Apache Accumulo Iterators
Effective Testing of Apache Accumulo IteratorsEffective Testing of Apache Accumulo Iterators
Effective Testing of Apache Accumulo Iterators
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
 
Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
 
De-Mystifying the Apache Phoenix QueryServer
De-Mystifying the Apache Phoenix QueryServerDe-Mystifying the Apache Phoenix QueryServer
De-Mystifying the Apache Phoenix QueryServer
 
Data-Center Replication with Apache Accumulo
Data-Center Replication with Apache AccumuloData-Center Replication with Apache Accumulo
Data-Center Replication with Apache Accumulo
 
RPInventory 2-25-2010
RPInventory 2-25-2010RPInventory 2-25-2010
RPInventory 2-25-2010
 

Último

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Último (20)

Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 

Designing and Testing Accumulo Iterators

  • 1. © Hortonworks Inc. 2014 Designing and Testing Accumulo Iterators Josh Elser Member of Technical Staff PMC, Apache Accumulo 10, November 2015 Page 1 Apache, Accumulo, and Apache Accumulo are trademarks of the Apache Software Foundation.
  • 2. © Hortonworks Inc. 2014 Design Page 2 How do I know if my Iterator works? What can I do in an Iterator? How are these methods even called?!
  • 3. © Hortonworks Inc. 2014 Common Patterns Only a certain subset of algorithms fit into Accumulo Iterators well. (Avoid shoving a square peg into a round hole.) • Filtering • Reduction • Bounded aggregations –Keep an upper bound on the number of elements being aggregated to avoid memory issues • Transformations –Key sort-order must be retained –Best limited to the Value only Page 3
  • 4. © Hortonworks Inc. 2014 Design Josh’s Iterator Design Principles: •Always make forward-progress •Think functional – Avoid unnecessary state •Operate only on the data you have •Do one thing and do it efficiently Page 4
  • 5. © Hortonworks Inc. 2014 Design Page 5 Make Forward Progress Start End
  • 6. © Hortonworks Inc. 2014 Think about your Iterator like a function Unnecessary State Page 6 def sum(list): sum = 0 for entry in list: sum += entry return sum • Avoid holding onto state when at all possible. • Think in terms of a stream rather than chunks of data. • Beware of memory implications when performing aggregations.
  • 7. © Hortonworks Inc. 2014 Operate locally Daily Reminder: Iterators have no calls for implementing a safe cleanup. • Iterators cannot properly handle I/O-related issues to external systems. • Slow-external calls result in slow Accumulo. • Some problems are more-safely implemented outside of an Accumulo Iterators. Not a Coprocessor/Container. Page 7
  • 8. © Hortonworks Inc. 2014 Simplicity Avoid doing multiple things in a single Iterator. •Object Oriented Design 101 •Iterators can be tricky to debug on their own •Configuring multiple iterators are a feature Page 8
  • 9. © Hortonworks Inc. 2014 Testing You should always test your code before running it in any environment to ensure that it functions as intended. Page 9
  • 10. © Hortonworks Inc. 2014 Testing HOW? Page 10
  • 11. © Hortonworks Inc. 2014 Testing A framework designed for testing Iterators given input, a Range, options, and expected output. Page 11
  • 12. © Hortonworks Inc. 2014 Testing Page 12 Test Test Test Test Test Iterator Class Range Iterator Options Sorted Input Data Verification of output records OR True/False check User-Provided Framework
  • 13. © Hortonworks Inc. 2014 Features •Auto-Discovery of test cases •JUnit Parameterized test integration •Provided Generic Tests –Default Constructor –Re-Seek (teardown) –Deep Copy Verification Page 13
  • 14. © Hortonworks Inc. 2014 Future Work •More Iterator tests! •A final resting place for the code •Documentation •Usability testing Page 14 https://issues.apache.org/jira/browse/ACCUMULO-626
  • 15. © Hortonworks Inc. 2014 Thanks! jelser@hortonworks.com Page 15