Designing and Testing Accumulo Iterators
- 1. © Hortonworks Inc. 2014
Designing and Testing
Accumulo Iterators
Josh Elser
Member of Technical Staff
PMC, Apache Accumulo
10 November 2015
Apache, Accumulo, and Apache Accumulo are trademarks of the Apache Software Foundation.
- 2.
Design
How do I know if my Iterator works?
What can I do in an Iterator?
How are these methods even called?!
- 3.
Common Patterns
Only a certain subset of algorithms fits well into Accumulo Iterators.
(Avoid shoving a square peg into a round hole.)
• Filtering
• Reduction
• Bounded aggregations
–Keep an upper bound on the number of elements being aggregated to avoid memory issues
• Transformations
–Key sort-order must be retained
–Best limited to the Value only
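The patterns above can be sketched over a plain sorted stream of entries. This is only an illustration in Python, not Accumulo's Java API: the `Entry` type and the names `keep_positive`, `double_value`, and `collect_bounded` are hypothetical, and real Accumulo keys have more parts than a single string.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Entry = Tuple[str, int]  # hypothetical (key, value) pair, assumed for this sketch


def keep_positive(entries: Iterable[Entry]) -> Iterator[Entry]:
    """Filtering: drop entries without reordering what remains."""
    for key, value in entries:
        if value > 0:
            yield key, value


def double_value(entries: Iterable[Entry]) -> Iterator[Entry]:
    """Transformation limited to the Value: keys pass through untouched."""
    for key, value in entries:
        yield key, value * 2


def collect_bounded(entries: Iterable[Entry], limit: int) -> List[int]:
    """Bounded aggregation: hold at most `limit` values in memory."""
    return list(islice((value for _, value in entries), limit))


source = [("a", 3), ("b", -1), ("c", 5), ("d", 2)]
result = list(double_value(keep_positive(source)))
keys = [key for key, _ in result]
bounded = collect_bounded(source, 2)
```

Because neither the filter nor the transformation touches the key, the output stays in the same sort order as the input, which is the property the slide asks for.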
- 4.
Design
Josh’s Iterator Design Principles:
•Always make forward progress
•Think functional – Avoid unnecessary state
•Operate only on the data you have
•Do one thing and do it efficiently
- 6.
Think about your Iterator like a function
Unnecessary State
# Sum the entries as they stream past; only the running total is kept.
def sum_entries(entries):
    total = 0
    for entry in entries:
        total += entry
    return total
• Avoid holding on to state whenever possible.
• Think in terms of a stream rather than chunks of data.
• Beware of memory implications when performing aggregations.
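As a rough illustration of the streaming mindset, the sketch below reduces a sorted stream to one total per row while holding only the current row's running total. It assumes entries are hypothetical `(row, value)` pairs already sorted by row; `sum_per_row` is an illustrative name, not part of any Accumulo API.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple


def sum_per_row(entries: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Yield one (row, total) pair per row from input already sorted by row.

    Only the running total for the current row is held at any time.
    """
    for row, group in groupby(entries, key=itemgetter(0)):
        yield row, sum(value for _, value in group)


# The input is a one-shot iterator, so nothing can be buffered and re-read.
sorted_entries = iter([("r1", 1), ("r1", 2), ("r2", 10), ("r3", 4), ("r3", 6)])
totals = list(sum_per_row(sorted_entries))
```

The function never materializes a chunk of input, so its memory use does not grow with the number of entries per row.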
- 7.
Operate locally
Daily Reminder: Iterators have no calls for implementing a safe cleanup.
• Iterators cannot properly handle I/O-related issues with external systems.
• Slow external calls result in a slow Accumulo.
• Some problems are more safely implemented outside of an Accumulo Iterator. An Iterator is not a Coprocessor or a container.
- 8.
Simplicity
Avoid doing multiple things in a single Iterator.
•Object Oriented Design 101
•Iterators can be tricky to debug on their own
•Configuring multiple Iterators is a feature
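One way to picture several small Iterators configured together is a stack of single-purpose stages, each wrapping the one beneath it. The sketch below is a Python analogy only: `drop_zeros`, `negate`, and `stack` are hypothetical names, and the rule that lower priority numbers sit closer to the data is an assumption of this sketch.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator, List, Tuple

Entry = Tuple[str, int]  # hypothetical (key, value) pair
Stage = Callable[[Iterable[Entry]], Iterator[Entry]]


def drop_zeros(entries: Iterable[Entry]) -> Iterator[Entry]:
    """One job only: filter out entries whose value is zero."""
    for key, value in entries:
        if value != 0:
            yield key, value


def negate(entries: Iterable[Entry]) -> Iterator[Entry]:
    """One job only: flip the sign of each value."""
    for key, value in entries:
        yield key, -value


def stack(stages: List[Tuple[int, Stage]], source: Iterable[Entry]) -> Iterator[Entry]:
    """Chain stages in ascending priority order over the source stream."""
    ordered = [stage for _, stage in sorted(stages, key=lambda pair: pair[0])]
    return reduce(lambda stream, stage: stage(stream), ordered, iter(source))


pipeline = list(stack([(20, negate), (10, drop_zeros)], [("a", 1), ("b", 0), ("c", 2)]))
```

Each stage can be tested and debugged on its own, and the combined behavior comes from configuration rather than from one large Iterator.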
- 9.
Testing
You should always test your code before running it in any environment to ensure that it functions as intended.
- 11.
Testing
A framework designed for testing Iterators given input, a Range, options, and expected output.
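The shape of such a test case can be sketched as a small record plus a runner. This is not the framework's actual API: `IteratorTestCase`, `KeyRange`, and `prefix_filter` are hypothetical names, the range is assumed to be an inclusive pair of keys, and the iterator is modeled as a plain Python function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Entry = Tuple[str, str]     # hypothetical (key, value) pair
KeyRange = Tuple[str, str]  # hypothetical inclusive (start, end) pair of keys


@dataclass
class IteratorTestCase:
    iterator: Callable[[Iterable[Entry], Dict[str, str]], Iterator[Entry]]
    sorted_input: List[Entry]
    key_range: KeyRange
    expected: List[Entry]
    options: Dict[str, str] = field(default_factory=dict)

    def run(self) -> bool:
        """Feed the in-range input through the iterator and compare the output."""
        start, end = self.key_range
        in_range = (e for e in self.sorted_input if start <= e[0] <= end)
        return list(self.iterator(in_range, self.options)) == self.expected


def prefix_filter(entries: Iterable[Entry], options: Dict[str, str]) -> Iterator[Entry]:
    """Iterator under test: keep values that start with options['prefix']."""
    prefix = options.get("prefix", "")
    for key, value in entries:
        if value.startswith(prefix):
            yield key, value


case = IteratorTestCase(
    iterator=prefix_filter,
    sorted_input=[("a", "x1"), ("b", "y1"), ("c", "x2"), ("d", "x3")],
    key_range=("a", "c"),
    options={"prefix": "x"},
    expected=[("a", "x1"), ("c", "x2")],
)
passed = case.run()
```

Keeping the case as data means the same runner can be reused for every Iterator, with only the inputs and expected output changing.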
- 12.
Testing
[Diagram: each test takes an Iterator class, a Range, Iterator options, and sorted input data, and ends in either verification of output records or a true/false check. Legend: User-Provided, Framework.]
- 13.
Features
•Auto-Discovery of test cases
•JUnit Parameterized test integration
•Provided Generic Tests
–Default Constructor
–Re-Seek (teardown)
–Deep Copy Verification
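The idea behind a re-seek (teardown) test can be sketched as follows: read the data once without interruption, then read it again while discarding the iterator after every entry and re-seeking just past the last key returned; the two results should match. Everything here is a hypothetical Python model, including `seek`, `even_values`, and `read_with_teardown`, and it assumes unique, sorted keys.

```python
from bisect import bisect_right
from typing import Callable, Iterator, List, Optional, Tuple

Entry = Tuple[str, int]  # hypothetical (key, value) pair
# Hypothetical factory: given sorted data and an exclusive start key
# (None means "from the beginning"), return a fresh stream of entries.
Factory = Callable[[List[Entry], Optional[str]], Iterator[Entry]]


def seek(data: List[Entry], after: Optional[str]) -> Iterator[Entry]:
    """Source stream positioned just past `after` in key-sorted data."""
    keys = [key for key, _ in data]
    start = 0 if after is None else bisect_right(keys, after)
    return iter(data[start:])


def even_values(data: List[Entry], after: Optional[str]) -> Iterator[Entry]:
    """Iterator under test: a stateless filter that keeps even values."""
    return (entry for entry in seek(data, after) if entry[1] % 2 == 0)


def read_with_teardown(factory: Factory, data: List[Entry]) -> List[Entry]:
    """Rebuild the iterator after every entry, re-seeking past the last key returned."""
    results: List[Entry] = []
    last_key: Optional[str] = None
    while True:
        entry = next(factory(data, last_key), None)
        if entry is None:
            return results
        results.append(entry)
        last_key = entry[0]


data = [("a", 1), ("b", 2), ("c", 4), ("d", 5), ("e", 6)]
uninterrupted = list(even_values(data, None))
with_teardown = read_with_teardown(even_values, data)
```

An Iterator that keeps hidden state between entries, or that does not advance past the last key after a re-seek, would produce different results in the two runs or never finish the second one.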
- 14.
Future Work
•More Iterator tests!
•A final resting place for the code
•Documentation
•Usability testing
https://issues.apache.org/jira/browse/ACCUMULO-626