Introducing Accumulo Collections:
A Practical Accumulo Interface
By Jonathan Wolff
jwolff@isentropy.com
Founder, Isentropy LLC
https://isentropy.com
Code and Documentation on Github
https://github.com/isentropy/accumulo-collections/wiki
Accumulo Needs A Practical API
● Accumulo is great under the hood, but needs a practical
interface for real-world NoSQL applications.
● Could companies use Accumulo in place of MySQL?
● Accumulo needs a layer to:
1) Handle Java object serialization locally and on tablet servers.
2) Handle foreign keys/joins.
3) Abstract iterators, so that it's easy to do server-side
computations.
4) Provide a useful library of filters, transformations, aggregates.
What is Accumulo Collections?
● Accumulo Collections is a new, alternative NoSQL framework that
uses Accumulo as a backend. It abstracts powerful Accumulo
functionality in a concise Java API.
● Since Accumulo is already a sorted map, Java's SortedMap is a
natural choice for an interface. It's already familiar to Java
developers. Developers who know nothing about Accumulo can use it to
build giant, responsive NoSQL applications.
● But Accumulo Collections is more than a SortedMap
implementation...
● Many features are implemented on the tablet servers by iterators,
and wrapped in Java methods. You don't need to understand
Accumulo iterators to use them.
AccumuloSortedMap wraps an
Accumulo table
● AccumuloSortedMap is a Java SortedMap implementation that is backed by
an Accumulo table. It handles object serialization and foreign keys, and
abstracts powerful iterator functionality.
● Method calls derive new maps that contain transformations and aggregates.
Derived maps modify the underlying Scanner. This abstracts the concept of
iterators. Derived map methods run on-the-fly and can be chained:
// similar to SQL: WHERE timestamp BETWEEN t0 AND t1 AND rand() > .5
AccumuloSortedMap derivedMap = map.timeFilter(t0,t1).sample(0.5);
// statistical aggregate (mean, sd, n, etc.) of values from key range [100,200)
StatisticalSummary stats = map.subMap(100, 200).valueStats();
Each of the above methods stacks an iterator on the underlying map. The
iterators use SerDes to operate directly on Java objects.
Just like a standard Java
SortedMap, but…
● AccumuloSortedMap returns a copy of the map value.
You must put() to save modifications.
● To use sorted map features, the SerDe must
serialize keys to bytes in the same sort order as the Java objects.
The default FixedPointSerde is suitable for most
common key types (strings, primitives, byte[], etc.).
More about SerDes later…
● Supports sizes greater than Integer.MAX_VALUE. See
sizeAsLong().
● Can be set to read-only. Derived map methods, which
stack scan iterators, always return read-only maps.
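The copy-on-read behaviour in the first bullet can be illustrated locally. The sketch below does not use Accumulo or Accumulo Collections; CopyOnReadMap is a hypothetical stand-in that stores Java-serialized bytes in a TreeMap, which is enough to show why a modified value has to be put() back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical local stand-in: values are kept as bytes, so get() always returns a copy. */
final class CopyOnReadMap<K extends Comparable<K>, V extends Serializable> {
    private final SortedMap<K, byte[]> stored = new TreeMap<>();

    void put(K key, V value) {
        stored.put(key, serialize(value));
    }

    V get(K key) {
        byte[] bytes = stored.get(key);
        return bytes == null ? null : deserialize(bytes);
    }

    private static byte[] serialize(Object value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes it into bos.
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    private V deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (V) ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CopyOnReadMap<String, ArrayList<String>> map = new CopyOnReadMap<>();
        map.put("k", new ArrayList<>(Arrays.asList("a")));
        ArrayList<String> copy = map.get("k");
        copy.add("b");                            // changes only the local copy
        System.out.println(map.get("k").size());  // prints 1: stored value unchanged
        map.put("k", copy);                       // put() saves the modification
        System.out.println(map.get("k").size());  // prints 2
    }
}
```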
Use Accumulo as a SortedMap
AccumuloSortedMapFactory factory = new AccumuloSortedMapFactory(conn, "factory_name");
AccumuloSortedMap<Long,String> map = factory.makeMap("mapname");
for (long i = 0; i < 1000; i++) {
    map.put(i, "value" + i);
}
map.get(123L); // equals "value123"
map.keySet().iterator().next(); // equals 0
AccumuloSortedMap<Long,String> submap = map.subMap(100L, 150L);
submap.size(); // equals 50
submap.firstKey(); // equals 100
submap.keyStats().getSum(); // equals 6225.0
for (Entry<Long,String> e : submap.entrySet()) {
    // iterate
}
// Both calls throw exceptions, because both maps are read-only.
map.setReadOnly(true).put(1000L, "nogood");
submap.put(1000L, "nogood");
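The numbers in the slide follow from plain SortedMap semantics and can be checked without a cluster. The sketch below uses java.util.TreeMap as a local stand-in for AccumuloSortedMap; the keySum helper is hypothetical and sums keys locally, whereas keyStats() in the slide is computed on the tablet servers.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

final class SortedMapSemanticsDemo {
    /** Builds the same 1000 entries as the slide, in a local TreeMap stand-in. */
    static SortedMap<Long, String> buildMap() {
        SortedMap<Long, String> map = new TreeMap<>();
        for (long i = 0; i < 1000; i++) {
            map.put(i, "value" + i);
        }
        return map;
    }

    /** Hypothetical helper: sums the keys locally. */
    static double keySum(SortedMap<Long, String> m) {
        double sum = 0;
        for (long k : m.keySet()) {
            sum += k;
        }
        return sum;
    }

    public static void main(String[] args) {
        SortedMap<Long, String> map = buildMap();
        SortedMap<Long, String> submap = map.subMap(100L, 150L); // half-open range [100, 150)
        System.out.println(submap.size());     // prints 50
        System.out.println(submap.firstKey()); // prints 100
        System.out.println(keySum(submap));    // prints 6225.0

        // A read-only view rejects writes, like the read-only maps in the slide.
        SortedMap<Long, String> readOnly = Collections.unmodifiableSortedMap(submap);
        try {
            readOnly.put(1000L, "nogood");
        } catch (UnsupportedOperationException e) {
            System.out.println("put rejected on read-only map");
        }
    }
}
```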
Timestamp Features
AccumuloSortedMap makes use of Accumulo's timestamp features
and AgeOffFilter. Each map entry has an insert timestamp:
long insertTimestamp = map.getTimestamp(key);
You can filter a map by timestamp. The filter runs on the tablet servers:
AccumuloSortedMap timeFiltered = map.timeFilter(fromTs, toTs);
You can set an entry TTL in ms, which is also enforced on the tablet servers.
Timed-out entries are wiped during compaction:
map.setTimeOutMs(5000);
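As a rough local model of these two features, the sketch below filters a TreeMap of timestamped values. It does not call Accumulo. The Stamped class, the half-open range [fromTs, toTs), and the "age <= TTL is kept" rule are all assumptions made for illustration; the library's exact boundary behaviour is not stated in the slides.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class TimestampFilterSketch {
    /** Hypothetical stand-in for a stored value plus its insert timestamp in ms. */
    static final class Stamped {
        final String value;
        final long timestampMs;

        Stamped(String value, long timestampMs) {
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    /** Keeps entries with fromTs <= timestamp < toTs (assumed half-open range). */
    static SortedMap<String, Stamped> timeFilter(SortedMap<String, Stamped> in, long fromTs, long toTs) {
        SortedMap<String, Stamped> out = new TreeMap<>();
        for (Map.Entry<String, Stamped> e : in.entrySet()) {
            long ts = e.getValue().timestampMs;
            if (ts >= fromTs && ts < toTs) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /** Drops entries whose age at nowMs exceeds ttlMs, as a TTL filter would. */
    static SortedMap<String, Stamped> ageOff(SortedMap<String, Stamped> in, long nowMs, long ttlMs) {
        SortedMap<String, Stamped> out = new TreeMap<>();
        for (Map.Entry<String, Stamped> e : in.entrySet()) {
            if (nowMs - e.getValue().timestampMs <= ttlMs) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    static SortedMap<String, Stamped> sampleData() {
        SortedMap<String, Stamped> m = new TreeMap<>();
        m.put("a", new Stamped("1", 1000L));
        m.put("b", new Stamped("2", 4000L));
        m.put("c", new Stamped("3", 9000L));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(timeFilter(sampleData(), 2000L, 9000L).keySet()); // prints [b]
        System.out.println(ageOff(sampleData(), 10000L, 5000L).keySet());    // prints [c]
    }
}
```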
Filter Entries by Regex
A bundled iterator filters entries on the tablet servers by
comparing key.toString() and value.toString() to regexes. To
keep only the entries whose keys match "a(b|c)":
map.put("ac","1");
map.put("ax","2");
map.put("ab","3");
// has only the 1st and 3rd entries:
AccumuloSortedMap filtered = map.regexKeyFilter("a(b|c)");
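The same filter can be reproduced locally with java.util.regex. This is only a sketch on a TreeMap: it assumes full-match semantics (Matcher.matches()), which the slides do not specify, although the three keys above give the same result under either full-match or find semantics.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class RegexKeyFilterSketch {
    /** Keeps entries whose key.toString() fully matches the regex (assumed semantics). */
    static SortedMap<String, String> regexKeyFilter(SortedMap<String, String> in, String regex) {
        Pattern pattern = Pattern.compile(regex);
        SortedMap<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : in.entrySet()) {
            if (pattern.matcher(e.getKey()).matches()) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    static SortedMap<String, String> sampleData() {
        SortedMap<String, String> map = new TreeMap<>();
        map.put("ac", "1");
        map.put("ax", "2");
        map.put("ab", "3");
        return map;
    }

    public static void main(String[] args) {
        // Keys come back in sorted order, so "ab" precedes "ac".
        System.out.println(regexKeyFilter(sampleData(), "a(b|c)")); // prints {ab=3, ac=1}
    }
}
```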
Sampling and Partitioning Features
● AccumuloSortedMap supports sampling and partitioning on the tablet
servers using the supplied SamplingFilter (Accumulo iterator).
● You can derive a map that is a random sample:
AccumuloSortedMap sampleSubmap = map.sample(0.5);
● Or you can define a Sampler which will "freeze" a fixed subsample:
Sampler s = new Sampler("my_sample_seed",0.0,0.1,fromTs, toTs);
AccumuloSortedMap frozenSample = map.sample(s);
● When you supply a sample_seed, you define an ordering of the
keys by hash(sample_seed + key bytes). The same hash range
within that ordering will produce the same sample. The fractions
indicate the hash range.
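The seeded ordering described in the last bullet can be sketched as follows. This is not the library's SamplingFilter: SHA-256 as the hash, UTF-8 key bytes, and the half-open range [lo, hi) are assumptions chosen for illustration. The point is that the same seed and range select the same keys every time, and disjoint ranges select disjoint partitions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

final class SeededSamplerSketch {
    /** Maps hash(seed + key bytes) to a fraction in [0, 1). SHA-256 is an assumed hash choice. */
    static double hashFraction(String seed, byte[] keyBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(seed.getBytes(StandardCharsets.UTF_8));
            byte[] digest = md.digest(keyBytes);
            long bits = 0;
            for (int i = 0; i < 8; i++) {
                bits = (bits << 8) | (digest[i] & 0xFF);
            }
            // Keep the top 53 bits so the quotient is an exact double below 1.0.
            return (bits >>> 11) / (double) (1L << 53);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }

    /** Returns the keys whose hash fraction falls in [lo, hi). */
    static List<String> sample(List<String> keys, String seed, double lo, double hi) {
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            double f = hashFraction(seed, k.getBytes(StandardCharsets.UTF_8));
            if (f >= lo && f < hi) {
                out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("key" + i);
        }
        List<String> first = sample(keys, "my_sample_seed", 0.0, 0.1);
        List<String> again = sample(keys, "my_sample_seed", 0.0, 0.1);
        List<String> rest = sample(keys, "my_sample_seed", 0.1, 1.0);
        System.out.println(first.equals(again));                       // prints true: the sample is frozen
        System.out.println(first.size() + rest.size() == keys.size()); // prints true: ranges partition the keys
    }
}
```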
Map Aggregates Computed on
Tablet Servers
● Aggregate functions are implemented using iterators
that calculate aggregate quantities over the entire
tablet server. The results are then combined locally.
● Similar to MapReduce with # mappers = # tservers
and # reducers = 1.
● Examples of built-in aggregate methods: size(),
checksum(), keyStats(), valueStats()
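The "partial aggregate per tablet server, then combine locally" pattern can be sketched with the JDK's DoubleSummaryStatistics. Here each list is a stand-in for the values held by one tablet server; nothing in this example talks to Accumulo. Note that DoubleSummaryStatistics tracks count, sum, min, max and mean only; a standard deviation would additionally need a mergeable sum of squares.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;

final class CombineAggregatesSketch {
    /** "Map" step: one partial summary per tablet server (here, per local list). */
    static DoubleSummaryStatistics summarize(List<Double> values) {
        DoubleSummaryStatistics partial = new DoubleSummaryStatistics();
        for (double v : values) {
            partial.accept(v);
        }
        return partial;
    }

    /** "Reduce" step: merge the partial summaries on the client. */
    static DoubleSummaryStatistics combineAll(List<DoubleSummaryStatistics> partials) {
        DoubleSummaryStatistics total = new DoubleSummaryStatistics();
        for (DoubleSummaryStatistics p : partials) {
            total.combine(p);
        }
        return total;
    }

    /** Three hypothetical "tablet servers" holding 1..6 between them. */
    static DoubleSummaryStatistics demoTotal() {
        List<DoubleSummaryStatistics> partials = new ArrayList<>();
        partials.add(summarize(Arrays.asList(1.0, 2.0, 3.0)));
        partials.add(summarize(Arrays.asList(4.0, 5.0)));
        partials.add(summarize(Arrays.asList(6.0)));
        return combineAll(partials);
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics total = demoTotal();
        System.out.println(total.getCount());   // prints 6
        System.out.println(total.getSum());     // prints 21.0
        System.out.println(total.getAverage()); // prints 3.5
    }
}
```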
Efficient One-to-Many Mapping
● AccumuloSortedMap can be configured to allow multiple
values per key.
● Works by changing the VersioningIterator settings.
● SortedMap functions still work and see only the latest value.
● Extra methods give iterators over multiple values:
– Iterator<V> getAll(Object key)
– Iterator<Entry<K,V>> multiEntryIterator()
● All values for a given key will be stored on the same tablet
server. This enables server-side per-row aggregates. Like
SQL GROUP BY.
One-to-Many Example
map.setMaxValuesPerKey(-1); // unlimited
map.put(1, 2);
map.put(1, 3);
map.put(1, 4);
map.put(2, 22);
AccumuloSortedMap<Number, StatisticalSummary> row_stats = map.rowStats();
StatisticalSummary row1 = row_stats.get(1);
row1.getMean(); // = 3.0
row1.getMax(); // = 4.0
// count multiple values
map.sizeAsLong(true); // = 4
// sum all values, looking at 1 value per key: 4 + 22
map.valueStats().getSum(); // = 26.0
// sum all values, looking at multiple values per key: 2 + 3 + 4 + 22
map.valueStats(true).getSum(); // = 31.0
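The arithmetic in this example can be checked with a small local model. OneToManySketch below is hypothetical: it keeps every value per key in a TreeMap of lists, treats the last value written as the "latest", and mirrors the slide's methods with local equivalents. It does not reproduce how the library configures the VersioningIterator.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical local model of a map that keeps multiple values per key. */
final class OneToManySketch {
    private final SortedMap<Integer, List<Integer>> rows = new TreeMap<>();

    void put(int key, int value) {
        rows.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** SortedMap-style get(): sees only the latest value. */
    Integer get(int key) {
        List<Integer> values = rows.get(key);
        return values == null ? null : values.get(values.size() - 1);
    }

    long size(boolean countMultipleValues) {
        if (!countMultipleValues) {
            return rows.size();
        }
        long n = 0;
        for (List<Integer> values : rows.values()) {
            n += values.size();
        }
        return n;
    }

    double valueSum(boolean countMultipleValues) {
        double sum = 0;
        for (List<Integer> values : rows.values()) {
            if (countMultipleValues) {
                for (int v : values) {
                    sum += v;
                }
            } else {
                sum += values.get(values.size() - 1);
            }
        }
        return sum;
    }

    /** Per-row aggregate, like a GROUP BY on the key. */
    DoubleSummaryStatistics rowStats(int key) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        for (int v : rows.getOrDefault(key, new ArrayList<>())) {
            stats.accept(v);
        }
        return stats;
    }

    static OneToManySketch demo() {
        OneToManySketch map = new OneToManySketch();
        map.put(1, 2);
        map.put(1, 3);
        map.put(1, 4);
        map.put(2, 22);
        return map;
    }

    public static void main(String[] args) {
        OneToManySketch map = demo();
        System.out.println(map.rowStats(1).getAverage()); // prints 3.0
        System.out.println(map.size(true));               // prints 4
        System.out.println(map.valueSum(false));          // prints 26.0
        System.out.println(map.valueSum(true));           // prints 31.0
    }
}
```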
Writing Custom Transformations and
Aggregates
● Accumulo Collections provides useful abstract iterators
that operate on deserialized Java objects.
– Iterators are passed the SerDe classnames so that they
can read the deserialized objects.
● You can extend these iterators to implement your own
transformations and aggregates. The API is very simple:
abstract Object transformValue(Object k, Object v);
abstract boolean allow(Object k, Object v);
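To show how these two methods fit together, here is a local sketch that applies them to a TreeMap. EntryTransform is a hypothetical class written for this example, not the library's abstract iterator; only the two method signatures are taken from the slide, and the "filter first, then transform" order is an assumption.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class LocalTransformSketch {
    /** Hypothetical local analogue of the abstract iterator: filter, then transform. */
    abstract static class EntryTransform {
        abstract Object transformValue(Object k, Object v);

        abstract boolean allow(Object k, Object v);

        /** Applies allow() and transformValue() to every entry of a local map. */
        <K> SortedMap<K, Object> apply(SortedMap<K, ?> in) {
            SortedMap<K, Object> out = new TreeMap<>(in.comparator());
            for (Map.Entry<K, ?> e : in.entrySet()) {
                if (allow(e.getKey(), e.getValue())) {
                    out.put(e.getKey(), transformValue(e.getKey(), e.getValue()));
                }
            }
            return out;
        }
    }

    /** Example subclass: keep even keys and triple their values. */
    static final class EvenKeysTripled extends EntryTransform {
        @Override
        Object transformValue(Object k, Object v) {
            return 3 * ((Number) v).longValue();
        }

        @Override
        boolean allow(Object k, Object v) {
            return ((Number) k).longValue() % 2 == 0;
        }
    }

    static SortedMap<Long, Long> sampleData() {
        SortedMap<Long, Long> in = new TreeMap<>();
        in.put(1L, 10L);
        in.put(2L, 20L);
        in.put(3L, 30L);
        in.put(4L, 40L);
        return in;
    }

    public static void main(String[] args) {
        System.out.println(new EvenKeysTripled().apply(sampleData())); // prints {2=60, 4=120}
    }
}
```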
Example: Custom JavaScript
Transformation
As an example of custom transformations, consider
ScriptTransformingIterator in the "experimental" package. You can pass
JavaScript code, which is interpreted on the tablet servers. The key and
value bind to the JavaScript variables "k" and "v". For example:
Allow only entries with even keys:
AccumuloSortedMap evens = map.jsFilter("k % 2 == 0");
Map of key → 3*value:
AccumuloSortedMap tripled = map.jsTransform(" 3*v ");
These examples work on keys and values that are Java Numbers. Other
JavaScript functions also work on Strings, Java Maps, etc.
Foreign Keys
Accumulo Collections provides a serializable ForeignKey object, which is
like a symbolic link that points to a map plus a key. There is no integrity
checking of the link:
map1.put("key1", "value1");
ForeignKey fk_to_key1 = map1.makeForeignKey("key1");
map2.put("key2", fk_to_key1);
// both return "value1"
fk_to_key1.resolve(conn);
map2.get("key2").resolve(conn);
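The symbolic-link idea, including the lack of integrity checking, can be modelled locally. In the sketch below, Link and the registry of named maps are hypothetical stand-ins for ForeignKey and the Accumulo connection; returning null for a dangling link is an assumption, since the slides do not say what resolve() does when the target is missing.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class ForeignKeySketch {
    /** Hypothetical link: the name of a map plus a key within it. */
    static final class Link implements Serializable {
        private static final long serialVersionUID = 1L;
        final String mapName;
        final String key;

        Link(String mapName, String key) {
            this.mapName = mapName;
            this.key = key;
        }

        /** Follows the link at call time; null if the target is gone (assumed behaviour). */
        Object resolve(Map<String, Map<String, Object>> registry) {
            Map<String, Object> target = registry.get(mapName);
            return target == null ? null : target.get(key);
        }
    }

    /** Builds two named maps where map2["key2"] links to map1["key1"]. */
    static Map<String, Map<String, Object>> demoRegistry() {
        Map<String, Map<String, Object>> registry = new HashMap<>();
        Map<String, Object> map1 = new TreeMap<>();
        Map<String, Object> map2 = new TreeMap<>();
        registry.put("map1", map1);
        registry.put("map2", map2);
        map1.put("key1", "value1");
        map2.put("key2", new Link("map1", "key1"));
        return registry;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> registry = demoRegistry();
        Link link = (Link) registry.get("map2").get("key2");
        System.out.println(link.resolve(registry)); // prints value1

        // No integrity checking: deleting the target leaves a dangling link.
        registry.get("map1").remove("key1");
        System.out.println(link.resolve(registry)); // prints null
    }
}
```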
Using AccumuloSortedMapFactory
● The map factory is the preferred way to construct
AccumuloSortedMaps. The factory is itself a map
of (map name → map metadata) with default
settings. The factory:
– Acts as a namespace, mapping map names to real
Accumulo table names.
– Configures SerDes.
– Configures other metadata like
max_values_per_key.
Factory Example
AccumuloSortedMapFactory factory;
AccumuloSortedMap map;
factory = new AccumuloSortedMapFactory(conn, "factory_table");
// 10 values per key default for all maps
factory.addDefaultProperty(MAP_PROPERTY_VALUES_PER_KEY, "10");
// 5000ms timeout in map "mymap"
factory.addMapSpecificProperty("mymap", MAP_PROPERTY_TTL, "5000");
map = factory.makeMap("mymap");
More about SerDes
● Accumulo uses BytesWritable.compareTo() to
compare keys on the tablet servers.
– No way to set alternate comparator (?)
● Keys must be serialized in such a way that byte
sort order is the same as Java sort order.
● FixedPointSerde, the default SerDe, writes
Numbers in fixed point unsigned format so that
numerical comparison works. Other Objects are
Java serialized.
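One common way to make byte order agree with numeric order for signed longs is to flip the sign bit and write the result big-endian. The sketch below shows that technique with the JDK only; it is an illustration of the requirement above, not the actual byte layout used by FixedPointSerde, which the slides do not specify.

```java
import java.nio.ByteBuffer;

final class OrderPreservingLongSketch {
    /** Encodes a long so that unsigned byte order matches numeric order. */
    static byte[] encode(long v) {
        // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto 0..2^64-1 (unsigned).
        return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    /** Lexicographic comparison of unsigned bytes, as a byte-oriented store would sort keys. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    /** Plain two's-complement bytes, shown only to demonstrate the problem. */
    static byte[] naiveEncode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static void main(String[] args) {
        // With the sign bit flipped, -1 sorts before 1, as it should.
        System.out.println(compareBytes(encode(-1L), encode(1L)) < 0);           // prints true
        // Naive two's-complement bytes sort -1 (0xFF...) after 1.
        System.out.println(compareBytes(naiveEncode(-1L), naiveEncode(1L)) > 0); // prints true
        System.out.println(decode(encode(-42L)));                                // prints -42
    }
}
```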
Bulk Import, Saving Derived Maps
● The putAll() and importAll() methods in AccumuloSortedMap batch
writes to Accumulo, unlike put(). You can save a derived map using
putAll():
map.putAll(someOtherMap);
● importAll() is like putAll(), but takes an Iterator as an argument. This
can be used to import entries from other sources, like input streams
and files.
map.importAll(new TsvInputStreamIterator("importfile.tsv"));
● Aside from batching, putAll() and importAll() do not do anything
special on the tablet servers. The import data all passes through the
local machine to Accumulo. The optional KeyValueTransformer runs
locally.
Benchmarks
● I benchmarked Accumulo Collections against raw
Accumulo read/writes on a toy Accumulo cluster
running in Docker. All the moving parts of a real
cluster, but running on one machine.
● All tests so far indicate that Accumulo Collections
adds very little overhead (~10%) to normal
Accumulo operation.
● I would appreciate it if someone sends me
benchmarks from a proper cluster!
Benchmark Data
[Figure: bar chart, "Raw Accumulo vs Accumulo Collections". Median time in ms over 10000 operations for read, batched write, and unbatched write. Series: raw, Acc Collections. Axis: median time (ms), 0 to 18.]
Performance Tips
● Batched writes are much faster. Use putAll() and
importAll() in place of put() when possible.
– Write your changes locally to a memory-based
Map, then store in bulk with putAll().
● Iterating over a range is much faster than lots of
individual get() calls.
– If you need to do lots of get() calls over a small
submap, you can cache a map locally in memory
with the localCopy() method.
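The first tip (buffer changes in memory, then write them in bulk) can be sketched with a toy model. CountingStore below is a hypothetical stand-in for a remote map that simply counts write calls; it says nothing about real Accumulo timings, only about how buffering reduces the number of separate write operations the client issues.

```java
import java.util.Map;
import java.util.TreeMap;

final class BufferedWritesSketch {
    /** Hypothetical stand-in for a remote map that counts write calls. */
    static final class CountingStore {
        final Map<Long, String> data = new TreeMap<>();
        int writeCalls = 0;

        void put(long key, String value) {
            writeCalls++;
            data.put(key, value);
        }

        void putAll(Map<Long, String> batch) {
            writeCalls++;
            data.putAll(batch);
        }
    }

    /** One write call per entry. */
    static int writeOneByOne(CountingStore store, int n) {
        for (long i = 0; i < n; i++) {
            store.put(i, "value" + i);
        }
        return store.writeCalls;
    }

    /** Buffer locally in a memory-based map, then store in bulk. */
    static int writeBuffered(CountingStore store, int n) {
        Map<Long, String> buffer = new TreeMap<>();
        for (long i = 0; i < n; i++) {
            buffer.put(i, "value" + i);
        }
        store.putAll(buffer);
        return store.writeCalls;
    }

    public static void main(String[] args) {
        System.out.println(writeOneByOne(new CountingStore(), 1000)); // prints 1000
        System.out.println(writeBuffered(new CountingStore(), 1000)); // prints 1
    }
}
```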
Contact Info
● I'm available for hire. You can email me at
jwolff@isentropy.com. My consulting company,
Isentropy, is online at https://isentropy.com .
● Accumulo Collections is available on Github at
https://github.com/isentropy/accumulo-collections
● Constructive questions and comments welcome.

Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo Interface

  • 1. Introducing Accumulo Collections: A Practical Accumulo Interface By Jonathan Wolff jwolff@isentropy.com Founder, Isentropy LLC https://isentropy.com Code and Documentation on Github https://github.com/isentropy/accumulo-collections/wiki
  • 2. Accumulo Needs A Practical API
    ● Accumulo is great under the hood, but needs a practical interface for real-world NoSQL applications.
    ● Could companies use Accumulo in place of MySQL??
    ● Accumulo needs a layer to:
      1) Handle java Object serialization locally and on tablet servers.
      2) Handle foreign keys/joins.
      3) Abstract iterators, so that it's easy to do server-side computations.
      4) Provide a useful library of filters, transformations, aggregates.
  • 3. What is Accumulo Collections?
    ● Accumulo Collections is a new, alternative NoSQL framework that uses Accumulo as a backend. It abstracts powerful Accumulo functionality in a concise java API.
    ● Since Accumulo is already a sorted map, java SortedMap is a natural choice for an interface. It's already familiar to java developers. Devs who know nothing about Accumulo can use it to build giant, responsive NoSQL applications.
    ● But Accumulo Collections is more than a SortedMap implementation...
    ● Many features are implemented on the tablet servers by iterators, and wrapped in java methods. You don't need to understand Accumulo iterators to use them.
  • 4. AccumuloSortedMap wraps an Accumulo table
    ● AccumuloSortedMap is a java SortedMap implementation that is backed by an Accumulo table. It handles object serialization and foreign keys, and abstracts powerful iterator functionality.
    ● Method calls derive new maps that contain transformations and aggregates. Derived maps modify the underlying Scanner. This abstracts the concept of iterators. Derived map methods run on-the-fly and can be chained:
        // similar to SQL: WHERE timestamp BETWEEN t0 AND t1 AND rand() > .5
        AccumuloSortedMap derivedMap = map.timeFilter(t0,t1).sample(0.5);
        // statistical aggregate (mean, sd, n, etc) of values from key range [100,200)
        StatisticalSummary stats = map.subMap(100, 200).valueStats();
    Each of the above methods stacks an iterator on the underlying map. The iterators make use of SerDes to operate directly on java Objects.
  • 5. Just like a standard java SortedMap, but…
    ● AccumuloSortedMap returns a copy of the map value. You must put() to save modifications.
    ● To use sorted map features, the SerDe must serialize to bytes that sort in the same order as the java Objects. The default FixedPointSerde is suitable for most common key types (strings, primitives, byte[], etc). More about SerDes later…
    ● Supports sizes greater than Integer.MAX_VALUE. See sizeAsLong().
    ● Can be set to read-only. Derived map methods, which stack scan iterators, always return read-only maps.
  • 6. Use Accumulo as a SortedMap
        AccumuloSortedMapFactory factory = new AccumuloSortedMapFactory(conn, "factory_name");
        AccumuloSortedMap<Long,String> map = factory.makeMap("mapname");
        for (long i = 0; i < 1000; i++) {
            map.put(i, "value" + i);
        }
        map.get(123L); // equals "value123"
        map.keySet().iterator().next(); // equals 0
        AccumuloSortedMap submap = map.subMap(100L, 150L);
        submap.size(); // equals 50
        submap.firstKey(); // equals 100
        submap.keyStats().getSum(); // equals 6225.0
        for (Entry<Long,String> e : submap.entrySet()) {
            // iterate
        }
        // these commands throw exceptions. Both maps are read-only.
        map.setReadOnly(true).put(1000L, "nogood");
        submap.put(1000L, "nogood");
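The expected values in the slide above can be checked without a cluster by running the same SortedMap calls against java.util.TreeMap. This is only an in-memory analogue: it uses no Accumulo Collections classes, and the `keySum` helper is a hypothetical stand-in for `keyStats().getSum()`, on the assumption that the latter sums the keys in the range.

```java
import java.util.LongSummaryStatistics;
import java.util.SortedMap;
import java.util.TreeMap;

final class SortedMapAnalogue {
    /** Builds the same 1000-entry map as the slide, but in memory. */
    static SortedMap<Long, String> build() {
        SortedMap<Long, String> map = new TreeMap<>();
        for (long i = 0; i < 1000; i++) {
            map.put(i, "value" + i);
        }
        return map;
    }

    /** Sums the keys of a map (assumed equivalent of keyStats().getSum()). */
    static double keySum(SortedMap<Long, String> m) {
        LongSummaryStatistics stats =
            m.keySet().stream().mapToLong(Long::longValue).summaryStatistics();
        return (double) stats.getSum();
    }

    public static void main(String[] args) {
        SortedMap<Long, String> map = build();
        // subMap is half-open: it covers keys 100 through 149.
        SortedMap<Long, String> submap = map.subMap(100L, 150L);
        System.out.println(map.get(123L));
        System.out.println(submap.size());
        System.out.println(submap.firstKey());
        System.out.println(keySum(submap));
    }
}
```

The half-open range [100, 150) holds 50 keys whose sum is (100 + 149) * 50 / 2 = 6225, which matches the slide.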
  • 7. Timestamp Features
    AccumuloSortedMap makes use of Accumulo's timestamp features and AgeOffFilter.
    Each map entry has an insert timestamp:
        long insertTimestamp = map.getTimestamp(key);
    Can filter map by timestamp. Implemented on tablet servers.
        AccumuloSortedMap timeFiltered = map.timeFilter(fromTs, toTs);
    Can set an entry TTL in ms. Implemented on tablet servers. Timed out entries are wiped during compaction:
        map.setTimeOutMs(5000);
  • 8. Filter Entries by Regex
    A bundled iterator filters entries on tablet servers by comparing key.toString() and value.toString() to regexes. To keep only the keys that match "a(b|c)":
        map.put("ac", "1");
        map.put("ax", "2");
        map.put("ab", "3");
        // has only 1st and 3rd entries:
        AccumuloSortedMap filtered = map.regexKeyFilter("a(b|c)");
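As a local illustration of what the key filter above selects, here is a minimal sketch using java.util.regex on an in-memory map. `RegexKeyFilterSketch` and `filterKeys` are hypothetical names, not part of Accumulo Collections, and the sketch assumes the whole key must match the pattern; the slide does not say whether the bundled iterator uses full or partial matching.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class RegexKeyFilterSketch {
    /** Returns a new map holding only the entries whose key fully matches the regex. */
    static <V> SortedMap<String, V> filterKeys(SortedMap<String, V> in, String regex) {
        Pattern pattern = Pattern.compile(regex);
        SortedMap<String, V> out = new TreeMap<>();
        for (Map.Entry<String, V> e : in.entrySet()) {
            // Assumption: full match, as with Matcher.matches().
            if (pattern.matcher(e.getKey()).matches()) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, String> map = new TreeMap<>();
        map.put("ac", "1");
        map.put("ax", "2");
        map.put("ab", "3");
        System.out.println(filterKeys(map, "a(b|c)").keySet());
    }
}
```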
  • 9. Sampling and Partitioning Features
    ● AccumuloSortedMap supports sampling and partitioning on the tablet servers using the supplied SamplingFilter (Accumulo iterator).
    ● You can derive a map that is a random sample:
        AccumuloSortedMap sampleSubmap = map.sample(0.5);
    ● Or you can define a Sampler which will "freeze" a fixed subsample:
        Sampler s = new Sampler("my_sample_seed", 0.0, 0.1, fromTs, toTs);
        AccumuloSortedMap frozenSample = map.sample(s);
    ● When you supply a sample_seed, you define an ordering of the keys by hash(sample_seed + key bytes). The same hash range within that ordering will produce the same sample. The fractions indicate the hash range.
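The seeded sampling idea above can be sketched in a few lines: hash the seed together with the key bytes, map the hash to a fraction in [0, 1), and keep the entry if the fraction falls inside the requested range. The class below is a hypothetical illustration, and SHA-256 is an assumed hash; the slides do not say which hash function SamplingFilter uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashRangeSamplerSketch {
    /** Maps (seed, key bytes) to a deterministic fraction in [0, 1). */
    static double hashFraction(String seed, byte[] keyBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(seed.getBytes(StandardCharsets.UTF_8));
            byte[] digest = md.digest(keyBytes);
            // Pack the first 8 digest bytes into a long.
            long bits = 0;
            for (int i = 0; i < 8; i++) {
                bits = (bits << 8) | (digest[i] & 0xFF);
            }
            // Keep the top 53 bits so the result is an exact double below 1.0.
            return (bits >>> 11) / (double) (1L << 53);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the key's hash fraction lies in the half-open range [from, to). */
    static boolean inSample(String seed, byte[] keyBytes, double from, double to) {
        double f = hashFraction(seed, keyBytes);
        return f >= from && f < to;
    }
}
```

Because the fraction depends only on the seed and the key, the range [0.0, 0.1) always selects the same keys, and disjoint ranges such as [0.0, 0.1) and [0.1, 1.0) partition the map without overlap.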
  • 10. Map Aggregates Computed on Tablet Servers
    ● Aggregate functions are implemented using iterators that calculate partial aggregates on each tablet server. The partial results are then combined locally, on the client.
    ● Similar to MapReduce with # mappers = # tservers and # reducers = 1.
    ● Examples of built-in aggregate methods: size(), checksum(), keyStats(), valueStats()
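The two-stage pattern above can be sketched locally, with plain lists standing in for the data held by each tablet server. This is an illustration only: `PartialAggregateSketch` is a hypothetical class, and it uses the JDK's DoubleSummaryStatistics rather than the StatisticalSummary type the library returns.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;

final class PartialAggregateSketch {
    /** "Server side": summarize the values held by one partition. */
    static DoubleSummaryStatistics summarize(List<Double> partitionValues) {
        DoubleSummaryStatistics partial = new DoubleSummaryStatistics();
        for (double v : partitionValues) {
            partial.accept(v);
        }
        return partial;
    }

    /** "Client side": merge the per-partition summaries into one result. */
    static DoubleSummaryStatistics combine(List<DoubleSummaryStatistics> partials) {
        DoubleSummaryStatistics total = new DoubleSummaryStatistics();
        for (DoubleSummaryStatistics partial : partials) {
            total.combine(partial);
        }
        return total;
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics total = combine(List.of(
            summarize(List.of(1.0, 2.0, 3.0)),
            summarize(List.of(4.0, 5.0))));
        System.out.println(total);
    }
}
```

Only the small summaries cross the network in this scheme, which is why count, sum, min, max and mean are cheap to compute over a large table.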
  • 11. Efficient One-to-Many Mapping
    ● AccumuloSortedMap can be configured to allow multiple values per key.
    ● Works by changing the VersioningIterator settings.
    ● SortedMap functions still work and see only the latest value.
    ● Extra methods give iterators over multiple values:
      – Iterator<V> getAll(Object key)
      – Iterator<Entry<K,V>> multiEntryIterator()
    ● All values for a given key will be stored on the same tablet server. This enables server-side per-row aggregates. Like SQL GROUP BY.
  • 12. One-to-Many Example
        map.setMaxValuesPerKey(-1); // unlimited
        map.put(1, 2);
        map.put(1, 3);
        map.put(1, 4);
        map.put(2, 22);
        AccumuloSortedMap<Number, StatisticalSummary> row_stats = map.rowStats();
        StatisticalSummary row1 = row_stats.get(1);
        row1.getMean(); // = 3.0
        row1.getMax(); // = 4.0
        // count multiple values
        map.sizeAsLong(true); // = 4
        // sum all values, looking at 1 value per key: 4 + 22
        map.valueStats().getSum(); // = 26.0
        // sum all values, looking at multiple values per key: 2 + 3 + 4 + 22
        map.valueStats(true).getSum(); // = 31.0
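The numbers in the example above can be reproduced with a small in-memory model of a multi-valued map. `MultiValueMapSketch` and its methods are hypothetical and only mimic the described semantics (plain reads see the latest value, while aggregates can optionally see every value); they are not the library's implementation.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class MultiValueMapSketch {
    // Each key keeps its values in insertion order; the last one is the "latest".
    private final SortedMap<Integer, List<Integer>> rows = new TreeMap<>();

    void put(int key, int value) {
        rows.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Plain SortedMap-style read: only the latest value for the key. */
    Integer get(int key) {
        List<Integer> values = rows.get(key);
        return values == null ? null : values.get(values.size() - 1);
    }

    /** Per-key statistics over all stored values, like GROUP BY on the key. */
    DoubleSummaryStatistics rowStats(int key) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        for (int v : rows.getOrDefault(key, List.of())) {
            stats.accept(v);
        }
        return stats;
    }

    /** Counts keys, or every stored value when countMultiple is true. */
    long size(boolean countMultiple) {
        if (!countMultiple) {
            return rows.size();
        }
        long n = 0;
        for (List<Integer> values : rows.values()) {
            n += values.size();
        }
        return n;
    }

    /** Sums the latest value per key, or every stored value when useAllValues is true. */
    double valueSum(boolean useAllValues) {
        double sum = 0;
        for (List<Integer> values : rows.values()) {
            if (useAllValues) {
                for (int v : values) {
                    sum += v;
                }
            } else {
                sum += values.get(values.size() - 1);
            }
        }
        return sum;
    }
}
```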
  • 13. Writing Custom Transformations and Aggregates
    ● Accumulo Collections provides useful abstract iterators that operate on deserialized java Objects.
      – Iterators are passed the SerDe classnames so that they can read the deserialized Objects.
    ● You can extend these iterators to implement your own transformations and aggregates. The API is very simple:
        abstract Object transformValue(Object k, Object v);
        abstract boolean allow(Object k, Object v);
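To show how the two abstract methods above fit together, here is a local sketch that applies them to an in-memory map. `LocalTransformSketch` is a hypothetical stand-in: the real base classes are Accumulo iterators that run on tablet servers, and the choice to call `allow` before `transformValue` is an assumption made for this illustration.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

abstract class LocalTransformSketch {
    abstract Object transformValue(Object k, Object v);

    abstract boolean allow(Object k, Object v);

    /** Applies the filter, then the transformation, to every entry. */
    final SortedMap<Object, Object> apply(SortedMap<?, ?> in) {
        SortedMap<Object, Object> out = new TreeMap<>();
        for (Map.Entry<?, ?> e : in.entrySet()) {
            if (allow(e.getKey(), e.getValue())) {
                out.put(e.getKey(), transformValue(e.getKey(), e.getValue()));
            }
        }
        return out;
    }
}

/** Example subclass: keeps entries with even keys and triples their values. */
final class TripleEvenKeys extends LocalTransformSketch {
    @Override
    boolean allow(Object k, Object v) {
        return ((Number) k).longValue() % 2 == 0;
    }

    @Override
    Object transformValue(Object k, Object v) {
        return 3 * ((Number) v).longValue();
    }
}
```

Working on Objects rather than raw bytes is what keeps such subclasses short; in the library the SerDe handles the conversion before these methods are called.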
  • 14. Example: Custom JavaScript Transformation
    As an example of custom transformations, consider ScriptTransformingIterator in the "experimental" package. You can pass JavaScript code, which is interpreted on the tablet servers. The key and value bind to JavaScript variables "k" and "v". For example:
    Allow only entries with even keys:
        AccumuloSortedMap evens = map.jsFilter("k % 2 == 0");
    Map of key → 3*value:
        AccumuloSortedMap tripled = map.jsTransform(" 3*v ");
    These examples work on keys and values that are java Numbers. Other JavaScript functions also work on Strings, java Maps, etc.
  • 15. Foreign Keys
    Accumulo Collections provides a serializable ForeignKey Object which is like a symbolic link that points to a map plus a key. There is no integrity checking of the link:
        map1.put("key1", "value1");
        ForeignKey fk_to_key1 = map1.makeForeignKey("key1");
        map2.put("key2", fk_to_key1);
        // both equal "value1"
        fk_to_key1.resolve(conn);
        map2.get("key2").resolve(conn);
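The symbolic-link behaviour above, including the lack of integrity checking, can be modelled with a (map name, key) pair that is looked up at resolve time. `ForeignKeySketch` is a hypothetical class; a plain registry of in-memory maps stands in for the Accumulo connection that the real `resolve(conn)` takes.

```java
import java.util.HashMap;
import java.util.Map;

final class ForeignKeySketch {
    private final String mapName;
    private final Object key;

    ForeignKeySketch(String mapName, Object key) {
        this.mapName = mapName;
        this.key = key;
    }

    /** Follows the link; returns null if the map or the key no longer exists. */
    Object resolve(Map<String, Map<Object, Object>> registry) {
        Map<Object, Object> target = registry.get(mapName);
        return target == null ? null : target.get(key);
    }

    public static void main(String[] args) {
        Map<String, Map<Object, Object>> registry = new HashMap<>();
        Map<Object, Object> map1 = new HashMap<>();
        map1.put("key1", "value1");
        registry.put("map1", map1);

        ForeignKeySketch link = new ForeignKeySketch("map1", "key1");
        System.out.println(link.resolve(registry));

        // Nothing stops the target from being removed: the link now dangles.
        map1.remove("key1");
        System.out.println(link.resolve(registry));
    }
}
```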
  • 16. Using AccumuloSortedMapFactory
    ● The map factory is the preferred way to construct AccumuloSortedMaps. The factory is itself a map of (map name → map metadata) with default settings. The factory:
      – acts as a namespace, mapping map names to real Accumulo table names.
      – configures SerDes.
      – configures other metadata like max_values_per_key.
  • 17. Factory Example
        AccumuloSortedMapFactory factory;
        AccumuloSortedMap map;
        factory = new AccumuloSortedMapFactory(conn, "factory_table");
        // 10 values per key default for all maps
        factory.addDefaultProperty(MAP_PROPERTY_VALUES_PER_KEY, "10");
        // 5000ms timeout in map "mymap"
        factory.addMapSpecificProperty("mymap", MAP_PROPERTY_TTL, "5000");
        map = factory.makeMap("mymap");
  • 18. More about SerDes
    ● Accumulo uses BytesWritable.compareTo() to compare keys on the tablet servers.
      – No way to set alternate comparator (?)
    ● Keys must be serialized in such a way that the byte sort order is the same as the java sort order.
    ● FixedPointSerde, the default SerDe, writes Numbers in fixed point unsigned format so that numerical comparison works. Other Objects are java serialized.
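The sort-order requirement above can be illustrated with a common order-preserving encoding for signed longs: flip the sign bit and write the result big-endian, so that unsigned byte-wise comparison agrees with numeric comparison. This is a sketch of the general technique under that assumption; the slides do not give the exact byte layout FixedPointSerde uses, and `SortableLongSerDeSketch` is a hypothetical class.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class SortableLongSerDeSketch {
    /** Encodes a long so that unsigned lexicographic byte order equals numeric order. */
    static byte[] serialize(long value) {
        // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto 0..2^64-1.
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    /** Reverses serialize(). */
    static long deserialize(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    /** Compares encoded keys byte by byte, treating each byte as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] minusOne = serialize(-1L);
        byte[] plusOne = serialize(1L);
        // Negative: -1 sorts before 1 in byte order too.
        System.out.println(compareBytes(minusOne, plusOne) < 0);
    }
}
```

Without the sign-bit flip, the two's-complement bytes of -1 (all 0xFF) would sort after every positive number, and range scans such as subMap() would return the wrong entries.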
  • 19. Bulk Import, Saving Derived Maps
    ● The putAll and importAll methods in AccumuloSortedMap batch writes to Accumulo, unlike put(). You can save a derived map using putAll:
        map.putAll(someOtherMap);
    ● importAll() is like putAll, but takes an Iterator as an argument. This can be used to import entries from other sources, like input streams and files.
        map.importAll(new TsvInputStreamIterator("importfile.tsv"));
    ● Aside from batching, putAll() and importAll() do not do anything special on the tablet servers. The import data all passes through the local machine to Accumulo. The optional KeyValueTransformer runs locally.
  • 20. Benchmarks
    ● I benchmarked Accumulo Collections against raw Accumulo reads/writes on a toy Accumulo cluster running in Docker. It has all the moving parts of a real cluster, but runs on one machine.
    ● All tests so far indicate that Accumulo Collections adds very little overhead (~10%) to normal Accumulo operation.
    ● I would appreciate it if someone sends me benchmarks from a proper cluster!
  • 21. Benchmark Data
    [Bar chart: Raw Accumulo vs Accumulo Collections, median time in ms over 10000 operations, for read, batched write and unbatched write. Series: raw Accumulo, Accumulo Collections.]
  • 22. Performance Tips
    ● Batched writes are much faster. Use putAll() and importAll() in place of put() when possible.
      – Write your changes locally to a memory-based Map, then store in bulk with putAll().
    ● Iterating over a range is much faster than lots of individual get() calls.
      – If you need to do lots of get() calls over a small submap, you can cache a map locally in memory with the localCopy() method.
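The first tip above, buffering changes in a memory-based Map and storing them in bulk, can be sketched as a small wrapper around any java.util.Map. `WriteBufferSketch`, its batch size and its flush counter are hypothetical; with Accumulo Collections the target would be an AccumuloSortedMap, whose putAll() batches the writes.

```java
import java.util.HashMap;
import java.util.Map;

final class WriteBufferSketch<K, V> {
    private final Map<K, V> target;
    private final Map<K, V> pending = new HashMap<>();
    private final int batchSize;
    private int flushes = 0;

    WriteBufferSketch(Map<K, V> target, int batchSize) {
        this.target = target;
        this.batchSize = batchSize;
    }

    /** Buffers the write locally and flushes once the batch is full. */
    void put(K key, V value) {
        pending.put(key, value);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends all buffered entries to the target in one putAll() call. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        target.putAll(pending);
        pending.clear();
        flushes++;
    }

    /** Number of bulk writes issued so far. */
    int flushCount() {
        return flushes;
    }
}
```

Remember to call flush() once more after the last put(); otherwise a partial batch stays in memory.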
  • 23. Contact Info
    ● I'm available for hire. You can email me at jwolff@isentropy.com. My consulting company, Isentropy, is online at https://isentropy.com.
    ● Accumulo Collections is available on Github at https://github.com/isentropy/accumulo-collections
    ● Constructive questions and comments welcome.