A long time ago in a galaxy far, far away...
Java open source developers managed to see the previously secret plans for the Empire's ultimate weapon, the JAVA™ COLLECTIONS FRAMEWORK.
Evading the dreaded Imperial Starfleet, a group of freedom fighters investigates common developer errors and bugs to help protect their vital software. They also investigate the performance of the Empire's most popular weapon: HashMap. With this newfound knowledge, they strike back!
Pursued by the Empire's sinister agents, JDuchess races home aboard her JVM, investigating proposed future changes to the Java Collections and other options, such as Immutable Persistent Collections, which could save her people and restore freedom to the galaxy...
7. Reducing scope for bugs
● ~280 bugs in 28 projects including Cassandra, Lucene
● ~80% of the check-then-act bugs discovered are put-if-absent
● Library designers can help by updating APIs as new idioms emerge
● Different data structures can provide alternatives by restricting reads &
updates to reduce scope for bugs
CHECK-THEN-ACT Misuse of Java Concurrent Collections
http://dig.cs.illinois.edu/papers/checkThenAct.pdf
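A minimal sketch of the put-if-absent pattern described above, assuming a ConcurrentHashMap of per-key counters. The class and method names (CheckThenAct, counterRacy, counterAtomic, run) are hypothetical. The racy version checks and then acts in two separate steps, so another thread can slip in between them; the fix hands the whole operation to the map's own atomic computeIfAbsent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-key counter registry illustrating a check-then-act bug and its fix.
public final class CheckThenAct {
    // BUG: two threads can both see null and both put, so one thread's
    // AtomicInteger (and any increments on it) is silently replaced.
    static AtomicInteger counterRacy(Map<String, AtomicInteger> map, String key) {
        AtomicInteger c = map.get(key);
        if (c == null) {              // check
            c = new AtomicInteger();
            map.put(key, c);          // act: not atomic with the check
        }
        return c;
    }

    // Fix: on ConcurrentHashMap, computeIfAbsent performs the check and the
    // insert atomically (putIfAbsent would also work here).
    static AtomicInteger counterAtomic(Map<String, AtomicInteger> map, String key) {
        return map.computeIfAbsent(key, k -> new AtomicInteger());
    }

    // Runs several threads that each increment the "hits" counter and returns the total.
    static int run(boolean atomic, int threads, int incrementsPerThread) {
        Map<String, AtomicInteger> map = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    AtomicInteger c = atomic ? counterAtomic(map, "hits") : counterRacy(map, "hits");
                    c.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return map.get("hits").get();
    }

    public static void main(String[] args) {
        System.out.println(run(true, 8, 10_000));   // always 80000
        System.out.println(run(false, 8, 10_000));  // may occasionally print less than 80000
    }
}
```

The racy variant only loses updates when threads race on the very first access, so it can pass many test runs before failing, which is exactly why such bugs survive in real projects.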
8. Java 9 API updates
Collection factory methods
● Non-goal: providing persistent immutable collections
● http://openjdk.java.net/jeps/269
Live demo using JShell
http://iteratrlearning.com/java9/2016/11/09/java9-collection-factory-methods
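A small sketch of the Java 9 factory methods from JEP 269 (List.of, Set.of, Map.of). FactoryMethods and isUnmodifiable are illustrative names, not JDK API. It shows that the resulting collections reject modification and null elements, and that there is no operation returning an updated copy, which is why they are immutable but not persistent.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative use of the Java 9 collection factory methods.
public final class FactoryMethods {
    // Returns true if the list rejects structural modification.
    // Note: on a modifiable list this really does add an element.
    static boolean isUnmodifiable(List<String> list) {
        try {
            list.add("Chewbacca");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> rebels = List.of("Leia", "Luke", "Han");
        Set<String> droids = Set.of("R2-D2", "C-3PO");       // iteration order is unspecified
        Map<String, Integer> episodes = Map.of("A New Hope", 4, "The Empire Strikes Back", 5);

        System.out.println(rebels);                          // [Leia, Luke, Han]
        System.out.println(droids.size());                   // 2
        System.out.println(episodes.get("A New Hope"));      // 4
        System.out.println(isUnmodifiable(rebels));          // true

        try {
            List.of("Vader", null);                          // null elements are rejected
        } catch (NullPointerException e) {
            System.out.println("nulls rejected");
        }
        // There is no "plus"/"with" method returning a new version:
        // these collections are immutable, but not persistent.
    }
}
```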
11. Mutable
● Popular friends include ArrayList, HashMap, TreeSet
● Memory-efficient modification operations
● State can be accidentally modified
● Can be thread-safe, but requires careful design
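To make "state can be accidentally modified" concrete, here is a sketch with a hypothetical Roster class (not from the talk). Returning the internal ArrayList lets any caller change the object's state; returning Collections.unmodifiableList blocks writes through that reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class that owns a mutable ArrayList.
public final class Roster {
    private final List<String> names = new ArrayList<>();

    public void add(String name) { names.add(name); }

    // Leaks the internal list: callers can mutate it behind the class's back.
    public List<String> namesLeaky() { return names; }

    // Read-only view: mutation attempts throw UnsupportedOperationException.
    public List<String> namesView() { return Collections.unmodifiableList(names); }

    public static void main(String[] args) {
        Roster roster = new Roster();
        roster.add("Leia");
        roster.namesLeaky().clear();              // accidental modification of internal state
        System.out.println(roster.namesView());   // []

        roster.add("Luke");
        try {
            roster.namesView().add("Vader");
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
        System.out.println(roster.namesView());   // [Luke]
    }
}
```

An unmodifiable view is still backed by the mutable list, so it reflects later changes made by the owner; it protects against callers, not against the class itself.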
14. Immutable & Non-persistent
● No updates
● Flexibility to convert the source into a more efficient representation
● No locking needed in a concurrent context
● Satisfies covariant subtyping requirements
● Can be copied with modifications to create a new version (can be
expensive)
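The last two bullets can be sketched with plain JDK calls. The names ImmutableCopy and with are hypothetical. "Updating" an immutable, non-persistent list means copying every element into a new list, which costs O(n) per update. Covariance is safe because no one can write through the result: Collections.unmodifiableList accepts a List<? extends T> and returns a List<T>.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper showing copy-on-update for immutable, non-persistent lists.
public final class ImmutableCopy {
    // Returns a new unmodifiable list with e appended; source is never touched.
    // Every element is copied, so each update costs O(n) time and memory.
    static <T> List<T> with(List<? extends T> source, T e) {
        List<T> copy = new ArrayList<>(source.size() + 1);
        copy.addAll(source);
        copy.add(e);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        List<Integer> v1 = List.of(1, 2, 3);

        // A read-only List<Integer> can safely be viewed as a List<Number>.
        List<Number> numbers = Collections.unmodifiableList(v1);

        List<Number> v2 = with(numbers, 4.5);
        System.out.println(v1);   // [1, 2, 3]  (unchanged)
        System.out.println(v2);   // [1, 2, 3, 4.5]
    }
}
```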
16. Immutable and Persistent
● Updating the source produces a new version of the collection; the original is left unchanged
● The resulting collection shares structure with the source to avoid full copying on updates
18. Persistent List (aka Cons)
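    // ConsList is not shown on the slide; this minimal interface is an assumption
    interface ConsList<T> {
        ConsList<T> add(T e);
    }

    public final class Cons<T> implements ConsList<T> {
        private final T head;            // first element
        private final ConsList<T> tail;  // rest of the list, shared and never copied

        public Cons(T head, ConsList<T> tail) {
            this.head = head;
            this.tail = tail;
        }

        // Prepends in O(1): one new node whose tail is the existing list
        @Override
        public ConsList<T> add(T e) {
            return new Cons<>(e, this);
        }
    }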
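A self-contained variant of the cons list above, with an empty list and a conversion to java.util.List so the structural sharing can be observed. PList and its methods are hypothetical names written for illustration, not part of any library. Two lists built from the same tail reuse it rather than copying it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical minimal persistent singly linked (cons) list.
public final class PList<T> {
    private static final PList<?> EMPTY = new PList<>(null, null);

    private final T head;
    private final PList<T> tail;   // null only in the shared EMPTY instance

    private PList(T head, PList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @SuppressWarnings("unchecked")
    public static <T> PList<T> empty() {
        return (PList<T>) EMPTY;
    }

    public boolean isEmpty() {
        return this == EMPTY;
    }

    // O(1): allocates one node and reuses the whole existing list as its tail
    public PList<T> prepend(T e) {
        return new PList<>(e, this);
    }

    public PList<T> tail() {
        if (isEmpty()) throw new NoSuchElementException("empty list");
        return tail;
    }

    public List<T> toList() {
        List<T> out = new ArrayList<>();
        for (PList<T> p = this; !p.isEmpty(); p = p.tail) out.add(p.head);
        return out;
    }

    public static void main(String[] args) {
        PList<String> bc = PList.<String>empty().prepend("C").prepend("B");
        PList<String> abc = bc.prepend("A");
        PList<String> xbc = bc.prepend("X");
        System.out.println(abc.toList());               // [A, B, C]
        System.out.println(xbc.toList());               // [X, B, C]
        System.out.println(bc.toList());                // [B, C]  (unchanged)
        System.out.println(abc.tail() == xbc.tail());   // true: both share bc
    }
}
```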
22. Concatenating Two Persistent Lists
- Poor locality due to pointer chasing
- Copying of nodes
[Figure: Before, two lists A → B → C and X → Y → Z. After, new copies of A, B, C are linked in front of the shared X → Y → Z]
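A sketch of the concatenation pictured above, using a hypothetical minimal Node type where null marks the empty list. Every node of the left list must be copied, because its last node has to point somewhere new; the right list is reused unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable cons cells, used to show the cost of concatenation.
public final class ConcatDemo {
    static final class Node<T> {
        final T head;
        final Node<T> tail;
        Node(T head, Node<T> tail) { this.head = head; this.tail = tail; }
    }

    // Copies every node of 'left' (O(length of left)); 'right' is shared as-is.
    // Recursive for clarity, so very long left lists could overflow the stack.
    static <T> Node<T> concat(Node<T> left, Node<T> right) {
        if (left == null) return right;
        return new Node<>(left.head, concat(left.tail, right));
    }

    @SafeVarargs
    static <T> Node<T> of(T... items) {
        Node<T> list = null;
        for (int i = items.length - 1; i >= 0; i--) list = new Node<>(items[i], list);
        return list;
    }

    static <T> List<T> toList(Node<T> n) {
        List<T> out = new ArrayList<>();
        for (; n != null; n = n.tail) out.add(n.head);
        return out;
    }

    public static void main(String[] args) {
        Node<String> abc = of("A", "B", "C");
        Node<String> xyz = of("X", "Y", "Z");
        Node<String> joined = concat(abc, xyz);
        System.out.println(toList(joined));                  // [A, B, C, X, Y, Z]
        System.out.println(toList(abc));                     // [A, B, C]  (unchanged)
        System.out.println(joined.tail.tail.tail == xyz);    // true: X, Y, Z are shared
        System.out.println(joined != abc);                   // true: A, B, C were copied
    }
}
```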
23. Persistent List
● Structural sharing: no need to copy full structure
● Poor locality due to pointer chasing
● Copying becomes more expensive with larger lists
● Poor random access, and therefore poor data decomposition (e.g. splitting the list for parallel processing)
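Poor random access follows directly from the structure: to reach element i you must follow i tail pointers. A sketch with a hypothetical ListIndexing class and Node type (null marks the empty list):

```java
// Hypothetical minimal immutable cons list, used only to show O(n) indexing.
public final class ListIndexing {
    static final class Node<T> {
        final T head;
        final Node<T> tail;
        Node(T head, Node<T> tail) { this.head = head; this.tail = tail; }
    }

    // Builds the list 0, 1, ..., n-1 by prepending from the end.
    static Node<Integer> range(int n) {
        Node<Integer> list = null;
        for (int i = n - 1; i >= 0; i--) list = new Node<>(i, list);
        return list;
    }

    // Reaching element i means following i tail pointers, each potentially
    // a cache miss: O(i), versus O(1) indexing into an array.
    static <T> T get(Node<T> list, int index) {
        if (index < 0) throw new IndexOutOfBoundsException(index);
        Node<T> n = list;
        for (int i = 0; i < index && n != null; i++) n = n.tail;
        if (n == null) throw new IndexOutOfBoundsException(index);
        return n.head;
    }

    public static void main(String[] args) {
        Node<Integer> list = range(1_000);
        System.out.println(get(list, 7));     // 7
        System.out.println(get(list, 999));   // 999, after 999 pointer hops
    }
}
```

Splitting such a list in half for parallel work has the same problem: finding the midpoint already costs n/2 hops.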
26. Persistent Array
How do we get the immutability benefits with the performance of mutable
variants?
27. Trie
[Figure: a trie with a root node, internal branch nodes and leaf nodes holding values]
1. Branch factor not limited to binary
2. Leaf nodes contain actual values
3. Picking the right branch is done by using parts of the key as a lookup
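A minimal sketch of a persistent array built as a trie, assuming a branching factor of 32 (5 bits of the index per level) in the spirit of Clojure's PersistentVector. The class PersistentArray is hypothetical and only supports get and set; appending and Clojure's tail optimisation are omitted. Each level uses the next 5 bits of the index to pick a branch, and set copies only the nodes on the path from the root to the leaf.

```java
import java.util.Objects;

// Hypothetical fixed-shape persistent array backed by a 32-way trie.
public final class PersistentArray {
    private static final int BITS = 5;
    private static final int WIDTH = 1 << BITS;  // 32 children per node
    private static final int MASK = WIDTH - 1;   // selects 5 bits of the index

    private final Object[] root;
    private final int shift;  // bits consumed above the leaves: BITS * (depth - 1)
    private final int size;

    private PersistentArray(Object[] root, int shift, int size) {
        this.root = root;
        this.shift = shift;
        this.size = size;
    }

    public static PersistentArray of(int... values) {
        int shift = 0;
        while (values.length > (1L << (shift + BITS))) shift += BITS;  // add levels until it fits
        Object[] root = new Object[WIDTH];
        for (int i = 0; i < values.length; i++) {
            Object[] node = root;
            for (int level = shift; level > 0; level -= BITS) {         // walk or create the path
                int slot = (i >>> level) & MASK;
                if (node[slot] == null) node[slot] = new Object[WIDTH];
                node = (Object[]) node[slot];
            }
            node[i & MASK] = values[i];  // leaves hold the actual values
        }
        return new PersistentArray(root, shift, values.length);
    }

    public int size() { return size; }

    // Parts of the key (5 bits per level) choose the branch at each level.
    public int get(int index) {
        Objects.checkIndex(index, size);
        Object[] node = root;
        for (int level = shift; level > 0; level -= BITS) {
            node = (Object[]) node[(index >>> level) & MASK];
        }
        return (Integer) node[index & MASK];
    }

    // Path copying: only the nodes from the root to the target leaf are cloned;
    // every other subtree is shared with the previous version.
    public PersistentArray set(int index, int value) {
        Objects.checkIndex(index, size);
        return new PersistentArray(setPath(root, shift, index, value), shift, size);
    }

    private static Object[] setPath(Object[] node, int level, int index, int value) {
        Object[] copy = node.clone();
        if (level == 0) {
            copy[index & MASK] = value;
        } else {
            int slot = (index >>> level) & MASK;
            copy[slot] = setPath((Object[]) node[slot], level - BITS, index, value);
        }
        return copy;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        PersistentArray v1 = PersistentArray.of(data);
        PersistentArray v2 = v1.set(100, -1);

        System.out.println(v1.get(100) + " " + v2.get(100)); // 100 -1
        System.out.println(v1.root[0] == v2.root[0]);        // true: untouched subtree is shared
        System.out.println(v1.root[3] == v2.root[3]);        // false: path to index 100 was copied
    }
}
```

Values are stored boxed in Object[] here for simplicity, so this sketch also pays the boxing overhead discussed later.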
29. Trade-offs
● Large branching factor: a shallower trie with better locality for lookups and iteration, but every update copies larger nodes
● Small branching factor: updates copy smaller nodes, but the deeper trie makes traversal slower
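The trade-off can be put into numbers with a rough cost model, which is an assumption for illustration: a trie with branching factor b over n elements has depth d = ceil(log_b n), a lookup follows d pointers, and a path-copying update clones d nodes of b slots each. BranchingTradeOff is a hypothetical name.

```java
// Rough, assumed cost model for trie-based persistent collections.
public final class BranchingTradeOff {
    // Depth d such that b^d >= n (with a minimum of one level).
    static int depth(long n, int b) {
        if (b < 2) throw new IllegalArgumentException("branching factor must be >= 2");
        int d = 1;
        long capacity = b;
        while (capacity < n) {   // add levels until b^d >= n
            capacity *= b;
            d++;
        }
        return d;
    }

    // Path copying clones one node of b slots per level.
    static long slotsCopiedPerUpdate(long n, int b) {
        return (long) depth(n, b) * b;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        for (int b : new int[] {2, 4, 8, 16, 32, 64}) {
            System.out.printf("b=%2d  lookup hops=%2d  slots copied per update=%d%n",
                    b, depth(n, b), slotsCopiedPerUpdate(n, b));
        }
    }
}
```

For n = 1,000,000 this gives 20 hops and 40 copied slots for b = 2, versus 4 hops and 128 copied slots for b = 32. The model ignores cache effects, which in practice favour the wider, shallower trie even more.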
30. Java Persistent Collections
- Not available as part of the Java core library
- Existing projects include
- PCollections: https://github.com/hrldcpr/pcollections
- Port of Clojure DS: https://github.com/krukow/clj-ds
- Port of Scala DS: https://github.com/andrewoma/dexx
- Now also in Javaslang: http://javaslang.io
31. Memory usage survey
10,000,000 elements, heap < 32GB (compressed references enabled)
int[] : 40MB
Integer[]: 160MB
ArrayList<Integer>: 215MB
PersistentVector<Integer>: 214MB (Clojure-DS)
Vector<Integer>: 206MB (Dexx, port of Scala-DS)
Data collected using Java Object Layout:
http://openjdk.java.net/projects/code-tools/jol/
32. Takeaways
● Immutable collections reduce the scope for bugs
● Always a compromise between programming safety and performance
● Performance of persistent data structures is improving
36. Primitive specialised collections
● Collections often hold boxed representations of primitive values
● Java 8 introduced IntStream, LongStream, DoubleStream and
primitive specialised functional interfaces
● Other libraries, e.g. Agrona, Koloboke and Eclipse Collections, provide
primitive specialised collections today.
● Project Valhalla is investigating primitive specialised generics
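A sketch of what a primitive specialised collection does, using a hypothetical IntList class (not the API of Agrona, Koloboke or Eclipse Collections). Values live directly in an int[], so no Integer objects are allocated, and Java 8's IntStream can consume them without boxing.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical growable list of primitive ints backed by int[].
public final class IntList {
    private int[] elements = new int[8];
    private int size;

    public void add(int value) {
        if (size == elements.length) elements = Arrays.copyOf(elements, size * 2);
        elements[size++] = value;   // stored as a raw int: no Integer allocated
    }

    public int get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException(index);
        return elements[index];
    }

    public int size() { return size; }

    // Exposes the contents as an IntStream, again without boxing
    public IntStream stream() { return Arrays.stream(elements, 0, size); }

    public static void main(String[] args) {
        IntList list = new IntList();
        IntStream.rangeClosed(1, 100).forEach(list::add);
        System.out.println(list.size());           // 100
        System.out.println(list.stream().sum());   // 5050
    }
}
```

The equivalent ArrayList<Integer> stores a reference per element plus a separate Integer object for most values, which is where the memory gap in the survey above comes from.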
37. Java 8 Lazy Collection Initialization
Many allocated HashMaps and ArrayLists are never written to, e.g. when used in the Null Object
pattern
Java 8 adds lazy initialization for the default-constructor case: the backing array is allocated only on the first write
Typically 1-2% reduction in memory consumption
http://www.javamagazine.mozaicreader.com/MarApr2016/Twitter#&pageSet=28&page=0
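A sketch of the lazy-initialization idea using a hypothetical LazyBuffer class; it illustrates the technique and is not the JDK's code. Every instance created with the default constructor shares one empty array, and real storage is allocated only when the first element is written, so collections that are never written to cost almost nothing beyond their header.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical buffer that defers allocating its backing array until the first write.
public final class LazyBuffer<T> {
    private static final Object[] EMPTY = {};
    private static final int DEFAULT_CAPACITY = 10;

    private Object[] elements = EMPTY;  // shared empty array: no per-instance allocation yet
    private int size;

    public void add(T e) {
        if (elements == EMPTY) {
            elements = new Object[DEFAULT_CAPACITY];                 // first write allocates
        } else if (size == elements.length) {
            elements = Arrays.copyOf(elements, size + (size >> 1));  // grow by 50%
        }
        elements[size++] = e;
    }

    @SuppressWarnings("unchecked")
    public T get(int index) {
        Objects.checkIndex(index, size);
        return (T) elements[index];
    }

    public int size() { return size; }

    public boolean isAllocated() { return elements != EMPTY; }

    public static void main(String[] args) {
        LazyBuffer<String> neverWritten = new LazyBuffer<>();
        LazyBuffer<String> written = new LazyBuffer<>();
        written.add("Yoda");
        System.out.println(neverWritten.isAllocated());  // false
        System.out.println(written.isAllocated());       // true
    }
}
```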
45. java.util.HashMap in Java 8
Each bucket starts as a linked list of colliding entries.
A bucket is converted to a tree once it holds more than 8 entries (and the table has at least 64 buckets)
Tree nodes use about twice the memory of list nodes
Makes lookups in heavily colliding buckets O(log N) rather than O(N)
The O(log N) bound relies on keys being Comparable
https://github.com/RichardWarburton/map-visualiser
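A sketch of the worst case this change targets, using a hypothetical CollidingKey class whose hashCode is constant, so every entry lands in the same bucket. Because the key implements Comparable, a Java 8+ HashMap can order the entries inside the treeified bucket instead of scanning them linearly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type with a deliberately terrible hash function.
public final class CollidingKey implements Comparable<CollidingKey> {
    private final int id;

    public CollidingKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }   // every key collides

    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    // Gives the tree bin an ordering among keys with identical hashes
    @Override public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);
    }

    static Map<CollidingKey, Integer> build(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) map.put(new CollidingKey(i), i);
        return map;
    }

    public static void main(String[] args) {
        Map<CollidingKey, Integer> map = build(100_000);
        System.out.println(map.size());                          // 100000
        System.out.println(map.get(new CollidingKey(99_999)));   // 99999
    }
}
```

If compareTo is removed, the map still works correctly, but lookups in the single huge bucket can degrade towards a linear search.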
50. Probing vs Chaining
Probing Maps usually have lower memory consumption
Small maps: probing never builds long clusters and can be up to 91% faster.
In large maps with high collision rates, probing scales poorly and can be
significantly slower.
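A minimal sketch of a linear-probing map from int keys to int values. IntIntProbingMap is a hypothetical class, and its simplifications are assumptions: key 0 is reserved as the empty marker, there is no removal, and the load factor is kept at or below 0.5. Entries live directly in two parallel arrays, so there is no per-entry node object as in a chained HashMap, which is where the memory saving comes from.

```java
// Hypothetical open-addressing (linear probing) int -> int map.
public final class IntIntProbingMap {
    private static final int EMPTY = 0;      // key 0 marks a free slot
    private int[] keys = new int[16];        // capacity is always a power of two
    private int[] values = new int[16];
    private int size;

    // Spreads high bits into the low bits that the mask keeps
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        return h ^ (h >>> 13);
    }

    public void put(int key, int value) {
        if (key == EMPTY) throw new IllegalArgumentException("key 0 is reserved");
        if ((size + 1) * 2 > keys.length) resize();   // keep load factor <= 0.5
        int mask = keys.length - 1;
        int i = mix(key) & mask;
        while (keys[i] != EMPTY && keys[i] != key) i = (i + 1) & mask;  // probe the next slot
        if (keys[i] == EMPTY) size++;
        keys[i] = key;
        values[i] = value;
    }

    public int getOrDefault(int key, int defaultValue) {
        if (key == EMPTY) return defaultValue;
        int mask = keys.length - 1;
        for (int i = mix(key) & mask; keys[i] != EMPTY; i = (i + 1) & mask) {
            if (keys[i] == key) return values[i];
        }
        return defaultValue;   // reached a free slot: the key is absent
    }

    public int size() { return size; }

    private void resize() {
        int[] oldKeys = keys, oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new int[oldValues.length * 2];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) put(oldKeys[j], oldValues[j]);
        }
    }

    public static void main(String[] args) {
        IntIntProbingMap map = new IntIntProbingMap();
        for (int k = 1; k <= 1000; k++) map.put(k, k * k);
        System.out.println(map.size());                   // 1000
        System.out.println(map.getOrDefault(30, -1));     // 900
        System.out.println(map.getOrDefault(5000, -1));   // -1
    }
}
```

With many colliding keys, the probe loops above walk long runs of occupied slots, which is why probing scales poorly in large maps with high collision rates.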
51. Takeaways
There’s no clear-cut “winner”.
JDK implementations try to minimise the worst case.
Linear probing requires a good hashCode() distribution, so hash maps often
“precondition” their hashes.
IdentityHashMap (a linear-probing map) has low memory consumption and is fast: use it when reference equality (==) is what you want!
Third-party libraries such as Koloboke and Eclipse Collections offer probing hash maps.
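Why preconditioning matters: with a power-of-two table, the bucket index keeps only the low bits of the hash, so keys that differ only in their high bits all collide. The spread function below uses the same XOR-shift that java.util.HashMap applies internally; HashPreconditioning and its methods are illustrative names.

```java
// Illustrative sketch of hash preconditioning before masking to a bucket index.
public final class HashPreconditioning {
    // Folds the high 16 bits into the low 16 bits
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // tableSize must be a power of two
    static int index(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int tableSize = 16;
        for (int i = 1; i <= 4; i++) {
            int h = i << 16;   // hashes that differ only in their high bits
            System.out.println(index(h, tableSize) + " vs " + index(spread(h), tableSize));
        }
        // Prints "0 vs 1", "0 vs 2", "0 vs 3", "0 vs 4": without spreading,
        // all four keys would land in bucket 0 and form one probe cluster.
    }
}
```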