3. SDS: Succinct Data Structure
• Recently Gaining Popularity in Some Areas
– Research & Engineering
• Not a Data Structure, but a Data Representation
– A compressed representation of other data structures
– e.g., alphabets, trees, and graphs
• Operations Work Transparently without Explicit Unpacking
– e.g., succinct LZ77 compression*1
*1 Kreft, S. and Navarro, G.: LZ77-Like Compression with Fast Random Access, In Proceedings of DCC, 2010
4. More Details
• SDS = Succinct Data + Succinct Index
• Succinct Data
– Compact representation of the target data
– Close to the information-theoretic lower bound
e.g., for N distinct patterns, the lower bound is log N bits
• Succinct Index
– O(1)-time operations on the target data
– o(N) extra space: asymptotically negligible
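Here is a minimal Python sketch of the lower-bound claim. It assumes a worst-case count over N distinct patterns. The hypothetical helper `lower_bound_bits` computes ⌈log₂ N⌉ exactly with integer arithmetic. For the 2ⁿ possible n-bit sequences it returns n, which is why a plain bit array B[] already counts as succinct data for arbitrary bit sequences.

```python
def lower_bound_bits(num_patterns: int) -> int:
    """ceil(log2 N): worst-case bits needed to tell N distinct patterns apart."""
    if num_patterns < 1:
        raise ValueError("need at least one pattern")
    # (N - 1).bit_length() equals ceil(log2 N) for N >= 1, using exact
    # integer arithmetic instead of floating-point log2.
    return (num_patterns - 1).bit_length()


if __name__ == "__main__":
    print(lower_bound_bits(1000))      # -> 10
    # All bit strings of length n form 2**n patterns, so the bound is n bits:
    # a plain n-bit array B[] already meets it for arbitrary bit sequences.
    print(lower_bound_bits(2 ** 64))   # -> 64
```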
5. More Details
If you need more information, ...
cited from: http://goo.gl/rkQ5z
7. Rank/Select Operations
• SDSs Are Built on Rank/Select Operations
– They call rank/select many times internally
• Rank/Select on Succinct Bit Sequences B[i]
– rank_x(n, B): the number of x's in B[0...n]
– select_x(n, B): the position of the n-th x in B[]
i     0 1 2 3 4 5 6 7 8
B[i]  1 0 1 1 0 0 1 1 0
rank1(5, B) = 3, select1(4, B) = 6
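To make the definitions concrete, here is a naive O(N) reference implementation in Python, checked against the example above. It is only a sketch, and the function names are hypothetical. Two conventions are inferred from the example values: rank treats B[0...n] as inclusive on both ends, and select counts n from 1. A succinct index exists to answer the same queries without these linear scans.

```python
from typing import Sequence


def rank(x: int, n: int, bits: Sequence[int]) -> int:
    """rank_x(n, B): number of x's in B[0..n], both ends inclusive."""
    return sum(1 for b in bits[: n + 1] if b == x)


def select(x: int, n: int, bits: Sequence[int]) -> int:
    """select_x(n, B): position of the n-th x (n counts from 1); -1 if absent."""
    seen = 0
    for pos, b in enumerate(bits):
        if b == x:
            seen += 1
            if seen == n:
                return pos
    return -1


B = [1, 0, 1, 1, 0, 0, 1, 1, 0]           # the example sequence from the slide
print(rank(1, 5, B), select(1, 4, B))     # -> 3 6
```

rank and select act as inverses: for every n up to the number of 1s, rank1(select1(n, B), B) = n.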
9. Performance Results
• Benchmark Setup*1
– Generate a random bit sequence with 50% density (half the bits are 1s)
– Issue random rank/select queries over the bits
– CPU: Intel Core i5 U470 @ 1.33GHz
• Observed Latency
– Median over 11 trials
*1 Reference: http://d.hatena.ne.jp/s-yata/20111216/1324032373
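The benchmark procedure (50% density, random query positions, median over 11 trials) can be sketched as a small Python harness. It illustrates the methodology only, and the names and query counts are assumptions. Interpreter overhead dominates in Python, so the harness cannot reproduce the latencies in the referenced measurements. The prefix-sum `rank1` is just a placeholder for the implementation under test.

```python
import random
import statistics
import time
from typing import Callable, List


def median_latency_ns(query: Callable[[int], int], num_bits: int,
                      num_queries: int = 10_000, trials: int = 11,
                      seed: int = 0) -> float:
    """Median over `trials` of the mean time per query (ns) at random positions."""
    rng = random.Random(seed)
    per_trial: List[float] = []
    for _ in range(trials):
        positions = [rng.randrange(num_bits) for _ in range(num_queries)]
        start = time.perf_counter_ns()
        for pos in positions:
            query(pos)
        per_trial.append((time.perf_counter_ns() - start) / num_queries)
    return statistics.median(per_trial)


if __name__ == "__main__":
    rng = random.Random(42)
    n_bits = 1 << 16
    bits = [rng.getrandbits(1) for _ in range(n_bits)]   # ~50% density

    # Placeholder rank1 under test: a plain prefix-sum table (O(1) time but
    # not succinct); swap in an indexed rank to compare implementations.
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + b)

    def rank1(pos: int) -> int:
        return prefix[pos + 1]        # number of 1s in bits[0..pos]

    print(f"median: {median_latency_ns(rank1, n_bits):.1f} ns/query")
```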
13. Implementation: Four Russians Method
• Rule: O(1) operation cost with o(N) extra space
B[] = a sequence of N bits
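15. Implementation: Four Russians Method
[Figure: B[] = a sequence of bits, split into blocks of log²N bits; L[] = l1 l2 …]
• Split B[] into fixed-length blocks of log²N bits
• Pre-compute in L[] the total count of 1s up to each block boundary
$$
\begin{aligned}
\mathrm{rank}_1(x, B) &= \sum_{i=1}^{x} B[i]
= \sum_{i=1}^{\lfloor x/\log^2 N \rfloor \log^2 N} B[i]
+ \sum_{i=\lfloor x/\log^2 N \rfloor \log^2 N + 1}^{x} B[i] \\
&= \underbrace{L[\lfloor x/\log^2 N \rfloor]}_{O(1)}
+ \underbrace{\sum_{i=\lfloor x/\log^2 N \rfloor \log^2 N + 1}^{x} B[i]}_{O(\log^2 N)}
\end{aligned}
$$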
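The one-level scheme above can be sketched in Python. It keeps the slide's 1-indexed convention: rank1(x) counts the 1s in B[1..x], which is the Python slice `bits[:x]`. The class name and the ⌊log₂ N⌋² block length are assumptions made for illustration. The final slice sum in `rank1` is the O(log²N) in-block scan.

```python
import math
from typing import List


class OneLevelRank:
    """rank1 with a single directory L[] (1-indexed semantics as on the slide).

    rank1(x) = number of 1s in B[1..x], i.e. in the Python slice bits[:x].
    L[k]     = number of 1s in the first k blocks, with L[0] = 0.
    """

    def __init__(self, bits: List[int]):
        self.bits = bits
        log_n = int(math.log2(max(len(bits), 2)))
        self.block = max(1, log_n * log_n)          # block length ~ log^2 N
        self.L = [0]
        for start in range(0, len(bits), self.block):
            self.L.append(self.L[-1] + sum(bits[start:start + self.block]))

    def rank1(self, x: int) -> int:
        k = x // self.block                         # floor(x / log^2 N)
        # L[k] is an O(1) lookup; the slice sum is the O(log^2 N) in-block scan.
        return self.L[k] + sum(self.bits[k * self.block:x])


if __name__ == "__main__":
    idx = OneLevelRank([1, 0, 1, 1, 0, 0, 1, 1, 0] * 20)
    print(idx.rank1(100))   # -> 56
```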
16. Implementation: Four Russians Method
• L[]: o(N) space
– N/log²N entries, each a count stored in log N bits
$$
\frac{N}{\log^2 N} \cdot \log N = O\!\left(\frac{N}{\log N}\right) = o(N)
$$
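18. Implementation: Four Russians Method
[Figure: B[] split into blocks of log²N bits (L[] = l1 l2 …), each split again into blocks of ½ log N bits (S[] = s1 s2 …)]
• Split each log²N block again into fixed-length blocks of ½ log N bits
• Pre-compute in S[] the count of 1s from the start of the enclosing log²N block up to each ½ log N block boundary
$$
\begin{aligned}
\mathrm{rank}_1(x, B) &= \sum_{i=1}^{x} B[i] \\
&= \sum_{i=1}^{\lfloor x/\log^2 N \rfloor \log^2 N} B[i]
+ \sum_{i=\lfloor x/\log^2 N \rfloor \log^2 N + 1}^{\lfloor x/(\frac{1}{2}\log N) \rfloor \frac{1}{2}\log N} B[i]
+ \sum_{i=\lfloor x/(\frac{1}{2}\log N) \rfloor \frac{1}{2}\log N + 1}^{x} B[i] \\
&= \underbrace{L[\lfloor x/\log^2 N \rfloor]}_{O(1)}
+ \underbrace{S[\lfloor x/(\tfrac{1}{2}\log N) \rfloor]}_{O(1)}
+ \underbrace{\sum_{i=\lfloor x/(\frac{1}{2}\log N) \rfloor \frac{1}{2}\log N + 1}^{x} B[i]}_{O(\log N)}
\end{aligned}
$$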
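Here is a Python sketch of the two-level directory, using the same 1-indexed convention. Several choices are assumptions for illustration:
- Block lengths are derived from ⌊log₂ N⌋.
- The large block length is rounded to a multiple of the small one, so small blocks never straddle a large-block boundary.
- S[] stores counts relative to the enclosing large block, which keeps each S[] entry within log(log²N) bits.

The remaining scan covers fewer than ½ log N bits.

```python
import math
from typing import List


class TwoLevelRank:
    """rank1(x) = number of 1s in B[1..x] (Python slice bits[:x]).

    L[k]: 1s before large block k (large blocks ~ log^2 N bits).
    S[j]: 1s from the start of the enclosing large block up to small block j
          (small blocks ~ (1/2) log N bits), so each entry stays small.
    """

    def __init__(self, bits: List[int]):
        self.bits = bits
        log_n = max(2, int(math.log2(max(len(bits), 4))))
        self.small = log_n // 2                                # ~ (1/2) log N
        # Round ~log^2 N to a multiple of `small` so the blocks nest exactly.
        self.large = self.small * max(1, (log_n * log_n) // self.small)
        self.L: List[int] = []
        self.S: List[int] = []
        running = 0                                  # 1s in bits[:pos]
        for j in range(len(bits) // self.small + 1):
            pos = j * self.small
            if pos % self.large == 0:
                self.L.append(running)               # a large block starts here
            self.S.append(running - self.L[-1])      # count relative to that block
            running += sum(bits[pos:pos + self.small])

    def rank1(self, x: int) -> int:
        k = x // self.large                          # floor(x / log^2 N)
        j = x // self.small                          # floor(x / ((1/2) log N))
        # Two O(1) lookups, then an O(log N) scan of fewer than `small` bits.
        return self.L[k] + self.S[j] + sum(self.bits[j * self.small:x])
```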
19. Implementation: Four Russians Method
• S[]: o(N) space
– N/(½ log N) entries, each a count within a log²N block stored in log(log²N) bits
$$
\frac{N}{\frac{1}{2}\log N} \cdot \log(\log^2 N) = \frac{4N \log\log N}{\log N} = O\!\left(\frac{N \log\log N}{\log N}\right) = o(N)
$$
20. Implementation: Four Russians Method
• O(1) Popcount/Table Lookup for the Last Term: O(log N) → O(1)
$$
\mathrm{rank}_1(x, B) = \underbrace{L[\lfloor x/\log^2 N \rfloor]}_{O(1)}
+ \underbrace{S[\lfloor x/(\tfrac{1}{2}\log N) \rfloor]}_{O(1)}
+ \underbrace{\sum_{i=\lfloor x/(\frac{1}{2}\log N) \rfloor \frac{1}{2}\log N + 1}^{x} B[i]}_{O(\log N)\ \to\ O(1)}
$$
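The table-lookup trick for the last term can be sketched as follows. B[] is stored in w-bit chunks, and a table indexed by (chunk value, number of leading bits) returns the partial count in one lookup. With w = ½ log₂ N the table has 2^w = √N rows, which is what keeps it affordable. The popcount variant computes the same value. On x86 this corresponds to the POPCNT instruction, emulated here with `bin(...).count("1")`. The function names and the MSB-first bit order inside a chunk are assumptions.

```python
from typing import List


def pack_chunks(bits: List[int], w: int) -> List[int]:
    """Pack B[] into w-bit integers; the first bit of each chunk is its MSB."""
    chunks = []
    for start in range(0, len(bits), w):
        piece = bits[start:start + w]
        piece = piece + [0] * (w - len(piece))          # zero-pad the last chunk
        value = 0
        for b in piece:
            value = (value << 1) | b
        chunks.append(value)
    return chunks


def build_prefix_table(w: int) -> List[List[int]]:
    """table[v][r] = number of 1s in the first r bits of the w-bit chunk v.

    2**w rows; with w = (1/2) log2 N that is sqrt(N) rows.
    """
    table = []
    for v in range(1 << w):
        row = [0]
        for r in range(1, w + 1):
            row.append(row[-1] + ((v >> (w - r)) & 1))  # r-th bit from the left
        table.append(row)
    return table


def last_term_by_table(table: List[List[int]], chunks: List[int], x: int, w: int) -> int:
    """Last term of rank1(x): 1s in the partial chunk, via one table lookup."""
    j, r = divmod(x, w)
    return table[chunks[j]][r] if r else 0


def last_term_by_popcount(chunks: List[int], x: int, w: int) -> int:
    """Same value: keep the top r bits of chunk j, then count the 1s."""
    j, r = divmod(x, w)
    return bin(chunks[j] >> (w - r)).count("1") if r else 0
```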
21. Implementation: Four Russians Method
• As a result, o(N) Space Costs
$$
\underbrace{\frac{N}{\log N}}_{L[]\ \text{size}} + \underbrace{\frac{4N \log\log N}{\log N}}_{S[]\ \text{size}} = O\!\left(\frac{N \log\log N}{\log N}\right) = o(N)
$$
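The asymptotic sizes above can be evaluated numerically. This Python sketch applies the slide formulas literally (base-2 logs, constants as shown, rounding ignored) and reports extra bits per bit of B[]. The function name is hypothetical. The bound converges slowly: at N = 2^32 the S[] term alone is 4·5/32 = 0.625 extra bits per bit.

```python
import math
from typing import Tuple


def overhead_per_bit(log_n: float) -> Tuple[float, float]:
    """Extra directory bits per bit of B[] for N = 2**log_n (slide formulas).

    L[]: N / log^2 N entries * log N bits              -> 1 / log N per bit
    S[]: N / ((1/2) log N) entries * log(log^2 N) bits -> 4 log log N / log N per bit
    """
    l_bits = (1 / log_n**2) * log_n
    s_bits = (1 / (0.5 * log_n)) * math.log2(log_n**2)
    return l_bits, s_bits


if __name__ == "__main__":
    for log_n in (16, 32, 64, 1024, 2**20):
        l_bits, s_bits = overhead_per_bit(log_n)
        print(f"N = 2^{log_n}: L[] {l_bits:.6f}, S[] {s_bits:.6f}, "
              f"total {l_bits + s_bits:.6f} extra bits per bit")
```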
24. Implementation: Practice
• Low Computation Costs, but High Cache Penalties
– 3 cache/TLB misses per rank: one each in L[], S[], and B[]
ex. rank1(402 = 256*1 + 32*4 + 18, B)
[Figure: B[] divided into 256-bit blocks, each divided into 32-bit blocks]
B[]: 01..000000....101......0 0110....001...............0 0000100 ...   Miss! (popcount the leftover bits)
L[]: 18 21 …   Miss!
S[]: 1 3 4 6 7 9 10 13 2 5 7 9 12 13 18 19 1 3 7 …   Miss!
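The three-array layout of the example can be sketched in Python with the example's block sizes (256-bit blocks, 32-bit words). Here rank1(x) counts the 1s among the first x bits. So 402 = 256·1 + 32·4 + 18 selects L[1], the S[] entry for word 4 of that block (global word 12), and 18 leftover bits to popcount. Two details are assumptions: the LSB-first bit order, and an S[] that holds counts relative to the enclosing 256-bit block (at most 224, so one byte suffices). Each query touches L[], S[], and the packed words, three separate arrays, which is where the three cache/TLB misses come from.

```python
from typing import List

LARGE, SMALL = 256, 32          # block sizes from the rank1(402) example


class ThreeArrayRank:
    """rank1(x) = number of 1s among the first x bits of B[].

    Three separate arrays: words (B[] packed into 32-bit words), L[] (1s
    before each 256-bit block), and S[] (1s before each 32-bit word,
    relative to its 256-bit block). Every rank reads all three.
    """

    def __init__(self, bits: List[int]):
        self.words: List[int] = []
        for start in range(0, len(bits), SMALL):
            word = 0
            for i, b in enumerate(bits[start:start + SMALL]):
                word |= b << i                      # LSB-first within a word
            self.words.append(word)
        self.L: List[int] = []
        self.S: List[int] = []
        running = 0
        for j in range(len(self.words) + 1):        # +1: entry for x == len(B)
            if j % (LARGE // SMALL) == 0:
                self.L.append(running)
            self.S.append(running - self.L[-1])
            if j < len(self.words):
                running += bin(self.words[j]).count("1")

    def rank1(self, x: int) -> int:
        k, j, r = x // LARGE, x // SMALL, x % SMALL
        tail = self.words[j] & ((1 << r) - 1) if r else 0    # access to B[]
        return self.L[k] + self.S[j] + bin(tail).count("1")   # L[] and S[] accesses


if __name__ == "__main__":
    print(divmod(402, LARGE), divmod(402 % LARGE, SMALL))  # -> (1, 146) (4, 18)
```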
25. Implementation: Practice
• Pack the data needed for one rank into a single cache line
[Figure: 64B cache line layout; labels: 56B chunk, 4B, 1B ・・・, 32B of bits (0110....001..........0), 12B padding, padding]
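One way to realize the packing, sketched in Python: each 256-bit block becomes one 64-byte record, so a rank reads a single record. Each record holds:
- a 4-byte absolute count,
- eight 1-byte counts relative to the block,
- the 32 bytes of bits,
- padding.

This layout, including its 20 bytes of padding, is a hypothetical choice and does not reproduce the exact byte counts in the figure. Python also cannot guarantee 64-byte alignment; a C or C++ implementation would add it (for example with `alignas(64)`).

```python
import struct
from typing import List

BLOCK_BITS, WORD_BITS = 256, 32
# Hypothetical record: 4B absolute count | 8 x 1B relative counts | 32B bits | 20B pad.
RECORD = struct.Struct("<I8B32s20x")
assert RECORD.size == 64          # exactly one 64-byte cache line per block


def build_records(bits: List[int]) -> bytes:
    """One 64-byte record per 256-bit block, plus one for x == len(bits)."""
    out = bytearray()
    running = 0                                          # 1s before this block
    for blk in range(len(bits) // BLOCK_BITS + 1):
        chunk = bits[blk * BLOCK_BITS:(blk + 1) * BLOCK_BITS]
        chunk = chunk + [0] * (BLOCK_BITS - len(chunk))  # zero-pad the tail
        raw = bytearray(BLOCK_BITS // 8)
        for i, b in enumerate(chunk):
            raw[i // 8] |= b << (i % 8)                  # LSB-first bit order
        rel, inside = [], 0
        for w in range(BLOCK_BITS // WORD_BITS):
            rel.append(inside)                           # 1s before word w (<= 224)
            inside += sum(chunk[w * WORD_BITS:(w + 1) * WORD_BITS])
        out += RECORD.pack(running, *rel, bytes(raw))
        running += inside
    return bytes(out)


def rank1(records: bytes, x: int) -> int:
    """1s among the first x bits, reading only the record of block x // 256."""
    blk, off = divmod(x, BLOCK_BITS)
    count, *rel, raw = RECORD.unpack_from(records, blk * RECORD.size)
    w, r = divmod(off, WORD_BITS)
    word = int.from_bytes(raw[w * 4:(w + 1) * 4], "little")
    return count + rel[w] + bin(word & ((1 << r) - 1)).count("1")
```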
27. Implementation: Practice
• BTW, what about select?
– Omitted due to my time limit
– Please see the code ...
• Two Ways to Implement It
– O(log N) complexity
• ux-trie, rx, and marisa-trie
• Binary searches with rank
• Suffers many cache/TLB misses
– O(1) complexity
• My implementation, designed to minimize these penalties
• 1 rank, 1 SIMD comparison, and an O(1) bsf (bit scan forward)
• Only 2 cache/TLB misses
• Not implemented yet ...
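The O(log N) approach (binary search with rank) can be sketched generically in Python. It is not taken from ux-trie, rx, or marisa-trie. It only assumes a `rank1(x)` that returns the number of 1s among the first x bits, and it returns 0-indexed positions to match the select1(4, B) = 6 example. Each probe is a full rank query, which is why this approach suffers many cache/TLB misses.

```python
from typing import Callable


def select1_by_binary_search(rank1: Callable[[int], int], num_bits: int, n: int) -> int:
    """0-indexed position of the n-th 1 (n >= 1), or -1 if there are fewer 1s.

    Finds the smallest x with rank1(x) >= n; the n-th 1 is then at x - 1.
    Each probe is a rank query, so this costs O(log N) ranks.
    """
    if n < 1 or rank1(num_bits) < n:
        return -1
    lo, hi = 1, num_bits              # invariant: lo <= answer <= hi
    while lo < hi:
        mid = (lo + hi) // 2
        if rank1(mid) >= n:
            hi = mid
        else:
            lo = mid + 1
    return lo - 1


if __name__ == "__main__":
    B = [1, 0, 1, 1, 0, 0, 1, 1, 0]         # example from the rank/select slide
    prefix = [0]
    for b in B:
        prefix.append(prefix[-1] + b)
    print(select1_by_binary_search(lambda x: prefix[x], len(B), 4))  # -> 6
```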