Big Data & NoSQL - EFS'11 (Pavlo Baron)
2. Pavlo Baron http://www.pbit.org [email_address] @pavlobaron
7. … and somewhere else a data center gets flooded with data (PB)
8. Big Data describes datasets that grow so large that they become awkward to work with using on-hand database management tools (Wikipedia)
9. NoSQL is not about … <140’000 things NoSQL is not about>… NoSQL is about choice (Jan Lehnardt, CouchDB)
18. Do you look at your data once a month to create a management report?
20. Do you get flooded by tera-/petabytes of data?
22. Does your data flow on streams at a very high rate from different locations?
24. Do you need to distribute your data over the whole world?
25. Or does your existence depend on (the quality of) your data?
26. Look back and turn back. Look at yourself
27. Is it the storage that you need to focus on?
29. Or do you have your customers spread all over the world?
30. Or do you have complex statistical analysis to do?
31. Or do you have to filter data as it comes?
32. Or is it necessary to visualize the data?
35. Chop in bite-size, manageable pieces
45. Build upon consensus, agreement, voting, quorum
62. Design for theoretically unlimited amount of data
66. Why can we never be sure till we die. Or have killed for an answer
67. CAP – Consistency, Availability, Partition tolerance
68. CAP – the variations: CA – irrelevant; CP – eventually unavailable, offering maximum consistency; AP – eventually inconsistent, offering maximum availability
70. CP [diagram labels: Replica 1, Replica 2; v1, v2; read, write]
71. CP (partition) [diagram labels: Replica 1, Replica 2; v1, v2; read, write]
72. AP [diagram labels: Replica 1, Replica 2; v1, v2; read, write, replicate]
73. AP (partition) [diagram labels: Replica 1, Replica 2; v1, v2; read, write, hint handoff]
75. BASE: Basically Available, Soft-state, Eventually consistent. Opposite to ACID
77. Read your write consistency [diagram labels: write v2, read v2, FE1; Data store: v1, v2, v3; write v1, read v1, FE2]
78. Session consistency [diagram labels: Session 1, Session 2; write v2, read v2, FE; Data store: v1, v2, v3; write v1, read v1]
80. Monotonic read consistency [diagram labels: read v2, read v2, FE1; Data store: v1, v2, v3, v4; read v3, read v4, FE2; read v3]
82. Monotonic write consistency [diagram labels: write v1, write v4, FE1; Data store: v1, v2, v3, v4; write v2, write v3, FE2]
86. Vertical sharding [diagram labels: Node 1, Node 2; users, products, contracts, items, orders, addresses, invoices; “read contract” user=foo]
87. Range based sharding [diagram labels: Node 1, Node 2; users id(1-N), products, addresses zip(1234-2345); users id(1-M), addresses zip(2346-9999); read, write]
88. Hash based sharding: start with 3 nodes, node hash N = # mod 3; add 2 nodes, N = # mod 5; kill 2 nodes, N = # mod 3
94. The ring: X-bit integer space, 0 <= N <= 2^X; or: 2 x Pi, 0 <= A <= 2 x Pi, x(N) = cos(A), y(N) = sin(A)
99. Clustering: 12 partitions (constant); 3 nodes, 4 vnodes each; add node: 4 nodes, 3 vnodes each. Alternatives: 3 nodes, 2 x 5 + 1 x 2 vnodes; container based
100. Quorum: V: vnodes holding a key; W: write quorum; R: read quorum; DW: durable write quorum; W > 0.5 * V; R + W > V
101. Insert key (sloppy quorum) [diagram labels: Key = “foo”, # = N, W = 2; N; replicate; ok]
103. Lookup key (sloppy quorum) [diagram labels: Key = “foo”, # = N, R = 2; N; Value = “bar”]
106. Clocks: V(i), V(j): competing. Conflict resolution: 1: siblings, client; 2: merge, system; 3: voting, system
107. Timestamps [diagram labels: Node 1, Node 2, Node 3; 10:00, 10:11, 10:20, 10:20, 10:01, 9:59, 10:09, 10:10, 10:18, 10:19]
108. Logical clocks [diagram labels: Node 1, Node 2, Node 3; 1, 3, 5, 6, 2, 2, 4, 5, 4, 7, 7, 7, 6, 6, ?, ?]
109. Vector clocks [diagram labels: Node 1, Node 2, Node 3; 1,0,0; 1,2,0; 3,2,0; 1,3,3; 1,1,0; 1,0,1; 1,2,2; 1,2,3; 2,2,0; 4,3,3; 4,4,3; 4,3,4]
110. Vector clocks [diagram labels: Node 1, Node 2, Node 3, Node 4; 1,1,0,0; 1,0,1,0; 1,0,0,1; 1,3,0,3; 1,2,0,2; 1,2,0,3; 1,0,0,0; 1,2,0,0; 1,0,2,0]
111. Merkle Trees: N, M: nodes; HT(N), HT(M): hash trees; M needs update: obtain HT(N), calc delta(HT(M), HT(N)), pull keys(delta)
112. Merkle Trees [diagram labels: Node a.1: a, ab, ac, abc, abd, acb, acc; Node a.2: a, ab, ad, abe, abd, ada, adb]
113. Merkle Trees [diagram labels: Node a.1: a, ab, abc, abd; Node a.2: a, ab, ad, abd, ada, adb]
114. Sudden call shouldn't take away the startled memory
117. Eager replication – 3PC [diagram labels: Coordinator, Cohort 1, Cohort 2; can commit?, yes, pre commit, ACK, commit, ok]
118. Eager replication – 3PC (failure) [diagram labels: Coordinator, Cohort 1, Cohort 2; can commit?, yes, pre commit, ACK, abort, ok]
119. Eager replication – Paxos Commit: 2F + 1 acceptors overall, F + 1 correct ones to achieve consensus; Stability, Consistency, Non-Triviality, Non-Blocking
120. Eager replication – Paxos Commit [diagram labels: initial leader, RM1, other RMs, Acceptors; begin commit, prepare, 2a prepared, 2b prepared, commit, commit]
121. Eager replication – Paxos Commit (failure) [diagram labels: initial leader, RM 1, other RMs, Acceptors; begin commit, prepare, 2a prepared, timeout, no decision, prepare, 2a prepared, timeout, no decision, abort]
122. Lazy replication – Master/slave [diagram labels: Master node, Slave node(s); users, products, addresses; read, write, read]
123. Lazy replication – Master/master [diagram labels: Master node(s), Master node(s); users id(1-N), users id(1-M), items id(1-K), items id(1-L); read, write, read, write]
124. Gossip – RM [diagram labels: RM1: Clock table, Replica clock, Update log, Value clock, Value, Executed operation table; write, stable updates; RM2; gossip]
125. Gossip – node down/up [diagram labels: Node 1, Node 2, Node 3, Node 4; update; update; update, 4 down; read; read, 4 up; update]
126. Hinted handoff: N: node, G: group including N. node(N) is unavailable: replicate to G or store data(N) locally, hint handoff for later. node(N) is alive: handoff data to node(N)
127. Direct replica fails [diagram labels: Key = “foo”; N; replicate; Key = “foo”, # = N -> handoff hint = true]
129. All replicas fail [diagram labels: N; Key = “foo”, # = N -> handoff hint = true]
131. I’m a speed king, see me fly
133. MapReduce: model: functional map/fold; out-database MR: irrelevant; in-database MR: data locality, no splitting needed, distributed querying, distributed processing
134. In-database MapReduce [diagram labels: query = "Alice"; Node X: map, reduce; Node A: N = "Alice", map; Node B: N = "Alice", map; Node C: N = "Alice"; hit list]
137. Write through [diagram labels: data store: products, users; cache; read, write, write through, read, read miss]
138. Write back / snapshotting [diagram labels: data store: products, users; cache; read, write, write back, read miss]
140. Physical storage: row based: irrelevant; column based: many rows, few columns; value based: ad-hoc querying
141. Column based storage: table (ID, Name, City) with rows (1, Peter, London) and (2, Anna, Paris); data store holds: 1, 2 / Peter, Anna / London, Paris
142. Value based storage: table (ID, Name, City) with rows (1, Peter, London) and (2, Anna, Paris); data store holds: 1:1, 3:Peter, 5:London, 2:2, 4:Anna, 6:Paris, 7:[1, 3, 5], 8:[2, 4, 6]
145. I created many of the graphics myself, though I should have asked @mononcqc for help, ’cause his drawings are awesome. Some images originate from istockphoto.com, except a few taken from Wikipedia and product pages
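Slides 88 and 94 contrast hash based sharding (N = # mod node count) with placing nodes on a ring. The sketch below is one possible way to see the difference, not code from the talk: it counts how many keys change owner when a cluster grows from 3 to 5 nodes under each scheme. The MD5-based hash, the 32-bit ring size, the 64 virtual nodes per node, and all names (`shard_mod`, `Ring`, `moved_fraction`) are assumptions made for illustration.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 32-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


def shard_mod(key: str, node_count: int) -> int:
    """Slide 88 style placement: node index = hash mod number of nodes."""
    return h(key) % node_count


class Ring:
    """Hash ring over a 32-bit integer space with virtual nodes (assumed layout)."""

    def __init__(self, nodes, vnodes=64):
        # Each node contributes `vnodes` points on the ring.
        self._points = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def lookup(self, key: str) -> str:
        # Owner is the first point clockwise from the key, wrapping around.
        i = bisect.bisect_right(self._hashes, h(key)) % len(self._points)
        return self._points[i][1]


def moved_fraction(keys, before, after) -> float:
    """Share of keys whose owner differs between two placement functions."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"key-{i}" for i in range(2000)]
mod_moved = moved_fraction(
    keys, lambda k: shard_mod(k, 3), lambda k: shard_mod(k, 5)
)
small = Ring(["n1", "n2", "n3"])
big = Ring(["n1", "n2", "n3", "n4", "n5"])
ring_moved = moved_fraction(keys, small.lookup, big.lookup)
print(f"mod N: {mod_moved:.0%} of keys moved, ring: {ring_moved:.0%} of keys moved")
```

With mod N placement, a key keeps its owner only when its hash gives the same remainder for both node counts, so most keys move. On the ring, a key moves only if one of the new nodes' points lands between the key and its previous owner.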
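The two quorum conditions on slide 100 (W > 0.5 * V and R + W > V) can be checked mechanically. The following is a minimal sketch under the slide's definitions; the function names are hypothetical and the durable write quorum DW is left out because the slide gives no condition for it.

```python
def quorum_ok(v: int, w: int, r: int) -> bool:
    """Check both slide-100 conditions for V vnodes, write quorum W, read quorum R."""
    return w > 0.5 * v and r + w > v


def valid_quorums(v: int):
    """List every (W, R) pair with 1 <= W, R <= V that satisfies both conditions."""
    return [
        (w, r)
        for w in range(1, v + 1)
        for r in range(1, v + 1)
        if quorum_ok(v, w, r)
    ]


print(quorum_ok(3, 2, 2))  # W=2, R=2 with V=3
print(quorum_ok(3, 2, 1))  # R too small: a read may miss the latest write
print(valid_quorums(3))
```

R + W > V means every read set overlaps every write set in at least one vnode, and W > 0.5 * V means two writes cannot both reach a write quorum on disjoint sets of vnodes.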
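Slides 106 and 109 deal with competing versions and vector clocks. As a rough illustration, the sketch below compares two clocks entry by entry and reports whether one precedes the other or whether they are concurrent (the "competing" case that needs conflict resolution). The example values are a few of the clocks printed on slide 109; the function names and the choice of an element-wise maximum as the merge are assumptions, not taken from the slides.

```python
from typing import Tuple

Clock = Tuple[int, ...]


def increment(clock: Clock, node_index: int) -> Clock:
    """Return a copy of the clock with the given node's counter advanced by one."""
    return tuple(c + 1 if i == node_index else c for i, c in enumerate(clock))


def merge(a: Clock, b: Clock) -> Clock:
    """Element-wise maximum: the smallest clock that has seen both a and b."""
    return tuple(max(x, y) for x, y in zip(a, b))


def compare(a: Clock, b: Clock) -> str:
    """Classify a relative to b: 'equal', 'before', 'after' or 'concurrent'."""
    a_le_b = all(x <= y for x, y in zip(a, b))
    b_le_a = all(y <= x for x, y in zip(a, b))
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


print(compare((1, 0, 0), (1, 2, 0)))  # one clock dominates the other
print(compare((1, 1, 0), (1, 0, 1)))  # neither dominates: competing versions
print(merge((1, 1, 0), (1, 0, 1)))
```

Plain timestamps or a single logical counter (slides 107 and 108) cannot distinguish the concurrent case, which is why each node keeps one counter per participant.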
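The procedure on slide 111 (obtain HT(N), calc delta(HT(M), HT(N)), pull keys(delta)) can be sketched with a small binary hash tree. This is only an illustration under several assumptions: keys are grouped into a fixed number of hash buckets (8 here), SHA-256 is the hash, node M only pulls from node N, and deletions are not handled. All names are hypothetical.

```python
import hashlib

BUCKETS = 8  # number of leaves; assumed to be a power of two


def digest(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()


def bucket_of(key: str) -> int:
    """Assign a key to one of the leaf buckets by its hash."""
    return int(digest(key), 16) % BUCKETS


def build_tree(store: dict) -> list:
    """Return the tree as a list of levels: levels[0] = leaves, levels[-1] = [root]."""
    leaves = []
    for b in range(BUCKETS):
        items = sorted((k, v) for k, v in store.items() if bucket_of(k) == b)
        leaves.append(digest(repr(items)))
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(
            [digest(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)]
        )
    return levels


def delta(mine: list, theirs: list) -> list:
    """Leaf indices that differ, found by descending only into unequal subtrees."""

    def walk(level: int, index: int) -> list:
        if mine[level][index] == theirs[level][index]:
            return []  # identical subtree: nothing below needs checking
        if level == 0:
            return [index]
        return walk(level - 1, 2 * index) + walk(level - 1, 2 * index + 1)

    return walk(len(mine) - 1, 0)


def sync(m: dict, n: dict) -> None:
    """M pulls every key of N that falls into a differing bucket."""
    stale = set(delta(build_tree(m), build_tree(n)))
    for key, value in n.items():
        if bucket_of(key) in stale:
            m[key] = value


n = {"a": 1, "b": 2, "c": 3}
m = {"a": 1, "b": 0}
sync(m, n)
print(m == n)
```

The point of the tree is that equal subtree hashes let both nodes skip whole key ranges, so only the differing buckets are transferred.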
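The hinted handoff steps on slide 126 (node unavailable: store the data elsewhere with a hint; node alive again: hand the data off) might look roughly like this in code. It is a toy, in-memory sketch: the `Node` class, the `write` and `handoff` functions, and the single fallback node standing in for group G are all assumptions made for illustration.

```python
class Node:
    """Toy in-memory node; `hints` holds writes kept on behalf of other nodes."""

    def __init__(self, name: str):
        self.name = name
        self.alive = True
        self.data = {}
        self.hints = []  # list of (target_name, key, value)


def write(key, value, target: Node, fallback: Node) -> None:
    """Write to the target, or keep a hinted copy on the fallback if it is down."""
    if target.alive:
        target.data[key] = value
    else:
        fallback.hints.append((target.name, key, value))


def handoff(fallback: Node, target: Node) -> None:
    """Once the target is alive again, replay its hinted writes and drop them."""
    if not target.alive:
        return
    remaining = []
    for name, key, value in fallback.hints:
        if name == target.name:
            target.data[key] = value
        else:
            remaining.append((name, key, value))
    fallback.hints = remaining


n = Node("N")
g = Node("G1")
n.alive = False
write("foo", "bar", target=n, fallback=g)  # N is down: hint stored on G1
n.alive = True
handoff(g, n)                              # N is back: data handed off
print(n.data, g.hints)
```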