Empowering Big Data
with Cassandra
./me
> Renato Carelli
- DevOps + Infra @ Big Data
- Hardening Enthusiast
- Cloud evangelist
- Bitcoin speculator
./intro
./intro/CAP
Consistency
Availability
Partition tolerance
CA CP
AP
N/A
./intro/RDBMS
Data
./intro/NoSQL
Unstructured
(not really a DB)
Key Value Column Graph Document
General file storage
Text files
Log files
Com...
./intro/BigData
./intro/specs
./intro/history
BigTable (2006) Dynamo (2007)
Open Source (2008)
{data modeling} {design}
./intro/version_history
(0.1) (0.3) (0.6) (0.7)(0.8)(1.0) (1.1) (1.2) (2.0) (2.1)
./infra
./infra/features
N1
N2N4
N3
> Masterless
> Distributed
> Decentralized [p2p]
> Elastically Scalable
> Highly Available
> F...
./infra/benchmark
Nodes
Ops/sec
./infra/benchmark
./infra/references
N1 C* Node
Connection
Failed
Connection
Established
Updated Data
Outdated Data
ACK
Slow Connection
Esta...
./infra/token
Murmur3Partitioner:
-2^63 to +2^63 -1
token(‘Globant’) = -6148914691517517206
./infra/token
DemoPartitioner:
1 to 100
token(‘Globant’) = 68
./infra/token_ring
Node 1 Node 2
Node 3 Node 4
./infra/token_ring
Node 1 Node 2
Node 3 Node 4
1 - 25
26 - 50
51 - 75 76 - 100
‘Glob’ = 17
‘ant’ = 94
‘Globant’ = 68
~/Ima...
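The ring on this slide gives each node one contiguous token range. A minimal Python sketch of that lookup follows, assuming the slide's illustrative DemoPartitioner range (1 to 100). The token values are copied from the slide rather than computed by a hash function, and `RING` and `owner_of` are hypothetical names.

```python
from bisect import bisect_left

# Upper bound of each node's token range, as drawn on the slide:
# Node 1: 1-25, Node 2: 26-50, Node 3: 51-75, Node 4: 76-100.
RING = [(25, "Node 1"), (50, "Node 2"), (75, "Node 3"), (100, "Node 4")]
UPPER_BOUNDS = [upper for upper, _ in RING]


def owner_of(token: int) -> str:
    """Return the node whose range contains the token (hypothetical helper)."""
    if not 1 <= token <= 100:
        raise ValueError("token outside the demo range 1-100")
    # bisect_left finds the first range whose upper bound is >= token.
    return RING[bisect_left(UPPER_BOUNDS, token)][1]


# Token values are taken from the slide, not produced by a real partitioner.
for key, token in [("Glob", 17), ("ant", 94), ("Globant", 68)]:
    print(f"{key!r} -> token {token} -> {owner_of(token)}")
```

With a real partitioner such as Murmur3Partitioner the idea is the same, only the token space is far larger.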
./infra/token_ring/replication
Node 1 Node 2
Node 3 Node 4
1 - 25 26 - 50
51 - 75 76 - 100
‘Glob’ = 17
RF = 3
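A rough sketch of how RF = 3 could place the copies of token 17 on this four-node demo ring. It assumes the extra replicas go to the next nodes clockwise, which is how SimpleStrategy behaves; the slide itself does not name the strategy, and `replicas_for` is a hypothetical helper.

```python
from typing import List

NODES = ["Node 1", "Node 2", "Node 3", "Node 4"]
RANGE_SIZE = 25  # each node owns 25 tokens of the 1-100 demo ring


def replicas_for(token: int, rf: int) -> List[str]:
    """Return the nodes that hold a copy of the partition with this token."""
    if not 1 <= token <= 100:
        raise ValueError("token outside the demo range 1-100")
    if not 1 <= rf <= len(NODES):
        raise ValueError("rf must be between 1 and the number of nodes")
    primary = (token - 1) // RANGE_SIZE  # index of the node owning the range
    # Assumption: remaining replicas are the next nodes clockwise on the ring.
    return [NODES[(primary + i) % len(NODES)] for i in range(rf)]


print(replicas_for(17, rf=3))  # token of 'Glob' on the slide
```

Under these assumptions the call returns Node 1, Node 2 and Node 3, and a token near the end of the ring wraps around to Node 1.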
./infra/token_ring/vnodes
What about virtual nodes?
C* 1.2
./infra/coordinator
N1
N2N4
N3
./infra/coordinator
N1
N2N4
N3
> read
RF = 3
CL = TWO
./infra/coordinator
N1
N2N4
N3
Coordinator
> read
RF = 3
CL = TWO
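The coordinator slides can be read as: the node that receives the request forwards the read to the replicas and answers the client once CL of them have replied. A small Python sketch of that rule follows. Which nodes act as coordinator and replicas is an assumption (the slide text does not say), and `read_succeeds` is a hypothetical helper.

```python
from typing import Dict, Optional

# Replies needed for the numeric consistency levels.
REQUIRED = {"ONE": 1, "TWO": 2, "THREE": 3}


def read_succeeds(responses: Dict[str, Optional[str]], cl: str) -> bool:
    """responses maps replica name -> value, or None if it did not reply in time."""
    answered = [node for node, value in responses.items() if value is not None]
    return len(answered) >= REQUIRED[cl]


# Assumed layout: N1 coordinates, and N2, N3, N4 hold the three replicas (RF = 3).
print(read_succeeds({"N2": "v1", "N3": None, "N4": "v1"}, "TWO"))  # two replies
print(read_succeeds({"N2": "v1", "N3": None, "N4": None}, "TWO"))  # one reply
```

With RF = 3 and CL = TWO the read tolerates one slow or failed replica, but not two.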
./infra/replication
How many copies of each piece of data
(partition) do we want in the system?
./infra/replication
> Replication Factor
> Replication Strategy
Keyspace-based!
./infra/replication
N1
N2N4
N3
RF = 3
./infra/replication
CREATE KEYSPACE Globant
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
./infra/replication
N1
N2N3
N4
R1
R2
Data Center - East
N1
N2N3
N4
R1
R2
Data Center - West
RF = {‘w’:3, ‘e’:2}
./infra/replication
CREATE KEYSPACE Globant
WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'w' : 3, 'e' : 2};
./infra/consistency_level
How many replicas/nodes (based on RF) must
respond to declare success?
./infra/consistency_level
Query-based!
./infra/consistency_level
N1
N2N4
N3
> write
CL = QUORUM
CL = { ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EAC...
./infra/consistency_level
N1
N2N4
N3
> read
CL = ALL
CL = { ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QU...
./infra/consistency_level
N1
N2N4
N3
> read
CL = QUORUM
CL = { ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH...
./infra/consistency_level
Latest timestamp wins!
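"Latest timestamp wins" can be sketched as a reduction over the versions returned by the replicas. This is an illustration only: `Cell` and `reconcile` are hypothetical names, and tie-breaking between equal timestamps is left out.

```python
from typing import Iterable, NamedTuple


class Cell(NamedTuple):
    value: str
    timestamp: int  # write time, e.g. microseconds since the epoch


def reconcile(cells: Iterable[Cell]) -> Cell:
    """Pick the version with the highest write timestamp."""
    return max(cells, key=lambda cell: cell.timestamp)


versions = [Cell("old", 1000), Cell("new", 2000), Cell("old", 1000)]
print(reconcile(versions).value)
```

Because the decision depends only on timestamps, clock skew between writers can make an older write win, which is one reason to keep clocks synchronized.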
./infra/consistency_level/immediate
{ R + W > RF }
./infra/consistency_level/immediate
+Reads
> Write CL: ALL
> Read CL: ONE
./infra/consistency_level/immediate
N1
N2N4
N3
RF = 3
> write
CL = ALL
./infra/consistency_level/immediate
N1
N2N4
N3
RF = 3
> write
CL = ALL
> read
CL = ONE
./infra/consistency_level/immediate
{ R + W > RF }
./infra/consistency_level/immediate
{ 1 + 3 > 3 }
./infra/consistency_level/immediate
+Writes
> Write CL: ONE
> Read CL: ALL
./infra/consistency_level/immediate
N1
N2N4
N3
RF = 3
> write
CL = ONE
./infra/consistency_level/immediate
N1
N2N4
N3
RF = 3
> write
CL = ONE
> read
CL = ALL
./infra/consistency_level/immediate
{ R + W > RF }
./infra/consistency_level/immediate
{ 3 + 1 > 3 }
./infra/consistency_level/immediate
Balanced
> Write CL: QUORUM
> Read CL: QUORUM
./infra/consistency_level/immediate
N1
N2N4
N3
RF = 3
> write
CL = QUORUM
./infra/consistency_level/immediate
N1
N2N4
N3
RF = 3
> write
CL = QUORUM
> read
CL = QUORUM
./infra/consistency_level/immediate
{ R + W > RF }
./infra/consistency_level/immediate
{ 2 + 2 > 3 }
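The three profiles above all satisfy R + W > RF. A short Python sketch of the check follows, assuming the usual quorum definition of floor(RF / 2) + 1; the function names are hypothetical.

```python
def quorum(rf: int) -> int:
    """Majority of the replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def acks(cl: str, rf: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    return {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum(rf), "ALL": rf}[cl]


def is_immediately_consistent(read_cl: str, write_cl: str, rf: int) -> bool:
    """True when every read set overlaps every write set: R + W > RF."""
    return acks(read_cl, rf) + acks(write_cl, rf) > rf


RF = 3
for read_cl, write_cl in [("ONE", "ALL"), ("ALL", "ONE"),
                          ("QUORUM", "QUORUM"), ("ONE", "ONE")]:
    ok = is_immediately_consistent(read_cl, write_cl, RF)
    print(f"read={read_cl:<6} write={write_cl:<6} "
          f"{acks(read_cl, RF)} + {acks(write_cl, RF)} > {RF}: {ok}")
```

The last pair (ONE / ONE) is added for contrast: 1 + 1 is not greater than 3, so a read may miss the latest write until the replicas converge.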
./infra/read_repair
> Query ALL replicas when reading
- Data from one.
- Checksum + Timestamp from others.
./infra/read_repair
> If there is a mismatch:
- Pull all data and merge
- Write back to out-of-sync replicas
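The two read-repair slides can be sketched as follows: take the data from one replica, compare digests from the others, and on a mismatch merge by timestamp and write the winner back. This is a simplified model, not Cassandra's implementation. The MD5 digest, the in-memory dictionaries and the name `read_with_repair` are assumptions for illustration, and every replica is assumed to hold the key.

```python
import hashlib
from typing import Dict, List, Tuple

Row = Tuple[str, int]        # (value, write timestamp)
Replica = Dict[str, Row]     # key -> row


def digest(value: str) -> str:
    """Checksum sent instead of the full data (MD5 is an illustrative choice)."""
    return hashlib.md5(value.encode()).hexdigest()


def read_with_repair(key: str, replicas: Dict[str, Replica]) -> Tuple[str, List[str]]:
    """Return the value to serve and the names of the replicas that were repaired."""
    names = list(replicas)
    data_node, digest_nodes = names[0], names[1:]
    value, _ = replicas[data_node][key]
    mismatch = any(digest(replicas[n][key][0]) != digest(value) for n in digest_nodes)
    if not mismatch:
        return value, []
    # Mismatch: pull the full rows from all replicas and keep the newest one.
    newest = max((replicas[n][key] for n in names), key=lambda row: row[1])
    repaired = []
    for n in names:
        if replicas[n][key] != newest:
            replicas[n][key] = newest  # write back to the out-of-sync replica
            repaired.append(n)
    return newest[0], repaired


cluster = {
    "N1": {"k": ("old", 1)},
    "N2": {"k": ("new", 2)},
    "N3": {"k": ("new", 2)},
}
print(read_with_repair("k", cluster))
```

After the call all three replicas in `cluster` hold the newest row, so a second read finds no mismatch.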
./infra/read_repair
Table-based!
./infra/read_repair
N1
N2N4
N3
DATA
SUM
SUM
./infra/read_repair
N1
N2N4
N3
./infra/read_repair
ALTER TABLE Globant.foobar
WITH read_repair_chance = 0.2;
./infra/read_repair
> Weak Consistency
return results + repair
> Strong Consistency
repair + return results
./infra/hinted_handoff
> Recovery mechanism
- Stored @ Coordinator's system.hints
- 3-hour default hint window (max_hint_window_in_ms)
- DataCenter-based!
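A toy model of hinted handoff, reading the slide's 3-hour default as the window during which the coordinator keeps collecting hints for a node that is down. The class and method names are hypothetical, and the real mechanism stores hints durably rather than in a dictionary.

```python
from typing import Dict, List

HINT_WINDOW_SECONDS = 3 * 60 * 60  # the 3-hour default mentioned on the slide


class Coordinator:
    def __init__(self) -> None:
        self.hints: Dict[str, List[str]] = {}     # down node -> pending mutations
        self.down_since: Dict[str, float] = {}    # down node -> time it went down

    def mark_down(self, node: str, now: float) -> None:
        self.down_since.setdefault(node, now)

    def write(self, node: str, mutation: str, now: float) -> str:
        if node not in self.down_since:
            return "delivered"
        if now - self.down_since[node] > HINT_WINDOW_SECONDS:
            return "dropped"  # too long down: the node will need a repair later
        self.hints.setdefault(node, []).append(mutation)
        return "hinted"

    def mark_up(self, node: str) -> List[str]:
        """Node is back: return the hints to replay and forget them."""
        self.down_since.pop(node, None)
        return self.hints.pop(node, [])


c = Coordinator()
c.mark_down("N3", now=0)
print(c.write("N3", "INSERT a", now=10))          # inside the window
print(c.write("N3", "INSERT b", now=4 * 3600))    # after the window
print(c.mark_up("N3"))                            # hints replayed on recovery
```

Writes that arrive after the window are not hinted, which is why a node that was down for a long time still needs `nodetool repair`.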
./infra/nodetool
$ nodetool repair
> Recovering a failed node
> Infrequently read data (read repair chance)
> Tombstone gc perio...
./internals
./internals/write_path
N
partition key 3 n: SasoConf city: cur year: 3
partition key 2 n: EkoParty city: caba year: 11
par...
./internals/write_path
N
MEMORY
STORAGE
Memtable (1 table)
Commit
Log
SSTables
... ... ... ...
... ... ... ...
... ... ......
./internals/write_path
N
MEMORY
STORAGE
Memtable (1 table)
Commit
Log
SSTables
C
... ... ... ...
... ... ... ...
... ... ....
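The write-path slides show a commit log and a memtable feeding SSTables, with flush and compaction steps. A compact Python sketch of that flow follows. It is a teaching model under simplifying assumptions: plain lists and dictionaries stand in for on-disk files, the flush threshold is arbitrary, compaction keeps the row from the newest SSTable instead of comparing timestamps, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


class Node:
    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log: List[Tuple[int, Row]] = []  # append-only, for crash recovery
        self.memtable: Dict[int, Row] = {}            # in-memory, one per table
        self.sstables: List[Dict[int, Row]] = []      # immutable, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key: int, row: Row) -> None:
        self.commit_log.append((key, row))  # 1. append to the commit log
        self.memtable[key] = row            # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                    # 3. flush when the memtable is full

    def flush(self) -> None:
        """Write the memtable out as a new sorted SSTable and reset memory."""
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable.clear()
        self.commit_log.clear()  # flushed data no longer needs the log

    def compact(self) -> None:
        """Merge all SSTables into one; newer tables overwrite older ones."""
        merged: Dict[int, Row] = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]


node = Node()
node.write(3, {"n": "SasoConf", "city": "cur", "year": 3})
node.write(2, {"n": "EkoParty", "city": "caba", "year": 11})
node.write(1, {"n": "pwnConf", "city": "mdq", "year": 2})
print(len(node.sstables), list(node.sstables[0]))
```

The three rows are the ones shown on the slides; the third write fills the memtable and triggers the flush.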
./hands-on
./hands-on/stress
> 1.3M writes/sec (1.3 writes/µs)
> 160K reads/sec (160 reads/ms)
> Collisions?
./hands-on/stress
Custom Apps
./me/contact
> Renato Carelli
- mailto: renato.carelli@globant.com
- mailto: renato@carelli.com.ar
- telegram: @renato
We are hiring DevOps!
> mailto: solange.domijan@globant.com
Thanks!
[Globant summer take over] Empowering Big Data with Cassandra

Mar del Plata Summer Take Over Presentation 2016 - By Renato Carelli
DevOps + Infra @ Big Data
Hardening Enthusiast
Cloud evangelist
Bitcoin speculator


  1. 1. Empowering Big Data with Cassandra
  2. 2. Empowering Big Data with Cassandra
  3. 3. ./me > Renato Carelli - DevOps + Infra @ Big Data - Hardening Enthusiast - Cloud evangelist - Bitcoin speculator
  4. 4. ./intro
  5. 5. ./intro/CAP Consistency Availability Partition tolerance CA CP AP N/A
  6. 6. ./intro/RDBMS Data
  7. 7. ./intro/NoSQL Unstructured (not really a DB) Key Value Column Graph Document General file storage Text files Log files Complex models Flexible business logic Semi-structured data High volumes OLAP Analytics NOT FOR UPDATES Relations between entities (social graphs) Agile development Flexible data- models Too many types. Eg: Corporate areas Data Store BigQueryGFS BigTableCloudStore
  8. 8. ./intro/BigData
  9. 9. ./intro/specs
  10. 10. ./intro/history BigTable (2006) Dynamo (2007) Open Source (2008) {data modeling} {design}
  11. 11. ./intro/version_history (0.1) (0.3) (0.6) (0.7)(0.8)(1.0) (1.1) (1.2) (2.0) (2.1)
  12. 12. ./infra
  13. 13. ./infra/features N1 N2N4 N3 > Masterless > Distributed > Decentralized [p2p] > Elastically Scalable > Highly Available > Fault-Tolerant > Tuneable Consistent
  14. 14. ./infra/benchmark Nodes Ops/sec
  15. 15. ./infra/benchmark
  16. 16. ./infra/references N1 C* Node Connection Failed Connection Established Updated Data Outdated Data ACK Slow Connection Established
  17. 17. ./infra/token Murmur3Partitioner: -2^63 to +2^63 -1 token(‘Globant’) = -6148914691517517206
  18. 18. ./infra/token DemoPartitioner: 1 to 100 token(‘Globant’) = 68
  19. 19. ./infra/token_ring Node 1 Node 2 Node 3 Node 4
  20. 20. ./infra/token_ring Node 1 Node 2 Node 3 Node 4 1 - 25 26 - 50 51 - 75 76 - 100 ‘Glob’ = 17 ‘ant’ = 94 ‘Globant’ = 68 ~/Images/pic.png = 69 ~/media/movie.mkv = 34
  21. 21. ./infra/token_ring/replication Node 1 Node 2 Node 3 Node 4 1 - 25 26 - 50 51 - 75 76 - 100 ‘Glob’ = 17 RF = 3
  22. 22. ./infra/token_ring/vnodes What about virtual nodes? C* 1.2
  23. 23. ./infra/coordinator N1 N2N4 N3
  24. 24. ./infra/coordinator N1 N2N4 N3 > read RF = 3 CL = TWO
  25. 25. ./infra/coordinator N1 N2N4 N3 > read RF = 3 CL = TWO
  26. 26. ./infra/coordinator N1 N2N4 N3 Coordinator > read RF = 3 CL = TWO
  27. 27. ./infra/coordinator N1 N2N4 N3 Coordinator > read RF = 3 CL = TWO
  28. 28. ./infra/coordinator N1 N2N4 N3 Coordinator > read RF = 3 CL = TWO
  29. 29. ./infra/coordinator N1 N2N4 N3 Coordinator > read RF = 3 CL = TWO
  30. 30. ./infra/coordinator N1 N2N4 N3 Coordinator > read RF = 3 CL = TWO
  31. 31. ./infra/coordinator N1 N2N4 N3 Coordinator > read RF = 3 CL = TWO
  32. 32. ./infra/replication How many copies of each piece of data (partition) do we want in the system?
  33. 33. ./infra/replication > Replication Factor > Replication Strategy Keyspace-based!
  34. 34. ./infra/replication N1 N2N4 N3RF = 3
  35. 35. ./infra/replication CREATE KEYSPACE Globant WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
  36. 36. ./infra/replication N1 N2N3 N4 R1 R2 Data Center - East N1 N2N3 N4 R1 R2 Data Center - West RF = {‘w’:3, ‘e’:2}
  37. 37. ./infra/replication CREATE KEYSPACE Globant WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'w' : 3, 'e' : 2};
  38. 38. ./infra/consistency_level How many replicas/nodes (based in RF) must respond to declare success?
  39. 39. ./infra/consistency_level Query-based!
  40. 40. ./infra/consistency_level N1 N2N4 N3 > write CL = QUORUM CL = { ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, ALL }
  41. 41. ./infra/consistency_level N1 N2N4 N3 > read CL = ALL CL = { ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, ALL }
  42. 42. ./infra/consistency_level N1 N2N4 N3 > read CL = QUORUM CL = { ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, ALL }
  43. 43. ./infra/consistency_level Latest timestamp wins!
  44. 44. ./infra/consistency_level/immediate { R + W > RF }
  45. 45. ./infra/consistency_level/immediate +Reads > Write CL: ALL > Read CL: ONE
  46. 46. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = ALL
  47. 47. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = ALL > read CL = ONE
  48. 48. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = ALL > read CL = ONE
  49. 49. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = ALL > read CL = ONE
  50. 50. ./infra/consistency_level/immediate { R + W > RF }
  51. 51. ./infra/consistency_level/immediate { 1 + 3 > 3 }
  52. 52. ./infra/consistency_level/immediate +Writes > Write CL: ONE > Read CL: ALL
  53. 53. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = ONE
  54. 54. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = ONE > read CL = ALL
  55. 55. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = ONE > read CL = ALL
  56. 56. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = ONE > read CL = ALL
  57. 57. ./infra/consistency_level/immediate { R + W > RF }
  58. 58. ./infra/consistency_level/immediate { 3 + 1 > 3 }
  59. 59. ./infra/consistency_level/immediate Balanced > Write CL: QUORUM > Read CL: QUORUM
  60. 60. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = QUORUM
  61. 61. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = QUORUM > read CL = QUORUM
  62. 62. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = QUORUM > read CL = QUORUM
  63. 63. ./infra/consistency_level/immediate N1 N2N4 N3 RF = 3 > write CL = QUORUM > read CL = QUORUM
  64. 64. ./infra/consistency_level/immediate { R + W > RF }
  65. 65. ./infra/consistency_level/immediate { 2 + 2 > 3 }
  66. 66. ./infra/read_repair > Query ALL replicas when reading - Data from one. - Checksum + Timestamp from others.
  67. 67. ./infra/read_repair > If there is a mismatch: - Pull all data and merge - Write back to out of sync replicas
  68. 68. ./infra/read_repair Table-based!
  69. 69. ./infra/read_repair N1 N2N4 N3 DATA SUM SUM
  70. 70. ./infra/read_repair N1 N2N4 N3 DATA SUM SUM
  71. 71. ./infra/read_repair N1 N2N4 N3
  72. 72. ./infra/read_repair ALTER TABLE Globant.foobar WITH read_repair_chance = 0.2;
  73. 73. ./infra/read_repair > Weak Consistency return results + repair > Strong Consistency repair + return results
  74. 74. ./infra/hinted_handoff > Recovery mechanism - Stored @ Coordinator‘s system.hints - 3hs default TTL - DataCenter-based!
  75. 75. ./infra/nodetool $ nodetool repair > Recovering a failed node > Infreq read data (read repair chance) > Tombstone gc period (gc_grace_seconds)
  76. 76. ./internals
  77. 77. ./internals/write_path N partition key 3 n: SasoConf city: cur year: 3 partition key 2 n: EkoParty city: caba year: 11 partition key 1 n: pwnConf city: mdq year: 2 MEMORY STORAGE Memtable (1 table) Commit Log SSTables ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... C
  78. 78. ./internals/write_path N MEMORY STORAGE Memtable (1 table) Commit Log SSTables ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... C (Flush) ... ... ... ... ... ... ... ... ... ... ... ... partition key 3 n: SasoConf city: cur year: 3 partition key 2 n: EkoParty city: caba year: 11 partition key 1 n: pwnConf city: mdq year: 2
  79. 79. ./internals/write_path N MEMORY STORAGE Memtable (1 table) Commit Log SSTables ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... C ... ... ... ... ... ... ... ... ... ... ... ... partition key 3 n: SasoConf city: cur year: 3 partition key 2 n: EkoParty city: caba year: 11 partition key 1 n: pwnConf city: mdq year: 2 Compaction ... ... ... ... ... ... ... ... ... ... ... ...
  80. 80. ./internals/write_path N MEMORY STORAGE Memtable (1 table) Commit Log SSTables C ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Compaction ... ... ... ... ... ... ... ... ... ... ... ...
  81. 81. ./hands-on
  82. 82. ./hands-on/stress > 1.3M writes/sec (1.3 write/µs) > 160K reads/sec (160 reads/ms) > Collisions?
  83. 83. ./hands-on/stress Custom Apps
  84. 84. ./me/contact > Renato Carelli - mailto: renato.carelli@globant.com - mailto: renato@carelli.com.ar - telegram: @renato
  85. 85. We are hiring DevOps! > mailto: solange.domijan@globant.com
  86. 86. Thanks!
