SimpleReach is a social intelligence tool for content creators. In order to handle both the data ingestion and data volume, we've employed Cassandra to store, process and aid in the display and organization of that data. We've learned a lot of lessons along the way about the right and wrong things to do both with data in general and with Cassandra in particular. These are some of those lessons.
17. Why?
• Heavier READ loads vs heavier write loads
• Data relationships may be less important
• Different aspects of a system have different requirements
• Know your compromises
100 Million Events Eric Lubow @elubow
23. Cassandra
• Large data volume ingestion
• Really fast writes to many locations (eventual consistency)
• Query by column groups within rows
• Range queries in Hive (Slice predicate ranges)
• Fault tolerant
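The "query by column groups" and "slice predicate ranges" bullets can be pictured with a toy model. This is a minimal sketch, assuming a row is simply a set of columns kept sorted by column name; `WideRow` and its `slice` method are hypothetical names for illustration and are not part of any Cassandra client API.

```python
import bisect


class WideRow:
    """Toy model of one wide row: columns kept sorted by name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Keep names sorted so a range of columns is a contiguous run.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return columns with start <= name <= finish, in sorted order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Hypothetical data: column name = hour bucket, value = event count.
row = WideRow()
for hour, count in [(3, 7), (1, 2), (2, 5), (4, 1)]:
    row.insert(hour, count)

print(row.slice(2, 3))  # columns for hours 2 through 3
```

Because the columns are stored in sorted order, a range of them can be read as one contiguous run rather than looked up one by one.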
30. What Mistakes?
• Manage how many servers?
• Re-inventing the wheel (Helenus)
• Composites Rock
• Snapshots before drop keyspace
• How many experts does it take to run a cluster?
• You can tune Cassandra?!?
36. Server Management
• Hand tools - AWS, csshx (Cluster SSH)
• Configuration Management
• Monitoring and Alerting Tools
• Performance
• Security
47. Data Patterns
• Storage is cheap
• Composites are WAY better than underscores
• Beyond UTF8Type
• Timestamps as LongType
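A rough illustration of why typed composites and LongType timestamps tend to behave better than underscore-joined strings. This is a sketch in plain Python, not Cassandra code: Python tuples stand in for composite column names, Python integers stand in for LongType values, and the sample names and numbers are made up.

```python
# Underscore-joined names are compared as plain text, character by character.
joined = ["facebook_9", "facebook_10", "twitter_2"]
sorted_joined = sorted(joined)

# Composite-style names are compared component by component,
# each component using its own type (text, then integer).
composite = [("facebook", 9), ("facebook", 10), ("twitter", 2)]
sorted_composite = sorted(composite)

# The same effect applies to timestamps: stored as text they sort by digits,
# stored as 64-bit integers (epoch milliseconds) they sort chronologically.
timestamps_ms = [1350000000000, 999999999999]
sorted_as_text = sorted(str(t) for t in timestamps_ms)
sorted_as_long = sorted(timestamps_ms)

print(sorted_joined)     # "facebook_10" lands before "facebook_9"
print(sorted_composite)  # 9 lands before 10
print(sorted_as_text)    # the later timestamp lands first
print(sorted_as_long)    # chronological order
```

Joined names also have to be split apart again on read, which breaks as soon as a component itself contains an underscore; typed components avoid that parsing step.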
51. Safety Mechanisms
• Snapshots before dropping keyspaces
• Authorization and authentication
• (Limit) Direct access to the data store
55. Expertise
• What happens when you need help?
• How do you become an expert?
• What happens when you need more experts?
59. Tunables
• Replication factor and read_repair_chance
• Phi Convict and RPC timeout for AWS or DC separation
• MAX_HEAP_SIZE and HEAP_NEWSIZE (Analytics vs Realtime)
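The replication factor bullet is easier to reason about with the standard quorum arithmetic. This is a minimal sketch of that arithmetic only; the function names are hypothetical, and nothing here reflects the specific values used in the talk.

```python
def quorum(replication_factor):
    """Replicas in a QUORUM: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor, write_replicas, read_replicas):
    """True when every read set must share a replica with every write set."""
    return read_replicas + write_replicas > replication_factor


# Compare QUORUM/QUORUM against ONE/ONE for a few replication factors.
for rf in (1, 2, 3, 5):
    q = quorum(rf)
    print(
        f"RF={rf} quorum={q} "
        f"quorum/quorum overlap={reads_overlap_writes(rf, q, q)} "
        f"one/one overlap={reads_overlap_writes(rf, 1, 1)}"
    )
```

With a replication factor of 3, reading and writing at ONE does not guarantee overlap, so a read can miss the latest write; that gap is where background mechanisms such as read repair come in.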
60. Future
• Priam
• Asgard
• Curator
• Work for ?
• Hastur
65. Summary
• Learn from others' mistakes
• Tuning and data patterns
• It’s ok to re-invent the wheel
• Applications for/with Cassandra
67. Questions are guaranteed in life.
Answers aren’t.
Eric Lubow
@elubow
elubow@simplereach.com
Thank you.
Editor's notes
SimpleReach is a social intelligence tool for content creators. We track every social action, on every major network, across the entire web in real time. That means every like, tweet, pin, stumble, and many more.