Speaker: Dave Gardner, Architect at Hailo
Video: http://www.youtube.com/watch?v=6cUuE7sTdU0&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=16
Hailo has leveraged Cassandra to build one of the most successful startups in European history. This presentation looks at how Hailo grew from a simple MySQL-backed infrastructure to a resilient Cassandra-backed system running in three data centres globally. Topics covered include: the process of migration, experience running multi-DC on AWS, common data modeling patterns and security implications for achieving PCI compliance.
4. 0.6 to 1.2
• 1,352 changed files with 235,413 additions and 47,487 deletions
• 7,429 commits
• 1,653 tickets completed
https://github.com/apache/cassandra/compare/cassandra-0.6.0...cassandra-1.2
https://github.com/apache/cassandra/blob/trunk/CHANGES.txt
#CASSANDRAEU
CASSANDRASUMMITEU
5. What this talk is about
Cassandra adoption at Hailo from three perspectives:
1. Development
2. Operational
3. Management
6. What is Hailo?
Hailo is The Taxi Magnet. Use Hailo to get a cab wherever you are, whenever you want.
10. What is Hailo?
• The world’s highest-rated taxi app – over 11,000 five-star reviews
• Over 500,000 registered passengers
• A Hailo hail is accepted around the world every 4 seconds
• Hailo operates in 15 cities on 3 continents from Tokyo to Toronto
in nearly 2 years of operation
11. Hailo is growing
• Hailo is a marketplace that facilitates over $100M in run-rate
transactions and is making the world a better place for passengers
and drivers
• Hailo has raised over $50M in financing from the world's best
investors including Union Square Ventures, Accel, the founder of
Skype (via Atomico), Wellington Partners (Spotify), Sir Richard
Branson, and our CEO's mother, Janice
12. The history
The story behind Cassandra adoption at Hailo
13. Hailo launched in London in November 2011
• Launched on AWS
• Two PHP/MySQL web apps plus a Java backend
• Mostly built by a team of 3 or 4 backend engineers
• MySQL multi-master for single AZ resilience
14. Why Cassandra?
• A desire for greater resilience – “become a utility”
Cassandra is designed for high availability
• Plans for international expansion around a single consumer app
Cassandra is good at global replication
• Expected growth
Cassandra scales linearly for both reads and writes
• Prior experience
I had experience with Cassandra and could recommend it
15. The path to adoption
• Largely unilateral decision by developers – a result of a startup
culture
• Replacement of key consumer app functionality, splitting up the
PHP/MySQL web app into a mixture of global PHP/Java services
backed by a Cassandra data store
• Launched into production in September 2012 – originally just
powering North American expansion, before gradually switching
over Dublin and London
16. One year on...
• Further breakdown of functionality into Go/Java SOA
• Migrating all online databases to Cassandra
21. Considerations for entity storage
• Do not read the entire entity, update one property and then write
back a mutation containing every column
• Only mutate columns that have been set
• This avoids read-before-write race conditions
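The point above can be illustrated with a minimal sketch. It uses a plain in-memory dict as a stand-in for a column family; the `Entity` class, `apply_mutation` helper, and the sample customer record are hypothetical names introduced for illustration, not Hailo's code or a Cassandra client API.

```python
class Entity:
    """Hypothetical entity wrapper that remembers which columns were set."""

    def __init__(self, key, columns):
        self.key = key
        self._columns = dict(columns)  # local copy of what was read
        self._dirty = set()            # names of columns changed since load

    def set(self, name, value):
        self._columns[name] = value
        self._dirty.add(name)

    def mutation(self):
        """Return only the columns that were explicitly set."""
        return {name: self._columns[name] for name in sorted(self._dirty)}


def apply_mutation(store, key, mutation):
    # Stand-in for a column-level write: only the named columns are touched.
    store.setdefault(key, {}).update(mutation)


store = {"cust:1": {"name": "Ann", "email": "ann@example.com", "phone": "000"}}

# Two writers load the same entity and each change a different property.
a = Entity("cust:1", store["cust:1"])
b = Entity("cust:1", store["cust:1"])
a.set("email", "ann@new.example.com")
b.set("phone", "111")

apply_mutation(store, a.key, a.mutation())
apply_mutation(store, b.key, b.mutation())
```

Because each mutation carries only the column that writer changed, the second write does not overwrite the first writer's update with a stale value, which is what would happen if both wrote back every column they had read.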
26. Considerations for time series storage
• Choose row key carefully, since this partitions the records
• Think about how many records you want in a single row
• Denormalise on write into many indexes
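A rough sketch of these considerations, again using an in-memory dict in place of Cassandra. The daily bucket size, the `by_driver` / `by_city` index names, and the event fields are assumptions chosen for illustration; the slides do not specify Hailo's actual layout.

```python
from collections import defaultdict
from datetime import datetime, timezone


def day_bucket(ts):
    """Bucket suffix for the row key; one row per index value per day (assumed)."""
    return ts.strftime("%Y%m%d")


def index_rows(event):
    """Row keys this event is denormalised into (hypothetical indexes)."""
    bucket = day_bucket(event["ts"])
    return [
        f"by_driver:{event['driver']}:{bucket}",
        f"by_city:{event['city']}:{bucket}",
    ]


def write_event(store, event):
    # Column name starts with the timestamp so columns sort by time within a row.
    column = (event["ts"].isoformat(), event["id"])
    for row_key in index_rows(event):
        store[row_key][column] = event["id"]


store = defaultdict(dict)
write_event(store, {
    "id": "job-1", "driver": "d42", "city": "LON",
    "ts": datetime(2013, 10, 17, 9, 30, tzinfo=timezone.utc),
})
write_event(store, {
    "id": "job-2", "driver": "d42", "city": "LON",
    "ts": datetime(2013, 10, 18, 8, 0, tzinfo=timezone.utc),
})
row_keys = sorted(store)
```

The bucket in the row key bounds how many records land in a single row, and writing each event under several row keys means every query pattern reads one row rather than scanning.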
28. Analytics
• With Cassandra we lost the ability to carry out analytics
e.g. COUNT, SUM, AVG, GROUP BY
• We use Acunu Analytics to give us this ability in real time, for preplanned query templates
• It is backed by Cassandra and therefore highly available, resilient
and globally distributed
• Integration is straightforward
33. “Allows a team of 2 to achieve things they wouldn’t
have considered before Cassandra existed”
Chris H, Operations Engineer
37. 3 AZs per region
• m1.large machines
• Provisioned IOPS EBS
• AWS VPCs with OpenVPN links
• Stats cluster: ~1TB/node
• Operational cluster: ~200GB/node
38. Backups
• SSTable snapshot
• Used to upload to S3, but this was taking >6 hours and consuming
all our network bandwidth
• Now take EBS snapshot of the data volumes
39. Encryption
• Requirement for NYC launch
• We use dm-crypt to encrypt the entire EBS volume
• Chose dm-crypt because it is uncomplicated
• Our tests show a 1% performance hit in disk performance, which
concurs with what Amazon suggest
41. Multi DC
• Something that Cassandra makes trivial
• Would have been very difficult to accomplish active-active inter-DC
replication with a team of 2 without Cassandra
• Rolling repair needed to make it safe (we use LOCAL_QUORUM)
• We schedule “narrow repairs” on different nodes in our cluster each
night
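As a small illustration of the last two bullets: LOCAL_QUORUM needs acknowledgements from a majority of the replicas in the local data centre only, so replicas in other data centres can lag behind until repair reconciles them. The sketch below computes that majority and builds a simple nightly rotation of nodes to repair. The replication factor of 3, the node names, and the round-robin order are assumptions; the slides do not say exactly what a "narrow repair" covers or how nodes are chosen.

```python
from datetime import date, timedelta


def local_quorum(local_rf):
    """Replicas in the local DC that must acknowledge a LOCAL_QUORUM request."""
    return local_rf // 2 + 1


def nightly_repair_plan(nodes, start, nights):
    """Assign one node per night, cycling through the cluster (assumed policy)."""
    return [
        (start + timedelta(days=i), nodes[i % len(nodes)])
        for i in range(nights)
    ]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
plan = nightly_repair_plan(nodes, date(2013, 10, 17), 4)
quorum = local_quorum(3)  # assumes 3 replicas per data centre
```

With 3 replicas per data centre, two local acknowledgements are enough, so a request can still succeed with one local replica down.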
42. Compression
• Our stats cluster was running at ~1.5TB per node
• We didn’t want to add more nodes
• With compression, we are now back to ~600GB
• Easy to accomplish
• `nodetool upgradesstables` on a rolling schedule
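Going from ~1.5TB to ~600GB per node is roughly a 60% reduction. A rolling schedule can be sketched as running the command on one node at a time and waiting for it to finish before moving on. The host names and the `runner` callback below are hypothetical; the example only records the commands rather than executing them.

```python
def rolling_commands(hosts):
    """One nodetool invocation per host, in the order they should run."""
    return [["nodetool", "-h", host, "upgradesstables"] for host in hosts]


def run_rolling(hosts, runner):
    """Run the command on each host in turn; runner is expected to block."""
    done = []
    for cmd in rolling_commands(hosts):
        runner(cmd)          # next node starts only after this call returns
        done.append(cmd[2])  # cmd[2] is the host name
    return done


# Dry run: collect the commands instead of executing them.
issued = []
hosts = ["cass-1", "cass-2", "cass-3"]  # hypothetical host names
finished = run_rolling(hosts, issued.append)
```

To execute for real, `runner` could be something like `lambda cmd: subprocess.run(cmd, check=True)`, which stops the rollout if a node reports an error.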
44. “The days of the quick and dirty are over”
Simon V, EVP Operations
45. Technically, everything is fine…
• Our COO feels that C* is “technically good and beautiful”, a
“perfectly good option”
• Our EVPO says that C* reminds him of a time series database in
use at Goldman Sachs that had “very good performance”
…but there are concerns
46. [Diagram: people who can attempt to query MySQL compared with people who can attempt to query Cassandra]
51. Lessons learned
• Have an advocate - get someone who will sell the vision internally
• Learn the theory - teach each team member the fundamentals
• Make an effort to get everyone on board
58. Lessons learned
• Be pro-active with Cassandra, even if it seems to be running
smoothly
• Peer-review data models, take time to think about them
• Big rows are bad - use cfstats to look for them
• Mixed workloads can cause problems - use cfhistograms and look
out for signs of data modeling problems
• Think about the compaction strategy for each CF
60. Lessons learned
• EBS is nearly always the cause of Amazon outages
• EBS is a single point of failure (it will fail everywhere in your cluster)
• EBS is slow
• EBS is expensive
• EBS is unnecessary!
62. Lessons learned
• Keep the business informed – explain the tradeoffs in simple terms
• Sing from the same hymn sheet
• Make sure there are solutions in place for every use case from the
beginning
63. [Diagram: people who can attempt to query MySQL compared with people who can attempt to query Cassandra]
65. We like Cassandra
• Solid design
• HA characteristics
• Easy multi-DC setup
• Simplicity of operation
66. Lessons for successful adoption
• Have an advocate, sell the dream
• Learn the fundamentals, get the best out of Cassandra
• Invest in tools to make life easier
• Keep management in the loop, explain the trade-offs
67. The future
• We will continue to invest in Cassandra as we expand globally
• We will hire people with experience running Cassandra
• We will focus on expanding our reporting facilities
• We aspire to extend our network (1M consumer installs, wallet)
beyond cabs
• We will continue to hire the best engineers in London, NYC and
Asia