Hailo has leveraged Cassandra to build one of the most successful startups in European history. This presentation looks at how Hailo grew from a simple MySQL-backed infrastructure to a resilient Cassandra-backed system running in three data centers globally. Topics covered include: the process of migration, experience running multi-DC on AWS, common data modeling patterns, and the security implications of achieving PCI compliance.
#CASSANDRA13 CASSANDRASUMMIT2013
What is Hailo?
• The world’s highest-rated taxi app - over 7,000 five-star reviews
• Over 300,000 registered passengers
• A Hailo hail is accepted around the world every 5 seconds
• Hailo is growing (30%+) every month
• Became the largest taxi network in all of Ireland within two months of launch
Hailo launched in London in November 2011
• Launched on AWS
• Two PHP/MySQL web apps plus a Java backend
• Mostly built by a team of 3 or 4 backend engineers
• MySQL multi-master for single AZ resilience
Why Cassandra?
• A desire for greater resilience – “become a utility”
Cassandra is designed for high availability
• Plans for international expansion around a single consumer app
Cassandra is good at global replication
• Expected growth
Cassandra scales linearly for both reads and writes
• Prior experience
I had experience with Cassandra and could recommend it
The path to adoption
• Largely unilateral decision by developers – a result of a startup culture
• Replacement of key consumer app functionality, splitting up the PHP/MySQL web app into a mixture of global PHP/Java services backed by a Cassandra data store
• Launched into production in XYZ – originally just powering North American expansion, before gradually switching over Dublin and London
Considerations for entity storage
• Do not read the entire entity, update one property and then write back a mutation containing every column
• Only mutate columns that have been set
• This avoids read-before-write race conditions
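As a sketch of the pattern above (hypothetical code, not Hailo's actual implementation): track which properties the caller explicitly set, and build the mutation from only those columns, so a concurrent update to a different column is never clobbered.

```python
# Hypothetical sketch of the "only mutate set columns" pattern.
# Instead of read-modify-write on the whole entity, we accumulate the
# columns that were explicitly set and write back only those.

class PartialEntityUpdate:
    """Accumulates only the columns the caller actually set."""

    def __init__(self, row_key):
        self.row_key = row_key
        self._dirty = {}  # column name -> new value

    def set(self, column, value):
        self._dirty[column] = value

    def to_mutation(self):
        # The mutation contains only the dirty columns; untouched
        # columns are never rewritten, so no read is needed first.
        return {"key": self.row_key, "columns": dict(self._dirty)}

update = PartialEntityUpdate("customer:1234")
update.set("phone", "+44 20 7946 0000")
mutation = update.to_mutation()
# mutation -> {"key": "customer:1234", "columns": {"phone": "+44 20 7946 0000"}}
```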
Considerations for time series storage
• Choose row key carefully, since this partitions the records
• Think about how many records you want in a single row
• Denormalise on write into many indexes
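One common way to "choose the row key carefully" is to bucket it by time. A minimal sketch (the entity id, bucket size and key layout are illustrative assumptions, not Hailo's schema):

```python
# Hypothetical sketch of time-series row-key bucketing. The row key
# partitions the data, so bucketing by day bounds row width: one row
# holds at most one day of records for one entity.

from datetime import datetime, timezone

def bucketed_row_key(entity_id, ts):
    """Row key = entity id + day bucket, e.g. 'driver42:2013-06-11'."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{entity_id}:{day}"

# Records within the row would then be keyed by a time-based column
# name (e.g. a TimeUUID) so they sort chronologically inside the bucket.
key = bucketed_row_key("driver42", 1370908800)  # 2013-06-11 00:00:00 UTC
```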
Analytics
• With Cassandra we lost the ability to carry out analytics, e.g. COUNT, SUM, AVG, GROUP BY
• We use Acunu Analytics to give us this ability in real time, for pre-planned query templates
• It is backed by Cassandra and therefore highly available, resilient and globally distributed
• Integration is straightforward
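The general pattern behind real-time analytics on Cassandra can be sketched as follows (an illustration of write-time aggregation for pre-planned queries, not Acunu's actual implementation):

```python
# Hypothetical sketch: aggregates (COUNT/SUM, hence AVG) are maintained
# at write time for pre-planned queries, rather than scanned at read
# time as SQL would do with GROUP BY.

from collections import defaultdict

class WriteTimeAggregates:
    def __init__(self):
        self._count = defaultdict(int)
        self._sum = defaultdict(float)

    def record(self, bucket, value):
        # In Cassandra these would be counter-column increments.
        self._count[bucket] += 1
        self._sum[bucket] += value

    def avg(self, bucket):
        return self._sum[bucket] / self._count[bucket]

agg = WriteTimeAggregates()
for fare in (12.0, 18.0, 24.0):
    agg.record("london:2013-06-11", fare)
# agg.avg("london:2013-06-11") -> 18.0
```

The trade-off matches the slide: reads are cheap and real-time, but only for the query templates you planned for at write time.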
Learn the theory
• Teach each team member the fundamentals
• CQL can encourage an SQL mindset, but it’s important to understand the underlying data model
• Make a real effort to share knowledge – keep in mind the gulf in experience for most team members between their old world and the new world (SQL vs NoSQL)
• Peer review data models
2 clusters, 3 regions, 6 machines per region
• Operational cluster: ap-southeast-1, us-east-1, eu-west-1
• Stats cluster: us-east-1, eu-west-1 (pending addition of a third DC)
• AWS VPCs with OpenVPN links
• 3 AZs per region
• m1.large machines
• Provisioned IOPS EBS
• Stats cluster: ~600GB/node
• Operational cluster: ~100GB/node
Backups
• SSTable snapshot
• Used to upload to S3, but this was taking >6 hours and consuming all our network bandwidth
• Now take EBS snapshot of the SSTable snapshots
Encryption
• Requirement for NYC launch
• We use dm-crypt to encrypt the entire EBS volume
• Chose dm-crypt because it is uncomplicated
• Our tests show a ~1% hit in disk performance, which concurs with what Amazon suggests
DataStax OpsCenter
• We run the free version
• Offers easily accessible “one screen” overviews of the activity of the entire cluster
• Big fans – an easy win
Multi DC
• Something that Cassandra makes trivial
• Active-active inter-DC replication would have been very difficult to accomplish with a team of 2 without Cassandra
• Rolling repair is needed to make it safe (we use LOCAL_QUORUM)
• We schedule “narrow repairs” on different nodes in our cluster each night
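The nightly rotation described above can be sketched as a simple round-robin over the cluster (illustrative node names; "narrow repair" here means repairing only a node's primary ranges, i.e. `nodetool repair -pr`):

```python
# Hypothetical sketch of scheduling rolling "narrow repairs": each
# night a different node repairs only its primary ranges, cycling
# through the cluster so every node is repaired regularly (and well
# within gc_grace_seconds).

def node_for_night(nodes, day_number):
    """Pick tonight's repair target by round-robin over the node list."""
    return nodes[day_number % len(nodes)]

nodes = ["cass1", "cass2", "cass3", "cass4", "cass5", "cass6"]
schedule = [node_for_night(nodes, d) for d in range(7)]
# -> ["cass1", "cass2", "cass3", "cass4", "cass5", "cass6", "cass1"]
```

Repairing only primary ranges on one node per night spreads the repair load instead of hammering the whole cluster at once.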
Compression
• Our stats cluster was running at ~1.5TB per node
• We didn’t want to add more nodes
• With compression, we are now back to ~600GB
• Easy to accomplish
• `nodetool upgradesstables` on a rolling schedule
Technically, everything is fine…
• Our COO feels that C* is “technically good and beautiful”, a “perfectly good option”
• Our EVPO says that C* reminds him of a time series database in use at Goldman Sachs that had “very good performance”
…but there are concerns
Keep the business informed
• Pre-launch, we were tasked with increasing resiliency
• Cassandra addressed immediate business needs, but the trade-offs involved should have been communicated more clearly
Sing from the same hymn sheet
• A senior founding engineer had doubts about the adoption of Cassandra until very recently
• In the presence of business doubt, this lack of consistency amongst developers exacerbated the concerns
• We should have made more effort to make bilateral decisions on adoption – I don’t think this would have been hard to achieve
Cassandra at Hailo
• We will continue to invest in Cassandra as we expand globally
• We will hire people with experience running Cassandra
• We will focus on expanding our reporting facilities