The document discusses advanced deployment strategies for Rails applications. It covers using background processing and search services to improve performance. It also discusses options for scaling databases, such as read slaves, master-master replication, and data partitioning. For applications with multiple clients, it recommends configuring clients differently but keeping the same codebase and database structure. When deploying to cloud infrastructure, it advises using a central directory service to dynamically configure database connections. Throughout, it emphasizes that simpler solutions are generally better than complex ones.
2. Who am I?
Jonathan Weiss
• Consultant for Peritor GmbH in Berlin
• Specialized in Rails, Scaling, Deployment, and Code Review
• Webistrano - Rails deployment tool
• FreeBSD Rubygems and Ruby on Rails maintainer
http://www.peritor.com
http://blog.innerewut.de
13. Full Text Search Engine
Separate Service
• Creates full text index
• Application queries search daemon
• Index is updated through the application or the database
Possible Engines
• Ferret
• Sphinx
• Solr
• Lucene
• …
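To make the "separate service" idea concrete, here is a minimal sketch of what such a daemon does internally: it keeps an inverted index that the application updates and queries. SearchIndex and its methods are hypothetical names for illustration, not the API of Ferret, Sphinx, Solr, or Lucene, and a real engine adds ranking, stemming, and persistence.

```ruby
require 'set'

# Toy stand-in for a search daemon: an inverted index (term -> document ids).
class SearchIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
  end

  # Called by the application (or a database-driven job) when a document is created.
  def add(doc_id, text)
    tokenize(text).each { |term| @postings[term] << doc_id }
  end

  # Returns the sorted ids of documents that contain every term of the query.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?
    terms.map { |term| @postings.fetch(term, Set.new) }.reduce(:&).to_a.sort
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/).uniq
  end
end

index = SearchIndex.new
index.add(1, "Scaling Rails with read slaves")
index.add(2, "Deploying Rails to the cloud")
index.add(3, "Scaling the database")
```

With these three documents, a query such as index.search("scaling rails") only matches document 1, because every query term must be present.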
14. Search Slave
Database replication slave
• Has complete dataset
• Takes slow search queries off the master
• Can use different database table engine
15. Database Index
PostgreSQL Tsearch2
• Core since 8.3
• Allows creating a full text index on multiple columns or arbitrary SQL expressions
MySQL MyISAM FULLTEXT index
• Only works with MyISAM tables (InnoDB gained FULLTEXT support in MySQL 5.6)
• Full text index on multiple columns
16. What to use?
Different characteristics
• Real-time updates and stale data
• Lost updates
• Performance
• Document content and format
• Complexity
18. Problem
Long running tasks
• Resizing uploaded images
• Mailing
• Computing an expensive operation
• Accessing slow back-ends
When running inside request-response-cycle
• Blocks user
• Blocks Rails instance
• Hard to monitor and debug
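One common way out of this problem is to let the request only enqueue the work and have a worker process it later. The following is a minimal in-process sketch using Ruby's built-in Queue and a thread; JOBS, RESULTS, handle_upload, and the job payload are hypothetical names, and a production setup would typically use a separate worker process with a persistent queue.

```ruby
# The request handler only enqueues; a worker thread does the slow work.
JOBS = Queue.new
RESULTS = Queue.new

worker = Thread.new do
  # Queue#pop blocks until a job arrives; nil is used here as a shutdown signal.
  while (job = JOBS.pop)
    # Stand-in for the slow part (image resizing, mailing, a slow back-end call).
    RESULTS << "resized #{job[:file]} to #{job[:width]}px"
  end
end

# Runs inside the request-response cycle and returns immediately.
def handle_upload(file)
  JOBS << { file: file, width: 200 }
  "202 Accepted"
end

status = handle_upload("avatar.png")
JOBS << nil   # tell the worker to stop once the queue is drained
worker.join
```

The user gets a response right away, and the Rails instance is free for the next request while the worker finishes the job.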
24. Scaling the database
One database for everything
• All domain data in one place
• The simplest solution
Problems at some point
• Number of read and write requests
• Data size
25. Scaling the database
Read Slave
• Slave replicates each SQL statement executed on the master
• Increase read performance by reading from the replicating slave
• Stale read problem
• Better used explicitly, but that forces you to think about each read
Better use memcached
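The stale read problem and the "use it explicitly" advice can be sketched as follows. FakeDatabase and ReplicatedStore are hypothetical in-memory stand-ins, not a real replication setup: reads go to the master by default, and a caller has to opt in to possibly stale data from the slave.

```ruby
# In-memory stand-in for a database server.
class FakeDatabase
  def initialize
    @rows = {}
  end

  def write(key, value)
    @rows[key] = value
  end

  def read(key)
    @rows[key]
  end
end

class ReplicatedStore
  def initialize(master, slave)
    @master = master
    @slave = slave
    @pending = []   # statements not yet replayed on the slave (replication lag)
  end

  def write(key, value)
    @master.write(key, value)
    @pending << [key, value]
  end

  # Reads go to the master unless the caller explicitly accepts stale data.
  def read(key, allow_stale: false)
    (allow_stale ? @slave : @master).read(key)
  end

  # Simulates the slave catching up with the master.
  def replicate!
    @pending.each { |key, value| @slave.write(key, value) }
    @pending.clear
  end
end

store = ReplicatedStore.new(FakeDatabase.new, FakeDatabase.new)
store.write(:title, "v1")
store.replicate!
store.write(:title, "v2")   # not yet replicated
```

Until replicate! runs again, a read with allow_stale: true still sees "v1" while the master already holds "v2".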
26. Scaling the database
Master-Master
• Increase write and read performance
• Each server is a slave of the other
• Synchronization can be tricky
• Limited by database size
Better for HA than for write performance
27. Data Partitioning
Partition on domain models
• Separate users and products
• Makes sense if JOINs are rare
• Scales reads/writes
• Reduces data size per database
• Depends on separate domains
Simple and effective
28. Data Partitioning
Sharding
• Split data into shards: either all tables, or only big ones like users
• Partition by id, hash function or lookup
• Complex and makes JOINs complicated
• Scales reads/writes
• Reduces data size per database
Last resort
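The three partitioning schemes named above (by id, by hash function, by lookup) can be sketched like this. The shard names and the LOOKUP table are hypothetical placeholders; CRC32 is chosen here only because it gives the same result in every process, unlike Ruby's Object#hash.

```ruby
require 'zlib'

SHARDS = %w[users_shard_0 users_shard_1 users_shard_2 users_shard_3].freeze

# Partition by id: consecutive ids rotate through the shards.
def shard_by_id(user_id)
  SHARDS[user_id % SHARDS.size]
end

# Partition by hash function: also works for non-numeric keys such as e-mail addresses.
def shard_by_hash(key)
  SHARDS[Zlib.crc32(key.to_s) % SHARDS.size]
end

# Partition by lookup: an explicit table, which allows moving a single user later.
LOOKUP = { 42 => "users_shard_3" }.freeze

def shard_by_lookup(user_id)
  LOOKUP.fetch(user_id) { shard_by_id(user_id) }
end
```

Note that with the id and hash schemes, changing the number of shards moves most keys to a different shard, which is part of why sharding is complex to operate.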
31. Archiving
Get rid of (historical) data
• Delete old data
• Aggregate old data
• Partition old data
Have an archiving policy from the start
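As an illustration of the "aggregate old data" option, the sketch below collapses rows older than a cutoff date into one total per month and keeps recent rows untouched. The row layout and the archive helper are assumptions made for this example; in a real application the same policy would run against database tables.

```ruby
require 'date'

# Hypothetical page-view rows.
rows = [
  { day: Date.new(2008, 1, 3),  views: 10 },
  { day: Date.new(2008, 1, 20), views: 5 },
  { day: Date.new(2008, 2, 7),  views: 7 },
  { day: Date.new(2008, 6, 1),  views: 2 }
]

# Rows older than the cutoff become one total per month; recent rows are kept as-is.
def archive(rows, cutoff)
  old, recent = rows.partition { |row| row[:day] < cutoff }
  monthly = old.group_by { |row| row[:day].strftime("%Y-%m") }
               .transform_values { |group| group.sum { |row| row[:views] } }
  [monthly, recent]
end

monthly, recent = archive(rows, Date.new(2008, 5, 1))
```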
32. Reduce data size
Avoid exponential data growth
• Do not store data in the database; move it to the file system, S3, or SimpleDB
• Denormalize data: duplicate data in order to remove JOINs (and JOIN tables)
• Combine indices
34. Multiple Clients
NOT the same as multiple users
A client is more like a separate domain, e.g. an expansion into another country
• Different settings
• Different themes
• Different features enabled
• Different language
• Different audience
How to combine in one app?
35. Multiple Clients
Questions to ask
• How many different clients?
• Is there shared state (users, settings, posts, …)?
• What is the expected data size and growth of each client?
36. Multiple Clients
The easy way to maintenance hell
• Fork the code
• One branch per client
• One install per client
37. Multiple Clients
Same code – same database
• Move different behavior into configuration
• Move configuration into database
• Scope data by DB-column
• Scope all data requests in the code
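A minimal sketch of this approach, assuming a client_id column on every row: per-client behavior lives in configuration, and every read goes through one scoping method so that no query can forget the client. Post, PostRepository, and CLIENT_SETTINGS are hypothetical names; the configuration is a plain hash here, standing in for a database table.

```ruby
Post = Struct.new(:client_id, :title)

# Every read goes through for_client, so the scope cannot be forgotten.
class PostRepository
  def initialize(posts)
    @posts = posts
  end

  def for_client(client_id)
    @posts.select { |post| post.client_id == client_id }
  end
end

# Per-client behavior as configuration instead of code branches.
CLIENT_SETTINGS = {
  "de" => { language: "de", theme: "classic", features: [:forum] },
  "uk" => { language: "en", theme: "modern",  features: [] }
}.freeze

posts = PostRepository.new([
  Post.new("de", "Hallo Berlin"),
  Post.new("uk", "Hello London"),
  Post.new("de", "Neues Feature")
])
```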
38. Multiple Clients
Same code – partition the data
• Move different behavior into configuration
• Partition data by database
Hardcode database while booting
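Hardcoding the database at boot could look like the sketch below: each install is started with a client name and picks its database once. The CLIENT environment variable, the DATABASES table, and all host and database names are assumptions for illustration.

```ruby
# Hypothetical per-client database settings; names and hosts are placeholders.
DATABASES = {
  "de" => { adapter: "mysql", database: "app_de", host: "db-de.internal" },
  "uk" => { adapter: "mysql", database: "app_uk", host: "db-uk.internal" }
}.freeze

# Called once while booting; fails fast if the install is misconfigured.
def database_for_boot(env = ENV)
  client = env.fetch("CLIENT") { raise ArgumentError, "CLIENT is not set" }
  DATABASES.fetch(client) { raise ArgumentError, "unknown client #{client.inspect}" }
end

config = database_for_boot("CLIENT" => "de")
```

The trade-off is one running install per client, since the choice cannot change after boot.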
39. Multiple Clients
Same code – partition the data
• Move different behavior into configuration
• Partition data by database
Choose database dynamically
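Choosing the database dynamically can be sketched as a per-request switch keyed on the request host. HOST_TO_DATABASE, ConnectionSwitcher, and handle_request are hypothetical names; the switcher only records the choice, where a real application would re-point its ORM connection.

```ruby
# Hypothetical mapping from request host to client database.
HOST_TO_DATABASE = {
  "example.de"    => "app_de",
  "example.co.uk" => "app_uk"
}.freeze

class ConnectionSwitcher
  attr_reader :current_database

  # Selects a database for the duration of the block, then restores the previous one.
  def with_database(name)
    previous = @current_database
    @current_database = name
    yield
  ensure
    @current_database = previous
  end
end

# Would run at the start of every request, e.g. in a filter or middleware.
def handle_request(host, switcher)
  database = HOST_TO_DATABASE.fetch(host.sub(/\Awww\./, ""))
  switcher.with_database(database) { "served #{host} from #{switcher.current_database}" }
end

switcher = ConnectionSwitcher.new
response = handle_request("www.example.de", switcher)
```

Restoring the previous selection in ensure keeps one request's choice from leaking into the next, even when the block raises.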
40. Multiple Clients
Generate local databases
• Import global content into master DB
• Push shared content in the correct format to app DBs
• Build reverse channel if needed
42. Cloud Infrastructure
Servers come and go
• You do not know your servers before deploying
• Restarting is the same as introducing a new machine
You can’t hardcode IPs in database.yml
43. Solution #1
Query and manually adjust
• Servers do not change that often
• New nodes probably need manual intervention
• Use AWS Elastic IPs to ease the pain
(Diagram: set servers dynamically, AWS Elastic IP)
44. Solution #2
Use a central directory service
• A central place to manage your running instances
• Instances query the directory and react
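The directory idea can be sketched as follows: instances register themselves under a role, and the application builds its database configuration from a lookup instead of a hardcoded IP. InstanceDirectory, the "db-master" role, and the addresses are hypothetical; the directory is in-memory here, whereas a real one would be a network service.

```ruby
require 'yaml'

# In-memory stand-in for a central directory of running instances.
class InstanceDirectory
  def initialize
    @instances = {}
  end

  # An instance announces itself after it boots.
  def register(role, address)
    (@instances[role] ||= []) << address
  end

  # An instance that is shut down or replaced is removed again.
  def deregister(role, address)
    @instances.fetch(role, []).delete(address)
  end

  def lookup(role)
    @instances.fetch(role, []).dup
  end
end

# Builds the content of database.yml from the directory.
def database_yml(directory, environment: "production")
  host = directory.lookup("db-master").first
  raise "no db-master registered" if host.nil?
  YAML.dump({ environment => { "adapter" => "mysql", "database" => "app", "host" => host } })
end

directory = InstanceDirectory.new
directory.register("db-master", "10.0.0.12")
config = YAML.load(database_yml(directory))

# The master is replaced by a new machine with a new address; instances re-query and react.
directory.deregister("db-master", "10.0.0.12")
directory.register("db-master", "10.0.0.47")
new_config = YAML.load(database_yml(directory))
```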
47. Summary
Simple is better than complex
Carefully evaluate the different solutions
Only introduce a new component if you really need to
Everything has strings attached
Solving the data size problem often solves others too