Slides from my talk on data migrations in Rails applications. It covers the different situations and the possible solutions, and closes with my opinion on the best approach.
2. ● Code is never set in stone
● DB structure mutates
○ Columns/tables rename/drop
○ Switching from one relationship type to another (e.g. from “belongs to” to “has and belongs to many”, from “has many” to “has one”, etc.)
● Zero-downtime policy (production experience)
● Ton of data to migrate
● Public API exposed for other services
● NoSQL
The problem definition
4. ● No production yet
● Production without zero-downtime policy
● Production with zero-downtime policy
Different situations
5. ● No production yet
● Production without zero-downtime policy
● Production with zero-downtime policy
Different situations: the hardest case
7. class AddStatusToUser < ActiveRecord::Migration
  # Schema change only: add the column on the way up, drop it on the way down.
  def up
    add_column :users, :status, :string
  end

  def down
    remove_column :users, :status
  end
end
Tell things apart: schema migrations
8. class AddStatusToUser < ActiveRecord::Migration
  def up
    add_column :users, :status, :string
    # Data change mixed into the schema change: backfill every existing row.
    User.find_each do |user|
      user.status = 'active'
      user.save!
    end
  end
  ...
Tell things apart: data migrations
9. ● Write data migrations inside schema migrations (1)
● Write data migrations separately from schema migrations (2)
Different solutions
10. ● Write any Rails code carelessly (a)
● Redefine models and use them in place (b)
● Call migration data code written outside (seeds, services, etc.) (c)
● Raw SQL (d)
● Rake tasks (e)
Different solutions
11. |{1, 2} x {a, b, c, d, e}| = 10
Different solutions
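The count on this slide is the size of a Cartesian product: two placements of the data migration times five ways of writing it. A quick check in plain Ruby, reusing the labels from the two previous slides (the constant names are only illustrative):

```ruby
# Where the data migration lives (1, 2) and how it is written (a..e).
PLACEMENTS = [1, 2]
TECHNIQUES = %w[a b c d e]

# Every (placement, technique) pair is one candidate solution.
COMBINATIONS = PLACEMENTS.product(TECHNIQUES)

puts COMBINATIONS.size # => 10
COMBINATIONS.each { |placement, technique| puts "#{placement}#{technique}" }
```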
12. ● Do you need the migrations functioning forever?
● Is the development environment more important than production?
Pick a solution based on balance
13. ● Do you need the migrations functioning forever?
○ No, clean them up from time to time
○ Don’t run all migrations on a fresh start
○ Local/staging environments load a dump and the final schema at once
○ Obfuscate the dump if needed
● Is the development environment more important than production?
○ Obviously not, see the points above
My choice
14. class AddStatusToUser < ActiveRecord::Migration
  def up
    add_column :users, :status, :string
    # Uses the application's User model directly.
    User.find_each do |user|
      user.status = 'active'
      user.save!
    end
  end
  ...
Solution #1: Ruby code inside schema migration
15. ● Error-prone: what if someone renames the User model later?
● Not recommended
Solution #1: Ruby code inside schema migration
16. class AddStatusToUser < ActiveRecord::Migration
  # Local model, independent of the application's User class.
  class User < ActiveRecord::Base; end

  def up
    add_column :users, :status, :string
    User.find_each { |user| user.update!(status: 'active') }
  end
  ...
Solution #2: Redefine models inside migrations
17. class AddStatusToUser < ActiveRecord::Migration
  class User < ActiveRecord::Base; belongs_to :role, polymorphic: true; end
  class Role < ActiveRecord::Base; has_many :users, as: :role; end
----------------------------------------------------------
role = Role.create!(name: 'admin')
User.create!(nick: '@ka8725', role: role)
Solution #2: Redefine models inside migrations. Bug
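The slides that show the bug itself are missing from this transcript. A likely cause, stated here as an assumption, is that a model redefined inside a migration is a nested constant, so its class name includes the migration's name, and Rails derives the value stored in a polymorphic `*_type` column from the class name. The plain-Ruby sketch below uses empty stand-in classes rather than real ActiveRecord models, only to show the naming effect:

```ruby
# Stand-ins only: no ActiveRecord here, just Ruby constant nesting.
class Role; end                  # the application's real model

class AddStatusToUser            # the migration class
  class Role; end                # the model redefined inside the migration
end

puts Role.name                   # => Role
puts AddStatusToUser::Role.name  # => AddStatusToUser::Role
```

If the type column ends up holding "AddStatusToUser::Role", lookups through the application's own Role model, which expect "Role", would miss those rows.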
20. ● Much better than the previous one
● Error-prone - How to deal with tricky associations?
● Interesting bug with polymorphic associations
● Not recommended
Solution #2: Redefine models inside migrations
25. Solution #5: Rake tasks
● Define custom Rake tasks
● Run when needed
rake db_migration:fix_data
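A minimal sketch of such a task, runnable as a plain script. The task name comes from the slide; the `USERS` array is a hypothetical in-memory stand-in for the users table, and the backfill rule is only an illustration:

```ruby
require "rake"
extend Rake::DSL # makes `namespace` and `task` available in a plain script too

# Hypothetical stand-in for the users table; a real task would query the database.
USERS = [
  { nick: "@ka8725", status: nil },
  { nick: "@someone", status: "blocked" }
]

namespace :db_migration do
  desc "Backfill a default status for users that have none"
  task :fix_data do
    USERS.each { |user| user[:status] ||= "active" }
  end
end

# Equivalent to running `rake db_migration:fix_data` from the shell.
Rake::Task["db_migration:fix_data"].invoke
puts USERS.inspect
```

In a Rails application the task would normally live under lib/tasks and depend on the `:environment` task so that the models are loaded.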
26. Solution #5: Rake tasks
● Not a bad choice
● Requires some manual work
● Can be automated
● Can be developed into a solution similar to schema migrations
in Rails
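What "similar to schema migrations" could mean in practice: each data migration gets a version, applied versions are remembered, and only pending ones run, oldest first. The sketch below is a hypothetical in-memory illustration (`DataMigrator` and its methods are made-up names, not the API of any gem); a real tool would persist the applied versions in a database table:

```ruby
require "set"

# Hypothetical registry: version (timestamp) => data migration body.
class DataMigrator
  attr_reader :applied_versions

  def initialize(applied_versions = [])
    @migrations = {}
    @applied_versions = Set.new(applied_versions)
  end

  def register(version, &block)
    @migrations[version] = block
  end

  # Versions that have not run yet, oldest first.
  def pending
    (@migrations.keys - @applied_versions.to_a).sort
  end

  # Runs each pending migration once and records its version.
  def migrate
    pending.each do |version|
      @migrations.fetch(version).call
      @applied_versions << version
    end
  end
end

LOG = []
migrator = DataMigrator.new([20170101000000])
migrator.register(20170101000000) { LOG << "already applied, must not run again" }
migrator.register(20170301000000) { LOG << "normalize nicks" }
migrator.register(20170201000000) { LOG << "backfill statuses" }
migrator.migrate

puts LOG.inspect # => ["backfill statuses", "normalize nicks"]
```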
28. Not a bad solution for a start
● Define data migrations inside schema migrations
● But write tests for data migrations
● https://railsguides.net/change-data-in-migrations-like-a-boss/
● https://github.com/ka8725/migration_data
29. ● A solution similar to schema migrations, with versioning
○ https://github.com/ilyakatz/data-migrate
● Write SQL
● Schema migrations are made in several steps
○ https://blog.codeship.com/rails-migrations-zero-downtime/
● Heavy migrations (lasting hours) are split into several
background jobs scheduled at intervals
The best choice for zero-downtime production
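The last bullet above, splitting a heavy migration into background jobs spaced at intervals, can be sketched in plain Ruby. `ScheduledBatch` and `plan_batches` are hypothetical names, and the batch size and interval are arbitrary; the sketch only computes the schedule and leaves the actual enqueueing to whatever job system is in use:

```ruby
# Hypothetical job description: which ids to process and how long to wait first.
ScheduledBatch = Struct.new(:ids, :delay_seconds)

# Splits ids into fixed-size batches and staggers them by `interval` seconds,
# so the database gets a pause between chunks instead of one hours-long run.
def plan_batches(ids, batch_size:, interval:)
  ids.each_slice(batch_size).each_with_index.map do |batch, index|
    ScheduledBatch.new(batch, index * interval)
  end
end

plan = plan_batches((1..10).to_a, batch_size: 4, interval: 300)
plan.each do |batch|
  # In a real app this is where a background job would be enqueued with a delay.
  puts "ids #{batch.ids.first}..#{batch.ids.last} run after #{batch.delay_seconds}s"
end
```

With Active Job, for example, each planned batch could be enqueued through `set(wait: ...)` on a job class written for the migration.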
31. The best choice for zero-downtime production
Sort and run combined:
for local env only!
32. ● Schema migrations should be fast (<1s)
● Avoid data migrations inside schema migrations
● Data migrations run after deployment
● Follow-up actions are taken in subsequent deploys once the
data migration has run successfully
Production zero-downtime: deployment caveats