This talk from RailsConf 2019 describes how our team made deep changes to the data model of our production system over a period of 2.5 years.
Changing your data model is hard. Taking care of existing data requires caution. Exploring and testing possible solutions can be slow. Your new data model may require data completeness or correctness that hasn't been enforced for the existing data.
To manage the risk and minimize disruption to the product roadmap, we broke the effort into four stages, each with its own distinct challenges. I'll describe our rationale, process ... and the lessons we learned along the way.
21. What Drove the Change?
• Neo4j/Cypher not as familiar to developers as Postgres/SQL
• Neo4j ActiveModel gem less mature and feature-rich than
ActiveRecord
• Neo4j drivers less mature, less well optimized
• Some features required cross-database joins (slow, memory
intensive)
22. Making a Plan
• Realtor-by-realtor migration
• An importer job that would import a realtor’s Neo4j data
into Postgres
• The importer needed to avoid duplicating shared data that
had already been imported for another realtor
• We would use a feature flag to indicate whether a realtor
had been migrated or not
23. Schema Definition
• We knew our data in Neo4j was messy.
• Neo4j’s referential integrity features weaker than Postgres’
• We weren’t skilled at using the features Neo4j did have
• We got very serious about data integrity in the schema:
• foreign keys, ON DELETE CASCADE, check constraints, exclusion
constraints
• This was enormously helpful!
24. Switching Models
• The feature flag needed to be readily available everywhere, so we set a
thread-local variable in middleware.
• A lot of queries start off by calling class methods on a model class
• We needed that model class to be the ActiveRecord model if the current
realtor’s feature flag was set, and the Neo4j model otherwise
Person.find(35)
# or
Property.where(zip5: "75238")
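A minimal sketch of how such a middleware might set the flag, assuming a Rack-style stack. The class name `PostgresFlagMiddleware`, the `flag_lookup` callable, and the `"realtor_id"` env key are hypothetical; only the `:moved_to_postgres` thread variable comes from the talk.

```ruby
# Hypothetical Rack-style middleware; the names here are assumptions, not the team's code.
class PostgresFlagMiddleware
  def initialize(app, flag_lookup)
    @app = app
    @flag_lookup = flag_lookup # callable: env -> true/false
  end

  def call(env)
    Thread.current.thread_variable_set(:moved_to_postgres, @flag_lookup.call(env))
    @app.call(env)
  ensure
    # Clear the flag so a pooled thread never carries it into the next request.
    Thread.current.thread_variable_set(:moved_to_postgres, nil)
  end
end

# Stand-in app that reports which mode it saw during the request.
app = lambda do |_env|
  mode = Thread.current.thread_variable_get(:moved_to_postgres) ? "v2" : "v1"
  [200, {}, [mode]]
end

migrated_ids = [35] # sample data: realtors already moved to Postgres
stack = PostgresFlagMiddleware.new(app, ->(env) { migrated_ids.include?(env["realtor_id"]) })
stack.call({ "realtor_id" => 35 }) # => [200, {}, ["v2"]]
```

Resetting the variable in `ensure` is a design choice for thread-pooled servers, where a thread outlives a single request.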
25. Switching Models
• Exploiting Ruby’s dynamic nature, we were able to build models that
could be Neo4j or ActiveRecord models, depending on the feature flag.
class Contact
  extend SwitchingModel
  switch_between(::ContactV1, ::ContactV2)
end

class ContactV1
  include Neo4j::ActiveNode
  self.mapped_label_name = "Contact"
  # ... Neo4j::ActiveNode model code
end

class ContactV2 < ApplicationRecord
  self.table_name = :contacts
  # ... ActiveRecord model code
end
26. Switching Models
module SwitchingModel
  def switch_between(v1_model, v2_model)
    @_v1_model = v1_model
    @_v2_model = v2_model
  end

  private

  def _v2_mode?
    Thread.current.thread_variable_get(:moved_to_postgres) ||
      ENV['FORCE_V2_FEATURE_FLAG'] == '1'
  end

  def _switch
    return @_v2_model if _v2_mode?
    @_v1_model
  end
end
27. Switching Models
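module SwitchingModel
  # Forward unknown class-level calls to whichever model is active.
  def method_missing(meth, *args, &blk)
    _switch.send(meth, *args, &blk)
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(meth, include_private = false)
    _switch.respond_to?(meth, include_private) || super
  end

  def const_missing(name)
    _switch.const_get(name)
  end

  def new(*args, &blk)
    _switch.new(*args, &blk)
  end

  private
  # ...
end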
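To see the pieces work together, here is a self-contained sketch that combines the module from the slides with plain-Ruby stand-ins for the two models. The stand-in `find` methods are assumptions for illustration; the real classes were Neo4j::ActiveNode and ActiveRecord models.

```ruby
module SwitchingModel
  def switch_between(v1_model, v2_model)
    @_v1_model = v1_model
    @_v2_model = v2_model
  end

  def method_missing(meth, *args, &blk)
    _switch.send(meth, *args, &blk)
  end

  def respond_to_missing?(meth, include_private = false)
    _switch.respond_to?(meth, include_private) || super
  end

  def new(*args, &blk)
    _switch.new(*args, &blk)
  end

  private

  def _v2_mode?
    Thread.current.thread_variable_get(:moved_to_postgres) ||
      ENV["FORCE_V2_FEATURE_FLAG"] == "1"
  end

  def _switch
    _v2_mode? ? @_v2_model : @_v1_model
  end
end

# Plain-Ruby stand-ins for the Neo4j and ActiveRecord models (hypothetical).
class ContactV1
  def self.find(id)
    "neo4j contact #{id}"
  end
end

class ContactV2
  def self.find(id)
    "postgres contact #{id}"
  end
end

class Contact
  extend SwitchingModel
  switch_between(ContactV1, ContactV2)
end

Contact.find(35) # => "neo4j contact 35" while the flag is unset
Thread.current.thread_variable_set(:moved_to_postgres, true)
Contact.find(35) # => "postgres contact 35"
Contact.new.class # => ContactV2
Thread.current.thread_variable_set(:moved_to_postgres, nil)
```

Because `Contact` itself defines no `find`, the call falls through to `method_missing` and is forwarded to whichever model the flag selects at that moment.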
28. Scopes and More Scopes
• A lot of queries contained Cypher fragments
• Converting those to scopes allowed controllers to use
the same queries, whether the feature flag was set or not
• Built a rich vocabulary of scopes that has served us well
ever since
29. Testing
• Environment variable override of feature flag
• Rake tasks for running two sets of specs
• Separate sets of factories
• CI running both sets
• Lots of comparison testing by developers
• Whole company QA swarm in staging
30. Tracking Progress
• Excellent advice from Jess Martin, our CTO
• Added an RSpec custom formatter to output total number of v2 specs vs.
number of passing v2 specs.
• Those went into a spreadsheet with a chart.
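A rough sketch of what such a formatter could look like. It follows RSpec's formatter protocol (`example_passed`, `example_failed`, `dump_summary`), but the class name and the `:v2` metadata tag used to recognise v2 specs are assumptions; the talk does not say how v2 specs were identified.

```ruby
# Hypothetical progress formatter; counts only examples tagged with :v2 metadata.
class V2ProgressFormatter
  # With the rspec gem loaded, register for the events handled below:
  #   RSpec::Core::Formatters.register self, :example_passed, :example_failed, :dump_summary

  def initialize(output)
    @output = output
    @total = 0
    @passed = 0
  end

  def example_passed(notification)
    return unless v2?(notification)

    @total += 1
    @passed += 1
  end

  def example_failed(notification)
    @total += 1 if v2?(notification)
  end

  def dump_summary(_summary)
    @output.puts "v2 specs: #{@passed} passing of #{@total}"
  end

  private

  # Assumption: v2 specs carry a :v2 metadata tag.
  def v2?(notification)
    notification.example.metadata[:v2]
  end
end
```

Pending examples are ignored in this sketch. With the registration line enabled, the formatter could be selected with `rspec --require ./v2_progress_formatter.rb --format V2ProgressFormatter` (the file name is hypothetical).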
31. Executing
• Select employees first (those not doing sales and demos)
• Rest of employees
• Friendly customers (who would inform us of issues)
• Rest of active customers
• The whole process took about three weeks
32. Finishing the Job
• After the initial round of employee and select customer migrations, we
kicked off the first full batch of customers.
• All of a sudden, I had nothing to do!
• “I may as well start on the PR to rip out all the V1 and transitional code …”
• 10 hours later:
35. What Drove the Change?
• Postgres UUID primary keys work just fine.
• Harder to remember, diff, type
• Didn’t become an issue until we needed to start tracking source
info for a different table that had an integer primary key.
• We track sources using a polymorphic join table
(sourcings).
36. A Spike
contact_names (primary key: id)
id    first    …
abc   Joe
def   Susan
ghi   Rachel
jkl   Todd
mno   Melanie
37. A Spike
contact_names (primary key: id)
id    first    …   integer_id
abc   Joe          1
def   Susan        2
ghi   Rachel       3
jkl   Todd         4
mno   Melanie      5
38. A Spike
contact_names (no primary key shown)
uuid  first    …   integer_id
abc   Joe          1
def   Susan        2
ghi   Rachel       3
jkl   Todd         4
mno   Melanie      5
39. A Spike
contact_names (primary key: id)
uuid  first    …   id
abc   Joe          1
def   Susan        2
ghi   Rachel       3
jkl   Todd         4
mno   Melanie      5
49. Problem: Polymorphic Tables
• Remember, this started because of a polymorphic join table,
sourcings
• Required converting all tables referenced by the polymorphic
table at once
• Ended up with 5 separate clusters of tables.
• Wrote migration helpers to manage the details and make
things reversible.
50. Discovering Constraints
• You can query anything about the schema from a set of internal tables and
views
• Example: finding all foreign key references to the contacts table:
SELECT *
FROM information_schema.constraint_column_usage
WHERE table_name = 'contacts'
  AND column_name = 'id'
  AND constraint_name <> 'contacts_pkey'
• You can do similar things for indexes and other kinds of constraints
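As an illustration of how a migration helper might build these reflection queries, here is a small sketch. The method names and the identifier check are hypothetical, and it assumes the primary key constraint uses the default `<table>_pkey` name, as `contacts_pkey` does in the slide. The index query reads from Postgres' `pg_indexes` view.

```ruby
# Hypothetical query builders; only the first query's shape comes from the slide.
IDENTIFIER = /\A[a-z_][a-z0-9_]*\z/

def check_identifier!(name)
  raise ArgumentError, "bad identifier: #{name}" unless name.match?(IDENTIFIER)
end

# Foreign key constraints that reference table.column.
# Assumes the primary key constraint has the default name "<table>_pkey".
def foreign_key_references_sql(table, column: "id")
  check_identifier!(table)
  check_identifier!(column)
  <<~SQL
    SELECT *
    FROM information_schema.constraint_column_usage
    WHERE table_name = '#{table}'
      AND column_name = '#{column}'
      AND constraint_name <> '#{table}_pkey'
  SQL
end

# Index names and definitions for a table, from the pg_indexes view.
def indexes_sql(table)
  check_identifier!(table)
  <<~SQL
    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename = '#{table}'
  SQL
end

puts foreign_key_references_sql("contacts")
```

The identifier check exists only because the table name is interpolated into the SQL string.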
51. Plan and Wait
• Output of the spike:
• 3 complex migration helpers
• 5 migrations
• Ended up waiting 5 months before the pain outweighed the risk
52. Five Big Migrations
• Simple case easy:
fix_uuid_primary_key :contact_names
• Harder cases not so easy:
fix_uuid_primary_key :avatars
fix_uuid_primary_key :properties
fix_uuid_foreign_key :properties, :property_notes, on_delete: :cascade
fix_uuid_polymorphic_association :sourcings,
                                 :sourceable,
                                 targets: [:avatars, :properties]
• Worst case: 3 primary keys, 28 foreign keys, 4 polymorphic tables … all in
one migration.
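The talk does not show the helpers themselves. As a guess at the simple case only, the column shuffle from the spike slides (add `integer_id`, rename `id` to `uuid`, rename `integer_id` to `id`) could be expressed as the following statements. The method name is hypothetical, and the sketch assumes a default-named primary key constraint and no foreign keys pointing at the table.

```ruby
# Hypothetical sketch of the SQL behind a helper like fix_uuid_primary_key,
# for a table with no incoming foreign keys.
def uuid_to_integer_pk_statements(table)
  [
    # A serial column backfills existing rows with 1, 2, 3, ...
    "ALTER TABLE #{table} ADD COLUMN integer_id BIGSERIAL",
    # Assumes the default constraint name "<table>_pkey".
    "ALTER TABLE #{table} DROP CONSTRAINT #{table}_pkey",
    # Keep the old key around under a new name.
    "ALTER TABLE #{table} RENAME COLUMN id TO uuid",
    "ALTER TABLE #{table} RENAME COLUMN integer_id TO id",
    "ALTER TABLE #{table} ADD PRIMARY KEY (id)"
  ]
end

puts uuid_to_integer_pk_statements("contact_names")
```

The harder cases (foreign keys and polymorphic associations) would also need every referencing column rewritten, which is where the reflection queries come in.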
53. From Spike to Solution
• Careful review of the migrations and helpers
• Ran the migrations many, many times on clone of production DB
• Run, fix error, repeat. (Very thankful for Postgres transactional DDL!)
• Fixing error usually meant figuring out how to reflect on some new kind of
dependency in Postgres and update the helper to deal with it.
• Sometimes meant just coding a workaround for an odd case.
54. Being Careful
• Ran the migrations in staging for timings
• We had the luxury of downtime!
• But we wanted to understand how long each maintenance window would be.
• Made them reversible!
• We planned for never having to reverse them, including careful testing and
random spot-checks in the migrations.
• But we also made sure they could be reversed (including round-trip testing of
both schema and table contents).
55. Being Careful
• Build correctness checking into the migration helpers
• Remember: we kept the uuid column
• At start of change: store random sample of records
• After change: find those records and ensure they still refer to same UUID
• Finally deployed on five consecutive weekends (simplest first)
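The sampling check can be sketched in plain Ruby. Rows are modelled as hashes here and the method names are hypothetical; the real helpers ran against the database inside the migration. Before the change, a child row references its parent by UUID. Afterwards it references the parent's new integer id, and the parent's preserved `uuid` column lets us confirm it is still the same record.

```ruby
# Before the change: remember which parent UUID each sampled child points at.
def take_sample(children_before, size, random: Random.new)
  children_before.sample(size, random: random).map do |row|
    { child_uuid: row[:id], parent_uuid: row[:parent_id] }
  end
end

# After the change: follow the new integer key and compare the parent's kept uuid.
def verify_sample!(sample, children_after, parents_after)
  children = children_after.to_h { |row| [row[:uuid], row] }
  parents = parents_after.to_h { |row| [row[:id], row] }
  sample.each do |expected|
    child = children.fetch(expected[:child_uuid])
    parent = parents.fetch(child[:parent_id])
    unless parent[:uuid] == expected[:parent_uuid]
      raise "#{expected[:child_uuid]} now points at #{parent[:uuid]}"
    end
  end
  true
end

# Sample data (made up): property notes referencing properties.
notes_before = [
  { id: "n-1", parent_id: "p-abc" },
  { id: "n-2", parent_id: "p-def" }
]
sample = take_sample(notes_before, 2)

properties_after = [{ id: 1, uuid: "p-abc" }, { id: 2, uuid: "p-def" }]
notes_after = [
  { id: 1, uuid: "n-1", parent_id: 1 },
  { id: 2, uuid: "n-2", parent_id: 2 }
]
verify_sample!(sample, notes_after, properties_after) # => true
```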
59. What Drove the Change?
• Nearly every query had to be filtered based on source
• Extra complexity
• Joining through polymorphic table was costly
• Sooner or later we would miss it and violate data privacy
60. A Spike
• Doing this in Ruby was fairly straightforward … but very slow (about a day
per realtor)
• Doing it in SQL required fairly advanced skills … but took about ten
minutes per realtor
• As with stage 1, decided on a user-by-user approach
63. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1; attributes Name: Nancy and Email: nancy@example.com (contact_id: 1)]
• First, add old_contact_id column to contact_relationships
• Populate it with current value of contact_id
64. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: NULL, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1)]
• Next, add contact_relationship_id column to contacts
• Populate it with NULL
• Add uniqueness constraint for that column
65. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: NULL, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1)]
66. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: 1, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1)]
• Update contact_relationship_id IF it’s NULL
UPDATE contacts
SET contact_relationship_id = contact_relationships.id
FROM contact_relationships
WHERE contacts.id = contact_relationships.contact_id
  AND contact_relationships.realtor_id = 1
  AND contacts.contact_relationship_id IS NULL
67. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: 1, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1)]
• Now INSERT into contacts for each of Alice’s contact_relationships
• ON CONFLICT just set updated_at on the existing one
• and then UPDATE contact_relationships to point to the new contact records
68. INSERT with ON CONFLICT
WITH new_contacts AS (
  INSERT INTO contacts (contact_relationship_id, created_at, updated_at)
  (
    SELECT cr.id AS cr_id, cr.created_at, cr.updated_at
    FROM contacts
    INNER JOIN contact_relationships cr
      ON contacts.id = cr.contact_id
    WHERE cr.realtor_id = 1
      AND contacts.contact_relationship_id IS NULL
    ORDER BY cr_id ASC
  )
  ON CONFLICT ON CONSTRAINT contact_relationship_id_uniqueness
  DO UPDATE SET updated_at = EXCLUDED.updated_at
  RETURNING *)
UPDATE contact_relationships
SET contact_id = new_contacts.id
FROM new_contacts
WHERE contact_relationships.id = new_contacts.contact_relationship_id;
74. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: 1, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1); a would-be new contact (id: 1001, contact_relationship_id: 1) marked with an X]
• INSERT into contacts for each of Alice’s contact_relationships
• ON CONFLICT just set updated_at on the existing one
• and then UPDATE contact_relationships to point to the new contact records
75. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: 1, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1)]
• That time, nothing happened, because Alice was the first realtor for contact 1.
76. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: 1, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1)]
• Now let’s try Bill.
77. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: 1, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1); the attempted update is marked with an X]
• We try to claim the contact for Bill by updating
contacts.contact_relationship_id
• But it isn’t NULL, so we don’t update it to 2
78. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: 1, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1); new contact (id: 1001, contact_relationship_id: 2)]
• But the INSERT works because it doesn’t create a uniqueness violation
79. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: 1, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1); new contact (id: 1001, contact_relationship_id: 2)]
• And then the UPDATE fixes up the contact_relationships record
• But what about the attached attributes?
80. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: 1, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1); new contact (id: 1001, contact_relationship_id: 2)]
• For each of Bill’s contacts where
contact_relationships.old_contact_id != contacts.id,
go copy all of the attached attributes from old_contact_id
81. The Strategy
[Diagram: realtors Alice, Bill, Carl; contact_relationships 1, 2, 3 (each old_contact_id: 1); contact 1 (contact_relationship_id: 1, with a uniqueness constraint on contact_relationship_id); attributes Name: Nancy and Email: nancy@example.com (contact_id: 1); new contact (id: 1001, contact_relationship_id: 2) with copied attributes Email: nancy@example.com and Name: Nancy (contact_id: 2)]
• For each of Bill’s contacts where
contact_relationships.old_contact_id != contacts.id,
go copy all of the attached attributes from old_contact_id
• A lot of queries, but basically straightforward
• Then move on to Carl
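The whole walk-through can be replayed with an in-memory simulation. This is only a sketch of the logic, not the SQL that was run: table and column names follow the slides, while the `Db` struct, the method name, and the assumption that relationship 3 belongs to Carl are made up for illustration.

```ruby
# In-memory stand-in for the three tables plus an id counter for new contacts.
Db = Struct.new(:contacts, :contact_relationships, :attributes, :next_contact_id)

def split_contacts_for(db, realtor)
  db.contact_relationships.select { |cr| cr[:realtor] == realtor }.each do |cr|
    contact = db.contacts.find { |c| c[:id] == cr[:contact_id] }

    # Claim the shared contact if nobody owns it yet (the UPDATE ... IS NULL step).
    contact[:contact_relationship_id] ||= cr[:id]
    next if contact[:contact_relationship_id] == cr[:id]

    # Someone else owns it: create a private copy and repoint the relationship.
    copy = { id: db.next_contact_id, contact_relationship_id: cr[:id] }
    db.next_contact_id += 1
    db.contacts << copy
    cr[:contact_id] = copy[:id]

    # Copy the attached attributes from the old shared contact.
    db.attributes
      .select { |a| a[:contact_id] == cr[:old_contact_id] }
      .each { |a| db.attributes << a.merge(contact_id: copy[:id]) }
  end
end

db = Db.new(
  [{ id: 1, contact_relationship_id: nil }],
  [
    { id: 1, realtor: "Alice", contact_id: 1, old_contact_id: 1 },
    { id: 2, realtor: "Bill",  contact_id: 1, old_contact_id: 1 },
    { id: 3, realtor: "Carl",  contact_id: 1, old_contact_id: 1 }
  ],
  [
    { name: "Name",  value: "Nancy",             contact_id: 1 },
    { name: "Email", value: "nancy@example.com", contact_id: 1 }
  ],
  1001
)

%w[Alice Bill Carl].each { |realtor| split_contacts_for(db, realtor) }
db.contact_relationships.map { |cr| cr[:contact_id] } # => [1, 1001, 1002]
db.attributes.size                                    # => 6
```

Alice keeps the original contact because she claims it first. Bill and Carl each receive a private copy with its own attributes.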
82. Being Careful
• Again: ran these transformations against a clone of production
• Run for a realtor, compare against that realtor’s production data
• Complete run-through of all realtors in staging before moving on to
production
• During run-through, I plotted changes to table counts as a sanity check
89. What Drove the Change?
• Everything’s just a little more complex with the join table
• Requires constraints and integrity checks that wouldn’t be
necessary without it
• Another team member challenged me to get rid of it!
• It really wasn’t causing us enough trouble to justify a big push
• But I realized we could set this up to do opportunistically
90. The Idea
• Go ahead and add the direct contacts.realtor_id foreign key
• Populate it to match the existing contact_relationships.
• Then just make sure they stay consistent!
91. Triggers
• Rails developers are wary of stored procedures and triggers (for good
reason)
• But sometimes they’re exactly what you need. This is one of those times.
• I had a lot of ignorance to overcome.
• So I worked on a spike, curling up with the Postgres manual and
experimenting …
96. Triggers Are Difficult
(for me, anyway)
• For efficiency, control the conditions under which invoked
• For correctness, decide before/after
• Carefully write updates/inserts to only make changes if
things are inconsistent
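To make those three points concrete, here is a hypothetical trigger definition held in a Ruby constant, as it might be passed to `execute` in a migration. The function and trigger names are invented, and the talk's actual triggers are not shown; this sketch only keeps contacts.realtor_id in step with contact_relationships in one direction.

```ruby
# Hypothetical trigger SQL; names are assumptions, not the team's actual code.
SYNC_CONTACT_REALTOR_SQL = <<~'SQL'
  CREATE FUNCTION sync_contact_realtor() RETURNS trigger AS $$
  BEGIN
    -- Only touch the row when it is actually out of sync.
    UPDATE contacts
       SET realtor_id = NEW.realtor_id
     WHERE id = NEW.contact_id
       AND realtor_id IS DISTINCT FROM NEW.realtor_id;
    RETURN NULL; -- the return value is ignored for AFTER row triggers
  END;
  $$ LANGUAGE plpgsql;

  -- AFTER: the contact_relationships row is already in place when this runs.
  -- UPDATE OF: fire only when one of the relevant columns is updated.
  CREATE TRIGGER contact_relationships_sync_realtor
  AFTER INSERT OR UPDATE OF realtor_id, contact_id ON contact_relationships
  FOR EACH ROW EXECUTE PROCEDURE sync_contact_realtor();
SQL

puts SYNC_CONTACT_REALTOR_SQL
```

`IS DISTINCT FROM` treats NULL as a comparable value, so a contact whose realtor_id is still NULL gets populated while an already consistent row is left alone.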
97. The Plan: A 12-Step Epic
• Step 1: build a way to track progress
• Step 2: build a way to audit the activity of the triggers
• Step 3: add contacts.realtor_id and triggers
• Steps 4–6: move fields from contact_relationships to contacts
• Steps 7–8: retargeting polymorphic associations
• Steps 9–11: retargeting associations, scopes, and query fragments
• Step 12: DROP TABLE contact_relationships
98. Tracking Progress
rg --count --ignore-file .rg_crprogress_ignore '[Cc]ontact_?[Rr]elationship' |
  cut -d : -f 2 |
  sed '2,$s/$/+/; $s/$/p/' |
  dc
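The pipeline sums the per-file match counts that `rg --count` prints as `path:count` lines. For readers less familiar with `sed` and `dc`, the same total could be computed with a few lines of Ruby. The method name and sample paths are made up.

```ruby
# Sum the counts from `rg --count` output, one "path:count" entry per line.
def remaining_references(rg_count_output)
  rg_count_output.each_line.sum { |line| line.strip.split(":").last.to_i }
end

sample_output = <<~OUT
  app/models/contact.rb:3
  app/controllers/contacts_controller.rb:12
OUT

remaining_references(sample_output) # => 15
```

As remaining references to contact_relationships are removed from the code, this number falls toward zero.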
99. Auditing Trigger Activity
• Updated the triggers to log behavior to new
contact_relationship_trigger_actions table.
• Utility script to audit this table for consistency occasionally.
id       action        contact_relationship_id  contact_id  performed_update  time
4995810  c_setrealtor                           5622768     FALSE             2019-02-18 13:31:49.395671
4995811  cr_insert     10607228                 5622768     TRUE              2019-02-18 13:31:49.395671
4995812  c_setrealtor                           5622769     FALSE             2019-02-18 13:31:50.181528
4995813  cr_insert     10607230                 5622769     TRUE              2019-02-18 13:31:50.181528
4995814  c_setrealtor                           5622770     FALSE             2019-02-18 13:31:50.474147
100. Executing
• One step a week
• Each took 8–10 hours, on average
• Most deployments on weekends, even when
no downtime required
103. Slow and Steady
• Incremental, “worst pain first” strategy
• Contained risk
• Enabled feature development
• Produced enormous technical improvement over time
104. Keep Looking Ahead
• We were always looking for ways to improve the system
• An “inventory of pain” helps you to identify which pain is the worst right
now
105. Each Stage Was Different!
• Entirely different, creative solutions required at each step
• Ruby magic
• Migrations and database reflections
• Fancy Postgres UPSERT (i.e., INSERT … ON CONFLICT) queries and CTEs
• Triggers
• Entirely different testing strategies, too.
• There is no recipe. Find what works.
106. Leverage Your Database
• We Rails developers love ActiveRecord and Arel for queries.
• But for all its problems, SQL is powerful.
• Data and referential integrity protections can save you.
• Without Postgres’ transactional DDL, the risk and effort would have been
enormously greater. (I’d guess roughly tenfold.)
• Stored procedures and triggers have their place.
107. The Luxury of Downtime
• We have the luxury of being able to schedule maintenance time.
• If you can, do that.
• If not, you have to explore other techniques. (It’s worth bringing in an
experienced database consultant if you need to explore these.)
108. Focus: The Two-Edged Sword
• These kinds of tasks really benefit from intense focus.
• But that kind of focus can keep you from seeing danger.
• Make sure you come up for air and have someone looking over your
shoulder.
109. What Would We Do Differently?
• If we had clearly understood our end goal, we could have done all of this
in stage 1.
• But we still thought we were building a social graph.
• You can never be sure you understand the future of your business.
110. What Would We Do Differently?
• There is one mistake we could have avoided based on technical principles.
• We should never have used UUID primary keys.
• They are useful only if you need to distribute primary key creation.
• Probably where contention on the primary key sequence is a bottleneck.
• Maybe also when you need to provide a key with less latency than a DB
round trip.
• THAT’S IT.