The 30-Month Migration

Glenn Vanderburg

VP of Engineering, First.io

@glv

Changing Your Data Model 
Is Hard!

Living With a Poor Data Model 
Is Also Hard!

Four Stages, 2½ Years
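[Timeline figure: Stage 1, Stage 2, Stage 3, and Stage 4 laid out across 2016, 2017, 2018, and 2019, with marked dates 15 Nov, 9 Feb, 20 Jun, 18 Nov, 15 Jan, 26 Jan, 20 Mar, and 7 Dec, ending at "Today".]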

30 months!

A Technical Talk, 
with Mostly Non-Technical Lessons

Three Principles
• Validation

• Reversibility

• Transparency

Introduction: 
Our Big Mistakes*
* So far.

[Diagram: two realtors and their contacts. Realtor Abby: Isaac, Jane, Kathy, Lee, Mike, Nancy. Realtor Bill: Oscar, Pat, Quentin, Robert, Sally, Tina.]

Things are the way they are
because they got that way.
–Gerald Weinberg

[Diagram: realtors Abby and Bill live in Postgres and are mirrored in Neo4j as Realtor′ nodes. In Neo4j, the Realtor′ nodes connect through CR nodes (16 in total) to the contacts Isaac, Jane, Kathy, Lee, Mike, Nancy, Oscar, Pat, Quentin, Robert, Sally, and Tina.]

[Diagram: Postgres holds the realtors table (Abby, Bill) and other tables: subscriptions, payments, notes, appointments, etc. Neo4j holds Realtors′ (Abby, Bill), ContactRelationships, and Contacts (Isaac, Jane, Kathy, Lee, Mike, Nancy, Oscar, Pat, Quentin, Robert, Sally, Tina), plus many other relationships between contacts, and between contacts and their attributes.]

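[Diagram: everything in Postgres, with a realtors table (Abby, Bill) and a contacts table in which each realtor has a separate set of contacts. Abby: Isaac, Jane, Kathy, Lee, Mike, Nancy, Quentin, Sally, … Bill: Kathy, Nancy, Oscar, Pat, Quentin, Robert, Sally, Tina.]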

Stage 1: 
From Neo4j to Postgres

What Drove the Change?
• Neo4j/Cypher not as familiar to developers as Postgres/SQL

• Neo4j ActiveModel gem less mature and feature-rich than
ActiveRecord

• Neo4j drivers less mature, less well optimized

• Some features required cross-database joins (slow, memory
intensive)

Making a Plan
• Realtor-by-realtor migration

• An importer job that would import a realtor’s Neo4j data
into Postgres

• The importer needed to avoid duplicating shared data that
had already been imported for another realtor

• We would use a feature flag to indicate whether a realtor
had been migrated or not

Schema Definition
• We knew our data in Neo4j was messy.

• Neo4j’s referential integrity features weaker than Postgres’

• We weren’t skilled at using the features Neo4j did have

• We got very serious about data integrity in the schema:

• foreign keys, ON DELETE CASCADE, check constraints, exclusion
constraints

• This was enormously helpful!

Switching Models
• The feature flag needed to be readily available everywhere, so we set a
thread-local variable in middleware.

• A lot of queries start off by calling class methods on a model class:

    Person.find(35)
    # or
    Property.where(zip5: "75238")

• We needed that model class to be the ActiveRecord model if the current
realtor’s feature flag was set, and the Neo4j model otherwise
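Setting the flag once per request might look roughly like the sketch below. This is only an illustration: the class name `MigrationFlagMiddleware` and the `"current_realtor"` env key are assumptions, not taken from the talk. Only the thread-local variable name `:moved_to_postgres` matches the code shown on the later slides.

```ruby
# Hypothetical Rack-style middleware. The class name and the
# "current_realtor" env key are assumptions made for illustration.
class MigrationFlagMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    realtor = env["current_realtor"]
    migrated = !realtor.nil? && realtor[:moved_to_postgres] == true
    # Make the flag visible to model code for the duration of this request.
    Thread.current.thread_variable_set(:moved_to_postgres, migrated)
    @app.call(env)
  ensure
    # Clear the flag so a reused server thread does not carry one
    # realtor's state into the next request.
    Thread.current.thread_variable_set(:moved_to_postgres, nil)
  end
end
```

Clearing the variable in `ensure` matters on threaded servers, where the same thread serves many realtors.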

Switching Models
• Exploiting Ruby’s dynamic nature, we were able to build models that
could be Neo4j or ActiveRecord models, depending on the feature flag.

    class Contact
      extend SwitchingModel
      switch_between(::ContactV1, ::ContactV2)
    end

    class ContactV1
      include Neo4j::ActiveNode
      self.mapped_label_name = "Contact"
      # ... Neo4j::ActiveNode model code
    end

    class ContactV2 < ApplicationRecord
      self.table_name = :contacts
      # ... ActiveRecord model code
    end

Switching Models
    module SwitchingModel
      def switch_between(v1_model, v2_model)
        @_v1_model = v1_model
        @_v2_model = v2_model
      end

      private

      # True when the current realtor has been migrated, or when the
      # environment variable forces V2 (used by the test suite).
      def _v2_mode?
        Thread.current.thread_variable_get(:moved_to_postgres) ||
          ENV['FORCE_V2_FEATURE_FLAG'] == '1'
      end

      def _switch
        return @_v2_model if _v2_mode?
        @_v1_model
      end
    end

Switching Models
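    module SwitchingModel
      # Forward any class-level call to whichever model is active.
      def method_missing(meth, *args, &blk)
        _switch.send(meth, *args, &blk)
      end

      # Keep respond_to? consistent with method_missing.
      def respond_to_missing?(meth, include_private = false)
        _switch.respond_to?(meth, include_private) || super
      end

      def const_missing(name)
        _switch.const_get(name)
      end

      def new(*args, &blk)
        _switch.new(*args, &blk)
      end

      private
      # ...
    end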
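To see the dispatch in isolation, here is a minimal, self-contained sketch that swaps the real Neo4j and ActiveRecord classes for plain Ruby stand-ins. The `find` bodies and their return strings are invented for the demonstration, and the environment-variable override is left out for brevity.

```ruby
module SwitchingModel
  def switch_between(v1_model, v2_model)
    @_v1_model = v1_model
    @_v2_model = v2_model
  end

  # Forward unknown class-level calls to the active model.
  def method_missing(meth, *args, &blk)
    _switch.public_send(meth, *args, &blk)
  end

  def respond_to_missing?(meth, include_private = false)
    _switch.respond_to?(meth, include_private) || super
  end

  private

  def _switch
    Thread.current.thread_variable_get(:moved_to_postgres) ? @_v2_model : @_v1_model
  end
end

# Stand-ins for the Neo4j and ActiveRecord models (hypothetical bodies).
class ContactV1
  def self.find(id)
    "neo4j contact #{id}"
  end
end

class ContactV2
  def self.find(id)
    "postgres contact #{id}"
  end
end

class Contact
  extend SwitchingModel
  switch_between(ContactV1, ContactV2)
end
```

With the thread variable unset, `Contact.find(35)` goes to `ContactV1`; once it is set to `true`, the same call goes to `ContactV2`, so calling code does not need to know which backend is active.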

Scopes and More Scopes
• A lot of queries contained Cypher fragments

• Converting those to scopes allowed controllers to use
the same queries, whether the feature flag was set or not

• Built a rich vocabulary of scopes that has served us well
ever since

Testing
• Environment variable override of feature flag

• Rake tasks for running two sets of specs

• Separate sets of factories

• CI running both sets

• Lots of comparison testing by developers

• Whole company QA swarm in staging

Tracking Progress
• Excellent advice from Jess Martin, our CTO

• Added an RSpec custom formatter to output total number of v2 specs vs.
number of passing v2 specs.

• Those went into a spreadsheet with a chart:

Executing
• Select employees first (those not doing sales and demos)

• Rest of employees

• Friendly customers (who would inform us of issues)

• Rest of active customers

• The whole process took about three weeks

Finishing the Job
• After the initial round of employee and select customer migrations, we
kicked off the first full batch of customers.
• All of a sudden, I had nothing to do!
• “I may as well start on the PR to rip out all the V1 and transitional code …”
• 10 hours later:

[Diagram: Postgres now holds realtors (Abby, Bill), contact_relationships, and contacts (Isaac, Jane, Kathy, Lee, Mike, Nancy, Oscar, Pat, Quentin, Robert, Sally, Tina).]

Stage 2: 
Change Primary Keys to Integers

• Postgres UUID primary keys work just fine.

• Harder to remember, vdiff, type

• Didn’t become an issue until we needed to start tracking source
info for a different table that had an integer primary key.

• We track sources using a polymorphic join table
(sourcings).

A Spike
contact_names
⭐ id ⭐   first     …
abc       Joe
def       Susan
ghi       Rachel
jkl       Todd
mno       Melanie

A Spike
contact_names
⭐ id ⭐   first     …   integer_id
abc       Joe           1
def       Susan         2
ghi       Rachel        3
jkl       Todd          4
mno       Melanie       5

A Spike
contact_names
uuid      first     …   integer_id
abc       Joe           1
def       Susan         2
ghi       Rachel        3
jkl       Todd          4
mno       Melanie       5

A Spike
contact_names
uuid      first     …   ⭐ id ⭐
abc       Joe           1
def       Susan         2
ghi       Rachel        3
jkl       Todd          4
mno       Melanie       5
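The four slides above can be read as a short sequence of DDL statements. The sketch below generates them for one table. It is an assumption-laden illustration, not the speaker's helper: the method name is hypothetical, it assumes Postgres' default `<table>_pkey` constraint name, and it assumes no foreign keys reference the table yet.

```ruby
# Hypothetical helper: returns, in order, the SQL that swaps a UUID
# primary key for an integer one while keeping the old values in "uuid".
def uuid_to_integer_pk_sql(table)
  [
    # bigserial creates a sequence and fills existing rows from it.
    "ALTER TABLE #{table} ADD COLUMN integer_id bigserial",
    # Assumes the default primary key constraint name.
    "ALTER TABLE #{table} DROP CONSTRAINT #{table}_pkey",
    "ALTER TABLE #{table} RENAME COLUMN id TO uuid",
    "ALTER TABLE #{table} RENAME COLUMN integer_id TO id",
    "ALTER TABLE #{table} ADD PRIMARY KEY (id)"
  ]
end

puts uuid_to_integer_pk_sql("contact_names")
```

Because Postgres DDL is transactional, all five statements can run inside one transaction and roll back together on failure.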

Problem: Foreign Key References
properties          property_notes
⭐ id ⭐   …         ⭐ property_id ⭐   …
abc                 jkl
def                 def
ghi                 abc
jkl                 mno
mno                 ghi

properties                      property_notes
⭐ id ⭐   …   integer_id        ⭐ property_id ⭐   …
abc           1                 jkl
def           2                 def
ghi           3                 abc
jkl           4                 mno
mno           5                 ghi

properties                      property_notes
⭐ id ⭐   …   integer_id        ⭐ property_id ⭐   …   int_property_id
abc           1                 jkl                    4
def           2                 def                    2
ghi           3                 abc                    1
jkl           4                 mno                    5
mno           5                 ghi                    3

properties                      property_notes
id        …   integer_id        property_id       …   int_property_id
abc           1                 jkl                    4
def           2                 def                    2
ghi           3                 abc                    1
jkl           4                 mno                    5
mno           5                 ghi                    3

properties                      property_notes
uuid      …   integer_id        property_id       …   int_property_id
abc           1                 jkl                    4
def           2                 def                    2
ghi           3                 abc                    1
jkl           4                 mno                    5
mno           5                 ghi                    3

properties                      property_notes
uuid      …   integer_id        …   int_property_id
abc           1                     4
def           2                     2
ghi           3                     1
jkl           4                     5
mno           5                     3

properties                      property_notes
uuid      …   id                …   int_property_id
abc           1                     4
def           2                     2
ghi           3                     1
jkl           4                     5
mno           5                     3

properties                      property_notes
uuid      …   id                …   property_id
abc           1                     4
def           2                     2
ghi           3                     1
jkl           4                     5
mno           5                     3

properties                      property_notes
uuid      …   ⭐ id ⭐           …   ⭐ property_id ⭐
abc           1                     4
def           2                     2
ghi           3                     1
jkl           4                     5
mno           5                     3
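Read as SQL, the sequence of tables above corresponds roughly to the statements generated below. This is a sketch under assumptions: the method name and the `fk_constraint` parameter are hypothetical, the parent's primary key is assumed to use the default `<table>_pkey` name, and NOT NULL handling, indexes, and `ON DELETE` options are left out.

```ruby
# Hypothetical helper: SQL to convert one parent table's UUID primary key
# and one child table's foreign key to integers, in the order of the slides.
def uuid_fk_to_integer_sql(parent:, child:, fk:, fk_constraint:)
  [
    # Add integer columns on both sides and fill the child's by joining
    # on the still-present UUID key.
    "ALTER TABLE #{parent} ADD COLUMN integer_id bigserial",
    "ALTER TABLE #{child} ADD COLUMN int_#{fk} bigint",
    "UPDATE #{child} SET int_#{fk} = #{parent}.integer_id " \
      "FROM #{parent} WHERE #{child}.#{fk} = #{parent}.id",
    # Drop the constraints (foreign key first, then the primary key).
    "ALTER TABLE #{child} DROP CONSTRAINT #{fk_constraint}",
    "ALTER TABLE #{parent} DROP CONSTRAINT #{parent}_pkey",
    # Keep the old key as "uuid", drop the UUID reference, rename.
    "ALTER TABLE #{parent} RENAME COLUMN id TO uuid",
    "ALTER TABLE #{child} DROP COLUMN #{fk}",
    "ALTER TABLE #{parent} RENAME COLUMN integer_id TO id",
    "ALTER TABLE #{child} RENAME COLUMN int_#{fk} TO #{fk}",
    # Restore the constraints on the integer columns.
    "ALTER TABLE #{parent} ADD PRIMARY KEY (id)",
    "ALTER TABLE #{child} ADD CONSTRAINT #{fk_constraint} " \
      "FOREIGN KEY (#{fk}) REFERENCES #{parent} (id)"
  ]
end

puts uuid_fk_to_integer_sql(parent: "properties", child: "property_notes",
                            fk: "property_id",
                            fk_constraint: "fk_property_notes_property")
```

The key point is the ordering: the integer foreign key is populated while the UUID columns still exist, so the mapping is never lost.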

Problem: Polymorphic Tables
• Remember, this started because of a polymorphic join table,
sourcings

• Required converting all tables referenced by the polymorphic
table at once

• Ended up with 5 separate clusters of tables.

• Wrote migration helpers to manage the details and make
things reversible.

Discovering Constraints
• You can query anything about the schema from a set of internal tables and
views

• Example: finding all foreign key references to the contacts table:

    SELECT *
    FROM information_schema.constraint_column_usage
    WHERE table_name = 'contacts'
      AND column_name = 'id'
      AND constraint_name <> 'contacts_pkey'

• You can do similar things for indexes and other kinds of constraints

Plan and Wait
• Output of the spike:

• 3 complex migration helpers

• 5 migrations

• Ended up waiting 5 months before the pain outweighed the risk

Five Big Migrations
• Simple case easy:

    fix_uuid_primary_key :contact_names

• Harder cases not so easy:

    fix_uuid_primary_key :avatars
    fix_uuid_primary_key :properties
    fix_uuid_foreign_key :properties, :property_notes, on_delete: :cascade
    fix_uuid_polymorphic_association :sourcings,
                                     :sourceable,
                                     targets: [:avatars, :properties]

• Worst case: 3 primary keys, 28 foreign keys, 4 polymorphic tables … all in
one migration.

From Spike to Solution
• Careful review of the migrations and helpers

• Ran the migrations many, many times on clone of production DB

• Run, fix error, repeat. (Very thankful for Postgres transactional DDL!)

• Fixing error usually meant figuring out how to reflect on some new kind of
dependency in Postgres and update the helper to deal with it.

• Sometimes meant just coding a workaround for an odd case.

Being Careful
• Ran the migrations in staging for timings

• We had the luxury of downtime!

• But we wanted to understand how long each maintenance window would be.

• Made them reversible!

• We planned for never having to reverse them, including careful testing and
random spot-checks in the migrations.

• But we also made sure they could be reversed (including round-trip testing of
both schema and table contents).

Being Careful
• Build correctness checking into the migration helpers

• Remember: we kept the uuid column

• At start of change: store random sample of records

• After change: find those records and ensure they still refer to same UUID

• Finally deployed on five consecutive weekends (simplest first)
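The spot-check idea (sample before, verify after, using the retained uuid column) can be sketched in memory as below. Everything here is illustrative: the rows are plain hashes standing in for table rows, and the `note_id` key and both method names are assumptions. The real helpers ran inside the migrations against Postgres.

```ruby
# Before the change: remember which parent UUID a random sample of
# child rows refers to. At this point property_id still holds the UUID.
def snapshot_references(child_rows, sample_size, rng: Random.new)
  child_rows.sample(sample_size, random: rng)
            .to_h { |row| [row[:note_id], row[:property_id]] }
end

# After the change: follow each sampled row's integer foreign key to its
# parent and compare the parent's kept uuid column with the snapshot.
# Returns the note_ids whose reference no longer matches.
def broken_references(snapshot, child_rows, parent_rows)
  uuid_by_id = parent_rows.to_h { |p| [p[:id], p[:uuid]] }
  child_by_key = child_rows.to_h { |c| [c[:note_id], c] }
  snapshot.reject do |note_id, expected_uuid|
    child = child_by_key[note_id]
    child && uuid_by_id[child[:property_id]] == expected_uuid
  end.keys
end
```

An empty result from `broken_references` means every sampled row still points at the same record it did before the migration.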

Stage 3: 
Private, Per-Customer Contacts

• Nearly every query had to be filtered based on source

• Extra complexity

• Joining through polymorphic table was costly

• Sooner or later we would miss it and violate data privacy

A Spike
• Doing this in Ruby was fairly straightforward … but very slow (about a day
per realtor)

• Doing it in SQL required fairly advanced skills … but took about ten
minutes per realtor

• As with stage 1, decided on a user-by-user approach

The Strategy
[Diagram: realtors Alice, Bill, and Carl each have a contact_relationships row (ids 1, 2, and 3) pointing at the same contacts row (id: 1). That contact has a name (Nancy) and an email (nancy@example.com) attached, each with contact_id: 1.]
• Simple example: one contact shared by three realtors.

The Strategy
[Diagram: the contact_relationships rows 1, 2, and 3 (for Alice, Bill, and Carl) each now carry old_contact_id: 1; contact 1 and its attributes (contact_id: 1) are unchanged.]
• First, add old_contact_id column to contact_relationships
• Populate it with current value of contact_id

The Strategy
[Diagram: contact 1 gains contact_relationship_id: NULL, with a uniqueness constraint on contact_relationship_id. The contact_relationships rows 1, 2, and 3 still have old_contact_id: 1.]
• Next, add contact_relationship_id column to contacts
• Populate it with NULL
• Add uniqueness constraint for that column

The Strategy
[Diagram: the same state (contact 1 with contact_relationship_id: NULL), with contact_relationships row 1 (old_contact_id: 1) called out.]

The Strategy
[Diagram: contact 1 now has contact_relationship_id: 1.]
• Update contact_relationship_id IF it’s NULL

    UPDATE contacts
    SET contact_relationship_id = contact_relationships.id
    FROM contact_relationships
    WHERE contacts.id = contact_relationships.contact_id
      AND contact_relationships.realtor_id = 1
      AND contacts.contact_relationship_id IS NULL

The Strategy
[Diagram: realtors Alice, Bill, and Carl; contact_relationships rows 1, 2, and 3 (old_contact_id: 1); contact 1 with its attributes.]
• Now INSERT into contacts for each of Alice’s contact_relationships
• ON CONFLICT just set updated_at on the existing one
• and then UPDATE contact_relationships to point to the new contact records

INSERT with ON CONFLICT
    WITH new_contacts AS (
      INSERT INTO contacts (contact_relationship_id, created_at, updated_at)
      (
        SELECT cr.id AS contact_relationship_id, cr.created_at, cr.updated_at
        FROM contacts
        INNER JOIN contact_relationships cr
          ON contacts.id = cr.contact_id
        WHERE cr.realtor_id = 1
          AND contacts.contact_relationship_id IS NULL
        ORDER BY cr.id ASC
      )
      ON CONFLICT ON CONSTRAINT contact_relationship_id_uniqueness
      DO UPDATE SET updated_at = EXCLUDED.updated_at
      RETURNING *)
    UPDATE contact_relationships
    SET contact_id = new_contacts.id
    FROM new_contacts
    WHERE contact_relationships.id = new_contacts.contact_relationship_id;

The Strategy
[Diagram: the attempted new contacts row (id: 1001) for Alice is crossed out.]
• INSERT into contacts for each of Alice’s contact_relationships
• ON CONFLICT just set updated_at on the existing one
• and then UPDATE contact_relationships to point to the new contact records

The Strategy
[Diagram: state unchanged; contact 1 is the only contacts row.]
• That time, nothing happened, because Alice was the first realtor for contact 1.

The Strategy
[Diagram: the same state, now turning to Bill’s contact_relationships row (id: 2).]
• Now let’s try Bill.

The Strategy
[Diagram: the attempted update of contact 1 for Bill is crossed out.]
• We try to claim the contact for Bill by updating contacts.contact_relationship_id
• But it isn’t NULL, so we don’t update it to 2

The Strategy
[Diagram: a new contacts row, id: 1001, appears alongside contact 1.]
• But the INSERT works because it doesn’t create a uniqueness violation

The Strategy
[Diagram: Bill’s contact_relationships row now points to the new contact, id: 1001; the attributes still have contact_id: 1.]
• And then the UPDATE fixes up the contact_relationships record
• But what about the attached attributes?

The Strategy
[Diagram: contact 1001 alongside contact 1, whose attributes still have contact_id: 1.]
• For each of Bill’s contacts where contact_relationships.old_contact_id != contacts.id, go copy all of the attached attributes from old_contact_id

The Strategy
[Diagram: contact 1001 now has its own copies of the attached attributes (for example, Name: Nancy), shown on the slide with contact_id: 2; the originals keep contact_id: 1.]
• For each of Bill’s contacts where contact_relationships.old_contact_id != contacts.id, go copy all of the attached attributes from old_contact_id
• A lot of queries, but basically straightforward
• Then move on to Carl
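The whole walk-through (claim if unclaimed, insert unless the relationship already owns a contact, repoint, then copy attributes where old_contact_id differs) can be simulated in memory. The sketch below is only a model of the logic: the class and method names are hypothetical, the rows are plain hashes, Bill and Carl are assumed to be realtors 2 and 3, and new ids start at 1001 as on the slides. The real work was done in SQL.

```ruby
# In-memory stand-in for the contacts and contact_relationships tables.
class ContactStore
  attr_reader :contacts, :relationships

  def initialize(contacts, relationships)
    @contacts = contacts            # [{ id:, contact_relationship_id: }]
    @relationships = relationships  # [{ id:, realtor_id:, contact_id:, old_contact_id: }]
    @next_contact_id = 1001
  end

  # Gives one realtor private contacts. Returns the relationships whose
  # contact changed, i.e. those that still need their attributes copied.
  def privatize(realtor_id)
    mine = @relationships.select { |cr| cr[:realtor_id] == realtor_id }
    mine.each do |cr|
      shared = @contacts.find { |c| c[:id] == cr[:contact_id] }
      # Step 1: claim the shared contact only if nobody has claimed it.
      shared[:contact_relationship_id] ||= cr[:id]
      # Step 2: the "INSERT ... ON CONFLICT" part. If this relationship
      # already owns a contact, reuse it; otherwise create a private one.
      owned = @contacts.find { |c| c[:contact_relationship_id] == cr[:id] }
      unless owned
        owned = { id: @next_contact_id, contact_relationship_id: cr[:id] }
        @next_contact_id += 1
        @contacts << owned
      end
      # Step 3: point the relationship at the contact it owns.
      cr[:contact_id] = owned[:id]
    end
    mine.select { |cr| cr[:old_contact_id] != cr[:contact_id] }
  end
end
```

Running it for Alice changes nothing except the claim on contact 1; running it for Bill creates contact 1001 and reports his relationship as needing an attribute copy.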

Being Careful
• Again: ran these transformations against a clone of production

• Run for a realtor, compare against that realtor’s production data

• Complete run-through of all realtors in staging before moving on to
production

• During run-through, I plotted changes to table counts as a sanity check

An OUTER JOIN should’ve been an INNER JOIN

[Diagram: Postgres with realtors (Abby, Bill), contact_relationships, and contacts, where each realtor now has private contacts. Abby: Isaac, Jane, Kathy, Lee, Mike, Nancy, Quentin, Sally, … Bill: Kathy, Nancy, Oscar, Pat, Quentin, Robert, Sally, Tina.]

Stage 4: 
From Join Table to belongs_to

• Everything’s just a little more complex with the join table

• Requires constraints and integrity checks that wouldn’t be
necessary without it

• Another team member challenged me to get rid of it!

• It really wasn’t causing us enough trouble to justify a big push

• But I realized we could set this up to do opportunistically

The Idea
• Go ahead and add the direct contacts.realtor_id foreign key

• Populate it to match the existing contact_relationships.

• Then just make sure they stay consistent!

Triggers
• Rails developers are wary of stored procedures and triggers (for good
reason)

• But sometimes they’re exactly what you need. This is one of those times.

• I had a lot of ignorance to overcome.

• So I worked on a spike, curling up with the Postgres manual and
experimenting …

[Diagram, three frames: Realtors, ContactRelationships, and Contacts, with arrows labeled "insert", "set realtor_id", and "set". In the second and third frames, one of the follow-on arrows is crossed out (X).]

Triggers Are Difficult 
(for me, anyway)
• For efficiency, control the conditions under which invoked

• For correctness, decide before/after

• Carefully write updates/inserts to only make changes if
things are inconsistent
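The last point, writing only when things are inconsistent, is what keeps two mutually triggering tables from looping forever. The sketch below models that in plain Ruby rather than in PL/pgSQL. The class and method names are hypothetical, and each method call stands in for a trigger firing on the other table.

```ruby
# Models contacts.realtor_id and contact_relationships.realtor_id being
# kept in step by a pair of triggers.
class RealtorSync
  attr_reader :writes

  def initialize
    @contact_realtor = {}       # contacts.realtor_id, keyed by contact id
    @relationship_realtor = {}  # contact_relationships.realtor_id, keyed by contact id
    @writes = 0
  end

  def set_contact_realtor(contact_id, realtor_id)
    # Already consistent: do nothing, so the chain of triggers stops here.
    return if @contact_realtor[contact_id] == realtor_id
    @contact_realtor[contact_id] = realtor_id
    @writes += 1
    set_relationship_realtor(contact_id, realtor_id) # stands in for the trigger
  end

  def set_relationship_realtor(contact_id, realtor_id)
    return if @relationship_realtor[contact_id] == realtor_id
    @relationship_realtor[contact_id] = realtor_id
    @writes += 1
    set_contact_realtor(contact_id, realtor_id)      # stands in for the trigger
  end

  def consistent?
    @contact_realtor == @relationship_realtor
  end
end
```

A change on either side produces exactly two writes, one per table, and repeating the same change produces none.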

The Plan: A 12-Step Epic
• Step 1: build a way to track progress

• Step 2: build a way to audit the activity of the triggers

• Step 3: add contacts.realtor_id and triggers

• Steps 4–6: move fields from contact_relationships to contacts

• Steps 7–8: retargeting polymorphic associations

• Steps 9–11: retargeting associations, scopes, and query fragments

• Step 12: DROP TABLE contact_relationships

Tracking Progress
    rg --count --ignore-file .rg_crprogress_ignore '[Cc]ontact_?[Rr]elationship' \
      | cut -d : -f 2 \
      | sed '2,$s/$/+/; $s/$/p/' \
      | dc
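The pipeline asks ripgrep for per-file match counts (`path:count` lines), keeps the count field, and has `sed` turn the list into a reverse-Polish sum for `dc`. For readers less used to `dc`, the same total can be sketched in Ruby. The method name is hypothetical, and the input is assumed to be the raw `rg --count` output with one `path:count` pair per line.

```ruby
# Sums the second colon-separated field of each line, mirroring
# `cut -d : -f 2` followed by the sed/dc addition.
def total_matches(rg_count_output)
  rg_count_output.each_line.sum { |line| line.chomp.split(":")[1].to_i }
end

# Example with made-up file names:
puts total_matches("app/models/contact.rb:3\nspec/models/contact_spec.rb:12\n")
```

The number only has to go down over time; the step 12 goal is reached when nothing outside the ignore file matches.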

Auditing Trigger Activity
• Updated the triggers to log behavior to new
contact_relationship_trigger_actions table.

• Utility script to audit this table for consistency occasionally.
id       action        contact_relationship_id  contact_id  performed_update  time
4995810  c_setrealtor                           5622768     FALSE             2019-02-18 13:31:49.395671
4995811  cr_insert     10607228                 5622768     TRUE              2019-02-18 13:31:49.395671
4995813  cr_insert     10607230                 5622769     TRUE              2019-02-18 13:31:50.181528

Executing
• One step a week

• Each took 8–10 hours, on average

• Most deployments on weekends, even when
no downtime required

Slow and Steady
• Incremental, “worst pain first” strategy

• Contained risk

• Enabled feature development

• Produced enormous technical improvement over time

Keep Looking Ahead
• We were always looking for ways to improve the system

• An “inventory of pain” helps you to identify which pain is the worst right
now

Each Stage Was Different!
• Entirely different, creative solutions required at each step

• Ruby magic

• Migrations and database reflections

• Fancy Postgres UPSERT (i.e., INSERT … ON CONFLICT) queries and CTEs

• Triggers

• Entirely different testing strategies, too.

• There is no recipe. Find what works.

Leverage Your Database
• We Rails developers love ActiveRecord and Arel for queries.

• But for all its problems, SQL is powerful.

• Data and referential integrity protections can save you.

• Without Postgres’ transactional DDL, the risk and effort would have been
enormously greater. (I’d guess roughly tenfold.)

• Stored procedures and triggers have their place.

The Luxury of Downtime
• We have the luxury of being able to schedule maintenance time.

• If you can, do that.

• If not, you have to explore other techniques. (It’s worth bringing in an
experienced database consultant if you need to explore these.)

Focus: The Two-Edged Sword
• These kinds of tasks really benefit from intense focus.

• But that kind of focus can keep you from seeing danger.

• Make sure you come up for air and have someone looking over your
shoulder.

What Would We Do Differently?
• If we had clearly understood our end goal, we could have done all of this
in stage 1.

• But we still thought we were building a social graph.

• You can never be sure you understand the future of your business.

What Would We Do Differently?
• There is one mistake we could have avoided based on technical principles.

• We should never have used UUID primary keys.

• They are useful only if you need to distribute primary key creation.

• Probably where contention on the primary key sequence is a bottleneck.

• Maybe also when you need to provide a key with less latency than a DB
round trip.

• THAT’S IT.
