2. Active/Active
What
• Resilient to datacenter-level failure
• Resilient to Internet routing problems
• Transparent to the merchant
• No human intervention
Why
• Every second of uptime matters to our merchants. The goal is five nines (99.999% availability).
• Datacenter-level maintenance becomes much easier and safer to perform.
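To make the five-nines goal concrete, the sketch below converts an availability target into a yearly downtime budget. It assumes a 365-day year and treats "n nines" as an unavailability of 10^-n; the function name is illustrative and not from the talk.

```python
# Downtime budget implied by an availability target.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year, for simplicity


def downtime_budget_seconds(nines: int) -> float:
    """Allowed downtime per year, in seconds, for an availability of `nines` nines."""
    unavailability = 10 ** (-nines)  # five nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability


for n in (3, 4, 5):
    minutes = downtime_budget_seconds(n) / 60
    print(f"{n} nines: about {minutes:.1f} minutes of downtime per year")
```

At five nines the budget is a little over five minutes per year, which leaves no room for a manual failover between datacenters.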
3. Challenges
Inconsistent state between datacenters
Datacenters can't tell if a transaction has already been processed elsewhere.
Limited idempotence
Payment networks can't reliably guarantee idempotence on retries.
Real-time latency requirements
We can't just wait until our datacenters are in sync.
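The first two challenges combine badly, as this minimal sketch suggests. It models each datacenter as knowing only about requests it handled itself; the `Datacenter` class, the request ID, and the amounts are hypothetical and stand in for whatever the real system uses.

```python
from dataclasses import dataclass, field


@dataclass
class Datacenter:
    """Hypothetical datacenter that only remembers requests it handled itself."""
    name: str
    seen: set = field(default_factory=set)       # request IDs processed locally
    charges: list = field(default_factory=list)  # charges sent to the payment network

    def process(self, request_id: str, amount_cents: int) -> bool:
        """Charge unless this datacenter already saw the request. True if charged."""
        if request_id in self.seen:
            return False  # a retry that lands on the same datacenter is deduplicated
        self.seen.add(request_id)
        self.charges.append((request_id, amount_cents))
        return True


dc_a, dc_b = Datacenter("A"), Datacenter("B")

# The client sends a payment, times out, and retries.
# The retry is routed to the other datacenter before state has replicated.
dc_a.process("req-1", 500)
dc_b.process("req-1", 500)  # B has no record of req-1, so it charges again

total_charges = len(dc_a.charges) + len(dc_b.charges)
print(total_charges)
```

Local deduplication protects against retries within one datacenter only. Because the payment network cannot be relied on to reject the duplicate, and the latency requirement rules out waiting for replication, the retry results in two charges.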
8. Multi-Tender
Multi-DC challenge
Scenario
When a merchant sells items to a customer, the customer has the option to pay with multiple tenders.
APIs
1. CreateBill
2. AddTender
3. CompleteBill / CancelBill
Challenges
1. Each time we receive a tender request, we need to process this tender immediately. Thus different tenders for the same bill may be processed at different datacenters.
2. When receiving the CompleteBill request, we may need to wait for the tender information from a remote datacenter.
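The two challenges above can be sketched with a toy model in which each datacenter holds its own view of a bill. Everything here is an assumption made for illustration: the `BillView` class, the tender IDs, and the rule that a bill can complete only once the known tenders cover its total.

```python
from dataclasses import dataclass, field


@dataclass
class BillView:
    """One datacenter's local view of a bill (hypothetical model)."""
    tenders: dict = field(default_factory=dict)  # tender_id -> amount in cents

    def add_tender(self, tender_id: str, amount_cents: int) -> None:
        """Record a tender processed at this datacenter."""
        self.tenders[tender_id] = amount_cents

    def merge_from(self, other: "BillView") -> None:
        """Stand-in for asynchronous replication from a remote datacenter."""
        self.tenders.update(other.tenders)

    def can_complete(self, bill_total_cents: int) -> bool:
        """Assumed rule: complete only when known tenders cover the bill total."""
        return sum(self.tenders.values()) >= bill_total_cents


BILL_TOTAL = 3000
dc_a, dc_b = BillView(), BillView()
dc_a.add_tender("t1", 1000)  # first tender processed at datacenter A
dc_b.add_tender("t2", 2000)  # second tender processed at datacenter B

before = dc_a.can_complete(BILL_TOTAL)  # A only knows about t1
dc_a.merge_from(dc_b)                   # tender information arrives from B
after = dc_a.can_complete(BILL_TOTAL)
print(before, after)
```

A CompleteBill request that reaches datacenter A before the merge sees an underpaid bill, which is why the request may have to wait for the remote tender information.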
10. Multi-Tender
Multi-DC resolution
State Machine
Tender state machine
Bill state machine
Correctness
1. A formal proof
2. Simulate all the possible operational combinations and verify the results
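The simulation approach can be sketched as an exhaustive check that every delivery order of the same events ends in the same state. The states, their ranking, and both step functions below are hypothetical, since the actual tender and bill state machines are not reproduced here; the point is only the shape of the test.

```python
from itertools import permutations

# Hypothetical tender states, ranked so that a higher rank always wins a merge.
RANK = {"PENDING": 0, "AUTHORIZED": 1, "CAPTURED": 2, "VOIDED": 3}


def merge_step(state: str, event: str) -> str:
    """Move to the event's state only if it outranks the current state."""
    return event if RANK[event] > RANK[state] else state


def last_write_wins(state: str, event: str) -> str:
    """Naive alternative: whichever event arrives last decides the state."""
    return event


def run(events, step) -> str:
    """Apply events in the given order, starting from PENDING."""
    state = "PENDING"
    for event in events:
        state = step(state, event)
    return state


def converges(events, step) -> bool:
    """True if every delivery order of `events` ends in the same state."""
    outcomes = {run(order, step) for order in permutations(events)}
    return len(outcomes) == 1


# Includes a duplicate delivery of AUTHORIZED, as retries might produce.
events = ["AUTHORIZED", "CAPTURED", "VOIDED", "AUTHORIZED"]
print(converges(events, merge_step))       # order-independent
print(converges(events, last_write_wins))  # order-dependent
```

The ranked merge converges for every ordering, while the naive step does not, which is the kind of defect an exhaustive simulation is meant to surface. Enumerating permutations grows factorially, so this style of check is only practical for small event sets.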
11. Caveats
Eventually consistent
Asynchronous, eventually consistent systems are harder to reason about.
Complex
Active/active systems are harder to design, implement, and test.
Data Loss
If the original datacenter goes down and never comes back, we may not be able to perform the capture, because the original auth is lost.
Downstream effects
Not all downstream effects are reversible.
12. Future Plans
We want a storage solution with the following properties:
1. Horizontally scalable
2. Tolerant to DC failure
3. Transactional
CockroachDB: a Scalable, Geo-Replicated, Transactional Datastore
http://cockroachdb.org/