This is a peek at PayPal’s inner workings. Take a broad look at the technologies and foundations of the PayPal system, and the history and evolution behind its design. Take a tour of PayPal’s service-oriented architecture, the techniques the engineering team uses to process financial transactions securely, and how PayPal innovates at scale.
16. Diagram: app: 1 work, 2 work, 3 COMMIT → Redo log → standby, offsite (failover)
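The following is a minimal sketch of the slide 16 idea. It assumes a toy in-memory key-value store; the `Primary` and `Replica` classes are hypothetical and are not PayPal's code. The app's two units of work become durable only at COMMIT, when they are appended to a redo log. A standby copy and an offsite copy rebuild the same state by replaying that log.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical toy store: work becomes durable only when committed to the redo log."""
    data: dict = field(default_factory=dict)
    redo_log: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)

    def work(self, key: str, value: int) -> None:
        self.pending[key] = value  # steps 1-2: uncommitted changes

    def commit(self) -> None:
        self.redo_log.append(self.pending)  # step 3: log the change set first...
        self.data.update(self.pending)      # ...then apply it
        self.pending = {}


@dataclass
class Replica:
    """Hypothetical standby or offsite copy that replays the redo log."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # how many log entries have been replayed so far

    def catch_up(self, redo_log: list) -> None:
        for change_set in redo_log[self.applied:]:
            self.data.update(change_set)
        self.applied = len(redo_log)


primary, standby, offsite = Primary(), Replica(), Replica()
primary.work("alice", 100)
primary.work("bob", 50)
primary.commit()
standby.catch_up(primary.redo_log)  # shipped often: standby is ready for fast failover
offsite.catch_up(primary.redo_log)  # shipped offsite: survives losing the whole site
primary.work("alice", 70)           # uncommitted, so not in the log and not on the copies
print(standby.data)                 # {'alice': 100, 'bob': 50}
```

Real databases ship and replay redo logs at the storage layer. The sketch only illustrates the guarantee: anything committed to the log can be recovered from the standby or offsite copy, and anything not yet committed cannot.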
17. Diagram: app A: 1 work, 2 work, 3 insert message, 4 COMMIT. app B: 5 read message, 6 if message isn’t “done”: 7 work, 8 mark message as “done”, 9 COMMIT.
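The following is a minimal sketch of the slide 17 pattern. It rests on two hypothetical assumptions: app A and app B share one SQLite database, and a `messages` table plays the role of the queue between them.

- App A commits its work and the message in one transaction, so the message exists only if the work does.
- App B folds "if message isn't done" and "mark message as done" into a single conditional UPDATE. That UPDATE runs in the same transaction as B's work, so reading the same message twice does the work once.

```python
import sqlite3

# Hypothetical setup: app A and app B share one database, and the
# "messages" table stands in for the queue between them.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ledger (entry TEXT);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT, status TEXT);
    """
)


def app_a(payment: str) -> int:
    """Steps 1-4: do the work and insert the message in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("INSERT INTO ledger VALUES (?)", (f"A: step 1 for {payment}",))
        conn.execute("INSERT INTO ledger VALUES (?)", (f"A: step 2 for {payment}",))
        cur = conn.execute(
            "INSERT INTO messages (body, status) VALUES (?, 'pending')", (payment,)
        )
    return cur.lastrowid


def app_b(message_id: int) -> bool:
    """Steps 5-9: do B's work only if the message isn't "done" yet."""
    with conn:
        # Check "isn't done" and mark "done" in one conditional UPDATE, in the
        # same transaction as B's work, so a message read twice is processed once.
        claimed = conn.execute(
            "UPDATE messages SET status = 'done' WHERE id = ? AND status != 'done'",
            (message_id,),
        ).rowcount
        if claimed == 0:
            return False  # already done: a repeated read is harmless
        (body,) = conn.execute(
            "SELECT body FROM messages WHERE id = ?", (message_id,)
        ).fetchone()
        conn.execute("INSERT INTO ledger VALUES (?)", (f"B: work for {body}",))
    return True


msg_id = app_a("payment #1")
print(app_b(msg_id))  # True: first read does the work
print(app_b(msg_id))  # False: second read is a no-op
```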
18. Diagram: app A: 1 work, 2 work, 3 insert message, 4 COMMIT, 5 sync call to app B. app B: 6 if message isn’t “done”: 7 work, 8 mark message as “done”, 9 COMMIT; 10 read message.
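Slide 18 adds a sync call at step 5 and moves the message read to step 10. One reading fits the notes' remark about delay, though the slide does not spell it out. On that reading, the sync call gets the payment to its end state quickly, and the later read of the committed message is a safety net in case the call fails.

The following is a minimal in-memory sketch under that assumption. `AppA`, `AppB`, `pay`, `sweep` and `b_reachable` are hypothetical names. A lock stands in for B's transaction.

```python
import threading


class AppB:
    """Hypothetical downstream service whose message handling is idempotent."""

    def __init__(self) -> None:
        self.lock = threading.Lock()    # stands in for B's database transaction
        self.done: set[str] = set()     # messages marked "done"
        self.work_log: list[str] = []   # work B actually performed

    def handle(self, message_id: str) -> None:
        with self.lock:
            if message_id in self.done:       # 6: skip if already "done"
                return
            self.work_log.append(message_id)  # 7: work
            self.done.add(message_id)         # 8-9: mark "done", COMMIT


class AppA:
    """Hypothetical upstream service that records a message with its own work."""

    def __init__(self, b: AppB) -> None:
        self.b = b
        self.outbox: list[str] = []  # messages committed with A's work (steps 1-4)

    def pay(self, message_id: str, b_reachable: bool = True) -> None:
        self.outbox.append(message_id)
        if b_reachable:
            self.b.handle(message_id)  # 5: sync call, so the payment finishes fast

    def sweep(self) -> None:
        for message_id in self.outbox:  # 10: later read of every committed message
            self.b.handle(message_id)


b = AppB()
a = AppA(b)
a.pay("p1")                     # finished by the sync call
a.pay("p2", b_reachable=False)  # sync call fails, but the message is committed
a.sweep()                       # picks up p2, skips p1 because it is "done"
print(b.work_log)               # ['p1', 'p2']
```

Because B checks "done" before working, the sync path and the later read can overlap freely. Each payment's work still happens exactly once.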
19.
20. physical security
machine access controls
firewalls
service-level access control
encryption on the wire
encryption at rest
hardware encryption (HSM)
21. Diagram:
balance log: Oct 1 open: $0; Oct 12 +$150 from Bank; Oct 13 -$50; balance: $100
Account Activity: Oct 12 Add Funds $150; Oct 13 Balance-funded Payment -$50
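Slide 21 shows the same account from two sides: a stored balance backed by a balance log, and the customer-facing account activity. The following is a minimal sketch of comparing them. Two choices belong to this sketch and are not stated on the slide: amounts are held in integer cents, and the `reconcile` function is a hypothetical name.

```python
# Slide 21's data, in integer cents (a choice of this sketch, to avoid float round-off).
stored_balance = 10_000  # "balance: $100"

# Balance log entries: (date, amount in cents).
balance_log = [
    ("Oct 1", 0),        # open: $0
    ("Oct 12", 15_000),  # +$150 from Bank
    ("Oct 13", -5_000),  # -$50
]

# Customer-facing account activity: (date, description, amount in cents).
account_activity = [
    ("Oct 12", "Add Funds", 15_000),
    ("Oct 13", "Balance-funded Payment", -5_000),
]


def reconcile(stored, log, activity):
    """Compare the stored balance with the totals implied by the log and the activity."""
    from_log = sum(amount for _, amount in log)
    from_activity = sum(amount for _, _, amount in activity)
    return stored == from_log == from_activity


print(reconcile(stored_balance, balance_log, account_activity))  # True

# A one-cent "penny slice" written to the log but not shown as activity is caught.
tampered_log = balance_log + [("Oct 14", -1)]
print(reconcile(stored_balance, tampered_log, account_activity))  # False
```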
what paypal tech does; I picked some problems, ideas influencing devs; sampler plate; technical, dense; 12:30-1:15 Thurs 10/13, Room 2018 (40m, 5m Q&A)
startup: release nights, stayed up late, ate eggrolls, and crossed fingers that the site wouldn’t crash; db pw
daughter (new reason to stay up all night); still eat eggrolls for release, but now it happens during the day (don’t sleep under desks); even if I wanted to, couldn’t touch customer data with a 10-foot pole; what’s inside?
early engineers were smart, motivated, out to take over the world; no problem unsolvable, no technique off-limits; innovation (GL: impl of captcha); huge copy/paste, “magic” communication
voila! inside kind of looks like this.
spend money on reliable data store; httpd & geronimo
1998 ecosystem; founding culture of “we can build it better” (different now); C++ ecosystem is weak compared to what you see in Java, Python; what does all this stuff do?
trend: API box covering more
tech problems, our solutions
3 themes for problems
why do we consider this “reliable” at this point? redo log for raw data; standby: fast failover; offsite: recover from disaster; what about pieces in payments that have to work together?
lots of different systems involved in fulfilling a payment, working together reliably; if you wonder about delay
ensure that a payment reaches the end state; infra technique used in many domains; as a business that deals with money, how do we build trust that nothing fishy is going on?
trustworthy! how do you prevent and detect tampering?
responsible; examples in the industry; preso: Bill Corry, info sec; what about people that are allowed to touch these things?
Two ways to answer the same question; chain of comparisons ultimately takes you to the border between PP and external financial systems; or penny-slicing, like in Superman III? round-off slicing doesn’t really apply: PP is in the middle of fx txns, and round-off would have to be a txn
tricky word because it can apply to a lot of things
Does your codebase have “room”? tech organization that experienced huge, continuous growth; all of these dims have had “scaling” challenges
payment processing capacity; mem management; zombie boxes; in/out rotation; push (connections); less eff, but SIMPLE to debug/operate; put on read-only instances of DBs; horizontal scaling, indefinite; isolate problems of state-mgt
our strategy for scaling reads: authentication, customer history; but what about state?
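The notes mention moving reads such as authentication and customer history onto read-only DB instances. The following is a minimal routing sketch. It treats any statement starting with SELECT as a read, and the host names and the `ReadWriteRouter` class are hypothetical.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate over read-only replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # simple round-robin

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM auth WHERE user_id = 7"))     # replica-1
print(router.route("SELECT * FROM history WHERE user_id = 7"))  # replica-2
print(router.route("INSERT INTO payments VALUES (1, 50)"))      # primary-db
```

Adding replicas scales reads horizontally. Replicas usually lag the primary slightly, though, so a read that must see a write the same request just committed is safer on the primary.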
monster box, could take 128 CPUs, started with 48
too much work syncing between CPUs; work not totally independent (indices, etc.)
business function: partition by domain, independent machines; one machine with a lot of CPUs couldn’t do it
dependencies! points of failure! work gets to be too big for one?
don’t need all partitions to serve a request; localize customer data to an in-country datacenter; what about work spanning users? (later)
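The following is a minimal sketch of per-customer partitioning with in-country placement. The layout is hypothetical: the country codes, datacenter names, partition count and `partition_for` are all invented for illustration. Each customer maps to exactly one (datacenter, partition) pair, so serving that customer's request needs only one partition.

```python
import hashlib

# Hypothetical layout: each country's customers live in an in-country datacenter,
# which splits them across a fixed number of partitions.
DATACENTER_FOR_COUNTRY = {"US": "dc-us", "DE": "dc-de", "JP": "dc-jp"}
PARTITIONS_PER_DATACENTER = 4


def partition_for(customer_id: str, country: str) -> tuple[str, int]:
    """Map a customer to (datacenter, partition) using a stable hash."""
    datacenter = DATACENTER_FOR_COUNTRY[country]
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    partition = int.from_bytes(digest[:8], "big") % PARTITIONS_PER_DATACENTER
    return datacenter, partition


print(partition_for("customer-42", "DE"))  # same (datacenter, partition) on every run
```

The sketch uses SHA-256 rather than Python's built-in `hash()` because string hashing is randomized per process, and a customer's partition must not move between runs. Work spanning customers in different partitions is exactly the case this layout does not cover.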
hiring more people; logistics; getting people into a room; make room, keep small scope
domains; exercise at first forced us to define “what paypal is”; given these buckets, next question was how to ensure dependencies don’t just turn it into a black hole; small scope, SOA; but something else that’s proved effective is…
files that specify dependencies; topology and security; maven has this; tools that tell us, constrain, what talks to what; enforces boundaries, keeps things apart; but how do you get composition without coupling?
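Maven, which the notes mention, declares a module's dependencies in its `pom.xml`. The following is a minimal sketch of the enforcement side. It uses an invented in-code format rather than any real manifest, and the service names are hypothetical. It has three parts:

- declared dependency edges per service;
- a check that flags observed calls the declarations don't allow;
- a cycle check, since a dependency cycle is one way domains collapse into the "black hole" the notes warn about.

```python
# Hypothetical per-service dependency declarations (format invented for this sketch).
DECLARED_DEPS: dict[str, set[str]] = {
    "checkout": {"payments", "risk"},
    "payments": {"ledger"},
    "risk": set(),
    "ledger": set(),
}


def undeclared_calls(observed_calls: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (caller, callee) pairs that the declarations do not allow."""
    return sorted(
        (caller, callee)
        for caller, callee in observed_calls
        if callee not in DECLARED_DEPS.get(caller, set())
    )


def has_cycle(deps: dict[str, set[str]]) -> bool:
    """Depth-first search; reaching a node still on the stack (GRAY) means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in deps}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in deps.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in deps)


print(undeclared_calls([("checkout", "payments"), ("ledger", "checkout")]))
# [('ledger', 'checkout')]
print(has_cycle(DECLARED_DEPS))  # False
```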
some principles that we’re working with, to ensure it scales and is reliable; sum it up in one phrase
ACID properties; do work, spanning lots of systems, together, in a way that ensures a consistent outcome; incomplete financial activity; how to classify “work” or transactions?
how to keep these transactions consistent at scale? first two are easy: replicas, work on single partitions
entities are customers; other happens eventually; decouples how they get their job done, contention; consistent state change across entities; cloud storage APIs: this constraint in txns; what’s the difficulty with this model?
RPC/http/soap; reliable, trustworthy: “unknown” is the worst possible answer! all systems with RPC-style interaction, with remote state; as you partition state, more places where this can happen; coding this is a mess, have infra help you
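An "unknown" outcome arises when a remote call may or may not have been applied, for example when the response is lost. One common infrastructure remedy is an idempotency key; the notes do not name this technique, so it is swapped in here. The client attaches a request id and reuses it on every retry, and the server returns the stored result for an id it has already seen.

The following is a minimal sketch. `PaymentService`, `FlakyNetwork` and `pay_with_retry` are hypothetical names.

```python
import uuid


class PaymentService:
    """Hypothetical server that remembers each request's result by request id."""

    def __init__(self) -> None:
        self.results: dict[str, str] = {}
        self.balance = 0

    def pay(self, request_id: str, amount: int) -> str:
        if request_id in self.results:  # repeat of an applied request: same answer
            return self.results[request_id]
        self.balance += amount
        result = f"paid {amount}"
        self.results[request_id] = result
        return result


class FlakyNetwork:
    """Loses the first response, so the client cannot tell whether the payment happened."""

    def __init__(self, service: PaymentService) -> None:
        self.service = service
        self.drop_next_response = True

    def call(self, request_id: str, amount: int) -> str:
        result = self.service.pay(request_id, amount)  # the server does the work...
        if self.drop_next_response:
            self.drop_next_response = False
            raise TimeoutError("response lost")        # ...but the outcome looks "unknown"
        return result


def pay_with_retry(net: FlakyNetwork, amount: int, attempts: int = 3) -> str:
    request_id = str(uuid.uuid4())  # generated once, reused on every retry
    for _ in range(attempts):
        try:
            return net.call(request_id, amount)
        except TimeoutError:
            continue
    raise RuntimeError("outcome still unknown after retries")


svc = PaymentService()
net = FlakyNetwork(svc)
print(pay_with_retry(net, 50))  # paid 50
print(svc.balance)              # 50, not 100: the retry was deduplicated
```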
constraints, freedoms of: data access, RPC, memory/process model, your public APIs; constraints define what happens when you scale; we underestimated the weight/cost of this, and didn’t invest enough early in engineering solutions; implications can’t be completely hidden from app; rely on infra; your core competency, differentiator
motivating design; constantly looking for ways to push this into the infrastructure s.t. devs don’t have to worry about scale, throughput, reliability, correctness: our differentiator
good read because: short, with lots of pictures (appear intelligent without having to read a library); covers fundamental issues in large-scale, reliable, distributed systems; weakened ACID; PP Wars, Eric Jackson: beginnings, war stories