PayPal Risk Platform
High Performance Practice
Ling ZhiJun (Brian Ling)
2017 Software Architecture Summit
AGENDA
PayPal & PayPal Risk (Platform)
Risk DAL Service Challenge
Async Solution
Async Future Plan
PayPal Overview
PayPal operates one of the largest online payment platforms in the world.
• TPV/day: ~1 billion
• Payments/year: 6.1 billion
• Computation/day: ~20 billion
• Active customer accounts: 210M
• Data: 105 petabytes
• Queries/day: 250 billion
• Loss rate: 0.32%
The power of our platform
Our technology transformation enables us to:
• Process payments at tremendous scale (200+ countries & 25 currencies supported)
• Accelerate the innovation of new products
• Engage world-class developers & technologists
PayPal Risk KPI (payments transactions)
• TPV: 354 billion
• Payments/year: 6.1 billion
• Payments/second at peak: 1.8B
• Active customer accounts: 210M
• Data: 73 petabytes
• Database calls/quarter: 4.5T
• Loss rate: 0.32%
Requirement for Risk Platform
• Accuracy vs. latency
• Low latency + hardware investment vs. large throughput
PayPal Risk Platform Architecture
[Diagram: online and offline zones, connected by a read path and a write path]
• Zones: Online, Offline
• Services: Gateway Service, Decision Service, Model + Variable Computation Service, Variable Rollup (Aggregation) Service, DAL Service, Offline Variable Aggregation Service
• Data: Real-time Compute Data, Offline Generated Data, Simulated Real-time Data
• Platforms: Logging System/ETL, Offline Variable Simulation Platform, Model Training Platform
DAL Service Ultimate Questions
Can a JVM-based DAL service deliver both high performance and high ATB (Availability-To-Business)?
• <100 ms P99.99 latency?
• 20k-30k peak TPS on a single instance?
• 99.99% Availability-To-Business?
DAL Service Technical Challenges
Budget Cost
• Hardware investment grows exponentially to keep up with traffic
Performance Issue
• P99 latency differs significantly from average latency
• Too many latency spikes under traffic
• Storage cluster unavailability impacts latency
Customer Requirement
• Adopt new use cases
• Access behavior differs per colo
• Flexibility for fast-evolving use cases
  • Replication
  • Traffic strategy
Operational Cost
• Maintaining too many clients across multiple versions
• Too-frequent releases tied to business cases
• Standby storage cluster switch-over
[Diagram labels: Req, Tech, Value, Cost]
Async Original Benefit
• More Efficient Thread Scheduling
• Non-blocking Call
• Event-Driven Callback
• Fewer Context Switches
• Fault Isolation
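The non-blocking, event-driven style listed above can be sketched in a few lines. This is only an illustration in Python's asyncio, not the JVM stack the talk describes; `fake_storage_read` is a hypothetical stand-in for a storage call, and the 50 ms delay is an arbitrary assumption.

```python
from __future__ import annotations

import asyncio
import threading
import time


async def fake_storage_read(key: str, delay_s: float = 0.05) -> tuple[str, str]:
    """Hypothetical storage call: waits for I/O without blocking the thread."""
    await asyncio.sleep(delay_s)  # the event loop serves other requests meanwhile
    return key, threading.current_thread().name


async def serve(keys: list[str]) -> list[tuple[str, str]]:
    # All reads are in flight at once; each coroutine resumes when its wait ends.
    return await asyncio.gather(*(fake_storage_read(k) for k in keys))


def run_demo(n: int = 200) -> tuple[float, set[str]]:
    """Return elapsed seconds and the set of thread names that did the work."""
    start = time.perf_counter()
    results = asyncio.run(serve([f"key-{i}" for i in range(n)]))
    elapsed = time.perf_counter() - start
    return elapsed, {thread for _, thread in results}


if __name__ == "__main__":
    elapsed, threads = run_demo()
    print(f"{elapsed:.3f}s on threads {threads}")
```

With 200 simulated reads of 50 ms each, a blocking loop on one thread would need about 10 seconds; here the waits overlap on a single thread, which is the scheduling benefit the bullets point to.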
Reactor Pattern Threading Model
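A minimal sketch of the reactor idea, assuming a single-threaded loop rather than the boss/worker thread split used in the talk. It uses Python's standard `selectors` module (backed by epoll or kqueue where available); `run_reactor` and `on_readable` are hypothetical names, and socket pairs stand in for client connections.

```python
from __future__ import annotations

import selectors
import socket


def run_reactor(messages: list[bytes]) -> list[bytes]:
    """Wait for readiness events, then dispatch each one to its handler."""
    selector = selectors.DefaultSelector()  # epoll/kqueue/select, chosen per OS
    replies: list[bytes] = []
    pairs = [socket.socketpair() for _ in messages]

    def on_readable(conn: socket.socket) -> None:
        # The selector only reported readiness; the handler performs the read.
        data = conn.recv(1024)
        replies.append(data.upper())
        selector.unregister(conn)

    for (client, server), msg in zip(pairs, messages):
        server.setblocking(False)
        selector.register(server, selectors.EVENT_READ, on_readable)
        client.sendall(msg)  # simulate an inbound request

    while len(replies) < len(messages):
        for key, _events in selector.select(timeout=1.0):
            key.data(key.fileobj)  # dispatch to the registered handler

    for client, server in pairs:
        client.close()
        server.close()
    selector.close()
    return replies
```

The loop never blocks on any single connection: it sleeps only in `select`, and every wake-up corresponds to work that can proceed immediately.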
Async DAL Service KPI Comparison
• Low Latency
  • ~10-35% reduction (average/P99)
[Chart: Latency vs. Throughput (4-core VM). E2E Client-Service-Aerospike benchmark, 50% read / 50% write. X-axis: throughput (requests per second), 2,000 to 17,000. Y-axis: latency (microseconds), 0 to 120,000. Series: average, 99th, 99.9th, and 99.99th percentile latency, each for read and update.]
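The chart reports average, P99, P99.9, and P99.99 latency. As a hedged illustration of how such figures can be derived from raw samples, the sketch below uses the nearest-rank definition of a percentile; the benchmark's actual tooling is not described in the slides, and `percentile` and `summarize` are hypothetical helpers.

```python
from __future__ import annotations

import math


def percentile(samples_us: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with >= pct% of samples at or below it."""
    if not samples_us:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_us)
    # round() guards against float noise such as 9999.000000000002
    rank = math.ceil(round(pct / 100 * len(ordered), 9))  # 1-based rank
    return ordered[rank - 1]


def summarize(samples_us: list[float]) -> dict[str, float]:
    """Average plus the tail percentiles named in the chart legend."""
    return {
        "avg": sum(samples_us) / len(samples_us),
        "p99": percentile(samples_us, 99),
        "p99.9": percentile(samples_us, 99.9),
        "p99.99": percentile(samples_us, 99.99),
    }
```

A handful of slow requests barely moves the average but dominates the tail percentiles, which is why the slides track them separately.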
Async DAL Service KPI Comparison – Cont.
• High Throughput
  • 3-10X increase (single-instance comparison)
Async DAL Service KPI Comparison – Cont.
• Lower CPU Usage
  • 50% reduction in CPU usage
  • 66%+ reduction in context switches and system interrupts
Async DAL Service KPI Comparison – Cont.
• Smaller Thread Pools
  • 90% reduction in thread count
Thread Number Comparison
Thread type            Async   Sync
Server RPC Thread          9    200
Operation Thread           0     14
Replication Thread         0     40
Management Thread          2      2
Async DAL Service KPI Comparison – Cont.
• Memory Friendly
  • 20% reduction in memory allocation
  • 100+ MB young generation after young GC
  • 130+ MB pooled off-heap
[Charts: GC Time / Total Time, Sync vs. Async (axis 0.00% to 0.07%); GC Count, Sync vs. Async (axis 0 to 350)]
We Have ONE Async Dream
• Shift the application profile from CPU-bound to IO-bound
  • Traffic throughput grows (non-)linearly with CPU usage
  • While guaranteeing low latency, serve 20-30K TPS with a 500 MB JVM heap (after young GC)
• Cloud-Friendly Application
  • Less hardware investment
  • Low operational cost
  • Easy capacity estimation
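If per-instance throughput is predictable, capacity estimation reduces to simple arithmetic. The sketch below is an assumption-laden illustration, not a method from the talk: `instances_needed` and the `headroom` parameter are hypothetical, and the example figures borrow only the 20K TPS per-instance target from the slide.

```python
import math


def instances_needed(peak_tps: float, tps_per_instance: float, headroom: float = 0.3) -> int:
    """Instances required to serve peak_tps while keeping `headroom` spare capacity."""
    if tps_per_instance <= 0 or not 0 <= headroom < 1:
        raise ValueError("invalid capacity inputs")
    usable = tps_per_instance * (1 - headroom)  # capacity we plan to actually use
    return math.ceil(peak_tps / usable)


if __name__ == "__main__":
    # Hypothetical peak of 100K TPS, 20K TPS per instance, half kept in reserve.
    print(instances_needed(100_000, 20_000, headroom=0.5))
```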
High Performance Design
E2E Async
• Non-blocking pipeline: Async RPC + Async DataAccess
Less is More
• Shared thread pool over separate thread pools
• Inline execution over execution across multiple thread pools
Autonomous Memory Management
• Use off-heap memory as much as possible (inbound/outbound & [de]serialization)
• Release inbound memory at an earlier stage (submitRequest)
High Performance Good Practice
KPI Sign-Off
• Performance test on the critical path for each commit
• [Mandatory] Continuous performance test for each commit
Inbound/Outbound Management
• Batch consolidation
• Order management
• Timeout management
• Retries happen only on the client side
Programming Habit
• Fail fast rather than cascading thrown exceptions
• Logging & monitoring matter
• Thread-safe write operations in the control plane; exception-safe read operations in the data plane
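The timeout and retry rules above can be sketched as follows: the service fails fast and never retries, while the client owns both the timeout and the retry budget. This is a hypothetical illustration in Python's asyncio; `FlakyService`, `read_with_retry`, and the default values are assumptions, not part of the talk's framework.

```python
import asyncio
from typing import Optional


class FlakyService:
    """Hypothetical downstream service: fails `failures` times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def read(self, key: str) -> str:
        self.calls += 1
        await asyncio.sleep(0.001)  # simulated I/O wait
        if self.calls <= self.failures:
            raise ConnectionError("storage unavailable")  # fail fast, no server-side retry
        return f"value-of-{key}"


async def read_with_retry(service: FlakyService, key: str,
                          attempts: int = 3, timeout_s: float = 0.5) -> str:
    """Client-side policy: bound each attempt with a timeout, retry a fixed number of times."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(service.read(key), timeout=timeout_s)
        except (ConnectionError, asyncio.TimeoutError) as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Keeping retries out of the service means a request's memory can be released as soon as it is submitted, and retry storms cannot multiply inside the data tier.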
Async High Level Architecture
[Diagram: Real Time Data Service]
• Client: Data Set Clients (Data Set 1 Client ... Data Set N Client), Data Set Schema, biz logic
• Client APIs: Data Access API, Metadata API, Generic Configuration API, KV Store API, HTTP(s) RPC Client
• Server: HTTP(s) RPC Server, KV Store API, generic logic, schema-less read
• KV Store (Store/Cache): Metadata namespace, Data set namespace, Configuration namespace
• Access modes: Direct access, Service access
Async DAL Service Hierarchy
Async Data Access Maturity
• Client & Server RoR Identification
• Biz-schema aware on the client side
• Schema-less on the server side
• Traffic Sharding & Routing
• Active-Active/Active-Standby
• Auto-Failover
• Multi-Tenancy
• ACL
• Direct/Service-To-Service Replication
• ...
• Source of truth for online guideline & offline inventory
• Centralized configuration
• Zero restart/auto-refresh
DAL Service Feature
Metadata Driven
Data Access
Mapping
DataSet => KV Mapping
Logical => Physical DataSet Mapping
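The two mappings named above can be illustrated with a small metadata table. Everything here is a hypothetical sketch: the data set names, cluster names, key format, and the `resolve` and `to_kv` helpers are assumptions made for illustration, not the service's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalDataSet:
    cluster: str
    namespace: str
    set_name: str


# Hypothetical metadata; the slides suggest this would live in a metadata namespace.
LOGICAL_TO_PHYSICAL = {
    "account_profile": PhysicalDataSet("cluster-a", "risk", "acct_profile_v2"),
    "device_history": PhysicalDataSet("cluster-b", "risk", "device_hist"),
}


def resolve(logical_name: str) -> PhysicalDataSet:
    """Logical => physical DataSet mapping, driven purely by metadata."""
    try:
        return LOGICAL_TO_PHYSICAL[logical_name]
    except KeyError:
        raise LookupError(f"unknown data set: {logical_name}") from None


def to_kv(logical_name: str, record_id: str,
          fields: dict[str, object]) -> tuple[str, dict[str, object]]:
    """DataSet => KV mapping: build the store key and pass fields through as the value."""
    physical = resolve(logical_name)
    key = f"{physical.namespace}:{physical.set_name}:{record_id}"
    return key, dict(fields)
```

Because callers only ever name the logical data set, moving it to another cluster or set becomes a metadata change rather than a client release.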
Async RPC Control Plane Abstraction
Async RPC Maturity
High Flexibility Configuration
• Configurable execution chain per URL
  • Customize protobuf / JSON encoder
  • Inject monitoring module
• Execution resource configuration
  • Thread pool size / Netty option (TCP_NODELAY)
  • Sharable or not
RPC Resource Management
• Service listener registry
• Server container life cycle management
  • Graceful shutdown
  • Partial shutdown for a given container
• Auto rebuild of the RPC client channel
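A configurable execution chain per URL can be pictured as an ordered list of handlers looked up by path. The sketch below is a simplified, hypothetical rendering in Python; the handler names, URLs, and `dispatch` function are assumptions, and the real framework builds on Netty rather than plain functions.

```python
from __future__ import annotations

from typing import Callable

Handler = Callable[[dict], dict]


def decode_json(ctx: dict) -> dict:
    ctx["decoded"] = True  # placeholder for a pluggable encoder/decoder step
    return ctx


def record_metrics(ctx: dict) -> dict:
    ctx.setdefault("metrics", []).append(ctx["url"])  # injected monitoring step
    return ctx


def read_data(ctx: dict) -> dict:
    ctx["result"] = f"data for {ctx['url']}"
    return ctx


# Hypothetical configuration: each URL gets its own ordered handler chain.
CHAINS: dict[str, list[Handler]] = {
    "/v1/read": [decode_json, record_metrics, read_data],
    "/v1/health": [read_data],
}


def dispatch(url: str) -> dict:
    """Run the request context through the chain configured for this URL."""
    ctx: dict = {"url": url}
    for handler in CHAINS[url]:
        ctx = handler(ctx)
    return ctx
```

Adding monitoring to one endpoint, or swapping its encoder, then means editing the chain for that URL only.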
Async RPC Embrace Async DataAccess
Async Core Value
High Performance
• Low latency + high throughput
• Low system load
• SLA isolation
• Better understanding of what contributes to performance
Easy Adoption
• Zero code change + zero release (new case on-board)
• Minimal integration effort for new DB storage
• Lego-style customization
• Highly reusable functionality
Cost Saving
• Less hardware investment
• Loose constraints on hardware/VM SKU
High Flexibility Configuration
• Execution chain per URL (RPC)
• DataAccess storage & options [consistency & TTL]
• Traffic routing strategy
• Replication strategy
Async Family
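[Diagram: members of the Async family and the technologies shown with them]
• Async Data Access: In-Memory, Aerospike, HBase
• Async RPC (Server/Client): Netty
• Async Workflow
• Async Messaging (pub-sub): Kafka, ActiveMQ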
Future Plan
Continuous Performance Tuning Deep Dive
• Shared event loop
• Netty option (IO ratio)
• NIO vs. Epoll SocketChannel
• JDK SSL vs. OpenSSL
• Protobuf vs. Msgpack
• Sync client vs. async client
• With/without monitoring/replication features
Async DataAccess
• Compute operation support
• DB server-side UDF adoption
• Smart client for direct & service access
• Async HBase integration
Async RPC
• Finer-granularity monitoring & throttling
• Error handling injection
• Client-side multiplexing
• Server pushes partial responses + RPC client consolidates the response
Async+Sync Hybrid Workflow Execution
Open Source in Year 2019

Editor's notes

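  1. DAL Service: controls the connection pool. Centralized control and high reusability (easy storage migration, including non-backward-compatible migration, plus throttling and ACL control) => minimizes client upgrade and integration effort. Seamless storage switch and upgrade.
  2. Controls the connection pool. Centralized control and high reusability (easy storage migration, including non-backward-compatible migration, plus throttling and ACL control). Minimizes client upgrade and integration effort.
  3. GC issues. Lock contention (non-blocking). Thread switches and context switches. IO blocking. Cache line refresh/cache misses. IPC => instructions per cycle.
  4. Use cases: TTL/timeout, ACL, replication, traffic strategy.
  5. Leverage OS-supported event-driven notification: Windows IOCP, Linux epoll, and macOS kqueue. Spend CPU cycles only on inbound and outbound handling. Use short-lived thread tasks for better thread usage. Client threads are not blocked waiting for the downstream storage response, which reduces the impact on client system resource usage. Epoll is not responsible for the IO operations themselves: it only tells you that a channel is currently readable or writable and fills the protocol read/write buffers, leaving the reads and writes under user control, so we can perform many additional operations at that point. IOCP, by contrast, completes all read and write operations in the IO channel before notifying the user; when blocking or similar conditions occur in the IO channel, we cannot control them.
  6. Reactor pattern: the Boss Thread synchronously takes incoming request events and, using a multiplexing dispatch strategy, quickly hands them to the corresponding Worker Thread handler. Callback events from the underlying data store notify the subsequent response processing. ** Asynchronous operation: completion is signaled by notification, so no polling is needed. Non-blocking: concerns whether the caller waits for the operation's result (whether a value is returned immediately). The callback event triggers the subsequent RPC channel flush, which returns the result to the client.
  7. Under the same throughput.
  8. Async applies at the platform and framework level; business logic cannot easily adopt the async pattern. Use off-heap: schema-less for inbound and outbound. Release request memory: retries do not happen in the DAL service.
  9. Aerospike: high write performance and SSD-specific optimization => 1M TPS with P99 <1 ms. DRAM/SSD hybrid solution. High ATB and scalability | local replication and XDR. Aerospike VLDB 2016 paper.
  10. Batch & retry. Traffic routing & HA. ACL & multi-tenancy.
  11. A performance-oriented, reliable, end-to-end asynchronous service access framework. Flexible support for enterprise-grade requirements. Data access. Configurable. High performance. Asynchronous RPC access.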