Skype’s Journey From P2P:
It’s Not Just About the Services
Bruce Lowekamp
People and Connections
June 27, 2018
Microsoft’s Intelligent Conversations and
Communications Cloud (IC3)
Powering Skype, Teams, and O365
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
skype-p2p
Presented at QCon New York
www.qconnewyork.com
Skype History
• First released in 2003
• P2P, based on Global Index originally used for KaZaa file sharing
• Chat, audio, video, file sharing, contact invites all over P2P
• Acquired by Microsoft in 2011
• Supernodes moved to datacenters
• Chat moved to (evolution of) Messenger chat service
• Calling, file-transfer, contacts, etc moved to new services
• P2P network officially being decommissioned in Fall 2018
Outline
• Original P2P architecture
• P2P compared to Modern service architecture
• Why not P2P?
• Migrating from old to new architectures
• Doing it well: Experimentation at massive client scale
Skype P2P Architecture
P2P Network formed by clients
Backend team running mostly DB-based services
Shared Library with clients (data structures, etc)
Services were a thin shim on top of sharded PostgreSQL
PgBouncer: transparently sharded stored procedures
LUX + DUB
P2P Contact Invites
Search for users across SNs
Send invite (signed) to target via P2P
Receive signed ack with secret.
Update local and feed to other nodes
Lazy sync to backend
High Availability in P2P
P2P Network implements HA
• Invites easily sent when both clients online
Backend forwards P2P invite
• When invitee offline
Operation completed by clients
Changes to contact list lazily synced to DB
CAP Theorem
• P2P Network is AP
• BE DBs are CP
Breaking apart P2P Contact Changes
“Changes lazily synced to DB”
Sequence of changes sent to clients and DB
DB syncs to clients
Eventually the DB and all clients see the same result
[Diagram labels: AP; CP]
CRDT?
“J” in JCS was for Journaled
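The journaled approach above can be sketched roughly as follows. This is a hypothetical illustration, not the actual JCS format: the `Change` fields (`seq`, `origin`) and the `Replica` class are assumptions made for the example. The idea shown is that if every replica keeps the same set of journal entries and replays them in the same deterministic order, the DB and all clients end up with the same contact list, no matter in which order the entries arrived.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    seq: int       # hypothetical sequence number of the change
    origin: str    # hypothetical id of the replica that produced it (tie-breaker)
    op: str        # "add" or "remove"
    contact: str


@dataclass
class Replica:
    journal: set = field(default_factory=set)

    def apply(self, change: Change) -> None:
        # Storing entries in a set makes redelivery of the same change harmless.
        self.journal.add(change)

    def sync_from(self, other: "Replica") -> None:
        # Lazy sync: take every journal entry the other side has seen.
        self.journal |= other.journal

    def contacts(self) -> set:
        # Replay in one deterministic total order, so equal journals
        # always produce equal contact lists.
        result = set()
        for change in sorted(self.journal, key=lambda c: (c.seq, c.origin)):
            if change.op == "add":
                result.add(change.contact)
            elif change.op == "remove":
                result.discard(change.contact)
        return result
```

For example, a client that adds two contacts and a DB that records a later removal converge once each has synced from the other, in either order.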
Distributed Service vs P2P Architecture
[Diagram labels: Clients; Service; Distributed DB; Storage; Contacts (x4); CP (x2)]
Why not P2P?
Desktop apps no longer dominant
Servers cheap
Need to support mobile
Offline messaging, suggestions, server-side search, browser state
Business logic (and service implementation) in clients, not services
Can still do P2P media and E2E encryption in service-based systems
Migrations
Supernodes->Dedicated Supernodes->Trouter
Chat: P2P -> P2P+Griffin -> Messenger -> New Chat Service
Contacts: CBL->JCS->ABCH->PCS->EXO
Calling: P2P -> NGC
Login: Skype -> MSA
Dual-head vs Gateway
[Diagram labels: Contacts; P2P Calling; New Calling; P2P Calling; New Contacts; Contacts Service; Contacts Gateway]
Dual-Stack: Calling and Chat
[Diagram labels: P2P Calling; New Calling; P2P Calling; Calling Service; "Call Alice" (x2); Chat Service; New Chat; messages "Alice: Hi Bob!" and "Bob: Hi Alice!" (x2)]
Technology gateway: Dual-headed with Help
P2P requires clients running continuously
Mobile devices don’t…
[Diagram labels: P2P Calling; P2P GW; P2P Calling; New Calling; "Call Alice" (x2); Push Notifications]
Gateway for Contact migrations
[Diagram labels: Contacts; New Contacts; Contacts Service; Contacts Gateway; Contacts (x2)]
Migration 1: Move Contact Data
Migration 2: Update Client
Contact migrations
[Diagram labels: Contacts; New Contacts; Contacts Service; Contacts Gateway; "Get Contacts"; "Get Contacts?" (x2)]
Flags: Migration in Progress, Migrated
Flag: Is Master
Write Blobs to Cache
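One way to picture the gateway's role during a contact migration is the sketch below. It is a simplified assumption, not the actual Skype gateway: the `MigrationState` values mirror the "Migration in Progress" and "Migrated" flags on the slide, while the class, its methods, the in-memory stores, and the choice to reject writes during a move are all hypothetical.

```python
from enum import Enum


class MigrationState(Enum):
    NOT_MIGRATED = "not_migrated"
    IN_PROGRESS = "migration_in_progress"
    MIGRATED = "migrated"


class ContactsGateway:
    """Hypothetical gateway routing each user to the old or new contacts store."""

    def __init__(self, old_store: dict, new_store: dict):
        self.old_store = old_store   # stands in for the old contacts service
        self.new_store = new_store   # stands in for the new contacts service
        self.state = {}              # user -> MigrationState

    def _state(self, user: str) -> MigrationState:
        return self.state.get(user, MigrationState.NOT_MIGRATED)

    def get_contacts(self, user: str) -> list:
        # Reads go to whichever store is currently the master for this user.
        if self._state(user) is MigrationState.MIGRATED:
            return list(self.new_store.get(user, []))
        return list(self.old_store.get(user, []))

    def add_contact(self, user: str, contact: str) -> None:
        state = self._state(user)
        if state is MigrationState.IN_PROGRESS:
            # Assumption: writes are rejected while data is being moved.
            raise RuntimeError("migration in progress; retry later")
        store = self.new_store if state is MigrationState.MIGRATED else self.old_store
        store.setdefault(user, []).append(contact)

    def migrate(self, user: str) -> None:
        # Migration 1 on the slide: move the contact data, then flip the flag.
        self.state[user] = MigrationState.IN_PROGRESS
        self.new_store[user] = list(self.old_store.get(user, []))
        self.state[user] = MigrationState.MIGRATED
```

Because clients only talk to the gateway, the data can move without a client update; the client can be switched to the new service later as a separate step.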
When to migrate?
[Diagram labels: Contacts; New Contacts; Contacts Service; Contacts Gateway; Contacts]
Migration 1: Move Contact Data
Migration 2: Update Client
Need for Online Experimentation
Even objective metrics are a function of
• Product quality
• Seasonal/weekly effects
• User population
• Device population
• Usage scenario
These aren’t stable across new client releases
Need robust online experimentation to separate
new calling implementation from other factors.
[Chart annotations: Early Adopter Bias; Seasonality, Overall Trends]
Experimentation – When to use A/B Testing?
When to use A/B testing:
• Making a data-driven decision about the impact of a change in the product
How is A/B testing different from "monitoring metrics before and after a change":
• A/B testing is the only valid method to draw causal inference – i.e. the changes in metric behavior cannot
be attributed to any particular change in code unless in a randomized treatment assignment (A/B testing)
setting
Why set up automated scorecards vs manually aggregating data into test statistics:
• To make sure the results are trustworthy – it is easy to be misled by data!
• To scale the experimentation so you don’t need a data scientist for every single experiment analysis
• To have a standard procedure that controls the rates of false positives and false negatives in the long run across the entire org
First step for getting started on experimentation:
• Data!
• Decide which metrics will be used to track improvements - they should be aligned with the T0 KPIs of the org
• Make the data available for querying with experimentation labels (e.g. knowing which treatment group each call fell into)
• Link data to a validated scorecard
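To make the scorecard idea concrete, here is a minimal sketch of one test statistic a scorecard might report for a ratio metric such as call success rate. It is a textbook two-proportion z-test, chosen for illustration; the slides do not say which statistical tests the actual scorecards use, and the function name and parameters are assumptions.

```python
import math


def two_proportion_z_test(success_a: int, total_a: int,
                          success_b: int, total_b: int):
    """Compare a success ratio between control (a) and treatment (b).

    Returns (delta, z, two_sided_p_value). Assumes both groups are
    non-empty and the pooled ratio is strictly between 0 and 1.
    """
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value
```

Running the same procedure automatically for every metric and every experiment is what keeps the false positive rate controlled across the org, rather than relying on ad hoc analysis per experiment.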
Experimentation Lifecycle
Experimentation Lifecycle, Client Edition
Experimentation Requirements
Many teams
• Self-service
• Structured Config
Configuration-centric
• Long-lived clients know what, not why
High-quality scorecards
• A&E Experimentation team evolved out of Bing
Experimentation and Configuration Service (ECS) was built to address
the flighting and configuration portion of experimentation.
Configuration-Centric View
Straightforward approach gives the client configuration describing its
situation, and client decides what to do.
ECS
Application
Presents Client Context
Relevant Configurations
ClientLib
ConfigValueA =
    ClientLib.GetSettings("Shutdown.A") ??
    ClientLib.GetSettings("Region.A") ??
    ClientLib.GetSettings("Rollout.A")
Configuration-Centric View
But reasons to change behavior interact
Resolving these collisions manually and statically is not scalable
IF Ver>2.0 && 80% THEN A=5
IF Version>1.0 THEN A=3
IF Country=Australia THEN A=4
IF Shutdown THEN A=0
IF NOT Shutdown AND Country != Australia AND Version>2.0 && 80% THEN A=5
IF NOT Shutdown AND Country != Australia AND !(Version > 2.0 && 80%) AND Version>1.0 THEN A=3
IF NOT Shutdown AND Country=Australia THEN A=4
IF Shutdown THEN A=0
Configuration-Centric View
...becomes a Live-site issue
What if the Australia setup needs to be turned off? It is more
manageable to disable the precise setup
IF Ver>2.0 && 80% THEN A=5
IF Version>1.0 THEN A=3
IF Country=Australia THEN A=4
IF Shutdown THEN A=0
IF NOT Shutdown AND Country != Australia AND Version>2.0 && 80% THEN A=5
IF NOT Shutdown AND Country != Australia AND !(Version > 2.0 && 80%) AND Version>1.0 THEN A=3
IF NOT Shutdown AND Country=Australia THEN A=4
IF Shutdown THEN A=0
Configuration-Centric View
Applications are made to be Configurable
Applications should only be concerned with What they are configured to do, not Why
ECS
Application
Presents Client Context
Relevant Configurations
ClientLib
ConfigValueA = ClientLib.GetSettings("A")
Configuration-Centric View
And there will be many reasons to configure
As the number of reasons scales, the reasons will collide
Need Tie-breaking Rules
ECS
Application
Presents Client Context
Relevant Configurations
ClientLib
ConfigValueA = ClientLib.GetSettings("A")
Many Reasons to Configure:
• Experimentation
• Feature Rollouts to X%
• Regional Settings
• Exposure to User/Tenants
(Murphy/Rings)
• Live-site assistance
• Traffic Routing
• Sampling
• Any combinations (e.g. 5% of
Ring 2 in Europe)
• and many more…
Configuration-Centric View
ECS configuration approach is to provide a set of Tie-breaking rules
for users, but let the service resolve the collision dynamically
ECS
Application
Presents Client Context
Relevant Configurations
ClientLib
ConfigValueA = ClientLib.GetSettings("A")
Value of A
INPUT: Version = 3.0, Country = US,
Shutdown = false, UserID=myuser
OUTPUT: 5
IF Ver>2.0 and 80% THEN A=5
IF Version>1.0 THEN A=3
IF Country=Australia THEN A=4
IF Shutdown THEN A=0
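The four rules above can be resolved dynamically with a simple priority order, as in the sketch below. The order (Shutdown first, then Country, then the 80% rollout, then the version default) is taken from the manually expanded rules shown earlier in the deck; everything else is an assumption for illustration, including the `in_rollout` hashing scheme, the salt string, and the context field names. It is not the ECS implementation.

```python
import hashlib


def in_rollout(user_id: str, percent: int, salt: str = "A-rollout") -> bool:
    # Hypothetical bucketing: hash the salted user id onto 0..99.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


# Rules in priority order: the first rule whose predicate matches wins.
RULES = [
    (lambda c: c["shutdown"], 0),
    (lambda c: c["country"] == "Australia", 4),
    (lambda c: c["version"] > 2.0 and in_rollout(c["user_id"], 80), 5),
    (lambda c: c["version"] > 1.0, 3),
]


def resolve_a(context: dict, default=None):
    """Return the value of A for this client context, or default if no rule matches."""
    for predicate, value in RULES:
        if predicate(context):
            return value
    return default
```

With this shape, turning off the Australia setup means removing or disabling one rule; none of the other rules has to be rewritten, which is the point of letting the service break ties instead of expanding the conditions by hand.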
Configuration-Centric View
ECS
Application
Presents Client Context
Relevant Configurations
ClientLib
ConfigValueA = ClientLib.GetSettings("A")
Experiment
Rollout
Ring-Based
Sampling
Configuration
Prioritization (Config Merge, Layer Order, Priority Order)
Shutdown
Default
External
……
No Client-Service Contract Change
Example: Configuration with Rings
• Decoupling who the user is from how the application is configured
• No Client-Server contract change. No Mobile re-deployment for Rings
Application
ECS
Resolves User
to Ring X
Presents Client Context (UserID, TenantID)
Relevant Configurations for Ring X
ClientLib
+Cache
Ring
Definition
(ECS)
Ring
Definition
(Partner)
Translate to
empower
ECS Ring Filters
ConfigID
Identify each experiment, rollout, default
Needed for debugging and analysis
<Type-ExpID-TreatID-Iteration>
Configuration Merge
Experiment Config
"SkypeAndroid": {
    "ShortCircuit": true
}
Rollout Config
"SkypeAndroid": {
    "PhoneVerification": false,
    "ShortCircuit": false
}
Merged Config
"SkypeAndroid": {
    "ShortCircuit": true,
    "PhoneVerification": false
}
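The merge shown above can be sketched as a recursive dictionary merge in which later (higher-priority) configs override earlier ones key by key. The slide's result implies the experiment config wins over the rollout config; the function names and the low-to-high argument order are assumptions for this example, not the ECS API.

```python
import copy


def _merge_into(target: dict, source: dict) -> None:
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides have a nested section: merge key by key.
            _merge_into(target[key], value)
        else:
            # Higher-priority value replaces (or introduces) the key.
            target[key] = copy.deepcopy(value)


def merge_configs(*configs_low_to_high: dict) -> dict:
    """Merge configs; arguments are ordered from lowest to highest priority."""
    merged: dict = {}
    for config in configs_low_to_high:
        _merge_into(merged, config)
    return merged


rollout = {"SkypeAndroid": {"PhoneVerification": False, "ShortCircuit": False}}
experiment = {"SkypeAndroid": {"ShortCircuit": True}}
merged = merge_configs(rollout, experiment)
# merged == {"SkypeAndroid": {"PhoneVerification": False, "ShortCircuit": True}}
```

The client receives only the merged result, so it sees one value for `ShortCircuit` and never needs to know that an experiment overrode a rollout.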
ETag
ETag is a hash of the set of ConfigIDs being served
ETag-ConfigID mapping is forwarded to data pipeline by ECS service
Client Telemetry is logged with the ETag
Data Analysis to associate telemetry with an iteration of the treatment
• Client Telemetry.Etag
• Data Analysis.ConfigID
• Service log: Etag-ConfigID Map
Also useful for debugging client implementation
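The ETag flow above can be sketched as follows. The slides only say that the ETag is a hash of the set of ConfigIDs; the specific choices here (SHA-256, sorting, truncation to 16 hex characters, the in-memory map standing in for the service log, and the example ConfigID strings) are assumptions for illustration.

```python
import hashlib

# Stands in for the ETag-ConfigID map that the service forwards to the data pipeline.
etag_to_config_ids: dict = {}


def compute_etag(config_ids) -> str:
    # Sort and de-duplicate so the same set of ConfigIDs always yields the same ETag.
    canonical = ",".join(sorted(set(config_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def serve(config_ids) -> str:
    """Record the mapping and return the ETag sent to the client with its config."""
    etag = compute_etag(config_ids)
    etag_to_config_ids[etag] = sorted(set(config_ids))
    return etag


def config_ids_for(telemetry_event: dict) -> list:
    """Data analysis side: recover the ConfigIDs behind a telemetry event's ETag."""
    return etag_to_config_ids.get(telemetry_event["etag"], [])


# Hypothetical ConfigIDs in the <Type-ExpID-TreatID-Iteration> shape.
etag = serve(["EXP-1001-T1-3", "ROLLOUT-2002-T0-1"])
event = {"metric": "call_setup_ms", "value": 840, "etag": etag}
```

The client only has to attach one short ETag to each telemetry event; the analysis pipeline expands it into the full list of experiments, rollouts, and defaults that were active.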
Impression-based vs Sticky
Conventional experimentation:
• Numberline assigns user to experiment. Experiment is sticky.
• Analyze impact over time
What if your experiment is more risky?
• Next-gen code frequently known not to be better (yet)
• Still need to run a real-world experiment
• “Impression-based”: assign at random on each fetch
• No one gets broken experience for more than an hour/restart
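The difference between the two assignment styles can be sketched as below. The hashing scheme, the 1000-slot numberline, and the function names are assumptions for illustration, not the ECS implementation; the per-layer salt echoes the "uniquely salted numberline" mentioned later in the deck.

```python
import hashlib
import random


def numberline_position(user_id: str, layer_salt: str) -> int:
    # Hash the salted user id onto a 0..999 numberline; stable for a given user and layer.
    digest = hashlib.sha256(f"{layer_salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 1000


def sticky_assignment(user_id: str, layer_salt: str, treatment_permille: int) -> str:
    """Sticky: the same user always lands in the same group for this layer."""
    if numberline_position(user_id, layer_salt) < treatment_permille:
        return "treatment"
    return "control"


def impression_assignment(treatment_fraction: float, rng=random) -> str:
    """Impression-based: a fresh random draw on every config fetch."""
    return "treatment" if rng.random() < treatment_fraction else "control"
```

With sticky assignment a user in a bad treatment stays there for the life of the experiment; with impression-based assignment the next fetch redraws, which is what bounds the time anyone spends on a broken experience.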
Importance of Scorecards
Changes in important metrics
ALL metrics, not just those targeted by the experiment
P-values of changes, to confirm they were caused by the experiment
Unanswered call UX experiment
- Higher ratio of established calls
- BUT, Call Drop Ratio is up by 0.07% overall, caused by PSTN
drops
Likely explanation: retrying a failed call on PSTN isn’t useful on a bad
network
Experiments can have unexpected consequences on other scenarios.
A scorecard capturing important metrics across all scenarios is
needed to find unintended consequences.
[Chart label: PSTN Calls only]
ECS Today
Scale (as of 6/8/18)
479 Project Teams
Currently running:
Experiments 388
Rollouts 2.74K
Defaults 701
12.69K Complex Configs
3.83K layers (uniquely salted numberline)
~140K RPS (daily peak)
Used by Skype & Teams clients and services. Most Office apps, etc…
Lessons from Skype’s evolution
P2P
• Architecture is different, but same HA principles can be achieved
• Solved problems originally, but became a bottleneck over time
Migrations
• Plan config support for your next migration in advance
• Pick strategy based on complexity of transition
Experimentation
• Migration (and other changes) require robust experimentation
• Don’t bake in experiments: What not Why!
Acknowledgements
Many, many people at Skype and Microsoft built the systems described
here and implemented the strategies to migrate users to newer systems.
Special thanks to Eric Lau, Michael Rubin, Daniel Schneider, and the ECS
team. The E2E experimentation pipeline includes major components
developed by the Aria, A&E EXP, IC3 Media, and other partner teams.
bruce.lowekamp@skype.net
https://linkedin.com/in/brucelowekamp

Skype's Journey from P2P: It's Not Just about the Services

  • 1. Skype’s Journey From P2P: It’s Not Just About the Services Bruce Lowekamp People and Connections June 27, 2018 Microsoft’s Intelligent Conversations and Communications Cloud (IC3) Powering Skype, Teams, and O365
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ skype-p2p
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. Skype History • First released in 2003 • P2P, based on Global Index originally used for KaZaa file sharing • Chat, audio, video, file sharing, contact invites all over P2P • Acquired by Microsoft in 2011 • Supernodes moved to datacenters • Chat moved to (evolution of) Messenger chat service • Calling, file-transfer, contacts, etc moved to new services • P2P network officially being decommissioned in Fall 2018
  • 5. Outline • Original P2P architecture • P2P compared to Modern service architecture • Why not P2P? • Migrating from old to new architectures • Doing it well: Experimentation at massive client scale
  • 6. Skype P2P Architecture P2P Network formed by clients Backend team running mostly DB-based services Shared Library with clients (data structures, etc) Services were thin shim on top of sharded PG SQL PG bouncer: Transparently sharded stored procedures LUX + DUB
  • 7. P2P Contact Invites Search for users across SNs Send invite (signed) to target via P2P Receive signed ack with secret. Update local and feed to other nodes Lazy sync to backend
  • 8. High Availability in P2P P2P Network implements HA • Invites easily sent when both clients online Backend forwards P2P invite • When invitee offline Operation completed by clients Changes to contact list lazily synced to DB AP CP CAP Theorem • P2P Network is AP • BE DBs are CP
  • 9. Breaking apart P2P Contact Changes “Changes lazily synced to DB” Sequence of changes sent to clients and DB DB syncs to clients Eventually all DB and clients see same result AP CP CRDT? “J” in JCS was for Journaled
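The journaled sync on slide 9 can be illustrated with a minimal sketch. It assumes each contact change carries a sequence number and that every replica (client or DB) applies the journal in order. The names `Change`, `Replica`, and `apply` are hypothetical and not taken from JCS. This is a replicated-log illustration, not a CRDT, and not the actual Skype implementation.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass(frozen=True)
class Change:
    seq: int       # position in the journal (assumed to be assigned by one writer)
    op: str        # "add" or "remove"
    contact: str


@dataclass
class Replica:
    """A client or the backend DB; both apply the same journal."""
    contacts: Set[str] = field(default_factory=set)
    applied_seq: int = 0

    def apply(self, journal: Iterable[Change]) -> None:
        for change in sorted(journal, key=lambda c: c.seq):
            if change.seq <= self.applied_seq:
                continue  # already applied; replaying is harmless
            if change.seq != self.applied_seq + 1:
                break     # gap in the journal; wait for the missing entries
            if change.op == "add":
                self.contacts.add(change.contact)
            elif change.op == "remove":
                self.contacts.discard(change.contact)
            self.applied_seq = change.seq


journal = [
    Change(1, "add", "bob"),
    Change(2, "add", "carol"),
    Change(3, "remove", "bob"),
]

client, db = Replica(), Replica()
client.apply(journal[2:])   # arrives early: ignored until the gap is filled
client.apply(journal[:2])   # partial delivery
client.apply(journal)       # late full sync fills in the rest
db.apply(journal)
db.apply(journal)           # duplicate delivery changes nothing
```

Because entries are applied in sequence order and replays are skipped, the client and the DB end up with the same contact list regardless of how delivery was batched.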
  • 10. Distributed Service vs P2P Architecture Clients Service Distributed DB Storage Contacts Contacts Contacts Contacts CP CP
  • 11. Why not P2P? Desktop apps no longer dominant Servers cheap Need to support mobile Offline messaging, suggestions, server-side search, browser state Business logic (and service implementation) in clients, not services Can still do P2P media and E2E encryption in service-based systems
  • 12. Migrations Supernodes->Dedicated Supernodes->Trouter Chat: P2P -> P2P+Griffin -> Messenger -> New Chat Service Contacts: CBL->JCS->ABCH->PCS->EXO Calling: P2P -> NGC Login: Skype -> MSA
  • 14. Dual-Stack: Calling and Chat P2P Calling New Calling P2P Calling Calling Service Call Alice Call Alice Chat Service Alice: Hi Bob! Bob: Hi Alice! New Chat Alice: Hi Bob! Bob: Hi Alice!
  • 15. Technology gateway: Dual-headed with Help P2P requires clients running continuously Mobile devices don’t… P2P Calling P2P GW P2P Calling New Calling Call Alice Call Alice Push Notifications
  • 16. Gateway for Contact migrations Contacts New Contacts Contacts Service Contacts Gateway Contacts Contacts Migration 1 Move Contact Data Migration 2 Update Client
  • 19. Need for Online Experimentation Even objective metrics are a function of • Product quality • Seasonal/weekly effects • User population • Device population • Usage scenario These aren’t stable across new client releases Need robust online experimentation to separate new calling implementation from other factors. [Chart annotations: Early Adopter Bias; Seasonality, Overall Trends]
  • 20. Experimentation – When to use A/B Testing? When to use A/B testing: • Making a data-driven decision about the impact of a change in the product How is A/B testing different from "monitoring metrics before and after a change": • A/B testing is the only valid method to draw causal inference – i.e. changes in metric behavior cannot be attributed to any particular change in code except in a randomized treatment assignment (A/B testing) setting Why set up automated scorecards vs manually aggregating data into test statistics: • To make sure the results are trustworthy – it is easy to be misled by data! • To scale the experimentation so you don’t need a data scientist for every single experiment analysis • To have a standard procedure that controls the rates of false positives/negatives in the long run over the entire org First step for getting started on experimentation: • Data! • Decide which metrics are to be used for tracking the improvements - they should be aligned with T0 KPIs of the org • Make the data available for querying with experimentation labels (e.g. knowing which treatment each call fell into) • Link data to a validated scorecard
  • 23. Experimentation Requirements Many teams • Self-service • Structured Config Configuration-centric • Long-lived clients know what, not why High-quality scorecards • A&E Experimentation team evolved out of Bing Experimentation and Configuration Service (ECS) was built to address the flighting and configuration portion of experimentation.
  • 24. Configuration-Centric View Straightforward approach gives the client configuration describing its situation, and client decides what to do. ECS Application Presents Client Context Relevant Configurations ClientLib ConfigValueA = ClientLib.GetSettings(“Shutdown.A”) ?? ClientLib.GetSettings(“Region.A”) ?? ClientLib.GetSettings(“Rollout.A”)
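The `??` chain on slide 24 amounts to a client-side fallback: the client asks for each reason-specific key in turn and takes the first one that is set. A minimal Python sketch, assuming the settings arrive as a plain dictionary (the function names `get_setting` and `resolve_a` are hypothetical stand-ins, not the ClientLib API):

```python
from typing import Mapping, Optional


def get_setting(settings: Mapping[str, int], key: str) -> Optional[int]:
    """Hypothetical stand-in for ClientLib.GetSettings: None when the key is unset."""
    return settings.get(key)


def resolve_a(settings: Mapping[str, int]) -> Optional[int]:
    # The client hard-codes both the reasons and their precedence.
    for key in ("Shutdown.A", "Region.A", "Rollout.A"):
        value = get_setting(settings, key)
        if value is not None:  # like ??, a value of 0 still counts as set
            return value
    return None
```

The precedence order lives in shipped client code here, which is the property the following slides argue against: adding a new reason to configure requires a client release.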
  • 25. Configuration-Centric View But reasons to change behavior interact Resolving these collisions manually and statically is not scalable IF Ver>2.0 && 80% THEN A=5 IF Version>1.0 THEN A=3 IF Country=Australia THEN A=4 IF Shutdown THEN A=0 IF NOT Shutdown AND Country != Australia AND Version>2.0 && 80% THEN A=5 IF NOT Shutdown AND Country != Australia AND !(Version > 2.0 && 80%) AND Version>1.0 THEN A=3 IF NOT Shutdown AND Country=Australia THEN A=4 IF Shutdown THEN A=0
  • 26. Configuration-Centric View ...becomes a Live-site issue What if the Australia setup needs to be turned off? It is more manageable to disable the precise setup IF Ver>2.0 && 80% THEN A=5 IF Version>1.0 THEN A=3 IF Country=Australia THEN A=4 IF Shutdown THEN A=0 IF NOT Shutdown AND Country != Australia AND Version>2.0 && 80% THEN A=5 IF NOT Shutdown AND Country != Australia AND !(Version > 2.0 && 80%) AND Version>1.0 THEN A=3 IF NOT Shutdown AND Country=Australia THEN A=4 IF Shutdown THEN A=0
  • 27. Configuration-Centric View Applications are made to be Configurable Applications should only be concerned with What they should be configured to, not Why ECS Application Presents Client Context Relevant Configurations ClientLib ConfigValueA = ClientLib.GetSettings(“A”)
  • 28. Configuration-Centric View And the reasons to configure will be many As the number of reasons scales, the reasons will collide Need Tie-breaking Rules ECS Application Presents Client Context Relevant Configurations ClientLib ConfigValueA = ClientLib.GetSettings(“A”) Many Reasons to Configure: • Experimentation • Feature Rollouts to X% • Regional Settings • Exposure to User/Tenants (Murphy/Rings) • Live-site assistance • Traffic Routing • Sampling • Any combinations (e.g. 5% of Ring 2 in Europe) • and many more…
  • 29. Configuration-Centric View ECS configuration approach is to provide a set of Tie-breaking rules for users, but let the service resolve the collision dynamically ECS Application Presents Client Context Relevant Configurations ClientLib ConfigValueA = ClientLib.GetSettings(“A”) Value of A INPUT: Version = 3.0, Country = US, Shutdown = false, UserID=myuser OUTPUT: 5 IF Ver>2.0 and 80% THEN A=5 IF Version>1.0 THEN A=3 IF Country=Australia THEN A=4 IF Shutdown THEN A=0
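Dynamic collision resolution can be sketched as a priority-ordered rule list evaluated by the service, so the client only ever asks for `A`. The priority order below (Shutdown, then Australia, then the 80% rollout, then the version baseline) is inferred from the expanded rules on slide 25. The `Context`, `Rule`, `resolve`, and `disable` names are hypothetical, and the rollout bucket is passed in directly rather than derived from the user ID; none of this is the ECS implementation.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Context:
    version: float
    country: str
    shutdown: bool
    bucket: int  # 0..99; assumed to come from a hash of the user ID


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int  # lower number wins (assumed tie-breaking rule)
    matches: Callable[[Context], bool]
    value: int
    enabled: bool = True


RULES: List[Rule] = [
    Rule("shutdown", 0, lambda c: c.shutdown, 0),
    Rule("australia", 1, lambda c: c.country == "Australia", 4),
    Rule("rollout-80", 2, lambda c: c.version > 2.0 and c.bucket < 80, 5),
    Rule("baseline", 3, lambda c: c.version > 1.0, 3),
]


def resolve(ctx: Context, rules: List[Rule] = RULES) -> Optional[int]:
    """Return the value of the highest-priority enabled rule that matches."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.enabled and rule.matches(ctx):
            return rule.value
    return None


def disable(rules: List[Rule], name: str) -> List[Rule]:
    """Turn off one rule without touching the others."""
    return [replace(r, enabled=False) if r.name == name else r for r in rules]
```

With the slide's input (Version 3.0, Country US, Shutdown false) and a user whose bucket falls inside the 80% rollout, `resolve` returns 5. Turning off the Australia setup is a single `disable(RULES, "australia")` call rather than an edit to every expanded condition.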
  • 30. Configuration-Centric View ECS Application Presents Client Context Relevant Configurations ClientLib ConfigValueA = ClientLib.GetSettings(“A”) Experiment Rollout Ring-Based Sampling Configuration Prioritization (Config Merge, Layer Order, Priority Order) Shutdown Default External ……
  • 31. No Client-Service Contract Change Example: Configuration with Rings • Decoupling who the user is from how the application is configured • No Client-Server contract change. No Mobile re-deployment for Rings Application ECS Resolves User to Ring X Presents Client Context (UserID, TenantID) Relevant Configurations for Ring X ClientLib +Cache Ring Definition (ECS) Ring Definition (Partner) Translate to empower ECS Ring Filters
  • 32. ConfigID Identify each experiment, rollout, default Needed for debugging and analysis <Type-ExpID-TreatID-Iteration>
  • 33. Configuration Merge Experiment Config "SkypeAndroid": { "ShortCircuit": true } Rollout Config "SkypeAndroid": { "PhoneVerification": false, "ShortCircuit": false } Merged Config "SkypeAndroid": { "ShortCircuit": true, "PhoneVerification": false }
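The merge on slide 33 can be reproduced with a small recursive merge in which higher-priority layers override lower-priority ones key by key. This sketch assumes the experiment config outranks the rollout config, which is consistent with the merged result shown on the slide; the function names are hypothetical.

```python
from typing import Any, Dict, List


def _deep_update(target: Dict[str, Any], source: Dict[str, Any]) -> None:
    """Copy source into target, descending into nested dictionaries."""
    for key, value in source.items():
        if isinstance(value, dict):
            child = target.get(key)
            if not isinstance(child, dict):
                child = target[key] = {}
            _deep_update(child, value)
        else:
            target[key] = value


def merge_configs(layers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge configs given highest priority first; inputs are not modified."""
    merged: Dict[str, Any] = {}
    # Apply the lowest priority first so later (higher) layers overwrite it.
    for layer in reversed(layers):
        _deep_update(merged, layer)
    return merged


experiment = {"SkypeAndroid": {"ShortCircuit": True}}
rollout = {"SkypeAndroid": {"PhoneVerification": False, "ShortCircuit": False}}

merged = merge_configs([experiment, rollout])
```

Here `merged` holds `ShortCircuit` from the experiment and `PhoneVerification` from the rollout, matching the slide's merged config.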
  • 34. ETag ETag is a hash of the set of ConfigIDs being served ETag-ConfigID mapping is forwarded to data pipeline by ECS service Client Telemetry is logged with the ETag Data Analysis to associate telemetry with an iteration of the treatment • Client Telemetry.Etag • Data Analysis.ConfigID • Service log: Etag-ConfigID Map Also useful for debugging client implementation
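A rough sketch of the ETag idea: hash the set of ConfigIDs being served, record the ETag-to-ConfigID mapping on the service side, and let analysis join client telemetry back to the ConfigIDs. The ConfigID layout follows the `<Type-ExpID-TreatID-Iteration>` format from slide 32. The choice of SHA-256, the truncation to 16 hex characters, and all function names and sample IDs are assumptions made for illustration; the slides do not say how ECS computes the hash.

```python
import hashlib
from typing import Dict, Iterable, List


def make_config_id(kind: str, exp_id: str, treat_id: str, iteration: int) -> str:
    """Build an ID in the <Type-ExpID-TreatID-Iteration> shape."""
    return f"{kind}-{exp_id}-{treat_id}-{iteration}"


def compute_etag(config_ids: Iterable[str]) -> str:
    """Hash the *set* of ConfigIDs, so order and duplicates do not matter."""
    canonical = "\n".join(sorted(set(config_ids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Service side: remember which ConfigIDs each ETag stands for.
etag_map: Dict[str, List[str]] = {}


def serve(config_ids: Iterable[str]) -> str:
    ids = sorted(set(config_ids))
    etag = compute_etag(ids)
    etag_map[etag] = ids  # this mapping is what gets forwarded to the data pipeline
    return etag


# Client side: telemetry only needs to carry the ETag.
served = [
    make_config_id("E", "1234", "T1", 2),  # sample experiment ID (made up)
    make_config_id("R", "987", "T0", 1),   # sample rollout ID (made up)
]
event = {"metric": "call_setup_ms", "value": 840, "etag": serve(served)}

# Analysis side: join the telemetry event back to its ConfigIDs.
configs_for_event = etag_map[event["etag"]]
```

Keeping only the ETag in client telemetry keeps events small while still letting analysis attribute each event to a specific iteration of a treatment.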
  • 35. Impression-based vs Sticky Conventional experimentation: • Numberline assigns user to experiment. Experiment is sticky. • Analyze impact over time What if your experiment is more risky? • Next-gen code frequently known not to be better (yet) • Still need to get real-world experiment • “Impression-based” assign at random each fetch • No one gets broken experience for more than an hour/restart
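The difference between sticky and impression-based assignment can be sketched as follows. Sticky assignment hashes the user ID with a per-layer salt onto a numberline, so the same user always lands in the same group. Impression-based assignment draws at random on every config fetch. The hashing scheme, bucket count, and function names are assumptions for illustration only.

```python
import hashlib
import random


def sticky_bucket(user_id: str, salt: str, buckets: int = 100) -> int:
    """Map a user to a stable position on a salted numberline."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def sticky_assign(user_id: str, salt: str, treatment_percent: int) -> str:
    """Same user and salt always give the same group."""
    if sticky_bucket(user_id, salt) < treatment_percent:
        return "treatment"
    return "control"


def impression_assign(treatment_percent: int, rng: random.Random) -> str:
    """Fresh random draw on every fetch, independent of who the user is."""
    if rng.random() * 100 < treatment_percent:
        return "treatment"
    return "control"


rng = random.Random()
first_fetch = impression_assign(50, rng)
second_fetch = impression_assign(50, rng)  # may differ from first_fetch
```

With impression-based assignment a user who draws a risky treatment is re-randomized at the next fetch, which bounds how long anyone stays on a possibly worse code path.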
  • 36. Importance of Scorecards Changes in important metrics ALL metrics, not just intended by experiment P-values of changes to confirm caused by experiment Unanswered call UX experiment - Higher ratio of established calls - BUT, Call Drop Ratio is up by 0.07% overall, caused by PSTN drops Likely explanation: retrying a failed call on PSTN isn’t useful on a bad network Experiments can have unexpected consequences on other scenarios. A scorecard capturing important metrics across all scenarios is needed to find unintended consequences. PSTN Calls only
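As an illustration of how a scorecard might attach a p-value to a change in a ratio metric such as call drop ratio, here is a two-sided two-proportion z-test using only the standard library. The slides do not say which statistical test the scorecards use, so the choice of test is an assumption, and the counts below are made-up numbers rather than data from the talk.

```python
import math


def two_proportion_p_value(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: both groups share the same underlying rate.

    x_* are event counts (e.g. dropped calls), n_* are totals (e.g. calls).
    """
    p_a = x_a / n_a
    p_b = x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / std_err
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up example: dropped calls out of total calls in control vs treatment.
p_value = two_proportion_p_value(x_a=100, n_a=10_000, x_b=200, n_b=10_000)
```

Running the same test for every metric on the scorecard, not only the one the experiment targets, is how a regression such as the PSTN call drop increase would surface.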
  • 37. ECS Today Scale (as of 6/8/18) 479 Project Teams Currently running: Experiments 388 Rollouts 2.74K Defaults 701 12.69K Complex Configs 3.83K layers (uniquely salted numberline) ~140K RPS (daily peak) Used by Skype & Teams clients and services. Most Office apps, etc…
  • 38. Lessons from Skype’s evolution P2P • Architecture is different, but same HA principles can be achieved • Solved problems originally, but became a bottleneck over time Migrations • Config support plans for your next migration in advance • Pick strategy based on complexity of transition Experimentation • Migration (and other changes) require robust experimentation • Don’t bake in experiments: What not Why!
  • 39. Acknowledgements Many, many people at Skype and Microsoft built the systems described here and implemented the strategies to migrate users to newer systems. Special thanks to Eric Lau, Michael Rubin, Daniel Schneider, and the ECS team. The E2E experimentation pipeline includes major components developed by the Aria, A&E EXP, IC3 Media, and other partner teams. bruce.lowekamp@skype.net https://linkedin.com/in/brucelowekamp