Why is scale important? 
[Chart: Usage and Difficulty by month, Jan to Dec; y-axis 0 to 80,000; annotation: “Missed opportunity”]
“Do things that don’t scale!” 
Permanent scaling need 
But scale if it’s on the way.
A tale of two startups 
(“Or how I spent 2013…”) 
Clipless 
Built to scale. 
v1 developed in 3 months. 
PR blast to TechCrunch, AndroidPolice, etc. 
led to 1700% month over month growth. 
Handling over 10,000 QPS. 
Acquired 3 months from launch. 
Shark Tank Startup 
Scaling ignored. 
v1 developed in 3 months. 
Reran on Shark Tank, service and website 
went down almost immediately. 
Still slow (but steady) growth.
What was different? 
Clipless (Tomcat, 1-3 Digital Ocean VMs) 
Load balanced, replicated servers and DBs. 
Well-written RESTful API, any server could 
answer any query. 
Multithreaded backend. 
Batched, asynchronous DB operations. 
Caching by locality and time. 
Queued network operations. 
S.T. Startup (Ruby on Rails, Heroku) 
No load balancing. 
Replicated DB via Heroku postgres. 
Not truly REST, backends kept state. 
Single-threaded backend (one request 
blocked entire Heroku dyno). 
Direct, blocking DB access. 
DB caching via ActiveRecord.
Potential Bottlenecks 
• Client resources 
• CPU 
• Memory 
• I/O 
• Server resources 
• Database resources 
• Open connections 
• Running queries 
• Network resources 
• Bandwidth 
• Connections / open sockets 
• Availability (esp. on Wifi / mobile networks)
Potential Bottlenecks 
• Client resources 
• CPU 
• Memory 
• I/O 
• Server resources 
• Database resources 
• Network resources 
• Bandwidth 
• Connections / open sockets 
• Availability (esp. on Wifi / mobile networks) 
Remedies for client resources: 
• Profile your algorithms 
• Crunch less data 
• Reuse more old work 
• Offload some processing to the server
Potential Bottlenecks 
• Client resources 
• Server resources 
• CPU 
• Memory 
• I/O 
• Database resources 
• Network resources 
• Bandwidth 
• Connections / open sockets 
• Availability (esp. on Wifi / mobile networks) 
Remedies for server resources: 
• Profile your algorithms 
• Crunch less data 
• Reuse more old work (across users) 
• Divide and Conquer (“shard”) 
• Spin up and balance more servers
Potential Bottlenecks 
• Client resources 
• Server resources 
• Database resources 
• Open connections 
• Running queries 
• Network resources 
• Bandwidth 
• Connections / open sockets 
• Availability (esp. on Wifi / mobile networks) 
Remedies for database resources: 
• Optimize your queries 
• Connection pooling 
• Add a second-level cache 
• Reuse more old work (across users) 
• Divide and Conquer (“shard”) 
• Batch DB requests 
• Spin up and replicate more DBs
Potential Bottlenecks 
• Client resources 
• Server resources 
• Database resources 
• Network resources 
• Bandwidth 
• Connections / open sockets 
• Availability (esp. on Wifi / mobile networks) 
Remedies for network resources: 
• Add a local cache 
• Send diffs 
• Compress responses (CPU tradeoff) 
• Connection pooling 
• Batch network requests
Profiling 
(Diagnosing the problem) 
Purpose: find the “hotspots” in your program. 
Things you care about: 
• “CPU time” – time spent processing your program’s instructions. 
• “Memory” – RAM being used to store your program’s data. 
• “Wall time” – overall time spent waiting for the program. 
• Methods: 
• Basic: “Stopwatch” 
• Advanced: Profiler (e.g. jprof, jprofiler, hprof, Netbeans, Visual Studio)
Stopwatch 
• Easy: just time methods. 
Matlab: 
function [result] = do_something_expensive(data) 
tic 
… 
toc 
end 
• In Java, use Guava’s Stopwatch class (start() and stop() methods).
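The stopwatch approach can be sketched in plain Java. This version assumes only the standard library and uses System.nanoTime() in place of Guava's Stopwatch, which wraps the same idea. The names StopwatchDemo, timeMillis, and doSomethingExpensive are hypothetical stand-ins, not part of the original deck.

```java
import java.util.concurrent.TimeUnit;

final class StopwatchDemo {
    // Hypothetical stand-in for the expensive method being measured.
    static long doSomethingExpensive(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Runs the task once and returns the elapsed wall time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> doSomethingExpensive(50_000_000));
        System.out.println("doSomethingExpensive took " + elapsed + " ms");
    }
}
```

A single run measures wall time only and is noisy, so it is usually worth timing several runs before trusting the number.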
Profiler
Strategies
Caching and Reuse 
“There are only two hard things in Computer Science: 
cache invalidation and naming things.” --Phil Karlton 
• Trades space for CPU: spend memory to avoid recomputing. 
• Look for repetition of input. 
(Including subproblems) 
• Compute a key from the input. 
• Associate the result with the key. 
• Important: algorithm must be a 
deterministic mapping from input 
to output. 
• Important: if you change what the 
algorithm depends on, update the 
cache key. 
Name: Alice 
Job: Developer 
Salary: 100,000 
<Alice, 
a@co.com> 
Cache
Computing a Cache Key 
• Hashing is a good strategy. 
• Objects.hash (JDK 7) / Objects.hashCode (Guava) 
• Beware: Hashes can collide – sanity check results! 
• Searching: 
• Hash data 
• Query cache for hash key. 
• If found, return associated value. 
• If not, query live service and store the result in the cache. 
<Alice, 
a@co.com> 
0xAF724…
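The search steps above could look roughly like this in Java. It is a minimal, single-threaded sketch: LookupCache, its Entry class, and the liveService function are hypothetical names, and the "live service" is simply whatever function the caller passes in. The input is stored next to the result so that a hash collision is detected rather than returned as a wrong answer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

final class LookupCache {
    // A cached value plus the input that produced it, kept for the sanity check.
    private static final class Entry {
        final String name;
        final String email;
        final String value;

        Entry(String name, String email, String value) {
            this.name = name;
            this.email = email;
            this.value = value;
        }
    }

    private final Map<Integer, Entry> cache = new HashMap<>();
    private final BiFunction<String, String, String> liveService;
    int liveCalls = 0; // counts cache misses, for illustration only

    LookupCache(BiFunction<String, String, String> liveService) {
        this.liveService = liveService;
    }

    String lookup(String name, String email) {
        int key = Objects.hash(name, email);       // 1. hash the input
        Entry hit = cache.get(key);                // 2. query the cache
        if (hit != null && hit.name.equals(name) && hit.email.equals(email)) {
            return hit.value;                      // 3. found, and the input really matches
        }
        liveCalls++;                               // 4. miss (or collision): ask the live service
        String value = liveService.apply(name, email);
        cache.put(key, new Entry(name, email, value));
        return value;
    }
}
```

This sketch never evicts or invalidates entries; a real cache would also need a size bound and an expiry policy.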
Concurrency 
Sequential programs run like this: 
Work Work Work 
Concurrent programs run like this: 
Work 
Work 
Work 
A lot of time 
Less time
Race Conditions 
Problem: Two threads can simultaneously write to the same variables. 
If you ran this code in two threads: 
if (x < 1) { x++; } 
Then x would usually end up at 1. 
But sometimes it would be 2! 
• Race conditions such as that one are among the hardest bugs to find + fix. 
• Three ways to manage this: 
• Immutability 
• Local state 
• Synchronization 
• Race conditions only happen when you write to shared, mutable state.
Immutability 
• General tip: try to minimize the number of states your program can end up in. 
• Concurrency 
• REST 
• (And your programs will just have less state, so you’ll produce fewer bugs) 
• Declare variables final where possible, set them in the constructor, and don’t write setters unless you must: 
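import com.google.common.base.Preconditions; // Guava 

public final class Bar { 
  // String is an immutable type - can’t change it at runtime. 
  // foo is an immutable variable - can’t reassign it. 
  private final String foo; 

  public Bar(String foo) { 
    this.foo = Preconditions.checkNotNull(foo); // reject null at construction time 
  } 
}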
Local State 
• Sometimes you need to modify state. 
• But you can still avoid locking if it’s only visible to you: 
• Two threads can write copies of same data. 
• Optionally, can be merged back in single thread afterwards. 
• (This is how MapReduce works) 
Java inner classes help tremendously with this! 
// Every time you run sendToNetwork, you’ll use a new channel. No shared state! 
void sendToNetwork() { 
  final Channel channel = new HttpChannel(context); 
  channel.connect(); 
  Thread foo = new Thread() { 
    @Override 
    public void run() { 
      channel.send("I am the jabberwocky"); 
    } 
  }; 
  foo.start(); // Start the worker; without this, run() never executes. 
}
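The "write copies, merge afterwards" idea can be sketched as a parallel sum. Each thread accumulates into its own local variable and publishes one result into a slot that no other thread writes, and the partial results are merged on the calling thread after join(). The class and method names here are hypothetical, and the sketch assumes threadCount is at least 1.

```java
import java.util.ArrayList;
import java.util.List;

final class LocalStateSum {
    // Sums the array using threadCount workers with no locks and no shared writes.
    static long parallelSum(int[] values, int threadCount) {
        long[] partials = new long[threadCount]; // slot i is written only by worker i
        List<Thread> workers = new ArrayList<>();
        int chunk = (values.length + threadCount - 1) / threadCount;

        for (int t = 0; t < threadCount; t++) {
            final int index = t;
            final int from = Math.min(values.length, t * chunk);
            final int to = Math.min(values.length, from + chunk);
            Thread worker = new Thread(() -> {
                long local = 0; // local state: visible to this thread only
                for (int i = from; i < to; i++) {
                    local += values[i];
                }
                partials[index] = local; // publish once, to a private slot
            });
            workers.add(worker);
            worker.start();
        }

        for (Thread worker : workers) {
            try {
                worker.join(); // after join(), the worker's writes are visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }

        long total = 0; // merge step, on a single thread
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }
}
```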
Synchronization 
• If you do need to write shared state, you need to synchronize access to it. 
• Last resort: slows your program and deadlock-prone. 
final Object lock = new Object(); // one lock instance, shared by every thread that touches x 
synchronized (lock) { 
  if (x < 1) { x++; } 
} 
Now x is always 1! No interruption possible between read and write. 
• More advanced: read/write locks (ReentrantReadWriteLock…) 
• Also check out Java “Atomic” classes and “concurrent” collections: 
• AtomicBoolean, AtomicInteger, … 
• ConcurrentHashMap…
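As an illustration of the "Atomic" classes, the check-then-increment from the race condition example can be done without a lock. This sketch assumes x starts at 0, so "if (x < 1) x++" becomes a single compareAndSet(0, 1). The class name IncrementOnce and the method runRace are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class IncrementOnce {
    // Starts threadCount threads that all try to bump x from 0 to 1, then returns x.
    static int runRace(int threadCount) {
        AtomicInteger x = new AtomicInteger(0);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            // compareAndSet reads and writes as one atomic step, so only one thread can succeed.
            threads[i] = new Thread(() -> x.compareAndSet(0, 1));
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for threads", e);
            }
        }
        return x.get();
    }

    public static void main(String[] args) {
        System.out.println("x = " + runRace(8));
    }
}
```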
Futures 
• Threads compute asynchronously. 
• Caller wants some way of knowing the result when it’s ready. 
• Future: handle to a result that may or may not be available yet. 
• future.get(): waits for a result and returns it, with optional timeout. 
• Futures allow for asynchronous calls to immediately return, and for the program to wait for the results when it’s convenient. 
• Also see Guava’s ListenableFuture. 
The usual pattern: 
ThreadPoolExecutor pool; // assumed to be created elsewhere 
Callable<String> action = new Callable<String>() { 
  @Override 
  public String call() throws NetworkException { 
    return askTheNetworkForMyString(); 
  } 
}; 
Future<String> result = pool.submit(action); // Returns immediately. 
String myString = result.get(); // Waits until the result is available. Throws ExecutionException if an exception was thrown inside the Callable.
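A self-contained variant of the same pattern, using only java.util.concurrent, might look like the sketch below. askTheNetworkForMyString is a hypothetical stand-in that returns a fixed string, and the checked exceptions from get() are wrapped in unchecked ones purely to keep the example short.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FutureDemo {
    // Hypothetical stand-in for a slow network call.
    static String askTheNetworkForMyString() {
        return "hello from the network";
    }

    // Submits the call to a pool and waits at most timeoutMillis for the answer.
    static String fetchWithTimeout(long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<String> action = FutureDemo::askTheNetworkForMyString;
            Future<String> result = pool.submit(action); // returns immediately
            // ... the caller is free to do other work here ...
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            throw new IllegalStateException("the call itself failed", e.getCause());
        } catch (TimeoutException e) {
            throw new IllegalStateException("no result within the timeout", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchWithTimeout(5_000));
    }
}
```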
REST 
• Scalable client / server architecture. 
• Raw sockets are complicated, so REST usually runs over HTTP. 
• Each HTTP request hits an “endpoint”, which does one thing. 
e.g. GET http://api.clipless.co/json/deals/near/Times_Square 
• Principles: 
• Server does not store state (see immutability) 
• Responses can be cached (see caching) 
• Client doesn’t care if server is final endpoint or proxy. 
• State usually ends up in DB, server communicates with client using tokens.
Clipless Architecture 
Protobuf over HTTP 
10,000 reqs / second 
Apache (mod_proxy_balancer) 
Tomcat 
MySQL 
Content- 
Addressable 
Cache 
Content- 
Addressable 
Cache
Static Content 
• Static content (e.g. HTML, images) is highly cacheable. 
• Easiest way to cache: use a CDN 
• Akamai, S3, CloudFlare, CloudFront, MaxCDN, … 
• Cache key: 
• Some HTTP headers (inc. Cache-Control header) 
• Page requested 
• Last-modified (e.g. from a “HEAD” to your server) 
• Added bonus: most CDNs are “closer” to your users than your server. 
• Compressing content reduces bandwidth: 
• Browsers usually support gzip decompression. 
• Apache, nginx: Gzip compression plugins 
• Javascript / CSS: minification 
• Images: Google PageSpeed service / CloudFlare 
• Program data: Protocol Buffers, Thrift 
• Why use your bandwidth when you can use someone else’s?
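To give a feel for the bandwidth saving, here is a small sketch of gzip compression using the JDK's java.util.zip classes. In practice the web server or CDN normally does this for you; the GzipDemo class is hypothetical and only shows the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipDemo {
    // Compresses UTF-8 text into gzip bytes.
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // read only after the gzip stream is closed
    }

    // Reverses compress(): gzip bytes back to the original text.
    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Repetitive markup compresses very well, while data that is already compressed (JPEGs, for example) gains little, which is the CPU tradeoff mentioned earlier.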
Sharding 
Alice 
Bob 
Mallory 
Requests A-L Requests M-Z
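The routing decision in the diagram can be sketched in a few lines. shardFor mirrors the A-L / M-Z split shown above, and shardByHash is a common alternative that spreads keys over any number of shards. Both the ShardRouter class and the shard numbering are assumptions made for illustration.

```java
final class ShardRouter {
    // Two shards, as in the diagram: names starting with A-L go to shard 0, everything else to shard 1.
    static int shardFor(String userName) {
        char first = Character.toUpperCase(userName.charAt(0));
        return (first >= 'A' && first <= 'L') ? 0 : 1;
    }

    // Alternative: pick one of shardCount shards from the key's hash.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int shardByHash(String key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        for (String user : new String[] {"Alice", "Bob", "Mallory"}) {
            System.out.println(user + " -> shard " + shardFor(user));
        }
    }
}
```

Range-based splits are easy to reason about but can end up unbalanced; hash-based splits balance better but make range queries harder.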
Batching Network Requests 
The Operation Queue / Proactor Pattern 
Producer 
Producer 
Worker Thread Pool 
Thread-safe queue 
Work 
NetworkListener 
onUp: queue.resume() 
onDown: queue.suspend() 
Work Work 
Producer 
ListenableFuture<Result>
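The queue at the centre of this diagram can be sketched with a BlockingQueue. Producers enqueue work from any thread, and a worker pulls several queued items at once so they can go out in a single request. This is a simplified sketch: BatchingQueue is a hypothetical name, and the suspend/resume hooks driven by the NetworkListener, as well as the ListenableFuture results, are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class BatchingQueue<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();

    // Called by producers on any thread.
    void enqueue(T work) {
        queue.add(work);
    }

    // Called by a worker: blocks until at least one item is queued, then takes
    // up to maxBatch items in FIFO order. Returns an empty list if interrupted.
    List<T> nextBatch(int maxBatch) {
        List<T> batch = new ArrayList<>();
        try {
            batch.add(queue.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return batch;
        }
        queue.drainTo(batch, maxBatch - 1); // grab whatever else is already waiting
        return batch;
    }
}
```

A worker thread would typically loop on nextBatch and send each batch as one network request.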
How to Test 
• Mock large amounts of data, measure performance 
• Can be automated so you never encounter performance regressions 
• Network stress tests 
• ab 
• blitz 
• loader.io 
• ulimits 
• Packet sniffers 
• Round trip time services, e.g. NewRelic.
General Principles 
• Scale when you anticipate the need. 
• Scale eagerly when you don’t need to go far out of the way. 
• CDNs and Gzip compression good examples. 
• Or when retrofitting will be painful. 
• RESTful architecture from the beginning: much easier than tacking it on later! 
• But caching is usually easy to add later. 
• Focus on the big improvements: 
• 80/20 rule 
• Profile and knock out the biggest CPU / memory hogs first. 
• Practice and internalize to reduce scaling costs! 
• Concurrency is much easier with mastery. 
• Caching seems much easier with mastery, often isn’t. 
• Internalize immutability and you’ll just write better code.
Thanks! 
Good luck, and always bring mangosteens to acquisition talks.

Software at Scale


Editor's Notes

  1. So before we jump into this discussion about how to scale, we should probably talk about why it’s important in the first place. Most people will tell you, especially in the startup community, that premature scaling is a bad idea. Premature scaling is the root of all evil and all that. This is true to the extent that it distracts you from getting a product out, if you’re in business, because how fast you execute is more important than how well your thing scales to users you don’t have yet. But the thing is, most scaling is actually pretty easy in the beginning. Some of it is almost commonsense stuff. And internalizing that is going to make your software more robust without making your development slower. So typically, what happens is you launch, a few people use your product, and either that’s it, or it starts to gain popularity. At some point you end up in the press, and now 100,000 users at once flood the site that you built for 5 people. And do you know what happens? They see a lot of this. *click* That’s not fatal, but it’s a missed opportunity. My app was acquired after it survived such a spike. Anyway, as you grow, or if you’re solving a big research problem (something like sequence alignment in genetics), you’ll start to exhaust the low-hanging fruit, and scaling will start to get harder. And then you’ll have to do it. At that point it also starts to become a competitive advantage, though: Google can do a lot of the stuff it does because it can process data at a scale no one else can.
  2. So in 2013 I was kind of crazy and was on the founding team of two startups at once. The successful one was an app called Clipless, which was basically the first daily deals app to use context to show you deals when you walked into places that offered them. The other one was a Shark Tank startup that I won’t name. There was a philosophical difference between the two: I decided to make Clipless as scalable as I could without going out of the way. I particularly focused on the things I couldn’t easily add later, like a RESTful architecture and a compact protocol representation. The other founders on the Shark Tank startup were all nontechnical people, and they kind of didn’t get the whole idea of throwing effort into something that users didn’t directly see, so it was kind of duct taped together. The first version of each app took about 3 months to code. Both had their moments: Clipless made it onto the front page of TechCrunch twice, the Shark Tank startup re-ran on Shark Tank. Clipless did pretty well, the Shark Tank startup not so much.
  3. So what was different? They were on different stacks – Digital Ocean vs. Heroku – but that’s not too important, honestly. Setting up the Clipless environment took maybe a day. Basically a bunch of Tomcat servers behind an Apache load balancer, you could probably also use Amazon’s elastic load balancers to good effect. Replicated MySQL setup, but that’s also not too hard once you’ve done it. That took about 20 minutes (I actually won a dorky t-shirt in a Linux wrangling competition for that, but that’s another story). And that was about it on the hardware side. Marginal time cost, but a solid stack. The other startup was using Heroku, so a lot of that was handled for them. Unfortunately, that allowed them to get kind of sloppy. They didn’t do any load balancing, kept state on their backends, didn’t try to make anything concurrent, relied on ActiveRecord’s default database caching, which wasn’t great. I also couldn’t convince these guys to do any load testing, which was frustrating. I’ll get into that later. So it didn’t kill them when they crashed, but it did prevent them from growing to the next level when the opportunity came up.
  4. So what sorts of things can we expect to deal with when scaling an app? Well, you’ve got resource usage on the client, server, database, and network. Client and server resources are basically CPU, memory, and disk. Saturate any of these and your users are going to have a bad experience. On the database side, you usually deal with the number of open connections and concurrently running queries. The topic of database optimization is a whole course in itself, and not something I’ll have the time to get into today. I’ll be glad to speak to anyone afterwards about it, though. On the network side, the two killers are bandwidth and simple availability. You don’t want to push too many bytes down the network, but you also just need the network to be up when you want to send data. That’s a particular problem when dealing with mobile devices, since they can flit on and off of networks at a whim, like if you’re on a train going into a tunnel. By the way, does anyone know why I have a picture of a slow loris up here?
  5. Let’s tackle each of these one by one. If your client is unresponsive, you’re going to get lots of “application not responding” types of dialogs, or operations will just start taking a long time. The first step is to identify what’s slowing the client down, and for that you’ll probably want to do something called profiling, which I’ll get into in a minute. Once you have a good sense of where the problem is, you can try crunching less data, maybe offloading some of that computation to the server side, and reusing more work instead of recomputing it.
  6. Same thing on the server side, more or less. You can always start up more servers, but that’s costly. You can also distribute chunks of the work to multiple servers at once; that’s called sharding and I’ll briefly discuss it later. The nice thing about caching on the server side is that it can be done across multiple users and requests.
  7. On the database side, see whether it’s connections or queries that are slowing you down. If it’s long-running queries, you need to optimize them, which might mean changing your schema or adding indices. If it’s open connections, you can try something called connection pooling, where sockets get held open after clients disconnect so they can service new clients without tearing down and reestablishing system resources, which is expensive. You can also stick a cache in front of the DB in some cases. That works wonderfully when your access pattern is very read-heavy, that is, you read much more than you write data. Going hand-in-hand with that, you can write all of your requests out to the DB at once, batching in one connection at a convenient time, instead of issuing a whole bunch of instant queries as they come in.
  8. Finally, on the network side, try reducing the amount of data you send first. That can mean sending diffs instead of full responses, or adding a local cache to avoid hitting the network at all, or connection pooling again (this time with sockets). You can also compress responses with gzip, which is something that most stacks already transparently support.
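To give a feel for what gzip buys you on a repetitive payload, here is a minimal sketch using the JDK's `java.util.zip` streams. The class name `GzipDemo` and its helper methods are made up for illustration; in practice your web server or framework would usually do this for you rather than your own code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
final class GzipDemo {
    /** Compresses a payload with gzip. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer into the buffer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores the original bytes from a gzip payload. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Repetitive text such as JSON tends to compress well; already-compressed content such as JPEG images usually does not, so it is worth measuring before assuming a saving.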
  9. So how do we diagnose performance problems? Through a process called “profiling”. Basically, we’re going to step through the software and check on how much time, memory, and I/O each method is using. Then we can identify “hot spots”, where the program is using a lot of resources, and optimize those. Usually there’s only a handful of them; performance problems tend to obey an 80/20 rule. There are two ways to do this: the basic way, which is by inserting “stop watches” into your code, and the more advanced way, which is by attaching a profiler. There are a lot of profilers out there: Eclipse, NetBeans, and Visual Studio all have one, IntelliJ probably does too, and then there are standalone utilities like JProfiler.
  10. A stopwatch is just an extra pair of statements in your code – one usually starts the stopwatch, the other stops it and returns the elapsed time. They work pretty much as you’d expect. I like Matlab’s stopwatch functionality – they just call the start and stop methods “tic” and “toc”. Toc prints out how much time has elapsed since tic. If you’re using Java, the Guava library has an excellent Stopwatch class which works similarly.
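A tic/toc style stopwatch can be sketched in a few lines of plain Java. This is not Guava's `Stopwatch` and not Matlab's implementation; `TicToc` and its method names are hypothetical, and the sketch only assumes `System.nanoTime()` from the JDK.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical tic/toc stopwatch, for illustration only.
final class TicToc {
    private long startNanos;

    /** Starts (or restarts) the stopwatch. */
    void tic() {
        startNanos = System.nanoTime();
    }

    /** Returns the milliseconds elapsed since the last tic(). */
    long tocMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }
}
```

Usage would be a call to `tic()` just before the suspect code and a call to `tocMillis()` just after it, logging the result. `System.nanoTime()` is the better choice over `System.currentTimeMillis()` here because it is meant for measuring elapsed time rather than wall-clock time.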
  11. Here’s what a profiler looks like – this is JProfiler in particular. Over here we’re looking at CPU usage and time, and it looks like there’s a hot spot when reading query results from a Postgres database. So this gives you a clue that you might want to inspect that query, figure out if the connection is being held open for a long time, or if there’s lots of data coming down in the result, maybe just send less at once. Notice that this goes all the way down into the Java built-in classes, so you can see how much time is being spent in things like strings. These are pretty advanced tools – you can actually dig into the source code and see exactly how many times each call runs and how long it takes, right next to the source. Profilers also let you see how long you’re spending doing things like waiting for locks, which is something I’ll talk about in a bit.
  12. Now that we know how to identify problems, let’s talk about ways we can speed things up. Any questions so far?
  13. The first technique I want to talk about is caching. It’s also one of the most nuanced, and can come back to bite you if you’re not careful. Basically, what you’re doing when you cache is trading off computation for space. Instead of recomputing a result, you’re going to store it in a cache the first time it’s computed, and retrieve it every time it’s requested again. Maybe that will apply for your whole result, or just a piece of it in your algorithm, you can do both. So let’s say that we’re retrieving payroll records. Alice logs in with username “Alice”, email “a@co.com”. Without a cache, we’d go straight to the service and do a database query, which could be expensive. To add caching to this, we first look Alice up in the cache. If her information is there, we just return it and skip the database query. That’s called a “cache hit”. If she isn’t there, we have a “cache miss” and have to query the live service for the information. When we do get it back, we store it in the cache so that future requests can retrieve it. In order to do this, we need a concise representation of what Alice’s data depends on. Here it’s her name and email address. This has to be unique – no one else should be identifiable by Alice’s name and email. It also needs to be deterministic – that combination should always return Alice, never someone else. We call the information required to identify Alice the cache key.
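The hit/miss flow described above can be sketched as a small read-through cache. This is an illustration under assumptions, not the code from the slide: `ReadThroughCache` is a made-up name, and the `loader` function stands in for the expensive database query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through cache, for illustration only.
final class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader; // stands in for the live service / DB query
    private int hits;
    private int misses;

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value on a hit; otherwise loads it, stores it, and returns it. */
    synchronized V get(K key) {
        V cached = entries.get(key);
        if (cached != null) {
            hits++;   // cache hit: skip the expensive query
            return cached;
        }
        misses++;     // cache miss: query the live service
        V loaded = loader.apply(key);
        if (loaded != null) {
            entries.put(key, loaded); // store it so future requests hit
        }
        return loaded;
    }

    synchronized int hits() { return hits; }
    synchronized int misses() { return misses; }
}
```

This sketch never evicts or expires entries, which is where caching tends to get nuanced: a real cache needs a policy for stale data and for bounding memory.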
  14. How do you pick a good cache key? It depends on the information that uniquely identifies a record, but in general, hashing the input into your algorithm works well. In Java, you have the advantage of very strong built-in support for hashing, too. So basically, you turn the information into a number and associate the results with that number. Then when you want to query the cache, do the hashing, get the number, and see if an entry already exists in the cache. HashMaps are an ideal structure for this in Java. One caveat: hash functions can collide, meaning multiple records can hash to the same key. So when you get a result back, you always need to sanity check it to make sure that it’s what you asked for.
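As a sketch of a cache key for the Alice example, here is a small value class that can be used directly as a `HashMap` key. `UserKey` is a hypothetical name. One point worth knowing: `HashMap` uses `hashCode()` to pick a bucket and then `equals()` to confirm the match, so as long as both are defined consistently the map does the collision sanity check for you.

```java
import java.util.Objects;

// Hypothetical cache key built from the fields that identify a record.
final class UserKey {
    private final String name;
    private final String email;

    UserKey(String name, String email) {
        this.name = Objects.requireNonNull(name);
        this.email = Objects.requireNonNull(email);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof UserKey)) return false;
        UserKey that = (UserKey) other;
        // Two keys match only if every identifying field matches.
        return name.equals(that.name) && email.equals(that.email);
    }

    @Override
    public int hashCode() {
        // Must agree with equals(): equal keys always produce equal hashes.
        return Objects.hash(name, email);
    }
}
```

Collisions are not hypothetical: the strings "Aa" and "BB" have the same `hashCode()` in Java, so two keys built from them land in the same bucket, and only `equals()` tells them apart. If you instead store just the numeric hash as the key, you lose that check and have to verify the result yourself.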
  15. Next let’s talk about concurrency, which I suspect is one of the most needlessly feared topics in CS. With CPUs racing for more and more cores, this is something you’ll have to master at some point. A standard single threaded program executes work sequentially. Do one thing, finish it, do the next, finish it, etc. This usually ends up taking a lot of time, and leaves a lot of system resources idle. It’s particularly bad if you have one long-running piece of work and a lot of short running things that could be going at the same time if it weren’t running. Here’s a concurrent program. Instead of doing a lot of work in sequence, you do it all at once. So the total time is just whatever the time of the longest task was, instead of the sum of the task times. In Java, you achieve concurrency by spawning a bunch of threads and giving them all work to do. Java’s got some very powerful utilities for managing concurrency, but it’s also easy to shoot yourself in the foot if you’re not careful. For instance…
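A minimal sketch of running independent tasks at once, using the JDK's `ExecutorService`. `ParallelTasks` and `runAll` are hypothetical names, and sizing the pool at one thread per task is only for illustration; a real service would normally share a bounded pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, for illustration only.
final class ParallelTasks {
    /** Runs all tasks concurrently and returns their results in task order. */
    static <T> List<T> runAll(List<Callable<T>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has finished.
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With enough free cores and tasks that do not depend on each other, the wall-clock time approaches that of the slowest task rather than the sum of all of them.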
  16. When you write multithreaded code, you’ll eventually run into a bug called a race condition. These are incredibly subtle sometimes. Like 99.9% of the time it will work perfectly fine and that other 1 in 1000 times, it will just blow up. The problem is that multiple threads can write to the same variables at the same time. So if you were to run this code in two threads at once, you could end up in a situation where you read the value of x, say “Ah ha, it’s 0”, then another thread interrupts you, increments it to 1, and returns control to you. Now your program picks up where it left off and increments it to 2, which isn’t what you wanted. The annoying thing is that this won’t usually happen. It will just happen sometimes, depending on the totally arbitrary order of when each thread decides to run. So how do we avoid this? Lots of people jump straight to locking, but I think there’s a better way. The key issue here is that both threads can write to the same variable. It’s what people call shared, mutable state. And you can avoid it.
  17. I’m going to start by discussing a strategy called immutability. If you’ve ever played around with the String class in Java, you’ll know what I mean – you can’t change a string at runtime, you can only make new strings with different data. As a general tip, state is your enemy. The more of it you have, the more complex your program will be and the more places bugs can slip in. Making things immutable basically means you promise not to change them after they’re set, which dramatically cuts down on the number of states your program can end up in. This is really useful for avoiding the race conditions I mentioned just now, because if you can’t change something, you’re guaranteed to get the same value in all threads that read it. The pattern in Java is simple: you declare your reference final, so it can only be set in the constructor, and then you make sure not to add setters to the type (or don’t use them outside of the constructor, if the type was written with them already). So you can’t change the reference or the value. Anything declared this way is totally safe to use from multiple threads. You can’t do something like incrementing x, but you’ll be surprised how far this gets you in practice. This is also useful for implementing REST, which we’ll talk about shortly, and will generally make your code less buggy.
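The final-fields, no-setters pattern might look like the sketch below. `Deal` and its fields are hypothetical and not taken from the Clipless code; the point is only that "changing" a value means building a new object.

```java
// Hypothetical immutable value type, for illustration only.
final class Deal {
    private final String merchant; // set once in the constructor, never reassigned
    private final int percentOff;

    Deal(String merchant, int percentOff) {
        this.merchant = merchant;
        this.percentOff = percentOff;
    }

    String merchant() { return merchant; }
    int percentOff() { return percentOff; }

    /** Returns a new Deal instead of mutating this one. */
    Deal withPercentOff(int newPercentOff) {
        return new Deal(merchant, newPercentOff);
    }
}
```

Because both fields are final and their types (`String`, `int`) cannot be mutated either, any thread that holds a `Deal` sees the same values. If a field held a mutable object such as a `List`, you would also need to copy it defensively to keep that guarantee.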
  18. Here’s another pattern that helps: sometimes you need a new thread to do something asynchronously on each method call. Instead of sharing state, you can reinitialize whatever you need to do synchronously in the method body, and then pass it into the thread for the asynchronous part. Java lets you do this as long as the reference is final. So there’s no interference, every thread has its own copy of the object.
  19. But sometimes you really can’t avoid using shared mutable state, like x, and in that case you’ll need synchronization. Java accomplishes this by locking a synchronized block down so only one thread can access it at a time. To do this, you add the synchronized keyword to a method, or add a synchronized block, as shown. You want to make sure that both the read and the write of the variable are locked, so that another thread can’t interrupt you between reading the old value and writing the new value. Now you never need to worry about x being interrupted, because if another thread tries, it will get locked out until your thread is out of the synchronized block. Then the other thread can come in, check x, and see that it’s greater than 1. So here you’ll always get 2.
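Here is a minimal sketch of guarding a shared counter with `synchronized`. It is not the code from the slide: `SafeCounter` and `countWithThreads` are hypothetical names, and the example uses a plain increment to show the read and the write being locked together.

```java
// Hypothetical synchronized counter, for illustration only.
final class SafeCounter {
    private int x = 0;

    /** The read and the write of x happen under one lock, so no update is lost. */
    synchronized void increment() {
        x++;
    }

    synchronized int get() {
        return x;
    }

    /** Hammers one counter from several threads and returns the final value. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        SafeCounter counter = new SafeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for every worker before reading the result
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

Without the `synchronized` keyword on `increment()`, the same run can come up short, because two threads may read the same old value of `x` before either writes it back.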
  20. There’s another concurrency trick in Java that I briefly want to mention, which is an object called a Future. The problem with asynchronous methods in general is that when you call them, you don’t get a result back immediately. So what the method can do is return a Future to the caller, which is basically a way of saying “a result will eventually be available”, and the caller can then call get() on that Future to wait for the result when it’s convenient for the program. Check this code out.
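As a rough sketch of the idea (not the slide's code), an asynchronous method can hand back a `Future` straight away and let the caller collect the value later. `PriceService`, `fetchPriceAsync`, and `awaitResult` are hypothetical names, and the "price" computation is a stand-in for slow work such as a network call.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical asynchronous service, for illustration only.
final class PriceService {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    /** Returns immediately; the result becomes available later via the Future. */
    Future<Integer> fetchPriceAsync(String item) {
        // Stand-in for slow work (a real version might call a remote API).
        return pool.submit(() -> item.length() * 100);
    }

    /** Blocks until the Future completes and returns its value. */
    static int awaitResult(Future<Integer> pending) {
        try {
            return pending.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The caller can do other work between `fetchPriceAsync` and `awaitResult`; `get()` only blocks if the result is not ready yet.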
  21. So that’s concurrency on a single machine. There’s a different type of concurrency that you might want to use, and that’s farming out requests to multiple servers. For that, the REST paradigm is really useful. Lots of people think REST means “you make your requests over HTTP”. That is the most common way to do it, because HTTP is easy and makes sense for a lot of applications. Also, there are some great web servers out there already, so you won’t have to reinvent the wheel if you use HTTP. So what you do is define an endpoint, which knows how to do one thing really well. For instance, you might have an endpoint for getting all deals near an area, and then you can pass Times Square in. The deals come back as JSON (because in this case, that’s what you asked for). This is how 99% of the APIs on the web work right now, so you should be comfortable with it. But if you’re implementing a REST API, there are some more things you should know. There are three main principles, two of which dovetail with things I just spoke about: first, the server should never store state. It should be immutable. That’s why everyone passes tokens around – that’s the way of representing the state, and it’s usually carried by the client (thus why it’s called representational state transfer). The second principle is that responses should be cacheable. The same query should get you, if not the exact same results, then at least a sane and similar set of results. The nice thing about using HTTP is that you get a lot of caching for free, since the protocol has it built in. If your service is deterministic based on the parameters you’re passing into it, you’ll get caching for free from HTTP. Finally, the client shouldn’t care if the server is actually the final endpoint or if it’s a proxy on the way to the destination. This usually trivially follows from the other two, and is useful because it lets you transparently load balance and scale. 
Usually you do need to store some state, so the RESTful way to do it is to push that job out to the database. Then all servers can read from the database, without the client worrying about which one it’s connecting to. (Then you just need to make sure the database scales well).
  22. Here’s how I did it in Clipless. So we had the client, which peaked around 10,000 requests per second. That used protocol buffers to communicate with a couple of Apache load balancers, which in turn knew about a pool of Tomcat application servers. The load balancers forwarded the requests to Tomcat, which processed them by making queries to databases and external servers. The company was acquired before I had the chance to take it to the next level of scaling, but I would have added something called a content-addressable cache next, to alleviate database load and allow requests to be batched.
  23. But for some content, the best solution is to make scaling Somebody Else’s Problem. In general, serving your own static content – that is, HTML pages or images – is a losing game. Use a content distribution network, like CloudFlare, AWS, or Akamai, and it will do a lot of the caching for you. A lot of CDNs will also compress data transparently, which will help save bandwidth. CDNs will almost certainly have a closer server to your visitors than you do, so this will speed up your website considerably. They’re easy, cheap, and work pretty well. I’ve listed CloudFlare first because it actually has a free tier.
  24. This is pretty intuitive, so I’m just going to mention the idea, but you can also shard your data, meaning partition it by some property that divides it cleanly. For instance, at voting booths they do that whole first letter of last name thing - “A-L go stand on this line”, “M-Z go stand on this line”. That’s sharding data. You’ve created two parallel channels for things to get processed, and split the dataset roughly in half. That reduces the load on any one server, spreads it out. You can do this with client IP addresses, or geographic location, or request size, and if you do it right, you’ll keep traffic flowing smoothly through your service as it grows.
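Both flavors of sharding mentioned above fit in a few lines. This is a sketch with hypothetical names (`ShardRouter`, `shardFor`, `shardForLastName`): one method mirrors the voting-booth split by first letter, the other spreads arbitrary keys such as client IP addresses across N shards by hash.

```java
// Hypothetical shard router, for illustration only.
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** Maps any key (client IP, user id, ...) to a shard index in [0, shardCount). */
    int shardFor(String key) {
        // hashCode() can be negative, so floorMod keeps the index non-negative.
        return Math.floorMod(key.hashCode(), shardCount);
    }

    /** The voting-booth rule: last names A-L go to shard 0, M-Z to shard 1. */
    static int shardForLastName(String lastName) {
        char first = Character.toUpperCase(lastName.charAt(0));
        return first <= 'L' ? 0 : 1;
    }
}
```

The same key always maps to the same shard, which is what lets each server own its slice of the data. Note that with simple modulo hashing, changing the number of shards remaps most keys, so the shard count is worth choosing with some headroom.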
  25. Just one more thing I want to touch on, which will be of particular interest to mobile developers. The problem with mobile devices is that they tend to have unreliable network connections. You go into a building or through a tunnel and suddenly you lose connectivity. So any mobile app which uses the Internet needs to deal with this. Here’s one way to do it. If you’ve done iOS you might have seen this before – it’s called the operation queue pattern, or the “proactor” pattern (which is the multithreaded version of the “reactor” pattern). So what you can do is define a work queue and a thread pool which will process the work. This is all on the client side, so these worker threads are all within your application. Parts of your application are going to want to do network stuff – I call these producers – and so they’ll submit some work to the queue. The work can be a Java Runnable, or some other way of representing stuff you want to do. And you can dress these up with fun properties like expiration times. So these producers submit work to the queue, and the queue chugs along and sends the work to the thread pool when it has spare capacity. The thread pool does the work and returns a future back to the producer, so it can get the results when they become available. Now here’s the magic part of what makes this queue special: you can hook it up to a network listener, something that gets notified when the network goes up or down. In Android, this is just a BroadcastReceiver which listens for the “network state changed” intent. This thing can suspend or resume the queue, so that work stops being processed when the network goes down. Then when it goes back up, the work resumes and gets executed again. Here’s the code for that class, if anyone’s curious. This is much cleaner than passing callbacks all over the place.
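The suspend/resume part of that queue can be sketched in plain Java by extending `ThreadPoolExecutor` and blocking in its `beforeExecute` hook, following the pausable-executor idea shown in the JDK documentation. This is not the class from the talk: `PausableExecutor`, `pause()`, and `resume()` are hypothetical names, and the Android wiring (a `BroadcastReceiver` calling them on connectivity changes) is assumed and left out.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical pausable work queue, for illustration only.
final class PausableExecutor extends ThreadPoolExecutor {
    private final ReentrantLock pauseLock = new ReentrantLock();
    private final Condition unpaused = pauseLock.newCondition();
    private boolean paused;

    PausableExecutor(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    /** Runs on the worker thread before each task; parks it while the queue is paused. */
    @Override
    protected void beforeExecute(Thread worker, Runnable task) {
        super.beforeExecute(worker, task);
        pauseLock.lock();
        try {
            while (paused) {
                unpaused.await();
            }
        } catch (InterruptedException e) {
            worker.interrupt();
        } finally {
            pauseLock.unlock();
        }
    }

    /** Call when the network goes down: queued work stops being started. */
    void pause() {
        pauseLock.lock();
        try {
            paused = true;
        } finally {
            pauseLock.unlock();
        }
    }

    /** Call when the network comes back: parked workers continue. */
    void resume() {
        pauseLock.lock();
        try {
            paused = false;
            unpaused.signalAll();
        } finally {
            pauseLock.unlock();
        }
    }
}
```

Producers call `submit(...)` as with any executor and get a `Future` back. Pausing does not interrupt a task that is already running; it only stops new tasks from starting, so a request that is mid-flight when the network drops still needs its own timeout or retry handling.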
  26. And those are all of the strategies I want to cover. So we’ve done all of this work – how do we test it? Lots of ways: you can mock your classes and pass huge amounts of data in via the mocks, see how they handle it. You can also do network stress tests using tools like ApacheBench, Blitz, or loader.io – these send actual traffic at your services. I like the mocks more as a first line because that much bandwidth creates a lot of noise, but you should run a real stress test at least once before you launch. You can use ulimits on Linux to artificially limit how much CPU and memory your program gets. If it crashes under the ulimit, you know it’s using too much. Packet sniffers are good to get a sense of how much bandwidth you’re using. If you’re testing an Android program, you also have a handy tool called DDMS which monitors this. And finally, you can use round trip time services which attach to both your client and server, such as New Relic. These will break the request time down for you so you can see where your app is having trouble.