Software at Scale

Why is scale important?
[Figure: monthly usage vs. difficulty from January to December (y-axis 0 to 80,000), with a region labeled “Missed opportunity”.]
“Do things that don’t scale!”
Permanent scaling need
But scale if it’s on the way.

A tale of two startups
(“Or how I spent 2013…”)
Clipless
Built to scale.
v1 developed in 3 months.
PR blast to TechCrunch, Android Police, etc.
led to 1700% month-over-month growth.
Handling over 10,000 QPS.
Acquired 3 months from launch.
Shark Tank Startup
Scaling ignored.
v1 developed in 3 months.
When the episode reran on Shark Tank, the service
and website went down almost immediately.
Still slow (but steady) growth.

What was different?
Clipless (Tomcat, 1-3 DigitalOcean VMs)
Load balanced, replicated servers and DBs.
Well-written RESTful API: any server could
answer any query.
Multithreaded backend.
Batched, asynchronous DB operations.
Caching by locality and time.
Queued network operations.
S.T. Startup (Ruby on Rails, Heroku)
No load balancing.
Replicated DB via Heroku postgres.
Not truly RESTful: backends kept state.
Single-threaded backend (one request
blocked entire Heroku dyno).
Direct, blocking DB access.
DB caching via ActiveRecord.

Potential Bottlenecks
• Client resources
• CPU
• Memory
• I/O
• Server resources
• Database resources
• Open connections
• Running queries
• Network resources
• Bandwidth
• Connections / open sockets
• Availability (esp. on Wifi / mobile networks)

Client resources (CPU, memory, I/O)
• Profile your algorithms
• Crunch less data
• Reuse more old work
• Offload some processing to the server

Server resources (CPU, memory, I/O)
• Profile your algorithms
• Crunch less data
• Reuse more old work (across users)
• Divide and conquer (“shard”)
• Spin up more servers and load-balance them

Database resources (open connections, running queries)
• Optimize your queries
• Use connection pooling
• Add a second-level cache
• Reuse more old work (across users)
• Divide and conquer (“shard”)
• Batch DB requests
• Spin up and replicate more DBs
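Connection pooling means opening a fixed number of connections once and lending them out, instead of opening a new one per request. Below is a minimal sketch using only the JDK's BlockingQueue; the pooled type is generic and the `connect` factory is an assumption, and in production you would normally reach for an existing, battle-tested pool library instead.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// A fixed-size pool: connections are created once, then borrowed and returned.
class SimplePool<C> {
  private final BlockingQueue<C> idle;

  SimplePool(int size, Supplier<C> connect) {
    idle = new ArrayBlockingQueue<>(size);
    for (int i = 0; i < size; i++) {
      idle.add(connect.get()); // open every connection up front
    }
  }

  // Borrow a connection, run the work, and always hand the connection back.
  <R> R withConnection(Function<C, R> work) throws InterruptedException {
    C conn = idle.take(); // blocks while every connection is in use
    try {
      return work.apply(conn);
    } finally {
      idle.put(conn);
    }
  }

  int idleCount() {
    return idle.size();
  }
}
```

However many requests run, the database only ever sees `size` open connections, which caps the "open connections" bottleneck.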

Network resources (bandwidth)
• Add a local cache
• Send diffs
• Compress responses (CPU tradeoff)
• Use connection pooling
• Batch network requests

Profiling
(Diagnosing the problem)
Purpose: find the “hotspots” in your program.
Things you care about:
• “CPU time” – time spent processing your program’s instructions.
• “Memory” – RAM being used to store your program’s data.
• “Wall time” – total real time elapsed, including time spent waiting on I/O, the network, or locks.
• Methods:
• Basic: “Stopwatch”
• Advanced: Profiler (e.g. jprof, JProfiler, hprof, NetBeans Profiler, Visual Studio Profiler)

Stopwatch
• Easy: just time methods.
Matlab:
function [result] = do_something_expensive(data)
tic
…
toc
end
• In Java, use Guava’s Stopwatch class: call start() and stop() around the code, then read elapsed(TimeUnit).
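If you'd rather not pull in Guava, the same stopwatch idea works with plain `System.nanoTime()`. This is a small sketch; the `Timing.timed` helper name is made up for illustration. It wraps any computation, prints how long it took, and returns the result unchanged, so you can drop it around a suspected hotspot.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Stopwatch-style profiling with only the JDK.
class Timing {
  // Runs the task, prints how long it took, and returns its result unchanged.
  static <T> T timed(String label, Supplier<T> task) {
    long start = System.nanoTime(); // monotonic clock, suitable for measuring intervals
    try {
      return task.get();
    } finally {
      long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      System.out.println(label + " took " + elapsedMs + " ms");
    }
  }
}
```

For example, `long total = Timing.timed("sum", () -> java.util.stream.LongStream.rangeClosed(1, 1_000_000).sum());` prints the elapsed time and still returns the sum. With Guava, `Stopwatch.createStarted()` followed by `elapsed(TimeUnit.MILLISECONDS)` gives the same measurement.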

Caching and Reuse
“There are only two hard things in Computer Science:
cache invalidation and naming things.” --Phil Karlton
• Trades space (memory) for CPU time.
• Look for repeated inputs
(including repeated subproblems).
• Compute a key from the input.
• Associate the result with the key.
• Important: algorithm must be a
deterministic mapping from input
to output.
• Important: if you change what the
algorithm depends on, update the
cache key.
[Figure: a cache associating the key <Alice, a@co.com> with the record Name: Alice, Job: Developer, Salary: 100,000.]
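The recipe above (compute a key from the input, associate the result with it) is memoization. A minimal single-threaded sketch, assuming the wrapped function is deterministic as the slide requires; the `Memoizer` class is illustrative, not from any library. It is not thread-safe, and it does not cache `null` results.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Memoizes a deterministic function: same input -> same output, so results can be reused.
class Memoizer<K, V> {
  private final Map<K, V> cache = new HashMap<>();
  private final Function<K, V> compute;
  private int misses = 0;

  Memoizer(Function<K, V> compute) {
    this.compute = compute;
  }

  V get(K input) {
    V cached = cache.get(input);
    if (cached != null) {
      return cached; // hit: reuse earlier work
    }
    misses++;
    V result = compute.apply(input); // miss: do the work once...
    cache.put(input, result);        // ...and remember it under the input key
    return result;
  }

  int misses() {
    return misses;
  }
}
```

If the function starts depending on something outside its input (say, a config flag), that value has to become part of the key, which is the “update the cache key” rule above.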

Computing a Cache Key
• Hashing is a good strategy.
• Objects.hash (JDK 7) / Objects.hashCode (Guava)
• Beware: hashes can collide, so sanity-check results!
• Searching:
• Hash data
• Query cache for hash key.
• If found, return associated value.
• If not, query live service and store the result in the cache.
[Figure: the input <Alice, a@co.com> is hashed to the key 0xAF724…]
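The search steps above can be sketched as follows. The cache is keyed by `Objects.hash` of the input. Each entry also keeps the original input, so a hash collision is caught by an `equals` check instead of silently returning someone else's result. The `liveService` function stands in for the real, slow service and is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

class HashKeyedCache<I, R> {
  // Each entry remembers the input it was computed from, so collisions can be detected.
  private static final class Entry<A, B> {
    final A input;
    final B result;

    Entry(A input, B result) {
      this.input = input;
      this.result = result;
    }
  }

  private final Map<Integer, Entry<I, R>> entries = new HashMap<>();
  private final Function<I, R> liveService; // stand-in for the real (slow) service

  HashKeyedCache(Function<I, R> liveService) {
    this.liveService = liveService;
  }

  R lookup(I input) {
    int key = Objects.hash(input);            // 1. hash the data
    Entry<I, R> hit = entries.get(key);       // 2. query the cache for the hash key
    if (hit != null && hit.input.equals(input)) {
      return hit.result;                      // 3. found (and really the same input): return it
    }
    R result = liveService.apply(input);      // 4. miss or collision: ask the live service
    entries.put(key, new Entry<>(input, result)); // ...and store the result
    return result;
  }
}
```

To mirror the slide, the input could be a list such as `Arrays.asList("Alice", "a@co.com")`. The strings `"Aa"` and `"BB"` share a Java hash code, which makes them a handy way to see the sanity check at work.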

Concurrency
Sequential programs run the units of work one after another, which takes a lot of time:
Work, then Work, then Work
Concurrent programs run them at the same time, which takes less time:
Work
Work
Work
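A small sketch of running independent units of work concurrently with the JDK's `ExecutorService`. If each unit mostly waits (for example on the network), three of them take roughly as long as one instead of three times as long. The `ConcurrentWork` class and method name are illustrative.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrentWork {
  // Runs every unit of work at the same time and adds up their results.
  static long sumConcurrently(List<Callable<Long>> work) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, work.size()));
    try {
      long total = 0;
      // invokeAll starts all tasks and returns once every one of them has finished.
      for (Future<Long> result : pool.invokeAll(work)) {
        total += result.get();
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }
}
```

One thread per task is fine for a handful of tasks. For many tasks, a pool sized to the hardware (or to the number of concurrent network calls you can afford) is the usual choice.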

Race Conditions
Problem: Two threads can simultaneously write to the same variables.
If x starts at 0 and you run this code in two threads:
if (x < 1) { x++; }
then x will usually end up at 1.
But sometimes it will be 2: both threads read x == 0 before either one writes, so both increment it.
• Race conditions like this one are among the hardest bugs to find and fix.
• Three ways to manage this:
• Immutability
• Local state
• Synchronization
• Race conditions only happen when you write to shared, mutable state.

Immutability
• General tip: try to minimize the number of states your program can end up in. This helps with:
• Concurrency
• REST
• (And your programs will simply have less state, so you’ll produce fewer bugs.)
• Declare variables final where possible, set them in the constructor, and don’t write setters unless you must:
import com.google.common.base.Preconditions;

public final class Bar {
  // String is an immutable type - can't change it at runtime.
  // foo is a final field - can't reassign it after construction.
  private final String foo;

  public Bar(String foo) {
    this.foo = Preconditions.checkNotNull(foo);
  }
}
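When an immutable object needs to “change”, a common idiom is to return a modified copy rather than mutating it. A hedged sketch reusing the Alice example from the caching slide; `Employee` and `withSalary` are illustrative names, and `Objects.requireNonNull` is the JDK 7+ counterpart of Guava's `checkNotNull`.

```java
import java.util.Objects;

// An immutable value: all fields final, no setters. "Changing" it returns a new object,
// so a reference handed to another thread can never be modified underneath it.
final class Employee {
  private final String name;
  private final long salary;

  Employee(String name, long salary) {
    this.name = Objects.requireNonNull(name);
    this.salary = salary;
  }

  String name() {
    return name;
  }

  long salary() {
    return salary;
  }

  // Returns a copy with a new salary; this object is left untouched.
  Employee withSalary(long newSalary) {
    return new Employee(name, newSalary);
  }
}
```

Because no `Employee` can ever change, it can be shared across threads without locks and used safely as a cache key.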

Local State
• Sometimes you need to modify state.
• But you can still avoid locking if it’s only visible to you:
• Two threads can each write to their own copy of the same data.
• Optionally, the copies can be merged back in a single thread afterwards.
• (This is how MapReduce works)
Java inner classes help tremendously with this!
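// Every time you run sendToNetwork, you'll use a new channel. No shared state!
// (Channel, HttpChannel and context are application-specific types.)
void sendToNetwork() {
  final Channel channel = new HttpChannel(context);
  channel.connect();
  Thread sender = new Thread() {
    @Override
    public void run() {
      // The anonymous inner class captures the final local variable channel.
      channel.send("I am the jabberwocky");
    }
  };
  sender.start();
}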
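The “write your own copy, merge afterwards” idea can be sketched as a tiny map/reduce word count. Each task fills its own `HashMap`, which no other thread ever sees, so no locking is needed. The partial maps are then merged on the calling thread only. `LocalCounts` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LocalCounts {
  static Map<String, Integer> countWords(List<List<String>> chunks) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, chunks.size()));
    try {
      List<Future<Map<String, Integer>>> partials = new ArrayList<>();
      for (List<String> chunk : chunks) {
        partials.add(pool.submit(() -> {
          Map<String, Integer> local = new HashMap<>(); // visible only to this task
          for (String word : chunk) {
            local.merge(word, 1, Integer::sum);
          }
          return local;
        }));
      }
      Map<String, Integer> total = new HashMap<>(); // merge step: calling thread only
      for (Future<Map<String, Integer>> f : partials) {
        f.get().forEach((word, n) -> total.merge(word, n, Integer::sum));
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }
}
```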

Synchronization
• If you do need to write shared state, you need to synchronize access to it.
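• Use it as a last resort: it slows your program down and is prone to deadlock.
private final Object lock = new Object(); // every access to x must lock this same object
synchronized (lock) {
  if (x < 1) { x++; }
}
Now x always ends up at 1: no other thread can get between the read and the write.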
• More advanced: read/write locks (ReentrantReadWriteLock…)
• Also check out Java “Atomic” classes and “concurrent” collections:
• AtomicBoolean, AtomicInteger, …
• ConcurrentHashMap…
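As an example of the Atomic classes, the same `if (x < 1) { x++; }` check can be done without a lock using `AtomicInteger.updateAndGet` (Java 8+). The update function may be retried under contention, so it must be side-effect-free. `BoundedCounter` is an illustrative name.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Lock-free version of: if (x < 1) { x++; }
class BoundedCounter {
  private final AtomicInteger x = new AtomicInteger(0);

  // updateAndGet repeats the read-check-write until no other thread interfered.
  int incrementIfBelowOne() {
    return x.updateAndGet(v -> v < 1 ? v + 1 : v);
  }

  int get() {
    return x.get();
  }
}
```

No matter how many threads call `incrementIfBelowOne()`, the counter ends at exactly 1.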

Futures
• Threads compute asynchronously.
• Caller wants some way of knowing the result when it’s ready.
• Future: handle to a result that may or may not be available yet.
• future.get(): waits for a result and returns it, with optional timeout.
• Futures allow for asynchronous calls to immediately return, and for the program to wait for the results when it’s convenient.
• Also see Guava’s ListenableFuture.
The usual pattern:
ExecutorService pool = Executors.newFixedThreadPool(4);
Callable<String> action = new Callable<String>() {
  @Override
  public String call() throws NetworkException {
    return askTheNetworkForMyString(); // application-specific blocking network call
  }
};
Future<String> result = pool.submit(action);
// ... do other work while the network call runs ...
// Waits until the result is available. If call() threw, get() throws an
// ExecutionException wrapping the original exception.
String myString = result.get();

REST
• Scalable client / server architecture.
• Raw sockets are complicated, so REST usually runs over HTTP.
• Each HTTP request hits an “endpoint”, which does one thing.
e.g. GET http://api.clipless.co/json/deals/near/Times_Square
• Principles:
• Server does not store client state between requests (see immutability)
• Responses can be cached (see caching)
• Client doesn’t care whether it is talking to the final server or a proxy.
• State usually ends up in the DB; server and client refer to it using tokens.
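A hedged sketch of what “server does not store state” means for an endpoint handler. The handler keeps no per-client fields. Everything it needs comes from the request (here a token) plus the shared DB, so two server instances give the same answer and a load balancer can send the request to either. `SharedDb`, `DealsEndpoint`, and the token format are all hypothetical, and a `ConcurrentHashMap` stands in for the real database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the shared database that every server instance talks to.
class SharedDb {
  final Map<String, String> userByToken = new ConcurrentHashMap<>();
}

// A stateless handler: no per-client fields, only a reference to the shared DB.
class DealsEndpoint {
  private final SharedDb db;

  DealsEndpoint(SharedDb db) {
    this.db = db;
  }

  // Hypothetical handler for GET /json/deals/near/{place}; the token arrives with the request.
  String handle(String token, String place) {
    String user = db.userByToken.get(token); // look state up by token instead of remembering it
    if (user == null) {
      return "401 Unauthorized";
    }
    return "Deals near " + place + " for " + user;
  }
}
```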

Clipless Architecture
[Diagram: clients send Protobuf over HTTP (10,000 reqs / second) to Apache (mod_proxy_balancer), which balances across Tomcat servers backed by MySQL; the diagram also shows two content-addressable caches.]

Static Content
• Static content (e.g. HTML, images) is highly cacheable.
• Easiest way to cache: use a CDN
• Akamai, S3, CloudFlare, CloudFront, MaxCDN, …
• Cache key:
• Some HTTP headers (inc. Cache-Control header)
• Page requested
• Last-Modified (e.g. from a HEAD request to your server)
• Added bonus: most CDNs are “closer” to your users than your server.
• Compressing content reduces bandwidth:
• Browsers usually support gzip decompression.
• Apache, nginx: gzip compression modules
• Javascript / CSS: minification
• Images: Google PageSpeed service / CloudFlare
• Program data: Protocol Buffers, Thrift
• Why use your bandwidth when you can use someone else’s?
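Apache and nginx normally gzip responses for you, but a JDK-only sketch shows the tradeoff: some CPU spent compressing in exchange for far fewer bytes on the wire when the payload is repetitive (as JSON usually is). A server sending such bytes would mark them with the `Content-Encoding: gzip` response header. `Gzip` is an illustrative helper name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class Gzip {
  static byte[] compress(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
      gzip.write(text.getBytes(StandardCharsets.UTF_8));
    } // closing the stream writes the gzip trailer
    return bytes.toByteArray();
  }

  static String decompress(byte[] compressed) throws IOException {
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
  }
}
```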

Sharding
[Diagram: users (Alice, Bob, Mallory) are split across two servers: requests A-L go to one shard, requests M-Z to the other.]
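A sketch of the routing decision behind sharding. `rangeShard` mirrors the diagram's A-L / M-Z split. `hashShard` is a common alternative, added here for comparison rather than taken from the slide, that spreads keys evenly over any number of shards. One caveat of plain hash-mod-N is that changing the shard count remaps most keys.

```java
class Sharding {
  // Range sharding as in the diagram: names starting A-L go to shard 0, M-Z to shard 1.
  // (Names starting with a non-letter character also land on shard 0.)
  static int rangeShard(String userName) {
    char first = Character.toUpperCase(userName.charAt(0));
    return first <= 'L' ? 0 : 1;
  }

  // Hash sharding: any number of shards; floorMod keeps the result non-negative
  // even when hashCode() is negative.
  static int hashShard(String key, int shardCount) {
    return Math.floorMod(key.hashCode(), shardCount);
  }
}
```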

Batching Network Requests
The Operation Queue / Proactor Pattern
[Diagram: several Producers put Work items on a thread-safe queue, which a worker thread pool drains; each Producer gets back a ListenableFuture<Result>. A NetworkListener controls the queue: onUp: queue.resume(), onDown: queue.suspend().]
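A hedged sketch of the operation queue from the diagram, written with the JDK only. `CompletableFuture` stands in for Guava's `ListenableFuture` (both let callers attach callbacks). The `OperationQueue` class and its methods are illustrative. Producers get a future back immediately. A dispatcher thread forwards queued work to the worker pool only while the queue is not suspended, so a network listener could call `suspend()` when connectivity drops and `resume()` when it returns.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;

class OperationQueue {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(); // thread-safe queue
  private final ExecutorService workers;
  private final Thread dispatcher;
  private boolean suspended = false; // guarded by "this"

  OperationQueue(int workerThreads) {
    workers = Executors.newFixedThreadPool(workerThreads, r -> {
      Thread t = new Thread(r);
      t.setDaemon(true); // don't keep the JVM alive just for queued work
      return t;
    });
    dispatcher = new Thread(this::dispatchLoop);
    dispatcher.setDaemon(true);
    dispatcher.start();
  }

  // Producers call this; it returns immediately with a future for the result.
  <T> CompletableFuture<T> submit(Callable<T> work) {
    CompletableFuture<T> future = new CompletableFuture<>();
    queue.add(() -> {
      try {
        future.complete(work.call());
      } catch (Exception e) {
        future.completeExceptionally(e);
      }
    });
    return future;
  }

  synchronized void suspend() { // e.g. from a network listener's onDown
    suspended = true;
  }

  synchronized void resume() { // e.g. from a network listener's onUp
    suspended = false;
    notifyAll(); // wake the dispatcher if it is waiting
  }

  private synchronized void awaitResumed() throws InterruptedException {
    while (suspended) {
      wait();
    }
  }

  private void dispatchLoop() {
    try {
      while (true) {
        Runnable task = queue.take(); // blocks until some producer adds work
        awaitResumed();               // holds the task while the queue is suspended
        workers.execute(task);
      }
    } catch (InterruptedException | RejectedExecutionException e) {
      // shutdown() was called; stop dispatching
    }
  }

  void shutdown() {
    dispatcher.interrupt();
    workers.shutdown();
  }
}
```

The thread-safe queue decouples producers from the network. Work submitted while the network is down simply waits, and callers are notified through their futures when it finally runs.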

How to Test
• Mock large amounts of data, measure performance
• Can be automated so you never encounter performance regressions
• Network stress tests
• ab
• blitz
• loader.io
• ulimits
• Packet sniffers
• Round-trip time services, e.g. New Relic.

General Principles
• Scale when you anticipate the need.
• Scale eagerly when you don’t need to go far out of the way.
• CDNs and Gzip compression good examples.
• Or when retrofitting will be painful.
• RESTful architecture from the beginning: much easier than tacking it on later!
• But caching is usually easy to add later.
• Focus on the big improvements:
• 80/20 rule
• Profile and knock out the biggest CPU / memory hogs first.
• Practice and internalize to reduce scaling costs!
• Concurrency is much easier with mastery.
• Caching seems much easier with mastery, but often isn’t.
• Internalize immutability and you’ll just write better code.

Thanks!
Good luck, and always bring mangosteens to acquisition talks.


From the New York City College of Technology Computer Systems Technology Colloquium.
