This presentation was given at LinkedIn. It is a collection of guidelines and wisdom for re-thinking how we do engineering for massively scalable systems. Useful for anyone who cares about Big Data, Distributed Computing, Hadoop, and more.
Make Life Suck Less (Building Scalable Systems)
1. How To Make Life Suck Less!
(when building scalable systems)
Bradford Stephens
c: www.DrawnToScaleHQ.com
b: www.roadtofailure.com
t: @lusciouspear
2. About Me
• Founder, Drawn to Scale. Lead Engineer, Visible Technologies
• CS Degree, University of North FL
• Former careers in politics, music, finance, consulting
3. Drawn to Scale
• Building the “Big Data” platform: ingestion, processing, storage, search
• Products coming: Big Log, Big Search (faceted), Big Message...
5. Everything Changes with Big Data
• The bar is set higher: a previously niche field with few standard stacks comparable to LAMP
• You need better engineering just to reach minimum success
6. Scalability Matters
• “Web-Scale” data is unstructured and exponentially interconnected
• Social Media: Catalyst
• All data is important
• Data Size != Business Size
7. The Traditional DB
• Excel with highly structured, normalizable data
• Non-Linear Scale Cost
• More data = fewer features
• Optimized for single-node
• 90% of utility is 5% of capability
8. Ergo, Distributed
• Optimize for the problems, no Swiss-Army knife
• Shared-nothing, commodity boxes
• Linear scale cost
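One common way shared-nothing stores keep scale cost roughly linear is consistent hashing: adding a commodity box moves only about 1/N of the keys instead of reshuffling everything. The sketch below is illustrative only; the class name `HashRing`, the `vnodes` count, and the use of MD5 for key spreading are assumptions, not something the talk prescribes.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hashing ring (illustrative sketch, hypothetical names).

    Each box is placed on the ring at several virtual points so keys spread
    evenly. Adding a box only steals the keys that now land on its points.
    """

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread keys evenly, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    ring = HashRing(["box-a", "box-b", "box-c", "box-d"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add("box-e")  # scale out by one commodity box
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved / len(keys):.0%} of keys moved")
```

With five boxes, roughly a fifth of the keys should move, and every moved key lands on the new box; a naive `hash(key) % N` scheme would instead remap most keys.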
9. The State of Things
• Priorities are in a different order than 20 years ago:
• Cust. Experience is paramount
• Engineers are precious
• Fast I/O is expensive
• Storage is cheap
11. Operations
Moving the Box: Sysadmin ratio from 2:1 to 200:1 to 2000:1
(yes devs, you’ll care about this too)
12. Ops vs. Eng
• Engineers build, Ops manages
• Fixing problems: devs code+automate, ops hire
• Want something fixed? Call devs at 2 AM.
13. Config is Important
• Configuration is not 2nd-class anymore
• Needs to be tackled by Engineers
• New frameworks = months of configuration and experimentation
• Chef is a good start, but...
14. Production = Test
• Surprise! You don’t have a Test environment any more.
• Test Cost => Prod Cost
• Anything that’s not your data center is an approximation. Switches, cable, power, boxes, etc...
15. You’re Always Testing
• Constantly simulate failures and brownouts of boxes, racks, switches...
• “Canary in the Coal Mine”: run a box and rack at 175% current load.
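One way to run a canary box at 175% of a normal box's load is to give it a weight of 7 against 4 for its peers (7/4 = 1.75) in a weighted load balancer. The sketch below assumes a deterministic "smooth" weighted round-robin, similar to the scheme nginx uses for upstream weights; the node names are hypothetical.

```python
from collections import Counter


def smooth_weighted_round_robin(weights, picks):
    """Return `picks` node choices, honoring integer weights smoothly.

    weights: dict mapping node name -> integer weight.
    Over every sum(weights) picks, each node is chosen exactly `weight` times.
    """
    current = {node: 0 for node in weights}
    total = sum(weights.values())
    order = []
    for _ in range(picks):
        # Every node earns its weight; the richest node serves the request.
        for node, weight in weights.items():
            current[node] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        order.append(chosen)
    return order


# Hypothetical cluster: one canary box at 7/4 = 175% of a normal box's share.
weights = {"canary-01": 7, "box-02": 4, "box-03": 4, "box-04": 4}
counts = Counter(smooth_weighted_round_robin(weights, 19 * 100))
print(counts)  # canary-01: 700 picks, each other box: 400
```

The smooth variant interleaves picks rather than sending bursts to the heavy node, so the canary sees the extra load as a steady stream, which is closer to what it would face if overall traffic really grew.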
17. Built to Fail
• “It’s working” isn’t binary
• Acting weird? Shoot it.
• Multi-system failure is common: be topology aware
• Avoid false negatives: something’s wrong, you don’t know it, and you lose customer data
• This is empowering!
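Since “it’s working” isn’t binary, a health check can grade nodes instead of answering up/down, and pull anything that is merely acting weird. This is a minimal sketch; the metric names, the `classify` helper, and every threshold are assumptions for illustration, not recommended values.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class NodeStats:
    name: str
    error_rate: float      # fraction of failed requests in the last window
    p99_latency_ms: float  # 99th-percentile latency in the last window


def classify(nodes, max_error_rate=0.01, slow_factor=3.0, dead_error_rate=0.5):
    """Grade each node "healthy", "degraded", or "dead" (hypothetical thresholds)."""
    cluster_p99 = median(n.p99_latency_ms for n in nodes)
    verdicts = {}
    for n in nodes:
        if n.error_rate >= dead_error_rate:
            verdicts[n.name] = "dead"
        elif n.error_rate > max_error_rate or n.p99_latency_ms > slow_factor * cluster_p99:
            # Up, but not right: "Acting weird? Shoot it."
            verdicts[n.name] = "degraded"
        else:
            verdicts[n.name] = "healthy"
    return verdicts


stats = [
    NodeStats("box-a", 0.0, 20),
    NodeStats("box-b", 0.001, 25),
    NodeStats("box-c", 0.0, 200),  # answers pings, but 8x slower than peers
    NodeStats("box-d", 0.9, 30),
]
verdicts = classify(stats)
to_shoot = [name for name, v in verdicts.items() if v != "healthy"]
print(verdicts, to_shoot)
```

Comparing each node to the cluster median, rather than to a fixed number, catches the box that is slow relative to its peers even when the whole cluster is under load.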
19. This is Hard :(
• Engineering at scale is very different from writing a 3-tier webapp
• Care about garbage collection, election algorithms, data structures, access patterns, etc...
• CS knowledge is required, not a luxury
• DBA/RDBMS skills pretty useless
• CAP is law
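A concrete place the CAP trade-off shows up is in replicated stores with tunable quorums (Dynamo-style designs): with N replicas, reads waiting for R and writes waiting for W, every read overlaps the latest write only when R + W > N. Choosing R + W ≤ N keeps serving with fewer live replicas and lower latency, at the price of possibly stale reads. The sketch below checks the rule by brute force; the function names are illustrative, and this is a related consequence, not CAP itself.

```python
from itertools import combinations


def quorum_overlaps(n, r, w):
    """True if every read quorum must share a replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def brute_force_overlaps(n, r, w):
    # Enumerate every possible read set and write set of replicas.
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for r, w in [(2, 2), (1, 1), (1, 3)]:
    print(f"N=3 R={r} W={w}: overlap={quorum_overlaps(3, r, w)}")
```

With N=3, R=2/W=2 and R=1/W=3 always read at least one fresh replica, while R=1/W=1 can return stale data but tolerates more failed or partitioned replicas.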
20. Not Everything’s a Table
• Structure your data according to how it needs to be used
• Unstructured massive files, graphs, KV-stores
• The more your problem narrows, the easier it is to scale
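As an example of structuring data by access pattern: in a store that keeps rows sorted by key (HBase does this), a key of user ID plus a reversed, zero-padded timestamp turns “latest N events for a user” into one short prefix scan. The sketch simulates such a store in memory; `SortedKV`, `row_key`, the `|` separator, and the `MAX_TS` bound are hypothetical choices.

```python
import bisect

MAX_TS = 10**13  # assumption: millisecond timestamps stay below this bound


def row_key(user_id, ts_ms):
    # Reversed, fixed-width timestamp makes newer events sort first.
    if not 0 <= ts_ms < MAX_TS:
        raise ValueError("timestamp out of range")
    return f"{user_id}|{MAX_TS - ts_ms:014d}"


class SortedKV:
    """Tiny in-memory stand-in for a store that keeps rows sorted by key."""

    def __init__(self):
        self._keys, self._vals = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def scan_prefix(self, prefix, limit):
        # Rows sharing a prefix are contiguous, so this reads only what it returns.
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix) and len(out) < limit:
            out.append(self._vals[i])
            i += 1
        return out


def latest_events(store, user_id, n):
    return store.scan_prefix(f"{user_id}|", n)
```

The trailing separator matters: without it, a scan for user `u1` would also pick up `u10`.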
21. Big Data is BIG
• Imagine test passes that take hours
• What works at 1.5 TB may fail at 10 MB or 2 TB
• Many tests, simple code
• Soft Delete Only
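“Soft Delete Only” means marking rows as deleted instead of removing them, so a buggy job that deletes too much can be undone. A minimal sketch, assuming a `deleted_at` tombstone field and a hypothetical `SoftDeleteStore` wrapper; a real store would persist this, and any physical purge would be a separate, deliberate step.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    key: str
    value: str
    deleted_at: Optional[float] = None  # tombstone: set instead of dropping the row


class SoftDeleteStore:
    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = Record(key, value)

    def delete(self, key, now=None):
        # Mark, never remove: the data stays recoverable.
        row = self._rows.get(key)
        if row is not None and row.deleted_at is None:
            row.deleted_at = time.time() if now is None else now

    def undelete(self, key):
        row = self._rows.get(key)
        if row is not None:
            row.deleted_at = None

    def get(self, key):
        # Readers never see tombstoned rows.
        row = self._rows.get(key)
        if row is None or row.deleted_at is not None:
            return None
        return row.value
```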
22. “No, I won’t give you a repro”
• Often impossible to repro a bug on demand in a cluster
• Either fix your logging or your bug
• Log everything (we have a product for this!)
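When you can’t reproduce a bug, the logs are all you have, so each line should carry enough context to be merged and searched across many boxes. A minimal sketch with Python’s standard `logging` module, emitting one JSON object per line; the context fields (`request_id`, `shard`, `rack`) and the `make_logger` helper are hypothetical.

```python
import json
import logging
import socket


class JsonFormatter(logging.Formatter):
    """One JSON object per line, so logs from many boxes can be merged and searched."""

    CONTEXT_FIELDS = ("request_id", "shard", "rack")  # hypothetical field names

    def format(self, record):
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "host": socket.gethostname(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Context passed via `extra=` becomes attributes on the record.
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name, stream=None):
    # stream=None makes StreamHandler write to stderr.
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger


log = make_logger("ingest")
log.info("wrote batch", extra={"request_id": "r-42", "shard": 7})
```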
23. Avoiding Impedance Mismatch
• High vs. Low Latency vs. Throughput
• A lot of data eventually, or a little now
• MapReduce vs. Sharding/Indexing
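The two sides of that mismatch often work together: a high-throughput, high-latency batch job (MapReduce-style) scans everything once to build an index, and a low-latency path answers queries with a single lookup. The sketch mimics the map, shuffle, and reduce phases in plain Python to build an inverted index; all function names are illustrative, and this is not Hadoop’s API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    # Emit (word, doc_id) pairs, the way a mapper emits key/value pairs.
    return [(word, doc_id) for word in text.lower().split()]


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    return word, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Batch path: touches every document, optimized for throughput."""
    pairs = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(w, ids) for w, ids in shuffle(pairs).items())


def search(index, word):
    """Serving path: one lookup against the prebuilt index, optimized for latency."""
    return index.get(word.lower(), [])


docs = {"d1": "big data is big", "d2": "Data at scale"}
index = build_inverted_index(docs)
print(search(index, "data"))
```

Rebuilding the index is slow but parallelizes across the cluster; answering a query never has to touch the raw documents.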
24. Simple Workflow
[Diagram: a simple workflow. Stages shown: Collect; Store in Hadoop / Store in HBase; Unstructured, Semantic, and Structured Analysis (Hadoop, Hadoop + HBase); Indexing (Lucene + Solr + Katta); Load/Replicate Shards; Pull Indexes; Search]
26. Hiring
• Plan for more engineers, less ops
• Be aware of “context switch cost” when training RDBMS-folks
27. It’s Not Just Coding
• Be aware of research cost
• Much more time spent experimenting, not coding
• Coding all this from scratch is horrific
• Nailing together 10+ OSS projects is a pain
• Open source anything not “Secret sauce”
28. Solve your Core Problem
• “Making your own electricity doesn’t create better tasting beer”
• Plan to use an end-to-end platform in the future (hint: ours!)
29. In Summary
• Plan for everything to fail
• Test constantly in production
• Systems Software requires Computer Science
• Don’t build it if you don’t have to
30. Thanks!
• Y’all
• Road to Failure Readers
• James Hamilton, Amazon/MS
• Bradford Cross, Flightcaster
• Ryan Rawson, HBase/Stumbleupon