P. Vijayashanker From Small Datasets to Scale: Planning for the Evolution of Data Social Developer Summit
1. Small Data Sets to Scale:
Planning for the Evolution of Data
Poornima Vijayashanker
CEO & Founder BizeeBee
poornima@bizeebee.com
@poornima
www.femgineer.com
2. AGENDA
I. Stealth Mode - “pre-data” phase
II. Launch
III. Compute Growth Rate
IV. Optimizations
V. Data Storage
3. Pre-Data
Stealth Mode - “pre-data” phase
Small initial data set
Easy storage
Hosted storage solutions such as Heroku, Rackspace
Design features around it
Simplicity of Storage v. Complexity of Design
e.g. Mint - 3 months of financial data; FB - social graph limited to universities
4. 0 to 100k to 1M
0 - 100k: easiest schema design
Single DB - with user & static data
Single instance of app accessing the db
100k - 1M+: time to re-design db and app
Break up databases - user & static
Multiple instances of the app
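A minimal sketch of the "break up databases" step, assuming in-memory SQLite as a stand-in for two separate database servers. The table names, columns, and the transactions_with_category helper are hypothetical and not from the talk. Once user data and static data live in different databases, a join that used to happen in SQL has to happen in the app:

```python
import sqlite3

# Hypothetical layout: one database for static reference data,
# another for per-user data. In-memory SQLite stands in for real servers.
static_db = sqlite3.connect(":memory:")
user_db = sqlite3.connect(":memory:")

static_db.execute("CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT)")
static_db.execute("INSERT INTO categories VALUES (1, 'Groceries')")

user_db.execute(
    "CREATE TABLE transactions (user_id INTEGER, category_id INTEGER, amount REAL)"
)
user_db.execute("INSERT INTO transactions VALUES (42, 1, 18.50)")


def transactions_with_category(user_id):
    """Combine user rows with static rows in the app, since no SQL join spans both DBs."""
    names = dict(static_db.execute("SELECT id, name FROM categories"))
    rows = user_db.execute(
        "SELECT category_id, amount FROM transactions WHERE user_id = ?",
        (user_id,),
    )
    return [(names[category_id], amount) for category_id, amount in rows]


result = transactions_with_category(42)
print(result)
```

Static data changes rarely, so each app instance could also hold it in memory instead of querying it on every request.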
5. Growth Rate
What is your user growth rate?
Basic unit e.g. Mint - transaction
User generated content
Size of unit e.g. FB - photo
Storage capacity v. Seek v. Size
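One way to turn these questions into a number is a back-of-the-envelope projection. The sketch below assumes compound monthly user growth and a fixed number of units per user per month. The function name, parameters, and sample figures are illustrative assumptions, not data from the talk:

```python
def project_storage_bytes(users, monthly_growth, units_per_user_per_month,
                          unit_bytes, months):
    """Return cumulative bytes stored at the end of each month."""
    totals = []
    stored = 0
    for _ in range(months):
        # Each current user adds a fixed number of units this month.
        stored += users * units_per_user_per_month * unit_bytes
        totals.append(stored)
        # Compound growth: the user base grows by a fixed percentage.
        users = round(users * (1 + monthly_growth))
    return totals


# Illustrative inputs: 1,000 users, 50% monthly growth,
# 10 units per user per month, 200 bytes per unit.
projection = project_storage_bytes(
    users=1000,
    monthly_growth=0.5,
    units_per_user_per_month=10,
    unit_bytes=200,
    months=3,
)
print(projection)
```

Swapping unit_bytes from a 200-byte transaction row to a multi-megabyte photo shows how strongly the size of the basic unit drives capacity planning.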
6. Optimizations
Capacity - throw hardware at it
Seek - throw software at it
Cache data
Size - design around it
Limit usage size, e.g. 4 MB per picture
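A rough illustration of the two software-side ideas above: caching repeated reads and capping unit size. It uses Python's functools.lru_cache as an in-process cache. The load_profile and accept_upload functions are hypothetical, and a production system would more likely use a shared cache service than a per-process one:

```python
from functools import lru_cache

MAX_UPLOAD_BYTES = 4 * 1024 * 1024  # the 4 MB picture limit from the slide


def accept_upload(data: bytes) -> bool:
    """Reject units larger than the limit before they reach storage."""
    return len(data) <= MAX_UPLOAD_BYTES


@lru_cache(maxsize=1024)
def load_profile(user_id: int) -> str:
    """Stand-in for a slow database read; results are cached by user_id."""
    return f"user-{user_id}"


load_profile(1)  # miss: goes to the "database"
load_profile(1)  # hit: served from the cache
print(load_profile.cache_info())
print(accept_upload(b"x" * 1024))
```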
7. Optimizations Cont’d
Code Level
Processes - Computation v. Retrieval
DB Techniques - Index, De-Normalize
Data Level
Partitioning: Siloed v. Interconnected
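As a small illustration of the "Index" technique listed under Code Level, the sketch below builds a hypothetical transactions table in in-memory SQLite, adds an index on user_id, and asks SQLite for its query plan. The schema and index name are assumptions for the example. The exact wording of the plan varies between SQLite versions, so the code only checks that the index is mentioned:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions (user_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

# Without this index, a lookup by user_id scans the whole table.
conn.execute("CREATE INDEX idx_tx_user ON transactions (user_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT amount FROM transactions WHERE user_id = ?",
    (7,),
).fetchall()

# The last column of each plan row is the human-readable detail.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
print("uses index:", "idx_tx_user" in plan_text)
```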
8. Data Storage
Single User’s Data v. Aggregated Data
Single user’s data v. data aggregated across users
e.g. Mint - Spending Trends
Scheme to compute, store, and retrieve aggregated data
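A minimal sketch of one possible compute / store / retrieve scheme for aggregated data, assuming a periodic batch job that rebuilds a summary keyed by category. The sample rows, function names, and the in-memory summary_store are hypothetical. The talk does not specify how Mint implements spending trends:

```python
from collections import defaultdict

# Hypothetical per-user rows: (user_id, category, amount).
transactions = [
    (1, "groceries", 40.0),
    (1, "rent", 900.0),
    (2, "groceries", 60.0),
    (2, "rent", 1100.0),
]


def compute_category_averages(rows):
    """Compute: average spend per user in each category, across all users."""
    totals = defaultdict(float)
    users = defaultdict(set)
    for user_id, category, amount in rows:
        totals[category] += amount
        users[category].add(user_id)
    return {category: totals[category] / len(users[category]) for category in totals}


# Store: keep the precomputed result so reads never touch the raw rows.
summary_store = compute_category_averages(transactions)


def average_spend(category):
    """Retrieve: a cheap lookup against the stored aggregate."""
    return summary_store.get(category)


print(average_spend("groceries"))
```

Keeping the aggregate separate from single-user data means each user's queries stay fast while the cross-user summary is rebuilt on its own schedule.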
9. Conclusion
Start small - provide enough value to user
Monitor & project growth rate of data
Break data apart
Simple optimizations - indexing, de-normalizing, caching
Large data sets - warehousing, partitioning db
Hiring designer & engineer for BizeeBee :)