In this webinar, we cover how ScaleBase provides transparent data distribution to its clients: how it overcomes the caveats of distributing data and hides that complexity from the application.
2. Agenda
1. Who We Are
2. The Scalability Problem
3. Benefits of Automatic Data Distribution
4. Customer ROI/Case Studies
5. Q & A
(please type questions directly into the GoToWebinar side panel)
3. Who We Are
Presenters:
• Paul Campaniello, VP of Global Marketing
– 25-year technology veteran with marketing experience at Mendix, Lumigent, Savantis and Precise.
• Doron Levari, Founder
– A technologist and long-time veteran of the database industry. Prior to founding ScaleBase, Doron was CEO of Aluna.
4. Pain Points – The Scalability Problem
• Thousands of new online and mobile apps launching every day
• Demand climbs for these apps and databases can’t keep up
• App must provide uninterrupted access and availability
• Database performance and scalability are critical
5. Big Data = Big Scaling Needs
Big Data = Transactions + Interactions + Observations
[Figure: data sources grouped into four tiers (ERP, CRM, Web, Big Data). Volume axis: Megabytes, Gigabytes, Terabytes, Petabytes. Horizontal axis: Increasing Data Variety and Complexity.]
Source: The 451 Group & Teradata
6. Scalability Pain
[Chart: Infrastructure Cost ($) over time, with curves for Predicted Demand, Actual Demand, Traditional Hardware, and Dynamic Scaling. Annotations: Large Capital Expenditure, Opportunity Cost, You just lost customers.]
7. Ongoing “Scaling MySQL” Series
• August 16 & September 20, 2012
– Scaling MySQL: ScaleUp versus Scale Out
• October 23, 2012
– Methods and challenges to Scale out MySQL
• Today
– Benefits of Automatic Data Distribution
• January 17, 2013
– Catch 22 of read-write splitting
8. The Database Engine is the Bottleneck...
• Every write operation is at least 4 write operations inside the DB:
– Data segment
– Index segment
– Undo segment
– Transaction log
• And multiple activities in the DB engine memory:
– Buffer management
– Locking
– Thread locks/semaphores
– Recovery tasks
9. The Database Engine is the Bottleneck
• Every write operation is at least 4 write operations inside the DB:
– Data segment
– Index segment
– Undo segment
– Transaction log
• And multiple activities in the DB engine memory:
– Buffer management
– Locking
– Thread locks/semaphores
– Recovery tasks
Now multiply by 10TB accessed by 10000 concurrent sessions
10. COI – Customer, Order, Item
CUSTOMER
C_ID  NAME     LOCATION  RANK
1     John     MA        10
2     James    AL        9
3     Peter    CA        10
4     Chris    FL        8
5     Oliver   MA        9
6     Allan    MA        9
7     Janette  CA        8
8     David    MD        10

ORDER
O_ID  C_ID  DATE
1     1     2012-02-01
2     1     2012-02-01
3     2     2012-02-01
4     6     2012-02-01
5     6     2012-02-01
6     8     2012-02-01

ORDER_ITEM
OI_ID  O_ID  QUANT  I_ID
1      1     3      1
2      1     6      2
3      2     4      1
4      2     2      2
5      2     1      5
6      3     1      1
7      3     6      5
8      4     8      3
9      4     9      4
10     5     2      6
11     6     1      5

ITEM
I_ID  NAME
1     iPhone
2     iPad
3     iPad Mini
4     Kindle
5     Kindle Fire
6     Galaxy S3
11. Requirements
• Every day:
• Updates (throughput)
– 30,000 new customers
– 1,000,000 new orders, average of 5 items per order
– Items catalog is updated once a day, nightly, at 11pm
• Queries (latency)
– Top customers (rank 9 and up)
– New orders, joins across the board…
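As a rough sizing sketch, the daily figures above can be turned into row and write counts. The factor of 4 is the "at least 4 write operations" lower bound from the earlier bottleneck slide. Treating every new row as one logical write, and spreading the load evenly over 24 hours, are simplifying assumptions and not figures from the webinar.

```python
# Daily volumes taken from the requirements slide.
NEW_CUSTOMERS_PER_DAY = 30_000
NEW_ORDERS_PER_DAY = 1_000_000
AVG_ITEMS_PER_ORDER = 5

# Lower bound quoted on the "database engine is the bottleneck" slide.
MIN_INTERNAL_WRITES_PER_ROW = 4

SECONDS_PER_DAY = 24 * 60 * 60


def daily_rows() -> int:
    """New rows per day across CUSTOMER, ORDER and ORDER_ITEM."""
    order_items = NEW_ORDERS_PER_DAY * AVG_ITEMS_PER_ORDER
    return NEW_CUSTOMERS_PER_DAY + NEW_ORDERS_PER_DAY + order_items


def daily_internal_writes() -> int:
    """Assumption: each new row costs at least 4 internal writes."""
    return daily_rows() * MIN_INTERNAL_WRITES_PER_ROW


def average_rows_per_second() -> float:
    """Assumption: load is spread evenly over the day (real traffic peaks)."""
    return daily_rows() / SECONDS_PER_DAY


print(daily_rows())                         # 6030000
print(daily_internal_writes())              # 24120000
print(round(average_rows_per_second(), 1))  # 69.8
```

The even-spread average understates peak load, so it is best read as a floor for what a single database would have to sustain.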
12. Splitting the data
• CUSTOMER – random (hash)
• ORDER – derivative (C_ID)
• ORDER_ITEM – transitive (O_ID -> C_ID)
• ITEM – global table
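The four placement rules above can be sketched as routing functions. This is a minimal illustration, not ScaleBase's implementation: the modulo function is a stand-in for whatever hash the product uses, `NUM_SHARDS` and all function names are hypothetical, and the O_ID to C_ID pairs are copied from the sample ORDER table.

```python
from typing import Dict, List

NUM_SHARDS = 3  # assumption: three databases, numbered 1..3

# Stand-in for looking up ORDER.C_ID by O_ID (values from the sample ORDER table).
ORDER_TO_CUSTOMER: Dict[int, int] = {1: 1, 2: 1, 3: 2, 4: 6, 5: 6, 6: 8}


def shard_for_customer(c_id: int) -> int:
    """CUSTOMER: placed by its own key (modulo used as a stand-in hash)."""
    return (c_id - 1) % NUM_SHARDS + 1


def shard_for_order(c_id: int) -> int:
    """ORDER: derivative, stored on the same shard as its customer."""
    return shard_for_customer(c_id)


def shard_for_order_item(o_id: int) -> int:
    """ORDER_ITEM: transitive, resolved through O_ID -> C_ID."""
    return shard_for_order(ORDER_TO_CUSTOMER[o_id])


def shards_for_item() -> List[int]:
    """ITEM: global table, a full copy lives on every shard."""
    return list(range(1, NUM_SHARDS + 1))


print(shard_for_customer(7))    # 1
print(shard_for_order_item(3))  # 2 (order 3 belongs to customer 2)
print(shards_for_item())        # [1, 2, 3]
```

Because an order and its items always land on the customer's shard, a join across CUSTOMER, ORDER and ORDER_ITEM for one customer can be answered by a single database, and the global ITEM copy keeps the final join to ITEM local as well.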
13. Sliced Database
DB - 1

CUSTOMER
C_ID  NAME     LOCATION  RANK
1     John     MA        10
4     Chris    FL        8
7     Janette  CA        8

ORDER
O_ID  C_ID  DATE
1     1     2012-02-01
2     1     2012-02-01

ORDER_ITEM
OI_ID  O_ID  QUANT  I_ID
1      1     3      1
2      1     6      2
3      2     4      1
4      2     2      2
5      2     1      5

ITEM
I_ID  NAME
1     iPhone
…     …
6     Galaxy S3

DB - 2

CUSTOMER
C_ID  NAME     LOCATION  RANK
2     James    AL        9
5     Oliver   MA        9
8     David    MD        10

ORDER
O_ID  C_ID  DATE
3     2     2012-02-01
6     8     2012-02-01

ORDER_ITEM
OI_ID  O_ID  QUANT  I_ID
6      3     1      1
7      3     6      5
11     6     1      5

ITEM
I_ID  NAME
1     iPhone
…     …
6     Galaxy S3

DB - 3

CUSTOMER
C_ID  NAME     LOCATION  RANK
3     Peter    CA        10
6     Allan    MA        9

ORDER
O_ID  C_ID  DATE
4     6     2012-02-01
5     6     2012-02-01

ORDER_ITEM
OI_ID  O_ID  QUANT  I_ID
8      4     8      3
9      4     9      4
10     5     2      6

ITEM
I_ID  NAME
1     iPhone
…     …
6     Galaxy S3
14. Requirements
• Every day:
• Updates (throughput -> distribution)
– 30,000 new customers
– 1,000,000 new orders, average of 5 items per order
– Items catalog is updated once a day, nightly, at 11pm
• Queries (latency -> parallelism)
– Top customers (rank 9 and up)
– New orders, joins across the board…
15. Automatic Data Distribution
• The ultimate way to scale
• Provides significant performance improvements
• The only way to really improve both reads and writes
• Good for scaling high session-volume reads and writes
• Good for scaling high data-volume reads and writes
• Home-grown implementations have drawbacks
16. Scale Out Features and Benefits
Feature                                 Benefit
Parallel query execution                Great performance of cross-db queries & maintenance commands
Query result aggregation                Support of sophisticated cross-db queries, even with ORDER BY, GROUP BY, LIMIT, aggregate functions…
Online data redistribution              Flexibility: no need to over-provision; no downtime
100% compatible MySQL proxy             Applications unmodified; standard MySQL tools and interfaces
MySQL databases untouched               Data is safe within MySQL InnoDB/MyISAM/any
Data distribution review and analysis   Optimization of data distribution policy
Data consistency verifier               Validate system-wide data consistency
Real-time monitoring and alerts         Simplify management, reduce TCO
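The first two rows of the table, parallel query execution and query result aggregation, can be illustrated with a small scatter-gather sketch. It assumes in-memory lists as stand-ins for three databases (rows taken from the sample CUSTOMER table) and a thread pool in place of real connections. All names are hypothetical, and this is not how ScaleBase itself is implemented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import heapq
from itertools import islice
from typing import Dict, List, Tuple

Customer = Tuple[int, str, str, int]  # (C_ID, NAME, LOCATION, RANK)

# In-memory stand-ins for three databases.
SHARDS: List[List[Customer]] = [
    [(1, "John", "MA", 10), (4, "Chris", "FL", 8), (7, "Janette", "CA", 8)],
    [(2, "James", "AL", 9), (5, "Oliver", "MA", 9), (8, "David", "MD", 10)],
    [(3, "Peter", "CA", 10), (6, "Allan", "MA", 9)],
]


def sort_key(row: Customer) -> Tuple[int, int]:
    """ORDER BY rank DESC, c_id ASC."""
    return (-row[3], row[0])


def top_on_shard(rows: List[Customer], min_rank: int, limit: int) -> List[Customer]:
    """Per-shard part of the query: filter, sort, and apply LIMIT locally."""
    hits = [r for r in rows if r[3] >= min_rank]
    return sorted(hits, key=sort_key)[:limit]


def top_customers(min_rank: int, limit: int) -> List[Customer]:
    """Run the query on all shards in parallel, then merge the sorted partials."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda rows: top_on_shard(rows, min_rank, limit), SHARDS))
    return list(islice(heapq.merge(*partials, key=sort_key), limit))


def customers_per_location() -> Dict[str, int]:
    """GROUP BY location: each shard counts locally, the totals are summed."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda rows: Counter(r[2] for r in rows), SHARDS))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


print([r[1] for r in top_customers(min_rank=9, limit=3)])  # ['John', 'Peter', 'David']
print(customers_per_location()["MA"])                      # 3
```

Applying LIMIT on each shard before merging keeps the partial results small: no shard can contribute more than `limit` rows to the final answer, so the merge step only has to look at `limit` rows per database.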
17. Scale Out Provides Immediate & Tangible Value
[Diagram: two application servers, BI, and management tools alongside Databases A to D, each with its own standby (Standby A to D).]
18. Typical Scale Out (ScaleBase) Deployment
[Diagram: the same components (two application servers, BI, management, Databases A to D, Standbys A to D) with ScaleBase Central Management and the ScaleBase Data Traffic Manager added.]
19. Choose Your Scale-out Path
[Chart: Database Size against # of concurrent sessions, divided into three regions labeled "1 DB? Good for me!", Read/Write Splitting, and Data Distribution.]
20. Scaling Out Achieves Unlimited Scalability
[Chart: Throughput (TPM), Total DB Size (MB), and # Connections plotted against Number of Databases (1, 2, 4, 6, 8, 10, 14). Data labels that survive extraction: 6000, 12000, 24000, 36000, 48000, 60000, 84000.]
21. Detailed Scale Out Case Studies
Nokia
• Device Apps App
• Availability
• Scalability
• Geo-clustering
• 100 Apps
• 300 MySQL DB

AppDynamics
• Next gen APM company
• Scalability for the Netflix implementation

Mozilla
• New Product/Next Gen App/AppStore
• Scalability
• Geo-sharding

Solar Edge
• Next Gen Monitoring App
• Massive Scale
• Monitors real time data from thousands of distributed systems
22. Summary
• Database scalability is a significant problem
– App explosion, Big Data, Mobile
• Scale Up helps somewhat, but Scale Out provides a long-term, cost-effective solution
• ScaleBase has an effective Scale Out solution with a proven ROI
– Improves performance & requires NO changes to your existing infrastructure
• Choose your scale-out path…
– The ScaleBase platform enables you to start with R/W splitting and grow into automatic data distribution
23. Questions (please enter directly into the GTW side panel)
617.630.2800
www.ScaleBase.com
doron.levari@scalebase.com
paul.campaniello@scalebase.com