Severalnines Training: MySQL® Cluster - Part IX
Copyright 2011 Severalnines AB. Control your database infrastructure
9th Installment
MySQL Cluster Self-Training
Part 9 – Designing a MySQL Cluster
Topics
• Node Placement
• Capacity Planning and Dimensioning
• Hardware recommendations
• Best practice configuration
• Storage calculations
Node Placement
• Data nodes should use dedicated instances
– Heavy user of RAM, CPU, and DISK
• API nodes (e.g. SQL nodes) should preferably be on
dedicated instances
– Heavy user of CPU, but little DISK
– RAM usage dependent on workload
• Management servers
– Negligible use of CPU, DISK, RAM
Co-location
• Do not co-locate API nodes with Data nodes
– They will compete for CPU
– RAM usage for API nodes may grow, competing with
resources of the Data node (causing swapping and node
failures)
• Don’t co-locate Management servers with Data
nodes
– You lose protection from split brain/network partitioning
• API nodes and Management servers can be co-
located
Cluster Size
• Number of Data Nodes
– Depends on Storage and Throughput requirements
– Use Sizer (http://www.severalnines.com/sizer) to calculate
storage requirements for your data
– At least two for redundancy
• Number of API Nodes
– Depends on the expected level of Throughput
– At least two for redundancy
– Usually recommended to have 2x as many API nodes as Data
nodes (e.g. 2 data nodes, 4 API nodes), especially for API nodes
using the synchronous NDB API (mysqld, Cluster/J)
• Number of Management servers
– Two for redundancy. Always!
– Having one management server on every API node does not
make sense
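The rules of thumb on this slide can be sketched as a small helper. This is only an illustration: the function name suggest_node_counts is hypothetical, and the number of data nodes still has to come from your own storage and throughput requirements.

```python
def suggest_node_counts(data_nodes: int) -> dict:
    """Apply the slide's rules of thumb to a chosen number of data nodes."""
    if data_nodes < 2:
        raise ValueError("at least two data nodes are needed for redundancy")
    return {
        "data_nodes": data_nodes,
        # Rule of thumb: twice as many API nodes as data nodes, never fewer than two.
        "api_nodes": max(2, 2 * data_nodes),
        # Two management servers for redundancy, regardless of cluster size.
        "management_servers": 2,
    }


print(suggest_node_counts(2))
```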
Good Initial Setup (1)
[Diagram: three layers]
• ACCESS LAYER: two hosts, each running an API node and ndb_mgmd
• STORAGE LAYER (NDBCLUSTER): two data nodes (ndbmtd)
• CLUSTER CONTROL: mysqld and cmon
Good Initial Setup (2)
• Easy to scale:
– Data nodes can be added online (it is not easy but possible)
– API nodes can be added online (as long as there are free
[mysqld] slots in config.ini)
• Can be extended
– Replicating out to an InnoDB database for reporting:
http://johanandersson.blogspot.se/2012/09/mysql-cluster-to-innodb-replication.html
– Using the Hadoop Applier (Oracle):
https://blogs.oracle.com/MySQL/entry/announcing_the_mysql_hadoop_applier
• The suggested setup is only a starting point
– The questions on the next slides might help determine if you
need more nodes
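To make the "free [mysqld] slots" point concrete, here is a minimal sketch that renders a config.ini skeleton for the suggested starting setup. The host names, the render_config_ini helper, and the choice of NoOfReplicas=2 are assumptions for illustration; a real config.ini needs further parameters (DataMemory, IndexMemory, data directories and so on).

```python
def render_config_ini(mgm_hosts, data_hosts, api_hosts, spare_api_slots=2):
    """Render a bare-bones config.ini with spare [mysqld] slots for later API nodes."""
    lines = ["[ndbd default]", "NoOfReplicas=2", ""]  # assumption: two replicas
    for host in mgm_hosts:
        lines += ["[ndb_mgmd]", f"HostName={host}", ""]
    for host in data_hosts:
        # Data nodes are declared in [ndbd] sections even when running ndbmtd.
        lines += ["[ndbd]", f"HostName={host}", ""]
    for host in api_hosts:
        lines += ["[mysqld]", f"HostName={host}", ""]
    # Free slots: [mysqld] sections without HostName, so API nodes
    # can be added online later without changing the configuration.
    for _ in range(spare_api_slots):
        lines += ["[mysqld]", ""]
    return "\n".join(lines)


# Hypothetical hosts: management servers co-located with the API nodes.
print(render_config_ini(
    mgm_hosts=["api1.example.com", "api2.example.com"],
    data_hosts=["data1.example.com", "data2.example.com"],
    api_hosts=["api1.example.com", "api2.example.com"],
))
```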
Good Initial Setup (3)
• Can you load in the data that you need?
– YES: good
– NO:
• Can you add more RAM to the data nodes?
If not, create a new cluster with four nodes. Try to load in the
data again.
• Can some of the less active tables use DISK DATA storage?
Avoid DISK DATA tables for frequently used data
• Use Severalnines Sizer (http://www.severalnines.com/sizer/)
(capacity planning tool). Create the schema in NDB Cluster,
run sizer and import the result to a spreadsheet. Manipulate the
row count.
• Use sizer to verify growth scenarios
Good Initial Setup (4)
• Can you handle the throughput you need?
– verify with Bencher (www.severalnines.com/bencher/)
– YES: good
– NO:
• Are the data nodes the bottleneck?
Run:
top -Hd1
Are any of the data node threads running at >90% CPU?
YES: create a new cluster with 2x the number of nodes.
NO: The APIs can be the bottleneck. Add more API nodes
• Tune schema and queries/requests (possibly experiment with the NDB
cluster connection pool as well)
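The decision on this slide can be written down as a tiny function. It is a sketch only: diagnose_bottleneck is a hypothetical name, and the per-thread CPU percentages are assumed to have been read manually from the top output on each data node.

```python
def diagnose_bottleneck(data_node_thread_cpu, threshold=90.0):
    """Return the slide's recommendation given per-thread CPU % on the data nodes."""
    if any(cpu > threshold for cpu in data_node_thread_cpu):
        # At least one data node thread is saturated.
        return "data nodes: create a new cluster with 2x the number of nodes"
    # No data node thread is saturated, so look at the access layer instead.
    return "API nodes may be the bottleneck: add more API nodes"


print(diagnose_bottleneck([35.0, 97.5, 60.2]))
print(diagnose_bottleneck([35.0, 41.0, 60.2]))
```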
Hardware for Data Nodes
• 8 cores or more
– Fast CPU and memory bus is important
• As much RAM as you need
– Memory tables and indexes for DISK DATA tables must fit in RAM
• Disk Subsystem:
– SATA2 is the absolute minimum (7200RPM), but not really suitable for
production
– Better options are:
• SAS
• SSD
• On AWS, preferably Provisioned IOPS volumes
– RAID 1+0 – requires 4 disks
• Disk Storage Capacity
– 10xDataMemory (for REDO LOG and LCP)
• If you use Disk data tables
– One disk for LCP
– One disk for Tablespace (SSD could be an option)
– One disk for UNDO/ REDO
Hardware for API Nodes
• 8 cores or more
– Fast CPU and memory bus is important
• Disks
– Replication servers:
• Disk space must be dimensioned to store binary logs/relay logs
• 5MB/s written into NDB means the binary logs grow by 5MB/s
– Otherwise, disk is not important for the API Nodes
• API nodes do not save any state information to disk (except
small metadata such as .frm files)
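As a worked example of dimensioning binary log space on a replication server, the sketch below turns a sustained write rate into disk space for a given retention period. The function name, the retention period, and the 1 GB = 1024 MB convention are assumptions; relay logs and any headroom are not included.

```python
def binlog_disk_gb(write_rate_mb_s: float, retention_hours: float) -> float:
    """Disk space (GB) needed to keep binary logs for the retention period."""
    seconds = retention_hours * 3600
    return write_rate_mb_s * seconds / 1024


# 5 MB/s sustained, binary logs kept for 24 hours (hypothetical retention).
print(binlog_disk_gb(5, 24))  # 421.875
```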
Network
• Network interconnect is important
– Ethernet
• 1Gig-E is most common
• 10Gig-E is coming
– Infiniband
• IPoIB (IP over InfiniBand)
• Lower latency than Ethernet
• Load-balancing
– Hardware: F5, Extreme Summit, Cisco
– Software: HAProxy, LVS
Storage Calculations
• Two things to consider
– Disk space
– Memory consumption
Disk Space
• One data node needs
– 3xDM for LCP (3x for Headroom, 2x is on the limit)
– 4-6xDM for Redo Log
• 4x – read mostly applications
• 6x – write intensive applications
– Tablespace
• Depends on how much data you plan to store on disk
• Storage needed per table per node:
2 x (#records x size_of_non_indexed_cols + 40B) x NoOfReplicas / #nodes
Note: 40B is the record overhead
– Store one or more backups
• 1 x DM for each backup
• This sums to more than 8x DataMemory in disk space
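The multipliers on this slide can be added up with a short calculator. This is a sketch under stated assumptions: the function name is hypothetical, the tablespace size is taken as an input rather than derived from the per-table formula, and all sizes are in GB.

```python
def disk_space_per_data_node(data_memory_gb, write_intensive=False,
                             backups=1, tablespace_gb=0.0):
    """Estimate disk space (GB) for one data node from its DataMemory (DM)."""
    lcp = 3 * data_memory_gb                               # 3x DM for LCP
    redo = (6 if write_intensive else 4) * data_memory_gb  # 4-6x DM for redo log
    backup = backups * data_memory_gb                      # 1x DM per stored backup
    return {
        "lcp": lcp,
        "redo": redo,
        "backups": backup,
        "tablespace": tablespace_gb,
        "total": lcp + redo + backup + tablespace_gb,
    }


# 16 GB DataMemory, read-mostly workload, one backup, no disk data tables.
print(disk_space_per_data_node(16))
```

With these inputs the total is 128 GB, i.e. 8x DataMemory before any tablespace is added; a write-intensive workload or extra backups push it higher.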
DataMemory and IndexMemory
• IndexMemory = 20B x sum_for_all_records
• DataMemory per table = 40B + avg_record_size (per record)
• Per node:
– DataMemory = SUM(DataMemory per table) x NO_OF_REPLICAS /
NO_OF_NODES
– IndexMemory = IndexMemory x NO_OF_REPLICAS /
NO_OF_NODES
• Easy way:
– www.severalnines.com/sizer
• Provision a data model in cluster
Run: ./sizer –a
Import the CSV data into the Excel template.
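For a rough estimate without sizer, the formulas on this slide can be sketched as follows. The assumptions are loud ones: the 40B overhead is applied per record, each data node is assumed to hold the share NoOfReplicas / number of data nodes of the total (matching the tablespace formula on the Disk Space slide), and memory_per_data_node is a hypothetical helper, not part of any tool. Treat the output as a starting point and verify it with sizer.

```python
def memory_per_data_node(tables, no_of_replicas, no_of_data_nodes):
    """Estimate DataMemory and IndexMemory (bytes) per data node.

    tables: list of (row_count, avg_record_size_bytes) tuples, one per table.
    """
    total_rows = sum(rows for rows, _ in tables)
    index_bytes = 20 * total_rows  # 20B per record
    # 40B record overhead plus the average record size, for every row.
    data_bytes = sum(rows * (40 + avg_size) for rows, avg_size in tables)
    # Assumption: data is spread evenly, each node stores replicas/nodes of it.
    share = no_of_replicas / no_of_data_nodes
    return {
        "DataMemory": data_bytes * share,
        "IndexMemory": index_bytes * share,
    }


# One table, 1 million rows of 100 bytes, 2 replicas on 4 data nodes.
print(memory_per_data_node([(1_000_000, 100)], 2, 4))
```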
Disk Data tables
• Not everything has to stay in RAM.
– Log data, archives etc. that are not frequently accessed can be stored
in DISK DATA tables:
http://johanandersson.blogspot.se/2012/04/mysql-cluster-disk-data-config.html
– Indexed columns will always stay in RAM for DISK DATA
tables.
– Disk data access is not fast, but SSDs help a lot.
– Disk Data tablespace can be increased over time online.
Performance Planning
• Transaction capacity planning requires benchmarking
– Throughput and response time requirements affect the
number of nodes, both data nodes and MySQL servers.
• Benchmark the common use cases
– Severalnines Bencher lets you drive a high load and test
individual queries.
– JMeter etc. can be used to drive web load
– Try to simulate expected peak traffic.
• Can the cluster handle the load? If not, add resources
online where needed.
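To show what a benchmark run measures, here is a minimal single-threaded harness. It is not Bencher or JMeter, only a sketch: the benchmark function and the stand-in workload are hypothetical, and in practice the request callable would execute one of your common queries against the cluster, ideally from many concurrent clients.

```python
import statistics
import time


def benchmark(request, iterations=1000):
    """Call request() repeatedly and report throughput and latency."""
    latencies = []
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        request()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "throughput_per_s": iterations / elapsed,
        "avg_latency_s": statistics.mean(latencies),
        # 95th percentile response time, taken from the sorted samples.
        "p95_latency_s": latencies[int(0.95 * (iterations - 1))],
    }


# Stand-in workload; replace with a real query call.
print(benchmark(lambda: sum(range(1000)), iterations=500))
```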
Coming next in Installment 10:
Troubleshooting MySQL Cluster
We hope these training slides are
useful to you!
Please visit our website to view the
next section of this training.
For any questions, comments or feedback,
please contact us at:
services@severalnines.com
Thank you!