Severalnines Training: MySQL® Cluster - Part IX

Copyright 2011 Severalnines AB. Control your database infrastructure
9th Installment
MySQL Cluster Self-Training
Part 9 – Designing a MySQL Cluster

Topics
• Node Placement
• Capacity Planning and Dimensioning
• Hardware recommendations
• Best practice configuration
• Storage calculations

Node Placement
• Data nodes should use dedicated instances
– Heavy user of RAM, CPU, and DISK
• API nodes (e.g. SQL nodes) should preferably be on
dedicated instances
– Heavy user of CPU, but little DISK
– RAM usage dependent on workload
• Management servers
– Negligible use of CPU, DISK, RAM

Co-location
• Do not co-locate API nodes with Data nodes
– They will compete for CPU
– RAM usage for API nodes may grow, competing with
resources of the Data node (causing swapping and node
failures)
• Don’t co-locate Management servers with Data
nodes
– You lose protection from split brain/network partitioning
• API nodes and Management servers can be co-
located

Cluster Size
• Number of Data Nodes
– Depends on Storage and Throughput requirements
– Use Sizer (http://www.severalnines.com/sizer) to calculate
storage requirements for your data
– At least two for redundancy
• Number of API Nodes
– Depends on the expected level of Throughput
– At least two for redundancy
– Usually recommended to have 2x API nodes compared to Data
nodes (2 data nodes → 4 API nodes), especially for API nodes
using the synchronous NDB API (mysqld, Cluster/J)
• Number of Management servers
– Two for redundancy. Always!
– Having one management server on every API node does not
make sense
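
The rules of thumb on this slide can be sketched in a few lines of Python. The function name `initial_cluster_size` and its parameters are hypothetical and exist only for illustration; the 2x ratio and the minimums of two come from the slide, and real node counts should be confirmed by benchmarking.

```python
def initial_cluster_size(data_nodes: int, api_ratio: int = 2) -> dict:
    """Return a starting node count per node type (rule of thumb only)."""
    if data_nodes < 2:
        raise ValueError("at least two data nodes are needed for redundancy")
    return {
        "data_nodes": data_nodes,
        # Roughly 2x API nodes per data node, never fewer than two.
        "api_nodes": max(2, api_ratio * data_nodes),
        # Two management servers for redundancy, regardless of cluster size.
        "management_servers": 2,
    }


print(initial_cluster_size(2))
```

For two data nodes this gives four API nodes and two management servers, which matches the example on the slide.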

Good Initial Setup (1)
[Diagram: suggested initial setup]
• ACCESS LAYER: two hosts, each running an API node and ndb_mgmd
• STORAGE LAYER (NDBCLUSTER): two data nodes (ndbmtd)
• CLUSTER CONTROL: one host running mysqld and cmon

• Easy to scale:
– Data nodes can be added online (it is not easy but possible)
– API nodes can be added online (as long as there are free
[mysqld] slots in config.ini)
• Can be extended
– Replicating out to an InnoDB database for reporting:
http://johanandersson.blogspot.se/2012/09/mysql-cluster-to-innodb-replication.html
– Using the Hadoop Applier (Oracle):
https://blogs.oracle.com/MySQL/entry/announcing_the_mysql_hadoop_applier
• The suggested setup is only a starting point
– The questions on the next slides might help determine if you
need more nodes

• Can you load in the data that you need?
– YES: good
– NO:
• Can you add more RAM to the data nodes?
If not, create a new cluster with four nodes. Try to load in the
data again.
• Can some of the less active tables use DISK DATA storage?
Avoid DISK DATA tables for frequently used data
• Use Severalnines Sizer (http://www.severalnines.com/sizer/)
(capacity planning tool). Create the schema in NDB Cluster,
run sizer and import the result to a spreadsheet. Manipulate the
row count.
• Use sizer to verify growth scenarios

• Can you handle the throughput you need?
– Verify with Bencher (www.severalnines.com/bencher/)
– YES: good
– NO:
• Are the data nodes the bottleneck?
Run:
top -Hd1
Are any of the data node threads running at >90% CPU?
YES: create a new cluster with 2x the number of nodes.
NO: the API nodes may be the bottleneck. Add more API nodes.
• Tune schema and queries/requests (possibly experiment with the NDB
cluster connection pool as well)
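
The decision steps above can be written down as a small Python sketch. The function `scaling_advice` and its inputs are hypothetical; it assumes you have read the per-thread CPU percentages of the data node threads by hand from `top -Hd1`, and it only encodes the 90% rule from the slide.

```python
from typing import Sequence


def scaling_advice(data_node_thread_cpu: Sequence[float],
                   threshold: float = 90.0) -> str:
    """Suggest what to scale, given CPU% per data node thread.

    The percentages are assumed to be copied manually from `top -Hd1`
    on the data node hosts while the benchmark is running.
    """
    if any(cpu > threshold for cpu in data_node_thread_cpu):
        return "Data nodes are the bottleneck: new cluster with 2x the data nodes"
    return "Data nodes have headroom: add API nodes, then tune schema and queries"


print(scaling_advice([95.0, 40.0, 35.0]))
print(scaling_advice([55.0, 60.0, 30.0]))
```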

Hardware for Data Nodes
• 8 cores or more
– Fast CPU and memory bus is important
• As much RAM as you need
– Memory tables and indexes for DISK DATA tables must fit in RAM
• Disk Subsystem:
– SATA2 is the absolute minimum (7200RPM), but not really suitable for
production
– Better options are:
• SAS
• SSD
• On AWS, preferably Provisioned IOPS volumes
– RAID 1+0 – requires 4 disks
• Disk Storage Capacity
– 10xDataMemory (for REDO LOG and LCP)
• If you use Disk data tables
– One disk for LCP
– One disk for Tablespace (SSD could be an option)
– One disk for UNDO/REDO

Hardware for API Nodes
• 8 cores or more
– Fast CPU and memory bus is important
• Disks
– Replication servers:
• Disk space must be dimensioned to store binary logs/relay logs
• 5MB/s written into NDB → binary logs will grow by 5MB/s
– Otherwise, disk is not important for the API nodes
• API nodes do not save any state information to disk (except
small metadata such as .frm files)
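
As a worked example of dimensioning binary log space on a replication server, the sketch below assumes, as the slide does, that binary logs grow at the same rate as data is written into NDB. The function name `binlog_growth_gb` is hypothetical, and 1 GB is taken as 1000 MB.

```python
def binlog_growth_gb(write_rate_mb_per_s: float, hours: float) -> float:
    """Binary log growth in GB for a sustained write rate (1 GB = 1000 MB)."""
    seconds = hours * 3600
    return write_rate_mb_per_s * seconds / 1000


# 5 MB/s sustained for one day
print(binlog_growth_gb(5, 24))  # 432.0
```

Multiply the daily figure by the number of days you keep binary logs to get the disk space to provision.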

Network
• Network interconnect is important
– Ethernet
• 1Gig-E is most common
• 10Gig-E is coming
– Infiniband
• IPoIB (IP over InfiniBand)
• Lower latency than Ethernet
• Load-balancing
– Hardware: F5, Extreme Summit, Cisco
– Software: HAProxy, LVS

Storage Calculations
• Two things to consider
– Disk space
– Memory consumption

Disk Space
• One data node needs
– 3xDM for LCP (3x for Headroom, 2x is on the limit)
– 4-6xDM for Redo Log
• 4x – read mostly applications
• 6x – write intensive applications
– Tablespace
• Depends on how much data you plan to store on disk
• Storage needed per table per node:
2 x #records x (size_of_non_indexed_cols + 40B) x NoOfReplicas / #nodes
Note: 40B is the per-record overhead
– Store one or more backups
• 1 x DM for each backup
• In total, this requires more than 8x DataMemory in disk space
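
The multipliers on this slide can be combined into a rough per-node estimate. This is a sketch under stated assumptions: the function names are hypothetical, and the tablespace function assumes the 40B overhead applies to every record, which is one reading of the formula above.

```python
def disk_space_per_node_gb(data_memory_gb: float,
                           write_intensive: bool = False,
                           backups: int = 1,
                           tablespace_gb: float = 0.0) -> float:
    """Rough disk space needed by one data node, in GB."""
    lcp = 3 * data_memory_gb                                # 3x DM for LCP
    redo = (6 if write_intensive else 4) * data_memory_gb   # 4-6x DM for redo log
    backup = backups * data_memory_gb                       # 1x DM per backup kept
    return lcp + redo + backup + tablespace_gb


def tablespace_per_node_bytes(records: int, non_indexed_bytes: int,
                              replicas: int, nodes: int) -> float:
    """Disk data storage for one table on one node.

    Assumption: the 40-byte overhead is counted once per record.
    """
    return 2 * records * (non_indexed_bytes + 40) * replicas / nodes


print(disk_space_per_node_gb(10))                        # read-mostly, one backup
print(disk_space_per_node_gb(10, write_intensive=True))  # write-intensive
print(tablespace_per_node_bytes(1_000_000, 60, 2, 2))
```

With 10GB of DataMemory the read-mostly case comes to 80GB (3x + 4x + 1x), which is where the "more than 8x" figure comes from once any tablespace is added.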

DataMemory and IndexMemory
• IndexMemory (total) = 20B x total number of records (all tables)
• DataMemory per table = #records x (40B + avg_record_size)
• Per node:
– DataMemory = SUM(DataMemory per table) x NO_OF_REPLICAS /
NO_OF_NODES
– IndexMemory = IndexMemory (total) x NO_OF_REPLICAS /
NO_OF_NODES
• Easy way:
– www.severalnines.com/sizer
• Provision a data model in the cluster
Run: ./sizer -a
Import the CSV data into the Excel template.
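
For a quick manual estimate without sizer, the formulas on this slide can be sketched in Python. The function name `memory_per_node` and the table list are hypothetical. The sketch assumes the 40B and 20B overheads apply per record, and that each node's share is the total multiplied by the number of replicas and divided by the number of data nodes, in line with the per-node storage formula on the Disk Space slide.

```python
from typing import Sequence, Tuple


def memory_per_node(tables: Sequence[Tuple[int, int]],
                    replicas: int, nodes: int) -> Tuple[float, float]:
    """Return (DataMemory, IndexMemory) in bytes needed on each data node.

    tables: one (record_count, avg_record_size_bytes) pair per table.
    """
    # 40 bytes of overhead per record, plus the record itself.
    data_total = sum(count * (40 + size) for count, size in tables)
    # 20 bytes of index memory per record.
    index_total = 20 * sum(count for count, _ in tables)
    # Every row is stored `replicas` times, spread over `nodes` data nodes.
    share = replicas / nodes
    return data_total * share, index_total * share


data_mem, index_mem = memory_per_node([(1_000_000, 100)], replicas=2, nodes=4)
print(data_mem, index_mem)
```

Rerunning it with larger record counts is a simple way to check growth scenarios before reaching for the spreadsheet.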

Disk Data tables
• Not everything has to stay in RAM.
– Log data, archives, and other infrequently accessed data can be
stored in DISK DATA tables:
http://johanandersson.blogspot.se/2012/04/mysql-cluster-disk-data-config.html
– Indexed columns will always stay in RAM for DISK DATA
tables.
– Disk data access is not fast, but SSDs help a lot.
– Disk Data tablespace can be increased over time, online.

Performance Planning
• Transaction capacity planning requires benchmarking
– Throughput and response time requirements affect the
number of nodes, both data nodes and MySQL servers.
• Benchmark the common use cases
– Severalnines Bencher allows you to drive a high load and test
individual queries.
– JMeter etc. can be used to drive web load
– Try to simulate expected peak traffic.
• Can the cluster handle the load? If not, add resources
online where needed.

Coming next in Installment 10:
Troubleshooting MySQL Cluster

We hope these training slides are
useful to you!
Please visit our website to view the
next section of this training.
For any questions, comments or feedback,
please contact us at:
services@severalnines.com
Thank you!

Disclaimer
© Copyright 2011 Severalnines AB. All rights reserved.
Severalnines & the Severalnines logo(s) are trademarks of Severalnines AB.
MySQL is a registered trademark of Oracle and/or its affiliates.
Other names may be trademarks of their respective owners.
