Shapira oda perf_webinar_v2
1. An Insider’s Guide to ODA Performance
Prepared by: Alex Gorbachev, Pythian CTO & Gwen Shapira
Presented by: Gwen Shapira, Senior Pythian Consultant
2. Alex Gorbachev – CTO, Pythian; President, Oracle RAC SIG
Gwen Shapira – Senior Consultant, Pythian; Oracle Ace Director
3. Why Companies Trust Pythian
Recognized Leader:
• Global industry-leader in remote database administration services
and consulting for Oracle, Oracle Applications, MySQL and SQL Server
• Work with over 150 multinational companies such as Western Union,
Fox Interactive Media, and MDS Inc. to help manage their complex IT
deployments
Expertise:
• One of the world’s largest concentrations of dedicated, full-time DBA
expertise.
Global Reach & Scalability:
• 24/7/365 global remote support for DBA and consulting, systems
administration, special projects or emergency response
13. Oracle Database Appliance Storage
• 20 SAS 15000 RPM 600GB
• 4 SAS SSD 73GB
• Each SN – 2 HBA
• Each SN – 2 Expanders
• Each Expander – 12 disks
• Each disk – 2 SAS ports
17. Where’s the Interconnect?
[root@odaorcl1 ~]# /u01/app/11.2.0.3/grid/bin/oifcfg getif
eth0 192.168.16.0 global cluster_interconnect
eth1 192.168.17.0 global cluster_interconnect
bond0 172.20.31.0 global public
eth0 Link encap:Ethernet HWaddr 00:21:28:E7:C3:72
inet addr:192.168.16.24 Bcast:192.168.16.255
inet6 addr: fe80::221:28ff:fee7:c372/64
UP BROADCAST RUNNING MULTICAST MTU:9000
18. [root@odaorcl1 ~]# ethtool eth0
Settings for eth0:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 0
Transceiver: external
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: d
Current message level: 0x00000001 (1)
Link detected: yes
21. But Wait!
Event                          Waits      Time(s)  Avg(ms)  % DB time  Wait Class
------------------------------ ---------- -------- -------- ---------- ------------
DB CPU                                    6,459             29.9
buffer busy waits              123,162    3,725    30       17.3       Concurrency
gc buffer busy release         8,871      3,383    381      15.7       Cluster
gc current block 2-way         3,282,774  1,969    1        9.1        Cluster
gc buffer busy acquire         11,073     1,364    123      6.3        Cluster
22. But Wait!
Event                          Waits      Time(s)  Avg(ms)  % DB time  Wait Class
------------------------------ ---------- -------- -------- ---------- -------------
enq: US - contention           1,123,271  33,733   30       38.2       Other
enq: HW - contention           42,551     17,317   407      19.6       Configuration
buffer busy waits              156,152    11,550   74       13.1       Concurrency
latch: row cache objects       798,648    6,181    8        7.0        Concurrency
DB CPU                                    5,796             6.6
23. “I need that buffer.”
“I’m busy!”
(Waiting)
381 ms later: “Here’s the buffer!”
24. Interconnect Again
                 Send        Receive
Used By          Mbytes/sec  Mbytes/sec
---------------- ----------- -----------
Global Cache     48.94       43.04
Parallel Query   .00         .00
DB Locks         4.99        5.23
DB Streams       .00         .00
Other            .00         .01

Instance   Latency 500B MSG   Latency 8K MSG
1          0.14               0.13
2          0.58               0.69
27. SSD
• 4x 73GB
• Dedicated to redo logs
• Reminder:
• 0.025ms read
• 0.250ms write (best case)
• Writes are not just writes
• Over-provisioning
29. SSD for Redo
• Not a general recommendation
• Consistent low latency
• Works well for multiple databases
• Leftover space
30. ODA: SSD Performance for LGWR
31. More LGWR Performance
Saturating LGWR Test
• 3200 writes, 2 nodes, 0.2ms latency
• LGWR spent 70% of time on CPU
SwingBench Order Entry
• 4500 TPS
• Bottleneck was buffer busy contention
Big data load
• 100K size write, several ms latency
• Data warehouse load – bad fit for ODA
44. Choosing Consolidation Candidates
• Vendor limitations
• SLAs
• Dependencies
• CPU utilization
• Workload type
Big Question: Will it fit?
45. Collect Metrics
• CPU utilization
• Memory usage – SGA + PGA
• Storage requirements
• Workload types
• I/O requirements – IOPS, throughput
• RAC – current interconnect load
46. CPU
Build time-based model of utilization on existing servers:
Time   S1 (8 core)  S2 (4 core)  S3 (32 core)  Total cores in use
00:00  50%          25%          10%           8*0.5 + 4*0.25 + 32*0.1 = 8.2
00:15  30%          50%          10%           7.6
00:30  100%         25%          10%           12.2
We calculated 12.2 cores in use at peak time; ODA’s 24 cores leave plenty of spare capacity.
This is a rough model. You can get more accurate results by taking core speed into account.
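The time-based model above can be sketched in a few lines of Python. The server names and utilization samples come from the table; the 24-core figure is ODA’s total core count from earlier in the deck.

```python
# Rough time-based CPU consolidation model: for each sampling interval,
# sum (cores * utilization) across the source servers, then compare the
# peak against the target box's core count.

CORES = {"S1": 8, "S2": 4, "S3": 32}

# Utilization samples per interval (fractions, from the table above).
SAMPLES = {
    "00:00": {"S1": 0.50, "S2": 0.25, "S3": 0.10},
    "00:15": {"S1": 0.30, "S2": 0.50, "S3": 0.10},
    "00:30": {"S1": 1.00, "S2": 0.25, "S3": 0.10},
}

def cores_in_use(sample):
    """Weighted core usage for one interval."""
    return sum(CORES[s] * util for s, util in sample.items())

usage = {t: cores_in_use(s) for t, s in SAMPLES.items()}
peak = max(usage.values())

ODA_CORES = 24
print(usage)
print(f"peak {peak:.1f} of {ODA_CORES} cores")
```

Extending this to per-core-speed weighting only requires multiplying each server's term by a relative clock factor.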
47. Memory
• Easiest way: Sum memory on existing servers
• Actually: Sum SGA and PGA sizes, and leave
20-30% spare
Use advisors:
• OEM gives graphs with SGA and PGA size
recommendations.
48. IO Capacity
• OLTP and DWH go in separate boxes
• Each can be standby of the other
• Consider throughput and latency requirements
• According to our tests:
• 12K redo IOPS at 0.5 ms latency
• Over 3000 data file IOPS at 15ms latency
• Almost 6,000 if using the outer part of the disks only
• Can reach 2.4 GB/s
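A quick fit-check against the tested limits above might look like this. The limits are the benchmark results from this slide; the workload figures are hypothetical stand-ins for the metrics you collected on slide 45.

```python
# Tested ODA limits from our benchmarks (see the bullets above).
ODA_LIMITS = {
    "redo_iops": 12_000,     # at ~0.5 ms latency
    "datafile_iops": 3_000,  # at ~15 ms latency
    "scan_gbps": 2.4,        # GB/s sequential scan, single node
}

# Hypothetical combined workload being consolidated.
workload = {"redo_iops": 4_000, "datafile_iops": 2_200, "scan_gbps": 0.8}

def fits(workload, limits):
    """True if every measured requirement is within the tested limit."""
    return all(workload[k] <= limits[k] for k in workload)

print(fits(workload, ODA_LIMITS))  # True for this example
```

In practice you would also want headroom here, just as with CPU and memory.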
49. Disk Space
• High redundancy – triple data usage
• Can use external storage if needed
• ZFS supports HCC
• Take backups into account
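Since high redundancy triples data usage, usable space is roughly a third of raw capacity. A sketch using the 20 x 600 GB spindles from the storage slide; the backup reservation is a hypothetical figure you would size for your own retention policy.

```python
# Raw capacity: 20 SAS disks of 600 GB each (from the storage slide).
raw_gb = 20 * 600            # 12,000 GB raw

# ASM high redundancy keeps three copies of every extent.
usable_gb = raw_gb / 3       # ~4,000 GB usable

# Reserve space for on-disk backups (hypothetical: ~1 TB).
backup_gb = 1_000
available_for_data_gb = usable_gb - backup_gb
print(usable_gb, available_for_data_gb)
```

If that is not enough, external storage (e.g. a ZFS appliance, which also supports HCC) can supplement the internal disks.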
50. Testing
• Always test
• Bad tests are still better than no tests
• Replicating production load:
• RAT
• “Brewing Benchmarks”
• JMeter, LoadRunner, etc.
• Especially test:
• Migration strategy and times
• Non-RAC applications going to RAC
• Upgrades
Let’s get started! My name is Gwen Shapira, I’m a senior consultant for Pythian. We are here to discuss the performance of the Oracle Database Appliance. I get two types of performance questions from companies considering ODA: I need to scale my application; is ODA the answer? And: I’m planning to move to ODA for other reasons; how do I know I’ll still get the performance I need? This presentation will address these two questions and give you an idea of which applications and workloads are a good fit for ODA, and what kind of performance you can expect.
Alex Gorbachev is Pythian’s CTO and President of the RAC Special Interest Group. He ran many of the tests and benchmarks that we’ll show in this presentation. I’m a senior consultant for Pythian with many years of RAC experience; I ran other benchmarks and will be presenting the results here. We are both Oracle ACE Directors and members of the Oak Table Network.
- Successful growing business for more than 10 years - Served many customers with complex requirements/infrastructure just like yours. - Operate globally for 24 x 7 “always awake” services
Enough about us; let’s talk about ODA. “Simple” and “RAC” did not use to appear in the same sentence. RAC is a complex system with many components and dependencies on storage and network. Setting up a RAC system requires a lot of coordination between network admins, storage admins, sysadmins and DBAs. It’s considered a large project and can take a long time (weeks) to get going right. ODA is intended to be a plug-and-play solution: you get going relatively quickly (hours instead of days or weeks), and with a pre-configured system it is more difficult to get things wrong. It doesn’t have to be RAC! One customer asked us if he can have a Data Guard standby with primary on one node and standby on the other; not recommended, but definitely a possibility!
Interconnect and storage have big impact on performance
You see 24 disks here and various indicator lights. The upper row has the 4 SSD disks. If you need to replace a disk – this is where you do it.
On the left: power supply. Four network ports in two bonded interfaces for backups and DR. The two large ports below are the 10GbE public database interface. On the right panel: leftmost is the serial connector to the console, then 2x 1GbE for the public network, an Ethernet ILOM connection, and USB and video connectors. What you don’t see is the interconnect: there are two on-board integrated interconnect interfaces, not bonded but used for redundancy.
This is the part that plugs into the back plane, with the interconnects and power supply.
When we do forklift migrations to Exadata (i.e. with no application changes), we are always impressed by the performance improvements. A 10x improvement is not rare; it’s expected. Some of it is due to Exadata secret sauce (mostly not included in ODA), but some is due to modern, well-thought-out hardware architecture that is pre-configured, and some improvements are due to 11gR2 optimizations. With ODA you get two of the big Exadata benefits. Westmere cores are the fastest you can get at 95W (easier to cool).
Each node has two HBAs, each connected to two expanders. Each expander is connected to 12 disks. Each disk has two ports, so it has connectivity to both nodes. There are two paths from the node to each disk, through both HBAs, but you don’t need to configure multipathing or even know what it is; it is pre-configured. Another nice thing is the high availability: any component can (and will) fail without impacting system availability. Pull out a disk, a cable, an entire node, shoot a hole through an HBA, and the system will keep running.
This is a pretty good deal. Going with HP hardware, it will be close to 25K just for the DB servers, and you still need the interconnect network, shared storage and its network and SSD.
Don’t believe benchmarks! But here’s my test for a small (10G) OLTP database.
Note that we have two interconnect interfaces, for redundancy. Jumbo frames are configured by default. This is a big deal: jumbo frames improve performance and are normally a pain to set up for RAC.
Note that we used “cheats” to improve application scalability: sequences were created with a cache size of 1000 (not the usual 20!) and many indexes are reversed to reduce contention. These “cheats” can improve scalability, so you should use them too!
GC Wait doesn’t necessarily mean the interconnect is a problem
128 MB/s theoretical saturation (the ceiling of a single 1 Gb/s interconnect link)
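The theoretical saturation figure can be checked against the AWR interconnect traffic from slide 24 (a 1 Gb/s link tops out around 125–128 MBytes/sec):

```python
# Interconnect send traffic (Mbytes/sec) from the AWR report on slide 24.
send_mbps = {
    "Global Cache": 48.94,
    "Parallel Query": 0.00,
    "DB Locks": 4.99,
    "DB Streams": 0.00,
    "Other": 0.00,
}

LINK_CAPACITY = 128.0  # MB/s, theoretical ceiling of one 1 Gb/s link

total_send = sum(send_mbps.values())      # ~53.93 MB/s
utilization = total_send / LINK_CAPACITY  # fraction of one link in use
print(f"{total_send:.2f} MB/s = {utilization:.0%} of one link")
```

At roughly 40% of a single link (with two links available), the interconnect is not the bottleneck in this workload, which matches the point that GC waits do not necessarily implicate the interconnect.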
You want the time to write to the redo log to be as fast as possible, because a committing transaction has to wait until its redo is written to disk before it can move on. This is a serial part of the workload and can quickly become a bottleneck and impact the performance of an entire instance. From our previous benchmarks, we were already pretty sure that the ODA configuration does not pose specific problems in this regard, but we wanted to take a closer look and find the limits of how much redo we can push.
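Because the commit path is serial, redo write latency caps how fast a single session can commit; a rough sketch, using the illustrative latencies from this talk (0.5 ms on ODA’s SSDs, 2–3 ms on a typical corporate SAN):

```python
# A session committing in a tight loop can't commit faster than one redo
# write per commit, so per-session rate <= 1 / write_latency.
def max_serial_commit_rate(redo_write_latency_s):
    """Upper bound on commits/sec for one serial session."""
    return 1.0 / redo_write_latency_s

ssd_rate = max_serial_commit_rate(0.0005)  # 0.5 ms SSD redo write
san_rate = max_serial_commit_rate(0.0025)  # 2.5 ms SAN redo write

# Group commit ("piggybacking") lets many sessions share one redo write,
# so aggregate throughput across sessions can be much higher than this.
print(ssd_rate, san_rate)
```

This is why shaving redo latency from milliseconds to half a millisecond matters so much for OLTP commit rates.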
Let’s start with something important you need to know about ODA: it is intended to be a RAC cluster. Therefore the storage has to be shared, which means that there can’t be any non-shared cache between the storage and the database. SANs have cache to speed up redo processing, because its performance is so critical and we don’t want anything slowing down commits, but ODA can’t do that. This means that excessive IO can impact the latency of the HDDs. Traditional systems place redo on its own array or carefully configure the SAN to reduce redo latency. ODA takes an easy solution: SSD. Of course, datafiles are still written to normal disks, which can get congested, so tuning DBWR to avoid excessive IO is still recommended.
I’ve read that Oracle claims no redo log write will take more than 0.5 ms. According to my ORION benchmarks doing sequential 32K writes (Figure 2), I achieved around 4,000 writes per second to the SSD disks, accounting for ASM high redundancy (i.e., one redo write is written to three disks), with eight to ten parallel threads. This means four to five RAC databases with each instance aggressively pounding redo logs. In this situation, the average write time is still around 0.5 ms. Note that because of the piggyback effect of multiple commits, the effective achievable transaction rate is actually higher. On a corporate SAN we are often happy with 2–3 ms commit times.
Without writes: 20 ms latency with 4,700 IOPS; with 40% writes, it’s 4,500 IOPS.
Depending on the patterns of parallel scans, I was able to get up to 2.4 GB/s using ORION on a single node with 1GB reads. Oracle’s specs for ODA claim up to a 4 GB/s scan rate. We didn’t test both nodes, so we don’t know what we can realistically reach.