2. The Challenge?
• You have an app that works
• You have users that like it
Awesome
• Performance is suffering as you scale.
• Reliability is getting worse, not better.
• As your data sets grow, the problems become more pronounced.
• The operations team are talking about problems, not
solutions
Not so awesome
6. What is the root cause?
• Take the time to understand what happens when your code
asks the server to do some task.
select * from some_production_table_with_100_000_000_records
is really not the same workload as
select * from some_dev_table_with_100_records
• Look for evidence in logs and tools that provide real insight.
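The gap between the two queries above can be measured rather than guessed. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the real server; the helper name `time_full_scan`, the table `t` and the row counts are illustrative and not from the original slides.

```python
import sqlite3
import time


def time_full_scan(row_count):
    """Build a table with row_count rows, then time `select *` over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer primary key, payload text)")
    conn.executemany(
        "insert into t (payload) values (?)",
        (("x" * 100,) for _ in range(row_count)),
    )
    start = time.perf_counter()
    rows = conn.execute("select * from t").fetchall()  # full scan, every row returned
    elapsed = time.perf_counter() - start
    conn.close()
    return len(rows), elapsed


if __name__ == "__main__":
    # Hypothetical sizes: a "dev" table and a much larger one.
    for n in (100, 100_000):
        fetched, seconds = time_full_scan(n)
        print(f"{n:>8} rows: fetched {fetched} in {seconds:.4f}s")
```

The absolute timings depend on the machine. The point is only that the same statement does far more work as the table grows, and a real server adds disk and network costs on top.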
8. Issues of priority…
• Disk drive, single user session
• Disk drives, multiple users…
9. Issues of Scale…
• Fetching blocks, single user session
• Fetching blocks, enterprise workload
10. Storage
• Many database and operating system vendor
recommendations are woefully out of date.
• Modern techniques utilising flash in the right way can deliver
millions of random IOPS.
• SAN and flash vendors have made dramatic changes over the
last few years that invalidate many of the old
recommendations.
• Some principles still hold and are important for optimised
performance
– One process writes to each disk group
– Avoid reads and writes occurring simultaneously if possible
11. CPU
• CPUs are not all created equal.
• Use SPECint to compare if it matters for your workload.
• Split up the work and scale wide if you can. There is a reason
the web-scale companies have done so.
• Don’t process work now that can wait until later.
• Later might be in a few seconds and on another box.
• Schedule intensive workloads like reports.
• Don’t expect your laptop and the production server to scale
the same way.
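The bullets above on deferring work can be sketched with a queue and a background worker. This is a simplified illustration using only the Python standard library; the function `run_deferred` and the "order" and "report" wording are hypothetical, and in production the worker would more likely be a separate process or another box behind a message queue.

```python
import queue
import threading


def run_deferred(order_ids):
    """Accept each order immediately and build its report later on a worker."""
    jobs = queue.Queue()
    reports = []

    def worker():
        # Drains the queue in the background; None is the stop signal.
        while True:
            order_id = jobs.get()
            if order_id is None:
                break
            reports.append(f"report for order {order_id}")

    thread = threading.Thread(target=worker)
    thread.start()

    responses = []
    for order_id in order_ids:
        jobs.put(order_id)                               # the work that can wait
        responses.append(f"order {order_id} accepted")   # the work the user waits for

    jobs.put(None)   # tell the worker there is nothing more to do
    thread.join()
    return responses, reports
```

The request path only pays for enqueueing the job. The expensive part runs later, which is also what makes it possible to schedule it for a quiet period.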
12. Memory
• Memory is addressable in various forms with performance
tradeoffs for capacity.
• Use the lowest latency one you can afford.
Memory Type   Typical Capacity   Approximate Access Time
CPU cache     30 MB              < 10 ns
DDR3          64 GB              < 100 ns
SSD           ~800 GB            < 20,000 ns
FC or SAS     ~1 TB              < 20,000,000 ns
SATA          4 TB+              < 8,000,000 ns
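To put the access times in the table into proportion, the short calculation below expresses each tier as a multiple of CPU cache latency. It takes the upper bounds from the table as rough figures; the names `ACCESS_TIME_NS` and `slowdown_vs_cache` are illustrative only.

```python
# Upper-bound access times from the table above, in nanoseconds.
ACCESS_TIME_NS = {
    "CPU cache": 10,
    "DDR3": 100,
    "SSD": 20_000,
    "FC or SAS": 20_000_000,
    "SATA": 8_000_000,
}


def slowdown_vs_cache(tiers):
    """Return each tier's access time as a multiple of the CPU cache figure."""
    base = tiers["CPU cache"]
    return {name: ns / base for name, ns in tiers.items()}


if __name__ == "__main__":
    for name, factor in slowdown_vs_cache(ACCESS_TIME_NS).items():
        print(f"{name:<10} {factor:>12,.0f}x")
```

On these figures, a read served from SSD costs on the order of thousands of cache accesses, and a read from spinning disk costs on the order of a million.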
13. Network
• Why is it that we conceptualise networks from an individual
point of view?
15. Network
• Latency & Bandwidth are not the same thing.
– Think satellite delay on a TV interview
• In this context we use these definitions
– Latency is the time data takes to reach the other end of the network.
– Bandwidth is the rate at which we can successfully transmit data to the
other end.
• This is why you need to test your app through a latency
generator.
– There are capable free open source tools such as WANem
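A rough model shows why the two definitions above matter separately. The sketch below assumes that every request/response exchange costs two one-way latencies and that the payload is sent at the full bandwidth. The function name and all the numbers are illustrative assumptions, not measurements.

```python
def transfer_time_seconds(payload_bytes, bandwidth_bits_per_s,
                          one_way_latency_s, exchanges):
    """Estimate total time for a chatty transfer under a simple model."""
    waiting = exchanges * 2 * one_way_latency_s            # time spent on round trips
    sending = payload_bytes * 8 / bandwidth_bits_per_s     # time spent pushing bits
    return waiting + sending


if __name__ == "__main__":
    # Same app, same 100 Mbit/s link speed, 1 MB of data, 200 exchanges.
    lan = transfer_time_seconds(1_000_000, 100_000_000, 0.00025, 200)
    wan = transfer_time_seconds(1_000_000, 100_000_000, 0.025, 200)
    print(f"LAN: {lan:.2f}s  WAN: {wan:.2f}s")
```

With these assumed inputs the bandwidth term is identical in both cases (0.08 s). Almost all of the difference comes from latency multiplied by the number of exchanges, which a bandwidth upgrade does not change.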
16. Middleware
• WebSphere, WebLogic, JBoss, Tomcat
– Garbage collection tradeoffs between JVM size and system
memory/CPU capacities.
• Django
– Read High Performance Django by the team from Lincoln Loop
– Sponsored by the Common Code team
17. SQL databases
• Microsoft SQL Server, Oracle DB, PostgreSQL & MySQL.
• Each has its own strengths & weaknesses, but they have some key
things in common.
• Offload reporting away from OLTP workloads
• Indexes are important
• Transaction Logs are a performance bottleneck
• Think deeply about scaling out
• Think about caching queries
• Backups are critical because you will need to restore one day
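The point about indexes can be checked by asking the database how it plans to run a query. The sketch below uses SQLite from the Python standard library as a stand-in for the servers named above; the `orders` table, the index name and the `query_plan` helper are hypothetical, and each of the listed databases has its own equivalent of `explain`.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string."""
    rows = conn.execute("explain query plan " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (id integer primary key, customer_id integer, total integer)"
)
conn.executemany(
    "insert into orders (customer_id, total) values (?, ?)",
    ((i % 1000, i) for i in range(10_000)),
)

sql = "select * from orders where customer_id = ?"
before = query_plan(conn, sql, (42,))   # no index yet: the whole table is scanned
conn.execute("create index idx_orders_customer on orders (customer_id)")
after = query_plan(conn, sql, (42,))    # the planner can now search via the index

if __name__ == "__main__":
    print("before:", before)
    print("after: ", after)
```

The exact wording of the plan varies between SQLite versions, but the shift from a scan of the table to a search using the index is the behaviour to look for.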
18. Backup is about Restore
• Enterprise-wide backup will expose all your infrastructure failings
by pushing more data for longer while other work continues.
• Test your restores. Really, test them.
• Offload large backups away from your production systems.
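A restore test can be as simple as reading the backup back and comparing it with the source. The sketch below is a minimal illustration using SQLite's backup API, which Python has exposed since 3.7. The `payments` table and the helper names are hypothetical, and a real check would restore onto separate hardware and compare far more than a row count and a sum.

```python
import sqlite3


def table_fingerprint(conn, table):
    """Cheap consistency check: row count plus the sum of one column."""
    return conn.execute(
        f"select count(*), coalesce(sum(amount), 0) from {table}"
    ).fetchone()


def backup_and_verify(source, table="payments"):
    """Copy the database, then confirm the copy can be read and matches."""
    restored = sqlite3.connect(":memory:")
    source.backup(restored)   # take the backup
    matches = table_fingerprint(source, table) == table_fingerprint(restored, table)
    restored.close()
    return matches


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("create table payments (id integer primary key, amount integer)")
    db.executemany("insert into payments (amount) values (?)",
                   [(n,) for n in range(1, 101)])
    db.commit()
    print("restore verified:", backup_and_verify(db))
```

A backup that has never been opened and queried is untested, which is the situation the slide warns against.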
19. Questions?
How to get in touch?
James Clifford
Email: james@proitconsulting.com.au
Phone: 0421 648 034
Brenton Carbins
Email: brenton@proitconsulting.com.au
Phone: 0409 779 230