Designing Information Structures for Performance and Reliability
Key elements to maximizing DB Server Performance
Bryan Randol, IT/Systems Manager
Designing Information Structures for Performance and Reliability: Discussion Outline
DAY 1: Hardware Performance
  Systematic Tuning Concepts
  CPU
  Memory Architecture and Front-Side Bus (FSB)
  Data Flow Concepts
  Disk Considerations
  RAID
DAY 2: Database Performance
  OLAP vs. OLTP
  GreenPlum vs. PostgreSQL
  PostgreSQL Concepts and Performance Tweaking
  PSA v.1 – GreenPlum on AOpen mini-PCs ("dbnode1"-"dbnode6")
  PSA v.2 – Tyan Transport w/PostgreSQL
  PSA v.3 – Current PSA Implementation, DELL PowerEdge 2950 w/PostgreSQL 8.3
I. Database Server Performance: Hardware & Operating System Considerations
DAY 1: Hardware Performance
Designing Information Structures for Performance and Reliability: Discussion Outline
Systematic tuning essentially follows these six steps:
1. Assess the problem and establish numeric values that characterize acceptable behavior. (Know the system's specifications and set realistic goals.)
2. Measure the performance of the system before modification. (Benchmark)
3. Identify the part of the system that is critical for improving performance. This is called the "bottleneck". (Analyze)
4. Modify that part of the system to remove the bottleneck. (Upgrade/Tweak)
5. Measure the performance of the system after modification. (Benchmark)
6. Repeat steps 2-5 as needed. (Continuous Improvement)
I. Database Server Performance: Data Flow Concepts
DB files are stored in the filesystem on disk in blocks.
A "job" is requested, initiating a "process thread"; the associated files are read into memory "pages".
Memory pages are read into the CPU's cache as needed.
"Page-outs" to disk occur to make space as needed; "page-ins" from disk are what slow down performance.
Once in CPU cache, jobs are processed in threads per CPU (or "core").
I. Database Server Performance: Hardware & Operating System Considerations
Server Performance Considerations: CPU
Each CPU has at least one core; each core processes jobs (threads) sequentially based on the job's priority. Higher-priority jobs get more CPU time. Multi-threaded jobs are distributed evenly across all cores ("parallelized").
Internal Clock Speed: operations the CPU can process internally per second, in MHz, as advertised.
External Clock Speed: speed at which the CPU interacts with the rest of the system, also known as the front-side bus (FSB).
Memory Clock Speed: speed at which RAM is given requests for data.
Important PostgreSQL Performance Note: PostgreSQL uses a multi-process model, meaning each database connection has its own Unix process. Because of this, any multi-CPU operating system can spread multiple database connections among the available CPUs. However, if only a single database connection is active, it can only use one CPU. PostgreSQL does not use multi-threading to allow a single process to use multiple CPUs.
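A quick, hedged way to see the one-process-per-connection model for yourself: each session can report the PID of the backend process serving it. The table name below is purely illustrative, not part of the PSA schema.

    -- Run this in two separate psql sessions: each returns a different backend PID,
    -- confirming that every connection is served by its own Unix process.
    SELECT pg_backend_pid();

    -- A single large query such as this one runs entirely inside that one backend,
    -- so it can occupy at most one CPU core at a time (PostgreSQL 8.x behavior).
    SELECT count(*) FROM some_large_table;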
I. Database Server Performance: Hardware & Operating System Considerations
Server Performance Considerations: Memory Architecture and FSB (Front-Side Bus)
On Intel-based computers the CPU interfaces with memory through the "North Bridge" memory controller, across the FSB (front-side bus).
FSB speed and the NorthBridge MMU (memory management unit) drastically affect the server's performance, as they determine how fast data can be fed into the CPU from memory.
Unless special care is taken, a database server running even a simple sequential scan on a table will spend 95% of its cycles waiting for memory to be accessed.
This memory access bottleneck is even more difficult to avoid in more complex database operations such as sorting, aggregation and join, which exhibit a random access pattern.
Database algorithms and data structures should therefore be designed and optimized for memory access from the outset.
I. Database Server Performance: Hardware & Operating System Considerations
Intel "Xeon" based systems: Memory Access Challenges
The FSB is a fixed frequency and requires a separate chip to access memory; newer processors will run at the same fixed FSB speed. Memory access is delayed by passing through the separate controller chip.
Both processors share the same front-side bus, effectively halving each processor's bandwidth to memory and stalling one processor while the other is accessing memory or I/O. All processor-to-system I/O and control must use this one path.
One interleaved memory bank serves both processors, again effectively halving each processor's bandwidth to memory: half the bandwidth of a two-memory-bank architecture.
All program access to graphics, PCI(e), PCI-X or other I/O must pass through this bottleneck.
I. Database Server Performance: Hardware & Operating System Considerations
Multiprocessing Memory Access Approaches
Intel Xeon Multiprocessing ("1st Gen."):
  NorthBridge controller produces overhead.
  UMA (Uniform Memory Access): access to memory banks is "uniform".
AMD Multiprocessing:
  FSB/memory controller is on the CPU.
  NUMA (Non-Uniform Memory Access): latency to each memory bank varies.
I. Database Server Performance: Hardware & Operating System Considerations
Intel "Harpertown" Xeon Improvements
DELL PowerEdge 2950 III (2 x Xeon E5405 = 8 cores): 4 cores/CPU + faster FSB (>= 1333MHz).
NorthBridge controller bandwidth increased to 21.3 GB/s reads from memory and 10.7 GB/s writes into memory, roughly 32 GB/s overall bandwidth.
DELL PowerEdge 1950 (2 x Xeon E5405 = 8 cores)
I. Database Server Performance: Hardware & Operating System Considerations
Disk Considerations (secondary storage):
1. Seek Time/Rotational Delay: how fast the read/write head is positioned appropriately for reading/writing, and how fast the addressed area is placed under the read/write head for data transfer.
  SATA (Serial Advanced Technology Attachment) drives are cheap and come in sizes up to 2.5TB, typically maxing out at 7,200 RPM (the "VelociRaptor" is the exception @ 10,000 RPM).
  SAS (Serial Attached SCSI) drives are twice as fast (15,000 RPM) and typically twice as expensive, with roughly 1/5 the max capacity of SATA (~450GB).
2. Bandwidth/Throughput (Transfer Time): the raw rate at which data is transferred from disk into memory. This can be aggregated using RAID, which is discussed later.
  SATA-I bandwidth is 1.5 Gb/s, which translates into ~150 MB/s real speed.
  SATA-II and SAS bandwidth is 3 Gb/s, which translates into ~300 MB/s real speed.
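As a rough sanity check on those figures (an illustrative back-of-the-envelope calculation, not from the original slides): SATA link rates use 8b/10b encoding, so usable throughput is roughly 80% of the line rate divided by 8 bits per byte. For SATA-II/SAS, 3 Gb/s x 0.8 / 8 ≈ 300 MB/s; for SATA-I, 1.5 Gb/s x 0.8 / 8 ≈ 150 MB/s, matching the "real speed" numbers above.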
I. Database Server Performance: Hardware & Operating System Considerations
Disk Considerations (secondary storage):
3. Buffer/Cache: disks contain intelligent controllers, read cache and write cache. When you ask for a given piece of data, the disk locates the data and sends it back to the motherboard. It also reads the rest of the track and caches this data on the assumption that you will want the next piece of data on the disk. This data is stored locally in its read cache. If, sometime later, you request the next piece of data and it is in the read cache, the disk can deliver it with almost no delay.
Write-back cache improves performance because a write to the high-speed cache is faster than a write to normal RAM or disk; this cache helps address the disk-to-memory subsystem bottleneck.
Most good drives feature a 32MB buffer cache.
I. Database Server Performance: Hardware & Operating System Considerations
Disk Considerations:
4. Track Data Density: defines how much information can be stored on a given track. The higher the track data density, the more information the disk can store. If a disk can store more data on one track, it does not have to move the head to the next track as often. This means that the higher the recording density, the lower the chances are that the head will have to be moved to the next track to get the required data.
I. Database Server Performance: Hardware & Operating System Considerations
Disk Considerations:
5. RAID: "Redundant Array of Inexpensive Disks". Pools disks together to aggregate their throughput by "striping" data in segments across each disk; also provides fault tolerance. (n = number of drives in the array; the value in parentheses is the usable capacity in drives.)
  RAID0 "Striping" (n): fastest due to no parity, raw cumulative speed. A single drive failure causes the entire array to fail. "All-or-none".
  RAID1 "Mirroring" (n/2): each drive is mirrored; speed and capacity are 1/2 of RAID0, and an even number of disks is required so the set can be divided. An entire source or mirror set can go bad before data is jeopardized.
  RAID5 "Striping w/Parity" (n - 1): fast, with one drive set aside for fault tolerance. Only one drive can fail before the array is lost.
  RAID6 "Striping with dual Parity" (n - 2): fast, with two drives set aside for fault tolerance. Two drives can fail before the array is lost.
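A quick worked example (illustrative numbers): with n = 6 drives of 1TB each, RAID0 yields 6TB usable, RAID1 yields 3TB, RAID5 yields 5TB, and RAID6 yields 4TB. The PSA v3 server described later (6 x 1TB in a single RAID5 array, ~5TB usable) follows exactly this n - 1 arithmetic.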
I. Database Server Performance: Hardware & Operating System Considerations
Disk Considerations: RAID controller
The device responsible for managing the disk drives in an array. It stores the RAID configuration while also providing additional disk cache, and it offloads costly checksum routines from the CPU in parity-driven RAID configurations (e.g. RAID5 and RAID6).
The type of internal and external interface dramatically impacts the overall I/O performance of the array. The internal bus interface should be PCIe v2.0 (500 MB/s per-lane throughput); the most common cards are x2, x4, and x8 "lanes", providing 1 GB/s, 2 GB/s, and 4 GB/s of throughput respectively. The external storage interface to the array enclosure also matters.
I. Database Server Performance: Hardware & Operating System Considerations
Filesystem Considerations
As an easy performance boost with no downside, make sure the filesystem on which your database is kept is mounted "noatime", which turns off access-time bookkeeping.
XFS is a 64-bit filesystem and supports a maximum filesystem size of 8 binary exabytes minus one byte; on 32-bit Linux systems, XFS is "limited" to 16 binary terabytes.
Journal updates in XFS are performed asynchronously to avoid a performance penalty.
Files and directories in XFS can span allocation groups, and each allocation group manages its own inode tables (unlike EXT3/EXT2), providing scalability and parallelism: multiple threads and processes can perform I/O operations on the same filesystem simultaneously.
On a RAID array, a "stripe unit" can be specified within XFS at creation time. This maximizes throughput by aligning inode allocations with RAID stripe sizes.
XFS provides a 64-bit sparse address space for each file, which allows both very large file sizes and holes within files for which no disk space is allocated.
I. Database Server Performance: Hardware & Operating System Considerations
Takeaways from Hardware Performance Concepts:
Keep relevant data closest to the CPU, in memory, once it has been read from disk. More memory reduces the need for costly "page-in" operations from disk by reducing the need to "page-out" data to make space for new data.
Memory bus speed is still much slower than CPU bus speeds, often becoming a bottleneck as CPU speeds increase. It's important to have the fastest memory speed and FSB that your chipset will support.
More CPU cores allow you to parallelize workloads. A multithreaded database takes advantage of multi-processing by distributing a query into several threads across multiple CPUs, drastically increasing the query's efficiency while reducing its process time.
Faster disks with high bandwidth and low seek times maximize read performance into memory, letting the CPUs process complex queries. OLAP databases benefit from this because they scan large datasets frequently.
Using RAID allows you to aggregate disk I/O by striping data across several spindles, drastically decreasing the time it takes to read data into memory and write back onto the disks during commits, while also providing massive storage space, redundancy and fault tolerance.
DAY 2: Database Performance
II. Software & Application Considerations: OLAP and OLTP
OLAP (Online Analytical Processing): provides the big picture, supports analysis, needs aggregate data, evaluates all datasets quickly, uses a multidimensional model.
  DB size is typically 100GB to several TB (even petabytes).
  Mostly read-only operations, lots of scans, complex queries.
  Benefits from multi-threading, parallel processing, and fast drives with high read throughput/low seek times.
  Key performance metrics: query throughput/response time.
OLTP (Online Transactional Processing): provides a detailed audit, supports operations, needs detailed data, finds one dataset quickly, uses a relational model.
  DB size typically < 100GB.
  Short, atomic transactions. Heavy emphasis on lightning-fast writes.
  Key performance metrics: transaction throughput, availability.
II. Software & Application Considerations: OLAP and OLTP
Database Types: OLAP (Online Analytical Processing)
OLAP databases should only receive historical business data and remain isolated from OLTP (transactional) databases: summaries, not transactions. Data in OLAP databases never changes; OLTP data constantly changes.
OLAP databases typically contain fewer tables, arranged into a "star" or "snowflake" schema. The central table in a star schema is called the "fact table"; the leaf tables are called "dimension tables". The facts within a dimension table are called "members". The joins between the dimension and fact tables allow you to browse through the facts across any number of dimensions.
The simple design of the star schema makes it easier to write queries, and they run faster. An OLTP database could involve dozens of tables, making query design complicated; in addition, the resulting query could take hours to run.
OLAP databases make heavy use of indexes because they help find records in less time. In contrast, OLTP databases avoid them because they lengthen the process of inserting data.
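To make the star-schema vocabulary concrete, here is a minimal illustrative sketch (the table and column names are invented for the example, not taken from the PSA schema): one fact table joined to two dimension tables.

    -- Dimension tables: small, descriptive, heavily indexed.
    CREATE TABLE dim_product (product_id integer PRIMARY KEY, product_name text);
    CREATE TABLE dim_date    (date_id    integer PRIMARY KEY, calendar_date date);

    -- Fact table: large, mostly numeric measures plus foreign keys to the dimensions.
    CREATE TABLE fact_sales (
        product_id integer REFERENCES dim_product,
        date_id    integer REFERENCES dim_date,
        units_sold integer,
        revenue    numeric
    );

    -- A typical OLAP query browses facts across dimensions via simple joins.
    SELECT p.product_name, d.calendar_date, sum(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date    d ON d.date_id    = f.date_id
    GROUP BY p.product_name, d.calendar_date;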
II. Software & Application Considerations: OLAP and OLTP
Database Types: OLAP (Online Analytical Processing)
The process by which OLAP databases are populated is called Extract, Transform, and Load (ETL). No direct data entries are made into an OLAP database, only summarized bulk ETL transactions.
A cube aggregates the facts at each level of each dimension in a given OLAP schema. Because the cube contains all of the data in aggregated form, it seems to know the answers to queries in advance. This arrangement of data into cubes overcomes a limitation of relational databases.
II. Software & Application Considerations: OLAP and OLTP
OLAP (Online Analytical Processing): What happens during a query?
1. The client statement is issued.
2. The database server processes the query by locating extents.
3. Data is found on disk.
4. Results are sent through the database server to the client.
II. Software & Application Considerations: PostgreSQL Query Flow
PostgreSQL: The Path of a Query
1. Connection from the application
2. Parsing stage
3. Rewrite stage
4. Cost comparison and plan/optimization stage
5. Execution stage
6. Result
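One simple way to watch stages 4 and 5 from the client side is EXPLAIN, which prints the plan the optimizer chose, and EXPLAIN ANALYZE, which also executes the query and reports timings. The table name is illustrative only.

    -- Show the plan chosen in the cost-comparison/optimization stage.
    EXPLAIN SELECT * FROM fact_sales WHERE product_id = 42;

    -- Execute the query as well and report per-node timings from the execution stage.
    EXPLAIN ANALYZE SELECT * FROM fact_sales WHERE product_id = 42;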
II. Software & Application Considerations: OLAP and OLTP
GreenPlum and PostgreSQL: of the open source database options, PostgreSQL is the most robust object-relational database management system. GreenPlum is a commercial DBMS based on PostgreSQL, adding enterprise (OLAP) oriented enhancements and promising the following features:
  Massively parallel query execution
  Unified analytical processing
  Shared-nothing massively parallel processing architecture
  Fault tolerance
  Linear scalability
  "In-database" compression: 3-10x disk space reduction, with corresponding I/O improvement
The license was $20,000 every 6 months ($40,000/yr.). It's important to note that PostgreSQL is free and can be modified to perform similarly to GreenPlum. We did just that with our PSA server reconstruction project.
II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained: PostgreSQL is tweaked through a configuration file called "postgresql.conf". This flat file contains several dozen parameters that the master PostgreSQL service, "postmaster", reads at startup. Changes made to this file require the "postgresql" service to be bounced (restarted) via the following command as root: "service postgresql restart".
"postgresql.conf" parameters affecting query performance:
  Maximum Connections (max_connections): determines the maximum number of concurrent connections to the database server. Keep in mind that this figure acts as a multiplier for work_mem.
  Shared Buffers (shared_buffers): determines how much memory is dedicated to PostgreSQL for caching data. On a system with 1GB or more of RAM, a reasonable starting value for shared_buffers is 1/4 of the memory in the system.
  Working Memory (work_mem): if you do a lot of complex sorts and have a lot of memory, increasing work_mem allows PostgreSQL to do larger in-memory sorts which, unsurprisingly, will be faster than disk-based equivalents.
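A small, hedged illustration of how these settings behave from a client session (the values and table name are placeholders, not the PSA server's actual configuration): shared_buffers can only be changed in postgresql.conf followed by a restart, while work_mem can also be raised per session for a single heavy sort.

    -- Inspect the values the running server is currently using.
    SHOW shared_buffers;
    SHOW work_mem;

    -- work_mem may be overridden for just this session, e.g. before one large sort;
    -- shared_buffers cannot be changed this way and requires a service restart.
    SET work_mem = '256MB';
    SELECT product_id, sum(revenue) FROM fact_sales GROUP BY product_id ORDER BY 2 DESC;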

II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained: Shared Buffers
PostgreSQL does not directly change information on disk. Instead, it requests that data be read into the PostgreSQL shared buffer cache. PostgreSQL backends then read/write these blocks, and finally flush them back to disk.
Backends that need to access tables first look for the needed blocks in this cache. If they are already there, the backend can continue processing right away. If not, an operating system request is made to load the blocks, either from the kernel disk buffer cache or from disk. These can be expensive operations.
The default PostgreSQL configuration allocates 1000 shared buffers of 8 kilobytes each. Increasing the number of buffers makes it more likely that backends will find the information they need in the cache, avoiding an expensive operating system request...to a limit. The change can be made with a postmaster command-line flag or by changing the value of shared_buffers in postgresql.conf.
II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained: Shared Buffers, "How much is too much?"
Setting shared_buffers too high results in expensive "paging", which severely degrades the database's performance. If everything doesn't fit in RAM, the kernel starts forcing memory pages out to a disk area called swap, moving pages that have not been used recently. This operation is called a swap pageout. Pageouts are not a problem in themselves because they happen during periods of inactivity. What is bad is when those pages have to be brought back in from swap, meaning an old page that was moved out to swap has to be moved back into RAM. This is called a swap pagein, and it is bad because while the page is moved from swap, the program is suspended until the pagein completes.
II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained: Horizontal "Range" Partitioning
Also known as "sharding", this involves putting different rows into different tables for improved manageability and performance. Benefits of partitioning include:
  Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for the leading columns of indexes, reducing index size and making it more likely that the heavily used parts of the indexes fit in memory.
  When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of a sequential scan of that partition instead of an index and random-access reads scattered across the whole table.
  Seldom-used data can be migrated to cheaper and slower storage media.
II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained: Partitioning (cont.)
The benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.
The following forms of partitioning can be implemented in PostgreSQL:
  Range Partitioning (aka "Horizontal"): the table is partitioned into "ranges" defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects.
  List Partitioning: the table is partitioned by explicitly listing which key values appear in each partition.
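As a minimal sketch of how range partitioning looks in the PostgreSQL 8.x era of these slides, using table inheritance plus CHECK constraints so that constraint_exclusion can skip irrelevant partitions (table and column names are illustrative, not the PSA schema):

    -- Parent table holds no rows itself; each child carries a non-overlapping range.
    CREATE TABLE measurements (logdate date NOT NULL, reading numeric);

    CREATE TABLE measurements_2009_01 (
        CHECK (logdate >= DATE '2009-01-01' AND logdate < DATE '2009-02-01')
    ) INHERITS (measurements);

    CREATE TABLE measurements_2009_02 (
        CHECK (logdate >= DATE '2009-02-01' AND logdate < DATE '2009-03-01')
    ) INHERITS (measurements);

    -- Rows are typically inserted directly into the correct child (or routed there
    -- by a rule or trigger). With constraint_exclusion = on, the planner scans only
    -- the children whose CHECK constraints can match the WHERE clause.
    SELECT sum(reading) FROM measurements
    WHERE logdate >= DATE '2009-02-01' AND logdate < DATE '2009-03-01';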
II. Software & Application Considerations: PostgreSQL Tweaks
PostgreSQL tweaks explained: VACUUM
Helps ensure the database stays ACID (Atomic, Consistent, Isolated, Durable). PostgreSQL uses MVCC (Multi-Version Concurrency Control), eliminating read locks on records by allowing several versions of data to exist in the database. VACUUM removes the old versions of this multi-versioned data from base tables; those old versions waste space once a commit is made. To keep a PostgreSQL database performing well, you must ensure VACUUM is run correctly. AUTOVACUUM suffices for our query-based, low-transaction database, keeping dead space to a minimum.
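For reference, a hedged sketch of the commands involved (autovacuum handles this automatically on the PSA server, as noted above; the table name is illustrative):

    -- Reclaim space from dead row versions and refresh planner statistics.
    VACUUM ANALYZE fact_sales;

    -- VACUUM FULL rewrites the table to return space to the operating system,
    -- but it takes an exclusive lock, so it is rarely appropriate on a busy database.
    VACUUM FULL fact_sales;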
III. PSA Server Case Studies: AOpen mini-PCs + GreenPlum
PSA Server (v1): "dbnode1" - "dbnode6"
Originally, PSA was hosted on GreenPlum using 6 AOpen mini-PC nodes. Performance was slow, disk I/O was roughly 90MB/s (realized), and Sysco's weekly reports took roughly 15 minutes. The database volume was constantly around 90% capacity, forcing Mike to manually delete tables; space was at a premium.
Licensing with GreenPlum was expensive ($20,000/6 months, $40,000/yr.) and the system didn't deliver performance as promised (in either PSA or NewOps). NewOps' performance should have been significantly better given its more robust hardware (12 x DELL PowerEdge 2950s).
Since GreenPlum is based on PostgreSQL, it made sense to leverage the underlying free open source code and scrap the proprietary distributed DB solution, opting for a standalone server with enhanced space and I/O. Migrating existing tables to PostgreSQL required very little modification.
The mini-PCs we used to cluster GreenPlum were limited in capacity and scalability; each box was sealed and didn't allow for expansion.
Mini-PC details: AOpen MP965-D, Intel® Core™2 Duo CPU T7300 @ 2GHz, 3.24GB memory, 800MHz bus speed, 150GB SATA drive.
III. PSA Server Case Studies: TYAN Transport + PostgreSQL
PSA Server (v2): "sentrana-psa-dw"
This is our second-generation PSA box, this time using PostgreSQL 8.3 instead of GreenPlum. Formerly used as a testing box at the colo, named "econ.sentrana.com", it consists of a basic Tyan Transport GX28 (B2881) commodity chassis with a Tyan Thunder K8SR (S2881) motherboard, 2 dual-core AMD Opteron 270s @ 1000MHz w/ 2MB L2 cache, 8GB memory, and 4 SATA-I drive bays (SATA-II drives are backwards compatible and fit in these bays, but run at SATA-I speed).
Filesystem: EXT3 (4KB block size = kernel page size)
Storage configuration: 4 drive bays = 1 OS drive + 3 RAID5 DB drives @ SATA-I speed (150MB/s)
Read performance: ~76.75MB/s
III. PSA Server Case Studies: DELL PowerEdge 2950 + PostgreSQL
PSA Server (v3): "psa-dw-2950"
This is our third (and current) generation PSA box, still using PostgreSQL; only the server platform has evolved, to a DELL PowerEdge 2950 with dual quad-core Xeon processors @ 2.5GHz, 16GB DDR memory, a 1333MHz FSB, and 6 SATA-II/SAS drive bays configured via a PCIe PERC 6/i integrated RAID controller. Formerly used as one of the NewOps DBNodes with GreenPlum, this box was rebuilt from the OS out using Ubuntu 8.10 Linux as the OS, serving PostgreSQL 8.3 as the DB system.
Filesystem: XFS (4KB block size = kernel page size)
Storage configuration: 6 x 1TB drives @ 7.2K RPM (300MB/s SATA-II speed) in a single RAID5 array, ~5TB actual storage space (5 drive spindles used for data, 1 for RAID5 parity)
Read performance: ~507MB/s
III. PSA Server Case Studies: DELL PowerEdge 2950 + PostgreSQL
PSA Server (v3): "psa-dw-2950" postgresql.conf settings:
  max_connections = 25
  shared_buffers = 4096MB (1/4 of total physical memory; sets the amount of memory the database server uses for shared memory buffers)
  temp_buffers = 1024MB (sets the maximum number of temporary buffers used by each database session)
  work_mem = 4096MB (specifies the amount of memory used by internal sort operations and hash tables before switching to temporary disk files; too high = paging will occur, too low = writing to tempdb)
  maintenance_work_mem = 256MB
  random_page_cost = 2.0 (query planner constant, stating that the cost of using disks is 2.0)
  effective_cache_size = 12288MB (query planner constant)
  constraint_exclusion = on (query planner uses table constraints to optimize queries, e.g. partitioned tables)
1725 Eye St. NW, Suite 900, Washington DC, 20006
OFFICE 202.507.4480  FAX 866.597.3285  WEB sentrana.com

Editor's notes

Set realistic goals and know the hardware's expected limitations. Measure current performance. Analyze the results (research upgrades and possible performance problems). Modify the system. Benchmark again. Repeat as needed.
The client issues a query across the network. The database server searches cache and memory for the database extents; if they're not found in memory, they're located on disk. The disks then seek out the blocks containing the database extents and begin loading the data into memory. Memory pages are then fed into the CPU's cache and ultimately into the CPU for processing. Results are found and sent back to the client.
Each CPU has several cores. Internal clock speed: processes per second in MHz or GHz (as advertised). External clock speed: the speed at which the FSB is accessed (the typical bottleneck). Memory clock speed: the speed at which RAM is given requests for data (another bottleneck). PostgreSQL is multi-process, with one Unix process per DB connection; a single connection can only use one CPU and is not multithreaded.
CPU speed has increased roughly 70% each year; memory speed hasn't kept up. DDR memory (double data rate) allows data to be sent to the CPU at both the top and bottom of the clock cycle (sine wave), doubling throughput, but memory remains a bottleneck. The memory trade-off is typically speed for capacity: faster = less capacity/more expensive. The further out from the CPU you go, the slower and higher-capacity the storage. Also, disk is the only permanent storage, and it also holds swap.
1st-generation Xeon multiprocessing bottlenecks (still bottlenecks today, but less so): a shared FSB between processors halves bandwidth to memory, and the second processor competes for FSB bandwidth. Memory access is delayed between the controller and the memory bank. Bandwidth between the I/O controller (Southbridge) and the memory controller (Northbridge) is congested, as is bandwidth from the Northbridge to the expansion slots.
Here you see how the Intel processors share a common FSB, dividing the bandwidth per CPU; access to memory must happen at the reduced bandwidth, through the Northbridge memory controller, and into the memory banks. AMD's approach places a Northbridge controller directly on each processor, so there are no external chipsets to deal with. Each processor features three point-to-point HyperTransport links delivering 3.2 GB/s of bandwidth in each direction (6.4 GB/s full duplex), so AMD's scalability was better in the earlier days of Xeon multiprocessing.
  8. “Second Generation” (Harpertown) Xeon Processors E5200/E5400:
==========================================
Each CPU has a clock speed of 2GHz, 12MB of L2 cache, and a FSB of 1333MHz (1600MHz max on other models). The read bandwidth for each DDR2 667MHz memory channel is 5.325 GB/s, which gives a total read bandwidth of 21.3 GB/s across four memory channels. Write bandwidth through the same four channels is 10.7 GB/s. Overall effective bandwidth to memory is then 32 GB/s... 21.3 GB/s read plus 10.7 GB/s write.
5500-series "Nehalem-EP" (Gainestown) adds (December 2008):
=================================
Integrated memory controller supporting 2-3 DDR3 memory channels. Point-to-point processor interconnect called “QuickPath” (like AMD’s HyperTransport), bypassing the FSB. Hyper-Threading, presenting two hardware threads per core.
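  As a back-of-the-envelope check of the bandwidth figures above (simple arithmetic from the numbers on the slide, not separate measurements):

    Per DDR2-667 channel: 667 million transfers/s × 8 bytes per transfer ≈ 5.3 GB/s
    Four channels: 4 × 5.3 GB/s ≈ 21.3 GB/s aggregate read bandwidth
    Effective total: 21.3 GB/s read + 10.7 GB/s write ≈ 32 GB/s to memory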
  9. There are three delays associated with reading from or writing to a hard drive: seek time, rotational delay, and transfer time.
Seek Time is the time it takes for the drive’s read/write head to be physically moved into the correct place for the data being sought.
Rotational Delay is the time required for the addressed area of the disk to rotate into a position where it is accessible by the read/write head... typically measured in milliseconds.
Transfer Time is the time it takes to transfer data from the disk through the read/write head, across the storage bus, and into memory for processing by the CPU.
Seek time and rotational delay are heavily influenced by the disk’s rotational speed (RPM), where the data sits on the platters, how many platters the disk has, and the diameter of the platters. Generally speaking, the faster a disk spins, the lower its seek times will be. Also, the further toward the outer circumference of the platter data is located, the faster it will be read and the lower its rotational delay will be.
Bandwidth/Throughput (Transfer Time): once data is located, this is the raw rate at which data is transferred from disk into memory. This can be aggregated using RAID, which will be discussed later.
SATA-I bandwidth is 1.5Gb/s, which translates into ~150MB/s real speed. SATA-II and SAS bandwidth is 3Gb/s, which translates into ~300MB/s real speed.
Generally speaking, the higher the data density of the platter, the more data will be sent through the read/write head per block... resulting in higher throughput and lower transfer times.
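  An illustrative worked example (typical drive figures assumed, not measurements from our hardware):

    Rotational delay at 7200 RPM: 60 s / 7200 ≈ 8.33 ms per revolution, so the average wait is about half a revolution ≈ 4.2 ms
    Typical average seek time: ≈ 8 ms
    Transfer time for a 64KB block at 150MB/s: ≈ 0.4 ms
    One random read therefore costs roughly 4.2 + 8 + 0.4 ≈ 12.6 ms

  This is why random I/O is so much more expensive than a sequential scan, where the seek and rotational penalties are paid once rather than for every block.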
  10. Buffer/Cache: Write-back cache. Data normally written to disk by the CPU is first written into the disk’s cache. This allows for higher write performance, with the risk that data stored in the cache isn’t flushed to disk before a power failure. During idle machine cycles, the data are written from the cache onto the disk platters. Write-back caches improve performance because a write to the high-speed cache is faster than a write to normal RAM or disk... this cache helps address the disk-to-memory subsystem bottleneck. I’ve enabled write-back caching on all of our RAID arrays. RAID will be discussed later.
  11. 4. Track Data Density: defines how much information can be stored on a given track. The higher the track data density, the more information the disk can store on one track. If a disk can store more data on one track, it does not have to move the head to the next track as often. This means that the higher the recording density, the lower the chances are that the head will have to be moved to the next track to get the required data.
  12. RAID: (n = number of drives in array) “Redundant Array of Inexpensive Disks”. RAID systems improve performance by allowing the controller to exploit the capabilities of multiple hard disks to get around the performance-limiting mechanical issues that plague individual hard disks. Different RAID implementations improve performance in different ways and to different degrees, but all improve it in some way.
RAID0 “Striping” (n): fastest due to no parity... raw cumulative speed. A single drive failure causes the entire array to fail. “All-or-none”.
RAID1 “Mirroring” (n/2): each drive is mirrored; speed and capacity are ½ of RAID0, and it requires an even number of disks so the set can be divided. An entire source or mirror set can go bad before data is jeopardized.
RAID5 “Striping w/Parity” (n-1): fast, with one drive’s worth of capacity used for parity. One drive can fail without losing the array; a second failure loses it.
RAID6 “Striping w/Dual Parity” (n-2): fast, with two drives’ worth of capacity used for parity. Two drives can fail before the array is lost.
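  For example, with six 1TB drives (the complement in our current PSA box), usable capacity works out roughly as:

    RAID0: 6 × 1TB = 6TB, no drive may fail
    RAID1: 6/2 × 1TB = 3TB, an entire mirror set may fail
    RAID5: (6 - 1) × 1TB = 5TB, one drive may fail
    RAID6: (6 - 2) × 1TB = 4TB, two drives may fail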
  13. Normal PCI’s bandwidth is 132MB/s. AGP 8x is 2.1GB/s. PCI Express outperforms PCI significantly: PCIe is bidirectional/full-duplex, allowing data to flow in both directions simultaneously (doubling throughput):
PCIe 1x = 500MB/s (250MB/s each way)
PCIe 2x = 1GB/s (500MB/s each way)
PCIe 4x = 2GB/s (1GB/s each way)
PCIe 8x = 4GB/s (2GB/s each way)
PCIe 16x = 8GB/s (4GB/s each way)
Even PCIe 32x = 16GB/s (8GB/s each way)
So, internally, you want to use PCIe, not plain PCI, which is old and slow in comparison. AGP is also obsolete since PCIe’s introduction; graphics cards now use this interface as well. Regular PCI is a bottleneck in modern computers.
All of our 2950 servers have PERC6/i RAID controllers built in; the “i” means “integrated” on the motherboard. I found that our throughput was significantly slower than what we expected, despite having 6 SATA-II drives, even in RAID0. The settings we selected for the RAID virtual drives were: Stripe Element Size 64KB; Read Policy: Adaptive Read-Ahead (to optimize large read operations); Write Policy: Write-Back.
We were seeing read speeds in RAID5 of approximately 150-225MB/s across 4 drives... which we knew was way too slow given the hardware. After rebuilding the array several times and searching around on the Internet, I came across DELL’s PERC firmware update site, which showed that a newer release was available: v.6.2.0-0013, promising “performance enhancements including significant improvements in random-write performance, multi-threaded write performance, and reduction in maximum and average I/O response times.”
I couldn’t flash the PERC controllers without a floppy, so I had to create a FreeDOS-based bootable CD with the updated PERC firmware in a subdirectory, allowing me to successfully flash the controller’s BIOS. Later, I discovered that DELL’s OpenManage CD provides a tool to handle BIOS updates; however, I wasn’t able to get this working... so the FreeDOS solution worked out.
I also dug around and found that I could set filesystem read-ahead parameters through “hdparm” in Linux, telling the OS to read ahead 2048 blocks whenever a read operation is performed. I set this in /etc/rc.local so it persists after a reboot.
Once the PERC controller was flashed and Linux filesystem read-ahead was set, performance increased dramatically: we’re now seeing just over 500MB/s reads in RAID5/6. This significantly reduces the time it takes to load tables into memory for complex queries, thereby reducing overall query execution time. Performance is now on par with GreenPlum without having to pay $40,000/year licensing.
  14. Keep relevant data closest to the CPU in memory once it has been read from disk. More memory reduces the need for costly “page-in” operations from disk by reducing the need to “page-out” data to make space for new data. Memory bus speed is still much slower than CPU speeds and often becomes a bottleneck as CPU speeds increase, so it’s important to use the fastest memory speed and FSB your chipset will support. More CPU cores allow you to parallelize workloads: a multithreaded database takes advantage of multi-processing by distributing a query into several threads across multiple CPUs, drastically increasing the query’s efficiency while reducing its processing time. Faster disks with high bandwidth and low seek times maximize read performance into memory for CPUs to process complex queries; OLAP databases benefit from this because they scan large datasets frequently. Using RAID allows you to aggregate disk I/O by striping data across several spindles, drastically decreasing the time it takes to read data into memory and write it back to disk during commits, while also providing massive storage space, redundancy and fault-tolerance.
  16. Database Types:
OLAP (Online Analytical Processing): provides the big picture, supports analysis, needs aggregate data, evaluates all datasets quickly, uses a multidimensional model. DB size is typically 100GB to several TB (even petabytes). Mostly read-only operations, lots of scans, complex queries. Benefits from multi-threading, parallel processing, and fast drives with high read throughput and low seek times. Key performance metrics: query throughput and response time.
OLTP (Online Transactional Processing): provides a detailed audit trail, supports operations, needs detailed data, finds one dataset quickly, uses a relational model. DB size is typically < 100GB. Short, atomic transactions... read/write. Key performance metrics: transaction throughput and availability.
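  A purely illustrative contrast in SQL (the table and column names are hypothetical, not taken from the PSA schema):

    -- OLTP-style access: fetch one row by key through an index, touching a handful of pages.
    SELECT * FROM orders WHERE order_id = 42;

    -- OLAP-style access: scan and aggregate a large fact table, touching most of its rows.
    SELECT region, date_trunc('month', order_date) AS month, sum(total) AS revenue
    FROM orders
    GROUP BY region, date_trunc('month', order_date)
    ORDER BY region, month;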
  17. Database Types:The time and expense involved in retrieving answers from databases means that a lot of business intelligence information often goes unused. The reason: most operational databases (OLTP) are designed to store your data, not to help you analyze it. The solution: an online analytical processing (OLAP) database, a specialized database designed to help you extract business intelligence information from your data.
  18. A connection from an application program to the PostgreSQL server has to be established. The parser stage checks the query transmitted by the application program for correct syntax and creates a query tree. The rewrite system takes the query tree created by the parser stage and looks for any rules (stored in the system catalogs) to apply to the query tree. It performs the transformations given in the rule bodies. The planner/optimizer takes the (rewritten) query tree and creates a query plan that will be the input to the executor. It does so by first creating all possible paths leading to the same result. For example, if there is an index on a relation to be scanned, there are two paths for the scan: one possibility is a simple sequential scan and the other is to use the index. Next, the cost of executing each path is estimated and the cheapest path is chosen. The executor recursively steps through the plan tree and retrieves rows in the way represented by the plan. The executor makes use of the storage system while scanning relations, performs sorts and joins, evaluates qualifications and finally hands back the rows derived.
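  To see which path the planner actually picked, PostgreSQL's EXPLAIN prints the chosen plan without running the query, and EXPLAIN ANALYZE runs it and reports real row counts and timings (the table name below is hypothetical):

    -- Show the plan the planner/optimizer selected (sequential scan vs. index scan, join methods, estimated costs).
    EXPLAIN SELECT * FROM events WHERE event_time > now() - interval '1 day';

    -- Also execute the query, reporting actual rows and times per plan node.
    EXPLAIN ANALYZE SELECT * FROM events WHERE event_time > now() - interval '1 day';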
  19. GreenPlum and PostgreSQL: We found that despite the claims above, GreenPlum was overpriced, slow, and problematic. Furthermore, our GreenPlum PSA database grew to exceed the hardware we had in place, requiring us to constantly delete old tables by hand. To replace GreenPlum while maintaining the table structures already in place, we opted to go with PostgreSQL, aware that it’s not pre-optimized for OLAP/data-warehouse applications. The mindset in doing this was that we could tweak PostgreSQL to match the actual performance we saw from GreenPlum without having to pay for an expensive license. Understanding how to tweak PostgreSQL this way requires an understanding of PostgreSQL’s query execution characteristics and its configuration-file concepts.
  20. max_connections sets the maximum number of client connections per server. Several performance parameters use max_connections as part of their formula for tuning PostgreSQL.
shared_buffers: as the name implies, this is the shared memory PostgreSQL dedicates to its buffer cache. Too much and you risk paging.
work_mem (working memory): you need to consider what you set max_connections to in order to size this parameter correctly. This is a setting where data warehouse systems, where users are submitting very large queries, can readily make use of many gigabytes of memory. This size is applied to each and every sort done by each user, and complex queries can use multiple working-memory sort buffers. Set it to 50MB with 30 users submitting queries, and you are soon using 1.5GB of real memory. Furthermore, if a query involves merge sorts of 8 tables, that requires 8 times work_mem.
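  A minimal sketch of how the sizing interacts (the figures are the slide's example, not our production values): worst-case sort memory is roughly max_connections × sorts-per-query × work_mem, which is why the global value stays conservative. A single heavy report session can instead raise its own limit temporarily:

    -- The global default comes from postgresql.conf; a session can override it for one big analytical query.
    SET work_mem = '256MB';                               -- hypothetical value, applies only to this connection
    SELECT region, count(*) FROM events GROUP BY region;  -- hypothetical report query
    RESET work_mem;                                       -- return to the configured default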
  22. Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:
Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory.
When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of a sequential scan of that partition instead of using an index and random-access reads scattered across the whole table.
Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE is far faster than a bulk operation. It also entirely avoids the VACUUM overhead caused by a bulk DELETE.
Seldom-used data can be migrated to cheaper and slower storage media.
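  A minimal sketch of how this looks in PostgreSQL 8.3, which has no declarative partitioning: each partition is a child table that INHERITS from a parent, a CHECK constraint describes its range, and constraint_exclusion lets the planner skip partitions that cannot match. Table and column names here are hypothetical:

    -- Parent table holds no data itself; the children carry the rows.
    CREATE TABLE measurements (
        logdate   date    NOT NULL,
        reading   integer NOT NULL
    );

    -- One child table per month, with a CHECK constraint describing its range.
    CREATE TABLE measurements_2009_01 (
        CHECK (logdate >= DATE '2009-01-01' AND logdate < DATE '2009-02-01')
    ) INHERITS (measurements);

    CREATE TABLE measurements_2009_02 (
        CHECK (logdate >= DATE '2009-02-01' AND logdate < DATE '2009-03-01')
    ) INHERITS (measurements);

    -- Let the planner skip partitions whose CHECK constraint rules them out.
    SET constraint_exclusion = on;

    -- Dropping a whole month is a fast DDL operation instead of a slow bulk DELETE.
    DROP TABLE measurements_2009_01;

  A routing trigger or rule on the parent table is also needed to steer INSERTs into the correct child; it is omitted here for brevity.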
  23. Vacuuming ensures the databases remain ACID... Atomic, Consistent, Isolated, and Durable.
Atomicity: guarantees that either all of the tasks of a transaction are performed or none.
Consistency: only valid data will ever be written to the database.
Isolation: other operations cannot access or see the data in an intermediate state during a transaction.
Durability: once the user has been notified of the transaction’s success, the transaction will persist and not be undone... thus surviving a system failure.
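  In day-to-day terms, this is why PostgreSQL needs VACUUM (or autovacuum, enabled by default in 8.3): MVCC leaves dead row versions behind after UPDATEs and DELETEs, and vacuuming reclaims that space and keeps planner statistics fresh. A minimal example (table name hypothetical):

    -- Reclaim space from dead row versions and refresh planner statistics in one pass.
    VACUUM ANALYZE measurements;

    -- VERBOSE reports how many dead row versions were found and removed.
    VACUUM VERBOSE measurements;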
  24. Disk I/O was roughly 90MB/s. The database volume was constantly around 90% capacity, forcing Mike to manually delete tables... space was at a premium. Licensing with GreenPlum was expensive ($20,000/6 months... $40,000/yr), so it made sense to leverage the underlying free open-source code and scrap the proprietary distributed DB solution.
  25. Performance challenges with this server:
Limited capacity; drives were small and slow (150MB/s).
The RAID controller’s SATA-I interface didn’t recognize higher-capacity SATA-II drives, despite SATA-II backwards compatibility.
No floppy drive or USB boot capability... making it difficult to flash the controller and BIOS for SATA-II backwards compatibility.
No PCIe expansion bays, only PCI-X, ruling out high-performance external enclosures.
PostgreSQL requires significant configuration tweaks to realize decent performance.
PostgreSQL isn’t multithreaded: a single query process (regardless of its complexity) uses only 1 CPU core.
  26. This is our third (and current) generation PSA box: a DELL PowerEdge 2950 with dual Xeon Quad-Core processors @ 2.5GHz, 16GB DDR memory @ 667MHz bus speed (x2, DDR), and a 1333MHz FSB.
6 SATA-II 1TB 7200RPM drives, configured in a single 5TB RAID5 array... 1 drive can fail. Throughput is across 5 spindles... effective capacity is roughly 4.5TB.
Drive setup: used the PERC6/i BIOS menu for RAID configuration (hardware RAID, transparent to the OS); battery backup is enabled for write-caching.
Virtual Drive 1 (physical drives 1 and 2): RAID1 for the system (1TB), 64KB stripe element size, Write-Back enabled.
Virtual Drive 2 (physical drives 3, 4, 5, 6): RAID5 for PostgreSQL data (3TB), 64KB stripe element size, Write-Back enabled.
I/O performance: Read: 507MB/s; Write: 401MB/s.
  27. Through a process of elimination and online research (Google and PostgreSQL forums) we have settled on the above settings in the PSA server’s configuration file:
max_connections = 25. Mike confirmed that only 10-15 PSA clients ever really connect at any given time, so this setting allows for spikes while remaining conservative enough not to inflate work_mem, since work_mem uses max_connections in its memory-allocation formula.
shared_buffers = 4096MB. This number comes from best practice: ¼ of total physical memory (16GB).