SlideShare a Scribd company logo
1 of 32
Building and Optimizing DataBuilding and Optimizing Data
Warehouse "Star Schemas"Warehouse "Star Schemas"
with MySQLwith MySQL
Bert Scalzo, Ph.D.Bert Scalzo, Ph.D.
Bert.Scalzo@Quest.comBert.Scalzo@Quest.com
About the AuthorAbout the Author
 Oracle DBA for 20+ years, versions 4 through 10g
 Been doing MySQL work for past year (4.x and 5.x)
 Worked for Oracle Education & Consulting
 Holds several Oracle Masters (DBA & CASE)
 BS, MS, PhD in Computer Science and also an MBA
 LOMA insurance industry designations: FLMI and ACS
 Books
– The TOAD Handbook (Feb 2003)
– Oracle DBA Guide to Data Warehousing and Star Schemas (Jun 2003)
– TOAD Pocket Reference 2nd
Edition (May 2005)
 Articles
– Oracle Magazine
– Oracle Technology Network (OTN)
– Oracle Informant
– PC Week (now E-Magazine)
– Linux Journal
– www.linux.com
– www.quest-pipelines.com
Star Schema DesignStar Schema Design
Dimensions: smaller, de-normalized tables containing
business descriptive columns that users use to query
Facts: very large tables with primary keys formed from
the concatenation of related dimension table foreign key
columns, and also possessing numerically additive, non-
key columns used for calculations during user queries
“Star schema” approach to dimensional data
modeling was pioneered by Ralph Kimball
Facts
Dimensions
108th
– 1010th
103rd
– 105th
The Ad-Hoc ChallengeThe Ad-Hoc Challenge
How much data would a data miner mine,
if a data miner could mine data?
Dimensions: generally queried selectively to find lookup
value matches that are used to query against the fact table
Facts: must be selectively queried, since they generally
have hundreds of millions to billions of rows – even full
table scans utilizing parallel are too big for most systems
Business Intelligence (BI) tools generally offer end-users the
ability to perform projections of group operations on columns
from facts using restrictions on columns from dimensions …
Hardware Not CompensateHardware Not Compensate
Often, people have expectation that using
expensive hardware is only way to obtain
optimal performance for a data warehouse
•CPU
•SMP
•MPP
•Disk
•15,000 RPM
•RAID (EMC)
•OS
•UNIX
•64-bit
•MySQL
•4.x / 5.x
•64-bit
DB Design ParamountDB Design Paramount
In reality, the database design is the key
factor to optimal query performance for a
data warehouse built as a “Star Schema”
There are certain minimum hardware and
software requirements that once met, play a
very subordinate role to tuning the database
Golden Rule: get the basic database
design and query explain plan correct
Key Tuning RequirementsKey Tuning Requirements
1. MySQL 5.x
2. MySQL.ini
3. Table Design
4. Index Design
5. Data Loading Architecture
6. Analyze Table
7. Query Style
8. Explain plan
1. MySQL 5.X (help on the way)1. MySQL 5.X (help on the way)
•Index Merge Explain
•Prior to 5.x, only one index used per referenced table
•This radically effects both index design and explain plans
•Rename Table for MERGE fixed
•With 4.x, some scenarios could cause table corruption
•New ``greedy search'' optimizer that can significantly reduce the
time spent on query optimization for some many-table joins
•Views
•Useful for pre-canning/forcing query style or syntax (i.e. hints)
•Stored Procedures
•Rudimentary Triggers
•InnoDB
•Compact Record Format
•Fast Truncate Table
2. MySQL.ini2. MySQL.ini
•query_cache_size = 0 (or 13% overhead)
•sort_buffer_size >= 4MB
•bulk_insert_buffer_size >= 16MB
•key_buffer_size >= 25-50% RAM
•myisam_sort_buffer_size >= 16MB
•innodb_additional_mem_pool_size >= 4MB
•innodb_autoextend_increment >= 64MB
•innodb_buffer_pool_size >= 25-50% RAM
•innodb_file_per_table = TRUE
•innodb_log_file_size = 1/N of buffer pool
•innodb_log_buffer_size = 4-8 MB
3. Table Design3. Table Design
SPEED vs. SPACE vs. MANAGEMENT
64 MB / Million Rows (Avg. Fact)
500 Million Rows
===================
32,000 MB (32 GB)
Primary storage engine options:
•MyISAM
•MyISAM + RAID_TYPE
•MERGE
•InnoDB
CREATE TABLE ss.pos_day (
PERIOD_ID decimal(10,0) NOT NULL default '0',
LOCATION_ID decimal(10,0) NOT NULL default '0',
PRODUCT_ID decimal(10,0) NOT NULL default '0',
SALES_UNIT decimal(10,0) NOT NULL default '0',
SALES_RETAIL decimal(10,0) NOT NULL default '0',
GROSS_PROFIT decimal(10,0) NOT NULL default '0‘
PRIMARY KEY(PRODUCT_ID, LOCATION_ID, PERIOD_ID),
ADD INDEX PERIOD(PERIOD_ID),
ADD INDEX LOCATION(LOCATION_ID),
ADD INDEX PRODUCT(PRODUCT_ID)
) ENGINE=MyISAM
PACK_KEYS
DATA_DIRECTORY=‘C:mysqldata’
INDEX_DIRECTORY=‘D:mysqldata’;
ENGINE = MyISAMENGINE = MyISAM
Pros:
•Non-transactional – faster, lower disk space usage, and less memory
Cons:
•2/4GB data file limit on operating systems that don't support big files
•2/4GB index file limit on operating systems that don't support big files
•One big table poses data archival and index maintenance challenges
(e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc)
CREATE TABLE ss.pos_day (
PERIOD_ID decimal(10,0) NOT NULL default '0',
LOCATION_ID decimal(10,0) NOT NULL default '0',
PRODUCT_ID decimal(10,0) NOT NULL default '0',
SALES_UNIT decimal(10,0) NOT NULL default '0',
SALES_RETAIL decimal(10,0) NOT NULL default '0',
GROSS_PROFIT decimal(10,0) NOT NULL default '0‘
PRIMARY KEY(PRODUCT_ID, LOCATION_ID, PERIOD_ID),
ADD INDEX PERIOD(PERIOD_ID),
ADD INDEX LOCATION(LOCATION_ID),
ADD INDEX PRODUCT(PRODUCT_ID)
) ENGINE=MyISAM
PACK_KEYS
RAID_TYPE=STRIPED;
ENGINE = MyISAM + RAID_TYPEENGINE = MyISAM + RAID_TYPE
Pros:
•Non-transactional – faster, lower disk space usage, and less memory
•Can help you to exceed the 2GB/4GB limit for the MyISAM data file
•Creates up to 255 subdirectories, each with file named table_name.myd
•Distributed IO – put each table subdirectory and file on a different disk
Cons:
•2/4GB index file limit on operating systems that don't support big files
•One big table poses data archival and index maintenance challenges
(e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc)
CREATE TABLE ss.pos_merge (
PERIOD_ID decimal(10,0) NOT NULL default '0',
LOCATION_ID decimal(10,0) NOT NULL default '0',
PRODUCT_ID decimal(10,0) NOT NULL default '0',
SALES_UNIT decimal(10,0) NOT NULL default '0',
SALES_RETAIL decimal(10,0) NOT NULL default '0',
GROSS_PROFIT decimal(10,0) NOT NULL default '0',
INDEX PK(PRODUCT_ID, LOCATION_ID, PERIOD_ID),
INDEX PERIOD(PERIOD_ID),
INDEX LOCATION(LOCATION_ID),
INDEX PRODUCT(PRODUCT_ID)
) ENGINE=MERGE
UNION=(pos_1998,pos_1999,pos_2000) INSERT_METHOD=LAST;
ENGINE = MERGEENGINE = MERGE
Pros:
•Non-transactional – faster, lower disk space usage, and less memory
•Partitioned tables offer data archival and index maintenance options
(e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc)
•Distributed IO – put individual tables and indexes on different disks
Cons:
•MERGE tables use more file descriptors on database server
•MERGE key lookups are much slower on “eq_ref” searches
•Can use only identical MyISAM tables for a MERGE table
CREATE TABLE ss.pos_day (
PERIOD_ID decimal(10,0) NOT NULL default '0',
LOCATION_ID decimal(10,0) NOT NULL default '0',
PRODUCT_ID decimal(10,0) NOT NULL default '0',
SALES_UNIT decimal(10,0) NOT NULL default '0',
SALES_RETAIL decimal(10,0) NOT NULL default '0',
GROSS_PROFIT decimal(10,0) NOT NULL default '0‘
PRIMARY KEY(PRODUCT_ID, LOCATION_ID, PERIOD_ID),
ADD INDEX PERIOD(PERIOD_ID),
ADD INDEX LOCATION(LOCATION_ID),
ADD INDEX PRODUCT(PRODUCT_ID)
) ENGINE=InnoDB
PACK_KEYS;
ENGINE = InnoDBENGINE = InnoDB
Pros:
•Simple yet flexible tablespace datafile configuration
innodb_data_file_path=ibdata1:1G:autoextend:max:2G; ibdata2:1G:autoextend:max:2G
Cons:
•Uses much more disk space – typically 2.5 times as much disk space as MyISAM!!!
•Transaction Safe – not needed, consumes resources (SET AUTOCOMMIT=0)
•Foreign Keys – not needed, consumes resources (SET FOREIGN_KEY_CHECKS=0)
•Prior to MySQL 4.1.1 – no “mutliple tablespaces” feature (i.e. one table per tablespace)
•One big table poses data archival and index maintenance challenges
(e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc)
Space Usage 21 Million RecordsSpace Usage 21 Million Records
3GB
3GB
MERGE
MyISAM
8GB
InnoDB
4. Index Design4. Index Design
Index Design must be driven by DW users’ nature: you don’t know what
they’ll query upon, and the more successful they are data mining – the
more they’ll try (which is a actually a really good thing) …
Therefore you don’t know which dimension tables they’ll reference and
which dimension columns they will restrict upon – so:
•Fact tables should have primary keys – for data load integrity
•Fact table dimension reference (i.e. foreign key) columns should each
be individually indexed – for variable fact/dimension joins
•Dimension tables should have primary keys
•Dimension tables should be fully indexed
•MySQL 4.x – only one index per dimension will be used
•If you know that one column will always be used in
conjunction with others, create concatenated indexes
•MySQL 5.x – new index merge will use multiple indexes
Note: Make sure to build indexes based
off cardinality (i.e. leading portion most
selective), so in this case the index was
built backwards
MyISAM Key Cache MagicMyISAM Key Cache Magic
1. Utilize two key caches:
•Default Key Cache – for fact table indexes
•Hot Key Cache – for dimension key indexes
command-line option:
shell> mysqld --hot_cache.key_buffer_size=16M
option file:
[mysqld]
hot_cache.key_buffer_size=16M
CACHE INDEX t1, t2, t3 IN hot_cache;
2. Pre-Load Dimension Indexes:
LOAD INDEX INTO CACHE t1, t2, t3 IGNORE LEAVES;
5. Data Loading Architecture5. Data Loading Architecture
Archive:
•ALTER TABLE fact_table UNION=(mt2, mt3, mt4)
•DROP TABLE mt1
Load:
•TRUNCATE TABLE staging_table
•Run nightly/weekly data load into staging_table
•ALTER TABLE merge_table_4 DROP PRIMARY KEY
•ALTER TABLE merge_table_4 DROP INDEX
•INSERT INTO merge_table_4 SELECT * FROM staging_table
•ALTER TABLE merge_table_4 ADD PRIMARY KEY(…)
•ALTER TABLE merge_table_4 ADD INDEX(…)
•ANALYZE TABLE merge_table_4
6. Analyze Table6. Analyze Table
•Analyze Table statement analyzes and stores the
key distribution for a table
•MySQL uses the stored key distribution to decide
the order in which tables should be joined
• If you have a problem with incorrect index usage,
you should run ANALYZE TABLE to update table
statistics such as cardinality of keys, which can
affect the choices the optimizer makes
ANALYZE TABLE ss.pos_merge;
7. Query Style7. Query Style
Many Business Intelligence (BI) and Report Writing
tools offer initialization parameters or global settings
which control the style of SQL code they generate.
Options often include:
•Simple N-Way Join
•Sub-Selects
•Derived Tables
•Reporting Engine does Join operations
For now – only the first options make sense…
(see following section regarding explain plans)
8. Explain Plan8. Explain Plan
The EXPLAIN statement can be used either as a synonym for
DESCRIBE or as a way to obtain information about how the
MySQL query optimizer will execute a SELECT statement.
You can get a good indication of how good a join is by taking
the product of the values in the rows column of the
EXPLAIN output. This should tell you roughly how many
rows MySQL must examine to execute the query.
We’ll refer to this calculation as the explain plan cost – and
use this as our primary comparative measure (along with of
course the actual run time) …
Method: Derived Tables
Cost = 1.8 x 1016th
Huge Explain Cost, and
statement ran forever!!!
Method: Sub-Selects
Cost = 2.1 x 107th
Sub-Select better than
Derived Table – but not
as good as Simple Join
Method: Simple Joins and
Single-Column Indexes
Cost = 7.1 x 106th
Join better than Sub-
Select and much better
than Derived Table
Cost = 7.1 x 106th
Method: Merge Table and
Single-Column Indexes
Same Explain Cost, but
statement ran 2X faster
Method: Merge Table and
Multi-Column Indexes
Cost = 3.2 x 105th
Concatenated Index
yielded best run time
Method: Merge Table and
Merge Indexes
Cost = 5.2 x 105th
MySQL 5.0 new Index
Merge = best run time
Conclusion …Conclusion …
•MySQL can easily be used to build large and
effective “Star Schema” Data Warehouses
•MySQL Version 5.x will offer even more useful index
and join query optimizations
•MySQL can be better configured for DW use through
effective mysql.ini option settings
•Table and Index designs are paramount to success
•Query style and resulting explain plans are critical to
achieving the fastest query run times

More Related Content

What's hot

Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland Bouman
 
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.George Joseph
 
ODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" SourcesODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" SourcesMark Rittman
 
Hp vertica certification guide
Hp vertica certification guideHp vertica certification guide
Hp vertica certification guideneinamat
 
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseBDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseDavid Lauzon
 
SQL Server 2016 - Stretch DB
SQL Server 2016 - Stretch DB SQL Server 2016 - Stretch DB
SQL Server 2016 - Stretch DB Shy Engelberg
 
Azure SQL Data Warehouse for beginners
Azure SQL Data Warehouse for beginnersAzure SQL Data Warehouse for beginners
Azure SQL Data Warehouse for beginnersMichaela Murray
 
Case Study Real Time Olap Cubes
Case Study Real Time Olap CubesCase Study Real Time Olap Cubes
Case Study Real Time Olap Cubesmister_zed
 
Big Data .. Are you ready for the next wave?
Big Data .. Are you ready for the next wave?Big Data .. Are you ready for the next wave?
Big Data .. Are you ready for the next wave?Mahmoud Sabri
 
Building Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerBuilding Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerAntonios Chatzipavlis
 
Challenges in building a Data Pipeline
Challenges in building a Data PipelineChallenges in building a Data Pipeline
Challenges in building a Data PipelineHevo Data Inc.
 
Data Modeling on Azure for Analytics
Data Modeling on Azure for AnalyticsData Modeling on Azure for Analytics
Data Modeling on Azure for AnalyticsIke Ellis
 
In memory big data management and processing a survey
In memory big data management and processing a surveyIn memory big data management and processing a survey
In memory big data management and processing a surveyredpel dot com
 
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAININGDATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAININGDatawarehouse Trainings
 

What's hot (19)

Azure SQL DWH
Azure SQL DWHAzure SQL DWH
Azure SQL DWH
 
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
 
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
 
ODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" SourcesODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" Sources
 
OLAP
OLAPOLAP
OLAP
 
Hp vertica certification guide
Hp vertica certification guideHp vertica certification guide
Hp vertica certification guide
 
Oltp vs olap
Oltp vs olapOltp vs olap
Oltp vs olap
 
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseBDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
 
Data virtualization using polybase
Data virtualization using polybaseData virtualization using polybase
Data virtualization using polybase
 
SQL Server 2016 - Stretch DB
SQL Server 2016 - Stretch DB SQL Server 2016 - Stretch DB
SQL Server 2016 - Stretch DB
 
Azure SQL Data Warehouse for beginners
Azure SQL Data Warehouse for beginnersAzure SQL Data Warehouse for beginners
Azure SQL Data Warehouse for beginners
 
Case Study Real Time Olap Cubes
Case Study Real Time Olap CubesCase Study Real Time Olap Cubes
Case Study Real Time Olap Cubes
 
Big Data .. Are you ready for the next wave?
Big Data .. Are you ready for the next wave?Big Data .. Are you ready for the next wave?
Big Data .. Are you ready for the next wave?
 
Building Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerBuilding Data Warehouse in SQL Server
Building Data Warehouse in SQL Server
 
OLAP
OLAPOLAP
OLAP
 
Challenges in building a Data Pipeline
Challenges in building a Data PipelineChallenges in building a Data Pipeline
Challenges in building a Data Pipeline
 
Data Modeling on Azure for Analytics
Data Modeling on Azure for AnalyticsData Modeling on Azure for Analytics
Data Modeling on Azure for Analytics
 
In memory big data management and processing a survey
In memory big data management and processing a surveyIn memory big data management and processing a survey
In memory big data management and processing a survey
 
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAININGDATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
 

Similar to Data Warehouse Logical Design using Mysql

Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDBMongoDB
 
15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performanceguest9912e5
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at ScaleMongoDB
 
01 upgrade to my sql8
01 upgrade to my sql8 01 upgrade to my sql8
01 upgrade to my sql8 Ted Wennmark
 
Upgrade to MySQL 8.0!
Upgrade to MySQL 8.0!Upgrade to MySQL 8.0!
Upgrade to MySQL 8.0!Ted Wennmark
 
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...Charley Hanania
 
IBM Pure Data System for Analytics (Netezza)
IBM Pure Data System for Analytics (Netezza)IBM Pure Data System for Analytics (Netezza)
IBM Pure Data System for Analytics (Netezza)Girish Srivastava
 
MySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMahesh Salaria
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server DatabasesColdFusionConference
 
MySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMahesh Salaria
 
Upgrade to MySQL 5.7 and latest news planned for MySQL 8
Upgrade to MySQL 5.7 and latest news planned for MySQL 8Upgrade to MySQL 5.7 and latest news planned for MySQL 8
Upgrade to MySQL 5.7 and latest news planned for MySQL 8Ted Wennmark
 
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL Server
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL ServerGeek Sync I Need for Speed: In-Memory Databases in Oracle and SQL Server
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL ServerIDERA Software
 
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...Ashnikbiz
 
Oracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big DataOracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big DataAbishek V S
 

Similar to Data Warehouse Logical Design using Mysql (20)

Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDB
 
15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance15 Ways to Kill Your Mysql Application Performance
15 Ways to Kill Your Mysql Application Performance
 
Mysql For Developers
Mysql For DevelopersMysql For Developers
Mysql For Developers
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
 
Where to save my data, for devs!
Where to save my data, for devs!Where to save my data, for devs!
Where to save my data, for devs!
 
01 upgrade to my sql8
01 upgrade to my sql8 01 upgrade to my sql8
01 upgrade to my sql8
 
Upgrade to MySQL 8.0!
Upgrade to MySQL 8.0!Upgrade to MySQL 8.0!
Upgrade to MySQL 8.0!
 
NoSQL
NoSQLNoSQL
NoSQL
 
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...
Pass chapter meeting dec 2013 - compression a hidden gem for io heavy databas...
 
IBM Pure Data System for Analytics (Netezza)
IBM Pure Data System for Analytics (Netezza)IBM Pure Data System for Analytics (Netezza)
IBM Pure Data System for Analytics (Netezza)
 
MySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMySQL: Know more about open Source Database
MySQL: Know more about open Source Database
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server Databases
 
MySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMySQL: Know more about open Source Database
MySQL: Know more about open Source Database
 
Upgrade to MySQL 5.7 and latest news planned for MySQL 8
Upgrade to MySQL 5.7 and latest news planned for MySQL 8Upgrade to MySQL 5.7 and latest news planned for MySQL 8
Upgrade to MySQL 5.7 and latest news planned for MySQL 8
 
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL Server
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL ServerGeek Sync I Need for Speed: In-Memory Databases in Oracle and SQL Server
Geek Sync I Need for Speed: In-Memory Databases in Oracle and SQL Server
 
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...
 
SQLServer Database Structures
SQLServer Database Structures SQLServer Database Structures
SQLServer Database Structures
 
Oracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big DataOracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big Data
 
MySQL 开发
MySQL 开发MySQL 开发
MySQL 开发
 
No SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability MeetupNo SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability Meetup
 

Recently uploaded

2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profileakrivarotava
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Anthony Dahanne
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 

Recently uploaded (20)

2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profile
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 

Data Warehouse Logical Design using Mysql

  • 1. Building and Optimizing DataBuilding and Optimizing Data Warehouse "Star Schemas"Warehouse "Star Schemas" with MySQLwith MySQL Bert Scalzo, Ph.D.Bert Scalzo, Ph.D. Bert.Scalzo@Quest.comBert.Scalzo@Quest.com
  • 2. About the AuthorAbout the Author  Oracle DBA for 20+ years, versions 4 through 10g  Been doing MySQL work for past year (4.x and 5.x)  Worked for Oracle Education & Consulting  Holds several Oracle Masters (DBA & CASE)  BS, MS, PhD in Computer Science and also an MBA  LOMA insurance industry designations: FLMI and ACS  Books – The TOAD Handbook (Feb 2003) – Oracle DBA Guide to Data Warehousing and Star Schemas (Jun 2003) – TOAD Pocket Reference 2nd Edition (May 2005)  Articles – Oracle Magazine – Oracle Technology Network (OTN) – Oracle Informant – PC Week (now E-Magazine) – Linux Journal – www.linux.com – www.quest-pipelines.com
  • 3.
  • 4. Star Schema DesignStar Schema Design Dimensions: smaller, de-normalized tables containing business descriptive columns that users use to query Facts: very large tables with primary keys formed from the concatenation of related dimension table foreign key columns, and also possessing numerically additive, non- key columns used for calculations during user queries “Star schema” approach to dimensional data modeling was pioneered by Ralph Kimball
  • 7. The Ad-Hoc ChallengeThe Ad-Hoc Challenge How much data would a data miner mine, if a data miner could mine data? Dimensions: generally queried selectively to find lookup value matches that are used to query against the fact table Facts: must be selectively queried, since they generally have hundreds of millions to billions of rows – even full table scans utilizing parallel are too big for most systems Business Intelligence (BI) tools generally offer end-users the ability to perform projections of group operations on columns from facts using restrictions on columns from dimensions …
  • 8. Hardware Not CompensateHardware Not Compensate Often, people have expectation that using expensive hardware is only way to obtain optimal performance for a data warehouse •CPU •SMP •MPP •Disk •15,000 RPM •RAID (EMC) •OS •UNIX •64-bit •MySQL •4.x / 5.x •64-bit
  • 9. DB Design ParamountDB Design Paramount In reality, the database design is the key factor to optimal query performance for a data warehouse built as a “Star Schema” There are certain minimum hardware and software requirements that once met, play a very subordinate role to tuning the database Golden Rule: get the basic database design and query explain plan correct
  • 10. Key Tuning RequirementsKey Tuning Requirements 1. MySQL 5.x 2. MySQL.ini 3. Table Design 4. Index Design 5. Data Loading Architecture 6. Analyze Table 7. Query Style 8. Explain plan
  • 11. 1. MySQL 5.X (help on the way)1. MySQL 5.X (help on the way) •Index Merge Explain •Prior to 5.x, only one index used per referenced table •This radically effects both index design and explain plans •Rename Table for MERGE fixed •With 4.x, some scenarios could cause table corruption •New ``greedy search'' optimizer that can significantly reduce the time spent on query optimization for some many-table joins •Views •Useful for pre-canning/forcing query style or syntax (i.e. hints) •Stored Procedures •Rudimentary Triggers •InnoDB •Compact Record Format •Fast Truncate Table
  • 12. 2. MySQL.ini2. MySQL.ini •query_cache_size = 0 (or 13% overhead) •sort_buffer_size >= 4MB •bulk_insert_buffer_size >= 16MB •key_buffer_size >= 25-50% RAM •myisam_sort_buffer_size >= 16MB •innodb_additional_mem_pool_size >= 4MB •innodb_autoextend_increment >= 64MB •innodb_buffer_pool_size >= 25-50% RAM •innodb_file_per_table = TRUE •innodb_log_file_size = 1/N of buffer pool •innodb_log_buffer_size = 4-8 MB
  • 13. 3. Table Design3. Table Design SPEED vs. SPACE vs. MANAGEMENT 64 MB / Million Rows (Avg. Fact) 500 Million Rows =================== 32,000 MB (32 GB) Primary storage engine options: •MyISAM •MyISAM + RAID_TYPE •MERGE •InnoDB
  • 14. CREATE TABLE ss.pos_day ( PERIOD_ID decimal(10,0) NOT NULL default '0', LOCATION_ID decimal(10,0) NOT NULL default '0', PRODUCT_ID decimal(10,0) NOT NULL default '0', SALES_UNIT decimal(10,0) NOT NULL default '0', SALES_RETAIL decimal(10,0) NOT NULL default '0', GROSS_PROFIT decimal(10,0) NOT NULL default '0‘ PRIMARY KEY(PRODUCT_ID, LOCATION_ID, PERIOD_ID), ADD INDEX PERIOD(PERIOD_ID), ADD INDEX LOCATION(LOCATION_ID), ADD INDEX PRODUCT(PRODUCT_ID) ) ENGINE=MyISAM PACK_KEYS DATA_DIRECTORY=‘C:mysqldata’ INDEX_DIRECTORY=‘D:mysqldata’; ENGINE = MyISAMENGINE = MyISAM Pros: •Non-transactional – faster, lower disk space usage, and less memory Cons: •2/4GB data file limit on operating systems that don't support big files •2/4GB index file limit on operating systems that don't support big files •One big table poses data archival and index maintenance challenges (e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc)
  • 15. CREATE TABLE ss.pos_day ( PERIOD_ID decimal(10,0) NOT NULL default '0', LOCATION_ID decimal(10,0) NOT NULL default '0', PRODUCT_ID decimal(10,0) NOT NULL default '0', SALES_UNIT decimal(10,0) NOT NULL default '0', SALES_RETAIL decimal(10,0) NOT NULL default '0', GROSS_PROFIT decimal(10,0) NOT NULL default '0‘ PRIMARY KEY(PRODUCT_ID, LOCATION_ID, PERIOD_ID), ADD INDEX PERIOD(PERIOD_ID), ADD INDEX LOCATION(LOCATION_ID), ADD INDEX PRODUCT(PRODUCT_ID) ) ENGINE=MyISAM PACK_KEYS RAID_TYPE=STRIPED; ENGINE = MyISAM + RAID_TYPEENGINE = MyISAM + RAID_TYPE Pros: •Non-transactional – faster, lower disk space usage, and less memory •Can help you to exceed the 2GB/4GB limit for the MyISAM data file •Creates up to 255 subdirectories, each with file named table_name.myd •Distributed IO – put each table subdirectory and file on a different disk Cons: •2/4GB index file limit on operating systems that don't support big files •One big table poses data archival and index maintenance challenges (e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc)
  • 16. CREATE TABLE ss.pos_merge ( PERIOD_ID decimal(10,0) NOT NULL default '0', LOCATION_ID decimal(10,0) NOT NULL default '0', PRODUCT_ID decimal(10,0) NOT NULL default '0', SALES_UNIT decimal(10,0) NOT NULL default '0', SALES_RETAIL decimal(10,0) NOT NULL default '0', GROSS_PROFIT decimal(10,0) NOT NULL default '0', INDEX PK(PRODUCT_ID, LOCATION_ID, PERIOD_ID), INDEX PERIOD(PERIOD_ID), INDEX LOCATION(LOCATION_ID), INDEX PRODUCT(PRODUCT_ID) ) ENGINE=MERGE UNION=(pos_1998,pos_1999,pos_2000) INSERT_METHOD=LAST; ENGINE = MERGEENGINE = MERGE Pros: •Non-transactional – faster, lower disk space usage, and less memory •Partitioned tables offer data archival and index maintenance options (e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc) •Distributed IO – put individual tables and indexes on different disks Cons: •MERGE tables use more file descriptors on database server •MERGE key lookups are much slower on “eq_ref” searches •Can use only identical MyISAM tables for a MERGE table
  • 17. CREATE TABLE ss.pos_day ( PERIOD_ID decimal(10,0) NOT NULL default '0', LOCATION_ID decimal(10,0) NOT NULL default '0', PRODUCT_ID decimal(10,0) NOT NULL default '0', SALES_UNIT decimal(10,0) NOT NULL default '0', SALES_RETAIL decimal(10,0) NOT NULL default '0', GROSS_PROFIT decimal(10,0) NOT NULL default '0‘ PRIMARY KEY(PRODUCT_ID, LOCATION_ID, PERIOD_ID), ADD INDEX PERIOD(PERIOD_ID), ADD INDEX LOCATION(LOCATION_ID), ADD INDEX PRODUCT(PRODUCT_ID) ) ENGINE=InnoDB PACK_KEYS; ENGINE = InnoDBENGINE = InnoDB Pros: •Simple yet flexible tablespace datafile configuration innodb_data_file_path=ibdata1:1G:autoextend:max:2G; ibdata2:1G:autoextend:max:2G Cons: •Uses much more disk space – typically 2.5 times as much disk space as MyISAM!!! •Transaction Safe – not needed, consumes resources (SET AUTOCOMMIT=0) •Foreign Keys – not needed, consumes resources (SET FOREIGN_KEY_CHECKS=0) •Prior to MySQL 4.1.1 – no “mutliple tablespaces” feature (i.e. one table per tablespace) •One big table poses data archival and index maintenance challenges (e.g. drop 1998 data, make 1999 read only, rebuild 2000 indexes, etc)
  • 18. Space Usage 21 Million RecordsSpace Usage 21 Million Records 3GB 3GB MERGE MyISAM 8GB InnoDB
  • 19. 4. Index Design4. Index Design Index Design must be driven by DW users’ nature: you don’t know what they’ll query upon, and the more successful they are data mining – the more they’ll try (which is a actually a really good thing) … Therefore you don’t know which dimension tables they’ll reference and which dimension columns they will restrict upon – so: •Fact tables should have primary keys – for data load integrity •Fact table dimension reference (i.e. foreign key) columns should each be individually indexed – for variable fact/dimension joins •Dimension tables should have primary keys •Dimension tables should be fully indexed •MySQL 4.x – only one index per dimension will be used •If you know that one column will always be used in conjunction with others, create concatenated indexes •MySQL 5.x – new index merge will use multiple indexes
  • 20. Note: Make sure to build indexes based off cardinality (i.e. leading portion most selective), so in this case the index was built backwards
  • 21. MyISAM Key Cache MagicMyISAM Key Cache Magic 1. Utilize two key caches: •Default Key Cache – for fact table indexes •Hot Key Cache – for dimension key indexes command-line option: shell> mysqld --hot_cache.key_buffer_size=16M option file: [mysqld] hot_cache.key_buffer_size=16M CACHE INDEX t1, t2, t3 IN hot_cache; 2. Pre-Load Dimension Indexes: LOAD INDEX INTO CACHE t1, t2, t3 IGNORE LEAVES;
  • 22. 5. Data Loading Architecture5. Data Loading Architecture Archive: •ALTER TABLE fact_table UNION=(mt2, mt3, mt4) •DROP TABLE mt1 Load: •TRUNCATE TABLE staging_table •Run nightly/weekly data load into staging_table •ALTER TABLE merge_table_4 DROP PRIMARY KEY •ALTER TABLE merge_table_4 DROP INDEX •INSERT INTO merge_table_4 SELECT * FROM staging_table •ALTER TABLE merge_table_4 ADD PRIMARY KEY(…) •ALTER TABLE merge_table_4 ADD INDEX(…) •ANALYZE TABLE merge_table_4
  • 23. 6. Analyze Table6. Analyze Table •Analyze Table statement analyzes and stores the key distribution for a table •MySQL uses the stored key distribution to decide the order in which tables should be joined • If you have a problem with incorrect index usage, you should run ANALYZE TABLE to update table statistics such as cardinality of keys, which can affect the choices the optimizer makes ANALYZE TABLE ss.pos_merge;
  • 24. 7. Query Style7. Query Style Many Business Intelligence (BI) and Report Writing tools offer initialization parameters or global settings which control the style of SQL code they generate. Options often include: •Simple N-Way Join •Sub-Selects •Derived Tables •Reporting Engine does Join operations For now – only the first options make sense… (see following section regarding explain plans)
  • 25. 8. Explain Plan8. Explain Plan The EXPLAIN statement can be used either as a synonym for DESCRIBE or as a way to obtain information about how the MySQL query optimizer will execute a SELECT statement. You can get a good indication of how good a join is by taking the product of the values in the rows column of the EXPLAIN output. This should tell you roughly how many rows MySQL must examine to execute the query. We’ll refer to this calculation as the explain plan cost – and use this as our primary comparative measure (along with of course the actual run time) …
  • 26. Method: Derived Tables Cost = 1.8 x 1016th Huge Explain Cost, and statement ran forever!!!
  • 27. Method: Sub-Selects Cost = 2.1 x 107th Sub-Select better than Derived Table – but not as good as Simple Join
  • 28. Method: Simple Joins and Single-Column Indexes Cost = 7.1 x 106th Join better than Sub- Select and much better than Derived Table
  • 29. Cost = 7.1 x 106th Method: Merge Table and Single-Column Indexes Same Explain Cost, but statement ran 2X faster
  • 30. Method: Merge Table and Multi-Column Indexes Cost = 3.2 x 105th Concatenated Index yielded best run time
  • 31. Method: Merge Table and Merge Indexes Cost = 5.2 x 105th MySQL 5.0 new Index Merge = best run time
  • 32. Conclusion …Conclusion … •MySQL can easily be used to build large and effective “Star Schema” Data Warehouses •MySQL Version 5.x will offer even more useful index and join query optimizations •MySQL can be better configured for DW use through effective mysql.ini option settings •Table and Index designs are paramount to success •Query style and resulting explain plans are critical to achieving the fastest query run times