PARTITIONING
UNDER THE HOOD

• Mattias Jonsson – M.Sc.
• Mikael Ronström – Ph.D.
Sun MySQL
                   1
Who are we?
• Mikael is a founder of the technology behind NDB
  Cluster (MySQL Cluster)
• Mikael is also the author of the partitioning support
  in MySQL 5.1
• Mattias worked as a developer and MySQL consultant
  before joining MySQL in 2007, and has worked on the
  partitioning code ever since.




                                                     2
How is partitioning implemented?
• Extended syntax is added to the parser
• Common partitioning support
• A generic partitioning engine/handler for engines
  without native partitioning support
• NDB (MySQL Cluster) does partitioning natively
• Pruning as an extra optimizer step




                                                      3
Where is the code?

• Handler code in sql/ha_partition.{h,cc}
• Common routines in sql/sql_partition.{h,cc}
• Structures in sql/partition_info.{h,cc} and
  sql/partition_element.h
• Pruning in sql/opt_range.{h,cc}
• Minor partitioning specifics in sql/sql_delete.cc,
  sql/sql_update.cc, sql/sql_select.cc (pruning),
  sql/unireg.cc (frm handling), sql/sql_table.cc (alter),
  etc.
                                                           4
Execution flow
• Parsing
• Open/lock tables (including all partitions)
• Static pruning (specific for partitioned tables)
• Query execution / sending result
• Unlock/close tables




                                                       5
Overhead of locking a partitioned table
• Currently the hottest place for improvement
• The server always stores locks for all tables used in
  the query before optimizing
• For a partitioned table, this also means storing locks
  for all partitions
• After all locks are stored, they are sorted
• The sorting assumes only a few locks to sort, but for
  a heavily partitioned table it can be many locks
• After sorting, they are locked in that order, to
  prevent deadlocks in the server.
                                                          6
What does this overhead mean?
• Since pruning happens after optimization (and after
  locking), we always have to lock all partitions
• Having too many partitions gives every query a high
  overhead (easy to observe on small queries)
• For inserts, try to use multi-row inserts
• If possible, encapsulate a set of queries with
  'LOCK TABLES' (also see bug#37252); see the sketch below
• It will not be fixed in 5.1, but the new metadata
  locking in 5.4 will help to implement a fix for this
  limitation.
                                                         7
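A minimal sketch of the two workarounds above, assuming a hypothetical
partitioned table t1 with columns a and b:

-- Multi-row insert: the lock/sort overhead is paid once per statement,
-- not once per row
INSERT INTO t1 (a, b) VALUES (1, 'x'), (2, 'y'), (3, 'z');

-- Encapsulate a batch of statements between LOCK TABLES / UNLOCK TABLES
-- so the per-statement locking of all partitions is avoided
LOCK TABLES t1 WRITE;
INSERT INTO t1 (a, b) VALUES (4, 'w');
UPDATE t1 SET b = 'z2' WHERE a = 3;
UNLOCK TABLES;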
How does partitioning work?
• After opening and locking all tables, pruning is done
  to prevent scanning of non-matching partitions
• For engines without native partitioning support (all
  except NDB):
  > The table's engine/handler is ha_partition, which receives
    all the server calls via the Storage Engine API
  > Each partition is represented by its own underlying
    handler (InnoDB, MyISAM, Memory, Archive etc.), which
    in turn is called by the partitioning handler




                                                                  8
How is partitioning implemented?
• handler::write_row (insert) works by first calculating
  which partition the row belongs to, and then redirecting
  the call to that partition's handler
• handler::update_row (update) works the same, but if
  the row is to be moved, it is done as a delete in the
  old partition and an insert in the new partition.
• handler::index_* (select … order by <index>) has to
  forward the call to all partitions and create a priority
  queue to pick from
• For all scans and index reads, pruning is used to
  skip impossible partitions

                                                             9
Insert



[Diagram: the inserted row (GOX 123, $3000, 2001, YELLOW) is routed to
the YELLOW partition among the YELLOW, GREEN, BLUE and RED partitions.]
                                         10
Update


[Diagram: updating the row (GOX 123, $3000, 2001) from Yellow to Green
is done as a delete from the Yellow partition followed by an insert
into the Green partition.]
                                                             11
Index walking

[Diagram: sorted output streams from the indexes of partitions 1, 4, 5
and 8 are merge-sorted into a single sorted handler output stream.]
                                                                              12
Why must all unique indexes include
the partitioning function's columns?
• To ensure uniqueness!
• Since the indexes are also partitioned, every unique
  tuple must be stored in the same partition
• So 'UNIQUE KEY (a,b) PARTITION BY (c)' would make it
  possible to place (1,2,3) and (1,2,4) in different
  partitions/indexes, which cannot enforce the
  uniqueness! (See the sketch below.)
• Will be possible when global indexes are
  implemented
                                                          13
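A minimal sketch of this rule; the table and column names are made up,
and the error text is paraphrased from MySQL 5.1:

-- Rejected: the unique key (a,b) does not include partitioning column c
CREATE TABLE t_bad (
  a INT, b INT, c INT,
  UNIQUE KEY (a, b)
) PARTITION BY HASH (c) PARTITIONS 4;
-- ERROR 1503: A UNIQUE INDEX must include all columns in the table's
-- partitioning function

-- Accepted: the unique key includes the partitioning column
CREATE TABLE t_ok (
  a INT, b INT, c INT,
  UNIQUE KEY (a, b, c)
) PARTITION BY HASH (c) PARTITIONS 4;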
Pruning
• Only RANGE partitioning supports full range pruning
• All other types support range pruning by partition
  walking (in 5.1 max 10 steps, in 5.4 max 32 steps)
• Remember that the optimizer cannot use
  func(col) = val, so use col = val instead and leave it
  to the optimizer
• Verify with EXPLAIN PARTITIONS (see the example below)
• It is all about pruning, this is where the big gain is!


                                                           14
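A minimal sketch of verifying pruning with EXPLAIN PARTITIONS, using a
hypothetical RANGE-partitioned table:

CREATE TABLE logs (
  created DATE,
  msg VARCHAR(64)
) PARTITION BY RANGE (TO_DAYS(created))
(PARTITION p2008 VALUES LESS THAN (TO_DAYS('2009-01-01')),
 PARTITION p2009 VALUES LESS THAN (TO_DAYS('2010-01-01')),
 PARTITION pmax  VALUES LESS THAN MAXVALUE);

-- Prunable: the condition is on the column itself
EXPLAIN PARTITIONS
SELECT * FROM logs WHERE created = '2009-04-22';      -- partitions: p2009

-- Not prunable: wrapping the column in a function hides it from pruning
EXPLAIN PARTITIONS
SELECT * FROM logs WHERE TO_DAYS(created) = TO_DAYS('2009-04-22');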
Dynamic Pruning
 SELECT * FROM t1, t2 WHERE t2.a = t1.a;
 If t1 is the inner table it is possible to select only one
 partition in each of its scans (one scan per record in the
 outer table t2).
 If t1 is the outer table it has to scan all partitions.

 Explanation: This works because there is an index on
 'a' that contains all partitioning fields, and this is
 bound for each scan in the inner loop
 Cannot be seen in EXPLAIN PARTITIONS
 (WL#4128)                                               15
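A minimal sketch of a schema where this dynamic pruning can apply; all
names are hypothetical:

CREATE TABLE t1 (
  a INT NOT NULL,
  payload VARCHAR(32),
  KEY (a)
) PARTITION BY HASH (a) PARTITIONS 8;

CREATE TABLE t2 (
  a INT NOT NULL,
  info VARCHAR(32)
);

-- With t2 as the outer table and t1 as the inner table, each index
-- lookup t1.a = <value from t2> can be pruned to a single t1 partition
SELECT * FROM t2 JOIN t1 ON t1.a = t2.a;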
How are the different partitioning
types implemented?
• KEY works with a list of columns of any type (it
  calculates a hash/integer from hashing all listed
  columns and then takes it modulo the number of partitions)
• All other types work on integers only
• HASH uses a simple modulo of the number of partitions
• The LINEAR variant is the same for both HASH and
  KEY, enabling less copying of data when
  adding/coalescing partitions; no need to rebuild all
  partitions, only to split/merge the affected ones
                                                           16
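A minimal sketch of plain HASH partitioning, where the target partition
is simply the partitioning expression modulo the number of partitions;
the table is hypothetical:

CREATE TABLE sessions (
  user_id INT NOT NULL,
  started DATETIME
) PARTITION BY HASH (user_id) PARTITIONS 4;

-- user_id = 10  =>  10 MOD 4 = 2, so the row is stored in partition p2
INSERT INTO sessions VALUES (10, NOW());
EXPLAIN PARTITIONS
SELECT * FROM sessions WHERE user_id = 10;             -- partitions: p2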
LINEAR KEY/HASH
• The normal HASH/KEY uses a modulo function to
  spread records => Even distribution of data among
  partitions
• It also leads to a full rebuild of the table for
  ADD/COALESCE PARTITION
• LINEAR HASH/KEY partitions use an algorithm
  based on linear hashing, which means that some
  partitions will have twice as much data as others if
  the number of partitions is not of the form 2^n
• LINEAR HASH/KEY only requires rebuilding some of
  the partitions, thus faster partition management at
  the cost of a slight imbalance                          17
LINEAR HASH distribution
partition_name table_rows (14 additions ~ 78 s)
• p0 327687
ALTER TABLE t ADD PARTITION PARTITIONS 1
• p0 163843
• p1 163844
ALTER TABLE t ADD PARTITION PARTITIONS 1
• p0 81921
• p1 163844
• p2 81922
                                                      18
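The table behind these numbers is not shown on the slide; a hypothetical
setup for reproducing the comparison could look like this:

CREATE TABLE t (
  id INT NOT NULL
) PARTITION BY LINEAR HASH (id) PARTITIONS 1;
-- ... load ~327k rows ...

-- With LINEAR HASH, each ADD PARTITION only splits one existing partition
ALTER TABLE t ADD PARTITION PARTITIONS 1;

SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't';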
non LINEAR HASH distribution
partition_name table_rows (14 additions ~ 230 s)
• p0 327687
ALTER TABLE t ADD PARTITION PARTITIONS 1
• p0 163843
• p1 163844
ALTER TABLE t ADD PARTITION PARTITIONS 1
• p0 109229
• p1 109229
• p2 109229
                                                       19
RANGE partitioning
• Partition found by binary search in the ranges.
• Pruning can also be done on open ranges.
• Can also be subpartitioned by [LINEAR] HASH/KEY




                                                    20
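A minimal sketch of open-range pruning on a hypothetical RANGE-partitioned
table:

CREATE TABLE measurements (
  taken DATE,
  value INT
) PARTITION BY RANGE (YEAR(taken))
(PARTITION p2007 VALUES LESS THAN (2008),
 PARTITION p2008 VALUES LESS THAN (2009),
 PARTITION pmax  VALUES LESS THAN MAXVALUE);

-- Open range: only p2008 and pmax have to be scanned
EXPLAIN PARTITIONS
SELECT * FROM measurements WHERE taken >= '2008-06-01';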
LIST partitioning
• All list values are stored in a sorted array
• Partition found by binary search of that array.
• Can also be subpartitioned by [LINEAR] HASH/KEY




                                                    21
SUBpartitioning
• Just combining RANGE/LIST with HASH/KEY
• First calculate which partition
• Then calculate which subpartition
• Finally combine them: number_of_subpartitions *
  part_id + subpart_id (see the sketch below)




                                                     22
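A minimal sketch of this combination, using hypothetical names; with two
subpartitions per partition, a row landing in partition 1 and
subpartition 0 gets internal partition id 2 * 1 + 0 = 2:

CREATE TABLE events (
  happened DATE,
  actor_id INT
) PARTITION BY RANGE (YEAR(happened))
  SUBPARTITION BY HASH (actor_id)
  SUBPARTITIONS 2
(PARTITION p2008 VALUES LESS THAN (2009),
 PARTITION p2009 VALUES LESS THAN (2010));

-- Row with YEAR(happened) = 2009, actor_id = 6:
--   RANGE step:  part_id    = 1 (p2009)
--   HASH step:   subpart_id = 6 MOD 2 = 0
--   internal id: number_of_subpartitions * part_id + subpart_id = 2*1+0 = 2
INSERT INTO events VALUES ('2009-04-22', 6);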
ALTER TABLE t CMD PARTITION
• Handles REORGANIZE, ADD, DROP, COALESCE,
  REBUILD in mysql_alter_table
• Special functions for handling these ALTER
  PARTITION commands:
 > Preparations are done in prep_alter_part_table, which
   returns whether partition changes are needed and whether
   they can be done in a fast way (without copying the full table).
 > If possible to do in a fast manner, it is done in
   fast_alter_partition_table, which in turn uses
   mysql_change_partitions / mysql_drop_partitions /
   mysql_rename_partitions. For more information, read
   the comments, which explain the what and why.
                                                                 23
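Hypothetical examples of the ALTER ... PARTITION commands that go through
this code path (the table and partition names are made up):

-- RANGE/LIST: reorganize touches only the named partitions
ALTER TABLE measurements
  REORGANIZE PARTITION pmax INTO
  (PARTITION p2009 VALUES LESS THAN (2010),
   PARTITION pmax  VALUES LESS THAN MAXVALUE);

-- RANGE/LIST: drop a partition and its data
ALTER TABLE measurements DROP PARTITION p2007;

-- HASH/KEY: add or remove hash partitions
ALTER TABLE sessions ADD PARTITION PARTITIONS 2;
ALTER TABLE sessions COALESCE PARTITION 1;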
ALTER TABLE t CMD PARTITION
• Handles ANALYZE, CHECK, OPTIMIZE, REPAIR in
  mysql_admin_tables (just like ANALYZE/.. TABLE)
• Operates just like the TABLE variants, but only
  affects the given partitions
• The handler methods are dispatched through the
  ha_partition handler.
• There is currently a bug in OPTIMIZE PARTITION for
  InnoDB partitions: since InnoDB has no real support
  for OPTIMIZE, it proposes a recreate, which rebuilds
  the full table (see bug#42822, use REBUILD +
  ANALYZE instead)
                                                    24
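Hypothetical examples of the per-partition admin commands, including the
REBUILD + ANALYZE workaround mentioned above:

ALTER TABLE measurements ANALYZE PARTITION p2008;
ALTER TABLE measurements CHECK PARTITION p2008, pmax;
ALTER TABLE measurements REPAIR PARTITION p2008;

-- Workaround for OPTIMIZE PARTITION on InnoDB (bug#42822):
ALTER TABLE measurements REBUILD PARTITION p2008;
ALTER TABLE measurements ANALYZE PARTITION p2008;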
AUTO_INCREMENT Handling
• Was changed from always checking the partitions
  for the current auto_increment value to caching it in
  the table_share, in the fix for bug#33479.
• It only loops through all partitions when initializing
  the shared auto_increment value
• The result is faster auto_increment handling in
  partitioned tables
• And fewer gaps


                                                           25
INFORMATION_SCHEMA.PARTITIONS
• [Sub]partition name, description, type, expression,
  etc.
• Also some table statistics on a per-partition level,
  such as rows and index/data size
• Implemented by calling every partition's handler to
  get the data, so it is equivalent to
  INFORMATION_SCHEMA.TABLES.




                                                           26
How about TRUNCATE PARTITION?
• ALTER TABLE tbl_name TRUNCATE PARTITION
  (ALL|partition_name[,partition_name...])
• To be in 5.4 (patch in bug#19405)
• Uses the same code path as TRUNCATE TABLE
  with added partitioning pruning
• Uses the optimized delete_all_rows call




                                              27
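Hypothetical usage following the syntax above (the feature was still a
pending patch at the time of the talk):

-- Empty only one partition, leaving the others untouched
ALTER TABLE measurements TRUNCATE PARTITION p2008;

-- Empty every partition
ALTER TABLE measurements TRUNCATE PARTITION ALL;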
And using key caches per PARTITION?
 • To be in 5.4 (patch in bug#39637)
 • CACHE INDEX tbl_name PARTITION (ALL|
   partition[, partition] …) [INDEX|KEY (index_name[,
   index_name] ...)] IN key_cache_name
 • LOAD INDEX INTO CACHE tbl_name PARTITION
   (ALL|partition[, partition] ...)
   [INDEX|KEY (index_name[,index_name] ...)]
   [IGNORE LEAVES]
 • Both assignment and preload per partition
 • Specific to partitioned MyISAM
                                                        28
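A hypothetical example of per-partition key cache assignment and preload
for a partitioned MyISAM table, following the syntax above; the cache,
table and partition names are made up:

-- Define a dedicated 128 MB key cache
SET GLOBAL hot_cache.key_buffer_size = 128 * 1024 * 1024;

-- Assign the index blocks of two partitions to that cache
CACHE INDEX sales PARTITION (p2009a, p2009b) IN hot_cache;

-- Preload those partitions' indexes into the cache
LOAD INDEX INTO CACHE sales PARTITION (p2009a, p2009b);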
Too many open files on partitioned
MyISAM
 • An experimental patch (in review) that splits the
   open into open_handler and open_files
 • Allows closing partition files based on an LRU
 • Specific to partitioned MyISAM, extensible to
   Archive.




                                                         29
Time consuming ALTER TABLE?
• Try doing it in parallel! Can use all cores in one alter!
• Experimental (needs common thread handling in the server)
• Two implementations of the actual data copy between tables
  > No change in the partitioning – parallel data copy within
     clusters of partitions
  > Otherwise several read threads which sort/feed several
     write threads (both work on clusters of one or more partitions).
• Preview at launchpad: lp:mysql-server/mysql-5.1-wl2550




                                                                30
Range partition on non integer column
• PARTITION BY RANGE COLUMN_LIST(a,b,c...)
  (PARTITION p0 VALUES LESS THAN COLUMN_LIST('2009-
  04-22', 3.14, 'string'));
• Values can also be MINVALUE or MAXVALUE
• PARTITION BY LIST COLUMN_LIST(a,b,c...) is also possible
• Pruning on the TO_SECONDS function
• Preview at launchpad: lp:~mikael-ronstrom/mysql-server/mysql-
  5.1-wl3352




                                                                 31
Range partition on non integer column
CREATE TABLE t1
(joined DATE,
 category ENUM('beginner', 'medium', 'advanced', 'EOF'),
 name varchar(64))
PARTITION BY RANGE COLUMN_LIST(joined, category, name)
(PARTITION lt2008 VALUES LESS THAN (column_list('2008-01-01', 'beginner', '')),
 PARTITION bg2008 VALUES LESS THAN (column_list('2009-01-01', 'beginner', '')),
 PARTITION p12009 VALUES LESS THAN (column_list('2009-04-22', 'beginner', '')),
 PARTITION p22009 VALUES LESS THAN (column_list('2009-04-22', 'beginner', 'John')),
 PARTITION p32009 VALUES LESS THAN (column_list('2009-04-22', 'medium', '')),
 PARTITION p42009 VALUES LESS THAN (column_list('2009-04-22', 'advanced', '')),
 PARTITION p52009 VALUES LESS THAN (column_list('2009-04-22', 'EOF', '')),
 PARTITION future VALUES LESS THAN (column_list('2032-01-01', 'EOF', '')));




                                                                                      32
COLUMN_LIST partitioning
SELECT * FROM t1;
joined      category  name
1999-01-01  advanced  Lars
2001-02-01  medium    Steven
2008-02-01  beginner  Martin
2008-09-21  advanced  John
2009-02-01  beginner  Frank
2009-01-30  beginner  Joe
2009-04-22  beginner  Eve
2009-04-22  beginner  Zoe
2009-04-22  medium    Jona
2009-04-22  medium    Sarah
2009-04-22  advanced  Herb
2009-04-22  advanced  Jane
                                33
COLUMN_LIST partitioning
SELECT PARTITION_NAME, TABLE_ROWS FROM
INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME =
't1' AND TABLE_SCHEMA = 'test';
PARTITION_NAME  TABLE_ROWS
lt2008          2
bg2008          2
p12009          2
p22009          1
p32009          1
p42009          2
p52009          2
future          0


                                                   34
What's next?
• Moving partitions into tables
• Moving and checking tables into partitions
• Reworking the locking schedule, to allow pruning
  before locking (better support for this in 5.4), and
  pruning for write operations
• Parallel scan (needs more server support)
• Multi-engine partitioning (will be hard to get the
  optimization right)


                                                       35
PARTITIONING
UNDER THE HOOD
• Mattias Jonsson – mattias.jonsson at sun.com
• Mikael Ronström – mikael.ronstrom at sun.com
                              36
