SlideShare una empresa de Scribd logo
1 de 27
Transactional Storage for MySQL
        FAST. RELIABLE. PROVEN.



InnoDB Internals: InnoDB File
  Formats and Source Code
         Structure
   MySQL University, October 2009


                   Calvin Sun
               Principal Engineer
               Oracle Corporation
Today’s Topics
•   Goals of InnoDB
•   Key Functional Characteristics
•   InnoDB Design Considerations
•   InnoDB Architecture
•   InnoDB On Disk Format
•   Source Code Structure
•   Q&A
Goals of InnoDB


•   OLTP oriented
•   Performance, Reliability, Scalability
•   Data Protection
•   Portability
InnoDB Key Functional
            Characteristics
•   Full transaction support
•   Row-level locking
•   MVCC
•   Crash recovery
•   Efficient IO
Design Considerations
• Modeled on Gray & Reuter’s “Transactions
 Processing: Concepts & Techniques”
• Also emulated the Oracle architecture
• Added unique subsystems
  • Doublewrite
  • Insert buffering
  • Adaptive hash index
• Designed to evolve with changing
  hardware & requirements
InnoDB Architecture
    Server                        Applications


 Handler API         Embedded InnoDB API
                 Transaction
                   Cursor / Row
   Mini-
                      B-tree             Lock
transaction
                      Page
                    Buffer
              File Space Manager
                     IO
InnoDB On Disk Format
•   InnoDB Database Files
•   InnoDB Tablespaces
•   InnoDB Pages / Extents
•   InnoDB Rows
•   InnoDB Indexes
•   InnoDB Logs
•   File Format Design Considerations
InnoDB Database Files
                               MySQL Data Directory
System tablespace

                                                         InnoDB
                                                          tables
                           internal
                             data                                  .frm files
                          dictionary

                            insert        OR          innodb_file_per_table
                            buffer

                            undo
                            logs
                                                               .ibd files

                     ibdata files
InnoDB Tablespaces
• A tablespace consists of multiple files and/or
  raw disk partitions.
  file_name:file_size[:autoextend[:max:max_file_size]]
• A file/partition is a collection of segments.
• A segment consists of fixed-length pages.
• The page size is always 16KB in uncompressed
  tablespaces, and 1KB-16KB in compressed
  tablespaces (for both data and index).
System Tablespace
•   Internal Data Dictionary
•   Undo
•   Insert Buffer
•   Doublewrite Buffer
•   MySQL Replication Info
InnoDB Tablespaces
 Tablespace
                                           Segment
                                      Extent          Extent
 Leaf node segment
Non-leaf node segment                 Extent          Extent

                                                                   Extent
  Rollback segment
                                           Page
                                     Row        Row
             Row
             Trx id                 Row    Row Row
           Roll pointer
       Field pointers               Row   Row

 Field 1    Field 2       Field n

                                                        an extent = 64 pages
InnoDB Pages
                        InnoDB Page Types
      Symbol             Value                    Notes
   FIL_PAGE_INODE          3     File segment inode
   FIL_PAGE_INDEX        17855   B-tree node
 FIL_PAGE_TYPE_BLOB       10     Uncompressed BLOB page

 FIL_PAGE_TYPE_ZBLOB      11     1st compressed BLOB page
FIL_PAGE_TYPE_ZBLOB2      12     Subsequent compressed BLOB page
  FIL_PAGE_TYPE_SYS        6     System page

FIL_PAGE_TYPE_TRX_SYS      7     Transaction system page
                                 i-buf bitmap, I-buf free list, file space
       others                    header, extent desp page, new
                                 allocated page
InnoDB Pages
A page consists of: a page header, a page
  trailer, and a page body (rows or other
  contents).
                             Page header
               Row                 Row         Row       Row

                             Row                        Row

              Row       Row              Row


                 Row   Row


                                     row offset array
                              Page trailer
Page Declares
typedef struct                    /* a space address */
   {
     ulint     pageno;            /* page number within the file */
     ulint     boffset;           /* byte offset within the page */
   } fil_addr_t;

typedef struct
  {
   ulint      checksum;      /*
                             checksum of the page (since 4.0.14) */
   ulint      page_offset;   /*
                             page offset inside space */
   fil_addr_t previous;      /*
                             offset or fil_addr_t */
   fil_addr_t next;          /*
                             offset or fil_addr_t */
   dulint     page_lsn;      /*
                             lsn of the end of the newest
                              modification log record to the page */
  PAGE_TYPE page type;    /* file page type */
  dulint     file_flush_lsn;/* the file has been flushed to disk
                             at least up to this lsn */
  int         space_id;  /* space id of the page */
  char        data[];    /* will grow */
  ulint       page_lsn;  /* the last 4 bytes of page_lsn */
  ulint       checksum;  /* page checksum, or checksum magic, or 0 */
  } PAGE, *PAGE;
InnoDB Compressed Pages
   Page header      • InnoDB keeps a “modification
                      log” in each page
                • Updates & inserts of small
compressed data records are written to the log
                  w/o page reconstruction;
                  deletes don’t even require
                  uncompression
 modification log   • Log also tells InnoDB if the
                      page will compress to fit page
   empty space        size
  BLOB pointers     • When log space runs out,
  page directory      InnoDB uncompresses the
   Page trailer       page, applies the changes and
                      recompresses the page
InnoDB Rows
                             …     prefix(768B)          …
                                                          COMACT format



                                                                overflow
             20 bytes                                             page
       …                    …
                              DYNAMIC format



                                        overflow
                                          page



Record hdr   Trx ID     Roll ptr   Fld ptrs overflow-page ptr .. Field values
InnoDB Indexes - Primary
                           PK values
                           001 - nnn
                                                                         ●Data   rows are stored
                  …             …                                        in the B-tree leaf
                                                                         nodes of a clustered
      001 –
       500
                            500 –
                             800
                                                     801 –
                                                      nnn                index
                                                                          ● B-tree is organized
                                                                   xxx
001
 -
275
          276 –
           500
                     501
                      -
                     630
                             631
                              -
                             768
                                       769
                                        -
                                       800
                                               801
                                                -
                                               949
                                                             950
                                                              -
                                                             xxx
                                                                    -
                                                                   nnn      by primary key or
                                                                            non-null unique key
                                           clustered
                                                                            of table, if defined;
                  Key values
                                         (primary key)
                                             index                          else, an internal
                   501-
                   501-630
                   + data for
              corresponding rows
                                             Primary Index                  column with 6-byte
                                                                            ROW_ID is added.
InnoDB Indexes - Secondary
                                          clustered
                                     clustered
                                       (primary key)
                                   (primary PK values
                                              key)
                                        index - nnn
                                              001
                                            index

● Secondary index B-
  tree leaf nodes
  contain, for each key
  value, the primary           B-tree leaf nodes, containing data
  keys of the
  corresponding rows,
  used to access                            key values
                                               A Z

  clustering index to
  obtain the data
             Secondary Index
                               B-tree leaf nodes, containing PKs

                                        Secondary index
                                     Secondary index
InnoDB Logging

                              Rollback segments




     Log Buffer                                   Buffer Pool

           log thread
                                                      write thread




Log File                Log File
  #1
           redo                                   DATA
                          #2                                         rollback
            log
                                   log files
                                                       ibdata files
InnoDB Redo Log



         end of log      start of log        last checkpoint
                                     min LSN

Redo log structure:
        Space id      PageNo    OpCode       Data
File Format Management
              • Builtin InnoDB format: “Antelope”
              • New “Barracuda” format enables
                compression,ROW_FORMAT=DYNAMIC
   .ibd
                • Fast index creation, other features do not
 data files       require Barracuda file format
 (file per
  table)      • Builtin InnoDB can access “Antelope”
                databases, but not “Barracuda”
                databases
                • Check file format tag in system tablespace
                  on startup
              • Enable a file format with new dynamic
                parameter innodb_file_format
              • Preserves ability to downgrade easily
InnoDB File Format Design
      Considerations
• Durability
  • Logging, doublewrite, checksum;
• Performance
  • Insert buffering, table compression
• Efficiency
  • Dynamic row format, table compression
• Compatibility
  • File format management
Source Code Structure
• 31 subdirectories
• Relevant InnoDB source files on file
  formats
  • Tablespace: fsp0fsp {.c, .ic, .h}
  • Page: page0page, page0zip {.c, .ic, .h}
  • Log: log0log {.c, .ic, .h}
Source Code Subdirectories
•   buf       •   ibuf      •   que
•   data      •   include   •   read
•   db        •   lock      •   rem
•   dict      •   log       •   row
•   dyn       •   math      •   srv
•   eval      •   mem       •   sync
•   fil       •   mtr       •   thr
•   fsp       •   os        •   trx
•   fut       •   page      •   usr
•   ha        •   pars      •   ut
•   handler
Summary:
        Durability, Performance,
       Compatibility & Efficiency
• InnoDB is the leading transactional storage engine
  for MySQL
• InnoDB’s architecture is well-suited to modern, on-
  line transactional applications; as well as embedded
  applications.
• InnoDB’s file format is designed for high durability,
  better performance, and easy to manage
QUESTIONS
 ANSWERS
InnoDB Size Limits
•   Max # of tables: 4 G
•   Max size of a table: 32TB
•   Columns per table: 1000
•   Max row size: n*4 GB
    • 8 kB if stored on the same page
    • n*4 GB with n BLOBs
• Max key length: 3500
• Maximum tablespace size: 64 TB
• Max # of concurrent trxs: 1023

Más contenido relacionado

La actualidad más candente

mysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancementmysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancementlalit choudhary
 
MySQL Server Settings Tuning
MySQL Server Settings TuningMySQL Server Settings Tuning
MySQL Server Settings Tuningguest5ca94b
 
MySQL Index Cookbook
MySQL Index CookbookMySQL Index Cookbook
MySQL Index CookbookMYXPLAIN
 
InnoDB Flushing and Checkpoints
InnoDB Flushing and CheckpointsInnoDB Flushing and Checkpoints
InnoDB Flushing and CheckpointsMIJIN AN
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBMydbops
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performanceoysteing
 
Why MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it BackWhy MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it BackSveta Smirnova
 
MySQL Performance for DevOps
MySQL Performance for DevOpsMySQL Performance for DevOps
MySQL Performance for DevOpsSveta Smirnova
 
MySQL Architecture and Engine
MySQL Architecture and EngineMySQL Architecture and Engine
MySQL Architecture and EngineAbdul Manaf
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep InternalEXEM
 
How to use histograms to get better performance
How to use histograms to get better performanceHow to use histograms to get better performance
How to use histograms to get better performanceMariaDB plc
 
Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication Mydbops
 
How to upgrade like a boss to MySQL 8.0 - PLE19
How to upgrade like a boss to MySQL 8.0 -  PLE19How to upgrade like a boss to MySQL 8.0 -  PLE19
How to upgrade like a boss to MySQL 8.0 - PLE19Alkin Tezuysal
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer ManagementMIJIN AN
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMarco Tusa
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialJean-François Gagné
 
Almost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
Almost Perfect Service Discovery and Failover with ProxySQL and OrchestratorAlmost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
Almost Perfect Service Discovery and Failover with ProxySQL and OrchestratorJean-François Gagné
 

La actualidad más candente (20)

mysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancementmysql 8.0 architecture and enhancement
mysql 8.0 architecture and enhancement
 
MySQL Server Settings Tuning
MySQL Server Settings TuningMySQL Server Settings Tuning
MySQL Server Settings Tuning
 
MySQL Index Cookbook
MySQL Index CookbookMySQL Index Cookbook
MySQL Index Cookbook
 
InnoDB Flushing and Checkpoints
InnoDB Flushing and CheckpointsInnoDB Flushing and Checkpoints
InnoDB Flushing and Checkpoints
 
InnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick FiguresInnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick Figures
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDB
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 
Why MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it BackWhy MySQL Replication Fails, and How to Get it Back
Why MySQL Replication Fails, and How to Get it Back
 
MySQL Performance for DevOps
MySQL Performance for DevOpsMySQL Performance for DevOps
MySQL Performance for DevOps
 
MySQL Architecture and Engine
MySQL Architecture and EngineMySQL Architecture and Engine
MySQL Architecture and Engine
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep Internal
 
How to use histograms to get better performance
How to use histograms to get better performanceHow to use histograms to get better performance
How to use histograms to get better performance
 
Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication Evolution of MySQL Parallel Replication
Evolution of MySQL Parallel Replication
 
How to upgrade like a boss to MySQL 8.0 - PLE19
How to upgrade like a boss to MySQL 8.0 -  PLE19How to upgrade like a boss to MySQL 8.0 -  PLE19
How to upgrade like a boss to MySQL 8.0 - PLE19
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer Management
 
MySQL innoDB split and merge pages
MySQL innoDB split and merge pagesMySQL innoDB split and merge pages
MySQL innoDB split and merge pages
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
MySQL 5.5 Guide to InnoDB Status
MySQL 5.5 Guide to InnoDB StatusMySQL 5.5 Guide to InnoDB Status
MySQL 5.5 Guide to InnoDB Status
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
 
Almost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
Almost Perfect Service Discovery and Failover with ProxySQL and OrchestratorAlmost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
Almost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
 

Destacado

MySQL Atchitecture and Concepts
MySQL Atchitecture and ConceptsMySQL Atchitecture and Concepts
MySQL Atchitecture and ConceptsTuyen Vuong
 
Inno Db Internals Inno Db File Formats And Source Code Structure
Inno Db Internals Inno Db File Formats And Source Code StructureInno Db Internals Inno Db File Formats And Source Code Structure
Inno Db Internals Inno Db File Formats And Source Code StructureMySQLConference
 
cPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB AnatomycPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB AnatomyRyan Robson
 
InnoDB Architecture and Performance Optimization, Peter Zaitsev
InnoDB Architecture and Performance Optimization, Peter ZaitsevInnoDB Architecture and Performance Optimization, Peter Zaitsev
InnoDB Architecture and Performance Optimization, Peter ZaitsevFuenteovejuna
 
Postgres MVCC - A Developer Centric View of Multi Version Concurrency Control
Postgres MVCC - A Developer Centric View of Multi Version Concurrency ControlPostgres MVCC - A Developer Centric View of Multi Version Concurrency Control
Postgres MVCC - A Developer Centric View of Multi Version Concurrency ControlReactive.IO
 
innoDBのインデックスとアルゴリズムについて調べてみた話
innoDBのインデックスとアルゴリズムについて調べてみた話innoDBのインデックスとアルゴリズムについて調べてみた話
innoDBのインデックスとアルゴリズムについて調べてみた話Takaaki Konaka
 
Mastering InnoDB Diagnostics
Mastering InnoDB DiagnosticsMastering InnoDB Diagnostics
Mastering InnoDB Diagnosticsguest8212a5
 
Database , 5 Semantic
Database , 5 SemanticDatabase , 5 Semantic
Database , 5 SemanticAli Usman
 
PL/pgSQL - An Introduction on Using Imperative Programming in PostgreSQL
PL/pgSQL - An Introduction on Using Imperative Programming in PostgreSQLPL/pgSQL - An Introduction on Using Imperative Programming in PostgreSQL
PL/pgSQL - An Introduction on Using Imperative Programming in PostgreSQLReactive.IO
 
MySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMario Beck
 
Mvcc Unmasked (Bruce Momjian)
Mvcc Unmasked (Bruce Momjian)Mvcc Unmasked (Bruce Momjian)
Mvcc Unmasked (Bruce Momjian)Ontico
 
The nightmare of locking, blocking and isolation levels!
The nightmare of locking, blocking and isolation levels!The nightmare of locking, blocking and isolation levels!
The nightmare of locking, blocking and isolation levels!Boris Hristov
 
The Power of MySQL Explain
The Power of MySQL ExplainThe Power of MySQL Explain
The Power of MySQL ExplainMYXPLAIN
 
Mv unmasked.w.code.march.2013
Mv unmasked.w.code.march.2013Mv unmasked.w.code.march.2013
Mv unmasked.w.code.march.2013EDB
 
Como migrar una base de datos de mysql a power designer
Como migrar una base de datos de mysql a power designerComo migrar una base de datos de mysql a power designer
Como migrar una base de datos de mysql a power designerAlex Bernal
 
Inno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structureInno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structurezhaolinjnu
 

Destacado (20)

MySQL Atchitecture and Concepts
MySQL Atchitecture and ConceptsMySQL Atchitecture and Concepts
MySQL Atchitecture and Concepts
 
Inno Db Internals Inno Db File Formats And Source Code Structure
Inno Db Internals Inno Db File Formats And Source Code StructureInno Db Internals Inno Db File Formats And Source Code Structure
Inno Db Internals Inno Db File Formats And Source Code Structure
 
cPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB AnatomycPanelCon 2014: InnoDB Anatomy
cPanelCon 2014: InnoDB Anatomy
 
InnoDB Architecture and Performance Optimization, Peter Zaitsev
InnoDB Architecture and Performance Optimization, Peter ZaitsevInnoDB Architecture and Performance Optimization, Peter Zaitsev
InnoDB Architecture and Performance Optimization, Peter Zaitsev
 
Optimizing MySQL
Optimizing MySQLOptimizing MySQL
Optimizing MySQL
 
Postgres MVCC - A Developer Centric View of Multi Version Concurrency Control
Postgres MVCC - A Developer Centric View of Multi Version Concurrency ControlPostgres MVCC - A Developer Centric View of Multi Version Concurrency Control
Postgres MVCC - A Developer Centric View of Multi Version Concurrency Control
 
innoDBのインデックスとアルゴリズムについて調べてみた話
innoDBのインデックスとアルゴリズムについて調べてみた話innoDBのインデックスとアルゴリズムについて調べてみた話
innoDBのインデックスとアルゴリズムについて調べてみた話
 
Mastering InnoDB Diagnostics
Mastering InnoDB DiagnosticsMastering InnoDB Diagnostics
Mastering InnoDB Diagnostics
 
Schemadoc
SchemadocSchemadoc
Schemadoc
 
Database , 5 Semantic
Database , 5 SemanticDatabase , 5 Semantic
Database , 5 Semantic
 
PL/pgSQL - An Introduction on Using Imperative Programming in PostgreSQL
PL/pgSQL - An Introduction on Using Imperative Programming in PostgreSQLPL/pgSQL - An Introduction on Using Imperative Programming in PostgreSQL
PL/pgSQL - An Introduction on Using Imperative Programming in PostgreSQL
 
MySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDBMySQL 5.7: Focus on InnoDB
MySQL 5.7: Focus on InnoDB
 
Mvcc Unmasked (Bruce Momjian)
Mvcc Unmasked (Bruce Momjian)Mvcc Unmasked (Bruce Momjian)
Mvcc Unmasked (Bruce Momjian)
 
The nightmare of locking, blocking and isolation levels!
The nightmare of locking, blocking and isolation levels!The nightmare of locking, blocking and isolation levels!
The nightmare of locking, blocking and isolation levels!
 
Mysql For Developers
Mysql For DevelopersMysql For Developers
Mysql For Developers
 
The Power of MySQL Explain
The Power of MySQL ExplainThe Power of MySQL Explain
The Power of MySQL Explain
 
Mv unmasked.w.code.march.2013
Mv unmasked.w.code.march.2013Mv unmasked.w.code.march.2013
Mv unmasked.w.code.march.2013
 
Como migrar una base de datos de mysql a power designer
Como migrar una base de datos de mysql a power designerComo migrar una base de datos de mysql a power designer
Como migrar una base de datos de mysql a power designer
 
Explain
ExplainExplain
Explain
 
Inno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structureInno db internals innodb file formats and source code structure
Inno db internals innodb file formats and source code structure
 

Similar a InnoDB Internal

Open sql2010 recovery-of-lost-or-corrupted-innodb-tables
Open sql2010 recovery-of-lost-or-corrupted-innodb-tablesOpen sql2010 recovery-of-lost-or-corrupted-innodb-tables
Open sql2010 recovery-of-lost-or-corrupted-innodb-tablesArvids Godjuks
 
Lecture storage-buffer
Lecture storage-bufferLecture storage-buffer
Lecture storage-bufferKlaas Krona
 
Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)guest808c167
 
InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)Ontico
 
Pldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsPldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsmysqlops
 
Locality of (p)reference
Locality of (p)referenceLocality of (p)reference
Locality of (p)referenceFromDual GmbH
 
Page Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdfPage Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdfycelgemici1
 
Mysteries of the binary log
Mysteries of the binary logMysteries of the binary log
Mysteries of the binary logMats Kindahl
 
Configuring workload-based storage and topologies
Configuring workload-based storage and topologiesConfiguring workload-based storage and topologies
Configuring workload-based storage and topologiesMariaDB plc
 
Innodb 和 XtraDB 结构和性能优化
Innodb 和 XtraDB 结构和性能优化Innodb 和 XtraDB 结构和性能优化
Innodb 和 XtraDB 结构和性能优化YUCHENG HU
 
BGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index StrategiesBGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index StrategiesMarco Gralike
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBshimosawa
 
15 bufferand records
15 bufferand records15 bufferand records
15 bufferand recordsashish61_scs
 
SQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured DataSQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured DataMichael Rys
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionWes McKinney
 
Incremental backups
Incremental backupsIncremental backups
Incremental backupsVlad Lesin
 
My sql innovation work -innosql
My sql innovation work -innosqlMy sql innovation work -innosql
My sql innovation work -innosqlthinkinlamp
 
PAGING MECHANISM Pentium.ppt
PAGING MECHANISM Pentium.pptPAGING MECHANISM Pentium.ppt
PAGING MECHANISM Pentium.pptPramodBorole2
 
InnoDB Performance Optimisation
InnoDB Performance OptimisationInnoDB Performance Optimisation
InnoDB Performance OptimisationMydbops
 

Similar a InnoDB Internal (20)

Data recovery talk on PLUK
Data recovery talk on PLUKData recovery talk on PLUK
Data recovery talk on PLUK
 
Open sql2010 recovery-of-lost-or-corrupted-innodb-tables
Open sql2010 recovery-of-lost-or-corrupted-innodb-tablesOpen sql2010 recovery-of-lost-or-corrupted-innodb-tables
Open sql2010 recovery-of-lost-or-corrupted-innodb-tables
 
Lecture storage-buffer
Lecture storage-bufferLecture storage-buffer
Lecture storage-buffer
 
Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)Recovery of lost or corrupted inno db tables(mysql uc 2010)
Recovery of lost or corrupted inno db tables(mysql uc 2010)
 
InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)
 
Pldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internalsPldc2012 innodb architecture and internals
Pldc2012 innodb architecture and internals
 
Locality of (p)reference
Locality of (p)referenceLocality of (p)reference
Locality of (p)reference
 
Page Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdfPage Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdf
 
Mysteries of the binary log
Mysteries of the binary logMysteries of the binary log
Mysteries of the binary log
 
Configuring workload-based storage and topologies
Configuring workload-based storage and topologiesConfiguring workload-based storage and topologies
Configuring workload-based storage and topologies
 
Innodb 和 XtraDB 结构和性能优化
Innodb 和 XtraDB 结构和性能优化Innodb 和 XtraDB 结构和性能优化
Innodb 和 XtraDB 结构和性能优化
 
BGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index StrategiesBGOUG 2012 - XML Index Strategies
BGOUG 2012 - XML Index Strategies
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
 
15 bufferand records
15 bufferand records15 bufferand records
15 bufferand records
 
SQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured DataSQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured Data
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
 
Incremental backups
Incremental backupsIncremental backups
Incremental backups
 
My sql innovation work -innosql
My sql innovation work -innosqlMy sql innovation work -innosql
My sql innovation work -innosql
 
PAGING MECHANISM Pentium.ppt
PAGING MECHANISM Pentium.pptPAGING MECHANISM Pentium.ppt
PAGING MECHANISM Pentium.ppt
 
InnoDB Performance Optimisation
InnoDB Performance OptimisationInnoDB Performance Optimisation
InnoDB Performance Optimisation
 

Más de mysqlops

The simplethebeautiful
The simplethebeautifulThe simplethebeautiful
The simplethebeautifulmysqlops
 
Oracle数据库分析函数详解
Oracle数据库分析函数详解Oracle数据库分析函数详解
Oracle数据库分析函数详解mysqlops
 
Percona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementPercona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementmysqlops
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationmysqlops
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Clustermysqlops
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationmysqlops
 
DBA新人的述职报告
DBA新人的述职报告DBA新人的述职报告
DBA新人的述职报告mysqlops
 
分布式爬虫
分布式爬虫分布式爬虫
分布式爬虫mysqlops
 
MySQL应用优化实践
MySQL应用优化实践MySQL应用优化实践
MySQL应用优化实践mysqlops
 
eBay EDW元数据管理及应用
eBay EDW元数据管理及应用eBay EDW元数据管理及应用
eBay EDW元数据管理及应用mysqlops
 
基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现mysqlops
 
eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析mysqlops
 
对MySQL DBA的一些思考
对MySQL DBA的一些思考对MySQL DBA的一些思考
对MySQL DBA的一些思考mysqlops
 
QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示mysqlops
 
腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事mysqlops
 
分布式存储与TDDL
分布式存储与TDDL分布式存储与TDDL
分布式存储与TDDLmysqlops
 
MySQL数据库生产环境维护
MySQL数据库生产环境维护MySQL数据库生产环境维护
MySQL数据库生产环境维护mysqlops
 
MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规mysqlops
 

Más de mysqlops (20)

The simplethebeautiful
The simplethebeautifulThe simplethebeautiful
The simplethebeautiful
 
Oracle数据库分析函数详解
Oracle数据库分析函数详解Oracle数据库分析函数详解
Oracle数据库分析函数详解
 
Percona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-managementPercona Live 2012PPT:mysql-security-privileges-and-user-management
Percona Live 2012PPT:mysql-security-privileges-and-user-management
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replication
 
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB ClusterPercona Live 2012PPT: MySQL Cluster And NDB Cluster
Percona Live 2012PPT: MySQL Cluster And NDB Cluster
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimization
 
DBA新人的述职报告
DBA新人的述职报告DBA新人的述职报告
DBA新人的述职报告
 
分布式爬虫
分布式爬虫分布式爬虫
分布式爬虫
 
MySQL应用优化实践
MySQL应用优化实践MySQL应用优化实践
MySQL应用优化实践
 
eBay EDW元数据管理及应用
eBay EDW元数据管理及应用eBay EDW元数据管理及应用
eBay EDW元数据管理及应用
 
基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现基于协程的网络开发框架的设计与实现
基于协程的网络开发框架的设计与实现
 
eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析eBay基于Hadoop平台的用户邮件数据分析
eBay基于Hadoop平台的用户邮件数据分析
 
对MySQL DBA的一些思考
对MySQL DBA的一些思考对MySQL DBA的一些思考
对MySQL DBA的一些思考
 
QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示QQ聊天系统后台架构的演化与启示
QQ聊天系统后台架构的演化与启示
 
腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事腾讯即时聊天IM1.4亿在线背后的故事
腾讯即时聊天IM1.4亿在线背后的故事
 
分布式存储与TDDL
分布式存储与TDDL分布式存储与TDDL
分布式存储与TDDL
 
MySQL数据库生产环境维护
MySQL数据库生产环境维护MySQL数据库生产环境维护
MySQL数据库生产环境维护
 
Memcached
MemcachedMemcached
Memcached
 
DevOPS
DevOPSDevOPS
DevOPS
 
MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规MySQL数据库开发的三十六条军规
MySQL数据库开发的三十六条军规
 

Último

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Último (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

InnoDB Internal

  • 1. Transactional Storage for MySQL FAST. RELIABLE. PROVEN. InnoDB Internals: InnoDB File Formats and Source Code Structure MySQL University, October 2009 Calvin Sun Principal Engineer Oracle Corporation
  • 2. Today’s Topics • Goals of InnoDB • Key Functional Characteristics • InnoDB Design Considerations • InnoDB Architecture • InnoDB On Disk Format • Source Code Structure • Q&A
  • 3. Goals of InnoDB • OLTP oriented • Performance, Reliability, Scalability • Data Protection • Portability
  • 4. InnoDB Key Functional Characteristics • Full transaction support • Row-level locking • MVCC • Crash recovery • Efficient IO
  • 5. Design Considerations • Modeled on Gray & Reuter’s “Transactions Processing: Concepts & Techniques” • Also emulated the Oracle architecture • Added unique subsystems • Doublewrite • Insert buffering • Adaptive hash index • Designed to evolve with changing hardware & requirements
  • 6. InnoDB Architecture Server Applications Handler API Embedded InnoDB API Transaction Cursor / Row Mini- B-tree Lock transaction Page Buffer File Space Manager IO
  • 7. InnoDB On Disk Format • InnoDB Database Files • InnoDB Tablespaces • InnoDB Pages / Extents • InnoDB Rows • InnoDB Indexes • InnoDB Logs • File Format Design Considerations
  • 8. InnoDB Database Files MySQL Data Directory System tablespace InnoDB tables internal data .frm files dictionary insert OR innodb_file_per_table buffer undo logs .ibd files ibdata files
  • 9. InnoDB Tablespaces • A tablespace consists of multiple files and/or raw disk partitions. file_name:file_size[:autoextend[:max:max_file_size]] • A file/partition is a collection of segments. • A segment consists of fixed-length pages. • The page size is always 16KB in uncompressed tablespaces, and 1KB-16KB in compressed tablespaces (for both data and index).
  • 10. System Tablespace • Internal Data Dictionary • Undo • Insert Buffer • Doublewrite Buffer • MySQL Replication Info
  • 11. InnoDB Tablespaces Tablespace Segment Extent Extent Leaf node segment Non-leaf node segment Extent Extent Extent Rollback segment Page Row Row Row Trx id Row Row Row Roll pointer Field pointers Row Row Field 1 Field 2 Field n an extent = 64 pages
  • 12. InnoDB Pages InnoDB Page Types Symbol Value Notes FIL_PAGE_INODE 3 File segment inode FIL_PAGE_INDEX 17855 B-tree node FIL_PAGE_TYPE_BLOB 10 Uncompressed BLOB page FIL_PAGE_TYPE_ZBLOB 11 1st compressed BLOB page FIL_PAGE_TYPE_ZBLOB2 12 Subsequent compressed BLOB page FIL_PAGE_TYPE_SYS 6 System page FIL_PAGE_TYPE_TRX_SYS 7 Transaction system page i-buf bitmap, I-buf free list, file space others header, extent desp page, new allocated page
  • 13. InnoDB Pages A page consists of: a page header, a page trailer, and a page body (rows or other contents). Page header Row Row Row Row Row Row Row Row Row Row Row row offset array Page trailer
  • 14. Page Declares typedef struct /* a space address */ { ulint pageno; /* page number within the file */ ulint boffset; /* byte offset within the page */ } fil_addr_t; typedef struct { ulint checksum; /* checksum of the page (since 4.0.14) */ ulint page_offset; /* page offset inside space */ fil_addr_t previous; /* offset or fil_addr_t */ fil_addr_t next; /* offset or fil_addr_t */ dulint page_lsn; /* lsn of the end of the newest modification log record to the page */ PAGE_TYPE page type; /* file page type */ dulint file_flush_lsn;/* the file has been flushed to disk at least up to this lsn */ int space_id; /* space id of the page */ char data[]; /* will grow */ ulint page_lsn; /* the last 4 bytes of page_lsn */ ulint checksum; /* page checksum, or checksum magic, or 0 */ } PAGE, *PAGE;
  • 15. InnoDB Compressed Pages Page header • InnoDB keeps a “modification log” in each page • Updates & inserts of small compressed data records are written to the log w/o page reconstruction; deletes don’t even require uncompression modification log • Log also tells InnoDB if the page will compress to fit page empty space size BLOB pointers • When log space runs out, page directory InnoDB uncompresses the Page trailer page, applies the changes and recompresses the page
  • 16. InnoDB Rows … prefix(768B) … COMACT format overflow 20 bytes page … … DYNAMIC format overflow page Record hdr Trx ID Roll ptr Fld ptrs overflow-page ptr .. Field values
  • 17. InnoDB Indexes - Primary PK values 001 - nnn ●Data rows are stored … … in the B-tree leaf nodes of a clustered 001 – 500 500 – 800 801 – nnn index ● B-tree is organized xxx 001 - 275 276 – 500 501 - 630 631 - 768 769 - 800 801 - 949 950 - xxx - nnn by primary key or non-null unique key clustered of table, if defined; Key values (primary key) index else, an internal 501- 501-630 + data for corresponding rows Primary Index column with 6-byte ROW_ID is added.
  • 18. InnoDB Indexes - Secondary clustered clustered (primary key) (primary PK values key) index - nnn 001 index ● Secondary index B- tree leaf nodes contain, for each key value, the primary B-tree leaf nodes, containing data keys of the corresponding rows, used to access key values A Z clustering index to obtain the data Secondary Index B-tree leaf nodes, containing PKs Secondary index Secondary index
  • 19. InnoDB Logging Rollback segments Log Buffer Buffer Pool log thread write thread Log File Log File #1 redo DATA #2 rollback log log files ibdata files
  • 20. InnoDB Redo Log end of log start of log last checkpoint min LSN Redo log structure: Space id PageNo OpCode Data
  • 21. File Format Management • Builtin InnoDB format: “Antelope” • New “Barracuda” format enables compression,ROW_FORMAT=DYNAMIC .ibd • Fast index creation, other features do not data files require Barracuda file format (file per table) • Builtin InnoDB can access “Antelope” databases, but not “Barracuda” databases • Check file format tag in system tablespace on startup • Enable a file format with new dynamic parameter innodb_file_format • Preserves ability to downgrade easily
  • 22. InnoDB File Format Design Considerations • Durability • Logging, doublewrite, checksum; • Performance • Insert buffering, table compression • Efficiency • Dynamic row format, table compression • Compatibility • File format management
  • 23. Source Code Structure • 31 subdirectories • Relevant InnoDB source files on file formats • Tablespace: fsp0fsp {.c, .ic, .h} • Page: page0page, page0zip {.c, .ic, .h} • Log: log0log {.c, .ic, .h}
  • 24. Source Code Subdirectories • buf • ibuf • que • data • include • read • db • lock • rem • dict • log • row • dyn • math • srv • eval • mem • sync • fil • mtr • thr • fsp • os • trx • fut • page • usr • ha • pars • ut • handler
  • 25. Summary: Durability, Performance, Compatibility & Efficiency • InnoDB is the leading transactional storage engine for MySQL • InnoDB’s architecture is well-suited to modern, on- line transactional applications; as well as embedded applications. • InnoDB’s file format is designed for high durability, better performance, and easy to manage
  • 27. InnoDB Size Limits • Max # of tables: 4 G • Max size of a table: 32TB • Columns per table: 1000 • Max row size: n*4 GB • 8 kB if stored on the same page • n*4 GB with n BLOBs • Max key length: 3500 • Maximum tablespace size: 64 TB • Max # of concurrent trxs: 1023