Understanding Histograms in the
Oracle Database

                Robert Gaydos
                R Wave Solutions
                Robert.Gaydos@rwavesolutions.com




               Copyright © R Wave
               Solutions 2008 All rights
               reserved
Agenda

 What is a Histogram?
 Types of Histograms
 How to create Histograms
 How to identify existing Histograms
 Rules and tips when creating Histograms
 Q&A


What is a Histogram?

  A histogram holds the data distribution of
  values within a column of a table.
   –   Number of occurrences for a specific value/range
   –   Used by CBO to optimize a query.
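  It is collected by using DBMS_STATS.
   –   In 9i, DBMS_STATS by default does not collect
       histogram stats, only the Min/Max value and the
       # of distinct values. From 10g the default
       METHOD_OPT is FOR ALL COLUMNS SIZE AUTO, which
       can create histograms.
   –   NOTE: DBMS_STATS is also used to delete histogram data.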


Types of Histograms

 Two types of Histograms
  –   Frequency histograms
  –   Height-balanced histograms
 The type of histogram is stored in the HISTOGRAM
 column of the *_TAB_COL_STATISTICS views (USER/DBA).
  –   Value = 'HEIGHT BALANCED', 'FREQUENCY',
      or 'NONE'



Histogram and Buckets

 When Histograms are created the number of
 buckets can be specified.
 It is this number that controls the type of
 histogram created.
 The number of buckets is the number of rows of
 histogram information kept for the column.
 When Distinct Values <= Buckets
  –   Then Frequency Histogram is created
  –   Else Height-Balanced Histogram is created
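
The rule above can be sketched in a few lines of Python. This is only an illustration of the slide's decision rule, not Oracle code; the function name `histogram_type` is made up for the example, and treating a single bucket as "no histogram" is an assumption added here.

```python
def histogram_type(distinct_values: int, buckets: int) -> str:
    """Return the histogram type implied by the rule on this slide."""
    if buckets <= 1:
        # Assumption for this sketch: one bucket carries no distribution data.
        return "NONE"
    if distinct_values <= buckets:
        # Every distinct value can have its own bucket.
        return "FREQUENCY"
    # More distinct values than buckets: values must share buckets.
    return "HEIGHT BALANCED"


print(histogram_type(distinct_values=5, buckets=10))
print(histogram_type(distinct_values=5, buckets=4))
```

The two calls correspond to the examples that follow: five distinct values fit in ten buckets, but not in four.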

Frequency Histograms

 Each value of the column corresponds to a
 single bucket of the histogram.
 Each bucket contains the number of
 occurrences of that single value.




Frequency Histograms Example

 Data:    A B C B C C C C C B D E
 Sorted:  A B B B C C C C C C D E

 Results
  Bucket 1   A = 1
  Bucket 2   B = 3
  Bucket 3   C = 6
  Bucket 4   D = 1
  Bucket 5   E = 1
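
As a rough check, the bucket counts in this example can be reproduced with a short Python sketch. It models the idea only (one bucket per distinct value, holding the row count); the variable names are illustrative and nothing here calls the database.

```python
from collections import Counter
from itertools import accumulate

# The twelve column values from the example, in table order.
data = list("ABCBCCCCCBDE")

# One bucket per distinct value, listed in sorted value order.
counts = Counter(data)
buckets = [(value, counts[value]) for value in sorted(counts)]

# Running total of rows up to and including each bucket.
running_totals = list(accumulate(rows for _, rows in buckets))

for number, (value, rows) in enumerate(buckets, start=1):
    print(f"Bucket {number} {value} = {rows}")
```

The `running_totals` list is included because a frequency histogram in DBA_HISTOGRAMS records its endpoint numbers cumulatively, so the per-value counts are the differences between consecutive endpoints.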


Height-Balanced Histograms

 In a height-balanced histogram, the column
 values are divided into bands so that each
 band contains approximately the same
 number of rows.
 The useful information that the histogram
 provides is where in the range of values the
 endpoints fall.


Height-Balanced Example 1

 Data:    A B C B C C C C C B D E
 Sorted:  A B B B C C C C C C D E

 Results (EPN = endpoint number, EPV = endpoint value)
  EPN   EPV
  0     B
  1     C
  2     C
  3     E
 Buckets = 4
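
The endpoint table can be reproduced with a small Python sketch. It follows the numbering used on this slide (endpoints 0 to 3) and assumes the rows divide evenly among the buckets; the function name `height_balanced_endpoints` is invented for the example and is not an Oracle API.

```python
def height_balanced_endpoints(values, buckets):
    """Return (EPN, EPV) pairs: the last value in each equal-sized band."""
    ordered = sorted(values)
    # Assumption: the row count is an exact multiple of the bucket count.
    rows_per_bucket = len(ordered) // buckets
    endpoints = []
    for epn in range(buckets):
        last_row = (epn + 1) * rows_per_bucket - 1
        endpoints.append((epn, ordered[last_row]))
    return endpoints


data = list("ABCBCCCCCBDE")
endpoints = height_balanced_endpoints(data, buckets=4)
for epn, epv in endpoints:
    print(epn, epv)
```

With 12 rows and 4 buckets each band holds 3 rows, so the endpoints are the 3rd, 6th, 9th and 12th sorted values.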
Height-Balanced Example 1

 Data:    A B C B C C C C C B D E
 Sorted:  A B B B C C C C C C D E

 Results
  EPN   EPV
  0     B
  1     C
  2     C
  3     E
 Buckets = 4

 The DB only stores 0-B, 2-C, 3-E.
 Notice that C crosses buckets.
 This is how the database knows which values are
 popular.
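
A minimal Python sketch of the two ideas on this slide: repeated endpoint values are collapsed so only the last endpoint number is kept, and a value is treated as popular when its stored endpoint number jumps by more than one. The function names are hypothetical and the numbering follows the slide, starting at 0.

```python
def compress_endpoints(endpoints):
    """Keep only the last endpoint number for each run of equal values."""
    stored = []
    for epn, value in endpoints:
        if stored and stored[-1][1] == value:
            stored[-1] = (epn, value)  # same value again: overwrite the EPN
        else:
            stored.append((epn, value))
    return stored


def popular_values(stored):
    """Values whose endpoint number skips ahead, i.e. that span buckets."""
    popular = []
    previous_epn = -1  # the slide's numbering starts at 0
    for epn, value in stored:
        if epn - previous_epn > 1:
            popular.append(value)
        previous_epn = epn
    return popular


endpoints = [(0, "B"), (1, "C"), (2, "C"), (3, "E")]
stored = compress_endpoints(endpoints)
print(stored)
print(popular_values(stored))
```

Here C is the only value that ends more than one bucket, so it is the only value reported as popular.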
How to Create a Histogram

  Created by using
  DBMS_STATS.GATHER_TABLE_STATS
  METHOD_OPT => 'FOR COLUMNS SIZE
  <# of buckets> <Column Name>'

begin
  dbms_stats.gather_table_stats(
    ownname    => 'oe',
    tabname    => 'inventories',
    method_opt => 'FOR COLUMNS SIZE 10 quantity_on_hand');
end;
/

Identify Existing Histograms

  In 10g
   –   HISTOGRAM column of the *_TAB_COL_STATISTICS views
       (USER/DBA)
  In 9i
   select owner, table_name, column_name, count(*) buckets
   from dba_histograms
   where endpoint_number not in (0,1)
   group by owner, table_name, column_name;
Rules On When To Create Histograms

  First, there are no rules of thumb that are
  always correct.
  When creating histograms with dbms_stats, using
  the DEGREE parameter (for parallel) can skew the
  value of density.
  Running dbms_stats can drop histograms.
   –   Use method_opt => 'FOR ALL COLUMNS SIZE REPEAT'


Rules On When To Create Histograms

  Index maintenance overhead is performed
  during update, insert and delete operations.
  Histogram maintenance overhead is
  performed during the analyze/dbms_stats
  process.
  Histograms are NOT just for indexed
  columns.
   –   Adding a histogram to an un-indexed column that is used in
       a where clause can improve performance.


Histogram Example

select sum(t1.d2*t2.d3*t3.d3)
from t1, t2, t3
where t1.fk1 = t2.pk1
and t3.pk1 = t2.fk1
and t3.d2 = 35
and t1.d3 = 0;  -- this column has 2 values in it
  In this example column T1.D3 has 2 values and is our
  most selective criterion, but without a histogram the CBO
  assumes an even distribution of the 2 distinct values
  (density = 0.5).
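
To see why the even-distribution assumption hurts, here is a rough Python sketch of the two estimates. It is a simplified model, not the optimizer's actual arithmetic. The count of 4 rows for D3 = 0 comes from the plans in this deck; the count of 996 rows for the other value is a hypothetical figure chosen only to make the skew visible.

```python
def rows_without_histogram(num_rows, num_distinct):
    """Even-distribution estimate: every distinct value gets an equal share."""
    density = 1 / num_distinct
    return num_rows * density


def rows_with_frequency_histogram(value_counts, value):
    """Frequency-histogram estimate: use the recorded count for the value."""
    return value_counts.get(value, 0)


# 4 rows with D3 = 0 (from the plans); 996 is a hypothetical count for D3 = 1.
value_counts = {0: 4, 1: 996}
num_rows = sum(value_counts.values())

print(rows_without_histogram(num_rows, num_distinct=len(value_counts)))
print(rows_with_frequency_histogram(value_counts, 0))
```

Under these assumptions the estimate without a histogram is half the table, while the histogram-based estimate is the handful of rows actually returned, which is the kind of difference that changes a join method.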

Plan Without Histogram

Rows          Row Source Operation
-------       ---------------------------------
      1       SORT AGGREGATE
   2088        HASH JOIN
    601         TABLE ACCESS BY INDEX ROWID
    601          INDEX RANGE SCAN OBJ#(57612)
   1000         HASH JOIN
      4          TABLE ACCESS FULL OBJ#(57604)
                    filter("T1"."D3"=0)
 62500           TABLE ACCESS FULL OBJ#(57605)

Timing Without Histogram
call     count        cpu    elapsed       disk      query    current         rows
------- ------   -------- ---------- ---------- ---------- ----------   ----------
Parse        1       0.00       0.00          0          0          0            0
Execute      1       0.00       0.00          0          0          0            0
Fetch        2       1.04       4.91      19013      20231          0            1
------- ------   -------- ---------- ---------- ---------- ----------   ----------
total        4       1.04       4.91      19013      20231          0            1



Query = 20231




Use DBMS_STATS to Add Histogram

begin
  dbms_stats.gather_table_stats(
    ownname          => NULL,
    tabname          => 'T1',
    estimate_percent => NULL,
    method_opt       => 'FOR COLUMNS SIZE AUTO d3');
end;
/




Timing With Histogram
call     count        cpu    elapsed       disk      query    current         rows
------- ------   -------- ---------- ---------- ---------- ----------   ----------
Parse        1       0.00       0.00          0          0          0            0
Execute      1       0.01       0.00          0          0          0            0
Fetch        2       0.56       2.84       9841      10666          0            1
------- ------   -------- ---------- ---------- ---------- ----------   ----------
total        4       0.57       2.84       9841      10666          0            1


Now Query = 10666
It was Query = 20231
About half the logical I/O
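
The "half the I/O" claim can be checked with simple arithmetic. The sketch below uses only the `query` and `disk` totals from the two timing tables in this deck; the dictionary names are illustrative.

```python
# Totals copied from the timing tables (without and with the histogram).
before = {"query": 20231, "disk": 19013}
after = {"query": 10666, "disk": 9841}

# Fraction of the original work that remains after adding the histogram.
ratios = {name: after[name] / before[name] for name in before}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%} of the original, {1 - ratio:.1%} saved")
```

Both ratios come out slightly above one half, so "about half" is a fair summary for this query.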
Plan With Histogram

Rows      Row Source Operation
-------   ---------------------------------------
      1 SORT AGGREGATE
   2088 HASH JOIN
    601   TABLE ACCESS BY INDEX ROWID OBJ#(57606)
    601    INDEX RANGE SCAN OBJ#(57612)
   1000   TABLE ACCESS BY INDEX ROWID OBJ#(57605)
   1005    NESTED LOOPS
      4     TABLE ACCESS FULL OBJ#(57604)
            filter("T1"."D3"=0)
    1000     INDEX RANGE SCAN OBJ#(57609)
          access("T1"."FK1"="T2"."PK1")

Histogram Opportunities
 Any column with skewed data that is used in a
 where clause
 Columns that are not queried all the time
 Reduced overhead for insert, update and delete
 compared with an index




When Not To Create A Histogram

 No rule of thumb is ever perfect.
 Do not create histograms on
  –   Columns with evenly distributed data
  –   Columns that are not queried
  –   Every column of every table
  –   The PK of a table

Creating / Maintaining Histograms

  Table maintenance procedures are
  important.
   –   Accidentally deleting histogram data is possible.
   –   Histogram data is ONLY Collected/Maintained
       when stats are collected (DBMS_STATS).
   –   Changing data in the table does not change data
       stored in the histogram.



Maintaining Histograms

 Always check whether histograms exist on a table
 before DBMS_STATS is run.
  –   Do not forget to note the # of buckets.
  –   Create tables to hold this information.
 Use METHOD_OPT => 'FOR ALL COLUMNS SIZE REPEAT'
 to prevent deletion of histogram data.
 If the distribution ratio changes, recollect the
 statistics.
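
One way to picture the "distribution changed, so recollect" rule is to compare the distribution captured at gather time with the current one. The Python sketch below is only a model of that idea: the helper names, the extra inserted rows and the 10% threshold are all hypothetical, and none of it reflects how the database itself decides that statistics are stale.

```python
from collections import Counter


def distribution(values):
    """Fraction of rows held by each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {value: rows / total for value, rows in counts.items()}


def max_shift(stored, current):
    """Largest change in any value's share between two distributions."""
    keys = set(stored) | set(current)
    return max(abs(stored.get(k, 0.0) - current.get(k, 0.0)) for k in keys)


at_gather_time = list("ABCBCCCCCBDE")
# Hypothetical later state: twelve more rows with value D were inserted.
today = at_gather_time + ["D"] * 12

stored = distribution(at_gather_time)   # what the histogram still describes
current = distribution(today)           # what the table now contains

THRESHOLD = 0.10  # hypothetical tolerance, chosen only for this example
needs_regather = max_shift(stored, current) > THRESHOLD
print(needs_regather)
```

The stored snapshot never changes on its own, which is the point of the slide: only a new statistics run brings the histogram back in line with the data.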
Creating Histograms

 What is the best way to start out creating
 histograms? Four popular ways to create them:
  –   Salt and Pepper - Create them as you need them.
  –   Little Hammer - DBMS_STATS using
      METHOD_OPT SIZE AUTO
        Should NOT be used on a freshly restarted database
  –   Big Hammer - DBMS_STATS using SIZE SKEWONLY
  –   Over Kill Hammer - FOR ALL COLUMNS


Other Comments

 It is the Density of a column that the
 optimizer considers when accessing a table.
 Changing the number of buckets changes
 the density.
 More Buckets DOES NOT mean better
 density.
 Using degree != 1 can affect density.

Review

 What is a Histogram?
 Two types of Histograms in the database
 How to create Histograms
 Data is collected by running DBMS_STATS.
 Different methods to create histograms
 Bucket size affects density of column.
 Density of the column affects CBO.

Thank You for Your Time




