Physical Database Design for Data Warehousing

Keshava Murthy, Architect, IBM Informix Development
Enterprise Data Warehouse

[Diagram: LOB apps, databases, and other transactional data sources feed the warehouse pipeline (I/O & data loading, DBMS & storage management, query processing), which serves query tools, BI apps, BPS apps, and analytics. Source: Forrester]
Data Warehouse Schema and Queries
• Characterized by:
  – “Star” or “snowflake” schema:

    [Diagram: a SALES fact table surrounded by dimensions; Store rolls up to City and Region, Period rolls up to Month and Quarter, and Product rolls up to Brand and Category]

  – Complex, ad hoc queries that typically
    • Look for trends and exceptions to make actionable business decisions
    • Touch a large subset of the database (unlike OLTP)
    • Involve aggregation functions (e.g., COUNT, SUM, AVG, …)
Store Sales ER-Diagram from TPC-DS

[Diagram: the 300GB TPC-DS store sales schema; the store_sales fact table (287,997,024 rows) is joined to dimension tables including date_dim (73,049 rows), store (402), item (204,000), promotion (1,000), and customer_demographics (1,920,800), with the remaining dimensions ranging from 20 to 2,000,000 rows]
Aggregates

select s_store_name, s_store_id,
    sum(case when (d_day_name='Sunday')    then ss_sales_price else null end) sun_sales,
    sum(case when (d_day_name='Monday')    then ss_sales_price else null end) mon_sales,
    sum(case when (d_day_name='Tuesday')   then ss_sales_price else null end) tue_sales,
    sum(case when (d_day_name='Wednesday') then ss_sales_price else null end) wed_sales,
    sum(case when (d_day_name='Thursday')  then ss_sales_price else null end) thu_sales,
    sum(case when (d_day_name='Friday')    then ss_sales_price else null end) fri_sales,
    sum(case when (d_day_name='Saturday')  then ss_sales_price else null end) sat_sales
from store_sales, store, date_dim       -- fact table plus dimension tables
where d_date_sk = ss_sold_date_sk and   -- equijoins between primary (dimension)
   s_store_sk = ss_store_sk and         -- and foreign (fact) keys
   s_gmt_offset = -5 and                -- predicates on the dimension tables
   d_year = 2002
group by s_store_name, s_store_id       -- grouping and ordering
order by s_store_name,
  s_store_id, sun_sales, mon_sales, tue_sales, wed_sales, thu_sales, fri_sales, sat_sales
select first 100 i_item_id,
       avg(ss_quantity)    agg1,       -- aggregates
       avg(ss_list_price)  agg2,
       avg(ss_coupon_amt)  agg3,
       avg(ss_sales_price) agg4
from store_sales, customer_demographics, date_dim, item, promotion
                                        -- fact table plus dimension tables
where ss_sold_date_sk = d_date_sk and   -- equijoins between primary (dimension)
      ss_item_sk = i_item_sk and        -- and foreign (fact) keys
      ss_cdemo_sk = cd_demo_sk and
      ss_promo_sk = p_promo_sk and
      cd_gender = 'F' and               -- predicates on the dimension tables
      cd_marital_status = 'M' and
      cd_education_status = 'College' and
      (p_channel_email = 'N' or p_channel_event = 'N') and
      d_year = 2001
group by i_item_id                      -- grouping and ordering
order by i_item_id;
1) dwa_ds.store_sales: INDEX PATH (SKIP SCAN)
 (1) Index Name: martinfu.sold_date_store_sales
   Index Keys (Detached): ss_sold_date_sk (Parallel, fragments: ALL)
   Lower Index Filter: dwa_ds.store_sales.ss_sold_date_sk = stream from dwa_ds.date_dim.d_date_sk

2) dwa_ds.customer_demographics: SEQUENTIAL SCAN
   Filters:
   Table Scan Filters: ((dwa_ds.customer_demographics.cd_education_status = 'College' AND
dwa_ds.customer_demographics.cd_marital_status = 'M' ) AND dwa_ds.customer_demographics.cd_gender = 'F' )

DYNAMIC HASH JOIN
 Dynamic Hash Filters: dwa_ds.store_sales.ss_cdemo_sk = dwa_ds.customer_demographics.cd_demo_sk
3) dwa_ds.promotion: SEQUENTIAL SCAN
   Filters:
   Table Scan Filters: (dwa_ds.promotion.p_channel_event = 'N' OR dwa_ds.promotion.p_channel_email = 'N' )

DYNAMIC HASH JOIN
 Dynamic Hash Filters: dwa_ds.store_sales.ss_promo_sk = dwa_ds.promotion.p_promo_sk
4) dwa_ds.item: SEQUENTIAL SCAN

DYNAMIC HASH JOIN
 Dynamic Hash Filters: dwa_ds.store_sales.ss_item_sk = dwa_ds.item.i_item_sk
5) dwa_ds.date_dim: SEQUENTIAL SCAN
   Filters:
   Table Scan Filters: dwa_ds.date_dim.d_year = 2001

DYNAMIC HASH JOIN (Index Push Down Key: dwa_ds.date_dim.d_date_sk to dwa_ds.store_sales)
 Dynamic Hash Filters: dwa_ds.store_sales.ss_sold_date_sk = dwa_ds.date_dim.d_date_sk
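
The plan above is Informix explain output. A minimal way to capture such a plan for your own queries (standard Informix SQL; the output goes to the engine's default sqexplain.out file):

set explain on;                      -- plans for subsequent queries are appended to sqexplain.out
select count(*) from store_sales;    -- any query you want to examine
set explain off;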
300GB database

[Diagram: store_sales (287,997,024 rows) at the center, joined to customer_demographics (1,920,800 rows; cd_marital_status = 'M' and cd_education_status = 'College'), date_dim (73,049 rows; d_year = 2001), item (204,000 rows), and promotion (1,000 rows; p_channel_email = 'N' or p_channel_event = 'N')]
100GB database

[Diagram: a right-deep hash-join tree. date_dim (73K; d_year = 2001), item (200K), promotion (1,000; p_channel_event = 'N' OR p_channel_email = 'N'), and customer_demographics (1.9M; cd_education_status = 'College' AND cd_marital_status = 'M' AND cd_gender = 'F') are each sequentially scanned to build hash tables on d_date_sk, i_item_sk, p_promo_sk, and cd_demo_sk. The d_date_sk build keys are pushed down to store_sales (287,997,024 rows), which is probed via a skip scan: index scan, then sort of rowids]
300GB database

[Diagram: the same right-deep hash-join tree, annotated with the open/next iterator calls flowing from the top join down through each probe side; the skip scan on store_sales (fact index scan plus sort of rowids) starts only after all four dimension hash tables are built]
Intersection of rowids

[Diagram: rowid lists produced by multiple index scans are intersected before the fact-table rows are fetched]
Tasks for database design

•   Logical design
•   Physical design
•   Hardware configuration
•   Informix configuration
•   Feedback, changing needs
Why is physical database design necessary?

  Performance. Performance. Performance.

  • Reduce IO
  • Improve IO throughput
  • Improve CPU and network utilization
  • Improve administration efficiency
  • Do more with less
Tasks of physical database design
1. Data type selection
2. Indexes
3. Summary tables
4. Compression
5. Memory allocation
6. Table partitioning




Data type selection

• Numeric types are faster than character types
• BIGINT and BIGSERIAL are faster than INT8 and SERIAL8
• Fixed CHAR is faster than VARCHAR or LVARCHAR
• All the character types exploit light scans
• Larger types mean larger indices
• For date-time columns, use integer keys in the fact table
• (RDS – storing the DOMAINS for publishing)
create table "dwa_ds".web_sales
 (
   ws_sold_date_sk integer,
   ws_sold_time_sk integer,
   ws_ship_date_sk integer,
   ws_bill_customer_sk integer,
   ws_bill_cdemo_sk integer,
   ws_bill_hdemo_sk integer,
   ws_bill_addr_sk integer,
   ws_ship_customer_sk integer,
   ws_ship_cdemo_sk integer,
   .....
   ws_quantity integer,
   ws_wholesale_cost decimal(7,2),
   ws_list_price decimal(7,2),
   ws_sales_price decimal(7,2),
   ws_net_paid decimal(7,2),
   ws_net_paid_inc_tax decimal(7,2),
   ws_net_paid_inc_ship decimal(7,2),
   ws_net_paid_inc_ship_tax decimal(7,2),
   ws_net_profit decimal(7,2),
                                                       18
   primary key (ws_item_sk,ws_order_number) disabled
 );
select d_month
       ,i_category
       ,sum(ws_net_paid) as total_sum
       ,substr(i_category,2) as lochierarchy
from
   web_sales
  ,date_dim
  ,item
 where
      d_year = 2002
 and d_date_sk = ws_sold_date_sk
 and i_item_sk = ws_item_sk
 group by d_month, i_category
 order by d_month, i_category;
Index Design
1. Very few indices
2. Leaner indices – Informix can combine them when necessary
3. Keep load and insert performance in mind
   1. When possible, disable indexes during loads (see the sketch below)
   2. Attach/detach performance with indexes
4. Scan – Scan – Scan
5. Provide better options for the optimizer
6. Improve scans for tables – more for dimensions than facts
7. Exploit multi-index scans
8. Star join can work with or without indexes
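
A minimal sketch of point 3.1, with an index name of our own choosing on the TPC-DS fact table used earlier; the SET INDEXES statement is standard Informix object-mode syntax:

create index ix_ss_sold_date on store_sales(ss_sold_date_sk);

set indexes ix_ss_sold_date disabled;   -- skip index maintenance during the bulk load
-- ... bulk load into store_sales ...
set indexes ix_ss_sold_date enabled;    -- re-enabling rebuilds the index contents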
Summary tables
1. Handle the “canned” queries by pre-computing the answers (see the sketch below)
2. Saves system resources for handling complex ad-hoc queries
3. Can create multiple levels of summaries
    -- sales by region, by part, by (day, week, month)
4. Need time to create the summary tables ahead of time

                     Summary level 3 (month)

                     Summary level 2 (week)

                      Summary level 1 (day)

                 Fact table > billions of rows
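
A minimal sketch of a level-1 (day) summary, reusing the TPC-DS names from earlier slides; the summary table name and column list are illustrative, not from the original deck:

create table sales_by_day
(
   d_date_sk    integer,       -- day-level date key
   s_store_sk   integer,       -- store key
   daily_sales  decimal(15,2)  -- pre-computed daily total
);

insert into sales_by_day
select ss_sold_date_sk, ss_store_sk, sum(ss_sales_price)
from store_sales
group by ss_sold_date_sk, ss_store_sk;

Higher-level summaries (week, month) can then be built from this table instead of rescanning the fact table.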
Traditional Compression
Compression On Data Page With Multiple Rows

[Animated diagram: compress converts uncompressed pages into multiple compressed pages; repack consolidates the compressed rows onto fewer pages, leaving empty data pages; shrink returns the empty pages]

   • To use repack and shrink, the data need not be compressed
   • repack and shrink can be done independently
   • Shrink does not do an implicit repack (see the sketch below for invoking these operations)
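
These operations are invoked through the SQL administration API in the sysadmin database (11.50.xC4 and later); a sketch, with the table, database, and owner names as placeholders:

database sysadmin;

execute function task("table compress repack shrink",
                      "store_sales", "dwa_ds", "informix");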
Pagesize Considerations
1. IDS can configure page sizes that are multiples of the default page size
2. Depending on the OS, pages can be 2K, 4K, 8K, or 16K (see the creation sketch below)
3. Each page size necessitates its own buffer pool
4. Can be useful if the OS supports larger page sizes for buffer transfers
5. Ensure adequate memory exists
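
A dbspace with a non-default page size can be created through the same administration API; a sketch, where the path, size, and offset are placeholders and the final argument is assumed to be the page size in KB. A matching BUFFERPOOL configuration entry must exist for that page size:

execute function task("create dbspace", "dbspace16k",
                      "/informix/chunks/dbspace16k_c1",
                      "10 GB", "0", "16");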
• Motivation
  –Performance, Performance, Performance
  –Performance of the queries
  –Performance of Warehouse tasks
    • Extract jobs
    • Load performance
    • Transformation
    • Summary data creation



• Starting point
  – Logical design
• Observations
  – Presence or absence of indices and constraints

• What are your expected maintenance tasks?
  – Daily tasks
  – Weekly tasks
  – Monthly tasks
  – Quarterly tasks
  – Yearly tasks

Topics for the day
• Create table: fact and dimension
• Create index: deciding which indices to create and their keys and attributes
• Attach/detach
• Drop/Truncate
• SELECT, UPDATE, DELETE
• Update statistics
• Distribution
Features
•   External table
•   Options for loading/unloading
•   Fragment level stats
•   Fragmentation – interval/etc
•   PDQ
•   Interplay with configuration
•   Keeping things ONLINE



Fragmentation = Partitioning
• In Informix, they are synonymous
• Informix was the first to implement this feature; the industry has been using partitioning for some time now
• Within Informix syntax, we’re making fragmentation and partitioning synonymous
• Informix fragmentation has nothing to do with disk fragmentation
Situation
1. OLTP
  – Data volume
  – Performance
2. Data Warehousing
  – Data volume
  – Performance
  – Data management
3. Mixed workload
  – Do (1) and (2) together
Problem areas and Opportunities
 •   Capacity
 •   Storage IO throughput
 •   Performance
 •   Availability
 •   Storage management
 •   Reliability
 •   Time cyclic data management
 •   Security

Capacity
• By default, a table gets one partition
• Limited number of pages in each partition
• Capacity is limited by the number of pages
   – 2KB pagesize on most platforms
   – 4KB pagesize on Windows and AIX
• Increase the capacity by creating dbspaces with up to 16KB pagesize
• Compress the table to reclaim space
• Consider fragmenting the table
Storage IO throughput
• Warehouse queries, complex queries, and DDL operations depend significantly on IO throughput
• By creating multiple fragments in multiple dbspaces, you give Informix the opportunity to parallelize operations
Performance
• Enable parallel operations for queries
  – SELECT
  – INSERT
• Exploit fragment elimination (aka partition
  pruning)
• Improve create index timings
• Improve update statistics operation



Availability
• Fragments distributed across multiple disks or volumes are more tolerant of failure
• If a disk fails, you can skip over those fragments via DATASKIP (see the sketch below)
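
DATASKIP can be set in the configuration file or per session; a minimal sketch, with the dbspace name assumed:

set dataskip on dbspace1;   -- skip fragments in dbspace1 if it becomes unavailable
set dataskip on;            -- or skip any unavailable fragment
set dataskip off;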
Storage management
• Manage the data distribution on different storage
  disks or storage managers
• Match disk speeds to different quality of service
  – Recent data on faster/expensive storage




Time cyclic data management
• The table maintains a specific period of data
  – Last 25 hours
  – Last 3 months or 13 months
  – Last 7 years
• At every interval, perform three operations (sketched below)
  – Detach the data no longer needed
  – Attach the new data or make room for it
  – Modify the expressions to represent the new window
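
A sketch of one roll cycle on a quarterly, expression-fragmented orders table; the partition and table names are assumed, and the ATTACH shown is the expression-strategy variant:

-- 1) detach the quarter no longer needed into its own table
alter fragment on table orders
   detach partition q4_2008 orders_4q2008_archive;

-- 2) attach a staging table holding the new quarter
alter fragment on table orders
   attach orders_2q2010_stage as partition q2_2010
   (order_date >= date("04/01/2010") and order_date < date("07/01/2010"));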
Time cyclic data management

[Diagram: quarterly fragments 4Q2008 through 2Q2010, with the working window covering the middle quarters]
Security
• Grant or revoke access per fragment
  – Grant and revoke are done on the tables
• Examples
  – Table is fragmented on states
  – Table is fragmented on departments
  – Table is fragmented on date or datetime
Fragmentation Overview
[Diagram: a DBSPACE is made up of CHUNKs; chunks contain extents, and extents contain pages. A partition is the set of extents allocated to a table or index fragment within the dbspace]
Tables, Indices and Partitions

[Diagram: tables and indices are stored as partitions; a non-fragmented Customer_table and its index idx_cust_id each occupy a single partition, while a fragmented Storesales_table and its index Idx_store_id span multiple partitions]
Fragmentation by Round Robin

[Diagram: rows distributed evenly across Fragment1 through Fragment4]

Fragmentation by Expression

[Diagram: rows routed to the September, October, November, or December fragment based on their values]
Fragmentation by Round Robin
CREATE TABLE customer (id int, state char(2))
 FRAGMENT BY ROUND ROBIN
   IN dbspace1, dbspace2, dbspace3;


CREATE TABLE customer (id int, state char(2))
 FRAGMENT BY ROUND ROBIN
  PARTITION part1 IN dbspace1,
  PARTITION part2 IN dbspace1,
  PARTITION part3 IN dbspace2;

[Diagram: rows alternating across part1, part2, part3]
Fragmentation by Round Robin
•   Predictable distribution of data and use of storage
•   Create indices with either a round robin or an expression strategy
•   No fragment elimination is possible on the data
•   Use an expression strategy for indices to get the benefit of elimination (see the sketch below)
•   The index expression depends on the access pattern
•   Blobs avoid extra writes with the RR strategy
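
A sketch of a round-robin table paired with an expression-fragmented index, so index lookups can still eliminate fragments; the split point on state is arbitrary:

create table customer (id int, state char(2))
   fragment by round robin in dbspace1, dbspace2;

create index ix_cust_state on customer (state)
   fragment by expression
      (state <  "M") in dbspace1,
      (state >= "M") in dbspace2;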
Fragmentation by expression
CREATE TABLE customer (id int, state char(2), zipcode decimal(5,0))
 FRAGMENT BY EXPRESSION
  (state = "CA") IN dbspace1,
  (state = "KS") IN dbspace2,
  (state = "OR") IN dbspace3,
  (state = "NV") IN dbspace4;

CREATE TABLE customer (id int, state char(2))
 FRAGMENT BY EXPRESSION
  PARTITION part1 (state = "CA") IN dbspace1,
  PARTITION part2 (state = "KS") IN dbspace1,
  PARTITION part3 (state = "OR") IN dbspace1,
  PARTITION part4 (state = "NV") IN dbspace1;

[Diagram: CA, KS, OR, and NV fragments]
Fragmentation by expression
CREATE TABLE customer (id int, state char(2), zipcode decimal(5,0))
 FRAGMENT BY EXPRESSION
  PARTITION partca93   (state = "CA" and zipcode < 93000)  IN dbspace1,
  PARTITION partcagt93 (state = "CA" and zipcode >= 93000) IN dbspace5,
  PARTITION part2 (state = "KS") IN dbspace2,
  PARTITION part3 (state = "OR") IN dbspace2,
  PARTITION part4 (state = "NV") IN dbspace3;

[Diagram: CA < 93000, CA >= 93000, KS, OR, and NV fragments]
Fragmentation by expression
• The destination of a row depends on the row data
• Designing expressions requires understanding and estimation of the data sets
• Expressions provide a flexible mechanism
• With flexibility comes complexity
• Can make fragment elimination complex
Orders Table for 2010

   jan_partition         1/1/2010 to 1/31/2010
   feb_partition         2/1/2010 to 2/28/2010
   march_partition       3/1/2010 to 3/31/2010
   april_partition       4/1/2010 to 4/30/2010
   may_partition         5/1/2010 to 5/31/2010
   june_partition        6/1/2010 to 6/30/2010
   july_partition        7/1/2010 to 7/31/2010
   august_partition      8/1/2010 to 8/31/2010
   september_partition   9/1/2010 to 9/30/2010
   october_partition     10/1/2010 to 10/31/2010
   november_partition    11/1/2010 to 11/30/2010
Attached Index on a Fragmented Table

• Large table in a DSS or OLTP environment
• Attractive: index parallel scans
• Attractive: index fragment elimination and smaller btrees
• Attractive: scans on data pages in parallel
• Balanced I/O for indexes and data pages
Detached Fragmented Index on a Non-fragmented Table
•   OLTP environment with high index hits vs. data page hits (key-only reads)
•   Attractive: index scans in parallel
•   Attractive: index lookups with fragment elimination and smaller btrees
•   Unattractive: scans on data pages in series
Detached Index on a Fragmented Table
• DSS environment with some selective queries
• Attractive: scans on data pages in parallel
• Unattractive: index reads in series
Detached Fragmented Index on a Fragmented Table
•   Mixed OLTP and DSS environments with data fragmented for DSS and the index fragmented for OLTP; or selective and non-selective queries on different columns in a DSS environment
•   Attractive: index parallel scans
•   Attractive: index fragment elimination and smaller btrees
•   Attractive: scans on data pages in parallel
•   Balanced I/O for indexes and data pages
Three things are important in determining your strategy and expressions:

Workload.    Workload.    Workload.
create table history (
docid char(14),
..
) fragment by expression
partition p1 (docid >= "abcdef20112001" and docid < "babcde20111000") in dbs1,
partition p2 (docid >= "babcde20111000" and ..

Example docid: abcdef20110001
Factors to consider
• Query performance
  – What kind of queries?
  – Data distribution
  – Which applications use this table?
• Storage management
  – How do you handle data growth?
  – What’s the table management strategy when the dataset grows?
Fragmentation Objectives

[Diagram: scan threads fanned out across fragments, with eliminated fragments marked X]

Parallelism: fragments are accessed in parallel, decreasing scan or insert time.

Fragment elimination: unneeded fragments are eliminated, decreasing scan or insert time and reducing disk contention.

Fragment elimination & parallelism: both goals are achieved.
OLTP characteristics:
• high volume of short transactions
• each transaction accesses a few rows
• index access method is used

For this environment:
• fragmentation by round robin or expression strategy
• reduce index access times using fine-grain expressions on the index fragments
  – fragment recent data by day and older data by week or month
Data Warehousing characteristics:
• low volume of long-running queries
• queries access most of the rows in each fragment
• very few indexes are generally required
• preferred access method is the sequential scan
• preferred join method is the hash join

For this environment:
• fragment elimination
• parallel scan of the needed fragments
Top IDS features utilized for building a warehouse
• Multi-threaded Dynamic Scalable Architecture (DSA)
   – Scalability and performance
   – Optimal usage of hardware and OS resources
• DSS parameters to optimize memory
   – DSS queries
   – Efficient hash joins
• Parallel Data Query for parallel operations
   – Light scans, extensive calculations, sorts, multiple joins
   – Ideal for DSS queries and batch operations
• Data Compression
• Time cyclic data mgmt
   – Fragment elimination, fragment attach and detach
   – Data/index distribution schemas
   – Improve large data volume manageability
   – Increase performance by maximizing I/O throughput
• Configurable Page Size
   – On disk and in memory
   – Additional performance gains
• Large Chunks support
   – Allows IDS instances to handle large volumes
• Quick Sequential Scans
   – Essential for table scans common to DSS environments
Typical Query with Non-PDQ vs. PDQ

[Diagram: without PDQ, a single scan feeds a single join and sort before sending to the client; with PDQ, multiple parallel scans feed parallel joins and sorts before sending to the client]
PDQ/Fragmentation
• Consider fragmenting any large table in a dbspace that is getting a lot of IO activity
• Consider fragmenting any large table if scans must be done against the table
• Do not put multiple fragments of the same table on the same physical device
   – Be aware of the I/O throughput of your storage
• Avoid using round robin fragmentation for indexes
• Do not over-fragment
   – The cost of managing fragmentation can outweigh the benefits when there are excessive fragments
PDQ Configuration
• MAX_PDQPRIORITY
   – Sets the highest percentage of PDQ resources that a single client can use
• DS_MAX_QUERIES
   – Max number of DSS queries that can run together
• DS_TOTAL_MEMORY
   – Total memory reserved for PDQ
• DS_MAX_SCANS
   – Max number of parallel scans allowed. Leave at the default (1048576)
PDQ Configuration
• If the site is primarily a DSS system, it is recommended that most of the allocated memory be in the virtual buffers and that DS_TOTAL_MEMORY be very large
• PDQ can be used in smaller memory environments by setting PDQPRIORITY to 1 so that parallel scans can still be done (see below)
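
Per-session control is done with SET PDQPRIORITY (capped by MAX_PDQPRIORITY); the percentages below are illustrative:

set pdqpriority 1;     -- enable parallel scans with minimal PDQ memory
set pdqpriority 60;    -- request up to 60% of the PDQ resources
set pdqpriority 0;     -- turn PDQ off for this session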
PDQ Configuration
• onmode can be used to dynamically
  change PDQ parameters
  – onmode –M (DS_TOTAL_MEMORY)
  – onmode –Q (DS_MAX_QUERIES)
  – onmode –D (MAX_PDQPRIORITY)
  – onmode –S (DS_MAX_SCANS)




Design for Time Cyclic data mgmt
create table mytrans(
custid     integer,
proc_date  date,
store_loc  char(12)
….
) fragment by expression
......
(proc_date < DATE('01/01/2009')) in fe_auth_log20081231,
(MONTH(proc_date) = 1) in frag2009Jan,
(MONTH(proc_date) = 2) in frag2009Feb, ….
(MONTH(proc_date) = 10 and proc_date < DATE('10/26/2009')) in frag2009Oct,
(proc_date = DATE('10/26/2009')) in frag20091026,
(proc_date = DATE('10/27/2009')) in frag20091027,
(proc_date = DATE('10/28/2009')) in frag20091028,
(proc_date = DATE('10/29/2009')) in frag20091029,
(proc_date = DATE('10/30/2009')) in frag20091030,
(proc_date = DATE('10/31/2009')) in frag20091031,
(proc_date = DATE('11/01/2009')) in frag20091101
;
Fragment elimination

Type of filter          Nonoverlapping        Overlapping on a     Nonoverlapping
(WHERE clause)          single-fragment key   single column key    multiple column key
Range expression        Can eliminate         Cannot eliminate     Cannot eliminate
Equality expression     Can eliminate         Can eliminate        Can eliminate
create table t1(a int, b varchar(32)) fragment by expression
partition p1 (a = 1) in rootdbs,
partition p2 (a = 2) in rootdbs,
partition p3 (a = 3) in rootdbs,
partition p4 (a = 4) in rootdbs,
partition p5 (a = 5) in rootdbs;

insert into t1 select 1, 'oneworld'   from systables;
insert into t1 select 2, 'twoworld'   from systables;
insert into t1 select 3, 'threeworld' from systables;
insert into t1 select 4, 'fourworld'  from systables;
insert into t1 select 5, 'fiveworld'  from systables;
insert into t1 select * from t1;
create index t1ab on t1(a,b);
create index t1b on t1(b);
update statistics high for table t1;
QUERY: (OPTIMIZATION TIMESTAMP: 9-29-2010 00:56:12)
------
select * from t1 where a=1

Estimated Cost: 5
Estimated # of Rows Returned: 136

 1) keshav.t1: INDEX PATH

   (1) Index Name: keshav.t1ab
       Index Keys: a b  (Serial, fragments: 0)
       Fragments Scanned: (0) p1 in rootdbs
       Lower Index Filter: keshav.t1.a = 1

QUERY: (OPTIMIZATION TIMESTAMP: 9-29-2010 01:06:54)
------
select * from t1 where a <3

Estimated Cost: 11
Estimated # of Rows Returned: 272

 1) keshav.t1: SEQUENTIAL SCAN (Serial, fragments:
 0, 1)
 Fragments Scanned: (0) p1 in rootdbs, (1) p2 in
 rootdbs
       Filters: keshav.t1.a < 3



select count(*) from t1 where a = 3 and b =
  'threeworld'

Estimated Cost: 1
Estimated # of Rows Returned: 1

 1) keshav.t1: INDEX PATH

    (1) Index Name: keshav.t1ab
        Index Keys: a b   (Serial, fragments: 2)
        Fragments Scanned: (2) p3 in rootdbs
        Lower Index Filter: (keshav.t1.b =
  'threeworld' AND keshav.t1.a = 3 )

select count(*) from t1
  where a = a+1 and b = 'threeworld'

Estimated Cost: 11
Estimated # of Rows Returned: 1
  1) keshav.t1: INDEX PATH
    Filters: keshav.t1.a = keshav.t1.a + 1
    (1) Index Name: keshav.t1b
       Index Keys: b   (Serial, fragments: ALL)
 Lower Index Filter: keshav.t1.b = 'threeworld'




Managing Change
• You have a round robin strategy, but are running out of space
• You have a round robin strategy, but need to modify it into an expression strategy
• Time cyclic changes, aka roll-on/roll-off
   –   Daily, weekly, monthly, yearly
   –   Hourly, anyone?
   –   I’m running out of space in a fragment!
   –   I have uneven distribution of data
   –   I can’t have any down-time!
Managing Change
• You have a round robin strategy, but are running out of space
   – Analyze how frequently you’re adding new fragments
   – Consider moving to a higher pagesize (2K to 16K) for further capacity
   – All fragments of the table have to use the same pagesize
   – But the indices can be in a different pagesize than the table; all fragments of an index use the same pagesize
   – On 11.50.xC4 or above, you have another option: you can compress the table data
Managing Change
• You have a round robin strategy, but need to modify it into an expression strategy
  – You need exclusive access to the table
  – You need to schedule the down time
  – ALTER FRAGMENT… INIT
Managing Change
• Time cyclic changes, aka roll-on/roll-off

[Diagram: quarterly fragments 4Q2008 through 2Q2010; the current working set sits in the middle, new data waits to roll on at one end, and older data rolls off at the other]
Managing Change
• Time cyclic changes, aka roll-on/roll-off

[Diagram: fragments Q1 through Q4; the oldest quarters roll off as new quarters roll on]
Managing Change
• Time cyclic changes, aka roll-on/roll-off
   –   Daily, weekly, monthly, yearly
   –   Consider the data distribution and the queries on it
   –   Fine granularity can help speed up analysis
   –   Consider a hybrid strategy:

create table t(a int, b date) fragment by expression
partition p1 (month(b) = 10) in dbs,
partition p2 (month(b) = 11) in dbs,
partition p3 (b >= date("12/01/2009") and b <= date("12/15/2009")) in dbs,
partition p4 (b >= date("12/16/2009") and b <= date("12/31/2009")) in dbs;

   – Test your strategies to ensure fragment elimination
Enterprise Data Warehouse Platform

[Diagram: the same Forrester reference architecture as before, with the pipeline stages spelled out: I/O & data loading, interconnect & networking, DBMS & storage management, and query processing. Source: Forrester]
Enterprise Data Warehouse Platform

[Diagram: Informix features mapped onto the pipeline stages. Data loading: External Table, online attach/detach. Data & storage management: Interval and List Fragmentation, online attach/detach, fragment-level stats, storage provisioning, table defragmenter, Deep Compression. Query processing: MERGE, hierarchical queries, light scans, Multi-Index Scan, Skip Scan, bitmap technology, star and snowflake join optimization, implicit PDQ. Query tools / BI apps / analytics tooling: MicroStrategy, Cognos, Pentaho, Jaspersoft, SQW. Source: Forrester]
New fragmentation Strategies in
        Informix v11.70
• List Fragmentation
  – Similar to expression based fragmentation
  – Syntax compatibility
• Interval Fragmentation
  – Like expression, but policy based
  – Improves availability of the system


Time Cyclic Data management
• Time-cyclic data management (roll-on, roll-off)
• Attach the new fragment
• Detach the fragment no longer needed
• Update the statistics (low, medium/high) to keep everything up to date

[Diagram: monthly fragments Jan through Apr; Dec 08 data rolls off as May 09 data rolls on. Enables storing data over time]
Time Cyclic Data management
• ATTACH, DETACH and the rest of the ALTERs require exclusive access
   – Planned downtime
• These can be scripted, but you still need to lock out the users
   – Informix 11.50.xC6 has DDL_FORCE_EXEC to lock out the users
• The expression strategy gives you flexibility, but elimination can be tricky

[Diagram: the same roll-on/roll-off picture as the previous slide]
Fragment by Expression
create table orders
      (
      order_num      int,
      order_date     date,
      customer_num   integer not null,
      ship_instruct  char(40),
      backlog        char(1),
      po_num         char(10),
      ship_date      date,
      ship_weight    decimal(8,2),
      ship_charge    money(6),
      paid_date      date ) partition by expression
partition prv_partition (order_date < date('01-01-2010')) in mydbs,
partition jan_partition (order_date >= date('01-01-2010') and
   order_date < date('02-01-2010')) in mydbs,
partition feb_partition (order_date >= date('02-01-2010') and
   order_date < date('03-01-2010')) in mydbs,
partition mar_partition (order_date >= date('03-01-2010') and
   order_date < date('04-01-2010')) in mydbs,
partition apr_partition (order_date >= date('04-01-2010') and
   order_date < date('05-01-2010')) in mydbs,
…
Fragment by Interval
create table orders
   (
   order_num      int,
   order_date     date,
   customer_num   integer not null,
   ship_instruct  char(40),
   backlog        char(1),
   po_num         char(10),
   ship_date      date,
   ship_weight    decimal(8,2),
   ship_charge    money(6),
   paid_date      date )
partition by range(order_date)   -- partition key
interval(1 units month)          -- interval value
store in (dbs1, dbs2)            -- dbspaces for the interval fragments
partition prv_partition values < date('01-01-2010') in dbs3;   -- initial partition
Interval Fragmentation
• Fragments data based on an interval value
   – E.g., a fragment for every month or every million customer records
• Tables have an initial set of fragments defined by a range expression
• When a row is inserted that does not fit in the initial range fragments, IDS automatically creates a fragment to hold the row (no DBA intervention)
• No exclusive lock is required for fragment addition
• All the benefits of fragment by expression
ONLINE attach, detach
• ATTACH
  – Load the data into a staging table; create the indices exactly as you have them in your target table
  – Then simply attach the table as a fragment of another table
• DETACH
  – Identify the partition you want to detach
  – Simply detach the partition with the ONLINE keyword to avoid attempts to get exclusive access
Attach Example

ALTER FRAGMENT ONLINE ON TABLE "sales".orders
ATTACH december_orders_table AS PARTITION december_partition
VALUES < date('01-01-2011');
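
The complementary online detach; a sketch, with the archive table name assumed:

ALTER FRAGMENT ONLINE ON TABLE "sales".orders
DETACH PARTITION december_partition december_orders_archive;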
Attaching online

[Animated diagram: December_orders_table is attached to the orders table while query1 and query2 are running]

Sequence of events:
1. Issue ALTER ATTACH ONLINE. The dictionary entry is modified to indicate an online attach is in progress; other sessions can read the partition list but cannot modify it.
2. Queries already running (query1, query2) continue and won’t access the new partition.
3. New queries work on the table but don’t yet consider the new fragment.
4. The attach gets exclusive access to the partition list (in the dictionary); the dictionary entry is modified, and new dictionary entries are used for queries from here on.
5. The ONLINE ATTACH operation is complete; new queries (query3, query4) consider the new table fragment, and the table is fully available.
ONLINE operations
• ATTACH a fragment
• DETACH a fragment
• MODIFY the transition value (see the sketch below)
• Automatic ADDing of new fragments on insert or update
• Tasks eliminated by interval fragmentation:
    – Scheduling downtime to get exclusive access for ADD, ATTACH, DETACH
    – Defining proper expressions to ensure fragment elimination
    – Running update statistics manually after ALTER operations
    – Time taken to collect statistics is reduced as well
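
A sketch of modifying the transition value to roll the interval window forward (11.70 syntax; the table name and date are assumed):

alter fragment online on table orders
   modify interval transition to date('01-01-2011');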
UPDATE STATISTICS during ATTACH, DETACH
• Automatically kicks off an update statistics refresh in the background – you need to enable fragment-level statistics
• Tasks eliminated by interval fragmentation:
  – Running update statistics manually after ALTER operations
  – Time taken to collect statistics is reduced as well
List fragmentation
CREATE TABLE customer
(id SERIAL, fname CHAR(32), lname CHAR(32), state
   CHAR(2), phone CHAR(12))
FRAGMENT BY LIST (state)
  PARTITION p0 VALUES ("KS", "IL", "IN") IN dbs0,
  PARTITION p1 VALUES ("CA", "OR", "NV") IN dbs1,
  PARTITION p2 VALUES ("NY", "MN") IN dbs2,
  PARTITION p3 VALUES (NULL) IN dbs3,
  PARTITION p4 REMAINDER IN dbs3;

Smarter Statistics Collection
Fragment Level Statistics (FLS)
• Generate and store column distributions at the fragment level
• Fragment-level stats are combined to form the column distribution
• The system monitors UDI (Update/Delete/Insert) activity on each fragment
• Stats are refreshed only for frequently updated fragments
• The fragment-level distributions are used to re-calculate the column distribution
• No need to re-generate stats across the entire table
Generating Table Level Statistics

[Diagram: column data from all fragments is fed through a sort into a bin generator & encoder; the encoded distribution is stored in the sysdistrib catalog table and decoded into the data distribution cache]

  •  Distribution is created for the entire column dataset from all fragments
  •  Stored in sysdistrib with the (tabid, colno) combination
  •  The dbschema utility can decode and display the encoded distribution
  •  The optimizer uses the in-memory distribution representation for query optimization
Generating Fragment Level Statistics

[Diagram: each fragment’s column data is sorted and fed through a mini-bin generator & encoder, and the encoded mini-bins are stored per fragment in the sysfragdist catalog table. To build table-level statistics, the mini-bins are decoded, merged, and re-encoded into full bins stored in the sysdistrib catalog table, which feed the data distribution cache]
STATLEVEL property
STATLEVEL defines the granularity or level of statistics created for
  the table.
Can be set using CREATE TABLE or ALTER TABLE.
TABLE, FRAGMENT and AUTO are the allowed values for STATLEVEL.
TABLE – the entire table dataset is read and table-level statistics
  are stored in the sysdistrib catalog.
FRAGMENT – the dataset of each fragment is read and fragment-level
  statistics are stored in the new sysfragdist catalog. This option
  is only allowed for fragmented tables.
AUTO – when UPDATE STATISTICS runs, the system determines whether
  TABLE or FRAGMENT level statistics should be created.

                                                                  103
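A hedged sketch of the syntax just described (table, column and dbspace names are illustrative):

CREATE TABLE orders
  (order_num INT, order_date DATE)
FRAGMENT BY EXPRESSION
  PARTITION p2010 (YEAR(order_date) = 2010) IN dbs1,
  PARTITION p2011 (YEAR(order_date) = 2011) IN dbs2
STATLEVEL FRAGMENT;

-- Later, let the server pick the granularity on each run:
ALTER TABLE orders STATLEVEL AUTO;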
UPDATE STATISTICS extensions
• UPDATE STATISTICS [AUTO | FORCE];
• UPDATE STATISTICS HIGH FOR TABLE [AUTO | FORCE];
• UPDATE STATISTICS MEDIUM FOR TABLE tab1
  SAMPLING SIZE 0.8 RESOLUTION 1.0 [AUTO | FORCE];
• The mode specified in the UPDATE STATISTICS statement overrides the
  AUTO_STAT_MODE session setting; the session setting overrides the
  ONCONFIG AUTO_STAT_MODE parameter.
                                               104
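Putting the precedence together – assuming the session setting is exposed through SET ENVIRONMENT – a session could enable automatic mode and still force a rebuild for one table:

SET ENVIRONMENT AUTO_STAT_MODE 'on';

-- Statement-level mode wins: FORCE rebuilds the distributions even
-- if the server considers the existing ones still fresh.
UPDATE STATISTICS MEDIUM FOR TABLE store_sales FORCE;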
UPDATE STATISTICS extensions
• New metadata columns – nupdates, ndeletes and ninserts – in
  sysdistrib and sysfragdist store the corresponding counter values
  from the partition page at the time of statistics generation.
  Subsequent UPDATE STATISTICS runs use these columns to evaluate
  whether statistics are stale or reusable.
• Statistics evaluation is done at the fragment level for tables with
  fragment-level statistics, and at the table level for the rest.
• Statistics created by MEDIUM or HIGH mode (column distributions)
  are evaluated.
• LOW statistics are saved at the fragment level as well and are
  aggregated to produce global statistics.
                                                                    105
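Since the slide names the counter columns, a quick catalog query (the join and filter are illustrative) can show how much UDI activity has accumulated since the distributions were last built:

SELECT t.tabname, d.colno, d.nupdates, d.ndeletes, d.ninserts
FROM sysdistrib d, systables t
WHERE d.tabid = t.tabid
  AND t.tabname = 'store_sales';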
Alter Fragment Attach/Detach
• Column statistics are refreshed automatically in the background
  after executing ALTER FRAGMENT ATTACH/DETACH on a table with
  fragment-level statistics.
• Refreshing of statistics begins after the ALTER has been committed.
• For an ATTACH operation, fragment-level statistics of the new
  fragment are built and the table-level statistics are rebuilt from
  all fragment-level statistics. Any existing fragments with
  out-of-date column statistics are rebuilt at this time too.
• For a DETACH operation, the table-level statistics of the resulting
  tables are rebuilt from the fragment-level statistics.
• The background task that refreshes statistics is "refreshstats";
  it prints errors to online.log if any are encountered.
                                                                 106
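For instance (table and partition names are illustrative), rolling off the oldest quarter triggers the same background refresh on the surviving table:

ALTER FRAGMENT ON TABLE orders
    DETACH PARTITION q4_2008 orders_q4_2008;

-- After the commit, "refreshstats" rebuilds the table-level
-- statistics of orders from the remaining fragment-level statistics.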
                      Round Robin   List   Expression   Interval
Parallelism               Yes        Yes      Yes        Yes
Range Expression          No         Yes      Yes        Yes
Equality Expression       No         Yes      Yes        Yes
FLS                       Yes        Yes      Yes        Yes
Smarter Stats             Yes        Yes      Yes        Yes
ATTACH ONLINE             No         No       No         Yes
DETACH ONLINE             No         No       No         Yes
MODIFY ONLINE             No         No       No         Yes (MODIFY transition value)
Create index ONLINE       Yes        Yes      Yes        Not yet
Storage Provisioning      No         No       No         Yes
                                                              107
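Interval fragmentation – the column that earns most of the "Yes" entries above – looks roughly like this (a sketch; table, dbspace names and the interval are illustrative):

CREATE TABLE orders
  (order_num INT, order_date DATE)
FRAGMENT BY RANGE (order_date)
  INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
  STORE IN (dbs1, dbs2)
  PARTITION p_initial VALUES < DATE('01/01/2010') IN dbs1;

-- New monthly fragments are created automatically as rows arrive,
-- which is what enables the ONLINE attach/detach/modify operations
-- and storage provisioning without scheduled downtime.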
Session TPM3   8/26/2011   108

More Related Content

Similar to Informix physical database design for data warehousing

Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)
Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)
Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)Daniel Upton
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingPrithwis Mukerjee
 
SQL coding at Sydney Measure Camp 2018
SQL coding at Sydney Measure Camp 2018SQL coding at Sydney Measure Camp 2018
SQL coding at Sydney Measure Camp 2018Adilson Mendonca
 
How to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on SnowflakeHow to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on SnowflakeAtScale
 
Dw design 1_dim_facts
Dw design 1_dim_factsDw design 1_dim_facts
Dw design 1_dim_factsClaudia Gomez
 
Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008Eduardo Castro
 
VSSML18. Data Transformations
VSSML18. Data TransformationsVSSML18. Data Transformations
VSSML18. Data TransformationsBigML, Inc
 
Data Warehousing for students educationpptx
Data Warehousing for students educationpptxData Warehousing for students educationpptx
Data Warehousing for students educationpptxjainyshah20
 
Simplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data WarehouseSimplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data WarehouseFeatureByte
 
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS InsightScylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS InsightScyllaDB
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Miningidnats
 
Data Warehouses and Multi-Dimensional Data Analysis
Data Warehouses and Multi-Dimensional Data AnalysisData Warehouses and Multi-Dimensional Data Analysis
Data Warehouses and Multi-Dimensional Data AnalysisRaimonds Simanovskis
 
Kevin Bengtson Portfolio
Kevin Bengtson PortfolioKevin Bengtson Portfolio
Kevin Bengtson PortfolioKbengt521
 
Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Charles Givre
 
CARTO en 5 Pasos: del Dato a la Toma de Decisiones [CARTO]
CARTO en 5 Pasos: del Dato a la Toma de Decisiones [CARTO]CARTO en 5 Pasos: del Dato a la Toma de Decisiones [CARTO]
CARTO en 5 Pasos: del Dato a la Toma de Decisiones [CARTO]CARTO
 
An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)Dataspora
 

Similar to Informix physical database design for data warehousing (20)

Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)
Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)
Enhancing Dashboard Visuals with Multi-Dimensional Expressions (MDX)
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
 
Business Intelligence: A Review
Business Intelligence: A ReviewBusiness Intelligence: A Review
Business Intelligence: A Review
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
SQL coding at Sydney Measure Camp 2018
SQL coding at Sydney Measure Camp 2018SQL coding at Sydney Measure Camp 2018
SQL coding at Sydney Measure Camp 2018
 
My2dw
My2dwMy2dw
My2dw
 
How to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on SnowflakeHow to Realize an Additional 270% ROI on Snowflake
How to Realize an Additional 270% ROI on Snowflake
 
Dw design 1_dim_facts
Dw design 1_dim_factsDw design 1_dim_facts
Dw design 1_dim_facts
 
KPMG - TASK 1.pdf
KPMG - TASK 1.pdfKPMG - TASK 1.pdf
KPMG - TASK 1.pdf
 
Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008
 
VSSML18. Data Transformations
VSSML18. Data TransformationsVSSML18. Data Transformations
VSSML18. Data Transformations
 
Data Warehousing for students educationpptx
Data Warehousing for students educationpptxData Warehousing for students educationpptx
Data Warehousing for students educationpptx
 
Simplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data WarehouseSimplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data Warehouse
 
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS InsightScylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
Data Warehouses and Multi-Dimensional Data Analysis
Data Warehouses and Multi-Dimensional Data AnalysisData Warehouses and Multi-Dimensional Data Analysis
Data Warehouses and Multi-Dimensional Data Analysis
 
Kevin Bengtson Portfolio
Kevin Bengtson PortfolioKevin Bengtson Portfolio
Kevin Bengtson Portfolio
 
Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2
 
CARTO en 5 Pasos: del Dato a la Toma de Decisiones [CARTO]
CARTO en 5 Pasos: del Dato a la Toma de Decisiones [CARTO]CARTO en 5 Pasos: del Dato a la Toma de Decisiones [CARTO]
CARTO en 5 Pasos: del Dato a la Toma de Decisiones [CARTO]
 
An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)
 

More from Keshav Murthy

N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0Keshav Murthy
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Keshav Murthy
 
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5Keshav Murthy
 
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...Keshav Murthy
 
Couchbase 5.5: N1QL and Indexing features
Couchbase 5.5: N1QL and Indexing featuresCouchbase 5.5: N1QL and Indexing features
Couchbase 5.5: N1QL and Indexing featuresKeshav Murthy
 
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram VemulapalliN1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram VemulapalliKeshav Murthy
 
Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.Keshav Murthy
 
Couchbase Query Workbench Enhancements By Eben Haber
Couchbase Query Workbench Enhancements  By Eben Haber Couchbase Query Workbench Enhancements  By Eben Haber
Couchbase Query Workbench Enhancements By Eben Haber Keshav Murthy
 
Mindmap: Oracle to Couchbase for developers
Mindmap: Oracle to Couchbase for developersMindmap: Oracle to Couchbase for developers
Mindmap: Oracle to Couchbase for developersKeshav Murthy
 
Couchbase N1QL: Index Advisor
Couchbase N1QL: Index AdvisorCouchbase N1QL: Index Advisor
Couchbase N1QL: Index AdvisorKeshav Murthy
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0Keshav Murthy
 
From SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSONFrom SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSONKeshav Murthy
 
Tuning for Performance: indexes & Queries
Tuning for Performance: indexes & QueriesTuning for Performance: indexes & Queries
Tuning for Performance: indexes & QueriesKeshav Murthy
 
Understanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune QueriesUnderstanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune QueriesKeshav Murthy
 
Utilizing Arrays: Modeling, Querying and Indexing
Utilizing Arrays: Modeling, Querying and IndexingUtilizing Arrays: Modeling, Querying and Indexing
Utilizing Arrays: Modeling, Querying and IndexingKeshav Murthy
 
Extended JOIN in Couchbase Server 4.5
Extended JOIN in Couchbase Server 4.5Extended JOIN in Couchbase Server 4.5
Extended JOIN in Couchbase Server 4.5Keshav Murthy
 
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQL
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQLBringing SQL to NoSQL: Rich, Declarative Query for NoSQL
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQLKeshav Murthy
 
Query in Couchbase. N1QL: SQL for JSON
Query in Couchbase.  N1QL: SQL for JSONQuery in Couchbase.  N1QL: SQL for JSON
Query in Couchbase. N1QL: SQL for JSONKeshav Murthy
 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications Keshav Murthy
 
Introducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSONIntroducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSONKeshav Murthy
 

More from Keshav Murthy (20)

N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
 
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
 
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...
 
Couchbase 5.5: N1QL and Indexing features
Couchbase 5.5: N1QL and Indexing featuresCouchbase 5.5: N1QL and Indexing features
Couchbase 5.5: N1QL and Indexing features
 
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram VemulapalliN1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
 
Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.
 
Couchbase Query Workbench Enhancements By Eben Haber
Couchbase Query Workbench Enhancements  By Eben Haber Couchbase Query Workbench Enhancements  By Eben Haber
Couchbase Query Workbench Enhancements By Eben Haber
 
Mindmap: Oracle to Couchbase for developers
Mindmap: Oracle to Couchbase for developersMindmap: Oracle to Couchbase for developers
Mindmap: Oracle to Couchbase for developers
 
Couchbase N1QL: Index Advisor
Couchbase N1QL: Index AdvisorCouchbase N1QL: Index Advisor
Couchbase N1QL: Index Advisor
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0
 
From SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSONFrom SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSON
 
Tuning for Performance: indexes & Queries
Tuning for Performance: indexes & QueriesTuning for Performance: indexes & Queries
Tuning for Performance: indexes & Queries
 
Understanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune QueriesUnderstanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune Queries
 
Utilizing Arrays: Modeling, Querying and Indexing
Utilizing Arrays: Modeling, Querying and IndexingUtilizing Arrays: Modeling, Querying and Indexing
Utilizing Arrays: Modeling, Querying and Indexing
 
Extended JOIN in Couchbase Server 4.5
Extended JOIN in Couchbase Server 4.5Extended JOIN in Couchbase Server 4.5
Extended JOIN in Couchbase Server 4.5
 
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQL
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQLBringing SQL to NoSQL: Rich, Declarative Query for NoSQL
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQL
 
Query in Couchbase. N1QL: SQL for JSON
Query in Couchbase.  N1QL: SQL for JSONQuery in Couchbase.  N1QL: SQL for JSON
Query in Couchbase. N1QL: SQL for JSON
 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
 
Introducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSONIntroducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSON
 

Recently uploaded

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Recently uploaded (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

Informix physical database design for data warehousing

  • 1. Physical Database design for Data Warehousing Keshava Murthy, Architect, IBM Informix Development 0
  • 2. Enterprise Data Warehouse Query LOB Tools apps BI Apps Databases BPS Apps Other transactional data sources Analytics I/O & data DBMS & Query loading Storage mgmt processing Source: Forrester 1
  • 3. 2
  • 4. 3
  • 5. Data Warehouse Schema and Queries. • Characterized by: – “Star” or “snowflake” schema: Dimensions Region Fact Table Brand City Store Product Month SALES Period Category Quarter – Complex, ad hoc queries that typically • Look for trends, exceptions to make actionable business decisions • Touch large subset of the database (unlike OLTP) • Involve aggregation functions (e.g., COUNT, SUM, AVG,…) 4
  • 6. Store Sales ER-Diagram from TPC-DS 300GB database 73,049 402 204,000 287,997,024 86,400 1000 1,920,800 1,000,000 7200 20 2,000,000 5
  • 7. Aggregates select s_store_name, s_store_id, sum(case when (d_day_name='Sunday') then ss_sales_price else null end) sun_sales, sum(case when (d_day_name='Monday') then ss_sales_price else null end) mon_sales, Dimension tables. sum(case when (d_day_name='Tuesday') then ss_sales_price else null end) tue_sales, sum(case when (d_day_name='Wednesday') then ss_sales_price else null end) wed_sales, Fact table sum(case when (d_day_name='Thursday') then ss_sales_price else null end) thu_sales, Equijoins between sum(case when (d_day_name='Friday') then ss_sales_price else null end) fri_sales, primary(dimension) and sum(case when (d_day_name='Saturday') then ss_sales_price else nullkeys(fact) foreign end) sat_sales from store_sales, store, date_dim where d_date_sk = ss_sold_date_sk and s_store_sk = ss_store_sk and s_gmt_offset = -5 and Predicates on the d_year = 2002 dimension tables group by s_store_name, s_store_id order by s_store_name, s_store_id,sun_sales,mon_sales,tue_sales,wed_sales,thu_sales,fri_sales,sat_sales Grouping and ordering. 6
  • 8. select first 100 i_item_id, avg(ss_quantity) agg1, Aggregates avg(ss_list_price) agg2, Dimension tables. avg(ss_coupon_amt) agg3, Fact table avg(ss_sales_price) agg4 from store_sales, customer_demographics, date_dim, item, promotion where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and Equijoins between ss_cdemo_sk = cd_demo_sk and primary(dimension) and ss_promo_sk = p_promo_sk and foreign keys(fact) cd_gender = 'F' and cd_marital_status = 'M' and cd_education_status = 'College' and (p_channel_email = 'N' or p_channel_event = 'N') and d_year = 2001 Predicates on the dimension tables group by i_item_id order by i_item_id; Grouping and ordering. 7
  • 9. 1) dwa_ds.store_sales: INDEX PATH (SKIP SCAN) (1) Index Name: martinfu.sold_date_store_sales Index Keys (Detached): ss_sold_date_sk (Parallel, fragments: ALL) Lower Index Filter: dwa_ds.store_sales.ss_sold_date_sk = stream from dwa_ds.date_dim.d_date_sk 2) dwa_ds.customer_demographics: SEQUENTIAL SCAN Filters: Table Scan Filters: ((dwa_ds.customer_demographics.cd_education_status = 'College' AND dwa_ds.customer_demographics.cd_marital_status = 'M' ) AND dwa_ds.customer_demographics.cd_gender = 'F' ) DYNAMIC HASH JOIN Dynamic Hash Filters: dwa_ds.store_sales.ss_cdemo_sk = dwa_ds.customer_demographics.cd_demo_sk 3) dwa_ds.promotion: SEQUENTIAL SCAN Filters: Table Scan Filters: (dwa_ds.promotion.p_channel_event = 'N' OR dwa_ds.promotion.p_channel_email = 'N' ) DYNAMIC HASH JOIN Dynamic Hash Filters: dwa_ds.store_sales.ss_promo_sk = dwa_ds.promotion.p_promo_sk 4) dwa_ds.item: SEQUENTIAL SCAN DYNAMIC HASH JOIN Dynamic Hash Filters: dwa_ds.store_sales.ss_item_sk = dwa_ds.item.i_item_sk 5) dwa_ds.date_dim: SEQUENTIAL SCAN Filters: Table Scan Filters: dwa_ds.date_dim.d_year = 2001 DYNAMIC HASH JOIN (Index Push Down Key: dwa_ds.date_dim.d_date_sk to dwa_ds.store_sales) Dynamic Hash Filters: dwa_ds.store_sales.ss_sold_date_sk = dwa_ds.date_dim.d_date_sk 8
  • 10. 300GB database cd_marital_status = 'M' and cd_education_status = 'College' d_year = 2001 customer_demographics date_dim 1,920,800 73,049 Store_sales 287,997,024 item promotion 204,000 1000 (p_channel_email = 'N' or p_channel_event = 'N') 9
  • 11. 100GB database Hash Join ss_sold_date_sk = d_date_sk Build Probe Sequential Scan Hash Join: ss_item_sk = i_item_sk date_dim (73K) Build Index Push Down Key: (d_year = 2001) Probe d_date_sk to dwa_ds.store_sales Sequential Scan Item (200K) Hash Join: ss_promo_sk = p_promo_sk Build Probe Sequential Scan (p_channel_event = 'N' OR p_channel_email = 'N' ) Promotion (1000) Hash Join: Build ss_cdemo_sk = cd_demo_sk Sequential Scan Probe ((cd_education_status = 'College' AND cd_marital_status = 'M' ) AND cd_gender = 'F' ) customer_demographics (1.9M) Skip Scan Index scan Sort rowids Store_sales 287,997,024 10
  • 12. Open next 300GB database Hash Join ss_sold_date_sk = d_date_sk Build Probe Open next Sequential Scan Hash Join: ss_item_sk = i_item_sk date_dim (73K) Build Index Push Down Key: (d_year = 2001) Probe d_date_sk to dwa_ds.store_sales Sequential Scan next Open Item (200K) Hash Join: ss_promo_sk = p_promo_sk Build Probe Sequential Scan Open next (p_channel_event = 'N' OR p_channel_email = 'N' ) Promotion (1000) Hash Join: ss_cdemo_sk = cd_demo_sk Build Sequential Scan Probe ((cd_education_status = 'College' AND cd_marital_status = 'M' ) AND cd_gender = 'F' ) customer_demographics (1.9M) Open next Fact Skip Sort rowids Store_sales Index scan Scan Start Fact table scan 11
  • 13. Intersection of rowids 12
  • 14. Tasks for database design • Logical design • Physical design • Hardware configuration • Informix configuration • Feedbacks, changing needs. 13
  • 15. Why is physical database design necessary? Performance. Performance. Performance. • Reduce IO • Improve IO throughput • Improve CPU and network • Improves administration efficiency • Do more with less 14
  • 16. Tasks of physical database design 1. Data type selection 2. Indexes 3. Summary tables 4. Compression 5. Memory allocation 6. Table partitioning 15
  • 17. Data type selection • Numeric is faster than character • BIGINT, BIGSERIAL is faster than int8, serial8 • Fixed char is faster than varchar, lvarchar • All the character types exploit light scan • Larger types means larger indices • Date-time-columns use integer in fact table.. • (RDS – storing the DOMAINS for publishing) 16
  • 18. 17
  • 19. create table "dwa_ds".web_sales ( ws_sold_date_sk integer, ws_sold_time_sk integer, ws_ship_date_sk integer, ws_bill_customer_sk integer, ws_bill_cdemo_sk integer, ws_bill_hdemo_sk integer, ws_bill_addr_sk integer, ws_ship_customer_sk integer, ws_ship_cdemo_sk integer, ..... ws_quantity integer, ws_wholesale_cost decimal(7,2), ws_list_price decimal(7,2), ws_sales_price decimal(7,2), ws_net_paid decimal(7,2), ws_net_paid_inc_tax decimal(7,2), ws_net_paid_inc_ship decimal(7,2), ws_net_paid_inc_ship_tax decimal(7,2), ws_net_profit decimal(7,2), 18 primary key (ws_item_sk,ws_order_number) disabled );
  • 20. Select d_month ,i_category ,sum(ws_net_paid) as total_sum ,substr(i_category,2) as lochierarchy from web_sales ,date_dim ,item where d_year = 2002 and d_date_sk = ws_sold_date_sk and i_item_sk = ws_item_sk group by i_category,i_class order by d_month, i_category; 19
  • 21. Index Design 1. Very few indices 2. Leaner indices – Informix can combine when necessary 3. Keep load, insert performance in mind 1. When possible disable 2. Attach/detach performance with indexes 4. Scan – Scan – Scan 5. Provide better options for Optimizer 6. Improve scan for tables – more for dimension than facts 7. Exploit multi-index scans 8. Star join can work with indexes or without indexes. 20
  • 22. Summary tables 1. Handle the “canned” queries by pre-computing the answers 2. Saves system resources for handling complex ad-hoc queries 3. Can create multiple levels of summaries -- sales by region, by part, by (day, week, month) 4. Need time to create the summary tables ahead of time Summary level 3 (week) Summary level 2 (week) Summary level 1 (day) Fact table > billons of rows 21
  • 23. Traditional Compression 1. Very few indices 2. Leaner indices – Informix can combine when necessary 3. Keep load, insert performance in mind 1. When possible disable 2. Attach/detach performance with indexes 4. Scan – Scan – Scan 5. Provide better options for Optimizer 6. Improve scan for tables – more for dimension than facts 7. Exploit multi-index scans 8. Star join can work with indexes or without indexes. 22
  • 24. Compression On Data Page With Multiple Rows Multiple Compressed Pages Empty Data compress repack Pages shrink Uncompressed Compressed Compressed • To use repack and shrink, the data need not be compressed • repack and shrink can be done independently • Shrink does not do an implicit repack Animated 23 Slide
  • 25. Pagesize Considerations 1. IDS can configure page sizes that are multiples of the 2. default page size 3. Depending on OS can be 2k, 4k, 8k, or 16k 4. Each page size will necessitate its own buffer pool 5. Can be useful if OS supports larger page sizes for buffer transfers. 6. Ensure adequate memory exists 24
  • 26. • Motivation –Performance, Performance, Performance –Performance of the queries –Performance of Warehouse tasks • Extract jobs • Load performance • Transformation • Summary data creation 25
  • 27. • Starting point – Logical design • Observations – Presence or absence of indices and constraints • What are your expected maintenance tasks – Daily tasks – Weekly tasks – Monthly tasks – Quarterly tasks – Yearly tasks 26
  • 28. Topics for the day • Create table:fact an dimension • Create Index: Deciding to which indices to create and their keys and attributes • Attach/detach • Drop/Truncate • SELECT, UPDATE, DELETE • Update Statistics • Distribution 27
  • 29. Features • External table • Options for loading/unloading • Fragment level stats • Fragmentation – interval/etc • PDQ • Interplay with configuration • Keeping things ONLINE 28
  • 30. Fragmentation = Partitioning • In Informix, they are synonymous • Informix was the first to implement this feature. Industry is using partitioning for some time now • Within Informix syntax, we’re making fragmentation and partitioning synonymous. • Informix Fragmentation has nothing to do with disk fragmentation. 29
  • 31. Fragmentation = Partitioning • In Informix, they are synonymous • Informix was the first to implement this feature. Industry is using partitioning for some time now • Within Informix syntax, we’re making fragmentation and partitioning synonymous. • Informix Fragmentation has nothing to do with disk fragmentation. 30
  • 32. Situation 1. OLTP – Data volume – performance 2. Data Warehousing – Data Volume – Performance – Data Management 3. Mixed workload – Do (1) and (2) together 31
  • 33. Problem areas and Opportunities • Capacity • Storage IO throughput • Performance • Availability • Storage management • Reliability • Time cyclic data management • Security 32
  • 34. Capacity • By default, a table gets one partition • Limited number of pages in each partition • Capacity is limited by the number of pages. – 2KB pagesize on most platforms. – 4KB pagesize on Windows and AIX • Increase the capacity by creating dbspace with up to 16KB pagesize. • Compress the table table to reclaim space • Consider fragmenting the table. 33
  • 35. Storage IO throughput • Warehouse and complex queries and DDL operations depend on IO throughput significantly. • By creating multiple fragments in multiple dbspaces, you give Informix opportunity to parallelize operations 34
  • 36. Performance • Enable parallel operations for queries – SELECT – INSERT • Exploit fragment elimination (aka partition pruning) • Improve create index timings • Improve update statistics operation 35
  • 37. Availability • Fragments distributed in multiple disks or volumes are more tolerant of failure • If a disk failed, you can skip over those fragments via DATASKIP 36
  • 38. Storage management • Manage the data distribution on different storage disks or storage managers • Match disk speeds to different quality of service – Recent data on faster/expensive storage 37
  • 39. Time cyclic data management • The table would maintain specific period of data. – Last 25 hours – Last 3 months or 13 months – Last 7 years • At every interval do the three operations – Detach the data no longer needed – Attach the new data or make room for it. – Modify the expressions to represent the new window 38
  • 40. Time cyclic data management working window 4Q2008 1Q2009 2Q2009 3Q2009 4Q2009 1Q2010 2Q2010 39
  • 41. Security • Grant or revoke access per fragment – Grant and revoke is done on the tables • Example – Table is fragmented on states – Table is fragmented on departments – Table is fragmented on date or datetime 40
  • 43. DBSPACE CHUNK CHUNK CHUNK CHUNK Extent Extent Pages Extent Extent Pages Pages Partition 42
  • 44. Tables, Indices and Partitions Customer_table Partition Partition Storesales_table Partition Partition idx_cust_id Idx_store_id Customer_table Paritition 43
  • 45. Fragmentation by Round Robin Fragment1 Fragment2 Fragment3 Fragment4 Fragmentation by Expression September October November December 44
  • 46. Fragmentation by Round Robin CREATE TABLE customer (id int, state char(2)) FRAGMENT BY ROUND ROBIN in dbspace1, dbspace2, dbspace3; CREATE TABLE customer (id int, state char(2)) FRAGMENT BY ROUND ROBIN PARTITION part1 IN dbspace1, PARTITION part2 IN dbspace1, PARTITION part3 IN dbspace2; part1 part2 part3 45
  • 47. Fragmentation by Round Robin • Predictable distribution of data and use of storage. • Create indices with either round robin or expression strategy • No fragment elimination possible • Use expression strategy for indices to get benefit of elimination • Index expression depends on the access pattern. • Blobs avoid extra writes in RR strategy. 46
  • 48. Fragmentation by expression CREATE TABLE customer (id int, state char (2), zipcode decimal(5,0)) FRAGMENT BY EXPRESSION (state = “CA“) in dbspace1, (state = “KS") in dbspace2, (state = “OR") in dbspace3, (state = “NV") in dbspace4; CREATE TABLE customer (id int, state char (2)) FRAGMENT BY EXPRESSION PARTITION part1 (state = “CA") in dbspace1, PARTITION part2 (state = “KS") in dbspace1, PARTITION part3 (state = “OR") in dbspace1, PARTITION part4 (state = “NV") in dbspace1; CA KS OR NV 47
  • 49. Fragmentation by expression CREATE TABLE customer (id int, state char (2), zipcode decimal(5,0)) FRAGMENT BY EXPRESSION partition partca93 (state = “CA“ and zipcode < 93000) in dbspace1, partition partcagt93 (state = “CA“ and zipcode >= 93000) in dbspace5, PARTITION part2 (state = “KS") in dbspace2, PARTITION part3 (state = “OR") in dbspace2, PARTITION part4 (state = “NV") in dbspace3; CA CA KS OR NV < 93000 >= 93000 48
  • 50. Fragmentation by expression • Destination of the row depends on the row data • Design of expressions requires understanding and estimation of data sets • Expressions provide a flexible mechanism • With flexibility comes complexity • Can make fragment elimination complex CA KS OR NV 49
  • 51. Orders Table for 2010 jan_partition 1/1/2010 to 1-31-2010 feb_partition 2/1/2010 to 2-28-2010 march_partition 3/1/2010 to 3-31-2010 april_partition 4/1/2010 to 4-30-2010 5/1/2010 to 5-31-2010 may_partition june-partition 6/1/2010 to 6-30-2010 july_partition 7/1/2010 to 7-31-2010 august_partition 8/1/2010 to 8-31-2010 september_partition 9/1/2010 to 9-30-2010 october_partition 10/1/2010 to 10-31-2010 november_partition 11/1/2010 to 11-30-2010 50
  • 52. Attached Index on a Fragmented Table • Large table DSS or OLTP environment. • Attractive index parallel scans. • Attractive index fragment elimination and smaller btrees. • Attractive scans on data pages in parallel. • Balanced I/O for indexes and data pages. 51
  • 53. Detached Fragmented Index on a Non-fragmented Table • OLTP environment with high index hits vs. data page hits (key only reads). • Attractive index scans in parallel • Attractive index lookups with fragment elimination and smaller btrees. • Unattractive scans on data pages in series. 52
  • 54. Detached Index on a Fragmented Table • DSS environment with some selective queries. • Attractive scans on data pages in parallel. • Unattractive index read in series. 53
  • 55. Detached Fragmented Index on a Fragmented Table • Mixed OLTP and DSS environments with data fragmented for DSS and index fragmented of OLTP or Selective queries and non-selective queries on different columns in a DSS environment. • Attractive index parallel scans. • Attractive index fragment elimination and smaller btrees. • Attractive scans on data pages in parallel. • Balanced I/O for indexes and data pages. 54
  • 56. Three things important in determining your strategy and expressions Workload. Workload. Workload. 55
  • 57. Create table history ( Docid char(14), .. ) fragment by expression Partition p1 (docid >= “abcdef20112001” and docid <(babcde20111000) in dbs1, Partition p2(docid >= “babcde20111000” and .. Docid: abcdef20110001 56
  • 58. Factors to consider • Query Performance – What kind of queries? – Data Distribution – Which applications use this table? • Storage Management – How do you handle data growth? – What’s the table management strategy when dataset grows 57
  • 59. Fragmentation Objectives Parallelism scan threads * Fragments are accessed in parallel, decreasing scan or insert time. fragments Fragment Elimination scan threads * Unneeded fragments are eliminated, decreasing scan or X X fragments X X insert time, and reducing disk contention. scan threads * Fragment Elimination & Parallelism Both goals are achieved. fragments X X 58
  • 60. OLTP characteristics: For this environment: • high volume of short • Fragmentation by round transactions robin or expression strategy • each transaction accesses • Reduce index access times a few rows using fine grain expressions • index access method is used. on the index fragment. – Fragment recent data by day and older data by week or month. 59
  • 61. Data Warehousing characteristics For this environment: • fragment elimination • low volume of long running queries • queries access most of the rows in each • parallel scan the fragment needed fragments • very few indexes are generally required • preferred access method is sequential scan • preferred join method is the hash join 60
  • 62. Top IDS features utilized for building warehouse • Multi-threaded Dynamic Scalable • Time cyclic data mgmt Architecture (DSA) – Fragment elimination, fragment attach and detach – Scalability and Performance – Data/index distribution schemas – Optimal usage of hardware and OS – Improve large data volume resources manageability • DSS Parameters to optimize memory – Increase performance by – DSS queries maximizing I/O throughput – Efficient hash joins • Configurable Page Size – On disk and in memory • Parallel Data Query for parallel – Additional performance gains operations • Large Chunks support – Light scans, extensive – Allows IDS instances to handle – calculations, sorts, multiple joins large volumes – Ideal for DSS queries and batch • Quick Sequential Scans operations – Essential for table scans common to • Data Compression DSS environments 17 61 Source:
  • 63. Top IDS features utilized for building warehouse • Multi-threaded Dynamic Scalable • Time cyclic data mgmt Architecture (DSA) – Fragment elimination, fragment attach and detach – Scalability and Performance – Data/index distribution schemas – Optimal usage of hardware and OS – Improve large data volume Fragmentation Features resources manageability • DSS Parameters to optimize memory – Increase performance by – DSS queries maximizing I/O throughput – Efficient hash joins • Configurable Page Size – On disk and in memory • Parallel Data Query for parallel – Additional performance gains operations • Large Chunks support – Light scans, extensive – Allows IDS instances to handle – calculations, sorts, multiple joins large volumes – Ideal for DSS queries and batch • Quick Sequential Scans operations – Essential for table scans common to DSS environments 17 • Data Compression 62 Source:
  • 64. Fragmentation Objectives Parallelism scan threads * Fragments are accessed in parallel, decreasing scan or insert time. fragments Fragment Elimination scan threads * Unneeded fragments are eliminated, decreasing scan or X X fragments X X insert time, and reducing disk contention. scan threads * Fragment Elimination & Parallelism Both goals are achieved. fragments X X 63
  • 65. Typical Query with Non-PDQ vs. PDQ Send to client Sort Sort Sort Send to client Join Join Scan Scan Scan Scan 64
  • 66. PDQ/Fragmentation • Consider fragmenting any large table in a dbspace that is getting a lot of IO activity • Consider fragmenting any large table if scans must be done against the table • Do not put multiple fragments of the same table on the same physical device – Be aware of the I/O throughput of your storage • Avoid using round robin fragmentation for indexes. • Do not over-fragment. – The cost of managing fragmentation can outweigh the benefits when there are excessive fragments. 65
  • 67. PDQ Configuration • MAX_PDQPRIORITY – Set highest percentage of PDQ resources that a single client can use • b – Max number of DSS queries that can be run together • DS_TOTAL_MEMORY – Total memory reserved for PDQ • DS_MAX_SCANS – Max number of parallel scans allowed. Leave at default (1048567) 66
  • 68. PDQ Configuration • If the site is primary a DSS system, then it is recommended that most of the allocated memory be in the virtual buffers and that DS_TOTAL_MEMORY be very large • PDQ can be used in smaller memory environments by setting PDQ_PRIORITY to 1 so that parallel scans can be done. 67
  • 69. PDQ Configuration • onmode can be used to dynamically change PDQ parameters – onmode –M (DS_TOTAL_MEMORY) – onmode –Q (DS_MAX_QUERIES) – onmode –D (MAX_PDQPRIORITY) – onmode –S (DS_MAX_SCANS) 68
  • 70. Design for Time Cyclic data mgmt create table mytrans( custid integer, proc_date date, store_loc char(12) …. ) fragment by expression ...... (proc_date < DATE ('01/01/2009' ) ) in fe_auth_log20081231, (MONTH(proc_date) = 1 ) in frag2009Jan , (MONTH(proc_date) = 2 ) in frag2009Feb,…. (MONTH(proc_date) = 10 and proc_date < DATE ('10/26/2009' ) ) in frag2009Oct , (proc_date = DATE ('10/26/2009' ) ) in frag20091026 , (proc_date = DATE ('10/27/2009' ) ) in frag20091027, (proc_date = DATE ('10/28/2009' ) ) in frag20091027 , (proc_date = DATE ('10/29/2009' ) ) in frag20091027 , (proc_date = DATE ('10/30/2009' ) ) in frag20091027 , (proc_date = DATE ('10/31/2009' ) ) in frag20091027 , (proc_date = DATE ('11/01/2009' ) ) in frag20091027 , ; 69
  • 71. Fragment elimination Type of filter (WHERE Nonoverlapping Overlapping on a Nonoverlapping clause) Single fragment single column key Multiple column key key Range expression Can eliminate Cannot eliminate Cannot eliminate Equality expression Can eliminate Can eliminate Can eliminate 70
  • 72. create table f1(a int, b varchar(32)) fragment by expression partition p1 (a = 1) in rootdbs, partition p2 (a = 2) in rootdbs, partition p3 (a = 3) in rootdbs, partition p4 (a = 4) in rootdbs, partition p5 (a = 5) in rootdbs; insert into f1 select 1, 'oneworld' from systables; insert into f1 select 2, 'twoworld' from systables; insert into f1 select 3, 'threeworld' from systables; insert into f1 select 4, 'fourworld' from systables; insert into f1 select 5, 'fiveworld' from systables; insert into f1 select * from f1; create index f1ab on f1(a,b); create index f1b on f1(b); update statistics high for table f1; 71
  • 73. QUERY: (OPTIMIZATION TIMESTAMP: 9-29-2010 00:56:12) ------ select * from t1 where a=1 Estimated Cost: 5 Estimated # of Rows Returned: 136 1) keshav.t1: INDEX PATH (1) Index Name: keshav.t1ab Index Keys: a b (Serial, fragments: 0) Fragments Scanned: (0) p1 in rootdbs Lower Index Filter: keshav.t1.a = 1 72
  • 74. QUERY: (OPTIMIZATION TIMESTAMP: 9-29-2010 01:06:54) ------ select * from t1 where a <3 Estimated Cost: 11 Estimated # of Rows Returned: 272 1) keshav.t1: SEQUENTIAL SCAN (Serial, fragments: 0, 1) Fragments Scanned: (0) p1 in rootdbs, (1) p2 in rootdbs Filters: keshav.t1.a < 3 73
  • 75. select count(*) from t1 where a = 3 and b = 'threeworld' Estimated Cost: 1 Estimated # of Rows Returned: 1 1) keshav.t1: INDEX PATH (1) Index Name: keshav.t1ab Index Keys: a b (Serial, fragments: 2) Fragments Scanned: (2) p3 in rootdbs Lower Index Filter: (keshav.t1.b = 'threeworld' AND keshav.t1.a = 3 ) 74
  • 76. select count(*) from t1 where a = a+1 and b = 'threeworld' Estimated Cost: 11 Estimated # of Rows Returned: 1 1) keshav.t1: INDEX PATH Filters: keshav.t1.a = keshav.t1.a + 1 (1) Index Name: keshav.t1b Index Keys: b (Serial, fragments: ALL) Lower Index Filter: keshav.t1.b = 'threeworld' 75
  • 77. Managing Change • You have a round robin strategy, but running out of space • You have a round robin strategy, but need to modify it into expression expression strategy • Time Cyclic changes aka Roll-on Roll-off – Daily, weekly, monthly, yearly – Hourly anyone? – I’m running out of space in a fragment! – I have uneven distribution of data – I can’t have any down-time! 76
  • 78. Managing Change • You have a round robin strategy, but running out of space – Analyze how frequently you’re adding new fragments – Consider moving to higher pagesize (2K to 16K) – All fragments of the table have to be in same pagesize. – But, the indices can be in a different pagesize than table. But, all fragments in an index use same pgsize – On 11.50.xC4 or above, you have another option! • You can compress the table data – Consider moving to higher pagesize (2K to 16K) for further capacity 77
  • 79. Managing Change • You have a round robin strategy, but need to modify it into expression expression strategy –You need exclusive access to the table –You need to schedule the down time. –ALTER FRAGMENT… INIT 78
  • 80. Managing Change • Time Cyclic changes aka Roll-on Roll-off New data waiting to roll on Current working set 4Q2008 1Q2009 2Q2009 3Q2009 4Q2009 1Q2010 2Q2010 Older rolled-off data 79
  • 81. Managing Change • Time Cyclic changes aka Roll-on Roll-off Q1 Q2 Q1 Q2 Q3 Q4 80
  • 82. Managing Change • Time Cyclic changes aka Roll-on Roll-off – Daily, weekly, monthly, yearly – Consider the data distribution and the queries on it. – Fine granularity can help speed up analysis – Consider hybrid create table t(a int, b date) fragment by expression partition p1 (month(b) = 10) in dbs, partition p2 (month(b) = 11) in dbs, partition p3 (b >= date("12/01/2009") and b <= date("12/15/2009")) in dbs, partition p4 (b >= date("12/16/2009") and b <= date("12/31/2009")) in dbs; – Test your strategies to ensure fragment elimination. 81
  • 83. Enterprise Data Warehouse Platform Query LOB Tools apps BI Databases Apps BPS Apps Other transactional data sources Analytics I/O & data Interconnect DBMS & Storage Query loading & Networking mgmt processing Source: Forrester 82
• 84. Enterprise Data Warehouse Platform
  [Diagram: the same platform stack, annotated with the Informix features at each layer. Source: Forrester]
  – Data loading: External Table, MERGE, Online attach/detach
  – Data & storage management: Deep Compression, Interval and List Fragmentation, Online attach/detach, Fragment-level stats, Storage provisioning, Table defragmenter
  – Query processing: Hierarchical Queries, Light Scans, Multi-Index Scan, Skip Scan, Bitmap Technology, Star and Snowflake join optimization, Implicit PDQ
  – Tooling: MicroStrategy, Cognos, Pentaho, Jaspersoft, SQW
• 85. Top IDS features utilized for building a warehouse
  – Time-cyclic data management
    – Fragment elimination, fragment attach and detach
    – Data/index distribution schemas
    – Improve large data volume manageability
  – Multi-threaded Dynamic Scalable Architecture (DSA)
    – Scalability and performance
    – Optimal usage of hardware and OS resources
    – Increase performance by maximizing I/O throughput
  – DSS parameters to optimize memory
    – DSS queries
    – Efficient hash joins
  – Configurable page size
    – On disk and in memory
    – Additional performance gains
  – Parallel Data Query for parallel operations
    – Light scans, extensive calculations, sorts, multiple joins
    – Ideal for DSS queries and batch operations
  – Large chunks support
    – Allows IDS instances to handle large volumes
  – Data Compression
  – Quick sequential scans
    – Essential for table scans common to DSS environments
• 86. New fragmentation strategies in Informix v11.70
  – List fragmentation
    – Similar to expression-based fragmentation
    – Syntax compatibility
  – Interval fragmentation
    – Like expression, but policy based
    – Improves availability of the system
• 87. Time Cyclic Data Management
  – Time-cyclic data management (roll-on, roll-off) enables storing data over time
  – Attach the new fragment (May 09)
  – Detach the fragment no longer needed (Dec 08)
  – Update the statistics (low, medium/high) to keep everything up to date
  [Diagram: a table with monthly fragments Jan through Apr; Dec 08 rolls off while May 09 rolls on]
• 88. Time Cyclic Data Management
  – ATTACH, DETACH and the rest of the ALTERs require exclusive access, i.e. planned downtime
    – These can be scripted, but you still need to lock out the users
    – Informix 11.50.xC6 has DDL_FORCE_EXEC to lock out the users
  – The expression strategy gives you flexibility, but elimination can be tricky (a scripted roll-on/roll-off sketch follows)
  [Diagram: the same monthly fragments, Dec 08 through May 09]
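A minimal sketch of a scripted monthly roll-on/roll-off against the expression-fragmented orders table shown on the next slide. Partition and table names are illustrative; the ATTACH form for an expression strategy takes the new fragment's expression:

  -- roll off: detach the oldest month into its own table for archiving
  alter fragment on table orders
    detach partition dec08_partition orders_dec08;

  -- roll on: attach a pre-loaded, pre-indexed staging table as the newest month
  alter fragment on table orders
    attach orders_may09 as
      (order_date >= date('05-01-2009') and order_date < date('06-01-2009'));

  -- refresh optimizer statistics after the alters
  update statistics medium for table orders;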
• 89. Fragment by Expression

create table orders (
  order_num     int,
  order_date    date,
  customer_num  integer not null,
  ship_instruct char(40),
  backlog       char(1),
  po_num        char(10),
  ship_date     date,
  ship_weight   decimal(8,2),
  ship_charge   money(6),
  paid_date     date
)
partition by expression
  partition prv_partition (order_date <  date('01-01-2010')) in mydbs,
  partition jan_partition (order_date >= date('01-01-2010') and order_date < date('02-01-2010')) in mydbs,
  partition feb_partition (order_date >= date('02-01-2010') and order_date < date('03-01-2010')) in mydbs,
  partition mar_partition (order_date >= date('03-01-2010') and order_date < date('04-01-2010')) in mydbs,
  partition apr_partition (order_date >= date('04-01-2010') and order_date < date('05-01-2010')) in mydbs,
  …
• 90. Fragment by Interval

create table orders (
  order_num     int,
  order_date    date,
  customer_num  integer not null,
  ship_instruct char(40),
  backlog       char(1),
  po_num        char(10),
  ship_date     date,
  ship_weight   decimal(8,2),
  ship_charge   money(6),
  paid_date     date
)
partition by range(order_date)    -- partition key
  interval(1 units month)         -- interval value
  store in (dbs1, dbs2)           -- dbspaces for auto-created fragments
  partition prv_partition values < date('01-01-2010') in dbs3;   -- initial partition
• 91. Interval Fragmentation
  – Fragments data based on an interval value
    – E.g. a fragment for every month, or for every million customer records
  – Tables have an initial set of fragments defined by a range expression
  – When a row is inserted that does not fit in the initial range fragments, IDS automatically creates a fragment to hold the row, with no DBA intervention (see the sketch below)
  – No exclusive lock is required for fragment addition
  – All the benefits of fragment by expression
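A minimal sketch of the automatic fragment creation, using the interval-fragmented orders table from the previous slide. The sysfragments column names are from memory, so verify them against your version's system catalogs:

  -- order_date falls past the initial partition's upper bound, so the
  -- server silently creates a new one-month fragment for March 2010
  insert into orders (order_num, order_date, customer_num)
  values (1001, date('03-15-2010'), 104);

  -- inspect the fragments, including the auto-created one
  select f.exprtext, f.dbspace
  from sysfragments f, systables t
  where f.tabid = t.tabid and t.tabname = 'orders';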
• 92. ONLINE attach, detach
  – ATTACH
    – Load the data into a staging table; create the indexes exactly as you have them in your target table
    – Then simply attach the table as a fragment of another table
  – DETACH
    – Identify the partition you want to detach
    – Simply detach the partition with the ONLINE keyword to avoid attempts to get exclusive access
• 93. Attach Example

ALTER FRAGMENT ONLINE ON TABLE "sales".orders
  ATTACH december_orders_table AS PARTITION december_partition
  VALUES < date('01-01-2011');
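A corresponding ONLINE detach, sketched with an illustrative name for the table that receives the detached rows:

  ALTER FRAGMENT ONLINE ON TABLE "sales".orders
    DETACH PARTITION december_partition december_archive_table;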
• 96. Attaching Online
  [Diagram: queries query1 through query4 running against the orders table while december_orders_table is attached as a new fragment]
  – Issue ALTER FRAGMENT ... ATTACH ... ONLINE: the server gets exclusive access to the partition list in the dictionary and modifies the dictionary entry to indicate an online attach is in progress; other sessions can read the list but cannot modify it
  – Queries already running (query1, query2) continue; they do not consider the new fragment
  – New queries started while the attach is in progress also work on the table, but do not yet consider the new fragment
  – When the ONLINE ATTACH completes, the dictionary entry is modified again; queries started from here on (query3, query4) see the new fragment, and the table is fully available
• 97. ONLINE operations
  – ATTACH a fragment
  – DETACH a fragment
  – MODIFY the transition value
  – Automatic ADDing of new fragments on insert or update
  – Tasks eliminated by interval fragmentation:
    – Scheduling downtime to get exclusive access for ADD, ATTACH, DETACH
    – Defining proper expressions to ensure fragment elimination
    – Running update statistics manually after ALTER operations
    – Time taken to collect statistics is reduced as well
• 98. UPDATE STATISTICS during ATTACH, DETACH
  – Automatically kicks off an update statistics refresh in the background
    – Fragment-level statistics must be enabled
  – Tasks eliminated by interval fragmentation:
    – Running update statistics manually after ALTER operations
    – Time taken to collect statistics is reduced as well
• 99. List fragmentation

CREATE TABLE customer
  (id SERIAL, fname CHAR(32), lname CHAR(32),
   state CHAR(2), phone CHAR(12))
FRAGMENT BY LIST (state)
  PARTITION p0 VALUES ("KS", "IL", "IN") IN dbs0,
  PARTITION p1 VALUES ("CA", "OR", "NV") IN dbs1,
  PARTITION p2 VALUES ("NY", "MN") IN dbs2,
  PARTITION p3 VALUES (NULL) IN dbs3,
  PARTITION p4 REMAINDER IN dbs3;
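As with the expression examples on slides 73 through 75, an equality predicate on the list column lets the optimizer eliminate fragments; for instance, this query should scan only partition p1:

  SELECT fname, lname FROM customer WHERE state = "CA";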
• 100. Smarter Statistics Collection
• 101. Fragment Level Statistics (FLS)
  – Generate and store column distributions at the fragment level
  – Fragment-level stats are combined to form the column distribution
  – The system monitors UDI (Update/Delete/Insert) activities on each fragment
  – Stats are refreshed only for frequently updated fragments
  – The fragment-level distribution is used to re-calculate the column distribution
  – No need to re-generate stats across the entire table
• 102. Generating Table Level Statistics
  [Diagram: data fed from all fragments is sorted, run through a bin generator and encoder, and stored as an encoded distribution in the sysdistrib catalog; the distribution cache decodes it for the optimizer]
  – The distribution is created for the entire column dataset from all fragments
  – Stored in sysdistrib, keyed by the (tabid, colno) combination
  – The dbschema utility can decode and display the encoded distribution
  – The optimizer uses an in-memory distribution representation for query optimization
• 103. Generating Fragment Level Statistics
  [Diagram: each fragment's column data is sorted and run through a mini-bin generator and encoder; the encoded mini-bins are stored per fragment in the sysfragdist catalog, then a mini-bin merger and bin encoder combines them into the table-level encoded distribution in the sysdistrib catalog, which the distribution cache decodes for the optimizer]
• 104. STATLEVEL property
  – STATLEVEL defines the granularity, or level, of statistics created for the table
  – Can be set using CREATE TABLE or ALTER TABLE (see the sketch below)
  – STATLEVEL [TABLE | FRAGMENT | AUTO] are the allowed values:
    – TABLE: the entire table dataset is read and table-level statistics are stored in the sysdistrib catalog
    – FRAGMENT: the dataset of each fragment is read and fragment-level statistics are stored in the new sysfragdist catalog; this option is only allowed for fragmented tables
    – AUTO: when update statistics is run, the system determines whether TABLE or FRAGMENT level statistics should be created
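A minimal sketch of setting the property, with the syntax as I recall it for 11.70; verify against the documentation:

  -- switch an existing fragmented table to fragment-level statistics
  ALTER TABLE orders STATLEVEL FRAGMENT;

  -- then rebuild the distributions; subsequent refreshes work per fragment
  UPDATE STATISTICS HIGH FOR TABLE orders;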
• 105. UPDATE STATISTICS extensions
  – UPDATE STATISTICS [AUTO | FORCE];
  – UPDATE STATISTICS HIGH FOR TABLE [AUTO | FORCE];
  – UPDATE STATISTICS MEDIUM FOR TABLE tab1 SAMPLING SIZE 0.8 RESOLUTION 1.0 [AUTO | FORCE];
  – The mode specified in the UPDATE STATISTICS statement overrides the AUTO_STAT_MODE session setting; the session setting overrides the ONCONFIG's AUTO_STAT_MODE parameter
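A hedged sketch of the three override levels; the exact literal forms accepted by SET ENVIRONMENT vary by version, so check the docs:

  -- instance-wide default, set in ONCONFIG:  AUTO_STAT_MODE 1
  -- per session, overriding the ONCONFIG parameter:
  SET ENVIRONMENT AUTO_STAT_MODE 'on';
  -- per statement, overriding both of the above:
  UPDATE STATISTICS HIGH FOR TABLE orders FORCE;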
• 106. UPDATE STATISTICS extensions
  – New metadata columns (nupdates, ndeletes and ninserts) in sysdistrib and sysfragdist store the corresponding counter values from the partition page at the time of statistics generation; subsequent update statistics runs use these columns to evaluate whether statistics are stale or reusable
  – Statistics evaluation is done at the fragment level for tables with fragment-level statistics, and at the table level for the rest
  – Statistics created by MEDIUM or HIGH mode (column distributions) are evaluated
  – The LOW statistics are saved at the fragment level as well and are aggregated to collect global statistics
• 107. Alter Fragment Attach/Detach
  – Automatic background refreshing of column statistics after executing ALTER FRAGMENT ATTACH/DETACH on a table with fragmented statistics
  – Refreshing of statistics begins after the ALTER has been committed
  – For an ATTACH operation, fragmented statistics of the new fragment are built and table-level statistics are rebuilt from all fragmented statistics; any existing fragments with out-of-date column statistics are rebuilt at this time too
  – For a DETACH operation, table-level statistics of the resulting tables are rebuilt from the fragmented statistics
  – The background task that refreshes statistics is "refreshstats"; it prints any errors it encounters to online.log
• 108. Fragmentation strategy comparison

  Feature                          Round Robin   List   Expression   Interval
  Parallelism                      Yes           Yes    Yes          Yes
  Range expression elimination     No            Yes    Yes          Yes
  Equality expression elimination  No            Yes    Yes          Yes
  FLS                              Yes           Yes    Yes          Yes
  Smarter stats                    Yes           Yes    Yes          Yes
  ATTACH ONLINE                    No            No     No           Yes
  DETACH ONLINE                    No            No     No           Yes
  MODIFY ONLINE transition value   No            No     No           Yes
  Create index ONLINE              Yes           Yes    Yes          Not yet
  Storage provisioning             No            No     No           Yes
• 109. Session TPM3, 8/26/2011