Welcome to the webinar on

Designing High Performance Datawarehouse

Contents

1. What happened in the Data 1.0 World
2. What is shaping the new Data 2.0 World
3. Designing High Performance Datawarehouse
4. Q&A
What happened in the Data 1.0 World?
Before 2000

Do we need a DWH?

2000s

Select success: top down & bottom up

Advent of ODS

Now

Business led

We’ve got BI / DWH Tools

Volume | Variety | Velocity | Value

Performance vs. Volume: Game Changer

Need insights from non-structured data as well

Drill-down Reporting from DWH – getting into mainstream

Analytics is a differentiator

Data Silos
Metrics for success?
OLAP = Insights
Painful Implementations

Show me the ROI
Standardized KPIs
Analytics as differentiator?

(DATA) Big, Real time, In-memory – what to do with existing initiatives?

Retaining skills and expertise
Data 2.0: scale, performance, knowledge, relevance
Challenges in current DW environment - Survey
42% say: Can’t scale to big data volumes
27% say: Inadequate data load speed
27% say: Poor query response
25% say: Existing DW modeled for reports & OLAP only

Other responses (each chosen by between 9% and 24% of respondents):
• Can’t score analytic models fast enough
• Cost of scaling up or out is too expensive
• Can’t support high concurrent user count
• Inadequate support for in-memory processing
• Current platform needs great manual effort for performance
• Poorly suited to real-time workloads
• Can’t support in-database analytics
• Poor CPU speed and capacity
• Current platform is a legacy, we must phase it out

TDWI research based on 278 respondents – Top Responses
Data 2.0 World

Data sources: Social Media Data, Text Data, Sensor Data, Syndicated Data, Numeric Data

High Performance Data Warehouse:
• Concurrency Enabled
• Able to handle Complexity
• Ability to Scale
• Speed

Outcomes: True Sentiment, Faster Compliance, Faster Reach

Every 18 months, non-rich structured and unstructured enterprise data doubles.

Big Data Analytics:
• Analytics = Competitive Advantage
• Efficiencies driving down costs
• Customer experience & service

Business is now equipped to consume, identify and act upon this data for superior insights
So what is a High Performance Datawarehouse?

Key Dimensions
High Performance Data Warehouse: Concurrency, Speed, Scale, Complexity
Key dimensions of a High Performance Data Warehouse

CONCURRENCY
• Competing Workloads – OLAP, Analytics
• Intraday data loads
• Thousands of users
• Ad hoc queries

SPEED
• Streaming Big Data
• Event Processing
• Real time operation
• Operational BI
• Near time Analytics
• Dashboard Refresh
• Fast Queries

SCALE
• Big Data volumes
• Detailed source data
• Thousands of reports
• Scale out into: cloud, clusters, grids, etc.

COMPLEXITY
• Big Data variety
• Unstructured
• Sensor
• Social media
• Many sources / targets
• Complex models and SQL
• High availability
Designing High Performance Datawarehouse
Industry recognized top techniques
45% say: Creating Summary Tables
44% say: Adding Indexes
33% say: Altering SQL Statements or routines

Other techniques (each chosen by between 10% and 24% of respondents):
• Changing physical data models
• Using in-memory databases
• Upgrading Hardware
• Choosing between column-row oriented data storage
• Restricting or throttling user queries
• Moving an application to a separate data mart
• Applying workload management controls
• Shifting some workloads to off-peak hours
• Adjusting system parameters

6% say: Others

TDWI research based on 329 responses from 114 respondents
Designing Summary Tables

45% say: Creating Summary Tables
Summary table design process
COLLECT: A good sampling of queries. These may come from user interviews, testing / QA queries, production queries, reports or any other means that provide a good representation of expected production queries.

ANALYZE: The dimension hierarchy levels, dimension attributes, and fact table measures that are required by each query or report.

IDENTIFY: The row counts associated with each dimension level represented.

BALANCE: The most commonly required dimension levels against the number of rows in the resulting summary tables. A goal should be to design summary tables that are roughly 1/100th the size of the source fact tables in terms of rows (or less).

MINIMIZE: The columns that are carried in the summary table in favor of joining back to the dimension table. The larger the summary table, the less performance advantage it provides.

Some of the best candidates for aggregation will be those where the row counts decrease the most from one level in a hierarchy to the next.
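The BALANCE and MINIMIZE steps can be sketched with a small in-memory example. This is only an illustration under stated assumptions: the table names (`store_dim`, `fact_sales`, `sum_district_week`), the 40 stores, 2 districts and 70 days of data are all hypothetical, and SQLite stands in for whatever warehouse platform is actually used.

```python
import sqlite3

def build_summary(conn):
    """Build a district / fiscal-week summary from a store / day fact table."""
    conn.executescript("""
        CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, district TEXT);
        CREATE TABLE fact_sales (
            store_key INTEGER, day INTEGER, sales_qty INTEGER, sale_amt REAL);
    """)
    # Hypothetical data: 40 stores spread over 2 districts.
    conn.executemany("INSERT INTO store_dim VALUES (?, ?)",
                     [(s, f"D{s % 2}") for s in range(40)])
    # One fact row per store per day for 70 days (10 weeks).
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(s, d, 1, 2.5) for s in range(40) for d in range(70)])
    # MINIMIZE: carry only the grouping keys and the additive measures.
    conn.execute("""
        CREATE TABLE sum_district_week AS
        SELECT d.district AS district,
               f.day / 7 AS fiscal_week,
               SUM(f.sales_qty) AS sales_qty,
               SUM(f.sale_amt) AS sale_amt
        FROM fact_sales f JOIN store_dim d ON d.store_key = f.store_key
        GROUP BY d.district, f.day / 7
    """)
    fact_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    sum_rows = conn.execute("SELECT COUNT(*) FROM sum_district_week").fetchone()[0]
    return fact_rows, sum_rows

conn = sqlite3.connect(":memory:")
fact_rows, sum_rows = build_summary(conn)
# BALANCE: compare the summary size with the 1/100th guideline.
ratio = sum_rows / fact_rows
meets_guideline = ratio <= 0.01
```

Here the summary groups at district and week level, so queries that only need those levels never touch the detailed fact rows.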
Capturing requirements for Summary table
• Choosing Aggregates to Create – There are two basic pieces of information which are required to select the appropriate aggregates:
   – Expected usage patterns of the data
   – Data volumes and distributions in the fact table
Expected usage patterns (Reports 1 to 11): each report is listed with the dimension levels and measures it requires.
   Store levels used: Store, District, Region
   Item levels used: Category, Dept
   Date levels used: Calendar Year, Calendar Month, Fiscal Quarter, Fiscal Period, Fiscal Week
   Measures used: Sales_Qty, Sale_Amt

Data volumes and distributions:

Dimension         Level            # of Populated Members
Store Geography   Division         1
                  Region           3
                  District         50
                  Store            3980
Item Category     Subject          279
                  Category         1987
                  Department       4145
Date              Fiscal Year      3
                  Fiscal Quarter   12
                  Fiscal Period    36
                  Fiscal Week      156
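The member counts above can be turned into a rough ranking of aggregation candidates, since the best candidates are levels where the row count drops most when rolling up. The sketch below assumes the counts pair with the levels in the order listed on the slide (coarsest to finest within each dimension); the function name `reduction_factors` is hypothetical.

```python
# Member counts per level, coarsest level first (pairing assumed from the slide order).
HIERARCHIES = {
    "Store Geography": [("Division", 1), ("Region", 3), ("District", 50), ("Store", 3980)],
    "Item Category": [("Subject", 279), ("Category", 1987), ("Department", 4145)],
    "Date": [("Fiscal Year", 3), ("Fiscal Quarter", 12),
             ("Fiscal Period", 36), ("Fiscal Week", 156)],
}

def reduction_factors(hierarchies):
    """Return (dimension, detail level, rolled-up level, factor), largest factor first."""
    results = []
    for dimension, levels in hierarchies.items():
        # Pair each level with the next finer one and compare member counts.
        for (parent, parent_n), (child, child_n) in zip(levels, levels[1:]):
            results.append((dimension, child, parent, child_n / parent_n))
    return sorted(results, key=lambda r: r[3], reverse=True)

ranked = reduction_factors(HIERARCHIES)
best_dimension, best_from, best_to, best_factor = ranked[0]
```

With these counts, rolling Store up to District gives the largest reduction, so a district-level aggregate is a natural first candidate, subject to the usage patterns in the reports.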
Summary table design considerations
Aggregate storage column selection
• Semi-additive and all non-additive fact data need not be stored in the summary table
• Add as many “pre calculated” columns as possible
• “Count” columns could be added for non-additive facts to preserve a portion of the information

Recreating vs. Updating Aggregates
• It is efficient for aggregation programs to update the aggregate tables with the newly loaded data
• Regeneration is more appropriate if there is a lot of program logic to determine what data must be updated in the aggregate table

Storing Aggregate Rows
• A combined table containing basic level fact rows and aggregate rows
• A single aggregate table which holds all aggregate data for a single base fact table
• A separate table for each aggregate created – most preferred option

Storing Aggregate Dimension Data
• Multiple hierarchies in a single dimension
• Store all of the aggregate dimension records together in a single table
• Use a separate table for each level in the dimension
• Add dimension data to aggregate fact table
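The choice between recreating and updating an aggregate can be sketched in a few lines. This is a simplified illustration, not a production loader: the row layout `(district, week, sales_qty, sale_amt)` and the function names are assumptions, and each aggregate cell also carries a row count, in the spirit of the "count" column suggestion above.

```python
from collections import defaultdict

def rebuild(fact_rows):
    """Recreate the aggregate from scratch by scanning every fact row."""
    agg = defaultdict(lambda: [0, 0.0, 0])  # sales_qty, sale_amt, row_count
    for district, week, qty, amt in fact_rows:
        cell = agg[(district, week)]
        cell[0] += qty
        cell[1] += amt
        cell[2] += 1
    return dict(agg)

def apply_delta(agg, new_rows):
    """Update the aggregate in place using only the newly loaded rows."""
    for district, week, qty, amt in new_rows:
        cell = agg.setdefault((district, week), [0, 0.0, 0])
        cell[0] += qty
        cell[1] += amt
        cell[2] += 1
    return agg

history = [("D1", 1, 3, 30.0), ("D1", 1, 2, 25.0), ("D2", 1, 1, 12.5)]
delta = [("D1", 1, 4, 40.0), ("D2", 2, 5, 50.0)]

incremental = apply_delta(rebuild(history), delta)
regenerated = rebuild(history + delta)
```

Both paths give the same aggregate when loads are append-only; the update path reads only the delta, while regeneration avoids the logic needed to work out which aggregate rows a change touches.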
Efficient Indexing for Datawarehouse

44% say: Adding Indexes
Dimension table indexing
• Create a non-clustered primary key on the surrogate key of each dimension table
• A clustered index on the business key should be considered. It can:
   – Enhance the query response when the business key is used in the WHERE clause
   – Help avoid lock escalation during the ETL process
• For large type 2 SCDs, create a four-part non-clustered index: business key, record begin date, record end date and surrogate key
• Create non-clustered indexes on columns in the dimension that will be used for searching, sorting, or grouping
• If there’s a hierarchy in a dimension, such as Category – Sub Category – Product ID, then create an index on the hierarchy

Index columns                                                      Index type
EmployeeKey                                                        Non-clustered
EmployeeNationalIDAlternateKey                                     Clustered
EmployeeNationalIDAlternateKey, StartDate, EndDate, EmployeeKey    Non-clustered
FirstName, LastName, DepartmentName                                Non-clustered
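The four-part index for a type 2 SCD exists mainly to speed up the surrogate-key lookup done during fact loading. A minimal sketch follows, using SQLite purely for illustration: SQLite has no clustered / non-clustered distinction of the kind the slide assumes, and the table, column and index names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_employee (
        employee_key    INTEGER PRIMARY KEY,  -- surrogate key
        national_id     TEXT NOT NULL,        -- business key
        start_date      TEXT NOT NULL,        -- record begin date
        end_date        TEXT NOT NULL,        -- record end date
        department_name TEXT
    );
    -- Four-part index: business key, begin date, end date, surrogate key.
    CREATE INDEX ix_emp_scd2
        ON dim_employee (national_id, start_date, end_date, employee_key);
    INSERT INTO dim_employee VALUES
        (1, 'A-100', '2020-01-01', '2021-06-30', 'Sales'),
        (2, 'A-100', '2021-07-01', '9999-12-31', 'Marketing'),
        (3, 'B-200', '2019-03-01', '9999-12-31', 'Finance');
""")

LOOKUP_SQL = (
    "SELECT employee_key FROM dim_employee "
    "WHERE national_id = ? AND start_date <= ? AND end_date >= ?"
)

def surrogate_key_for(national_id, as_of):
    """Return the surrogate key of the dimension row valid on the given date."""
    row = conn.execute(LOOKUP_SQL, (national_id, as_of, as_of)).fetchone()
    return row[0] if row else None

# Ask the planner how it would run the lookup (the last column is the plan text).
plan = " ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + LOOKUP_SQL,
                                   ("A-100", "2021-01-01", "2021-01-01"))
)
```

Because every column the lookup needs is in the index, the engine can answer it without reading the wider dimension rows.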
Fact table indexing

• Create a clustered, composite index composed of each of the fact table’s foreign keys
• Keep the most commonly queried date column as the leftmost column in the index
• There can be more than one date in the fact table but there is usually one date that is of the most interest to business users. A clustered index on this column has the effect of quickly segmenting the amount of data that must be evaluated for a given query

Index columns: OrderDateKey, ProductKey, CustomerKey, PromotionKey, CurrencyKey, SalesTerritoryKey, DueDateKey
Index type: Clustered
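The effect of putting the date key leftmost can be checked by asking the engine for its query plan. The sketch below uses SQLite only as a convenient stand-in; it has ordinary secondary indexes rather than SQL Server style clustered indexes, and the table, index name and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (
        order_date_key INTEGER, product_key INTEGER, customer_key INTEGER,
        sale_amt REAL);
    -- Composite index over the foreign keys, date key leftmost.
    CREATE INDEX ix_fact_keys
        ON fact_sales (order_date_key, product_key, customer_key);
""")
# Hypothetical data: 10 dates x 5 products x 4 customers.
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(20240100 + d, p, c, 1.0)
     for d in range(1, 11) for p in range(5) for c in range(4)],
)

DATE_SQL = "SELECT SUM(sale_amt) FROM fact_sales WHERE order_date_key = ?"

def plan_for(sql, params):
    """Return the planner's description of how it would run the query."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

date_plan = plan_for(DATE_SQL, (20240105,))
day_total = conn.execute(DATE_SQL, (20240105,)).fetchone()[0]
```

A filter on the leading date column lets the engine seek straight to one day's slice of the fact table instead of scanning all of it.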
Column Oriented databases
Row Store and Column Store
Most queries do not process all the attributes of a particular relation.

Row Store
(+) Easy to add/modify a record
(-) Might read in unnecessary data

Column Store
(+) Only need to read in relevant data
(-) Tuple writes require multiple accesses

• One can obtain the performance benefits of a column-store using a row-store by making some changes to the physical structure of the row store:
   – Vertical partitioning
   – Using index-only plans
   – Using materialized views
Vertical Partitioning
• Process:
– Full Vertical partitioning of each relation
• Each column = 1 physical table
• This can be achieved by adding integer position column to every table
• Adding integer position is better than adding primary key

– Join on Position for multi column fetch
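The process above can be sketched in plain Python: each column becomes its own table of (position, value) pairs, and a multi-column fetch joins those tables on position. The sample rows, column names and helper functions are hypothetical, and a real row-store would do this with physical tables rather than lists.

```python
def vertically_partition(rows, columns):
    """Split a row-oriented table into one (position, value) table per column."""
    return {
        col: [(pos, row[i]) for pos, row in enumerate(rows)]
        for i, col in enumerate(columns)
    }

def fetch(partitions, wanted, predicate_col, predicate):
    """Read only the needed column tables and join them on position."""
    # Scan just the predicate column to find matching positions.
    positions = [pos for pos, value in partitions[predicate_col] if predicate(value)]
    # Build position lookups only for the columns the query asks for.
    lookup = {col: dict(partitions[col]) for col in wanted}
    return [tuple(lookup[col][pos] for col in wanted) for pos in positions]

rows = [
    ("S1", "Books", 3, 30.0),
    ("S2", "Toys", 1, 12.5),
    ("S1", "Toys", 2, 25.0),
]
columns = ["store", "category", "sales_qty", "sale_amt"]

parts = vertically_partition(rows, columns)
result = fetch(parts, ["store", "sale_amt"], "category", lambda c: c == "Toys")
```

The `sales_qty` column is never read for this query, which is the benefit the technique is after; the cost is the extra join on position.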
Index-only plans
• Process:
– Add B+Tree index for every Table.column
– Plans never access the actual tuples on disk
– Headers are not stored, so per tuple overhead is less
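An index-only plan can be observed directly in an engine that reports covering-index use. This sketch uses SQLite as an example engine and, as a simplification, one composite index that covers the query rather than a separate B+Tree index per column as described above; the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (store TEXT, category TEXT, sales_qty INTEGER, sale_amt REAL);
    -- The index holds every column the query below needs.
    CREATE INDEX ix_sales_cat_amt ON sales (category, sale_amt);
    INSERT INTO sales VALUES
        ('S1', 'Books', 3, 30.0),
        ('S2', 'Toys',  1, 12.5),
        ('S1', 'Toys',  2, 25.0);
""")

SQL = "SELECT SUM(sale_amt) FROM sales WHERE category = ?"

# The last column of each EXPLAIN QUERY PLAN row is the plan text.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + SQL, ("Toys",)))
toys_total = conn.execute(SQL, ("Toys",)).fetchone()[0]
uses_index_only = "COVERING INDEX" in plan
```

When the plan reports a covering index, the query is answered from the index alone and the base table rows are not read.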
Using Hadoop for Datawarehouse
Hadoop ecosystem
• Ecosystem of open source projects
• Hosted by the Apache Software Foundation
• Google developed and shared concepts
• Distributed file system that has the ability to scale out

Components
• Distributed Storage (HDFS)
• Distributed Processing (MapReduce)
• Metadata Management (HCatalog)
• Query (Hive)
• Scripting (Pig)
• Non-Relational Database (HBase)
• Data Extraction & Loading (HCatalog APIs, WebHDFS, Talend Open Studio for Big Data, Sqoop)
• Workflow & Scheduling (Oozie)
• Management & Monitoring (Ambari, ZooKeeper)
Promising uses of Hadoop in DW context

Data staging: Hadoop allows organizations to deploy an extremely scalable and economical ETL environment

Data archiving: Hadoop’s scalability and low cost enable organizations to keep all data forever in a readily accessible online environment

Schema flexibility: Hadoop can quickly and easily ingest any data format

Processing flexibility: Hadoop enables the growing practice of “late binding” – instead of transforming data as it’s ingested by Hadoop, structure is applied at runtime

Distributed DW architecture: Offload workloads for big data and advanced analytics to HDFS, discovery platforms and MapReduce
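The "late binding" idea can be shown without Hadoop at all: raw records are stored exactly as they arrive, and a schema is applied only when they are read. The sketch below is a toy illustration; the record formats, field names and the `read_with_schema` helper are all hypothetical.

```python
import json

# Raw records kept as ingested: two JSON lines and one pipe-delimited line.
RAW_EVENTS = [
    '{"user": "u1", "action": "click", "ms": 120}',
    '{"user": "u2", "action": "view"}',
    'u3|purchase|340',
]

def read_with_schema(raw_lines, fields):
    """Apply structure at read time; nothing was transformed at ingest."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Fall back to the delimited layout for non-JSON lines.
            record = dict(zip(("user", "action", "ms"), line.split("|")))
        # Missing fields come back as None instead of failing the load.
        yield tuple(record.get(field) for field in fields)

user_actions = list(read_with_schema(RAW_EVENTS, ("user", "action")))
timings = list(read_with_schema(RAW_EVENTS, ("user", "ms")))
```

The same raw data serves two different "schemas" here, which is the flexibility late binding trades against the cost of parsing on every read.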
What led to Datawarehouse at Facebook
The Problem
• Data, data and more data
• 200 GB per day in March 2008
• 2+ TB (compressed) per day

The Hadoop Experiment
• Superior in availability, scalability and manageability compared to commercial databases
• Uses Hadoop File System (HDFS)

Challenges with Hadoop
• Programmability & Metadata
• MapReduce hard to program
• Need to publish data in well known schemas

HIVE
What is Hive?
• A system for managing and querying structured data built on top of Hadoop
• Uses MapReduce for execution
• Uses HDFS for storage

Key Building Principles
• SQL on structured data as a familiar data warehousing tool
• Pluggable map/reduce scripts in language of your choice
• Rich data types
• Performance

Tables
• Each table has a corresponding directory in HDFS
• Each table points to existing data directories in HDFS
• Split data based on hash of a column – mainly for parallelism
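Splitting data on the hash of a column can be sketched as follows. This only illustrates the idea: CRC32 is used here as a stable stand-in hash and is not the function Hive itself applies, and the row layout and helper names are hypothetical.

```python
import zlib

def bucket_for(value, num_buckets):
    """Map a column value to a bucket using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets

def bucketize(rows, key_index, num_buckets):
    """Distribute rows into buckets by the hash of one column."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[key_index], num_buckets)].append(row)
    return buckets

user_ids = [101, 102, 103, 101, 104, 102]
rows = [(user_id, f"event-{i}") for i, user_id in enumerate(user_ids)]
buckets = bucketize(rows, key_index=0, num_buckets=4)
```

Rows with the same key always land in the same bucket, so each bucket can be processed by a separate task without coordinating on that key.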
Analytical platforms
Analytical platforms overview
1010data
Aster Data (Teradata)
Calpont
Datallegro (Microsoft)
Exasol
Greenplum (EMC)
IBM SmartAnalytics
Infobright
Kognitio
Netezza (IBM)
Oracle Exadata
Paraccel
Pervasive
Sand Technology
SAP HANA
Sybase IQ (SAP)
Teradata
Vertica (HP)

Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability compared to general purpose solutions.

Deployment Options
– Software only (Paraccel, Vertica)
– Appliance (SAP, Exadata, Netezza)
– Hosted (1010data, Kognitio)

• Kelley Blue Book – Consolidates millions of auto transactions each week to calculate car valuations
• AT&T Mobility – Tracks purchasing patterns for 80M customers daily to optimize targeted marketing
Which platform do you choose?

Platforms: Hadoop | Analytic Database | General Purpose RDBMS

Data: Structured → Semi-Structured → Unstructured
Thank You
Please send your feedback and Corporate Training / Consulting Services requirements on BI to sameer@compulinkacademy.com

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Último (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Designing high performance datawarehouse

  • 1. Welcome to the webinar on Designing High Performance Datawarehouse Presented by &
  • 2. Contents 1 What happened in the Data 1.0 World 2 What is shaping the new Data 2.0 World 3 Designing High Performance Datawarehouse 4 Q&A
  • 3. What happened in the Data 1.0 World? Before 2000 Do we need a DWH? 2000s Select success : top down & bottom up Advent of ODS Now Business led We’ve got BI / DWH Tools Volume | Variety | Velocity | Value Performance vs. Volume : Game Changer Need insights from non-structured data as well Drill-down Reporting from DWH – getting into mainstream Analytics is a differentiator Data Silos Metrics for success? OLAP = Insights Painful Implementations Show me the ROI Standardized KPIs Analytics as differentiator? (DATA) Big, Real time, In-memory – what to do with existing initiatives? Retaining skills and expertise Data 2.0 : scale, performance, knowledge, relevance
  • 4. Challenges in current DW environment - Survey 42% say Can’t scale to big data volumes 27% say Inadequate data load speed 27% say Poor query response 25% Existing DW modeled for reports & OLAP only 24% 24% 23% 19% Can’t score analytic models Fast enough 18% Cost of scaling up or out is too expensive 15% Can’t support high Concurrent user count 15% Inadequate support for In-memory processing 9% 18% Current platform needs great Manual effort for performance Poorly suited to real-time workloads Can’t support in-database analytics Poor CPU speed and capacity Current platform is a legacy, We must phase it out TDWI research based on 278 respondents – Top Responses
  • 5. Social Media Data Data 2.0 World True Sentiment Faster Compliance Text Data Sensor Data High Performance Data Warehouse Concurrency Enabled Able to handle Complexity Ability to Scale Syndicated Data Faster Reach Speed Numeric Data Every 18 months, non-rich structured and unstructured enterprise data doubles. Big Data Analytics Analytics = Competitive Advantage Efficiencies driving down costs Customer experience & service Business is now equipped to consume, identify and act upon this data for superior insights
  • 6. So what is a High Performance Datawarehouse? Key Dimensions
  • 8. High Performance Data Warehouse
      SPEED
      - Streaming Big Data
      - Event Processing
      - Real time operation
      - Operational BI
      - Near time Analytics
      - Dashboard Refresh
      - Fast Queries
      CONCURRENCY
      - Competing Workloads – OLAP, Analytics
      - Intraday data loads
      - Thousands of users
      - Ad hoc queries
      SCALE
      - Big Data volumes
      - Detailed source data
      - Thousands of reports
      - Scale out into: cloud, clusters, grids, etc.
      COMPLEXITY
      - Big Data variety: Unstructured, Sensor, Social media
      - Many sources / targets
      - Complex models and SQL
      - High availability
  • 10. Industry recognized top techniques 45% say Creating Summary Tables 44% say Adding Indexes 33% say Altering SQL Statements or routines 24% 24% Changing physical data models 16% Using in-memory databases 21% 16% Upgrading Hardware 20% 16% Choosing between column-row oriented data storage Restricting or throttling user queries 15% Moving an application to a separate data mart 10% Applying workload to management controls Shifting some workloads to off-peak hours Adjusting system parameters 6% Others TDWI research based on 329 responses from 114 respondents
  • 12. Summary table design process
      - COLLECT: A good sampling of queries. These may come from user interviews, testing / QA queries, production queries, reports or any other means that provide a good representation of expected production queries.
      - ANALYZE: The dimension hierarchy levels, dimension attributes, and fact table measures that are required by each query or report.
      - IDENTIFY: The row counts associated with each dimension level represented.
      - BALANCE: The most commonly required dimension levels against the number of rows in the resulting summary tables. A goal should be to design summary tables that are roughly 1/100th the size of the source fact tables in terms of rows (or less).
      - MINIMIZE: The columns that are carried in the summary table in favor of joining back to the dimension table. The larger the summary table, the less performance advantage it provides.
      Some of the best candidates for aggregation will be those where the row counts decrease the most from one level in a hierarchy to the next.
  • 13. Capturing requirements for Summary table
      Choosing Aggregates to Create: two basic pieces of information are required to select the appropriate aggregates.
      - Expected usage patterns of the data
      - Data volumes and distributions in the fact table
      Report Date Calendar Year Measures Sales Sale_Amt Dimension Level Report 1 Dimension Level Store Item District Report 2 District Calendar Year Sales_Qty Sale_Amt Store Geography Report 3 District Calendar Month Calendar Year Sales_Qty Sale_Amt Calendar Month Fiscal Period Fiscal Week Fiscal Period Fiscal Week Sales_Qty Sale_Amt Sales_Qty Sale_Amt Sale_Amt Fiscal Week Sales_Qty Sale_Amt Division Region District Store Subject Category Department Fiscal Year Fiscal Quarter Fiscal Period Fiscal Week Report 4 Report 5 Report 6 Report 7 Report 8 Report 9 Report 10 Report 11 District Store Category Dept Dept District District District District Region Dept Category Fiscal Quarter Fiscal Period Fiscal Week Sales_Qty Sale_Amt Sales_Qty Item Category Date # Populated of Members 1 3 50 3980 279 1987 4145 3 12 36 156
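One way to act on the "data volumes and distributions" input is to rank adjacent hierarchy levels by how sharply the member count drops. The sketch below is illustrative only. It assumes the eleven member counts on the slide (1, 3, 50, 3980, 279, 1987, 4145, 3, 12, 36, 156) line up, in order, with the eleven levels listed (Division through Fiscal Week); that pairing is a reading of the flattened slide, not something the slide states outright. The function names are hypothetical.

```python
# Member counts per hierarchy level, coarse to fine.
# ASSUMPTION: the pairing of counts to levels is inferred from the slide order.
HIERARCHIES = {
    "store_geography": [("Division", 1), ("Region", 3), ("District", 50), ("Store", 3980)],
    "item": [("Subject", 279), ("Category", 1987), ("Department", 4145)],
    "date": [("Fiscal Year", 3), ("Fiscal Quarter", 12),
             ("Fiscal Period", 36), ("Fiscal Week", 156)],
}


def reduction_ratios(levels):
    """Return (finer_level, coarser_level, ratio) for each adjacent pair of levels."""
    pairs = []
    for (coarse, n_coarse), (fine, n_fine) in zip(levels, levels[1:]):
        pairs.append((fine, coarse, n_fine / n_coarse))
    return pairs


def rank_candidates(hierarchies):
    """Sort all adjacent-level rollups by member-count reduction, largest first."""
    rows = []
    for dimension, levels in hierarchies.items():
        for fine, coarse, ratio in reduction_ratios(levels):
            rows.append((ratio, dimension, fine, coarse))
    return sorted(rows, reverse=True)


ranked = rank_candidates(HIERARCHIES)
for ratio, dimension, fine, coarse in ranked:
    print(f"{dimension}: rolling {fine} up to {coarse} shrinks members about {ratio:.1f}x")
```

Member counts are only a proxy for summary table row counts, since the real reduction depends on how sparsely the fact table populates each combination, so the ranking is a starting point for the usage analysis rather than a decision.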
  • 14. Summary table design considerations
      Aggregate storage column selection
      - Semi-additive and all non-additive fact data need not be stored in the summary table
      - Add as many “pre calculated” columns as possible
      - “Count” columns could be added for non additive facts to preserve a portion of the information
      Recreating vs. Updating Aggregates
      - Efficient for aggregation programs to update the aggregate tables with the newly loaded data
      - Regeneration more appropriate if there is a lot of program logic to determine what data must be updated in the aggregate table
      Storing Aggregate Rows
      - A combined table containing basic level fact rows and aggregate rows
      - A single aggregate table which holds all aggregate data for a single base fact table
      - A separate table for each aggregate created (most preferred option)
      Storing Aggregate Dimension Data
      - Multiple hierarchies in a single dimension
      - Store all of the aggregate dimension records together in a single table
      - Use a separate table for each level in the dimension
      - Add dimension data to aggregate fact table
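The "recreating vs. updating" choice can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the presenters' implementation: the table names (fact_sales, agg_sales_district_week), the columns and the sample rows are all hypothetical. It keeps one separate table for the aggregate, carries a row_count column in the spirit of the "count" columns mentioned above, and shows that an incremental update and a full rebuild arrive at the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    district TEXT, fiscal_week INTEGER, sale_amt REAL, sales_qty INTEGER
);
-- One separate table per aggregate (hypothetical names).
CREATE TABLE agg_sales_district_week (
    district TEXT, fiscal_week INTEGER,
    sale_amt REAL, sales_qty INTEGER, row_count INTEGER,
    PRIMARY KEY (district, fiscal_week)
);
""")


def rebuild_aggregate(conn):
    """Recreate: throw the aggregate away and regenerate it from the fact table."""
    conn.execute("DELETE FROM agg_sales_district_week")
    conn.execute("""
        INSERT INTO agg_sales_district_week
        SELECT district, fiscal_week, SUM(sale_amt), SUM(sales_qty), COUNT(*)
        FROM fact_sales
        GROUP BY district, fiscal_week
    """)


def load_and_update(conn, new_rows):
    """Update: load new fact rows and fold only those rows into the aggregate."""
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", new_rows)
    for district, week, amt, qty in new_rows:
        cur = conn.execute(
            """UPDATE agg_sales_district_week
               SET sale_amt = sale_amt + ?, sales_qty = sales_qty + ?,
                   row_count = row_count + 1
               WHERE district = ? AND fiscal_week = ?""",
            (amt, qty, district, week),
        )
        if cur.rowcount == 0:  # no aggregate row yet for this key
            conn.execute(
                "INSERT INTO agg_sales_district_week VALUES (?, ?, ?, ?, 1)",
                (district, week, amt, qty),
            )


def snapshot(conn):
    return conn.execute(
        "SELECT * FROM agg_sales_district_week ORDER BY district, fiscal_week"
    ).fetchall()


load_and_update(conn, [("D1", 1, 10.0, 2), ("D1", 1, 5.5, 1), ("D2", 1, 8.0, 4)])
load_and_update(conn, [("D1", 2, 3.0, 1), ("D2", 1, 2.0, 1)])
incremental = snapshot(conn)

rebuild_aggregate(conn)
rebuilt = snapshot(conn)
print(incremental == rebuilt)
```

The incremental path touches only the keys present in the new load, which is why it tends to be cheaper; the rebuild path is simpler to reason about when late-arriving corrections or complex rules make it hard to say which aggregate rows changed.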
  • 15. Efficient Indexing for Datawarehouse 44% say Adding Indexes
  • 16. Dimension table indexing
      - Create a non-clustered primary key on the surrogate key of each dimension table
      - Consider a clustered index on the business key. It can:
          - Enhance query response when the business key is used in the WHERE clause
          - Help avoid lock escalation during the ETL process
      - For large type 2 SCDs, create a four-part non-clustered index: business key, record begin date, record end date and surrogate key
      - Create non-clustered indexes on columns in the dimension that will be used for searching, sorting, or grouping
      - If there’s a hierarchy in a dimension, such as Category - Sub Category - Product ID, then create an index on the hierarchy
      Index columns and index type:
      - EmployeeKey: non-clustered
      - EmployeeNationalIDAlternateKey: clustered
      - EmployeeNationalIDAlternateKey, StartDate, EndDate, EmployeeKey: non-clustered
      - FirstName, LastName, DepartmentName: non-clustered
  • 17. Fact table indexing
      - Create a clustered, composite index composed of each of the foreign keys of the fact table
      - Keep the most commonly queried date column as the leftmost column in the index
      - There can be more than one date in the fact table, but there is usually one date that is of the most interest to business users. A clustered index on this column has the effect of quickly segmenting the amount of data that must be evaluated for a given query
      Index columns and index type:
      - OrderDateKey, ProductKey, CustomerKey, PromotionKey, CurrencyKey, SalesTerritoryKey, DueDateKey: clustered
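The effect of putting the date key leftmost can be observed with a small experiment. The sketch below uses Python's sqlite3 module as a stand-in; SQLite has no clustered index in the SQL Server sense, so an ordinary composite index is used here, and the table and index names are hypothetical. It asks the planner how it would run a date-range query versus a query that filters only on a non-leading key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    order_date_key INTEGER,
    product_key    INTEGER,
    customer_key   INTEGER,
    sales_amount   REAL
);
-- Composite index over the foreign keys, date key leftmost (hypothetical name).
CREATE INDEX ix_fact_date_product_customer
    ON fact_sales (order_date_key, product_key, customer_key);
""")


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


by_date = query_plan(
    conn,
    "SELECT SUM(sales_amount) FROM fact_sales WHERE order_date_key BETWEEN ? AND ?",
    (20130101, 20130131),
)
by_customer = query_plan(
    conn,
    "SELECT SUM(sales_amount) FROM fact_sales WHERE customer_key = ?",
    (42,),
)
print("date range :", by_date)
print("customer   :", by_customer)
```

In this setup the date-range query can seek into the index, while the customer-only query cannot use the leading column and falls back to scanning the table, which is the reason for leading with the date column that business users filter on most.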
  • 19. Row Store and Column Store
      Most queries do not process all the attributes of a particular relation.
      Row Store
      - (+) Easy to add/modify a record
      - (-) Might read in unnecessary data
      Column Store
      - (+) Only need to read in relevant data
      - (-) Tuple writes require multiple accesses
      One can obtain the performance benefits of a column-store using a row-store by making some changes to the physical structure of the row store:
      - Vertical partitioning
      - Using index-only plans
      - Using materialized views
  • 20. Vertical Partitioning
      Process:
      - Full vertical partitioning of each relation
          - Each column = 1 physical table
          - This can be achieved by adding an integer position column to every table
          - Adding an integer position is better than adding the primary key
      - Join on position for multi column fetch
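The process above can be sketched in plain Python, with lists standing in for physical tables. This is an illustrative toy under stated assumptions, not a storage engine: the function names and the sample relation are hypothetical. Each column becomes its own table of (position, value) pairs, a query scans only the column it filters on, and the other requested columns are joined back on position.

```python
def vertically_partition(rows, columns):
    """Split row tuples into one (position, value) table per column."""
    tables = {name: [] for name in columns}
    for position, row in enumerate(rows):
        for name, value in zip(columns, row):
            tables[name].append((position, value))
    return tables


def fetch(tables, wanted, predicate_column, predicate):
    """Scan only predicate_column, then join the wanted columns on position."""
    positions = [pos for pos, value in tables[predicate_column] if predicate(value)]
    lookups = {name: dict(tables[name]) for name in wanted}
    return [tuple(lookups[name][pos] for name in wanted) for pos in positions]


COLUMNS = ("store", "district", "sale_amt", "sales_qty")
ROWS = [
    ("S1", "D1", 10.0, 2),
    ("S2", "D1", 5.5, 1),
    ("S3", "D2", 8.0, 4),
]

tables = vertically_partition(ROWS, COLUMNS)
# Only the district, store and sale_amt column tables are touched here.
result = fetch(tables, ("store", "sale_amt"), "district", lambda d: d == "D1")
print(result)
```

The integer position is a compact join key shared by every column table, which is the reason the slide prefers it over repeating a possibly wide primary key in each one.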
  • 21. Index-only plans
      Process:
      - Add a B+Tree index for every Table.column
      - Plans never access the actual tuples on disk
      - Headers are not stored, so per tuple overhead is less
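An index-only plan can be seen in miniature with Python's sqlite3 module, where the planner reports a "covering index" when every column a query needs is present in the index. The sketch below is an assumption-laden illustration: it uses one composite index rather than one index per column as on the slide, and the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    district TEXT, store TEXT, sale_amt REAL, sales_qty INTEGER
);
-- Index holding every column the first query needs (hypothetical name).
CREATE INDEX ix_sales_district_amt ON sales (district, sale_amt);
""")


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Both referenced columns live in the index, so the table is never read.
covered = query_plan(conn, "SELECT sale_amt FROM sales WHERE district = ?", ("D1",))
# sales_qty is not in the index, so each match needs a lookup in the table.
not_covered = query_plan(conn, "SELECT sales_qty FROM sales WHERE district = ?", ("D1",))
print("covered    :", covered)
print("not covered:", not_covered)
```

The trade-off is the usual one for indexes: every additional index speeds up reads that it covers and adds work to every load.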
  • 22. Using Hadoop for Datawarehouse
  • 23. Hadoop ecosystem: an ecosystem of open source projects, hosted by the Apache Foundation. Google developed and shared the concepts.
      - Distributed Storage (HDFS): distributed file system that has the ability to scale out
      - Distributed Processing (MapReduce)
      - Metadata Management (HCatalog)
      - Query (Hive)
      - Scripting (Pig)
      - Non-Relational Database (HBase)
      - Data Extraction & Loading (HCatalog APIs, WebHDFS, Talend Open Studio for Big Data, Sqoop)
      - Workflow & Scheduling (Oozie)
      - Management & Monitoring (Ambari, ZooKeeper)
  • 24. Promising uses of Hadoop in DW context
      - Data staging: Hadoop allows organizations to deploy an extremely scalable and economical ETL environment
      - Data archiving: Hadoop’s scalability and low cost enable organizations to keep all data forever in a readily accessible online environment
      - Schema flexibility: Hadoop can quickly and easily ingest any data format
      - Processing flexibility: Hadoop enables the growing practice of “late binding”. Instead of transforming data as it’s ingested by Hadoop, structure is applied at runtime
      - Distributed DW architecture: Off load workloads for big data and advanced analytics to HDFS, discovery platforms and MapReduce
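"Late binding" can be illustrated without Hadoop at all. The following is a minimal sketch in plain Python, assuming raw events are kept as untouched JSON lines; the sample records, field names and the read_with_schema helper are all hypothetical. Nothing is transformed at ingest time, and each reader applies its own structure when it runs.

```python
import json

# Raw lines are stored exactly as they arrived; records need not share fields.
RAW_LINES = [
    '{"user": "u1", "amount": "12.50", "channel": "web"}',
    '{"user": "u2", "amount": "7.25"}',
    '{"user": "u1", "amount": "3.00", "coupon": "SPRING"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> cast function) at read time.

    Fields absent from a record come back as None; fields not named in the
    schema are ignored, so different readers can project different views.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


# Two readers bind two different structures to the same stored bytes.
sales_view = list(read_with_schema(RAW_LINES, {"user": str, "amount": float}))
channel_view = list(read_with_schema(RAW_LINES, {"user": str, "channel": str}))

total = sum(row["amount"] for row in sales_view)
print(total)
print(channel_view)
```

The cost of this flexibility is that parsing and type checking are paid on every read, which is one reason frequently queried data still tends to be loaded into a structured warehouse.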
  • 25. What led to Datawarehouse at Facebook
      The Problem
      - Data, data and more data
      - 200 GB per day in March 2008
      - 2+ TB (compressed) per day
      The Hadoop Experiment
      - Superior in availability, scalability and manageability compared to commercial databases
      - Uses Hadoop File System (HDFS)
      Challenges with Hadoop
      - Programmability & metadata
      - Map Reduce hard to program
      - Need to publish data in well known schemas
      HIVE
      What is Hive?
      - A system for managing and querying structured data built on top of Hadoop
      - Uses Map Reduce for execution
      - Uses HDFS for storage
      Key Building Principles
      - SQL on structured data as a familiar data warehousing tool
      - Pluggable map/reduce scripts in language of your choice
      - Rich data types
      - Performance
      Tables
      - Each table has a corresponding directory in HDFS
      - Each table points to existing data directories in HDFS
      - Split data based on hash of a column, mainly for parallelism
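Splitting data on the hash of a column can be sketched in a few lines of Python. This is an illustration of the idea only: zlib.crc32 is used as a stand-in for a stable hash and is an assumption of this sketch, not the function Hive applies, and the helper names and sample rows are hypothetical.

```python
import zlib
from collections import defaultdict


def bucket_for(value, num_buckets):
    """Map a column value to a bucket number using a stable hash."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def bucket_rows(rows, key, num_buckets):
    """Group rows so that equal key values always land in the same bucket."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[key], num_buckets)].append(row)
    return buckets


ROWS = [
    {"user_id": 101, "page": "home"},
    {"user_id": 202, "page": "search"},
    {"user_id": 101, "page": "profile"},
    {"user_id": 303, "page": "home"},
]

buckets = bucket_rows(ROWS, "user_id", num_buckets=4)
for bucket_id in sorted(buckets):
    print(bucket_id, buckets[bucket_id])
```

Because every row for a given key lands in one bucket, each bucket can be processed by a separate worker, and a per-key aggregation never needs rows from another bucket.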
  • 27. Analytical platforms overview
      Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability compared to general purpose solutions.
      Platforms: 1010data, Aster Data (Teradata), Calpont, Datallegro (Microsoft), Exasol, Greenplum (EMC), IBM SmartAnalytics, Infobright, Kognitio, Netezza (IBM), Oracle Exadata, Paraccel, Pervasive, Sand Technology, SAP HANA, Sybase IQ (SAP), Teradata, Vertica (HP)
      Deployment Options
      - Software only (Paraccel, Vertica)
      - Appliance (SAP, Exadata, Netezza)
      - Hosted (1010data, Kognitio)
      Examples
      - Kelley Blue Book: consolidates millions of auto transactions each week to calculate car valuations
      - AT&T Mobility: tracks purchasing patterns for 80M customers daily to optimize targeted marketing
  • 28. Which platform do you choose? Hadoop | Analytic Database | General Purpose RDBMS. Data types: Structured, Semi-Structured, Unstructured
  • 29. Thank You Please send your Feedback & Corporate Training /Consulting Services requirements on BI to sameer@compulinkacademy.com