Data Warehouse Design Considerations
Ram Kedem
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Slowly Changing Dimensions 
•Type 1 SCD
  •OLTP updates are moved into the DW
  •Any change overwrites the current DW data
  •The actual past data (history) is lost
  •Historical data may be changed if it doesn't contain important business details (such as store location)
Slowly Changing Dimensions 
•Type 2 SCD
  •Data is not overwritten in the DW
  •A new row for the customer must be inserted
  •This usually creates primary key issues
    •For example, if a customer's details change, this approach inserts another row in the dimension for the same customer
  •You must add a surrogate key (DWH key)
    •An incrementing number assigned on each update; the same idea as a primary key that consists of two columns
  •You must also add another column or two
    •To flag the current value
    •To provide a date / time perspective
Slowly Changing Dimensions 
•Type 1 SCD
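A Type 1 change can be sketched in T-SQL as follows. This is a minimal illustration, not taken from the slides: the table dbo.DimCustomerDemo, its columns, and the sample values are all hypothetical.

```sql
-- Hypothetical dimension table; every name here is an assumption for illustration.
CREATE TABLE dbo.DimCustomerDemo (
    CustomerID   INT          NOT NULL PRIMARY KEY,  -- business key from the OLTP system
    CustomerName NVARCHAR(50) NOT NULL,
    City         NVARCHAR(50) NOT NULL
);

INSERT INTO dbo.DimCustomerDemo (CustomerID, CustomerName, City)
VALUES (42, N'Dana', N'Haifa');

-- Type 1: the OLTP change overwrites the DW row in place.
UPDATE dbo.DimCustomerDemo
SET    City = N'Tel Aviv'
WHERE  CustomerID = 42;

-- Only the new value remains; the previous city can no longer be queried.
SELECT CustomerID, CustomerName, City
FROM   dbo.DimCustomerDemo;
```

Because the business key stays the primary key, no extra columns are needed; the cost is that reports on past periods now show the new city.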
Slowly Changing Dimensions 
•Type 2 SCD
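A Type 2 change could look like the sketch below. It assumes a hypothetical table dbo.DimCustomerScd2Demo with a surrogate key, a current-row flag, and a validity date range; these names and the sample dates are illustrative and do not come from the slides.

```sql
-- Hypothetical Type 2 dimension; all names are assumptions for illustration.
CREATE TABLE dbo.DimCustomerScd2Demo (
    CustomerKey  INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate (DWH) key
    CustomerID   INT          NOT NULL,  -- business key, repeated for each version
    CustomerName NVARCHAR(50) NOT NULL,
    City         NVARCHAR(50) NOT NULL,
    ValidFrom    DATETIME     NOT NULL,  -- date / time perspective
    ValidTo      DATETIME     NULL,      -- NULL while the row is current
    IsCurrent    BIT          NOT NULL   -- flags the current value
);

INSERT INTO dbo.DimCustomerScd2Demo
       (CustomerID, CustomerName, City, ValidFrom, ValidTo, IsCurrent)
VALUES (42, N'Dana', N'Haifa', '20130101', NULL, 1);

-- Customer 42 moves: close the current row, then insert a new version.
DECLARE @ChangeDate DATETIME = '20140601';

BEGIN TRANSACTION;

UPDATE dbo.DimCustomerScd2Demo
SET    ValidTo = @ChangeDate,
       IsCurrent = 0
WHERE  CustomerID = 42
  AND  IsCurrent = 1;

INSERT INTO dbo.DimCustomerScd2Demo
       (CustomerID, CustomerName, City, ValidFrom, ValidTo, IsCurrent)
VALUES (42, N'Dana', N'Tel Aviv', @ChangeDate, NULL, 1);

COMMIT TRANSACTION;

-- Both versions are kept; each has its own surrogate key.
SELECT CustomerKey, CustomerID, City, ValidFrom, ValidTo, IsCurrent
FROM   dbo.DimCustomerScd2Demo
ORDER BY CustomerKey;
```

Fact rows would reference CustomerKey rather than CustomerID, so each fact stays attached to the version of the customer that was current when it was loaded.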
Understanding Indexing 
•Indexing affects how data is stored and managed in SQL Server
•There are four main indexing options in SQL Server
  •Clustered index
  •Non-clustered index
  •Filtered non-clustered index
  •Columnstore index (include)
Understanding Indexing 
•Clustered index
  •Determines the physical storage order of the data
  •There can be only one clustered index on a table
•Non-clustered index
  •Sorts data in a column or columns and stores pointers to the actual data rows
  •A table can have up to 999 non-clustered indexes
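The two index types above can be sketched as follows. The table dbo.ProductsDemo and the index names are hypothetical; the column names only loosely mirror the products table used elsewhere in this deck.

```sql
-- Hypothetical table created as a heap (no primary key yet).
CREATE TABLE dbo.ProductsDemo (
    ProductID    INT          NOT NULL,
    ProductName  NVARCHAR(40) NOT NULL,
    UnitPrice    MONEY        NOT NULL,
    UnitsInStock SMALLINT     NOT NULL
);

-- Clustered index: the table rows themselves are ordered by ProductID.
-- A second CREATE CLUSTERED INDEX on this table would fail.
CREATE UNIQUE CLUSTERED INDEX cix_ProductsDemo_ProductID
ON dbo.ProductsDemo (ProductID);

-- Non-clustered index: a separate structure sorted by ProductName
-- that points back to the data rows.
CREATE NONCLUSTERED INDEX ix_ProductsDemo_ProductName
ON dbo.ProductsDemo (ProductName);
```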
Understanding Indexing 
•Filtered non-clustered index
  •Creates a non-clustered index on a subset of the rows, selected by a filter on column values
•Columnstore index
  •A non-clustered index placed on one or more columns
  •Each column is stored and searched separately from the data row
  •In SQL Server 2012, adding a columnstore index to a table makes the table read-only
  •https://www.simple-talk.com/sql/database-administration/columnstore-indexes-in-sql-server-2012/
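A filtered index might be declared as in the sketch below. The table dbo.ProductsFilterDemo and its Discontinued column are assumptions made for illustration only.

```sql
-- Hypothetical table; the Discontinued flag is an assumed column.
CREATE TABLE dbo.ProductsFilterDemo (
    ProductID    INT          NOT NULL PRIMARY KEY,
    ProductName  NVARCHAR(40) NOT NULL,
    Discontinued BIT          NOT NULL
);

-- Filtered non-clustered index: only rows that satisfy the WHERE clause
-- are indexed, so the index is smaller and cheaper to maintain.
CREATE NONCLUSTERED INDEX ix_ProductsFilterDemo_Active
ON dbo.ProductsFilterDemo (ProductName)
WHERE Discontinued = 0;
```

Queries that filter on Discontinued = 0 and search by ProductName are the ones this index is meant to serve.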
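Columnstore Index
-- Build a non-clustered columnstore index over three columns of dbo.products
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_products
ON dbo.products
(productName, UnitPrice, unitsinstock);
-- A query that reads only these columns can be served from the columnstore index
SELECT productName, UnitPrice, unitsinstock
FROM dbo.products;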
Indexing the Data Warehouse 
•Indexing in the Data Warehouse can be tricky
  •Too few indexes: data loads are quick, but query response is slow
  •Too many indexes: loads slow down and storage requirements go up, but query response is good
Indexing the Data Warehouse 
•General rule of thumb
  •Dimension tables
    •Place a clustered index on the surrogate key
    •If the table has a lot of columns, create non-clustered indexes on the most frequently queried columns
  •Fact tables
    •Place a non-clustered index on each single-column foreign key to a dimension table
    •If the primary key is a composite of all the dimension foreign keys, make it a non-unique clustered index
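The rule of thumb above could translate into DDL along these lines. The tables dbo.DimProductDemo and dbo.FactSalesDemo and all index names are hypothetical, and foreign key constraints are left out to keep the sketch short.

```sql
-- Hypothetical dimension table: clustered index on the surrogate key.
CREATE TABLE dbo.DimProductDemo (
    ProductKey  INT IDENTITY(1,1) NOT NULL,
    ProductName NVARCHAR(40)      NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX cix_DimProductDemo_ProductKey
ON dbo.DimProductDemo (ProductKey);

-- Hypothetical fact table keyed by its dimension foreign keys.
CREATE TABLE dbo.FactSalesDemo (
    DateKey     INT   NOT NULL,
    ProductKey  INT   NOT NULL,
    CustomerKey INT   NOT NULL,
    SalesAmount MONEY NOT NULL
);

-- Non-unique clustered index on the composite of the dimension keys.
CREATE CLUSTERED INDEX cix_FactSalesDemo_Keys
ON dbo.FactSalesDemo (DateKey, ProductKey, CustomerKey);

-- One non-clustered index per single-column foreign key.
CREATE NONCLUSTERED INDEX ix_FactSalesDemo_ProductKey
ON dbo.FactSalesDemo (ProductKey);

CREATE NONCLUSTERED INDEX ix_FactSalesDemo_CustomerKey
ON dbo.FactSalesDemo (CustomerKey);
```

DateKey gets no separate index here because it is already the leading column of the clustered index; whether that holds in a real warehouse depends on the query mix.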
Understanding Indexed Views
•What is a view
  •A result set of a query, exposed as a virtual table
  •The virtual table is not stored permanently in the database
  •The view can be referenced like a table in T-SQL
•Indexing a view
  •You can create a unique clustered index on a view
  •The view's result set is then stored in the database, just like a regular table with a clustered index
Understanding Indexed Views
•Advantages
  •Improves the performance of joins and aggregations that process many rows
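An indexed view over an aggregation could be sketched as below. The table dbo.FactSalesViewDemo and the view name are hypothetical. GO is the batch separator used by tools such as SSMS and sqlcmd, needed here because CREATE VIEW must be the first statement in its batch. The sketch also assumes the session SET options that indexed views require (for example ANSI_NULLS and QUOTED_IDENTIFIER set to ON).

```sql
-- Hypothetical fact table; SalesAmount is NOT NULL because an indexed view
-- cannot use SUM over a nullable expression.
CREATE TABLE dbo.FactSalesViewDemo (
    ProductKey  INT   NOT NULL,
    SalesAmount MONEY NOT NULL
);
GO

-- The view must be schema-bound and use two-part table names.
-- With GROUP BY, COUNT_BIG(*) is required in the select list.
CREATE VIEW dbo.vSalesByProductDemo
WITH SCHEMABINDING
AS
SELECT ProductKey,
       SUM(SalesAmount) AS TotalSales,
       COUNT_BIG(*)     AS RowCnt
FROM   dbo.FactSalesViewDemo
GROUP BY ProductKey;
GO

-- Creating the unique clustered index is what stores the result set.
CREATE UNIQUE CLUSTERED INDEX cix_vSalesByProductDemo
ON dbo.vSalesByProductDemo (ProductKey);
```

Once the index exists, SQL Server maintains the stored totals as the fact table changes, which adds some cost to every load.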
Understanding Data Compression 
•SQL Server 2012 supports data compression
•Data compression reduces the size of the database
  •Packs more data onto fewer data pages
  •Fewer data page reads are required to satisfy queries
  •Lower I/O means faster response and a lower processing load on the server
•Extra CPU resources are required for data compression / decompression
•A DWH usually doesn't have many updates (other than bulk loading)
Understanding Data Compression 
•SQL Server 2012 supports three compression types
  •Page compression
    •Focuses on duplicated values within the data page
    •Stores the value once and places a pointer at all other locations
  •Row compression
    •Removes unused bytes in fixed-length data types
    •For example, CHAR(25)
  •Unicode compression
    •Reduces storage space for Unicode data that doesn't require that space
Understanding Data Compression 
•Which compression should you use
  •Page compression
    •Row compression is applied automatically when page compression is used
    •If you choose row compression, page compression is not applied
  •Fact tables usually benefit the most from compression
  •In SQL Server 2012, compression is only available in Enterprise Edition
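Enabling page compression on a fact table might look like the sketch below. The table dbo.FactSalesCompressDemo is hypothetical; sp_estimate_data_compression_savings is the system procedure for previewing the effect before rebuilding.

```sql
-- Hypothetical fact table used only for illustration.
CREATE TABLE dbo.FactSalesCompressDemo (
    DateKey     INT      NOT NULL,
    ProductKey  INT      NOT NULL,
    StoreName   CHAR(25) NOT NULL,  -- fixed-length column, a candidate for row compression
    SalesAmount MONEY    NOT NULL
);

-- Estimate the savings before changing anything.
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'FactSalesCompressDemo',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = N'PAGE';

-- Rebuild the table with page compression (which includes row compression).
ALTER TABLE dbo.FactSalesCompressDemo
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Check the compression setting of each partition.
SELECT p.partition_number, p.data_compression_desc
FROM   sys.partitions AS p
WHERE  p.object_id = OBJECT_ID(N'dbo.FactSalesCompressDemo');
```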
Data Lineage 
•What is data lineage
  •Data origination and flow details
  •Where it is from, where it is going, how it is transformed in the process
  •Same concept as comments in programming
Data Lineage 
•Why do we need data lineage
  •To provide metadata context in the DWH
  •Future business rules may change, affecting some data
    •Making it invalid
    •Making it suspect
    •Making it more important
  •Data lineage allows us to identify this data
Data Lineage 
•Two main options for adding data lineage
  •SSIS system variables
  •T-SQL system functions
Data Lineage using TSQL 
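SELECT
    APP_NAME()              AS application_name,  -- program that opened the connection
    DATABASE_PRINCIPAL_ID() AS db_principal_id,   -- ID of the current database user
    USER_NAME()             AS db_user_name,
    SUSER_NAME()            AS login_name,
    GETDATE()               AS load_time,
    CURRENT_TIMESTAMP       AS load_timestamp,    -- written without parentheses
    CONNECTIONPROPERTY('client_net_address') AS client_address;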
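One way to store these values with the data is to add lineage columns with DEFAULT constraints, as in the sketch below. The table dbo.FactSalesLineageDemo and its column names are hypothetical.

```sql
-- Hypothetical fact table with lineage columns filled in by defaults.
CREATE TABLE dbo.FactSalesLineageDemo (
    SalesKey      INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SalesAmount   MONEY             NOT NULL,
    LoadedAt      DATETIME          NOT NULL DEFAULT (GETDATE()),     -- when the row arrived
    LoadedByLogin NVARCHAR(128)     NOT NULL DEFAULT (SUSER_NAME()),  -- who loaded it
    LoadedByApp   NVARCHAR(128)     NULL     DEFAULT (APP_NAME())     -- which program loaded it
);

-- The load supplies only business data; the lineage columns fill themselves.
INSERT INTO dbo.FactSalesLineageDemo (SalesAmount)
VALUES (199.90);

SELECT SalesKey, SalesAmount, LoadedAt, LoadedByLogin, LoadedByApp
FROM   dbo.FactSalesLineageDemo;
```

If a business rule later changes, rows loaded before a given date or by a given process can be found by filtering on these columns.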
Using Partitions 
•Fact tables become very large over time
•Very large database tables present serious challenges
  •What if you need to delete a large portion of the data?
    •The TRUNCATE TABLE command deletes with minimal logging, but it removes every row in the table
  •Large data inserts become time consuming
  •Index maintenance and storage can become problematic
•Table partitions deal with all these issues
Using Partitions 
•What is a table partition
  •A large table is stored in multiple files
  •Divided by rows (based on a condition)
    •Usually date / time
  •SQL Server 2012 allows up to 15,000 partitions on a single table
  •Partitions and data are managed in the background
Using Partitions
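A date-based partition layout could be sketched as follows. The partition function, scheme, and table names are hypothetical, and every partition is mapped to the PRIMARY filegroup only to keep the example short; a real warehouse would normally spread partitions across several filegroups.

```sql
-- Partition function: two boundary values give three partitions.
-- RANGE RIGHT means each boundary date belongs to the partition on its right.
CREATE PARTITION FUNCTION pfSalesYearDemo (DATE)
AS RANGE RIGHT FOR VALUES ('20130101', '20140101');

-- Partition scheme: maps the partitions to filegroups (all to PRIMARY here).
CREATE PARTITION SCHEME psSalesYearDemo
AS PARTITION pfSalesYearDemo ALL TO ([PRIMARY]);

-- Hypothetical fact table stored on the partition scheme, split by SalesDate.
CREATE TABLE dbo.FactSalesPartDemo (
    SalesDate   DATE  NOT NULL,
    SalesAmount MONEY NOT NULL
) ON psSalesYearDemo (SalesDate);

INSERT INTO dbo.FactSalesPartDemo (SalesDate, SalesAmount)
VALUES ('20121215', 10), ('20130620', 20), ('20140305', 30);

-- Show which partition each row landed in.
SELECT $PARTITION.pfSalesYearDemo(SalesDate) AS partition_number,
       COUNT(*)                              AS row_count
FROM   dbo.FactSalesPartDemo
GROUP BY $PARTITION.pfSalesYearDemo(SalesDate)
ORDER BY partition_number;
```

With these boundaries the three sample rows fall into partitions 1, 2 and 3. Old data can then be removed one partition at a time instead of with row-by-row deletes.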
Identifying our Dimensions / Fact Tables

Más contenido relacionado

La actualidad más candente

Adbms 17 object query language
Adbms 17 object query languageAdbms 17 object query language
Adbms 17 object query languageVaibhav Khanna
 
Multidimensional Database Design & Architecture
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecturehasanshan
 
Data warehouse design
Data warehouse designData warehouse design
Data warehouse designines beltaief
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Databasenehabsairam
 
Advanced Dimensional Modelling
Advanced Dimensional ModellingAdvanced Dimensional Modelling
Advanced Dimensional ModellingVincent Rainardi
 
Multimedia Database
Multimedia Database Multimedia Database
Multimedia Database Avnish Patel
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemRutvik Bapat
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and UsesSuvradeep Rudra
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Data warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaData warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaVaibhav Khanna
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEZalpa Rathod
 
OODM-object oriented data model
OODM-object oriented data modelOODM-object oriented data model
OODM-object oriented data modelAnilPokhrel7
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File Systemelliando dias
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPTTrinath
 
Metadata in data warehouse
Metadata in data warehouseMetadata in data warehouse
Metadata in data warehouseSiddique Ibrahim
 

La actualidad más candente (20)

Adbms 17 object query language
Adbms 17 object query languageAdbms 17 object query language
Adbms 17 object query language
 
Multidimensional Database Design & Architecture
Multidimensional Database Design & ArchitectureMultidimensional Database Design & Architecture
Multidimensional Database Design & Architecture
 
Aspects of data mart
Aspects of data martAspects of data mart
Aspects of data mart
 
Data warehouse design
Data warehouse designData warehouse design
Data warehouse design
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
 
Advanced Dimensional Modelling
Advanced Dimensional ModellingAdvanced Dimensional Modelling
Advanced Dimensional Modelling
 
Distributed database
Distributed databaseDistributed database
Distributed database
 
Multimedia Database
Multimedia Database Multimedia Database
Multimedia Database
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Data warehouse 21 snowflake schema
Data warehouse 21 snowflake schemaData warehouse 21 snowflake schema
Data warehouse 21 snowflake schema
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 
OODM-object oriented data model
OODM-object oriented data modelOODM-object oriented data model
OODM-object oriented data model
 
Hadoop YARN
Hadoop YARNHadoop YARN
Hadoop YARN
 
Data mining tasks
Data mining tasksData mining tasks
Data mining tasks
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPT
 
Metadata in data warehouse
Metadata in data warehouseMetadata in data warehouse
Metadata in data warehouse
 

Similar a Data Warehouse Design Considerations

Data Warehouse Basics
Data Warehouse BasicsData Warehouse Basics
Data Warehouse BasicsRam Kedem
 
Managing and Configuring Databases
Managing and Configuring DatabasesManaging and Configuring Databases
Managing and Configuring DatabasesRam Kedem
 
Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Mrunal Shridhar
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASAshnikbiz
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauSam Palani
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server DatabasesColdFusionConference
 
Introduction to Databases
Introduction to DatabasesIntroduction to Databases
Introduction to DatabasesRam Kedem
 
Column Statistics in Hive
Column Statistics in HiveColumn Statistics in Hive
Column Statistics in Hivevshreepadma
 
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...Amazon Web Services
 
Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Ashnikbiz
 
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Amazon Web Services
 
Optimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak PerformanceOptimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak PerformanceAmazon Web Services
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQLRam Kedem
 
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle Ashnikbiz
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopDataWorks Summit
 
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users HappyGeek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users HappyIDERA Software
 
What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?FlyData Inc.
 

Similar a Data Warehouse Design Considerations (20)

Data Warehouse Basics
Data Warehouse BasicsData Warehouse Basics
Data Warehouse Basics
 
Managing and Configuring Databases
Managing and Configuring DatabasesManaging and Configuring Databases
Managing and Configuring Databases
 
Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
 
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
 
Building better SQL Server Databases
Building better SQL Server DatabasesBuilding better SQL Server Databases
Building better SQL Server Databases
 
Introduction to Databases
Introduction to DatabasesIntroduction to Databases
Introduction to Databases
 
Column Statistics in Hive
Column Statistics in HiveColumn Statistics in Hive
Column Statistics in Hive
 
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
 
Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus
 
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...
 
Optimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak PerformanceOptimising your Amazon Redshift Cluster for Peak Performance
Optimising your Amazon Redshift Cluster for Peak Performance
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQL
 
In Memory Cahce Structure
In Memory Cahce StructureIn Memory Cahce Structure
In Memory Cahce Structure
 
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
Ashnik EnterpriseDB PostgreSQL - A real alternative to Oracle
 
Redis meetup
Redis meetupRedis meetup
Redis meetup
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on Hadoop
 
Redis - Partitioning
Redis - PartitioningRedis - Partitioning
Redis - Partitioning
 
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users HappyGeek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
 
What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?
 

Más de Ram Kedem

Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edgeRam Kedem
 
Advanced SQL Webinar
Advanced SQL WebinarAdvanced SQL Webinar
Advanced SQL WebinarRam Kedem
 
Managing oracle Database Instance
Managing oracle Database InstanceManaging oracle Database Instance
Managing oracle Database InstanceRam Kedem
 
Power Pivot and Power View
Power Pivot and Power ViewPower Pivot and Power View
Power Pivot and Power ViewRam Kedem
 
Data Mining in SSAS
Data Mining in SSASData Mining in SSAS
Data Mining in SSASRam Kedem
 
Data mining In SSAS
Data mining In SSASData mining In SSAS
Data mining In SSASRam Kedem
 
SQL Injections - Oracle
SQL Injections - OracleSQL Injections - Oracle
SQL Injections - OracleRam Kedem
 
SSAS Attributes
SSAS AttributesSSAS Attributes
SSAS AttributesRam Kedem
 
DDL Practice (Hebrew)
DDL Practice (Hebrew)DDL Practice (Hebrew)
DDL Practice (Hebrew)Ram Kedem
 
DML Practice (Hebrew)
DML Practice (Hebrew)DML Practice (Hebrew)
DML Practice (Hebrew)Ram Kedem
 
Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Ram Kedem
 
Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Ram Kedem
 
Pig - Processing XML data
Pig - Processing XML dataPig - Processing XML data
Pig - Processing XML dataRam Kedem
 
SSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesSSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesRam Kedem
 
SSRS Basic Parameters
SSRS Basic ParametersSSRS Basic Parameters
SSRS Basic ParametersRam Kedem
 
SSRS Conditional Formatting
SSRS Conditional FormattingSSRS Conditional Formatting
SSRS Conditional FormattingRam Kedem
 
SSRS Calculated Fields
SSRS Calculated FieldsSSRS Calculated Fields
SSRS Calculated FieldsRam Kedem
 

Más de Ram Kedem (20)

Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edge
 
Advanced SQL Webinar
Advanced SQL WebinarAdvanced SQL Webinar
Advanced SQL Webinar
 
Managing oracle Database Instance
Managing oracle Database InstanceManaging oracle Database Instance
Managing oracle Database Instance
 
Power Pivot and Power View
Power Pivot and Power ViewPower Pivot and Power View
Power Pivot and Power View
 
Data Mining in SSAS
Data Mining in SSASData Mining in SSAS
Data Mining in SSAS
 
Data mining In SSAS
Data mining In SSASData mining In SSAS
Data mining In SSAS
 
SQL Injections - Oracle
SQL Injections - OracleSQL Injections - Oracle
SQL Injections - Oracle
 
SSAS Attributes
SSAS AttributesSSAS Attributes
SSAS Attributes
 
SSRS Matrix
SSRS MatrixSSRS Matrix
SSRS Matrix
 
DDL Practice (Hebrew)
DDL Practice (Hebrew)DDL Practice (Hebrew)
DDL Practice (Hebrew)
 
DML Practice (Hebrew)
DML Practice (Hebrew)DML Practice (Hebrew)
DML Practice (Hebrew)
 
Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)
 
Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014
 
Pig - Processing XML data
Pig - Processing XML dataPig - Processing XML data
Pig - Processing XML data
 
SSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesSSAS Cubes & Hierarchies
SSAS Cubes & Hierarchies
 
SSRS Basic Parameters
SSRS Basic ParametersSSRS Basic Parameters
SSRS Basic Parameters
 
SSRS Gauges
SSRS GaugesSSRS Gauges
SSRS Gauges
 
SSRS Conditional Formatting
SSRS Conditional FormattingSSRS Conditional Formatting
SSRS Conditional Formatting
 
SSRS Calculated Fields
SSRS Calculated FieldsSSRS Calculated Fields
SSRS Calculated Fields
 
SSRS Groups
SSRS GroupsSSRS Groups
SSRS Groups
 

Último

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Último (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

Data Warehouse Design Considerations

  • 2. Slowly Changing Dimensions
      • Type 1 SCD
          • OLTP updates are moved into the DW
          • Any change overwrites the current DW data
          • The actual past data history is lost
          • Historical data may be overwritten if it does not contain important business details (such as store location)
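As a rough illustration of the Type 1 behaviour described above, the following T-SQL sketch overwrites a dimension attribute in place. The tables `dbo.DimStore` and `dbo.StageStore` and all of their columns are hypothetical names chosen for this example; they do not come from the presentation.

```sql
-- Hypothetical dimension and staging tables (names are illustrative only).
CREATE TABLE dbo.DimStore (
    StoreKey  INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- DW key
    StoreID   INT          NOT NULL,                  -- business key from the OLTP system
    StoreName NVARCHAR(50) NOT NULL,
    Phone     VARCHAR(20)  NULL
);

CREATE TABLE dbo.StageStore (
    StoreID   INT          NOT NULL,
    StoreName NVARCHAR(50) NOT NULL,
    Phone     VARCHAR(20)  NULL
);

-- Type 1: overwrite the current value; the previous phone number is lost.
UPDATE d
SET    d.Phone = s.Phone
FROM   dbo.DimStore AS d
JOIN   dbo.StageStore AS s
       ON s.StoreID = d.StoreID
WHERE  ISNULL(d.Phone, '') <> ISNULL(s.Phone, '');
```

After the update, the dimension holds only the latest phone number, which is acceptable when nobody needs to report on the old value.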
  • 3. Slowly Changing Dimensions
      • Type 2 SCD
          • Data is not overwritten in the DW
          • A new row for the customer must be inserted
          • This usually creates primary key issues
              • For example, if a customer's details change, this approach suggests inserting another row into the dimension for the same customer
          • You must add a surrogate key (DWH key)
              • An incremented number for each update; the same idea as a primary key that consists of two columns
          • You must also add another column or two
              • To flag the current value
              • To provide a date / time perspective
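The Type 2 approach above can be sketched in T-SQL roughly as follows. This is a minimal sketch under stated assumptions: the table `dbo.DimCustomer`, its columns, and the sample values are hypothetical, and the change is applied for a single customer that already has a current row.

```sql
-- Hypothetical Type 2 dimension (all names are illustrative only).
CREATE TABLE dbo.DimCustomer (
    CustomerKey INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate (DWH) key
    CustomerID  INT          NOT NULL,  -- business key; repeats once per version
    City        NVARCHAR(50) NOT NULL,
    IsCurrent   BIT          NOT NULL DEFAULT (1),      -- flags the current value
    ValidFrom   DATETIME2(0) NOT NULL,                  -- date / time perspective
    ValidTo     DATETIME2(0) NULL
);

-- Sample change arriving from the source system (assumed values).
DECLARE @CustomerID INT          = 42,
        @NewCity    NVARCHAR(50) = N'Haifa',
        @Now        DATETIME2(0) = SYSDATETIME();

BEGIN TRANSACTION;

-- Step 1: expire the current version, but only if the attribute really changed.
UPDATE dbo.DimCustomer
SET    IsCurrent = 0,
       ValidTo   = @Now
WHERE  CustomerID = @CustomerID
  AND  IsCurrent  = 1
  AND  City      <> @NewCity;

-- Step 2: insert the new version; it receives a new surrogate key.
IF @@ROWCOUNT > 0
    INSERT INTO dbo.DimCustomer (CustomerID, City, IsCurrent, ValidFrom, ValidTo)
    VALUES (@CustomerID, @NewCity, 1, @Now, NULL);

COMMIT TRANSACTION;
```

Fact rows loaded before the change keep pointing at the old `CustomerKey`, so historical reports still show the city that was valid at the time.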
  • 4. Slowly Changing Dimensions • Type 1 SCD
  • 5. Slowly Changing Dimensions • Type 2 SCD
  • 6. Understanding Indexing
      • Indexing affects how data is stored and managed in SQL Server
      • There are four main indexing options in SQL Server
          • Clustered index
          • Nonclustered index
          • Filtered nonclustered index
          • Columnstore index (include)
  • 7. Understanding Indexing
      • Clustered index
          • Determines the physical storage order of the data
          • There can be only one clustered index on a table
      • Nonclustered index
          • Sorts data in a column or columns and stores pointers to the actual data row
          • We can have up to 999 nonclustered indexes on a table
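To make the difference between the two index types concrete, here is a small T-SQL sketch. The table `dbo.OrdersIndexDemo`, its columns, and the index names are hypothetical and only serve the illustration.

```sql
-- Hypothetical table used only to demonstrate the two index types.
CREATE TABLE dbo.OrdersIndexDemo (
    OrderID    INT           NOT NULL,
    CustomerID INT           NOT NULL,
    OrderDate  DATE          NOT NULL,
    Amount     DECIMAL(10,2) NOT NULL
);

-- Clustered index: the table rows themselves are ordered by OrderID.
-- Only one clustered index can exist per table.
CREATE UNIQUE CLUSTERED INDEX cix_OrdersIndexDemo_OrderID
    ON dbo.OrdersIndexDemo (OrderID);

-- Nonclustered index: a separate structure sorted by CustomerID that
-- points back to the data rows. INCLUDE copies Amount into the index
-- leaf level so the query below can be answered from the index alone.
CREATE NONCLUSTERED INDEX ix_OrdersIndexDemo_CustomerID
    ON dbo.OrdersIndexDemo (CustomerID)
    INCLUDE (Amount);

SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM   dbo.OrdersIndexDemo
GROUP BY CustomerID;
```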
  • 8. Understanding Indexing
      • Filtered nonclustered index
          • Creates a nonclustered index on a subset of the values in a column
      • Columnstore index
          • A nonclustered index whose columns are each stored and searched separately from the data row
          • In SQL Server 2012, adding a columnstore index makes the table read-only
          • https://www.simple-talk.com/sql/database-administration/columnstore-indexes-in-sql-server-2012/
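A filtered nonclustered index of the kind described above might look like the following T-SQL sketch. The table `dbo.OrdersFilterDemo`, the `Status` column, and the `'Open'` value are assumptions made for the example.

```sql
-- Hypothetical table; most rows are assumed to be closed orders.
CREATE TABLE dbo.OrdersFilterDemo (
    OrderID   INT         NOT NULL PRIMARY KEY,
    OrderDate DATE        NOT NULL,
    Status    VARCHAR(10) NOT NULL
);

-- The index covers only the subset of rows where Status = 'Open',
-- so it stays small and cheap to maintain.
CREATE NONCLUSTERED INDEX ix_OrdersFilterDemo_Open
    ON dbo.OrdersFilterDemo (OrderDate)
    WHERE Status = 'Open';

-- A query whose predicate matches the filter can use the index.
SELECT OrderID, OrderDate
FROM   dbo.OrdersFilterDemo
WHERE  Status = 'Open'
  AND  OrderDate >= '2014-01-01';
```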
  • 9. Columnstore Index
        CREATE NONCLUSTERED COLUMNSTORE INDEX csi_products
        ON dbo.products (productName, UnitPrice, unitsinstock);

        SELECT productName, UnitPrice, unitsinstock
        FROM products;
  • 10. Indexing the Data Warehouse
      • Indexing in the Data Warehouse can be tricky
      • Too few indexes: data loads are quick, but query response time is slow
      • Too many indexes: loads slow down and storage requirements go up, but query response is good
  • 11. Indexing the Data Warehouse
      • General rule of thumb
      • Dimension tables
          • Place a clustered index on the surrogate key
          • If the table has a lot of columns, create nonclustered indexes on the most frequently queried columns
      • Fact tables
          • Place a nonclustered index on each of the single-column foreign keys to the dimension tables
          • If the primary key is a composite of all the dimension foreign keys, make it a non-unique clustered index
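The rule of thumb above could be applied roughly as in this T-SQL sketch. The star schema (`dbo.DimProductDemo`, `dbo.FactSalesDemo`) and every column and index name are hypothetical.

```sql
-- Hypothetical dimension table.
CREATE TABLE dbo.DimProductDemo (
    ProductKey  INT IDENTITY(1,1) NOT NULL,  -- surrogate key
    ProductID   INT          NOT NULL,       -- business key
    ProductName NVARCHAR(50) NOT NULL,
    Category    NVARCHAR(50) NOT NULL
);

-- Dimension: clustered index on the surrogate key.
CREATE UNIQUE CLUSTERED INDEX cix_DimProductDemo
    ON dbo.DimProductDemo (ProductKey);

-- Dimension: nonclustered index on a frequently filtered column.
CREATE NONCLUSTERED INDEX ix_DimProductDemo_Category
    ON dbo.DimProductDemo (Category);

-- Hypothetical fact table keyed by its dimension foreign keys.
CREATE TABLE dbo.FactSalesDemo (
    DateKey     INT           NOT NULL,
    ProductKey  INT           NOT NULL,
    StoreKey    INT           NOT NULL,
    SalesAmount DECIMAL(12,2) NOT NULL
);

-- Fact: non-unique clustered index on the composite of the foreign keys.
CREATE CLUSTERED INDEX cix_FactSalesDemo
    ON dbo.FactSalesDemo (DateKey, ProductKey, StoreKey);

-- Fact: one nonclustered index per single-column foreign key.
CREATE NONCLUSTERED INDEX ix_FactSalesDemo_DateKey    ON dbo.FactSalesDemo (DateKey);
CREATE NONCLUSTERED INDEX ix_FactSalesDemo_ProductKey ON dbo.FactSalesDemo (ProductKey);
CREATE NONCLUSTERED INDEX ix_FactSalesDemo_StoreKey   ON dbo.FactSalesDemo (StoreKey);
```

Whether every foreign key deserves its own index depends on the query workload and on how much load time the extra indexes cost, which is the trade-off the previous slide describes.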
  • 12. Understanding Indexed Views
      • What is a view
          • A result set of a query that acts as a virtual table
          • The virtual table is not stored permanently in the database
          • The view can be referenced like a table in TSQL
      • Indexing a view
          • You can create a unique clustered index on a view
          • The view result set is then stored in the database, just like a regular table with a clustered index
  • 13. Understanding Indexed Views
      • Advantages
          • Improves the performance of joins and aggregations that process many rows
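An indexed view over an aggregation might be set up as in the T-SQL sketch below. The table `dbo.FactSalesViewDemo` and the view name are hypothetical. `GO` is the batch separator used by SSMS and sqlcmd, needed here because `CREATE VIEW` must be the first statement in its batch.

```sql
-- Hypothetical base table. SalesAmount is NOT NULL because an indexed
-- view cannot use SUM over a nullable expression.
CREATE TABLE dbo.FactSalesViewDemo (
    ProductKey  INT           NOT NULL,
    SalesAmount DECIMAL(12,2) NOT NULL
);
GO

-- SCHEMABINDING and two-part object names are required for indexing.
-- A grouped indexed view must also include COUNT_BIG(*).
CREATE VIEW dbo.vSalesByProductDemo
WITH SCHEMABINDING
AS
SELECT ProductKey,
       SUM(SalesAmount) AS TotalSales,
       COUNT_BIG(*)     AS RowCnt
FROM   dbo.FactSalesViewDemo
GROUP BY ProductKey;
GO

-- The unique clustered index materializes the view's result set.
CREATE UNIQUE CLUSTERED INDEX cix_vSalesByProductDemo
    ON dbo.vSalesByProductDemo (ProductKey);
GO

-- NOEXPAND asks the optimizer to read the stored result directly.
SELECT ProductKey, TotalSales
FROM   dbo.vSalesByProductDemo WITH (NOEXPAND);
```

The stored aggregate is maintained on every change to the base table, so the read-side gain comes at some cost during data loads.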
  • 14. Understanding Data Compression
      • SQL Server 2012 supports data compression
      • Data compression reduces the size of the database
          • Packs more data onto fewer data pages
          • Fewer data page reads are required to satisfy queries
          • Lower IO means faster response and a lower processing load on the server
      • Extra CPU resources are required for data compression / decompression
      • A DWH usually does not have many updates (other than bulk loading)
  • 15. Understanding Data Compression
      • SQL Server 2012 supports three compression types
      • Page compression
          • Focuses on duplicated values within the data page
          • Stores one value and places a pointer at all other locations
      • Row compression
          • Removes any unused bytes in a fixed-length data type
          • For example, CHAR(25)
      • Unicode compression
          • Reduces storage space for Unicode data that does not require that space
  • 16. Understanding Data Compression
      • Which compression should you use
      • Page compression
          • Row compression is applied automatically when page compression is used
          • If you choose row compression, page compression is not applied
      • Fact tables usually benefit the most from compression
      • In SQL Server 2012, compression is only available in Enterprise Edition
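One way to try out the compression choices above is sketched in T-SQL below. The table `dbo.FactCompressionDemo` and the index name are hypothetical, and the statements assume an edition of SQL Server that supports data compression.

```sql
-- Hypothetical fact table used only for this demonstration.
CREATE TABLE dbo.FactCompressionDemo (
    DateKey     INT           NOT NULL,
    ProductKey  INT           NOT NULL,
    SalesAmount DECIMAL(12,2) NOT NULL
);

CREATE NONCLUSTERED INDEX ix_FactCompressionDemo_ProductKey
    ON dbo.FactCompressionDemo (ProductKey);

-- Estimate how much space page compression would save before applying it.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactCompressionDemo',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

-- Apply page compression to the table (this includes row compression).
ALTER TABLE dbo.FactCompressionDemo
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Indexes are compressed separately; row compression is chosen here.
ALTER INDEX ix_FactCompressionDemo_ProductKey
    ON dbo.FactCompressionDemo
    REBUILD WITH (DATA_COMPRESSION = ROW);
```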
  • 17. Data Lineage
      • What is data lineage
          • Data origination and flow details
          • Where it is from, where it is going, and how it is transformed in the process
          • The same concept as comments in programming
  • 18. Data Lineage
      • Why do we need data lineage
          • To provide metadata context in the DWH
          • Future business rules may change, affecting some data
              • Making it invalid
              • Making it suspect
              • Making it more important
          • Data lineage allows us to identify this data
  • 19. Data Lineage
      • Two main options for adding data lineage
          • SSIS system variables
          • TSQL system functions
  • 20. Data Lineage using TSQL
        SELECT APP_NAME(),
               DATABASE_PRINCIPAL_ID(),
               USER_NAME(),
               SUSER_NAME(),
               GETDATE(),
               CURRENT_TIMESTAMP,
               CONNECTIONPROPERTY('client_net_address');
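The system functions listed above can be attached to warehouse rows as lineage columns. The following T-SQL is a minimal sketch of that idea; the table `dbo.FactShipmentsDemo`, its column names, and the `'OLTP_Sales'` source label are all hypothetical.

```sql
-- Hypothetical fact table with lineage columns filled in by defaults.
CREATE TABLE dbo.FactShipmentsDemo (
    ShipmentKey   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Quantity      INT           NOT NULL,
    -- Lineage columns: when, by whom, and from which application the row was loaded.
    LoadedAt      DATETIME      NOT NULL DEFAULT (GETDATE()),
    LoadedBy      NVARCHAR(128) NOT NULL DEFAULT (SUSER_NAME()),
    LoadedFromApp NVARCHAR(128) NULL     DEFAULT (APP_NAME()),
    SourceSystem  VARCHAR(30)   NOT NULL -- supplied by the load process
);

-- The load only provides business data and the source label;
-- the remaining lineage columns are populated automatically.
INSERT INTO dbo.FactShipmentsDemo (Quantity, SourceSystem)
VALUES (10, 'OLTP_Sales');

SELECT ShipmentKey, Quantity, LoadedAt, LoadedBy, LoadedFromApp, SourceSystem
FROM   dbo.FactShipmentsDemo;
```

If a business rule later makes one source suspect, the affected rows can be found by filtering on `SourceSystem` or `LoadedAt`.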
  • 21. Using Partitions
      • Fact tables become very large over time
      • Very large database tables present serious challenges
          • What if you need to delete a large portion of the data?
              • The TRUNCATE TABLE command performs deletion with minimal logging, but it deletes the entire table
          • Large data inserts become time consuming
          • Index maintenance and storage can become problematic
      • Table partitions deal with all these issues
  • 22. Using Partitions
      • What is a table partition
          • A large table is stored in multiple files
          • Divided by rows (based on a condition)
              • Usually date / time
          • SQL Server 2012 allows up to 15,000 partitions on a single table
          • Partitions and data are managed in the background
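Partitioning by date, and using it to remove a large slice of old data without truncating the whole table, can be sketched in T-SQL as follows. All object names and boundary dates are hypothetical. For brevity the sketch maps every partition to the `PRIMARY` filegroup, whereas a real design would typically spread partitions over several filegroups and files.

```sql
-- Partition function: splits rows by year. RANGE RIGHT means each
-- boundary value belongs to the partition on its right.
CREATE PARTITION FUNCTION pfOrderYearDemo (DATE)
AS RANGE RIGHT FOR VALUES ('2012-01-01', '2013-01-01', '2014-01-01');

-- Partition scheme: maps partitions to filegroups (all to PRIMARY here).
CREATE PARTITION SCHEME psOrderYearDemo
AS PARTITION pfOrderYearDemo ALL TO ([PRIMARY]);

-- Hypothetical fact table created on the partition scheme.
CREATE TABLE dbo.FactOrdersDemo (
    OrderDate   DATE          NOT NULL,
    CustomerKey INT           NOT NULL,
    Amount      DECIMAL(12,2) NOT NULL
) ON psOrderYearDemo (OrderDate);

-- Empty table with the same structure on the same filegroup,
-- used as the target when switching a partition out.
CREATE TABLE dbo.FactOrdersDemo_Old (
    OrderDate   DATE          NOT NULL,
    CustomerKey INT           NOT NULL,
    Amount      DECIMAL(12,2) NOT NULL
) ON [PRIMARY];

-- Which partition would a given date fall into?
SELECT $PARTITION.pfOrderYearDemo(CAST('2013-06-15' AS DATE)) AS PartitionNumber;

-- Remove everything before 2012: switch partition 1 out (a metadata
-- operation), then truncate only the switched-out table.
ALTER TABLE dbo.FactOrdersDemo SWITCH PARTITION 1 TO dbo.FactOrdersDemo_Old;
TRUNCATE TABLE dbo.FactOrdersDemo_Old;
```

The switch moves no data pages, so the old rows leave the fact table almost instantly, and the minimally logged `TRUNCATE TABLE` then touches only the switched-out table.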
  • 23. Using Partitions
  • 24. Identifying our Dimensions / Fact Tables