Data Warehouse Design Considerations
1.
Data Warehouse Design Considerations
Ram Kedem
2.
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com
Slowly Changing Dimensions
• Type 1 SCD
  • OLTP updates are moved into the DW
  • Any change overwrites the current DW data
  • The actual past data history is lost
  • Historical data may be changed if it doesn't contain important business details (such as store location)
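A minimal sketch of a Type 1 change, assuming a hypothetical dimension table dbo.DimCustomer and a hypothetical staging table stg.Customer that holds the latest OLTP values. Neither table nor any column name comes from the deck; they only illustrate the overwrite.

```sql
-- Hypothetical tables: dbo.DimCustomer (warehouse) and stg.Customer (latest OLTP extract).
-- Type 1: overwrite the dimension row in place, so the previous values are lost.
UPDATE d
SET    d.City  = s.City,
       d.Phone = s.Phone
FROM   dbo.DimCustomer AS d
JOIN   stg.Customer    AS s
       ON s.CustomerID = d.CustomerID   -- match on the business key
WHERE  d.City  <> s.City
   OR  d.Phone <> s.Phone;              -- touch only changed rows (NULLs would need extra handling)
```

After this statement runs, nothing in the dimension records what the old city or phone number was, which is the trade-off the slide describes.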
3.
Slowly Changing Dimensions
• Type 2 SCD
  • Data is not overwritten in the DW
  • A new row for the customer must be inserted
  • This usually creates primary key issues
    • For example, if a customer's details change, this approach has you insert another row in the dimension for the same customer
  • You must add a surrogate key (DWH key)
    • A number incremented for each update; the same idea as a primary key that consists of two columns
  • You must also add another column or two
    • To flag the current value
    • To provide a date / time perspective
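The Type 2 steps could look roughly like the sketch below. The table dbo.DimCustomerScd2, its columns, and the sample values are all hypothetical; the deck only states that a surrogate key, a current-row flag, and date columns are needed.

```sql
-- Hypothetical Type 2 dimension: one row per version of a customer.
CREATE TABLE dbo.DimCustomerScd2 (
    CustomerKey INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate (DWH) key
    CustomerID  INT          NOT NULL,                  -- business key from the OLTP system
    City        NVARCHAR(50) NOT NULL,
    ValidFrom   DATETIME     NOT NULL,
    ValidTo     DATETIME     NULL,                      -- NULL while the row is current
    IsCurrent   BIT          NOT NULL
);

-- Illustrative incoming change.
DECLARE @CustomerID INT = 42,
        @NewCity    NVARCHAR(50) = N'Haifa',
        @Now        DATETIME = GETDATE();

-- 1. Close the current version, but only if the value really changed.
UPDATE dbo.DimCustomerScd2
SET    ValidTo = @Now, IsCurrent = 0
WHERE  CustomerID = @CustomerID
  AND  IsCurrent = 1
  AND  City <> @NewCity;

-- 2. Insert the new version only when step 1 closed a row.
IF @@ROWCOUNT > 0
    INSERT INTO dbo.DimCustomerScd2 (CustomerID, City, ValidFrom, ValidTo, IsCurrent)
    VALUES (@CustomerID, @NewCity, @Now, NULL, 1);
```

Fact rows keep pointing at the CustomerKey that was current when they were loaded, so history stays intact.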
4.
Slowly Changing Dimensions
• Type 1 SCD
5.
Slowly Changing Dimensions
• Type 2 SCD
6.
Understanding Indexing
• Indexing affects how data is stored and managed in SQL Server
• There are four main indexing options in SQL Server
  • Clustered index
  • Non-clustered index
  • Filtered non-clustered index
  • Columnstore index (include)
7.
Understanding Indexing
• Clustered index
  • Determines the physical storage order of the data
  • There can be only one clustered index on a table
• Non-clustered index
  • Sorts data in a column or columns and stores pointers to the actual data row
  • We can have up to 999 non-clustered indexes on a table
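As an illustration of the two index types, here is a small sketch on a hypothetical table dbo.ProductsDemo. The table and index names are assumptions made for this example.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE dbo.ProductsDemo (
    ProductID    INT          NOT NULL,
    ProductName  NVARCHAR(40) NOT NULL,
    UnitPrice    MONEY        NOT NULL,
    UnitsInStock SMALLINT     NOT NULL
);

-- The single clustered index: rows are stored in ProductID order.
CREATE UNIQUE CLUSTERED INDEX cix_ProductsDemo_ProductID
    ON dbo.ProductsDemo (ProductID);

-- A non-clustered index: a separate sorted structure that points back to the rows.
CREATE NONCLUSTERED INDEX ix_ProductsDemo_ProductName
    ON dbo.ProductsDemo (ProductName);
```

A second CREATE CLUSTERED INDEX on the same table would fail, while further non-clustered indexes can be added as needed.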
8.
Understanding Indexing
• Filtered non-clustered index
  • Creates a non-clustered index on a subset of values in a column
• Columnstore index
  • A non-clustered index that can cover one or more columns
  • Each column is stored and searched separately from the data row
  • In SQL Server 2012, adding a non-clustered columnstore index makes the table read-only
  • https://www.simple-talk.com/sql/database-administration/columnstore-indexes-in-sql-server-2012/
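A filtered index might be sketched as follows. The table dbo.OrdersDemo and the choice of open orders as the indexed subset are assumptions made for this example.

```sql
-- Hypothetical table: most orders have shipped, a small subset is still open.
CREATE TABLE dbo.OrdersDemo (
    OrderID     INT  NOT NULL PRIMARY KEY,
    CustomerID  INT  NOT NULL,
    ShippedDate DATE NULL              -- NULL until the order ships
);

-- Index only the open orders; shipped rows are left out of the index.
CREATE NONCLUSTERED INDEX ix_OrdersDemo_Open
    ON dbo.OrdersDemo (CustomerID)
    WHERE ShippedDate IS NULL;

-- A query whose predicate matches the filter can use the smaller index.
SELECT OrderID, CustomerID
FROM   dbo.OrdersDemo
WHERE  ShippedDate IS NULL
  AND  CustomerID = 42;
```

Because the index holds only the filtered rows, it is smaller and cheaper to maintain than a full index on the same column.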
9.
Columnstore Index
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_products
    ON dbo.products (productName, UnitPrice, unitsinstock);

SELECT productName, UnitPrice, unitsinstock
FROM   products;
10.
Indexing the Data Warehouse
• Indexing in the data warehouse can be tricky
• With too few indexes, data loads are quick, but query response time is slow
• With too many indexes, loads slow down and storage requirements go up, but query response is good
11.
Indexing the Data Warehouse
• General rule of thumb
  • Dimension tables
    • Place a clustered index on the surrogate key
    • If the table has a lot of columns, create non-clustered indexes on the most popular columns
  • Fact tables
    • Place a non-clustered index on each single-column foreign key to the dimension tables
    • If the primary key is a composite of all the dimension foreign keys, make it a non-unique clustered index
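Applied to a small star schema, the rule of thumb could be sketched like this. Both tables, their columns, and the index names are hypothetical; which dimension columns count as "popular" depends on the actual query workload.

```sql
-- Hypothetical dimension table.
CREATE TABLE dbo.DimProductDemo (
    ProductKey  INT IDENTITY(1,1) NOT NULL,   -- surrogate key
    ProductName NVARCHAR(40) NOT NULL,
    Category    NVARCHAR(30) NOT NULL
);
-- Dimension: clustered index on the surrogate key, non-clustered on a frequently filtered column.
CREATE UNIQUE CLUSTERED INDEX cix_DimProductDemo ON dbo.DimProductDemo (ProductKey);
CREATE NONCLUSTERED INDEX ix_DimProductDemo_Category ON dbo.DimProductDemo (Category);

-- Hypothetical fact table keyed by its dimension foreign keys.
CREATE TABLE dbo.FactSalesDemo (
    DateKey     INT   NOT NULL,
    ProductKey  INT   NOT NULL,
    StoreKey    INT   NOT NULL,
    SalesAmount MONEY NOT NULL
);
-- Fact: composite of the dimension keys as a clustered index, not declared UNIQUE.
CREATE CLUSTERED INDEX cix_FactSalesDemo
    ON dbo.FactSalesDemo (DateKey, ProductKey, StoreKey);
-- Fact: one non-clustered index per single-column foreign key.
-- DateKey already leads the clustered index, so it is skipped here.
CREATE NONCLUSTERED INDEX ix_FactSalesDemo_ProductKey ON dbo.FactSalesDemo (ProductKey);
CREATE NONCLUSTERED INDEX ix_FactSalesDemo_StoreKey   ON dbo.FactSalesDemo (StoreKey);
```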
12.
Understanding Indexed Views
• What is a view
  • A result set of a query that acts as a virtual table
  • The virtual table is not stored permanently in the database
  • The view can be referenced like a table in T-SQL
• Indexing a view
  • You can create a unique clustered index on a view
  • The view's result set is then stored in the database, just like a regular table with a clustered index
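A sketch of an indexed view over a hypothetical dbo.SalesDemo table follows. It assumes the session uses the SET options that indexed views require (the defaults in SQL Server Management Studio), and GO is the batch separator understood by SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Hypothetical base table.
CREATE TABLE dbo.SalesDemo (
    SaleID      INT   NOT NULL PRIMARY KEY,
    ProductKey  INT   NOT NULL,
    SalesAmount MONEY NOT NULL   -- NOT NULL: an indexed view cannot SUM a nullable expression
);
GO

-- The view must be schema-bound and use two-part object names.
CREATE VIEW dbo.vSalesByProductDemo
WITH SCHEMABINDING
AS
SELECT ProductKey,
       SUM(SalesAmount) AS TotalSales,
       COUNT_BIG(*)     AS RowCnt      -- required when an indexed view uses GROUP BY
FROM   dbo.SalesDemo
GROUP  BY ProductKey;
GO

-- The unique clustered index is what materializes the view's result set.
CREATE UNIQUE CLUSTERED INDEX cix_vSalesByProductDemo
    ON dbo.vSalesByProductDemo (ProductKey);
```

From this point the aggregated rows are stored and maintained as the base table changes, so queries over the aggregate no longer have to scan every sale.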
13.
Understanding Indexed Views
• Advantages
  • Improves the performance of joins and aggregations that process many rows
14.
Understanding Data Compression
• SQL Server 2012 supports data compression
• Data compression reduces the size of the database
  • Packs more data onto fewer data pages
  • Fewer data page reads are required to satisfy queries
  • Lower I/O means faster response and a lower processing load on the server
• Extra CPU resources are required for data compression / decompression
• A DWH usually doesn't have many updates (other than bulk loading)
15.
Understanding Data Compression
• SQL Server 2012 supports three compression types
  • Page compression
    • Focuses on duplicated values within the data page
    • Stores one value and places a pointer at all other locations
  • Row compression
    • Removes any unused bytes in a fixed-length data type
    • For example, CHAR(25)
  • Unicode compression
    • Reduces storage space for Unicode data that doesn't require that space
16.
Understanding Data Compression
• Which compression should you use
  • Page compression
    • Row compression is applied automatically when page compression is used
    • Row and page compression are alternative settings for an object: if you choose row compression, page compression is not applied
  • Fact tables usually benefit the most from compression
  • In SQL Server 2012, compression is only available in the Enterprise Edition
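One way to try page compression on a fact table is sketched below. The table dbo.FactCompressDemo is hypothetical; sp_estimate_data_compression_savings and the DATA_COMPRESSION rebuild option are standard SQL Server features, subject to the edition limits noted on the slide.

```sql
-- Hypothetical fact table.
CREATE TABLE dbo.FactCompressDemo (
    DateKey     INT   NOT NULL,
    ProductKey  INT   NOT NULL,
    SalesAmount MONEY NOT NULL
);

-- Estimate how much space page compression would save before committing to it.
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'FactCompressDemo',
     @index_id         = NULL,     -- all indexes (and the heap)
     @partition_number = NULL,     -- all partitions
     @data_compression = N'PAGE';

-- Rebuild the table with page compression (row compression is included).
ALTER TABLE dbo.FactCompressDemo
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```

The estimate is worth running first because the rebuild rewrites the whole table, which can take a while on a large fact table.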
17.
Data Lineage
• What is data lineage
  • Data origination and flow details
  • Where the data comes from, where it is going, and how it is transformed in the process
  • The same concept as comments in programming
18.
Data Lineage
• Why do we need data lineage
  • To provide metadata context in the DWH
  • Future business rules may change, affecting some data
    • Making it invalid
    • Making it suspect
    • Making it more important
  • Data lineage allows us to identify this data
19.
Data Lineage
• Two main options for adding data lineage
  • SSIS system variables
  • T-SQL system functions
20.
Data Lineage using T-SQL
SELECT APP_NAME(),                 -- application that opened the session
       DATABASE_PRINCIPAL_ID(),    -- ID of the current database user
       USER_NAME(),                -- database user name
       SUSER_NAME(),               -- server login name
       GETDATE(),                  -- current date and time
       CURRENT_TIMESTAMP,          -- ANSI equivalent of GETDATE(); takes no parentheses
       CONNECTIONPROPERTY('client_net_address');  -- client address of this connection
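One possible way to put these functions to work is to let column defaults record lineage on every load. The table dbo.FactSalesLineageDemo and its lineage column names are assumptions made for this sketch, not something the deck prescribes.

```sql
-- Hypothetical fact table with three lineage columns filled in by defaults.
CREATE TABLE dbo.FactSalesLineageDemo (
    DateKey       INT           NOT NULL,
    ProductKey    INT           NOT NULL,
    SalesAmount   MONEY         NOT NULL,
    LoadedAt      DATETIME      NOT NULL DEFAULT (GETDATE()),     -- when the row arrived
    LoadedBy      NVARCHAR(128) NOT NULL DEFAULT (SUSER_NAME()),  -- which login loaded it
    LoadedFromApp NVARCHAR(128) NULL     DEFAULT (APP_NAME())     -- which application loaded it
);

-- The load only supplies business columns; lineage is captured automatically.
INSERT INTO dbo.FactSalesLineageDemo (DateKey, ProductKey, SalesAmount)
VALUES (20140101, 1, 19.90);

SELECT DateKey, ProductKey, SalesAmount, LoadedAt, LoadedBy, LoadedFromApp
FROM   dbo.FactSalesLineageDemo;
```

If a business rule later changes, the lineage columns make it possible to find the rows loaded before the change or by a particular process.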
21.
Using Partitions
• Fact tables become very large over time
• Very large database tables present serious challenges
  • What if you need to delete a large portion of the data?
    • The TRUNCATE TABLE command deletes with minimal logging, but it deletes the entire table
  • Large data inserts become time-consuming
  • Index maintenance and storage can become problematic
• Table partitions deal with all these issues
22.
Using Partitions
• What is a table partition
  • A large table is stored in multiple files
  • Divided by rows (based on a condition)
    • Usually date / time
  • SQL Server 2012 allows up to 15,000 partitions on a single table
  • Partitions and data are managed in the background
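A date-partitioned fact table, and the fast removal of its oldest slice, could be sketched as follows. All object names and boundary dates are hypothetical, and every partition is mapped to the PRIMARY filegroup only to keep the example short; a real design would usually spread partitions across filegroups.

```sql
-- Boundaries split rows into three partitions: before 2013, 2013, and 2014 onward.
CREATE PARTITION FUNCTION pfSalesYearDemo (DATE)
AS RANGE RIGHT FOR VALUES ('20130101', '20140101');

-- Map every partition to a filegroup (all to PRIMARY in this sketch).
CREATE PARTITION SCHEME psSalesYearDemo
AS PARTITION pfSalesYearDemo ALL TO ([PRIMARY]);

-- The fact table is created on the scheme, partitioned by SaleDate.
CREATE TABLE dbo.FactSalesPartDemo (
    SaleDate    DATE  NOT NULL,
    ProductKey  INT   NOT NULL,
    SalesAmount MONEY NOT NULL
) ON psSalesYearDemo (SaleDate);

-- Empty table with the same structure on the same filegroup, used as a switch target.
CREATE TABLE dbo.FactSalesPartDemo_Old (
    SaleDate    DATE  NOT NULL,
    ProductKey  INT   NOT NULL,
    SalesAmount MONEY NOT NULL
) ON [PRIMARY];

-- Move the oldest partition out as a metadata operation, then truncate it.
ALTER TABLE dbo.FactSalesPartDemo SWITCH PARTITION 1 TO dbo.FactSalesPartDemo_Old;
TRUNCATE TABLE dbo.FactSalesPartDemo_Old;
```

This removes one slice of the fact table with minimal logging while the remaining partitions stay in place, which addresses the TRUNCATE TABLE limitation raised on the previous slide.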
23.
Using Partitions
24.
Identifying our Dimensions / Fact Tables