About Presenter


      Karan Gulati is a SQL Server Analysis Services Maestro (MCM) who has worked as
      a Support Escalation Engineer at Microsoft for the last five years. He currently
      focuses on SQL BI and SQL PDW. He is an active blogger and has contributed to
      multiple whitepapers published on MSDN or TechNet. He has also written tools
      that are available on CodePlex.




0                                    Karan Gulati (SSAS Maestro)
Data Warehousing Concepts

    Overview of Data Warehousing and Analysis Services terms




What are we covering

    Understanding terms used in the SSAS and data warehousing world:

    •   What is a Data Warehouse
    •   OLAP
         •   Cube
              •   Measures
              •   Dimensions

    •   Schema
         •   Star
         •   Snow-Flake
    •   Surrogate Keys
    •   Slowly Changing Dimensions
         •   SCD1
         •   SCD2
         •   SCD3




Data Warehousing
     A data warehouse is a general structure for storing the data needed for good
     BI (Business Intelligence).

     Data in a warehouse is of little use until it is converted into the information
     that decision makers need.

     The large relational databases, typical of data warehouses, need additional
     help to convert the data into information.




Why Use OLAP?

     Provides fast and interactive access to aggregated data and the ability to drill
     down to detail.
     Lets users view and interrogate large volumes of data (often millions of rows)
     by pre-aggregating the information.
     Puts the data needed to make strategic decisions directly into the hands of the
     decision makers, not only through pre-defined queries and reports but also by
     giving end users the ability to run their own ad hoc queries, which minimizes
     users' dependence on database developers.




OLAP Secret

      It leverages existing data from a relational schema or data warehouse (data
      source) by placing key performance indicators (measures) into context
      (dimensions).
      Once processed into a multidimensional database (cube), all of the measures
      are pre-aggregated, which makes data retrieval significantly faster.
      The processed cube can then be made available to business users who can
      browse the data using a variety of tools, making ad hoc analysis an interactive
      and analytical process rather than a development effort.
      SQL Server 2005's BI Workbench substantially improves upon SQL Server
      2000's BI capability.
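
      The pre-aggregation idea can be illustrated with a toy sketch. This is not how
      Analysis Services stores its aggregations; the rows, the dimension names (year,
      region) and the function name are assumptions made up for illustration. The point
      is only that totals are computed once during processing, so later queries become
      lookups.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, region, sales_amount)
detail_rows = [
    (2010, "East", 100.0),
    (2010, "West", 250.0),
    (2011, "East", 175.0),
    (2011, "West", 300.0),
    (2011, "West", 50.0),
]


def pre_aggregate(rows):
    """Sum the measure once for every combination of dimension values.

    None in a key position stands for "all members" of that dimension.
    """
    totals = defaultdict(float)
    for year, region, amount in rows:
        totals[(year, region)] += amount  # year and region
        totals[(year, None)] += amount    # all regions for a year
        totals[(None, region)] += amount  # all years for a region
        totals[(None, None)] += amount    # grand total
    return dict(totals)


# "Processing" happens once, up front.
aggregates = pre_aggregate(detail_rows)

# Later queries are dictionary lookups, not scans of the detail rows.
west_total = aggregates[(None, "West")]
total_2011 = aggregates[(2011, None)]
grand_total = aggregates[(None, None)]
```

      With more dimensions the number of combinations grows quickly, so a real engine
      has to choose which aggregations to store.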




SQL BI Tools


      The SQL Server BI Workbench suite consists of five basic tools:
          SQL Server Relational Database: Used to create the relational database
          Analysis Services: Used to create the multidimensional model
          (measures, dimensions and schema)
          Integration Services (SSIS, formerly Data Transformation Services, DTS):
          Used to extract, transform and load data from the source(s) into the data
          warehouse or schema
          Reporting Services: Used to build and manage enterprise reporting on
          relational or multidimensional sources
          Data Mining: Used to extract information based on predetermined
          algorithms




Architecture




What is a Cube?
     A collection of one or more related measure groups and
     their associated dimensions




Cube Example
     Consider the following Imports cube. It contains:
        Two measures:
            Packages
            Last
        Three related dimensions:
            Route
            Source
            Time
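
     As a rough sketch, the Imports cube can be pictured as fact rows that carry one
     member per dimension plus the two measures. The member names, values and the
     function name below are invented for illustration. The sketch also assumes that
     Packages is summed and that Last is a date aggregated with a maximum, which the
     slide does not state.

```python
# Hypothetical fact rows for the Imports cube (all values are made up).
rows = [
    {"route": "air", "source": "Africa", "time": "H1", "packages": 190, "last": "2010-06-20"},
    {"route": "air", "source": "Asia", "time": "H1", "packages": 420, "last": "2010-06-30"},
    {"route": "sea", "source": "Africa", "time": "H1", "packages": 500, "last": "2010-05-12"},
    {"route": "sea", "source": "Asia", "time": "H2", "packages": 610, "last": "2010-12-01"},
    {"route": "air", "source": "Africa", "time": "H2", "packages": 210, "last": "2010-11-03"},
]


def slice_cube(rows, **members):
    """Aggregate both measures over the rows that match the given dimension members."""
    matching = [r for r in rows if all(r[dim] == m for dim, m in members.items())]
    if not matching:
        return {"packages": 0, "last": None}
    return {
        "packages": sum(r["packages"] for r in matching),  # assumed Sum measure
        "last": max(r["last"] for r in matching),          # assumed Max measure (ISO dates sort as text)
    }


by_air = slice_cube(rows, route="air")
africa_h1 = slice_cube(rows, source="Africa", time="H1")
everything = slice_cube(rows)
```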




Elements of Cubes
       Measures
       Dimensions
       Schema
              Star
              Snowflake




Measures
      Measures are the key performance indicators that you want to evaluate.

      To determine which of the numbers in the data might be measures, here is a
      rule of thumb:
          If a number makes sense when it is aggregated, then it is a measure.




Dimensions
      Dimensions are the categories of data analysis.

      Here is the rule of thumb:
         When a report is requested "by" something, that something is
         usually a dimension.




Schema


      A schema is the way you arrange your fact and master (dimension) tables:

        Star Schema




        Snow-Flake Schema




Star Schema
     The figure shows a basic star schema, with the dimension tables arranged
     around a central fact table that contains the measures. A fact table contains a
     column for each measure as well as a column for each dimension. Each
     dimension column has a foreign-key relationship to the related dimension
     table, and the dimension columns taken together are the key to the fact table.
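
     A minimal sketch of such a star schema, using Python's built-in sqlite3 module
     rather than SQL Server so that it runs anywhere. All table names, column names
     and rows are hypothetical. The fact table has one foreign-key column per
     dimension, and those columns together form its primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_city    TEXT);
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, calendar_year INTEGER);

CREATE TABLE fact_sales (
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    store_key    INTEGER NOT NULL REFERENCES dim_store(store_key),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    units_sold   INTEGER NOT NULL,   -- measure
    sales_amount REAL    NOT NULL,   -- measure
    PRIMARY KEY (product_key, store_key, date_key)
);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "Pune"), (2, "Delhi")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(20100101, 2010), (20110101, 2011)])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 20100101, 10, 100.0),
        (2, 1, 20100101, 5, 75.0),
        (1, 2, 20110101, 8, 80.0),
    ],
)

# "Sales by year": one join from the fact table to the dimension in question.
sales_by_year = conn.execute("""
    SELECT d.calendar_year, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY d.calendar_year
    ORDER BY d.calendar_year
""").fetchall()
```

     Each dimension is a single join away from the fact table, which is what keeps
     star-schema queries simple.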




Snowflake

         Normalizing each of the dimension tables so that there are many joins for
         each dimension results in a Snowflake Schema.
         It is called a Snowflake Schema because the “points” of the star get broken
         up into little branches that look like a snowflake.
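
         To make the extra joins concrete, here is a small sketch (again with sqlite3,
         and with made-up table names and data) in which a product dimension is
         normalized into product, subcategory and category tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension, normalized into three tables (the "branches" of the snowflake).
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_subcategory (
    subcategory_key  INTEGER PRIMARY KEY,
    subcategory_name TEXT,
    category_key     INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE dim_product (
    product_key     INTEGER PRIMARY KEY,
    product_name    TEXT,
    subcategory_key INTEGER REFERENCES dim_subcategory(subcategory_key)
);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);

INSERT INTO dim_category    VALUES (1, 'Bikes'), (2, 'Accessories');
INSERT INTO dim_subcategory VALUES (10, 'Road Bikes', 1), (11, 'Mountain Bikes', 1), (20, 'Helmets', 2);
INSERT INTO dim_product     VALUES (100, 'Road-150', 10), (101, 'Mountain-200', 11), (200, 'Sport Helmet', 20);
INSERT INTO fact_sales      VALUES (100, 1500.0), (101, 2000.0), (200, 35.0), (100, 1500.0);
""")

# Sales by category now needs three joins instead of one.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product     AS p ON p.product_key     = f.product_key
    JOIN dim_subcategory AS s ON s.subcategory_key = p.subcategory_key
    JOIN dim_category    AS c ON c.category_key    = s.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```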




Which Schema works for you?

     Good question:
     It depends on your requirements. A star schema is simpler to understand and
     manage than a snowflake schema, but in the real world you can't always fit
     everything into one table, so some normalization needs to be done.




Surrogate Keys
     Also known as:
            Meaningless keys
            Substitute keys
            Non-natural keys
            Artificial keys

     A surrogate key is a unique value, usually an integer,
     assigned to each row in the dimension. This surrogate key
     becomes the primary key of the dimension table and is used
     to join the dimension to the associated foreign key field in
     the fact table.
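
     A small sketch of the idea, using sqlite3 with hypothetical table and column
     names. The database assigns the integer surrogate key, the fact table joins on it,
     and the natural (business) key is just another attribute that can change without
     touching the fact rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_number TEXT NOT NULL,                      -- natural (business) key
    customer_name   TEXT NOT NULL
);
CREATE TABLE fact_sales (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    sales_amount REAL    NOT NULL
);
""")


def add_customer(customer_number, customer_name):
    """Insert a dimension row and return the surrogate key the database assigned."""
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_number, customer_name) VALUES (?, ?)",
        (customer_number, customer_name),
    )
    return cur.lastrowid


acme_key = add_customer("C-1001", "Acme Supply Co")
zenith_key = add_customer("C-1002", "Zenith Traders")
conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (acme_key, 250.0))

# The business key changes in the source system; the fact row is untouched.
conn.execute(
    "UPDATE dim_customer SET customer_number = ? WHERE customer_key = ?",
    ("X-9001", acme_key),
)

joined = conn.execute("""
    SELECT c.customer_number, f.sales_amount
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
""").fetchall()
```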



What is the benefit of Surrogate Keys?

     Surrogate keys help maintain history in the case of Slowly
     Changing Dimensions.




Slowly Changing Dimensions
     There are three types of SCD.

     SCD 1

        The Type 1 methodology overwrites old data with new data, and therefore
        does not track historical data at all. This is most appropriate when correcting
        certain types of data errors, such as the spelling of a name (assuming you
        will never need to know how it used to be misspelled).
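
        A minimal Type 1 sketch, assuming a hypothetical dim_supplier table in sqlite3
        with invented data. The change is a plain UPDATE, so the misspelled value is
        gone afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_supplier (
    supplier_key   INTEGER PRIMARY KEY,
    supplier_code  TEXT UNIQUE,
    supplier_name  TEXT,
    supplier_state TEXT
)""")
# Hypothetical row with a misspelled name.
conn.execute("INSERT INTO dim_supplier VALUES (1, 'ABC', 'Acme Suply Co', 'CA')")

# Columns that may be overwritten; checked because column names
# cannot be passed as SQL parameters.
ALLOWED_COLUMNS = {"supplier_name", "supplier_state"}


def scd1_update(conn, supplier_code, **changes):
    """Type 1: overwrite attributes in place; the old values are not kept."""
    for column, value in changes.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected column: {column}")
        conn.execute(
            f"UPDATE dim_supplier SET {column} = ? WHERE supplier_code = ?",
            (value, supplier_code),
        )


scd1_update(conn, "ABC", supplier_name="Acme Supply Co")
rows = conn.execute("SELECT * FROM dim_supplier").fetchall()
```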




So, what is the disadvantage of SCD 1?


       The obvious disadvantage of this method of managing SCDs is that no
       historical record is kept in the data warehouse. You can't tell whether your
       suppliers are tending to move to the Midwest, for example. The advantage is
       that Type 1 dimensions are very easy to maintain.




SCD 2
     The Type 2 method tracks historical data by creating multiple records in the
     dimensional tables with separate keys. With Type 2, we have unlimited history
     preservation as a new record is inserted each time a change is made.
        In the same example, if the supplier moves to Illinois, the table would look like
        this:
        Another popular method for tuple versioning is to add effective date columns.
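
        A sketch of Type 2 with effective date columns, using sqlite3. The table layout,
        the supplier values and the dates are hypothetical, not the contents of the
        original slide's table. Each change closes the current row and inserts a new
        row that gets its own surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_supplier (
    supplier_key   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
    supplier_code  TEXT NOT NULL,                      -- business key, shared by all versions
    supplier_name  TEXT NOT NULL,
    supplier_state TEXT NOT NULL,
    start_date     TEXT NOT NULL,
    end_date       TEXT                                -- NULL marks the current version
)""")
conn.execute(
    "INSERT INTO dim_supplier (supplier_code, supplier_name, supplier_state, start_date) "
    "VALUES ('ABC', 'Acme Supply Co', 'CA', '2000-01-01')"
)


def scd2_change_state(conn, supplier_code, new_state, change_date):
    """Type 2: close the current version and insert a new row with a new key."""
    current = conn.execute(
        "SELECT supplier_key, supplier_name FROM dim_supplier "
        "WHERE supplier_code = ? AND end_date IS NULL",
        (supplier_code,),
    ).fetchone()
    if current is None:
        raise LookupError(f"no current row for supplier {supplier_code}")
    old_key, name = current
    conn.execute(
        "UPDATE dim_supplier SET end_date = ? WHERE supplier_key = ?",
        (change_date, old_key),
    )
    cur = conn.execute(
        "INSERT INTO dim_supplier (supplier_code, supplier_name, supplier_state, start_date) "
        "VALUES (?, ?, ?, ?)",
        (supplier_code, name, new_state, change_date),
    )
    return cur.lastrowid


new_key = scd2_change_state(conn, "ABC", "IL", "2004-12-22")
history = conn.execute(
    "SELECT supplier_key, supplier_state, start_date, end_date "
    "FROM dim_supplier ORDER BY supplier_key"
).fetchall()
```

        Fact rows loaded before the change keep pointing at the old key, so history
        stays attached to the state that was valid at the time.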




SCD 3
     The Type 3 method tracks changes using separate columns. Whereas Type 2 had
     unlimited history preservation, Type 3 has limited history preservation, as it's
     limited to the number of columns we designate for storing historical data. Where
     the original table structure in Type 1 and Type 2 was very similar, Type 3 will add
     additional columns to the tables:




        Note: Type 3 keeps separate columns for both the old and new
        attribute values, sometimes called “alternate realities.” In our
        experience, Type 3 is less common because it involves changing the
        physical tables and is not very scalable.
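
        One possible Type 3 layout, sketched with sqlite3. The column names and data
        are assumptions, and other variants (for example previous/current columns)
        exist. The original value keeps its own column, and the current value is
        overwritten on each change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_supplier (
    supplier_key            INTEGER PRIMARY KEY,
    supplier_code           TEXT,
    supplier_name           TEXT,
    original_supplier_state TEXT,  -- extra column that preserves the first value
    current_supplier_state  TEXT,
    effective_date          TEXT
)""")
conn.execute(
    "INSERT INTO dim_supplier VALUES (1, 'ABC', 'Acme Supply Co', 'CA', 'CA', '2000-01-01')"
)


def scd3_change_state(conn, supplier_code, new_state, change_date):
    """Type 3: keep the original value in its own column, overwrite the current one."""
    conn.execute(
        "UPDATE dim_supplier "
        "SET current_supplier_state = ?, effective_date = ? "
        "WHERE supplier_code = ?",
        (new_state, change_date, supplier_code),
    )


scd3_change_state(conn, "ABC", "IL", "2004-12-22")
scd3_change_state(conn, "ABC", "NY", "2008-02-04")

row = conn.execute(
    "SELECT original_supplier_state, current_supplier_state, effective_date "
    "FROM dim_supplier"
).fetchone()
```

        After the second move the intermediate state (IL) is no longer recorded anywhere,
        which is the limited history the slide describes.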
Slowly Changing Dimension

      You can use SSIS or T-SQL to implement SCDs in a data warehouse.
       Here is a reference:

       http://blogs.msdn.com/b/karang/archive/2010/09/29/slowly-changing-dimension-using-ssis.aspx




Thanks
      Contact the speaker:

                http://karanspeaks.com

                http://blogs.msdn.com/karang

                https://twitter.com/karangspeaks

                http://in.linkedin.com/in/karanspeaks




Editor's notes

  1. A surrogate key is useful because the natural primary key (e.g. Customer Number in the Customer table) can change, which makes updates more difficult.

     Example: On Jan 1 2010, Emp A belongs to Dept1, so any sales made by this employee are added to Dept1. On June 1 2010, Emp A moves to Dept2. All new sales from that day onwards should be added to Dept2, while the old sales should still belong to Dept1.

     If the data warehouse used the business key (the primary key in the RDBMS), everything would be allocated to Dept2, even the sales that actually belong to Dept1.

     With surrogate keys, you could create a new record for Employee 'A' in the Employee dimension on June 1, with a new surrogate key. The fact table then holds the old data (before June) with the SID of the Employee 'E1' + 'Dept1' row, and all new data (from June onwards) takes the SID of the Employee 'E1' + 'Dept2' row.
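
     The employee example can be sketched with sqlite3. Table names, SID values and
     sales amounts are hypothetical. The first query joins on the surrogate key and
     keeps the old sales under Dept1. The second query mimics a warehouse keyed on the
     business key alone, where only the employee's latest department is known.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_employee (
    employee_sid INTEGER PRIMARY KEY,  -- surrogate key (SID)
    employee_id  TEXT,                 -- business key
    department   TEXT
);
CREATE TABLE fact_sales (
    employee_sid INTEGER REFERENCES dim_employee(employee_sid),
    sale_date    TEXT,
    amount       REAL
);

-- Before June 1 2010: employee E1 is in Dept1.
INSERT INTO dim_employee VALUES (1, 'E1', 'Dept1');
INSERT INTO fact_sales VALUES (1, '2010-02-15', 100.0), (1, '2010-04-10', 200.0);

-- June 1 2010: E1 moves to Dept2, so a new dimension row gets a new SID.
INSERT INTO dim_employee VALUES (2, 'E1', 'Dept2');
INSERT INTO fact_sales VALUES (2, '2010-07-05', 50.0);
""")

# Joining on the surrogate key keeps each sale with the department of its time.
by_surrogate_key = conn.execute("""
    SELECT e.department, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_employee AS e ON e.employee_sid = f.employee_sid
    GROUP BY e.department
    ORDER BY e.department
""").fetchall()

# Resolving every sale through the business key to the latest department
# moves the old Dept1 sales to Dept2.
by_business_key = conn.execute("""
    SELECT latest.department, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_employee AS e ON e.employee_sid = f.employee_sid
    JOIN dim_employee AS latest
      ON latest.employee_id = e.employee_id
     AND latest.employee_sid = (
         SELECT MAX(employee_sid) FROM dim_employee WHERE employee_id = e.employee_id
     )
    GROUP BY latest.department
""").fetchall()
```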