This document presents an introduction to data warehouses and new trends in the field. It briefly explains key data warehouse concepts such as dimensions, fact tables, and ETL. It also describes some emerging technologies, such as Azure SQL Data Warehouse and Azure Data Lake, that enable self-service BI and the analysis of unstructured data. The document closes by inviting questions and comments.
10. Surrogate Keys vs. Business Keys
CustomerKey  CustomerAltKey  FirstName  LastName
1            1002            Amy        Alberts
2            1005            Neil       Black
CustomerKey = surrogate key; CustomerAltKey = business key; FirstName, LastName = additional dimension attributes
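A minimal sketch of how a load process might assign surrogate keys, assuming an in-memory Python dict stands in for the dimension table. The class and method names (`CustomerDimension`, `get_or_add`) are hypothetical; in a real warehouse the surrogate key is usually generated by the database, for example with an IDENTITY column or a sequence. The point is that the warehouse owns `CustomerKey`, while `CustomerAltKey` is copied unchanged from the source system.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerDimension:
    """Hypothetical in-memory stand-in for a customer dimension table."""
    rows: dict = field(default_factory=dict)     # CustomerKey -> row
    key_map: dict = field(default_factory=dict)  # CustomerAltKey -> CustomerKey
    next_key: int = 1

    def get_or_add(self, alt_key: int, first_name: str, last_name: str) -> int:
        """Return the surrogate key for a business key, adding a row if it is new."""
        if alt_key in self.key_map:
            return self.key_map[alt_key]
        key = self.next_key  # generated by the warehouse, not taken from the source
        self.next_key += 1
        self.key_map[alt_key] = key
        self.rows[key] = {
            "CustomerKey": key,
            "CustomerAltKey": alt_key,
            "FirstName": first_name,
            "LastName": last_name,
        }
        return key


dim = CustomerDimension()
print(dim.get_or_add(1002, "Amy", "Alberts"))  # 1
print(dim.get_or_add(1005, "Neil", "Black"))   # 2
print(dim.get_or_add(1002, "Amy", "Alberts"))  # 1 (business key already loaded)
```

Because `key_map` holds one surrogate key per business key, this sketch ignores history. A Type 2 slowly changing dimension can hold several rows for the same business key, which is one reason the business key cannot serve as the unique key of the dimension table.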
11. Attributes and Hierarchies
CustKey  CustAltKey  Name         Country  State  City       Phone    Gender
1        1002        Amy Alberts  Canada   BC     Vancouver  555 123  F
2        1005        Neil Black   USA      CA     Irvine     555 321  M
3        1006        Ye Xu        USA      NY     New York   555 222  M
Attribute roles: hierarchy (Country > State > City); slicers/filters (Gender); details (Name, Phone)
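A rough sketch of how these attribute roles are typically used at query time, assuming the three customer rows are held as Python dicts. The hierarchy columns define the grouping levels for drill-down, the slicer only filters rows, and the detail columns are merely displayed. The `rollup` function is hypothetical and is not part of any BI tool.

```python
from collections import Counter

HIERARCHY = ("Country", "State", "City")  # drill-down order, top level first

customers = [
    {"CustKey": 1, "Name": "Amy Alberts", "Country": "Canada", "State": "BC",
     "City": "Vancouver", "Phone": "555 123", "Gender": "F"},
    {"CustKey": 2, "Name": "Neil Black", "Country": "USA", "State": "CA",
     "City": "Irvine", "Phone": "555 321", "Gender": "M"},
    {"CustKey": 3, "Name": "Ye Xu", "Country": "USA", "State": "NY",
     "City": "New York", "Phone": "555 222", "Gender": "M"},
]


def rollup(rows, depth, slicer=None):
    """Count rows per member at the first `depth` hierarchy levels.

    slicer is an optional (column, value) pair used purely as a filter.
    """
    counts = Counter()
    for row in rows:
        if slicer is not None and row[slicer[0]] != slicer[1]:
            continue
        counts[tuple(row[level] for level in HIERARCHY[:depth])] += 1
    return dict(counts)


print(rollup(customers, 1))                   # {('Canada',): 1, ('USA',): 2}
print(rollup(customers, 2, ("Gender", "M")))  # {('USA', 'CA'): 1, ('USA', 'NY'): 1}
```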
12. Slowly Changing Dimensions
Type 1
Before:
CustKey  CustAltKey  Name         Phone
1        1002        Amy Alberts  555 123
After:
CustKey  CustAltKey  Name         Phone
1        1002        Amy Alberts  555 222

Type 2
Before:
CustKey  CustAltKey  Name         City       Current  Start     End
1        1002        Amy Alberts  Vancouver  Yes      1/1/2000
After:
CustKey  CustAltKey  Name         City       Current  Start     End
1        1002        Amy Alberts  Vancouver  No       1/1/2000  1/1/2012
4        1002        Amy Alberts  Toronto    Yes      1/1/2012

Type 3
Before:
CustKey  CustAltKey  Name         Cars
1        1002        Amy Alberts  0
After:
CustKey  CustAltKey  Name         Prior Cars  Current Cars
1        1002        Amy Alberts  0           1
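A minimal sketch of the three change types applied to the rows on this slide, assuming rows are Python dicts and that the Type 3 table already has columns named `PriorCars` and `CurrentCars` (hypothetical names based on the slide's headers). The new Type 2 row is given key 4, as on the slide, on the assumption that keys 2 and 3 already belong to other customers. This is an illustration, not a production merge routine.

```python
from datetime import date


def scd_type1(row, column, new_value):
    """Type 1: overwrite the attribute; the old value is lost."""
    return {**row, column: new_value}


def scd_type2(rows, alt_key, column, new_value, change_date, new_key):
    """Type 2: expire the current row for alt_key and append a new current row."""
    result, current = [], None
    for row in rows:
        if row["CustAltKey"] == alt_key and row["Current"]:
            current = row
            row = {**row, "Current": False, "End": change_date}
        result.append(row)
    if current is not None:
        result.append({**current, "CustKey": new_key, column: new_value,
                       "Current": True, "Start": change_date, "End": None})
    return result


def scd_type3(row, prior_column, current_column, new_value):
    """Type 3: move the current value to a 'prior' column, then overwrite it."""
    return {**row, prior_column: row[current_column], current_column: new_value}


# Data from the slide (dates read as 1 January)
type1 = scd_type1({"CustKey": 1, "CustAltKey": 1002, "Name": "Amy Alberts",
                   "Phone": "555 123"}, "Phone", "555 222")
type2 = scd_type2([{"CustKey": 1, "CustAltKey": 1002, "Name": "Amy Alberts",
                    "City": "Vancouver", "Current": True,
                    "Start": date(2000, 1, 1), "End": None}],
                  1002, "City", "Toronto", date(2012, 1, 1), new_key=4)
type3 = scd_type3({"CustKey": 1, "CustAltKey": 1002, "Name": "Amy Alberts",
                   "CurrentCars": 0}, "PriorCars", "CurrentCars", 1)
```

Only Type 2 adds a row and a new surrogate key, so it is the only type that preserves full history. Type 3 keeps just one previous value.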
13. Time Dimension
• Granularity
• Ranges
• Multiple calendars
• Include a default date
DateKey   DateAltKey  MonthDay  Day   MonthNo  Month  Year
00000000  01-01-1753  NULL      NULL  NULL     NULL   NULL
20130101  01-01-2013  1         Tue   01       Jan    2013
20130102  01-02-2013  2         Wed   01       Jan    2013
20130103  01-03-2013  3         Thu   01       Jan    2013
20130104  01-04-2013  4         Fri   01       Jan    2013
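A minimal sketch of generating rows like these at day granularity, assuming Python dicts as output. The default row reuses the slide's key 00000000 (stored as the integer 0) and the date 1753-01-01, which appears to be chosen because it is the minimum value of SQL Server's datetime type. Weekday and month abbreviations come from fixed English tuples so that the result does not depend on the locale. The function name is hypothetical.

```python
from datetime import date, timedelta

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # index = date.weekday()
MONTH_NAMES = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def build_date_dimension(start, end):
    """Return one row per day from start to end, preceded by a default row."""
    rows = [{
        "DateKey": 0,                   # 00000000 on the slide
        "DateAltKey": date(1753, 1, 1),
        "MonthDay": None, "Day": None, "MonthNo": None, "Month": None, "Year": None,
    }]
    day = start
    while day <= end:
        rows.append({
            "DateKey": day.year * 10000 + day.month * 100 + day.day,  # yyyymmdd
            "DateAltKey": day,
            "MonthDay": day.day,
            "Day": DAY_NAMES[day.weekday()],
            "MonthNo": day.month,
            "Month": MONTH_NAMES[day.month - 1],
            "Year": day.year,
        })
        day += timedelta(days=1)
    return rows


dim_date = build_date_dimension(date(2013, 1, 1), date(2013, 1, 4))
```

Extra columns for fiscal or other calendars (the "multiple calendars" bullet) would be added to the same row in the loop.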
Use this topic to ensure that all students understand why the business key from the source system is not used as a unique key in dimension tables.
Emphasize that the categorization of attributes in this topic is simply used to help identify reasons why a data value would be included as a dimension attribute column. You do not need to apply any specific configuration to define an attribute as a slicer or a member of a hierarchy.
Point out that all levels of the hierarchy are stored in a single dimension table, which duplicates values (for example, "USA" is repeated for every customer in the United States). This is preferable to normalizing the data into a separate table for each hierarchy level, as a snowflake schema would. OLTP database developers might find this preference for duplication over normalization unintuitive, but remind them that dimension data is generally denormalized from multiple tables before being loaded, and is not subject to the volume of transactional updates that an OLTP database handles. Therefore, the query performance benefit of storing the data in a single table generally outweighs the reduced duplication that normalizing the data would provide.
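A minimal sketch of the denormalization step described in this note, assuming the source holds geography in three normalized lookup tables (modeled here as Python dicts with hypothetical IDs). Each customer row is flattened so that Country, State, and City land in the same dimension row, at the cost of repeating values such as "USA".

```python
countries = {1: "Canada", 2: "USA"}                          # CountryID -> Country
states = {10: ("BC", 1), 20: ("CA", 2), 30: ("NY", 2)}       # StateID -> (State, CountryID)
cities = {100: ("Vancouver", 10), 200: ("Irvine", 20),
          300: ("New York", 30)}                             # CityID -> (City, StateID)
customers = [
    {"CustAltKey": 1002, "Name": "Amy Alberts", "CityID": 100},
    {"CustAltKey": 1005, "Name": "Neil Black", "CityID": 200},
    {"CustAltKey": 1006, "Name": "Ye Xu", "CityID": 300},
]


def flatten(customers, cities, states, countries):
    """Join each customer to its city, state, and country and return one wide row each."""
    rows = []
    for cust in customers:
        city_name, state_id = cities[cust["CityID"]]
        state_name, country_id = states[state_id]
        rows.append({
            "CustAltKey": cust["CustAltKey"],
            "Name": cust["Name"],
            "Country": countries[country_id],  # duplicated across customers
            "State": state_name,
            "City": city_name,
        })
    return rows


dim_rows = flatten(customers, cities, states, countries)
```

Queries against the flattened rows need no joins between hierarchy levels, which is the performance argument made in the note.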
The slide shows before-and-after versions of the dimension table for Type 1, Type 2, and Type 3 slowly changing dimensions.
Discuss the issues in the bulleted list in the student content. In some cases, you might choose to include a column for the parent alternate key as well as the parent key, because this can be useful in some load techniques.
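A minimal sketch of one way to load a self-referencing dimension, assuming a hypothetical employee dimension held as Python dicts. It keeps the parent alternate key, as the note suggests, and resolves it to the parent surrogate key in a second pass, so a child row can arrive before its parent. This two-pass approach is only one option and is not necessarily the technique described in Module 4.

```python
def load_employee_dimension(source_rows):
    """Two-pass load: assign surrogate keys first, then resolve parent keys."""
    dim, alt_to_key = [], {}
    # Pass 1: every row gets a surrogate key; its parent may not be loaded yet.
    for key, src in enumerate(source_rows, start=1):
        alt_to_key[src["EmployeeAltKey"]] = key
        dim.append({
            "EmployeeKey": key,
            "EmployeeAltKey": src["EmployeeAltKey"],
            "ParentEmployeeAltKey": src["ParentEmployeeAltKey"],  # kept for later loads
            "ParentEmployeeKey": None,
            "Name": src["Name"],
        })
    # Pass 2: translate the parent business key into the parent surrogate key.
    for row in dim:
        parent_alt = row["ParentEmployeeAltKey"]
        if parent_alt is not None:
            row["ParentEmployeeKey"] = alt_to_key[parent_alt]
    return dim


source = [  # hypothetical rows; the child E3 arrives before its parent E2
    {"EmployeeAltKey": "E3", "ParentEmployeeAltKey": "E2", "Name": "Sales Rep"},
    {"EmployeeAltKey": "E1", "ParentEmployeeAltKey": None, "Name": "CEO"},
    {"EmployeeAltKey": "E2", "ParentEmployeeAltKey": "E1", "Name": "Sales Manager"},
]
employees = load_employee_dimension(source)
```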
Some techniques for loading self-referencing dimension tables are discussed in Module 4: Designing an ETL Solution.
As an alternative to a junk dimension, fact-specific attributes can be used to create degenerate dimensions in the fact table. This approach is discussed in the next lesson.
Point out that degenerate dimension columns provide the same capability as a junk dimension table. In a scenario where only one fact table requires the additional miscellaneous attributes for analysis and reporting, it is generally more efficient to include them as degenerate dimension columns. Conversely, if the additional attributes are relevant for multiple fact tables, a junk dimension is probably a better choice.
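A minimal sketch contrasting the two approaches, assuming hypothetical order rows with two low-cardinality flags (`IsGift`, `Channel`). The flags are moved into a junk dimension with one row per distinct combination, and each fact row keeps only a `JunkKey`. `OrderNo` stays on the fact row as a degenerate dimension column. The function and column names are illustrative only.

```python
def build_junk_dimension(fact_rows, attrs):
    """Replace the given attribute columns with a key into a junk dimension."""
    combos = {}  # attribute-value tuple -> JunkKey
    slim_facts = []
    for row in fact_rows:
        combo = tuple(row[a] for a in attrs)
        if combo not in combos:
            combos[combo] = len(combos) + 1
        slim = {k: v for k, v in row.items() if k not in attrs}
        slim["JunkKey"] = combos[combo]
        slim_facts.append(slim)
    junk_table = [{"JunkKey": key, **dict(zip(attrs, combo))}
                  for combo, key in combos.items()]
    return junk_table, slim_facts


orders = [
    {"OrderNo": "SO1", "Amount": 100, "IsGift": True, "Channel": "Web"},
    {"OrderNo": "SO2", "Amount": 50, "IsGift": False, "Channel": "Store"},
    {"OrderNo": "SO3", "Amount": 75, "IsGift": True, "Channel": "Web"},
]
junk, facts = build_junk_dimension(orders, ("IsGift", "Channel"))
```

If only this one fact table used `IsGift` and `Channel`, leaving them on the fact rows as degenerate columns (like `OrderNo`) would avoid the extra table and join.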
Discuss the note about fact table primary keys in the student manual. Students with a strong background in relational database design might feel uncomfortable about not defining a primary key for every table. If, however, there is no need to uniquely identify individual fact rows, and the ETL process can be relied on to eliminate accidental duplicate entries, defining a primary key adds unnecessary overhead to the table definition and generates an index, which can negatively affect the performance of data loads.
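A minimal sketch of the ETL-side duplicate check that this note relies on, assuming each fact row can be identified by a hypothetical pair of source columns (`OrderNo`, `LineNo`) and that the identities of rows already in the warehouse are available as a set. Rows that repeat within the batch or match an existing row are dropped before loading, so the fact table itself needs no primary key.

```python
def remove_duplicate_facts(staged_rows, identity_columns, already_loaded=frozenset()):
    """Drop staged rows whose identity is already loaded or appears earlier in the batch."""
    seen = set(already_loaded)
    unique = []
    for row in staged_rows:
        ident = tuple(row[c] for c in identity_columns)
        if ident in seen:
            continue  # accidental duplicate: skip it
        seen.add(ident)
        unique.append(row)
    return unique


staged = [
    {"OrderNo": "SO1", "LineNo": 1, "Qty": 2},
    {"OrderNo": "SO1", "LineNo": 1, "Qty": 2},  # duplicate within the batch
    {"OrderNo": "SO1", "LineNo": 2, "Qty": 1},
    {"OrderNo": "SO2", "LineNo": 1, "Qty": 5},  # loaded in an earlier run
]
to_load = remove_duplicate_facts(staged, ("OrderNo", "LineNo"), {("SO2", 1)})
```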
Similarly, note that declaring foreign-key constraints on dimension-key columns in a fact table is not necessary to enforce referential integrity in most data warehouses, and can negatively impact load performance. You can declare them, and then drop and recreate them during each load, but this creates its own overhead and adds little value if the ETL process is correctly implemented. The query optimizer can use foreign-key constraints to identify the fact table in a star join query; in their absence, it treats the largest table as the fact table, which is usually the correct choice.
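A minimal sketch of enforcing referential integrity in the ETL process instead of with foreign-key constraints, assuming the dimension's business-key-to-surrogate-key map is available as a dict. Each staged row's business key is looked up, and rows with no matching dimension member are set aside rather than loaded. The column names and the reject-list approach are assumptions. Some designs instead assign an "Unknown" member or insert an inferred member.

```python
def lookup_keys(staged_rows, customer_key_map):
    """Swap business keys for surrogate keys; reject rows with no dimension match."""
    loaded, rejected = [], []
    for row in staged_rows:
        customer_key = customer_key_map.get(row["CustomerAltKey"])
        if customer_key is None:
            rejected.append(row)  # would violate referential integrity: hold for review
        else:
            loaded.append({"CustomerKey": customer_key,
                           "SalesAmount": row["SalesAmount"]})
    return loaded, rejected


customer_key_map = {1002: 1, 1005: 2}  # CustomerAltKey -> CustomerKey
staged = [
    {"CustomerAltKey": 1002, "SalesAmount": 10.0},
    {"CustomerAltKey": 9999, "SalesAmount": 5.0},  # no such customer
    {"CustomerAltKey": 1005, "SalesAmount": 7.5},
]
loaded, rejected = lookup_keys(staged, customer_key_map)
```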
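Discuss the importance of including a row for “Unknown” or “None” in the time dimension table when using accumulating snapshot fact tables: milestone dates that have not yet occurred (for example, the ship date of an order that has not shipped) still need a valid date key until the row is updated.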
Point out that accumulating snapshot fact tables must be updated after the initial load. This requirement can affect the physical design of the table, especially if partitions or columnstore indexes are used. These considerations are discussed in the next lesson and in Module 4: Designing an ETL Solution.
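A minimal sketch of an accumulating snapshot row's life cycle, assuming a hypothetical order fact with order, ship, and delivery milestones held in a Python dict keyed by order number. On the initial load, milestones that have not happened yet point at the "Unknown" date key (0, matching the 00000000 row in the time dimension on slide 13). Later loads update the same row in place, which is the update requirement this note describes.

```python
from datetime import date

UNKNOWN_DATE_KEY = 0  # the 00000000 "Unknown" row in the date dimension


def date_key(d):
    """Convert a date to a yyyymmdd key, using the Unknown row when there is no date yet."""
    return UNKNOWN_DATE_KEY if d is None else d.year * 10000 + d.month * 100 + d.day


def insert_order(snapshot, order_no, order_date):
    """Initial load: only the first milestone is known."""
    snapshot[order_no] = {
        "OrderNo": order_no,
        "OrderDateKey": date_key(order_date),
        "ShipDateKey": date_key(None),
        "DeliveryDateKey": date_key(None),
    }


def record_milestone(snapshot, order_no, milestone_column, when):
    """Later load: update the existing fact row when a milestone occurs."""
    snapshot[order_no][milestone_column] = date_key(when)


snapshot = {}
insert_order(snapshot, "SO1", date(2013, 1, 1))
record_milestone(snapshot, "SO1", "ShipDateKey", date(2013, 1, 3))
```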