BI Environment Technical Analysis
Ryan Casey
1.
SmithGroup JJR | Technical Analysis of BI Environment, Version 1.0
2.
Technical Analysis of BI Environment
© AIM Report Writing, 2017 | Page 2 of 21

Executive Summary

At the beginning of the engagement with SmithGroup JJR, AIM Report Writing was asked to provide an initial, high-level technical analysis of their Business Intelligence (BI) environment. This document is the result of the weeks of investigation that followed. The following areas were agreed upon for this delivery item:

• Executive Summary
• Recommendations
• Data Warehouse Architecture
• SQL Server Best Practices
• Case for Hadoop: Indoor Positioning Study (POE)
• SQL Server / Database Discovery
• Data Warehouse and Data Marts
• Extract, Transform, and Load
• Appendix A | Microsoft Data Warehouse On-Premises Architecture
• Appendix B | Design Questions to Review

The current business intelligence and data warehouse environment at SmithGroup JJR includes three primary components: a data warehouse, an ETL process, and a data mart. One additional storage location exists on separate SQL Server resources. There are two main data sources stored in the cloud, with the intent of future expansion in the cloud. This environment was analyzed against SQL Server best practices, the data warehouse (data mart), and the ETL process. In addition to this analysis, a database discovery and a case for Hadoop were completed.

To implement the following recommendations, an Agile approach is suggested. This technical analysis has identified and defined groups of work, called Epics in Agile. Breaking the findings of this analysis into Epics provides a starting point for identifying scope and vision, user stories, and the backlog. These Agile artifacts provide the framework for defining the effort, the sprints, and delivery to production.

During on-site meetings regarding the topics in this document, it was informally agreed that the initial two Epics for development focus on the two areas listed below:

• SSIS Error Flows to replace T-SQL Functions
• Dimensional Modeling and Star Schema

The team at SmithGroup JJR has started gathering use cases, which will be used during the dimensional modeling (star schema) development. These use cases serve four purposes:

• Identify entities, relationships, and attributes for the star schema conceptual model
• Develop the dimensional model (the star schema conceptual model)
• Verify, after the draft conceptual model is complete, that the model can source the use cases
• Develop front-end requirements such as reports or dashboards

Future Epics need to address validation, scalability, transaction processing, and load metadata.
3.
Recommendations

• SSIS Error Flows to replace T-SQL Functions
  o Replace all existing source-to-stage functions in the ETL database
  o Investigate where and how this change would be implemented
  o Alternative: T-SQL error control can produce the current failing row, but not a batch of failed rows like SSIS can
• Dimensional Modeling and Star Schema
  o Continue to gather and collect use cases
  o Identify entities (dimensions), relationships, and attributes for the star schema conceptual model
    ▪ Used to obtain stakeholder consensus
  o Create flat table definitions of entities (dimensions) using Excel
  o Model the star schema conceptual model using ERwin
  o Continue learning and training on ERwin, possibly via Pluralsight or webinars (http://erwin.com/videos/)
• Architecture | Option 1, shown in the following section, Data Warehouse Architecture
  o Current use does not require an integrated cloud environment
    ▪ Users do not experience performance issues when using a gateway with on-premises data
    ▪ Especially considering that the on-premises environment should receive maximum effort
  o Cloud analysis and analytics in Power BI using a gateway and on-premises data
  o Tableau users have access to on-premises data for analytics
  o This approach allows a scalable future roadmap to integrate on-premises and cloud
• Hadoop | Reserve for a future roadmap
  o Low volume | Carl estimated 1 TB of data
    ▪ As an unwritten rule shared by experts, Hadoop needs at least 5 TB to justify the investment and achieve performance
  o High-to-moderate investment for on-premises or cloud-based Hadoop
  o Although volume, velocity, variety, and veracity are all considerations, volume is required for federation
• Naming of business resources with industry-standard naming
  o "Data Lake" is a term used with Hadoop; rename this resource to something else
  o "Data Vault" is actually a data warehouse (not as big a concern as the naming conflict above)
• Two summary tables from the analysis sections toward the end of this document:
  o 1 | Data Warehouse and Data Marts
  o 2 | Extract, Transform, and Load

1 | Data Warehouse and Data Marts

• Security: SQL Server security appears adequate and in line with industry standards.
• Partitioning: After review of dev data only, we see no data need for partitioning. No performance issues reported.
• Alerting: Combining TRY...CATCH, Database Mail, and SQL Server Agent is highly recommended for alerting on SQL Server issues.
• Indexing: Index discovery and an enterprise index strategy are recommended for production servers.
• Star Schema: There is currently no star schema; one is highly recommended.
• Conformed Dimensions: Since there is no star schema, there are no conformed dimensions.
• Scalability: Scalability appears to be a concern. A future-use plan for instances, files, and filegroups is suggested.
• Exception Handling: TRY...CATCH is not being used in functions and stored procedures. Adding it is highly recommended.
• Transaction Processing: The environment does not use transaction processing; it is recommended for future phases.
• SQL Views (Business Views): A star schema is suggested to reduce the complexity of creating and managing SQL views for the business.
• Surrogate Keys: Surrogate keys with an integer data type are suggested for the star schema.
• Delta Loads: T-SQL MERGE is adequate; however, the additional use of checksums should be considered. Metadata is needed.

2 | Extract, Transform, and Load

• Load Metadata: Currently there is no tracking of load metadata; it is highly recommended.
• Environments and Environment Variables: Currently being used with success.
• Parameters: Currently being used with success.
• Logging: Logging with the SSIS framework is working; however, load metadata logging is suggested.
• Validation: Since little to no validation is currently employed, it is highly recommended.
• Transaction Processing: It is recommended not to use transaction processing in SSIS, but rather at the SQL Server level in functions and stored procedures.
• Package Sequencing: There are no reported errors or issues with the current package sequencing.
• Connection Managers: Currently being used with success.
• Alerting: Alerting is not enabled in the solutions evaluated; it is highly recommended.
• Exception Handling: There is no exception handling in the evaluated packages; adding it is highly recommended.
• Checkpoints: It is recommended that some restartability be designed and implemented.
• Naming Conventions: Naming conventions are recommended.
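The alerting and exception-handling recommendations above (TRY...CATCH combined with Database Mail and SQL Server Agent) can be sketched as follows. This is an illustrative example only: the procedure name (dbo.LoadStageEmployee), mail profile, and recipient address are assumed placeholders, not objects from the actual environment.

```sql
-- Hedged sketch: wrap a source-to-stage load in TRY...CATCH and alert
-- via Database Mail on failure. All names below are illustrative.
CREATE OR ALTER PROCEDURE dbo.LoadStageEmployee
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        BEGIN TRANSACTION;
        -- ... source-to-stage load statements go here ...
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

        DECLARE @msg nvarchar(2048) =
            CONCAT('Load failed in ', ERROR_PROCEDURE(),
                   ' (error ', ERROR_NUMBER(), '): ', ERROR_MESSAGE());

        -- Database Mail: requires a configured mail profile and recipients.
        EXEC msdb.dbo.sp_send_dbmail
            @profile_name = N'DBA Alerts',          -- assumed profile name
            @recipients   = N'bi-team@example.com', -- assumed recipient
            @subject      = N'ETL load failure',
            @body         = @msg;

        THROW;  -- re-raise so a SQL Agent job step is marked failed
    END CATCH
END;
```

Scheduling the procedure as a SQL Server Agent job step then gives job-level failure notification on top of the detailed mail body, which is the combination the summary table recommends.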
4.
Data Warehouse Architecture

Below, we provide diagrams representing the current architecture and two options for the next stage of the BI / data warehouse architecture. An "all-features" Microsoft on-premises architecture diagram can be found in Appendix A. These diagrams are intended to help the decision makers compare their current architecture with possible phases. To help clarify these phases, this section also includes information about the current error control design and about integrating on-premises and the cloud.

The current architecture stages both enterprise and project / application-specific data sources from both internal and external locations. Data sources intended for the data warehouse are staged first, then loaded.

Figure 1 | Current Architecture (diagram: on-premises Data Warehouse (3NF) with Data Vault plus UltiPro and Vision schemas, ETL stage, and denormalized Data Mart; Blue Vision IPS in the cloud via Azure Event Hub, Azure Streaming Analytics, and Azure Table Storage; Marquette in Azure SQL Database; Tableau and Power BI end users)

The data stored in the data warehouse becomes the source for the data mart. Data source examples include UltiPro, Vision, and Active Directory. Other data sources such as NSF, IPEDs, and RevIt are stored on other dedicated SQL Server storage. In the cloud, data sources from Indoor Positioning and Marquette indicate the slow adoption of integrating on-premises data with cloud data. With this identified, the options discussed include an all on-premises option and an integrated on-premises and cloud option.
5.
Phase 1, an all on-premises data warehouse design, dictates that all of the data structures and data be stored on internal company resources (no cloud). In this option, the star schema and cubes exist and remain on internal company resources; however, these resources and their content connect to cloud apps such as Power BI. This connection is facilitated by a gateway. In this option, the gateway is required to make multiple trips when sourcing data. Even so, this design gives end users on Power BI or Tableau a flexible and feasible option.

Figure 2 | Option 1 (diagram: all on-premises with gateway; data mart star schema and cube analytics remain on premises and connect to Power BI over HTTP with no VPN; cloud sources as in the current architecture)
6.
Phase 2, an integrated on-premises and cloud data warehouse, seeks to design a hybrid data warehouse providing the best of both the on-premises and cloud approaches. In Option 2, SQL Server Integration Services loads data from the data mart star schema into Azure SQL Database, or directly into Azure SQL Server Analysis Services. This option differs from Option 1 in that on-premises data is copied to the cloud to be consumed by applications such as Power BI. For a data scientist or business analyst using Power BI, having the on-premises data in the cloud provides fast analysis alongside external data already in the cloud. In the first option, the gateway is required to make multiple trips when sourcing data; in this option, the data already exists in the cloud, so gateway use is minimized.

Figure 3 | Option 2 (diagram: integrated on-premises and cloud; SSIS loads the star schema into Azure SQL Database and Azure SSAS over a VPN; cloud sources as in the current architecture)
7.
In the next two diagrams, the flow of data is separated into five phases: Enterprise Source Systems, Staging, Data Warehouse, Data Mart, and Star Schema. For this analysis, however, we are focusing on the first two rectangles, titled Enterprise Source Systems and Staging. The first diagram displays the current use of functions to extract data from the source systems. The second diagram displays a possible use of SSIS error flows.

Functions: the diagram below represents the current flow of data, where SQL functions extract data from the source systems. These functions are intended to exist on the actual source system in a database named ETL, but in the case of UltiPro (a backup / restore process) the functions exist on the ETL database used by the data warehouse. These functions are used to load the staging tables used in the downstream MERGE. The advantage of this design is that changes to the architecture can be implemented without affecting downstream objects such as SSIS. The concern with this design is that during a load failure, the specific rows that failed are not easily identifiable, so a detailed alert containing the failed rows cannot be generated.

Figure 4 | Current Architecture Using Functions (diagram notes: the ETL database stores the functions that create load tables from the source and the tables they return; Active Directory extraction uses an SSIS plugin; a MERGE statement loads inserts and updates into the denormalized data warehouse; stored procedures on the data mart execute functions on the data warehouse to load the data mart; the star schema is to be designed and developed in future phases. Warning: using functions to pull the source data prevents using SSIS Data Flow Tasks, which means we can't have an error flow that stores failed rows for evaluation and fixing.)
Error Flow: the diagram below represents the proposed flow of data using SSIS Error Flows. The use of SSIS is intended to replace the existing functions that load the staging tables. As shown in the diagram, SSIS has the ability to create an error flow to capture rows that fail the load process. This ability allows the details and cause of the failure to be emailed to alert the appropriate stakeholders. Once SSIS loads the staging tables and stores any row failures, the rest of the data flow remains the same as in the current diagram.

Notice! Using the Data Flow Task allows the use of Error Flows. This means that we can have an error flow that stores failed rows for evaluation and fixing.

Figure 5 | Proposed Architecture Using SSIS Error Flows
The options to integrate on-premises and cloud are diagrammed below. The full overview shows Site-to-Site and Point-to-Site VPNs as well as an HTTP connection. All three of these options provide different levels of security and IPsec standards. An additional option for the Site-to-Site VPN is ExpressRoute (https://azure.microsoft.com/en-us/services/expressroute/). ExpressRoute is a Microsoft Azure service that provides advanced scalability, increased reliability and speed, lower latency, and WAN integration. It is a paid, usage-based service. The diagram notes that the Site-to-Site VPN (optionally via ExpressRoute) is secure, controlled, and offers better connectivity quality.

Figure 6 | Integrate On-Premises and Cloud
SQL Server Best Practices

SQL Server best practices were discussed and explained during a meeting with the SmithGroup JJR DBA and Infrastructure teams. The recommendations were demonstrated on development servers so as not to jeopardize production SLAs. All decisions regarding whether and when to implement these best practices were left to SmithGroup JJR.

• NTFS Allocation Unit (AU): block size = 64 KB, alignment = 1024 KB (the default is 4 KB). Use /L with Format on Windows 2012 and above.
• Max Degree of Parallelism (MAXDOP): set to the number of cores in a single CPU socket.
• DB auto growth: set very high for performance (100 MB to several GB).
• Cost Threshold for Parallelism: for OLTP, where we seek to minimize parallelism and offer more concurrency, use 15-20, or up to 50 with modern CPUs. For DSS, OLAP, data warehouse, and test environments, consider leaving the default and managing parallelism with MAXDOP if concurrency is a problem.
• TempDB: a 1:2 or 1:4 ratio of TempDB data files to cores; a 1:1 ratio for large systems. Pre SQL Server 2016, use trace flags T1117 and T1118 to enable consistent auto growth. On flash arrays, enable the SORT_IN_TEMPDB index build option to offload index (re)build sorts to TempDB.
• Separate data / log volumes for Tier 1; test to determine for Tier 2 flash arrays. Use multiple volumes per file group to reduce latch contention, with 4-8 files per file group. Three volumes (TempDB, data / log files, and backups) suffice for fast flash (under 1 ms response times).
• Max server memory: 90% of available server memory.
• Enable Instant File Initialization: the Windows Server right Perform Volume Maintenance Tasks needs to be granted under Local Policies and User Rights Assignments.

Case for Hadoop: Indoor Positioning Study (POE)

During our initial meetings regarding the data sources here at SmithGroup JJR, we identified one possible use case for Hadoop: the Indoor Positioning Study (POE).
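Several of the instance-level settings listed earlier (MAXDOP, cost threshold for parallelism, max server memory) can be applied with sp_configure. This is a sketch only; the values below are illustrative assumptions (an 8-core socket and a 128 GB server), not recommendations for this environment:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- MAXDOP = cores in a single CPU socket (example assumes an 8-core socket)
EXEC sp_configure 'max degree of parallelism', 8;

-- Cost threshold tuned for an OLTP-leaning workload
EXEC sp_configure 'cost threshold for parallelism', 20;

-- Max server memory = roughly 90% of a hypothetical 128 GB server, in MB
EXEC sp_configure 'max server memory (MB)', 117964;

RECONFIGURE;
```

Each value should be validated against the target hardware and workload before being applied in production.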
During our conversations, multiple questions were asked about Hadoop, such as what the minimum data size is and how to handle aggregates on unstructured data. Hadoop does not perform well on 5 TB or less. It is also worth noting that small files do not work well with Hadoop and should be combined into larger files. As for aggregates in Hadoop, if SmithGroup JJR were to use Azure Data Lake Store (ADLS), they could use HDInsight and Hive; if they use SQL Data Warehouse or SQL Server 2016, they could use PolyBase. Another option is Azure Data Lake Analytics / U-SQL to aggregate Hadoop data. Below are some questions, taken from the Indoor Positioning Study (POE) documentation, that SmithGroup JJR would like to answer with this data source. These are broad topics, each with more specific questions.
• How do people utilize space?
o What is the average dwell time by space?
o How does the number of people within a space vary over time?
o What are the most frequently used paths between spaces?
• How do people interact and collaborate?
o How much time do people spend in spaces occupied by other people?
o What is the average number of people in a collaborative space?
o How does job / organizational role impact collaboration?
• Person movement
o How often do people move between spaces?
o What is the average duration of rest (motion)?
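As noted above, PolyBase (SQL Server 2016 and later) can expose Hadoop files to T-SQL aggregation. The data source location, file format, table, and column names below are all hypothetical, sketched for a positioning-readings file:

```sql
-- Hypothetical external data source pointing at a Hadoop cluster
CREATE EXTERNAL DATA SOURCE PoeDataLake
WITH (TYPE = HADOOP, LOCATION = 'hdfs://poe-headnode:8020');

CREATE EXTERNAL FILE FORMAT PoeCsv
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

-- Hypothetical positioning readings: badge, space, and dwell seconds
CREATE EXTERNAL TABLE dbo.PositionReadings
(
    BadgeId      INT,
    SpaceId      INT,
    ReadingTime  DATETIME2,
    DwellSeconds INT
)
WITH (LOCATION = '/poe/readings/',
      DATA_SOURCE = PoeDataLake,
      FILE_FORMAT = PoeCsv);

-- "Average dwell time by space" answered with ordinary T-SQL over Hadoop data
SELECT SpaceId, AVG(DwellSeconds) AS AvgDwellSeconds
FROM dbo.PositionReadings
GROUP BY SpaceId;
```

This pattern would let the dwell-time questions above be answered without moving the raw files into the data warehouse.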
Additional questions:
• Exact location of a user within a space
• Actual paths traveled between spaces
• Relationship between workspace and study subject (employee / organizational) measures, such as happiness or productivity (what is an abstract term that captures these types of things?)
• Comparison of varied workspace configurations / designs / arrangements, such as office / open / free assignment
• Integration with other technologies and data sources, such as space scheduling software, communication software, galvanic skin response, implanted telemetry chips, health and dental records, etc.

After talking with Peter, he estimated the size of the Indoor Positioning Study (POE) data at SmithGroup JJR at most one terabyte. Given that this is much less than the five-terabyte minimum for Hadoop clusters, it is not suggested to implement a Hadoop cluster for this use case.

SQL Server / Database Discovery

In order to complete a data discovery, we were provided 3 databases:
• DataVault
• DataMart
• ETL

We provided a data discovery by employing 2 different methods. The first method was to create a web-based document of each database using Redgate's SQL Doc. The second method was to use SQL Server DMVs and TSQL to create an Excel-based data dictionary. The files are included in the SharePoint folder along with this document. Also note that Shobhana provided the WBS Migration Changes to Datawarehouse Systems.pdf, where you can find much of this type of information as well. Finally, we collected information about the various data sources (both internal and external).
The list of data sources is as follows:

Internal Enterprise Data Sources
• Vision: Enterprise Resource Planning software
• UltiPro: Human Resources
• SharePoint: Document Management and Collaboration
• Active Directory (AD): Network / Domain Information
• NewForma: Project Metadata and RFIs

Project / Application Specific
• Revit Data Collector: Building Information Modeling (Model Statistics)
• CER
• WorkSim: Space Planning
• Indoor Positioning Study (POE): Azure SQL for People Movement in Workspace
• Campus Project Data (Marquette): Campus Planning & Space

External
• IPEDS: Public University Data
• National Science Foundation: Public Data for Funded Projects
• Bureau of Labor: Government Labor Statistics
• GIS: Topographical Data, Land Surveys

Data Warehouse and Data Marts

In order to analyze SmithGroup JJR's Data Warehouse / Data Mart environment, we were provided 3 databases:
• DataVault
• DataMart
• ETL

Overview

The data warehouse (DataVault) and the data mart of the same name are the 2 databases that make up the SmithGroup JJR BI environment. The DataVault is a 3NF database. The DataMart is de-normalized and currently contains employee and project data. At this time, there is no star schema; however, there are plans to build out a star schema in the future. The DataVault stores source data by the corresponding source system name, using schemas of the same name, such as UltiPro and Vision. In order to complete the analysis below, a Server and Database Discovery was completed as well.

Analysis

The areas of analysis for these 2 databases include the following topics:
• Security
• Scalability
• Partitioning
• Exception Handling
• Alerting
• Transaction Processing
• Indexing
• SQL Views (Business Views)
• Star Schema
• Surrogate Keys
• Conformed Dimensions
• Delta Loads (Merge SCD 1 and SCD 2, Checksums)

Security should always be the first concern in planning and deploying any data warehouse / data mart environment. In reviewing which roles were defined, we found the following server roles: bulkadmin, dbcreator, diskadmin, processadmin, public, securityadmin, serveradmin, setupadmin, and sysadmin. There were no user-defined SQL Server roles. The sa account was enabled, but not being used. There was no implementation of Row-Level Security or Role-Based Security. SQL schemas, such as ultipro, vision, ad, and admin, were used to scale and organize the various SQL Server objects.
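Row-Level Security, noted above as absent, can be layered on in SQL Server 2016 and later with a predicate function and a security policy. The table, column, and session key below are hypothetical, chosen only to illustrate the shape of the feature:

```sql
CREATE SCHEMA rls;
GO
-- Hypothetical predicate: users see only rows for their own office region.
CREATE FUNCTION rls.fn_RegionPredicate (@RegionName NVARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS fn_result
    WHERE @RegionName = CAST(SESSION_CONTEXT(N'RegionName') AS NVARCHAR(50))
       OR IS_MEMBER('db_owner') = 1;   -- administrators bypass the filter
GO
-- Bind the predicate to a hypothetical DataMart table.
CREATE SECURITY POLICY rls.RegionFilter
ADD FILTER PREDICATE rls.fn_RegionPredicate(RegionName)
ON dbo.Project
WITH (STATE = ON);
```

With the policy on, every query against dbo.Project is silently filtered by the predicate, which keeps the security logic out of the SQL Views and reports.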
Scalability is a very high priority for companies that want to deliver solutions that last 5 or more years after the initial deployment. Many of the Server and Database DMVs listed above help us determine scalability. For instance, using instances, partitioning, files and file groups, and synonyms can help make a system more scalable. Instances allow better resource management between different processes on the same server. They also allow us to separate load layers such as stage, consolidation, transformation, 3NF, star, and analytics. Since we only had access to development servers, we did not see any examples of instances, but we highly recommend them in production. We also looked at partitioning, which is discussed below. As for files and file groups, we have provided an Excel spreadsheet identifying the files and file groups and their current sizes. We also provided size information for all of the tables in the 3 databases we were asked to analyze. File and table sizes are important indicators for scalability and for where to set the auto growth for your tables. The data and log files were on the same volume and had the following sizes:

FileName       FileSizeMB  SpaceUsedMB  AvailableSpaceMB  %FreeSpace
DataVault      3004        1636.69      1367.31           45.52
DataVault_log  36828.31    1256.45      35571.87          96.59

For development, we were okay with these settings; however, the auto growth was not what we recommend in production. Finally, synonyms are an easy way to manage server-to-server (physical, or instance) connections without taking the risk of using linked servers. We did notice 3 linked servers (FINANCIALDATA, SGJJR-SQL2ASCCM2012, and VISIONDEVDB). These linked servers were not part of the scope provided by SmithGroup JJR; however, we would warn against relying too heavily on linked servers.

Partitioning is a great way to manage reporting performance in a data warehouse / data mart environment. Currently, SmithGroup JJR is not using any partitioning strategy. Since we only had access to the development data, it is hard to tell whether the sizes in our analysis represent the real production sizes; however, with the database sizes we encountered in the scope of this analysis, we do not recommend partitioning at this time. The DataVault held a total of 2,548,891 rows; the table with the most rows was [vision].[ProjectFinancialsByPeriod], at 730,655 rows. As shown above, the data size for DataVault is 3004 MB. At this time, partitioning is not recommended.
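File and table sizes of the kind reported above can be gathered with catalog views and DMVs. A sketch (run per database):

```sql
-- File sizes and used space per database file; size is stored in 8 KB pages
SELECT  name                                    AS FileName,
        size / 128.0                            AS FileSizeMB,
        FILEPROPERTY(name, 'SpaceUsed') / 128.0 AS SpaceUsedMB
FROM    sys.database_files;

-- Row counts per table, to spot growth and partitioning candidates
SELECT  s.name + '.' + t.name AS TableName,
        SUM(p.rows)           AS [RowCount]
FROM    sys.tables t
JOIN    sys.schemas s    ON s.schema_id  = t.schema_id
JOIN    sys.partitions p ON p.object_id  = t.object_id
WHERE   p.index_id IN (0, 1)   -- heap or clustered index only, to avoid double counting
GROUP BY s.name, t.name
ORDER BY [RowCount] DESC;
```

Scheduling queries like these and trending the results over time gives a concrete basis for the auto-growth and partitioning decisions discussed above.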
Exception Handling in SQL Server (TSQL) is accomplished by using TRY...CATCH clauses. We did our due diligence in verifying that there is no exception handling at the SQL Server level, and we confirmed this with the different teams at SmithGroup JJR. Exception handling is recommended for future phases of development.

Alerting in SQL Server is a combination of TRY...CATCH clauses, database mail, and SQL Server Agent. SQL Server Agent allows us to define operators and alerts. Alerts can then be defined on performance conditions, or on SQL Server events based on an error number raised from a TRY...CATCH clause. Alerting is recommended for future phases of development.

Transaction Processing is the process of ensuring that data is written to disk before we commit a transaction and move on to the next step in the process. Transaction processing also provides a mechanism to roll back any data that has been written if the transaction fails before a commit can take place. Transaction processing is a critical part of any design. Currently, the environment does not use transaction processing. Transaction processing is recommended for future phases of development.

Indexing has a huge impact on server and query performance. DMV queries to identify unused indexes, and to identify which indexes need to be re-organized or re-built, should be run on a regular basis. Index discovery and creating an enterprise index strategy are recommended.

SQL Views (Business Views) can be used to denormalize and simplify data structures in the 3NF for reporting purposes. Currently, both the DataVault and DataMart use SQL Views; the ETL database does not. There are 32 SQL Views grouped into 5 different schemas (admin, api, dbo, lookup, and vision) in the DataVault database. The DataMart database has 10 SQL Views, all in the dbo schema. A star schema is suggested to reduce the complexity of creating and managing SQL Views for the business.

Star Schema is not used and is not being developed. It is highly recommended for future phases.

Surrogate Keys are used to provide referential integrity in a data warehouse / data mart that sources data from numerous systems that all have different keys defined for the same entity / attribute, such as Person / Social Security Number. Surrogate keys are employed in the SmithGroup JJR DataVault and DataMart; however, GUIDs have been used. This design poses no issues for the data warehouse, but the reporting star schema should use integers for load and processing performance.

Conformed Dimensions at the database level entail ensuring that the data warehouse in a star schema has only 1 dimension for a specific entity, such as employee or region. Any data mart use of the entity employee or region needs to be sourced from the data warehouse and not reloaded with different logic and processes. Since there is no star schema for DataVault or DataMart, we do not have any dimensions to conform. We suggest a robust star schema for both the data warehouse and the data marts.

Delta Loads are both a performance issue and a management issue. Loading only the data that has changed since the last load can be implemented and managed in many ways: we can use the TSQL MERGE statement, checksums, and last-load-date tables to determine whether a row has changed since the last time the table was loaded. SmithGroup JJR uses TSQL MERGE, but not checksums or a last-load-date table. At the size of the data today, checksums and storing a last load date are not necessary, but they are recommended for performance and scalability. These same processes can also be used to implement slowly changing dimensions once a star schema is developed.
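A delta load that combines MERGE with a row checksum and a last-load date, as recommended above, might be sketched as follows. The table and column names (including the RowHash and LastLoadDate columns on the target) are hypothetical:

```sql
-- Hypothetical stage-to-warehouse delta load: MERGE plus a SHA-256 row hash.
MERGE dw.Employee AS tgt
USING (
    SELECT EmployeeKey, FirstName, LastName, Department,
           HASHBYTES('SHA2_256',
               CONCAT(FirstName, '|', LastName, '|', Department)) AS RowHash
    FROM etl.EmployeeStage
) AS src
    ON tgt.EmployeeKey = src.EmployeeKey
WHEN MATCHED AND tgt.RowHash <> src.RowHash THEN   -- SCD Type 1: update changed rows only
    UPDATE SET FirstName    = src.FirstName,
               LastName     = src.LastName,
               Department   = src.Department,
               RowHash      = src.RowHash,
               LastLoadDate = SYSUTCDATETIME()
WHEN NOT MATCHED BY TARGET THEN                    -- brand new rows
    INSERT (EmployeeKey, FirstName, LastName, Department, RowHash, LastLoadDate)
    VALUES (src.EmployeeKey, src.FirstName, src.LastName, src.Department,
            src.RowHash, SYSUTCDATETIME());
```

Comparing hashes avoids a column-by-column change test, and the stored LastLoadDate supports both auditing and SCD Type 2 handling once a star schema exists.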
There was an initial plan at SmithGroup JJR to use logging, error-logging, and number tables to manage load metadata, but it was not implemented.
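A TRY...CATCH wrapper that feeds such an error-logging table, and raises an email alert through Database Mail, could look like the sketch below. The load procedure, logging table, mail profile, and recipient are all assumptions:

```sql
BEGIN TRY
    EXEC etl.LoadEmployee;   -- hypothetical load procedure
END TRY
BEGIN CATCH
    -- Record the failure in a hypothetical error-logging table
    INSERT INTO etl.ErrorLog (ErrorTime, ErrorNumber, ErrorMessage, ProcName)
    VALUES (SYSUTCDATETIME(), ERROR_NUMBER(), ERROR_MESSAGE(), ERROR_PROCEDURE());

    -- Alert the operators via Database Mail (profile name is an assumption)
    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = N'ETL Alerts',
         @recipients   = N'bi-team@example.com',
         @subject      = N'ETL load failure',
         @body         = N'LoadEmployee failed; see etl.ErrorLog for details.';

    THROW;   -- re-raise so the calling job step also reports failure
END CATCH;
```

This gives the detailed, row-level failure context that the current function-based loads cannot provide.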
Extract, Transform, and Load

In order to analyze SmithGroup JJR's SSIS environment, we were provided 4 SSIS solutions for the following areas:
• Active Directory | 12 Packages
• Deltek Vision | 63 Packages
• Ultipro | 19 Packages
• NewForma | 21 Packages, with 11 Disabled

Overview

The Active Directory load process uses the KingswaySoft Directory Services Integration toolkit to provide access to Active Directory data. Using this tool, the solution extracts Active Directory data for the following areas: computer, group, group member, and user. The master ActiveDirectory.dtsx package calls 3 sub-packages named Extract, Transform, and Load. As the names indicate, the Extract package takes data from the source system and temporarily stores it in the ETL database. Unlike the other, more complex ETL solutions, this solution does not have any tasks or data flow transformations in the Transform package, so Transform.dtsx could possibly be disabled. The Load package calls stored procedures located in the ETL database, which then load the data warehouse.

The Active Directory design pattern uses the KingswaySoft Directory Services Integration toolkit to extract the data from the source system and place it into stage tables in the ETL database on the Data Warehouse server. Once the data is staged, ETL stored procedures load the staged data into the DataVault tables using the MERGE statement.

The Deltek Vision solution uses a similar process, calling separate sub-packages for the Extract, Transform, and Load phases of the data load. However, this solution also has a PreProcessing and a PostProcessing package. The PreProcessing package truncates the TPH tables. The PostProcessing package is empty and could possibly be disabled.
The Extract package takes data from the source system and temporarily stores it in the ETL database for tables like client, vendor, and employee. The Transform package transforms data for vendor / client, contact / employee, project / opportunity, and project dependents. The Load package calls stored procedures located in the ETL database, which then load the data warehouse for these same areas, such as client, vendor, and employee.

The Vision design pattern includes an ETL database stored on the transactional server. This ETL database stores functions that extract the data from the source system and place it into stage tables in the ETL database on the Data Warehouse server. Once the data is staged, ETL stored procedures load the staged data into the DataVault tables using the MERGE statement.

The Ultipro solution uses a different process, organizing the Extract, Transform, and Load phases of the data load into separate containers. The Extract container stores data in the ETL database. The Transform container transforms data for organization, employment, and employment history. The Load container loads tables from the ETL database into the data warehouse using the TSQL MERGE statement. Please review Shobhana's WBS Migration Changes to Datawarehouse Systems.pdf for an ETL dataflow diagram and other useful package information.
Since Ultipro is a backup and restore process, the Ultipro design pattern does not include ETL database functions stored on the transactional server. Instead, these extract functions are stored in the ETL database on the Data Warehouse server, where they extract the data from the restored source database and place it into stage tables in the same ETL database. Once the data is staged, ETL stored procedures load the staged data into the DataVault tables using the MERGE statement.

The NewForma (oblivion) load process is a non-standard load process that needs to be updated to the new design pattern described with the Vision load process above. There is a monthly load that calls a weekly load, which calls an hourly load; this chain is not currently being used. The weekly load package also has an archival process. Besides the monthly load, there is a daily load that calls the hourly load. Both the weekly and daily load packages call the same hourly package.

In its current state, the NewForma load process executes two packages in parallel. The first package is Execute etlOrgChart and the second is Execute etlNewformaProjects. Execute etlNewformaProjects has two child packages, named Execute etlProjectRFIs and Execute etlProjectMilestones. The packages etlOrgChart and etlProjectMilestones both use the KingswaySoft SharePoint Integration toolkit to extract and load data to and from SharePoint lists. This process needs to be updated and promoted to production using the standard process.

Analysis

The areas of analysis for these 4 solutions include the following topics:
• Load Meta Data
• Package Sequencing (Master and Child Packages, SQL Jobs, Conformed Dimensions, Dimensions, Facts, Data Marts)
• Environments and Environment Variables
• Connection Managers (Package and Project)
• Parameters (Package and Project)
• Alerting
• Logging
• Exception Handling
• Validation
• Checkpoints
• Transaction Processing (requires MSDTC)
• Naming Conventions

Load Meta Data is important since it can help us track load start and end times by package, table, and even cube processing. It can also track load row counts for inserts and updates, provide restartability that is more robust than SSIS checkpoints, and provide rollback information during a failure. Currently, there is no tracking of load metadata, and it is highly recommended.

Package Sequencing controls the order in which the packages load the tables. In terms of packages, we have a master package and then child packages. The master package may call child packages such as a conformed dimension package, a dimension package, and a fact package. Child packages can also call data mart packages that duplicate data warehouse dimensions and facts for use in the data marts. Finally, SQL Jobs can be used to schedule different load patterns and times, such as daily, hourly, weekly, or monthly. Since package sequencing is already working and not causing issues at this time, it is not a high priority for redesign.
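A minimal load-metadata table of the kind recommended above might look like the following; all names are hypothetical:

```sql
-- Hypothetical load-metadata table: one row per package / table execution.
CREATE TABLE etl.LoadAudit
(
    LoadAuditId   BIGINT IDENTITY(1,1) PRIMARY KEY,
    PackageName   NVARCHAR(128) NOT NULL,
    TargetTable   NVARCHAR(256) NOT NULL,
    LoadStart     DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
    LoadEnd       DATETIME2     NULL,
    RowsInserted  INT           NULL,
    RowsUpdated   INT           NULL,
    LoadStatus    VARCHAR(20)   NOT NULL DEFAULT 'Running'  -- Running / Succeeded / Failed
);

-- Each package would insert a row at start, then close it out on completion:
-- UPDATE etl.LoadAudit
-- SET    LoadEnd = SYSUTCDATETIME(), RowsInserted = @ins,
--        RowsUpdated = @upd, LoadStatus = 'Succeeded'
-- WHERE  LoadAuditId = @id;
```

Rows left in the Running state after a failure identify exactly where a restart should resume, which is more robust than SSIS checkpoints.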
Environments and Environment Variables provide a mechanism for changing project data connections and variables during a change-control migration from one environment to another, such as development to test, or test to production. Currently, environments and environment variables are being used with success.

Connection Managers can be either project- or package-level. In most cases, connections that need to change from environment to environment, or that will be used in many packages, should be project connections. Any connection that is required by only 1 package and will not change between environments can be a package connection.

Parameters can also be either project- or package-level. In most cases, parameters that need to change from environment to environment, or that will be used in many packages, should be project parameters. Any parameter that is required by only 1 package and will not change between environments can be a package parameter.

Alerting in SSIS is provided by using an SMTP connection. This connection can then be used in a task flow, a data flow, or even an event handler such as OnError. Alerting is not enabled in the solutions evaluated, and it is highly recommended.
Logging can be heavily customized by using a custom logging schema and then tying that custom logging to the logging built into SQL Server 2012 and newer. Since robust logging, including verbose logging for troubleshooting, is provided in newer versions of SQL Server, we do not recommend making changes to logging. An example custom logging diagram that can bridge to the data logged by newer versions of SQL Server has been provided for your reference.

Exception Handling in SSIS can be addressed with multiple methods. One method is exception data flows, which can load exception data into flat files such as text files, or into a table that stores exception data in an XML format. Another example of exception handling is rolling back data when an error occurs; this can be accomplished at the end of an error flow, or in an event handler such as OnError. Finally, exception handling can be combined with alerting to let the appropriate technical and business users know of an issue or delay. Since there is no exception handling in the packages evaluated, it is highly recommended.

Validation has many solutions. The most common include row- and column-based validation combined with what are called sanity checks. In a fact table, column validation can sum a column and compare it to an aggregate value in the consolidation layer of the load process. For dimension validation, we can verify that the surrogate key for a specific user ties back to multiple source systems through the stored business keys. Sanity checks tend to focus on a known business rule and verify that a calculated business rule matches in multiple systems, such as a source system, data warehouse, data mart, the cloud, and reporting tools. This validation can use load metadata, as well as SQL Tasks, to gather and validate complex scenarios (data sources) when necessary. Since little to no validation is currently employed, it is highly recommended.

Checkpoints are SSIS's built-in method for providing package restartability. They are configured by providing a checkpoint location in the package-level property named CheckPointFileName. Two other properties also need to be configured, named CheckPointUsage and SaveCheckPoints. These properties are not defined in the solutions we evaluated. It is recommended that some restartability be designed and implemented.

Transaction Processing is a feature in SSIS; however, it requires that the Microsoft Distributed Transaction Coordinator (MSDTC) be enabled. This coordinator comes with overhead and is not always well received by the DBA team. Since SmithGroup JJR already uses SQL Tasks to call stored procedures that utilize the TSQL MERGE statement, we recommend implementing transaction processing not in SSIS, but at the SQL Server level.

Naming Conventions in SSIS may seem elementary, but good naming conventions can help with readability and maintenance, especially when introducing new developers to the ETL environment. A sample SSIS naming convention document has been provided.
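The SQL-level transaction handling recommended above can wrap the MERGE-based load procedures rather than relying on MSDTC. A sketch, with hypothetical procedure names:

```sql
-- Hypothetical load procedure: the transaction lives in T-SQL, not in SSIS/MSDTC.
CREATE PROCEDURE etl.LoadProjectWithTransaction
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- any runtime error rolls back the whole transaction

    BEGIN TRY
        BEGIN TRANSACTION;

        EXEC etl.MergeProject;            -- hypothetical MERGE wrappers
        EXEC etl.MergeProjectFinancials;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
        THROW;   -- surface the error to the calling SSIS SQL Task
    END CATCH;
END;
```

The SSIS SQL Task simply calls this procedure, so either all of the related MERGE statements commit or none do, without enabling MSDTC.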
Appendix A | Microsoft Data Warehouse On-Premises Architecture

Below is a diagram that illustrates an ideal business intelligence (BI) architecture. This example is intended to show the many pieces available in the Microsoft BI stack. The diagram includes source systems (structured data) that flow into a SQL Server data repository. Data repositories offload reporting workloads from the production transactional servers. The ELT process stages only the data required for that load. There are three paths for the data once it is staged. The first path is a quickly cleaned path intended for daily business reporting needs, called an Operational Data Store. Moving down the diagram, the second path is the fact pipeline, which begins to denormalize and prepare the fact data for transformation. The third path is the dimension pipeline, which goes through two other tools offered in the Microsoft BI stack. The first tool is Data Quality Services, which is used to cleanse the data. The second tool is Master Data Management, which provides what the industry calls "Golden Record Management." Golden record management gives you access to the most pure, validated, and complete picture of the individual records in your domain. Products like Profisee (https://profisee.com/grm) offer functionality beyond the tools offered out of the box with SQL Server. This extra "Golden Record" functionality includes matching, de-duplication, mastering, and record harmonization. Profisee also offers graphical user interfaces, scorecards, and reports.
Figure 7 | Microsoft Data Warehouse On-Premises Architecture. The diagram's side panels list Data Steward tasks (define business rules, manage master data, act as subject matter expert (SME), and serve as liaison between the business and the BI team), Business tasks (identify the business question, define staffing roles, data discovery, establish data stewards, agree on business rules, determine master data lists), and BI Team tasks (plan a prototype around the question, product licensing, examine existing infrastructure, determine eDW infrastructure, plan security / Kerberos, develop the eDW architecture).
Appendix B | Design Questions to Review

How is data from multiple sources consolidated? For example, when we model DataVault, we see 3 person tables: vision.Person, ultipro.Person, and dbo.Person. According to the DBAs, dbo.Person is a consolidated version of Person. This raises another question: what logic is used to consolidate the 2 different versions of Person? Is there a reference or lookup table?

What were the reasons, and what domain knowledge was used, to choose GUIDs rather than INTs for surrogate keys?

The nomenclature for the DataVault, DataMart, and DataLake databases is confusing. You may want to rethink these names so they do not conflict with the more general, industry-accepted meanings of those terms.