BI 101 Presentation and examples of some of my work. Background information on Business Intelligence; BI Tool and Vendor Analysis; Current/Upcoming technology we are exploring and hope to leverage in the near future
Bi Lunch And Learn Examples
1. BI/DW 101
Introduction to Business Intelligence at Guaranty Bank
Erik Okerholm, Business Intelligence
2. Agenda
• Business Intelligence Overview
• Data Flow, Data Availability/SLAs
• BI at Guaranty Bank
– Query/Report Examples
• Terminology and Concepts (Modeling, Dim/Fact)
• Current Environment
• BI Future
• Q&A
4. What is Business Intelligence?
“Business Intelligence is actually an environment in which business
users receive data that is reliable, consistent, understandable, easily
manipulated and timely. With this data, business users are able to
conduct analyses that yield overall understanding of where the business
has been, where it is now and where it will be in the near future.
Business Intelligence serves two main purposes:
1. It monitors the financial and operational health of the organization
(reports, alerts, alarms, analysis tools, key performance indicators
and dashboards).
2. It also regulates the operation of the organization providing two-
way integration with operational systems and information feedback
analysis.”
Source: DM Review
5. What is Business Intelligence?
The discipline of understanding the business abstractly
and often from a distance.
With business intelligence, you can see the forest and the trees
6. BI Reporting Areas
[Diagram: reporting areas served by the BI DW]
• Accounting
• Deposit
• Admin & Bank Ops
• Risk Ops
• Fraud
• Retail Bank
• Marketing
7. What Data is available?
• Deposit information
– IM/ST Account Snapshots
– IM/ST Transactions
– RM Customer Details (Customer Records, Airmiles, AMEX
Rewards, Account Relationships)
– RF (Card) Details
– Branch, Account Types, Sales & Service and VRU Activity
• General Ledger information
– Income & Expense
– Assets & Liabilities
– Responsibility/Cost Center and Structures
– Natural Accounts and Structures
8. The Data Mart contains both Daily and Monthly Data
Deposits
• Daily Data: IM/ST Account Snapshots; S&S, VRU Activity; Account “Events”
• Monthly Data: IM/ST Transactions; Onboarding; RM Customer Details; RF (Card) Detail
General Ledger
• RCs, Natural Account
• Income and Expense
• Assets and Liabilities
10. Business Intelligence Data Flow
[Diagram: Data Sources → Extract/Transform/Load (Informatica) → Data Mart Targets → Reports]
• Data Sources: Masterpiece (GL), Fidelity, Retail Systems, Lending Systems, Financial Systems, Other
• Extract/Transform/Load (Informatica): Data Profiling, Source System Analysis, Extraction, Transform, Cleanse, & Load
• Data Mart Targets (Data Warehouse RDBMS): GL, Deposits, Customer Profitability, MDB; future: Investments, Lending
• Reports: GL Reports, Deposits Reports, Ad Hoc Reports, Future Lending Reports
• Supporting components: Central Metadata System; Data Modeling Tool (ERWIN, Visio)
11. Data Availability – Service Level Agreements
• Customer Account Activity Data = 7am
• General Ledger Data = 8am
– Historically, over the last few months
• CP is ready by 5:30am and
• GL by 6:30am
12. What is… GB Data Warehouse? Hyperion? Business Intelligence tool? SQL Databases?
Each entry lists the GB Enterprise Application/Tool, then its GB Enterprise Data and Business Purpose.
• Hyperion HFM
– GB Enterprise Data: Hyperion Database – GL data at summary & RC level
– Business Purpose: Vendor application tailored for external reporting; also used for internal financial statement preparation
• Hyperion Planning
– GB Enterprise Data: Hyperion Planning Database – Budget & Planning data at summary & RC level
– Business Purpose: Vendor application tailored for budgeting and planning
• Hyperion Interactive Reporting (aka Business Intelligence/BI Tool)
– GB Enterprise Data: GB Data Warehouse (Retail Deposit Data Mart, General Ledger Data Mart)
– Business Purpose: Vendor tool to enable building of business cases, in-house applications, performing enterprise reporting, ad-hoc queries, what if & trend analysis
• Access or Excel
– GB Enterprise Data: “silo” SQL Databases
– Business Purpose: End user tools for sourcing disparate data sources, performing departmental reporting & analysis
13. GB SQL Data Flow
[Diagram: Data Sources → Disparate DBs & Load Processes → MS Access DBs & Departmental Processes → Departmental MS Access & Excel Report Preparation → Reports → End Users]
• Data Sources: Fidelity, Masterpiece (GL), Retail Systems, Lending Systems, Other
• SQL Databases: IM, ST, RM, RF, GL (I&E, A&L), ALS, CLCS, AP
• Departmental Access DBs & Reporting
• Reports: Deposit Reports, GL Reports, Lending Reports
14. Comparison: GB Data Warehouse vs. SQL Databases
• GB Data Warehouse (Retail Deposit Data Mart, GL Data Mart)
– Data Sources: IM, ST, RM, RF, OLB, VRU, Sales & Service; Masterpiece GL
– Data Acquisition: Automated & repeatable processes; built-in relationships for consumption of multiple data sources; application of standardized business rules
• SQL Disparate Databases
– Data Sources: IM, ST, RM, RF, GL, AP, ALS, CLCS
– Data Acquisition: Manual processes pulled into secondary, departmental Access databases for user manipulation, analysis & reporting; no relationships between data sources; application of non-standardized business rules
15. BI Customers & Content
Customers
• Marketing Intelligence
• Bank Operations (Deposits, Risk Ops)
• SIG (Retail Finance)
• IS&T Finance
• Financial Accounting & Reporting
Retail Deposit Data Mart (est. 2004)
Data: 5.5 yrs EOM / 13 Months Rolling Daily (ADS)
• Uses: Analytics & Program Development; Pricing; Reporting; Sales & Service Support; Consumer Checking Onboarding; Periodic Bank Ops reports; Ad-hoc query & analysis
• Content:
– IM/ST Individual Account Records (Daily)
– IM/ST Transactions (Summary)
– RM Customer Details (Customer Records, Account Relationships, Airmiles & AMEX Rewards)
– RF (Card) Details
– Account Types, Branches, Sales & Service and VRU Activity, Online Banking
Customer Profitability Data Mart (est. 2006)
Data: 5 yrs EOM Rolling
• Uses: Monthly P&L Reports and Variance Analysis
• Content:
– Income, Expense, Assets, Liabilities
– Detailed Transactions (vendor information)
– Responsibility/Cost Center Structures
– Natural Account Structures
16. BI Business Value Examples
Program Development
• Consumer Onboarding: Projected 5-yr cumulative impact - $6.6M; Projected IRR = 186%
• Product Management – Guaranty Checking: Reversed negative checking account trend; net increase in 2008 of ~13k accounts with value of $2M
• Check Card Utilization: Projected 5-yr cumulative impact - $1.8M; Projected ROI = 150%
• 4Q08 Deposit Gathering: Increase CD & liquid savings deposits by $1.5B
Analysis & Reporting
• Fee Income Analysis (NSF Tiers): “what if” analysis performed by Marketing in one day vs. estimated 6-8 weeks w/out BI
• Insider Reporting: Saving 15+ hours/quarter and 1 hr/month on report generation and export, submitted to Legal
• GL Reporting for Bank Operations: Saved 13 hours/month of manual effort on variance analysis
18. BI Terminology
• OLTP vs. Dimensional vs. OLAP
• Normalization vs. Denormalization
• Schemas, Star vs. Snowflake
• Dim vs. Fact Tables vs. Views (SCDs)
• Relationships (parent/child), Hierarchies
• Facts, Attributes
• Aggregates
• Conformed Dimensions
• Metadata
• Cube (Physical vs Virtual) , Cube Farms
• Object-Oriented
19. OLTP vs. OLAP
• OLTP (Online Transactional Processing)
– OLTP systems are optimized for fast and reliable transaction handling.
– Compared to data warehouse systems, most OLTP interactions will
involve a relatively small number of rows, but a larger group of tables.
– Data is more current
• OLAP (Online Analytical Processing)
– Dynamic, multidimensional analysis of historical data, which supports
activities such as the following:
• Calculating across dimensions and through hierarchies
• Analyzing trends
• Drilling up and down through hierarchies
• Rotating to change the dimensional orientation
• OLAP tools can run against a multidimensional database or interact
directly with a relational database.
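The contrast above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the txn table, its columns, and the sample rows are hypothetical and are not taken from the Guaranty Bank systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (txn_id INTEGER PRIMARY KEY, account_id INTEGER, "
    "branch TEXT, txn_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO txn VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, "Austin", "2008-01", 50.0),
        (2, 100, "Austin", "2008-02", 75.0),
        (3, 200, "Dallas", "2008-01", 20.0),
        (4, 200, "Dallas", "2008-02", 30.0),
    ],
)

# OLTP-style access: read or change one row identified by its key.
conn.execute("UPDATE txn SET amount = 80.0 WHERE txn_id = 2")
oltp_row = conn.execute("SELECT amount FROM txn WHERE txn_id = 2").fetchone()

# OLAP-style access: scan the history and aggregate along one dimension...
by_branch = conn.execute(
    "SELECT branch, SUM(amount) FROM txn GROUP BY branch ORDER BY branch"
).fetchall()
# ...then "rotate" to a different dimension over the same facts.
by_month = conn.execute(
    "SELECT txn_month, SUM(amount) FROM txn GROUP BY txn_month ORDER BY txn_month"
).fetchall()
```

The single-row update is typical of transaction handling, while the two GROUP BY queries read every row to summarize it, which is the workload a data mart is tuned for.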
20. Normalization
• Normalization is the process of efficiently organizing data
in a database.
• There are two goals of the normalization process:
1. Eliminating redundant data (for example, storing the same data in
more than one table) and
2. Ensuring data dependencies make sense (only storing related
data in a table).
• Both of these are worthy goals as they reduce the amount
of space a database consumes and ensure that data is
logically stored.
21. Normal Forms (NF)
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row
with a unique column or set of columns (the primary key).
Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data:
• Meet all the requirements of the first normal form.
• Remove subsets of data that apply to multiple rows of a table and place them
in separate tables.
• Create relationships between these new tables and their predecessors
through the use of foreign keys.
Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
• Meet all the requirements of the second normal form.
• Remove columns that are not dependent upon the primary key.
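As a rough illustration of these rules, the sketch below splits a flat list of account rows, in which customer details repeat, into two related tables. The customer and account tables and the sample data are hypothetical, chosen only to show the idea with Python's built-in sqlite3 module.

```python
import sqlite3

# Denormalized input: the customer's name and city repeat on every account row.
flat_rows = [
    (1001, "Checking", 1, "Ann Lee", "Austin"),
    (1002, "Savings", 1, "Ann Lee", "Austin"),
    (1003, "Checking", 2, "Bob Ray", "Dallas"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.execute(
    "CREATE TABLE account (account_id INTEGER PRIMARY KEY, account_type TEXT, "
    "customer_id INTEGER REFERENCES customer(customer_id))"
)

for account_id, account_type, customer_id, name, city in flat_rows:
    # Store each customer once; a repeated customer_id is ignored.
    conn.execute(
        "INSERT OR IGNORE INTO customer VALUES (?, ?, ?)", (customer_id, name, city)
    )
    conn.execute(
        "INSERT INTO account VALUES (?, ?, ?)", (account_id, account_type, customer_id)
    )

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# A join over the foreign key rebuilds the original flat rows on demand.
rebuilt = conn.execute(
    "SELECT a.account_id, a.account_type, c.customer_id, c.name, c.city "
    "FROM account a JOIN customer c ON c.customer_id = a.customer_id "
    "ORDER BY a.account_id"
).fetchall()
```

Three input rows produce only two customer rows, and the join shows that no information was lost by removing the redundancy.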
22. Third Normal Form (3NF)
Third Normal Form (3NF):
• 3NF schemas are typically chosen for large data warehouses, especially
environments with significant data-loading requirements that are used to feed
data marts and execute long-running queries.
"Nothing but the key"
A memorable summary of E. F. Codd's definition of 3NF, paralleling the traditional
pledge to give true evidence in a court of law, was given by Bill Kent:
Every non-key attribute "must provide a fact about the key, the whole key, and
nothing but the key, so help me Codd."
23. Schema Designs - Star
The star schema is perhaps the simplest data warehouse schema. It is called a star schema because the
entity-relationship diagram of this schema resembles a star, with points radiating from a central table. The center
of the star consists of a large fact table and the points of the star are the dimension tables.
A star schema is characterized by one or more very large fact tables that contain the primary information in the
data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains
information about the entries for a particular attribute in the fact table.
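A minimal star schema might look like the sketch below, again using Python's built-in sqlite3 module. The table names (branch_dim, product_dim, balance_fact), the columns, and the data are hypothetical and do not describe the actual data mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
-- Central fact table: one measure plus a foreign key to each dimension.
CREATE TABLE balance_fact (
    branch_key INTEGER REFERENCES branch_dim(branch_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    month TEXT,
    balance REAL
);
INSERT INTO branch_dim VALUES (1, 'Austin Main', 'Central'), (2, 'Dallas North', 'North');
INSERT INTO product_dim VALUES (1, 'Checking'), (2, 'Savings');
INSERT INTO balance_fact VALUES
    (1, 1, '2008-12', 100.0), (1, 2, '2008-12', 250.0),
    (2, 1, '2008-12', 300.0), (2, 2, '2008-12', 50.0);
""")

# Group by dimension attributes and sum the measure held in the fact table.
totals = conn.execute("""
    SELECT b.region, p.product_name, SUM(f.balance)
    FROM balance_fact f
    JOIN branch_dim b ON b.branch_key = f.branch_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY b.region, p.product_name
    ORDER BY b.region, p.product_name
""").fetchall()
```

Every query follows the same shape: start at the fact table, join outward to whichever dimensions are needed, and aggregate.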
24. Schema Designs - Snowflake
The snowflake schema is a variation of the star schema, featuring normalization of dimension tables.
A snowflake schema is a logical arrangement of tables in a relational database such that the entity relationship diagram resembles a
snowflake in shape. Closely related to the star schema, the snowflake schema is represented by centralized fact tables which are
connected to multiple dimensions. In the snowflake schema, however, dimensions are normalized into multiple related tables whereas the
star schema's dimensions are denormalized with each dimension being represented by a single table. When the dimensions of a
snowflake schema are elaborate, having multiple levels of relationships, and where child tables have multiple parent tables ("forks in the
road"), a complex snowflake shape starts to emerge. The "snowflaking" effect only affects the dimension tables and not the fact tables.
25. Dimensional Tables (SCDs)
In data warehousing, a dimension table is one of the set of companion tables to a fact table.
The fact table contains business facts or measures and foreign keys which refer to candidate
keys (normally primary keys) in the dimension tables.
The dimension tables contain attributes (or fields) used to constrain and group (“slice and dice”)
data when performing data warehousing queries. Typically, dimension tables are named with a
“_dim” suffix.
Over time, the attributes of a given row in a dimension table may change. For example, the
shipping address for a company may change. Kimball refers to this phenomenon as Slowly
Changing Dimensions (SCD). Strategies for dealing with this kind of change are divided into
three categories:
Type 1 - Simply overwrite the old value(s).
Type 2 - Add a new row containing the new value(s), and distinguish the new row from the
old one (for example, with effective dates or a current-row flag).
Type 3 - Add a new attribute to the existing row to hold the previous value.
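The Type 2 strategy is the least obvious of the three, so here is one possible sketch of it using Python's built-in sqlite3 module. The customer_dim layout (valid_from, valid_to, is_current) and the helper apply_type2_change are assumptions made for illustration; real implementations vary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id INTEGER,                             -- business key
    address TEXT,
    valid_from TEXT,
    valid_to TEXT,                                   -- NULL while the row is current
    is_current INTEGER)""")


def apply_type2_change(conn, customer_id, new_address, change_date):
    """Close the current row (if any), then add a new current row."""
    conn.execute(
        "UPDATE customer_dim SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO customer_dim "
        "(customer_id, address, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_address, change_date),
    )


apply_type2_change(conn, 42, "1 Elm St", "2007-01-01")
apply_type2_change(conn, 42, "9 Oak Ave", "2008-06-15")

history = conn.execute(
    "SELECT address, valid_from, valid_to, is_current FROM customer_dim "
    "WHERE customer_id = 42 ORDER BY customer_key"
).fetchall()
```

Both addresses survive, so facts recorded before the move can still be reported against the old address. A Type 1 change would instead be a plain UPDATE of the address column.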
26. Fact Tables
• A table in a star schema that contains facts. A fact table typically has
two types of columns:
1. those that contain facts and
2. those that are foreign keys to dimension tables.
• The primary key of a fact table is usually a composite key that is made
up of all of its foreign keys.
• A fact table might contain either detail level facts or facts that have
been aggregated (fact tables that contain aggregated facts are often
instead called summary tables). A fact table usually contains facts with
the same level of aggregation.
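The points above can be sketched as follows with Python's built-in sqlite3 module. The daily_activity_fact and daily_branch_summary tables are hypothetical, and the dimension tables that the keys would reference are omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_activity_fact (
    date_key INTEGER,
    account_key INTEGER,
    branch_key INTEGER,
    amount REAL,                                     -- the fact (measure)
    PRIMARY KEY (date_key, account_key, branch_key)  -- composite of the foreign keys
);
INSERT INTO daily_activity_fact VALUES
    (20081201, 1, 10, 40.0),
    (20081201, 2, 10, 60.0),
    (20081202, 1, 10, 25.0);
-- Summary table: the same facts aggregated to the branch/day level.
CREATE TABLE daily_branch_summary AS
    SELECT date_key, branch_key, SUM(amount) AS total_amount, COUNT(*) AS row_count
    FROM daily_activity_fact
    GROUP BY date_key, branch_key;
""")

summary = conn.execute(
    "SELECT * FROM daily_branch_summary ORDER BY date_key"
).fetchall()

# The composite key rejects a second row for the same date/account/branch.
try:
    conn.execute("INSERT INTO daily_activity_fact VALUES (20081201, 1, 10, 99.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Keeping the detail rows and the summary in separate tables means each table holds facts at a single level of aggregation.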
27. Views – The “Other” Database Object
• In database theory, a view consists of a stored query accessible as a virtual
table composed of the result set of a query. Unlike ordinary tables (base
tables) in a relational database, a view does not form part of the physical
schema: it is a dynamic, virtual table computed or collated from data in the
database. Changing the data in a table alters the data shown in subsequent
invocations of the view.
• Views can provide advantages over tables:
– Views can represent a subset of the data contained in a table
– Views can join and simplify multiple tables into a single virtual table
– Views can act as aggregated tables, where the database engine aggregates
data (sum, average etc) and presents the calculated results as part of the data
– Views can hide the complexity of data; for example a view could appear as
Sales2000 or Sales2001, transparently partitioning the actual underlying table
– Views take very little space to store; the database contains only the definition
of a view, not a copy of all the data it presents
– Depending on the SQL engine used, views can provide extra security
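A short sketch of two of these uses, a subset view and an aggregating view, using Python's built-in sqlite3 module. The sales table and its data are hypothetical; the view name Sales2000 follows the example in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_year INTEGER, amount REAL);
INSERT INTO sales VALUES (1, 2000, 10.0), (2, 2000, 15.0), (3, 2001, 7.0);
-- A view that exposes a subset of the base table.
CREATE VIEW Sales2000 AS
    SELECT sale_id, amount FROM sales WHERE sale_year = 2000;
-- A view that acts as an aggregated table.
CREATE VIEW sales_by_year AS
    SELECT sale_year, SUM(amount) AS total FROM sales GROUP BY sale_year;
""")

before = conn.execute(
    "SELECT total FROM sales_by_year WHERE sale_year = 2000"
).fetchone()[0]

# Changing the base table changes what the views return on their next invocation.
conn.execute("INSERT INTO sales VALUES (4, 2000, 5.0)")

after = conn.execute(
    "SELECT total FROM sales_by_year WHERE sale_year = 2000"
).fetchone()[0]
subset_ids = [row[0] for row in conn.execute("SELECT sale_id FROM Sales2000 ORDER BY sale_id")]
```

Only the view definitions are stored; the totals are recomputed from the sales table each time a view is queried.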
28. Hierarchies and M:1 Relationships
Hierarchies
• A hierarchy is a set of levels having many-to-one relationships between each other, and
the set of levels collectively makes up a dimension. In a relational database, the
different levels of a hierarchy can be stored in a single table (as in a star schema) or in
separate tables (as in a snowflake schema).
Many-to-one relationships
• A many-to-one relationship is where one entity (typically a column or set of columns)
contains values that refer to another entity (a column or set of columns) that has unique
values. In relational databases, these many-to-one relationships are often enforced by
foreign key/primary key relationships, and the relationships typically are between fact
and dimension tables and between levels in a hierarchy. The relationship is often used
to describe classifications or groupings.
• For example, in a geography schema having tables Region, State and City, there are
many states that are in a given region, but no states are in two regions. Similarly for
cities, a city is in only one state (cities that have the same name but are in more than
one state must be handled slightly differently). The key point is that each city exists in
exactly one state, but a state may have many cities, hence the term "many-to-one."
Region State City
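The Region, State and City example can be sketched with one table per level (the snowflake-style layout), using Python's built-in sqlite3 module. The column names and sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE state (state_id INTEGER PRIMARY KEY, state_name TEXT,
                    region_id INTEGER NOT NULL REFERENCES region(region_id));
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT,
                   state_id INTEGER NOT NULL REFERENCES state(state_id));
INSERT INTO region VALUES (1, 'Midwest'), (2, 'West');
INSERT INTO state VALUES (1, 'Illinois', 1), (2, 'Missouri', 1), (3, 'California', 2);
-- Two cities share a name; distinct city_id values keep them apart.
INSERT INTO city VALUES (1, 'Chicago', 1), (2, 'Springfield', 1),
                        (3, 'Springfield', 2), (4, 'Fresno', 3);
""")

# Roll up the hierarchy by following the many-to-one links from city to region.
cities_per_region = conn.execute("""
    SELECT r.region_name, COUNT(*)
    FROM city c
    JOIN state s ON s.state_id = c.state_id
    JOIN region r ON r.region_id = s.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()

# The foreign key rejects a city that points at a state that does not exist.
try:
    conn.execute("INSERT INTO city VALUES (5, 'Nowhere', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Giving each city its own key, rather than keying on the name, is one way to handle same-named cities in different states.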
29. Cube Farms
[Diagram: BI Cube Farms (cubes for each application, cubes for varying levels of security, cubes for increasing data depth) vs. Intelligent Cubes; label: Relational Database]
BI Cube Farms
• Fragmented Management
• Data Latency
• Dedicated Building Process
• Manual to Push to Users
• Limited Data Size
• Manual Security Coding
Intelligent Cubes
• Centralized Management
• Automatic Data Refresh
• No Separate Building Process
• On Demand Loading
• Full and Immediate Data Access
• Full Integrated Security
37. “Getting Data Into The Warehouse”
• We use the Informatica PowerCenter Suite for ETL
(Extraction, Transformation, and Loading)
• Extremely powerful yet GUI-based ETL tool
• Industry leader for data integration
• Potential future leverage of this toolset
– Data Profiling and Cleansing
– Data Matching and Lineage
– EAI (Enterprise Application Integration)
– MDM (Master Data Management)
38. Data Flows via Informatica
Source/Target Types:
• Database and/or table
• Flat file (csv, txt)
• Spreadsheet
• PDF
Transformations:
• Expressions
• Aggregators
• Filters
• Joiners
• Lookups
• Routers
• Unions
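To give a feel for what several of these transformations do, here is a plain-Python sketch of an expression, filter, lookup, router and aggregator applied to a small flat file. It does not use or imitate the Informatica API; the CSV content, the branch_lookup table and all names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat-file source (CSV) and lookup table.
source_csv = (
    "account_id,branch_code,amount\n"
    "1,AUS,100.50\n"
    "2,DAL,-5.00\n"
    "3,AUS,20.00\n"
    "4,XXX,7.25\n"
)
branch_lookup = {"AUS": "Austin", "DAL": "Dallas"}

rows = list(csv.DictReader(io.StringIO(source_csv)))

# Expression: derive a typed value from the raw text column.
for row in rows:
    row["amount"] = float(row["amount"])

# Filter: drop rows that fail a condition.
rows = [row for row in rows if row["amount"] > 0]

# Lookup + router: matched rows go to the main target, unmatched rows to a reject target.
loaded, rejected = [], []
for row in rows:
    branch_name = branch_lookup.get(row["branch_code"])
    if branch_name is None:
        rejected.append(row)
    else:
        loaded.append({**row, "branch_name": branch_name})

# Aggregator: total amount per branch.
totals = defaultdict(float)
for row in loaded:
    totals[row["branch_name"]] += row["amount"]
totals = dict(totals)
```

In a GUI-based ETL tool each of these steps would be a box on a mapping canvas rather than a block of code, but the data flow is the same.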
50. New BI Tool RFI
(Completed Fall 2008)
• Over 230 hours were spent on an extensive analysis of business, reporting, user, and
administrative support requirements across our most technical business unit, Marketing.
• We
– Participated in numerous vendor/analysis calls with Gartner
– Purchased third-party and vendor analyses
– Requested information, a completed comprehensive questionnaire
(some 100+ questions), and product quotes from BI vendors
– Independently and internally scored their responses
– Reviewed our recommendation, and the reasons for it, with the Business
51. Summary
• Top two vendors based on market data & Gartner calls:
1. MicroStrategy (MSTR)
2. Oracle (OBIEE)
• Both MSTR & Oracle are offering discount pricing
• MSTR fulfills all business and IT requirements and is noted for
requiring few IT support personnel
• Gartner comments on MSTR:
– Fewest weaknesses
– Elegant
– Strong performance
– Scalable
– PMML support
– Easy IT maintenance
– No main functionality lacking
– Excellent dashboards
– $ only downside (historically)
52. MicroStrategy is the Best Overall BI Technology According to the Most Recent Analyst Evaluations and Customer Surveys
• Customer Survey: Magic Quadrant Customer Survey (James Richardson; March 2008; 367 Companies; 12 Core BI Capabilities)
– #1 MicroStrategy, #2 QlikTech, #3 Oracle, #4 Cognos, #5 Board, #6 SAS, #7 Microsoft, #8 Applix, #9 Bus. Obj., #10 IBI, #11 Spotfire
• Customer Survey (Nigel Pendse; Feb 2008; 1,901 Companies; 58 Countries; 17 Major Categories)
– #1 MicroStrategy, #2 Applix, #3 IBI, #4 Microsoft AS, #5 Hyperion, #6 Microsoft RS, #7 Cognos AS, #8 Cognos RS, #9 Bus. Obj., #10 SAP, #11 B.O. Crystal
• Analyst Evaluation & Customer Survey (Daan Van Beek, Norman Manley; Nov 2007; 70 Evaluation Criteria)
– #1 MicroStrategy, #2 IBI, #3 Oracle, #4 SAS, #5 Hyperion, #6 Microsoft, #7 Cognos, #8 Bizzcore, #9 Bus. Obj., #10 SAP, #11 Actuate
• Analyst Evaluation: BI Platform Capabilities Rating (Kurt Schlegel, Bhavish Sood; April 2007; 12 BI Capabilities; 220 Distinct Criteria)
– #1 MicroStrategy, #1 tie Oracle, #1 tie IBI, #4 Cognos, #5 SAP, #6 Hyperion, #7 Bus Obj., #8 QlikTech, #8 Panorama, #10 Microsoft, #11 SAS
• Analyst Evaluation (Cindi Howson; May 2008; Hands-on testing; 100+ Criteria Tested)
– #1 MicroStrategy, #2 Bus. Obj., #3 tie Cognos, #3 tie SAS, #3 tie Oracle, #6 IBI, #6 tie Microsoft, #8 QlikTech
53. Gartner Magic Quadrant Customer
Survey: Survey of BI Customers in Support of the Gartner Magic Quadrant
Analysis for BI Platforms
54. BI Survey 7: BI Technology Rankings According to the BI
Survey 7
The Largest Independent Survey of BI, Involving Over 1,900 Companies
55. BI Product Survey: Evaluation and Survey Conducted by
Passioned International, a Leading BI Analyst Firm in the
Netherlands
56. Gartner BI Platform Capability Evaluation:
Comprehensive, Point-by-point Evaluation of all Major BI Products
57. The BI Scorecard: Comprehensive Hands-on Evaluation
of BI Products by Cindi Howson, Author, Industry Analyst, and
President of ASK
58. What the future brings…
• …and where we want to go with BI.
71. MicroStrategy Abstracts the Business Model From the
Physical Model Using a Layered Object-Oriented Metadata
APPLICATION CONFIGURATION
Define application-wide settings
User Administration, Security, Performance
REPORT DESIGN
Assemble insightful, visually appealing reports
Layout, Format, Calculations
BUSINESS ABSTRACTION
Build reusable report components
Metrics, Filters, Prompts, Templates, Custom Groupings
DATA ABSTRACTION
Insulate business constructs from data sources
Tables, Attributes, Facts, Hierarchies, Transformations
DATA SOURCES
Access all corporate data sources
Schema neutrality, Database Optimizations
72. WYSIWYG Report Design Makes it Possible for
Business Users to Refine Report Designs Using
Common Microsoft Office-like Skills
83. #2) User Scalability Without The Staffing or
Cost Burden
Total Cost of Ownership (TCO) assesses costs over the lifecycle of an application. Industry analysts
agree that TCO is dominated by recurring costs and not by one-time purchase costs.
[Chart: 3 Year Typical Enterprise Software TCO Breakdown. Callout: Staffing is Largest TCO Component in BI Applications.]
Note: The figure is based on over 300 interviews conducted across numerous platforms, presented in composite form. Source: IDC Study 2007
Gartner, a leading analyst firm, estimates that customers spend up to four times the initial cost
of their software license every year they own their BI applications. The vast majority of
these recurring costs are personnel or staffing costs.
IDC, another leading research firm, concludes that staffing constitutes 60%-85%
of the overall enterprise software ownership costs over three years.
84. MicroStrategy Customer Data Shows Reduced Staffing Costs
[Chart: IT Resource Efficiency. Y-axis: IT Staff (60% of TCO), 0 to 50. X-axis: User Population, 300 to 40,000. Labeled series: Other BI.]
**Note: MicroStrategy 8 based on results of MicroStrategy customer research study of over 80 production deployments.
Other BI based on competitive sales cycle feedback.
85. As BI Systems Expand, Administration Becomes a Key Driver in the Total Cost of Ownership
[Diagram: a small BI system compared with an expanded one]
• Small system: 25 Users; 100 Reports; 1 BI Application (Finance); 1 Data Source (DWH); 1 Full Time Administrator
• Expanded system: 1,000 Users; 1,000 Reports; Many BI Applications (Finance, Marketing, Sales, HR); Many Data Sources (DWH, Operational Databases, Cube Databases); 1 Full Time Administrator
86. A Complete and Multilayered Metadata Effectively Minimizes the Number of Moving Parts Administrators Need to Create and Maintain
Consider Minor Changes* to a BI System with 1,000 Reports:
• No Atomic Elements
– Report Creation: 1,000 Reports = 1,000 SQL Statements
– Reusability: No Reusability
– Parameterization: No Parameterization
– Maintenance Overhead: 1,000 SQL Statements; 1,000 man hrs at 1 hr per report
• Partial Set of Atomic Elements
– Report Creation: 1,000 Reports = 200 Metadata Objects**
– Reusability: Limited Reusability
– Parameterization: Limited Parameterization
– Maintenance Overhead: 200 Objects***; 100 man hrs at 0.5 hr per object
• Complete Set of Atomic Elements
– Report Creation: 1,000 Reports = 20 Metadata Objects**
– Reusability: Full Reusability
– Parameterization: Full Parameterization
– Maintenance Overhead: 20 Objects***; 10 man hrs at 0.5 hr per object
Assumptions:
* Minor changes include changes to calculations, levels of aggregation, attributes,
number of columns, and filtering criteria
** Reports are created with underlying MD objects
*** Assumes changes to metadata objects will automatically cascade to reports
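The maintenance figures on this slide are simple products of object count and hours per object. The short sketch below reproduces that arithmetic; the function name maintenance_hours is just an illustrative helper.

```python
def maintenance_hours(object_count, hours_per_object):
    """Effort to apply a minor change to every object that must be edited."""
    return object_count * hours_per_object


scenarios = {
    "no atomic elements": maintenance_hours(1000, 1.0),       # 1,000 SQL statements
    "partial atomic elements": maintenance_hours(200, 0.5),   # 200 metadata objects
    "complete atomic elements": maintenance_hours(20, 0.5),   # 20 metadata objects
}

# How many times more effort the SQL-per-report approach takes than full reuse.
savings_ratio = scenarios["no atomic elements"] / scenarios["complete atomic elements"]
```

Under the slide's assumptions, the fully reusable metadata needs one hundredth of the effort of hand-maintained SQL.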
87. Automatic Monitoring Helps Reduce HW and
Downtime Costs
22% of TCO (Source: IDC Study 2007)
1. Performance Analysis (Minimize HW Costs):
• Fine tune BI system for maximum performance
• Optimize HW utilization
• Track User Activity
2. Operational Analysis (Minimize Downtime):
• Monitor daily trends
• Reduce unplanned system downtime
• Predict future capacity requirements
88. # 2) Highest User Scalability
Reports Can Be Delivered Through Users’ Interface of Choice
94. Semantic-Based User Profiles Enable Fine-Tuned Control
The security architecture gives administrators fine-grained control of every
user along three dimensions of privileges and permissions, allowing each user to access just
the functionality their skills can accommodate and just the data they are allowed to see.
95. #5) Highest Data Scalability
Heterogeneous Database Access via MicroStrategy ROLAP Architecture