20IT501_DWDM_PPT_Unit_I.ppt
1. 20IT501 – Data Warehousing and
Data Mining
III Year / V Semester
2. OBJECTIVES
• The student should be able to:
Be familiar with the concepts of data warehouse
Know the fundamentals of data mining
Understand the importance of association rule mining
Understand the techniques of classification and
clustering
Be aware of the recent trends of data mining
3. UNIT I- DATA WAREHOUSING
Data Warehouse – Data warehousing Components – Building a
Data warehouse – Mapping the Data Warehouse to a
Multiprocessor Architecture – DBMS Schemas for Decision
Support. Business Analysis: Reporting and Query tools and
Applications - Online Analytical Processing (OLAP) – Need –
Multidimensional Data Model – OLAP Guidelines -
Categories of OLAP Tools.
4. Data Warehouse
A data warehouse is a subject-oriented, integrated,
time-variant and non-volatile collection of data
in support of management's decision-making
process.
5. Data Warehouse
Subject Oriented:
It provides information organized around a subject rather
than the organization's ongoing operations.
These subjects can be product, customers, suppliers, sales,
revenue, etc.
A data warehouse does not focus on the ongoing
operations, rather it focuses on modelling and analysis of
data for decision making.
6. Data Warehouse
Integrated:
A data warehouse is constructed by integrating
data from heterogeneous sources such as relational
databases, flat files, etc.
This integration enhances the effective analysis of
data.
7. Data Warehouse
Time Variant:
The data collected in a data warehouse is
identified with a particular time period.
The data in a data warehouse provides information
from the historical point of view.
8. Data Warehouse
Non-volatile
Non-volatile means the previous data is not erased
when new data is added to it.
A data warehouse is kept separate from the operational
database and therefore frequent changes in operational
database is not reflected in the data warehouse.
9. Data Warehouse
It is a database designed for analytical tasks, using data
from multiple applications.
It supports a relatively small number of users with
relatively long interactions.
Its content is periodically updated.
It contains current and historical data to provide a
historical perspective of information.
10. Data Warehouse - Terminologies
Current detail data: It is organized along
subject lines(customer profile data, sales data)
Data Mart: Summarized departmental data
and is customized to suit the needs of a
particular department that owns the data.
11. Data Warehouse - Terminologies
Drill-down: To perform business analysis in a top-
down fashion.
Metadata: Data about data. It contains the location and
description of warehouse system components; names,
definition, structure and content of the data warehouse
and end-users views.
12. Data Warehouse
Operational Data vs. Informational Data
Content: current values vs. summarized, derived data
Organization: by application vs. by subject
Stability: dynamic vs. static until refreshed
Data structure: optimized for transactions vs. optimized for complex queries
Access: high vs. medium to low
Response time: sub-second (<1 s) to 2-3 s vs. several seconds to minutes
14. Data Warehouse Components
Data Warehouse Database:
It is based on a relational database management system server
that functions as the central repository for informational data.
This repository is surrounded by a number of key components
designed to make the entire environment functional, manageable and
accessible both by the operational systems that source data into
the warehouse and by end-user query and analysis tools.
15. Data Warehouse Components
Sourcing, Acquisition, Clean up, and
Transformation Tools:
They perform conversions, summarization, key
changes, structural changes and condensation.
The data transformation is required so that the
information can be used by decision support tools.
16. Data Warehouse Components
Sourcing, Acquisition, Clean up, and Transformation
Tools – Functionalities:
To remove unwanted data from operational db
Converting to common data names and attributes
Calculating summaries and derived data
Establishing defaults for missing data
Accommodating source data definition changes
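The functionalities above can be sketched as a small transformation routine. This is a minimal illustration in Python; the record fields ("cust_name", "amount", "region") and default values are assumptions, not those of any particular tool:

```python
# Minimal sketch of source-data cleanup and transformation.
# Field names and defaults are illustrative assumptions.

def transform(records):
    """Clean operational records for loading into the warehouse."""
    cleaned = []
    for rec in records:
        # Remove unwanted data: skip records flagged as internal/test.
        if rec.get("test_flag"):
            continue
        # Convert to common data names and attributes.
        out = {
            "customer": rec.get("cust_name") or rec.get("customer_name"),
            "amount": float(rec.get("amount", 0)),
        }
        # Establish a default for missing data.
        out["region"] = rec.get("region", "UNKNOWN")
        cleaned.append(out)
    # Calculate summaries and derived data.
    total = sum(r["amount"] for r in cleaned)
    return cleaned, total
```

For example, `transform([{"cust_name": "A", "amount": "10.5"}])` returns one cleaned row with the amount converted to a number and a default region filled in.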
17. Data Warehouse Components
Sourcing, Acquisition, Clean up, and
Transformation Tools – Issues:
Data heterogeneity: It refers to the different ways
data is defined and used in different modules.
19. Data Warehouse Components
Meta data:
It is data about data. It is used for maintaining,
managing and using the data warehouse.
Types:
Technical Meta data
Business Meta data
20. Data Warehouse Components
Meta data - Technical Meta data:
Information about data stores
Transformation descriptions
The rules used to perform clean up, and data enhancement
Data mapping operations
Access authorization, backup history, archive history, info
delivery history, data acquisition history, data access etc.,
21. Data Warehouse Components
Meta data – Business Meta data:
It gives users a business-oriented view of the
information stored in the data warehouse.
Subject areas, and info object type including queries,
reports, images, video, audio clips etc.
Internet home pages
Information related to info delivery system
22. Data Warehouse Components
Meta data:
Meta data helps the users to understand content and find
the data.
Meta data is stored in a separate data store known
as the information directory or Meta data repository.
It helps to integrate, maintain and view the contents of the
data warehouse
23. Data Warehouse Components
Meta data – Characteristics:
It is the gateway to the data warehouse environment
It supports easy distribution and replication of content for high
performance and availability
It should be searchable by business oriented key words
It should support the sharing of information
It should support and provide interface to other applications
24. Data Warehouse Components
Access Tools:
Data query and reporting tools
Application development tools
Executive info system tools (EIS)
OLAP tools
Data mining tools
25. Data Warehouse Components
Data Marts:
Departmental subsets that focus on selected subjects.
They are independently used by a dedicated user group.
They are used for rapid delivery of enhanced decision
support functionality to end users.
26. Data Warehouse Components
Data warehouse admin and management:
Security and priority management
Monitoring updates from multiple sources
Data quality checks
Managing and updating meta data
Auditing and reporting data warehouse usage and status
Replicating, sub setting and distributing data
Backup and recovery
27. Data Warehouse Components
Information delivery system
It is used to enable the process of subscribing for
data warehouse info.
Delivery to one or more destinations according to
specified scheduling algorithm
28. Building a Data Warehouse -
Reason
Business perspective:
Decisions need to be made quickly and correctly,
using all available data.
Users are business domain expert, not computer
professionals
Competition is heating up in the areas of business
intelligence and added information value
29. Building a Data Warehouse -
Reason
Technological perspective:
The price of computer processing speed continues
to decline.
The price of digital storage is rapidly dropping.
Network bandwidth is increasing.
30. Building a Data Warehouse
Approaches
Top - Down Approach
Bottom - Up Approach
31. Building a Data Warehouse
Approaches: Top - Down Approach
To build a centralized repository to house corporate
wide business data.
This repository is called Enterprise Data Warehouse
(EDW).
The data in the EDW is stored in a normalized form in
order to avoid redundancy.
32. Building a Data Warehouse
Approaches: Top - Down Approach
The data in the EDW is stored at the most detail level.
The reason to build the EDW on the most detail level
is to leverage
Flexibility to be used by multiple departments.
Flexibility to cater for future requirements.
34. Building a Data Warehouse
Approaches: Top - Down Approach
We should implement the top-down approach when
The business has complete clarity on all or multiple subject
areas data warehouse requirements.
The business is ready to invest considerable time and money.
35. Building a Data Warehouse
Approaches: Top - Down Approach –
Advantages:
This is very important for the data to be reliable,
Consistent across subject areas
36. Building a Data Warehouse
Approaches: Top - Down Approach –
Disadvantages:
It requires more time and initial investment
The business has to wait for the EDW to be
implemented followed by building the data marts
before which they can access their reports
37. Building a Data Warehouse
Approaches: Bottom Up Approach
It is an incremental approach to build a data
warehouse.
Here we build the data marts separately at
different points of time as and when the specific
subject area requirements are clear.
38. Building a Data Warehouse
Approaches: Bottom Up Approach
The data marts are integrated or combined
together to form a data warehouse.
Separate data marts are combined through the use
of conformed dimensions and conformed facts.
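A minimal sketch of how a conformed dimension lets separate marts be combined; the table contents and field names here are illustrative assumptions:

```python
# Sketch: two departmental data marts (sales, shipping) combined
# through a conformed date dimension. Data is illustrative.

date_dim = {1: "2024-01", 2: "2024-02"}            # conformed dimension, shared keys

sales_facts = [{"date_key": 1, "revenue": 100},
               {"date_key": 2, "revenue": 150}]     # sales mart fact rows

shipping_facts = [{"date_key": 1, "shipments": 4},
                  {"date_key": 2, "shipments": 6}]  # shipping mart fact rows

def combine(dates, sales, shipping):
    """Join both marts on the shared dimension key (a drill-across)."""
    merged = {}
    for row in sales:
        merged.setdefault(dates[row["date_key"]], {})["revenue"] = row["revenue"]
    for row in shipping:
        merged.setdefault(dates[row["date_key"]], {})["shipments"] = row["shipments"]
    return merged
```

Because both marts use the same date keys, their facts line up per month without any re-mapping; that alignment is exactly what "conformed" buys.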
39. Building a Data Warehouse
Approaches: Bottom Up Approach
We should implement the bottom up approach
when
We have initial cost and time constraints.
The complete warehouse requirements are not clear. We
have clarity to only one data mart.
40. Building a Data Warehouse
Approaches: Bottom Up Approach -
Advantages
They do not require high initial costs and have a
faster implementation time;
Hence the business can start using the marts much
earlier as compared to the top-down approach.
41. Building a Data Warehouse
Approaches: Bottom Up Approach -
Disadvantages
It stores data in denormalized form, hence
there would be high space usage for detailed data.
There is a tendency not to keep detailed data.
42. Building a Data Warehouse
Design considerations: Most successful data warehouses
that meet these requirements have these common
characteristics:
Are based on a dimensional model
Contain historical and current data
Include both detailed and summarized data
Consolidate disparate data from multiple sources while retaining
consistency
43. Building a Data Warehouse
Design considerations: Data warehouse is
difficult to build due to the following reason:
Heterogeneity of data sources
Use of historical data
Growing nature of data base
44. Building a Data Warehouse
Data content
The content and structure of the data warehouse are reflected in
its data model.
The data model is the template that describes how information
will be organized within the integrated warehouse framework.
The data warehouse data must be detailed data. It must be
formatted, cleaned up and transformed to fit the warehouse data
model.
45. Building a Data Warehouse
Meta data
It defines the location and contents of data in the
warehouse.
Meta data is searchable by users to find definitions or
subject areas.
In other words, it must provide decision support oriented
pointers to warehouse data and thus provides a logical link
between warehouse data and decision support applications.
46. Building a Data Warehouse
Data Distribution
The data can be distributed based on the subject
area, location (geographical region), or time
(current, month, year).
47. Building a Data Warehouse
Tools
These tools provide facilities for defining the
transformation and cleanup rules, data movement,
end user query, reporting and data analysis.
48. Building a Data Warehouse
Design steps
Choosing the subject matter
Deciding what a fact table represents
Identifying and conforming the dimensions
Choosing the facts
Storing pre calculations in the fact table
49. Building a Data Warehouse
Design steps
Rounding out the dimension table
Choosing the duration of the db
The need to track slowly changing dimensions
Deciding the query priorities and query models
50. Building a Data Warehouse
Technical Constraints – Issues:
The hardware platform that would house the data warehouse
The dbms that supports the warehouse data
The communication infrastructure that connects data marts,
operational systems and end users
The hardware and software to support meta data repository
The systems management framework that enables admin of the
entire environment
51. Building a Data Warehouse
Implementation considerations
Collect and analyze business requirements
Create a data model and a physical design
Define data sources
Choose the database Technology and platform
Extract the data from operational database, transform
it, clean it up and load it into the warehouse
52. Building a Data Warehouse
Implementation considerations
Choose database access and reporting tools
Choose database connectivity software
Choose data analysis and presentation s/w
Update the data warehouse Access tools
53. Building a Data Warehouse
Access Tools
Simple tabular form data
Ranking data
Multivariable data
Time series data
Graphing, charting and pivoting data
54. Building a Data Warehouse
Access Tools
Statistical analysis data
Predefined repeatable queries
Ad hoc user specified queries
Reporting and analysis data
55. Building a Data Warehouse
Data extraction, clean up, transformation and
migration – Selection Criteria:
Timeliness of data delivery to the warehouse
The tool must have the ability to identify the
particular data so that it can be read by the conversion tool
The tool must have the capability to merge data from
multiple data stores.
56. Building a Data Warehouse
Data extraction, clean up, transformation and
migration – Selection Criteria:
The tool should have the ability to read data from data
dictionary
The code generated by the tool should be completely
maintainable
The tool should permit the user to extract the required data
57. Building a Data Warehouse
User levels - Casual users:
They are most comfortable in retrieving information
from warehouse in pre defined formats and running pre
existing queries and reports.
These users do not need tools that allow for building
standard and ad hoc reports
58. Building a Data Warehouse
User levels - Power Users:
The user can use pre defined as well as user defined
queries to create simple and ad hoc reports.
These users can engage in drill down operations.
These users may have the experience of using
reporting and query tools.
59. Building a Data Warehouse
User levels - Expert users:
These users tend to create their own complex
queries and perform standard analysis on the info
they retrieve.
These users have the knowledge about the use of
query and report tools
60. Building a Data Warehouse
Benefits of data warehousing:
Locating the right information
Presentation of information
Testing of hypothesis
Discovery of information
Sharing the analysis
61. Building a Data Warehouse
Tangible benefits (quantified / measurable): It
includes
Improvement in product inventory
Decrement in production cost
Improvement in selection of target markets
Enhancement in asset and liability management
62. Building a Data Warehouse
Intangible benefits (not easy to quantify): It
includes,
Improvement in productivity by keeping all data in
single location and eliminating rekeying of data
Reduced redundant processing
Enhanced customer relation
63. Mapping the Data Warehouse to a
Multiprocessor Architecture
Parallel Query Processing:
It addresses query optimization in distributed memory
parallel systems.
Parallel RDBMS processing offers the solution to the
traditional problem of poor RDBMS performance for
complex queries and very large databases.
64. Mapping the Data Warehouse to a
Multiprocessor Architecture
Parallel Query Processing:
The accessing and processing of portions of the database
by individual threads in parallel can greatly improve the
performance of the query.
Speed up: The ability to execute the same request on the
same amount of data in less time.
Scale up: The ability to obtain the same performance on
the same requests as the database size increases
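The two metrics can be expressed as simple ratios (the textbook definitions, not tied to any particular DBMS benchmark):

```python
# Speed-up and scale-up as simple ratios of elapsed times.

def speedup(elapsed_1cpu, elapsed_ncpu):
    """Same request, same data: how much faster with n processors."""
    return elapsed_1cpu / elapsed_ncpu

def scaleup(elapsed_small, elapsed_large):
    """Proportionally larger problem on proportionally more
    hardware: a value of 1.0 is ideal (linear scale-up)."""
    return elapsed_small / elapsed_large
```

For example, a query that drops from 60 s on one processor to 15 s on four shows a speed-up of 4.0.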
65. Mapping the Data Warehouse to a
Multiprocessor Architecture
Types of parallelism:
Inter query Parallelism: In which different server threads
or processes handle multiple requests at the same time.
Intra query Parallelism: This form of parallelism
decomposes the serial SQL query into lower level
operations such as scan, join, sort etc. Then these lower
level operations are executed concurrently in parallel.
66. Mapping the Data Warehouse to a
Multiprocessor Architecture
Types of parallelism: Intra query parallelism can
be done in either of two ways:
Horizontal parallelism: which means that the database
is partitioned across multiple disks, and parallel
processing occurs within a specific task that is
performed concurrently on different processors against
different sets of data
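Horizontal parallelism can be sketched as the same scan task submitted concurrently against different partitions. A minimal Python illustration; the partition contents are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of horizontal parallelism: the same scan-and-filter task
# runs concurrently against different data partitions.

partitions = [
    [{"amount": 10}, {"amount": 120}],   # partition on disk 1
    [{"amount": 75}, {"amount": 200}],   # partition on disk 2
]

def scan(partition, threshold):
    """The per-partition task: filter rows above a threshold."""
    return [row for row in partition if row["amount"] > threshold]

def parallel_scan(parts, threshold):
    # One worker per partition runs the identical task in parallel.
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        results = list(pool.map(lambda p: scan(p, threshold), parts))
    # Merge the partial results from all partitions.
    return [row for part in results for row in part]
```

The merge step at the end mirrors what a parallel DBMS does when it gathers partial results from each processor.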
67. Mapping the Data Warehouse to a
Multiprocessor Architecture
Types of parallelism: Intra query parallelism can
be done in either of two ways:
Vertical parallelism: This occurs among different
tasks. All query components such as scan, join, sort etc
are executed in parallel in a pipelined fashion. In other
words, an output from one task becomes an input into
another task.
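Vertical parallelism can be sketched as a scan -> filter -> project pipeline. Python generators model the pipeline shape (each stage consumes the previous stage's output as it is produced), though a real DBMS runs the stages concurrently on separate processors:

```python
# Sketch of vertical (pipelined) parallelism. The data is illustrative.

def scan(table):
    for row in table:          # stage 1: produce rows one at a time
        yield row

def filter_stage(rows, min_qty):
    for row in rows:           # stage 2: consumes stage 1's output
        if row["qty"] >= min_qty:
            yield row

def project(rows, column):
    for row in rows:           # stage 3: consumes stage 2's output
        yield row[column]

table = [{"item": "a", "qty": 5}, {"item": "b", "qty": 1},
         {"item": "c", "qty": 9}]
result = list(project(filter_stage(scan(table), 5), "item"))  # the pipeline
```

No stage waits for the previous one to finish the whole table; rows flow through one at a time, which is the essence of pipelined execution.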
68. Mapping the Data Warehouse to a
Multiprocessor Architecture
Data partitioning
Data partitioning is the key component for effective
parallel execution of database operations.
It spreads data from database tables across multiple
disks so that I/O operations such as reads and writes can
be performed in parallel.
Partitioning can be done randomly or intelligently
69. Mapping the Data Warehouse to a
Multiprocessor Architecture
Data partitioning
Random partitioning includes random data striping
across multiple disks on a single server. Another
option for random partitioning is round-robin
partitioning, in which each record is placed on the
next disk assigned to the database.
Since the DBMS does not know where each record resides,
a full scan may be needed to locate it.
70. Mapping the Data Warehouse to a
Multiprocessor Architecture
Data partitioning
Intelligent partitioning assumes that the DBMS knows
where a specific record is located and does not waste
time searching for it across all disks. The various
intelligent partitioning schemes include:
Hash partitioning, key range partitioning, schema
partitioning and user-defined partitioning
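Three of these schemes can be sketched as follows (a minimal illustration; the record layout and boundary values are assumptions):

```python
# Sketches of round-robin, hash, and key range partitioning over n disks.

def round_robin(records, n_disks):
    """Random/round-robin: the i-th record goes to disk i mod n."""
    disks = [[] for _ in range(n_disks)]
    for i, rec in enumerate(records):
        disks[i % n_disks].append(rec)
    return disks

def hash_partition(records, key, n_disks):
    """Hash: the disk is derived from the partitioning key, so a
    lookup by that key touches exactly one disk."""
    disks = [[] for _ in range(n_disks)]
    for rec in records:
        disks[hash(rec[key]) % n_disks].append(rec)
    return disks

def key_range(records, key, boundaries):
    """Key range: boundaries like [100, 200] define three ranges,
    so range queries on the key touch only the relevant disks."""
    disks = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        slot = sum(rec[key] >= b for b in boundaries)
        disks[slot].append(rec)
    return disks
```

Round-robin balances load but supports no targeted lookups; hash and key range trade some balance for the ability to find a record without scanning every disk.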
71. Mapping the Data Warehouse to a
Multiprocessor Architecture
Data base architectures of parallel processing:
Shared memory or shared everything Architecture
Shared disk architecture
Shared nothing architecture
72. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared memory or shared everything Architecture:
Multiple PUs share memory.
Each PU has full access to all shared memory through a common
bus.
Communication between nodes occurs via shared memory.
Performance is limited by the bandwidth of the memory bus.
It presents a consistent single-system image to the user.
73. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared memory or shared everything
Architecture:
74. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared memory or shared everything
Architecture:
Threads provide better resource utilization and faster
context switching, thus providing for better scalability.
At the same time, threads that are too tightly coupled
with the OS may limit RDBMS portability.
Scalability of shared-memory architectures is limited.
75. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared disk architecture:
76. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared disk architecture:
Each node consists of one or more PUs and associated
memory.
Memory is not shared between nodes.
Communication occurs over a common high-speed bus.
Each node has access to the same disks and other resources.
77. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared disk architecture:
It implements the concept of shared ownership of the
entire database between RDBMS servers, each of which is
running on a node of a distributed memory system.
Each RDBMS server can read, write, update, and remove
data from the same shared database, which would require
the system to implement a form of a distributed lock
manager(DLM).
78. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared disk architecture:
Since the memory is not shared among the nodes, each
node has its own data cache.
Cache consistency must be maintained across the nodes
and a lock manager is needed to maintain the consistency.
There is additional overhead in maintaining the locks and
ensuring that the data caches are consistent
79. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared disk architecture – Advantages:
Shared disk systems permit high availability. All data
is accessible even if one node dies.
These systems have the concept of one database,
which is an advantage over shared nothing systems.
Shared disk systems provide for incremental growth.
80. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared disk architecture – Disadvantages:
Inter-node synchronization is required, involving DLM
overhead and greater dependency on high-speed
interconnect.
If the workload is not partitioned well, there may be high
synchronization overhead.
There is operating system overhead of running shared disk
software.
81. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared Nothing Architecture
82. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared Nothing Architecture:
Shared nothing systems are typically loosely coupled.
In shared nothing systems only one CPU is connected
to a given disk.
If a table or database is located on that disk, access
depends entirely on the PU which owns it.
83. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared Nothing Architecture – Advantages:
Shared nothing systems provide for incremental
growth.
Failure is local: if one node fails, the others stay
up.
84. Mapping the Data Warehouse to a
Multiprocessor Architecture
Shared Nothing Architecture – Disadvantages:
More coordination is required
More overhead is required for a process working
on a disk belonging to another node.
85. Mapping the Data Warehouse to a
Multiprocessor Architecture
Parallel DBMS Vendors – Oracle:
Architecture: shared disk architecture
Data partition: Key range, hash, round robin
Parallel operations: hash joins, scan and sort
86. Mapping the Data Warehouse to a
Multiprocessor Architecture
Parallel DBMS Vendors – Informix:
Architecture: Shared memory, shared disk and
shared nothing models
Data partition: round robin, hash, schema, key
range and user defined
Parallel operations: INSERT, UPDATE, DELETE
87. Mapping the Data Warehouse to a
Multiprocessor Architecture
Parallel DBMS Vendors – IBM:
Architecture: Shared nothing models
Data partition: hash
Parallel operations: INSERT, UPDATE,
DELETE, load, recovery, index creation, backup,
table reorganization
88. Mapping the Data Warehouse to a
Multiprocessor Architecture
Parallel DBMS Vendors – SYBASE:
Architecture: Shared nothing models
Data partition: hash, key range, Schema
Parallel operations: Horizontal and vertical
parallelism
89. DBMS Schemas for Decision
Support
A data cube allows data to be modeled and
viewed in multiple dimensions. It is defined by
dimensions and facts.
In general terms, dimensions are the perspectives
or entities with respect to which an organization
wants to keep records.
90. DBMS Schemas for Decision
Support
Example:
A sales data warehouse in order to keep records of the
store’s sales with respect to the dimensions time, item,
branch, and location.
Each dimension may have a table associated with it, called
a dimension table, which further describes the dimension.
A dimension table for item may contain the attributes item
name, brand, and type.
91. DBMS Schemas for Decision
Support
Example:
A multidimensional data model is typically organized
around a central theme, like sales, for instance. This theme
is represented by a fact table.
Facts are numerical measures.
Examples of facts for a sales data warehouse include
dollars sold (sales amount in dollars), units sold (number of
units sold), and amount budgeted.
93. DBMS Schemas for Decision
Support
Star schema
The most popular data model for a data warehouse is a
multidimensional model.
Data warehouse contains a large central table (fact table) containing
the bulk of the data, with no redundancy, and
A set of smaller associated tables (dimension tables), one for each
dimension.
The dimension tables displayed in a radial pattern around the central
fact table.
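A minimal star schema can be sketched in SQLite; the table and column names are illustrative assumptions:

```python
import sqlite3

# Minimal star schema: one fact table surrounded by dimension tables.

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE fact_sales (
        time_key INTEGER REFERENCES dim_time,
        item_key INTEGER REFERENCES dim_item,
        dollars_sold REAL                    -- the numerical measure
    );
    INSERT INTO dim_time VALUES (1, 2023), (2, 2024);
    INSERT INTO dim_item VALUES (10, 'Acme'), (11, 'Zenith');
    INSERT INTO fact_sales VALUES (1, 10, 100.0), (2, 10, 250.0),
                                  (2, 11, 80.0);
""")

# A typical star join: aggregate the measure by a dimension attribute.
rows = con.execute("""
    SELECT t.year, SUM(f.dollars_sold)
    FROM fact_sales f JOIN dim_time t ON f.time_key = t.time_key
    GROUP BY t.year ORDER BY t.year
""").fetchall()
```

The query joins the bulky fact table to one small dimension table, which is why star schemas keep the number of joins low.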
95. DBMS Schemas for Decision
Support
Star schema – Characteristics:
Simple structure -> easy-to-understand schema
Great query effectiveness -> small number of tables to
join
Relatively long load time for dimension tables ->
de-normalization and redundant data can make the
tables large.
96. DBMS Schemas for Decision
Support
Snowflake schema:
The snowflake schema is a variant of the star schema
model, where some dimension tables are normalized,
thereby further splitting the data into additional tables.
The resulting schema graph forms a shape similar to a
snowflake.
98. DBMS Schemas for Decision
Support
Snowflake schema:
The major difference between the snowflake and star
schema models is that the dimension tables of the
snowflake model may be kept in normalized form to
reduce redundancies.
Such a table is easy to maintain and saves storage
space.
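Snowflaking can be sketched by normalizing the item dimension so that brand attributes move into their own table; the names are illustrative assumptions:

```python
import sqlite3

# Snowflaked dimension: brand attributes split out of dim_item into
# a normalized dim_brand table, referenced by key.

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_brand (brand_key INTEGER PRIMARY KEY,
                            brand_name TEXT);
    CREATE TABLE dim_item  (item_key INTEGER PRIMARY KEY,
                            item_name TEXT,
                            brand_key INTEGER REFERENCES dim_brand);
    INSERT INTO dim_brand VALUES (1, 'Acme');
    INSERT INTO dim_item  VALUES (10, 'Widget', 1), (11, 'Gadget', 1);
""")

# The brand name is now stored once; queries need one extra join.
rows = con.execute("""
    SELECT i.item_name, b.brand_name
    FROM dim_item i JOIN dim_brand b ON i.brand_key = b.brand_key
    ORDER BY i.item_key
""").fetchall()
```

This shows the trade-off directly: 'Acme' appears once instead of per item, at the cost of an additional join at query time.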
99. DBMS Schemas for Decision
Support
Fact constellation
Sophisticated applications may require multiple
fact tables to share dimension tables.
This kind of schema can be viewed as a collection
of stars, and hence is called a galaxy schema or a
fact constellation.
101. Business Analysis: Reporting and
Query Tools and Applications
The principal purpose of data warehousing is to
provide information to business users for strategic
decision making.
These users interact with the data warehouse using
front-end tools, or by getting the required information
through the information delivery system.
102. Business Analysis: Reporting and
Query Tools and Applications
Tool Categories
Reporting
Managed query
Executive information systems
OLAP
Data Mining
103. Business Analysis: Reporting and
Query Tools and Applications
Tool Categories – Reporting:
Production reporting Tools: Companies generate
regular operational reports. Example: Calculating
and printing paychecks.
Report Writers: They are having graphical
interfaces and built-in charting functions.
104. Business Analysis: Reporting and
Query Tools and Applications
Tool Categories – Managed Query Tools:
These tools shield end users from the complexities
of SQL and database structures by inserting a
metalayer between users and the database.
They make it possible for knowledge workers to
access corporate data without IS intervention.
105. Business Analysis: Reporting and
Query Tools and Applications
Tool Categories – EIS Tools:
EIS – Executive Information System
It allows developers to build customized,
graphical decision support applications that give
managers and executives a high level view of the
business and access to external sources.
106. Business Analysis: Reporting and
Query Tools and Applications
Tool Categories – OLAP Tools:
A natural way to view corporate data.
These tools aggregate data along common
business subjects or dimensions.
Users can drill down, across or up levels in each
dimension
107. Business Analysis: Reporting and
Query Tools and Applications
Tool Categories – Data Mining Tools:
It uses a variety of statistical and artificial-
intelligence algorithms to analyze the correlation
of variables in the data and to uncover relationships
worth investigating.
108. Business Analysis: Reporting and
Query Tools and Applications
Cognos Impromptu:
Impromptu is an interactive database reporting tool.
It allows Power Users to query data without
programming knowledge.
When using the Impromptu tool, no data is written or
changed in the database. It is only capable of reading
the data
109. Business Analysis: Reporting and
Query Tools and Applications
Cognos Impromptu – Features:
Interactive reporting capability
Enterprise-wide scalability
Superior user interface
Fastest time to result
Lowest cost of ownership
110. Business Analysis: Reporting and
Query Tools and Applications
Catalogs:
Impromptu stores metadata in subject related folders
This metadata will be used to develop a query for a
report. The metadata set is stored in a file called a
catalog.
The catalog does not contain any data.
111. Business Analysis: Reporting and
Query Tools and Applications
Catalogs:
It just contains information about connecting to
the database and the fields that will be accessible
for reports.
112. Business Analysis: Reporting and
Query Tools and Applications
Catalogs:
Folders—meaningful groups of information representing columns from
one or more tables
Columns—individual data elements that can appear in one or more
folders
Calculations—expressions used to compute required values from
existing data
Conditions—used to filter information so that only a certain type of
information is displayed
113. Business Analysis: Reporting and
Query Tools and Applications
Catalogs:
Prompts—pre-defined selection criteria prompts
that users can include in reports they create
Other components, such as metadata, a logical
database name, join information, and user classes
114. Business Analysis: Reporting and
Query Tools and Applications
Catalogs – Uses:
view, run, and print reports
export reports to other applications
disconnect from and connect to the database
create reports
change the contents of the catalog
115. Business Analysis: Reporting and
Query Tools and Applications
Reports:
Reports are created by choosing fields from the
catalog folders. This process will build a SQL
(Structured Query Language) statement behind the
scene.
No SQL knowledge is required to use Impromptu.
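A hypothetical sketch of how such a tool might assemble a SQL statement from the catalog fields a user picks; this is not Impromptu's actual implementation, just the general idea:

```python
# Hypothetical query builder: chosen fields become a SELECT list,
# an optional filter becomes the WHERE clause.

def build_query(table, fields, condition=None):
    sql = "SELECT " + ", ".join(fields) + " FROM " + table
    if condition:
        sql += " WHERE " + condition
    return sql
```

For example, picking the region and revenue fields with a filter yields `SELECT region, revenue FROM sales WHERE revenue > 100`, all without the user writing SQL.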
116. Business Analysis: Reporting and
Query Tools and Applications
Reports:
The data in the report may be formatted, sorted and/or
grouped as needed.
Titles, dates, headers and footers and other standard text
formatting features (italics, bolding, and font size) are also
available.
Once the desired layout is obtained, the report can be
saved to a report file.
117. Business Analysis: Reporting and
Query Tools and Applications
Frame-Based Reporting
Frames are the building blocks of all Impromptu reports and
templates.
They may contain report objects, such as data, text, pictures, and
charts.
There are no limits to the number of frames that you can place
within an individual report or template. You can nest frames
within other frames to group report objects within a report.
118. Business Analysis: Reporting and
Query Tools and Applications
Frame-Based Reporting
Form frame: An empty form frame appears.
List frame: An empty list frame appears.
Text frame: The flashing I-beam appears where you can begin
inserting text.
Picture frame: The Source tab (Picture Properties dialog box)
appears. You can use this tab to select the image to include in the
frame.
119. Business Analysis: Reporting and
Query Tools and Applications
Frame-Based Reporting
Chart frame: The Data tab (Chart Properties dialog box)
appears. You can use this tab to select the data item to
include in the chart.
OLE Object: The Insert Object dialog box appears where
you can locate and select the file you want to insert, or you
can create a new object using the software listed in the
Object Type box.
120. Business Analysis: Reporting and
Query Tools and Applications
Impromptu features
Unified query and reporting interface: It unifies both
query and reporting interface in a single user interface.
Object oriented architecture: It enables inheritance-
based administration so that more than 1,000 users can
be accommodated as easily as a single user.
121. Business Analysis: Reporting and
Query Tools and Applications
Impromptu features
Scalability: It scales from a single user to over 1,000
users.
Security and Control: Security is based on user profiles
and their classes.
Data presented in a business context: It presents
information using the terminology of the business.
122. Business Analysis: Reporting and
Query Tools and Applications
Impromptu features
Over 70 predefined report templates: Users simply supply
the data to create an interactive report.
Frame-based reporting: It offers a number of frame objects
for building user-designed reports.
Business-relevant reporting: It can generate business-
relevant reports through filters, preconditions and
calculations.
123. Online Analytical Processing
(OLAP)
It uses database tables (fact and dimension tables) to
enable multidimensional viewing, analysis and
querying of large amounts of data.
Online Analytical Processing (OLAP) applications and
tools are those designed to ask "complex queries" of
large multidimensional collections of data.
124. Online Analytical Processing
(OLAP)
Need:
The need for OLAP arises from the multidimensional
nature of business problems.
These problems are characterized by retrieving a very
large number of records, which can reach gigabytes or
terabytes, and summarizing this data into information
that can be used by business analysts.
125. Online Analytical Processing
(OLAP)
The Multidimensional Data Model:
Because OLAP is on-line, it must provide answers
quickly; analysts pose iterative queries during
interactive sessions.
The multidimensional data model views the data as a
cube: each cell holds a measure, addressed by one
member from each dimension.
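The cube view can be sketched in code. This is a minimal illustration, not any vendor's storage format; the dimensions (product, region, quarter) and all figures are hypothetical.

```python
# A cube cell is addressed by one member per dimension:
# (product, region, quarter) -> sales amount.
# All dimension members and figures are hypothetical.
sales = {
    ("Phone",  "North", "Q1"): 100,
    ("Phone",  "South", "Q1"): 150,
    ("Laptop", "North", "Q1"): 200,
    ("Laptop", "South", "Q2"): 250,
}

def cell(product, region, quarter):
    """Look up one cell of the cube (0 if the cell is empty)."""
    return sales.get((product, region, quarter), 0)

print(cell("Phone", "North", "Q1"))  # -> 100
```

A sparse mapping like this stores only the non-empty cells; a dense array over all member combinations is the other common choice.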
126. Online Analytical Processing
(OLAP)
The Multidimensional Data Model – Operations:
Roll-up: The roll-up operation (also called the drill-up
operation by some vendors) performs aggregation on a
data cube, either by climbing up a concept hierarchy
for a dimension or by dimension reduction.
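Roll-up by dimension reduction can be sketched as summing out one axis of the cube. The cube below is hypothetical, not from the source.

```python
from collections import defaultdict

# Hypothetical cube: (product, region, quarter) -> sales.
cube = {
    ("Phone",  "North", "Q1"): 100,
    ("Phone",  "North", "Q2"): 120,
    ("Phone",  "South", "Q1"): 150,
    ("Laptop", "North", "Q1"): 200,
}

def roll_up(cube, drop_axis):
    """Roll up by dimension reduction: sum out one axis."""
    out = defaultdict(int)
    for key, value in cube.items():
        reduced = key[:drop_axis] + key[drop_axis + 1:]
        out[reduced] += value
    return dict(out)

# Removing the quarter axis aggregates sales per (product, region).
by_product_region = roll_up(cube, drop_axis=2)
print(by_product_region[("Phone", "North")])  # -> 220
```

Climbing a concept hierarchy (e.g. month to quarter) works the same way, except keys are mapped to coarser members instead of being dropped.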
128. Online Analytical Processing
(OLAP)
The Multidimensional Data Model – Operations:
Drill-down: Drill-down is the reverse of roll-up. It
navigates from less detailed data to more detailed data.
Drill-down can be realized by either stepping down a
concept hierarchy for a dimension or introducing
additional dimensions.
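Stepping down a concept hierarchy can be sketched as replacing a coarse total with its finer-grained breakdown. The quarter-to-month hierarchy and figures below are hypothetical.

```python
# Hypothetical hierarchy: quarter totals and their monthly detail.
quarter_totals = {"Q1": 370}
month_detail = {("Q1", "Jan"): 100, ("Q1", "Feb"): 120, ("Q1", "Mar"): 150}

def drill_down(quarter):
    """Replace a quarter total with its per-month breakdown."""
    return {m: v for (q, m), v in month_detail.items() if q == quarter}

print(drill_down("Q1"))  # -> {'Jan': 100, 'Feb': 120, 'Mar': 150}
```

Note the detail sums back to the quarter total, which is exactly what makes drill-down the inverse of roll-up.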
130. Online Analytical Processing
(OLAP)
The Multidimensional Data Model –
Operations:
Slice and dice: The slice operation performs a
selection on one dimension of the given cube,
resulting in a subcube.
131. Online Analytical Processing
(OLAP)
The Multidimensional Data Model –
Operations:
The dice operation defines a subcube by
performing a selection on two or more dimensions.
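Slice and dice can both be sketched as filters on the cube's keys: slice fixes one member on a single axis, dice restricts the allowed members on two or more axes. The cube below is hypothetical.

```python
# Hypothetical cube cells: (product, region, quarter) -> sales.
cube = {
    ("Phone",  "North", "Q1"): 100,
    ("Phone",  "South", "Q1"): 150,
    ("Laptop", "North", "Q1"): 200,
    ("Laptop", "North", "Q2"): 250,
}

def slice_cube(cube, axis, member):
    """Slice: select one member on a single dimension -> subcube."""
    return {k: v for k, v in cube.items() if k[axis] == member}

def dice_cube(cube, criteria):
    """Dice: select allowed members on two or more dimensions."""
    return {k: v for k, v in cube.items()
            if all(k[axis] in members for axis, members in criteria.items())}

q1 = slice_cube(cube, axis=2, member="Q1")                   # quarter = Q1
north = dice_cube(cube, {1: {"North"}, 2: {"Q1", "Q2"}})     # region and quarter
print(len(q1), len(north))  # -> 3 3
```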
133. Online Analytical Processing
(OLAP)
The Multidimensional Data Model –
Operations:
Pivot (also called rotate) is a visualization
operation that rotates the data axes in view to
provide an alternative data presentation.
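Since pivot only changes the presentation, it can be sketched as swapping the cube's axes while leaving every value untouched. The data below is hypothetical.

```python
# Hypothetical 2-D cube cells: (product, region) -> sales.
cube = {
    ("Phone",  "North"): 100,
    ("Phone",  "South"): 150,
    ("Laptop", "North"): 200,
}

def pivot(cube):
    """Pivot (rotate): swap the two axes; the values are unchanged."""
    return {(region, product): v for (product, region), v in cube.items()}

rotated = pivot(cube)
print(rotated[("North", "Phone")])  # -> 100
```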
134. OLAP vs OLTP
Source of data
OLTP: operational data; OLTP systems are the original source of the data.
OLAP: consolidated data, drawn from the various OLTP databases.
Purpose of data
OLTP: to control and run fundamental business tasks.
OLAP: to help with planning, problem solving, and decision support.
What the data reveals
OLTP: a snapshot of ongoing business processes.
OLAP: multidimensional views of various kinds of business activities.
Queries
OLTP: relatively standardized, simple queries returning relatively few records.
OLAP: often complex queries involving aggregations.
135. OLAP vs OLTP
Processing speed
OLTP: typically very fast.
OLAP: depends on the amount of data involved.
Space requirements
OLTP: can be relatively small if historical data is archived.
OLAP: larger, owing to aggregation structures and historical data.
Database design
OLTP: highly normalized, with many tables.
OLAP: typically denormalized, with fewer tables; uses star and/or snowflake schemas.
Backup and recovery
OLTP: regular backups are essential, since data loss is likely to entail significant monetary loss and legal liability.
OLAP: instead of regular backups, some environments may simply reload the OLTP data as a recovery method.
136. Online Analytical Processing
(OLAP)
Categories of OLAP Tools – MOLAP:
This is the more traditional way of OLAP analysis.
In MOLAP, data is stored in a multidimensional cube.
The storage is not in the relational database, but in
proprietary formats.
That is, the data is stored in array-based structures.
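The array-based idea can be sketched as follows: each dimension member maps to an array offset, so a cell is addressed directly by position rather than found by scanning rows. This is an illustration only; real MOLAP engines use proprietary compressed formats, and all names and figures here are hypothetical.

```python
# MOLAP sketch: a dense array indexed by member positions.
products = ["Phone", "Laptop"]
regions = ["North", "South"]

# 2 x 2 array of sales, one cell per (product, region) pair.
cells = [
    [100, 150],   # Phone:  North, South
    [200, 250],   # Laptop: North, South
]

def lookup(product, region):
    """Address a cell directly by member positions (no table scan)."""
    return cells[products.index(product)][regions.index(region)]

print(lookup("Laptop", "South"))  # -> 250
```

Direct positional addressing is what gives MOLAP its fast cell retrieval, at the cost of storage for empty cells in sparse cubes.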
137. Online Analytical Processing
(OLAP)
Categories of OLAP Tools – ROLAP
This methodology relies on manipulating the data
stored in the relational database to give the appearance
of traditional OLAP's slicing and dicing functionality.
In essence, each action of slicing and dicing is
equivalent to adding a "WHERE" clause in the SQL
statement. The data remains stored in relational tables.
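The WHERE-clause equivalence can be shown with a small in-memory relational table. This uses Python's standard sqlite3 module purely as an illustration; the fact table, its columns, and the figures are hypothetical.

```python
import sqlite3

# ROLAP sketch: the "cube" is a relational fact table, and slicing
# becomes a WHERE clause. Table, columns and figures are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (product TEXT, region TEXT, amount INT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("Phone",  "North", 100),
    ("Phone",  "South", 150),
    ("Laptop", "North", 200),
])

# Slicing on region = 'North' is just a WHERE clause plus aggregation.
total = con.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("North",)
).fetchone()[0]
print(total)  # -> 300
```

Dicing on several dimensions simply ANDs more predicates into the same WHERE clause.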
138. Online Analytical Processing
(OLAP)
Categories of OLAP Tools – HOLAP (MQE: Managed Query
Environment)
HOLAP technologies attempt to combine the advantages of MOLAP
and ROLAP.
For summary-type information, HOLAP leverages cube technology for
faster performance.
It stores only the indexes and aggregations in the multidimensional
form while the rest of the data is stored in the relational database.