20IT501 – Data Warehousing and
Data Mining
III Year / V Semester
OBJECTIVES
• The student should be made to:
Be familiar with the concepts of data warehousing
Know the fundamentals of data mining
Understand the importance of association rule mining
Understand the techniques of classification and clustering
Be aware of the recent trends in data mining
UNIT I – DATA WAREHOUSING
Data Warehouse – Data warehousing Components – Building a
Data warehouse – Mapping the Data Warehouse to a
Multiprocessor Architecture – DBMS Schemas for Decision
Support. Business Analysis: Reporting and Query tools and
Applications - Online Analytical Processing (OLAP) – Need –
Multidimensional Data Model – OLAP Guidelines -
Categories of OLAP Tools.
Data Warehouse
A data warehouse is a subject-oriented, integrated,
time-variant and non-volatile collection of data
in support of management's decision-making
process.
Subject Oriented:
• It provides information around a subject rather than around the
organization's ongoing operations.
• These subjects can be products, customers, suppliers, sales,
revenue, etc.
• A data warehouse does not focus on ongoing operations; rather, it
focuses on the modelling and analysis of data for decision making.
Integrated:
• A data warehouse is constructed by integrating
data from heterogeneous sources such as relational
databases, flat files, etc.
• This integration enhances the effective analysis of
data.
Time Variant:
• The data collected in a data warehouse is
identified with a particular time period.
• The data in a data warehouse provides information
from the historical point of view.
Non-volatile:
• Non-volatile means that previous data is not erased
when new data is added.
• A data warehouse is kept separate from the operational
database, and therefore frequent changes in the operational
database are not reflected in the data warehouse.
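As a minimal sketch of the time-variant and non-volatile properties, the Python class below appends every load as a new dated row and never updates or deletes earlier rows, so the state on any past date can still be reconstructed. The class `SnapshotStore` and its methods are hypothetical names chosen for illustration, not part of any warehouse product.

```python
from datetime import date


class SnapshotStore:
    """Hypothetical append-only store: each load adds rows, nothing is overwritten."""

    def __init__(self):
        self._rows = []  # (load_date, key, value) tuples, never modified after insert

    def load(self, load_date, records):
        # Non-volatile: new data is appended; previously loaded rows stay untouched.
        for key, value in records.items():
            self._rows.append((load_date, key, value))

    def as_of(self, when):
        # Time-variant: every row carries its load date, so the latest value of
        # each key on or before `when` can be rebuilt from history.
        state = {}
        for load_date, key, value in sorted(self._rows, key=lambda row: row[0]):
            if load_date <= when:
                state[key] = value
        return state


store = SnapshotStore()
store.load(date(2023, 1, 31), {"C1": 100})
store.load(date(2023, 2, 28), {"C1": 150, "C2": 40})
print(store.as_of(date(2023, 1, 31)))  # {'C1': 100}
print(store.as_of(date(2023, 2, 28)))  # {'C1': 150, 'C2': 40}
```

The January snapshot is still available after the February load, which is exactly what an operational system that overwrites values in place would lose.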
Data Warehouse
• It is a database designed for analytical tasks, using data
from multiple applications.
• It supports a relatively small number of users with
relatively long interactions.
• Its content is periodically updated.
• It contains current and historical data to provide a
historical perspective of information.
Data Warehouse - Terminologies
• Current detail data: It is organized along
subject lines (customer profile data, sales data).
• Data Mart: Summarized departmental data,
customized to suit the needs of the
particular department that owns the data.
• Drill-down: To perform business analysis in a top-down fashion.
• Metadata: Data about data. It contains the location and
description of warehouse system components; the names,
definition, structure and content of the data warehouse;
and end-user views.
Operational Data vs. Informational Data
Characteristic   | Operational Data            | Informational Data
Content          | Current values              | Summarized, derived
Organization     | By application              | By subject
Stability        | Dynamic                     | Static until refreshed
Data structure   | Optimized for transactions  | Optimized for complex queries
Access frequency | High                        | Medium to low
Response time    | Sub-second (<1 s) to 2-3 s  | Several seconds to minutes
Data Warehouse Components
• Data Warehouse Database:
• It is based on a relational database management system server
that functions as the central repository for informational data.
• This repository is surrounded by a number of key components
designed to make the entire environment functional, manageable and
accessible both by the operational systems that source data into
the warehouse and by end-user query and analysis tools.
• Sourcing, Acquisition, Clean up, and
Transformation Tools:
• They perform conversions, summarization, key
changes, structural changes and condensation.
• The data transformation is required so that the
information can be used by decision support tools.
• Sourcing, Acquisition, Clean up, and Transformation
Tools – Functionalities:
• Removing unwanted data from operational databases
• Converting to common data names and attributes
• Calculating summaries and derived data
• Establishing defaults for missing data
• Accommodating source data definition changes
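The functionalities listed above can be sketched in a few lines of Python. This is a toy illustration under assumed inputs: the two source record layouts, the `COMMON_NAMES` mapping and the `UNKNOWN` default are all hypothetical, and real clean-up tools work from much richer metadata.

```python
# Hypothetical mapping from source-specific column names to common warehouse names.
COMMON_NAMES = {"cust_no": "customer_id", "CustID": "customer_id",
                "amt": "amount", "Amount": "amount", "rgn": "region"}
KEEP = {"customer_id", "amount", "region"}   # columns the warehouse actually needs
DEFAULTS = {"region": "UNKNOWN"}             # default used when a value is missing


def transform(rows):
    cleaned = []
    for row in rows:
        # Convert to common data names and attributes.
        renamed = {COMMON_NAMES.get(k, k): v for k, v in row.items()}
        # Remove unwanted data (operational columns the warehouse does not use).
        kept = {k: v for k, v in renamed.items() if k in KEEP}
        # Establish defaults for missing data.
        for column, default in DEFAULTS.items():
            if kept.get(column) in (None, ""):
                kept[column] = default
        cleaned.append(kept)
    return cleaned


def summarize(rows):
    # Calculate summary (derived) data: total amount per region.
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals


source_a = [{"cust_no": 1, "amt": 50, "rgn": "North", "op_flag": "X"}]
source_b = [{"CustID": 2, "Amount": 30}]
rows = transform(source_a + source_b)
print(rows)
print(summarize(rows))  # {'North': 50, 'UNKNOWN': 30}
```

Both sources end up with the same column names, the operational-only `op_flag` column is dropped, and the record with no region receives the default before summaries are computed.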
• Sourcing, Acquisition, Clean up, and
Transformation Tools – Issues:
• Data heterogeneity: the different ways in which
data is defined and used in different modules.
Meta data:
• It is data about data. It is used for maintaining,
managing and using the data warehouse.
• Types:
• Technical Meta data
• Business Meta data
• Meta data – Technical Meta data:
Information about data stores
Transformation descriptions
The rules used to perform clean-up and data enhancement
Data mapping operations
Access authorization, backup history, archive history, information
delivery history, data acquisition history, data access, etc.
Meta data – Business Meta data:
• It gives users an understandable view of the information stored
in the data warehouse.
• Subject areas and information object types, including queries,
reports, images, video, audio clips, etc.
• Internet home pages
• Information related to the information delivery system
• Meta data helps users understand the content and find
the data.
Meta data is stored in a separate data store, known as
the information directory or meta data repository.
It helps to integrate, maintain and view the contents of the
data warehouse.
• Meta data – Characteristics:
• It is the gateway to the data warehouse environment
• It supports easy distribution and replication of content for high
performance and availability
• It should be searchable by business-oriented keywords
• It should support the sharing of information
• It should support and provide interfaces to other applications
Access Tools:
Data query and reporting tools
Application development tools
Executive information system (EIS) tools
OLAP tools
Data mining tools
Data Marts:
• Departmental subsets that focus on selected subjects.
They are independent and used by a dedicated user group.
They are used for rapid delivery of enhanced decision
support functionality to end users.
• Data warehouse admin and management:
• Security and priority management
• Monitoring updates from multiple sources
• Data quality checks
• Managing and updating meta data
• Auditing and reporting data warehouse usage and status
• Replicating, subsetting and distributing data
• Backup and recovery
Information delivery system
It enables users to subscribe to data warehouse information and
have it delivered to one or more destinations according to a
specified scheduling algorithm.
Building a Data Warehouse - Reason
• Business perspective:
• Decisions need to be made quickly and correctly,
using all available data.
• Users are business domain experts, not computer
professionals.
• Competition is heating up in the areas of business
intelligence and added information value.
• Technological perspective:
• The price of computer processing speed continues
to decline.
• The price of digital storage is rapidly dropping.
• Network bandwidth is increasing.
Building a Data Warehouse
• Approaches
• Top-Down Approach
• Bottom-Up Approach
• Approaches: Top-Down Approach
• To build a centralized repository to house corporate-wide
business data.
• This repository is called the Enterprise Data Warehouse
(EDW).
• The data in the EDW is stored in a normalized form in
order to avoid redundancy.
• The data in the EDW is stored at the most detailed level.
The reason to build the EDW at the most detailed level
is to leverage:
Flexibility to be used by multiple departments.
Flexibility to cater for future requirements.
• We should implement the top-down approach when:
The business has complete clarity on the data warehouse
requirements of all or multiple subject areas.
The business is ready to invest considerable time and money.
• Approaches: Top-Down Approach – Advantages:
• The data is reliable.
• The data is consistent across subject areas.
• Approaches: Top-Down Approach – Disadvantages:
• It requires more time and initial investment.
The business has to wait for the EDW to be
implemented, followed by the data marts,
before it can access its reports.
• Approaches: Bottom-Up Approach
• It is an incremental approach to building a data
warehouse.
• Here we build the data marts separately at
different points of time, as and when the specific
subject area requirements become clear.
• The data marts are integrated or combined
together to form a data warehouse.
• Separate data marts are combined through the use
of conformed dimensions and conformed facts.
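To make "conformed dimension" concrete, the sketch below assumes two hypothetical data marts (sales and shipments) that both reference the same date dimension keys. Because the dimension is shared, each mart can be summarized independently and the results lined up on the same month labels; all table contents are made up.

```python
# Hypothetical conformed date dimension shared by both data marts.
DATE_DIM = {1: {"month": "2024-01"}, 2: {"month": "2024-02"}}

# Each mart keeps its own facts but references DATE_DIM by date_key.
SALES_FACTS = [(1, 100.0), (1, 50.0), (2, 70.0)]   # (date_key, sales_amount)
SHIPMENT_FACTS = [(1, 3), (2, 2), (2, 4)]          # (date_key, units_shipped)


def total_by_month(facts):
    totals = {}
    for date_key, measure in facts:
        month = DATE_DIM[date_key]["month"]
        totals[month] = totals.get(month, 0) + measure
    return totals


def drill_across():
    # Both marts use the same (conformed) dimension, so their results share
    # the same month labels and can be combined into one report.
    sales = total_by_month(SALES_FACTS)
    shipments = total_by_month(SHIPMENT_FACTS)
    months = sorted(set(sales) | set(shipments))
    return [(m, sales.get(m, 0), shipments.get(m, 0)) for m in months]


print(drill_across())  # [('2024-01', 150.0, 3), ('2024-02', 70.0, 6)]
```

If each mart had built its own, slightly different date table (for example, different month labels), the two result sets could not be aligned this simply.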
• We should implement the bottom-up approach
when:
• We have initial cost and time constraints.
The complete warehouse requirements are not clear; we
have clarity on only one data mart.
• Approaches: Bottom-Up Approach – Advantages:
• It does not require high initial costs and has a
faster implementation time.
• Hence the business can start using the marts much
earlier than with the top-down approach.
• Approaches: Bottom-Up Approach – Disadvantages:
• It stores data in denormalized form, hence
there would be high space usage for detailed data.
• There is a tendency not to keep detailed data in
this approach.
• Design considerations: Most successful data warehouses
that meet business requirements share these common
characteristics. They:
• Are based on a dimensional model
• Contain historical and current data
• Include both detailed and summarized data
• Consolidate disparate data from multiple sources while retaining
consistency
• Design considerations: A data warehouse is
difficult to build for the following reasons:
• Heterogeneity of data sources
• Use of historical data
• Growing size of the database
• Data content
• The content and structure of the data warehouse are reflected in
its data model.
• The data model is the template that describes how information
will be organized within the integrated warehouse framework.
• The data warehouse data must be detailed data. It must be
formatted, cleaned up and transformed to fit the warehouse data
model.
• Meta data
It defines the location and contents of data in the
warehouse.
Meta data is searchable by users to find definitions or
subject areas.
In other words, it must provide decision-support-oriented
pointers to warehouse data and thus provide a logical link
between warehouse data and decision support applications.
Data Distribution
• The data can be distributed based on the subject
area, location (geographical region), or time
(current, month, year).
• Tools
• These tools provide facilities for defining the
transformation and cleanup rules, data movement,
end-user query, reporting and data analysis.
• Design steps
Choosing the subject matter
Deciding what a fact table represents
Identifying and conforming the dimensions
Choosing the facts
Storing precalculations in the fact table
Rounding out the dimension tables
Choosing the duration of the database
The need to track slowly changing dimensions
Deciding the query priorities and query modes
• Technical Constraints – Issues:
• The hardware platform that would house the data warehouse
• The DBMS that supports the warehouse data
• The communication infrastructure that connects data marts,
operational systems and end users
• The hardware and software to support the meta data repository
• The systems management framework that enables administration of the
entire environment
• Implementation considerations
Collect and analyze business requirements
Create a data model and a physical design
Define data sources
Choose the database technology and platform
Extract the data from operational databases, transform
it, clean it up and load it into the warehouse
Choose database access and reporting tools
Choose database connectivity software
Choose data analysis and presentation software
Update the data warehouse
• Access Tools
Simple tabular form data
Ranking data
Multivariable data
Time series data
Graphing, charting and pivoting data
Statistical analysis data
Predefined repeatable queries
Ad hoc user-specified queries
Reporting and analysis data
• Data extraction, clean up, transformation and
migration – Selection Criteria:
• Timeliness of data delivery to the warehouse
• The tool must have the ability to identify the
particular data that can be read by the conversion tool
• The tool must have the capability to merge data from
multiple data stores
• The tool should have the ability to read data from the data
dictionary
• The code generated by the tool should be completely
maintainable
• The tool should permit the user to extract the required data
• User levels - Casual users:
They are most comfortable retrieving information
from the warehouse in predefined formats and running
existing queries and reports.
These users do not need tools that allow for building
standard and ad hoc reports.
• User levels - Power users:
These users can use predefined as well as user-defined
queries to create simple and ad hoc reports.
These users can engage in drill-down operations.
These users may have experience of using
reporting and query tools.
• User levels - Expert users:
These users tend to create their own complex
queries and perform standard analysis on the information
they retrieve.
These users know how to use
query and report tools.
• Benefits of data warehousing:
Locating the right information
Presentation of information
Testing of hypotheses
Discovery of information
Sharing the analysis
• Tangible benefits (quantifiable / measurable) include:
Improvement in product inventory
Reduction in production cost
Improvement in selection of target markets
Enhancement in asset and liability management
• Intangible benefits (not easily quantified) include:
Improvement in productivity by keeping all data in a
single location and eliminating rekeying of data
Reduced redundant processing
Enhanced customer relations
Mapping the Data Warehouse to a
Multiprocessor Architecture
• Parallel Query Processing:
• Query optimization in distributed-memory parallel
systems.
• Parallel RDBMS processing offers a solution to the
traditional problem of poor RDBMS performance for
complex queries and very large databases.
• Parallel Query Processing:
Accessing and processing portions of the database with
individual threads in parallel can greatly improve
query performance.
• Speed up: The ability to execute the same request on the
same amount of data in less time.
• Scale up: The ability to obtain the same performance on
the same requests as the database size increases.
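Both measures are usually expressed as ratios of elapsed times. The helpers below are a minimal sketch using the common textbook formulation, in which scale-up grows the hardware in proportion to the database (an assumption beyond the wording above); the timings are invented for illustration. Linear speed-up means the ratio equals the hardware growth factor, and ideal scale-up is 1.0.

```python
def speedup(elapsed_small_system, elapsed_large_system):
    """Same request, same data: how many times faster the larger system runs it."""
    return elapsed_small_system / elapsed_large_system


def scaleup(elapsed_small_system_small_db, elapsed_large_system_large_db):
    """Database and system grow by the same factor; 1.0 means performance was held."""
    return elapsed_small_system_small_db / elapsed_large_system_large_db


# Hypothetical timings (seconds) for one query.
print(speedup(120.0, 30.0))   # 1 processor vs 4 processors, same data -> 4.0 (linear)
print(scaleup(120.0, 150.0))  # 100 GB on 1 processor vs 400 GB on 4 processors -> 0.8
```

A scale-up below 1.0, as in the second call, means the larger system slowed down somewhat as the data grew, which typically reflects coordination overhead.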
• Types of parallelism:
• Inter-query parallelism: Different server threads
or processes handle multiple requests at the same time.
• Intra-query parallelism: This form of parallelism
decomposes a serial SQL query into lower-level
operations such as scan, join and sort. These lower-level
operations are then executed concurrently, in parallel.
• Types of parallelism: Intra-query parallelism can
be done in either of two ways:
• Horizontal parallelism: The database
is partitioned across multiple disks, and parallel
processing occurs within a specific task that is
performed concurrently on different processors against
different sets of data.
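A minimal Python sketch of horizontal parallelism, assuming a table already split into three in-memory partitions that stand in for disks. The same filtered-sum task runs against each partition at once, and the partial results are merged. Threads are used only to keep the example self-contained; in CPython they do not give true CPU parallelism, whereas a parallel DBMS would run each piece on a separate processor.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, region):
    # The same task (a filtered SUM) runs on each partition's own subset of rows.
    return sum(amount for row_region, amount in partition if row_region == region)


def parallel_filtered_sum(partitions, region):
    # One worker per partition; the partial results are combined at the end.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_sums = pool.map(scan_partition, partitions, [region] * len(partitions))
        return sum(partial_sums)


# Hypothetical table split across three "disks".
partitions = [
    [("North", 10), ("South", 5)],
    [("North", 7)],
    [("East", 3), ("North", 1)],
]
print(parallel_filtered_sum(partitions, "North"))  # 18
```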
• Vertical parallelism: This occurs among different
tasks. All query components such as scan, join and sort
are executed in parallel in a pipelined fashion. In other
words, the output from one task becomes the input to
another task.
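Vertical (pipelined) parallelism can be sketched with one Python thread per query operator, connected by queues, so the output of each operator becomes the input of the next while earlier operators are still producing rows. This is an illustration only: the operator names are hypothetical, it uses a scan, a filter and a SUM rather than a join and a sort (a sort cannot emit output until it has seen all of its input), and it needs Python 3.8+ for the `:=` operator.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of a stream of rows


def scan(rows, out_q):
    for row in rows:
        out_q.put(row)            # each row is emitted as soon as it is read
    out_q.put(DONE)


def filter_rows(region, in_q, out_q):
    while (row := in_q.get()) is not DONE:
        if row[0] == region:
            out_q.put(row)        # forwarded downstream while the scan may still be running
    out_q.put(DONE)


def sum_amounts(in_q, result):
    total = 0
    while (row := in_q.get()) is not DONE:
        total += row[1]
    result.append(total)


def pipelined_query(rows, region):
    scanned, filtered, result = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=scan, args=(rows, scanned)),
        threading.Thread(target=filter_rows, args=(region, scanned, filtered)),
        threading.Thread(target=sum_amounts, args=(filtered, result)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return result[0]


print(pipelined_query([("North", 10), ("South", 5), ("North", 7)], "North"))  # 17
```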
• Data partitioning
• Data partitioning is the key component for effective
parallel execution of database operations.
It spreads data from database tables across multiple
disks so that I/O operations such as read and write can
be performed in parallel.
Partitioning can be done randomly or intelligently.
• Random partitioning includes random data striping
across multiple disks on a single server. Another
option for random partitioning is round-robin
partitioning, in which each record is placed on the next
disk assigned to the database.
• With random partitioning, the DBMS does not know where each record resides.
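A minimal sketch of round-robin placement in Python, with plain lists standing in for disks (the function names are hypothetical). Records are spread evenly, but because placement ignores the record's key, a lookup may have to search every disk.

```python
def round_robin_partition(records, n_disks):
    disks = [[] for _ in range(n_disks)]
    for i, record in enumerate(records):
        disks[i % n_disks].append(record)  # each record goes to the next disk in turn
    return disks


def find(disks, key):
    """Placement says nothing about the key, so disks are searched one by one."""
    for searched, disk in enumerate(disks, start=1):
        for record in disk:
            if record[0] == key:
                return record, searched
    return None, len(disks)


disks = round_robin_partition([(k, f"row{k}") for k in range(1, 8)], 3)
print([[r[0] for r in d] for d in disks])  # [[1, 4, 7], [2, 5], [3, 6]]
print(find(disks, 6))                      # ((6, 'row6'), 3)
```

Finding key 6 required searching all three disks, which is the cost that intelligent partitioning avoids.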
• Intelligent partitioning assumes that the DBMS knows
where a specific record is located and does not waste
time searching for it across all disks. The various
intelligent partitioning schemes include:
• Hash partitioning, key range partitioning, schema
partitioning and user-defined partitioning
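The two most common intelligent schemes can be sketched as functions that map a key straight to a disk number, so a lookup touches one disk only. This is a simplified illustration with hypothetical helper names; real DBMSs choose their own hash functions and partition bounds.

```python
import bisect
import zlib


def hash_disk(key, n_disks):
    # Hash partitioning: a deterministic hash of the key selects the disk.
    # zlib.crc32 is used because Python's built-in hash() of strings changes between runs.
    return zlib.crc32(str(key).encode()) % n_disks


def range_disk(key, upper_bounds):
    # Key range partitioning: disk i holds keys up to and including upper_bounds[i].
    i = bisect.bisect_left(upper_bounds, key)
    if i == len(upper_bounds):
        raise ValueError(f"key {key!r} is above the last partition bound")
    return i


def partition(records, locate):
    disks = {}
    for record in records:
        disks.setdefault(locate(record[0]), []).append(record)
    return disks


def lookup(disks, key, locate):
    # The DBMS computes the disk from the key and searches only that disk.
    return [record for record in disks.get(locate(key), []) if record[0] == key]


bounds = [100, 200, 300]


def by_range(key):
    return range_disk(key, bounds)


disks = partition([(50, "a"), (150, "b"), (250, "c")], by_range)
print(disks)                         # {0: [(50, 'a')], 1: [(150, 'b')], 2: [(250, 'c')]}
print(lookup(disks, 150, by_range))  # [(150, 'b')]
```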
• Database architectures for parallel processing:
Shared memory or shared everything architecture
Shared disk architecture
Shared nothing architecture
• Shared memory or shared everything architecture:
• Multiple processing units (PUs) share memory.
• Each PU has full access to all shared memory through a common
bus.
• Communication between nodes occurs via shared memory.
• Performance is limited by the bandwidth of the memory bus.
• It provides a consistent single-system image to the user.
• Threads provide better resource utilization and faster
context switching, thus providing better scalability.
• At the same time, threads that are too tightly coupled
with the OS may limit RDBMS portability.
• Scalability of shared-memory architectures is limited.
• Shared disk architecture:
• Each node consists of one or more PUs and associated
memory.
Memory is not shared between nodes.
Communication occurs over a common high-speed bus.
Each node has access to the same disks and other resources.
• It implements the concept of shared ownership of the
entire database between RDBMS servers, each of which is
running on a node of a distributed memory system.
• Each RDBMS server can read, write, update, and remove
data from the same shared database, which requires
the system to implement a form of distributed lock
manager (DLM).
• Since memory is not shared among the nodes, each
node has its own data cache.
Cache consistency must be maintained across the nodes,
and a lock manager is needed to maintain the consistency.
• There is additional overhead in maintaining the locks and
ensuring that the data caches are consistent.
• Shared disk architecture – Advantages:
• Shared disk systems permit high availability. All data
is accessible even if one node dies.
• These systems have the concept of one database,
which is an advantage over shared nothing systems.
• Shared disk systems provide for incremental growth.
• Shared disk architecture – Disadvantages:
Inter-node synchronization is required, involving DLM
overhead and greater dependency on a high-speed
interconnect.
If the workload is not partitioned well, there may be high
synchronization overhead.
There is operating system overhead of running shared disk
software.
• Shared nothing architecture:
• Shared nothing systems are typically loosely coupled.
In shared nothing systems, only one CPU is connected
to a given disk.
If a table or database is located on that disk, access
depends entirely on the PU that owns it.
• Shared nothing architecture – Advantages:
• Shared nothing systems provide for incremental
growth.
• Failure is local: if one node fails, the others stay
up.
• Shared nothing architecture – Disadvantages:
• More coordination is required.
• More overhead is required for a process working
on a disk belonging to another node.
• Parallel DBMS Vendors – Oracle:
Architecture: shared disk architecture
Data partition: key range, hash, round robin
Parallel operations: hash joins, scan and sort
• Parallel DBMS Vendors – Informix:
Architecture: shared memory, shared disk and
shared nothing models
Data partition: round robin, hash, schema, key
range and user defined
Parallel operations: INSERT, UPDATE, DELETE
• Parallel DBMS Vendors – IBM:
• Architecture: shared nothing models
• Data partition: hash
• Parallel operations: INSERT, UPDATE,
DELETE, load, recovery, index creation, backup,
table reorganization
• Parallel DBMS Vendors – SYBASE:
Architecture: shared nothing models
Data partition: hash, key range, schema
Parallel operations: horizontal and vertical
parallelism
DBMS Schemas for Decision
Support
A data cube allows data to be modeled and
viewed in multiple dimensions. It is defined by
dimensions and facts.
• In general terms, dimensions are the perspectives
or entities with respect to which an organization
wants to keep records.
• Example:
A sales data warehouse may keep records of the
store's sales with respect to the dimensions time, item,
branch, and location.
• Each dimension may have a table associated with it, called
a dimension table, which further describes the dimension.
A dimension table for item may contain the attributes item
name, brand, and type.
A multidimensional data model is typically organized
around a central theme, such as sales. This theme
is represented by a fact table.
Facts are numerical measures.
Examples of facts for a sales data warehouse include
dollars sold (sales amount in dollars), units sold (number of
units sold), and amount budgeted.
• Star schema
• The most popular data model for a data warehouse is a
multidimensional model.
• In a star schema, the data warehouse contains a large central table
(fact table) containing the bulk of the data, with no redundancy, and
• A set of smaller associated tables (dimension tables), one for each
dimension.
• The dimension tables are displayed in a radial pattern around the
central fact table.
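The star layout described above can be tried out with Python's built-in `sqlite3` module. The sketch follows the time, item, branch and location example from this unit, but the column lists and rows are invented toy data; only the shape (one fact table holding foreign keys and measures, surrounded by one table per dimension) reflects the text.

```python
import sqlite3

# Toy star schema: one central fact table, one table per dimension.
STAR_SCHEMA = """
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT);
-- The fact table holds a foreign key to each dimension plus the numeric measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim,
    item_key     INTEGER REFERENCES item_dim,
    branch_key   INTEGER REFERENCES branch_dim,
    location_key INTEGER REFERENCES location_dim,
    dollars_sold REAL,
    units_sold   INTEGER
);
"""

# Star join: every dimension is exactly one join away from the fact table.
STAR_QUERY = """
SELECT t.quarter, i.brand, SUM(f.dollars_sold)
FROM sales_fact f
JOIN time_dim t ON f.time_key = t.time_key
JOIN item_dim i ON f.item_key = i.item_key
GROUP BY t.quarter, i.brand
ORDER BY t.quarter, i.brand
"""


def build_star():
    con = sqlite3.connect(":memory:")
    con.executescript(STAR_SCHEMA)
    con.executemany("INSERT INTO time_dim VALUES (?, ?)", [(1, "Q1"), (2, "Q2")])
    con.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?)",
                    [(1, "TV", "BrandA", "electronics"), (2, "Phone", "BrandB", "electronics")])
    con.execute("INSERT INTO branch_dim VALUES (1, 'B1')")
    con.execute("INSERT INTO location_dim VALUES (1, 'City1')")
    # Columns: time_key, item_key, branch_key, location_key, dollars_sold, units_sold
    con.executemany("INSERT INTO sales_fact VALUES (?, ?, 1, 1, ?, ?)",
                    [(1, 1, 500.0, 1), (1, 2, 200.0, 2), (2, 1, 450.0, 1)])
    return con


con = build_star()
for row in con.execute(STAR_QUERY):
    print(row)
# ('Q1', 'BrandA', 500.0)
# ('Q1', 'BrandB', 200.0)
# ('Q2', 'BrandA', 450.0)
```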
• Star schema – Characteristics:
Simple structure: an easy-to-understand schema
Great query effectiveness: a small number of tables to
join
Relatively long time to load data into dimension
tables: denormalization introduces redundant data, so
the tables can be large.
• Snowflake schema:
• The snowflake schema is a variant of the star schema
model, where some dimension tables are normalized,
thereby further splitting the data into additional tables.
• The resulting schema graph forms a shape similar to a
snowflake.
• The major difference between the snowflake and star
schema models is that the dimension tables of the
snowflake model may be kept in normalized form to
reduce redundancies.
• Such tables are easy to maintain and save storage
space.
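Continuing with `sqlite3`, the sketch below normalizes a hypothetical item dimension by moving supplier details into their own table. Storage redundancy drops, but a query by supplier type now needs one more join (fact to item to supplier) than the equivalent star-schema query would. Table names and rows are illustrative assumptions.

```python
import sqlite3

SNOWFLAKE_SETUP = """
-- Normalized dimension: supplier attributes move out of item_dim into their own table.
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT,
                       supplier_key INTEGER REFERENCES supplier_dim);
CREATE TABLE sales_fact (item_key INTEGER REFERENCES item_dim, dollars_sold REAL);
INSERT INTO supplier_dim VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item_dim VALUES (10, 'TV', 1), (11, 'Radio', 1), (12, 'Phone', 2);
INSERT INTO sales_fact VALUES (10, 500.0), (11, 80.0), (12, 200.0);
"""

# One more join than the star version: fact -> item -> supplier.
SNOWFLAKE_QUERY = """
SELECT s.supplier_type, SUM(f.dollars_sold)
FROM sales_fact f
JOIN item_dim i     ON f.item_key = i.item_key
JOIN supplier_dim s ON i.supplier_key = s.supplier_key
GROUP BY s.supplier_type
ORDER BY s.supplier_type
"""


def build_snowflake():
    con = sqlite3.connect(":memory:")
    con.executescript(SNOWFLAKE_SETUP)
    return con


print(build_snowflake().execute(SNOWFLAKE_QUERY).fetchall())
# [('retail', 200.0), ('wholesale', 580.0)]
```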
Fact constellation
• Sophisticated applications may require multiple
fact tables to share dimension tables.
• This kind of schema can be viewed as a collection
of stars, and hence is called a galaxy schema or a
fact constellation.
Business Analysis: Reporting and
Query Tools and Applications
• The principal purpose of data warehousing is to
provide information to business users for strategic
decision making.
• These users interact with the data warehouse using
front-end tools, or by getting the required information
through the information delivery system.
• Tool Categories
• Reporting
• Managed query
• Executive information systems
• OLAP
• Data Mining
• Tool Categories – Reporting:
• Production reporting tools: Companies use them to generate
regular operational reports. Example: calculating
and printing paychecks.
• Report writers: They have graphical
interfaces and built-in charting functions.
• Tool Categories – Managed Query Tools:
• These tools shield end users from the complexities
of SQL and database structures by inserting a
metalayer between users and the database.
• They make it possible for knowledge workers to
access corporate data without IS intervention.
• Tool Categories – EIS Tools:
• EIS – Executive Information System
• It allows developers to build customized,
graphical decision support applications that give
managers and executives a high-level view of the
business and access to external sources.
• Tool Categories – OLAP Tools:
• A natural way to view corporate data.
• These tools aggregate data along common
business subjects or dimensions.
• Users can drill down, across or up levels in each
dimension.
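Drilling down and rolling up amount to re-aggregating the same facts at a finer or coarser level of a dimension hierarchy. The pure-Python sketch below assumes a hypothetical time hierarchy (year, quarter, month) and made-up sales rows; the function names are illustrative, not any OLAP tool's API.

```python
from collections import defaultdict

# Hypothetical sales facts at the finest grain of the time hierarchy.
SALES = [
    {"year": 2023, "quarter": "Q1", "month": "Jan", "amount": 100},
    {"year": 2023, "quarter": "Q1", "month": "Feb", "amount": 120},
    {"year": 2023, "quarter": "Q2", "month": "Apr", "amount": 90},
    {"year": 2024, "quarter": "Q1", "month": "Jan", "amount": 130},
]

TIME_HIERARCHY = ["year", "quarter", "month"]  # coarse -> fine


def aggregate(rows, level):
    """Sum amounts at the given level of the time hierarchy."""
    depth = TIME_HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[attr] for attr in TIME_HIERARCHY[:depth])
        totals[key] += row["amount"]
    return dict(totals)


def drill_down(level):
    # Move to the next finer level (for example, year -> quarter).
    return TIME_HIERARCHY[min(TIME_HIERARCHY.index(level) + 1, len(TIME_HIERARCHY) - 1)]


def roll_up(level):
    # Move to the next coarser level (for example, quarter -> year).
    return TIME_HIERARCHY[max(TIME_HIERARCHY.index(level) - 1, 0)]


print(aggregate(SALES, "year"))  # {(2023,): 310, (2024,): 130}
level = drill_down("year")
print(aggregate(SALES, level))   # {(2023, 'Q1'): 220, (2023, 'Q2'): 90, (2024, 'Q1'): 130}
print(roll_up(level))            # year
```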
• Tool Categories – Data Mining Tools:
• They use a variety of statistical and artificial-intelligence
algorithms to analyze the correlation of variables in the data
and to find relationships worth investigating.
• Cognos Impromptu:
• Impromptu is an interactive database reporting tool.
• It allows power users to query data without
programming knowledge.
• When using the Impromptu tool, no data is written or
changed in the database. It is only capable of reading
the data.
• Cognos Impromptu – Features:
• Interactive reporting capability
Enterprise-wide scalability
Superior user interface
Fastest time to result
Lowest cost of ownership
• Catalogs:
• Impromptu stores metadata in subject-related folders.
• This metadata will be used to develop a query for a
report. The metadata set is stored in a file called a
catalog.
• The catalog does not contain any data.
• It just contains information about connecting to
the database and the fields that will be accessible
for reports.
• A catalog contains:
• Folders: meaningful groups of information representing columns from
one or more tables
• Columns: individual data elements that can appear in one or more
folders
• Calculations: expressions used to compute required values from
existing data
• Conditions: used to filter information so that only a certain type of
information is displayed
• Prompts: predefined selection criteria prompts
that users can include in reports they create
• Other components, such as metadata, a logical
database name, join information, and user classes
• Catalogs – Uses:
View, run, and print reports
Export reports to other applications
Disconnect from and connect to the database
Create reports
Change the contents of the catalog
• Reports:
• Reports are created by choosing fields from the
catalog folders. This process builds an SQL
(Structured Query Language) statement behind the
scenes.
• No SQL knowledge is required to use Impromptu.
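The idea of building SQL behind the scenes from catalog picks can be sketched in Python. Everything here is hypothetical: the `CATALOG` dictionary, the `JOINS` table and `build_sql` only illustrate how business-friendly folder and column names could be mapped to tables and columns; they are not Impromptu's actual catalog format or API.

```python
from itertools import combinations

# Hypothetical catalog: folder -> {business column name: table.column}.
CATALOG = {
    "Customers": {"Customer Name": "customer.name", "Region": "customer.region"},
    "Orders": {"Order Total": "orders.total"},
}
# Hypothetical join information, keyed by the pair of tables it connects.
JOINS = {frozenset({"customer", "orders"}): "customer.id = orders.customer_id"}


def build_sql(fields, condition=None):
    """Turn (folder, column) picks into a SELECT statement the user never writes."""
    columns = [CATALOG[folder][col] for folder, col in fields]
    tables = sorted({column.split(".")[0] for column in columns})
    # Add a join predicate for every pair of chosen tables the catalog knows how to join.
    where = [JOINS[frozenset(pair)] for pair in combinations(tables, 2)
             if frozenset(pair) in JOINS]
    if condition:
        # For brevity the filter condition is a plain SQL fragment here.
        where.append(condition)
    sql = f"SELECT {', '.join(columns)} FROM {', '.join(tables)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


print(build_sql([("Customers", "Customer Name"), ("Orders", "Order Total")],
                "customer.region = 'North'"))
# SELECT customer.name, orders.total FROM customer, orders WHERE customer.id = orders.customer_id AND customer.region = 'North'
```

The user only chooses "Customer Name" and "Order Total"; the table names, the join and the SQL syntax all come from the catalog.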
• The data in the report may be formatted, sorted and/or
grouped as needed.
• Titles, dates, headers and footers and other standard text
formatting features (italics, bolding, and font size) are also
available.
• Once the desired layout is obtained, the report can be
saved to a report file.
• Frame-Based Reporting
• Frames are the building blocks of all Impromptu reports and
templates.
• They may contain report objects, such as data, text, pictures, and
charts.
• There are no limits to the number of frames that you can place
within an individual report or template. You can nest frames
within other frames to group report objects within a report.
• Frame types and what appears when each is inserted:
• Form frame: An empty form frame appears.
• List frame: An empty list frame appears.
• Text frame: The flashing I-beam appears where you can begin
inserting text.
• Picture frame: The Source tab (Picture Properties dialog box)
appears. You can use this tab to select the image to include in the
frame.
• Chart frame: The Data tab (Chart Properties dialog box)
appears. You can use this tab to select the data item to
include in the chart.
• OLE Object: The Insert Object dialog box appears, where
you can locate and select the file you want to insert, or you
can create a new object using the software listed in the
Object Type box.
Business Analysis: Reporting and
Query Tools and Applications
• Impromptu features
  - Unified query and reporting interface: query and reporting are combined in a single user interface.
  - Object-oriented architecture: inheritance-based administration allows more than 1,000 users to be accommodated as easily as a single user.
Business Analysis: Reporting and
Query Tools and Applications
• Impromptu features
  - Scalability: it scales from a single user to 1,000 users.
  - Security and control: security is based on user profiles and their classes.
  - Data presented in a business context: information is presented using the terminology of the business.
Business Analysis: Reporting and
Query Tools and Applications
• Impromptu features
  - Over 70 predefined report templates: users can simply supply the data to create an interactive report.
  - Frame-based reporting: a number of objects are offered for creating user-designed reports.
  - Business-relevant reporting: business-relevant reports can be generated through filters, preconditions, and calculations.
Online Analytical Processing
(OLAP)
• OLAP uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis, and querying of large amounts of data.
• Online Analytical Processing (OLAP) applications and tools are designed to ask complex queries of large multidimensional collections of data.
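Here is a small Python sketch of how a fact table and its dimension tables work together. Each fact row holds foreign keys into the dimensions plus a numeric measure. A query joins each row to its descriptive attributes and then aggregates. All table contents and names (`product_dim`, `time_dim`, `sales_fact`, `sales_by_category_and_quarter`) are made up for illustration.

```python
from collections import defaultdict

# Hypothetical dimension tables: surrogate key -> descriptive attributes.
product_dim = {
    1: {"name": "Laptop", "category": "Computers"},
    2: {"name": "Phone", "category": "Mobiles"},
    3: {"name": "Tablet", "category": "Mobiles"},
}
time_dim = {10: {"quarter": "Q1"}, 11: {"quarter": "Q2"}}

# Hypothetical fact table: foreign keys into each dimension plus a measure.
sales_fact = [
    {"product_key": 1, "time_key": 10, "amount": 1200},
    {"product_key": 2, "time_key": 10, "amount": 800},
    {"product_key": 3, "time_key": 11, "amount": 500},
    {"product_key": 2, "time_key": 11, "amount": 700},
]


def sales_by_category_and_quarter():
    """Join each fact row to its dimension rows, then sum the measure."""
    totals = defaultdict(int)
    for row in sales_fact:
        category = product_dim[row["product_key"]]["category"]
        quarter = time_dim[row["time_key"]]["quarter"]
        totals[(category, quarter)] += row["amount"]
    return dict(totals)


print(sales_by_category_and_quarter())
```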
Online Analytical Processing
(OLAP)
• Need:
  - The need for OLAP arises from the multidimensional nature of business problems.
  - These problems are characterized by retrieving a very large number of records, which can reach gigabytes and terabytes, and summarizing this data into information that business analysts can use.
Online Analytical Processing
(OLAP)
• The Multidimensional Data Model:
  - Because OLAP is online, it must provide answers quickly; analysts pose iterative queries during interactive sessions.
  - The multidimensional data model views the data as a cube.
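One simple way to picture the cube is as a mapping from one member per dimension to a measure. The teaching sketch below makes that assumption. The dimensions, members, sales figures, and the `cell` helper are hypothetical, and real OLAP servers use far more elaborate storage.

```python
# Hypothetical dimensions of a small sales cube.
DIMENSIONS = ("product", "city", "quarter")

# Each cell is addressed by one member from every dimension; the value is the measure.
cube = {
    ("Laptop", "Chennai", "Q1"): 120,
    ("Laptop", "Mumbai", "Q1"): 90,
    ("Phone", "Chennai", "Q2"): 200,
    ("Phone", "Mumbai", "Q2"): 150,
}


def cell(cube, **coords):
    """Look up one cell by naming a member for every dimension (empty cells read as 0)."""
    key = tuple(coords[d] for d in DIMENSIONS)
    return cube.get(key, 0)


print(cell(cube, product="Phone", city="Mumbai", quarter="Q2"))  # 150
```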
Online Analytical Processing
(OLAP)
• The Multidimensional Data Model – Operations:
  - Roll-up: the roll-up operation (also called the drill-up operation by some vendors) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.
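This minimal Python sketch shows both forms of roll-up described above. It assumes a two-dimensional cube keyed by (city, quarter) and a hypothetical city-to-country concept hierarchy. `roll_up_location` climbs the hierarchy, and `drop_dimension` performs dimension reduction. All names and figures are illustrative.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (city, quarter) -> sales.
cube = {
    ("Chennai", "Q1"): 120, ("Mumbai", "Q1"): 90, ("Toronto", "Q1"): 60,
    ("Chennai", "Q2"): 200, ("Toronto", "Q2"): 40,
}

# Hypothetical concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Chennai": "India", "Mumbai": "India", "Toronto": "Canada"}


def roll_up_location(cube):
    """Climb the location hierarchy from city to country, summing sales."""
    out = defaultdict(int)
    for (city, quarter), sales in cube.items():
        out[(CITY_TO_COUNTRY[city], quarter)] += sales
    return dict(out)


def drop_dimension(cube, index):
    """Dimension reduction: aggregate away the dimension at position `index`."""
    out = defaultdict(int)
    for key, sales in cube.items():
        out[key[:index] + key[index + 1:]] += sales
    return dict(out)


print(roll_up_location(cube))
print(drop_dimension(cube, 0))  # totals per quarter only
```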
Online Analytical Processing
(OLAP)
• The Multidimensional Data Model – Operations:
  - Drill-down: drill-down is the reverse of roll-up. It navigates from less detailed data to more detailed data.
  - Drill-down can be realized either by stepping down a concept hierarchy for a dimension or by introducing additional dimensions.
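Drill-down can only go as deep as the detail that is actually stored. In this hedged sketch, the base data is kept at month level and the quarterly figures are the summarized view. `drill_down` steps from one quarter back down to its months. The month-to-quarter hierarchy, the figures, and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical base (detailed) data at month level: month -> sales.
monthly_sales = {"Jan": 40, "Feb": 35, "Mar": 45, "Apr": 50, "May": 30, "Jun": 20}

# Hypothetical time hierarchy: month -> quarter.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}


def quarterly_view():
    """The less detailed view: sales summed per quarter."""
    out = defaultdict(int)
    for month, sales in monthly_sales.items():
        out[MONTH_TO_QUARTER[month]] += sales
    return dict(out)


def drill_down(quarter):
    """Step down the time hierarchy from one quarter to its months."""
    return {m: s for m, s in monthly_sales.items() if MONTH_TO_QUARTER[m] == quarter}


print(quarterly_view())   # {'Q1': 120, 'Q2': 100}
print(drill_down("Q2"))   # {'Apr': 50, 'May': 30, 'Jun': 20}
```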
Online Analytical Processing
(OLAP)
• The Multidimensional Data Model – Operations:
  - Slice and dice: the slice operation performs a selection on one dimension of the given cube, resulting in a subcube.
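This is a minimal sketch of slicing, assuming a cube stored as a Python dict keyed by (product, city, quarter). `slice_cube` fixes one dimension to a single member. Because that dimension then has only one value, it is dropped from the keys of the resulting subcube. The data and names are hypothetical.

```python
# Hypothetical dimensions and cube cells keyed by (product, city, quarter) -> sales.
DIMS = ("product", "city", "quarter")
cube = {
    ("Laptop", "Chennai", "Q1"): 120, ("Laptop", "Mumbai", "Q2"): 90,
    ("Phone", "Chennai", "Q1"): 200, ("Phone", "Mumbai", "Q1"): 150,
}


def slice_cube(cube, dim, member):
    """Select one member on one dimension; that dimension is removed from the keys."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == member}


print(slice_cube(cube, "quarter", "Q1"))
```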
Online Analytical Processing
(OLAP)
• The Multidimensional Data Model – Operations:
  - The dice operation defines a subcube by performing a selection on two or more dimensions.
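The dice operation can be sketched the same way, again assuming a dict-based cube with hypothetical data. Unlike a slice, each constrained dimension may keep several members, so every dimension is retained in the result. `dice` and its keyword-argument interface are illustrative only.

```python
# Hypothetical dimensions and cube cells keyed by (product, city, quarter) -> sales.
DIMS = ("product", "city", "quarter")
cube = {
    ("Laptop", "Chennai", "Q1"): 120, ("Laptop", "Mumbai", "Q2"): 90,
    ("Phone", "Chennai", "Q1"): 200, ("Phone", "Mumbai", "Q1"): 150,
    ("Phone", "Delhi", "Q2"): 80,
}


def dice(cube, **criteria):
    """Keep only cells whose members are allowed on every constrained dimension."""
    if len(criteria) < 2:
        raise ValueError("dice selects on two or more dimensions")
    allowed = {DIMS.index(dim): set(members) for dim, members in criteria.items()}
    return {key: value for key, value in cube.items()
            if all(key[i] in ok for i, ok in allowed.items())}


print(dice(cube, city=["Chennai", "Mumbai"], quarter=["Q1"]))
```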
Online Analytical Processing
(OLAP)
• The Multidimensional Data Model – Operations:
  - Pivot (also called rotate) is a visualization operation that rotates the data axes in view to provide an alternative data presentation.
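Pivoting changes only the presentation, not the data. Assuming a two-dimensional view stored as a dict of dicts (rows mapped to columns), this hypothetical `pivot` function swaps the two axes.

```python
def pivot(view):
    """Swap the row and column axes of a 2-D view stored as {row: {column: value}}."""
    rotated = {}
    for row, cols in view.items():
        for col, value in cols.items():
            rotated.setdefault(col, {})[row] = value
    return rotated


# Hypothetical view: products as rows, quarters as columns.
view = {"Laptop": {"Q1": 120, "Q2": 90}, "Phone": {"Q1": 200, "Q2": 150}}
print(pivot(view))  # quarters as rows, products as columns
```

Pivoting twice returns the original view, which confirms that no values were added, removed, or aggregated.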
OLAP vs OLTP
• Source of data
  - OLTP: operational data; OLTP systems are the original source of the data.
  - OLAP: consolidated data; OLAP data comes from the various OLTP databases.
• Purpose of data
  - OLTP: to control and run fundamental business tasks.
  - OLAP: to help with planning, problem solving, and decision support.
• What the data reveals
  - OLTP: a snapshot of ongoing business processes.
  - OLAP: multidimensional views of various kinds of business activities.
• Queries
  - OLTP: relatively standardized and simple queries returning relatively few records.
  - OLAP: often complex queries involving aggregations.
• Processing speed
  - OLTP: typically very fast.
  - OLAP: depends on the amount of data involved.
• Space requirements
  - OLTP: can be relatively small if historical data is archived.
  - OLAP: larger, due to aggregation structures and history data.
• Database design
  - OLTP: highly normalized with many tables.
  - OLAP: typically de-normalized with fewer tables; uses star and/or snowflake schemas.
• Backup and recovery
  - OLTP: backed up regularly, because data loss is likely to entail significant monetary loss and legal liability.
  - OLAP: instead of regular backups, some environments may simply reload the OLTP data as a recovery method.
Online Analytical Processing
(OLAP)
• Categories of OLAP Tools – MOLAP:
  - This is the more traditional way of OLAP analysis.
  - In MOLAP, data is stored in a multidimensional cube.
  - The storage is not in the relational database but in proprietary formats; that is, data is stored in array-based structures.
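Array-based storage lets MOLAP locate a cell by arithmetic rather than by searching. The Python sketch below assumes a dense three-dimensional array flattened in row-major order. The member lists, `offset`, `store`, and `fetch` are hypothetical, and commercial MOLAP products use their own proprietary formats.

```python
from array import array

# Hypothetical dimension members; a member's position in its list is its array index.
PRODUCTS = ["Laptop", "Phone"]
CITIES = ["Chennai", "Mumbai", "Delhi"]
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
SHAPE = (len(PRODUCTS), len(CITIES), len(QUARTERS))

# One dense, zero-filled slot per possible cell: 2 * 3 * 4 = 24 doubles.
cells = array("d", [0.0] * (SHAPE[0] * SHAPE[1] * SHAPE[2]))


def offset(product, city, quarter):
    """Row-major position of a cell, computed from the member indices."""
    p = PRODUCTS.index(product)   # list.index keeps the sketch short
    c = CITIES.index(city)
    q = QUARTERS.index(quarter)
    return (p * SHAPE[1] + c) * SHAPE[2] + q


def store(product, city, quarter, value):
    cells[offset(product, city, quarter)] = value


def fetch(product, city, quarter):
    return cells[offset(product, city, quarter)]


store("Phone", "Mumbai", "Q2", 150.0)
print(fetch("Phone", "Mumbai", "Q2"))  # 150.0
```

The trade-off is that a dense array reserves a slot for every possible combination, including empty ones, so sparse data can waste considerable space.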
Online Analytical Processing
(OLAP)
• Categories of OLAP Tools – ROLAP:
  - This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP slicing and dicing functionality.
  - In essence, each slicing or dicing action is equivalent to adding a "WHERE" clause to the SQL statement.
  - Data is stored in relational tables.
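Python's built-in `sqlite3` module can demonstrate the point that slicing and dicing amount to adding a WHERE clause. In this sketch the table, its columns, the sample rows, and the helpers `rolap_query` and `run` are hypothetical. Aggregation is expressed as GROUP BY.

```python
import sqlite3

# Relational storage: the fact data lives in an ordinary SQL table (in memory here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, city TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Laptop", "Chennai", "Q1", 120), ("Laptop", "Mumbai", "Q2", 90),
     ("Phone", "Chennai", "Q1", 200), ("Phone", "Mumbai", "Q1", 150)],
)

# Column names cannot be bound as SQL parameters, so they are whitelisted instead.
DIMENSIONS = {"product", "city", "quarter"}


def rolap_query(group_by, **filters):
    """Build SQL in which each slice/dice selection becomes a WHERE condition."""
    if not group_by:
        raise ValueError("group_by needs at least one dimension")
    for col in [*group_by, *filters]:
        if col not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {col}")
    cols = ", ".join(group_by)
    sql = f"SELECT {cols}, SUM(amount) FROM sales"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql, list(filters.values())


def run(group_by, **filters):
    sql, params = rolap_query(group_by, **filters)
    return conn.execute(sql, params).fetchall()


print(rolap_query(["product"], quarter="Q1")[0])
print(run(["product"], quarter="Q1"))
```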
Online Analytical Processing
(OLAP)
• Categories of OLAP Tools – HOLAP (MQE: Managed Query Environment)
  - HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP.
  - For summary-type information, HOLAP leverages cube technology for faster performance.
  - It stores only the indexes and aggregations in multidimensional form, while the rest of the data is stored in the relational database.
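This hedged sketch illustrates the hybrid idea. Precomputed aggregations sit in an in-memory, dict-based cube that answers summary questions directly. Detailed rows remain in a relational table, here an in-memory `sqlite3` database. For brevity, only aggregations (not indexes) are kept in multidimensional form. The table, the data, `total`, and `details` are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Relational side: detailed rows stay in an SQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, quarter TEXT, order_id INTEGER, amount REAL)")
rows = [("Laptop", "Q1", 1, 70), ("Laptop", "Q1", 2, 50),
        ("Phone", "Q1", 3, 200), ("Phone", "Q2", 4, 150)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Multidimensional side: precomputed aggregates keyed by (product, quarter).
summary_cube = defaultdict(float)
for product, quarter, _, amount in rows:
    summary_cube[(product, quarter)] += amount


def total(product, quarter):
    """Summary question: answered from the cube without querying the table."""
    return summary_cube.get((product, quarter), 0.0)


def details(product, quarter):
    """Detail question: falls through to the relational store."""
    return conn.execute(
        "SELECT order_id, amount FROM sales WHERE product = ? AND quarter = ? ORDER BY order_id",
        (product, quarter),
    ).fetchall()


print(total("Laptop", "Q1"))    # 120.0
print(details("Laptop", "Q1"))  # [(1, 70.0), (2, 50.0)]
```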

20IT501_DWDM_PPT_Unit_I.ppt

  • 1. 20IT501 – Data Warehousing and Data Mining III Year / V Semester
  • 2. OBJECTIVES • The Student should be made to: Be familiar with the concepts of data warehouse Know the fundamentals of data mining Understand the importance of association rule mining Understand the techniques of classification and clustering Be aware of the recent trends of data mining
  • 3. UNIT I- DATA WAREHOUSING Data Warehouse – Data warehousing Components – Building a Data warehouse – Mapping the Data Warehouse to a Multiprocessor Architecture – DBMS Schemas for Decision Support. Business Analysis: Reporting and Query tools and Applications - Online Analytical Processing (OLAP) – Need – Multidimensional Data Model – OLAP Guidelines - Categories of OLAP Tools.
  • 4. Data Warehouse A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process
  • 5. Data Warehouse Subject Oriented:  it provides information around a subject rather than the organization's ongoing operations.  These subjects can be product, customers, suppliers, sales, revenue, etc.  A data warehouse does not focus on the ongoing operations, rather it focuses on modelling and analysis of data for decision making.
  • 6. Data Warehouse Integrated:  A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc.  This integration enhances the effective analysis of data.
  • 7. Data Warehouse Time Variant:  The data collected in a data warehouse is identified with a particular time period.  The data in a data warehouse provides information from the historical point of view.
  • 8. Data Warehouse Non-volatile Non-volatile means the previous data is not erased when new data is added to it.  A data warehouse is kept separate from the operational database and therefore frequent changes in operational database is not reflected in the data warehouse.
  • 9. Data Warehouse  It is a database designed for analytical tasks, using data from multiple applications.  It supports a relatively small number of users with relatively long interactions.  Its content is periodically updated.  It contains current and historical data to provide a historical perspective of information.
  • 10. Data Warehouse - Terminologies  Current detail data: It is organized along subject lines(customer profile data, sales data)  Data Mart: Summarized departmental data and is customized to suit the needs of a particular department that owns the data.
  • 11. Data Warehouse - Terminologies  Drill-down: To perform business analysis in a top- down fashion.  Metadata: Data about data. It contains the location and description of warehouse system components; names, definition, structure and content of the data warehouse and end-users views.
  • 12. Data Warehouse Operational Data Information Data Content Current Values Summarized, Derived Organization Application Subject Stability Dynamic Static until refreshed Data Structure Optimized for transactions Optimized for complex queries Access High Medium to low Response time Sub second (<1s) to 2-3s Several seconds to minutes
  • 14. Data Warehouse Components  Data Warehouse Database:  It is based on a relational database management system server that functions as the central repository for informational data.  This repository is surrounded by a number of key components designed to make the entire functional, manageable and accessible by both the operational systems that source data into the warehouse and by end-user query and analysis tool.
  • 15. Data Warehouse Components  Sourcing, Acquisition, Clean up, and Transformation Tools:  They perform conversions, summarization, key changes, structural changes and condensation.  The data transformation is required so that the information can be used by decision support tools.
  • 16. Data Warehouse Components  Sourcing, Acquisition, Clean up, and Transformation Tools – Functionalities:  To remove unwanted data from operational db  Converting to common data names and attributes  Calculating summaries and derived data  Establishing defaults for missing data  Accommodating source data definition changes
  • 17. Data Warehouse Components  Sourcing, Acquisition, Clean up, and Transformation Tools – Issues:  Data heterogeneity: It refers to the different way the data is defined and used in different modules.
  • 18. Data Warehouse Components  Sourcing, Acquisition, Clean up, and Transformation Tools – Issues: 
  • 19. Data Warehouse Components Meta data:  It is data about data. It is used for maintaining, managing and using the data warehouse.  Types:  Technical Meta data  Business Meta data
  • 20. Data Warehouse Components  Meta data - Technical Meta data: Information about data stores Transformation descriptions The rules used to perform clean up, and data enhancement Data mapping operations Access authorization, backup history, archive history, info delivery history, data acquisition history, data access etc.,
  • 21. Data Warehouse Components Meta data – Business Meta data:  It contains information that stored in data warehouse to users.  Subject areas, and info object type including queries, reports, images, video, audio clips etc.  Internet home pages  Information related to info delivery system
  • 22. Data Warehouse Components  Meta data:  Meta data helps the users to understand content and find the data. Meta data are stored in a separate data stores which is known as informational directory or Meta data repository. It helps to integrate, maintain and view the contents of the data warehouse
  • 23. Data Warehouse Components  Meta data – Characteristics:  It is the gateway to the data warehouse environment  It supports easy distribution and replication of content for high performance and availability  It should be searchable by business oriented key words  It should support the sharing of information  It should support and provide interface to other applications
  • 24. Data Warehouse Components Access Tools: Data query and reporting tools Application development tools Executive info system tools (EIS) OLAP tools Data mining tools
  • 25. Data Warehouse Components Data Marts:  Departmental subsets that focus on selected subjects. They are independent used by dedicated user group. They are used for rapid delivery of enhanced decision support functionality to end users.
  • 26. Data Warehouse Components  Data warehouse admin and management:  Security and priority management  Monitoring updates from multiple sources  Data quality checks  Managing and updating meta data  Auditing and reporting data warehouse usage and status  Replicating, sub setting and distributing data  Backup and recovery
  • 27. Data Warehouse Components Information delivery system It is used to enable the process of subscribing for data warehouse info. Delivery to one or more destinations according to specified scheduling algorithm
  • 28. Building a Data Warehouse - Reason  Business perspective:  Decisions need to be made quickly and correctly, using all available data.  Users are business domain expert, not computer professionals  Competition is heating up in the areas of business intelligence and added information value
  • 29. Building a Data Warehouse - Reason  Technological perspective:  The price of computer processing speed continues to decline.  The price of digital storage is rapidly dropping.  Network bandwidth is increasing.
  • 30. Building a Data Warehouse  Approaches  Top - Down Approach  Bottom - Up Approach
  • 31. Building a Data Warehouse  Approaches: Top - Down Approach  To build a centralized repository to house corporate wide business data.  This repository is called Enterprise Data Warehouse (EDW).  The data in the EDW is stored in a normalized form in order to avoid redundancy.
  • 32. Building a Data Warehouse  Approaches: Top - Down Approach  The data in the EDW is stored at the most detail level. The reason to build the EDW on the most detail level is to leverage Flexibility to be used by multiple departments. Flexibility to cater for future requirements.
  • 33. Building a Data Warehouse  Approaches: Top - Down Approach  The data in the EDW is stored at the most detail level. The reason to build the EDW on the most detail level is to leverage Flexibility to be used by multiple departments. Flexibility to cater for future requirements.
  • 34. Building a Data Warehouse  Approaches: Top - Down Approach  We should implement the top-down approach when The business has complete clarity on all or multiple subject areas data warehouse requirements. The business is ready to invest considerable time and money.
  • 35. Building a Data Warehouse  Approaches: Top - Down Approach – Advantages:  This is very important for the data to be reliable,  Consistent across subject areas
  • 36. Building a Data Warehouse  Approaches: Top - Down Approach – Disadvantages:  It requires more time and initial investment The business has to wait for the EDW to be implemented followed by building the data marts before which they can access their reports
  • 37. Building a Data Warehouse  Approaches: Bottom Up Approach  It is an incremental approach to build a data warehouse.  Here we build the data marts separately at different points of time as and when the specific subject area requirements are clear.
  • 38. Building a Data Warehouse  Approaches: Bottom Up Approach  The data marts are integrated or combined together to form a data warehouse.  Separate data marts are combined through the use of conformed dimensions and conformed facts.
  • 39. Building a Data Warehouse  Approaches: Bottom Up Approach  We should implement the bottom up approach when  We have initial cost and time constraints. The complete warehouse requirements are not clear. We have clarity to only one data mart.
  • 40. Building a Data Warehouse  Approaches: Bottom Up Approach - Advantages  They do not require high initial costs and have a faster implementation time;  Hence the business can start using the marts much earlier as compared to the top-down approach.
  • 41. Building a Data Warehouse  Approaches: Bottom Up Approach - Disadvantages  It stores data in the de normalized format, hence there would be high space usage for detailed data.  We have a tendency of not keeping detailed data
  • 42. Building a Data Warehouse  Design considerations: Most successful data warehouses that meet these requirements have these common characteristics:  Are based on a dimensional model  Contain historical and current data  Include both detailed and summarized data  Consolidate disparate data from multiple sources while retaining consistency
  • 43. Building a Data Warehouse  Design considerations: Data warehouse is difficult to build due to the following reason:  Heterogeneity of data sources  Use of historical data  Growing nature of data base
  • 44. Building a Data Warehouse  Data content  The content and structure of the data warehouse are reflected in its data model.  The data model is the template that describes how information will be organized within the integrated warehouse framework.  The data warehouse data must be a detailed data. It must be formatted, cleaned up and transformed to fit the warehouse data model.
  • 45. Building a Data Warehouse  Meta data It defines the location and contents of data in the warehouse. Meta data is searchable by users to find definitions or subject areas. In other words, it must provide decision support oriented pointers to warehouse data and thus provides a logical link between warehouse data and decision support applications.
  • 46. Building a Data Warehouse Data Distribution  The data can be distributed based on the subject area, location (geographical region), or time (current, month, year).
  • 47. Building a Data Warehouse  Tools  These tools provide facilities for defining the transformation and cleanup rules, data movement, end user query, reporting and data analysis.
  • 48. Building a Data Warehouse  Design steps  Choosing the subject matter  Deciding what a fact table represents  Identifying and conforming the dimensions  Choosing the facts  Storing precalculations in the fact table
  • 49. Building a Data Warehouse  Design steps  Rounding out the dimension tables  Choosing the duration of the database  The need to track slowly changing dimensions  Deciding the query priorities and query models
  • 50. Building a Data Warehouse  Technical Constraints – Issues:  The hardware platform that would house the data warehouse  The DBMS that supports the warehouse data  The communication infrastructure that connects data marts, operational systems and end users  The hardware and software to support the metadata repository  The systems management framework that enables administration of the entire environment
  • 51. Building a Data Warehouse  Implementation considerations  Collect and analyze business requirements  Create a data model and a physical design  Define data sources  Choose the database technology and platform  Extract the data from operational databases, transform it, clean it up and load it into the warehouse
  • 52. Building a Data Warehouse  Implementation considerations  Choose database access and reporting tools  Choose database connectivity software  Choose data analysis and presentation software  Update the data warehouse  Access tools
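One common way to handle a slowly changing dimension is the so-called Type 2 technique: when a tracked attribute changes, the current dimension row is closed and a new row with a new surrogate key is added, so older facts keep pointing at the old version. The Python sketch below is only an illustration under assumed names: the customer dimension is a plain list of dicts, and the columns `sk`, `customer_id`, `city`, `valid_from`, `valid_to` and `is_current` are hypothetical.

```python
from datetime import date

def apply_scd2(dimension, natural_key, new_attrs, change_date):
    """Type 2 change: close the current row and append a new version.

    `dimension` is a list of dicts (hypothetical layout): surrogate key `sk`,
    natural key `customer_id`, descriptive attributes, and validity fields.
    """
    current = next(
        (r for r in dimension if r["customer_id"] == natural_key and r["is_current"]),
        None,
    )
    if current is not None:
        # Nothing to record if the tracked attributes did not change.
        if all(current.get(k) == v for k, v in new_attrs.items()):
            return dimension
        # Close the old version instead of overwriting it.
        current["valid_to"] = change_date
        current["is_current"] = False
    next_sk = max((r["sk"] for r in dimension), default=0) + 1
    dimension.append({
        "sk": next_sk,
        "customer_id": natural_key,
        **new_attrs,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })
    return dimension

dim = []
apply_scd2(dim, "C1", {"city": "Chennai"}, date(2023, 1, 1))
apply_scd2(dim, "C1", {"city": "Madurai"}, date(2024, 6, 1))
for row in dim:
    print(row["sk"], row["city"], row["valid_from"], row["valid_to"], row["is_current"])
```

After the two calls, the dimension holds two versions of customer C1, and only the Madurai row is marked current.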
  • 53. Building a Data Warehouse  Access Tools Simple tabular form data Ranking data Multivariable data Time series data Graphing, charting and pivoting data
  • 54. Building a Data Warehouse  Access Tools Statistical analysis data Predefined repeatable queries Ad hoc user specified queries Reporting and analysis data
  • 55. Building a Data Warehouse  Data extraction, clean up, transformation and migration – Selection Criteria:  Timeliness of data delivery to the warehouse  The tool must be able to identify the particular data that can be read by the conversion tool  The tool must be able to merge data from multiple data stores.
  • 56. Building a Data Warehouse  Data extraction, clean up, transformation and migration – Selection Criteria:  The tool should be able to read data from the data dictionary  The code generated by the tool should be completely maintainable  The tool should permit the user to extract the required data
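The extract, clean-up, transform and load flow that these tools automate can be sketched in a few lines of Python. This is a hedged illustration, not the behaviour of any particular product: the two source lists, their field layouts and the `sales` table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical extracts from two operational sources with inconsistent formats.
crm_rows = [{"cust": " alice ", "amount": "120.50", "date": "2024-01-05"}]
billing_rows = [("BOB", 80, "05/01/2024")]

def extract():
    """Merge records from multiple data stores into one common shape."""
    records = [(r["cust"], r["amount"], r["date"]) for r in crm_rows]
    records += list(billing_rows)
    return records

def normalize_date(text):
    """Convert DD/MM/YYYY to ISO YYYY-MM-DD; leave ISO dates unchanged."""
    if "/" in text:
        day, month, year = text.split("/")
        return f"{year}-{month}-{day}"
    return text

def transform(records):
    """Clean up names, convert amounts to numbers, and standardize dates."""
    return [
        (name.strip().title(), float(amount), normalize_date(day))
        for name, amount, day in records
    ]

def load(rows, conn):
    """Load the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, sale_date TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM sales ORDER BY customer").fetchall())
# [('Alice', 120.5, '2024-01-05'), ('Bob', 80.0, '2024-01-05')]
```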
  • 57. Building a Data Warehouse  User levels - Casual users: They are most comfortable retrieving information from the warehouse in predefined formats and running preexisting queries and reports. These users do not need tools that allow for building standard and ad hoc reports.
  • 58. Building a Data Warehouse  User levels - Power users: These users can use predefined as well as user-defined queries to create simple and ad hoc reports. They can engage in drill-down operations and may have experience using reporting and query tools.
  • 59. Building a Data Warehouse  User levels - Expert users: These users tend to create their own complex queries and perform standard analysis on the information they retrieve. They know how to use query and report tools.
  • 60. Building a Data Warehouse  Benefits of data warehousing: Locating the right information Presentation of information Testing of hypothesis Discovery of information Sharing the analysis
  • 61. Building a Data Warehouse  Tangible benefits (quantifiable / measurable): They include  Improvement in product inventory  Reduction in production cost  Improvement in selection of target markets  Enhancement in asset and liability management
  • 62. Building a Data Warehouse  Intangible benefits (not easy to quantify): They include  Improvement in productivity by keeping all data in a single location and eliminating rekeying of data  Reduced redundant processing  Enhanced customer relations
  • 63. Mapping the Data Warehouse to a Multiprocessor Architecture  Parallel Query Processing:  Query optimization in distributed-memory parallel systems.  Parallel RDBMS processing offers a solution to the traditional problem of poor RDBMS performance for complex queries and very large databases.
  • 64. Mapping the Data Warehouse to a Multiprocessor Architecture  Parallel Query Processing: Accessing and processing portions of the database by individual threads in parallel can greatly improve query performance.  Speed-up: The ability to execute the same request on the same amount of data in less time.  Scale-up: The ability to obtain the same performance on the same requests as the database size increases.
  • 65. Mapping the Data Warehouse to a Multiprocessor Architecture  Types of parallelism:  Inter-query parallelism: Different server threads or processes handle multiple requests at the same time.  Intra-query parallelism: This form of parallelism decomposes a serial SQL query into lower-level operations such as scan, join and sort. These lower-level operations are then executed concurrently.
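Speed-up and scale-up are usually quantified as ratios of elapsed times. The helper functions and timings below are hypothetical and only show the arithmetic, following the common textbook convention: linear speed-up on N times the resources equals N, and linear scale-up equals 1.

```python
def speedup(time_on_small_system, time_on_large_system):
    """Same request, same data: how many times faster the larger system is."""
    return time_on_small_system / time_on_large_system

def scaleup(time_small_job_small_system, time_large_job_large_system):
    """Job size and system size grow together; 1.0 means performance was maintained."""
    return time_small_job_small_system / time_large_job_large_system

# Hypothetical timings (seconds) for a query on 1 node versus 4 nodes.
print(speedup(120.0, 40.0))   # 3.0 -> sub-linear (linear would be 4.0)
print(scaleup(120.0, 150.0))  # 0.8 -> sub-linear (linear would be 1.0)
```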
  • 66. Mapping the Data Warehouse to a Multiprocessor Architecture  Types of parallelism: Intra-query parallelism can be done in either of two ways:  Horizontal parallelism: The database is partitioned across multiple disks, and parallel processing occurs within a specific task that is performed concurrently on different processors against different sets of data.
  • 67. Mapping the Data Warehouse to a Multiprocessor Architecture  Types of parallelism: Intra-query parallelism can be done in either of two ways:  Vertical parallelism: This occurs among different tasks. All query components such as scan, join and sort are executed in parallel in a pipelined fashion. In other words, the output from one task becomes the input to another task.
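Horizontal parallelism can be pictured as the same scan-and-sum task running against different partitions at the same time, with the partial results combined at the end. In this minimal Python sketch the `partitions` list stands in for data spread over three disks; a thread pool is used only to keep the example self-contained (in CPython, threads will not speed up CPU-bound work, and a real parallel DBMS uses separate processes or nodes).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales table split into three partitions (one per "disk").
partitions = [
    [("north", 100), ("south", 50)],
    [("north", 30), ("east", 70)],
    [("south", 20), ("east", 10)],
]

def partial_sum(partition):
    """Scan one partition and return its partial SUM(amount)."""
    return sum(amount for _, amount in partition)

# The same task (scan + sum) runs concurrently on different sets of data.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(partial_sum, partitions))

total = sum(partials)  # The coordinator combines the partial results.
print(partials, total)  # [150, 100, 30] 280
```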
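Vertical parallelism can be sketched as a chain of operators connected by queues, each running in its own thread, so that a row produced by the scan can be filtered and aggregated while the scan is still producing later rows. The stage functions, the `DONE` sentinel and the sample rows below are illustrative assumptions, not an RDBMS interface.

```python
import queue
import threading

DONE = object()  # Sentinel marking the end of a stream.

def scan(rows, out_q):
    """First operator: emit rows one at a time."""
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)

def filter_stage(in_q, out_q, region):
    """Second operator: consumes scan output as soon as it arrives."""
    while (row := in_q.get()) is not DONE:
        if row[0] == region:
            out_q.put(row)
    out_q.put(DONE)

def aggregate(in_q, result):
    """Third operator: sums the filtered rows."""
    total = 0
    while (row := in_q.get()) is not DONE:
        total += row[1]
    result.append(total)

rows = [("north", 100), ("south", 50), ("north", 30), ("east", 70)]
q1, q2, result = queue.Queue(), queue.Queue(), []
stages = [
    threading.Thread(target=scan, args=(rows, q1)),
    threading.Thread(target=filter_stage, args=(q1, q2, "north")),
    threading.Thread(target=aggregate, args=(q2, result)),
]
for t in stages:
    t.start()
for t in stages:
    t.join()
print(result[0])  # 130
```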
  • 68. Mapping the Data Warehouse to a Multiprocessor Architecture  Data partitioning  Data partitioning is the key component for effective parallel execution of database operations. It spreads data from database tables across multiple disks so that I/O operations such as read and write can be performed in parallel. Partitioning can be done randomly or intelligently.
  • 69. Mapping the Data Warehouse to a Multiprocessor Architecture  Data partitioning  Random partitioning includes random data striping across multiple disks on a single server. Another option for random partitioning is round-robin partitioning, in which each record is placed on the next disk assigned to the database.  With random partitioning, the DBMS does not know where each record resides.
  • 70. Mapping the Data Warehouse to a Multiprocessor Architecture  Data partitioning  Intelligent partitioning assumes that the DBMS knows where a specific record is located and does not waste time searching for it across all disks. The various intelligent partitioning schemes include:  Hash partitioning, key range partitioning, schema partitioning and user-defined partitioning
  • 71. Mapping the Data Warehouse to a Multiprocessor Architecture  Database architectures for parallel processing:  Shared memory or shared everything architecture  Shared disk architecture  Shared nothing architecture
  • 72. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared memory or shared everything Architecture:  Multiple PUs share memory.  Each PU has full access to all shared memory through a common bus.  Communication between nodes occurs via shared memory.  Performance is limited by the bandwidth of the memory bus.  It provides a consistent single-system image to the user.
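The placement rules described above can be compared side by side in a small Python sketch. The function names, the three-disk layout and the range bounds are hypothetical; `zlib.crc32` is chosen as the hash only because, unlike Python's built-in `hash()` for strings, it gives the same placement on every run.

```python
import bisect
import zlib

def round_robin(records, n_disks):
    """Random-style placement: record i goes to disk i mod n_disks."""
    disks = [[] for _ in range(n_disks)]
    for i, rec in enumerate(records):
        disks[i % n_disks].append(rec)
    return disks

def hash_disk(key_value, n_disks):
    """Hash partitioning: the key alone determines the disk."""
    return zlib.crc32(str(key_value).encode()) % n_disks

def hash_partition(records, n_disks, key):
    disks = [[] for _ in range(n_disks)]
    for rec in records:
        disks[hash_disk(rec[key], n_disks)].append(rec)
    return disks

def key_range_partition(records, upper_bounds, key):
    """Key range partitioning: disk d holds keys k with
    upper_bounds[d-1] <= k < upper_bounds[d]; the last disk holds the rest."""
    disks = [[] for _ in range(len(upper_bounds) + 1)]
    for rec in records:
        disks[bisect.bisect_right(upper_bounds, rec[key])].append(rec)
    return disks

records = [{"id": i, "amount": i * 10} for i in range(1, 9)]
print([len(d) for d in round_robin(records, 3)])  # [3, 3, 2]
print([[r["id"] for r in d] for d in key_range_partition(records, [4, 7], "id")])
# [[1, 2, 3], [4, 5, 6], [7, 8]]
```

With round-robin placement, finding the record with a given key means checking every disk, whereas `hash_disk` and the range bounds let the system go straight to one disk.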
  • 73. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared memory or shared everything Architecture:
  • 74. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared memory or shared everything Architecture:  Threads provide better resource utilization and faster context switching, thus providing for better scalability.  At the same time, threads that are too tightly coupled with the OS may limit RDBMS portability.  Scalability of shared-memory architectures is limited.
  • 75. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture:
  • 76. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture:  Each node consists of one or more PUs and associated memory. Memory is not shared between nodes. Communication occurs over a common high-speed bus. Each node has access to the same disks and other resources.
  • 77. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture:  It implements the concept of shared ownership of the entire database among RDBMS servers, each of which runs on a node of a distributed memory system.  Each RDBMS server can read, write, update, and delete data in the same shared database, which requires the system to implement a form of distributed lock manager (DLM).
  • 78. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture:  Since the memory is not shared among the nodes, each node has its own data cache. Cache consistency must be maintained across the nodes and a lock manager is needed to maintain the consistency.  There is additional overhead in maintaining the locks and ensuring that the data caches are consistent
  • 79. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture – Advantages:  Shared disk systems permit high availability. All data is accessible even if one node dies.  These systems have the concept of one database, which is an advantage over shared nothing systems.  Shared disk systems provide for incremental growth.
  • 80. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture – Disadvantages:  Inter-node synchronization is required, involving DLM overhead and greater dependency on a high-speed interconnect.  If the workload is not partitioned well, there may be high synchronization overhead.  There is operating system overhead in running shared disk software.
  • 81. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared Nothing Architecture
  • 82. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared Nothing Architecture:  Shared nothing systems are typically loosely coupled. In shared nothing systems only one CPU is connected to a given disk. If a table or database is located on that disk, access depends entirely on the PU which owns it.
  • 83. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared Nothing Architecture – Advantages:  Shared nothing systems provide for incremental growth.  Failure is local: if one node fails, the others stay up.
  • 84. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared Nothing Architecture – Disadvantages:  More coordination is required.  More overhead is required for a process working on a disk belonging to another node.
  • 85. Mapping the Data Warehouse to a Multiprocessor Architecture  Parallel DBMS Vendors – Oracle: Architecture: shared disk architecture Data partition: Key range, hash, round robin Parallel operations: hash joins, scan and sort
  • 86. Mapping the Data Warehouse to a Multiprocessor Architecture  Parallel DBMS Vendors – Informix: Architecture: Shared memory, shared disk and shared nothing models Data partition: round robin, hash, schema, key range and user defined Parallel operations: INSERT, UPDATE, DELETE
  • 87. Mapping the Data Warehouse to a Multiprocessor Architecture  Parallel DBMS Vendors – IBM:  Architecture: Shared nothing models  Data partition: hash  Parallel operations: INSERT, UPDATE, DELETE, load, recovery, index creation, backup, table reorganization
  • 88. Mapping the Data Warehouse to a Multiprocessor Architecture  Parallel DBMS Vendors – SYBASE: Architecture: Shared nothing models Data partition: hash, key range, Schema Parallel operations: Horizontal and vertical parallelism
  • 89. DBMS Schemas for Decision Support A data cube allows data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts.  In general terms, dimensions are the perspectives or entities with respect to which an organization wants to keep records.
  • 90. DBMS Schemas for Decision Support  Example: A sales data warehouse may keep records of the store’s sales with respect to the dimensions time, item, branch, and location.  Each dimension may have a table associated with it, called a dimension table, which further describes the dimension. A dimension table for item may contain the attributes item name, brand, and type.
  • 91. DBMS Schemas for Decision Support  Example: A multidimensional data model is typically organized around a central theme, like sales, for instance. This theme is represented by a fact table. Facts are numerical measures. Examples of facts for a sales data warehouse include dollars sold (sales amount in dollars), units sold (number of units sold), and amount budgeted.
  • 92. DBMS Schemas for Decision Support  Example
  • 93. DBMS Schemas for Decision Support  Star schema  The most popular data model for a data warehouse is a multidimensional model.  A data warehouse contains a large central table (fact table) containing the bulk of the data, with no redundancy, and  A set of smaller associated tables (dimension tables), one for each dimension.  The dimension tables are displayed in a radial pattern around the central fact table.
  • 94. DBMS Schemas for Decision Support  Star schema
  • 95. DBMS Schemas for Decision Support  Star schema – Characteristics:  Simple structure -> easy-to-understand schema  Great query effectiveness -> small number of tables to join  Relatively long time to load data into dimension tables -> because of denormalization, redundant data can make the tables large.
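A star schema for the sales example (time, item and location dimensions, with dollars sold and units sold as facts) can be tried out in SQLite. The table and column names below are assumptions chosen for illustration; the point is the shape: one fact table holding foreign keys and measures, surrounded by denormalized dimension tables, and queries that join the fact table to only the dimensions they need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one per dimension, denormalized.
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Central fact table: foreign keys to each dimension plus numeric measures.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    dollars_sold REAL,
    units_sold INTEGER
);

INSERT INTO dim_time VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO dim_item VALUES (1, 'Laptop', 'Acme', 'computer'), (2, 'Phone', 'Acme', 'mobile');
INSERT INTO dim_location VALUES (1, 'Chennai', 'India'), (2, 'Toronto', 'Canada');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 1000.0, 2), (1, 2, 1, 300.0, 3),
    (2, 1, 2, 500.0, 1), (2, 2, 2, 200.0, 2);
""")

# A typical star query: join the fact table to the dimensions it needs, then aggregate.
query = """
SELECT t.quarter, l.country, SUM(f.dollars_sold)
FROM fact_sales f
JOIN dim_time t ON f.time_key = t.time_key
JOIN dim_location l ON f.location_key = l.location_key
GROUP BY t.quarter, l.country
ORDER BY t.quarter
"""
print(conn.execute(query).fetchall())
# [('Q1', 'India', 1300.0), ('Q2', 'Canada', 700.0)]
```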
  • 96. DBMS Schemas for Decision Support  Snowflake schema:  The snowflake schema is a variant of the star schema model, where some dimension tables are normalized, thereby further splitting the data into additional tables.  The resulting schema graph forms a shape similar to a snowflake.
  • 97. DBMS Schemas for Decision Support  Snowflake schema:
  • 98. DBMS Schemas for Decision Support  Snowflake schema:  The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model may be kept in normalized form to reduce redundancies.  Such a table is easy to maintain and saves storage space.
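To see the difference concretely, the sketch below normalizes the item dimension of a hypothetical sales schema: brand details move out of the item table into their own table. Brand names are now stored once, at the cost of one extra join when a query groups by brand. All table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The item dimension is normalized: brand details move to their own table.
CREATE TABLE dim_brand (brand_key INTEGER PRIMARY KEY, brand TEXT, supplier TEXT);
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT,
                       brand_key INTEGER REFERENCES dim_brand(brand_key));
CREATE TABLE fact_sales (item_key INTEGER REFERENCES dim_item(item_key), dollars_sold REAL);

INSERT INTO dim_brand VALUES (1, 'Acme', 'Supplier A'), (2, 'Globex', 'Supplier B');
INSERT INTO dim_item VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Tablet', 2);
INSERT INTO fact_sales VALUES (1, 1000.0), (2, 300.0), (3, 450.0);
""")

# Reaching a brand attribute now takes one extra join compared with a star schema.
query = """
SELECT b.brand, SUM(f.dollars_sold)
FROM fact_sales f
JOIN dim_item i ON f.item_key = i.item_key
JOIN dim_brand b ON i.brand_key = b.brand_key
GROUP BY b.brand
ORDER BY b.brand
"""
print(conn.execute(query).fetchall())
# [('Acme', 1300.0), ('Globex', 450.0)]
```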
  • 99. DBMS Schemas for Decision Support Fact constellation  Sophisticated applications may require multiple fact tables to share dimension tables.  This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact constellation.
  • 100. DBMS Schemas for Decision Support Fact constellation
  • 101. Business Analysis: Reporting and Query Tools and Applications  The principal purpose of data warehousing is to provide information to business users for strategic decision making.  These users interact with the data warehouse using front-end tools, or by getting the required information through the information delivery system.
  • 102. Business Analysis: Reporting and Query Tools and Applications  Tool Categories  Reporting  Managed query  Executive information systems  OLAP  Data Mining
  • 103. Business Analysis: Reporting and Query Tools and Applications  Tool Categories – Reporting:  Production reporting tools: Companies use them to generate regular operational reports. Example: calculating and printing paychecks.  Report writers: They have graphical interfaces and built-in charting functions.
  • 104. Business Analysis: Reporting and Query Tools and Applications  Tool Categories – Managed Query Tools:  These tools shield end users from the complexities of SQL and database structures by inserting a metalayer between users and the database.  They make it possible for knowledge workers to access corporate data without IS intervention.
  • 105. Business Analysis: Reporting and Query Tools and Applications  Tool Categories – EIS Tools:  EIS – Executive Information System  It allows developers to build customized, graphical decision support applications that give managers and executives a high level view of the business and access to external sources.
  • 106. Business Analysis: Reporting and Query Tools and Applications  Tool Categories – OLAP Tools:  A natural way to view corporate data.  These tools aggregate data along common business subjects or dimensions.  Users can drill down, across or up levels in each dimension
  • 107. Business Analysis: Reporting and Query Tools and Applications  Tool Categories – Data Mining Tools:  They use a variety of statistical and artificial-intelligence algorithms to analyze correlations among variables in the data and to find relationships worth investigating.
  • 108. Business Analysis: Reporting and Query Tools and Applications  Cognos Impromptu:  Impromptu is an interactive database reporting tool.  It allows Power Users to query data without programming knowledge.  When using the Impromptu tool, no data is written or changed in the database. It is only capable of reading the data
  • 109. Business Analysis: Reporting and Query Tools and Applications  Cognos Impromptu – Features:  Interactive reporting capability Enterprise-wide scalability Superior user interface Fastest time to result Lowest cost of ownership
  • 110. Business Analysis: Reporting and Query Tools and Applications  Catalogs:  Impromptu stores metadata in subject-related folders.  This metadata is used to develop a query for a report. The metadata set is stored in a file called a catalog.  The catalog does not contain any data.
  • 111. Business Analysis: Reporting and Query Tools and Applications  Catalogs:  It just contains information about connecting to the database and the fields that will be accessible for reports.
  • 112. Business Analysis: Reporting and Query Tools and Applications  Catalogs:  Folders—meaningful groups of information representing columns from one or more tables  Columns—individual data elements that can appear in one or more folders  Calculations—expressions used to compute required values from existing data  Conditions—used to filter information so that only a certain type of information is displayed
  • 113. Business Analysis: Reporting and Query Tools and Applications  Catalogs: Prompts—pre-defined selection criteria prompts that users can include in reports they create Other components, such as metadata, a logical database name, join information, and user classes
  • 114. Business Analysis: Reporting and Query Tools and Applications  Catalogs – Uses: view, run, and print reports export reports to other applications disconnect from and connect to the database create reports change the contents of the catalog
  • 115. Business Analysis: Reporting and Query Tools and Applications  Reports:  Reports are created by choosing fields from the catalog folders. This process builds a SQL (Structured Query Language) statement behind the scenes.  No SQL knowledge is required to use Impromptu.
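The idea of a catalog acting as a metalayer, translating fields picked by a user into SQL, can be illustrated with a toy query builder. This is not Impromptu's catalog format or API: the `CATALOG` dictionary, its folder and condition names, and the `build_sql` function are hypothetical, and only show how business-friendly names can be mapped to columns and joins.

```python
# Hypothetical catalog: business-friendly names mapped to database columns.
CATALOG = {
    "folders": {
        "Customer Name": "customers.name",
        "Region": "customers.region",
        "Order Total": "orders.total",
    },
    "joins": ["orders JOIN customers ON orders.customer_id = customers.id"],
    "conditions": {"Large Orders": "orders.total > 1000"},
}

def build_sql(catalog, fields, condition=None, sort_by=None):
    """Turn fields picked from catalog folders into a SQL statement."""
    columns = ", ".join(f'{catalog["folders"][f]} AS "{f}"' for f in fields)
    sql = f"SELECT {columns} FROM {catalog['joins'][0]}"
    if condition:
        # Named conditions hide the filter expression from the user.
        sql += f" WHERE {catalog['conditions'][condition]}"
    if sort_by:
        sql += f' ORDER BY "{sort_by}"'
    return sql

sql = build_sql(CATALOG, ["Customer Name", "Order Total"], "Large Orders", "Order Total")
print(sql)
```

The user only ever sees names such as "Customer Name" and "Large Orders"; the column names and join condition live in the catalog.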
  • 116. Business Analysis: Reporting and Query Tools and Applications  Reports:  The data in the report may be formatted, sorted and/or grouped as needed.  Titles, dates, headers and footers and other standard text formatting features (italics, bolding, and font size) are also available.  Once the desired layout is obtained, the report can be saved to a report file.
  • 117. Business Analysis: Reporting and Query Tools and Applications  Frame-Based Reporting  Frames are the building blocks of all Impromptu reports and templates.  They may contain report objects, such as data, text, pictures, and charts.  There are no limits to the number of frames that you can place within an individual report or template. You can nest frames within other frames to group report objects within a report.
  • 118. Business Analysis: Reporting and Query Tools and Applications  Frame-Based Reporting  Form frame: An empty form frame appears.  List frame: An empty list frame appears.  Text frame: The flashing I-beam appears where you can begin inserting text.  Picture frame: The Source tab (Picture Properties dialog box) appears. You can use this tab to select the image to include in the frame.
  • 119. Business Analysis: Reporting and Query Tools and Applications  Frame-Based Reporting Chart frame: The Data tab (Chart Properties dialog box) appears. You can use this tab to select the data item to include in the chart. OLE Object: The Insert Object dialog box appears where you can locate and select the file you want to insert, or you can create a new object using the software listed in the Object Type box.
  • 120. Business Analysis: Reporting and Query Tools and Applications  Impromptu features  Unified query and reporting interface: It unifies the query and reporting interfaces in a single user interface.  Object-oriented architecture: It enables inheritance-based administration, so that more than 1,000 users can be accommodated as easily as a single user.
  • 121. Business Analysis: Reporting and Query Tools and Applications  Impromptu features  Scalability: Its scalability ranges from a single user to 1,000 users.  Security and control: Security is based on user profiles and their classes.  Data presented in a business context: It presents information using the terminology of the business.
  • 122. Business Analysis: Reporting and Query Tools and Applications  Impromptu features  Over 70 predefined report templates: Users can simply supply the data to create an interactive report.  Frame-based reporting: It offers a number of objects to create a user-designed report.  Business-relevant reporting: It can be used to generate business-relevant reports through filters, preconditions and calculations.
  • 123. Online Analytical Processing (OLAP)  It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data.  Online Analytical Processing (OLAP) applications and tools are those that are designed to ask "complex queries" of large multidimensional collections of data.
  • 124. Online Analytical Processing (OLAP)  Need:  It arises from the multidimensional nature of business problems.  These problems are characterized by retrieving a very large number of records, which can reach gigabytes and terabytes, and summarizing this data into information that can be used by business analysts.
  • 125. Online Analytical Processing (OLAP)  The Multidimensional Data Model:  Because OLAP is online, it must provide answers quickly; analysts pose iterative queries during interactive sessions.  A common way to view the multidimensional data model is as a cube.
  • 126. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:  Roll-up: The roll-up operation (also called the drill-up operation by some vendors) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.
  • 127. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:
  • 128. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:  Drill-down: Drill-down is the reverse of roll-up. It navigates from less detailed data to more detailed data.  Drill-down can be realized by either stepping down a concept hierarchy for a dimension or introducing additional dimensions.
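Roll-up and drill-down can be demonstrated on a tiny cube stored as a Python dictionary keyed by (city, quarter). The data and the city-to-country concept hierarchy are made up for illustration. `roll_up_location` climbs the hierarchy, `drop_dimension` performs roll-up by dimension reduction, and `drill_down_location` goes back from a country-level cell to its city-level detail.

```python
from collections import defaultdict

# Hypothetical base cuboid: (city, quarter) -> dollars sold.
sales = {
    ("Chennai", "Q1"): 400, ("Chennai", "Q2"): 350,
    ("Mumbai", "Q1"): 300, ("Mumbai", "Q2"): 250,
    ("Toronto", "Q1"): 500, ("Toronto", "Q2"): 450,
}
# Concept hierarchy for the location dimension: city < country.
city_to_country = {"Chennai": "India", "Mumbai": "India", "Toronto": "Canada"}

def roll_up_location(cube):
    """Climb the location hierarchy from city to country, summing the measure."""
    result = defaultdict(int)
    for (city, quarter), amount in cube.items():
        result[(city_to_country[city], quarter)] += amount
    return dict(result)

def drop_dimension(cube, keep_index):
    """Roll-up by dimension reduction: aggregate away every other dimension."""
    result = defaultdict(int)
    for key, amount in cube.items():
        result[key[keep_index]] += amount
    return dict(result)

def drill_down_location(cube, country, quarter):
    """Drill-down: expand one country-level cell into its city-level detail."""
    return {
        city: amount
        for (city, q), amount in cube.items()
        if q == quarter and city_to_country[city] == country
    }

by_country = roll_up_location(sales)
print(by_country[("India", "Q1")])                  # 700
print(drop_dimension(sales, 1))                      # {'Q1': 1200, 'Q2': 1050}
print(drill_down_location(sales, "India", "Q1"))    # {'Chennai': 400, 'Mumbai': 300}
```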
  • 129. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:
  • 130. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:  Slice and dice: The slice operation performs a selection on one dimension of the given cube, resulting in a subcube.
  • 131. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:  The dice operation defines a subcube by performing a selection on two or more dimensions.
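Slice and dice amount to selections over the cube's coordinates. In this sketch a hypothetical three-dimensional cube (location, quarter, item) is a dictionary; `slice_cube` fixes one value on one dimension, and `dice_cube` keeps cells whose values fall in chosen sets on two or more dimensions.

```python
# Hypothetical 3-D cube: (location, quarter, item) -> units sold.
cube = {
    ("Chennai", "Q1", "phone"): 10, ("Chennai", "Q1", "laptop"): 4,
    ("Chennai", "Q2", "phone"): 12, ("Toronto", "Q1", "phone"): 7,
    ("Toronto", "Q2", "laptop"): 5, ("Mumbai", "Q2", "phone"): 9,
}
DIMS = ("location", "quarter", "item")

def slice_cube(cube, dim, value):
    """Slice: select a single value on one dimension."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, criteria):
    """Dice: select sets of values on two or more dimensions."""
    idx = {DIMS.index(d): allowed for d, allowed in criteria.items()}
    return {
        k: v for k, v in cube.items()
        if all(k[i] in allowed for i, allowed in idx.items())
    }

q1 = slice_cube(cube, "quarter", "Q1")
print(len(q1))  # 3
sub = dice_cube(cube, {"location": {"Chennai", "Toronto"}, "item": {"phone"}})
print(sorted(sub.values()))  # [7, 10, 12]
```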
  • 133. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:  Pivot (also called rotate) is a visualization operation that rotates the data axes in view to provide an alternative data presentation.
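Because pivoting only changes the presentation, it can be sketched as swapping the row and column labels of a two-dimensional view. The nested-dictionary view below is a made-up example.

```python
# Hypothetical 2-D view: rows are items, columns are locations.
view = {"phone": {"Chennai": 22, "Toronto": 7}, "laptop": {"Chennai": 4, "Toronto": 5}}

def pivot(table):
    """Rotate the axes: former column labels become row labels and vice versa."""
    rotated = {}
    for row_label, cells in table.items():
        for col_label, value in cells.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated

print(pivot(view))
# {'Chennai': {'phone': 22, 'laptop': 4}, 'Toronto': {'phone': 7, 'laptop': 5}}
```

Applying `pivot` twice returns the original view, which reflects that no data is added or lost.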
  • 134. OLAP vs OLTP
    Source of data – OLTP: operational data; OLTPs are the original source of the data. OLAP: consolidated data; OLAP data comes from the various OLTP databases.
    Purpose of data – OLTP: to control and run fundamental business tasks. OLAP: to help with planning, problem solving, and decision support.
    What the data reveals – OLTP: a snapshot of ongoing business processes. OLAP: multidimensional views of various kinds of business activities.
    Queries – OLTP: relatively standardized and simple queries returning relatively few records. OLAP: often complex queries involving aggregations.
  • 135. OLAP vs OLTP
    Processing speed – OLTP: typically very fast. OLAP: depends on the amount of data involved.
    Space requirements – OLTP: can be relatively small if historical data is archived. OLAP: larger due to the existence of aggregation structures and history data.
    Database design – OLTP: highly normalized with many tables. OLAP: typically denormalized with fewer tables; uses star and/or snowflake schemas.
    Backup and recovery – OLTP: data loss is likely to entail significant monetary loss and legal liability. OLAP: instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
  • 136. Online Analytical Processing (OLAP)  Categories of OLAP Tools – MOLAP:  This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats. That is, data is stored in array-based structures.
  • 137. Online Analytical Processing (OLAP)  Categories of OLAP Tools – ROLAP:  This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause to the SQL statement.  Data is stored in relational tables.
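Array-based storage can be sketched as a dense multidimensional array in which each dimension member maps to a fixed offset, so a cell is reached by position rather than by searching a table. The dimension members and the list-of-lists layout below are assumptions for illustration; commercial MOLAP engines use their own proprietary formats.

```python
# Hypothetical dimension members; their positions index the array.
locations = ["Chennai", "Mumbai", "Toronto"]
quarters = ["Q1", "Q2", "Q3", "Q4"]
items = ["phone", "laptop"]

# Member name -> array offset, one mapping per dimension.
loc_pos = {m: n for n, m in enumerate(locations)}
qtr_pos = {m: n for n, m in enumerate(quarters)}
item_pos = {m: n for n, m in enumerate(items)}

# Dense 3-D array: one pre-allocated cell for every combination of members.
cube = [[[0] * len(items) for _ in quarters] for _ in locations]

def store(location, quarter, item, units):
    cube[loc_pos[location]][qtr_pos[quarter]][item_pos[item]] = units

def lookup(location, quarter, item):
    # Direct positional access: no table scan and no join.
    return cube[loc_pos[location]][qtr_pos[quarter]][item_pos[item]]

def quarter_total(location, quarter):
    """Sum along the item axis for one (location, quarter) position."""
    return sum(cube[loc_pos[location]][qtr_pos[quarter]])

store("Chennai", "Q1", "phone", 10)
store("Chennai", "Q1", "laptop", 4)
print(lookup("Chennai", "Q1", "phone"))   # 10
print(quarter_total("Chennai", "Q1"))     # 14
print(lookup("Mumbai", "Q3", "laptop"))   # 0
```

Every combination of members gets a cell whether or not it holds data (here 3 x 4 x 2 = 24 cells for two stored values), which is why sparse data can waste space in a dense layout.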
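The correspondence between slicing or dicing and a `WHERE` clause can be made concrete with SQLite. The `sales` table and the `rolap_query` helper below are hypothetical; the helper turns each selected dimension value into an equality predicate and passes the values as bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (location TEXT, quarter TEXT, item TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Chennai", "Q1", "phone", 10), ("Chennai", "Q2", "phone", 12),
     ("Toronto", "Q1", "phone", 7), ("Toronto", "Q1", "laptop", 5)],
)

DIMENSIONS = {"location", "quarter", "item"}

def rolap_query(filters, group_by):
    """Translate a slice/dice request into SQL: each selection becomes a WHERE predicate."""
    # Column names cannot be bound as parameters, so only known dimensions are allowed.
    if not set(filters) <= DIMENSIONS or not set(group_by) <= DIMENSIONS:
        raise ValueError("unknown dimension")
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    cols = ", ".join(group_by)
    sql = f"SELECT {cols}, SUM(units) FROM sales WHERE {where} GROUP BY {cols} ORDER BY {cols}"
    return sql, list(filters.values())

# Slice on quarter = 'Q1'.
sql, params = rolap_query({"quarter": "Q1"}, ["location"])
print(sql)
print(conn.execute(sql, params).fetchall())  # [('Chennai', 10), ('Toronto', 12)]

# Dice on quarter and item: the WHERE clause simply gains another predicate.
sql, params = rolap_query({"quarter": "Q1", "item": "phone"}, ["location"])
print(conn.execute(sql, params).fetchall())  # [('Chennai', 10), ('Toronto', 7)]
```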
  • 138. Online Analytical Processing (OLAP)  Categories of OLAP Tools – HOLAP (MQE: Managed Query Environment)  HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP.  For summary-type information, HOLAP leverages cube technology for faster performance.  It stores only the indexes and aggregations in the multidimensional form while the rest of the data is stored in the relational database.
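A hybrid arrangement can be sketched as follows: detailed rows stay in a relational table, while a small precomputed aggregate (here a dictionary keyed by location and quarter) answers summary questions directly. The table, the `summary_cube` dictionary and both helper functions are illustrative assumptions, not any vendor's HOLAP implementation.

```python
import sqlite3
from collections import defaultdict

# Detailed data stays in the relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (location TEXT, quarter TEXT, item TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("Chennai", "Q1", "phone", 10), ("Chennai", "Q1", "laptop", 4),
     ("Toronto", "Q1", "phone", 7), ("Toronto", "Q2", "laptop", 5)],
)

# Aggregations are precomputed into a small multidimensional (location, quarter) structure.
summary_cube = defaultdict(int)
for location, quarter, units in conn.execute("SELECT location, quarter, units FROM sales"):
    summary_cube[(location, quarter)] += units

def total_units(location, quarter):
    """Summary question: answered from the precomputed aggregates, no SQL needed."""
    return summary_cube.get((location, quarter), 0)

def item_detail(location, quarter):
    """Detail question: drill through to the relational table."""
    return conn.execute(
        "SELECT item, units FROM sales WHERE location = ? AND quarter = ? ORDER BY item",
        (location, quarter),
    ).fetchall()

print(total_units("Chennai", "Q1"))  # 14
print(item_detail("Chennai", "Q1"))  # [('laptop', 4), ('phone', 10)]
```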