The document discusses business intelligence tools and data warehousing. It defines business intelligence tools as software used to analyze and present data to help with strategic decision making. It describes various BI tools like data integration tools, BI platforms, reporting and analysis tools, and performance management tools. It also discusses how data is extracted, transformed, and loaded from source databases into a data warehouse using ETL tools. The data warehouse is a repository of historical data designed to support analysis and decision making. It defines key concepts of data warehousing like subjects, integration, time-variant data, and non-volatility. Finally, it discusses data modeling techniques for data warehousing including entity-relationship modeling and dimensional modeling.
ASSIGNMENTS
Subject code: MB0036 (4 credits)
Set 1
Marks: 60
Subject name: BUSINESS INTELLIGENCE & TOOLS
Note: Each question carries 10 marks.
Q1. Define the term business intelligence tools. Briefly explain how the data from one end gets transformed into information at the other end.
Ans:
Business intelligence tools. The various tools of this suite are:
• Data Integration Tools: These tools extract, transform and load (ETL) the data from the source databases to the target database. There are two categories: Data Integrator and Rapid Marts. Data Integrator is an ETL tool with a GUI. Rapid Marts is a packaged ETL with pre-built data models for reporting and query analysis that makes initial prototype development easy and fast for ERP applications. The important components of Data Integrator include:
Graphical designer: This is a GUI used to build and test ETL jobs for data cleansing, validation and auditing.
Data integration server: This integrates data from different source databases.
Metadata repository: This repository keeps source and target metadata and the transformation rules.
Administrator: This is a web-based tool that can be used to start, stop, schedule and monitor ETL jobs.
• BI Platform: This platform provides a set of common services to deploy, use and manage the tools and applications. These services include security, broadcasting, collaboration, metadata and developer services.
• Reporting Tools and Query & Analysis Tools: These tools provide the facility for standard report generation, ad hoc queries and data analysis.
• Performance Management Tools: These tools help in managing the performance of a business by analyzing and tracking key metrics and goals.
• Business intelligence tools are a type of application software designed to help in making better business decisions. These tools aid in the analysis and presentation of data in a more meaningful way and so play a key role in the strategic planning process of an organization. They illustrate business intelligence in the areas of market research and segmentation, customer profiling, customer support, profitability, and inventory and distribution analysis, to name a few.
• Various types of BI systems, viz. Decision Support Systems, Executive Information Systems (EIS), Multidimensional Analysis software or OLAP (On-Line Analytical Processing) tools, and data mining tools, are discussed further. Whatever the type, the Business Intelligence capability of the system is to let its users slice and dice the information from their organization's numerous databases without having to wait for their IT departments to develop complex queries and elicit answers.
• Although it is possible to build BI systems without the benefit of a data warehouse, in practice most of the systems are an integral part of the user-facing end of the data warehouse. In fact, we can never think of building a data warehouse without BI systems. That is the reason the terms 'data warehousing' and 'business intelligence' are sometimes used interchangeably.
• Figure 1.1 depicts how the data from one end gets transformed to information at the other end for business information.
Q2. What do you mean by a data warehouse? What are the major concepts and terminology used in the study of data warehouses?
Ans:
In simple terms, a data warehouse is the repository of an organization's historical data (also termed the corporate memory). For example, an organization would use the information stored in its data warehouse to find out on what day of the week it sold the largest number of gadgets in May 2002, or how many employees were on sick leave in a specific week.
A data warehouse is a database designed to support decision making in an organization. Here, the data from various production databases are copied to the data warehouse so that queries can be run without disturbing the stability or performance of the production systems. So the main factor that leads to the use of a data warehouse is that complex queries and analysis can be run over the information without slowing down the operational systems. While operational systems are optimized for simplicity and speed of modification (online transaction processing, or OLTP), the data warehouse is optimized for reporting and analysis (online analytical processing, or OLAP). (The concepts of OLTP and OLAP are discussed in later Units.)
Apart from traditional query and reporting, a data warehouse provides the base for powerful data analysis techniques such as data mining and multidimensional analysis (discussed in detail in later Units). Making use of these techniques will result in easier access to the information you need for informed decision making.
Characteristics of a Data Warehouse
According to Bill Inmon, who is considered to be the father of data warehousing, the data in a Data Warehouse has the following characteristics:
Subject oriented
The first feature of a DW is its orientation toward the major subjects of the organization instead of applications. The subjects are categorized in such a way that the subject-wise collection of information helps in decision-making. For example, the data in the data warehouse of an insurance company can be organized as customer ID, customer name, premium, payment period, etc. rather than as auto insurance, life insurance, fire insurance, etc.
Integrated
The data contained within the boundaries of the warehouse are integrated. This means that all inconsistencies regarding naming conventions and value representations need to be removed in a data warehouse. For example, one application of an organization might code gender as 'm' and 'f' while another application might code the same attribute as '0' and '1'. When the data is moved from the operational environment to the data warehouse environment, this will result in a conflict that has to be resolved.
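As a minimal sketch of the integration step described above, the snippet below maps the gender codes of two source applications onto one warehouse convention. The application names (`app_a`, `app_b`), the target codes `M`/`F`, and the assumption that `0` means male and `1` means female are illustrative only and do not come from the text.

```python
# Hypothetical mapping tables, one per source application.
# Assumption: in app_b, "0" stands for male and "1" for female.
GENDER_MAPS = {
    "app_a": {"m": "M", "f": "F"},
    "app_b": {"0": "M", "1": "F"},
}


def integrate_gender(source, raw_value):
    """Convert a source-specific gender code to the single warehouse code."""
    mapping = GENDER_MAPS[source]
    key = str(raw_value).strip().lower()
    if key not in mapping:
        # Unknown codes are rejected instead of being loaded silently.
        raise ValueError(f"Unknown gender code {raw_value!r} from {source}")
    return mapping[key]


# Records as they might arrive from the two operational systems.
source_rows = [
    ("app_a", {"customer_id": 1, "gender": "m"}),
    ("app_b", {"customer_id": 2, "gender": 1}),
]

# After integration, every row uses the same representation.
warehouse_rows = [
    {**row, "gender": integrate_gender(source, row["gender"])}
    for source, row in source_rows
]
print(warehouse_rows)
```

Keeping the mappings in one table per source makes the transformation rule explicit, which is the kind of information the text says a metadata repository holds.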
Time variant
The data stored in a data warehouse is not the current data. It is time-series data, as the data warehouse is a place where data is accumulated periodically. This is in contrast to the data in an operational system, where the data in the databases is accurate as of the moment of access.
Non-volatility of the data
The data in the data warehouse is non-volatile, which means the data is stored in a read-only format and does not change over time. This is the reason the data in a data warehouse serves as a single source for all decision support processing.
Keeping the above characteristics in view, a 'data warehouse' can be defined as a subject-oriented, integrated, non-volatile, time-variant collection of data designed to support the decision-making requirements of an organization.
Q3. What are the data modeling techniques used in a data warehousing environment?
Ans:
There are two data modeling techniques that are relevant in a data warehousing environment. They are Entity Relationship modeling (ER modeling) and dimensional modeling.
• ER modeling produces a data model of the specific area of interest, using two basic concepts: Entities and the Relationships between them. A detailed ER model may also contain attributes, which can be properties of either the entities or the relationships. The ER model is an abstraction tool as it can be used to simplify, understand and analyze the ambiguous data relationships in the real business world.
• Dimensional modeling uses three basic concepts: Facts, Dimensions and Measures. Dimensional modeling is powerful in representing the requirements of the business user in the context of database tables and also in the area of data warehousing.
Both ER and dimensional modeling can be used to create an abstract model of a specific subject. However, each of them has its own limited set of modeling concepts and associated notation conventions. Consequently, the techniques seem different, and they are indeed different in terms of semantic representation. There is much debate as to which method is better and the conditions under which a specific technique is to be selected. There can be no definite answer; an understanding of the circumstances and the business requirements finally leads to the selection of an appropriate technique.
Entity-Relationship (E-R) Modeling
Basic Concepts
An ER model is represented by an ER diagram, which uses three basic graphic symbols to conceptualize the data: entity, relationship, and attribute.
Entity
An entity is defined to be a person, place, thing, or event of interest to the business or the organization. It represents a class of objects, which are things in the real business world that can be observed and classified by their properties and characteristics. In general, an entity has its own business definition and a clear boundary definition that is required to describe what is included and what is not.
In a practical modeling project, the team members share a definition template for integration and a consistent entity definition in the model. In high-level business modeling, an entity can be very generic, but it must be quite specific in detailed logical modeling.
There are four entities in the ER diagram (refer to Figure 4.1): PRODUCT, PRODUCT MODEL, PRODUCT COMPONENT, and COMPONENT. They are represented as rectangles.
Fig. 4.1: A Simple ER Model
The four diagonal lines on the corners of the PRODUCT COMPONENT entity indicate that the entity is an 'associative entity', whose purpose is to resolve the many-to-many relationship between two entities. PRODUCT MODEL and COMPONENT are independent of each other but have a business relationship between them. A PRODUCT MODEL consists of many components and a component is related to many product models. With this business rule alone, you cannot tell which components make up a product model. To do this, you can define a resolving entity. For example, the PRODUCT COMPONENT entity can provide the information about which components are related to which product model.
In ER modeling, naming the entities is important for easy understanding and clear communication. An entity name is expressed grammatically as a noun rather than a verb, and the criteria for selecting an entity name depend on how well the name represents the characteristics and scope of the entity. Also, defining a unique identifier of an entity is the most critical task. These unique identifiers are called candidate keys. Among them, you can select the key that is most commonly used to identify the entity, called the 'primary key'.
Relationship
Relationships represent the structural interaction and association among the entities in a model, and they are represented with lines drawn between the two specific entities. Generally, a relationship is named grammatically by a verb (such as owns, belongs, and has), and the relationship between the entities can be defined in terms of cardinality. Cardinality represents the maximum number of instances of one entity that are related to a single instance of another entity, and vice versa. Thus the possible cardinalities include one-to-one (1:1), one-to-many (1:M), and many-to-many (M:M). In a detailed normalized ER model, an M:M relationship is not shown because it is resolved to an associative entity.
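The way an associative entity resolves a many-to-many relationship can be sketched with Python's built-in `sqlite3` module. The table and column names and the sample rows below are assumptions made for illustration; they are not the attributes shown in Figure 4.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_model (model_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE component (component_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Associative entity: one row per (product model, component) pair.
CREATE TABLE product_component (
    model_id INTEGER NOT NULL REFERENCES product_model(model_id),
    component_id INTEGER NOT NULL REFERENCES component(component_id),
    PRIMARY KEY (model_id, component_id)
);
INSERT INTO product_model VALUES (1, 'Model A'), (2, 'Model B');
INSERT INTO component VALUES (10, 'Motor'), (11, 'Casing');
INSERT INTO product_component VALUES (1, 10), (1, 11), (2, 11);
""")


def components_of(model_name):
    """Names of the components that make up one product model."""
    rows = conn.execute(
        """SELECT c.name
           FROM component c
           JOIN product_component pc ON pc.component_id = c.component_id
           JOIN product_model m ON m.model_id = pc.model_id
           WHERE m.name = ?
           ORDER BY c.name""",
        (model_name,),
    ).fetchall()
    return [name for (name,) in rows]


def models_using(component_name):
    """Names of the product models that contain one component."""
    rows = conn.execute(
        """SELECT m.name
           FROM product_model m
           JOIN product_component pc ON pc.model_id = m.model_id
           JOIN component c ON c.component_id = pc.component_id
           WHERE c.name = ?
           ORDER BY m.name""",
        (component_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(components_of("Model A"))
print(models_using("Casing"))
```

The M:M relationship becomes two 1:M relationships that meet in `product_component`, so both directions of the question can be answered with ordinary joins.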
Attributes
Attributes describe the characteristics or properties of the entities. ProductID, Description, and Picture are attributes of the PRODUCT entity in Figure 4.1. The name of an attribute has to be unique within an entity and should be self-explanatory to ensure clarity. For example, rather than naming attributes date1 and date2, you may use the names order date and delivery date. When an instance has no value for an attribute, the minimum cardinality of the attribute is zero, which means it is either nullable or optional.
In Figure 4.1, you can see the characters P, m, o, and F, which stand for primary key, mandatory, optional, and foreign key. The Picture attribute of the PRODUCT entity is optional, which means it is nullable. A foreign key of an entity is defined to be the primary key of another entity. In Figure 4.1, the Product ID attribute of the PRODUCT MODEL entity is a foreign key as it is the primary key of the PRODUCT entity. These foreign keys are useful in determining relationships such as the referential integrity between the entities.
Other Concepts
Supertype and Subtype
An entity can have subtypes and supertypes, and the relationship between a supertype entity and its subtype entity is an IS A relationship. An IS A relationship is used where one entity is a generalization of several more specialized entities. The supertype and subtype relationship is represented by a triangle on the relationship. Figure 4.2 shows an example of supertype and subtype entities wherein SALES OUTLET is the supertype of RETAIL STORE and CORPORATE SALES OFFICE, and RETAIL STORE and CORPORATE SALES OFFICE are subtypes of SALES OUTLET. Here, each subtype entity inherits attributes from its supertype entity. Also, each subtype entity can have its own distinctive attributes. In the example provided above, Region ID and Outlet ID are inherited attributes and the subentities have their own attributes (such as the number of cash registers and floor space of the RETAIL STORE subentity). The practical benefit of supertyping and subtyping is that they make a data model more directly expressive. Just by looking at the ER diagram, you can see that sales outlets are composed of 'retail stores' and 'corporate sales offices'.
Fig. 4.2: Supertype and Subtype
Other important concepts in the area of ER modeling are 'domain' and 'normalization'.
• A domain consists of all the possible acceptable values and categories that are allowed for an attribute. It is the set of all real possible occurrences. The format or data type, such as integer, date, and character, provides a clear definition of the domain. The practical benefit of a domain is that it is imperative for building the data dictionary or repository, and consequently for implementing the database.
• Normalization is a process of assigning the attributes to entities in a way that reduces data redundancy, avoids data anomalies, provides a solid architecture for updating data, and reinforces the long-term integrity of the data model (the third normal form is usually adequate).
Dimensional Modeling
Dimensional modeling is a relatively new concept compared to ER modeling. This method is simpler, more expressive, and easier to understand. This technique is mainly aimed at conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business. It is useful for summarizing and rearranging the data and presenting views of the data to support data analysis. Also, the technique focuses on numeric data, such as values, counts, and weights.
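A small sketch may make the three concepts concrete. It assumes a hypothetical sales fact table with one numeric measure (`units_sold`), a product dimension, and the sale date acting as a time dimension; all names and figures are invented for illustration. It answers the kind of question mentioned earlier: on which day of the week were the most gadgets sold in May 2002?

```python
from collections import defaultdict
from datetime import date

# Locale-independent weekday names, indexed by date.weekday() (Monday = 0).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Dimension: descriptive context for the facts, keyed by product id.
product_dim = {1: {"name": "Gadget"}, 2: {"name": "Widget"}}

# Facts: each row points at its dimensions and carries a numeric measure.
sales_facts = [
    {"product_id": 1, "sale_date": date(2002, 5, 1), "units_sold": 5},
    {"product_id": 1, "sale_date": date(2002, 5, 2), "units_sold": 9},
    {"product_id": 1, "sale_date": date(2002, 5, 8), "units_sold": 6},
    {"product_id": 2, "sale_date": date(2002, 5, 2), "units_sold": 7},
    {"product_id": 1, "sale_date": date(2002, 6, 6), "units_sold": 20},
]


def units_by_weekday(product_name, year, month):
    """Sum the units_sold measure per weekday for one product and month."""
    totals = defaultdict(int)
    for fact in sales_facts:
        day = fact["sale_date"]
        product = product_dim[fact["product_id"]]
        if product["name"] == product_name and (day.year, day.month) == (year, month):
            totals[WEEKDAYS[day.weekday()]] += fact["units_sold"]
    return dict(totals)


totals = units_by_weekday("Gadget", 2002, 5)
best_day = max(totals, key=totals.get)
print(totals, best_day)
```

The measure is only ever summed, while the dimensions decide how it is filtered and grouped, which is the split the paragraph above describes.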
Q4. Discuss the categories into which data is divided before structuring it into a data warehouse.
Ans:
Data warehouses can be divided into two types:
• Enterprise Data Warehouse
• Data Mart
Enterprise Data Warehouse
The Enterprise data warehouse consists of the data drawn from multiple operational systems of an organization. This data warehouse supports time-series and trend analysis across different business areas of an organization and so can be used for strategic decision-making. Also, this data warehouse is used to populate various data marts.
Data Mart
As data warehouses contain large amounts of data, organizations often create 'data marts' that are precise and specific to a department or product line. Thus a data mart is a physical and logical subset of an Enterprise data warehouse and is also termed a department-specific data warehouse. Generally, data marts are organized around a single business process.
There are two types of data marts: independent and dependent. The data is fed directly from the legacy systems in the case of an independent data mart, and from the enterprise data warehouse in the case of a dependent data mart. In the long run, dependent data marts are architecturally much more stable than independent data marts.
Advantages and Limitations of a DW System
Use of a data warehouse brings the following advantages for an organization:
• End-users can access a wide variety of data.
• Management can obtain various kinds of trends and patterns of data.
• A warehouse provides competitive advantage to the company by providing the data and timely information.
• A warehouse acts as a significant enabler of commercial business applications, viz. Customer Relationship Management (CRM) applications.
However, the following are the concerns that one has to keep in mind while using a data warehouse:
• The scope of a data warehousing project has to be managed carefully to attain the defined content and value.
• The process of extracting, cleaning and loading the data and finally storing it in a data warehouse is time-consuming.
• The problems of compatibility with the existing systems need to be resolved before building a data warehouse.
• Security of the data may become a serious issue, especially if the warehouse is web accessible.
• Building and maintenance of the data warehouse can be handled only by skilled resources and requires huge investment.
Data Warehouse Concepts and Terminology
Various concepts and the key terms used in the study of data warehouses are provided below.
• Dashboard: This is a reporting tool that consolidates, aggregates and arranges measurements and metrics (measurements compared to a goal) on a single screen so that information can be monitored at a glance.
• Data Management: This is the process of controlling, protecting, and facilitating access to data in order to provide the end users with timely access to the data they need.
• Data Mining (or Data Surfing): This is a technique geared for the typical user who does not know exactly what he is searching for, but is looking for particular patterns or trends. Data mining is the process of sifting through large amounts of data to produce data content relationships. It can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The most valuable results from data mining include clustering, classifying, and estimating the things that occur together. There are many kinds of tools that play a role in data mining, and they include neural networks, decision trees, visualization, general algorithms, fuzzy logic, etc.
• Data Modeling: A method used to define and analyze data requirements needed to support the business functions of an organization.
• Data Profiling: Data profiling is a critical step in data migration that automates the identification of problematic data and metadata, and enables organizations to correct inconsistencies, redundancies and inaccuracies in their databases.
• Data Visualization: Data visualization involves examining the data represented by dynamic images rather than pure numbers. These are the techniques that turn the data into information by using the high capacity of the human brain to visually recognize patterns and trends.
• Decentralized Warehouse: A remote data source that users can query/access via a central gateway that provides a logical view of corporate data in terms that users can understand. The gateway parses and distributes queries in real time to remote data sources and returns result sets back to users.
• Drill-down: This is the capacity to browse the information through a hierarchical structure as shown below.
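Drill-down can also be sketched in a few lines of code. The region > country > city hierarchy and the sales figures here are hypothetical and are not taken from the original figure.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, country, city, amount).
SALES = [
    ("Europe", "France", "Paris", 120),
    ("Europe", "France", "Lyon", 80),
    ("Europe", "Germany", "Berlin", 150),
    ("Asia", "India", "Mumbai", 200),
]
LEVELS = ("region", "country", "city")


def drill_down(path=()):
    """Total sales one level below the given path in the hierarchy."""
    depth = len(path)
    if depth >= len(LEVELS):
        raise ValueError("Already at the most detailed level")
    totals = defaultdict(int)
    for row in SALES:
        # Keep only the rows that sit under the chosen branch.
        if row[:depth] == tuple(path):
            totals[row[depth]] += row[-1]
    return dict(totals)


print(drill_down())                      # totals per region
print(drill_down(("Europe",)))           # countries within Europe
print(drill_down(("Europe", "France")))  # cities within France
```

Each call moves one level deeper, from a summary view toward the detailed rows behind it.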
• External Data Source: This is the data that is not available in the OLTP systems, but is required to enhance the information quality in the data warehouse. Examples of this data include the data of competitors, information from regulatory and government bodies, and research data of professional bodies and universities.
• Metadata: Metadata is data about data. Examples of metadata include data element descriptions, data type descriptions, attribute descriptions, and process descriptions.
• On-Line Analytical Processing (OLAP): This is a category of software technology that enables users to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the organization. This is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity. This software is also called Multidimensional Analysis Software.
• On-Line Transaction Processing (OLTP): This is the way the data is processed by an end user or a computer system. Here, the data is detail oriented and highly repetitive, with large numbers of updates and changes. The major task of these systems is to perform on-line transaction and query processing. These systems cover most of the day-to-day operations of the organization, such as purchasing, inventory, manufacturing, payroll, banking, accounting and registration.
• Operational Databases: These are detail-oriented databases defined to meet the needs of complex processes of an organization. Here, the data is highly normalized to avoid data redundancy and double-maintenance. A large number of transactions take place every hour on these databases, which are always "up to date" and represent a snapshot of the current situation. In contrast to these databases, there are Informational databases that are stable over a period of time and represent a situation at a specific point in time in the past.
Architecture of a Data Warehouse
The architecture describes the overall system of a Data Warehouse from various perspectives such as data, process, and infrastructure to study the inter-relationships among various components.
• The data perspective includes the source and target data structures and so it aids the user in understanding what data assets are available in a data warehouse and how they are related.
• The process perspective is primarily concerned with communicating the process and flow of data from the originating source system through the process of loading the data warehouse and extracting data from the warehouse.
• The infrastructure or technology perspective details the various hardware and software products used to implement the distinct components of the overall system.
Depending upon the specifics of an organizational situation, the following types of Data Warehouse architectures are available:
• Basic architecture of a Data warehouse
• Architecture of a Data warehouse with Staging area
• Architecture of a Data warehouse with Staging area and Data marts
Fig 2.1 shows a simple architecture of a data warehouse wherein the end users directly access the data derived from several source systems through the data warehouse.
Q5. Discuss the purpose of an executive information system in an organization.
Ans:
An Executive Information System (EIS) is a set of management tools supporting the information and decision-making needs of management by combining information available within the organisation with external information in an analytical framework.
EIS are targeted at management's need to quickly assess the status of a business or a section of a business. These packages are aimed firmly at the type of business user who needs an instant and up-to-date understanding of critical business information to aid decision making.
The idea behind an EIS is that information can be collated and displayed to the user without manipulation or further processing. Users can then quickly see the status of their chosen department or function, enabling them to concentrate on decision making. Generally an EIS is configured to display data such as order backlogs, open sales, purchase order backlogs, shipments, receipts and pending orders. This information can then be used to make executive decisions at a strategic level.
The emphasis of the system as a whole is on the easy-to-use interface and the integration with a variety of data sources. It offers strong reporting and data mining capabilities which can provide all the data the executive is likely to need. Traditionally the interface was menu driven with either reports or text presentation. Newer systems, and especially the newer Business Intelligence systems which are replacing EIS, have a dashboard or scorecard type display.
Before these systems became available, decision makers had to rely on disparate spreadsheets and reports, which slowed down the decision making process. Now massive amounts of relevant information can be accessed in seconds. The two main aspects of an EIS system are integration and visualisation. The newest methods of visualisation are the Dashboard and the Scorecard. The Dashboard is one screen that presents key data and organisational information on an almost real-time and integrated basis. The Scorecard is another one-screen display with measurement metrics which can give a percentile view of whatever criteria the executive chooses.
Behind these two front-end screens can be an immense data processing infrastructure, or a couple of integrated databases, depending entirely on the organisation that is using the system. The backbone of the system is traditional server hardware and a fast network. The EIS software itself is run from here and presented to the executive over this network. The databases need to be fully integrated into the system and have real-time connections both in and out. This information then needs to be collated, verified, processed and presented to the end user, so a real-time connection into the EIS core is necessary.
Executive Information Systems come in two distinct types: ones that are data driven, and ones that are model driven. Data driven systems interface with databases and data warehouses. They collate information from different sources and present it to the user in an integrated dashboard style screen. Model driven systems use forecasting, simulations and decision tree like processes to present the data.
As with any emerging and progressive market, service providers are continually improving their products and offering new ways of doing business. Modern EIS systems can also present industry trend information and competitor behaviour trends if needed. They can filter and analyse data; create graphs, charts and scenario generations; and offer many other options for presenting data.
There are a number of ways to link decision making to organisational performance. From a decision maker's perspective these tools provide an excellent way of viewing data. Outcomes displayed include single metrics, trend analyses, demographics, market shares and a myriad of other options. The simple interface makes it quick and easy to navigate and call up the information required.
For a system that seems to offer business so much, it is used by relatively few organisations. Current estimates indicate that as few as 10% of businesses use EIS systems. One of the reasons for this is the complexity of the system and support infrastructure. It is difficult to create such a system and populate it effectively. Combining all the necessary systems and data sources can be a daunting task, and seems to put many businesses off implementing it. The system vendors have addressed this issue by offering turnkey solutions for potential clients. Companies like Actuate and Oracle are both offering complete out-of-the-box Executive Information Systems, and these aren't the only ones. Expense is also an issue. Once the initial cost is calculated, there is the additional cost of support infrastructure, training, and the means of making the company data meaningful to the system.
Does EIS warrant all of this expense? Green King certainly thinks so. They installed a Cognos system in 2003 and their first few reports illustrated business opportunities in excess of £250,000. The AA is also using a Business Objects variant of an EIS system and they expect a return of 300% in three years. (Guardian 31/7/03)
An effective Executive Information System isn't something you can just set up and leave to do its work. Its success depends on the support and the timely, accurate data it gets to be able to provide something meaningful. It can provide the information executives need to make educated decisions quickly and effectively. An EIS can provide a competitive edge to business strategy that can pay for itself in a very short space of time.
Q6. Discuss the challenges involved in data integration and coordination process?
Ans:
In general, most of the data that the warehouse gets is extracted from a combination of
legacy mainframe systems, old minicomputer applications, and some client/server systems.
But these source systems do not conform to the same set of business rules. They often follow
different naming conventions and varied standards for data representation. Thus the process
of data integration and consolidation plays a vital role. Here, data integration means
combining all relevant operational data into coherent data structures so as to make them ready
for loading into the data warehouse. It standardizes the names and data representations and
resolves the discrepancies. Some of the challenges involved in the data integration and
consolidation process are as follows.
Identification of an Entity
Suppose there are three legacy applications in use in your organization: one is the
order entry system, the second is the customer service support system, and the third is the
marketing system. Each of these systems might have its own customer file to support the
system. Even though most of the customers will be common to all three files, the same
customer may have a different unique identification number in each file.
As you need to keep a single record for each customer in a data warehouse, you need to get
the transactions of each customer from the various source systems and then match them up to
load into the data warehouse. This is an entity identification problem, in which you do not
know which of the customer records relate to the same customer. This problem is prevalent
where multiple sources exist for the same entities. Other entities prone to this type of
problem include vendors, suppliers, employees, and the various products manufactured by a
company.
In the case of three customer files, you have to design complex algorithms to match records
from all three files and form groups of matching records. But this is a difficult exercise. If the
matching criterion is too tight, then some records might escape the groups. Similarly, a
particular group may include records of more than one customer if the matching criterion
designed is too loose. Also, you might have to involve your users or the respective
stakeholders to understand the transactions accurately.
Some companies attempt this problem in two phases. In the first phase, all records,
irrespective of whether they are duplicates or not, are assigned unique identifiers, and in the
second phase, the duplicates are reconciled periodically, either through automatic algorithms
or manually.
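The effect of a tight versus a loose matching criterion can be illustrated with a minimal sketch. The customer records, the field names and the greedy grouping rule below are assumptions made for illustration only; the similarity score uses Python's standard difflib module, which is just one of many possible matching techniques.

```python
from difflib import SequenceMatcher

# Hypothetical customer records from three source systems; each system
# uses its own identifier for the same real-world customer.
records = [
    {"source": "order_entry", "id": "OE-1001", "name": "John A. Smith", "city": "Leeds"},
    {"source": "customer_service", "id": "CS-77", "name": "John Smith", "city": "Leeds"},
    {"source": "marketing", "id": "MK-5", "name": "Joan Smith", "city": "Leeds"},
    {"source": "marketing", "id": "MK-9", "name": "Priya Patel", "city": "York"},
]


def similarity(a, b):
    """Score two records between 0 and 1 on name; different cities never match."""
    if a["city"].lower() != b["city"].lower():
        return 0.0
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def group_records(records, threshold):
    """Greedy grouping: a record joins the first group whose first member it matches."""
    groups = []
    for rec in records:
        for group in groups:
            if similarity(rec, group[0]) >= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


# A tight threshold splits the two John Smith records into separate groups;
# a loose threshold pulls Joan Smith into the same group as John Smith.
tight = group_records(records, threshold=0.95)
balanced = group_records(records, threshold=0.85)
loose = group_records(records, threshold=0.75)
```

With this sample data the tight setting leaves every record in its own group, while the loose setting merges a different customer into the group, which is why the threshold usually has to be tuned with help from the users who know the data.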
Existence of Multiple Sources
Another major challenge in the area of data integration and consolidation results from a
single data element having more than one source. For instance, cost values are calculated and
updated at specific intervals in the standard costing application. Similarly, your order
processing application also carries the unit costs for all products. Thus there are two sources
available to obtain the unit cost of a product, and so there could be a slight variation in their
values. Which of these systems should be used to store the unit cost in the data warehouse
becomes an important question. One easy way of handling this situation is to prioritize the
two sources, or you may select the source on the basis of the last update date.
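Both resolution rules can be expressed in a few lines. The following is a sketch only: the source names, cost values, dates and the priority order are hypothetical and would come from the warehouse team's own decisions.

```python
from datetime import date

# Hypothetical unit-cost values for one product from two source systems.
candidates = [
    {"source": "standard_costing", "unit_cost": 12.40, "last_updated": date(2010, 9, 30)},
    {"source": "order_processing", "unit_cost": 12.55, "last_updated": date(2010, 11, 15)},
]

# Assumed priority order for illustration: the lower number wins.
SOURCE_PRIORITY = {"standard_costing": 1, "order_processing": 2}


def pick_by_priority(candidates):
    """Rule 1: always trust the source with the highest priority."""
    return min(candidates, key=lambda c: SOURCE_PRIORITY[c["source"]])


def pick_by_last_update(candidates):
    """Rule 2: trust whichever source was updated most recently."""
    return max(candidates, key=lambda c: c["last_updated"])
```

Here the two rules disagree: the priority rule keeps the standard costing value, while the last-update rule keeps the order processing value, so the choice of rule has to be made explicitly and recorded.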
Implementation of Transformation
The implementation of data transformation is a complex exercise. You may have to go
beyond the manual methods, that is, the usual methods of writing conversion programs while
deploying the operational systems. You need to consider several other factors to decide the
methods to be adopted. Suppose you are considering automating the data transformation
functions: you have to identify, configure and install the tools, train the team on these
tools, and integrate them into the data warehouse environment. In practice, a combination of
both methods proves to be effective. The issues you may face in using manual methods and
transformation tools are discussed below.
Manual Methods
These are the traditional methods that have been in practice in the recent past. These
methods are adequate in the case of smaller data warehouses. They include manually coded
programs and scripts that are mainly executed in the data staging area. Since these methods
call for elaborate coding and testing, only programmers and analysts who possess specialized
knowledge in this area can produce the programs and scripts.
Although the initial cost may be reasonable, ongoing maintenance may escalate the cost of
these methods. Moreover, these methods are always prone to errors. Another disadvantage
of these methods concerns the creation of metadata. Even if the in-house programs
record the metadata initially, the metadata needs to be updated every time changes occur in
the transformation rules.
Transformation Tools
The difficulties involved in using the manual methods can be eliminated using the
sophisticated and comprehensive set of transformation tools that are now available. Use of
these automated tools certainly improves efficiency and accuracy. If the inputs provided to
the tools are accurate, then the rest of the work is performed efficiently by the tool. So you
have to carefully specify the required parameters, the data definitions and the rules to the
transformation tool.
Also, the transformation tools enable the recording of metadata. When you specify the
transformation parameters and rules, these values are stored as metadata by the tool, and
this metadata becomes a part of the overall metadata component of the data warehouse. When
changes occur to business rules or data definitions, you just have to enter the changes into the
tool and the metadata for the transformations gets adjusted automatically. But relying on the
transformation tools alone, without using the manual methods, is also not practical.
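The idea that transformation rules double as metadata can be sketched as follows. This does not reproduce any particular transformation tool; the rule format, the column names and the transform names are all hypothetical.

```python
# Hypothetical transformation rules, held as data so that the same definitions
# drive the transformation and also serve as its metadata.
RULES = [
    {"target": "customer_name", "source": "CUST_NM", "transform": "strip_title_case"},
    {"target": "gender", "source": "SEX_CD", "transform": "decode_gender"},
]

# Named transforms that the rules refer to (assumed for illustration).
TRANSFORMS = {
    "strip_title_case": lambda v: v.strip().title(),
    "decode_gender": lambda v: {"1": "M", "2": "F"}.get(v, "U"),
}


def apply_rules(row, rules=RULES):
    """Build a target row by applying each rule to its source column."""
    return {r["target"]: TRANSFORMS[r["transform"]](row[r["source"]]) for r in rules}


def describe_rules(rules=RULES):
    """Report the rules themselves, i.e. the transformation metadata."""
    return [f'{r["source"]} -> {r["target"]} via {r["transform"]}' for r in rules]


clean = apply_rules({"CUST_NM": "  jane DOE ", "SEX_CD": "2"})
```

Because the rules are data rather than hand-written code, changing a rule changes both the behaviour and the description returned by describe_rules, which is the property the text attributes to transformation tools.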
Transformation for Dimension Attributes
Now we consider the updating of the dimension tables. The dimension tables are more stable in
nature and so they are less volatile compared to the fact tables. The fact tables change
through an increase in the number of rows, but the dimension tables change through
changes to the attributes. For instance, consider a product dimension table. Every year,
rows are added as new models become available. But what about the attributes within the
dimension table? You might face a situation where there is a change in the product
dimension table because a particular product was moved into a different product category. So
the corresponding values must be changed in the product dimension table. Though most of the
dimensions are generally constant over a period of time, they may change slowly.
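The contrast between the two kinds of change can be shown with a small sketch. The tables, keys and values are hypothetical, and the update shown simply overwrites the old category; whether the previous value also needs to be preserved is a separate design decision not covered here.

```python
# Hypothetical product dimension keyed by a surrogate key, and a sales fact table.
product_dim = {
    1: {"product_code": "P-100", "name": "Desk Lamp", "category": "Lighting"},
    2: {"product_code": "P-200", "name": "Office Chair", "category": "Furniture"},
}
sales_fact = [
    {"product_key": 1, "units": 3},
    {"product_key": 2, "units": 1},
]


def record_sale(product_key, units):
    """Fact tables grow: each new transaction appends a row."""
    sales_fact.append({"product_key": product_key, "units": units})


def recategorize(product_key, new_category):
    """Dimension tables change in place: the attribute value is overwritten."""
    product_dim[product_key]["category"] = new_category


record_sale(1, 2)                 # the fact table gains a row
recategorize(1, "Home Office")    # the dimension row count stays the same
```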
SET 2
Q.1 Explain business development life cycle in detail? [10 Marks]
May 09 2012
Ans.
The Business development Lifecycle is a methodology adopted for planning, designing,
implementing and maintaining the BI system. Various steps involved in this approach are
depicted below.
Each of the phases in the above life cycle is described below.
Project Planning
Developing a project plan involves identification of all the tasks necessary to implement the BI
project. The Project Manager identifies the key team members, assigns the tasks, and develops
the effort estimates for their tasks. There is much interplay between this activity and the activity
of defining the Business Requirements, because aligning the BI system/data warehouse system
with the business requirements is crucial. Therefore you need to understand the business
requirements properly before proceeding further.
Project Management
This is the phase wherein the actual implementation of the project takes place. The first step here
is to define the business requirements; the implementation is then carried out in three phases on
the basis of those requirements. The first phase (technical architecture design, and selection
and installation of a product) deals with technology. The second phase (Dimensional
Modeling, Physical Design, ETL Design & Development) focuses on data. The last phase
(BI Application Specification, BI Application Development) deals with the design and
development of analytical applications. The steps in these phases are discussed below.
1 Defining the Business Requirements
Business requirements are the bedrock of the BI system and so the Business Requirements
Definition acts as the foundation of the Lifecycle methodology. The business requirements
defined at this stage provide the necessary guidance to make the decisions. This process mainly
includes the following activities:
Requirements planning
Collecting the business requirements
Post-collection documentation and follow-up
2 Technical Architecture Design
Creation of the Technical Architecture includes the following steps:
1. Establishing an Architecture task-force
2. Collecting Architecture-related requirements
3. Documenting the Architecture requirements
4. Developing a high-level Architectural model
5. Designing and specifying the subsystems
6. Determining Architecture implementation phases
7. Documenting the technical Architecture
8. Reviewing and finalizing the Architecture
3 Selection and Installation of a Product
The selection and installation of a business intelligence product are carried out in the following
steps:
1. Understanding the corporate purchasing process
2. Developing a product evaluation matrix
3. Conducting market research
4. Shortlisting the options and performing detailed evaluations
5. Conducting a prototype (if necessary)
6. Selecting a product, installing on trial, and negotiating the value/price.
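A product evaluation matrix (step 2) is often just a weighted scoring table. The sketch below assumes hypothetical criteria, weights and scores; the real criteria would come from the business and architecture requirements.

```python
# Hypothetical criteria weights (summing to 1.0) and 1-5 scores per product.
WEIGHTS = {"functionality": 0.4, "vendor_support": 0.2, "cost": 0.25, "ease_of_use": 0.15}

SCORES = {
    "Product A": {"functionality": 4, "vendor_support": 3, "cost": 2, "ease_of_use": 5},
    "Product B": {"functionality": 3, "vendor_support": 5, "cost": 4, "ease_of_use": 3},
}


def weighted_score(scores, weights=WEIGHTS):
    """Sum of each criterion score multiplied by its weight."""
    return sum(weights[c] * scores[c] for c in weights)


def shortlist(all_scores, top_n=1):
    """Rank products by weighted score and keep the best top_n."""
    ranked = sorted(all_scores, key=lambda p: weighted_score(all_scores[p]), reverse=True)
    return ranked[:top_n]


best = shortlist(SCORES)
```

With these sample numbers Product B ranks first, even though Product A scores higher on functionality, which shows how strongly the chosen weights shape the shortlist.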
4 Dimensional Modeling
A dimensional model packages the data in a symmetric format whose design goals are user
understandability, query performance, and resilience to change. In this step, a data-modeling
team is formed and design workshops are conducted to create the dimensional model. Once the
modeling team is confident of the model prepared, the model is demonstrated and validated with
a broader audience and then documented.
5 Physical Design
In this step, the dimensional model created in the previous step is translated into a physical
design. The physical model includes details such as the physical database, data types, key
declarations, and the permissibility of nulls.
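As a minimal sketch of what such a translation produces, the following uses Python's built-in sqlite3 module to declare one dimension table and one fact table. The table names, column names and types are hypothetical, and a production warehouse would normally use a different database engine.

```python
import sqlite3

# Hypothetical physical design: data types, key declarations and nullability
# are stated explicitly for each column.
DDL = """
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_code TEXT    NOT NULL UNIQUE,
    category     TEXT    NOT NULL,
    description  TEXT                      -- nulls permitted
);
CREATE TABLE sales_fact (
    product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
    sale_date    TEXT    NOT NULL,
    units_sold   INTEGER NOT NULL,
    revenue      REAL    NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Each row is (cid, name, type, notnull, default_value, pk).
columns = conn.execute("PRAGMA table_info(product_dim)").fetchall()
```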
6 ETL Design & Development
ETL stands for Extraction, Transformation, and Loading. ETL tools are used to extract the data
from the operational data sources and to load the same into a data warehouse.
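The three stages can be sketched end to end in a few lines. This is an illustration under stated assumptions, not how any specific ETL tool works: the source rows, the column names and the orders_fact table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3


def extract():
    """Extraction: hypothetical rows as they might arrive from an operational system."""
    return [
        {"ORDER_NO": "A-1", "CUST": " acme ltd ", "AMT": "120.50"},
        {"ORDER_NO": "A-2", "CUST": "Beta PLC", "AMT": "80"},
    ]


def transform(rows):
    """Transformation: standardize customer names and convert amounts to numbers."""
    return [(r["ORDER_NO"], r["CUST"].strip().upper(), float(r["AMT"])) for r in rows]


def load(conn, rows):
    """Loading: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders_fact ("
        "order_no TEXT PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO orders_fact VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
total = warehouse.execute("SELECT SUM(amount) FROM orders_fact").fetchone()[0]
```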
7 BI Application Specification
In this step, a set of analytical applications are identified for building a BI system based on the
business requirements definition, type of data being used, and the architecture of the warehouse
proposed.
8 BI Application Development
This is the step wherein a specific application (tool) is selected from the identified applications
for actual implementation of the BI system.
9 Deployment
This is the step wherein the technology, data and analytical application tracks are converged. The
completion of this step can be assumed as the completion of actual building of the BI system.
10 Maintenance & Growth
During this step, the project team provides user support to the end-users of the system. Also,
the team is involved in providing the technical support required for the system so as to ensure
the continuous utilization of the system. This step may also include making some minor
enhancements to the BI system.
Revising the Project Planning
As the project makes progress, the project manager has to revise the project plan to
accommodate new business interests and concerns raised by the end-users.