The document provides information about data warehousing concepts. It defines a data warehouse as a relational database designed for query and analysis rather than transactions. It contains historical data from various sources and separates analysis from transaction workloads. The goals of a data warehouse are to provide a single source of integrated information, give users direct access to data without relying on IT, and allow predictive modeling. Factors like significant user requests for related historical data and advanced decision support needs should be considered when implementing a data warehouse.
DWH Concepts
What is a DATA WAREHOUSE?
A data warehouse is a relational database that is designed for query and analysis
rather than for transaction processing. It usually contains historical data derived from
transaction data, but it can include data from other sources. It separates analysis
workload from transaction workload and enables an organization to consolidate data
from several sources. In addition to a relational database, a data warehouse
environment includes an extraction, transportation, transformation, and loading (ETL)
solution, an online analytical processing (OLAP) engine, client analysis tools, and
other applications that manage the process of gathering data and delivering it to
business users.
® A data warehouse is a database designed to support a broad range of decision
tasks in a specific organization. It is usually batch updated and structured for rapid
online queries and managerial summaries. Data warehouses contain large amounts
of historical data. The term data warehousing is often used to describe the process
of creating, managing and using a data warehouse.
What are the characteristics of a DATA WAREHOUSE?
The characteristics of a DWH are
• Subject-Oriented: DWHs are designed to help you analyze data. For example,
to learn more about the company’s sales data, you can build a warehouse that
concentrates on sales. This ability to define a DWH by subject matter, sales in
this case, makes the DWH subject oriented.
• Integrated: It is closely related to subject orientation. DWHs put data from
disparate sources into a consistent format. They must resolve such problems as
naming conflicts and inconsistencies among units of measure. When they
achieve this, they are said to be integrated.
• Nonvolatile: It means that, once entered into the warehouse, data should not
change. This is logical because the purpose of a warehouse is to enable you to
analyze what has occurred and whatever once happened never changes.
• Time-Variant: In order to discover trends, analysts need large amounts of data.
This is very much in contrast to OLTP systems, where performance requirements
demand that historical data be moved to an archive. A DWH's focus on change
over time is what is meant by the term time variant.
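As an illustration of the "Integrated" characteristic, here is a minimal Python sketch, not a production routine. The two source layouts, the field names (cust_nm, customer_name, wt_lb, weight_kg) and the helper integrate_record are hypothetical, invented only to show a naming conflict and a unit-of-measure inconsistency being resolved into one consistent format.

```python
# Hypothetical example: two source systems describe the same kind of record
# with different column names and different units of measure.
LB_TO_KG = 0.45359237  # definition of the international pound in kilograms


def integrate_record(record, source):
    """Map a source-specific record onto one consistent warehouse format."""
    if source == "system_a":  # assumed layout: cust_nm, wt_lb (pounds)
        return {
            "customer_name": record["cust_nm"].strip().title(),
            "weight_kg": round(record["wt_lb"] * LB_TO_KG, 3),
        }
    if source == "system_b":  # assumed layout: customer_name, weight_kg
        return {
            "customer_name": record["customer_name"].strip().title(),
            "weight_kg": round(record["weight_kg"], 3),
        }
    raise ValueError(f"unknown source: {source}")


rows = [
    integrate_record({"cust_nm": " acme corp ", "wt_lb": 10.0}, "system_a"),
    integrate_record({"customer_name": "Globex", "weight_kg": 4.5}, "system_b"),
]
```

After this step both rows share the same column names and the same unit, which is what "integrated" means in practice.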
What are the goals of a DATA WAREHOUSE?
The goals of a DATA WAREHOUSE are
• To provide a reliable, single integrated source of key corporate information.
• To give end users access to their data without a reliance on reports produced by
the information system department.
• To allow analysts to analyze corporate data and even produce predictive “what if”
models from that data.
The data warehouse is simply one component of modern reporting architectures.
The real goal of reporting systems is decision support, or its modern equivalent,
business intelligence: to help people make better, more intelligent decisions.
When should a company consider implementing a data warehouse?
Data warehouses or a more focused database called a data mart should be
considered when a significant number of potential users are requesting access to a
large amount of related historical information for analysis and reporting purposes.
So-called active or real-time data warehouses can provide advanced decision
support capabilities.
What are the uses of a DATA WAREHOUSE?
• It separates analysis workload from transaction workload and enables an
organization to consolidate data from several sources.
• It manages the process of gathering data and delivering it to business users.
• It is used to analyze data.
• It puts data from disparate sources into a consistent format.
What are the benefits of data warehousing?
Some of the potential benefits of putting data into a data warehouse include:
1. Improving turnaround time for data access and reporting;
2. Standardizing data across the organization so there will be one view of the
"truth";
3. Merging data from various source systems to create a more comprehensive
information source;
4. Lowering costs to create and distribute information and reports;
5. Sharing data and allowing others to access and analyze the data;
6. Encouraging and improving fact-based decision-making.
What are the limitations of data warehousing?
The major limitations associated with data warehousing are related to user
expectations, lack of data and poor data quality. Building a data warehouse creates
some unrealistic expectations that need to be managed. A data warehouse doesn't
meet all decision support needs. If needed data is not currently collected, transaction
systems need to be altered to collect the data. If data quality is a problem, the
problem should be corrected in the source system before the data warehouse is
built. Software can provide only limited support for cleaning and transforming data.
Missing and inaccurate data cannot be "fixed" using software. Historical data can be
collected manually, coded and "fixed", but at some point source systems need to
provide quality data that can be loaded into the data warehouse without manual
clerical intervention.
What data is stored in a data warehouse?
In general, organized data about business transactions and business operations is
stored in a data warehouse. But, any data used to manage a business or any type of
data that has value to a business should be evaluated for storage in the warehouse.
Some static data may be compiled for initial loading into the warehouse. Any data
that comes from mainframe, client/server, or web-based systems can then be
periodically loaded into the warehouse. The idea behind a data warehouse is to
capture and maintain useful data in a central location. Once data is organized,
managers and analysts can use software tools like OLAP to link different types of
data together and potentially turn that data into valuable information that can be used
for a variety of business decision support needs, including analysis, discovery,
reporting and planning. Database administrators (DBAs) have always said that
having non-normalized or de-normalized data is bad.
What are the methodologies of Data Warehousing?
Every company has a methodology of its own. To name a few, the SDLC
methodology and the AIM methodology are widely used. Other methodologies include AMM,
World Class methodology and many more.
How does my company get started with data warehousing?
Build one! The easiest way to get started with data warehousing is to analyze some
existing transaction processing systems and see what type of historical trends and
comparisons might be interesting to examine to support decision making. See if
there is a "real" user need for integrating the data. If there is, then IS/IT staff can
develop a data model for a new schema and load it with some current data and start
creating a decision support data store using a database management system
(DBMS). Find some software for query and reporting and build a decision support
interface that's easy to use. Although the initial data warehouse/data-driven DSS
may seem to meet only limited needs, it is a "first step". Start small and build more
sophisticated systems based upon experience and successes.
What are the Data warehouse Implementation Schemes?
What type of Indexing mechanism do we need to use for a typical data
warehouse?
On the fact table it is usually best to use bitmap indexes, particularly on
low-cardinality columns. Dimension tables can use bitmap indexes and/or the other
types: clustered/non-clustered and unique/non-unique indexes.
To my knowledge, SQL Server does not support bitmap indexes, whereas Oracle
does.
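To make the idea of a bitmap index concrete, here is a small Python sketch of the concept. It is not how any particular database implements bitmap indexes; the column values and the helper names build_bitmap_index and matching_rows are assumptions made for illustration. Each distinct value gets one bitmap, and a filter on two columns becomes a bitwise AND.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Return {distinct value: int bitmap}; bit i is set when row i has that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


def matching_rows(bitmap):
    """Decode a bitmap back into a list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical low-cardinality fact-table columns.
region = ["EAST", "WEST", "EAST", "NORTH", "WEST", "EAST"]
channel = ["WEB", "WEB", "STORE", "WEB", "STORE", "WEB"]

region_idx = build_bitmap_index(region)
channel_idx = build_bitmap_index(channel)

# WHERE region = 'EAST' AND channel = 'WEB' becomes a single bitwise AND.
hits = matching_rows(region_idx["EAST"] & channel_idx["WEB"])
```

The sketch also hints at why bitmaps suit columns with few distinct values: the index holds one bitmap per value, so a high-cardinality column would need a very large number of mostly empty bitmaps.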
What are the steps to build the data warehouse?
Gathering business requirements
Identifying Sources
Identifying Facts
Defining Dimensions
Define Attributes
Redefine Dimensions & Attributes
Organize Attribute Hierarchy & Define Relationship
Assign Unique Identifiers
Additional conventions: Cardinality/Adding ratios
How often should data be loaded into a data warehouse from transaction
processing and other source systems?
It all depends on the needs of the users, how fast data changes and the volume of
information that is to be loaded into the data warehouse. It is common to schedule
daily, weekly or monthly dumps from operational data stores during periods of low
activity (for example, at night or on weekends). The longer the gap between loads,
the longer the processing times for the load when it does run. A technical IS/IT
staffer should make some calculations and consult with potential users to develop a
schedule to load new data.
What are the different architectures of data warehouse? ® What are the
different approaches of a Data warehouse?
There are two main approaches:
Top down (Bill Inmon)
Bottom up (Ralph Kimball)
What are the types of a data warehouse?
What is the main difference between Inmon and Kimball philosophies of data
warehousing?
The two differ in their approach to building the data warehouse.
Kimball views the data warehouse as a collection of data marts. Data marts are
focused on delivering business objectives for departments in the organization, and
the data warehouse is formed from the conformed dimensions of the data marts. Hence
a unified view of the enterprise can be obtained from dimensional modeling at a local
departmental level.
Inmon believes in creating a data warehouse on a subject-by-subject area basis.
Hence the development of the data warehouse can start with data from the online
store. Other subject areas can be added to the data warehouse as their needs arise.
Point-of-sale (POS) data can be added later if management decides it is necessary.
In short:
Kimball: first the data marts, which are then combined into a data warehouse.
Inmon: first the data warehouse, and later the data marts.
When should I consider a Data warehouse solution?
What is the process of warehousing data?
Explain the architecture of a data warehouse with the diagram.
What is Staging Area?
What is a general purpose scheduling tool?
The basic purpose of the scheduling tool in a DW application is to streamline the
flow of data from source to target at a specific time or based on some condition.
What is real time data warehousing?
Real-time data warehousing is a combination of two things:
1. real-time activity and
2. Data warehousing.
Real-time activity is activity that is happening right now. The activity could be
anything such as the sale of widgets. Once the activity is complete, there is data
about it. Data warehousing captures business activity data. Real-time data
warehousing captures business activity data as it occurs. As soon as the business
activity is complete and there is data about it, the completed activity data flows into
the data warehouse and becomes available instantly. In other words, real-time data
warehousing is a framework for deriving information from data as the data becomes
available.
What is ODS?
ODS means Operational Data Store. It is a collection of operational or base data that is
extracted from operational databases and standardized, cleansed, consolidated,
transformed, and loaded into an enterprise data architecture. An ODS is used to support
data mining of operational data, or as the store for base data that is summarized for
a data warehouse. The ODS may also be used to audit the data warehouse to
ensure that summarized and derived data are calculated properly. The ODS may further
become the enterprise shared operational database, allowing operational systems
that are being reengineered to use the ODS as their operational database.
What is Active data warehousing?
An active data warehouse provides information that enables decision-makers within
an organization to manage customer relationships nimbly, efficiently and proactively.
Active data warehousing is all about integrating advanced decision support with
day-to-day, even minute-to-minute, decision making in a way that increases the quality of
customer touches, which encourages customer loyalty and thus secures an
organization's bottom line. The marketplace is coming of age as we progress from
first-generation "passive" decision-support systems to current- and next-generation
"active" data warehouse implementations.
® An active data warehouse is one that every user can access at any time, 24/7.
® Active transformation means the data can change as it passes through.
What is meant by OLTP?
OLTP stands for On-Line Transaction Processing. It uses a standard, normalized
database structure. OLTP is designed for transactions, i.e., day-to-day transactions.
An OLTP database typically has hundreds of users connected to it. These databases are
normalized to reduce the redundancy of the data and to increase performance while
inserting data. The number of records being inserted is greater than the
number of records being updated or deleted. OLTP systems are not designed for
analysis, reporting and decision support. Examples: ATMs, online
shopping, online application filling, and online railway reservations.
Why are OLTP database designs generally not a good idea for a Data
Warehouse?
In OLTP, tables are normalized, so query response will be slow for end
users. OLTP also does not contain years of data, so historical trends cannot be analyzed.
Why is de-normalized data now ok when it's used for Decision Support?
Normalization of a relational database for transaction processing avoids processing
anomalies and results in the most efficient use of database storage. A data
warehouse for Decision Support is not intended to achieve these same goals. For
Data-driven Decision Support, the main concern is to provide information to the user
as fast as possible. Because of this, storing data in a de-normalized fashion,
including storing redundant data and pre-summarizing data, provides the best
retrieval results. Also, data warehouse data is usually static so anomalies will not
occur from operations like add, delete and update a record or field.
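The trade-off described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the tables product, sale and sales_summary and their columns are hypothetical, and a real warehouse would rebuild the summary as part of its load process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized, OLTP-style tables (hypothetical schema).
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                       sale_month TEXT, amount REAL);
    INSERT INTO product VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO sale VALUES (1, 1, '2024-01', 10.0), (2, 1, '2024-01', 15.0),
                            (3, 2, '2024-01', 7.5),  (4, 1, '2024-02', 20.0);

    -- De-normalized, pre-summarized table built once at load time.
    CREATE TABLE sales_summary AS
        SELECT p.category, s.sale_month, SUM(s.amount) AS total_amount
        FROM sale s JOIN product p ON p.product_id = s.product_id
        GROUP BY p.category, s.sale_month;
""")

# Reporting queries read the summary directly: no join, no aggregation.
summary = conn.execute(
    "SELECT category, sale_month, total_amount FROM sales_summary "
    "ORDER BY category, sale_month"
).fetchall()
conn.close()
```

The category name is stored redundantly in every summary row, which would invite update anomalies in a transaction system but is harmless here because the table is only rebuilt by the load.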
Why should you put your data warehouse on a different system than your
OLTP system?
An OLTP system is basically “data oriented” (ER model) and not “subject oriented”
(dimensional model). That is why we design a separate system that will host a
subject-oriented OLAP system. Moreover, a complex query fired on an OLTP
system causes heavy overhead on the OLTP server, which directly affects
day-to-day business.
What is Business Intelligence?
Business intelligence (BI) is a broad category of applications and technologies for
gathering, storing, analyzing, and providing access to data to help enterprise users
make better business decisions.
What are the important concerns of OLTP and DSS systems?
              OLTP                                  DSS
No. of users  Many                                  Few
Data          1. Stored in a complex data format    1. Stored in multidimensional
                 (normalized).                         structures, e.g. a cube
                                                       (3-dimensional).
              2. Stored in a normalized form,       2. Stored in de-normalized format.
                 normally 3rd normal form.
                 Normalization enhances
                 performance.
              3. Small volumes of data.             3. Large volumes of data.
              4. Data is volatile in nature.        4. Static in nature with periodic
                                                       loads.
Operations    Transactions                          Reporting
Indexes       Few                                   Many
Joins         Many (because it is normalized)       Few (because it is de-normalized)
Performance   Concurrency and availability are      Response time is most important.
              the more important aspects,
              e.g. ATMs.
OLTP                        Aspect                        DSS
Complex data structures                                   Multidimensional data structures
Few                         INDEXES                       Many
Many                        JOINS                         Some
Normalized DBMS             DUPLICATED DATA               De-normalized DBMS
Rare                        DERIVED DATA AND AGGREGATES   Common
Many                        NUMBER OF USERS               Few
Predefined operations       WORKLOAD                      Ad-hoc queries
Volatile                    DATA MODIFICATIONS            Updated on a regular basis
Small volumes               DATA                          Large volume (historical data)
Availability must be high                                 Response time must be good
What is the difference between ODS and OLTP?
ODS: It is a collection of tables created in the data warehouse that
maintains only current data, whereas OLTP maintains data only for transactions;
OLTP systems are designed for recording the daily operations and transactions of a business.
® ODS: The data resides with the data warehouse and stands alone. No further
transactions take place on the current data that is part of the data warehouse.
The current data changes only when it is uploaded through ETL on a scheduled basis.
OLTP: The data resides in an online system that is connected to the network, and all
transaction updates happen within seconds. Summarized values can change every
second.
What is an OLAP? What are the types of OLAP?
OLAP is software for manipulating multidimensional data from a variety of sources.
The data is often stored in data warehouse. OLAP software helps a user create
queries, views, representations and reports. OLAP tools can provide a "front-end" for
a data-driven DSS.
® OLAP: On-Line Analytical Processing: On-Line Analytical Processing (OLAP) is
a category of software technology that enables analysts, managers and executives
to gain insight into data through fast, consistent, interactive access to a wide variety
of possible views of information that has been transformed from raw data to reflect
the real dimensionality of the enterprise as understood by the user.
® OLAP stands for On-Line Analytical Processing. An OLAP system stores data in
multidimensional databases. You then access these databases to perform financial
and statistical analysis on different combinations of the data. An OLAP database is
generally used to analyze data. It is optimized so that you can quickly retrieve data. An
OLAP database is generally created from the information you have put in an OLTP
database. OLAP products can be grouped into 3 categories.
MOLAP: (Multidimensional OLAP)
o Data is stored in multidimensional arrays in order to be viewed in a
multidimensional manner.
o Multidimensional arrays provide efficiency in storage and operations.
o Examples: Oracle Express Server, Essbase by Hyperion Software, PowerPlay
by Cognos.
o MOLAP is not well suited to ad-hoc queries because it is optimized for
multidimensional operations
o Retrieval is fast
o Storage is very efficient
ROLAP: (Relational OLAP)
o Data is stored in a Relational model because OLAP capabilities are best provided
against the relational database.
o Examples: Oracle, SQL Server… etc.
o ROLAP integrates naturally with existing technology and standards.
o ROLAP can readily take advantage of parallel relational technology.
HOLAP: (Hybrid OLAP)
o These products combine MOLAP and ROLAP.
o With HOLAP products, a relational database stores most of the data.
o A separate multidimensional database stores a small portion of the data
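The MOLAP idea of storing data in multidimensional arrays can be sketched in plain Python. This is a toy illustration, not how any product named above stores its cubes; the dimension members, the measure values and the helper names cell, roll_up_by_product and slice_month are all hypothetical.

```python
# Hypothetical dimensions; each cell is addressed by (product, region, month).
products = ["Books", "Toys"]
regions = ["EAST", "WEST"]
months = ["Jan", "Feb"]

# cube[p][r][m] holds the sales measure for one combination of members.
cube = [
    [[10, 20], [5, 0]],  # Books: EAST (Jan, Feb), WEST (Jan, Feb)
    [[7, 3], [0, 8]],    # Toys:  EAST (Jan, Feb), WEST (Jan, Feb)
]


def cell(product, region, month):
    """Look up one cell directly by its position along each dimension."""
    return cube[products.index(product)][regions.index(region)][months.index(month)]


def roll_up_by_product():
    """Aggregate away region and month, leaving one total per product."""
    return {p: sum(sum(row) for row in cube[i]) for i, p in enumerate(products)}


def slice_month(month):
    """Fix the month dimension and return a product x region table."""
    m = months.index(month)
    return {p: {r: cube[i][j][m] for j, r in enumerate(regions)}
            for i, p in enumerate(products)}
```

A cell lookup is pure index arithmetic with no join, which is the source of the fast retrieval. The zero cells show the other side: as dimensions are added, more of the array is empty.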
Are OLAP databases called decision support systems? True/false?
True
What does the term ‘Metadata’ mean?
Very loosely, it is documentation about data; it is how you provide context for data
people might be using. Metadata is basically the wrapping you put around data you
use in everyday life to transform it into meaningful information.
What is the difference between data warehousing and OLAP?
The terms data warehousing and OLAP are often used interchangeably. As the
definitions suggest, warehousing refers to the organization and storage of data from
a variety of sources so that it can be analyzed and retrieved easily. OLAP deals with
the software and the process of analyzing data, managing aggregations, and
partitioning information into cubes for in-depth analysis, retrieval and visualization.
Some vendors are replacing the term OLAP with the terms analytical software and
business intelligence.
® A data warehouse is the place where the data is stored for analysis, whereas
OLAP is the process of analyzing the data, managing aggregations, and partitioning
information into cubes for in-depth visualization.
What is OLAP, MOLAP, ROLAP, DOLAP, and HOLAP?
OLAP - On-Line Analytical Processing: Designates a category of applications and
technologies that allow the collection, storage, manipulation and reproduction of
multidimensional data, with the goal of analysis.
MOLAP - Multidimensional OLAP: This term designates a Cartesian data structure
more specifically. In effect, MOLAP contrasts with ROLAP. In the former, joins
between tables are already in place, which enhances performance. In the latter,
joins are computed at query time. MOLAP is targeted at groups of users because it is a
shared environment. Data is stored in an exclusive server-based format. It performs
more complex analysis of data.
ROLAP - Relational OLAP: Designates one or several star schemas stored in
relational databases. This technology permits multidimensional analysis with data
stored in relational databases. Used for large departments or groups because it
supports large amounts of data and users.
DOLAP - Desktop OLAP: Small OLAP products for local multidimensional
analysis. There can be a mini multidimensional database (using Personal
Express), or extraction of a data cube (using Business Objects). Designed for the
low-end, single, departmental user. Data is stored in cubes on the desktop. It's like
having your own spreadsheet. Since the data is local, end users don't have to worry
about performance hits against the server.
HOLAP: Hybridization of OLAP, which can include any of the above.
What is meant by metadata in context of a Data warehouse and how it is
important?
Metadata is data about data. A business analyst or data modeler usually captures
information about data - the source (where and how the data originated), the nature of
the data (char, varchar, nullable, existence, valid values, etc.) and the behavior of the
data (how it is modified / derived, and its life cycle) - in a data dictionary, a.k.a. metadata.
Metadata is also presented at the data mart level, for subsets, facts and dimensions, the
ODS, etc. For a DW user, metadata provides vital information for analysis / DSS.
What is difference between MOLAP, ROLAP?
ROLAP                                   MOLAP
Tactical                                Strategic
• Detailed data                         • Summary data
• Simple calculations                   • Complex calculations
• Analyze past trends                   • Predict future trends
Data storage structure                  Data storage structure
• Tables                                • Cube
Advantages                              Advantages
• Requires less memory storage space    • Data access is faster
Disadvantages                           Disadvantages
• Data access is slow                   • Requires more memory storage space
                                        • Is sparsely filled as the number of
                                          dimensions in the cube increases
What is the Difference between OLTP and OLAP?
Main Differences between OLTP and OLAP are:-
1. User and System Orientation
OLTP: customer-oriented, used for transaction and query processing by clerks, clients and
IT professionals.
OLAP: market-oriented, used for data analysis by knowledge workers (managers,
executives, analysts).
2. Data Contents
OLTP: manages current data, very detail-oriented.
OLAP: manages large amounts of historical data, provides facilities for
summarization and aggregation, stores information at different levels of granularity to
support decision making process.
3. Database Design
OLTP: adopts an entity-relationship (ER) model and an application-oriented database
design.
OLAP: adopts star, snowflake or fact constellation model and a subject-oriented
database design.
4. View
OLTP: focuses on the current data within an enterprise or department.
OLAP: spans multiple versions of a database schema due to the evolutionary
process of an organization; integrates information from many organizational
locations and data stores
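The points on database design and granularity can be illustrated with a tiny star schema in Python's built-in sqlite3 module. This is a sketch under assumptions: the tables dim_date, dim_product and fact_sales, their columns, and the helper sales_by are hypothetical and chosen only to show one fact table being summarized at two levels of detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema: one fact table, two dimension tables.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date,
                             product_key INTEGER REFERENCES dim_product,
                             amount REAL);
    INSERT INTO dim_date VALUES (20230101, 2023, 1), (20230201, 2023, 2),
                                (20240101, 2024, 1);
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO fact_sales VALUES (20230101, 1, 10.0), (20230201, 1, 5.0),
                                  (20230201, 2, 2.5),  (20240101, 2, 4.0);
""")


def sales_by(*levels):
    """Summarize the fact table at the requested dimension levels.

    The level names are trusted constants in this sketch, not user input.
    """
    cols = ", ".join(levels)
    sql = (f"SELECT {cols}, SUM(f.amount) FROM fact_sales f "
           "JOIN dim_date d ON d.date_key = f.date_key "
           "JOIN dim_product p ON p.product_key = f.product_key "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql).fetchall()


by_year = sales_by("d.year")
by_year_category = sales_by("d.year", "p.category")
```

The same fact rows answer both questions. Only the grouping level changes, which is the sense in which OLAP stores information "at different levels of granularity".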
What types of Metadata are there and when will they be available?
Metadata will be made available on the Decision Support website as each increment
'goes live'. We have two classifications of metadata: one that is business and one
that is technical. Technical metadata is fairly clear-cut: where did the data come from
or how was it transformed along the way? Business metadata deals more with the
possible meaning of the data and how it can be used.
Why is Metadata important to the DWH User?
Metadata is what makes the data in the Data Warehouse meaningful. The Data
Warehouse is very different from an operational application. When you're using an
operational application, you can get clues from the screen that tell you to update a
particular field on the window. If I’m processing a new employee, I know exactly what
needs to be updated for that new employee record, and can move through the
process based on the context that the application provides. In a data-warehousing
environment, you don’t have that context or workflow. You have data that is
interrelated, and it is raw out there in a form, but there is no application between you
and the data. Basically, you have a number of tables and structures that you have
access to without a business layer, without a definition on top of it. So metadata is
very important to be able to provide that context to people so they know how to go
between subject areas or how data within a subject area is related and what it
defines and represents.
Is Metadata a description of what the data represents?
In the simplest terms it is. As an example, if a user of the Data Warehouse is
interested in a field called "campus code", then the metadata might have a definition
of what the campus code represents, such as "an indicator for one of the three
campuses". That is a form of metadata, although it is not a complete picture of what
metadata can be.
What types of Metadata will be made available to the User?
Decision Support has identified several kinds of metadata that will be published on
the website. Some basic categories are the data model, source-to-target mapping,
and the logical & physical model. The logical model gives more of a grouping or
identifies logically what would be expected from the business side. The physical
model goes into more detail with more of the data dictionary definition, but it gives
the user a pictorial representation of the data, not just a list of columns and tables. It
provides a visual so people can see how data elements relate to each other. There is
also a category of metadata that we call usage notes. These go into expanding on
how someone might query the Data Warehouse or use a query against a data mart.
Based on going through the requirements process and working with the focus
groups, as data is available, we expect to expand the metadata categories.
Is Metadata also useful to the average User of the DWH, in addition to a
department’s technical staff?
Yes. For an "ad hoc" user, there may be questions as to what a field represents.
Another form of metadata at a business user level would be sample queries that
Decision Support’s Services area would publish based on findings from the
requirements process and focus groups. These queries provide samples of relating
data to answer a business question.
What Challenges are involved when providing Metadata?
Historically organizations find it a challenge to manage metadata over time. So I
think the biggest challenge that we face at Decision Support is learning from those
mistakes and from what we’ve read in the industry. We need to make sure the
metadata we have is ‘live’; that it’s not something that is static and put on the shelf.
Decision Support has formed a Custodial Data Council that will take ownership in
making sure we have business definitions and work with the user community. I think
we also need to technically streamline those processes as much as possible, publish
the metadata, and make it as consistent as possible.
What is the difference between DWH and BI?
There may be a feature film (movie) without a trailer, but there will be no trailer
without a movie. Similarly, data warehousing is a concept related to extracting a client's
business data, applying business processing to that data according to
user needs, and finally loading the processed data into a database. This database is
what we call a warehouse or data warehouse. After the data warehouse is complete,
the business user ultimately wants to view the data (precise and
summarized data), but as a business person he or she may not know how to access
a database (a technical person can access the database with SQL). This is where
OLAP tools come in, which help that person access the database. We can call these OLAP
tools Business Intelligence tools (intelligent in the sense that they generate SQL queries
internally and provide many facilities for report developers to
format the data and present it in a highly convenient manner). So the data
warehouse (movie) is a database, and business intelligence tools (trailers) present
the content of that database in an efficient manner.
® Simply speaking, BI is the capability of analyzing the data of a data warehouse to
the advantage of the business. A BI tool analyzes the data of a data warehouse so that
business decisions can be made depending on the result of the analysis.
® Data warehousing deals with all aspects of managing the development,
implementation and operation of a data warehouse or data mart, including metadata
management, data acquisition, data cleansing, data transformation, storage
management, data distribution, data archiving, operational reporting, analytical
reporting, security management, backup/recovery planning, etc. Business
intelligence, on the other hand, is a set of software tools that enable an organization
to analyze measurable aspects of their business such as sales performance,
profitability, operational efficiency, effectiveness of marketing campaigns, market
penetration among certain customer groups, cost trends, anomalies and exceptions,
etc. Typically, the term “business intelligence” is used to encompass OLAP, data
visualization, data mining and query/reporting tools. Think of the data warehouse as
the back office and business intelligence as the entire business including the back
office. The business needs the back office on which to function, but the back office
without a business to support makes no sense.
® DATA WAREHOUSE: A data warehouse is an integrated, time-variant, subject-oriented
and non-volatile collection of data in support of management's decision-making process.
BUSINESS INTELLIGENCE: Business intelligence is the process of extracting
data, converting it into information and then into a knowledge base.
® A data warehouse is a database geared towards the business intelligence
requirements of an organization. It integrates data from the various operational
systems and is typically loaded from these systems at regular intervals.
BI - It is a category of technologies that allows for gathering, storing, accessing and
analyzing data to help business users make better decisions.
® To make business analysis effective and efficient, we require a specialized form of
storage. This special form of data storage is called a data warehouse, and the
process is called data warehousing.
Business intelligence is the mechanism of using data, according to the type of industry,
for predictive analysis, fault finding, process improvement, etc.
What is a Data Dictionary?
A data dictionary is a kind of metadata. A data dictionary explains how data
physically resides in an environment. A data dictionary identifies the type of column it
is, whether it is character or numeric or some other value. It identifies the width of a
column as well as the name of the column. Sometimes in data dictionaries you see
descriptions; sometimes you don’t. But basically it is how that field is physically
represented in Oracle or Sybase or some other platform, if that’s where the data
resides. It's difficult to do any meaningful query or report without basic metadata.
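A minimal data dictionary can be sketched by reading a database's own catalog. The example below uses Python's built-in sqlite3 module and SQLite's PRAGMA table_info; SQLite stands in for Oracle or Sybase purely for convenience, and the student table and the descriptions mapping are hypothetical (the "campus code" wording is borrowed from the example earlier in this document).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, "
    "campus_code CHAR(1) NOT NULL, full_name VARCHAR(60))"
)

# Business descriptions are maintained by people; the catalog cannot supply them.
descriptions = {"campus_code": "Indicator for one of the three campuses"}

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
data_dictionary = [
    {
        "column": name,
        "type": col_type,  # declared type, including the width where given
        "nullable": not notnull and not pk,
        "description": descriptions.get(name, ""),
    }
    for _cid, name, col_type, notnull, _default, pk
    in conn.execute("PRAGMA table_info(student)")
]
conn.close()
```

The physical facts (name, type, width, nullability) come from the platform, while the description column is exactly the part that "sometimes you see and sometimes you don't".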
What are the possible data marts in Retail sales?
Product information, sales information.
What are data validation strategies for data mart validation after loading
process?
Data validation makes sure that the loaded data is accurate and meets the
business requirements.
Strategies are the different methods followed to meet the validation requirements.
What is a Data Mart?
A Data Mart is a focused subset of a DWH that deals with a single area of data and
is organized for quick analysis. It contains summarized data from the warehouse
and is referred to as a High Performance Query Structure. Data marts consist of
Materialized Views and Special Indexes. In some businesses the data marts may
be maintained within the warehouse, whereas in other scenarios they may
be maintained apart from the DWH.
® A data mart is a repository of data gathered from operational data and other
sources that is designed to serve a particular community of knowledge workers.
® A system designed for a particular line of business.
What are Data Marts?
Data Marts are designed to help managers make strategic decisions about their
business. Data Marts are subsets of the corporate-wide data that are of value to a
specific group of users.
There are two types of Data Marts:
1. Independent data marts: sourced from data captured from OLTP systems,
external providers or from data generated locally within a particular department or
geographic area.
2. Dependent data marts: sourced directly from enterprise data warehouses.
What are the levels of Data mart?
What are the differences between Database, DATAWAREHOUSE and Data
Marts?
A Database is an organized collection of data.
A DWH is a very large database with a special set of tools to extract and cleanse data
from operational systems and to analyze data.
A Data Mart is a focused subset of a DWH that deals with a single area of data and
is organized for quick analysis.
What is Data Sampling?
What is Data Scrubbing?
What is Data Acquisition Process?
What is data mining?
Data mining is a process of extracting hidden trends from within a data warehouse. For
example, an insurance data warehouse can be mined to find the highest-risk
people to insure in a certain geographical area.
What is a transformation?
It is a repository object that generates, modifies or passes data.
Transformations: Transformations are the manipulation of data from how it appears
in the source systems into another form in the DWH or data mart in a way that
enhances or simplifies its meaning. In other words, you transform data into
information. This includes the following:
Data Merging: It is the process of standardizing data types and fields. Suppose one
source system stores integer data as smallint whereas another stores the same data
as decimal. The data from the two source systems needs to be rationalized when it
is moved into the Oracle data format called NUMBER.
Cleansing: It is the process of validating the data brought from multiple sources.
This involves identifying and correcting any inconsistencies or inaccuracies:
• Eliminating inconsistencies in the data from multiple sources.
• Converting data from different systems into a single consistent data set suitable for
analysis.
• Meeting a standard for establishing data elements, codes, domains, formats and
naming conventions.
• Correcting data errors and filling in missing data values.
Aggregation: The process whereby multiple detailed values are combined into a
single summary value, typically a sum of numbers representing dollars spent or
units sold.
• Generating summarized data for use in aggregate fact and dimension tables.
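The three kinds of transformation above can be sketched in a few lines of Python.
This is only an illustration: the two source systems, the field names and the rule
of replacing a missing value with zero are all assumptions made for the example.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical extracts: system A stores units as integers (smallint),
# system B stores them as decimal text, with untidy codes and a missing value.
source_a = [{"store": "S1", "units": 3}, {"store": "S2", "units": 5}]
source_b = [{"store": "s1 ", "units": "2.0"}, {"store": "S2", "units": None}]


def standardize(row):
    """Merge both representations into one type and cleanse the store code."""
    store = row["store"].strip().upper()  # cleansing: consistent codes
    # Assumption for this sketch: a missing value is filled in with zero.
    units = Decimal(str(row["units"])) if row["units"] is not None else Decimal(0)
    return {"store": store, "units": units}  # merging: a single numeric type


clean_rows = [standardize(row) for row in source_a + source_b]

# Aggregation: combine the detailed values into one summary value per store.
units_by_store = defaultdict(Decimal)
for row in clean_rows:
    units_by_store[row["store"]] += row["units"]

print(dict(units_by_store))
```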
What are the advantages of data mining over traditional approaches?
Data mining is used to estimate the future. For example, for a company or business
organization, data mining can be used to predict the future of the business in terms
of revenue, employees, customers, orders, etc.
Traditional approaches use simple algorithms for estimating the future, but they do
not give results as accurate as those of data mining.
What is ETL?
ETL stands for extraction, transformation and loading.
ETL tools provide developers with an interface for designing source-to-target mappings,
transformations and job control parameters.
• Extraction: Take data from an external source and move it to the warehouse
pre-processor database.
• Transformation: The transform data task allows point-to-point generating, modifying
and transforming of data.
• Loading: The load data task adds records to a database table in a warehouse.
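The three steps can be pictured with a very small hand-written pipeline. This is a
sketch only, not how any particular ETL tool works: it assumes two in-memory SQLite
databases, and the orders and fact_orders tables are hypothetical.

```python
import sqlite3


def extract(source_conn):
    """Extraction: read the raw rows from the source system."""
    return source_conn.execute(
        "SELECT order_id, amount, country FROM orders"
    ).fetchall()


def transform(rows):
    """Transformation: drop rows without an amount, fix types and codes."""
    return [
        (order_id, round(float(amount), 2), country.strip().upper())
        for order_id, amount, country in rows
        if amount is not None
    ]


def load(warehouse_conn, rows):
    """Loading: add the transformed records to the warehouse table."""
    warehouse_conn.executemany(
        "INSERT INTO fact_orders (order_id, amount, country) VALUES (?, ?, ?)",
        rows,
    )
    warehouse_conn.commit()


# Hypothetical source system with untidy data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, amount TEXT, country TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "10.50", " in"), (2, None, "US"), (3, "4", "us ")],
)

# Hypothetical warehouse target.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
)

load(warehouse, transform(extract(source)))
loaded = warehouse.execute(
    "SELECT order_id, amount, country FROM fact_orders ORDER BY order_id"
).fetchall()
print(loaded)
```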
Explain the classification of Tables in a Data warehouse?
What is Fact table?
A fact table contains the measurements, metrics or facts of a business process. If
your business process is "Sales", then a measurement of this business process such
as "monthly sales number" is captured in the fact table. The fact table also contains
the foreign keys to the dimension tables.
Why is a fact table in normal form?
Basically, the fact table consists of the keys of the dimension/lookup tables
and the measures. Each measure depends on the whole combination of dimension keys
and on nothing else, so the table holds no redundant descriptive data and is
therefore in normal form.
What is the level of Granularity of a fact table?
Level of granularity means the level of detail that you put into the fact table in a
data warehouse. For example, based on the design, you can decide to store the sales
data for each transaction. The level of granularity then means how much detail you
are willing to store for each transactional fact: for example, product sales recorded
for each individual transaction, or aggregated up to the minute and stored at that level.
What does level of Granularity of a fact table signify?
Granularity: The first step in designing a fact table is to determine the granularity of
the fact table. By granularity, we mean the lowest level of information that will be
stored in the fact table. This consists of two steps:
1. Determine which dimensions will be included.
2. Determine where along the hierarchy of each dimension the information will be kept.
The determining factors usually go back to the requirements.
What is aggregate fact table?
An aggregate table contains the measure values, aggregated (grouped/summed up) to
some level of the hierarchy.
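As a sketch of the idea, the example below rolls a day-level fact table up to the
month level of the date hierarchy. It assumes SQLite, and the fact_sales and
agg_sales_monthly tables and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Detailed fact table: one row per sale (the finest grain in this sketch).
conn.execute("CREATE TABLE fact_sales (sale_date TEXT, product_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("2024-01-05", 1, 10.0),
        ("2024-01-20", 1, 15.0),
        ("2024-02-03", 1, 7.0),
        ("2024-01-09", 2, 4.0),
    ],
)

# Aggregate fact table: the same measure summed up to month and product.
conn.execute(
    """
    CREATE TABLE agg_sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS sale_month,
           product_id,
           SUM(amount) AS total_amount,
           COUNT(*) AS sale_count
    FROM fact_sales
    GROUP BY substr(sale_date, 1, 7), product_id
    """
)

monthly = conn.execute(
    "SELECT sale_month, product_id, total_amount "
    "FROM agg_sales_monthly ORDER BY sale_month, product_id"
).fetchall()
print(monthly)
```

Queries that only need monthly totals can then read the smaller aggregate table
instead of scanning every detailed row.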
What is a factless fact table? Where have you used it in your project?
A factless fact table contains only the keys; there are no measures available
in it.
What is the common use of creating a Factless Fact Table?
What are the different types of Fact Table? Explain with an example.
1. Cumulative Fact Table:
2. Snapshot Fact Table:
What are the types of Facts?
Additive: A fact that can be summed up across all of the dimensions is called an
additive fact.
® A measure that can participate in arithmetic calculations using all or any
dimensions. Ex: Sales profit
Semi-additive: A fact that can be summed up across some of the dimensions, but not
all, is called a semi-additive fact.
® A measure that can participate in arithmetic calculations using only some
dimensions. Ex: Account balance (it can be summed across accounts, but not across time)
Non-additive: A fact that cannot be summed up across any of the dimensions is called
a non-additive fact.
® A measure that can't participate in arithmetic calculations using dimensions. Ex:
Temperature
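A small worked example may help to tell the first two kinds apart. The rows below are
hypothetical daily snapshots of two bank accounts: the deposit amount behaves as an
additive fact, while the end-of-day balance is only semi-additive.

```python
# Hypothetical rows: (date, account, deposit_amount, end_of_day_balance)
rows = [
    ("2024-01-01", "A", 100.0, 500.0),
    ("2024-01-01", "B", 50.0, 300.0),
    ("2024-01-02", "A", 20.0, 520.0),
    ("2024-01-02", "B", 0.0, 300.0),
]

# Additive: deposits can be summed across every dimension (dates and accounts).
total_deposits = sum(r[2] for r in rows)

# Semi-additive: balances can be summed across accounts for a single date...
balance_on_jan_2 = sum(r[3] for r in rows if r[0] == "2024-01-02")

# ...but summing one account's balance across dates gives a meaningless number.
misleading_sum_for_a = sum(r[3] for r in rows if r[1] == "A")

# Across time, an average (or the latest value) is used instead.
days_for_a = [r for r in rows if r[1] == "A"]
average_balance_for_a = misleading_sum_for_a / len(days_for_a)

print(total_deposits, balance_on_jan_2, misleading_sum_for_a, average_balance_for_a)
```

A non-additive fact such as a temperature or a ratio would not be summed along any
dimension at all.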
What are Semi-additive and factless facts and in which scenario will you use
such kinds of fact tables?
Snapshot facts are semi-additive. When we maintain aggregated facts such as balances,
we go for semi-additive facts. Ex: Average daily balance
A fact table without numeric fact columns is called a factless fact table. Ex: Promotion
facts, which record the promotion values of a transaction (ex: product samples);
this table doesn't contain any measures.
What are non-additive facts in detail?
A fact may be a measure, a metric or a dollar value. Measures and metrics can be
non-additive facts.
A dollar value is an additive fact. If we want to find out the amount for a particular
place for a particular period of time, we can add the dollar amounts and come up with
the total amount.
A non-additive fact is, for example, the measured height(s) for 'citizens by
geographical location': when we roll up 'city' data to 'state' level, we should not
add the heights of the citizens; rather, we may want to use them to derive a 'count'.
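What is a conformed fact?
A conformed fact is a measure that has the same definition and units in every fact
table and data mart in which it appears, so that it can be compared and combined
across them. Likewise, conformed dimensions are dimensions which can be used across
multiple Data Marts in combination with multiple fact tables.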
What is a continuously valued fact?
What is Centipede Fact Table?
What is Fact Constellation?
What are the categories of Snapshot Fact Table Grains?
What is a dimension table?
A dimension table is a collection of hierarchies and categories along which the user
can drill down and drill up. It mainly contains textual (descriptive) attributes.
How are the Dimension tables designed?
Most dimension tables are designed using Normalization principles up to 2NF. In
some instances they are further normalized to 3NF.
Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension (see more on this in the next
section).
Change fact table and DW population routines.
What are the Different methods of loading Dimension tables?
Conventional load: Before the data is loaded, all the table constraints are checked
against the data.
Direct load (faster loading): All the constraints are disabled and the data is loaded
directly. Later the data is checked against the table constraints, and the bad data
is not indexed.
Can a dimension table contain numeric values?
What is the hierarchy relationship in a dimension? Is it:
1. 1:1
2. 1:m
3. m:m
What are the different types of dimensions? Explain with examples.
1. Regular Dimensions
2. Shared dimensions
What are the different types of dimension tables? Explain with examples.
Why are dimensions de-normalized in nature?
Can 2 fact tables share same dimension tables?
What is junk dimension?
Junk dimension: A grouping of random flags and text attributes that are moved out of
a dimension into a separate sub-dimension.
® A dimension which does not change the grain level is called a junk dimension.
Grain: the lowest level of reporting.
(Or) The junk dimension is simply a structure that provides a convenient place to
store the junk attributes.
(Or) A junk dimension is a convenient grouping of flags and indicators.
What are Conformed Dimensions?
A dimension that is used in more than one cube.
® The use of conformed dimensions and shared measures is the primary way a set
of data marts can be united into one consolidated data warehouse.
® Conformed dimensions are dimensions which are common to several cubes (cubes
are the schemas that contain fact and dimension tables).
Consider Cube-1, which contains F1, D1, D2, D3, and Cube-2, which contains F2, D1, D2,
D4, where the Fs are facts and the Ds are dimensions. Here D1 and D2 are the
conformed dimensions.
® Conformed dimensions mean exactly the same thing with every possible fact table
to which they are joined. Ex: The Date dimension is connected to all facts, such as
Sales facts, Inventory facts, etc.
What is a degenerate dimension?
Degenerate Dimension: Keeping the control information in the fact table. Ex: Consider
a dimension table with fields like order number and order line number that has a 1:1
relationship with the fact table. In this case the dimension is removed and the order
information is stored directly in the fact table, in order to eliminate unnecessary
joins while retrieving order information.
What is a degenerate dimension table?
Degenerate Dimensions: Values in a table that are neither dimensions nor measures
are called degenerate dimensions. Ex: invoice id, empno.
What is Audit dimension? Explain with an example.
What is a Fact Dimension?
What is a Mini Dimension?
What are Role-playing dimensions?
What is a Mystery Dimension?
How do you connect the facts and dimensions in the tables?
1. Smart Matching columns
2. Manually you can link
Which columns go to the fact table and which columns go to the dimension
table?
The Primary Key columns of the Tables (Entities) go to the Dimension Tables as
Foreign Keys.
The Primary Key columns of the Dimension Tables go to the Fact Tables as Foreign
Keys.
What is Associate Table?
What is Bridge Table?
What is a cross reference table?
What is an Event-Tracking Table?
What is a lookup table?
A lookup table is one that is used when updating a warehouse. When the
lookup is placed on the target table (fact table / warehouse) based upon the primary
key of the target, it updates the table by allowing through only new or updated
records, based on the lookup condition.
What is the data type of the surrogate key?
Data type of the surrogate key is either integer or numeric or number.
What is a Schema?
What is a Star Schema?
Star schema is a way of organizing the tables such that we can retrieve results
from the database easily and quickly in the warehouse environment. Usually a star
schema consists of one or more dimension tables around a fact table, which looks
like a star; that is how it got its name.
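A minimal sketch of such a layout is shown below, assuming SQLite and hypothetical
table and column names (dim_product, dim_date, fact_sales): one fact table in the
middle, with each dimension joined to it directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive context for the facts.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY, calendar_date TEXT, year INTEGER);

    -- Fact table: foreign keys to every dimension plus the measures.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        units_sold INTEGER,
        sales_amount REAL);

    INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
    INSERT INTO dim_date VALUES
        (20240101, '2024-01-01', 2024), (20240102, '2024-01-02', 2024);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 10, 15.0), (1, 20240102, 4, 6.0), (2, 20240101, 1, 30.0);
    """
)

# A typical star query: join the fact table to its dimensions and aggregate.
sales_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    WHERE d.year = 2024
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(sales_by_category)
```

Every join in the query goes straight from the fact table to a dimension table, which
is what keeps star queries simple.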
Differences between star and snowflake schemas?
Star schema: A single fact table with N dimensions.
Snowflake schema: A schema in which any dimension has extended dimensions is known
as a snowflake schema.
® Star schema: all dimensions are linked directly to the fact table.
Snowflake schema: dimensions may be interlinked or may have one-to-many relationships
with other tables.
What is Snow-Flake Schema?
When do you go for Star Schema, and when do you go for Snow-Flake Schema?
What is the main difference between schema in RDBMS and schemas in Data
Warehouse?
RDBMS Schema
• Used for OLTP systems
• Traditional and old schema
• Normalized
• Difficult to understand and navigate
• Cannot solve extract and complex problems
• Poorly modeled
DWH Schema
• Used for OLAP systems
• New generation schema
• De Normalized
• Easy to understand and navigate
• Extract and complex problems can be easily solved
• Very good model
Why did you choose STAR SCHEMA only? What are the benefits of STAR
SCHEMA?
Because of its de-normalized structure, i.e., the dimension tables are de-normalized.
As for why to de-normalize, the first (and often only) answer is: speed. An OLTP
structure is designed for data inserts, updates, and deletes, but not for data
retrieval. Therefore, we can often squeeze some speed out of it by de-normalizing
some of the tables and having queries go against fewer tables. These queries are
faster because they perform fewer joins to retrieve the same record set. Joins are
also confusing to many end users. By de-normalizing, we can present the user with a
view of the data that is far easier for them to understand.
Benefits of STAR SCHEMA:
Far fewer Tables.
Designed for analysis across time.
Simplifies joins.
Less database space.
Supports “drilling” in reports.
Flexibility to meet business and technical needs.
Difference between Snow flake and Star Schema. What are situations where
Snow flake Schema is better than Star Schema to use and when the opposite
is true?
Star schema: It contains the dimension tables mapped around one or more fact tables.
It is a denormalized model. There is no need to use complicated joins, and queries
return results quickly.
Snowflake schema: It is the normalized form of the star schema. It contains in-depth
joins, because the tables are split into many pieces. We can easily make modifications
directly in the tables. We have to use complicated joins, since we have more
tables, so there will be some delay in processing queries.
Which is preferable? Star Schema or Snow-Flake Schema?
If you have 2 fact tables connected in the schema, do you know the name of the
schema?
What is Galaxy Schema?
What is Multi-Star Schema?
How do you load the time dimension?
Time dimensions are usually loaded by a program that loops through all possible
dates that may appear in the data. It is not unusual for 100 years to be represented
in a time dimension, with one row per day.
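Such a loader can be sketched as a simple loop over the calendar. The column names
below (date_key, quarter and so on) are assumptions chosen for the example, and only
one year is generated to keep the output small.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append(
            {
                "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20240131
                "calendar_date": current.isoformat(),
                "year": current.year,
                "quarter": (current.month - 1) // 3 + 1,
                "month": current.month,
                "day_of_week": DAY_NAMES[current.weekday()],
            }
        )
        current += timedelta(days=1)
    return rows


date_dim = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
print(len(date_dim), date_dim[0], date_dim[-1])
```

Widening the start and end dates to cover 100 years only changes the number of rows
produced, not the logic.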
What are slowly changing dimensions?
SCD stands for Slowly Changing Dimensions. Slowly changing dimensions are of
three types:
SCD1: only the updated values are maintained.
Ex: when a customer's address is modified, we update the existing record with the new address.
SCD2: historical and current information are both maintained by using
A) Effective Date
B) Versions
C) Flags, or a combination of these
SCD3: historical and current information are maintained by adding new columns to
the target table.
® Type-1: Most Recent Value
Type-2: Full History
i) Version Number
ii) Flag
iii) Date
Type-3: Current and one Previous value
® Type 1: data is overwritten, so only the current data is there.
Type 2: current, recent and history data should be there.
Type 3: current and recent data should be there.
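The difference between Type 1 and Type 2 can be sketched with plain Python lists
standing in for the dimension table. The column names (surrogate_key, version,
is_current and so on) are assumptions for the example; the Type 2 variant here uses
effective dates, a version number and a flag together.

```python
from datetime import date


def apply_scd1(dimension, customer_id, new_address):
    """Type 1: overwrite the value in place, so no history is kept."""
    for row in dimension:
        if row["customer_id"] == customer_id:
            row["address"] = new_address


def apply_scd2(dimension, customer_id, new_address, change_date):
    """Type 2: expire the current row and append a new version."""
    versions = [r for r in dimension if r["customer_id"] == customer_id]
    for row in versions:
        if row["is_current"]:
            row["is_current"] = False
            row["end_date"] = change_date
    dimension.append(
        {
            "surrogate_key": max((r["surrogate_key"] for r in dimension), default=0) + 1,
            "customer_id": customer_id,
            "address": new_address,
            "version": max((r["version"] for r in versions), default=0) + 1,
            "start_date": change_date,
            "end_date": None,
            "is_current": True,
        }
    )


def initial_dimension():
    return [
        {
            "surrogate_key": 1,
            "customer_id": 42,
            "address": "Old Street",
            "version": 1,
            "start_date": date(2020, 1, 1),
            "end_date": None,
            "is_current": True,
        }
    ]


scd1_dim = initial_dimension()
apply_scd1(scd1_dim, 42, "New Avenue")

scd2_dim = initial_dimension()
apply_scd2(scd2_dim, 42, "New Avenue", date(2024, 6, 1))

print(scd1_dim)
print(scd2_dim)
```

After the change, the Type 1 table still has one row with the new address, while the
Type 2 table has two rows: the expired old version and the current one.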
What is BUS Schema?
A BUS Schema is composed of a master suite of conformed dimensions and
standardized definitions of facts.
What is hybrid slowly changing dimension?
What are Critical columns?
What is a surrogate key? Why is it used? What is its need? Give an example.
Explain in detail what do you mean by Slicing and Dicing?
Slicing and dicing refers to the ability to combine and re-combine the dimensions to
see different slices of the information. Picture slicing a three-dimensional cube of
information, in order to see what values are contained in the middle layer. Dicing is
the ability to view the cube from different perspectives. Slicing and dicing a cube
allows an end-user to do the same thing with multiple dimensions.
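One way to picture this is with a tiny cube held in a Python dictionary. This is a
sketch under assumptions: the three dimensions (year, region, product), the sales
values and the helper names slice_cube and roll_up are all hypothetical, and dicing
is shown simply as regrouping the same cells along a different combination of
dimensions.

```python
from collections import defaultdict

DIMENSIONS = ("year", "region", "product")

# Each cell of the cube: (year, region, product) -> sales
cube = {
    (2023, "East", "Pen"): 10,
    (2023, "West", "Pen"): 7,
    (2024, "East", "Pen"): 12,
    (2024, "East", "Lamp"): 3,
    (2024, "West", "Lamp"): 5,
}


def slice_cube(cells, dimension, value):
    """Slice: keep only the layer where one dimension has a fixed value."""
    position = DIMENSIONS.index(dimension)
    return {key: sales for key, sales in cells.items() if key[position] == value}


def roll_up(cells, *dimensions):
    """Regroup the cells along the chosen dimensions, summing the measure."""
    positions = [DIMENSIONS.index(name) for name in dimensions]
    totals = defaultdict(int)
    for key, sales in cells.items():
        totals[tuple(key[p] for p in positions)] += sales
    return dict(totals)


slice_2024 = slice_cube(cube, "year", 2024)          # one layer of the cube
by_region_2024 = roll_up(slice_2024, "region")       # that layer seen by region
by_product_year = roll_up(cube, "product", "year")   # another perspective

print(by_region_2024)
print(by_product_year)
```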
What is a Measure? What are the types of Measures?
How can you create Measures & Dimensions?
Can we group a measure?
What do you mean by Multi-dimensional Analysis?
What is a Grain?
What is Drill-up, Drill-down & Drill-Across?
Differentiate between Level and Category?
Level is a logical subdivision of a dimension.
E.g.: if orderdate is a dimension, the levels are year, quarter, month, week, day, etc.
Category is a particular instance of a level.
E.g.: if year is a level, the categories are 1996, 1997, 1998, etc.
What is a CUBE in data warehousing concept?
Cubes are a logical representation of multidimensional data. The edges of the cube
contain dimension members and the body of the cube contains data values.
What is a Virtual Cube?
What is the difference between a filter and a condition?
The parameter is the only difference.
® The difference between Filter and Condition: A condition returns true or false. Ex:
if Country = 'India' then ... A filter will return two types of results:
1. Detail information, which is equivalent to the WHERE clause in a SQL statement
2. Summary information, which is equivalent to the GROUP BY and HAVING clauses in a
SQL statement
® In a filter we just create a parameter on which we can filter the fields, but in a
condition we can have static functions, like: if yes then color it green, if no then
color it red, etc. So here we can create conditions for filtering in the report. This
means we can apply different filtering functions at the same time by using
conditional formatting.
What is snapshot?
You can disconnect the report from the catalog to which it is attached by saving the
report with a snapshot of the data. However, you must reconnect to the catalog if you
want to refresh the data.
What is a linked cube?
A linked cube is a cube in which a subset of the data can be analyzed in great detail.
The linking ensures that the data in the cubes remains consistent.
What is VLDB?
VLDB stands for Very Large Database.
It is an environment or storage space managed by a relational database
management system (RDBMS) consisting of vast quantities of information. In this
sense, VLDB doesn't refer only to the size of the database or the vast amount of
information stored; it refers to the window of opportunity for taking a backup of
the database.
The window of opportunity refers to a time interval: if the DBA is unable to
take a backup within the specified time, then the database is considered a VLDB.
What is batch processing?
What is incremental loading?
Incremental loading means loading only the ongoing changes from the OLTP system.
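A common way to find the ongoing changes is to remember a "watermark" (the latest
change time already loaded) and pick up only rows changed after it. The sketch below
assumes each OLTP row carries an updated_at timestamp in ISO format; the row layout
and the function name incremental_load are hypothetical.

```python
def incremental_load(source_rows, target, last_loaded_at):
    """Copy only rows changed after the watermark; return the new watermark."""
    changed = [row for row in source_rows if row["updated_at"] > last_loaded_at]
    for row in changed:
        target[row["id"]] = dict(row)  # insert new rows, overwrite updated ones
    return max((row["updated_at"] for row in changed), default=last_loaded_at)


# Hypothetical OLTP rows; ISO timestamps compare correctly as strings.
oltp_rows = [
    {"id": 1, "status": "shipped", "updated_at": "2024-03-01T10:00:00"},
    {"id": 2, "status": "new", "updated_at": "2024-03-02T09:30:00"},
]

# The warehouse already holds an older version of row 1.
warehouse = {1: {"id": 1, "status": "new", "updated_at": "2024-02-28T08:00:00"}}

watermark = incremental_load(oltp_rows, warehouse, "2024-02-28T08:00:00")
print(watermark, warehouse)

# Running again with the new watermark finds nothing left to load.
watermark_after_rerun = incremental_load(oltp_rows, warehouse, watermark)
```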
Explain the advantages of RAID 1, 1/0, and 5. What type of RAID setup would
you put your TX logs on?
Transaction logs are written sequentially and rarely need to be read. The ideal is to
have each on RAID 1/0 because it has much better write performance than RAID 5.
RAID 1 is also a good choice for TX logs and costs less than 1/0 to implement. It has
a tad less reliability, and performance is a little worse, generally speaking.
RAID 5 is best for data generally because of cost and the fact it provides great read
capability.
What is BAS? What is the function?
The Business Application Support (BAS) functional area at SLAC provides
administrative computing services to the Business Services Division and Human
Resources Department. We are responsible for software development and
maintenance of the PeopleSoft applications and consultation to customers with their
computer-related tasks.
® BAS also stands for Broadcast Agent Server. Its function is to run the scheduled
jobs or reports, which can be monitored using the Broadcast Agent Console.
What are modeling tools available in the Market?
There are a number of data modeling tools:
Tool Name          Company Name
Erwin              Computer Associates
Embarcadero        Embarcadero Technologies
Rational Rose      IBM Corporation
Power Designer     Sybase Corporation
Oracle Designer    Oracle Corporation
What are the various Reporting tools in the Market?
1. MS-Excel
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, Power Play)
4. Microstrategy
5. MS reporting services
6. Informatica Power Analyzer
7. Actuate
8. Hyperion (BRIO)
9. Oracle Express OLAP
10. Proclarity
® Some of the standard Business Intelligence tools in the market, according to their
performance:
1) MICROSTRATEGY
2) BUSINESS OBJECTS, CRYSTAL REPORTS
3) COGNOS REPORT NET
4) MS-OLAP SERVICES
Or
1. Seagate Crystal report
2. SAS
3. Business objects
4. Microstrategy
5. Cognos
6. Microsoft OLAP
7. Hyperion
8. Microsoft integrated services and some more.
What are the various ETL tools in the Market?
Various ETL tools used in market are:
Informatica.
Data Stage.
Oracle Warehouse Builder.
Ab Initio.
Data Junction.
Name some of the real time data-warehousing tools?
What is Outsourcing, Offshoring & Insourcing? And what is the difference
between them.
Outsourcing is not strictly IT. Any function of an organization that is executed by
non-employees is essentially an outsourced task.
Insourcing is the use of external resources (not employees of the Organization) to
accomplish some function, but they are predominately carrying out the function at
the client’s site. So, the function is “sourced” but not “out” sourced. These resources
are also typically managed more closely by the client directly with little management
involvement from the supplier.
Offshoring is a subset of Outsourcing which is generally understood to involve a
country in which costs remain lower than in the client's country of operations.
While most Offshoring situations are indeed an example of Outsourcing, for those
companies (HP for example) who now own their offshore operations and have folded
them into the company, the line gets blurred. In other words, Offshoring is not always
outsourcing anymore.
What is ER Diagram?
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976
[Chen76] as a way to unify the network and relational database views. Simply stated,
the ER model is a conceptual data model that views the real world as entities and
relationships. A basic component of the model is the Entity-Relationship diagram,
which is used to visually represent data objects.
Since Chen wrote his paper the model has been extended, and today it is commonly
used for database design. For the database designer, the utility of the ER model is:
It maps well to the relational model. The constructs used in the ER model can easily
be transformed into relational tables.
It is simple and easy to understand with a minimum of training. Therefore, the model
can be used by the database designer to communicate the design to the end user.
In addition, the model can be used as a design plan by the database developer to
implement a data model in specific database management software.
What Oracle tools can be used to build and design a warehouse?
What Oracle features can be used to optimize my warehouse system?
What is Data Modeling?
Data modeling represents information in terms of entities, attributes and relationships.
It is a visual representation of the information.
What are the different steps for Data Modeling?
1. Define the problem and scope of the problem.
2. Information gathering.
3. Analysis (normalization).
4. Create a logical data model (independent of platform).
5. Decision about the physical platform, like Oracle or SQL, etc.
6. Create a physical data model, which is platform specific.
7. Database creation.
What is Dimensional Modeling?
Dimensional Modeling is a design concept used by many data warehouse designers
to build their data warehouse. In this design model all the data is stored in two types
of tables: fact tables and dimension tables. The fact table contains the
facts/measurements of the business, and the dimension table contains the context of
the measurements, i.e., the dimensions on which the facts are calculated.
Data modeling is probably the most labor-intensive and time-consuming part of the
development process. Why bother, especially if you are pressed for time? A common
response by practitioners who write on the subject is that you should no more build a
database without a model than you should build a house without blueprints. The goal
of the data model is to make sure that all the data objects required by the database
are completely and accurately represented. Because the data model uses easily
understood notations and natural language, it can be reviewed and verified as
correct by the end users. The data model is also detailed enough to be used by the
database developers as a "blueprint" for building the physical database. The
information contained in the data model will be used to define the relational tables,
primary and foreign keys, stored procedures, and triggers. A poorly designed
database will require more time in the long term. Without careful planning you may
create a database that omits data required to create critical reports, produces results
that are incorrect or inconsistent, and is unable to accommodate changes in the
user's requirements.
What is Logical Modeling?
The Logical Model: In Erwin, the logical model is the version of the model that
represents all of the logical business requirements of an organization. There are
three levels of logical models that are used to capture these requirements:
The Entity Relationship Diagram: A high-level data model that includes all major
entities and relationships. The Entity Relationship Diagram does not contain much
detail and is often used in the initial planning phase.
The Key Based Model: A model that describes major data structures such as
entities, primary keys, and sample attributes.
The Fully Attributed Model: A complete model that includes all required entities,
attributes, key groups, and relationships.
In Erwin, a logical model can be created in conjunction with the physical model, or
independent of the physical model. Logical models can also be derived from other
models using the Derive Model Wizard.
In addition, Erwin supports the definition of model objects in a logical model as
logical only and in a physical model as physical only. These options allow for the
logical model to be fully normalized and for the corresponding physical model to be
de-normalized. Erwin also allows for the automatic conversion of many-to-many and
super type/subtype relationships when you change from a logical model to a physical
model.
What are the types of Dimensional Modeling?
What is Conceptual Modeling?
What is Physical Modeling?
Comparing Logical and Physical Models in a Logical/Physical Model:
In an Erwin logical/physical model, each model that you create automatically
includes both a logical and a physical model. By default, the logical model is closely
related to the physical model. If you make a change in the logical model, the change
is automatically reflected in the physical model and vice-versa.
You can use either the logical model or the physical model to define and document
database structures; although the model you use typically depends on the type of
work you want to perform. You can use the logical model to represent business
information and define business rules in a fully normalized model, while the physical
model supports the needs of the database administrator, who focuses on the
physical implementation of the model in a database.
Comparing Logical and Physical Model Objects:
Most of the objects in the logical model correspond to a related object in the physical
model. For example, the logical model contains entities, attributes, and key groups,
which are represented in the physical model as tables, columns, and indexes,
respectively. The following table compares the logical and physical components in
an Erwin model.
What is Difference between E-R Modeling and Dimensional Modeling?
The basic difference is that E-R modeling has both a logical and a physical model,
whereas a dimensional model has only a physical model.
E-R modeling is used for normalizing the OLTP database design.
Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.
What is Entity, Attribute and Relationship?
Entity: An entity is an object about which an organization wants to maintain
information. E.g.: Employee.
Attribute: An attribute is a property of an entity that holds a piece of that information.
Key attribute: A key attribute consists of one or more attributes of an entity which
uniquely identify the entity. E.g.: a bank account number identifies an account.
Relationship: Defines the association between different entities:
one to one, one to many, many to one, many to many.
What is meant by De-Normalization?
What is the definition of normalized and denormalized view and what are the
differences between them?
Normalization is the process of removing redundancies.
Denormalization is the process of allowing redundancies.
Why is Denormalization promoted in Universe Designing?
In a relational data model, for normalization purposes, some lookup tables are not
merged into a single table. In dimensional data modeling (star schema), these
tables would be merged into a single table called a DIMENSION table, for performance
and for slicing data. Because of this merging of tables into one large dimension
table, complex intermediate joins are avoided: dimension tables are joined directly
to fact tables. Though redundancy of data occurs in the DIMENSION table, the size of
the DIMENSION table is only 15% of that of the FACT table. That is why
Denormalization is promoted in Universe Designing.
What is Cardinality?
What is Referential Integrity?
What are Integrity Constraints?
What is the difference between view and materialized view?
View: stores the SQL statement in the database and lets you use it as a table. Every
time you access the view, the SQL statement executes.
Materialized view: stores the results of the SQL statement in table form in the
database. The SQL statement executes only when the materialized view is created or
refreshed; after that, every time you run the query, the stored result set is used.
Pros include quick query results.
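The behavior can be sketched with SQLite, with one caveat: SQLite has ordinary views
but no materialized views, so the stored result set is imitated here with a table
created from the same query. All table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('East', 10.0), ('West', 5.0);

    -- View: only the SELECT statement is stored.
    CREATE VIEW v_sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

    -- Stand-in for a materialized view: the result rows themselves are stored.
    CREATE TABLE mv_sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
    """
)

# New data arrives after both objects were created.
conn.execute("INSERT INTO sales VALUES ('East', 2.5)")

# The view re-runs its query and sees the new row.
view_rows = conn.execute(
    "SELECT region, total FROM v_sales_by_region ORDER BY region"
).fetchall()

# The stored result set is unchanged until it is refreshed (rebuilt).
snapshot_rows = conn.execute(
    "SELECT region, total FROM mv_sales_by_region ORDER BY region"
).fetchall()

print(view_rows)
print(snapshot_rows)
```

The stored copy answers queries without touching the sales table, at the cost of
being stale until the next refresh.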
What is Normalization, First Normal Form, Second Normal Form, Third Normal
Form?
1. Normalization is a process for assigning attributes to entities. It reduces data
redundancies, helps eliminate data anomalies, and produces controlled redundancies
to link tables.
2. Normalization is the analysis of functional dependencies between attributes / data
items of user views. It reduces a complex user view to a set of small and stable
subgroups of fields / relations.
1NF: Repeating groups must be eliminated, dependencies can be identified, all key
attributes are defined, and there are no repeating groups in the table.
2NF: The table is already in 1NF and includes no partial dependencies (no attribute
is dependent on only a portion of the primary key). It is still possible to exhibit
transitive dependency: attributes may be functionally dependent on non-key attributes.
3NF: The table is already in 2NF and contains no transitive dependencies.
What is a Table space? What does it contain?
What is a Composite Key or Concatenated Key? What is its use?
What are Unique Identifiers?
What is an Index? What are the types of Indexes?
What do you mean by Partitioned Indexes?
What is partitioning? What are the methods of partitioning?
What is Parallelism?
What are the advantages and disadvantages of reporting directly against the
database? Do you always need to copy the data before reporting on it?
(Example: real-time and on-demand reporting is a requirement.)
There is no need to copy the data before reporting on it as long as the data is
clean. If the data is not clean, it should be cleansed first, which calls for an ETL
process.
Advantage of reporting directly against the database (OLTP): There is no need to
maintain a separate database for reporting, so space consumption is reduced.
Disadvantage of reporting directly against the database (OLTP): It slows down the
OLTP system. That system is designed for the online application, whereas
data warehouse style analysis runs long queries over the same data and competes
with the transactions for resources.
What are the most frequent data errors that slow down data input process?
Data mining is the process of data selection, exploration and building models
using vast data stores to uncover previously unknown patterns. What does
this mean to you?
You can produce new knowledge to better inform decision makers before they act.
Build a model of the real world based on data collected from a variety of sources,
including corporate transactions, customer histories and demographics, even
external sources such as credit bureaus. Then use this model to produce patterns in
the information that can support decision making and predict new business
opportunities. Text mining capabilities enable you to apply such analyses to text-
based documents. With SAS's rich suite of text processing and analysis tools, you
can uncover underlying themes or concepts contained in large document collections,
group documents into topical clusters, classify documents into predefined categories
and integrate text data with structured data for enriched predictive modeling
endeavors.
Before you begin, you should know the answers to the following questions:
• What is Data?
• What is a Database?
• What is an RDBMS?
• What is a Data Model?
• Why do we follow Normalization while designing a data model?
• What is an OLTP system?
WHAT IS DATA WAREHOUSING:
• A data warehouse is a relational database that is designed for query and
analysis rather than for transaction processing. It usually contains historical
data derived from transaction data, but it can include data from other sources.
It separates analysis workload from transaction workload and enables an
organization to consolidate data from several sources.
• In addition to a relational database, a data warehouse environment includes
an extraction, transportation, transformation, and loading (ETL) solution, an
online analytical processing (OLAP) engine, client analysis tools, and other
applications that manage the process of gathering data and delivering it to
business users.
• A data warehouse is a collection of
Subject Oriented
Integrated
Time variant
Nonvolatile
data which helps the business make organizational decisions.
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more
about your company's sales data, you can build a warehouse that concentrates on
sales. Using this warehouse, you can answer questions like "Who was our best
customer for this item last year?" This ability to define a data warehouse by subject
matter, sales in this case, makes the data warehouse subject oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must put data
from disparate sources into a consistent format. They must resolve such problems
as naming conflicts and inconsistencies among units of measure. When they achieve
this, they are said to be integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change.
This is logical because the purpose of a warehouse is to enable you to analyze what
has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is
very much in contrast to online transaction processing (OLTP) systems, where
performance requirements demand that historical data be moved to an archive. A
data warehouse's focus on change over time is what is meant by the term time
variant.
When should an organization create a Data Warehouse?
An organization should create one once it holds so much information that it becomes
too difficult to extract the meaningful information the business needs to make
strategic decisions. Decisions made using data warehouse data affect the entire
organization rather than one customer or one employee. Examples of decisions
made with a DW: Should we continue a specific product offering to our
customers? Should we move the customer support department to a different
location to save costs?
Data warehouses and OLTP systems have very different requirements. Here are
some examples of differences between typical data warehouses and OLTP systems:
• Workload
Data warehouses are designed to accommodate ad hoc queries. You might
not know the workload of your data warehouse in advance, so a data
warehouse should be optimized to perform well for a wide variety of possible
query operations.
OLTP systems support only predefined operations. Your applications might be
specifically tuned or designed to support only these operations.
• Data modifications
A data warehouse is updated on a regular basis by the ETL process (run
nightly or weekly) using bulk data modification techniques. The end users of a
data warehouse do not directly update the data warehouse.
In OLTP systems, end users routinely issue individual data modification
statements to the database. The OLTP database is always up to date, and
reflects the current state of each business transaction.
• Schema design
Data warehouses often use denormalized or partially denormalized schemas
(such as a star schema) to optimize query performance.
OLTP systems often use fully normalized schemas to optimize
update/insert/delete performance, and to guarantee data consistency.
• Typical operations
A typical data warehouse query scans thousands or millions of rows. For
example, "Find the total sales for all customers last month."
A typical OLTP operation accesses only a handful of records. For example,
"Retrieve the current order for this customer."
• Historical data
Data warehouses usually store many months or years of data. This is to
support historical analysis.
OLTP systems usually store data from only a few weeks or months. The
OLTP system stores only historical data as needed to successfully meet the
requirements of the current transaction.
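The contrast in typical operations can be sketched with Python's sqlite3 module. The orders table and its contents are hypothetical, and a three-row table only hints at the difference in scale between the two query styles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT,
                         order_month TEXT, amount INTEGER);
    INSERT INTO orders VALUES (1, 'ABC', '2007-01', 100),
                              (2, 'XYZ', '2007-01', 250),
                              (3, 'ABC', '2007-02', 75);
""")

# Warehouse-style query: scans and aggregates every row of a period.
total_jan = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE order_month = '2007-01'"
).fetchone()[0]

# OLTP-style operation: fetches one record by its primary key.
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = 3").fetchone()

print(total_jan, one_order)
```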
END USER OF APPLICATION:
What do you mean by end user in an OLTP system?
• An end user is someone who enters data into, or reads a particular report from,
the system.
• A bank teller enters the account number to see the balance, deposit a
cheque, etc.
• A customer representative must see the customer's information to be
more effective.
What kind of information does management want to know? (DW data is primarily
used by management.)
• Which are our lowest/highest margin customers?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• What impact will new products/services have on revenue and margins?
• Which customers are most likely to go to the competition?
• Who are my customers and what products are they buying?
In OLTP applications, end users are the individuals who take care of day-to-day
operations.
In DW applications, end users are managers and above, who make decisions based
on trends, history, predictions, etc.
If end users are not satisfied with the application, the product is considered
a failure, even if it is a great achievement technology-wise.
Data Warehouse Architecture:
Source Data:
An organization will have many OLTP applications; the operational data from all of
them becomes the source for the data warehouse database.
ETL: (Extract, Transform and Load)
We extract data from the various operational systems and cleanse it so that only
information that makes sense to keep reaches the data warehouse. While cleansing
the data we may reject some records or fill in missing information. Once the
operational data has been transformed into the format the DW expects, we load it
into the DW. This process takes most of the time when developing DW applications.
DW Database
This is the area where we store the data required by the business, so that
any report can be run against it. A data warehouse holds both current
and historical information, which is very useful for trend analysis, behavioral
analysis, etc.
What is Data Mart?
A data mart is a simple form of a data warehouse that is focused on a single subject
(or functional area), such as Sales or Finance or Marketing. Data marts are often
built and controlled by a single department within an organization. Given their single-
subject focus, data marts usually draw data from only a few sources. The sources
could be internal operational systems, a central data warehouse, or external data.
Difference between Data Warehouse and Data Mart
Data Warehouse:
• Enterprise-wide
• Structure for corporate view of data
• Organized as an E-R model or a galaxy of stars (multiple star schemas in the
data model)
• Long turnaround time
Data Mart:
• Departmental
• Star schema based (facts and dimensions)
• Quick turnaround (up and running sooner, as there are fewer stakeholders)
Data Granularity
What is the Granularity of your DW?
Granularity is the level of detail we want to store in the data warehouse.
For a retail store, Point of Sale (POS) data is the lowest-granularity information
available.
For banking, it is account-level detail based on every day's transactions.
As a DSS leans towards analyzing the data as a whole, the data warehouse will not
necessarily hold all the detail down to daily transactions. Possible levels are:
• Daily sales by date, product and customer
• Weekly sales by product and customer
• Monthly sales by product and customer
• Quarterly sales by product and customer
• Yearly sales by product and customer
Usually the enterprise data warehouse (EDW) keeps POS-level detail, whereas data
marts hold data aggregated by week or month, so the detailed information is never
lost. This detail-level data can be used to study the micro behaviors
of our customers (especially in data mining).
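The relationship between grains can be sketched with Python's sqlite3 module: coarser levels are derived from POS-level rows, but not the other way round. The pos_sales table, its columns and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lowest grain: one row per sale (POS level).
    CREATE TABLE pos_sales (sale_date TEXT, product TEXT, amount INTEGER);
    INSERT INTO pos_sales VALUES ('2007-01-03', 'pen', 10),
                                 ('2007-01-03', 'pen', 5),
                                 ('2007-01-20', 'pen', 20),
                                 ('2007-02-01', 'pen', 8);
""")

# Daily grain: one row per date and product.
daily = conn.execute("""
    SELECT sale_date, product, SUM(amount) FROM pos_sales
    GROUP BY sale_date, product ORDER BY sale_date
""").fetchall()

# Monthly grain: substr(sale_date, 1, 7) keeps the 'YYYY-MM' part of the date.
monthly = conn.execute("""
    SELECT substr(sale_date, 1, 7) AS sale_month, product, SUM(amount)
    FROM pos_sales GROUP BY sale_month, product ORDER BY sale_month
""").fetchall()

print(daily)
print(monthly)
```

A data mart that stored only the monthly rows could not answer a daily question, which is why the detail is kept in the warehouse.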
Data Warehousing Objects:
Data warehousing consists of only two kinds of objects:
Fact
Dimension
Fact Tables:
A fact table typically has two types of columns: those that contain numeric facts
(often called measurements), and those that are foreign keys to dimension tables. A
fact table contains either detail-level facts or facts that have been aggregated. Fact
tables that contain aggregated facts are often called summary tables. A fact table
usually contains facts with the same level of aggregation. Though most facts are
additive, they can also be semi-additive or non-additive. Additive facts can be
aggregated by simple arithmetical addition. A common example of this is sales.
Non-additive facts cannot be added at all. An example of this is averages.
Semi-additive facts can be aggregated along some of the dimensions and not along
others. An example of this is inventory levels: they can be added across products
or warehouses, but adding them across time periods gives a number with no
meaning.
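A semi-additive fact can be sketched with Python's sqlite3 module. The inventory_fact table and its values are hypothetical; the point is only that the same SUM is meaningful along one dimension and misleading along another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory_fact (snapshot_date TEXT, warehouse TEXT, qty INTEGER);
    INSERT INTO inventory_fact VALUES ('2007-01-01', 'EAST', 100),
                                      ('2007-01-01', 'WEST', 50),
                                      ('2007-01-02', 'EAST', 90),
                                      ('2007-01-02', 'WEST', 60);
""")

# Adding across warehouses for one date is meaningful: stock on hand that day.
on_hand = conn.execute("""
    SELECT SUM(qty) FROM inventory_fact WHERE snapshot_date = '2007-01-02'
""").fetchone()[0]

# Adding across dates counts the same stock twice; an average is used instead.
misleading_sum, avg_level = conn.execute("""
    SELECT SUM(qty), AVG(qty) FROM inventory_fact WHERE warehouse = 'EAST'
""").fetchone()

print(on_hand, misleading_sum, avg_level)
```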
Dimension Tables:
A dimension is a structure, often composed of one or more hierarchies, that
categorizes data. Dimensional attributes help to describe the dimensional value.
They are normally descriptive, textual values. Several distinct dimensions, combined
with facts, enable you to answer business questions. Commonly used dimensions
are customers, products, and time.
Dimension data is typically collected at the lowest level of detail and then aggregated
into higher level totals that are more useful for analysis. These natural rollups or
aggregations within a dimension table are called hierarchies.
Hierarchies:
Hierarchies are logical structures that use ordered levels as a means of organizing
data. A hierarchy can be used to define data aggregation. For example, in a time
dimension, a hierarchy might aggregate data from the month level to the quarter
level to the year level. A hierarchy can also be used to define a navigational drill path
and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it.
Data values at lower levels aggregate into the data values at higher levels. A
dimension can be composed of more than one hierarchy. For example, in the
product dimension, there might be two hierarchies--one for product categories and
one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use
hierarchies to enable you to drill down into your data to view different levels of
granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business
structures. For example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level
value, a value at the next higher level is its parent, and values at the next lower level
are its children. These familial relationships enable analysts to access data quickly.
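A time hierarchy of this kind can be sketched with Python's sqlite3 module. The time_dim and sales_fact tables, their columns and the rollup helper are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each month row names its parents at the quarter and year levels.
    CREATE TABLE time_dim (month_key TEXT PRIMARY KEY, quarter TEXT, year INTEGER);
    INSERT INTO time_dim VALUES ('2006-11', '2006-Q4', 2006),
                                ('2006-12', '2006-Q4', 2006),
                                ('2007-01', '2007-Q1', 2007);
    CREATE TABLE sales_fact (month_key TEXT REFERENCES time_dim(month_key),
                             amount INTEGER);
    INSERT INTO sales_fact VALUES ('2006-11', 10), ('2006-12', 20), ('2007-01', 5);
""")

def rollup(level):
    """Aggregate sales to one level of the hierarchy ('quarter' or 'year')."""
    assert level in ("quarter", "year")  # only known column names are allowed
    return conn.execute(f"""
        SELECT t.{level}, SUM(f.amount)
        FROM sales_fact f JOIN time_dim t ON t.month_key = f.month_key
        GROUP BY t.{level} ORDER BY t.{level}
    """).fetchall()

by_quarter = rollup("quarter")
by_year = rollup("year")
print(by_quarter)
print(by_year)
```

Drilling down is the reverse direction: moving from the year rows to the quarter rows and then to the month rows that make them up.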
[Figure: time hierarchy with the levels YEAR, QUARTER, MONTH, WEEK]
How to handle Slowly Changing Dimensions (SCDs) in data model design?
Posted by Dylan Wan on January 13, 2007
There are multiple methods to handle slowly changing dimensions. Which
technique to use depends on your business requirements. The choice among these
three methods is not a technical design decision, because the methods behave
differently for the business.
Type One: Overwrite the old data with new data
Using this method, you do not store the history. For example, say each customer
can have one salesrep at any given point in time. When the salesrep of ABC Inc.
changes from Sandy to Laura, the fact that Sandy was a salesrep of ABC Inc. is not
kept anywhere. Any report by salesrep will assume that Laura has been the salesrep
of ABC Inc. forever and will count all the sales made by Sandy as Laura's.
The above example may not seem to make business sense. However, if you only
report the sales of the current period, and the salesrep does not change during the
period, this method is acceptable.
Many OLTP tables do not need to track the history of changes, and thus this
method may be used by the source application. However, if you want to report on
historical data, even if your OLTP system does not track history, the data warehouse
can still use other methods to track the history.
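The Type One behavior described above can be sketched with Python's sqlite3 module. The customer_dim and sales_fact tables and the sales amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (customer TEXT PRIMARY KEY, salesrep TEXT);
    INSERT INTO customer_dim VALUES ('ABC Inc.', 'Sandy');
    CREATE TABLE sales_fact (customer TEXT, amount INTEGER);
    INSERT INTO sales_fact VALUES ('ABC Inc.', 100);   -- sold by Sandy
""")

# Type One: overwrite in place; the old value is gone.
conn.execute(
    "UPDATE customer_dim SET salesrep = 'Laura' WHERE customer = 'ABC Inc.'")
conn.execute("INSERT INTO sales_fact VALUES ('ABC Inc.', 40)")  # sold by Laura

by_rep = conn.execute("""
    SELECT d.salesrep, SUM(f.amount)
    FROM sales_fact f JOIN customer_dim d ON d.customer = f.customer
    GROUP BY d.salesrep
""").fetchall()
print(by_rep)  # Sandy's sale is now reported under Laura
```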
Type Two: Add a new record at the time of the change
Using this method, all prior history is saved. There
are three alternative methods to model the key of this
table.
Method A - No surrogate key - Use timestamp
When a change happens, a new record is added to the table. All the attributes are
copied from the previous record except the changed values. The natural key is
copied as well, so the timestamp is used to differentiate the records.
When a fact table is joined with the dimension, if you are interested in the historical
data, the timestamp is used as part of the join condition. To ease the join, the
record typically uses two date columns: the effective start date and the effective end
date.
Method B - No surrogate key - Use version number
Instead of using the date column, a version number is used to differentiate the
different versions of the records.
This technique requires the fact table to store both the natural key and the version
number in order to retrieve a given version of the dimension data.
Method C - Use a surrogate key
When an attribute changes, a new record is added with a sequence-generated key,
and the fact table uses this key column as the foreign key.
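Type Two with a surrogate key (Method C) can be sketched with Python's sqlite3 module. This is a minimal illustration under assumptions: the table layout, the start_date/end_date columns, the dates and the current_key helper are all hypothetical, and the effective-date columns are borrowed from Method A for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer     TEXT,                               -- natural key
        salesrep     TEXT,
        start_date   TEXT,
        end_date     TEXT                                -- NULL marks the current row
    );
    CREATE TABLE sales_fact (customer_key INTEGER, amount INTEGER);
    INSERT INTO customer_dim (customer, salesrep, start_date)
        VALUES ('ABC Inc.', 'Sandy', '2006-01-01');
""")

def current_key(customer):
    """Return the surrogate key of the customer's current dimension row."""
    return conn.execute(
        "SELECT customer_key FROM customer_dim "
        "WHERE customer = ? AND end_date IS NULL", (customer,)).fetchone()[0]

conn.execute("INSERT INTO sales_fact VALUES (?, 100)", (current_key("ABC Inc."),))

# Type Two change: close the current row, then add a new version.
conn.execute("UPDATE customer_dim SET end_date = '2007-01-13' "
             "WHERE customer = 'ABC Inc.' AND end_date IS NULL")
conn.execute("INSERT INTO customer_dim (customer, salesrep, start_date) "
             "VALUES ('ABC Inc.', 'Laura', '2007-01-13')")
conn.execute("INSERT INTO sales_fact VALUES (?, 40)", (current_key("ABC Inc."),))

by_rep = conn.execute("""
    SELECT d.salesrep, SUM(f.amount)
    FROM sales_fact f JOIN customer_dim d ON d.customer_key = f.customer_key
    GROUP BY d.salesrep ORDER BY d.salesrep
""").fetchall()
print(by_rep)  # each sale stays with the salesrep who made it
```

Because each fact row carries the surrogate key that was current when it was loaded, the join needs no date condition.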
Type Three: Track changes using a separate column
Using this method, you use a separate column of the dimension table to store the
values of the previous year, in addition to the current year's data.
This method does not track all the history, just one prior version.
If the data changes, the old value needs to be moved from the current value column
to the prior column, and the new value overwrites the current column.
This method is used when the changes are not random but happen at a predefined
interval, such as annually.
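Type Three can be sketched with Python's sqlite3 module. The column names current_salesrep and prior_salesrep, the change_salesrep helper and the third salesrep name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (customer TEXT PRIMARY KEY,
                               current_salesrep TEXT, prior_salesrep TEXT);
    INSERT INTO customer_dim VALUES ('ABC Inc.', 'Sandy', NULL);
""")

def change_salesrep(customer, new_rep):
    """Move the current value to the prior column, then overwrite it."""
    # Both assignments read the row as it was before the update,
    # so prior_salesrep receives the old current_salesrep.
    conn.execute(
        "UPDATE customer_dim SET prior_salesrep = current_salesrep, "
        "current_salesrep = ? WHERE customer = ?", (new_rep, customer))

def snapshot():
    return conn.execute(
        "SELECT current_salesrep, prior_salesrep FROM customer_dim").fetchone()

change_salesrep("ABC Inc.", "Laura")
after_first = snapshot()
change_salesrep("ABC Inc.", "Maria")   # hypothetical second change
after_second = snapshot()
print(after_first, after_second)  # after the second change, Sandy is no longer stored
```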
Structured Query Language
SQL is a database language used to create, manipulate and control access to
database objects. SQL is a non-procedural language used to access relational
databases. It is a flexible, efficient language with features designed to manipulate
and examine relational data.
SQL is used only for the definition and manipulation of database objects. It cannot
be used for application development tasks such as form definitions or the creation
of procedures. For those you need a 3GL such as COBOL or a 4GL
such as dBase to provide front-end support to the database.
Key features of SQL are:
• Non procedural language
• Unified Language
• Common language for all Relational databases. ( Syntax may change
between different RDBMS )
SQL is made up of three sub-languages:
• Data Definition language (DDL)
• Data Manipulation language (DML)
• Data control language (DCL)
Data Definition Language (DDL): allows you to define database objects at the
conceptual level. It consists of commands to create objects and alter the structure of
objects such as tables, views, indexes, etc. Commonly used DDL statements are
CREATE, ALTER and DROP.
If you want to create a table STUDENT, use the following syntax:
CREATE TABLE STUDENT
( STUDENT_ID INTEGER PRIMARY KEY,
STUDENT_NM VARCHAR(30),
COURSE_ID VARCHAR(15) ,
PHONE VARCHAR(10) ,
ADDRESS VARCHAR(50) );
To drop a table from the database
DROP TABLE STUDENT;
Data Manipulation language(DML): Allows you to retrieve or update data within a
database. It is used for query, insertion, deletion and updating of information stored
in databases. Eg: Select, Insert, Update, Delete.
STUDENT_ID  STUDENT_NM  COURSE_ID     PHONE         ADDRESS
1001        JAMES       Oracle        972-888-9018  888, North Central Exp, Dallas, TX- 75089
1002        JIM         MSSql Server  972-678-8909  567, Preston Road, Dallas, TX - 75240
1003        BRUCE       Java          214-571-1567  1234, Elm Street, Dallas, TX - 75039
Select statement:
The Select statement in SQL is used to display certain data from a table. For
example, suppose you want to know what course Jim is taking. The Select statement
fetches the information you want when you supply the information you have. In the
above scenario the information you have is that student_nm is Jim, and the
information you want is course_id; the intersection of that row and that column in
the table is what you are looking for.
SELECT (what you want)
FROM (which tables)
WHERE (what you have )
Now the select statement to find Jim's course_id looks like this:
SELECT COURSE_ID
FROM STUDENT
WHERE STUDENT_NM = 'JIM'
You will get the result as:
COURSE_ID
MSSql Server
If you want to see all the rows in the table then your select will be:
SELECT * FROM STUDENT;
If you would like to show the student_nm and address of everyone attending the
Oracle course in the form of a report, then your select will look like:
SELECT STUDENT_NM, ADDRESS
FROM STUDENT
WHERE COURSE_ID = 'Oracle'
The result will be
STUDENT_NM  ADDRESS
JAMES       888, North Central Exp, Dallas, TX- 75089
Insert Statement
The Insert statement is used to insert a new row into the table. For example, if a
new student DAVE is joining the Java course, use the INSERT SQL statement:
INSERT INTO STUDENT (STUDENT_ID, STUDENT_NM, COURSE_ID, PHONE,
ADDRESS ) VALUES
(1004, 'DAVE', 'Java','972-912-4008', '567, Washington Ave, Dallas - 75543' )
After executing the insert statement, your table should look like this when you issue
a select from the student table:
STUDENT_ID  STUDENT_NM  COURSE_ID     PHONE         ADDRESS
1001        JAMES       Oracle        972-888-9018  888, North Central Exp, Dallas, TX- 75089
1002        JIM         MSSql Server  972-678-8909  567, Preston Road, Dallas, TX - 75240
1003        BRUCE       Java          214-571-1567  1234, Elm Street, Dallas, TX - 75039
1004        DAVE        Java          972-912-4008  567, Washington Ave, Dallas - 75543
Update Statement
The Update statement is used to change existing information in the table. For
example, if DAVE moved to another address, we need to change the ADDRESS column
of DAVE's record. If the new address is 146, Dallas Parkway, Dallas - 75240, then
your update should be:
UPDATE STUDENT SET ADDRESS = '146, Dallas Parkway, Dallas - 75240'
WHERE STUDENT_NM = 'DAVE'
To make sure you updated the Address column for DAVE, issue the following
SQL:
SELECT * FROM STUDENT WHERE STUDENT_NM = 'DAVE'
You should then see the following result:
STUDENT_ID  STUDENT_NM  COURSE_ID  PHONE         ADDRESS
1004        DAVE        Java       972-912-4008  146, Dallas Parkway, Dallas - 75240
Delete Statement
The Delete statement is used to delete a row from the table, i.e. to remove records
from the table. For example, JAMES moved to a different city and does not want to
take the course. To remove JAMES's record from the table we use the DELETE
statement:
DELETE FROM STUDENT
WHERE STUDENT_NM = 'JAMES'
Once you delete the record and select all the information from the student table,
you should see the following:
STUDENT_ID  STUDENT_NM  COURSE_ID     PHONE         ADDRESS
1002        JIM         MSSql Server  972-678-8909  567, Preston Road, Dallas, TX - 75240
1003        BRUCE       Java          214-571-1567  1234, Elm Street, Dallas, TX - 75039
1004        DAVE        Java          972-912-4008  146, Dallas Parkway, Dallas - 75240
If you do not include a WHERE clause in the delete statement, it will remove all the
rows from the table.
Data control language (DCL)
One of the main advantages of an RDBMS is the security of the data in the database.
You can allow a user to do a specific operation, or all operations, on certain
objects. Examples of DCL statements are GRANT and REVOKE.
GRANT is used to grant a permission to a user so that the user can do that
operation.
REVOKE is used to take back that permission from that user on that object.
For example, we have two users, JAMES and DAVID.
If JAMES created a table called ITEMS, then JAMES becomes the owner of that
table.
DAVID cannot access the ITEMS table because he is not the owner of that table.
DAVID can access ITEMS if JAMES gives him permission on the table.
JAMES can give DAVID different types of access on the ITEMS table, such as Select,
Update, Delete and Insert.
For example:
If JAMES wants to provide only Select on ITEMS to DAVID then he can issue:
GRANT SELECT ON ITEMS TO DAVID
If JAMES wants to provide only Select and Insert on ITEMS to DAVID then he can
issue: GRANT SELECT, INSERT ON ITEMS TO DAVID
If JAMES wants to provide all the operations on ITEMS to DAVID then he can issue:
GRANT ALL ON ITEMS TO DAVID
Once you provide all permissions on an object to a user, that user can do any
manipulation on the table, much as the owner can, although ownership itself does
not change.
Oracle datatypes
Data in a database is stored in the form of tables. Each table consists of rows and
columns to store the data.
A particular column in a table must contain the same type of data. For example:
PLAYER_NAME (char)  COUNTRY (char)  DATE_OF_BIRTH (date)  ROOM_NO (number)
AGASSI              USA             10/12/1969            1004
WILLIAM             USA             01/15/1975            1006
JIM                 RUSSIA          05/25/1980            1007
HINGIS              SWITZERLAND     06/25/1979            1009
Every column holds a certain kind of information: PLAYER_NAME is a char column,
DATE_OF_BIRTH is a date column, and ROOM_NO is a number column.
Different datatypes available in Oracle database:
CHAR: Stores character data, for example the name of a person (you can save
anything in a character field).
VARCHAR: Same as CHAR. The only difference between CHAR and VARCHAR is
the way the database saves the data.
To understand the difference better, take the following example.
CREATE TABLE EMPLOYEE (EMP_NO NUMBER(4), ENAME CHAR(15))
EMP_NO ENAME
888 CLARK
889 KING
890 DAVID COOPER
As the Ename column is defined as CHAR(15), every value you put in that column will
occupy all 15 bytes, i.e. CLARK is a 5-byte string, so the database pads it with 10
spaces.
CREATE TABLE EMPLOYEE (EMP_NO NUMBER(4), ENAME VARCHAR(15))
EMP_NO ENAME
888 CLARK
889 KING
890 DAVID COOPER
Here, as Ename is defined as VARCHAR(15), it occupies only the required space, so
in the above table the ename CLARK occupies only 5 bytes in the database.
So what are the advantages and disadvantages? The rule of thumb here is that if you
are using a character column as a primary key, it is better defined as CHAR. If you
are using a column to hold comments, use VARCHAR.
NUMBER: Used to store numbers. For example, if you want to store employee
numbers, define the column's data type as NUMBER. If you want a
column to store currency, you can define it as NUMBER(7,2).
DATE: Used to store dates, such as a person's date of birth or the date of joining a
company.
LONG: Stores variable-length character data.
RAW, LONG RAW: Store binary data of variable length.
LOB: Large objects, used to store binary files.
In addition, Oracle 8 supports CLOB, BLOB and BFILE.
CLOB - stores large character data. A table can have multiple columns of this type.
BLOB - can store large binary objects such as graphics, video and sound files.
BFILE - stores file pointers to LOBs managed by file systems external to the database.
Constraints
When you bind a business rule to a column in a table, those rules are called
constraints. Constraints are defined while creating the table. For example, if
you cannot have an employee who does not have a name, then the employee name
column in the employee table should be a NOT NULL column. NOT NULL is a
constraint.
The following table shows the constraint types and short descriptions.
Constraint Type  Description
NOT NULL         You must provide a value in that column; you cannot leave that
                 column blank.
PRIMARY KEY      No duplicate values allowed; for example, Empno in the Employee
                 table should be unique.
CHECK            Checks the value and controls which values can be inserted or
                 updated.
DEFAULT          Assigns a default value if no value is given.
REFERENCES       Maintains referential integrity (foreign key).
Examples of business rules that are usually implemented through constraints:
NOT NULL
If we have a business rule saying that all customers should have a name, we cannot
have any customer without a name. To implement that business rule we can
create the customer table and specify the customer name column as NOT NULL
(constraint).
Example (the same rule applied to employee names)
CREATE TABLE EMPLOYEE (EMPNO NUMBER(4) PRIMARY KEY, ENAME
VARCHAR(4) NOT NULL);
CHECK
A Check constraint is used where we define a condition on a column. It
consists of the keyword CHECK followed by a condition in parentheses:
col_name datatype CHECK (col_name in(value1, value2))
Example
If you have a business rule saying that all employees in the organization should get
at least $500, then we can use a CHECK constraint while creating the table.
CREATE TABLE EMPLOYEE ( EMPNO NUMBER(4) PRIMARY KEY, ENAME
VARCHAR(4) NOT NULL, SALARY NUMBER(7,2) CHECK (SALARY >= 500) );
DEFAULT
When a row is inserted into a table without values for every column, SQL must
insert a default value to fill in the excluded columns, or the command will be
rejected. The most common default value is NULL. This can be used with columns
not defined as NOT NULL.
A default value is assigned to a column when the table is created with the CREATE
TABLE operation.
Example
CREATE TABLE ITEM (ITEM_ID NUMBER(4) PRIMARY KEY, ITEM_NAME
VARCHAR(15),
ITEM_DESC VARCHAR(100), QOH NUMBER(4) DEFAULT 100)
Assigning a default value of 0 to numeric columns simplifies computations, since
arithmetic on NULL yields NULL.
PRIMARY KEY
The primary key of a table is a unique identifier of a row. For example, if you are
maintaining customer profiles, you should assign a particular number to each one.
So customer_number should be defined as the primary key in the Customer table.
REFERENCES
This defines a foreign key. A foreign key column value refers to a column in another
table, and the database checks whether the value exists there or not.
UNIQUE
The values entered into the column are unique, i.e. no duplicate values exist. This
constraint assures the business that no duplicates are allowed.
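The constraint types above can be exercised with Python's sqlite3 module. This is a sketch under assumptions: it uses SQLite rather than Oracle (so the datatypes differ and foreign keys must be switched on explicitly), and the dept table, the bonus column and the rejected helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT UNIQUE);
    CREATE TABLE employee (
        empno  INTEGER PRIMARY KEY,
        ename  TEXT NOT NULL,
        salary NUMERIC CHECK (salary >= 500),
        bonus  INTEGER DEFAULT 0,
        deptno INTEGER REFERENCES dept(deptno)
    );
    INSERT INTO dept VALUES (10, 'SALES');
    INSERT INTO employee (empno, ename, salary, deptno)
        VALUES (1, 'CLARK', 900, 10);
""")

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

violations = {
    "NOT NULL": rejected(
        "INSERT INTO employee (empno, ename, salary) VALUES (2, NULL, 900)"),
    "CHECK": rejected(
        "INSERT INTO employee (empno, ename, salary) VALUES (3, 'KING', 100)"),
    "PRIMARY KEY": rejected(
        "INSERT INTO employee (empno, ename, salary) VALUES (1, 'DAVID', 900)"),
    "UNIQUE": rejected("INSERT INTO dept VALUES (20, 'SALES')"),
    "REFERENCES": rejected(
        "INSERT INTO employee (empno, ename, salary, deptno) "
        "VALUES (4, 'BILL', 900, 99)"),
}
# DEFAULT: CLARK was inserted without a bonus, so the default applies.
default_bonus = conn.execute(
    "SELECT bonus FROM employee WHERE empno = 1").fetchone()[0]
print(violations, default_bonus)
```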
Data Definition Language
It is the part of the SQL language that creates, alters and drops database objects.
Examples of database objects are tables, procedures, functions, packages etc. When
you create a table or drop a table you are modifying the structure of the database,
and that is the reason why it is called data definition language. In Oracle, when you
issue a create, alter or drop SQL statement, the database internally does a commit,
and that is why DDL cannot be included as part of a transaction. Following are a few
DDL statements.
Create table
Create table course (
course_id number(5) not null primary key,
course_name varchar2(30) not null,
start_date date);
Alter table course modify ( start_date date not null );
-- the datatype of instructor_id is assumed here; the original omits it
Alter table course add ( instructor_id number(5) null );
Drop table course;
Create table course ( course_id number(5) not null primary key, course_name varchar(30),
start_date date ) tablespace course_info storage (initial 1024k next 1024
pctincrease 10);
Data Manipulation Language
Data manipulation in an RDBMS means maintaining the data in the database. There
are three DML statements: Insert, Update and Delete. The INSERT statement is used
to insert a new record into a table. The UPDATE statement is used to change the
existing information in a table. The DELETE statement is used to remove certain
information from the table.
We will take an example here. If you are running an apartment complex where you
rent apartments, the day-to-day record maintenance would look like this.
tenant_id aptno tenant_name home_phone work_phone apt_rent no_of_pets
1000 888 SMITH 881-890-9000 767-908-5432 900 1
1001 889 STEVE 881-909-8971 898-543-9032 890 0
1002 890 BILL 781-897-9011 567-891-9108 880 2
INSERT Statement
If a person named JAMES rents an apartment, we need to add his information to
the table. We have to do an INSERT because the information does not exist in the
table as of now. The following information has to be entered into the database:
name = JAMES, aptno = 891, home_phone = 676-789-9011, work_phone =
777-567-1234, apt_rent = 880 and no_of_pets = 1.
The INSERT statement can be written as follows:
INSERT into TENANT
(tenant_id, aptno, tenant_name, home_phone, work_phone, apt_rent, no_of_pets )
VALUES
(1003, 891, 'JAMES','676-789-9011','777-567-1234', 880, 1 );
After executing the insert statement, the table should have four rows, as shown
below:
tenant_id aptno tenant_name home_phone work_phone apt_rent no_of_pets
1000 888 SMITH 881-890-9000 767-908-5432 900 1
1001 889 STEVE 881-909-8971 898-543-9032 890 0
1002 890 BILL 781-897-9011 567-891-9108 880 2
1003 891 JAMES 676-789-9011 777-567-1234 880 1
The following INSERT syntaxes are available.
Syntax1
INSERT into table_name (col1, col2, col3....) values (value1, value2,
value3.....)
In Syntax 1 we specify the column names of the table and their values
respectively. In application development this syntax is recommended
for inserts, because if a nullable column is later added to the table,
the statement does not raise an error; the new column simply receives no value and
the program runs fine.
Syntax2
INSERT into table_name values ( value1, value2.....)
In Syntax 2 we do not specify the column names and pass values for all the
columns, in the table's column order.
Syntax3
INSERT into table_name (col1, col2, col3...)
SELECT col1, col2, col3........ FROM table
With Syntax 3 we can insert multiple rows using one INSERT statement, whereas
with Syntax 1 and Syntax 2 you insert only one row at a time.
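The three syntaxes, and the reason Syntax 1 is preferred in applications, can be sketched with Python's sqlite3 module. This assumes SQLite rather than Oracle; the tenant table is a cut-down version of the example table and tenant_archive is a hypothetical second table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenant (tenant_id INTEGER PRIMARY KEY, aptno INTEGER,
                         tenant_name TEXT);
    CREATE TABLE tenant_archive (tenant_id INTEGER, aptno INTEGER,
                                 tenant_name TEXT);
""")

# Syntax 1: column list plus values.
conn.execute("INSERT INTO tenant (tenant_id, aptno, tenant_name) "
             "VALUES (1000, 888, 'SMITH')")
# Syntax 2: values only, in the table's column order.
conn.execute("INSERT INTO tenant VALUES (1001, 889, 'STEVE')")
# Syntax 3: INSERT ... SELECT copies every row the query returns.
conn.execute("INSERT INTO tenant_archive (tenant_id, aptno, tenant_name) "
             "SELECT tenant_id, aptno, tenant_name FROM tenant")

# After a column is added, Syntax 1 still works; Syntax 2 no longer matches.
conn.execute("ALTER TABLE tenant ADD COLUMN no_of_pets INTEGER")
conn.execute("INSERT INTO tenant (tenant_id, aptno, tenant_name) "
             "VALUES (1002, 890, 'BILL')")
try:
    conn.execute("INSERT INTO tenant VALUES (1003, 891, 'JAMES')")
    syntax2_failed = False
except sqlite3.Error:
    syntax2_failed = True

tenant_count = conn.execute("SELECT COUNT(*) FROM tenant").fetchone()[0]
archive_count = conn.execute("SELECT COUNT(*) FROM tenant_archive").fetchone()[0]
bill_pets = conn.execute(
    "SELECT no_of_pets FROM tenant WHERE tenant_id = 1002").fetchone()[0]
print(tenant_count, archive_count, bill_pets, syntax2_failed)
```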
UPDATE Statement