



MI0034 – Database
Management
Systems
                                Zafar Ishtiaq - 531111145
                                Assignment Set I and Set II.




       Atyab Gulf Catering Co

         Block 11, Street 108

                     Jabriya-

                      Kuwait

                  11/02/2009
MBA-IT Semester III
                      MI0034 – Database Management System
                                Assignment - Set- 1

Q1. Differentiate between Traditional File System & Modern Database System?
Describe the properties of Database & the Advantage of Database?

A1.
Differentiate between Traditional File System & Modern Database System:
File-based systems were the traditional systems, and they have now been replaced
by modern database systems. Nearly all database applications today use a modern
database management system.
The differences between these two technologies are given below.
File-based System
File-based systems were an early attempt to computerize the manual filing
system. File-based system is a collection of application programs that perform
services for the end-users. Each program defines and manages its data.
However, five types of problem arise in using the file-based approach:
Separation and isolation of data
When data is isolated in separate files, it is more difficult for us to access data
that should be available. The application programmer is required to synchronize
the processing of two or more files to ensure the correct data is extracted.
Duplication of data
When employing the decentralized file-based approach, uncontrolled duplication
of data occurs. Uncontrolled duplication of data is undesirable because:


   i. Duplication is wasteful
   ii. Duplication can lead to loss of data integrity
Data dependence
Using a file-based system, the physical structure and storage of the data files
and records are defined in the application program code. This characteristic is
known as program-data dependence. Making changes to an existing structure is
rather difficult and leads to modification of the programs. Such maintenance
activities are time-consuming and subject to error.
Incompatible file formats
The structure of a file is dependent on the application programming language.
A file structure provided in one programming language, such as the direct or
indexed-sequential files available in COBOL, may be different from the structure
generated by another programming language such as C. This incompatibility makes
the files difficult to process jointly.
Fixed queries / proliferation of application programs
File-based systems are very dependent upon the application programmer. Any
required queries or reports have to be written by the application programmer.
Normally, only fixed-format queries or reports can be entertained, and no
facility for ad-hoc queries is offered.
File-based systems also place tremendous pressure on data processing staff, with
users complaining about programs that are inadequate or inefficient in meeting
their demands. Documentation may be limited and maintenance of the system is
difficult. Provision for security, integrity and recovery capability is very limited.
Database Systems:
In order to overcome the limitations of the file-based approach, the concept of
the database and the Database Management System (DBMS) emerged in the 1960s.

A database is an application that can store and retrieve data very rapidly. The
relational aspect refers to how the data is stored in the database and how it is
organized. When we talk about a database, we usually mean a relational database,
in fact an RDBMS: a Relational Database Management System.
In a relational database, all data is stored in tables. These have the same structure
repeated in each row (like a spreadsheet), and it is the relations between the
tables that make it a "relational" database.



Advantages:
Applying the database approach in application systems yields a number of
advantages, including:
Control of data redundancy
The database approach attempts to eliminate redundancy by integrating the
files. Although the database approach does not eliminate redundancy entirely, it
controls the amount of redundancy inherent in the database.
Data consistency:
By eliminating or controlling redundancy, the database approach reduces the risk
of inconsistencies occurring. It ensures all copies of the data are kept consistent.
More information from the same amount of data
With the integration of operational data in the database approach, it may be
possible to derive additional information from the same data.
Sharing of data
The database belongs to the entire organization and can be shared by all
authorized users.
Improved data integrity
Database integrity refers to the validity and consistency of stored data. Integrity
is usually expressed in terms of constraints, which are consistency rules that the
database is not permitted to violate.
Improved security
The database approach provides protection of the data from unauthorized
users. It may take the form of user names and passwords to identify user types
and their access rights for operations including retrieval, insertion, updating and
deletion.
Enforcement of standards
The integration of the database enforces the necessary standards including data
formats, naming conventions, documentation standards, update procedures and
access rules.
Economy of scale
Cost savings can be obtained by combining all of an organization's operational
data into one database, with applications working on a single source of data.
Balance of conflicting requirements
By having a structural design in the database, conflicts between users or
departments can be resolved. Decisions will be based on the best use of
resources for the organization as a whole rather than for an individual entity.
Improved data accessibility and responsiveness
With the integration provided by the database approach, data access can cross
departmental boundaries. This feature provides more functionality and better
services to the users.
Increased productivity
The database approach provides all the low-level file-handling routines. The
provision of these functions allows the programmer to concentrate more on the
specific functionality required by the users. The fourth-generation environment
provided by the database can simplify the database application development.
Improved maintenance
The database approach provides data independence. Since a change to the data
structure in the database need not affect the application programs, database
application maintenance is simplified.
Increased concurrency
The database can manage concurrent data access effectively. It ensures there is
no interference between users that would result in any loss of information or
loss of integrity.
Improved backup and recovery services
Modern database management systems provide facilities to minimize the amount
of processing that can be lost following a failure, by using the transaction
approach.


Disadvantages
In spite of the large number of advantages found in the database approach, it
is not without challenges. The following disadvantages can be noted:
Complexity
A database management system is an extremely complex piece of software. All
parties must be familiar with its functionality in order to take full advantage of
it. Therefore, training for the administrators, designers and users is required.
Size
The database management system consumes a substantial amount of main
memory as well as a large amount of disk space in order to run efficiently.
Cost of DBMS
A multi-user database management system may be very expensive. Even after
the installation, there is a high recurrent annual maintenance cost on the
software.
Cost of conversion
When moving from a file-based system to a database system, the company is
required to incur additional expenses on hardware acquisition and training.
Performance
As the database approach is to cater for many applications rather than exclusively
for a particular one, some applications may not run as fast as before.
Higher impact of a failure
The database approach increases the vulnerability of the system due to
centralization. As all users and applications rely on database availability, the
failure of any component can bring operations to a halt and seriously affect
services to customers.



Q2. What is the disadvantage of sequential file organization? How do you
overcome it? What are the advantages & disadvantages of Dynamic Hashing?

Disadvantage of Sequential file organization:
A file that contains records or other elements stored in chronological order,
based on account number or some other identifying data, is called a sequential
file. In order to locate the desired data, sequential files must be read starting
at the beginning of the file. A sequential file may be stored on a sequential
access device such as magnetic tape or on a direct access device such as
magnetic disk, but the accessing method remains the same.

Slow access:
The major issue with sequential files is slow access to information, as each
read must pass through the records one by one until it arrives at the desired
record. That makes all file operations (read, write and update) very time
consuming in comparison to random-access files. This limitation is overcome by
indexed or hashed file organizations, which locate a record directly from its
key instead of scanning the file.
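
As a concrete illustration of overcoming sequential access, most relational
systems let a lookup use an index instead of scanning the file from the
beginning. A minimal sketch in SQL (the accounts table is hypothetical, and
USING HASH is PostgreSQL-specific syntax; other systems may default to B-tree
indexes):

-- Hypothetical table; without an index, finding one account means
-- reading records one by one from the start of the file.
CREATE TABLE accounts (
    account_no  INTEGER,
    holder_name VARCHAR(100),
    balance     NUMERIC(12, 2)
);

-- A hash index maps a key directly to the record's location.
CREATE INDEX accounts_hash_idx ON accounts USING HASH (account_no);

-- This equality lookup can now go straight to the matching record
-- rather than performing a sequential scan.
SELECT holder_name, balance
FROM accounts
WHERE account_no = 12345;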


Dynamic Hashing:

Advantages

The main advantage of hash tables over other table data structures is speed. This
advantage is more apparent when the number of entries is large (thousands or
more). Hash tables are particularly efficient when the maximum number of
entries can be predicted in advance, so that the bucket array can be allocated
once with the optimum size and never resized.
If the set of key-value pairs is fixed and known ahead of time (so insertions and
deletions are not allowed), one may reduce the average lookup cost by a careful
choice of the hash function, bucket table size, and internal data structures. In
particular, one may be able to devise a hash function that is collision-free, or
even perfect. In this case the keys need not be stored in the table.


Disadvantages

Hash tables can be more difficult to implement than self-balancing binary search
trees. Choosing an effective hash function for a specific application is more an art
than a science. In open-addressed hash tables it is fairly easy to create a poor
hash function.


Although operations on a hash table take constant time on average, the cost of a
good hash function can be significantly higher than the inner loop of the lookup
algorithm for a sequential list or search tree. Thus hash tables are not effective
when the number of entries is very small. (However, in some cases the high cost
of computing the hash function can be mitigated by saving the hash value
together with the key.)


For certain string processing applications, such as spell-checking, hash tables may
be less efficient than tries, finite automata, or Judy arrays. Also, if each key is
represented by a small enough number of bits, then, instead of a hash table, one
may use the key directly as the index into an array of values. Note that there are
no collisions in this case.


The entries stored in a hash table can be enumerated efficiently (at constant cost
per entry), but only in some pseudo-random order. Therefore, there is no efficient
way to locate an entry whose key is nearest to a given key. Listing all n
entries in some specific order generally requires a separate sorting step, whose
cost is proportional to log(n) per entry. In comparison, ordered search trees have
lookup and insertion cost proportional to log(n), but allow finding the nearest key
at about the same cost, and ordered enumeration of all entries at constant cost
per entry.


If the keys are not stored (because the hash function is collision-free), there may
be no easy way to enumerate the keys that are present in the table at any given
moment.

Although the average cost per operation is constant and fairly small, the cost of a
single operation may be quite high. In particular, if the hash table uses dynamic
resizing, an insertion or deletion operation may occasionally take time
proportional to the number of entries. This may be a serious drawback in real-
time or interactive applications.


Hash tables in general exhibit poor locality of reference—that is, the data to be
accessed is distributed seemingly at random in memory. Because hash tables
cause access patterns that jump around, this can trigger microprocessor cache
misses that cause long delays. Compact data structures such as arrays, searched
with linear search, may be faster if the table is relatively small and keys are
integers or other short strings. According to Moore's Law, cache sizes are growing
exponentially and so what is considered "small" may be increasing. The optimal
performance point varies from system to system.


Hash tables become quite inefficient when there are many collisions. While
extremely uneven hash distributions are extremely unlikely to arise by chance, a
malicious adversary with knowledge of the hash function may be able to supply
information to a hash which creates worst-case behavior by causing excessive
collisions, resulting in very poor performance (i.e., a denial of service attack). In
critical applications, either universal hashing can be used or a data structure with
better worst-case guarantees may be preferable.
Q3. What is a relationship type? Explain the difference among a relationship
instance, a relationship type & a relationship set?

A3.

A relationship type R among n entity types E1, E2, …, En is a set of associations
among entities from these types. Formally, R is a set of relationship instances ri,
where each ri is an n-tuple of entities (e1, e2, …, en) and each entity ej in ri is a
member of entity type Ej, 1≤j≤n. Hence, a relationship type is a mathematical
relation on E1, E2, …, En, or alternatively it can be defined as a subset of the
Cartesian product E1 × E2 × … × En. The collection of relationship instances among
the entity types E1, E2, …, En existing at a given moment is called a relationship
set. In short: a relationship instance is a single association (one tuple), the
relationship type is its schema-level definition, and the relationship set is the
set of instances present in the database at a particular point in time.
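
As a small worked example (the entity and relationship names are illustrative
assumptions, not from the original text):

\[
\text{WORKS\_FOR} \subseteq \text{EMPLOYEE} \times \text{DEPARTMENT}, \qquad
r_1 = (e_1, d_3), \; r_2 = (e_2, d_3), \dots
\]

Each instance ri pairs one employee with the department he or she works for:
WORKS_FOR is the relationship type, and the set {r1, r2, …} existing at any
moment is the relationship set.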




Q4. What is SQL? Discuss.
A4.

Abbreviation of structured query language, and pronounced either see-kwell or as
separate letters. SQL is a standardized query language for requesting information
from a database. The original version called SEQUEL (structured English query
language) was designed by an IBM research center in 1974 and 1975. SQL was
first introduced as a commercial database system in 1979 by Oracle Corporation.

Historically, SQL has been the favorite query language for database management
systems running on minicomputers and mainframes. Increasingly, however, SQL is
being supported by PC database systems because it supports distributed
databases (databases that are spread out over several computer systems). This
enables several users on a local-area network to access the same database
simultaneously.

Although there are different dialects of SQL, it is nevertheless the closest thing to
a standard query language that currently exists. In 1986, ANSI approved a
rudimentary version of SQL as the official standard, but most versions of SQL since
then have included many extensions to the ANSI standard. In 1991, ANSI updated
the standard. The new standard is known as SAG SQL.

SQL was one of the first commercial languages for Edgar F. Codd's relational
model, as described in his influential 1970 paper, "A Relational Model of Data for
Large Shared Data Banks".[5] Despite not adhering to the relational model as
described by Codd, it became the most widely used database language.[6][7]
Although SQL is often described as, and to a great extent is, a declarative
language, it also includes procedural elements. SQL became a standard of the
American National Standards Institute (ANSI) in 1986, and of the International
Organization for Standardization (ISO) in 1987. Since then, the standard has been
enhanced several times with added features. However, issues of SQL code
portability between major RDBMS products still exist due to lack of full
compliance with, or different interpretations of, the standard. Among the reasons
mentioned are the large size and incomplete specification of the standard, as well
as vendor lock-in.

SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the
early 1970s. This version, initially called SEQUEL (Structured English Query
Language), was designed to manipulate and retrieve data stored in IBM's original
quasi-relational database management system, System R, which a group at IBM
San Jose Research Laboratory had developed during the 1970s.[8] The acronym
SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-
based Hawker Siddeley aircraft company.[9]

The first Relational Database Management System (RDBMS) was RDMS,
developed at MIT in the early 1970s, soon followed by Ingres, developed in 1974
at U.C. Berkeley. Ingres implemented a query language known as QUEL, which
was later supplanted in the marketplace by SQL.[9]

In the late 1970s, Relational Software, Inc. (now Oracle Corporation) saw the
potential of the concepts described by Codd, Chamberlin, and Boyce and
developed their own SQL-based RDBMS with aspirations of selling it to the U.S.
Navy, Central Intelligence Agency, and other U.S. government agencies. In June
1979, Relational Software, Inc. introduced the first commercially available
implementation of SQL, Oracle V2 (Version 2) for VAX computers. Oracle V2 beat
IBM's August release of the System/38 RDBMS to market by a few weeks.
After testing SQL at customer test sites to determine the usefulness and
practicality of the system, IBM began developing commercial products based on
their System R prototype including System/38, SQL/DS, and DB2, which were
commercially available in 1979, 1981, and 1983, respectively.[10]




[Figure omitted: a chart showing several of the SQL language elements that
compose a single statement.]

The SQL language is subdivided into several language elements, including:

      Clauses, which are constituent components of statements and queries. (In
      some cases, these are optional.)[11]
      Expressions, which can produce either scalar values or tables consisting of
      columns and rows of data.
      Predicates, which specify conditions that can be evaluated to SQL three-
      valued logic (3VL) or Boolean (true/false/unknown) truth values and which
      are used to limit the effects of statements and queries, or to change
      program flow.
      Queries, which retrieve the data based on specific criteria. This is the most
      important element of SQL.
      Statements, which may have a persistent effect on schemata and data, or
      which may control transactions, program flow, connections, sessions, or
      diagnostics.
         o SQL statements also include the semicolon (";") statement
            terminator. Though not required on every platform, it is defined as a
            standard part of the SQL grammar.
      Insignificant whitespace is generally ignored in SQL statements and
      queries, making it easier to format SQL code for readability.
Queries

The most common operation in SQL is the query, which is performed with the
declarative SELECT statement. SELECT retrieves data from one or more tables, or
expressions. Standard SELECT statements have no persistent effects on the
database. Some non-standard implementations of SELECT can have persistent
effects, such as the SELECT INTO syntax that exists in some databases.[12]

Queries allow the user to describe desired data, leaving the database
management system (DBMS) responsible for planning, optimizing, and performing
the physical operations necessary to produce that result as it chooses.

A query includes a list of columns to be included in the final result immediately
following the SELECT keyword. An asterisk ("*") can also be used to specify that
the query should return all columns of the queried tables. SELECT is the most
complex statement in SQL, with optional keywords and clauses that include:

      The FROM clause which indicates the table(s) from which data is to be
      retrieved. The FROM clause can include optional JOIN subclauses to specify
      the rules for joining tables.
      The WHERE clause includes a comparison predicate, which restricts the
      rows returned by the query. The WHERE clause eliminates all rows from the
      result set for which the comparison predicate does not evaluate to True.
      The GROUP BY clause is used to project rows having common values into a
      smaller set of rows. GROUP BY is often used in conjunction with SQL
      aggregation functions or to eliminate duplicate rows from a result set. The
      WHERE clause is applied before the GROUP BY clause.
      The HAVING clause includes a predicate used to filter rows resulting from
      the GROUP BY clause. Because it acts on the results of the GROUP BY
      clause, aggregation functions can be used in the HAVING clause predicate.
      The ORDER BY clause identifies which columns are used to sort the
      resulting data, and in which direction they should be sorted (options are
      ascending or descending). Without an ORDER BY clause, the order of rows
      returned by an SQL query is undefined.

The following is an example of a SELECT query that returns a list of expensive
books. The query retrieves all rows from the Book table in which the price column
contains a value greater than 100.00. The result is sorted in ascending order by
title. The asterisk (*) in the select list indicates that all columns of the Book table
should be included in the result set.

SELECT *
FROM Book
WHERE price > 100.00
ORDER BY title;

The example below demonstrates a query of multiple tables, grouping, and
aggregation, by returning a list of books and the number of authors associated
with each book.

SELECT Book.title,
COUNT(*) AS Authors
FROM Book JOIN Book_author
ON Book.isbn = Book_author.isbn
GROUP BY Book.title;

Example output might resemble the following:

Title                  Authors
---------------------- -------
SQL Examples and Guide       4
The Joy of SQL               1
An Introduction to SQL       2
Pitfalls of SQL              1

Under the precondition that isbn is the only common column name of the two
tables and that a column named title only exists in the Book table, the above
query could be rewritten in the following form:

SELECT title,
COUNT(*) AS Authors
FROM Book NATURAL JOIN Book_author
GROUP BY title;

However, many vendors either do not support this approach, or require certain
column naming conventions in order for natural joins to work effectively.
SQL includes operators and functions for calculating values on stored values. SQL
allows the use of expressions in the select list to project data, as in the following
example which returns a list of books that cost more than 100.00 with an
additional sales_tax column containing a sales tax figure calculated at 6% of the
price.

SELECT isbn,
title,
price,
price * 0.06 AS sales_tax
FROM Book
WHERE price > 100.00
ORDER BY title;
Subqueries

Queries can be nested so that the results of one query can be used in another
query via a relational operator or aggregation function. A nested query is also
known as a subquery. While joins and other table operations provide
computationally superior (i.e. faster) alternatives in many cases, the use of
subqueries introduces a hierarchy in execution which can be useful or necessary.
In the following example, a subquery computes the aggregate AVG(price), and the
outer query compares each book's price against that result:

SELECT isbn, title, price
FROM Book
WHERE price < (SELECT AVG(price) FROM Book)
ORDER BY title;

Q5. What is Normalization? Discuss various types of Normal Forms?
Normalization is the process of decomposing tables to eliminate data
redundancy.

1NF: The table should contain only scalar (atomic) values.
2NF: The table should be in 1NF, with no partial functional dependencies.
3NF: The table should be in 2NF, with no transitive dependencies.
The normal forms defined in relational database theory represent guidelines for
record design. The guidelines corresponding to first through fifth normal forms
are presented here, in terms that do not require an understanding of relational
theory. The design guidelines are meaningful even if one is not using a relational
database system. We present the guidelines without referring to the concepts of
the relational model in order to emphasize their generality, and also to make
them easier to understand. Our presentation conveys an intuitive sense of the
intended constraints on record design, although in its informality it may be
imprecise in some technical details. A comprehensive treatment of the subject is
provided by Date [4].

The normalization rules are designed to prevent update anomalies and data
inconsistencies. With respect to performance tradeoffs, these guidelines are
biased toward the assumption that all non-key fields will be updated frequently.
They tend to penalize retrieval, since data which may have been retrievable from
one record in an unnormalized design may have to be retrieved from several
records in the normalized form. There is no obligation to fully normalize all
records when actual performance requirements are taken into account.

2 FIRST NORMAL FORM

First normal form [1] deals with the "shape" of a record type.

Under first normal form, all occurrences of a record type must contain the same
number of fields.

First normal form excludes variable repeating fields and groups. This is not so
much a design guideline as a matter of definition. Relational database theory
doesn't deal with records having a variable number of fields.

3 SECOND AND THIRD NORMAL FORMS

Second and third normal forms [2, 3, 7] deal with the relationship between non-
key and key fields.

Under second and third normal forms, a non-key field must provide a fact about
the key, the whole key, and nothing but the key. In addition, the record must
satisfy first normal form.
We deal now only with "single-valued" facts. The fact could be a one-to-many
relationship, such as the department of an employee, or a one-to-one
relationship, such as the spouse of an employee. Thus the phrase "Y is a fact
about X" signifies a one-to-one or one-to-many relationship between Y and X. In
the general case, Y might consist of one or more fields, and so might X. In the
following example, QUANTITY is a fact about the combination of PART and
WAREHOUSE.

3.1 Second Normal Form

Second normal form is violated when a non-key field is a fact about a subset of a
key. It is only relevant when the key is composite, i.e., consists of several fields.
Consider the following inventory record:

---------------------------------------------------
| PART | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS |
====================-------------------------------

The key here consists of the PART and WAREHOUSE fields together, but
WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone. The basic problems
with this design are:

      The warehouse address is repeated in every record that refers to a part
      stored in that warehouse.
      If the address of the warehouse changes, every record referring to a part
      stored in that warehouse must be updated.
      Because of the redundancy, the data might become inconsistent, with
      different records showing different addresses for the same warehouse.
      If at some point in time there are no parts stored in the warehouse, there
      may be no record in which to keep the warehouse's address.

To satisfy second normal form, the record shown above should be decomposed
into (replaced by) the two records:

------------------------------- ---------------------------------
| PART | WAREHOUSE | QUANTITY | | WAREHOUSE | WAREHOUSE-ADDRESS |
====================----------- =============--------------------
When a data design is changed in this way, replacing unnormalized records with
normalized records, the process is referred to as normalization. The term
"normalization" is sometimes used relative to a particular normal form. Thus a set
of records may be normalized with respect to second normal form but not with
respect to third.

The normalized design enhances the integrity of the data, by minimizing
redundancy and inconsistency, but at some possible performance cost for certain
retrieval applications. Consider an application that wants the addresses of all
warehouses stocking a certain part. In the unnormalized form, the application
searches one record type. With the normalized design, the application has to
search two record types, and connect the appropriate pairs.
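
A minimal SQL sketch of this decomposition (data types, key declarations, and
the part number 'P1' are assumptions for illustration; the names follow the
example). The final query shows the join needed to connect the two record types
again:

CREATE TABLE part_warehouse (
    part      VARCHAR(20),
    warehouse VARCHAR(20),
    quantity  INTEGER,
    PRIMARY KEY (part, warehouse)
);

CREATE TABLE warehouse (
    warehouse         VARCHAR(20) PRIMARY KEY,
    warehouse_address VARCHAR(100)
);

-- Addresses of all warehouses stocking a given part: the normalized
-- design must search both tables and connect the matching pairs.
SELECT pw.warehouse, w.warehouse_address
FROM part_warehouse pw
JOIN warehouse w ON w.warehouse = pw.warehouse
WHERE pw.part = 'P1';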

3.2 Third Normal Form

Third normal form is violated when a non-key field is a fact about another non-
key field, as in

------------------------------------
| EMPLOYEE | DEPARTMENT | LOCATION |
============------------------------

The EMPLOYEE field is the key. If each department is located in one place, then
the LOCATION field is a fact about the DEPARTMENT -- in addition to being a fact
about the EMPLOYEE. The problems with this design are the same as those
caused by violations of second normal form:

      The department's location is repeated in the record of every employee
      assigned to that department.
      If the location of the department changes, every such record must be
      updated.
      Because of the redundancy, the data might become inconsistent, with
      different records showing different locations for the same department.
      If a department has no employees, there may be no record in which to
      keep the department's location.

To satisfy third normal form, the record shown above should be decomposed into
the two records:
------------------------- -------------------------
| EMPLOYEE | DEPARTMENT | | DEPARTMENT | LOCATION |
============------------- ==============-----------

To summarize, a record is in second and third normal forms if every field is either
part of the key or provides a (single-valued) fact about exactly the whole key and
nothing else.

3.3 Functional Dependencies

In relational database theory, second and third normal forms are defined in terms
of functional dependencies, which correspond approximately to our single-valued
facts. A field Y is "functionally dependent" on a field (or fields) X if it is invalid to
have two records with the same X-value but different Y-values. That is, a given X-
value must always occur with the same Y-value. When X is a key, then all fields
are by definition functionally dependent on X in a trivial way, since there can't be
two records having the same X value.
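
Stated symbolically (a standard formulation added for precision, not taken from
the original text): for any two records t1 and t2,

\[
X \to Y \quad \Longleftrightarrow \quad
\bigl( t_1[X] = t_2[X] \implies t_1[Y] = t_2[Y] \bigr)
\]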

There is a slight technical difference between functional dependencies and single-
valued facts as we have presented them. Functional dependencies only exist
when the things involved have unique and singular identifiers (representations).
For example, suppose a person's address is a single-valued fact, i.e., a person has
only one address. If we don't provide unique identifiers for people, then there will
not be a functional dependency in the data:

-----------------------------------------------
| PERSON     | ADDRESS                        |
|------------+--------------------------------|
| John Smith | 123 Main St., New York         |
| John Smith | 321 Center St., San Francisco  |
-----------------------------------------------

Although each person has a unique address, a given name can appear with
several different addresses. Hence we do not have a functional dependency
corresponding to our single-valued fact.

Similarly, the address has to be spelled identically in each occurrence in order to
have a functional dependency. In the following case the same person appears to
be living at two different addresses, again precluding a functional dependency.
-----------------------------------------
| PERSON     | ADDRESS                  |
|------------+--------------------------|
| John Smith | 123 Main St., New York   |
| John Smith | 123 Main Street, NYC     |
-----------------------------------------

We are not defending the use of non-unique or non-singular representations.
Such practices often lead to data maintenance problems of their own. We do wish
to point out, however, that functional dependencies and the various normal
forms are really only defined for situations in which there are unique and singular
identifiers. Thus the design guidelines as we present them are a bit stronger than
those implied by the formal definitions of the normal forms.

For instance, we as designers know that in the following example there is a single-
valued fact about a non-key field, and hence the design is susceptible to all the
update anomalies mentioned earlier.

-----------------------------------------------------------
| EMPLOYEE  | FATHER     | FATHER'S-ADDRESS               |
|===========+------------+--------------------------------|
| Art Smith | John Smith | 123 Main St., New York         |
| Bob Smith | John Smith | 123 Main Street, NYC           |
| Cal Smith | John Smith | 321 Center St., San Francisco  |
-----------------------------------------------------------

However, in formal terms, there is no functional dependency here between
FATHER'S-ADDRESS and FATHER, and hence no violation of third normal form.

4 FOURTH AND FIFTH NORMAL FORMS

Fourth [5] and fifth [6] normal forms deal with multi-valued facts. The multi-
valued fact may correspond to a many-to-many relationship, as with employees
and skills, or to a many-to-one relationship, as with the children of an employee
(assuming only one parent is an employee). By "many-to-many" we mean that an
employee may have several skills, and a skill may belong to several employees.

Note that we look at the many-to-one relationship between children and fathers
as a single-valued fact about a child but a multi-valued fact about a father.
In a sense, fourth and fifth normal forms are also about composite keys. These
normal forms attempt to minimize the number of fields involved in a composite
key, as suggested by the examples to follow.

4.1 Fourth Normal Form

Under fourth normal form, a record type should not contain two or more
independent multi-valued facts about an entity. In addition, the record must
satisfy third normal form.

The term "independent" will be discussed after considering an example.

Consider employees, skills, and languages, where an employee may have several
skills and several languages. We have here two many-to-many relationships, one
between employees and skills, and one between employees and languages.
Under fourth normal form, these two relationships should not be represented in a
single record such as

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
===============================

Instead, they should be represented in the two records

-------------------- -----------------------
| EMPLOYEE | SKILL | | EMPLOYEE | LANGUAGE |
==================== =======================

Note that other fields, not involving multi-valued facts, are permitted to occur in
the record, as in the case of the QUANTITY field in the earlier PART/WAREHOUSE
example.
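
A minimal SQL sketch of the fourth-normal-form design (column types and the
composite primary keys are assumptions about intent):

CREATE TABLE employee_skill (
    employee VARCHAR(30),
    skill    VARCHAR(30),
    PRIMARY KEY (employee, skill)
);

CREATE TABLE employee_language (
    employee VARCHAR(30),
    language VARCHAR(30),
    PRIMARY KEY (employee, language)
);

The composite keys record each multi-valued fact exactly once, with no
repetitions, no blank fields, and none of the ambiguous maintenance policies
discussed below.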

The main problem with violating fourth normal form is that it leads to
uncertainties in the maintenance policies. Several policies are possible for
maintaining two independent multi-valued facts in one record:

(1) A disjoint format, in which a record contains either a skill or a language, but
not both:
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook |                |
| Smith | type |                |
| Smith |           | French |
| Smith |           | German |
| Smith |           | Greek |
-------------------------------

This is not much different from maintaining two separate record types. (We note
in passing that such a format also leads to ambiguities regarding the meanings of
blank fields. A blank SKILL could mean the person has no skill, or the field is not
applicable to this employee, or the data is unknown, or, as in this case, the data
may be found in another record.)

(2) A random mix, with three variations:

(a) Minimal number of records, with repetitions:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type | German |
| Smith | type | Greek |
-------------------------------

(b) Minimal number of records, with null values:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type | German |
| Smith |           | Greek |
-------------------------------

(c) Unrestricted:
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type |                |
| Smith |           | German |
| Smith | type | Greek |
-------------------------------

(3) A "cross-product" form, where for each employee, there must be a record for
every possible pairing of one of his skills with one of his languages:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | cook | German |
| Smith | cook | Greek |
| Smith | type | French |
| Smith | type | German |
| Smith | type | Greek |
-------------------------------

Other problems caused by violating fourth normal form are similar in spirit to
those mentioned earlier for violations of second or third normal form. They take
different variations depending on the chosen maintenance policy:

      If there are repetitions, then updates have to be done in multiple records,
      and they could become inconsistent.
      Insertion of a new skill may involve looking for a record with a blank skill, or
      inserting a new record with a possibly blank language, or inserting multiple
      records pairing the new skill with some or all of the languages.
      Deletion of a skill may involve blanking out the skill field in one or more
      records (perhaps with a check that this doesn't leave two records with the
      same language and a blank skill), or deleting one or more records, coupled
      with a check that the last mention of some language hasn't also been
      deleted.
Fourth normal form minimizes such update problems.

4.1.1 Independence

We mentioned independent multi-valued facts earlier, and we now illustrate
what we mean in terms of the example. The two many-to-many relationships,
employee:skill and employee:language, are "independent" in that there is no
direct connection between skills and languages. There is only an indirect
connection because they belong to some common employee. That is, it does not
matter which skill is paired with which language in a record; the pairing does not
convey any information. That's precisely why all the maintenance policies
mentioned earlier can be allowed.

In contrast, suppose that an employee could only exercise certain skills in certain
languages. Perhaps Smith can cook French cuisine only, but can type in French,
German, and Greek. Then the pairings of skills and languages become
meaningful, and there is no longer any ambiguity of maintenance policies. In the
present case, only the following form is correct:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith | cook | French |
| Smith | type | French |
| Smith | type | German |
| Smith | type | Greek |
-------------------------------

Thus the employee:skill and employee:language relationships are no longer
independent. These records do not violate fourth normal form. When there is an
interdependence among the relationships, then it is acceptable to represent them
in a single record.

4.1.2 Multivalued Dependencies

For readers interested in pursuing the technical background of fourth normal
form a bit further, we mention that fourth normal form is defined in terms of
multivalued dependencies, which correspond to our independent multi-valued
facts. Multivalued dependencies, in turn, are defined essentially as relationships
which accept the "cross-product" maintenance policy mentioned above. That is,
for our example, every one of an employee's skills must appear paired with every
one of his languages. It may or may not be obvious to the reader that this is
equivalent to our notion of independence: since every possible pairing must be
present, there is no "information" in the pairings. Such pairings convey
information only if some of them can be absent, that is, only if it is possible that
some employee cannot perform some skill in some language. If all pairings are
always present, then the relationships are really independent.

We should also point out that multivalued dependencies and fourth normal form
apply as well to relationships involving more than two fields. For example,
suppose we extend the earlier example to include projects, in the following sense:

        An employee uses certain skills on certain projects.
        An employee uses certain languages on certain projects.

If there is no direct connection between the skills and languages that an
employee uses on a project, then we could treat this as two independent many-
to-many relationships of the form EP:S and EP:L, where "EP" represents a
combination of an employee with a project. A record including employee, project,
skill, and language would violate fourth normal form. Two records, containing
fields E,P,S and E,P,L, respectively, would satisfy fourth normal form.

4.2 Fifth Normal Form

Fifth normal form deals with cases where information can be reconstructed from
smaller pieces of information that can be maintained with less redundancy.
Second, third, and fourth normal forms also serve this purpose, but fifth normal
form generalizes to cases not covered by the others.

We will not attempt a comprehensive exposition of fifth normal form, but
illustrate the central concept with a commonly used example, namely one
involving agents, companies, and products. If agents represent companies,
companies make products, and agents sell products, then we might want to keep
a record of which agent sells which product for which company. This information
could be kept in one record type with three fields:

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford | car |
| Smith | GM | truck |
-----------------------------

This form is necessary in the general case. For example, although agent Smith
sells cars made by Ford and trucks made by GM, he does not sell Ford trucks or
GM cars. Thus we need the combination of three fields to know which
combinations are valid and which are not.

But suppose that a certain rule was in effect: if an agent sells a certain product,
and he represents a company making that product, then he sells that product for
that company.

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford | car |
| Smith | Ford | truck |
| Smith | GM | car |
| Smith | GM | truck |
| Jones | Ford | car |
-----------------------------

In this case, it turns out that we can reconstruct all the true facts from a
normalized form consisting of three separate record types, each containing two
fields:

------------------- --------------------- -------------------
| AGENT | COMPANY | | COMPANY | PRODUCT | | AGENT | PRODUCT |
|-------+---------| |---------+---------| |-------+---------|
| Smith | Ford    | | Ford    | car     | | Smith | car     |
| Smith | GM      | | Ford    | truck   | | Smith | truck   |
| Jones | Ford    | | GM      | car     | | Jones | car     |
------------------- | GM      | truck   | -------------------
                    ---------------------
These three record types are in fifth normal form, whereas the corresponding
three-field record shown previously is not.

Roughly speaking, we may say that a record type is in fifth normal form when its
information content cannot be reconstructed from several smaller record types,
i.e., from record types each having fewer fields than the original record. The case
where all the smaller records have the same key is excluded. If a record type can
only be decomposed into smaller records which all have the same key, then the
record type is considered to be in fifth normal form without decomposition. A
record type in fifth normal form is also in fourth, third, second, and first normal
forms.

Fifth normal form does not differ from fourth normal form unless there exists a
symmetric constraint such as the rule about agents, companies, and products. In
the absence of such a constraint, a record type in fourth normal form is always in
fifth normal form.

One advantage of fifth normal form is that certain redundancies can be
eliminated. In the normalized form, the fact that Smith sells cars is recorded only
once; in the unnormalized form it may be repeated many times.

It should be observed that although the normalized form involves more record
types, there may be fewer total record occurrences. This is not apparent when
there are only a few facts to record, as in the example shown above. The
advantage is realized as more facts are recorded, since the size of the normalized
files increases in an additive fashion, while the size of the unnormalized file
increases in a multiplicative fashion. For example, if we add a new agent who sells
x products for y companies, where each of these companies makes each of these
products, we have to add x+y new records to the normalized form, but xy new
records to the unnormalized form.

It should be noted that all three record types are required in the normalized form
in order to reconstruct the same information. From the first two record types
shown above we learn that Jones represents Ford and that Ford makes trucks. But
we can't determine whether Jones sells Ford trucks until we look at the third
record type to determine whether Jones sells trucks at all.
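
This reconstruction can be written as a SQL join over the three two-field record
types (table and column names are assumptions following the example):

-- Under the symmetric rule, joining all three tables reproduces exactly
-- the valid (agent, company, product) combinations; the last join
-- condition removes pairings the agent does not actually sell.
SELECT ac.agent, ac.company, cp.product
FROM agent_company ac
JOIN company_product cp ON cp.company = ac.company
JOIN agent_product ap ON ap.agent = ac.agent
                     AND ap.product = cp.product;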

The following example illustrates a case in which the rule about agents,
companies, and products is satisfied, and which clearly requires all three record
types in the normalized form. Any two of the record types taken alone will imply
something untrue.

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford | car |
| Smith | Ford | truck |
| Smith | GM | car |
| Smith | GM | truck |
| Jones | Ford | car |
| Jones | Ford | truck |
| Brown | Ford | car |
| Brown | GM | car |
| Brown | Toyota | car |
| Brown | Toyota | bus |
-----------------------------
------------------- --------------------- -------------------
| AGENT | COMPANY | | COMPANY | PRODUCT | | AGENT | PRODUCT |
|-------+---------| |---------+---------| |-------+---------|
| Smith | Ford    | | Ford    | car     | | Smith | car     |
| Smith | GM      | | Ford    | truck   | | Smith | truck   |
| Jones | Ford    | | GM      | car     | | Jones | car     |
| Brown | Ford    | | GM      | truck   | | Jones | truck   |
| Brown | GM      | | Toyota  | car     | | Brown | car     |
| Brown | Toyota  | | Toyota  | bus     | | Brown | bus     |
------------------- --------------------- -------------------
(The three record types above are in fifth normal form.)

Observe that:

      Jones sells cars and GM makes cars, but Jones does not represent GM.
      Brown represents Ford and Ford makes trucks, but Brown does not sell
      trucks.
      Brown represents Ford and Brown sells buses, but Ford does not make
      buses.

Fourth and fifth normal forms both deal with combinations of multivalued facts.
One difference is that the facts dealt with under fifth normal form are not
independent, in the sense discussed earlier. Another difference is that, although
fourth normal form can deal with more than two multivalued facts, it only
recognizes them in pairwise groups. We can best explain this in terms of the
normalization process implied by fourth normal form. If a record violates fourth
normal form, the associated normalization process decomposes it into two
records, each containing fewer fields than the original record. Any of these
violating fourth normal form is again decomposed into two records, and so on
until the resulting records are all in fourth normal form. At each stage, the set of
records after decomposition contains exactly the same information as the set of
records before decomposition.

In the present example, no pairwise decomposition is possible. There is no
combination of two smaller records which contains the same total information as
the original record. All three of the smaller records are needed. Hence an
information-preserving pairwise decomposition is not possible, and the original
record is not in violation of fourth normal form. Fifth normal form is needed in
order to deal with the redundancies in this case.

5 UNAVOIDABLE REDUNDANCIES

Normalization certainly doesn't remove all redundancies. Certain redundancies
seem to be unavoidable, particularly when several multivalued facts are
dependent rather than independent. In the example shown in Section 4.1.1, it
seems unavoidable that we record the fact that "Smith can type" several times.
Also, when the rule about agents, companies, and products is not in effect, it
seems unavoidable that we record the fact that "Smith sells cars" several times.

6 INTER-RECORD REDUNDANCY

The normal forms discussed here deal only with redundancies occurring within a
single record type. Fifth normal form is considered to be the "ultimate" normal
form with respect to such redundancies.

Other redundancies can occur across multiple record types. For the example
concerning employees, departments, and locations, the following records are in
third normal form in spite of the obvious redundancy:

------------------------- -------------------------
| EMPLOYEE | DEPARTMENT | | DEPARTMENT | LOCATION |
============------------- ==============-----------
-----------------------
| EMPLOYEE | LOCATION |
============-----------

In fact, two copies of the same record type would constitute the ultimate in this
kind of undetected redundancy.

Inter-record redundancy has been recognized for some time [1], and has recently
been addressed in terms of normal forms and normalization [8].

7 CONCLUSION

While we have tried to present the normal forms in a simple and understandable
way, we are by no means suggesting that the data design process is
correspondingly simple. The design process involves many complexities which are
quite beyond the scope of this paper. In the first place, an initial set of data
elements and records has to be developed, as candidates for normalization. Then
the factors affecting normalization have to be assessed:

      Single-valued vs. multi-valued facts.
      Dependency on the entire key.
      Independent vs. dependent facts.
      The presence of mutual constraints.
      The presence of non-unique or non-singular representations.


Q6. What do you mean by Shared Lock & Exclusive lock? Describe briefly two
phase locking protocol?

A database is a huge collection of data stored in the form of tables. This data is
very important for the companies who use these databases, as any loss or misuse
of it can put both the company and its customers into trouble. In order to avoid
this situation and protect the customers, database vendors provide many security
features with their database products; one of them is a locking system to
maintain the integrity of the database. There are two types of lock available
with a database system:
1) Shared lock: provided to the readers of the data. These locks enable multiple
users to read the same data at the same time, but they are not allowed to
change/write the data or obtain an exclusive lock on the object. It can be set on
a table or a table row. The lock is released at the end of the transaction.

2) Exclusive lock: provided to the writers of the data. When this lock is set on an
object, only the writer who set the lock can change the data, and other users
cannot access the locked object. It can be set on tables or rows. The lock is
released at the end of the transaction.
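
A brief sketch of the two lock types in action (PostgreSQL-flavoured SQL; the
account table is hypothetical, and explicit row-lock syntax varies between
vendors):

-- Session A: take a shared (read) lock on a row.
BEGIN;
SELECT balance FROM account WHERE id = 1 FOR SHARE;

-- Session B: an UPDATE needs an exclusive lock on the same row, so it
-- blocks until session A commits or rolls back. Other readers may still
-- take shared locks: shared locks are compatible with each other but
-- not with an exclusive lock.
BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 1;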

Exclusive locks

Exclusive locks protect updates to file resources, both recoverable and non-
recoverable. They can be owned by only one transaction at a time. Any
transaction that requires an exclusive lock must wait if another task currently
owns an exclusive lock or a shared lock against the requested resource.

Shared locks

Shared locks support read integrity. They ensure that a record is not in the
process of being updated during a read-only request. Shared locks can also be
used to prevent updates of a record between the time that a record is read and
the next syncpoint.

A shared lock on a resource can be owned by several tasks at the same time.
However, although several tasks can own shared locks, there are some
circumstances in which tasks can be forced to wait for a lock:

    A request for a shared lock must wait if another task currently owns an
    exclusive lock on the resource.

    A request for an exclusive lock must wait if other tasks currently own shared
    locks on this resource.

    A new request for a shared lock must wait if another task is waiting for an
    exclusive lock on a resource that already has a shared lock.
In databases and transaction processing, two-phase locking (2PL) is a concurrency
control method that guarantees serializability.[1][2] It is also the name of the
resulting set of database transaction schedules (histories). The protocol utilizes
locks, applied by a transaction to data, which may block (interpreted as signals to
stop) other transactions from accessing the same data during the transaction's
life.

By the 2PL protocol locks are applied and removed in two phases:

   1. Expanding phase: locks are acquired and no locks are released.
   2. Shrinking phase: locks are released and no locks are acquired.

Two types of locks are utilized by the basic protocol: Shared and Exclusive locks.
Refinements of the basic protocol may utilize more lock types. Using locks that
block processes, 2PL may be subject to deadlocks that result from the mutual
blocking of two or more transactions.
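
To make the two phases concrete, here is a hedged sketch of one transaction
under 2PL, written as SQL with comments (the account table is hypothetical; as
discussed below, most real systems hold every lock until commit, i.e. SS2PL):

BEGIN;
-- Expanding phase: locks are acquired as data is touched.
SELECT balance FROM account WHERE id = 1 FOR SHARE;      -- shared lock
UPDATE account SET balance = balance + 50 WHERE id = 2;  -- exclusive lock
-- Shrinking phase: once any lock is released, no new lock may be
-- acquired. Under SS2PL the shrinking phase collapses into this point:
COMMIT;  -- all locks released together at transaction end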

2PL is a superset of strong strict two-phase locking (SS2PL),[3] also called
rigorousness,[4] which has been widely utilized for concurrency control in general-
purpose database systems since the 1970s. SS2PL implementations have many
variants. SS2PL was called strict 2PL[1] but this name usage is not recommended
now. Now strict 2PL (S2PL) is the intersection of strictness and 2PL, which is
different from SS2PL. SS2PL is also a special case of commitment ordering,[3] and
inherits many of CO's useful properties. SS2PL actually comprises only one phase:
phase-2 does not exist, and all locks are released only after transaction end. Thus
this useful 2PL type is not two-phased at all.

Neither 2PL nor S2PL in their general forms are known to be used in practice. Thus
2PL by itself does not seem to have much practical importance, and whenever 2PL
or S2PL utilization has been mentioned in the literature, the intention has been
SS2PL. What has made SS2PL so popular (probably the most utilized serializability
mechanism) is the effective and efficient locking-based combination of two
ingredients (the first does not exist in both general 2PL and S2PL; the second does
not exist in general 2PL):

   1. Commitment ordering, which provides both serializability, and effective
      distributed serializability and global serializability, and
2. Strictness, which provides cascadelessness (ACA, cascade-less
      recoverability) and (independently) allows efficient database recovery from
      failure.

Additionally, SS2PL is easier and has less overhead to implement than both 2PL and
S2PL, provides exactly the same locking, but sometimes releases locks later. However,
practically (though not simplistically theoretically) such later lock release occurs
only slightly later, and this apparent disadvantage is insignificant and disappears
next to the advantages of SS2PL.
Master of Business Administration - MBA Semester III
MI0034 – Database Management System - 4 Credits
Assignment - Set- 2 (60 Marks)
Answer all the Questions
Q1. Define Data Model & discuss the categories of Data Models? What is the
difference between logical data Independence & Physical Data Independence?


A data model is a picture or description which depicts how data is to be arranged
to serve a specific purpose. The data model depicts which data items are
required and how that data must look. However, it would be misleading to discuss
data models as if there were only one kind of data model, and equally misleading
to discuss them as if they were used for only one purpose. It would also be
misleading to assume that data models were only used in the construction of data
files.

Some data models are schematics which depict the manner in which data records
are connected or related within a file structure. These are called record or
structural data models. Some data models are used to identify the subjects of
corporate data processing - these are called entity-relationship data models. Still
another type of data model is used for analytic purposes to help the analyst to
solidify the semantics associated with critical corporate or business concepts.

The record data model

The record version of the data model is used to assist the implementation team
by providing a series of schematics of the files that will contain the data that
must be built to support the business processing procedures. When the design team
has chosen a file management system, or when corporate policy dictates that a
specific data management system be used, these models may be the only models
produced within the context of a design project. If no such choice has been made,
they may be produced after first developing a more general, non-DBMS-specific
entity-relationship data model.



Early data models
Although the term data modeling has become popular only in recent years, in fact
modeling of data has been going on for quite a long time. It is difficult for any of
us to pinpoint exactly when the first data model was constructed because each of
us has a different idea of what a data model is. If we go back to the definition we
set forth earlier, then we can say that perhaps the earliest form of data modeling
was practiced by the first persons who created paper forms for collecting large
amounts of similar data. We can see current versions of these forms everywhere
we look. Every time we fill out an application, buy something, or make a request
using anything other than a blank piece of paper or stationery, we are using a
form of data model.

These forms were designed to collect specific kinds of information, in a specific
format. The very definition of the word form confirms this.

A definition

      A form is the shape and structure of something as distinguished from its
      substance. A form is a document with blanks for the insertion of details or
      information.

Almost all businesses, and in fact almost all organizations, use forms of every
sort to gather and store information.

Data Management Systems

Until the introduction of data management systems (and data base management
systems) data modeling and data layout were synonymous. With one notable
exception data files were collections of identically formatted records. That
exception was a concept introduced in card records - the multi-format-card set, or
master detail set. This form of card record layout within a file allowed for
repeating sets of data within a larger record concept - the so-called logical
record (to distinguish it from the physical record). This form was used most
frequently when designing files to contain records of orders, where each order
could have certain data which was common to the whole order (the master) and
individual, repetitive records for each order line item (the details). This method of
file design employed record fragmentation rather than record consolidation.
To facilitate processing of these multi-format record files, designers used record
codes to identify records with different layouts and redundant data to permit
these records to be collected (or tied) together in sequence for processing.
Because these files were difficult to process, the layout of these records, and the
identification and placement of the control and redundant identifier data fields
had to be carefully planned. The planning and coordination associated with these
kinds of files constituted the first instances of data modeling.

The concepts associated with these kinds of files were transferred to magnetic
media and expanded by vendors who experimented with the substitution of
physical record addresses for the redundant data. This use of physical record
addresses coupled with various techniques for combining records of varying
lengths and formats gave rise to products which allowed for the construction of
complex files containing multiple format records tied together in complex
patterns to support business processing requirements.

These patterns were relatively difficult to visualize and schematics were devised
to portray them. These schematics were also called data models because they
modeled how the data was to be viewed. Because the schematics were based on
the manner in which the records were physically tied together, and thus logically
accessed, rather than how they were physically arranged on the direct access
device, they were in reality data file structure models, or data record structure
models. Over time the qualifications to these names became lost and they
became simply known as data models.

Whereas previously data was collected into large somewhat haphazardly
constructed records for processing, these new data management systems allowed
data to be separated into smaller, more focused records which could be tied
together to form a larger record by the data management system. This
capability forced designers to look at data in different ways.

Data management models

The data management systems (also called data base management systems)
introduced several new ways of organizing data. That is, they introduced several
new ways of linking record fragments (or segments) together to form larger
records for processing. Although many different methods were tried, only three
major methods became popular: the hierarchic method, the network method,
and the newest, the relational method.

Each of these methods reflected the manner in which the vendor constructed and
physically managed data within the file. The systems designer and the
programmer had to understand these methods so that they could retrieve and
process the data in the files. These models depicted the way the record fragments
were tied to each other and thus the manner in which the chain of pointers had
to be followed to retrieve the fragments in the correct order.

Each vendor introduced a structural model to depict how the data was organized
and tied together. These models also depicted what options were chosen to be
implemented by the development team, data record dependencies, data record
occurrence frequencies, and the sequence in which data records had to be
accessed - also called the navigation sequence.

The hierarchic model

The hierarchic model (figure 7-1) is used to describe those record structures in
which the various physical records which make up the logical record are tied
together in a sequence which looks like an inverted tree. At the top of the
structure is a single record. Beneath that are one or more records each of which
can occur one or more times. Each of these can in turn have multiple records
beneath them. In diagrammatic form the top-to-bottom set of records looks like an
inverted tree or a pyramid of records. To access the set of records associated with
the identifier one started at the top record and followed the pointers from record
to record.
The various records in the lower part of the structure are accessed by first
accessing the records above them and then following the chain of pointers to the
records at the next lower level. The records at any given level are referred to as
the parent records and the records at the next lower level that are connected to
it, or dependent on it are referred to as its children or the child records. There can
be any number of records at any level, and each record can have any number of
children. Each occurrence of the structure normally represents the collection of
data about a single subject. This parent-child repetition can be repeated through
several levels.
The data model for this type of structural representation usually depicts each
segment or record fragment only once and uses lines to show the connection
between a parent record and its children. This depiction of record types and lines
connecting them looks like an inverted tree or an organizational hierarchy chart.

Each file is said to consist of a number of repetitions of this tree structure.
Although the data model depicts all possible records types within a structure, in
any given occurrence, record types may or may not be present. Each occurrence
of the structure represents a specific subject occurrence and is identified by a
unique identifier in the single, topmost record type (the root record).

Designers employing this type of data management system would have to
develop a unique record hierarchy for each data storage subject. A given
application may have several different hierarchies, each representing data about
a different subject, associated with it and a company may have several dozen
different hierarchies of record types as components of its data model. A
characteristic of this type of model is that each hierarchy is normally treated as
separate and distinct from the other hierarchies, and various hierarchies can be
mixed and matched to suit the data needs of the particular application.

The network model

The network data model (figure 7-2) has no implicit hierarchic relationship
between the various records, and in many cases no implicit structure at all, with
the records seemingly placed at random. The network model does not make a
clear distinction between subjects, mingling all record types in an overall
schematic. The network model may have many different records containing
unique identifiers, each of which acts as an entry point into the record structure.
Record types are grouped into sets of two, one or both of which can in turn be
part of another set of two record types. Within a given set, one record type is said
to be the owner record and one is said to be the member record. Access to a set
is always accomplished by first locating the specific owner record and then
following the chain of pointers to the member records of the set. The network
can be traversed or navigated by moving from set to set. Various different data
structures can be constructed by selecting sets of records and excluding others.
Each record type is depicted only once in this type of data model and the
relationship between record types is indicated by a line between them. The line
joining the two records contains the name of the set. Within a set a record can
have only one owner, but multiple owner-member sets can be constructed using
the same two record types.

The network model has no explicit hierarchy and no explicit entry point. Whereas
the hierarchic model has several different hierarchic structures, the network
model employs a single master network or model, which when completed looks
like a web of records. As new data is required, records are added to the network
and joined to existing sets.

The relational model

The relational model (figure 7-3), unlike the network and hierarchic models, did
not rely on pointers to connect records, and chose to view individual records in
sets regardless of the subject occurrence they were associated with. This is in
contrast to the other models, which sought to depict the relationships between
record types. In the relational model records are portrayed as residing in tables
with no physical pointers between these tables. Each table is thus portrayed
independently from each other table. This made the data model itself a model of
simplicity, but it in turn made the visualization of all the records associated
with a particular subject somewhat difficult.
Data records were connected using logic and by using data that was
redundantly stored in each table. Records on a given subject occurrence could be
selected from multiple tables by matching the contents of these redundantly
stored data fields.
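As a hedged illustration (the orders and order_lines tables and their columns
below are hypothetical), a master row and its detail rows live in separate tables
and are matched purely by the redundantly stored order number rather than by
physical pointers:

-- Master table: one row per order.  Detail table: one row per line item,
-- repeating the order number so rows can be matched by value, not by pointer.
SELECT o.order_no, o.customer, d.item, d.quantity
FROM orders o, order_lines d
WHERE o.order_no = d.order_no;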

The impact of data management systems

The use of these products to manage data introduced a new set of tasks for the
data analysis personnel. In addition to developing record layouts, they also had
the new task of determining how these records should be structured, or arranged
and joined by pointer structures.

Once those decisions were made they had to be conveyed to the members of the
implementation team. The hierarchic and network models were necessary
because without them the occurrence sequences and the record to record
relationships designed into the files could not be adequately portrayed. Although
the relational "model" design choices also needed to be conveyed to the
implementation team, the relational model was always depicted in much the
same format as standard record layouts, and any other access or navigation
related information could be conveyed in narrative form.

Difference between logical data Independence & Physical Data Independence

Data independence is the type of data transparency that matters for a centralized
DBMS. It refers to the immunity of user applications to changes in the
definition and organization of data.

Physical data independence deals with hiding the details of the storage structure
from user applications. The application should not be involved with these issues,
since there is no difference in the operation carried out against the data.

Data independence and operation independence together give the feature
of data abstraction. There are two levels of data independence.



Logical Data Independence:
Logical data independence is the ability to modify the conceptual schema without
altering external schemas or application programs. Alterations to the
conceptual schema may include the addition or deletion of new entities, attributes,
or relationships, and should be possible without altering existing
external schemas or having to rewrite application programs.


Physical Data Independence:
Physical data independence is the ability to modify the internal schema without
altering the conceptual schema or application programs. Alterations
to the internal schema might include:
* Using new storage devices.
* Using different data structures.
* Switching from one access method to another.
* Using different file organizations or storage structures.
* Modifying indexes.
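A small sketch of both kinds of independence follows; the EMPLOYEE table, its
columns, and the index name are hypothetical, and the syntax is generic SQL.

-- An application query that should keep working unchanged:
--   SELECT name, salary FROM EMPLOYEE WHERE dept = 'Sales';

-- Physical data independence: adding an index changes the storage
-- structures and access method, but not the query or its results.
CREATE INDEX idx_employee_dept ON EMPLOYEE (dept);

-- Logical data independence: adding a new attribute extends the
-- conceptual schema without breaking existing external schemas or queries.
ALTER TABLE EMPLOYEE ADD hire_date DATE;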




Q2. What is a B+Trees? Describe the structure of both internal and leaf nodes of
a B+Tree?

A2.
B+-TREE

The B-tree is the classic disk-based data structure for indexing records based on
an ordered key set. The B+-tree (sometimes written B+tree or just B-tree)
is a variant of the original B-tree in which all records are stored in the leaves
and all leaves are linked sequentially. The B+-tree is used as a (dynamic) indexing
method in relational database management systems.

The B+-tree treats all the keys in nodes except the leaves as dummies. All keys are
duplicated in the leaves. This has the advantage that, as all the leaves are linked
together sequentially, the entire tree may be scanned without visiting the higher
nodes at all.
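Most relational DBMSs implement ordinary indexes as B+-trees, so this behaviour
can be observed through plain SQL; the sketch below uses a hypothetical orders
table and index name.

-- Most RDBMSs back this index with a B+-tree over the key column.
CREATE INDEX idx_orders_date ON orders (order_date);

-- A point lookup descends from the root to a single leaf:
SELECT * FROM orders WHERE order_date = DATE '2012-01-15';

-- A range scan finds the first qualifying leaf entry, then follows the
-- sibling pointers along the linked leaves without revisiting upper nodes:
SELECT *
FROM orders
WHERE order_date BETWEEN DATE '2012-01-01' AND DATE '2012-01-31';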



B+-Tree Structure
• A B+-Tree consists of one or more blocks of data, called nodes, linked together
by pointers. The B+-Tree is a tree structure. The tree has a single node at the top,
called the root node. The root node points to two or more blocks, called child
nodes. Each child node points to further child nodes, and so on.

• The B+-Tree consists of two types of nodes:

(1) internal nodes

(2) leaf nodes

• Internal nodes point to other nodes in the tree. Leaf nodes point to data in the
database using data pointers. Leaf nodes also contain an additional pointer,
called the sibling pointer, which is used to improve the efficiency of certain types
of search.

• All the nodes in a B+-Tree must be at least half full except the root node, which
may contain a minimum of two entries. The algorithms that allow data to be
inserted into and deleted from a B+-Tree guarantee that each node in the tree
will be at least half full.

• Searching for a value in the B+-Tree always starts at the root node and moves
downwards until it reaches a leaf node.

• Both internal and leaf nodes contain key values that are used to guide the
search for entries in the index.
• The B+-Tree is called a balanced tree because every path from the root node to
a leaf node is the same length. A balanced tree means that all searches for
individual values require the same number of nodes to be read from the disc.

Internal Nodes

• An internal node in a B+-Tree consists of a set of key values and pointers. The
set of keys and pointers is ordered so that a pointer is followed by a key value.
The last key value is followed by one pointer.

• Each pointer points to a node containing values that are less than or equal to
the value of the key immediately to its right.

• The last pointer in an internal node is called the infinity pointer. The infinity
pointer points to a node containing key values that are greater than the last key
value in the node.

• When an internal node is searched for a key value, the search begins at the
leftmost key value and moves rightwards along the keys.

• If the key value is less than the sought key, then the pointer to the left of
the key is known to point to a node containing keys less than the sought key.

• If the key value is greater than or equal to the sought key, then the pointer
to the left of the key is known to point to a node containing keys between the
previous key value and the current key value.



Leaf Nodes



• A leaf node in a B+-Tree consists of a set of key values and data pointers.
Each key value has one data pointer. The key values and data pointers are
ordered by the key values.

• The data pointer points to a record or block in the database that contains
the record identified by the key value. For instance, in the example above,
the pointer attached to key value 7 points to the record identified by the
value 7.

• Searching a leaf node for a key value begins at the leftmost value and
moves rightwards until a matching key is found.

• The leaf node also has a pointer to its immediate sibling node in the tree.
The sibling node is the node immediately to the right of the current node.
Because of the order of keys in the B+-Tree, the sibling pointer always
points to a node that has key values greater than the key values in the
current node.

Order of a B+-Tree

• The order of a B+-Tree is the number of keys and pointers that an internal
node can contain. An order size of m means that an internal node can
contain m-1 keys and m pointers.

• The order size is important because it determines how large a B+-Tree will
become.

• For example, if the order size is small, then fewer keys and pointers can be
placed in one node and so more nodes will be required to store the index.
If the order size is large, then more keys and pointers can be placed in a
node and so fewer nodes are required to store the index.
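As a rough worked example (the block and field sizes here are assumptions for
illustration only): with 4 KB disk blocks and 8-byte keys and pointers, an
internal node could hold about m = 4096 / 16 = 256 pointers, so a B+-Tree only
three levels deep could index roughly 256 x 256 x 256, or about 16.8 million,
entries. This is why the order size largely determines the height, and hence the
search cost, of the tree.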



Searching a B+-Tree



      Searching a B+-Tree for a key value always starts at the root node and
descends down the tree. A search for a single key value in a B+-Tree consisting of
unique values will always follow one path from the root node to a leaf node.




Searching for Key Value 6



· Read block B3 from disc.           ~ read the root node

· Is B3 a leaf node? No              ~ it is not a leaf node, so the search continues

· Is 6 <= 5? No                      ~ step through each value in B3

· Read block B2.                     ~ when all else fails, follow the infinity pointer

· Is B2 a leaf node? No              ~ B2 is not a leaf node, continue the search

· Is 6 <= 7? Yes                     ~ 6 is less than or equal to 7, follow the pointer

· Read block L2.                     ~ read node L2, which is pointed to by 7 in B2

· Is L2 a leaf node? Yes             ~ L2 is a leaf node

· Search L2 for the key value 6.     ~ if 6 is in the index it must be in L2



Searching for Key Value 5

· Read block B3 from disc.           ~ read the root node

· Is B3 a leaf node? No              ~ it is not a leaf node, so the search continues

· Is 5 <= 5? Yes                     ~ step through each value in B3

· Read block B1.                     ~ read node B1, which is pointed to by 5 in B3

· Is B1 a leaf node? No              ~ B1 is not a leaf node, continue the search

· Is 5 <= 3? No                      ~ step through each value in B1

· Read block L3.                     ~ when all else fails, follow the infinity pointer

· Is L3 a leaf node? Yes             ~ L3 is a leaf node

· Search L3 for the key value 5.     ~ if 5 is in the index it must be in L3
Inserting in a B+-Tree

             A B+-Tree consists of two types of node: (i) leaf nodes, which contain
pointers to data records, and (ii) internal nodes, which contain pointers to other
internal nodes or leaf nodes. In this example, we assume that the order size is 3
and that there are a maximum of two keys in each leaf node.

Insert sequence: 5, 8, 1, 7, 3, 12, 9, 6

Empty Tree

    The B+-Tree starts as a single leaf node. A leaf node consists of one or more
data pointers and a pointer to its right sibling. This leaf node is empty.




Inserting Key Value 5

To insert a key search for the location where the key would be expected to occur.
In our example the B+-Tree consists of a single leaf node, L1, which is empty.
Hence, the key value 5 must be placed in leaf node L1.
Inserting Key Value 8

      Again, search for the location where key value 8 is expected to be found.
This is in leaf node L1.

There is room in L1 so insert the new key.
Inserting Key Value 1

  Searching for where the key value 1 should appear also results in L1, but L1 is
now full: it contains the maximum of two records.




  L1 must be split into two nodes. The first node will contain the first half of the
keys and the second node will contain the second half of the keys.
However, we now require a new root node to point to each of these nodes.
We create a new root node and promote the rightmost key from node L1.




Each node is half full.

Insert Key Value 7

  Search for the location where key 7 is expected to be located, that is, L2. Insert
key 7 into L2.
Insert Key Value 3

    Searching for the location where key 3 is expected to be found results in
reading L1. But L1 is full and must be split.




The rightmost key in L1, i.e. 3, must now be promoted up the tree.
L1 was pointed to by key 5 in B1. Therefore, all the key values in B1 to the right of
and including key 5 are moved to the right one place.

Insert Key Value 12

  Search for the location where key 12 is expected to be found, L2. Try to insert
12 into L2. Because L2 is full, it must be split.
As before, we must promote the rightmost value of L2, but B1 is full and so it
must also be split.




  Now the tree requires a new root node, so we promote the rightmost value of
B1 into a new node.
The tree is still balanced, that is, all paths from the root node, B3, to a leaf
node are of equal length.



Insert Key Value 9

    Search for the location where key value 9 would be expected to be found, L4.
Insert key 9 into L4.
Insert Key Value 6

  Key value 6 should be inserted into L2 but it is full. Therefore, split it and
promote the appropriate key value.




Leaf block L2 has split and the middle key, 7, has been promoted into B2.
Deleting from a B+-Tree

   Deleting entries from a B+-Tree may require some redistribution of the key
values to guarantee a well-balanced tree.

Deletion sequence: 9, 8, 12.

Delete Key Value 9

   First, search for the location of key value 9, L4. Delete 9 from L4. L4 is not less
than half full and the tree is correct.




Delete Key Value 8

   Search for key value 8, L5. Deleting 8 from L5 causes L5 to underflow, that is, it
becomes less than half full.
We could remove L5, but instead we will attempt to redistribute some of
the values from L2. This is possible because L2 is full and half its contents can be
placed in L5. As some entries have been removed from L2, its parent B2 must be
adjusted to reflect the change.




Alternatively, we could have removed L5 from the index entirely and then adjusted
the parent node B2.
Deleting Key Value 12

    Deleting key value 12 from L4 causes L4 to underflow. However, because L5 is
only half full itself, we cannot redistribute keys between the nodes. L4 must be
deleted from the index and B2 adjusted to reflect the change.




    The tree is still balanced and all nodes are at least half full. However, to
guarantee this property it is sometimes necessary to perform a more extensive
redistribution of the data.



Search Algorithm



      s = Key value to be found
      n = Root node
      o = Order of B+-Tree

      WHILE n is not a leaf node
          i = 1
          found = FALSE
          WHILE i <= (o-1) AND NOT found
              IF s <= nk[i] THEN
                  n = np[i]
                  found = TRUE
              ELSE
                  i = i + 1
              END
          END
          IF NOT found THEN
              n = np[i]        (* follow the infinity pointer *)
          END
      END




Insert Algorithm




      s = Key value to be inserted
      Search the tree for the node n that should contain key s, keeping the
      path from the root (bottom) to the parent of node n (top) in stack p.

      IF found THEN
          STOP
      ELSE
          IF n is not full THEN
              Insert s into n
          ELSE
              Insert s into n            (* assume n can hold s temporarily *)
              j = number of keys in n / 2
              Split n to give n and n1
              Put first j keys from n in n
              Put remaining keys from n in n1
              (k,q) = (nk[j], "pointer to n1")   (* q avoids a clash with the path stack p *)
              REPEAT
                  IF p is empty THEN
                      Create internal node n2    (* n2 becomes the new root *)
                      Put (k,q) in n2
                      finished = TRUE
                  ELSE
                      n = POP p
                      IF n is not full THEN
                          Put (k,q) in n
                          finished = TRUE
                      ELSE
                          j = number of keys in n / 2
                          Split n into n and n1
                          Put first j keys and pointers in n into n
                          Put remaining keys and pointers in n into n1
                          (k,q) = (nk[j], "pointer to n1")
                      END
                  END
              UNTIL finished
          END
      END


Q3. Describe Projection operation, Set theoretic operation & join operation?

A3. The projection operation consists of selecting the columns of the table(s)
that one wishes to appear in the result. To display all the columns, "*" should
be used. The columns are listed after the SELECT clause.

- Display the name and the sex code of the students:

SELECT Nometu, Cdsexe
FROM ETUDIANT;

- Display the contents of the table ETUDIANT:

SELECT *
FROM ETUDIANT;



Conventional set-theoretic operations are union, intersection, exception
(difference), and Cartesian product.
Cartesian product
The Cartesian product discussed previously is realized as a comma-separated list
of table expressions (tables, views, subqueries) in the FROM clause. In addition,
an explicit join operation may be used:

SELECT Laptop.model, Product.model
FROM Laptop CROSS JOIN Product;



Recall that the Cartesian product combines each row in the first table with each
row in the second table. The number of rows in the result set is equal to the
number of rows in the first table multiplied by the number of rows in the
second table. In the example under consideration, the Laptop table has 5 rows
while the Product table has 16 rows. As a result, we get 5*16 = 80 rows, so the
result set of that query is not reproduced here; you may check this assertion by
executing the above query on the academic database.

On its own, the Cartesian product is hardly ever used in practice. As a rule, it
appears as an intermediate result to which a restriction (horizontal selection)
operation is applied by means of the WHERE clause in the SELECT statement.

Union

The UNION keyword is used for combining queries:

<query 1>
UNION [ALL]
<query 2>

The UNION operator combines the results of two SELECT statements into a single
result set. If the ALL parameter is given, all the duplicates of the rows returned
are retained; otherwise the result set includes only unique rows. Note that any
number of queries may be combined. Moreover, the union order can be changed
with parentheses.

The following conditions should be observed:
   - The number of columns in each query must be the same.
   - The corresponding result set columns of each query must be compatible
     in data type with each other.
   - The result set uses the column names of the first query.
   - The ORDER BY clause is applied to the union result, so it may only be
     written at the end of the combined query.

Example. Find the model numbers and prices of the PCs and laptops:

SELECT model, price
FROM PC
UNION
SELECT model, price
FROM Laptop
ORDER BY price DESC;


                                model  price

                                1750   1200.0
                                1752   1150.0
                                1298   1050.0
                                1233   980.0
                                1321   970.0
                                1233   950.0
                                1121   850.0
                                1298   700.0
                                1232   600.0
                                1233   600.0
                                1232   400.0
                                1232   350.0
                                1260   350.0

Example. Find the product type, the model number, and the price of the PCs and
laptops:

SELECT Product.type, PC.model, price
FROM PC INNER JOIN
  Product ON PC.model = Product.model
UNION
SELECT Product.type, Laptop.model, price
FROM Laptop INNER JOIN
  Product ON Laptop.model = Product.model
ORDER BY price DESC;


                           type    model  price

                           Laptop  1750   1200.0
                           Laptop  1752   1150.0
                           Laptop  1298   1050.0
                           PC      1233   980.0
                           Laptop  1321   970.0
                           PC      1233   950.0
                           PC      1121   850.0
                           Laptop  1298   700.0
                           PC      1232   600.0
                           PC      1233   600.0
                           PC      1232   400.0
                           PC      1232   350.0
                           PC      1260   350.0


Intersect and Exception

The SQL standard offers SELECT statement clauses for performing the
intersection and exception (difference) of queries. These are the INTERSECT and
EXCEPT clauses, which work like the UNION clause. The result set will include only
those rows that are present in each query (INTERSECT), or only those rows from
the first query that are not present in the second query (EXCEPT).

Many DBMSs do not support these clauses in the SELECT statement; this has been
true, for example, of versions of MS SQL Server. There are other means available
for performing intersection and exception operations, and it should be noted that
the same result may often be reached by formulating the SELECT statement
differently. In the case of intersection and exception one can use the EXISTS
predicate.
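For instance, where these clauses are supported, the intersection query below
finds the makers that produce both laptops and printers directly with INTERSECT;
it is a sketch against the same Product table used in the examples that follow,
which show the EXISTS-based equivalent.

SELECT maker FROM Product WHERE type = 'Laptop'
INTERSECT
SELECT maker FROM Product WHERE type = 'Printer';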

The EXISTS predicate

EXISTS::=
    [NOT] EXISTS (<table subquery>)

The EXISTS predicate evaluates to TRUE provided the subquery contains any
rows; otherwise it evaluates to FALSE. NOT EXISTS works analogously, being
satisfied if no rows are returned by the subquery. This predicate does not
evaluate to UNKNOWN.

As in our case, the EXISTS predicate is generally used with dependent (correlated)
subqueries. That subquery type has an outer reference to a value in the main
query. The subquery result may depend on this value and must be evaluated
separately for each row of the query that includes the subquery. Because of this,
the EXISTS predicate may have different values for different rows of the main query.
Intersection example. Find those laptop makers who also produce printers:

SELECT DISTINCT maker
FROM Product AS Lap_product
WHERE type = 'Laptop' AND EXISTS
   (SELECT maker
    FROM Product
    WHERE type = 'Printer' AND maker = Lap_product.maker);



The printer makers are retrieved by the subquery and compared with the maker
returned from the main query. The main query returns the laptop makers. So, for
each laptop maker it is checked whether the subquery returns any rows (i.e.
whether this maker also produces printers). Because the two conditions in the
WHERE clause must be satisfied simultaneously (AND), the result set includes only
the wanted rows. The DISTINCT keyword is used to make sure each maker appears
in the returned data only once. As a result, we get:

                                      maker

                                      A

Exception example. Find those laptop makers who do not produce printers:

SELECT DISTINCT maker
FROM Product AS Lap_product
WHERE type = 'Laptop' AND NOT EXISTS
   (SELECT maker
    FROM Product
    WHERE type = 'Printer' AND maker = Lap_product.maker);


Here, it is sufficient to replace EXISTS in the previous example with NOT EXISTS.
The returned data then includes only those main query rows for which the
subquery returns no rows. As a result we get:
maker

                                        B

                                        C




Q4. Discuss Multi Table Queries?

Inner joins (also known as equijoins when the join condition is an equality) are
used to combine information from two or more tables. The join condition
determines which records are paired together and is specified in the WHERE
clause. For example, let's create a list of driver/vehicle match-ups where both
the vehicle and driver are located in the same city. The following SQL query will
accomplish this task:

SELECT lastname, firstname, tag
FROM drivers, vehicles
WHERE drivers.location = vehicles.location;

And let's take a look at the results:

lastname  firstname  tag
--------  ---------  ------
Baker     Roland     H122JM
Smythe    Michael    D824HA
Smythe    Michael    P091YF
Jacobs    Abraham    J291QR
Jacobs    Abraham    L990MT

Notice that the results are exactly what we sought. It is possible to further refine
the query by specifying additional criteria in the WHERE clause. Our vehicle
managers took a look at the results of our last query and noticed that the
previous query matches drivers to vehicles that they are not authorized to drive
(e.g. truck drivers to cars and vice-versa). We can use the following query to
resolve this problem:







SELECT lastname, firstname, tag, vehicles.class
FROM drivers, vehicles
WHERE drivers.location = vehicles.location
AND drivers.class = vehicles.class;

Notice that in this example we needed to specify the source table for the class
attribute in the SELECT clause. This is due to the fact that class is ambiguous – it
appears in both tables and we need to specify which table’s column should be
included in the query results. In this case it does not make a difference as the
columns are identical and they are joined using an equijoin. However, if the
columns contained different data this distinction would be critical. Here are the
results of this query:
lastname  firstname  tag     class
--------  ---------  ------  -----
Baker     Roland     H122JM  Car
Smythe    Michael    D824HA  Truck
Jacobs    Abraham    J291QR  Car

Notice that the rows pairing Michael Smythe to a car and Abraham Jacobs to a
truck have been removed.

You can also use inner joins to combine data from three or more tables.

Outer joins allow database users to include additional information in the query
results. We'll explore them in the next section of this article.




Take a moment and review the database tables located on the first page of this
article. Notice that we have a driver -- Patrick Ryan -- who is located in a city
where there are no vehicles. Our vehicle managers would like this information to be
included in their query results to ensure that drivers do not sit idly by waiting for a
vehicle to arrive. We can use outer joins to include records from one table that
have no corresponding record in the joined table. Let's create a list of
driver/vehicle pairings that includes records for drivers with no vehicles in their
city. We can use the following query:

SELECT lastname, firstname, drivers.city, tag
FROM drivers, vehicles
WHERE drivers.location = vehicles.location (+);

Notice that the outer join operator "(+)" is included in this query. This operator is
placed in the join condition next to the table that is allowed to have NULL
values. This query would produce the following results:
lastname  firstname  city       tag
--------  ---------  ---------  ------
Baker     Roland     New York   H122JM
Smythe    Michael    Miami      D824HA
Smythe    Michael    Miami      P091YF
Jacobs    Abraham    Seattle    J291QR
Jacobs    Abraham    Seattle    L990MT
Ryan      Patrick    Annapolis

This time our results include the stranded Patrick Ryan and our vehicle
management department can now dispatch a vehicle to pick him up.
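As an aside, on DBMSs that use ANSI join syntax rather than Oracle's "(+)"
notation, the same outer join can be written as a LEFT OUTER JOIN; the sketch
below is intended to be equivalent in effect to the query above:

SELECT lastname, firstname, drivers.city, tag
FROM drivers LEFT OUTER JOIN vehicles
     ON drivers.location = vehicles.location;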

Note that there are other possible ways to accomplish the results seen in this
article and syntax may vary slightly from DBMS to DBMS. These examples were
designed to work with Oracle databases, so your mileage may vary. Furthermore,
as you advance in your knowledge of SQL you’ll discover that there is often more
than one way to accomplish a desired result and oftentimes one way is just as
good as another. Case in point, it is also possible to specify a join condition in the
FROM clause rather than the WHERE clause. For example, we used the following
SELECT statement earlier in this article:

SELECT lastname, firstname, tag
FROM drivers, vehicles
WHERE drivers.location = vehicles.location
AND drivers.class = vehicles.class;

The same query could be rewritten as:

SELECT lastname, firstname, tag
FROM drivers INNER JOIN vehicles ON drivers.location = vehicles.location
WHERE drivers.class = vehicles.class;




Q5. Discuss the Transaction Processing Concept? Describe the properties of
Transactions?




In computer science, transaction processing is information processing that is
divided into individual, indivisible operations, called transactions. Each
transaction must succeed or fail as a complete unit; it cannot remain in an
intermediate state.



Transaction processing is designed to maintain the integrity of a system
(typically a database or some modern filesystems) in a known, consistent state,
by ensuring that interdependent operations carried out on the system are either
all completed successfully or all cancelled successfully.

For example, consider a typical banking transaction that involves moving $700
from a customer's savings account to the customer's checking account. This
transaction is a single operation in the eyes of the bank, but it involves at least
two separate operations in computer terms: debiting the savings account by
$700 and crediting the checking account by $700.
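A minimal sketch of that transfer as an atomic transaction follows; the accounts
table, its columns, and the account identifiers are hypothetical, and the syntax
is generic SQL.

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 700
WHERE account_id = 'SAV-1001';        -- debit the savings account

UPDATE accounts SET balance = balance + 700
WHERE account_id = 'CHK-1001';        -- credit the checking account

COMMIT;  -- both changes become permanent together; on any failure a
         -- ROLLBACK would leave the database in its prior consistent state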

Más contenido relacionado

La actualidad más candente

Characteristic of dabase approach
Characteristic of dabase approachCharacteristic of dabase approach
Characteristic of dabase approachLuina Pani
 
Database Management System
Database Management SystemDatabase Management System
Database Management SystemTamur Iqbal
 
Database system environment ppt.
Database system environment ppt.Database system environment ppt.
Database system environment ppt.yhen06
 
Database management system
Database management systemDatabase management system
Database management systemRizwanHafeez
 
The advantages of a dbms
The advantages of a dbmsThe advantages of a dbms
The advantages of a dbmsadnan_bappy
 
Lect 21 components_of_database_management_system
Lect 21 components_of_database_management_systemLect 21 components_of_database_management_system
Lect 21 components_of_database_management_systemnadine016
 
Database design process
Database design processDatabase design process
Database design processTayyab Hameed
 
Lecture 09 dblc centralized vs decentralized design
Lecture 09   dblc centralized vs decentralized designLecture 09   dblc centralized vs decentralized design
Lecture 09 dblc centralized vs decentralized designemailharmeet
 
Lesson - 02 Network Design and Management
Lesson - 02 Network Design and ManagementLesson - 02 Network Design and Management
Lesson - 02 Network Design and ManagementAngel G Diaz
 
Functions of database management systems
Functions of database management systemsFunctions of database management systems
Functions of database management systemsUZAIR UDDIN SHAIKH
 

La actualidad más candente (19)

Database Management System 1
Database Management System 1Database Management System 1
Database Management System 1
 
Characteristic of dabase approach
Characteristic of dabase approachCharacteristic of dabase approach
Characteristic of dabase approach
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 
Chapter1
Chapter1Chapter1
Chapter1
 
Database system environment ppt.
Database system environment ppt.Database system environment ppt.
Database system environment ppt.
 
Database management system
Database management systemDatabase management system
Database management system
 
Database
DatabaseDatabase
Database
 
The advantages of a dbms
The advantages of a dbmsThe advantages of a dbms
The advantages of a dbms
 
Data Base Management System
Data Base Management SystemData Base Management System
Data Base Management System
 
Lect 21 components_of_database_management_system
Lect 21 components_of_database_management_systemLect 21 components_of_database_management_system
Lect 21 components_of_database_management_system
 
Database design process
Database design processDatabase design process
Database design process
 
Database Development Process
Database Development ProcessDatabase Development Process
Database Development Process
 
Lecture 09 dblc centralized vs decentralized design
Lecture 09   dblc centralized vs decentralized designLecture 09   dblc centralized vs decentralized design
Lecture 09 dblc centralized vs decentralized design
 
Lesson - 02 Network Design and Management
Lesson - 02 Network Design and ManagementLesson - 02 Network Design and Management
Lesson - 02 Network Design and Management
 
DATABASE MANAGEMENT
DATABASE MANAGEMENTDATABASE MANAGEMENT
DATABASE MANAGEMENT
 
Chapter one
Chapter oneChapter one
Chapter one
 
Functions of database management systems
Functions of database management systemsFunctions of database management systems
Functions of database management systems
 
Database Management System ppt
Database Management System pptDatabase Management System ppt
Database Management System ppt
 
Lecture 05 dblc
Lecture 05 dblcLecture 05 dblc
Lecture 05 dblc
 

Destacado

Infocus' Policy on Fee Disclosure Statements
Infocus' Policy on Fee Disclosure StatementsInfocus' Policy on Fee Disclosure Statements
Infocus' Policy on Fee Disclosure StatementsInfocusWealth
 
Recurrence Bringing Education to the 21st Century
Recurrence Bringing Education to the 21st CenturyRecurrence Bringing Education to the 21st Century
Recurrence Bringing Education to the 21st CenturyJuan Arango
 
Best interest duty & safe harbour
Best interest duty & safe harbourBest interest duty & safe harbour
Best interest duty & safe harbourInfocusWealth
 
The Future of Financial Advice
The Future of Financial AdviceThe Future of Financial Advice
The Future of Financial AdviceInfocusWealth
 
Business During the Roaring 20s
Business During the Roaring 20sBusiness During the Roaring 20s
Business During the Roaring 20smawb101
 
Cold War Project
Cold War ProjectCold War Project
Cold War Projectmawb101
 
Monnelle's slideshow
Monnelle's slideshowMonnelle's slideshow
Monnelle's slideshowellennom
 

Destacado (14)

Infocus' Policy on Fee Disclosure Statements
Infocus' Policy on Fee Disclosure StatementsInfocus' Policy on Fee Disclosure Statements
Infocus' Policy on Fee Disclosure Statements
 
Recurrence Bringing Education to the 21st Century
Recurrence Bringing Education to the 21st CenturyRecurrence Bringing Education to the 21st Century
Recurrence Bringing Education to the 21st Century
 
Ppangi
PpangiPpangi
Ppangi
 
Diet
DietDiet
Diet
 
REGLAMENTO
REGLAMENTOREGLAMENTO
REGLAMENTO
 
Best interest duty & safe harbour
Best interest duty & safe harbourBest interest duty & safe harbour
Best interest duty & safe harbour
 
Future with will
Future with willFuture with will
Future with will
 
The Future of Financial Advice
The Future of Financial AdviceThe Future of Financial Advice
The Future of Financial Advice
 
Business During the Roaring 20s
Business During the Roaring 20sBusiness During the Roaring 20s
Business During the Roaring 20s
 
Presentazione pmi
Presentazione pmiPresentazione pmi
Presentazione pmi
 
سورة الكهف
سورة الكهفسورة الكهف
سورة الكهف
 
Cold War Project
Cold War ProjectCold War Project
Cold War Project
 
Monnelle's slideshow
Monnelle's slideshowMonnelle's slideshow
Monnelle's slideshow
 
Slaid v+kv
Slaid v+kvSlaid v+kv
Slaid v+kv
 

Similar a Ans mi0034-database management system-sda-2012-ii

1. Chapter One.pdf
1. Chapter One.pdf1. Chapter One.pdf
1. Chapter One.pdffikadumola
 
Data base management system
Data base management systemData base management system
Data base management systemSuneel Dogra
 
Fundamentals of DBMS
Fundamentals of DBMSFundamentals of DBMS
Fundamentals of DBMSAhmed478619
 
Unit 2 rdbms study_material
Unit 2  rdbms study_materialUnit 2  rdbms study_material
Unit 2 rdbms study_materialgayaramesh
 
DBMS-1.pptx
DBMS-1.pptxDBMS-1.pptx
DBMS-1.pptxkingVox
 
Chap1-Introduction to database systems.ppt
Chap1-Introduction to database systems.pptChap1-Introduction to database systems.ppt
Chap1-Introduction to database systems.pptLisaMalar
 
Database Design and Implementation
Database Design and ImplementationDatabase Design and Implementation
Database Design and ImplementationChristian Reina
 
A critique on traditional file system vs databases
A critique on traditional file system vs databasesA critique on traditional file system vs databases
A critique on traditional file system vs databasesShallote Dsouza
 
DBMS-INTRODUCTION.pptx
DBMS-INTRODUCTION.pptxDBMS-INTRODUCTION.pptx
DBMS-INTRODUCTION.pptxDivyaKS12
 
A database management system
A database management systemA database management system
A database management systemghulam120
 
Introduction to files and db systems 1.0
Introduction to files and db systems 1.0Introduction to files and db systems 1.0
Introduction to files and db systems 1.0Dr. C.V. Suresh Babu
 

Similar a Ans mi0034-database management system-sda-2012-ii (20)

Assign 1
Assign 1Assign 1
Assign 1
 
database ppt(2)
database ppt(2)database ppt(2)
database ppt(2)
 
1. Chapter One.pdf
1. Chapter One.pdf1. Chapter One.pdf
1. Chapter One.pdf
 
Data base management system
Data base management systemData base management system
Data base management system
 
Fundamentals of DBMS
Fundamentals of DBMSFundamentals of DBMS
Fundamentals of DBMS
 
Database & dbms
Database & dbmsDatabase & dbms
Database & dbms
 
DataMgt - UNIT-I .PPT
DataMgt - UNIT-I .PPTDataMgt - UNIT-I .PPT
DataMgt - UNIT-I .PPT
 
Unit 2 rdbms study_material
Unit 2  rdbms study_materialUnit 2  rdbms study_material
Unit 2 rdbms study_material
 
DBMS-1.pptx
DBMS-1.pptxDBMS-1.pptx
DBMS-1.pptx
 
Chap1-Introduction to database systems.ppt
Chap1-Introduction to database systems.pptChap1-Introduction to database systems.ppt
Chap1-Introduction to database systems.ppt
 
Ch01
Ch01Ch01
Ch01
 
Unit1 dbms
Unit1 dbmsUnit1 dbms
Unit1 dbms
 
Dbms mca-section a
Dbms mca-section aDbms mca-section a
Dbms mca-section a
 
Database Design and Implementation
Database Design and ImplementationDatabase Design and Implementation
Database Design and Implementation
 
A critique on traditional file system vs databases
A critique on traditional file system vs databasesA critique on traditional file system vs databases
A critique on traditional file system vs databases
 
Database System Concepts
Database System ConceptsDatabase System Concepts
Database System Concepts
 
DBMS-INTRODUCTION.pptx
DBMS-INTRODUCTION.pptxDBMS-INTRODUCTION.pptx
DBMS-INTRODUCTION.pptx
 
A database management system
A database management systemA database management system
A database management system
 
Introduction to files and db systems 1.0
Introduction to files and db systems 1.0Introduction to files and db systems 1.0
Introduction to files and db systems 1.0
 
Powerpoint chap.9
Powerpoint chap.9Powerpoint chap.9
Powerpoint chap.9
 

Último

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
organized. When we talk about a database, we mean a relational database, in fact an RDBMS (Relational Database Management System).
In a relational database, all data is stored in tables. Each table has the same structure repeated in every row (like a spreadsheet), and it is the relations between the tables that make the database "relational".
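To make this concrete, here is a minimal sketch of two related tables in SQL; the Department and Employee tables and their columns are illustrative assumptions, not part of the assignment text:

CREATE TABLE Department (
    dept_id INT PRIMARY KEY,        -- unique identifier for each department
    name    VARCHAR(50) NOT NULL
);

CREATE TABLE Employee (
    emp_id  INT PRIMARY KEY,
    name    VARCHAR(50) NOT NULL,
    dept_id INT REFERENCES Department(dept_id)  -- the "relation" between the tables
);

Every row of Employee has the same structure, and the dept_id column relates each employee to exactly one row of Department.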
Advantages:
Applying the database approach in an application system brings a number of advantages, including:

Control of data redundancy
The database approach attempts to eliminate redundancy by integrating the files. Although it does not eliminate redundancy entirely, it controls the amount of redundancy inherent in the database.

Data consistency
By eliminating or controlling redundancy, the database approach reduces the risk of inconsistencies occurring. It ensures that all copies of a data item are kept consistent.

More information from the same amount of data
With the integration of operational data under the database approach, it may be possible to derive additional information from the same data.

Sharing of data
The database belongs to the entire organization and can be shared by all authorized users.

Improved data integrity
Database integrity provides the validity and consistency of stored data. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not permitted to violate.

Improved security
The database approach protects the data from unauthorized users. This may take the form of user names and passwords that identify each user type and its access rights for operations including retrieval, insertion, updating, and deletion.

Enforcement of standards
The integration of the database enforces the necessary standards, including data formats, naming conventions, documentation standards, update procedures, and access rules.

Economy of scale
Cost savings can be obtained by combining all of an organization's operational data into one database, with applications working on one source of data.

Balance of conflicting requirements
By having a structural design in the database, conflicts between users or departments can be resolved. Decisions will be based on the best use of resources for the organization as a whole rather than for an individual entity.

Improved data accessibility and responsiveness
Through the integration of the database approach, data access can cross departmental boundaries. This feature provides more functionality and better services to the users.

Increased productivity
The database approach provides all the low-level file-handling routines. The provision of these functions allows the programmer to concentrate on the specific functionality required by the users. The fourth-generation environment provided by the database can simplify database application development.

Improved maintenance
The database approach provides data independence. Because a change of data structure in the database need not affect the application programs, it simplifies database application maintenance.

Increased concurrency
The database can manage concurrent data access effectively. It ensures that concurrent users do not interfere with one another in ways that would cause loss of information or loss of integrity.

Improved backup and recovery services
Modern database management systems provide facilities to minimize the amount of processing that can be lost following a failure, by using the transaction approach.

Disadvantages
In spite of the large number of advantages found in the database approach, it is not without challenges. The following disadvantages can be identified:

Complexity
A database management system is an extremely complex piece of software. All parties must be familiar with its functionality to take full advantage of it. Therefore, training for the administrators, designers, and users is required.

Size
The database management system consumes a substantial amount of main memory as well as a large amount of disk space in order to run efficiently.

Cost of DBMS
A multi-user database management system may be very expensive. Even after the installation, there is a high recurrent annual maintenance cost on the software.

Cost of conversion
When moving from a file-based system to a database system, the company is required to incur additional expenses on hardware acquisition and training.

Performance
As the database approach caters for many applications rather than exclusively for a particular one, some applications may not run as fast as before.
Higher impact of a failure
The database approach increases the vulnerability of the system due to centralization. As all users and applications rely on database availability, the failure of any component can bring operations to a halt and seriously affect the services to the customer.

Q2. What is the disadvantage of sequential file organization? How do you overcome it? What are the advantages & disadvantages of Dynamic Hashing?

A2.
Disadvantage of Sequential file organization:
A file that contains records or other elements stored in chronological order, based on account number or some other identifying data, is called a sequential file. In order to locate the desired data, a sequential file must be read starting at the beginning of the file. A sequential file may be stored on a sequential access device such as magnetic tape or on a direct access device such as magnetic disk, but the accessing method remains the same.

Slow Access:
The major issue with sequential files is the slow access to information, as each read attempt goes through the records one by one until it arrives at the desired record. That makes all file operations (read, write, and update) very time consuming in comparison to random access files. This can be overcome by organizing the file for direct access, for example through indexing or hashing, so that records can be located without scanning the file from the beginning.
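As a concrete illustration of how the slow sequential access can be overcome, most modern database systems let the designer declare an index so that lookups no longer scan the file from the beginning. The sketch below uses PostgreSQL-style syntax with a hypothetical Account table; it is an illustration, not the assignment's prescribed method:

CREATE TABLE Account (
    account_no INT,
    holder     VARCHAR(50),
    balance    NUMERIC(12,2)
);

-- Without an index, this lookup must scan the records one by one.
-- A hash index lets the system jump directly to the matching bucket.
CREATE INDEX account_hash_idx ON Account USING HASH (account_no);

SELECT holder, balance
FROM Account
WHERE account_no = 531111145;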
Dynamic Hashing:

Advantages
The main advantage of hash tables over other table data structures is speed. This advantage is more apparent when the number of entries is large (thousands or more). Hash tables are particularly efficient when the maximum number of entries can be predicted in advance, so that the bucket array can be allocated once with the optimum size and never resized.

If the set of key-value pairs is fixed and known ahead of time (so insertions and deletions are not allowed), one may reduce the average lookup cost by a careful choice of the hash function, bucket table size, and internal data structures. In particular, one may be able to devise a hash function that is collision-free, or even perfect. In this case the keys need not be stored in the table.

Disadvantages
Hash tables can be more difficult to implement than self-balancing binary search trees. Choosing an effective hash function for a specific application is more an art than a science. In open-addressed hash tables it is fairly easy to create a poor hash function.

Although operations on a hash table take constant time on average, the cost of a good hash function can be significantly higher than the inner loop of the lookup algorithm for a sequential list or search tree. Thus hash tables are not effective when the number of entries is very small. (However, in some cases the high cost of computing the hash function can be mitigated by saving the hash value together with the key.)

For certain string processing applications, such as spell-checking, hash tables may be less efficient than tries, finite automata, or Judy arrays. Also, if each key is represented by a small enough number of bits, then, instead of a hash table, one may use the key directly as the index into an array of values. Note that there are no collisions in this case.

The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order. Therefore, there is no efficient way to locate an entry whose key is nearest to a given key. Listing all n entries in some specific order generally requires a separate sorting step, whose cost is proportional to log(n) per entry. In comparison, ordered search trees have lookup and insertion cost proportional to log(n), but allow finding the nearest key
at about the same cost, and ordered enumeration of all entries at constant cost per entry.

If the keys are not stored (because the hash function is collision-free), there may be no easy way to enumerate the keys that are present in the table at any given moment.

Although the average cost per operation is constant and fairly small, the cost of a single operation may be quite high. In particular, if the hash table uses dynamic resizing, an insertion or deletion operation may occasionally take time proportional to the number of entries. This may be a serious drawback in real-time or interactive applications.

Hash tables in general exhibit poor locality of reference; that is, the data to be accessed is distributed seemingly at random in memory. Because hash tables cause access patterns that jump around, this can trigger microprocessor cache misses that cause long delays. Compact data structures such as arrays, searched with linear search, may be faster if the table is relatively small and keys are integers or other short strings. According to Moore's Law, cache sizes are growing exponentially, and so what is considered "small" may be increasing. The optimal performance point varies from system to system.

Hash tables become quite inefficient when there are many collisions. While extremely uneven hash distributions are unlikely to arise by chance, a malicious adversary with knowledge of the hash function may be able to supply information to a hash which creates worst-case behavior by causing excessive collisions, resulting in very poor performance (i.e., a denial-of-service attack). In critical applications, either universal hashing can be used or a data structure with better worst-case guarantees may be preferable.
Q3. What is relationship type? Explain the difference among a relationship instance, relationship type & a relation set?

A3.
A relationship type R among n entity types E1, E2, …, En is a set of associations among entities from these types. Formally, R is a set of relationship instances ri, where each ri is an n-tuple of entities (e1, e2, …, en) and each entity ej in ri is a member of entity type Ej, 1 ≤ j ≤ n. Hence, a relationship type is a mathematical relation on E1, E2, …, En; alternatively, it can be defined as a subset of the Cartesian product E1 × E2 × … × En.

In other words: a relationship instance is a single association (one tuple) among particular entities; a relationship type is the definition that names the relationship and specifies which entity types participate in it; and the collection of all relationship instances of a given relationship type in the database at a given moment is called a relationship set.
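To tie these terms to tables: when a many-to-many relationship type is implemented in a relational database, the relationship set becomes a table and each row is one relationship instance. A minimal sketch, assuming hypothetical Employee and Project entity types (the names are illustrative, not from the assignment text):

CREATE TABLE Employee (ssn     INT PRIMARY KEY);
CREATE TABLE Project  (pnumber INT PRIMARY KEY);

-- The relationship type WORKS_ON(Employee, Project) becomes a table.
CREATE TABLE Works_On (
    essn INT REFERENCES Employee(ssn),
    pno  INT REFERENCES Project(pnumber),
    PRIMARY KEY (essn, pno)
);

INSERT INTO Employee VALUES (1001);
INSERT INTO Project  VALUES (20);
INSERT INTO Works_On VALUES (1001, 20);  -- one relationship instance (e1, e2)

-- The set of all rows in Works_On at any moment is the relationship set.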
Q4. What is SQL? Discuss.

A4.
SQL is an abbreviation of Structured Query Language, pronounced either "see-kwell" or as separate letters. It is a standardized query language for requesting information from a database. The original version, called SEQUEL (Structured English Query Language), was designed at an IBM research center in 1974 and 1975. SQL was first introduced as a commercial database system in 1979 by Oracle Corporation.

Historically, SQL has been the favorite query language for database management systems running on minicomputers and mainframes. Increasingly, however, SQL is being supported by PC database systems because it supports distributed databases (databases that are spread out over several computer systems). This enables several users on a local-area network to access the same database simultaneously.

Although there are different dialects of SQL, it is nevertheless the closest thing to a standard query language that currently exists. In 1986, ANSI approved a rudimentary version of SQL as the official standard, but most versions of SQL since then have included many extensions to the ANSI standard. In 1991, ANSI updated the standard; the new standard is known as SAG SQL.

SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks".[5] Despite not adhering entirely to the relational model as described by Codd, it became the most widely used database language.[6][7] Although SQL is often described as, and to a great extent is, a declarative language, it also includes procedural elements.

SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. Since then, the standard has been enhanced several times with added features. However, issues of SQL code portability between major RDBMS products still exist due to lack of full compliance with, or different interpretations of, the standard. Among the reasons mentioned are the large size and incomplete specification of the standard, as well as vendor lock-in.

SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL (Structured English Query Language), was designed to manipulate and retrieve data stored in IBM's original quasi-relational database management system, System R, which a group at IBM San Jose Research Laboratory had developed during the 1970s.[8] The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company.[9]

The first Relational Database Management System (RDBMS) was RDMS, developed at MIT in the early 1970s, soon followed by Ingres, developed in 1974 at U.C. Berkeley. Ingres implemented a query language known as QUEL, which was later supplanted in the marketplace by SQL.[9]

In the late 1970s, Relational Software, Inc. (now Oracle Corporation) saw the potential of the concepts described by Codd, Chamberlin, and Boyce, and developed their own SQL-based RDBMS with aspirations of selling it to the U.S. Navy, Central Intelligence Agency, and other U.S. government agencies. In June 1979, Relational Software, Inc. introduced the first commercially available implementation of SQL, Oracle V2 (Version 2) for VAX computers. Oracle V2 beat IBM's August release of the System/38 RDBMS to market by a few weeks.
After testing SQL at customer test sites to determine the usefulness and practicality of the system, IBM began developing commercial products based on their System R prototype, including System/38, SQL/DS, and DB2, which were commercially available in 1979, 1981, and 1983, respectively.[10]

(The original presentation includes a chart showing several of the SQL language elements that compose a single statement.)

The SQL language is subdivided into several language elements, including:

- Clauses, which are constituent components of statements and queries. (In some cases, these are optional.)[11]
- Expressions, which can produce either scalar values or tables consisting of columns and rows of data.
- Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) or Boolean (true/false/unknown) truth values, and which are used to limit the effects of statements and queries or to change program flow.
- Queries, which retrieve the data based on specific criteria. This is the most important element of SQL.
- Statements, which may have a persistent effect on schemata and data, or which may control transactions, program flow, connections, sessions, or diagnostics. SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a standard part of the SQL grammar.

Insignificant whitespace is generally ignored in SQL statements and queries, making it easier to format SQL code for readability.
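As an illustrative decomposition (the Book table here anticipates the examples used later in this answer), a single statement can be annotated with the language elements just listed:

SELECT title                  -- a query, one kind of statement
FROM Book                     -- FROM clause
WHERE price > 100.00          -- WHERE clause; "price > 100.00" is a predicate
ORDER BY title;               -- ORDER BY clause; ";" terminates the statement

The predicate "price > 100.00" is itself built from two expressions, the column reference price and the literal 100.00.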
Queries
The most common operation in SQL is the query, which is performed with the declarative SELECT statement. SELECT retrieves data from one or more tables, or expressions. Standard SELECT statements have no persistent effects on the database. Some non-standard implementations of SELECT can have persistent effects, such as the SELECT INTO syntax that exists in some databases.[12]

Queries allow the user to describe desired data, leaving the database management system (DBMS) responsible for planning, optimizing, and performing the physical operations necessary to produce that result as it chooses. A query includes a list of columns to be included in the final result immediately following the SELECT keyword. An asterisk ("*") can also be used to specify that the query should return all columns of the queried tables. SELECT is the most complex statement in SQL, with optional keywords and clauses that include:

- The FROM clause, which indicates the table(s) from which data is to be retrieved. The FROM clause can include optional JOIN subclauses to specify the rules for joining tables.
- The WHERE clause, which includes a comparison predicate that restricts the rows returned by the query. The WHERE clause eliminates all rows from the result set for which the comparison predicate does not evaluate to True.
- The GROUP BY clause, which is used to project rows having common values into a smaller set of rows. GROUP BY is often used in conjunction with SQL aggregation functions or to eliminate duplicate rows from a result set. The WHERE clause is applied before the GROUP BY clause.
- The HAVING clause, which includes a predicate used to filter rows resulting from the GROUP BY clause. Because it acts on the results of the GROUP BY clause, aggregation functions can be used in the HAVING clause predicate.
- The ORDER BY clause, which identifies the columns used to sort the resulting data, and in which direction they should be sorted (ascending or descending). Without an ORDER BY clause, the order of rows returned by an SQL query is undefined.

The following is an example of a SELECT query that returns a list of expensive books. The query retrieves all rows from the Book table in which the price column contains a value greater than 100.00. The result is sorted in ascending order by title. The asterisk (*) in the select list indicates that all columns of the Book table should be included in the result set.
SELECT *
FROM Book
WHERE price > 100.00
ORDER BY title;

The example below demonstrates a query of multiple tables, grouping, and aggregation, by returning a list of books and the number of authors associated with each book.

SELECT Book.title, COUNT(*) AS Authors
FROM Book JOIN Book_author
    ON Book.isbn = Book_author.isbn
GROUP BY Book.title;

Example output might resemble the following:

Title                    Authors
----------------------   -------
SQL Examples and Guide         4
The Joy of SQL                 1
An Introduction to SQL         2
Pitfalls of SQL                1

Under the precondition that isbn is the only common column name of the two tables and that a column named title only exists in the Book table, the above query could be rewritten in the following form:

SELECT title, COUNT(*) AS Authors
FROM Book NATURAL JOIN Book_author
GROUP BY title;

However, many vendors either do not support this approach, or require certain column naming conventions in order for natural joins to work effectively.
SQL includes operators and functions for calculating values on stored values. SQL allows the use of expressions in the select list to project data, as in the following example, which returns a list of books that cost more than 100.00 with an additional sales_tax column containing a sales tax figure calculated at 6% of the price.

SELECT isbn, title, price, price * 0.06 AS sales_tax
FROM Book
WHERE price > 100.00
ORDER BY title;

Subqueries
Queries can be nested so that the results of one query can be used in another query via a relational operator or aggregation function. A nested query is also known as a subquery. While joins and other table operations provide computationally superior (i.e. faster) alternatives in many cases, the use of subqueries introduces a hierarchy in execution which can be useful or necessary. In the following example, a subquery computes the aggregation function AVG, and its result is compared against each price:

SELECT isbn, title, price
FROM Book
WHERE price < (SELECT AVG(price) FROM Book)
ORDER BY title;

Q5. What is Normalization? Discuss various types of Normal Forms?

A5.
Normalization is the process of decomposing tables to eliminate data redundancy.

1 N.F: The table should contain only scalar (atomic) values.
2 N.F: The table should be in 1 N.F, with no partial functional dependencies.
3 N.F: The table should be in 2 N.F, with no transitive dependencies.
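A small sketch of the 1 N.F rule, with hypothetical table and column names (not from the assignment text):

-- Violates 1 N.F: the phones column holds a repeating group, not atomic values.
CREATE TABLE Customer_bad (
    cust_id INT PRIMARY KEY,
    phones  VARCHAR(200)        -- e.g. '555-0101, 555-0102'
);

-- 1 N.F: one atomic phone value per row.
CREATE TABLE Customer_phone (
    cust_id INT,
    phone   VARCHAR(20),
    PRIMARY KEY (cust_id, phone)
);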
The normal forms defined in relational database theory represent guidelines for record design. The guidelines corresponding to first through fifth normal forms are presented here in terms that do not require an understanding of relational theory. The design guidelines are meaningful even if one is not using a relational database system. We present the guidelines without referring to the concepts of the relational model in order to emphasize their generality, and also to make them easier to understand. Our presentation conveys an intuitive sense of the intended constraints on record design, although in its informality it may be imprecise in some technical details. A comprehensive treatment of the subject is provided by Date [4].

The normalization rules are designed to prevent update anomalies and data inconsistencies. With respect to performance tradeoffs, these guidelines are biased toward the assumption that all non-key fields will be updated frequently. They tend to penalize retrieval, since data which may have been retrievable from one record in an unnormalized design may have to be retrieved from several records in the normalized form. There is no obligation to fully normalize all records when actual performance requirements are taken into account.

2 FIRST NORMAL FORM

First normal form [1] deals with the "shape" of a record type. Under first normal form, all occurrences of a record type must contain the same number of fields. First normal form excludes variable repeating fields and groups. This is not so much a design guideline as a matter of definition. Relational database theory doesn't deal with records having a variable number of fields.

3 SECOND AND THIRD NORMAL FORMS

Second and third normal forms [2, 3, 7] deal with the relationship between non-key and key fields. Under second and third normal forms, a non-key field must provide a fact about the key, the whole key, and nothing but the key. In addition, the record must satisfy first normal form.
We deal now only with "single-valued" facts. The fact could be a one-to-many relationship, such as the department of an employee, or a one-to-one relationship, such as the spouse of an employee. Thus the phrase "Y is a fact about X" signifies a one-to-one or one-to-many relationship between Y and X. In the general case, Y might consist of one or more fields, and so might X. In the following example, QUANTITY is a fact about the combination of PART and WAREHOUSE.

3.1 Second Normal Form

Second normal form is violated when a non-key field is a fact about a subset of a key. It is only relevant when the key is composite, i.e., consists of several fields. Consider the following inventory record:

---------------------------------------------------
| PART | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS |
====================-------------------------------

The key here consists of the PART and WAREHOUSE fields together, but WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone. The basic problems with this design are:

The warehouse address is repeated in every record that refers to a part stored in that warehouse.
If the address of the warehouse changes, every record referring to a part stored in that warehouse must be updated.
Because of the redundancy, the data might become inconsistent, with different records showing different addresses for the same warehouse.
If at some point in time there are no parts stored in the warehouse, there may be no record in which to keep the warehouse's address.

To satisfy second normal form, the record shown above should be decomposed into (replaced by) the two records:

-------------------------------   ---------------------------------
| PART | WAREHOUSE | QUANTITY |   | WAREHOUSE | WAREHOUSE-ADDRESS |
====================-----------   =============--------------------
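The same decomposition can be sketched in SQL DDL; the column types are assumptions, and the table names simply follow the example above:

-- Violates 2 N.F: warehouse_address depends on warehouse alone,
-- a subset of the composite key (part, warehouse).
CREATE TABLE Inventory_bad (
    part              VARCHAR(20),
    warehouse         VARCHAR(20),
    quantity          INT,
    warehouse_address VARCHAR(100),
    PRIMARY KEY (part, warehouse)
);

-- 2 N.F decomposition:
CREATE TABLE Inventory (
    part      VARCHAR(20),
    warehouse VARCHAR(20),
    quantity  INT,
    PRIMARY KEY (part, warehouse)
);

CREATE TABLE Warehouse (
    warehouse         VARCHAR(20) PRIMARY KEY,
    warehouse_address VARCHAR(100)
);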
When a data design is changed in this way, replacing unnormalized records with normalized records, the process is referred to as normalization. The term "normalization" is sometimes used relative to a particular normal form. Thus a set of records may be normalized with respect to second normal form but not with respect to third.

The normalized design enhances the integrity of the data by minimizing redundancy and inconsistency, but at some possible performance cost for certain retrieval applications. Consider an application that wants the addresses of all warehouses stocking a certain part. In the unnormalized form, the application searches one record type. With the normalized design, the application has to search two record types, and connect the appropriate pairs.

3.2 Third Normal Form

Third normal form is violated when a non-key field is a fact about another non-key field, as in

------------------------------------
| EMPLOYEE | DEPARTMENT | LOCATION |
============------------------------

The EMPLOYEE field is the key. If each department is located in one place, then the LOCATION field is a fact about the DEPARTMENT -- in addition to being a fact about the EMPLOYEE. The problems with this design are the same as those caused by violations of second normal form:

The department's location is repeated in the record of every employee assigned to that department.
If the location of the department changes, every such record must be updated.
Because of the redundancy, the data might become inconsistent, with different records showing different locations for the same department.
If a department has no employees, there may be no record in which to keep the department's location.

To satisfy third normal form, the record shown above should be decomposed into the two records:
-------------------------   -------------------------
| EMPLOYEE | DEPARTMENT |   | DEPARTMENT | LOCATION |
============-------------   ==============-----------

To summarize, a record is in second and third normal forms if every field is either part of the key or provides a (single-valued) fact about exactly the whole key and nothing else.

3.3 Functional Dependencies

In relational database theory, second and third normal forms are defined in terms of functional dependencies, which correspond approximately to our single-valued facts. A field Y is "functionally dependent" on a field (or fields) X if it is invalid to have two records with the same X-value but different Y-values. That is, a given X-value must always occur with the same Y-value. When X is a key, then all fields are by definition functionally dependent on X in a trivial way, since there can't be two records having the same X value.

There is a slight technical difference between functional dependencies and single-valued facts as we have presented them. Functional dependencies only exist when the things involved have unique and singular identifiers (representations). For example, suppose a person's address is a single-valued fact, i.e., a person has only one address. If we don't provide unique identifiers for people, then there will not be a functional dependency in the data:

----------------------------------------------
| PERSON     | ADDRESS                       |
-------------+--------------------------------
| John Smith | 123 Main St., New York        |
| John Smith | 321 Center St., San Francisco |
----------------------------------------------

Although each person has a unique address, a given name can appear with several different addresses. Hence we do not have a functional dependency corresponding to our single-valued fact. Similarly, the address has to be spelled identically in each occurrence in order to have a functional dependency. In the following case the same person appears to be living at two different addresses, again precluding a functional dependency.
---------------------------------------
| PERSON     | ADDRESS                |
-------------+-------------------------
| John Smith | 123 Main St., New York |
| John Smith | 123 Main Street, NYC   |
---------------------------------------

We are not defending the use of non-unique or non-singular representations. Such practices often lead to data maintenance problems of their own. We do wish to point out, however, that functional dependencies and the various normal forms are really only defined for situations in which there are unique and singular identifiers. Thus the design guidelines as we present them are a bit stronger than those implied by the formal definitions of the normal forms.

For instance, we as designers know that in the following example there is a single-valued fact about a non-key field, and hence the design is susceptible to all the update anomalies mentioned earlier.

----------------------------------------------------------
| EMPLOYEE  | FATHER     | FATHER'S-ADDRESS              |
|============------------+-------------------------------|
| Art Smith | John Smith | 123 Main St., New York        |
| Bob Smith | John Smith | 123 Main Street, NYC          |
| Cal Smith | John Smith | 321 Center St., San Francisco |
----------------------------------------------------------

However, in formal terms, there is no functional dependency here between FATHER'S-ADDRESS and FATHER, and hence no violation of third normal form.

4 FOURTH AND FIFTH NORMAL FORMS

Fourth [5] and fifth [6] normal forms deal with multi-valued facts. The multi-valued fact may correspond to a many-to-many relationship, as with employees and skills, or to a many-to-one relationship, as with the children of an employee (assuming only one parent is an employee). By "many-to-many" we mean that an employee may have several skills, and a skill may belong to several employees.

Note that we look at the many-to-one relationship between children and fathers as a single-valued fact about a child but a multi-valued fact about a father.
In a sense, fourth and fifth normal forms are also about composite keys. These normal forms attempt to minimize the number of fields involved in a composite key, as suggested by the examples to follow.

4.1 Fourth Normal Form

Under fourth normal form, a record type should not contain two or more independent multi-valued facts about an entity. In addition, the record must satisfy third normal form. The term "independent" will be discussed after considering an example.

Consider employees, skills, and languages, where an employee may have several skills and several languages. We have here two many-to-many relationships, one between employees and skills, and one between employees and languages. Under fourth normal form, these two relationships should not be represented in a single record such as

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
===============================

Instead, they should be represented in the two records

--------------------   -----------------------
| EMPLOYEE | SKILL |   | EMPLOYEE | LANGUAGE |
====================   =======================

Note that other fields, not involving multi-valued facts, are permitted to occur in the record, as in the case of the QUANTITY field in the earlier PART/WAREHOUSE example.

The main problem with violating fourth normal form is that it leads to uncertainties in the maintenance policies. Several policies are possible for maintaining two independent multi-valued facts in one record:

(1) A disjoint format, in which a record contains either a skill or a language, but not both:
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  |          |
| Smith    | type  |          |
| Smith    |       | French   |
| Smith    |       | German   |
| Smith    |       | Greek    |
-------------------------------

This is not much different from maintaining two separate record types. (We note in passing that such a format also leads to ambiguities regarding the meanings of blank fields. A blank SKILL could mean the person has no skill, or the field is not applicable to this employee, or the data is unknown, or, as in this case, the data may be found in another record.)

(2) A random mix, with three variations:

(a) Minimal number of records, with repetitions:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  | French   |
| Smith    | type  | German   |
| Smith    | type  | Greek    |
-------------------------------

(b) Minimal number of records, with null values:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  | French   |
| Smith    | type  | German   |
| Smith    |       | Greek    |
-------------------------------

(c) Unrestricted:
-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  | French   |
| Smith    | type  |          |
| Smith    |       | German   |
| Smith    | type  | Greek    |
-------------------------------

(3) A "cross-product" form, where for each employee, there must be a record for every possible pairing of one of his skills with one of his languages:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  | French   |
| Smith    | cook  | German   |
| Smith    | cook  | Greek    |
| Smith    | type  | French   |
| Smith    | type  | German   |
| Smith    | type  | Greek    |
-------------------------------

Other problems caused by violating fourth normal form are similar in spirit to those mentioned earlier for violations of second or third normal form. They take different variations depending on the chosen maintenance policy:

If there are repetitions, then updates have to be done in multiple records, and they could become inconsistent.
Insertion of a new skill may involve looking for a record with a blank skill, or inserting a new record with a possibly blank language, or inserting multiple records pairing the new skill with some or all of the languages.
Deletion of a skill may involve blanking out the skill field in one or more records (perhaps with a check that this doesn't leave two records with the same language and a blank skill), or deleting one or more records, coupled with a check that the last mention of some language hasn't also been deleted.
Fourth normal form minimizes such update problems.

4.1.1 Independence

We mentioned independent multi-valued facts earlier, and we now illustrate what we mean in terms of the example. The two many-to-many relationships, employee:skill and employee:language, are "independent" in that there is no direct connection between skills and languages. There is only an indirect connection because they belong to some common employee. That is, it does not matter which skill is paired with which language in a record; the pairing does not convey any information. That's precisely why all the maintenance policies mentioned earlier can be allowed.

In contrast, suppose that an employee could only exercise certain skills in certain languages. Perhaps Smith can cook French cuisine only, but can type in French, German, and Greek. Then the pairings of skills and languages become meaningful, and there is no longer an ambiguity of maintenance policies. In the present case, only the following form is correct:

-------------------------------
| EMPLOYEE | SKILL | LANGUAGE |
|----------+-------+----------|
| Smith    | cook  | French   |
| Smith    | type  | French   |
| Smith    | type  | German   |
| Smith    | type  | Greek    |
-------------------------------

Thus the employee:skill and employee:language relationships are no longer independent. These records do not violate fourth normal form. When there is an interdependence among the relationships, then it is acceptable to represent them in a single record.

4.1.2 Multivalued Dependencies

For readers interested in pursuing the technical background of fourth normal form a bit further, we mention that fourth normal form is defined in terms of multivalued dependencies, which correspond to our independent multi-valued facts.
Multivalued dependencies, in turn, are defined essentially as relationships which accept the "cross-product" maintenance policy mentioned above. That is, for our example, every one of an employee's skills must appear paired with every one of his languages. It may or may not be obvious to the reader that this is equivalent to our notion of independence: since every possible pairing must be present, there is no "information" in the pairings. Such pairings convey information only if some of them can be absent, that is, only if it is possible that some employee cannot perform some skill in some language. If all pairings are always present, then the relationships are really independent.

We should also point out that multivalued dependencies and fourth normal form apply as well to relationships involving more than two fields. For example, suppose we extend the earlier example to include projects, in the following sense: An employee uses certain skills on certain projects. An employee uses certain languages on certain projects. If there is no direct connection between the skills and languages that an employee uses on a project, then we could treat this as two independent many-to-many relationships of the form EP:S and EP:L, where "EP" represents a combination of an employee with a project. A record including employee, project, skill, and language would violate fourth normal form. Two records, containing fields E,P,S and E,P,L, respectively, would satisfy fourth normal form.

4.2 Fifth Normal Form

Fifth normal form deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy. Second, third, and fourth normal forms also serve this purpose, but fifth normal form generalizes to cases not covered by the others.

We will not attempt a comprehensive exposition of fifth normal form, but illustrate the central concept with a commonly used example, namely one involving agents, companies, and products. If agents represent companies, companies make products, and agents sell products, then we might want to keep a record of which agent sells which product for which company. This information could be kept in one record type with three fields:
-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | GM      | truck   |
-----------------------------

This form is necessary in the general case. For example, although agent Smith sells cars made by Ford and trucks made by GM, he does not sell Ford trucks or GM cars. Thus we need the combination of three fields to know which combinations are valid and which are not.

But suppose that a certain rule was in effect: if an agent sells a certain product, and he represents a company making that product, then he sells that product for that company.

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | Ford    | truck   |
| Smith | GM      | car     |
| Smith | GM      | truck   |
| Jones | Ford    | car     |
-----------------------------

In this case, it turns out that we can reconstruct all the true facts from a normalized form consisting of three separate record types, each containing two fields:

-------------------   ---------------------   -------------------
| AGENT | COMPANY |   | COMPANY | PRODUCT |   | AGENT | PRODUCT |
|-------+---------|   |---------+---------|   |-------+---------|
| Smith | Ford    |   | Ford    | car     |   | Smith | car     |
| Smith | GM      |   | Ford    | truck   |   | Smith | truck   |
| Jones | Ford    |   | GM      | car     |   | Jones | car     |
-------------------   | GM      | truck   |   -------------------
                      ---------------------
These three record types are in fifth normal form, whereas the corresponding three-field record shown previously is not. Roughly speaking, we may say that a record type is in fifth normal form when its information content cannot be reconstructed from several smaller record types, i.e., from record types each having fewer fields than the original record. The case where all the smaller records have the same key is excluded. If a record type can only be decomposed into smaller records which all have the same key, then the record type is considered to be in fifth normal form without decomposition. A record type in fifth normal form is also in fourth, third, second, and first normal forms.

Fifth normal form does not differ from fourth normal form unless there exists a symmetric constraint such as the rule about agents, companies, and products. In the absence of such a constraint, a record type in fourth normal form is always in fifth normal form.

One advantage of fifth normal form is that certain redundancies can be eliminated. In the normalized form, the fact that Smith sells cars is recorded only once; in the unnormalized form it may be repeated many times.

It should be observed that although the normalized form involves more record types, there may be fewer total record occurrences. This is not apparent when there are only a few facts to record, as in the example shown above. The advantage is realized as more facts are recorded, since the size of the normalized files increases in an additive fashion, while the size of the unnormalized file increases in a multiplicative fashion. For example, if we add a new agent who sells x products for y companies, where each of these companies makes each of these products, we have to add x+y new records to the normalized form, but xy new records to the unnormalized form.

It should be noted that all three record types are required in the normalized form in order to reconstruct the same information. From the first two record types shown above we learn that Jones represents Ford and that Ford makes trucks. But we can't determine whether Jones sells Ford trucks until we look at the third record type to determine whether Jones sells trucks at all.

The following example illustrates a case in which the rule about agents, companies, and products is satisfied, and which clearly requires all three record types in the normalized form.
Any two of the record types taken alone will imply something untrue.

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     |
| Smith | Ford    | truck   |
| Smith | GM      | car     |
| Smith | GM      | truck   |
| Jones | Ford    | car     |
| Jones | Ford    | truck   |
| Brown | Ford    | car     |
| Brown | GM      | car     |
| Brown | Toyota  | car     |
| Brown | Toyota  | bus     |
-----------------------------

         -------------------   ---------------------   -------------------
         | AGENT | COMPANY |   | COMPANY | PRODUCT |   | AGENT | PRODUCT |
         |-------+---------|   |---------+---------|   |-------+---------|
Fifth    | Smith | Ford    |   | Ford    | car     |   | Smith | car     |
Normal   | Smith | GM      |   | Ford    | truck   |   | Smith | truck   |
Form     | Jones | Ford    |   | GM      | car     |   | Jones | car     |
         | Brown | Ford    |   | GM      | truck   |   | Jones | truck   |
         | Brown | GM      |   | Toyota  | car     |   | Brown | car     |
         | Brown | Toyota  |   | Toyota  | bus     |   | Brown | bus     |
         -------------------   ---------------------   -------------------

Observe that:

Jones sells cars and GM makes cars, but Jones does not represent GM.
Brown represents Ford and Ford makes trucks, but Brown does not sell trucks.
Brown represents Ford and Brown sells buses, but Ford does not make buses.

Fourth and fifth normal forms both deal with combinations of multivalued facts. One difference is that the facts dealt with under fifth normal form are not
independent, in the sense discussed earlier. Another difference is that, although fourth normal form can deal with more than two multivalued facts, it only recognizes them in pairwise groups. We can best explain this in terms of the normalization process implied by fourth normal form. If a record violates fourth normal form, the associated normalization process decomposes it into two records, each containing fewer fields than the original record. Any of these violating fourth normal form is again decomposed into two records, and so on until the resulting records are all in fourth normal form. At each stage, the set of records after decomposition contains exactly the same information as the set of records before decomposition.

In the present example, no pairwise decomposition is possible. There is no combination of two smaller records which contains the same total information as the original record. All three of the smaller records are needed. Hence an information-preserving pairwise decomposition is not possible, and the original record is not in violation of fourth normal form. Fifth normal form is needed in order to deal with the redundancies in this case.

5 UNAVOIDABLE REDUNDANCIES

Normalization certainly doesn't remove all redundancies. Certain redundancies seem to be unavoidable, particularly when several multivalued facts are dependent rather than independent. In the example shown in Section 4.1.1, it seems unavoidable that we record the fact that "Smith can type" several times. Also, when the rule about agents, companies, and products is not in effect, it seems unavoidable that we record the fact that "Smith sells cars" several times.

6 INTER-RECORD REDUNDANCY

The normal forms discussed here deal only with redundancies occurring within a single record type. Fifth normal form is considered to be the "ultimate" normal form with respect to such redundancies. Other redundancies can occur across multiple record types. For the example concerning employees, departments, and locations, the following records are in third normal form in spite of the obvious redundancy:
-------------------------   -------------------------
| EMPLOYEE | DEPARTMENT |   | DEPARTMENT | LOCATION |
============-------------   ==============-----------

-----------------------
| EMPLOYEE | LOCATION |
============-----------

In fact, two copies of the same record type would constitute the ultimate in this kind of undetected redundancy. Inter-record redundancy has been recognized for some time [1], and has recently been addressed in terms of normal forms and normalization [8].

7 CONCLUSION

While we have tried to present the normal forms in a simple and understandable way, we are by no means suggesting that the data design process is correspondingly simple. The design process involves many complexities which are quite beyond the scope of this paper. In the first place, an initial set of data elements and records has to be developed, as candidates for normalization. Then the factors affecting normalization have to be assessed:

Single-valued vs. multi-valued facts.
Dependency on the entire key.
Independent vs. dependent facts.
The presence of mutual constraints.
The presence of non-unique or non-singular representations.

Q6. What do you mean by Shared Lock & Exclusive lock? Describe briefly two phase locking protocol?

A6.
A database is a huge collection of data stored in the form of tables. This data is very important for the companies that use those databases, as any loss or misuse of it can put both the company and its customers into trouble. To avoid this situation and protect customers, database vendors provide many security features with their database products; one of them is a locking system to maintain the integrity of the database. There are two types of locks available in a database system:
1) Shared Lock: is provided to the readers of the data. These locks enable all users to read the same data concurrently; they are not allowed to change or write the data, or to obtain an exclusive lock on the object. It can be set on a table or a table row. The lock is released at the end of the transaction.

2) Exclusive Lock: is provided to the writers of the data. When this lock is set on an object or transaction, only the writer who set the lock can change the data, and other users cannot access the locked object. The lock is released at the end of the change in the transaction. It can be set on tables or rows.

Exclusive locks
Exclusive locks protect updates to file resources, both recoverable and non-recoverable. They can be owned by only one transaction at a time. Any transaction that requires an exclusive lock must wait if another task currently owns an exclusive lock or a shared lock against the requested resource.

Shared locks
Shared locks support read integrity. They ensure that a record is not in the process of being updated during a read-only request. Shared locks can also be used to prevent updates of a record between the time that a record is read and the next syncpoint.

A shared lock on a resource can be owned by several tasks at the same time. However, although several tasks can own shared locks, there are some circumstances in which tasks can be forced to wait for a lock:

A request for a shared lock must wait if another task currently owns an exclusive lock on the resource.
A request for an exclusive lock must wait if other tasks currently own shared locks on this resource.
A new request for a shared lock must wait if another task is waiting for an exclusive lock on a resource that already has a shared lock.
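As a hedged illustration of the two lock types, many systems let a transaction request row locks explicitly; the sketch below uses PostgreSQL-style FOR SHARE / FOR UPDATE syntax and a hypothetical Account table:

-- Transaction A: a shared (read) lock; other readers may lock the same row.
BEGIN;
SELECT balance FROM Account WHERE account_no = 42 FOR SHARE;
-- ... read-only work ...
COMMIT;    -- shared lock released at end of transaction

-- Transaction B: an exclusive lock; blocks writers and FOR SHARE readers.
BEGIN;
SELECT balance FROM Account WHERE account_no = 42 FOR UPDATE;
UPDATE Account SET balance = balance - 100 WHERE account_no = 42;
COMMIT;    -- exclusive lock released at end of transaction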
In databases and transaction processing, two-phase locking (2PL) is a concurrency control method that guarantees serializability.[1][2] It is also the name of the resulting set of database transaction schedules (histories). The protocol utilizes locks, applied by a transaction to data, which may block (interpreted as signals to stop) other transactions from accessing the same data during the transaction's life.

By the 2PL protocol, locks are applied and removed in two phases:

1. Expanding phase: locks are acquired and no locks are released.
2. Shrinking phase: locks are released and no locks are acquired.

Two types of locks are utilized by the basic protocol: shared and exclusive locks. Refinements of the basic protocol may utilize more lock types. Because it uses locks that block processes, 2PL may be subject to deadlocks that result from the mutual blocking of two or more transactions.

2PL is a superset of strong strict two-phase locking (SS2PL),[3] also called rigorousness,[4] which has been widely utilized for concurrency control in general-purpose database systems since the 1970s. SS2PL implementations have many variants. SS2PL was formerly called strict 2PL,[1] but this name usage is not recommended now. Now strict 2PL (S2PL) is the intersection of strictness and 2PL, which is different from SS2PL. SS2PL is also a special case of commitment ordering,[3] and inherits many of CO's useful properties. SS2PL actually comprises only one phase: phase 2 does not exist, and all locks are released only after transaction end. Thus this useful 2PL type is not two-phased at all.

Neither 2PL nor S2PL in their general forms are known to be used in practice. Thus 2PL by itself does not seem to have much practical importance, and whenever 2PL or S2PL utilization has been mentioned in the literature, the intention has been SS2PL. What has made SS2PL so popular (probably the most utilized serializability mechanism) is the effective and efficient locking-based combination of two ingredients (the first does not exist in both general 2PL and S2PL; the second does not exist in general 2PL):

1. Commitment ordering, which provides both serializability, and effective distributed serializability and global serializability, and
• 33. 2) Strictness, which provides cascadelessness (ACA, cascade-less recoverability) and (independently) allows efficient database recovery from failure. Additionally, SS2PL is easier to implement, with less overhead, than both 2PL and S2PL, and provides exactly the same locking, though it sometimes releases locks later. In practice, however, such later lock release occurs only slightly later, and this apparent disadvantage is insignificant and disappears next to the advantages of SS2PL.
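The two phases can be tied to code with another minimal sketch, reusing the hypothetical LockManager above. The only rule enforced here is the 2PL rule itself: once the transaction has released any lock, further acquisitions are refused. Under SS2PL, unlock would never be called directly; commit releases everything only after the transaction ends.

# Illustrative 2PL wrapper around the LockManager sketch above.
class TwoPhaseTxn:
    def __init__(self, lock_manager, txn_id):
        self.lm, self.txn_id = lock_manager, txn_id
        self.shrinking = False        # False = expanding phase
        self.held = set()

    def lock(self, resource, mode):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquiring after a release")
        if self.lm.acquire(self.txn_id, resource, mode):
            self.held.add(resource)   # in a real system we would block, not skip

    def unlock(self, resource):
        self.shrinking = True         # first release starts the shrinking phase
        self.lm.release(self.txn_id, resource)
        self.held.discard(resource)

    def commit(self):
        # SS2PL behaviour: hold everything to the end, then release it all.
        for r in list(self.held):
            self.unlock(r)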
• 34. Master of Business Administration - MBA Semester III
MI0034 – Database Management System - 4 Credits
Assignment - Set- 2 (60 Marks)
Answer all the Questions
Q1. Define Data Model & discuss the categories of Data Models? What is the difference between logical data Independence & Physical Data Independence?
A data model is a picture or description which depicts how data is to be arranged to serve a specific purpose. The data model depicts what data items are required, and how that data must look. However, it would be misleading to discuss data models as if there were only one kind of data model, and equally misleading to discuss them as if they were used for only one purpose. It would also be misleading to assume that data models are only used in the construction of data files.
Some data models are schematics which depict the manner in which data records are connected or related within a file structure. These are called record or structural data models. Some data models are used to identify the subjects of corporate data processing - these are called entity-relationship data models. Still another type of data model is used for analytic purposes, to help the analyst solidify the semantics associated with critical corporate or business concepts.
The record data model
The record version of the data model is used to assist the implementation team by providing a series of schematics of the file that will contain the data that must be built to support the business processing procedures. When the design team has chosen a file management system, or when corporate policy dictates that a specific data management system be used, these models may be the only models produced within the context of a design project. If no such choice has been made, they may be produced after first developing a more general, non-DBMS-specific entity-relationship data model.
Early data models
• 35. Although the term data modeling has become popular only in recent years, in fact modeling of data has been going on for quite a long time. It is difficult for any of us to pinpoint exactly when the first data model was constructed because each of us has a different idea of what a data model is. If we go back to the definition we set forth earlier, then we can say that perhaps the earliest form of data modeling was practiced by the first persons who created paper forms for collecting large amounts of similar data. We can see current versions of these forms everywhere we look. Every time we fill out an application, buy something, or make a request using anything other than a blank piece of paper or stationery, we are using a form of data model. These forms were designed to collect specific kinds of information, in a specific format. The very definition of the word form confirms this.
A definition: A form is the shape and structure of something as distinguished from its substance. A form is a document with blanks for the insertion of details or information.
Almost all businesses, and in fact almost all organizations, use forms of every sort to gather and store information.
Data Management Systems
Until the introduction of data management systems (and database management systems), data modeling and data layout were synonymous. With one notable exception, data files were collections of identically formatted records. That exception was a concept introduced in card records - the multi-format-card set, or master-detail set. This form of card record layout within a file allowed for repeating sets of data within a larger record concept - the so-called logical record (to distinguish it from the physical record). This form was used most frequently when designing files to contain records of orders, where each order could have certain data which was common to the whole order (the master) and individual, repetitive records for each order line item (the details). This method of file design employed record fragmentation rather than record consolidation.
• 36. To facilitate processing of these multi-format record files, designers used record codes to identify records with different layouts, and redundant data to permit these records to be collected (or tied) together in sequence for processing. Because these files were difficult to process, the layout of these records, and the identification and placement of the control and redundant identifier data fields, had to be carefully planned. The planning and coordination associated with these kinds of files constituted the first instances of data modeling.
The concepts associated with these kinds of files were transferred to magnetic media and expanded by vendors who experimented with the substitution of physical record addresses for the redundant data. This use of physical record addresses, coupled with various techniques for combining records of varying lengths and formats, gave rise to products which allowed for the construction of complex files containing multiple format records tied together in complex patterns to support business processing requirements. These patterns were relatively difficult to visualize, and schematics were devised to portray them. These schematics were also called data models because they modeled how the data was to be viewed. Because the schematics were based on the manner in which the records were physically tied together, and thus logically accessed, rather than how they were physically arranged on the direct access device, they were in reality data file structure models, or data record structure models. Over time the qualifications to these names became lost and they became known simply as data models.
Whereas previously data was collected into large, somewhat haphazardly constructed records for processing, these new data management systems allowed data to be separated into smaller, more focused records which could be tied together to form a larger record by the data management system. This capability forced designers to look at data in different ways.
Data management models
The data management systems (also called database management systems) introduced several new ways of organizing data. That is, they introduced several new ways of linking record fragments (or segments) together to form larger records for processing. Although many different methods were tried, only three
• 37. major methods became popular: the hierarchic method, the network method, and the newest, the relational method. Each of these methods reflected the manner in which the vendor constructed and physically managed data within the file. The systems designer and the programmer had to understand these methods so that they could retrieve and process the data in the files. These models depicted the way the record fragments were tied to each other, and thus the manner in which the chain of pointers had to be followed to retrieve the fragments in the correct order.
Each vendor introduced a structural model to depict how the data was organized and tied together. These models also depicted the options chosen for implementation by the development team, data record dependencies, data record occurrence frequencies, and the sequence in which data records had to be accessed - also called the navigation sequence.
The hierarchic model
The hierarchic model (figure 7-1) is used to describe those record structures in which the various physical records which make up the logical record are tied together in a sequence which looks like an inverted tree. At the top of the structure is a single record. Beneath that are one or more records, each of which can occur one or more times. Each of these can in turn have multiple records beneath them. In diagrammatic form the top-to-bottom set of records looks like an inverted tree or a pyramid of records. To access the set of records associated with a given identifier, one starts at the top record and follows the pointers from record to record.
• 38. The various records in the lower part of the structure are accessed by first accessing the records above them and then following the chain of pointers to the records at the next lower level. The records at any given level are referred to as the parent records, and the records at the next lower level that are connected to, or dependent on, a parent are referred to as its children or the child records. There can be any number of records at any level, and each record can have any number of children. Each occurrence of the structure normally represents the collection of data about a single subject. This parent-child repetition can be repeated through several levels.
• 39. The data model for this type of structural representation usually depicts each segment or record fragment only once and uses lines to show the connection between a parent record and its children. This depiction of record types and the lines connecting them looks like an inverted tree or an organizational hierarchy chart. Each file is said to consist of a number of repetitions of this tree structure. Although the data model depicts all possible record types within a structure, in any given occurrence record types may or may not be present. Each occurrence of the structure represents a specific subject occurrence and is identified by a unique identifier in the single, topmost record type (the root record). Designers employing this type of data management system would have to develop a unique record hierarchy for each data storage subject. A given application may have several different hierarchies, each representing data about a different subject, associated with it, and a company may have several dozen different hierarchies of record types as components of its data model. A characteristic of this type of model is that each hierarchy is normally treated as separate and distinct from the other hierarchies, and the various hierarchies can be mixed and matched to suit the data needs of the particular application.
The network model
The network data model (figure 7-2) has no implicit hierarchic relationship between the various records, and in many cases no implicit structure at all, with the records seemingly placed at random. The network model does not make a clear distinction between subjects, mingling all record types in an overall schematic. The network model may have many different records containing unique identifiers, each of which acts as an entry point into the record structure. Record types are grouped into sets of two, one or both of which can in turn be part of another set of two record types. Within a given set, one record type is said to be the owner record and one is said to be the member record. Access to a set is always accomplished by first locating the specific owner record and then following the chain of pointers to the member records of the set. The network can be traversed or navigated by moving from set to set. Various different data structures can be constructed by selecting sets of records and excluding others.
• 40. Each record type is depicted only once in this type of data model, and the relationship between record types is indicated by a line between them. The line joining the two records contains the name of the set. Within a set a record can have only one owner, but multiple owner-member sets can be constructed using the same two record types. The network model has no explicit hierarchy and no explicit entry point. Whereas the hierarchic model has several different hierarchic structures, the network model employs a single master network or model, which when completed looks
• 41. like a web of records. As new data is required, records are added to the network and joined to existing sets.
The relational model
The relational model (figure 7-3), unlike the network or the hierarchic models, did not rely on pointers to connect records, and chose to view individual records in sets regardless of the subject occurrence they were associated with. This is in contrast to the other models, which sought to depict the relationships between record types. In the relational model records are portrayed as residing in tables, with no physical pointers between these tables. Each table is thus portrayed independently from each other table. This made the data model itself a model of simplicity, but it in turn made the visualization of all the records associated with a particular subject somewhat difficult.
• 42. Data records were connected using logic and by using data that was redundantly stored in each table. Records on a given subject occurrence could be selected from multiple tables by matching the contents of these redundantly stored data fields.
The impact of data management systems
The use of these products to manage data introduced a new set of tasks for the data analysis personnel. In addition to developing record layouts, they also had
• 43. the new task of determining how these records should be structured, or arranged and joined by pointer structures. Once those decisions were made, they had to be conveyed to the members of the implementation team. The hierarchic and network models were necessary because without them the occurrence sequences and the record-to-record relationships designed into the files could not be adequately portrayed. Although the relational "model" design choices also needed to be conveyed to the implementation team, the relational model was always depicted in much the same format as standard record layouts, and any other access or navigation related information could be conveyed in narrative form.
Difference between logical data Independence & Physical Data Independence
Data independence is the type of data transparency that matters for a centralized DBMS. It refers to the immunity of user applications to changes in the definition and organization of data. The application should not be involved with these issues, since there is no difference in the operations carried out against the data. Data independence and operation independence together give the feature of data abstraction. There are two levels of data independence.
Logical Data Independence: Logical data independence is the ability to modify the conceptual schema without altering external schemas or application programs. Alterations in the conceptual schema may include the addition or deletion of fresh entities, attributes or relationships, and should be possible without altering existing external schemas or having to rewrite application programs.
Physical Data Independence: Physical data independence is the ability to modify the internal schema without altering the conceptual schema or application programs. Alterations
• 44. in the internal schema might include:
* Using new storage devices.
* Using different data structures.
* Switching from one access method to another.
* Using different file organizations or storage structures.
* Modifying indexes.
Q2. What is a B+Tree? Describe the structure of both internal and leaf nodes of a B+Tree?
A2. B+-TREE
The B-tree is the classic disk-based data structure for indexing records based on an ordered key set. The B+-tree (sometimes written B+tree or just B-tree) is a variant of the original B-tree in which all records are stored in the leaves and all leaves are linked sequentially. The B+-tree is used as a (dynamic) indexing method in relational database management systems. The B+-tree considers all the keys in nodes except the leaves as dummies; all keys are duplicated in the leaves. This has the advantage that, as all the leaves are linked together sequentially, the entire tree may be scanned without visiting the higher nodes at all.
B+-Tree Structure
• 45. • A B+-Tree consists of one or more blocks of data, called nodes, linked together by pointers. The B+-Tree is a tree structure. The tree has a single node at the top, called the root node. The root node points to two or more blocks, called child nodes. Each child node points to further child nodes and so on.
• The B+-Tree consists of two types of nodes: (1) internal nodes and (2) leaf nodes.
• Internal nodes point to other nodes in the tree. Leaf nodes point to data in the database using data pointers. Leaf nodes also contain an additional pointer, called the sibling pointer, which is used to improve the efficiency of certain types of search.
• All the nodes in a B+-Tree must be at least half full except the root node, which may contain a minimum of two entries. The algorithms that allow data to be inserted into and deleted from a B+-Tree guarantee that each node in the tree will be at least half full.
• Searching for a value in the B+-Tree always starts at the root node and moves downwards until it reaches a leaf node.
• Both internal and leaf nodes contain key values that are used to guide the search for entries in the index.
• 46. The B+-Tree is called a balanced tree because every path from the root node to a leaf node is the same length. A balanced tree means that all searches for individual values require the same number of nodes to be read from the disc.
Internal Nodes
• An internal node in a B+-Tree consists of a set of key values and pointers. The set of keys and pointers are ordered so that a pointer is followed by a key value. The last key value is followed by one pointer.
• Each pointer points to nodes containing values that are less than or equal to the value of the key immediately to its right.
• The last pointer in an internal node is called the infinity pointer. The infinity pointer points to a node containing key values that are greater than the last key value in the node.
• When an internal node is searched for a key value, the search begins at the leftmost key value and moves rightwards along the keys.
• If the key value is less than the sought key then the pointer to the left of the key is known to point to a node containing keys less than the sought key.
• If the key value is greater than or equal to the sought key then the pointer to the left of the key is known to point to a node containing keys between the previous key value and the current key value.
Leaf Nodes
• A leaf node in a B+-Tree consists of a set of key values and data pointers.
• 47. Each key value has one data pointer. The key values and data pointers are ordered by the key values.
• The data pointer points to a record or block in the database that contains the record identified by the key value. For instance, in the example above, the pointer attached to key value 7 points to the record identified by the value 7.
• Searching a leaf node for a key value begins at the leftmost value and moves rightwards until a matching key is found.
• The leaf node also has a pointer to its immediate sibling node in the tree. The sibling node is the node immediately to the right of the current node. Because of the order of keys in the B+-Tree, the sibling pointer always points to a node that has key values greater than the key values in the current node.
Order of a B+-Tree
• The order of a B+-Tree is the number of keys and pointers that an internal node can contain. An order size of m means that an internal node can contain m-1 keys and m pointers.
• The order size is important because it determines how large a B+-Tree will become.
• For example, if the order size is small then fewer keys and pointers can be placed in one node and so more nodes will be required to store the index.
• 48. If the order size is large then more keys and pointers can be placed in a node and so fewer nodes are required to store the index.
Searching a B+-Tree
Searching a B+-Tree for a key value always starts at the root node and descends down the tree. A search for a single key value in a B+-Tree consisting of unique values will always follow one path from the root node to a leaf node.
Searching for Key Value 6
· Read block B3 from disc. ~ read the root node
· Is B3 a leaf node? No ~ it's not a leaf node so the search continues
· Is 6 <= 5? No ~ step through each value in B3
• 49. · Read block B2. ~ when all else fails follow the infinity pointer
· Is B2 a leaf node? No ~ B2 is not a leaf node, continue the search
· Is 6 <= 7? Yes ~ 6 is less than or equal to 7, follow the pointer
· Read block L2. ~ read node L2 which is pointed to by 7 in B2
· Is L2 a leaf node? Yes ~ L2 is a leaf node
· Search L2 for the key value 6. ~ if 6 is in the index it must be in L2
Searching for Key Value 5
· Read block B3 from disc. ~ read the root node
· Is B3 a leaf node? No ~ it's not a leaf node so the search continues
· Is 5 <= 5? Yes ~ step through each value in B3
· Read block B1. ~ read node B1 which is pointed to by 5 in B3
· Is B1 a leaf node? No ~ B1 is not a leaf node, continue the search
· Is 5 <= 3? No ~ step through each value in B1
· Read block L3. ~ when all else fails follow the infinity pointer
· Is L3 a leaf node? Yes ~ L3 is a leaf node
· Search L3 for the key value 5. ~ if 5 is in the index it must be in L3
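The descent just traced can be written compactly. The sketch below is illustrative only (the node layout and names are invented for this answer); it applies the same rule at each internal node: follow the pointer to the left of the first key that is greater than or equal to the sought key, or the infinity pointer if there is no such key.

# Illustrative B+-Tree node and search.
class Node:
    def __init__(self, keys, children=None, data=None):
        self.keys = keys              # ordered key values
        self.children = children      # child nodes (internal node) or None
        self.data = data              # data pointers (leaf node) or None

    def is_leaf(self):
        return self.children is None

def bptree_search(root, s):
    n = root
    while not n.is_leaf():
        for i, k in enumerate(n.keys):
            if s <= k:                # keys <= k live down the pointer to k's left
                n = n.children[i]
                break
        else:
            n = n.children[-1]        # all keys < s: follow the infinity pointer
    # if s is in the index at all, it must be in this leaf
    return n.data[n.keys.index(s)] if s in n.keys else None

leaf1 = Node([1, 3], data=["rec1", "rec3"])
leaf2 = Node([5, 7], data=["rec5", "rec7"])
root = Node([3], children=[leaf1, leaf2])
print(bptree_search(root, 5))         # rec5
print(bptree_search(root, 6))         # None: 6 is not in the index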
• 50. Inserting in a B+-Tree
A B+-Tree consists of two types of node: (i) leaf nodes, which contain pointers to data records, and (ii) internal nodes, which contain pointers to other internal nodes or leaf nodes. In this example, we assume that the order size is 3 and that there is a maximum of two keys in each leaf node.
Insert sequence: 5, 8, 1, 7, 3, 12, 9, 6
Empty Tree
The B+-Tree starts as a single leaf node. A leaf node consists of one or more data pointers and a pointer to its right sibling. This leaf node is empty.
Inserting Key Value 5
To insert a key, search for the location where the key would be expected to occur. In our example the B+-Tree consists of a single leaf node, L1, which is empty. Hence, the key value 5 must be placed in leaf node L1.
• 51. Inserting Key Value 8
Again, search for the location where key value 8 is expected to be found. This is in leaf node L1. There is room in L1, so insert the new key.
• 52. Inserting Key Value 1
Searching for where the key value 1 should appear also results in L1, but L1 is now full: it contains the maximum of two keys. L1 must be split into two nodes. The first node will contain the first half of the keys and the second node will contain the second half of the keys.
• 53. However, we now require a new root node to point to each of these nodes. We create a new root node and promote the rightmost key from node L1. Each node is half full.
Insert Key Value 7
Search for the location where key 7 is expected to be located, that is, L2. Insert key 7 into L2.
• 54. Insert Key Value 3
Searching for the location where key 3 is expected to be found results in reading L1. But L1 is full and must be split. The rightmost key in L1, i.e. 3, must now be promoted up the tree.
• 55. L1 was pointed to by key 5 in B1. Therefore, all the key values in B1 to the right of, and including, key 5 are moved to the right one place.
Insert Key Value 12
Search for the location where key 12 is expected to be found, L2. Try to insert 12 into L2. Because L2 is full, it must be split.
  • 56. As before, we must promote the rightmost value of L2 but B1 is full and so it must be split. Now the tree requires a new root node, so we promote the rightmost value of B1 into a new node.
• 57. The tree is still balanced, that is, all paths from the root node, B3, to a leaf node are of equal length.
Insert Key Value 9
Search for the location where key value 9 would be expected to be found, L4. Insert key 9 into L4.
• 58. Insert Key Value 6
Key value 6 should be inserted into L2, but L2 is full. Therefore, split it and promote the appropriate key value. Leaf block L2 has split and the middle key, 7, has been promoted into B2.
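The leaf-level step used repeatedly above (insert, and on overflow split the leaf and promote a copy of a key) can be sketched as follows, reusing the illustrative Node class from the search sketch and the walkthrough's limit of two keys per leaf. Splitting {6, 7, 8} this way leaves {6, 7} and {8} and promotes 7, exactly as in the last step above; note the promoted key also stays in the left leaf, since a B+-tree duplicates all keys in the leaves.

import bisect

MAX_KEYS = 2                          # matches the walkthrough

def insert_into_leaf(leaf, key, pointer):
    # Insert (key, pointer) in key order.
    i = bisect.bisect_left(leaf.keys, key)
    leaf.keys.insert(i, key)
    leaf.data.insert(i, pointer)
    if len(leaf.keys) <= MAX_KEYS:
        return None                   # no overflow, nothing to promote
    # Overflow: keep the first half, move the rest to a new right sibling,
    # and promote a copy of the left half's largest key to the parent.
    mid = (len(leaf.keys) + 1) // 2
    right = Node(leaf.keys[mid:], data=leaf.data[mid:])
    leaf.keys, leaf.data = leaf.keys[:mid], leaf.data[:mid]
    return leaf.keys[-1], right       # (promoted key, new sibling) for the parent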
• 59. Deleting from a B+-Tree
Deleting entries from a B+-Tree may require some redistribution of the key values to guarantee a well-balanced tree.
Deletion sequence: 9, 8, 12.
Delete Key Value 9
First, search for the location of key value 9, L4. Delete 9 from L4. L4 is not less than half full and the tree is still correct.
Delete Key Value 8
Search for key value 8, L5. Deleting 8 from L5 causes L5 to underflow, that is, it becomes less than half full.
• 60. We could remove L5, but instead we will attempt to redistribute some of the values from L2. This is possible because L2 is full and half its contents can be placed in L5. As some entries have been removed from L2, its parent B2 must be adjusted to reflect the change. We do this by removing the old separator key from the index and then adjusting the keys in the parent node B2.
• 61. Deleting Key Value 12
Deleting key value 12 from L4 causes L4 to underflow. However, because L5 is already half full we cannot redistribute keys between the nodes. L4 must be deleted from the index and B2 adjusted to reflect the change. The tree is still balanced and all nodes are at least half full. However, to guarantee this property it is sometimes necessary to perform a more extensive redistribution of the data.
Search Algorithm
    s = Key value to be found
    n = Root node
    o = Order of B+-Tree
    WHILE n is not a leaf node
        i = 1
        found = FALSE
        WHILE i <= (o-1) AND NOT found
            IF s <= nk[i] THEN
                n = np[i]
                found = TRUE
            ELSE
                i = i + 1
            END
        END
        IF NOT found THEN
            n = np[i]
        END
    END
Insert Algorithm
    s = Key value to be inserted
    Search tree for node n containing key s, with path in stack p from root (bottom) to parent of node n (top).
    IF found THEN
        STOP
    ELSE
        IF n is not full THEN
            Insert s into n
        ELSE
            Insert s in n                    (* assume n can hold s temporarily *)
            j = number of keys in n / 2
            Split n to give n and n1
            Put first j keys from n in n
            Put remaining keys from n in n1
            (k, q) = (nk[j], "pointer to n1")
            finished = FALSE
            REPEAT
                IF p is empty THEN
                    Create internal node n2
                    Put (k, q) in n2
                    finished = TRUE
                ELSE
                    n = POP p
                    IF n is not full THEN
                        Put (k, q) in n
                        finished = TRUE
                    ELSE
                        j = number of keys in n / 2
                        Split n into n and n1
                        Put first j keys and pointers in n into n
                        Put remaining keys and pointers in n into n1
                        (k, q) = (nk[j], "pointer to n1")
                    END
                END
            UNTIL finished
        END
    END
Q3. Describe Projection operation, Set theoretic operation & join operation?
A3. The operation of projection consists in selecting the names of the columns of the table(s) which one wishes to see appearing in the answer. If one wants to display all the columns, "*" should be used. The columns are given after the SELECT clause.
- Display the name and the sex code of the students:
SELECT Nometu, Cdsexe FROM ETUDIANT;
- Display the contents of the table ETUDIANT:
SELECT * FROM ETUDIANT;
Conventional set-theoretic operations are union, intersect, exception, and Cartesian product.
• 65. Cartesian product
The Cartesian product discussed previously is realized as a comma-separated list of table expressions (tables, views, subqueries) in the FROM clause. In addition, an explicit join operation may be used:
SELECT Laptop.model, Product.model FROM Laptop CROSS JOIN Product;
Recall that the Cartesian product combines each row in the first table with each row in the second table. The number of rows in the result set is equal to the number of rows in the first table multiplied by the number of rows in the second table. In the example under consideration, the Laptop table has 5 rows while the Product table has 16 rows. As a result, we get 5*16 = 80 rows; the full result set of that query is therefore not shown here. You may check this assertion by executing the above query on the academic database. On its own, the Cartesian product is hardly used in practice. As a rule, it serves as an intermediate result to which a restriction (horizontal projection) is then applied by the WHERE clause of the SELECT statement.
Union
The UNION keyword is used for combining queries:
<query 1> UNION [ALL] <query 2>
The UNION operator combines the results of two SELECT statements into a single result set. If the ALL parameter is given, all duplicates of the rows returned are retained; otherwise the result set includes only unique rows. Note that any number of queries may be combined. Moreover, the union order can be changed with parentheses. The following conditions should be observed:
• 66. * The number of columns of each query must be the same.
* The result set columns of each query must be compatible in data type with each other (in their respective order).
* The result set uses the column names from the first query.
* The ORDER BY clause is applied to the union result, so it may only be written at the end of the combined query.
Example. Find the model numbers and prices of the PCs and laptops:
SELECT model, price FROM PC
UNION
SELECT model, price FROM Laptop
ORDER BY price DESC;
model price
1750 1200.0
1752 1150.0
1298 1050.0
1233 980.0
1321 970.0
1233 950.0
1121 850.0
1298 700.0
1232 600.0
1233 600.0
• 67. 1232 400.0
1232 350.0
1260 350.0
Example. Find the product type, the model number, and the price of the PCs and laptops:
SELECT Product.type, PC.model, price FROM PC INNER JOIN Product ON PC.model = Product.model
UNION
SELECT Product.type, Laptop.model, price FROM Laptop INNER JOIN Product ON Laptop.model = Product.model
ORDER BY price DESC;
type model price
Laptop 1750 1200.0
Laptop 1752 1150.0
Laptop 1298 1050.0
PC 1233 980.0
Laptop 1321 970.0
PC 1233 950.0
PC 1121 850.0
Laptop 1298 700.0
PC 1232 600.0
• 68. PC 1233 600.0
PC 1232 400.0
PC 1232 350.0
PC 1260 350.0
Intersect and Exception
The SQL standard offers SELECT statement clauses for taking the intersection and exception of queries: the INTERSECT and EXCEPT clauses, which work like the UNION clause. The result set will include only those rows that are present in each query (INTERSECT), or only those rows from the first query that are not present in the second query (EXCEPT). Many DBMSs do not support these clauses in the SELECT statement; this was also true of older versions of MS SQL Server. There are other means of performing intersection and exception operations. It should be noted here that the same result may often be reached by formulating the SELECT statement differently. In the case of intersection and exception, one could use the EXISTS predicate.
The EXISTS predicate
EXISTS ::= [NOT] EXISTS (<table subquery>)
The EXISTS predicate evaluates to TRUE provided the subquery contains any rows, otherwise it evaluates to FALSE. NOT EXISTS works the other way around: it is satisfied if no rows are returned by the subquery. This predicate does not evaluate to UNKNOWN. As in our case, the EXISTS predicate is generally used with dependent (correlated) subqueries. That subquery type has an outer reference to a value in the main query. The subquery result may depend on this value and must be separately evaluated for each row of the query that includes the subquery. Because of this, the EXISTS predicate may have different values for each row of the main query.
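One way to convince yourself that the EXISTS formulation below is equivalent to INTERSECT is to run both on an engine that supports the clause. Here is a self-contained check using Python's sqlite3 module (SQLite supports INTERSECT and EXCEPT); the Product rows are a small invented subset used only for the demonstration:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Product (maker TEXT, model TEXT, type TEXT)")
con.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    ("A", "1298", "Laptop"), ("A", "1401", "Printer"),
    ("B", "1121", "PC"),     ("C", "1321", "Laptop"),
])

intersect = con.execute("""
    SELECT maker FROM Product WHERE type = 'Laptop'
    INTERSECT
    SELECT maker FROM Product WHERE type = 'Printer'
""").fetchall()

exists = con.execute("""
    SELECT DISTINCT maker FROM Product AS Lap
    WHERE type = 'Laptop'
      AND EXISTS (SELECT 1 FROM Product
                  WHERE type = 'Printer' AND maker = Lap.maker)
""").fetchall()

print(intersect == exists)            # True: both yield [('A',)]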
• 69. Intersection example. Find those laptop makers who also produce printers:
SELECT DISTINCT maker FROM Product AS Lap_product
WHERE type = 'Laptop'
AND EXISTS (SELECT maker FROM Product
            WHERE type = 'Printer' AND maker = Lap_product.maker);
The printer makers are retrieved by the subquery and compared with the maker returned by the main query. The main query returns the laptop makers. So, for each laptop maker it is checked whether the subquery returns any rows (i.e. whether this maker also produces printers). Because the two conditions in the WHERE clause must be satisfied simultaneously (AND), the result set includes only the wanted rows. The DISTINCT keyword is used to make sure each maker appears in the returned data only once. As a result, we get:
maker
A
Exception example. Find those laptop makers who do not produce printers:
SELECT DISTINCT maker FROM Product AS Lap_product
WHERE type = 'Laptop'
AND NOT EXISTS (SELECT maker FROM Product
                WHERE type = 'Printer' AND maker = Lap_product.maker);
Here, it is sufficient to replace EXISTS in the previous example with NOT EXISTS. The returned data then includes only those main query rows for which the subquery returns no rows. As a result we get:
• 70. maker
B
C
Q4. Discuss Multi Table Queries?
Inner joins (also known as equijoins) are used to combine information from two or more tables. The join condition determines which records are paired together and is specified in the WHERE clause. For example, let's create a list of driver/vehicle match-ups where both the vehicle and the driver are located in the same city. The following SQL query will accomplish this task:
SELECT lastname, firstname, tag
FROM drivers, vehicles
WHERE drivers.location = vehicles.location
And let's take a look at the results:
lastname firstname tag
-------- --------- ---
Baker Roland H122JM
Smythe Michael D824HA
Smythe Michael P091YF
Jacobs Abraham J291QR
Jacobs Abraham L990MT
Notice that the results are exactly what we sought. It is possible to further refine the query by specifying additional criteria in the WHERE clause. Our vehicle managers took a look at the results of our last query and noticed that the previous query matches drivers to vehicles that they are not authorized to drive
• 71. (e.g. truck drivers to cars and vice-versa). We can use the following query to resolve this problem:
SELECT lastname, firstname, tag, vehicles.class
FROM drivers, vehicles
WHERE drivers.location = vehicles.location
AND drivers.class = vehicles.class
Notice that in this example we needed to specify the source table for the class attribute in the SELECT clause. This is due to the fact that class is ambiguous - it appears in both tables and we need to specify which table's column should be included in the query results. In this case it does not make a difference, as the columns are identical and they are joined using an equijoin. However, if the columns contained different data this distinction would be critical. Here are the results of this query:
• 72. lastname firstname tag class
-------- --------- --- -----
Baker Roland H122JM Car
Smythe Michael D824HA Truck
Jacobs Abraham J291QR Car
Notice that the rows pairing Michael Smythe to a car and Abraham Jacobs to a truck have been removed. You can also use inner joins to combine data from three or more tables.
Outer joins allow database users to include additional information in the query results. We'll explore them in the next section of this article. Take a moment and review the database tables located on the first page of this article. Notice that we have a driver -- Patrick Ryan -- who is located in a city where there are no vehicles. Our vehicle managers would like this information to be included in their query results to ensure that drivers do not sit idly by waiting for a vehicle to arrive. We can use outer joins to include records from one table that have no corresponding record in the joined table. Let's create a list of driver/vehicle pairings that includes records for drivers with no vehicles in their city. We can use the following query:
SELECT lastname, firstname, drivers.city, tag
FROM drivers, vehicles
WHERE drivers.location = vehicles.location (+)
• 73. Notice that the outer join operator "(+)" is included in this query. This operator is placed in the join condition next to the table that is allowed to have NULL values. This query would produce the following results:
lastname firstname city tag
-------- --------- ---- ---
Baker Roland New York H122JM
Smythe Michael Miami D824HA
Smythe Michael Miami P091YF
Jacobs Abraham Seattle J291QR
Jacobs Abraham Seattle L990MT
Ryan Patrick Annapolis
This time our results include the stranded Patrick Ryan, and our vehicle management department can now dispatch a vehicle to pick him up. Note that there are other possible ways to accomplish the results seen in this article, and syntax may vary slightly from DBMS to DBMS. These examples were designed to work with Oracle databases, so your mileage may vary. Furthermore, as you advance in your knowledge of SQL you'll discover that there is often more than one way to accomplish a desired result, and oftentimes one way is just as good as another. Case in point: it is also possible to specify a join condition in the FROM clause rather than the WHERE clause. For example, we used the following SELECT statement earlier in this article:
SELECT lastname, firstname, tag
FROM drivers, vehicles
WHERE drivers.location = vehicles.location
AND drivers.class = vehicles.class
The same query could be rewritten as:
SELECT lastname, firstname, tag
FROM drivers INNER JOIN vehicles ON drivers.location = vehicles.location
WHERE drivers.class = vehicles.class
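A portability note: the "(+)" operator is Oracle-specific; the ANSI form of the same outer join is LEFT OUTER JOIN, which most engines accept. A runnable check with Python's sqlite3 and a two-row toy version of the tables (contents invented to mirror the article's data) shows the driver with no vehicle being kept, with NULL in place of the tag:

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE drivers (lastname TEXT, firstname TEXT, location TEXT)")
con.execute("CREATE TABLE vehicles (tag TEXT, location TEXT)")
con.executemany("INSERT INTO drivers VALUES (?, ?, ?)",
                [("Baker", "Roland", "New York"),
                 ("Ryan", "Patrick", "Annapolis")])
con.execute("INSERT INTO vehicles VALUES ('H122JM', 'New York')")

rows = con.execute("""
    SELECT d.lastname, d.firstname, d.location, v.tag
    FROM drivers AS d
    LEFT OUTER JOIN vehicles AS v ON d.location = v.location
""").fetchall()

for row in rows:
    print(row)
# ('Baker', 'Roland', 'New York', 'H122JM')
# ('Ryan', 'Patrick', 'Annapolis', None)   <- kept despite having no vehicle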
• 74. Q5. Discuss Transaction Processing Concept? Describe properties of Transactions?
In computer science, transaction processing is information processing that is divided into individual, indivisible operations called transactions. Each transaction must succeed or fail as a complete unit; it cannot remain in an intermediate state. Transaction processing is designed to maintain the integrity of a system (typically a database or some modern filesystems) in a known, consistent state, by ensuring that any interdependent operations carried out on the system are either all completed successfully or all cancelled successfully. For example, consider a typical banking transaction that involves moving $700 from a customer's savings account to a customer's checking account. This transaction is a single operation in the eyes of the bank, but it involves at least two separate operations in computer terms: debiting the savings account by