A Concept of DBMS
1. A Concept of DBMS
Written by Sourav Mishra
3. What is a file management system?
The technique used to represent and store records in a file is called file organization.
There are three types:
Sequential file organization.
Indexed sequential file organization.
Direct access.
Fundamental characteristics of a file management system:
Creation: creating a new file.
Updating: insertion, deletion, and modification of records.
Retrieval: accessing the records in a file. It takes two forms:
Inquiry.
Report generation.
Maintenance: restructuring and reorganizing. Restructuring means making structural changes to a file; reorganizing means converting a file from one file organization to another.
4. Sequential file organization: Records are arranged in either ascending or descending order of a key.
Advantage: the next record in sequence can be accessed quickly.
Disadvantage: to reach a particular record, the file must be searched from the beginning, comparing the key value against each record in turn.
Indexed sequential file organization: Records can be accessed both individually (through an index) and sequentially, using the same key value.
Advantage: the index provides random access to records.
Disadvantage: reorganizing the records in the overflow area is expensive.
Direct file organization: The search key value is mapped directly to the storage location.
Advantage: in a direct (relative) file an individual record can be accessed directly.
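The three access styles can be compared with a minimal in-memory sketch. It is only an illustration: the records, the sorted key list standing in for an index, and the dictionary standing in for direct addressing are all assumptions, not part of the slides.

```python
from bisect import bisect_left

# Hypothetical records, kept sorted by key as in a sequential file.
records = [(101, "Asha"), (105, "Bina"), (110, "Chand"), (120, "Dev")]

def sequential_lookup(key):
    """Scan the records in order until the key is found."""
    for k, name in records:
        if k == key:
            return name
    return None

# A sorted list of keys stands in for the index of an indexed sequential file.
index = [k for k, _ in records]

def indexed_lookup(key):
    """Binary-search the index, then jump straight to the record position."""
    pos = bisect_left(index, key)
    if pos < len(index) and index[pos] == key:
        return records[pos][1]
    return None

# A dictionary stands in for direct access: key -> storage location.
direct = {k: name for k, name in records}

def direct_lookup(key):
    """Map the key directly to its record."""
    return direct.get(key)
```

All three functions return the same record; they differ only in how many records or index entries they touch on the way.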
6. Use of Hashing and hash function
[Figure: keys (S.M, G.M, R.K, J.P) are passed through a hash function to bucket addresses (000, 001, ...); keys that collide in a bucket are placed in an overflow area.]
7. Comparison of hash functions
Of the hash functions compared here, the division-remainder technique gives the best overall performance. The mid-square method can be applied only to files with a low loading factor, so it generally performs poorly.
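As a rough illustration of the two techniques named above, the following sketch implements each one for integer keys. The table size 97 and the choice of two middle digits are assumptions made only for the example.

```python
def division_remainder(key: int, m: int) -> int:
    """Division-remainder method: the address is the remainder of key / m."""
    return key % m

def mid_square(key: int, digits: int) -> int:
    """Mid-square method: square the key and keep `digits` middle digits."""
    squared = str(key * key)
    start = max((len(squared) - digits) // 2, 0)
    return int(squared[start:start + digits])

# Example with a hypothetical table size of 97 (a prime).
address_a = division_remainder(9932, 97)
# Example keeping the two middle digits of 1234 * 1234 = 1522756.
address_b = mid_square(1234, 2)
```

A prime table size is a common choice for the division-remainder method because it tends to spread keys more evenly.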
Data redundancy: Data redundancy occurs in a database system when the same field is repeated in two or more tables. Redundant copies can drift out of step with each other, so data can no longer be inserted or updated consistently. Reducing redundancy is therefore essential in every database design.
Data redundancy leads to data anomalies and corruption and should generally be avoided by design. Database normalization prevents redundancy and removes these anomalies.
8. Hash addressing
Hash addressing: In direct file organization a hash function maps the key value directly to a storage location.
Key value -> Hash function -> Address
Advantage: A record can be located directly from its key. How well this works depends on the hash function, in particular on:
1. How evenly the key values are distributed over the locations of the table.
2. The collision resolution technique that is used.
Disadvantage: The main disadvantage is collision. A collision occurs when two distinct key values are mapped to the same storage location.
Collisions can be resolved by linear probing or double hashing.
9. Approach to the problem of collision: When a hash function maps a large set of key values to a small set of addresses, collisions are certain to occur, because more than one key value will be mapped to a single location. Collisions can be resolved by linear probing or double hashing.
Linear probing: The process of finding a free slot in the hash table is called probing. In this method a colliding key is moved from its home address to the next empty location.
It uses the following hash function:
h(k, i) = [h'(k) + i] mod m, where h'(k) = k mod m and i = 0, 1, 2, ... (i = 0 gives the home address)
Double hashing: Double hashing is a technique used in hash tables to resolve collisions. It uses the following hash function:
h(k, i) = [h1(k) + i*h2(k)] mod m
where h1(k) = k mod m and h2(k) = k mod m'. Here m' is a second modulus, usually chosen slightly smaller than m; h2(k) must not evaluate to 0, otherwise the probe never leaves the home address.
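The two probe sequences can be sketched as follows. This is a minimal illustration, not a full hash table: the table size 7, the second modulus 5, and the helper name `insert` are assumptions. The second hash is written as 1 + (k mod m') rather than k mod m', a common variant that keeps the step size non-zero.

```python
def linear_probe(k: int, i: int, m: int) -> int:
    """h(k, i) = (h'(k) + i) mod m with h'(k) = k mod m."""
    return (k % m + i) % m

def double_hash(k: int, i: int, m: int, m2: int) -> int:
    """h(k, i) = (h1(k) + i * h2(k)) mod m."""
    h1 = k % m
    h2 = 1 + (k % m2)  # assumption: "+ 1" keeps the step from being 0
    return (h1 + i * h2) % m

def insert(table, k, probe):
    """Place k in the first free slot of its probe sequence; return the slot."""
    for i in range(len(table)):
        slot = probe(k, i)
        if table[slot] is None:
            table[slot] = k
            return slot
    raise RuntimeError("no free slot found")

# Keys 10, 17 and 24 all have home address 3 in a table of size 7.
linear_table = [None] * 7
linear_slots = [insert(linear_table, k, lambda k, i: linear_probe(k, i, 7))
                for k in (10, 17, 24)]

double_table = [None] * 7
double_slots = [insert(double_table, k, lambda k, i: double_hash(k, i, 7, 5))
                for k in (10, 17, 24)]
```

With linear probing the colliding keys pile up in neighbouring slots, while double hashing gives each key its own step size and spreads them out.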
11. Database and DBMS.
Database schema and three-schema architecture.
Data abstraction and data independence.
DBMS languages.
Database users.
Data models.
Advantages and disadvantages of DBMS.
E-R model.
DBMS architecture and data dictionary.
12. Database and DBMS
Concept of database: A database is a collection of related data.
A database is a logically coherent collection of data with inherent meaning.
A database is designed and populated with data for a specific purpose.
A database may be generated and maintained manually or it may be computerized.
Ex: A library card catalog is a database that may be created and maintained manually.
Uses: A database is used to store information that is useful to an organization.
DBMS: A database management system is a software system that allows access to the data contained in a database.
It allows users to maintain, manage, and use the database.
It facilitates the process of defining, constructing, and manipulating data.
Database schema: The description of a database is known as the database schema. It is specified during database design.
Database state or instance: The data in the database at a particular moment in time is called a database state.
13. Three-schema architecture
Internal schema: The internal schema describes the internal level of the database, that is, its physical storage.
Conceptual schema: It describes the conceptual level, which covers the structure of the whole database for the community of users.
External schema: It describes the external level, that is, the individual user views.
[Figure: User 1, User 2 and User 3 each see an external view; the views map to the conceptual schema, which maps to the internal schema.]
15. DBMS languages
A DBMS provides different languages to describe and use the database.
1. Extended host languages: The system provides extensions to a host programming language that enable the user to interact with the database.
2. Query languages: These provide more powerful facilities to interact with the database. They are divided into two types:
a) Data definition language (DDL).
b) Data manipulation language (DML).
DDL: The DBMS provides a language called the data definition language, which is used to define the conceptual schema and also gives details about how the data is stored on the physical device.
DML: DML involves the following tasks:
1. Retrieving data from the database.
2. Inserting new data into the database.
3. Deleting and modifying existing data.
There are two types of DML:
A. High-level DML: A high-level DML such as SQL (Structured Query Language) can specify and retrieve many records in a single DML statement.
B. Low-level DML: A low-level DML specifies how to retrieve the data, typically one record at a time.
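The difference between DDL and DML can be seen in a small sketch using Python's built-in sqlite3 module. The table name, columns, and rows are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema.
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)"
)

# DML: insert new data.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "BCA"), (2, "Bina", "MCA"), (3, "Chand", "BCA")],
)

# DML: modify and delete existing data.
conn.execute("UPDATE student SET course = 'MCA' WHERE roll_no = 3")
conn.execute("DELETE FROM student WHERE roll_no = 1")

# DML: retrieve data. One high-level statement returns many records.
rows = conn.execute(
    "SELECT roll_no, name FROM student WHERE course = 'MCA' ORDER BY roll_no"
).fetchall()
```

The SELECT statement says what to retrieve, not how to find it, which is what makes SQL a high-level DML.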
16. Database users
1. Actors on the scene: Many persons are involved in the design, use, and maintenance of a database. The people whose jobs involve the day-to-day use of a large database are known as actors on the scene.
The actors are:
Database administrator (DBA): The user who controls the centralized database system is called the DBA.
A. In any organization where many persons use the same resources, there is a need for a chief administrator to oversee and manage these resources.
B. In a database environment, the primary resource is the database itself and the secondary resource is the DBMS and related software. All these resources are the responsibility of the DBA.
Database designer: It is the responsibility of the database designer to communicate with all prospective database users in order to understand their requirements and to come up with a design that meets these requirements.
End user: End users are the people whose jobs require access to the database for querying, updating, and generating reports. The four types of end user are casual end users, parametric end users, sophisticated end users, and stand-alone users.
Application programmer: Application programmers are the users responsible for writing application programs in a programming language (C, C++, Java, etc.).
17. 2. Workers behind the scene: Some persons are associated with the design, development, and operation of the DBMS software and system environment. These persons are typically not interested in the database itself. They are known as workers behind the scene.
DBMS designers and implementers: They are the persons who design and implement the DBMS modules and interfaces as a software package.
Tool developers: Tools are software systems that facilitate database system design and use. The persons who design and implement tools are known as tool developers.
Operators and maintenance personnel: They are the system administration personnel who are responsible for the actual running and maintenance of the hardware and software environment for the database.
Data model: A data model is a collection of concepts that can be used to describe the structure of a database. There are four types of data model: file-based systems, traditional data models, semantic (high-level) data models, and low-level data models.
18. Advantages and disadvantages of DBMS
Reduction of redundancies: The main advantage of a DBMS is that it avoids duplication of data.
Shared data: A database allows the data under its control to be shared by any number of application programs or users.
Data independence: Data independence is an advantage in a database environment because it allows changes at one level of the database without affecting the other levels.
Security:
19. E-R model (entity-relationship model)
The E-R model consists of the following components.
Entity: An entity is a class of persons, places, objects, or events that exist in the real world.
Attribute: The characteristics of an entity are called attributes. Each entity can have a number of attributes. For example: name, roll no.
Simple vs. composite: An attribute that can be divided into smaller, independent, meaningful attributes is called a composite attribute; one that cannot is a simple attribute. For example, Address is a composite attribute (made up of Street and City), while the age of a person is a simple attribute.
Single-valued vs. multivalued: Most attributes have a single value for a particular entity; such attributes are called single-valued attributes. For example, the age of a person. An attribute that can hold several values is multivalued; for example, the color of a two-tone car.
20. E-R model
Stored vs. derived attribute: For a particular person, age can be determined from the current date and the value of DOB. So age is a derived attribute and DOB is called a stored attribute.
Null attribute: An attribute that has a null value is called a null-valued attribute. For example, the phone number of a person may be unknown.
Relationship: Relationships among entities can be 1:1, 1:n, or m:n.
Examples: Department and HOD (1:1), Father and Children (1:n), Customer and Item (m:n).
Key attribute: A key attribute is an attribute that uniquely identifies an entity within an entity set.
For example, emp-code identifies an entity in the entity set Employee.
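The attribute kinds described above can be pictured with a small sketch of one entity type. The class names, the `emails` field, and the sample values are hypothetical; only emp-code, name, DOB, Address (Street, City), phone, and age come from the slides.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Address:
    """Composite attribute: made up of smaller meaningful parts."""
    street: str
    city: str

@dataclass
class Employee:
    emp_code: str                 # key attribute
    name: str                     # simple, single-valued attribute
    dob: date                     # stored attribute
    address: Address              # composite attribute
    phone: Optional[str] = None   # may be null when unknown
    emails: List[str] = field(default_factory=list)  # multivalued (hypothetical)

    def age(self, today: date) -> int:
        """Derived attribute: computed from DOB and the current date."""
        years = today.year - self.dob.year
        if (today.month, today.day) < (self.dob.month, self.dob.day):
            years -= 1  # birthday has not occurred yet this year
        return years

emp = Employee("E1", "Asha", date(2000, 6, 15), Address("MG Road", "Cuttack"))
```

Because age is computed on demand, it never needs to be stored or updated.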
21. Symbols of the E-R model
Symbol                  Meaning
Rectangle               Entity
Ellipse                 Attribute
Double rectangle        Weak entity
Diamond                 Relationship
Underlined attribute    Key attribute
22. DBMS architecture and data dictionary
DBMS architecture:
Different abstraction levels: A database is described at three levels of abstraction.
A. Internal schema (physical database).
B. Conceptual schema (conceptual database).
C. External schema (view).
Objectives:
A. Support of multiple user views.
B. Use of a schema to store the database description (metadata).
Data dictionary: The data dictionary is also known as the system catalog. It contains all the information about the database structure, that is, it describes all the primary structures of the database. This information is known as metadata.
24. Hierarchical data model
The hierarchical data model (HDM) uses a tree to represent data and the relationships among data. There is no definitive document that describes the HDM; the model is derived from IMS (Information Management System). IMS is a hierarchical DBMS (HDBMS) used in the banking sector and in private firms.
The HDM is built on two concepts:
Record.
PCR (parent-child relationship).
Record: A record is a collection of fields. A record type is a collection of similar records.
PCR: A PCR type is a 1:n relationship between two record types. The record type on the 1 side is the parent record type and the record type on the n side is known as the child record type.
An occurrence of a PCR type consists of one record of the parent record type and a number of records of the child record type.
A hierarchical database schema is a collection of hierarchical schemas. A diagrammatic representation of a hierarchical schema is known as a hierarchical diagram. In a hierarchical diagram, when a single parent record type has more than one child record type, each child is connected to the parent by a link that represents a PCR type.
[Figure: Department as the parent record type, with Employee and Project as child record types.]
25. Characteristics of HDM
1. Each HDM diagram has exactly one root record type, and this record type does not have a parent record type.
2. One parent record type may have more than one child record type.
3. A record type that does not have any child record type is called a leaf.
4. Every record type except the root must be connected to a parent through a PCR type.
5. When one parent record type has more than one child record type, the child record types must be ordered.
26. Explanation of relationships
1:1 and 1:n
In a PCR one parent record corresponds to n child records, where n >= 0, so a 1:1 relationship can be represented with the general concept of the HDM. A 1:n relationship can be represented in the same way.
M:N
When two record types have an M:N relationship with each other, a single PCR type is not sufficient to represent it. The problem can be solved by storing the child record multiple times, once under each parent, but this creates the new problem of storing the same record several times.
To solve this problem the HDM treats one of the parent record types as the real parent and the rest as virtual parents, which brings in the new concept of the virtual PCR.
[Figure: example relationships among Department, Manager, Employee and Project, labelled 1:1, M:N and M:M.]
28. Constraints: These are constraints on the database that the database must obey.
• No record type except the root can exist without being related to a parent record type. This has three implications:
Whenever a parent record is deleted, its corresponding child records are also deleted.
Whenever a child record is inserted, it must be linked to its corresponding parent record.
Whenever a virtual parent is deleted, its corresponding parent record is not deleted.
• A record type that has more than one parent record type can have only one of them as its real parent; the others are virtual parents.
• A record type may have any number of virtual parents, but IMS restricts this to only one.
29. [Figure: hierarchical schema diagram containing the record types Dept, Project, D loc, D Employee, D Manager and P Worker.]
Typically a hierarchical schema, drawn as a schema diagram, forms a tree data structure.
For example, the above diagram can be represented by a tree structure.
Each node represents a record type. A link represents a PCR type in the hierarchical schema diagram.
In the hierarchical data model every record type except the root is a dependent segment. In other words, in every PCR type the child record type depends on the parent record type, all the way up to the root. So the root is a fully independent segment and a leaf is a fully dependent segment.
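The tree view of a hierarchical schema can be sketched in a few lines. This is only an illustration under assumptions: the class `RecordType`, the helper `leaves`, and the exact parent-child arrangement (Department over Employee and Project, Project over Worker) are chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class RecordType:
    """A node of the hierarchical schema; each link to a child is one PCR type."""
    name: str
    parent: Optional["RecordType"] = None
    children: List["RecordType"] = field(default_factory=list)

    def add_child(self, name: str) -> "RecordType":
        child = RecordType(name, parent=self)
        self.children.append(child)  # children keep their left-to-right order
        return child

def leaves(node: RecordType) -> List[str]:
    """Return the names of all record types that have no child record type."""
    if not node.children:
        return [node.name]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found

department = RecordType("Department")      # root: has no parent
employee = department.add_child("Employee")
project = department.add_child("Project")
worker = project.add_child("Worker")
```

Every node except the root has exactly one parent, which is the tree property the characteristics of the HDM describe.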
30. Network data model
It is also known as the DBTG (Data Base Task Group) model, because it was proposed by the DBTG of CODASYL.
The network data model is based on record types and the set construct.
1. Record: A record is a collection of fields. A collection of similar records is called a record type.
2. Set construct: The set construct defines a 1:n relationship between two record types. The record type on the 1 side is known as the owner record type and the record type on the n side is known as the member record type.
3. Bachman diagram: In a Bachman diagram a set type has three parts.
a) Owner record type.
b) Member record type.
c) Name of the set type.
[Figure: Bachman diagram in which the owner record type Dept is linked to the member record type Student by the set type Dept-student.]
31. Network data model constructs
Network data model constructs are of two types: structural and behavioral.
Behavioral constructs fall into two main categories: insertion options and retention options.
The insertion option deals with the rules applied to a member record type when a record is inserted.
The retention option specifies how a member record may be handled once it has been inserted, for example when it is removed from a set or updated.
Insertion option: A new record can be inserted in two ways, automatically or manually. With automatic insertion the new record is associated with a set instance as soon as it is inserted; this is maintained by the system. With manual insertion the record must be connected to a set instance explicitly.
Retention option: It is of three types.
1. Optional: A member record may exist without being related to any set instance.
2. Mandatory: A member record cannot exist without being related to an owner record, that is, to some set instance.
3. Fixed: Once a member record is inserted into a set instance, it must stay with the same owner record; the association is fixed.
32. Unit:4
Keys and types.
Integrity rule.
Relational algebra.
Tuple and Domain.
33. Keys and types
Keys: A key is a data item that exclusively identifies a record. For example, account-no, product-id, emp-no, and customer-no are used as key fields because they specifically identify a record stored in a database.
Super key: A super key is a set of one or more attributes whose combined values uniquely identify an entity in an entity set. For example, in the entity set Employee, the set of attributes (emp-name, address) can be considered a super key.
Primary key: The primary key uniquely identifies each record in a table, and no two records may have the same primary key value. For example, emp-code can be the primary key for the entity Emp.
Candidate key: A candidate key is a minimal set of attributes that identifies a record within an entity set.
Secondary key: The attributes that remain after the primary key and candidate keys have been chosen are called secondary keys.
Foreign key: It is a set of fields in one relation that refers to fields in another relation.
Example: for a student relation with the attributes roll no, stu code, name, sex, and class:
Primary key: roll no.
Super keys: (roll no, name), (roll no, sex), (roll no, class).
Secondary keys: name, sex, class.
Candidate key: stu code.
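The difference between a super key and a candidate key can be checked mechanically on sample data. The sketch below is an assumption-laden illustration: the rows are invented, and testing one table instance can only refute a key, not prove it for every possible state of the relation.

```python
from itertools import combinations

# Hypothetical rows for a student relation.
rows = [
    {"roll_no": 1, "stu_code": "S01", "name": "Asha", "sex": "F", "class": "X"},
    {"roll_no": 2, "stu_code": "S02", "name": "Bina", "sex": "F", "class": "X"},
    {"roll_no": 3, "stu_code": "S03", "name": "Asha", "sex": "F", "class": "XI"},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """True if attrs is a super key and no proper subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )
```

For example, (roll_no, name) is a super key of these rows but not a candidate key, because roll_no alone already identifies each row.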
34. Integrity rules
When many users enter data items into a database, it becomes important that the data items and the associations among them are not destroyed.
Hence data insertion, updating, etc. have to be carried out in such a way that database integrity is maintained.
Integrity rule 1 (entity integrity):
If an attribute of a table is a prime attribute, it cannot accept a null value. In other words, a primary key may not be null.
Integrity rule 2 (referential integrity):
1. Referential integrity ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation.
2. Integrity rule 2 is concerned with the concept of the foreign key.
3. When the primary key of a base table appears as a foreign key in a related table, every value of the foreign key must match a value of that primary key.
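Both rules can be observed with Python's built-in sqlite3 module. The tables `dept` and `emp`, their columns, and the helper `try_insert` are hypothetical. Two SQLite details matter here: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and the primary key is declared NOT NULL explicitly because SQLite does not otherwise reject nulls in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable referential integrity checks

conn.execute("CREATE TABLE dept (dept_id TEXT PRIMARY KEY NOT NULL, dname TEXT)")
conn.execute(
    "CREATE TABLE emp ("
    " emp_code TEXT PRIMARY KEY NOT NULL,"
    " dept_id TEXT REFERENCES dept(dept_id))"
)
conn.execute("INSERT INTO dept VALUES ('D1', 'Sales')")
conn.execute("INSERT INTO emp VALUES ('E1', 'D1')")

def try_insert(sql: str) -> bool:
    """Run an INSERT; return False if an integrity rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Rule 1 (entity integrity): a null primary key is rejected.
null_key_accepted = try_insert("INSERT INTO emp VALUES (NULL, 'D1')")

# Rule 2 (referential integrity): 'D9' does not exist in dept, so it is rejected.
dangling_fk_accepted = try_insert("INSERT INTO emp VALUES ('E2', 'D9')")
```

Both inserts fail, so the emp table still holds only the one valid row.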
35. Relational algebra
It is the formal foundation of the relational model.
It is used for implementing and optimizing queries in relational database management systems.
It is of two types: set-oriented operations and relation-oriented operations.
Set-oriented operations: There are four operations of this type: set union, set intersection, set difference, and Cartesian product.
Relation-oriented operations: Two operations of this type are select and project.
Select: The select operation extracts specific tuples from a relation. The lowercase Greek letter sigma (σ) is used to denote selection. In general, the selection condition may use comparisons with relational operators.
Project: The project operation is a unary operation, denoted by the Greek letter pi (π). It selects the listed columns from the table and discards the other columns.
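A few of these operations can be sketched over relations held as lists of dictionaries. This is an illustrative model, not how an RDBMS implements them; the function names and the sample relation `emp` are assumptions, and the Cartesian product assumes the two relations have no attribute names in common.

```python
def select(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """Projection: keep only the listed columns and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result

def union(r, s):
    """Set union of two relations with the same attributes."""
    return r + [t for t in s if t not in r]

def product(r, s):
    """Cartesian product: pair every tuple of r with every tuple of s."""
    return [{**a, **b} for a in r for b in s]

emp = [
    {"emp_code": "E1", "name": "Asha", "dept": "Sales"},
    {"emp_code": "E2", "name": "Bina", "dept": "HR"},
    {"emp_code": "E3", "name": "Chand", "dept": "Sales"},
]

sales = select(emp, lambda t: t["dept"] == "Sales")
depts = project(emp, ["dept"])
```

Operations compose: `project(select(emp, ...), ["name"])` first filters the rows and then keeps a single column.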
36. There are twelve rules formulated by E. F. Codd for RDBMS. (Codd introduced the relational model in 1970; the twelve rules were published in 1985.)
The twelve rules cover the following main points:
1. Information representation.
2. Guaranteed access.
3. Systematic treatment of null values.
4. Database description rule.
5. Comprehensive data sublanguage.
6. View updating.
7. High-level update, insert, delete.
8. Physical data independence.
9. Logical data independence.
10. Integrity rule.
11. The distribution rule.
12. Non-subversion.
38. Anomalies.
Functional dependency.
Closure and axiom rules.
Normalization and types.
BCNF and database security.
Concurrent operation.
39. Anomalies & F.D
Anomalies: The aim of a database system is to reduce redundancy, meaning that each piece of information is stored only once. Storing information several times wastes storage space and increases the total size of the data store. Updates to a database with such redundancy can also leave it inconsistent.
Functional dependency: F.Ds are relationships among the sets of attributes within a relation.
An F.D between two sets of attributes A and B is denoted by A -> B.
There are different types of F.D.
Full functional dependency: When every non-key attribute depends on the whole key, the dependency is called a full functional dependency.
In the following example the non-key attributes (name, address, age, course) depend on the key attribute roll no.
Roll no -> name, address, age, course
40. Functional dependency
Partial dependency: A partial dependency exists when some non-key attributes depend on only part of the key rather than on the whole key.
Example relation: Roll no, Name, Address, Age, Course, Date of join
Transitive dependency: When one non-key attribute depends on another non-key attribute, it is called a transitive dependency.
Example relation: S.no, Origin, Destination, Distance, where Distance depends on the non-key attributes Origin and Destination.
41. Closure and axiom rules of F.D
Multivalued dependency:
Multivalued dependencies are a consequence of first normal form.
F.Ds are also referred to as equality-generating dependencies, and multivalued dependencies are referred to as tuple-generating dependencies.
Closure of F.D: The closure of a set F of functional dependencies is the set of all dependencies that includes F as well as all dependencies that can be inferred from F.
There are six rules, known as the axiom rules.
1. Reflexive rule. If x
2. Augmentation rule.
3. Transitive rule.
4. Decomposition rule.
5. Union rule.
6. Pseudo transitive rule.
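In practice, what can be inferred from a set F is usually worked out through the closure of a set of attributes: every attribute that the given attributes determine under F. The sketch below is a minimal version of that standard computation; it applies each dependency repeatedly, which corresponds to using the reflexive, augmentation, and transitive rules. The function name, the attribute `fee`, and the sample dependencies are assumptions for illustration.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side being a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we can infer all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Hypothetical dependencies: roll_no -> name, course and course -> fee.
fds = [
    ({"roll_no"}, {"name", "course"}),
    ({"course"}, {"fee"}),
]

roll_closure = attribute_closure({"roll_no"}, fds)
```

Here roll_no determines fee only through course, which is the transitive rule at work. A dependency X -> Y follows from F exactly when Y is contained in the closure of X.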
42. Normalization and types
Normalization is the process of efficiently organizing data in a database.
The first step of normalization is to convert the E-R model into tables. The tables are then examined for redundancy and, if necessary, changed to a non-redundant form.
The normal forms are used to ensure that various types of anomalies and inconsistencies are not introduced into the database.
There are five types of normal form.
1st normal form: A relation schema is said to be in first normal form if the values in the domain of each attribute of the relation are atomic.
It disallows having a set of values, a tuple of values, or a combination of both as an attribute value.
2nd normal form: A relation schema is said to be in second normal form if it is in 1st normal form and all non-prime attributes are fully functionally dependent on the relation key.
3rd normal form: To be in 3rd normal form, the relation must be in 2nd normal form and no transitive dependency may exist within the relation.
The 4th and 5th normal forms are based on the concepts of multivalued dependency and join dependency.
43. BCNF & database security
BCNF:
1. When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF.
2. It is based on the concept of a determinant.
3. If a table contains only one candidate key, 3NF and BCNF are equivalent. BCNF can be violated only if the table contains more than one candidate key.
Security: Database security is of two types.
1. System security: System security deals with providing security to the database at the system level. For example: DBMS checks.
2. Database security: It protects the data at the level of individual objects. For example, a user with insufficient privileges cannot view a table.
44. Concurrent operation
Locking and timestamping are the two most common techniques for controlling concurrent operations.
Locking:
A data item can be locked by a transaction in order to prevent the data item from being accessed and updated by any other transaction.
Locks are of two types.
Exclusive lock: A transaction that wants to modify a data item, and not just read it, can take an exclusive lock on the data item. Hence it is also known as a write lock.
Shared lock: A transaction that only reads a data item and does not modify it can take a shared lock on the data item.
Timestamp ordering: In this method a serial order is created among the concurrent transactions by assigning a unique, increasing number to each transaction.
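The compatibility of the two lock types, and the issuing of timestamps, can be sketched as follows. This is a simplified illustration under assumptions: the class `LockManager`, its method names, and the transaction names are invented, and the sketch does not handle lock upgrades, waiting queues, or deadlocks.

```python
import itertools

class LockManager:
    """Grants shared ('S') and exclusive ('X') locks on data items."""

    def __init__(self):
        self.locks = {}  # item -> (mode, set of transactions holding it)

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        # Only shared locks are compatible with each other.
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False

    def release(self, txn, item):
        """Give up txn's lock on item, freeing the item when nobody holds it."""
        held = self.locks.get(item)
        if held and txn in held[1]:
            held[1].discard(txn)
            if not held[1]:
                del self.locks[item]

# Timestamp ordering: each new transaction gets a unique, increasing number.
_timestamps = itertools.count(1)

def new_timestamp():
    return next(_timestamps)
```

Several readers can share an item, but a writer must wait until every other lock on the item has been released.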