Artifacts, Data Dictionary, Data Modeling, Data Wrangling
1. Artifacts | Data Dictionary | Data Modeling | Data Wrangling
Presented By
Md Faisal Akbar
2. Artifacts
An artifact is one of many kinds of tangible by-products produced during the
development of software.
Some artifacts (e.g., use cases, class diagrams, and other Unified Modeling
Language (UML) models, requirements and design documents) help describe
the function, architecture, and design of software.
Other artifacts are concerned with the process of development itself—
such as project plans, business cases, and risk assessments.
Artifacts are typically living documents and formally updated to reflect
changes in scope. They exist so that everyone involved in the project has a
shared understanding of all information related to the effort.
4. What is a data dictionary?
◇ It is an integral part of a database.
◇ It holds information about the database and the data that it stores.
◇ A data dictionary is a “virtual database” containing metadata (data about data).
5. Metadata
Metadata is data that provides information about one or more aspects of the data, such as:
◇ Time and date of creation.
◇ Authorization of the data.
◇ Attribute size.
◇ Purpose of the data.
6. The data dictionary is where the systems analyst goes to define or look up information about entities, attributes and relationships on the ERD (Entity Relationship Diagram).
7. Viewing the data dictionary
SELECT * FROM DICT;
--or
SELECT * FROM DICTIONARY;
In Oracle, either query lists all tables and views of the data dictionary that are accessible to the user. The result includes the name and a short description of each table and view.
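The same idea can be tried without an Oracle instance. The sketch below is only an analogy, not the Oracle dictionary itself: it uses Python's built-in sqlite3 module and SQLite's catalog table sqlite_master, and the employee table and employee_names view are hypothetical names invented for the example.

```python
import sqlite3

# In-memory database with one table and one view (both hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE VIEW employee_names AS SELECT name FROM employee")


def list_catalog(connection):
    """Return (type, name) pairs for the tables and views in the catalog."""
    rows = connection.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    )
    return rows.fetchall()


catalog = list_catalog(conn)
for object_type, object_name in catalog:
    print(object_type, object_name)
```

The catalog is itself queried with ordinary SELECT statements, which is what makes the data dictionary behave like a "virtual database".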
8. Data Dictionary provides information about the database
◇ Tables
◇ Indexes
◇ Columns
◇ Constraints
◇ Relationship to other variables
◇ Precision of data
◇ Variable format
◇ Packages
◇ Data type
◇ And more
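As a rough illustration of a few of these items (columns, data types, constraints, indexes), the sketch below reads them back from SQLite's catalog using Python's built-in sqlite3 module. The product table, its index, and the describe_table helper are hypothetical; other database systems expose the same kind of information through their own dictionary views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " price NUMERIC)"
)
conn.execute("CREATE INDEX idx_product_name ON product (name)")


def describe_table(connection, table):
    """Collect column and index metadata for one table (hypothetical helper)."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = [
        {
            "column": row[1],
            "data_type": row[2],
            "not_null": bool(row[3]),
            "primary_key": bool(row[5]),
        }
        for row in connection.execute(f"PRAGMA table_info({table})")
    ]
    # PRAGMA index_list rows start with (seq, name, ...)
    indexes = [row[1] for row in connection.execute(f"PRAGMA index_list({table})")]
    return {"table": table, "columns": columns, "indexes": indexes}


info = describe_table(conn, "product")
for column in info["columns"]:
    print(column)
print("indexes:", info["indexes"])
```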
11. Structure of Data Dictionary
Relational systems all have some form of integrated data dictionary (e.g. Oracle).
A data dictionary can be integrated with the DBMS or stand-alone.
An integrated data dictionary automatically reflects changes in the database.
12. Disadvantages of Data Dictionary
Creating a new data dictionary is a very big task. It can take years to create one.
It requires management commitment, which is not easy to achieve, particularly where the benefits are intangible and long term.
The cost of a data dictionary can be fairly high, as it includes the initial build and hardware charges as well as the cost of maintenance.
It needs careful planning: defining the exact requirements, designing its contents, testing, implementation and evaluation.
13. What is a Data Model?
Graphical representation of tables
Represents relationships between tables
Easily understood
Phases of Data Model
Conceptual
Logical
Physical
14. Conceptual Data Model
Highly Abstract
Easily understood
Easily enhanced
Only “Entities” visible
Abstract Relationship
No attribute is specified.
No primary key is specified.
15. Logical Data Model
Includes all entities and relationships among them
Key Attribute
Non-Key attribute
The primary key for each entity is specified.
Foreign keys are specified
Normalization occurs at this level.
User-friendly attribute names
More detailed than Conceptual Model
Database agnostic
The steps for designing the logical data model are as follows:
1. Specify primary keys for all entities.
2. Find the relationships between different entities.
3. Find all attributes for each entity.
4. Resolve many-to-many relationships.
5. Normalization.
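The steps above can be sketched in a database-agnostic way with plain Python dataclasses. This is a minimal illustration under assumed names: the Student, Course and Enrollment entities and the Entity class are hypothetical, and the many-to-many relationship between students and courses is resolved with an associative entity.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One entity of a logical model (hypothetical structure)."""
    name: str
    primary_key: list
    attributes: list = field(default_factory=list)    # non-key attributes
    foreign_keys: dict = field(default_factory=dict)  # attribute -> referenced entity


student = Entity("Student", ["student_id"], ["full_name"])
course = Entity("Course", ["course_id"], ["title"])

# Students and courses are many-to-many, so an associative entity
# with a composite primary key resolves the relationship.
enrollment = Entity(
    "Enrollment",
    primary_key=["student_id", "course_id"],
    attributes=["enrolled_on"],
    foreign_keys={"student_id": "Student", "course_id": "Course"},
)

model = {e.name: e for e in (student, course, enrollment)}


def dangling_foreign_keys(entities):
    """Return foreign keys that point at an entity missing from the model."""
    return [
        (entity.name, attribute, target)
        for entity in entities.values()
        for attribute, target in entity.foreign_keys.items()
        if target not in entities
    ]


print(dangling_foreign_keys(model))
```

Nothing here names a data type or a storage engine, which is what keeps the logical model database agnostic.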
16. Physical Data Model
A physical data model represents how the model will be built in the database.
Entities are referred to as tables
Attributes are referred to as columns
Foreign keys are used to identify relationships between tables.
Denormalization may occur based on user requirements.
Database-compatible table names
Database-compatible column names
Database-specific data types (for example, the data type for a column may differ between MySQL and SQL Server)
The steps for physical data model design are as follows:
1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical constraints / requirements.
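A minimal sketch of these steps, assuming SQLite as the target database and using Python's built-in sqlite3 module. The student, course and enrollment tables are hypothetical, and the column types are SQLite-specific; another DBMS would need its own type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Entities become tables, attributes become columns,
# relationships become foreign keys.
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    full_name  TEXT NOT NULL
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE enrollment (
    student_id  INTEGER NOT NULL REFERENCES student (student_id),
    course_id   INTEGER NOT NULL REFERENCES course (course_id),
    enrolled_on TEXT,
    PRIMARY KEY (student_id, course_id)
);
""")

conn.execute("INSERT INTO student VALUES (1, 'Ada')")
conn.execute("INSERT INTO course VALUES (10, 'Databases')")
conn.execute("INSERT INTO enrollment VALUES (1, 10, '2024-01-15')")

# An enrollment for a student that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO enrollment VALUES (2, 10, '2024-01-16')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
referenced = {row[2] for row in conn.execute("PRAGMA foreign_key_list(enrollment)")}
print(rejected, sorted(referenced))
```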
17. Compare Stages of Data Model
Feature               Conceptual   Logical   Physical
Entity Names              ✓           ✓         -
Entity Relationships      ✓           ✓         -
Attributes                -           ✓         -
Primary Keys              -           ✓         ✓
Foreign Keys              -           ✓         ✓
Table Names               -           -         ✓
Column Names              -           -         ✓
Column Data Types         -           -         ✓
18. Data Wrangling
Data wrangling is the process of cleaning, structuring and enriching raw data into a desired format for better decision making in less time.
Key Steps of Data Wrangling:
Data Acquisition: Identify and obtain access to the data within your sources
Joining Data: Combine the edited data for further use and analysis
Data Cleansing: Redesign the data into a usable/functional format and correct/remove any bad data
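The three steps can be sketched with Python's standard library alone. Everything in this example is hypothetical: the two in-memory CSV sources, the column names, and the rule that a row is "bad" when its amount is not numeric or its customer name is blank. The steps run in the order they are listed above.

```python
import csv
import io

# Hypothetical raw sources; real ones would be files, APIs or database extracts.
CUSTOMERS_CSV = "customer_id,name\n1, Ada \n2,Grace\n3,\n"
ORDERS_CSV = (
    "order_id,customer_id,amount\n"
    "100,1,25.50\n"
    "101,2,n/a\n"
    "102,3,12.00\n"
    "103,1,4.50\n"
)


def acquire(text):
    """Data acquisition: read raw rows from a source."""
    return list(csv.DictReader(io.StringIO(text)))


def join(orders, customers):
    """Joining data: attach the customer name to each order."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**order, "name": by_id[order["customer_id"]]["name"]}
        for order in orders
        if order["customer_id"] in by_id
    ]


def cleanse(rows):
    """Data cleansing: fix types and whitespace, drop bad rows."""
    clean = []
    for row in rows:
        name = row["name"].strip()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # non-numeric amount: treat as bad data
        if not name:
            continue  # blank customer name: treat as bad data
        clean.append({"order_id": int(row["order_id"]), "name": name, "amount": amount})
    return clean


wrangled = cleanse(join(acquire(ORDERS_CSV), acquire(CUSTOMERS_CSV)))
for row in wrangled:
    print(row)
```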