Data Mining and
Data Warehousing Techniques
Presented to : Muhammad Faisal
Presented by:
Faizan Saleem
Pireh Pirzada
Ahmed Hassan
Muhammad Usman
BSE-4 | DATABASE MANAGEMENT SYSTEM
Topics
• Why do we need data warehouses and data mining?
• What are data warehouses and data mining?
• History of data warehouses and data mining
• Techniques of data warehouses and data mining
Why We Need Data Mining and Data Warehousing
• Problem scenarios
• Solutions
• The need for data warehouses and data mining
Why Data Warehouse?
Necessity is the mother of invention
Information
Problem Scenario 1
ABC Pvt Ltd is a company with branches in Karachi, Lahore, Peshawar and Islamabad.
The Sales Manager wants a quarterly sales report.
Each branch has a separate operational system.
[Diagram: the four ABC Pvt Ltd. branches (Karachi, Lahore, Peshawar, Islamabad) and the Sales Manager, who needs sales per item type per branch for the first quarter.]
Solution for ABC Pvt Ltd.
• Extract sales information from each branch database and store it in a common repository at a single site.
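The consolidation step can be sketched in a few lines. This is only an illustration, assuming each branch's sales can be read as (date, item type, amount) rows; the table name `sales`, the sample figures and both function names are hypothetical, and an in-memory SQLite database stands in for the common repository.

```python
import sqlite3

# Hypothetical branch data: (sale_date, item_type, amount). In practice each
# list would be read from that branch's own operational system.
BRANCH_SALES = {
    "Karachi":   [("2024-01-15", "TV", 500.0), ("2024-02-03", "Radio", 80.0)],
    "Lahore":    [("2024-03-20", "TV", 450.0), ("2024-04-02", "TV", 470.0)],
    "Peshawar":  [("2024-02-11", "Radio", 75.0)],
    "Islamabad": [("2024-01-30", "TV", 520.0), ("2024-03-01", "TV", 510.0)],
}

def build_warehouse(branch_sales):
    """Load every branch's sales into one common table, tagged by branch."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE sales (branch TEXT, sale_date TEXT, item_type TEXT, amount REAL)"
    )
    for branch, rows in branch_sales.items():
        wh.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, day, item, amount) for day, item, amount in rows],
        )
    return wh

def first_quarter_report(wh, year):
    """Sales per item type per branch for January through March of `year`."""
    return wh.execute(
        """
        SELECT branch, item_type, SUM(amount)
        FROM sales
        WHERE sale_date >= ? AND sale_date < ?
        GROUP BY branch, item_type
        ORDER BY branch, item_type
        """,
        (f"{year}-01-01", f"{year}-04-01"),
    ).fetchall()

warehouse = build_warehouse(BRANCH_SALES)
report = first_quarter_report(warehouse, 2024)
```

Once the data sits in one place, the Sales Manager's report is a single query; Lahore's April sale is excluded because it falls outside the first quarter.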
[Diagram: Solution for ABC Pvt Ltd. The Karachi, Lahore, Peshawar and Islamabad branches feed a Data Warehouse; the Sales Manager uses query and analysis tools on it to produce reports.]
Problem Scenario 2
A supermarket has a huge operational database. Whenever executives request a report, the OLTP system slows down and the data entry operators have to wait.
[Diagram: Problem. Data entry operators and management share one operational database; the operators wait while a management report runs.]
Solutions for Shopping Mart
• Extract the data needed for analysis from the operational database and store it in a warehouse.
• Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
• The warehouse will hold data with a historical perspective.
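A periodic refresh might look like the sketch below. It assumes, purely for illustration, that operational rows carry an increasing `id`, so each refresh copies only rows the warehouse has not yet seen; the table names `transactions` and `sales_history` and all function names are hypothetical.

```python
import sqlite3

def make_operational_db():
    """Stand-in for the store's OLTP database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transactions (id INTEGER PRIMARY KEY, item TEXT, amount REAL)"
    )
    return db

def make_warehouse():
    """Stand-in for the warehouse that keeps the historical copy."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE sales_history (id INTEGER PRIMARY KEY, item TEXT, amount REAL)"
    )
    return wh

def refresh(oltp, wh):
    """Copy only rows newer than the warehouse's highest id; return the count."""
    last_id = wh.execute(
        "SELECT COALESCE(MAX(id), 0) FROM sales_history"
    ).fetchone()[0]
    new_rows = oltp.execute(
        "SELECT id, item, amount FROM transactions WHERE id > ?", (last_id,)
    ).fetchall()
    wh.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", new_rows)
    wh.commit()
    return len(new_rows)

oltp = make_operational_db()
wh = make_warehouse()
oltp.executemany(
    "INSERT INTO transactions (item, amount) VALUES (?, ?)",
    [("milk", 2.5), ("bread", 1.2)],
)
first = refresh(oltp, wh)    # initial load copies both rows
oltp.execute("INSERT INTO transactions (item, amount) VALUES (?, ?)", ("eggs", 3.0))
second = refresh(oltp, wh)   # the next scheduled refresh copies only the new row
total = wh.execute("SELECT COUNT(*) FROM sales_history").fetchone()[0]
```

Reports then run against `sales_history`, so the data entry operators are no longer blocked by management queries.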
[Diagram: Solution. Data entry operators record transactions in the operational database; data is extracted into a Data Warehouse, from which the manager's reports are produced.]
Need for Data Warehousing
• Industry holds huge amounts of operational data.
• Knowledge workers want to turn this data into useful information.
• They use this information to support strategic decision making.
• A data warehouse is a platform for consolidated historical data for analysis.
• It stores data of good quality so that knowledge workers can make correct decisions.
• From a business perspective:
  • It is the latest marketing weapon.
  • It helps keep customers by learning more about their needs.
  • It is a valuable tool in today's competitive, fast-evolving world.
Why Mine Data? Commercial Viewpoint
• Lots of data is being collected and warehoused:
  • Web data, e-commerce
  • Purchases at department and grocery stores
  • Bank and credit card transactions
• Computers have become cheaper and more powerful.
• Competitive pressure is strong:
  • Provide better, customized services for an edge (e.g. in Customer Relationship Management)
Why Mine Data? Scientific Viewpoint
• Data is collected and stored at enormous speeds (GB/hour):
  • Remote sensors on a satellite
  • Telescopes scanning the skies
  • Microarrays generating gene expression data
  • Scientific simulations generating terabytes of data
What Are Data Mining and Data Warehousing?
• Definition of a data warehouse
• Uses of data warehouses
• Definition of data mining
• Uses of data mining
• Data warehousing versus data mining
• Examples
What is Data Warehousing?
Data warehousing is the process of centralizing or aggregating data from multiple sources into one common repository.
It is a process of transforming data into information and making it available to users in a timely enough manner to make a difference.
[Diagram: data is transformed into information.]
Data Warehousing Uses
• Reporting and data analysis.
• Data warehouses store current as well as historical data and are used to create trend reports for senior management, such as annual and quarterly comparisons.
What is Data Mining?
Data mining is the process of discovering new information, in the form of patterns or rules, from vast amounts of data. It involves methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.
• It extracts information and transforms it into an understandable structure.
• It uses past data to analyze the outcome of a particular problem or situation.
Data Mining Uses
• Companies use it to decide on marketing strategies for their products.
• They can use data to compare and contrast themselves with competitors.
• Data mining turns data into real-time analysis that can be used to:
  • increase sales,
  • promote new products,
  • or drop products that add no value to the company.
Data Mining Works with Warehouse Data
• Data warehousing provides the enterprise with a memory.
• Data mining provides the enterprise with intelligence.
Data Warehousing vs. Data Mining
Data Warehousing
• Occurs before any data mining process.
• Data warehousing is the process of compiling and organizing data into one common database.
Data Mining
• Relies on warehoused data to detect meaningful patterns.
• Data mining is the process of extracting meaningful data from that database.
Examples of Data Mining and Data Warehousing
• Credit card fraud detection.
• Collecting data on shoppers to find patterns in their shopping habits.
• A great example of data warehousing that everyone can relate to is what Facebook does.
History of Data Mining and Data Warehousing
• Data warehouse history
• Data mining history
History of the Data Warehouse
• 1960s: General Mills and Dartmouth College, in a joint research project, develop the terms "dimensions" and "facts".
• 1970s: ACNielsen and IRI provide dimensional data marts for retail sales.
• 1970s: Bill Inmon begins to define and discuss the term "data warehouse".
• 1975: Sperry Univac introduces MAPPER (MAintain, Prepare, and Produce Executive Reports), a database management and reporting system that includes the world's first 4GL.
• 1983: Teradata introduces a database management system specifically designed for decision support.
• 1983: Martyn Richard Jones of Sperry Corporation defines the Sperry Information Center approach, which, while not a true data warehouse in the Inmon sense, contained many of the characteristics of data warehouse structures.
• 1984: Metaphor Computer Systems releases the Data Interpretation System (DIS), a hardware/software package and GUI for business users to create a database management and analytic system.
• 1988: Barry Devlin and Paul Murphy publish an article in the IBM Systems Journal in which they introduce the term "business data warehouse".
• 1990: Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing.
• 1991: Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.
• 1992: Bill Inmon publishes the book Building the Data Warehouse.
• 1995: The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
• 1996: Ralph Kimball publishes the book The Data Warehouse Toolkit.
• 2000: Daniel Linstedt releases the Data Vault, enabling real-time auditable data warehouses.
Brief History of Data Mining
• The term "data mining" was introduced in the 1990s.
• The roots of data mining can be traced through classical statistics, artificial intelligence, and machine learning.
• Statistics is the foundation of most technologies on which data mining is built. Its techniques are used to study data and data relationships.
• Artificial intelligence (AI), which is built upon heuristics as opposed to statistics, attempts to apply human-thought-like processing to statistical problems. AI concepts were adopted in the query processors of RDBMSs.
• Machine learning is the union of statistics and AI. It can be considered an evolution of AI, because it blends AI heuristics with advanced statistical analysis.
Data Mining Techniques
• Tasks of data mining
• Applications of data mining
Processes Used in Data Mining
Data mining uses two kinds of methods:
• Prediction methods
• Description methods
How It Works
• Data mining involves six common tasks:
  o Classification [Predictive]
  o Clustering [Descriptive]
  o Association Rule Discovery [Descriptive]
  o Sequential Pattern Discovery [Descriptive]
  o Regression [Predictive]
  o Deviation Detection [Predictive]
Anomaly Detection
• What is anomaly detection?
• Types of anomaly detection:
  • Unsupervised anomaly detection
  • Supervised anomaly detection
  • Semi-supervised anomaly detection
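As a rough illustration of the unsupervised case, the sketch below flags values that lie far from the mean in units of standard deviation (a z-score rule). The rule, the 2.5 threshold and the sample transaction amounts are assumptions chosen for brevity; real systems use many other detectors.

```python
from statistics import mean, pstdev

def zscore_anomalies(values, threshold=2.5):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts with one unusually large purchase.
amounts = [52, 48, 50, 51, 49, 50, 47, 53, 50, 500]
anomalies = zscore_anomalies(amounts)
```

No labels are needed: the purchase of 500 is flagged only because it deviates from the bulk of the data.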
Association Rule Learning
• What is association rule learning?
• Examples:
  • In a supermarket
  • Inventory management
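For the supermarket case, a rule such as {diapers} --> {beer} is usually judged by its support (how often the items occur together) and its confidence (how often the consequent appears when the antecedent does). A minimal sketch, with hypothetical baskets and function names:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
rule_support = support(baskets, {"diapers", "beer"})
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})
```

Here diapers and beer appear together in 3 of 5 baskets, and 3 of the 4 baskets with diapers also contain beer.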
Classification
• What is it?
  • Given a collection of records (the training set), find a model for the class attribute as a function of the values of the other attributes.
  • Goal: previously unseen records should be assigned a class as accurately as possible.
• Example:
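One simple way to make the definition concrete is a nearest-neighbour rule: an unseen record receives the class of the closest training record. This is only one of many possible models, chosen here for brevity; the two-attribute records and class labels are hypothetical.

```python
def nearest_neighbor_classify(training_set, record):
    """Assign `record` the class of its closest training record.

    `training_set` is a list of (attributes, class_label) pairs, where
    attributes is a tuple of numbers of the same length as `record`.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(training_set, key=lambda row: squared_distance(row[0], record))
    return label

# Hypothetical training set: (attribute values, class).
training = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((4.0, 4.2), "high"),
    ((4.5, 3.9), "high"),
]
prediction_a = nearest_neighbor_classify(training, (1.1, 0.9))
prediction_b = nearest_neighbor_classify(training, (4.2, 4.0))
```

The two unseen records fall next to the "low" and "high" groups respectively and are labelled accordingly.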
Clustering
• What is it?
• Example:
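Clustering groups similar records without any class labels. A small k-means sketch on one-dimensional data shows the idea; the choice of k-means, the starting centers and the data points are assumptions made for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on numbers: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical measurements that form two visible groups.
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

The algorithm settles on one center per group even though the starting centers were arbitrary guesses.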
Sequential Pattern Discovery
• What is it?
• Example:
  • In point-of-sale transaction sequences:
    • Computer bookstore: (Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies, Tcl_Tk)
    • Athletic apparel store: (Shoes) (Racket, Racquetball) --> (Sports_Jacket)
  • General form: (A B) (C) --> (D E)
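A pattern such as (A B) (C) occurs in a customer's history when some purchase contains both A and B and a later purchase contains C. The sketch below counts how many customers' sequences contain a pattern; the customer histories and function names are hypothetical, with item names borrowed from the bookstore example.

```python
def contains(sequence, pattern):
    """True if the itemsets of `pattern` occur, in order, inside successive
    (not necessarily adjacent) purchases of `sequence`."""
    pos = 0
    for itemset in pattern:
        # Advance to the next purchase that includes this whole itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def sequence_support(database, pattern):
    """Fraction of customer sequences that contain `pattern`."""
    return sum(contains(seq, pattern) for seq in database) / len(database)

# Hypothetical purchase histories, one list of purchases per customer.
histories = [
    [{"Intro_To_Visual_C"}, {"C++_Primer"}, {"Perl_for_dummies", "Tcl_Tk"}],
    [{"Intro_To_Visual_C"}, {"Perl_for_dummies"}],
    [{"C++_Primer"}, {"Intro_To_Visual_C"}],
    [{"Intro_To_Visual_C"}, {"C++_Primer"}, {"Tcl_Tk"}],
]
antecedent = [{"Intro_To_Visual_C"}, {"C++_Primer"}]
full_rule = antecedent + [{"Perl_for_dummies", "Tcl_Tk"}]
antecedent_support = sequence_support(histories, antecedent)
rule_confidence = sequence_support(histories, full_rule) / antecedent_support
```

Order matters: the third customer bought the same two books in reverse order and therefore does not match the antecedent.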
Regression
• What is it?
• Example:
  • PageRank as used by Google
    • Page structure implicitly holds the importance of a page
    • Important pages are linked to by important pages
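As a hedged illustration of regression itself, that is, predicting a numeric value from another attribute, the sketch below fits a straight line by ordinary least squares. The advertising and sales figures and the function name are hypothetical and are not taken from the deck.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (x) against units sold (y).
spend = [1, 2, 3, 4]
units = [3, 5, 7, 9]
slope, intercept = fit_line(spend, units)
forecast = slope * 5 + intercept  # predicted units for a spend of 5
```

The fitted line can then be used to predict the target for values that were never observed.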
Applications of Data Mining
• Sales and marketing
• Banking and finance
• Health care and insurance
• Transportation
• Medicine
Data Mining Applications in Sales/Marketing
• Enables businesses to understand the hidden patterns inside historical purchasing transactions
• Market basket analysis
• Identify customer behavior
Data Mining Applications in Banking/Finance
• Credit card fraud detection
• Identify customer loyalty
• Identify stock trading rules
• Identify users by method of payment or transaction
Data Mining Applications in Health Care and Insurance
• Claims analysis
• Forecasts of customers
• Detect risky customers
• Detect fraudulent behavior
Data Mining Applications in Transportation
• Determine distribution schedules
Data Mining Applications in Medicine
• Characterize patient activities
• Identify patterns
Data Warehousing Techniques
• Star schema
• Elements
• Example
• Star schema vs. snowflake schema
Star Schema
• The star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions.
• A star schema is diagrammed by surrounding each fact with its associated dimensions.
• The resulting diagram resembles a star.
• Star schemas are optimized for querying large data sets and are used in data warehouses and data marts to support OLAP cubes, business intelligence and analytic applications, and queries.
Elements of a Star Schema
Dimension tables
• A dimension contains reference information about the fact, such as date, product, or customer.
• Denormalized, decoded and cleaned set of descriptive data elements.
• Geography dimension tables describe location data, such as country, state, or city.
• Employee dimension tables describe employees, such as salespeople.
Fact tables
• A fact is an event that is counted or measured, such as a sale or login.
• Contain foreign keys referencing dimension records.
• Contain either additive or semi-additive measures for analysis.
Example
• Each dimension table has a primary key on its Id column, relating to one of the columns (viewed as rows in the example schema) of the Fact_Sales table's three-column (compound) primary key (Date_Id, Store_Id, Product_Id).
• The non-primary-key Units_Sold column of the fact table in this example represents a measure or metric that can be used in calculations and analysis.
• The non-primary-key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim_Date dimension).
• For example, the following query answers how many TV sets have been sold, for each brand and country, in 1997:

    SELECT P.Brand, S.Country, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Date D ON F.Date_Id = D.Id
    INNER JOIN Dim_Store S ON F.Store_Id = S.Id
    INNER JOIN Dim_Product P ON F.Product_Id = P.Id
    WHERE D.Year = 1997 AND P.Product_Category = 'tv'
    GROUP BY P.Brand, S.Country
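To try the query, the schema can be built in an in-memory SQLite database. The table and column names follow the example above; the column types, the sample rows and the added ORDER BY (used only to make the output order predictable) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Date    (Id INTEGER PRIMARY KEY, Year INTEGER);
CREATE TABLE Dim_Store   (Id INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Product (Id INTEGER PRIMARY KEY, Brand TEXT, Product_Category TEXT);
CREATE TABLE Fact_Sales (
    Date_Id    INTEGER REFERENCES Dim_Date(Id),
    Store_Id   INTEGER REFERENCES Dim_Store(Id),
    Product_Id INTEGER REFERENCES Dim_Product(Id),
    Units_Sold INTEGER,
    PRIMARY KEY (Date_Id, Store_Id, Product_Id)
);
-- Hypothetical sample rows.
INSERT INTO Dim_Date VALUES (1, 1997), (2, 1998);
INSERT INTO Dim_Store VALUES (1, 'Pakistan'), (2, 'Germany');
INSERT INTO Dim_Product VALUES
    (1, 'BrandA', 'tv'), (2, 'BrandB', 'tv'), (3, 'BrandA', 'radio');
INSERT INTO Fact_Sales VALUES
    (1, 1, 1, 10), (1, 2, 1, 4), (1, 1, 2, 7), (2, 1, 1, 99), (1, 1, 3, 5);
""")

# The query from the example, with ORDER BY added for a stable row order.
rows = conn.execute("""
SELECT P.Brand, S.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE D.Year = 1997 AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
ORDER BY P.Brand, S.Country
""").fetchall()
```

The 1998 sale and the radio sale are filtered out by the WHERE clause, so only 1997 TV sales are summed per brand and country.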
Star Schema vs. Snowflake Schema
• Ease of maintenance/change
  • Snowflake schema: no redundancy, hence easier to maintain and change.
  • Star schema: has redundant data, hence less easy to maintain and change.
• Ease of use
  • Snowflake schema: more complex queries, hence less easy to understand.
  • Star schema: less complex queries, easy to understand.
• Query performance
  • Snowflake schema: more foreign keys, hence longer query execution time.
  • Star schema: fewer foreign keys, hence shorter query execution time.
• Normalization
  • Snowflake schema: has normalized tables.
  • Star schema: has denormalized tables.
• Type of data warehouse
  • Snowflake schema: good for a data warehouse core, to simplify complex relationships (many:many).
  • Star schema: good for data marts with simple relationships (1:1 or 1:many).
• Joins
  • Snowflake schema: higher number of joins.
  • Star schema: fewer joins.
• Dimension table
  • Snowflake schema: may have more than one dimension table for each dimension.
  • Star schema: contains only a single dimension table for each dimension.
• When to use
  • Snowflake schema: when the dimension table is relatively big, snowflaking is better as it reduces space.
  • Star schema: when the dimension table contains fewer rows, a star schema is a good choice.
Thank you For Your Attention
Any Questions
Presented by
Engr.Faizan Saleem
Software Engineer
Bahria University Karachi Campus
faizansaleem2803@yahoo.com
www.facebook.com/faiz.saleem

Data warehousing and Data mining

  • 1. Data Mining and Data Warehousing Techniques. Presented to: Muhammad Faisal. Presented by: Faizan Saleem, Pireh Pirzada, Ahmed Hassan, Muhammad Usman. BSE-4 | DATABASE MANAGEMENT SYSTEM
  • 2. Topics
      • Why do we need data warehouses and data mining?
      • What are data warehouses and data mining?
      • History of data warehouses and data mining
      • Techniques of data warehouses and data mining
  • 3. Why we need Data Mining and Warehousing
      • Problem scenario
      • Solution
      • Needs of data warehouses and data mining
  • 4. Why Data Warehouse? Necessity is the mother of invention.
  • 6. Problem Scenario 1: ABC Pvt Ltd is a company with branches in Karachi, Lahore, Peshawar and Islamabad. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.
  • 7. Diagram: the ABC Pvt Ltd. branches (Karachi, Lahore, Peshawar, Islamabad) and the Sales Manager, who asks for sales per item type per branch for the first quarter.
  • 8. Solution for ABC Pvt Ltd.: extract sales information from each branch database and store it in a common repository at a single site.
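
The consolidation step described for ABC Pvt Ltd. can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `sales` and `warehouse_sales` tables, their columns, and all figures are hypothetical, and in-memory SQLite databases stand in for the four branch systems.

```python
import sqlite3

# Hypothetical branch data as (item_type, quarter, amount); not taken from the slides.
BRANCH_SALES = {
    "Karachi": [("TV", 1, 500.0), ("Radio", 1, 120.0), ("TV", 2, 300.0)],
    "Lahore": [("TV", 1, 250.0), ("TV", 1, 150.0)],
    "Peshawar": [("Radio", 1, 80.0)],
    "Islamabad": [("TV", 2, 400.0)],
}


def make_branch_db(rows):
    """Stand-in for one branch's separate operational system."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (item_type TEXT, quarter INTEGER, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db


def load_warehouse(branch_dbs):
    """Extract sales from every branch and store them in one common repository."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE warehouse_sales "
        "(branch TEXT, item_type TEXT, quarter INTEGER, amount REAL)"
    )
    for branch, db in branch_dbs.items():
        rows = db.execute("SELECT item_type, quarter, amount FROM sales").fetchall()
        # Tag each extracted row with the branch it came from.
        wh.executemany(
            "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
            [(branch, *row) for row in rows],
        )
    return wh


def quarterly_report(wh, quarter):
    """Sales per item type per branch for the given quarter."""
    return wh.execute(
        "SELECT branch, item_type, SUM(amount) FROM warehouse_sales "
        "WHERE quarter = ? GROUP BY branch, item_type "
        "ORDER BY branch, item_type",
        (quarter,),
    ).fetchall()


branch_dbs = {name: make_branch_db(rows) for name, rows in BRANCH_SALES.items()}
warehouse = load_warehouse(branch_dbs)
report = quarterly_report(warehouse, 1)
for line in report:
    print(line)
```

The Sales Manager's report now needs a single query against one site instead of four separate extractions.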
  • 9. Solution for ABC Pvt Ltd. (diagram): the Karachi, Lahore, Peshawar and Islamabad systems feed the Data Warehouse; the Sales Manager uses query and analysis tools on it to produce reports.
  • 10. Problem Scenario 2: A shopping supermarket has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and data entry operators have to wait.
  • 11. Problem (diagram): data entry operators and management share the same operational database, so the operators wait while a report runs.
  • 12. Solutions for the Shopping Mart
      • Extract the data needed for analysis from the operational database and store it in a warehouse.
      • Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
      • The warehouse will contain data with a historical perspective.
  • 14. Need for Data Warehousing
      • Industry has huge amounts of operational data.
      • Knowledge workers want to turn this data into useful information.
      • They use this information to support strategic decision making.
  • 15. Need for Data Warehousing
      • It is a platform for consolidated historical data for analysis.
      • It stores good-quality data so that knowledge workers can make correct decisions.
  • 16. Need for Data Warehousing, from a business perspective
      • It is the latest marketing weapon.
      • It helps to keep customers by learning more about their needs.
      • It is a valuable tool in today's competitive, fast-evolving world.
  • 17. Why Mine Data? Commercial Viewpoint
      • Lots of data is being collected and warehoused: web data and e-commerce; purchases at department and grocery stores; bank and credit card transactions.
      • Computers have become cheaper and more powerful.
      • Competitive pressure is strong: provide better, customized services for an edge (e.g. in Customer Relationship Management).
  • 18. Why Mine Data? Scientific Viewpoint
      • Data is collected and stored at enormous speeds (GB/hour): remote sensors on a satellite; telescopes scanning the skies; microarrays generating gene expression data; scientific simulations generating terabytes of data.
  • 19. What is Data Mining and Warehousing?
      • Definition of data warehouse
      • Data warehouse uses
      • Definition of data mining
      • Data mining uses
      • Data warehousing versus data mining
      • Examples
  • 20. What is Data Warehousing? Data warehousing can be said to be the process of centralizing or aggregating data from multiple sources into one common repository. It is a process of transforming data into information and making it available to users in a timely enough manner to make a difference. (Diagram: Data --> Information)
  • 21. Data Warehousing Uses
      • Reporting and data analysis.
      • Data warehouses store current as well as historical data and are used to create trend reports for senior management, such as annual and quarterly comparisons.
  • 23. What is Data Mining? Data mining is the process of discovering new information, in the form of patterns or rules, from vast amounts of data, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.
  • 24. What is Data Mining?
      • It extracts information and transforms it into an understandable structure.
      • It uses past data to analyze the outcome of a particular problem or situation.
  • 25. Data Mining Uses
      • Companies use it to decide on marketing strategies for their products.
      • They can use data to compare and contrast themselves with competitors.
      • Data mining turns data into real-time analysis that can be used to increase sales, promote new products, or drop products that do not add value to the company.
  • 26. Data Mining works with Warehouse Data
      • Data warehousing provides the enterprise with a memory.
      • Data mining provides the enterprise with intelligence.
  • 27. Data Warehousing vs. Data Mining
      • Data warehousing: occurs before any data mining process. It is the process of compiling and organizing data into one common database.
      • Data mining: relies on warehoused data to detect meaningful patterns. It is the process of extracting meaningful data from that database.
  • 28. Examples
      • Data mining: credit card fraud detection.
      • Data mining: collecting data on shoppers to find patterns in their shopping habits.
      • Data warehousing: a great example that everyone can relate to is what Facebook does.
  • 29. History of Data Mining and Warehousing
      • Data warehouse history
      • Data mining history
  • 30. History of Data Warehouse
      • 1960s: General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.
      • 1970s: ACNielsen and IRI provide dimensional data marts for retail sales.
      • 1970s: Bill Inmon begins to define and discuss the term "data warehouse".
  • 31. History of Data Warehouse
      • 1975: Sperry Univac introduces MAPPER (MAintain, Prepare, and Produce Executive Reports), a database management and reporting system that includes the world's first 4GL.
  • 32. History of Data Warehouse
      • 1983: Teradata introduces a database management system specifically designed for decision support.
      • 1983: Martyn Richard Jones of Sperry Corporation defines the Sperry Information Center approach, which, while not a true DW in the Inmon sense, did contain many of the characteristics of DW structures.
  • 33. History of Data Warehouse
      • 1984: Metaphor Computer Systems releases the Data Interpretation System (DIS). DIS was a hardware/software package and GUI for business users to create a database management and analytic system.
  • 34. History of Data Warehouse
      • 1988: Barry Devlin and Paul Murphy publish an article in the IBM Systems Journal in which they introduce the term "business data warehouse".
      • 1990: Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing.
      • 1991: Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.
  • 35. History of Data Warehouse
      • 1992: Bill Inmon publishes the book Building the Data Warehouse.
      • 1995: The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
  • 36. History of Data Warehouse
      • 1996: Ralph Kimball publishes the book The Data Warehouse Toolkit.
      • 2000: Daniel Linstedt releases the Data Vault, enabling real-time, auditable data warehouses.
  • 37. Brief History of Data Mining
      • The term "data mining" was introduced in the 1990s.
      • The roots of data mining can be traced through classical statistics, artificial intelligence, and machine learning.
      • Statistics is the foundation of most of the technologies on which data mining is built. All of these are used to study data and data relationships.
  • 38. Brief History of Data Mining
      • Artificial intelligence, or AI, which is built upon heuristics as opposed to statistics, attempts to apply human-thought-like processing to statistical problems. AI concepts were adopted for the query processors of RDBMSs.
  • 39. Brief History of Data Mining
      • Machine learning is the union of statistics and AI. It could be considered an evolution of AI, because it blends AI heuristics with advanced statistical analysis.
  • 40. Data Mining Techniques
      • Tasks of data mining
      • Applications of data mining
  • 41. Processes Used in Data Mining. Data mining uses two kinds of methods:
      • Prediction methods
      • Description methods
  • 42. How it works. Data mining involves six common tasks:
      • Classification [Predictive]
      • Clustering [Descriptive]
      • Association Rule Discovery [Descriptive]
      • Sequential Pattern Discovery [Descriptive]
      • Regression [Predictive]
      • Deviation Detection [Predictive]
  • 43. Anomaly Detection
      • What is anomaly detection?
      • Types of anomaly detection: unsupervised, supervised, and semi-supervised anomaly detection.
  • 44. Association Rule Learning
      • What is association rule learning?
      • Examples: in a supermarket; inventory management.
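
For the supermarket example, a rule such as "customers who buy diapers also buy beer" is commonly scored by its support and confidence. The slide does not define these measures, so the sketch below uses the standard textbook definitions; the baskets and item names are hypothetical.

```python
# Hypothetical supermarket baskets; each set is one customer transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


rule_support = support({"diapers", "beer"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
print(rule_support, rule_confidence)
```

A rule with high support and high confidence is a candidate for shelf placement or joint promotion; a real miner (for example the Apriori algorithm) would search over all itemsets rather than score one hand-picked rule.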
  • 45. Classification: what is it?
      • Given a collection of records (the training set), find a model for the class attribute as a function of the values of the other attributes.
      • Goal: previously unseen records should be assigned a class as accurately as possible.
      • Example: (not captured in this transcript)
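
A minimal sketch of this idea, assuming a 1-nearest-neighbour model (one of many possible classifiers, not necessarily the one the presenters had in mind). The attributes (income in thousands, years employed) and the class labels are hypothetical.

```python
def nearest_neighbor_classify(training_set, record):
    """Assign `record` the class of the closest training record.

    `training_set` is a list of (attributes, class_label) pairs.
    Closeness is squared Euclidean distance over the attribute values.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(training_set, key=lambda row: squared_distance(row[0], record))
    return label


# Hypothetical training set: (income in thousands, years employed) -> loan decision.
training_set = [
    ((20, 1), "reject"),
    ((25, 2), "reject"),
    ((60, 8), "approve"),
    ((75, 10), "approve"),
]

# Previously unseen records are assigned a class by the model.
print(nearest_neighbor_classify(training_set, (70, 9)))
print(nearest_neighbor_classify(training_set, (22, 1)))
```

In practice the model would be evaluated on a held-out test set to measure how accurately unseen records are classified.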
  • 46. Clusters
      • What is it?
      • Example: (not captured in this transcript)
  • 47. Sequential Pattern Discovery
      • What is it?
      • Example, in point-of-sale transaction sequences:
          Computer Bookstore: (Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies, Tcl_Tk)
          Athletic Apparel Store: (Shoes) (Racket, Racketball) --> (Sports_Jacket)
      • General form of a rule: (A B) (C) --> (D E)
  • 48. Regression
      • What is it?
      • Example: PageRank as used by Google
          • Page structure implicitly holds the importance of a page.
          • Important pages are linked to by important pages.
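
The statement "important pages are linked to by important pages" can be illustrated with a small power-iteration sketch of PageRank. This is a simplified textbook formulation, not Google's production algorithm; the damping factor of 0.85, the iteration count, and the three-page link graph are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to. Every linked
    page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(ranks)
```

Page C ends up with the highest rank because it is linked to by both other pages, and A ranks above B because its single inbound link comes from the important page C.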
  • 49. Applications of Data Mining
      • Data mining applications in sales/marketing
      • Data mining applications in banking/finance
      • Data mining applications in health care and insurance
      • Data mining applications in transportation
      • Data mining applications in medicine
  • 50. Data Mining Applications in Sales/Marketing
      • Enables businesses to understand the hidden patterns inside historical purchasing transactions
      • Market basket analysis
      • Identify customers' behavior
  • 51. Data Mining Applications in Banking/Finance
      • Credit card fraud detection
      • Identify customer loyalty
      • Identify stock trading rules
      • Identify users by method of payment/transaction
  • 52. Data Mining Applications in Health Care and Insurance
      • Claims analysis
      • Forecasts of customers
      • Detect risky customers
      • Fraudulent behavior
  • 53. Data Mining Applications in Transportation
      • Determine the distribution schedules
  • 54. Data Mining Applications in Medicine
      • Characterize patient activities
      • Identify the patterns
  • 56. Star Schema
      • The star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions.
      • A star schema is diagrammed by surrounding each fact with its associated dimensions.
      • The resulting diagram resembles a star.
      • Star schemas are optimized for querying large data sets and are used in data warehouses and data marts to support OLAP cubes, business intelligence and analytic applications, and queries.
  • 57. Elements of a Star Schema: Dimension Tables
      • A dimension contains reference information about the fact, such as date, product, or customer.
      • A dimension table is a denormalized, decoded and cleaned set of descriptive data elements.
      • Geography dimension tables describe location data, such as country, state, or city.
      • Employee dimension tables describe employees, such as salespeople.
  • 58. Fact Tables
      • A fact is an event that is counted or measured, such as a sale or login.
      • A fact table contains foreign keys referencing dimension records.
      • A fact table contains either additive or semi-additive measures for analysis.
  • 60. Example
      • Each dimension table has a primary key on its Id column, relating to one of the columns (viewed as rows in the example schema) of the Fact_Sales table's three-column (compound) primary key (Date_Id, Store_Id, Product_Id).
      • The non-primary key Units_Sold column of the fact table in this example represents a measure or metric that can be used in calculations and analysis.
      • The non-primary key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim_Date dimension).
      • For example, the following query answers how many TV sets have been sold, for each brand and country, in 1997:
          SELECT P.Brand, S.Country, SUM(F.Units_Sold)
          FROM Fact_Sales F
          INNER JOIN Dim_Date D ON F.Date_Id = D.Id
          INNER JOIN Dim_Store S ON F.Store_Id = S.Id
          INNER JOIN Dim_Product P ON F.Product_Id = P.Id
          WHERE D.YEAR = 1997
          AND P.Product_Category = 'tv'
          GROUP BY P.Brand, S.Country
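
To try the query from this example, the schema can be built in an in-memory SQLite database from Python. The table and column names follow the slide; the column types, the sample rows, and the added ORDER BY clause (included only to make the output order predictable) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Dim_Date    (Id INTEGER PRIMARY KEY, Year INTEGER);
    CREATE TABLE Dim_Store   (Id INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE Dim_Product (Id INTEGER PRIMARY KEY, Brand TEXT, Product_Category TEXT);
    CREATE TABLE Fact_Sales (
        Date_Id    INTEGER REFERENCES Dim_Date(Id),
        Store_Id   INTEGER REFERENCES Dim_Store(Id),
        Product_Id INTEGER REFERENCES Dim_Product(Id),
        Units_Sold INTEGER,
        PRIMARY KEY (Date_Id, Store_Id, Product_Id)  -- compound key, as on the slide
    );
    """
)

# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Dim_Date VALUES (?, ?)", [(1, 1997), (2, 1998)])
conn.executemany("INSERT INTO Dim_Store VALUES (?, ?)", [(1, "Pakistan"), (2, "Germany")])
conn.executemany(
    "INSERT INTO Dim_Product VALUES (?, ?, ?)",
    [(1, "Sony", "tv"), (2, "LG", "tv"), (3, "Sony", "radio")],
)
conn.executemany(
    "INSERT INTO Fact_Sales VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 10),   # 1997, Pakistan, Sony tv
        (1, 2, 1, 5),    # 1997, Germany, Sony tv
        (1, 1, 2, 7),    # 1997, Pakistan, LG tv
        (2, 1, 1, 100),  # 1998, excluded by the year filter
        (1, 1, 3, 50),   # radio, excluded by the category filter
    ],
)

rows = conn.execute(
    """
    SELECT P.Brand, S.Country, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Date D ON F.Date_Id = D.Id
    INNER JOIN Dim_Store S ON F.Store_Id = S.Id
    INNER JOIN Dim_Product P ON F.Product_Id = P.Id
    WHERE D.Year = 1997
      AND P.Product_Category = 'tv'
    GROUP BY P.Brand, S.Country
    ORDER BY P.Brand, S.Country
    """
).fetchall()
for row in rows:
    print(row)
```

Each join follows one foreign key from the fact table out to a dimension, which is why star-schema queries stay short even as more dimensions are added.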
  • 61. Snowflake Schema vs. Star Schema
      • Ease of maintenance/change
          Snowflake: no redundancy, and hence easier to maintain and change.
          Star: has redundant data, and hence less easy to maintain and change.
      • Ease of use
          Snowflake: more complex queries, and hence less easy to understand.
          Star: less complex queries that are easy to understand.
      • Query performance
          Snowflake: more foreign keys, and hence longer query execution time.
          Star: fewer foreign keys, and hence shorter query execution time.
      • Normalization
          Snowflake: has normalized tables.
          Star: has denormalized tables.
  • 62. Snowflake Schema vs. Star Schema (continued)
      • Type of data warehouse
          Snowflake: good for the data warehouse core, to simplify complex relationships (many:many).
          Star: good for data marts with simple relationships (1:1 or 1:many).
      • Joins
          Snowflake: higher number of joins.
          Star: fewer joins.
      • Dimension table
          Snowflake: may have more than one dimension table for each dimension.
          Star: contains only a single dimension table for each dimension.
      • When to use
          Snowflake: when the dimension table is relatively big, snowflaking is better as it reduces space.
          Star: when the dimension table contains fewer rows, a star schema is a good choice.
  • 64. Thank you for your attention. Any questions?
  • 65. Presented by Engr. Faizan Saleem, Software Engineer, Bahria University Karachi Campus. faizansaleem2803@yahoo.com www.facebook.com/faiz.saleem

Editor's notes

  1. Data warehousing can be said to be the process of centralizing or aggregating data from multiple sources into one common repository. It is a process of transforming data into information and making it available to users in a timely enough manner to make a difference.
  2. Statistics: e.g. regression analysis, standard distribution, standard deviation, etc.
  3. Machine learning attempts to let computer programs learn about the data they study, so that programs make different decisions based on the qualities of the studied data, using statistics for fundamental concepts and adding more advanced AI heuristics and algorithms to achieve its goals. Data mining, in many ways, is fundamentally the adaptation of machine learning techniques to business applications. Data mining is best described as the union of historical and recent developments in statistics, AI, and machine learning. These techniques are then used together to study data and find previously hidden trends or patterns within it.
  4. Data Mining Applications in Sales/Marketing. Data mining enables businesses to understand the hidden patterns inside historical purchasing transaction data, thus helping them plan and launch new marketing campaigns in a prompt and cost-effective way. The following are several data mining applications in sales and marketing. Data mining is used for market basket analysis to provide information on what product combinations were purchased together, when they were bought, and in what sequence. This information helps businesses promote their most profitable products and maximize profit. In addition, it encourages customers to purchase related products that they may have missed or overlooked. Retail companies use data mining to identify customers' buying behavior patterns.
  5. Several data mining techniques, e.g. distributed data mining, have been researched, modeled and developed to help credit card fraud detection. Data mining is used to identify customer loyalty by analyzing data on customers' purchasing activities, such as the frequency of purchases in a period of time, the total monetary value of all purchases, and when the last purchase was made. After analyzing those dimensions, a relative measure is generated for each customer. The higher the score, the more loyal the customer is, relatively speaking. Data mining is also applied to help banks retain credit card customers. By analyzing past data, data mining can help banks predict which customers are likely to change their credit card affiliation, so that they can plan and launch special offers to retain those customers. Credit card spending by customer groups can be identified by using data mining. Hidden correlations between different financial indicators can be discovered by using data mining. From historical market data, data mining makes it possible to identify stock trading rules.
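
The loyalty measure described in note 5 combines purchase frequency, total monetary value, and the time since the last purchase. A minimal sketch follows; the weights, the customer names, and the purchase histories are all hypothetical, and a real system would calibrate the weighting against its own data.

```python
from datetime import date

# Hypothetical purchase histories: customer -> list of (purchase date, amount).
purchases = {
    "alice": [(date(2024, 1, 5), 40.0), (date(2024, 2, 10), 60.0), (date(2024, 3, 1), 50.0)],
    "bob": [(date(2023, 6, 1), 200.0)],
}


def loyalty_score(history, today):
    """Combine frequency, monetary value and recency into one relative score.

    The weights (10 per purchase, 1 per 10 currency units, minus 1 per
    30 days since the last purchase) are arbitrary and for illustration only.
    """
    frequency = len(history)
    monetary = sum(amount for _, amount in history)
    recency_days = (today - max(day for day, _ in history)).days
    return frequency * 10 + monetary / 10 - recency_days / 30


today = date(2024, 3, 31)
scores = {name: loyalty_score(history, today) for name, history in purchases.items()}
print(scores)
```

The higher score marks the customer who buys often and recently, even though the other customer made the single largest purchase.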
  6. The growth of the insurance industry depends entirely on the ability to convert data into knowledge, information or intelligence about customers, competitors and markets. Data mining has been applied in the insurance industry only recently, but it has brought tremendous competitive advantages to the companies that have implemented it successfully. The data mining applications in the insurance industry are listed below. Data mining is applied in claims analysis, such as identifying which medical procedures are claimed together. Data mining makes it possible to forecast which customers will potentially purchase new policies. Data mining allows insurance companies to detect risky customers' behavior patterns. Data mining helps detect fraudulent behavior.
  7. Data mining helps determine the distribution schedules among warehouses and outlets and analyze loading patterns.
  8. Data mining makes it possible to characterize patient activities in order to anticipate incoming office visits. Data mining helps identify the patterns of successful medical therapies for different illnesses.