Knowledge Acquisition In
Decision Making (SQIT 3033)
Izwan Nizal Mohd Shaharanee
SQS 4017/ 6364
nizal@uum.edu.my
izwan.nizal@gmail.com
Course Objective
 To introduce data mining and data warehousing
 To evaluate and understand several data mining techniques
 To build data mining skills by analysing business problems
 To apply the commonly used functions of SAS Enterprise Miner and WEKA to solve data mining problems
 To develop skills in data mining modeling and data analysis with SAS Enterprise Miner and WEKA
Course Content
 Introduction to Knowledge Acquisition, also known as knowledge discovery (3 hours)
 Knowledge Discovery Process (4 hours)
 Pre-processing data (5 hours)
 Predictive Modeling (10 hours)
 Decision Tree, Regression, Neural Network, Rough Set
 Evaluation And Implementation (6 hours)
 Descriptive Modeling (7 hours)
 Clustering, Association Rules
 Data mining ethics (1.5 hours)
 PROJECT PRESENTATION
Course Evaluation
 Assignments
 Case study + Presentations
 Project + Poster Presentations
 Mid Term ? Quizzes ?
 Class PARTICIPATION !!
 Final Exam: 40%
 Coursework (the items above): 60%
Prerequisites
 A basic statistics course such as SQQS2023 (business statistics), plus knowledge of a programming language, SAS, databases, spreadsheets and Web 2.0
 Passion for computer applications
 Willingness to take on challenges
 A sincere heart to understand God's infinite knowledge
 Attendance is compulsory (no skipping class, "tuang kelas")
 Mind your gadgets. Please respect others
Timetable
Please introduce yourself..
http://padlet.com/wall/8yly4q2yu8
Facebook Group
https://www.facebook.com/dataharvester2.0
Youtube Channel + Vimeo Video
izwan nizal
http://www.theage.com.au/it-pro/business-it/data-miners-find-theres-gold-in-them-thar-files-20120511-1yi3q.html
The Age of Big Data
 “The BBC documentary follows people who mine
Big Data, including LAPD police officers who use
data to predict crime, a London scientist/trader
who makes millions with math, and a South
African astronomer who wants to catalog the
entire cosmos.”
 “Data Scientist” is the sexiest job of the 21st
century. The Harvard Business Review made this
claim last October and it seems that everyone
(including your grandmother) has been repeating it
ever since.
Why Knowledge Acquisition?
 Why?
 Data explosion (a tremendous amount of data is available, plus cloud computing)
 Data is being warehoused
 Computing power: Bionic Skin?
 Competitive pressure
Hard disks nowadays have capacities of more than 1 TB
What is Knowledge Acquisition?
 Also known as data mining, knowledge discovery, knowledge extraction, information discovery, information harvesting, etc.
 The process of discovering useful information, hidden patterns or rules in large quantities of data (non-trivial, previously unknown)
 By automatic or semi-automatic means
 At this scale, finding such patterns manually is impractical.
Traditional Approaches
 Traditional database queries: access a database using a well-defined query, for example one written in SQL
 The query output consists of data from the database
 The output is usually a subset of the database
[Figure: diagram with the labels SQL, DBMS and DB]
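To make the point concrete, here is a minimal sketch of a traditional query using Python's built-in sqlite3 module. The students table, its columns and its rows are made up for illustration and are not from the course. The output is just the stored rows that satisfy the condition: nothing new is discovered.

```python
import sqlite3

# Hypothetical in-memory table; the column names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (matric TEXT, program TEXT, cgpa REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("A001", "SQS", 3.7), ("A002", "SQS", 2.4), ("A003", "IT", 3.1)],
)

# A well-defined query: the output is simply a subset of the stored rows.
rows = conn.execute(
    "SELECT matric, cgpa FROM students "
    "WHERE program = ? AND cgpa >= ? ORDER BY matric",
    ("SQS", 3.0),
).fetchall()
print(rows)
conn.close()
```

The query can only return what we already knew how to ask for, which is the contrast with data mining drawn in the following slides.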
Disciplines Of Data Mining
Data mining draws on:
 Information Retrieval
 Algorithms
 Machine Learning
 Visualization
 Statistics
 Database Systems
Data Mining Model & Task
Data Mining
 Predictive
•Classification
•Regression
•Time Series Analysis
•Prediction
 Descriptive
•Clustering
•Summarization
•Association Rules
•Sequence Discovery
Try to relate this to your previous knowledge
 Hmmm… how does data mining differ from forecasting or prediction?
 Are they similar?
Predictive Model
 Makes predictions about values of data using known results found from different data
 Or based on the use of other historical data
 Examples: credit card fraud, breast cancer early warning, terrorist acts, tsunamis, etc.
 In films: Ghost Protocol, Minority Report, Eagle Eye
Predictive Model
 Performs inference on the current data to make predictions.
 We know what to predict (based on historical data)
 Never 100% accurate
 Concentrates on the input-output relationship (x, f(x))
 Typical questions
 Which customers are likely to buy this product in the next four months?
 What kinds of transactions are likely to be fraudulent?
 Who is likely to drop this paper?
Predictive Model
[Figure: scatter plot of Profit (RM) against months. The x points are the current data; the O marked "?" is the future data to be predicted.]
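One simple way to turn the current data into a prediction for a future month is to fit a straight line by ordinary least squares. The sketch below is only an illustration: the monthly profit figures are invented, and they are deliberately perfectly linear so that the result is easy to check by hand. Real data would not fit this cleanly.

```python
# Hypothetical monthly profit figures (RM); not taken from the slides.
months = [1, 2, 3, 4, 5, 6]
profit = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(months, profit)
next_month = 7  # the "future data" point
predicted = slope * next_month + intercept
print(f"Predicted profit for month {next_month}: RM {predicted:.2f}")
```

Here f(x) is the fitted line, so the model is exactly the input-output relationship (x, f(x)) mentioned earlier.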
Descriptive Model
 Identifies patterns or relationships in data.
 Serves as a way to explore the properties of the data examined, not to predict new properties
 Always requires a domain expert
 Examples:
 Segmenting marketing areas
 Profiling student performance
 Profiling Google Play / Apple App Store customers
Descriptive Model
 Discovers new patterns inside the data
 We may have no idea what the data looks like
 Explores the properties of the data examined
 Patterns at various granularities (e.g. student: university -> faculty -> program -> major)
 Typical questions
 What is the data?
 What does it look like?
 What does the data suggest about customer groups for advertising?
Descriptive Model
[Figure: scatter plot of Results against major. Points marked x, o and y form three clusters labelled Group 1, Group 2 and Group 3.]
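Grouping of this kind can be sketched with k-means clustering, one of the descriptive techniques named in this deck. The points below are invented, and seeding the centroids with the first k points is a simplifying assumption (practical implementations usually choose starting points more carefully). No labels are given to the algorithm; the groups come out of the data alone.

```python
def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=10):
    # Assumption: seed the centroids with the first k points (deterministic).
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


# Hypothetical 2-D points forming three visible groups.
points = [
    (1.0, 1.0), (5.0, 5.0), (9.0, 1.0),
    (1.2, 0.8), (5.2, 4.8), (8.8, 1.2),
    (0.8, 1.2), (4.8, 5.2), (9.2, 0.8),
]
labels, centroids = kmeans(points, k=3)
print(labels)
```

What each group means (for example, which kind of student it represents) is not something the algorithm can say, which is why a domain expert is needed.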
View Of DM
 Data To Be Mined
 Data warehouse, WWW, time series, textual, spatial, multimedia, transactional
 Knowledge To Be Mined
 Classification, prediction, summarization, trend
 Techniques Utilized
 Database, machine learning, visualization, statistics
 Applications Adapted
 Marketing, demographic segmentation, stock analysis
DM In Action
 Medical applications (clinical diagnosis, drug analysis)
 Business (marketing segmentation & strategies, insolvency prediction, loan risk assessment)
 Education (online learning)
 Internet (search engines)
 Etc.
Data Mining Methodology
 Hypothesis Testing vs Knowledge Discovery
 Hypothesis Testing
 Top down approach
 Attempts to substantiate or disprove a preconceived idea
 Knowledge Discovery
 Bottom-up approach
 Starts with the data and tries to get it to tell us something we didn't already know
Data Mining Methodology
 Hypothesis Testing
 Generate good ideas
 Determine what data allow these hypotheses to be
tested
 Locate the data
 Prepare the data for analysis
 Build computer models based on the data
 Evaluate the computer models to confirm or reject the hypotheses
Data Mining Methodology
 Knowledge Discovery
 Directed
 Identify sources of pre-classified data
 Prepare the data for analysis
 Select appropriate KD techniques based on the data characteristics and the data mining goal
 Divide the data into training, testing and evaluation sets
 Use the training dataset to build the model
 Tune the model by applying it to the test dataset
 Take action based on the data mining results
 Measure the effect of the action taken
 Restart the DM process, taking advantage of the new data generated by the action taken
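The "divide data" step can be sketched as below. The 60/20/20 proportions, the fixed seed and the split_dataset helper are assumptions made for illustration; the slides do not prescribe particular ratios.

```python
import random


def split_dataset(records, train_pct=60, test_pct=20, seed=42):
    """Shuffle, then cut into training, testing and evaluation parts.

    Whatever is left after the training and testing parts becomes the
    evaluation part.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )


records = list(range(100))  # stand-in for 100 pre-classified records
train_set, test_set, eval_set = split_dataset(records)
print(len(train_set), len(test_set), len(eval_set))
```

Shuffling before cutting matters: if the records are stored in some order (by year, by program), an unshuffled split would give the model a training set that does not resemble the test set.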
Data Mining Methodology
 Knowledge Discovery
 Undirected
 Identify available data sources
 Prepare the data for analysis
 Select appropriate undirected KD techniques based on the data characteristics and the data mining goal
 Use the selected technique to uncover hidden structure in the data
 Identify potential targets for directed KD
 Generate new hypotheses to test
Revision::
Two Approaches in Data Mining
Data Mining
 Predictive: predict the future value
•Classification
•Regression
•Time Series Analysis
•Prediction
 Descriptive: define relationships among data
•Clustering
•Summarization
•Association Rules
•Sequence Discovery
Knowledge Discovery Process
 1.0 Selection
 The data needed for the data mining process may be obtained from many different, heterogeneous data sources
 Examples
 Business Transactions
 Scientific Data
 Video and pictures
 UUM Student Database
Knowledge Discovery Process
 2.0 Pre Processing
 Main idea: to ensure that the data is clean (high-quality data).
 The data to be used by the process may contain incorrect or missing values.
 There may be anomalous data from multiple sources involving different data types and metrics
 Erroneous data may be corrected or removed, whereas missing data must be supplied or predicted (often using data mining tools)
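A minimal sketch of two common cleaning steps, removing replicated records and supplying missing values, is shown below. The records are invented, and filling a missing grade with the mean of the known grades is only one simple choice among many.

```python
# Hypothetical student records; None marks a missing grade.
records = [
    {"matric": "A001", "grade": 80.0},
    {"matric": "A002", "grade": None},
    {"matric": "A001", "grade": 80.0},  # replicated record
    {"matric": "A003", "grade": 60.0},
]

# Remove replicated records, keeping the first occurrence of each matric number.
seen = set()
unique = []
for rec in records:
    if rec["matric"] not in seen:
        seen.add(rec["matric"])
        unique.append(dict(rec))  # copy so the raw records stay untouched

# Supply missing grades with the mean of the known grades.
known = [r["grade"] for r in unique if r["grade"] is not None]
mean_grade = sum(known) / len(known)
for r in unique:
    if r["grade"] is None:
        r["grade"] = mean_grade

print(unique)
```

Mean imputation keeps the record but flattens its variation; predicting the missing value from the other attributes is the more careful alternative that the slide alludes to.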
Knowledge Discovery Process
 3.0 Transformation
 Data from different sources must be converted
into a common format for processing
 Some data may be encoded or transformed into
more usable formats
 Examples:
 Data Cleaning, Data Integration, Data Transformation, Data Reduction and Data Discretization
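Three typical transformations can be sketched in a few lines: rescaling numbers to a common range, discretizing a number into a category, and encoding a category as numbers. The function names, the cut-off scores and the program codes are all illustrative assumptions.

```python
def min_max(values):
    """Rescale numbers to the range [0, 1] (assumes they are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def discretize(score):
    """Map a numeric score to a category; the cut-offs are illustrative only."""
    if score >= 80:
        return "high"
    if score >= 50:
        return "medium"
    return "low"


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector, e.g. for a neural network."""
    return [1 if value == c else 0 for c in categories]


scores = [40, 55, 70, 100]
scaled = min_max(scores)
levels = [discretize(s) for s in scores]
encoded = one_hot("IT", ["SQS", "IT", "BBA"])
print(scaled, levels, encoded)
```

Which transformation is needed depends on the algorithm that follows: a neural network needs numeric input, while some rule-based methods work better on discretized categories.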
Knowledge Discovery Process
 4.0 Data Mining
 Main idea: to use intelligent methods to extract patterns and knowledge from the database
 This step applies algorithms to the transformed data to generate the desired results.
 The heart of the KD process (where unknown patterns are revealed).
 Examples of algorithms: Regression (classification, prediction), Neural Networks (prediction, classification, clustering), the Apriori algorithm (association rules), K-Means (clustering), K-Nearest Neighbor (classification), Decision Tree (classification), Instance-Based Learning (classification).
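As a small taste of association rules, the sketch below computes support and confidence, the two measures that the Apriori algorithm is built on. It is not the full Apriori algorithm (there is no level-by-level candidate pruning), and the transactions and the 0.5 minimum support are invented.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


min_support = 0.5  # assumed threshold
items = sorted(set().union(*transactions))
frequent_pairs = [
    pair for pair in combinations(items, 2) if support(set(pair)) >= min_support
]

# Confidence of the rule bread -> milk: support(bread, milk) / support(bread).
confidence = support({"bread", "milk"}) / support({"bread"})
print(frequent_pairs, round(confidence, 2))
```

Checking every pair works for a handful of items but grows quickly with the number of items, which is the problem Apriori's pruning addresses.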
Knowledge Discovery Process
 5.0 Interpretation/Evaluation
 How the data mining results are presented to the
users is extremely important because the
usefulness of the results is dependent on it
 Examples of visualization techniques:
 Graphical
 Geometric
 Icon-based
 Pixel-based
 Hierarchical
 Hybrid
Case Study: Predicting SQS Final
Year’s Student Performance
[Figure: the knowledge discovery process applied to the case study, step by step]
 Student database {contains 30,000 records}: academics, activities
 Selection: selected records {matric, PMK, grades}, only 2,000 records (contains incomplete records, etc.)
 Pre-processing: clean records {replace the missing values, remove the replicated records}
 Transformation: using neural networks, so transform the data into numerical form
 Data mining: generated model (pattern for prediction), Y = w1x1 + w2x2 + b1
 Interpretation & evaluation: testing result 90% correct, so accept the model
 Knowledge (apply the model)
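The last two steps of the case study, testing the generated model and deciding whether to accept it, can be sketched as follows. Everything numeric here is hypothetical: the weights, the bias, the test records, the rule "predict a pass when Y >= 0" and the 90% acceptance threshold are assumptions chosen to mirror the figure, not the actual model.

```python
# Hypothetical weights and bias; in practice they come from training.
w1, w2, b1 = 0.5, 0.5, -50.0


def predict_pass(x1, x2):
    """Y = w1*x1 + w2*x2 + b1; predict a pass when Y >= 0 (assumed rule)."""
    return w1 * x1 + w2 * x2 + b1 >= 0


# Hypothetical test set: (x1, x2, did the student actually pass?).
test_set = [
    (80, 70, True), (40, 30, False), (60, 55, True), (45, 50, False),
    (90, 85, True), (30, 20, False), (55, 50, True), (20, 60, False),
    (70, 40, True), (52, 45, True),
]

correct = sum(predict_pass(x1, x2) == actual for x1, x2, actual in test_set)
accuracy = correct / len(test_set)
accept_model = accuracy >= 0.9  # acceptance threshold is an assumption
print(f"{correct}/{len(test_set)} correct, accept model: {accept_model}")
```

The test records must not have been used to build the model; otherwise the accuracy figure says little about how the model will behave on new students.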
Assignment 1
 Individual assignment: you may be selected at random to present your answer (2 minutes max)
 Discuss how data mining applications can be utilized in construction projects.
 You may discuss:
 An appropriate example of a construction issue, e.g. weather forecasting to determine the development time, housing loan issues
 How can it be done?
 At least 2 pages, 1.5 spacing (include your references and proper citations)
 Due date: 6 March 2014
 Title page (must include the due date)
 Upload your assignment through Scribd (you may need to register) and share the document with me
 No need to print it out. Share the document. Save the planet. Go green

Introduction to Data Mining

  • 1. Knowledge Acquisition In Decision Making (SQIT 3033) Izwan Nizal Mohd Shaharanee SQS 4017/ 6364 nizal@uum.edu.my izwan.nizal@gmail.com
  • 2. Course Objective  To introduce :: knowledge about data mining and data warehouse  To evaluate and understand several data mining techniques  To enhance skill on data mining through analysis problem in business  Being able to apply the commonly used functions of SAS Enterprise Miner and WEKA to solve data mining problems  Developing the skills of data mining modeling and data analysis with SAS Enterprise Miner and WEKA
  • 3. Course Content  Intro to Knowledge Acquisition aka ~knowledge discovery~ (3 hours)  Knowledge Discovery Process (4 hours)  Pre-processing data (5 hours)  Predictive Modeling (10 hours)  Decision Tree, Regression, Neural Network, Rough Set  Evaluation And Implementation (6 hours)  Descriptive Modeling (7 hours)  Clustering, Association Rules  Data mining ethics (1.5 hours)  PROJECT PRESENTATION
  • 4. Course Evaluation  Assignments  Case study + Presentations  Project + Poster Presentations  Mid Term ? Quizzes ?  Class PARTICIPATION !!  Final Exam 40% 60%
  • 5. PreRequisites  A “Basic statistics course such as SQQS2023”Bussiness Statistical”+” programming language knowledge”+“SAS knowledge”+”Database”+ “spreadsheet+ web 2.0”  Passion in computer applications  Dare to take the challenges  Have a sincere heart to understand infinite God’s knowledge  Attendance is compulsory (no freely “tuang kelas”)  Behave your “gadget". Please respect others
  • 7. Please introduce yourself.. http://padlet.com/wall/8yly4q2yu8 Facebook Group https://www.facebook.com/dataharvester2.0 Youtube Channel + Vimeo Video izwan nizal
  • 9. The Age of Big Data  “The BBC documentary follows people who mine Big Data, including LAPD police officers who use data to predict crime, a London scientist/trader who makes millions with math, and a South African astronomer who wants to catalog the entire cosmos.”  “Data Scientist” is the sexiest job of the 21st century. The Harvard Business Review made this claim last October and it seems that everyone (including your grandmother) has been repeating it ever since.
  • 10. Why Knowledge Acquisition?  Why?  Data explosion (tremendous amount of data available + cloud computing)  Data is being warehoused  Computing power (Bionic Skin?)  Competitive pressure  Hard disks nowadays have capacities of more than 1 TB
  • 11. What is Knowledge Acquisition?  aka: data mining, knowledge discovery, knowledge extraction, information discovery, information harvesting, etc.  The process of discovering useful information, hidden patterns or rules in large quantities of data (non-trivial, previously unknown)  By automatic or semi-automatic means  Finding such patterns manually is practically impossible.
  • 12. Traditional Approaches  Traditional database queries: access a database using a well-defined query written in a language such as SQL  The query output consists of data from the database  The output is usually a subset of the database  (Diagram labels: SQL, DBMS, DB)
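To make the traditional approach concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name, columns and rows are hypothetical and exist only for illustration. The point is that the query returns a subset of rows that are already stored; it does not discover any new pattern.

```python
import sqlite3

# Hypothetical in-memory table; names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (matric TEXT, program TEXT, cgpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("A001", "SQS", 3.40), ("A002", "SQS", 2.10), ("A003", "IT", 3.75)],
)

# A well-defined query: the answer is a subset of the data already in the table.
rows = conn.execute(
    "SELECT matric, cgpa FROM student "
    "WHERE program = ? AND cgpa >= ? ORDER BY matric",
    ("SQS", 3.0),
).fetchall()
print(rows)
conn.close()
```

Data mining, by contrast, asks questions whose answers are not stored in any row, such as which students are likely to perform well next semester.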
  • 13. Disciplines Of Data Mining  Data mining draws on: Information Retrieval, Algorithms, Machine Learning, Visualization, Statistics, Database Systems
  • 14. Data Mining Model & Task  Data Mining  Predictive: Classification, Regression, Time Series Analysis, Prediction  Descriptive: Clustering, Summarization, Association Rules, Sequence Discovery
  • 15. Try to relate this to your previous knowledge  Hmmm… how does data mining differ from forecasting or prediction?  Are they similar?
  • 16. Predictive Model  Makes predictions about values of data using known results found from different data  Or based on the use of other historical data  Examples: credit card fraud, breast cancer early warning, terrorist acts, tsunamis, etc.  Film examples: Ghost Protocol, Minority Report, Eagle Eye
  • 17. Predictive Model  Performs inference on the current data to make predictions  We know what to predict (based on historical data)  Never 100% accurate  Concentrates on the input-output relationship (x, f(x))  Typical Questions  Which customers are likely to buy this product in the next four months?  What kinds of transactions are likely to be fraudulent?  Who is likely to drop this paper?
  • 18. Predictive Model  [Figure: scatter plot of Profit (RM) against months; the points marked x are current data, and the point marked O with a question mark is the future value to be predicted]
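The profit-against-months picture can be sketched in code as a straight line fitted to known points and then extended one month ahead. This is only one possible predictive technique (ordinary least squares), chosen here for brevity; the monthly profit figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical current data: profit (RM thousand) for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
profit = [10.0, 13.0, 13.0, 17.0, 18.0, 21.0]

slope, intercept = fit_line(months, profit)

# Future data: estimate the unknown value for month 7.
forecast = slope * 7 + intercept
print(round(forecast, 2))
```

As the slides note, such a forecast is never 100% accurate; it only extends the input-output relationship learned from historical data.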
  • 19. Descriptive Model  Identifies patterns or relationships in data  Serves as a way to explore the properties of the data examined, not to predict new properties  Always requires a domain expert  Examples:  Segmenting marketing areas  Profiling student performance  Profiling Google Play / Apple app customers
  • 20. Descriptive Model  Discovers new patterns inside the data  We may have no idea what the data looks like  Explores the properties of the data examined  Patterns at various granularities (e.g. Student: university -> faculty -> program -> major)  Typical Questions  What is the data?  What does it look like?  What does the data suggest about customer groups for advertising?
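As a small illustration of descriptive modeling, the sketch below groups student marks into two clusters with a one-dimensional k-means. The marks and the two starting centers are hypothetical, and k-means is just one of several clustering techniques that could be used here.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small 1-D k-means: assign each value to its nearest center, then update."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical exam marks; no labels are given in advance.
marks = [35, 40, 42, 78, 82, 85]
centers, groups = kmeans_1d(marks, centers=[30.0, 90.0])
print(centers, groups)
```

The algorithm only reports that two groups exist. Deciding what they mean (for example, students who need support versus high achievers) is where the domain expert comes in.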
  • 22. View Of DM  Data To Be Mined  Data warehouse, WWW, time series, textual, spatial, multimedia, transactional  Knowledge To Be Mined  Classification, prediction, summarization, trend  Techniques Utilized  Database, machine learning, visualization, statistics  Applications Adapted  Marketing, demographic segmentation, stock analysis
  • 23. DM In Action  Medical applications (clinical diagnosis, drug analysis)  Business (marketing segmentation & strategies, insolvency prediction, loan risk assessment)  Education (online learning)  Internet (search engines)  Etc.
  • 24. Data Mining Methodology  Hypothesis Testing vs Knowledge Discovery  Hypothesis Testing  Top-down approach  Attempts to substantiate or disprove a preconceived idea  Knowledge Discovery  Bottom-up approach  Starts with the data and tries to get it to tell us something we didn't already know
  • 25. Data Mining Methodology  Hypothesis Testing  Generate good ideas  Determine what data would allow these hypotheses to be tested  Locate the data  Prepare the data for analysis  Build computer models based on the data  Evaluate the computer models to confirm or reject the hypotheses
  • 26. Data Mining Methodology  Knowledge Discovery  Directed  Identify sources of pre-classified data  Prepare the data for analysis  Select appropriate KD techniques based on data characteristics and the data mining goal  Divide the data into training, testing and evaluation sets  Use the training dataset to build the model  Tune the model by applying it to the test dataset  Take action based on the data mining results  Measure the effect of the action taken  Restart the DM process, taking advantage of new data generated by the action taken
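The step of dividing data into training, testing and evaluation sets can be sketched as follows. The 60/20/20 proportions, the fixed seed and the helper name `split_dataset` are assumptions made for this example; the slides do not prescribe particular ratios.

```python
import random

def split_dataset(records, train=0.6, test=0.2, seed=42):
    """Shuffle a copy of the records and cut it into three disjoint parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    evaluation = shuffled[n_train + n_test:]  # whatever remains
    return training, testing, evaluation

# Hypothetical pre-classified records, represented here by simple ids.
records = list(range(100))
training, testing, evaluation = split_dataset(records)
print(len(training), len(testing), len(evaluation))
```

The model is built on `training`, tuned against `testing`, and only then judged on `evaluation`, which the model has never seen.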
  • 27. Data Mining Methodology  Knowledge Discovery  Undirected  Identify available data sources  Prepare the data for analysis  Select appropriate undirected KD techniques based on data characteristics and the data mining goal  Use the selected technique to uncover hidden structure in the data  Identify potential targets for directed KD  Generate new hypotheses to test
  • 28. Revision: Two Approaches In Data Mining  Data Mining  Predictive (predict the future value): Classification, Regression, Time Series Analysis, Prediction  Descriptive (define relationships among data): Clustering, Summarization, Association Rules, Sequence Discovery
  • 31. Knowledge Discovery Process  1.0 Selection  The data needed for the data mining process may be obtained from many different and heterogeneous data sources  Examples  Business transactions  Scientific data  Video and pictures  UUM student database
  • 33. Knowledge Discovery Process  2.0 Pre-processing  Main idea: ensure that the data is clean (high quality)  The data to be used by the process may contain incorrect or missing values  There may be anomalous data from multiple sources involving different data types and metrics  Erroneous data may be corrected or removed, whereas missing data must be supplied or predicted (often using data mining tools)
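A minimal sketch of the pre-processing step: drop duplicated records and supply a missing value. Replacing a missing value with the column mean is only one simple strategy, used here as an assumption; the field names and records are hypothetical.

```python
# Hypothetical raw records with one duplicate and one missing value.
records = [
    {"matric": "A001", "cgpa": 3.0},
    {"matric": "A002", "cgpa": None},   # missing value
    {"matric": "A001", "cgpa": 3.0},    # duplicated record
    {"matric": "A003", "cgpa": 2.0},
]

def clean(records):
    # 1. Remove duplicates, keeping the first record seen for each matric number.
    seen = set()
    unique = []
    for r in records:
        if r["matric"] not in seen:
            seen.add(r["matric"])
            unique.append(dict(r))  # copy so the raw data stays untouched
    # 2. Supply missing values with the mean of the known values.
    known = [r["cgpa"] for r in unique if r["cgpa"] is not None]
    mean = sum(known) / len(known)
    for r in unique:
        if r["cgpa"] is None:
            r["cgpa"] = mean
    return unique

cleaned = clean(records)
print(cleaned)
```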
  • 34. Knowledge Discovery Process  3.0 Transformation  Data from different sources must be converted into a common format for processing  Some data may be encoded or transformed into more usable formats  Examples:  Data Cleaning, Data Integration, Data Transformation, Data Reduction and Data Discretization
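Two common transformations can be sketched in a few lines: min-max scaling to a common 0 to 1 range, and discretization of a numeric value into labelled bands. The cut points and band labels below are hypothetical, and `min_max` assumes the values are not all identical.

```python
def min_max(values):
    """Rescale values to the range [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def discretize(value, cut_points, labels):
    """Return the label of the first band whose upper cut point exceeds value."""
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]  # value is at or above the last cut point

cgpa = [2.0, 3.0, 4.0]
scaled = min_max(cgpa)
bands = [discretize(v, [2.5, 3.5], ["low", "medium", "high"]) for v in cgpa]
print(scaled, bands)
```

Scaling matters for techniques such as neural networks that expect numeric inputs on comparable ranges, while discretization suits techniques that work on categories.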
  • 35. Knowledge Discovery Process  4.0 Data Mining  Main idea: use intelligent methods to extract patterns and knowledge from the database  This step applies algorithms to the transformed data to generate the desired results  The heart of the KD process (where unknown patterns are revealed)  Examples of algorithms: Regression (classification, prediction), Neural Networks (prediction, classification, clustering), Apriori algorithm (association rules), K-Means (clustering), K-Nearest Neighbor (classification), Decision Tree (classification), Instance-based Learning (classification)
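To give a feel for association rules, the sketch below computes support and confidence over a handful of hypothetical shopping baskets. It enumerates every item pair by brute force; the Apriori algorithm named in the slide finds the same frequent itemsets but prunes candidates so that it scales to large databases. The 0.5 minimum support is an arbitrary choice for this example.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))

# Keep only the item pairs that appear in at least half of the transactions.
frequent_pairs = {
    pair: support(set(pair))
    for pair in combinations(items, 2)
    if support(set(pair)) >= 0.5
}

# Confidence of the rule bread -> milk: support(bread and milk) / support(bread).
confidence_bread_milk = support({"bread", "milk"}) / support({"bread"})
print(frequent_pairs, round(confidence_bread_milk, 2))
```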
  • 36. Knowledge Discovery Process  5.0 Interpretation/Evaluation  How the data mining results are presented to users is extremely important, because the usefulness of the results depends on it  Example presentation techniques:  Graphical  Geometric  Icon-based  Pixel-based  Hierarchical  Hybrid
  • 37. Case Study: Predicting SQS Final Year Students' Performance  Selection: student database {contains 30,000 records covering activities and academics} -> selected academic records {matric, PMK, grades}, only 2,000 records (contains incomplete records, etc.)  Pre-processing: clean records {replace the missing values, remove the duplicates}  Transformation: transform into numerical form (using neural networks)  Data mining: generated model, a pattern for prediction: Y = w1x1 + w2x2 + b1  Interpretation & evaluation: testing result 90% correct -> accept model  Knowledge (apply model)
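The last two stages of the case study can be sketched as follows: apply a model of the form Y = w1x1 + w2x2 + b1 to held-out test records and accept it if at least 90% of its predictions are correct. The weights, the pass/fail threshold at Y >= 0, the input features and the test records are all hypothetical; in the case study the weights would come from training, not from hand-picked values.

```python
def predict(x1, x2, w1=0.5, w2=0.5, b1=-1.0):
    """Linear model Y = w1*x1 + w2*x2 + b1, thresholded into a class label."""
    y = w1 * x1 + w2 * x2 + b1
    return "pass" if y >= 0 else "fail"

# Hypothetical test records: ((x1, x2), actual outcome).
test_set = [
    ((3.0, 3.5), "pass"), ((0.5, 1.0), "fail"),
    ((2.0, 2.0), "pass"), ((0.5, 0.5), "fail"),
    ((1.5, 1.0), "pass"), ((1.0, 0.5), "fail"),
    ((3.5, 2.0), "pass"), ((0.0, 1.5), "fail"),
    ((2.5, 3.0), "pass"), ((1.5, 1.5), "fail"),  # the model gets this one wrong
]

correct = sum(predict(x1, x2) == actual for (x1, x2), actual in test_set)
accuracy = correct / len(test_set)

# Accept the model only if it reaches the 90% threshold used in the case study.
accepted = accuracy >= 0.9
print(accuracy, accepted)
```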
  • 38. Assignment 1  Individual assignment: you may be selected (randomly) to present your answer (2 minutes max)  Discuss how data mining applications can be utilized in a construction project  You may discuss:  An appropriate example of a construction issue, e.g. weather forecasting to determine the development time, housing loan issues  How can it be done?  At least 2 pages, 1.5 spacing (do include your references and proper citations)  Due date: 6 March 2014  Title page (must include the due date)  Upload your assignment through SCRIBD (you may need to register) and share the document with me  No need to print. Share the document. Save the planet. Go green