1. Knowledge Acquisition In
Decision Making (SQIT 3033)
Izwan Nizal Mohd Shaharanee
SQS 4017/ 6364
nizal@uum.edu.my
izwan.nizal@gmail.com
2. Course Objective
To introduce knowledge about data mining
and data warehouse
To evaluate and understand several data
mining techniques
To enhance data mining skills through the
analysis of business problems
Being able to apply the commonly used
functions of SAS Enterprise Miner and
WEKA to solve data mining problems
Developing the skills of data mining modeling
and data analysis with SAS Enterprise Miner
and WEKA
3. Course Content
Intro to Knowledge Acquisition, aka knowledge discovery
(3 hours)
Knowledge Discovery Process (4 hours)
Pre-processing data (5 hours)
Predictive Modeling (10 hours)
Decision Tree, Regression, Neural Network, Rough Set
Evaluation And Implementation (6 hours)
Descriptive Modeling (7 hours)
Clustering, Association Rules
Data mining ethics (1.5 hours)
PROJECT PRESENTATION
4. Course Evaluation
Assignments
Case study + Presentations
Project + Poster Presentations
Mid Term ? Quizzes ?
Class PARTICIPATION !!
Final Exam 40%
60%
5. PreRequisites
A basic statistics course such as
SQQS2023 "Business Statistics" + programming
language knowledge + SAS knowledge + database +
spreadsheet + Web 2.0
Passion in computer applications
Dare to take the challenges
Have a sincere heart to understand infinite God’s
knowledge
Attendance is compulsory (no skipping class, i.e. no "tuang kelas")
Keep your gadgets under control. Please respect others
9. The Age of Big Data
“The BBC documentary follows people who mine
Big Data, including LAPD police officers who use
data to predict crime, a London scientist/trader
who makes millions with math, and a South
African astronomer who wants to catalog the
entire cosmos.”
“Data Scientist” is the sexiest job of the 21st
century. The Harvard Business Review made this
claim last October and it seems that everyone
(including your grandmother) has been repeating it
ever since.
10. Why Knowledge Acquisition?
Why?
Data explosion (tremendous amount of data available + cloud
computing)
Data is being warehoused
Computing power – Bionic Skin?
Competitive pressure
Hard disks nowadays have capacities of more than 1 TB
11. What is Knowledge Acquisition?
Also known as data mining, knowledge discovery, knowledge
extraction, information discovery, information
harvesting, etc.
The process of discovering useful information, hidden
patterns or rules in large quantities of data (non-trivial,
previously unknown)
By automatic or semi-automatic means
Finding such patterns manually is impractical at this scale.
12. Traditional Approaches
Traditional database queries: access a
database using a well-defined query language such as
SQL
The query output consists of data from the
database
The output is usually a subset of the database
[Diagram: SQL, DBMS, DB]
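To make the "output is a subset of the database" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns and rows are hypothetical, invented only for illustration.

```python
import sqlite3

def query_subset(rows, min_amount):
    """Load rows into an in-memory database and return those matching a SQL filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT customer, amount FROM sales WHERE amount >= ? ORDER BY amount",
        (min_amount,),
    ).fetchall()
    conn.close()
    return result

# Hypothetical data, not taken from the slides.
SALES = [("Ali", 120.0), ("Siti", 45.5), ("Chong", 300.0), ("Devi", 80.0)]

if __name__ == "__main__":
    # Every returned row already exists in the table: no new knowledge is produced.
    print(query_subset(SALES, 100))
```

The query only retrieves rows that were stored; it does not uncover a pattern that was not already explicit in the data, which is the contrast with data mining drawn in the following slides.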
13. Disciplines Of Data Mining
Data Mining
Information Retrieval, Algorithms
Machine Learning, Visualization
Statistics, Database Systems
14. Data Mining Model & Task
Data Mining
Predictive
•Classification
•Regression
•Time Series Analysis
•Prediction
Descriptive
•Clustering
•Summarization
•Association Rules
•Sequence Discovery
15. Try to relate this to your previous
knowledge
Hmmm… how does data mining differ from
forecasting or prediction?
Are they similar?
16. Predictive Model
Makes predictions about values of data using
known results found from different data,
or based on other historical data
Examples: credit card fraud, breast cancer
early warning, terrorist acts, tsunamis, etc.
In film: Ghost Protocol, Minority Report, Eagle Eye
17. Predictive Model
Performs inference on the current data to make
predictions.
We know what to predict (based on historical data)
Never 100% accurate
Concentrates on the input-output relationship (x, f(x))
Typical Questions
Which customers are likely to buy this product in the next
four months?
What kinds of transactions are likely to be
fraudulent?
Who is likely to drop this paper?
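The input-output idea (x, f(x)) can be sketched as follows: learn an approximation of f from historical pairs, then apply it to a new x. This is only an illustration using an ordinary least-squares line; the slides do not prescribe this technique, and the history values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the fitted line to an unseen input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical history: month number vs. units bought by a customer segment.
HISTORY_X = [1, 2, 3, 4, 5]
HISTORY_Y = [2.1, 3.9, 6.2, 7.8, 10.0]

if __name__ == "__main__":
    model = fit_line(HISTORY_X, HISTORY_Y)
    print(model, predict(model, 6))
```

The prediction for month 6 is an estimate, not a certainty, which matches the "never 100% accurate" remark above.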
19. Descriptive Model
Identifies patterns or relationships in data.
Serves as a way to explore the properties of the data
examined, not to predict new properties
Usually requires a domain expert
Examples:
Segmenting marketing areas
Profiling student performance
Profiling Google Play / Apple App Store customers
20. Descriptive Model
Discovers new patterns inside the data
We may have no idea what the data looks like
Explores the properties of the data examined
Patterns at various granularities (e.g. Student: University ->
faculty -> program -> major)
Typical Questions
What is the data?
What does it look like?
What does the data suggest for advertising to a group of
customers?
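Summarization at different granularities can be sketched as a simple group-and-average. The student records and field names below are hypothetical, and averaging a GPA is just one possible summary chosen for illustration.

```python
from collections import defaultdict

def summarize(records, keys):
    """Average the 'gpa' field for each distinct combination of the given keys."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["gpa"])
    return {group: round(sum(v) / len(v), 2) for group, v in groups.items()}

# Hypothetical records, not taken from any real student database.
STUDENTS = [
    {"faculty": "Science", "program": "Statistics", "gpa": 3.2},
    {"faculty": "Science", "program": "Statistics", "gpa": 3.6},
    {"faculty": "Science", "program": "Mathematics", "gpa": 2.8},
    {"faculty": "Business", "program": "Finance", "gpa": 3.0},
]

if __name__ == "__main__":
    print(summarize(STUDENTS, ["faculty"]))             # coarse granularity
    print(summarize(STUDENTS, ["faculty", "program"]))  # finer granularity
```

Nothing is predicted here; the code only describes what the data looks like at each level, which is the descriptive viewpoint.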
22. View Of DM
Data To Be Mined
Data warehouse, WWW, time series, textual, spatial,
multimedia, transactional
Knowledge To Be Mined
Classification, prediction, summarization, trend
Techniques Utilized
Database, machine learning, visualization, statistics
Applications Adapted
Marketing, demographic segmentation, stock analysis
23. DM In Action
Medical applications (clinical diagnosis, drug analysis)
Business (marketing segmentation and strategies, insolvency
prediction, loan risk assessment)
Education (online learning)
Internet (search engines)
Etc.
24. Data Mining Methodology
Hypothesis Testing vs Knowledge Discovery
Hypothesis Testing
Top-down approach
Attempts to substantiate or disprove a preconceived idea
Knowledge Discovery
Bottom-up approach
Starts with the data and tries to get it to tell us something
we didn’t already know
25. Data Mining Methodology
Hypothesis Testing
Generate good ideas
Determine what data allow these hypotheses to be
tested
Locate the data
Prepare the data for analysis
Build computer models based on the data
Evaluate the computer models to confirm or reject the
hypotheses
26. Data Mining Methodology
Knowledge Discovery
Directed
Identify sources of preclassified data
Prepare the data for analysis
Select appropriate KD techniques based on data
characteristics and the data mining goal
Divide the data into training, testing and evaluation sets
Use the training dataset to build the model
Tune the model by applying it to the test dataset
Take action based on the data mining results
Measure the effect of the action taken
Restart the DM process, taking advantage of new data
generated by the action taken
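The "divide the data into training, testing and evaluation" step might look like the sketch below. The 60/20/20 proportions and the fixed random seed are assumptions made for illustration; the slides do not specify them.

```python
import random

def split_dataset(records, train=0.6, test=0.2, seed=0):
    """Shuffle a copy of the records and cut it into train, test and evaluation parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (
        shuffled[:n_train],                  # used to build the model
        shuffled[n_train:n_train + n_test],  # used to tune the model
        shuffled[n_train + n_test:],         # held back to evaluate the final model
    )

if __name__ == "__main__":
    train_set, test_set, eval_set = split_dataset(range(100))
    print(len(train_set), len(test_set), len(eval_set))
```

Keeping the evaluation part untouched until the end gives a fairer estimate of how the model behaves on data it has never seen.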
27. Data Mining Methodology
Knowledge Discovery
Undirected
Identify available data sources
Prepare the data for analysis
Select appropriate undirected KD techniques based on
data characteristics and the data mining goal
Use the selected technique to uncover hidden structure in
the data
Identify potential targets for directed KD
Generate new hypotheses to test
28. Revision::
Two Approaches in Data Mining
Data Mining
Predictive (predict future values)
•Classification
•Regression
•Time Series Analysis
•Prediction
Descriptive (define relationships among data)
•Clustering
•Summarization
•Association Rules
•Sequence Discovery
31. Knowledge Discovery Process
1.0 Selection
The data needed for the data mining process may be
obtained from many different and heterogeneous
data sources
Examples
Business Transactions
Scientific Data
Video and pictures
UUM Student Database
33. Knowledge Discovery Process
2.0 Pre Processing
Main idea: to ensure that the data is clean (high-quality
data).
The data to be used by the process may contain
incorrect or missing values.
There may be anomalous data from multiple
sources involving different data types and
metrics
Erroneous data may be corrected or removed,
whereas missing data must be supplied or
predicted (often using data mining tools)
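A minimal cleaning sketch, assuming records are (matric, grade) pairs where a missing grade is stored as None. Filling gaps with the mean of the known grades is only one simple way to "supply" missing data, chosen here for illustration; the sample rows are hypothetical.

```python
def clean(records):
    """Drop exact duplicate records, then fill missing grades with the mean known grade."""
    seen = set()
    unique = []
    for matric, grade in records:
        if (matric, grade) not in seen:
            seen.add((matric, grade))
            unique.append((matric, grade))
    known = [g for _, g in unique if g is not None]
    mean = sum(known) / len(known)  # assumes at least one grade is known
    return [(m, g if g is not None else mean) for m, g in unique]

# Hypothetical raw data: one duplicate row and one missing grade.
RAW = [("A1", 80.0), ("A2", None), ("A1", 80.0), ("A3", 60.0)]

if __name__ == "__main__":
    print(clean(RAW))
```

Whether to remove, correct or impute a bad value is a judgment call that depends on the data and the mining goal.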
34. Knowledge Discovery Process
3.0 Transformation
Data from different sources must be converted
into a common format for processing
Some data may be encoded or transformed into
more usable formats
Examples:
Data Cleaning, Data Integration,
Data Transformation, Data Reduction and Data
Discretization
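Two of the transformations named above can be sketched briefly: min-max scaling (a data transformation) and binning a number into a label (discretization). The cut points and labels are hypothetical.

```python
def min_max_scale(values):
    """Rescale numbers linearly to the range [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def discretize(value, cut_points, labels):
    """Map a number to a label; labels must have one more entry than cut_points."""
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]

if __name__ == "__main__":
    print(min_max_scale([50, 75, 100]))
    # Hypothetical bins: below 50 is low, 50 up to 80 is medium, 80 and above is high.
    print(discretize(65, [50, 80], ["low", "medium", "high"]))
```

Scaling is commonly needed by methods such as neural networks that expect numeric inputs on comparable ranges, while discretization suits methods that work on categories.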
35. Knowledge Discovery Process
4.0 Data Mining
Main idea: to use intelligent methods to extract patterns
and knowledge from the database
This step applies algorithms to the transformed data to
generate the desired results.
The heart of the KD process (where unknown patterns are
revealed).
Examples of algorithms: Regression (classification,
prediction), Neural Networks (prediction, classification,
clustering), Apriori algorithm (association rules),
K-Means (clustering), K-Nearest Neighbor (classification), Decision
Tree (classification), Instance-Based Learning (classification).
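As a taste of association rule mining, the sketch below counts how often pairs of items occur together and keeps the pairs that reach a minimum support. This is only the support-counting idea, not the full Apriori algorithm (there is no candidate generation or pruning), and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions) is at least min_support."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) gives each pair one canonical order and ignores repeated items
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical market baskets.
BASKETS = [
    ["bread", "milk"],
    ["bread", "milk", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]

if __name__ == "__main__":
    print(frequent_pairs(BASKETS, 0.5))
```

A frequent pair such as bread and milk is the raw material for a rule like "customers who buy bread also tend to buy milk".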
36. Knowledge Discovery Process
5.0 Interpretation/Evaluation
How the data mining results are presented to the
users is extremely important because the
usefulness of the results is dependent on it
Examples of visualization techniques:
Graphical
Geometric
Icon Based
Pixel Based
Hierarchical Based
Hybrid
37. Case Study: Predicting SQS Final
Year’s Student Performance
Selection: student database {contains 30,000 records; academics, activities}
-> selected academic records {matric, PMK, grades},
only 2,000 records (contains incomplete records etc.)
Pre-processing: clean records {replace the missing values,
remove the replicated records}
Transformation: using neural networks, so transform into
numerical values
Data mining: generated model (pattern for prediction),
Y=w1x1+w2x2+b1
Interpretation & evaluation: testing result 90% correct,
accept model -> knowledge (apply model)
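The model Y=w1x1+w2x2+b1 and the "90% correct, accept model" check from the case study can be sketched as below. The weights, pass mark, test rows and acceptance threshold are all hypothetical values picked for illustration; in the case study the weights would come from training, which is not shown here.

```python
def predict_score(x1, x2, w1, w2, b1):
    """Evaluate the linear model Y = w1*x1 + w2*x2 + b1."""
    return w1 * x1 + w2 * x2 + b1

def accuracy(test_rows, w1, w2, b1, pass_mark):
    """Fraction of test rows whose predicted pass/fail matches the recorded outcome."""
    correct = 0
    for x1, x2, passed in test_rows:
        predicted_pass = predict_score(x1, x2, w1, w2, b1) >= pass_mark
        if predicted_pass == passed:
            correct += 1
    return correct / len(test_rows)

# Hypothetical test set: (input 1, input 2, actually passed?).
TEST_ROWS = [
    (80, 70, True), (30, 40, False), (60, 50, True), (45, 45, False),
    (90, 20, False), (55, 65, True), (20, 30, False), (70, 90, True),
    (40, 50, False), (50, 60, True),
]

if __name__ == "__main__":
    acc = accuracy(TEST_ROWS, w1=0.5, w2=0.5, b1=0.0, pass_mark=50)
    # Assumed acceptance rule: keep the model if at least 90% of test rows are correct.
    print(acc, "accept" if acc >= 0.9 else "reject")
```

Only after the model passes this test would it be applied to new students, which is the "knowledge (apply model)" step.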
38. Assignment 1
Individual assignment: you may be selected (randomly) to present
your answer (2 minutes max)
Discuss how data mining applications can be utilized in construction
projects.
You may discuss:
An appropriate example of a construction issue, e.g. weather
forecasting to determine the development time, housing loan issues
How can it be done?
At least 2 pages, 1.5 spacing (do include your references and
proper citations)
Due date: 6 March 2014
Title page (must include the due date)
Upload your assignment through SCRIBD (you may need to
register) and share the document with me
No need to print. Share the document. Save the planet. Go
green