Introduction to Data Mining
Mahmoud Rafeek Alfarra
http://mfarra.cst.ps
University College of Science & Technology- Khan yonis
Development of computer systems
2016
Chapter 1
Outline
• Definition of Data Mining
• Data Mining as an Interdisciplinary field
• Process of Data Mining
• Data Mining Tasks
• Challenges of Data Mining
• Data mining application examples
• Introduction to RapidMiner
Data pyramid
Definition of Data Mining
"Computers have promised us a source of wisdom but
delivered a flood of data."
"It has been estimated that the amount of information in
the world doubles every 20 months."
The Explosive Growth of Data: from terabytes to
petabytes
We are drowning in data, but starving for knowledge!
Definition of Data Mining
• Knowledge discovery in databases (data mining) is “the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”.
Definition of Data Mining
• A pattern is an arrangement of repeated parts.
• In a data table, a pattern is defined as a set of rows that share the same values in two or more columns.
• Consider, for example, the following table, which contains data about objects: their shape, color, and weight.
Definition of Data Mining
Row #   Shape   Color   Weight
1 ->    Box     Red     100
2 ->    Box     Red     200
3 ->    Box     Red     300
4       Box     Blue    400
5       Cone    Blue    400
In this table, three rows (rows 1, 2, and 3) share the same values in two columns (Shape and Color). From this table, we can observe the following pattern:
Most boxes are red (3 of the 4 boxes).
We can represent this pattern as a rule:
If Shape = Box then Color = Red.
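The strength of a rule such as "If Shape = Box then Color = Red" is commonly measured by its support (the fraction of all rows that satisfy both sides of the rule) and its confidence (the fraction of rows satisfying the IF part that also satisfy the THEN part). These are standard association-rule measures, not terms defined on this slide. The following minimal Python sketch computes both for the table above; the function name `rule_stats` is hypothetical.

```python
# Rows of the table above, as (shape, color, weight) tuples.
rows = [
    ("Box", "Red", 100),
    ("Box", "Red", 200),
    ("Box", "Red", 300),
    ("Box", "Blue", 400),
    ("Cone", "Blue", 400),
]


def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) of the rule 'if antecedent then consequent'."""
    covered = [r for r in rows if antecedent(r)]   # rows matching the IF part
    both = [r for r in covered if consequent(r)]   # ...that also match the THEN part
    support = len(both) / len(rows)
    confidence = len(both) / len(covered) if covered else 0.0
    return support, confidence


support, confidence = rule_stats(
    rows,
    antecedent=lambda r: r[0] == "Box",  # Shape = Box
    consequent=lambda r: r[1] == "Red",  # Color = Red
)
print(f"support={support:.2f}, confidence={confidence:.2f}")  # support=0.60, confidence=0.75
```

A confidence of 0.75 is the "3 of the 4 boxes are red" observation expressed as a number, which is what lets a rule be checked against a certainty level.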
Definition of Data Mining
• Valid: Discovered patterns should hold on new data with some degree of certainty; that is, they should generalize to the future (to other data).
• Novel: Patterns must be novel (not previously known).
Definition of Data Mining
• Actionable: Patterns should potentially lead to some useful actions.
• Understandable: Patterns must be made understandable in order to facilitate a better understanding of the underlying data.
Definition of Data Mining
Example: Credit Risk
A credit risk is the risk of default on a debt that may arise from a
borrower failing to make required payments.
In the first resort, the risk is that of the lender and includes lost principal
and interest, disruption to cash flows, and increased collection costs.
Definition of Data Mining
Is it valid?
The pattern has to be valid with respect to a certainty level (here, the rule holds for 86% of the cases).
Is it novel?
The value k should be previously unknown and not obvious.
Is it useful?
The pattern should provide information that is useful to the bank for assessing credit risk.
Is it understandable?
Definition of Data Mining
• Another definition of data mining:
“The process of extracting hidden knowledge from large volumes of raw data. The knowledge must be new, not obvious, and usable”.
Definition of Data Mining
Many people treat data mining as a synonym for another popularly used term, Knowledge Discovery in Databases, or KDD. Others view data mining as simply an essential step in the process of knowledge discovery in databases.
Definition of Data Mining
What is Data Mining?
• Discovering that certain names are more common in certain US locations (O’Brien, O’Rurke, O’Reilly … in the Boston area).
• Grouping together similar documents returned by a search engine according to their context (e.g., Amazon rainforest, Amazon.com).
What is not Data Mining?
• Looking up a phone number in a phone directory.
• Querying a Web search engine for information about “Amazon”.
Data Mining and Business Intelligence
(Figure: a pyramid in which each higher layer has increasing potential to support business decisions. Layers from top to bottom:)
Decision Making
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: Statistical Summary, Querying, and Reporting
Data Preprocessing/Integration, Data Warehouses
Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
(Typical users, from top to bottom: End User, Business Analyst, Data Analyst, DBA.)
Lecture 2
Data Mining as an Interdisciplinary field
“Data mining is an interdisciplinary field bringing
together techniques from machine learning,
pattern recognition, statistics, databases, and
visualization to address the issue of information
extraction from large data bases”.
Data Mining as an Interdisciplinary field
(Figure: Data Mining at the confluence of Database Technology, Statistics, Machine Learning, Artificial Intelligence, Visualization, and Other Disciplines.)
Data Mining as an Interdisciplinary field
Data mining differs from statistics in the kind of data (not only numerical), the kinds of methods (mostly machine learning methods), the number of hypotheses (more than one), and the amount of data (statistics uses samples).
Data Mining as an Interdisciplinary field
• Data Mining uses methods from Machine Learning, such as decision trees and neural nets.
• Machine Learning uses samples, whereas Data Mining uses the whole data.
• Data Mining can access data from databases.
• Machine Learning is sometimes used to replace humans, whereas Data Mining is used to help humans.
Data Mining as an Interdisciplinary field
• Databases are the part of Data Mining that provides fast and reliable access to data.
• Databases are used for data operations (storing and retrieving data), whereas Data Mining is used for decision making.
Data Mining as an Interdisciplinary field
• Search techniques, knowledge representation, and knowledge acquisition, maintenance, and application are other branches of Artificial Intelligence that are closely related to Data Mining.
Data Mining as an Interdisciplinary field
• Visualization is used to gain visual insights into the structure of the data.
• Visualization is widely used as a pre- and post-processing tool for data mining.
Process of Data Mining
• Data Mining is essentially a process of data-driven extraction of non-obvious but useful information from large databases.
• The entire process is interactive and iterative.
Process of Data Mining
(Figure: Databases → Data Cleaning → Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation)
Data cleaning
• Real-world data tends to be incomplete, noisy, and inconsistent.
• Incomplete: lacking attribute values or lacking certain attributes of interest.
◦ e.g., occupation=“ ” (missing data)
• Noisy: containing noise, errors, or outliers.
◦ e.g., Salary=“−10” (an error)
• Inconsistent: containing discrepancies in codes or names.
◦ e.g., Age=“42” but Birthday=“03/07/1997”
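A minimal Python sketch of how the three kinds of dirty data above might be flagged automatically. The records, field names, and the reference date used to recompute age (1 January 2016, matching the year of these slides) are all hypothetical. The birthday 03/07/1997 is read here as 3 July 1997 (an assumption); either day/month reading gives an age of 18, not 42. Real cleaning would also decide how to repair each problem, which this sketch does not attempt.

```python
from datetime import date

# Hypothetical reference date for recomputing ages.
REFERENCE_DATE = date(2016, 1, 1)

# Hypothetical records, one per kind of problem described above.
records = [
    {"occupation": "", "salary": 3000, "age": 30, "birthday": date(1985, 5, 1)},
    {"occupation": "engineer", "salary": -10, "age": 25, "birthday": date(1990, 8, 12)},
    {"occupation": "teacher", "salary": 2500, "age": 42, "birthday": date(1997, 7, 3)},
]


def age_on(birthday, today):
    """Number of full years between birthday and today."""
    years = today.year - birthday.year
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1  # birthday not reached yet this year
    return years


def find_problems(rec, today=REFERENCE_DATE):
    """Return a list describing which cleaning problems a record has."""
    problems = []
    if not rec["occupation"].strip():
        problems.append("incomplete: missing occupation")
    if rec["salary"] < 0:
        problems.append("noisy: negative salary")
    if rec["age"] != age_on(rec["birthday"], today):
        problems.append("inconsistent: age does not match birthday")
    return problems


for i, rec in enumerate(records, start=1):
    print(i, find_problems(rec))
```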
Data Integration
• Data integration is the merging of data from multiple sources.
• These sources may include multiple databases, data cubes, or flat files.
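A minimal Python sketch of one integration step, assuming two hypothetical sources (a database table and a flat file) that describe the same customers but name the key attribute differently (`customer_id` vs. `cust_no`). Matching such attributes across sources is a common integration problem. The inner-join strategy used here, which drops records present in only one source, is one choice among several.

```python
# Two hypothetical sources that identify the same customers with differently named keys.
crm_database = [
    {"customer_id": 1, "name": "Customer A"},
    {"customer_id": 2, "name": "Customer B"},
]
sales_flat_file = [
    {"cust_no": 1, "total_sales": 250.0},
    {"cust_no": 2, "total_sales": 120.5},
    {"cust_no": 3, "total_sales": 80.0},  # no matching CRM record
]


def integrate(left, right, left_key, right_key):
    """Inner-join two lists of dicts, treating left_key and right_key as the same attribute."""
    index = {row[right_key]: row for row in right}
    merged = []
    for row in left:
        match = index.get(row[left_key])
        if match is None:
            continue  # record exists in only one source
        combined = dict(row)
        combined.update({k: v for k, v in match.items() if k != right_key})
        merged.append(combined)
    return merged


customers = integrate(crm_database, sales_flat_file, "customer_id", "cust_no")
print(customers)
```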
Data Selection
• The step in which data relevant to the analysis task are retrieved from the database. In this step, irrelevant, weakly relevant, or redundant attributes may be detected and removed.
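A minimal Python sketch of attribute selection on a hypothetical data set whose target attribute is `buys`. It removes only the two easiest cases: constant attributes (which cannot distinguish rows, so they are irrelevant) and attributes whose values exactly duplicate an attribute already kept (redundant). Detecting weakly relevant attributes usually needs statistical measures such as correlation with the target, which this sketch does not attempt.

```python
# Hypothetical task data: predict whether a customer buys ("buys" is the target).
rows = [
    {"age": 25, "age_in_years": 25, "country": "PS", "income": 900, "buys": "yes"},
    {"age": 40, "age_in_years": 40, "country": "PS", "income": 1500, "buys": "no"},
    {"age": 33, "age_in_years": 33, "country": "PS", "income": 1200, "buys": "yes"},
]


def select_attributes(rows, target):
    """Keep the target plus attributes that are neither constant nor copies of a kept one."""
    kept, kept_values = [], []
    for col in rows[0]:
        if col == target:
            continue
        values = tuple(r[col] for r in rows)
        if len(set(values)) <= 1:   # constant: cannot help distinguish rows (irrelevant)
            continue
        if values in kept_values:   # same values as an attribute already kept (redundant)
            continue
        kept.append(col)
        kept_values.append(values)
    return kept + [target]


print(select_attributes(rows, "buys"))  # ['age', 'income', 'buys']
```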
Data Transformation
• The step in which data are transformed or consolidated into forms appropriate for mining by performing:
◦ Summary or aggregation operations; for example, daily sales may be aggregated to monthly or annual sales.
◦ Generalization; for example, city may be generalized to country, or age may be generalized to young, middle-aged, and senior.
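Both transformations above can be sketched in a few lines of Python. The daily sales figures are hypothetical, and so are the age cut-offs (under 30 is young, under 60 is middle-aged); the slides name the categories but do not define their boundaries.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales: (day, amount).
daily_sales = [
    (date(2016, 1, 5), 120.0),
    (date(2016, 1, 20), 80.0),
    (date(2016, 2, 3), 200.0),
]


def monthly_totals(sales):
    """Aggregation: sum daily amounts into (year, month) totals."""
    totals = defaultdict(float)
    for day, amount in sales:
        totals[(day.year, day.month)] += amount
    return dict(totals)


def age_group(age):
    """Generalization: map a numeric age to a category (hypothetical cut-offs)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "senior"


print(monthly_totals(daily_sales))           # {(2016, 1): 200.0, (2016, 2): 200.0}
print([age_group(a) for a in (22, 45, 70)])  # ['young', 'middle-aged', 'senior']
```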
Data Mining
• An essential step in which intelligent methods are applied to data to convert it into knowledge for decision making.
• A wide range of methods can be used in data mining, such as neural nets, decision trees, and association rules.
Pattern evaluation
• Identifies the truly interesting patterns based on interestingness measures.
• A pattern is considered interesting if it is:
◦ Valid
◦ Novel
◦ Actionable
◦ Understandable
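Some interestingness measures can be computed automatically. A common trio for rules is support, confidence, and lift (confidence divided by the fraction of rows satisfying the THEN part; lift above 1 means the two sides occur together more often than independence would predict). The minimal Python sketch below filters candidate rules from the shape/color/weight table with these measures. The thresholds are hypothetical, and such measures speak only to validity; novelty, actionability, and understandability still need human judgment.

```python
# The five rows of the shape/color/weight table.
rows = [
    {"shape": "Box", "color": "Red", "weight": 100},
    {"shape": "Box", "color": "Red", "weight": 200},
    {"shape": "Box", "color": "Red", "weight": 300},
    {"shape": "Box", "color": "Blue", "weight": 400},
    {"shape": "Cone", "color": "Blue", "weight": 400},
]


def measures(rows, lhs, rhs):
    """Support, confidence, and lift of the rule 'if lhs then rhs'."""
    n = len(rows)
    n_lhs = sum(1 for r in rows if lhs(r))
    n_rhs = sum(1 for r in rows if rhs(r))
    n_both = sum(1 for r in rows if lhs(r) and rhs(r))
    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    lift = confidence / (n_rhs / n) if n_rhs else 0.0
    return support, confidence, lift


candidates = {
    "Shape=Box -> Color=Red": (lambda r: r["shape"] == "Box", lambda r: r["color"] == "Red"),
    "Color=Blue -> Weight=400": (lambda r: r["color"] == "Blue", lambda r: r["weight"] == 400),
    "Shape=Box -> Weight=400": (lambda r: r["shape"] == "Box", lambda r: r["weight"] == 400),
}

# Hypothetical thresholds; in practice they are chosen per application.
MIN_SUPPORT, MIN_CONFIDENCE, MIN_LIFT = 0.3, 0.7, 1.0

interesting = []
for name, (lhs, rhs) in candidates.items():
    support, confidence, lift = measures(rows, lhs, rhs)
    if support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE and lift > MIN_LIFT:
        interesting.append(name)

print(interesting)  # ['Shape=Box -> Color=Red', 'Color=Blue -> Weight=400']
```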
Knowledge Presentation
• Knowledge presentation is the framework that converts a large amount of data into a form or procedure that human beings can understand for a given purpose.
• In knowledge presentation, visualization tools and knowledge representation techniques are used to present the mined knowledge to the user.
Lecture 3
Data Mining Tasks
• Data mining tasks are defined by the kinds of data patterns that can be mined.
• Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks.
• In general, data mining tasks can be classified into two categories:
◦ Descriptive mining tasks characterize the general properties of the data.
◦ Predictive mining tasks perform inference on the current data in order to make predictions.
Data Mining Tasks
• The most common data mining tasks:
◦ Classification [Predictive]
◦ Prediction [Predictive]
◦ Association Rules [Descriptive]
◦ Clustering [Descriptive]
◦ Outlier Analysis [Descriptive]
Data Mining Tasks
Classification
• Classification is used for predictive mining tasks.
• The input data for predictive modeling consists of two types of variables:
◦ Explanatory variables, which define the essential properties of the data.
◦ Target variables, whose values are to be predicted.
• Classification is used to predict the value of a discrete target variable.
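To make the explanatory/target split concrete, here is a minimal k-nearest-neighbor classifier in Python. k-NN is swapped in as one of the simplest classification techniques (the slides mention decision trees and neural nets instead). The training data, with explanatory variables (income, debt) and a discrete target ("low" or "high" credit risk), is entirely hypothetical.

```python
import math
from collections import Counter

# Hypothetical training data: explanatory variables (income, debt) -> target class.
training = [
    ((5.0, 1.0), "low"),
    ((6.0, 0.5), "low"),
    ((4.5, 1.5), "low"),
    ((2.0, 3.0), "high"),
    ((1.5, 2.5), "high"),
    ((2.5, 3.5), "high"),
]


def knn_classify(x, training, k=3):
    """Predict the class of x by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((5.5, 1.0), training))  # low
print(knn_classify((2.0, 2.8), training))  # high
```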
Prediction
• Similar to classification, except that we are trying to predict the value of a numeric variable (e.g., amount of purchase) rather than a class (e.g., purchaser or non-purchaser).
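A minimal Python sketch of numeric prediction using ordinary least-squares linear regression, one common technique among many. The data relating the number of store visits to the amount of purchase is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: number of store visits -> amount of purchase.
visits = [1, 2, 3, 4, 5]
amount = [12.0, 19.0, 31.0, 42.0, 48.0]

slope, intercept = fit_line(visits, amount)
predicted = slope * 6 + intercept  # predicted purchase for a customer with 6 visits
print(round(slope, 2), round(intercept, 2), round(predicted, 2))  # 9.5 1.9 58.9
```

Unlike the classifier, the output here is a number on a continuous scale, which is exactly the difference the slide draws between prediction and classification.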
Association
• Association rule mining aims to find relationships among variables in a database, resulting in different types of rules.
• It seeks to produce a set of rules describing sets of features that are strongly related to each other.
Association
Gender Age Smoker LAD% RCA%
F 52 Y 85 100
M 62 N 80 0
M 75 Y 70 80
M 73 Y 40 99
M 66 N 50 45
… … … … …
• LAD% - The percentage of heart disease caused by the left anterior descending coronary artery.
• RCA% - The percentage of heart disease caused by the right coronary artery.
Original data from research on heart disease
Association
Medical Association Rules
No. Rule (support, confidence)
1   Gender=M ∩ Age≥70 ∩ Smoker=Y ⇒ RCA%≥50 (40%, 100%)
2   Gender=F ∩ Age<70 ∩ Smoker=Y ⇒ LAD%≥70 (20%, 100%)
• Rule 1 indicates: 40% of the cases are male, aged 70 or over, and smokers; for these cases, the probability that RCA% ≥ 50% is 100%.
• Rule 2 indicates: 20% of the cases are female, under 70 years old, and smokers; for these cases, the probability that LAD% ≥ 70% is 100%.
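The two rules can be checked with a short Python sketch that computes (support, confidence) over the five rows shown in the heart-disease table. The full data set is elided ("…") in the table, so this uses only the visible rows; on those rows alone, the computation reproduces the stated figures, but the full data may give different values.

```python
# The five visible rows of the table: (gender, age, smoker, lad_percent, rca_percent).
patients = [
    ("F", 52, "Y", 85, 100),
    ("M", 62, "N", 80, 0),
    ("M", 75, "Y", 70, 80),
    ("M", 73, "Y", 40, 99),
    ("M", 66, "N", 50, 45),
]


def support_confidence(rows, antecedent, consequent):
    """Return (support, confidence) of 'if antecedent then consequent' over rows."""
    covered = [r for r in rows if antecedent(r)]
    hits = [r for r in covered if consequent(r)]
    return len(hits) / len(rows), (len(hits) / len(covered) if covered else 0.0)


# Rule 1: Gender=M, Age>=70, Smoker=Y => RCA% >= 50
rule1 = support_confidence(
    patients,
    lambda r: r[0] == "M" and r[1] >= 70 and r[2] == "Y",
    lambda r: r[4] >= 50,
)
# Rule 2: Gender=F, Age<70, Smoker=Y => LAD% >= 70
rule2 = support_confidence(
    patients,
    lambda r: r[0] == "F" and r[1] < 70 and r[2] == "Y",
    lambda r: r[3] >= 70,
)
print(rule1)  # (0.4, 1.0)
print(rule2)  # (0.2, 1.0)
```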
Clustering
• Finds groups of data points (clusters) such that data points belonging to one cluster are more similar to each other than to data points belonging to different clusters.
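The slide does not name a clustering algorithm. A minimal sketch of k-means, one widely used method, is shown below in Python: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster. The farthest-first initialization and the six sample points are assumptions made to keep the example deterministic.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster points into k groups; returns (centroids, clusters)."""
    # Farthest-first initialization: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two visually separate groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, 2)
print(clusters)
```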
Clustering
Document Clustering:
• Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.
• Approach: Identify frequently occurring terms in each document, form a similarity measure based on the frequencies of different terms, and use it to cluster.
• Gain: Information retrieval can use the clusters to relate a new document or search term to clustered documents.
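The approach above can be sketched in Python with raw term counts as document vectors and cosine similarity as the similarity measure. The four documents reuse the "Amazon" example from earlier and are hypothetical. The grouping rule (documents whose similarity reaches a hypothetical threshold of 0.5 end up in the same cluster) is one simple scheme; it does no stop-word removal or TF-IDF weighting.

```python
import math
from collections import Counter

# Hypothetical documents returned for the query "amazon".
docs = {
    "d1": "amazon rainforest river trees rainforest",
    "d2": "amazon rainforest animals river",
    "d3": "amazon com online shopping books",
    "d4": "online shopping amazon com orders",
}


def term_vector(text):
    """Term-frequency vector: how often each word occurs in the text."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_documents(docs, threshold=0.5):
    """Merge documents into one cluster whenever their similarity reaches the threshold."""
    vectors = {name: term_vector(text) for name, text in docs.items()}
    names = list(docs)
    cluster_of = {name: name for name in names}  # each document starts in its own cluster
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                old, new = cluster_of[b], cluster_of[a]
                for n in names:
                    if cluster_of[n] == old:
                        cluster_of[n] = new
    groups = {}
    for n in names:
        groups.setdefault(cluster_of[n], []).append(n)
    return list(groups.values())


print(cluster_documents(docs))  # [['d1', 'd2'], ['d3', 'd4']]
```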
Outlier Analysis
• Discovers data points that are significantly different from the rest of the data. Such points are known as anomalies or outliers.
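"Significantly different" needs a concrete rule in practice. One common choice, swapped in here as an illustration, is Tukey's fences: a value is flagged when it lies more than 1.5 interquartile ranges below the first quartile or above the third. The salary values in this Python sketch are hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical monthly salaries with one extreme value.
salaries = [3000, 3200, 2900, 3100, 3050, 2950, 15000]
print(tukey_outliers(salaries))  # [15000]
```

Quartile-based fences are less affected by the outlier itself than a mean and standard deviation would be, which is why they are a common first choice on small samples.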
Challenges of Data Mining
Scalability: Scalable techniques are needed to handle the massive scale of data.
Dimensionality: Many applications involve a large number of dimensions (e.g., features or attributes of the data).
Challenges of Data Mining
Heterogeneous and Complex Data: In recent years, complex data types such as graph-based, free-text, and structured data have emerged. Techniques developed for data mining must be able to handle the heterogeneity of the data.
Challenges of Data Mining
Data Quality: Many data sets are imperfect due to the presence of missing values and noise in the data. To handle these imperfections, robust data mining algorithms must be developed.
Challenges of Data Mining
Data Distribution: As the volume of data increases, it is no longer possible or safe to keep all the data in the same place. As a result, the need for distributed data mining techniques has increased over the years.
Challenges of Data Mining
Privacy Preservation: While privacy protection aims to prevent the disclosure of information, data mining attempts to reveal interesting knowledge about data. As a result, there is growing interest in developing privacy-preserving data mining algorithms.
Data mining application
Science
astronomy, bioinformatics, drug discovery, …
Business
advertising, CRM (Customer Relationship Management), investments, manufacturing, sports/entertainment, telecom, e-commerce, targeted marketing, health care, …
Web
search engines, web mining,…
Government
law enforcement, profiling tax cheaters, …
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 

Último (20)

Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 

Introduction to Data Mining, Chapter 1

  • 1. Introduction to Data Mining. Mahmoud Rafeek Alfarra, http://mfarra.cst.ps. University College of Science & Technology, Khan Yonis. Development of Computer Systems, 2016. Chapter 1
  • 2. Outline
      ◦ Definition of Data Mining
      ◦ Data Mining as an Interdisciplinary field
      ◦ Process of Data Mining
      ◦ Data Mining Tasks
      ◦ Challenges of Data Mining
      ◦ Data mining application examples
      ◦ Introduction to RapidMiner
  • 5. Definition of Data Mining "Computers have promised us a source of wisdom but delivered a flood of data." "It has been estimated that the amount of information in the world doubles every 20 months." The Explosive Growth of Data: from terabytes to petabytes We are drowning in data, but starving for knowledge!
  • 6. Definition of Data Mining  Knowledge discovery in databases (data mining) is “The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”.
  • 7. Definition of Data Mining  Pattern is an arrangement of repeated parts.  In a data table, a pattern is defined as a set of rows that share the same values in two or more columns.  Consider for example, the following table that contains data about objects; shape, color, and weight.
  • 8. Definition of Data Mining
        Row #   Shape   Color   Weight
        1 ->    Box     Red     100
        2 ->    Box     Red     200
        3 ->    Box     Red     300
        4       Box     Blue    400
        5       Cone    Blue    400
      In this table, three rows (rows 1, 2, and 3, marked ->) share the same values in two columns (Shape and Color). From this table, we can observe the following pattern: most boxes are red. We can represent the pattern as a rule: If Shape = Box then Color = Red.
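The pattern and rule above can be checked mechanically. Below is a minimal Python sketch, not the author's method; the names `shared_value_patterns` and `rule_confidence` are hypothetical. Following the slide's definition, it looks for groups of at least two rows that agree on a pair of columns (checking pairs suffices, since rows that agree on three columns also agree on every pair), then measures how often the IF part of a rule is followed by the THEN part.

```python
from itertools import combinations

# The example table from the slide, keyed by row number.
OBJECTS = {
    1: {"Shape": "Box", "Color": "Red", "Weight": 100},
    2: {"Shape": "Box", "Color": "Red", "Weight": 200},
    3: {"Shape": "Box", "Color": "Red", "Weight": 300},
    4: {"Shape": "Box", "Color": "Blue", "Weight": 400},
    5: {"Shape": "Cone", "Color": "Blue", "Weight": 400},
}
COLUMNS = ("Shape", "Color", "Weight")


def shared_value_patterns(rows, columns, min_rows=2):
    """Return (shared values, row numbers) for groups of rows that agree on a pair of columns."""
    patterns = []
    for pair in combinations(columns, 2):
        groups = {}
        for row_no, row in rows.items():
            key = tuple(row[c] for c in pair)
            groups.setdefault(key, []).append(row_no)
        for key, row_nos in groups.items():
            if len(row_nos) >= min_rows:
                patterns.append((dict(zip(pair, key)), row_nos))
    return patterns


def rule_confidence(rows, if_col, if_val, then_col, then_val):
    """Fraction of rows satisfying the IF part that also satisfy the THEN part."""
    matching_if = [r for r in rows.values() if r[if_col] == if_val]
    if not matching_if:
        return 0.0
    matching_both = [r for r in matching_if if r[then_col] == then_val]
    return len(matching_both) / len(matching_if)


for values, row_nos in shared_value_patterns(OBJECTS, COLUMNS):
    print(values, row_nos)
# {'Shape': 'Box', 'Color': 'Red'} [1, 2, 3]
# {'Color': 'Blue', 'Weight': 400} [4, 5]
print(rule_confidence(OBJECTS, "Shape", "Box", "Color", "Red"))  # 0.75
```

Applied literally, the definition also picks out rows 4 and 5 (Color = Blue, Weight = 400). The rule If Shape = Box then Color = Red holds for 3 of the 4 boxes, which is why the slide says most, rather than all, boxes are red.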
  • 9. Definition of Data Mining  Valid: Discovered patterns should be true on new data with some degree of certainty. Generalize to the future (other data).  Novel: Patterns must be novel (should not be previously known).
  • 10. Definition of Data Mining  Actionable: patterns should potentially lead to some useful actions.  Understandable: Patterns must be made understandable in order to facilitate a better understanding of the underlying data.
  • 11. Definition of Data Mining. Example: Credit Risk. Credit risk is the risk of default on a debt that may arise from a borrower failing to make required payments. In the first resort, the risk is that of the lender and includes lost principal and interest, disruption to cash flows, and increased collection costs.
  • 12. Definition of Data Mining
      ◦ Is it valid? The pattern has to be valid with respect to a certainty level (e.g., the rule is true for 86% of the cases).
      ◦ Is it novel? The value k should be previously unknown and not obvious.
      ◦ Is it useful? The pattern should provide information useful to the bank for assessing credit risk.
      ◦ Is it understandable?
  • 13. Definition of Data Mining  Other definition of data mining: “Is the process of extracting knowledge hidden from large volumes of raw data. The knowledge must be new, not obvious, and must be able to use it”.
  • 14. Definition of Data Mining: Many people treat data mining as a synonym for another popularly used term, Knowledge Discovery in Databases, or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases.
  • 15. Definition of Data Mining: What is data mining, and what is not?
      ◦ Data mining: discovering that certain names are more common in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area). Not data mining: looking up a phone number in a phone directory.
      ◦ Data mining: grouping together similar documents returned by a search engine according to their context (e.g., Amazon rainforest, Amazon.com). Not data mining: querying a Web search engine for information about “Amazon”.
  • 16. Data Mining and Business Intelligence (layers listed from top to bottom; the potential to support business decisions increases toward the top; typical users from top to bottom: End User, Business Analyst, Data Analyst, DBA)
      ◦ Decision Making
      ◦ Data Presentation: Visualization Techniques
      ◦ Data Mining: Information Discovery
      ◦ Data Exploration: Statistical Summary, Querying, and Reporting
      ◦ Data Preprocessing/Integration, Data Warehouses
      ◦ Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
  • 18. Introduction to Data Mining. Mahmoud Rafeek Alfarra, http://mfarra.cst.ps. University College of Science & Technology, Khan Yonis. Development of Computer Systems, 2016. Lecture 2
  • 21. Data Mining as an Interdisciplinary field “Data mining is an interdisciplinary field bringing together techniques from machine learning, pattern recognition, statistics, databases, and visualization to address the issue of information extraction from large data bases”.
  • 22. Data Mining as an Interdisciplinary field: Data Mining draws on Database Technology, Statistics, Machine Learning, Artificial Intelligence, Visualization, and Other Disciplines.
  • 23. Data Mining as an Interdisciplinary field Data mining is differ than statistics in kind of data (not only numerical) , kinds of methods ( mostly use machine learning methods), more than one hypotheses, amount of data (statistics uses samples).
  • 24. Data Mining as an Interdisciplinary field Data Mining uses methods from Machine Learning such as decision tree and neural nets.  Machine Learning uses samples and Data Mining uses whole data.  Data Mining can access data from database. Machine Learning some times used to replace human where Data Mining to help human.
  • 25. Data Mining as an Interdisciplinary field Databases part of Data Mining that provide the fast and reliable access to data. Databases used for data operation (Storing and retrieving data), Data Mining for Decision making.
  • 26. Data Mining as an Interdisciplinary field  Search techniques , Knowledge representation, Knowledge acquisition, maintenance and application are other branches of Artificial Intelligence which are highly related with Data Mining.
  • 27. Data Mining as an Interdisciplinary field Visualization is used to gain visual insights into the structure of the data.  Visualization is in large quantities used as a pre- and post-processing tool for data mining.
  • 28. Process of Data Mining  Data Mining is essentially a process of data drive extraction of not so obvious but useful information from large databases. The entire process is interactive and iterative.
  • 29. Process of Data Mining: Databases -> Data Cleaning -> Data Integration -> Data Warehouse -> Selection of Task-relevant Data -> Data Mining -> Pattern Evaluation.
  • 30. Data cleaning  Real-world data tends to be incomplete, noisy and inconsistent.  incomplete: lacking attribute values, lacking certain attributes of interest. ◦ e.g., occupation=“ ” (missing data)  noisy: containing noise, errors, or outliers ◦ e.g., Salary=“−10” (an error)  inconsistent: containing difference in codes or names, ◦ e.g., Age=“42” Birthday=“03/07/1997”
  • 31. Data Integration  Data integration is the merging of data from multiple sources.  These sources may include multiple databases, data cubes, or flat files.
  • 32. Data Selection  Where data relevant to the analysis task are retrieved from the database. Therefore, irrelevant, weakly relevant or redundant attributes may be detected and removed.
  • 33. Data Transformation  Where data are transformed or consolidated into forms appropriate for mining by performing:  Summary or aggregation operation, for example: Daily sales may be aggregated to monthly sales or annual sales. Generalization, for example:  City may be generalized to country or age may generalized to young, middle- age, senior.
  • 34. Data Mining  An essential process where intelligent methods are applied on data to covert it to knowledge in for decision making.  Wide range of methods can be used in data mining such neural nets, decision tree and Association.
  • 35. Pattern evaluation  To identify the truly interesting pattern based on some interestingness measures.  A pattern consider interesting if it is:  Valid  Novel Actionable Understandable
  • 36. Knowledge Representation  Knowledge presentation is the framework that converts a large amount of data into a particular data or procedure that human being can figure out based on an intention.  In Knowledge representation visualization tools and knowledge representation techniques are used to present the mined knowledge to the user.
  • 38. Introduction to Data Mining. Mahmoud Rafeek Alfarra, http://mfarra.cst.ps. University College of Science & Technology, Khan Yonis. Development of Computer Systems, 2016. Lecture 3
  • 40. Data Mining Tasks  Data mining tasks are the kind of data patterns that can be mined.  Data Mining functionalities are used to specify the kind of patterns to be found in the data mining tasks.
  • 41. Data Mining Tasks: In general, data mining tasks can be classified into two categories:
      ◦ Descriptive mining tasks characterize the general properties of the data.
      ◦ Predictive mining tasks perform inference on the current data in order to make predictions.
  • 42. Data Mining Tasks: The best-known data mining tasks are:
      ◦ Classification [Predictive]
      ◦ Prediction [Predictive]
      ◦ Association Rules [Descriptive]
      ◦ Clustering [Descriptive]
      ◦ Outlier Analysis [Descriptive]
  • 43. Classification  Classification is used for predictive mining tasks.  The input data for predictive modeling consists of two types of variables: Explanatory variables, which define the essential properties of the data.  Target variables , whose values are to be predicted.  Classification is used to predicate the value of discrete target variable.
  • 45. Prediction  Similar to classification, except we are trying to predict the value of a variable (e.g. amount of purchase), rather than a class (e.g. purchaser or non-purchaser).
  • 46. Association  Association Rules aims to find out the relationship among valuables in database, resulting in deferent types of rules.  Seek to produce a set of rules describing the set of features that are strongly related to each others.
  • 47. Association
        Gender  Age  Smoker  LAD%  RCA%
        F       52   Y       85    100
        M       62   N       80    0
        M       75   Y       70    80
        M       73   Y       40    99
        M       66   N       50    45
        …       …    …       …     …
      ◦ LAD%: the percentage of heart disease caused by the left anterior descending coronary artery.
      ◦ RCA%: the percentage of heart disease caused by the right coronary artery.
      (Original data from research on heart disease.)
  • 48. Association: Medical Association Rules (support, confidence)
      1. Gender=M ∩ Age≥70 ∩ Smoker=Y ⇒ RCA%≥50 (40%, 100%)
      2. Gender=F ∩ Age<70 ∩ Smoker=Y ⇒ LAD%≥70 (20%, 100%)
      ◦ Rule 1 indicates: 40% of the cases are male, aged 70 or over, and smokers; for these cases, the probability of RCA% ≥ 50 is 100%.
      ◦ Rule 2 indicates: 20% of the cases are female, under 70 years old, and smokers; for these cases, the probability of LAD% ≥ 70 is 100%.
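The two numbers attached to each rule can be computed directly. A minimal sketch, with a hypothetical `support_confidence` helper, using only the five rows visible in the table on slide 47 (the rest of the data set is elided with “…” and is not available). Support is taken here in the standard sense, the share of all rows matching both sides; because both rules have 100% confidence, that equals the share matching the IF part, which is how the slide phrases it.

```python
# The five rows shown on slide 47; the remaining rows ("…") are not available.
PATIENTS = [
    {"gender": "F", "age": 52, "smoker": "Y", "lad": 85, "rca": 100},
    {"gender": "M", "age": 62, "smoker": "N", "lad": 80, "rca": 0},
    {"gender": "M", "age": 75, "smoker": "Y", "lad": 70, "rca": 80},
    {"gender": "M", "age": 73, "smoker": "Y", "lad": 40, "rca": 99},
    {"gender": "M", "age": 66, "smoker": "N", "lad": 50, "rca": 45},
]


def support_confidence(rows, condition, conclusion):
    """Support: share of all rows matching both sides.
    Confidence: share of rows matching the condition that also match the conclusion."""
    matching_condition = [r for r in rows if condition(r)]
    matching_both = [r for r in matching_condition if conclusion(r)]
    support = len(matching_both) / len(rows)
    confidence = len(matching_both) / len(matching_condition) if matching_condition else 0.0
    return support, confidence


rule1 = support_confidence(
    PATIENTS,
    lambda r: r["gender"] == "M" and r["age"] >= 70 and r["smoker"] == "Y",
    lambda r: r["rca"] >= 50,
)
rule2 = support_confidence(
    PATIENTS,
    lambda r: r["gender"] == "F" and r["age"] < 70 and r["smoker"] == "Y",
    lambda r: r["lad"] >= 70,
)
print(rule1, rule2)  # (0.4, 1.0) (0.2, 1.0)
```

On these five rows the rules come out at (40%, 100%) and (20%, 100%), matching the figures on the slide; whether the same holds on the full data set cannot be checked from what is shown.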
  • 49. Clustering  Finds groups of data pointes (clusters) so that data points that belong to one cluster are more similar to each other than to data points belonging to different cluster.
  • 50. Clustering: Document Clustering
      ◦ Goal: to find groups of documents that are similar to each other based on the important terms appearing in them.
      ◦ Approach: identify frequently occurring terms in each document, form a similarity measure based on the frequencies of different terms, and use it to cluster.
      ◦ Gain: information retrieval can use the clusters to relate a new document or search term to clustered documents.
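The three-step approach above (term frequencies, a similarity measure, then clustering) can be sketched with cosine similarity over term counts. Assumptions to note: the documents are hypothetical, tokenization is a plain lowercase split, and the grouping rule (join the first cluster whose first document is at least 0.3 similar) is a deliberately simple stand-in for a real clustering algorithm.

```python
import math
from collections import Counter


def term_vector(text):
    """Term frequencies of a document (lowercase, whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_documents(docs, threshold=0.3):
    """Greedy grouping: each document joins the first cluster whose first member is similar enough."""
    vectors = [term_vector(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, v in enumerate(vectors):
        for cluster in clusters:
            if cosine(v, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical search results for "Amazon" in two different contexts.
DOCS = [
    "amazon rainforest trees river",
    "amazon river rainforest animals",
    "amazon com online shopping books",
    "online shopping books delivery",
]
print(cluster_documents(DOCS))  # [[0, 1], [2, 3]]
```

Although “amazon” occurs in three of the documents, the other terms dominate the similarity, so the rainforest and shopping results end up in separate groups, as in the search-engine example on slide 15.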
  • 51. Outlier Analysis  Discovers data points that are significantly different than the rest of the data. Such points are known as anomalies or outliers.
  • 53. Challenges of Data Mining Scalability: Scalable techniques are needed to handle the massive scale of data. Dimensionality: Many applications may involves a large number of dimensions (e.g. features or attributes of data)
  • 54. Challenges of Data Mining Heterogeneous and Complex Data: In recent years complicated data types such as graph-based, text-free and structured data types are introduced. Techniques developed for data mining must be able to handle the heterogeneity of the data.
  • 55. Challenges of Data Mining Data Quality: Many data sets are imperfect due to present of missing values and noise un the data. To handle the imperfection, robust data mining algorithms must be developed.
  • 56. Challenges of Data Mining Data Distribution: As the volume of data increases , it is no longer possible or safe to keep all the data in the same place. As a result, the need for distributed data mining techniques has increased over the years.
  • 57. Challenges of Data Mining Privacy Preservation: While privacy intends to prevent the disclosure of information, data mining attempts to revel interesting knowledge about data. As a result, there is growing interest in developing privacy- preserving data mining algorithms.
  • 59. Data mining applications
      ◦ Science: astronomy, bioinformatics, drug discovery, …
      ◦ Business: advertising, CRM (Customer Relationship Management), investments, manufacturing, sports/entertainment, telecom, e-commerce, targeted marketing, health care, …
      ◦ Web: search engines, web mining, …
      ◦ Government: law enforcement, profiling tax cheaters, …