Data mining tasks are the kinds of data patterns that can be mined.
Challenges of Data Mining
Data mining applications


1. 1. Introduction to Data Mining. Mahmoud Rafeek Alfarra, http://mfarra.cst.ps. University College of Science & Technology, Khan Younis. Development of Computer Systems, 2016. Chapter 1, Lecture 3.
2. 2. Outline: Definition of Data Mining; Data Mining as an Interdisciplinary Field; Process of Data Mining; Data Mining Tasks; Challenges of Data Mining; Data Mining Application Examples; Introduction to RapidMiner.
3. 3. Data Mining Tasks. Data mining tasks are the kinds of data patterns that can be mined. Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks.
4. 4. Data Mining Tasks. In general, data mining tasks can be classified into two categories. Descriptive mining tasks characterize the general properties of the data. Predictive mining tasks perform inference on the current data in order to make predictions.
5. 5. Data Mining Tasks. The best-known data mining tasks: Classification [Predictive]; Prediction [Predictive]; Association Rules [Descriptive]; Clustering [Descriptive]; Outlier Analysis [Descriptive].
6. 6. Classification. Classification is used for predictive mining tasks. The input data for predictive modeling consists of two types of variables: explanatory variables, which define the essential properties of the data, and target variables, whose values are to be predicted. Classification is used to predict the value of a discrete target variable.
7. 7. Classification
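To make the idea of predicting a discrete target variable concrete, here is a minimal sketch of a 1-nearest-neighbour classifier. The choice of algorithm, the explanatory variables (age, monthly visits), and the training rows are all assumptions made for illustration; none of them come from the slides.

```python
import math

# Hypothetical training data: explanatory variables (age, monthly_visits)
# paired with a discrete target variable (the class label).
training = [
    ((22, 1), "non-purchaser"),
    ((25, 2), "non-purchaser"),
    ((41, 9), "purchaser"),
    ((38, 7), "purchaser"),
]

def classify(point, examples):
    """Return the label of the training example closest to `point`."""
    nearest = min(examples, key=lambda ex: math.dist(point, ex[0]))
    return nearest[1]

# Classify two new, unlabeled customers.
print(classify((40, 8), training))
print(classify((23, 1), training))
```

A real classifier would normally scale the variables and be evaluated on held-out data; this sketch only shows the input/output shape of the task.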
8. 8. Prediction. Similar to classification, except that we are trying to predict the value of a numeric variable (e.g. amount of purchase) rather than a class (e.g. purchaser or non-purchaser).
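As a rough illustration of predicting a numeric value rather than a class, the sketch below fits a straight line by ordinary least squares. The technique, the variable names, and the data (monthly visits versus purchase amount) are assumptions chosen for the example, not something the slides specify.

```python
# Hypothetical data: monthly visits (explanatory) and purchase amount (target).
visits = [1, 2, 3, 4]
amounts = [12.0, 22.0, 32.0, 42.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(visits, amounts)

def predict(x):
    """Predict the purchase amount for a given number of visits."""
    return slope * x + intercept

print(predict(5))
```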
9. 9. Association. Association rule mining aims to find relationships among variables in a database, resulting in different types of rules. It seeks to produce a set of rules describing the features that are strongly related to each other.
10. 10. Association. Original data from a research study on heart disease:
Gender | Age | Smoker | LAD% | RCA%
F      | 52  | Y      | 85   | 100
M      | 62  | N      | 80   | 0
M      | 75  | Y      | 70   | 80
M      | 73  | Y      | 40   | 99
M      | 66  | N      | 50   | 45
…      | …   | …      | …    | …
LAD%: the percentage of heart disease caused by the left anterior descending coronary artery.
RCA%: the percentage of heart disease caused by the right coronary artery.
11. 11. Association. Medical association rules:
Rule 1: Gender=M ∩ Age≥70 ∩ Smoker=Y → RCA%≥50 (40%, 100%)
Rule 2: Gender=F ∩ Age<70 ∩ Smoker=Y → LAD%≥70 (20%, 100%)
Rule 1 indicates that 40% of the cases are male, aged 70 or over, and smokers; for these cases, the probability that RCA% ≥ 50 is 100%.
Rule 2 indicates that 20% of the cases are female, under 70 years old, and smokers; for these cases, the probability that LAD% ≥ 70 is 100%.
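The two percentages attached to each rule can be checked with a short sketch. It assumes the first figure is the share of cases matching the rule's left-hand side and the second is the share of those cases that also satisfy the right-hand side, as the slide's explanation suggests. Only the five rows visible on the slide are used (the remaining rows are elided there), and the function and field names are made up for this example.

```python
# The five rows visible on the slide; the rest of the data set is not shown.
rows = [
    {"gender": "F", "age": 52, "smoker": "Y", "lad": 85, "rca": 100},
    {"gender": "M", "age": 62, "smoker": "N", "lad": 80, "rca": 0},
    {"gender": "M", "age": 75, "smoker": "Y", "lad": 70, "rca": 80},
    {"gender": "M", "age": 73, "smoker": "Y", "lad": 40, "rca": 99},
    {"gender": "M", "age": 66, "smoker": "N", "lad": 50, "rca": 45},
]

def evaluate_rule(data, antecedent, consequent):
    """Return (coverage, confidence) for the rule antecedent -> consequent."""
    matching = [r for r in data if antecedent(r)]
    coverage = len(matching) / len(data)
    if not matching:
        return coverage, 0.0
    confidence = sum(1 for r in matching if consequent(r)) / len(matching)
    return coverage, confidence

rule1 = evaluate_rule(
    rows,
    lambda r: r["gender"] == "M" and r["age"] >= 70 and r["smoker"] == "Y",
    lambda r: r["rca"] >= 50,
)
rule2 = evaluate_rule(
    rows,
    lambda r: r["gender"] == "F" and r["age"] < 70 and r["smoker"] == "Y",
    lambda r: r["lad"] >= 70,
)
print(rule1, rule2)
```

On these five rows the results happen to agree with the figures quoted for Rule 1 and Rule 2, but the full data set could give different values.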
12. 12. Clustering. Finds groups of data points (clusters) such that data points belonging to one cluster are more similar to each other than to data points belonging to a different cluster.
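One common way to find such groups is k-means. The slides do not name an algorithm, so the sketch below is only one possible illustration, restricted to one-dimensional values with hand-picked starting centroids; the data and the function name are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means for 1-D data, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(values, centroids=[1.0, 12.0])
print(centroids, clusters)
```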
13. 13. Clustering. Document clustering. Goal: to find groups of documents that are similar to each other based on the important terms appearing in them. Approach: identify frequently occurring terms in each document, form a similarity measure based on the frequencies of different terms, and use it to cluster. Gain: information retrieval can use the clusters to relate a new document or search term to clustered documents.
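The approach described above (term frequencies, a similarity measure, then clustering) can be sketched as follows. Cosine similarity, the greedy single-pass grouping, the 0.4 threshold, the stop-word list, and the four sample documents are all assumptions for this example; the slide does not prescribe any of them.

```python
import math
from collections import Counter

# Hypothetical documents and a tiny, hand-picked stop-word list.
docs = {
    "d1": "data mining finds patterns in data",
    "d2": "mining patterns from large data sets",
    "d3": "the football team won the match",
    "d4": "the team lost the football match",
}
STOP_WORDS = {"the", "in", "from"}

def term_frequencies(text):
    """Count how often each non-stop-word term occurs in the text."""
    return Counter(t for t in text.lower().split() if t not in STOP_WORDS)

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def cluster(documents, threshold=0.4):
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    vectors = {name: term_frequencies(text) for name, text in documents.items()}
    clusters = []  # each cluster is a list of names; the first one is its seed
    for name, vec in vectors.items():
        for members in clusters:
            if cosine_similarity(vec, vectors[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters

groups = cluster(docs)
print(groups)
```

A new document or search term could then be related to these clusters by computing its similarity to each cluster's seed in the same way.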
14. 14. Outlier Analysis. Discovers data points that are significantly different from the rest of the data. Such points are known as anomalies or outliers.
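A simple way to flag points that differ markedly from the rest is a z-score test. This is only one of many possible techniques, and the threshold of 2 standard deviations and the sample values below are assumptions for illustration.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical measurements with one obviously unusual value.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = find_outliers(readings)
print(outliers)
```

Because the mean and standard deviation are themselves affected by extreme values, median-based measures are often preferred on small or heavily contaminated samples.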
15. 15. Outline: Definition of Data Mining; Data Mining as an Interdisciplinary Field; Process of Data Mining; Data Mining Tasks; Challenges of Data Mining; Data Mining Application Examples; Introduction to RapidMiner.
16. 16. Challenges of Data Mining. Scalability: scalable techniques are needed to handle the massive scale of data. Dimensionality: many applications involve a large number of dimensions (e.g. features or attributes of the data).
17. 17. Challenges of Data Mining. Heterogeneous and complex data: in recent years, complicated data types such as graph-based, free-text, and structured data have been introduced. Techniques developed for data mining must be able to handle the heterogeneity of the data.
18. 18. Challenges of Data Mining. Data quality: many data sets are imperfect due to the presence of missing values and noise in the data. To handle these imperfections, robust data mining algorithms must be developed.
19. 19. Challenges of Data Mining. Data distribution: as the volume of data increases, it is no longer possible or safe to keep all the data in the same place. As a result, the need for distributed data mining techniques has increased over the years.
20. 20. Challenges of Data Mining. Privacy preservation: while privacy aims to prevent the disclosure of information, data mining attempts to reveal interesting knowledge about data. As a result, there is growing interest in developing privacy-preserving data mining algorithms.
21. 21. Outline: Definition of Data Mining; Data Mining as an Interdisciplinary Field; Process of Data Mining; Data Mining Tasks; Challenges of Data Mining; Data Mining Application Examples; Introduction to RapidMiner.
22. 22. Data mining applications. Science: astronomy, bioinformatics, drug discovery, … Business: advertising, CRM (customer relationship management), investments, manufacturing, sports/entertainment, telecom, e-commerce, targeted marketing, health care, … Web: search engines, web mining, … Government: law enforcement, profiling tax cheaters, …