3. What is Data Mining?
• the process of analyzing data from different perspectives
• summarizing it into useful information
• information that can be used to increase revenue, cut costs, or both.
4. Analysis(cont…)
Data mining helps analysts recognize significant
• facts
• relationships
• trends
• patterns
• exceptions
• anomalies
that might otherwise go unnoticed.
5. Industries Using Data Mining
• retail
• finance
• health care
• manufacturing
• transportation
• aerospace
6. Major Data Mining Tasks
1) Classification: predictive, assigning an item to a class
2) Clustering: descriptive, finding groups of items
3) Deviation detection: predictive, finding changes
4) Forecasting: predicting a parameter value
5) Description: describing a group
6) Link analysis: finding relationships and associations
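As a rough illustration of task 3, deviation detection can be sketched as flagging values that lie far from the mean. This is only one simple approach (a z-score test), not a method named in these slides; the function name `find_deviations`, the threshold of 2.0, and the sample values are all assumptions made up for the example.

```python
from statistics import mean, stdev


def find_deviations(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean (a simple z-score test)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical daily sales counts; 50 is the unusual day.
    daily_sales = [10, 11, 9, 10, 12, 10, 11, 50]
    print(find_deviations(daily_sales))
```

A z-score test assumes the data are roughly bell-shaped; for skewed data a median-based rule may be a better fit.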
7. Data Warehouse
A single, complete, and consistent store of data obtained from a variety of different sources and made available to end users in a way they can understand and use in a business context.
9. Decision Tree (classification algorithm)
Training data:
Age   Smoke   Risk
20    No      Low
25    Yes     High
44    Yes     High
18    No      Low
55    No      High
35    No      Low

Decision tree for insurance risk:
Smoke?
• Yes: High
• No: check Age
  • 0-35: Low
  • 36-100: High
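The tree on this slide can be written out as a small function and checked against the six training records. This is a minimal sketch, assuming the tree tests Smoke first and then splits non-smokers at age 35, which is the reading consistent with the table; the names `RECORDS` and `insurance_risk` are made up for the example.

```python
# Training records from the slide: (age, smokes, risk)
RECORDS = [
    (20, False, "Low"),
    (25, True, "High"),
    (44, True, "High"),
    (18, False, "Low"),
    (55, False, "High"),
    (35, False, "Low"),
]


def insurance_risk(age: int, smokes: bool) -> str:
    """Walk the tree: test Smoke first, then Age for non-smokers."""
    if smokes:
        return "High"
    # Non-smokers: ages 0-35 are low risk, 36-100 are high risk.
    return "Low" if age <= 35 else "High"


if __name__ == "__main__":
    for age, smokes, risk in RECORDS:
        predicted = insurance_risk(age, smokes)
        status = "matches" if predicted == risk else "differs"
        print(age, "Yes" if smokes else "No", predicted, status)
```

Each path from the root to a leaf reads as a plain if-then rule, which is why decision tree models are easy to interpret.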
10. Decision tree advantages
• Its model is simple to understand and
interpret
• Requires little data preparation
• Possible to validate a model using
statistical tests.
• Robust
11. ORANGE SOFTWARE
• Open source
• Component-based data visualization and analysis for novices and experts
• Data mining through visual programming or Python scripting
• Add-ons for bioinformatics and text mining
• Packed with features for data analytics
12. Orange Developments
• In 1997: developed in the Bioinformatics Laboratory of the Faculty of Computer and Information Science, Slovenia.
• In 2005: extended to data analysis in bioinformatics.
• In 2008: installation packages were developed.
• In 2009: over 100 widgets had been created and were being maintained.
13. Widgets?
• Orange widgets provide a graphical user interface to Orange’s data mining and machine learning methods. They include widgets for
  • data entry and preprocessing
  • data visualization
  • classification
16. Examples
• Any of your schemas should probably start with the File widget. In the schema below, the File widget reads the data, which is then sent both to the Data Table widget and to a widget that displays attribute statistics.