SlideShare una empresa de Scribd logo
1 de 28
An Efficient Two-Step Method for
Classification of Spatial Data
Authors : Krzysztof Koperski, Jiawei Han, Nebojsa Stefanovic
Presented on : Spatial Data Handling (SDH’ 98)
Reviewed by: Abhishek Agrawal
Introduction
• In spatial databases very large amounts of Spatial Data have been collected used in various
applications ranging from remote sensing to geographical information systems (GIS), computer
cartography, environmental assessment and planning etc.
• These spatial databases contains many hidden and interesting implicit spatial relations and
patterns which are extracted which are not explicitly stored in such databases.
• One of the spatial data mining techniques is the classification of the spatial objects stored in the
spatial databases where the objective is to label different spatial objects by identifying set of rules
that can describe the partition.
Classification Approach : Spatial Decision Tree
❖ In this paper[1], authors have used decision tree to classify spatial objects based on
➢ Non-Spatial properties of the classified objects (Traditional)
➢ Spatial relations of the classified objects to other objects in the database
❖ Also, authors have analyzed the problem of classification of spatial objects in relevance to
thematic maps and and spatial relationships to other objects in the database.
❖ With the new approach of spatial classification using decision tree, authors provided the
experimental results of both real and synthetic data to compare the performance and quality of the
results with other existing methods in the same problem space.
Business Problem: Label the local business units such as shopping malls
or stores based on their business profit status based on the influence of
their trade area.
Problem Definition
Problem Definition Continue..
Data Mining Problem: Classification of spatial objects such as shopping
malls or stores defined by its attributes, that belong to two or different
classes Y and N which are selected based on attribute high_profit with two
values Y for “yes” and N for “no”.
● In our example, objects OID1 and OID2
belong to class Y and objects OID3,
OID4 and OID5 belong to class N.
● In our example, objects OID1 and OID2
belong to class Y and objects OID3,
OID4 and OID5 belong to class N.
● We want to build a decision tree
classifying objects Oi based on two
types of information:
➢ descriptions of the objects in the
proximity of objects Oi
● In our example, objects OID1 and OID2
belong to class Y and objects OID3,
OID4 and OID5 belong to class N.
● We want to build a decision tree
classifying objects Oi based on two
types of information:
➢ descriptions of the objects in the
proximity of objects Oi
➢ non-spatial attributes of the
thematic map
State of the Art
● Fayyad et. al.[2] used decision tree methods to classify images of stellar objects to detect stars
and galaxies. They used low-level image processing system FOCAS to select and generate basic
attributes. The proposed method deals with image databases and is tailored for the astronomical
attributes which is not suitable for vector data format (GIS Database) .
● Another approach, Ester et. al.[3], based on ID3 algorithm and uses the concept of neighbourhood
graphs. This method doesn’t analyze aggregate values of non-spatial attributes for the
neighbouring objects. Similarly it doesn’t perform any relevance analysis for narrowing its search
space.
● Ng and Yu[4] described a method for the extraction of strong, common and discriminating
characteristics of clusters based on the thematic map. They have not extended the result
characteristics of thematic map to construct decision trees.
Classification Algorithm
Building a decision tree to classify spatial object based on spatial predicates, functions and
thematic maps.
Input :
1. Spatial Database containing:
a. classified objects Oc
b. other spatial objects with non-spatial attributes
2. Geo-mining query specifying:
a. objects to be used, predictive attributes, predicates and functions
b. attribute, predicate or function used as a class label
Output :
Binary Decision Tree
Method: Spatial Decision Tree
1. Collect a set S of classified objects and other objects that are used for description
2. For the sample of spatial object Oc from S:
a. Build sets of predicates describing all objects using coarse predicates, functions and
attributes.
b. Perform generalization of the sets of predicates based on concept hierarchies
c. Find coarse predicates, functions, and relevant attributes using RELIEF algorithm
3. Find the best size for the buffer for aggregates of thematic map polygons. It is done by finding for
all relevant non-spatial aggregate attributes the size of the buffer Xmax where the information gain
for the aggregated attribute is maximum.
4. Build sets of predicates using relevant fine predicates and generalize based on concept
hierarchies.
5. Generate Decision Tree
Method: Spatial Decision Tree
1. Collect a set S of classified objects and other objects that are used for description
Method: Spatial Decision Tree
1. Collect a set S of classified objects and other objects that are used for description
2. For the sample of spatial object Oc from S:
a. Build sets of predicates describing all objects using coarse predicates, functions and
attributes.
Step1.a : Define MBR(Minimum Bounding Rect.)
using data distribution and confidence
level as threshold.
Method: Spatial Decision Tree
1. Collect a set S of classified objects and other objects that are used for description
2. For the sample of spatial object Oc from S:
a. Build sets of predicates describing all objects using coarse predicates, functions and
attributes.
Step 2.a : Find coarse description for the sample to
list the spatial attributes, functions etc.
Method: Spatial Decision Tree
1. Collect a set S of classified objects and other objects that are used for description
2. For the sample of spatial object Oc from S:
a. Build sets of predicates describing all objects using coarse predicates, functions and
attributes.
b. Perform generalization of the sets of predicates based on concept hierarchies
Step 2.b : Generalize the predicates using concept
hierarchies
Method: Spatial Decision Tree
1. Collect a set S of classified objects and other objects that are used for description
2. For the sample of spatial object Oc from S:
a. Build sets of predicates describing all objects using coarse predicates, functions and
attributes.
b. Perform generalization of the sets of predicates based on concept hierarchies
c. Find coarse predicates, functions, and relevant attributes using RELIEF algorithm
RELIEF ALGORITHM
Find Relevant Attributes
Step 2.c : For every object s in the sample two nearest neighbours are found,
where one neighbour belongs to the same class(Y/N) as object s (nearest hit)
and other neighbour belongs to a class different than s (nearest miss)
1. Collect a set S of classified objects and other objects that are used for description
2. For the sample of spatial object Oc from S:
a. Build sets of predicates describing all objects using coarse predicates, functions and
attributes.
b. Perform generalization of the sets of predicates based on concept hierarchies
c. Find coarse predicates, functions, and relevant attributes using RELIEF algorithm
RELIEF ALGORITHM
Find Relevant Attributes
Step 2.c : Give weights to the predicate based on neighbourhood predicates:
➔ For nearest hit, if it has the same predicate value, then weight for this predicate increases ↑
➔ For nearest hit, if it has the different predicate value, then weight for this predicate decreases ↓
➔ For nearest miss, if it has the same predicate value, then weight for this predicate decreases ↓
➔ For nearest miss, if it has the different predicate value, then weight for this predicate increases ↑
Now based on weight > threshold, we select the relevant predicates
Method: Spatial Decision Tree
Method: Spatial Decision Tree
1. Collect a set S of classified objects and other objects that are used for description
2. For the sample of spatial object Oc from S:
a. Build sets of predicates describing all objects using coarse predicates, functions and
attributes.
b. Perform generalization of the sets of predicates based on concept hierarchies
c. Find coarse predicates, functions, and relevant attributes using RELIEF algorithm
3. Find the best size for the buffer for aggregates of thematic map polygons. It is done by finding for
all relevant non-spatial aggregate attributes the size of the buffer Xmax where the information gain
for the aggregated attribute is maximum.
4. Build sets of predicates using relevant fine predicates and generalize based on concept
hierarchies.
5. Generate Decision Tree
Method: Spatial Decision Tree
Step 3: Find the best size for the buffer for aggregates of thematic map
polygons.
• Now for the shape of the buffer, different criteria
may be used. The buffers may be based on
rings or customer penetration polygons.
• The rings have some advantages:
1. ease of use,
2. no need to determine trade area based on
customer data
1. easy comparison between sites
Method: Spatial Decision Tree
Step 3: Find the best size for the buffer for aggregates of thematic map
polygons.
• Buffers represents area that have an impact
on class label attribute of classified objects.
• The size of buffer is fixed by finding for all
relevant non-spatial aggregate attributes,
the size of the buffer Xmax where the
information gain for the aggregated attribute
is maximum.
Method: Spatial Decision Tree
Step 4 : Build sets of predicates using relevant fine predicates and
generalize based on concept hierarchies.
Method: Spatial Decision Tree
Step 4 : Build sets of predicates using relevant fine predicates and
generalize based on concept hierarchies.
Method: Spatial Decision Tree
Step 5 : Build Decision Tree
Method: Spatial Decision Tree
Step 5 : Build Decision Tree : Binary Split ( Based on Info gain )
Complexity Analysis
Complexity Analysis:
Results & Performance Evaluation
• Experiments were performed on synthetic data merge with TIGER U.S. census data for
washington state.
• With real data, best results were found with threshold between 0 to 0.2 and accuracy drastically
increased when relevance analysis was used.
Conclusion and Future Directions
• Classification of geographical objects enables researcher to explore
interesting relations between spatial and non-spatial data.
• The algorithm performs less costly, approximate spatial computations,
relevance analyses for producing smaller and more accurate decision trees.
• The pre-computed spatial indexes can be stored as part of regular spatial
query to find neighbourhood attributes.
• Authors plan to perform experiments using aggregate values for thematic
maps and by varying distance for close_to spatial predicates.
• Integrate with their spatial data mining prototype GeoMiner
References
[1] Koperski, Krzysztof, Jiawei Han, and Nebojsa Stefanovic. "An efficient two-step method for
classification of spatial data." proceedings of International Symposium on Spatial Data Handling
(SDH’98). 1998.
[2] Fayyad, Usama M., S. George Djorgovski, and Nicholas Weir. "Automating the analysis and
cataloging of sky surveys." Advances in knowledge discovery and data mining. American
Association for Artificial Intelligence, 1996.
[3] Ester, Martin, Hans-Peter Kriegel, and Jörg Sander. "Spatial data mining: A database approach."
Advances in spatial databases. Springer Berlin Heidelberg, 1997.
[4] Ng, R. T., and Y. Yu Discovering Strong. "Common and Discriminating Characteristics of Clusters
from Thematic Maps." Proc. of the 11th Annual Symp. on Geographic Information Systems. 1997.

Más contenido relacionado

La actualidad más candente (11)

My8clst
My8clstMy8clst
My8clst
 
What is cluster analysis
What is cluster analysisWhat is cluster analysis
What is cluster analysis
 
Clusters techniques
Clusters techniquesClusters techniques
Clusters techniques
 
Data clustering
Data clustering Data clustering
Data clustering
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Cluster Analysis Introduction
Cluster Analysis IntroductionCluster Analysis Introduction
Cluster Analysis Introduction
 
Dataa miining
Dataa miiningDataa miining
Dataa miining
 
Lect4
Lect4Lect4
Lect4
 
Machine Learning Project
Machine Learning ProjectMachine Learning Project
Machine Learning Project
 
Cluster Analysis for Dummies
Cluster Analysis for DummiesCluster Analysis for Dummies
Cluster Analysis for Dummies
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 

Destacado

Customer Centric Data Mining
Customer Centric Data MiningCustomer Centric Data Mining
Customer Centric Data Mining
anjeshdubey
 
Artificial Neural Network
Artificial Neural Network Artificial Neural Network
Artificial Neural Network
Iman Ardekani
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
nextlib
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networks
stellajoseph
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
DEEPASHRI HK
 

Destacado (18)

Classification Using Decision tree
Classification Using Decision treeClassification Using Decision tree
Classification Using Decision tree
 
Svm implementation for Health Data
Svm implementation for Health DataSvm implementation for Health Data
Svm implementation for Health Data
 
Classification ANN
Classification ANNClassification ANN
Classification ANN
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Customer Centric Data Mining
Customer Centric Data MiningCustomer Centric Data Mining
Customer Centric Data Mining
 
Knowledge Discovery from Academic Data using Association Rule Mining, Paper P...
Knowledge Discovery from Academic Data using Association Rule Mining, Paper P...Knowledge Discovery from Academic Data using Association Rule Mining, Paper P...
Knowledge Discovery from Academic Data using Association Rule Mining, Paper P...
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
 
Artificial Neural Network
Artificial Neural Network Artificial Neural Network
Artificial Neural Network
 
artificial neural network
artificial neural networkartificial neural network
artificial neural network
 
Artificial Intelligence: Artificial Neural Networks
Artificial Intelligence: Artificial Neural NetworksArtificial Intelligence: Artificial Neural Networks
Artificial Intelligence: Artificial Neural Networks
 
Decision tree
Decision treeDecision tree
Decision tree
 
Back propagation
Back propagationBack propagation
Back propagation
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Decision tree
Decision treeDecision tree
Decision tree
 
Neural network & its applications
Neural network & its applications Neural network & its applications
Neural network & its applications
 
Artificial neural networks
Artificial neural networksArtificial neural networks
Artificial neural networks
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 

Similar a Two-step Classification method for Spatial Decision Tree

Similar a Two-step Classification method for Spatial Decision Tree (20)

DM_clustering.ppt
DM_clustering.pptDM_clustering.ppt
DM_clustering.ppt
 
Developing a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGISDeveloping a Tutorial for Grouping Analysis in ArcGIS
Developing a Tutorial for Grouping Analysis in ArcGIS
 
Computer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an ObjectComputer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an Object
 
Virtual Enterprise Model
Virtual Enterprise ModelVirtual Enterprise Model
Virtual Enterprise Model
 
Data mining techniques unit v
Data mining techniques unit vData mining techniques unit v
Data mining techniques unit v
 
ClusetrigBasic.ppt
ClusetrigBasic.pptClusetrigBasic.ppt
ClusetrigBasic.ppt
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
47 292-298
47 292-29847 292-298
47 292-298
 
Applying a new subject classification scheme for a database by a data-driven ...
Applying a new subject classification scheme for a database by a data-driven ...Applying a new subject classification scheme for a database by a data-driven ...
Applying a new subject classification scheme for a database by a data-driven ...
 
UNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptxUNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptx
 
DM UNIT_4 PPT for btech final year students
DM UNIT_4 PPT for btech final year studentsDM UNIT_4 PPT for btech final year students
DM UNIT_4 PPT for btech final year students
 
Automated features extraction from satellite images.
Automated features extraction from satellite images.Automated features extraction from satellite images.
Automated features extraction from satellite images.
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
lab report 4
lab report 4lab report 4
lab report 4
 
Survey of The Problem of Object Detection In Real Images
Survey of The Problem of Object Detection In Real ImagesSurvey of The Problem of Object Detection In Real Images
Survey of The Problem of Object Detection In Real Images
 
Object Oriented Programming_Lecture 2
Object Oriented Programming_Lecture 2Object Oriented Programming_Lecture 2
Object Oriented Programming_Lecture 2
 
Unit3
Unit3Unit3
Unit3
 
Searching Images: Recent research at Southampton
Searching Images: Recent research at SouthamptonSearching Images: Recent research at Southampton
Searching Images: Recent research at Southampton
 
I1803026164
I1803026164I1803026164
I1803026164
 
advancedR.pdf
advancedR.pdfadvancedR.pdf
advancedR.pdf
 

Último

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Último (20)

How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Basic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationBasic Intentional Injuries Health Education
Basic Intentional Injuries Health Education
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 

Two-step Classification method for Spatial Decision Tree

  • 1. An Efficient Two-Step Method for Classification of Spatial Data Authors : Krzysztof Koperski, Jiawei Han, Nebojsa Stefanovic Presented on : Spatial Data Handling (SDH’ 98) Reviewed by: Abhishek Agrawal
  • 2. Introduction • In spatial databases very large amounts of Spatial Data have been collected used in various applications ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment and planning etc. • These spatial databases contains many hidden and interesting implicit spatial relations and patterns which are extracted which are not explicitly stored in such databases. • One of the spatial data mining techniques is the classification of the spatial objects stored in the spatial databases where the objective is to label different spatial objects by identifying set of rules that can describe the partition.
  • 3. Classification Approach : Spatial Decision Tree ❖ In this paper[1], authors have used decision tree to classify spatial objects based on ➢ Non-Spatial properties of the classified objects (Traditional) ➢ Spatial relations of the classified objects to other objects in the database ❖ Also, authors have analyzed the problem of classification of spatial objects in relevance to thematic maps and and spatial relationships to other objects in the database. ❖ With the new approach of spatial classification using decision tree, authors provided the experimental results of both real and synthetic data to compare the performance and quality of the results with other existing methods in the same problem space.
  • 4. Business Problem: Label the local business units such as shopping malls or stores based on their business profit status based on the influence of their trade area. Problem Definition
  • 5. Problem Definition Continue.. Data Mining Problem: Classification of spatial objects such as shopping malls or stores defined by its attributes, that belong to two or different classes Y and N which are selected based on attribute high_profit with two values Y for “yes” and N for “no”.
  • 6. ● In our example, objects OID1 and OID2 belong to class Y and objects OID3, OID4 and OID5 belong to class N.
  • 7. ● In our example, objects OID1 and OID2 belong to class Y and objects OID3, OID4 and OID5 belong to class N. ● We want to build a decision tree classifying objects Oi based on two types of information: ➢ descriptions of the objects in the proximity of objects Oi
  • 8. ● In our example, objects OID1 and OID2 belong to class Y and objects OID3, OID4 and OID5 belong to class N. ● We want to build a decision tree classifying objects Oi based on two types of information: ➢ descriptions of the objects in the proximity of objects Oi ➢ non-spatial attributes of the thematic map
  • 9. State of the Art ● Fayyad et. al.[2] used decision tree methods to classify images of stellar objects to detect stars and galaxies. They used low-level image processing system FOCAS to select and generate basic attributes. The proposed method deals with image databases and is tailored for the astronomical attributes which is not suitable for vector data format (GIS Database) . ● Another approach, Ester et. al.[3], based on ID3 algorithm and uses the concept of neighbourhood graphs. This method doesn’t analyze aggregate values of non-spatial attributes for the neighbouring objects. Similarly it doesn’t perform any relevance analysis for narrowing its search space. ● Ng and Yu[4] described a method for the extraction of strong, common and discriminating characteristics of clusters based on the thematic map. They have not extended the result characteristics of thematic map to construct decision trees.
  • 10. Classification Algorithm Building a decision tree to classify spatial object based on spatial predicates, functions and thematic maps. Input : 1. Spatial Database containing: a. classified objects Oc b. other spatial objects with non-spatial attributes 2. Geo-mining query specifying: a. objects to be used, predictive attributes, predicates and functions b. attribute, predicate or function used as a class label Output : Binary Decision Tree
  • 11. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For the sample of spatial object Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. b. Perform generalization of the sets of predicates based on concept hierarchies c. Find coarse predicates, functions, and relevant attributes using RELIEF algorithm 3. Find the best size for the buffer for aggregates of thematic map polygons. It is done by finding for all relevant non-spatial aggregate attributes the size of the buffer Xmax where the information gain for the aggregated attribute is maximum. 4. Build sets of predicates using relevant fine predicates and generalize based on concept hierarchies. 5. Generate Decision Tree
  • 12. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description
  • 13. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For the sample of spatial object Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. Step1.a : Define MBR(Minimum Bounding Rect.) using data distribution and confidence level as threshold.
  • 14. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For the sample of spatial object Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. Step 2.a : Find coarse description for the sample to list the spatial attributes, functions etc.
  • 15. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For the sample of spatial object Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. b. Perform generalization of the sets of predicates based on concept hierarchies Step 2.b : Generalize the predicates using concept hierarchies
  • 16. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For the sample of spatial object Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. b. Perform generalization of the sets of predicates based on concept hierarchies c. Find coarse predicates, functions, and relevant attributes using RELIEF algorithm RELIEF ALGORITHM Find Relevant Attributes Step 2.c : For every object s in the sample two nearest neighbours are found, where one neighbour belongs to the same class(Y/N) as object s (nearest hit) and other neighbour belongs to a class different than s (nearest miss)
  • 17. 1. Collect a set S of classified objects and other objects that are used for description 2. For the sample of spatial object Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. b. Perform generalization of the sets of predicates based on concept hierarchies c. Find coarse predicates, functions, and relevant attributes using RELIEF algorithm RELIEF ALGORITHM Find Relevant Attributes Step 2.c : Give weights to the predicate based on neighbourhood predicates: ➔ For nearest hit, if it has the same predicate value, then weight for this predicate increases ↑ ➔ For nearest hit, if it has the different predicate value, then weight for this predicate decreases ↓ ➔ For nearest miss, if it has the same predicate value, then weight for this predicate decreases ↓ ➔ For nearest miss, if it has the different predicate value, then weight for this predicate increases ↑ Now based on weight > threshold, we select the relevant predicates Method: Spatial Decision Tree
  • 18. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For the sample of spatial object Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. b. Perform generalization of the sets of predicates based on concept hierarchies c. Find coarse predicates, functions, and relevant attributes using RELIEF algorithm 3. Find the best size for the buffer for aggregates of thematic map polygons. It is done by finding for all relevant non-spatial aggregate attributes the size of the buffer Xmax where the information gain for the aggregated attribute is maximum. 4. Build sets of predicates using relevant fine predicates and generalize based on concept hierarchies. 5. Generate Decision Tree
  • 19. Method: Spatial Decision Tree Step 3: Find the best size for the buffer for aggregates of thematic map polygons. • Now for the shape of the buffer, different criteria may be used. The buffers may be based on rings or customer penetration polygons. • The rings have some advantages: 1. ease of use, 2. no need to determine trade area based on customer data 1. easy comparison between sites
  • 20. Method: Spatial Decision Tree Step 3: Find the best size for the buffer for aggregates of thematic map polygons. • Buffers represents area that have an impact on class label attribute of classified objects. • The size of buffer is fixed by finding for all relevant non-spatial aggregate attributes, the size of the buffer Xmax where the information gain for the aggregated attribute is maximum.
  • 21. Method: Spatial Decision Tree Step 4 : Build sets of predicates using relevant fine predicates and generalize based on concept hierarchies.
  • 22. Method: Spatial Decision Tree Step 4 : Build sets of predicates using relevant fine predicates and generalize based on concept hierarchies.
  • 23. Method: Spatial Decision Tree Step 5 : Build Decision Tree
  • 24. Method: Spatial Decision Tree Step 5 : Build Decision Tree : Binary Split ( Based on Info gain )
  • 26. Results & Performance Evaluation • Experiments were performed on synthetic data merge with TIGER U.S. census data for washington state. • With real data, best results were found with threshold between 0 to 0.2 and accuracy drastically increased when relevance analysis was used.
  • 27. Conclusion and Future Directions • Classification of geographical objects enables researcher to explore interesting relations between spatial and non-spatial data. • The algorithm performs less costly, approximate spatial computations, relevance analyses for producing smaller and more accurate decision trees. • The pre-computed spatial indexes can be stored as part of regular spatial query to find neighbourhood attributes. • Authors plan to perform experiments using aggregate values for thematic maps and by varying distance for close_to spatial predicates. • Integrate with their spatial data mining prototype GeoMiner
  • 28. References [1] Koperski, Krzysztof, Jiawei Han, and Nebojsa Stefanovic. "An efficient two-step method for classification of spatial data." proceedings of International Symposium on Spatial Data Handling (SDH’98). 1998. [2] Fayyad, Usama M., S. George Djorgovski, and Nicholas Weir. "Automating the analysis and cataloging of sky surveys." Advances in knowledge discovery and data mining. American Association for Artificial Intelligence, 1996. [3] Ester, Martin, Hans-Peter Kriegel, and Jörg Sander. "Spatial data mining: A database approach." Advances in spatial databases. Springer Berlin Heidelberg, 1997. [4] Ng, R. T., and Y. Yu Discovering Strong. "Common and Discriminating Characteristics of Clusters from Thematic Maps." Proc. of the 11th Annual Symp. on Geographic Information Systems. 1997.