3. *
*Performing data mining analysis on large databases is difficult
because of the sheer volume of data.
*Attribute oriented analysis is one technique for coping with this volume.
*Here the analysis is done on the basis of attributes: attributes
are selected and generalised, and the patterns of knowledge
ultimately formed are expressed in terms of attributes only.
*An attribute is a property or characteristic of an object. A
collection of attributes describes an object.
4. *Attribute generalisation is based on the following rule: “If there is a
large set of distinct values for an attribute, then a generalisation
operator should be selected and applied to the attribute.”
*Nominal attributes: The operation defines a sub-cube by performing a
selection on two or more dimensions (the dice operation in data cube terms).
*Structured attributes: Climbing up a concept hierarchy is used, that is,
a value in an attribute-value pair is replaced with a more general one.
In data cube terms this is the roll-up operation, which performs
aggregation on a data cube, either by climbing up a concept hierarchy
for a dimension or by dimension reduction.
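The generalisation rule and the hierarchy climbing described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the city-to-country hierarchy, the sample rows, the threshold of 2, and the helper names `generalise` and `merge_with_count` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical one-level concept hierarchy: city -> country.
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Chicago": "USA",
    "Seattle": "USA",
}

def generalise(rows, attribute, hierarchy, threshold):
    """Climb one hierarchy level if the attribute has more distinct
    values than the threshold; otherwise return the rows unchanged."""
    distinct = {row[attribute] for row in rows}
    if len(distinct) <= threshold:
        return rows
    return [
        {**row, attribute: hierarchy.get(row[attribute], row[attribute])}
        for row in rows
    ]

def merge_with_count(rows):
    """Merge identical generalised tuples and accumulate a count."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return [{**dict(key), "count": n} for key, n in counts.items()]

# Illustrative data only.
rows = [
    {"city": "Vancouver", "major": "CS"},
    {"city": "Toronto", "major": "CS"},
    {"city": "Chicago", "major": "CS"},
    {"city": "Seattle", "major": "Physics"},
]
generalised = generalise(rows, "city", CITY_TO_COUNTRY, threshold=2)
summary = merge_with_count(generalised)
for line in summary:
    print(line)
```

With four distinct cities and a threshold of 2, the city values are replaced by countries, and the two Canadian CS tuples collapse into one tuple with a count of 2.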
5. *
*The general idea behind attribute relevance analysis
is to compute some measure that quantifies the
relevance of an attribute with respect to a
given class or concept.
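One commonly used relevance measure is information gain. The sketch below assumes that choice (the slides do not name a specific measure), and the sample attributes and class labels are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )

def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on an attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Illustrative data only: class label and two candidate attributes.
status = ["grad", "grad", "undergrad", "undergrad"]
major = ["CS", "CS", "Physics", "Physics"]   # perfectly predicts status
gender = ["M", "F", "M", "F"]                # unrelated to status

gain_major = information_gain(major, status)
gain_gender = information_gain(gender, status)
print(gain_major, gain_gender)
```

An attribute with a higher gain is more relevant to the class; here `major` separates the two classes completely, while `gender` carries no information about them.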
6. *
*Attribute selection is a term commonly used in data
mining to describe the tools and techniques available
for reducing inputs to a manageable size for
processing and analysis.
*Attribute selection implies not only cardinality
reduction but also the choice of attributes based on
their usefulness for analysis.
7. *
*Find a subset of attributes that is most likely to
describe/predict the class best. The following method
may be used:
*Filtering: Filter type methods select variables
independently of the model. Filter methods suppress the
least interesting variables. These methods are
particularly efficient in computation time and robust
to overfitting.
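A filter method can be sketched as scoring each variable on its own and keeping the top-ranked ones, with no model involved. The example below assumes absolute Pearson correlation with a numeric target as the score; the column names, data, and the helper `filter_select` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def filter_select(columns, target, k):
    """Score every column independently and keep the k best."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Illustrative data only.
columns = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [7, 9, 7, 9, 8],
    "constant": [3, 3, 3, 3, 3],
}
passed = [0, 0, 1, 1, 1]

selected, scores = filter_select(columns, passed, k=1)
print(selected, scores)
```

Because each variable is scored once and independently, the cost grows linearly with the number of variables, which is why filters are cheap compared with methods that retrain a model for every candidate subset.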
8. *
*Instance Based Filters: The goal of the instance-based
search is to find the closest decision boundary
to the instance under consideration and assign weights
to the features that bring about the change of class.
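The Relief family of algorithms is a well-known example of this idea, and a simplified version is sketched below as an assumption about what the slide refers to. For each instance it finds the nearest neighbour of the same class (the "hit") and of a different class (the "miss"), then rewards features that differ across the class boundary. The sketch assumes numeric features already scaled to [0, 1], at least two instances per class, and visits every instance instead of sampling at random.

```python
def relief(X, y):
    """Simplified Relief: one nearest hit and one nearest miss per instance."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a, b):
        # Manhattan distance between two feature vectors.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            # Reward separation from the miss, penalise distance to the hit.
            weights[f] += (
                abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])
            ) / n
    return weights

# Illustrative data only: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
y = [0, 0, 1, 1]

weights = relief(X, y)
print(weights)
```

Features with large positive weights change when the class changes, so they are the ones kept by the filter.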
9. *
*In many applications, users may not be interested in
having a single class described or characterised, but
rather would prefer to mine a description that
compares or distinguishes one class from other
comparable classes. Class comparison mines
descriptions that distinguish a target class from its
contrasting classes.
10. *The general procedure for class comparison is as follows:
*Data Collection: The set of relevant data in the database is
collected by query processing and is partitioned
into a target class and one or more contrasting classes.
*Dimension relevance analysis: If there are many dimensions
and analytical comparison is desired, then dimension
relevance analysis should be performed on these classes and
only the highly relevant dimensions are included in the further
analysis.
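*Synchronous generalization: Generalization is performed on
the target class to the level controlled by a user- or
expert-specified dimension threshold, which results in a prime target
class relation. The contrasting classes are generalized to the same
level, so that the classes can be compared directly.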
11. *Presentation of the derived comparison: The
resulting class comparison description can be
visualized in the form of tables, graphs, and rules.
This presentation usually includes a “contrasting”
measure (such as count %) that reflects the
comparisons between the target and contrasting
classes.
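The count % contrasting measure can be sketched as follows: for each generalised value, compute the share of tuples in the target class and in the contrasting class that take that value. The class names, attribute, and rows are illustrative assumptions, and both classes are assumed to be already generalised to the same level.

```python
from collections import Counter

def count_percent(rows, attribute):
    """Percentage of rows taking each value of the attribute."""
    counts = Counter(row[attribute] for row in rows)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}

def compare_classes(target_rows, contrast_rows, attribute):
    """Pair up count % for the target and contrasting class per value."""
    target = count_percent(target_rows, attribute)
    contrast = count_percent(contrast_rows, attribute)
    values = sorted(set(target) | set(contrast))
    return {v: (target.get(v, 0.0), contrast.get(v, 0.0)) for v in values}

# Illustrative data only.
graduates = [{"major": m} for m in ["Science", "Science", "Business", "Science"]]
undergraduates = [{"major": m} for m in ["Business", "Business", "Science", "Arts"]]

comparison = compare_classes(graduates, undergraduates, "major")
for value, (target_pct, contrast_pct) in comparison.items():
    print(f"{value:10s} target={target_pct:5.1f}%  contrast={contrast_pct:5.1f}%")
```

Values whose percentages differ strongly between the two columns are the ones that distinguish the target class from the contrasting class.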
12. *
*Descriptive statistics are of great help in
understanding the distribution of the data. They help
us choose an effective implementation.
13. *
*Arithmetic mean: the arithmetic mean is the sum of a collection of
numbers divided by the number of numbers in the
collection.
*Median: the median is the number separating the higher
half of a data sample from the lower half.
*Mode: the mode is the value that appears most often in a
set of data.
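These three measures of central tendency are available in Python's standard `statistics` module. The sample data below is made up for illustration.

```python
import statistics

# Illustrative data only.
data = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(data)      # sum of values / number of values
median_value = statistics.median(data)  # middle of the sorted data
mode_value = statistics.mode(data)      # most frequent value

print(mean_value, median_value, mode_value)
```

With an even number of values, `statistics.median` averages the two middle values, so the median here lies between 3 and 5.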
14. *
*Variance (σ²): variance measures how far a set of
numbers is spread out from their mean.
*Standard deviation (σ): standard deviation is a
measure that is used to quantify the amount of
variation or dispersion of a set of data values; it is
the square root of the variance.
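As a small worked example, the population variance and standard deviation can be computed with the standard `statistics` module. The data set is an assumption chosen so that the numbers come out round, and the population (not sample) formulas are used.

```python
import math
import statistics

# Illustrative data only; the mean of these values is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # mean of squared deviations from the mean
std_dev = statistics.pstdev(data)      # square root of the population variance

print(variance, std_dev)
```

The squared deviations from the mean sum to 32 over 8 values, giving a variance of 4 and a standard deviation of 2. For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.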