Data mining can be viewed as a result of the natural evolution of information
technology. Data collection and database creation, data management (including
data storage and retrieval, and database transaction processing), and advanced data
analysis (involving data warehousing and data mining) are the main functionalities
that emerged from this evolution.
Data mining is primarily used today by companies with a strong consumer
focus - retail, financial, communication, and marketing organizations. It enables
these companies to determine relationships among "internal" factors such as price,
product positioning, or staff skills, and "external" factors such as economic
indicators, competition, and customer demographics. It also enables them to
determine the impact of these factors on sales, customer satisfaction, and
corporate profits. Finally, it enables them to "drill down" into summary
information to view detailed transactional data.
Data mining is the practice of automatically searching large
stores of data to discover patterns and trends that go beyond simple analysis. Data
mining uses sophisticated mathematical algorithms to segment the data and
evaluate the probability of future events. Data mining is also known as Knowledge
Discovery in Data (KDD). Dramatic advances in data capture, processing power,
data transmission, and storage capabilities are enabling organizations to integrate
their various databases into data warehouses.
Tool Used
The Konstanz Information Miner (KNIME)
The Konstanz Information Miner (KNIME) has been developed by the
Nycomed Chair for Bioinformatics and Information Mining at the University of
Konstanz since 2004. It is a modern data analytics platform that lets you perform
sophisticated statistics and data mining on your data to analyze trends and predict
potential results. Its visual workbench combines data access, data transformation,
initial investigation, powerful predictive analytics, and visualization. KNIME also
provides the ability to develop reports based on your information or to automate the
application of new insight back into production systems. KNIME Desktop is open
source.
The user can model workflows, which consist of nodes that process data
transported via connections between the nodes. A flow usually starts with a
node that reads in data from some data source, usually a text file, although
databases can also be queried by special nodes. Imported data is stored in an internal
table-based format consisting of columns with a certain data type (integer, string,
image, molecule, etc.) and an arbitrary number of rows conforming to the column
specifications.
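The node-and-table model described above can be sketched in a few lines of Python. This is only an illustration, not KNIME's API: the names Table, reader_node, and filter_node are hypothetical, and the sample data is invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical typed table: maps each column name to a Python type."""
    spec: dict[str, type]
    rows: list[tuple] = field(default_factory=list)

    def append(self, row: tuple) -> None:
        # Reject rows that do not conform to the column specification.
        if len(row) != len(self.spec):
            raise ValueError("wrong number of cells")
        for value, (name, col_type) in zip(row, self.spec.items()):
            if not isinstance(value, col_type):
                raise TypeError(f"column {name!r} expects {col_type.__name__}")
        self.rows.append(row)


def reader_node(lines: list[str]) -> Table:
    """Source node: parses 'name,age' text lines into a typed table."""
    table = Table(spec={"name": str, "age": int})
    for line in lines:
        name, age = line.split(",")
        table.append((name.strip(), int(age)))
    return table


def filter_node(table: Table, min_age: int) -> Table:
    """Processing node: keeps rows whose 'age' is at least min_age."""
    age_idx = list(table.spec).index("age")
    out = Table(spec=dict(table.spec))
    for row in table.rows:
        if row[age_idx] >= min_age:
            out.append(row)
    return out


# A "connection" here is simply passing one node's output table to the next node.
source = reader_node(["Ada, 36", "Bob, 17", "Cy, 52"])
adults = filter_node(source, min_age=18)
print(adults.rows)
```

The table accepts any number of rows, but each row must match the column types, which mirrors the column specification described above.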
Advantages
Each node stores its results permanently, so workflow
execution can easily be stopped at any node and resumed later on.
Intermediate results can be inspected at any time, and new nodes
can be inserted that use already created data without the preceding
nodes having to be re-executed.
The data tables are stored together with the workflow structure
and the nodes' settings.
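A minimal sketch of why stored node results make it cheap to resume a workflow or extend it. The Node class below is hypothetical and keeps its results in memory for brevity, whereas the text describes results that are stored permanently.

```python
class Node:
    """Hypothetical workflow node that keeps its result after executing."""

    def __init__(self, name, func, upstream=None):
        self.name = name
        self.func = func
        self.upstream = upstream
        self.result = None
        self.executed = False
        self.runs = 0  # counts how often the node's function actually ran

    def execute(self):
        # Reuse the stored result instead of recomputing it.
        if self.executed:
            return self.result
        data = self.upstream.execute() if self.upstream else None
        self.result = self.func(data)
        self.executed = True
        self.runs += 1
        return self.result


read = Node("read", lambda _: [3, 1, 2])
sort = Node("sort", sorted, upstream=read)
sort.execute()  # runs "read" once, then "sort"

# Later, a new node is attached to the already executed "read" node.
total = Node("sum", sum, upstream=read)
print(total.execute(), read.runs)  # 6 1
```

The new "sum" node uses the data that "read" already produced, so "read" is not executed a second time.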
Disadvantages
The drawback of this concept is that preliminary results are not available as
early as they would be if a real pipeline were used.
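The difference can be illustrated with a small Python sketch, assuming a "real pipeline" means row-by-row streaming. All function names are hypothetical. The sketch counts how many input rows must be read before the first output row is available.

```python
def read_rows(n, log):
    """Yields n rows and records each read in log."""
    for i in range(n):
        log.append(i)
        yield i


def first_result_node_style(n):
    # The reader node materializes its complete table before the next node runs.
    log = []
    table = list(read_rows(n, log))
    doubled = [2 * r for r in table]
    return doubled[0], len(log)


def first_result_streaming(n):
    # A streaming pipeline hands each row on as soon as it has been read.
    log = []
    doubled = (2 * r for r in read_rows(n, log))
    first = next(doubled)
    return first, len(log)


print(first_result_node_style(1000))  # (0, 1000)
print(first_result_streaming(1000))   # (0, 1)
```

Both variants produce the same first value, but the node-style version reads all 1000 rows before it is available, while the streaming version reads only one.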