SlideShare una empresa de Scribd logo
1 de 51
DATA WAREHOUSING AND
DATA MINING
UNIT – 5
Prepared by
Mr. P. Nandakumar
Assistant Professor,
Department of IT, SVCET
UNIT-V
Mining Object, Spatial, Multimedia, Text, and Web Data:
Multidimensional Analysis and Descriptive Mining of Complex
Data Objects – Spatial Data Mining – Multimedia Data Mining –
Text Mining – Mining the World Wide Web.
Mining Object, Spatial, Multimedia,
Text, and Web Data
• In general terms, “Mining” is the process of extraction. In the context of computer science, Data
Mining can be referred to as knowledge mining from data, knowledge extraction, data/pattern
analysis, data archaeology, and data dredging.
• There are other kinds of data like semi-structured or unstructured data which includes spatial data,
multimedia data, text data, web data which require different methodologies for data mining.
• Mining Multimedia Data: Multimedia data objects include image data, video data, audio data,
website hyperlinks, and linkages. Multimedia data mining tries to find out interesting patterns from
multimedia databases.
• This includes the processing of the digital data and performs tasks like image processing, image
classification, video, and audio data mining, and pattern recognition.
Mining Object, Spatial, Multimedia,
Text, and Web Data
• Multimedia Data mining is becoming the most interesting research area because most of the social
media platforms like Twitter, Facebook data can be analyzed through this and derive interesting
trends and patterns.
• Mining Web Data: Web mining is essential to discover crucial patterns and knowledge from the
Web. Web content mining analyzes data of several websites which includes the web pages and the
multimedia data such as images in the web pages.
• Web mining is done to understand the content of web pages, unique users of the website, unique
hypertext links, web page relevance and ranking, web page content summaries, time that the users
spent on the particular website, and understand user search patterns. Web mining also finds out the
best search engine and determines the search algorithm used by it. So it helps improve search
efficiency and finds the best search engine for the users.
Mining Object, Spatial, Multimedia,
Text, and Web Data
• Mining Text Data: Text mining is the subfield of data mining, machine learning, Natural Language
processing, and statistics. Most of the information in our daily life is stored as text such as news
articles, technical papers, books, email messages, blogs.
• Text Mining helps us to retrieve high-quality information from text such as sentiment analysis,
document summarization, text categorization, text clustering. We apply machine learning models and
NLP techniques to derive useful information from the text.
• This is done by finding out the hidden patterns and trends by means such as statistical pattern
learning and statistical language modeling.
• In order to perform text mining, we need to preprocess the text by applying the techniques of
stemming and lemmatization in order to convert the textual data into data vectors.
Mining Object, Spatial, Multimedia,
Text, and Web Data
• Mining Spatiotemporal Data: The data that is related to both space and time is Spatiotemporal
data. Spatiotemporal data mining retrieves interesting patterns and knowledge from spatiotemporal
data.
• Spatiotemporal Data mining helps us to find the value of the lands, the age of the rocks and precious
stones, predict the weather patterns.
• Spatiotemporal data mining has many practical applications like GPS in mobile phones, timers,
Internet-based map services, weather services, satellite, RFID, sensor.
Mining Object, Spatial, Multimedia,
Text, and Web Data
• Mining Data Streams: Stream data is the data that can change dynamically and it is noisy,
inconsistent which contain multidimensional features of different data types. So this data is stored in
NoSql database systems.
• NoSQL Database is used to refer a non-SQL or non relational database. It provides a mechanism for
storage and retrieval of data other than tabular relations model used in relational databases. NoSQL
database doesn't use tables for storing data. It is generally used to store big data and real-time web
applications.
• The volume of the stream data is very high and this is the challenge for the effective mining of stream
data. While mining the Data Streams we need to perform the tasks such as clustering, outlier
analysis, and the online detection of rare events in data streams.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
• Multidimensional analysis is the analysis of dimension objects organized in meaningful hierarchies.
Multidimensional analysis allows users to observe data from various viewpoints. This enables them
to spot trends or exceptions in the data. A hierarchy is an ordered series of related dimensions.
• Multidimensional analysis gives the ability to view data from different viewpoints. This is especially
critical for business.
• In statistics, econometrics and related fields, multidimensional analysis (MDA) is a data analysis
process that groups data into two categories: data dimensions and measurements.
• The multidimensional data model is composed of logical cubes, measures, dimensions, hierarchies,
levels, and attributes. The simplicity of the model is inherent because it defines objects that represent
real-world business entities.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
• The Complex data types require advanced data mining techniques. Some of the Complex data types
are sequence Data which includes the Time-Series, Symbolic Sequences, and Biological Sequences.
The additional preprocessing steps are needed for data mining of these complex data types.
1. Time-Series Data Mining:
• In time-series data, data is measured as the long series of the numerical or textual data at equal time
intervals per minute, per hour, or per day. Time-series data mining is performed on the data obtained
from the stock markets, scientific data, and medical data. In time series mining it is not possible to
find the data that exactly matches the given query. We employ the similarity search method that finds
the data sequences that are similar to the given query string. In the similarity search method,
subsequence matching is performed to find the subsequences that are similar to a given query string.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
2. Sequential Pattern Mining in Symbolic Sequences:
• Symbolic sequences are composed of long nominal data sequences, which dynamically change their
behavior over time intervals. Examples of the Symbolic Sequences include online customer
shopping sequences as well as sequences of events of experiments. Mining of Symbolic Sequences is
called Sequential Mining. A sequential pattern is a subsequence that exists more frequently in a set
of sequences. so it finds the most frequent subsequence in a set of sequences to perform the mining.
Many scalable algorithms have been built to find out the frequent subsequence. There are also
algorithms to mine the multidimensional and multilevel sequential patterns.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
3. Data mining of Biological Sequences:
• Biological sequences are the long sequences of nucleotides and data mining of biological sequences
is required to find the features of the DNA of humans. Biological sequence analysis is the first step of
data mining to compare the alignment of the biological sequences. Two species are similar to each
other only if their nucleotide (DNA, RNA) and protein sequences are close and similar. During the
data mining of Biological Sequences, the degree of similarity between nucleotide sequences is
measured. The degree of similarity obtained by sequence alignment of nucleotides is essential in
determining the homology between two sequences. There can be the situation of alignment of two or
more input biological sequences by identifying similar sequences with long subsequences. The amino
acids also called proteins sequences are also compared and aligned.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
4. Graph Pattern Mining:
• Graph Pattern Mining can be done by using Apriori-based and pattern growth-based approaches. We
can mine the subgraphs of the graph and the set of closed graphs. A closed graph g is the graph that
doesn’t have a super graph that carries the same support count as g. Graph Pattern Mining is applied
to different types of graphs such as frequent graphs, coherent graphs, and dense graphs. We can also
improve the mining efficiency by applying the user constraints on the graph patterns. Graph patterns
are two types. Homogeneous graphs where nodes or links of the graph are of the same type by having
similar features. In Heterogeneous graph patterns, the nodes and links are of different types.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
5. Statistical Modeling of Networks:
• A network is a collection of nodes where each node represents the data and the nodes are linked
through edges, representing relationships between data objects. If all the nodes and links connecting
the nodes are of the same type, then the network is homogeneous such as a friend network or a web
page network. If the nodes and links connecting the nodes are of different types, then the network is
heterogeneous such as health-care networks (linking the different parameters such as doctors, nurses,
patients, diseases together in the network). Graph Pattern Mining can be further applied to the
network to derive the knowledge and useful patterns from the network.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
6. Mining Spatial Data:
• Spatial data is the geo space-related data that is stored in large data repositories. The spatial data is
represented in “vector” format and geo-referenced multimedia format. A spatial database is
constructed from large geographic data warehouses by integrating geographical data of multiple
sources of areas. we can construct spatial data cubes that contain information about the spatial
dimensions and measures. It is possible to perform the OLAP operations on the spatial data for
spatial data analysis. Spatial data mining is performed on spatial data warehouses, spatial databases,
and other geospatial data repositories. Spatial Data mining discovers knowledge about the
geographic areas. The preprocessing of spatial data involves several operations like spatial
clustering, spatial classification, spatial modeling, and outlier detection in spatial data.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
7. Mining Cyber-Physical System Data:
• Cyber-Physical System Data can be mined by constructing a graph or network of data. A cyber-
physical system (CPS) is a heterogeneous network that consists of a large number of interconnected
nodes that store patients or medical information. The links in the CPS network represent the
relationship between the nodes . cyber-physical systems store dynamic, inconsistent, and
interdependent data that contains spatiotemporal information. Mining cyber-physical data links the
situation as a query to access the data from a large information database and it involves real-time
calculations and analysis to prompt responses from the CPS system. CPS analysis requires rare-event
detection and anomaly analysis in cyber-physical data streams, in cyber-physical networks, and the
processing of Cyber-Physical Data involves the integration of stream data with real-time automated
control processes.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
8. Mining Multimedia Data:
• Multimedia data objects include image data, video data, audio data, website hyperlinks, and linkages.
Multimedia data mining tries to find out interesting patterns from multimedia databases. This
includes the processing of the digital data and performs tasks like image processing, image
classification, video, and audio data mining, and pattern recognition. Multimedia Data mining is
becoming the most interesting research area because most of the social media platforms like Twitter,
Facebook data can be analyzed through this and derive interesting trends and patterns.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
9. Mining Web Data:
• Web mining is essential to discover crucial patterns and knowledge from the Web. Web content
mining analyzes data of several websites which includes the web pages and the multimedia data such
as images in the web pages. Web mining is done to understand the content of web pages, unique
users of the website, unique hypertext links, web page relevance and ranking, web page content
summaries, time that the users spent on the particular website, and understand user search patterns.
Web mining also finds out the best search engine and determines the search algorithm used by it. So
it helps improve search efficiency and finds the best search engine for the users.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
10. Mining Text Data:
• Text mining is the subfield of data mining, machine learning, Natural Language processing, and
statistics. Most of the information in our daily life is stored as text such as news articles, technical
papers, books, email messages, blogs. Text Mining helps us to retrieve high-quality information from
text such as sentiment analysis, document summarization, text categorization, text clustering. We
apply machine learning models and NLP techniques to derive useful information from the text. This
is done by finding out the hidden patterns and trends by means such as statistical pattern learning and
statistical language modeling. In order to perform text mining, we need to preprocess the text by
applying the techniques of stemming and lemmatization in order to convert the textual data into data
vectors.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
11. Mining Spatiotemporal Data:
• The data that is related to both space and time is Spatiotemporal data. Spatiotemporal data mining
retrieves interesting patterns and knowledge from spatiotemporal data. Spatiotemporal Data mining
helps us to find the value of the lands, the age of the rocks and precious stones, predict the weather
patterns. Spatiotemporal data mining has many practical applications like GPS in mobile phones,
timers, Internet-based map services, weather services, satellite, RFID, sensor.
Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
12. Mining Data Streams:
• Stream data is the data that can change dynamically and it is noisy, inconsistent which contain
multidimensional features of different data types. So this data is stored in NoSql database systems.
The volume of the stream data is very high and this is the challenge for the effective mining of stream
data. While mining the Data Streams we need to perform the tasks such as clustering, outlier
analysis, and the online detection of rare events in data streams.
Spatial Data Mining
A spatial database saves a huge amount of space-related data, including maps, preprocessed remote
sensing or medical imaging records, and VLSI chip design data. Spatial databases have several features
that distinguish them from relational databases.
They carry topological and/or distance information, usually organized by sophisticated,
multidimensional spatial indexing structures that are accessed by spatial data access methods and often
require spatial reasoning, geometric computation, and spatial knowledge representation techniques.
Spatial data mining refers to the extraction of knowledge, spatial relationships, or other interesting
patterns not explicitly stored in spatial databases. Such mining demands the unification of data mining
with spatial database technologies. It can be used for learning spatial records, discovering spatial
relationships and relationships among spatial and nonspatial records, constructing spatial knowledge
bases, reorganizing spatial databases, and optimizing spatial queries.
Spatial Data Mining
It is expected to have broad applications in geographic data systems, marketing, remote sensing, image
database exploration, medical imaging, navigation, traffic control, environmental studies, and many
other areas where spatial data are used.
A central challenge to spatial data mining is the exploration of efficient spatial data mining techniques
because of the large amount of spatial data and the difficulty of spatial data types and spatial access
methods. Statistical spatial data analysis has been a popular approach to analyzing spatial data and
exploring geographic information.
The term geostatistics is often associated with continuous geographic space, whereas the term spatial
statistics is often associated with discrete space. In a statistical model that manages non-spatial records,
one generally considers statistical independence among different areas of data.
Spatial Data Mining
There is no such separation among spatially distributed records because, actually spatial objects are
interrelated, or more exactly spatially co-located, in the sense that the closer the two objects are placed,
the more likely they send the same properties. For example, natural resources, climate, temperature, and
economic situations are likely to be similar in geographically closely located regions.
Such a property of close interdependency across nearby space leads to the notion of spatial
autocorrelation. Based on this notion, spatial statistical modeling methods have been developed with
success. Spatial data mining will create spatial statistical analysis methods and extend them for large
amounts of spatial data, with more emphasis on effectiveness, scalability, cooperation with database
and data warehouse systems, enhanced user interaction, and the discovery of new kinds of knowledge.
Multimedia Data Mining
Multimedia mining is a subfield of data mining that is used to find interesting information of implicit
knowledge from multimedia databases. Mining in multimedia is referred to as automatic annotation or
annotation mining. Mining multimedia data requires two or more data types, such as text and video or
text video and audio.
Multimedia data mining is an interdisciplinary field that integrates image processing and understanding,
computer vision, data mining, and pattern recognition. Multimedia data mining discovers interesting
patterns from multimedia databases that store and manage large collections of multimedia objects,
including image data, video data, audio data, sequence data and hypertext data containing text, text
markups, and linkages. Issues in multimedia data mining include content-based retrieval and similarity
search, generalization and multidimensional analysis. Multimedia data cubes contain additional
dimensions and measures for multimedia information.
Multimedia Data Mining
The framework that manages different types of multimedia data stored, delivered, and utilized in
different ways is known as a multimedia database management system. There are three classes of
multimedia databases: static, dynamic, and dimensional media. The content of the Multimedia Database
management system is as follows:
• Media data:The actual data representing an object.
• Media format data: Information such as sampling rate, resolution, encoding scheme etc., about the
format of the media data after it goes through the acquisition, processing and encoding phase.
• Media keyword data:Keywords description relating to the generation of data. It is also known as
content descriptive data. Example: date, time and place of recording.
• Media feature data: Content dependent data such as the distribution of colours, kinds of texture and
different shapes present in data.
Multimedia Data Mining
Types of multimedia applications based on data management characteristics are:
1. Repository applications: A Large amount of multimedia data and meta-data (Media format date,
Media keyword data, Media feature data) that is stored for retrieval purposes, e.g., Repository of
satellite images, engineering drawings, radiology scanned pictures.
2. Presentation applications: They involve delivering multimedia data subject to temporal
constraints. Optimal viewing or listening requires DBMS to deliver data at a certain rate, offering
the quality of service above a certain threshold. Here data is processed as it is delivered. Example:
Annotating of video and audio data, real-time editing analysis.
3. Collaborative work using multimedia information involves executing a complex task by
merging drawings and changing notifications. Example: Intelligent healthcare network.
Multimedia Data Mining
Challenges with Multimedia Database:
1. Modelling: Working in this area can improve database versus information retrieval techniques; thus,
documents constitute a specialized area and deserve special consideration.
2. Design: The conceptual, logical and physical design of multimedia databases has not yet been
addressed fully as performance and tuning issues at each level are far more complex as they consist of
a variety of formats like JPEG, GIF, PNG, MPEG, which is not easy to convert from one form to
another.
3. Storage: Storage of multimedia database on any standard disk presents the problem of representation,
compression, mapping to device hierarchies, archiving and buffering during input-output operation. In
DBMS, a BLOB (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved.
4. Performance: Physical limitations dominate an application involving video playback or audio-video
synchronization. The use of parallel processing may alleviate some problems, but such techniques are
not yet fully developed. Apart from this, a multimedia database consumes a lot of processing time and
bandwidth.
5. Queries and retrieval: For multimedia data like images, video, and audio accessing data through
query open up many issues like efficient query formulation, query execution and optimization, which
need to be worked upon.
Multimedia Data Mining
Where is Multimedia Database Applied?
Below are the following areas where a multimedia database is applied, such as:
• Documents and record management: Industries and businesses keep detailed records and various
documents. For example, insurance claim records.
• Knowledge dissemination: Multimedia database is a very effective tool for knowledge dissemination in
terms of providing several resources. For example, electronic books.
• Education and training: Computer-aided learning materials can be designed using multimedia sources
which are nowadays very popular sources of learning. Example: Digital libraries.
• Travelling: Marketing, advertising, retailing, entertainment and travel. For example, a virtual tour of
cities.
• Real-time control and monitoring: With active database technology, multimedia presentation of
information can effectively monitor and control complex tasks. For example, manufacturing operation
control.
Multimedia Data Mining
Where is Multimedia Database Applied?
Below are the following areas where a multimedia database is applied, such as:
• Documents and record management: Industries and businesses keep detailed records and various
documents. For example, insurance claim records.
• Knowledge dissemination: Multimedia database is a very effective tool for knowledge dissemination in
terms of providing several resources. For example, electronic books.
• Education and training: Computer-aided learning materials can be designed using multimedia sources
which are nowadays very popular sources of learning. Example: Digital libraries.
• Travelling: Marketing, advertising, retailing, entertainment and travel. For example, a virtual tour of
cities.
• Real-time control and monitoring: With active database technology, multimedia presentation of
information can effectively monitor and control complex tasks. For example, manufacturing operation
control.
Multimedia Data Mining
Categories of Multimedia Data Mining
Multimedia mining refers to analyzing a large amount of multimedia information to extract patterns based
on their statistical relationships. Multimedia data mining is classified into two broad categories: static and
dynamic media. Static media contains text (digital library, creating SMS and MMS) and images (photos
and medical images). Dynamic media contains Audio (music and MP3 sounds) and Video (movies). The
below image shows the categories of multimedia data mining.
Multimedia Data Mining
Application of Multimedia Mining
Multimedia Data Mining
Process of Multimedia Data Mining
Multimedia Data Mining
Architecture for Multimedia Data Mining
Text Mining in Data Mining
Text mining is a component of data mining that deals specifically with unstructured text data. It involves
the use of natural language processing (NLP) techniques to extract useful information and insights from
large amounts of unstructured text data. Text mining can be used as a preprocessing step for data mining or
as a standalone process for specific tasks.
By using text mining, the unstructured text data can be transformed into structured data that can be used
for data mining tasks such as classification, clustering, and association rule mining. This allows
organizations to gain insights from a wide range of data sources, such as customer feedback, social media
posts, and news articles.
What is the common usage of Text Mining?
Text mining is widely used in various fields, such as natural language processing, information retrieval,
and social media analysis. It has become an essential tool for organizations to extract insights from
unstructured text data and make data-driven decisions.
Text Mining in Data Mining
Conventional Process of Text Mining
• Gathering unstructured information from various sources accessible in various document organizations,
for example, plain text, web pages, PDF records, etc.
• Pre-processing and data cleansing tasks are performed to distinguish and eliminate inconsistency in the
data. The data cleansing process makes sure to capture the genuine text, and it is performed to eliminate
stop words stemming (the process of identifying the root of a certain word and indexing the data.
• Processing and controlling tasks are applied to review and further clean the data set.
• Pattern analysis is implemented in Management Information System.
• Information processed in the above steps is utilized to extract important and applicable data for a
powerful and convenient decision-making process and trend analysis.
Text Mining in Data Mining
Conventional Process of Text Mining
Text Mining in Data Mining
Procedures for Analyzing Text Mining
Text Summarization: To extract its partial content and reflect its whole content automatically.
Text Categorization: To assign a category to the text among categories predefined by users.
Text Clustering: To segment texts into several clusters, depending on the substantial relevance.
Text Mining in Data Mining
Text Mining Techniques
Information Retrieval: In the process of Information retrieval, we try to process the available documents
and the text data into a structured form so, that we can apply different pattern recognition and analytical
processes. It is a process of extracting relevant and associated patterns according to a given set of words or
text documents.
Information Extraction: It is a process of extracting meaningful words from documents.
• Feature Extraction – In this process, we try to develop some new features from existing ones. This
objective can be achieved by parsing an existing feature or combining two or more features based on
some mathematical operation.
• Feature Selection – In this process, we try to reduce the dimensionality of the dataset which is
generally a common issue while dealing with the text data by selecting a subset of features from the
whole dataset.
Natural Language Processing: Natural Language Processing includes tasks that are accomplished by
using Machine Learning and Deep Learning methodologies. It concerns the automatic processing and
analysis of unstructured text information.
Text Mining in Data Mining
Application Area of Text Mining
• Digital Library
• Academic and Research Field
• Life Science
• Social-Media
• Business Intelligence
Text Mining in Data Mining
Issues in Text Mining
Numerous issues happen during the text mining process:
1. The efficiency and effectiveness of decision-making.
2. The uncertain problem can come at an intermediate stage of text mining. In the pre-processing stage,
different rules and guidelines are characterized to normalize the text which makes the text-mining
process efficient. Before applying pattern analysis to the document, there is a need to change over
unstructured data into a moderate structure.
3. Sometimes original message or meaning can be changed due to alteration.
4. Another issue in text mining is many algorithms and techniques support multi-language text. It may
create ambiguity in text meaning. This problem can lead to false-positive results.
5. The utilization of synonyms, polysemy, and antonyms in the document text makes issues for the text
mining tools that take both in a similar setting. It is difficult to categorize such kinds of text/ words.
Text Mining in Data Mining
Advantages of Text Mining
1. Large Amounts of Data: Text mining allows organizations to extract insights from large amounts of
unstructured text data. This can include customer feedback, social media posts, and news articles.
2. Variety of Applications: Text mining has a wide range of applications, including sentiment analysis,
named entity recognition, and topic modeling. This makes it a versatile tool for organizations to gain
insights from unstructured text data.
3. Improved Decision Making: Text mining can be used to extract insights from unstructured text data,
which can be used to make data-driven decisions.
4. Cost-effective: Text mining can be a cost-effective way to extract insights from unstructured text
data, as it eliminates the need for manual data entry.
5. Broader benefits: Cost reductions, productivity increases, the creation of novel new services, and
new business models are just a few of the larger economic advantages mentioned by those consulted.
Text Mining in Data Mining
Disadvantages of Text Mining
Complexity: Text mining can be a complex process that requires advanced skills in natural language
processing and machine learning.
Quality of Data: The quality of text data can vary, which can affect the accuracy of the insights extracted
from text mining.
High Computational Cost: Text mining requires high computational resources, and it may be difficult for
smaller organizations to afford the technology.
Limited to Text Data: Text mining is limited to extracting insights from unstructured text data and cannot
be used with other data types.
Noise in text mining results: Text mining of documents may result in mistakes. It’s possible to find false
links or to miss others. In most situations, if the noise (error rate) is sufficiently low, the benefits of
automation exceed the chance of a larger mistake than that produced by a human reader.
Lack of transparency: Text mining is frequently viewed as a mysterious process where large corpora of
text documents are input and new information is produced. Text mining is in fact opaque when researchers
lack the technical know-how or expertise to comprehend how it operates, or when they lack access to
corpora or text mining tools.
Data Mining- World Wide Web
Over the last few years, the World Wide Web has become a significant source of information and
simultaneously a popular platform for business.
Web mining can define as the method of utilizing data mining techniques and algorithms to extract
useful information directly from the web, such as Web documents and services, hyperlinks, Web
content, and server logs.
The World Wide Web contains a large amount of data that provides a rich source to data mining. The
objective of Web mining is to look for patterns in Web data by collecting and examining data in order to
gain insights.
Data Mining- World Wide Web
What is Web Mining?
Web mining can widely be seen as the application of adapted data mining techniques to the web,
whereas data mining is defined as the application of the algorithm to discover patterns on mostly
structured data embedded into a knowledge discovery process.
Web mining has a distinctive property to provide a set of various data types. The web has multiple
aspects that yield different approaches for the mining process, such as web pages consist of text, web
pages are linked via hyperlinks, and user activity can be monitored via web server logs.
Data Mining- World Wide Web
Data Mining- World Wide Web
1. Web Content Mining:
Web content mining can be used to extract useful data, information, knowledge from the web page
content. In web content mining, each web page is considered as an individual document. The individual
can take advantage of the semi-structured nature of web pages, as HTML provides information that
concerns not only the layout but also logical structure. The primary task of content mining is data
extraction, where structured data is extracted from unstructured websites. The objective is to facilitate
data aggregation over various web sites by using the extracted structured data. Web content mining can
be utilized to distinguish topics on the web. For Example, if any user searches for a specific task on the
search engine, then the user will get a list of suggestions.
Data Mining- World Wide Web
2. Web Structured Mining:
The web structure mining can be used to find the link structure of hyperlink. It is used to identify that
data either link the web pages or direct link network. In Web Structure Mining, an individual considers
the web as a directed graph, with the web pages being the vertices that are associated with hyperlinks.
The most important application in this regard is the Google search engine, which estimates the ranking
of its outcomes primarily with the PageRank algorithm. It characterizes a page to be exceptionally
relevant when frequently connected by other highly related pages. Structure and content mining
methodologies are usually combined. For example, web structured mining can be beneficial to
organizations to regulate the network between two commercial sites.
Data Mining- World Wide Web
3. Web Usage Mining:
Web usage mining is used to extract useful data, information, knowledge from the weblog records, and
assists in recognizing the user access patterns for web pages. In Mining, the usage of web resources, the
individual is thinking about records of requests of visitors of a website, that are often collected as web
server logs. While the content and structure of the collection of web pages follow the intentions of the
authors of the pages, the individual requests demonstrate how the consumers see these pages. Web
usage mining may disclose relationships that were not proposed by the creator of the pages.
Data Mining- World Wide Web
Some of the methods to identify and analyze the web usage patterns are given below:
I. Session and visitor analysis:
The analysis of preprocessed data can be accomplished in session analysis, which incorporates the
guest records, days, time, sessions, etc. This data can be utilized to analyze the visitor's behavior.
The document is created after this analysis, which contains the details of repeatedly visited web pages,
common entry, and exit.
II. OLAP (Online Analytical Processing):
OLAP accomplishes a multidimensional analysis of advanced data.
OLAP can be accomplished on various parts of log related data in a specific period.
OLAP tools can be used to infer important business intelligence metrics
Data Mining- World Wide Web
Challenges in Web Mining:
The web pretends incredible challenges for resources, and knowledge discovery based on the following
observations:
The complexity of web pages:
The site pages don't have a unifying structure. They are extremely complicated as compared to
traditional text documents. There are enormous amounts of documents in the digital library of the web.
These libraries are not organized according to a specific order.
The web is a dynamic data source:
The data on the internet is quickly updated. For example, news, climate, shopping, financial news,
sports, and so on.
Diversity of client networks:
The client network on the web is quickly expanding. These clients have different interests,
backgrounds, and usage purposes. There are over a hundred million workstations that are associated
with the internet and still increasing tremendously.
Data Mining- World Wide Web
Relevancy of data:
It is considered that a specific person is generally concerned about a small portion of the web, while the rest
of the segment of the web contains the data that is not familiar to the user and may lead to unwanted results.
The web is too broad:
The size of the web is tremendous and rapidly increasing. It appears that the web is too huge for data
warehousing and data mining.
Application of Web Mining:
Web mining has an extensive application because of various uses of the web. The list of some applications of
web mining is given below.
Marketing and conversion tool
Data analysis on website and application accomplishment.
Audience behavior analysis
Advertising and campaign accomplishment analysis.
Testing and analysis of a site.

Más contenido relacionado

La actualidad más candente

Dimensional modelling-mod-3
Dimensional modelling-mod-3Dimensional modelling-mod-3
Dimensional modelling-mod-3
Malik Alig
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
saba khan
 
6 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2
Fabio Fumarola
 

La actualidad más candente (20)

Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Data Cleaning Techniques
Data Cleaning TechniquesData Cleaning Techniques
Data Cleaning Techniques
 
Dimensional modelling-mod-3
Dimensional modelling-mod-3Dimensional modelling-mod-3
Dimensional modelling-mod-3
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Association rule mining
Association rule miningAssociation rule mining
Association rule mining
 
Data preprocessing PPT
Data preprocessing PPTData preprocessing PPT
Data preprocessing PPT
 
Data mining
Data mining Data mining
Data mining
 
Analytics with Descriptive, Predictive and Prescriptive Techniques
Analytics with Descriptive, Predictive and Prescriptive TechniquesAnalytics with Descriptive, Predictive and Prescriptive Techniques
Analytics with Descriptive, Predictive and Prescriptive Techniques
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
 
Data mining Part 1
Data mining Part 1Data mining Part 1
Data mining Part 1
 
Data Mining & Applications
Data Mining & ApplicationsData Mining & Applications
Data Mining & Applications
 
EDA-Unit 1.pdf
EDA-Unit 1.pdfEDA-Unit 1.pdf
EDA-Unit 1.pdf
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Datamining - On What Kind of Data
Datamining - On What Kind of DataDatamining - On What Kind of Data
Datamining - On What Kind of Data
 
Query optimization
Query optimizationQuery optimization
Query optimization
 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
OLAP OnLine Analytical Processing
OLAP OnLine Analytical ProcessingOLAP OnLine Analytical Processing
OLAP OnLine Analytical Processing
 
Logistic regression
  Logistic regression  Logistic regression
Logistic regression
 
6 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2
 

Similar a UNIT - 5: Data Warehousing and Data Mining

Similar a UNIT - 5: Data Warehousing and Data Mining (20)

Unit 3 part i Data mining
Unit 3 part i Data miningUnit 3 part i Data mining
Unit 3 part i Data mining
 
All types of mining and trends indata mining
All types of mining and trends indata miningAll types of mining and trends indata mining
All types of mining and trends indata mining
 
Data, Text and Web Mining
Data, Text and Web Mining Data, Text and Web Mining
Data, Text and Web Mining
 
NCCT.pptx
NCCT.pptxNCCT.pptx
NCCT.pptx
 
Data mining slide for data mining process
Data mining slide for data mining processData mining slide for data mining process
Data mining slide for data mining process
 
2 introductory slides
2 introductory slides2 introductory slides
2 introductory slides
 
DOWLD SLIDES.pptx
DOWLD SLIDES.pptxDOWLD SLIDES.pptx
DOWLD SLIDES.pptx
 
2 Data-mining process
2   Data-mining process2   Data-mining process
2 Data-mining process
 
Dma unit 1
Dma unit   1Dma unit   1
Dma unit 1
 
Week-1-Introduction to Data Mining.pptx
Week-1-Introduction to Data Mining.pptxWeek-1-Introduction to Data Mining.pptx
Week-1-Introduction to Data Mining.pptx
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Data mining - Process, Techniques and Research Topics
Data mining - Process, Techniques and Research TopicsData mining - Process, Techniques and Research Topics
Data mining - Process, Techniques and Research Topics
 
G045033841
G045033841G045033841
G045033841
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
 
ch2 DS.pptx
ch2 DS.pptxch2 DS.pptx
ch2 DS.pptx
 
Dm unit i r16
Dm unit i   r16Dm unit i   r16
Dm unit i r16
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Review
 
Chapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptxChapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptx
 
A Novel Data mining Technique to Discover Patterns from Huge Text Corpus
A Novel Data mining Technique to Discover Patterns from Huge  Text CorpusA Novel Data mining Technique to Discover Patterns from Huge  Text Corpus
A Novel Data mining Technique to Discover Patterns from Huge Text Corpus
 
Unit 4 Advanced Data Analytics
Unit 4 Advanced Data AnalyticsUnit 4 Advanced Data Analytics
Unit 4 Advanced Data Analytics
 

Más de Nandakumar P

Más de Nandakumar P (17)

UNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data MiningUNIT - 4: Data Warehousing and Data Mining
UNIT - 4: Data Warehousing and Data Mining
 
UNIT 3: Data Warehousing and Data Mining
UNIT 3: Data Warehousing and Data MiningUNIT 3: Data Warehousing and Data Mining
UNIT 3: Data Warehousing and Data Mining
 
UNIT 2: Part 2: Data Warehousing and Data Mining
UNIT 2: Part 2: Data Warehousing and Data MiningUNIT 2: Part 2: Data Warehousing and Data Mining
UNIT 2: Part 2: Data Warehousing and Data Mining
 
UNIT 2: Part 1: Data Warehousing and Data Mining
UNIT 2: Part 1: Data Warehousing and Data MiningUNIT 2: Part 1: Data Warehousing and Data Mining
UNIT 2: Part 1: Data Warehousing and Data Mining
 
UNIT - 1 Part 2: Data Warehousing and Data Mining
UNIT - 1 Part 2: Data Warehousing and Data MiningUNIT - 1 Part 2: Data Warehousing and Data Mining
UNIT - 1 Part 2: Data Warehousing and Data Mining
 
UNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data MiningUNIT - 1 : Part 1: Data Warehousing and Data Mining
UNIT - 1 : Part 1: Data Warehousing and Data Mining
 
UNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
UNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHONUNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
UNIT - 5 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
 
UNIT - 2 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
UNIT - 2 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHONUNIT - 2 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
UNIT - 2 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
 
UNIT-1 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
UNIT-1 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON UNIT-1 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
UNIT-1 : 20ACS04 – PROBLEM SOLVING AND PROGRAMMING USING PYTHON
 
Python Course for Beginners
Python Course for BeginnersPython Course for Beginners
Python Course for Beginners
 
CS6601-Unit 4 Distributed Systems
CS6601-Unit 4 Distributed SystemsCS6601-Unit 4 Distributed Systems
CS6601-Unit 4 Distributed Systems
 
Unit-4 Professional Ethics in Engineering
Unit-4 Professional Ethics in EngineeringUnit-4 Professional Ethics in Engineering
Unit-4 Professional Ethics in Engineering
 
Unit-3 Professional Ethics in Engineering
Unit-3 Professional Ethics in EngineeringUnit-3 Professional Ethics in Engineering
Unit-3 Professional Ethics in Engineering
 
Naming in Distributed Systems
Naming in Distributed SystemsNaming in Distributed Systems
Naming in Distributed Systems
 
Unit 3.1 cs6601 Distributed File System
Unit 3.1 cs6601 Distributed File SystemUnit 3.1 cs6601 Distributed File System
Unit 3.1 cs6601 Distributed File System
 
Unit 3 cs6601 Distributed Systems
Unit 3 cs6601 Distributed SystemsUnit 3 cs6601 Distributed Systems
Unit 3 cs6601 Distributed Systems
 
Professional Ethics in Engineering
Professional Ethics in EngineeringProfessional Ethics in Engineering
Professional Ethics in Engineering
 

Último

The basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptxThe basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptx
heathfieldcps1
 
Financial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdfFinancial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdf
MinawBelay
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project research
CaitlinCummins3
 

Último (20)

ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
 
The basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptxThe basics of sentences session 4pptx.pptx
The basics of sentences session 4pptx.pptx
 
Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment
 Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment
Envelope of Discrepancy in Orthodontics: Enhancing Precision in Treatment
 
Financial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdfFinancial Accounting IFRS, 3rd Edition-dikompresi.pdf
Financial Accounting IFRS, 3rd Edition-dikompresi.pdf
 
An overview of the various scriptures in Hinduism
An overview of the various scriptures in HinduismAn overview of the various scriptures in Hinduism
An overview of the various scriptures in Hinduism
 
The Ball Poem- John Berryman_20240518_001617_0000.pptx
The Ball Poem- John Berryman_20240518_001617_0000.pptxThe Ball Poem- John Berryman_20240518_001617_0000.pptx
The Ball Poem- John Berryman_20240518_001617_0000.pptx
 
II BIOSENSOR PRINCIPLE APPLICATIONS AND WORKING II
II BIOSENSOR PRINCIPLE APPLICATIONS AND WORKING IIII BIOSENSOR PRINCIPLE APPLICATIONS AND WORKING II
II BIOSENSOR PRINCIPLE APPLICATIONS AND WORKING II
 
size separation d pharm 1st year pharmaceutics
size separation d pharm 1st year pharmaceuticssize separation d pharm 1st year pharmaceutics
size separation d pharm 1st year pharmaceutics
 
Championnat de France de Tennis de table/
Championnat de France de Tennis de table/Championnat de France de Tennis de table/
Championnat de France de Tennis de table/
 
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General QuizPragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project research
 
The Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryThe Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. Henry
 
MOOD STABLIZERS DRUGS.pptx
MOOD     STABLIZERS           DRUGS.pptxMOOD     STABLIZERS           DRUGS.pptx
MOOD STABLIZERS DRUGS.pptx
 
How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17How to Analyse Profit of a Sales Order in Odoo 17
How to Analyse Profit of a Sales Order in Odoo 17
 
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
Mattingly "AI and Prompt Design: LLMs with Text Classification and Open Source"
 
HVAC System | Audit of HVAC System | Audit and regulatory Comploance.pptx
HVAC System | Audit of HVAC System | Audit and regulatory Comploance.pptxHVAC System | Audit of HVAC System | Audit and regulatory Comploance.pptx
HVAC System | Audit of HVAC System | Audit and regulatory Comploance.pptx
 
An Overview of the Odoo 17 Knowledge App
An Overview of the Odoo 17 Knowledge AppAn Overview of the Odoo 17 Knowledge App
An Overview of the Odoo 17 Knowledge App
 
PSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptxPSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptx
 
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
 
BỘ LUYỆN NGHE TIẾNG ANH 8 GLOBAL SUCCESS CẢ NĂM (GỒM 12 UNITS, MỖI UNIT GỒM 3...
BỘ LUYỆN NGHE TIẾNG ANH 8 GLOBAL SUCCESS CẢ NĂM (GỒM 12 UNITS, MỖI UNIT GỒM 3...BỘ LUYỆN NGHE TIẾNG ANH 8 GLOBAL SUCCESS CẢ NĂM (GỒM 12 UNITS, MỖI UNIT GỒM 3...
BỘ LUYỆN NGHE TIẾNG ANH 8 GLOBAL SUCCESS CẢ NĂM (GỒM 12 UNITS, MỖI UNIT GỒM 3...
 

UNIT - 5: Data Warehousing and Data Mining

  • 1. DATA WAREHOUSING AND DATA MINING UNIT – 5 Prepared by Mr. P. Nandakumar Assistant Professor, Department of IT, SVCET
  • 2. UNIT-V Mining Object, Spatial, Multimedia, Text, and Web Data: Multidimensional Analysis and Descriptive Mining of Complex Data Objects – Spatial Data Mining – Multimedia Data Mining – Text Mining – Mining the World Wide Web.
  • 3. Mining Object, Spatial, Multimedia, Text, and Web Data • In general terms, “Mining” is the process of extraction. In the context of computer science, Data Mining can be referred to as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging. • There are other kinds of data like semi-structured or unstructured data which includes spatial data, multimedia data, text data, web data which require different methodologies for data mining. • Mining Multimedia Data: Multimedia data objects include image data, video data, audio data, website hyperlinks, and linkages. Multimedia data mining tries to find out interesting patterns from multimedia databases. • This includes the processing of the digital data and performs tasks like image processing, image classification, video, and audio data mining, and pattern recognition.
  • 4. Mining Object, Spatial, Multimedia, Text, and Web Data • Multimedia Data mining is becoming the most interesting research area because most of the social media platforms like Twitter, Facebook data can be analyzed through this and derive interesting trends and patterns. • Mining Web Data: Web mining is essential to discover crucial patterns and knowledge from the Web. Web content mining analyzes data of several websites which includes the web pages and the multimedia data such as images in the web pages. • Web mining is done to understand the content of web pages, unique users of the website, unique hypertext links, web page relevance and ranking, web page content summaries, time that the users spent on the particular website, and understand user search patterns. Web mining also finds out the best search engine and determines the search algorithm used by it. So it helps improve search efficiency and finds the best search engine for the users.
  • 5. Mining Object, Spatial, Multimedia, Text, and Web Data • Mining Text Data: Text mining is the subfield of data mining, machine learning, Natural Language processing, and statistics. Most of the information in our daily life is stored as text such as news articles, technical papers, books, email messages, blogs. • Text Mining helps us to retrieve high-quality information from text such as sentiment analysis, document summarization, text categorization, text clustering. We apply machine learning models and NLP techniques to derive useful information from the text. • This is done by finding out the hidden patterns and trends by means such as statistical pattern learning and statistical language modeling. • In order to perform text mining, we need to preprocess the text by applying the techniques of stemming and lemmatization in order to convert the textual data into data vectors.
  • 6. Mining Object, Spatial, Multimedia, Text, and Web Data • Mining Spatiotemporal Data: The data that is related to both space and time is Spatiotemporal data. Spatiotemporal data mining retrieves interesting patterns and knowledge from spatiotemporal data. • Spatiotemporal Data mining helps us to find the value of the lands, the age of the rocks and precious stones, predict the weather patterns. • Spatiotemporal data mining has many practical applications like GPS in mobile phones, timers, Internet-based map services, weather services, satellite, RFID, sensor.
  • 7. Mining Object, Spatial, Multimedia, Text, and Web Data • Mining Data Streams: Stream data is the data that can change dynamically and it is noisy, inconsistent which contain multidimensional features of different data types. So this data is stored in NoSql database systems. • NoSQL Database is used to refer a non-SQL or non relational database. It provides a mechanism for storage and retrieval of data other than tabular relations model used in relational databases. NoSQL database doesn't use tables for storing data. It is generally used to store big data and real-time web applications. • The volume of the stream data is very high and this is the challenge for the effective mining of stream data. While mining the Data Streams we need to perform the tasks such as clustering, outlier analysis, and the online detection of rare events in data streams.
  • 8. Multidimensional Analysis and Descriptive Mining of Complex Data Objects • Multidimensional analysis is the analysis of dimension objects organized in meaningful hierarchies. Multidimensional analysis allows users to observe data from various viewpoints. This enables them to spot trends or exceptions in the data. A hierarchy is an ordered series of related dimensions. • Multidimensional analysis gives the ability to view data from different viewpoints. This is especially critical for business. • In statistics, econometrics and related fields, multidimensional analysis (MDA) is a data analysis process that groups data into two categories: data dimensions and measurements. • The multidimensional data model is composed of logical cubes, measures, dimensions, hierarchies, levels, and attributes. The simplicity of the model is inherent because it defines objects that represent real-world business entities.
  • 9. Multidimensional Analysis and Descriptive Mining of Complex Data Objects • The Complex data types require advanced data mining techniques. Some of the Complex data types are sequence Data which includes the Time-Series, Symbolic Sequences, and Biological Sequences. The additional preprocessing steps are needed for data mining of these complex data types. 1. Time-Series Data Mining: • In time-series data, data is measured as the long series of the numerical or textual data at equal time intervals per minute, per hour, or per day. Time-series data mining is performed on the data obtained from the stock markets, scientific data, and medical data. In time series mining it is not possible to find the data that exactly matches the given query. We employ the similarity search method that finds the data sequences that are similar to the given query string. In the similarity search method, subsequence matching is performed to find the subsequences that are similar to a given query string.
  • 10. Multidimensional Analysis and Descriptive Mining of Complex Data Objects 2. Sequential Pattern Mining in Symbolic Sequences: • Symbolic sequences are composed of long nominal data sequences, which dynamically change their behavior over time intervals. Examples of the Symbolic Sequences include online customer shopping sequences as well as sequences of events of experiments. Mining of Symbolic Sequences is called Sequential Mining. A sequential pattern is a subsequence that exists more frequently in a set of sequences. so it finds the most frequent subsequence in a set of sequences to perform the mining. Many scalable algorithms have been built to find out the frequent subsequence. There are also algorithms to mine the multidimensional and multilevel sequential patterns.
  • 11. Multidimensional Analysis and Descriptive Mining of Complex Data Objects 3. Data mining of Biological Sequences: • Biological sequences are the long sequences of nucleotides and data mining of biological sequences is required to find the features of the DNA of humans. Biological sequence analysis is the first step of data mining to compare the alignment of the biological sequences. Two species are similar to each other only if their nucleotide (DNA, RNA) and protein sequences are close and similar. During the data mining of Biological Sequences, the degree of similarity between nucleotide sequences is measured. The degree of similarity obtained by sequence alignment of nucleotides is essential in determining the homology between two sequences. There can be the situation of alignment of two or more input biological sequences by identifying similar sequences with long subsequences. The amino acids also called proteins sequences are also compared and aligned.
  • 12. Multidimensional Analysis and Descriptive Mining of Complex Data Objects 4. Graph Pattern Mining: • Graph Pattern Mining can be done by using Apriori-based and pattern growth-based approaches. We can mine the subgraphs of the graph and the set of closed graphs. A closed graph g is the graph that doesn’t have a super graph that carries the same support count as g. Graph Pattern Mining is applied to different types of graphs such as frequent graphs, coherent graphs, and dense graphs. We can also improve the mining efficiency by applying the user constraints on the graph patterns. Graph patterns are two types. Homogeneous graphs where nodes or links of the graph are of the same type by having similar features. In Heterogeneous graph patterns, the nodes and links are of different types.
  • 13. Multidimensional Analysis and Descriptive Mining of Complex Data Objects 5. Statistical Modeling of Networks: • A network is a collection of nodes where each node represents the data and the nodes are linked through edges, representing relationships between data objects. If all the nodes and links connecting the nodes are of the same type, then the network is homogeneous such as a friend network or a web page network. If the nodes and links connecting the nodes are of different types, then the network is heterogeneous such as health-care networks (linking the different parameters such as doctors, nurses, patients, diseases together in the network). Graph Pattern Mining can be further applied to the network to derive the knowledge and useful patterns from the network.
  • 14. Multidimensional Analysis and Descriptive Mining of Complex Data Objects 6. Mining Spatial Data: • Spatial data is the geo space-related data that is stored in large data repositories. The spatial data is represented in “vector” format and geo-referenced multimedia format. A spatial database is constructed from large geographic data warehouses by integrating geographical data of multiple sources of areas. we can construct spatial data cubes that contain information about the spatial dimensions and measures. It is possible to perform the OLAP operations on the spatial data for spatial data analysis. Spatial data mining is performed on spatial data warehouses, spatial databases, and other geospatial data repositories. Spatial Data mining discovers knowledge about the geographic areas. The preprocessing of spatial data involves several operations like spatial clustering, spatial classification, spatial modeling, and outlier detection in spatial data.
  • 15. Multidimensional Analysis and Descriptive Mining of Complex Data Objects 7. Mining Cyber-Physical System Data: • Cyber-Physical System Data can be mined by constructing a graph or network of data. A cyber- physical system (CPS) is a heterogeneous network that consists of a large number of interconnected nodes that store patients or medical information. The links in the CPS network represent the relationship between the nodes . cyber-physical systems store dynamic, inconsistent, and interdependent data that contains spatiotemporal information. Mining cyber-physical data links the situation as a query to access the data from a large information database and it involves real-time calculations and analysis to prompt responses from the CPS system. CPS analysis requires rare-event detection and anomaly analysis in cyber-physical data streams, in cyber-physical networks, and the processing of Cyber-Physical Data involves the integration of stream data with real-time automated control processes.
  • 16. Multidimensional Analysis and Descriptive Mining of Complex Data Objects 8. Mining Multimedia Data: • Multimedia data objects include image data, video data, audio data, website hyperlinks, and linkages. Multimedia data mining tries to find out interesting patterns from multimedia databases. This includes the processing of the digital data and performs tasks like image processing, image classification, video, and audio data mining, and pattern recognition. Multimedia Data mining is becoming the most interesting research area because most of the social media platforms like Twitter, Facebook data can be analyzed through this and derive interesting trends and patterns.
  • 17. Multidimensional Analysis and Descriptive Mining of Complex Data Objects 9. Mining Web Data: • Web mining is essential to discover crucial patterns and knowledge from the Web. Web content mining analyzes data of several websites which includes the web pages and the multimedia data such as images in the web pages. Web mining is done to understand the content of web pages, unique users of the website, unique hypertext links, web page relevance and ranking, web page content summaries, time that the users spent on the particular website, and understand user search patterns. Web mining also finds out the best search engine and determines the search algorithm used by it. So it helps improve search efficiency and finds the best search engine for the users.
  • 18. Multidimensional Analysis and Descriptive Mining of Complex Data Objects 10. Mining Text Data: • Text mining is the subfield of data mining, machine learning, Natural Language processing, and statistics. Most of the information in our daily life is stored as text such as news articles, technical papers, books, email messages, blogs. Text Mining helps us to retrieve high-quality information from text such as sentiment analysis, document summarization, text categorization, text clustering. We apply machine learning models and NLP techniques to derive useful information from the text. This is done by finding out the hidden patterns and trends by means such as statistical pattern learning and statistical language modeling. In order to perform text mining, we need to preprocess the text by applying the techniques of stemming and lemmatization in order to convert the textual data into data vectors.
  • 19. Multidimensional Analysis and Descriptive Mining of Complex Data Objects 11. Mining Spatiotemporal Data: • The data that is related to both space and time is Spatiotemporal data. Spatiotemporal data mining retrieves interesting patterns and knowledge from spatiotemporal data. Spatiotemporal Data mining helps us to find the value of the lands, the age of the rocks and precious stones, predict the weather patterns. Spatiotemporal data mining has many practical applications like GPS in mobile phones, timers, Internet-based map services, weather services, satellite, RFID, sensor.
  • 20. Multidimensional Analysis and Descriptive Mining of Complex Data Objects 12. Mining Data Streams: • Stream data is the data that can change dynamically and it is noisy, inconsistent which contain multidimensional features of different data types. So this data is stored in NoSql database systems. The volume of the stream data is very high and this is the challenge for the effective mining of stream data. While mining the Data Streams we need to perform the tasks such as clustering, outlier analysis, and the online detection of rare events in data streams.
  • 21. Spatial Data Mining A spatial database saves a huge amount of space-related data, including maps, preprocessed remote sensing or medical imaging records, and VLSI chip design data. Spatial databases have several features that distinguish them from relational databases. They carry topological and/or distance information, usually organized by sophisticated, multidimensional spatial indexing structures that are accessed by spatial data access methods and often require spatial reasoning, geometric computation, and spatial knowledge representation techniques. Spatial data mining refers to the extraction of knowledge, spatial relationships, or other interesting patterns not explicitly stored in spatial databases. Such mining demands the unification of data mining with spatial database technologies. It can be used for learning spatial records, discovering spatial relationships and relationships among spatial and nonspatial records, constructing spatial knowledge bases, reorganizing spatial databases, and optimizing spatial queries.
  • 22. Spatial Data Mining It is expected to have broad applications in geographic data systems, marketing, remote sensing, image database exploration, medical imaging, navigation, traffic control, environmental studies, and many other areas where spatial data are used. A central challenge to spatial data mining is the exploration of efficient spatial data mining techniques because of the large amount of spatial data and the difficulty of spatial data types and spatial access methods. Statistical spatial data analysis has been a popular approach to analyzing spatial data and exploring geographic information. The term geostatistics is often associated with continuous geographic space, whereas the term spatial statistics is often associated with discrete space. In a statistical model that manages non-spatial records, one generally considers statistical independence among different areas of data.
  • 23. Spatial Data Mining There is no such separation among spatially distributed records because, actually spatial objects are interrelated, or more exactly spatially co-located, in the sense that the closer the two objects are placed, the more likely they send the same properties. For example, natural resources, climate, temperature, and economic situations are likely to be similar in geographically closely located regions. Such a property of close interdependency across nearby space leads to the notion of spatial autocorrelation. Based on this notion, spatial statistical modeling methods have been developed with success. Spatial data mining will create spatial statistical analysis methods and extend them for large amounts of spatial data, with more emphasis on effectiveness, scalability, cooperation with database and data warehouse systems, enhanced user interaction, and the discovery of new kinds of knowledge.
  • 24. Multimedia Data Mining Multimedia mining is a subfield of data mining that is used to find interesting information of implicit knowledge from multimedia databases. Mining in multimedia is referred to as automatic annotation or annotation mining. Mining multimedia data requires two or more data types, such as text and video or text video and audio. Multimedia data mining is an interdisciplinary field that integrates image processing and understanding, computer vision, data mining, and pattern recognition. Multimedia data mining discovers interesting patterns from multimedia databases that store and manage large collections of multimedia objects, including image data, video data, audio data, sequence data and hypertext data containing text, text markups, and linkages. Issues in multimedia data mining include content-based retrieval and similarity search, generalization and multidimensional analysis. Multimedia data cubes contain additional dimensions and measures for multimedia information.
  • 25. Multimedia Data Mining The framework that manages different types of multimedia data stored, delivered, and utilized in different ways is known as a multimedia database management system. There are three classes of multimedia databases: static, dynamic, and dimensional media. The content of the Multimedia Database management system is as follows: • Media data:The actual data representing an object. • Media format data: Information such as sampling rate, resolution, encoding scheme etc., about the format of the media data after it goes through the acquisition, processing and encoding phase. • Media keyword data:Keywords description relating to the generation of data. It is also known as content descriptive data. Example: date, time and place of recording. • Media feature data: Content dependent data such as the distribution of colours, kinds of texture and different shapes present in data.
  • 26. Multimedia Data Mining Types of multimedia applications based on data management characteristics are: 1. Repository applications: A Large amount of multimedia data and meta-data (Media format date, Media keyword data, Media feature data) that is stored for retrieval purposes, e.g., Repository of satellite images, engineering drawings, radiology scanned pictures. 2. Presentation applications: They involve delivering multimedia data subject to temporal constraints. Optimal viewing or listening requires DBMS to deliver data at a certain rate, offering the quality of service above a certain threshold. Here data is processed as it is delivered. Example: Annotating of video and audio data, real-time editing analysis. 3. Collaborative work using multimedia information involves executing a complex task by merging drawings and changing notifications. Example: Intelligent healthcare network.
  • 27. Multimedia Data Mining Challenges with Multimedia Database: 1. Modelling: Working in this area can improve database versus information retrieval techniques; thus, documents constitute a specialized area and deserve special consideration. 2. Design: The conceptual, logical and physical design of multimedia databases has not yet been addressed fully as performance and tuning issues at each level are far more complex as they consist of a variety of formats like JPEG, GIF, PNG, MPEG, which is not easy to convert from one form to another. 3. Storage: Storage of multimedia database on any standard disk presents the problem of representation, compression, mapping to device hierarchies, archiving and buffering during input-output operation. In DBMS, a BLOB (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved. 4. Performance: Physical limitations dominate an application involving video playback or audio-video synchronization. The use of parallel processing may alleviate some problems, but such techniques are not yet fully developed. Apart from this, a multimedia database consumes a lot of processing time and bandwidth. 5. Queries and retrieval: For multimedia data like images, video, and audio accessing data through query open up many issues like efficient query formulation, query execution and optimization, which need to be worked upon.
  • 28. Multimedia Data Mining Where is Multimedia Database Applied? Below are the following areas where a multimedia database is applied, such as: • Documents and record management: Industries and businesses keep detailed records and various documents. For example, insurance claim records. • Knowledge dissemination: Multimedia database is a very effective tool for knowledge dissemination in terms of providing several resources. For example, electronic books. • Education and training: Computer-aided learning materials can be designed using multimedia sources which are nowadays very popular sources of learning. Example: Digital libraries. • Travelling: Marketing, advertising, retailing, entertainment and travel. For example, a virtual tour of cities. • Real-time control and monitoring: With active database technology, multimedia presentation of information can effectively monitor and control complex tasks. For example, manufacturing operation control.
  • 29. Multimedia Data Mining Where is Multimedia Database Applied? Below are the following areas where a multimedia database is applied, such as: • Documents and record management: Industries and businesses keep detailed records and various documents. For example, insurance claim records. • Knowledge dissemination: Multimedia database is a very effective tool for knowledge dissemination in terms of providing several resources. For example, electronic books. • Education and training: Computer-aided learning materials can be designed using multimedia sources which are nowadays very popular sources of learning. Example: Digital libraries. • Travelling: Marketing, advertising, retailing, entertainment and travel. For example, a virtual tour of cities. • Real-time control and monitoring: With active database technology, multimedia presentation of information can effectively monitor and control complex tasks. For example, manufacturing operation control.
  • 30. Multimedia Data Mining Categories of Multimedia Data Mining Multimedia mining refers to analyzing a large amount of multimedia information to extract patterns based on their statistical relationships. Multimedia data mining is classified into two broad categories: static and dynamic media. Static media contains text (digital library, creating SMS and MMS) and images (photos and medical images). Dynamic media contains Audio (music and MP3 sounds) and Video (movies). The below image shows the categories of multimedia data mining.
  • 31. Multimedia Data Mining Application of Multimedia Mining
  • 32. Multimedia Data Mining Process of Multimedia Data Mining
  • 33. Multimedia Data Mining Architecture for Multimedia Data Mining
  • 34. Text Mining in Data Mining Text mining is a component of data mining that deals specifically with unstructured text data. It involves the use of natural language processing (NLP) techniques to extract useful information and insights from large amounts of unstructured text data. Text mining can be used as a preprocessing step for data mining or as a standalone process for specific tasks. By using text mining, the unstructured text data can be transformed into structured data that can be used for data mining tasks such as classification, clustering, and association rule mining. This allows organizations to gain insights from a wide range of data sources, such as customer feedback, social media posts, and news articles. What is the common usage of Text Mining? Text mining is widely used in various fields, such as natural language processing, information retrieval, and social media analysis. It has become an essential tool for organizations to extract insights from unstructured text data and make data-driven decisions.
  • 35. Text Mining in Data Mining Conventional Process of Text Mining • Gathering unstructured information from various sources accessible in various document organizations, for example, plain text, web pages, PDF records, etc. • Pre-processing and data cleansing tasks are performed to distinguish and eliminate inconsistency in the data. The data cleansing process makes sure to capture the genuine text, and it is performed to eliminate stop words stemming (the process of identifying the root of a certain word and indexing the data. • Processing and controlling tasks are applied to review and further clean the data set. • Pattern analysis is implemented in Management Information System. • Information processed in the above steps is utilized to extract important and applicable data for a powerful and convenient decision-making process and trend analysis.
  • 36. Text Mining in Data Mining Conventional Process of Text Mining
  • 37. Text Mining in Data Mining Procedures for Analyzing Text Mining Text Summarization: To extract its partial content and reflect its whole content automatically. Text Categorization: To assign a category to the text among categories predefined by users. Text Clustering: To segment texts into several clusters, depending on the substantial relevance.
  • 38. Text Mining in Data Mining Text Mining Techniques Information Retrieval: In the process of Information retrieval, we try to process the available documents and the text data into a structured form so, that we can apply different pattern recognition and analytical processes. It is a process of extracting relevant and associated patterns according to a given set of words or text documents. Information Extraction: It is a process of extracting meaningful words from documents. • Feature Extraction – In this process, we try to develop some new features from existing ones. This objective can be achieved by parsing an existing feature or combining two or more features based on some mathematical operation. • Feature Selection – In this process, we try to reduce the dimensionality of the dataset which is generally a common issue while dealing with the text data by selecting a subset of features from the whole dataset. Natural Language Processing: Natural Language Processing includes tasks that are accomplished by using Machine Learning and Deep Learning methodologies. It concerns the automatic processing and analysis of unstructured text information.
  • 39. Text Mining in Data Mining Application Area of Text Mining • Digital Library • Academic and Research Field • Life Science • Social-Media • Business Intelligence
  • 40. Text Mining in Data Mining Issues in Text Mining Numerous issues happen during the text mining process: 1. The efficiency and effectiveness of decision-making. 2. The uncertain problem can come at an intermediate stage of text mining. In the pre-processing stage, different rules and guidelines are characterized to normalize the text which makes the text-mining process efficient. Before applying pattern analysis to the document, there is a need to change over unstructured data into a moderate structure. 3. Sometimes original message or meaning can be changed due to alteration. 4. Another issue in text mining is many algorithms and techniques support multi-language text. It may create ambiguity in text meaning. This problem can lead to false-positive results. 5. The utilization of synonyms, polysemy, and antonyms in the document text makes issues for the text mining tools that take both in a similar setting. It is difficult to categorize such kinds of text/ words.
  • 41. Text Mining in Data Mining Advantages of Text Mining 1. Large Amounts of Data: Text mining allows organizations to extract insights from large amounts of unstructured text data. This can include customer feedback, social media posts, and news articles. 2. Variety of Applications: Text mining has a wide range of applications, including sentiment analysis, named entity recognition, and topic modeling. This makes it a versatile tool for organizations to gain insights from unstructured text data. 3. Improved Decision Making: Text mining can be used to extract insights from unstructured text data, which can be used to make data-driven decisions. 4. Cost-effective: Text mining can be a cost-effective way to extract insights from unstructured text data, as it eliminates the need for manual data entry. 5. Broader benefits: Cost reductions, productivity increases, the creation of novel new services, and new business models are just a few of the larger economic advantages mentioned by those consulted.
  • 42. Text Mining in Data Mining Disadvantages of Text Mining Complexity: Text mining can be a complex process that requires advanced skills in natural language processing and machine learning. Quality of Data: The quality of text data can vary, which can affect the accuracy of the insights extracted from text mining. High Computational Cost: Text mining requires high computational resources, and it may be difficult for smaller organizations to afford the technology. Limited to Text Data: Text mining is limited to extracting insights from unstructured text data and cannot be used with other data types. Noise in text mining results: Text mining of documents may result in mistakes. It’s possible to find false links or to miss others. In most situations, if the noise (error rate) is sufficiently low, the benefits of automation exceed the chance of a larger mistake than that produced by a human reader. Lack of transparency: Text mining is frequently viewed as a mysterious process where large corpora of text documents are input and new information is produced. Text mining is in fact opaque when researchers lack the technical know-how or expertise to comprehend how it operates, or when they lack access to corpora or text mining tools.
  • 43. Data Mining- World Wide Web Over the last few years, the World Wide Web has become a significant source of information and simultaneously a popular platform for business. Web mining can define as the method of utilizing data mining techniques and algorithms to extract useful information directly from the web, such as Web documents and services, hyperlinks, Web content, and server logs. The World Wide Web contains a large amount of data that provides a rich source to data mining. The objective of Web mining is to look for patterns in Web data by collecting and examining data in order to gain insights.
  • 44. Data Mining- World Wide Web What is Web Mining? Web mining can widely be seen as the application of adapted data mining techniques to the web, whereas data mining is defined as the application of the algorithm to discover patterns on mostly structured data embedded into a knowledge discovery process. Web mining has a distinctive property to provide a set of various data types. The web has multiple aspects that yield different approaches for the mining process, such as web pages consist of text, web pages are linked via hyperlinks, and user activity can be monitored via web server logs.
  • 45. Data Mining- World Wide Web
  • 46. Data Mining- World Wide Web 1. Web Content Mining: Web content mining can be used to extract useful data, information, knowledge from the web page content. In web content mining, each web page is considered as an individual document. The individual can take advantage of the semi-structured nature of web pages, as HTML provides information that concerns not only the layout but also logical structure. The primary task of content mining is data extraction, where structured data is extracted from unstructured websites. The objective is to facilitate data aggregation over various web sites by using the extracted structured data. Web content mining can be utilized to distinguish topics on the web. For Example, if any user searches for a specific task on the search engine, then the user will get a list of suggestions.
  • 47. Data Mining- World Wide Web 2. Web Structured Mining: The web structure mining can be used to find the link structure of hyperlink. It is used to identify that data either link the web pages or direct link network. In Web Structure Mining, an individual considers the web as a directed graph, with the web pages being the vertices that are associated with hyperlinks. The most important application in this regard is the Google search engine, which estimates the ranking of its outcomes primarily with the PageRank algorithm. It characterizes a page to be exceptionally relevant when frequently connected by other highly related pages. Structure and content mining methodologies are usually combined. For example, web structured mining can be beneficial to organizations to regulate the network between two commercial sites.
  • 48. Data Mining- World Wide Web 3. Web Usage Mining: Web usage mining is used to extract useful data, information, knowledge from the weblog records, and assists in recognizing the user access patterns for web pages. In Mining, the usage of web resources, the individual is thinking about records of requests of visitors of a website, that are often collected as web server logs. While the content and structure of the collection of web pages follow the intentions of the authors of the pages, the individual requests demonstrate how the consumers see these pages. Web usage mining may disclose relationships that were not proposed by the creator of the pages.
  • 49. Data Mining- World Wide Web Some of the methods to identify and analyze the web usage patterns are given below: I. Session and visitor analysis: The analysis of preprocessed data can be accomplished in session analysis, which incorporates the guest records, days, time, sessions, etc. This data can be utilized to analyze the visitor's behavior. The document is created after this analysis, which contains the details of repeatedly visited web pages, common entry, and exit. II. OLAP (Online Analytical Processing): OLAP accomplishes a multidimensional analysis of advanced data. OLAP can be accomplished on various parts of log related data in a specific period. OLAP tools can be used to infer important business intelligence metrics
  • 50. Data Mining- World Wide Web Challenges in Web Mining: The web pretends incredible challenges for resources, and knowledge discovery based on the following observations: The complexity of web pages: The site pages don't have a unifying structure. They are extremely complicated as compared to traditional text documents. There are enormous amounts of documents in the digital library of the web. These libraries are not organized according to a specific order. The web is a dynamic data source: The data on the internet is quickly updated. For example, news, climate, shopping, financial news, sports, and so on. Diversity of client networks: The client network on the web is quickly expanding. These clients have different interests, backgrounds, and usage purposes. There are over a hundred million workstations that are associated with the internet and still increasing tremendously.
  • 51. Data Mining- World Wide Web Relevancy of data: It is considered that a specific person is generally concerned about a small portion of the web, while the rest of the segment of the web contains the data that is not familiar to the user and may lead to unwanted results. The web is too broad: The size of the web is tremendous and rapidly increasing. It appears that the web is too huge for data warehousing and data mining. Application of Web Mining: Web mining has an extensive application because of various uses of the web. The list of some applications of web mining is given below. Marketing and conversion tool Data analysis on website and application accomplishment. Audience behavior analysis Advertising and campaign accomplishment analysis. Testing and analysis of a site.