2. Mining refers to extracting something from anywhere.
Anand Bihari 2
3. Data mining refers to extracting or “mining” knowledge from large
amounts of data.
Mining of gold from rocks or sand is referred to as gold mining
rather than rock or sand mining.
Thus, data mining should have been more appropriately named
“knowledge mining from data,” which is unfortunately somewhat
long.
“Knowledge mining,” a shorter term, may not reflect the emphasis
on mining from large amounts of data.
4. The Web is a collection of inter-related files on one or more Web
servers.
Huge : Over 1 billion pages, 15 terabytes.
Wealth of information : Presence everywhere.
Highly Dynamic : Sites are continually registered and closed.
Structure : Graph structure with links between pages.
Access : Hundreds of millions of requests per day.
5. Web mining is the application of data mining techniques to extract
knowledge from Web data, including Web documents, hyperlinks
between documents, usage logs of web sites, etc.
Web Data :
Web content : text, images, records, etc.
Web structure : hyperlinks, tags, etc.
Web usage : HTTP logs, app server logs, etc.
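As a rough illustration of the split between Web content and Web structure, the sketch below separates the visible text of a page from its hyperlinks using Python's standard `html.parser` module. The names `PageParser` and `split_page` and the sample HTML are hypothetical and chosen only for this example. Usage data would come from server logs rather than from the page, so it is not shown.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects visible text (content) and href targets (structure)."""

    def __init__(self):
        super().__init__()
        self.text = []   # Web content: text fragments found in the page
        self.links = []  # Web structure: hyperlink targets found in the page

    def handle_starttag(self, tag, attrs):
        # An <a href="..."> tag is a hyperlink to another document.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep non-empty text fragments only.
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def split_page(html):
    """Return (text, links) for one HTML document."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text), parser.links


sample = (
    "<html><body><h1>Web mining</h1>"
    '<p>See <a href="/usage.html">usage logs</a>.</p>'
    "</body></html>"
)
text, links = split_page(sample)
print(text)
print(links)
```

The text would feed content mining, while the list of links is the raw material for structure mining.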
6. Traditional data mining
Data is structured and relational.
Well-defined tables, columns, rows, keys, and constraints.
Web data
Semi-structured and unstructured.
Readily available data.
Rich in features and patterns.
7. E-commerce
User profiles.
Targeted advertising.
Network Management
Performance management.
Fault management.
Information retrieval (Search) on the Web
8. Web Mining
Content Mining
    Text
    Image
    Video
    Audio
    Structured Record
Structure Mining
    Document Structure
    Hyperlink
        Inter Document Hyperlink
        Intra Document Hyperlink
Usage Mining
    Web Server Log
    Application Server Log
    Application Level Log
9.
10. The structure of a typical Web graph consists of Web pages as
nodes and hyperlinks as edges connecting related pages.
Web Structure Mining is the process of discovering structure
information from the Web.
This type of mining can be performed either at the (intra-page)
document level or at the (inter-page) hyperlink level.
11. Web-graph : A directed graph that represents the Web.
Node : Each Web page is a node of the Web-graph.
Link : Each hyperlink on the Web is a directed edge of the Web-
graph.
In-degree : The in-degree of a node, p, is the number of distinct links
that point to p.
Out-degree : The out-degree of a node, p, is the number of distinct
links originating at p that point to other nodes.
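The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `degrees` and the sample links are hypothetical, the Web-graph is assumed to be given as a list of (source, target) page pairs, and self-links are ignored for both counts (an assumption, since the definitions only exclude them explicitly for out-degree).

```python
def degrees(links):
    """Return (in_degree, out_degree) dicts for (source, target) hyperlinks."""
    in_deg = {}
    out_deg = {}
    # A set keeps each distinct link once, matching "distinct links".
    for src, dst in set(links):
        # Make sure every page appears, even with a degree of 0.
        for page in (src, dst):
            in_deg.setdefault(page, 0)
            out_deg.setdefault(page, 0)
        if src == dst:
            continue  # assumption: a page linking to itself is not counted
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


links = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b"), ("c", "c")]
in_deg, out_deg = degrees(links)
print(in_deg)
print(out_deg)
```

Here the duplicate link from a to b is counted once, so page c ends up with in-degree 2 and page a with out-degree 2.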
12. Directed Path : A sequence of links, starting from p, that can be
followed to reach q.
Shortest Path : Of all the paths between nodes p and q, the one that
has the shortest length, i.e. the fewest links on it.
Diameter : The maximum, over all pairs of nodes p and q in the
Web-graph, of the length of the shortest path from p to q.
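One way to compute these quantities is breadth-first search, which visits pages in order of increasing number of links from the start page. The sketch below is illustrative only: the function names and the sample graph are hypothetical, the Web-graph is assumed to be a dict mapping each page to the pages it links to, and pairs with no directed path between them are left out of the diameter (an assumption, since the definition does not say how to treat them).

```python
from collections import deque


def shortest_path_lengths(graph, start):
    """Breadth-first search: number of links on the shortest path to each reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, ()):
            if nxt not in dist:  # first visit is along a shortest path
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist


def diameter(graph):
    """Largest shortest-path length over all pairs (p, q) where q is reachable from p."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    return max(
        (d for p in pages for d in shortest_path_lengths(graph, p).values()),
        default=0,
    )


web = {"a": ["b"], "b": ["c"], "c": ["d", "a"], "d": []}
print(shortest_path_lengths(web, "a"))
print(diameter(web))
```

In this sample graph the longest shortest path runs from a to d through b and c, so the diameter is 3.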
13. Literature Survey
Title | Name of Publication (Journal/Conference) | Year
Mining Web Informative Structures and Contents Based on Entropy Analysis | IEEE Transactions on Knowledge and Data Engineering | 2004
WISDOM: Web Intra-page Informative Structure Mining Based on Document Object Model | IEEE Transactions on Knowledge and Data Engineering | 2005
Knowledge Discovery and Retrieval on World Wide Web Using Web Structure Mining | 2010 Fourth Asia International Conference on Mathematical/Analytical Modelling and Computer Simulation | 2010
Design and Implementation of a Web Structure Mining Algorithm Using Breadth First Search Strategy for Academic Search Application | International Conference on Internet Technology and Secured Transactions | 2011
14. Problem Identification
After studying these journal and conference papers, we will
identify a problem and pursue it.