Priyabrata Satapathy
   Mining refers to extracting something of value from a larger source.




                                               Anand Bihari   2
   Data mining refers to extracting or “mining” knowledge from large
    amounts of data.
   Mining of gold from rocks or sand is referred to as gold mining
    rather than rock or sand mining.

   Thus, data mining should have been more appropriately named
    “knowledge mining from data,” which is unfortunately somewhat
    long.
   “Knowledge mining,” a shorter term, may not reflect the emphasis
    on mining from large amounts of data.



   The Web is a collection of interrelated files on one or more Web
    servers.
   Huge: over 1 billion pages, 15 terabytes.
   Wealth of information: present everywhere.
   Highly dynamic: sites are registered and closed all the time.
   Structure: graph structure with links between pages.
   Access: hundreds of millions of requests per day.




   Web mining is the application of data mining techniques to extract
    knowledge from Web data, including Web documents, hyperlinks
    between documents, and usage logs of Web sites.
   Web data:
     Web content: text, images, records, etc.
     Web structure: hyperlinks, tags, etc.
     Web usage: HTTP logs, application server logs, etc.
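   As an illustration of Web usage data, the sketch below parses HTTP
    server log lines and counts requests per page. It assumes the logs
    are in Common Log Format; the sample lines, the names
    parse_log_line and count_page_requests, and the IP addresses are
    hypothetical and not taken from the slides.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
# (assumption: the server writes logs in this format)
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


def count_page_requests(lines):
    """Count how many times each path was requested; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        record = parse_log_line(line)
        if record is not None:
            counts[record["path"]] += 1
    return counts


# Hypothetical sample data, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2023:13:56:01 +0000] "GET /about.html HTTP/1.1" 200 1500',
    '192.0.2.1 - - [10/Oct/2023:13:57:12 +0000] "GET /index.html HTTP/1.1" 304 -',
]

if __name__ == "__main__":
    for path, hits in count_page_requests(SAMPLE_LOG).most_common():
        print(path, hits)
```

   Per-page request counts are only a starting point; usage mining
    typically goes on to group requests into sessions before looking
    for patterns.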




   Traditional data mining
    Data is structured and relational.
    Well-defined tables, columns, rows, keys, and constraints.
   Web data
    Semi-structured and unstructured.
    Readily available data.
    Rich in features and patterns.




   E-commerce
     User profiles.
     Targeted advertising.
   Network Management
     Performance management.
     Fault management.
   Information retrieval (Search) on the Web



Web Mining
   Content Mining
     Text
     Image
     Video
     Audio
     Structured Record
   Structure Mining
     Hyperlink
       Inter-Document Hyperlink
       Intra-Document Hyperlink
     Document Structure
   Usage Mining
     Web Server Log
     Application Server Log
     Application Level Log


   The structure of a typical Web graph consists of Web pages as
    nodes and hyperlinks as edges connecting two related pages.
   Web structure mining is the process of discovering structure
    information from the Web.
   This type of mining can be performed either at the (intra-page)
    document level or at the (inter-page) hyperlink level.
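   A minimal sketch of the inter-page (hyperlink) level: it pulls the
    href targets out of each page with Python's standard html.parser
    module and records them as directed edges. The page names and HTML
    snippets are hypothetical, and the sketch assumes links are written
    as plain relative file names that need no URL resolution.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all hyperlink targets found in an HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def build_web_graph(pages):
    """Map each page to the distinct other known pages it links to."""
    graph = {}
    for url, html in pages.items():
        targets = []
        for link in extract_links(html):
            # Keep only links to other pages in the collection, without duplicates.
            if link in pages and link != url and link not in targets:
                targets.append(link)
        graph[url] = targets
    return graph


# Hypothetical three-page site, for illustration only.
SAMPLE_PAGES = {
    "a.html": '<p><a href="b.html">B</a> <a href="c.html">C</a></p>',
    "b.html": '<p><a href="c.html">C</a> <a href="c.html">C again</a></p>',
    "c.html": '<p><a href="a.html">A</a> <a href="#top">top</a></p>',
}

if __name__ == "__main__":
    for page, targets in build_web_graph(SAMPLE_PAGES).items():
        print(page, "->", targets)
```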



   Web graph: a directed graph that represents the Web.
   Node: each Web page is a node of the Web graph.
   Link: each hyperlink on the Web is a directed edge of the Web graph.
   In-degree: the in-degree of a node p is the number of distinct links
    that point to p.
   Out-degree: the out-degree of a node p is the number of distinct
    links originating at p that point to other nodes.
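   The definitions above can be sketched on a small adjacency-list
    graph. The graph, its node names, and the function names are
    hypothetical; the sketch assumes every node appears as a key of the
    dictionary and ignores self-links in both counts.

```python
def out_degrees(graph):
    """Number of distinct links leaving each node toward other nodes."""
    return {
        node: len({target for target in targets if target != node})
        for node, targets in graph.items()
    }


def in_degrees(graph):
    """Number of distinct links from other nodes pointing at each node."""
    degrees = {node: 0 for node in graph}
    for source, targets in graph.items():
        for target in set(targets):
            if target != source:
                degrees[target] = degrees.get(target, 0) + 1
    return degrees


# Hypothetical Web graph: each key is a page, each value lists the pages it links to.
SAMPLE_GRAPH = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c", "c"],  # a repeated link counts once
}

if __name__ == "__main__":
    print("in-degree: ", in_degrees(SAMPLE_GRAPH))
    print("out-degree:", out_degrees(SAMPLE_GRAPH))
```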




   Directed path: a sequence of links, starting from p, that can be
    followed to reach q.
   Shortest path: of all the paths from node p to node q, the one with
    the smallest length, i.e. the fewest links.
   Diameter: the maximum shortest-path length over all pairs of nodes
    p and q in the Web graph.
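   Because every link has the same length, shortest paths in a Web
    graph can be found with breadth-first search. The sketch below
    assumes an adjacency-list graph with hypothetical node names, and
    it assumes that pairs with no directed path between them are left
    out of the diameter, which the definition above does not specify.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: fewest links from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def diameter(graph):
    """Largest shortest-path length over all ordered pairs (p, q) where q is reachable from p."""
    longest = 0
    for source in graph:
        distances = shortest_path_lengths(graph, source)
        longest = max(longest, max(distances.values()))
    return longest


# Hypothetical Web graph: a -> b -> c -> a forms a cycle, and c also links to d.
SAMPLE_GRAPH = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],
}

if __name__ == "__main__":
    print(shortest_path_lengths(SAMPLE_GRAPH, "a"))
    print(diameter(SAMPLE_GRAPH))
```

   Running one search per node costs time proportional to nodes times
    links, which is fine for a small crawl but would need sampling or
    approximation on a graph the size of the Web.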




   Literature Survey
     Title                                 Journal/Conference                  Year

     Mining Web Informative Structures     IEEE Transactions on Knowledge      2004
     and Contents Based on Entropy         and Data Engineering
     Analysis

     WISDOM: Web Intra-Page                IEEE Transactions on Knowledge      2005
     Informative Structure Mining Based    and Data Engineering
     on Document Object Model

     Knowledge Discovery and Retrieval     2010 Fourth Asia International      2010
     on World Wide Web Using Web           Conference on
     Structure Mining                      Mathematical/Analytical Modelling
                                           and Computer Simulation

     Design and Implementation of a        International Conference on         2011
     Web Structure Mining Algorithm        Internet Technology and Secured
     Using Breadth First Search Strategy   Transactions
     for Academic Search Application


   Problem Identification
          After studying these journal and conference papers, we will
    identify a research problem and pursue it.





