1. DR. BABASAHEB AMBEDKAR TECHNOLOGICAL UNIVERSITY, LONERE (2022-23)
RAJIV GANDHI COLLEGE OF ENGINEERING, RESEARCH & TECHNOLOGY, CHANDRAPUR
SEMINAR REPORT
ON
“Web Clustering Engines”
Submitted
By
Prajwal Dilip Kamble
Roll No: CSEA347
SEMESTER III, SECOND YEAR
BTCOS347 Seminar –I
Seminar Incharge
Prof. R. V. Lichode
Guided By:
Prof. Ravi Chibule
Dr. Nitin Janwe
HOD, CSE & IT
2. RAJIV GANDHI COLLEGE OF ENGINEERING, RESEARCH & TECHNOLOGY,
CHANDRAPUR
(DR. BABASAHEB AMBEDKAR TECHNOLOGICAL UNIVERSITY, LONERE)
(2022-23)
CERTIFICATE
This is to certify that Mr. Prajwal Dilip Kamble, Roll
No. CSEA347, studying in the B. E. third semester of
Computer Science and Engineering in the session
2022-2023, has satisfactorily completed Seminar-I on
“Web Clustering Engines” during the
academic session 2022-2023 at CSE, RCERT,
Chandrapur.
Seminar Incharge
Prof. R. V. Lichode
Guided By:
Prof. Ravi Chibule
Dr. Nitin Janwe
HOD, CSE & IT
4. CONTENTS:
• Abstract
• Introduction
• Search engine
• Why web clustering engine
• Main Advantages of cluster hierarchy
• Issues in implementation of cluster
• Architecture
• How to represent features/text?
• Data Centric clustering algorithm
• Conclusion
5. Abstract
Web clustering engines are an emerging trend in the field of information
retrieval. They organize search results by topic, thus offering a
complementary view to the flat ranked list returned by conventional
search engines.
Traditional search engines mix results on different subtopics or meanings
of a query together in a single list, so the user may have to sift through
a large number of irrelevant items to locate those of interest. Web
clustering engines categorize the search results into hierarchical
groups (clusters) and display the cluster labels.
6. Introduction
Web Clustering Engines organize search results by topic, thus offering a
complementary view to the flat ranked list returned by the conventional
search engines.
7. Search Engine
A search engine is a software system designed to carry out web searches. It searches the
World Wide Web in a systematic way for particular information specified in a textual web search
query. The search results are generally presented as a list of results, often referred to as search
engine results pages.
A search engine is a software program that helps people find the information
they are looking for online using keywords or phrases.
8. Search engine
• A search engine is a website.
• It helps users find information on the World Wide Web.
• Archie, established in 1990, is regarded as the world's first search engine.
• The most used search engine is Google.
• Google, whose domain was registered in 1997, is one of the most
famous search engines.
9. Why web clustering engine?
Conventional engines are not very effective for ambiguous queries.
The search results returned by conventional search engines for such a query are
mixed together in one list, so irrelevant items appear among the relevant ones.
In this context, search result clustering comes into the picture!
10. Main advantages of cluster hierarchy
It provides shortcuts to the items that relate to the same meaning.
It allows better topic understanding.
It favors systematic exploration of search results.
11. Issues in implementation of clusters
Short input descriptions
Meaningful labels
Selection of a similarity measure
Grouping of objects into clusters
Computational efficiency
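To make the similarity-measure issue concrete, here is a minimal sketch of one common choice, cosine similarity over word counts. The snippets are made up for illustration, and the measure is only one possible option, not necessarily the one a given clustering engine uses. Note how little text each snippet offers, which is why short input is listed as an issue.

```python
import math
import re
from collections import Counter


def term_vector(snippet):
    """Lowercase the snippet and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", snippet.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty snippet is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical snippets for the ambiguous query "jaguar".
car_1 = term_vector("Jaguar unveils new electric car model")
car_2 = term_vector("Review of the Jaguar electric car")
cat_1 = term_vector("The jaguar is a big cat native to the Americas")

same_topic = cosine_similarity(car_1, car_2)   # two car snippets
cross_topic = cosine_similarity(car_1, cat_1)  # car snippet vs. animal snippet
```

In this toy data the two car snippets score higher than the car and animal pair, which is the behaviour a grouping step relies on.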
12. Architecture
• Practical implementations of Web search clustering engines usually consist of four
general components: search results acquisition, input preprocessing, cluster
construction, and visualization of clustered results.
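The four components can be sketched as a small pipeline. Everything here is an assumption made for illustration: the function names, the hard-coded results, the stopword list, and the very simple "shared term" grouping rule, which stands in for a real clustering algorithm.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "of", "is", "to", "in", "and", "for", "new"}


def acquire(query):
    """Stage 1: search results acquisition (hard-coded, hypothetical results)."""
    return [
        {"title": "Jaguar cars", "snippet": "Jaguar unveils a new electric car"},
        {"title": "Jaguar review", "snippet": "Review of the Jaguar electric car"},
        {"title": "Jaguar facts", "snippet": "The jaguar is a big cat"},
        {"title": "Wildlife", "snippet": "Jaguar, the big cat of the Americas"},
    ]


def preprocess(results, query):
    """Stage 2: tokenize snippets, dropping stopwords and the query terms."""
    ignore = STOPWORDS | set(query.lower().split())
    return [
        [t for t in re.findall(r"[a-z]+", r["snippet"].lower()) if t not in ignore]
        for r in results
    ]


def build_clusters(results, token_lists):
    """Stage 3: group results under every term shared by at least two results."""
    by_term = defaultdict(list)
    for result, tokens in zip(results, token_lists):
        for term in sorted(set(tokens)):
            by_term[term].append(result["title"])
    return {term: titles for term, titles in by_term.items() if len(titles) >= 2}


def visualize(clusters):
    """Stage 4: render cluster labels with their member results as text."""
    lines = []
    for label in sorted(clusters):
        lines.append(label)
        lines.extend("  - " + title for title in clusters[label])
    return "\n".join(lines)


query = "jaguar"
results = acquire(query)
clusters = build_clusters(results, preprocess(results, query))
print(visualize(clusters))
```

The grouping rule produces overlapping clusters with single-word labels; a real engine would merge such clusters and choose more meaningful labels.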
13. Search result acquisition
The task of search result acquisition is to provide input for the rest of the system.
Based on the query, the acquisition component must deliver 50 to 500 results,
each of which should contain:
- Title
- Contextual snippet
- URL pointing to the full text being referred to
The source of search results can be any public search engine, such as Google,
Yahoo, etc.
The most elegant way of fetching results from search engines is by using the
application programming interfaces (APIs) these engines provide.
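A minimal sketch of an acquisition component follows. It does not call any real search engine API: the `fetch` callable, its dictionary keys, and `fake_fetch` are hypothetical stand-ins for whatever client a real engine's API would require. The sketch only shows the 50 to 500 result limit and the three required fields being enforced.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SearchResult:
    title: str
    snippet: str
    url: str


def acquire_results(
    query: str,
    fetch: Callable[[str, int], List[Dict[str, str]]],
    limit: int = 100,
) -> List[SearchResult]:
    """Fetch raw records and keep only those with title, snippet, and URL."""
    if not 50 <= limit <= 500:
        raise ValueError("limit should be between 50 and 500 results")
    results = []
    for raw in fetch(query, limit)[:limit]:
        title = raw.get("title", "").strip()
        snippet = raw.get("snippet", "").strip()
        url = raw.get("url", "").strip()
        if title and snippet and url:
            results.append(SearchResult(title, snippet, url))
    return results


def fake_fetch(query, limit):
    """Stand-in for a real search API client; returns made-up records."""
    records = [
        {
            "title": f"{query} result {i}",
            "snippet": f"Text about {query} #{i}",
            "url": f"https://example.com/{i}",
        }
        for i in range(limit)
    ]
    records[0]["snippet"] = ""  # simulate an incomplete record
    return records


results = acquire_results("jaguar", fake_fetch, limit=50)
```

Passing the fetcher in as a parameter keeps the component independent of any one search engine, so the source can be swapped without touching the rest of the system.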
14. Conclusion
Web clustering engines organize search results by topic, thus offering a
complementary view to the flat ranked list returned by conventional search
engines.
Because efficient methods for evaluating the performance of clustering
engines are lacking, these engines have not yet attracted wide attention.