DEPARTMENT OF COMPUTER SCIENCE AND
ENGINEERING
SYNERGY INSTITUTE OF ENGINEERING &
TECHNOLOGY, DHENKANAL
SEMINAR ON
How a search engine
works?
Seminar Report ’10 How a search engine works
Dept. of CSE S.I.E.T, Dhenkanal
Guided by : XxX
Submitted by:
SOVAN MISRA
CS-O7-42
0701230147
DEPARTMENT OF COMPUTER SCIENCE &
ENGINEERING
SYNERGY INSTITUTE OF ENGINEERING
AND TECHNOLOGY
DHENKANAL
CERTIFICATE
Certified that this is a bona fide record of the seminar entitled “HOW
A SEARCH ENGINE WORKS” done by the student “SOVAN
MISRA” of the 7th semester, Computer Science and Engineering, in the
year 2010, in partial fulfilment of the requirements for the award of the
Degree of Bachelor of Technology in Computer Science and
Engineering of Synergy Institute Of Engineering And Technology,
Dhenkanal.
XxX XxX
Seminar Guide Head of the Department
ACKNOWLEDGEMENT
I thank my seminar guide XxX, Lecturer, SIET, for her proper
guidance, and valuable suggestions. I am indebted to XxX, the HOD,
Computer Science department & other faculty members for giving me
an opportunity to learn and do this seminar. If not for the above
mentioned people my seminar would never have been completed
successfully. I once again extend my sincere thanks to all of them.
SOVAN MISRA
HOW A SEARCH ENGINE WORKS
INTRODUCTION
What is a search engine?
A search engine is a software program that searches a database and gathers
and reports information that contains, or is related to, specified terms.
Or
It is a website whose primary function is to search for, gather, and report
information available on the Internet or a portion of the Internet.
Why Search Engine?
Today, millions and billions of pieces of information are available on
the vast World Wide Web (WWW). Searching through them by hand
costs the user a great deal of time. We therefore need
tools that make searching automatic, quick, and effortless.
So, to reduce the problem to a more or less manageable one, Web
search engines were introduced in the 1990s.
Different Search engines:
History of search engines:-
In 1990, the first search engine, ARCHIE, was released. At that time there was
no World Wide Web. Data resided on defence contractor, university, and
government computers, and techies were the only people accessing the
data.
Users connected to these computers remotely using Telnet.
File Transfer Protocol (FTP) was used for transferring files from computer to
computer.
There was no such thing as a browser, so information or data was
transferred in its native format and viewed using the software associated
with the file type.
Archie searched FTP servers and indexed their files into a searchable
directory.
In 1991, Gopherspace came into existence with the arrival of Gopher.
It catalogued FTP sites, and the resulting catalogue became known as
Gopherspace.
In 1994, WebCrawler, a new type of search engine that indexed the entire
contents of a web page, was introduced.
Between 1995 and 1998, many changes and developments occurred in the
world of search engines. Meta tags in the web page were first utilized by
some search engines to determine relevancy.
Search engine rank-checking software was introduced. It provides an
automated tool to determine a web site's position and ranking within the
major search engines.
Around 1998, search engine algorithms were introduced to optimize
searching.
In 2000, marketers determined that pay-per-click campaigns were an easy
yet expensive approach to gaining top search rankings. To raise their sites in
the search engine rankings, websites started adding useful and relevant
content while optimizing their
web pages for each specific search engine. Search engine
optimization (SEO) still continues as the algorithms are improved.
TYPES OF SEARCH ENGINES:
Based on how they work, search engines are categorised into the following
groups:
* Crawler-based search engine.
* Directories.
* Hybrid search engines.
* Meta search engine.
CRAWLER BASED SEARCH ENGINE:
It uses automated software programs, known as “spiders”, “crawlers”,
“robots” or “bots”, to survey and categorise web pages.
A spider finds a web page, downloads it and analyses the information
presented on the page. The web page is then added to the search
engine's database.
When a user performs a search, the search engine checks its database
of web pages for the keywords the user searched for.
The results are ordered according to the search engine's ranking algorithm
and shown in the search engine result pages (SERPs).
Ex:-
GOOGLE (www.google.com)
ASK (www.ask.com)
SPIDER’S ALGORITHM:
All spiders use the following algorithm for retrieving documents from the
web:
1. The algorithm uses a list of known URLs. This list contains at least one
URL to start with.
2. A URL is taken from the list, and the corresponding document is
retrieved from the web.
3. The document is parsed to retrieve information for the index database and
to extract the embedded links to other documents.
4. The URLs of the links found in the document are added to the list of known
URLs.
5. If the list is empty or some limit is exceeded (number of documents retrieved,
size of the index database, etc.) the algorithm stops; otherwise the
algorithm continues at step 2.
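The loop described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `FAKE_WEB` is a hypothetical in-memory stand-in for the web (so that no network access is needed), and the names `LinkExtractor`, `crawl` and `max_documents` are invented for this example.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical stand-in for the web: URL -> HTML text of the page.
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": '<a href="http://d.example/">D</a>',
    "http://d.example/": "no links here",
}


def crawl(seed_urls, max_documents=100):
    """Run the spider loop; return URLs in the order they were retrieved."""
    known = set(seed_urls)        # every URL ever placed on the list
    to_visit = deque(seed_urls)   # known URLs that are not yet retrieved
    retrieved = []
    # Stop when the list is empty or the document limit is reached.
    while to_visit and len(retrieved) < max_documents:
        url = to_visit.popleft()          # take a URL from the list
        html = FAKE_WEB.get(url)          # "retrieve" the document
        if html is None:
            continue                      # unreachable page: skip it
        retrieved.append(url)
        parser = LinkExtractor()          # parse and extract embedded links
        parser.feed(html)
        for link in parser.links:         # add new URLs to the known list
            if link not in known:
                known.add(link)
                to_visit.append(link)
    return retrieved


print(crawl(["http://a.example/"]))
# ['http://a.example/', 'http://b.example/', 'http://c.example/', 'http://d.example/']
```

A real spider would replace the dictionary lookup with an HTTP request and would also store the parsed text for the index database.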
The crawler program treats the World Wide Web as a big graph, with pages as
nodes and hyperlinks as arcs.
The crawler described here works with a simple goal: indexing all the
keywords in web page titles.
Three data structures are needed for the crawler (spider) algorithm:
• A large linear array, url_table
• Heap
• Hash table
url_table:
It is a large linear array that contains millions of entries.
Each entry contains two pointers:
• Pointer to URL
• Pointer to title
These are variable-length strings and are kept on the heap.
Heap:
It is a large unstructured chunk of virtual memory to which strings can be
appended.
Hash table:
It is the third data structure, with ‘n’ entries.
Any URL can be run through a hash function to produce a non-negative
integer less than ‘n’.
All URLs that hash to the value ‘k’ are hooked together on a linked list
starting at entry ‘k’ of the hash table.
Every entry in url_table is also entered into the hash table.
The main use of the hash table is to start with a URL and quickly
determine whether it is already present in url_table.
DATA STRUCTURE FOR CRAWLER
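As a rough illustration, the three structures can be modelled in Python as below. The class name `UrlStore`, its method names, the NUL terminator used in the heap, and the simple polynomial hash are all assumptions made for this sketch. Python lists stand in for the linked lists, and byte offsets stand in for pointers.

```python
class UrlStore:
    """Toy model of url_table, the heap and the hash table."""

    def __init__(self, n_buckets=1024):
        self.heap = bytearray()   # unstructured chunk; strings are only appended
        self.url_table = []       # entry -> (offset of URL, offset of title)
        # Bucket k holds the chain of url_table entry numbers hashing to k.
        self.hash_table = [[] for _ in range(n_buckets)]

    def _append(self, text):
        """Append a NUL-terminated string to the heap; return its offset."""
        offset = len(self.heap)
        self.heap += text.encode("utf-8") + b"\0"
        return offset

    def _read(self, offset):
        """Read the string stored at the given heap offset."""
        end = self.heap.index(b"\0", offset)
        return self.heap[offset:end].decode("utf-8")

    def _bucket(self, url):
        """Hash a URL to a non-negative integer less than n."""
        h = 0
        for byte in url.encode("utf-8"):
            h = (h * 31 + byte) % len(self.hash_table)
        return h

    def find(self, url):
        """Return the entry number of a known URL, or None."""
        for entry in self.hash_table[self._bucket(url)]:
            if self._read(self.url_table[entry][0]) == url:
                return entry
        return None

    def add(self, url, title):
        """Store a URL and title once; return the entry number."""
        existing = self.find(url)
        if existing is not None:
            return existing
        entry = len(self.url_table)
        self.url_table.append((self._append(url), self._append(title)))
        self.hash_table[self._bucket(url)].append(entry)
        return entry

    def get(self, entry):
        """Return the (url, title) pair for an entry number."""
        url_offset, title_offset = self.url_table[entry]
        return self._read(url_offset), self._read(title_offset)


store = UrlStore(n_buckets=4)
store.add("http://a.example/", "Page A")
store.add("http://b.example/", "Page B")
print(store.find("http://b.example/"))   # 1
print(store.get(0))                      # ('http://a.example/', 'Page A')
```

With only four buckets, collisions are likely, which is why `find` compares the stored string and does not rely on the hash value alone.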
Building the index requires two phases:
• Searching (URL processing)
• Indexing
The heart of the search engine is a recursive procedure, process_url, which
takes a URL string as input.
Searching is done by the procedure process_url as follows:
It hashes the URL to see if it is already present in url_table. If so, it
is done and returns immediately.
If the URL is not already known, its page is fetched.
The URL and title are then copied to the heap and pointers to these
two strings are entered in url_table.
The URL is also entered into the hash table.
Finally, process_url extracts all the hyperlinks from the page and
calls process_url once per hyperlink, passing the hyperlink’s URL as the
input parameter
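A minimal Python sketch of process_url is given below, under some simplifying assumptions: the dictionary `web` is a hypothetical stand-in for fetching pages, a Python dict named `known` plays the role of the hash table, and url_table stores the strings directly and not as pointers into a heap. The wrapper name `build_url_table` is invented for this example.

```python
def build_url_table(seed_url, web):
    """Crawl recursively from seed_url; return url_table as (url, title) pairs.

    `web` maps a URL to (title, list of hyperlink URLs) and stands in
    for fetching and parsing real pages.
    """
    url_table = []   # entry number -> (url, title)
    known = {}       # stands in for the hash table: url -> entry number

    def process_url(url):
        if url in known:            # already present: return immediately
            return
        page = web.get(url)         # otherwise fetch the page
        if page is None:
            return                  # unreachable pages are skipped here
        title, links = page
        known[url] = len(url_table)      # enter the URL into the hash table
        url_table.append((url, title))   # record the URL and its title
        for link in links:               # one recursive call per hyperlink
            process_url(link)

    process_url(seed_url)
    return url_table


web = {
    "http://a.example/": ("Home", ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("About", ["http://a.example/"]),
    "http://c.example/": ("News", ["http://d.example/", "http://b.example/"]),
    "http://d.example/": ("Contact", []),
}
for entry, (url, title) in enumerate(build_url_table("http://a.example/", web)):
    print(entry, url, title)
```

The URL is recorded before the recursive calls are made, so pages that link back to each other do not cause endless recursion. On a large web the recursion depth would exceed Python's default limit, so a practical crawler would use an explicit list of pending URLs.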
Indexing is done as follows:
For each entry in url_table, the indexing procedure examines the
title and selects all words not on the stop list.
Each selected word is written to a file as a line consisting of
the word followed by the current url_table entry number.
When the whole table has been scanned, the file is sorted by word.
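The indexing phase can be sketched in Python as follows. The contents of `STOP_LIST`, the rule used to split a title into words, and the function name `build_index_lines` are assumptions for this example. The lines are kept in memory here, whereas the report describes writing them to a file.

```python
import re

# Assumed stop list; a real engine would use a much longer one.
STOP_LIST = {"a", "an", "and", "the", "of", "to", "in", "for"}


def build_index_lines(url_table):
    """Return 'word entry_number' lines, sorted by word."""
    pairs = []
    for entry_number, (url, title) in enumerate(url_table):
        # Split the title into lower-case alphanumeric words.
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            if word not in STOP_LIST:
                pairs.append((word, entry_number))
    pairs.sort()   # sort by word, then by entry number
    return [f"{word} {entry}" for word, entry in pairs]


url_table = [
    ("http://a.example/", "The History of Search Engines"),
    ("http://b.example/", "Search for a Crawler"),
]
print(build_index_lines(url_table))
# ['crawler 1', 'engines 0', 'history 0', 'search 0', 'search 1']
```

After sorting, all lines for the same word are adjacent, so the set of entry numbers for a keyword can be read off in one pass.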
Formulating queries:
Submitting a keyword causes a request to be sent to the machine
where the index is located (the web server).
The keyword is then looked up in the index database to find the set
of url_table indices for each keyword.
These indices are used to look up url_table and find all the matching
titles and URLs, which are then stored in the document server.
These are then combined to form a web page and sent back to the user as the
response.
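A small Python sketch of the query path is shown below. It assumes the index has already been loaded into a dictionary mapping each word to a set of url_table entry numbers. Combining several keywords by set intersection is an assumption of this sketch, and the function names `lookup` and `render_results` are invented for illustration.

```python
from html import escape


def lookup(keywords, index, url_table):
    """Return (url, title) pairs whose entries match every keyword."""
    matches = None
    for word in keywords:
        entries = index.get(word.lower(), set())
        # Keep only the entries that contain all keywords seen so far.
        matches = set(entries) if matches is None else matches & entries
    return [url_table[i] for i in sorted(matches or ())]


def render_results(hits):
    """Combine the matching titles and URLs into a simple web page."""
    items = "".join(
        f'<li><a href="{escape(url)}">{escape(title)}</a></li>'
        for url, title in hits
    )
    return f"<html><body><ul>{items}</ul></body></html>"


url_table = [
    ("http://a.example/", "History of Search Engines"),
    ("http://b.example/", "Search for a Crawler"),
]
index = {"history": {0}, "search": {0, 1}, "engines": {0}, "crawler": {1}}

hits = lookup(["search", "engines"], index, url_table)
print(hits)   # [('http://a.example/', 'History of Search Engines')]
print(render_results(hits))
```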
Determining Relevance
The classic algorithm “TF-IDF” is used for determining relevance.
• It is a weight often used in information retrieval and text mining.
This weight is a statistical measure used to evaluate how important
a word is to a document in a collection.
• A high weight in TF-IDF is reached by a high term frequency (in
the given document) and a low document frequency of the term in
the whole collection of documents.
Term Frequency
• “Term frequency”: the number of times a given term appears in
a document, normalised by the length of that document.
• It gives a measure of the importance of the term t_i within the
particular document.
Term frequency:
$$ \mathrm{tf}_i = \frac{n_i}{\sum_k n_k} $$
where n_i is the number of occurrences of the considered term, and
the denominator is the number of occurrences of all terms.
E.g.
If a document contains 100 total words and the word computer
appears 3 times, then the term frequency of the word computer in the
document is 0.03 (3/100)
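The same calculation can be written as a short Python function. The name `term_frequency` and the representation of a document as a list of words are assumptions of this sketch.

```python
from collections import Counter


def term_frequency(term, words):
    """Occurrences of `term` divided by the total number of words."""
    if not words:
        return 0.0   # an empty document contains no terms
    return Counter(words)[term] / len(words)


# A 100-word document in which "computer" appears 3 times.
document = ["computer"] * 3 + ["filler"] * 97
print(term_frequency("computer", document))   # 0.03
```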
Inverse Document Frequency
The “inverse document frequency” is a measure of the general importance
of the term (obtained by dividing the number of all documents by the
number of documents containing the term, and then taking the logarithm
of that quotient):
$$ \mathrm{idf}_i = \log \frac{|D|}{|\{d : t_i \in d\}|} $$
where
• |D| : total number of documents in the corpus
• |{d : t_i ∈ d}| : number of documents where the term t_i
appears (that is, n_i ≠ 0)
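For a small corpus, this definition can be computed directly, as in the Python sketch below. The function name and the representation of each document as a set of words are assumptions, and the natural logarithm is used as the base.

```python
import math


def inverse_document_frequency(term, documents):
    """log(|D| / number of documents containing the term)."""
    containing = sum(1 for doc in documents if term in doc)
    if containing == 0:
        # The quotient is undefined when no document contains the term.
        raise ValueError("term does not appear in any document")
    return math.log(len(documents) / containing)


corpus = [
    {"search", "engine", "computer"},
    {"search", "crawler"},
    {"search", "index"},
    {"search", "hash", "table"},
]
print(round(inverse_document_frequency("computer", corpus), 3))   # 1.386
print(inverse_document_frequency("search", corpus))               # 0.0
```

A term that appears in every document, such as "search" here, gets an IDF of zero, which reflects that it does not help to tell documents apart.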
There are different ways of calculating the IDF.
The “document frequency” (DF) is found by counting how many
documents contain the word and dividing that count by the total number of
documents in the collection.
E.g.
1) If the word computer appears in 1,000 documents out
of a total of 10,000,000, then the DF is 0.0001
(1,000/10,000,000).
2) Alternately, take the logarithm of the inverse of the document frequency.
The natural logarithm is commonly used. In this
example we would have
IDF = ln(10,000,000 / 1,000) = ln(10,000) ≈ 9.21
The final TF-IDF score is then calculated either by dividing the “Term
Frequency” by the “Document Frequency”, or by multiplying the “Term
Frequency” by the logarithmic IDF.
E.g.
The TF-IDF score for computer in the collection would be:
1) TF-IDF = 0.03 / 0.0001 = 300, using the first formula.
2) TF-IDF = 0.03 × 9.21 ≈ 0.28, using the alternate formula.
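The arithmetic of this example can be checked with a few lines of Python. The function name `tf_idf_scores` and its parameter names are chosen for illustration. The logarithmic variant takes the natural logarithm of the total number of documents divided by the number of documents containing the term, ln(10,000,000 / 1,000), which is about 9.21.

```python
import math


def tf_idf_scores(term_count, doc_length, docs_with_term, total_docs):
    """Return (TF / DF, TF * ln(total_docs / docs_with_term))."""
    tf = term_count / doc_length
    df = docs_with_term / total_docs
    idf = math.log(total_docs / docs_with_term)
    return tf / df, tf * idf


ratio_score, log_score = tf_idf_scores(3, 100, 1_000, 10_000_000)
print(round(ratio_score, 2))   # 300.0
print(round(log_score, 4))     # 0.2763
```

The two variants give scores on very different scales, so scores are only comparable between documents when the same variant is used throughout.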
OTHER TYPES OF SEARCHING TECHNIQUES:
Directories
Human editors comprehensively check each website and
rank it, based on the information they find, using a pre-defined set
of rules.
There are two major directories :
Yahoo Directory (www.yahoo.com)
Open Directory (www.dmoz.org)
Hybrid Search Engines
Hybrid search engines use a combination of both crawler-based
results and directory results.
Examples of hybrid search engines are:
Yahoo (www.yahoo.com)
Google (www.google.com)
Meta Search Engines
Also known as Multiple Search Engines or Metacrawlers.
Meta search engines query several other Web search engine
databases in parallel and then combine the results in one list.
Examples of Meta search engines include:
Metacrawler (www.metacrawler.com)
Dogpile (www.dogpile.com)
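The idea of querying several engines in parallel and merging the results can be sketched in Python as below. The two engine functions are hypothetical stand-ins that return fixed lists and are not real search APIs. Interleaving the lists by rank and dropping duplicates is only one possible way of combining results.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_a(query):
    """Hypothetical engine: returns a fixed, ranked list of URLs."""
    return ["http://a.example/x", "http://shared.example/"]


def engine_b(query):
    """Hypothetical engine: returns a fixed, ranked list of URLs."""
    return ["http://shared.example/", "http://b.example/y"]


def meta_search(query, engines):
    """Query all engines in parallel and merge the results into one list."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        # pool.map keeps the result lists in the same order as `engines`.
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):          # interleave by rank position
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])  # drop duplicates across engines
                merged.append(results[rank])
    return merged


print(meta_search("search engine", [engine_a, engine_b]))
# ['http://a.example/x', 'http://shared.example/', 'http://b.example/y']
```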
Hey!
This is Sovan
Please send your feedback to
sovan107@gmail.com
 
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfMaximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
 
Diploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfDiploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdf
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George Wells
 
Presentation on the Basics of Writing. Writing a Paragraph
Presentation on the Basics of Writing. Writing a ParagraphPresentation on the Basics of Writing. Writing a Paragraph
Presentation on the Basics of Writing. Writing a Paragraph
 
3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptx
3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptx3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptx
3.19.24 Urban Uprisings and the Chicago Freedom Movement.pptx
 
How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17
 
How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17
 
How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17How to Add a many2many Relational Field in Odoo 17
How to Add a many2many Relational Field in Odoo 17
 
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptxPISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
 
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxAUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
 
UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...
 
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quizPrelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
 
Practical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptxPractical Research 1 Lesson 9 Scope and delimitation.pptx
Practical Research 1 Lesson 9 Scope and delimitation.pptx
 
General views of Histopathology and step
General views of Histopathology and stepGeneral views of Histopathology and step
General views of Histopathology and step
 

How a search engine works report

  • 1. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SYNERGY INSTITUTE OF ENGINEERING & TECHNOLOGY, DHENKANAL SEMINAR ON How a search engine works?
  • 2. Seminar Report ’10 How a search engine works Dept. of CSE S.I.E.T, Dhenkanal Guided by: XxX Submitted by: SOVAN MISRA CS-O7-42 0701230147
  • 3. DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING SYNERGY INSTITUTE OF ENGINEERING AND TECHNOLOGY DHENKANAL CERTIFICATE Certified that this is a bona fide record of the seminar entitled “HOW A SEARCH ENGINE WORKS” done by the student “SOVAN MISRA” of the 7th semester, Computer Science and Engineering, in the year 2010, in partial fulfilment of the requirements for the award of the Degree of Bachelor of Technology in Computer Science and Engineering of Synergy Institute Of Engineering And Technology, Dhenkanal. XxX (Seminar Guide) XxX (Head of the Department)
  • 4. ACKNOWLEDGEMENT I thank my seminar guide XxX, Lecturer, SIET, for her guidance and valuable suggestions. I am indebted to XxX, the HOD, Computer Science department, and the other faculty members for giving me the opportunity to learn and to do this seminar. Without the above-mentioned people my seminar would never have been completed successfully. I once again extend my sincere thanks to all of them. SOVAN MISRA
  • 5. HOW A SEARCH ENGINE WORKS
      INTRODUCTION
      What is a search engine? A search engine is a software program that searches a database and gathers and reports information that contains, or is related to, specified terms. Alternatively, it is a website whose primary function is to search for, gather, and report information available on the Internet or on a portion of it.
      Why a search engine? The World Wide Web (WWW) holds millions, even billions, of pieces of information. Searching it by hand would cost the user a great deal of time, so tools are needed to make searching automatic, quick, and effortless. Web search engines were introduced to reduce the problem to a more or less manageable one.
      [Figure: different search engines]
  • 6. History of search engines
      In 1990, the first search engine, ARCHIE, was released. At that time there was no World Wide Web. Data resided on defence contractor, university, and government computers, and techies were the only people accessing it. The computers were interconnected by Telnet*, and the File Transfer Protocol (FTP) was used for transferring files from computer to computer. There was no such thing as a browser, so information was transferred in its native format and viewed using the software associated with the file type. Archie searched FTP servers and indexed their files into a searchable directory.
      In 1991, Gopherspace came into existence with the advent of Gopher. It catalogued FTP sites, and the resulting catalogue became known as Gopherspace.
      In 1994, WebCrawler, a new type of search engine that indexed the entire contents of a web page, was introduced.
      Between 1995 and 1998, many changes and developments occurred in the world of search engines. Meta tags* in the web page were first used by some search engines to determine relevancy. Search engine rank-checking software was introduced; it provides an automated tool for determining a website's position and ranking within the major search engines.
      Around 1998, search engine algorithms were introduced to optimise searching.
      In 2000, marketers determined that pay-per-click campaigns were an easy yet expensive approach to gaining top search rankings. To rise in the search engine rankings, websites started adding useful and relevant content while optimising their web pages for each specific search engine. Search engine optimisation (SEO) still continues today as the algorithms are improved.
      TYPES OF SEARCH ENGINES:
  • 7. On the basis of how they work, search engines are categorised into the following groups:
      • Crawler-based search engines
      • Directories
      • Hybrid search engines
      • Meta search engines
      CRAWLER-BASED SEARCH ENGINE: It uses automated software programs, known as “spiders”, “crawlers”, “robots”, or “bots”, to survey and categorise web pages. A spider finds a web page, downloads it, and analyses the information presented on it. The web page is then added to the search engine's database. When a user performs a search, the search engine checks its database of web pages for the keyword the user searched for. The results are ordered according to the engine's algorithm and shown in the search engine result pages (SERPs).
      Examples: GOOGLE (www.google.com), ASK (www.ask.com)
  • 8. SPIDER’S ALGORITHM: All spiders use the following algorithm for retrieving documents from the web:
      1. The algorithm uses a list of known URLs. This list contains at least one URL to start with.
      2. A URL is taken from the list and the corresponding document is retrieved from the web.
      3. The document is parsed to retrieve information for the index database and to extract the embedded links to other documents.
  • 9. 4. The URLs of the links found in the document are added to the list of known URLs.
      5. If the list is empty or some limit is exceeded (number of documents retrieved, size of the index database, etc.), the algorithm stops; otherwise it continues at step 2.
      The crawler program treats the World Wide Web as a big graph with pages as nodes and hyperlinks as arcs. The crawler works with a simple goal: indexing all keywords in the web page titles.
      Three data structures are needed for the crawler (spider) algorithm:
      • A large linear array, url_table
      • A heap
  • 10. • A hash table
      url_table: a large linear array that contains millions of entries. Each entry contains two pointers:
      • a pointer to the URL
      • a pointer to the title
      These are variable-length strings and are kept on the heap.
      Heap: a large unstructured chunk of virtual memory to which strings can be appended.
      Hash table: the third data structure, with n entries. Any URL can be run through a hash function to produce a non-negative integer less than n. All URLs that hash to the value k are hooked together on a linked list starting at entry k of the hash table. Every entry in url_table is also entered into the hash table.
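The retrieval loop of the spider's algorithm (start list, fetch, parse, add links, stop when the list is empty or a limit is reached) can be sketched roughly as below. This is a minimal illustration, not the code of any real search engine: the names `crawl`, `LinkExtractor`, `fetch`, `max_documents`, and the in-memory `TOY_WEB` are assumptions made for the example, and `fetch` stands in for a real HTTP request.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_documents=100):
    """Retrieve documents until the URL list is empty or the limit is reached.

    `fetch` is a hypothetical callable mapping a URL to its HTML text,
    or to None when the document cannot be retrieved.
    """
    frontier = deque(seed_urls)   # list of known URLs still to be visited
    seen = set(seed_urls)         # every URL ever placed on the list
    retrieved = {}                # url -> HTML of each retrieved document

    while frontier and len(retrieved) < max_documents:
        url = frontier.popleft()              # step 2: take a URL, fetch it
        html = fetch(url)
        if html is None:
            continue
        retrieved[url] = html
        parser = LinkExtractor()              # step 3: parse, extract links
        parser.feed(html)
        for link in parser.links:             # step 4: add new URLs
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return retrieved                          # step 5: loop condition failed


# Toy in-memory "web" (assumed data) standing in for real HTTP requests.
TOY_WEB = {
    "a": '<a href="b">B</a> <a href="c">C</a>',
    "b": '<a href="a">A</a>',
    "c": '<a href="d">D</a>',
    "d": "",
}

pages = crawl(["a"], TOY_WEB.get)
print(sorted(pages))
```

The `seen` set is what keeps the loop from revisiting pages when the link graph contains cycles, such as the link from `b` back to `a` in the toy data.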
  • 11. The main use of the hash table is to start with a URL and quickly determine whether it is already present in url_table.
      [Figure: data structure for crawler]
      Building the index requires two phases:
      • Searching (URL processing)
      • Indexing
      The heart of the search engine is a recursive procedure, process_url, which takes a URL string as input. Searching is done by process_url as follows:
      It hashes the URL to see whether it is already present in url_table. If so, it is done and returns immediately.
      If the URL is not already known, its page is fetched. The URL and title are then copied to the heap, and pointers to these two strings are entered in url_table. The URL is also entered into the hash table.
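The three structures described above (url_table, heap, and hash table) and the "is this URL already known?" check can be sketched as follows. This is only an illustration under stated assumptions: the class name `UrlTable`, its method names, the 4-byte length prefix used on the heap, and the use of `zlib.crc32` as the hash function are all choices made for the example, and Python lists stand in for the linked lists that chain colliding URLs.

```python
import zlib


class UrlTable:
    """Sketch of url_table + heap + hash table (names are assumptions)."""

    def __init__(self, n_buckets=1024):
        self.heap = bytearray()   # unstructured storage; strings are only appended
        self.entries = []         # url_table: one (url_ptr, title_ptr) pair per entry
        # Hash table with n entries; each holds the chain of entry numbers
        # whose URL hashes to that slot.
        self.buckets = [[] for _ in range(n_buckets)]

    def _append(self, text):
        """Append a length-prefixed string to the heap and return its offset."""
        data = text.encode("utf-8")
        offset = len(self.heap)
        self.heap += len(data).to_bytes(4, "big") + data
        return offset

    def _read(self, offset):
        """Read back the string stored at a heap offset."""
        length = int.from_bytes(self.heap[offset:offset + 4], "big")
        return self.heap[offset + 4:offset + 4 + length].decode("utf-8")

    def _hash(self, url):
        """Map a URL to a non-negative integer less than n."""
        return zlib.crc32(url.encode("utf-8")) % len(self.buckets)

    def find(self, url):
        """Return the entry number of a known URL, or None if it is new."""
        for entry_no in self.buckets[self._hash(url)]:
            url_ptr, _ = self.entries[entry_no]
            if self._read(url_ptr) == url:
                return entry_no
        return None

    def add(self, url, title):
        """Store a new URL and title; return the entry number either way."""
        existing = self.find(url)
        if existing is not None:
            return existing
        self.entries.append((self._append(url), self._append(title)))
        entry_no = len(self.entries) - 1
        self.buckets[self._hash(url)].append(entry_no)
        return entry_no

    def get(self, entry_no):
        """Return the (url, title) strings for an entry number."""
        url_ptr, title_ptr = self.entries[entry_no]
        return self._read(url_ptr), self._read(title_ptr)


table = UrlTable(n_buckets=2)   # tiny table so that chains actually form
table.add("http://example.org/1", "One")
table.add("http://example.org/2", "Two")
print(table.find("http://example.org/2"), table.find("http://example.org/9"))
```

Keeping the variable-length strings on the heap lets every url_table entry stay a fixed size, which is what makes a plain linear array workable for millions of entries.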
  • 12. Finally, process_url extracts all the hyperlinks from the page and calls process_url once per hyperlink, passing the hyperlink's URL as the input parameter.
      Indexing: for each entry in url_table, the indexing procedure examines the title and selects all words that are not on the stop list. Each selected word is written to a file as a line consisting of the word followed by the current url_table entry number. When the whole table has been scanned, the file is sorted by word.
      Formulating queries: submitting a keyword causes a request to be sent to the machine where the index is located (the web server). The keyword is then looked up in the index database to find the set of url_table indices for that keyword. These indices are used to index into url_table to find all the titles and URLs, which are then stored in the document server.
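The searching, indexing, and query steps described above can be put together in a small sketch. Everything concrete here is an assumption for illustration: the toy pages in `TOY_WEB`, the sample `STOP_LIST`, and the function names `build_index` and `search`. A plain dict (`known`) plays the role of the hash table, an in-memory list replaces the sorted file, and real page fetching is replaced by a dictionary lookup.

```python
import re

# Assumed sample stop list; a real one would be much longer.
STOP_LIST = {"a", "an", "and", "the", "of", "to", "in", "how"}

# Toy pages (assumed data): url -> (title, outgoing links).
TOY_WEB = {
    "u0": ("How a Search Engine Works", ["u1", "u2"]),
    "u1": ("The History of the Search Engine", ["u0"]),
    "u2": ("Introduction to Web Crawlers", ["u1", "u3"]),
    "u3": ("Crawlers and the Web Graph", []),
}

url_table = []   # entry number -> (url, title)
known = {}       # url -> entry number; stands in for the hash table


def process_url(url):
    """Recursive searching phase: record the page, then follow its links."""
    if url in known or url not in TOY_WEB:
        return                        # already present (or unfetchable): done
    title, links = TOY_WEB[url]       # "fetch" the page
    known[url] = len(url_table)
    url_table.append((url, title))
    for link in links:
        process_url(link)             # one call per hyperlink


def build_index():
    """Indexing phase: (word, entry number) lines, sorted by word."""
    lines = []
    for entry_no, (_, title) in enumerate(url_table):
        for word in re.findall(r"[a-z0-9]+", title.lower()):
            if word not in STOP_LIST:
                lines.append((word, entry_no))
    lines.sort()
    index = {}
    for word, entry_no in lines:
        postings = index.setdefault(word, [])
        if entry_no not in postings:
            postings.append(entry_no)
    return index


def search(index, keyword):
    """Query phase: keyword -> entry numbers -> (url, title) pairs."""
    return [url_table[n] for n in index.get(keyword.lower(), [])]


process_url("u0")
index = build_index()
print(search(index, "engine"))
```

The recursion mirrors the description of process_url; on a link graph of realistic depth it would exceed Python's recursion limit, so an explicit queue of pending URLs would be the more practical choice.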
  • 13. These are then combined to form a web page and sent back to the user as the response.
      Determining relevance
      The classic TF-IDF algorithm is used for determining relevance.
      • It is a weight often used in information retrieval and text mining: a statistical measure of how important a word is to a document in a collection.
      • A high TF-IDF weight is reached by a high term frequency in the given document and a low document frequency of the term in the whole collection of documents.
      Term frequency
      • The term frequency is the number of times a given term appears in a document, divided by the total number of terms in that document.
      • It measures the importance of the term $t_i$ within the particular document:
        $\mathrm{tf}_i = \dfrac{n_i}{\sum_k n_k}$
        where $n_i$ is the number of occurrences of the considered term and the denominator is the number of occurrences of all terms.
      E.g. if a document contains 100 words in total and the word computer appears 3 times, the term frequency of computer in that document is 0.03 (3/100).
      Inverse document frequency
      The inverse document frequency is a measure of the general importance of the term, obtained by dividing the number of all documents by the number of documents containing the term and then taking the logarithm of that quotient:
  • 14.   $\mathrm{idf}_i = \log \dfrac{|D|}{|\{d : t_i \in d\}|}$
      where
      • $|D|$ is the total number of documents in the corpus
      • $|\{d : t_i \in d\}|$ is the number of documents in which the term $t_i$ appears (that is, $n_i \neq 0$)
      There are different ways of calculating this weight. The document frequency (DF) is the number of documents that contain the word divided by the total number of documents in the collection. E.g.
      1) If the word computer appears in 1,000 documents out of a total of 10,000,000, the DF is 0.0001 (1,000 / 10,000,000).
      2) Alternatively, take the logarithm of the inverse of the document frequency. The natural logarithm is commonly used. In this example, IDF = ln(10,000,000 / 1,000) = ln(10,000) ≈ 9.21.
      The final TF-IDF score is then calculated by dividing the term frequency by the document frequency or, equivalently in spirit, by multiplying the term frequency by the IDF. E.g. the TF-IDF score for computer in the collection would be:
      1) TF-IDF = 0.03 / 0.0001 = 300, using the first formula.
      2) TF-IDF = 0.03 × 9.21 ≈ 0.28, using the alternative formula.
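The worked example above (3 occurrences in a 100-word document; 1,000 matching documents out of 10,000,000) can be checked with a few lines of Python. The function names are assumptions chosen for the example, and the natural logarithm is used as in the text.

```python
import math


def term_frequency(term_count, total_terms):
    """Occurrences of the term divided by all term occurrences in the document."""
    return term_count / total_terms


def document_frequency(docs_with_term, total_docs):
    """Fraction of documents in the collection that contain the term."""
    return docs_with_term / total_docs


def inverse_document_frequency(docs_with_term, total_docs):
    """Natural log of (all documents / documents containing the term)."""
    return math.log(total_docs / docs_with_term)


tf = term_frequency(3, 100)
df = document_frequency(1_000, 10_000_000)
idf = inverse_document_frequency(1_000, 10_000_000)

print(f"TF        = {tf:.2f}")
print(f"DF        = {df:.4f}")
print(f"IDF       = {idf:.2f}")
print(f"TF / DF   = {tf / df:.0f}")
print(f"TF * IDF  = {tf * idf:.2f}")
```

The two scores differ by orders of magnitude because the logarithm compresses the range of the weight, which keeps very rare terms from dominating a ranking completely.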
  • 15. OTHER TYPES OF SEARCHING TECHNIQUES:
      Directories: human editors comprehensively check each website and rank it, based on the information they find, using a pre-defined set of rules. There are two major directories:
      • Yahoo Directory (www.yahoo.com)
      • Open Directory (www.dmoz.org)
      Hybrid search engines: these use a combination of crawler-based results and directory results.
  • 16. Examples of hybrid search engines are:
      • Yahoo (www.yahoo.com)
      • Google (www.google.com)
      Meta search engines: also known as multiple search engines or metacrawlers. Meta search engines query several other web search engine databases in parallel and then combine the results in one list. Examples of meta search engines include:
      • Metacrawler (www.metacrawler.com)
      • Dogpile (www.dogpile.com)
      References:
      • http://computer.howstuffworks.com/internet/basics/search-engine.htm
      • http://searchenginewatch.com/2168031
      • http://www.infotoday.com/searcher/may01/liddy.htm
      • http://www.slideshare.net/jsuleiman/how-search-engines-work-presentation
      Hey! This is Sovan. Please send your feedback to sovan107@gmail.com