KOSURU SAI MALLESWAR; SC09B093; SEM-6.


     REVIEW OF “The Anatomy of a Large-Scale Hypertextual Web Search Engine”

Sergey Brin and Lawrence Page designed ‘Google’ as a search engine that can crawl and index
the web quickly and efficiently and deal effectively with huge, uncontrolled hypertext
collections. One of the main goals was to improve the quality and scalability of search.
Another goal was to set up a system that supports novel research on large-scale web data and
that a reasonable number of people can actually use for their academic research.

Google makes efficient use of storage space to store the index. This allows the quality of the
search to scale effectively with the size of the web as it grows. Its data structures are
optimized for fast and efficient access. To get high precision, Google uses the link structure
of the Web to calculate a quality ranking for each web page. This ranking is called PageRank.
The probability that a ‘random surfer’ visits a page is its PageRank. The ranking also involves
a damping factor d, which models the chance that at each page the ‘random surfer’ gets bored
and requests another random page (in the paper’s formula the random jump carries weight 1 - d).
Applying the damping factor to only a single page or a group of pages allows for
personalization and can make it nearly impossible to deliberately mislead the system in order
to get a higher ranking. Unlike most search engines, which associate the text of a link only
with the page the link is on, Google also associates it with the page the link points to. This
idea of anchor text propagation provides better quality search, but using it efficiently was a
challenge because of the large amount of data that must be processed. Along with PageRank,
Google keeps track of location information for all hits and of some visual presentation details
such as font size, and it stores the full raw HTML of pages in the repository.
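
The random-surfer idea can be illustrated with a minimal Python sketch. It assumes a tiny
hypothetical link graph, uses the normalized form of the formula (the random-jump term is
(1 - d) / N, so the ranks sum to 1, whereas the formula printed in the paper uses 1 - d
without the division), and spreads the rank of pages without outgoing links evenly over all
pages. The function name `pagerank` and these choices are illustrative assumptions, not the
authors' implementation.

```python
def pagerank(links, d=0.85, iterations=100):
    """Approximate PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    d is the damping factor (0.85 is the value the paper says is usual).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Random-jump share: every page receives (1 - d) / n.
        new_rank = {page: (1.0 - d) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its damped rank equally to each page it links to.
                share = d * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = d * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: A links to B and C, B links to C, C links to A.
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 4))
```

In this toy graph the ranks sum to 1 and page C, which receives links from both A and B,
comes out ahead of A, with B last.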

Most of Google’s architecture is implemented in C or C++ for efficiency and can run on
either Solaris or Linux. Its data structures include BigFiles, the document index, the
lexicon, the forward and inverted indexes and a large repository. These data structures are
designed to keep cost down by avoiding disk seeks whenever possible. Google has a fast
distributed crawling system in which the URL server and the crawlers are implemented in Python.
Each crawler maintains its own DNS cache to reduce the number of DNS lookups, and uses
asynchronous IO and a number of queues. The steps involved in indexing are parsing, indexing
documents into barrels using multiple indexers running in parallel, and sorting. Google’s
ranking system is designed so that no particular factor can have too much influence. For a
single-word query, the dot product of the vector of count-weights with the vector of
type-weights gives an IR score for the document. Finally, the IR score is combined with
PageRank to give the document its final rank. Multi-word searches use a more complex algorithm
that also takes the proximity of hits into account. Google also collects feedback from trusted
users, which is used when the ranking function is modified.
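
The single-word ranking step can be sketched in Python as follows. The hit types, the
type-weight values, the tapering curve in `count_weight` and the weighted sum in `final_rank`
are all hypothetical: the paper does not publish its weights or the exact way the IR score and
PageRank are combined, so this only illustrates the shape of the computation.

```python
import math

# Hypothetical type-weights; the actual values are not given in the paper.
TYPE_WEIGHTS = {"title": 8.0, "anchor": 6.0, "url": 4.0, "plain": 1.0}


def count_weight(count, cap=8):
    """Turn a raw hit count into a count-weight.

    Grows linearly up to `cap`, then only logarithmically, so that a page
    cannot gain much by repeating a word many times. The curve itself is
    an assumed shape.
    """
    return min(count, cap) + math.log1p(max(count - cap, 0))


def ir_score(hit_counts):
    """Dot product of the count-weight vector with the type-weight vector."""
    types = sorted(TYPE_WEIGHTS)
    count_weights = [count_weight(hit_counts.get(t, 0)) for t in types]
    type_weights = [TYPE_WEIGHTS[t] for t in types]
    return sum(c * w for c, w in zip(count_weights, type_weights))


def final_rank(hit_counts, pagerank, alpha=0.5):
    """Combine the IR score with PageRank (assumed here: a weighted sum)."""
    return alpha * ir_score(hit_counts) + (1.0 - alpha) * pagerank


if __name__ == "__main__":
    # Hypothetical document: the query word appears once in the title
    # and three times in plain text.
    print(ir_score({"title": 1, "plain": 3}))
    print(final_rank({"title": 1, "plain": 3}, pagerank=0.4))
```

Capping the count-weights is one simple way to reflect the design goal that no single factor
should have too much influence on the final rank.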

According to the authors, Google produces better results than the major commercial search
engines for most searches. Google has evolved to overcome a number of bottlenecks in CPU,
memory access, memory capacity, disk seeks, disk throughput, disk capacity, and network IO
during various operations. Because crawling and indexing are efficient, information can be
kept up to date and major changes can be tested relatively quickly. At the time of the paper,
Google did not have optimizations such as query caching or sub-indices on common terms. The
authors intended to speed up Google considerably through distribution and hardware, software,
and algorithmic improvements. They wished to make Google a high-quality search tool for
searchers and researchers all around the world, sparking the next generation of search engine
technology.
