Web 2.0: blog, wiki, tag, social network: what they are, how to use them and why they are important
Lesson 8: the Google world
This material is distributed under the Creative Commons "Attribution - NonCommercial - Share Alike - 3.0" license, available at http://creativecommons.org/licenses/by-nc-sa/3.0/. Some of the slides are the result of a welcome remote collaboration with Prof. Roberto Polillo, University of Milano-Bicocca (http://www.rpolillo.it).
Google: searching
Every search engine has three main components:
- Crawler
- Database (index)
- Interface and query software
The crawler is a software program that surfs the net and brings pages into the index. The crawler also records the links it finds and follows them to gradually reach new pages, with new links. The index is a huge database where pages are stored with all their metadata and where all the words are "inverted", by creating an index entry (key) for each word. The interface receives the user's request, tries to interpret it and passes it to the "query processor", which works on the index.
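The three components described above can be sketched in a few lines of Python. This is a toy, not how Google is built: the `WEB` dictionary is a hypothetical stand-in for real HTTP fetching, and the query processor simply returns the pages containing every word of the request (a boolean AND over the inverted index), without any ranking.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.html": ("google search engine crawler", ["b.html", "c.html"]),
    "b.html": ("the index stores every word", ["c.html"]),
    "c.html": ("query software reads the index", ["a.html", "d.html"]),
    "d.html": ("search the web", []),
}

def crawl(seed):
    """Breadth-first crawler: fetch a page, note its links, follow the new ones."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]          # stands in for an HTTP fetch
        pages[url] = text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Inverted index: each word becomes a key pointing to the pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def query(index, request):
    """Query processor: return the pages containing every word of the request."""
    words = request.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

pages = crawl("a.html")
index = build_index(pages)
print(sorted(query(index, "the index")))  # → ['b.html', 'c.html']
```

Starting from a single seed page, the crawler reaches all four pages only because it follows the links it discovers, which is the behaviour the slide describes.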
Google: searching
Search engine schema (diagram): http://en.wikipedia.org/wiki/Search_engine
Google: searching
Searches are usually very short: 20% use a single word, almost 50% consist of two or three words, and only 5% contain more than six words. Searches are also distributed along a "long tail" curve: approximately 50% of daily searches are unique. Do you know Googlewhacking?
About 90% of users use the top four engines: Google, Yahoo!, AOL and Bing (Google > 50%).
Search engine traffic has two peaks in the morning (at the office) and one in the evening (once back home).
The approximate cost of acquiring a customer ranges from $70 for mail advertising, $50 for online advertising and $20 for the Yellow Pages, down to $8 (!) for related links.
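To make the "long tail" concrete, the sketch below measures, on a hypothetical query log, the share of searches whose text appears only once and the share covered by the few most frequent queries. The function name `long_tail_stats` and the sample log are illustrative assumptions, not real data.

```python
from collections import Counter

def long_tail_stats(query_log, head_size):
    """Return (share of searches seen only once, share covered by the top head_size queries)."""
    counts = Counter(q.strip().lower() for q in query_log)
    total = sum(counts.values())
    unique = sum(1 for c in counts.values() if c == 1)
    head = sum(c for _, c in counts.most_common(head_size))
    return unique / total, head / total

# Hypothetical log: two popular queries plus a tail of one-off searches.
log = (["weather"] * 4 + ["news"] * 2 +
       ["cheap flights rome", "pagerank damping factor",
        "googlewhack example", "chrome os netbook"])
print(long_tail_stats(log, head_size=2))  # → (0.4, 0.6)
```

On a real log the tail is much longer, so the unique share grows while the head covers a shrinking fraction of the total.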
Google: "old" searching
The first search engines: Archie, 1990 (searched FTP archives with command-line queries); Veronica, for Gopher, 1993 (searched only document titles); WebCrawler, 1994, the first to index the full text of pages.
The first good search engine: AltaVista (1995), born in DEC's laboratories; thanks to the 64-bit Alpha processor it could run a thousand crawlers simultaneously. In its first year AltaVista answered 4 billion searches! When DEC was sold to Compaq, AltaVista was turned into a portal.
Yahoo! was born as "Jerry and David's Guide to the World Wide Web", with a directory approach (see archive.org), and became a great success thanks to its link with Netscape. Yahoo! used its own directory service, and for search it relied on external engines: OpenText, AltaVista, then Inktomi and Google.
2009: Yahoo! agrees with Microsoft to use Bing for search.
http://ppcblog.com/search-history/ http://www.searchenginehistory.com/ http://performancing.com/search-engine-history/
Google: born
Brin and Page studied at Stanford, where Page's research, supervised by Terry Winograd, focused on "the Web as a graph". The BackRub project (1995) was a system to find links on the Web and to store and republish them for analysis, in order to see which pages point to a given page. In 1996 BackRub began to index the Web and, by interpreting the link graph, also to assess the relative importance of sites. This gave birth to the basic concept of the PageRank algorithm, which takes into account both the number of links a site receives and the number of links received in turn by each of the sites linking to it. In 1998 Brin and Page described PageRank in the paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" and founded Google Inc., based in the classic garage.
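BackRub's core idea, finding which pages point to a given page, amounts to inverting the link graph. The sketch below does that and adds a toy score that follows the slide's wording (links received, plus links received by each citing page). It only illustrates the intuition and is not PageRank itself; the function names and the five-page graph are hypothetical.

```python
from collections import defaultdict

def backlinks(forward_links):
    """Invert a forward link graph: for each page, the set of pages pointing to it."""
    incoming = defaultdict(set)
    for source, targets in forward_links.items():
        for target in targets:
            incoming[target].add(source)
    return incoming

def toy_link_score(forward_links, page):
    """Links a page receives, plus the links received by each page citing it."""
    incoming = backlinks(forward_links)
    citers = incoming.get(page, set())
    return len(citers) + sum(len(incoming.get(c, set())) for c in citers)

# Hypothetical five-page web: page -> pages it links to.
graph = {"A": ["C"], "B": ["C"], "C": ["D"], "D": ["C"], "E": ["D"]}
print(sorted(backlinks(graph)["C"]))   # → ['A', 'B', 'D']
print(toy_link_score(graph, "D"))      # → 5
```

D receives only two links, but one of them comes from C, which is itself heavily cited; that second-order effect is what PageRank formalises.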
Google: the algorithm
The secret of Google's success lies in its algorithm, which is of course kept secret, although its most important features can be found on the net. An SEO expert has developed the "Randfish theorem" (http://www.seomoz.org/), which puts forward a hypothesis about Google's scoring method:
(Keyword use * 0.3) + (Domain relevance * 0.25) + (Incoming links * 0.25) + (User data * 0.1) + (Content quality * 0.1) + (Manual push) - (Automatic & manual penalties) = Google Score
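The hypothesised formula can be written directly as a function. The 0-10 ratings passed in are hypothetical: the theorem states the weights but not how each factor would be measured. The five weights sum to 1.0, so without push or penalties the score stays on the same scale as the inputs.

```python
def randfish_score(keywords, domain, links, user_data, content,
                   manual_push=0.0, penalties=0.0):
    """Hypothesised Google score ("Randfish theorem"); inputs are hypothetical ratings."""
    weighted = (keywords * 0.3 + domain * 0.25 + links * 0.25
                + user_data * 0.1 + content * 0.1)
    return weighted + manual_push - penalties

# A page rated 0-10 on each factor (made-up values), with a 1-point penalty.
print(round(randfish_score(8, 6, 4, 5, 7, penalties=1.0), 2))  # → 5.1
```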
Google: the algorithm
Keyword-use factors:
* Keywords in the title tag
* Keywords in header tags
* Keywords in the document text
* Keywords in internal links pointing to the page
* Keywords in the domain name and/or URL
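As an illustration of how the keyword factors listed above could be counted, the sketch below parses a page with Python's standard `html.parser` and records where a single-word keyword appears (title, headers, links, body text, URL). The weights in `WEIGHTS` are invented for the example; Google has not published such values.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical weights: the slide lists the factors, not their values.
WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 2.0, "a": 1.5, "text": 1.0, "url": 2.5}

def words(s):
    """Lower-case alphanumeric tokens, so 'Search,' and 'search' match."""
    return re.findall(r"[a-z0-9]+", s.lower())

class KeywordCounter(HTMLParser):
    """Counts keyword occurrences, grouped by the innermost tag of interest."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.open_tags = []   # only tags listed in WEIGHTS are tracked
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        if tag in WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        hits = words(data).count(self.keyword)
        if hits:
            where = self.open_tags[-1] if self.open_tags else "text"
            self.counts[where] = self.counts.get(where, 0) + hits

def keyword_score(html, url, keyword):
    """Return (occurrences by location, weighted total) for one keyword."""
    parser = KeywordCounter(keyword)
    parser.feed(html)
    parser.close()
    counts = dict(parser.counts)
    parts = urlparse(url)
    if keyword.lower() in words(parts.netloc + " " + parts.path):
        counts["url"] = 1
    return counts, sum(WEIGHTS[k] * n for k, n in counts.items())

page = ("<html><head><title>Search engine basics</title></head><body>"
        "<h1>How search works</h1><p>A search engine has a crawler. "
        "<a href='/index'>Search index</a></p></body></html>")
counts, score = keyword_score(page, "http://example.com/search-basics.html", "search")
print(counts, score)  # → {'title': 1, 'h1': 1, 'text': 1, 'a': 1, 'url': 1} 10.0
```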
Google: the algorithm
Domain relevance:
* Registration history
* Domain "age"
* Importance of links pointing to the domain
* Topical relevance of the domain, based on incoming and outgoing links
* History and patterns of links to the domain
Score of incoming links:
* Link "age"
* Quality of the domains sending the link
* Quality of the pages sending the link
* Link (anchor) text
* Assessment of the quantity/weight of links (PageRank)
* Relevance of the pages sending the link
Google: the algorithm
User data:
* Historical click-through rate (CTR) on search engine results pages
* Time spent by users on the page
* Number of searches for the URL/domain name
* History of visits/usage of the URL/domain name that Google can monitor (toolbar, Wi-Fi, Analytics, etc.)
Content quality:
* Possibly assigned by hand for the most popular searches and pages
* Provided by Google's internal evaluators
* Automated algorithms that assess the text (quality, readability, etc.)
Google: the algorithm
The original patent (1998): U.S. Patent 6,285,999, "METHOD FOR NODE RANKING IN A LINKED DATABASE"
A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database. The rank assigned to a document is calculated from the ranks of documents citing it. In addition, the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document. The method is particularly useful in enhancing the performance of search engine results for hypermedia databases, such as the world wide web, whose documents have a large variation in quality.
Inventor: Page; Lawrence (Stanford, CA)
Assignee: The Board of Trustees of the Leland Stanford Junior University (Stanford, CA)
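The patent's two ingredients, rank flowing from citing documents plus a constant random-jump probability, are what the usual power-iteration computation implements. The sketch below assumes the normalised variant in which the random jump contributes (1 - d)/N, so that ranks sum to 1, and assumes that pages with no outgoing links spread their rank evenly over all pages; both are common conventions, not details taken from the patent text.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration (normalised variant: ranks sum to 1).

    links maps each page to the list of pages it links to. Pages with no
    outgoing links spread their rank evenly over all pages (an assumption).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random jump: with probability 1 - d the surfer lands on any page.
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        # Each page passes d times its rank, split equally among its out-links.
        for source, targets in links.items():
            for t in targets:
                new[t] += d * rank[source] / len(targets)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(value, 3))
```

C ends up first because it is cited by both other pages, and A follows closely because its only citer, C, is the most important page.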
Google: the algorithm
The simplified formula (http://en.wikipedia.org/wiki/PageRank), where:
* PR[A] is the PageRank value of page A
* PR[B] ... PR[n] are the PageRank values of pages B ... n linking to A
* L[B] ... L[n] are the total numbers of outgoing links on pages B ... n
* d (damping factor) is the probability that an imaginary surfer who is randomly clicking on links will keep on clicking. It is generally assumed to be around 0.85. It represents the percentage of PageRank passed from one page to another.
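Written out with the symbols defined above, the simplified formula takes the following form. This is the version without a page count, as in the 1998 paper; a normalised variant replaces (1 - d) with (1 - d)/N, where N is the total number of pages. The second line is a worked example with made-up values: page A is linked by B (PageRank 1, two outgoing links) and by C (PageRank 1, one outgoing link), with d = 0.85.

```latex
PR[A] = (1 - d) + d \left( \frac{PR[B]}{L[B]} + \frac{PR[C]}{L[C]} + \cdots + \frac{PR[n]}{L[n]} \right)

PR[A] = (1 - 0.85) + 0.85 \left( \frac{1}{2} + \frac{1}{1} \right) = 0.15 + 0.85 \times 1.5 = 1.425
```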
Google: the algorithm
PageRank in detail (from www.google.com/corporate/tech.html): "PageRank reflects our view of the importance of web pages by considering more than 500 million variables and 2 billion terms. Pages that we believe are important pages receive a higher PageRank and are more likely to appear at the top of the search results. PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. We have always taken a pragmatic approach to help improve search quality and create useful products, and our technology uses the collective intelligence of the web to determine a page's importance."
Google: the algorithm
Hypertext-Matching Analysis: "Our search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), our technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. We also analyze the content of neighboring web pages to ensure the results returned are the most relevant to a user's query."
Other links about search engines
http://docs.google.com/View?id=dfvwdtqp_1c8x6bmd8
https://docs.google.com/present/view?id=dfvwdtqp_31dqxqk8g9&ndplr=1
https://docs.google.com/present/view?hl=en&id=dfvwdtqp_35hq27gfhk
http://www.wired.com/magazine/2010/02/ff_google_algorithm/all/1
Google
The Google search engine is now the most important access point to the network: http://gs.statcounter.com/#search_engine-ww-monthly-200807-201104
Searching on Google, or "to google", is now part of everyday language. You don't know? Ask Google!
Google (BigG!) now offers many services, and a big part of the Web 2.0 world belongs to it: YouTube, Google Earth / Maps / Calendar / Reader, ... Google has also entered the browser market with Chrome and the mobile market with Android.
http://en.wikipedia.org/wiki/Usage_share_of_web_browsers
http://blog.nielsen.com/nielsenwire/online_mobile/who-is-winning-the-u-s-smartphone-battle/
http://blog.nielsen.com/nielsenwire/consumer/more-us-consumers-choosing-smartphones-as-apple-closes-the-gap-on-android/
Google Dance
Google periodically updates its engine's algorithms to penalize what it considers spam by SEM/SEO (Search Engine Marketing / Optimization) specialists: ranking position is so important that many websites are built containing only links, to make the sites that pay for them "climb" the results. There is no doubt that this continuing fight against commercial spamming also serves to "push" the AdWords advertising services.
Other frauds are possible with AdSense, where site owners earn from clicks on the sponsored links on their sites: sometimes robot programs are used, sometimes offshore workers are paid to click on the links (an estimated 30% of advertising budgets goes missing this way).
AdSense has helped to create the long tail of advertising, bringing hundreds of thousands of businesses to advertise and thousands of sites to host ads. https://www.google.com/adsense/static/en/Publishertools.html
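Why a page made only of links can help a site "climb" is easy to see with PageRank: every extra page pointing to a target adds to the rank flowing into it. The sketch below compares two otherwise symmetric shops before and after ten hypothetical "farm" pages are created to link to one of them. The compact `pagerank` function assumes every page has at least one outgoing link (true in this example); real link-spam detection is far more elaborate and is not modelled here.

```python
def pagerank(links, d=0.85, iterations=200):
    """Compact PageRank power iteration; assumes every page has out-links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}
        for source, targets in links.items():
            for t in targets:
                new[t] += d * rank[source] / len(targets)
        rank = new
    return rank

honest = {"portal": ["shopA", "shopB"], "shopA": ["portal"], "shopB": ["portal"]}
# Hypothetical link farm: ten pages whose only purpose is to link to shopA.
spammed = dict(honest, **{f"farm{i}": ["shopA"] for i in range(10)})

before, after = pagerank(honest), pagerank(spammed)
print("before:", round(before["shopA"], 3), round(before["shopB"], 3))
print("after: ", round(after["shopA"], 3), round(after["shopB"], 3))
```

Before the farm the two shops are tied; afterwards shopA overtakes shopB even though nothing about its content changed, which is exactly the behaviour Google's periodic updates try to penalize.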
Google
In 2007 the Big Brother Award Italy gave Google the dubious prize of "most invasive technology", with this motivation: "Brin, one of the founders of Google, likes to tell his employees "Don't Be Evil", and this became the company slogan. Admiration for Google, its services and its success as a company cannot hide the fact that every search, every e-mail, every post on Google Groups is recorded and analyzed, even if anonymously, and all the analysis leads to profiling of the user. Given its size, Google is potentially the entity in the world most threatening to privacy. With the recent purchase of DoubleClick.com, the giant of online advertising and profiling, which enlarges Google's data-mining potential, it seems that the motto could now become "Don't Be Evil, buy the Devil"."
http://en.wikipedia.org/wiki/Criticism_of_Google
Google AdWords
AdWords (introduced in 2000) is Google's main advertising service and its main source of revenue (> $28 billion in 2010).
Advertisers specify the search words that make their ads appear on the right of the search results page ("sponsored links").
The advertiser pays when the user clicks on the ad (Pay Per Click), and the price per click is determined by complex rules.
The service is managed online: the software does all the work (negotiation, sales, execution).
http://en.wikipedia.org/wiki/AdWords http://adwords.google.com http://investor.google.com/financial/tables.html (advertising provides most of Google's income)
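The slide only says the price per click follows "complex rules". A commonly described simplified model, assumed here and not the actual AdWords rules, is a generalised second-price auction: ads are ordered by bid times a quality score, and each winner pays just enough to stay ahead of the ad below it, plus one cent. The `quality` values and the `reserve` price are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    max_cpc: float   # maximum the advertiser is willing to pay per click ($)
    quality: float   # hypothetical quality score, e.g. on a 1-10 scale

def run_auction(ads, slots, reserve=0.05):
    """Simplified generalised second-price auction (an assumed model).

    Ads are ranked by max_cpc * quality. Each winner pays the smallest price
    that keeps its ad rank above the next ad's, plus $0.01, never above its bid.
    An ad with no competitor below it pays a hypothetical reserve price.
    """
    ranked = sorted(ads, key=lambda ad: ad.max_cpc * ad.quality, reverse=True)
    results = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            below = ranked[i + 1]
            price = below.max_cpc * below.quality / ad.quality + 0.01
        else:
            price = reserve
        results.append((ad.advertiser, round(min(price, ad.max_cpc), 2)))
    return results

ads = [Ad("A", 2.00, 6), Ad("B", 3.00, 3), Ad("C", 1.00, 8)]
print(run_auction(ads, slots=2))  # → [('A', 1.51), ('B', 2.68)]
```

In this example B bids the most but ranks second because of its lower quality score, and A pays less than its own bid.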
Google AdWords
Google AdWords
The top queries cover only 3% of the total -> long tail: http://bnoopy.typepad.com/bnoopy/2005/03/the_long_tail_o.html
See Google AdWords Intro.odp
Google AdSense
With this service Google "administers" advertising space on the web pages of its customers' sites.
Google places ads in the web pages according to criteria of semantic correlation with the pages of the host site.
The host site is paid per click.
AdSense has brought hundreds of thousands of small businesses to advertise and thousands of sites to host ads.
Google currently shares 68% of the revenue generated by AdSense with its content network partners.
http://en.wikipedia.org/wiki/AdSense
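The 68% share translates into simple arithmetic. The click count and the average cost per click below are hypothetical, and real payouts depend on many more factors.

```python
def adsense_payout(clicks, avg_cpc, publisher_share=0.68):
    """Split AdSense click revenue into (publisher part, Google part), in dollars."""
    gross = clicks * avg_cpc
    publisher = gross * publisher_share
    return round(publisher, 2), round(gross - publisher, 2)

# Hypothetical month: 1,000 clicks at an average of $0.25 per click.
print(adsense_payout(1000, 0.25))  # → (170.0, 80.0)
```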
Google AdSense
Google Operating Systems
Android: open-source, Linux-based platform for mobile devices and their applications
Google Chrome OS: platform for netbooks/notebooks
“Google Chrome OS is an open source, lightweight operating system that will initially be targeted at netbooks. Later this year we will open-source its code, and netbooks running Google Chrome OS will be available for consumers in the second half of 2010. (...) Google Chrome OS will run on both x86 as well as ARM chips and we are working with multiple OEMs to bring a number of netbooks to market next year. The software architecture is simple — Google Chrome running within a new windowing system on top of a Linux kernel.”
http://getchrome.eu/index.php
Google Operating Systems
Android: see Android.ppt, http://www.android.com/about/
Google Chrome OS: first systems in 2011, http://www.google.com/chromeos/features.html http://www.chromium.org/chromium-os http://www.chromium.org/chromium-os/chromiumos-design-docs/software-architecture
Google tricks
Google explains what information is collected when you use the search engine and what it does to protect users' privacy: http://www.youtube.com/watch?v=iPkvNr2cpqg
Search engine optimization starter guide: http://www.google.com/webmasters/docs/search-engine-optimization-starter-guide.pdf
Search in blogs: http://blogsearch.google.it/
Search history: http://www.google.com/history
Site comparison: http://www.google.com/insights/search/#
Other: http://www.google.com/intl/en/options/ and http://labs.google.com/
Exercise 8

Generative AI for Technical Writer or Information Developers
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 

Web2.0.2012 - lesson 8 - Google world

  • 1. Web 2.0. Blog, wiki, tag, social network: what they are, how to use them and why they are important. Lesson 8: the Google world
  • 2. This material is distributed under the Creative Commons "Attribution - NonCommercial - Share Alike - 3.0" license, available at http://creativecommons.org/licenses/by-nc-sa/3.0/ . Some of the slides are the result of a welcome long-distance collaboration with Prof. Roberto Polillo, University of Milano-Bicocca ( http://www.rpolillo.it )
  • 3. Google: searching. Every search engine has three main components: - Crawler - Database (index) - Interface and query software. The crawler is a software program that surfs the net and brings pages into the index. The crawler also records the links it finds and follows them to gradually reach new pages with new links. The index is a huge database where pages are stored with all their metadata and where all the words are "inverted", creating an index entry (key) for each word. The interface receives the user's request, tries to interpret it and passes it to the "query processor", which works on the index.
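The three components described above can be sketched in a few lines. This is a minimal toy, assuming a hypothetical in-memory "web" (the `WEB` dictionary and the `.example` URLs are invented for illustration) instead of real HTTP fetching: `crawl` follows links breadth-first, `build_inverted_index` maps each word to the pages containing it, and `query` plays the role of a very simple query processor that returns pages containing all the query words.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and parse their HTML instead.
WEB = {
    "a.example": ("google search engine crawler", ["b.example", "c.example"]),
    "b.example": ("search engine index database", ["c.example"]),
    "c.example": ("query software interface", ["a.example", "d.example"]),
    "d.example": ("long tail of searches", []),
}


def crawl(seed, web, max_pages=100):
    """Breadth-first crawl: fetch a page, then queue the links found on it."""
    seen, frontier, fetched = {seed}, deque([seed]), {}
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        text, links = web.get(url, ("", []))  # unknown URLs become empty pages
        fetched[url] = text
        for link in links:
            if link not in seen:  # queue each URL at most once
                seen.add(link)
                frontier.append(link)
    return fetched


def build_inverted_index(pages):
    """Map each word to the set of URLs that contain it (the "inverted" index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def query(index, q):
    """Toy query processor: URLs that contain every word of the query (AND)."""
    words = q.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = crawl("a.example", WEB)
index = build_inverted_index(pages)
print(sorted(query(index, "search engine")))  # ['a.example', 'b.example']
```

Real engines add ranking, stemming, politeness rules and distributed storage on top of this; the sketch only shows how crawling feeds the index and how the query processor reads it.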
  • 4. Google: searching. Search engine schema (diagram): http://en.wikipedia.org/wiki/Search_engine
  • 5. Google: searching. Searches are usually very short: 20% use a single word, almost 50% are composed of two or three words, only 5% have more than six words. Searches are also distributed along a "long tail" curve: approximately 50% of daily searches are unique. Do you know GoogleWhacking? About 90% of users use the top four engines: Google, Yahoo!, AOL and Bing (Google > 50%). Traffic on search engines has two peaks: one in the morning (at the office) and one in the evening (once back home). The approximate cost of acquiring a customer ranges from $70 for mail advertising, $50 for online advertising and $20 for the yellow pages, down to $8 (!) for related links.
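Figures like these are computed from query logs. The sketch below is a minimal illustration using a hypothetical toy log (not real data) and one possible reading of "unique": a search counts as unique when its exact, case-normalized text occurs only once in the log. It also reports the share of searches by number of words.

```python
from collections import Counter


def query_log_stats(log):
    """Return (share of one-off searches, share of searches by word count)."""
    normalized = [" ".join(q.lower().split()) for q in log]
    total = len(normalized)
    if total == 0:
        return 0.0, {}
    counts = Counter(normalized)
    # A search is "unique" here if its exact text appears only once in the log.
    unique_share = sum(1 for q in normalized if counts[q] == 1) / total
    lengths = Counter(len(q.split()) for q in normalized)
    length_share = {n: c / total for n, c in sorted(lengths.items())}
    return unique_share, length_share


# Hypothetical toy log; real figures come from a search engine's own logs.
log = ["weather", "weather", "cheap flights rome", "googlewhacking",
       "Weather", "how to bake bread without yeast at home"]
unique_share, length_share = query_log_stats(log)
print(unique_share)  # 0.5
print(length_share)
```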
  • 6. Google: "old" searching. The first search engines: Archie, 1990 (FTP command-line queries); Veronica for Gopher, 1993 (searched only document titles); WebCrawler, 1994, the first to index the full text of pages. The first good search engine: AltaVista (1995), born in DEC's laboratories; thanks to the 64-bit Alpha processor it could launch a thousand crawlers simultaneously. In its first year AltaVista answered 4 billion searches! Sold to Compaq, AltaVista was transformed into a portal. Yahoo! was born as "Jerry and David's Guide to the World Wide Web" with a directory approach (see archive.org) and was a great success thanks to its link with Netscape. Yahoo! used its own directory service and, for search, external engines: OpenText, AltaVista, then Inktomi and Google. 2009: Yahoo! agrees to use Microsoft Bing for its search results. http://ppcblog.com/search-history/ http://www.searchenginehistory.com/ http://performancing.com/search-engine-history/
  • 7. Google: born. Brin and Page studied at Stanford, where Page's thesis work on "the Web as a graph" was supervised by Terry Winograd. The BackRub project (1995) was a system to find links on the Web and to store and republish them for analysis, in order to see which pages pointed to a given page. In 1996 BackRub began to index the Web and, by interpreting the link graph, also to assess the relative importance of sites. So was born the basic concept of the PageRank algorithm, which takes into account both the number of links a page receives and the importance of each of the pages that link to it. In 1998 Brin and Page described PageRank in the paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" and founded Google Inc., based in a classic garage.
  • 8. Google: the algorithm. The secret of Google's success lies in its algorithm, which is naturally kept secret, although its most important features can be found on the net. An SEO expert has formulated the "Randfish theorem" (http://www.seomoz.org/), a hypothesis about Google's scoring method: (Keyword use × 0.3) + (Domain relevance × 0.25) + (Inbound links × 0.25) + (User data × 0.1) + (Content quality × 0.1) + (Manual boost) - (Automatic & manual penalties) = Google Score
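The hypothesized score is a plain weighted sum, which the following sketch evaluates. It assumes each factor has already been normalized to a 0..1 value and treats the manual boost and the penalties as simple additive numbers; both assumptions are ours, and the weights are the SEO expert's guess, not a formula published by Google.

```python
# Weights from the "Randfish theorem" hypothesis; an SEO expert's guess,
# not Google's published formula.
WEIGHTS = {
    "keywords": 0.30,
    "domain_relevance": 0.25,
    "inbound_links": 0.25,
    "user_data": 0.10,
    "content_quality": 0.10,
}


def randfish_score(factors, manual_push=0.0, penalty=0.0):
    """Weighted sum of factor scores (assumed normalized to 0..1),
    plus a manual boost, minus automatic/manual penalties."""
    missing = WEIGHTS.keys() - factors.keys()
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    base = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    return base + manual_push - penalty


# Hypothetical factor values for one page.
page = {"keywords": 0.8, "domain_relevance": 0.6, "inbound_links": 0.5,
        "user_data": 0.4, "content_quality": 0.9}
print(round(randfish_score(page, penalty=0.05), 3))  # 0.595
```

Since the five weights sum to 1, a page scoring 1 on every factor, with no boost or penalty, gets a score of 1.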
  • 9. Google: the algorithm. Keyword-use factors: * Keywords in the title tag * Keywords in header tags * Keywords in the document text * Keywords in internal links pointing to the page * Keywords in the domain name and/or URL
  • 10. Google: the algorithm. Domain relevance: * Registration history * Domain age * Importance of links pointing to the domain * Relevance of the domain to the subject, based on incoming and outgoing links * Historical use and patterns of links to the domain. Score of incoming links: * Link age * Quality of the domains sending the link * Quality of the pages sending the link * Anchor text of the links * Assessment of the quantity/weight of the links (PageRank) * Relevance of the pages sending the link
  • 11. Google: the algorithm. User data: * Historical click-through rate (CTR) on the search engine results pages * Time spent by users on the page * Number of searches for the URL/domain name * History of visits to and usage of the URL/domain name that Google can monitor (toolbar, Wi-Fi, Analytics, etc.). Content quality: * Possibly assigned by hand for the most popular searches and pages * Provided by Google's internal evaluators * Automated algorithms that assess the text (quality, readability, etc.)
  • 12. Google: the algorithm. The original patent (1998): U.S. Patent No. 6,285,999, "Method for node ranking in a linked database". A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database. The rank assigned to a document is calculated from the ranks of documents citing it. In addition, the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document. The method is particularly useful in enhancing the performance of search engine results for hypermedia databases, such as the world wide web, whose documents have a large variation in quality. Inventor: Lawrence Page (Stanford, CA). Assignee: The Board of Trustees of the Leland Stanford Junior University (Stanford, CA)
  • 13. Google: the algorithm. The simplified formula (http://en.wikipedia.org/wiki/PageRank): $PR[A] = (1 - d) + d\left(\frac{PR[B]}{L[B]} + \dots + \frac{PR[n]}{L[n]}\right)$ where: * PR[A] is the PageRank value of page A * PR[B] ... PR[n] are the PageRank values of pages B ... n linking to A * L[B] ... L[n] are the total numbers of outgoing links on pages B ... n * d (damping factor) is the probability that an imaginary surfer who is randomly clicking on links will keep on clicking; it is generally assumed to be set around 0.85. It represents the share of PageRank passed from one page to another.
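The formula can be applied repeatedly until the values settle (power iteration). The sketch below is a minimal version of that idea, assuming the non-normalized form $PR[A] = (1 - d) + d \sum PR[T]/L[T]$ with every page starting at 1.0; the Wikipedia page also describes a normalized variant that uses $(1 - d)/N$. Pages without outgoing links are not given any special treatment here, which is a simplification.

```python
def pagerank(links, d=0.85, iterations=200, tol=1e-12):
    """Iterate PR[A] = (1 - d) + d * sum(PR[T] / L[T]) over the pages T linking to A.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    # Invert the graph once: for every page, which pages link to it?
    incoming = {p: [] for p in pages}
    for src, targets in links.items():
        for t in targets:
            incoming[t].append(src)
    out_links = {p: len(links.get(p, [])) for p in pages}  # L[T]
    rank = {p: 1.0 for p in pages}  # initial guess
    for _ in range(iterations):
        new_rank = {
            p: (1 - d) + d * sum(rank[src] / out_links[src] for src in incoming[p])
            for p in pages
        }
        delta = max(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the ranks no longer change noticeably
            break
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({p: round(r, 2) for p, r in sorted(ranks.items())})
# {'A': 1.16, 'B': 0.64, 'C': 1.19}
```

In this three-page example C ranks highest because it receives links from both A and B, and in this form the ranks sum to the number of pages (here 3) when every page has at least one outgoing link.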
  • 14. Google: the algorithm PageRank in detail (from www.google.com/corporate/tech.html ) PageRank reflects our view of the importance of web pages by considering more than 500 million variables and 2 billion terms. Pages that we believe are important pages receive a higher PageRank and are more likely to appear at the top of the search results. PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. We have always taken a pragmatic approach to help improve search quality and create useful products, and our technology uses the collective intelligence of the web to determine a page's importance.
  • 15. Google: the algorithm Hypertext-Matching Analysis: Our search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), our technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. We also analyze the content of neighboring web pages to ensure the results returned are the most relevant to a user's query.
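The idea of factoring in "fonts, subdivisions and the precise location of each word" can be illustrated with a toy term scorer. This is only a sketch under invented assumptions: the tag weights in `TAG_WEIGHTS` and the position bonus formula are hypothetical and are not Google's values, and the analysis of neighboring pages is not modeled. It uses Python's standard `html.parser`.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical weights: text in the title, headings or bold counts more than
# plain body text. These numbers are illustrative, not Google's.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "b": 1.5, "strong": 1.5}


class WeightedTermParser(HTMLParser):
    """Score each word by its enclosing tags and by how early it appears."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.position = 0  # running word counter over the whole page
        self.scores = defaultdict(float)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates unclosed tags like <br>.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        tag_weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.open_tags), default=1.0)
        for word in data.lower().split():
            position_bonus = 1.0 + 1.0 / (1 + self.position)  # fades with position
            self.scores[word] += tag_weight * position_bonus
            self.position += 1


def score_terms(html):
    """Return a {word: weighted score} dictionary for one HTML page."""
    parser = WeightedTermParser()
    parser.feed(html)
    parser.close()
    return dict(parser.scores)


page = ("<html><head><title>PageRank</title></head><body>"
        "<h1>Search</h1><p>search engines rank pages</p></body></html>")
scores = score_terms(page)
print(max(scores, key=scores.get))  # pagerank
```

A word in the title scores higher than the same word in plain body text, and earlier words score slightly higher than later ones, which is the kind of signal the quoted text describes.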
  • 16. Other links about search engines http://docs.google.com/View?id=dfvwdtqp_1c8x6bmd8 https://docs.google.com/present/view?id=dfvwdtqp_31dqxqk8g9&ndplr=1 https://docs.google.com/present/view?hl=en&id=dfvwdtqp_35hq27gfhk http://www.wired.com/magazine/2010/02/ff_google_algorithm/all/1
  • 17. Google. The Google search engine is now the most important access point to the network: http://gs.statcounter.com/#search_engine-ww-monthly-200807-201104 "Searching on Google", or "to google", is now part of everyday language. Don't know? Ask Google! Google (BigG!) now offers many services, and a big part of the Web 2.0 world now belongs to it: YouTube, Google Earth / Maps / Calendar / Reader, ... and Google has now entered the browser market with Chrome and the mobile market with Android. http://en.wikipedia.org/wiki/Usage_share_of_web_browsers http://blog.nielsen.com/nielsenwire/online_mobile/who-is-winning-the-u-s-smartphone-battle/ http://blog.nielsen.com/nielsenwire/consumer/more-us-consumers-choosing-smartphones-as-apple-closes-the-gap-on-android/
  • 18. Google Dance. Google periodically updates its engine algorithms to penalize what it considers spam by SEM/SEO (Search Engine Marketing/Optimization) specialists: ranking position is so important that many websites are built containing only links, to make the sites that pay "climb" the results. There is no doubt that this ongoing fight against commercial spam also serves to "push" the AdWords advertising services. Other frauds are possible with AdSense, where site owners earn from clicks on sponsored links on their sites: sometimes robot programs are used, sometimes offshore workers are paid to click on the links (an estimated 30% of advertising budgets goes missing this way). AdSense has helped create the long tail of advertising, bringing hundreds of thousands of businesses to advertise and thousands of sites to host the ads. https://www.google.com/adsense/static/en/Publishertools.html
  • 19. Google. In 2007 the Big Brother Awards Italy gave Google the dubious prize of "most invasive technology", with this motivation: "Brin, one of the founders of Google, likes to tell its employees "Don't Be Evil", and this became the company slogan. Admiration for Google, its services and its success as a company cannot hide the fact that every search, every e-mail and every post on Google Groups is recorded and analyzed, even if anonymous, and all the analysis leads to profiling of the user. Given its size, Google is potentially the entity in the world that most threatens privacy. With the recent purchase of DoubleClick.com, the giant of online advertising and profiling, which enlarges Google's data-mining potential, it seems the motto could now become "Don't Be Evil, buy the Devil."" http://en.wikipedia.org/wiki/Criticism_of_Google
  • 20. Google AdWords. AdWords (introduced in 2000) is Google's main advertising service and its main source of revenue (> $28 billion in 2010). Advertisers specify the search words that display their ads on the right of the search engine results page ("sponsored links"). The advertiser pays when the user clicks on the ad (Pay Per Click), and the price per click is determined by complex rules. The service is managed online: the software does all the work (negotiation, sales, execution). http://en.wikipedia.org/wiki/AdWords http://adwords.google.com http://investor.google.com/financial/tables.html Advertising provides a big part of the income.
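The "complex rules" for the price per click are often explained through a simplified model: ads are ranked by maximum bid × quality score, and each winner pays just enough per click to keep its position (a generalized second-price auction). The sketch below implements that simplified model only; the 1..10 quality scale, the $0.01 increment and the reserve price used when nobody ranks below are assumptions, and the real AdWords rules include further factors.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    max_cpc: float        # the most the advertiser will pay per click, in dollars
    quality_score: float  # estimated ad quality (assumed 1..10 scale)


def run_auction(ads, slots, increment=0.01):
    """Rank ads by bid * quality and charge each winner, per click, just enough
    to stay above the next ad (simplified generalized second-price auction)."""
    ranked = sorted(ads, key=lambda a: a.max_cpc * a.quality_score, reverse=True)
    results = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            below = ranked[i + 1]
            price = below.max_cpc * below.quality_score / ad.quality_score + increment
        else:
            price = increment  # hypothetical reserve price when nobody is below
        results.append((ad.advertiser, round(min(price, ad.max_cpc), 2)))
    return results


# Hypothetical advertisers competing for two sponsored-link slots.
ads = [Ad("A", 2.00, 6), Ad("B", 3.00, 3), Ad("C", 1.00, 8)]
print(run_auction(ads, slots=2))  # [('A', 1.51), ('B', 2.68)]
```

Note that advertiser A wins the top slot with a lower bid than B because of its higher quality score, and pays less than its maximum bid.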
  • 22. Google AdWords. The top queries cover only 3% of the total: a long tail. http://bnoopy.typepad.com/bnoopy/2005/03/the_long_tail_o.html See Google AdWords Intro.odp
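The "top queries cover only 3%" figure is a head-versus-tail measure: the share of all searches accounted for by the k most frequent queries. A minimal sketch of that computation follows; the query volumes are a hypothetical toy example, not real data, so the resulting share differs from the 3% cited above.

```python
def head_share(query_counts, k):
    """Fraction of all searches covered by the k most frequent queries."""
    total = sum(query_counts.values())
    if total == 0:
        return 0.0
    top = sorted(query_counts.values(), reverse=True)[:k]
    return sum(top) / total


# Hypothetical toy volumes: three popular queries and 900 one-off ones.
counts = {"facebook": 50, "weather": 30, "youtube": 20}
counts.update({f"rare query {i}": 1 for i in range(900)})
print(head_share(counts, 3))  # 0.1
```

The longer the tail of rare queries, the smaller the head's share, which is why keyword advertising that reaches the tail matters.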
  • 23. Google AdSense. With this service, Google "administers" advertising space on the web pages of customer sites. Google places ads in the web pages according to criteria of semantic correlation with the pages of the host site. The host site is paid "per click". AdSense has brought hundreds of thousands of small businesses to advertise and thousands of sites to host the ads. Google currently shares 68% of the revenues generated by AdSense with its content network partners. http://en.wikipedia.org/wiki/AdSense
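The 68% revenue share translates into simple per-click arithmetic. The sketch below assumes a hypothetical click count and average cost per click (both invented for illustration) and splits the gross revenue between the host site and Google.

```python
PUBLISHER_SHARE = 0.68  # share of AdSense content revenue paid to the host site


def adsense_split(clicks, avg_cpc, publisher_share=PUBLISHER_SHARE):
    """Split gross click revenue (clicks * average cost per click)
    between the host site and Google, rounded to cents."""
    gross = clicks * avg_cpc
    publisher = gross * publisher_share
    return round(publisher, 2), round(gross - publisher, 2)


# Hypothetical month: 1,000 clicks at an average of $0.50 per click.
print(adsense_split(clicks=1_000, avg_cpc=0.50))  # (340.0, 160.0)
```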
  • 25. Google Operating Systems. Android: an open-source, Linux-based platform for mobile devices and their applications. Google Chrome OS: a platform for netbooks/notebooks. "Google Chrome OS is an open source, lightweight operating system that will initially be targeted at netbooks. Later this year we will open-source its code, and netbooks running Google Chrome OS will be available for consumers in the second half of 2010. (...) Google Chrome OS will run on both x86 as well as ARM chips and we are working with multiple OEMs to bring a number of netbooks to market next year. The software architecture is simple: Google Chrome running within a new windowing system on top of a Linux kernel." http://getchrome.eu/index.php
  • 26. Google Operating Systems. Android: see Android.ppt, http://www.android.com/about/ Google Chrome OS: first systems in 2011, http://www.google.com/chromeos/features.html http://www.chromium.org/chromium-os http://www.chromium.org/chromium-os/chromiumos-design-docs/software-architecture
  • 27. Google tricks. Google explains what information is collected when you use the search engine and what it does to protect users' privacy: http://www.youtube.com/watch?v=iPkvNr2cpqg SEO starter guide: http://www.google.com/webmasters/docs/search-engine-optimization-starter-guide.pdf Blog search: http://blogsearch.google.it/ Search history: http://www.google.com/history Site comparison: http://www.google.com/insights/search/# Other: http://www.google.com/intl/en/options/ and http://labs.google.com/
  • 29. Try some searches on Google, Bing and Yahoo! and report on the differences between them
  • 30. Analyze AdWords and give your opinion on it
  • 31. Give your opinion about the future of Chrome OS