Making AJAX crawlable
Katharina Probst
Engineer, Google
Bruce Johnson
Engineering Manager, Google
in collaboration with:
Arup Mukherjee, Erik van der Poel, Li Xiao, Google
The problem of AJAX for web crawlers
Web crawlers don't always see what the user sees
● JavaScript produces dynamic content that is not seen by crawlers
● Example: A Google Web Toolkit application that looks like this to a user...
...but a web crawler only sees this:
<script src='showcase.js'></script>
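To make the gap concrete, here is a minimal sketch of what a crawler that does not execute JavaScript can index from that page. The `VisibleText` class and `visible_text` helper are hypothetical names used only for this illustration; they are not part of any crawler described in the slides.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text found outside <script> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a <script> element.
        if not self._in_script and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a non-JavaScript crawler could index from `html`."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# The page from the slide contains a script reference and no indexable text.
print(repr(visible_text("<script src='showcase.js'></script>")))
```

Run on the one-line page above, the helper returns an empty string: everything the user sees is produced later by `showcase.js`.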
Why does this problem need to be solved?
● Web 2.0: More content on the web is created dynamically (~69%)
● Over time, this hurts search
● Developers are discouraged from building dynamic apps
● Not solving AJAX crawlability holds back progress on the web!
A crawler's view of the web - with and without AJAX
[Diagram with two panels, each showing a Browser and a Web Server, with a Crawler alongside]
● Without AJAX: www.example.com/mystate (what the crawler can see)
● With AJAX: www.example.com/ plus #mystate (what the crawler can't see)
Goal: crawl and index AJAX
● Crawling and indexing AJAX is needed for users and developers
● Problem: Which AJAX states can be indexed?
○ Explicit opt-in needed by the web server
● Problem: Don't want to cloak
○ Users and search engine crawlers need to see the same content
● Problem: How could the logistics work?
○ That's the remainder of the presentation
Possible solutions
● Crawlers execute all the web's JavaScript
○ This is expensive and time-consuming
○ Only major search engines would even be able to do this, and probably only partially
○ Indexes would be more stale, resulting in worse search results
● Web servers execute their own JavaScript at crawl time
○ Avoids above problems
○ Gives more control to webmasters
○ Can be done automatically
○ Does not require ongoing maintenance
Overview of proposed approach - crawl time
[Diagram: Crawler, Your Web Server, Headless browser; an HTML snapshot is passed back along the chain]
1. Crawler maps from pretty URL to ugly URL: www.example.com/page?query&_escaped_fragment_=mystate
2. Crawler requests ugly URL
3. Server maps from ugly URL to pretty URL: www.example.com/page?query#!mystate
4. Server invokes headless browser
5. Headless browser executes JavaScript and produces an HTML snapshot for pretty URL
6. Crawler processes HTML snapshot, extracts pretty URLs
Crawling is enabled by mapping between
● "pretty" URLs: www.example.com/page?query#!mystate
● "ugly" URLs: www.example.com/page?query&_escaped_fragment_=mystate
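The mapping between the two URL forms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pretty_to_ugly` and `ugly_to_pretty` are hypothetical, the slides do not say how special characters in the state should be escaped (percent-encoding is assumed here), and the sketch assumes `_escaped_fragment_` is the last query parameter.

```python
from urllib.parse import quote, unquote

HASH_BANG = "#!"               # marks an indexable AJAX state in a pretty URL
PARAM = "_escaped_fragment_"   # query parameter used in the ugly URL


def pretty_to_ugly(url):
    """Crawler side (step 1): rewrite '#!state' as a query parameter."""
    base, sep, state = url.partition(HASH_BANG)
    if not sep:
        return url  # no '#!' present, so this URL has not been opted in
    joiner = "&" if "?" in base else "?"
    # Assumption: percent-encode the state; the slides do not specify escaping.
    return f"{base}{joiner}{PARAM}={quote(state, safe='')}"


def ugly_to_pretty(url):
    """Server side (step 3): recover the pretty URL from the ugly one."""
    for joiner in ("?", "&"):
        base, sep, state = url.partition(f"{joiner}{PARAM}=")
        if sep:
            return f"{base}{HASH_BANG}{unquote(state)}"
    return url  # not an ugly URL, leave unchanged


print(pretty_to_ugly("www.example.com/page?query#!mystate"))
print(ugly_to_pretty("www.example.com/page?query&_escaped_fragment_=mystate"))
```

With the URLs from the slide, the first call yields the ugly form and the second recovers the pretty form, so the mapping round-trips. A URL with a plain `#` fragment and no `!` is returned unchanged, which matches the opt-in requirement.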
Overview of proposed approach - search time
[Diagram: Search engine, Browser]
1. User submits query
2. Search engine returns pretty URL: www.example.com/page?query#!mystate
3. User clicks on pretty URL link
4. Browser returns pretty URL: www.example.com/page?query#!mystate
Nothing changes!
Agreement between participants
● Web servers agree to
○ opt in by indicating indexable states
○ execute JavaScript for ugly URLs (no user agent sniffing!)
○ not cloak by always giving same content to browser and crawler regardless of request (or risk elimination, as before)
● Search engines agree to
○ discover URLs as before (Sitemaps, hyperlinks)
○ modify pretty URLs to ugly URLs
○ index content
○ display pretty URLs
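One way a web server might honor its side of the agreement is to decide what to send from the requested URL alone. The sketch below is a hedged illustration, not the proposal's implementation: `handle_request`, `render_snapshot`, and `serve_app` are hypothetical names, and the two callbacks stand in for the headless browser and the normal application response.

```python
from urllib.parse import parse_qs, urlsplit


def handle_request(url, render_snapshot, serve_app):
    """Pick a response from the URL only; the User-Agent is never consulted."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    if "_escaped_fragment_" in params:
        # Ugly URL: recover the AJAX state and return an HTML snapshot of it.
        state = params["_escaped_fragment_"][0]
        return render_snapshot(parts.path, state)
    # Any other request gets the regular JavaScript application.
    return serve_app(parts.path)


# Stand-in callbacks for illustration; a real server would invoke a
# headless browser in render_snapshot and serve the app page in serve_app.
snapshot = lambda path, state: f"snapshot of {path}#!{state}"
app = lambda path: f"application page {path}"

print(handle_request("http://example.com/stocks.html?_escaped_fragment_=GOOG", snapshot, app))
print(handle_request("http://example.com/stocks.html", snapshot, app))
```

Because the branch depends only on the `_escaped_fragment_` parameter, any client that requests the ugly URL receives the same snapshot, which is the point of the "no user agent sniffing" rule.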
Summary: Life of a URL
http://example.com/stocks.html#GOOG
could easily be changed to
http://example.com/stocks.html#!GOOG
which can be crawled as
http://example.com/stocks.html?_escaped_fragment_=GOOG
but will be displayed in the search results as
http://example.com/stocks.html#!GOOG
Feedback is welcome
● We are currently working on a proposal and prototype implementation
● Check out the blog post on the Google Webmaster Central Blog: http://googlewebmastercentral.blogspot.com
● We welcome feedback from the community at the Google Webmaster
Help Forum (link is posted in the blog entry)
