2009
God it’s bad.
-$1.5 Billion
Why hasn’t Google seen the changes on my page?
How should I prioritise errors in Search Console?
Are my canonicals being respected?
Does Google think this page is important?
What can you do with
logs?
PART 1: THE WHY
Getting logs
Analysing Logs
Processing Logs
PART 2: THE HOW
What does a log look like?
123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
IP Address: 123.65.150.10
Timestamp: [23/Aug/2010:03:50:59 +0000]
Request type: GET
Homepage (the page requested): /my_homepage
Protocol: HTTP/1.1
Status Code: 200
Size of the page (in bytes): 2262
User Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
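To make the field breakdown concrete, here is a minimal Python sketch that splits the example line into its parts with a regular expression. It assumes the common Apache/Nginx "combined" log format; the pattern and the field names (`ip`, `path`, `user_agent` and so on) are illustrative choices rather than anything the slides prescribe, and a real server may log extra or reordered fields.

```python
import re
from datetime import datetime

# Pattern for the "combined" log format (an assumption: check how your
# own server is configured before relying on it).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Some servers write "-" when no body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return fields


example = (
    '123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] '
    '"GET /my_homepage HTTP/1.1" 200 2262 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
parsed = parse_line(example)
print(parsed["ip"], parsed["path"], parsed["status"], parsed["size"])
```

Lines that do not fit the pattern come back as `None`, so they can be counted and inspected rather than silently dropped.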
5 things
1 Diagnose crawling & indexation issues
[Chart: Five folders Googlebot crawled the most. Axis: number of requests]
[Chart: % of organic sessions vs % of crawl budget. Series: Sessions, Crawl budget]
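The sessions versus crawl budget comparison can be sketched in a few lines of Python. The folder names and numbers below are invented purely for illustration; in practice the Googlebot counts would come from the logs and the session counts from an analytics export.

```python
def share(counts):
    """Convert raw counts per folder into a fraction of the total."""
    total = sum(counts.values())
    return {folder: value / total for folder, value in counts.items()}


def crawl_vs_sessions(googlebot_requests, organic_sessions):
    """Return {folder: (share of sessions, share of crawl)} for every folder."""
    crawl = share(googlebot_requests)
    sessions = share(organic_sessions)
    folders = sorted(set(crawl) | set(sessions))
    return {f: (sessions.get(f, 0.0), crawl.get(f, 0.0)) for f in folders}


# Hypothetical figures, not taken from the talk.
googlebot_requests = {"/location": 700, "/jobs": 200, "/blog": 100}
organic_sessions = {"/location": 100, "/jobs": 600, "/blog": 300}

comparison = crawl_vs_sessions(googlebot_requests, organic_sessions)
for folder, (session_share, crawl_share) in comparison.items():
    print(f"{folder}: {session_share:.0%} of sessions, {crawl_share:.0%} of crawl")
```

A folder that takes a large share of the crawl but a small share of sessions is a candidate for a closer look.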
2 Prioritisation
example.com/article
Prioritizing
1
Full Print
example.com/article/full
example.com/article/print
Prioritizing
2
example.com/article/pdf
Prioritizing
3
Prioritizing
1
Full Print
3 Spot bugs &
view site health
Delayed errors with a limit of 1000
4 How important does Google
see parts of your site?
My SEO was as bad as my design
But at least my hair was better
teflsearch.com
teflsearch.com/job-results
teflsearch.com/job-results/country/china
teflsearch.com/job-advert3455
Average number of times Googlebot crawled a template
1. teflsearch.com
2. teflsearch.com/job-results
3. teflsearch.com/job-results/country/china
4. teflsearch.com/job-advert3455
teflsearch.com/job-results
Average number of times Googlebot crawled a template
35%
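One way to get an "average crawls per template" figure is to map each requested path to a template and divide total hits by the number of distinct pages. The sketch below is an assumption about how that could be done: the template rules are modelled on the example URLs above, and the extra paths (`/job-results/country/japan`, `/job-advert9999`) and the hit counts are made up for the demonstration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical template rules, checked in order.
TEMPLATE_RULES = [
    ("homepage", re.compile(r"^/$")),
    ("country results", re.compile(r"^/job-results/country/[^/]+$")),
    ("job results", re.compile(r"^/job-results$")),
    ("job advert", re.compile(r"^/job-advert\d+$")),
]


def classify(path):
    """Return the name of the first template whose pattern matches the path."""
    for name, pattern in TEMPLATE_RULES:
        if pattern.match(path):
            return name
    return "other"


def average_crawls_per_page(paths):
    """paths holds one entry per Googlebot request.

    Returns {template: total hits / number of distinct pages in the template}.
    """
    hits_per_page = Counter(paths)
    total_hits = defaultdict(int)
    distinct_pages = defaultdict(int)
    for path, hits in hits_per_page.items():
        template = classify(path)
        total_hits[template] += hits
        distinct_pages[template] += 1
    return {t: total_hits[t] / distinct_pages[t] for t in total_hits}


# Invented request counts for illustration only.
requests = (
    ["/"] * 10
    + ["/job-results"] * 6
    + ["/job-results/country/china"] * 3
    + ["/job-results/country/japan"] * 1
    + ["/job-advert3455"] * 1
    + ["/job-advert9999"] * 1
)
averages = average_crawls_per_page(requests)
print(averages)
```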
5 How fresh does it think your
content is?
bit.ly/moz-fresh
Average number of times a page template is crawled by
Googlebot
● Improve our internal linking
● Build trust with last modified date in sitemap
Talk to a developer
and ask for
information
Are all the logs in one place?
Hi x,
I’m {x} from {y}. We’ve been asked to do some log analysis to better understand how Google is behaving on the website, and I was hoping you could help with some questions about the log set-up (as well as with getting the logs!).
What we’d ideally like is 3-6 months of historical logs for the website. Our goal is to look at all the different pages search engines are crawling on our website, and discover where they’re spending their time, the status code errors they’re finding, etc.
There are also some things that are really helpful for us to know when getting logs.
Do the logs have any personal information in them?
We’re only concerned with the various search crawler bots like Google and Bing. We don’t need any logs from users, so any logs with emails, telephone numbers etc. can be removed.
Do you have any sort of caching which would create separate sets of logs?
Is there anything like Varnish running on the server, or a CDN, which might create logs in a different location to the rest of your server? If so, we will need those logs as well as those from the server. (Although we’re only concerned about a CDN if it’s caching pages or serving from the same hostname; if you’re just using Cloudflare, for example, to cache external images, then we don’t need it.)
Are there any sub-parts of your site which log to a different place?
Have you got anything like an embedded WordPress blog which logs to a different location? If so, we’ll need those logs as well.
Do you log hostname?
It’s really useful for us to be able to see hostname in the logs. By default a lot of common server logging set-ups don’t log hostname, so if it’s not turned on, it would be very useful to have it turned on now for any future analysis.
Is there anything else we should know?
Best,
{x}
Email for a developer
So we might have something that looks like this
BigQuery
BigQuery
Google’s online database for
data analysis.
1. Ask powerful questions
2. Repeatable
3. Scalable
4. Combine with crawl data
5. Easy to set up
6. Easy to learn
What do we want from analysing our logs?
9,000,000 rows of data for 2
months.
400 - 800 queries
Format the logs so we can import them into
BigQuery
Separate the Googlebot logs from all the
other logs
Screaming Frog Log Analyser
Code something
bit.ly/logs-code
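For the "code something" route, a minimal Python sketch of both steps (keep only Googlebot lines, then write them out as CSV ready for import) might look like the following. It is not the code behind the link above. It assumes the "combined" log format, the column names are illustrative, and the second sample line is an invented browser request. Matching on the user agent is only a first pass, since user-agent strings can be spoofed.

```python
import csv
import io
import re

# Assumed "combined" log format; adjust to match the real server config.
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "(?P<user_agent>[^"]*)"'
)
COLUMNS = ["timestamp", "ip", "method", "path", "status", "size", "user_agent"]


def googlebot_rows(lines):
    """Yield one list of column values per line whose user agent names Googlebot."""
    for line in lines:
        match = LINE.match(line)
        if match and "googlebot" in match.group("user_agent").lower():
            yield [match.group(column) for column in COLUMNS]


def to_csv(lines):
    """Return CSV text (header plus Googlebot rows) for the given log lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(googlebot_rows(lines))
    return buffer.getvalue()


sample = [
    '123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" '
    '200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    # Hypothetical non-bot request that should be filtered out.
    '203.0.113.7 - - [23/Aug/2010:03:51:02 +0000] "GET /about HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; rv:50.0) Gecko/20100101 Firefox/50.0"',
]
csv_text = to_csv(sample)
print(csv_text)
```

For real log files, the same functions can be fed an open file object instead of a list, so the whole file never has to sit in memory.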
Our data in BQ
We make sure we
got what we wanted
THE QUESTION:
What is the total number of requests
Googlebot makes each day to our site?
Our first SQL query
SELECT
timestamp
FROM
[mydata.log_analysis]
Our first SQL query
SELECT
DATE(timestamp)
FROM
[mydata.log_analysis]
Our first SQL query
SELECT
DATE(timestamp) as date
FROM
[mydata.log_analysis]
Our first SQL query
SELECT
DATE(timestamp) as date,
count(*)
FROM
[mydata.log_analysis]
Our first SQL query
SELECT
DATE(timestamp) as date,
count(*)
FROM
[mydata.log_analysis]
GROUP BY
date
Our first SQL query
SELECT
DATE(timestamp) as date,
count(*) as number_of_requests
FROM
[mydata.log_analysis]
GROUP BY
date
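To try the shape of this query without a BigQuery project, the sketch below runs an equivalent aggregate against an in-memory SQLite table from Python. SQLite is only a stand-in here: the table layout and the three sample rows are invented, and the bracketed `[mydata.log_analysis]` table reference is replaced by a plain table name because SQLite has no datasets.

```python
import sqlite3

# In-memory database used as a local stand-in for the BigQuery table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?)",
    [
        # Invented rows: two requests on one day, one on the next.
        ("2016-10-01 03:50:59", "/"),
        ("2016-10-01 09:12:00", "/job-results"),
        ("2016-10-02 11:30:45", "/"),
    ],
)

QUERY = """
SELECT
  DATE(timestamp) AS date,
  COUNT(*) AS number_of_requests
FROM log_analysis
GROUP BY date
ORDER BY date
"""

daily_requests = conn.execute(QUERY).fetchall()
for day, number_of_requests in daily_requests:
    print(day, number_of_requests)
```

The `ORDER BY` is an addition so the days come back in a predictable order; the grouping and counting are the same as in the slide.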
[Chart: Comparing logs to GSC crawl volume. Axis: number of requests]
Run queries
Find something weird
Go look at crawl & website
Our data in BQ
1 Diagnose crawling &
indexation issues
2 Prioritisation
3 Spot bugs &
view site health
4 How important does Google
see parts of your site?
5 How fresh does it think
your content is?
1 Diagnose crawling &
indexation issues
4 How important does Google
see parts of your site?
What are the top 20 URLs crawled by
Google over our logs?
Login is my top crawled page and then search?
What are the top 20 page_path_1 folders
crawled by Google over our logs?
Location folders are taking more than 70% of my budget
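The folder question depends on deriving a first-level folder from each requested path. Here is a small Python sketch of that step, assuming `page_path_1` means the first path segment; the sample paths and counts are invented so that the "location" folder takes 70% of requests.

```python
from collections import Counter
from urllib.parse import urlsplit


def page_path_1(url_path):
    """First folder of a path: '/a/b?x=1' -> 'a'. The homepage maps to ''."""
    segments = urlsplit(url_path).path.strip("/").split("/")
    return segments[0]


def top_folders(paths, n=20):
    """Return [(folder, hits, share of all requests)] for the n busiest folders."""
    counts = Counter(page_path_1(p) for p in paths)
    total = sum(counts.values())
    return [(folder, hits, hits / total) for folder, hits in counts.most_common(n)]


# Invented sample: one entry per Googlebot request.
paths = (
    ["/location/london"] * 4
    + ["/location/paris?page=2"] * 3
    + ["/login"] * 2
    + ["/search?q=teacher"] * 1
)
for folder, hits, fraction in top_folders(paths):
    print(f"{folder}: {hits} requests ({fraction:.0%})")
```

Swapping `page_path_1` for the full path in the `Counter` gives the "top 20 URLs" version of the same question.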
Getting data by the day
Page Number of Googlebot Requests
page1 200,000
page2 120,000
Number of Googlebot requests day by day
3 Spot bugs &
view site health
How many of each status code does
Google find per day over our logs?
Number of Googlebot requests day by day
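The status code question is a count grouped by two keys, day and status. A minimal Python sketch, assuming the log has already been reduced to `(date, status)` pairs; the sample records are invented.

```python
from collections import Counter, defaultdict


def status_counts_by_day(records):
    """records: iterable of (iso_date, status_code) pairs.

    Returns {iso_date: {status_code: count}} with days in ascending order.
    """
    table = defaultdict(Counter)
    for day, status in records:
        table[day][status] += 1
    return {day: dict(counts) for day, counts in sorted(table.items())}


# Invented sample records.
records = [
    ("2016-10-02", 301),
    ("2016-10-01", 200),
    ("2016-10-01", 404),
    ("2016-10-01", 200),
]
by_day = status_counts_by_day(records)
for day, counts in by_day.items():
    print(day, counts)
```

A sudden jump in one status code on a single day is the kind of "something weird" worth checking against the site.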
What are the most requested 404 URLs by
Googlebot over the past 30 days?
Boy does it want that ad-tech snippet
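A sketch of the 404 question in Python, assuming each request has been reduced to a `(date, path, status)` tuple. The paths, dates and the `/ads/snippet.js` name are hypothetical stand-ins, not values from the talk.

```python
from collections import Counter
from datetime import date, timedelta


def top_404s(records, today, days=30, n=20):
    """records: iterable of (date, path, status).

    Returns the n most requested paths that returned 404 within the last
    `days` days, as [(path, hits)] sorted by hits descending.
    """
    cutoff = today - timedelta(days=days)
    counts = Counter(
        path
        for day, path, status in records
        if status == 404 and day >= cutoff
    )
    return counts.most_common(n)


# Invented sample records.
records = [
    (date(2016, 10, 10), "/ads/snippet.js", 404),
    (date(2016, 10, 11), "/ads/snippet.js", 404),
    (date(2016, 10, 12), "/old-page", 404),
    (date(2016, 10, 12), "/", 200),
    (date(2016, 8, 1), "/ancient-page", 404),  # outside the 30-day window
]
worst = top_404s(records, today=date(2016, 10, 17))
print(worst)
```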
5 How fresh does it think your
content is?
How many times on average is each page
in a page template crawled a day?
Average number of times a page template is crawled by
Googlebot
How long does it take for a page to be discovered after being published?
What are the top 20 combinations of page_path_1 & page_path_2 folders crawled by Google over the time period of our logs?
Which pages have requests from Googlebot, but don’t appear in our crawl?
What are the top non-canonical pages being crawled?
Which are the most crawled parameters on the website?
How often are the most visited parameters crawled each day?
Which directories have the most 301 & 404 status codes?
Which pages are crawled with parameters and without parameters?
Which pages are only partly downloaded?
How many hits does each section get, when the sections are classified in an external dataset?
What percentage of a directory was crawled over the past 30 days?
What is the total number of requests across two different time periods?
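Two of these questions reduce to very small pieces of code once the logs and a site crawl are available as lists of paths. The Python sketch below is one possible approach: it counts query parameter names across requests and finds paths Googlebot requested that a crawl did not reach. The function names and sample paths are invented for illustration.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def crawled_parameters(paths):
    """Count how many requests carried each query parameter name."""
    counts = Counter()
    for path in paths:
        query = urlsplit(path).query
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts


def not_in_crawl(log_paths, crawl_paths):
    """Paths requested by Googlebot that a site crawl did not find."""
    return sorted(set(log_paths) - set(crawl_paths))


# Invented sample data.
log_paths = [
    "/job-results?page=2&sort=date",
    "/job-results?page=3",
    "/job-results",
    "/old-campaign",
]
crawl_paths = ["/job-results", "/job-results?page=2&sort=date", "/job-results?page=3"]

parameters = crawled_parameters(log_paths)
orphans = not_in_crawl(log_paths, crawl_paths)
print(parameters.most_common())
print(orphans)
```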
That’s a lot of questions
bit.ly/logs-resource
In Summary
This is the thing you’re probably not doing
bit.ly/logs-resource
@dom_woodman

SearchLove London 2016 | Dom Woodman | How to Get Insight From Your Logs
SearchLove Boston 2019 - Derek Gleason - Benchmarking Success for Client Site...
 
SearchLove Boston 2019 - Kameron Jenkins - The Modern Search Writer’s Toolkit
SearchLove Boston 2019 - Kameron Jenkins - The Modern Search Writer’s ToolkitSearchLove Boston 2019 - Kameron Jenkins - The Modern Search Writer’s Toolkit
SearchLove Boston 2019 - Kameron Jenkins - The Modern Search Writer’s Toolkit
 
SearchLove Boston 2019 - Joy Hawkins - 10 Ways to Get Results with Local SEO
SearchLove Boston 2019 - Joy Hawkins - 10 Ways to Get Results with Local SEOSearchLove Boston 2019 - Joy Hawkins - 10 Ways to Get Results with Local SEO
SearchLove Boston 2019 - Joy Hawkins - 10 Ways to Get Results with Local SEO
 

Último

FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
dollysharma2066
 
4 TRIK CARA MENGGUGURKAN JANIN ATAU ABORSI KANDUNGAN
4 TRIK CARA MENGGUGURKAN JANIN ATAU ABORSI KANDUNGAN4 TRIK CARA MENGGUGURKAN JANIN ATAU ABORSI KANDUNGAN
4 TRIK CARA MENGGUGURKAN JANIN ATAU ABORSI KANDUNGAN
Cara Menggugurkan Kandungan 087776558899
 

Último (20)

Unlocking the Mystery of the Voynich Manuscript
Unlocking the Mystery of the Voynich ManuscriptUnlocking the Mystery of the Voynich Manuscript
Unlocking the Mystery of the Voynich Manuscript
 
Elevating Your Digital Presence by Evitha.pdf
Elevating Your Digital Presence by Evitha.pdfElevating Your Digital Presence by Evitha.pdf
Elevating Your Digital Presence by Evitha.pdf
 
Social Media Marketing Portfolio - Maharsh Benday
Social Media Marketing Portfolio - Maharsh BendaySocial Media Marketing Portfolio - Maharsh Benday
Social Media Marketing Portfolio - Maharsh Benday
 
Elevate Your Advertising Game: Introducing Billion Broadcaster Lift Advertising
Elevate Your Advertising Game: Introducing Billion Broadcaster Lift AdvertisingElevate Your Advertising Game: Introducing Billion Broadcaster Lift Advertising
Elevate Your Advertising Game: Introducing Billion Broadcaster Lift Advertising
 
Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15
Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15
Five Essential Tools for International SEO - Natalia Witczyk - SearchNorwich 15
 
Martal Group - B2B Lead Gen Agency - Onboarding Overview
Martal Group - B2B Lead Gen Agency - Onboarding OverviewMartal Group - B2B Lead Gen Agency - Onboarding Overview
Martal Group - B2B Lead Gen Agency - Onboarding Overview
 
FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
 
Unveiling the Legacy of the Rosetta stone A Key to Ancient Knowledge.pptx
Unveiling the Legacy of the Rosetta stone A Key to Ancient Knowledge.pptxUnveiling the Legacy of the Rosetta stone A Key to Ancient Knowledge.pptx
Unveiling the Legacy of the Rosetta stone A Key to Ancient Knowledge.pptx
 
4 TRIK CARA MENGGUGURKAN JANIN ATAU ABORSI KANDUNGAN
4 TRIK CARA MENGGUGURKAN JANIN ATAU ABORSI KANDUNGAN4 TRIK CARA MENGGUGURKAN JANIN ATAU ABORSI KANDUNGAN
4 TRIK CARA MENGGUGURKAN JANIN ATAU ABORSI KANDUNGAN
 
Busty Desi⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort Service
 
Rise and fall of Kulula.com, an airline won consumers by different marketing ...
Rise and fall of Kulula.com, an airline won consumers by different marketing ...Rise and fall of Kulula.com, an airline won consumers by different marketing ...
Rise and fall of Kulula.com, an airline won consumers by different marketing ...
 
Major SEO Trends in 2024 - Banyanbrain Digital
Major SEO Trends in 2024 - Banyanbrain DigitalMajor SEO Trends in 2024 - Banyanbrain Digital
Major SEO Trends in 2024 - Banyanbrain Digital
 
Social media, ppt. Features, characteristics
Social media, ppt. Features, characteristicsSocial media, ppt. Features, characteristics
Social media, ppt. Features, characteristics
 
Welcome to DataMetricks Consulting (1).pptx
Welcome to DataMetricks Consulting (1).pptxWelcome to DataMetricks Consulting (1).pptx
Welcome to DataMetricks Consulting (1).pptx
 
Enhancing Business Visibility PR Firms in San Francisco
Enhancing Business Visibility PR Firms in San FranciscoEnhancing Business Visibility PR Firms in San Francisco
Enhancing Business Visibility PR Firms in San Francisco
 
VIP Call Girls Dongri WhatsApp +91-9833363713, Full Night Service
VIP Call Girls Dongri WhatsApp +91-9833363713, Full Night ServiceVIP Call Girls Dongri WhatsApp +91-9833363713, Full Night Service
VIP Call Girls Dongri WhatsApp +91-9833363713, Full Night Service
 
Micro-Choices, Max Impact Personalizing Your Journey, One Moment at a Time.pdf
Micro-Choices, Max Impact Personalizing Your Journey, One Moment at a Time.pdfMicro-Choices, Max Impact Personalizing Your Journey, One Moment at a Time.pdf
Micro-Choices, Max Impact Personalizing Your Journey, One Moment at a Time.pdf
 
The+State+of+Careers+In+Retention+Marketing-2.pdf
The+State+of+Careers+In+Retention+Marketing-2.pdfThe+State+of+Careers+In+Retention+Marketing-2.pdf
The+State+of+Careers+In+Retention+Marketing-2.pdf
 
Discover Ardency Elite: Elevate Your Lifestyle
Discover Ardency Elite: Elevate Your LifestyleDiscover Ardency Elite: Elevate Your Lifestyle
Discover Ardency Elite: Elevate Your Lifestyle
 
TAM_AdEx-Cross_Media_Report-Banking_Finance_Investment_(BFSI)_2023.pdf
TAM_AdEx-Cross_Media_Report-Banking_Finance_Investment_(BFSI)_2023.pdfTAM_AdEx-Cross_Media_Report-Banking_Finance_Investment_(BFSI)_2023.pdf
TAM_AdEx-Cross_Media_Report-Banking_Finance_Investment_(BFSI)_2023.pdf
 

SearchLove London 2016 | Dom Woodman | How to Get Insight From Your Logs

  • 18. Why hasn’t Google seen the changes on my page?
  • 19. How should I prioritise errors in Search Console?
  • 20. Are my canonicals being respected?
  • 21. Does Google think this page is important?
  • 28. Agenda. PART 1: THE WHY (What can you do with logs?). PART 2: THE HOW (Getting logs, Analysing logs, Processing logs).
  • 31-38. What does a log look like? 123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" Each slide highlights one field in turn: IP address (123.65.150.10); timestamp ([23/Aug/2010:03:50:59 +0000]); request type (GET); page requested, here the homepage (/my_homepage); protocol (HTTP/1.1); status code (200); size of the page in bytes (2262); user agent (Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)).
  • 39. Agenda. PART 1: THE WHY (What can you do with logs?). PART 2: THE HOW (Getting logs, Analysing logs, Processing logs).
  • 40. 5 things: 1 2 3 4 5
  • 41. 1. Diagnose crawling & indexation issues
  • 44. Chart: Five folders Googlebot crawled the most (axis: number of requests)
  • 45. Chart: Five folders Googlebot crawled the most (axis: number of requests)
  • 46. Chart: % of organic sessions vs % of crawl budget (series: sessions, crawl budget)
  • 57. 3. Spot bugs & view site health
  • 58. Delayed errors with a limit of 1000
  • 60. 4. How important does Google see parts of your site?
  • 61. My SEO was as bad as my design
  • 62. But at least my hair was better
  • 67. Average number of times Googlebot crawled a template
  • 68-69. 1. teflsearch.com; 2. teflsearch.com/job-results; 3. teflsearch.com/job-results/country/china; 4. teflsearch.com/job-advert3455
  • 71. Average number of times Googlebot crawled a template 35%
  • 72. 5. How fresh does it think your content is?
  • 74. Average number of times a page template is crawled by Googlebot
  • 75. Improve our internal linking; build trust with the last modified date in the sitemap
  • 76. 1 2 3 4 5
  • 77. Agenda. PART 1: THE WHY (What can you do with logs?). PART 2: THE HOW (Getting logs, Analysing logs, Processing logs).
  • 81. Talk to a developer and ask for information
  • 82. Are all the logs in one place?
  • 83. Email for a developer:
      Hi x,
      I’m {x} from {y}, and we’ve been asked to do some log analysis to better understand how Google is behaving on the website. I was hoping you could help with some questions about the log set-up (as well as with getting the logs!).
      What we’d ideally like is 3-6 months of historical logs for the website. Our goal is to look at all the different pages search engines are crawling on our website, discover where they’re spending their time, the status code errors they’re finding, etc.
      There are also some things that are really helpful for us to know when getting logs.
      Do the logs have any personal information in them? We’re only concerned with the various search crawler bots like Google and Bing. We don’t need any logs from users, so any logs with emails, telephone numbers, etc. can be removed.
      Do you have any sort of caching which would create separate sets of logs? Is there anything like Varnish running on the server, or a CDN which might create logs in a different location from the rest of your server? If so, we will need those logs as well as those from the server. (Although we’re only concerned about a CDN if it’s caching pages or serving from the same hostname; if you’re just using Cloudflare, for example, to cache external images, then we don’t need it.)
      Are there any sub-parts of your site which log to a different place? Have you got anything like an embedded WordPress blog which logs to a different location? If so, we’ll need those logs as well.
      Do you log hostname? It’s really useful for us to be able to see hostname in the logs. By default, a lot of common server logging set-ups don’t log hostname, so if it’s not turned on, it would be very useful to have it turned on now for any future analysis.
      Is there anything else we should know?
      Best,
      {x}
  • 84. So we might have something that looks like this
  • 85. Agenda. PART 1: THE WHY (What can you do with logs?). PART 2: THE HOW (Getting logs, Analysing logs, Processing logs).
  • 92. Google’s online database for data analysis.
  • 93. What do we want from analysing our logs? 1. Ask powerful questions; 2. Repeatable; 3. Scaleable; 4. Combine with crawl data; 5. Easy to set up; 6. Easy to learn
  • 99. 9,000,000 rows of data for 2 months. 400 - 800 queries
  • 101. Agenda. PART 1: THE WHY (What can you do with logs?). PART 2: THE HOW (Getting logs, Analysing logs, Processing logs).
  • 102. Format the logs so we can import them into BigQuery. Separate the Googlebot logs from all the other logs.
  • 104. Screaming Frog Log Analyser
  • 108. Agenda. PART 1: THE WHY (What can you do with logs?). PART 2: THE HOW (Getting logs, Analysing logs, Processing logs).
  • 109. Our data in BQ
  • 110. We make sure we got what we wanted
  • 111. THE QUESTION: What is the total number of requests Googlebot makes each day to our site?
  • 112-113. Our first SQL query: SELECT timestamp FROM [mydata.log_analysis]
  • 114-115. Our first SQL query: SELECT DATE(timestamp) FROM [mydata.log_analysis]
  • 116-117. Our first SQL query: SELECT DATE(timestamp) as date FROM [mydata.log_analysis]
  • 118. Our first SQL query: SELECT DATE(timestamp) as date, count(*) FROM [mydata.log_analysis]
  • 119. Our first SQL query: SELECT DATE(timestamp) as date, count(*) FROM [mydata.log_analysis] GROUP BY date
  • 120-121. Our first SQL query: SELECT DATE(timestamp) as date, count(*) as number_of_requests FROM [mydata.log_analysis] GROUP BY date
  • 122. Comparing logs to GSC crawl volume Number of requests
  • 123. Run queries Find something weird Go look at crawl & website
  • 124. Our data in BQ
  • 125. 1 Diagnose crawling & indexation issues
  • 127. 3 Spot bugs & view site health
  • 128. 4 How important does Google see parts of your site?
  • 129. 5 How fresh does it think your content is?
  • 130. 1 Diagnose crawling & indexation issues; 4 How important does Google see parts of your site?
  • 131. What are the top 20 URLs crawled by Google over our logs?
  • 132. Login is my top crawled page and then search?
  • 133. What are the top 20 page_path_1 folders crawled by Google over our logs?
  • 134. Location folders are taking more than 70% of my budget
  • 135. Getting data by the day. Table (Page: Number of Googlebot requests): page1: 200,000; page2: 120,000
  • 136. Number of Googlebot requests day by day
  • 137. 3 Spot bugs & view site health
  • 138. How many of each status code does Google find per day over our logs?
  • 139. Number of Googlebot requests day by day
  • 140. What are most requested 404 URLs by Googlebot over the past 30 days?
  • 141. Boy does it want that ad-tech snippet
  • 142. 5 How fresh does it think your content is?
  • 143. How many times on average is each page in a page template crawled a day?
  • 144. Average number of times a page template is crawled by Googlebot
  • 145-155. Questions built up across these slides, one or two per slide:
      • How long does it take for a page to be discovered after being published?
      • What are the top 20 combinations of page_path_1 & page_path_2 folders crawled by Google over the time period of our logs?
      • Which pages have requests from Googlebot but don’t appear in our crawl?
      • What are the top non-canonical pages being crawled?
      • Which are the most crawled parameters on the website?
      • How often are the most visited parameters crawled each day?
      • Which directories have the most 301 & 404 status codes?
      • Which pages are crawled with parameters and without parameters?
      • Which pages are only partly downloaded?
      • How many hits does each section get, when the sections are classified in an external dataset?
      • What percentage of a directory was crawled over the past 30 days?
      • What is the total number of requests across two different time periods?
  • 156. That’s a lot of questions
  • 162. This is the thing you’re probably not doing
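The log line on slides 31-38 and the processing step on slide 102 (format the logs, then separate the Googlebot lines) can be sketched in Python. This is a minimal sketch under stated assumptions, not the tooling used in the talk: it assumes the logs follow the combined log format shown on the slides, and `LOG_PATTERN`, `parse_line`, `googlebot_rows`, and `to_csv` are hypothetical names. It filters on the user agent string only, which anyone can fake.

```python
import csv
import io
import re

# Assumed layout: the combined log format shown on slides 31-38.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

FIELDS = ["ip", "timestamp", "method", "path", "protocol",
          "status", "size", "referrer", "user_agent"]


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def googlebot_rows(lines):
    """Yield parsed rows whose user agent mentions Googlebot."""
    for line in lines:
        row = parse_line(line)
        if row and "Googlebot" in row["user_agent"]:
            yield row


def to_csv(rows):
    """Serialise rows to CSV text, one column per field, ready for import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The first line is the example from the slides; the second is a made-up user request.
SAMPLE = [
    '123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" '
    '200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.7 - - [23/Aug/2010:03:51:02 +0000] "GET /about HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

if __name__ == "__main__":
    print(to_csv(googlebot_rows(SAMPLE)))
```

Calling `to_csv(googlebot_rows(open_file))` on a real log file would give a CSV with one column per field; lines that do not match the assumed format are skipped rather than guessed at.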
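The questions on slides 111-140 (requests per day, top folders, status codes per day, most requested 404 URLs) can be tried locally before loading anything into BigQuery. The sketch below uses Python's built-in `sqlite3` module as a stand-in for BigQuery, so the SQL dialect differs from the slides. The table columns, the sample rows, and the helper names are all hypothetical, and the day alias is `day` rather than `date` as on the slides.

```python
import sqlite3

# Hypothetical Googlebot requests: (timestamp, path, status code).
ROWS = [
    ("2016-10-01 03:50:59", "/job-results/country/china", 200),
    ("2016-10-01 04:10:00", "/job-results/country/japan", 200),
    ("2016-10-01 05:00:00", "/login", 200),
    ("2016-10-02 01:00:00", "/job-results/country/china", 200),
    ("2016-10-02 02:00:00", "/old-advert", 404),
    ("2016-10-02 03:00:00", "/old-advert", 404),
]


def first_folder(path):
    """Return the first path segment, e.g. '/job-results/x' -> 'job-results'."""
    return path.split("?")[0].strip("/").split("/")[0]


def build_table(rows):
    """Load rows into an in-memory table that mimics a log_analysis table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log_analysis "
        "(timestamp TEXT, path TEXT, page_path_1 TEXT, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO log_analysis VALUES (?, ?, ?, ?)",
        [(ts, path, first_folder(path), status) for ts, path, status in rows],
    )
    return conn


# Slide 111: total number of Googlebot requests each day.
REQUESTS_PER_DAY = """
    SELECT DATE(timestamp) AS day, COUNT(*) AS number_of_requests
    FROM log_analysis GROUP BY day ORDER BY day
"""

# Slide 133: most crawled first-level folders.
TOP_FOLDERS = """
    SELECT page_path_1, COUNT(*) AS number_of_requests
    FROM log_analysis GROUP BY page_path_1
    ORDER BY number_of_requests DESC, page_path_1 LIMIT 20
"""

# Slide 138: how many of each status code per day.
STATUS_PER_DAY = """
    SELECT DATE(timestamp) AS day, status, COUNT(*) AS number_of_requests
    FROM log_analysis GROUP BY day, status ORDER BY day, status
"""

# Slide 140: most requested 404 URLs.
TOP_404 = """
    SELECT path, COUNT(*) AS number_of_requests
    FROM log_analysis WHERE status = 404
    GROUP BY path ORDER BY number_of_requests DESC, path LIMIT 20
"""


def run(conn, query):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_table(ROWS)
    for name, query in [("per day", REQUESTS_PER_DAY), ("folders", TOP_FOLDERS),
                        ("status", STATUS_PER_DAY), ("404s", TOP_404)]:
        print(name, run(conn, query))
```

Keeping each question as its own named query mirrors the workflow on slide 123: run queries, find something odd, then go and look at the crawl and the website.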

Editor's notes

  1. Walmart listened, but it didn’t go and look at what its customers were doing.
  2. https://www.deepcrawl.com/knowledge/news/google-webmaster-hangout-notes-september-9th-2016/
  3. Start as an actual story: Can I have the house salad, please? Greek or lentils? Olives or no olives? Green or black? Stones or no stones? Vinaigrette? Balsamic or Caesar? Balsamic. Do you want rocket? I would like a salad.
  4. Ask for PII to be removed. How many logs? The dates?
  5. The good: you can customize for more complicated logging formats; you can use reverse DNS lookup and ASN lookup; you can work with log datasets that are too large to download to your computer.
  6. Start as an actual story: Can I have the house salad, please? Greek or lentils? Olives or no olives? Green or black? Stones or no stones? Vinaigrette? Balsamic or Caesar? Balsamic. Do you want rocket? I would like a salad.
  7. This is the summation of years’ worth of work. I can’t fit it into a 40-minute presentation, so I put resources here. Don’t worry if you get lost; it’s all here.
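The reverse DNS lookup mentioned in note 5 can be sketched as follows. This is an illustration, not the talk's implementation: it assumes the commonly documented check (reverse-resolve the IP, confirm the hostname ends in googlebot.com or google.com, then forward-resolve the hostname and confirm it maps back to the same IP). `is_verified_googlebot` and its `reverse_lookup` / `forward_lookup` parameters are hypothetical names; the parameters exist so the logic can be exercised without network access. The default forward lookup only covers IPv4.

```python
import socket

# Assumed list of hostname suffixes that genuine Googlebot IPs resolve to.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip."""
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses).
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, IPv4 addresses).
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        hostname = reverse_lookup(ip)
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup guards against spoofed reverse DNS records.
        return ip in forward_lookup(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False


if __name__ == "__main__":
    # Needs network access; the IP is the example address from the slides.
    print(is_verified_googlebot("123.65.150.10"))
```

Running this over every log line would be slow, so in practice one would likely look up each distinct IP once and cache the answer.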