Automated Duplicate
Content Consolidation with
Google Cloud Functions
Speaking today
Automating Google
Lighthouse
Hamlet Batista // RankSense
slidehare.net/hamletbatista
@hamletbatista
https://jamesclear.com/marginal-gains
Agenda
➢Finding marginal but
repeatable success
➢Scaling it with automation
Cruiseline.com
Success Story
➢ No www to non-www
redirects
➢ No canonicals
➢ Redundant parameter
URLs
➢ Only 1.40% of
indexed pages received
search clicks (out of
300k+ pages)
The Google SEO
Scorecard Report
➢ Duplicate content
consolidation can be
executed relatively quickly,
as it requires a small set of
technical changes
➢ You will likely see improved
rankings within weeks after
the corrections are in place
➢ New changes and
improvements to your site
are picked up faster by
Google
➢ Natzir found that
pages ranking for the
same keyword drew
less total traffic
than they did once
consolidated
with redirects
➢ Same idea, but
from a keyword
perspective
https://www.youtube.com/watch?v=zI_jkhSyAew
Cruiseline.com
Reverse
Engineering
➢ Finding repeatable success
➢ Searching for a machine
learning model to connect
new visits to technical SEO
changes
➢ We focused on the impact
of links, indexing, and
canonical clustering
Our best predictive model
achieved 85% test accuracy
➢ Canonicalization drives
repeatable success
➢ The size of the canonical
cluster turned out to be a
strong predictor
One oversimplified way to
think about a machine
learning model is to
picture a linear regression
function in Excel/Sheets.
We predicted new users
(Y) within canonicalized
clusters as a function of
cluster size (X).
Machine Learning 101
https://bit.ly/3lGyeqA
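To make the Excel/Sheets analogy concrete, here is a minimal sketch of a least-squares line fit in plain Python. It is not the model from the talk: the cluster sizes and new-user counts below are made-up illustration data, and `fit_line` and `predict_new_users` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_new_users(cluster_size, slope, intercept):
    """Predicted new users (Y) for a canonical cluster of the given size (X)."""
    return slope * cluster_size + intercept


# Made-up illustration data: cluster size (X) and new users (Y).
cluster_sizes = [1, 2, 4, 6, 9]
new_users = [14, 22, 45, 61, 98]

slope, intercept = fit_line(cluster_sizes, new_users)
print(predict_new_users(5, slope, intercept))
```

This is the same calculation a spreadsheet trendline performs; a real model would use more features than cluster size alone.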
To Canonicalize
or Not to
Canonicalize
Current canonical clustering is
mostly self-referential (orange)
Every product variant
canonicalizes to itself.
Their optimal canonical setup is the
inverse.
Most clusters should canonicalize to one
product “leader”
For some products, people
specify the color they want
directly in Google. But for other
products, they don’t.
They decide the color they want
after seeing the options
available on the site.
https://bit.ly/36ZxXel
Technical Plan
➢ Build clusters using OnCrawl
➢ Get search demand using SEMrush
➢ Canonicalization algorithm
➢ Experiment on CDN using RankSense
➢ Automate everything using Cloud Functions and
Pub/Sub queues
Coupled vs
Decoupled
Systems
Pub/Sub is an asynchronous
messaging service that
decouples services that
produce events from services
that process events.
It allows us to connect
OnCrawl, SEMrush, and
RankSense asynchronously to
complete a custom workflow.
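As a rough sketch of what the consuming side of a Pub/Sub message can look like, the function below follows the background-function style used by Python Cloud Functions, where the message payload arrives base64-encoded in `event["data"]`. The function name and the message fields (`step`, `export_path`) are assumptions made for illustration, not part of the talk.

```python
import base64
import json


def handle_pubsub_message(event, context=None):
    """Entry point for a Pub/Sub-triggered function (illustrative only).

    The publisher does not call this function directly; it only publishes
    a message, which is what keeps the two services decoupled.
    """
    # Pub/Sub delivers the payload as base64-encoded bytes.
    raw = base64.b64decode(event["data"]).decode("utf-8")
    payload = json.loads(raw)
    # Hypothetical message shape: {"step": "...", "export_path": "..."}
    return f"next step: {payload['step']} on {payload['export_path']}"


# Local usage: build a fake event the way Pub/Sub would encode it.
message = {"step": "semrush", "export_path": "crawls/latest.csv"}
fake_event = {
    "data": base64.b64encode(json.dumps(message).encode("utf-8")).decode("ascii")
}
print(handle_pubsub_message(fake_event))
```

Because the handler depends only on the message contents, each tool in the workflow can be swapped or retried without touching the others.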
Cloud Scheduler acts as a
single pane of glass, allowing us
to manage all our automation
tasks from one place.
It allows us to trigger our custom
workflow on a recurring schedule as
search demand changes with the
seasons.
Clustering with
OnCrawl
Search Demand
Tracking with
SEMrush
➢ Cloud Scheduler triggers the
OnCrawl Cloud Function,
which uploads each crawl
export to Cloud Storage
➢ The Cloud Storage update
triggers the SEMrush Cloud
Function, which then exports
search demand data to
Cloud Storage
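The second hop in this chain could be sketched as a storage-triggered function. Background functions for Cloud Storage receive an event dictionary that includes the `bucket` and object `name`; everything else here (the `crawls/` and `search-demand/` prefixes, the function name, the returned plan) is a hypothetical convention chosen for the example, and the actual SEMrush call is left out.

```python
# Hypothetical object-name prefixes; the talk does not specify a layout.
CRAWL_PREFIX = "crawls/"
DEMAND_PREFIX = "search-demand/"


def on_crawl_export_uploaded(event, context=None):
    """Decide what to do when an object lands in the bucket (illustrative)."""
    bucket = event["bucket"]
    name = event["name"]

    # Only react to crawl exports. This guard also skips the
    # search-demand files this step would write itself.
    if not name.startswith(CRAWL_PREFIX):
        return None

    output_name = DEMAND_PREFIX + name[len(CRAWL_PREFIX):]
    # A real function would fetch search demand here and upload the result.
    return {"bucket": bucket, "input": name, "output": output_name}


print(on_crawl_export_uploaded({"bucket": "seo-data", "name": "crawls/2020-10.csv"}))
```

The prefix check matters if both steps share one bucket: without it, writing the search demand file would fire the same trigger again.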
Canonicalization
Algorithm
➢ We are going to perform an
intermediate step and force
all product groups to
canonicalize to the “leader”
URL in the group.
➢ The “leader” could be the
URL with the most search
traffic, the most
internal/external links, or
the highest crawl frequency
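One way to picture the "leader" step is the sketch below, which picks the URL with the most search traffic in each product group and points every member at it. It assumes the input is a list of `(group_id, url, search_traffic)` tuples; that shape, the function name, and the sample URLs are illustrative assumptions, and the other leader criteria (links, crawl frequency) would only change the sort key.

```python
from collections import defaultdict


def choose_canonicals(pages):
    """Map every URL to the leader URL of its product group.

    The leader is the URL with the highest search traffic in the group;
    on a tie, the first URL seen wins.
    """
    groups = defaultdict(list)
    for group_id, url, traffic in pages:
        groups[group_id].append((url, traffic))

    canonical_for = {}
    for members in groups.values():
        leader_url, _ = max(members, key=lambda member: member[1])
        for url, _ in members:
            canonical_for[url] = leader_url
    return canonical_for


# Made-up sample data for illustration.
sample_pages = [
    ("ring", "/ring-gold", 50),
    ("ring", "/ring-silver", 120),
    ("cuff", "/cuff-steel", 5),
]
for url, leader in choose_canonicals(sample_pages).items():
    print(url, "->", leader)
```

A group with a single URL simply canonicalizes to itself, so only multi-variant groups produce changes.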
We end up with one cluster that
we need to update, which
means that David Yurman is
leaving a lot of money on the
table with their current setup
that relies on self-referential
canonicals.
Deploying to
Cloudflare’s CDN
with RankSense
We are going to use the
RankSense API to publish our
new canonical clusters as
experiments in the Cloudflare
CDN
https://bit.ly/3jWm4JP
➢ We automatically populate
a Google Sheet with the
changes
➢ We submit the Sheet to
RankSense’s
PRODUCTION environment
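Before anything is submitted, the list of changes can be flattened into rows. The sketch below writes CSV text that could be pasted or imported into a sheet; the column names `url` and `new_canonical` are placeholders, not the format RankSense expects, and the function name is hypothetical.

```python
import csv
import io


def changes_to_csv(canonical_for):
    """Return CSV text listing only the URLs whose canonical must change."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "new_canonical"])  # placeholder column names
    for url in sorted(canonical_for):
        leader = canonical_for[url]
        # Leaders already canonicalize to themselves, so skip them.
        if url != leader:
            writer.writerow([url, leader])
    return buffer.getvalue()


print(changes_to_csv({"/ring-gold": "/ring-silver", "/ring-silver": "/ring-silver"}))
```

Listing only the rows that differ keeps the sheet short enough to review by hand before it goes to production.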
Resources to Learn More
➢ Python code covered in this presentation
https://github.com/ranksense/weloveseo
➢ Advanced Duplicate Content Consolidation with
Python
https://www.searchenginejournal.com/advanced-duplicate-content-consolidation-python/314471/
➢ Cloud Functions https://cloud.google.com/functions
➢ Google Pub/Sub https://cloud.google.com/pubsub
➢ Introduction to Python for SEO Pros
https://www.searchenginejournal.com/introduction-to-python-seo-spreadsheets/342779/
Thank you!
