SlideShare una empresa de Scribd logo
1 de 18
Descargar para leer sin conexión
Web scraping
with Ruby
@LeWagonTokyo
Web scraping with Ruby
Understanding HTML DOM
Uunderstanding Ruby
Happy scraping!
Today’s repository
https://github.com/hidehiro98/ruby-
scraping
Workshop outline
1. Set up
Setting tools up
2. Theoretical intros
Basic concepts to understand
3. Build with me
Step by step


4. Try your own scraping
Enjoy it ; )
1. Let’s set up
Setup
https://www.gitpod.io/#https://
github.com/hidehiro98/ruby-scraping
2. Theoritical intro about
Web scraping
When do we need web
scraping?
You want information, but no API for
that.
What's a web scraper?
Web scraping is data scraping used
for extracting data from websites.
- Wikipedia
What's an HTML DOM?
We will use the OpenURI
and the Nokogiri
3. Let’s build
Install the Nokogiri
$ gem install nokogiri
4. Scrape any website with
live code
5. Scrape any website by
yourself!
If you want to scrape
dynamic website
https://readysteadycode.com/howto-
scrape-websites-with-ruby-and-
headless-chrome
Go further
lewagon.com/tokyo
@LeWagonTokyo
Thank you!
@LeWagonTokyo

Más contenido relacionado

Similar a Web scraping with Ruby

Practical webcrawling with scrapy
Practical webcrawling with scrapyPractical webcrawling with scrapy
Practical webcrawling with scrapyIván Compañy Avi
 
Practical webcrawling with scrapy
Practical webcrawling with scrapyPractical webcrawling with scrapy
Practical webcrawling with scrapyIván Compañy Avi
 
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스Rhio Kim
 
OSINT tools for security auditing with python
OSINT tools for security auditing with pythonOSINT tools for security auditing with python
OSINT tools for security auditing with pythonJose Manuel Ortega Candel
 
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache ArrowAchieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache ArrowNeo4j
 
Search Engine Spiders
Search Engine SpidersSearch Engine Spiders
Search Engine SpidersCJ Jenkins
 
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !Florent BENOIT
 
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)PRISMA CSI
 
Sinatra Heroku You And You - Keynote Format
Sinatra Heroku You And You - Keynote FormatSinatra Heroku You And You - Keynote Format
Sinatra Heroku You And You - Keynote FormatAdam Lowe
 
Sinatra Heroku You And You - PDF Format
Sinatra Heroku You And You - PDF FormatSinatra Heroku You And You - PDF Format
Sinatra Heroku You And You - PDF FormatAdam Lowe
 
Free Mongo on OpenShift
Free Mongo on OpenShiftFree Mongo on OpenShift
Free Mongo on OpenShiftSteven Pousty
 
Web Scraping With Python
Web Scraping With PythonWeb Scraping With Python
Web Scraping With PythonRobert Dempsey
 
Scrabbly GTUG presentation
Scrabbly GTUG presentationScrabbly GTUG presentation
Scrabbly GTUG presentationGrant Goodale
 
Docker for Developers - PHP Detroit 2018
Docker for Developers - PHP Detroit 2018Docker for Developers - PHP Detroit 2018
Docker for Developers - PHP Detroit 2018Chris Tankersley
 
Contributing to rails
Contributing to railsContributing to rails
Contributing to railsLukas Eppler
 

Similar a Web scraping with Ruby (20)

WebGL Awesomeness
WebGL AwesomenessWebGL Awesomeness
WebGL Awesomeness
 
Practical webcrawling with scrapy
Practical webcrawling with scrapyPractical webcrawling with scrapy
Practical webcrawling with scrapy
 
Practical webcrawling with scrapy
Practical webcrawling with scrapyPractical webcrawling with scrapy
Practical webcrawling with scrapy
 
Hack Rio/OS
Hack Rio/OSHack Rio/OS
Hack Rio/OS
 
Ruby And The Cloud
Ruby And The CloudRuby And The Cloud
Ruby And The Cloud
 
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
 
OSINT tools for security auditing with python
OSINT tools for security auditing with pythonOSINT tools for security auditing with python
OSINT tools for security auditing with python
 
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache ArrowAchieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
 
Search Engine Spiders
Search Engine SpidersSearch Engine Spiders
Search Engine Spiders
 
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !
 
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
 
Google App Engine
Google App EngineGoogle App Engine
Google App Engine
 
Shifting Gears
Shifting GearsShifting Gears
Shifting Gears
 
Sinatra Heroku You And You - Keynote Format
Sinatra Heroku You And You - Keynote FormatSinatra Heroku You And You - Keynote Format
Sinatra Heroku You And You - Keynote Format
 
Sinatra Heroku You And You - PDF Format
Sinatra Heroku You And You - PDF FormatSinatra Heroku You And You - PDF Format
Sinatra Heroku You And You - PDF Format
 
Free Mongo on OpenShift
Free Mongo on OpenShiftFree Mongo on OpenShift
Free Mongo on OpenShift
 
Web Scraping With Python
Web Scraping With PythonWeb Scraping With Python
Web Scraping With Python
 
Scrabbly GTUG presentation
Scrabbly GTUG presentationScrabbly GTUG presentation
Scrabbly GTUG presentation
 
Docker for Developers - PHP Detroit 2018
Docker for Developers - PHP Detroit 2018Docker for Developers - PHP Detroit 2018
Docker for Developers - PHP Detroit 2018
 
Contributing to rails
Contributing to railsContributing to rails
Contributing to rails
 

Último

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Último (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Web scraping with Ruby