SlideShare una empresa de Scribd logo
1 de 10
Descargar para leer sin conexión
Starting Soon…
Web Scraping Workshop
Daniyal Bokhari
Using Beautiful Soup Package
for Python
What is Web Scraping?
The textbook definition
• Web scraping is the act of extracting data from
website
• Typically performed using automated tools to reduce
time spent on collecting data from websites
• Once data is extracted from websites, it can be parsed
to access specific pieces of information relevant to
your search
Some common uses
What is it used for?
• Industry statistics/insights
• By consumers to compare prices across different
stores
• Analyze stock or cryptocurrency prices
• Academic research
The legality of it
When should it not be used?
• Web scraping is generally considered to be legal when
it’s used to find publicly available information
• Should not be used to collect personal information,
data that the poster did not intend to be publicly
available
• Should not be used to access data that requires you
to create an account or login
A Python library for web scraping
• Beautiful Soup is a Python library that helps in web
scraping by providing useful methods to parse
through the HTML code of websites
• Creates python objects out of elements of the HTML
code allowing you to easily search and traverse
through the contents of websites
Which is better?
• Selenium refers to multiple different open-source projects, the Selenium
API can be used to control browsers
• Much more complex than Beautiful Soup, can interact with websites by
pressing buttons and filling out fields
• Used for automation and testing
• Can also be used to perform web scraping
• Beautiful Soup is useful for simple projects, can extract information faster
since it doesn’t load up all of the web page content and JavaScript
Beautiful Soup vs Selenium
• In order to use Beautiful Soup, you must specify a parser
• Parsers take the HTML code we receive from a website and construct an
object which we can use in Python to navigate and search through this
code
• Python has a built-in HTML parser called html.parser
• We will be using lxml, an external parser, as it is faster at parsing HTML
than html.parser and works for more versions of Python
Choice of Parser
Which one should you use?
Live Demo
• You will need to install Beautiful Soup, lxml, and the requests library
The best way to learn is to practice
Run these in a terminal:
pip install beautifulsoup4
pip install lxml
pip install requests
Alternate commands:
pip3 install <module>
Python3 -m pip install <module>
Python3 -m pip3 install <module>
• You can use your favourite IDE/text editor (Pycharm, VS Code, Vim, etc) to
follow along
Where can I learn more about this?
• The documentation is a great place to look at all available functions
• Online tutorials, YouTube videos are also a great place to learn more about
web scraping
Beautiful Soup 4 documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Many resources

Más contenido relacionado

Similar a Web Scraping Workshop Using Beautiful Soup

Best practices with development of enterprise-scale SharePoint solutions - Pa...
Best practices with development of enterprise-scale SharePoint solutions - Pa...Best practices with development of enterprise-scale SharePoint solutions - Pa...
Best practices with development of enterprise-scale SharePoint solutions - Pa...SPC Adriatics
 
Find maximum bugs in limited time
Find maximum bugs in limited timeFind maximum bugs in limited time
Find maximum bugs in limited timebeched
 
Getting started developing for share point
Getting started developing for share pointGetting started developing for share point
Getting started developing for share pointRoel Bethlehem
 
Automated Acceptance Testing from Scratch
Automated Acceptance Testing from ScratchAutomated Acceptance Testing from Scratch
Automated Acceptance Testing from ScratchExcella
 
BSides Lisbon 2013 - All your sites belong to Burp
BSides Lisbon 2013 - All your sites belong to BurpBSides Lisbon 2013 - All your sites belong to Burp
BSides Lisbon 2013 - All your sites belong to BurpTiago Mendo
 
DNN Summit: Robots.txt & Multi-Site DNN Instances
DNN Summit: Robots.txt & Multi-Site DNN InstancesDNN Summit: Robots.txt & Multi-Site DNN Instances
DNN Summit: Robots.txt & Multi-Site DNN InstancesWill Strohl
 
WTF: Where To Focus when you take over a Drupal project
WTF: Where To Focus when you take over a Drupal projectWTF: Where To Focus when you take over a Drupal project
WTF: Where To Focus when you take over a Drupal projectSymetris
 
SharePoint 2013 Search Operations
SharePoint 2013 Search OperationsSharePoint 2013 Search Operations
SharePoint 2013 Search OperationsSPC Adriatics
 
Practical automation for beginners
Practical automation for beginnersPractical automation for beginners
Practical automation for beginnersSeoweon Yoo
 
Exploring Content API Options - March 23rd 2016
Exploring Content API Options - March 23rd 2016Exploring Content API Options - March 23rd 2016
Exploring Content API Options - March 23rd 2016Jani Tarvainen
 
If you want to automate, you learn to code
If you want to automate, you learn to codeIf you want to automate, you learn to code
If you want to automate, you learn to codeAlan Richardson
 
Search engine and web crawler
Search engine and web crawlerSearch engine and web crawler
Search engine and web crawlervinay arora
 
From marketplace to WordPress - WordCamp Belfast
From marketplace to WordPress - WordCamp BelfastFrom marketplace to WordPress - WordCamp Belfast
From marketplace to WordPress - WordCamp BelfastFellyph Cintra
 
Untangling spring week11
Untangling spring week11Untangling spring week11
Untangling spring week11Derek Jacoby
 
Software design with Domain-driven design
Software design with Domain-driven design Software design with Domain-driven design
Software design with Domain-driven design Allan Mangune
 
DotNetNuke Urls - Best practice for administrators, editors and developers
DotNetNuke Urls - Best practice for administrators, editors and developersDotNetNuke Urls - Best practice for administrators, editors and developers
DotNetNuke Urls - Best practice for administrators, editors and developersbrchapman
 

Similar a Web Scraping Workshop Using Beautiful Soup (20)

Web Technology Part 1
Web Technology Part 1Web Technology Part 1
Web Technology Part 1
 
Best practices with development of enterprise-scale SharePoint solutions - Pa...
Best practices with development of enterprise-scale SharePoint solutions - Pa...Best practices with development of enterprise-scale SharePoint solutions - Pa...
Best practices with development of enterprise-scale SharePoint solutions - Pa...
 
Find maximum bugs in limited time
Find maximum bugs in limited timeFind maximum bugs in limited time
Find maximum bugs in limited time
 
Getting started developing for share point
Getting started developing for share pointGetting started developing for share point
Getting started developing for share point
 
Automated Acceptance Testing from Scratch
Automated Acceptance Testing from ScratchAutomated Acceptance Testing from Scratch
Automated Acceptance Testing from Scratch
 
BSides Lisbon 2013 - All your sites belong to Burp
BSides Lisbon 2013 - All your sites belong to BurpBSides Lisbon 2013 - All your sites belong to Burp
BSides Lisbon 2013 - All your sites belong to Burp
 
DNN Summit: Robots.txt & Multi-Site DNN Instances
DNN Summit: Robots.txt & Multi-Site DNN InstancesDNN Summit: Robots.txt & Multi-Site DNN Instances
DNN Summit: Robots.txt & Multi-Site DNN Instances
 
WTF: Where To Focus when you take over a Drupal project
WTF: Where To Focus when you take over a Drupal projectWTF: Where To Focus when you take over a Drupal project
WTF: Where To Focus when you take over a Drupal project
 
SharePoint 2013 Search Operations
SharePoint 2013 Search OperationsSharePoint 2013 Search Operations
SharePoint 2013 Search Operations
 
Practical automation for beginners
Practical automation for beginnersPractical automation for beginners
Practical automation for beginners
 
DCI - Free Seo Tools
DCI - Free Seo ToolsDCI - Free Seo Tools
DCI - Free Seo Tools
 
Exploring Content API Options - March 23rd 2016
Exploring Content API Options - March 23rd 2016Exploring Content API Options - March 23rd 2016
Exploring Content API Options - March 23rd 2016
 
If you want to automate, you learn to code
If you want to automate, you learn to codeIf you want to automate, you learn to code
If you want to automate, you learn to code
 
Search engine and web crawler
Search engine and web crawlerSearch engine and web crawler
Search engine and web crawler
 
From marketplace to WordPress - WordCamp Belfast
From marketplace to WordPress - WordCamp BelfastFrom marketplace to WordPress - WordCamp Belfast
From marketplace to WordPress - WordCamp Belfast
 
Wp 3hr-course
Wp 3hr-courseWp 3hr-course
Wp 3hr-course
 
Untangling spring week11
Untangling spring week11Untangling spring week11
Untangling spring week11
 
Web Scrapping Using Python
Web Scrapping Using PythonWeb Scrapping Using Python
Web Scrapping Using Python
 
Software design with Domain-driven design
Software design with Domain-driven design Software design with Domain-driven design
Software design with Domain-driven design
 
DotNetNuke Urls - Best practice for administrators, editors and developers
DotNetNuke Urls - Best practice for administrators, editors and developersDotNetNuke Urls - Best practice for administrators, editors and developers
DotNetNuke Urls - Best practice for administrators, editors and developers
 

Más de GDSC UofT Mississauga

ICCIT Council × GDSC: UX / UI and Figma
ICCIT Council × GDSC: UX / UI and FigmaICCIT Council × GDSC: UX / UI and Figma
ICCIT Council × GDSC: UX / UI and FigmaGDSC UofT Mississauga
 
Community Projects Info Session Fall 2023
Community Projects Info Session Fall 2023Community Projects Info Session Fall 2023
Community Projects Info Session Fall 2023GDSC UofT Mississauga
 
MCSS × GDSC: Intro to Cybersecurity Workshop
MCSS × GDSC: Intro to Cybersecurity WorkshopMCSS × GDSC: Intro to Cybersecurity Workshop
MCSS × GDSC: Intro to Cybersecurity WorkshopGDSC UofT Mississauga
 
Full Stack React Workshop [CSSC x GDSC]
Full Stack React Workshop [CSSC x GDSC]Full Stack React Workshop [CSSC x GDSC]
Full Stack React Workshop [CSSC x GDSC]GDSC UofT Mississauga
 

Más de GDSC UofT Mississauga (20)

CSSC ML Workshop
CSSC ML WorkshopCSSC ML Workshop
CSSC ML Workshop
 
ICCIT Council × GDSC: UX / UI and Figma
ICCIT Council × GDSC: UX / UI and FigmaICCIT Council × GDSC: UX / UI and Figma
ICCIT Council × GDSC: UX / UI and Figma
 
Community Projects Info Session Fall 2023
Community Projects Info Session Fall 2023Community Projects Info Session Fall 2023
Community Projects Info Session Fall 2023
 
GDSC x Deerhacks - Origami Workshop
GDSC x Deerhacks - Origami WorkshopGDSC x Deerhacks - Origami Workshop
GDSC x Deerhacks - Origami Workshop
 
Reverse Engineering 101
Reverse Engineering 101Reverse Engineering 101
Reverse Engineering 101
 
Michael's OWASP Juice Shop Workshop
Michael's OWASP Juice Shop WorkshopMichael's OWASP Juice Shop Workshop
Michael's OWASP Juice Shop Workshop
 
MCSS × GDSC: Intro to Cybersecurity Workshop
MCSS × GDSC: Intro to Cybersecurity WorkshopMCSS × GDSC: Intro to Cybersecurity Workshop
MCSS × GDSC: Intro to Cybersecurity Workshop
 
Basics of C
Basics of CBasics of C
Basics of C
 
Discord Bot Workshop Slides
Discord Bot Workshop SlidesDiscord Bot Workshop Slides
Discord Bot Workshop Slides
 
Devops Workshop
Devops WorkshopDevops Workshop
Devops Workshop
 
Express
ExpressExpress
Express
 
HTML_CSS_JS Workshop
HTML_CSS_JS WorkshopHTML_CSS_JS Workshop
HTML_CSS_JS Workshop
 
DevOps Workshop Part 1
DevOps Workshop Part 1DevOps Workshop Part 1
DevOps Workshop Part 1
 
Docker workshop GDSC_CSSC
Docker workshop GDSC_CSSCDocker workshop GDSC_CSSC
Docker workshop GDSC_CSSC
 
Back-end (Flask_AWS)
Back-end (Flask_AWS)Back-end (Flask_AWS)
Back-end (Flask_AWS)
 
Full Stack React Workshop [CSSC x GDSC]
Full Stack React Workshop [CSSC x GDSC]Full Stack React Workshop [CSSC x GDSC]
Full Stack React Workshop [CSSC x GDSC]
 
Git Init (Introduction to Git)
Git Init (Introduction to Git)Git Init (Introduction to Git)
Git Init (Introduction to Git)
 
Database Workshop Slides
Database Workshop SlidesDatabase Workshop Slides
Database Workshop Slides
 
ChatGPT General Meeting
ChatGPT General MeetingChatGPT General Meeting
ChatGPT General Meeting
 
Elon & Twitter General Meeting
Elon & Twitter General MeetingElon & Twitter General Meeting
Elon & Twitter General Meeting
 

Último

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Último (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Web Scraping Workshop Using Beautiful Soup

  • 2. Web Scraping Workshop Daniyal Bokhari Using Beautiful Soup Package for Python
  • 3. What is Web Scraping? The textbook definition • Web scraping is the act of extracting data from website • Typically performed using automated tools to reduce time spent on collecting data from websites • Once data is extracted from websites, it can be parsed to access specific pieces of information relevant to your search
  • 4. Some common uses What is it used for? • Industry statistics/insights • By consumers to compare prices across different stores • Analyze stock or cryptocurrency prices • Academic research
  • 5. The legality of it When should it not be used? • Web scraping is generally considered to be legal when it’s used to find publicly available information • Should not be used to collect personal information, data that the poster did not intend to be publicly available • Should not be used to access data that requires you to create an account or login
  • 6. A Python library for web scraping • Beautiful Soup is a Python library that helps in web scraping by providing useful methods to parse through the HTML code of websites • Creates python objects out of elements of the HTML code allowing you to easily search and traverse through the contents of websites
  • 7. Which is better? • Selenium refers to multiple different open-source projects, the Selenium API can be used to control browsers • Much more complex than Beautiful Soup, can interact with websites by pressing buttons and filling out fields • Used for automation and testing • Can also be used to perform web scraping • Beautiful Soup is useful for simple projects, can extract information faster since it doesn’t load up all of the web page content and JavaScript Beautiful Soup vs Selenium
  • 8. • In order to use Beautiful Soup, you must specify a parser • Parsers take the HTML code we receive from a website and construct an object which we can use in Python to navigate and search through this code • Python has a built-in HTML parser called html.parser • We will be using lxml, an external parser, as it is faster at parsing HTML than html.parser and works for more versions of Python Choice of Parser Which one should you use?
  • 9. Live Demo • You will need to install Beautiful Soup, lxml, and the requests library The best way to learn is to practice Run these in a terminal: pip install beautifulsoup4 pip install lxml pip install requests Alternate commands: pip3 install <module> Python3 -m pip install <module> Python3 -m pip3 install <module> • You can use your favourite IDE/text editor (Pycharm, VS Code, Vim, etc) to follow along
  • 10. Where can I learn more about this? • The documentation is a great place to look at all available functions • Online tutorials, YouTube videos are also a great place to learn more about web scraping Beautiful Soup 4 documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ Many resources