Introduction to Selenium and Scrapy by Arcangelo Saracino
Web UI testing with Selenium: checking actions and text, and submitting forms.
Scrapy to crawl information from a website, combined with Selenium.
2. About me
Arcangelo Saracino
IT student at Bari University
2016-2018 Web developer at Aryma
2018 - Feb 2019 Web developer at Enterprise Digital Solution
saracinoarcangelo@gmail.com github.com/Arkango
3. Selenium
Selenium is a portable framework for testing web applications.
Selenium provides a playback (formerly also recording) tool for authoring
functional tests without the need to learn a test scripting language (Selenium IDE).
It also provides a test domain-specific language (Selenese) to write tests in a
number of popular programming languages, including C#, Groovy, Java, Perl,
PHP, Python, Ruby and Scala.
The tests can then run against most modern web browsers.
Selenium deploys on Windows, Linux, and macOS platforms.
It is open-source software, released under the Apache 2.0 license: web
developers can download and use it without charge.
Source: Wikipedia
5. Selenium IDE
Selenium IDE is a complete integrated development environment (IDE) for Selenium tests.
It is implemented as a Firefox Add-On and as a Chrome Extension.
It allows for recording, editing, and debugging of functional tests. It was previously known
as Selenium Recorder.
Selenium-IDE was originally created by Shinya Kasatani and donated to the Selenium
project in 2006.
Selenium IDE was little maintained for a time, but active maintenance resumed in 2018.
Scripts may be recorded automatically and edited manually, with autocompletion support
and the ability to move commands around quickly. Scripts are recorded in
Selenese, a special test scripting language for Selenium. Selenese provides commands
for performing actions in a browser (click a link, select an option), and for retrieving data
from the resulting pages.
6. Selenium Client API
As an alternative to writing tests in Selenese, tests can
also be written in various programming languages. These
tests then communicate with Selenium by calling methods
in the Selenium Client API. Selenium currently provides
client APIs for Java, C#, Ruby, JavaScript, R and Python.
With Selenium 2, a new Client API was introduced (with
WebDriver as its central component). However, the old API
(using class Selenium) is still supported.
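As a hedged illustration, a test written with the Python client API might look like the sketch below; the login URL and the field names are hypothetical placeholders, not a real site.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Launch Firefox through its WebDriver (geckodriver must be on PATH).
driver = webdriver.Firefox()
try:
    driver.get("https://example.org/login")      # hypothetical page
    assert "Login" in driver.title               # check the page text
    driver.find_element(By.NAME, "username").send_keys("demo")
    password = driver.find_element(By.NAME, "password")
    password.send_keys("secret")
    password.send_keys(Keys.RETURN)              # submit the form
    assert "Welcome" in driver.page_source       # verify the result
finally:
    driver.quit()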
7. Selenium WebDriver
Selenium WebDriver is the successor to Selenium RC.
Selenium WebDriver accepts commands (sent in Selenese, or
via a Client API) and sends them to a browser.
This is implemented through a browser-specific browser driver,
which sends commands to a browser and retrieves results.
Most browser drivers actually launch and access a browser
application (such as Firefox, Chrome, Internet Explorer, Safari,
or Microsoft Edge); there is also an HtmlUnit browser driver,
which simulates a browser using the headless browser
HtmlUnit.
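A minimal sketch of swapping browser drivers behind the same API; it assumes geckodriver and chromedriver are installed and on PATH.

from selenium import webdriver

# Each driver class talks to its own browser-specific driver binary.
options = webdriver.ChromeOptions()
options.add_argument("--headless")           # Chrome without a visible window
for driver in (webdriver.Firefox(), webdriver.Chrome(options=options)):
    driver.get("https://example.org")
    print(driver.name, driver.title)         # same script, different browser
    driver.quit()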
9. Scrapy
Scrapy (/ˈskreɪpi/ SKRAY-pee) is a free and open-source web-crawling
framework written in Python. Originally designed for web scraping, it
can also be used to extract data using APIs or as a general-purpose
web crawler. It is currently maintained by Scrapinghub Ltd., a
web-scraping development and services company.
Scrapy project architecture is built around "spiders", which are
self-contained crawlers that are given a set of instructions. Following the
spirit of other don't repeat yourself frameworks, such as Django,[4] it
makes it easier to build and scale large crawling projects by allowing
developers to reuse their code. Scrapy also provides a web-crawling
shell, which can be used by developers to test their assumptions on a
site’s behavior.[5]
10. Scrapy: Basic Concepts
● Command line tools
Scrapy is controlled through the scrapy command-line tool, referred to here as the "Scrapy tool" to
differentiate it from the sub-commands, which we just call "commands" or "Scrapy commands".
● Spiders
Spiders are classes which define how a certain site (or a group of sites) will be scraped,
including how to perform the crawl (i.e. follow links) and how to extract structured data from
their pages (i.e. scraping items). In other words, Spiders are the place where you define the
custom behaviour for crawling and parsing pages for a particular site (or, in some cases, a
group of sites).
● Selectors
Extract data from web pages using XPath or CSS expressions; the spider sketch after this list uses both.
● Scrapy Shell
Test your extraction code in an interactive environment.
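The spider below is a minimal sketch of these concepts, written against quotes.toscrape.com, a public demo site commonly used in Scrapy tutorials; it is not taken from the original slides.

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # Selectors: extract structured data with XPath or CSS.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.xpath(".//span[@class='text']/text()").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Crawling: follow the pagination link, if there is one.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

The same selector expressions can first be tried interactively with scrapy shell http://quotes.toscrape.com/.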
11. Scrapy: Basic Concepts 2
● Items
Define the data you want to scrape (an item, a loader, and a pipeline are sketched together after this list).
● Item Loaders
Populate your items with the extracted data.
● Item Pipeline
Post-process and store your scraped data.
● Feed Exports
Output your scraped data using different formats and storages.
● Requests and Responses
Scrapy uses Request and Response objects for crawling web sites.
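A sketch tying these pieces together against the same demo site; the field names and the pipeline class are illustrative. On older Scrapy versions the processors are imported from scrapy.loader.processors instead of itemloaders.processors.

import scrapy
from scrapy.loader import ItemLoader
from itemloaders.processors import TakeFirst

class QuoteItem(scrapy.Item):
    # Items: declare the data you want to scrape.
    text = scrapy.Field()
    author = scrapy.Field()

class QuoteLoader(ItemLoader):
    # Item Loader: how extracted values are loaded into the item.
    default_item_class = QuoteItem
    default_output_processor = TakeFirst()

class StripQuotesPipeline:
    # Item Pipeline: post-process each item (enable it via ITEM_PIPELINES).
    def process_item(self, item, spider):
        if item.get("text"):
            item["text"] = item["text"].strip("“”")
        return item

class QuotesItemSpider(scrapy.Spider):
    name = "quotes_items"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        for quote in response.css("div.quote"):
            loader = QuoteLoader(selector=quote)
            loader.add_css("text", "span.text::text")
            loader.add_css("author", "small.author::text")
            yield loader.load_item()

Feed Exports then turn the items into a file, e.g. scrapy crawl quotes_items -o quotes.json.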
12. Scrapy: Basic Concepts 3
● Link Extractors
Convenient classes for extracting the links to follow from pages, as in the sketch after this list.
● Settings
Learn how to configure Scrapy and see all available settings.
● Exceptions
See all available exceptions and their meaning.
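A CrawlSpider sketch showing a link extractor rule together with a per-spider setting; the /tag/ pattern is illustrative.

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class TagSpider(CrawlSpider):
    name = "tags"
    start_urls = ["http://quotes.toscrape.com/"]

    # Settings: may be overridden per spider as well as in settings.py.
    custom_settings = {"DOWNLOAD_DELAY": 0.5}

    rules = (
        # Link extractor: collect links matching the pattern and follow them.
        Rule(LinkExtractor(allow=r"/tag/"), callback="parse_tag", follow=True),
    )

    def parse_tag(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}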