2. Who am I?
• Lecturer in Computer Security at the University of Birmingham, UK
• Member of the founding team of Lastline, Inc.
• Research interests:
– Malware analysis
– Vulnerability analysis
10. Watering Hole Attacks
• Sometimes it is difficult to exploit the target of an attack directly
– Instead, compromise a site that is likely to be visited by the target
• Council on Foreign Relations → governmental officials
• Unaligned Chinese news site → Chinese dissidents
• iPhone dev web site → developers at Apple, Facebook, Twitter, etc.
• Nation Journal web site → political insiders in Washington
12. Oracle
• Essentially, a classification algorithm for web content
– Input: web page
– Output: classification (malicious or benign)
• In practice, it is useful to extract and provide users with evidence to support the classification
– Exploit detection
– Deobfuscation results
– Anything that helps forensics, really
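The input/output contract above can be sketched as a small interface. This is a minimal illustration, not how any real oracle works: the `Verdict` type, the two detector functions, and the string checks they perform are all hypothetical stand-ins for real analyses. The point is only that the classification travels together with its supporting evidence.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Verdict:
    """Oracle output: a label plus the evidence that supports it."""
    malicious: bool
    evidence: List[str] = field(default_factory=list)


# Hypothetical detectors: each inspects the page and returns one piece of
# evidence (a string) or None. Toy string checks stand in for real analysis.
def detect_eval_of_unescape(page: str) -> Optional[str]:
    if "eval(unescape(" in page:
        return "dynamic code execution of an escaped string"
    return None


def detect_hidden_iframe(page: str) -> Optional[str]:
    lowered = page.lower()
    if "<iframe" in lowered and "display:none" in lowered.replace(" ", ""):
        return "hidden iframe"
    return None


DETECTORS: List[Callable[[str], Optional[str]]] = [
    detect_eval_of_unescape,
    detect_hidden_iframe,
]


def oracle(page: str) -> Verdict:
    """Classify a page and keep every piece of evidence that fired."""
    evidence = [e for e in (d(page) for d in DETECTORS) if e is not None]
    return Verdict(malicious=bool(evidence), evidence=evidence)
```

Calling `oracle(page)` on a page with a hidden iframe would return a verdict whose `evidence` list names that finding, which is the kind of detail an analyst can use for forensics.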
13. Oracle approaches
• Nowadays, most oracles are dynamic analysis systems
– We care about the behavior of a sample/web page/document
• Run a sample/visit a web page inside an instrumented environment and monitor its behavior
• Bypasses all obfuscation/feasibility concerns associated with static analysis
• Opens up a lot of interesting challenges related to transparency and evasion
14. Wepawet
• Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code, Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the World Wide Web Conference (WWW), Raleigh, NC, April 2010
• http://wepawet.cs.ucsb.edu
• By the numbers:
– Number of unique IPs that submitted to Wepawet: 141,463
– Number of pages visited and analyzed by Wepawet: 67,424,459
– Number of pages identified as malicious: 2,239,335
15. Wepawet Features
• Exploit preparation
– Number of bytes allocated (heap spraying)
– Number of likely shellcode strings
• Exploit attempt
– Number of instantiated plugins and ActiveX controls
– Values of attributes and parameters in method calls
– Sequences of method calls
• Redirections and cloaking
– Number and target of redirections
– Browser personality- and history-based differences
• Obfuscation
– String definitions/uses
– Number of dynamic code executions
– Length of dynamically-executed code
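To make the feature list more concrete, here is a minimal sketch of turning a behavior trace into a few of these numeric features. The event names (`redirect`, `eval`, `string_alloc`, `activex_new`) and the trace itself are invented for illustration; they are not Wepawet's actual instrumentation format.

```python
from collections import Counter

# Hypothetical event trace: (event_type, payload) tuples as an instrumented
# browser might log them. Event names are illustrative, not Wepawet's.
trace = [
    ("redirect", "http://a.example/landing"),
    ("eval", "var s = unescape('%u9090%u9090');"),
    ("string_alloc", 1_048_576),
    ("string_alloc", 2_097_152),
    ("activex_new", "SomeControl"),
    ("eval", "go();"),
]


def extract_features(events):
    """Turn a behavior trace into a small numeric feature dictionary."""
    counts = Counter(kind for kind, _ in events)
    evals = [payload for kind, payload in events if kind == "eval"]
    return {
        "num_redirects": counts["redirect"],
        "num_dynamic_executions": counts["eval"],
        "dynamic_code_length": sum(len(code) for code in evals),
        "bytes_allocated": sum(p for kind, p in events if kind == "string_alloc"),
        "num_activex": counts["activex_new"],
    }


features = extract_features(trace)
```

A dictionary like `features` is the kind of input a classifier could consume; which features matter and how they are weighted is left open here.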
16. Filter
• If everything goes well, after a while you will have more samples/pages than you can analyze in depth with your oracle
• Analysis time ranges from a few seconds to a couple of minutes
– Oracle actually runs the sample
– Sometimes multiple times (anti-evasion techniques)
• Challenge: how do we scale?
17. Static filtering
• Quick identification of drive-by-download web pages
– Each web page is deemed likely benign or likely malicious
• Basis for the classification is a set of static features
• Necessarily less precise than the oracle
– We only worry about not having false negatives
– Very tolerant of false positives (consequence: more work for our oracle)
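The asymmetry between false negatives and false positives can be illustrated with a threshold choice. The sketch below assumes a hypothetical static model that outputs a suspicion score in [0, 1] and a small labeled validation set (both made up here); it picks the highest threshold that still flags every known-malicious page and then reports the false-positive rate paid for that.

```python
def pick_threshold(scores, labels):
    """Return the highest threshold that still flags every malicious page.

    scores: suspicion scores in [0, 1] from a (hypothetical) static model.
    labels: True for pages known to be malicious, False for benign ones.
    """
    malicious_scores = [s for s, is_bad in zip(scores, labels) if is_bad]
    if not malicious_scores:
        raise ValueError("need at least one malicious example")
    return min(malicious_scores)


def rates(scores, labels, threshold):
    """False-negative and false-positive rates at a given threshold."""
    flagged = [s >= threshold for s in scores]
    fn = sum(1 for f, bad in zip(flagged, labels) if bad and not f)
    fp = sum(1 for f, bad in zip(flagged, labels) if not bad and f)
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    return fn / n_bad, fp / n_good


# Made-up validation data: two malicious pages, four benign ones.
scores = [0.95, 0.40, 0.70, 0.10, 0.45, 0.05]
labels = [True, True, False, False, False, False]
t = pick_threshold(scores, labels)
fn_rate, fp_rate = rates(scores, labels, t)
```

On this toy data the threshold lands on the lowest malicious score, so no malicious page is missed, while half of the benign pages are still forwarded to the oracle. That is the trade the slide describes: extra oracle work in exchange for not dropping attacks.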
18. Prophiler
• Filter for malicious web pages
• Prophiler: a Fast Filter for the Large-Scale Detection of Malicious Web Pages, Davide Canali, Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the International World Wide Web Conference (WWW), 2011
19. Static features
• We define three classes of features (77 in total)
– HTML (19)
• source: web page content
– JavaScript (25)
• source: web page content
– URL and host-based (33)
• source: page URL and URLs included in the content
• One machine learning model for each feature class
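One way to picture "one model per feature class" is the sketch below. Everything in it is an assumption made for illustration: the three stand-in models are hand-written rules rather than trained classifiers, the feature names and cut-offs are invented, and the rule for combining the three outputs (flag the page if any model flags it) is a guess that fits a filter meant to avoid false negatives, not something the slide states.

```python
from typing import Callable, Dict

Features = Dict[str, float]
Model = Callable[[Features], bool]


# Stand-in models: hand-written rules in place of trained classifiers.
# Feature names and cut-offs are hypothetical.
def html_model(f: Features) -> bool:
    return f.get("hidden_iframes", 0) > 0


def js_model(f: Features) -> bool:
    return f.get("eval_calls", 0) >= 3


def url_model(f: Features) -> bool:
    return f.get("url_is_ip_address", 0) == 1


MODELS: Dict[str, Model] = {
    "html": html_model,
    "javascript": js_model,
    "url_host": url_model,
}


def is_suspicious(features_by_class: Dict[str, Features]) -> bool:
    """Flag the page if the model of any feature class flags it (assumed rule)."""
    return any(
        model(features_by_class.get(name, {})) for name, model in MODELS.items()
    )
```

For example, `is_suspicious({"html": {"hidden_iframes": 1}})` evaluates to `True` because the HTML stand-in fires, even though the other two classes have no features at all.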
20. Example features
HTML features
• iframe tags, hidden elements, elements with a small area, script elements, embed and object tags, scripts with a wrong filename extension, out-of-place elements, included URLs, scripting content percentage, whitespace percentage, meta refresh tags, double HTML documents, …
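A few of the HTML features above can be counted with Python's standard `html.parser` module. This is a rough sketch: the class and feature names are invented, only a handful of the listed features are covered, and "hidden" is simplified to mean an inline `display:none` or `visibility:hidden` style.

```python
from html.parser import HTMLParser


class HtmlFeatureCounter(HTMLParser):
    """Counts a handful of static HTML features of a page."""

    def __init__(self):
        super().__init__()
        self.iframes = 0
        self.scripts = 0
        self.embed_object = 0
        self.meta_refresh = 0
        self.hidden = 0

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <script async>), so default to "".
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "iframe":
            self.iframes += 1
        elif tag == "script":
            self.scripts += 1
        elif tag in ("embed", "object"):
            self.embed_object += 1
        elif tag == "meta" and attrs.get("http-equiv", "").lower() == "refresh":
            self.meta_refresh += 1
        # Simplified notion of "hidden": inline style only.
        style = attrs.get("style", "").replace(" ", "").lower()
        if "display:none" in style or "visibility:hidden" in style:
            self.hidden += 1


def html_features(page: str) -> dict:
    parser = HtmlFeatureCounter()
    parser.feed(page)
    whitespace = sum(ch.isspace() for ch in page)
    return {
        "iframe_tags": parser.iframes,
        "script_elements": parser.scripts,
        "embed_object_tags": parser.embed_object,
        "meta_refresh_tags": parser.meta_refresh,
        "hidden_elements": parser.hidden,
        "whitespace_pct": 100.0 * whitespace / len(page) if page else 0.0,
    }
```

Because this only parses the markup and never executes it, it is cheap to run on every crawled page, which is the property a pre-filter needs.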
22. Evaluation
• Large-scale evaluation of Prophiler
• 60 days of crawling + analysis
• 18,939,908 unlabeled pages
• 14.3% of pages flagged as suspicious and submitted to Wepawet (13.7% FP)
• 85.7% load reduction on Wepawet = saving more than 400 days of analysis!
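The figures above can be checked with a little arithmetic. The page count, the 14.3% flagged share, and the 400-day saving come from the slide; the per-page analysis time is not stated there, so the last line only derives what average time the 400-day figure would imply if it is spread evenly over the filtered-out pages.

```python
total_pages = 18_939_908      # from the slide
flagged_fraction = 0.143      # from the slide
days_saved = 400              # "more than 400 days", from the slide

submitted = round(total_pages * flagged_fraction)   # pages sent to Wepawet
filtered_out = total_pages - submitted              # pages the oracle never sees
load_reduction = filtered_out / total_pages

# Derived, not stated on the slide: the average oracle time per page that a
# 400-day saving would imply if spread evenly over the filtered-out pages.
implied_seconds_per_page = days_saved * 86_400 / filtered_out
```

This works out to roughly 2.7 million pages submitted, roughly 16.2 million filtered out, and an implied average of a little over two seconds of oracle time per page.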
23. Smart crawler
• How do we seed our oracle + filter?
• Obvious idea: crawling
– Problem: toxicity of regular crawling is pretty low
– Observation: crawling is only as good as the initial seeds
• Challenge: can we find better seeds?
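The slide uses "toxicity" without defining it. The sketch below assumes it means the fraction of crawled pages that turn out to be malicious, and compares two crawls on made-up verdicts; both the definition and the numbers are assumptions for illustration.

```python
def toxicity(verdicts):
    """Fraction of crawled pages labeled malicious (assumed definition)."""
    verdicts = list(verdicts)
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


# Made-up verdicts for two crawls of ten pages each (True = malicious).
regular_crawl = [False] * 9 + [True]
better_seeded_crawl = [False] * 6 + [True] * 4

regular_toxicity = toxicity(regular_crawl)
better_toxicity = toxicity(better_seeded_crawl)
```

Under this reading, better seeds are simply seeds whose crawl yields a higher value for the same number of pages fetched and analyzed.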
24. EvilSeed
• Guided search approach to increase toxicity of pages that are crawled
• Inputs: malicious web pages found in the past
• Output: set of (more likely malicious) web pages
• EVILSEED: A Guided Approach to Finding Malicious Web Pages, Luca Invernizzi, Stefano Benvenuti, Paolo Milani, Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the IEEE Symposium on Security and Privacy, 2012
27. Anti-evasion
• At this point of the story, the bad guys will actively try to evade your system
• Lots of effort in designing evasion techniques
– Analysis environment detection
– User detection
– Stalling
• Challenge: how do we detect if we are being evaded?
28. Revolver
• Assumption: attackers are likely to take existing malicious samples/web pages and enhance them to add evasive code
• Idea: detect similar samples that are classified differently by the oracle
• Revolver: An Automated Approach to the Detection of Evasive Web-based Malware, A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, G. Vigna, in Proceedings of the USENIX Security Symposium, Washington, D.C., August 2013
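The idea of pairing similar samples with different verdicts can be sketched as follows. This is a simplified stand-in, not Revolver's algorithm: it normalizes a crude regex token stream instead of real ASTs, uses `difflib.SequenceMatcher` from the standard library as the similarity measure, and the 0.7 threshold, function names, and sample scripts are all invented.

```python
import re
from difflib import SequenceMatcher
from itertools import product

# `var` is deliberately left out so it does not collide with the VAR
# placeholder below; it is treated like any other identifier.
KEYWORDS = {"if", "else", "function", "return", "for", "while", "new"}
TOKEN = re.compile(
    r'''[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|"[^"]*"|'[^']*'|<=|>=|==|!=|\S'''
)


def normalise(code):
    """Map identifiers, numbers and strings to placeholders; keep the rest."""
    out = []
    for tok in TOKEN.findall(code):
        if tok in KEYWORDS:
            out.append(tok.upper())
        elif tok[0].isdigit():
            out.append("NUM")
        elif tok[0] in "\"'" and len(tok) > 1:
            out.append("STR")
        elif tok[0].isalpha() or tok[0] in "_$":
            out.append("VAR")
        else:
            out.append(tok)
    return out


def similarity(a, b):
    """Similarity in [0, 1] between the normalized token sequences."""
    return SequenceMatcher(None, normalise(a), normalise(b), autojunk=False).ratio()


def candidate_pairs(samples, threshold=0.7):
    """samples: list of (code, oracle_says_malicious) tuples.

    Returns (benign_code, malicious_code, score) for every benign/malicious
    pair whose similarity reaches the (hypothetical) threshold.
    """
    benign = [code for code, bad in samples if not bad]
    malicious = [code for code, bad in samples if bad]
    pairs = []
    for b, m in product(benign, malicious):
        score = similarity(b, m)
        if score >= threshold:
            pairs.append((b, m, score))
    return pairs


known_bad = "x = 10; if (x <= 20) { exploit(x); }"
looks_benign = "y = 5; if (isEmulated()) { y = 100; } if (y <= 7) { run(y); }"
unrelated = "function add(a, b) { return a + b; }"
pairs = candidate_pairs([(known_bad, True), (looks_benign, False), (unrelated, False)])
```

Here `looks_benign` shares almost all of its normalized structure with `known_bad` and adds one extra check, so it is reported as a candidate pair, while `unrelated` is not. A pair like this is only a lead: the added code still has to be inspected to decide whether it is an evasion.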
29. Revolver
[Figure: Revolver architecture. Labels: Web, Pages, Oracle, ASTs (e.g. IF VAR <= NUM …), Similarity computation, Candidate pairs {bi, mj}. Output categories: Malicious evolution, Data-dependency, JavaScript infections, Evasions.]