2. Outline
• Introduction
• Current state of AV classification
• Automatic Malware Clustering
• Malware Detection
• Anomaly Detection
• Specific Purpose Detection
• General Purpose Detection
Note: Some parts of this slide deck were removed because the research is still in progress.
5. Why Malware Clustering
• Recognize and filter known malware, so analysts can focus on new samples
• Track a malware family and its evolution
• Develop remediation mechanisms for specific malware types
• Help construct malware models/signatures
10. Kaspersky
• Classifies malware items according to their activity on users' computers
– The types that pose the least threat are shown in the lower area
– The types that pose a greater threat are displayed in the upper area
• Multiple-function malware
– Behaviors that pose a higher risk outrank behaviors that represent a lower risk
11. Trend Micro
Prefix      Description
ADW         Adware
ATVX        ActiveX malicious code
BAT         Batch file virus
BHO         Browser Helper Object - A non-destructive toolbar application
BKDR        Backdoor virus
CHM         Compiled HTML file found on malicious Web sites
COOKIE      Cookie used to track a user's Web habits for the purpose of data mining
DOS, DDOS   Virus that prevents a user from accessing security and antivirus company Web sites
ELF         Executable and Link format viruses
EXPL        Exploit that does not fit other categories
GENERIC     Memory-resident boot virus
HTML        HTML virus
IRC         Internet Relay Chat malware
JAVA        Java malicious code
JS          JavaScript virus
PE          File infector
PERL        Malware, such as a file infector, created in PERL
RAP         Remote access program
REG         Threat that modifies the system registry
RTKT        Rootkit programs
SPYW        Spyware/Grayware
TSPY        Malicious malware
TROJ        Trojan
VBS         VBScript virus
WORM        Worm
W2KM, W97M, X97M, P97M, A97M, O97M, WM, XF, XM, V5M, X2KM, X97M
            Macro virus
12. Our Naming Suggestion
• Describe the whole life cycle of the malware
• <Attack Vector>.<Protection>.<Functionality>.<Propagation Method>
– Attack Vector: How the malware infects the victim
– Protection: How the malware protects itself
– Functionality: The malicious behavior executed
– Propagation Method: How the malware attacks other machines
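The four-field label format above can be sketched as a small data type. This is only an illustration: the class name `MalwareLabel`, the `parse` helper, and the field values in the example are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MalwareLabel:
    attack_vector: str   # how the malware infects the victim
    protection: str      # how the malware protects itself
    functionality: str   # the malicious behavior executed
    propagation: str     # how the malware attacks other machines

    def __str__(self) -> str:
        # Render as <Attack Vector>.<Protection>.<Functionality>.<Propagation Method>
        return ".".join(
            [self.attack_vector, self.protection, self.functionality, self.propagation]
        )

    @classmethod
    def parse(cls, label: str) -> "MalwareLabel":
        parts = label.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected 4 dot-separated fields, got {len(parts)}")
        return cls(*parts)


# Hypothetical field values, for illustration only.
example = MalwareLabel("Email", "Packer", "Keylogger", "USB")
print(example)  # Email.Packer.Keylogger.USB
```

A fixed field order like this lets two labels be compared field by field, which is harder with free-form vendor names.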
13. Inconsistent Labels
• Each AV company has its own way of naming malware families
– Names popular in underground forums may be adopted by AV vendors, e.g. Zeus, ZeroAccess…
– Smaller and less prominent families are named independently by each AV company
• Even within the same vendor, different detection mechanisms may give different labels
14. Problem?
• Makes sharing between vendors harder
• Creates a barrier to developing remediation mechanisms
• Many automatic detection mechanisms need predefined clustering results
17. Malware Clustering
• Clustering malware that is already known
– Reduce signature size
– Generate high-quality signatures/models
– Group similar malware
• Anti-virus vendors cluster malware so that experts can filter out old threats and generate remedies for new threats
– Information familiar to human beings, e.g. file name, remote address, …
• In our research, we want to construct an automatic detection mechanism
– Information suitable for machine processing, e.g. instruction trace, system call trace, …
18. Clustering Procedure
• Typically, three steps are involved in clustering malware
– Malware Analysis
    • Dynamic, Static
– Feature Extraction
    • Instruction Sequence
    • Function Call Sequence
    • Control Flow Graph
    • Data Flow Graph
    • Network Communication
– Clustering Algorithms
    • Hierarchical Clustering
    • K-means
    • DBSCAN
19. Android Malware Clustering
• Aim to group similar Android malware
– Static and internal information is used
• Also try to detect Android malware in these groups
[Figure: pipeline with three stages: Feature Extraction (N-gram / Feature Hash), Model Construction (Hierarchical Clustering), Detection (Centroid of Cluster). Data flows from Samples to Feature Vector to Detection Model.]
20. N-gram & Feature Hash
    .method public testMethod(II)I
        iload_1
        iload_2
        iadd
        istore_3
        iload_3
        iconst_1
        iadd
        ireturn
    .end method
• Reverse the APK into Java bytecode
• Split the bytecode into N-grams with a sliding window and compute the feature hash
[Figure: bytecode N-grams mapped into the Feature Hash]
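A minimal sketch of the N-gram and feature hash step, using the opcodes of the method shown on this slide. The window size `n=3`, the hash table size `size=64`, and the use of CRC32 as the hash function are assumptions made for illustration; the slides do not specify them. The feature hash is represented here as the set of positions that would be set to 1.

```python
import zlib


def ngrams(opcodes, n=3):
    """Slide a window of length n over the opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def feature_hash(opcodes, n=3, size=64):
    """Map every N-gram to a bucket in [0, size) and return the set of hit buckets."""
    buckets = set()
    for gram in ngrams(opcodes, n):
        digest = zlib.crc32(" ".join(gram).encode("utf-8"))
        buckets.add(digest % size)
    return buckets


# Opcodes of testMethod(II)I from the slide.
method = ["iload_1", "iload_2", "iadd", "istore_3",
          "iload_3", "iconst_1", "iadd", "ireturn"]

print(len(ngrams(method)))          # 8 opcodes with n=3 give 6 N-grams
print(sorted(feature_hash(method)))  # bucket positions set to 1
```

Hashing into a fixed number of buckets keeps the feature vector the same length for every app, at the cost of occasional collisions between unrelated N-grams.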
21. Distance Heuristic
• With feature hashes, the distance between samples can be computed
– D(A,B) = |Intersect(A,B)| / |Union(A,B)|
[Figure: feature hashes of malware A and malware B with their Intersect and Union; Distance = 0.33]
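The formula D(A,B) can be sketched over feature hashes stored as sets of bit positions. Note that this ratio has the form of the Jaccard index, so identical samples score 1 and disjoint samples score 0. The function name `d`, the empty-set convention, and the bit positions in the example are assumptions; the example values are chosen so the result is 1/3, and they are not taken from the slide's figure.

```python
def d(a, b):
    """D(A,B) = |Intersect(A,B)| / |Union(A,B)| over two sets of set-bit positions."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both feature hashes are empty
    return len(a & b) / len(union)


# Hypothetical feature hashes (positions of the 1 bits).
malware_a = {1, 2}
malware_b = {2, 3}

print(round(d(malware_a, malware_b), 2))  # 1 shared bit / 3 bits in the union
```

If a true distance is needed, where 0 means identical, `1 - d(a, b)` is the usual choice.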
23. Summary of Similarity-based Method
• Our system can cluster similar malware and makes it possible to construct a detection mechanism
• Pros
– Constructs a compact model that can be deployed on the end user's device
• Cons
– Static only; cannot analyze obfuscated apps
– Not scalable; pairwise comparison is needed
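The scalability concern can be seen in a small sketch of threshold-based hierarchical clustering over feature hashes. This is an illustrative single-linkage variant, not necessarily the linkage the authors used; the `threshold` value and the sample data are assumptions. Every merge step compares clusters pair by pair, and n samples need n(n-1)/2 pairwise similarities in total.

```python
from itertools import combinations


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(samples, threshold=0.5):
    """Single-linkage agglomerative clustering.

    Repeatedly merge the two most similar clusters until no pair
    reaches the similarity threshold.
    """
    clusters = [[name] for name in samples]

    def linkage(c1, c2):
        # Similarity of the closest pair of members (single linkage).
        return max(jaccard(samples[x], samples[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical feature hashes for four samples from two families.
samples = {
    "a1": {1, 2, 3, 4},
    "a2": {1, 2, 3, 5},
    "b1": {10, 11, 12},
    "b2": {10, 11, 13},
}
print(cluster(samples))  # [['a1', 'a2'], ['b1', 'b2']]
```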
25. Automatic Malware Detection/Classification
• Detection vs. Classification
– Detection is also a classification problem, with only two labels
• Malware Detection
– Signature-based Detector
– Specific Detector
– General Detector
26. Multi-Level Detection
[Figure: multi-level detection pipeline fed with Samples. Detectors: Signature-Based, General Detector, Specific Detector, Anomaly Detector. Outputs: Known Malware, Mutated Malware, Brand-new Malware, Benign Program. One intermediate sample set is labeled "1. Unknown Malware, 2. Mutated Malware, 3. Benign Program"; three others are labeled "1. Unknown Malware, 2. Benign Program". Malware Clustering and Expert also appear in the figure.]
27. Anomaly Detector
• An anomaly detector is used to filter out benign programs
– High false alert rate
• Access Miner
– Using System-Centric Models for Malware Protection
– Proposed at CCS'10 by Institute Eurecom
28. Access Miner
• The intuition is that benign programs generally follow certain patterns in how they use OS resources
• Malware, by contrast, may not follow these patterns
• Steps of Access Miner
– Collect the system calls used by normal benign programs
– Use N-grams to convert them into vectors
– Compute the pairwise distance between benign programs
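The three steps listed above can be sketched as follows. This is not the Access Miner implementation: the slides do not name the distance measure, so cosine distance over N-gram counts is an assumption, as are `n=2`, the program names, and the traces (which only reuse the system call names shown in the deck).

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def to_vector(calls, n=2):
    """Count the N-grams of a system call trace."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return 1.0 - dot / norm if norm else 1.0


# Step 1: system call traces of benign programs (hypothetical).
traces = {
    "editor": ["ReadFile", "WriteFile", "Close"],
    "backup": ["ReadFile", "WriteFile", "Close", "ReadFile"],
}

# Step 2: convert each trace into an N-gram vector.
vectors = {name: to_vector(calls) for name, calls in traces.items()}

# Step 3: pairwise distances between benign programs.
distances = {
    (a, b): cosine_distance(vectors[a], vectors[b])
    for a, b in combinations(vectors, 2)
}
print(distances)
```

A new program whose distance to every benign vector is large would be the kind of outlier an anomaly detector flags.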
29. Steps of Access Miner
[Figure: example system call traces (ReadFile, WriteFile, Close, SendPkt) of a normal benign program and a malicious program, broken into N-grams; Distance = 0.33]
30. Specific Purpose Detection
• Detects specific malicious features of malware
• Most specific purpose detection systems have a low false positive rate
• Can identify new threats
• Limited to a narrow scope
– Some behavior is only suspicious, not definitively malicious
    • Register as boot service
    • Receive a command and execute it
31. Forenser
• Forenser
– Detects shellcode embedded in document files
– Decodes each part of the document as x86 code
– If the decoded result shows strong data dependencies, shellcode may exist

    mov eax, ebx
    push eax
    call func
    ….
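The idea of measuring data dependency in decoded bytes can be illustrated with a toy score. This is not Forenser's algorithm: it assumes the instructions are already disassembled and annotated by hand with the registers they read and write, and the `window` parameter and the scoring rule are hypothetical.

```python
def dependency_ratio(instructions, window=4):
    """Fraction of instructions that read a register written by one of
    the previous `window` instructions.

    Each instruction is (text, registers_read, registers_written).
    """
    if not instructions:
        return 0.0
    dependent = 0
    for i, (_, reads, _) in enumerate(instructions):
        recent_writes = set()
        for _, _, writes in instructions[max(0, i - window):i]:
            recent_writes |= writes
        if reads & recent_writes:
            dependent += 1
    return dependent / len(instructions)


# The snippet from the slide, annotated by hand (an assumption of this sketch).
decoded = [
    ("mov eax, ebx", {"ebx"}, {"eax"}),
    ("push eax", {"eax", "esp"}, {"esp"}),
    ("call func", {"esp"}, {"esp"}),
]
print(dependency_ratio(decoded))  # 2 of the 3 instructions depend on an earlier one
```

Random data decoded as x86 tends to produce instructions that rarely consume each other's results, which is why a high ratio is treated as a hint of real code.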
33. Kernel Mode Rootkit Detection
• Check whether user code runs in kernel mode
• Dynamic Information Flow Tracking (Taint)
• Track whether any kernel memory is tainted
• Detected 642 kernel mode rootkits among 14894 samples

stmt          A  B  C
A = <input>   O  X  X
C = B         O  X  X
C = A         O  X  O
B = C         O  O  O
(O: tainted, X: not tainted)
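The taint table on this slide can be reproduced with a few lines of taint propagation. This is a minimal sketch for simple assignments only; the function name `propagate` and the tuple encoding of statements are assumptions. An assignment from `<input>` taints the destination, and any other assignment copies the taint status of the source.

```python
def propagate(statements, variables=("A", "B", "C")):
    """Return the taint state of every variable after each statement."""
    tainted = {v: False for v in variables}
    history = []
    for dest, src in statements:
        # Input is the taint source; otherwise taint flows from src to dest.
        tainted[dest] = True if src == "<input>" else tainted[src]
        history.append(dict(tainted))
    return history


program = [("A", "<input>"), ("C", "B"), ("C", "A"), ("B", "C")]

for (dest, src), state in zip(program, propagate(program)):
    marks = " ".join("O" if state[v] else "X" for v in "ABC")
    print(f"{dest} = {src:<8} {marks}")
```

Note that `C = B` leaves C untainted because B is still clean at that point, so the order of statements matters.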
34. Malware Recognition based on in-Kernel function Invocation Pattern
• MrKIP's goal is to detect and classify rootkits
• With the help of in-kernel function hooking, the invocation sequence can be recorded
• The malware model is constructed as a state machine
[Figure: pipeline with three stages: Feature Extraction (Runtime in-kernel call sequence), Model Construction (State Transition Table), Detection (HMM). Data flows from Samples to Feature Vector to Detection Model.]
35. Behavior to State
• First, MrKIP records runtime in-kernel function calls
• Similar behaviors are clustered by HLC
– Same function, different arguments
– A different heuristic function is used for each type

             State 1                  State 2
malware A    OverWrIfFile(mc11.bat)   SetReg(SERVICESGRANDE48, 00010000)
malware B    OverWrIfFile(esc.bat)    SetReg(SERVICESMSOFT98, 00010000)
36. Hidden Markov Model
• A Hidden Markov Model is constructed to recognize samples
[Figure: HMM state diagram built from the call sequences of malware A and malware B, which alternate between State 1 and State 2. Nodes shown:
  CreateFile C:WINDOWSsystem32
  SetReg <SERVICES>DOCKER19 ErrorControl = 00010000
  SetReg <SERVICES>DOCKER19 Start = 00020000
  SetReg SERVICESDOCKER19 Type = 00010000
  SetReg <SERVICES>DOCKER19 ImagePath = C:WINDOWSsystem32driversdocker19.sys
  DelFile <system32>KERNEL32.PDBSYMBOLSDLLKERNEL32.PDB
  SendPkt ….trojansssOx0afreehosWaOx03com...
  CreateProc pinch_2.99.exe
Transition probabilities shown: 0.33, 1.0, 1.0, 1.0, 0.16, 0.5, 0.05, 0.09, 0.27, 0.25]
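The transition probabilities in such a model can be estimated by counting. The sketch below is a simplification: it fits a first-order Markov chain over observed states rather than a full Hidden Markov Model with hidden states and emission probabilities, and the training sequences are hypothetical (they only reuse call names from the figure).

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate P(next state | current state) from observed state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }


def sequence_probability(model, seq):
    """Probability of the transitions in seq; unseen transitions score 0."""
    p = 1.0
    for cur, nxt in zip(seq, seq[1:]):
        p *= model.get(cur, {}).get(nxt, 0.0)
    return p


# Hypothetical state sequences recorded from two samples of one family.
training = [
    ["CreateFile", "SetReg", "SetReg", "DelFile"],
    ["CreateFile", "SetReg", "SendPkt"],
]
model = fit_transitions(training)

print(model["CreateFile"])  # {'SetReg': 1.0}
print(sequence_probability(model, ["CreateFile", "SetReg", "SendPkt"]))
```

A sample whose call sequence scores a high probability under a family's model would be recognized as belonging to that family; a real detector would also need smoothing so that a single unseen transition does not force the score to zero.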
38. Conclusion
• There are many different ways to cluster malware
• Current state of antivirus classification
– Inconsistent labeling
– Low detection rate
• Automatic malware clustering systems
• Different types of detection mechanisms
– Signature-based detector
– Anomaly detector
– Specific Purpose detector
– General Purpose detector