Multiple classifier systems under attack
Battista Biggio, Giorgio Fumera, Fabio Roli
Dept. of Electrical and Electronic Eng., Univ. of Cagliari
http://prag.diee.unica.it
9th International Workshop on Multiple Classifier Systems
Outline
● Adversarial classification
● MCSs in adversarial classification tasks
● Some experimental results
Adversarial classification
Two pattern classes: legitimate, malicious

Examples:
● Biometric verification and recognition
● Intrusion detection in computer networks
● Spam filtering
● Network traffic identification

[Figure: Biometric verification: identity claims ("I am John Smith",
"I am Bob Brown") are matched against the template database (J. Smith,
B. Brown); the genuine user is accepted as legitimate, the impostor is
rejected. Spam filtering: a legitimate e-mail (Subject: MCS2010
Suggested tours — "Dear MCS 2010 Participant, Attached please find the
offers we negotiated with the travel agency ...") vs. a spam e-mail
(Subject: Need affordable Drugs?? — "Order from Canadian Pharmacy &
Save You Money. We are having Specials Hot Promotion this week! ...").]
Adversarial classification
Attack: fingerprint spoofing
[Figure: the impostor ("I am Bob Brown") presents a spoofed replica of
another user's fingerprint to the matcher, attacking the verification
against the template database (J. Smith, B. Brown).]

Attack: bad word obfuscation and good word insertion

Original spam:
Subject: Need affordable Drugs??
Order from Canadian Pharmacy & Save You Money
We are having Specials Hot Promotion this week!
...

After the attack:
Subject: Need affordab1e D r u g s??
Order from (anadian Ph@rmacy & S@ve You Money
We are having Specials H0t Promotion this week!
"Don't you guys ever read a paper? Moyer's a gentleman now. He knows t
"Well I'm sure I can't help what you think," she said tartly. "After a
...
Adversarial classification
Main issues:
● vulnerabilities of pattern recognition systems
● performance evaluation under attack
● design of pattern recognition systems robust to attacks
Multiple classifier systems
in adversarial environments
[Figure: a multimodal biometric verification system: the identity claim
("I am Bob Brown") is checked by multiple matchers against the template
database (J. Smith, B. Brown), and a fusion rule outputs the final
Accepted/Rejected decision; the impostor is rejected.]
Multimodal biometric systems: more accurate than unimodal ones
And also more robust to attacks (?)
Analogous claims in other applications
(spam filtering, network intrusion detection, etc.)
Aim of our work
Main issues in adversarial classification:
● vulnerabilities of pattern recognition systems
● performance evaluation under attack
● design of pattern recognition systems robust to attacks
Our goal: to investigate whether and how MCSs can improve the
robustness of PR systems under attack
Linear classifiers under attack
The adversary exploits some knowledge of
● the features
● the classifier's decision function
An example: spam filtering with linear classifiers

f(x) = sign { ω1x1 + ω2x2 + ... + ωNxN + ω0 }

xi ∈ {0,1}; f(x) = +1: spam; f(x) = -1: legitimate
Original spam: "Buy viagra!"
x = [ 1 0 1 0 0 0 0 0 … ]

Modified spam (obfuscation + good word insertion): "Buy vi4gr4! Did
you ever play that game when you were a kid where the little plastic
hippo tries to gobble up all your marbles?"
x' = [ 1 0 0 0 1 0 0 1 … ]
Toy weights: ω(buy) = 0.5, ω(viagra) = 2.0, ω(kid) = -0.5,
ω(game) = -2.0; bias ω0 = -0.9

"Buy viagra!"          0.5 + 2.0 - 0.9 = 1.6 > 0: spam
"Buy vi4gr4!"          0.5 - 0.9 = -0.4 < 0: legitimate
"Buy viagra! game"     0.5 + 2.0 - 2.0 - 0.9 = -0.4 < 0: legitimate
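As a concrete illustration, here is a minimal runnable sketch of such a
classifier over binary word-occurrence features, using the toy weights
above; the tokenization and names are ours, not part of the original
experiments.

import re

# Toy weights from the slides; the real models have > 360,000 features.
WEIGHTS = {"buy": 0.5, "viagra": 2.0, "kid": -0.5, "game": -2.0}
BIAS = -0.9  # omega_0

def classify(text):
    """f(x) = sign(w_1 x_1 + ... + w_N x_N + w_0); +1 = spam, -1 = legitimate."""
    words = set(re.findall(r"[a-z0-9@]+", text.lower()))  # binary occurrence
    score = sum(w for word, w in WEIGHTS.items() if word in words) + BIAS
    return +1 if score > 0 else -1

print(classify("Buy viagra!"))  # +1 (spam): 0.5 + 2.0 - 0.9 = 1.6 > 0
print(classify("Buy vi4gr4!"))  # -1: obfuscation zeroes out viagra's contribution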
Linear classifiers under attack
Possible strategy to improve the robustness of linear
classifiers: keep the weights as uniform as possible
(Kolcz and Teo, 6th Conf. on Email and Anti-Spam, CEAS 2009)

f(x) = sign { ω1x1 + ω2x2 + ... + ωNxN + ω0 }

More uniform weights: ω(buy) = 1.0, ω(viagra) = 1.5, ω(kid) = -1.0,
ω(game) = -1.5; bias ω0 = -0.9

"Buy viagra!"              1.0 + 1.5 - 0.9 = 1.6 > 0: spam
"Buy vi4gr4!"              1.0 - 0.9 = 0.1 > 0: spam
"Buy viagra! game"         1.0 + 1.5 - 1.5 - 0.9 = 0.1 > 0: spam
"Buy viagra! kid game"     1.0 + 1.5 - 1.0 - 1.5 - 0.9 = -0.9 < 0: legitimate

With more uniform weights the same single-word attacks fail: the
adversary must modify more words to evade detection.
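Under the binary feature model, this worst-case attack has a simple
greedy form: with a budget of m word changes, the adversary obfuscates
the present words with the largest positive weights and inserts the
absent words with the most negative ones. A sketch under that
assumption (the function name is ours):

def worst_case_attack(weights, bias, x, m):
    """Score of a message x (dict: word -> 0/1) after a worst-case
    m-word bad-word-obfuscation / good-word-insertion attack."""
    score = sum(w * x[word] for word, w in weights.items()) + bias
    # Score reduction achievable by each possible move, largest first.
    gains = sorted(
        [w for word, w in weights.items() if x[word] == 1 and w > 0]      # obfuscate a bad word
        + [-w for word, w in weights.items() if x[word] == 0 and w < 0],  # insert a good word
        reverse=True,
    )
    return score - sum(gains[:m])

weights = {"buy": 0.5, "viagra": 2.0, "kid": -0.5, "game": -2.0}
x = {"buy": 1, "viagra": 1, "kid": 0, "game": 0}
print([worst_case_attack(weights, -0.9, x, m) for m in range(3)])
# [1.6, -0.4, -2.4]: one change already evades the non-uniform classifier

With the more uniform weights above, the same budget gives 1.6, 0.1,
-1.4: a single change no longer suffices, matching the slide's example.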
Ensembles of linear classifiers under attack
Do randomisation-based MCS techniques result in more
uniform weights of linear base classifiers?
● bagging
● random subspace method
● ...
(accuracy-robustness trade-off)
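One way to test this, sketched below with NumPy: train linear base
classifiers on random feature subsets (RSM) or bootstrap samples
(bagging), map their weights back to the full feature space, and
average them. Since the average of linear discriminants is again
linear, the ensemble's weight uniformity can be read off the averaged
vector. The function names and the train_fn hook (any trainer returning
one weight per feature, e.g. a linear SVM) are ours:

import numpy as np

rng = np.random.default_rng(0)

def rsm_weights(train_fn, X, y, n_classifiers=10, subset_frac=0.5):
    """Random subspace method: each base classifier is trained on a
    random feature subset; its weights are mapped back to the full
    feature space (zeros elsewhere) and averaged."""
    n_features = X.shape[1]
    k = int(subset_frac * n_features)
    W = np.zeros((n_classifiers, n_features))
    for i in range(n_classifiers):
        idx = rng.choice(n_features, size=k, replace=False)
        W[i, idx] = train_fn(X[:, idx], y)
    return W.mean(axis=0)

def bagging_weights(train_fn, X, y, n_classifiers=10, frac=1.0):
    """Bagging: each base classifier is trained on a bootstrap sample
    drawn with replacement; the weight vectors are averaged."""
    n = int(frac * X.shape[0])
    samples = [rng.integers(0, X.shape[0], size=n) for _ in range(n_classifiers)]
    return np.mean([train_fn(X[r], y[r]) for r in samples], axis=0)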
Experimental setting (1)
● Spam filtering task
● TREC 2007 data set (20,000 out of > 75,000 e-mails, 2/3 spam)
● Features: bag of words (binary word occurrence); > 360,000 features
● Base linear classifiers: SVM, Logistic Regression
● MCS
● ensemble size: 3, 5, 10
● bagging: 20%, 100% training samples
● RSM: 20%, 50%, 80% feature subset sizes
● 5 runs
● Evaluation of performance under attack: worst-case BWO/GWI attack
for m obfuscated/added words (m = “attack strength”), as sketched below
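The evaluation loop might look like the following sketch (reusing
worst_case_attack from above; the name and signature are ours): for
each attack strength m, every spam message is modified in the worst
case and the fraction still detected is recorded.

def robustness_curve(weights, bias, spam_feats, m_max):
    """Fraction of spam still detected after a worst-case m-word attack,
    for m = 0..m_max; spam_feats is a list of word -> 0/1 dicts."""
    return [
        sum(worst_case_attack(weights, bias, x, m) > 0 for x in spam_feats)
        / len(spam_feats)
        for m in range(m_max + 1)
    ]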
Measure of weights uniformity

F(K) = (sum of the K largest |ωi|) / (sum of all |ωi|), K = 1, ..., N

[Figure: sorted weight magnitudes |ω1| ... |ωN| and the corresponding
F(K) curves: the least uniform weights (a few dominant ones) push F(K)
towards 1 already for small K; the most uniform weights give the
diagonal F(K) = K/N.]

Kolcz and Teo, 6th Conf. on Email and Anti-Spam (CEAS 2009)
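In code, F(K) is just a normalized cumulative sum of the sorted
absolute weights (a sketch; the function name is ours):

import numpy as np

def weight_uniformity(w):
    """F(K) = (sum of the K largest |w_i|) / (sum of all |w_i|), K = 1..N.
    F(K) = K/N for perfectly uniform magnitudes; F(K) approaches 1 for
    small K when a few weights dominate."""
    a = np.sort(np.abs(np.asarray(w, dtype=float)))[::-1]  # largest first
    return np.cumsum(a) / a.sum()

print(weight_uniformity([0.5, 2.0, -0.5, -2.0]))  # [0.4 0.8 0.9 1. ] less uniform
print(weight_uniformity([1.0, 1.5, -1.0, -1.5]))  # [0.3 0.6 0.8 1. ] more uniform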
Conclusions
● Adversarial classification: which roles can MCSs play?
● This work:
● linear classifiers
● attacks based on some knowledge about features
and decision function (case study: spam filtering)
● Future work: investigating MCSs with different
applications, base classifiers, kinds of attacks, ...