The recent advancement of adversarial machine learning
Alexey Kurakin
The Google Brain Team
What are adversarial examples?
Why are they important?
What are adversarial examples?
[Figure: a clean image is labeled "Hamster" by the image classifier; after adding a crafted adversarial perturbation, the same classifier labels the adversarial image "Cauliflower"]
Related problem - garbage class examples
(Nguyen, Yosinski, Clune, 2015)
Why are adversarial examples important?
[Figure: clean image vs adversarial image]
● Small changes in input -> change of classification label
○ Lack of robustness of machine learning
● Security problem
○ You can fool a computer while a human won't notice the problem
Why are adversarial examples important?
(Kurakin, Goodfellow, Bengio, 2016)
Why are adversarial examples important?
(Athalye, Engstrom, Ilyas, Kwok, 2017)
Why are adversarial examples important?
(Brown, Mane, Roy, Abadi, Gilmer, 2017)
Adversarial attacks
How to generate adversarial examples
White box VS black box
(Papernot, McDaniel, Goodfellow, 2016)
White box case:
● Everything is known about classifier (including architecture, model parameters)
Black box case:
● Model parameters are unknown
In this case, craft adversarial examples for a known model and transfer them to the unknown one
Digital VS Physical
Digital attack:
● Directly feeding numbers into the classifier
Physical attack:
● Classifier perceives world through sensor (e.g. camera)
White box adversarial attack - L-BFGS
Where:
● F(•) - neural network or another ML classifier
● x - clean image
● L - desired target class
● r - adversarial perturbation, which makes x+r adversarial image
The method is very computationally demanding and slow
(Szegedy et al, 2014)
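The slide's equation does not survive as text; reconstructed from the cited paper, the L-BFGS attack approximately solves:

```latex
\min_{r}\; c\,\|r\|_2 + \mathrm{loss}_F(x + r, L)
\quad \text{subject to}\quad x + r \in [0, 1]^m
```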
White box adversarial attack - FGSM
(Goodfellow, Shlens, Szegedy, 2014)
Where:
● J(x,y) - cross entropy loss of prediction for input sample x and label y
● x - clean image with true label Ytrue
● ϵ - method parameter, size of perturbation
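The slide's formula is lost in the transcript; FGSM computes Xadv = X + ϵ·sign(∇X J(X, Ytrue)). As a hedged sketch (not from the slides), here is a minimal NumPy illustration on a toy two-class linear softmax model; the model, weights, and inputs are all invented for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_xent_wrt_input(W, x, y_true):
    # d/dx of cross-entropy(softmax(W x), y_true) = W^T (p - onehot(y_true))
    p = softmax(W @ x)
    p[y_true] -= 1.0
    return W.T @ p

def fgsm(W, x, y_true, eps):
    # x_adv = x + eps * sign(grad_x J(x, y_true))
    return x + eps * np.sign(grad_xent_wrt_input(W, x, y_true))

# toy 2-class linear "classifier"
W = np.array([[2.0, 0.0], [0.0, 2.0]])
x = np.array([1.0, 0.2])                 # classified as class 0
x_adv = fgsm(W, x, y_true=0, eps=1.0)
print(np.argmax(W @ x), np.argmax(W @ x_adv))  # → 0 1
```

With ϵ = 1, a single gradient-sign step flips the toy classifier's prediction from class 0 to class 1.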
White box adversarial attack - Iterative FGSM
Where:
● J(x,y) - cross entropy loss of prediction for input sample x and label y
● x - clean image with true label Ytrue
● α - method parameter, size of one step
(Kurakin, Goodfellow, Bengio, 2016)
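The update rule itself is missing from the transcript; per the cited paper, the (untargeted) iterative FGSM is:

```latex
X^{adv}_0 = X,\qquad
X^{adv}_{N+1} = \mathrm{Clip}_{X,\epsilon}\left\{ X^{adv}_N
+ \alpha\,\mathrm{sign}\left(\nabla_X J(X^{adv}_N, Y_{true})\right)\right\}
```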
White box adversarial attack - Iterative FGSM
Where:
● J(x,y) - cross entropy loss of prediction for input sample x and label y
● x - clean image, Ytarget - desired target class
● α - method parameter, size of one step
(Kurakin, Goodfellow, Bengio, 2016)
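For the targeted variant the transcript again drops the formula; per the cited paper, it descends the loss toward the target class:

```latex
X^{adv}_0 = X,\qquad
X^{adv}_{N+1} = \mathrm{Clip}_{X,\epsilon}\left\{ X^{adv}_N
- \alpha\,\mathrm{sign}\left(\nabla_X J(X^{adv}_N, Y_{target})\right)\right\}
```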
White box adversarial attack - C&W
Considered one of the strongest white box attacks.
Where:
● Logits(•) - logits of the network
● X - clean image, Xadv - adversarial image
● t - desired target class
● (z)+ = max(0, z)
(Carlini, Wagner, 2016)
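The objective is lost in the transcript; one common form of the C&W L2 attack, reconstructed from the definitions above, is:

```latex
\min_{X^{adv}}\; \|X^{adv} - X\|_2^2
+ c \cdot \left( \max_{i \neq t} \mathrm{Logits}(X^{adv})_i
- \mathrm{Logits}(X^{adv})_t \right)^{+}
```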
Black box attacks
1. Train your own classifier on a similar problem
2. Construct adversarial examples for your classifier
3. Use them to attack the unknown classifier
(Papernot, McDaniel, Goodfellow, 2016)
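The three steps above can be sketched as follows; a hedged NumPy toy (not from the slides) where a white-box FGSM attack on a substitute model transfers to an "unknown" target model with a similar decision boundary. All models and numbers are invented for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm(W, x, y, eps):
    # white-box FGSM step against a linear softmax model W
    p = softmax(W @ x)
    p[y] -= 1.0
    return x + eps * np.sign(W.T @ p)

# substitute model we trained ourselves, and an "unknown" target model
W_sub    = np.array([[2.0, 0.0], [0.0, 2.0]])
W_target = np.array([[1.5, 0.1], [0.1, 1.8]])   # similar decision boundary

x = np.array([1.0, 0.2])
x_adv = fgsm(W_sub, x, y=0, eps=1.0)   # crafted against the substitute only
# transfer: the same adversarial input often fools the unknown model too
print(np.argmax(W_target @ x), np.argmax(W_target @ x_adv))  # → 0 1
```

The adversarial example was computed with no knowledge of `W_target`, yet still flips its prediction, which is the transferability the slide relies on.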
Defenses against adversarial examples
Defenses - input preprocessing
Problems:
● May degrade quality on clean images
● Broken when the attacker is aware of the transformation
[Diagram: input → transformation (e.g. blur) → classifier]
(multiple works by multiple authors)
Defenses - gradient masking
Similar to the previous defense, but with a non-differentiable transformation.
Problem:
● Can use transferability to attack the model
[Diagram: Conv → non-differentiable function → Conv → Classifier; the non-differentiable step is meant to "make this impossible" for gradient-based attacks]
Defenses - adversarial training
[Diagram: (1) minibatch from dataset → generate adversarial examples; (2) compose training minibatch; (3) compute gradients using model weights; (4) update weights]
Idea: let’s inject adversarial examples into training set
1. Compute adversarial examples using current model weights
2. Compose mixed minibatch of clean and adversarial examples
3. Use mixed minibatch to compute model gradients
4. Update model weights
Problems:
● may be harder to train
● model may learn to mask gradients
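A hedged NumPy sketch of the four steps above (not from the slides): a linear softmax classifier trained on minibatches mixed with FGSM examples. The toy data, learning rate, and ϵ are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def weight_grad(W, X, Y):
    # dJ/dW for a linear softmax classifier, J = mean cross-entropy
    P = softmax(X @ W.T)
    P[np.arange(len(Y)), Y] -= 1.0
    return P.T @ X / len(Y)

def input_grad(W, X, Y):
    # dJ/dX, used to craft FGSM perturbations
    P = softmax(X @ W.T)
    P[np.arange(len(Y)), Y] -= 1.0
    return P @ W

# toy 2-class problem: two Gaussian blobs
X = np.vstack([rng.normal(size=(100, 2)) + [2.0, 0.0],
               rng.normal(size=(100, 2)) + [-2.0, 0.0]])
Y = np.array([0] * 100 + [1] * 100)

W, eps, lr = np.zeros((2, 2)), 0.5, 0.5
for _ in range(200):
    # 1. compute adversarial examples using current model weights
    X_adv = X + eps * np.sign(input_grad(W, X, Y))
    # 2. compose a mixed minibatch of clean and adversarial examples
    X_mix, Y_mix = np.vstack([X, X_adv]), np.concatenate([Y, Y])
    # 3-4. compute gradients on the mixed minibatch, update weights
    W -= lr * weight_grad(W, X_mix, Y_mix)

# accuracy on fresh FGSM examples against the trained model
X_test_adv = X + eps * np.sign(input_grad(W, X, Y))
acc_adv = (np.argmax(X_test_adv @ W.T, axis=1) == Y).mean()
print(round(float(acc_adv), 2))
```

On this well-separated toy problem the adversarially trained model keeps high accuracy even on FGSM-perturbed inputs.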
Defenses - ensemble adversarial training
To solve the "gradient masking" problem of adversarial training, let's ensemble a few models.
[Diagram: minibatch from dataset → adversarial examples crafted against the trainable model and fixed models 1..N → training minibatch → compute gradients, update weights of the trainable model]
(Tramer et al, 2017)
Adversarial competition
Adversarial competition
Three tracks / sub-competitions:
● Adversarial attacks - algorithms which try to confuse the classifier
○ Input: image
○ Output: adversarial image
● Adversarial targeted attacks - algorithms which try to confuse classifiers in a very specific way
○ Input: image and target class
○ Output: adversarial image
● Adversarial defenses - classifiers which are robust to adversarial examples
○ Input: image
○ Output: classification label
Attacks are not aware of defenses, so this is a simulation of the black-box scenario. (Kurakin, Goodfellow, Bengio, 2017)
Adversarial competition
[Diagram: dataset → attacks and targeted attacks (given target classes) → adversarial images → defenses → classification results; scores are computed by comparing predictions with true labels and target classes]
Adversarial competition - best defenses
● Top defenses are showing >90% accuracy on adversarial examples
● Most of the top defenses use "ensemble adversarial training" as part of the model.
[Diagram: the ensemble adversarial training pipeline: minibatch from dataset → adversarial examples from fixed models 1..N → training minibatch → compute gradients, update weights of the trainable model]
Summary and conclusion
Summary
● Adversarial examples could be used to fool machine learning models
● Adversarial attacks:
○ White box VS black box
○ Digital VS physical
● Defenses against adversarial examples:
○ A lot of defenses work only if the attacker does not know about them
○ One of the potentially universal defenses - adversarial training.
However, adversarial training does not work in all cases
● Adversarial competition
○ Showed that ensemble adversarial training could be a pretty good defense against black-box attacks
Q & A
