"AI Reliability Against Adversarial Inputs," a Presentation from Intel
1. © 2019 Intel
AI Reliability Against
Adversarial Inputs
Gokcen Cilingir, Li Chen
Intel
May 2019
2. © 2019 Intel
Motivation
Adversarial examples already appear in our daily lives.
Designing AI solutions that are robust against adversarial inputs is important
for:
• [Security] Critical system asset management and defense against
malware
• [Reliability] Ensuring consistent and reliable system behavior
[Figure: two examples. An AI-based malware detector outputs "PASS" on an adversarial input; an automated speech recognition system, given an adversarial input, transcribes "A B C" as "X Y Z" (a faulty prediction).]
8. © 2019 Intel
Adversarial ML concepts
A taxonomy of adversaries against machine learning models at test time
(with evasion as the goal)
Image source: [6]
• Evasion vs. data poisoning attacks
• Threat models
• Substitute model creation and transferability
9. © 2019 Intel
Adversarial example creation
• The very tool that makes ML powerful is being used to break it: optimization
• An adversarial example x* is found by perturbing an originally correctly classified input x,
(approximately) solving the constrained optimization problem below, where t is the target class.
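Following [4], with f the classifier and m the input dimensionality:

```latex
\min_{x^*} \; \| x^* - x \|_2
\quad \text{subject to} \quad f(x^*) = t, \quad x^* \in [0,1]^m
```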
[Figure: an image correctly predicted as "school bus"; after adding noise (shown magnified 10×), the prediction becomes "ostrich".]
Image source: [4]; text adapted: [5]
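The fast gradient sign method (FGSM) is a common one-step approximation to this search. A minimal PyTorch sketch, assuming a trained classifier `model`, a normalized input batch `x`, and labels `y` (all placeholder names, not from the deck):

```python
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=0.03):
    """One-step, untargeted FGSM sketch: nudge x in the direction that
    increases the loss. The names and eps value are illustrative."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Clamp so the perturbed input remains a valid normalized image.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```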
10. © 2019 Intel
How has Machine Learning been Exploited?
• Take binary classification as an example. Through training, one can generally learn only an
approximation of the true boundaries.
• Adversaries exploit the model error between the approximate and the expected
decision boundaries, as illustrated in the figure below.
Image source, text adapted: [5]
12. © 2019 Intel
High-level flow for AI-based solutions
[Diagram: an AI-based application.
CLIENT: Input → AI inference → Prediction & confidence → Action determination → Action.
CLOUD: Training dataset → AI model definition and training → AI model, deployed to the client for inference.]
13. © 2019 Intel
Defense and mitigation against adversarial attacks
[Diagram: the AI-based application flow from the previous slide, with defenses added at each stage.
CLIENT: Input → Pre-processing for perturbation removal (examples: JPEG compression, SHIELD, MP3 audio compression) → AI inference with a strengthened AI model → Prediction & confidence → Input validation / adversary detection (examples: distributional detection, normalization detection, PCA-based detection, secondary classification) → Action determination under a mitigation policy → Action.
CLOUD: Training dataset → Adversary-aware AI model definition and training (examples: adversarial training, defensive distillation, logit pairing, architectural modifications like BNNs), assessed with a robustness metric → Strengthened AI model.]
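As a sketch of the pre-processing branch above, re-encoding inputs through lossy JPEG (in the spirit of SHIELD [7]) can wash out small pixel perturbations before inference. The helper name and quality level are illustrative assumptions:

```python
import io
from PIL import Image

def jpeg_squeeze(image: Image.Image, quality: int = 75) -> Image.Image:
    """Re-encode an image through lossy JPEG to remove small adversarial
    perturbations. The quality setting is an illustrative choice."""
    buf = io.BytesIO()
    image.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)
```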
14. © 2019 Intel
Defenses and their limitations
• Current status: the arms race between attacks and defenses continues. All known
defense techniques come with limitations.
• Adversarial training uses generated adversarial examples as part of training.
• Architectures like Bayesian NNs provide better uncertainty modeling.
• Detection and pre-processing methods are generally domain-specific.
• SHIELD compresses away small pixel manipulations in images; MP3 audio
compression applies the same idea to audio data.
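To make the adversarial-training bullet concrete, here is a minimal sketch that mixes clean and FGSM-perturbed batches during training, reusing the `fgsm_example` helper sketched earlier (the 50/50 loss mix is an illustrative assumption, not the deck's recipe):

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.03):
    """One step of adversarial training: fit both the clean batch and
    its FGSM-perturbed counterpart (the 50/50 mix is illustrative)."""
    model.train()
    x_adv = fgsm_example(model, x, y, eps=eps)  # helper from the earlier sketch
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) \
         + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```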
16. © 2019 Intel
MLsploit: A Cloud-Based Framework for Adversarial
Machine Learning Research
• Research module for adversarial machine learning
• Interactive interface and experimentation
• Comparison of attacks and defenses
• Easy integration
18. © 2019 Intel
Toolkits
• Adversarial Robustness Toolbox (ART): A Python library of adversarial
attacks and defenses (evasion, poisoning) for neural networks. It also
provides robustness metrics.
• Cleverhans: An adversarial example library for constructing attacks,
building defenses, and benchmarking both.
• ALFASVMLib: An open-source Matlab library that implements a set of
heuristic attacks against Support Vector Machines (SVMs).
• Foolbox: A Python toolbox for creating adversarial examples that fool neural
networks in PyTorch, TensorFlow, Keras, MXNet, and others.
• MLsploit: A web platform to demo Machine Learning as a Service for
security research. A portal demonstrating adversarial ML and
countermeasures is provided.
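For instance, crafting FGSM examples with ART takes only a few lines. The sketch below follows ART's 1.x API (keyword names may differ across versions; `model`, `loss_fn`, and `x_test` are assumed placeholders, not from the deck):

```python
import numpy as np
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier

# Wrap a trained PyTorch model; `model`, `loss_fn`, and `x_test` are
# assumed to exist (a torch.nn.Module, its loss, and a NumPy batch).
classifier = PyTorchClassifier(
    model=model,
    loss=loss_fn,
    input_shape=(3, 224, 224),   # illustrative input shape
    nb_classes=1000,             # illustrative class count
    clip_values=(0.0, 1.0),
)

attack = FastGradientMethod(estimator=classifier, eps=0.05)
x_adv = attack.generate(x=x_test)                     # craft adversarial inputs
preds = np.argmax(classifier.predict(x_adv), axis=1)  # inspect new predictions
```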
19. © 2019 Intel
Conclusion
• ML can be vulnerable to adversarial examples.
• Designing AI solutions that are robust against adversarial inputs is
important for security and reliability of critical applications.
• Several free, open-source toolkits exist to assess and strengthen AI
models.
20. © 2019 Intel
Resources
[1] Sharif, Mahmood, et al. "Accessorize to a crime: Real and stealthy attacks on state-of-the-art face
recognition." Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications
Security. ACM, 2016.
[2] Metzen, Jan Hendrik, et al. "Universal adversarial perturbations against semantic image
segmentation." 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017.
[3] Carlini, Nicholas, and David Wagner. "Audio adversarial examples: Targeted attacks on speech-to-
text." 2018 IEEE Security and Privacy Workshops (SPW). IEEE, 2018.
[4] Szegedy, Christian, et al. "Intriguing properties of neural networks." arXiv preprint
arXiv:1312.6199 (2013).
[5] Goodfellow, Ian, Patrick McDaniel, and Nicolas Papernot. "Making machine learning robust against
adversarial inputs." Communications of the ACM 61.7 (2018): 56-66.
[6] Papernot, Nicolas, et al. "The limitations of deep learning in adversarial settings." 2016 IEEE
European Symposium on Security and Privacy (EuroS&P). IEEE, 2016.
[7] Das, Nilaksh, et al. "Shield: Fast, practical defense and vaccination for deep learning using jpeg
compression." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &
Data Mining. ACM, 2018.
[8] Das, Nilaksh, et al. "ADAGIO: Interactive Experimentation with Adversarial Attack and Defense for
Audio." Joint European Conference on Machine Learning and Knowledge Discovery in Databases.
Springer, Cham, 2018.
21. © 2019 Intel
Glossary
• Adversarial example: An input that has been intentionally optimized to cause misclassification
• Threat model: In the context of adversarial ML, the explicit definition of the capabilities
of the adversary
• White-box threat model: An adversary with access to (at minimum) the model architecture
and the parameter values
• Black-box threat model: An adversary with access to either just samples or an oracle
(the system's output is visible)
• Adversarial sample transferability: The property that adversarial samples crafted against
a specific model can also affect another model, even if the two have different architectures
and/or training data
• Evasion attack: The adversary tries to evade the system by adjusting malicious samples
during the testing phase
• Data poisoning attack: The adversary tries to poison the training data by injecting carefully
designed samples to compromise the learning process
22. © 2019 Intel
Legal Disclaimers
▪Intel provides these materials as-is, with no express or implied warranties.
▪Intel products may contain design defects or errors known as errata, which may cause the
product to deviate from published specifications. Current characterized errata are available
on request.
▪Intel technologies’ features and benefits depend on system configuration and may require
enabled hardware, software or service activation. Performance varies depending on system
configuration. No computer system can be absolutely secure. Check with your system
manufacturer or retailer or learn more at http://intel.com.
▪No license (express or implied, by estoppel or otherwise) to any intellectual property rights
is granted by this document.
Copyright © 2019 Intel Corporation. All rights reserved. Intel, the Intel logo and others are
trademarks of Intel Corporation in the U.S. and/or other countries.