Li Chen & Ravi Sahita
In this talk, we compare the resiliency and trustworthiness of compositions of deep learning (DL) and classical machine learning (ML) algorithms for security, via a case study evaluating the resiliency of ransomware detection against the generative adversarial network (GAN). We propose using a GAN to automatically produce dynamic features that exhibit generalized malicious behaviors and reduce the efficacy of black-box ransomware classifiers. We examine the quality of the GAN-generated samples by measuring their statistical similarity to real ransomware and benign software. Further, we investigate the latent subspace in which the GAN-generated samples lie and explore why such samples cause a certain class of ransomware classifiers to degrade in performance. The automatically generated adversarial samples can then be fed back into the training set to reduce the detectors' blind spots.
There has been a surge of interest in using machine learning (ML), particularly deep learning (DL), to automatically detect malware through its dynamic behaviors. These approaches have achieved significantly higher detection rates and lower false-positive rates at large scale compared with traditional malware analysis methods. ML-based threat detection has proven to be a good cop guarding platform security. However, it is imperative to ask: is ML-powered security resilient enough?
To generate reliable traces of system activity, we can utilize CPU-based telemetry such as Intel Processor Trace, which can be extracted via a hypervisor without guest instrumentation. We argue that file I/O events extracted from Intel Processor Trace, together with algorithmic improvements, can provide a stronger defense for ML-based models deployed in the wild against ransomware attacks. Our results and discoveries pose relevant questions for defenders, such as how ML models can be made more resilient for robust enforcement of security objectives.
BlueHat Seattle 2019 || The good, the bad & the ugly of ML based approaches for ransomware detection
2. PRESENTERS: LI CHEN, RAVI SAHITA
CONTRIBUTORS: LI CHEN, RAVI SAHITA, CHIH-YUAN YANG, ANINDYA PAUL
Machine learning based
ransomware detection:
the good, the bad, and the ugly
4. OUTLINE OF THE TALK
• Ransomware detection case study
• The Good:
• Machine Learning (ML) is effective
• The Bad:
• ML can launch adversarial attacks on ML models
• The Ugly:
• ML Model durability
• Improving detection via complementary platform capabilities
5. What is Ransomware?
• Ransomware is a category of malware that hijacks a victim's data or
machine and demands payment
• Categories:
• Locker-ransomware: hijack resources without encryption
• Crypto-ransomware: deny access using encryption
• The damage done by crypto-ransomware is irreversible in most cases due
to the use of cryptography
7. Ransomware Data Description
• Downloaded ~22k ransomware samples
in total, using Microsoft's and Kaspersky's
labels from VirusTotal
• ~ 5min execution for each sample
• Decoy files to identify activated crypto-ransomware:
identified ~4.4k active samples
[Chart: distribution across ransomware families]
8. DATA ACQUISITION VIA Sandbox
System
• Bare-metal system built on
Windows*-based system
• Refresh system by checkpointing
SSD writes and restoring SSD
partition image
• Anti-evasion mechanisms
• Simulated human activities
• Opened applications
• Limited heuristics
[Diagram: sandbox architecture with control server, data storage, robots, router, programmable power control, and internet connection]
9. Behavior Data BASED ON I/O Events
• Collected: timestamp, I/O event type, target filename, entropy
• Based on C# .Net framework FileSystemWatcher
• Entropy of target files calculated by normalized Shannon entropy
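As a concrete illustration of the normalized Shannon entropy mentioned above, here is a minimal sketch over raw file bytes (the function name is ours, not from the talk). Raw byte entropy is at most log2(256) = 8 bits, so dividing by 8 normalizes it to [0, 1]; encrypted or compressed content scores close to 1, which is what makes entropy useful for spotting crypto-ransomware writes.

```python
import math
from collections import Counter

def normalized_entropy(data: bytes) -> float:
    """Normalized Shannon entropy of a byte string, in [0, 1].

    Shannon entropy over byte values is at most log2(256) = 8 bits,
    so dividing by 8 maps the result into [0, 1]. Encrypted or
    compressed file contents score close to 1.
    """
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / 8.0
```

A file whose post-write entropy jumps near 1 is a candidate encryption victim.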
11. Feature extraction

Event                                        Feature encoding
Padding                                      0
File deleted                                 1
File content changed, entropy in [0.9, 1]    2
File content changed, entropy in [0.2, 0.4)  3
File content changed, entropy in [0, 0.2)    4
File created                                 5
File content changed, entropy in [0.8, 0.9)  6
File renamed                                 7
File content changed, entropy in [0.4, 0.6)  8
File content changed, entropy in [0.6, 0.8)  9

• Each execution log is represented by a sequence of events.
• We set the length = 3000 for each sample.
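The encoding table and fixed length of 3000 can be sketched as follows (function and variable names are our own; the talk does not publish its implementation). Event sequences are truncated or zero-padded to the fixed length, with 0 reserved as the padding code.

```python
from typing import List, Optional, Tuple

# (low, high, code) buckets for "file content changed" events; the
# upper bound of the top bucket is nudged past 1 so entropy == 1.0
# falls into code 2, matching the table's closed interval [0.9, 1].
ENTROPY_BUCKETS = [
    (0.9, 1.01, 2),
    (0.2, 0.4, 3),
    (0.0, 0.2, 4),
    (0.8, 0.9, 6),
    (0.4, 0.6, 8),
    (0.6, 0.8, 9),
]
EVENT_CODES = {"deleted": 1, "created": 5, "renamed": 7}
SEQ_LEN = 3000  # fixed sequence length per sample; 0 is padding

def encode_event(event: str, entropy: Optional[float] = None) -> int:
    """Map one I/O event (plus entropy, for content changes) to its code."""
    if event == "changed":
        for low, high, code in ENTROPY_BUCKETS:
            if low <= entropy < high:
                return code
    return EVENT_CODES[event]

def encode_log(events: List[Tuple[str, Optional[float]]]) -> List[int]:
    """Encode an execution log as a fixed-length integer sequence."""
    seq = [encode_event(e, h) for e, h in events[:SEQ_LEN]]
    return seq + [0] * (SEQ_LEN - len(seq))  # zero-pad to SEQ_LEN
```

Each sample then becomes one row of the n x 3000 matrix described on the next slide.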
12. ML model results for ransomware
detection
❖Train-Test ratio: 0.8:0.2
❖Training samples: 1292 benign, 3736 malicious
❖Test samples: 324 benign, 934 malicious
❖Dimensionality: n x 3000
❖7 ML models
We select Text-CNN as the feature
extractor due to its superior
performance compared with the
other classifiers.
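For intuition, here is a minimal NumPy sketch of a Text-CNN-style feature extractor (in the spirit of Kim's Text-CNN) over the encoded event sequences: embed each event code, convolve with several filter widths, apply ReLU, then max-over-time pool and concatenate. The weights below are random stand-ins; in the talk they are learned end-to-end, and the pooled vector is what later slides call the Text-CNN feature subspace.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, EMBED, SEQ_LEN = 10, 16, 3000        # 10 event codes (0-9)
FILTER_WIDTHS, N_FILTERS = (3, 4, 5), 32

# Random stand-ins for learned parameters.
embedding = rng.normal(size=(VOCAB, EMBED))
filters = {w: rng.normal(size=(N_FILTERS, w * EMBED)) for w in FILTER_WIDTHS}

def textcnn_features(seq):
    """Map a length-SEQ_LEN list of event codes to a feature vector."""
    x = embedding[np.asarray(seq)]                      # (SEQ_LEN, EMBED)
    feats = []
    for w, f in filters.items():
        # All width-w windows, flattened: (SEQ_LEN - w + 1, w * EMBED)
        windows = np.stack([x[i:i + w].ravel()
                            for i in range(SEQ_LEN - w + 1)])
        conv = np.maximum(windows @ f.T, 0.0)           # ReLU activation
        feats.append(conv.max(axis=0))                  # max-over-time pool
    return np.concatenate(feats)                        # (3 * N_FILTERS,)
```

Downstream classifiers are then trained on these pooled features rather than on the raw 3000-event sequence.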
14. Features are well-separated in Text-CNN subspace
Class-conditional density plot for each dimension in Text-CNN feature space.
15. Classifiers greatly improve in the Text-CNN feature subspace
Classifiers improve up to 55% in accuracy in Text-CNN space.
16. The Good - summary
Machine learning is highly effective for malware detection.
When ML classifiers are used in security-critical applications,
are accuracy, FPR, precision, recall, F1 scores enough?
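For reference, the metrics the slide names can be computed from a binary confusion matrix as in this minimal sketch (our own helper, with 1 = malicious, 0 = benign). Note that none of these measure robustness to adversarial inputs, which is the point of the question above.

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, FPR, precision, recall, and F1 from binary labels
    (1 = malicious, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if fp + tn else 0.0          # false-positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "fpr": fpr, "precision": precision,
            "recall": recall, "f1": f1}
```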
18. Adversarial Machine Learning in Vision
• Image classification: the DNN labels the original image "Speed Limit Sign"; adding the (amplified) adversarial perturbation changes the output to "Ruler"
• Object detection: detection on the original image vs. detection on the original image with adversarial perturbation
https://arxiv.org/abs/1412.6572
https://arxiv.org/abs/1703.08603
19. Generative Adversarial Network
(GAN)
Generator
Discriminator
G generates fakes to fool D
D differentiates fakes and reals.
Over time, G and D get better.
Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
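The game described above is formalized in Goodfellow et al. as the minimax objective over the generator G and discriminator D:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to assign high scores to real samples and low scores to fakes, while G is trained to make D(G(z)) large; at equilibrium the generated distribution matches the data distribution.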
20. Core Idea: GAN to synthesize
ransomware logs
Our threat model assumes the adversary has access to the training dataset but has no
knowledge of the ML classifier.
21. Adversarial quality assessment
A successful evasion means the generated malicious samples not
only fool the ransomware classifier but also preserve their
maliciousness according to certain metrics.
We propose sample-based and batch-based adversarial quality
metrics for this evaluation.
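The talk does not publish its exact quality metrics, but one plausible batch-based check is statistical similarity between the event-code distribution of a generated batch and that of real ransomware, e.g. via Jensen-Shannon divergence. The sketch below is illustrative only; the helper names and the choice of JSD are our assumptions.

```python
import math
from collections import Counter

def event_distribution(batch, vocab=10):
    """Empirical distribution of event codes over a batch of sequences
    (padding code 0 excluded so sequence length doesn't dominate)."""
    counts = Counter(c for seq in batch for c in seq if c != 0)
    total = sum(counts.values())
    return [counts.get(c, 0) / total for c in range(vocab)]

def js_divergence(p, q):
    """Jensen-Shannon divergence, base 2, bounded in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A low divergence between the generated batch and real ransomware (and a high divergence from benign software) suggests the generated samples still look statistically malicious.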
23. GAN to bypass ransomware detection
• We use the same training data to train AC-GAN
• The stopping criterion is based on the loss of the
discriminator
• At test time, we generate 5000 malicious segments and
ensure their adversarial quality
24. Detection results on good quality
adversarial examples
Indicates a broad attack surface for ML
25. THE BAD - SUMMARY
Machine learning can automatically attack other highly effective
ML systems: a generative adversarial network can serve as an
intelligent attacker that bypasses effective detectors.
Robustness and resiliency are as important as accuracy,
FPR, precision, recall, and F1 scores.
Why does this happen?
27. Investigation
We investigate why the
generated samples can
bypass ML detection
The generated samples, in
dark red, lie close to a
linear boundary but much
closer to the real benign
samples in the Text-CNN
latent feature subspace
28. Non-linear decision boundaries are more robust
• SVM with a radial basis kernel in
Text-CNN space was able
to detect all the
adversarial examples
• The non-linear decision
boundary retains
robustness and indicates
a smaller blind spot
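One intuition for the RBF result above: the RBF (Gaussian) kernel scores similarity locally, decaying with squared distance, so the induced boundary can curve tightly around the benign cluster instead of being a single hyperplane that adversarial points can slip across. A minimal sketch of the kernel itself (not the talk's trained model):

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2).

    Similarity decays with squared Euclidean distance, so a kernel
    classifier built on it responds to local neighborhoods; points
    sitting just past the benign cluster still score as dissimilar.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

Larger gamma tightens the neighborhoods, shrinking the region an adversarial sample can hide in (at the cost of potential overfitting).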
30. The Ugly - summary:
Investigation of ML boundaries indicates adversarial
samples lie close to the benign samples in feature
subspace.
Nonlinear decision boundaries show better resiliency
against adversarial examples.
32. Intel Labs
From analytical to real-world samples:
how can the output of the GAN be incorporated into a tool
that runs as actual ransomware?
33. PLATFORM capabilities to make ML system
more trustworthy
• Can we use ML + system capabilities to make the attackers’ job harder?
• Intel® Processor Trace and other telemetry can be used to make the system call
activity information more trustworthy
• Checkpointing technologies are a useful tool for recovery
• Trusted Execution capabilities can prevent model stealing attacks
• New storage mechanisms (such as persistent memory) provide new avenues for
access-control
35. ML VULNERABILITY RESEARCH PLATFORM
- MLsploit
• A Cloud-Based Framework for Adversarial
Machine Learning Research
• Tool for interactive investigation of ML
vulnerabilities
• Interactive interface and iterative
experimentation
• Comparison of attacks and defenses
36. SUMMARY
• ML can be used to build efficient, scalable, and accurate detectors for malicious
attacks such as ransomware
• ML can also be used to attack vulnerable ML systems
• ML models must account for adversarial approaches, concept drift, and time
variation
• Combining platform capabilities for attack-surface reduction (prevention) and
recovery with ML detection yields more robust solutions
40. The intersection of AI & Security
• Security Analytics
• Secure AI Workloads
• Adversarial Resilient AI ← today's focus
41. Case Study
• Collect real ransomware and benign software
• Examine ML effectiveness for ransomware detection
• Explore ML robustness when ML generates adversarial ransomware
samples
• Investigate ML blind spot and boundaries
43. Beyond vision: audio or malware
Attack in ASR domain on audio waveforms to fool
DeepSpeech (speech-to-text transcription)
AVPASS: adversarial malware variants that can
evade VirusTotal detection
https://arxiv.org/abs/1801.01944
https://www.blackhat.com/us-17/briefings/schedule/#avpass-leaking-and-bypassing-antivirus-detection-model-automatically-7354
Bypass VirusTotal
up to 100%
44. Training GAN
Challenges:
❖Convergence issues
❖Transfer learning
❖Learning rates adapted separately for the generator and discriminator
• We use the same training data to train the AC-GAN
• The stopping criterion is based on the loss of the discriminator
• At test time, we generate 5000 malicious segments