REDNAGA
ANDROID MALWARE
AND MACHINE LEARNING
CALEB FENTON
08.24.2017
Dead Drop SF
WHO AM I
• Researcher @ SentinelOne
• Previously @ Lookout and @ SourceClear
• Enjoy reading, cryptocurrency, economics
• Made Simplify and other Android tools
• @caleb_fenton
• github.com/CalebFenton
CALEB
WHO ARE WE
• rednaga.io
• Banded together by the love of 0days and hot sauces
• Collaborate and try to improve the community
• Disclosures / Code / Lessons on GitHub
• @RedNagaSec
• github.com/RedNaga
RED NAGA
TALK OVERVIEW
1. Machine learning overview
2. Using apkfile for feature extraction
3. Useful features for Android malware
4. Tips for building good models
MACHINE LEARNING
OVERVIEW
STEP 1: UNDERSTAND THE FORMAT
• Android apps come as APK files
• APKs are just ZIPs
• APKs contain many kinds of content:
  • Android manifest (binary XML)
  • Dalvik executables (DEX)
  • Signing certificates
  • Other resources (icons, maps, sounds, …)
• Offensive & Defensive Android Reverse Engineering: github.com/rednaga/training/tree/master/DEFCON23
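Because an APK is a ZIP archive, the standard java.util.zip API is enough to see which of these parts an app contains. A minimal sketch, assuming the conventional entry names (AndroidManifest.xml at the root, classes*.dex, v1 JAR signature files under META-INF/); obfuscated apps, or apps signed only with APK Signature Scheme v2+, may not follow them. ApkLayout is a hypothetical helper, not part of apkfile:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: groups APK (ZIP) entries into the categories above. */
class ApkLayout {
    // Classify one entry name using common path conventions (an assumption;
    // tampered or unusual APKs can deviate from them).
    static String categorize(String name) {
        if (name.equals("AndroidManifest.xml")) return "manifest";
        if (name.matches("classes\\d*\\.dex")) return "dex";
        if (name.startsWith("META-INF/")
                && (name.endsWith(".RSA") || name.endsWith(".DSA") || name.endsWith(".EC"))) {
            return "certificate";  // v1 (JAR) signature block files
        }
        return "other";            // resources, native libs, assets, ...
    }

    // Open the APK as a plain ZIP and bucket every entry name by category.
    static Map<String, List<String>> layout(String apkPath) throws IOException {
        Map<String, List<String>> buckets = new TreeMap<>();
        try (ZipFile zip = new ZipFile(apkPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                buckets.computeIfAbsent(categorize(name), k -> new ArrayList<>()).add(name);
            }
        }
        return buckets;
    }
}
```

Calling ApkLayout.layout("someapp.apk") on a typical app would return buckets such as "dex" and "manifest"; a missing "certificate" bucket is itself a possible signal worth investigating.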
STEP 2: COLLECT SAMPLES
• Need many good (benign) and bad (malicious) samples
• Diversity of good and bad is important
• Sample sources:
  • VirusTotal, VirusShare, market crawlers, other researchers, friends
STEP 3: ENGINEER FEATURES
How do humans do it?
STEP 3: ENGINEER FEATURES
features = [has_beard]
STEP 3: ENGINEER FEATURES
Example 1
App label: MX Player Pro
Package: com.mxtech.videoplayer.pro
CN=Kim Jae hyun, O=MX Technologies, L=Seoul, ST=South Korea, C=KR

Example 2
App label: Google Service Updater
Package: it.googleandroid.updater
CN=GService inc, OU=G Service inc, O=G, L=New York, ST=New York, C=US
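Example 2 uses "google" in its label and package, yet its certificate organization is just "G". A sketch of turning a certificate subject string into features, using the JDK's javax.naming.ldap.LdapName parser. CertFeatures and the claimsGoogleWithoutGoogleCert rule are hypothetical illustrations, not rules from the talk:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

/** Hypothetical certificate-subject features. */
class CertFeatures {
    // Parse an X.500 distinguished name into type -> value.
    // If a type repeats (e.g. several OU components), the last one wins here.
    static Map<String, String> parseDn(String dn) throws InvalidNameException {
        Map<String, String> out = new HashMap<>();
        for (Rdn rdn : new LdapName(dn).getRdns()) {
            out.put(rdn.getType().toUpperCase(Locale.ROOT), String.valueOf(rdn.getValue()));
        }
        return out;
    }

    // Hypothetical feature: the package or label mentions "google", but the
    // signing certificate's organization does not. Google's own apps would
    // normally be signed with a Google certificate, so the mismatch is suspicious.
    static boolean claimsGoogleWithoutGoogleCert(String packageName, String appLabel,
                                                 Map<String, String> subject) {
        boolean claimsGoogle = (packageName + " " + appLabel).toLowerCase(Locale.ROOT).contains("google");
        String org = subject.getOrDefault("O", "").toLowerCase(Locale.ROOT);
        return claimsGoogle && !org.contains("google");
    }
}
```

The parsed fields (C, O, CN, …) can also be used directly, e.g. as one-hot country features.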
STEP 3: ENGINEER FEATURES
• Certificate details - common name, country, …
• Suspicious strings - “pm uninstall”, “google”
• Permissions - which ones and how many
• API calls - send SMS, load DEX file
• Overall app quality - default icons, typos
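One way to combine these categories is a flat feature vector with one named entry per feature. A minimal sketch, assuming the certificate country, strings, permissions, API calls and an icon check were already extracted (e.g. with apkfile). The class name, the watched lists and the feature names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical feature-vector builder covering the categories above. */
class FeatureVector {
    // Illustrative lists only; a real model would use many more entries.
    static final List<String> SUSPICIOUS_STRINGS = List.of("pm uninstall", "google");
    static final List<String> WATCHED_APIS = List.of(
            "Landroid/telephony/SmsManager;->sendTextMessage",  // send SMS
            "Ldalvik/system/DexClassLoader;-><init>");          // load DEX file

    static Map<String, Double> build(String certCountry, Set<String> strings,
                                     Set<String> permissions, Set<String> apiCalls,
                                     boolean usesDefaultIcon) {
        Map<String, Double> f = new LinkedHashMap<>();
        // Certificate details: one-hot encode the country
        f.put("cert_country=" + certCountry.toUpperCase(Locale.ROOT), 1.0);
        // Suspicious strings: 1.0 if any string constant contains the pattern
        for (String s : SUSPICIOUS_STRINGS) {
            boolean found = strings.stream().anyMatch(x -> x.toLowerCase(Locale.ROOT).contains(s));
            f.put("string:" + s, found ? 1.0 : 0.0);
        }
        // Permissions: which ones and how many
        for (String p : permissions) f.put("perm:" + p, 1.0);
        f.put("perm_count", (double) permissions.size());
        // API calls: match on the method reference prefix (parameters ignored)
        for (String api : WATCHED_APIS) {
            boolean called = apiCalls.stream().anyMatch(c -> c.startsWith(api));
            f.put("api:" + api, called ? 1.0 : 0.0);
        }
        // Overall app quality: e.g. still using a default icon
        f.put("default_icon", usesDefaultIcon ? 1.0 : 0.0);
        return f;
    }
}
```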
STEP 4: BUILD AND TUNE MODELS
• Collect and prepare data
• Drop low value features
• Try many algorithms
• Train and blend multiple models
REVIEW
1. Understand the format
2. Collect samples
3. Engineer features (apkfile!)
4. Build and tune model
USING APKFILE
WHAT IS APKFILE?
• APK feature extraction library (Java)
• github.com/CalebFenton/apkfile
• Parses DEX files (dexlib2)
• Parses APK certificates
• Parses Android manifest (based on ArscBlamer)
• Hardened against obfuscation
• Everything is an object for easy inspection
EXAMPLE: ANDROID MANIFEST
ApkFile apkFile = new ApkFile("someapp.apk");
AndroidManifest androidManifest = apkFile.getAndroidManifest();

// Get some manifest properties
String packageName = androidManifest.getPackageName();
String appLabel = androidManifest.getApplication().getLabel();

// Print permission names
for (Permission permission : androidManifest.getPermissions()) {
    System.out.println("permission: " + permission.getName());
}

// Print exported services
for (Service service : androidManifest.getApplication().getServices()) {
    if (service.isExported()) {
        System.out.println("exported: " + service.getName());
    }
}
EXAMPLE: APK CERTIFICATE
ApkFile apkFile = new ApkFile("example-malware.apk");
Certificate certificate = apkFile.getCertificate();
Collection<Certificate.SubjectAndIssuerRdns> allRdns = certificate.getAllRdns();

// An APK may be signed by multiple certificates
for (Certificate.SubjectAndIssuerRdns rdns : allRdns) {
    Map<String, String> subjectRdns = rdns.getSubjectRdns();
    // Get the certificate subject's CN and O properties
    System.out.println("Subject common name: " + subjectRdns.get("CN"));
    System.out.println("Subject organization: " + subjectRdns.get("O"));
    // Print all of the issuer's RDNs
    System.out.println("Issuer RDNs: " + rdns.getIssuerRdns());
}
EXAMPLE: DALVIK EXECUTABLES
// apkFile is an ApkFile, as in the previous examples
Map<String, DexFile> pathToDexFile = apkFile.getDexFiles();
for (Map.Entry<String, DexFile> e : pathToDexFile.entrySet()) {
    String path = e.getKey();
    DexFile dexFile = e.getValue();
    System.out.println("Analyzing " + path);
    dexFile.analyze();

    // Average cyclomatic complexity (also available for each method)
    System.out.println("Cyclomatic complexity: " + dexFile.getCyclomaticComplexity());

    // Get API call counts over all methods
    // Trove primitive maps avoid boxing, so incrementing counts is faster
    TObjectIntIterator<MethodReference> iterator = dexFile.getApiCounts().iterator();
    while (iterator.hasNext()) {
        iterator.advance();
        MethodReference methodRef = iterator.key();
        int count = iterator.value();
        // E.g. Ljava/lang/StringBuilder;->toString called 18 times
        System.out.println(methodRef + " called " + count + " times");
    }

    // Print opcode histograms for each method
    for (Map.Entry<String, DexMethod> me : dexFile.getMethodDescriptorToMethod().entrySet()) {
        String methodDescriptor = me.getKey();
        // E.g. Lit/googleandroid/updater/a;->a(Ljava/lang/String;)Ljava/lang/String; op counts
        System.out.println(methodDescriptor + " op counts");
        DexMethod dexMethod = me.getValue();
        TObjectIntIterator<Opcode> opIter = dexMethod.getOpCounts().iterator();
        while (opIter.hasNext()) {
            opIter.advance();
            // E.g. MOVE_RESULT_OBJECT: 46
            System.out.println(" " + opIter.key() + ": " + opIter.value());
        }
    }
}
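Raw opcode counts grow with method size, which makes a tiny method hard to compare with a huge one. A minimal sketch that converts a histogram into relative frequencies, assuming the counts were first copied into a plain java.util.Map keyed by opcode name (apkfile itself returns Trove maps keyed by Opcode). OpFrequencies is a hypothetical helper:

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: opcode counts -> relative frequencies. */
class OpFrequencies {
    static Map<String, Double> normalize(Map<String, Integer> opCounts) {
        long total = 0;
        for (int c : opCounts.values()) total += c;
        Map<String, Double> freqs = new TreeMap<>();
        if (total == 0) return freqs;  // e.g. abstract or native methods have no opcodes
        for (Map.Entry<String, Integer> e : opCounts.entrySet()) {
            freqs.put(e.getKey(), e.getValue() / (double) total);
        }
        return freqs;
    }
}
```

The same normalization can be applied to the per-DEX totals to get one frequency vector per app.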
USEFUL FEATURES
ANDROID MANIFEST
• Has main launcher activity
  • No launcher implies no user interaction
• Number of activity package paths
  • Many paths may mean malicious activities were injected
• Permissions / number of permissions
  • Good clue to what the app may do
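These manifest features reduce to a few numbers per app. A sketch, assuming activity names and permissions have already been read from the manifest (for example with the apkfile calls shown earlier) and that activity names are fully qualified (manifest entries such as ".MainActivity" are relative to the package and would need expanding first). ManifestFeatures is hypothetical:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical manifest features: launcher, activity package paths, permissions. */
class ManifestFeatures {
    // "com.example.app.MainActivity" -> "com.example.app"
    static String packageOf(String className) {
        int dot = className.lastIndexOf('.');
        return dot < 0 ? "" : className.substring(0, dot);
    }

    // Distinct package paths among activities; an app with injected
    // activities may add an extra, unrelated package.
    static int activityPackagePaths(List<String> activityNames) {
        Set<String> paths = new HashSet<>();
        for (String name : activityNames) paths.add(packageOf(name));
        return paths.size();
    }

    // [has main launcher, number of activity package paths, number of permissions]
    static double[] vector(boolean hasMainLauncher, List<String> activityNames,
                           List<String> permissions) {
        return new double[] {
            hasMainLauncher ? 1.0 : 0.0,
            activityPackagePaths(activityNames),
            permissions.size()
        };
    }
}
```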
APKID FEATURES
• “PEiD for Android” - detects compilers, packers, …
• Compiler - dx (standard Android SDK) / dexlib (suggests a modified or repackaged app)
• Anti-VM strings - used to detect emulators and avoid analysis
  • Build.MANUFACTURER, SIM operator, device ID, subscriber ID
• Detecting Pirated and Malicious Android Apps with APKiD: rednaga.io/2016/07/31/detecting_pirated_and_malicious_android_apps_with_apkid/
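Anti-VM checks often leave string constants behind, so a crude version of this feature is a substring scan. A minimal sketch; the indicator list is an illustrative assumption based on well-known stock emulator values, not APKiD's actual rule set, and AntiVmStrings is hypothetical:

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical anti-VM string scan. */
class AntiVmStrings {
    // Illustrative indicators only (assumption, not APKiD's rules).
    static final List<String> INDICATORS = List.of(
            "goldfish",         // emulator hardware name
            "generic",          // common in emulator Build.* values
            "000000000000000",  // emulator device ID
            "15555215554");     // default emulator phone number

    // Which indicators appear in the app's string constants.
    static Set<String> matches(List<String> appStrings) {
        Set<String> hits = new TreeSet<>();
        for (String s : appStrings) {
            String lower = s.toLowerCase(Locale.ROOT);
            for (String ind : INDICATORS) {
                if (lower.contains(ind)) hits.add(ind);
            }
        }
        return hits;
    }
}
```

The number of hits (or one binary feature per indicator) can then go into the feature vector.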
STRINGS
• Number of gibberish strings
  • Find weird certificate details
  • Find unusual obfuscation
• Using Markov Chains for Android Malware Detection: calebfenton.github.io/2017/08/23/using-markov-chains-for-android-malware-detection/
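A character-level Markov chain can score how word-like a string is: transitions common in normal text get high probability, while random-looking identifiers get low probability. A minimal bigram sketch with add-one smoothing, assuming lowercase a-z plus space as the alphabet; it illustrates the idea and is not the implementation from the linked post:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical bigram Markov-chain gibberish scorer. */
class GibberishScorer {
    private static final int ALPHABET = 27;  // 'a'-'z' plus space
    private final Map<String, Integer> bigrams = new HashMap<>();
    private final Map<Character, Integer> unigrams = new HashMap<>();

    // Lowercase and map anything outside a-z to a space.
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            sb.append(c >= 'a' && c <= 'z' ? c : ' ');
        }
        return sb.toString();
    }

    // Count character transitions in "normal" training text.
    void train(String text) {
        String t = normalize(text);
        for (int i = 0; i + 1 < t.length(); i++) {
            unigrams.merge(t.charAt(i), 1, Integer::sum);
            bigrams.merge(t.substring(i, i + 2), 1, Integer::sum);
        }
    }

    // Average log-probability per transition (add-one smoothing).
    // Lower (more negative) scores suggest gibberish.
    double score(String s) {
        String t = normalize(s);
        if (t.length() < 2) return 0.0;
        double logProb = 0.0;
        for (int i = 0; i + 1 < t.length(); i++) {
            int pair = bigrams.getOrDefault(t.substring(i, i + 2), 0);
            int from = unigrams.getOrDefault(t.charAt(i), 0);
            logProb += Math.log((pair + 1.0) / (from + ALPHABET));
        }
        return logProb / (t.length() - 1);
    }
}
```

A per-app feature could then be the number of string constants whose score falls below a threshold chosen on benign apps.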
TIPS FOR BUILDING
GOOD MODELS
TIPS
• Most guides are for toy data sets
• No one talks about large data set problems
• Everyone assumes you have a dense matrix
• Examples assume scikit-learn (sklearn), but the advice applies to other libraries
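With many API-call and string features, most entries of the feature matrix are 0, so a dense matrix wastes memory; scikit-learn accepts SciPy sparse matrices (such as CSR) for many estimators. To show what compressed sparse row (CSR) storage holds, here is a minimal Java sketch; the CsrMatrix class is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical CSR matrix: stores only non-zero entries. */
class CsrMatrix {
    final double[] values;
    final int[] columnIndices;
    final int[] rowPointers;  // row i occupies [rowPointers[i], rowPointers[i + 1])

    // Each input row maps column index -> value (one row per sample).
    CsrMatrix(List<Map<Integer, Double>> rows) {
        List<Double> vals = new ArrayList<>();
        List<Integer> cols = new ArrayList<>();
        rowPointers = new int[rows.size() + 1];
        for (int r = 0; r < rows.size(); r++) {
            // TreeMap sorts the columns within each row
            for (Map.Entry<Integer, Double> e : new TreeMap<>(rows.get(r)).entrySet()) {
                if (e.getValue() != 0.0) {
                    cols.add(e.getKey());
                    vals.add(e.getValue());
                }
            }
            rowPointers[r + 1] = vals.size();
        }
        values = vals.stream().mapToDouble(Double::doubleValue).toArray();
        columnIndices = cols.stream().mapToInt(Integer::intValue).toArray();
    }

    double get(int row, int col) {
        for (int k = rowPointers[row]; k < rowPointers[row + 1]; k++) {
            if (columnIndices[k] == col) return values[k];
        }
        return 0.0;  // anything not stored is zero
    }
}
```

Memory grows with the number of non-zero entries rather than rows × columns.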
PREPARING DATA
• Normalization is important
  • Scale with MaxAbs or MinMax if many 0s (mean-centering would destroy sparsity)
  • Needed for some algorithms (not decision trees)
  • Needed before dropping low-variance features, since variance depends on scale
• Drop invariant (or nearly invariant) features
  • Reduces chance of overfitting
  • Examples: file hash, app label, rare API calls
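A minimal sketch of the two steps above on a small dense matrix: MaxAbs scaling (each column divided by its largest absolute value, as scikit-learn's MaxAbsScaler does), then dropping columns whose variance does not exceed a threshold, in the spirit of VarianceThreshold. The Preprocess class is hypothetical and ignores sparse storage for brevity:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical preprocessing: MaxAbs scaling + low-variance filter. */
class Preprocess {
    // Values land in [-1, 1]; zeros stay zero, so sparsity is preserved.
    static double[][] maxAbsScale(double[][] x) {
        int n = x.length, d = x[0].length;
        double[][] out = new double[n][d];
        for (int j = 0; j < d; j++) {
            double maxAbs = 0.0;
            for (double[] row : x) maxAbs = Math.max(maxAbs, Math.abs(row[j]));
            for (int i = 0; i < n; i++) out[i][j] = maxAbs == 0.0 ? 0.0 : x[i][j] / maxAbs;
        }
        return out;
    }

    // Indices of columns with variance above the threshold. Run after
    // scaling, so one threshold is meaningful for every column.
    static List<Integer> keepColumns(double[][] scaled, double threshold) {
        List<Integer> keep = new ArrayList<>();
        int n = scaled.length;
        for (int j = 0; j < scaled[0].length; j++) {
            double sum = 0.0;
            for (double[] row : scaled) sum += row[j];
            double mean = sum / n;
            double sq = 0.0;
            for (double[] row : scaled) sq += (row[j] - mean) * (row[j] - mean);
            if (sq / n > threshold) keep.add(j);
        }
        return keep;
    }
}
```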
SELECTING FEATURES
• Score features and plot scores to build intuition
  • Usually a long tail of useless features
  • Gives ideas for new features
  • Top 100 features almost as good as top 1000
• Run experiments with subsets of features
  • Improves speed
  • Only interested in relative differences
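Scoring features can be as simple as comparing how often each one appears in malware versus benign apps. A minimal sketch for binary features; the absolute difference in frequencies is a deliberately simple stand-in for univariate tests such as scikit-learn's chi2, and FeatureScores is hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical univariate scoring for binary features. */
class FeatureScores {
    // Score feature j as |P(x_j = 1 | malware) - P(x_j = 1 | benign)|.
    static double[] score(int[][] x, boolean[] isMalware) {
        int d = x[0].length;
        double[] mal = new double[d], ben = new double[d];
        int nMal = 0, nBen = 0;
        for (int i = 0; i < x.length; i++) {
            if (isMalware[i]) nMal++; else nBen++;
            for (int j = 0; j < d; j++) {
                if (isMalware[i]) mal[j] += x[i][j]; else ben[j] += x[i][j];
            }
        }
        double[] scores = new double[d];
        for (int j = 0; j < d; j++) {
            scores[j] = Math.abs(mal[j] / Math.max(nMal, 1) - ben[j] / Math.max(nBen, 1));
        }
        return scores;
    }

    // Indices of the k highest-scoring features, best first.
    static List<Integer> topK(double[] scores, int k) {
        List<Integer> idx = new ArrayList<>();
        for (int j = 0; j < scores.length; j++) idx.add(j);
        idx.sort(Comparator.comparingDouble((Integer j) -> scores[j]).reversed());
        return new ArrayList<>(idx.subList(0, Math.min(k, idx.size())));
    }
}
```

Sorting and plotting the scores shows the long tail, and topK gives the feature subsets for quicker experiments.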
BUILDING MODELS
• Grid search to find best algorithms and parameters
  • Iterate on several smaller searches
• Decision tree ensembles aren’t hip, but work well
  • sentinelone.com/blog/detecting-malware-pre-execution-static-analysis-machine-learning/
• Build and blend multiple models
  • sentinelone.com/blog/measuring-the-usefulness-of-multiple-models/
• Feature Selection and Grid Searching Hyper-parameters
  • gist.github.com/CalebFenton/66aa04af7b4a4d98efca059cb8c2e7aa
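A minimal sketch of both ideas: an exhaustive search over a small grid of two hypothetical hyper-parameters (number of trees, maximum depth), and blending by a weighted average of each model's predicted malware probability. The evaluate callback stands in for training and cross-validating a real model; ModelSearch is hypothetical, and weighted averaging is only one way to blend:

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical grid search and probability blending. */
class ModelSearch {
    // Try every (trees, depth) pair; `evaluate` returns a validation score
    // (higher is better) for a parameter pair.
    static int[] gridSearch(int[] treeCounts, int[] maxDepths, ToDoubleFunction<int[]> evaluate) {
        int[] best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int trees : treeCounts) {
            for (int depth : maxDepths) {
                int[] params = {trees, depth};
                double s = evaluate.applyAsDouble(params);
                if (s > bestScore) {
                    bestScore = s;
                    best = params;
                }
            }
        }
        return best;
    }

    // Weighted average of per-model malware probabilities for one sample.
    // Weights are assumed non-negative with a positive sum.
    static double blend(double[] probabilities, double[] weights) {
        double sum = 0.0, weightSum = 0.0;
        for (int i = 0; i < probabilities.length; i++) {
            sum += probabilities[i] * weights[i];
            weightSum += weights[i];
        }
        return sum / weightSum;
    }
}
```

Running several small grids, each narrowed around the previous best, keeps every search cheap.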
EXTENDED READING
https://github.com/rednaga/training/tree/master/DEFCON23
http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/
https://rednaga.io/
https://calebfenton.github.io/
http://androidcracking.blogspot.com/
08.24.2017
THANKS!
Dead Drop SF
CALEB FENTON
@CALEB_FENTON
QUESTIONS?
