AutoBLG by Sun Bo

This is a talk presented at IEEE ISCC 2015.


  1. AutoBLG: Automatic URL Blacklist Generator Using Search Space Expansion and Filters
     Bo Sun (1), Mitsuaki Akiyama (2), Takeshi Yagi (2), Mitsuhiro Hatada (1), Tatsuya Mori (1)
     (1) Waseda University, (2) NTT Secure Platform Laboratories
     IEEE ISCC 2015
  2. Background (1)
     • The estimated number of drive-by-download attacks is 4.3 M per day.
     • Chart: breakdown of web-based attacks: drive-by-download attacks 93%, other attacks 7%.
  3. Background (2)
     • What is a drive-by-download attack?
       Figure: the user clicks on a landing page URL, which leads to an exploit URL that exploits vulnerabilities; malware is then downloaded automatically from a malware download URL.
  4. Background (3)
     • What is a URL blacklist?
       Figure: landing page URLs, exploit URLs, and malware download URLs are stored in a URL blacklist; a URL the user accesses is matched against the blacklist and blocked if it matches.
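The matching step in the figure can be sketched as an exact lookup of normalized URLs in a set. This is a minimal illustration that assumes exact-match blacklisting. The normalization rule and the names `normalize_url` and `is_blocked` are hypothetical and are not part of AutoBLG. The second check in the usage note shows why Background (4) matters: a URL that has never been seen is not blocked.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case scheme and host, default the path to '/', drop the fragment (hypothetical rule)."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, ""))


def is_blocked(url: str, blacklist: set[str]) -> bool:
    """Return True if the normalized URL appears in the blacklist."""
    return normalize_url(url) in blacklist


# Hypothetical blacklist holding a landing page, an exploit, and a malware download URL.
blacklist = {normalize_url(u) for u in [
    "http://landing.example/index.html",
    "http://exploit.example/kit.php?id=1",
    "http://malware.example/payload.exe",
]}
```

For example, `is_blocked("http://EXPLOIT.example/kit.php?id=1#top", blacklist)` is true. `is_blocked("http://new-exploit.example/kit.php", blacklist)` is false.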
  5. Background (4)
     • However, a URL blacklist cannot cope with previously unseen malicious URLs.
     • To keep a URL blacklist effective, its URLs must be kept up to date.
     → To collect fresh malicious URLs
  6. Background (5)
     Figure: a web client honeypot scanning the wild Internet, which contains 30 trillion unique URLs.
  7. Goal
     • Our main objective is to accelerate the process of generating a URL blacklist automatically.
     Idea
     Input: existing malicious URLs → expansion into a search space → reduction by a filter (machine learning) → Output: new malicious URLs
  8. AutoBLG Framework
     • Three primary components:
       - URL Expansion
       - URL Filtration
       - URL Verification
     Images from http://www.itguyswa.com.au/free-antivirus-protection/, https://www.virustotal.com/ja/, http://www.soumu.go.jp/main_content/000174846.pdf
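The three components can be read as a chain: expansion grows the candidate set, filtration shrinks it, and verification confirms what remains. The skeleton below is a sketch of that composition only. The function name `autoblg_pipeline` and the callable signatures are assumptions made for illustration, and each stage is passed in as a plug-in.

```python
from typing import Callable, Iterable

Expand = Callable[[Iterable[str]], list[str]]      # seeds -> candidate URLs
Filter = Callable[[list[str], list[str]], list[str]]  # (seeds, candidates) -> shortlist
Verify = Callable[[str], bool]                     # URL -> confirmed malicious?


def autoblg_pipeline(seeds: list[str], expand: Expand, filt: Filter, verify: Verify) -> list[str]:
    """Hypothetical composition: expand the search space, reduce it, then verify survivors."""
    candidates = expand(seeds)             # URL Expansion: grow the search space
    shortlisted = filt(seeds, candidates)  # URL Filtration: keep URLs similar to the seeds
    return [u for u in shortlisted if verify(u)]  # URL Verification: run the (expensive) checks
```

The design point is that the costly verification step only sees what the filter keeps.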
  9. URL Expansion (1): Seed
     Example seed URLs: http://2339XXX.net/main, http://auth.veXXXXX.com
     Steps: [Seed] → Pre-processing → Passive DNS Database → Search Engine → Web Crawler
  10. URL Expansion (2): Pre-processing
     Example output (IP addresses): 11X.5X.1XX.XX4, 2X.XXX.X99.X2
     Steps: Seed → [Pre-processing] → Passive DNS Database → Search Engine → Web Crawler
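The example outputs suggest that pre-processing turns each seed URL into the IP addresses its host resolves to. The sketch below follows that reading, which is an assumption. It uses a hypothetical offline resolver (`STUB_DNS`) so that it runs without network access. A live system could call `socket.gethostbyname_ex` instead, which returns `(hostname, aliaslist, ipaddrlist)`. All domains and IPs shown are placeholders.

```python
from typing import Callable
from urllib.parse import urlsplit


def preprocess(seed_urls: list[str], resolve: Callable[[str], list[str]]) -> set[str]:
    """Map seed URLs to the set of IP addresses their hosts resolve to."""
    ips: set[str] = set()
    for url in seed_urls:
        host = urlsplit(url).hostname  # lower-cased, port stripped
        if host:
            ips.update(resolve(host))
    return ips


# Hypothetical offline DNS data (documentation-range IPs).
STUB_DNS = {
    "seed-a.example": ["192.0.2.10"],
    "auth.seed-b.example": ["198.51.100.7"],
}


def stub_resolve(host: str) -> list[str]:
    """Look a host up in the stub table; unknown hosts resolve to nothing."""
    return STUB_DNS.get(host, [])
```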
  11. URL Expansion (3): Passive DNS Database
     Example output (domains): sediscoXXXXXX.gruXXX.com, vorXXXXXXX.zdjecXXXki.com
     Steps: Seed → Pre-processing → [Passive DNS Database] → Search Engine → Web Crawler
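A passive DNS database records which domains were observed resolving to which IPs. Querying it by IP therefore surfaces other domains hosted on the same servers as the seeds. This is a minimal in-memory sketch under that assumption. The record format and the helper names `build_ip_index` and `co_hosted_domains` are hypothetical, and the sketch does not show a real passive DNS service's API.

```python
from collections import defaultdict


def build_ip_index(records: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Index passive DNS (domain, ip) observations by IP address."""
    index: dict[str, set[str]] = defaultdict(set)
    for domain, ip in records:
        index[ip].add(domain.lower().rstrip("."))  # normalize case and trailing dot
    return index


def co_hosted_domains(seed_ips: set[str], index: dict[str, set[str]]) -> set[str]:
    """Return every domain observed resolving to any of the seed IPs."""
    found: set[str] = set()
    for ip in seed_ips:
        found |= index.get(ip, set())
    return found


# Hypothetical passive DNS observations.
PDNS_RECORDS = [
    ("seed-a.example.", "192.0.2.10"),
    ("sibling-1.example", "192.0.2.10"),
    ("Sibling-2.example", "198.51.100.7"),
    ("unrelated.example", "203.0.113.5"),
]
```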
  12. URL Expansion (4): Search Engine and Web Crawler
     Example output (URLs): http://100XXXXXwebcam.bXXX.pl/island-XXX-wXX.html, http://100XXXXXwebcam.bXXX.pl/isteam-XXXX.html
     Steps: Seed → Pre-processing → Passive DNS Database → [Search Engine] → [Web Crawler]
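Both example URLs sit on the same host, which suggests the crawler collects pages within each discovered domain. The sketch below shows only that crawling step, as an assumption. It extracts `<a href>` links from a fetched page, resolves relative links, and keeps unique links on the page's own host. Fetching and search-engine queries are left out, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> attributes."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser lower-cases tag and attribute names
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def same_domain_links(page_url: str, html: str) -> list[str]:
    """Return de-duplicated links that stay on the page's own host, in page order."""
    parser = LinkCollector(page_url)
    parser.feed(html)
    host = urlsplit(page_url).hostname
    kept: list[str] = []
    for link in parser.links:
        if urlsplit(link).hostname == host and link not in kept:
            kept.append(link)
    return kept
```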
  13. URL Filtration
     Existing malicious URLs + unknown URLs → similarity search (HTML features, Bayesian Sets)
     Image from http://www.primalsecurity.net/0xc-python-tutorial-python-malware/
  14. URL Verification
     • Three tools for verification of drive-by-download attacks:
       - Web client honeypot Marionette
       - Antivirus software
       - VirusTotal online service
  15. Performance Evaluation
     • Number of URLs obtained by URL Expansion: 59,394
     • Without URL Filtration: more than 100 hours
     • With URL Filtration: approximately 6 hours
     → Adopting a high-performance filter accelerates the process of generating blacklist URLs.
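As a back-of-the-envelope check, read "more than 100 hours" and "approximately 6 hours" as total processing times for all 59,394 expanded URLs. This is an assumption, since the slide does not say which stages the times cover. On that reading, the unfiltered run spends at least about 6 s per URL, and filtering gives a speed-up of roughly 17× or more:

```latex
\[
\frac{100\,\text{h} \times 3600\,\text{s/h}}{59{,}394\ \text{URLs}} \approx 6.06\ \text{s per URL},
\qquad
\frac{100\,\text{h}}{6\,\text{h}} \approx 16.7
\]
```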
  16. Results (1)
     Web client honeypot: 1.16%   Antivirus software: 3.8%   VirusTotal: 16.5%
     • Web client honeypot: definitely malicious
       - the page redirected to exploit web pages
     • Antivirus software: highly suspicious
       - the pages contained several HTTP objects that were detected by the antivirus checkers (malicious JavaScript or executable malware)
     • VirusTotal: suspicious
       - needs further manual inspection
  17. Results (2)
     • Some URLs are identified by multiple tools.
     • After eliminating duplicates, 106 of the 600 extracted URLs were detected as malicious or suspicious.
     • Of these 106 URLs, seven are completely new URLs that had not been listed in VirusTotal.
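Counting each URL once, even when several tools flag it, amounts to merging per-tool detections and keeping the strongest label for each URL. The severity order below comes from the three labels on the Results (1) slide. The tool keys, the function name `merge_verdicts`, and the URLs in the usage are hypothetical.

```python
# Severity ordering taken from the Results (1) slide: honeypot > antivirus > VirusTotal.
SEVERITY = {
    "honeypot": (3, "definitely malicious"),
    "antivirus": (2, "highly suspicious"),
    "virustotal": (1, "suspicious"),
}


def merge_verdicts(detections: dict[str, set[str]]) -> dict[str, str]:
    """Combine per-tool detections, keeping the strongest label for each URL."""
    best: dict[str, tuple[int, str]] = {}
    for tool, urls in detections.items():
        rank, label = SEVERITY[tool]
        for url in urls:
            if url not in best or rank > best[url][0]:
                best[url] = (rank, label)
    return {url: label for url, (_, label) in best.items()}
```

The length of the merged dictionary is the de-duplicated count. In the reported experiment, that count was 106 out of 600 extracted URLs.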
  18. Limitation and future work
     • Search Engine
       - Limitation: only the top-50 search results are obtained
       - Future work: accelerate the web search engine process
     • Web Crawler
       - Limitation: evaded by "cloaking techniques"
       - Future work: develop more sophisticated tools
     • Query Pattern
       - Limitation: misses several malicious URLs
       - Future work: increase the number of query patterns
     • URL Verification
       - Limitation: only two versions of browser or plug-in
       - Future work: adopt a low-interaction honeypot
     • Online operation
       - Limitation: not fully online because of the URL Expansion part
       - Future work: pipeline the URL expansion step
  19. Summary
     • We have proposed the AutoBLG framework, which
       - is light-weight
       - finds new and previously unknown drive-by-download URLs
       - finds other suspicious URLs that need further analysis
     • Key idea
       - the use of search space expansion and filters
     • We proposed a high-performance filter
       - it reduced the number of URLs to be investigated with dynamic analysis systems by 99%
       - while successfully finding new URLs that had not been listed in a widely used URL reputation system
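The 99% figure is consistent with the other slides under one assumption: that the 600 extracted URLs handed to verification are the top-300 scores of each of the two query patterns from the preliminary experiment, out of the 59,394 expanded URLs:

```latex
\[
2 \times 300 = 600,
\qquad
1 - \frac{600}{59{,}394} \approx 1 - 0.0101 = 0.9899 \approx 99\%
\]
```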
  20. Thank you for listening
  21. URL Filtration (1): Feature Extraction
     HTML feature → difference from previous works
     • Number of elements with a small area → frameset tag attributes border, frameborder, framespacing
     • Number of suspicious words in the script's content → strings such as shellcode and shcode
     • Number of URLs with a different domain → only URLs with a different domain are counted
     • Same as previous works:
       - number of iframe and frame tags
       - number of hidden elements
       - number of meta refresh tags
       - number of out-of-place elements
       - number of embed and object tags
       - presence of unescape behavior
       - number of setTimeout functions
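Several of the listed features are simple counts over parsed HTML and script text. The sketch below computes a subset of them with the standard-library `html.parser`. The heuristics are hypothetical choices, not the authors' definitions. "Hidden" means a `hidden` attribute, `type="hidden"`, or a `visibility:hidden`/`display:none` style. "Different domain" is approximated by comparing hostnames. The small-area and out-of-place features are omitted because they need page layout.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

SUSPICIOUS_WORDS = ("shellcode", "shcode")  # example strings named on the slide


class FeatureExtractor(HTMLParser):
    """Count a subset of the listed HTML features with simple, hypothetical heuristics."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlsplit(page_url).hostname
        self.in_script = False
        self.script_text: list[str] = []
        self.counts = {"iframe_frame": 0, "hidden": 0, "meta_refresh": 0,
                       "embed_object": 0, "foreign_urls": 0}

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag in ("iframe", "frame"):
            self.counts["iframe_frame"] += 1
        if tag in ("embed", "object"):
            self.counts["embed_object"] += 1
        if tag == "meta" and a.get("http-equiv", "").lower() == "refresh":
            self.counts["meta_refresh"] += 1
        style = a.get("style", "").replace(" ", "").lower()
        if ("hidden" in a or a.get("type", "").lower() == "hidden"
                or "visibility:hidden" in style or "display:none" in style):
            self.counts["hidden"] += 1
        # Hostname comparison stands in for "different domain" (a simplification).
        for key in ("src", "href", "data"):
            if a.get(key):
                host = urlsplit(urljoin(self.page_url, a[key])).hostname
                if host and host != self.page_host:
                    self.counts["foreign_urls"] += 1
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_text.append(data)


def extract_features(page_url: str, html: str) -> dict[str, int]:
    """Return a feature dictionary for one page."""
    parser = FeatureExtractor(page_url)
    parser.feed(html)
    parser.close()
    script = "".join(parser.script_text)
    features = dict(parser.counts)
    features["suspicious_words"] = sum(script.lower().count(w) for w in SUSPICIOUS_WORDS)
    features["unescape"] = int("unescape(" in script)
    features["settimeout"] = len(re.findall(r"\bsetTimeout\s*\(", script))
    return features
```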
  22. URL Filtration (2): Similarity Search
     • Similarity search: Bayesian Sets
     • Analogy, Google Sets (from web space): example items such as Toyota, Nissan, and Honda are expanded to related items such as BMW, Ford, Audi, Mitsubishi, Mazda, and Volkswagen.
     • Bayesian Sets (from all unknown URLs):
       - adopt several existing malicious URLs as the query (malicious URLs created with the same exploit kit)
       - output the scores of all URLs in descending order
       - the higher the score, the more likely the URL is malicious
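Bayesian Sets (Ghahramani and Heller) scores each item by how much more probable it is under a model fitted to the query set than under the prior. For binary features with independent Beta-Bernoulli models, the log score is linear in the feature vector. The sketch below assumes binarized features (count > 0 becomes 1) and an empirical-Bayes prior with κ = 2. Both are illustrative choices, not settings confirmed by the slides.

```python
import math


def binarize(vectors: list[list[float]]) -> list[list[int]]:
    """Map count features to binary ones: 1 if the count is positive (an assumption)."""
    return [[1 if v > 0 else 0 for v in row] for row in vectors]


def bayesian_sets_scores(data: list[list[int]], query_idx: list[int],
                         kappa: float = 2.0) -> list[float]:
    """Return log p(x | query) - log p(x) for every row under a Beta-Bernoulli model per feature."""
    n_items, n_feat = len(data), len(data[0])
    eps = 1e-6  # keeps every prior parameter strictly positive
    # Empirical-Bayes prior: alpha_j = kappa * m_j, beta_j = kappa * (1 - m_j).
    means = [min(max(sum(row[j] for row in data) / n_items, eps), 1 - eps)
             for j in range(n_feat)]
    alpha = [kappa * m for m in means]
    beta = [kappa * (1 - m) for m in means]
    n = len(query_idx)
    ones = [sum(data[i][j] for i in query_idx) for j in range(n_feat)]
    alpha_t = [alpha[j] + ones[j] for j in range(n_feat)]
    beta_t = [beta[j] + n - ones[j] for j in range(n_feat)]
    # The log score is linear in x: c + sum_j q_j * x_j.
    c = sum(math.log(alpha[j] + beta[j]) - math.log(alpha[j] + beta[j] + n)
            + math.log(beta_t[j]) - math.log(beta[j]) for j in range(n_feat))
    q = [math.log(alpha_t[j]) - math.log(alpha[j]) - math.log(beta_t[j]) + math.log(beta[j])
         for j in range(n_feat)]
    return [c + sum(q[j] * row[j] for j in range(n_feat)) for row in data]
```

Sorting the scores in descending order gives the ranking described on the slide. The query would be a few known malicious URLs from the same exploit kit.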
  23. The range of experiment
     Experiments: Preliminary Experiment, Performance Evaluation
     • URL Expansion (steps): commercial blacklist, pre-processing, passive DNS database, search engine, web crawler
     • URL Filtration (steps): feature extraction, similarity search
     • URL Verification (tools): web client honeypot, antivirus software, VirusTotal
  24. Preliminary Experiment
     Figure: number of malicious URLs found (y-axis, 0 to 3) versus top-K URLs (x-axis, log scale from 10^0 to 10^3) for Query Pattern 1 and Query Pattern 2.
     • Experiment data
       - Number of benign URLs: 10,000
       - Number of malicious URLs: 6
     • Experiment result
       - Each query pattern identifies a different set of three malicious URLs within the top 300 scores; together they extract all six malicious URLs.
       - We therefore used the top 300 scores as the threshold for URL filtration.
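The thresholding rule above keeps the top-K scores of each query pattern and merges the results. Here is a small sketch of that rule. The function names are hypothetical, and ties are broken by URL string purely to make the output deterministic. With K = 300 and two query patterns, it selects at most 2 × 300 = 600 URLs, or fewer if the patterns overlap.

```python
def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring URLs (ties broken by URL for determinism)."""
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]


def union_top_k(per_pattern_scores: list[dict[str, float]], k: int = 300) -> set[str]:
    """Take the top-k URLs of each query pattern and merge them into one candidate set."""
    selected: set[str] = set()
    for scores in per_pattern_scores:
        selected.update(top_k(scores, k))
    return selected
```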
