2021_TLSH_SOC_pub.pdf

  1. TLSH for the SOC
     Jonathan Oliver
  2. About Me
     • Data Scientist at Trend Micro
     • PhD at Monash University
     • Data Mining consultant for NASA and FAA
     • Data Scientist at MailFrontier
     • Inventor of TLSH
     • Adjunct Professor at University of Queensland
  3. This Talk
     What?
     • TLSH tools for processing malware
     • Data derived from Malware Bazaar
     Why?
     • Label new / unknown samples
     How?
     • Clustering Malware Bazaar using standard ML tools (HAC-T / DBSCAN)
     • Visualization of clusters (from Malware Bazaar)
  4. Quick Intro to TLSH
     • Trend Micro Locality Sensitive Hash
     • pip install py-tlsh
     • Open source code at https://github.com/trendmicro/tlsh
     • Fuzzy Hash
     • With advantages from Machine Learning
     • Works with Sklearn, Jupyter Notebooks and DBSCAN
     • Adopted by VirusTotal
     • Adopted by Malware Bazaar
     • A part of the STIX standard
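A minimal sketch of hashing and comparing two byte strings with the `py-tlsh` package named on the slide. It uses the package's `tlsh.hash` and `tlsh.diff` functions; the helper name `tlsh_distance`, the random test data, and the patched byte range are illustrative assumptions, and the import is guarded so the sketch still loads where the package is not installed.

```python
import random

try:
    import tlsh  # provided by `pip install py-tlsh`
except ImportError:  # keep the sketch loadable without the package
    tlsh = None


def _usable(digest):
    """py-tlsh signals 'input too short or too uniform' with "TNULL" (or "" in old versions)."""
    return bool(digest) and digest != "TNULL"


def tlsh_distance(a: bytes, b: bytes):
    """Return the TLSH distance between two byte strings, or None if it cannot be computed."""
    if tlsh is None:
        return None
    h1, h2 = tlsh.hash(a), tlsh.hash(b)
    if not (_usable(h1) and _usable(h2)):
        return None
    return tlsh.diff(h1, h2)


# Hypothetical data: 2 KiB of seeded pseudo-random bytes and a lightly patched copy.
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(2048))
modified = original[:1000] + b"patched!" + original[1008:]

print("identical inputs:", tlsh_distance(original, original))
print("patched copy:    ", tlsh_distance(original, modified))
```

Identical inputs give a distance of 0; the patched copy should score low because only eight bytes changed, but the exact value depends on the data.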
  5. What do TLSH digests look like?
     chrome.exe SHA256:c70b8cbb2ac962b343535454e4f2bcb3e48d83a04792c64bc768d59b3c1bf403
     T11c159d11f445c1b7e5b211b2d879ba71467cbc28832641db63987e1a3db03d23a3b6db
     T1c4159d11f445c1b7d5b211b2d47dba71467cbc28832a40db63987e1a3eb43d22a3b6db
     chrome.exe SHA256:723aa4a407160bd99430de690f1f0d34af4a6622e2c44fe95be3bda3d7c344b3
  6. Distance Calculation
     T11c159d11f445c1b7e5b211b2d879ba71467cbc28832641db63987e1a3db03d23a3b6db
     T1c4159d11f445c1b7d5b211b2d47dba71467cbc28832a40db63987e1a3eb43d22a3b6db
     Differences: 1 1 3 3 3
     Total Distance = 11
     • 0-30 Very Close Match
     • 31-60 Close Match
     • 61-100 Possible Match
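The distance bands on this slide can be written as a small lookup. This is a sketch only: the function name `match_band` is hypothetical, and the slide does not say how to treat distances above 100, so the last branch is an assumption.

```python
def match_band(distance: int) -> str:
    """Map a TLSH distance to the band names used on the slide."""
    if distance < 0:
        raise ValueError("TLSH distances are non-negative")
    if distance <= 30:
        return "Very Close Match"
    if distance <= 60:
        return "Close Match"
    if distance <= 100:
        return "Possible Match"
    # Assumption: the slide lists no band beyond 100.
    return "No band listed"


print(match_band(11))  # the chrome.exe pair above -> Very Close Match
```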
  7. Malware Bazaar
  8. Malware Bazaar
     As of 17 Sept 2021, Malware Bazaar (https://bazaar.abuse.ch/) has a dataset with
     • 389300 samples
     • 323709 samples have a label
     We have clustered this dataset and found 16452 clusters:
     https://github.com/trendmicro/tlsh/tree/master/tlshCluster/malbaz
  9. Use Cases / Motivation
  10. Typical Use Case
  11. Demo (1)
      • Clustered Malware Bazaar
      • Cluster output and pattern file from 2021-09-17 provided at https://github.com/trendmicro/tlsh/tree/master/tlshCluster/malbaz
      • Use this to predict the malware family of Malware Bazaar 2021-09-18
  12. Demo (1)
  13. Demo (1)
  14. Demo (1): Predicting Signature
      • Difficult task as there are 592 distinct signatures in Malware Bazaar
      • Associated 164 / 246 samples with clusters
      • We split the predictions into 3 categories
        • Correct Signature 132/164
        • Incorrect 13/164
        • Inconclusive 19/164
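The counts on this slide translate into rates with a few lines of arithmetic. The variable names are illustrative, and the "correct among conclusive predictions" figure is a derived quantity that the slide itself does not report.

```python
new_samples = 246   # samples from 2021-09-18
associated = 164    # samples that fell into an existing cluster
outcomes = {"correct": 132, "incorrect": 13, "inconclusive": 19}

# The three categories account for every associated sample.
assert sum(outcomes.values()) == associated

coverage = associated / new_samples
rates = {name: count / associated for name, count in outcomes.items()}
conclusive = outcomes["correct"] + outcomes["incorrect"]
correct_when_conclusive = outcomes["correct"] / conclusive

print(f"associated with a cluster: {coverage:.1%}")               # 66.7%
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")                                  # 80.5%, 7.9%, 11.6%
print(f"correct when conclusive: {correct_when_conclusive:.1%}")  # 91.0%
```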
  15. Demo (1): Uses in the SOC
      • Automatic labelling of unknown samples
      • Scalable
      • Suitable for Automation
      • Associates unknown samples with similar historical samples
      • Understand scope of the threat
      • YARA rules
      • …
      ⇒ Take suitable action
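One way to picture automatic labelling is a nearest-cluster lookup. The sketch below is not the method used in the demo: the function `predict_label`, the threshold of 30, the 80% agreement rule, the integer "digests", the `abs` distance, and the family names are all assumptions made for illustration. In practice the distance argument would be a TLSH distance function.

```python
from collections import Counter


def predict_label(sample, clusters, distance, threshold=30, min_agreement=0.8):
    """Label `sample` from the cluster containing its nearest member within `threshold`.

    `clusters` is a list of clusters, each a list of (digest, label_or_None) pairs.
    Returns (status, label), where status is "predicted", "inconclusive" or "unassociated".
    """
    best = None  # (distance, cluster index) of the nearest member seen so far
    for index, members in enumerate(clusters):
        for digest, _label in members:
            d = distance(sample, digest)
            if d <= threshold and (best is None or d < best[0]):
                best = (d, index)
    if best is None:
        return ("unassociated", None)

    # Vote among the labelled members of the chosen cluster.
    votes = Counter(label for _digest, label in clusters[best[1]] if label is not None)
    if not votes:
        return ("inconclusive", None)
    label, count = votes.most_common(1)[0]
    if count / sum(votes.values()) < min_agreement:
        return ("inconclusive", None)
    return ("predicted", label)


# Toy data: integers stand in for digests, abs() stands in for the TLSH distance.
toy_clusters = [
    [(100, "FamilyA"), (105, "FamilyA"), (110, None)],
    [(500, "FamilyB"), (510, "FamilyC")],
]
toy_distance = lambda a, b: abs(a - b)

print(predict_label(103, toy_clusters, toy_distance))  # ('predicted', 'FamilyA')
print(predict_label(505, toy_clusters, toy_distance))  # ('inconclusive', None)
print(predict_label(300, toy_clusters, toy_distance))  # ('unassociated', None)
```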
  16. Demo (2)
      • Understanding Clustering
      • Dendrograms for malware
      • See https://github.com/trendmicro/tlsh/blob/master/tlshCluster/malbaz.ipynb
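Cutting a single-linkage dendrogram at a fixed height gives the same groups as linking every pair of items whose distance is at or below that height. The sketch below does this with a union-find structure. It is a naive O(n²) illustration, not the HAC-T algorithm from the TLSH repository; the function name, the integer items, and the `abs` distance are assumptions standing in for digests and a TLSH distance.

```python
def threshold_clusters(items, distance, threshold):
    """Group items whose pairwise distance chains stay within `threshold` (single linkage)."""
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two clusters

    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


# Toy example: integers with absolute difference as the distance.
demo_clusters = threshold_clusters([1, 2, 3, 10, 11, 50], lambda a, b: abs(a - b), 2)
print(demo_clusters)  # [[1, 2, 3], [10, 11], [50]]
```

Raising the threshold corresponds to cutting the dendrogram higher, which merges clusters; lowering it splits them.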
  17. Digging Deeper
      • Why TLSH is the way that it is
      • Why it uses k-skip-grams
      • Comparison of TLSH with other Similarity Digests
      • Comparison of Clustering Methods
  18. Why K-skip-grams?
      • Work on short strings / files
      • Hard to attack
  19. Kskip Ngrams
      Data: A B C D E F G H I J
      Ngram Features (N=4): ABCD BCDE CDEF DEFG EFGH FGHI GHIJ
      Kskip-Ngram (N=4, K=2): AB AC AD BC BD BE CD CE CF DE DF DG EF EG EH FG FH FI GH GI GJ HI HJ IJ
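The two feature lists on this slide can be reproduced with a short sketch. The reading of the slide is an assumption: each character is paired with each of the following characters inside a window of 4, and the shortened windows at the end of the string are kept, which is what yields HI, HJ and IJ. The function names are hypothetical.

```python
from typing import List


def ngrams(data: str, n: int) -> List[str]:
    """Contiguous substrings of length n."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def skip_pairs(data: str, window: int) -> List[str]:
    """Pair each character with every later character inside its window."""
    pairs = []
    for i in range(len(data)):
        for j in range(i + 1, min(i + window, len(data))):
            pairs.append(data[i] + data[j])
    return pairs


print(" ".join(ngrams("ABCDEFGHIJ", 4)))
print(" ".join(skip_pairs("ABCDEFGHIJ", 4)))
```

A single changed character alters every 4-gram that contains it, whereas many of the skip pairs drawn from the same windows survive, which is one way to read the "work on short strings" point on the previous slide.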
  20. Selecting K and N for Kskip-Ngrams
      Computational Complexity (low score is good)
      Columns: N=3, N=4, N=5, N=6, N=7, N=8, …
      K=5: 21
      K=4: 15 35
      K=3: 10 20 35
      K=2: 6 10 15 21
      K=1: 3 4 5 6 7
      K=0 (Ngram): 1 1 1 1 1 1
  21. Kskip-Ngram versus Ngrams
      GAN-like experiment
      [Diagram: Real World Data and an Adversarial Agent feed a Discriminator, which outputs Match / No Match]
  22. Selecting K and N for Kskip-Ngrams
      Adversarial Agent (Search Width = 15) (low score is good)
      Columns: N=3, N=4, N=5, N=6, N=7, N=8, …
      K=5: 7.5
      K=4: 11.3
      K=3: 13.7
      K=2: 16.1
      K=1: 16.0
      K=0 (Ngram): 25.4 31.2 32 43.4 57.4
  23. Selecting K and N for Kskip-Ngrams: Accuracy
  24. Comparing LSH / Similarity Digests
  25. Ref: Martín-Pérez et al., “Bringing order to approximate matching: Classification and attacks on similarity digest algorithms”
  26. Metric Trees for Nearest Neighbor Search
      Nodes contain (item, distance)
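A vantage-point tree is one common metric tree whose nodes hold an item and a distance. The sketch below is an assumption about the kind of structure the slide refers to, not code from the talk. It prunes with the triangle inequality, so it is only correct when the distance function is a true metric; the integer items and `abs` distance are stand-ins.

```python
class Node:
    def __init__(self, item, radius, inside, outside):
        self.item = item        # vantage point stored at this node
        self.radius = radius    # split distance: inside <= radius < outside
        self.inside = inside
        self.outside = outside


def build(items, distance):
    """Build a vantage-point tree from a list of items."""
    if not items:
        return None
    vantage, rest = items[0], items[1:]
    if not rest:
        return Node(vantage, 0, None, None)
    scored = [(distance(vantage, x), x) for x in rest]
    radius = sorted(d for d, _ in scored)[(len(scored) - 1) // 2]  # lower median
    inside = [x for d, x in scored if d <= radius]
    outside = [x for d, x in scored if d > radius]
    return Node(vantage, radius, build(inside, distance), build(outside, distance))


def nearest(node, query, distance, best=None):
    """Return (distance, item) for the item closest to `query`."""
    if node is None:
        return best
    d = distance(query, node.item)
    if best is None or d < best[0]:
        best = (d, node.item)
    # Search the side the query falls on first.
    if d <= node.radius:
        near, far = node.inside, node.outside
    else:
        near, far = node.outside, node.inside
    best = nearest(near, query, distance, best)
    # Triangle inequality: every item on the far side is at least |d - radius| away.
    if abs(d - node.radius) <= best[0]:
        best = nearest(far, query, distance, best)
    return best


points = [3, 17, 25, 42, 58, 61, 77, 90]
metric = lambda a, b: abs(a - b)
tree = build(points, metric)
print(nearest(tree, 60, metric))  # (1, 61)
```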
  27. Metric Trees: Do not work for (bounded) Similarity Measures
  28. Comparing Clustering Approaches
  29. Types of Clustering
      • Similarity of the files
      • Fuzzy Hashes
      • Feature based
      • Deep Learning
      • YARA Rules
      • Apply a pattern (Smart pattern)
      • Sandbox / behavioural analysis
      • …
  30. Fuzzy Hashes
      • Cryptographic Hashes:
        • Any change completely changes the hash
        • Useful for collecting evidence
      • Fuzzy Hashes:
        • Have the convenience of cryptographic hashes
        • Can measure the Similarity between files
        • Speed and Scale
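The "any change completely changes the hash" point is easy to see with SHA-256 from the Python standard library. The input bytes below are made up for illustration; the exact number of differing characters depends on the data, so no count is claimed.

```python
import hashlib


def differing_hex_chars(a: bytes, b: bytes) -> int:
    """Count positions where the SHA-256 hex digests of a and b differ."""
    digest_a = hashlib.sha256(a).hexdigest()
    digest_b = hashlib.sha256(b).hexdigest()
    return sum(x != y for x, y in zip(digest_a, digest_b))


# Hypothetical file content and a copy with only the final byte changed (0xff -> 0x00).
content = b"MZ" + bytes(range(256)) * 4
patched = content[:-1] + b"\x00"

print(hashlib.sha256(content).hexdigest())
print(hashlib.sha256(patched).hexdigest())
print(differing_hex_chars(content, patched), "of 64 hex characters differ")
```

A fuzzy hash such as TLSH is designed to behave the opposite way: a one-byte change should leave most of the digest intact, which is what makes a distance between digests meaningful.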
  31. Potential Issues with Clustering
      • Scale
        • Does the method scale up to 10 million / 100 million files?
      • Access to the file
        • Does the method need to process the file?
      • Manual effort
      • Packers
        • Multiple malware families may use the same packer
        • Some methods will distinguish; other methods will not
  32. Category / Technique      Speed / Scale   Access to file   Manual effort   Can separate families that share a packer
      Similarity / Fuzzy Hash   Fast            No               No              No
      Feature based ML          Slow            Yes              Features        No
      Deep Learning             Slow            Yes              Network         ?
      YARA rules                Medium          Yes              Yes             Yes
      Smart Pattern             Fast            Yes              Yes             Yes
      Sandbox / Behavioral      Slow            Yes              No              Yes
  33. Clustering Solutions
      • Use multiple methods of clustering
      • Split clustering / categorization into phases
        1. Large scale / quick / cheap
           • Fuzzy hashes (TLSH) are ideal
        2. When needed, use more expensive methods
           • Extensive security knowledge required
           • Sandboxes
           • Smart Patterns
           • YARA rules
           • Deep Learning
           • etc.
  34. Conclusion
      • Get the tools
        • pip install py-tlsh
        • Open Source (Apache license)
        • https://github.com/trendmicro/tlsh
      • Fuzzy Hashes / TLSH / Telfhash are really useful tools
        • Working with huge databases
      • Use standard dev-ops / ML tools for malware
        • Jupyter notebooks
        • Sklearn
        • DBSCAN
        • Dendrograms for visualizing clustering
  35. Resources
      • TLSH: https://github.com/trendmicro/tlsh
      • Papers on TLSH: http://tlsh.org/papers.html
      • Malware Bazaar: https://bazaar.abuse.ch/
      Thanks to University of Queensland
