
DevSecCon London 2019: A Kernel of Truth: Intrusion Detection and Attestation with eBPF

Matt Carroll
Infrastructure Security Engineer at Yelp

"Attestation is hard" is something you might hear from security researchers tracking nation states and APTs, but it's actually pretty true for most network-connected systems!

Modern deployment methodologies mean that disparate teams create workloads for shared worker hosts (ranging from Jenkins to Kubernetes and all the other orchestrators and CI tools in between), so at any given moment your hosts could be running any one of a number of services, connecting to who-knows-what on the internet.

So when your network-based intrusion detection system (IDS) opaquely declares that one of these machines has made an "anomalous" network connection, how do you even determine whether it's business as usual? Sure, you can log on to the host to try and figure it out, but (in case you hadn't noticed) computers are pretty fast these days, and once the connection is closed it might as well not have happened... assuming it wasn't actually a reverse shell...

At Yelp we turned to the Linux kernel to tell us whodunit! Utilizing the Linux kernel's eBPF subsystem - an in-kernel VM with syscall-hooking capabilities - we're able to aggregate metadata about the calling process tree for any internet-bound TCP connection by filtering IPs and ports in-kernel and enriching with process tree information in userland. The result is "pidtree-bcc": a supplementary IDS. Now whenever there's an alert for a suspicious connection, we just search for it in our SIEM (spoiler alert: it's nearly always an engineer doing something "innovative")! And the cherry on top? It's stupid fast with negligible overhead, producing a much higher signal-to-noise ratio than the kernel's firehose-like audit subsystem.
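The filtering decision itself is easy to illustrate in userland. Here is a minimal sketch of the policy described above, using Python's stdlib `ipaddress` module; the function and constant names are illustrative, not pidtree-bcc's actual API:

```python
import ipaddress

# RFC-1918 private ranges plus loopback. pidtree-bcc performs the
# equivalent comparison in-kernel, so uninteresting connections are
# dropped before they ever reach the userland daemon.
IGNORED_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8")
]

def should_report(dest_ip: str) -> bool:
    """Return True if a connection to dest_ip is internet-bound."""
    addr = ipaddress.ip_address(dest_ip)
    return not any(addr in net for net in IGNORED_NETS)
```

The point of doing this comparison in the kernel rather than in Python is cost: filtered connections never pay for a trip to userland at all.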

This talk will look at how you can tune the signal-to-noise ratio of your IDS by making it reflect your business logic and common usage patterns, get more work done by reducing MTTR for false positives, use eBPF and the kernel to do all the hard work for you, accidentally load test your new IDS by not filtering all RFC-1918 addresses, and abuse Docker to get to production ASAP!

As well as looking at some of the technologies the kernel puts at your disposal, this talk will also trace pidtree-bcc's road from hackathon project to production system, and how a focus on demonstrating business value early on won us the organizational buy-in to build and deploy a brand-new project from scratch.


Transcript:

  1. London | 14-15 November 2019 A Kernel of Truth Matt Carroll
  2. Matt Carroll @grimmware A Kernel of Truth Intrusion Detection and Attestation with eBPF
  3. ● Matt Carroll ○ @grimmware ● Infrastructure Security Engineer at Yelp ● Ex-SRE (like a sysadmin but with more yaml) ● Hand-wringing Linux botherer Who am I?
  4. ● We built a supplementary* IDS and it’s pretty cool! ● Utilizing OS features as security features ● Told in (roughly) the order it happened. What is this about?
  5. ● How to get a greenfield security project off the ground ○ Treating defensive security like economics ○ Gluing together extant technologies to bootstrap custom security tools ○ Using your business logic to maximize signal vs noise What is this about?
  6. Yelp’s Mission: Connecting people with great local businesses.
  7. ● Built on Mesos + Marathon + Docker ● More recently migrating towards k8s ● Majority of our workloads run here ● What are they all doing??? PaaSTA
  8. Network IDS: Amazon GuardDuty
  9. Kind of unsurprising, also pretty unhelpful...
  10. Welp...
  11. Uuuhhh… 🤔
  12. WHAAAAAAA 😱
  13. ● What host class connected? ● What IP/ASN did it connect to? ● What’s on the other end? ● How long was the connection? ● What direction? ● How many bytes were transferred? ● What did the pslogs say? Attestation From Inference
  14. ● What host class connected? ● What IP/ASN did it connect to? ● What’s on the other end? ● How long was the connection? ● What direction? ● How many bytes were transferred? ● What did the pslogs say? Attestation From Inference lol jk
  15. Context is lost as soon as the instantiating process ends
  16. What if we could reduce MTTR for false positives?
  17. ● When a GuardDuty alert fires I want to be able to determine if it’s a false positive quickly ● Only for GuardDuty traffic (not internal to our VPCs) ● Only for outbound TCP (i.e. non-RFC1918) ● I want the entire calling process tree so I can see full local causality ● Include process ownership information ● Must not require workload tooling The problem space
  18. eBPF!
  19. eBPF!
  20. ● “Berkeley Packet Filter” from BSD ● An in-kernel VM accessed as a device (/dev/bpf) ● Limited number of registers ● No loops (to prevent kernel deadlocking) ● Used for packet filtering BPF
  21. ● An in-kernel VM in Linux (and now FreeBSD!) ● It’s “extended”! ● Moar registers than BPF ● Used for hooking syscalls, tracing, proxying sockets, and (you guessed it) in-kernel packet filtering ○ Can actually offload to some NICs! ● In our case, dispatching kprobes for the tcp_v4_connect kernel function eBPF
  22. Enjoy writing your filters as an array of BPF VM instructions...
  23. bcc + psutil = PROFIT???
  24. bcc + psutil = PROFIT??? ✅
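The kprobe idea from slide 21 maps onto bcc quite directly: embed a small C program, let bcc compile and attach it, and read events back over a perf buffer. Below is a heavily reduced, illustrative sketch of that shape, not pidtree-bcc's actual source; attaching it for real requires root, a recent kernel, and the bcc toolchain:

```python
# Illustrative bcc sketch: kprobe on tcp_v4_connect, events emitted to
# userland via a perf ring buffer. bcc compiles and loads the C at runtime.
BPF_TEXT = r"""
#include <net/sock.h>
#include <linux/in.h>

struct event_t {
    u32 pid;
    u32 daddr;   // destination IPv4, network byte order
    u16 dport;   // destination port, network byte order
};
BPF_PERF_OUTPUT(events);

int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk,
                           struct sockaddr *uaddr) {
    struct sockaddr_in addr = {};
    bpf_probe_read(&addr, sizeof(addr), uaddr);
    struct event_t evt = {};
    evt.pid = bpf_get_current_pid_tgid() >> 32;
    evt.daddr = addr.sin_addr.s_addr;
    evt.dport = addr.sin_port;
    // The real pidtree-bcc filter compares daddr against configured
    // subnets here, so uninteresting events never leave the kernel.
    events.perf_submit(ctx, &evt, sizeof(evt));
    return 0;
}
"""

def run():
    # Deferred import so the sketch is readable without bcc installed.
    from bcc import BPF
    b = BPF(text=BPF_TEXT)  # bcc auto-attaches kprobe__-prefixed functions

    def handle(cpu, data, size):
        evt = b["events"].event(data)
        print(evt.pid, evt.daddr, evt.dport)

    b["events"].open_perf_buffer(handle)
    while True:
        b.perf_buffer_poll()
```

The `kprobe__tcp_v4_connect` naming convention is what tells bcc which kernel function to hook; everything after `ctx` mirrors the hooked function's own arguments.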
  25. How it works (diagram; “for each syscall...”)
  26. ● Filters in-kernel from Jinja2 templates which iterate over subnets in YAML configuration ● Events that don’t get filtered out are passed to a userland Python daemon ● psutil used to crawl the process tree to init and log it alongside other metadata
  27. The End.
  28. Except it was a hackathon project, so all it did was print events to stdout, it could only match classful networks, and I developed it on my personal laptop.
  29. The Road To Production
  30. Don’t try to be clever with bitwise network matching
  31. ● I realised only the classful networks worked because of the byte boundaries ● Don’t try to do clever bitwise shifting with the mask length ● Endianness and byte ordering between network and host don’t work how you think they do ● No srs Matching all CIDRs
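The pitfall on slide 31 is concrete: addresses coming off the socket are in network byte order (big-endian), so masking them with a mask built naively from the prefix length only "works" when the prefix falls on a byte boundary. Converting to host order first makes any prefix length match. A userland demonstration of both behaviours:

```python
import socket
import struct

def _host_order(ip: str) -> int:
    # inet_aton yields network-order bytes; "!I" reads them big-endian,
    # producing the host-order integer value of the address.
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def in_subnet(ip: str, net: str, prefix_len: int) -> bool:
    """Correct CIDR match: convert to host order, compare the top bits."""
    shift = 32 - prefix_len
    return _host_order(ip) >> shift == _host_order(net) >> shift

def in_subnet_broken(ip: str, net: str, prefix_len: int) -> bool:
    """The classful-only bug: mask network-order words with a mask built
    from the prefix length. The low (1 << prefix_len) - 1 bits of a
    network-order word only line up with the CIDR prefix when prefix_len
    is a multiple of 8 -- exactly the byte-boundary behaviour on slide 31.
    """
    as_ne = lambda s: struct.unpack("<I", socket.inet_aton(s))[0]
    mask = (1 << prefix_len) - 1
    return (as_ne(ip) & mask) == (as_ne(net) & mask)
```

`in_subnet_broken` agrees with `in_subnet` for /8, /16, and /24 networks, then silently mismatches on anything like /12 or /22: classful networks work, everything else doesn't.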
  32. ● A coworker was trying to figure out which batch jobs were accessing a service for a data auth project ● He asked me if we could match ports ● I said I’d have it: ○ Matching ports ○ Dockerized for ad-hoc usage ○ By the next day ● The next day he found all the unauthenticated clients. Dockerizing for debugging
  33. ● Contains python2.7 and dependencies (sorry) ● Needs some setup at runtime ● Volume mount /etc/passwd for uid mapping ● Not your typical flags: ○ --privileged ○ --cap-add sys_admin ○ --pid host ● Don’t worry, I am a professional, probably. pidtree-bcc in Docker
  34. ● We run our own PaaS called PaaSTA which uses Docker as its containerizer ● Runs the vast majority of our workloads ● Can pull-from-registry and run in a systemd unit file without further setup ● Don’t have to install dependencies (inc. LLVM, python2) ● Get coverage quickly Opportunistic deploy with Docker
  35. ● Previous projects with goaudit meant we already had a secure logging pipeline for reading a FIFO and outputting to Amazon Kinesis ○ syslog2kinesis adds other Yelpy metadata (e.g. hostname, environment, Puppet role...) ● Originally fed to our Logstash => Elasticsearch SIEM ● Migrated to Kinesis Firehose => Splunk this quarter <3 Log aggregation
  36. ● Better to ask forgiveness than permission... ● Rolled out to two security devboxes and watched the logs roll in! ● Negligible performance impact!!! ○ As postulated, the cost of subnet filtering << the cost of instantiating a TCP connection ● Lots of connections out to public Amazon IPs creating a lot of noise Dip Test
  37. If only Amazon maintained some kind of list of their public prefixes...
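It does: ip-ranges.json, published at https://ip-ranges.amazonaws.com/ip-ranges.json. Here is a sketch of pulling one service's netblocks out of that file, with a trimmed inline sample so the snippet is self-contained; real usage fetches the URL, and the prefixes shown are examples, not current data:

```python
import json

# Trimmed sample in the shape of Amazon's published ip-ranges.json.
SAMPLE = json.loads("""{
  "prefixes": [
    {"ip_prefix": "3.5.140.0/22", "region": "ap-northeast-2", "service": "AMAZON"},
    {"ip_prefix": "52.95.110.0/24", "region": "us-east-1", "service": "EC2"}
  ]
}""")

def service_prefixes(ranges: dict, service: str) -> list:
    """Pull the netblocks for one AWS service out of ip-ranges.json data."""
    return [
        p["ip_prefix"] for p in ranges["prefixes"] if p["service"] == service
    ]
```

Feeding those prefixes into the same YAML-to-Jinja2-to-kernel filter pipeline is how the "public Amazon IP" noise from the dip test gets suppressed.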
  38. Surely you can’t load ~200 netblocks into the kernel and compare all non-RFC1918 tcp_v4_connect calls to them in a performant manner...
  39. Surely you can’t load ~200 netblocks into the kernel and compare all non-RFC1918 tcp_v4_connect calls to them in a performant manner...
  40. ● ~25,000 - ~50,000 messages per hour across dev and stage ● Once accidentally load-tested at ~80,000 messages in 5m from one host for several hours ● Nobody on the host noticed ● TCP connections are way more expensive than the filters! Load
  41. ● bpf_trace_printk() -> BPF_PERF_OUTPUT() ○ Global (i.e. per-kernel) debug output with hand-hacked json and string manipulation ○ To structured data in a ring buffer ○ Multi-tenancy makes it a better utility and more testable! ● Added unit tests ● Adding integration tests ● Adding infrastructure for deploy in the production environment Undoing my nasty hacks
  42. ● De-containerize (e.g. debian package) ● Python3 ● Plugin for container awareness ○ Easy mapping to service and therefore owner! ● Enable immutable loginuid and add that to metadata ○ --loginuid-immutable under `man auditctl` ○ Cryptically says “but can cause some problems in certain kinds of containers” ● Threat modelling/hardening! Future work
  43. ● Performance improvements ○ BPF longest-match maps ○ Pre-processing masks ○ Probably totally unnecessary ● Moar syscalls! ○ TCP listens, IPv6, UDP, SUID, forwarded SSH socket reads… ● SIEM tooling ○ ASN matching, bad IP matching, GuardDuty auto-enrichment... Future work
  44. We’re Hiring!
  45. @YelpEngineering
  46. London | 14-15 November 2019 @grimmware Thanks for listening!