Danny Al-Gaaf (Deutsche Telekom)
Sage Weil (Red Hat)
Security is essential for critical enterprise OpenStack installations like telco NFV clouds. This includes the often-ignored issue of security for the storage of images, objects, and shared file systems (e.g., user data or mission-critical configurations like firewall rules). This talk provides insight into the requirements for a secure setup, and into potential issues, pitfalls, and attack vectors against the storage technologies used in an OpenStack-based cloud.
The Ceph distributed storage system has become very popular in OpenStack deployments, and is currently the most commonly deployed solution for block storage. It provides an object store, block devices, and a shared file system. However, achieving robust security is far more complex in a distributed system than with, for example, local storage on compute nodes.
Using Ceph as an example, this talk presents what Deutsche Telekom and Red Hat/Inktank, together with the community, are doing to build a security-critical cloud with OpenStack and Ceph.
This talk will cover:
- the security requirements for telco clouds
- the security issues associated with multi-tenant clouds with a range of security zones sharing a single storage system
- how to secure the storage setup in an OpenStack cloud
- the current state of security in Ceph
- current Ceph development efforts that are underway
- the security roadmap for Ceph
1. Storage security in a critical
enterprise OpenStack environment
Danny Al-Gaaf (Deutsche Telekom AG), Sage Weil (Red Hat)
OpenStack Summit 2015 - Vancouver
4. NFV Cloud @ Deutsche Telekom
● Datacenter design
○ BDCs
■ few but classic DCs
■ high SLAs for infrastructure and services
■ for private/customer data and services
○ FDCs
■ small but many
■ near to the customer
■ lower SLAs, can fail at any time
■ services:
● spread over many FDCs
● failures are handled by services and not the infrastructure
5. High Security Requirements
● Multiple security placement zones (PZ)
○ e.g. EHD, DMZ, MZ, SEC, Management
○ TelcoWG “Security Segregation” use case
● Separation required for:
○ compute
○ networks
○ storage
● Protect against many attack vectors
● Enforced and reviewed by security department
● Run telco core services @ OpenStack/KVM/Ceph
8. Solutions for telco services
● Separation between security zones needed
● Physical separation
○ Large number of clusters (>100)
○ Large hardware demand (compute and storage)
○ High maintenance effort
○ Less flexibility
● RADOS pool separation
○ Much more flexible
○ Efficient use of hardware
● Question:
○ Can we get the same security as physical separation?
9. Placement Zones
● Separate RADOS pool(s) for each security zone
○ Limit access using Ceph capabilities
● OpenStack AZs as PZs
● Cinder
○ Configure one backend/volume type per pool (with own key)
○ Need to map between AZs and volume types via policy
● Glance
○ Lacks separation between control and compute/storage layer
○ Separate read-only vs management endpoints
● Manila
○ Currently not planned for production use with CephFS
○ May use RBD via NFS
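Per-zone pool separation can be sketched with one CephX key per Cinder backend. A minimal sketch, assuming a running cluster with an admin key; the pool name `volumes-sec` and entity `client.cinder-sec` are hypothetical examples, not names from the deployment described here:

```shell
# Create a dedicated pool for one placement zone (names are examples).
ceph osd pool create volumes-sec 128

# Key scoped to that pool only: a compromised Cinder backend holding this
# key cannot read or write any other zone's pool.
ceph auth get-or-create client.cinder-sec \
    mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes-sec'
```

Each Cinder backend/volume type is then configured with only its own key, mapping AZs/PZs to pools as described above.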
11. RadosGW attack surface
● S3/Swift
○ Network access to the gateway only
○ No direct consumer access to other Ceph daemons
● Single API attack surface
12. RBD attack surface
● Protection from the hypervisor block layer
○ No network access or CephX keys needed at the guest level
● Issue:
○ The hypervisor is software and therefore not 100% secure…
■ e.g., VENOM!
13. Host attack surface
● If KVM is compromised, the attacker ...
○ has access to neighbor VMs
○ has access to local Ceph keys
○ has access to Ceph public network and Ceph daemons
● Firewalls, deep packet inspection (DPI), ...
○ partly impractical due to the protocols used
○ implications for performance and cost
● Bottom line: Ceph daemons must resist attack
○ C/C++ is harder to secure than, e.g., Python
○ Homogeneity: if one daemon is vulnerable, all daemons in the cluster are!
○ Risk of denial of service
14. Network attack surface
● Client/cluster sessions are not encrypted
○ Sniffer can recover any data read or written
● Sessions are authenticated
○ Attacker cannot impersonate clients or servers
○ Attacker cannot mount man-in-the-middle attacks
15. Denial of Service
● Scenarios
○ Submit many / large / expensive IOs
■ use qemu IO throttling!
○ Open many connections
○ Use flaws to crash Ceph daemons
○ Identify non-obvious but expensive features of client/OSD interface
17. Deployment and Setup
● Network
○ Always use separate cluster and public networks
○ Always separate your control nodes from other networks
○ Don’t expose to the open internet
○ Encrypt inter-datacenter traffic
● Avoid hyper-converged infrastructure
○ Isolate compute and storage resources
○ Scale them independently
○ Risk mitigation if daemons are compromised or DoS’d
○ Don’t mix
■ compute and storage
■ control nodes (OpenStack and Ceph)
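The network split above boils down to two settings in `ceph.conf`. A minimal sketch; the subnets are placeholder examples, not addresses from any real deployment:

```shell
# Write a sample ceph.conf fragment separating public and cluster traffic
# (subnets are examples; adapt to your own network plan).
cat > ceph.conf.sample <<'EOF'
[global]
public network  = 10.0.1.0/24   # client <-> daemon traffic
cluster network = 10.0.2.0/24   # OSD replication/heartbeat traffic only
EOF
grep 'network' ceph.conf.sample
```

With this split, guests and compromised compute hosts never see the replication network, shrinking the reachable attack surface.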
18. Deploying RadosGW
● Big and easy target through the HTTP(S) protocol
● Small appliance per tenant with
○ Separate network
○ SSL-terminating proxy forwarding requests to radosgw
○ WAF (mod_security) to filter requests
○ Placed in a secure/managed zone
● Don’t share buckets/users between tenants
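The per-tenant appliance can be sketched as an Apache vhost that terminates SSL, runs mod_security, and forwards to radosgw. Hostnames, paths, and the backend port are placeholder assumptions:

```shell
# Sketch of a per-tenant SSL-terminating proxy in front of radosgw
# (ServerName, certificate paths, and backend address are placeholders).
cat > rgw-proxy.conf.sample <<'EOF'
<VirtualHost *:443>
    ServerName s3.tenant-a.example.com
    SSLEngine on
    SSLCertificateFile    /etc/pki/tls/certs/tenant-a.crt
    SSLCertificateKeyFile /etc/pki/tls/private/tenant-a.key
    # WAF: mod_security filters requests before they reach radosgw
    SecRuleEngine On
    ProxyPass        / http://radosgw.internal:7480/
    ProxyPassReverse / http://radosgw.internal:7480/
</VirtualHost>
EOF
grep -c 'ProxyPass' rgw-proxy.conf.sample
```

One such vhost (or appliance VM) per tenant keeps tenants from sharing a gateway endpoint, matching the "don’t share buckets/users" rule above.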
19. Ceph security: CephX
● Monitors are trusted key servers
○ Store copies of all entity keys
○ Each key has an associated “capability”
■ Plaintext description of what the key holder is allowed to do
● What you get
○ Mutual authentication of client and server
○ Extensible authorization with “capabilities”
○ Protection from man-in-the-middle and TCP session hijacking
● What you don’t get
○ Secrecy (encryption over the wire)
20. Ceph security: CephX take-aways
● Monitors must be secured
○ Protect the key database
● Key management is important
○ Separate key for each Cinder backend/AZ
○ Restrict capabilities associated with each key
○ Limit administrators’ power
■ use ‘allow profile admin’ and ‘allow profile readonly’
■ restrict role-definer or ‘allow *’ keys
○ Careful key distribution (Ceph and OpenStack nodes)
● To do:
○ Thorough CephX code review by security experts
○ Audit OpenStack deployment tools’ key distribution
○ Improve security documentation
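The admin-key restrictions above can be sketched with the `auth` command; a sketch assuming a running cluster, with hypothetical entity names (`client.ops`, `client.auditor`):

```shell
# Restricted administrative keys instead of blanket 'allow *'
# (entity names are examples; requires an existing cluster and admin key).
ceph auth get-or-create client.ops mon 'allow profile admin'
ceph auth get-or-create client.auditor mon 'allow profile readonly'

# Audit which existing keys still carry overly broad capabilities:
ceph auth list | grep -B2 "allow \*"
```

Keys found by the audit should be reduced to the narrowest profile that still covers their actual use.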
21. Preventing Breaches - Defects
● Static Code Analysis (SCA)
○ Buffer overflows and other code flaws
○ Regular Coverity scans
■ 996 fixed, 284 dismissed, 420 outstanding
■ defect density 0.97
○ cppcheck
○ LLVM: clang/scan-build
● Runtime analysis
○ valgrind memcheck
● Plan
○ Reduce backlog of low-priority issues (e.g., issues in test code)
○ Automated reporting of new SCA issues on pull requests
○ Improve code reviewer awareness of security defects
22. Preventing Breaches - Hardening
● Pen testing
○ Human attempt to subvert security, generally guided by code review
● Fuzz testing
○ Automated attempt to subvert or crash by feeding garbage input
● Hardened build
○ -fPIE / -fPIC
○ -D_FORTIFY_SOURCE=2 -O2 (?)
○ -fstack-protector-strong
○ -Wl,-z,relro,-z,now
○ Check for performance regressions!
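Collected as build variables, the hardening options above look like this; a sketch only, and as the slide notes, the performance impact should be measured before enabling them in production builds:

```shell
# Hardened build flags (a sketch; measure performance regressions first).
HARDEN_CFLAGS="-fPIE -fstack-protector-strong -D_FORTIFY_SOURCE=2 -O2"
HARDEN_LDFLAGS="-pie -Wl,-z,relro,-z,now"

# Typical use with an autotools build:
#   ./configure CFLAGS="$HARDEN_CFLAGS" LDFLAGS="$HARDEN_LDFLAGS"
echo "$HARDEN_CFLAGS $HARDEN_LDFLAGS"
```

PIE plus full RELRO (`-z relro -z now`) makes ROP-style exploitation of a daemon flaw considerably harder; `_FORTIFY_SOURCE` requires optimization (`-O2`) to take effect.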
23. Mitigating Breaches
● Run non-root daemons
○ Prevent escalating privileges to get root
○ Run as ‘ceph’ user and group
○ Pending for Infernalis
● MAC
○ SELinux / AppArmor
○ Profiles for daemons and tools planned for Infernalis
● Run (some) daemons in VMs or containers
○ Monitor and RGW - less resource intensive
○ MDS - maybe
○ OSD - prefers direct access to hardware
● Separate mon admin network
24. Encryption: Data at Rest
● Ceph-disk tool supports dm-crypt
○ Encrypt raw block device (OSD and journal)
○ Allow disks to be safely discarded if key remains secret
● Key management is still very simple
○ Encryption key stored on disk via LUKS
○ LUKS key stored in /etc/ceph/keys
● Plan
○ Petera, a new key escrow project from Red Hat
■ https://github.com/npmccallum/petera
○ Alternative: simple key management via monitor
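Preparing an encrypted OSD with `ceph-disk` can be sketched as below; the device path is a placeholder, the command requires a cluster plus a blank disk, and flag names may vary by Ceph release:

```shell
# Prepare a dm-crypt encrypted OSD (device path is an example).
# The LUKS keys are written to the key directory -- protect and back it up.
ceph-disk prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/keys /dev/sdb
```

Because the data key never leaves the LUKS header and the LUKS key stays on the host, a disk pulled from the chassis (or RMA'd) is unreadable as long as the key directory remains secret.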
25. Encryption: On Wire
● Goal
○ Protect data from someone listening in on the network
○ Protect administrator sessions that configure client keys
● Plan
○ Generate per-session keys based on existing tickets
○ Selectively encrypt monitor administrator sessions
26. Denial of Service attacks
● Limit load from clients
○ Use qemu IO throttling features - set a safe upper bound
● To do:
○ Limit max open sockets per OSD
○ Limit max open sockets per source IP
■ handle in Ceph or in the network layer?
○ Throttle operations per session or per client (vs. just globally)?
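The qemu IO throttling mentioned above can be applied to a running guest through libvirt; a sketch with a hypothetical domain (`guest01`) and disk target (`vda`), and option spellings that may vary between libvirt versions:

```shell
# Cap a guest disk at 500 IOPS and 50 MB/s so one tenant cannot flood
# the shared Ceph cluster (domain/target names are examples).
virsh blkdeviotune guest01 vda \
    --total-iops-sec 500 \
    --total-bytes-sec 52428800 \
    --live
```

Setting these limits per disk in the Nova/libvirt guest definition gives each VM a safe upper bound before its IO ever reaches the OSDs.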
27. CephFS
● No standard virtualization layer (unlike block)
○ Proxy through gateway (NFS?)
○ Filesystem passthrough (9p/virtfs) to host
○ Allow direct access from tenant VM
● Granularity of access control is harder
○ No simple mapping to RADOS objects
● Work in progress
○ root_squash
○ Restrict mount to subtree
○ Restrict mount to user
29. Reactive Security Process
● Community
○ Single point of contact: security@ceph.com
■ Core development team
■ Red Hat, SUSE, Canonical security teams
○ Security-related fixes are prioritized and backported
○ Releases may be accelerated on an ad hoc basis
○ Security advisories go to ceph-announce@ceph.com
● Red Hat Ceph
○ Strict SLA on issues raised with the Red Hat security team
○ Escalation process to Ceph developers
○ Red Hat security team drives the CVE process
○ Hot fixes distributed via Red Hat’s CDN
30. Detecting and Preventing Breaches
● Brute force attacks
○ Good logging of any failed authentication
○ Monitoring is easy with existing tools (e.g., Nagios)
● To do:
○ Automatic blacklisting of IPs/clients after n failed attempts at the Ceph level
● Unauthorized injection of keys
○ Monitor the audit log
■ trigger alerts for auth events -> monitoring
○ Periodic comparison with signed backup of auth database?
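Counting failed authentication attempts per source IP from monitor logs can be sketched with awk; the log lines below are fabricated samples (the real cephx failure message and field positions depend on the Ceph release), so the pattern must be adapted to the actual log format:

```shell
# Fabricated sample monitor log lines for illustration only.
cat > mon.log.sample <<'EOF'
2015-05-20 10:00:01 mon.a 10.0.1.17:0/1 cephx server client.bad: unexpected key: req.key=0 expected_key=deadbeef
2015-05-20 10:00:02 mon.a 10.0.1.17:0/1 cephx server client.bad: unexpected key: req.key=0 expected_key=deadbeef
2015-05-20 10:00:05 mon.a 10.0.1.99:0/1 cephx server client.odd: unexpected key: req.key=0 expected_key=cafebabe
EOF

# Tally auth failures per source IP ($4 is "ip:port/nonce" in the sample).
awk '/unexpected key/ {split($4, a, ":"); fails[a[1]]++}
     END {for (ip in fails) print ip, fails[ip]}' mon.log.sample
```

Feeding such counts into an alerting tool (or a fail2ban-style blacklist) covers the brute-force case until Ceph-level automatic blacklisting exists.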