Storage security in a critical enterprise OpenStack environment
Danny Al-Gaaf (Deutsche Telekom AG), Sage Weil (Red Hat)
OpenStack Summit 2015 - Vancouver
Overview
● Secure NFV cloud at DT
● Attack surface
● Proactive countermeasures
○ Setup
○ Vulnerability prevention
○ Breach mitigation
● Reactive countermeasures
○ 0-days, CVEs
○ Security support SLA and lifecycle
● Conclusions
Secure NFV Cloud @ DT
NFV Cloud @ Deutsche Telekom
● Datacenter design
○ BDCs
■ few but classic DCs
■ high SLAs for infrastructure and services
■ for private/customer data and services
○ FDCs
■ small but many
■ near to the customer
■ lower SLAs, can fail at any time
■ services:
● spread over many FDCs
● failures are handled by services and not the infrastructure
High Security Requirements
● Multiple security placement zones (PZ)
○ e.g. EHD, DMZ, MZ, SEC, Management
○ TelcoWG “Security Segregation” use case
● Separation required for:
○ compute
○ networks
○ storage
● Protect against many attack vectors
● Enforced and reviewed by security department
● Run telco core services @ OpenStack/KVM/Ceph
Ceph and OpenStack
Ceph Architecture
Solutions for telco services
● Separation between security zones needed
● Physical separation
○ Large number of clusters (>100)
○ Large hardware demand (compute and storage)
○ High maintenance effort
○ Less flexibility
● RADOS pool separation
○ Much more flexible
○ Efficient use of hardware
● Question:
○ Can we get the same security as physical separation?
Placement Zones
● Separate RADOS pool(s) for each security zone
○ Limit access using Ceph capabilities
● OpenStack AZs as PZs
● Cinder
○ Configure one backend/volume type per pool (with own key)
○ Need to map between AZs and volume types via policy
● Glance
○ Lacks separation between control and compute/storage layer
○ Separate read-only vs management endpoints
● Manila
○ Currently not planned for production use with CephFS
○ May use RBD via NFS
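The pool-per-zone approach can be sketched as follows: each placement zone gets its own RADOS pool and its own CephX key whose OSD capability names only that pool, and each OpenStack AZ maps to a Cinder volume type for the same zone. The zone names, pool and client naming scheme, and volume-type names are hypothetical; the capability strings follow the style of the Ceph/OpenStack integration docs of this period and should be checked against your Ceph release.

```python
"""Hypothetical layout: one RADOS pool and one CephX key per placement zone."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlacementZone:
    name: str     # placement zone, also used as the OpenStack AZ name
    pool: str     # dedicated RADOS pool for this zone
    client: str   # CephX entity used by this zone's Cinder backend


def zone_for(name: str) -> PlacementZone:
    # Hypothetical naming scheme: pool "volumes-<zone>", key "client.cinder-<zone>"
    return PlacementZone(name, f"volumes-{name}", f"client.cinder-{name}")


def auth_command(zone: PlacementZone) -> list[str]:
    """argv for `ceph auth get-or-create` with OSD caps limited to the zone's pool."""
    osd_caps = ("allow class-read object_prefix rbd_children, "
                f"allow rwx pool={zone.pool}")
    return ["ceph", "auth", "get-or-create", zone.client,
            "mon", "allow r", "osd", osd_caps]


def pools_in_caps(osd_caps: str) -> set[str]:
    """Collect every pool named in an OSD cap string (simple parser for this sketch)."""
    pools = set()
    for clause in osd_caps.split(","):
        for token in clause.split():
            if token.startswith("pool="):
                pools.add(token[len("pool="):])
    return pools


if __name__ == "__main__":
    zones = [zone_for(z) for z in ("ehd", "dmz", "mz")]
    # Hypothetical policy: AZ "<zone>" maps to Cinder volume type "rbd-<zone>"
    az_to_volume_type = {z.name: f"rbd-{z.name}" for z in zones}
    for z in zones:
        cmd = auth_command(z)
        # A zone's key must never name another zone's pool
        assert pools_in_caps(cmd[-1]) == {z.pool}
        print(z.name, az_to_volume_type[z.name], " ".join(cmd[:4]))
```

The commands are only printed here; each Cinder RBD backend would then set `rbd_pool` and `rbd_user` to its own zone's pool and key.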
Attack Surface
RadosGW attack surface
● S3/Swift
○ Network access to gateway only
○ No direct access for consumer to other Ceph daemons
● Single API attack surface
RBD attack surface
● Protection from hypervisor block layer
○ No network access or CephX keys needed at guest level
● Issue:
○ hypervisor is software and therefore not 100% secure…
■ e.g., VENOM (CVE-2015-3456)!
Host attack surface
● If KVM is compromised, the attacker ...
○ has access to neighbor VMs
○ has access to local Ceph keys
○ has access to Ceph public network and Ceph daemons
● Firewalls, deep packet inspection (DPI), ...
○ partly impractical due to the protocols used
○ impact on performance and cost
● Bottom line: Ceph daemons must resist attack
○ C/C++ is harder to secure than e.g. Python
○ Homogeneous: if one daemon is vulnerable, all in the cluster are!
○ Risk of denial-of-service
Network attack surface
● Client/cluster sessions are not encrypted
○ Sniffer can recover any data read or written
● Sessions are authenticated
○ Attacker cannot impersonate clients or servers
○ Attacker cannot mount man-in-the-middle attacks
Denial of Service
● Scenarios
○ Submit many / large / expensive IOs
■ use QEMU IO throttling!
○ Open many connections
○ Use flaws to crash Ceph daemons
○ Identify non-obvious but expensive features of client/OSD interface
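QEMU's IO throttling bounds how fast one guest can submit IO. The underlying idea is a token bucket, sketched below as a generic illustration (this is not QEMU's implementation, and the rates are hypothetical): tokens refill at a fixed rate up to a burst cap, and an IO is admitted only if enough tokens are available.

```python
"""Generic token bucket illustrating per-guest IO throttling (not QEMU's code)."""
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    rate: float    # tokens (IOs or bytes) refilled per second: the sustained limit
    burst: float   # bucket capacity: the largest burst admitted at once
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.burst          # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0, now: float | None = None) -> bool:
        """Admit an IO costing `cost` tokens, or return False so the caller delays it."""
        now = time.monotonic() if now is None else now
        self.tokens = min(self.burst, self.tokens + max(0.0, now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=500, burst=50)   # hypothetical: 500 IOPS, bursts of 50
    admitted = sum(bucket.allow() for _ in range(200))
    print(f"admitted {admitted} of 200 back-to-back IOs")
```

The burst size lets a guest absorb short spikes while the refill rate sets the safe upper bound the OSDs must be able to sustain per guest.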
Proactive Countermeasures
Deployment and Setup
● Network
○ Always use separate cluster and public networks
○ Always separate your control nodes from other networks
○ Don’t expose to the open internet
○ Encrypt inter-datacenter traffic
● Avoid hyper-converged infrastructure
○ Isolate compute and storage resources
○ Scale them independently
○ Risk mitigation if daemons are compromised or DoS’d
○ Don’t mix
■ compute and storage
■ control nodes (OpenStack and Ceph)
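A small pre-deployment check for these network rules, assuming the public, cluster and control subnets are known up front (the IPv4 subnets used in any usage are hypothetical). The first two correspond to the `public network` and `cluster network` settings in ceph.conf; the check flags overlaps and any network that is globally routable.

```python
"""Pre-deployment check: public, cluster and control networks must be disjoint and private."""
from __future__ import annotations

import ipaddress
from itertools import combinations


def check_networks(public: str, cluster: str, control: str) -> list[str]:
    """Return a list of problems; an empty list means the layout passes."""
    nets = {
        "public": ipaddress.ip_network(public),
        "cluster": ipaddress.ip_network(cluster),
        "control": ipaddress.ip_network(control),
    }
    problems = []
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} network {net_a} overlaps {b} network {net_b}")
    for name, net in nets.items():
        # Nothing here should be reachable from the open internet
        if net.is_global:
            problems.append(f"{name} network {net} is globally routable")
    return problems
```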
Deploying RadosGW
● Big and easy target through HTTP(S) protocol
● Small appliance per tenant with
○ Separate network
○ SSL-terminating proxy forwarding requests to radosgw
○ WAF (mod_security) to filter requests
○ Placed in secure/managed zone
● Don’t share buckets/users between tenants
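One filtering step such a per-tenant appliance might apply, sketched under stated assumptions: the proxy knows which buckets its tenant owns and forwards only path-style S3 requests for those buckets. Virtual-host-style requests (bucket in the Host header) are not handled, and `bucket_of`/`allow_request` are hypothetical helpers, not part of radosgw.

```python
"""Hypothetical per-tenant proxy filter: forward only requests for the tenant's own buckets."""
from __future__ import annotations

from urllib.parse import unquote, urlsplit


def bucket_of(url: str) -> str | None:
    """Bucket name from a path-style S3 URL ("/bucket/key"), or None if unusable."""
    segments = urlsplit(url).path.split("/")
    # Reject dot-segments (plain or percent-encoded) rather than normalising them
    if any(unquote(seg) in (".", "..") for seg in segments):
        return None
    bucket = unquote(segments[1]) if len(segments) > 1 else ""
    # An encoded "/" or an empty name cannot be a valid bucket
    if not bucket or "/" in bucket:
        return None
    return bucket


def allow_request(url: str, tenant_buckets: frozenset[str]) -> bool:
    """True only if the request targets one of this tenant's buckets."""
    bucket = bucket_of(url)
    return bucket is not None and bucket in tenant_buckets
```

Rejecting anything ambiguous, rather than trying to interpret it, keeps the filter simple and fails closed.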
Ceph security: CephX
● Monitors are trusted key servers
○ Store copies of all entity keys
○ Each key has an associated “capability”
■ Plaintext description of what the key user is allowed to do
● What you get
○ Mutual authentication of client + server
○ Extensible authorization w/ “capabilities”
○ Protection from man-in-the-middle, TCP session hijacking
● What you don’t get
○ Secrecy (encryption over the wire)
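To illustrate what mutual authentication buys, here is a generic shared-secret challenge-response in which each side proves knowledge of the key without sending it, and fresh nonces prevent replay. This is a simplified sketch, not the CephX protocol itself (CephX uses Kerberos-like tickets issued by the monitors).

```python
"""Generic mutual challenge-response over a shared key (illustration, not CephX)."""
import hashlib
import hmac
import secrets


def prove(key: bytes, role: bytes, own_nonce: bytes, peer_nonce: bytes) -> bytes:
    """MAC over both nonces; binding the role stops a proof being reflected back."""
    msg = role + b"|" + own_nonce + b"|" + peer_nonce
    return hmac.new(key, msg, hashlib.sha256).digest()


def handshake(client_key: bytes, server_key: bytes) -> bool:
    """Simulate both sides in one process; True only if each verified the other."""
    # Fresh random nonces make every exchange unique, so old proofs cannot be replayed
    c_nonce, s_nonce = secrets.token_bytes(16), secrets.token_bytes(16)
    # Server proves knowledge of its key; the client checks with its own copy
    s_proof = prove(server_key, b"server", s_nonce, c_nonce)
    client_ok = hmac.compare_digest(s_proof, prove(client_key, b"server", s_nonce, c_nonce))
    # Client proves knowledge of its key; the server checks with its own copy
    c_proof = prove(client_key, b"client", c_nonce, s_nonce)
    server_ok = hmac.compare_digest(c_proof, prove(server_key, b"client", c_nonce, s_nonce))
    return client_ok and server_ok
```

Note what the sketch does not give: the payload that follows is still sent in the clear, matching the "what you don't get" point above.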
Ceph security: CephX take-aways
● Monitors must be secured
○ Protect the key database
● Key management is important
○ Separate key for each Cinder backend/AZ
○ Restrict capabilities associated with each key
○ Limit administrators’ power
■ use ‘allow profile admin’ and ‘allow profile readonly’
■ restrict role-definer or ‘allow *’ keys
○ Careful key distribution (Ceph and OpenStack nodes)
● To do:
○ Thorough CephX code review by security experts
○ Audit OpenStack deployment tools’ key distribution
○ Improve security documentation
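The key-management take-aways lend themselves to a periodic audit. The sketch below assumes the auth database has been dumped as JSON shaped like `ceph auth list --format json` output (verify the shape on your release) and flags client keys, other than a hypothetical allowlist of admin entities, that hold `allow *` or OSD caps not restricted to a pool.

```python
"""Flag over-broad CephX client capabilities in an auth dump (sketch)."""
from __future__ import annotations

import json

# Hypothetical site policy: only these entities may hold 'allow *' caps
ADMIN_ENTITIES = {"client.admin"}


def audit(auth_dump_json: str) -> list[str]:
    """Return human-readable findings for client keys with over-broad caps."""
    findings = []
    for entry in json.loads(auth_dump_json)["auth_dump"]:
        entity = entry["entity"]
        # Daemon keys (osd.N, mds.X, ...) typically hold broad caps; audit clients only
        if not entity.startswith("client.") or entity in ADMIN_ENTITIES:
            continue
        for daemon, cap in entry.get("caps", {}).items():
            if "allow *" in cap:
                findings.append(f"{entity}: {daemon} cap grants 'allow *'")
            elif daemon == "osd" and "pool=" not in cap:
                findings.append(f"{entity}: osd cap is not restricted to a pool")
    return findings
```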
Preventing Breaches - Defects
● Static Code Analysis (SCA)
○ Buffer overflows and other code flaws
○ Regular Coverity scans
■ 996 fixed, 284 dismissed; 420 outstanding
■ defect density 0.97
○ cppcheck
○ LLVM: clang/scan-build
● Runtime analysis
○ valgrind memcheck
● Plan
○ Reduce backlog of low-priority issues (e.g., issues in test code)
○ Automated reporting of new SCA issues on pull requests
○ Improve code reviewer awareness of security defects
Preventing Breaches - Hardening
● Pen-testing
○ human attempt to subvert security, generally guided by code review
● Fuzz testing
○ automated attempt to subvert security or crash daemons by feeding them garbage input
● Harden build
○ -fPIE -pie (executables), -fPIC (libraries)
○ -D_FORTIFY_SOURCE=2 -O2 (?)
○ -fstack-protector-strong
○ -Wl,-z,relro,-z,now
○ Check for performance regression!
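Fuzz testing in miniature: the sketch feeds random and mutated inputs to a hypothetical frame parser (`parse_frame`: 1-byte type, 2-byte big-endian length, payload) and reports any input that fails with something other than a clean `ValueError`. It only illustrates the idea; a coverage-guided fuzzer such as AFL or libFuzzer would be the practical choice for C/C++ daemons.

```python
"""Minimal random fuzzer for a hypothetical frame parser (illustration only)."""
from __future__ import annotations

import random
import struct


def parse_frame(data: bytes) -> tuple[int, bytes]:
    """Hypothetical wire format: 1-byte type, 2-byte big-endian length, payload."""
    if len(data) < 3:
        raise ValueError("short header")
    msg_type, length = struct.unpack(">BH", data[:3])
    payload = data[3:]
    if length != len(payload):
        raise ValueError("length mismatch")
    return msg_type, payload


def fuzz(parser, seed: int = 0, iterations: int = 10_000) -> list[bytes]:
    """Return inputs that made `parser` fail with anything other than ValueError."""
    rng = random.Random(seed)                     # seeded so failures are reproducible
    valid = struct.pack(">BH", 1, 4) + b"ping"
    crashes = []
    for _ in range(iterations):
        if rng.random() < 0.5:                    # pure garbage of random length
            data = bytes(rng.randrange(256) for _ in range(rng.randrange(16)))
        else:                                     # a valid frame with a few bytes corrupted
            buf = bytearray(valid)
            for _ in range(rng.randrange(1, 4)):
                buf[rng.randrange(len(buf))] = rng.randrange(256)
            data = bytes(buf)
        try:
            parser(data)
        except ValueError:
            pass                                  # clean rejection is the desired outcome
        except Exception:                         # anything else would crash a real daemon
            crashes.append(data)
    return crashes


if __name__ == "__main__":
    print(f"{len(fuzz(parse_frame))} crashing inputs found")
```

A deliberately buggy parser such as `lambda d: d[5]` shows up immediately, because short inputs raise `IndexError` instead of `ValueError`.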
Mitigating Breaches
● Run non-root daemons
○ Prevent privilege escalation to root
○ Run as ‘ceph’ user and group
○ Pending for Infernalis
● MAC
○ SELinux / AppArmor
○ Profiles for daemons and tools planned for Infernalis
● Run (some) daemons in VMs or containers
○ Monitor and RGW - less resource-intensive
○ MDS - maybe
○ OSD - prefers direct access to hardware
● Separate mon admin network
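Once daemons are meant to run as the `ceph` user, a quick Linux-only check can confirm none still runs as root. The sketch reads the `Uid:` line of `/proc/<pid>/status` (real, effective, saved and filesystem UID); treating every process whose name starts with `ceph-` as a Ceph daemon is an assumption.

```python
"""Report Ceph daemons still running as root (Linux /proc; sketch)."""
from __future__ import annotations

from pathlib import Path


def uids_from_status(status_text: str) -> tuple[int, ...]:
    """Parse the real, effective, saved and filesystem UIDs from /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            return tuple(int(x) for x in line.split()[1:])
    raise ValueError("no Uid: line found")


def root_ceph_daemons(proc: Path = Path("/proc")) -> list[tuple[int, str]]:
    """(pid, name) for each process named ceph-* that has any UID equal to 0."""
    found = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            status = (entry / "status").read_text()
        except OSError:
            continue  # process exited or status is unreadable
        name = next((ln.split(":", 1)[1].strip() for ln in status.splitlines()
                     if ln.startswith("Name:")), "")
        if name.startswith("ceph-") and 0 in uids_from_status(status):
            found.append((int(entry.name), name))
    return found


if __name__ == "__main__" and Path("/proc").is_dir():
    for pid, name in root_ceph_daemons():
        print(f"pid {pid} ({name}) is running with a root UID")
```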
Encryption: Data at Rest
● The ceph-disk tool supports dm-crypt
○ Encrypt raw block device (OSD and journal)
○ Allow disks to be safely discarded if key remains secret
● Key management is still very simple
○ Encryption key stored on disk via LUKS
○ LUKS key stored in /etc/ceph/keys
● Plan
○ Petera, a new key escrow project from Red Hat
■ https://github.com/npmccallum/petera
○ Alternative: simple key management via monitor
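Because the LUKS keys live in /etc/ceph/keys, their file permissions matter. A minimal check, assuming the site policy is root ownership and no group or other access (that policy is an assumption, not a Ceph default):

```python
"""Check LUKS key files for loose permissions (assumed policy: root-owned, no group/other access)."""
from __future__ import annotations

import stat
from pathlib import Path


def key_file_problems(name: str, mode: int, uid: int) -> list[str]:
    """Pure policy check on one file's mode bits and owner."""
    problems = []
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{name}: mode {stat.filemode(mode)} grants group/other access")
    if uid != 0:
        problems.append(f"{name}: owned by uid {uid}, not root")
    return problems


def weak_key_files(key_dir: str = "/etc/ceph/keys") -> list[str]:
    """Apply the policy to every regular file in `key_dir`."""
    problems: list[str] = []
    for path in sorted(Path(key_dir).iterdir()):
        if path.is_file():
            st = path.stat()
            problems += key_file_problems(str(path), st.st_mode, st.st_uid)
    return problems
```

Keeping the policy check separate from the filesystem walk makes it easy to test and to reuse against other key directories.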
Encryption: On Wire
● Goal
○ Protect data from someone listening in on the network
○ Protect administrator sessions configuring client keys
● Plan
○ Generate per-session keys based on existing tickets
○ Selectively encrypt monitor administrator sessions
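One way to generate per-session keys from existing tickets is to run the ticket's secret through a key-derivation function together with both peers' nonces, so that each session gets a fresh key. The sketch uses HKDF-SHA256 (RFC 5869) built from the standard library; it illustrates the idea rather than Ceph's actual design, and the `info` label is hypothetical.

```python
"""Per-session key derivation from a ticket secret via HKDF-SHA256 (RFC 5869); sketch."""
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """RFC 5869 extract-then-expand with SHA-256."""
    if length > 255 * 32:
        raise ValueError("length too large for HKDF-SHA256")
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()   # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                             # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def session_key(ticket_secret: bytes, client_nonce: bytes, server_nonce: bytes) -> bytes:
    """Fresh 32-byte key per session: same ticket, different nonces, different key."""
    return hkdf_sha256(ticket_secret, salt=client_nonce + server_nonce,
                       info=b"hypothetical-ceph-session-v1")
```

The derived key would then feed an authenticated cipher for the session; that part is out of scope here.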
Denial of Service attacks
● Limit load from client
○ Use QEMU IO throttling features - set a safe upper bound
● To do:
○ Limit max open sockets per OSD
○ Limit max open sockets per source IP
■ handle in Ceph or in the network layer?
○ Throttle operations per-session or per-client (vs. just globally)?
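The per-source-IP socket limit amounts to simple bookkeeping wherever it is enforced. A sketch, with a hypothetical limit and no claim about where Ceph would hook it in:

```python
"""Per-source-IP cap on concurrent connections (bookkeeping sketch)."""
from collections import Counter


class ConnectionLimiter:
    def __init__(self, max_per_ip: int) -> None:
        self.max_per_ip = max_per_ip
        self.open = Counter()  # source IP -> currently open connections

    def on_connect(self, ip: str) -> bool:
        """Accept the connection only if this IP is below its limit."""
        if self.open[ip] >= self.max_per_ip:
            return False
        self.open[ip] += 1
        return True

    def on_close(self, ip: str) -> None:
        """Release one slot; closing an unknown IP is a no-op."""
        n = self.open.get(ip, 0)
        if n <= 1:
            self.open.pop(ip, None)  # drop idle entries so the table stays small
        else:
            self.open[ip] = n - 1
```

The same structure works at the network layer (e.g. in a firewall) or inside the daemon's accept loop; the open question on the slide is which place is easier to operate.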
CephFS
● No standard virtualization layer (unlike block)
○ Proxy through gateway (NFS?)
○ Filesystem passthrough (9p/virtfs) to host
○ Allow direct access from tenant VM
● Granularity of access control is harder
○ No simple mapping to RADOS objects
● Work in progress
○ root_squash
○ Restrict mount to subtree
○ Restrict mount to user
Reactive Countermeasures
Reactive Security Process
● Community
○ Single point of contact: security@ceph.com
■ Core development team
■ Red Hat, SUSE, Canonical security teams
○ Security-related fixes are prioritized and backported
○ Releases may be accelerated on an ad hoc basis
○ Security advisories to ceph-announce@ceph.com
● Red Hat Ceph
○ Strict SLA on issues raised with Red Hat security team
○ Escalation process to Ceph developers
○ Red Hat security team drives CVE process
○ Hot fixes distributed via Red Hat’s CDN
Detecting and Preventing Breaches
● Brute force attacks
○ Good logging of any failed authentication
○ Easy to monitor with existing tools, e.g. Nagios
● To do:
○ Automatically blacklist IPs/clients after n failed attempts at the Ceph level
● Unauthorized injection of keys
○ Monitor the audit log
■ trigger alerts for auth events -> monitoring
○ Periodic comparison with signed backup of auth database?
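Both to-do items can be sketched in a few lines: a sliding-window counter that blacklists a client after n failed authentications, and a check of the current auth entries (entity mapped to a caps string here, for simplicity) against an HMAC-signed baseline. Thresholds, the window length and the signing key are hypothetical, and the signing key must be stored off the cluster for the comparison to mean anything.

```python
"""Sketch: blacklist after n failed auths, and detect keys injected since a signed baseline."""
from __future__ import annotations

import hashlib
import hmac
import json
from collections import defaultdict, deque


class FailedAuthTracker:
    def __init__(self, max_failures: int = 5, window_s: float = 60.0) -> None:
        self.max_failures, self.window_s = max_failures, window_s
        self.failures: dict[str, deque[float]] = defaultdict(deque)
        self.blacklist: set[str] = set()

    def record_failure(self, client: str, now: float) -> bool:
        """Record a failed auth at time `now`; return True if the client is blacklisted."""
        q = self.failures[client]
        q.append(now)
        while q and now - q[0] > self.window_s:   # forget failures outside the window
            q.popleft()
        if len(q) >= self.max_failures:
            self.blacklist.add(client)
        return client in self.blacklist


def sign_snapshot(entities: dict[str, str], signing_key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON dump of entity -> caps."""
    blob = json.dumps(entities, sort_keys=True).encode()
    return hmac.new(signing_key, blob, hashlib.sha256).hexdigest()


def injected_entities(baseline: dict[str, str], signature: str,
                      current: dict[str, str], signing_key: bytes) -> set[str]:
    """Verify the signed baseline, then report entities added or changed since."""
    if not hmac.compare_digest(sign_snapshot(baseline, signing_key), signature):
        raise ValueError("baseline signature mismatch; backup may be tampered with")
    return {e for e, caps in current.items() if baseline.get(e) != caps}
```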
Conclusions
Summary
● Reactive processes are in place
○ security@ceph.com, CVEs, downstream product updates, etc.
● Proactive measures in progress
○ Improving code quality (SCA, etc.)
○ Unprivileged daemons
○ MAC (SELinux, AppArmor)
○ Encryption
● Progress on defining security best practices
○ Document best practices for security
● Ongoing process
Get involved!
● OpenStack
○ Telco Working Group
■ #openstack-nfv
○ Cinder, Glance, Manila, ...
● Ceph
○ https://ceph.com/community/contribute/
○ ceph-devel@vger.kernel.org
○ IRC: OFTC
■ #ceph
■ #ceph-devel
○ Ceph Developer Summit
Danny Al-Gaaf, Senior Cloud Technologist: danny.al-gaaf@telekom.de (IRC: dalgaaf)
Sage Weil, Ceph Principal Architect: sage@redhat.com (IRC: sage)
THANK YOU!
