Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Storage as-a-Service
1. Dell and CEPH
Steve Smith:
Steve_l_smith@dell.com
@SteveSAtDell
Paul Brook
Paul_brook@dell.com
Twitter @PaulBrookAtDell
Ceph Day London
October 22nd 2014
2. Agenda
• Why we are here – we sell CEPH support
• You need hardware to sit this on – here are some ideas
• Some best practices shared with CEPH colleagues this year
• A concept (Research Data) – we would like your input
3. Dell is a certified reseller of Red Hat-Inktank Services, Support and Training
• Need to Access and buy Red Hat Services & Support?
15+ Years of Red Hat and Dell
• Red Hat 1-year /3-year subscription packages
– Inktank Pre-Production subscription
– Gold (24*7) Subscription
• Red Hat Professional Services
– Ceph Pro services Starter Pack
– Additional days services options
– Ceph Training from Red Hat
Or…you can download CEPH for Free
7. Planning your Ceph Implementation
• Business Requirements
– Budget considerations, organisational commitment
– Replacing Enterprise SAN/NAS for cost saving
– xaaS use cases for massive-scale, cost-effective storage
– Avoid lock-in – use open source and industry standards
– Steady-state vs. Spike data usage
• Sizing requirements
– What is the initial storage capacity?
– What is the expected growth rate? (see the sizing sketch after this list)
• Workload requirements
– Does the workload need high performance, or is it more capacity-focused?
– What are IOPS/Throughput requirements?
– What applications will be running on Ceph cluster?
– What type of data will be stored?
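A minimal sizing sketch for the capacity questions above; the growth rate, horizon, replica count, and 70% target fill level are illustrative assumptions, not Dell or Ceph guidance:

```python
# Estimate raw capacity to provision from usable capacity, growth rate,
# replication factor and a target fill level (all numbers hypothetical).

def raw_capacity_tb(initial_usable_tb, annual_growth, years, replicas, max_fill=0.70):
    """Raw TB to provision for the planning horizon."""
    usable_needed = initial_usable_tb * (1 + annual_growth) ** years
    return usable_needed * replicas / max_fill

# Example: 100 TB usable today, 40% yearly growth, 3-year horizon, 3x replication.
print(round(raw_capacity_tb(100, 0.40, 3, 3)))  # ~1176 TB raw
```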
8. Architectural considerations – Redundancy and replication
• Tradeoff between cost and reliability (use-case dependent)
• How many node failures can be tolerated?
• In a multi-rack scenario, should a whole rack failure be tolerated?
• Is there a need for multi-site data replication?
• Erasure coding (more usable capacity from the same raw disks, but more CPU load)
• Plan for redundancy of the monitor nodes – distribute across fault zones
• 3 copies = 8 nines availability – less than 1 second of downtime per year (see the availability check after this list)
• Many, many things affect performance – in Ceph, above Ceph, and below Ceph.
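A back-of-the-envelope check of the "3 copies = 8 nines" claim, assuming independent node failures (which holds only if copies are spread across fault zones) and an illustrative ~99.8% per-node availability:

```python
# Availability of at least one copy, assuming independent failures.

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

def availability(per_copy_availability, copies):
    return 1 - (1 - per_copy_availability) ** copies

a = availability(0.998, copies=3)              # ~99.8% per node (assumed), 3 replicas
downtime_s = (1 - a) * SECONDS_PER_YEAR
print(f"{a:.10f} -> {downtime_s:.2f} s downtime per year")  # ~8 nines, ~0.25 s/year
```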
14. Multi-Site Issues
• Within a CEPH cluster, RADOS enforces strong consistency.
• The writer process waits for the ACK, which happens only after the primary copy, the replicated copies, and the journals have all been written.
• On a WAN this can extend latencies unacceptably (see the latency sketch after this list).
• Alternatives
• For S3/Swift systems, use federated gateways between CEPH clusters; replication between them is eventually consistent.
• For remote backup, use RBD with sync agents and incremental snapshots.
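A minimal sketch of why synchronous replication suffers on a WAN (latency figures are illustrative only): the ACK waits for the slowest copy, so one remote replica dominates every write.

```python
# Synchronous replication: the write completes when the slowest copy lands.

def ack_latency_ms(replica_latencies_ms):
    return max(replica_latencies_ms)

print(ack_latency_ms([2, 3, 3]))   # all copies in one datacentre -> ~3 ms per write
print(ack_latency_ms([2, 3, 45]))  # one copy across a WAN -> ~45 ms per write
```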
15. Recommended Storage Server Configurations
CEPH and Inktank recommendations are a bit out of date.
• CPU – 1 core-GHz per OSD (see the sizing sketch after this list)
– so a 2 x 8-core Intel Haswell 2.0GHz server could support 32 OSDs
– less for AMD
• Memory – 2GB per OSD
– Must be ECC
• Disk Controller – SAS or SATA without an expander for data and journal disks; RAID 1 for operating system disks
• Data Disks – Size doesn't matter! Rebuilds happen across hundreds of placement groups.
– 12 disks seems a good number
• Journal Disks – SSDs – write optimised
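A quick sketch applying the "1 core-GHz and 2GB RAM per OSD" rules of thumb above; the 8GB OS overhead is my assumption, and real limits also depend on controller and network capacity:

```python
# How many OSDs a server can host under the CPU and memory rules of thumb.

def max_osds(cores, ghz_per_core, ram_gb, os_ram_gb=8):
    cpu_limit = int(cores * ghz_per_core)       # 1 core-GHz per OSD
    ram_limit = int((ram_gb - os_ram_gb) / 2)   # 2 GB per OSD, minus assumed OS overhead
    return min(cpu_limit, ram_limit)

# 2 x 8-core Haswell at 2.0 GHz with 64 GB RAM:
print(max_osds(cores=16, ghz_per_core=2.0, ram_gb=64))  # 28 -> RAM is the limit here
```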
17. Memory Considerations
[Diagram: memory channels C0–C7 populated on each of the two CPU sockets]
• Always populate all channels – groups of 8
• Anything less loses significant memory bandwidth
• Speed drops with 3DPC (sometimes 2DPC)
• Use Dual Rank RDIMMs for maximum performance and expandability
• Important to pin processes and their data to the same NUMA node (see the affinity sketch after this list)
• But let OS processes float
• Or try Hyperthreading
• Sensible memory is now 64GB (8 x 8GB RDIMMs)
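A minimal sketch of pinning a process to one NUMA node's cores with Linux CPU affinity; the core-to-node layout below is assumed (check yours with lscpu or numactl --hardware), and in practice numactl or cgroups would normally do this rather than Python:

```python
import os

NODE0_CPUS = set(range(0, 8))    # assumed layout: cores 0-7 on NUMA node 0
NODE1_CPUS = set(range(8, 16))   # assumed layout: cores 8-15 on NUMA node 1

def pin_to_node(pid, node_cpus):
    """Restrict a process (e.g. an OSD) to the cores of one NUMA node (Linux only)."""
    os.sched_setaffinity(pid, node_cpus)

pin_to_node(os.getpid(), NODE0_CPUS)
print(os.sched_getaffinity(0))   # confirm the new CPU set for this process
```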
19. Ceph Gateway Server
• Gateway does CRC32 and MD5 checksumming
– Now included in Intel AVX2 on Haswell
• 64GB memory (minimum sensible)
• 2 separate 10GbE NICs, 1 for client comms, 1 for store/retrieve
• Make sure you have enough file handles – the default is 100; you should start at 4096! (see the sketch after this list)
• Load balancing with multiple gateways
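One way to check and raise the open-file limit from inside a process, mirroring the 4096 figure above; in production the limit would normally be set in limits.conf or the service unit rather than in code:

```python
import resource

# Inspect the current soft/hard limits for open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limits: soft={soft}, hard={hard}")

# Raise the soft limit to 4096 if the hard limit allows it.
wanted = 4096
if soft < wanted and (hard == resource.RLIM_INFINITY or hard >= wanted):
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
```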
20. Ceph Cluster Monitors
• Best practice to deploy monitor role on dedicated hardware
– Not resource intensive but critical – Stewards of the cluster
– Using separate hardware ensures no contention for resources
• Make sure monitor processes are never starved for resources
– If running monitor process on shared hardware, fence off resources
• Deploy an odd number of monitors (3 or 5)
– Need to have an odd number of monitors for quorum voting (see the quorum sketch after this list)
– Clusters < 200 nodes work well with 3 monitors
– Larger clusters may benefit from 5
– Main reason to go to 7 is to have redundancy in fault zones
• Add redundancy to monitor nodes as appropriate
– Make sure the monitor nodes are distributed across fault zones
– Consider refactoring fault zones if needing more than 7 monitors
– Build in redundant power, cooling, disk
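A small sketch of the quorum arithmetic behind the odd-number advice: quorum needs a strict majority, so an even monitor count tolerates no more failures than the odd count below it.

```python
# Monitor failures tolerated while still holding a majority quorum.

def tolerable_failures(monitors):
    quorum = monitors // 2 + 1   # strict majority
    return monitors - quorum

for n in (3, 4, 5, 6, 7):
    print(n, "monitors ->", tolerable_failures(n), "failure(s) tolerated")
# 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3
```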
21. Networking Overview
• Plan for low latency and high bandwidth
• Use 10GbE switches within the rack
• Use 40GbE uplinks between racks in the datacentre
• Use more bandwidth at the backend than on the front end (see the bandwidth sketch after this list)
• Enable Jumbo frames
• Replication is done by the storage not the client
• Client writes to primary and journal
• Primary writes to replicas through back end network
• Backend also does recovery and rebalancing
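A rough illustration of why the backend needs more bandwidth than the front end (numbers are illustrative, and recovery and rebalancing traffic comes on top of this): each client write is re-sent to the remaining replicas over the cluster network.

```python
# Replication traffic generated on the backend network per unit of client writes.

def backend_replication_mbps(client_write_mbps, replicas):
    return client_write_mbps * (replicas - 1)

print(backend_replication_mbps(1000, replicas=3))  # 2000: twice the front-end write rate
```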
22. Potential Dell Server Hardware Choices
• Rackable Storage Node
– Dell PowerEdge R720XD or the new 13G R730/R730XD
• Bladed Storage Node
– Dell PowerEdge C8000XD (disk) and PowerEdge C8220 (CPU)
– 2x Xeon E5-2687 CPU, 128GB RAM
– 2x 400GB SSD drives (OS and optionally journals)
– 12x 3TB NL-SAS drives
– 2x 10GbE, 1x 1GbE, IPMI
• Monitor Node
– Dell PowerEdge R415
– 2x 1TB SATA
– 1x 10GbE
23. Mixed Use Deployments
• For simplicity, dedicate hardware to specific role
– That may not always be practical (e.g., small clusters)
– If needed, can combine multiple functions on same hardware
• Multiple Ceph Roles (e.g., OSD+RGW, OSD+MDS, Mon+RGW)
– Balance IO-intensive with CPU/memory intensive roles
– If both roles are relatively light (e.g., Mon and RGW), they can be combined
• Multiple Applications (e.g., OSD+Compute, Mon+Horizon)
– In OpenStack environment, may need to mix components
– Follow same logic of balancing IO-intensive with CPU intensive
24. Super-size CEPH
• Lots of disk space
• CEPH rules apply
• Great for cold, dark storage
• Surprisingly popular with customers
• 3PB raw in a rack! (see the rough maths below)
R730/R730XD or R720/R720XD + PowerVault JBOD
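A rough check of the 3PB figure using assumed, illustrative hardware counts (JBOD count, bays per enclosure, and drive size are my assumptions, not a quoted Dell configuration):

```python
# Raw capacity in one rack, with every number an assumption for illustration.

jbods_per_rack = 8      # assumed dense JBOD count per rack
drives_per_jbod = 60    # assumed 60-bay enclosures
drive_tb = 6            # assumed 6 TB NL-SAS drives

raw_tb = jbods_per_rack * drives_per_jbod * drive_tb
print(raw_tb, "TB raw ->", round(raw_tb / 1000, 1), "PB")  # 2880 TB ~ 2.9 PB
```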
25. Other Design Guidelines
• Use simple components – don't buy more than you need
– Save money on RAID, redundant NICs, and power supplies, and buy more disks
• Keep networks as flat as possible (East-West)
– VLANs don't scale
– Use Software-Defined Networking for multi-tenancy in the cloud
• Design the fault zones carefully for no single point of failure (NoSPoF)
– Rack
– Row
– Datacentre
27. Concept: Get started?
Keep, Search, Collaborate, Publish
Research Data & Publications
Digital – Pre-Publication (Any Format?)
Digital – Other (Any Format?)
28. Concept: Get started?
Keep, Search, Collaborate, Publish
Research Data & Publications
Digital – Pre-Publication (Any Format?)
Digital – Other (Any Format?)
How to tag metadata?
How to search?
Data security?
Which file types to store?
How long to store?
How to collaborate?
29. Holding a tin cup below a Niagara Falls of data!
Data keeps on coming & coming & coming…
Has anyone else had this problem and already solved it?
Open source is the best protection for longevity. "Web 2.0/social has already solved the scale-storage problem."
30. Solve problems one at a time
OpenStack Layer (Access)
CEPH Storage
Identity Management
Governance, Policy & Control
PUBLISH: Existing publishing routes
31. Solve problems one at a time
OpenStack Layer (Access)
CEPH Storage
Identity Management
Governance, Policy & Control
Start Here
PUBLISH: Existing publishing routes
Editor's notes
Welcome to a short overview of Ceph storage in Dell OpenStack-Powered Cloud Solutions
Ceph is a transformational storage technology available as free open source software. It’s a universal storage solution that provides block, file, and object storage from a scalable cluster built on top of standard utility server hardware.
Dell has partnered with Inktank, the Ceph experts, to bring a validated Ceph storage solution to Dell cloud customers
Suggested notes: Paul – We sell Red Hat/Inktank support, training, and related services. If you want or need it, we can help you get it.
Not even the least bit complicated. But if we are positioning this outside the CEPH community, what is the best way? Cloud-scale, low-cost, flexible storage.