SF BAY AREA CEPH
USERS GROUP

INAUGURAL MEETUP

Thursday, January 16, 14
AGENDA
Intro to Ceph
Ceph Networking
Public Topologies
Cluster Topologies
Network Hardware

THE FORECAST

By 2020, over 39 ZB of data will be stored.
1.5 ZB are stored today.

THE PROBLEM

Growth of data
- Existing systems don't scale

IT storage budget
- Increasing cost and complexity

(Chart: growth of data vs. IT storage budget, 2010 to 2020)
- Need to invest in new platforms ahead of time
THE SOLUTION

PAST: SCALE UP

FUTURE: SCALE OUT

CEPH
INTRO TO CEPH
- Distributed storage system
- Horizontally scalable
- No single point of failure
- Self-healing and self-managing
- Runs on commodity hardware
- LGPL 2.1 license

ARCHITECTURE

SERVICE COMPONENTS (PART 1)
MONITOR
- Paxos for consensus
- Maintains cluster state
- Typically 3-5 nodes
- NOT in the write path

OSD (object storage daemon)
- Object storage interface
- Gossips with peers
- Data lives here

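Monitors agree on cluster state through Paxos, which can only make progress while a strict majority of monitors is up. A minimal sketch of the quorum arithmetic behind the "3-5 nodes" guideline, assuming simple majority quorum; `monitor_quorum` is a hypothetical helper, not part of Ceph:

```python
def monitor_quorum(monitors: int) -> tuple[int, int]:
    """Return (quorum size, monitor failures tolerated) for a majority-based quorum."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    quorum = monitors // 2 + 1      # strict majority
    tolerated = monitors - quorum   # monitors that can fail while a majority survives
    return quorum, tolerated


for n in range(1, 8):
    q, f = monitor_quorum(n)
    print(f"{n} monitors: quorum {q}, tolerates {f} failure(s)")
```

The output shows why odd counts are preferred: 4 monitors tolerate the same single failure as 3, and 6 tolerate the same two failures as 5.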
SERVICE COMPONENTS (PART 2)
RADOS GATEWAY
- Provides S3/Swift compatibility
- Scales out

METADATA SERVER (MDS)
- Manages CephFS file system metadata
- Only required for CephFS
- Dynamic subtree partitioning

CRUSH
- Ceph uses CRUSH for data placement
- Aware of cluster topology
- Statistically even distribution across a pool
- Supports asymmetric nodes and devices
- Hierarchical weighting
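CRUSH is a deterministic, weighted, hierarchical pseudo-random placement function: any client can compute where an object's replicas live from the cluster map alone, with no lookup table. The sketch below is not Ceph's CRUSH implementation. It is a simplified illustration modelled loosely on CRUSH's "straw" style of weighted bucket selection, assuming SHA-256 in place of Ceph's own hash, a two-level host/OSD hierarchy, and hypothetical names (`straw2_choose`, `place_replicas`, the host and OSD names):

```python
import hashlib
import math


def _unit_hash(*parts) -> float:
    """Deterministically map the inputs to a float in (0, 1]; SHA-256 stands in for CRUSH's hash."""
    digest = hashlib.sha256("/".join(str(p) for p in parts).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def straw2_choose(pg_id, rank, items):
    """Pick one key from {name: weight}; each item wins with probability weight / total weight."""
    best, best_draw = None, -math.inf
    for name, weight in items.items():
        if weight <= 0:
            continue  # zero-weight items (e.g. drained OSDs) are never chosen
        draw = math.log(_unit_hash(pg_id, rank, name)) / weight
        if draw > best_draw:
            best, best_draw = name, draw
    return best


def place_replicas(pg_id, hosts, size, max_tries=50):
    """Choose `size` OSDs on distinct hosts. `hosts` maps host -> {osd: weight}."""
    host_weights = {host: sum(osds.values()) for host, osds in hosts.items()}
    placement, used_hosts = [], set()
    for rank in range(max_tries):
        if len(placement) == size:
            break
        host = straw2_choose(pg_id, rank, host_weights)
        if host is None or host in used_hosts:
            continue  # collision: retry with the next rank so the host failure domain holds
        used_hosts.add(host)
        placement.append(straw2_choose(pg_id, rank, hosts[host]))
    return placement


cluster = {
    "host-a": {"osd.0": 1.0, "osd.1": 1.0},
    "host-b": {"osd.2": 1.0, "osd.3": 1.0},
    "host-c": {"osd.4": 2.0},
    "host-d": {"osd.5": 1.0},
}
print(place_replicas("1.2f", cluster, size=3))
```

Every client running this gets the same answer for the same placement group, and giving a host or OSD more weight raises its share of placements proportionally, which is how asymmetric nodes and devices are accommodated.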

DATA PLACEMENT

POOLS
- Groupings of OSDs
- Both physical and logical, for example:
  - Volumes / images
  - Hot SSD pool
  - Cold SATA pool
  - DMCrypt pool

REPLICATION
- Original data durability mechanism
- Ceph creates N replicas of each RADOS object
- Uses CRUSH to determine replica placement
- Required for mutable objects (RBD, CephFS)
- More reasonable for smaller installations

ERASURE CODING (Firefly release)
- (8:4) MDS (maximum distance separable) code in this example
- 1.5x storage overhead
- 8 units of client data to write
- 4 parity units generated using FEC
- All 12 units placed with CRUSH
- Any 8 of the 12 units satisfy a read

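The overhead and read figures above follow directly from the code parameters. A minimal sketch, assuming a k:m MDS code (any k of the k+m units reconstruct the data) and, for the reconstruction demo, the simplest MDS code there is: a single XOR parity unit (m = 1). An 8:4 profile would use a Reed-Solomon-style code rather than XOR; all function names here are illustrative:

```python
from functools import reduce


def replication_profile(n):
    """Raw-capacity overhead and failures tolerated for N-way replication."""
    return {"overhead": float(n), "tolerates": n - 1, "units_per_read": 1}


def erasure_profile(k, m):
    """Same figures for a k:m MDS erasure code."""
    return {"overhead": (k + m) / k, "tolerates": m, "units_per_read": k}


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode_single_parity(data_units):
    """k equal-length data units -> k + 1 units; the extra unit is the XOR of all data units."""
    return data_units + [reduce(xor_bytes, data_units)]


def recover_missing(units):
    """Rebuild the single unit marked None by XOR-ing all surviving units."""
    missing = units.index(None)
    survivors = [u for u in units if u is not None]
    rebuilt = list(units)
    rebuilt[missing] = reduce(xor_bytes, survivors)
    return rebuilt


print(replication_profile(3))  # {'overhead': 3.0, 'tolerates': 2, 'units_per_read': 1}
print(erasure_profile(8, 4))   # {'overhead': 1.5, 'tolerates': 4, 'units_per_read': 8}

stripe = encode_single_parity([b"ceph", b"rado", b"s-ec"])
damaged = stripe[:1] + [None] + stripe[2:]  # lose one data unit
assert recover_missing(damaged) == stripe
```

The comparison makes the trade-off concrete: 3x replication survives two failures at 3.0x raw capacity, while 8:4 survives four at 1.5x, at the cost of touching 8 units per read.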
CLIENT COMPONENTS
Native API
- Mutable object store
- Many language bindings
- Object classes

CephFS
- Linux kernel CephFS client since 2.6.34
- FUSE client
- Hadoop JNI bindings

CLIENT COMPONENTS
Block Storage
- Linux kernel RBD client since 2.6.37
- KVM/QEMU integration
- Xen integration

S3/Swift
- RESTful interfaces (HTTP)
- CRUD operations
- Usage accounting for billing

Ceph Networking
INFINIBAND
- Currently only supported via IPoIB
- Accelio (libxio) integration in Ceph is in early stages
- Accelio supports multiple transports: RDMA, TCP, and shared memory
- Accelio supports multiple RDMA transports (IB, RoCE, iWARP)

ETHERNET
- Tried and true
- Proven at scale
- Economical
- Many suitable vendors

10GbE or 1GbE
- Cost of 10GbE trending downward
- White box switches are turning up the heat on incumbent vendors
- Twinax is relatively inexpensive and low power
- SFP+ is versatile with respect to distance
- Single 10GbE for object storage
- Dual 10GbE for block storage (separate public and cluster networks)
- Bonding many 1GbE links adds lots of complexity
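A back-of-envelope sketch of why block storage benefits from a separate cluster network. It assumes Ceph's primary-copy write path (the client sends each write once to the primary OSD over the public network; the primary then forwards size - 1 full copies, or, for an erasure-coded pool, the other k + m - 1 chunks of size 1/k, over the cluster network) and ignores protocol overhead, reads, and recovery. `write_traffic_gbps` is a hypothetical helper:

```python
def write_traffic_gbps(client_write_gbps, size=None, k=None, m=None):
    """Return aggregate (public, cluster) network load in Gbit/s for a client write rate."""
    if size is not None:                   # replicated pool: primary forwards size - 1 copies
        cluster = client_write_gbps * (size - 1)
    elif k is not None and m is not None:  # EC pool: primary sends k + m - 1 chunks of 1/k each
        cluster = client_write_gbps * (k + m - 1) / k
    else:
        raise ValueError("give either size, or both k and m")
    return client_write_gbps, cluster


print(write_traffic_gbps(4.0, size=3))      # (4.0, 8.0)
print(write_traffic_gbps(4.0, k=8, m=4))    # (4.0, 5.5)
```

With 3x replication the cluster network carries twice the client write rate, so sustained block writes can saturate a single shared 10GbE link long before the disks are busy.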

IPv4 or IPv6 Native
- It's 2014; is this really a question?
- Ceph fully supports both modes of operation
- Hierarchical allocation models allow "roll-up" of routes
- Optimal efficiency in the RIB
- Some tools believe the earth is flat
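Route roll-up in practice: if each rack receives a consecutive /64 out of a larger block, the routes for many racks summarize into a single prefix. A small sketch using Python's standard `ipaddress` module; the 2001:db8::/32 documentation range and the one-/64-per-rack plan are illustrative assumptions, not a recommended addressing plan:

```python
import ipaddress

# Hypothetical plan: one /64 per rack, allocated consecutively from a site /56.
site = ipaddress.ip_network("2001:db8:0:100::/56")
rack_subnets = list(site.subnets(new_prefix=64))[:8]  # the first eight racks

summary = list(ipaddress.collapse_addresses(rack_subnets))
print(f"{len(rack_subnets)} rack routes -> {', '.join(map(str, summary))}")
# 8 rack routes -> 2001:db8:0:100::/61
```

Eight routes become one RIB entry; the same holds at every level of the hierarchy as long as allocations stay aligned and contiguous.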

LAYER 2
- Spanning tree
- Switch table size
- Broadcast domains (ARP)
- MAC frame checksum
- Storage protocols (FCoE, ATAoE)
- TRILL, MLAG
- Layer 2 DCI (data center interconnect) is crazy pants
- Layer 2 tunneled over the internet is super crazy pants
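The MAC frame checksum (the Ethernet FCS) is a CRC-32, whereas TCP relies on the 16-bit ones'-complement Internet checksum described in RFC 1071. A small sketch of why the latter is weaker: it is a plain sum of 16-bit words, so reordering those words goes undetected. Python's `zlib.crc32` uses the same CRC-32 polynomial as Ethernet and serves as a stand-in here (on-the-wire bit ordering is ignored); the sample payloads are made up:

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF


original = b"OSD1OSD2"
reordered = b"D1OSOSD2"  # first two 16-bit words swapped, e.g. by a faulty buffer copy

print(hex(internet_checksum(original)), hex(internet_checksum(reordered)))  # identical
print(hex(zlib.crc32(original)), hex(zlib.crc32(reordered)))                # differ
```

CRC-32 is guaranteed to catch this case because the corruption is confined to 32 consecutive bits; the additive checksum cannot see it at all.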

LAYER 3
- Address and subnet planning
- Proven at scale at big web shops
- End-to-end error detection relies on the TCP checksum alone (the MAC frame checksum is regenerated at each routed hop)
- Equal-cost multi-path (ECMP)
- Reasonable for inter-site connectivity
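ECMP routers typically choose a next hop by hashing each packet's flow identifiers (commonly the 5-tuple), so every packet of a TCP connection follows the same path (no reordering) while different connections spread across the equal-cost paths. A sketch of the idea, assuming SHA-256 in place of a switch ASIC's hash and a hypothetical `ecmp_next_hop` helper; real hash functions and input fields vary by vendor. Port 6800 is used as the destination because Ceph OSDs bind in the 6800-7300 range by default:

```python
import hashlib
from collections import Counter


def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    """Pick a next hop for a flow by hashing its 5-tuple (modulo-N selection)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(next_hops)
    return next_hops[bucket]


spines = ["spine1", "spine2", "spine3", "spine4"]

# Many client connections to one OSD spread across all spines...
load = Counter(
    ecmp_next_hop("10.0.1.10", "10.0.2.20", "tcp", port, 6800, spines)
    for port in range(40000, 44000)
)
print(load)

# ...while every packet of a single connection stays on one path.
hop = ecmp_next_hop("10.0.1.10", "10.0.2.20", "tcp", 40000, 6800, spines)
assert all(
    ecmp_next_hop("10.0.1.10", "10.0.2.20", "tcp", 40000, 6800, spines) == hop
    for _ in range(10)
)
```

Plain modulo-N selection remaps most flows when a path is added or removed; that is one reason some switches offer "resilient" hashing.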

Public Topologies
CLIENT TOPOLOGIES
- Path diversity for resiliency
- Minimize network diameter
- Consistent hop count to minimize network long-tail latency
- Ease of scaling
- Tolerate adversarial traffic patterns (fan-in/fan-out)

FOLDED CLOS
- Sometimes called Fat Tree or Spine and Leaf
- Minimum 4 fixed switches, grows to 10k+ node fabrics
- Rack or cluster oversubscription possible
- Non-blocking also possible
- Path diversity

(Diagram: spine switches (S) connected to every leaf switch; each leaf serves nodes 1, 2, ..., N)

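Sizing a two-tier folded Clos comes down to simple arithmetic. A sketch assuming identical leaves, one uplink from every leaf to every spine, and one leaf per spine port; the port counts and speeds are hypothetical examples, not a specific product. The 4-switch minimum on the slide corresponds to 2 spines and 2 leaves:

```python
from dataclasses import dataclass


@dataclass
class LeafSpine:
    leaf_host_ports: int    # server-facing ports per leaf
    host_port_gbps: float
    leaf_uplinks: int       # one uplink per spine, so this is also the spine count
    uplink_gbps: float
    spine_ports: int        # leaf-facing ports per spine = maximum number of leaves

    def oversubscription(self) -> float:
        """Server-facing vs. spine-facing bandwidth on each leaf (1.0 means non-blocking)."""
        return (self.leaf_host_ports * self.host_port_gbps) / (self.leaf_uplinks * self.uplink_gbps)

    def max_host_ports(self) -> int:
        """Server-facing ports in a fully built-out fabric."""
        return self.spine_ports * self.leaf_host_ports


fabric = LeafSpine(leaf_host_ports=48, host_port_gbps=10, leaf_uplinks=4,
                   uplink_gbps=40, spine_ports=32)
print(f"{fabric.oversubscription():.1f}:1 oversubscribed, up to {fabric.max_host_ports()} host ports")
# 3.0:1 oversubscribed, up to 1536 host ports
```

Using only 16 of the 48 server ports per leaf in the same example would make the fabric non-blocking (1.0:1), trading port count for bandwidth.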
Cluster Topologies
REPLICA TOPOLOGIES
- Replica and erasure fan-out
- Recovery and remap impact on cluster bandwidth
- OSD peering
- Backfill served from primary
- Tune backfills to avoid large fan-in
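Because CRUSH spreads a failed OSD's placement groups across many peers, recovery is a many-to-many transfer whose duration is bounded by whichever is slower: the disks taking part or the network carrying the traffic. A rough sketch under simplifying assumptions (perfectly even spread, a fixed per-OSD recovery rate, one aggregate network limit, decimal units); `recovery_seconds` and the numbers are hypothetical:

```python
def recovery_seconds(data_tb, peer_osds, per_osd_mb_s, network_gbps):
    """Estimate the time to re-replicate `data_tb` terabytes."""
    disk_limit = peer_osds * per_osd_mb_s * 1e6   # bytes/s the participating OSDs can absorb
    network_limit = network_gbps * 1e9 / 8        # bytes/s the recovery traffic may use
    rate = min(disk_limit, network_limit)
    return data_tb * 1e12 / rate


# Hypothetical: a 4 TB OSD fails; 40 peers each recover at 50 MB/s over 10 Gbit/s of spare capacity.
t = recovery_seconds(4, peer_osds=40, per_osd_mb_s=50, network_gbps=10)
print(f"{t / 3600:.2f} hours")  # network-bound at 1250 MB/s
```

In this example the network, not the disks, sets the pace. Ceph's `osd max backfills` setting caps concurrent backfills per OSD, which is one way to limit per-OSD load and the fan-in the slide warns about, at the price of a longer recovery.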

FOLDED CLOS
- Sometimes called Fat Tree or Spine and Leaf
- Minimum 4 fixed switches, grows to 10k+ node fabrics
- Rack or cluster oversubscription possible
- Non-blocking also possible
- Path diversity

(Diagram: spine switches (S) connected to every leaf switch; each leaf serves nodes 1, 2, ..., N)

N-WAY PARTIAL MESH

EVALUATE
- Replication
- Erasure coding
- Special purpose vs. general purpose
- Extra port cost

Network Hardware
Features
- Buffer sizes
- Cut-through vs. store-and-forward
- Oversubscribed vs. non-blocking
- Automation and monitoring
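A store-and-forward switch must receive an entire frame before transmitting it, adding one serialization delay per hop, whereas a cut-through switch starts forwarding once it has read the header. A sketch of that per-hop penalty, assuming equal link speeds on every hop and ignoring queuing and lookup time; the helper names are hypothetical:

```python
def serialization_delay_us(frame_bytes, link_gbps):
    """Time to clock one frame onto a link, in microseconds."""
    return frame_bytes * 8 / (link_gbps * 1e3)  # bits / (bits per microsecond)


def store_and_forward_penalty_us(frame_bytes, link_gbps, switch_hops):
    """Extra latency vs. ideal cut-through: one full serialization per store-and-forward hop."""
    return switch_hops * serialization_delay_us(frame_bytes, link_gbps)


# Leaf -> spine -> leaf is three switch hops.
for size in (1500, 9000):
    for gbps in (1, 10):
        penalty = store_and_forward_penalty_us(size, gbps, switch_hops=3)
        print(f"{size}-byte frame @ {gbps} GbE: {penalty:.1f} us over 3 hops")
```

The penalty shrinks with faster links and grows with jumbo frames: a 9000-byte frame at 10GbE adds about 21.6 us across three store-and-forward hops.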

FIXED
- Fixed switches can easily be used to build large clusters
- Easier to source
- Smaller failure domains
- Fixed designs have many control planes
- Virtual chassis... L3 split-brain hilarity?

FEWER SKUS
- Use as few vendor SKUs as possible
- If permitted, use the same fixed switch for spine and leaf
- More affordable to keep spares on site, or to keep more of them
- Quicker MTTR when replacement gear is ready to go

Thanks to our host!

Kyle Bader
Sr. Solutions Architect

kyle@inktank.com

Thursday, January 16, 14

More Related Content

What's hot

Ceph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud worldCeph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud worldSage Weil
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitecturePatrick McGarry
 
CephFS update February 2016
CephFS update February 2016CephFS update February 2016
CephFS update February 2016John Spray
 
An intro to Ceph and big data - CERN Big Data Workshop
An intro to Ceph and big data - CERN Big Data WorkshopAn intro to Ceph and big data - CERN Big Data Workshop
An intro to Ceph and big data - CERN Big Data WorkshopPatrick McGarry
 
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...Ian Colle
 
HKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversHKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversLinaro
 
Hadoop over rgw
Hadoop over rgwHadoop over rgw
Hadoop over rgwzhouyuan
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about cephEmma Haruka Iwao
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephSage Weil
 
Ceph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross TurkCeph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross Turkbuildacloud
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideKaran Singh
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephSage Weil
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesKamesh Pemmaraju
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDBSage Weil
 
Community Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonCommunity Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonSage Weil
 
Openstack with ceph
Openstack with cephOpenstack with ceph
Openstack with cephIan Colle
 

What's hot (19)

Block Storage For VMs With Ceph
Block Storage For VMs With CephBlock Storage For VMs With Ceph
Block Storage For VMs With Ceph
 
Ceph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud worldCeph data services in a multi- and hybrid cloud world
Ceph data services in a multi- and hybrid cloud world
 
QCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference ArchitectureQCT Ceph Solution - Design Consideration and Reference Architecture
QCT Ceph Solution - Design Consideration and Reference Architecture
 
CephFS update February 2016
CephFS update February 2016CephFS update February 2016
CephFS update February 2016
 
An intro to Ceph and big data - CERN Big Data Workshop
An intro to Ceph and big data - CERN Big Data WorkshopAn intro to Ceph and big data - CERN Big Data Workshop
An intro to Ceph and big data - CERN Big Data Workshop
 
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
 
HKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversHKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM servers
 
Hadoop over rgw
Hadoop over rgwHadoop over rgw
Hadoop over rgw
 
Ceph as software define storage
Ceph as software define storageCeph as software define storage
Ceph as software define storage
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about ceph
 
librados
libradoslibrados
librados
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
Ceph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross TurkCeph Intro and Architectural Overview by Ross Turk
Ceph Intro and Architectural Overview by Ross Turk
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing Guide
 
BlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for CephBlueStore: a new, faster storage backend for Ceph
BlueStore: a new, faster storage backend for Ceph
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference Architectures
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
Community Update at OpenStack Summit Boston
Community Update at OpenStack Summit BostonCommunity Update at OpenStack Summit Boston
Community Update at OpenStack Summit Boston
 
Openstack with ceph
Openstack with cephOpenstack with ceph
Openstack with ceph
 

Viewers also liked

Why MySQL High Availability Matters
Why MySQL High Availability MattersWhy MySQL High Availability Matters
Why MySQL High Availability MattersMark Swarbrick
 
Tiery Eyed
Tiery EyedTiery Eyed
Tiery EyedZendCon
 
Framework Shootout
Framework ShootoutFramework Shootout
Framework ShootoutZendCon
 
PHP on IBM i Tutorial
PHP on IBM i TutorialPHP on IBM i Tutorial
PHP on IBM i TutorialZendCon
 
Oracle cloud ravello介绍及测试账户申请
Oracle cloud ravello介绍及测试账户申请Oracle cloud ravello介绍及测试账户申请
Oracle cloud ravello介绍及测试账户申请Zhaoyang Wang
 
MySQL Tech Tour 2015 - 5.7 Connector/J/Net
MySQL Tech Tour 2015 - 5.7 Connector/J/NetMySQL Tech Tour 2015 - 5.7 Connector/J/Net
MySQL Tech Tour 2015 - 5.7 Connector/J/NetMark Swarbrick
 
Solving the C20K problem: Raising the bar in PHP Performance and Scalability
Solving the C20K problem: Raising the bar in PHP Performance and ScalabilitySolving the C20K problem: Raising the bar in PHP Performance and Scalability
Solving the C20K problem: Raising the bar in PHP Performance and ScalabilityZendCon
 
Oracle Compute Cloud Service快速实践
Oracle Compute Cloud Service快速实践Oracle Compute Cloud Service快速实践
Oracle Compute Cloud Service快速实践Zhaoyang Wang
 
Oracle Compute Cloud Service介绍
Oracle Compute Cloud Service介绍Oracle Compute Cloud Service介绍
Oracle Compute Cloud Service介绍Zhaoyang Wang
 
Zend Core on IBM i - Security Considerations
Zend Core on IBM i - Security ConsiderationsZend Core on IBM i - Security Considerations
Zend Core on IBM i - Security ConsiderationsZendCon
 
Zend_Tool: Practical use and Extending
Zend_Tool: Practical use and ExtendingZend_Tool: Practical use and Extending
Zend_Tool: Practical use and ExtendingZendCon
 
MySQL Manchester TT - 5.7 Whats new
MySQL Manchester TT - 5.7 Whats newMySQL Manchester TT - 5.7 Whats new
MySQL Manchester TT - 5.7 Whats newMark Swarbrick
 
A Storage Story #ChefConf2013
A Storage Story #ChefConf2013A Storage Story #ChefConf2013
A Storage Story #ChefConf2013Kyle Bader
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer OverviewOlav Sandstå
 
Application Diagnosis with Zend Server Tracing
Application Diagnosis with Zend Server TracingApplication Diagnosis with Zend Server Tracing
Application Diagnosis with Zend Server TracingZendCon
 
Oracle cloud 使用云市场快速搭建小型电商网站
Oracle cloud 使用云市场快速搭建小型电商网站Oracle cloud 使用云市场快速搭建小型电商网站
Oracle cloud 使用云市场快速搭建小型电商网站Zhaoyang Wang
 
PHP on Windows - What's New
PHP on Windows - What's NewPHP on Windows - What's New
PHP on Windows - What's NewZendCon
 
PHP and Platform Independance in the Cloud
PHP and Platform Independance in the CloudPHP and Platform Independance in the Cloud
PHP and Platform Independance in the CloudZendCon
 

Viewers also liked (20)

Why MySQL High Availability Matters
Why MySQL High Availability MattersWhy MySQL High Availability Matters
Why MySQL High Availability Matters
 
Tiery Eyed
Tiery EyedTiery Eyed
Tiery Eyed
 
Framework Shootout
Framework ShootoutFramework Shootout
Framework Shootout
 
PHP on IBM i Tutorial
PHP on IBM i TutorialPHP on IBM i Tutorial
PHP on IBM i Tutorial
 
Oracle cloud ravello介绍及测试账户申请
Oracle cloud ravello介绍及测试账户申请Oracle cloud ravello介绍及测试账户申请
Oracle cloud ravello介绍及测试账户申请
 
MySQL Tech Tour 2015 - 5.7 Connector/J/Net
MySQL Tech Tour 2015 - 5.7 Connector/J/NetMySQL Tech Tour 2015 - 5.7 Connector/J/Net
MySQL Tech Tour 2015 - 5.7 Connector/J/Net
 
Solving the C20K problem: Raising the bar in PHP Performance and Scalability
Solving the C20K problem: Raising the bar in PHP Performance and ScalabilitySolving the C20K problem: Raising the bar in PHP Performance and Scalability
Solving the C20K problem: Raising the bar in PHP Performance and Scalability
 
Oracle Compute Cloud Service快速实践
Oracle Compute Cloud Service快速实践Oracle Compute Cloud Service快速实践
Oracle Compute Cloud Service快速实践
 
Oracle Compute Cloud Service介绍
Oracle Compute Cloud Service介绍Oracle Compute Cloud Service介绍
Oracle Compute Cloud Service介绍
 
Zend Core on IBM i - Security Considerations
Zend Core on IBM i - Security ConsiderationsZend Core on IBM i - Security Considerations
Zend Core on IBM i - Security Considerations
 
MySQL in your laptop
MySQL in your laptopMySQL in your laptop
MySQL in your laptop
 
Zend_Tool: Practical use and Extending
Zend_Tool: Practical use and ExtendingZend_Tool: Practical use and Extending
Zend_Tool: Practical use and Extending
 
Script it
Script itScript it
Script it
 
MySQL Manchester TT - 5.7 Whats new
MySQL Manchester TT - 5.7 Whats newMySQL Manchester TT - 5.7 Whats new
MySQL Manchester TT - 5.7 Whats new
 
A Storage Story #ChefConf2013
A Storage Story #ChefConf2013A Storage Story #ChefConf2013
A Storage Story #ChefConf2013
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 
Application Diagnosis with Zend Server Tracing
Application Diagnosis with Zend Server TracingApplication Diagnosis with Zend Server Tracing
Application Diagnosis with Zend Server Tracing
 
Oracle cloud 使用云市场快速搭建小型电商网站
Oracle cloud 使用云市场快速搭建小型电商网站Oracle cloud 使用云市场快速搭建小型电商网站
Oracle cloud 使用云市场快速搭建小型电商网站
 
PHP on Windows - What's New
PHP on Windows - What's NewPHP on Windows - What's New
PHP on Windows - What's New
 
PHP and Platform Independance in the Cloud
PHP and Platform Independance in the CloudPHP and Platform Independance in the Cloud
PHP and Platform Independance in the Cloud
 

Similar to SF Ceph Users Jan. 2014

Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBDDan Frincu
 
SNIA Europe - DCSEurope_April2013 (AOrdoubadian)
SNIA Europe - DCSEurope_April2013 (AOrdoubadian)SNIA Europe - DCSEurope_April2013 (AOrdoubadian)
SNIA Europe - DCSEurope_April2013 (AOrdoubadian)Ali Ordoubadian
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...DataWorks Summit/Hadoop Summit
 
Ceph Day New York 2014: Ceph Ecosystem Update
Ceph Day New York 2014: Ceph Ecosystem UpdateCeph Day New York 2014: Ceph Ecosystem Update
Ceph Day New York 2014: Ceph Ecosystem UpdateCeph Community
 
P4, EPBF, and Linux TC Offload
P4, EPBF, and Linux TC OffloadP4, EPBF, and Linux TC Offload
P4, EPBF, and Linux TC OffloadOpen-NFP
 
FOSDEM 2017 Trip Report
FOSDEM 2017 Trip ReportFOSDEM 2017 Trip Report
FOSDEM 2017 Trip ReportOCaml Labs
 
The advantages of Arista/OVH configurations, and the technologies behind buil...
The advantages of Arista/OVH configurations, and the technologies behind buil...The advantages of Arista/OVH configurations, and the technologies behind buil...
The advantages of Arista/OVH configurations, and the technologies behind buil...OVHcloud
 
TUT18972: Unleash the power of Ceph across the Data Center
TUT18972: Unleash the power of Ceph across the Data CenterTUT18972: Unleash the power of Ceph across the Data Center
TUT18972: Unleash the power of Ceph across the Data CenterEttore Simone
 
Fb i pv6-sparchimanv1.0
Fb i pv6-sparchimanv1.0Fb i pv6-sparchimanv1.0
Fb i pv6-sparchimanv1.0Fred Bovy
 
P4 for Custom Identification, Flow Tagging, Monitoring and Control
P4 for Custom Identification, Flow Tagging, Monitoring and ControlP4 for Custom Identification, Flow Tagging, Monitoring and Control
P4 for Custom Identification, Flow Tagging, Monitoring and ControlOpen-NFP
 
Webinar-Linux Networking is Awesome
Webinar-Linux Networking is AwesomeWebinar-Linux Networking is Awesome
Webinar-Linux Networking is AwesomeCumulus Networks
 
June 2004 IPv6 – Hands on
June 2004 IPv6 – Hands on June 2004 IPv6 – Hands on
June 2004 IPv6 – Hands on Videoguy
 
Basic of ip subnet and addressing
Basic of ip subnet and addressingBasic of ip subnet and addressing
Basic of ip subnet and addressingrahul_cuet
 
Openlab.2014 02-13.major.vi sion
Openlab.2014 02-13.major.vi sionOpenlab.2014 02-13.major.vi sion
Openlab.2014 02-13.major.vi sionCcie Light
 
Cilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDPCilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDPThomas Graf
 

Similar to SF Ceph Users Jan. 2014 (20)

Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBD
 
SNIA Europe - DCSEurope_April2013 (AOrdoubadian)
SNIA Europe - DCSEurope_April2013 (AOrdoubadian)SNIA Europe - DCSEurope_April2013 (AOrdoubadian)
SNIA Europe - DCSEurope_April2013 (AOrdoubadian)
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
ONOS Deployment Brigade
ONOS Deployment BrigadeONOS Deployment Brigade
ONOS Deployment Brigade
 
BSDCan2006.pdf
BSDCan2006.pdfBSDCan2006.pdf
BSDCan2006.pdf
 
Ceph Day New York 2014: Ceph Ecosystem Update
Ceph Day New York 2014: Ceph Ecosystem UpdateCeph Day New York 2014: Ceph Ecosystem Update
Ceph Day New York 2014: Ceph Ecosystem Update
 
6LoWPAN: An Open IoT Networking Protocol
6LoWPAN: An Open IoT Networking Protocol6LoWPAN: An Open IoT Networking Protocol
6LoWPAN: An Open IoT Networking Protocol
 
I Pv6
I Pv6I Pv6
I Pv6
 
P4, EPBF, and Linux TC Offload
P4, EPBF, and Linux TC OffloadP4, EPBF, and Linux TC Offload
P4, EPBF, and Linux TC Offload
 
FOSDEM 2017 Trip Report
FOSDEM 2017 Trip ReportFOSDEM 2017 Trip Report
FOSDEM 2017 Trip Report
 
The advantages of Arista/OVH configurations, and the technologies behind buil...
The advantages of Arista/OVH configurations, and the technologies behind buil...The advantages of Arista/OVH configurations, and the technologies behind buil...
The advantages of Arista/OVH configurations, and the technologies behind buil...
 
TUT18972: Unleash the power of Ceph across the Data Center
TUT18972: Unleash the power of Ceph across the Data CenterTUT18972: Unleash the power of Ceph across the Data Center
TUT18972: Unleash the power of Ceph across the Data Center
 
IPv6 ND 2020
IPv6 ND 2020IPv6 ND 2020
IPv6 ND 2020
 
Fb i pv6-sparchimanv1.0
Fb i pv6-sparchimanv1.0Fb i pv6-sparchimanv1.0
Fb i pv6-sparchimanv1.0
 
P4 for Custom Identification, Flow Tagging, Monitoring and Control
P4 for Custom Identification, Flow Tagging, Monitoring and ControlP4 for Custom Identification, Flow Tagging, Monitoring and Control
P4 for Custom Identification, Flow Tagging, Monitoring and Control
 
Webinar-Linux Networking is Awesome
Webinar-Linux Networking is AwesomeWebinar-Linux Networking is Awesome
Webinar-Linux Networking is Awesome
 
June 2004 IPv6 – Hands on
June 2004 IPv6 – Hands on June 2004 IPv6 – Hands on
June 2004 IPv6 – Hands on
 
Basic of ip subnet and addressing
Basic of ip subnet and addressingBasic of ip subnet and addressing
Basic of ip subnet and addressing
 
Openlab.2014 02-13.major.vi sion
Openlab.2014 02-13.major.vi sionOpenlab.2014 02-13.major.vi sion
Openlab.2014 02-13.major.vi sion
 
Cilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDPCilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDP
 

Recently uploaded

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 

Recently uploaded (20)

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
SF Ceph Users Jan. 2014

  • 1. SF BAY AREA CEPH USERS GROUP INAUGURAL MEETUP (Thursday, January 16, 2014)
  • 2. AGENDA: Intro to Ceph · Ceph Networking · Public Topologies · Cluster Topologies · Network Hardware
  • 3. THE FORECAST: By 2020, over 39 ZB of data will be stored. 1.5 ZB are stored today.
  • 4. THE PROBLEM: Existing systems don't scale · Increasing cost and complexity · Need to invest in new platforms ahead of time (chart: growth of data vs. IT storage budget, 2010 to 2020)
  • 5. THE SOLUTION: PAST: SCALE UP · FUTURE: SCALE OUT
  • 7. INTRO TO CEPH: Distributed storage system · Horizontally scalable · No single point of failure · Self-healing and self-managing · Runs on commodity hardware · GPLv2 License
  • 9. SERVICE COMPONENTS, PART 1: MONITOR: Paxos for consensus · Maintains cluster state · Typically 3-5 nodes · NOT in the write path. OSD: Object storage interface · Gossips with peers · Data lives here
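Why "typically 3-5 nodes": monitors agree on cluster state through Paxos, which can only make progress while a strict majority of monitors can reach each other. The short Python sketch below does just that majority arithmetic (the function names are illustrative, not Ceph APIs). It shows why odd counts are preferred: a fourth monitor raises the quorum size without tolerating any extra failure.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of the monitor set."""
    return monitors // 2 + 1


def failures_tolerated(monitors: int) -> int:
    """How many monitors can be down while a majority quorum still exists."""
    return monitors - quorum_size(monitors)


for n in range(1, 8):
    print(f"{n} monitors: quorum of {quorum_size(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Three monitors survive one failure and five survive two, which covers most clusters; beyond that, each extra monitor mostly adds Paxos message traffic.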
  • 10. SERVICE COMPONENTS, PART 2: RADOS GATEWAY: Provides S3/Swift compatibility · Scales out. METADATA SERVER (MDS): Manages CephFS file system metadata · Dynamic subtree partitioning
  • 11. CRUSH  Ceph uses CRUSH for data placement  Aware of cluster topography  Statistically even distribution across pool  Supports asymmetric nodes and devices  Hierarchal weighting 11 Thursday, January 16, 14
  • 13. POOLS  Groupings of OSDs  Both physical and logical  Volumes / Images  Hot SSD pool  Cold SATA pool  DMCrypt pool 13 Thursday, January 16, 14
  • 14. REPLICATION  Original data durability mechanism  Ceph creates N replicas of each RADOS object  Uses CRUSH to determine replica placement  Required for mutable objects (RBD, CephFS)  More reasonable for smaller installations 14 Thursday, January 16, 14
  • 15. ERASURE CODING  (8:4) MDS code in example  1.5x overhead  8 units of client data to write  4 parity units generated using FEC  All 12 units placed with CRUSH  8/12 total units to satisfy a read 15 Thursday, January 16, 14 Firefly Release
  • 16. CLIENT COMPONENTS: Native API (librados): Mutable object store · Many language bindings · Object classes. CephFS: Linux kernel CephFS client since 2.6.34 · FUSE client · Hadoop JNI bindings
  • 17. CLIENT COMPONENTS: Block storage: Linux kernel RBD client since 2.6.37 · KVM/QEMU integration · Xen integration. S3/Swift: RESTful interfaces (HTTP) · CRUD operations · Usage accounting for billing
  • 19. INFINIBAND  Currently only supported via IPoIB  Accelio (libxio) integration in Ceph is in early stages  Accelio supports multiple transports RDMA, TCP and Shared-Memory  Accelio supports multiple RDMA transports (IB, RoCE, iWARP) 19 Thursday, January 16, 14
  • 20. ETHERNET  Tried and true  Proven at scale  Economical  Many suitable vendors 20 Thursday, January 16, 14
  • 21. 10GbE or 1GbE  Cost of 10GbE trending downward  White box switches turning up heat on vendors  Twinax relatively inexpensive and low power  SFP+ is versatile wrt distance  Single 10GbE for object  Dual 10GbE for block storage (public/cluster)  Bonding many 1GbE links adds lots of complexity 21 Thursday, January 16, 14
  • 22. IPv4 or IPv6 Native: It's 2014; is this really a question? · Ceph fully supports both modes of operation · Hierarchical allocation models allow routes to be "rolled up" (aggregated) · Optimal efficiency in the RIB · Some tools believe the earth is flat
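Route "roll up" is easy to see with Python's standard `ipaddress` module. Assuming a hypothetical plan that gives each rack a contiguous, aligned prefix (addresses taken from the 2001:db8::/32 documentation range and from RFC 1918 space), four rack routes summarize into one, which is what keeps the RIB small; a prefix outside the aligned block cannot be folded in.

```python
import ipaddress

# Hypothetical plan: one prefix per rack, numbered contiguously from zero.
racks_v6 = [ipaddress.ip_network(f"2001:db8:0:{rack:x}::/64") for rack in range(4)]
racks_v4 = [ipaddress.ip_network(f"10.0.{rack}.0/24") for rack in range(4)]

# collapse_addresses merges adjacent, aligned networks into the fewest covering prefixes.
print(list(ipaddress.collapse_addresses(racks_v6)))  # [IPv6Network('2001:db8::/62')]
print(list(ipaddress.collapse_addresses(racks_v4)))  # [IPv4Network('10.0.0.0/22')]

# A fifth rack falls outside the aligned /62, so it needs a route of its own.
fifth = ipaddress.ip_network("2001:db8:0:4::/64")
print(list(ipaddress.collapse_addresses(racks_v6 + [fifth])))
```

Numbering racks in power-of-two blocks from the start is what lets the spine or border routers carry one summary per block instead of one route per rack.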
  • 23. LAYER 2  Spanning tree  Switch table size  Broadcast domains (ARP)  MAC frame checksum  Storage protocols (FCoE, ATAoE)  TRILL, MLAG  Layer 2 DCI is crazy pants  Layer 2 tunneled over internet is super crazy pants 23 Thursday, January 16, 14
  • 24. LAYER 3  Address and subnet planning  Proven scale at big web shops  Error detection only on TCP header  Equal cost multi-path (ECMP)  Reasonable for inter-site connectivity 24 Thursday, January 16, 14
  • 26. CLIENT TOPOLOGIES  Path diversity for resiliency  Minimize network diameter  Consistent hop count to minimize net long tail latency  Ease of scaling  Tolerate adversarial traffic patterns (fan-in/fan-out) 26 Thursday, January 16, 14
  • 27. FOLDED CLOS  Sometimes called Fat Tree or Spine and Leaf  Minimum 4 fixed switches, grows to 10k+ node fabrics  Rack or cluster oversubscription possible  Non-blocking also possible S S S S  Path diversity S .... .... 1 27 Thursday, January 16, 14 2 N 1 2 S .... N 1 2 .... N 1 2 N
  • 29. REPLICA TOPOLOGIES  Replica and erasure fan-out  Recovery and remap impact on cluster bandwidth  OSD peering  Backfill served from primary  Tune backfills to avoid large fan-in 29 Thursday, January 16, 14
  • 30. FOLDED CLOS  Sometimes called Fat Tree or Spine and Leaf  Minimum 4, grows to 10k+ node fabrics  Rack or cluster oversubscription possible  Non-blocking also possible S S S S  Path diversity S .... .... 1 30 Thursday, January 16, 14 2 N 1 2 S .... N 1 2 .... N 1 2 N
  • 32. EVALUATE  Replication  Erasure coding  Special purpose vs general purpose  Extra port cost 32 Thursday, January 16, 14
  • 34. Features  Buffer sizes  Cut through vs store and forward  Oversubscribed vs non-blocking  Automation and monitoring 34 Thursday, January 16, 14
  • 35. FIXED  Fixed switches can easily build large clusters  Easier to source  Smaller failure domains  Fixed designs have many control planes  Virtual chassis.. L3 split brain hilarity? 35 Thursday, January 16, 14
  • 36. LESS SKU  Utilize as few vendor SKUs as possible  If permitted, use same fixed switch for spine and leaf  More affordable to have spares on site or more spares  Quicker MTTR when gear is ready to go 36 Thursday, January 16, 14
  • 37. Thanks to our host!
  • 38. Kyle Bader, Sr. Solutions Architect, kyle@inktank.com