Future of CephFS
Sage Weil
APP
LIBRADOS
A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

APP
RADOSGW
A bucket-based REST gateway, compatible with S3 and Swift

HOST/VM
RBD
A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CLIENT
CEPH FS
A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOS
A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
[Diagram: CLIENT with separate metadata and data paths into the cluster]
Metadata Server
• Manages metadata for a POSIX-compliant shared filesystem
   • Directory hierarchy
   • File metadata (owner, timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to clients
• Only required for shared filesystem
legacy metadata storage
● a scaling disaster
   ● name → inode → block list → data
   ● no inode table locality
   ● fragmentation
      – inode table
      – directory
   ● many seeks
   ● difficult to partition

[Diagram: directory tree with entries etc, home, usr, var, vmlinuz, …; below them hosts, mtab, passwd, … and bin, include, lib, …]
ceph fs metadata storage
● block lists unnecessary
● inode table mostly useless
   ● APIs are path-based, not inode-based
   ● no random table access, sloppy caching
● embed inodes inside directories
   ● good locality, prefetching
   ● leverage key/value object

[Diagram: the same directory tree (etc, home, usr, var, vmlinuz, …; hosts, mtab, passwd, …; bin, include, lib, …), drawn as directory objects labeled 1, 100, and 102]
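A minimal sketch of the "embed inodes inside directories" idea, assuming a directory is stored as one key/value object whose keys are entry names and whose values are the inode records themselves. The names `Inode` and `DirObject` and the inode numbers are illustrative assumptions, not Ceph's actual on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Hypothetical inode record; the fields are illustrative only."""
    ino: int
    mode: int
    size: int = 0


@dataclass
class DirObject:
    """One directory stored as a single key/value object.

    Keys are entry names and values are the inodes themselves, so a
    lookup by name never consults a separate inode table.
    """
    entries: dict = field(default_factory=dict)

    def link(self, name, inode):
        self.entries[name] = inode

    def lookup(self, name):
        # A path-based lookup returns the embedded inode directly.
        return self.entries[name]

    def readdir_plus(self):
        # Reading the object once yields every name together with its
        # inode, which is what makes dir+inode prefetching cheap.
        return sorted(self.entries.items(), key=lambda kv: kv[0])


etc = DirObject()
etc.link("hosts", Inode(ino=200, mode=0o644, size=158))
etc.link("passwd", Inode(ino=202, mode=0o644, size=1024))
etc.link("mtab", Inode(ino=201, mode=0o644))

listing = etc.readdir_plus()
```

Because the API is path-based, the only way to reach an inode in this sketch is through its containing directory, which is the locality property the slide points at.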
controlling metadata io
● view ceph-mds as cache
   ● reduce reads
      – dir+inode prefetching
   ● reduce writes
      – consolidate multiple writes
● large journal or log
   ● stripe over objects
   ● two tiers
      – journal for short term
      – per-directory for long term
   ● fast failure recovery

[Diagram: journal and directories]
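The two-tier idea can be sketched as a toy model: every change is appended to a short-term journal, and a later flush rewrites each dirty directory object once, however many updates it absorbed. The class `MetadataJournal` and its fields are hypothetical and only illustrate write consolidation; they do not describe the real ceph-mds journal format.

```python
from collections import OrderedDict


class MetadataJournal:
    """Toy two-tier store: a short-term journal plus per-directory objects."""

    def __init__(self):
        self.journal = []          # short-term tier: sequential appends
        self.directories = {}      # long-term tier: one object per directory
        self.directory_writes = 0  # number of long-term object writes

    def update(self, directory, name, attrs):
        # Each change costs one cheap sequential append.
        self.journal.append((directory, name, attrs))

    def flush(self):
        # Group journaled updates by directory; a later update to the
        # same name replaces the earlier one.
        dirty = OrderedDict()
        for directory, name, attrs in self.journal:
            dirty.setdefault(directory, {})[name] = attrs
        # Rewrite each dirty directory object exactly once.
        for directory, changes in dirty.items():
            self.directories.setdefault(directory, {}).update(changes)
            self.directory_writes += 1
        self.journal.clear()


mds = MetadataJournal()
mds.update("/home/a", "f1", {"size": 1})
mds.update("/home/a", "f1", {"size": 2})
mds.update("/home/a", "f2", {"size": 5})
mds.update("/var/log", "syslog", {"size": 9})
mds.flush()
```

Four journaled updates end up as two directory-object writes here, one per dirty directory.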
one tree

three metadata servers

??
load distribution
● coarse (static subtree)
   ● preserve locality
   ● high management overhead
● fine (hash)
   ● always balanced
   ● less vulnerable to hot spots
   ● destroy hierarchy, locality
● can a dynamic approach capture benefits of both extremes?

[Diagram: a spectrum from static subtree (good locality) through hash directories to hash files (good balance)]
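The two extremes can be compared with a small sketch. It assumes three metadata servers, a hand-written subtree table, and SHA-256 as the hash; all three choices are illustrative assumptions, not something the slides specify.

```python
import hashlib

NUM_MDS = 3

# Hypothetical static assignment of top-level subtrees to server ranks.
STATIC_SUBTREES = {"/home": 0, "/usr": 1, "/var": 2}


def static_subtree_mds(path):
    """Coarse placement: every path under a subtree goes to one server."""
    for prefix, rank in STATIC_SUBTREES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return rank
    return 0  # anything else falls to rank 0 in this sketch


def hashed_mds(path):
    """Fine placement: hash the full path, ignoring the hierarchy."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_MDS


paths = ["/home/alice/notes.txt", "/home/alice/todo.txt", "/home/bob/a.c"]
static_ranks = {static_subtree_mds(p) for p in paths}
hashed_ranks = [hashed_mds(p) for p in paths]
```

With static subtrees every path under /home lands on the same rank, which keeps locality but lets one busy subtree overload one server. Hashing spreads paths with no regard for the directory they share.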
DYNAMIC SUBTREE PARTITIONING
dynamic subtree partitioning
● scalable
   ● arbitrarily partition metadata
● adaptive
   ● move work from busy to idle servers
   ● replicate hot metadata
● efficient
   ● hierarchical partition preserve locality
● dynamic
   ● daemons can join/leave
   ● take over for failed nodes
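A toy version of "move work from busy to idle servers" follows. It is a deliberately simple heuristic written for illustration, assuming per-subtree request counts are available. The function name `rebalance`, its arguments, and the "at most half the gap" rule are assumptions and are not the balancer ceph-mds actually uses.

```python
def rebalance(assignment, load, servers):
    """Move one subtree from the busiest server to the idlest one.

    assignment: dict mapping subtree path -> server rank (mutated in place)
    load:       dict mapping subtree path -> recent request count
    servers:    iterable of all server ranks, including idle ones
    Returns (subtree, source, target), or None if nothing should move.
    """
    per_server = {s: 0 for s in servers}
    for subtree, server in assignment.items():
        per_server[server] += load.get(subtree, 0)

    busiest = max(per_server, key=per_server.get)
    idlest = min(per_server, key=per_server.get)
    gap = per_server[busiest] - per_server[idlest]

    # Only consider subtrees small enough that moving them cannot make
    # the target busier than the source ends up.
    movable = [
        t for t, s in assignment.items()
        if s == busiest and 0 < load.get(t, 0) <= gap / 2
    ]
    if not movable:
        return None

    subtree = max(movable, key=lambda t: load[t])
    assignment[subtree] = idlest
    return subtree, busiest, idlest


servers = [0, 1, 2]
assignment = {"/home/alice": 0, "/home/bob": 0, "/usr": 1, "/var": 2}
load = {"/home/alice": 900, "/home/bob": 300, "/usr": 100, "/var": 50}

moved = rebalance(assignment, load, servers)
```

The unit of migration is a whole subtree, so locality inside /home/bob survives the move.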
Dynamic partitioning
many directories

same directory
Failure recovery
Metadata replication and availability
Metadata cluster scaling
client protocol
● highly stateful
   ● consistent, fine-grained caching
● seamless hand-off between ceph-mds daemons
   ● when client traverses hierarchy
   ● when metadata is migrated between servers
● direct access to OSDs for file I/O
an example
● mount -t ceph 1.2.3.4:/ /mnt
   ● 3 ceph-mon RT
   ● 2 ceph-mds RT (1 ceph-mds to -osd RT)
● cd /mnt/foo/bar
   ● 2 ceph-mds RT (2 ceph-mds to -osd RT)
● ls -al
   ● open
   ● readdir
      – 1 ceph-mds RT (1 ceph-mds to -osd RT)
   ● stat each file
   ● close
● cp * /tmp
   ● N ceph-osd RT

[Diagram: client exchanging round trips (RT) with ceph-mon, ceph-mds, and ceph-osd]
recursive accounting
● ceph-mds tracks recursive directory stats
   ● file sizes
   ● file and directory counts
   ● modification time
● virtual xattrs present full stats
● efficient

$ ls -alSh | head
total 0
drwxr-xr-x 1 root            root      9.7T 2011-02-04 15:51 .
drwxr-xr-x 1 root            root      9.7T 2010-12-16 15:06 ..
drwxr-xr-x 1 pomceph         pg4194980 9.6T 2011-02-24 08:25 pomceph
drwxr-xr-x 1 mcg_test1       pg2419992  23G 2011-02-02 08:57 mcg_test1
drwx--x--- 1 luko            adm        19G 2011-01-21 12:17 luko
drwx--x--- 1 eest            adm        14G 2011-02-04 16:29 eest
drwxr-xr-x 1 mcg_test2       pg2419992 3.0G 2011-02-02 09:34 mcg_test2
drwx--x--- 1 fuzyceph        adm       1.5G 2011-01-18 10:46 fuzyceph
drwxr-xr-x 1 dallasceph      pg275     596M 2011-01-14 10:06 dallasceph
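The recursive stats can also be read programmatically through the virtual xattrs. The sketch below assumes the attribute names `ceph.dir.rbytes`, `ceph.dir.rfiles`, and `ceph.dir.rsubdirs` as documented for CephFS, and assumes the values come back as decimal text. Check both against the Ceph release in use. The `reader` parameter and the `fake_reader` stand-in are hypothetical, added only so the example runs without a CephFS mount.

```python
import os

# Virtual xattr names assumed from the CephFS documentation.
RSTAT_XATTRS = ("ceph.dir.rbytes", "ceph.dir.rfiles", "ceph.dir.rsubdirs")


def recursive_stats(path, reader=None):
    """Return the recursive stats of `path` as a dict of ints.

    `reader` is a hypothetical injection point for testing; by default
    the values are read with os.getxattr, which is Linux-only.
    """
    if reader is None:
        reader = os.getxattr
    stats = {}
    for name in RSTAT_XATTRS:
        raw = reader(path, name)  # bytes holding a decimal number
        short_name = name.rsplit(".", 1)[1]
        stats[short_name] = int(raw.decode("ascii"))
    return stats


def fake_reader(path, name):
    # Stand-in for a real CephFS mount, for illustration only.
    values = {
        "ceph.dir.rbytes": b"1024",
        "ceph.dir.rfiles": b"3",
        "ceph.dir.rsubdirs": b"1",
    }
    return values[name]


stats = recursive_stats("/mnt/foo", reader=fake_reader)
```

On a real mount the call would be `recursive_stats("/mnt/foo")`, with `/mnt/foo` being whatever directory is of interest.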
snapshots
● volume or subvolume snapshots unusable at petabyte scale
   ● snapshot arbitrary subdirectories
● simple interface
   ● hidden '.snap' directory
   ● no special tools

$ mkdir foo/.snap/one      # create snapshot
$ ls foo/.snap
one
$ ls foo/bar/.snap
_one_1099511627776         # parent's snap name is mangled
$ rm foo/myfile
$ ls -F foo
bar/
$ ls -F foo/.snap/one
myfile bar/
$ rmdir foo/.snap/one      # remove snapshot
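Since the interface is plain mkdir and rmdir, the same steps can be driven from any language without special tools. A minimal sketch, assuming the default hidden directory name `.snap`; the helper names are hypothetical.

```python
import os


def snapshot_path(directory, name):
    """Path of snapshot `name` under the hidden .snap directory."""
    return os.path.join(directory, ".snap", name)


def create_snapshot(directory, name):
    # On CephFS, a mkdir inside .snap is what takes the snapshot.
    path = snapshot_path(directory, name)
    os.mkdir(path)
    return path


def remove_snapshot(directory, name):
    # rmdir on the snapshot entry removes the snapshot again.
    os.rmdir(snapshot_path(directory, name))
```

On a CephFS mount, `create_snapshot("foo", "one")` corresponds to `mkdir foo/.snap/one` in the transcript above, and `remove_snapshot("foo", "one")` to the final `rmdir`.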
multiple client implementations
● Linux kernel client
   ● mount -t ceph 1.2.3.4:/ /mnt
   ● export (NFS), Samba (CIFS)
● ceph-fuse
● libcephfs.so
   ● your app
   ● Samba (CIFS)
   ● Ganesha (NFS)
   ● Hadoop (map/reduce)

[Diagram: Ganesha, Samba, Hadoop, and your app each stacked on libcephfs; ceph-fuse on fuse; the ceph client in the kernel; NFS and SMB/CIFS exports]
[Architecture diagram repeated, with a status label on each component]
LIBRADOS: AWESOME
RADOSGW: AWESOME
RBD: AWESOME
CEPH FS: NEARLY AWESOME
RADOS: AWESOME
Path forward
● Testing
   ● Various workloads
   ● Multiple active MDSs
● Test automation
   ● Simple workload generator scripts
   ● Bug reproducers
● Hacking
   ● Bug squashing
   ● Long-tail features
● Integrations
   ● Ganesha, Samba, *stacks
hard links?
● rare
● useful locality properties
   ● intra-directory
   ● parallel inter-directory
● on miss, file objects provide per-file backpointers
   ● degenerates to log(n) lookups
   ● optimistic read complexity
what is journaled
● lots of state
   ● journaling is expensive up-front, cheap to recover
   ● non-journaled state is cheap, but complex (and somewhat expensive) to recover
● yes
   ● client sessions
   ● actual fs metadata modifications
● no
   ● cache provenance
   ● open files
● lazy flush
   ● client modifications may not be durable until fsync() or visible by another client
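For an application, the lazy-flush point means durability has to be requested explicitly. A minimal sketch using only the Python standard library; the helper name `durable_write` is hypothetical.

```python
import os


def durable_write(path, data):
    """Write `data` to `path` and return only after fsync() succeeds."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer down to the OS
        os.fsync(f.fileno())  # ask the filesystem to make it durable
    return len(data)
```

Without the `os.fsync()` call the write may still be sitting in a client-side cache when `durable_write` returns.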

Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Último (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

London Ceph Day: The Future of CephFS

  • 2.
      ● LIBRADOS (APP): a library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
      ● RADOSGW (APP): a bucket-based REST gateway, compatible with S3 and Swift
      ● RBD (HOST/VM): a reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
      ● CEPH FS (CLIENT): a POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
      ● RADOS: a reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
  • 5. Metadata Server
      • Manages metadata for a POSIX-compliant shared filesystem
          • Directory hierarchy
          • File metadata (owner, timestamps, mode, etc.)
      • Stores metadata in RADOS
      • Does not serve file data to clients
      • Only required for shared filesystem
  • 6. legacy metadata storage
      ● a scaling disaster
          ● name → inode → block list → data
          ● no inode table locality
          ● fragmentation
              – inode table
              – directory
      ● many seeks
      ● difficult to partition
      [diagram: directory tree (etc, home, usr, var, vmlinuz, …; hosts, mtab, passwd, …; bin, include, lib, …)]
  • 7. ceph fs metadata storage
      ● block lists unnecessary
      ● inode table mostly useless
          ● APIs are path-based, not inode-based
          ● no random table access, sloppy caching
      ● embed inodes inside directories
          ● good locality, prefetching
          ● leverage key/value object
      [diagram: directory objects 1 (etc, home, usr, var, vmlinuz, …), 100 (hosts, mtab, passwd, …), 102 (bin, include, lib, …)]
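The "embed inodes inside directories" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Inode` and `DirObject` names and their fields are hypothetical, and the real on-disk format is not described on the slide. Each directory is modeled as one key/value object mapping names to embedded inodes, so a path lookup walks directory objects and never touches a separate inode table.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Inode:
    """Hypothetical inode record; fields are illustrative only."""
    ino: int
    mode: int
    size: int = 0


@dataclass
class DirObject:
    """One directory stored as a single key/value object: name -> embedded inode."""
    ino: int
    entries: Dict[str, Inode] = field(default_factory=dict)

    def link(self, name: str, inode: Inode) -> None:
        self.entries[name] = inode

    def readdir_plus(self) -> List[Tuple[str, Inode]]:
        # A single object read yields names and inodes together,
        # which is what makes prefetching for "ls -l" cheap.
        return sorted(self.entries.items(), key=lambda item: item[0])


def lookup(dirs: Dict[int, DirObject], root_ino: int, path: str) -> Optional[Inode]:
    """Resolve a path by walking directory objects (path-based, not inode-based)."""
    current = dirs[root_ino]
    inode: Optional[Inode] = None
    parts = [p for p in path.split("/") if p]
    for i, part in enumerate(parts):
        inode = current.entries.get(part)
        if inode is None:
            return None
        if i < len(parts) - 1:
            next_dir = dirs.get(inode.ino)
            if next_dir is None:
                return None  # intermediate component is not a directory
            current = next_dir
    return inode


# Usage: directory objects 1 (root) and 100 (etc), as numbered on the slide.
root = DirObject(1)
etc = DirObject(100)
root.link("etc", Inode(100, 0o040755))
etc.link("hosts", Inode(200, 0o100644, size=42))
dirs = {1: root, 100: etc}
hosts = lookup(dirs, 1, "/etc/hosts")
```

Note that `lookup` returns `None` for the bare root path in this sketch, since the root has no parent directory entry to embed its inode in.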
  • 8. controlling metadata io
      ● view ceph-mds as cache
          ● reduce reads
              – dir+inode prefetching
          ● reduce writes
              – consolidate multiple writes
      ● large journal or log
          ● stripe over objects
          ● two tiers
              – journal for short term
              – per-directory for long term
          ● fast failure recovery
      [diagram labels: journal, directories]
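The two-tier scheme on this slide (a journal for the short term, per-directory objects for the long term) can be illustrated with a small Python sketch. The `MetadataStore` class and its counters are hypothetical and stand in for the real ceph-mds machinery; the point is only that several journaled updates to the same directory collapse into one directory-object write at flush time.

```python
from typing import Dict, List, Tuple


class MetadataStore:
    """Toy model of a metadata cache with a journal and per-directory objects."""

    def __init__(self) -> None:
        # Short-term tier: sequential log of every update.
        self.journal: List[Tuple[str, str, dict]] = []
        # Long-term tier: one key/value object per directory.
        self.dir_objects: Dict[str, Dict[str, dict]] = {}
        # Cached changes not yet written back to directory objects.
        self.dirty: Dict[str, Dict[str, dict]] = {}
        # Number of directory-object writes issued so far.
        self.dir_writes = 0

    def update(self, directory: str, name: str, inode: dict) -> None:
        """Record a change: append to the journal and mark the cache dirty."""
        self.journal.append((directory, name, inode))
        self.dirty.setdefault(directory, {})[name] = inode

    def flush(self) -> None:
        """Write each dirty directory once, then trim the journal."""
        for directory, entries in self.dirty.items():
            self.dir_objects.setdefault(directory, {}).update(entries)
            self.dir_writes += 1
        self.dirty.clear()
        self.journal.clear()


store = MetadataStore()
store.update("/etc", "hosts", {"size": 1})
store.update("/etc", "hosts", {"size": 2})
store.update("/etc", "mtab", {"size": 3})
journal_len_before_flush = len(store.journal)
store.flush()
```

Three journal entries result in a single write to the `/etc` directory object, and after a crash only the untrimmed journal would need replaying, which is one way to read "fast failure recovery".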
  • 10. load distribution
      ● coarse (static subtree)
          ● preserve locality
          ● high management overhead
      ● fine (hash)
          ● always balanced
          ● less vulnerable to hot spots
          ● destroy hierarchy, locality
      ● can a dynamic approach capture benefits of both extremes?
      [diagram: static subtree (good locality), hash directories, hash files (good balance)]
  • 11.
  • 12.
  • 13.
  • 14.
  • 16. dynamic subtree partitioning
      ● scalable
          ● arbitrarily partition metadata
      ● adaptive
          ● move work from busy to idle servers
          ● replicate hot metadata
      ● efficient
          ● hierarchical partition preserve locality
      ● dynamic
          ● daemons can join/leave
          ● take over for failed nodes
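A rough Python sketch of the "move work from busy to idle servers" idea follows. It is not the ceph-mds balancer: the `Cluster` class, the load metric (a plain request counter per subtree), and the migration rule are all assumptions made for illustration, and replication of hot metadata and failure takeover are left out.

```python
from typing import Dict


class Cluster:
    """Toy model: each subtree root is owned by one metadata server rank."""

    def __init__(self, n_mds: int) -> None:
        self.n_mds = n_mds
        self.authority: Dict[str, int] = {"/": 0}  # subtree root -> MDS rank
        self.load: Dict[str, float] = {}           # subtree root -> request count

    def owner(self, path: str) -> int:
        """The longest matching subtree root decides which MDS serves a path."""
        best = "/"
        for root in self.authority:
            prefix = root.rstrip("/") + "/"
            if (path == root or path.startswith(prefix)) and len(root) > len(best):
                best = root
        return self.authority[best]

    def record(self, subtree: str, hits: float) -> None:
        """Account load to a subtree; new subtrees start on their parent's MDS."""
        self.authority.setdefault(subtree, self.owner(subtree))
        self.load[subtree] = self.load.get(subtree, 0.0) + hits

    def mds_load(self) -> Dict[int, float]:
        totals = {rank: 0.0 for rank in range(self.n_mds)}
        for subtree, hits in self.load.items():
            totals[self.authority[subtree]] += hits
        return totals

    def rebalance(self) -> None:
        """Migrate the hottest subtree that still narrows the busy/idle gap."""
        totals = self.mds_load()
        busy = max(totals, key=lambda rank: totals[rank])
        idle = min(totals, key=lambda rank: totals[rank])
        gap = totals[busy] - totals[idle]
        movable = [s for s in self.load
                   if self.authority[s] == busy and self.load[s] < gap]
        if movable:
            hottest = max(movable, key=lambda s: self.load[s])
            self.authority[hottest] = idle


cluster = Cluster(n_mds=2)
cluster.record("/home", 90)
cluster.record("/usr", 30)
cluster.record("/var", 10)
cluster.rebalance()  # /home moves from rank 0 to rank 1
cluster.rebalance()  # no further move: it would not narrow the gap
```

Because whole subtrees move, everything under `/home` stays on one server, which is the locality argument the slide makes for hierarchical partitioning over hashing.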
  • 19. Metadata replication and availability
  • 21. client protocol
      ● highly stateful
          ● consistent, fine-grained caching
      ● seamless hand-off between ceph-mds daemons
          ● when client traverses hierarchy
          ● when metadata is migrated between servers
      ● direct access to OSDs for file I/O
  • 22. an example
      ● mount -t ceph 1.2.3.4:/ /mnt
          ● 3 ceph-mon RT
          ● 2 ceph-mds RT (1 ceph-mds to -osd RT)
      ● cd /mnt/foo/bar
          ● 2 ceph-mds RT (2 ceph-mds to -osd RT)
      ● ls -al
          ● open
          ● readdir
              – 1 ceph-mds RT (1 ceph-mds to -osd RT)
          ● stat each file
          ● close
      ● cp * /tmp
          ● N ceph-osd RT
      [diagram labels: ceph-mon, ceph-mds, ceph-osd]
  • 23. recursive accounting
      ● ceph-mds tracks recursive directory stats
          ● file sizes
          ● file and directory counts
          ● modification time
      ● virtual xattrs present full stats
      ● efficient

      $ ls -alSh | head
      total 0
      drwxr-xr-x 1 root            root      9.7T 2011-02-04 15:51 .
      drwxr-xr-x 1 root            root      9.7T 2010-12-16 15:06 ..
      drwxr-xr-x 1 pomceph         pg4194980 9.6T 2011-02-24 08:25 pomceph
      drwxr-xr-x 1 mcg_test1       pg2419992  23G 2011-02-02 08:57 mcg_test1
      drwx--x--- 1 luko            adm        19G 2011-01-21 12:17 luko
      drwx--x--- 1 eest            adm        14G 2011-02-04 16:29 eest
      drwxr-xr-x 1 mcg_test2       pg2419992 3.0G 2011-02-02 09:34 mcg_test2
      drwx--x--- 1 fuzyceph        adm       1.5G 2011-01-18 10:46 fuzyceph
      drwxr-xr-x 1 dallasceph      pg275     596M 2011-01-14 10:06 dallasceph
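The bookkeeping behind recursive directory stats can be sketched as follows. The `Dir` class is hypothetical, and the sketch updates every ancestor eagerly for simplicity; the slide does not say how or when ceph-mds propagates the totals. The field names echo the virtual xattrs a mounted CephFS exposes (for example `ceph.dir.rbytes` and `ceph.dir.rfiles`), but nothing here talks to a real filesystem.

```python
from typing import Dict, Optional


class Dir:
    """Toy directory that keeps recursive totals for its whole subtree."""

    def __init__(self, name: str, parent: Optional["Dir"] = None) -> None:
        self.name = name
        self.parent = parent
        self.files: Dict[str, int] = {}  # file name -> size in bytes
        self.rbytes = 0     # bytes in all files below this directory
        self.rfiles = 0     # number of files below this directory
        self.rsubdirs = 0   # number of directories below this directory
        if parent is not None:
            parent._propagate(0, 0, 1)

    def _propagate(self, dbytes: int, dfiles: int, dsubdirs: int) -> None:
        """Apply a delta to this directory and every ancestor up to the root."""
        node: Optional["Dir"] = self
        while node is not None:
            node.rbytes += dbytes
            node.rfiles += dfiles
            node.rsubdirs += dsubdirs
            node = node.parent

    def write_file(self, name: str, size: int) -> None:
        """Create a file or change its size, keeping recursive totals in sync."""
        old = self.files.get(name)
        self.files[name] = size
        if old is None:
            self._propagate(size, 1, 0)
        else:
            self._propagate(size - old, 0, 0)


root = Dir("/")
home = Dir("home", root)
alice = Dir("alice", home)
alice.write_file("a", 100)
alice.write_file("a", 250)   # resize: only the 150-byte delta propagates
home.write_file("b", 50)
```

With totals maintained this way, a listing like the `ls -alSh` output above can show a directory's full subtree size without walking the tree at read time.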
  • 24. snapshots
      ● volume or subvolume snapshots unusable at petabyte scale
          ● snapshot arbitrary subdirectories
      ● simple interface
          ● hidden '.snap' directory
          ● no special tools

      $ mkdir foo/.snap/one      # create snapshot
      $ ls foo/.snap
      one
      $ ls foo/bar/.snap
      _one_1099511627776         # parent's snap name is mangled
      $ rm foo/myfile
      $ ls -F foo
      bar/
      $ ls -F foo/.snap/one
      myfile bar/
      $ rmdir foo/.snap/one      # remove snapshot
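Because the interface is just `mkdir` and `rmdir` inside the hidden `.snap` directory, the same steps can be driven from any language with ordinary filesystem calls. The helper names below are hypothetical wrappers, and the sketch assumes the directory lives on a mounted CephFS where `.snap` exists implicitly; on any other filesystem the calls fail unless a plain `.snap` directory is created by hand, which is what the demo at the bottom does.

```python
import os
import tempfile
from typing import List

SNAP_DIR = ".snap"  # hidden snapshot directory named on the slide


def create_snapshot(directory: str, name: str) -> str:
    """Equivalent of `mkdir <directory>/.snap/<name>`."""
    path = os.path.join(directory, SNAP_DIR, name)
    os.mkdir(path)
    return path


def list_snapshots(directory: str) -> List[str]:
    """Equivalent of `ls <directory>/.snap`."""
    return sorted(os.listdir(os.path.join(directory, SNAP_DIR)))


def remove_snapshot(directory: str, name: str) -> None:
    """Equivalent of `rmdir <directory>/.snap/<name>`."""
    os.rmdir(os.path.join(directory, SNAP_DIR, name))


# Demo on a local temp directory: a plain ".snap" directory stands in for
# the one CephFS provides. This exercises the calls only; it does not
# capture any file contents the way a real snapshot would.
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, SNAP_DIR))
create_snapshot(demo_dir, "one")
snaps_after_create = list_snapshots(demo_dir)
remove_snapshot(demo_dir, "one")
snaps_after_remove = list_snapshots(demo_dir)
```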
  • 25. multiple client implementations
      ● Linux kernel client
          ● mount -t ceph 1.2.3.4:/ /mnt
          ● export (NFS), Samba (CIFS)
      ● ceph-fuse
      ● libcephfs.so
          ● your app
          ● Samba (CIFS)
          ● Ganesha (NFS)
          ● Hadoop (map/reduce)
      [diagram: Ganesha, Samba, Hadoop, and your app on libcephfs; ceph-fuse on fuse; NFS and SMB/CIFS on ceph; ceph and fuse in the kernel]
  • 26.
      ● LIBRADOS (APP): a library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP (AWESOME)
      ● RADOSGW (APP): a bucket-based REST gateway, compatible with S3 and Swift (AWESOME)
      ● RBD (HOST/VM): a reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver (AWESOME)
      ● CEPH FS (CLIENT): a POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE (NEARLY AWESOME)
      ● RADOS: a reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes (AWESOME)
  • 27. Path forward
      ● Testing
          ● Various workloads
          ● Multiple active MDSs
      ● Test automation
          ● Simple workload generator scripts
          ● Bug reproducers
      ● Hacking
          ● Bug squashing
          ● Long-tail features
      ● Integrations
          ● Ganesha, Samba, *stacks
  • 28.
  • 29. hard links?
      ● rare
      ● useful locality properties
          ● intra-directory
          ● parallel inter-directory
      ● on miss, file objects provide per-file backpointers
          ● degenerates to log(n) lookups
          ● optimistic read complexity
  • 30. what is journaled
      ● lots of state
          ● journaling is expensive up-front, cheap to recover
          ● non-journaled state is cheap, but complex (and somewhat expensive) to recover
      ● yes
          ● client sessions
          ● actual fs metadata modifications
      ● no
          ● cache provenance
          ● open files
      ● lazy flush
          ● client modifications may not be durable until fsync() or visible by another client
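The "lazy flush" point has a practical consequence for applications: a write that must survive a crash should be followed by an explicit `fsync()`. A minimal Python sketch using only standard `os` calls is shown below; the `durable_write` helper is a hypothetical name, and nothing in it is specific to Ceph.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data and block until the filesystem reports it as durable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the modification may still sit in client caches.
        os.fsync(fd)
    finally:
        os.close(fd)


# Usage on a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "state.bin")
durable_write(demo_path, b"checkpoint-1")
with open(demo_path, "rb") as handle:
    read_back = handle.read()
```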

Editor's notes

  • 2. Finally, let’s talk about Ceph FS. Ceph FS is a parallel filesystem that provides a massively scalable, single-hierarchy, shared disk. If you use a shared drive at work, this is the same thing except that the same drive could be shared by everyone you’ve ever met (and everyone they’ve ever met).
  • 3. Remember all that metadata we talked about in the beginning? Feels so long ago. It has to be stored somewhere! Something has to keep track of who created files, when they were created, and who has the right to access them. And something has to remember where they live within a tree. Enter MDS, the Ceph Metadata Server. Clients accessing Ceph FS data first make a request to an MDS, which provides what they need to get files from the right OSDs.
  • 4. There are multiple MDSs!
  • 5. If you aren’t running Ceph FS, you don’t need to deploy metadata servers.
  • 9. So how do you have one tree and multiple servers?
  • 11. If there’s just one MDS (which is a terrible idea), it manages metadata for the entire tree.
  • 12. When the second one comes along, it will intelligently partition the work by taking a subtree.
  • 13. When the third MDS arrives, it will attempt to split the tree again.
  • 14. Same with the fourth.
  • 15. An MDS can actually even just take a single directory or file, if the load is high enough. This all happens dynamically based on load and the structure of the data, and it’s called “dynamic subtree partitioning”.
  • 26. Ceph FS is feature-complete but still lacks the testing, quality assurance, and benchmarking work we feel it needs to recommend it for production use.