KFS aka Kosmos FS

  Nicolae Florin Petrovici
About me
• C++ programmer, several years of experience
• PHP programmer, several years of experience
• Working at 1&1 since March 2011, in the backup department
• Interested in: Linux, open source, architecture
  design, Nokia Qt, frameworks, Web 2.0
• Personal projects: Web 2.0 search site (will launch
  in Feb. 2012), small contributions to open source
• florin.petrovici@1and1.ro


What's this all about?

• KosmosFS – distributed FS, written in C++
• Previously known as Cloudstore
• Developed by Kosmix (American company)
• Open source: http://code.google.com/p/kosmosfs/
• Kosmix was so popular that it was acquired by
  Walmart, the largest retail store chain in the
  world
• Several days' experience with the product

Sneak peek: A large variety of products inside a Walmart store




Fun facts
• KFS is modelled after HDFS (Hadoop
  Distributed File System)
• C++ clone, version 0.5, unstable
• Bindings for Java, Python
• 4 bugs in the issue tracker, one submitted
  by yours truly
• Doesn't compile with g++ 4.6 (boost regex
  linking error)
• Has some potential
So what's a distributed filesystem?


• As it says, it is a filesystem which is distributed
• Allows access to files from multiple hosts
• Usually, has a custom protocol based on TCP/UDP
• Clients do not have direct access to block storage
• It is NOT a shared filesystem (like NFS)




So what's a distributed filesystem?

● Data is usually stored in chunks, across the network on multiple nodes
● Chunks are replicated using a replication factor of 2 or 3 (fault-tolerance)
● Metadata (the chunk location) is maintained using a metaserver
● Usually has some sort of single point of failure
● It is the next big thing advertised for handling BigData
● Hype started with Google Filesystem and MapReduce

Google Filesystem in a pic




   Chunks are usually 64 MB
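
To make the chunk idea concrete, here is a minimal sketch of how a file could be cut into 64 MB chunks and how each chunk could be given replicas on distinct nodes. The function names and the round-robin placement policy are assumptions made for illustration; neither GFS nor KFS is claimed to place chunks this way.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the usual chunk size mentioned on the slide


def split_into_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    chunks = []
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


def place_replicas(num_chunks, nodes, replication=3):
    """Assign each chunk to `replication` distinct nodes, round-robin.

    Round-robin is an illustrative policy only; a real system would also
    weigh free space and load.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    placement = {}
    for chunk_index in range(num_chunks):
        placement[chunk_index] = [
            nodes[(chunk_index + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    layout = split_into_chunks(200 * 1024 * 1024)  # a 200 MB file
    print(len(layout), "chunks; last chunk is", layout[-1][1], "bytes")
    print(place_replicas(len(layout), ["node-a", "node-b", "node-c", "node-d"], 2))
```

With a replication factor of 2, losing any single node still leaves one copy of every chunk, which is the fault-tolerance point made on the previous slide.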
Google Filesystem general overview
•   Cheap commodity hardware
•   Nodes are divided into: Master Node and Chunkservers
•   Chunkservers store the data, already broken up into chunks
•   Each chunk is assigned a unique 64-bit label (logical mapping of files to chunks)
•   Each chunk is replicated several times
•   Master Node: tables mapping the 64-bit labels to chunk locations, the locations of the copies of
    chunks, and which processes are reading or writing
•   Metadata is kept in memory and flushed from time to time (checkpoints)
•   Permissions for modifications are handled by a system of time-limited expiring “leases”
•   Lease – finite period of time of ownership on that chunk
•   The chunk is then propagated to the chunkservers with the backup copies
•   ACK mechanism => operation atomicity
•   Single point of Failure: Master Node
•   Userspace library, no FUSE
•   Copies: HDFS (Java stack), KFS (C++ stack)
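
The master's bookkeeping and the lease idea can be sketched in a few lines. This is a toy model under stated assumptions: the class name `Master`, its methods, and the 60-second lease length are all made up for illustration and do not come from GFS or KFS source code.

```python
import time


class Master:
    """Toy master: maps 64-bit chunk labels to replica locations and leases."""

    def __init__(self, lease_seconds=60.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds  # lease length is an assumed value
        self.clock = clock                  # injectable so expiry can be simulated
        self.locations = {}                 # chunk label -> list of chunkserver names
        self.leases = {}                    # chunk label -> (holder, expiry time)

    def register_chunk(self, label, servers):
        if not 0 <= label < 2 ** 64:
            raise ValueError("chunk label must fit in 64 bits")
        self.locations[label] = list(servers)

    def grant_lease(self, label, server):
        """Give `server` ownership of a chunk unless another live lease exists."""
        if server not in self.locations[label]:
            raise ValueError("server does not hold a replica of this chunk")
        holder, expiry = self.leases.get(label, (None, 0.0))
        now = self.clock()
        if holder is not None and holder != server and now < expiry:
            return False  # someone else still owns the chunk
        self.leases[label] = (server, now + self.lease_seconds)
        return True

    def lease_holder(self, label):
        """Return the current owner, or None once the lease has expired."""
        holder, expiry = self.leases.get(label, (None, 0.0))
        return holder if self.clock() < expiry else None
```

Because the lease simply expires, a crashed lease holder cannot block writes forever; the master only has to wait out the lease before handing the chunk to another replica.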
What were we talking about? KFS
•   Modelled after GFS
•   MetaServer, ChunkServer, client library
•   ChunkServer stores chunks as files
•   To protect against corruption, checksums are computed on each 64KB block and saved in the
    chunk metadata
•   On reads, checksum verification is done using the saved data
•   Each chunk file is named: [file-id].[chunk-id].version
•   Each chunk has a 16K header that contains the chunk checksum information, updated during
    writes
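
The per-block checksum scheme described above can be sketched as follows. Adler-32 (from Python's `zlib` module) is an assumption made for this example, since the slide does not say which checksum function KFS uses, and the helper names are hypothetical.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # checksums cover 64 KB blocks, as on the slide


def block_checksums(data, block_size=BLOCK_SIZE):
    """Return one checksum per block of `data`.

    Adler-32 is an assumption for this sketch, not a statement about KFS.
    """
    return [
        zlib.adler32(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


def corrupted_blocks(data, saved_checksums, block_size=BLOCK_SIZE):
    """Return the indexes of blocks whose checksum no longer matches."""
    current = block_checksums(data, block_size)
    return [
        index
        for index, (now, saved) in enumerate(zip(current, saved_checksums))
        if now != saved
    ]
```

Checking per block rather than per chunk means a read only has to verify the 64 KB blocks it touches, not the whole 64 MB chunk.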


The ChunkServer is only aware of its own chunks.
Upon restart, the metaserver validates the blocks and notifies the chunkserver of any stale
   blocks (not owned by any file in the system), resulting in the deletion of those chunks
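
A rough sketch of stale-chunk detection, built on the chunk file naming scheme from this slide. It assumes the three name components are decimal integers and that the metaserver can supply the set of live file ids; both are assumptions, and the function names are hypothetical.

```python
def parse_chunk_filename(name):
    """Split '<file-id>.<chunk-id>.<version>' into three integers.

    Decimal integer components are an assumption for this sketch.
    """
    file_id, chunk_id, version = name.split(".")
    return int(file_id), int(chunk_id), int(version)


def find_stale_chunks(chunk_filenames, live_file_ids):
    """Return the chunk files that no file known to the metaserver owns."""
    return [
        name
        for name in chunk_filenames
        if parse_chunk_filename(name)[0] not in live_file_ids
    ]


if __name__ == "__main__":
    on_disk = ["7.100.1", "7.101.1", "9.200.3"]
    print(find_stale_chunks(on_disk, live_file_ids={7}))
```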


KFS features
•   Per-file replication (of course)
•   Re-replication (if possible)
•   Data integrity (checksums)
•   File writes (lazy writing or force-flush)
•   Stale chunks detection
•   Bindings (already mentioned that)
•   FUSE (Filesystem in Userspace)
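
Re-replication "if possible" can be pictured like this: after a node dies, find the chunks with too few live copies and choose new hosts for them. This is a simplified sketch; the function name, the data shapes, and the alphabetical choice of target nodes are assumptions, not KFS behaviour.

```python
def plan_re_replication(chunk_locations, live_nodes, target=3):
    """Pick new hosts for under-replicated chunks, where possible.

    chunk_locations maps chunk id -> list of nodes that held a copy.
    Returns chunk id -> list of nodes that should receive a new copy.
    """
    plan = {}
    for chunk_id, nodes in chunk_locations.items():
        alive = [n for n in nodes if n in live_nodes]
        needed = target - len(alive)
        if needed <= 0 or not alive:
            continue  # fully replicated, or no surviving copy to read from
        candidates = sorted(n for n in live_nodes if n not in alive)
        if candidates:
            plan[chunk_id] = candidates[:needed]
    return plan


if __name__ == "__main__":
    locations = {1: ["a", "b"], 2: ["b", "c"], 3: ["c"]}
    # Node "c" has died; "d" is a spare.
    print(plan_re_replication(locations, live_nodes={"a", "b", "d"}, target=2))
```

Chunk 3 in the usage example has no surviving copy, so nothing can be planned for it; that is the "if possible" part.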
Advantages over HDFS

● File writing
    HDFS writes to a file once and reads it many times. KFS supports seeking within a file
    and writing multiple times
● Data integrity
    In HDFS, after you write to a file, the data becomes visible to other apps only when the app
    closes the file. So, if the process were to crash before closing the file, the data
    is lost. KFS exposes data as soon as it gets pushed out to the chunkservers. It
    also has a caching mechanism which can be disabled/enabled on the client side
● Data rebalancing
    Rudimentary support for automatic rebalancing (the system may migrate
    chunks from over-utilized nodes to under-utilized nodes)
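
Rebalancing in its most rudimentary form can be sketched as repeatedly moving one chunk from the fullest node to the emptiest one. Counting chunks instead of bytes, the `tolerance` parameter, and the function name are simplifying assumptions for this example, not a description of the KFS rebalancer.

```python
def plan_rebalance(used_chunks, tolerance=1):
    """Return (source, destination) moves that even out chunk counts.

    used_chunks maps node name -> number of chunks stored. The loop stops
    once the fullest and emptiest nodes differ by at most `tolerance`.
    """
    if tolerance < 1:
        raise ValueError("tolerance must be at least 1, or the loop may never settle")
    counts = dict(used_chunks)
    moves = []
    while counts:
        busiest = max(counts, key=counts.get)
        idlest = min(counts, key=counts.get)
        if counts[busiest] - counts[idlest] <= tolerance:
            break
        counts[busiest] -= 1
        counts[idlest] += 1
        moves.append((busiest, idlest))
    return moves


if __name__ == "__main__":
    print(plan_rebalance({"n1": 5, "n2": 1, "n3": 3}))
```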


Insert smiley face here




Java people – why should you be interested?
•       KFS can be integrated in the Hadoop chain
•       Instructions here: http://code.google.com/p/kosmosfs/wiki/UsingKFSWithHadoop
Still
•       Actually, you shouldn't be
Hadoop stack is much better:
         - Provides all the necessary tools for MapReduce jobs
         - Hadoop also has streaming support => clients can also be written in Python/C
         - Pig, Hive and other analysis frameworks
         - HDFS is widely used in conjunction with HBase
•       But in a few years:
•       Imagine a world with the C++ equivalent stack:
         - KFS – distributed filesystem
         - Hypertable – C++ equivalent of HBase (http://www.hypertable.org/)
         - MapReduce in C++, anyone?
         Major advantage: lower memory footprint, faster loading times etc.
So it's basically

Moore vs Lennon

Building (for those interested)

• CMake-based build system
• Binaries are created inside a “build”
  directory
• Option to build the JNI/Python bindings
  and the FUSE filesystem
• Doesn't work with g++ 4.6 (boost_regex
  linking error)


Deployment

•   Very engineer-wise:
•   You must have passwordless SSH access on all machines
•   Script copies binaries and scripts to all nodes
•   Caveat: Doesn't work on different architectures
•   Define a machine configuration file
•   Define a machines.txt file (names of the nodes from machines.cfg file)
•   Deploy: python kfssetup.py -f machines.cfg -m machines.txt -b ../build -w ../webui
•   Start: python kfslaunch.py -f machines.cfg -m machines.txt -s
•   Stop: python kfslaunch.py -f machines.cfg -m machines.txt -S
•   Check status: kfsping -m -s <metaserver host> -p <metaserver port>
•   Uninstall: python kfssetup.py -f machines.cfg -m machines.txt -b ../build/bin -U
•   Also comes with a Python webserver which calls kfsping in the background for
    those interested
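
If you get tired of retyping these, the commands above can be assembled by a small helper. This is only a convenience sketch: `kfs_commands` is a hypothetical name, and the script names and flags are copied verbatim from the slide rather than checked against the KFS sources.

```python
def kfs_commands(cfg="machines.cfg", machines="machines.txt",
                 build_dir="../build", webui_dir="../webui"):
    """Return the deployment commands from the slide as argv lists."""
    common = ["-f", cfg, "-m", machines]
    return {
        "deploy": ["python", "kfssetup.py", *common, "-b", build_dir, "-w", webui_dir],
        "start": ["python", "kfslaunch.py", *common, "-s"],
        "stop": ["python", "kfslaunch.py", *common, "-S"],
        # The slide passes the bin subdirectory of the build tree when uninstalling.
        "uninstall": ["python", "kfssetup.py", *common, "-b", build_dir + "/bin", "-U"],
    }


if __name__ == "__main__":
    for name, argv in kfs_commands().items():
        print(name + ":", " ".join(argv))
```

Each list can be handed to `subprocess.run(argv, check=True)` from the directory that holds the KFS scripts.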




Example config file
[metaserver]
node: machine1
clusterkey: kfs-test-cluster
rundir: /mnt/kfs/meta
baseport: 20000
loglevel: INFO
numservers: 2

[chunkserver_defaults]
rundir: /mnt/kfs/chunk
chunkDir: /mnt/kfs/chunk/bin/kfschunk
baseport: 30000
space: 3400 G
loglevel: INFO
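
The file above is INI-shaped, so it can be sanity-checked with Python's standard `configparser` before deploying. This is a sketch under assumptions: the helper names are hypothetical, the reading of "3400 G" as binary gigabytes is a guess, and nothing here claims to mirror how the KFS scripts actually read the file.

```python
import configparser

SAMPLE = """\
[metaserver]
node: machine1
clusterkey: kfs-test-cluster
rundir: /mnt/kfs/meta
baseport: 20000
loglevel: INFO
numservers: 2

[chunkserver_defaults]
rundir: /mnt/kfs/chunk
chunkDir: /mnt/kfs/chunk/bin/kfschunk
baseport: 30000
space: 3400 G
loglevel: INFO
"""


def load_cluster_config(text):
    """Parse the machine configuration text into a ConfigParser."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. chunkDir
    parser.read_string(text)
    return parser


def space_in_bytes(value):
    """Convert a value such as '3400 G' to bytes (unit meanings are assumed)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    number, unit = value.split()
    return int(number) * units[unit]


if __name__ == "__main__":
    cfg = load_cluster_config(SAMPLE)
    print(cfg["metaserver"]["node"], cfg["metaserver"].getint("baseport"))
    print(space_in_bytes(cfg["chunkserver_defaults"]["space"]))
```

Note that the keys must start in the first column: `configparser` treats an indented line as a continuation of the previous value.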

Cluster setup

• I proceeded enthusiastically to set up my own KFS cluster out of:
• x86_32 laptop (3 GB RAM) – masterNode
• x86_32 netbook (Atom 1.2 GHz, 1 GB RAM) – chunkServer
• armv7 SheevaPlug (ARM 1.2 GHz, 512 MB RAM) – chunkServer


Slow compile-time on the SheevaPlug (over 1 hour)
Communication done via wireless


By the way, does anyone want to buy a netbook?



Mounting via FUSE

• If you want to mount via FUSE:
kfs_mount <mount_point> -f (force foreground)
You need to have a kfs.prp file:


• metaServer.name = localhost
• metaServer.port = 20000

Unfortunately, the FUSE mount sometimes produces a segfault in
 one of the chunkservers. C'est la vie. I already filed a bug for this.
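
Before mounting, it can help to check that kfs.prp really contains the two keys shown above. The parser below is a minimal sketch: it assumes plain `key = value` lines and treats `#` lines as comments, which is a guess about the file format, and both function names are hypothetical.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blanks and '#' comments.

    Treating '#' as a comment marker is an assumption for this sketch.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def metaserver_address(props):
    """Return (host, port) from the two keys shown on the slide."""
    return props["metaServer.name"], int(props["metaServer.port"])


if __name__ == "__main__":
    sample = "metaServer.name = localhost\nmetaServer.port = 20000\n"
    print(metaserver_address(parse_properties(sample)))
```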




Accessing via the client library

• Rich undocumented OOP/OOD API
• Main entry point: KfsClientFactory
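Something along the lines of:

client = getKfsClientFactory()->GetClient(kfsPropsFile);  // connect using the properties file
client->Mkdirs(dirname);                                  // create the directory
fd = client->Create(filename.c_str(), numReplicas);       // create the file, keep its descriptor
client->SetIoBufferSize(fd, cliBufSize);                  // set the client-side buffer size
client->Write(fd, buffer, sizeBytes);                     // write sizeBytes bytes from buffer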


Other APIs: Enable/Disable Async write, CompareChunkReplicas, GetDirSummary, AtomicRecordAppend


Also available in the Java/Python distribution near you




Support opensource
• This project needs your help in order to
  thrive and to provide a real alternative to
  the Java stack
• API is good, bindings are ready
• Still has some bugs to be fixed
• Not production ready, maybe in a few
  years?


23

More Related Content

What's hot

Quick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLESQuick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLESJan Kalcic
 
An Updated Performance Comparison of Virtual Machines and Linux Containers
An Updated Performance Comparison of Virtual Machines and Linux ContainersAn Updated Performance Comparison of Virtual Machines and Linux Containers
An Updated Performance Comparison of Virtual Machines and Linux ContainersKento Aoyama
 
Stateful Containers: Flocker on CoreOS
Stateful Containers: Flocker on CoreOSStateful Containers: Flocker on CoreOS
Stateful Containers: Flocker on CoreOSStephen Nguyen
 
OSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin CharlesOSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin CharlesNETWAYS
 
OSDC 2016 - Interesting things you can do with ZFS by Allan Jude&Benedict Reu...
OSDC 2016 - Interesting things you can do with ZFS by Allan Jude&Benedict Reu...OSDC 2016 - Interesting things you can do with ZFS by Allan Jude&Benedict Reu...
OSDC 2016 - Interesting things you can do with ZFS by Allan Jude&Benedict Reu...NETWAYS
 
Python on FreeBSD
Python on FreeBSDPython on FreeBSD
Python on FreeBSDpycontw
 
Docker and coreos20141020b
Docker and coreos20141020bDocker and coreos20141020b
Docker and coreos20141020bRichard Kuo
 
Hands On Gluster with Jeff Darcy
Hands On Gluster with Jeff DarcyHands On Gluster with Jeff Darcy
Hands On Gluster with Jeff DarcyGluster.org
 
Scaling Servers and Storage for Film Assets
Scaling Servers and Storage for Film Assets  Scaling Servers and Storage for Film Assets
Scaling Servers and Storage for Film Assets Perforce
 
GlusterFS Update and OpenStack Integration
GlusterFS Update and OpenStack IntegrationGlusterFS Update and OpenStack Integration
GlusterFS Update and OpenStack IntegrationEtsuji Nakai
 
Webinar - Getting Started With Ceph
Webinar - Getting Started With CephWebinar - Getting Started With Ceph
Webinar - Getting Started With CephCeph Community
 
Varnish Configuration Step by Step
Varnish Configuration Step by StepVarnish Configuration Step by Step
Varnish Configuration Step by StepKim Stefan Lindholm
 
Java in containers
Java in containersJava in containers
Java in containersMartin Baez
 
pg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL Performancepg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL PerformanceSean Chittenden
 
Making clouds: turning opennebula into a product
Making clouds: turning opennebula into a productMaking clouds: turning opennebula into a product
Making clouds: turning opennebula into a productCarlo Daffara
 
OpenZFS data-driven performance
OpenZFS data-driven performanceOpenZFS data-driven performance
OpenZFS data-driven performanceahl0003
 
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, Oracle
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, OracleXPDS14: Xen 4.5 Roadmap - Konrad Wilk, Oracle
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, OracleThe Linux Foundation
 

What's hot (20)

Quick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLESQuick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLES
 
An Updated Performance Comparison of Virtual Machines and Linux Containers
An Updated Performance Comparison of Virtual Machines and Linux ContainersAn Updated Performance Comparison of Virtual Machines and Linux Containers
An Updated Performance Comparison of Virtual Machines and Linux Containers
 
Stateful Containers: Flocker on CoreOS
Stateful Containers: Flocker on CoreOSStateful Containers: Flocker on CoreOS
Stateful Containers: Flocker on CoreOS
 
LXC
LXCLXC
LXC
 
OSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin CharlesOSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin Charles
 
OSDC 2016 - Interesting things you can do with ZFS by Allan Jude&Benedict Reu...
OSDC 2016 - Interesting things you can do with ZFS by Allan Jude&Benedict Reu...OSDC 2016 - Interesting things you can do with ZFS by Allan Jude&Benedict Reu...
OSDC 2016 - Interesting things you can do with ZFS by Allan Jude&Benedict Reu...
 
Python on FreeBSD
Python on FreeBSDPython on FreeBSD
Python on FreeBSD
 
Docker and coreos20141020b
Docker and coreos20141020bDocker and coreos20141020b
Docker and coreos20141020b
 
Hands On Gluster with Jeff Darcy
Hands On Gluster with Jeff DarcyHands On Gluster with Jeff Darcy
Hands On Gluster with Jeff Darcy
 
CvmFS Workshop
CvmFS WorkshopCvmFS Workshop
CvmFS Workshop
 
Scaling Servers and Storage for Film Assets
Scaling Servers and Storage for Film Assets  Scaling Servers and Storage for Film Assets
Scaling Servers and Storage for Film Assets
 
GlusterFS Update and OpenStack Integration
GlusterFS Update and OpenStack IntegrationGlusterFS Update and OpenStack Integration
GlusterFS Update and OpenStack Integration
 
Webinar - Getting Started With Ceph
Webinar - Getting Started With CephWebinar - Getting Started With Ceph
Webinar - Getting Started With Ceph
 
Varnish Configuration Step by Step
Varnish Configuration Step by StepVarnish Configuration Step by Step
Varnish Configuration Step by Step
 
Java in containers
Java in containersJava in containers
Java in containers
 
pg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL Performancepg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL Performance
 
Docker.io
Docker.ioDocker.io
Docker.io
 
Making clouds: turning opennebula into a product
Making clouds: turning opennebula into a productMaking clouds: turning opennebula into a product
Making clouds: turning opennebula into a product
 
OpenZFS data-driven performance
OpenZFS data-driven performanceOpenZFS data-driven performance
OpenZFS data-driven performance
 
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, Oracle
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, OracleXPDS14: Xen 4.5 Roadmap - Konrad Wilk, Oracle
XPDS14: Xen 4.5 Roadmap - Konrad Wilk, Oracle
 

Similar to Kfs presentation

Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015 Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015 Roger Zhou 周志强
 
Leonid Vasilyev "Building, deploying and running production code at Dropbox"
Leonid Vasilyev  "Building, deploying and running production code at Dropbox"Leonid Vasilyev  "Building, deploying and running production code at Dropbox"
Leonid Vasilyev "Building, deploying and running production code at Dropbox"IT Event
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Community
 
Storing and distributing data
Storing and distributing dataStoring and distributing data
Storing and distributing dataPhil Cryer
 
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio ManfredOSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio ManfredNETWAYS
 
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...OpenNebula Project
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introductionkanedafromparis
 
[발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)
[발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community) [발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)
[발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community) 동현 김
 
Performance characterization in large distributed file system with gluster fs
Performance characterization in large distributed file system with gluster fsPerformance characterization in large distributed file system with gluster fs
Performance characterization in large distributed file system with gluster fsNeependra Khare
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)MongoDB
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container EcosystemVinay Rao
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Neeraj Shrimali
 
bfarm-v2
bfarm-v2bfarm-v2
bfarm-v2Zeus G
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions Alfresco Software
 
[KubeCon NA 2020] containerd: Rootless Containers 2020
[KubeCon NA 2020] containerd: Rootless Containers 2020[KubeCon NA 2020] containerd: Rootless Containers 2020
[KubeCon NA 2020] containerd: Rootless Containers 2020Akihiro Suda
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Hajime Tazaki
 

Similar to Kfs presentation (20)

Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015 Linux High Availability Overview - openSUSE.Asia Summit 2015
Linux High Availability Overview - openSUSE.Asia Summit 2015
 
Leonid Vasilyev "Building, deploying and running production code at Dropbox"
Leonid Vasilyev  "Building, deploying and running production code at Dropbox"Leonid Vasilyev  "Building, deploying and running production code at Dropbox"
Leonid Vasilyev "Building, deploying and running production code at Dropbox"
 
Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development Ceph Day London 2014 - The current state of CephFS development
Ceph Day London 2014 - The current state of CephFS development
 
Storing and distributing data
Storing and distributing dataStoring and distributing data
Storing and distributing data
 
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio ManfredOSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
 
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 
[발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)
[발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community) [발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)
[발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)
 
HDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the CloudHDFCloud Workshop: HDF5 in the Cloud
HDFCloud Workshop: HDF5 in the Cloud
 
Performance characterization in large distributed file system with gluster fs
Performance characterization in large distributed file system with gluster fsPerformance characterization in large distributed file system with gluster fs
Performance characterization in large distributed file system with gluster fs
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Dfs in iaa_s
Dfs in iaa_sDfs in iaa_s
Dfs in iaa_s
 
Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup.
 
bfarm-v2
bfarm-v2bfarm-v2
bfarm-v2
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
 
[KubeCon NA 2020] containerd: Rootless Containers 2020
[KubeCon NA 2020] containerd: Rootless Containers 2020[KubeCon NA 2020] containerd: Rootless Containers 2020
[KubeCon NA 2020] containerd: Rootless Containers 2020
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)
 

Recently uploaded

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Kfs presentation

  • 1. KFS aka Kosmos FS Nicolae Florin Petrovici
  • 2. About me
      • C++ programmer, several years of experience
      • PHP programmer, several years of experience
      • Working at 1&1 since March 2011, in the backup department
      • Interested in: Linux, open source, architecture design, Nokia Qt, frameworks, Web 2.0
      • Personal projects: Web 2.0 search site (will launch in Feb. 2012), small contributions to open source
      • florin.petrovici@1and1.ro
  • 3. What's this all about?
      • KosmosFS – distributed FS, written in C++
      • Previously known as Cloudstore
      • Developed by Kosmix (American company)
      • Open source: http://code.google.com/p/kosmosfs/
      • Kosmix was so popular that it was acquired by Walmart, the largest retail store chain in the world
      • Several days' experience with the product
  • 4. Sneak peek: A large variety of products inside a Walmart store
  • 5. Fun facts
      • KFS is modelled after HDFS (Hadoop Filesystem)
      • C++ clone, version 0.5, unstable
      • Bindings for Java, Python
      • 4 bugs in the issue tracker, one submitted by yours truly
      • Doesn't compile with g++ 4.6 (boost regex linking error)
      • Has some potential
  • 6. So what's a distributed filesystem?
      • As it says, it is a filesystem which is distributed
      • Allows access to files from multiple hosts
      • Usually has a custom protocol based on TCP/UDP
      • Clients do not have direct access to block storage
      • It is NOT a shared filesystem (like NFS)
  • 7. So what's a distributed filesystem?
      • Data is usually stored in chunks, across the network on multiple nodes
      • Chunks are replicated using a replication factor of 2 or 3 (fault-tolerance)
      • Metadata (the chunk location) is maintained using a metaserver
      • Usually has some sort of single point of failure
      • It is the next big thing advertised for handling BigData
      • Hype started with Google Filesystem and MapReduce
  • 8. Google Filesystem in a pic
      • Chunks are usually 64 MB
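To make the chunking idea concrete, here is a minimal sketch of how a byte offset in a file could be translated to a chunk, assuming the 64 MB chunk size mentioned on the slide. The names `kChunkSize`, `ChunkPosition`, `locate` and `chunkCount` are hypothetical and are not taken from GFS or KFS.

```cpp
#include <cstdint>

// Assumed chunk size: the 64 MB figure quoted on the slide.
constexpr std::uint64_t kChunkSize = 64ULL * 1024 * 1024;

// Position of a byte inside a chunked file (hypothetical helper type).
struct ChunkPosition {
    std::uint64_t chunkIndex;   // which chunk of the file holds the byte
    std::uint64_t chunkOffset;  // offset of the byte inside that chunk
};

// Translate a file offset into (chunk index, offset within the chunk).
constexpr ChunkPosition locate(std::uint64_t fileOffset) {
    return ChunkPosition{fileOffset / kChunkSize, fileOffset % kChunkSize};
}

// Number of chunks needed to hold a file of the given size (rounds up).
constexpr std::uint64_t chunkCount(std::uint64_t fileSize) {
    return (fileSize + kChunkSize - 1) / kChunkSize;
}
```

For example, `locate(kChunkSize + 5)` lands in the second chunk (index 1) at offset 5, and a file one byte larger than a chunk needs two chunks.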
  • 9. Google Filesystem general overview
      • Cheap commodity hardware
      • Nodes are divided into a Master Node and Chunkservers
      • Chunkservers store the data, already broken up into chunks
      • Each chunk is assigned a unique 64-bit label (logical mapping of files to chunks)
      • Each chunk is replicated several times
      • Master Node: tables mapping the 64-bit labels to chunk locations, the locations of the chunk copies, and which processes are reading or writing
      • Metadata is kept in memory and flushed from time to time (checkpoints)
      • Permissions for modifications are handled by a system of time-limited, expiring “leases”
      • Lease: a finite period of ownership of a chunk
      • The chunk is then propagated to the chunkservers holding the backup copies
      • ACK mechanism => operation atomicity
      • Single point of failure: Master Node
      • Userspace library, no FUSE
      • Copies: HDFS (Java stack), KFS (C++ stack)
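The master's bookkeeping and the lease idea can be sketched as a toy in-memory model. This is only an illustration under assumptions: the class `ToyMaster`, its methods, and the use of plain integer timestamps are all hypothetical and do not reflect the real GFS or KFS implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using ChunkHandle = std::uint64_t;  // the unique 64-bit chunk label

// A time-limited ownership of a chunk (hypothetical representation).
struct Lease {
    std::string holder;       // chunkserver that currently owns the chunk
    std::uint64_t expiresAt;  // logical time at which the ownership ends
};

class ToyMaster {
public:
    // Record that `file` contains chunk `h`, stored on the given chunkservers.
    void addChunk(const std::string& file, ChunkHandle h,
                  std::vector<std::string> replicas) {
        assert(!replicas.empty());
        fileToChunks_[file].push_back(h);
        locations_[h] = std::move(replicas);
    }

    // Grant a lease unless a different holder still has an unexpired one.
    bool grantLease(ChunkHandle h, const std::string& holder,
                    std::uint64_t now, std::uint64_t duration) {
        auto it = leases_.find(h);
        if (it != leases_.end() && it->second.expiresAt > now &&
            it->second.holder != holder) {
            return false;
        }
        leases_[h] = Lease{holder, now + duration};
        return true;
    }

    // Lookups; both throw std::out_of_range for unknown keys.
    const std::vector<ChunkHandle>& chunksOf(const std::string& file) const {
        return fileToChunks_.at(file);
    }
    const std::vector<std::string>& replicasOf(ChunkHandle h) const {
        return locations_.at(h);
    }

private:
    std::map<std::string, std::vector<ChunkHandle>> fileToChunks_;  // file -> chunks
    std::map<ChunkHandle, std::vector<std::string>> locations_;     // chunk -> replicas
    std::map<ChunkHandle, Lease> leases_;                           // chunk -> lease
};
```

With this model, a second chunkserver asking for a lease on the same chunk is refused until the first lease has expired, which is the property the slide relies on for ordering modifications.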
  • 10. What were we talking about? KFS
      • Modelled after GFS
      • MetaServer, ChunkServer, client library
      • ChunkServer stores chunks as files
      • To protect against corruption, checksums are computed on each 64 KB block and saved in the chunk metadata
      • On reads, checksum verification is done using the saved data
      • Each chunk file is named: [file-id].[chunk-id].version
      • Each chunk has a 16K header that contains the chunk checksum information, updated during writes
      • The ChunkServer is only aware of its own chunks. Upon restart, the metaserver validates the blocks and notifies the chunkserver of any stale blocks (not owned by any file in the system), resulting in the deletion of those chunks
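The per-block checksumming and the chunk file naming described on this slide can be sketched as follows. The choice of Adler-32 is an assumption made for illustration (the slide does not name the algorithm), and all function names here are hypothetical rather than part of the KFS code base.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::size_t kChecksumBlockSize = 64 * 1024;  // 64 KB, as on the slide

// Adler-32 checksum of a buffer (assumed algorithm, simple byte-wise form).
inline std::uint32_t adler32(const unsigned char* data, std::size_t len) {
    const std::uint32_t kMod = 65521;
    std::uint32_t a = 1, b = 0;
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % kMod;
        b = (b + a) % kMod;
    }
    return (b << 16) | a;
}

// One checksum per 64 KB block of the chunk; the last block may be shorter.
inline std::vector<std::uint32_t> blockChecksums(
        const std::vector<unsigned char>& chunk) {
    std::vector<std::uint32_t> sums;
    for (std::size_t off = 0; off < chunk.size(); off += kChecksumBlockSize) {
        const std::size_t len = std::min(kChecksumBlockSize, chunk.size() - off);
        sums.push_back(adler32(chunk.data() + off, len));
    }
    return sums;
}

// On read: recompute the checksum of one block and compare with the saved value.
inline bool verifyBlock(const std::vector<unsigned char>& chunk,
                        const std::vector<std::uint32_t>& saved,
                        std::size_t blockIndex) {
    assert(blockIndex < saved.size());
    const std::size_t off = blockIndex * kChecksumBlockSize;
    if (off >= chunk.size()) return false;
    const std::size_t len = std::min(kChecksumBlockSize, chunk.size() - off);
    return adler32(chunk.data() + off, len) == saved[blockIndex];
}

// Build a chunk file name following the [file-id].[chunk-id].version pattern.
inline std::string chunkFileName(std::uint64_t fileId, std::uint64_t chunkId,
                                 std::uint64_t version) {
    return std::to_string(fileId) + "." + std::to_string(chunkId) + "." +
           std::to_string(version);
}
```

Checksumming small blocks rather than the whole chunk means a read only has to verify the blocks it touches, and a corrupted block can be pinpointed without rereading 64 MB.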
  • 11. KFS features
      • Per-file replication (of course)
      • Re-replication (if possible)
      • Data integrity (checksums)
      • File writes (lazy writing or force-flush)
      • Stale chunk detection
      • Bindings (already mentioned that)
      • FUSE (Filesystem in Userspace)
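Two of the features listed above, stale chunk detection and re-replication, boil down to simple set comparisons on the metaserver side. The sketch below illustrates that idea only; `findStaleChunks` and `underReplicated` are hypothetical names, and the real KFS metaserver logic is certainly more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Stale chunk detection: chunk ids a chunkserver reports that no file owns.
inline std::vector<std::uint64_t> findStaleChunks(
        const std::vector<std::uint64_t>& reported,
        const std::set<std::uint64_t>& ownedByFiles) {
    std::vector<std::uint64_t> stale;
    for (std::uint64_t id : reported) {
        if (ownedByFiles.count(id) == 0) stale.push_back(id);
    }
    return stale;
}

// Re-replication: chunks whose live replica count is below the file's target.
inline std::vector<std::uint64_t> underReplicated(
        const std::map<std::uint64_t, std::size_t>& liveReplicas,
        std::size_t target) {
    assert(target > 0);
    std::vector<std::uint64_t> needCopies;
    for (const auto& entry : liveReplicas) {
        if (entry.second < target) needCopies.push_back(entry.first);
    }
    return needCopies;
}
```

The stale list is what the chunkserver would be told to delete; the under-replicated list is what the system would try to copy to other nodes, "if possible".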
  • 12. Advantages over HDFS
      • File writing: HDFS writes to a file once and reads it many times. KFS supports seeking in a file and writing multiple times
      • Data integrity: in HDFS, data written to a file becomes visible to other apps only when the writing app closes the file. So, if the process were to crash before closing the file, the data is lost. KFS exposes data as soon as it gets pushed out to the chunkservers. It also has a caching mechanism which can be enabled or disabled on the client side
      • Data rebalancing: rudimentary support for automatic rebalancing (the system may migrate chunks from over-utilized nodes to under-utilized nodes)
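The rebalancing idea (migrating chunks from over-utilized to under-utilized nodes) can be illustrated with a greedy sketch. It assumes utilization is measured simply as a chunk count per node, which is a simplification; `Node`, `Move` and `rebalance` are hypothetical names and this is not the algorithm KFS actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Node {
    std::string name;
    std::size_t chunks;  // number of chunks stored (stand-in for utilization)
};

struct Move {
    std::string from;  // over-utilized node giving up a chunk
    std::string to;    // under-utilized node receiving it
};

// Greedy rebalancing: move one chunk at a time from the fullest node to the
// emptiest one until their difference is at most `tolerance` chunks.
inline std::vector<Move> rebalance(std::vector<Node>& nodes,
                                   std::size_t tolerance) {
    assert(tolerance >= 1);  // a tolerance of 0 could make chunks ping-pong
    std::vector<Move> moves;
    if (nodes.size() < 2) return moves;
    for (;;) {
        auto extremes = std::minmax_element(
            nodes.begin(), nodes.end(),
            [](const Node& a, const Node& b) { return a.chunks < b.chunks; });
        auto emptiest = extremes.first;
        auto fullest = extremes.second;
        if (fullest->chunks - emptiest->chunks <= tolerance) break;
        --fullest->chunks;
        ++emptiest->chunks;
        moves.push_back(Move{fullest->name, emptiest->name});
    }
    return moves;
}
```

Starting from nodes holding 10, 2 and 6 chunks with a tolerance of 1, the sketch moves four chunks from the first node to the second and ends with 6 chunks on each.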
  • 14. Java people – why should you be interested?
      • KFS can be integrated in the Hadoop chain
      • Instructions here: http://code.google.com/p/kosmosfs/wiki/UsingKFSWithHadoop
      • Still... actually, you shouldn't be. The Hadoop stack is much better:
          - Provides all the necessary tools for MapReduce jobs
          - Hadoop also has streaming support => clients can also be written in Python/C
          - Pig, Hive and other analysis frameworks
          - HDFS is widely used in conjunction with HBase
      • But in a few years, imagine a world with the C++ equivalent stack:
          - KFS – distributed filesystem
          - Hypertable – C++ equivalent of HBase (http://www.hypertable.org/)
          - MapReduce in C++, anyone?
      • Major advantage: lower memory footprint, faster loading times, etc.
  • 15. So it's basically Moore vs Lennon
  • 16. Building (for those interested)
      • CMake-based build system
      • Binaries are created inside a “build” directory
      • Option to build the JNI/Python bindings and the FUSE filesystem
      • Doesn't work with g++ 4.6 (boost_regex linking error)
  • 17. Deployment
      • Very engineer-oriented:
      • You must have passwordless SSH access on all machines
      • A script copies the binaries and scripts to all nodes
      • Caveat: doesn't work across different architectures
      • Define a machine configuration file
      • Define a machines.txt file (names of the nodes from the machines.cfg file)
      • Deploy: python kfssetup.py -f machines.cfg -m machines.txt -b ../build -w ../webui
      • Start: python kfslaunch.py -f machines.cfg -m machines.txt -s
      • Stop: python kfslaunch.py -f machines.cfg -m machines.txt -S
      • Check status: kfsping -m -s <metaserver host> -p <metaserver port>
      • Uninstall: python kfssetup.py -f machines.cfg -m machines.txt -b ../build/bin -U
      • Also comes with a Python webserver which calls kfsping in the background, for those interested
  • 18. Example config file
        [metaserver]
        node: machine1
        clusterkey: kfs-test-cluster
        rundir: /mnt/kfs/meta
        baseport: 20000
        loglevel: INFO
        numservers: 2
        [chunkserver_defaults]
        rundir: /mnt/kfs/chunk
        chunkDir: /mnt/kfs/chunk/bin/kfschunk
        baseport: 30000
        space: 3400 G
        loglevel: INFO
  • 19. Cluster setup
      • I proceeded enthusiastically to set up my own KFS cluster out of:
          - x86_32 laptop (3 GB RAM) – masterNode
          - x86_32 netbook (Atom 1.2 GHz, 1 GB RAM) – chunkServer
          - armv7 SheevaPlug (ARM 1.2 GHz, 512 MB RAM) – chunkServer
      • Slow compile time on the SheevaPlug (over 1 hour)
      • Communication done via wireless
      • By the way, does anyone want to buy a netbook?
  • 20. Mounting via FUSE
      • If you want to mount via FUSE: kfs_mount <mount_point> -f (force foreground)
      • You need to have a kfs.prp file:
          metaServer.name = localhost
          metaServer.port = 20000
      • Unfortunately, the FUSE mount sometimes produces a segfault in one of the chunkservers. C'est la vie. I have already filed a bug for this
  • 21. Accessing via the client library
      • Rich, undocumented OOP/OOD API
      • Main entry point: KfsClientFactory
      • Something along the lines of:
          client = getKfsClientFactory()->GetClient(kfsPropsFile);
          client->Mkdirs(dirname);
          fd = client->Create(filename.c_str(), numReplicas);
          client->SetIoBufferSize(fd, cliBufSize);
          client->Write(fd, buffer, sizeBytes);
      • Other APIs: enable/disable async write, CompareChunkReplicas, GetDirSummary, AtomicRecordAppend
      • Also available in the Java/Python distribution near you
  • 22. Support open source
      • This project needs your help in order to thrive and to provide a real alternative to the Java stack
      • API is good, bindings are ready
      • Still has some bugs to be fixed
      • Not production ready, maybe in a few years?