A Travel
Through Mesos
Episode I
1. What is Mesos?
An introduction to Mesos and
its architecture
We Want to Buy Oranges...
[Figure labels: 8 Kg, 4kg?, 1 Kg, 2 Kg]
We Need to Try Until There Are Enough
One Big Shop Instead of Three!!
What is Mesos?
Resource Manager
Mesos abstracts
computing resources
from nodes in the
datacenter.
“Program against your
datacenter like it’s a single
pool of resources”
Different workloads
Mesos is a platform for
sharing a cluster
between applications. It
can scale up to 10,000s
of nodes.
Uses containerization
Workloads are launched
in containers (Mesos
containerizer or Docker),
providing a level of
isolation.
A Distributed Systems Kernel
Just as an OS manages resource utilization, allowing concurrent use of limited
resources by multiple applications, Mesos applies this principle to a whole cluster
of machines to provide resource management and scheduling across the cluster.
2. Architecture
http://mesos.apache.org/documentation/latest/architecture/
http://www.datio.com/2017/01/02/mesos-architecture-roles-and-responsibilities/
Zookeeper
Masters Agents
Mesos Architecture
Master Nodes
● Source of truth of the cluster status
(in memory - high memory usage)
● Send resource offers to the
applications.
● Host primary UI
● High availability with active-passive
replication, using Zookeeper for
leader election and Paxos for state
sharing.
Zookeeper
Agent Nodes
● Launch containers running
application tasks.
● Advertise their available resources
to the master.
● Host a UI for the launched
containers.
● Manage status updates from the
running tasks and handle
communication with the master.
● Known as slaves until 0.28
MESOS IS NOT AN OS
The Kernel comparison can be confusing: each node
has an OS installed and Mesos runs as a service
daemon on it
3. Resources and attributes
http://mesos.apache.org/documentation/latest/attributes-resources/
What is a Resource?
Types
● SCALAR (1024.0)
● RANGE ([1-10])
● SET ({elem1, elem2})
Predefined resources
● cpus
● mem
● disk
● ports
Everything an application task uses for doing its work
Resources are Defined by Agent
● Each Mesos agent is
configured with the
resources it has.
● The agent continuously
sends updates to the
master with its available
resources.
cpus 8.0
mem 4096.0
disk 1024.0
ports [9000-65535]
cpus 16.0
mem 8192.0
disk 512.0
ports [9000-10000]
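A minimal sketch of how resource definitions like these could be parsed. It assumes the semicolon-separated name(role):value syntax of the agent's --resources flag; parse_resources is a hypothetical helper written for this example, not part of Mesos.

```python
import re


def parse_resources(spec):
    """Parse 'name(role):value;...' into a list of resource dicts."""
    resources = []
    for item in filter(None, (part.strip() for part in spec.split(";"))):
        key, _, raw = item.partition(":")
        match = re.fullmatch(r"(\w+)(?:\((.+)\))?", key.strip())
        raw = raw.strip()
        if match is None or not raw:
            raise ValueError(f"malformed resource: {item!r}")
        # Resources without an explicit role belong to the default role '*'.
        name, role = match.group(1), match.group(2) or "*"
        if raw.startswith("["):
            # RANGE, for example [9000-10000,31000-32000]
            value = []
            for chunk in raw[1:-1].split(","):
                begin, end = chunk.split("-")
                value.append((int(begin), int(end)))
            kind = "RANGE"
        elif raw.startswith("{"):
            # SET, for example {bug1,bug2}
            value = {element.strip() for element in raw[1:-1].split(",")}
            kind = "SET"
        else:
            # SCALAR, for example 4096
            value = float(raw)
            kind = "SCALAR"
        resources.append({"name": name, "role": role, "type": kind, "value": value})
    return resources


agent = parse_resources("cpus:16;mem:8192;disk:512;ports:[9000-10000]")
```

Each entry keeps the resource type explicit, which is what a scheduler needs later to decide whether an offer is large enough.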
CPUs Resource
Represents how many CPU cores are available.
● Can be specified in fractions (0.5
CPUs)
● By default, Mesos configures
each agent with the number of
cores in the processor.
● Mesos enforces it by using
CPU shares (CPU time per
second)
● It’s a guaranteed minimum (if
there’s more CPU time
available, it can be used)
Example
cpus=24
Memory Resource
Represents how many MB of memory are available.
● By default, Mesos advertises all
detected memory except 1 GB or
50% of it, whichever is smaller
(that part is left for the
OS!!)
● It’s a strictly preallocated
resource (you get what you
reserve)
● That makes it a critical resource
(you have to get the right amount
of memory for your tasks,
otherwise they could get killed if
they try to use too much)
Example
mem=1024.0
Disk Resource
Represents how many MB of disk space are available.
● By default, Mesos advertises all
detected disk except 5 GB or
50% of it, whichever is
smaller
● It affects the container’s
sandbox.
● Mesos, by default, doesn’t
enforce it (it’s not really
allocated, a task can use as
much space as it wants). Setting
--enforce_container_disk_quota
changes that behaviour.
Example
disk=2048.0
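A small sketch of the default sizing rule for memory and disk, reading the "1 GB or 50%" and "5 GB or 50%" figures as the amount left to the operating system. The function names are hypothetical and the rule is an assumption about the auto-detection behaviour, so check it against the Mesos version in use.

```python
MB_PER_GB = 1024  # Mesos expresses mem and disk in MB


def default_offered(detected_mb, os_reserve_mb):
    """Advertise everything except the smaller of a fixed reserve or half the total."""
    left_for_os = min(os_reserve_mb, detected_mb / 2)
    return detected_mb - left_for_os


def default_mem(detected_mb):
    # Assumed rule: leave 1 GB (or half of a small machine) to the OS.
    return default_offered(detected_mb, 1 * MB_PER_GB)


def default_disk(detected_mb):
    # Assumed rule: leave 5 GB (or half of a small disk) to the OS.
    return default_offered(detected_mb, 5 * MB_PER_GB)
```

Under this reading an 8 GB machine advertises 7 GB of memory, while a 1 GB machine advertises only 512 MB.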
Ports Resource
Represents the ports available to listen on in the agent.
● It’s a RANGE.
● By default, Mesos configures
each agent to expose port range
31000–32000.
● Port usage is not enforced by
Mesos.
● However, it’s important to
reserve the ports a task must
listen on, to avoid
conflicts (only one process can
be listening on a port at a time).
Example
ports=[9000-9300]
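Since Mesos does not enforce port usage, the bookkeeping falls to the scheduler. A minimal sketch, assuming offered ports are handled as (begin, end) tuples; both helpers are hypothetical names for this example.

```python
def ports_available(offered_ranges, wanted_ports):
    """True if every wanted port falls inside one of the offered ranges."""
    return all(
        any(begin <= port <= end for begin, end in offered_ranges)
        for port in wanted_ports
    )


def remove_port(offered_ranges, port):
    """Return the ranges that remain after taking one port out of the offer."""
    remaining = []
    for begin, end in offered_ranges:
        if begin <= port <= end:
            # Split the range around the port that was taken.
            if begin < port:
                remaining.append((begin, port - 1))
            if port < end:
                remaining.append((port + 1, end))
        else:
            remaining.append((begin, end))
    return remaining


offered = [(9000, 9300)]
left = remove_port(offered, 9100)
```

A scheduler would check ports_available before launching a task and pass only the ports it actually uses in the task's resources.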
Custom Resources
● Mesos allows defining any
custom resource.
● Remember that a resource is
something which can be
exclusively reserved.
● There’s no need to enforce
the resource allocation (see
disk or ports).
Examples
● network_bandwidth=1000.0
● bugs={bug1, bug2}
● oranges=1500.0
These resources will be offered to
applications, which need to be
able to manage them if they want to
use them.
What is an Attribute?
Types
● SCALAR (1024.0)
● RANGE ([1-10])
● TEXT (ubuntu)
● They are not allocated, only
passed along with the
resources to the applications
in offers.
● They are a helper for the
scheduling decisions.
Arbitrary key-value data that serves as metadata about the
machine running the agent.
Example
● rack_id=eu-1
● os=ubuntu
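A sketch of how a scheduler might use attributes as a placement constraint. Offers are simplified to plain dictionaries here; agent-2 and rack eu-2 are made-up values, and matches is a hypothetical helper.

```python
def matches(offer_attributes, constraints):
    """True if the agent's attributes satisfy every key-value constraint."""
    return all(offer_attributes.get(key) == value for key, value in constraints.items())


# Simplified offers: only the fields needed for the example.
offers = [
    {"hostname": "agent-1", "attributes": {"rack_id": "eu-1", "os": "ubuntu"}},
    {"hostname": "agent-2", "attributes": {"rack_id": "eu-2", "os": "ubuntu"}},
]

# Keep only the offers coming from rack eu-1.
selected = [o["hostname"] for o in offers if matches(o["attributes"], {"rack_id": "eu-1"})]
```

Attributes are never consumed by the match: they only help the scheduler decide which offers to use.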
4. Frameworks
https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto
https://gist.github.com/guenter/7471695
Leader Agents
Framework Architecture
Scheduler
Executors - Tasks
Register
Offer
Accept and Launch
Reject
What is a Framework? An application that runs
on Mesos.
● Based on the
master-worker design.
● It’s built ad hoc for the
application’s business
model
Two components:
● Scheduler
● Executors
Scheduler
● It’s the brain of the
framework.
● Registers with Mesos and
receives resource offers.
● Launches tasks for the
application when it has been
offered enough
resources, or according
to other scheduling logic.
● It can be seen as an
intermediary between the
application logic and the
Mesos layer.
● It’s developed for each
application. Mesos provides
an API for doing it (HTTP and
native)
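As an illustration of the HTTP API, this sketch builds the JSON body of a SUBSCRIBE call, the first message a scheduler sends to register. It assumes the v1 scheduler API, where the body is posted to /api/v1/scheduler on the leading master; the framework name is made up and nothing is sent over the network.

```python
import json


def subscribe_call(name, user, role="*", checkpoint=False):
    """Build the body of a v1 scheduler SUBSCRIBE call (not sent anywhere here)."""
    return {
        "type": "SUBSCRIBE",
        "subscribe": {
            "framework_info": {
                "name": name,              # human-readable framework name
                "user": user,              # user the tasks run as on the agents
                "role": role,              # '*' is the default role
                "checkpoint": checkpoint,  # ask agents to checkpoint framework state
            }
        },
    }


body = json.dumps(subscribe_call("oranges-framework", "root"))
```

The master answers a SUBSCRIBE with a long-lived event stream on which the scheduler later receives its offers.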
Executor
● Launched by the scheduler
when it has work to do
(worker).
● Receives tasks
from the scheduler and
sends back status updates
(it’s connected to Mesos
too).
● Acts as a process container
that runs tasks.
● Mesos also provides an executor
API but, given that executors are
more general purpose than
schedulers, it
ships a
CommandExecutor that
should be enough for most
workloads.
Task
● The unit of work in Mesos,
the workload that a
scheduler wants to run in the
cluster.
● Runs inside an executor.
● An Executor can run more
than one task (not common).
● A task has a definition of the
needed resources that will
be allocated.
● Mesos allocates to the
container enough resources
for all the tasks
launched plus the executor
(and resizes it dynamically
if more tasks are added).
5. Offers
http://www.datio.com/iaas/monitoring-mesos-resource-offers-and-tasks/
What is an Offer?
● Used by Mesos to allocate resources to a
framework.
● The leading master sends offers to the
frameworks’ schedulers.
What’s Inside an Offer?
● Resources offered.
● Affected agent
(slaveId).
● Attributes of the
agent.
cpus 8.0
mem 4096.0
disk 1024.0
ports [9000-65535]
hostname agent-1
rack_id EU-I-1
slaveId asd1323...
How’re Offers Sent to Frameworks?
● Masters run the resource
allocator module.
● This module decides
which framework to send an
offer to, using an algorithm
called DRF (Dominant
Resource Fairness).
● The allocation module is
pluggable.
● The algorithm tries to
maximize the minimal
dominant share across
frameworks. (Considering
their dominant resource)
● DRF orders frameworks and
then the offer is sent to them
in order one at a time.
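A simplified sketch of Dominant Resource Fairness in the style of the original DRF paper, where each framework has a fixed per-task demand. This is an illustration only: the real Mesos allocator does not know task demands and uses the same ordering to decide who gets the next offer. The cluster size and demands are example values.

```python
def dominant_share(allocated, total):
    """Largest fraction of any single resource held by a framework."""
    return max(allocated[r] / total[r] for r in total)


def drf(total, demands):
    """Repeatedly give one task to the framework with the lowest dominant share."""
    allocated = {f: {r: 0.0 for r in total} for f in demands}
    used = {r: 0.0 for r in total}
    tasks = {f: 0 for f in demands}
    active = set(demands)
    while active:
        # Ties are broken by framework name to keep the result deterministic.
        chosen = min(sorted(active), key=lambda f: dominant_share(allocated[f], total))
        demand = demands[chosen]
        if all(used[r] + demand[r] <= total[r] for r in total):
            for r in total:
                used[r] += demand[r]
                allocated[chosen][r] += demand[r]
            tasks[chosen] += 1
        else:
            # The next task of this framework no longer fits: stop serving it.
            active.discard(chosen)
    return tasks


cluster = {"cpus": 9, "mem": 18}
result = drf(cluster, {"A": {"cpus": 1, "mem": 4}, "B": {"cpus": 3, "mem": 1}})
```

With these numbers framework A (memory-heavy) ends up with 3 tasks and framework B (CPU-heavy) with 2, so both hold two thirds of their dominant resource.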
What to Do with an Offer?
ACCEPT
● Launch a task with
resources of the offer (only
the needed, not all)
● Perform a reservation.
● Create a persistent volume.
REJECT
● Decline the offer without
using it.
● Why decline explicitly? While an
offer is outstanding, the
allocator counts its resources as
allocated to the framework
(the framework is penalized in
DRF).
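A sketch of what accepting and declining look like as v1 scheduler HTTP API calls. The bodies are only built, not sent; the framework id, offer id, agent id and command below are made-up values, and the helper names are hypothetical.

```python
def scalar(name, value):
    """A SCALAR resource in the JSON form used by the v1 API."""
    return {"name": name, "type": "SCALAR", "scalar": {"value": value}}


def accept_call(framework_id, offer, task_id, command, cpus, mem):
    """Accept one offer and launch a single command task with part of it."""
    task = {
        "name": task_id,
        "task_id": {"value": task_id},
        "agent_id": offer["agent_id"],
        "command": {"value": command},
        # Only the resources the task needs, not the whole offer.
        "resources": [scalar("cpus", cpus), scalar("mem", mem)],
    }
    return {
        "framework_id": {"value": framework_id},
        "type": "ACCEPT",
        "accept": {
            "offer_ids": [offer["id"]],
            "operations": [{"type": "LAUNCH", "launch": {"task_infos": [task]}}],
        },
    }


def decline_call(framework_id, offer, refuse_seconds=5.0):
    """Return the offer; refuse_seconds asks not to be re-offered it for a while."""
    return {
        "framework_id": {"value": framework_id},
        "type": "DECLINE",
        "decline": {
            "offer_ids": [offer["id"]],
            "filters": {"refuse_seconds": refuse_seconds},
        },
    }


offer = {"id": {"value": "offer-1"}, "agent_id": {"value": "agent-id-1"}}
accept = accept_call("framework-1", offer, "task-1", "sleep 60", cpus=0.5, mem=128.0)
decline = decline_call("framework-1", offer)
```

Reservations and persistent volumes travel the same way, as other operation types inside the accept call.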
More About Offers
● Different offers from the same
agent can be combined to get
more resources (when
accepting them).
● Several tasks can be
launched with the same
offer (as long as there are
enough resources)
● Mesos tries to send offers as
big as possible.
Two Level Scheduling
The master manages cluster
resources and decides
which framework to send an
offer to.
Schedulers accept or
reject offers according to
the concrete application
needs.
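The second level of the scheduling can be sketched as a first-fit decision made by the scheduler for each offer. This is a hypothetical example limited to scalar resources; the task ids and sizes are invented.

```python
def plan_launches(offer, pending):
    """First-fit: choose the pending tasks that fit in one offer's scalar resources."""
    free = dict(offer)  # copy, so the caller's offer is left untouched
    launched = []
    for task in pending:
        needs = task["needs"]
        if all(free.get(name, 0.0) >= amount for name, amount in needs.items()):
            for name, amount in needs.items():
                free[name] -= amount
            launched.append(task["id"])
    return launched, free


offer = {"cpus": 8.0, "mem": 4096.0}
pending = [
    {"id": "t1", "needs": {"cpus": 4.0, "mem": 2048.0}},
    {"id": "t2", "needs": {"cpus": 6.0, "mem": 1024.0}},
    {"id": "t3", "needs": {"cpus": 2.0, "mem": 1024.0}},
]
launched, leftover = plan_launches(offer, pending)
```

An empty result means the scheduler should decline the offer; anything left over returns to the master once the offer is accepted.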
A Travel
Through Mesos
Episode II
6. Roles
http://mesos.apache.org/documentation/latest/roles/
http://mesos.apache.org/documentation/latest/weights/
http://mesos.apache.org/documentation/latest/quota/
What’s a Role?
● Like a group of frameworks.
● Used to ensure that certain resources are only offered
to certain frameworks (only resources allocated to a
role are offered to a framework, with an exception).
● Each framework registers with Mesos with a role (by
default, * )
* IS A ROLE, NOT ANY
The default role (*) doesn’t mean that any role is
accepted; it is a concrete role (a misleading name…)
More on Roles
Any role is allowed
Frameworks can register
with any role name,
unless the flag --roles is
set in the Mesos masters
with a concrete list.
Resources allocated to *
are available to all roles
By default, resources are
allocated to the default
role (*). All the
frameworks, no matter
their role, will receive
offers of resources
allocated to ‘*’.
Roles can use weights
Weights can be assigned
to roles to
tell DRF that
a certain role should get a
larger share of
resources than others.
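A sketch of how weights could enter the DRF ordering, assuming a role's dominant share is divided by its weight before roles are compared. The role names, allocations and weights are example values and the helpers are hypothetical.

```python
def weighted_share(allocated, total, weight=1.0):
    """Dominant share scaled down by the role's weight."""
    dominant = max(allocated[r] / total[r] for r in total)
    return dominant / weight


def offer_order(roles, total):
    """Roles sorted so that the one furthest below its fair share comes first."""
    return sorted(
        roles,
        key=lambda name: (
            weighted_share(roles[name]["allocated"], total, roles[name]["weight"]),
            name,
        ),
    )


total = {"cpus": 16.0, "mem": 8192.0}
roles = {
    "pro": {"allocated": {"cpus": 8.0, "mem": 2048.0}, "weight": 2.0},
    "dev": {"allocated": {"cpus": 6.0, "mem": 1024.0}, "weight": 1.0},
}
order = offer_order(roles, total)
```

With weight 2 the role pro is served first even though it already holds more CPUs; with equal weights dev would go first.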
7. Reservation
http://mesos.apache.org/documentation/latest/reservation/
What’s a Reservation?
The way to allocate
resources in an agent to
specific roles
Static Reservation
While configuring the
resources exposed by an
agent, those resources
can be statically
reserved for specific
roles.
cpus 4.0
mem 2048.0
disk(*) 512.0
ports [9000-65535]
cpus(pro) 4.0
mem(pro) 2048.0
disk(pro) 512.0
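A sketch of how a static reservation like this one could be written as a value for the agent's --resources flag, assuming the name(role):value syntax. resources_flag is a hypothetical helper and the role pro comes from the example above.

```python
def resources_flag(reservations):
    """Render {role: {resource: value}} as a --resources string."""
    parts = []
    for role, resources in reservations.items():
        for name, value in resources.items():
            # Resources of the default role are written without a role suffix.
            suffix = "" if role == "*" else f"({role})"
            parts.append(f"{name}{suffix}:{value}")
    return ";".join(parts)


flag = resources_flag(
    {
        "*": {"cpus": 4, "mem": 2048, "disk": 512},
        "pro": {"cpus": 4, "mem": 2048, "disk": 512},
    }
)
```

Because the string is part of the agent configuration, changing any of these values means restarting the agent.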
Static Reservation
Not recommended
Static reservations are
only maintained for
backwards compatibility.
Restart needed
To change the amount of
reserved resources, the
agent configuration must be
modified and the agent
restarted.
By default, resources
are allocated to the
default role
Dynamic Reservation
Resources can be
reserved and
unreserved
At runtime, resources
can be reserved for a
role, and later they can
be unreserved when no
task is using those
resources.
Using an HTTP
endpoint
Operators manage dynamic
reservation through
HTTP endpoints
to reserve and
unreserve.
Using an acceptOffers
operation
Schedulers can
reserve/unreserve
resources when
accepting an offer by
using two special
operations (RESERVE and UNRESERVE).
8. Persistent Volumes
http://mesos.apache.org/documentation/latest/sandbox/
http://mesos.apache.org/documentation/latest/multiple-disk/
http://mesos.apache.org/documentation/latest/persistent-volume/
Sandbox (Disk Resource)
Working directory
A Sandbox is a
temporary directory
given to each executor
and set as working
directory for it. It’s
accessible from outside
the container.
Stores logs and other
data
It contains the stdout
and stderr of the
executor. Besides that it
contains the fetched files
(URI) and files created by
the task.
Garbage collected
This directory is cleaned
from the agent system
once a configurable
period of time has
passed.
Persistent Volumes
● Created from disk resources, they live outside the
executor’s sandbox and will persist on the agent.
● When a task using them finishes, they are offered back
without losing data.
● Used for stateful services.
More on Persistent Volumes
● Created over previously
reserved disk resources.
● No more than one task can
have the volume at the same
time.
● To unreserve the disk
resources associated with a
persistent volume, the
volume must be destroyed
first.
● Created/destroyed using
HTTP endpoints or via
acceptOffers in the
Scheduler.
● Associated with a role (a volume
can be offered back to any
framework in the role).
Type of Disk Resources
ROOT
Maps to the main
operating system
storage drive. It’s the
default option.
MOUNT
Auxiliary disks provided
by operators, each mapped
to a mount point in the
host OS. When reserved,
the whole disk is reserved
(no matter the requested
size).
PATH
Auxiliary disk resource
created by the operator,
which maps a directory
in the host OS to a disk
resource. Usually used
to carve up a mounted
disk into smaller chunks.
9. Containerizers
http://mesos.apache.org/documentation/latest/containerizer/
http://mesos.apache.org/documentation/latest/containerizer-internals/
http://mesos.apache.org/documentation/latest/container-image/
Container?
● Task isolation
● Contain task resources
● Control task resource usage
● Run in different environments
Docker Containerizer
● Works with Docker images
(task/executor).
● Uses docker-engine (docker
run….).
● Needs docker installed in
each agent. (external
dependency…)
The Mesos roadmap is to unify
the containerizers and stop
supporting this one.
Mesos Containerizer
● Runs commands of the host
OS.
● Runs Docker/AppC Images
(Universal Containerizer).
● Uses Linux cgroups and namespaces (the kernel features behind LXC).
● Based on pluggable
isolators, which are used for
isolating resources from
other containers.
● Examples: cgroups/cpu,
cgroups/mem,
docker/volume, disk/du,
docker/runtime, network/cni,
etc.
Tip:
sudo nsenter --mount --uts --ipc --net --pid --target <PID_CONTAINER>
Docker on Mesos Containerizer
● A Docker image
represents a filesystem.
● Mesos pulls the image
and extracts the
filesystem.
● Using pivot_root, the
container is launched
over that filesystem.
● Isolation is done by the
Mesos containerizer (no
docker-engine
dependency).
http://events.linuxfoundation.org/sites/events/files/slides/Mesos%20and%20Containers.pdf
Docker on Mesos Containerizer
BE CAREFUL WITH
PERMISSIONS
The user namespace is shared with
the agent (the only way to use a
user created in the Dockerfile is
to have a user in the agent with
the same name, uid and gid).
BRIDGE NETWORK IS NOT
SUPPORTED
When you bind to a port, by
default you do it on the agent
host stack (unless you use
another isolator such as network/cni
to get virtual networks and an IP
per container).
10. More Aspects
External volumes, oversubscription, checkpointing
External Volumes
● Uses dvdcli and a Docker
Volume plugin, for instance
REX-Ray or GlusterFS
(dependency).
● Mounts an external volume
from a storage provider to
the task container (Cinder,
Amazon EBS, etc).
● Instead of binding a task’s data
to an agent (persistent
volumes), it manages storage
outside the agents.
Oversubscription
● Frameworks can use
resources that are allocated to
another framework but temporarily
unused.
● These resources can be
revoked by Mesos at any
moment.
● A QoS module ensures that
the performance of the framework
that owns these resources is
not impacted.
Checkpointing
For agent recovery, a
framework can enable
checkpointing so that the agent
writes the state of its tasks and
executors to disk regularly.
If the Mesos Agent is stopped (a
failure or upgrade), tasks of
checkpointed frameworks
continue running (otherwise,
all running tasks are killed).
Hands On: Let’s make a framework
https://github.com/roberveral/mesos-gocd
THANKS!
Any questions?
@datiobd | @roberveral
rveral@datiobd.com
datio-big-data | robertoveral

Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Sean Meyn
 
Nodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptxNodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptxwendy cai
 
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxIT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxSAJITHABANUS
 
me3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Ame3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Akarthi keyan
 
Landsman converter for power factor improvement
Landsman converter for power factor improvementLandsman converter for power factor improvement
Landsman converter for power factor improvementVijayMuni2
 
Gender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 ProjectGender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 Projectreemakb03
 
Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...Apollo Techno Industries Pvt Ltd
 
ASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entenderASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entenderjuancarlos286641
 
The relationship between iot and communication technology
The relationship between iot and communication technologyThe relationship between iot and communication technology
The relationship between iot and communication technologyabdulkadirmukarram03
 
cloud computing notes for anna university syllabus
cloud computing notes for anna university syllabuscloud computing notes for anna university syllabus
cloud computing notes for anna university syllabusViolet Violet
 
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....santhyamuthu1
 
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchrohitcse52
 
Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...sahb78428
 
Multicomponent Spiral Wound Membrane Separation Model.pdf
Multicomponent Spiral Wound Membrane Separation Model.pdfMulticomponent Spiral Wound Membrane Separation Model.pdf
Multicomponent Spiral Wound Membrane Separation Model.pdfGiovanaGhasary1
 
Power System electrical and electronics .pptx
Power System electrical and electronics .pptxPower System electrical and electronics .pptx
Power System electrical and electronics .pptxMUKULKUMAR210
 

Último (20)

Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
 
دليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide Laboratory
دليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide Laboratoryدليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide Laboratory
دليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide Laboratory
 
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdfSummer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
 
How to Write a Good Scientific Paper.pdf
How to Write a Good Scientific Paper.pdfHow to Write a Good Scientific Paper.pdf
How to Write a Good Scientific Paper.pdf
 
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
 
Nodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptxNodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptx
 
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxIT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
 
me3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Ame3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part A
 
Landsman converter for power factor improvement
Landsman converter for power factor improvementLandsman converter for power factor improvement
Landsman converter for power factor improvement
 
Gender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 ProjectGender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 Project
 
Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...
 
ASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entenderASME BPVC 2023 Section I para leer y entender
ASME BPVC 2023 Section I para leer y entender
 
The relationship between iot and communication technology
The relationship between iot and communication technologyThe relationship between iot and communication technology
The relationship between iot and communication technology
 
cloud computing notes for anna university syllabus
cloud computing notes for anna university syllabuscloud computing notes for anna university syllabus
cloud computing notes for anna university syllabus
 
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
 
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
 
Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...
 
Multicomponent Spiral Wound Membrane Separation Model.pdf
Multicomponent Spiral Wound Membrane Separation Model.pdfMulticomponent Spiral Wound Membrane Separation Model.pdf
Multicomponent Spiral Wound Membrane Separation Model.pdf
 
Power System electrical and electronics .pptx
Power System electrical and electronics .pptxPower System electrical and electronics .pptx
Power System electrical and electronics .pptx
 
Lecture 2 .pdf
Lecture 2                           .pdfLecture 2                           .pdf
Lecture 2 .pdf
 

A Travel Through Mesos

  • 2. 1.What is Mesos? An introduction to Mesos and its architecture
  • 3. 8 Kg 4kg? 1 Kg 2 Kg We Want to Buy Oranges...
  • 4. 8 Kg 4kg? 1 Kg 2 Kg We Need to Try Until There’re Enough
  • 5. 8 Kg 4kg? 1 Kg 2 Kg One Big Shop Instead of Three!!
  • 6. What is Mesos? Resource Manager Mesos abstracts computing resources from nodes in the datacenter. “Program against your datacenter like it’s a single pool of resources” Different workloads Mesos is a platform for sharing a cluster between applications. It can scale up to 10,000s of nodes. Uses containerization Workloads are launched in containers (either LXC or Docker), providing an isolation level.
  • 7. A Distributed Systems Kernel Just like OS manages resource utilization allowing concurrent use of the limited resources by multiple applications, Mesos applies this principle to a whole cluster of machines to provide resource management and scheduling across the cluster.
  • 10. Master Nodes ● Source of truth of the cluster status (in memory - high memory usage) ● Send resource offers to the applications. ● Host the primary UI ● High availability with active-passive replication using ZooKeeper for leader election and Paxos for state sharing.
  • 11. Agent Nodes ● Launch containers running application tasks. ● Advertise their available resources to the master. ● Host a UI for the launched containers. ● Manage status updates from the running tasks and are in charge of communication with the master. ● Known as slaves until 0.28
  • 12. MESOS IS NOT AN OS The Kernel comparison can be confusing: each node has an OS installed and Mesos runs as a service daemon on it
  • 13. 3. Resources and attributes http://mesos.apache.org/documentation/latest/attributes-resources/
  • 14. What is a Resource? Types ● SCALAR (1024.0) ● RANGE ([1-10]) ● SET ({elem1, elem2}) Predefined resources ● cpus ● mem ● disk ● ports Everything an application task uses for doing its work
  • 15. Resources are Defined by Agent ● Each Mesos agent is configured with the resources it has. ● The agent continuously sends updates to the master with its available resources. cpu 8.0 mem 4096.0 disk 1024.0 ports [9000-65536] cpu 16.0 mem 8192.0 disk 512.0 ports [9000-10000]
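The three resource types can be illustrated with a small parser for a resource string in the `name:value;name:value` form. This is a simplified sketch in Python, not Mesos code: `parse_resources` is a hypothetical helper, it ignores role annotations such as `cpus(pro)`, and the sample values are made up.

```python
def parse_resources(spec):
    """Parse 'cpus:8;mem:4096;ports:[9000-9300];bugs:{a,b}' into a dict.

    SCALAR -> float, RANGE -> list of (begin, end) tuples, SET -> set of str.
    """
    resources = {}
    for item in filter(None, (part.strip() for part in spec.split(";"))):
        name, _, value = item.partition(":")
        name, value = name.strip(), value.strip()
        if value.startswith("["):      # RANGE, e.g. [9000-9300,9500-9600]
            ranges = []
            for chunk in value[1:-1].split(","):
                begin, end = chunk.split("-")
                ranges.append((int(begin), int(end)))
            resources[name] = ranges
        elif value.startswith("{"):    # SET, e.g. {bug1,bug2}
            resources[name] = {elem.strip() for elem in value[1:-1].split(",")}
        else:                          # SCALAR, e.g. 4096.0
            resources[name] = float(value)
    return resources


parsed = parse_resources("cpus:8;mem:4096.0;ports:[9000-9300];bugs:{bug1, bug2}")
```

Here `parsed["ports"]` is a list of inclusive port ranges, which is what a scheduler would inspect before picking a port for a task.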
  • 16. CPUs Resource Represents how many CPU cores are available. ● Can be specified in fractions (0.5 CPUs) ● By default, Mesos configures each agent with the number of cores in the processor. ● Mesos enforces it by using CPU shares (CPU time per second) ● It’s a guaranteed minimum (if there’s more CPU time available, it could be used) Example cpus=24
  • 17. Memory Resource Represents how many MB of memory are available. ● By default, Mesos sets aside 1 GB or 50% of detected memory, whichever is smaller, and configures the agent with the rest. (Leave memory for the OS!!) ● It’s a strictly preallocated resource (you get what you reserve) ● That makes it a critical resource (you have to get the right amount of memory for your tasks, otherwise they could get killed if they try to use too much) Example mem=1024.0
  • 18. Disk Resource Represents how many MB of disk space are available. ● By default, Mesos sets aside 5 GB or 50% of detected disk, whichever is smaller, and configures the agent with the rest. ● It affects the container’s sandbox. ● Mesos, by default, doesn’t enforce it (it’s not really allocated, a task can use as much space as it wants). Setting --enforce_container_disk_quota changes that behaviour. Example disk=2048.0
  • 19. Ports Resource Represents the ports available for listening on the agent. ● It’s a RANGE. ● By default, Mesos configures each agent to expose port range 31000–32000. ● Port usage is not enforced by Mesos. ● However, it’s important to reserve the ports a task must listen on, to avoid conflicts (only one process can listen on a port at a time). Example ports=[9000-9300]
  • 20. Custom Resources ● Mesos allows defining any custom resource. ● Remember that a resource is something which can be exclusively reserved. ● There’s no need to enforce the resource allocation (see disk or ports). Examples ● network_bandwith=1000.0 ● bugs={bug1, bug2} ● oranges=1500.0 These resources will be offered to applications, which need to be able to manage them if they want to use them.
  • 21. What is an Attribute? Types ● SCALAR (1024.0) ● RANGE ([1-10]) ● SET ({elem1, elem2}) ● They are not allocated, only passed along with the resources to the applications in offers. ● They are a helper for the scheduling decisions. Arbitrary key-value data that serves as metadata about the machine running the agent. Example ● rack_id=eu-1 ● os=ubuntu
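As a rough illustration of attributes helping a scheduling decision, the sketch below checks whether the attributes carried in an offer satisfy a set of placement constraints. The `matches` helper and the constraint format are assumptions made for this example; Mesos itself only passes the attributes along and leaves their interpretation to the framework.

```python
def matches(offer_attributes, constraints):
    """True if every constrained attribute is present with the required value."""
    return all(offer_attributes.get(key) == value for key, value in constraints.items())


# Attribute values taken from the slide's example; the constraints are made up.
agent_attributes = {"rack_id": "eu-1", "os": "ubuntu"}
wants_ubuntu = matches(agent_attributes, {"os": "ubuntu"})
wants_other_rack = matches(agent_attributes, {"rack_id": "eu-2"})
```

A scheduler could use a check like this to decline offers from agents on the wrong rack or operating system.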
  • 23. Leader Agents Framework Architecture Scheduler Executors - Tasks Register Offer Accept and Launch Reject
  • 24. What is a Framework? An application that runs on Mesos. ● Based on the master-worker design. ● It’s specific to the application’s business logic. Two components: ● Scheduler ● Executors
  • 25. Scheduler ● It’s the brain of the framework. ● Registers with Mesos and receives resource offers. ● Launches tasks for the application when it has been offered enough resources, or according to other scheduling logic. ● We could see it as an intermediary between the application logic and the Mesos layer. ● It’s developed for each application. Mesos provides an API for doing it (HTTP and native)
  • 26. Executor ● Launched by the scheduler when it has work to do (worker). ● It will receive tasks to do from the scheduler and will send back status updates (it’s connected with Mesos too). ● Acts as a process container that runs tasks. ● Mesos provides an executor API too but, given that an executor is more general purpose than a scheduler, Mesos also provides a CommandExecutor that should be enough for most workloads.
  • 27. Task ● The unit of work in Mesos, the workload that a scheduler wants to run in the cluster. ● Runs inside an executor. ● An Executor can run more than one task (not common). ● A task has a definition of the needed resources that will be allocated. ● Mesos will allocate to the container enough resources for the bunch of tasks launched plus the executor. (and will resize it dynamically if more tasks are added).
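The sizing rule in the last bullet can be sketched as simple arithmetic: the container gets the executor's resources plus those of every task it runs. This is an illustrative Python sketch with made-up numbers and a hypothetical `container_resources` helper; it only handles scalar resources.

```python
def container_resources(executor, tasks):
    """Sum the scalar resources of an executor and all of its tasks."""
    total = dict(executor)
    for task in tasks:
        for name, amount in task.items():
            total[name] = total.get(name, 0.0) + amount
    return total


executor = {"cpus": 0.1, "mem": 32.0}
tasks = [{"cpus": 1.0, "mem": 512.0}, {"cpus": 0.5, "mem": 256.0}]
container = container_resources(executor, tasks)
```

Adding a third task to `tasks` and recomputing mirrors the dynamic resize mentioned on the slide.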
  • 29. What is an Offer? ● Used by Mesos to allocate resources to a framework. ● The leading master sends offers to the frameworks’ schedulers.
  • 30. What’s Inside an Offer? ● Resources offered. ● Affected agent (slaveId). ● Attributes of the agent. cpu 8.0 mem 4096.0 disk 1024.0 ports [9000-65536] hostname agent-1 rack_id EU-I-1 slaveId asd1323...
  • 31. How Are Offers Sent to Frameworks? ● Masters run the resource allocator module. ● This module decides which framework to send an offer to, using an algorithm called DRF (Dominant Resource Fairness). ● The allocation module is pluggable. ● The algorithm tries to maximize the minimal dominant share across frameworks. (Considering their dominant resource) ● DRF orders the frameworks, and offers are then sent to them in that order, one at a time.
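The ordering step of DRF can be sketched as follows: compute each framework's dominant share (its largest fraction of any one cluster resource) and serve the framework with the smallest share first. This is a minimal Python sketch of the idea, not the Mesos allocator; the function names, framework names, and numbers are assumptions chosen for illustration.

```python
def dominant_share(allocated, total):
    """Largest fraction of any single cluster resource held by a framework."""
    return max(
        (allocated.get(name, 0.0) / capacity
         for name, capacity in total.items() if capacity > 0),
        default=0.0,
    )


def drf_order(allocations, total):
    """Framework names sorted by ascending dominant share (ties broken by name)."""
    return sorted(
        allocations,
        key=lambda fw: (dominant_share(allocations[fw], total), fw),
    )


total = {"cpus": 9.0, "mem": 18.0}
allocations = {
    "A": {"cpus": 2.0, "mem": 8.0},   # dominant resource: mem (8/18)
    "B": {"cpus": 6.0, "mem": 2.0},   # dominant resource: cpus (6/9)
    "C": {},                          # nothing allocated yet
}
order = drf_order(allocations, total)
```

With these numbers the next offer would go to `C`, then `A`, then `B`, which is how the algorithm pushes up the minimum dominant share.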
  • 32. What to Do with an Offer? ACCEPT ● Launch a task with resources of the offer (only the ones needed, not all) ● Perform a reservation. ● Create a persistent volume. REJECT ● Decline the offer without using it. ● Why decline explicitly? When the allocator sends an offer to a scheduler, the offered resources count as allocated to that framework, so an unused offer that is held penalizes the framework in DRF.
  • 33. More About Offers ● Different offers of the same agent can be grouped to get more resources (when accepting an offer). ● Several tasks can be launched with the same offer (as long as there are enough resources) ● Mesos tries to send offers as big as possible.
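The accept-or-decline decision, including grouping offers from the same agent, might look like the sketch below. It is plain Python over dictionaries rather than the Mesos scheduler API: `plan_launch`, the offer layout, and the identifiers are hypothetical, and only scalar resources are considered.

```python
from collections import defaultdict


def plan_launch(offers, task_needs):
    """Return (agent_id, offer_ids) able to host the task, or None to decline all."""
    by_agent = defaultdict(list)
    for offer in offers:
        by_agent[offer["agent_id"]].append(offer)

    for agent_id, agent_offers in by_agent.items():
        # Offers from the same agent can be combined when accepting.
        combined = defaultdict(float)
        for offer in agent_offers:
            for name, amount in offer["resources"].items():
                combined[name] += amount
        if all(combined[name] >= need for name, need in task_needs.items()):
            return agent_id, [offer["id"] for offer in agent_offers]
    return None


offers = [
    {"id": "o1", "agent_id": "agent-1", "resources": {"cpus": 1.0, "mem": 512.0}},
    {"id": "o2", "agent_id": "agent-1", "resources": {"cpus": 1.0, "mem": 512.0}},
    {"id": "o3", "agent_id": "agent-2", "resources": {"cpus": 1.0, "mem": 256.0}},
]
plan = plan_launch(offers, {"cpus": 2.0, "mem": 1024.0})
too_big = plan_launch(offers, {"cpus": 4.0})
```

No single offer fits the first task, but the two offers from `agent-1` together do; the second task fits nowhere, so every offer would be declined.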
  • 34. Two Level Scheduling The master manages cluster resources and decides which framework to send an offer to. Schedulers accept or reject offers according to the needs of the concrete application.
  • 37. What’s a Role? ● Like a group of frameworks. ● Used to ensure that certain resources are only offered to certain frameworks (only resources allocated to a role are offered to a framework, with an exception). ● Each framework registers with Mesos with a role (by default, * )
  • 38. * IS A ROLE, NOT ANY The default role (*) doesn’t mean that any role is accepted; it is a concrete role (Bad name…)
  • 39. More on Roles Any role is allowed Frameworks can register with any role name, unless the flag --roles is set in the Mesos masters with a concrete list. Resources allocated to * are available to all roles By default, resources are allocated to the default role (*). All the frameworks, no matter their role, will receive offers of resources allocated to ‘*’. Roles can use weights Weights can be assigned to roles to tell DRF that a certain role has to get a larger amount of resources than another.
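The two role rules above (resources under `*` are offered to everyone, and weights skew the fair share) can be sketched in a few lines. This is an illustrative Python model, not allocator code: `offerable` and `weighted_share` are hypothetical helpers, and dividing the dominant share by the weight is an assumption about how a weight would enter the DRF ordering.

```python
DEFAULT_ROLE = "*"


def offerable(resources, framework_role):
    """Keep resources allocated to the framework's role or to the default role."""
    return [r for r in resources if r["role"] in (framework_role, DEFAULT_ROLE)]


def weighted_share(dominant_share, weight=1.0):
    """A larger weight makes a role look less served, so it is offered more."""
    return dominant_share / weight


agent_resources = [
    {"name": "cpus", "role": "*", "value": 4.0},
    {"name": "cpus", "role": "pro", "value": 4.0},
    {"name": "cpus", "role": "dev", "value": 2.0},
]
for_pro = offerable(agent_resources, "pro")
for_default = offerable(agent_resources, "*")
```

A framework registered with role `pro` sees both the `*` and the `pro` CPUs, while a framework in the default role sees only the `*` ones.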
  • 41. What’s a Reservation? The way to allocate resources in an agent to specific roles
  • 42. Static Reservation While configuring the exposed resources in an agent, those resources could be statically reserved to concrete roles. cpu 4.0 mem 2048.0 disk(*) 512.0 ports [9000-65536] cpu(pro) 4.0 mem(pro) 2048.0 disk(pro) 512.0
  • 43. Static Reservation Not recommended Static reservations are only maintained for backwards compatibility. Restart needed To change the amount of reserved resources, the agent configuration must be modified and the agent restarted. By default, resources are allocated to the default role
  • 44. Dynamic Reservation Resources can be reserved and unreserved At runtime, resources can be reserved to a role, and later they can be unreserved when no task is using those resources. Using an HTTP endpoint Dynamic reservation is managed by operators using HTTP endpoints for reserve and unreserve. Using an acceptOffers operation Schedulers can reserve/unreserve resources when accepting an offer by using two special operations.
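An operator-side reservation could be prepared roughly as below. The sketch only builds the request and never sends it. It assumes the master exposes a `/master/reserve` endpoint that takes form fields `slaveId` and `resources` (a JSON array), and it assumes the resource JSON layout shown; both should be checked against the documentation of the Mesos version in use. The URL, agent ID, role, and principal are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_reserve_request(master_url, agent_id, role, principal, scalars):
    """Build (but do not send) a POST that reserves scalar resources for a role."""
    resources = [
        {
            "name": name,
            "type": "SCALAR",
            "scalar": {"value": value},
            "role": role,                              # assumed field layout
            "reservation": {"principal": principal},   # assumed field layout
        }
        for name, value in scalars.items()
    ]
    body = urlencode({"slaveId": agent_id, "resources": json.dumps(resources)})
    return Request(
        master_url.rstrip("/") + "/master/reserve",
        data=body.encode("utf-8"),
        method="POST",
    )


request = build_reserve_request(
    "http://master.example:5050", "agent-1", "pro", "ops", {"cpus": 4.0}
)
```

Passing `request` to `urllib.request.urlopen` would submit it; that step is left out here because it needs a running master and, usually, credentials.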
  • 46. Sandbox (Disk Resource) Working directory A Sandbox is a temporary directory given to each executor and set as working directory for it. It’s accessible from outside the container. Stores logs and other data It contains the stdout and stderr of the executor. Besides that it contains the fetched files (URI) and files created by the task. Garbage collected This directory is cleaned from the agent system once a configurable period of time has passed.
  • 47. Persistent Volumes ● Created from disk resources, they live outside the executor’s sandbox and will persist on the agent. ● When a task using them finishes, they are offered back without losing data. ● Used for stateful services.
  • 48. More on Persistent Volumes ● Created over previously reserved disk resources. ● No more than one task can have the volume at the same time. ● To unreserve the disk resources associated with a persistent volume, the volume must be destroyed first. ● Created/destroyed using HTTP endpoints or via acceptOffers in the Scheduler. ● Associated with a role (the volume can be offered back to any framework in the role).
  • 49. Type of Disk Resources ROOT Maps to the main operating system storage drive. It’s the default option. MOUNT Auxiliary disks provided by operators, each of which maps to a mount point in the host OS. When reserved, the whole disk is reserved (no matter the reserved size). PATH Auxiliary disk resource created by the operator, which maps a directory in the host OS to a disk resource. Usually used to carve up a mounted disk into smaller chunks.
  • 52. Docker Containerizer ● Works with Docker images (task/executor). ● Uses docker-engine (docker run….). ● Needs Docker installed on each agent. (external dependency…) The Mesos roadmap is to unify the containerizers and stop supporting this one.
  • 53. Mesos Containerizer ● Runs commands of the host OS. ● Runs Docker/AppC Images (Universal Containerizer). ● Uses LXC. ● Based on pluggable isolators, which are used for isolating resources from other containers. ● Examples: cgroups/cpu, cgroups/mem, docker/volume, disk/du, docker/runtime, network/cni, etc. Tip: sudo nsenter --mount --uts --ipc --net --pid --target <PID_CONTAINER>
  • 54. Docker on Mesos Containerizer ● A Docker image represents a filesystem. ● Mesos pulls the image and extracts the filesystem. ● Using pivot_root, the container is launched over that filesystem. ● Isolation is done by the Mesos containerizer (no docker-engine dependency). http://events.linuxfoundation.org/sites/events/files/slides/Mesos%20and%20Containers.pdf
  • 55. Docker on Mesos Containerizer BE CAREFUL WITH PERMISSIONS The user namespace matches the agent’s (the only way to use a user created in the Dockerfile is to have a user on the agent with the same name, uid and gid). BRIDGE NETWORK IS NOT SUPPORTED When you bind to a port, by default you do it on the agent host stack (unless you’re using another isolator like network/cni for virtual networks and IP per container).
  • 56. 10. More Aspects External volumes, oversubscription, checkpointing
  • 57. External Volumes ● Uses dvdcli and a Docker Volume plugin, for instance REX-Ray or GlusterFS (dependency). ● Mounts an external volume from a storage provider to the task container (Cinder, Amazon EBS, etc). ● Instead of binding a task’s data to an agent (persistent volumes), it manages storage outside the agents.
  • 58. Oversubscription ● Frameworks can use resources that are allocated to another framework but temporarily unused. ● These resources can be revoked by Mesos at any moment. ● A QoS module ensures that the framework to which these resources belong suffers no impact on its performance.
  • 59. Checkpointing For agent recovery, a Framework can enable checkpointing to write its state to disk regularly. If the Mesos Agent is stopped (a failure or upgrade), tasks of checkpointed frameworks continue running (otherwise, all running tasks are killed).
  • 60. Hands On: Let’s make a framework https://github.com/roberveral/mesos-gocd
  • 61. THANKS! Any questions? @datiobd | @roberveral rveral@datiobd.com datio-big-data | robertoveral