Mesos: A Flexible Cluster Resource Manager
Mesos is a platform for sharing clusters between multiple parallel applications, such as Hadoop, MPI, and web services. It allows organizations to consolidate workloads from multiple applications onto a single cluster, which increases utilization and simplifies management, while ensuring that each application gets suitable resources to run on (e.g. that Hadoop gets data locality). Hadoop users can leverage Mesos to run multiple isolated instances of Hadoop (e.g. production and experimental instances) on the same cluster, to run multiple versions of Hadoop concurrently (e.g. to test patches), and to run MPI and other applications alongside Hadoop. In addition, due to its simple design, Mesos is highly scalable (to tens of thousands of nodes) and fault tolerant (using ZooKeeper). In this talk, we'll provide an introduction to Mesos for Hadoop users. We have also been working to test Mesos at Twitter and Facebook, and we'll present experience and results from those deployments.
Spark: A Framework for Iterative and Interactive Cluster Computing
Although the MapReduce programming model has been highly successful, it is not suitable for all applications. We present Spark, a framework optimized for one such type of application: iterative jobs where a dataset is shared across multiple parallel operations, as is common in machine learning algorithms. Spark provides a functional programming model similar to MapReduce, but also lets users cache data between iterations, leading to up to 10x better performance than Hadoop. Spark also makes programming jobs easy by integrating into the Scala programming language. Finally, the ability of Spark to load a dataset into memory and query it repeatedly makes it especially suitable for interactive analysis of big datasets. We have modified the Scala interpreter to make it possible to use Spark interactively, providing a much more responsive experience than Hive and Pig. Spark is built on top of Mesos and can therefore run alongside Hadoop.
Project URL: http://www.cs.berkeley.edu/~matei/spark/
HUG August 2010: Mesos
1. Mesos A Resource Management Platform for Hadoop and Big Data Clusters Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica UC Berkeley
2. Challenges in Hadoop Cluster Management Isolation (both fault and resource) E.g. if co-locating production and experimental jobs Testing and deploying upgrades JobTracker scalability (in larger clusters) Running non-Hadoop jobs E.g. MPI, streaming MapReduce, etc
3. Mesos Mesos is a cluster resource manager over which multiple instances of Hadoop and other distributed applications can run [Diagram: Hadoop (production), Hadoop (ad-hoc), MPI, … running on top of Mesos, which spans four nodes]
4. What’s Different About It? Other resource managers exist today, including: Hadoop on Demand; batch schedulers (e.g. Torque); VM schedulers (e.g. Eucalyptus). However, they have 2 problems with Hadoop-like workloads: data locality is compromised due to static partitioning of nodes; utilization is hurt because jobs hold nodes for their full duration. Mesos addresses these through two features: fine-grained sharing at the level of tasks; a two-level scheduling model where jobs control placement.
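The two features named on this slide can be illustrated with a toy model. The sketch below is not the Mesos API: `Offer`, `Task`, `Framework`, and `offerRound` are hypothetical names, and the policy (offer each node to the frameworks in turn, let each framework take only what fits on nodes holding its data) is an assumption chosen to show task-level sharing and framework-controlled placement.

```scala
// Toy model of two-level scheduling with resource offers.
// All names below are hypothetical and are not part of Mesos.
object TwoLevelSketch {
  // Free resources on one node, as advertised by the master.
  final case class Offer(node: String, cpus: Int, memGb: Int)

  // A pending task: its needs and the nodes that hold its input data.
  final case class Task(id: String, cpus: Int, memGb: Int, preferredNodes: Set[String])

  // Second level: the framework decides which offers to use.
  final class Framework(val name: String, private var pending: List[Task]) {
    // Returns the tasks launched on this offer; an empty list declines it.
    def resourceOffer(offer: Offer): List[Task] = {
      var cpusLeft = offer.cpus
      var memLeft = offer.memGb
      val launched = List.newBuilder[Task]
      val kept = List.newBuilder[Task]
      for (t <- pending) {
        val fits = t.preferredNodes.contains(offer.node) &&
          t.cpus <= cpusLeft && t.memGb <= memLeft
        if (fits) { cpusLeft -= t.cpus; memLeft -= t.memGb; launched += t }
        else kept += t
      }
      pending = kept.result()
      launched.result()
    }

    def pendingCount: Int = pending.size
  }

  // First level: the master offers each node's free resources to the
  // frameworks in turn; what one framework leaves goes to the next.
  def offerRound(free: List[Offer], frameworks: List[Framework]): Map[String, List[String]] =
    free.map { nodeOffer =>
      var remaining = nodeOffer
      val placed = frameworks.flatMap { fw =>
        val launched = fw.resourceOffer(remaining)
        remaining = remaining.copy(
          cpus = remaining.cpus - launched.map(_.cpus).sum,
          memGb = remaining.memGb - launched.map(_.memGb).sum)
        launched.map(t => s"${fw.name}/${t.id}")
      }
      nodeOffer.node -> placed
    }.toMap

  def main(args: Array[String]): Unit = {
    val hadoop = new Framework("hadoop", List(
      Task("map-1", 2, 4, Set("node1")),
      Task("map-2", 2, 4, Set("node2"))))
    val mpi = new Framework("mpi", List(Task("rank-0", 2, 2, Set("node1", "node2"))))
    println(offerRound(List(Offer("node1", 4, 8), Offer("node2", 2, 4)), List(hadoop, mpi)))
  }
}
```

In this run, tasks from both frameworks share node1 at the same time, and each Hadoop task lands on the node it asked for, since the framework rather than the master picks the placement.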
7. Design Goals Scalability (to 10,000’s of nodes) Robustness (even to master failure) Flexibility (support wide variety of frameworks) Resulting design: simple, minimal core that pushes resource selection logic to frameworks
8. Other Features Master fault tolerance using ZooKeeper Resource isolation using Linux Containers Isolate CPU and memory between tasks on each node In newest kernels, can isolate network & disk IO too Web UI for viewing cluster state Deploy scripts for private clusters and EC2
9. Mesos Status Prototype in 10,000 lines of C++ Ported frameworks: Hadoop (0.20.2), MPI (MPICH2), Torque New frameworks: Spark, a Scala framework for iterative & interactive jobs Test deployments at Twitter and Facebook
12. Spark A framework for iterative and interactive cluster computing Matei Zaharia, Mosharaf Chowdhury, Michael Franklin, Scott Shenker, Ion Stoica
13. Spark Goals Support iterative applications Common in machine learning but problematic for MapReduce, Dryad, etc Retain MapReduce’s fault tolerance & scalability Experiment with programmability Integrate into Scala programming language Support interactive use from Scala interpreter
14. Key Abstraction Resilient Distributed Datasets (RDDs) Collections of elements distributed across cluster that can persist across parallel operations Can be stored in memory, on disk, etc Can be transformed with map, filter, etc Automatically rebuilt on failure
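The idea of a dataset that is "automatically rebuilt on failure" can be sketched on a single machine. The code below is a toy illustration, not Spark's implementation: `MiniRDD`, `fromSeq`, `dropCache`, and the `computations` counter are hypothetical names, and losing a cache entry stands in for a worker failure. Each dataset remembers how it was derived from its parent, so a lost in-memory copy can be recomputed.

```scala
// Toy single-machine illustration of rebuilding a dataset from its lineage.
// MiniRDD and its methods are hypothetical and are not Spark's API.
object LineageSketch {
  // A dataset is described by how to compute it, not only by its contents.
  final class MiniRDD[A](compute: () => Vector[A]) {
    private var cached: Option[Vector[A]] = None
    var computations = 0 // how many times the data was built or rebuilt

    // Transformations only record the derivation from this dataset.
    def map[B](f: A => B): MiniRDD[B] = new MiniRDD[B](() => collect().map(f))
    def filter(p: A => Boolean): MiniRDD[A] = new MiniRDD[A](() => collect().filter(p))

    // Materialize the elements, reusing the in-memory copy when present.
    def collect(): Vector[A] = cached.getOrElse {
      computations += 1
      val data = compute()
      cached = Some(data)
      data
    }

    // Simulate losing the in-memory copy, e.g. when a worker fails.
    def dropCache(): Unit = { cached = None }
  }

  def fromSeq[A](xs: Seq[A]): MiniRDD[A] = new MiniRDD[A](() => xs.toVector)

  def main(args: Array[String]): Unit = {
    val lines = fromSeq(Seq("ERROR a", "INFO b", "ERROR c"))
    val errors = lines.filter(_.startsWith("ERROR"))
    println(errors.collect()) // computed from lines
    errors.dropCache()        // the copy is lost
    println(errors.collect()) // recomputed from the recorded derivation
    println(errors.computations)
  }
}
```

The design choice to store the derivation rather than replicate the data is what keeps the toy version small: recovery is just running the recorded function again.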
15. Example: Log Mining Load error messages from a log into memory, then interactively search for various patterns
    lines = spark.textFile("hdfs://...")            // base RDD
    errors = lines.filter(_.startsWith("ERROR"))    // transformed RDD
    messages = errors.map(_.split('\t')(2))
    cachedMsgs = messages.cache()                   // cached RDD
    cachedMsgs.filter(_.contains("foo")).count      // parallel operation
    cachedMsgs.filter(_.contains("bar")).count
    . . .
[Diagram: a driver exchanging tasks and results with three workers, each holding one block (Block 1-3) and one cache (Cache 1-3)]
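The same pipeline can be tried without a cluster on an ordinary Scala collection. This is only a local analogue under stated assumptions: the log lines are invented, they are assumed to be tab-separated with the message in the third field, and `LogMiningLocal` and `countContaining` are hypothetical names. Keeping `messages` in a `val` plays the role that caching plays on the slide, so repeated queries reuse the filtered data.

```scala
// Local analogue of the log mining example using plain Scala collections.
// The sample lines and the tab-separated layout are assumptions.
object LogMiningLocal {
  // Hypothetical log lines: level, time, message.
  val lines: Vector[String] = Vector(
    "ERROR\t10:00\tfoo failed to start",
    "INFO\t10:01\theartbeat ok",
    "ERROR\t10:02\tbar timed out",
    "ERROR\t10:03\tfoo lost connection")

  // Keep only error lines, then extract the third field of each.
  val errors: Vector[String] = lines.filter(_.startsWith("ERROR"))
  val messages: Vector[String] = errors.map(_.split('\t')(2))

  // Each query scans the already-filtered messages held in memory.
  def countContaining(word: String): Int = messages.count(_.contains(word))

  def main(args: Array[String]): Unit = {
    println(countContaining("foo"))
    println(countContaining("bar"))
  }
}
```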
16. Example: Logistic Regression Goal: find best line separating two sets of points [Figure: two sets of points (+ and -), a random initial line, and the target line]
17. Logistic Regression Code
    val data = spark.textFile(...).map(readPoint).cache()
    var w = Vector.random(D)
    for (i <- 1 to ITERATIONS) {
      val gradient = data.map(p => {
        val scale = (1/(1+exp(-p.y*(w dot p.x))) - 1) * p.y
        scale * p.x
      }).reduce(_ + _)
      w -= gradient
    }
    println("Final w: " + w)
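The loop on this slide can be run locally to see what each iteration computes. The sketch below is a single-machine version under stated assumptions: plain arrays replace the slide's vector type, `Point`, `dot`, `gradient`, and `train` are hypothetical helpers, the four sample points are invented, and the weights start at zero rather than at a random vector so that the run is reproducible. The per-point term is the same expression as on the slide, which is the gradient of log(1 + exp(-y (w dot x))) with respect to w.

```scala
import scala.math.exp

// Single-machine version of the slide's gradient loop, using plain arrays.
// Point, dot, gradient and train are hypothetical helper names.
object LogisticRegressionLocal {
  // A labeled point: features x and label y in {-1, +1}.
  final case class Point(x: Array[Double], y: Double)

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // Sum over all points of (1 / (1 + exp(-y * (w dot x))) - 1) * y * x.
  def gradient(data: Seq[Point], w: Array[Double]): Array[Double] =
    data.map { p =>
      val scale = (1 / (1 + exp(-p.y * dot(w, p.x))) - 1) * p.y
      p.x.map(_ * scale)
    }.reduce((a, b) => a.zip(b).map { case (u, v) => u + v })

  def train(data: Seq[Point], iterations: Int): Array[Double] = {
    // Assumption: zero initial weights instead of a random vector.
    var w = Array.fill(data.head.x.length)(0.0)
    for (_ <- 1 to iterations) {
      val g = gradient(data, w)
      w = w.zip(g).map { case (wi, gi) => wi - gi } // same update as w -= gradient
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // Invented, linearly separable sample data.
    val data = Seq(
      Point(Array(2.0, 1.0), 1.0), Point(Array(1.0, 2.0), 1.0),
      Point(Array(-1.0, -2.0), -1.0), Point(Array(-2.0, -1.0), -1.0))
    println("Final w: " + train(data, 5).mkString(", "))
  }
}
```

Every iteration reads the whole dataset again, which is why keeping it in memory between iterations matters for this kind of job.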
20. Conclusion Mesos provides a stable platform to multiplex resources among diverse cluster applications Spark is a new cluster programming framework for iterative & interactive jobs enabled by Mesos Both are open-source (but in very early alpha!) http://github.com/mesos http://mesos.berkeley.edu
Editor's notes
Mention it’s a research project but has recently gone open source
Mention Hama, Pregel, Dremel, etc.; also mention Spark
Mention how we want to enable NEW frameworks too!
Note here that even users of Hadoop want Mesos for the isolation it provides
Explain that Hadoop port uses pluggable scheduler, hence few changes to Hadoop and no changes to job clients
Point out that Scala is a modern PL, etc.
Mention DryadLINQ (but we go beyond it with RDDs)
Point out that interactive use and iterative use go hand in hand because both require small tasks and dataset reuse
Note that dataset is reused on each gradient computation
This is for a 29 GB dataset on 20 EC2 m1.xlarge machines (4 cores each)
This slide has to say we’re committed to making Nexus “real”, and that it will be open sourced