Ramping up your Devops Fu for Big Data developers
François Garillot
Lessons learned in building a Spark distribution
1.
Ramping up your devops-fu for Big Data Developers
2.
François Garillot, Typesafe, @huitseeker
5.
Apache Mesos
• top-level Apache project since July 2013
• framework agnostic
• a cluster manager & resource manager
• developed by Twitter & Mesosphere, among others
• "The data center's operating system"
6.
Mesos Principles
Mesos = cluster + cgroups + LXC
9.
Mesos internals
12.
Mesos topology
14.
So, why do we care?
• multi-processes
• multi-roles
• multi-versions
• legacy use cases
15.
Spark
"To validate our hypothesis [...], we have also built a new framework on top of Mesos called Spark, optimized for iterative jobs where a dataset is reused in many parallel operations, and shown that Spark can outperform Hadoop by 10x in iterative machine learning workloads."
(Hindman et al., 2011)
16.
Spark
• top-level Apache Project since February 2014
• also, growth
17.
Spark expressivity

    val textFile = spark.textFile("hdfs://...")
    val counts = textFile.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://...")
18.
Java word count

    package org.myorg;

    import java.io.IOException;
    import java.util.*;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.*;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class WordCount {

        public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    context.write(word, one);
                }
            }
        }

        public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "wordcount");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.waitForCompletion(true);
        }
    }
19.
Spark advantages
• Fast! ...
• Because no dump to disk between every operation
• Combiners (map-side reduce) automatically applied ...
• ... and easy to define
• clever map pipeline
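The map-side reduce mentioned on this slide can be illustrated without a cluster. What follows is a minimal plain-Python sketch, not Spark code: the sample partitions and the helpers `map_side_combine` and `reduce_side_merge` are hypothetical names introduced here, and the only goal is to show how pre-aggregating inside each partition shrinks the number of records that would have to be shuffled.

```python
from collections import Counter

# Hypothetical input: two partitions of text lines, as a cluster might hold them.
partitions = [
    ["a b a", "b a"],
    ["c a", "c c"],
]

def map_side_combine(lines):
    """Count words inside one partition before anything is shuffled."""
    counts = Counter()
    for line in lines:
        counts.update(line.split(" "))
    return counts

def reduce_side_merge(partials):
    """Merge the per-partition partial counts into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

partials = [map_side_combine(p) for p in partitions]
word_counts = reduce_side_merge(partials)

# Records that would cross the network without and with the combiner.
shuffled_without_combiner = sum(len(l.split(" ")) for p in partitions for l in p)
shuffled_with_combiner = sum(len(c) for c in partials)
```

In this toy input the combiner sends one record per distinct word per partition instead of one record per word occurrence, which is the effect the slide attributes to automatically applied combiners.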
20.
Spark advantages
• flexible I/O: interfaces to DBs, Streaming, S3, local filesystem and HDFS
• fault-tolerance for executor & master
• SparkSQL
• MLlib, GraphX
21.
Spark Streaming
22.
Spark advantages
Momentum!!
• Sparkling Water = H2O + Spark
• Apache Mahout rewrite since March 2014
• DeepLearning4j-Scaleout = Deeplearning4j on ND4J + Spark
• 'Lingua Franca' of distributed data analysis
23.
Spark clustering modes
• local
• standalone
• Mesos
• YARN
24.
Spark on Mesos
26.
Fine-grained mode
• “fine-grained” mode (default): each Spark task runs as a separate Mesos task.
• each application gets more or fewer machines as it ramps up and down,
• but overhead in launching each task.
27.
Coarse-grained mode
• “coarse-grained” mode: only one long-running Spark task on each Mesos machine,
• and dynamically schedule its own “mini-tasks” within it.
• much lower startup overhead,
• but reserving the Mesos resources for the duration
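Choosing between the two modes comes down to one Spark property, `spark.mesos.coarse`. The sketch below only assembles a `spark-submit` argument list that sets it explicitly; the helper name `spark_submit_args`, the master address and the application file are hypothetical placeholders. Setting the property explicitly is a reasonable precaution because the default has differed between Spark releases.

```python
def spark_submit_args(mesos_master, app, coarse):
    """Build a spark-submit argument list that pins the Mesos run mode."""
    return [
        "spark-submit",
        "--master", f"mesos://{mesos_master}",
        # spark.mesos.coarse=true selects coarse-grained mode, false fine-grained.
        "--conf", f"spark.mesos.coarse={'true' if coarse else 'false'}",
        app,
    ]

# Hypothetical master address and application file, for illustration only.
args = spark_submit_args("mesos-master.example:5050", "wordcount.py", coarse=True)
command_line = " ".join(args)
```

The resulting list can be handed to a process launcher or joined into a shell command, as `command_line` does here.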
28.
Deployment
29.
Automation
30.
Ansible
• pilots through ssh
• no dependencies on slaves
• YAML scripting, but can drop down to Python
• integrated modules for EC2, apt ...
31.
Ansible

    ...
    - name: download spark sources
      git:
        repo: "{{ spark_repo }}"
        dest: "{{ spark_dir }}"
        version: "{{ spark_ref }}"
        force: yes
    - name: prepare sources for {{ scala_major_version }}
      command: dev/change-version-to-{{scala_major_version}}.sh
      args:
        chdir: "{{spark_dir}}"
    - name: build spark
      command: ./make-distribution.sh -Pyarn -Phadoop-{{hadoop_major_version}}
      args:
        chdir: "{{ spark_dir }}"
      environment: java_env
    ...
32.
Packer
• hybrid virtual image generation
• provision on VirtualBox
• provision on Amazon AWS
• Vagrant an interesting target as well
33.
Tinc
• VPN
• simple file-based configuration (BSD-style)
• automatic mesh routing in 1 config line: AutoConnect = yes
• multiple operating systems
34.
Tinc and Spark
• Spark binds using naming only (see SPARK-624)
• Tinc name resolution only works reliably in some configurations
• use avahi-daemon or your own DNS
• more simply, set hostnames and write to /etc/hosts everywhere
• avoid non-ascii in both tinc network and machine names
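The "write to /etc/hosts everywhere" advice can be scripted. The following is a small sketch under stated assumptions: the helper `hosts_entries`, the host names and the VPN addresses are all hypothetical, and the code only renders the lines to append; distributing them to each machine is left to whatever automation is already in place.

```python
def hosts_entries(nodes):
    """Render /etc/hosts lines for a mapping of hostname -> VPN address."""
    lines = []
    for name, address in sorted(nodes.items()):
        # The slide advises ASCII-only machine names; reject anything else.
        if not name.isascii():
            raise ValueError(f"non-ASCII hostname: {name!r}")
        lines.append(f"{address}\t{name}")
    return lines

# Hypothetical cluster: names and addresses are placeholders.
nodes = {"spark-master": "10.0.0.1", "spark-worker-1": "10.0.0.2"}
entries = hosts_entries(nodes)
```

Failing fast on a non-ASCII name keeps the naming problem visible at provisioning time rather than when Spark later tries to bind.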
35.
So Far
• deployment of Mesos, HDFS, Spark
• fully automated, from any commit of Mesos / Spark git repositories
• ... or our forks
• stress-testing, in collaboration with Mesosphere & Databricks
• partnership for huge prototype deployment
36.
Ongoing steps
37.
Mesos and Spark integration
• dynamic allocation for coarse-grained mode & external shuffle service
• co-testing with Databricks, Mesosphere
• cluster mode
38.
Docker: your favorite containerizer