SlideShare una empresa de Scribd logo
1 de 42
Demi Ben-Ari
Ofek Alumni Meetup
24/12/2015
About me
Demi Ben-Ari
Senior Software Engineer at Windward Ltd.
BS’c Computer Science – Academic College Tel-Aviv Yaffo
Co-Founder “Big Things” Big Data Community
In the Past:
Software Team Leader & Senior Java Software Engineer,
Missile defense and Alert System - “Ofek” – IAF
• Interested in almost every kind of technology – A True Geek
Agenda
 What is Spark?
 Spark Infrastructure and Basics
 Spark Features and Suite
◦ Spark-Shell Live Demo
◦ Cassandra & Spark
 Development with Spark
 Conclusion
What is Spark?
Efficient Usable
 General execution
graphs
 In-memory storage
 Rich APIs in Java,
Scala, Python
 Interactive shell
Fast and Expressive Cluster Computing
Engine Compatible with Apache Hadoop
What is Spark?
 Apache Spark is a general-purpose, cluster
computing framework
 Spark does computation In Memory & on
Disk
 Apache Spark has low level and high level
APIs
Spark Philosophy
 Make life easy and productive for data
scientists
 Well documented, expressive API’s
 Powerful domain specific libraries
 Easy integration with storage systems
 … and caching to avoid data movement
 Predictable releases, stable API’s
 Stable release each 3 months
Spark Contributors
 Highly active open source community
(09/2015)
◦ https://github.com/apache/spark/
 https://www.openhub.net/p/apache-spark
About Spark project
 Spark was founded at UC Berkeley and the
main contributor is “Databricks”.
 Interactive shell Spark in Scala and Python
◦ (spark-shell, pyspark)
 Currently stable in version 1.6
Spark Petabyte Sort
Multi Language API Support
 Scala
 Java
 Python
 Clojure
 R
Unified Tools Platform
Unified Tools Platform
Spark
SQL
GraphX
MLlib
Machine
Learning
Spark
Streamin
g
Spark Core
Data Frames
SparkR
Basic Terms
 Cluster
 Driver (Master)
 Executors (Slaves)
 Spark Context
 RDD – Resilient Distributed Dataset
Driver and Spark Context
 Spark Context is your “handle” to the Spark
cluster.
 The driver program contains the main
method.
 You use your Spark Context to access your
cluster.
◦ Configure the connection to the cluster
◦ It lets you create RDDs.
 The variable named sc (for the Spark
Context) is already defined in your Driver in
the Spark Shell.
What’s an RDD?
 Resilient Distributed Datasets
◦ Fault tolerant
◦ Parallel data structure
◦ Distributed on the nodes in the cluster
◦ Immutable!!!
◦ Can persist intermediate results in memory
◦ Transformations are operators and are Lazy
evaluated
Resilient Distributed Datasets
Resilient Distributed Datasets
RDD Persistence and
partitioning
 Users have control which RDD will be
reuse (in memory and disk storage)
◦ Persist, Cache, Unpersist
 Users can order an RDD’s to be
partitioned across machines
 Only the lost partitions of an RDD
need to be recomputed upon failure.
Spark execution engine
 Spark uses lazy evaluation
◦ Runs the code only when it encounters an
action operation
 There is no need to design and write a
single complex map-reduce job.
◦ In Spark we can write smaller and
manageable operations
◦ Spark will group operations together
Spark execution engine
 Serializes your code to the executors
◦ Can choose your serialization method
(Java serialization, Kryo)
 In Java - functions are specified as
objects that implement one of Spark’s
Function interfaces.
◦ Can use the same method of
implementation in Scala and Python as
well.
Persistence layers for Spark
 Distributed system
◦ Hadoop (HDFS)
◦ Local file system
◦ Amazon S3
◦ Cassandra
◦ Hive
◦ Hbase
 File formats
◦ Text file
 CSV, TSV, Plain Text
◦ Sequence File
◦ AVRO
◦ Parquet
Spark Core Features
 Distributed In memory Computation
 Stand alone and Local Capabilities
 History server for Spark UI
 Resource management Integration
 Unified job submission tool
History Server
 Can be run on all Spark deployments,
◦ Stand Alone, YARN, Mesos
 Integrates both with YARN and Mesos
 In Yarn / Mesos, run history server as
a daemon.
Job Submission Tool
 ./bin/spark-submit <app-jar> 
--class my.main.Class
--name myAppName
--master local[4]
--master spark://some-cluster
Spark Shell
 YouTube – Word Count Example
Cassandra & Spark
 Cassandra cluster
◦ Bare metal vs. On the cloud
 DSE – DataStax Enterprise
◦ Cassandra & Spark in each node
 Vs
◦ Separate Cassandra and Spark clusters
Development with Spark
Where do I start from?!
 Download spark as a package
◦ Run it on “local” mode (no need of a real
cluster)
◦ “spark-ec2” scripts to ramp-up a Stand Alone
mode cluster
◦ Amazon Elastic Map Reduce (EMR)
 Yarn vs. Mesos vs. Stand Alone
Running Environments
 Development – Testing – Production
◦ Don’t you need more?
◦ Be as flexible as you can
 Cluster Utilization
◦ Unified Cluster for all environments
 Vs.
◦ Cluster per Environment
 (Cluster per Data Center)
 Configuration
◦ Local Files vs. Distributed
Saving and Maintaining the
Data Local File System – Not effective in a distributed
environment
 HDFS
◦ Might be very Expensive
◦ Locality Rules – Spark + HDFS node + Same machine
 S3
◦ High latency and pretty slow but low costs
 Cassandra
◦ Rigid data model
◦ Very fast and depends on the Volume of the data can be
DevOps – Keep It Simple,
Stupid Linux
◦ Bash scripts
◦ Crontab
 Automation via Jenkins
 Continuous Deployment – with every GIT push
Dev Testing
Live
Staging
Production
Daily ManualAutomaticAutomatic
Build Automation
 Maven
◦ Sonatype Nexus artifact management
 -
◦ Deploy and Script generation scripts
◦ Per Environment Testing
◦ Data Validation
◦ Scheduled Tasks
Testing
Dev Testing
Live
Staging
Production
Summary
Cluster
Dev Testing
Live
Staging
ProductionEnv
ELK
Data Flow
Extern
al Data
Source
s
Analytics Layers Data Output
Conclusion
 Spark is a popular and very powerful
distributed in memory computation
framework
 Broadly used and has lots of contributors
 Leading tool in the new world of Petabytes
of unexplored data in the world
Questions?
Thanks,
Resources and Contact
 Demi Ben-Ari
◦ LinkedIn
◦ Twitter: @demibenari
◦ Blog: http://progexc.blogspot.com/
◦ Email: demi.benari@gmail.com
◦ “Big Things” Community
 Meetup, YouTube, Facebook, Twitter

Más contenido relacionado

La actualidad más candente

Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...Amazon Web Services
 
AWS Data migration services
AWS Data migration servicesAWS Data migration services
AWS Data migration servicesArun Sirimalla
 
Deep Dive on MySQL Databases on AWS - AWS Online Tech Talks
Deep Dive on MySQL Databases on AWS - AWS Online Tech TalksDeep Dive on MySQL Databases on AWS - AWS Online Tech Talks
Deep Dive on MySQL Databases on AWS - AWS Online Tech TalksAmazon Web Services
 
Introduction to AWS Database Services
Introduction to AWS Database ServicesIntroduction to AWS Database Services
Introduction to AWS Database ServicesAmazon Web Services
 
AWS Innovation at Scale – Rodney Haywood
AWS Innovation at Scale – Rodney HaywoodAWS Innovation at Scale – Rodney Haywood
AWS Innovation at Scale – Rodney HaywoodAmazon Web Services
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSAmazon Web Services
 
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech TalksDeep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech TalksAmazon Web Services
 
Building a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWSBuilding a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWSArun Sirimalla
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for Redis
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for RedisManaging Data with Voume Velocity, and Variety with Amazon ElastiCache for Redis
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for RedisAmazon Web Services
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Benefícios e melhores práticas no uso do Amazon Redshift
Benefícios e melhores práticas no uso do Amazon RedshiftBenefícios e melhores práticas no uso do Amazon Redshift
Benefícios e melhores práticas no uso do Amazon RedshiftAmazon Web Services LATAM
 
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...Amazon Web Services
 
Deploying in the Cloud: Why and How
Deploying in the Cloud: Why and HowDeploying in the Cloud: Why and How
Deploying in the Cloud: Why and HowMatt Small
 
Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017Amazon Web Services
 
Riak CS Build Your Own Cloud Storage
Riak CS Build Your Own Cloud StorageRiak CS Build Your Own Cloud Storage
Riak CS Build Your Own Cloud Storagebuildacloud
 
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...Amazon Web Services
 
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)Amazon Web Services
 

La actualidad más candente (20)

Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...
Migrate from SQL Server or Oracle into Amazon Aurora using AWS Database Migra...
 
Windows on AWS
Windows on AWSWindows on AWS
Windows on AWS
 
AWS Data migration services
AWS Data migration servicesAWS Data migration services
AWS Data migration services
 
Deep Dive on MySQL Databases on AWS - AWS Online Tech Talks
Deep Dive on MySQL Databases on AWS - AWS Online Tech TalksDeep Dive on MySQL Databases on AWS - AWS Online Tech Talks
Deep Dive on MySQL Databases on AWS - AWS Online Tech Talks
 
Introduction to AWS Database Services
Introduction to AWS Database ServicesIntroduction to AWS Database Services
Introduction to AWS Database Services
 
AWS Innovation at Scale – Rodney Haywood
AWS Innovation at Scale – Rodney HaywoodAWS Innovation at Scale – Rodney Haywood
AWS Innovation at Scale – Rodney Haywood
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
 
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech TalksDeep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
 
Building a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWSBuilding a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWS
 
AWS Database Services
AWS Database ServicesAWS Database Services
AWS Database Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for Redis
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for RedisManaging Data with Voume Velocity, and Variety with Amazon ElastiCache for Redis
Managing Data with Voume Velocity, and Variety with Amazon ElastiCache for Redis
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
 
Benefícios e melhores práticas no uso do Amazon Redshift
Benefícios e melhores práticas no uso do Amazon RedshiftBenefícios e melhores práticas no uso do Amazon Redshift
Benefícios e melhores práticas no uso do Amazon Redshift
 
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
AWS re:Invent 2016: Workshop: Converting Your Oracle or Microsoft SQL Server ...
 
Deploying in the Cloud: Why and How
Deploying in the Cloud: Why and HowDeploying in the Cloud: Why and How
Deploying in the Cloud: Why and How
 
Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017
 
Riak CS Build Your Own Cloud Storage
Riak CS Build Your Own Cloud StorageRiak CS Build Your Own Cloud Storage
Riak CS Build Your Own Cloud Storage
 
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
AWS re:Invent 2016: Service Integration Delivery and Automation Using Amazon ...
 
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
 

Similar a Spark 101 – First Steps To Distributed Computing - Demi Ben-Ari @ Ofek Alumni

Spark to Production @Windward
Spark to Production @WindwardSpark to Production @Windward
Spark to Production @WindwardDemi Ben-Ari
 
Bring the Spark To Your Eyes
Bring the Spark To Your EyesBring the Spark To Your Eyes
Bring the Spark To Your EyesDemi Ben-Ari
 
Spark in the Maritime Domain
Spark in the Maritime DomainSpark in the Maritime Domain
Spark in the Maritime DomainDemi Ben-Ari
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonBenjamin Bengfort
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxRahul Borate
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...Simon Ambridge
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Olalekan Fuad Elesin
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Chris Fregly
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to SparkDavid Smelker
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for BeginnersAnirudh
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkIke Ellis
 
Lessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerLessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerSpark Summit
 
Data Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRData Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRAmazon Web Services
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark FundamentalsZahra Eskandari
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...Simplilearn
 

Similar a Spark 101 – First Steps To Distributed Computing - Demi Ben-Ari @ Ofek Alumni (20)

Spark to Production @Windward
Spark to Production @WindwardSpark to Production @Windward
Spark to Production @Windward
 
Bring the Spark To Your Eyes
Bring the Spark To Your EyesBring the Spark To Your Eyes
Bring the Spark To Your Eyes
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Spark in the Maritime Domain
Spark in the Maritime DomainSpark in the Maritime Domain
Spark in the Maritime Domain
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptx
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
 
APACHE SPARK.pptx
APACHE SPARK.pptxAPACHE SPARK.pptx
APACHE SPARK.pptx
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
 
Azure Databricks is Easier Than You Think
Azure Databricks is Easier Than You ThinkAzure Databricks is Easier Than You Think
Azure Databricks is Easier Than You Think
 
Lessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerLessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On Docker
 
Data Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRData Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMR
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 

Más de Demi Ben-Ari

Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-AriThinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
CTO Management Tool Box - Demi Ben-Ari at Panorays
CTO Management Tool Box - Demi Ben-Ari at PanoraysCTO Management Tool Box - Demi Ben-Ari at Panorays
CTO Management Tool Box - Demi Ben-Ari at PanoraysDemi Ben-Ari
 
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...Demi Ben-Ari
 
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...Demi Ben-Ari
 
CTO Management ToolBox - Demi Ben-Ari -- Panorays
CTO Management ToolBox - Demi Ben-Ari -- PanoraysCTO Management ToolBox - Demi Ben-Ari -- Panorays
CTO Management ToolBox - Demi Ben-Ari -- PanoraysDemi Ben-Ari
 
All I Wanted Is to Found a Startup - Demi Ben-Ari - Panorays
All I Wanted Is to Found a Startup - Demi Ben-Ari - PanoraysAll I Wanted Is to Found a Startup - Demi Ben-Ari - Panorays
All I Wanted Is to Found a Startup - Demi Ben-Ari - PanoraysDemi Ben-Ari
 
Hacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - Panorays
Hacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - PanoraysHacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - Panorays
Hacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - PanoraysDemi Ben-Ari
 
Community, Unifying the Geeks to Create Value - Demi Ben-Ari
Community, Unifying the Geeks to Create Value - Demi Ben-AriCommunity, Unifying the Geeks to Create Value - Demi Ben-Ari
Community, Unifying the Geeks to Create Value - Demi Ben-AriDemi Ben-Ari
 
Apache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysApache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysDemi Ben-Ari
 
Know the Startup World - Demi Ben-Ari - Ofek Alumni
Know the Startup World - Demi Ben-Ari - Ofek AlumniKnow the Startup World - Demi Ben-Ari - Ofek Alumni
Know the Startup World - Demi Ben-Ari - Ofek AlumniDemi Ben-Ari
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Know the Startup World - Demi Ben Ari - Ofek Alumni
Know the Startup World - Demi Ben Ari - Ofek AlumniKnow the Startup World - Demi Ben Ari - Ofek Alumni
Know the Startup World - Demi Ben Ari - Ofek AlumniDemi Ben-Ari
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Demi Ben-Ari
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Demi Ben-Ari
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Demi Ben-Ari
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Bootstrapping a Tech Community - Demi Ben-Ari
Bootstrapping a Tech Community - Demi Ben-AriBootstrapping a Tech Community - Demi Ben-Ari
Bootstrapping a Tech Community - Demi Ben-AriDemi Ben-Ari
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysDemi Ben-Ari
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Demi Ben-Ari
 

Más de Demi Ben-Ari (20)

Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-AriThinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
 
CTO Management Tool Box - Demi Ben-Ari at Panorays
CTO Management Tool Box - Demi Ben-Ari at PanoraysCTO Management Tool Box - Demi Ben-Ari at Panorays
CTO Management Tool Box - Demi Ben-Ari at Panorays
 
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
Kubernetes, Toolbox to fail or succeed for beginners - Demi Ben-Ari, VP R&D @...
 
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
Hacker vs company, Cloud Cyber Security Automated with Kubernetes - Demi Ben-...
 
CTO Management ToolBox - Demi Ben-Ari -- Panorays
CTO Management ToolBox - Demi Ben-Ari -- PanoraysCTO Management ToolBox - Demi Ben-Ari -- Panorays
CTO Management ToolBox - Demi Ben-Ari -- Panorays
 
All I Wanted Is to Found a Startup - Demi Ben-Ari - Panorays
All I Wanted Is to Found a Startup - Demi Ben-Ari - PanoraysAll I Wanted Is to Found a Startup - Demi Ben-Ari - Panorays
All I Wanted Is to Found a Startup - Demi Ben-Ari - Panorays
 
Hacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - Panorays
Hacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - PanoraysHacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - Panorays
Hacking for fun & profit - The Kubernetes Way - Demi Ben-Ari - Panorays
 
Community, Unifying the Geeks to Create Value - Demi Ben-Ari
Community, Unifying the Geeks to Create Value - Demi Ben-AriCommunity, Unifying the Geeks to Create Value - Demi Ben-Ari
Community, Unifying the Geeks to Create Value - Demi Ben-Ari
 
Apache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysApache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - Panorays
 
Know the Startup World - Demi Ben-Ari - Ofek Alumni
Know the Startup World - Demi Ben-Ari - Ofek AlumniKnow the Startup World - Demi Ben-Ari - Ofek Alumni
Know the Startup World - Demi Ben-Ari - Ofek Alumni
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
 
Know the Startup World - Demi Ben Ari - Ofek Alumni
Know the Startup World - Demi Ben Ari - Ofek AlumniKnow the Startup World - Demi Ben Ari - Ofek Alumni
Know the Startup World - Demi Ben Ari - Ofek Alumni
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
 
Bootstrapping a Tech Community - Demi Ben-Ari
Bootstrapping a Tech Community - Demi Ben-AriBootstrapping a Tech Community - Demi Ben-Ari
Bootstrapping a Tech Community - Demi Ben-Ari
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
 

Último

How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROmotivationalword821
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 

Último (20)

How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTRO
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 

Spark 101 – First Steps To Distributed Computing - Demi Ben-Ari @ Ofek Alumni

  • 1. Demi Ben-Ari Ofek Alumni Meetup 24/12/2015
  • 2. About me Demi Ben-Ari Senior Software Engineer at Windward Ltd. BS’c Computer Science – Academic College Tel-Aviv Yaffo Co-Founder “Big Things” Big Data Community In the Past: Software Team Leader & Senior Java Software Engineer, Missile defense and Alert System - “Ofek” – IAF • Interested in almost every kind of technology – A True Geek
  • 3.
  • 4. Agenda  What is Spark?  Spark Infrastructure and Basics  Spark Features and Suite ◦ Spark-Shell Live Demo ◦ Cassandra & Spark  Development with Spark  Conclusion
  • 5. What is Spark? Efficient Usable  General execution graphs  In-memory storage  Rich APIs in Java, Scala, Python  Interactive shell Fast and Expressive Cluster Computing Engine Compatible with Apache Hadoop
  • 6. What is Spark?  Apache Spark is a general-purpose, cluster computing framework  Spark does computation In Memory & on Disk  Apache Spark has low level and high level APIs
  • 7. Spark Philosophy  Make life easy and productive for data scientists  Well documented, expressive API’s  Powerful domain specific libraries  Easy integration with storage systems  … and caching to avoid data movement  Predictable releases, stable API’s  Stable release each 3 months
  • 8.
  • 9. Spark Contributors  Highly active open source community (09/2015) ◦ https://github.com/apache/spark/  https://www.openhub.net/p/apache-spark
  • 10. About Spark project  Spark was founded at UC Berkeley and the main contributor is “Databricks”.  Interactive shell Spark in Scala and Python ◦ (spark-shell, pyspark)  Currently stable in version 1.6
  • 12. Multi Language API Support  Scala  Java  Python  Clojure  R
  • 15. Basic Terms  Cluster  Driver (Master)  Executors (Slaves)  Spark Context  RDD – Resilient Distributed Dataset
  • 16. Driver and Spark Context  Spark Context is your “handle” to the Spark cluster.  The driver program contains the main method.  You use your Spark Context to access your cluster. ◦ Configure the connection to the cluster ◦ It lets you create RDDs.  The variable named sc (for the Spark Context) is already defined in your Driver in the Spark Shell.
  • 17. What’s an RDD?  Resilient Distributed Datasets ◦ Fault tolerant ◦ Parallel data structure ◦ Distributed on the nodes in the cluster ◦ Immutable!!! ◦ Can persist intermediate results in memory ◦ Transformations are operators and are Lazy evaluated
  • 20. RDD Persistence and partitioning  Users have control which RDD will be reuse (in memory and disk storage) ◦ Persist, Cache, Unpersist  Users can order an RDD’s to be partitioned across machines  Only the lost partitions of an RDD need to be recomputed upon failure.
  • 21. Spark execution engine  Spark uses lazy evaluation ◦ Runs the code only when it encounters an action operation  There is no need to design and write a single complex map-reduce job. ◦ In Spark we can write smaller and manageable operations ◦ Spark will group operations together
  • 22. Spark execution engine  Serializes your code to the executors ◦ Can choose your serialization method (Java serialization, Kryo)  In Java - functions are specified as objects that implement one of Spark’s Function interfaces. ◦ Can use the same method of implementation in Scala and Python as well.
  • 23. Persistence layers for Spark  Distributed system ◦ Hadoop (HDFS) ◦ Local file system ◦ Amazon S3 ◦ Cassandra ◦ Hive ◦ Hbase  File formats ◦ Text file  CSV, TSV, Plain Text ◦ Sequence File ◦ AVRO ◦ Parquet
  • 24.
  • 25.
  • 26. Spark Core Features  Distributed In memory Computation  Stand alone and Local Capabilities  History server for Spark UI  Resource management Integration  Unified job submission tool
  • 27. History Server  Can be run on all Spark deployments, ◦ Stand Alone, YARN, Mesos  Integrates both with YARN and Mesos  In Yarn / Mesos, run history server as a daemon.
  • 28. Job Submission Tool  ./bin/spark-submit <app-jar> --class my.main.Class --name myAppName --master local[4] --master spark://some-cluster
  • 29. Spark Shell  YouTube – Word Count Example
  • 30. Cassandra & Spark  Cassandra cluster ◦ Bare metal vs. On the cloud  DSE – DataStax Enterprise ◦ Cassandra & Spark in each node  Vs ◦ Separate Cassandra and Spark clusters
  • 32. Where do I start from?!  Download spark as a package ◦ Run it on “local” mode (no need of a real cluster) ◦ “spark-ec2” scripts to ramp-up a Stand Alone mode cluster ◦ Amazon Elastic Map Reduce (EMR)  Yarn vs. Mesos vs. Stand Alone
  • 33. Running Environments  Development – Testing – Production ◦ Don’t you need more? ◦ Be as flexible as you can  Cluster Utilization ◦ Unified Cluster for all environments  Vs. ◦ Cluster per Environment  (Cluster per Data Center)  Configuration ◦ Local Files vs. Distributed
  • 34. Saving and Maintaining the Data Local File System – Not effective in a distributed environment  HDFS ◦ Might be very Expensive ◦ Locality Rules – Spark + HDFS node + Same machine  S3 ◦ High latency and pretty slow but low costs  Cassandra ◦ Rigid data model ◦ Very fast and depends on the Volume of the data can be
  • 35. DevOps – Keep It Simple, Stupid Linux ◦ Bash scripts ◦ Crontab  Automation via Jenkins  Continuous Deployment – with every GIT push Dev Testing Live Staging Production Daily ManualAutomaticAutomatic
  • 36. Build Automation  Maven ◦ Sonatype Nexus artifact management  - ◦ Deploy and Script generation scripts ◦ Per Environment Testing ◦ Data Validation ◦ Scheduled Tasks
  • 40. Conclusion  Spark is a popular and very powerful distributed in memory computation framework  Broadly used and has lots of contributors  Leading tool in the new world of Petabytes of unexplored data in the world
  • 42. Thanks, Resources and Contact  Demi Ben-Ari ◦ LinkedIn ◦ Twitter: @demibenari ◦ Blog: http://progexc.blogspot.com/ ◦ Email: demi.benari@gmail.com ◦ “Big Things” Community  Meetup, YouTube, Facebook, Twitter

Notas del editor

  1. Generalize the map/reduce framework