SlideShare a Scribd company logo
1 of 24
Why Functional Programming Is
Important In Big Data Era?
handaru@tiket.com
What Is Big Data?
What Are The Steps?
Act On
Analyze
Collect
What We Need?
D
Distributed Computing
Cluster
ProcessData
What We Need?
• Spark as data processsing in cluster, originally
written in Scala, which allows concise
function syntax and interactive use
• Mesos as cluster manager
• ZooKeeper as highly reliable distributed
coordinator
• HDFS as distributed storage
What We Need?
• Pure functions
• Atomic operations
• Parallel patterns or skeletons
• Lightweight algorithms
The only thing that works for parallel programming
is functional programming.
--Carnegie Mello Professor Bob Harper
What Is Functional Programming?
FP Quick Tour In Scala
• Basic transformations:
var array = new Array[Int](10)
var list = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
• Indexing:
array(0) = 1
println(list(0))
• Anonymous functions:
val multiplay = (x: Int, y: Int) => x * y
val procedure = { x: Int => {
println(“Hello, ”+x)
println(x * 10)
}
}
FP Quick Tour In Scala
• Scala closure syntax:
(x: Int) => x * 10 // full version
x => x * 10 // type interference
_ * 10 // underscore syntax
x => { // body is block of code
val y = 10
x * y
}
FP Quick Tour In Scala
• Processing collections:
var list = List(1, 2, 3, 4, 5, 6, 7, 8, 9)
list.foreach(x => println(x))
list.map(_ * 10)
list.filter(x => x % 2 == 0)
list.reduce((x, y) => x + y)
list.reduce(_ + _)
def f(x: Int) = List(x-1, x x+1)
list.map(x => f(x))
list.map(f(_))
list.flatMap(x => f(x))
list.map(x => f(x)).reduce(_ ++ _)
Spark Quick Tour
• Spark context:
• Entry point to Spark functionality
• In spark-shell, crated as sc
• In standalone-spark-program, we must create it
• Resilient distributed datasets (RDDs) :
• A distributed memory abstraction
• A logically centralized entity but physically partitioned across multiple
machines inside a cluster based on some notion of key
• Immutable
• Automatically rebuilt on failure
• Based on LRU (Least Recent Use) eviction algorithm
Spark Quick Tour
Working with RDDs
Spark Quick Tour
Cached RDDs
Spark Quick Tour
• Transformations:
• Lazy operations to build RDDs from other RDDs
• Narrow transformation (involves no data shuffling) :
• map
• flatMap
• filter
• Wide transformation (involves data shuffling):
• sortByKey
• reduceByKey
• groupByKey
• Actions:
• Return a result or write it to storage
• collect
• count
• take(n)
Spark Quick Tour
Transformations
Spark Quick Tour
• Creating RDDs:
val numbers = sc.parallelize(List(1, 2, 3, 4, 5))
val textFile = sc.textFile("hdfs://localhost/test/tobe.txt")
val textFile = sc.textFile("hdfs://localhost/test/*.txt")
• Basic transformations:
val squares = numbers.map(x => x * x)
val evens = squares.filter(_ < 9)
val mapto = numbers.flatMap(x => 1 to x)
val words = textFile.flatMap(_.split(" ")).cache()
Base RDD
Transformed RDD
Turn a collection
to RDD
Spark Quick Tour
• Basic actions:
words.collect()
words take(5)
words count
words.reduce(_ + _)
words.filter(_ == “be").count()
words.filter(_ == “or").count()
words.saveAsTextFile("hdfs://localhost/test/result")
The influence of
cache
Spark Quick Tour
• Pair syntax:
val pair = (a, b)
• Accessing pair elements:
pair._1
pair._2
• Key-value operations:
val pets = sc.parallelize(List(("cat", 1), ("dog", 2), ("cat", 3)))
pets.reduceByKey(_ + _)
pets.groupByKey()
pets.sortByKey()
Hello World
val logFile = "hdfs://localhost/test/tobe.txt"
val logData = sc.textFile(logFile).cache()
val wordCount = logData.flatMap(_.split(“ “))
.map((_, 1))
.reduceByKey(_ + _)
wordCount.saveAsTextFile("hdfs://localhost/wordcount/result")
sc.stop()
Execution
Software Components
Application
Spark Context
ZooKeeper
Mesos
Master
Mesos Slave
Spark Executor
Mesos Slave
Spark Executor
HDFS/Other Storage
Literature
Parallel Programming With Spark
Spark: Low latency, massively parallel processing framework
handaru@tiket.com
handaru@tiket.com

More Related Content

What's hot

Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkVinoth Chandar
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormBrandon O'Brien
 
Scylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and ScyllaScylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and ScyllaScyllaDB
 
Building DSLs with Scala
Building DSLs with ScalaBuilding DSLs with Scala
Building DSLs with ScalaMohit Jaggi
 
Apache Spark avec NodeJS ? Oui, c'est possible avec EclairJS !
Apache Spark avec NodeJS ? Oui, c'est possible avec EclairJS !Apache Spark avec NodeJS ? Oui, c'est possible avec EclairJS !
Apache Spark avec NodeJS ? Oui, c'est possible avec EclairJS !Bruno Bonnin
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Webinar how to build a highly available time series solution with kairos-db (1)
Webinar  how to build a highly available time series solution with kairos-db (1)Webinar  how to build a highly available time series solution with kairos-db (1)
Webinar how to build a highly available time series solution with kairos-db (1)Julia Angell
 
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...ScyllaDB
 
Cassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesCassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesScyllaDB
 
Apache Spark on Kubernetes
Apache Spark on KubernetesApache Spark on Kubernetes
Apache Spark on Kubernetesharidasnss
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)Claudiu Barbura
 
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...Databricks
 
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaSpark Summit
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Managerharidasnss
 
Real-time Fraud Detection for Southeast Asia’s Leading Mobile Platform
Real-time Fraud Detection for Southeast Asia’s Leading Mobile PlatformReal-time Fraud Detection for Southeast Asia’s Leading Mobile Platform
Real-time Fraud Detection for Southeast Asia’s Leading Mobile PlatformScyllaDB
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Databricks
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Dockerharidasnss
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLucidworks
 
Scylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
Scylla Summit 2022: What’s New in ScyllaDB Operator for KubernetesScylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
Scylla Summit 2022: What’s New in ScyllaDB Operator for KubernetesScyllaDB
 

What's hot (20)

Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on Spark
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with Storm
 
Scylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and ScyllaScylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and Scylla
 
Building DSLs with Scala
Building DSLs with ScalaBuilding DSLs with Scala
Building DSLs with Scala
 
Apache Spark avec NodeJS ? Oui, c'est possible avec EclairJS !
Apache Spark avec NodeJS ? Oui, c'est possible avec EclairJS !Apache Spark avec NodeJS ? Oui, c'est possible avec EclairJS !
Apache Spark avec NodeJS ? Oui, c'est possible avec EclairJS !
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Webinar how to build a highly available time series solution with kairos-db (1)
Webinar  how to build a highly available time series solution with kairos-db (1)Webinar  how to build a highly available time series solution with kairos-db (1)
Webinar how to build a highly available time series solution with kairos-db (1)
 
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
 
Cassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary DifferencesCassandra vs. ScyllaDB: Evolutionary Differences
Cassandra vs. ScyllaDB: Evolutionary Differences
 
Apache Spark on Kubernetes
Apache Spark on KubernetesApache Spark on Kubernetes
Apache Spark on Kubernetes
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
 
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
 
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the stream
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Manager
 
Real-time Fraud Detection for Southeast Asia’s Leading Mobile Platform
Real-time Fraud Detection for Southeast Asia’s Leading Mobile PlatformReal-time Fraud Detection for Southeast Asia’s Leading Mobile Platform
Real-time Fraud Detection for Southeast Asia’s Leading Mobile Platform
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Docker
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
 
Scylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
Scylla Summit 2022: What’s New in ScyllaDB Operator for KubernetesScylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
Scylla Summit 2022: What’s New in ScyllaDB Operator for Kubernetes
 

Similar to Why Functional Programming Is Important in Big Data Era

Apache Spark - San Diego Big Data Meetup Jan 14th 2015
Apache Spark - San Diego Big Data Meetup Jan 14th 2015Apache Spark - San Diego Big Data Meetup Jan 14th 2015
Apache Spark - San Diego Big Data Meetup Jan 14th 2015cdmaxime
 
Apache Spark - Santa Barbara Scala Meetup Dec 18th 2014
Apache Spark - Santa Barbara Scala Meetup Dec 18th 2014Apache Spark - Santa Barbara Scala Meetup Dec 18th 2014
Apache Spark - Santa Barbara Scala Meetup Dec 18th 2014cdmaxime
 
Dive into spark2
Dive into spark2Dive into spark2
Dive into spark2Gal Marder
 
Apache Spark Overview @ ferret
Apache Spark Overview @ ferretApache Spark Overview @ ferret
Apache Spark Overview @ ferretAndrii Gakhov
 
Apache Spark™ is a multi-language engine for executing data-S5.ppt
Apache Spark™ is a multi-language engine for executing data-S5.pptApache Spark™ is a multi-language engine for executing data-S5.ppt
Apache Spark™ is a multi-language engine for executing data-S5.pptbhargavi804095
 
Introduction to Apache Spark Ecosystem
Introduction to Apache Spark EcosystemIntroduction to Apache Spark Ecosystem
Introduction to Apache Spark EcosystemBojan Babic
 
Spark real world use cases and optimizations
Spark real world use cases and optimizationsSpark real world use cases and optimizations
Spark real world use cases and optimizationsGal Marder
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Djamel Zouaoui
 
TriHUG talk on Spark and Shark
TriHUG talk on Spark and SharkTriHUG talk on Spark and Shark
TriHUG talk on Spark and Sharktrihug
 
Spark Programming
Spark ProgrammingSpark Programming
Spark ProgrammingTaewook Eom
 
Spark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark Summit
 
Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014cdmaxime
 
Introduction to apache spark
Introduction to apache sparkIntroduction to apache spark
Introduction to apache sparkJohn Godoi
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Sumedh Wale's presentation
Sumedh Wale's presentationSumedh Wale's presentation
Sumedh Wale's presentationpunesparkmeetup
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsDataStax Academy
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupNed Shawa
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark TutorialAhmet Bulut
 
Ten tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache SparkTen tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache SparkWill Du
 

Similar to Why Functional Programming Is Important in Big Data Era (20)

Apache Spark - San Diego Big Data Meetup Jan 14th 2015
Apache Spark - San Diego Big Data Meetup Jan 14th 2015Apache Spark - San Diego Big Data Meetup Jan 14th 2015
Apache Spark - San Diego Big Data Meetup Jan 14th 2015
 
Apache Spark - Santa Barbara Scala Meetup Dec 18th 2014
Apache Spark - Santa Barbara Scala Meetup Dec 18th 2014Apache Spark - Santa Barbara Scala Meetup Dec 18th 2014
Apache Spark - Santa Barbara Scala Meetup Dec 18th 2014
 
Dive into spark2
Dive into spark2Dive into spark2
Dive into spark2
 
Apache Spark Overview @ ferret
Apache Spark Overview @ ferretApache Spark Overview @ ferret
Apache Spark Overview @ ferret
 
Apache Spark™ is a multi-language engine for executing data-S5.ppt
Apache Spark™ is a multi-language engine for executing data-S5.pptApache Spark™ is a multi-language engine for executing data-S5.ppt
Apache Spark™ is a multi-language engine for executing data-S5.ppt
 
Introduction to Apache Spark Ecosystem
Introduction to Apache Spark EcosystemIntroduction to Apache Spark Ecosystem
Introduction to Apache Spark Ecosystem
 
Spark real world use cases and optimizations
Spark real world use cases and optimizationsSpark real world use cases and optimizations
Spark real world use cases and optimizations
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 
TriHUG talk on Spark and Shark
TriHUG talk on Spark and SharkTriHUG talk on Spark and Shark
TriHUG talk on Spark and Shark
 
Spark Programming
Spark ProgrammingSpark Programming
Spark Programming
 
Spark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin Odersky
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
 
Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014
 
Introduction to apache spark
Introduction to apache sparkIntroduction to apache spark
Introduction to apache spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Sumedh Wale's presentation
Sumedh Wale's presentationSumedh Wale's presentation
Sumedh Wale's presentation
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Ten tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache SparkTen tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache Spark
 

More from Handaru Sakti

Game Theory of Oligopolistic Pricing Strategies
Game Theory of  Oligopolistic Pricing StrategiesGame Theory of  Oligopolistic Pricing Strategies
Game Theory of Oligopolistic Pricing StrategiesHandaru Sakti
 
Innovation management
Innovation managementInnovation management
Innovation managementHandaru Sakti
 
Product Design Language System
Product Design Language SystemProduct Design Language System
Product Design Language SystemHandaru Sakti
 
IES Triangle Principle
IES Triangle PrincipleIES Triangle Principle
IES Triangle PrincipleHandaru Sakti
 
Business Model Canvas
Business Model CanvasBusiness Model Canvas
Business Model CanvasHandaru Sakti
 
Transition management of product as platform
Transition management of  product as platformTransition management of  product as platform
Transition management of product as platformHandaru Sakti
 
Storial - Be Storyteller
Storial - Be StorytellerStorial - Be Storyteller
Storial - Be StorytellerHandaru Sakti
 
Mobile App Trends in 2016
Mobile App Trends in 2016Mobile App Trends in 2016
Mobile App Trends in 2016Handaru Sakti
 
Android career opportunities
Android career opportunitiesAndroid career opportunities
Android career opportunitiesHandaru Sakti
 
Android Support Package
Android Support PackageAndroid Support Package
Android Support PackageHandaru Sakti
 
Fisikawan dan Dunia Kerja
Fisikawan dan Dunia KerjaFisikawan dan Dunia Kerja
Fisikawan dan Dunia KerjaHandaru Sakti
 

More from Handaru Sakti (15)

Game Theory of Oligopolistic Pricing Strategies
Game Theory of  Oligopolistic Pricing StrategiesGame Theory of  Oligopolistic Pricing Strategies
Game Theory of Oligopolistic Pricing Strategies
 
Innovation management
Innovation managementInnovation management
Innovation management
 
Product Design Language System
Product Design Language SystemProduct Design Language System
Product Design Language System
 
Real-Time Big Data
Real-Time Big DataReal-Time Big Data
Real-Time Big Data
 
IES Triangle Principle
IES Triangle PrincipleIES Triangle Principle
IES Triangle Principle
 
Business Model Canvas
Business Model CanvasBusiness Model Canvas
Business Model Canvas
 
Transition management of product as platform
Transition management of  product as platformTransition management of  product as platform
Transition management of product as platform
 
My Storial
My StorialMy Storial
My Storial
 
Storial - Be Storyteller
Storial - Be StorytellerStorial - Be Storyteller
Storial - Be Storyteller
 
Mobile App Trends in 2016
Mobile App Trends in 2016Mobile App Trends in 2016
Mobile App Trends in 2016
 
Android career opportunities
Android career opportunitiesAndroid career opportunities
Android career opportunities
 
Loader
LoaderLoader
Loader
 
Android Support Package
Android Support PackageAndroid Support Package
Android Support Package
 
Fisikawan dan Dunia Kerja
Fisikawan dan Dunia KerjaFisikawan dan Dunia Kerja
Fisikawan dan Dunia Kerja
 
SAH2H PPT
SAH2H PPTSAH2H PPT
SAH2H PPT
 

Recently uploaded

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 

Recently uploaded (20)

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 

Why Functional Programming Is Important in Big Data Era

  • 1. Why Functional Programming Is Important In Big Data Era? handaru@tiket.com
  • 2. What Is Big Data?
  • 3. What Are The Steps? Act On Analyze Collect
  • 4. What We Need? D Distributed Computing Cluster ProcessData
  • 5. What We Need? • Spark as data processsing in cluster, originally written in Scala, which allows concise function syntax and interactive use • Mesos as cluster manager • ZooKeeper as highly reliable distributed coordinator • HDFS as distributed storage
  • 6. What We Need? • Pure functions • Atomic operations • Parallel patterns or skeletons • Lightweight algorithms The only thing that works for parallel programming is functional programming. --Carnegie Mello Professor Bob Harper
  • 7. What Is Functional Programming?
  • 8. FP Quick Tour In Scala • Basic transformations: var array = new Array[Int](10) var list = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) • Indexing: array(0) = 1 println(list(0)) • Anonymous functions: val multiplay = (x: Int, y: Int) => x * y val procedure = { x: Int => { println(“Hello, ”+x) println(x * 10) } }
  • 9. FP Quick Tour In Scala • Scala closure syntax: (x: Int) => x * 10 // full version x => x * 10 // type interference _ * 10 // underscore syntax x => { // body is block of code val y = 10 x * y }
  • 10. FP Quick Tour In Scala • Processing collections: var list = List(1, 2, 3, 4, 5, 6, 7, 8, 9) list.foreach(x => println(x)) list.map(_ * 10) list.filter(x => x % 2 == 0) list.reduce((x, y) => x + y) list.reduce(_ + _) def f(x: Int) = List(x-1, x x+1) list.map(x => f(x)) list.map(f(_)) list.flatMap(x => f(x)) list.map(x => f(x)).reduce(_ ++ _)
  • 11. Spark Quick Tour • Spark context: • Entry point to Spark functionality • In spark-shell, crated as sc • In standalone-spark-program, we must create it • Resilient distributed datasets (RDDs) : • A distributed memory abstraction • A logically centralized entity but physically partitioned across multiple machines inside a cluster based on some notion of key • Immutable • Automatically rebuilt on failure • Based on LRU (Least Recent Use) eviction algorithm
  • 14. Spark Quick Tour • Transformations: • Lazy operations to build RDDs from other RDDs • Narrow transformation (involves no data shuffling) : • map • flatMap • filter • Wide transformation (involves data shuffling): • sortByKey • reduceByKey • groupByKey • Actions: • Return a result or write it to storage • collect • count • take(n)
  • 16. Spark Quick Tour • Creating RDDs: val numbers = sc.parallelize(List(1, 2, 3, 4, 5)) val textFile = sc.textFile("hdfs://localhost/test/tobe.txt") val textFile = sc.textFile("hdfs://localhost/test/*.txt") • Basic transformations: val squares = numbers.map(x => x * x) val evens = squares.filter(_ < 9) val mapto = numbers.flatMap(x => 1 to x) val words = textFile.flatMap(_.split(" ")).cache() Base RDD Transformed RDD Turn a collection to RDD
  • 17. Spark Quick Tour • Basic actions: words.collect() words take(5) words count words.reduce(_ + _) words.filter(_ == “be").count() words.filter(_ == “or").count() words.saveAsTextFile("hdfs://localhost/test/result") The influence of cache
  • 18. Spark Quick Tour • Pair syntax: val pair = (a, b) • Accessing pair elements: pair._1 pair._2 • Key-value operations: val pets = sc.parallelize(List(("cat", 1), ("dog", 2), ("cat", 3))) pets.reduceByKey(_ + _) pets.groupByKey() pets.sortByKey()
  • 19. Hello World val logFile = "hdfs://localhost/test/tobe.txt" val logData = sc.textFile(logFile).cache() val wordCount = logData.flatMap(_.split(“ “)) .map((_, 1)) .reduceByKey(_ + _) wordCount.saveAsTextFile("hdfs://localhost/wordcount/result") sc.stop()
  • 21. Software Components Application Spark Context ZooKeeper Mesos Master Mesos Slave Spark Executor Mesos Slave Spark Executor HDFS/Other Storage
  • 22. Literature Parallel Programming With Spark Spark: Low latency, massively parallel processing framework