Big Data
Dipl. Inform.(FH) Jony Sugianto, M. Comp. Sc.
Hp:0838-98355491
WA:0812-13086659
Email:jonysugianto@gmail.com
Agenda
● What is Big Data?
● Analytics
● Big Data Platforms
● Questions
What is Big Data?
● The basic idea behind the phrase Big Data is that
everything we do increasingly leaves a digital trace (or
data), which we (and others) can use and analyse
● Big Data therefore refers to our ability to make use of the
ever-increasing volumes of data
● Big Data is not about the size of the data; it is about the
value within the data
Datafication of the world
● Activities
- Web Browser
- Credit Cards
- E-Commerce
● Conversations
- WhatsApp
- Email
- Twitter
● Photos/Videos
- Instagram
- YouTube
● Sensors
- GPS
● Etc...
Turning Big Data into Value
Datafication of our world
● Activity
● Conversation
● Sensors
● Photo/Video
● Etc...
Analysing Big Data
● Text Analytics
● Sentiment Analysis
● Movement Analytics
● Face/Voice Recognition
● Etc...
Value
Webdata
● Log Data (all users)
- Anonymous ID from Cookie Data
- Login ID (if it exists)
- Article ID
- Channel / Category
- Browser
- IP
- Etc...
● Registered User Data (10% of users)
- Login ID
- Name
- Age
- Gender
- Education
- Etc...
Valuable data
● User activeness
● User interest based on reading behaviour
● Personal profile for all users
Compute
Why use the UA 2 ?
User Activeness and User Interest
How to update the User Activeness?
New-UA = w_history * UA-so-far + w_current * UA-per-week
w_history = 0.75
w_current = 0.25
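The update rule above is a weighted average of the historical score and the current week's score. A minimal sketch, assuming activeness is stored as a plain `Double`; the object and method names and the sample scores are hypothetical, only the two weights come from the slide:

```scala
object UserActivenessUpdate {
  // Weights from the slide: history counts three times as much as the current week.
  val wHistory = 0.75
  val wCurrent = 0.25

  // New-UA = w_history * UA-so-far + w_current * UA-per-week
  def updateActiveness(uaSoFar: Double, uaThisWeek: Double): Double =
    wHistory * uaSoFar + wCurrent * uaThisWeek

  def main(args: Array[String]): Unit = {
    // Hypothetical weekly scores for one user; the first week seeds the history.
    val weeklyScores = List(8.0, 4.0, 0.0)
    val current = weeklyScores.tail.foldLeft(weeklyScores.head)(updateActiveness)
    println(current)
  }
}
```

With these weights a user who stops reading loses a quarter of the remaining score each week, so old activity fades gradually rather than disappearing at once.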
Assigning personal profile
Final data
How to define the similarity?
● Linear: |x1 - x2|
● Square: (x1 - x2)^2
● Exponential: 10^f(|x1 - x2|)
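The three options can be written as small functions over a single numeric feature. This is only a sketch: all names are hypothetical, and because the slide does not define `f`, the identity function stands in as a placeholder. As written, each expression grows with the gap between the two values, so a smaller result would mean a closer match.

```scala
object Distance {
  // Linear: |x1 - x2|
  def linear(x1: Double, x2: Double): Double = math.abs(x1 - x2)

  // Square: (x1 - x2)^2, which penalises large gaps more strongly
  def square(x1: Double, x2: Double): Double = {
    val d = x1 - x2
    d * d
  }

  // Exponential: 10^f(|x1 - x2|). The slide does not define f;
  // the identity default below is an assumption made for illustration.
  def exponential(x1: Double, x2: Double,
                  f: Double => Double = (d: Double) => d): Double =
    math.pow(10.0, f(math.abs(x1 - x2)))
}
```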
Complexity Analysis
● Assume 30,000,000 clicks a day
● A week: 210,000,000 clicks
● Size of a log entry: 1 KB
● Total size: 210,000,000,000 bytes = 210 GB
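The storage estimate can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), which is what the slide's total implies; the object and value names are hypothetical.

```scala
object LogVolume {
  val clicksPerDay = 30000000L
  val bytesPerEntry = 1000L // assumption: 1 KB = 1,000 bytes

  val clicksPerWeek: Long = clicksPerDay * 7
  val bytesPerWeek: Long = clicksPerWeek * bytesPerEntry
  val gigabytesPerWeek: Double = bytesPerWeek / 1e9
}
```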
Complexity Analysis
● All users: 10,000,000
● Login users: 1,000,000
● Comparisons per second per CPU: 1,000,000
● Total comparisons: 9,000,000 anonymous users × 1,000,000 login users = 9,000,000,000,000
● Total time: 9,000,000 seconds ≈ 104 days
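The running-time estimate follows if each user without a login is compared against every login user. That reading is an assumption, but it matches the slide's total. The sketch below reproduces the figures and adds a hypothetical `cpus` parameter to show how the time would shrink under ideal, perfectly parallel scaling, which a real cluster will not fully reach.

```scala
object MatchingCost {
  val allUsers = 10000000L
  val loginUsers = 1000000L
  val anonymousUsers: Long = allUsers - loginUsers // users without a login
  val comparisonsPerSecondPerCpu = 1000000L        // figure from the slide

  // Every anonymous user is compared with every login user (assumed reading).
  val totalComparisons: Long = anonymousUsers * loginUsers

  // Idealised running time, assuming the work splits evenly across CPUs.
  def seconds(cpus: Int): Long = totalComparisons / (comparisonsPerSecondPerCpu * cpus)
  def days(cpus: Int): Double = seconds(cpus) / 86400.0
}
```

On a single CPU this gives roughly 104 days, which is the motivation for the distributed platforms that follow.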
Big Data Platforms
What is the difference?
What is Hadoop?
● Hadoop:
an open-source framework that supports data-intensive
distributed applications, licensed under the Apache License 2.0
● Goals:
- Abstract and facilitate the storage and processing of large
and/or rapidly growing data sets
- High scalability and availability
- Use commodity hardware
- Fault tolerance
- Move computation rather than data
Hadoop Components
● Hadoop Distributed File System(HDFS)
A distributed file system that provides high-throughput access to
application data
● Hadoop YARN
A framework for job scheduling and cluster resource
management
● Hadoop MapReduce
A YARN-based system for parallel processing of large data sets
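To make the MapReduce model concrete, here is a sketch of its three phases (map, shuffle, reduce) as a word count over ordinary in-memory Scala collections. It does not use the Hadoop API and runs on one machine; all names are hypothetical and it only illustrates the data flow that Hadoop distributes across a cluster.

```scala
object MapReduceSketch {
  // Map phase: each input record is turned into (key, value) pairs.
  def mapPhase(line: String): Seq[(String, Int)] =
    line.split("\\s+").filter(_.nonEmpty).map(word => (word, 1)).toSeq

  // Shuffle phase: pairs are grouped by key between map and reduce.
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, kvs) => (word, kvs.map(_._2)) }

  // Reduce phase: all values for one key are combined into a single result.
  def reducePhase(word: String, counts: Seq[Int]): (String, Int) =
    (word, counts.sum)

  def wordCount(lines: Seq[String]): Map[String, Int] =
    shuffle(lines.flatMap(mapPhase)).map { case (w, c) => reducePhase(w, c) }
}
```

Calling `MapReduceSketch.wordCount(Seq("hi", "how are you", "hi"))` counts "hi" twice and every other word once.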
What is Hive?
● Hive is a data warehouse infrastructure built on top of
Hadoop
● Hive stores data in HDFS
● Hive compiles SQL queries into MapReduce jobs
Example Hive Script
What is Pig?
● Pig is a platform for analyzing large data sets that consists
of a high-level language for expressing data analysis
programs
● Pig generates and compiles a MapReduce program on the
fly
Example Pig Script
What is Spark?
● Fast, general-purpose cluster computing system
● Up to 10x (on disk) to 100x (in memory) faster than Hadoop
MapReduce
● Provides high-level APIs in
- Scala
- Java
- Python
● Can be deployed through Apache Mesos, Apache Hadoop
via YARN, or Spark's own cluster manager
Resilient Distributed Datasets
● Written in Scala
● Fundamental unit of data in Spark
● Distributed collection of objects
● Resilient: able to recompute missing partitions (node failure)
● Distributed: split across multiple partitions
● Dataset: can contain any type, Scala/Java/Python objects or
user-defined objects
● Operations
- Transformations (map, filter, groupBy, ...)
- Actions (count, collect, save, ...)
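In Spark, transformations are lazy and only describe a computation, while actions trigger it. The sketch below imitates that split with a standard Scala collection view instead of a real RDD, so it runs without a cluster. It is an analogy only: the names are hypothetical, and nothing here is distributed or fault tolerant.

```scala
object LazyPipeline {
  // Returns (elements evaluated before the action,
  //          elements evaluated after the action,
  //          the final result).
  def run(data: List[Int]): (Int, Int, List[Int]) = {
    var evaluated = 0 // counts calls to the mapping function

    // "Transformations": building the pipeline on a view does no work yet.
    val pipeline = data.view
      .map { x => evaluated += 1; x * 10 }
      .filter(_ > 20)
    val before = evaluated

    // "Action": asking for a concrete result runs the pipeline.
    val result = pipeline.toList
    (before, evaluated, result)
  }
}
```

Like an uncached RDD, the view keeps only the recipe, so forcing it a second time runs the mapping function again.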
Spark Example
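// Spark word count
import org.apache.spark.SparkContext

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally with a single worker thread
    val env = new SparkContext("local", "wordCount")
    val data = List("hi", "how are you", "hi")
    val dataSet = env.parallelize(data)
    // Split each line into words on whitespace
    val words = dataSet.flatMap(value => value.split("\\s+"))
    val mappedWords = words.map(value => (value, 1))
    // Sum the counts for each word
    val sum = mappedWords.reduceByKey(_ + _)
    // collect() returns an Array, so print its elements one by one
    sum.collect().foreach(println)
    env.stop()
  }
}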
What is Flink?
● Written in Java
● An open-source platform for distributed stream and batch
data processing
● Several APIs in Java/Scala/Python
- DataSet API: batch processing
- DataStream API: stream processing
- Table API: relational queries
Flink Example
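// Flink word count (batch DataSet API)
import org.apache.flink.api.scala._

object WordCount {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val data = List("hi", "how are you", "hi")
    val dataSet = env.fromCollection(data)
    // Split each line into words on whitespace
    val words = dataSet.flatMap(value => value.split("\\s+"))
    val mappedWords = words.map(value => (value, 1))
    // Group by the word (tuple field 0) and sum the counts (tuple field 1)
    val grouped = mappedWords.groupBy(0)
    val sum = grouped.sum(1)
    // collect() runs the job and returns the results to the client
    sum.collect().foreach(println)
  }
}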
Questions?
