Scalable Big Data Architecture
Big Data Big Problem?
PRESENTATION BY :
MOHAMMAD HASAN FARAZMAND
OCTOBER 2016
M.H.FARAZMAND@GMAIL.COM
We Will Review…
 Identifying Big Data Symptoms
 Size Matters
 Typical Business Use Case
 Understanding the Big Data Project’s Ecosystem
 Hadoop Distribution
 Data Acquisition
 Processing Language
 Machine Learning
 NoSQL Stores
 Foundation of long-term Big Data Architecture
 Architecture Overview
 Log Ingestion Application
 Learning Application
 Processing Engine
 Search Engine
This presentation has been prepared based on the first chapter of
Scalable Big Data Architecture
by
Bahaaldine Azarmi
Identifying Big Data Symptoms
 Data management is more complex than it has ever been.
 Big Data is everywhere, on everyone’s mind.
 When should I think about employing Big Data?
 Am I ready?
 What should I start with?
 Different needs:
 The volume of data you handle
 Variety of data structures
 Scalability issues
 Reducing the cost of data processing
Size Matters
 Two main areas: data size and volume
 Handle new data structures with flexible, schemaless technology
 Big data is also about extracting value-added information
 Near-real-time processing with a distributed architecture
 Execute complex queries with a NoSQL store
Typical Business Use Case
Analyzing application logs, web access logs, server logs, DB logs, and social networks:
 Customer Behavior Analytics: used on e-commerce websites.
 Sentiment Analysis: the image and reputation of companies as perceived across social networks.
 CRM Onboarding: combine online data sources with offline data sources for better and more accurate customer segmentation (profile-customized offers).
 Prediction: learning from data, the main big data trend of the past two years. For example, in the telecommunication industry:
1) Issue or event prediction based on router logs
2) Product catalog selection
3) Pricing depending on the user’s global behavior
Understanding the Big Data Project’s Ecosystem
Choosing …
 Hadoop distribution
 Distributed file system
 SQL-Like processing language
 Machine learning language
 Scheduler
 Message-oriented middleware
 NoSQL data store
 Data visualization
Hadoop Distribution
Two choices:
 Download the projects you need separately
 Use one of the most popular Hadoop distributions
Cloudera CDH
1. Impala: a real-time, parallelized, SQL-based engine that searches for data in HDFS and HBase.
2. Cloudera Manager: Cloudera’s console to manage and deploy Hadoop components.
3. Hue: a console for user interaction with data and scripts.
Hortonworks HDP
Hadoop Distributed File System
HDFS
Key features:
 Distribution
 High Availability
 Fault Tolerance
 Tuning
 Security
 Load Balancing
 High Throughput Access
Automatic replication across the cluster data nodes
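The replication idea can be sketched in a few lines of plain Python. This is only an illustration and not HDFS's actual placement policy (which also takes racks and node load into account): a file is split into fixed-size blocks and each block is assigned round-robin to several distinct data nodes. The function names `place_blocks` and `surviving_blocks`, the node names, and the sizes are hypothetical.

```python
from typing import Dict, List


def place_blocks(file_size: int, block_size: int, nodes: List[str],
                 replication: int = 3) -> Dict[int, List[str]]:
    """Assign each block of a file to `replication` distinct nodes (round-robin).

    Illustrative assumption: real HDFS placement is rack- and load-aware.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    n_blocks = -(-file_size // block_size)  # ceiling division
    placement: Dict[int, List[str]] = {}
    for block_id in range(n_blocks):
        # Start from a different node for each block so the load spreads out.
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def surviving_blocks(placement: Dict[int, List[str]], failed: str) -> int:
    """Count blocks that still have at least one replica if `failed` goes down."""
    return sum(
        1 for replicas in placement.values()
        if any(node != failed for node in replicas)
    )
```

With a replication factor above one, losing a single data node leaves every block readable, which is the fault-tolerance property listed among the key features.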
Data Acquisition
 Large log files, streamed data, ETL processing outcomes, online unstructured data, offline structured data, etc.
Apache Flume
 Reliable, highly available, simple, and flexible, with an intuitive programming model based on streaming data flows.
 Composed of “Sources”, “Channels”, and “Sinks”.
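The source, channel, and sink roles can be illustrated with a toy pipeline in plain Python. It does not use Flume or its configuration format; the class `Channel` and the functions `run_source` and `run_sink` are hypothetical names chosen for this sketch. The source pushes events into a bounded buffer (the channel), and the sink drains it in batches.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


class Channel:
    """A bounded in-memory buffer sitting between a source and a sink."""

    def __init__(self, capacity: int = 100) -> None:
        self._events: Deque[str] = deque()
        self.capacity = capacity

    def put(self, event: str) -> bool:
        if len(self._events) >= self.capacity:
            return False  # channel full; the source would have to retry later
        self._events.append(event)
        return True

    def take(self, max_events: int) -> List[str]:
        batch: List[str] = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch


def run_source(lines: Iterable[str], channel: Channel) -> int:
    """Source: push each non-empty line into the channel; return accepted count."""
    accepted = 0
    for line in lines:
        line = line.strip()
        if line and channel.put(line):
            accepted += 1
    return accepted


def run_sink(channel: Channel, write: Callable[[List[str]], None],
             batch_size: int = 2) -> int:
    """Sink: drain the channel in batches and hand each batch to `write`."""
    delivered = 0
    while True:
        batch = channel.take(batch_size)
        if not batch:
            return delivered
        write(batch)
        delivered += len(batch)
```

Decoupling the two sides through a buffer is what lets the source keep accepting data while the sink writes at its own pace.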
Apache Sqoop
 Transfers bulk data between structured data stores and HDFS.
 Imports data from an external relational database into HDFS, HBase, or Hive.
 Exports data from a Hadoop cluster to a relational database or data warehouse.
Processing Language
 MapReduce was the main processing framework in the first generation of Hadoop clusters.
 It groups sibling data together (Map) and then aggregates the data according to a specified aggregation operation (Reduce).
 Now that YARN (Yet Another Resource Negotiator) has been implemented, other processing models are allowed and MapReduce is just one among them.
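The Map and Reduce steps can be sketched as a single-process word count in plain Python. This is a teaching sketch, not Hadoop's API: the function names are hypothetical, and the explicit `shuffle` step stands in for the grouping that a real cluster performs between the two phases.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_phase(records: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (key, 1) pair for every word in every record."""
    pairs: List[Tuple[str, int]] = []
    for record in records:
        for word in record.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values that share the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]],
                 aggregate: Callable[[List[int]], int] = sum) -> Dict[str, int]:
    """Reduce: apply the chosen aggregation to each group of values."""
    return {key: aggregate(values) for key, values in groups.items()}


def word_count(records: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(records)))
```

Swapping `sum` for another aggregation (for example `max` or `len`) changes the Reduce operation without touching the Map step.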
Batch Processing with Hive
 Hive brings users the simplicity and power of querying data from HDFS in a SQL-like way.
 Hive is not a real-time or near-real-time processing language. It is meant for long-running processing jobs with low priority.
 The main drawback of using another language rather than native MapReduce is performance.
Stream Processing with Spark Streaming
 An extension of Spark.
 Leverages Spark’s distributed data processing framework and treats streaming computation as a series of micro-batch computations on small intervals.
 Spark Streaming lets you write a processing job as you would for batch processing, in Java, Scala, or Python.
 The foundation of a strong, fault-tolerant, and high-performance system.
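The idea of writing a streaming job the same way as a batch job can be sketched without Spark: events are bucketed into small time intervals, and the same batch function runs on each bucket. The code below is plain Python and does not use the Spark Streaming API; `micro_batches`, `count_per_batch`, and the `(timestamp, payload)` event shape are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, payload)


def micro_batches(events: Iterable[Event], interval: float) -> Dict[int, List[str]]:
    """Bucket events into consecutive windows of `interval` seconds."""
    batches: Dict[int, List[str]] = {}
    for timestamp, payload in events:
        batches.setdefault(int(timestamp // interval), []).append(payload)
    return batches


def count_per_batch(events: Iterable[Event],
                    interval: float) -> List[Tuple[int, Counter]]:
    """Run the same batch logic (a count per payload) on each micro-batch, in time order."""
    batches = micro_batches(events, interval)
    return [(index, Counter(batches[index])) for index in sorted(batches)]
```

The per-batch function here is a simple count, but any batch computation could be substituted without changing how the stream is cut into intervals.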
Message-Oriented Middleware
with Apache Kafka
 A persistent, high-throughput messaging system.
 Kafka acts as a pivot point in our architecture, mainly to receive data and push it into Spark Streaming.
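Persistent messaging can be pictured as an append-only log in which messages are kept after being read and each consumer group remembers its own position. The sketch below is an in-memory toy in plain Python, not the Kafka client API; `TopicLog`, `produce`, and `consume` are hypothetical names.

```python
from typing import Dict, List


class TopicLog:
    """An append-only message log; each consumer group tracks its own offset."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._offsets: Dict[str, int] = {}

    def produce(self, message: str) -> int:
        """Append a message and return its offset in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def consume(self, group: str, max_messages: int = 10) -> List[str]:
        """Return the next unread messages for `group` and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._messages[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch
```

Because reading does not delete anything, a second consumer group (for example an archiver next to the streaming job) can replay the same messages from the beginning.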
Machine Learning
 Spark MLlib enables machine learning for Spark.
 Composed of various algorithms that go from basic statistics, logistic
regression, k-means clustering, and Gaussian mixtures to singular
value decomposition and multinomial naive Bayes.
 Train on your data and build prediction models with a few lines of code.
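To give a feel for one of the algorithms listed, here is k-means clustering reduced to one dimension in plain Python. It does not use Spark MLlib; the functions `kmeans_1d` and `predict` and the caller-supplied starting centroids are assumptions of this sketch, chosen so the result is deterministic.

```python
from typing import List, Sequence


def kmeans_1d(points: Sequence[float], centroids: Sequence[float],
              iterations: int = 10) -> List[float]:
    """Lloyd's algorithm in one dimension with caller-supplied starting centroids."""
    centers = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters: List[List[float]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]  # an empty cluster keeps its center
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers


def predict(point: float, centers: Sequence[float]) -> int:
    """Return the index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: abs(point - centers[i]))
```

Training produces the centers; prediction is then a nearest-center lookup for each new value.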
NoSQL Stores
 Fundamental pieces of the data architecture.
 Scalability and Resiliency, and thus High Availability.
 Ingest a very large amount of data.
Couchbase
 Document-oriented NoSQL database that is easily scalable, provides a flexible model, and delivers consistently high performance.
ElasticSearch
 Scalable, distributed indexing engine with search features.
 Based on Apache Lucene; enables real-time data analytics and full-text search in your architecture.
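Full-text search engines of this kind are built around an inverted index, which maps each term to the documents containing it. The sketch below shows that data structure in plain Python; it is not the Elasticsearch API, and `tokenize`, `build_index`, and `search` are hypothetical helper names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return sorted ids of the documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)
```

A query is answered by intersecting a few small sets instead of scanning every document, which is why lookups stay fast as the collection grows.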
ELK platform
 ElasticSearch is part of the ELK platform.
 ElasticSearch + Logstash + Kibana
 Together they provide an end-to-end platform for collecting, storing, and visualizing data.
 Logstash lets you collect data from many kinds of sources.
 ElasticSearch indexes the data in a distributed, scalable, and resilient system.
 Kibana is a customizable user interface in which you can build dashboards, from simple to complex, to explore and visualize data indexed by ElasticSearch.
Foundation of a Long-Term
Big Data Architecture
Log Ingestion Application
 Consume application logs such as web access logs.
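As a minimal sketch of what consuming a web access log involves, the code below parses one line in the Common Log Format into a structured event. It assumes that format for the input; the pattern `LOG_PATTERN`, the function `parse_access_log`, and the field names are illustrative choices, not part of the architecture described here.

```python
import re
from typing import Dict, Optional

# Common Log Format, for example:
# 127.0.0.1 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_log(line: str) -> Optional[Dict[str, object]]:
    """Turn one access-log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    event: Dict[str, object] = dict(match.groupdict())
    event["status"] = int(match.group("status"))
    size = match.group("size")
    event["size"] = 0 if size == "-" else int(size)  # "-" means no body was sent
    return event
```

Lines that do not match are returned as `None` rather than raising, so a malformed entry does not stop the ingestion of the rest of the file.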
Learning Application
 Receives a stream of data and builds predictions to optimize our recommendation engine.
Processing Engine
 Heart of the architecture
Search Engine
 The search engine leverages the data processed by the processing engine and exposes a dedicated RESTful API that will be used for analytic purposes.
Summary
 We have seen all the components that make up our architecture.
Good Luck
