SlideShare una empresa de Scribd logo
1 de 10
Descargar para leer sin conexión


Final Project


Apache Flume




Rapheephan Thongkham-Uan (Nancy)

cscie90 Cloud Computing
Harvard University Extension School
Prof. Zoran B. Djordjević

@TakeshiDemonkey

1
What is Apache Flume?
▪ Flume is distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of log data from
many different sources to a centralised data store. (http://
flume.apache.org/FlumeUserGuide.html)

!
!
!
!
!
!
!
▪ Currently available versions are 0.9.x and 1.x
▪ I want to focus on Flume use cases in manufacturing.

@Rapheephan

2
Applying Flume to Manufacturing Process
▪ In the factory, there are many machines used in the production.

!
!
!
!
!
▪ If a machine produces 1 log data file when 1 lot of product finishes
processing. In one day, there will be a big amount of log data stored
in the server.
▪ For the quality control and the production control improvement,
analysing these log files in a real time is our objective.
▪ First, we need to collect these log data files from the production
lines into the HDFS, then pass them through the analysis process.

@Rapheephan

3
Multi-agent flow image in the production system

AGENT 1

consolidation

AGENT 2

AGENT 4
HDFS

AGENT 3

@Rapheephan

4
My Sample
agent 1
CHANNEL

SOURCE

SINK

HDFS

▪ My system
▪ Java Runtime Environment (Java1.6.0_31)
▪ Cloudera's Distribution Including Apache Hadoop (CDH4.3)

▪ Working steps
1. Install Apache Flume on the Host machine
(Flume installation guide for CDH4: http://www.cloudera.com/content/cloudera-content/
cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_12.html)

2. Create 2 log generation java applications for machine1 and machine2
3. Configure Flume agent
4. Start Flume agent and test the system

@Rapheephan

5
Prepare the log generation application
▪ Create 2 Virtual Machines for generating machine1’s and machine2’s
log data.
▪ Create a simple socket java program to producing log events to
agent’s source with specific port (11111)

!
!
!
!
!
!
!
▪ Export it as an executable JAR file, and move it to the virtual
machine1
▪ Copy and move the other to the virtual machine2
@Rapheephan

6
Configuration Flume-ng agent on Host
▪ We have to configure all sink, channel, and source in the flow. My
agent name is hdfs-agent
▪ First, name the components in the agent.

! hdfs-agent.sources = log-collect
= memoryChannel
! hdfs-agent.channelshdfs-write
hdfs-agent.sinks =
!
▪ Next, define the source’s properties as follow

! hdfs-agent.sources.log-collect.type = netcat
! hdfs-agent.sources.log-collect.bind = 133.196.211.209
hdfs-agent.sources.log-collect.port = 11111
!!
hdfs-agent.sources.log-collect.channels = memoryChannel
!
▪ My source type is netcat-like source, that listens on the port ‘11111’
▪ Don’t forget to define the channel used by the source.
@Rapheephan

7
Configuration Flume-ng agent on Host (2)
▪ We want to collect the log data and write to the ‘testFlume’
directory on the HDFS cluster. Therefore, the sink should be defined
as follow.

! hdfs-agent.sinks.hdfs-write.type = hdfs
! hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://<namenode>/user/
<myusername>/testflume
= Text
! hdfs-agent.sinks.hdfs-write.hdfs.writeFormatDataStream
hdfs-agent.sinks.hdfs-write.hdfs.fileType =
!!
hdfs-agent.sinks.hdfs-write.channel = memoryChannel
!
▪ Don’t forget to specify the channel used by the sink.
▪ Finally, configure the channel

! hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 1000
!
▪ The channel will store the log data in-memory with the maximum
1000 events.
@Rapheephan

8
Start the Flume agent and get result
▪ My configuration file name is ‘flume.conf’, and my agent name is
‘hdfs-agent’.
▪ Start the Flume agent using the following command.
$! flume-ng agent --conf-file flume.conf --name hdfs-agent

▪ Execute the genLog.jar on both machines
▪ On Flume master, you will be able to see something like this

!
!
!
!

13/12/17 14:36:13 INFO hdfs.BucketWriter: Creating hdfs://<namenode>:
8020/user/<my userid>/testflume/FlumeData.1387258572230.tmp
13/12/17 14:36:19 INFO hdfs.BucketWriter: Renaming hdfs://cmccldULL6400.toshiba.co.jp:8020/user/g0092010/testflume/FlumeData.
1387258572230.tmp to hdfs://<namenode>:8020/user/<my userid>/testflume/
FlumeData.1387258572230

▪ Verify that the log data has stored as events on the HDFS
g0092010@cmc-cldULL6400:~$ hadoop fs -cat
2013-12-17 14:32:19: This is a sample log
2013-12-17 14:32:24: This is a sample log
2013-12-17 14:32:27: This is a sample log
2013-12-17 14:32:29: This is a sample log
@Rapheephan

testflume/*30
file from machine
file from machine
file from machine
file from machine

1.
1.
2.
1.

9
Next steps
▪ Analyse the log data and Visualise it in the (near) real time.

!
!
!
!
!
!
!
!
!

AGENT 1

MapReduce
Hive

AGENT 2

AGENT 4

Mahout

Visualisation
tools

HDFS

Impala

AGENT 3

▪ Improving throughputs of the system.
▪ Analysing and Predicting the future trend.
▪ etc.
@Rapheephan

10

Más contenido relacionado

La actualidad más candente

ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016Jayesh Thakrar
 
Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11Cloudera, Inc.
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudDataWorks Summit
 
Session 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperSession 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperAnandMHadoop
 
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEkawamuray
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEkawamuray
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera FieldHBaseCon
 
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLHBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLCloudera, Inc.
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...kawamuray
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...Yahoo Developer Network
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...Cloudera, Inc.
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureDataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkStreamNative
 
HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketHBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketCloudera, Inc.
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...JinfengHuang3
 

La actualidad más candente (20)

ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016
 
Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Flume intro-100715
Flume intro-100715Flume intro-100715
Flume intro-100715
 
Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11Flume @ Austin HUG 2/17/11
Flume @ Austin HUG 2/17/11
 
Large scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloudLarge scale near real-time log indexing with Flume and SolrCloud
Large scale near real-time log indexing with Flume and SolrCloud
 
Session 23 - Kafka and Zookeeper
Session 23 - Kafka and ZookeeperSession 23 - Kafka and Zookeeper
Session 23 - Kafka and Zookeeper
 
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
 
ApacheCon-HBase-2016
ApacheCon-HBase-2016ApacheCon-HBase-2016
ApacheCon-HBase-2016
 
Multitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINEMultitenancy: Kafka clusters for everyone at LINE
Multitenancy: Kafka clusters for everyone at LINE
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera Field
 
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQLHBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
HBaseCon 2013: How (and Why) Phoenix Puts the SQL Back into NoSQL
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and Future
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Query Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache FlinkQuery Pulsar Streams using Apache Flink
Query Pulsar Streams using Apache Flink
 
HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketHBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...
 

Destacado

Analyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveAnalyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveIMC Institute
 
Flume in 10minutes
Flume in 10minutesFlume in 10minutes
Flume in 10minutesdwmclary
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil DubeySwapnil Dubey
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexasArvind Prabhakar
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Cloudera, Inc.
 
Centralized logging with Flume
Centralized logging with FlumeCentralized logging with Flume
Centralized logging with FlumeRatnakar Pawar
 
Enabling Microservices @Orbitz - DockerCon 2015
Enabling Microservices @Orbitz - DockerCon 2015Enabling Microservices @Orbitz - DockerCon 2015
Enabling Microservices @Orbitz - DockerCon 2015Steve Hoffman
 
DataSift Update - May 3rd 2011 - Devnest
DataSift Update - May 3rd 2011 - DevnestDataSift Update - May 3rd 2011 - Devnest
DataSift Update - May 3rd 2011 - DevnestOllie Parsley
 
Presentation sdimi risks, challenges and benefits of social media 2011
Presentation sdimi risks, challenges and benefits of social media 2011Presentation sdimi risks, challenges and benefits of social media 2011
Presentation sdimi risks, challenges and benefits of social media 2011ZoeMM
 
The Asset Consultancy_PPT _final
The Asset Consultancy_PPT _finalThe Asset Consultancy_PPT _final
The Asset Consultancy_PPT _finalRushin Naik
 
Apache Flume NG
Apache Flume NGApache Flume NG
Apache Flume NGhuguk
 
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester Hortonworks
 
Tweet alert - semantic analysis in social networks for citizen opinion mining
Tweet alert - semantic analysis in social networks for citizen opinion miningTweet alert - semantic analysis in social networks for citizen opinion mining
Tweet alert - semantic analysis in social networks for citizen opinion miningSngular Meaning
 
Demo or Die: Where advertising meets product design
Demo or Die: Where advertising meets product designDemo or Die: Where advertising meets product design
Demo or Die: Where advertising meets product designChristine Outram
 
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cPart 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cMark Rittman
 

Destacado (20)

Analyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and HiveAnalyse Tweets using Flume, Hadoop and Hive
Analyse Tweets using Flume, Hadoop and Hive
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
'Flume' Case Study
'Flume' Case Study'Flume' Case Study
'Flume' Case Study
 
Flume in 10minutes
Flume in 10minutesFlume in 10minutes
Flume in 10minutes
 
Flume Case Study
Flume Case StudyFlume Case Study
Flume Case Study
 
Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil Dubey
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
 
Centralized logging with Flume
Centralized logging with FlumeCentralized logging with Flume
Centralized logging with Flume
 
Enabling Microservices @Orbitz - DockerCon 2015
Enabling Microservices @Orbitz - DockerCon 2015Enabling Microservices @Orbitz - DockerCon 2015
Enabling Microservices @Orbitz - DockerCon 2015
 
Impalaチューニングポイントベストプラクティス
ImpalaチューニングポイントベストプラクティスImpalaチューニングポイントベストプラクティス
Impalaチューニングポイントベストプラクティス
 
Hadoop ~Yahoo! JAPANの活用について~
Hadoop ~Yahoo! JAPANの活用について~Hadoop ~Yahoo! JAPANの活用について~
Hadoop ~Yahoo! JAPANの活用について~
 
DataSift Update - May 3rd 2011 - Devnest
DataSift Update - May 3rd 2011 - DevnestDataSift Update - May 3rd 2011 - Devnest
DataSift Update - May 3rd 2011 - Devnest
 
Presentation sdimi risks, challenges and benefits of social media 2011
Presentation sdimi risks, challenges and benefits of social media 2011Presentation sdimi risks, challenges and benefits of social media 2011
Presentation sdimi risks, challenges and benefits of social media 2011
 
The Asset Consultancy_PPT _final
The Asset Consultancy_PPT _finalThe Asset Consultancy_PPT _final
The Asset Consultancy_PPT _final
 
Apache Flume NG
Apache Flume NGApache Flume NG
Apache Flume NG
 
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester
Demystify Big Data Breakfast Briefing: Martha Bennett, Forrester
 
Tweet alert - semantic analysis in social networks for citizen opinion mining
Tweet alert - semantic analysis in social networks for citizen opinion miningTweet alert - semantic analysis in social networks for citizen opinion mining
Tweet alert - semantic analysis in social networks for citizen opinion mining
 
Demo or Die: Where advertising meets product design
Demo or Die: Where advertising meets product designDemo or Die: Where advertising meets product design
Demo or Die: Where advertising meets product design
 
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cPart 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
 

Similar a Apache Flume and its use case in Manufacturing

Your Inner Sysadmin - MidwestPHP 2015
Your Inner Sysadmin - MidwestPHP 2015Your Inner Sysadmin - MidwestPHP 2015
Your Inner Sysadmin - MidwestPHP 2015Chris Tankersley
 
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)Chris Tankersley
 
Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - FlumeAnandMHadoop
 
Php through the eyes of a hoster confoo
Php through the eyes of a hoster confooPhp through the eyes of a hoster confoo
Php through the eyes of a hoster confooCombell NV
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기NAVER D2
 
Why we choose Symfony2
Why we choose Symfony2Why we choose Symfony2
Why we choose Symfony2Merixstudio
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDataWorks Summit
 
Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)Amin Astaneh
 
Setting up a big data platform at kelkoo
Setting up a big data platform at kelkooSetting up a big data platform at kelkoo
Setting up a big data platform at kelkooFabrice dos Santos
 
Your Inner Sysadmin - LonestarPHP 2015
Your Inner Sysadmin - LonestarPHP 2015Your Inner Sysadmin - LonestarPHP 2015
Your Inner Sysadmin - LonestarPHP 2015Chris Tankersley
 
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop1012014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop101Adam Muise
 
Installing and Getting Started with Alfresco
Installing and Getting Started with AlfrescoInstalling and Getting Started with Alfresco
Installing and Getting Started with AlfrescoWildan Maulana
 
Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0venkatakrishnan k
 
lamp technology
lamp technologylamp technology
lamp technologyDeepa
 
Deepa ppt about lamp technology
Deepa ppt about lamp technologyDeepa ppt about lamp technology
Deepa ppt about lamp technologyDeepa
 

Similar a Apache Flume and its use case in Manufacturing (20)

Your Inner Sysadmin - MidwestPHP 2015
Your Inner Sysadmin - MidwestPHP 2015Your Inner Sysadmin - MidwestPHP 2015
Your Inner Sysadmin - MidwestPHP 2015
 
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
 
Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - Flume
 
Php through the eyes of a hoster confoo
Php through the eyes of a hoster confooPhp through the eyes of a hoster confoo
Php through the eyes of a hoster confoo
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Running php on nginx
Running php on nginxRunning php on nginx
Running php on nginx
 
Why we choose Symfony2
Why we choose Symfony2Why we choose Symfony2
Why we choose Symfony2
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
 
Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)Linux Server Deep Dives (DrupalCon Amsterdam)
Linux Server Deep Dives (DrupalCon Amsterdam)
 
Setting up a big data platform at kelkoo
Setting up a big data platform at kelkooSetting up a big data platform at kelkoo
Setting up a big data platform at kelkoo
 
Your Inner Sysadmin - LonestarPHP 2015
Your Inner Sysadmin - LonestarPHP 2015Your Inner Sysadmin - LonestarPHP 2015
Your Inner Sysadmin - LonestarPHP 2015
 
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop1012014 feb 24_big_datacongress_hadoopsession1_hadoop101
2014 feb 24_big_datacongress_hadoopsession1_hadoop101
 
Installing and Getting Started with Alfresco
Installing and Getting Started with AlfrescoInstalling and Getting Started with Alfresco
Installing and Getting Started with Alfresco
 
Its3 Drupal
Its3 DrupalIts3 Drupal
Its3 Drupal
 
Its3 Drupal
Its3 DrupalIts3 Drupal
Its3 Drupal
 
Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0
 
Linux presentation
Linux presentationLinux presentation
Linux presentation
 
lamp technology
lamp technologylamp technology
lamp technology
 
Deepa ppt about lamp technology
Deepa ppt about lamp technologyDeepa ppt about lamp technology
Deepa ppt about lamp technology
 

Último

Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxJanEmmanBrigoli
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEaurabinda banchhor
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 

Último (20)

Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSE
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 

Apache Flume and its use case in Manufacturing

  • 1. 
 Final Project
 Apache Flume
 
 Rapheephan Thongkham-Uan (Nancy) cscie90 Cloud Computing Harvard University Extension School Prof. Zoran B. Djordjević @TakeshiDemonkey 1
  • 2. What is Apache Flume? ▪ Flume is distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralised data store. (http:// flume.apache.org/FlumeUserGuide.html) ! ! ! ! ! ! ! ▪ Currently available versions are 0.9.x and 1.x ▪ I want to focus on Flume use cases in manufacturing. @Rapheephan 2
  • 3. Applying Flume to Manufacturing Process ▪ In the factory, there are many machines used in the production. ! ! ! ! ! ▪ If a machine produces 1 log data file when 1 lot of product finishes processing. In one day, there will be a big amount of log data stored in the server. ▪ For the quality control and the production control improvement, analysing these log files in a real time is our objective. ▪ First, we need to collect these log data files from the production lines into the HDFS, then pass them through the analysis process. @Rapheephan 3
  • 4. Multi-agent flow image in the production system AGENT 1 consolidation AGENT 2 AGENT 4 HDFS AGENT 3 @Rapheephan 4
  • 5. My Sample agent 1 CHANNEL SOURCE SINK HDFS ▪ My system ▪ Java Runtime Environment (Java1.6.0_31) ▪ Cloudera's Distribution Including Apache Hadoop (CDH4.3) ▪ Working steps 1. Install Apache Flume on the Host machine (Flume installation guide for CDH4: http://www.cloudera.com/content/cloudera-content/ cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_12.html) 2. Create 2 log generation java applications for machine1 and machine2 3. Configure Flume agent 4. Start Flume agent and test the system @Rapheephan 5
  • 6. Prepare the log generation application ▪ Create 2 Virtual Machines for generating machine1’s and machine2’s log data. ▪ Create a simple socket java program to producing log events to agent’s source with specific port (11111) ! ! ! ! ! ! ! ▪ Export it as an executable JAR file, and move it to the virtual machine1 ▪ Copy and move the other to the virtual machine2 @Rapheephan 6
  • 7. Configuration Flume-ng agent on Host ▪ We have to configure all sink, channel, and source in the flow. My agent name is hdfs-agent ▪ First, name the components in the agent. ! hdfs-agent.sources = log-collect = memoryChannel ! hdfs-agent.channelshdfs-write hdfs-agent.sinks = ! ▪ Next, define the source’s properties as follow ! hdfs-agent.sources.log-collect.type = netcat ! hdfs-agent.sources.log-collect.bind = 133.196.211.209 hdfs-agent.sources.log-collect.port = 11111 !! hdfs-agent.sources.log-collect.channels = memoryChannel ! ▪ My source type is netcat-like source, that listens on the port ‘11111’ ▪ Don’t forget to define the channel used by the source. @Rapheephan 7
  • 8. Configuration Flume-ng agent on Host (2) ▪ We want to collect the log data and write to the ‘testFlume’ directory on the HDFS cluster. Therefore, the sink should be defined as follow. ! hdfs-agent.sinks.hdfs-write.type = hdfs ! hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://<namenode>/user/ <myusername>/testflume = Text ! hdfs-agent.sinks.hdfs-write.hdfs.writeFormatDataStream hdfs-agent.sinks.hdfs-write.hdfs.fileType = !! hdfs-agent.sinks.hdfs-write.channel = memoryChannel ! ▪ Don’t forget to specify the channel used by the sink. ▪ Finally, configure the channel ! hdfs-agent.channels.memoryChannel.type = memory hdfs-agent.channels.memoryChannel.capacity = 1000 ! ▪ The channel will store the log data in-memory with the maximum 1000 events. @Rapheephan 8
  • 9. Start the Flume agent and get result ▪ My configuration file name is ‘flume.conf’, and my agent name is ‘hdfs-agent’. ▪ Start the Flume agent using the following command. $! flume-ng agent --conf-file flume.conf --name hdfs-agent ▪ Execute the genLog.jar on both machines ▪ On Flume master, you will be able to see something like this ! ! ! ! 13/12/17 14:36:13 INFO hdfs.BucketWriter: Creating hdfs://<namenode>: 8020/user/<my userid>/testflume/FlumeData.1387258572230.tmp 13/12/17 14:36:19 INFO hdfs.BucketWriter: Renaming hdfs://cmccldULL6400.toshiba.co.jp:8020/user/g0092010/testflume/FlumeData. 1387258572230.tmp to hdfs://<namenode>:8020/user/<my userid>/testflume/ FlumeData.1387258572230 ▪ Verify that the log data has stored as events on the HDFS g0092010@cmc-cldULL6400:~$ hadoop fs -cat 2013-12-17 14:32:19: This is a sample log 2013-12-17 14:32:24: This is a sample log 2013-12-17 14:32:27: This is a sample log 2013-12-17 14:32:29: This is a sample log @Rapheephan testflume/*30 file from machine file from machine file from machine file from machine 1. 1. 2. 1. 9
  • 10. Next steps ▪ Analyse the log data and Visualise it in the (near) real time. ! ! ! ! ! ! ! ! ! AGENT 1 MapReduce Hive AGENT 2 AGENT 4 Mahout Visualisation tools HDFS Impala AGENT 3 ▪ Improving throughputs of the system. ▪ Analysing and Predicting the future trend. ▪ etc. @Rapheephan 10