SlideShare una empresa de Scribd logo
1 de 20
Log Ingestion on Big Data Platform
with Flume
Agenda
•Why Centralized Logging on Hadoop
•Flume Introduction
•Simple Flume Logging
•Centralized and Scalable Flume Logging
•Leveraging log data
•Example
2
•There are tons of logs generated from Applications
•These logs are stored on local disks on individual nodes.
•Log files containing records are required to archive in near real time to
create some value.
•Enable analytics on logs for diagnosing issues on Hadoop platform.
3
Use Case: Centralized Logging Requirements
Centralized Log Management & Analytics : Goals
•Have a central repository to store large volume of machine generated data
from all sources and tiers of applications and infrastructures
•Feed log data from multiple sources to the common repository in a non-
intrusive way and in near real time
•Enable analytics on log data using standard analytical solutions
•Provide capability to search and correlate information across different sources
for quick problem isolation and resolution.
•Improve operational intelligence and
•Be centralized without redundancy of multiple agents on all hosts for log
collections
4
Solution Components for centralized logging
Flume
•Flume is a streaming service, distributed as part of Apache Hadoop ecosystem, and
primarily a reliable way of getting stream and log data into HDFS. Its pluggable
architecture supports any consumer. A correctly configured pipeline of Flume is
guaranteed to not lose data, provided durable channels are used.
•Each Flume agent consists of three major components: sources, channels, and sinks.
Sources
An active component that receives events from a specialized location or mechanism
and places it on one or Channels.
Different Source types:
Specialized sources for integrating with well-known systems. Example:
Syslog, Netcat
AvroSource NetcatSource SpoolDirectorySource
ExecSource JMSSource SyslogTcpSource SyslogUDPSource
5
Channels
A passive component that buffers the incoming events until they are drained by
Sinks.
Different Channels offer different levels of persistence:
Memory Channel: volatile
Data lost if JVM or machine restarts
File Channel: backed by WAL implementation Data not lost unless the disk dies.
Eventually, when the agent comes back data can be accessed.
Channels are fully transactional
Provide weak ordering guarantees (in case of failures / rollbacks )
Can work with any number of Sources and Sinks.
Handles upstream bursts
Upstream or downstream buffers
Sinks
An active component that removes events from a Channel and transmits them
to their next hop destination.
Different types of Sinks:
Terminal sinks that deposit events to their final destination. For example:
HDFS, HBase, Kite-Solr, Elastic Search
Sinks support serialization to user’s preferred formats.
HDFS sink supports time-based and arbitrary bucketing of data while writing to
HDFS.
IPC sink for Agent-to-Agent communication: Avro
Require exactly one channel to function
Flume Multi Tier Setup
[Client]+ Agent [ Agent]* Destination_______
Interceptors
Interceptor
Flume has the capability to modify/drop events in-flight. This is done with the help of
interceptors. An interceptor can modify or even drop events based on any criteria
chosen by the developer of the interceptor.
Built-in Interceptors allow adding headers such as timestamps, hostname, static
markers etc.
Custom interceptors can introspect event payload to create specific headers where
necessary
Configuration Example: Flume Agents
● Hierarchical
● Flow of components
11
Contextual Routing with Interceptors
Achieved using Interceptors and Channel Selectors
Terminal Sinks can directly use Headers to make destination selections
HDFS Sink can use headers values to create dynamic path for files that the event
will be added to.
# channel selector configuration
agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
agent_foo.sources.avro-AppSrv-source1.selector.header = State
agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.optional.CA = mem-channel-1 file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1
Flume Client
An entity that generates events and sends them to one or more Agents.
• Example
• Flume/Syslog log4j Appender
• Custom Client using Client SDK (org.apache.flume.api)
• Embedded Agent – An agent embedded within your application
• Decouples Flume from the system where event data is consumed from
• Not needed in all cases
Client Applications
Configuration Example: Log4j
•log4j.appender.syslog=org.apache.log4j.net.SyslogAppender
•log4j.appender.syslog.Facility=LOCAL3
•log4j.appender.syslog.FacilityPrinting=false
•log4j.appender.syslog.Header=true
•log4j.appender.syslog.SyslogHost=FlumedestinationHost:4444
•log4j.appender.syslog.layout=org.apache.log4j.PatternLayout
•log4j.appender.syslog.layout.ConversionPattern= TYPE: DUMMY %p: (%F:%L) %x %m %n
14
Below is log4 configuration snippet , To enable java applications to send events
For Non log4j Applications
Rsyslog
•Rsyslog is an open-source software utility used on UNIX and Unix-like computer systems for
forwarding log messages in an IP network. It implements the basic syslog protocol, extends it
with content-based filtering, rich filtering capabilities, flexible configuration options and adds
features such as using TCP for transport.
● Used in most of the Linux distros as standard logger
● Has multiple facilities for application use local0-local7 (avoid local7)
● Can poll any file on system and send new events over the network to syslog destinations
● service rsyslog restart
$ModLoad imfile
$InputFileName /var/log/NEWAPP/NEWAPP.log
$InputFileTag TYPE:_NEWAPP
$WorkDirectory /var/spool/rsyslog/NEWAPP
$InputFileStateFile NEWAPP-log
$InputFileFacility local7
$InputFilePersistStateInterval 10
$InputFileSeverity info
$RepeatedMsgReduction off
$InputRunFileMonitor
local7.* @@flumehost:4444
Solution: Near Real Time Log Archive to Hadoop Platform
16
Event Flow :: Simple Flume Logging
Solution: Near Real Time Log Archive to Hadoop Platform
17
•Less centralized , avoiding single point of failure.
•In case collector fails , events are still not lost.
•Scope for further scalability , with minimum configuration.
Configuration Example: Flume Multi tier Config
●Flume Listener Agents
■ This agent gathers events from multiple applications.
■ can also perform event inspections using interceptors in this tier.
■ Each event is analyzed and sent forward with appropriate header(only) updates so next agent
can make sense of it.
■ We can use filechannel or any other durable channel here.
■ Events aggregated for next tier
●Flume Writer Tier
■ Minimum connections to HDFS
■ This agent gets events from aggregator and reads headers.
■ According to header events are sent to relevant location on HDFS.
18
DDL for creating a Hive table with log data,
CREATE TABLE logData_H2 (
Ltype STRING,
event_time STRING,
porder STRING,
SEVERITY STRING,
SCLASS STRING,
PHO STRING ,
MESG STRING
) ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/logmgmt/_DUMMY/raz-XPS14/150703/';
Thank you

Más contenido relacionado

La actualidad más candente

Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil DubeySwapnil Dubey
 
Flume and Hadoop performance insights
Flume and Hadoop performance insightsFlume and Hadoop performance insights
Flume and Hadoop performance insightsOmid Vahdaty
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexasArvind Prabhakar
 
Apache flume - an Introduction
Apache flume - an IntroductionApache flume - an Introduction
Apache flume - an IntroductionErik Schmiegelow
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...DataWorks Summit
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016Jayesh Thakrar
 
Data Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache FlumeData Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache FlumeArvind Prabhakar
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoopChristophe Marchal
 
Big data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopBig data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopJeyamariappan Guru
 
Apache Arrow Flight Overview
Apache Arrow Flight OverviewApache Arrow Flight Overview
Apache Arrow Flight OverviewJacques Nadeau
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBaseCon
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDataWorks Summit
 
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBaseHBaseCon
 
Hadoop architecture meetup
Hadoop architecture meetupHadoop architecture meetup
Hadoop architecture meetupvmoorthy
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 

La actualidad más candente (20)

Apache flume by Swapnil Dubey
Apache flume by Swapnil DubeyApache flume by Swapnil Dubey
Apache flume by Swapnil Dubey
 
Apache flume - Twitter Streaming
Apache flume - Twitter Streaming Apache flume - Twitter Streaming
Apache flume - Twitter Streaming
 
Flume and Hadoop performance insights
Flume and Hadoop performance insightsFlume and Hadoop performance insights
Flume and Hadoop performance insights
 
Apache Flume (NG)
Apache Flume (NG)Apache Flume (NG)
Apache Flume (NG)
 
Apache Flume - DataDayTexas
Apache Flume - DataDayTexasApache Flume - DataDayTexas
Apache Flume - DataDayTexas
 
Apache flume - an Introduction
Apache flume - an IntroductionApache flume - an Introduction
Apache flume - an Introduction
 
Cloudera's Flume
Cloudera's FlumeCloudera's Flume
Cloudera's Flume
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016
 
Data Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache FlumeData Aggregation At Scale Using Apache Flume
Data Aggregation At Scale Using Apache Flume
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoop
 
Hadoop - Apache Pig
Hadoop - Apache PigHadoop - Apache Pig
Hadoop - Apache Pig
 
Big data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and SqoopBig data components - Introduction to Flume, Pig and Sqoop
Big data components - Introduction to Flume, Pig and Sqoop
 
Apache Arrow Flight Overview
Apache Arrow Flight OverviewApache Arrow Flight Overview
Apache Arrow Flight Overview
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
 
Deploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analyticsDeploying Apache Flume to enable low-latency analytics
Deploying Apache Flume to enable low-latency analytics
 
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBaseHBaseCon 2015: Blackbird Collections - In-situ  Stream Processing in HBase
HBaseCon 2015: Blackbird Collections - In-situ Stream Processing in HBase
 
Hadoop architecture meetup
Hadoop architecture meetupHadoop architecture meetup
Hadoop architecture meetup
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 

Similar a Centralized logging with Flume

Flume lspe-110325145754-phpapp01
Flume lspe-110325145754-phpapp01Flume lspe-110325145754-phpapp01
Flume lspe-110325145754-phpapp01joahp
 
Setting up a big data platform at kelkoo
Setting up a big data platform at kelkooSetting up a big data platform at kelkoo
Setting up a big data platform at kelkooFabrice dos Santos
 
WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202Timothy Spann
 
Data persistency (draco, cygnus, sth comet, quantum leap)
Data persistency (draco, cygnus, sth comet, quantum leap)Data persistency (draco, cygnus, sth comet, quantum leap)
Data persistency (draco, cygnus, sth comet, quantum leap)Fernando Lopez Aguilar
 
Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - FlumeAnandMHadoop
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Pat Patterson
 
Technical Overview of Apache Drill by Jacques Nadeau
Technical Overview of Apache Drill by Jacques NadeauTechnical Overview of Apache Drill by Jacques Nadeau
Technical Overview of Apache Drill by Jacques NadeauMapR Technologies
 
Flume DS -JSP.pptx
Flume DS -JSP.pptxFlume DS -JSP.pptx
Flume DS -JSP.pptxJayesh Patil
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkGOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkRobert Metzger
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Apex
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedwhoschek
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogJoe Stein
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkRobert Metzger
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Apache Flink Taiwan User Group
 
Real time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solrReal time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solrTimothy Spann
 
Introduction to Flume
Introduction to FlumeIntroduction to Flume
Introduction to FlumeRupak Roy
 
Splunk Discovery: Warsaw 2018 - Getting Data In
Splunk Discovery: Warsaw 2018 - Getting Data InSplunk Discovery: Warsaw 2018 - Getting Data In
Splunk Discovery: Warsaw 2018 - Getting Data InSplunk
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application Apache Apex
 
Elk presentation 2#3
Elk presentation 2#3Elk presentation 2#3
Elk presentation 2#3uzzal basak
 

Similar a Centralized logging with Flume (20)

Flume lspe-110325145754-phpapp01
Flume lspe-110325145754-phpapp01Flume lspe-110325145754-phpapp01
Flume lspe-110325145754-phpapp01
 
Setting up a big data platform at kelkoo
Setting up a big data platform at kelkooSetting up a big data platform at kelkoo
Setting up a big data platform at kelkoo
 
WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202
 
Data persistency (draco, cygnus, sth comet, quantum leap)
Data persistency (draco, cygnus, sth comet, quantum leap)Data persistency (draco, cygnus, sth comet, quantum leap)
Data persistency (draco, cygnus, sth comet, quantum leap)
 
Session 09 - Flume
Session 09 - FlumeSession 09 - Flume
Session 09 - Flume
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
 
Technical Overview of Apache Drill by Jacques Nadeau
Technical Overview of Apache Drill by Jacques NadeauTechnical Overview of Apache Drill by Jacques Nadeau
Technical Overview of Apache Drill by Jacques Nadeau
 
Flume DS -JSP.pptx
Flume DS -JSP.pptxFlume DS -JSP.pptx
Flume DS -JSP.pptx
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkGOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache Flink
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmed
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache Flink
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
 
Real time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solrReal time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solr
 
Introduction to Flume
Introduction to FlumeIntroduction to Flume
Introduction to Flume
 
Splunk Discovery: Warsaw 2018 - Getting Data In
Splunk Discovery: Warsaw 2018 - Getting Data InSplunk Discovery: Warsaw 2018 - Getting Data In
Splunk Discovery: Warsaw 2018 - Getting Data In
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application
 
Elk presentation 2#3
Elk presentation 2#3Elk presentation 2#3
Elk presentation 2#3
 

Último

Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 

Último (20)

Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 

Centralized logging with Flume

  • 1. Log Ingestion on Big Data Platform with Flume
  • 2. Agenda •Why Centralized Logging on Hadoop •Flume Introduction •Simple Flume Logging •Centralized and Scalable Flume Logging •Leveraging log data •Example 2
  • 3. •There are tons of logs generated from Applications •These logs are stored on local disks on individual nodes. •Log files containing records are required to archive in near real time to create some value. •Enable analytics on logs for diagnosing issues on Hadoop platform. 3 Use Case: Centralized Logging Requirements
  • 4. Centralized Log Management & Analytics : Goals •Have a central repository to store large volume of machine generated data from all sources and tiers of applications and infrastructures •Feed log data from multiple sources to the common repository in a non- intrusive way and in near real time •Enable analytics on log data using standard analytical solutions •Provide capability to search and correlate information across different sources for quick problem isolation and resolution. •Improve operational intelligence and •Be centralized without redundancy of multiple agents on all hosts for log collections 4
  • 5. Solution Components for centralized logging Flume •Flume is a streaming service, distributed as part of Apache Hadoop ecosystem, and primarily a reliable way of getting stream and log data into HDFS. Its pluggable architecture supports any consumer. A correctly configured pipeline of Flume is guaranteed to not lose data, provided durable channels are used. •Each Flume agent consists of three major components: sources, channels, and sinks. Sources An active component that receives events from a specialized location or mechanism and places it on one or Channels. Different Source types: Specialized sources for integrating with well-known systems. Example: Syslog, Netcat AvroSource NetcatSource SpoolDirectorySource ExecSource JMSSource SyslogTcpSource SyslogUDPSource 5
  • 6. Channels A passive component that buffers the incoming events until they are drained by Sinks. Different Channels offer different levels of persistence: Memory Channel: volatile Data lost if JVM or machine restarts File Channel: backed by WAL implementation Data not lost unless the disk dies. Eventually, when the agent comes back data can be accessed. Channels are fully transactional Provide weak ordering guarantees (in case of failures / rollbacks ) Can work with any number of Sources and Sinks. Handles upstream bursts Upstream or downstream buffers
  • 7. Sinks An active component that removes events from a Channel and transmits them to their next hop destination. Different types of Sinks: Terminal sinks that deposit events to their final destination. For example: HDFS, HBase, Kite-Solr, Elastic Search Sinks support serialization to user’s preferred formats. HDFS sink supports time-based and arbitrary bucketing of data while writing to HDFS. IPC sink for Agent-to-Agent communication: Avro Require exactly one channel to function
  • 8. Flume Multi Tier Setup [Client]+ Agent [ Agent]* Destination_______
  • 9.
  • 10. Interceptors Interceptor Flume has the capability to modify/drop events in-flight. This is done with the help of interceptors. An interceptor can modify or even drop events based on any criteria chosen by the developer of the interceptor. Built-in Interceptors allow adding headers such as timestamps, hostname, static markers etc. Custom interceptors can introspect event payload to create specific headers where necessary
  • 11. Configuration Example: Flume Agents ● Hierarchical ● Flow of components 11
  • 12. Contextual Routing with Interceptors Achieved using Interceptors and Channel Selectors Terminal Sinks can directly use Headers to make destination selections HDFS Sink can use headers values to create dynamic path for files that the event will be added to. # channel selector configuration agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing agent_foo.sources.avro-AppSrv-source1.selector.header = State agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1 agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2 agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2 agent_foo.sources.avro-AppSrv-source1.selector.optional.CA = mem-channel-1 file-channel-2 agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2 agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1
  • 13. Flume Client An entity that generates events and sends them to one or more Agents. • Example • Flume/Syslog log4j Appender • Custom Client using Client SDK (org.apache.flume.api) • Embedded Agent – An agent embedded within your application • Decouples Flume from the system where event data is consumed from • Not needed in all cases
  • 14. Client Applications Configuration Example: Log4j •log4j.appender.syslog=org.apache.log4j.net.SyslogAppender •log4j.appender.syslog.Facility=LOCAL3 •log4j.appender.syslog.FacilityPrinting=false •log4j.appender.syslog.Header=true •log4j.appender.syslog.SyslogHost=FlumedestinationHost:4444 •log4j.appender.syslog.layout=org.apache.log4j.PatternLayout •log4j.appender.syslog.layout.ConversionPattern= TYPE: DUMMY %p: (%F:%L) %x %m %n 14 Below is log4 configuration snippet , To enable java applications to send events
  • 15. For Non log4j Applications Rsyslog •Rsyslog is an open-source software utility used on UNIX and Unix-like computer systems for forwarding log messages in an IP network. It implements the basic syslog protocol, extends it with content-based filtering, rich filtering capabilities, flexible configuration options and adds features such as using TCP for transport. ● Used in most of the Linux distros as standard logger ● Has multiple facilities for application use local0-local7 (avoid local7) ● Can poll any file on system and send new events over the network to syslog destinations ● service rsyslog restart $ModLoad imfile $InputFileName /var/log/NEWAPP/NEWAPP.log $InputFileTag TYPE:_NEWAPP $WorkDirectory /var/spool/rsyslog/NEWAPP $InputFileStateFile NEWAPP-log $InputFileFacility local7 $InputFilePersistStateInterval 10 $InputFileSeverity info $RepeatedMsgReduction off $InputRunFileMonitor local7.* @@flumehost:4444
  • 16. Solution: Near Real Time Log Archive to Hadoop Platform 16 Event Flow :: Simple Flume Logging
  • 17. Solution: Near Real Time Log Archive to Hadoop Platform 17 •Less centralized , avoiding single point of failure. •In case collector fails , events are still not lost. •Scope for further scalability , with minimum configuration.
  • 18. Configuration Example: Flume Multi tier Config ●Flume Listener Agents ■ This agent gathers events from multiple applications. ■ can also perform event inspections using interceptors in this tier. ■ Each event is analyzed and sent forward with appropriate header(only) updates so next agent can make sense of it. ■ We can use filechannel or any other durable channel here. ■ Events aggregated for next tier ●Flume Writer Tier ■ Minimum connections to HDFS ■ This agent gets events from aggregator and reads headers. ■ According to header events are sent to relevant location on HDFS. 18
  • 19. DDL for creating a Hive table with log data, CREATE TABLE logData_H2 ( Ltype STRING, event_time STRING, porder STRING, SEVERITY STRING, SCLASS STRING, PHO STRING , MESG STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/data/logmgmt/_DUMMY/raz-XPS14/150703/';