This document provides an overview of monitoring in big data frameworks. It discusses the challenges of monitoring large-scale cloud environments running big data applications. Several open-source monitoring tools are described, including Hadoop Performance Monitoring UI, SequenceIQ, Ganglia, Apache Chukwa, and Nagios. Key requirements for monitoring big data platforms are also outlined, such as scalability, timeliness, and handling constant changes. The document concludes by introducing the DICE monitoring platform, which collects metrics from Hadoop, YARN, Spark, Storm and Kafka using Collectd and stores the data in Elasticsearch for analysis and visualization with Kibana.
Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015
1. DICE Horizon 2020 Project
Grant Agreement no. 644869
http://www.dice-h2020.eu Funded by the Horizon 2020
Framework Programme of the European Union
Monitoring in Big Data
Frameworks
Gabriel Iuhasz
Institute e-Austria Timisoara
26 November 2015
2. Overview
o Introduction
o Cloud Computing and Big Data
o Monitoring Tools
o Monitoring Requirements and Solutions
o Conclusions
3. Introduction
o Big Data in Cloud computing
o Volume, Velocity, Variety and Veracity
o Cost Reduction, Rapid provisioning/time to market,
Flexibility/scalability
o DevOps and Cloud
o Development and Operations
o Communication, Collaboration, Integration,
Automation
o DevOps Monitoring
o Measurement is a key aspect of DevOps
4. Big Data in Cloud Computing
o Challenges of Big Data on Cloud
o Low Latency real-time data
o Virtualization overhead
o Multi-tenancy overhead
o Scalability
o Lack of RDBMS support
o Availability
o Data integrity/privacy
8. Monitoring Architecture
o Cross layer monitoring of big data platforms
o Types of metrics are highly dependent on the type of the
application
o Have to be decided on a platform/application basis
o Centralized Monitoring
o All resource states are sent to a centralized monitoring server
o Metrics are continuously polled from monitored components
o Single point of failure
o Lacks scalability
o Decentralized Monitoring
o No single point of failure
o Central authority is diffused
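The centralized model above can be pictured as one server pulling every node. A minimal sketch in Python, assuming the monitored nodes expose a hypothetical HTTP `/metrics` endpoint returning JSON (node URLs and port are illustrative, not part of any specific tool):

```python
import json
import urllib.request

# Hypothetical list of monitored nodes exposing a /metrics HTTP endpoint;
# in a real deployment these would be agent URLs (collectd, JMX exporters, ...).
NODES = ["http://node1:9100/metrics", "http://node2:9100/metrics"]

def poll_all(nodes, timeout=2.0):
    """One iteration of a centralized polling loop: a single server pulls
    every node. If the host running this loop dies, all monitoring stops,
    which is exactly the single-point-of-failure problem noted above."""
    results = {}
    for url in nodes:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[url] = json.loads(resp.read().decode())
        except OSError as exc:  # node unreachable: record the gap, keep polling
            results[url] = {"error": str(exc)}
    return results
```

The sequential loop also hints at the scalability limit: polling cost grows linearly with the number of nodes on a single collector.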
9. Tools
o Hadoop Performance Monitoring UI
o Lightweight monitoring UI for Hadoop server
o Uses Hadoop metrics (using Sinks)
o SequenceIQ
o Based on ELK stack and Docker containers
o Elasticsearch can be easily scaled horizontally
o Logstash server on client side
o Ganglia
o Scalable distributed monitoring system
o Low per-node overhead
o Focused on System Metrics
o Gmond, gmetad and Web Front-end
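The Hadoop metrics Sinks mentioned above are wired up through `hadoop-metrics2.properties`; a minimal sketch routing NameNode and DataNode metrics to the built-in file sink (the period and file names are example values):

```properties
# hadoop-metrics2.properties -- illustrative sink wiring (paths/periods are examples)
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
*.period=10
namenode.sink.file.filename=namenode-metrics.out
datanode.sink.file.filename=datanode-metrics.out
```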
10. Tools II
o Apache Chukwa
o Built on top of HDFS
o Easily scalable
o Potentially high overhead
o Hadoop Vaidya
o Rule-based diagnostic tool for M/R jobs
o Performs post-run result analysis
o Nagios
o Plugin based architecture
o Uses a centralized server to collect metrics
o Possible to create a hierarchical deployment
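Nagios's plugin-based architecture means any executable that follows the exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and prints a one-line status can act as a check. A sketch of a load-average plugin in Python; the thresholds are illustrative, not recommended values:

```python
import os
import sys

# Nagios plugin exit-code convention
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_load(warn=4.0, crit=8.0):
    """Return (exit_code, status_line) for the 1-minute load average."""
    try:
        load1 = os.getloadavg()[0]
    except OSError:
        return UNKNOWN, "UNKNOWN - load average not available"
    if load1 >= crit:
        return CRITICAL, "CRITICAL - load %.2f | load1=%.2f" % (load1, load1)
    if load1 >= warn:
        return WARNING, "WARNING - load %.2f | load1=%.2f" % (load1, load1)
    return OK, "OK - load %.2f | load1=%.2f" % (load1, load1)

if __name__ == "__main__":
    code, line = check_load()
    print(line)   # Nagios parses the first line: "text | perfdata"
    sys.exit(code)
```

The text after the `|` is performance data that Nagios can graph over time.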
11. Requirements
o Difficulties in cloud monitoring
o Scale
o Velocity or Timeliness
o Constant changes
o The need for scalability and automation
o Easy re-configurability
o Lightweight metrics collectors
o Identifying pertinent metrics
13. DICE Monitoring Platform
o RESTful Web Service
o Used to deploy and configure all core/auxiliary components
o Used to query Elasticsearch
o Exports metrics in: JSON, CSV, OSLC Perf. Mon 2.0 (RDF+XML)
o Used for auto-scaling of monitoring solution
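A query service like this one would typically translate a metric name and time window into a body for Elasticsearch's `_search` API. A sketch of such a builder; the index layout and the `type`/`@timestamp` field names are assumptions about how a Logstash/collectd pipeline would tag documents, not the DICE service's actual schema:

```python
import json

def build_metrics_query(metric_name, start, end, size=100):
    """Build an Elasticsearch _search body selecting one metric type
    inside a time window, sorted oldest-first."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"term": {"type": metric_name}}],
                "filter": [{"range": {"@timestamp": {"gte": start, "lte": end}}}],
            }
        },
        "sort": [{"@timestamp": {"order": "asc"}}],
    }

body = build_metrics_query("cpu", "2015-11-26T00:00:00Z", "2015-11-26T23:59:59Z")
# POSTed as JSON to an endpoint such as http://<es-host>:9200/<index>/_search
print(json.dumps(body, indent=2))
```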
o ELK Stack
o Extremely flexible/configurable
o Horizontally scalable
o Can accept various input and output formats
o ETL via Logstash server (filters)
o Logstash-forwarder for secure transmission (newer: Beats data shippers)
o Visualization using Kibana4
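The ETL path above (forwarder in, grok filter, Elasticsearch out) can be sketched as a Logstash pipeline configuration; the port, certificate paths, grok pattern, and host are example values for the 2015-era logstash-forwarder (lumberjack) setup:

```
input {
  lumberjack {                        # logstash-forwarder input (pre-Beats)
    port => 5043
    ssl_certificate => "/etc/pki/logstash-forwarder.crt"
    ssl_key => "/etc/pki/logstash-forwarder.key"
  }
}
filter {
  grok {                              # ETL step: parse raw lines into fields
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }   # indexed for Kibana
}
```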
o Collectd
o Statistics collection daemon
o A lot of plugins available
o Simple configuration
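The "simple configuration" point can be illustrated with a collectd fragment that loads a few read plugins and ships metrics to a central receiver over the network plugin; the interval and hostname are example values:

```
# /etc/collectd/collectd.conf -- illustrative fragment
Interval 10                # seconds between reads (example value)

LoadPlugin cpu
LoadPlugin memory
LoadPlugin disk
LoadPlugin network

<Plugin network>
  # Ship metrics to the central ingest node (example host/port)
  Server "monitoring.example.org" "25826"
</Plugin>
```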
17. Conclusions
o We have given a short overview of current
monitoring platforms
o Identified key requirements for
Big Data monitoring
o Scaling, Autonomy, Timeliness
o Automation via Chef recipes
o Presented the current Architecture of the DICE
Monitoring Platform
o Currently collecting from: HDFS, YARN, Spark, Storm, Kafka
o In the near future: Cassandra, possibly Trident
o Creating the full lambda-architecture-based anomaly
detection platform
o Elasticsearch used as the serving layer