2. Agenda
Who we are?
Why do we need a History Server?
Application History Server
Timeline Server V1
Timeline Server V2
3. Who we are?
Naganarasimha G R
Senior Technical Lead @ Huawei
Active Apache Hadoop Contributor.
Currently working in Hadoop Platform Dev team
Earlier worked in Reporting Domain
Varun Saxena
Technical Lead @ Huawei
Active Apache Hadoop Contributor.
Currently working in Hadoop Platform Dev team
Earlier worked in Telecom Data Network Domain
Both of us are currently participating in ATS V2 development
4. Agenda
Who we are?
Why do we need a History Server?
Application History Server
Timeline Server V1
Timeline Server V2
5. Need for new History Server
Job History Server (JHS) serves only MR apps, while YARN
supports many application types.
YARN-level events and metrics are not captured.
Storage is HDFS only, which is not good for ad-hoc analysis.
JHS is only for historical, i.e. completed, jobs.
On Application Master failure, data for the currently
running application is lost.
Storage is very MR specific
- Counters
- Mappers and Reducers
6. Agenda
Who we are?
Why do we need a History Server?
Application History Server
Timeline Server V1 & V1.5
Timeline Server V2
7. Application History Server
Separate Process
Resource Manager directly writes to storage (HDFS)
Aggregated Logs
Separate UI, CLI and Rest End Point
Data stored:
- Application level data (queue, user, etc.)
- List of ApplicationAttempts
- Information about each ApplicationAttempt
- List of containers for ApplicationAttempt
- Generic information about each container.
CLI and REST query interfaces are supported
Drawbacks:
Storing application-specific custom data is not
supported
If the RM crashes, HDFS files are left unreadable
Hard limit on the number of files
Upgrades / updates are difficult
Supports only completed jobs.
8. Agenda
Who we are?
Why do we need a History Server?
Application History Server
Timeline Server V1
Timeline Server V2
9. Application Timeline Service
Motivation:
YARN takes care of monitoring
- Relieves applications from running their own monitoring service
Application diversity
- Framework-specific metadata/metrics
10. ATS V1 : Data Model
Timeline Domain
- Namespace for the Timeline Server that supports
isolation of users and applications
- Timeline Server security is defined at this level
Timeline Entity
- An abstract concept of anything
- Defines the relationship between entities
- Can be an application, an application attempt, a
container or any user-defined object
- Contains primary filters, which are used to index the
entities in the Timeline Store
- Uniquely identified by an EntityId and EntityType
Timeline Event
- An event related to a specific Timeline Entity of an application
- Users are free to define what an event means, such as starting an application or getting a container allocated
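As an illustration, here is a minimal sketch of building such an entity with the ATSv1 POJOs; the entity type, IDs, filter and event values below are made-up examples, not names from these slides.

import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEvent;

public class EntityExample {
  public static TimelineEntity buildEntity() {
    TimelineEntity entity = new TimelineEntity();
    entity.setEntityType("MY_APP_TASK");             // user-defined entity type
    entity.setEntityId("task_001");                  // unique within the type
    entity.setDomainId("my_domain");                 // domain provides isolation/security
    entity.setStartTime(System.currentTimeMillis());
    entity.addPrimaryFilter("user", "joe");          // indexed in the Timeline Store

    TimelineEvent event = new TimelineEvent();       // what an event means is up to the user
    event.setEventType("TASK_STARTED");
    event.setTimestamp(System.currentTimeMillis());
    event.addEventInfo("host", "node-1");
    entity.addEvent(event);
    return entity;
  }
}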
11. ATS V1 : Architecture
Separate Process
Pluggable store – defaults to LevelDB
REST Interfaces
12. ATS V1 : LevelDB
Key-value store
Lightweight
Open source, compatible license
Used to store
- TimelineStore : Domain, Entity, Events and metrics
- TimelineStateStore : Security Tokens
Supports Data Retention
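For illustration, a minimal sketch of the key-value put/get pattern underlying such a store, using the org.iq80.leveldb Java API with the JNI bindings Hadoop ships; the key layout and value here are made up, not the actual TimelineStore encoding.

import java.io.File;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;
import static org.fusesource.leveldbjni.JniDBFactory.bytes;
import static org.fusesource.leveldbjni.JniDBFactory.factory;

public class LevelDbExample {
  public static void main(String[] args) throws Exception {
    Options options = new Options().createIfMissing(true);
    DB db = factory.open(new File("timeline-store"), options); // lives on local disk
    try {
      // Entities are persisted as serialized values under composite keys
      db.put(bytes("entity!MY_APP_TASK!task_001"), bytes("{\"serialized\":\"entity\"}"));
      System.out.println(new String(db.get(bytes("entity!MY_APP_TASK!task_001")), "UTF-8"));
    } finally {
      db.close();
    }
  }
}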
13. ATS V1 : Client & API
Timeline client
- Wrapper over the REST POST method
- POJO objects
TimelineEntity
TimelineEvent
- In Client/AM/Container
REST APIs, JSON as the media type
- Get timeline entities
http://localhost:8188/ws/v1/timeline/{entityType}
- Get timeline entity
http://localhost:8188/ws/v1/timeline/{entityType}/{entityId}
- Get timeline events
http://localhost:8188/ws/v1/timeline/{entityType}/events
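A minimal sketch of publishing entities through this client, reusing the buildEntity() helper sketched earlier; the cluster configuration (yarn-site.xml) is assumed to be on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;

public class TimelineClientExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // picks up yarn-site.xml
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      TimelineEntity entity = EntityExample.buildEntity(); // from the earlier sketch
      // Internally wraps a REST POST to the timeline web service
      TimelinePutResponse response = client.putEntities(entity);
      if (!response.getErrors().isEmpty()) {
        System.err.println("Entities not stored: " + response.getErrors());
      }
    } finally {
      client.stop();
    }
  }
}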
16. Agenda
Who we are?
Why do we need a History Server?
Application History Server
Timeline Server V1
Timeline Server V2
17. Why ATSv2?
Scalability
• Single global instance of writer/reader
• ATSv1 uses local disk based LevelDB storage
Usability
• Handle flows as first-class concepts and model aggregation.
• Elevate configuration and metrics to first-class members.
• Better support for queries.
Reliability
• Data is stored only on a local disk.
• Single daemon, hence a single point of failure.
Existing external tooling: hRaven, Finch, Dr. Elephant, etc. As new Hadoop versions are rolled out,
maintenance of these tools becomes an issue.
18. Key Design Points
Distributed writers (per app and per node)
• Per App Writer/Collector launched as part of RM.
• Per Node Collector/Writer launched as an auxiliary service in NM.
• In the future, standalone writers will be supported.
Scalable and reliable backend storage (HBase)
A new object model API with flows built into it.
Separate reader instance(s). Currently there is a single reader instance.
Aggregation, i.e. rolling up metric values to the parent.
• Online aggregation for apps and flow runs.
• Offline aggregation for users, flows and queues.
20. Distributed Writers / Collectors
[Architecture diagram: Resource Manager (RMApp), Node Managers 1..X each running an Aux Service and an NM Collector Service, per-app App Collectors, and HBase as the backend store. Each NM keeps a list of app collectors, e.g. { app_1_collector_info, app_2_collector_info, ... }.]
1. User submits an app.
2. RMApp launches a companion app collector on new app submission.
3. RM launches the App Master.
4. The Aux Service is notified to bind the new collector.
5. The Aux Service binds the new collector.
6. The new collector registers with the NM Collector Service.
7. The new collector info (IP + port) is reported to the RM.
The RM then passes the app collector info to the AM and the NMs in heartbeats; RM events, AM events and NM container events are all reported to the app collector notified in the heartbeat, and the collector writes them to HBase.
21. Flows
[Diagram: a flow is a script/program (e.g. a HIVE query or Pig script) run by a user (Joe). Each execution of the flow is a flow run (e.g. one run at 7:30 pm and another at 9:00 pm), and each flow run consists of multiple YARN applications (App 1 to App 4).]
22. Data Model
Entity
- ID + Type
- Configurations
- Metadata (Info)
- Parent-child relationships
- Metrics
- Events
First-class entities:
- Cluster: type, cluster attributes
- Flow: type, user, flow runs, flow attributes
- Flow Run: type, user, running apps, flow run attributes
- Application: type, user, flow + run, queue, attempts
- Attempt: type, application, queue, containers
- Container: type, attempt, attributes
Aggregation entities (metrics rolled up into):
- User: username (ID), aggregated metrics
- Queue: queue (ID), sub-queues, aggregated metrics
Event
- ID
- Metadata
- Timestamp
Metric
- ID
- Metadata
- Single value or time series (with timestamps)
23. HBase vs Phoenix evaluation
Based on the evaluation of both Hbase and Phoenix, it was decided that HBase will be used on write path. With
Hbase, much higher throughput, a lower IO wait and far lower CPU load was witnessed.
Test description | Map tasks | Entities per mapper | Total entities written | Phoenix transaction rate (per mapper) ops/sec | HBase transaction rate (per mapper) ops/sec | Phoenix write time (job counter TIMELINE_SERVICE_WRITE_TIME) | HBase write time (job counter TIMELINE_SERVICE_WRITE_TIME)
Synthetic Data | 170 | 1k | 170k | 112.83 | 2285.13 | 1506704 | 74394
Synthetic Data | 170 | 10k | 1.7M | 53.029 | 636.41 | 32057957 | 2671241
Synthetic Data | 1 | 50k | 50k | 196.67 | 19770.66 | 254225 | 2529
9 History Files | 33 | - | 85k | 319.19 (write errors) | 962.32 | 265460 | 88049
555 History Files | 33 | - | 810k | 206.25 (write errors) | 927.62 | 4102364 | 874151
24. Aggregation
Aggregation basically means rolling up metrics from child entities to parent entities. We can perform different operations such as
SUM, AVG, etc. while rolling them up, and store the results in the parent.
App-level aggregation will be done by the app collector as and when it receives different metrics.
Online or real-time aggregation for apps would be a simple SUM of the metrics of child entities. Additional metrics will also be stored
which indicate AVG, MAX, AREA (time integral), etc. More on this in the next slide.
App to flow run aggregation will be done via an HBase coprocessor on the read path. Cell tags are used to achieve this.
For user/flow, aggregation happens periodically (offline, not in real time). For this, Phoenix tables will be used. To achieve offline
aggregation, an MR job is run which reads the application table and writes to the user and flow aggregation tables.
Example: Container A1 (CPUCoresMillis = 400) and Container A2 (CPUCoresMillis = 300) roll up to App A (CPUCoresMillis = 700); Container B1 (CPUCoresMillis = 200) rolls up to App B (CPUCoresMillis = 200); App A and App B roll up to the Flow (CPUCoresMillis = 900).
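A minimal, illustrative sketch of the arithmetic behind these rollups (not the actual collector or coprocessor code): SUM for the parent rollup shown above, and AREA as a time integral over a metric time series.

public class AggregationExample {

  // SUM rollup: the parent metric is the sum of the child metric values
  static long sum(long... childValues) {
    long total = 0;
    for (long v : childValues) {
      total += v;
    }
    return total;
  }

  // AREA (time integral): each value is weighted by the width of the
  // time slice it was observed for
  static long area(long[] timestamps, long[] values) {
    long total = 0;
    for (int i = 0; i + 1 < timestamps.length; i++) {
      total += values[i] * (timestamps[i + 1] - timestamps[i]);
    }
    return total;
  }

  public static void main(String[] args) {
    long appA = sum(400, 300);   // containers A1 + A2 -> 700
    long appB = sum(200);        // container B1       -> 200
    long flow = sum(appA, appB); // apps A + B         -> 900
    System.out.printf("App A=%d, App B=%d, Flow=%d%n", appA, appB, flow);
  }
}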
26. HBase Table Schema
Entity Table – Used for storing TimelineEntity objects. Contains configs, metrics and other info (events,
parent child relationships, etc.).
Row Key : clusterId!user!flowId!flowRunId!appId!entityType!entityId
Application Table – Used for storing YARN Application entity. Contains configs, metrics and other info.
Same as entity table but added for better performance.
Row Key : clusterId!user!flowId!flowRunId!appId
App To Flow Table – Used for getting flowId and flowRunId information based on cluster and app. This is
helpful in querying entity table on the basis of just the cluster and app information.
Row Key : clusterId!appId
Flow Run Table – Stores flow run information aggregated across apps.
Row Key : clusterId!user!flowId!flowRunId
27. HBase Table Schema (Contd.)
Flow Activity Table – Used for storing daily activity records for a flow. For quick lookup of flow level info.
Row Key : clusterId!inverted top of the day timestamp!user!flowId
Phoenix Tables for Offline Aggregation:
Flow Aggregation Table – Stores aggregated metrics at flow level. Metrics are aggregated from
application table.
Primary Key : user, cluster, flowId
User Aggregation Table – Stores aggregated metrics at user level. Metrics are aggregated from
application table.
Primary Key : user, cluster
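A sketch of how such '!'-separated row keys can be assembled, including the inverted-timestamp trick used by the flow activity table so the most recent day sorts first in HBase's ascending key order. This is a hypothetical helper for illustration, not the actual ATSv2 row-key code, which works on raw bytes.

public class RowKeyExample {
  // Entity table row key: clusterId!user!flowId!flowRunId!appId!entityType!entityId
  static String entityRowKey(String cluster, String user, String flow,
      long flowRunId, String appId, String entityType, String entityId) {
    return String.join("!", cluster, user, flow,
        Long.toString(flowRunId), appId, entityType, entityId);
  }

  // "Inverted top of the day" timestamp for the flow activity table:
  // larger (more recent) timestamps map to smaller keys
  static long invertedDayTimestamp(long tsMillis) {
    long dayMillis = 24L * 60 * 60 * 1000;
    long topOfDay = tsMillis - (tsMillis % dayMillis); // midnight of that day
    return Long.MAX_VALUE - topOfDay;
  }

  public static void main(String[] args) {
    System.out.println(entityRowKey("cluster1", "joe", "hive_query_flow",
        1425016501000L, "application_1334432321_0002",
        "YARN_CONTAINER", "container_1334432321_0002_01_000001"));
  }
}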
28. Querying ATSv2
ATSv2 offers major enhancements over ATSv1 in terms of the queries supported. Efficient queries around
flows, flow runs, apps, etc. are possible. Moreover, ATSv2 can support complex queries to filter
results.
ATSv1 offered only primary filters and secondary filters for filtering entities. ATSv2 offers the ability to
filter entities based on config values, metric values, entity parent-child relationships and events. It
also supports returning only certain configurations and metrics in the result.
ATSv1 queries supported only an "equal to" match for primary and secondary filters. But for metrics this
does not quite make sense. A user filtering on the basis of metric values would more likely be
using relational operators such as >=, <=, !=, etc. All these relational operators are supported in ATSv2 for
metrics. In addition, different predicates in filters can be combined using "AND" and "OR"
operators.
All in all, this gives ATSv2 a very powerful query interface.
29. Querying ATSv2 (Contd.)
ATSv2, like ATSv1, supports a REST API interface with JSON as the media type. Some examples are given below.
Get Entities – Returns a set of TimelineEntity objects based on cluster, app and entity type. The query also
supports multiple optional query parameters such as a limit on the number of entities to be returned,
the configurations and metrics to be returned, filters on the basis of created and modified time windows,
config filters, metric filters and event filters.
http://localhost:8188/entities/{clusterId}/{appId}/{entityType}
Example:
http://localhost:8188/entities/cluster1/application_1334432321_0002/YARN_CONTAINER?limit=5&metrics=memory,cpu
Get Entity – Returns a single TimelineEntity object based on cluster, app, entity type and entityId.
http://localhost:8188/entity/{clusterId}/{appId}/{entityType}/{entityId}
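For completeness, a minimal sketch of issuing such a query from Java with plain HttpURLConnection, assuming a reader running at localhost:8188 as in the URLs above.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ReaderQueryExample {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8188/entities/cluster1/"
        + "application_1334432321_0002/YARN_CONTAINER?limit=5&metrics=memory,cpu");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json"); // JSON is the media type
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // set of TimelineEntity objects as JSON
      }
    } finally {
      conn.disconnect();
    }
  }
}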
30. Possible use cases
Cluster utilization and inputs for capacity planning. The cluster can learn from a flow's/application's historical
data.
Mapper/reducer optimizations.
Application performance over time.
Identifying job bottlenecks.
Ad-hoc troubleshooting and identification of problems in cluster.
Complex queries are possible at flow, user and queue level. For instance: what % of applications
ran more than 10000 containers.
Full DAG from flow to flow run to application to container level can be seen.
31. Team Members
Sangjin Lee, Vrushali C and Joep Rottinghuis (Twitter)
Junping Du, Li Lu and Vinod Kumar Vavilapalli (Hortonworks)
Zhijie Shen (formerly Hortonworks)
Varun Saxena and Naganarasimha G R (Huawei)
Robert Kanter and Karthik Kambatla (Cloudera)
Inputs from LinkedIn, Yahoo! and Altiscale.
32. Feature Status
Distributed per-app and per-node writers (as Aux Service)
RM Companion writer
NM, RM and AM writing events and metrics to ATS
File-based readers and writers for testing
HBase and Phoenix writer implementations
Performance evaluation of these writers
HBase based reader implementation
Support for flows
App and flow run level online aggregation
Offline Aggregation
Query Interface
33. Feature Status (Contd.)
Standalone timeline writer
Distributed timeline readers and a reader pool
ATSv2 UI
Security
Support for migration