M3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
Vladislav Kashansky, Dragi Kimovski, Radu Prodan, Prateek Agrawal, Fabrizio Marozzo,
Iuhasz Gabriel, Marek Justyna and Javier Garcia-Blas
ASPIDE Collaboration
vladislav@itec.aau.at
28th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
Presentation Outline
▪ Background Information;
▪ Monitoring Tools and Techniques;
▪ Challenges at Large Scale;
▪ M3AT Architecture & Formal Model;
▪ Practical Approach using SCIP Optimization Suite;
▪ Evaluation;
▪ Q&A.
ASPIDE Project’s Architecture
Example of the Technical Challenge
▪ The behavior of an application on an HPC/Cloud cluster must be traced or profiled over the next T seconds, with low transition latencies afterwards. It is not known in advance how to select the proper sampling rate or where to place monitoring agents in the current system:
1. A specific application, which represents the workload, runs on the cluster, but not in an isolated environment, so it is affected by the current network congestion state;
2. The operator must see all the information about the processing that is required to identify hotspots. Moreover, the data must be stored for long-term analysis;
3. Run-time data collection and transport enable analysis while applications are running and while the system is experiencing conditions of interest. Post-processing analysis does not solve problems as they occur in practice [LDMS, SC’2014].
Goal and Objectives
▪ Project’s goal: a scalable monitoring system for large-scale data-intensive systems
▪ Our goal: a mathematical model for low-latency monitoring data collection subject to the given I/O policies
▪ Objectives:
1. Analyze existing state-of-the-art monitoring approaches and frameworks, as well as mathematical methods in the field of combinatorial and discrete optimization;
2. Propose a mathematical model for efficient monitoring data collection;
3. Design an architecture that enables practical evaluation of the proposed model.
Contemporary Linux Performance Measurement Tools
Contemporary Model-Specific Performance Measurement Tools
▪ Cube GUI by Jülich Supercomputing Centre
▪ Scalasca Trace Tools by Jülich Supercomputing Centre
▪ Vampir by Technische Universität Dresden
▪ Periscope by Technische Universität München
▪ TAU by University of Oregon
▪ Extra-P by Technische Universität Darmstadt
Challenge lies in massively parallel data and meta-data requests which overcharge distributed parallel file systems. This is a fundamental problem on highest-scale HPC machines today. - Knüpfer, Andreas, et al. "Score-P: A joint performance measurement run-time infrastructure for Periscope, Scalasca, TAU, and Vampir."
Curse of Dimensionality
▪ It is impractical, and often impossible, to measure the performance metrics of large-scale applications and systems globally while still respecting constraints such as I/O limitation policies.
▪ Thus, it is critical to identify:
• The parameters to monitor and the granularity level (e.g., dynamic tracing, profiling, per-node aggregated statistics, per-cluster I/O heatmaps);
• The measurement interval and the communication patterns in relation to these intervals;
• The aggregation and pre-processing of performance metrics at a monitor granularity for further analysis.
Transition to the larger-scale architecture
Agelastos, Anthony, et al. "The lightweight distributed metric service: a scalable infrastructure for continuous monitoring of large
scale computing systems and applications." High Performance Computing, Networking, Storage and Analysis, SC14:
International Conference for. IEEE, 2014.
Graph Model
[Figure: graph model. Recoverable labels: exchange protocol and access algorithms; monitoring data vectors; the scalable cluster with a unified data space provided by the DFS.]
Large-scale Monitoring Architecture
▪ The M3AT component controls the assignment of monitoring agents and aggregators;
▪ The aggregation and event detection component (AEDC) aggregates monitoring data from the agents and detects possible events. This component is decentralized and runs an instance on every aggregation node;
▪ The main analysis component (AC) is centralized and provides a set of analytic tools, including smart monitoring of application performance and bottleneck detection.
Mathematical model – M3AT Components
▪ The M3AT model aims to identify an optimal assignment of monitoring agents and aggregation points, where the monitoring data is pre-processed for further analysis. First, the monitoring agents are selected from the partitions according to the given application. Thereafter, we assign the monitoring agents to a subset of the required aggregators, guaranteeing a low response time and an amount of monitoring traffic that stays within the given upper limits.
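The assignment step can be illustrated on a toy instance. This is a minimal sketch, not the authors' implementation: it enumerates every monitor-to-aggregator assignment and keeps the cheapest one whose traffic stays within each aggregator's limit. The names `latency`, `traffic`, and `limit`, and all numbers, are hypothetical.

```python
from itertools import product


def best_assignment(latency, traffic, limit):
    """Exhaustively search assignments of monitors to aggregators.

    latency[i][j]: hypothetical cost of sending monitor i's data to aggregator j
    traffic[i]:    hypothetical data volume of monitor i per push interval
    limit[j]:      hypothetical upper traffic limit of aggregator j
    Returns (total_cost, assignment), or None if nothing is feasible.
    """
    n_mon, n_agg = len(traffic), len(limit)
    best = None
    # Each monitor is assigned to exactly one aggregator.
    for assign in product(range(n_agg), repeat=n_mon):
        load = [0] * n_agg
        for i, j in enumerate(assign):
            load[j] += traffic[i]
        # Skip assignments that exceed any aggregator's traffic limit.
        if any(load[j] > limit[j] for j in range(n_agg)):
            continue
        cost = sum(latency[i][j] for i, j in enumerate(assign))
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best


# Toy data (assumed): 3 monitors, 2 aggregators.
latency = [[1, 4], [2, 3], [5, 1]]
traffic = [3, 3, 2]
limit = [4, 5]
result = best_assignment(latency, traffic, limit)
print(result)  # (5, (0, 1, 1))
```

Exhaustive search grows exponentially with the number of monitors, which is why larger instances call for an integer programming solver.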
Model’s Limitations
▪ A priori information about the running application is already available and is delivered by the runtime system (e.g., SLURM, Hadoop, Borg);
▪ The number and location of monitoring and aggregating agents are decided by the runtime and data management systems;
▪ The relevant application performance metrics have already been selected and are represented as the data volume accumulated within the given push interval;
▪ The optimality criterion (objective function) and the set of constraints do not change while the optimization problem is being solved.
Formal Mathematical Model
▪ Convex polyhedron
▪ Knapsack constraints:
• Upper limit on the total bandwidth (resource constraints);
▪ Assignment constraints:
• Each monitor is assigned to exactly one aggregator;
▪ 0-1 formulation:
• Admissible values of the decision variables are restricted to the set {0, 1}.
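The formula shown on this slide is not present in the extracted text. As a sketch only, a standard generalized assignment form consistent with the three bullet groups could be written as follows. All symbols are assumptions: $x_{ij}$ is 1 if monitor $i$ is assigned to aggregator $j$, $c_{ij}$ is the assignment cost, $w_{ij}$ the traffic that monitor $i$ places on aggregator $j$, and $b_j$ the bandwidth limit of aggregator $j$.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij}\, x_{ij} \\
\text{s.t.} \quad
  & \sum_{i=1}^{n} w_{ij}\, x_{ij} \le b_j, && j = 1, \dots, m
    && \text{(knapsack constraints)} \\
  & \sum_{j=1}^{m} x_{ij} = 1, && i = 1, \dots, n
    && \text{(assignment constraints)} \\
  & x_{ij} \in \{0, 1\}, && i = 1, \dots, n,\; j = 1, \dots, m
    && \text{(0-1 formulation)}
\end{aligned}
```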
Formal Mathematical Model. Matrix and LP Format
[Figures: the model in matrix format and in LP format]
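To give an idea of what an LP-format model looks like, the sketch below renders a tiny assignment instance as CPLEX-style LP text, a format that SCIP can read. It is an illustration under assumptions, not the authors' model file: the function name `gap_to_lp`, the variable naming scheme `x_i_j`, and the sample numbers are all hypothetical, and coefficients are assumed to be non-negative.

```python
def gap_to_lp(cost, weight, capacity):
    """Render a small assignment instance as CPLEX LP-format text.

    cost[i][j], weight[i][j], capacity[j] are hypothetical inputs for
    monitor i and aggregator j; all values are assumed non-negative.
    """
    n, m = len(cost), len(capacity)

    def var(i, j):
        return f"x_{i}_{j}"

    lines = ["Minimize"]
    obj = " + ".join(f"{cost[i][j]} {var(i, j)}"
                     for i in range(n) for j in range(m))
    lines.append(f" obj: {obj}")
    lines.append("Subject To")
    # Assignment constraints: each monitor goes to exactly one aggregator.
    for i in range(n):
        lhs = " + ".join(var(i, j) for j in range(m))
        lines.append(f" assign_{i}: {lhs} = 1")
    # Knapsack constraints: traffic at each aggregator stays within its limit.
    for j in range(m):
        lhs = " + ".join(f"{weight[i][j]} {var(i, j)}" for i in range(n))
        lines.append(f" cap_{j}: {lhs} <= {capacity[j]}")
    # 0-1 formulation: all decision variables are binary.
    lines.append("Binary")
    lines.append(" " + " ".join(var(i, j) for i in range(n) for j in range(m)))
    lines.append("End")
    return "\n".join(lines)


lp_text = gap_to_lp(cost=[[1, 4], [2, 3]],
                    weight=[[3, 3], [2, 2]],
                    capacity=[4, 5])
print(lp_text)
```

Saved to a file such as `model.lp` (a hypothetical name), the text could then be loaded in the SCIP interactive shell with `read model.lp` followed by `optimize`.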
Data Generation and Limitation Policies
▪ We tested the model on a set of 50 assignment problems, sampled from a uniform distribution using random seeds ranging from 100 to 150 and generated with the MT19937 generator of the GNU Scientific Library 2.0.1.
▪ We identified the constants for the uniform distribution based on the use-case ecosystem requirements.
▪ Each class represents a possible I/O limitation scenario imposed on the given set of aggregators.
▪ For example, complexity classes B† and C† model an environment with bandwidth saturation within a given HPC cluster partition by setting limitation policies inversely proportional to the current number of aggregators and the possible amount of traffic in circulation.
▪ The complexity class A† is also derived from the application use cases and allows probabilistic variability in the bandwidth saturation for a given aggregator.
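A seeded generation loop of this kind might look roughly as follows. This is a sketch under several assumptions, not the authors' generator: it uses Python's `random` module (also a Mersenne Twister MT19937 generator, but seeded differently from GSL, so the streams will not match), the half-open seed range 100..149 to obtain 50 instances, and hypothetical sizes, value ranges, and limitation policy.

```python
import random


def make_instance(seed, n_monitors=20, n_aggregators=5,
                  cost_range=(1.0, 100.0), volume_range=(1.0, 50.0),
                  slack=1.5):
    """Sample one assignment problem from uniform distributions.

    All sizes, ranges, and the slack factor are hypothetical placeholders
    for the use-case-specific constants mentioned on the slide.
    """
    rng = random.Random(seed)  # MT19937-based generator, seeded per instance
    cost = [[rng.uniform(*cost_range) for _ in range(n_aggregators)]
            for _ in range(n_monitors)]
    volume = [rng.uniform(*volume_range) for _ in range(n_monitors)]
    # Hypothetical limitation policy: each aggregator's limit shrinks as the
    # number of aggregators grows, scaled by the total traffic in circulation.
    per_aggregator = slack * sum(volume) / n_aggregators
    limit = [per_aggregator] * n_aggregators
    return cost, volume, limit


# Assumed seed range: 100..149 gives the 50 instances mentioned above.
instances = [make_instance(seed) for seed in range(100, 150)]
print(len(instances))
```

Fixing the seed per instance makes every problem reproducible, so a solver configuration can be re-evaluated on exactly the same data.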
SCIP Optimization Suite
▪ Provides a fast open-source IP, MIP and MINLP solver;
▪ Incorporates:
• MIP features (cutting planes, LP relaxation);
• MINLP features;
• CP features (domain propagation);
• SAT-solving features (conflict analysis, restarts);
• A branch-cut-and-price framework;
▪ Has a modular structure via plugins;
▪ Free for academic purposes.
Achterberg, Tobias. "SCIP: solving constraint integer programs." Mathematical
Programming Computation 1.1 (2009): 1-41.
▪ Branch-and-bound based methods can be parallelized in a distributed or shared-memory computing environment.
SCIP Optimization Suite – Solution Output
▪ SCIP solution for the 130x80 dimensionality:
[Figure: plots with axes Time (sec), SCIP relative gap (%), and Amount (n)]
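The relative gap on the plot axis measures how far the best known solution is from the best proven bound. The sketch below assumes the definition commonly documented for SCIP, |primal - dual| / min(|primal|, |dual|), with an infinite gap when a bound is zero or the bounds differ in sign; the function name and the sample values are hypothetical.

```python
import math


def relative_gap(primal_bound, dual_bound):
    """Relative gap between primal and dual bounds (assumed SCIP-style)."""
    if primal_bound == dual_bound:
        return 0.0
    # A zero bound or bounds of opposite sign give an infinite gap.
    if primal_bound * dual_bound <= 0:
        return math.inf
    return (abs(primal_bound - dual_bound)
            / min(abs(primal_bound), abs(dual_bound)))


# Hypothetical bounds: best solution found 105, best proven lower bound 100.
gap_percent = 100.0 * relative_gap(105.0, 100.0)
print(f"{gap_percent:.2f}%")
```

A solver run can be stopped once this value falls below a chosen tolerance, trading proof of optimality for solving time.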
Numerical Results
Conclusion
▪ To solve this problem, we applied the ILP formalism and reduced the problem to a generalized assignment problem (GAP) formulation;
▪ We identified the requirements and the parameters for the given model and its solving techniques based on several data-intensive applications, their corresponding ecosystems, and the current state-of-the-art monitoring and profiling techniques;
▪ We evaluated the scalability of our model on several data sets of varying complexity in relation to the specific SCIP precision configuration;
▪ The approach scales well when the number of agents is within these boundaries, while showing high sensitivity to the problem scale and the input data.
Open question on SCIP optimization suite and auto-tuning
▪ Parameters affect almost every part of SCIP's solving process.
• SCIP currently features more than 1600 parameters: boolean, integer, and real-valued;
• Automated parameter tuning could improve SCIP's performance;
• It is currently an active research area for many major OR and AI ecosystems (e.g., CPLEX, Gurobi, GAMS, COIN-OR, TensorFlow);
▪ Where to start?
• The default settings yield good performance on a heterogeneous MIP benchmark set;
• An application-specific subset of parameters;
• Only the centralized approach is considered; the distributed case requires analysis.
Thank you
Q&A
100-Concepts-of-AI by Anupama Kate .pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 

M3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications

  • 1. M3AT: Monitoring Agents Assignment Model for the Data-Intensive Applications
      - Vladislav Kashansky, Dragi Kimovski, Radu Prodan, Prateek Agrawal, Fabrizio Marozzo, Iuhasz Gabriel, Marek Justyna and Javier Garcia-Blas
      - ASPIDE Collaboration, vladislav@itec.aau.at
      - 28th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
  • 2. Presentation Outline
      - Background Information
      - Monitoring Tools and Techniques
      - Challenges at the Large Scales
      - M3AT Architecture & Formal Model
      - Practical Approach using SCIP Optimization Suite
      - Evaluation
      - Q&A
  • 4. Example of the Technical Challenge
      - The behavior of an application on an HPC/Cloud cluster must be traced or profiled over the next T seconds, with low transition latencies afterwards. It is not known in advance how to select the proper sampling rate or where to place monitoring agents in the current system:
          1. A specific application, representing the workload, is running on the cluster, but it does not run in an isolated environment and is affected by the current network congestion state;
          2. The operator must see all the required information about the processing to identify hotspots. Moreover, the data must be stored for long-term analysis;
          3. Run-time data collection and transport enable analysis while applications are running and while the system is experiencing conditions of interest. Post-processing analysis does not solve problems as they occur in practice [LDMS, SC’2014].
  • 5. Goal and Objectives
      - Project’s Goal: a scalable monitoring system for large-scale data-intensive systems
      - Our Goal: a mathematical model for low-latency monitoring data collection subject to the given I/O policies
      - Objectives:
          1. Analyze existing state-of-the-art monitoring approaches and frameworks, and mathematical methods in the field of combinatorial and discrete optimization;
          2. Propose a mathematical model for efficient monitoring data collection;
          3. Design an architecture that will enable practical evaluation of the proposed model.
  • 6. Contemporary Linux Performance Measurement Tools
  • 7. Contemporary Model-Specific Performance Measurement Tools
      - Cube GUI by Jülich Supercomputing Centre
      - Scalasca Trace Tools by Jülich Supercomputing Centre
      - Vampir by Technische Universität Dresden
      - Periscope by Technische Universität München
      - TAU by University of Oregon
      - Extra-P by Technische Universität Darmstadt
      - Challenge lies in massively parallel data and meta-data requests which overcharge distributed parallel file systems. This is a fundamental problem on highest-scale HPC machines today. (Knüpfer, Andreas, et al. "Score-P: A joint performance measurement run-time infrastructure for Periscope, Scalasca, TAU, and Vampir.")
  • 8. Curse of Dimensionality
      - It is impractical, and often impossible, to measure the performance metrics of large-scale applications and systems globally while still respecting constraints such as I/O limitation policies.
      - Thus, it is critical to identify:
          - The parameters to monitor and the granularity level (e.g., dynamic tracing, profiling, per-node aggregated statistics, per-cluster I/O heatmaps);
          - The measurement interval and the communication patterns in relation to these intervals;
          - The aggregation and pre-processing of performance metrics at a monitor granularity for further analysis.
  • 9. Transition to the larger-scale architecture
      - Agelastos, Anthony, et al. "The lightweight distributed metric service: a scalable infrastructure for continuous monitoring of large scale computing systems and applications." SC14: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2014.
  • 10. Graph Model
      - Figure labels: Exchange Protocol, Access Algorithms; Monitoring Data Vectors; the scalable cluster with unified data space provided by DFS
  • 11. Large-scale Monitoring Architecture
      - The M3AT component controls the assignment of monitoring agents and aggregators;
      - The aggregation and event detection component (AEDC) aggregates monitoring data from the agents and detects possible events. This component is decentralized and runs an instance on every aggregation node;
      - The main analysis component (AC) is centralized and provides the set of analytic tools, including smart monitoring of application performance and bottleneck detection.
  • 12. Mathematical model – M3AT Components
      - The M3AT model aims to identify an optimal assignment of monitoring agents and aggregation points, where the monitoring data needs to be pre-processed for further analysis. Initially, the monitoring agents are selected from the partitions, subject to a given application. Thereafter, we assign to the monitoring agents a subset of the required aggregators, guaranteeing a low response time and a fixed amount of monitoring traffic within the given upper limits.
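The assignment idea described on this slide can be illustrated with a minimal sketch. It assumes a hypothetical response-time cost matrix, hypothetical per-agent traffic volumes, and hypothetical per-aggregator traffic limits; none of these values come from the slides. The exhaustive search is only practical for toy instances and is not the method used in the presentation, which relies on SCIP.

```python
from itertools import product


def best_assignment(cost, traffic, limit):
    """Exhaustively search assignments of monitoring agents to aggregators.

    cost[i][j] : hypothetical response-time cost of agent i -> aggregator j
    traffic[i] : hypothetical monitoring traffic produced by agent i
    limit[j]   : hypothetical upper traffic limit of aggregator j
    Returns (total_cost, assignment) or None if nothing is feasible.
    """
    n_agents, n_aggs = len(cost), len(limit)
    best = None
    # Each agent picks exactly one aggregator, so enumerate all such tuples.
    for assignment in product(range(n_aggs), repeat=n_agents):
        load = [0] * n_aggs
        for agent, agg in enumerate(assignment):
            load[agg] += traffic[agent]
        # Skip assignments that exceed any aggregator's traffic limit.
        if any(load[j] > limit[j] for j in range(n_aggs)):
            continue
        total = sum(cost[i][assignment[i]] for i in range(n_agents))
        if best is None or total < best[0]:
            best = (total, assignment)
    return best


# Toy instance: 3 agents, 2 aggregators (all numbers are made up).
cost = [[2, 5], [4, 1], [3, 4]]
traffic = [4, 3, 2]
limit = [6, 5]
result = best_assignment(cost, traffic, limit)
print(result)
```

The search space grows as (number of aggregators)^(number of agents), which is why an integer programming solver is the natural tool at realistic sizes.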
  • 13. Model’s Limitations
      - A-priori information about the running application is assumed to be available and delivered by the runtime system (e.g., SLURM, Hadoop, Borg);
      - The number and location of monitoring and aggregating agents are decided by the runtime and data management systems;
      - The relevant application performance metrics have already been selected and are treated as the data volume accumulated within the given push interval;
      - The optimal control criterion (objective function) and the set of constraints do not change while the optimization problem is being solved.
  • 14. Formal Mathematical Model
      - Convex Polyhedron
      - Knapsack constraints:
          - Upper limit on the total bandwidth (resource constraints);
      - Assignment constraints:
          - Each monitor is assigned to exactly one aggregator;
      - 0-1 Formulation:
          - Admissible values of the decision variables are constrained to the set {0, 1}.
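The constraint families on this slide match the generalized assignment problem (GAP) that the conclusion slide names. The slide's own formulas did not survive extraction, so the following is a sketch in standard GAP notation, not the authors' exact model. All symbols are assumptions: $n$ monitors, $m$ aggregators, $c_{ij}$ the cost of assigning monitor $i$ to aggregator $j$, $w_i$ the traffic of monitor $i$, and $b_j$ the bandwidth limit of aggregator $j$.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i=1}^{n}\sum_{j=1}^{m} c_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{i=1}^{n} w_{i}\,x_{ij} \le b_{j}, && j = 1,\dots,m
  && \text{(knapsack)} \\
& \sum_{j=1}^{m} x_{ij} = 1, && i = 1,\dots,n
  && \text{(assignment)} \\
& x_{ij} \in \{0,1\}, && i = 1,\dots,n,\; j = 1,\dots,m
  && \text{(0-1)}
\end{aligned}
```

Relaxing $x_{ij} \in \{0,1\}$ to $0 \le x_{ij} \le 1$ gives a feasible region that is a convex polyhedron, which is the LP relaxation that branch-and-bound solvers work with.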
  • 15. Formal Mathematical Model. Matrix and LP Format
      - Matrix Format:
      - LP Format:
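The LP-format listing from this slide is not recoverable, so the following is only a hedged sketch of how a small assignment instance could be written as CPLEX LP-format text. The function name `gap_to_lp`, the variable naming scheme `x_i_j`, and the toy data are all hypothetical, and the sketch assumes non-negative integer coefficients.

```python
def gap_to_lp(cost, traffic, limit):
    """Render a small assignment instance as CPLEX LP-format text.

    Assumes non-negative coefficients; all inputs are illustrative.
    """
    n, m = len(cost), len(limit)

    def var(i, j):
        # Hypothetical naming scheme: x_<monitor>_<aggregator>
        return f"x_{i}_{j}"

    lines = ["Minimize"]
    objective = " + ".join(
        f"{cost[i][j]} {var(i, j)}" for i in range(n) for j in range(m)
    )
    lines.append(f" obj: {objective}")
    lines.append("Subject To")
    # Assignment constraints: each monitor goes to exactly one aggregator.
    for i in range(n):
        lhs = " + ".join(var(i, j) for j in range(m))
        lines.append(f" assign_{i}: {lhs} = 1")
    # Knapsack constraints: traffic per aggregator stays within its limit.
    for j in range(m):
        lhs = " + ".join(f"{traffic[i]} {var(i, j)}" for i in range(n))
        lines.append(f" cap_{j}: {lhs} <= {limit[j]}")
    # 0-1 decision variables.
    lines.append("Binary")
    lines.extend(f" {var(i, j)}" for i in range(n) for j in range(m))
    lines.append("End")
    return "\n".join(lines) + "\n"


lp_text = gap_to_lp([[2, 5], [4, 1]], [4, 3], [6, 5])
print(lp_text)
```

Text like this can be saved to a `.lp` file and loaded by a solver that reads the LP format; in the SCIP interactive shell, for instance, a problem is loaded with `read` and solved with `optimize`.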
  • 16. Data Generation and Limitation Policies
      - We tested the model on a set of 50 assignment problems, sampled from a uniform distribution using random seeds ranging from 100 to 150 and generated by the MT19937 generator of the GNU Scientific Library 2.0.1.
      - We identified the constants for the uniform distribution based on the use-case ecosystem requirements.
      - Each class provides possible I/O limitation scenarios imposed on the given set of aggregators.
      - For example, complexity classes B† and C† model an environment with bandwidth saturation within a given HPC cluster partition by setting limitation policies inversely proportional to the current number of aggregators and the possible amount of traffic in circulation.
      - The complexity class A† is also derived from the application use-cases and allows probabilistic variability in the bandwidth saturation for a given aggregator.
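A seeded generation procedure of this kind might look like the sketch below. It uses Python's `random.Random`, which is also based on the Mersenne Twister, but its output stream will not match the GSL generator used by the authors. The value ranges, the instance size, the `slack` factor, and the limit rule (total traffic divided by the number of aggregators) are hypothetical stand-ins, since the slides do not give the actual constants or class definitions.

```python
import random


def generate_instance(seed, n_agents=6, n_aggs=3,
                      cost_range=(1, 100), traffic_range=(1, 20), slack=1.2):
    """Sample one assignment instance from uniform distributions.

    All ranges, sizes, and the limit rule are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded Mersenne Twister, reproducible
    cost = [[rng.randint(*cost_range) for _ in range(n_aggs)]
            for _ in range(n_agents)]
    traffic = [rng.randint(*traffic_range) for _ in range(n_agents)]
    # Hypothetical limitation policy: each aggregator's limit shrinks as
    # the number of aggregators grows (inversely proportional to n_aggs).
    per_aggregator = slack * sum(traffic) / n_aggs
    limit = [per_aggregator] * n_aggs
    return cost, traffic, limit


# Seeds 100..149 give 50 instances; the exact seed set is an assumption.
instances = [generate_instance(seed) for seed in range(100, 150)]
print(len(instances), instances[0][1])
```

Fixing the seed per instance keeps each generated problem reproducible, so solver runs with different settings can be compared on identical inputs.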
  • 17. SCIP Optimization Suite
      - Provides a fast open-source IP, MIP and MINLP solver;
      - Incorporates:
          - MIP features (cutting planes, LP relaxation);
          - MINLP features;
          - CP features (domain propagation);
          - SAT-solving features (conflict analysis, restarts);
          - A branch-cut-and-price framework;
      - Has a modular structure via plugins;
      - Free for academic purposes;
      - Branch-and-bound based methods can be parallelized in a distributed or shared-memory computing environment.
      - Achterberg, Tobias. "SCIP: solving constraint integer programs." Mathematical Programming Computation 1.1 (2009): 1-41.
  • 18. SCIP Optimization Suite – Solution Output
      - SCIP solution for the 130x80 dimensionality (plots; axis labels: Time (sec), SCIP Relative GAP %, Amount (n))
  • 20. Conclusion
      - To solve this problem, we applied the ILP formalism and reduced the problem to a GAP formulation;
      - We identified the requirements and the parameters for the given model and its solving techniques based on several data-intensive applications, their corresponding ecosystems, and the current state-of-the-art monitoring and profiling techniques;
      - We evaluated the scalability of our model on several data sets of varying complexity, in relation to the specific SCIP precision configuration;
      - The approach scales well when the number of agents is within these boundaries, while showing high sensitivity to the problem scale and the input data.
  • 21. Open question on SCIP optimization suite and auto-tuning
      - There are parameters that affect almost every part of the SCIP solving process.
          - SCIP currently features more than 1600 parameters: boolean, integer, real-valued;
          - Automated parameter tuning could improve SCIP performance;
          - This is currently an active research area for many major OR and AI ecosystems (e.g., CPLEX, Gurobi, GAMS, Coin-OR, TensorFlow);
      - Where to start?
          - The default settings yield good performance on a heterogeneous MIP benchmark set;
          - An application-specific subset of parameters;
          - Only the centralized approach is considered; the distributed case requires further analysis.