Maintaining Non-Stop Services
 with Multi Layer Monitoring

                       Lahav Savir
       System Architect and CEO of Emind Systems
                lahavs@emindsys.com

                   www.emindsys.com
The approach
• Non-stop applications can’t live on their own
• More complex systems require more monitoring
• Proactive system monitoring
    • Customized monitors
    • Monitor each process, component and application
      separately and all as a whole
    • Proactive correction of problems before they become
      noticeable to your customers
    • Allow the application to function at maximum availability
    • SNMP monitoring of application infrastructure
    • Alerting on a potential problem or situation prior to occurrence
    • Visual layered display of the entire data center

Monitoring is not just for
     System Administrators
    but for Developers as well




The goal of monitoring is to
      keep track of the running services 24/7,
        find trouble as early as possible and
        keep you alerted (only when needed…)


    A good monitoring infrastructure gives you
    quick and direct troubleshooting abilities via
      a visual representation of the system status




Multi-Layered Monitoring
All four layers, listed top to bottom, feed one Unified Dashboard:
• Services
    • Keeping SLA, end-to-end service, user experience monitors
• Applications
    • Application proprietary monitors, custom counters
• Operating Systems
    • CPU, Memory, Disk, Network, Processes
• Infrastructure
    • Network connections, network devices, chassis, routers,
      load balancers, firewalls



Visualize the information
• Use Maps & Views
    • Visualize the topology
    • Visualize the application flow
    • Focus on different layers
• Layered views (service & host groups)
    • Network, Hardware, OS, Application
• Different roles look for different info
• Graph Service performance
    • Transactions, Success rates, Cache
• Aggregated view
    • Cluster’s average


Why multi layer?
• Correlated information in one uniform view
    • Network throughput, CPU usage, Application usage
• Generate aggregated reports for different
    machines & layers
• Collect information from all nodes
    • Switches, routers, firewalls, load balancers, storage,
      servers, applications
• Collect different types of data
    • Utilization, throughput, concurrency, cache status
    • Application performance, error rates
• Objective
    • Find the root cause via visuals on the dashboard!
    • Be aware of what’s going on
Unified Dashboard
             Infrastructure
     Network connections, network devices,
    Chassis, Routers, Load Balancers, Firewalls




Infrastructure layer
• Hardware redundancy may be dangerous if you
     don’t keep your eyes on it
     • Administrators do not always see the hardware
     • Redundant hardware can fool you (until it dies)
• Vendor-specific MIBs / Syslog
• Today’s hardware provides detailed status
     interfaces
     • Power supplies, power usage
     • Fans & temperature
     • Disk controllers & drives
     • Switch ports, interfaces
     • Links, connections

HP HW monitoring




Network infrastructure health
[Graphs: link quality, devices utilization, network throughput]




Unified Dashboard
        Operating Systems
     CPU, Memory, Disk, Network, Processes




Operating System health
• Monitoring of OS essentials
      • CPU, Memory, Disk I/O, Network traffic,
        processes, services
• Use cases
      • Failed log cleanup > disks full
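
The log-cleanup use case can be caught before the disk fills by alerting on a usage threshold. What follows is a minimal sketch, assuming POSIX `df -P` output. The function names `df_used_pct` and `disk_state`, the path and the 90% limit are illustrative and not taken from the slides.

```shell
#!/bin/sh
# Read `df -P <path>` output on stdin and print the used percentage as a bare integer.
# With -P each filesystem is reported on one line and column 5 is the capacity, e.g. "42%".
df_used_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Compare a used percentage ($1) with a limit ($2) and print the resulting state.
disk_state() {
    if [ "$1" -ge "$2" ]; then
        echo "CRITICAL: ${1}% used (limit ${2}%)"
    else
        echo "OK: ${1}% used (limit ${2}%)"
    fi
}

# Typical call from cron or a monitoring plugin (path and limit are assumptions):
#   disk_state "$(df -P /var/log | df_used_pct)" 90
```

Keeping the parsing and the threshold decision in separate functions lets the same check be exposed through SNMP, a status file or a plain command line.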




Unified Dashboard
           Applications
     Application proprietary monitors,
             Custom counters




Applications health
• Transaction counters
     • Measure transaction rates
     • Measure counters on input & output
• % Success rates
     • Success counters for primary operations
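
Both bullets reduce to arithmetic on counters sampled at two points in time. The sketch below assumes the application exposes ever-increasing integer counters. The function names `tx_rate` and `success_pct` are hypothetical.

```shell
#!/bin/sh
# Transactions per second between two samples of an ever-increasing counter.
# $1 = earlier sample, $2 = later sample, $3 = seconds between the samples
tx_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Success rate in whole percent (integer division, so 99.5 is reported as 99).
# $1 = successful operations, $2 = all operations in the same interval
success_pct() {
    if [ "$2" -le 0 ]; then
        echo "n/a"        # no traffic in the interval: the rate is undefined
    else
        echo $(( $1 * 100 / $2 ))
    fi
}
```

For example, `tx_rate 1000 1600 60` prints `10`, and `success_pct 995 1000` prints `99`.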




System input/output monitoring

        Outlined topology




Applications health
• Queues
     • Processing backlog
     • Semaphores / throttle usage
• Latency
     • Measure the time it takes to process request / data chunk
• Optimizations
     • Measure compression rates




Applications health
• DB synchronizations
     • Replication status
     • Replication backlog & delays




Applications health
• Cluster & topology monitoring
     • Track application topology changes
     • Indicate dependencies status




Application topology

[Diagram: outlined topology showing load balancers, front-ends,
 firewalls, routers and transports]




Unified Dashboard
           Services
     Keeping SLA, End-to-end service,
        User experience monitors
Cluster overall QoS
• Now it’s a cluster: look at aggregated counters
• You want to know what users are experiencing




User Experience & Service QoS
• Simulate user behavior
     • Latency
       • How long it takes to log in
       • How long it takes to send and
         receive a message
       • How long it takes to “check out”
     • Success rates
       • What’s the success rate of the
         user’s operation
     • Download speed
       • What’s the download speed
         from different locations
       • What’s your Content Delivery
         Network (CDN) performance
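
A synthetic user check can be as small as one timed HTTP request. The sketch below assumes `curl` is installed and uses its `-w` write-out variables `%{http_code}` and `%{time_total}`. The function names `probe` and `classify`, the 10-second timeout and the thresholds are assumptions made for illustration.

```shell
#!/bin/sh
# Fetch URL $1 and print "<http_code> <total_seconds>"; the response body is discarded.
probe() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code} %{time_total}\n' "$1"
}

# Classify one probe result.
# $1 = HTTP status code, $2 = elapsed seconds, $3 = slowness threshold in seconds
classify() {
    case "$1" in
        2??|3??) ;;                      # 2xx and 3xx count as a successful operation
        *) echo "FAIL"; return ;;        # anything else, including 000 (no response)
    esac
    awk -v t="$2" -v max="$3" 'BEGIN { print ((t + 0 > max + 0) ? "SLOW" : "OK") }'
}

# Possible use (the URL is a placeholder):
#   set -- $(probe "https://www.example.com/login")
#   classify "$1" "$2" 1.5
```

Running the same probe from several locations gives the per-location latency and success-rate figures listed above.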
Recommendations
• Build a generic monitoring infrastructure with
     generic tools and interfaces
• Use embedded SNMP
• Net-SNMP is extensible (also on Windows)
     • PROXY: proxy requests to another (embedded) SNMP agent
     proxy -v 2c -c public 127.0.0.1:50910 1.3.6.1.4.1.15867.2000.3.6 1.3.6.1.4.1.15867.2000.3.6
     • PASS: STDOUT-based subagent
     pass .1.3.6.1.4.1.15867.2001 /bin/sh /usr/local/ixi/GenericSubAgent.sh
     • EXEC: run a script
     exec .1.3.6.1.4.1.15867.1100.20.10 axs-imap-test-stat /bin/bash /opt/mas/scripts/imap_tester.sh last_state

• SSH / Telnet
Using Status files
• Perfect for batch operations
     • perl, python, php
• Status file
     TIMESTAMP:1276203703
     STATUS:0
     HOSTNAME:myserver

• Observer
     # max_d_s is the age limit in seconds; max_delta is only used in the message
     if [ "$(get_time_delta "${file}")" -gt "${max_d_s}" ]; then
          err "Delta is greater than ${max_delta} hours"
          return
     fi
     if [ "$(parse_status_file "${file}" STATUS)" != "0" ]; then
          err "Last backup status is not 0"
          return
     fi
     echo "${ok}"
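
The observer calls `get_time_delta` and `parse_status_file` without showing them, and the writer side is not shown either. The sketch below is one possible way to fill those gaps, assuming the `KEY:value` layout of the status file above. The implementations and the name `write_status_file` are assumptions, not the author's code.

```shell
#!/bin/sh
# Writer side: a batch job records when it last finished and how it ended.
# $1 = status file path, $2 = exit status of the job
write_status_file() {
    {
        echo "TIMESTAMP:$(date +%s)"
        echo "STATUS:$2"
        echo "HOSTNAME:$(uname -n)"
    } > "$1.tmp" && mv "$1.tmp" "$1"    # rename so a reader never sees a partial file
}

# Print the value stored under key $2 in status file $1.
parse_status_file() {
    sed -n "s/^$2://p" "$1" | head -n 1
}

# Print the age, in seconds, of the TIMESTAMP recorded in status file $1.
get_time_delta() {
    echo $(( $(date +%s) - $(parse_status_file "$1" TIMESTAMP) ))
}
```

A backup script would end with something like `write_status_file /var/run/backup.status $?`, where the path is a placeholder.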

Command line based info.
• Command line applications
      • DB, Softswitch, Etc.
• Example
      • snmpd.conf
     pass .1.3.6.1.4.1.15867.1.100 /bin/bash /usr/local/emind/replication_status.sh


      • Run the script replication_status.sh
         • Execute SQL query: SHOW SLAVE STATUS\G
         • Parse the output
         • Return data to snmpd
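
The three script steps above can be sketched as follows. With a `pass` line, snmpd runs the script with `-g OID` for GET or `-n OID` for GETNEXT and expects three lines on stdout: OID, type, value. The sub-OID `.1`, the function names, the choice of `Seconds_Behind_Master` as the reported value and the `-1` placeholder for `NULL` are assumptions for illustration; the real `replication_status.sh` is not shown in the slides.

```shell
#!/bin/sh
BASE=.1.3.6.1.4.1.15867.1.100     # OID registered by the pass line in snmpd.conf
LAG_OID=$BASE.1                   # hypothetical sub-OID chosen for this sketch

# Read `SHOW SLAVE STATUS\G` output on stdin and print the replication delay.
# NULL (replication not running) is reported as -1.
parse_lag() {
    awk -F': ' '/Seconds_Behind_Master/ { print ($2 == "NULL" ? -1 : $2); exit }'
}

# Answer one pass request: $1 = -g or -n, $2 = requested OID, $3 = lag value.
# Printing nothing tells snmpd that no object matches the request.
answer() {
    case "$1:$2" in
        -g:"$LAG_OID" | -n:"$BASE")
            printf '%s\ninteger\n%s\n' "$LAG_OID" "$3" ;;
    esac
}

# Entry point when called by snmpd (assumes mysql credentials come from ~/.my.cnf):
#   answer "$1" "$2" "$(mysql -e 'SHOW SLAVE STATUS\G' | parse_lag)"
```

Separating parsing from the protocol reply keeps the script testable without a database or a running snmpd.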



Important to remember
• Define your goals: what’s right for you?
     • Don’t over-monitor
     • Use methodologies and technologies that fit your network
       and needs, not the other way around.
• Build generic interfaces
     • SNMP
     • Simple command line
     • No proprietary protocols
     • Agnostic to the monitoring tool




Mobile Access




Leading tools – Open source first…
• Nagios
     • Very generic, lots of public plug-ins
     • Easy to tweak and build in your own style
• Zabbix
     • A complete monitoring solution
     • Less customizable
• Cacti
     • Graphing (RRDtool)
     • Easy to configure
     • Lots of public templates



Monitoring on the Cloud
• Nagios + Dynamic Configuration = Dynagios!
• Key features
     • Auto provisioning
     • Add, Remove, Suspend, Unsuspend
• Machines are monitored based on their
     predefined profiles
• Machines can join / leave the monitor
     (purposely)
     • Join on boot
     • Leave on shutdown
     • If a crash happens, an alert is raised


Commercial tools

Feature                                           ManageEngine       Orion
OS requirements                                   Windows & Linux    Windows
Modular (applications, IP SLA, NetFlow, Conf.)    Yes                Yes
Very easy to deploy                               Yes                Yes
Multi-vendor with lots of templates               Yes                Yes
Maps                                              Yes                Yes
Less customizable … but still flexible            Yes                Yes
Est. price for 100 devices                        $10k               $15k




Questions?




34

Más contenido relacionado

La actualidad más candente

Cloud Automation Manager
Cloud Automation ManagerCloud Automation Manager
Cloud Automation ManagerNithin Babu
 
[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7Site24x7
 
Service fabric overview
Service fabric overviewService fabric overview
Service fabric overviewHimanshu Desai
 
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at UberWSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at UberWSO2
 
Cloudsolutionday 2016: Getting Started with Severless Architecture
Cloudsolutionday 2016: Getting Started with Severless ArchitectureCloudsolutionday 2016: Getting Started with Severless Architecture
Cloudsolutionday 2016: Getting Started with Severless ArchitectureAWS Vietnam Community
 
Enterprise Beacon Object Hive - Siebel Version Control
Enterprise Beacon Object Hive - Siebel Version ControlEnterprise Beacon Object Hive - Siebel Version Control
Enterprise Beacon Object Hive - Siebel Version ControlMilind Waikul
 
Server Monitoring from the Cloud
Server Monitoring from the CloudServer Monitoring from the Cloud
Server Monitoring from the CloudSite24x7
 
How Applications Manager helps with application performance monitoring
How Applications Manager helps with application performance monitoringHow Applications Manager helps with application performance monitoring
How Applications Manager helps with application performance monitoringManageEngine, Zoho Corporation
 
Docker productivity boost
Docker productivity boostDocker productivity boost
Docker productivity boostIgor Dmitriev
 
#DFWVMUG - Automating the Next Generation Datacenter
#DFWVMUG - Automating the Next Generation Datacenter#DFWVMUG - Automating the Next Generation Datacenter
#DFWVMUG - Automating the Next Generation DatacenterJosh Atwell
 
Automating the Next Generation Datacenter
Automating the Next Generation DatacenterAutomating the Next Generation Datacenter
Automating the Next Generation DatacenterJosh Atwell
 
Performance management
Performance managementPerformance management
Performance managementAlan Lok
 
PMIx Tiered Storage Support
PMIx Tiered Storage SupportPMIx Tiered Storage Support
PMIx Tiered Storage Supportrcastain
 
Getting Started with Infrastructure as Code (IaC)
Getting Started with Infrastructure as Code (IaC)Getting Started with Infrastructure as Code (IaC)
Getting Started with Infrastructure as Code (IaC)Noor Basha
 
Tokyo Azure Meetup #5 - Microservices and Azure Service Fabric
Tokyo Azure Meetup #5 - Microservices and Azure Service FabricTokyo Azure Meetup #5 - Microservices and Azure Service Fabric
Tokyo Azure Meetup #5 - Microservices and Azure Service FabricTokyo Azure Meetup
 
Steampunk App Servers in
Steampunk App Servers in Steampunk App Servers in
Steampunk App Servers in Chris Haddad
 
10 Key Steps for Moving from Legacy Infrastructure to the Cloud
10 Key Steps for Moving from Legacy Infrastructure to the Cloud10 Key Steps for Moving from Legacy Infrastructure to the Cloud
10 Key Steps for Moving from Legacy Infrastructure to the CloudNGINX, Inc.
 

La actualidad más candente (20)

Cloud Automation Manager
Cloud Automation ManagerCloud Automation Manager
Cloud Automation Manager
 
[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7[Webinar] AWS Monitoring with Site24x7
[Webinar] AWS Monitoring with Site24x7
 
Service fabric overview
Service fabric overviewService fabric overview
Service fabric overview
 
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at UberWSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
 
Cloudsolutionday 2016: Getting Started with Severless Architecture
Cloudsolutionday 2016: Getting Started with Severless ArchitectureCloudsolutionday 2016: Getting Started with Severless Architecture
Cloudsolutionday 2016: Getting Started with Severless Architecture
 
Enterprise Beacon Object Hive - Siebel Version Control
Enterprise Beacon Object Hive - Siebel Version ControlEnterprise Beacon Object Hive - Siebel Version Control
Enterprise Beacon Object Hive - Siebel Version Control
 
Azure Reference Architectures
Azure Reference ArchitecturesAzure Reference Architectures
Azure Reference Architectures
 
Server Monitoring from the Cloud
Server Monitoring from the CloudServer Monitoring from the Cloud
Server Monitoring from the Cloud
 
How Applications Manager helps with application performance monitoring
How Applications Manager helps with application performance monitoringHow Applications Manager helps with application performance monitoring
How Applications Manager helps with application performance monitoring
 
Devops as a service
Devops as a serviceDevops as a service
Devops as a service
 
Docker productivity boost
Docker productivity boostDocker productivity boost
Docker productivity boost
 
#DFWVMUG - Automating the Next Generation Datacenter
#DFWVMUG - Automating the Next Generation Datacenter#DFWVMUG - Automating the Next Generation Datacenter
#DFWVMUG - Automating the Next Generation Datacenter
 
Automating the Next Generation Datacenter
Automating the Next Generation DatacenterAutomating the Next Generation Datacenter
Automating the Next Generation Datacenter
 
Performance management
Performance managementPerformance management
Performance management
 
PMIx Tiered Storage Support
PMIx Tiered Storage SupportPMIx Tiered Storage Support
PMIx Tiered Storage Support
 
kaushal resume1
kaushal resume1kaushal resume1
kaushal resume1
 
Getting Started with Infrastructure as Code (IaC)
Getting Started with Infrastructure as Code (IaC)Getting Started with Infrastructure as Code (IaC)
Getting Started with Infrastructure as Code (IaC)
 
Tokyo Azure Meetup #5 - Microservices and Azure Service Fabric
Tokyo Azure Meetup #5 - Microservices and Azure Service FabricTokyo Azure Meetup #5 - Microservices and Azure Service Fabric
Tokyo Azure Meetup #5 - Microservices and Azure Service Fabric
 
Steampunk App Servers in
Steampunk App Servers in Steampunk App Servers in
Steampunk App Servers in
 
10 Key Steps for Moving from Legacy Infrastructure to the Cloud
10 Key Steps for Moving from Legacy Infrastructure to the Cloud10 Key Steps for Moving from Legacy Infrastructure to the Cloud
10 Key Steps for Moving from Legacy Infrastructure to the Cloud
 

Similar a Multi Layer Monitoring V1

Lecture 12 monitoring the network
Lecture 12   monitoring the networkLecture 12   monitoring the network
Lecture 12 monitoring the networkWiliam Ferraciolli
 
RPASS - Ricoh Proactive ServiceS for Remote Monitoring & Backup
RPASS - Ricoh Proactive ServiceS for Remote Monitoring & Backup RPASS - Ricoh Proactive ServiceS for Remote Monitoring & Backup
RPASS - Ricoh Proactive ServiceS for Remote Monitoring & Backup Ricoh India Limited
 
Best practices in Deploying SUSE CaaS Platform v3
Best practices in Deploying SUSE CaaS Platform v3Best practices in Deploying SUSE CaaS Platform v3
Best practices in Deploying SUSE CaaS Platform v3Juan Herrera Utande
 
Monitoring federation open stack infrastructure
Monitoring federation open stack infrastructureMonitoring federation open stack infrastructure
Monitoring federation open stack infrastructureFernando Lopez Aguilar
 
Kaseya Connect 2012 - THE ABC'S OF MONITORING
Kaseya Connect 2012 - THE ABC'S OF MONITORINGKaseya Connect 2012 - THE ABC'S OF MONITORING
Kaseya Connect 2012 - THE ABC'S OF MONITORINGKaseya
 
Simplifying DCIM with OP Manager - DCW'17
Simplifying DCIM with OP Manager - DCW'17Simplifying DCIM with OP Manager - DCW'17
Simplifying DCIM with OP Manager - DCW'17PG Software Europe
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance ManagementNoriaki Tatsumi
 
12 Factor App Methodology
12 Factor App Methodology12 Factor App Methodology
12 Factor App Methodologylaeshin park
 
RuSIEM overview (english version)
RuSIEM overview (english version)RuSIEM overview (english version)
RuSIEM overview (english version)Olesya Shelestova
 
Zenoss presentation (nur nabilah hassan)
Zenoss presentation (nur nabilah hassan)Zenoss presentation (nur nabilah hassan)
Zenoss presentation (nur nabilah hassan)Nur Nabilah Hassan
 
Lahav Savir - Massively Scaleable Mobile Gateways
Lahav Savir - Massively Scaleable Mobile GatewaysLahav Savir - Massively Scaleable Mobile Gateways
Lahav Savir - Massively Scaleable Mobile GatewaysLahav Savir
 
Cisco Standard Network Platform (SNP) - Catholic Relief Services Case Study
Cisco Standard Network Platform (SNP) - Catholic Relief Services Case StudyCisco Standard Network Platform (SNP) - Catholic Relief Services Case Study
Cisco Standard Network Platform (SNP) - Catholic Relief Services Case Studynicholas njoroge
 

Similar a Multi Layer Monitoring V1 (20)

MobZabbix.pptx
MobZabbix.pptxMobZabbix.pptx
MobZabbix.pptx
 
Lecture 12 monitoring the network
Lecture 12   monitoring the networkLecture 12   monitoring the network
Lecture 12 monitoring the network
 
RPASS - Ricoh Proactive ServiceS for Remote Monitoring & Backup
RPASS - Ricoh Proactive ServiceS for Remote Monitoring & Backup RPASS - Ricoh Proactive ServiceS for Remote Monitoring & Backup
RPASS - Ricoh Proactive ServiceS for Remote Monitoring & Backup
 
Best practices in Deploying SUSE CaaS Platform v3
Best practices in Deploying SUSE CaaS Platform v3Best practices in Deploying SUSE CaaS Platform v3
Best practices in Deploying SUSE CaaS Platform v3
 
Monitoring federation open stack infrastructure
Monitoring federation open stack infrastructureMonitoring federation open stack infrastructure
Monitoring federation open stack infrastructure
 
QNX Sales Engineering Presentation
QNX Sales Engineering PresentationQNX Sales Engineering Presentation
QNX Sales Engineering Presentation
 
Kaseya Connect 2012 - THE ABC'S OF MONITORING
Kaseya Connect 2012 - THE ABC'S OF MONITORINGKaseya Connect 2012 - THE ABC'S OF MONITORING
Kaseya Connect 2012 - THE ABC'S OF MONITORING
 
Simplifying DCIM with OP Manager - DCW'17
Simplifying DCIM with OP Manager - DCW'17Simplifying DCIM with OP Manager - DCW'17
Simplifying DCIM with OP Manager - DCW'17
 
Application Performance Management
Application Performance ManagementApplication Performance Management
Application Performance Management
 
Simics - Break the Rules of Product Development
Simics - Break the Rules of Product DevelopmentSimics - Break the Rules of Product Development
Simics - Break the Rules of Product Development
 
12 Factor App Methodology
12 Factor App Methodology12 Factor App Methodology
12 Factor App Methodology
 
Introduction to SDN
Introduction to SDNIntroduction to SDN
Introduction to SDN
 
Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)
 
Introductionto SDN
Introductionto SDN Introductionto SDN
Introductionto SDN
 
RuSIEM overview (english version)
RuSIEM overview (english version)RuSIEM overview (english version)
RuSIEM overview (english version)
 
Zenoss presentation (nur nabilah hassan)
Zenoss presentation (nur nabilah hassan)Zenoss presentation (nur nabilah hassan)
Zenoss presentation (nur nabilah hassan)
 
Lahav Savir - Massively Scaleable Mobile Gateways
Lahav Savir - Massively Scaleable Mobile GatewaysLahav Savir - Massively Scaleable Mobile Gateways
Lahav Savir - Massively Scaleable Mobile Gateways
 
Cisco Standard Network Platform (SNP) - Catholic Relief Services Case Study
Cisco Standard Network Platform (SNP) - Catholic Relief Services Case StudyCisco Standard Network Platform (SNP) - Catholic Relief Services Case Study
Cisco Standard Network Platform (SNP) - Catholic Relief Services Case Study
 
Computer Fundamentals
Computer FundamentalsComputer Fundamentals
Computer Fundamentals
 
Computer fundamental
Computer fundamentalComputer fundamental
Computer fundamental
 

Más de Lahav Savir

How to Secure Your AWS Powered Mobile App End-to-End
How to Secure Your AWS Powered Mobile App End-to-EndHow to Secure Your AWS Powered Mobile App End-to-End
How to Secure Your AWS Powered Mobile App End-to-EndLahav Savir
 
Best of re:Invent 2016 meetup presentation
Best of re:Invent 2016 meetup presentationBest of re:Invent 2016 meetup presentation
Best of re:Invent 2016 meetup presentationLahav Savir
 
How to protect your IoT data on AWS
How to protect your IoT data on AWSHow to protect your IoT data on AWS
How to protect your IoT data on AWSLahav Savir
 
How to Protect your AWS Environment
How to Protect your AWS EnvironmentHow to Protect your AWS Environment
How to Protect your AWS EnvironmentLahav Savir
 
Emind’s Architecture for Enterprise with AWS Integration
Emind’s Architecture for Enterprise with AWS IntegrationEmind’s Architecture for Enterprise with AWS Integration
Emind’s Architecture for Enterprise with AWS IntegrationLahav Savir
 
Real-Time Vote Platform Benchmark
Real-Time Vote Platform BenchmarkReal-Time Vote Platform Benchmark
Real-Time Vote Platform BenchmarkLahav Savir
 
Running an erlang based messaging system on AWS
Running an erlang based messaging system on AWSRunning an erlang based messaging system on AWS
Running an erlang based messaging system on AWSLahav Savir
 
Deploying secure backup on to the Cloud
Deploying secure backup on to the CloudDeploying secure backup on to the Cloud
Deploying secure backup on to the CloudLahav Savir
 
סע לשלום - הדרכה לרכזים כיתתיים
סע לשלום - הדרכה לרכזים כיתתייםסע לשלום - הדרכה לרכזים כיתתיים
סע לשלום - הדרכה לרכזים כיתתייםLahav Savir
 

Más de Lahav Savir (9)

How to Secure Your AWS Powered Mobile App End-to-End
How to Secure Your AWS Powered Mobile App End-to-EndHow to Secure Your AWS Powered Mobile App End-to-End
How to Secure Your AWS Powered Mobile App End-to-End
 
Best of re:Invent 2016 meetup presentation
Best of re:Invent 2016 meetup presentationBest of re:Invent 2016 meetup presentation
Best of re:Invent 2016 meetup presentation
 
How to protect your IoT data on AWS
How to protect your IoT data on AWSHow to protect your IoT data on AWS
How to protect your IoT data on AWS
 
How to Protect your AWS Environment
How to Protect your AWS EnvironmentHow to Protect your AWS Environment
How to Protect your AWS Environment
 
Emind’s Architecture for Enterprise with AWS Integration
Emind’s Architecture for Enterprise with AWS IntegrationEmind’s Architecture for Enterprise with AWS Integration
Emind’s Architecture for Enterprise with AWS Integration
 
Real-Time Vote Platform Benchmark
Real-Time Vote Platform BenchmarkReal-Time Vote Platform Benchmark
Real-Time Vote Platform Benchmark
 
Running an erlang based messaging system on AWS
Running an erlang based messaging system on AWSRunning an erlang based messaging system on AWS
Running an erlang based messaging system on AWS
 
Deploying secure backup on to the Cloud
Deploying secure backup on to the CloudDeploying secure backup on to the Cloud
Deploying secure backup on to the Cloud
 
סע לשלום - הדרכה לרכזים כיתתיים
סע לשלום - הדרכה לרכזים כיתתייםסע לשלום - הדרכה לרכזים כיתתיים
סע לשלום - הדרכה לרכזים כיתתיים
 

Último

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Último (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

Multi Layer Monitoring V1

  • 1. Maintaining Non-Stop Services with Multi Layer Monitoring
      Lahav Savir, System Architect and CEO of Emind Systems
      lahavs@emindsys.com, www.emindsys.com
  • 2. The approach
      • Non-stop applications can’t live on their own
      • More complex systems require more monitoring
      • Proactive system monitoring
          • Customized monitors
          • Monitor each process, component and application separately, and all of them as a whole
          • Correct problems proactively, before your customers notice them
          • Allow the application to function at maximum availability
          • SNMP monitoring of the application infrastructure
          • Alert on a potential problem or situation before it occurs
          • Visual layered display of the entire data center
  • 3. Monitoring is not just for System Administrators but for Developers as well
  • 4. The goal of monitoring is to keep track of the running services 24/7, find trouble as early as possible and keep you alerted (only when needed…). A good monitoring infrastructure gives you quick, direct troubleshooting through a visual representation of the system status.
  • 5. Multi-Layered Monitoring (all layers feed the Unified Dashboard)
      • Services: keeping SLA, end-to-end service, user experience monitors
      • Applications: application proprietary monitors, custom counters
      • Operating Systems: CPU, memory, disk, network, processes
      • Infrastructure: network connections, network devices, chassis, routers, load balancers, firewalls
  • 6. Visualize the information
      • Use Maps & Views
          • Visualize the topology
          • Visualize the application flow
      • Focus on different layers
          • Layered views (service & host groups)
          • Network, Hardware, OS, Application
          • Different roles look for different info
      • Graph service performance
          • Transactions, success rates, cache
      • Aggregated view
          • Cluster’s average
  • 7. Why multi layer?
      • Correlated information in one uniform view
          • Network throughput, CPU usage, application usage
          • Generate aggregated reports for different machines & layers
      • Collect information from all nodes
          • Switches, routers, firewalls, load balancers, storage, servers, applications
      • Collect different types of data
          • Utilization, throughput, concurrency, cache status
          • Application performance, error rates
      • Objective
          • Find the root cause via visuals on the dashboard!
          • Be aware of what’s going on
  • 8. Unified Dashboard: Infrastructure (network connections, network devices, chassis, routers, load balancers, firewalls)
  • 9. Infrastructure layer
      • Hardware redundancy may be dangerous if you don’t keep your eyes on it
          • Administrators do not always see the hardware
          • Redundant hardware can fool you (until it dies)
      • Vendor-specific MIBs / Syslog
      • Today’s hardware provides detailed status interfaces
          • Power supplies, power usage
          • Fans & temperature
          • Disk controllers & drives
          • Switch ports, interfaces
          • Links, connections
  • 11. Network infrastructure health
      • Link quality
      • Device utilization
      • Network throughput
  • 12. Unified Dashboard: Operating Systems (CPU, memory, disk, network, processes)
  • 13. Operating System health
      • Monitoring of OS essentials
          • CPU, memory, disk I/O, network traffic, processes, services
      • Use cases
          • A failed log cleanup leads to full disks
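The full-disk use case can be caught with a very small check. The sketch below is an assumption-laden illustration, not taken from the slides: the 80% and 90% thresholds and the function names are made up, and the exit codes follow the common Nagios plugin convention (0 OK, 1 warning, 2 critical).

```python
import shutil

# Nagios-style plugin status codes (a widely used convention).
OK, WARNING, CRITICAL = 0, 1, 2


def classify_usage(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a disk-usage percentage to a status code (thresholds are illustrative)."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def disk_used_pct(path="."):
    """Return the percentage in use on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    pct = disk_used_pct(".")
    print(f"disk usage {pct:.1f}%, status={classify_usage(pct)}")
```

Run from cron or from the monitoring tool, a check like this raises a warning well before a stalled log cleanup fills the disk.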
  • 14. Unified Dashboard: Applications (application proprietary monitors, custom counters)
  • 15. Applications health
      • Transaction counters
          • Measure transaction rates
          • Measure counters on input & output
      • % Success rates
          • Success counters for primary operations
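Transaction rates and success percentages are usually derived from two readings of monotonically increasing counters. A minimal sketch, assuming the application exposes a "received" and a "succeeded" counter (both names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds since the epoch
    received: int     # transactions accepted on input (hypothetical counter)
    succeeded: int    # transactions completed successfully (hypothetical counter)


def rate_and_success(prev, curr):
    """Return (transactions per second, success percentage) between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    delta_in = curr.received - prev.received
    delta_ok = curr.succeeded - prev.succeeded
    if elapsed <= 0 or delta_in < 0 or delta_ok < 0:
        # Samples out of order, or the application restarted and reset its counters.
        raise ValueError("samples out of order or counter reset")
    rate = delta_in / elapsed
    # With no traffic in the interval there is nothing that could have failed.
    success_pct = 100.0 * delta_ok / delta_in if delta_in else 100.0
    return rate, success_pct
```

Working on deltas rather than absolute values keeps the numbers meaningful regardless of how long the process has been running.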
  • 16. System input/output monitoring (outlined topology)
  • 17. Applications health
      • Queues
          • Processing backlog
          • Semaphores / throttle usage
      • Latency
          • Measure the time it takes to process a request / data chunk
      • Optimizations
          • Measure compression rates
  • 18. Applications health
      • DB synchronization
          • Replication status
          • Replication backlog & delays
  • 19. Applications health
      • Cluster & topology monitoring
          • Track application topology changes
          • Indicate the status of dependencies
  • 20. Application topology (outlined topology): Front-ends, Routers, Transports, Load Balancers, Firewalls
  • 21. Unified Dashboard: Services (keeping SLA, end-to-end service, user experience monitors)
  • 22. Cluster overall QoS
      • The system is now a cluster, so look at aggregated counters
      • The goal is to know what users are experiencing
  • 23. User Experience & Service QoS
      • Simulate user behavior
      • Latency
          • How long it takes to log in
          • How long it takes to send and receive a message
          • How long it takes to “check out”
      • Success rates
          • What’s the success rate of the user’s operation
      • Download speed
          • What’s the download speed from different locations
          • What’s your Content Delivery Network (CDN) performance
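Simulating user behavior boils down to running a scripted operation repeatedly and recording how often it succeeds and how long it takes. A rough sketch; the `probe` helper and its parameters are illustrative, and `operation` stands for any callable that performs one user action (log in, send a message, check out) and raises on failure:

```python
import time


def probe(operation, attempts=5):
    """Run `operation` `attempts` times (attempts > 0).

    Returns (success percentage, average latency in seconds of the
    successful attempts, or None when every attempt failed).
    """
    ok_latencies = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            operation()
        except Exception:
            continue  # failed attempt: counted in the total, not in the latency
        ok_latencies.append(time.perf_counter() - start)
    success_pct = 100.0 * len(ok_latencies) / attempts
    avg_latency = sum(ok_latencies) / len(ok_latencies) if ok_latencies else None
    return success_pct, avg_latency
```

Running the same probe from several locations gives the per-location view that the download-speed and CDN bullets ask for.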
  • 24. Recommendations
      • Build a generic monitoring infrastructure with generic tools and interfaces
      • Use embedded SNMP
      • Net-snmp is extensible (also on Windows)
          • PROXY: proxy requests to another (embedded) SNMP agent
              proxy -v 2c -c public 127.0.0.1:50910 1.3.6.1.4.1.15867.2000.3.6 1.3.6.1.4.1.15867.2000.3.6
          • PASS: STDOUT-based subagent
              pass .1.3.6.1.4.1.15867.2001 /bin/sh /usr/local/ixi/GenericSubAgent.sh
          • EXEC: run a script
              exec .1.3.6.1.4.1.15867.1100.20.10 axs-imap-test-stat /bin/bash /opt/mas/scripts/imap_tester.sh last_state
      • SSH / Telnet
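For a PASS entry, snmpd runs the configured command with `-g OID` for a GET or `-n OID` for a GETNEXT. It expects three lines on standard output (OID, type, value) and treats empty output as "no such object". The sketch below is a read-only handler under that protocol. It is not the GenericSubAgent.sh from the slide, and the two sub-OIDs and their values are invented for illustration.

```python
import sys

BASE = ".1.3.6.1.4.1.15867.2001"  # subtree from the slide's PASS example

# Hypothetical objects; a real script would read them from the application.
VALUES = {
    BASE + ".1.0": ("integer", "0"),
    BASE + ".2.0": ("string", "myserver"),
}


def oid_key(oid):
    """Turn a dotted OID into a tuple of ints so OIDs sort numerically."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def handle(flag, oid):
    """Return the reply lines for a request, or [] when nothing matches."""
    if flag == "-g":
        target = oid if oid in VALUES else None
    elif flag == "-n":
        ordered = sorted(VALUES, key=oid_key)
        target = next((o for o in ordered if oid_key(o) > oid_key(oid)), None)
    else:
        target = None  # SET (-s) is not supported in this read-only sketch
    if target is None:
        return []
    kind, value = VALUES[target]
    return [target, kind, value]


if __name__ == "__main__" and len(sys.argv) >= 3:
    for line in handle(sys.argv[1], sys.argv[2]):
        print(line)
```

Keeping the lookup in `handle` and the printing in the entry point makes the script testable without a running snmpd.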
  • 25. Using status files
      • Perfect for batch operations
          • perl, python, php
      • Status file
          TIMESTAMP:1276203703
          STATUS:0
          HOSTNAME:myserver
      • Observer
          # Fragment of a shell function; get_time_delta, parse_status_file and err are helpers not shown on the slide.
          if [ $(get_time_delta ${file}) -gt ${max_d_s} ]; then
              err "Delta is greater than ${max_delta} hours"
              return
          fi
          if [ "$(parse_status_file ${file} STATUS)" != "0" ]; then
              err "Last backup status is not 0"
              return
          fi
          echo ${ok}
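The writer side of a status file can be equally small. The sketch below writes the three KEY:VALUE fields shown on the slide and mirrors the observer's two checks (file too old, non-zero status). The function names and the write-then-rename step are assumptions added here, not part of the original.

```python
import os
import socket
import tempfile
import time


def write_status_file(path, status, now=None):
    """Write TIMESTAMP/STATUS/HOSTNAME lines, replacing any previous file."""
    now = int(time.time() if now is None else now)
    body = f"TIMESTAMP:{now}\nSTATUS:{status}\nHOSTNAME:{socket.gethostname()}\n"
    # Write to a temporary file and rename it, so the observer never
    # reads a half-written status file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    os.replace(tmp_path, path)


def parse_status_file(path):
    """Return the KEY:VALUE lines of a status file as a dict of strings."""
    fields = {}
    with open(path) as handle:
        for line in handle:
            key, sep, value = line.strip().partition(":")
            if sep:
                fields[key] = value
    return fields


def check(path, max_age_seconds, now=None):
    """Return 'stale', 'failed' or 'ok', in the same order as the observer."""
    now = time.time() if now is None else now
    fields = parse_status_file(path)
    if now - int(fields["TIMESTAMP"]) > max_age_seconds:
        return "stale"
    if fields["STATUS"] != "0":
        return "failed"
    return "ok"
```

A batch job would call `write_status_file` as its last step. The observer then only needs the file, not any knowledge of the job itself.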
  • 26. Command-line-based info
      • Command line applications
          • DB, Softswitch, etc.
      • Example
          • snmpd.conf
              pass .1.3.6.1.4.1.15867.1.100 /bin/bash /usr/local/emind/replication_status.sh
          • Run the script replication_status.sh
              • Execute the SQL query: SHOW SLAVE STATUS\G
              • Parse the output
              • Return the data to snmpd
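The "parse the output" step can be sketched as follows. `SHOW SLAVE STATUS\G` prints one `Field: value` pair per line; the three field names used here are standard MySQL ones. The sample text is hand-written for illustration, and the 60-second lag threshold is an assumption.

```python
# Hand-written sample in the vertical (\G) layout, not captured from a real server.
SAMPLE = """\
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
"""


def parse_vertical_output(text):
    """Turn 'Field: value' lines into a dict; the row banner has no colon and is skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replication_health(fields, max_lag_seconds=60):
    """Classify replication as 'stopped', 'unknown', 'lagging' or 'ok'."""
    if fields.get("Slave_IO_Running") != "Yes" or fields.get("Slave_SQL_Running") != "Yes":
        return "stopped"
    lag = fields.get("Seconds_Behind_Master")
    if lag is None or lag == "NULL":
        return "unknown"
    return "lagging" if int(lag) > max_lag_seconds else "ok"
```

The resulting state, or the raw lag value, is what the PASS script would hand back to snmpd.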
  • 27. Important to remember
      • Define your goals: what’s right for you?
      • Don’t over-monitor
      • Use methodologies and technologies that fit your network and needs, not the other way around
      • Build generic interfaces
          • SNMP
          • Simple command line
          • No proprietary protocols
          • Agnostic to the monitoring tool
  • 29. Leading tools: open source first…
      • Nagios
          • Very generic, lots of public plug-ins
          • Easy to tweak and build in your own style
      • Zabbix
          • A complete monitoring solution
          • Less customizable
      • Cacti
          • Graphing (RRDtool)
          • Easy to configure
          • Lots of public templates
  • 30. Monitoring on the Cloud
      • Nagios + Dynamic Configuration = Dynagios!
      • Key features
          • Auto provisioning
              • Add, Remove, Suspend, Unsuspend
          • Machines are monitored based on their predefined profiles
          • Machines can join / leave the monitor (purposely)
              • Join on boot
              • Leave on shutdown
              • If a crash happens, an alert is raised
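The slides do not show how Dynagios works internally, so the following is only a guess at the general idea. A machine that joins gets a Nagios host definition generated from its profile, and a machine that leaves has it removed. The `define host { ... }` object syntax is standard Nagios. The function names, the one-file-per-host layout and the use of the profile as a host template are assumptions.

```python
import os


def render_host(host_name, address, template):
    """Render a Nagios host object that inherits its checks from `template`."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {host_name}\n"
        f"    address    {address}\n"
        "}\n"
    )


def join(conf_dir, host_name, address, template):
    """Called on boot: write the host definition and return its path."""
    path = os.path.join(conf_dir, f"{host_name}.cfg")
    with open(path, "w") as handle:
        handle.write(render_host(host_name, address, template))
    return path


def leave(conf_dir, host_name):
    """Called on clean shutdown: remove the host definition if present."""
    path = os.path.join(conf_dir, f"{host_name}.cfg")
    if os.path.exists(path):
        os.remove(path)
```

Nagios has to reload its configuration after each change. A machine that crashes never calls `leave`, so it stays defined and its failing checks raise the alert.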
  • 31. Commercial tools
      Feature                                           ManageEngine      Orion
      OS requirements                                   Windows & Linux   Windows
      Modular (applications, IP SLA, NetFlow, Conf.)    Yes               Yes
      Very easy to deploy                               Yes               Yes
      Multi-vendor with lots of templates               Yes               Yes
      Maps                                              Yes               Yes
      Less customizable … but still flexible            Yes               Yes
      Est. price for 100 devices                        $10k              $15k