Confidential
Media High Availability Service
November 2020
Nazariy Mamrokha - Engineering Director, Media
● More than 13 years of experience in software development and the Media domain, including 10 years of software development and 6 years of management.
● Strong experience in the Media and Broadcasting domain.
● Architecting solutions for:
○ Media/OTT applications (mobile, living room, gaming consoles, Smart TVs, Web),
○ Backend services (CMS/CDN, streaming services, subscription management, Ad-Tech solutions, billing and monetization, analytics, application stores).
● Leading the Media Program, with 150+ engineers working on 25+ projects.
Intro to Media
End-to-End Video Content Lifecycle
Stages shown on the diagram: Produce content; Get your content and data into the system; Perform all necessary content manipulations; Deliver to end-user; Play on any device; Monetize; Analytics.
Capability groups shown on the diagram:
- Capture / Edit, Effects, Workflows, Finishing
- Ingest, Metadata, Encode / Transcode, ABR, Codecs
- Manage, Extract, Archive, Search, Workflows
- Store, Host, Organize, Scale, Backup
- CDN, ABR, Packaging, Encrypt, DRM, CAS, Algorithms
- Apps, Any platform, Any device
- Subscription management, Ad Exchange, Billing
Industry:
- Content providers (TV networks, studios, video bloggers, etc.)
- Service providers & technology vendors (telecom, broadband, CDN, ISVs, etc.)
- OEM (connected devices, consumer electronics, etc.)
GlobalLogic Engineering Services: Engineering, QA & Automation, DevOps, Migration, Design, Architecture, Support
Cloud video streaming platform
OTT-service-like cloud platform for video content delivery and monetization
Platform components: VOD, Live, MAM, Metadata management / CMS, cDVR, SSAI, User management, CDN, Clients
Functions and data flows shown on the diagram:
- Metadata
- VoD, Linear (HLS)
- User profiles, Auth API
- Content with ads
- Timeshifted TV, Personal Recordings
- VoD Library, Scheduling, EPG
- Ad management
- Ad Insertion, Ad Tracking, Ad Decisions
- Ingest, Transcode, Playout, Package
Serhiy Onanchenko - NOC Team Leader
● Over 18 years of professional experience in the IT industry
● Full stack developer, DBA, Linux/Windows system administrator, network engineer
● Supported production-grade ecosystems in the Telecom domain
● Managed support groups (30 members) providing 24/7 administration and monitoring services for 350+ customers
● Currently manages a NOC of 12 engineers that monitors multiple highly loaded environments (up to 250K RPS, 3000+ instances)
NOC from scratch
Agenda
1. NOC - who are we?
   - team structure
   - scope
2. Incident management
3. Monitoring toolset
4. Monitoring challenges and best practices
NOC - who are we?
Current Team structure
1 Team Leader
12 NOC Engineers (2 people per shift)
● Linux, Windows systems
administration, automation
scripting
● Cloud computing and networks
● Web applications and servers
architecture, HTTP, REST API
● Monitoring tools and principles
● Strong troubleshooting and
problem-solving skills
● Good English language skills
NOC setup
Questions to audience
Poll #1
What is the largest environment you have supported?
Scope
● 5+ products, 1000+ B2B customers
● 9+ AWS production environments
● Microservices, Kubernetes clusters
● 3000+ running instances
● Up to 250K RPS
Availability target: up to 99.995%, which allows at most 30.2 seconds of downtime per week
MTTA (Mean Time to Acknowledge) target: 1 minute
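The weekly downtime figure follows directly from the availability target. The sketch below is illustrative only: the function name `allowed_downtime_seconds` and the choice of a 7-day period are assumptions, not part of the original deck. It multiplies the length of the period by the unavailable fraction (1 minus availability).

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def allowed_downtime_seconds(availability_percent: float,
                             period_seconds: float = SECONDS_PER_WEEK) -> float:
    """Return the maximum downtime (seconds) that still meets the target.

    Hypothetical helper for illustration; not taken from the presentation.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return period_seconds * unavailable_fraction


if __name__ == "__main__":
    weekly = allowed_downtime_seconds(99.995)
    print(f"Weekly downtime budget at 99.995%: {weekly:.2f} s")
```

For 99.995% over one week this gives 604,800 × 0.00005 = 30.24 seconds, which the slide quotes as 30.2 seconds.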
Responsibilities
● Infrastructure and service monitoring
● Incident management and documentation
● Maintaining monitoring systems and checks; implementing new metrics and monitoring scenarios
● Keeping an up-to-date directory of all third parties
● Acting as the single focal point that always knows the service level and the status of open issues
● Defining reliable, preventive monitoring requirements as part of the product development life cycle
● Communication, coordination, collaboration
Incident management
Incident management process
Questions to audience
Poll #2
Which incident management tools have you worked with?
Monitoring toolset
Monitoring toolset
● OpsGenie
● Prometheus
● Grafana
● Amazon CloudWatch
● PRTG
● Dotcom-Monitor
● Foglight
● Witbe robot
● Youbora
● Logz.io
● Plus multiple custom scripts/sensors
Questions to audience
Poll #3
Which monitoring tools do you use to monitor production environments?
Dotcom-Monitor
Grafana
Logz.io
Youbora Analytics
Witbe Robots
Witbe robots run end-to-end test scenarios on any device (PC, smartphone, STB) and monitor Quality of Experience (QoE).
Monitoring challenges
and
best practices
Monitoring challenges
● Mix of infrastructure setups and products
● Black-box monitoring
● Noise and false positives
● Anomaly detection
● Multiple communication channels
● Complicated and long runbooks
● Human in the middle of real-time operations
SRE Golden Signals to monitor
There are three common methodologies:
● From the Google SRE book: Latency, Traffic, Errors, and
Saturation
● USE Method (from Brendan Gregg): Utilization, Saturation, and
Errors
● RED Method (from Tom Wilkie): Rate, Errors, and Duration
The USE Method
Methodology for analyzing the performance of any system
A summary of USE is
“For every resource, check utilization, saturation, and errors.”
Resource: all physical server functional components (CPUs, disks,...)
● Utilization: the average time the resource was busy servicing work
● Saturation: the degree to which the resource has extra work
which it can’t service, often queued
● Errors: the count of error events
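The three USE figures can be computed from periodic samples of a single resource. The sketch below is only an illustration: the `ResourceSample` structure, its field names, and the choice of average queue length as the saturation measure are assumptions, not something the deck prescribes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResourceSample:
    """One sampling interval for a single resource (hypothetical structure)."""
    busy_seconds: float      # time the resource spent servicing work
    interval_seconds: float  # length of the sampling interval
    queued_items: int        # work waiting at the end of the interval
    error_count: int         # error events seen during the interval


def use_summary(samples: List[ResourceSample]) -> Dict[str, float]:
    """Summarize utilization, saturation, and errors over all samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    total_interval = sum(s.interval_seconds for s in samples)
    if total_interval <= 0:
        raise ValueError("total interval must be positive")
    return {
        # Utilization: fraction of time the resource was busy.
        "utilization": sum(s.busy_seconds for s in samples) / total_interval,
        # Saturation: average amount of queued (unserviced) work.
        "saturation": sum(s.queued_items for s in samples) / len(samples),
        # Errors: total count of error events.
        "errors": float(sum(s.error_count for s in samples)),
    }
```

A disk that was busy for 90 of 120 sampled seconds would report a utilization of 0.75; a non-zero saturation value indicates that work was waiting even though utilization was below 100%.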
The RED Method
Methodology for services analysis
A summary of RED is
“For every service, check rate, errors, and duration.”
● Rate: the number of requests per second
● Errors: the number of those requests that are failing
● Duration: the amount of time those requests take
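As a rough illustration, the RED figures for one service can be derived from a window of request records. Everything named here is hypothetical: the `RequestRecord` structure, the `red_summary` function, and in particular the assumption that a request counts as failing when its HTTP status code is 500 or higher.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    """A single handled request (hypothetical structure)."""
    duration_seconds: float
    status_code: int


def red_summary(requests: List[RequestRecord],
                window_seconds: float) -> Dict[str, float]:
    """Compute rate, errors, and duration for one observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total = len(requests)
    # Assumption: HTTP 5xx responses are treated as failures.
    failed = sum(1 for r in requests if r.status_code >= 500)
    durations = [r.duration_seconds for r in requests]
    return {
        "rate": total / window_seconds,              # requests per second
        "errors": failed / window_seconds,           # failing requests per second
        "error_ratio": failed / total if total else 0.0,
        "duration_mean": sum(durations) / total if total else 0.0,
        "duration_max": max(durations) if durations else 0.0,
    }
```

In practice duration is usually tracked as a distribution (for example percentiles) rather than a mean, because a few slow requests can hide behind a healthy average; the mean and maximum are used here only to keep the sketch short.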
Anton Bil - Senior Software Engineer
● Over 8 years of professional experience in the IT industry
● Strong experience as a Linux/Windows system administrator, DevOps engineer, and SRE
● As an SRE, supported highly loaded infrastructures with more than 7,000 servers for Media and CDN services
● Currently works as an SRE providing support, optimization, and automation for highly loaded environments (up to 250K RPS, 3000+ instances)
SRE - who is a Site Reliability Engineer?
Questions to audience
Poll #4
Does your organization formally use
Site Reliability Engineering?
Availability
Error budget
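The slide text itself does not define the term. A common definition, used in Google's SRE material, is that the error budget is the share of unreliability the service level objective (SLO) permits, that is 1 minus the SLO. The sketch below assumes a request-based SLO; the function name and the example numbers are illustrative and do not come from the deck.

```python
def error_budget_remaining(slo_percent: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Return the unspent fraction of the error budget.

    1.0 means nothing has been spent, 0.0 means the budget is exhausted,
    and a negative value means the SLO has been violated.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= slo_percent < 100.0:
        raise ValueError("slo_percent must be in [0, 100)")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # Number of failures the SLO tolerates over this many requests.
    allowed_failures = total_requests * (1.0 - slo_percent / 100.0)
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    remaining = error_budget_remaining(99.9, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.0%}")
```

With a 99.9% SLO over one million requests, about 1,000 failures are tolerated, so 250 failures leave roughly three quarters of the budget available for risky changes such as deployments.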
Questions to audience
Poll #5
How many incidents happen during changes?
How to achieve stability in media
products?
“Day in the life of SRE”
1. Monitoring, Alerts management
2. Deployments
3. Automation
4. Processes/Documentation
5. Incident management
Official information sources from Google: books and an online course
Summary
● The Media/OTT streaming industry has been growing steadily and has surged because of COVID
● Reliability is key to sustaining millions of hours of daily streaming
● This requires constant Quality of Service monitoring
● This requires 24x7 support across the world
Thank you!
