SlideShare una empresa de Scribd logo
1 de 30
Read the full guide at: http://www.datadoghq.com/blog/monitoring-windows-server/
g the Blinds: Monitoring Windows Server
• SaaS based infrastructure and app monitoring
• Open Source Agent
• Time series data (metrics and events)
• Processing nearly a trillion data points per day
• Intelligent Alerting and Insightful Dashboards
Datadog Overview
Operating Systems, Cloud Providers (AWS), Containers, Web Servers, Datastores,
Caches, Queues and more...
Monitor Everything
Agenda
- Why should I monitor Windows Server?
- What are some indicators of performance
issues?
- How can I collect performance metrics for
analysis?
What to monitor?
CPU metrics
- PercentProcessorTime
- ContextSwitchesPersec
- ProcessorQueueLength
- DPCsQueuedPersec
- PercentPrivilegedTime
- PercentDPCTime
- PercentInterruptTime
CPU: ContextSwitchesPersec
What it tracks:
Number of times the processor switched to a new thread
Correlate with:
Memory: PageFaultsPersec
Disk: DiskTransfersPersec
Network: BytesSentPersec/BytesReceivedPersec
Issue resolution:
Adding processors, thread partitioning, DPC partitioning,
hardware interrupt partitioning, disable I/O counters
CPU: PercentProcessorTime
What it tracks:
Percentage of time spent performing work (not idle)
Correlate with:
ProcessorQueueLength
Issue resolution:
More processors, bigger instance, optimize offending application,
CPU: ProcessorQueueLength
What it tracks:
Size of processor queue
Correlate with:
CPU: PercentProcessorTime, PercentPrivilegedTime, PercentDPCTime, PercentInterruptTime
Issue resolution:
Adding processors, thread partitioning, DPC partitioning,
hardware interrupt partitioning, disable I/O counters
CPU:DPCsQueuedPersec
What it tracks:
Deferred procedure call (DPC) enqueue rate
Correlate with:
CPU: PercentDPCTime
Disk: DiskTransfersPersec
Network: BytesSentPersec/BytesReceivedPersec
Issue resolution:
Remove buggy device, rollback driver
CPU: PercentPrivilegedTime/PercentDPCTime
PercentInterruptTime
What they track:
Percentage of time CPU spent in privileged mode/deferred procedure
calls/interrupts
Correlate with:
ContextSwitchesPersec/PercentPrivilegedTime/PercentDPCTime PercentInterruptTime
Issue resolution:
Adding processors, thread partitioning, DPC partitioning,
hardware interrupt partitioning, disable I/O counters
Memory metrics
- PoolNonpagedBytes
- PageFaultsPersec
- PagesInputPersec
Memory: PoolNonpagedBytes
What it tracks:
Amount of non-paged memory in use
Correlate with:
Windows Event 2019 “Nonpaged Memory Pool Empty”
Issue resolution:
Identify troublesome driver/roll back to known good state
What it tracks:
Rate of page faults
Correlate with:
PagesInputPersec
Issue resolution:
Increase system memory
Memory: PageFaultsPersec
What it tracks:
Rate pages are read (from disk) into memory
Correlate with:
PageFaultsPersec/ DiskTransfersPersec
Issue resolution:
Increase system memory, move page file to separate physical disk
Memory: PagesInputPersec
- AvgDiskQueueLength
- DiskTransfersPersec
- PercentIdleTime
Disk Metrics
Disk: AvgDiskQueueLength
What it tracks:
Running average of I/O ops in queue
Correlate with:
DiskTransfersPersec
Issue resolution:
Move data for I/O-intensive applications to separate disk; add disks to syste
Disk: DiskTransfersPersec
What it tracks:
Aggregate I/O rate
Correlate with:
AvgDiskQueueLength
Issue resolution:
Move data for I/O-intensive applications to separate disk; add disks to
system; increase disk cache
Disk: PercentIdleTime
What it tracks:
Percent of time disk is idle
Correlate with:
AvgDiskQueueLength
Issue resolution:
Move page file to separate disk; add disks to system; use SSDs
Tooling
Word of Warning
Powershell
- Windows’ scripting language (no more batch files!)
- Powerful language with deep OS support
- Integrates with C# natively
- Output is typed (unlike *NIX)
Powershell
Powershell
Perfmon
Windows Performance Toolkit
Requires Windows
Assessment and
Deployment Kit (formerly
Windows Performance
Toolkit)
https://www.microsoft.com
/en-
US/download/details.aspx
?id=39982
Windows Performance Recorder
Questions?
Evan Mouzakitis
Research Engineer
Twitter: @vagelim
Email: evan@datadoghq.com
Read the full guide at: http://www.datadoghq.com/blog/monitoring-windows-server/

Más contenido relacionado

La actualidad más candente

AWS 기반 대규모 트래픽 견디기 - 장준엽 (구로디지털 모임) :: AWS Community Day 2017
AWS 기반 대규모 트래픽 견디기 - 장준엽 (구로디지털 모임) :: AWS Community Day 2017AWS 기반 대규모 트래픽 견디기 - 장준엽 (구로디지털 모임) :: AWS Community Day 2017
AWS 기반 대규모 트래픽 견디기 - 장준엽 (구로디지털 모임) :: AWS Community Day 2017AWSKRUG - AWS한국사용자모임
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon Web Services Korea
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101Data Con LA
 
Cloud Price Comparison - AWS vs Azure vs Google
Cloud Price Comparison - AWS vs Azure vs GoogleCloud Price Comparison - AWS vs Azure vs Google
Cloud Price Comparison - AWS vs Azure vs GoogleRightScale
 
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel AvivAn introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel AvivAmazon Web Services
 
Serverless with IAC - terraform과 cloudformation 비교
Serverless with IAC - terraform과 cloudformation 비교Serverless with IAC - terraform과 cloudformation 비교
Serverless with IAC - terraform과 cloudformation 비교재현 신
 
AWSome Day Bethesda - February 2019
AWSome Day Bethesda - February 2019AWSome Day Bethesda - February 2019
AWSome Day Bethesda - February 2019Amazon Web Services
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsGuido Schmutz
 
Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Amazon Web Services
 
AWS 클라우드 비용 최적화를 위한 모범 사례-AWS Summit Seoul 2017
AWS 클라우드 비용 최적화를 위한 모범 사례-AWS Summit Seoul 2017AWS 클라우드 비용 최적화를 위한 모범 사례-AWS Summit Seoul 2017
AWS 클라우드 비용 최적화를 위한 모범 사례-AWS Summit Seoul 2017Amazon Web Services Korea
 
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...Institute of Contemporary Sciences
 
고객 경험을 통한 AWS 클라우드 이전을 위한 지름길 - 김효정 (AWS 솔루션즈 아키텍트)
고객 경험을 통한 AWS 클라우드 이전을 위한 지름길 - 김효정 (AWS 솔루션즈 아키텍트)고객 경험을 통한 AWS 클라우드 이전을 위한 지름길 - 김효정 (AWS 솔루션즈 아키텍트)
고객 경험을 통한 AWS 클라우드 이전을 위한 지름길 - 김효정 (AWS 솔루션즈 아키텍트)Amazon Web Services Korea
 
HSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundationsHSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundationsAmazon Web Services
 
서버리스 기반의 프론트엔드 서버 구축(Serverless frontend web server)
서버리스 기반의 프론트엔드 서버 구축(Serverless frontend web server)서버리스 기반의 프론트엔드 서버 구축(Serverless frontend web server)
서버리스 기반의 프론트엔드 서버 구축(Serverless frontend web server)ChanMin Park
 
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Lucidworks
 
검색 서비스 간략 교육
검색 서비스 간략 교육 검색 서비스 간략 교육
검색 서비스 간략 교육 Rjs Ryu
 
Movie lens movie recommendation system
Movie lens movie recommendation systemMovie lens movie recommendation system
Movie lens movie recommendation systemGaurav Sawant
 

La actualidad más candente (20)

AWS 기반 대규모 트래픽 견디기 - 장준엽 (구로디지털 모임) :: AWS Community Day 2017
AWS 기반 대규모 트래픽 견디기 - 장준엽 (구로디지털 모임) :: AWS Community Day 2017AWS 기반 대규모 트래픽 견디기 - 장준엽 (구로디지털 모임) :: AWS Community Day 2017
AWS 기반 대규모 트래픽 견디기 - 장준엽 (구로디지털 모임) :: AWS Community Day 2017
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101
 
Cloud Price Comparison - AWS vs Azure vs Google
Cloud Price Comparison - AWS vs Azure vs GoogleCloud Price Comparison - AWS vs Azure vs Google
Cloud Price Comparison - AWS vs Azure vs Google
 
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel AvivAn introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
 
Serverless with IAC - terraform과 cloudformation 비교
Serverless with IAC - terraform과 cloudformation 비교Serverless with IAC - terraform과 cloudformation 비교
Serverless with IAC - terraform과 cloudformation 비교
 
AWSome Day Bethesda - February 2019
AWSome Day Bethesda - February 2019AWSome Day Bethesda - February 2019
AWSome Day Bethesda - February 2019
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
 
Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)
 
AWS 클라우드 비용 최적화를 위한 모범 사례-AWS Summit Seoul 2017
AWS 클라우드 비용 최적화를 위한 모범 사례-AWS Summit Seoul 2017AWS 클라우드 비용 최적화를 위한 모범 사례-AWS Summit Seoul 2017
AWS 클라우드 비용 최적화를 위한 모범 사례-AWS Summit Seoul 2017
 
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
Improving Search Relevance in Elasticsearch Using Machine Learning - Milorad ...
 
고객 경험을 통한 AWS 클라우드 이전을 위한 지름길 - 김효정 (AWS 솔루션즈 아키텍트)
고객 경험을 통한 AWS 클라우드 이전을 위한 지름길 - 김효정 (AWS 솔루션즈 아키텍트)고객 경험을 통한 AWS 클라우드 이전을 위한 지름길 - 김효정 (AWS 솔루션즈 아키텍트)
고객 경험을 통한 AWS 클라우드 이전을 위한 지름길 - 김효정 (AWS 솔루션즈 아키텍트)
 
HSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundationsHSBC and AWS Day - AWS foundations
HSBC and AWS Day - AWS foundations
 
서버리스 기반의 프론트엔드 서버 구축(Serverless frontend web server)
서버리스 기반의 프론트엔드 서버 구축(Serverless frontend web server)서버리스 기반의 프론트엔드 서버 구축(Serverless frontend web server)
서버리스 기반의 프론트엔드 서버 구축(Serverless frontend web server)
 
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
Galene - LinkedIn's Search Architecture: Presented by Diego Buthay & Sriram S...
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
AWS Elastic Beanstalk
AWS Elastic BeanstalkAWS Elastic Beanstalk
AWS Elastic Beanstalk
 
검색 서비스 간략 교육
검색 서비스 간략 교육 검색 서비스 간략 교육
검색 서비스 간략 교육
 
Movie lens movie recommendation system
Movie lens movie recommendation systemMovie lens movie recommendation system
Movie lens movie recommendation system
 

Destacado

Scaling monitoring with Datadog
Scaling monitoring with DatadogScaling monitoring with Datadog
Scaling monitoring with Datadogalexismidon
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudDatadog
 
Application Monitoring using Datadog
Application Monitoring using DatadogApplication Monitoring using Datadog
Application Monitoring using DatadogMukta Aphale
 
Dataday Texas 2016 - Datadog
Dataday Texas 2016 - DatadogDataday Texas 2016 - Datadog
Dataday Texas 2016 - DatadogDatadog
 
Running & Monitoring Docker at Scale
Running & Monitoring Docker at ScaleRunning & Monitoring Docker at Scale
Running & Monitoring Docker at ScaleDatadog
 
Why Visibility into Your Stack Matters
Why Visibility into Your Stack MattersWhy Visibility into Your Stack Matters
Why Visibility into Your Stack MattersAmazon Web Services
 
Datadog + VictorOps Webinar
Datadog + VictorOps WebinarDatadog + VictorOps Webinar
Datadog + VictorOps WebinarDatadog
 
CloudCamp Chicago - Big Data & Cloud May 2015 - All Slides
CloudCamp Chicago - Big Data & Cloud May 2015 - All SlidesCloudCamp Chicago - Big Data & Cloud May 2015 - All Slides
CloudCamp Chicago - Big Data & Cloud May 2015 - All SlidesCloudCamp Chicago
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogC4Media
 
Native container monitoring
Native container monitoringNative container monitoring
Native container monitoringRohit Jnagal
 
20161108 datadog and_sushi
20161108 datadog and_sushi20161108 datadog and_sushi
20161108 datadog and_sushiMasahiro Hattori
 
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSkynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSylvain Kalache
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
 
Deep-Dive to Application Insights
Deep-Dive to Application Insights Deep-Dive to Application Insights
Deep-Dive to Application Insights Gunnar Peipman
 
Intro to open source telemetry linux con 2016
Intro to open source telemetry   linux con 2016Intro to open source telemetry   linux con 2016
Intro to open source telemetry linux con 2016Matthew Broberg
 
Sysdig Monitorama Slides
Sysdig Monitorama SlidesSysdig Monitorama Slides
Sysdig Monitorama SlidesLoris Degioanni
 
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...Amazon Web Services
 
Volta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a ServiceVolta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a ServiceLN Renganarayana
 

Destacado (20)

Scaling monitoring with Datadog
Scaling monitoring with DatadogScaling monitoring with Datadog
Scaling monitoring with Datadog
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 
Application Monitoring using Datadog
Application Monitoring using DatadogApplication Monitoring using Datadog
Application Monitoring using Datadog
 
Dataday Texas 2016 - Datadog
Dataday Texas 2016 - DatadogDataday Texas 2016 - Datadog
Dataday Texas 2016 - Datadog
 
Running & Monitoring Docker at Scale
Running & Monitoring Docker at ScaleRunning & Monitoring Docker at Scale
Running & Monitoring Docker at Scale
 
Why Visibility into Your Stack Matters
Why Visibility into Your Stack MattersWhy Visibility into Your Stack Matters
Why Visibility into Your Stack Matters
 
Datadog- Monitoring In Motion
Datadog- Monitoring In Motion Datadog- Monitoring In Motion
Datadog- Monitoring In Motion
 
Datadog + VictorOps Webinar
Datadog + VictorOps WebinarDatadog + VictorOps Webinar
Datadog + VictorOps Webinar
 
CloudCamp Chicago - Big Data & Cloud May 2015 - All Slides
CloudCamp Chicago - Big Data & Cloud May 2015 - All SlidesCloudCamp Chicago - Big Data & Cloud May 2015 - All Slides
CloudCamp Chicago - Big Data & Cloud May 2015 - All Slides
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
 
Native container monitoring
Native container monitoringNative container monitoring
Native container monitoring
 
20161108 datadog and_sushi
20161108 datadog and_sushi20161108 datadog and_sushi
20161108 datadog and_sushi
 
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the CloudSkynet project: Monitor, analyze, scale, and maintain a system in the Cloud
Skynet project: Monitor, analyze, scale, and maintain a system in the Cloud
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
Data Logging and Telemetry
Data Logging and TelemetryData Logging and Telemetry
Data Logging and Telemetry
 
Deep-Dive to Application Insights
Deep-Dive to Application Insights Deep-Dive to Application Insights
Deep-Dive to Application Insights
 
Intro to open source telemetry linux con 2016
Intro to open source telemetry   linux con 2016Intro to open source telemetry   linux con 2016
Intro to open source telemetry linux con 2016
 
Sysdig Monitorama Slides
Sysdig Monitorama SlidesSysdig Monitorama Slides
Sysdig Monitorama Slides
 
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
RMG203 Cloud Infrastructure and Application Monitoring with Amazon CloudWatch...
 
Volta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a ServiceVolta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a Service
 

Similar a Lifting the Blinds: Monitoring Windows Server 2012

Perfmon And Profiler 101
Perfmon And Profiler 101Perfmon And Profiler 101
Perfmon And Profiler 101Quest Software
 
SharePoint 2013 Performance and Capacity Management
SharePoint 2013 Performance and Capacity Management SharePoint 2013 Performance and Capacity Management
SharePoint 2013 Performance and Capacity Management jems7
 
Web Performance Part 3 "Server-side tips"
Web Performance Part 3  "Server-side tips"Web Performance Part 3  "Server-side tips"
Web Performance Part 3 "Server-side tips"Binary Studio
 
Testing pc’s performance lf
Testing pc’s performance lfTesting pc’s performance lf
Testing pc’s performance lfiteclearners
 
Ch14.run time support systems
Ch14.run time support systemsCh14.run time support systems
Ch14.run time support systemsYi-Jun Zheng
 
#SUGCON 2015 Sitecore Monitoring
#SUGCON 2015 Sitecore Monitoring#SUGCON 2015 Sitecore Monitoring
#SUGCON 2015 Sitecore Monitoringchriswoj
 
Optimization In Mobile Systems
Optimization In Mobile SystemsOptimization In Mobile Systems
Optimization In Mobile Systemsmomobangalore
 
Sql Server Performance Tuning
Sql Server Performance TuningSql Server Performance Tuning
Sql Server Performance TuningBala Subra
 
Big data meet_up_08042016
Big data meet_up_08042016Big data meet_up_08042016
Big data meet_up_08042016Mark Smith
 
Testing pc’s performance
Testing pc’s performanceTesting pc’s performance
Testing pc’s performanceiteclearners
 
Windows Internal - Ch9 memory management
Windows Internal - Ch9 memory managementWindows Internal - Ch9 memory management
Windows Internal - Ch9 memory managementKent Huang
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developersBiju Nair
 
Introductiontoasp netwindbgdebugging-100506045407-phpapp01
Introductiontoasp netwindbgdebugging-100506045407-phpapp01Introductiontoasp netwindbgdebugging-100506045407-phpapp01
Introductiontoasp netwindbgdebugging-100506045407-phpapp01Camilo Alvarez Rivera
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
Sql server troubleshooting
Sql server troubleshootingSql server troubleshooting
Sql server troubleshootingNathan Winters
 
How Data Instant Replay and Data Progression Work Together
How Data Instant Replay and Data Progression Work TogetherHow Data Instant Replay and Data Progression Work Together
How Data Instant Replay and Data Progression Work TogetherCompellent Technologies
 
16. PagingImplementIssused.pptx
16. PagingImplementIssused.pptx16. PagingImplementIssused.pptx
16. PagingImplementIssused.pptxMyName1sJeff
 
Application Performance Lecture
Application Performance LectureApplication Performance Lecture
Application Performance LectureVishwanath Ramdas
 

Similar a Lifting the Blinds: Monitoring Windows Server 2012 (20)

Perfmon And Profiler 101
Perfmon And Profiler 101Perfmon And Profiler 101
Perfmon And Profiler 101
 
SharePoint 2013 Performance and Capacity Management
SharePoint 2013 Performance and Capacity Management SharePoint 2013 Performance and Capacity Management
SharePoint 2013 Performance and Capacity Management
 
Web Performance Part 3 "Server-side tips"
Web Performance Part 3  "Server-side tips"Web Performance Part 3  "Server-side tips"
Web Performance Part 3 "Server-side tips"
 
Testing pc’s performance lf
Testing pc’s performance lfTesting pc’s performance lf
Testing pc’s performance lf
 
Ch14.run time support systems
Ch14.run time support systemsCh14.run time support systems
Ch14.run time support systems
 
Performance Whackamole (short version)
Performance Whackamole (short version)Performance Whackamole (short version)
Performance Whackamole (short version)
 
#SUGCON 2015 Sitecore Monitoring
#SUGCON 2015 Sitecore Monitoring#SUGCON 2015 Sitecore Monitoring
#SUGCON 2015 Sitecore Monitoring
 
Optimization In Mobile Systems
Optimization In Mobile SystemsOptimization In Mobile Systems
Optimization In Mobile Systems
 
Sql Server Performance Tuning
Sql Server Performance TuningSql Server Performance Tuning
Sql Server Performance Tuning
 
Big data meet_up_08042016
Big data meet_up_08042016Big data meet_up_08042016
Big data meet_up_08042016
 
SQL 2005 Disk IO Performance
SQL 2005 Disk IO PerformanceSQL 2005 Disk IO Performance
SQL 2005 Disk IO Performance
 
Testing pc’s performance
Testing pc’s performanceTesting pc’s performance
Testing pc’s performance
 
Windows Internal - Ch9 memory management
Windows Internal - Ch9 memory managementWindows Internal - Ch9 memory management
Windows Internal - Ch9 memory management
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developers
 
Introductiontoasp netwindbgdebugging-100506045407-phpapp01
Introductiontoasp netwindbgdebugging-100506045407-phpapp01Introductiontoasp netwindbgdebugging-100506045407-phpapp01
Introductiontoasp netwindbgdebugging-100506045407-phpapp01
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Sql server troubleshooting
Sql server troubleshootingSql server troubleshooting
Sql server troubleshooting
 
How Data Instant Replay and Data Progression Work Together
How Data Instant Replay and Data Progression Work TogetherHow Data Instant Replay and Data Progression Work Together
How Data Instant Replay and Data Progression Work Together
 
16. PagingImplementIssused.pptx
16. PagingImplementIssused.pptx16. PagingImplementIssused.pptx
16. PagingImplementIssused.pptx
 
Application Performance Lecture
Application Performance LectureApplication Performance Lecture
Application Performance Lecture
 

Más de Datadog

What it Means to be a Next-Generation Managed Service Provider
What it Means to be a Next-Generation Managed Service ProviderWhat it Means to be a Next-Generation Managed Service Provider
What it Means to be a Next-Generation Managed Service ProviderDatadog
 
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015Datadog
 
PyData NYC 2015 - Automatically Detecting Outliers with Datadog
PyData NYC 2015 - Automatically Detecting Outliers with Datadog PyData NYC 2015 - Automatically Detecting Outliers with Datadog
PyData NYC 2015 - Automatically Detecting Outliers with Datadog Datadog
 
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015Datadog
 
Monitoring Docker containers - Docker NYC Feb 2015
Monitoring Docker containers - Docker NYC Feb 2015Monitoring Docker containers - Docker NYC Feb 2015
Monitoring Docker containers - Docker NYC Feb 2015Datadog
 
Treating Infrastructure as Garbage
Treating Infrastructure as GarbageTreating Infrastructure as Garbage
Treating Infrastructure as GarbageDatadog
 
Events and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of WebopsEvents and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of WebopsDatadog
 
The Data Mullet: From all SQL to No SQL back to Some SQL
The Data Mullet: From all SQL to No SQL back to Some SQLThe Data Mullet: From all SQL to No SQL back to Some SQL
The Data Mullet: From all SQL to No SQL back to Some SQLDatadog
 
Big (IT) data
Big (IT) dataBig (IT) data
Big (IT) dataDatadog
 
Deep dive into Nagios analytics
Deep dive into Nagios analyticsDeep dive into Nagios analytics
Deep dive into Nagios analyticsDatadog
 
Just enough web ops for web developers
Just enough web ops for web developersJust enough web ops for web developers
Just enough web ops for web developersDatadog
 
Customer Ops: DevOps <3 customer support
Customer Ops: DevOps <3 customer supportCustomer Ops: DevOps <3 customer support
Customer Ops: DevOps <3 customer supportDatadog
 
I <3 graphs in 20 slides
I <3 graphs in 20 slidesI <3 graphs in 20 slides
I <3 graphs in 20 slidesDatadog
 
Effective monitoring with StatsD
Effective monitoring with StatsDEffective monitoring with StatsD
Effective monitoring with StatsDDatadog
 
Alerting: more signal, less noise, less pain
Alerting: more signal, less noise, less painAlerting: more signal, less noise, less pain
Alerting: more signal, less noise, less painDatadog
 
Fact based monitoring
Fact based monitoringFact based monitoring
Fact based monitoringDatadog
 
Fact-Based Monitoring
Fact-Based MonitoringFact-Based Monitoring
Fact-Based MonitoringDatadog
 
Monitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-toMonitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-toDatadog
 
What’s in this Cookbook? - Mike Fiedler
What’s in this Cookbook? - Mike FiedlerWhat’s in this Cookbook? - Mike Fiedler
What’s in this Cookbook? - Mike FiedlerDatadog
 
I Love Graphs - Alexis Lê-Quôc
I Love Graphs - Alexis Lê-QuôcI Love Graphs - Alexis Lê-Quôc
I Love Graphs - Alexis Lê-QuôcDatadog
 

Más de Datadog (20)

What it Means to be a Next-Generation Managed Service Provider
What it Means to be a Next-Generation Managed Service ProviderWhat it Means to be a Next-Generation Managed Service Provider
What it Means to be a Next-Generation Managed Service Provider
 
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
Docker Usage Patterns - Meetup Docker Paris - November, 10th 2015
 
PyData NYC 2015 - Automatically Detecting Outliers with Datadog
PyData NYC 2015 - Automatically Detecting Outliers with Datadog PyData NYC 2015 - Automatically Detecting Outliers with Datadog
PyData NYC 2015 - Automatically Detecting Outliers with Datadog
 
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
Monitoring Docker at Scale - Docker San Francisco Meetup - August 11, 2015
 
Monitoring Docker containers - Docker NYC Feb 2015
Monitoring Docker containers - Docker NYC Feb 2015Monitoring Docker containers - Docker NYC Feb 2015
Monitoring Docker containers - Docker NYC Feb 2015
 
Treating Infrastructure as Garbage
Treating Infrastructure as GarbageTreating Infrastructure as Garbage
Treating Infrastructure as Garbage
 
Events and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of WebopsEvents and metrics the Lifeblood of Webops
Events and metrics the Lifeblood of Webops
 
The Data Mullet: From all SQL to No SQL back to Some SQL
The Data Mullet: From all SQL to No SQL back to Some SQLThe Data Mullet: From all SQL to No SQL back to Some SQL
The Data Mullet: From all SQL to No SQL back to Some SQL
 
Big (IT) data
Big (IT) dataBig (IT) data
Big (IT) data
 
Deep dive into Nagios analytics
Deep dive into Nagios analyticsDeep dive into Nagios analytics
Deep dive into Nagios analytics
 
Just enough web ops for web developers
Just enough web ops for web developersJust enough web ops for web developers
Just enough web ops for web developers
 
Customer Ops: DevOps <3 customer support
Customer Ops: DevOps <3 customer supportCustomer Ops: DevOps <3 customer support
Customer Ops: DevOps <3 customer support
 
I <3 graphs in 20 slides
I <3 graphs in 20 slidesI <3 graphs in 20 slides
I <3 graphs in 20 slides
 
Effective monitoring with StatsD
Effective monitoring with StatsDEffective monitoring with StatsD
Effective monitoring with StatsD
 
Alerting: more signal, less noise, less pain
Alerting: more signal, less noise, less painAlerting: more signal, less noise, less pain
Alerting: more signal, less noise, less pain
 
Fact based monitoring
Fact based monitoringFact based monitoring
Fact based monitoring
 
Fact-Based Monitoring
Fact-Based MonitoringFact-Based Monitoring
Fact-Based Monitoring
 
Monitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-toMonitoring NGINX (plus): key metrics and how-to
Monitoring NGINX (plus): key metrics and how-to
 
What’s in this Cookbook? - Mike Fiedler
What’s in this Cookbook? - Mike FiedlerWhat’s in this Cookbook? - Mike Fiedler
What’s in this Cookbook? - Mike Fiedler
 
I Love Graphs - Alexis Lê-Quôc
I Love Graphs - Alexis Lê-QuôcI Love Graphs - Alexis Lê-Quôc
I Love Graphs - Alexis Lê-Quôc
 

Último

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 

Último (20)

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 

Lifting the Blinds: Monitoring Windows Server 2012

  • 1. Read the full guide at: http://www.datadoghq.com/blog/monitoring-windows-server/ g the Blinds: Monitoring Windows Server
  • 2. • SaaS based infrastructure and app monitoring • Open Source Agent • Time series data (metrics and events) • Processing nearly a trillion data points per day • Intelligent Alerting and Insightful Dashboards Datadog Overview
  • 3. Operating Systems, Cloud Providers (AWS), Containers, Web Servers, Datastores, Caches, Queues and more... Monitor Everything
  • 4. Agenda - Why should I monitor Windows Server? - What are some indicators of performance issues? - How can I collect performance metrics for analysis?
  • 5.
  • 7.
  • 8. CPU metrics - PercentProcessorTime - ContextSwitchesPersec - ProcessorQueueLength - DPCsQueuedPersec - PercentPrivilegedTime - PercentDPCTime - PercentInterruptTime
  • 9. CPU: ContextSwitchesPersec What it tracks: Number of times the processor switched to a new thread Correlate with: Memory: PageFaultsPersec Disk: DiskTransfersPersec Network: BytesSentPersec/BytesReceivedPersec Issue resolution: Adding processors, thread partitioning, DPC partitioning, hardware interrupt partitioning, disable I/O counters
  • 10. CPU: PercentProcessorTime What it tracks: Percentage of time spent performing work (not idle) Correlate with: ProcessorQueueLength Issue resolution: More processors, bigger instance, optimize offending application,
  • 11. CPU: ProcessorQueueLength What it tracks: Size of processor queue Correlate with: CPU: PercentProcessorTime, PercentPrivilegedTime, PercentDPCTime, PercentInterruptTime Issue resolution: Adding processors, thread partitioning, DPC partitioning, hardware interrupt partitioning, disable I/O counters
  • 12. CPU:DPCsQueuedPersec What it tracks: Deferred procedure call (DPC) enqueue rate Correlate with: CPU: PercentDPCTime Disk: DiskTransfersPersec Network: BytesSentPersec/BytesReceivedPersec Issue resolution: Remove buggy device, rollback driver
  • 13. CPU: PercentPrivilegedTime/PercentDPCTime PercentInterruptTime What they track: Percentage of time CPU spent in privileged mode/deferred procedure calls/interrupts Correlate with: ContextSwitchesPersec/PercentPrivilegedTime/PercentDPCTime PercentInterruptTime Issue resolution: Adding processors, thread partitioning, DPC partitioning, hardware interrupt partitioning, disable I/O counters
  • 14. Memory metrics - PoolNonpagedBytes - PageFaultsPersec - PagesInputPersec
  • 15. Memory: PoolNonpagedBytes What it tracks: Amount of non-paged memory in use Correlate with: Windows Event 2019 “Nonpaged Memory Pool Empty” Issue resolution: Identify troublesome driver/roll back to known good state
  • 16. What it tracks: Rate of page faults Correlate with: PagesInputPersec Issue resolution: Increase system memory Memory: PageFaultsPersec
  • 17. What it tracks: Rate pages are read (from disk) into memory Correlate with: PageFaultsPersec/ DiskTransfersPersec Issue resolution: Increase system memory, move page file to separate physical disk Memory: PagesInputPersec
  • 18. - AvgDiskQueueLength - DiskTransfersPersec - PercentIdleTime Disk Metrics
  • 19. Disk: AvgDiskQueueLength What it tracks: Running average of I/O ops in queue Correlate with: DiskTransfersPersec Issue resolution: Move data for I/O-intensive applications to separate disk; add disks to syste
  • 20. Disk: DiskTransfersPersec What it tracks: Aggregate I/O rate Correlate with: AvgDiskQueueLength Issue resolution: Move data for I/O-intensive applications to separate disk; add disks to system; increase disk cache
  • 21. Disk: PercentIdleTime What it tracks: Percent of time disk is idle Correlate with: AvgDiskQueueLength Issue resolution: Move page file to separate disk; add disks to system; use SSDs
  • 24. Powershell - Windows’ scripting language (no more batch files!) - Powerful language with deep OS support - Integrates with C# natively - Output is typed (unlike *NIX)
  • 28. Windows Performance Toolkit Requires Windows Assessment and Deployment Kit (formerly Windows Performance Toolkit) https://www.microsoft.com /en- US/download/details.aspx ?id=39982
  • 30. Questions? Evan Mouzakitis Research Engineer Twitter: @vagelim Email: evan@datadoghq.com Read the full guide at: http://www.datadoghq.com/blog/monitoring-windows-server/

Notas del editor

  1. Our goal is to help you monitor everything from all levels of your stack so that you can make intelligent data based decisions about your applications and infrastructure.
  2. Why monitor Windows in the first place? Monitoring the performance of the applications that run your business is critical; but applications don’t live in a vacuum. Applications interact with the underlying operating system often to, request resources, preempt the execution of other processes, access hardware devices, and more. Being aware of the health and performance of the operating system gives you more information when troubleshooting issues anywhere higher up in the stack (not to mention that monitoring the operating system is critical for insight into hardware issues). For example, is a SQL Server database query slow because of the query itself, or because the SQL Server is also hosted alongside Exchange and they are competing for disk access? These kinds of issues can only be surfaced when you monitor both the application in question and the underlying operating system.
  3. A monitoring plan typically tries to cover Work metrics, Resource metrics, and non-metric data like events or code changes. As the broker between applications and hardware resources, when monitoring Windows server we are primarily focused on resource metrics, because that is what the operating system is managing. Work metrics are usually more applicable to application-level monitoring, but as you will see there are a few work metrics related to disk access that we’ll cover here too.
  4. What kind of resources are we interested in monitoring? What kinds of metrics can we surface from those resources? Generally speaking, the most useful resources to monitor are CPU, RAM, disk, and network. Things like power consumption, thermal monitoring, noise and data of a similar nature, while useful, don’t usually add meaningful context to application or operating system performance issues.
  5. At the highest level, the following metrics are useful in assessing CPU performance, and can shed light on performance bottlenecks depending on what the kind of work the CPU spends most of its time performing.
  6. ContextSwitchesPersec tracks the number of times the processor switched to a new execution context. Context switches are computationally expensive; before the processor can enter the execution context of another thread, it must first save the current context, push the old context to the bottom of its priority queue, find the highest priority queue containing an executable thread, pop it from its queue, load its context, and finally execute the thread. In a multi-core machine (common today), context switching add significant overhead. By default, the Windows Task manager measures I/O per-process, and attributing I/O to a particular process in a multi-core multithreaded environment can have a drastic performance impact under heavy I/O loads. If that’s the case, you would benefit from disabling global and per-process I/O counters by adding a CountOperations entry as a REG_DWORD with a value of 0 to the registry under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\I/O System\
  7. PercentProcessorTime is a metric most everyone is familiar with, even if they don’t know the name. It tracks the percentage of time the CPU was doing something. In and of itself, this metric isn’t all that useful. For example, if I’m analyzing data on a single core machine, I’d expect the CPU to in use 100% of the time. However, when correlated with ProcessorQueueLength, which tracks the number of pending threads, you have enough information to determine whether or not the system is suffering a CPU bottleneck. A queue length greater than 2 * the number of processors, coupled with prolonged periods of maxed out CPU utilization very clearly indicate that the system does not have enough processor resources to perform all of its tasks.
  8. The processor queue length is a value which reflects the number of threads that are ready to run, but are not able to use the processor. A healthy measure of processor queue length is about 2 * the number of processors on the system. Even on multicore machines, there is only one processorqueuelength performance counter. High values for this counter very clearly indicate CPU contention. You can correlate this metric with other CPU metrics like PercentProcessorTime, PercentPrivilegedTime, PercentDPCTime, PercentInterruptTime to determine where the CPU is spending its time, and to narrow down if the CPU is the bottleneck causing backed up queue.
  9. Hardware requirements demand real-time, unfettered access to the CPU in order to ensure that high-priority work (like accepting keyboard input) is performed when it is needed. Interrupts provide a means by which devices can interrupt the processor and force it to perform the requested operation (triggering the processor to perform a context switch). Some work from devices may be put off until later, but still must be accomplished in a timely manner. Enter DPCs. Through DPCs, real-time processes like device drivers can schedule lower-priority tasks to be completed after higher-priority interrupts are handled. DPCs are created by the kernel, and can only be called by kernel mode programs. A large or near-constant number of DPCs could point to issues with low-level system software. An unused but buggy sound driver could be the culprit, for example.
  10. This trio of metrics, taken together, help to shed light on where the CPU is spending its time. In particular, privileged time reflects the time spent executing instructions for kernel-mode programs. Code executing in privileged mode have unrestricted access to the system’s hardware. This includes device drivers, core operating system functions, etc. If you observe a system spending 30 percent or more of its time processing privileged instructions, check the values of PercentDPCTime and PercentInterruptTime. If either of those two metrics report values greater than 20%, it is likely that a poorly written device driver, or very busy peripheral is the culprit.
  11. As with CPU metrics, Windows exposes a wealth of performance counters tracking memory statistics. We’ve omitted AvailableMemory and similar metrics from this webinar because they are pretty self-explanatory. The three listed here, PageFaultsPersec, PoolNonpagedBytes, and PagesInputPersec provide insight into the nature of issues which may be impacting performance. We’ll touch on each in turn, but at a high level, PageFaultsPersec tracks the rate of page faults, PoolNonpagedBytes describes the current size of non-pageable memory, and the last, PagesInputPersec, describes the rate of pages read from disk (which is distinct from the number of page reads from disk).
  12. Windows maintains two general pools of memory: a paged pool and non paged pool. The paged pool is for general use and is the pool used by all user space applications for memory allocation. Because user space applications are more tolerant to latency, or, to put it another way, because user space applications don’t generally have real-time requirements, they can get by if the requested memory needs to be read in (or paged in) from disk. Because kernel-level software has real-time execution requirements, device drivers and the like make use of the non paged pool. The non paged pool is guaranteed to reside in physical memory at all times, with no possibility of being paged to disk (hence the name “non paged”). This significantly reduces latency by preventing the possibility of page faults. No memory pool is infinite, and poorly written device drivers could end up exhausting the entire non paged pool if left unchecked. If you are seeing reports of Event 2019, it’s already too late. But keeping an eye on the size of this pool and its growth over time are necessary to identify and deal with any troublesome drivers or hardware.
  13. Page faults occur when a thread references a page that is not in the current set of memory-resident pages. Because the thread can’t perform its work without the requested memory, a hardware interrupt occurs, the processor enters into kernel-mode (resulting in a context switch—both upon entering and exiting kernel-mode), and attempts to locate the page in memory. If the page is found somewhere else in memory, it is that address which is returned to the requesting thread. This is called a “soft” page fault. If the page is not elsewhere in memory the kernel will look in the page file and read it into memory. This is called a “hard” page fault. Because this operation requires accessing the disk, it is more computationally expensive to perform this type of lookup. Page faults occur under normal operating conditions, but a spike in page faults could result in serious performance degradation, depending on the “hardness” of the fault. By tracking the page fault rate alongside the page input rate, you can differentiate between hard and soft page faults. High values of both metrics unequivocally indicate hard page faults. There’s not much you can do to prevent soft page faults from occurring, but increasing the amount of RAM available on the system is a straightforward way of alleviating hard page faults. It is worth mentioning that when a hard page fault does occur, Windows attempts to retrieve multiple, contiguous pages into memory, to maximize the work performed by each read. This, in turn, can potentially increase a page fault’s performance impact, as more disk bandwidth is consumed reading in potentially unneeded pages. All of this can potentially be avoided by putting your page file (see next section) on a separate physical (not logical) disk, or increasing the amount of RAM available to your system.
  14. As I mentioned, there are two types of page faults, and tracking PagesInputPersec alongside PageFaultsPersec gives you the information you need to determine the type of page fault occurring. If you are seeing high values of both metrics, the page faults are hard. The effects of hard page faults can be exacerbated if disk is a contentious resource. To give a simplified example, if your have a system with one disk and it’s running an I/O intensive application, page faults will hit this system harder (and performance will degrade in the application) because Windows is competing with the application for disk access (and Windows always wins). This goes to show that an excessive number of page faults can be responsible for system wide effects, completely unrelated to the application experiencing performance degradation.
  15. Though there are many disk metrics worth tracking, I’ve distilled the list to the most essential, while omitting the obvious, like PercentFreeSpace.
  16. The AvgDiskQueueLength counter gives an estimated average of the number of I/O operations currently awaiting execution. Generally speaking, this counter should not exceed 2 * the number of drives on the system. If you are seeing greater values than that, it means the system cannot service the number of I/O requests it’s receiving in a timely manner, which can lead to processing delays, degraded application performance, and more.
  17. DiskTransfersPersec is an aggregate measure of both disk reads and writes. It is useful for shedding light on the cause of bottlenecks. High values for this metric do not always indicate issues; for example if you are running I/O intensive applications on your server you are definitely going to observe high values for this metric (and most likely for PercentIdleTime as well). However, if I/O ops are not being enqueued (per the AvgDiskQueueLength metric) and applications are not hurting for memory (and thus paging to disk), there should be no observable performance impact.
  18. PercentIdleTime is a pretty intuitive metric that tracks the percent of time disks are idle. Depending on the role of the system under investigation, low idle times may be expected, especially for when running I/O intensive applications like SQL Server or Exchange. If that’s not the case, low values should be investigated. If you don’t already have your page file stored on a separate drive, you should do so. Otherwise, consider either adding disks to the system to increase performance, or swap out HDDs for SSDs if possible.
  19. Windows offers numerous methods by which you can collect, store, and visualize system performance data. Because the methods are so varied, I will only go through a couple of the tools that I have experience with. All of the tools mentioned are native to Windows Server 2012 R2 so you can get up and running quickly.
  20. Reading performance counters does not generally appear to have much of an impact on system performance. In my tests, collecting 2631 counters with 1-second sample rate caused a 4 percent increase in user CPU usage (by perfmon). There are a few things to keep in mind, though: depending on the data collected and the duration of the collection, the collected data could be very large. To give you an idea about the size of the data collected, in a test collecting handle and kernel base events, pagefaults, cpu, I/O and memory samples, the data grew at a rate approaching 100 MB/min. Additionally, if you are collecting data from your local machine, you may see occasional spikes in I/O latency; in my tests I observed response times for some user space applications in excess of 2000 ms! Also, I did not attempt to collect performance counters from user applications which may have an impact on the application’s performance. And as I mentioned earlier in the CPU section, if you are sampling I/O with processor-specific information, you most certainly will observe degradation in performance.
  21. Powershell is great for collecting performance counters programmatically. You can query the event log from powershell as well. You can use powershell to collect metrics from local and remote machines.
  22. Here are some example powershell commands for retrieving CPU-related performance counters. As you can see, there is a regular pattern. For a full list of commands to retrieve performance counters for CPU, memory, disk, network, and events, check out my “How to collect Windows Server 2012 metrics” article on the datadog blog. https://www.datadoghq.com/blog/collect-windows-server-2012-metrics/#toc-powershell
  23. Last thing about powershell, if you want to do something in powershell and there’s no pre-packaged cmdlet to get you what you want, you can always interface with WMI to get what you’re looking for.
  24. In my honest opinion, perfmon is not nearly as useful as xperf or Windows Performance Recorder when it comes to investigating performance issues. It is a good tool to help spot issues, but not so good for getting into the nitty gritty. Here’s a screenshot of perfmon collecting “System Performance counters” a counter set provided out of the box. As you can see, there is a lot going on. My investigation was focusing on the cause of excessive memory use, visualized as the black bar nearly pinned to the 100 mark. From this image it’s clear that something is going on, but since I was only collecting the Total memory usage (as opposed to collection per-process), it isn’t clear which process is exhausting RAM. To determine the underlying cause in this case requires me to re-run perfmon, this time collecting per-process counters in addition to the total, and hoping that my issue arises again. As you’re about to see, we can do better.
  25. The Windows performance toolkit contains the Windows Performance Recorder & Windows Performance Analyzer (WPA). Though technically not strictly “native” since it requires a download, it is a useful, graphical tool for collecting and analyzing windows performance data and is made by Microsoft.
  26. Windows performance recorder is a modern replacement for xperf. It features both graphical and command line interfaces. Here you can see the available collection profiles. Collecting data with the Windows Performance Recorder is as easy as clicking “Start”. Technically, Windows Performance Recorder (and xperf) do not merely collect performance counters; they are a tracing mechanism for collecting fine-grained performance data. As you will see, traces are superior to performance counters when investigating performance issues.