The document discusses Microsoft 365 Product Assurance and Validation Checks (PAVC) scanning, which scans Microsoft service machines to ensure security patches are up to date. It describes how the PAVC scanning infrastructure has scaled to support the rapid growth of M365, including transitioning from unreliable internal tools to scalable cloud services. Recommendations are provided for using data to support further engineering scalability and intelligent monitoring across large numbers of machines and services.
2. Agenda
• Introduction to PAVC scanning
• Comparison of PAVC in the old days vs. now
• How to use data to support engineering scalability
• Provide recommendations and best practices
• Q&A
4. What is PAVC and what it has achieved
• PAVC is one of the M365 compliance requirements. It scans all
service machines residing in different environments to ensure
security patches are up to date.
• With M365's rapid growth, the current scanning coverage of 1 million
machines might soon grow to 10 million machines
• The PAVC scanning infrastructure can scale seamlessly to more
Azure resources (for example, the recent addition of 200,000 SharePoint hosts)
5. M365 PAVC in Old Days I
With the rapid growth of M365 Office customers, security
scanning coverage has tripled in recent years and keeps scaling.
Keeping all Office service machines compliant and their security
patching up to date across different product environments is
challenging and requires a growth mindset and a scalable
engineering solution. In this session, we introduce the approaches
and security scanning infrastructure we built to support service
machines at large scale. We will discuss how to detect
unhealthy scanners and hosts across M365 services and how to
make monitoring and alerts intelligent and action-based.
• Network scanner setup (needs domain privileges)
• Azure was not ready, so internal tools were used instead
• ODL
• Cosmos and Sangam scheduler
• SLAM Scheduler
• Batch processing
• Scanning happens once a day
• ODL not reliable
• Cosmos job started every few hours
• Power BI on top for reporting
6. M365 PAVC in Old Days II
• After a machine gets a security patch, it takes days to get the scan result
• Too many delays
• ODL (unstable and unreliable)
• Cosmos and Sangam scheduler
• SLAM Scheduler (nobody maintains it)
• When things go wrong, lots of troubleshooting
• Scanning happens once a day
• ODL not reliable
• Cosmos job started every few hours
• Power BI on top for reporting
7. M365 PAVC Now
• Moved away from internal solutions to a public Azure solution
• Good reliability and support
• Moved away from batch processing to continuous, event-driven
processing
• A machine gets a security patch
• The agent starts a security scan
• Whenever an output is ready, it is sent to the PAVC cloud service
• Saved to blob storage and consumed by the reporting team
• When things go wrong, it is easy to investigate
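The event-driven flow on this slide can be sketched roughly as follows. This is a minimal illustration only, not the PAVC implementation: the classes `Agent`, `CloudService`, and `InMemoryBlobStore`, the blob naming scheme, the host name, and the patch ID are all hypothetical, and an in-memory dictionary stands in for blob storage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ScanResult:
    host: str
    patch_id: str
    compliant: bool


@dataclass
class InMemoryBlobStore:
    """Hypothetical stand-in for blob storage: maps blob name -> result."""
    blobs: Dict[str, ScanResult] = field(default_factory=dict)

    def save(self, result: ScanResult) -> str:
        name = f"{result.host}/{result.patch_id}.json"
        self.blobs[name] = result
        return name


class CloudService:
    """Hypothetical stand-in for the cloud service that receives scan output."""

    def __init__(self, store: InMemoryBlobStore) -> None:
        self.store = store

    def receive(self, result: ScanResult) -> str:
        # Persist each result as soon as it arrives; there is no batching window.
        return self.store.save(result)


class Agent:
    """Hypothetical per-machine agent that scans when a patch event fires."""

    def __init__(self, host: str, service: CloudService,
                 scan: Callable[[str, str], bool]) -> None:
        self.host = host
        self.service = service
        self.scan = scan  # injected scan function, assumed to return compliance

    def on_patch_installed(self, patch_id: str) -> str:
        compliant = self.scan(self.host, patch_id)
        return self.service.receive(ScanResult(self.host, patch_id, compliant))


# Usage: one patch event produces one stored result, ready for reporting.
store = InMemoryBlobStore()
agent = Agent("host-001", CloudService(store), scan=lambda host, patch: True)
blob_name = agent.on_patch_installed("KB0000001")
```

The point of the sketch is that each result is traceable to a single event on a single host, which is one plausible reason investigation becomes easier than with a daily batch.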
10. Call to Action
• A small team can do big things with a scalable solution
• Fine-grained monitors keep scalability grounded
• Multi-dimensional data speeds up incident recovery
• Leverage third parties for the work
• Take full advantage of cloud infrastructure and alerting systems
13. Introduction to PAVC Scanning
• Why do we need PAVC?
"A thousand-mile dike collapses because of an ant hole" (Chinese proverb).
An ant may well destroy a whole dam.
14. Introducing the PAVC Agent
❖ Network scanner: dedicated box with an IP range to be scanned
❖ PAVC agent: component installed on each target machine
• What made us reconsider?
1 Security aspect
• Network scanner needs forest admin rights
2 Reliability
• Network environment
• Firewall
3 Management: network scanners require management
• Load balancing
• Redistribution in case of failure
• IP range management
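To make the management burden of network scanners concrete, here is a small sketch of load balancing IP ranges across scanners and redistributing them when one fails. It is an assumption-laden illustration, not the tooling the team used: the greedy strategy, the function names `assign_ranges` and `redistribute`, the scanner names, and the example ranges are all hypothetical.

```python
import ipaddress
from typing import Dict, List


def assign_ranges(ranges: List[str], scanners: List[str]) -> Dict[str, List[str]]:
    """Greedy balance: give each range (largest first) to the least-loaded scanner."""
    if not scanners:
        raise ValueError("at least one healthy scanner is required")
    load = {s: 0 for s in scanners}
    plan: Dict[str, List[str]] = {s: [] for s in scanners}
    nets = sorted((ipaddress.ip_network(r) for r in ranges),
                  key=lambda n: n.num_addresses, reverse=True)
    for net in nets:
        target = min(scanners, key=lambda s: load[s])
        plan[target].append(str(net))
        load[target] += net.num_addresses
    return plan


def redistribute(plan: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Re-plan every range across the scanners that are still healthy."""
    healthy = [s for s in plan if s != failed]
    all_ranges = [r for assigned in plan.values() for r in assigned]
    return assign_ranges(all_ranges, healthy)


# Usage with hypothetical scanners and private address ranges.
ranges = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/25", "10.0.2.128/25"]
plan = assign_ranges(ranges, ["scanner-a", "scanner-b"])
after_failure = redistribute(plan, failed="scanner-b")
```

Every piece of this bookkeeping has to be maintained by the scanning team, whereas an agent installed on each target machine needs no range assignment at all.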
15. PAVC Agent
11/10/2017 M365 PAVC 15
Adopted model: client agent + backend architecture
Many other agent examples: LAM, ODL, Geneva, etc.
What makes us stand out?
Q: Is your service running on a compliant platform?
Q: Agent health?
Ans: PAVC provides and ensures end-to-end compliance, including client agent health. No gray areas!
Example: ODL agent. Who is responsible for compliance and issue investigation? The ODL team? The workload team?
16. Need for data support
Main focus:
❖ PROD readiness of the new pipeline
❖ Quick turnaround in case of failures
❖ PAVC infra enhancement and future features
Data support planning:
1 Top-level infra health
• Agent install health
• Scan success vs. failure, SCAP and Vuln scan health
2 Regression
• Host count
• Network scanner health (legacy)
3 Scan quality
• OS detection, Outdated Audits, AV, Scan Alerts
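A few of the health signals listed above (agent install health, scan success vs. failure, host count regression) could be computed along these lines. This is only a sketch under stated assumptions: the `ScanRecord` shape, the metric names, and the 5% regression tolerance are hypothetical and not taken from the PAVC pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ScanRecord:
    host: str
    agent_installed: bool
    scan_succeeded: bool


def summarize(records: Iterable[ScanRecord]) -> Dict[str, float]:
    """Compute top-level health metrics from per-host records."""
    rows = list(records)
    total = len(rows)
    installed = sum(1 for r in rows if r.agent_installed)
    # A scan only counts as succeeded if the agent is installed on that host.
    succeeded = sum(1 for r in rows if r.agent_installed and r.scan_succeeded)
    return {
        "host_count": float(total),
        "agent_install_rate": installed / total if total else 0.0,
        "scan_success_rate": succeeded / installed if installed else 0.0,
    }


def host_count_regressed(previous: int, current: int,
                         tolerance: float = 0.05) -> bool:
    """Flag a regression when the host count drops by more than `tolerance`."""
    if previous <= 0:
        return False
    return (previous - current) / previous > tolerance


# Usage with made-up records for four hypothetical hosts.
records = [
    ScanRecord("host-001", True, True),
    ScanRecord("host-002", True, True),
    ScanRecord("host-003", True, False),
    ScanRecord("host-004", False, False),
]
summary = summarize(records)
dropped = host_count_regressed(previous=1000, current=900)
```

Tracking the host count separately from success rates matters because hosts that silently vanish from coverage never show up as scan failures.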