SlideShare una empresa de Scribd logo
1 de 22
Descargar para leer sin conexión
a study of
our DNS full-resolvers
Matsuzaki ‘maz’ Yoshinobu
<maz@iij.ad.jp>
Topic for today
• Lesson learned from an outage of our full-resolvers
• An interesting behavior of clients
Users and DNS full-resolver
full-resolver
(cache nameserver)
Most users are using ISP’s full-
resolvers as those information
are provided automatically
maz@iij.ad.jp 3
DNS cache nameserver
• Usually ISPs provide 2 nameservers for customers
• Just in case
• Our assumptions here:
• Even single server was failed, another server can handle
DNS queries
• Users somehow automatically pick an usable one up for
their use
maz@iij.ad.jp 4
In 2009, we had a trouble
• Trouble on cache nameservers for consumers
• Apr, 2009
• On two (all) nameservers
• 1st failure happened on
a server (ns01)
• then 2nd failure happened
on another server (ns11)
• About 12min blackout
maz@iij.ad.jp 5
Failures on our cache nameservers
• ns01: 17:14:26 - 17:48:07 (33min14sec)
• ns11: 17:35:51 - 17:48:52 (13min01sec)
• During the both servers were in trouble
(12min16sec), our users couldn’t resolve
hostnames
• During this trouble, the servers couldn’t answer
14,005,644 DNS queries
maz@iij.ad.jp 6
The query graph
maz@iij.ad.jp 7
Before failure
• Clients prefer to use ns01
• Order of configuration?
• Clients sent DNS queries to
another server as well
• measuring delays?
• just in case?
maz@iij.ad.jp 8
During single failure
• DNS queries to ns01 were
discarded during this period
• It seems users could still resolve
hostnames as the ns11 was alive
• No strange traffic pattern here
• Users might feel some delays
• A bit higher rate of queries in the
first 3min, and then ‘stable’ state
maz@iij.ad.jp 9
During single failure (cont.
• Query rate(ns01+ns11) looks
almost the same as before
• Even though ns01 was discarding
queries during this period
• Probably most clients usually send
DNS queries to both nameservers
maz@iij.ad.jp 10
During double failure (outage)
• Users couldn’t resolve hostnames
at all during this period
• Query rate suddenly increased on
both nameservers
• Those are all discarded though
• Mostly because of ‘retries’
• We observed multiple queries that
has the same QNAME
maz@iij.ad.jp 11
Restoration
• Once ns01 was restored, it got
about 7times more queries than
usual for several seconds
• Web pages are composed by many
“modules”
• Single web page makes several and
more DNS queries sometimes
• Browsers’ prefetch function
• 12min was enough to flush
clients’ side DNS cache
maz@iij.ad.jp 12
Restora(on (cont.
• Then ns11 was also restored
• It also got higher rate of queries for
several seconds
• Gradually the query rate were
getting ‘normal’ state as same as
the before
maz@iij.ad.jp 13
Lesson learned
• Single server failure will not cause a disaster, when
users configure multiple DNS cache servers on their
device
• Probably the impact could be negligible
• During double failure (full-outage), nameservers
got more queries
• Once a server is restored from full-outage, it gets
higher query rate for a while
• In our case, 7 times more than usual for several seconds
maz@iij.ad.jp 14
Redundancy is important
• DNS resolving works somehow as long as one of servers
is functional on each part of DNS
• Full-resolvers (caching nameservers)
• Authoritative nameservers
• Do have redundancy, avoid outage
• A multiple server deployment works well
• IP anycast would be also useful
• my bdnog7 talk - https://www.slideshare.net/bdnog/ip-anycasting
• Once outage, we should expect a large amount of
queries during and just after the outage
• A warning for those who has a security device in front of
nameservers
Users and DNS full-resolver
full-resolver
(cache nameserver)
Many devices
on a home network
maz@iij.ad.jp 16
Usual graph - 5min average
measurement date: 2018/05/16
A bit different view - 1sec average
measurement date: 2018/05/16
Peaks on the hour
• Minor peaks on the hour and half (like 07:30)
• “Alarm clock” wakes up the phone itself
• Some applications are also initiated by the wakeup
• I guess those are mostly coming from smartphones
• QNAMEs also hint
A spike at 15sec before the hour
measurement date: 2018/05/16
we’ve something different now
measurement date: 2019/10/25
Summary
• It’s reasonable for us to provide 2 full-resolvers
(caching nameservers) to customers
• Clients seem to have the ability to use a functional one
• Once outage, we should expect a large amount of
queries during and just after the outage
• A warning to those who has a security device in front of
nameservers
• Clients are synced up unintentionally
• ‘alarm clock’ or scheduled tasks
• This particular case is not an issue at this moment, but
it’s worth to pay attention to those behaviors
maz@iij.ad.jp 22

Más contenido relacionado

La actualidad más candente

Dns introduction
Dns   introduction Dns   introduction
Dns introduction
sunil kumar
 

La actualidad más candente (20)

Introduction to DNS
Introduction to DNSIntroduction to DNS
Introduction to DNS
 
Dnssec
DnssecDnssec
Dnssec
 
Namespaces for Local Networks
Namespaces for Local NetworksNamespaces for Local Networks
Namespaces for Local Networks
 
Windows 2012 and DNSSEC
Windows 2012 and DNSSECWindows 2012 and DNSSEC
Windows 2012 and DNSSEC
 
Get your instance by name integration of nova, neutron and designate
Get your instance by name  integration of nova, neutron and designateGet your instance by name  integration of nova, neutron and designate
Get your instance by name integration of nova, neutron and designate
 
Designate - DNSaaS for OpenStack - FOSDEM 2014
Designate - DNSaaS for OpenStack - FOSDEM 2014Designate - DNSaaS for OpenStack - FOSDEM 2014
Designate - DNSaaS for OpenStack - FOSDEM 2014
 
DNSSEC Tutorial, by Champika Wijayatunga [APNIC 38]
DNSSEC Tutorial, by Champika Wijayatunga [APNIC 38]DNSSEC Tutorial, by Champika Wijayatunga [APNIC 38]
DNSSEC Tutorial, by Champika Wijayatunga [APNIC 38]
 
Part 3 - Local Name Resolution in Linux, FreeBSD and macOS/iOS
Part 3 - Local Name Resolution in Linux, FreeBSD and macOS/iOSPart 3 - Local Name Resolution in Linux, FreeBSD and macOS/iOS
Part 3 - Local Name Resolution in Linux, FreeBSD and macOS/iOS
 
Implementing Domain Name
Implementing Domain NameImplementing Domain Name
Implementing Domain Name
 
Grey H@t - DNS Cache Poisoning
Grey H@t - DNS Cache PoisoningGrey H@t - DNS Cache Poisoning
Grey H@t - DNS Cache Poisoning
 
Dns introduction
Dns   introduction Dns   introduction
Dns introduction
 
DNS for Developers - NDC Oslo 2016
DNS for Developers - NDC Oslo 2016DNS for Developers - NDC Oslo 2016
DNS for Developers - NDC Oslo 2016
 
Best Practices - PHP and the Oracle Database
Best Practices - PHP and the Oracle DatabaseBest Practices - PHP and the Oracle Database
Best Practices - PHP and the Oracle Database
 
OARC 31: NSEC Caching Revisited
OARC 31: NSEC Caching RevisitedOARC 31: NSEC Caching Revisited
OARC 31: NSEC Caching Revisited
 
DNSSEC signing Tutorial
DNSSEC signing Tutorial DNSSEC signing Tutorial
DNSSEC signing Tutorial
 
Windows Server 2016 Webinar
Windows Server 2016 WebinarWindows Server 2016 Webinar
Windows Server 2016 Webinar
 
Domain name system
Domain name systemDomain name system
Domain name system
 
bdNOG 7 - Re-engineering the DNS - one resolver at a time
bdNOG 7 - Re-engineering the DNS - one resolver at a timebdNOG 7 - Re-engineering the DNS - one resolver at a time
bdNOG 7 - Re-engineering the DNS - one resolver at a time
 
DNS High-Availability Tools - Open-Source Load Balancing Solutions
DNS High-Availability Tools - Open-Source Load Balancing SolutionsDNS High-Availability Tools - Open-Source Load Balancing Solutions
DNS High-Availability Tools - Open-Source Load Balancing Solutions
 
DNS Vulnerabilities
DNS VulnerabilitiesDNS Vulnerabilities
DNS Vulnerabilities
 

Similar a A study of our DNS full-resolvers

Got Problems? Let's Do a Health Check
Got Problems? Let's Do a Health CheckGot Problems? Let's Do a Health Check
Got Problems? Let's Do a Health Check
Luis Guirigay
 
Whalebone-UKNOF44security992_new_impl.pptx
Whalebone-UKNOF44security992_new_impl.pptxWhalebone-UKNOF44security992_new_impl.pptx
Whalebone-UKNOF44security992_new_impl.pptx
Ans Sembiring
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
DataWorks Summit
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
KoprowskiT - SQLBITS X - 2am a disaster just began
KoprowskiT - SQLBITS X - 2am a disaster just beganKoprowskiT - SQLBITS X - 2am a disaster just began
KoprowskiT - SQLBITS X - 2am a disaster just began
Tobias Koprowski
 

Similar a A study of our DNS full-resolvers (20)

DNS in IR: Collection, Analysis and Response
DNS in IR: Collection, Analysis and ResponseDNS in IR: Collection, Analysis and Response
DNS in IR: Collection, Analysis and Response
 
NZNOG 2013 - Experiments in DNSSEC
NZNOG 2013 - Experiments in DNSSECNZNOG 2013 - Experiments in DNSSEC
NZNOG 2013 - Experiments in DNSSEC
 
Got Problems? Let's Do a Health Check
Got Problems? Let's Do a Health CheckGot Problems? Let's Do a Health Check
Got Problems? Let's Do a Health Check
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
The latest news in the DNS resolution: DNSSEC
The latest news in the DNS resolution: DNSSECThe latest news in the DNS resolution: DNSSEC
The latest news in the DNS resolution: DNSSEC
 
Whalebone-UKNOF44security992_new_impl.pptx
Whalebone-UKNOF44security992_new_impl.pptxWhalebone-UKNOF44security992_new_impl.pptx
Whalebone-UKNOF44security992_new_impl.pptx
 
DevOps throughout time
DevOps throughout timeDevOps throughout time
DevOps throughout time
 
Nagios XI Best Practices
Nagios XI Best PracticesNagios XI Best Practices
Nagios XI Best Practices
 
DNS Security
DNS SecurityDNS Security
DNS Security
 
Best And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM ConnectionsBest And Worst Practices Deploying IBM Connections
Best And Worst Practices Deploying IBM Connections
 
DINR 2021 Virtual Workshop: Passive vs Active Measurements in the DNS
DINR 2021 Virtual Workshop: Passive vs Active Measurements in the DNSDINR 2021 Virtual Workshop: Passive vs Active Measurements in the DNS
DINR 2021 Virtual Workshop: Passive vs Active Measurements in the DNS
 
PLNOG14: DNS, czyli co nowego w świecie DNS-ozaurów - Adam Obszyński
PLNOG14: DNS, czyli co nowego w świecie DNS-ozaurów - Adam ObszyńskiPLNOG14: DNS, czyli co nowego w świecie DNS-ozaurów - Adam Obszyński
PLNOG14: DNS, czyli co nowego w świecie DNS-ozaurów - Adam Obszyński
 
Signing DNSSEC answers on the fly at the edge: challenges and solutions
Signing DNSSEC answers on the fly at the edge: challenges and solutionsSigning DNSSEC answers on the fly at the edge: challenges and solutions
Signing DNSSEC answers on the fly at the edge: challenges and solutions
 
Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling Out
 
2 technical-dns-workshop-day1
2 technical-dns-workshop-day12 technical-dns-workshop-day1
2 technical-dns-workshop-day1
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Custom coded projects
Custom coded projectsCustom coded projects
Custom coded projects
 
ION Islamabad - Deploying DNSSEC
ION Islamabad - Deploying DNSSECION Islamabad - Deploying DNSSEC
ION Islamabad - Deploying DNSSEC
 
KoprowskiT - SQLBITS X - 2am a disaster just began
KoprowskiT - SQLBITS X - 2am a disaster just beganKoprowskiT - SQLBITS X - 2am a disaster just began
KoprowskiT - SQLBITS X - 2am a disaster just began
 

Más de Bangladesh Network Operators Group

Más de Bangladesh Network Operators Group (20)

Accelerating Hyper-Converged Enterprise Virtualization using Proxmox and Ceph
Accelerating Hyper-Converged Enterprise Virtualization using Proxmox and CephAccelerating Hyper-Converged Enterprise Virtualization using Proxmox and Ceph
Accelerating Hyper-Converged Enterprise Virtualization using Proxmox and Ceph
 
Recent IRR changes by Yoshinobu Matsuzaki, IIJ
Recent IRR changes by Yoshinobu Matsuzaki, IIJRecent IRR changes by Yoshinobu Matsuzaki, IIJ
Recent IRR changes by Yoshinobu Matsuzaki, IIJ
 
Fact Sheets : Network Status in Bangladesh
Fact Sheets : Network Status in BangladeshFact Sheets : Network Status in Bangladesh
Fact Sheets : Network Status in Bangladesh
 
AI Driven Wi-Fi for the Bottom of the Pyramid
AI Driven Wi-Fi for the Bottom of the PyramidAI Driven Wi-Fi for the Bottom of the Pyramid
AI Driven Wi-Fi for the Bottom of the Pyramid
 
IPv6 Security Overview by QS Tahmeed, APNIC RCT
IPv6 Security Overview by QS Tahmeed, APNIC RCTIPv6 Security Overview by QS Tahmeed, APNIC RCT
IPv6 Security Overview by QS Tahmeed, APNIC RCT
 
Network eWaste : Community role to manage end of life Product
Network eWaste : Community role to manage end of life ProductNetwork eWaste : Community role to manage end of life Product
Network eWaste : Community role to manage end of life Product
 
A plenarily integrated SIEM solution and it’s Deployment
A plenarily integrated SIEM solution and it’s DeploymentA plenarily integrated SIEM solution and it’s Deployment
A plenarily integrated SIEM solution and it’s Deployment
 
IPv6 Deployment in South Asia 2022
IPv6 Deployment in South Asia  2022IPv6 Deployment in South Asia  2022
IPv6 Deployment in South Asia 2022
 
Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)
 
RPKI Deployment Status in Bangladesh
RPKI Deployment Status in BangladeshRPKI Deployment Status in Bangladesh
RPKI Deployment Status in Bangladesh
 
An Overview about open UDP Services
An Overview about open UDP ServicesAn Overview about open UDP Services
An Overview about open UDP Services
 
12 Years in DNS Security As a Defender
12 Years in DNS Security As a Defender12 Years in DNS Security As a Defender
12 Years in DNS Security As a Defender
 
Contents Localization Initiatives to get better User Experience
Contents Localization Initiatives to get better User ExperienceContents Localization Initiatives to get better User Experience
Contents Localization Initiatives to get better User Experience
 
BdNOG-20220625-MT-v6.0.pptx
BdNOG-20220625-MT-v6.0.pptxBdNOG-20220625-MT-v6.0.pptx
BdNOG-20220625-MT-v6.0.pptx
 
Route Leak Prevension with BGP Community
Route Leak Prevension with BGP CommunityRoute Leak Prevension with BGP Community
Route Leak Prevension with BGP Community
 
Tale of a New Bangladeshi NIX
Tale of a New Bangladeshi NIXTale of a New Bangladeshi NIX
Tale of a New Bangladeshi NIX
 
MANRS for Network Operators
MANRS for Network OperatorsMANRS for Network Operators
MANRS for Network Operators
 
Re-define network visibility for capacity planning & forecasting with Grafana
Re-define network visibility for capacity planning & forecasting with GrafanaRe-define network visibility for capacity planning & forecasting with Grafana
Re-define network visibility for capacity planning & forecasting with Grafana
 
RPKI ROA updates
RPKI ROA updatesRPKI ROA updates
RPKI ROA updates
 
Blockchain Demystified
Blockchain DemystifiedBlockchain Demystified
Blockchain Demystified
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

A study of our DNS full-resolvers

  • 1. a study of our DNS full-resolvers Matsuzaki ‘maz’ Yoshinobu <maz@iij.ad.jp>
  • 2. Topic for today • Lesson learned from an outage of our full-resolvers • An interesting behavior of clients
  • 3. Users and DNS full-resolver full-resolver (cache nameserver) Most users are using ISP’s full- resolvers as those information are provided automatically maz@iij.ad.jp 3
  • 4. DNS cache nameserver • Usually ISPs provide 2 nameservers for customers • Just in case • Our assumptions here: • Even single server was failed, another server can handle DNS queries • Users somehow automatically pick an usable one up for their use maz@iij.ad.jp 4
  • 5. In 2009, we had a trouble • Trouble on cache nameservers for consumers • Apr, 2009 • On two (all) nameservers • 1st failure happened on a server (ns01) • then 2nd failure happened on another server (ns11) • About 12min blackout maz@iij.ad.jp 5
  • 6. Failures on our cache nameservers • ns01: 17:14:26 - 17:48:07 (33min14sec) • ns11: 17:35:51 - 17:48:52 (13min01sec) • During the both servers were in trouble (12min16sec), our users couldn’t resolve hostnames • During this trouble, the servers couldn’t answer 14,005,644 DNS queries maz@iij.ad.jp 6
  • 8. Before failure • Clients prefer to use ns01 • Order of configuration? • Clients sent DNS queries to another server as well • measuring delays? • just in case? maz@iij.ad.jp 8
  • 9. During single failure • DNS queries to ns01 were discarded during this period • It seems users could still resolve hostnames as the ns11 was alive • No strange traffic pattern here • Users might feel some delays • A bit higher rate of queries in the first 3min, and then ‘stable’ state maz@iij.ad.jp 9
  • 10. During single failure (cont. • Query rate(ns01+ns11) looks almost the same as before • Even though ns01 was discarding queries during this period • Probably most clients usually send DNS queries to both nameservers maz@iij.ad.jp 10
  • 11. During double failure (outage) • Users couldn’t resolve hostnames at all during this period • Query rate suddenly increased on both nameservers • Those are all discarded though • Mostly because of ‘retries’ • We observed multiple queries that has the same QNAME maz@iij.ad.jp 11
  • 12. Restoration • Once ns01 was restored, it got about 7times more queries than usual for several seconds • Web pages are composed by many “modules” • Single web page makes several and more DNS queries sometimes • Browsers’ prefetch function • 12min was enough to flush clients’ side DNS cache maz@iij.ad.jp 12
  • 13. Restora(on (cont. • Then ns11 was also restored • It also got higher rate of queries for several seconds • Gradually the query rate were getting ‘normal’ state as same as the before maz@iij.ad.jp 13
  • 14. Lesson learned • Single server failure will not cause a disaster, when users configure multiple DNS cache servers on their device • Probably the impact could be negligible • During double failure (full-outage), nameservers got more queries • Once a server is restored from full-outage, it gets higher query rate for a while • In our case, 7 times more than usual for several seconds maz@iij.ad.jp 14
  • 15. Redundancy is important • DNS resolving works somehow as long as one of servers is functional on each part of DNS • Full-resolvers (caching nameservers) • Authoritative nameservers • Do have redundancy, avoid outage • A multiple server deployment works well • IP anycast would be also useful • my bdnog7 talk - https://www.slideshare.net/bdnog/ip-anycasting • Once outage, we should expect a large amount of queries during and just after the outage • A warning for those who has a security device in front of nameservers
  • 16. Users and DNS full-resolver full-resolver (cache nameserver) Many devices on a home network maz@iij.ad.jp 16
  • 17. Usual graph - 5min average measurement date: 2018/05/16
  • 18. A bit different view - 1sec average measurement date: 2018/05/16
  • 19. Peaks on the hour • Minor peaks on the hour and half (like 07:30) • “Alarm clock” wakes up the phone itself • Some applications are also initiated by the wakeup • I guess those are mostly coming from smartphones • QNAMEs also hint
  • 20. A spike at 15sec before the hour measurement date: 2018/05/16
  • 21. we’ve something different now measurement date: 2019/10/25
  • 22. Summary • It’s reasonable for us to provide 2 full-resolvers (caching nameservers) to customers • Clients seem to have the ability to use a functional one • Once outage, we should expect a large amount of queries during and just after the outage • A warning to those who has a security device in front of nameservers • Clients are synced up unintentionally • ‘alarm clock’ or scheduled tasks • This particular case is not an issue at this moment, but it’s worth to pay attention to those behaviors maz@iij.ad.jp 22