
Impact of Configuration Errors on DNS Robustness

V. Pappas *

Z. Xu *, S. Lu *, D. Massey **, A. Terzis ***, L. Zhang *

* UCLA, ** Colorado State, *** Johns Hopkins

Motivation

  • DNS: part of the Internet core infrastructure

    • Applications: web, e-mail, E.164, CDNs …

  • DNS: considered a very reliable system

    • Works almost always

  • Question: is DNS a robust system?

    • User-perceived robustness

    • System robustness

    • Are they the same?


Short Answer:

“Microsoft's websites were offline for up to 23 hours -- the most dramatic snafu to date on the Internet -- because of an equipment misconfiguration” -- Wired News, Jan 2001

  • Thousands or even millions of users affected

  • All due to a single DNS configuration error

Related Work

  • Traffic & implementation error studies:

    • Danzig et al. [SIGCOMM92]: bugs

    • CAIDA: traffic & bugs

  • Performance studies:

    • Jung et al. [IMW01]: caching

    • Cohen et al. [SAINT01]: proactive caching

    • Liston et al. [IMW02]: diversity

  • Server availability:

    • To appear [OSDI04, IMC04]

Our Work: Study DNS Robustness

  • Classify DNS operational errors:

    • Study known errors

    • Identify new types of errors

  • Measure their pervasiveness

  • Quantify their impact on DNS

    • availability

    • performance


Outline

  • DNS Overview

  • Measurement Methodology

  • DNS Configuration Errors

    • Example Cases

    • Measurement Results

  • Discussion & Summary






DNS Overview

  • Zone:

    • Occupies a continuous subspace

    • Served by the same name servers

[Figure: DNS hierarchy and name resolution; labels: client, caching server, root zone (com NS RRs, com A RRs), com zone, foo zone, bar zone, foo NS RRs, foo A RRs, bar NS RRs, bar A RRs, resource records, name servers, "asking for", "answer: A"]

Infrastructure RRs

  • NS Resource Record:

    • Provides the names of a zone’s authoritative servers

    • Stored both at the parent and at the child zone


  • A Resource Record:

    • Associated with an NS resource record

    • Stored at the parent zone (glue A record)
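Because the NS records are stored at both the parent and the child zone, the two copies can drift apart. The following is a minimal sketch of comparing them, assuming the two NS sets have already been fetched; the zone name, server names, and the `compare_ns_sets` helper are hypothetical and only illustrate the idea.

```python
def normalize(name: str) -> str:
    """Lower-case a domain name and strip the trailing root dot."""
    return name.strip().lower().rstrip(".")


def compare_ns_sets(parent_ns, child_ns):
    """Return (only_at_parent, only_at_child) as sorted lists of names."""
    parent = {normalize(n) for n in parent_ns}
    child = {normalize(n) for n in child_ns}
    return sorted(parent - child), sorted(child - parent)


# Hypothetical NS sets for an example zone "foo.com".
parent_ns = ["ns1.foo.com.", "ns2.foo.com.", "ns.old-host.net."]
child_ns = ["NS1.foo.com", "ns2.foo.com", "ns3.foo.com"]

only_parent, only_child = compare_ns_sets(parent_ns, child_ns)
print("listed only at parent:", only_parent)
print("listed only at child:", only_child)
```

A name that appears only at the parent is a candidate for closer inspection, since resolvers following the delegation will still try it.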

What Affects DNS Availability

  • Name Servers:

    • Software failures

    • Network failures

    • Scheduled maintenance tasks

  • Infrastructure Resource Records:

    • Availability of these records

    • Configuration errors (focus of our work)









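Classification of Measured Errors

  • Inconsistency: the configuration of infrastructure RRs does not correspond to the actual authoritative name servers

    • Lame delegation

  • Dependency: more than one name server shares a common point of failure

    • Diminished server redundancy

    • Cyclic zone dependency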



What is Measured?

  • Frequency of configuration errors:

    • System parameters: TLDs, DNS level, zone size (i.e. the number of delegations)

  • Impact on availability:

    • Number of servers: lost due to these errors

    • Zone’s availability: probability of resolving a name

  • Impact on performance:

    • Total time to resolve a query

      • Starting from the query issuing time

      • Finishing at the query final answer time
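The "zone's availability" metric above (the probability of resolving a name) can be illustrated with a small calculation. This is only a sketch under stated assumptions: a name resolves if at least one authoritative server is reachable, servers fail independently unless they share a failure point, and all numbers are made up for illustration rather than taken from the measurements.

```python
from math import prod


def zone_availability(server_availabilities):
    """Probability that at least one server answers (independent failures assumed)."""
    return 1.0 - prod(1.0 - a for a in server_availabilities)


def shared_failure_availability(shared_availability, server_availabilities):
    """Same as above, but every server also depends on one shared component
    (for example a single subnet) that must be up as well."""
    return shared_availability * zone_availability(server_availabilities)


# Hypothetical per-server availability of 99% for three servers.
servers = [0.99, 0.99, 0.99]
independent = zone_availability(servers)
same_subnet = shared_failure_availability(0.999, servers)

print(f"independent servers: {independent:.6f}")
print(f"behind one shared subnet: {same_subnet:.6f}")
```

With a shared component, the zone can never be more available than that component, no matter how many servers are added behind it.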

Measurement Methodology

  • Error frequency and availability impact:

    • 3 sets of active measurements

      • Random set of 50K zones

      • 20K zones that allow zone transfers

      • 500 popular zones

  • Performance impact:

    • 2 sets of passive measurements: 1-week DNS packet traces

Lame Delegation

[Figure: example delegation; labels: NS, NS, A, A]

1) Non-existing server: 3-second performance penalty

2) DNS error code: 1 RTT performance penalty

3) Useless referral: 1 RTT performance penalty

4) Non-authoritative answer (cached)
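The four lame delegation cases can be expressed as a small classifier over what a listed server returns. This is a rough sketch, not the measurement tool used in the study: the `Response` fields, the set of error codes, and the rule that any other non-authoritative reply counts as case 4 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LameType(Enum):
    NOT_LAME = "authoritative answer"
    NO_SERVER = "1) non-existing server"
    ERROR_CODE = "2) DNS error code"
    USELESS_REFERRAL = "3) useless referral"
    NON_AUTHORITATIVE = "4) non-authoritative answer (cached)"


@dataclass
class Response:
    """Simplified view of a DNS reply (hypothetical structure)."""
    rcode: str = "NOERROR"
    authoritative: bool = False  # AA flag set in the reply
    is_referral: bool = False    # reply only points to other servers


# Assumption: these response codes are treated as "DNS error code".
ERROR_RCODES = {"SERVFAIL", "REFUSED"}


def classify(response: Optional[Response]) -> LameType:
    """Map a reply (or None for a timeout) to one of the cases above."""
    if response is None:
        return LameType.NO_SERVER
    if response.rcode in ERROR_RCODES:
        return LameType.ERROR_CODE
    if response.authoritative:
        return LameType.NOT_LAME
    if response.is_referral:
        return LameType.USELESS_REFERRAL
    return LameType.NON_AUTHORITATIVE


print(classify(None).value)
print(classify(Response(rcode="REFUSED")).value)
print(classify(Response(is_referral=True)).value)
```

Checking the timeout first mirrors the cost ordering on the slide: a non-existing server is the most expensive case because the resolver waits for the timeout before trying another server.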

Lame Delegation Results


[Figure: plot labels: 0.06 sec, 3 sec, 0.4 sec]

  • Error Frequency:

    • 15% of the zones

    • 8% for the 500 most popular zones

    • independent of the zone’s size, varies a lot per TLD

  • Impact:

    • 70% of the zones with errors lose half or more of the authoritative servers

    • 8% of the queries experience increased response times (up to an order of magnitude) due to lame delegation

Diminished Server Redundancy

[Figure: example delegation; labels: NS, NS, A, A]

A) Network level: servers belong to the same subnet

B) Autonomous system level: servers belong to the same AS

C) Geographic location level: servers belong to the same city
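The three redundancy levels can be checked by counting how many distinct subnets, ASes, and cities a zone's servers fall into. The sketch below assumes IPv4 addresses, uses a /24 prefix as the network-level grouping, and takes the AS number and city of each server as given; the server list is hypothetical.

```python
import ipaddress


def subnet24(addr: str) -> str:
    """Return the /24 network that contains an IPv4 address."""
    return str(ipaddress.ip_network(f"{addr}/24", strict=False))


def diversity(servers):
    """Count distinct groups at each redundancy level."""
    return {
        "subnet": len({subnet24(s["addr"]) for s in servers}),
        "as": len({s["asn"] for s in servers}),
        "city": len({s["city"] for s in servers}),
    }


# Hypothetical name servers of one zone (documentation address ranges).
servers = [
    {"addr": "192.0.2.10", "asn": 64500, "city": "City A"},
    {"addr": "192.0.2.20", "asn": 64500, "city": "City A"},
    {"addr": "198.51.100.5", "asn": 64500, "city": "City B"},
]

counts = diversity(servers)
for level, n in counts.items():
    # A single group means all servers share that potential point of failure.
    status = "diminished redundancy" if n == 1 else "ok"
    print(f"{level}: {n} distinct ({status})")
```

In this example the zone has three servers but only one AS, so a routing problem affecting that AS would make all of them unreachable at once.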

Diminished Server Redundancy Results

  • Error Frequency:

    • 45% of all zones have all servers in the same /24 subnet

    • 75% of all zones have servers in the same AS

    • large & popular zones: better AS and geo diversity

  • Impact:

    • less than 99.9% availability: all servers in the same /24 subnet

    • more than 99.99% availability: 3 servers in different ASes or different cities

Cyclic Zone Dependency (1)

[Figure: example delegation; labels: NS, NS, A. Callouts: "The A glue RR for … missing"; "… depends …"; "If … is unavailable then … is too"]

Cyclic Zone Dependency (2)

[Figure: example delegations; labels: foo, NS, NS, A. Callouts: "If … and … are unavailable, B addr. are unresolvable"; "The B servers depend on A servers"; "The zone seems correctly configured"; "The combination of … and … zones is wrongly …"]



Cyclic Zone Dependency Results

  • Error Frequency:

    • 2% of the zones

    • None of the 500 most popular zones

  • Impact:

    • 90% of the zones with cyclic dependency errors lose 25% (or even more) of their servers

    • 2 or 4 zones are involved in most errors
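A cyclic zone dependency can be pictured as a cycle in a graph whose nodes are zones and whose edges mean "resolving this zone's server names requires that zone". The following sketch finds such a cycle with a depth-first search; the graph, the zone names, and the `find_cycle` helper are hypothetical, and building the graph from real delegation data is left out.

```python
def find_cycle(depends_on, start):
    """Return a list of zones forming a dependency cycle reachable from
    `start` (first zone repeated at the end), or None if there is none."""
    path = []        # zones on the current DFS path, in order
    on_path = set()  # same zones, for fast membership tests
    done = set()     # zones fully explored without finding a cycle

    def visit(zone):
        if zone in on_path:
            return path[path.index(zone):] + [zone]
        if zone in done:
            return None
        on_path.add(zone)
        path.append(zone)
        for nxt in sorted(depends_on.get(zone, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(zone)
        done.add(zone)
        return None

    return visit(start)


# Hypothetical dependencies: each zone's servers are named in the other zone.
depends_on = {
    "foo.com": {"bar.com"},
    "bar.com": {"foo.com"},
    "baz.org": {"example.net"},
}

print(find_cycle(depends_on, "foo.com"))
print(find_cycle(depends_on, "baz.org"))
```

Each zone in the example looks correctly configured when inspected alone; the problem only shows up when the dependencies of both zones are examined together.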

Discussion: User-Perceived != System Robustness

  • User-perceived robustness:

    • Data replication: only one server is needed

    • Data caching: temporarily masks infrastructure failures

    • Popular zones: fewer configuration errors

  • System robustness:

    • Fewer available servers: due to inconsistency errors

    • Fewer redundant servers: due to dependency errors

Discussion: Why so many errors?

  • Superficially: errors are due to operators:

    • Unaware of these errors

    • Lack of coordination

      • parent-child zone, secondary servers hosting

  • Fundamentally: errors are due to protocol design:

    • Lack of mechanisms to handle these errors

      • proactively or reactively

    • Design choices that embrace some of them:

      • Name servers are identified by names

      • Glue NS & A records necessary to set up the DNS tree


Summary

  • DNS operational errors are widespread

  • DNS operational errors affect availability:

    • 50% of the servers lost

    • less than 99.9% availability

  • DNS operational errors affect performance:

    • response times increase by 1 or even 2 orders of magnitude

  • DNS system robustness lower than user perception

    • Due to protocol design, not just due to operator errors

Ongoing Work

  • Reactive mechanisms:

    • DNS Troubleshooting [NetTs 04]

  • Proactive mechanisms:

    • Enhancing DNS replication & caching

Thank You!!!
