
Domain Name System

 

Overview

So far we have only used IP addresses to identify hosts on a network. When we covered the network layer, we saw how IP addresses are formatted and used. When we did socket programming, we connected to servers by specifying their IP addresses.

Of course, as a regular Internet user, you almost never connect to machines by IP address. Instead, you use a hostname, which is easier for people to remember. However, we couldn't use hostnames alone, because IP addresses are more useful for computers. For instance, a hostname tells you nothing about where in the Internet a machine is, but an IP address does.

So the solution is that networks internally use IP addresses, but people use hostnames. Then there is a part of the Internet that translates hostnames into IP addresses when needed. That is the "Domain Name System", or DNS.


 

DNS Services

DNS broadly consists of two things:

  1. A distributed database which is housed in a hierarchy of DNS servers.
  2. A protocol which allows hosts to query this database. The protocol runs at the application layer, but is implemented in system software. It is built on top of UDP and typically uses port 53.

Many other application protocols, such as HTTP, SSH, and SMTP, use DNS to allow users to specify hosts by hostname instead of by IP address.

While translating hostnames to IP addresses is the main job of DNS, it actually provides a few other services as well:


 

DNS Lookup Overview

When a client needs to know the IP address associated with a hostname, it is called a lookup. For example, if you connect your web browser to umw.edu/, your browser needs to know the IP address of this machine. It goes through the following process for this:

  1. The browser gets the hostname out of the URL you requested, and passes it to the DNS client. The DNS client is part of the OS on the same machine as the web browser.
  2. The DNS client then sends a query containing this hostname to a DNS server.
  3. The DNS client receives a reply which contains the IP address.
  4. The DNS client passes this IP address to the browser, which can then use it to connect to the server over TCP and send an HTTP request.

In Python, a socket's connect method can take a hostname instead of an IP address. The extra lookup steps are hidden inside the socket API.
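To make the hidden step visible, here is a minimal sketch that performs the lookup explicitly before connecting. The helper names resolve and connect_by_name are made up for this illustration; socket.getaddrinfo is the standard library call that hands the name to the operating system's DNS client.

```python
import socket


def resolve(hostname, port=80):
    """Return the distinct IPv4 addresses the OS resolver reports for hostname."""
    # getaddrinfo passes the name to the operating system's DNS client,
    # the same kind of lookup that connect() triggers behind the scenes.
    results = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip_address = sockaddr[0]  # for IPv4, sockaddr is an (ip, port) pair
        if ip_address not in addresses:
            addresses.append(ip_address)
    return addresses


def connect_by_name(hostname, port=80):
    """Resolve hostname ourselves, then connect to the first address by IP."""
    ip_address = resolve(hostname, port)[0]
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((ip_address, port))
    return sock
```

Calling resolve("umw.edu") on a machine with network access should return whatever addresses DNS currently holds for that name, so the exact result will vary. Passing a numeric address such as "127.0.0.1" returns it unchanged, since no lookup is needed.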


 

Distributed DNS

It would be possible to design DNS such that there was a single, authoritative server that managed the mapping database. However, that would not really work in practice:

Instead DNS is organized into a distributed hierarchy of multiple servers. No single server even knows all of the hostnames in the Internet. There are three primary levels of this hierarchy:

  1. First are the root name servers, of which there are over 900 instances spread across the world. These are organized as 13 named root servers, operated by 12 different organizations. Root name servers provide the IP addresses of the top-level domain (TLD) servers.
  2. The top-level domain (TLD) servers are each associated with one top-level domain (such as com, org, edu, net) or a country domain (such as uk, jp, es, ca). These servers provide the IP addresses for the authoritative servers. Each of these is run by some organization, such as the company "Verisign" for .com.
  3. Every organization with a publicly accessible host (such as a web server) must provide DNS records for those hosts in an authoritative name server. They can either run these name servers themselves, or pay another company to. Most larger companies, and universities, maintain their own DNS servers.

When a DNS request is made, we do the following:

  1. We send a request to a root name server, which gives us the IP of the appropriate TLD name server.
  2. We contact the TLD name server which gives us the IP of the appropriate authoritative name server.
  3. We contact this authoritative server which finally gives us the IP of the host we are looking for.

Of course, if we really went through all of this every time we needed to find an IP address, all connections could take about four times longer! To get around this issue, DNS uses caching to avoid looking up more names than we need to.
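The three-step walk can be sketched as a toy simulation. This is only an illustration: the dictionaries stand in for real servers, and every name and address in them is hypothetical (taken from ranges reserved for documentation). The function also counts queries, to show the cost of doing a full lookup without a cache.

```python
# Hypothetical data: all names and addresses below are made up for illustration.
ROOT_SERVER = {"org": "192.0.2.1", "edu": "192.0.2.2"}  # TLD -> TLD server IP

TLD_SERVERS = {
    # TLD server IP -> {domain -> authoritative server IP}
    "192.0.2.1": {"example.org": "198.51.100.1"},
    "192.0.2.2": {"example.edu": "198.51.100.2"},
}

AUTHORITATIVE_SERVERS = {
    # authoritative server IP -> {hostname -> host IP}
    "198.51.100.1": {"www.example.org": "203.0.113.10"},
    "198.51.100.2": {"www.example.edu": "203.0.113.20"},
}


def lookup(hostname):
    """Walk root -> TLD -> authoritative; return (host_ip, number_of_queries)."""
    labels = hostname.split(".")
    tld = labels[-1]                 # e.g. "org"
    domain = ".".join(labels[-2:])   # e.g. "example.org"
    queries = 0

    # Step 1: the root server tells us which TLD server to ask.
    tld_server_ip = ROOT_SERVER[tld]
    queries += 1

    # Step 2: the TLD server tells us which authoritative server to ask.
    authoritative_ip = TLD_SERVERS[tld_server_ip][domain]
    queries += 1

    # Step 3: the authoritative server gives us the host's IP.
    host_ip = AUTHORITATIVE_SERVERS[authoritative_ip][hostname]
    queries += 1

    return host_ip, queries
```

With this data, lookup("www.example.org") returns ("203.0.113.10", 3): three queries are needed before the client can even begin its real connection.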


 

DNS Caching

In addition to the three-level hierarchy of DNS name servers, large organizations and Internet service providers also provide local DNS servers. When your machine needs to look up a hostname, it will actually go through the local DNS server first. The local server will then go to the root DNS server as described above.

At each level of this chain, the DNS servers can save a name mapping in a cache. The goals of this caching are to reduce the delay of finding the IP of a given host, and also to reduce the number of DNS requests and responses going through the network.

Imagine that you are on the school WiFi and request a page from Wikipedia. Your machine will connect to the UMW local name server (which your machine gets through DHCP). If the local server does not already have this IP cached, it will then connect to a root DNS server. This root DNS server will then respond with the address of the TLD name server (the one for .org in this case). This server will then give us the IP of Wikipedia's authoritative name server, which in turn will give us an IP for Wikipedia.

Now the UMW local name server will cache all of this information. If we request another page from Wikipedia, we will not need to go through this whole process. Moreover, if anyone here at UMW, using the same local name server, goes to Wikipedia, they will use the cached IP as well.

Because DNS mappings are not expected to last forever, name servers clear entries from the cache every so often (normally after 24 hours or so).

The DNS cache system results in most queries being handled locally.
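The caching behavior described in this section can be sketched as a small wrapper around any lookup function. This is a simplified model, not how a real name server is implemented: the class name CachingResolver, the single fixed lifetime, and the injectable clock are all assumptions made for the example.

```python
import time


class CachingResolver:
    """Wrap a slow lookup function and remember its answers for a limited time."""

    def __init__(self, lookup, lifetime_seconds=24 * 60 * 60, clock=time.monotonic):
        self._lookup = lookup              # function: hostname -> IP address
        self._lifetime = lifetime_seconds  # how long a cached answer stays valid
        self._clock = clock                # injectable so expiry can be tested
        self._cache = {}                   # hostname -> (ip_address, expiry_time)
        self.upstream_queries = 0          # lookups that missed the cache

    def resolve(self, hostname):
        now = self._clock()
        entry = self._cache.get(hostname)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: answer locally

        # Cache miss or expired entry: do the full lookup and remember the result.
        self.upstream_queries += 1
        ip_address = self._lookup(hostname)
        self._cache[hostname] = (ip_address, now + self._lifetime)
        return ip_address
```

If everyone on a network shares one such resolver, only the first request for a given hostname pays for the full walk through the hierarchy. Later requests are answered from the cache until the entry expires.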


 

DNS Records

Now we will talk about what these DNS databases actually store. Each entry, called a resource record, contains four fields:

  1. Name
  2. Value
  3. Type
  4. Time to Live

The "Time to Live" field indicates how much time should pass before this entry is removed from a DNS cache.
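As a rough illustration, a resource record can be modeled as a small immutable object holding the four fields. The class and method names here are hypothetical, the example values are made up, and the sketch assumes the time to live is expressed in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    name: str         # what the record is about, e.g. a hostname
    value: str        # the data for that name, e.g. an IP address
    record_type: str  # how to interpret name and value, e.g. "A"
    ttl: int          # time to live, assumed to be in seconds

    def is_expired(self, cached_at, now):
        """True once the record has sat in a cache for at least ttl seconds."""
        return now - cached_at >= self.ttl
```

For example, ResourceRecord("www.example.org", "203.0.113.10", "A", 3600) describes a mapping that a cache should drop one hour after storing it.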

The "Type" field determines how the other two fields are handled. There are four values for Type:


 

DNS Messages

There are only 2 sorts of messages used in DNS: queries and replies. They both use the same message format:

These fields are described below:
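To give a concrete feel for the format, the sketch below builds the bytes of a simple query by hand. It follows the standard layout: a 12-byte header (an ID, a flags field, and four counts) followed by a question holding the name, the record type, and the class. The function names are made up for this example, and it handles only the simplest case of one question for an Internet-class record.

```python
import struct


def encode_name(hostname):
    """Encode a hostname as length-prefixed labels ending with a zero byte."""
    encoded = b""
    for label in hostname.rstrip(".").split("."):
        encoded += bytes([len(label)]) + label.encode("ascii")
    return encoded + b"\x00"


def build_query(hostname, query_id, record_type=1):
    """Build a DNS query message; record_type 1 asks for an A record."""
    flags = 0x0100  # a standard query with the "recursion desired" bit set
    # Header: ID, flags, then the question, answer, authority,
    # and additional record counts, each 16 bits in network byte order.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # Question: encoded name, record type, and class (1 means Internet).
    question = encode_name(hostname) + struct.pack("!HH", record_type, 1)
    return header + question
```

A query for umw.edu built this way is 25 bytes long. To actually send it, you would create a UDP socket and use sendto to deliver the bytes to port 53 of a DNS server. The reply comes back in the same format with the answer section filled in.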


 

Inserting Records

This all describes how DNS can be used to look up an IP. But how do these records get put into the system in the first place? This is managed by companies called registrars. ICANN (the Internet Corporation for Assigned Names and Numbers) accredits the registrars.

When you want to create a hostname in DNS, you pay a registrar to insert the appropriate records for you. These will include the A records mapping your hostname to an IP address. If you are going to use your own authoritative DNS servers, these will also include the appropriate NS records. Often the registrar will let you use their name servers instead. These records are inserted at the TLD name server level.

Now, when someone tries to connect to your hostname, their lookup will start at a root DNS server (because nothing is cached yet). The root server will refer them to the appropriate TLD server. Because the registrar has inserted records there for your hostname, the TLD server will point to the right authoritative name server. This name server (whether yours or one belonging to the registrar itself) will then supply the A record for your host.

Copyright © 2024 Ian Finlayson | Licensed under a Creative Commons BY-NC-SA 4.0 License.