So far we have only used IP addresses to identify hosts on a network. When we covered the network layer we saw how IP addresses are formatted and used. When we did socket programming, we connected to servers by specifying the IP address.
Of course, as a regular Internet user, you almost never connect to machines by IP address. Instead, you use a hostname. Hostnames are easier for people to remember than IP addresses. However, we can't use hostnames alone, because IP addresses are more useful to computers. For instance, a hostname tells you nothing about where in the Internet a machine is, whereas an IP address does.
So the solution is that networks internally use IP addresses, but people use hostnames. Then there is a part of the Internet that translates hostnames into IP addresses when needed. That is the "Domain Name System", or DNS.
DNS broadly consists of two things:
Many other application protocols, such as HTTP, SSH, and SMTP, rely on DNS so that users can specify hosts by hostname instead of by IP address.
While translating hostnames to IP addresses is the main job of DNS, it actually provides a few other services as well:
When a client needs to know the IP address associated with a hostname, it is called a lookup. For example, if you connect your web browser to umw.edu/, your browser needs to know the IP address of this machine. It goes through the following process for this:
In Python, the socket.connect function can take a hostname instead of an IP address. The fact that some extra steps are needed is hidden inside of the socket API.
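As a small illustration of the lookup that the socket API hides, the sketch below asks the operating system's resolver for the addresses behind a name using the standard library's socket.getaddrinfo. The helper name resolve and the default port of 80 are choices made for this example only.

```python
import socket


def resolve(hostname, port=80):
    """Return the distinct IP addresses the system resolver gives for hostname."""
    # getaddrinfo triggers the name lookup (DNS, hosts file, or a local cache).
    results = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in results:
        ip = sockaddr[0]  # first element of the socket address is the IP string
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

On a machine with network access, resolve("umw.edu") should return one or more addresses. Passing a numeric address such as "127.0.0.1" simply returns it unchanged, since no lookup is needed.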
It would be possible to design DNS such that there was a single, authoritative server that managed the mapping database. However, that would not really work in practice:
Instead, DNS is organized into a distributed hierarchy of multiple servers. No single server knows all of the hostnames in the Internet. There are three primary levels of this hierarchy: root name servers, top-level domain (TLD) name servers, and authoritative name servers.
When a DNS request is made, we do the following:
Of course, if we really went through all of this every time we needed to find an IP address, connections would take around 4 times longer to set up! To get around this issue, DNS uses caching to avoid looking up names more often than necessary.
In addition to the three-level hierarchy of DNS name servers, large organizations and Internet service providers also provide local DNS servers. When your machine needs to look up a hostname, it will actually go through the local DNS server first. The local server will then go to the root DNS server as described above.
At each level of this chain, the DNS servers can save a name mapping in a cache. The goals of this caching are to reduce the delay of finding the IP of a given host, and also to reduce the number of DNS requests and responses going through the network.
Imagine that you are on the school WiFi and request a page from Wikipedia. Your machine will connect to the UMW local name server (which your machine gets through DHCP). If the local server does not already have this IP cached, it will then connect to a root DNS server. This root DNS server will then respond with the address of the TLD name server (the one for .org in this case). This server will then give us the IP of Wikipedia's authoritative name server, which in turn will give us an IP for Wikipedia.
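The walk from root to TLD to authoritative server can be sketched with three small tables. This is only a toy model: in real DNS each step is a network query made by the local name server. The function name iterative_lookup, the server labels, and the address 192.0.2.10 (taken from a range reserved for documentation) are all invented for this example.

```python
# Toy stand-ins for the three levels of name servers. Every name and address
# below is made up for illustration.
ROOT = {"org": "tld-org-server"}  # TLD -> TLD name server
TLD_SERVERS = {
    "tld-org-server": {"example.org": "ns.example.org"},  # domain -> authoritative server
}
AUTHORITATIVE = {
    "ns.example.org": {"www.example.org": "192.0.2.10"},  # hostname -> IP address
}


def iterative_lookup(hostname):
    """Walk root -> TLD -> authoritative and return (ip, servers_asked)."""
    asked = []
    labels = hostname.split(".")
    tld = labels[-1]                # e.g. "org"
    domain = ".".join(labels[-2:])  # e.g. "example.org"

    asked.append("root")
    tld_server = ROOT[tld]          # root refers us to the TLD server

    asked.append(tld_server)
    auth_server = TLD_SERVERS[tld_server][domain]  # TLD refers us onward

    asked.append(auth_server)
    ip = AUTHORITATIVE[auth_server][hostname]      # authoritative server answers
    return ip, asked
```

Looking up "www.example.org" touches three servers in order, which is the cost that caching is meant to avoid.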
Now the UMW local name server will cache all of this information. If we request another page from Wikipedia, we will not need to go through this whole process. Moreover, if anyone here at UMW, using the same local name server, goes to Wikipedia, they will use the cached IP as well.
Because DNS mappings are not expected to last forever, name servers discard entries from the cache after a while. How long an entry may be kept is set by its time to live (described below), and is often a day or so.
The DNS cache system results in most queries being handled locally.
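A minimal sketch of such a cache is shown below, assuming each entry simply stores an expiry time. The class name DnsCache and its methods are hypothetical, and the clock is passed in so that expiry can be demonstrated without waiting.

```python
import time


class DnsCache:
    """Tiny hostname -> IP cache whose entries expire after a time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested
        self._entries = {}    # hostname -> (ip, expiry_time)

    def put(self, hostname, ip, ttl_seconds):
        """Remember a mapping for ttl_seconds from now."""
        self._entries[hostname] = (ip, self._clock() + ttl_seconds)

    def get(self, hostname):
        """Return the cached IP, or None if it is missing or has expired."""
        entry = self._entries.get(hostname)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            # Stale entry: forget it so the next lookup goes back upstream.
            del self._entries[hostname]
            return None
        return ip
```

A local name server would check get first and only walk the hierarchy when it returns None, calling put with the answer it eventually receives.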
Now we will talk about what these DNS databases actually store. Each entry, called a resource record, contains four fields: Name, Value, Type, and Time to Live.
The "Time to Live" field indicates how much time should pass before this entry is removed from a DNS cache.
The "Type" field determines how the other two fields are handled. The four most common values for Type are:
A: These are the standard "Address" records. Here the Name field gives a hostname and the Value field gives an IP address.
NS: These "Name Server" records indicate the name server which knows the IP for a particular host. Here the Name field gives a domain name, and the Value field gives the hostname of an authoritative DNS server which knows the IPs of hosts in that domain. These are used for routing queries along the hierarchy.
CNAME: These provide the canonical name for an alias hostname. Here the Name field is an alias and the Value field is the canonical hostname of the alias.
MX: These are used for email. Here the Name field gives the domain name used in email addresses, and the Value field gives the hostname of the mail server for that domain, which is then resolved to an IP through its own A record. Having separate entries allows the same name to lead to different machines for mail vs. other uses. This allows an organization to use different machines for hosting email vs. other applications like websites.
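To make the record types concrete, here is a small sketch of a record table and a lookup that follows CNAME records until it reaches an A record. All names, addresses, and TTL values are invented, and the helper names find and resolve_address are hypothetical. The sketch assumes, as in real DNS, that the Value of an MX record is the hostname of a mail server, which is then resolved like any other host.

```python
from collections import namedtuple

ResourceRecord = namedtuple("ResourceRecord", ["name", "value", "type", "ttl"])

# Invented records for illustration only.
RECORDS = [
    ResourceRecord("www.example.org", "web1.example.org", "CNAME", 3600),
    ResourceRecord("web1.example.org", "192.0.2.10", "A", 3600),
    ResourceRecord("example.org", "ns.example.org", "NS", 86400),
    ResourceRecord("example.org", "mail.example.org", "MX", 3600),
    ResourceRecord("mail.example.org", "192.0.2.25", "A", 3600),
]


def find(name, rtype):
    """Return the values of all records matching name and type."""
    return [r.value for r in RECORDS if r.name == name and r.type == rtype]


def resolve_address(name, max_aliases=8):
    """Follow CNAME records until an A record is found; return the IP or None."""
    for _ in range(max_aliases):  # bound the chain to guard against alias loops
        addresses = find(name, "A")
        if addresses:
            return addresses[0]
        aliases = find(name, "CNAME")
        if not aliases:
            return None
        name = aliases[0]  # continue with the canonical name
    return None
```

Resolving the alias "www.example.org" goes through its canonical name, while mail for "example.org" is directed to whatever find("example.org", "MX") names.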
There are only 2 sorts of messages used in DNS: queries and replies. They both use the same message format:
These fields are described below:
Identification: A 16-bit number which identifies this request. When a server sends a reply, it uses the same number so the client can match the reply with the original query.
Flags: A set of bit flags. These include whether this message is a query or a reply, and whether a reply comes from an authoritative server.
Counts: Four numbers which indicate how many entries each of the following four sections contains.
Questions: Contains a variable number of queries. Each includes the hostname we are querying as well as the type of record we want, for example an MX record if we are looking for a mail server, or an A record otherwise.
Answers: In a reply, this contains the resource records (described above) that the client wanted. There can be multiple records for one query, for example when a hostname maps to multiple IPs.
Authority: Contains records for authoritative servers. These communicate the NS records described above.
Additional information: Contains other records the client may find helpful.
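The layout can be made concrete by building the bytes of a query by hand. The sketch below follows the wire format in the DNS specification (RFC 1035): six 16-bit header fields, then a question made of the name as length-prefixed labels followed by a type and a class. The function names are hypothetical, and only the flag bits mentioned above plus the "recursion desired" bit are handled.

```python
import struct

QTYPE_A = 1    # record type code for A records
QCLASS_IN = 1  # class code for the Internet


def encode_name(hostname):
    """Encode 'umw.edu' as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in hostname.strip(".").split("."):
        raw = label.encode("ascii")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(hostname, query_id, recursion_desired=True):
    """Build the bytes of a DNS query holding one question for an A record."""
    # Query/reply bit left at 0 (a query); 0x0100 is the recursion-desired bit.
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: identification, flags, then the four 16-bit counts.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(hostname) + struct.pack("!HH", QTYPE_A, QCLASS_IN)
    return header + question


def parse_header(message):
    """Unpack the 12-byte header into a dictionary of its fields."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": ident,
        "is_reply": bool(flags & 0x8000),       # query/reply bit
        "authoritative": bool(flags & 0x0400),  # authoritative-answer bit
        "questions": qd,
        "answers": an,
        "authority": ns,
        "additional": ar,
    }
```

A reply would carry the same identification number, have the query/reply bit set, and fill in the answer, authority, and additional sections that this sketch leaves empty.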
This all describes how DNS can be used to look up an IP. But how do these records get put into the system in the first place? This is managed by companies called registrars. ICANN (the Internet Corporation for Assigned Names and Numbers) accredits the registrars.
When you want to create a hostname in DNS, you pay a registrar to insert the appropriate records for you. At the TLD name server level, these are the NS records naming the authoritative DNS servers for your domain, along with A records giving the addresses of those name servers. You can run your own authoritative DNS servers, but often the registrar will let you use theirs. The A records mapping your hostname to an IP address are then served by those authoritative servers.
Now, when someone tries to connect to your hostname, the lookup will start at a root DNS server (assuming the name isn't cached yet). The root server will then refer the query to the appropriate TLD server. Because the registrar has inserted records there for your hostname, the TLD server will point to the right authoritative name server. This name server (whether yours or one belonging to the registrar itself) will then supply the A record for your host.
Copyright © 2019 Ian Finlayson | Licensed under a Creative Commons Attribution 4.0 International License.