CPSC 414

Simple HTTP Server

 

Due: March 14

 

Objective

To gain experience writing a socket server and implementing a real protocol.


 

Task

For this project, you will write a simple HTTP server which is capable of serving HTML pages to web browsers. You will implement a subset of the HTTP protocol that the web is built on.


 

HTTP Overview

HTTP is built on requests. A client sends a request to a server and the server handles it. HTTP supports several different types of requests, but the most common is GET, which is the only one we will handle.

A GET request sent from my browser looks like this:

GET / HTTP/1.1
Host: 127.0.0.1:8080
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1

The first line is the most important. It starts with the request method, "GET". Next is the file being requested, in this case "/", which means the root file of the website. Last is the protocol version, "HTTP/1.1".

The rest of the lines are extra information sent to the server. For our purposes, we can ignore all these lines. The request ends with a blank line.
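As a small illustration of pulling that first line apart, here is a sketch in Python. The function name parse_request_line is made up for this example, and the sample request is a shortened copy of the one above. HTTP ends each line with "\r\n", which splitlines() handles.

```python
def parse_request_line(raw):
    """Return the whitespace-separated words on the first line of a request."""
    # iso-8859-1 maps every byte to a character, so decoding cannot fail.
    text = raw.decode("iso-8859-1")
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    return first_line.split()


# A shortened version of the browser request shown above.
request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: 127.0.0.1:8080\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)

method, path, version = parse_request_line(request)
print(method, path, version)  # GET / HTTP/1.1
```

Returning a plain list (rather than unpacking inside the function) leaves it to the caller to check how many words were actually present.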

After sending the request, the client waits for a response. The response can look like this:

HTTP/1.1 200 OK
Date: Mon, 25 Feb 2019 17:31:23 GMT
Server: Apache/2.4.29 (Ubuntu)
Content-Type: text/html; charset=UTF-8
Content-Length: 6339

<file contents>

HTTP allows more header fields that the server can send, but these are the most important ones, which we will implement. Note the blank line between the headers and the file contents! More information for each one follows:
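As a rough illustration of these four fields, the sketch below assembles a 200 response in Python. The function name build_response and the Server value "cpsc414-example" are placeholders invented for this example. The Date comes from the standard library's email.utils.formatdate, which produces the same GMT format as the sample response, and Content-Length counts bytes after encoding, not characters.

```python
from email.utils import formatdate


def build_response(body, server_name="cpsc414-example"):
    """Assemble a 200 response: status line, four headers, blank line, body."""
    header_lines = [
        "HTTP/1.1 200 OK",
        "Date: " + formatdate(usegmt=True),   # e.g. Mon, 25 Feb 2019 17:31:23 GMT
        "Server: " + server_name,             # placeholder name (assumption)
        "Content-Type: text/html; charset=UTF-8",
        "Content-Length: " + str(len(body)),  # length in bytes
    ]
    # Each header line ends with \r\n; one extra \r\n makes the blank line.
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# The accented character takes two bytes in UTF-8, so the byte count
# is one more than the character count.
page = "<html><body><p>café</p></body></html>".encode("utf-8")
response = build_response(page)
print(response.decode("utf-8"))
```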


 

Program Details

The major things your program will need to do are outlined below:

  1. When running this on your Google Cloud VM, make sure you have an open port, and make a note of your internal and external IPs.
  2. Start by creating a server socket. Bind it to your internal address and the port you selected, then listen on it for clients to connect.
  3. The rest of the program is an infinite loop. On each iteration of the loop, accept one connection and handle it.
  4. Once you have a connection, read the request from the socket. For this assignment you can assume the whole request arrives in one recv, so use a large size, like 4096.
  5. Next get the first line of the request. Python's string .split() method is handy for this. The first line is all we care about.
  6. If the first line does not have three "words" in it, return 400 (Bad Request). If the first word of the line is not "GET", return 405 (Method Not Allowed). If the last word is not "HTTP/1.1", return 505 (HTTP Version Not Supported).
  7. Look at the file name that the user requested. If it is something like "/page.html", then try to open up "page.html" (removing the /) from the current directory. If it is "/", then try to open up "index.html" as the default site page.
  8. If the file is not found, return 404 (Not Found). If the file exists but can't be read, return 403 (Forbidden). The easiest way to do this is to put the file-handling code in a try block and catch the Python FileNotFoundError and PermissionError exceptions.
  9. Then read the contents of the file into a string.
  10. Next, send all of the header information, with a 200 code. The length of the string after it's encoded can be used for the Content-Length. Then send a blank line, and finally the file contents.
  11. If your program encounters an exception doing any of the above (that has not already been caught), you can return 500 (Internal Server Error) to the client.
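The steps above could be combined roughly as in the following Python sketch. It is one possible arrangement, not the required design: the names REASONS, make_response, handle_request, and serve, the Server value "example", and the small HTML error bodies are all choices made for this illustration. It also does nothing to stop paths such as "/../secret" from leaving the current directory, which the steps above do not ask for.

```python
import socket
from email.utils import formatdate

REASONS = {
    200: "OK", 400: "Bad Request", 403: "Forbidden", 404: "Not Found",
    405: "Method Not Allowed", 500: "Internal Server Error",
    505: "HTTP Version Not Supported",
}


def make_response(code, body=b""):
    """Build the status line, headers, blank line, and body as bytes."""
    if code != 200:
        # Error pages get a tiny HTML body (an illustrative choice).
        body = f"<h1>{code} {REASONS[code]}</h1>".encode("utf-8")
    head = (
        f"HTTP/1.1 {code} {REASONS[code]}\r\n"
        f"Date: {formatdate(usegmt=True)}\r\n"
        "Server: example\r\n"
        "Content-Type: text/html; charset=UTF-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def handle_request(raw):
    """Steps 5 to 11: turn the raw request bytes into response bytes."""
    try:
        lines = raw.decode("iso-8859-1").splitlines()
        words = lines[0].split() if lines else []
        if len(words) != 3:
            return make_response(400)
        method, path, version = words
        if method != "GET":
            return make_response(405)
        if version != "HTTP/1.1":
            return make_response(505)
        # "/" means the default page; otherwise drop the leading slash.
        if path == "/":
            filename = "index.html"
        else:
            filename = path[1:] if path.startswith("/") else path
        try:
            with open(filename, encoding="utf-8") as f:
                text = f.read()
        except FileNotFoundError:
            return make_response(404)
        except PermissionError:
            return make_response(403)
        return make_response(200, text.encode("utf-8"))
    except Exception:
        # Anything not handled above becomes a 500.
        return make_response(500)


def serve(host, port):
    """Steps 2 to 4: bind, listen, then handle one connection per iteration."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, port))
        server.listen()
        while True:
            client, _address = server.accept()
            with client:
                client.sendall(handle_request(client.recv(4096)))


# To run it, call serve() with your own internal IP and open port, e.g.
# serve("<internal IP>", 8080)   # loops until interrupted
```

Keeping handle_request free of socket calls means it can be tried out by passing it bytes directly, without starting the server.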

 

Testing

To test your server, run it on your VM and then connect to it with a regular web browser. Put your external IP address in as the URL. The default HTTP port is 80. If you are using a different port (as we probably are), put a colon and the port number after the host in the URL bar. This way you can connect with your HTTP server:


Connecting to your server in the URL bar of a browser

If you leave off the file name, your server should serve index.html by default. If there is no file with that name in the same directory, it should return 404.

You can use this sample index.html file for testing. Download and unpack it with the following commands:

$ wget http://ianfinlayson.net/class/cpsc414/assignments/index.html.gz
$ gunzip index.html.gz

You should also test specifying another file. To do this, just append the file name in the URL bar, like this:


Specifying a file name in the URL bar of a browser

Make another HTML file or two to test this functionality.
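A browser only sends well-formed GET requests, so it cannot easily trigger the 400, 405, and 505 cases. One possible way to exercise them is a small Python client like the sketch below. The function names status_line and probe and the Host value "example" are invented for this illustration, and it assumes the status line arrives in the first recv.

```python
import socket


def status_line(sock, request_line, host="example"):
    """Send one request on a connected socket; return the reply's first line."""
    request = request_line + "\r\nHost: " + host + "\r\n\r\n"
    sock.sendall(request.encode("ascii"))
    reply = sock.recv(4096)  # assumes the status line is in the first chunk
    return reply.split(b"\r\n", 1)[0].decode("iso-8859-1")


def probe(host, port, request_line):
    """Open a fresh connection, send the request, and return the status line."""
    with socket.create_connection((host, port), timeout=5) as sock:
        return status_line(sock, request_line, host)


# Example use, with your own external IP and port filled in:
# for line in ["GET / HTTP/1.1", "POST / HTTP/1.1", "GET / HTTP/1.0", "GET /"]:
#     print(line, "->", probe("<external IP>", 8080, line))
```

Going by the rules in the Program Details list, the four request lines in the commented loop should come back as 200 (if index.html exists), 405, 505, and 400 respectively.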


 

General Requirements

When writing your program, also be sure to:


 

Submitting

To submit your program, email the program file to ifinlay@umw.edu.

Copyright © 2019 Ian Finlayson | Licensed under a Creative Commons Attribution 4.0 International License.