As seen in the last lab, data sent over TCP is stored as-is inside TCP packets. Anyone who sees a packet can read the data it contains. Because the Internet is decentralized, your packets can pass through routers operated by many different organizations. Also, on wireless networks, any machine connected to the network can see packets which are being broadcast, whether it is the intended recipient or not.
Possibly worse, unencrypted traffic can be modified by any router which handles your packets, such as one run by your ISP. Comcast has injected warnings about bandwidth usage into customer pages, and Verizon has injected cookies to track customers.
To keep third parties from reading the data, it must be encrypted.
SSL (Secure Sockets Layer) was developed by Netscape, with the first public version released in 1995. It progressed to version 3.0, and its successor was then renamed TLS (Transport Layer Security). TLS is still under active development, though people often still refer to it as SSL.
HTTPS is a version of HTTP which uses TLS for securing the requests and the responses. HTTPS is one of the major uses for TLS, but not the only one. SMTP can use TLS for secure email transmission, and your own programs can use TLS as well.
TLS does two main things:
TLS sits between the application layer and the transport layer. It allows applications to send data over TCP knowing that it is secured.
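As a rough illustration of TLS sitting on top of TCP, the sketch below uses Python's standard `ssl` module to wrap an ordinary TCP socket. The function names `make_client_context` and `https_get_status_line`, and the default timeout, are made up for this example and are not part of any library.

```python
import socket
import ssl


def make_client_context():
    """Build a TLS client context that verifies the server's certificate."""
    # create_default_context() loads the system's trusted CA certificates
    # and turns on certificate and hostname checking.
    return ssl.create_default_context()


def https_get_status_line(host, port=443, timeout=5.0):
    """Send a minimal HTTP request over TLS and return the first response line."""
    context = make_client_context()
    # Step 1: an ordinary TCP connection (transport layer).
    with socket.create_connection((host, port), timeout=timeout) as tcp_sock:
        # Step 2: the TLS handshake happens on top of that connection.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # Step 3: the application writes plain bytes; TLS encrypts them.
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            response = tls_sock.recv(4096)
    return response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")


# Example (needs network access):
#     print(https_get_status_line("example.com"))
```

The application code only ever reads and writes plain bytes. The encryption happens inside the wrapped socket, which is why TLS can be added to a TCP program with few changes.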
There are some things that TLS does not handle:
So TLS does not stop people from seeing who you are talking to, just what you are saying.
TLS is based on the idea of having a public and a private key. You can tell everyone your public key, but must keep the private key secret. This is called asymmetric cryptography.
This is based on having an algorithm with certain mathematical properties:
If something is encrypted with the public key, it can only be decrypted with the private key:
Thus to send encrypted data to somebody, you only need to know their public key. They can use their own private key to decrypt the text (and they are the only one who can do that).
If something is "signed" with the private key, it can be verified with the public key:
If we sign something with our private key, then anyone with our public key can tell that we have done so. As long as the private key is truly kept private, anyone can verify that the data is from us.
TLS supports a number of different cryptographic algorithms (also called "ciphers"). The different versions of TLS include different algorithms to choose between. Sometimes flaws are found in certain algorithms, and new ones are added.
RSA is one of the oldest and most widely used asymmetric cryptography algorithms. It is based around the fact that finding the factors of a large composite number is a difficult problem, but finding the product of two numbers is easy.
Consider these examples:
72253 * 59209?
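The first question is trivial to answer - it takes a computer no time to find this product. The second question is much harder. There is no known efficient way of doing this on an ordinary computer. The simplest approach is brute force, trying all possible factors. Cleverer algorithms exist, but even they are far too slow for numbers of the size used in real keys.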
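The difference between the two directions can be sketched in a few lines of Python. This is only the naive trial-division approach with hypothetical helper names (`smallest_factor`, `factor_pair`). It copes with a ten-digit number easily, but the work grows far too quickly for the numbers hundreds of digits long that are used in real keys.

```python
def smallest_factor(n):
    """Return the smallest factor of n that is greater than 1 (trial division)."""
    if n % 2 == 0:
        return 2
    candidate = 3
    # Only candidates up to the square root of n need to be tested.
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate
        candidate += 2  # skip even numbers
    return n  # nothing divided n, so n itself is prime


def factor_pair(n):
    """Split n into (p, q) with p * q == n, where p is the smallest factor."""
    p = smallest_factor(n)
    return p, n // p


# Easy direction: a single multiplication.
product = 72253 * 59209

# Hard direction: search for a factor of the product, one candidate at a time.
p, q = factor_pair(product)
print(product, "=", p, "*", q)
```

Multiplying takes one step, while the search loop may run tens of thousands of times even for this small example.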
RSA essentially works by using the large composite number as the public key, and the two prime numbers as the private key. (There are a few more details we are skipping over, but that's the main idea).
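To make this concrete, here is a toy version of textbook RSA in Python, including the exponents that the description above skips over. The primes are tiny and chosen only for illustration, the function names are made up for this sketch, and real implementations add padding and use vastly larger numbers, so this should not be used for anything real.

```python
# Toy RSA with tiny primes (illustration only, not secure).
p, q = 61, 53                 # the two secret primes
n = p * q                     # the composite number, part of the public key
phi = (p - 1) * (q - 1)       # can only be computed by someone who knows p and q
e = 17                        # public exponent, chosen to share no factor with phi
d = pow(e, -1, phi)           # private exponent: inverse of e modulo phi (Python 3.8+)


def encrypt(message):
    """Encrypt a number smaller than n using the public key (n, e)."""
    return pow(message, e, n)


def decrypt(ciphertext):
    """Decrypt using the private exponent d."""
    return pow(ciphertext, d, n)


def sign(message):
    """Sign a number smaller than n using the private exponent d."""
    return pow(message, d, n)


def verify(message, signature):
    """Check a signature using only the public key (n, e)."""
    return pow(signature, e, n) == message


secret = 65
assert decrypt(encrypt(secret)) == secret   # only the key owner can decrypt
assert verify(secret, sign(secret))         # anyone can check the signature
```

Anyone who could factor `n` back into `p` and `q` could recompute `phi` and therefore `d`, which is why the difficulty of factoring protects the private key.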
This allows for all three of the requirements of an asymmetric cryptographic algorithm:
Other cryptographic functions are based on similar principles. There is a lot of theoretical math that goes into providing good ciphers that work well and are secure.
With HTTPS, these public keys are contained in certificates. If you direct your browser at an HTTPS server, you need to know the public key the server is using, in order to verify its identity.
When you connect to a site, it will give you its certificate. But how can you trust that the certificate for the site is the real one?
Generating certificates with RSA, or any other cipher, is not a hard thing to do. What is to stop a site from forging a certificate and giving it to your browser?
The usual way around this problem is to rely on a trusted third party, called a certificate authority. These are organizations that check that whoever requests a certificate really controls the site, and then digitally sign the site's certificate, including its public key. Browsers and operating systems come with a list of the authorities they trust.
Some of the larger authorities are:
When you connect your browser to a new web site using HTTPS, the site will provide its certificate. Your browser will then check that the certificate was signed by a certificate authority it trusts, that it was issued for the site's name, and that it has not expired. If the checks pass, your browser will continue. If they do not, you will get a security warning.
Assuming the certificate checks out, your browser will use the public key contained therein to secure data before sending it to the server, and the server will use the corresponding private key on its end. In practice, public-key cryptography is used only at the start of the connection, to agree on a shared secret key which then encrypts the rest of the traffic.
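The certificate a server presents can be inspected from Python as well. The sketch below is one possible way to do it with the standard `ssl` module. The helper names `flatten_name` and `fetch_certificate_summary` are invented for this example, and the verification itself is carried out by the library during the handshake.

```python
import socket
import ssl


def flatten_name(name):
    """Turn the nested tuples used by getpeercert() into a plain dict."""
    return {key: value for rdn in name for (key, value) in rdn}


def fetch_certificate_summary(host, port=443, timeout=5.0):
    """Connect to host, verify its certificate, and summarize it."""
    # The default context trusts the system's list of certificate authorities.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as tcp_sock:
        # wrap_socket raises an error if the certificate cannot be verified.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            cert = tls_sock.getpeercert()
    return {
        "subject": flatten_name(cert["subject"]),   # who the certificate is for
        "issuer": flatten_name(cert["issuer"]),     # the authority that signed it
        "expires": cert["notAfter"],
    }


# Example (needs network access):
#     print(fetch_certificate_summary("example.com"))
```

The `issuer` entry names the certificate authority that vouches for the site.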
HTTPS is necessary for sites that communicate private information, such as credit card information. It is not as necessary for sites that don't communicate private information, but still has benefits:
Most browsers indicate in the address bar that HTTPS is being used. For example, Firefox displays a lock icon next to the URL.
Browsers will generally warn you if a page has a login form and isn't using HTTPS, and will definitely warn you if a site gives the browser a certificate which can't be verified by an authority. The site https://badssl.com/ can be used to test bad certificates.
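A program can run into the same failures a browser warns about. The following sketch reports why a certificate was rejected. The function name `certificate_problem` is hypothetical, and its `connect` parameter is an assumption added here only so that the connection step can be swapped out when no network is available.

```python
import socket
import ssl


def certificate_problem(host, port=443, timeout=5.0,
                        connect=socket.create_connection):
    """Return None if the host's certificate verifies, else a description."""
    context = ssl.create_default_context()
    try:
        with connect((host, port), timeout=timeout) as tcp_sock:
            # The handshake fails here if the certificate is not acceptable.
            with context.wrap_socket(tcp_sock, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as exc:
        return str(exc)


# Example (needs network access), using test hosts from badssl.com:
#     for name in ("expired.badssl.com", "self-signed.badssl.com"):
#         print(name, "->", certificate_problem(name))
```

Only certificate verification failures are caught. Other network errors, such as a timeout, are left to propagate to the caller.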
HTTPS overtook HTTP in terms of market share in September 2018. Google has encouraged adoption by ranking sites which use HTTPS ahead of those which do not.
In order for a site to use HTTPS, it must create a key pair and have a certificate authority issue a certificate for it. This can cost money, but Let's Encrypt, a certificate authority announced in 2014, issues certificates for free.
Copyright © 2021 Ian Finlayson | Licensed under a Creative Commons Attribution 4.0 International License.