HTTP is an application-layer protocol that runs on top of TCP/IP and is widely used on the Internet to access websites. It was a text-based protocol until version 2, which introduced a binary format. Understanding HTTP at a basic level is crucial for both back-end and front-end software engineers. Here’s a short explanation of how HTTP works.
What is TCP/IP
IP (Internet Protocol) is a network protocol used to address and route packets of data across networks. An IPv4 address is a 32-bit identifier that addresses a host on the network. It’s normally written as four octets in decimal, separated by dots. Here’s the IP address of my site:
80.87.97.149
There are two versions of IP in use today: IPv4 (32-bit addresses) and IPv6 (128-bit addresses). I’ll skip IPv6 for simplicity and focus on IPv4.
TCP (Transmission Control Protocol) is a transport protocol with delivery control. In practice, this means the protocol itself makes sure that data reaches the recipient: lost packets are retransmitted, and data arrives in the order it was sent. Together, TCP and IP form the core of the Internet protocol suite, which, as the name suggests, is the main set of protocols on the Internet.
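To make the 32-bit claim concrete, here is a minimal Python sketch that packs the four octets of a dotted address into a single integer and checks the result against the standard ipaddress module. The helper name ipv4_to_int is my own and exists only for illustration.

```python
import ipaddress


def ipv4_to_int(dotted: str) -> int:
    """Pack the four octets of a dotted IPv4 address into one 32-bit integer."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IPv4 address: {dotted!r}")
    value = 0
    for octet in octets:
        # Shift the accumulated value left by one octet and append the next one.
        value = (value << 8) | octet
    return value


address = "80.87.97.149"
as_int = ipv4_to_int(address)

# The standard library agrees with the manual calculation.
assert as_int == int(ipaddress.IPv4Address(address))

print(as_int)                          # 1347903893
print(f"{as_int:032b}")                # 01010000010101110110000110010101
print(ipaddress.IPv4Address(as_int))   # 80.87.97.149
```

Each group of eight bits in the binary form corresponds to one octet of the dotted notation.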
HTTP flow
What happens when you click a link in a browser? First, the browser asks DNS to resolve the domain name to an IP address. Second, it establishes a TCP connection to that IP address. Then it sends an HTTP request, a simple text message whose header section ends with an empty line (two consecutive CRLF sequences). The browser then waits for an HTTP response, which has the same shape: a status line and headers, an empty line, and then an optional body.
HTTP is a client-server protocol, but the IP address the browser connects to does not necessarily belong to a web server. It can be a proxy server, load balancer, or CDN server. The web server itself can also act as a proxy when it serves dynamic content. In this case, the HTTP request is passed on to the underlying back-end application.
┌───────┐         ┌───┐ ┌─────────────┐       ┌──────────┐     ┌──────────────────┐
│Browser│         │DNS│ │Load Balancer│       │Web Server│     │Application Server│
└───┬───┘         └─┬─┘ └──────┬──────┘       └────┬─────┘     └────────┬─────────┘
    │resolve domain │          │                   │                    │
    │──────────────>│          │                   │                    │
    │               │          │                   │                    │
    │  IP-address   │          │                   │                    │
    │<──────────────│          │                   │                    │
    │               │          │                   │                    │
    │       HTTP request       │                   │                    │
    │─────────────────────────>│                   │                    │
    │               │          │                   │                    │
    │               │          │   HTTP request    │                    │
    │               │          │──────────────────>│                    │
    │               │          │                   │                    │
    │               │          │                   │    HTTP request    │
    │               │          │                   │───────────────────>│
    │               │          │                   │                    │
    │               │          │                   │   HTTP response    │
    │               │          │                   │<───────────────────│
    │               │          │                   │                    │
    │               │          │   HTTP response   │                    │
    │               │          │<──────────────────│                    │
    │               │          │                   │                    │
    │      HTTP response       │                   │                    │
    │<─────────────────────────│                   │                    │
┌───┴───┐         ┌─┴─┐ ┌──────┴──────┐       ┌────┴─────┐     ┌────────┴─────────┐
│Browser│         │DNS│ │Load Balancer│       │Web Server│     │Application Server│
└───────┘         └───┘ └─────────────┘       └──────────┘     └──────────────────┘
HTTP flow diagram
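The browser-side steps of this flow can be sketched in Python with nothing but the socket module. This is a rough illustration under some assumptions, not how a real browser works: the function names build_request and fetch and the User-Agent value are made up, it speaks plain HTTP on port 80, and it sends Connection: close so that it can simply read until the server closes the connection.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal GET request; the header section ends with an empty line."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: sketch-client",  # hypothetical value, any string works
        "Connection: close",          # ask the server to close after responding
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host: str, port: int = 80, path: str = "/", timeout: float = 5.0) -> bytes:
    """Resolve the host, connect over TCP, send a request, return the raw response."""
    # Step 1: ask DNS for the IPv4 address behind the domain name.
    ip_address = socket.gethostbyname(host)

    # Step 2: establish a TCP connection to the resolved address.
    with socket.create_connection((ip_address, port), timeout=timeout) as conn:
        # Step 3: send the HTTP request.
        conn.sendall(build_request(host, path))

        # Step 4: read the response until the server closes the connection.
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling fetch("atabakoff.com") needs network access and should return the raw bytes of a response like the one shown in the Try it out section. Whether those bytes come from a web server, a load balancer, or a CDN is invisible to the client.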
HTTP request structure
If you look at an HTTP request, you may notice a pattern: the first line is different, but the remaining lines all look alike. The GET request in the Try it out section below is an example.
The first line consists of a method followed by a path and a protocol version. The remaining lines are headers, and the majority of them are optional. HTTP/1.1 requires the Host header, although some web servers will accept a request without it.
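As a rough illustration of that structure, the following Python sketch splits a raw request into its first line and its headers. The function name parse_request is hypothetical, and the parsing is deliberately simplified (for example, it does not handle repeated headers).

```python
def parse_request(raw: str):
    """Split a raw HTTP/1.x request into method, path, protocol and headers."""
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")

    # First line: method, path and protocol version separated by spaces.
    method, path, protocol = request_line.split(" ", 2)

    # Remaining lines: "Name: value" pairs. Header names are case-insensitive,
    # so they are stored in lower case.
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, protocol, headers


raw_request = (
    "GET / HTTP/1.1\r\n"
    "Host: atabakoff.com\r\n"
    "User-Agent: ziccurat\r\n"
    "\r\n"
)

method, path, protocol, headers = parse_request(raw_request)
print(method, path, protocol)  # GET / HTTP/1.1
print(headers)                 # {'host': 'atabakoff.com', 'user-agent': 'ziccurat'}
```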
HTTP response structure
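The first line consists of the HTTP protocol version followed by a status code and a status message. The next lines are headers, separated from the body by an empty line (two consecutive CRLF sequences). The body has no terminator of its own: its length is usually given by the Content-Length header, or the body is sent in chunks when Transfer-Encoding: chunked is used.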
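Here is a minimal Python sketch of reading a response along those lines. The name parse_response and the five-byte sample body are made up for illustration, and the sketch only handles a body sized by Content-Length, not chunked encoding.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into status line parts, headers and body."""
    head, separator, rest = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("incomplete response: no empty line after the headers")

    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

    # First line: protocol version, status code and (possibly empty) message.
    parts = status_line.split(" ", 2)
    protocol, status_code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # The body is delimited by Content-Length, not by a trailing CRLF.
    length = int(headers.get("content-length", len(rest)))
    body = rest[:length]
    return protocol, status_code, reason, headers, body


sample = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 5\r\n"
    b"Location: https://atabakoff.com/\r\n"
    b"\r\n"
    b"moved"  # hypothetical body, exactly 5 bytes long
)

protocol, status_code, reason, headers, body = parse_response(sample)
print(status_code, reason)   # 301 Moved Permanently
print(headers["location"])   # https://atabakoff.com/
print(body)                  # b'moved'
```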
Try it out
You can use telnet to make an HTTP request to a server. First, resolve an IP address from a domain name:
$ dig atabakoff.com
;; ANSWER SECTION:
atabakoff.com. 300 IN A 80.87.97.149
Then start telnet and open a connection to the resolved IP address on port 80:
$ telnet
telnet> open 80.87.97.149 80
Trying 80.87.97.149...
Connected to 80.87.97.149.
Escape character is '^]'.
Now you can make HTTP requests. Let’s just GET the main page (finish the request with an empty line by pressing Enter twice):
GET / HTTP/1.1
Host: atabakoff.com
User-Agent: ziccurat

HTTP/1.1 301 Moved Permanently
Server: nginx/1.18.0 (Ubuntu)
Date: Fri, 17 Mar 2023 17:49:08 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive
Location: https://atabakoff.com/
The response code is 301 because my web server is configured to redirect to the encrypted HTTPS location. Telnet can only be used for plain, unencrypted HTTP, but we can use other tools like curl.
Using curl to make an HTTP request
curl is both a library (libcurl) and a command-line tool. As a library, it has bindings for many languages and is perhaps the most widely used HTTP library out there.
As a command-line tool, curl lets you quickly download files and test a connection to a site or an API. It’s a go-to tool for writing automation scripts, and I’ll show a few snippets I often use.
Download a file with curl
$ curl -L https://github.com/yt-dlp/yt-dlp/releases/download/2023.03.04/yt-dlp_linux -o yt-dlp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 27.9M  100 27.9M    0     0  10.0M      0  0:00:02  0:00:02 --:--:-- 11.0M
The -L option tells curl to follow redirects, and -o specifies the output file name. By default, curl doesn’t follow redirects: it stops at the redirect response without fetching the target. To debug, use the -v option, which makes curl verbose.
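If you need the same thing inside a script rather than on the command line, a rough Python counterpart might look like the sketch below. It relies on the standard urllib.request module, which follows HTTP redirects by default; the function name download and the 30-second timeout are my own choices, not anything curl defines.

```python
import shutil
import urllib.request


def download(url: str, destination: str, timeout: float = 30.0) -> str:
    """Save the resource at `url` to `destination` and return the final URL."""
    # urlopen follows HTTP redirects by default, similar to `curl -L`.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Stream the body to disk in chunks, similar to `curl -o`.
        with open(destination, "wb") as output:
            shutil.copyfileobj(response, output)
        # After redirects this may differ from the URL that was requested.
        return response.geturl()
```

For example, download("https://github.com/yt-dlp/yt-dlp/releases/download/2023.03.04/yt-dlp_linux", "yt-dlp") would mirror the curl command above, given network access.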
Get your public IP address
$ curl httpbin.org/ip
{
  "origin": "217.231.24.43"
}
Conclusion
HTTP is the main application protocol of the Internet, and understanding it can be helpful even at the user level. A form doesn’t work and you have no idea why? Hit F12, open the Network tab, and you may see that the server returns an error that is never shown to you. Now you have some valuable information to pass on to support, or you may realize that you only need to wait a minute.