Demystifying The Modern Web, Vol. 1 - URLs Are Everything

Jon Sully

Tue Jun 2 '20

5 Minutes

Breaking down a URL into its clever little bits

Audience

It’s important to me that I always preface my writing with a clear depiction of who I’m writing for, for two reasons. On my side, it allows me to make assumptions about what readers already know. On your end, it means that I don’t have to explain all the way down the stack, and you can know that I won’t spend time on anything I’m declaring “already known.”

This article is written for newer developers of all varieties. It’s plainly about URLs, their structure, and some history. Anybody that doesn’t readily understand all of the components of a URL can benefit from this read (I hope!).

Noice! 👍🏻

So let’s talk about URLs. Here’s the example URL this discussion is going to center around:

https://www.facebook.com/photo.php?fbid=3490363167645541&theater=true

You don’t have to have a Facebook account to open that - it’s a fully public image of an [IKE Smart City](https://www.ikesmartcity.com) Kiosk (which I happen to write code for). Nothing particularly special, just a public Facebook post. Cool? Let’s dig in piece by piece.

https

is the first part. This is the scheme, or more simply, the communication style that both parties in a web connection agree to use. A shared language, if you will. Most often in web development we see http and https, but depending on what you’re working on, you may have seen or used others, like ftp, ssh, tel, mailto, etc.
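If you'd like to see the scheme pulled out programmatically, here's a minimal sketch using Python's standard library. It assumes nothing beyond the example URL from this article, and it's just one way to look at it, not how a browser does it internally.

```python
from urllib.parse import urlsplit

url = "https://www.facebook.com/photo.php?fbid=3490363167645541&theater=true"

# urlsplit breaks a URL into named components;
# .scheme is everything before the first colon.
scheme = urlsplit(url).scheme
print(scheme)
```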

: (colon)

is the delineator. Is it just a colon? Yes. It’s a ✨magical✨ colon. It defines where the text for which scheme you’re using ends and the rest of the text for the URL starts. Each scheme has its own requirements for what the rest of the URL needs to look like, but the idea is that the colon : separates the scheme from the stuff pertaining to that scheme. Let me illustrate that really briefly using the ‘language’ concept I referenced above. Consider these to be valid:

english:Hello,_world!

spanish:Hola,_Mundo!

The stuff after the colon is what you’d expect based on the scheme referenced before the colon. Conversely, this would be considered incorrect:

english:Hola,_Mundo!

Because the stuff after the colon no longer fits the expectation set forth by the scheme given.

Now wait a second.. 🤔

There are a few foundational things to note before we move ahead. The scheme:stuff idea that I just presented is an overarching rule for URLs, but URLs are bigger than just http and https. Those other schemes that we’ve probably seen follow the same rule. That’s why URLs to kick off an email are mailto:[email protected] and URLs to kick off a text message are sms:1112224444. Your device knows how to handle the sms scheme and/or mailto scheme - it just opens a new message, then passes the stuff to the application that handles that scheme. Slick!

So everything from here on out pertains specifically to the http(s) scheme. Other schemes have different requirements for their stuff.
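To make the scheme:stuff rule a bit more concrete, here's a rough sketch that splits any URL at its first colon. The helper name `split_scheme` and the address `someone@example.com` are made up for illustration. Real URL parsers do more validation than this.

```python
def split_scheme(url: str) -> tuple[str, str]:
    """Split a URL at its first colon into (scheme, stuff)."""
    scheme, _, stuff = url.partition(":")
    return scheme, stuff


# The sms example from above, a hypothetical mailto address,
# and an https URL all follow the same scheme:stuff shape.
print(split_scheme("sms:1112224444"))
print(split_scheme("mailto:someone@example.com"))
print(split_scheme("https://www.facebook.com/photo.php"))
```

Whatever comes back as the second item is the part each scheme gets to define its own rules for.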

// (double slash)

A double slash is probably the most well-known beginning of an http(s) stuff string (I’ll probably just call it the ‘http string’ from here on out). In short, a double slash is used to reference an absolute host. That’s typically a domain name, but the idea is that you use a double slash when you’re calling out to somewhere specific - a new, known place. It’s why we have to use double slashes when navigating to a site on the web. We’re not currently there, and we want to go there, so we use double slashes to indicate the full, absolute location (by domain name).

As a point of contrast, http actually supports other slash mechanisms. Webpages often link to other assets or pages. We typically use either relative paths (../foo/bar.jpg) or absolute paths (/foo/images/bar.jpg), but even that’s just shorthand for using http without a separate domain (single slash, in this case): http:/../foo/bar.jpg or http:/foo/images/bar.jpg. That might look strange if you’re used to only ever seeing double-slash URLs with http and https, but browsers accept it when the page you’re on uses the same scheme! A single slash just means ‘same domain, asking for this path now.’ Feel free to pop open the dev tools on any webpage and try for yourself!

A screenshot of the developer tools panel making a single-slash http request
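Here's a small sketch of how relative, absolute-path, and double-slash links resolve against the page you're currently on. It uses Python's `urljoin`, and the `example.com` addresses are placeholders I've assumed for illustration. Browsers follow their own URL parsing rules, so treat this as an approximation of the idea.

```python
from urllib.parse import urljoin

# Hypothetical page we are "currently on".
base = "https://example.com/gallery/2020/index.html"

# Relative path: climbs one directory up from the current page's folder.
relative = urljoin(base, "../foo/bar.jpg")

# Absolute path (single leading slash): same host, path starts at the root.
absolute_path = urljoin(base, "/foo/images/bar.jpg")

# Double slash: a different host, keeping the current scheme.
other_host = urljoin(base, "//cdn.example.com/bar.jpg")

print(relative)
print(absolute_path)
print(other_host)
```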

www.facebook.com

This is the absolute place on the internet that the double slash above is referring to. It’s a named, specific place that anybody can look up - in this case, a name that DNS resolves to one or more IP addresses. This is often equated to a physical home address, and I think that metaphor is valid.

/photo.php

is the path you’re targeting - often equivalent to the page you’re viewing. In the physical-address metaphor, this would be like an apartment number. Without it, you’ll get to the right building, but you probably still won’t get to the right place inside it.

In static websites, where content is just plain files served up at the corresponding path, a /foo.html path truly and literally points at the file foo.html in the root directory. The behavior is similar for most PHP (web programming language) sites - in this case we are saying “hey I want the photo.php file” and what’s rendered is the output of running that file.
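As a quick illustration of the building vs. apartment split, here's a minimal sketch that pulls the host and the path out of the example URL with Python's standard library.

```python
from urllib.parse import urlsplit

parts = urlsplit(
    "https://www.facebook.com/photo.php?fbid=3490363167645541&theater=true"
)

# The host is the "building"; the path is the "apartment number".
host = parts.hostname
path = parts.path

print(host)
print(path)
```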

? (question mark)

This question mark is key - it is the query string marker. It marks where the path ends and the query string parameters begin. Like passing arguments to a function as inputs, query string parameters are the inputs for a webpage. They come after the path (only an optional #fragment can follow them), and they can be QUITE large.

fbid=3490363167645541

is the first query string parameter. By convention, they follow a very simple string=string key/value structure. Remember how I said query string parameters are like inputs to a page? Exactly the case here. If I had to guess, fbid is short for “Facebook Id”, and the number is the globally unique id of the image we’re looking at. Passing that as an argument to photo.php tells that page which photo I want to look at.

& (ampersand)

The ampersand is another key part. This is the delimiter between query string parameters. It’s how we mark where one query parameter ends and the next begins. Similar to how commas are used in many programming languages to separate function arguments from each other.

theater=true

Finally, this is the second query string parameter in the URL. This one in particular tells the page that I want to see the picture in theater mode. Facebook recently redid their interface, and this parameter may no longer actually do anything, but that’s another great thing about query string parameters. There’s usually no issue in having extra parameters. A page that doesn’t look for them will just ignore them 🙂
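Here's a hedged sketch of the ?, =, and & rules in action, using Python's standard query string parser. The `photo_id` function is a made-up stand-in for whatever photo.php actually does (I have no idea what Facebook's code looks like), and the `unused` parameter is invented to show that extras are harmless.

```python
from urllib.parse import parse_qsl, urlsplit

url = "https://www.facebook.com/photo.php?fbid=3490363167645541&theater=true"

# Everything after the ? is the query string.
query = urlsplit(url).query

# parse_qsl splits on & and then on = to produce (key, value) pairs.
params = dict(parse_qsl(query))


def photo_id(page_params: dict[str, str]) -> str:
    """Hypothetical page logic: reads only the key it cares about."""
    return page_params["fbid"]


# An extra parameter the page never asks for changes nothing.
with_extra = dict(parse_qsl(query + "&unused=1"))

print(params)
print(photo_id(params) == photo_id(with_extra))
```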

Noice! 👍🏻

So what’s a URL? It’s the sum of each important little piece:

https + : + // + www.facebook.com + /photo.php + ? + fbid=3490363167645541 + & + theater=true

That ultimately results in a specific room in a specific building speaking a specific language with a few important inputs 🤓
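Going the other direction, here's a small sketch that glues the pieces back together with Python's standard library. It's only meant to show that the sum-of-parts idea holds up, not to suggest this is how any particular site builds its links.

```python
from urllib.parse import urlencode, urlunsplit

# Build the query string: keys and values joined by =, pairs joined by &.
query = urlencode({"fbid": "3490363167645541", "theater": "true"})

# urlunsplit takes (scheme, host, path, query, fragment) and adds
# the colon, double slash, and question mark for us.
url = urlunsplit(("https", "www.facebook.com", "/photo.php", query, ""))

print(url)
```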
