Web Gobbledygook: Understanding the Basics of URLs

At Tensyl, we regularly conduct cybersecurity awareness training for our clients. Not to toot our own horn, but we're really good at it - call us. The training often includes a unit on phishing. Phishing is, after all, used in a significant percentage of cyber-attacks and thus represents a significant threat for most organizations. An organization will reduce the risk associated with this threat if its employees are adept at identifying phishing messages (and know how to handle them once received). Hence, the training.

One important aspect of our phishing training is familiarizing users (often nontechnical users) with the syntax of web addresses, or URLs. Why? Because a common technique that phishers use is the inclusion of a hyperlink in a message (along with a tantalizing reason for the recipient to click on the link). Tensyl trainers teach users to be wary of hyperlinks, to hover their mouse over a hyperlink to reveal the URL, and to carefully inspect the URL before ever thinking about clicking. For that inspection to be effective, though, users must have a basic understanding of URL syntax. In this blog post, we thought we'd share how we train users on URLs.

To begin, it's important to keep in mind the whole point of a web address, or URL. Namely, it provides the exact location of a resource that is available via the Internet (often a page on a web server). It is tantamount to a mailing address. And in the same way the U.S. Postal Service has defined a specific format for addressing a letter to make sure it is delivered to its intended recipient, the creators of the Internet have defined a syntax (see RFC 3986) to ensure that computers can connect to the exact host(s) that they intend to connect with. To understand the basics of URL syntax, let's look at the URL for the homepage of the Tensyl website:

 
Screenshot (350).png
 

At a high level, URLs have two main components: the transfer protocol and the domain. See figure below. The domain is simply the name of the host (or hosts) that is being requested. The transfer protocol tells the host how data will be sent over the network. Think of the protocol as the agreed-upon "language" that will be used for the communication ("Hey, let's have a call and let's agree to speak Spanish!"). HTTP and HTTPS are the most common transfer protocols used on the Internet, but there are numerous others. And with many web browsers today, there is no longer a need to type the transfer protocol yourself: enter "www.tensylsf.com/" and the browser fills in the protocol for you, just as if you had typed "https://www.tensylsf.com/".

 
Screenshot (351).png
 

So, when you type "https://www.tensylsf.com/" into a web browser, you are in effect telling the browser: use the HTTPS protocol to traverse the Internet and connect with the web server for Tensyl.
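For readers who like to see this in code, here is a minimal sketch in Python that splits a URL into the two components described above, using the standard library's urllib.parse module. The function name protocol_and_domain is made up for illustration. Note that the parser (which follows the formal syntax) finds no protocol or domain in a bare "www.tensylsf.com/"; it is the browser that adds the protocol on your behalf.

```python
from urllib.parse import urlsplit


def protocol_and_domain(url):
    """Return (transfer protocol, domain) for a URL string.

    protocol_and_domain is a hypothetical helper name used for this example.
    """
    parts = urlsplit(url)
    # .scheme is the text before "://"; .hostname is the host name in lowercase
    return parts.scheme, parts.hostname


print(protocol_and_domain("https://www.tensylsf.com/"))
# ('https', 'www.tensylsf.com')

# Without "https://", the parser sees no protocol and no domain at all.
print(protocol_and_domain("www.tensylsf.com/"))
# ('', None)
```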

Now, some of you savvy readers may be standing on your chairs and screaming "Wait! You are WAY oversimplifying things!" And you'd be right. So, let's look at a more complex example.

Screenshot (347).png

To understand the syntax of this URL, let's work from left to right. As we saw with the above example, the first component is the transfer protocol. Here again the transfer protocol is HTTPS.

As we learned above, the next part of the syntax is the domain. The domain, however, is not one monolithic component as suggested above. In fact, the domain is composed of several important sub-components: (i) the top-level domain (TLD), (ii) the second-level domain (SLD), and (iii) sometimes (but not always) a subdomain. See figure below. These domain sub-components are ways to organize resources on the Internet in the same way your local grocery store groups products with other similar products. The dairy products are in the same aisle of your grocery store just like educational institutions use the .edu top-level domain. Is it possible that a school uses a TLD other than .edu? Of course, in the same way your grocery store may put the almond milk in the nut section, not the dairy aisle (wait, is almond milk actually milk?). While there are many TLDs, in the U.S., the most common TLDs are .com, .net, .org, .gov, and .edu.

Screenshot (352).png

Immediately to the left of the TLD (separated by a ".") is the second-level domain (SLD). The SLD, paired with its TLD, is what makes a domain unique. It is the specific name of the resource that is being requested and a key component of what we commonly think of as the web address. In terms of identifying potential phishing messages, being able to identify the SLD is crucial. The SLD helps reveal the true destination of the URL. If you receive an email purporting to be from Amazon, and the email includes a hyperlink with a URL in which the SLD is "notamazon", then you should be very suspicious of the message. Of course, the bad guys rarely make it so easy and deploy several techniques to try to trick users into thinking the domain is legitimate (the subject of another blog post).
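The "find the SLD and compare it to who the message claims to be" check can be sketched in a few lines of Python. This is a rough illustration, not a phishing detector: the function names and the example URLs are hypothetical, and the code assumes a single-label TLD such as .com or .net (addresses ending in something like .co.uk would need a lookup against the Public Suffix List instead).

```python
from urllib.parse import urlsplit


def second_level_domain(url):
    """Return the label immediately to the left of the TLD, or '' if none.

    Assumes a single-label TLD (.com, .net, ...); hypothetical helper.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # labels[-1] is the TLD, so labels[-2] is the SLD
    return labels[-2] if len(labels) >= 2 else ""


def sld_matches(url, expected_sld):
    """True only if the URL's SLD is exactly the name we expect."""
    return second_level_domain(url) == expected_sld.lower()


print(sld_matches("https://www.amazon.com/orders", "amazon"))         # True
print(sld_matches("https://www.notamazon.com/orders", "amazon"))      # False
# "amazon" appears in the address, but the SLD is actually "example"
print(sld_matches("https://amazon.com.example.net/login", "amazon"))  # False
```

The last example is the kind of trick worth watching for: a familiar name placed to the left of the real SLD, where it is only a subdomain.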

One last comment on domains before we move on. Sometimes, but not always, the URL will include a subdomain, which is immediately to the left of the SLD (separated again by a "."). As the name suggests, a subdomain is a sub-section of a domain and, in fact, is treated as its own web address. In our example, the subdomain is "www". WWW, or World Wide Web, is a legacy from the early days of the Internet, when it was the conventional name for the host that served an organization's public website. But a subdomain can be anything, and is often used by website operators as a way to organize content on a website.
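To make the three domain sub-components concrete, here is a small Python sketch that splits a host name into subdomain, SLD, and TLD by reading the labels from right to left. The helper name split_domain and the second and third sample hosts are invented for this example, and the same simplifying assumption applies: the TLD is taken to be a single label.

```python
def split_domain(host):
    """Split a host name into (subdomain, SLD, TLD).

    Hypothetical helper; assumes the TLD is one label such as "com".
    """
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least an SLD and a TLD: %r" % host)
    tld = labels[-1]                   # right-most label
    sld = labels[-2]                   # immediately left of the TLD
    subdomain = ".".join(labels[:-2])  # everything else; may be empty
    return subdomain, sld, tld


print(split_domain("www.tensylsf.com"))       # ('www', 'tensylsf', 'com')
print(split_domain("tensylsf.com"))           # ('', 'tensylsf', 'com')
print(split_domain("blog.shop.example.org"))  # ('blog.shop', 'example', 'org')
```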

Continuing to move from left to right in our example, to the right of the TLD (and separated by a forward slash "/") is a directory name. Here, the directory is called "blog". Directories are not required components of a URL and appear only if (a) they are part of the website's design and (b) the user requests a page that is maintained within the directory. Directories are similar to folder structures on a computer. In the same way files are organized in folders and sub-folders, web pages (and/or files) can be organized in directories and sub-directories. Here, "blog" is a directory, and there are four sub-directories, each nested inside the one before it: (1) "2019", (2) "8", (3) "28", and (4) "a-quick-introduction-to-dkim".
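The directory portion can be pulled apart the same way. The sketch below uses Python's urllib.parse to list the directory names in order. The full URL in the example is an assumption, pieced together from the names mentioned above, and path_segments is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def path_segments(url):
    """Return the directory names that follow the domain, left to right."""
    path = urlsplit(url).path
    # Split on "/" and drop the empty strings left by leading/trailing slashes
    return [segment for segment in path.split("/") if segment]


# Assumed URL, reconstructed from the directory names listed in the text
url = "https://www.tensylsf.com/blog/2019/8/28/a-quick-introduction-to-dkim"
print(path_segments(url))
# ['blog', '2019', '8', '28', 'a-quick-introduction-to-dkim']

# A homepage URL has no directories at all
print(path_segments("https://www.tensylsf.com/"))
# []
```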

Believe it or not, we've barely scratched the surface of the exciting topic of URL syntax. Indeed, it is a fairly complex topic that is beyond the scope of this blog post. We hope, though, that we've introduced you to the basics so that you'll be better prepared if/when you receive a potential phish and need to analyze the URL of a hyperlink. Remember: find the top-level domain, look at the text immediately to its left, and ask yourself, "Is the web address directing me to a trusted site?" If the answer is "no" or "I'm not sure", then stop and do not click on the link.

To learn more about cybersecurity awareness training, URLs, or almond milk, contact the Tensyl team.

David Garrett