How to get the base of a URL in Python

Borislav Hadzhiev

Last updated: Apr 9, 2024


# Get the base of a URL in Python

To get the base of a URL:

  1. Pass the URL to the urlparse() function from the urllib.parse module.
  2. Access the netloc attribute on the parse result.

main.py

    from urllib.parse import urlparse

    my_url = 'https://bobbyhadz.com/images/wallpaper.jpg'

    parsed = urlparse(my_url)

    # 👇️ ParseResult(scheme='https', netloc='bobbyhadz.com', path='/images/wallpaper.jpg', params='', query='', fragment='')
    print(parsed)

    base = parsed.netloc
    print(base)  # 👉️ bobbyhadz.com

    path = parsed.path
    print(path)  # 👉️ /images/wallpaper.jpg

    # Drop the last path segment and append the rest to the base
    with_path = base + '/'.join(path.split('/')[:-1])
    print(with_path)  # 👉️ bobbyhadz.com/images

    print(path.split('/'))  # 👉️ ['', 'images', 'wallpaper.jpg']


The code for this article is available on GitHub

We used the urlparse() function from the urllib.parse module.

The urlparse() function takes a URL and parses it into six components: scheme, netloc, path, params, query and fragment.

main.py

    from urllib.parse import urlparse

    my_url = 'https://bobbyhadz.com/images/wallpaper.jpg'

    parsed = urlparse(my_url)

    # 👇️ ParseResult(scheme='https', netloc='bobbyhadz.com', path='/images/wallpaper.jpg', params='', query='', fragment='')
    print(parsed)

    base = parsed.netloc
    print(base)  # 👉️ bobbyhadz.com

    path = parsed.path
    print(path)  # 👉️ /images/wallpaper.jpg

The netloc attribute on the parse result holds the network location: the domain and, if present, the port. This is the part the article treats as the base of the URL. It does not include the scheme (https).

We also have access to other attributes like path, query, etc.
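If the scheme should be part of the base, the attributes can be combined. The snippet below is a minimal sketch of that idea rather than part of the original example: the get_base_url name and the query string and fragment added to the URL are assumptions made for illustration.

```python
from urllib.parse import urlparse


def get_base_url(url):
    """Return the scheme and network location, e.g. 'https://example.com'.

    get_base_url is a hypothetical helper name, not a urllib function.
    """
    parsed = urlparse(url)
    return f'{parsed.scheme}://{parsed.netloc}'


# The query string and fragment are added here only to show those attributes
my_url = 'https://bobbyhadz.com/images/wallpaper.jpg?size=large#top'
parsed = urlparse(my_url)

print(parsed.scheme)    # https
print(parsed.query)     # size=large
print(parsed.fragment)  # top

print(get_base_url(my_url))  # https://bobbyhadz.com
```

Whether the scheme belongs in the "base" depends on how the result is used. Keep it if the value will be requested again as a URL.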

# Exclude a portion of the path from the result

If you need to exclude a portion of the path from the result, use the str.rsplit() or str.split() methods.

main.py

    from urllib.parse import urlparse

    my_url = 'https://bobbyhadz.com/images/wallpaper.jpg'

    parsed = urlparse(my_url)

    base = parsed.netloc
    print(base)  # 👉️ bobbyhadz.com

    path = parsed.path
    print(path)  # 👉️ /images/wallpaper.jpg

    with_path = base + '/'.join(path.rsplit('/', 1)[:-1])
    print(with_path)  # 👉️ bobbyhadz.com/images

    print(path.rsplit('/', 1))  # 👉️ ['/images', 'wallpaper.jpg']


The str.rsplit() method returns a list of the words in the string using the provided separator as the delimiter string.

main.py

    # 👇️ ['/images', 'wallpaper.jpg']
    print('/images/wallpaper.jpg'.rsplit('/', 1))

    # 👇️ ['', 'images', 'wallpaper.jpg']
    print('/images/wallpaper.jpg'.rsplit('/'))

The method takes the following 2 arguments:

| Name | Description |
| --- | --- |
| sep | The separator. Split the string into substrings on each occurrence of the separator |
| maxsplit | At most maxsplit splits are done, the rightmost ones (optional) |

Except for splitting from the right, rsplit() behaves like split().

You can set the maxsplit argument to 1 if you only want to split once from the right.
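The difference between the two methods only shows up once maxsplit is passed. Here is a small sketch using an example path chosen for illustration (it is not taken from the examples above):

```python
path = '/images/nature/wallpaper.jpg'

# split() applies maxsplit starting from the left
print(path.split('/', 1))   # ['', 'images/nature/wallpaper.jpg']

# rsplit() applies maxsplit starting from the right
print(path.rsplit('/', 1))  # ['/images/nature', 'wallpaper.jpg']

# Without maxsplit, both methods return the same list
print(path.split('/') == path.rsplit('/'))  # True

# Splitting once from the right separates the parent path from the filename
parent, filename = path.rsplit('/', 1)
print(parent)    # /images/nature
print(filename)  # wallpaper.jpg
```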

# Working with more deeply nested URL paths

Here is another example with a more deeply nested URL.

main.py

    from urllib.parse import urlparse

    my_url = 'https://bobbyhadz.com/images/nature/wallpaper.jpg'

    parsed = urlparse(my_url)

    base = parsed.netloc
    print(base)  # 👉️ bobbyhadz.com

    path = parsed.path
    print(path)  # 👉️ /images/nature/wallpaper.jpg

    # Split twice from the right and keep what is left of the path
    with_path = base + path.rsplit('/', 2)[0]
    print(with_path)  # 👉️ bobbyhadz.com/images

    print(path.rsplit('/', 2))  # 👉️ ['/images', 'nature', 'wallpaper.jpg']

The URL in the example is one level deeper.

We split the path 2 times from the right, which removes the last two segments (nature and wallpaper.jpg), and added the remaining /images path to the base URL.
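The same pattern can be wrapped in a function that takes the number of trailing segments to remove. This is a rough sketch only: drop_last_segments is a hypothetical helper name, and it assumes the path has no trailing slash.

```python
from urllib.parse import urlparse


def drop_last_segments(url, count=1):
    """Return the netloc plus the path with the last `count` segments removed.

    drop_last_segments is a hypothetical helper, not part of urllib.
    """
    parsed = urlparse(url)
    # rsplit() returns at most count + 1 items; the first item is what remains
    remaining_path = parsed.path.rsplit('/', count)[0]
    return parsed.netloc + remaining_path


my_url = 'https://bobbyhadz.com/images/nature/wallpaper.jpg'

print(drop_last_segments(my_url, 1))  # bobbyhadz.com/images/nature
print(drop_last_segments(my_url, 2))  # bobbyhadz.com/images
print(drop_last_segments(my_url, 3))  # bobbyhadz.com
```

Passing a count larger than the number of segments leaves only the netloc, because the first item of the split path is then an empty string.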


Copyright © 2024 Borislav Hadzhiev