How to get the base of a URL in Python

Borislav Hadzhiev

Last updated: Jul 8, 2022

Get the base of a URL in Python

To get the base of a URL:

  1. Pass the URL to the urlparse() function from the urllib.parse module.
  2. Access the netloc attribute on the parse result.
main.py
from urllib.parse import urlparse

my_url = 'https://example.com/images/wallpaper.jpg'

parsed = urlparse(my_url)

# 👇️ ParseResult(scheme='https', netloc='example.com', path='/images/wallpaper.jpg', params='', query='', fragment='')
print(parsed)

base = parsed.netloc
print(base)  # 👉️ example.com

path = parsed.path
print(path)  # 👉️ /images/wallpaper.jpg

with_path = base + '/'.join(path.split('/')[:-1])
print(with_path)  # 👉️ example.com/images

print(path.split('/'))  # 👉️ ['', 'images', 'wallpaper.jpg']

We used the urlparse() function from the urllib.parse module.

The urlparse() function takes a URL string and parses it into six components: scheme, netloc, path, params, query and fragment.

main.py
from urllib.parse import urlparse

my_url = 'https://example.com/images/wallpaper.jpg'

parsed = urlparse(my_url)

# 👇️ ParseResult(scheme='https', netloc='example.com', path='/images/wallpaper.jpg', params='', query='', fragment='')
print(parsed)

base = parsed.netloc
print(base)  # 👉️ example.com

path = parsed.path
print(path)  # 👉️ /images/wallpaper.jpg

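The netloc attribute on the parse result returns the network location part of the URL (the domain, plus a port if the URL specifies one), which serves as the base here. It does not include the scheme.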
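
Because netloc leaves out the scheme, it is sometimes useful to put the two back together. The snippet below is a minimal sketch of that idea; the URL with port 8080 is a made-up example and is not taken from the rest of the article.

```python
from urllib.parse import urlparse

# Hypothetical URL with an explicit port, used only for illustration
my_url = 'https://example.com:8080/images/wallpaper.jpg'

parsed = urlparse(my_url)

# netloc keeps the port but not the scheme
print(parsed.netloc)  # example.com:8080

# hostname drops the port, and port is returned as an integer
print(parsed.hostname)  # example.com
print(parsed.port)  # 8080

# Prepend the scheme to get a base that can be used as a link
base_url = f'{parsed.scheme}://{parsed.netloc}'
print(base_url)  # https://example.com:8080
```

Whether you want netloc alone or the scheme and netloc together depends on what "base" means in your use case.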

We also have access to other attributes like path, query, etc.
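
To make the other attributes easier to see, here is a short sketch that uses a URL with a query string and a fragment. The URL is hypothetical and chosen only so that more of the six components are non-empty.

```python
from urllib.parse import urlparse

# Hypothetical URL that fills in the query and fragment parts
my_url = 'https://example.com/search?q=python&page=2#results'

parsed = urlparse(my_url)

print(parsed.scheme)    # https
print(parsed.netloc)    # example.com
print(parsed.path)      # /search
print(parsed.params)    # (empty string)
print(parsed.query)     # q=python&page=2
print(parsed.fragment)  # results

# The result is a named tuple, so it can also be unpacked
scheme, netloc, path, params, query, fragment = parsed
```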

If you need to keep part of the path in the result but drop its trailing segment (the filename, for example), use the str.rsplit() or str.split() methods.

main.py
from urllib.parse import urlparse

my_url = 'https://example.com/images/wallpaper.jpg'

parsed = urlparse(my_url)

base = parsed.netloc
print(base)  # 👉️ example.com

path = parsed.path
print(path)  # 👉️ /images/wallpaper.jpg

with_path = base + '/'.join(path.rsplit('/', 1)[:-1])
print(with_path)  # 👉️ example.com/images

print(path.rsplit('/', 1))  # 👉️ ['/images', 'wallpaper.jpg']

The str.rsplit method returns a list of the words in the string using the provided separator as the delimiter string.

main.py
# 👇️ ['/images', 'wallpaper.jpg']
print('/images/wallpaper.jpg'.rsplit('/', 1))

# 👇️ ['', 'images', 'wallpaper.jpg']
print('/images/wallpaper.jpg'.rsplit('/'))

The method takes the following 2 arguments:

Name        Description
separator   Split the string into substrings on each occurrence of the separator
maxsplit    At most maxsplit splits are done, the rightmost ones (optional)

Except for splitting from the right, rsplit() behaves like split().

You can set the maxsplit argument to 1 if you only want to split once from the right.
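
The difference between the two methods only shows up when maxsplit is set. A small sketch, using a sample path string chosen for illustration:

```python
path = '/images/nature/wallpaper.jpg'

# split() with maxsplit=1 cuts at the leftmost separator
left = path.split('/', 1)
print(left)  # ['', 'images/nature/wallpaper.jpg']

# rsplit() with maxsplit=1 cuts at the rightmost separator
right = path.rsplit('/', 1)
print(right)  # ['/images/nature', 'wallpaper.jpg']

# Without maxsplit, both methods return the same list
print(path.split('/') == path.rsplit('/'))  # True
```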

Here is another example with a more deeply nested URL.

main.py
from urllib.parse import urlparse

my_url = 'https://example.com/images/nature/wallpaper.jpg'

parsed = urlparse(my_url)

base = parsed.netloc
print(base)  # 👉️ example.com

path = parsed.path
print(path)  # 👉️ /images/nature/wallpaper.jpg

# Split twice from the right and keep the first item
with_path = base + path.rsplit('/', 2)[0]
print(with_path)  # 👉️ example.com/images

print(path.rsplit('/', 2))  # 👉️ ['/images', 'nature', 'wallpaper.jpg']

The URL in the example is one level deeper.

We split the path 2 times from the right and appended the first item of the result, /images, to the base.
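
If the number of segments to drop varies, the same approach could be wrapped in a small function. This is only a sketch: base_with_parent and its levels parameter are hypothetical names introduced here, not part of urllib.

```python
from urllib.parse import urlparse


def base_with_parent(url, levels=1):
    """Return netloc plus the path with the last `levels` segments removed.

    Hypothetical helper, not part of urllib.
    """
    parsed = urlparse(url)
    # rsplit yields at most levels + 1 items; the first is the part we keep
    kept_path = parsed.path.rsplit('/', levels)[0]
    return parsed.netloc + kept_path


url = 'https://example.com/images/nature/wallpaper.jpg'

print(base_with_parent(url, 1))  # example.com/images/nature
print(base_with_parent(url, 2))  # example.com/images
print(base_with_parent(url, 3))  # example.com
```

Passing a levels value larger than the number of path segments simply returns the netloc, because the first item of the split is then an empty string.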
