Last updated: Apr 9, 2024
Reading time · 2 min

To get the base of a URL:

1. Pass the URL to the urlparse method from the urllib.parse module.
2. Access the netloc attribute on the parse result.

```python
from urllib.parse import urlparse

my_url = 'https://bobbyhadz.com/images/wallpaper.jpg'

parsed = urlparse(my_url)

# ParseResult(scheme='https', netloc='bobbyhadz.com', path='/images/wallpaper.jpg', params='', query='', fragment='')
print(parsed)

base = parsed.netloc
print(base)  # bobbyhadz.com

path = parsed.path
print(path)  # /images/wallpaper.jpg

# Drop the last path segment (the file name) and append the rest to the base
with_path = base + '/'.join(path.split('/')[:-1])
print(with_path)  # bobbyhadz.com/images

print(path.split('/'))  # ['', 'images', 'wallpaper.jpg']
```

We used the urlparse() method from the urllib.parse module.
The urlparse method takes a URL and parses it into six components.
```python
from urllib.parse import urlparse

my_url = 'https://bobbyhadz.com/images/wallpaper.jpg'

parsed = urlparse(my_url)

# ParseResult(scheme='https', netloc='bobbyhadz.com', path='/images/wallpaper.jpg', params='', query='', fragment='')
print(parsed)

base = parsed.netloc
print(base)  # bobbyhadz.com

path = parsed.path
print(path)  # /images/wallpaper.jpg
```
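The netloc attribute on the parse result returns the base of the URL, that is, its network location (the domain, plus a port if the URL specifies one). It does not include the scheme.

We also have access to other attributes like path, query, etc.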
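If you also need the scheme in the result, one option is to join the scheme and netloc attributes yourself. The snippet below is a minimal sketch; the URL with a port and a query string is a made-up example, used only to show how netloc, hostname, port and query differ.

```python
from urllib.parse import urlparse

# Hypothetical URL with an explicit port and a query string
my_url = 'https://bobbyhadz.com:8443/images/wallpaper.jpg?size=large'

parsed = urlparse(my_url)

# netloc keeps the port, hostname drops it
print(parsed.netloc)    # bobbyhadz.com:8443
print(parsed.hostname)  # bobbyhadz.com
print(parsed.port)      # 8443
print(parsed.query)     # size=large

# Prepend the scheme to get a base URL that includes the protocol
base_url = f'{parsed.scheme}://{parsed.netloc}'
print(base_url)  # https://bobbyhadz.com:8443
```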
If you need to exclude a portion of the path from the result, use the
str.rsplit() or str.split() methods.
```python
from urllib.parse import urlparse

my_url = 'https://bobbyhadz.com/images/wallpaper.jpg'

parsed = urlparse(my_url)

base = parsed.netloc
print(base)  # bobbyhadz.com

path = parsed.path
print(path)  # /images/wallpaper.jpg

# Split once from the right and keep everything before the file name
with_path = base + '/'.join(path.rsplit('/', 1)[:-1])
print(with_path)  # bobbyhadz.com/images

print(path.rsplit('/', 1))  # ['/images', 'wallpaper.jpg']
```

The str.rsplit() method returns a list of the words in the string using the provided separator as the delimiter string.
```python
# ['/images', 'wallpaper.jpg']
print('/images/wallpaper.jpg'.rsplit('/', 1))

# ['', 'images', 'wallpaper.jpg']
print('/images/wallpaper.jpg'.rsplit('/'))
```
The method takes the following 2 arguments:
| Name | Description |
|---|---|
| separator | Split the string into substrings on each occurrence of the separator |
| maxsplit | At most maxsplit splits are done, the rightmost ones (optional) |
Except for splitting from the right, rsplit() behaves like split().
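The difference only shows up when maxsplit is set. The following sketch uses a sample path (chosen for illustration) to compare the two methods side by side.

```python
path = '/images/nature/wallpaper.jpg'

# split() counts its splits from the left
left = path.split('/', 1)
print(left)  # ['', 'images/nature/wallpaper.jpg']

# rsplit() counts its splits from the right
right = path.rsplit('/', 1)
print(right)  # ['/images/nature', 'wallpaper.jpg']

# Without maxsplit, both methods return the same list
print(path.split('/') == path.rsplit('/'))  # True
```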
Set the maxsplit argument to 1 if you only want to split once from the right.

Here is another example with a more deeply nested URL.
```python
from urllib.parse import urlparse

my_url = 'https://bobbyhadz.com/images/nature/wallpaper.jpg'

parsed = urlparse(my_url)

base = parsed.netloc
print(base)  # bobbyhadz.com

path = parsed.path
print(path)  # /images/nature/wallpaper.jpg

# Split twice from the right and keep only the first element
with_path = base + path.rsplit('/', 2)[0]
print(with_path)  # bobbyhadz.com/images

print(path.rsplit('/', 2))  # ['/images', 'nature', 'wallpaper.jpg']
```
The URL in the example is one level deeper.
We split the path 2 times from the right and added the first element of the
result, /images, to the base URL.
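If you need this for paths of varying depth, the same idea can be wrapped in a small function. This is only a sketch: base_with_path and its drop parameter are hypothetical names introduced here, not part of urllib.

```python
from urllib.parse import urlparse


def base_with_path(url, drop=1):
    """Return the netloc plus the path without its last `drop` segments.

    Hypothetical helper for illustration, not part of urllib.
    """
    parsed = urlparse(url)
    # The first element of rsplit() is whatever precedes the last `drop` slashes
    return parsed.netloc + parsed.path.rsplit('/', drop)[0]


my_url = 'https://bobbyhadz.com/images/nature/wallpaper.jpg'

print(base_with_path(my_url))     # bobbyhadz.com/images/nature
print(base_with_path(my_url, 2))  # bobbyhadz.com/images
print(base_with_path(my_url, 3))  # bobbyhadz.com
```

When drop is greater than or equal to the number of path segments, the first element of the split is an empty string, so the function returns just the netloc.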