Last updated: Apr 9, 2024
To get the base of a URL:

1. Use the `urlparse()` function from the `urllib.parse` module.
2. Access the `netloc` attribute on the parse result.

```python
from urllib.parse import urlparse

my_url = 'https://bobbyhadz.com/images/wallpaper.jpg'

parsed = urlparse(my_url)

# ParseResult(scheme='https', netloc='bobbyhadz.com', path='/images/wallpaper.jpg', params='', query='', fragment='')
print(parsed)

base = parsed.netloc
print(base)  # bobbyhadz.com

path = parsed.path
print(path)  # /images/wallpaper.jpg

# Drop the last path segment and append the rest to the base
with_path = base + '/'.join(path.split('/')[:-1])
print(with_path)  # bobbyhadz.com/images

print(path.split('/'))  # ['', 'images', 'wallpaper.jpg']
```
We used the `urlparse()` function from the `urllib.parse` module.
The `urlparse()` function takes a URL and parses it into six components: scheme, netloc, path, params, query and fragment.
```python
from urllib.parse import urlparse

my_url = 'https://bobbyhadz.com/images/wallpaper.jpg'

parsed = urlparse(my_url)

# ParseResult(scheme='https', netloc='bobbyhadz.com', path='/images/wallpaper.jpg', params='', query='', fragment='')
print(parsed)

base = parsed.netloc
print(base)  # bobbyhadz.com

path = parsed.path
print(path)  # /images/wallpaper.jpg
```
The `netloc` attribute on the parse result returns the base of the URL (the network location, `bobbyhadz.com` in the example).

We also have access to other attributes like `path`, `query`, etc.
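The `netloc` value does not contain the scheme. If the base should also include it, one option is to combine the `scheme` and `netloc` attributes. This is a minimal sketch; `get_base_url` is a hypothetical helper name used for illustration, not something provided by `urllib`.

```python
from urllib.parse import urlparse


def get_base_url(url):
    # Hypothetical helper: join the scheme and the network location
    parsed = urlparse(url)
    return f'{parsed.scheme}://{parsed.netloc}'


print(get_base_url('https://bobbyhadz.com/images/wallpaper.jpg'))  # https://bobbyhadz.com
```

If the URL contains a port, it is part of `netloc`, so it is kept in the result.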
If you need to exclude a portion of the path from the result, use the `str.rsplit()` or `str.split()` methods.
```python
from urllib.parse import urlparse

my_url = 'https://bobbyhadz.com/images/wallpaper.jpg'

parsed = urlparse(my_url)

base = parsed.netloc
print(base)  # bobbyhadz.com

path = parsed.path
print(path)  # /images/wallpaper.jpg

# Split once from the right to drop the file name
with_path = base + '/'.join(path.rsplit('/', 1)[:-1])
print(with_path)  # bobbyhadz.com/images

print(path.rsplit('/', 1))  # ['/images', 'wallpaper.jpg']
```
The str.rsplit() method returns a list of the words in the string using the provided separator as the delimiter string.
```python
# ['/images', 'wallpaper.jpg']
print('/images/wallpaper.jpg'.rsplit('/', 1))

# ['', 'images', 'wallpaper.jpg']
print('/images/wallpaper.jpg'.rsplit('/'))
```
The method takes the following 2 arguments:
Name | Description |
---|---|
separator | Split the string into substrings on each occurrence of the separator |
maxsplit | At most maxsplit splits are done, the rightmost ones (optional) |
Except for splitting from the right, `rsplit()` behaves like `split()`.
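The difference only shows up when `maxsplit` is given. A short sketch comparing the two methods on the same path (the path value is just an illustration):

```python
path = '/images/nature/wallpaper.jpg'

# split() counts its splits from the left
print(path.split('/', 1))  # ['', 'images/nature/wallpaper.jpg']

# rsplit() counts its splits from the right
print(path.rsplit('/', 1))  # ['/images/nature', 'wallpaper.jpg']

# Without maxsplit both methods return the same list
print(path.split('/') == path.rsplit('/'))  # True
```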
Set the `maxsplit` argument to `1` if you only want to split once from the right.

Here is another example with a more deeply nested URL.
```python
from urllib.parse import urlparse

my_url = 'https://bobbyhadz.com/images/nature/wallpaper.jpg'

parsed = urlparse(my_url)

base = parsed.netloc
print(base)  # bobbyhadz.com

path = parsed.path
print(path)  # /images/nature/wallpaper.jpg

# Split twice from the right and keep only the leading part of the path
with_path = base + path.rsplit('/', 2)[0]
print(with_path)  # bobbyhadz.com/images

print(path.rsplit('/', 2))  # ['/images', 'nature', 'wallpaper.jpg']
```
The URL in the example is one level deeper.
We split the path 2 times from the right and appended the `/images` segment to the base.
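A fixed `maxsplit` value ties the code to one path depth. If the goal is only to drop the last segment (typically the file name), splitting once from the right and taking the first item works at any depth. The following is a minimal sketch under that assumption; `base_with_parent_path` is a hypothetical helper name used for illustration.

```python
from urllib.parse import urlparse


def base_with_parent_path(url):
    # Hypothetical helper: netloc plus the path without its last segment
    parsed = urlparse(url)
    parent = parsed.path.rsplit('/', 1)[0]
    return parsed.netloc + parent


print(base_with_parent_path('https://bobbyhadz.com/images/wallpaper.jpg'))  # bobbyhadz.com/images
print(base_with_parent_path('https://bobbyhadz.com/images/nature/wallpaper.jpg'))  # bobbyhadz.com/images/nature
```

For a URL with no path, the helper returns just the `netloc` value.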