Join a base URL with another URL in Python


Borislav Hadzhiev

Last updated: Jun 18, 2022



Use the urljoin() function from the urllib.parse module to join a base URL with another URL, e.g. result = urljoin(base_url, path). The function constructs a full (absolute) URL by combining a base URL with another URL.

main.py
from urllib.parse import urljoin

base_url = 'https://example.com'
path = 'images/static/cat.jpg'

result = urljoin(base_url, path)

# 👇️ https://example.com/images/static/cat.jpg
print(result)

# 👇️ /global/images/static/dog.png
print(urljoin('/global/images/', 'static/dog.png'))

If you have multiple URL components, join them with posixpath.join() before passing the result to urljoin().

main.py
import posixpath
from urllib.parse import urljoin

base_url = 'https://example.com'
path_1 = 'images'
path_2 = 'static'
path_3 = 'cat.jpg'

path = posixpath.join(path_1, path_2, path_3)
print(path)  # 👉️ 'images/static/cat.jpg'

result = urljoin(base_url, path)

# 👇️ https://example.com/images/static/cat.jpg
print(result)

The urllib.parse.urljoin() function takes a base URL and another URL as parameters and constructs a full (absolute) URL by resolving the second URL against the first, much like a browser resolves a link on a page.
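To make the resolution behavior more concrete, here is a small sketch. The host names and paths are made up for illustration. It shows that a `..` segment moves up one directory and that an absolute second URL replaces the base entirely.

```python
from urllib.parse import urljoin

# Hypothetical base URL that ends in a forward slash
base_url = 'https://example.com/docs/guide/'

# A '..' segment is resolved relative to the base path
parent = urljoin(base_url, '../images/cat.jpg')
print(parent)  # https://example.com/docs/images/cat.jpg

# A second URL with its own scheme and host wins over the base
absolute = urljoin(base_url, 'https://cdn.example.org/cat.jpg')
print(absolute)  # https://cdn.example.org/cat.jpg
```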

You can also use urljoin() to join URL path components that have no scheme or host.

main.py
from urllib.parse import urljoin

# 👇️ /global/images/static/dog.png
print(urljoin('/global/images/', 'static/dog.png'))

Check that the output is what you expect, because urljoin() can be confusing when the first component doesn't end in a forward slash /.

Here is an example.

main.py
from urllib.parse import urljoin

# 👇️ /global/static/dog.png
print(urljoin('/global/images', 'static/dog.png'))

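Notice that urljoin() dropped images from the first component before joining the second one. Without a trailing slash, the last segment is treated as a file name rather than a directory, so the relative path replaces it.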

The method behaves as expected when the first component ends with a forward slash.

main.py
from urllib.parse import urljoin

# 👇️ /global/images/static/dog.png
print(urljoin('/global/images/', 'static/dog.png'))
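If you always want the first component to be treated as a directory, one option is to add the trailing slash yourself before joining. The helper below is a minimal sketch. The name join_to_directory is hypothetical and not part of urllib.

```python
from urllib.parse import urljoin


def join_to_directory(base, relative):
    """Join `relative` onto `base`, treating `base` as a directory.

    Hypothetical helper: it only adds a missing trailing slash and
    does not account for query strings or fragments in `base`.
    """
    if not base.endswith('/'):
        base += '/'
    return urljoin(base, relative)


# 👇️ /global/images/static/dog.png
print(join_to_directory('/global/images', 'static/dog.png'))

# 👇️ /global/images/static/dog.png
print(join_to_directory('/global/images/', 'static/dog.png'))
```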

You might also notice confusing behavior if the second component starts with a forward slash.

main.py
from urllib.parse import urljoin

# 👇️ /static/dog.png
print(urljoin('/global/images', '/static/dog.png'))

When the second component starts with a forward slash, it is treated as an absolute path that starts at the root, so it replaces the entire path of the first component.
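If the second component should stay relative to the first one, you could strip its leading slashes before joining. This is only a sketch with made-up example values, and it assumes the first component ends with a forward slash.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration
base_url = 'https://example.com/global/images/'
path = '/static/dog.png'

# The leading slash makes the path absolute, so the base path is replaced
from_root = urljoin(base_url, path)
print(from_root)  # https://example.com/static/dog.png

# Stripping the leading slash keeps the path relative to the base
relative = urljoin(base_url, path.lstrip('/'))
print(relative)  # https://example.com/global/images/static/dog.png
```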

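The posixpath.join() function is a bit more predictable and can also be used to join URL path components. It keeps the last segment of the first component whether or not it ends in a forward slash, although a component that starts with a forward slash still discards everything before it.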

main.py
import posixpath

# 👇️ /global/images/static/dog.png
print(posixpath.join('/global/images', 'static/dog.png'))

# 👇️ /global/images/static/dog.png
print(posixpath.join('/global/images/', 'static/dog.png'))

# 👇️ /static/dog.png
print(posixpath.join('/global/images', '/static/dog.png'))

The posixpath.join() function can also be passed more than two path components.

main.py
import posixpath

# 👇️ /global/images/static/dog.png
print(posixpath.join('/global', 'images', 'static', 'dog.png'))
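Note that posixpath.join() only understands paths, not schemes, hosts or query strings. One possible way to use it with a full URL is to split the URL first, join only the path part, and reassemble the result. The sketch below assumes a hypothetical helper named append_path and made-up example URLs.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def append_path(base_url, *parts):
    """Append path components to the path of `base_url`.

    Hypothetical helper: the scheme, host, query and fragment of
    `base_url` are left untouched.
    """
    split = urlsplit(base_url)
    # Fall back to the root when the base URL has no path at all
    new_path = posixpath.join(split.path or '/', *parts)
    return urlunsplit(split._replace(path=new_path))


# 👇️ https://example.com/images/dog.png
print(append_path('https://example.com', 'images', 'dog.png'))

# 👇️ https://example.com/global/images/dog.png?lang=en
print(append_path('https://example.com/global?lang=en', 'images', 'dog.png'))
```

A component that starts with a forward slash still resets the path, just like in the posixpath.join() examples above.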