Python: 3 Ways to Get File Name and Extension from URL

Suppose you have a file URL like this:

https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv

Or like this:

https://api.slingacademy.com/public/sample-photos/1.jpeg

How can you programmatically extract the file name (with or without extension) and the file extension from each URL? This concise, code-centric article will show you three different ways to get the job done. We’ll only use built-in features of Python and won’t use any third-party packages.

Using the urllib and the os modules
Using regular expressions
Using the urllib and pathlib modules
Final Words

Using the urllib and the os modules

This approach uses only the standard library. Because urlparse() separates the path from the query string and fragment, it still works when a URL ends with something like ?download=1 or #section. The steps are:

Use the urlparse() function from the urllib.parse module to parse the given URL and extract the path component.
Use the os.path module’s basename() function to get the filename (including extension) from the path.
Use the os.path module’s splitext() function to separate the name part and the extension part.

Code example:

from urllib.parse import urlparse
import os

def get_filename_and_extension(url):
    parsed_url = urlparse(url)
    path = parsed_url.path
    filename = os.path.basename(path)
    filename_without_extension, file_extension = os.path.splitext(filename)
    return filename, filename_without_extension, file_extension


# Test it out
url = 'https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv'

filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")

Output:

Filename: marketing-campaigns.csv
Filename without extension: marketing-campaigns
File extension: .csv
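The claim that urlparse() copes with different URL formats is easy to check. The sketch below reuses the same logic on a few variations: a query string, a fragment, a multi-dot name, and a path with no extension. Apart from the Sling Academy file, the URLs are made up for illustration.

```python
from urllib.parse import urlparse
import os

def get_filename_and_extension(url):
    # Keep only the path; urlparse() puts "?..." and "#..." in other fields
    path = urlparse(url).path
    filename = os.path.basename(path)
    name, ext = os.path.splitext(filename)
    return filename, name, ext

# Hypothetical variations: query string, fragment, no extension
urls = [
    "https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv?download=1",
    "https://example.com/docs/report.final.pdf#page=2",
    "https://example.com/downloads/README",
]

for url in urls:
    print(get_filename_and_extension(url))
# ('marketing-campaigns.csv', 'marketing-campaigns', '.csv')
# ('report.final.pdf', 'report.final', '.pdf')
# ('README', 'README', '')
```

A URL that ends with a slash (for example, https://example.com/dir/) gives an empty filename, so check for that case if your input may contain directory-style URLs.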

Using regular expressions

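Here is the regular expression pattern we’ll use to match the filename (without extension) and the extension at the end of a given URL:

r"/([^/]+)(\.[^/.]+)$"

The first group captures everything after the last slash up to the final dot, and the second group captures that final dot plus the characters after it. The extension group excludes "/" as well as ".", so a match can't spill across path segments (otherwise, a URL like https://example.com/download would wrongly yield ".com/download" as the extension). With this pattern defined, the re.search() function can take care of the remaining work. Note that the pattern runs against the whole URL, so a query string or fragment (such as ?size=large) would end up in the extension; strip those first if your URLs may contain them.

Code example:

import re

def get_filename_and_extension(url):
    pattern = r"/([^/]+)(\.[^/.]+)$"
    match = re.search(pattern, url)
    if match:
        filename_without_extension = match.group(1)
        file_extension = match.group(2)
        filename = filename_without_extension + file_extension
        return filename, filename_without_extension, file_extension
    else:
        # The last path segment has no extension
        return None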

# Test it out
url = 'https://api.slingacademy.com/public/sample-photos/1.jpeg'

filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")

Output:

Filename: 1.jpeg
Filename without extension: 1
File extension: .jpeg
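Since the pattern above runs on the full URL, a query string or fragment can end up glued to the extension. A minimal variant, assuming you are happy to combine the regex with urlparse(), matches against the path component only and returns an empty extension instead of None when there is none. The helper name get_parts_with_regex and the ?w=300 query are hypothetical.

```python
import re
from urllib.parse import urlparse

# Name, then a final dot followed by characters that are neither "/" nor ".",
# so the match stays inside the last path segment
PATTERN = re.compile(r"/([^/]+)(\.[^/.]+)$")

def get_parts_with_regex(url):
    # Hypothetical helper: search the path only, so "?query" and "#fragment"
    # never end up in the extension
    path = urlparse(url).path
    match = PATTERN.search(path)
    if match is None:
        # No extension: use the last segment as the name (may be empty)
        name = path.rsplit("/", 1)[-1]
        return name, name, ""
    name, ext = match.group(1), match.group(2)
    return name + ext, name, ext

print(get_parts_with_regex("https://api.slingacademy.com/public/sample-photos/1.jpeg?w=300"))
# ('1.jpeg', '1', '.jpeg')
```

Returning a tuple in every case also means callers can always unpack three values without first checking for None.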

Using the urllib and pathlib modules

The pathlib module was added in Python 3.4 (released in 2014), so it is available in every modern Python 3 version. We can use its Path class to extract the filename and file extension from a given URL. Below are the steps:

Import the urlparse() function from urllib.parse and import the Path class from the pathlib module.
Parse the URL with urlparse() and create a Path object from its path component.
Get the filename using the name attribute.
Use the stem property to extract the filename without the extension.
Use the suffix property to get the file extension.

Code example:

from urllib.parse import urlparse
from pathlib import Path

def get_filename_and_extension(url):
    parsed_url = urlparse(url)
    path = parsed_url.path
    filename = Path(path).name
    filename_without_extension = Path(filename).stem
    file_extension = Path(filename).suffix
    return filename, filename_without_extension, file_extension


# Test it out
url = 'https://api.slingacademy.com/v1/examples/sample-page.html'

filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")

Output:

Filename: sample-page.html
Filename without extension: sample-page
File extension: .html
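Two optional refinements, not required by the code above: pathlib.PurePosixPath always treats "/" as the separator and never touches the file system, which fits URL paths on any operating system, and its suffixes property lists every extension of a multi-part name such as .tar.gz. The backup URL below is made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def get_name_parts(url):
    # PurePosixPath: "/"-separated, no file-system access, same on every OS
    path = PurePosixPath(urlparse(url).path)
    # name, stem, and suffix work as with Path; suffixes lists all extensions
    return path.name, path.stem, path.suffix, path.suffixes

# Hypothetical URL with a two-part extension
print(get_name_parts("https://example.com/backups/site-2023.tar.gz"))
# ('site-2023.tar.gz', 'site-2023.tar', '.gz', ['.tar', '.gz'])
```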

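This method is very similar to the first one: both parse the URL with urlparse() before touching the path, and both need about the same amount of code. For extracting the parts of a single URL, any performance difference between them is negligible.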
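To see how closely the os.path and pathlib approaches line up, the short check below runs os.path.splitext() and PurePosixPath on the same file names: a multi-dot name, a hidden-file-style name, a name without an extension, and an ordinary one. The sample names and the compare() helper are made up for illustration.

```python
import os
from pathlib import PurePosixPath

def compare(name):
    # os.path.splitext() returns (root, ext); pathlib exposes stem and suffix
    root, ext = os.path.splitext(name)
    p = PurePosixPath(name)
    return (root, ext), (p.stem, p.suffix)

# Sample names covering multi-dot, leading-dot, and extensionless cases
for name in ["archive.tar.gz", ".htaccess", "README", "photo.jpeg"]:
    via_os, via_pathlib = compare(name)
    print(f"{name}: os.path={via_os} pathlib={via_pathlib} same={via_os == via_pathlib}")
```

For all four names the two approaches agree: only the last dot-separated part counts as the extension, and a name that merely starts with a dot has no extension.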

Final Words

You’ve learned three ways to retrieve the filename and file extension from a network URL using only Python's standard library. Which solution do you like the most? Please let me know by leaving a comment. Happy coding, and enjoy your day. Good luck!
