Sling Academy

Python: 3 Ways to Get File Name and Extension from URL

Last updated: June 02, 2023

Suppose you have a file URL like this:

https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv

Or like this:

https://api.slingacademy.com/public/sample-photos/1.jpeg

How can you programmatically extract the file name (with or without extension) and the file extension from each URL? This concise, code-centric article will show you three different ways to get the job done. We’ll only use built-in features of Python and won’t use any third-party packages.

Using the urllib and the os modules

This approach uses urlparse() to isolate the path component before anything else, so query strings and fragments (the parts after ? and #) never end up in the file name or extension. The steps you can follow are:

  1. Use the urlparse() function from the urllib.parse module to parse the given URL and extract the path component.
  2. Use the os.path module’s basename() function to get the filename (including extension) from the path.
  3. Use the os.path module’s splitext() function to separate the name part and the extension part.
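To see what step 1 actually isolates, here is urlparse() applied to a variant of the first sample URL. The ?download=1 query and #top fragment are hypothetical additions for illustration. Only the path attribute is handed on to basename() in this method.

```python
from urllib.parse import urlparse

# Hypothetical variant of the sample URL with a query string and a fragment added
url = "https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv?download=1#top"

parts = urlparse(url)
print(parts.scheme)    # https
print(parts.netloc)    # api.slingacademy.com
print(parts.path)      # /v1/sample-data/files/marketing-campaigns.csv
print(parts.query)     # download=1
print(parts.fragment)  # top
```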

Code example:

from urllib.parse import urlparse
import os

def get_filename_and_extension(url):
    parsed_url = urlparse(url)
    path = parsed_url.path
    filename = os.path.basename(path)
    filename_without_extension, file_extension = os.path.splitext(filename)
    return filename, filename_without_extension, file_extension


# Test it out
url = 'https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv'

filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")

Output:

Filename: marketing-campaigns.csv
Filename without extension: marketing-campaigns
File extension: .csv
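A few edge cases are worth checking before relying on this function. The sketch below runs a variant of it over some hypothetical URLs. It also passes the result of basename() through urllib.parse.unquote() to decode percent-escapes such as %20; that decoding step is an addition made in this sketch, not part of the method above, and the helper name get_decoded_filename_and_extension is hypothetical.

```python
import os
from urllib.parse import urlparse, unquote

def get_decoded_filename_and_extension(url):
    """Hypothetical variant of get_filename_and_extension() that also decodes %-escapes."""
    path = urlparse(url).path                   # drops scheme, host, query and fragment
    filename = unquote(os.path.basename(path))  # decode after splitting, so %2F is not a separator
    filename_without_extension, file_extension = os.path.splitext(filename)
    return filename, filename_without_extension, file_extension

# Hypothetical URLs chosen to exercise edge cases
examples = [
    "https://example.com/files/report.csv?version=2#summary",  # query and fragment
    "https://example.com/files/my%20report.pdf",               # percent-encoded space
    "https://example.com/files/README",                        # no extension
    "https://example.com/files/",                              # trailing slash
    "https://example.com/files/archive.tar.gz",                # multi-part extension
]
results = [get_decoded_filename_and_extension(u) for u in examples]
for r in results:
    print(r)
```

The last two cases show the limits of this approach: a URL ending in a slash gives an empty filename, and splitext() only separates the final suffix (.gz rather than .tar.gz).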

Using regular expressions

Here is the regular expression pattern we’ll use to match the filename (without extension) and the extension from a given URL:

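r"/([^/]+)(\.[^./]+)$"

Group 1 captures everything after the last slash up to the final dot, and group 2 captures that final dot plus the extension. Excluding / as well as . from the extension part stops a dot in a directory name (for example /v1.2/files/data) from being mistaken for an extension. With this pattern defined, the re.search() function takes care of the rest. Note that the pattern runs against the whole URL, so a query string or fragment would end up in the extension, and the function below returns None when nothing matches.

Code example:

import re

def get_filename_and_extension(url):
    pattern = r"/([^/]+)(\.[^./]+)$"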
    match = re.search(pattern, url)
    if match:
        filename_without_extension = match.group(1)
        file_extension = match.group(2)
        filename = filename_without_extension + file_extension
        return filename, filename_without_extension, file_extension
    else:
        return None

# Test it out
url = 'https://api.slingacademy.com/public/sample-photos/1.jpeg'

filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")

Output:

Filename: 1.jpeg
Filename without extension: 1
File extension: .jpeg
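Because the function above searches the whole URL, a query string such as ?size=large (a hypothetical example) would be swallowed into the extension, and unpacking a None result would raise a TypeError. One way around both, sketched here as an assumption rather than the article's method, is to run a pattern of the same shape against urlparse(url).path and check for None before unpacking. The pattern is written with re.VERBOSE so each part can carry a comment, and it uses [^./]+ for the extension so the match can't run across a slash. The function name get_filename_and_extension_from_path is hypothetical.

```python
import re
from urllib.parse import urlparse

# A pattern of the same shape in verbose mode; whitespace and #-comments are ignored
PATTERN = re.compile(
    r"""
    /            # the last slash ([^/] below can't match another one)
    ([^/]+)      # group 1: name without extension; greedy, so inner dots stay here
    (\.[^./]+)   # group 2: the final dot and the extension after it
    $            # anchored to the end of the path
    """,
    re.VERBOSE,
)

def get_filename_and_extension_from_path(url):
    """Hypothetical variant: apply the regex to the URL's path component only."""
    match = PATTERN.search(urlparse(url).path)  # query and fragment are excluded
    if match is None:
        return None  # no extension found, or the path ends with a slash
    name, ext = match.groups()
    return name + ext, name, ext


url = "https://api.slingacademy.com/public/sample-photos/1.jpeg?size=large"
result = get_filename_and_extension_from_path(url)
if result is not None:
    filename, filename_without_extension, file_extension = result
    print(filename, filename_without_extension, file_extension)  # 1.jpeg 1 .jpeg
```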

Using the urllib and pathlib modules

The pathlib module was added in Python 3.4 (released in 2014), so it is available in every modern Python 3 version. Its Path class can also extract the filename and file extension from a given URL. Below are the steps:

  1. Import the urlparse() function from urllib.parse and import the Path class from the pathlib module.
  2. Parse the URL with urlparse() and create a Path object from its path component.
  3. Get the filename using the name attribute.
  4. Use the stem property to extract the filename without the extension.
  5. Use the suffix property to get the file extension.

Code example:

from urllib.parse import urlparse
from pathlib import Path

def get_filename_and_extension(url):
    parsed_url = urlparse(url)
    path = parsed_url.path
    filename = Path(path).name
    filename_without_extension = Path(filename).stem
    file_extension = Path(filename).suffix
    return filename, filename_without_extension, file_extension


# Test it out
url = 'https://api.slingacademy.com/v1/examples/sample-page.html'

filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")

Output:

Filename: sample-page.html
Filename without extension: sample-page
File extension: .html

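This method is structurally similar to the first one: urlparse() still isolates the path, and the name, stem, and suffix properties take the place of os.path.basename() and os.path.splitext(). The two are roughly equivalent in the amount of code and in performance.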
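One practical difference is that pathlib also exposes suffixes, which lists every extension in a multi-part name, whereas os.path.splitext() only splits off the last one. The sketch below swaps Path for PurePosixPath: a pure POSIX path always treats / as the separator (matching URL syntax on any operating system) and never touches the file system. The archive URL is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

url = "https://example.com/downloads/archive.tar.gz"  # hypothetical URL

# Build a pure POSIX path from the URL's path component only
p = PurePosixPath(urlparse(url).path)
print(p.name)      # archive.tar.gz
print(p.stem)      # archive.tar
print(p.suffix)    # .gz
print(p.suffixes)  # ['.tar', '.gz']
```

If you need the full compound extension, "".join(p.suffixes) gives .tar.gz, though a name such as report.v2.csv would then yield .v2.csv, so decide which behavior fits your data.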

Final Words

You’ve learned three ways to retrieve the filename and file extension from a URL using only Python’s standard library. Which solution do you like the most? Please let me know by leaving a comment. Happy coding, and enjoy your day. Good luck!
