Python: 3 Ways to Get File Name and Extension from URL

Last updated: June 02, 2023

Suppose you have a file URL like this:

https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv

Or like this:

https://api.slingacademy.com/public/sample-photos/1.jpeg

How can you programmatically extract the file name (with or without extension) and the file extension from each URL? This concise, code-centric article will show you three different ways to get the job done. We’ll only use built-in features of Python and won’t use any third-party packages.

Using the urllib and the os modules

This approach relies only on the standard library. Because urlparse() separates the path from the query string and fragment, it copes with a wide range of URL formats and file extensions. The steps are:

  1. Use the urlparse() function from the urllib.parse module to parse the given URL and extract the path component.
  2. Use the os.path module’s basename() function to get the filename (including extension) from the path.
  3. Use the os.path module’s splitext() function to separate the name part and the extension part.

Code example:

from urllib.parse import urlparse
import os

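def get_filename_and_extension(url):
    # Keep only the path component (drops the query string and fragment)
    parsed_url = urlparse(url)
    path = parsed_url.path
    # The last path segment is the filename
    filename = os.path.basename(path)
    # Split "name.ext" into ("name", ".ext")
    filename_without_extension, file_extension = os.path.splitext(filename)
    return filename, filename_without_extension, file_extension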


# Test it out
url = 'https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv'

filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")

Output:

Filename: marketing-campaigns.csv
Filename without extension: marketing-campaigns
File extension: .csv
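Real-world URLs often carry a query string, a fragment, or percent-encoded characters. The sketch below is one possible variation of the function above that also decodes the filename with unquote() from urllib.parse. The function name get_decoded_filename_and_extension and the example.com URL are made up for illustration.

```python
import os
from urllib.parse import urlparse, unquote

def get_decoded_filename_and_extension(url):
    # urlparse() keeps the query string and fragment out of the path
    path = urlparse(url).path
    # Take the last path segment first, then decode escapes such as %20
    filename = unquote(os.path.basename(path))
    stem, extension = os.path.splitext(filename)
    return filename, stem, extension

# Hypothetical URL, used only for illustration
url = 'https://example.com/files/annual%20report.pdf?version=2#page=3'
print(get_decoded_filename_and_extension(url))
# ('annual report.pdf', 'annual report', '.pdf')
```

Decoding after basename() rather than before means an encoded slash (%2F) inside the filename is not mistaken for a path separator.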

Using regular expressions

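Here is the regular expression pattern we’ll use to match the filename (without extension) and the extension from a given URL:

r"/([^/]+)(\.[^./]+)$"

The first group captures everything after the last slash up to the final dot, and the second group captures the final dot and the extension. The extension group excludes both dots and slashes, so a URL whose last segment has no extension (such as https://example.com/docs) produces no match instead of matching part of the domain name. Because the pattern is anchored to the end of the string with $, it expects the URL to end with the filename; a query string or fragment after the filename can end up in the extension. With this pattern defined, the re.search() function can take care of the remaining work.

Code example:

import re

def get_filename_and_extension(url):
    # Group 1: name without extension; group 2: extension including the dot
    pattern = r"/([^/]+)(\.[^./]+)$"
    match = re.search(pattern, url)
    if match:
        filename_without_extension = match.group(1)
        file_extension = match.group(2)
        filename = filename_without_extension + file_extension
        return filename, filename_without_extension, file_extension
    else:
        # No filename with an extension was found
        return None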

# Test it out
url = 'https://api.slingacademy.com/public/sample-photos/1.jpeg'

filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")

Output:

Filename: 1.jpeg
Filename without extension: 1
File extension: .jpeg
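The function above returns None when nothing matches, so unpacking its result directly raises a TypeError for such URLs. Below is a minimal sketch of one way to guard against that, under two assumptions of mine: the pattern is applied to the path returned by urlparse() so that query strings are ignored, and the extension group excludes "/" as well as ".". The name get_filename_parts_or_none and the example.com URLs are hypothetical.

```python
import re
from urllib.parse import urlparse

# The extension group excludes both "." and "/"
FILENAME_PATTERN = re.compile(r"/([^/]+)(\.[^./]+)$")

def get_filename_parts_or_none(url):
    # Match against the path only, so "?query" and "#fragment" are ignored
    match = FILENAME_PATTERN.search(urlparse(url).path)
    if match is None:
        return None
    stem, extension = match.groups()
    return stem + extension, stem, extension

urls = [
    'https://api.slingacademy.com/public/sample-photos/1.jpeg',
    'https://example.com/download/report.csv?token=abc',  # hypothetical
    'https://example.com/about/',                         # hypothetical, no filename
]

for url in urls:
    result = get_filename_parts_or_none(url)
    if result is None:
        # Check for None before unpacking
        print(f"No filename found in {url}")
    else:
        filename, stem, extension = result
        print(f"{filename} -> {stem} + {extension}")
```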

Using the urllib and pathlib modules

The pathlib module was added in Python 3.4 (released in 2014), so it is available in every currently supported Python version. We can use its Path class to extract the filename and file extension from a given URL. Below are the steps:

  1. Import the urlparse() function from urllib.parse and the Path class from the pathlib module.
  2. Parse the URL with urlparse() and create a Path object from its path component.
  3. Get the filename using the name property.
  4. Use the stem property to extract the filename without the extension.
  5. Use the suffix property to get the file extension.

Code example:

from urllib.parse import urlparse
from pathlib import Path

def get_filename_and_extension(url):
    parsed_url = urlparse(url)
    # Build the Path from the URL's path component only
    path = Path(parsed_url.path)
    filename = path.name                      # e.g. "sample-page.html"
    filename_without_extension = path.stem    # e.g. "sample-page"
    file_extension = path.suffix              # e.g. ".html"
    return filename, filename_without_extension, file_extension


# Test it out
url = 'https://api.slingacademy.com/v1/examples/sample-page.html'

filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")

Output:

Filename: sample-page.html
Filename without extension: sample-page
File extension: .html

This method is similar to the first one: both parse the URL with urlparse() and then split the last segment of the path. In terms of the amount of code to write and the efficiency, they are about the same.
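One thing pathlib adds is the suffixes property, which lists every dot-separated suffix of a filename. The sketch below uses it to handle compound extensions such as .tar.gz. It also uses PurePosixPath instead of Path, a choice of mine so that "/" is treated as the separator on every operating system. The function name and the example.com URL are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def get_filename_and_all_extensions(url):
    # PurePosixPath always treats "/" as the separator, like URL paths do
    path = PurePosixPath(urlparse(url).path)
    filename = path.name
    extensions = path.suffixes        # e.g. ['.tar', '.gz']
    combined = "".join(extensions)    # e.g. '.tar.gz'
    # Strip the combined extension only when there is one
    base = filename[:-len(combined)] if combined else filename
    return filename, base, extensions

# Hypothetical URL, used only for illustration
print(get_filename_and_all_extensions('https://example.com/downloads/backup.tar.gz'))
# ('backup.tar.gz', 'backup', ['.tar', '.gz'])
```

Treat suffixes as a heuristic: every dot in the filename starts a new suffix, so a name like my.report.pdf yields ['.report', '.pdf'] even though only .pdf is a real extension.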

Final Words

You’ve learned several ways to retrieve the filename and file extension from a URL using only modules from the Python standard library. Which solution do you like the most? Happy coding!
