Suppose you have a file URL like this:
https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv

Or like this:

https://api.slingacademy.com/public/sample-photos/1.jpeg

How can you programmatically extract the file name (with or without extension) and the file extension from each URL? This concise, code-centric article will show you three different ways to get the job done. We'll only use built-in features of Python and won't use any third-party packages.
Using the urllib and the os modules
This approach uses only the standard library. Because `urlparse()` separates the path from the query string and fragment, it also handles URLs such as `.../marketing-campaigns.csv?version=2` correctly. The steps are:
- Use the `urlparse()` function from the `urllib.parse` module to parse the given URL and extract the path component.
- Use the `os.path` module's `basename()` function to get the filename (including extension) from the path.
- Use the `os.path` module's `splitext()` function to separate the name part and the extension part.
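To see why the first step matters, here is a quick sketch of what `urlparse()` returns for a variant of the first sample URL. The query string and fragment (`?version=2#top`) are made up for illustration; the point is that they land in separate fields, so the path, and therefore the filename, stays clean.

```python
import os
from urllib.parse import urlparse

# Hypothetical variant of the first sample URL, with a made-up query string and fragment
url = "https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv?version=2#top"
parts = urlparse(url)

print(parts.netloc)    # api.slingacademy.com
print(parts.path)      # /v1/sample-data/files/marketing-campaigns.csv
print(parts.query)     # version=2
print(parts.fragment)  # top

# The basename()/splitext() steps only ever see the clean path
print(os.path.splitext(os.path.basename(parts.path)))  # ('marketing-campaigns', '.csv')
```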
Code example:
```python
from urllib.parse import urlparse
import os


def get_filename_and_extension(url):
    # Keep only the path component (drops the scheme, host, query string, and fragment)
    parsed_url = urlparse(url)
    path = parsed_url.path
    # Last path segment, e.g. "marketing-campaigns.csv"
    filename = os.path.basename(path)
    # Split off the extension, e.g. ("marketing-campaigns", ".csv")
    filename_without_extension, file_extension = os.path.splitext(filename)
    return filename, filename_without_extension, file_extension


# Test it out
url = 'https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv'
filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")
```
Output:

```
Filename: marketing-campaigns.csv
Filename without extension: marketing-campaigns
File extension: .csv
```

Using regular expressions
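Here is the regular expression pattern we'll use to match the filename (without extension) and the extension from a given URL:

```
r"/([^/]+)(\.[^.]+)$"
```

For a typical file URL, group 1 captures the text between the last slash and the final dot, and group 2 captures the final dot and everything after it. With this pattern defined, the `re.search()` function takes care of the remaining work.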
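One caveat: the pattern is matched against the whole URL string, so anything after the file name, such as a query string, ends up in the extension group. The sketch below uses a made-up `?size=large` query to show this, then shows one possible workaround (a suggestion, not part of the approach described here): run the same pattern on the path returned by `urlparse()`.

```python
import re
from urllib.parse import urlparse

PATTERN = r"/([^/]+)(\.[^.]+)$"  # same pattern as above

# Hypothetical variant of the second sample URL, with a made-up query string
url = "https://api.slingacademy.com/public/sample-photos/1.jpeg?size=large"

# Against the full URL string, the query string leaks into the extension group
print(re.search(PATTERN, url).groups())   # ('1', '.jpeg?size=large')

# Against the path component only, the query string is already gone
path = urlparse(url).path
print(re.search(PATTERN, path).groups())  # ('1', '.jpeg')
```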
Code example:
```python
import re


def get_filename_and_extension(url):
    # Group 1: the name part; group 2: the final dot plus the extension
    pattern = r"/([^/]+)(\.[^.]+)$"
    match = re.search(pattern, url)
    if match:
        filename_without_extension = match.group(1)
        file_extension = match.group(2)
        filename = filename_without_extension + file_extension
        return filename, filename_without_extension, file_extension
    else:
        # No match (e.g. the URL has no extension); check for None before unpacking
        return None


# Test it out
url = 'https://api.slingacademy.com/public/sample-photos/1.jpeg'
filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")
```

Output:
```
Filename: 1.jpeg
Filename without extension: 1
File extension: .jpeg
```

Using the urllib and pathlib modules
The `pathlib` module has been part of the standard library since Python 3.4 (released in 2014), so it is available in any modern Python. Its `Path` class can also extract the filename and file extension from a URL. Below are the steps:
- Import the `urlparse()` function from `urllib.parse` and the `Path` class from the `pathlib` module.
- Parse the URL with `urlparse()` and create a `Path` object from its path component.
- Get the filename from the `name` attribute.
- Get the filename without the extension from the `stem` attribute.
- Get the file extension from the `suffix` attribute.
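A detail worth knowing about `stem` and `suffix`: `suffix` returns only the last extension, and `stem` keeps everything before it. The sketch below uses a hypothetical archive URL and swaps `Path` for `PurePosixPath`, which always treats `/` as the separator and never touches the file system. That arguably suits URL paths better, but the swap is an assumption, not part of the steps above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical URL of a compressed archive, for illustration only
url = "https://example.com/downloads/backup.tar.gz"
p = PurePosixPath(urlparse(url).path)

print(p.name)      # backup.tar.gz
print(p.stem)      # backup.tar
print(p.suffix)    # .gz
print(p.suffixes)  # ['.tar', '.gz']
```

If you need the full `.tar.gz`, joining `p.suffixes` is one option.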
Code example:
```python
from urllib.parse import urlparse
from pathlib import Path


def get_filename_and_extension(url):
    # Build a Path from the URL's path component only (no host, query string, or fragment)
    path = Path(urlparse(url).path)
    filename = path.name                    # e.g. "sample-page.html"
    filename_without_extension = path.stem  # e.g. "sample-page"
    file_extension = path.suffix            # e.g. ".html"
    return filename, filename_without_extension, file_extension


# Test it out
url = 'https://api.slingacademy.com/v1/examples/sample-page.html'
filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")
```

Output:
```
Filename: sample-page.html
Filename without extension: sample-page
File extension: .html
```

This method closely resembles the first one: both parse the URL first and then let a path-handling module split the last path segment. They need about the same amount of code and are about equally efficient.
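To check how closely the first and third methods agree, here is a small side-by-side sketch. It uses `PurePosixPath` in place of `Path` (an assumption, chosen so that `/` is the separator on every OS), and the last URL, ending in a slash, is a hypothetical edge case. On the article's sample URLs both functions return the same tuples; on the trailing-slash URL, `os.path.basename()` returns an empty string, while `pathlib` ignores the trailing slash and reports the last directory name.

```python
import os
from pathlib import PurePosixPath
from urllib.parse import urlparse


def with_os(url):
    # First method: os.path on the URL's path component
    name = os.path.basename(urlparse(url).path)
    stem, ext = os.path.splitext(name)
    return name, stem, ext


def with_pathlib(url):
    # Third method: pathlib on the URL's path component
    p = PurePosixPath(urlparse(url).path)
    return p.name, p.stem, p.suffix


urls = [
    "https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv",
    "https://api.slingacademy.com/public/sample-photos/1.jpeg",
    "https://api.slingacademy.com/v1/examples/sample-page.html",
    # Hypothetical URL ending in a slash, where the two methods disagree
    "https://example.com/downloads/",
]

for url in urls:
    print(with_os(url), with_pathlib(url))
```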
Final Words
You’ve learned several ways to retrieve the filename and file extension from a network URL by using some Python standard modules. Which solution do you like the most? Please let me know by leaving a comment. Happy coding, and enjoy your day. Good luck!