Suppose you have a file URL like this:
https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv
Or like this:
https://api.slingacademy.com/public/sample-photos/1.jpeg
How can you programmatically extract the file name (with or without extension) and the file extension from each URL? This concise, code-centric article will show you three different ways to get the job done. We’ll only use built-in features of Python and won’t use any third-party packages.
Using the urllib and the os modules
This approach relies only on the standard library. Because the URL is parsed first, the query string and fragment are kept out of the result, and any file extension is handled. The steps are:
- Use the urlparse() function from the urllib.parse module to parse the given URL and extract the path component.
- Use the os.path module’s basename() function to get the filename (including extension) from the path.
- Use the os.path module’s splitext() function to separate the name part and the extension part.
Code example:
from urllib.parse import urlparse
import os

def get_filename_and_extension(url):
    parsed_url = urlparse(url)
    path = parsed_url.path
    filename = os.path.basename(path)
    filename_without_extension, file_extension = os.path.splitext(filename)
    return filename, filename_without_extension, file_extension

# Test it out
url = 'https://api.slingacademy.com/v1/sample-data/files/marketing-campaigns.csv'
filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")
Output:
Filename: marketing-campaigns.csv
Filename without extension: marketing-campaigns
File extension: .csv
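To illustrate how this approach copes with less tidy URLs, here is a minimal sketch. The helper name split_url_filename and the example.com URLs are made up for illustration, and decoding percent-escapes with unquote() is an extra step that the code above does not perform.

```python
import os
from urllib.parse import urlparse, unquote

def split_url_filename(url):
    # Hypothetical helper: same idea as above, plus percent-decoding.
    # Take the last path segment first, then decode it, so an encoded
    # slash (%2F) inside a name cannot change which segment is picked.
    filename = unquote(os.path.basename(urlparse(url).path))
    stem, ext = os.path.splitext(filename)
    return filename, stem, ext

# The query string and fragment are not part of the path, so they are ignored
print(split_url_filename("https://example.com/files/annual%20report.pdf?download=1#page=2"))
# ('annual report.pdf', 'annual report', '.pdf')

# splitext() only splits off the last extension
print(split_url_filename("https://example.com/files/archive.tar.gz"))
# ('archive.tar.gz', 'archive.tar', '.gz')

# A URL that ends with a slash has no filename at all
print(split_url_filename("https://example.com/files/"))
# ('', '', '')
```

Note that a multi-part extension such as .tar.gz is not treated as one unit, and that a URL ending in a slash yields empty strings rather than an error, so callers may want to check for an empty filename.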
Using regular expressions
Here is the regular expression pattern we’ll use to match the filename (without extension) and the extension from a given URL:
r"/([^/]+)(\.[^.]+)$"
With this pattern defined, the re.search() function can take care of the remaining work.
Code example:
import re

def get_filename_and_extension(url):
    pattern = r"/([^/]+)(\.[^.]+)$"
    match = re.search(pattern, url)
    if match:
        filename_without_extension = match.group(1)
        file_extension = match.group(2)
        filename = filename_without_extension + file_extension
        return filename, filename_without_extension, file_extension
    else:
        return None

# Test it out
url = 'https://api.slingacademy.com/public/sample-photos/1.jpeg'
filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")
Output:
Filename: 1.jpeg
Filename without extension: 1
File extension: .jpeg
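The pattern above is applied to the whole URL, so a query string would end up inside the extension, and the function returns None when nothing matches, which makes the tuple unpacking in the test fail. The sketch below is one possible way to guard against both. It is a variant rather than the original method: the helper name split_with_regex, the example.com URLs, and the slightly stricter pattern (the extension may not contain a slash) are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Stricter variant of the pattern: the extension group excludes "/" as well,
# so a dot in an earlier path segment (e.g. /v1.0/) cannot start the match.
PATTERN = re.compile(r"/([^/]+)(\.[^./]+)$")

def split_with_regex(url):
    # Hypothetical helper: match against the path only, not the full URL,
    # so the query string and fragment never reach the regex.
    path = urlparse(url).path
    match = PATTERN.search(path)
    if match is None:
        return None  # last segment has no "name.ext" shape
    stem, ext = match.groups()
    return stem + ext, stem, ext

result = split_with_regex("https://example.com/photos/1.jpeg?size=large")
if result is not None:  # check before unpacking
    filename, stem, ext = result
    print(filename, stem, ext)  # 1.jpeg 1 .jpeg

# No extension in the last segment: the helper reports "no match"
print(split_with_regex("https://example.com/v1.0/files/readme"))  # None
```

Checking for None before unpacking keeps the caller from crashing on URLs whose last segment has no extension.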
Using the urllib and pathlib modules
The pathlib module was added in Python 3.4, released in 2014, so it is available in every currently supported Python version. We can use its Path class to extract the filename and file extension from a given URL. Below are the steps:
- Import the urlparse() function from urllib.parse and the Path class from the pathlib module.
- Parse the URL with urlparse() and create a Path object from its path component.
- Get the filename using the name property.
- Use the stem property to extract the filename without the extension.
- Use the suffix property to get the file extension.
Code example:
from urllib.parse import urlparse
from pathlib import Path

def get_filename_and_extension(url):
    parsed_url = urlparse(url)
    path = parsed_url.path
    filename = Path(path).name
    filename_without_extension = Path(filename).stem
    file_extension = Path(filename).suffix
    return filename, filename_without_extension, file_extension

# Test it out
url = 'https://api.slingacademy.com/v1/examples/sample-page.html'
filename, filename_without_extension, file_extension = get_filename_and_extension(url)
print(f"Filename: {filename}")
print(f"Filename without extension: {filename_without_extension}")
print(f"File extension: {file_extension}")
Output:
Filename: sample-page.html
Filename without extension: sample-page
File extension: .html
This method has some similarities with the first method. In terms of the amount of code to write and the efficiency, they are about the same.
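One place where pathlib offers a little extra is multi-part extensions. The sketch below is an optional variation, not part of the method above: it swaps Path for PurePosixPath (URL paths always use forward slashes, and no file system access is needed) and adds the suffixes property. The helper name describe_url_file and the example.com URLs are made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def describe_url_file(url):
    # Hypothetical helper: PurePosixPath parses the path with "/" as the
    # separator on every operating system and never touches the disk.
    path = PurePosixPath(urlparse(url).path)
    return path.name, path.stem, path.suffix, path.suffixes

# suffixes lists every extension, while suffix holds only the last one
print(describe_url_file("https://example.com/downloads/archive.tar.gz"))
# ('archive.tar.gz', 'archive.tar', '.gz', ['.tar', '.gz'])

# Unlike os.path.basename(), pathlib ignores a trailing slash,
# so the last directory name is reported as the "filename"
print(describe_url_file("https://example.com/downloads/"))
# ('downloads', 'downloads', '', [])
```

The second call shows a behavioral difference worth keeping in mind: with os.path.basename() a URL ending in a slash gives an empty filename, whereas pathlib returns the last directory name.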
Final Words
You’ve learned several ways to retrieve the filename and file extension from a network URL by using some Python standard modules. Which solution do you like the most? Please let me know by leaving a comment. Happy coding, and enjoy your day. Good luck!