This concise, example-based article walks you through three ways to check whether a string is a valid URL in Python. The first two approaches use only the standard library (the `re` and `urllib` modules), while the last one relies on a third-party package.
Using regular expressions
You can declare a regular expression pattern to match valid URLs as follows:
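```python
pattern = r'^https?://[\w-]+(\.[\w-]+)+[/\w.-]*$'
```

This pattern accepts `http` or `https`, followed by a host made of at least two dot-separated labels (such as www.slingacademy.com) and an optional path built from letters, digits, underscores, dots, hyphens, and slashes. Each part sits under a single quantifier on purpose: nested quantifiers such as `(x*)*` can make the regex engine backtrack exponentially on long inputs that almost match.

The `re.match()` or `re.search()` function can then be used to check whether the input string matches the pattern. Because the pattern starts with `^`, both give the same result here.

Code example:

```python
import re

def is_valid_url(url):
    pattern = r'^https?://[\w-]+(\.[\w-]+)+[/\w.-]*$'
    # re.match() returns a Match object on success and None otherwise
    return bool(re.match(pattern, url))

# Usage
url1 = 'https://www.slingacademy.com/cat/sample-data'
url2 = "https://api.slingacademy.com/v1/sample-data/"
url3 = "abcxyz"
print(is_valid_url(url1))  # True
print(is_valid_url(url2))  # True
print(is_valid_url(url3))  # False
```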
This approach is flexible: you can adjust the pattern to fit your specific requirements. However, regular expressions get complex quickly, and this pattern does not cover every valid URL. For example, it rejects http://localhost:3000 (a host without a dot, plus a port) and https://example.com/search?q=1 (a query string).
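If you need to accept more URL shapes, you can extend the pattern. The following is a hypothetical sketch, not a complete URL grammar: `URL_PATTERN` and `is_valid_url_extended` are illustrative names, and the pattern additionally allows `localhost`, an optional port, a query string, and a fragment. It also uses `re.fullmatch()` instead of `re.match()`, because `$` matches not only at the end of the string but also just before a final newline.

```python
import re

# Hypothetical extended pattern (an assumption, not a standard URL grammar):
# adds "localhost", an optional port, a query string, and a fragment.
URL_PATTERN = re.compile(
    r'https?://'
    r'(?:localhost|[\w-]+(?:\.[\w-]+)+)'  # host: localhost or dotted labels
    r'(?::\d{1,5})?'                      # optional port (1 to 5 digits)
    r'(?:/[\w./-]*)?'                     # optional path
    r'(?:\?[\w=&%.-]*)?'                  # optional query string
    r'(?:#[\w-]*)?'                       # optional fragment
)


def is_valid_url_extended(url):
    # fullmatch() must consume the whole string, so no ^ or $ is needed
    return URL_PATTERN.fullmatch(url) is not None


print(is_valid_url_extended('http://localhost:3000'))                # True
print(is_valid_url_extended('https://example.com/search?q=python'))  # True
print(is_valid_url_extended('https://example.com:8080/docs#intro'))  # True
print(is_valid_url_extended('https://example.com\n'))                # False

# For comparison: with re.match() and $, the trailing newline slips through
print(bool(re.match(r'^https?://\S+$', 'https://example.com\n')))    # True
```

Every extension like this is a trade-off. The sketch still rejects plenty of legal URLs, such as paths containing percent-encoded characters (%20), so test it against the inputs you actually expect.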
Using the urllib module
The `urllib.parse` module, part of the Python standard library, splits a URL string into its components (scheme, network location, path, query, and so on). You can use it for basic URL validation as follows:

- Import the `urlparse()` function from `urllib.parse`.
- Use `urlparse()` to parse the input URL string.
- Check that the `scheme` and `netloc` attributes of the parsed result are both non-empty.
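To see what the last step actually inspects, the snippet below prints the components `urlparse()` produces for two inputs. The sample URLs are illustrative; `hostname` and `port` are additional `ParseResult` attributes that the steps above do not use.

```python
from urllib.parse import urlparse

# A complete URL is split into named components.
parts = urlparse('http://localhost:3000/api?page=2')
print(parts.scheme)    # http
print(parts.netloc)    # localhost:3000
print(parts.hostname)  # localhost
print(parts.port)      # 3000
print(parts.path)      # /api
print(parts.query)     # page=2

# Without "scheme://", scheme and netloc stay empty and the text lands in path.
bare = urlparse('www.slingacademy.com')
print(repr(bare.scheme), repr(bare.netloc), repr(bare.path))
# '' '' 'www.slingacademy.com'
```

This is why a string like www.slingacademy.com fails the check: without the scheme and the `//`, nothing ends up in `netloc`.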
Code example:
```python
from urllib.parse import urlparse

def is_valid_url(url):
    try:
        result = urlparse(url)
        # Both the scheme (e.g. "https") and the network location must be present
        return all([result.scheme, result.netloc])
    except ValueError:
        # urlparse() raises ValueError for some malformed input, e.g. "http://[::1"
        return False

# Usage
url1 = 'https://www.slingacademy.com/cat/sample-data'
url2 = "https://api.slingacademy.com/v1/sample-data/"
url3 = "http://localhost:3000"
print(is_valid_url(url1))  # True
print(is_valid_url(url2))  # True
print(is_valid_url(url3))  # True
```
This approach is simple, but it only checks that the scheme and netloc are non-empty, not that they make sense. For instance, it accepts https://slingacademy even though the top-level domain (.com) is missing, and it also accepts strings with an arbitrary scheme, such as abc://xyz.
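If you need something stricter while staying with the standard library, you can add your own rules on top of `urlparse()`. The sketch below is one hypothetical way to do it: `is_valid_web_url`, `ALLOWED_SCHEMES`, and the rules themselves (http/https only, a host with at least two dot-separated labels or `localhost`, a port that is an integer from 0 to 65535) are assumptions, not anything `urllib` enforces.

```python
from urllib.parse import urlparse

# Hypothetical stricter variant (not part of the standard library):
# accept only http/https, require a dotted host name or "localhost",
# and reject ports that are not integers in the range 0-65535.
ALLOWED_SCHEMES = {'http', 'https'}


def is_valid_web_url(url):
    try:
        result = urlparse(url)
        _ = result.port  # accessing .port raises ValueError for a malformed port
    except ValueError:
        return False
    if result.scheme not in ALLOWED_SCHEMES:
        return False
    host = result.hostname  # lower-cased host without port or credentials
    if not host:
        return False
    if host == 'localhost':
        return True
    labels = host.split('.')
    # at least two labels and no empty ones (rules out "a..b" and "a.")
    return len(labels) >= 2 and all(labels)


print(is_valid_web_url('https://www.slingacademy.com/cat/sample-data'))  # True
print(is_valid_web_url('http://localhost:3000'))                          # True
print(is_valid_web_url('https://slingacademy'))                           # False
print(is_valid_web_url('ftp://example.com'))                              # False
print(is_valid_web_url('http://example.com:abc'))                         # False
```

These rules are deliberately simple: they reject IPv6 literals such as http://[::1]/ and do not check whether the top-level domain actually exists.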
Using the validators package
`validators` is a popular open-source library designed specifically for data validation in Python. You can install it by running the following command:

```shell
pip install validators
```

Then use its `validators.url()` function to check whether a given string is a valid URL, like this:
```python
import validators

url1 = 'https://www.slingacademy.com/cat/sample-data'
url2 = "https://api.slingacademy.com/v1/sample-data/"
url3 = "http://localhost:3000"
url4 = "localhost:3000"

print(validators.url(url1))
print(validators.url(url2))
print(validators.url(url3))
print(validators.url(url4))
```
Output:

```text
True
True
True
ValidationFailure(func=url, args={'value': 'localhost:3000', 'public': False})
```
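As you can see, `validators.url()` returns True for valid URLs. For an invalid one it returns a failure object rather than False. That object is falsy, so `if validators.url(s):` behaves as expected, while a comparison such as `validators.url(s) is False` never succeeds. The class is called `ValidationFailure` in older releases and `ValidationError` in newer ones, so the printed output may look different depending on the version you install. Either way, the library reduces URL validation to a single function call. The tutorial ends here. Happy coding & enjoy your day!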