JavaScript Regular Expressions: Extract & Validate URLs

Updated: February 27, 2023 By: Khue

This concise article shows you how to use regular expressions to extract and validate URLs in JavaScript. No more delay; let’s get to the point.

Extracting URLs

To extract URLs from a string, use the match() method with this regular expression:

/(https?:\/\/[^\s]+)/g

Example:

const text =
  'Check out my website at https://www.slingacademy.com or http://api.slingacademy.com/v1/sample-data/photos';

const urlRegex = /(https?:\/\/[^\s]+)/g;

// Extract URLs
const urls = text.match(urlRegex);
console.log(urls); 

Output:

[
  'https://www.slingacademy.com',
  'http://api.slingacademy.com/v1/sample-data/photos'
]
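Two practical details are worth keeping in mind: match() returns null (not an empty array) when nothing is found, and [^\s]+ also swallows punctuation that happens to follow a URL in a sentence. Below is a minimal sketch of one way to handle both. The helper name extractUrls and the rule for stripping trailing punctuation are assumptions made for illustration, not part of the pattern above:

```javascript
// Hypothetical helper: returns every http/https URL found in `text`,
// or an empty array when there is none.
function extractUrls(text) {
  const urlRegex = /https?:\/\/[^\s]+/g;
  // match() returns null when nothing matches, so fall back to [].
  const matches = text.match(urlRegex) ?? [];
  // Assumption: trailing sentence punctuation is not part of the URL.
  return matches.map((url) => url.replace(/[.,;:!?)]+$/, ''));
}

console.log(extractUrls('Docs: https://www.slingacademy.com. Enjoy!'));
// [ 'https://www.slingacademy.com' ]
console.log(extractUrls('No links here'));
// []
```

Stripping trailing punctuation is a trade-off: a URL that really ends with one of those characters would be shortened, so adjust the character list to your data.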

Validating a URL

Why do we need a different pattern?

The pattern for extracting URLs and the pattern for validating URLs can be different because they serve different purposes.

The pattern for extracting URLs is designed to identify all possible URLs within a given text, including partial or incomplete URLs. This pattern should match any string that resembles a URL, even if it’s not necessarily a valid URL.

On the other hand, the pattern for validating URLs is designed to check if a given string is a valid URL according to certain rules. This pattern should only match strings that are complete and valid URLs, and it should reject strings that do not meet the criteria for a valid URL.

Therefore, the pattern for extracting URLs can be more lenient and permissive, while the pattern for validating URLs should be more strict and exacting.
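To make the difference concrete, here is a small sketch that runs the same string through both patterns: the lenient extraction pattern and the strict validation pattern that this article presents in "The Pattern" section. The variable names and the sample input are assumptions chosen for illustration:

```javascript
// Lenient pattern used for extraction (no ^ or $ anchors; the g flag is
// left out here so that test() keeps no state between calls).
const looseRegex = /https?:\/\/[^\s]+/;

// Strict, anchored pattern used for validation.
const strictRegex = /^(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .-]*)*\/?$/;

// Hypothetical input: looks like a URL but has no top-level domain.
const candidate = 'http://localhost';

console.log(looseRegex.test(candidate));  // true
console.log(strictRegex.test(candidate)); // false
```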

The Pattern

The following is a regular expression pattern to validate URLs:

/^(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .-]*)*\/?$/

The table below explains each component of the regular expression above:

Component         Explanation
^                 Matches the start of the string
(https?:\/\/)?    Matches the optional protocol (http:// or https://)
([\da-z.-]+)      Matches one or more digits, lowercase letters, dots, or hyphens (subdomain and domain name)
\.                Matches a literal dot
([a-z.]{2,6})     Matches a top-level domain (e.g., com, org, net) made of 2 to 6 lowercase letters or dots
([\/\w .-]*)*     Matches an optional path made of forward slashes, word characters (letters, digits, underscores), spaces, dots, or hyphens
\/?               Matches an optional trailing slash
$                 Matches the end of the string

Keep in mind that URLs can be complex and that no regular expression can match all of them perfectly. This pattern, for instance, does not accept ports, query strings, or uppercase letters, so adapt it if your URLs need features it does not cover.
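The sketch below illustrates a few of these gaps and one possible adaptation. The example.com inputs are hypothetical, and the i flag variant is only a suggestion, not a complete fix:

```javascript
const urlRegex = /^(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .-]*)*\/?$/;

// Hypothetical inputs: real-world URL shapes that fall outside the pattern.
console.log(urlRegex.test('https://example.com/?q=1')); // false: "?" and "=" are not in the path class
console.log(urlRegex.test('http://example.com:8080'));  // false: ":" for the port is not allowed
console.log(urlRegex.test('https://Example.com'));      // false: uppercase letters are not matched

// One possible adaptation: the i flag makes the match case-insensitive.
const caseInsensitiveRegex = /^(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .-]*)*\/?$/i;
console.log(caseInsensitiveRegex.test('https://Example.com')); // true
```

Note also that the nested quantifier in ([\/\w .-]*)* can make the engine backtrack heavily on long inputs that almost match, so it may be wise to limit input length before testing untrusted strings.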

Example

You can check whether a given string is a valid URL or not by using the test() method with the regular expression above, as follows:

const url1 = 'https://www.slingacademy.com';
const url2 = 'https://www.slingacademy.com/contact/';
const url3 = 'abcdefghiklmlnopqrstuvwxyz';

const urlRegex = /^(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .-]*)*\/?$/;

// validate URLs
if (urlRegex.test(url1)) {
  console.log(`${url1} is a valid URL`);
} else {
  console.log(`${url1} is not a valid URL`);
}

if (urlRegex.test(url2)) {
  console.log(`${url2} is a valid URL`);
} else {
  console.log(`${url2} is not a valid URL`);
}

if (urlRegex.test(url3)) {
  console.log(`${url3} is a valid URL`);
} else {
  console.log(`${url3} is not a valid URL`);
}

Output:

https://www.slingacademy.com is a valid URL
https://www.slingacademy.com/contact/ is a valid URL
abcdefghiklmlnopqrstuvwxyz is not a valid URL
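The three if/else blocks above repeat the same logic. If you prefer something more compact, the check can be wrapped in a small function and run in a loop. This is only a sketch; the names isValidUrl and candidates are made up for this example:

```javascript
const urlRegex = /^(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .-]*)*\/?$/;

// Hypothetical helper: returns true when the string matches the pattern.
// The regex has no g flag, so test() keeps no state between calls.
const isValidUrl = (value) => urlRegex.test(value);

const candidates = [
  'https://www.slingacademy.com',
  'https://www.slingacademy.com/contact/',
  'abcdefghiklmlnopqrstuvwxyz',
];

// Print one verdict line per candidate string.
for (const candidate of candidates) {
  const verdict = isValidUrl(candidate) ? 'is a valid URL' : 'is not a valid URL';
  console.log(`${candidate} ${verdict}`);
}
```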