Sling Academy

PHP: Extract URLs from a string

Last updated: January 09, 2024

Introduction

As you navigate the web programmatically or parse text data in PHP, you may often need to extract URLs from strings. This skill is particularly useful for web scraping, data migration, and SEO tools development. In this tutorial, we will explore multiple methods to accomplish this with PHP, enhancing our toolkit from basic to advanced as we progress.

Basic URL Extraction

To start, we will look at the simplest way to extract URLs using PHP’s built-in functions. The preg_match_all() function searches a string for every match of a regular expression pattern. A basic regular expression for URL extraction looks like this:

// The input string containing URLs
$string = 'Check out https://www.example.com and http://www.foo.com.';

// Regular Expression Pattern for a basic URL
$pattern = '/\b(?:https?:\/\/)[a-zA-Z0-9\.\-]+(?:\.[a-zA-Z]{2,})(?:\/\S*)?/';

// Array to hold the matched URLs
$matches = [];

// Perform the pattern match
preg_match_all($pattern, $string, $matches);

// Print the matches
print_r($matches[0]);
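To reuse the basic pattern, it can be wrapped in a small helper. The sketch below is one possible way to do that, not part of the original snippet: the function name extractUrls() and the sample sentence are made up for illustration, and the duplicate removal is an added convenience.

```php
<?php
// Hypothetical helper (name chosen for this example): returns the unique
// http/https URLs found in $text, using the basic pattern shown above.
function extractUrls(string $text): array
{
    $pattern = '/\b(?:https?:\/\/)[a-zA-Z0-9\.\-]+(?:\.[a-zA-Z]{2,})(?:\/\S*)?/';

    // preg_match_all() returns false on error; treat that as "no URLs found"
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }

    // $matches[0] holds the full matches; drop duplicates and reindex
    return array_values(array_unique($matches[0]));
}

$urls = extractUrls('Check out https://www.example.com and http://www.foo.com. Again: https://www.example.com');
print_r($urls);
// Contains https://www.example.com and http://www.foo.com, each listed once
```

Note that the sentence-ending period after http://www.foo.com is not captured here, but a URL with a path followed by punctuation (for example http://www.foo.com/page.) would keep the trailing character, because \S* matches any non-whitespace.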

Improved URL Extraction with Regex

As we go deeper, we can refine our Regular Expression to better handle edge cases and different URL formats:

// Improved Regular Expression Pattern
$pattern = '/\b(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\.\-]+\.\w+(?:\/[\w\/.?-]*)?/';
// Rest of the code is the same...

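This regex makes the protocol and the www. prefix optional and allows a path made of word characters, slashes, dots, question marks, and hyphens. Because the protocol is optional, it also matches dotted tokens that are not URLs (a file name such as notes.txt, for example), and the path stops at characters such as = or &, so query strings can be cut short.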
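A quick way to see how the improved pattern behaves is to run it against a sentence that mixes a bare domain, a file name, and a URL with a query string. The sample sentence below is invented for this illustration.

```php
<?php
// Improved pattern from the section above
$pattern = '/\b(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\.\-]+\.\w+(?:\/[\w\/.?-]*)?/';

// Made-up input: a bare domain, a file name, and a URL with a query string
$sample = 'See example.com/docs, notes.txt and https://example.com/search?q=php';

preg_match_all($pattern, $sample, $matches);
print_r($matches[0]);
// Matches: example.com/docs, notes.txt, https://example.com/search?q
// - notes.txt is matched even though it is not a URL
// - the query string is cut at "=", which the path character class does not allow
```

If query strings matter for your input, one option is to widen the path character class (for example by adding =, &, % and #), at the cost of matching more trailing noise.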

Using PHP Filters

Beyond regex, PHP provides filters that can validate and sanitize data, including URLs. Here we use filter_var() with the FILTER_VALIDATE_URL filter to keep only the parts of the string that are valid URLs:

// Split the input string by spaces or any other delimiters you expect
$parts = preg_split('/\s+/', $string);

// Array to hold valid URLs
$validURLs = [];

foreach ($parts as $part) {
    if (filter_var($part, FILTER_VALIDATE_URL) !== false) {
        $validURLs[] = $part;
    }
}

// Print the valid URLs
print_r($validURLs);
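Splitting on whitespace leaves sentence punctuation attached to a token, such as the trailing period in http://www.foo.com. in the sample string. A minimal sketch of one way to handle this is shown below: it trims common trailing punctuation before validating and, as an extra assumption not in the original snippet, keeps only http and https URLs. The set of trimmed characters is a guess and should be adapted to your input.

```php
<?php
$string = 'Check out https://www.example.com and http://www.foo.com.';

$validUrls = [];

foreach (preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY) as $part) {
    // Strip punctuation that usually belongs to the sentence, not the URL
    $candidate = rtrim($part, '.,;:!?)');

    if (filter_var($candidate, FILTER_VALIDATE_URL) === false) {
        continue;
    }

    // FILTER_VALIDATE_URL accepts other schemes too; keep http/https only
    $scheme = strtolower((string) parse_url($candidate, PHP_URL_SCHEME));
    if (in_array($scheme, ['http', 'https'], true)) {
        $validUrls[] = $candidate;
    }
}

print_r($validUrls);
```

Words such as "Check" or "and" are rejected because they have no scheme, so only the two URLs remain, without the sentence-ending period.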

Advanced URL Extraction

In more complex scenarios, such as URLs inside HTML markup, encoded URLs, or URLs embedded within scripts or styles, additional parsing logic is required. A library that understands the structure of HTML can help:

// For example, using the PHP Simple HTML DOM Parser.
// This assumes the simple_html_dom library is installed (for example via Composer)
// and has been included in your project.

// Create a DOM object from a string of HTML
$html = str_get_html($string);

// str_get_html() returns false for empty or oversized input
if ($html !== false) {
    // Find all the links and print each href on its own line
    foreach ($html->find('a') as $element) {
        // Double quotes are needed so that \n is a newline, not literal text
        echo $element->href . "\n";
    }
}

// Remember to handle script, style, or encoded URLs differently
// Additional parsing logic here

URLs that appear outside anchor tags, such as those in script or style blocks, need their own handling, and robust HTML parsing may call for additional libraries.
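If adding a third-party library is not an option, PHP's built-in DOM extension can do the same job for anchor tags. The sketch below swaps in DOMDocument instead of Simple HTML DOM Parser. It assumes the DOM extension is enabled, and the HTML sample is invented for the example.

```php
<?php
// Made-up HTML fragment containing one absolute and one relative link
$html = '<p>See <a href="https://www.example.com/docs">the docs</a> and <a href="/about">about us</a>.</p>';

$dom = new DOMDocument();

// Collect libxml warnings internally instead of printing them;
// real-world markup is rarely perfectly valid
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$links = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    // getAttribute() returns an empty string when the attribute is missing
    $href = $anchor->getAttribute('href');
    if ($href !== '') {
        $links[] = $href;
    }
}

print_r($links);
```

This returns href values exactly as written in the markup, so relative links such as /about still need to be resolved against the page's base URL before they can be fetched.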

Conclusion

In this tutorial, we covered ways to extract URLs from strings in PHP, starting from a simple regex and progressing to advanced methods utilizing PHP’s native functions and external libraries. By now, you should have a good understanding of how to approach this common task and adapt the examples to fit more complex scenarios or specific requirements in your projects.
