PHP: How to remove HTML tags from a string

Last updated: January 10, 2024

Overview

Removing HTML tags from strings is a common task in PHP development when you need clean text for processing, storage, or display. Doing it correctly also matters for security, because markup left in user input can lead to cross-site scripting (XSS).

Basics of Stripping HTML Tags

PHP provides the built-in function strip_tags(), which strips HTML and PHP tags from a string.

$stringWithHtml = '<h1>Hello World!</h1>';
$cleanString = strip_tags($stringWithHtml);
echo $cleanString; // Outputs: Hello World!

However, sometimes you might want to keep certain tags for formatting purposes. Pass them as the second argument.

$stringWithHtml = '<p>Hello,<span style="color:red;"> World!</span></p>';
$allowedTags = '<p><span>';
$cleanString = strip_tags($stringWithHtml, $allowedTags);
echo $cleanString; // Outputs: <p>Hello,<span style="color:red;"> World!</span></p>
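
One caveat is worth illustrating: strip_tags() leaves the attributes of any allowed tag untouched, so an event handler such as onclick survives. The sketch below assumes PHP 7.4 or later, where the allowed tags can also be passed as an array. The helper name keepBasicFormatting is made up for this example.

```php
<?php
// Hypothetical helper: keep only <p> and <span>, drop every other tag.
function keepBasicFormatting(string $html): string
{
    // Since PHP 7.4 the allowed tags may be given as an array of tag names.
    return strip_tags($html, ['p', 'span']);
}

$input = '<p onclick="alert(1)">Hello, <b>World</b>!</p>';
echo keepBasicFormatting($input);
// Outputs: <p onclick="alert(1)">Hello, World!</p>
```

The <b> tag is gone, but the onclick attribute on the allowed <p> tag is still there, which is why an allow-list of tags alone should not be treated as sanitization.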

Dealing with Malicious Code

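While strip_tags() is effective at removing markup, it is not enough on its own to prevent XSS attacks: it does not validate the HTML, and it leaves attributes on allowed tags untouched. When you want to display user input as literal text, htmlspecialchars() is the better tool. It converts special characters to HTML entities, so the browser shows the markup instead of interpreting it.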

// PHP concatenates strings with the dot operator, not +
$stringWithHtml = '<script>alert("XSS Attack!")</script>' .
'<div>Some text</div>';
$safeString = htmlspecialchars($stringWithHtml);
echo $safeString; // Outputs: &lt;script&gt;alert(&quot;XSS Attack!&quot;)&lt;/script&gt;&lt;div&gt;Some text&lt;/div&gt;
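
The two functions can also be combined: strip the tags first, then escape whatever is left. This is a minimal sketch, assuming the goal is plain text that will be echoed into an HTML page. The helper name toSafeText is hypothetical. The flags and charset are passed explicitly because the default flags of htmlspecialchars() changed in PHP 8.1.

```php
<?php
// Hypothetical helper: reduce input to plain text that is safe to echo into HTML.
function toSafeText(string $input): string
{
    // Remove the markup first, then escape the remaining special characters.
    $text = strip_tags($input);
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo toSafeText('<b>Tom & "Jerry"</b>');
// Outputs: Tom &amp; &quot;Jerry&quot;
```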

Custom HTML Tag Stripping Functions

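What if you need more control? You can write custom functions using regular expressions with preg_replace(). Keep in mind that regular expressions cannot reliably parse arbitrary HTML, so this approach suits simple, predictable input rather than untrusted markup.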

$stringWithHtml = '<div style="font-size: 18px;">Text</div>';
$cleanString = preg_replace('/<\/?.+?(>|$)/s', '', $stringWithHtml);
echo $cleanString; // Outputs: Text
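
One case where a custom function helps: strip_tags() removes the <script> and <style> tags themselves but keeps the text between them. The sketch below drops those elements together with their contents before stripping the remaining tags. The function name stripTagsAndScripts is hypothetical, and the pattern assumes reasonably well-formed markup, so it is not a substitute for a real sanitizer.

```php
<?php
// Hypothetical helper: remove <script>/<style> blocks with their contents, then all other tags.
function stripTagsAndScripts(string $html): string
{
    // \1 refers back to whichever tag name (script or style) was opened.
    $withoutBlocks = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', '', $html);

    // preg_replace() returns null if the regex engine fails; fall back to the input.
    if ($withoutBlocks === null) {
        $withoutBlocks = $html;
    }

    return strip_tags($withoutBlocks);
}

$html = '<style>p{color:red}</style><p>Hello</p><script>alert(1)</script>';

echo strip_tags($html), "\n";          // Outputs: p{color:red}Helloalert(1)
echo stripTagsAndScripts($html), "\n"; // Outputs: Hello
```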

Using DOMDocument for Advanced HTML Manipulation

For more complex operations, such as removing scripts but keeping other tags intact, the DOMDocument class is very powerful.

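$stringWithHtml = '<div>Some text</div><script>alert("XSS Attack!")</script>';

$dom = new DOMDocument();
// Collect parser warnings internally instead of printing them for imperfect markup
libxml_use_internal_errors(true);
$dom->loadHTML($stringWithHtml);
libxml_clear_errors();

// getElementsByTagName() returns a live list, so copy it to an array first;
// removing nodes while iterating the live list skips elements
$scriptTags = iterator_to_array($dom->getElementsByTagName('script'));

foreach ($scriptTags as $tag) {
    $tag->parentNode->removeChild($tag);
}

// Outputs the HTML without script tags (loadHTML() adds a doctype and <html>/<body> wrappers)
echo $dom->saveHTML();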
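
If you only want the cleaned fragment back, without the doctype and the <html>/<body> wrappers, you can serialize the children of <body> one by one. This is a sketch under a few assumptions: the DOM extension is enabled, the input is an HTML fragment rather than a full page, and the text is ASCII (other encodings need extra handling). The helper name removeElementsByTag is made up for illustration.

```php
<?php
// Hypothetical helper: remove the named elements and return the remaining fragment.
function removeElementsByTag(string $html, array $tagNames): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings silently
    // Wrapping in <body> keeps the input non-empty and puts every node under one parent.
    $dom->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($tagNames as $name) {
        // Snapshot the live node list before removing anything from it.
        foreach (iterator_to_array($dom->getElementsByTagName($name)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> so no wrapper elements are included.
    $result = '';
    foreach ($body->childNodes as $child) {
        $result .= $dom->saveHTML($child);
    }

    return $result;
}

echo removeElementsByTag('<p>Hello</p><script>alert(1)</script><p>World</p>', ['script']);
```

The result contains both paragraphs and no script element. Exact whitespace between elements may vary with the libxml version.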

Libraries for Sanitizing HTML

Third-party libraries, like HTML Purifier, provide a solid foundation for cleaning up HTML content whilst maintaining a balance between security and flexibility.

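// Adjust the path to wherever HTML Purifier is installed
require_once 'HTMLPurifier.auto.php';

// Example input; in practice this would be untrusted user content
$dirtyHtml = '<p>Hello</p><script>alert("XSS Attack!")</script>';

$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);

$cleanHtml = $purifier->purify($dirtyHtml);
echo $cleanHtml;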
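
HTML Purifier can also be restricted to an explicit allow-list of elements and attributes through its HTML.Allowed setting. The sketch below assumes the library is autoloadable (for example through Composer or HTMLPurifier.auto.php). The helper name purifyComment and the fallback to htmlspecialchars() when the library is missing are illustrative choices, not part of the library.

```php
<?php
// Hypothetical helper: allow a handful of tags, remove everything else.
function purifyComment(string $dirtyHtml): string
{
    if (!class_exists('HTMLPurifier_Config')) {
        // Assumption for this sketch: if the library is unavailable,
        // escape everything rather than return raw input.
        return htmlspecialchars($dirtyHtml, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }

    $config = HTMLPurifier_Config::createDefault();
    // Only these elements and attributes are kept.
    $config->set('HTML.Allowed', 'p,b,i,a[href]');

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirtyHtml);
}

echo purifyComment(
    '<p>Hi <a href="https://example.com" onclick="x()">there</a></p><script>alert(1)</script>'
);
```

Either way, the output contains no live script element and no working onclick attribute.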

Conclusion

PHP offers multiple ways to remove HTML tags from strings: strip_tags() for quick cleanup, htmlspecialchars() for escaping output, regular expressions for simple custom rules, and DOMDocument for structural changes. When you must accept HTML from untrusted sources, a dedicated library like HTML Purifier is the more robust choice.


Series: Working with Numbers and Strings in PHP
