PHP: How to Unescape HTML Entities

Updated: January 10, 2024 By: Guest Contributor

Introduction

HTML entities are a vital part of web development: they let characters that have special meaning in HTML, such as <, > and &, be represented as plain text. In PHP, you sometimes need to convert these entities back into the characters they stand for. This article explores how to unescape HTML entities in PHP effectively.

Basic Usage of html_entity_decode

The html_entity_decode function is the built-in PHP function used to convert HTML entities back to their corresponding characters. Here’s a simple example demonstrating its usage:

<?php
$str = '&lt;div&gt;Hello World&lt;/div&gt;';
echo html_entity_decode($str);
// Output: <div>Hello World</div>
?>
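html_entity_decode also understands numeric character references, both decimal (&#72;) and hexadecimal (&#x69;), and it reverses what htmlentities produces. The sketch below assumes PHP 8 and UTF-8 strings, and passes the flags and encoding explicitly so the result does not depend on version defaults:

```php
<?php
// Numeric references: &#72; is "H" (decimal 72), &#x69; is "i" (hex 0x69)
$numeric = html_entity_decode('&#72;&#x69;!', ENT_QUOTES | ENT_HTML401, 'UTF-8');
echo $numeric, "\n"; // Hi!

// Round trip: htmlentities() encodes, html_entity_decode() restores the original
$original = '<a href="/menu">Tom & Jerry\'s Café</a>';
$encoded  = htmlentities($original, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$decoded  = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $encoded, "\n"; // &lt;a href=&quot;/menu&quot;&gt;Tom &amp; Jerry&#039;s Caf&eacute;&lt;/a&gt;
var_dump($decoded === $original); // bool(true)
```

Using the same flags and encoding on both sides is what makes the round trip lossless.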

Specifying Character Encoding

By default, html_entity_decode uses the character set configured in the default_charset ini setting (UTF-8 unless changed). To specify a different encoding, pass it as the third argument; the second argument holds the flags:

<?php
$str = '&eacute;';
echo html_entity_decode($str, ENT_COMPAT | ENT_HTML401, 'UTF-8');
// Output: é
?>
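The encoding argument decides which bytes the decoded character becomes, not just whether decoding happens. A small sketch comparing UTF-8 with ISO-8859-1 output, using bin2hex() to expose the raw bytes:

```php
<?php
$entity = '&eacute;';

// In UTF-8, é (U+00E9) is encoded as two bytes: C3 A9
$utf8 = html_entity_decode($entity, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// In ISO-8859-1 (Latin-1), the same character is the single byte E9
$latin1 = html_entity_decode($entity, ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($latin1), "\n"; // e9
```

If the encoding passed here differs from the one your page or database uses, the decoded text shows up garbled, so keep it consistent across the application.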

Handling Quotes with Flags

You may also want to control how quotes are handled during decoding. The ENT_COMPAT flag decodes double quotes only, ENT_QUOTES decodes both double and single quotes, and ENT_NOQUOTES leaves both encoded:

<?php
$str = 'Bob&#039;s &lt;em&gt;Special&lt;/em&gt; Burger';
echo html_entity_decode($str, ENT_QUOTES); // Decodes double and single quotes
// Output: Bob's <em>Special</em> Burger
?>
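To see the three quote flags side by side, the sketch below decodes the same string with each one. Note that since PHP 8.1 the default flags are ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 (earlier versions defaulted to ENT_COMPAT | ENT_HTML401), so passing the flags explicitly keeps results stable across versions:

```php
<?php
$str = '&quot;Hi&quot; &#039;there&#039;';

// ENT_COMPAT: decode double quotes, leave single quotes encoded
$compat = html_entity_decode($str, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// ENT_QUOTES: decode both double and single quotes
$quotes = html_entity_decode($str, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// ENT_NOQUOTES: leave both kinds of quote encoded
$noquotes = html_entity_decode($str, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

echo $compat, "\n";   // "Hi" &#039;there&#039;
echo $quotes, "\n";   // "Hi" 'there'
echo $noquotes, "\n"; // &quot;Hi&quot; &#039;there&#039;
```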

Decoding in Specific Scenarios

In advanced use cases, you may want to override how particular entities are decoded, or handle entities that are not predefined in HTML. The get_html_translation_table function combined with strtr gives you that control:

<?php
$str = 'The &copy; symbol';
$trans = get_html_translation_table(HTML_ENTITIES);
$trans = array_flip($trans);
// Override the standard mapping for &copy; with a custom replacement
$trans['&copy;'] = '(c)';
echo strtr($str, $trans);
// Output: The (c) symbol
?>
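The same table technique can restrict decoding to a subset of entities. This sketch flips the HTML_SPECIALCHARS table so that only &amp;, &quot;, &#039;, &lt; and &gt; are decoded while other entities such as &eacute; stay untouched, and compares the result with the built-in htmlspecialchars_decode, which does the same job:

```php
<?php
$str = '&lt;b&gt;Caf&eacute;&lt;/b&gt; &amp;lt;';

// Build an entity => character map for the five special characters only
$special = array_flip(
    get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, 'UTF-8')
);

// strtr() replaces the longest matching key at each position in a single pass,
// so "&amp;lt;" becomes "&lt;" rather than "<"
$viaTable = strtr($str, $special);

// The built-in htmlspecialchars_decode() gives the same result
$builtin = htmlspecialchars_decode($str, ENT_QUOTES | ENT_HTML401);

echo $viaTable, "\n"; // <b>Caf&eacute;</b> &lt;
var_dump($viaTable === $builtin); // bool(true)
```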

Handling All Entities Including Custom Ones

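html_entity_decode only knows the entities defined by the HTML standards, so entities invented by an application (for example a custom &Osub2; stored in a database) pass through unchanged. Map them yourself with strtr, then decode the standard ones: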

<?php
// This could be content retrieved from a database where custom entities are used
$html = 'Oxygen &Osub2; is essential.';
// Custom entity translation map
$custom_entities = array(
    '&Osub2;' => 'O₂'
);
// Replace the custom entities first, then decode any standard ones
$str = html_entity_decode(strtr($html, $custom_entities), ENT_QUOTES, 'UTF-8');
echo $str;
// Output: Oxygen O₂ is essential.
?>
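When content uses many custom entities, it helps to find out which ones still lack a mapping. The helper below, find_unknown_entities, is a hypothetical name for illustration: it collects the named entities in a string and keeps only those that html_entity_decode returns unchanged, which is how unrecognised entities behave. It checks against the HTML5 entity table (ENT_HTML5), an assumption chosen because that table is the largest:

```php
<?php
// Hypothetical helper: list named entities in $html that html_entity_decode()
// does not recognise, i.e. ones that still need an entry in a custom map
// such as $custom_entities.
function find_unknown_entities(string $html): array
{
    preg_match_all('/&[A-Za-z][A-Za-z0-9]*;/', $html, $matches);

    $unknown = [];
    foreach (array_unique($matches[0]) as $entity) {
        // Unknown entities come back from html_entity_decode() unchanged
        $decoded = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');
        if ($decoded === $entity) {
            $unknown[] = $entity;
        }
    }
    return $unknown;
}

$unknown = find_unknown_entities('Oxygen &Osub2; &amp; water &Hsub2;O &Osub2;');
echo implode(', ', $unknown), "\n"; // &Osub2;, &Hsub2;
```

Each entity is tested against the original text, not the decoded one, so an escaped sequence like &amp;Osub2; is not mistaken for a custom entity.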

Troubleshooting Common Issues

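Common pitfalls include invalid character sequences, mismatched encodings, and character conversion issues. Decoding or displaying text in a different encoding than the one it was written in produces garbled output; for example, the UTF-8 bytes of O₂ read as Windows-1252 appear as Oâ‚‚. Text that was entity-encoded twice, such as &amp;lt;, still contains entities after one decoding pass. The document-type flag matters too: &apos; is not defined in HTML 4.01, so html_entity_decode only decodes it when ENT_HTML5, ENT_XHTML, or ENT_XML1 is set.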
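For double-encoded text, one option is to decode repeatedly until the string stops changing. The sketch below uses a hypothetical helper, decode_fully, with an arbitrary cap of five passes as a safety limit:

```php
<?php
// Hypothetical helper: decode repeatedly until the text stops changing,
// for content that was entity-encoded more than once.
function decode_fully(string $text, int $maxPasses = 5): string
{
    for ($i = 0; $i < $maxPasses; $i++) {
        $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
        if ($decoded === $text) {
            break; // nothing left to decode
        }
        $text = $decoded;
    }
    return $text;
}

$double = '&amp;lt;b&amp;gt;Bold&amp;lt;/b&amp;gt;';

$onePass   = html_entity_decode($double, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$allPasses = decode_fully($double);

echo $onePass, "\n";   // &lt;b&gt;Bold&lt;/b&gt;
echo $allPasses, "\n"; // <b>Bold</b>
```

Only apply this when you know the text was encoded more than once: content that deliberately displays an entity, such as a tutorial showing "&lt;", would be over-decoded.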

Security Considerations

When decoding HTML entities, be wary of security issues such as Cross-Site Scripting (XSS): decoding turns &lt;script&gt; back into a live <script> tag. Never send decoded user-supplied content to the browser as-is. Escape it when displaying it with htmlspecialchars, which converts potentially dangerous characters back into their entity equivalents.
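A minimal sketch of the decode-then-escape pattern, assuming the stored content is already entity-encoded and the page is served as UTF-8:

```php
<?php
// Content as it might be stored, already entity-encoded
$stored = '&lt;script&gt;alert(1)&lt;/script&gt; Hello';

// Decoding turns the entities back into a live <script> tag
$decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Escape again at output time so the browser shows the text instead of running it
$safe = htmlspecialchars($decoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $safe, "\n"; // &lt;script&gt;alert(1)&lt;/script&gt; Hello
```

Decoding is useful for processing the text in PHP (searching, trimming, measuring length); escaping belongs at the point where the text is written into HTML.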

Summary

Unescaping HTML entities in PHP is generally straightforward thanks to html_entity_decode. However, unusual character sets, custom entities, and security pitfalls call for careful attention to context and for the appropriate flags, encoding types, and security measures. This article walked through these scenarios with examples for each.