Python: How to unescape HTML entities in a string

Last updated: May 20, 2023

Overview

HTML entities are short text sequences that stand in for characters that have special meaning in HTML or that cannot be typed or encoded directly in the document's character set. They start with an ampersand (&) and end with a semicolon (;). Some common HTML entities are:

&amp; // ampersand (&)
&lt; // less than (<)
&gt; // greater than (>)
&copy; // copyright (©)
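To see how these names map to characters, here is a minimal sketch that looks them up in name2codepoint from the standard library's html.entities module. The variable name table is made up for illustration.

```python
from html.entities import name2codepoint

# name2codepoint maps an entity name (without "&" and ";") to a Unicode code point.
names = ("amp", "lt", "gt", "copy")
table = {name: chr(name2codepoint[name]) for name in names}

for name, char in table.items():
    print(f"&{name}; -> {char}")
```

This is only a lookup of the mapping; it does not decode entities inside a larger string.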

This practical, example-centric article shows you two ways to unescape HTML entities in a given string in Python. No more boring words; let’s get to the point.

Using the html module

You can use the html.unescape() function to convert all named and numeric HTML entities in a string into their corresponding characters. Here’s how you can do it:

import html

def unescape_html_entities(text):
    return html.unescape(text)

text = "&copy;2023 Sling Academy. Happy coding &amp; enjoy the day."
print(unescape_html_entities(text))

Output:

©2023 Sling Academy. Happy coding & enjoy the day.

html is part of Python’s standard library (html.unescape() is available from Python 3.4 onward), so you don’t have to install anything.
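A few edge cases are worth checking before relying on html.unescape(). The sketch below uses made-up sample strings to show that numeric references are decoded too, that each call removes only one level of escaping, and that HTML tags are left in place.

```python
import html

# Named, decimal, and hexadecimal references decode to the same character.
same_char = [html.unescape(ref) for ref in ("&copy;", "&#169;", "&#xA9;")]
print(same_char)

# Only one level of escaping is removed per call.
double_escaped = "&amp;lt;p&amp;gt;"
once = html.unescape(double_escaped)
twice = html.unescape(once)
print(once)
print(twice)

# Tags are not touched; only character references are decoded.
with_tags = html.unescape("<b>Tom &amp; Jerry</b>")
print(with_tags)
```

If your input may have been escaped more than once, you would need to call the function again; whether that is safe depends on where the data comes from.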

Using BeautifulSoup4

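This solution uses the beautifulsoup4 library to parse the string as HTML and return its text with all HTML entities converted to their corresponding characters. Because get_text() extracts text from the parsed markup, it also drops any HTML tags in the input, so prefer html.unescape() if the tags must be preserved.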

Install the library:

pip install beautifulsoup4

Example:

from bs4 import BeautifulSoup

def unescape_html_entities(text):
    soup = BeautifulSoup(text, 'html.parser')
    return soup.get_text()

text = "Is 1 &gt; 2? &amp; &lt; ? I dunno. &quot;Yes&quot; &amp; &#39;No&#39;."
print(unescape_html_entities(text))

Output:

Is 1 > 2? & < ? I dunno. "Yes" & 'No'.

That’s it. Happy coding & have a nice day!
