Python Requests module: How to parse HTML responses

Last updated: January 02, 2024

Introduction

Working with HTML responses in Python is a common task for developers. Using the Requests module alongside parsers like BeautifulSoup, we can easily navigate and manipulate HTML content fetched from the web.

Setting up the Environment

Before parsing HTML with Python Requests, you need to install the necessary packages. Open your terminal or command prompt and run:

pip install requests
pip install beautifulsoup4

Fetching HTML Content

To fetch HTML content from a webpage, we use the Requests module’s get method:

import requests
response = requests.get('https://example.com')
html_content = response.text
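
Before handing the body to a parser, it is worth confirming that the request actually succeeded and that the response is in fact HTML. A minimal sketch (the URL is just a placeholder):

import requests

response = requests.get('https://example.com', timeout=10)
response.raise_for_status()  # raise an HTTPError for 4xx/5xx responses

# Only parse the body if the server says it is HTML
if 'html' in response.headers.get('Content-Type', ''):
    html_content = response.text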

Parsing HTML with BeautifulSoup

Once we have the HTML content, we can use BeautifulSoup to parse and extract data:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_content, 'html.parser')
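
The built-in 'html.parser' needs no extra dependencies. If you have the third-party lxml package installed (pip install lxml), you can pass it instead for faster, more lenient parsing:

soup = BeautifulSoup(html_content, 'lxml')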

With BeautifulSoup, locating elements by tag is straightforward:

headers = soup.find_all('h1')
for header in headers:
    print(header.text)

To find elements by class or id:

navigation_bar = soup.find('div', {'class': 'nav-bar'})
footer = soup.find('footer', {'id': 'site-footer'})
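
BeautifulSoup also accepts keyword shortcuts for these common lookups, which read a little more naturally:

navigation_bar = soup.find('div', class_='nav-bar')  # class_ avoids the reserved word
footer = soup.find('footer', id='site-footer')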

Extracting Attributes and Text

Extracting attributes like href from anchor tags can be done with:

for link in soup.find_all('a'):
    print(link.get('href'))
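
Note that link.get('href') returns None for anchors without an href attribute. Filtering on the attribute up front avoids that:

for link in soup.find_all('a', href=True):
    print(link['href'])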

Similarly, to extract text you can use:

for paragraph in soup.find_all('p'):
    print(paragraph.text)
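
The .text property returns the raw text, whitespace included. The get_text() method gives more control, for example stripping whitespace from each piece of text:

for paragraph in soup.find_all('p'):
    print(paragraph.get_text(strip=True))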

Handling Relative URLs

If you encounter relative URLs, you can resolve them with urljoin from the standard library's urllib.parse module:

from urllib.parse import urljoin

base_url = 'https://example.com'
for link in soup.find_all('a', href=True):
    absolute_url = urljoin(base_url, link['href'])
    print(absolute_url)

Advanced Parsing: Using Selectors

You can make use of CSS selectors with the select method:

for item in soup.select('div.content > p.entry'):
    print(item.text)
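
When a selector should match a single element, select_one returns the first match, or None if nothing matches:

title = soup.select_one('head > title')
if title is not None:
    print(title.text)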

Working with Forms

To work with forms, you can extract form fields and prepare data for submission:

form = soup.find('form')
form_action = form['action']
# Use 'field' rather than 'input' (which shadows the built-in) and
# keep only inputs that actually have a name attribute
form_data = {field['name']: field.get('value', '')
             for field in form.find_all('input') if field.get('name')}
response = requests.post(urljoin(base_url, form_action), data=form_data)
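
Not every form is submitted via POST. Checking the form's method attribute first keeps the sketch general (defaulting to GET, as browsers do):

method = form.get('method', 'get').lower()
if method == 'post':
    response = requests.post(urljoin(base_url, form_action), data=form_data)
else:
    response = requests.get(urljoin(base_url, form_action), params=form_data)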

Session Handling

If maintaining sessions is necessary, use the Session object to persist cookies and headers across requests:

with requests.Session() as session:
    session.get('https://example.com/login')
    session.post('https://example.com/login', data={'username': 'user', 'password': 'pass'})
    response = session.get('https://example.com/dashboard')
    # Parse response as before
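
A Session can also carry default headers for every request it makes. For example, some sites expect a browser-like User-Agent (the value below is purely illustrative):

with requests.Session() as session:
    session.headers.update({'User-Agent': 'my-scraper/1.0'})
    response = session.get('https://example.com')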

Error Handling

It’s crucial to handle potential errors in network communication:

try:
    response = requests.get('https://example.com/nonexistent', timeout=5)
    response.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print(f'HTTP Error: {errh}')
except requests.exceptions.ConnectionError as errc:
    print(f'Error Connecting: {errc}')
except requests.exceptions.Timeout as errt:
    print(f'Timeout Error: {errt}')
except requests.exceptions.RequestException as err:
    print(f'Oops: Something else: {err}')
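
For transient failures such as timeouts or 5xx responses, you can also let Requests retry automatically by mounting an HTTPAdapter configured with urllib3's Retry. A minimal sketch; the retry counts and status codes here are illustrative:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retries = Retry(total=3, backoff_factor=0.5,
                status_forcelist=[500, 502, 503, 504])

with requests.Session() as session:
    session.mount('https://', HTTPAdapter(max_retries=retries))
    response = session.get('https://example.com', timeout=5)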

Conclusion

Python’s Requests module paired with BeautifulSoup makes it simple to fetch and parse HTML content. Through these examples, you can customize and build robust systems for web scraping and automated interactions with web pages.
