Sling Academy
Python Requests module: How to parse HTML responses

Last updated: January 02, 2024

Introduction

Working with HTML responses in Python is a common task for developers. Using the Requests module alongside parsers like BeautifulSoup, we can easily navigate and manipulate HTML content fetched from the web.

Setting up the Environment

Before parsing HTML with Python Requests, you need to install the necessary packages. Open your terminal or command prompt and run:

pip install requests
pip install beautifulsoup4

Fetching HTML Content

To fetch HTML content from a webpage, we use the Requests module’s get method:

import requests
response = requests.get('https://example.com')
html_content = response.text
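
Before parsing, it can help to confirm that the server actually returned HTML rather than JSON or a binary file. The following is a minimal sketch, assuming a hypothetical helper named looks_like_html (it is not part of Requests) that inspects the Content-Type header. To keep the example offline, it is exercised with a hand-built Response object instead of a real request:

```python
import requests


def looks_like_html(response):
    """Return True if the Content-Type header indicates an HTML document.

    Hypothetical helper for illustration; not part of the Requests API.
    """
    content_type = response.headers.get('Content-Type', '')
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(';')[0].strip().lower()
    return media_type in ('text/html', 'application/xhtml+xml')


# Exercise the helper offline with a hand-built Response object.
fake_response = requests.Response()
fake_response.headers['Content-Type'] = 'text/html; charset=UTF-8'
print(looks_like_html(fake_response))
```

In a real script you would pass the object returned by requests.get() to the helper and skip parsing when it returns False.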

Parsing HTML with BeautifulSoup

Once we have the HTML content, we can use BeautifulSoup to parse and extract data:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_content, 'html.parser')

With BeautifulSoup, locating elements by tag is straightforward:

headers = soup.find_all('h1')
for header in headers:
    print(header.text)

To find elements by class or id:

navigation_bar = soup.find('div', {'class': 'nav-bar'})
footer = soup.find('footer', {'id': 'site-footer'})
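
Note that find returns None when nothing matches, so it is usually worth checking the result before reading from it. The sketch below uses made-up markup (the class names and text are assumptions for illustration) and also shows the class_ and id keyword arguments as an alternative to passing a dictionary:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
sample_html = """
<div class="nav-bar main"><a href="/">Home</a></div>
<footer id="site-footer">Footer text</footer>
"""
soup = BeautifulSoup(sample_html, 'html.parser')

# class_ (with a trailing underscore) avoids the reserved word "class".
# It matches any one of the element's classes.
navigation_bar = soup.find('div', class_='nav-bar')
footer = soup.find(id='site-footer')

# find() returns None when nothing matches, so check before using the result.
sidebar = soup.find('div', class_='sidebar')
sidebar_text = sidebar.get_text(strip=True) if sidebar is not None else ''

print(navigation_bar.get_text(strip=True))
print(footer.get_text(strip=True))
print(repr(sidebar_text))
```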

Extracting Attributes and Text

Extracting attributes like href from anchor tags can be done with:

for link in soup.find_all('a'):
    print(link.get('href'))

Similarly, to extract text you can use:

for paragraph in soup.find_all('p'):
    print(paragraph.text)
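
Real pages often contain anchors without an href and text padded with whitespace. As a small sketch on made-up markup, passing href=True to find_all keeps only anchors that have the attribute, and get_text with strip=True trims each text fragment before joining:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
sample_html = """
<p>  First <b>bold</b> paragraph.  </p>
<a href="/about">About</a>
<a name="anchor-only">No link here</a>
"""
soup = BeautifulSoup(sample_html, 'html.parser')

# href=True keeps only the anchors that actually have an href attribute.
hrefs = [link['href'] for link in soup.find_all('a', href=True)]

# Each text fragment is stripped, then the fragments are joined with a space.
paragraphs = [p.get_text(' ', strip=True) for p in soup.find_all('p')]

print(hrefs)
print(paragraphs)
```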

Handling Relative URLs

If you encounter relative URLs, you can resolve them against the page's URL with urljoin from urllib.parse in Python's standard library:

from urllib.parse import urljoin

base_url = 'https://example.com'
for link in soup.find_all('a', href=True):  # skip anchors without an href
    absolute_url = urljoin(base_url, link['href'])
    print(absolute_url)
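
How a link resolves depends on both the form of the href and the URL of the page it appears on. The sketch below uses a made-up page URL and a handful of typical href forms to show what urljoin does with each:

```python
from urllib.parse import urljoin

# Made-up page URL for illustration; in practice response.url is a good base.
page_url = 'https://example.com/blog/post.html'

resolved = {
    href: urljoin(page_url, href)
    for href in [
        'about.html',                # relative to the current directory
        '/contact',                  # relative to the site root
        '../img/logo.png',           # one directory up
        '//cdn.example.com/lib.js',  # protocol-relative: inherits the scheme
        'https://other.org/x',       # already absolute: returned unchanged
        '#top',                      # fragment on the current page
    ]
}

for href, absolute in resolved.items():
    print(f'{href!r} -> {absolute}')
```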

Advanced Parsing: Using Selectors

You can make use of CSS selectors with the select method:

for item in soup.select('div.content > p.entry'):
    print(item.text)
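
The combinator in a selector matters: ">" matches direct children only, while a space matches descendants at any depth. The following sketch, run on made-up markup, compares the two and also shows select_one, which returns the first match or None:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
sample_html = """
<div class="content">
  <p class="entry">First entry</p>
  <p class="note">A note</p>
  <section><p class="entry">Nested entry</p></section>
</div>
"""
soup = BeautifulSoup(sample_html, 'html.parser')

# ">" matches direct children only, so the nested paragraph is skipped.
direct = [p.get_text() for p in soup.select('div.content > p.entry')]

# A space matches descendants at any depth.
any_depth = [p.get_text() for p in soup.select('div.content p.entry')]

# select_one() returns the first match, or None if nothing matches.
first_note = soup.select_one('p.note')
missing = soup.select_one('p.missing')

print(direct)
print(any_depth)
```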

Working with Forms

To work with forms, you can extract form fields and prepare data for submission:

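form = soup.find('form')
# A missing action resolves to base_url itself
form_action = form.get('action', '')
# Only inputs with a name attribute are submitted with the form
form_data = {field['name']: field.get('value', '') for field in form.find_all('input', attrs={'name': True})}
response = requests.post(urljoin(base_url, form_action), data=form_data)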
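
To see what such a payload looks like before anything is sent, you can build it from a sample form and inspect it. This is a sketch on made-up markup: the field names, the token value, and the page URL are all assumptions for illustration, and no request is made:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up page URL and form markup for illustration only.
page_url = 'https://example.com/account/login'
sample_html = """
<form action="/session" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="submit" value="Log in">
</form>
"""
soup = BeautifulSoup(sample_html, 'html.parser')
form = soup.find('form')

# Resolve the action against the page URL; default to GET if no method is given.
target_url = urljoin(page_url, form.get('action', ''))
method = form.get('method', 'get').lower()

# Collect named inputs only; the unnamed submit button is skipped.
form_data = {
    field['name']: field.get('value', '')
    for field in form.find_all('input', attrs={'name': True})
}

# Fill in the fields the user would normally type.
form_data['username'] = 'user'

print(method, target_url)
print(form_data)
```

Hidden fields such as the token above are kept as they appear in the page, which is often what a server expects to receive back with the submission.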

Session Handling

If maintaining sessions is necessary, use the Session object to persist cookies and headers across requests:

with requests.Session() as session:
    session.get('https://example.com/login')
    session.post('https://example.com/login', data={'username': 'user', 'password': 'pass'})
    response = session.get('https://example.com/dashboard')
    # Parse response as before
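
Headers set on a session are merged into every request made through it. The sketch below shows this without touching the network by preparing a request and inspecting the merged headers; the User-Agent string is a made-up value:

```python
import requests

with requests.Session() as session:
    # Session-level headers apply to every request made through the session.
    session.headers.update({'User-Agent': 'example-scraper/0.1'})

    # Request-level headers are merged on top of the session defaults.
    request = requests.Request(
        'GET',
        'https://example.com/dashboard',
        headers={'Accept': 'text/html'},
    )

    # prepare_request() builds the final request without sending it.
    prepared = session.prepare_request(request)
    merged_headers = dict(prepared.headers)

print(merged_headers['User-Agent'])
print(merged_headers['Accept'])
```

Calling session.send(prepared) would actually transmit the request; the example stops short of that so it can run offline.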

Error Handling

It’s crucial to handle potential errors in network communication:

try:
    response = requests.get('https://example.com/nonexistent', timeout=5)
    response.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print(f'HTTP Error: {errh}')
# Catch Timeout first: ConnectTimeout is also a subclass of ConnectionError
except requests.exceptions.Timeout as errt:
    print(f'Timeout Error: {errt}')
except requests.exceptions.ConnectionError as errc:
    print(f'Error Connecting: {errc}')
except requests.exceptions.RequestException as err:
    print(f'Unexpected request error: {err}')
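
When a script fetches many pages, it can be convenient to wrap this pattern in one function. The following is a minimal sketch, assuming a hypothetical helper named fetch_html that returns None on any failure. It catches RequestException, the base class of the exceptions Requests raises, so callers only need to check for None:

```python
import requests


def fetch_html(url, timeout=5):
    """Return the page text, or None if the request fails.

    Hypothetical helper for illustration; not part of the Requests API.
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into HTTPError
    except requests.exceptions.RequestException as err:
        print(f'Request failed: {err}')
        return None
    return response.text


# A URL without a scheme is rejected before any network traffic happens.
html = fetch_html('not-a-valid-url')
print(html)
```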

Conclusion

Python’s Requests module paired with BeautifulSoup makes it simple to fetch and parse HTML content. You can adapt these examples to build web scrapers and to automate interactions with web pages.
