Sling Academy

Python: Get Hostname and Protocol from a URL

Last updated: June 02, 2023

This concise article shows you how to get the hostname and the protocol from a given URL in Python.

Before writing code, let me clarify some things. Suppose we have a URL like this:

https://www.slingacademy.com/cat/sample-data

Then the components of the URL are:

  • https: The protocol (also called the scheme).
  • www.slingacademy.com: The hostname.
  • www: The subdomain.
  • slingacademy.com: The domain (domain name).
  • /cat/sample-data: The path.
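
As a rough illustration, the breakdown above can be reproduced with Python's standard library. The helper name split_components is hypothetical, and the subdomain/domain split is a naive assumption: it treats the last two labels of the hostname as the domain, which is wrong for suffixes such as .co.uk.

```python
from urllib.parse import urlparse


def split_components(url):
    """Return the URL parts listed above as a dict (naive domain split)."""
    parsed = urlparse(url)
    hostname = parsed.hostname or ""
    labels = hostname.split(".")
    # Assumption: the domain is the last two labels of the hostname.
    domain = ".".join(labels[-2:])
    subdomain = ".".join(labels[:-2])
    return {
        "protocol": parsed.scheme,
        "hostname": hostname,
        "subdomain": subdomain,
        "domain": domain,
        "path": parsed.path,
    }


parts = split_components("https://www.slingacademy.com/cat/sample-data")
print(parts)
```

For reliable domain extraction across all public suffixes, a dedicated suffix list would be needed; this sketch only mirrors the simple example URL.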

Using the urllib.parse module

Python's built-in urllib.parse module provides convenient tools for URL parsing. It is not limited to http and https: it splits a URL into its components regardless of the protocol used.

Here are the steps to extract the hostname and protocol from a URL:

  1. Import the urlparse function from the urllib.parse module.
  2. Parse the URL using the urlparse function.
  3. Access the hostname and scheme attributes from the parsed URL object.

A code example is worth more than thousands of boring words:

from urllib.parse import urlparse

def extract_hostname_and_protocol(url):
    parsed_url = urlparse(url)
    hostname = parsed_url.hostname
    protocol = parsed_url.scheme
    return hostname, protocol


# Test it out
url1 = 'https://www.slingacademy.com/cat/sample-data/'
hostname_1, protocol_1 = extract_hostname_and_protocol(url1)
print(f"Hostname: {hostname_1}")
print(f"Protocol: {protocol_1}")

url2 = "https://api.slingacademy.com/v1/examples/sample-page.html"
hostname_2, protocol_2 = extract_hostname_and_protocol(url2)
print(f"Hostname: {hostname_2}")
print(f"Protocol: {protocol_2}")

Output:

Hostname: www.slingacademy.com
Protocol: https
Hostname: api.slingacademy.com
Protocol: https
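
A few edge cases are worth checking before relying on this approach. The sketch below uses made-up example URLs (they are not from the article) to show that hostname is lowercased and excludes the port and credentials, that other protocols parse the same way, and that a URL without a protocol yields an empty scheme and a hostname of None.

```python
from urllib.parse import urlparse

# Port and credentials are part of netloc, but not of hostname.
parsed = urlparse("HTTPS://user:secret@WWW.Example.COM:8080/docs")
print(parsed.scheme)    # https
print(parsed.netloc)    # user:secret@WWW.Example.COM:8080
print(parsed.hostname)  # www.example.com
print(parsed.port)      # 8080

# Protocols other than http/https are handled the same way.
ftp = urlparse("ftp://ftp.example.org/pub/readme.txt")
print(ftp.scheme, ftp.hostname)  # ftp ftp.example.org

# Without "scheme://", the whole string is treated as a path.
bare = urlparse("www.example.com/docs")
print(repr(bare.scheme))  # ''
print(bare.hostname)      # None
print(bare.path)          # www.example.com/docs
```

If your input may lack a protocol, check for an empty scheme or a None hostname before using the result.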

Using regular expressions

The preceding approach is elegant and works well. However, it isn’t the only possible way to get the job done. An alternative solution is to use a regular expression.

The steps are:

  1. Import the re module for regular expressions.
  2. Define a regular expression pattern to match the hostname and protocol.
  3. Use the re.search function to find the pattern within the URL string.
  4. Extract the matched groups for hostname and protocol.

Here’s the pattern we’ll use:

pattern = r"^(?P<protocol>https?)://(?P<hostname>[^/]+)"

Let me explain the pattern above:

  • ^ : Start of the string anchor.
  • (?P<protocol>https?) : Named capturing group protocol to match the protocol. It matches http or https using the ? quantifier to make the s optional.
  • ://: Matches the colon and double slashes.
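  • (?P<hostname>[^/]+) : Named capturing group hostname to match the hostname. It matches one or more characters that are not a forward slash (/), so it captures everything between :// and the first slash of the path. Note that this includes a port number or credentials if the URL contains them.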
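
To see what each named group captures, here is a small sketch that applies the pattern to two made-up URLs. It shows that a port ends up inside the hostname group, and that a non-HTTP protocol produces no match at all.

```python
import re

pattern = r"^(?P<protocol>https?)://(?P<hostname>[^/]+)"

# The hostname group runs up to the first "/", so the port is included.
match = re.search(pattern, "http://localhost:8000/admin/")
groups = match.groupdict()
print(groups)  # {'protocol': 'http', 'hostname': 'localhost:8000'}

# Only http and https are accepted by this pattern.
no_match = re.search(pattern, "ftp://ftp.example.org/pub/")
print(no_match)  # None
```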

Code example:

import re

def extract_hostname_and_protocol(url):
    pattern = r"^(?P<protocol>https?)://(?P<hostname>[^/]+)"
    match = re.search(pattern, url)
    if match:
        protocol = match.group("protocol")
        hostname = match.group("hostname")
        return hostname, protocol
    return None, None


# Test it out
url1 = 'https://www.slingacademy.com/cat/sample-data/'
hostname_1, protocol_1 = extract_hostname_and_protocol(url1)
print(f"Hostname: {hostname_1}")
print(f"Protocol: {protocol_1}")

url2 = "http://api.slingacademy.com/v1/examples/sample-page.html"
hostname_2, protocol_2 = extract_hostname_and_protocol(url2)
print(f"Hostname: {hostname_2}")
print(f"Protocol: {protocol_2}")

Output:

Hostname: www.slingacademy.com
Protocol: https
Hostname: api.slingacademy.com
Protocol: http

Regular expressions allow you to deal with rare and specific use cases by crafting your own custom pattern. However, writing a pattern that handles every valid URL correctly can be difficult, even for experienced programmers.
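
As one example of such a custom pattern, the sketch below accepts any protocol name and leaves credentials and the port out of the hostname. The pattern and the function name extract_hostname_and_protocol_broad are illustrative assumptions, not a complete URL grammar; for instance, bracketed IPv6 hosts such as [::1] are not handled.

```python
import re

# Hypothetical, broader pattern (an illustration, not a full URL parser).
BROAD_PATTERN = re.compile(
    r"^(?P<protocol>[A-Za-z][A-Za-z0-9+.-]*)://"  # any scheme-like protocol name
    r"(?:[^/?#@]*@)?"                              # optional user:password@
    r"(?P<hostname>[^/:?#]+)"                      # host, stopping before port, path, or query
)


def extract_hostname_and_protocol_broad(url):
    """Return (hostname, protocol) in lowercase, or (None, None) if no match."""
    match = BROAD_PATTERN.search(url)
    if match is None:
        return None, None
    return match.group("hostname").lower(), match.group("protocol").lower()


print(extract_hostname_and_protocol_broad("https://user:pw@www.example.com:8443/path?x=1"))
print(extract_hostname_and_protocol_broad("FTP://Files.Example.org"))
print(extract_hostname_and_protocol_broad("not a url"))
```

In most cases urlparse remains the simpler choice; a custom pattern like this mainly makes sense when you need to enforce your own rules about which URLs are acceptable.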
