Python: Remove non-alphanumeric characters from a string

Updated: June 1, 2023 By: Khue

Overview

Non-alphanumeric characters are characters that are not letters or numbers. They include punctuation marks, symbols, whitespace, and control characters. For example, some of the non-alphanumeric characters are:

! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~

When working with Python, there might be cases where you want to remove all non-alphanumeric characters from a given string, such as:

  • When you want to normalize or standardize your string data for analysis or comparison.
  • When you want to extract or validate information in your string data. For example, you may want to strip special characters from usernames, passwords, etc., before checking their validity or extracting their components.

This succinct, practical article will show you a couple of different ways to eliminate all non-alphanumeric characters from a given string in Python. Let’s go!

Removing Non-Alphanumeric Characters from a String

Using regular expressions

Regular expressions are patterns that can match strings based on certain rules. You can import the re module in Python to work with regular expressions.

To remove all non-alphanumeric characters from a string, use a pattern that matches any character that is not an ASCII letter or digit, then replace every match with an empty string using re.sub(). The following pattern serves this purpose:

[^A-Za-z0-9]+

Code example:

import re

string = "1, 2, 3, and 4 are numbers. Hey! 5 is a number too. But contact@slingacademy.com is not a number. _-+-&$"
pattern = "[^A-Za-z0-9]+"
new_string = re.sub(pattern, "", string)
print(new_string)

Output:

123and4arenumbersHey5isanumbertooButcontactslingacademycomisnotanumber
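
When the same cleanup is applied to many strings, the pattern can be compiled once and reused. The following is a minimal sketch of that idea; the helper name strip_non_alnum and the sample strings are made up for illustration. It also shows a side effect of this ASCII-only pattern: accented letters are removed along with punctuation.

```python
import re

# Compile once so the same pattern can be reused across many strings.
NON_ALNUM = re.compile(r"[^A-Za-z0-9]+")

def strip_non_alnum(text):  # hypothetical helper name
    """Return text with every character outside A-Z, a-z, 0-9 removed."""
    return NON_ALNUM.sub("", text)

# Sample inputs are illustrative only.
samples = ["user-name_01", "Hello, World!", "Crème brûlée 2023"]
cleaned = [strip_non_alnum(s) for s in samples]
print(cleaned)
# ['username01', 'HelloWorld', 'Crmebrle2023']
```

Note that "Crème brûlée" loses its accented letters here, because è, û, and é fall outside the A-Z and a-z ranges.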

Another common regular expression pattern for our purpose is:

\W+

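It is similar to the earlier pattern, with two differences. First, \W treats the underscore character (_) as a word character, so this pattern won’t remove underscores. Second, for str patterns in Python 3, \W is Unicode-aware by default, so letters and digits outside the ASCII range are kept as well.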

Example:

import re

string = "one_man_show @-+!@#$%^&*()_+ 1234567890 Rockstar!"
pattern = r"\W+"  # raw string, so that \W is not treated as an invalid string escape
new_string = re.sub(pattern, "", string)
print(new_string)

Output:

one_man_show_1234567890Rockstar
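
If you like the brevity of \W but want underscores or non-ASCII letters removed too, the pattern can be adjusted. The sketch below uses a made-up sample string and compares three variants: plain \W+, the character class [\W_]+ (which adds the underscore to the characters to remove), and the same class with the re.ASCII flag (which limits word characters to ASCII).

```python
import re

# Illustrative sample containing underscores and accented letters.
text = "snake_case naïve café_42!"

# \W keeps Unicode letters and digits, and also keeps the underscore.
keep_underscore = re.sub(r"\W+", "", text)

# [\W_] adds the underscore to the set of characters to remove.
drop_underscore = re.sub(r"[\W_]+", "", text)

# re.ASCII restricts word characters to ASCII, so accented letters go too.
ascii_only = re.sub(r"[\W_]+", "", text, flags=re.ASCII)

print(keep_underscore)  # snake_casenaïvecafé_42
print(drop_underscore)  # snakecasenaïvecafé42
print(ascii_only)       # snakecasenavecaf42
```

The last variant behaves like [^A-Za-z0-9]+ from the first example.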

Using list comprehension with join() and isalnum()

The concept of this approach is simple: iterate over each character in the input string and keep only the alphanumeric ones, using the str.isalnum() method to test each character. The steps are:

  • Iterate over each character in the input string. The example does this with a generator expression, which reads like a list comprehension but does not build an intermediate list.
  • Use isalnum() to check whether each character is alphanumeric.
  • Join the characters that pass the check into the cleaned string with the join() method.

Example:

# define a function to remove non-alphanumeric characters
def remove_non_alphanumeric_isalnum(text):
    cleaned_text = ''.join(c for c in text if c.isalnum())
    return cleaned_text

text = "1, 2, 3, and 4 are numbers. Hey! 5 is a number too. But contact@slingacademy.com is not a number. _-+-&$"
result = remove_non_alphanumeric_isalnum(text)
print(result)

Output:

123and4arenumbersHey5isanumbertooButcontactslingacademycomisnotanumber
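
One detail worth knowing: isalnum() is Unicode-aware, so it also returns True for letters and digits outside the ASCII range. If you need ASCII-only output, one possible variation is to combine it with str.isascii(), which requires Python 3.7 or later. The helper name and the sample string below are assumptions for illustration.

```python
def remove_non_ascii_alnum(text):  # hypothetical helper name
    # Keep a character only if it is both ASCII and alphanumeric.
    return "".join(c for c in text if c.isascii() and c.isalnum())

# Illustrative sample with one accented letter and an underscore.
sample = "Zoë has 2 cats_!"

unicode_aware = "".join(c for c in sample if c.isalnum())
ascii_only = remove_non_ascii_alnum(sample)

print(unicode_aware)  # Zoëhas2cats
print(ascii_only)     # Zohas2cats
```

In both variants the underscore is removed, because "_".isalnum() returns False.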

Conclusion

You’ve learned two distinct techniques to delete special characters, punctuation, and spaces from a Python string. Both of them are concise, elegant, and work well. Choose the one you prefer. Happy coding & enjoy your day. Goodbye!