Python: How to Extract Numbers from Text (2 Approaches)

Updated: June 4, 2023 By: Khue

This practical, example-based article shows you two ways to extract numbers (integers and floats, positive and negative) from text in Python. There’s no time to waste; let’s get our hands dirty with code.

Using regular expressions

Regular expressions give you the flexibility to extract numbers in many different formats. However, defining a proper pattern can be tough, even for experienced programmers. Below is the regular expression pattern we’ll use in the example to come:

r"[-+]?(?:\d+(?:,\d\d\d)*(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"

This pattern is designed to match a wide range of number formats, including positive or negative whole numbers, decimal numbers, and numbers with exponential notation. It also handles optional thousands separators (comma) and accounts for the presence of a decimal part (dot) or exponent.
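To make the pieces of the pattern easier to follow, here is a sketch of the same expression written with re.VERBOSE, which lets each part carry a comment. The name NUMBER_PATTERN and the sample string are assumptions made for illustration only; they are not part of the original example.

```python
import re

# The same pattern, spread over several lines with re.VERBOSE
# so that each component can be annotated.
NUMBER_PATTERN = re.compile(
    r"""
    [-+]?                 # optional sign
    (?:
        \d+               # one or more digits
        (?:,\d\d\d)*      # optional groups of a comma plus three digits
        (?:\.\d*)?        # optional decimal point followed by zero or more digits
      |
        \.\d+             # or: a leading decimal point followed by digits
    )
    (?:[eE][-+]?\d+)?     # optional exponent part
    """,
    re.VERBOSE,
)

# Hypothetical sample covering a sign, a leading dot, a comma group, and an exponent
samples = "+7, -0.25, .5, 1,000 and 6.02e23"
matches = NUMBER_PATTERN.findall(samples)
print(matches)  # ['+7', '-0.25', '.5', '1,000', '6.02e23']
```

Because every group in the pattern is non-capturing, findall() returns the full matched strings rather than tuples of groups.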

Full example:

# slingacademy.com
import re

# Define a function to extract numbers from a string
def extract_numbers(text):
    pattern = r"[-+]?(?:\d+(?:,\d\d\d)*(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"
    numbers = re.findall(pattern, text)
    return numbers

text = """123 is a positive integer. This year is 2023. 3.14 is a float number. Examples of negative float numbers are -12.345 and -6.789.
And here is a number with comma separators: 1,234,567.89.
"""

numbers = extract_numbers(text)
print(numbers)

Output:

['123', '2023.', '3.14', '-12.345', '-6.789', '1,234,567.89']

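The result is a list of numeric strings. Notice that '2023.' keeps the period that ends the sentence, because the pattern allows a decimal point with no digits after it; float() still parses such a string without trouble. If you need a list of floats, remove the thousands separators and convert each item like this: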

numbers = [float(number.replace(",", "")) for number in numbers]
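If you would rather keep whole numbers as integers instead of turning everything into floats, one possible variation is sketched below. The helper name to_number is hypothetical, and the input list simply reuses some of the strings from the output above.

```python
def to_number(token):
    """Convert a matched numeric string to int when possible, otherwise to float."""
    cleaned = token.replace(",", "")  # drop thousands separators
    try:
        return int(cleaned)
    except ValueError:
        # int() rejects decimal points and exponents, so fall back to float()
        return float(cleaned)

tokens = ['123', '2023.', '3.14', '-12.345', '1,234,567.89']
converted = [to_number(t) for t in tokens]
print(converted)  # [123, 2023.0, 3.14, -12.345, 1234567.89]
```

Note that '2023.' still becomes a float here, because the trailing period makes int() reject it.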

Using string methods and operations

You can use split() to break the text into a list of words, and then use a list comprehension with isdigit() to keep only the words made up entirely of digits. This approach is far simpler than the previous one, but the trade-off is that it can only handle simple use cases.

Code example:

text = "There are 6 turtles in the pond, and 3 of them are red-eared sliders."

numbers = [int(s) for s in text.split() if s.isdigit()]
print(numbers)

Output:

[6, 3]

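Note that the code snippet above only works with positive integers that stand alone as words. It will overlook negative numbers, floats, and other number formats, and it will also miss a number that is attached to punctuation (such as "6," or "3."), because isdigit() returns False for those words.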
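If you want to stay with string methods but cover a few more cases, one possible extension is to trim surrounding punctuation from each word and let float() decide whether what remains is a number. This is only a sketch under stated assumptions: the function name extract_numbers_simple, the set of punctuation characters, and the sample sentence are all made up for illustration, and numbers with thousands separators are still not handled.

```python
def extract_numbers_simple(text):
    """Collect numbers from whitespace-separated words using only string methods."""
    numbers = []
    for word in text.split():
        # Trim punctuation after the number and opening brackets/quotes before it.
        # A leading "." is kept so that ".5" is still read as 0.5.
        candidate = word.rstrip(".,;:!?)\"'").lstrip("(\"'")
        # Require at least one digit so that words such as "nan" or "inf",
        # which float() would accept, are skipped.
        if not any(ch.isdigit() for ch in candidate):
            continue
        try:
            numbers.append(float(candidate))
        except ValueError:
            continue  # not a number, move on to the next word
    return numbers

text = "It was -5 degrees at 6 am, then rose to 3.5, and 2 of us left."
print(extract_numbers_simple(text))  # [-5.0, 6.0, 3.5, 2.0]
```

Every value comes back as a float, so convert to int afterwards if you need integers.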