This practical, example-based article will show you a few ways to extract numbers (integers and floats, positive and negative) from text in Python. There’s no time to waste; let’s get our hands dirty with code.
Using regular expressions
Regular expressions give you the flexibility to extract numbers written in many different formats. However, defining a proper pattern can be tough, even for experienced programmers. Below is the regular expression pattern we’ll use in the upcoming example:
[-+]?(?:\d+(?:,\d\d\d)*(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?
This pattern is designed to match a wide range of number formats: an optional sign (+ or -), whole numbers, decimal numbers (including ones that start with a dot, such as .5), and numbers in exponential notation (such as 6.02e23). It also accepts commas as thousands separators, provided each group after a comma has exactly three digits.
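To see how each piece of the pattern maps to that description, here is the same pattern rewritten with Python’s re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments inside it. The name NUMBER_PATTERN and the sample string are illustrative only.

```python
import re

# Same pattern as in the example below, split into commented parts.
# re.VERBOSE ignores whitespace and treats "#" as the start of a comment.
NUMBER_PATTERN = re.compile(
    r"""
    [-+]?                 # optional sign
    (?:
        \d+               # integer part, e.g. 1
        (?:,\d\d\d)*      # optional thousands groups, e.g. ,234,567
        (?:\.\d*)?        # optional decimal part; digits after the dot are optional
      |
        \.\d+             # or a number that starts with the dot, e.g. .5
    )
    (?:[eE][-+]?\d+)?     # optional exponent, e.g. e-3
    """,
    re.VERBOSE,
)

print(NUMBER_PATTERN.findall("Values: -1,000.5, .25, 6.02e23 and +7"))
# ['-1,000.5', '.25', '6.02e23', '+7']
```

The compact one-line form and the verbose form match exactly the same strings; the verbose form is simply easier to maintain.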
```python
# slingacademy.com
import re


# Define a function to extract numbers from a string
def extract_numbers(text):
    pattern = r"[-+]?(?:\d+(?:,\d\d\d)*(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"
    # findall returns every non-overlapping match as a string
    numbers = re.findall(pattern, text)
    return numbers


text = """123 is a positive integer. This year is 2023.
3.14 is a float number.
Examples of negative float numbers are -12.345 and -6.789.
And here is a number with comma separators: 1,234,567.89.
"""

numbers = extract_numbers(text)
print(numbers)
```
['123', '2023.', '3.14', '-12.345', '-6.789', '1,234,567.89']
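The result is a list of numeric strings. Note that "2023." keeps its trailing period: the decimal part of the pattern allows zero digits after the dot, so the period that ends the sentence is captured too. This is harmless for conversion, because float("2023.") returns 2023.0. If you need a list of floats, remove the thousands separators and convert each string: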
```python
# Strip the thousands separators, then convert each numeric string to a float
numbers = [float(number.replace(",", "")) for number in numbers]
```
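If you would rather keep whole numbers as int and only turn decimals and exponents into float, a small helper can decide per string. The function to_number below is a hypothetical name used for illustration; it assumes its input is a string matched by the pattern above.

```python
def to_number(numeric_string):
    """Hypothetical helper: convert a matched numeric string to int or float.

    Strings containing a decimal point or an exponent become floats;
    everything else becomes an int.
    """
    cleaned = numeric_string.replace(",", "")  # drop thousands separators
    if any(ch in cleaned for ch in ".eE"):
        return float(cleaned)
    return int(cleaned)


strings = ['123', '2023.', '3.14', '-12.345', '-6.789', '1,234,567.89']
print([to_number(s) for s in strings])
# [123, 2023.0, 3.14, -12.345, -6.789, 1234567.89]
```

Because "2023." contains a dot, it comes back as the float 2023.0; if that matters, tighten the decimal part of the pattern so it requires at least one digit after the dot.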
Using string methods and operations
You can use split() to break the text into a list of words, then use a list comprehension with isdigit() to keep only the words made up entirely of digits. This approach is far simpler than the previous one, but the trade-off is that it can only handle simple use cases.
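```python
text = "There are 6 turtles in the pond, and 3 of them are red-eared sliders."

# Keep only the whitespace-separated words that consist entirely of digits
numbers = [int(s) for s in text.split() if s.isdigit()]
print(numbers)
# Output: [6, 3]
```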
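Note that the code snippet above only recognizes non-negative integers written without a sign. It overlooks negative numbers, floats, and numbers with thousands separators, as well as any number attached to punctuation, such as "2023." at the end of a sentence.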
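If you want to stay with plain string methods but cover more cases, one option is to replace the isdigit() check with an attempt to call float(). The sketch below uses a hypothetical helper, extract_numbers_simple, that is not part of the original approach. It assumes numbers are separated by whitespace, treats inner commas as thousands separators (so "3,4" would become 34.0), and skips numbers glued to letters such as "5kg".

```python
def extract_numbers_simple(text):
    """Hypothetical helper: whitespace-split extraction that also accepts
    negative numbers, floats, and numbers followed by punctuation."""
    numbers = []
    for token in text.split():
        # Drop surrounding brackets/quotes and trailing sentence punctuation,
        # e.g. "2023." -> "2023" and "3.75," -> "3.75".
        cleaned = token.strip("()[]{}\"'").rstrip(".,;:!?")
        # Treat inner commas as thousands separators: "1,200" -> "1200".
        cleaned = cleaned.replace(",", "")
        # float() also accepts words such as "nan" and "inf", so require a digit.
        if not any(ch.isdigit() for ch in cleaned):
            continue
        try:
            numbers.append(float(cleaned))
        except ValueError:
            pass  # not a number, e.g. "3rd" or "5kg"
    return numbers


text = "This year is 2023. Temperatures ranged from -12.5 to 3.75, and 6 turtles swam in a 1,200 m pond."
print(extract_numbers_simple(text))
# [2023.0, -12.5, 3.75, 6.0, 1200.0]
```

Every value comes back as a float here for simplicity; the regular expression approach remains the better fit when numbers can appear inside words or in mixed formats.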