This concise article walks you through a few ways to find the frequency (the number of occurrences) of each word in a string in Python.
Using the Counter class
This approach is very concise since it utilizes the Counter class from the collections module. The steps are:
- Import the Counter class from the collections module.
- Split the string into words using the split() method.
- Create a Counter object from the list of words and store it in a variable.
Example:
from collections import Counter
string = "blue blue red red red red green green blue yellow"
word_list = string.split()
word_frequency = Counter(word_list)
print(word_frequency)
Output:
Counter({'red': 4, 'blue': 3, 'green': 2, 'yellow': 1})
You can also iterate through the Counter object like so:
for word, count in word_frequency.items():
    print(f"{word}: {count}")
Output:
blue: 3
red: 4
green: 2
yellow: 1
If you only want the most common words and their counts, call the most_common() method on the Counter object and pass it the number of words you want to retrieve. It returns a list of (word, count) tuples, ordered from most to least frequent:
from collections import Counter
string = "blue blue red red red red green green blue yellow"
word_list = string.split()
word_frequency = Counter(word_list)
two_most_common = word_frequency.most_common(2)
print(two_most_common)
Output:
[('red', 4), ('blue', 3)]
This approach is convenient and efficient and doesn’t rely on a third-party library.
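One caveat: split() separates on whitespace only, so "Blue," and "blue" would be counted as different words. The following is a minimal sketch of one way to normalize the text before counting. The function name count_words_normalized and the regular expression are illustrative assumptions, not part of the Counter API, and what counts as a "word" depends on your data.

```python
import re
from collections import Counter


def count_words_normalized(text):
    """Count words case-insensitively, ignoring surrounding punctuation.

    Hypothetical helper for illustration only.
    """
    # Lowercase first so "Blue" and "blue" are treated as the same word.
    # The pattern keeps runs of ASCII letters, digits, and apostrophes;
    # this is an assumption about what a word is, not a universal rule.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


sample = "Blue, blue! Red red. Green?"
print(count_words_normalized(sample))
```

With this normalization, "Blue," and "blue!" both contribute to the count for "blue". Adjust the pattern if your text contains non-ASCII letters or hyphenated words.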
Using a dictionary
We will use a dictionary to store words and their counts. Here are the steps to follow:
- Create an empty dictionary.
- Split the string into words using the split() method.
- Iterate through each word and update its count in the dictionary.
Code implementation:
string = "Sling Academy ball box Sling box ball Academy hello hello hello"
word_list = string.split()
word_frequency = {}
for word in word_list:
    # get() returns 0 the first time a word is seen
    word_frequency[word] = word_frequency.get(word, 0) + 1
for word, count in word_frequency.items():
    print(f"{word}: {count}")
Output:
Sling: 2
Academy: 2
ball: 2
box: 2
hello: 3
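This technique is simple and intuitive. It makes a single pass over the list of words and needs no imports, although you have to sort the dictionary yourself if you want the most frequent words first.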
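As a variation on the dictionary approach, the sketch below uses defaultdict from the collections module so that missing keys start at 0 automatically, and then sorts the result by count. This swaps a plain dict for defaultdict; the helper names count_words_defaultdict and sort_by_count are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def count_words_defaultdict(text):
    """Count whitespace-separated words using defaultdict(int)."""
    counts = defaultdict(int)  # missing keys default to 0
    for word in text.split():
        counts[word] += 1
    # Convert back to a plain dict so later lookups of unknown
    # words do not silently create new keys.
    return dict(counts)


def sort_by_count(counts):
    """Return (word, count) pairs ordered from most to least frequent."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


string = "Sling Academy ball box Sling box ball Academy hello hello hello"
frequency = count_words_defaultdict(string)
for word, count in sort_by_count(frequency):
    print(f"{word}: {count}")
```

Sorting the items by count gives roughly what most_common() provides on a Counter, at the cost of a few extra lines.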