MongoDB full-text index: A practical guide (with examples)

Introduction
What is Full-Text Indexing?
Creating a Full-Text Index
Advanced Text Search
Text Search in Aggregation Pipelines
Handling Multiple Languages
Optimizing Text Indexes
Combining Text Search with Other Queries
Index Management
Best Practices
Conclusion

Introduction

Searching large amounts of text efficiently is a common requirement. MongoDB, a widely used NoSQL database, supports this through text indexes, which let you run keyword searches without scanning every document. In this tutorial, we’ll look at what full-text indexing is, why it’s useful, and how to implement it in a MongoDB database, with practical examples.

What is Full-Text Indexing?

Full-text indexing is a search-engine technique for searching the text of documents by words rather than by exact strings. A ‘LIKE’ pattern match in a relational database compares the pattern against each stored value, and it usually cannot use an ordinary index when the pattern starts with a wildcard. A full-text engine instead tokenizes the text into individual words and builds an index that maps each word to the documents containing it, so a lookup does not have to scan every document.

MongoDB provides full-text search through text indexes. This special index type tokenizes string fields, discards common stop words (such as “the” or “and” in English), and stems the remaining words so that, for example, “searches” and “searching” match the same root. Text search is also available in the aggregation pipeline, where it can be combined with other stages.
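To make the idea of tokenizing and indexing concrete, here is a toy inverted index in plain JavaScript (runnable with Node.js, no MongoDB needed). It is a simplified model, not MongoDB's implementation: the three-word `STOP_WORDS` set is a hypothetical stand-in for a real stop-word list, and the sketch does no stemming.

```javascript
// Toy model of a full-text index: NOT MongoDB's implementation.
// Hypothetical, deliberately tiny stop-word list (real lists are much longer).
const STOP_WORDS = new Set(["the", "a", "and"]);

// Lowercase the text, split on anything that is not a letter or digit,
// and drop empty strings and stop words.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

// Build an inverted index: term -> Set of document ids containing it,
// looking at every field listed in `fields`.
function buildIndex(docs, fields) {
  const index = new Map();
  for (const doc of docs) {
    for (const field of fields) {
      for (const term of tokenize(doc[field] ?? "")) {
        if (!index.has(term)) index.set(term, new Set());
        index.get(term).add(doc._id);
      }
    }
  }
  return index;
}

// Look up a single word: one Map access instead of scanning every document.
function search(index, word) {
  const [term] = tokenize(word);
  return term ? [...(index.get(term) ?? [])] : [];
}

const docs = [
  { _id: 1, title: "Intro to MongoDB", content: "The basics of documents" },
  { _id: 2, title: "SQL joins", content: "A look at LIKE and indexes" },
  { _id: 3, title: "Scaling", content: "Sharding MongoDB and the oplog" },
];
const index = buildIndex(docs, ["title", "content"]);
console.log(search(index, "MongoDB")); // [ 1, 3 ]
console.log(search(index, "the"));     // [] (stop word, never indexed)
```

Because both fields feed the same index, a word found in either the title or the content points back to the same document id.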

Creating a Full-Text Index

Let’s start by creating a basic text index. To do this, we need a collection with some data to search through. Suppose we have an ‘articles’ collection whose documents follow this structure:

{
  title: String,
  content: String,
  tags: [String],
  publishDate: Date
}

To create a text index on the ‘title’ and ‘content’ fields, we would use the following command:

db.articles.createIndex({ title: "text", content: "text" });

Now, let’s try a simple search looking for articles containing the word “MongoDB”:

db.articles.find({ $text: { $search: "MongoDB" } });

This returns documents in which ‘MongoDB’ appears at least once in the title or the content. Matching is case-insensitive by default, so a lowercase ‘mongodb’ also matches.

Advanced Text Search

Text search in MongoDB supports several refinements: you can match an exact phrase by wrapping it in double quotes inside the search string, and exclude a word by prefixing it with a minus sign.

db.articles.find({
  $text: { $search: '"MongoDB indexing" -bugs' }
});

This query searches for documents that contain the exact phrase “MongoDB indexing” but do not contain the word ‘bugs’.
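Because the phrase and exclusion syntax lives inside a single string, it can help to assemble that string programmatically rather than by hand. The following is a minimal sketch in plain JavaScript; `buildTextFilter` is a hypothetical helper, not part of MongoDB or its drivers, and it simply strips double quotes from phrase input instead of trying to escape them.

```javascript
// Hypothetical helper (not a MongoDB API): assemble a $text filter object
// from separate lists of plain terms, exact phrases, and excluded words.
function buildTextFilter({ terms = [], phrases = [], exclude = [] } = {}) {
  const parts = [
    // Wrap each phrase in double quotes; strip any quotes inside it so the
    // phrase cannot accidentally close early.
    ...phrases.map((p) => `"${p.replace(/"/g, "")}"`),
    // Plain terms are passed through unchanged.
    ...terms,
    // A leading minus sign tells $text to exclude documents with that word.
    ...exclude.map((w) => `-${w}`),
  ];
  return { $text: { $search: parts.join(" ") } };
}

const filter = buildTextFilter({ phrases: ["MongoDB indexing"], exclude: ["bugs"] });
console.log(filter.$text.$search); // "MongoDB indexing" -bugs
```

The resulting object has the same shape as the filter written by hand above, so it could be passed to `db.articles.find(filter)` in the shell or to a driver's `find` method.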

Text Search in Aggregation Pipelines

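MongoDB also lets you use text search inside an aggregation pipeline, so later stages can sort, reshape, or group the results. The following example uses $match to find documents, $addFields to expose each document’s relevance score, and $sort to order by that score. Note that a $match stage containing $text must be the first stage of the pipeline:

db.articles.aggregate([
  { $match: { $text: { $search: "MongoDB" } } },      // must be the first stage
  { $addFields: { score: { $meta: "textScore" } } },  // expose the relevance score
  { $sort: { score: { $meta: "textScore" } } }        // most relevant first
]);

This returns documents that mention ‘MongoDB’, sorted from most to least relevant, with each document’s relevance score stored in its ‘score’ field.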
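If the search term comes from user input, the pipeline can be built as plain data. Here is a minimal sketch; `textSearchPipeline` and its `limit` parameter are hypothetical, and the function only constructs the stage array (it does not connect to a server). It keeps the `$match` stage first, because MongoDB requires a `$match` containing `$text` to be the first stage of the pipeline.

```javascript
// Hypothetical helper: returns an aggregation pipeline (an array of stage
// objects) that runs a text search and orders results by relevance.
function textSearchPipeline(search, limit = 10) {
  return [
    // A $match stage that uses $text must come first in the pipeline.
    { $match: { $text: { $search: search } } },
    // Copy the relevance score into a regular field so clients can see it.
    { $addFields: { score: { $meta: "textScore" } } },
    // Sort on the copied field, highest score first.
    { $sort: { score: -1 } },
    // Keep only the top results.
    { $limit: limit },
  ];
}

const pipeline = textSearchPipeline("MongoDB", 5);
console.log(JSON.stringify(pipeline[0])); // {"$match":{"$text":{"$search":"MongoDB"}}}
```

In the shell this could be run as `db.articles.aggregate(textSearchPipeline("MongoDB", 5))`. Once the score has been copied into a regular field, sorting on it with `-1` gives the same order as sorting on the `textScore` metadata.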

Handling Multiple Languages

Text indexes in MongoDB also support multiple languages, applying language-specific stemming and stop-word lists. To set the default language for a text index (English is already the default), use the ‘default_language’ option:

db.articles.createIndex(
  { title: "text", content: "text" },
  { default_language: "english" }
);

You can also override the language for an individual document by adding a ‘language’ field to it:

{
  title: "Un exemple",
  content: "Un document d'exemple",
  language: "french"
}

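If your documents store their language in a field with a different name, set the ‘language_override’ option when creating the index to tell MongoDB which field to read. The default override field is ‘language’.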
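The way a text index chooses a language for each document can be modeled in a few lines. This is a simplified sketch in plain JavaScript, not MongoDB code: `effectiveLanguage` is a hypothetical function, and it ignores details such as embedded documents carrying their own language field and validation of language names. Each document uses the value of its override field when present, and otherwise the index's `default_language`.

```javascript
// Simplified model (not MongoDB code) of how a text index picks the
// language used to stem and filter stop words for each document.
function effectiveLanguage(doc, { defaultLanguage = "english", languageOverride = "language" } = {}) {
  // A document-level field named by `languageOverride` wins...
  const override = doc[languageOverride];
  // ...otherwise fall back to the index-wide default_language.
  return typeof override === "string" && override.length > 0 ? override : defaultLanguage;
}

// "idioma" is just an example field name for the override.
const opts = { defaultLanguage: "english", languageOverride: "idioma" };
console.log(effectiveLanguage({ title: "Hola", idioma: "spanish" }, opts)); // spanish
console.log(effectiveLanguage({ title: "Hello" }, opts));                   // english
```

The matching index options in the shell would be `{ default_language: "english", language_override: "idioma" }`.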

Optimizing Text Indexes

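When creating a text index, you can assign weights to specific fields so that matches in some fields count more than others, which helps prioritize search results. (Stop words are not configurable per index: they come from the index language, and setting the language to ‘none’ turns off both stop-word removal and stemming.)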

db.articles.createIndex(
  { title: "text", content: "text" },
  { weights: { title: 10, content: 5 } }
);

This makes a match in the title count twice as much as a match in the content (10 versus 5). Indexed fields without an explicit weight get the default weight of 1.
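MongoDB's documentation describes weighting as multiplying the number of matches in each indexed field by that field's weight and summing the results; the final `textScore` is then derived from that sum. The toy below models only the weighted sum (no stemming, no normalization), so its numbers will not equal MongoDB's `textScore`, but it shows why a title match outranks a content match. `weightedMatches` and the sample documents are hypothetical.

```javascript
// Toy relevance model (NOT MongoDB's textScore formula): for each field,
// count how many tokens equal the search term, multiply by the field's
// weight, and add the results together.
function weightedMatches(doc, term, weights) {
  const needle = term.toLowerCase();
  let total = 0;
  for (const [field, weight] of Object.entries(weights)) {
    const tokens = String(doc[field] ?? "").toLowerCase().split(/[^a-z0-9]+/);
    const matches = tokens.filter((t) => t === needle).length;
    total += matches * weight;
  }
  return total;
}

const weights = { title: 10, content: 5 }; // same weights as the index above
const inTitle = { title: "MongoDB tips", content: "Short notes" };
const inContent = { title: "Database tips", content: "Notes on MongoDB" };
console.log(weightedMatches(inTitle, "MongoDB", weights));   // 10
console.log(weightedMatches(inContent, "MongoDB", weights)); // 5
```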

Combining Text Search with Other Queries

One of the strengths of MongoDB’s text search is that it can be combined with other query conditions to narrow down the results. Here is an example that combines text search with a date range query:

db.articles.find({
  $text: { $search: "MongoDB indexing" },
  publishDate: { $gt: new Date('2020-01-01') }
});

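This query returns documents that contain ‘MongoDB’ or ‘indexing’ (unquoted terms in $search are ORed together) and whose ‘publishDate’ is after January 1, 2020. To require the exact phrase instead, wrap it in double quotes as in the earlier phrase example.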
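To see how the two conditions interact, here is a toy in-memory version of the same filter in plain JavaScript. It is an illustration only: `matchesFilter` is hypothetical, it ignores stemming, stop words, phrases, and negations, and it models the documented rule that unquoted terms in `$search` are ORed together, with the date range applied as an additional AND condition.

```javascript
// Toy model of { $text: { $search: "MongoDB indexing" }, publishDate: { $gt: since } }.
// Not MongoDB code: no stemming, no stop words, no phrases or negations.
function matchesFilter(doc, search, since) {
  const terms = search.toLowerCase().split(/\s+/).filter(Boolean);
  const words = new Set(
    `${doc.title} ${doc.content}`.toLowerCase().split(/[^a-z0-9]+/)
  );
  // Unquoted terms are ORed: any one of them is enough.
  const textHit = terms.some((t) => words.has(t));
  // The range condition is ANDed with the text condition.
  return textHit && doc.publishDate > since;
}

const since = new Date("2020-01-01");
const articles = [
  { _id: 1, title: "Indexing basics", content: "B-trees", publishDate: new Date("2021-03-01") },
  { _id: 2, title: "MongoDB news", content: "Release notes", publishDate: new Date("2019-06-01") },
  { _id: 3, title: "Cooking", content: "Pasta", publishDate: new Date("2022-01-01") },
];
console.log(articles.filter((a) => matchesFilter(a, "MongoDB indexing", since)).map((a) => a._id)); // [ 1 ]
```

Article 1 matches on ‘indexing’ alone, while article 2 mentions ‘MongoDB’ but fails the date condition.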

Index Management

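Text indexes are managed like any other MongoDB index. Review them periodically and drop those that are no longer needed; ‘db.articles.getIndexes()’ lists every index on the collection together with its name. Because a collection can have at most one text index, you also need to drop the existing text index before creating one with different fields or options, such as the variants shown earlier. To drop the index created above by its default name: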

db.articles.dropIndex("title_text_content_text");

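Monitor the storage and performance impact of text indexes, especially on large collections: a text index stores an entry for each unique stemmed word in each indexed field of every document, so it can grow large and slows down inserts and updates.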
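The name passed to `dropIndex` above follows MongoDB's default naming convention, which joins each indexed field and its index type with underscores. The small sketch below reproduces that convention; `defaultIndexName` is hypothetical, and `db.articles.getIndexes()` remains the authoritative way to see the real name, especially if a custom `name` option was given when the index was created.

```javascript
// Hypothetical helper that mimics MongoDB's default index naming:
// each "<field>_<type>" pair, joined with underscores, in key order.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, type]) => `${field}_${type}`)
    .join("_");
}

console.log(defaultIndexName({ title: "text", content: "text" })); // title_text_content_text
```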

Best Practices

Here are some best practices for implementing efficient full-text search in MongoDB:

Create text indexes only on the fields that will be searched.
Consider the language of your text data when configuring indexes.
Use weighting to prioritize fields according to the relevance of their content to searches.
Monitor performance and be prepared to refine your indexing strategy as your application evolves.

Conclusion

MongoDB’s text indexes are a practical tool for text-based queries. Start small, measure how indexing affects your specific use case, and scale as needed; done carefully, this gives your applications faster and more relevant search.
