PHP: Remove multiple consecutive empty lines from a string

Updated: January 9, 2024 By: Guest Contributor

Introduction

Working with text data often requires cleaning up unnecessary whitespace to improve processing and presentation, particularly when dealing with multiple redundant newlines. In this tutorial, we’ll learn how to efficiently remove consecutive empty lines from a string in PHP, keeping our content neat and readable.

Understanding Whitespace and Newlines in PHP

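Before diving into code, it’s important to understand what we’re dealing with. Whitespace includes spaces, tabs, and newline characters, which can accumulate in text content. In PHP, a line break is written inside a double-quoted string as \n (LF, Line Feed, typical of Unix-like systems), \r\n (CRLF, Carriage Return plus Line Feed, typical of Windows), or occasionally a lone \r (CR). Which one appears depends on where the text came from, not only on the operating system PHP runs on.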

Basic String Replacement

As a first attempt, we can use str_replace() to replace two consecutive newlines with a single newline:

$string = "Hello\n\nWorld";
$string = str_replace("\n\n", "\n", $string);

However, str_replace() makes a single pass over the string, so each pair of newlines becomes one newline. A run of three or four newlines is therefore shortened to two rather than collapsed to one.
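
To make that limitation concrete, here is a minimal sketch that shows the single-pass result and then repeats the replacement until no pair remains. The helper name collapseWithStrReplace() is hypothetical and only used for illustration.

```php
<?php
// One pass: each "\n\n" pair becomes "\n", so four newlines become two.
$once = str_replace("\n\n", "\n", "Hello\n\n\n\nWorld");
// $once is "Hello\n\nWorld"

// Hypothetical helper: repeat the replacement until no pair is left.
function collapseWithStrReplace(string $text): string
{
    while (strpos($text, "\n\n") !== false) {
        $text = str_replace("\n\n", "\n", $text);
    }
    return $text;
}

$clean = collapseWithStrReplace("Hello\n\n\n\nWorld");
// $clean is "Hello\nWorld"
```

The loop works, but it rescans the string on every iteration and only understands LF line endings, which is why a regular expression is usually the more convenient tool.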

Using Regular Expressions

Regular expressions provide a more powerful tool for this task. We can use preg_replace():

$string = "Hello\n\n\nWorld";
$string = preg_replace('/\n{2,}/', "\n", $string);

This regex replaces two or more consecutive newlines with a single newline.
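
One detail that is easy to get wrong here is the quoting of the replacement argument. The pattern can stay in single quotes because the regex engine itself understands \n, but the replacement is inserted as plain text, so it needs double quotes for PHP to turn \n into a real line feed. A short comparison, using the same input as above:

```php
<?php
$input = "Hello\n\n\nWorld";

// Double-quoted replacement: PHP converts \n to a real line feed.
$good = preg_replace('/\n{2,}/', "\n", $input);

// Single-quoted replacement: the two characters \ and n are inserted as-is.
$bad = preg_replace('/\n{2,}/', '\n', $input);

var_dump($good === "Hello\nWorld"); // bool(true)
var_dump($bad === 'Hello\nWorld');  // bool(true): a literal backslash and "n"
```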

Advanced Regular Expressions

For a more robust solution that accounts for the different newline conventions (LF, CRLF, and a lone CR), we can enhance our regex pattern:

$string = "Hello\n\r\n\nWorld";
$string = preg_replace('/(\r\n|\n|\r){2,}/', "\n", $string);

This approach handles any mix of LF, CRLF, and CR line endings. Note that a line containing only spaces or tabs is not treated as empty by this pattern.
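
If "empty" should also cover lines that contain only spaces or tabs, the pattern can be extended. The following is a sketch under that assumption, not the only way to write it: it matches one line break followed by one or more whitespace-only lines and replaces the whole run with a single line feed.

```php
<?php
// Assumption: a "blank" line may contain spaces or tabs but nothing else.
// One line break, then one or more lines made up only of spaces/tabs.
$pattern = '/(?:\r\n|\n|\r)(?:[ \t]*(?:\r\n|\n|\r))+/';

$string = "Hello\n  \n\t\r\n\nWorld";
$string = preg_replace($pattern, "\n", $string);
// $string is "Hello\nWorld"

// Indentation at the start of a non-empty line is left alone.
$indented = preg_replace($pattern, "\n", "a\n  b");
// $indented is still "a\n  b"
```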

Preserving Paragraphs

Sometimes you want to keep one blank line between paragraphs rather than removing blank lines entirely. To do that, match three or more line breaks and replace them with two:

$string = "Paragraph 1\n\n\nParagraph 2";
$string = preg_replace('/(\r\n|\n|\r){3,}/', "\n\n", $string);

This code reduces three or more newlines to two, preserving paragraph separation.
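
Because runs of exactly two line breaks are left untouched, mixed input can still end up with CRLF or CR breaks next to LF ones. One possible variation, shown here as a sketch, is to normalize all line endings to LF first and then cap the runs:

```php
<?php
$string = "Paragraph 1\r\n\r\n\r\n\r\nParagraph 2\r\rParagraph 3";

// Step 1: convert CRLF and lone CR to LF. CRLF is listed first so that
// "\r\n" is not turned into two line feeds.
$string = str_replace(["\r\n", "\r"], "\n", $string);

// Step 2: cap every run of line feeds at two (one blank line).
$string = preg_replace('/\n{3,}/', "\n\n", $string);
// $string is "Paragraph 1\n\nParagraph 2\n\nParagraph 3"
```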

Creating a Custom PHP Function

Encapsulating this functionality in a custom function makes our code cleaner and reusable:

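// Collapse every run of two or more line breaks into a single "\n".
function removeExtraNewlines(string $string): string {
    // preg_replace() returns null on error; fall back to the original input.
    return preg_replace('/(\r\n|\n|\r){2,}/', "\n", $string) ?? $string;
}
$string = "Let's\n\n\nkeep coding!";
$string = removeExtraNewlines($string); // "Let's\nkeep coding!"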

Now we can call removeExtraNewlines() whenever we need to clean up our strings.
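
The same idea can be generalized so that one function covers both the "no blank lines" and the "one blank line between paragraphs" cases. The sketch below is an illustration only: the function name collapseBlankLines() and its $maxBlankLines parameter are hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper: $maxBlankLines is the number of empty lines
// allowed to remain between two lines of text.
function collapseBlankLines(string $text, int $maxBlankLines = 0): string
{
    $maxBlankLines = max(0, $maxBlankLines);
    // N blank lines are produced by N + 1 consecutive line breaks.
    $keep = $maxBlankLines + 1;
    $pattern = '/(?:\r\n|\n|\r){' . ($keep + 1) . ',}/';
    $result = preg_replace($pattern, str_repeat("\n", $keep), $text);
    // preg_replace() returns null on failure; fall back to the input.
    return $result ?? $text;
}

$tight = collapseBlankLines("Let's\n\n\nkeep coding!");      // "Let's\nkeep coding!"
$loose = collapseBlankLines("Let's\n\n\n\nkeep coding!", 1); // "Let's\n\nkeep coding!"
```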

Handling Multibyte Characters

If the string contains multibyte UTF-8 characters, we can add the u modifier so that the regex engine treats both the pattern and the subject as UTF-8. Here’s how:

$string = "Some unicode\n\n\ncontent.";
$string = preg_replace('/(\r\n|\n|\r){2,}/u', "\n", $string);

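With the u modifier, the regular expression works on UTF-8 characters rather than single bytes. For this particular pattern the result is the same either way, because the bytes for \r and \n never occur inside a multibyte UTF-8 sequence. The modifier matters once the pattern contains non-ASCII characters or has to match whole characters, for example with a dot. Keep in mind that with u, preg_replace() returns null if the input is not valid UTF-8.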
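
As a quick check of what the u modifier changes for this pattern, the sketch below compares both variants on valid UTF-8 text and then shows one way to handle input that is not valid UTF-8. The fallback strategy is an assumption for illustration; rejecting the input may be more appropriate in some applications.

```php
<?php
$pattern = '/(\r\n|\n|\r){2,}/';

// Valid UTF-8: the result is the same with or without the u modifier.
$text  = "Café\n\n\n日本語";
$plain = preg_replace($pattern, "\n", $text);
$utf8  = preg_replace($pattern . 'u', "\n", $text);
var_dump($plain === $utf8); // bool(true)

// Invalid UTF-8 (a stray 0xFF byte): with u, preg_replace() returns null.
$broken = "bad byte \xFF\n\n\nnext line";
$result = preg_replace($pattern . 'u', "\n", $broken);
if ($result === null && preg_last_error() === PREG_BAD_UTF8_ERROR) {
    // Fall back to the byte-oriented pattern (or reject the input instead).
    $result = preg_replace($pattern, "\n", $broken);
}
```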

Conclusion

In this tutorial, we’ve explored techniques for removing consecutive empty lines from strings in PHP. Starting from a simple str_replace(), we moved on to regular expressions that handle runs of any length and mixed line endings, wrapped the logic in a reusable function, and touched on Unicode content. These methods help keep text data clean and consistent wherever it is processed or displayed.