PHP: Get the size of a string in bytes

Updated: January 10, 2024 By: Guest Contributor

Overview

In PHP development, understanding how to measure the size of a string in bytes is crucial for optimizing data storage and manipulation. This tutorial presents methods to accurately determine a string’s byte length.

Introduction to String Length

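Traditionally, the strlen() function is used in PHP to get the length of a string. Despite its name, it returns the number of bytes rather than the number of characters, which can be misleading in a multi-byte character environment. In UTF-8, a character occupies between one and four bytes, so the byte count and the character count match only when the string is pure ASCII, as in the example below. On PHP versions before 8.0, the deprecated mbstring.func_overload setting could also make strlen() count characters instead, which is one reason other approaches to byte counting exist.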

$string = 'Hello, World!';
echo strlen($string);  // Outputs: 13

Understanding Multibyte Strings

In UTF-8, a single character can be encoded as more than one byte. Standard ASCII characters take one byte, while other characters take two, three, or four. To count characters rather than bytes, PHP offers the mb_strlen() function, which takes the encoding as its second argument.

$string = 'Hello, 世界';
echo mb_strlen($string, 'UTF-8');  // Outputs: 9 (characters, not bytes)
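
To make the one-to-four-byte range concrete, here is a minimal sketch that measures one sample character of each width. The sample characters and the $samples array are chosen only for illustration, and the mbstring extension is assumed to be enabled.

```php
<?php
// Each character is written with a \u{...} escape (PHP 7.0+), so the
// result does not depend on how this source file is saved.
$samples = [
    'A'       => 'A',          // U+0041, plain ASCII
    'e-acute' => "\u{00E9}",   // é
    'CJK'     => "\u{4E16}",   // 世
    'emoji'   => "\u{1F600}",  // 😀
];

foreach ($samples as $label => $char) {
    // strlen() counts bytes; mb_strlen() counts characters.
    printf(
        "%-8s bytes: %d, characters: %d\n",
        $label,
        strlen($char),
        mb_strlen($char, 'UTF-8')
    );
}
```

Each sample is a single character, yet the byte counts are 1, 2, 3, and 4 respectively.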

Calculating the Byte Size of a String

To accurately measure the byte size of a string, we need to consider its encoding. The following sections showcase several methods to achieve this.

Using mb_strlen() with strlen()

One method is to call both functions: strlen() returns the number of bytes and mb_strlen() returns the number of characters. If the two values differ, the string contains multibyte characters.

$string = 'Hello, 世界';
$chars = mb_strlen($string, 'UTF-8');
$bytes = strlen($string);
echo "Characters: $chars, Bytes: $bytes";  // Outputs: Characters: 9, Bytes: 13
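
The comparison can be wrapped in a small helper. This is only a sketch: containsMultibyte() is a hypothetical name, not a built-in function, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: true when at least one character needs more than one byte.
// Assumes $string is valid UTF-8 and that the mbstring extension is enabled.
function containsMultibyte(string $string): bool
{
    // Bytes and characters are equal only when every character is one byte wide.
    return strlen($string) !== mb_strlen($string, 'UTF-8');
}

var_dump(containsMultibyte('Hello, World!'));            // bool(false)
var_dump(containsMultibyte("Hello, \u{4E16}\u{754C}"));  // bool(true), 'Hello, 世界'
```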

Explicit Byte Counting

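Another way to get the byte length is to call mb_strlen() with the '8bit' encoding, which treats every byte as one character. The result is the same as strlen(), but it was not affected by the mbstring.func_overload setting on PHP versions before 8.0.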

$string = 'Hello, 世界';
echo mb_strlen($string, '8bit');  // Outputs: 13 bytes
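
One possible use of this call is checking a string against a limit expressed in bytes, such as a fixed-size field. The sketch below assumes two hypothetical helpers, byteLength() and fitsInBytes(), which are not part of PHP, and an arbitrary 10-byte limit.

```php
<?php
// Hypothetical helpers, not part of PHP itself.
function byteLength(string $string): int
{
    // '8bit' treats every byte as one character, so this is a byte count.
    return mb_strlen($string, '8bit');
}

function fitsInBytes(string $string, int $maxBytes): bool
{
    return byteLength($string) <= $maxBytes;
}

$greeting = "Hello, \u{4E16}\u{754C}";  // 'Hello, 世界'
echo byteLength($greeting), "\n";       // 13
var_dump(fitsInBytes($greeting, 10));   // bool(false): 9 characters, but 13 bytes
```

A check based on the character count would wrongly accept this string, because it has only 9 characters.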

Advanced Methods for Byte Size Calculation

While the 8bit encoding works well, there are cases when more advanced techniques are useful, especially when dealing with file I/O operations or network communication where exact byte size is critical.

Using iconv_strlen()

The iconv_strlen() function is an alternative to mb_strlen() from the iconv extension. Like mb_strlen(), it counts characters in the given encoding, not bytes.

$string = 'Hello, 世界';
echo iconv_strlen($string, 'UTF-8');  // Outputs: 9 (characters)

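To get a byte size, convert the string with iconv() and pass the result to strlen(). Note that the //IGNORE suffix asks iconv() to drop byte sequences that are invalid in the target encoding, so the result can be smaller than the byte size of the original string.

$string = 'Hello, 世界';
$converted = iconv('UTF-8', 'UTF-8//IGNORE', $string);
echo strlen($converted);  // Outputs: 13 for this valid UTF-8 string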
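
The byte size of a string depends on the encoding it is stored in, while the character count does not. The following sketch illustrates this by converting the sample string to UTF-16LE, an encoding chosen here only for the comparison. It assumes the iconv extension is enabled.

```php
<?php
$string = "Hello, \u{4E16}\u{754C}";  // 'Hello, 世界' in UTF-8

$characters = iconv_strlen($string, 'UTF-8');  // counts characters
$bytes      = strlen($string);                 // counts bytes

printf("characters: %d, bytes: %d\n", $characters, $bytes);

// Re-encoding changes the byte size; the number of characters stays the same.
// UTF-16LE uses two bytes for each of these nine characters and adds no BOM.
$utf16 = iconv('UTF-8', 'UTF-16LE', $string);
printf("UTF-16LE bytes: %d\n", strlen($utf16));
```

For this string the UTF-8 form is 13 bytes and the UTF-16LE form is 18 bytes.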

Calculating String Byte Size from Hexadecimal Representation

Converting the string to its hexadecimal representation shows the individual bytes of the string’s encoding. bin2hex() emits two hexadecimal digits per byte, so half the length of its output is the byte size.

$string = 'Hello, 世界';
$hexString = bin2hex($string);
echo strlen($hexString) / 2;  // Outputs: 13
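
The hexadecimal form is most useful for inspecting which bytes make up a character. As a sketch, the hypothetical helper hexBytes() below (not a built-in) splits the bin2hex() output into one pair of digits per byte.

```php
<?php
// Hypothetical helper: returns the string's bytes as two-digit hex pairs.
function hexBytes(string $string): array
{
    // Guard the empty case, because str_split('') returns [''] before PHP 8.2.
    if ($string === '') {
        return [];
    }
    return str_split(bin2hex($string), 2);
}

$char = "\u{4E16}";  // 世
echo implode(' ', hexBytes($char)), "\n";  // e4 b8 96
echo count(hexBytes($char)), "\n";         // 3
```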

Working with Streams

Stream functions such as fwrite() and fread() operate on bytes, so file I/O reports byte lengths directly. fwrite() returns the number of bytes written, and fread() takes its length argument in bytes.

if ($fp = fopen('example.txt', 'w+')) {
    $string = 'Hello, 世界';
    $written = fwrite($fp, $string);  // Bytes written, or false on failure
    fseek($fp, 0);                    // Back to the start before reading
    $data = fread($fp, 1024);         // Reads at most 1024 bytes
    echo $written, ' ', strlen($data);  // Outputs: 13 13
    fclose($fp);
}
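
The same measurement can be taken without touching the file system. This sketch uses the built-in php://memory stream instead of example.txt, which is a substitution made here for illustration, and reads the byte count from fwrite() and ftell().

```php
<?php
$string = "Hello, \u{4E16}\u{754C}";  // 'Hello, 世界'

// php://memory is an in-memory stream, so no file is created on disk.
$fp = fopen('php://memory', 'w+b');
if ($fp === false) {
    throw new RuntimeException('Could not open stream');
}

$written  = fwrite($fp, $string);  // number of bytes written, or false on failure
$position = ftell($fp);            // current offset in bytes from the start
fclose($fp);

printf("written: %d, position: %d\n", $written, $position);
```

Both values are 13 here, matching what strlen() reports for the same string.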

Conclusion

This tutorial has elaborated on the differences between character length and byte size in PHP strings and has presented multiple methods to accurately calculate the size of a string in bytes. Understanding these concepts and techniques is invaluable for accurate data processing and storage in a multi-byte character set environment.