
String Length in Go: Counting Characters vs Bytes

Last updated: November 24, 2024

In Go, counting the characters in a string is trickier than it looks. A common misconception is that the length of a string always equals the number of characters it contains. However, the built-in len function reports the number of bytes, and because Go strings normally hold UTF-8 encoded text, a string's length in bytes can differ from its length in characters. Let's explore how to measure both.

Understanding Strings in Go

A string in Go is a read-only sequence of bytes. By convention, and always for string literals written in Go source code, those bytes hold UTF-8 encoded text. UTF-8 is a variable-width encoding: ASCII characters take one byte, while all other Unicode code points take two, three, or four bytes. Go calls a single Unicode code point a rune.

Basic Usage

Getting the length of a string in bytes is straightforward using Go's built-in len function:

package main

import "fmt"

func main() {
    str := "hello"
    fmt.Println("Length in bytes:", len(str)) // Outputs: 5
}

In this example, each character ('h', 'e', 'l', 'l', 'o') is one byte in UTF-8 encoding.

Intermediate Usage

Now, let's look at a string with characters beyond the basic ASCII set:

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    str := "こんにちは" // "Hello" in Japanese
    fmt.Println("Length in bytes:", len(str)) // Outputs: 15
    fmt.Println("Number of characters:", utf8.RuneCountInString(str)) // Outputs: 5
}

In this case, the string "こんにちは" is 15 bytes long because each character is represented by 3 bytes in UTF-8 encoding. However, there are only 5 characters.

Advanced Usage

Next, let's inspect a string one character at a time. This example converts the string to a slice of runes, then prints each rune together with its Unicode code point and the number of bytes it occupies in UTF-8.

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    str := "🙂😀😁"
    byteCount := len(str)
    charCount := utf8.RuneCountInString(str)

    fmt.Println("The string is:", str)
    fmt.Println("Length in bytes:", byteCount)     // Outputs: 12
    fmt.Println("Number of characters:", charCount) // Outputs: 3

    // Convert to runes slice
    runes := []rune(str)
    for i, r := range runes {
        fmt.Printf("Character %d: %c (Unicode code point: %U, Size: %d bytes)\n", i, r, r, utf8.RuneLen(r))
    }
}

Here, each emoji takes 4 bytes in UTF-8, so the string is 12 bytes long but contains only 3 characters.

Conclusion

When working with strings in Go, it's essential to distinguish the length in bytes from the number of characters, especially when your application involves internationalization or other Unicode content. Use len when you need the size in bytes, and utf8.RuneCountInString when you need the number of Unicode code points (runes). Keep in mind that a rune is not always what a user perceives as one character: a letter followed by a combining accent, for instance, is two runes that display as a single symbol.
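As a small illustration of where rune counts and on-screen characters diverge, the sketch below compares two ways of writing "é": as a single precomposed code point, and as the letter e followed by a combining acute accent. Both render as one symbol, yet their byte and rune counts differ. The helper name lengths is illustrative only.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengths is a hypothetical helper: it returns the byte count and the
// rune count of s.
func lengths(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // é as a single code point (U+00E9)
	combined := "e\u0301"   // e followed by U+0301 COMBINING ACUTE ACCENT

	fmt.Println(lengths(precomposed))    // 2 1
	fmt.Println(lengths(combined))       // 3 2
	fmt.Println(precomposed == combined) // false
}
```

If your application must count what users see as one character, rune counts alone are not enough, and the text needs to be segmented or normalized first.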


Series: Working with Strings in Go
