Working with numpy.loadtxt() function (4 examples)

Updated: February 29, 2024 By: Guest Contributor

Introduction

The numpy.loadtxt() function is a powerful utility for reading data from text files in numerical computing with Python. This tutorial will take you through the basics to more advanced uses with clear examples at each step. Whether you are dealing with simple CSV files or more complex structured data, understanding how to effectively use numpy.loadtxt() can accelerate your data processing tasks.

The Fundamentals of numpy.loadtxt()

The numpy library provides the loadtxt() function as an easy way to load data from text files, including CSV (comma-separated values) and TSV (tab-separated values) files. It is especially useful for reading numerical data and supports specifying the delimiter, data type, converters, and many other useful parameters.

Syntax:

numpy.loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None, *, like=None)

Parameters:

  • fname: file, str, pathlib.Path, list of str, or generator. The file name or a file-like object to read from. If the filename extension is .gz or .bz2, the file is first decompressed.
  • dtype: data-type, optional. The data type of the resulting array; default is float.
  • comments: str or sequence of str, optional. The character or characters used to indicate the start of a comment; default is '#'.
  • delimiter: str, optional. The string used to separate values. By default, any whitespace acts as a delimiter.
  • converters: dict, optional. A dictionary mapping column number to a function that will parse the column string into the desired value. E.g., {0: lambda s: float(s.strip())}.
  • skiprows: int, optional. Skip the first skiprows lines; default is 0.
  • usecols: int or sequence, optional. Which columns to read, with 0 being the first. For example, usecols = (1,4,5) will extract the 2nd, 5th, and 6th columns. If None, read all columns.
  • unpack: bool, optional. If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...); default is False.
  • ndmin: int, optional. The returned array will have at least ndmin dimensions. Otherwise mono-dimensional axes will be squeezed.
  • encoding: str, optional. Encoding used to decode the input file. Does not apply to input streams. The special value 'bytes' enables byte-by-byte reading without decoding. Default is 'bytes'.
  • max_rows: int, optional. Read max_rows lines of content after skiprows lines. The default is to read all the lines.
  • like: array_like, optional. Reference object to allow the creation of arrays which are not NumPy arrays. If an array-like passed in as like supports the __array_function__ protocol, the result will be defined by it. Otherwise, a NumPy array will be created as usual.
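Several of these parameters combine naturally. As a quick illustration (using io.StringIO as a stand-in for a file on disk, since loadtxt accepts any file-like object), the following sketch reads two of three whitespace-separated columns and unpacks them into separate arrays:

```python
import io

import numpy as np

# A small in-memory "file"; StringIO works anywhere a filename does.
content = io.StringIO(
    "# sensor log\n"
    "1.0 10.0 100.0\n"
    "2.0 20.0 200.0\n"
    "3.0 30.0 300.0\n"
)

# The '# sensor log' line is skipped automatically (comments='#').
# Read only columns 0 and 2, and transpose the result so each
# variable lands in its own 1-D array.
t, v = np.loadtxt(content, usecols=(0, 2), unpack=True)

print(t)  # [1. 2. 3.]
print(v)  # [100. 200. 300.]
```

Because unpack=True transposes the result, each variable comes out as a flat 1-D array, which is convenient for plotting or further per-column processing.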

Preparing Sample Datasets

You can use your own CSV data, or recreate the small sample files shown as comments at the top of each example below.

Example 1: Basic Usage

Let’s begin with the simplest use case: reading a standard CSV file containing floating point numbers:

import numpy as np

# Example CSV content:
# 1.0, 2.0, 3.0
# 4.0, 5.0, 6.0
# 7.0, 8.0, 9.0
data = np.loadtxt('example.csv', delimiter=',')
print(data)

The output will be:

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

This basic example demonstrates how to load a simple CSV file with numeric data. The delimiter=',' parameter specifies that the data items are separated by commas.
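If you know a file contains only integers, you can skip the default float conversion by passing dtype. The sketch below uses an in-memory io.StringIO buffer in place of a file on disk, purely so it is self-contained:

```python
import io

import numpy as np

# In-memory stand-in for a CSV file of integers.
csv = io.StringIO("1,2,3\n4,5,6\n")

# Same call as above, but requesting integers instead of the default float.
data = np.loadtxt(csv, delimiter=',', dtype=int)
print(data)
# [[1 2 3]
#  [4 5 6]]
```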

Example 2: Skipping Rows and Specifying Columns

Often, data files contain headers or other information at the beginning that you might want to skip. The loadtxt() function allows you to specify how many rows to skip using the skiprows argument, and you can also specify which columns to read using the usecols parameter.

import numpy as np

# Example CSV content:
# Header 1, Header 2, Header 3
# 10.0, 20.0, 30.0
# 40.0, 50.0, 60.0
data = np.loadtxt('example_with_header.csv', delimiter=',', skiprows=1, usecols=(0, 2))
print(data)

The output for this would be:

[[10. 30.]
 [40. 60.]]

Here, skiprows=1 tells numpy to ignore the first line, while usecols=(0, 2) instructs it to only load the first and last columns of data.
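The max_rows parameter complements skiprows: together they let you read a specific window of lines from a file. A minimal sketch (again using io.StringIO in place of a real file) that skips a header and then reads at most two data rows:

```python
import io

import numpy as np

csv = io.StringIO(
    "Header 1, Header 2, Header 3\n"
    "10.0, 20.0, 30.0\n"
    "40.0, 50.0, 60.0\n"
    "70.0, 80.0, 90.0\n"
)

# Skip the header line, then read at most two rows of data.
data = np.loadtxt(csv, delimiter=',', skiprows=1, max_rows=2)
print(data)
# [[10. 20. 30.]
#  [40. 50. 60.]]
```

Reading a file in fixed-size windows like this can be useful when only a slice of a large file is needed.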

Example 3: Using Custom Converters

What if your data isn’t purely numerical? The loadtxt() function allows the use of custom converters that can transform data in a column from one form to another during the loading process. This feature is powerful when dealing with data types like dates or times, or when performing preprocessing steps like normalization.

import numpy as np

# Example CSV content:
# Date,Temperature
# 2020-01-01,32
# 2020-01-02,31
converters = {
    0: lambda x: np.datetime64(x),
    1: lambda x: float(x),
}
# Skip the header line so only data rows reach the converters, and
# use dtype=str to store the converted values as text.
data = np.loadtxt('example_with_dates.csv', delimiter=',', skiprows=1,
                  converters=converters, dtype=str)
print(data)

You might see output resembling this, depending on your environment and settings:

[['2020-01-01' '32.0']
 ['2020-01-02' '31.0']]

This example converts the first column to datetime64 objects and the second column to floats. Custom converters enable you to handle a wide variety of data formats and preprocessing needs with minimal effort.

Example 4: Loading Textual Data

Finally, it’s important to know that np.loadtxt() is not limited to numeric data. You can use it to load text, provided you specify the data type accordingly. This can be useful in scenarios where you’re working with mixed data types.

import numpy as np

# Example content:
# Name,Score
# John Doe,95
# Jane Doe,88
data = np.loadtxt('example_text_data.csv', delimiter=',', dtype=str,
                  skiprows=1)  # skiprows=1 leaves the header row out
print(data)

The output will be an array of strings:

[['John Doe' '95']
 ['Jane Doe' '88']]

This shows how versatile numpy.loadtxt() can be, handling not only numeric but also textual data types, making it an invaluable tool for a wide range of data parsing tasks.
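For mixed columns like these, a structured dtype is often a better fit than dtype=str, because each column keeps its own type. The following sketch uses a hypothetical name/score layout, with io.StringIO standing in for a file:

```python
import io

import numpy as np

csv = io.StringIO(
    "John Doe,95\n"
    "Jane Doe,88\n"
)

# A structured dtype keeps the name as text and the score as an
# integer, instead of forcing every column to strings.
dt = np.dtype([('name', 'U20'), ('score', 'i4')])
data = np.loadtxt(csv, delimiter=',', dtype=dt)

print(data['name'])          # ['John Doe' 'Jane Doe']
print(data['score'])         # [95 88]
print(data['score'].mean())  # 91.5
```

Accessing columns by field name keeps the numeric column usable for arithmetic without any post-load conversion.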

Conclusion

Throughout this tutorial, we have explored the versatility and utility of the numpy.loadtxt() function across various scenarios from simple numerical data loading to handling complex structured and textual data. You should now feel confident in leveraging this function to streamline your data importing workflows in Python.