Pandas: 3 ways to convert a DataFrame to a NumPy array

Updated: February 19, 2024 By: Guest Contributor

Introduction

Converting a Pandas DataFrame to a NumPy array is a common operation in data science, allowing you to leverage the speed and efficiency of NumPy for numerical computations. In this guide, we’ll explore several methods to perform this conversion, each suited to different scenarios and needs.

Using values Attribute

One of the simplest and most direct ways of converting a DataFrame into a NumPy array is by accessing the values attribute. This approach is straightforward and works well for most use cases.

  1. Create your DataFrame.
  2. Access the values attribute of the DataFrame.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

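# .values returns the data as a 2D NumPy array (index and column labels are dropped)
array = df.values
print(array)
# [[1 4]
#  [2 5]
#  [3 6]]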

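Notes: The values attribute returns the DataFrame’s data as a 2D NumPy array; the index and column labels are dropped. It takes no arguments, so you cannot specify a data type during conversion, and to convert only some columns you must select them first (for example, df[['A']].values). The pandas documentation recommends to_numpy() over values for new code.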
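The following is a minimal sketch of two points from the note above: selecting columns before reading values, and what happens to the array's data type when the columns have mixed types. The column names and data are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Illustrative data: one integer column, one float column, one string column
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [0.5, 1.5, 2.5],
    'C': ['x', 'y', 'z']
})

# Select a subset of columns first, then read .values
numeric = df[['A', 'B']].values
print(numeric.dtype)  # integer and float columns are upcast to float64

# With the string column included, the array falls back to object dtype
mixed = df.values
print(mixed.dtype)
```

An object array holds generic Python objects, so most of NumPy's speed advantage is lost; selecting only the numeric columns first avoids that.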

Using to_numpy() Method

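A more flexible alternative to the values attribute, the to_numpy() method lets you specify the data type of the result (dtype), whether to force a copy (copy), and the value used for missing entries (na_value). It does not include the index; to keep the index in the array, call reset_index() on the DataFrame first. This makes it a better choice when you need more control over the conversion.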

  1. Create your DataFrame.
  2. Call the to_numpy() method on the DataFrame, optionally passing dtype, copy, or na_value.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# With no arguments, to_numpy() picks a common dtype for all columns
array = df.to_numpy()
print(array)
# [[1 4]
#  [2 5]
#  [3 6]]

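Notes: The to_numpy() method offers more control over the resulting array’s data type and its handling of missing values. For ordinary numeric data and default arguments, it produces the same array as the values attribute, and the two perform about the same. Expect extra cost mainly when the call has to copy or convert data, for example when the requested dtype differs from the stored type or when copy=True is set.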
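As a rough sketch of the optional arguments, the example below passes dtype, na_value, and copy to to_numpy(), and uses reset_index() to bring the index into the array. The data, including the missing value in column B, is an illustrative assumption.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in column 'B'
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.0, np.nan, 6.0]
})

# Request a specific dtype for the resulting array
as_float32 = df.to_numpy(dtype='float32')
print(as_float32.dtype)

# Replace missing values during conversion
filled = df.to_numpy(na_value=0.0)
print(filled)

# Force a copy so later changes to the array cannot affect the DataFrame
independent = df.to_numpy(copy=True)
independent[0, 0] = 99
print(df.loc[0, 'A'])  # the DataFrame still holds 1

# to_numpy() has no index option; move the index into a column first
with_index = df.reset_index().to_numpy()
print(with_index.shape)
```

The na_value argument requires pandas 1.1 or later.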

Applying astype() for Data Type Conversion

In situations where control over the resulting NumPy array’s data type is crucial, using the astype() method before conversion can ensure the desired data type is applied throughout the array.

  1. Create your DataFrame.
  2. Use the astype() method on the DataFrame to specify the desired data type for the entire array.
  3. Convert to a NumPy array using either the values attribute or the to_numpy() method.

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Cast every column to float64 before converting
df = df.astype('float64')
array = df.to_numpy()
print(array)
# [[1. 4.]
#  [2. 5.]
#  [3. 6.]]

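Notes: This approach lets you fix the data type before conversion, which matters when a computation expects a particular precision. astype() returns a new DataFrame, so it costs extra memory and time on large data, and a narrowing conversion can lose information: casting floats to an integer type discards the fractional part, and by default values that cannot be converted raise an error. To change only some columns, pass a dictionary mapping column names to types.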
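The sketch below illustrates the two cautions in the note above: casting only selected columns with a dictionary, and the loss of fractional parts when floats are cast to integers. The float values in column B are an illustrative assumption chosen to make the truncation visible.

```python
import pandas as pd

# Illustrative data: column 'B' holds non-integer floats
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.7, 5.2, 6.9]
})

# Cast a single column by passing a dict of column name -> dtype
per_column = df.astype({'A': 'float32'})
print(per_column.dtypes)

# Casting floats to integers drops the fractional part (truncation, not rounding)
truncated = df.astype('int64').to_numpy()
print(truncated)
# [[1 4]
#  [2 5]
#  [3 6]]
```

If rounding is what you want, calling round() on the DataFrame before astype() is one option.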

Conclusion

Converting a DataFrame to a NumPy array is straightforward in pandas, and the three approaches cover different needs: the values attribute for a quick conversion, to_numpy() for control over the data type, copying, and missing values, and astype() when the data type must be set before conversion. Understanding the benefits and limitations of each lets you choose the most appropriate one for your data manipulation and computational tasks.