Pandas: How to manually create a DataFrame and add data to it

Updated: February 19, 2024 By: Guest Contributor

Overview

In this tutorial, you will learn how to use the pandas library in Python to manually create a DataFrame and add data to it. Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Among its high-level data structures, the DataFrame is perhaps the most central and widely used. We will start with the basics of creating a DataFrame and gradually move on to more advanced techniques of manipulating data within a DataFrame.

Getting Started

Before diving into the creation of DataFrames, it’s important to ensure that pandas is installed in your environment. You can install pandas using pip:

pip install pandas

Once installed, you can import pandas and create your first simple DataFrame.
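A quick way to confirm the installation is to print the installed version. This is only a sanity check, but it is worth doing because some methods differ between major releases (for example, DataFrame.append was removed in pandas 2.0).

```python
import pandas as pd

# Print the installed pandas version, e.g. to check whether it is 2.0 or later.
print(pd.__version__)
```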

Creating Your First DataFrame

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
print(df)

This code snippet creates a DataFrame from a dictionary of lists. Each key in the dictionary becomes a column in the DataFrame, and the lists become the data for those columns. The output should look something like this:

    Name  Age      City
0   John   28  New York
1   Anna   34     Paris
2  Peter   29    Berlin
3  Linda   32    London
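A dictionary of lists is not the only possible starting point. As a minimal sketch, the same kind of table can be built row by row from a list of dictionaries, or from a list of lists with the column names passed explicitly. The variable names df_rows and df_lists are illustrative only.

```python
import pandas as pd

# Each dictionary is one row; its keys become the column names.
records = [
    {"Name": "John", "Age": 28, "City": "New York"},
    {"Name": "Anna", "Age": 34, "City": "Paris"},
]
df_rows = pd.DataFrame(records)

# A list of lists carries no column names, so they are given via `columns`.
df_lists = pd.DataFrame(
    [["Peter", 29, "Berlin"], ["Linda", 32, "London"]],
    columns=["Name", "Age", "City"],
)

print(df_rows)
print(df_lists)
```

The row-oriented form tends to be convenient when data arrives one record at a time, while the dictionary of lists is more natural when you already have whole columns.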

Adding Data to an Existing DataFrame

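After creating a DataFrame, you might need to add new data to it. Older tutorials use the DataFrame.append method for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0, so use the pd.concat function instead. Here’s how to add a single row using pd.concat:

new_row = {'Name': 'Max', 'Age': 26, 'City': 'Amsterdam'}
# Wrap the dict in a one-row DataFrame, then concatenate it onto df
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)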

The updated DataFrame now includes the new row:

    Name  Age       City
0   John   28   New York
1   Anna   34      Paris
2  Peter   29     Berlin
3  Linda   32     London
4    Max   26  Amsterdam
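The pd.concat function mentioned above also handles several rows at once, which is usually preferable to adding rows one by one in a loop. The sketch below uses a shortened copy of the table so it runs on its own. The 'Sara' and 'Tom' rows are made-up examples, and the df.loc line assumes the default 0, 1, 2, ... index.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna"],
    "Age": [28, 34],
    "City": ["New York", "Paris"],
})

# Collect the new rows in their own DataFrame, then concatenate once.
new_rows = pd.DataFrame([
    {"Name": "Max", "Age": 26, "City": "Amsterdam"},
    {"Name": "Sara", "Age": 31, "City": "Madrid"},  # hypothetical row
])
df = pd.concat([df, new_rows], ignore_index=True)

# With a default integer index, len(df) is the next unused label,
# so assigning to it appends a single row in place.
df.loc[len(df)] = ["Tom", 40, "Oslo"]  # hypothetical row

print(df)
```

Passing ignore_index=True renumbers the result from 0, so the combined rows do not carry duplicate index labels.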

Modifying DataFrame Structure

Aside from adding data, you might also want to modify the structure of your DataFrame, such as adding or deleting columns. To add a new column, you can simply assign it directly:

df['Employed'] = [True, True, False, True, True]
print(df)

This code adds a new column ‘Employed’ indicating the employment status of each individual. The DataFrame should now include the new column:

    Name  Age       City  Employed
0   John   28   New York      True
1   Anna   34      Paris      True
2  Peter   29     Berlin     False
3  Linda   32     London      True
4    Max   26  Amsterdam      True
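Deleting a column, mentioned at the start of this section, can be sketched with DataFrame.drop. The example also shows DataFrame.insert for placing a new column at a specific position. It uses a shortened stand-in table, and the 'Country' column is a hypothetical addition.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna"],
    "Age": [28, 34],
    "Employed": [True, False],
})

# drop returns a new DataFrame; the original df is left unchanged.
without_employed = df.drop(columns=["Employed"])

# insert modifies df in place, putting the column at position 1.
df.insert(1, "Country", ["USA", "France"])  # hypothetical column

print(without_employed.columns.tolist())
print(df.columns.tolist())
```

Because drop does not modify the original by default, assign the result back (df = df.drop(...)) if you want the deletion to stick.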

Advanced DataFrame Manipulation

As you become more comfortable with creating and modifying DataFrames, you’ll likely encounter the need for more advanced manipulation techniques. For instance, you may want to perform operations across rows or columns, handle missing data, or merge DataFrames.
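As a rough illustration of two of these techniques, the sketch below derives a new column from an existing one and merges the table with a second DataFrame. The countries lookup table and the 'AgeNextYear' column are assumptions made for the example, not part of the tutorial's data.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["John", "Anna", "Peter"],
    "Age": [28, 34, 29],
    "City": ["New York", "Paris", "Berlin"],
})

# Hypothetical lookup table; Berlin is deliberately left out.
countries = pd.DataFrame({
    "City": ["New York", "Paris"],
    "Country": ["USA", "France"],
})

# Column-wise operation: the arithmetic is applied to every row at once.
people["AgeNextYear"] = people["Age"] + 1

# A left merge keeps every person; cities without a match get NaN.
merged = people.merge(countries, on="City", how="left")

print(merged)
print(merged["Age"].mean())
```

The unmatched Berlin row ends up with a missing 'Country' value, which is one common way missing data enters a DataFrame.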

Handling Missing Data

Handling missing data is a common necessity in data analysis. Pandas offers several methods for dealing with it, such as dropna for removing rows or columns with missing data and fillna for replacing them. Here’s an example of using fillna:

df['Employed'] = df['Employed'].fillna(False)
print(df)

If your DataFrame contains rows with a missing ‘Employed’ status, this code sets them to False so that every row has a value in that column. In the running example the column has no missing values, so the output is unchanged.
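To see dropna and fillna actually change something, here is a small self-contained sketch with one deliberately missing value. The three-row table is a stand-in for the tutorial's data, and converting to the nullable "boolean" dtype before filling is one possible choice rather than the only way to do it.

```python
import pandas as pd

# Small stand-in table; None marks a missing 'Employed' value.
df = pd.DataFrame({
    "Name": ["John", "Anna", "Max"],
    "Employed": [True, None, False],
})

# Count missing values per column before deciding how to handle them.
print(df.isna().sum())

# Option 1: drop the rows where 'Employed' is missing.
dropped = df.dropna(subset=["Employed"])

# Option 2: keep every row and default missing values to False.
# Converting to the nullable "boolean" dtype first keeps the column boolean.
filled = df.copy()
filled["Employed"] = filled["Employed"].astype("boolean").fillna(False)

print(dropped)
print(filled)
```

Dropping is the safer choice when a default would be misleading, while filling keeps the row count intact at the cost of treating unknown values as False.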

Conclusion

In this tutorial, you’ve learned how to manually create a pandas DataFrame and add data to it, starting with simple examples and moving to more complex data manipulation techniques. Understanding how to create and manipulate DataFrames is a foundational skill in data analysis and will enable you to work efficiently with large datasets.

Remember, the key to mastering pandas is practice and experimentation. Explore the vast functionality of pandas further and you’ll uncover even more powerful tools for your data analysis tasks.