Pandas: How to manually create a DataFrame and add data to it

Last updated: February 19, 2024

Overview

In this tutorial, you will learn how to use the pandas library in Python to manually create a DataFrame and add data to it. Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Among its high-level data structures, the DataFrame is perhaps the most central and widely used. We will start with the basics of creating a DataFrame and gradually move on to more advanced techniques of manipulating data within a DataFrame.

Getting Started

Before diving into the creation of DataFrames, it’s important to ensure that pandas is installed in your environment. You can install pandas using pip:

pip install pandas

Once installed, you can import pandas and create your first simple DataFrame.

Creating Your First DataFrame

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
print(df)

This code snippet creates a DataFrame from a dictionary of lists. Each key in the dictionary becomes a column in the DataFrame, and the lists become the data for those columns. The output should look something like this:

    Name  Age      City
0   John   28  New York
1   Anna   34     Paris
2  Peter   29    Berlin
3  Linda   32    London
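
A dictionary of lists is column-oriented. As a minimal sketch of an alternative, the same kind of table can be built row by row from a list of dictionaries, where each dictionary is one record. The variable names df_rows and df_cols are illustrative and not part of the original example:

```python
import pandas as pd

# Row-oriented input: each dictionary describes one row,
# and its keys become the column names.
rows = [
    {"Name": "John", "Age": 28, "City": "New York"},
    {"Name": "Anna", "Age": 34, "City": "Paris"},
]
df_rows = pd.DataFrame(rows)

# Column-oriented input: one list per column, as in the example above.
df_cols = pd.DataFrame({
    "Name": ["John", "Anna"],
    "Age": [28, 34],
    "City": ["New York", "Paris"],
})

# Both constructions should describe the same table.
same = df_rows.equals(df_cols)
print(same)
```

The row-oriented form tends to be convenient when records arrive one at a time, for example from a loop or an API response.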

Adding Data to an Existing DataFrame

After creating a DataFrame, you might need to add new data to it. Older tutorials use the DataFrame.append method for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0, so use the pd.concat function instead. Here’s how to add a single row by wrapping it in a one-row DataFrame and concatenating:

new_row = {'Name': 'Max', 'Age': 26, 'City': 'Amsterdam'}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)

The updated DataFrame now includes the new row:

    Name  Age       City
0   John   28   New York
1   Anna   34      Paris
2  Peter   29     Berlin
3  Linda   32     London
4    Max   26  Amsterdam
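
When several rows need to be added, it is usually cheaper to collect them first and concatenate once rather than growing the DataFrame one row at a time. The sketch below assumes a small starting table; the rows for 'Eva' and 'Omar' are made-up sample data. It also shows label-based assignment with df.loc as another way to add a single row:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna"],
    "Age": [28, 34],
    "City": ["New York", "Paris"],
})

# Collect the new rows in a list, then concatenate them in one call.
more_rows = pd.DataFrame([
    {"Name": "Max", "Age": 26, "City": "Amsterdam"},
    {"Name": "Eva", "Age": 41, "City": "Madrid"},  # hypothetical sample row
])
df = pd.concat([df, more_rows], ignore_index=True)

# Assigning to a label that does not exist yet adds a row in place.
# With the default integer index, len(df) is the next unused label.
df.loc[len(df)] = ["Omar", 30, "Cairo"]  # hypothetical sample row

print(df)
```

Passing ignore_index=True renumbers the index from 0, which avoids duplicate labels after concatenation.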

Modifying DataFrame Structure

Aside from adding data, you might also want to modify the structure of your DataFrame, such as adding or deleting columns. To add a new column, you can simply assign it directly:

df['Employed'] = [True, True, False, True, True]
print(df)

This code adds a new column ‘Employed’ indicating the employment status of each individual. The DataFrame should now include the new column:

    Name  Age       City  Employed
0   John   28   New York      True
1   Anna   34      Paris      True
2  Peter   29     Berlin     False
3  Linda   32     London      True
4    Max   26  Amsterdam      True
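
Deleting a column, mentioned above, can be sketched with DataFrame.drop. The example below also derives a new column from an existing one; the column name AgeNextYear and the variable trimmed are illustrative assumptions, not part of the original tutorial:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna"],
    "Age": [28, 34],
    "City": ["New York", "Paris"],
    "Employed": [True, False],
})

# A new column can be computed from existing columns.
df["AgeNextYear"] = df["Age"] + 1

# drop returns a new DataFrame without the listed columns;
# the original df is left unchanged.
trimmed = df.drop(columns=["City"])

print(trimmed)
```

Because drop returns a copy by default, reassign the result (for example df = df.drop(columns=[...])) if the column should be removed from df itself.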

Advanced DataFrame Manipulation

As you become more comfortable with creating and modifying DataFrames, you’ll likely encounter the need for more advanced manipulation techniques. For instance, you may want to perform operations across rows or columns, handle missing data, or merge DataFrames.

Handling Missing Data

Handling missing data is a common necessity in data analysis. Pandas offers several methods for dealing with it, such as dropna for removing rows or columns that contain missing values and fillna for replacing missing values with a value you choose. Here’s an example of using fillna:

df['Employed'] = df['Employed'].fillna(False)
print(df)

If your DataFrame contains rows with a missing ‘Employed’ status, this code sets them to False so that every row has a value in that column. In the example above no values are missing, so the output is unchanged.
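
To see dropna and fillna act on real gaps, here is a small sketch with a DataFrame that does contain missing values. The data is invented for illustration, and filling Age with the column mean and City with 'Unknown' is just one possible choice:

```python
import pandas as pd

# Hypothetical sample data with one missing Age and one missing City.
df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter"],
    "Age": [28, None, 29],
    "City": ["New York", "Paris", None],
})

# Count missing values per column.
missing_per_column = df.isna().sum()

# dropna keeps only the rows with no missing values.
complete_rows = df.dropna()

# fillna accepts a dictionary mapping column names to fill values.
filled = df.fillna({"Age": df["Age"].mean(), "City": "Unknown"})

print(missing_per_column)
print(complete_rows)
print(filled)
```

Both dropna and fillna return new DataFrames here, so df itself still contains the missing values afterwards.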

Conclusion

In this tutorial, you’ve learned how to manually create a pandas DataFrame and add data to it, starting with simple examples and moving to more complex data manipulation techniques. Understanding how to create and manipulate DataFrames is a foundational skill in data analysis and will enable you to work efficiently with large datasets.

Remember, the key to mastering pandas is practice and experimentation. Explore the vast functionality of pandas further and you’ll uncover even more powerful tools for your data analysis tasks.
