Sling Academy

Pandas: How to combine 2 columns into one with a separator (e.g., comma)

Last updated: February 22, 2024

Introduction

Manipulating data is a routine task in data analysis, and combining columns is among the fundamental operations you’ll likely need to perform. Whether you are preparing your data for visualization, analysis, or simply need to restructure your dataset, Pandas in Python offers versatile functions to accomplish this. This guide will walk you through various ways to combine two columns into one, using a separator like a comma.

Before diving in, let’s ensure you have Pandas installed:

pip install pandas

Basic Example

The simplest approach is to concatenate the two columns and the separator with the + operator:

import pandas as pd

df = pd.DataFrame({
  'First Name': ['John', 'Jane', 'Alice', 'Bob'],
  'Last Name': ['Doe', 'Doe', 'Cooper', 'Smith']
})

df['Full Name'] = df['First Name'] + ', ' + df['Last Name']
print(df)

This results in a new 'Full Name' column:

  First Name Last Name      Full Name
0       John       Doe      John, Doe
1       Jane       Doe      Jane, Doe
2      Alice    Cooper  Alice, Cooper
3        Bob     Smith     Bob, Smith
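The + operator only concatenates when both operands are strings. If one of the columns holds numbers, convert it with astype(str) first. A minimal sketch, assuming a hypothetical integer 'Age' column:

```python
import pandas as pd

# Hypothetical data: 'Age' is an integer column, not a string column
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Alice'],
    'Age': [34, 28, 41],
})

# Adding the integer column directly would raise a TypeError,
# so convert it to strings before concatenating
df['Name and Age'] = df['Name'] + ', ' + df['Age'].astype(str)
print(df)
```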

Using the apply() Function

For more flexibility, you can use DataFrame.apply() with a lambda function. Passing axis=1 makes the lambda receive one row at a time:

df['Full Name'] = df.apply(lambda row: f"{row['First Name']}, {row['Last Name']}", axis=1)
print(df)

The result is the same as before. This approach is slower, because the lambda runs once per row in Python, but it adapts more easily when the combination logic is complex.
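Because apply() with axis=1 hands each row to the function as a Series, the same idea extends to any number of columns. The sketch below passes Python's built-in string join (as ', '.join) directly to apply(). The 'City' column is hypothetical, and every selected column is assumed to hold strings; a non-string value would make join raise a TypeError.

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['John', 'Jane', 'Alice', 'Bob'],
    'Last Name': ['Doe', 'Doe', 'Cooper', 'Smith'],
    'City': ['Boston', 'Denver', 'Austin', 'Seattle'],  # hypothetical extra column
})

# Columns to combine, in the order they should appear
cols = ['First Name', 'Last Name', 'City']

# Each row arrives as a Series; ', '.join glues its string values together
df['Combined'] = df[cols].apply(', '.join, axis=1)
print(df['Combined'])
```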

Using the str.cat() Method

A Series of strings exposes a cat() method through its .str accessor, which is designed for concatenating string columns directly:

df['Full Name'] = df['First Name'].str.cat(df['Last Name'], sep=', ')
print(df)

This produces the same result more concisely, using a method built specifically for string concatenation. The sep argument sets the separator placed between the values.
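str.cat() also accepts a list of Series for its others parameter, and its na_rep parameter controls how missing values are rendered. In this sketch the 'City' column, including its missing value, is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['John', 'Jane', 'Alice', 'Bob'],
    'Last Name': ['Doe', 'Doe', 'Cooper', 'Smith'],
    'City': ['Boston', None, 'Austin', 'Seattle'],  # hypothetical column with a gap
})

others = [df['Last Name'], df['City']]

# Passing a list to `others` concatenates several columns in one call
df['Default'] = df['First Name'].str.cat(others, sep=', ')

# With na_rep, the missing value is replaced instead of voiding the whole row
df['Filled'] = df['First Name'].str.cat(others, sep=', ', na_rep='N/A')
print(df[['Default', 'Filled']])
```

Without na_rep, any row that has a missing value in one of the columns comes out as NaN; with na_rep='N/A', that gap is filled with the placeholder instead.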

Adding Conditionals

Combining columns can sometimes require conditional logic. Here’s an advanced example:

df['Full Name'] = df.apply(
  lambda row: f"{row['First Name']}, {row['Last Name']}"
  if row['First Name'] != 'Alice'
  else f"Ms. {row['First Name']}, {row['Last Name']}",
  axis=1
)
print(df)

This code adds “Ms.” in front of Alice’s name (producing “Ms. Alice, Cooper”) and leaves the other rows unchanged, applying a simple condition directly within the apply() function.
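Because apply() calls the lambda once per row, a column-wise alternative can pay off on large frames. A sketch using np.where(), which picks between two precomputed versions of the combined string based on a boolean mask (the same 'Alice' condition as above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First Name': ['John', 'Jane', 'Alice', 'Bob'],
    'Last Name': ['Doe', 'Doe', 'Cooper', 'Smith'],
})

# Build the plain combined column once, for all rows at the same time
full = df['First Name'].str.cat(df['Last Name'], sep=', ')

# Where the mask is True take the prefixed string, otherwise the plain one
df['Full Name'] = np.where(df['First Name'] == 'Alice', 'Ms. ' + full, full)
print(df)
```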

Using np.vectorize()

NumPy's np.vectorize() wraps a Python function so that it can be applied element-wise to the two columns:

import numpy as np

df['Full Name'] = np.vectorize(lambda x,y: f"{x}, {y}")(df['First Name'], df['Last Name'])
print(df)

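This is often faster than apply() with axis=1, because it avoids building a Series object for every row. Keep in mind, though, that the NumPy documentation describes np.vectorize() as a convenience rather than a performance tool: internally it is still a Python-level loop. For plain string concatenation, the + operator or str.cat() is usually the better choice.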
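Performance differences depend on data size, hardware, and library versions, so it is worth measuring on your own data. A rough benchmarking sketch with the standard timeit module; the 20,000-row frame is synthetic, built by repeating the sample names, and no particular timings are claimed here:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: repeat the four sample names to get 20,000 rows
n = 5_000
df = pd.DataFrame({
    'First Name': ['John', 'Jane', 'Alice', 'Bob'] * n,
    'Last Name': ['Doe', 'Doe', 'Cooper', 'Smith'] * n,
})

def with_apply():
    # Row-wise apply: builds a Series for each row
    return df.apply(lambda row: f"{row['First Name']}, {row['Last Name']}", axis=1)

def with_vectorize():
    # Element-wise Python loop driven by NumPy
    combine = np.vectorize(lambda x, y: f"{x}, {y}")
    return pd.Series(combine(df['First Name'], df['Last Name']), index=df.index)

def with_str_cat():
    # Built-in string concatenation on whole columns
    return df['First Name'].str.cat(df['Last Name'], sep=', ')

# Sanity check: all three approaches must produce the same strings
assert with_apply().tolist() == with_vectorize().tolist() == with_str_cat().tolist()

for fn in (with_apply, with_vectorize, with_str_cat):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```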

Handling Missing Values

Columns often contain missing values (NaN). With the + operator, a NaN in either column makes the combined value NaN as well, so fill the gaps before combining:

df['Full Name'] = df['First Name'].fillna('') + ', ' + df['Last Name'].fillna('')
print(df)

This way, every row gets a combined string even when one of its values is missing, instead of ending up as NaN.
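Filling with an empty string still leaves a stray separator where a value was missing (for example “, Doe”). If that matters, one option is to join only the values that are present in each row. A sketch, assuming hypothetical data with one missing value per column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in each column
df = pd.DataFrame({
    'First Name': ['John', np.nan, 'Alice', 'Bob'],
    'Last Name': ['Doe', 'Doe', np.nan, 'Smith'],
})

# fillna('') keeps every row but leaves a separator next to the gap
df['Filled'] = df['First Name'].fillna('') + ', ' + df['Last Name'].fillna('')

# Dropping the missing values of each row first avoids the stray separator
df['Joined'] = df[['First Name', 'Last Name']].apply(
    lambda row: ', '.join(row.dropna()), axis=1
)
print(df)
```

A row in which every value is missing would come out as an empty string with this approach.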

Conclusion

Combining two columns in Pandas using a separator, such as a comma, enables data analysts to efficiently reformat and merge dataset fields for further analysis or visualization. This guide has provided multiple methods, from basic concatenations to more advanced techniques, each suitable for different scenarios. With these skills, you are now equipped to handle a wide range of data manipulation tasks in Python’s Pandas.
