Pandas: How to combine 2 columns into one with a separator (e.g., comma)

Updated: February 22, 2024 By: Guest Contributor

Introduction

Manipulating data is a routine task in data analysis, and combining columns is among the fundamental operations you’ll likely need to perform. Whether you are preparing your data for visualization, analysis, or simply need to restructure your dataset, Pandas in Python offers versatile functions to accomplish this. This guide will walk you through various ways to combine two columns into one, using a separator like a comma.

Before diving in, let’s ensure you have Pandas installed:

pip install pandas

Basic Example

To start, let’s combine two columns in the simplest case:

import pandas as pd

df = pd.DataFrame({
  'First Name': ['John', 'Jane', 'Alice', 'Bob'],
  'Last Name': ['Doe', 'Doe', 'Cooper', 'Smith']
})

df['Full Name'] = df['First Name'] + ', ' + df['Last Name']
print(df)

This results in a new 'Full Name' column:

  First Name Last Name      Full Name
0       John       Doe      John, Doe
1       Jane       Doe      Jane, Doe
2      Alice    Cooper  Alice, Cooper
3        Bob     Smith     Bob, Smith
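The + operator works here because both columns hold strings. If one of the columns is numeric, the same expression typically fails with a TypeError, so the numeric column has to be cast first. A minimal sketch, assuming a hypothetical 'Age' column that is not part of the example above:

```python
import pandas as pd

# Hypothetical data: 'Age' is an integer column, added only for illustration.
people = pd.DataFrame({
    'First Name': ['John', 'Jane'],
    'Age': [34, 29]
})

# Cast the numeric column to strings before concatenating.
people['Name and Age'] = people['First Name'] + ', ' + people['Age'].astype(str)
print(people)
```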

Using apply() Function

For more flexibility, you can use the apply() function combined with a lambda function:

df['Full Name'] = df.apply(lambda row: f"{row['First Name']}, {row['Last Name']}", axis=1)
print(df)

The result is the same as before. Because apply() with axis=1 calls a Python function once per row, it is slower than the + operator on large DataFrames, but it adapts easily to more complex logic.

Utilizing the str.cat() Method

Pandas exposes a cat() method through the Series.str accessor, designed for concatenating string columns directly:

df['Full Name'] = df['First Name'].str.cat(df['Last Name'], sep=', ')
print(df)

This produces the same result with a method that is explicitly designed for concatenating strings and takes the separator as a named argument.
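One detail worth knowing about str.cat() is how it treats missing values: by default a missing value on either side makes the result missing, while the na_rep parameter substitutes a replacement string. A small sketch, assuming a hypothetical two-row frame with one missing last name:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing last name, for illustration only.
names = pd.DataFrame({
    'First Name': ['John', 'Jane'],
    'Last Name': ['Doe', np.nan]
})

# Default behaviour: a missing value on either side yields a missing result.
default_result = names['First Name'].str.cat(names['Last Name'], sep=', ')

# With na_rep, missing values are replaced by the given string before joining.
filled_result = names['First Name'].str.cat(names['Last Name'], sep=', ', na_rep='')

print(default_result)
print(filled_result)
```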

Adding Conditionals

Combining columns can sometimes require conditional logic. Here’s an advanced example:

df['Full Name'] = df.apply(lambda row: f"{row['First Name']}, {row['Last Name']}" if row['First Name'] != 'Alice' else f"Ms. {row['First Name']}, {row['Last Name']}", axis=1)
print(df)

This code prepends “Ms.” to Alice’s name, giving “Ms. Alice, Cooper”, and leaves the other rows unchanged. The condition is evaluated row by row inside the apply() call.
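The same condition can also be expressed with whole-column operations instead of a per-row lambda. The following is a sketch of one possible alternative using np.where(), not a replacement the original method requires; the variable name base is chosen here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First Name': ['John', 'Jane', 'Alice', 'Bob'],
    'Last Name': ['Doe', 'Doe', 'Cooper', 'Smith']
})

# Build the unconditional result once, using whole-column operations.
base = df['First Name'] + ', ' + df['Last Name']

# Prepend the title only where the condition holds; keep 'base' elsewhere.
df['Full Name'] = np.where(df['First Name'] == 'Alice', 'Ms. ' + base, base)
print(df)
```

Keeping the condition as a boolean column expression makes it straightforward to swap in a different rule later without touching the concatenation itself.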

Using np.vectorize()

NumPy’s np.vectorize() offers another way to apply a Python function element-wise across two columns:

import numpy as np

df['Full Name'] = np.vectorize(lambda x,y: f"{x}, {y}")(df['First Name'], df['Last Name'])
print(df)

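This is often faster than apply() with axis=1, mainly because it avoids building a Series object for every row. Keep in mind that np.vectorize() is a convenience wrapper around a Python-level loop rather than true vectorization, as the NumPy documentation itself notes. For plain concatenation, the + operator or str.cat() is typically at least as fast.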
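Relative speed is easy to measure rather than assume. Below is a rough sketch using timeit on a synthetic frame; the row count and the number of runs are arbitrary assumptions, and the timings will differ between machines and pandas versions, so no figures are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the size (10,000 rows) is an arbitrary choice for illustration.
n = 10_000
big = pd.DataFrame({
    'First Name': ['John', 'Jane', 'Alice', 'Bob'] * (n // 4),
    'Last Name': ['Doe', 'Doe', 'Cooper', 'Smith'] * (n // 4)
})

def with_plus():
    # Whole-column concatenation with the + operator.
    return big['First Name'] + ', ' + big['Last Name']

def with_apply():
    # Row-wise apply: one Python function call per row.
    return big.apply(lambda row: f"{row['First Name']}, {row['Last Name']}", axis=1)

def with_vectorize():
    # np.vectorize: an element-wise loop over the two columns.
    return np.vectorize(lambda x, y: f"{x}, {y}")(big['First Name'], big['Last Name'])

# Time three runs of each approach and print the totals.
for func in (with_plus, with_apply, with_vectorize):
    seconds = timeit.timeit(func, number=3)
    print(f"{func.__name__}: {seconds:.3f} s for 3 runs")
```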

Handling Missing Values

Missing values (NaN) propagate through string concatenation: if either part is missing, the combined value is missing as well. Replacing missing values with empty strings first avoids this:

df['Full Name'] = df['First Name'].fillna('') + ', ' + df['Last Name'].fillna('')
print(df)

This keeps every row populated even when one column has missing values. Note that the separator is still added, so a row with a missing last name ends up with a trailing comma and space, for example 'Jane, '.
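Since the sample DataFrame in this guide has no missing values, the effect is easier to see on data that does. The sketch below uses a hypothetical frame with gaps and compares plain concatenation, the fillna('') approach, and one possible way to skip the separator when a part is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps, for illustration only.
gaps = pd.DataFrame({
    'First Name': ['John', 'Jane', np.nan],
    'Last Name': ['Doe', np.nan, 'Smith']
})

# Plain concatenation: any missing part makes the whole result missing.
plain = gaps['First Name'] + ', ' + gaps['Last Name']

# fillna('') keeps the row but leaves a dangling separator.
filled = gaps['First Name'].fillna('') + ', ' + gaps['Last Name'].fillna('')

# Joining only the non-missing parts avoids the dangling separator.
clean = gaps[['First Name', 'Last Name']].apply(
    lambda row: ', '.join(row.dropna()), axis=1
)

print(plain)
print(filled)
print(clean)
```

The last variant runs row by row, so it trades some speed for cleaner output; whether that matters depends on the size of the data.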

Conclusion

Combining two columns in Pandas using a separator, such as a comma, enables data analysts to efficiently reformat and merge dataset fields for further analysis or visualization. This guide has provided multiple methods, from basic concatenations to more advanced techniques, each suitable for different scenarios. With these skills, you are now equipped to handle a wide range of data manipulation tasks in Python’s Pandas.