A Pandas Series is a one-dimensional, labeled, array-like object that can hold data of any type. It can be created from a variety of data sources, such as a Python list, a Python dictionary, a NumPy array, a CSV file, or a JSON file.
Overview
The syntax for creating a Pandas Series is:
pd.Series(
data,
index=index,
dtype=dtype,
name=name,
copy=False,
fastpath=False
)
Where:
- data: The data source, such as a list, a dictionary, a NumPy array, or a scalar value.
- index: Optional. The labels for the values of the Series. If omitted, the values are labeled with their integer positions (0, 1, 2, and so on).
- dtype: Optional. The data type of the values of the Series. If not specified, it is inferred from the data.
- name: Optional. The name of the Series.
- copy: Whether to copy the input data. The default is False.
- fastpath: An internal parameter that you can safely ignore.
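The parameters above can be combined in a single call. The following is a minimal sketch, assuming some made-up temperature readings (the values, labels, and the name temperature are hypothetical and used only for illustration):

```python
import pandas as pd

# Hypothetical readings, chosen only to illustrate the parameters
temps = pd.Series(
    [21, 23, 19],                  # data: a plain Python list of integers
    index=['mon', 'tue', 'wed'],   # index: one label per value
    dtype='float64',               # dtype: force floats instead of inferred integers
    name='temperature',            # name: stored on the Series itself
)

print(temps)
print(temps.dtype)   # the dtype requested above
print(temps.name)    # the name requested above
```

Because dtype is given explicitly, the integers in the list are stored as floating-point numbers rather than being inferred as integers.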
If you are new to Pandas and data science, this syntax may look a bit confusing. The following examples show how to create a Pandas Series in practice.
Examples
Creating a Series from a Python list
This example creates a series from a list of numbers:
import pandas as pd
numbers = [9, 8, 7, 6, 5]
number_series = pd.Series(numbers)
print(number_series)
Output:
0 9
1 8
2 7
3 6
4 5
dtype: int64
This code snippet produces a series from a list of strings with customized labels:
import pandas as pd
colors = ['red', 'green', 'blue', 'yellow', 'orange']
labels = ['a', 'b', 'c', 'd', 'e']
color_series = pd.Series(colors, index=labels)
print(color_series)
Output:
a red
b green
c blue
d yellow
e orange
dtype: object
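When you pass custom labels for a list, the number of labels has to match the number of values. The sketch below uses a deliberately mismatched pair (hypothetical data) to show that Pandas raises a ValueError in that case; the exact wording of the message may differ between Pandas versions:

```python
import pandas as pd

colors = ['red', 'green', 'blue']
labels = ['a', 'b']  # deliberately one label short

try:
    pd.Series(colors, index=labels)
    error_message = None
except ValueError as exc:
    # Pandas rejects an index whose length differs from the data length
    error_message = str(exc)

print(error_message)
```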
Creating a Series from a Python dictionary
Let’s say we have a dictionary whose keys are job titles and whose values are the corresponding salaries. We can create a Pandas Series from this dictionary like this:
import pandas as pd
jobs_and_salaries = {
'Data Scientist': 120000,
'Software Engineer': 100000,
'Data Analyst': 90000,
'Business Analyst': 80000,
'Project Manager': 80000
}
series = pd.Series(jobs_and_salaries)
print(series)
Output:
Data Scientist 120000
Software Engineer 100000
Data Analyst 90000
Business Analyst 80000
Project Manager 80000
dtype: int64
The labels are the names of the occupations and the values are the respective salaries.
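If you pass index together with a dictionary, Pandas looks up each label among the dictionary keys instead of relabeling the values. The sketch below, which assumes a shortened version of the dictionary above plus a hypothetical 'Intern' label that is not a key, shows how this selects and reorders entries and fills missing ones with NaN:

```python
import pandas as pd

jobs_and_salaries = {
    'Data Scientist': 120000,
    'Data Analyst': 90000,
}

# 'Intern' is a hypothetical label with no matching key in the dictionary
wanted = ['Data Analyst', 'Data Scientist', 'Intern']
series = pd.Series(jobs_and_salaries, index=wanted)

print(series)
print(series.isna())  # True only for the label that had no matching key
```

Because NaN is a floating-point value, the resulting Series uses a float dtype even though the dictionary contains only integers.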
Creating a Series from a NumPy array
You can construct a Series from a NumPy array as follows:
import numpy as np
import pandas as pd
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
print(s)
Output:
0 a
1 b
2 c
3 d
dtype: object
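The copy parameter matters most with NumPy arrays. Here is a minimal sketch, using a hypothetical numeric array, that passes copy=True so that later changes to the array do not affect the Series. Whether the data is shared when copy is left at its default depends on the Pandas version, so the sketch only demonstrates the explicit-copy case:

```python
import numpy as np
import pandas as pd

arr = np.array([1.0, 2.0, 3.0])

# copy=True gives the Series its own copy of the data
s = pd.Series(arr, copy=True, name='values')

# Modify the original array after the Series has been created
arr[0] = 99.0

print(arr[0])     # the array reflects the change
print(s.iloc[0])  # the Series still holds the original value
```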
Creating a Series from a scalar value
If the data is a single scalar value, it is repeated for every label in the index:
import pandas as pd
s = pd.Series(4.4, index=['a', 'b', 'c', 'd', 'e'])
print(s)
Output:
a 4.4
b 4.4
c 4.4
d 4.4
e 4.4
dtype: float64
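The length of a Series built from a scalar is determined entirely by the index. As a small sketch (the variable names are hypothetical), a range can serve as the index, and omitting the index yields a Series with a single element:

```python
import pandas as pd

# The scalar 0 is repeated once for each label in range(3)
zeros = pd.Series(0, index=range(3))

# Without an index, the scalar produces a one-element Series
single = pd.Series(4.4)

print(zeros)
print(len(single))
```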
Creating a Series from a DataFrame
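You can create a Pandas Series from a Pandas DataFrame by selecting a single column. Selecting with single brackets, such as df['Price'], already returns a Series. The DataFrame.squeeze() method is useful when you have a one-column DataFrame, such as df[['Price']], and want to collapse it into a Series.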
In this example, we have a DataFrame (about some products) with a column named Price. We’ll create a Series that contains the values from the Price column:
import pandas as pd
data = {
'Products': ['Laptop','Tablet', 'Phone', 'Keyboard', 'Mouse'],
'Brand': ['A', 'B', 'C', 'D', 'E'],
'Price': [1000, 800, 1300, 150, 100]
}
df = pd.DataFrame(data, columns=['Products', 'Brand', 'Price'])
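# df[['Price']] is a one-column DataFrame; squeeze('columns') collapses it into a Series
price_series = df[['Price']].squeeze('columns')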
print(price_series)
Output:
0 1000
1 800
2 1300
3 150
4 100
Name: Price, dtype: int64
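To make the difference between the two selection styles concrete, here is a short sketch using a trimmed-down version of the product data above. It compares the types returned by single brackets, double brackets, and squeeze('columns'):

```python
import pandas as pd

df = pd.DataFrame({
    'Products': ['Laptop', 'Tablet', 'Phone'],
    'Price': [1000, 800, 1300],
})

by_brackets = df['Price']               # single brackets: already a Series
one_column = df[['Price']]              # double brackets: a one-column DataFrame
squeezed = one_column.squeeze('columns')  # collapse the single column into a Series

print(type(by_brackets).__name__)
print(type(one_column).__name__)
print(type(squeezed).__name__)
print(by_brackets.equals(squeezed))
```

Both routes end with the same Series, so squeeze() is only needed when you start from a DataFrame that has exactly one column.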
Creating a Series from a CSV file
In order to create a Pandas Series from a CSV file, you can use the pandas.read_csv() function. This function takes the file path as an argument and returns a DataFrame. You can then use the DataFrame to generate a Series by selecting a column from the DataFrame.
This example creates a DataFrame from an online CSV file that stores sample data about employees in a fictional company. It then extracts two Series from it: first_name_series (containing the first names of the employees) and last_name_series (containing their last names):
import pandas as pd
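import ssl
# Caution: this disables HTTPS certificate verification for the whole process.
# Use it only as a workaround in throwaway demos, not in production code.
ssl._create_default_https_context = ssl._create_unverified_context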
file_path = 'https://api.slingacademy.com/v1/sample-data/files/employees.csv'
dataframe = pd.read_csv(file_path, storage_options={'User-Agent': 'Mozilla/5.0'})
first_name_series = dataframe['first_name']
last_name_series = dataframe['last_name']
print(first_name_series.head())
print(last_name_series.head())
Output:
0 Jose
1 Douglas
2 Sherry
3 Charles
4 Sharon
Name: first_name, dtype: object
0 Lopez
1 Carter
2 Foster
3 Fisher
4 Hunter
Name: last_name, dtype: object
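If you want to try the same approach without network access, the sketch below reads CSV text from memory instead of a URL. The CSV content and names are hypothetical and only mimic the shape of the employee file. It also shows one way to load a single column directly, by combining usecols with squeeze('columns'):

```python
import io
import pandas as pd

# Hypothetical in-memory CSV that mimics the employee file's columns
csv_text = "first_name,last_name\nAda,Lovelace\nAlan,Turing\n"

# Read the whole file, then select a column as a Series
dataframe = pd.read_csv(io.StringIO(csv_text))
first_name_series = dataframe['first_name']

# Read only one column and collapse the one-column DataFrame into a Series
last_name_series = pd.read_csv(
    io.StringIO(csv_text), usecols=['last_name']
).squeeze('columns')

print(first_name_series)
print(last_name_series)
```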
If you find anything confusing or spot an error in the examples above, please leave a comment. Happy coding and have a nice day!