Create a new column in Pandas DataFrame based on the existing columns

28 July 2024


When working with data in Pandas, we often need to transform it into the desired form. A common task is creating new columns in a DataFrame from the results of operations on its existing columns. Let’s discuss several ways to do that.

Given a DataFrame containing data about events, we would like to create a new column called ‘Discounted_Price’, which is calculated by applying a 10% discount to the ticket price stored in the ‘Cost’ column.

Example 1: We can use the DataFrame.apply() function to achieve this task. First, we create the DataFrame.

Python3

# importing pandas as pd
import pandas as pd
 
# Creating the DataFrame
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
                   'Cost': [10000, 5000, 15000, 2000]})
 
# Print the dataframe
print(df)

Output:

        Date    Event   Cost
0  10/2/2011    Music  10000
1  11/2/2011   Poetry   5000
2  12/2/2011  Theatre  15000
3  13/2/2011   Comedy   2000

Now we will create a new column called ‘Discounted_Price’ after applying a 10% discount on the existing ‘Cost’ column.

Python3

# using apply function to create a new column
df['Discounted_Price'] = df.apply(lambda row: row.Cost -
                                  (row.Cost * 0.1), axis=1)
 
# Print the DataFrame after addition
# of new column
print(df)

Output:

        Date    Event   Cost  Discounted_Price
0  10/2/2011    Music  10000            9000.0
1  11/2/2011   Poetry   5000            4500.0
2  12/2/2011  Theatre  15000           13500.0
3  13/2/2011   Comedy   2000            1800.0
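The lambda above can also be written as a named function, which may be easier to read and reuse. The following is only a sketch: the helper name apply_discount and its rate parameter are hypothetical and not part of the original example. It relies on DataFrame.apply() forwarding extra keyword arguments to the function it calls.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
                   'Cost': [10000, 5000, 15000, 2000]})


def apply_discount(row, rate):
    """Hypothetical helper: return the row's Cost reduced by a fractional rate."""
    return row['Cost'] - (row['Cost'] * rate)


# axis=1 passes each row to the function; rate=0.25 is forwarded as a keyword argument
discounted = df.apply(apply_discount, axis=1, rate=0.25)
print(discounted)
```

Here a 25% rate is used purely for illustration; passing rate=0.1 would reproduce the 10% discount from the example above.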

Example 2: We can achieve the same result by directly performing the required operation on the desired column element-wise.

Python3

import pandas as pd
 
# Creating the DataFrame
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
                   'Cost': [10000, 5000, 15000, 2000]})
 
# Create a new column 'Discounted_Price' after applying
# 10% discount on the existing 'Cost' column.
 
# create a new column
df['Discounted_Price'] = df['Cost'] - (0.1 * df['Cost'])
 
# Print the DataFrame after
# addition of new column
print(df)

Output:

        Date    Event   Cost  Discounted_Price
0  10/2/2011    Music  10000            9000.0
1  11/2/2011   Poetry   5000            4500.0
2  12/2/2011  Theatre  15000           13500.0
3  13/2/2011   Comedy   2000            1800.0
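The same element-wise expression can also be passed to DataFrame.assign(), which returns a new DataFrame with the extra column and leaves the original untouched. This is a minimal sketch of that variation, not the approach used in the example above; the variable name discounted_df is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
                   'Cost': [10000, 5000, 15000, 2000]})

# assign() builds a copy with the new column; df itself is not modified
discounted_df = df.assign(Discounted_Price=df['Cost'] - (0.1 * df['Cost']))

print(discounted_df)
print('Discounted_Price' in df.columns)
```

This form is convenient when the original DataFrame should stay unchanged or when several steps are chained together.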

Example 3: Using the Series.map() function to create a new column from an existing column with a mapping function

We will create a dataframe with some sample data:

Python3

import pandas as pd

data = {
    "name": ["John", "Ted", "Dev", "Brad", "Rex", "Smith", "Samuel", "David"],
    "salary": [10000, 20000, 50000, 45500, 19800, 95000, 5000, 50000]
}
# create dataframe from data dictionary
df = pd.DataFrame(data)
# print the first five rows of the dataframe
print(df.head())

Output:

   name  salary
0  John   10000
1   Ted   20000
2   Dev   50000
3  Brad   45500
4   Rex   19800

Now, we will create a mapping function (salary_stats) and pass it to Series.map(), called on the ‘salary’ column, to create a new column from that existing column.

Python3

def salary_stats(value):
    # map a salary to a descriptive label based on its range
    if value < 10000:
        return "very low"
    elif 10000 <= value < 25000:
        return "low"
    elif 25000 <= value < 40000:
        return "average"
    elif 40000 <= value < 50000:
        return "better"
    elif value >= 50000:
        return "very good"
 
# apply the function to every value in the 'salary' column
df['salary_stats'] = df['salary'].map(salary_stats)
print(df.head())

Output:

   name  salary salary_stats
0  John   10000          low
1   Ted   20000          low
2   Dev   50000    very good
3  Brad   45500       better
4   Rex   19800          low

Explanation: Here we have used the pandas Series.map() function, called on the ‘salary’ column, to map each value to a string based on our defined mapping logic. The resulting Series of values is assigned to a new column, “salary_stats”.
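Because salary_stats only sorts values into numeric ranges, the same labels can also be produced with pandas.cut(). This is a different technique from the map() approach shown above, sketched here under the assumption that the bin edges mirror the conditions in salary_stats; right=False makes each bin include its left edge and exclude its right edge, matching the "lower <= value < upper" checks.

```python
import pandas as pd

data = {
    "name": ["John", "Ted", "Dev", "Brad", "Rex", "Smith", "Samuel", "David"],
    "salary": [10000, 20000, 50000, 45500, 19800, 95000, 5000, 50000]
}
df = pd.DataFrame(data)

# bin edges and labels chosen to mirror the ranges in salary_stats
bins = [float("-inf"), 10000, 25000, 40000, 50000, float("inf")]
labels = ["very low", "low", "average", "better", "very good"]

# pd.cut returns a categorical Series; astype(str) converts it to plain strings
df["salary_stats"] = pd.cut(df["salary"], bins=bins, labels=labels,
                            right=False).astype(str)
print(df)
```

A mapping function remains the more flexible choice when the logic is not a simple set of ranges.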
