When only part of the values in a DataFrame column is needed, we can split the column and extract the required part on its own.
We can use the Pandas .str accessor, which provides fast vectorized string operations on a Series or Index. One of its most useful methods is str.split, which splits each string on a delimiter and returns a Series of lists. Indexing that result with .str again picks out the desired part of the string. To get the nth part of each string, first split the column by the delimiter and then apply str[n-1] to the object returned, i.e. Dataframe.columnName.str.split(" ").str[n-1].
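The pattern described above can be sketched on a small standalone Series. This is a minimal illustration, not part of the article's examples: the sample strings, the "-" delimiter, and the names s, parts, n and nth are all assumptions made for the demonstration. It also shows that rows with fewer than n pieces yield a missing value instead of raising an error.

```python
import pandas as pd

# Hypothetical sample data (not the article's DataFrame)
s = pd.Series(["a-b-c", "d-e-f", "g-h"], name="code")

# str.split returns a Series whose elements are Python lists
parts = s.str.split("-")

# .str[n - 1] picks the nth piece of each list;
# rows with fewer than n pieces get NaN
n = 3
nth = parts.str[n - 1]

print(parts.tolist())
print(nth.tolist())
```

Because the positional index is zero-based, the third piece is reached with str[2], which is why the formula uses n-1.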
Let’s make it clear by examples.
Code #1: Print the Series obtained from the split column.
    import pandas as pd
    import numpy as np

    # Sample DataFrame; Geek_R holds random numbers
    df = pd.DataFrame({
        'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id', 'Geek4_id', 'Geek5_id'],
        'Geek_A': [1, 1, 3, 2, 4],
        'Geek_B': [1, 2, 3, 4, 6],
        'Geek_R': np.random.randn(5)})

    #     Geek_ID  Geek_A  Geek_B         Geek_R
    # 0  Geek1_id       1       1  random number
    # 1  Geek2_id       1       2  random number
    # 2  Geek3_id       3       3  random number
    # 3  Geek4_id       2       4  random number
    # 4  Geek5_id       4       6  random number

    # Split each ID on '_' and keep the first part
    print(df.Geek_ID.str.split('_').str[0])
    0    Geek1
    1    Geek2
    2    Geek3
    3    Geek4
    4    Geek5
    Name: Geek_ID, dtype: object
Code #2: Print the returned object as a list.
    import pandas as pd
    import numpy as np

    # Sample DataFrame; Geek_R holds random numbers
    df = pd.DataFrame({
        'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id', 'Geek4_id', 'Geek5_id'],
        'Geek_A': [1, 1, 3, 2, 4],
        'Geek_B': [1, 2, 3, 4, 6],
        'Geek_R': np.random.randn(5)})

    #     Geek_ID  Geek_A  Geek_B         Geek_R
    # 0  Geek1_id       1       1  random number
    # 1  Geek2_id       1       2  random number
    # 2  Geek3_id       3       3  random number
    # 3  Geek4_id       2       4  random number
    # 4  Geek5_id       4       6  random number

    # Keep the first part of each ID and convert the Series to a list
    print(df.Geek_ID.str.split('_').str[0].tolist())
['Geek1', 'Geek2', 'Geek3', 'Geek4', 'Geek5']
Code #3: Print the second part of each string as a list.
    import pandas as pd
    import numpy as np

    # Sample DataFrame; Geek_R holds random numbers
    df = pd.DataFrame({
        'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id', 'Geek4_id', 'Geek5_id'],
        'Geek_A': [1, 1, 3, 2, 4],
        'Geek_B': [1, 2, 3, 4, 6],
        'Geek_R': np.random.randn(5)})

    #     Geek_ID  Geek_A  Geek_B         Geek_R
    # 0  Geek1_id       1       1  random number
    # 1  Geek2_id       1       2  random number
    # 2  Geek3_id       3       3  random number
    # 3  Geek4_id       2       4  random number
    # 4  Geek5_id       4       6  random number

    # Keep the second part of each ID (index 1) and convert to a list
    print(df.Geek_ID.str.split('_').str[1].tolist())
['id', 'id', 'id', 'id', 'id']
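The examples above only print the extracted parts. If the goal is to keep them as separate columns, one possible approach is the expand=True option of str.split, which returns a DataFrame with one column per piece. The sketch below is an illustration under assumptions: it uses a shortened version of the Geek_ID column, and the column names Geek_name and Geek_suffix are hypothetical, chosen only for this example.

```python
import pandas as pd

# Shortened version of the Geek_ID column used in the examples
df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id']})

# expand=True returns a DataFrame with one column per piece (0, 1, ...)
pieces = df['Geek_ID'].str.split('_', expand=True)

# 'Geek_name' and 'Geek_suffix' are hypothetical column names
df['Geek_name'] = pieces[0]
df['Geek_suffix'] = pieces[1]

print(df)
```

When only one part is needed, the str[n-1] form used in the examples is usually enough; expand=True is more convenient when several parts are kept at once.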