Saturday, February 14, 2026

String Munging In Pandas Dataframe

In this article, we look at string munging in a Pandas DataFrame. Munging means cleaning up messy data by transforming it. In technical terms, it is the process of converting raw data into a form that is useful for analysis.

Example: “no-one@example.com” becomes “no-one at example dot com”
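
The transformation in the example above can be sketched with the pandas string accessor. This is a minimal illustration, not part of the article's walkthrough; the sample Series `emails` and the result name `spoken` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, not the article's DataFrame.
emails = pd.Series(["no-one@example.com", "jane.doe@example.org"])

# regex=False makes "@" and "." literal characters rather than regex syntax.
spoken = (
    emails
    .str.replace("@", " at ", regex=False)
    .str.replace(".", " dot ", regex=False)
)

print(spoken.tolist())
```

Passing `regex=False` matters for the second call: as a regular expression, an unescaped "." would match every character.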

Approach:

Step 1: Import the libraries

Python3




import pandas as pd
import numpy as np
import re


Step 2: Create the DataFrame

Create a dictionary of columns and pass it to pd.DataFrame. One email is deliberately left missing (NaN) so we can see how the string methods handle missing values.

Python3




raw_data = {"first_name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
            "last_name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze"],
            # np.nan marks the missing email (the np.NAN alias was removed in NumPy 2.0)
            "email": ["jas203@gmail.com", "momomolly@gmail.com", np.nan,
                      "battler@milner.com", "Ames1234@yahoo.com"]}
  
df = pd.DataFrame(raw_data, columns=["first_name", "last_name", "email"])
print()
print(df)


Step 3: Apply different munging operations

First, check which strings in the “email” column contain “gmail”. The match is case-sensitive by default.

Python3




print(df["email"].str.contains("gmail"))
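
As a hedged variation on the check above, the sketch below passes `case=False` and `na=False` to `str.contains`, so the match ignores letter case and the missing email is reported as a plain `False`. The resulting mask can then be used to filter rows. The variable names `emails` and `is_gmail` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same email values as the article's DataFrame, held in a standalone Series.
emails = pd.Series(["jas203@gmail.com", "momomolly@gmail.com", np.nan,
                    "battler@milner.com", "Ames1234@yahoo.com"])

# case=False ignores letter case; na=False treats missing values as "no match".
is_gmail = emails.str.contains("gmail", case=False, na=False)

print(is_gmail.tolist())          # [True, True, False, False, False]
print(emails[is_gmail].tolist())  # ['jas203@gmail.com', 'momomolly@gmail.com']
```

Setting `na=False` gives a purely boolean mask, which is what boolean indexing such as `emails[is_gmail]` expects.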


Next, split each email into three parts: the characters before “@”, the characters between “@” and the last “.”, and the remaining characters after that “.”. A regular expression with three capture groups does this, and str.findall returns the captured parts for each row.

Python3




pattern = r"([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})"
print(df["email"].str.findall(pattern, flags=re.IGNORECASE))
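
If separate columns are more convenient than a list of tuples, `str.extract` is one alternative to `str.findall`. The sketch below uses the same pattern with named groups; the group names `user`, `domain` and `tld` are labels chosen here for illustration and do not come from the article.

```python
import re

import numpy as np
import pandas as pd

# Same email values as the article's DataFrame.
emails = pd.Series(["jas203@gmail.com", "momomolly@gmail.com", np.nan,
                    "battler@milner.com", "Ames1234@yahoo.com"])

# Named groups become the column names of the resulting DataFrame.
pattern = (r"(?P<user>[A-Z0-9._%+-]+)@"
           r"(?P<domain>[A-Z0-9.-]+)\."
           r"(?P<tld>[A-Z]{2,4})")

# One row per email; the row for the missing email is filled with NaN.
parts = emails.str.extract(pattern, flags=re.IGNORECASE)
print(parts)
```

Unlike `str.findall`, `str.extract` keeps only the first match in each string, which is sufficient when every cell holds a single address.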


Below is the implementation:

Python3




def ProjectPro_Ex_136():
  
    print()
    print('**How we can do string munging in Pandas**')
  
    # loading libraries
    import pandas as pd
    import numpy as np
    import re
  
    # Creating dataframe
    raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
                'email': ['jas203@gmail.com', 'momomolly@gmail.com', np.nan,
                          'battler@milner.com', 'Ames1234@yahoo.com']}
  
    df = pd.DataFrame(raw_data, columns=['first_name', 'last_name', 'email'])
    print()
    print(df)
  
    # Find which strings in the
    # email column contain 'gmail'
    print()
    print(df['email'].str.contains('gmail'))
  
    # Create a regular expression pattern that
    # breaks apart emails
    pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'
  
    # Find every substring in df.email that
    # matches that pattern
    print()
    print(df['email'].str.findall(pattern, flags=re.IGNORECASE))
  
  
ProjectPro_Ex_136()


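Output:

The function prints the DataFrame, then a Series showing which emails contain “gmail” (True for the first two rows), and finally a Series in which each valid email is split into a (user, domain, top-level domain) tuple, such as (jas203, gmail, com). The row with the missing email produces no match.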

Dominic