Thursday, December 26, 2024

Python | Filter list of strings based on the substring list

Given two lists of strings, string and substr, write a Python program that keeps only those strings in string that contain at least one of the strings in substr.

Examples:

Input : string = ['city1', 'class5', 'room2', 'city2']
substr = ['class', 'city']
Output : ['city1', 'class5', 'city2']

Input : string = ['coordinates', 'xyCoord', '123abc']
substr = ['abc', 'xy']
Output : ['xyCoord', '123abc']

Method #1: Using list comprehension
A list comprehension combined with the in operator and any() keeps each element of ‘string’ that contains at least one element of ‘substr’.

Python3




# Python3 program to filter a list of
# strings based on a list of substrings

def Filter(string, substr):
    # Keep s if at least one substring occurs in it
    return [s for s in string
            if any(sub in s for sub in substr)]

# Driver code
string = ['city1', 'class5', 'room2', 'city2']
substr = ['class', 'city']
print(Filter(string, substr))


Output:

['city1', 'class5', 'city2']

Time complexity: O(n * m) substring checks, where n is the number of strings in the input list “string” and m is the number of substrings in the input list “substr”; each check also costs time proportional to the lengths of the strings involved.
Auxiliary space: O(k), where k is the number of strings in the returned list.
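As a quick check, the comprehension approach can be run against both example inputs from the top of the article. This is a minimal sketch; the name filter_by_substrings is an illustrative choice and not part of the original program.

```python
def filter_by_substrings(strings, substrings):
    # Hypothetical helper name, used only for this illustration.
    # Keep each string that contains at least one of the substrings.
    return [s for s in strings if any(sub in s for sub in substrings)]

first = filter_by_substrings(['city1', 'class5', 'room2', 'city2'],
                             ['class', 'city'])
second = filter_by_substrings(['coordinates', 'xyCoord', '123abc'],
                              ['abc', 'xy'])

print(first)   # ['city1', 'class5', 'city2']
print(second)  # ['xyCoord', '123abc']
```

The in operator is case-sensitive, so a string such as 'City1' would not match the substring 'city' unless both sides are normalized first (for example with lower()).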

Method #2: Python Regex 

Python3




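# Python3 program to Filter list of
# strings based on another list
import re

def Filter(string, substr):
    # An empty pattern would match every string, so handle it up front
    if not substr:
        return []
    # Escape each substring and join them into one alternation,
    # e.g. 'class|city', so a single search tests all of them
    pattern = re.compile('|'.join(map(re.escape, substr)))
    return [s for s in string if pattern.search(s)]

# Driver code
string = ['city1', 'class5', 'room2', 'city2']
substr = ['class', 'city']
print(Filter(string, substr))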


Output:

['city1', 'class5', 'city2']

The time complexity of this program is O(nm) in the worst case, where n is the length of the string list and m is the length of the substr list; the regex scan also adds a factor proportional to the length of each string.

The auxiliary space complexity of this program is O(k), where k is the number of strings in the returned list.
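When the substrings may contain regex metacharacters such as + or ., using them unescaped in a pattern changes their meaning. The sketch below shows the difference re.escape makes; the helper name filter_regex and the sample data are assumptions made for illustration.

```python
import re

def filter_regex(strings, substrings):
    # Hypothetical helper: build one alternation from literal substrings.
    if not substrings:
        return []
    pattern = re.compile('|'.join(re.escape(sub) for sub in substrings))
    return [s for s in strings if pattern.search(s)]

samples = ['c++17', 'ccc', 'axb', 'a.b']

# Escaped: '+' and '.' are matched as literal characters.
escaped = filter_regex(samples, ['c++', 'a.b'])
print(escaped)  # ['c++17', 'a.b']

# Unescaped: '.' matches any character, so 'axb' is kept as well.
loose = [s for s in samples if re.search('a.b', s)]
print(loose)  # ['axb', 'a.b']
```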

Method #3 : Using find() method.

The find() method searches a string for the substring passed as its argument and returns the lowest index at which it occurs, or -1 if it is not found.
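The return values described above can be seen directly in a short sketch. The sample string is taken from the article's input; the variable names are illustrative.

```python
text = 'class5'

start = text.find('class')   # found at the very beginning
middle = text.find('s5')     # found later in the string
missing = text.find('city')  # not present

print(start)    # 0
print(middle)   # 4
print(missing)  # -1

# Index 0 is falsy, so compare against -1 rather than testing truthiness.
print(bool(start))   # False, even though 'class' is present
print(start != -1)   # True
```

This is why the membership test is written as find(...) != -1: a match at position 0 would otherwise be treated as "not found".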

Python3




# Python3 program to Filter list of
# strings based on another list

string = ['city1', 'class5', 'room2', 'city2']
substr = ['class', 'city']
x = []
for i in substr:
    for j in string:
        # find() returns -1 when i does not occur in j;
        # "j not in x" avoids adding the same string twice
        if j.find(i) != -1 and j not in x:
            x.append(j)
print(x)


Output

['class5', 'city1', 'city2']

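Because the outer loop runs over substr, the output is grouped by substring (‘class5’ comes first) rather than following the order of the input list.

The time complexity of this program is O(mn) find() calls, where m is the length of the substr list and n is the length of the string list; the “j not in x” membership test adds up to O(k) work per match.

The auxiliary space complexity of this program is O(k), where k is the size of the resulting list that contains the filtered strings.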
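The effect of loop order on the result can be illustrated by running both orders side by side. This is a sketch under the article's sample input; the variable names by_substring and by_input are hypothetical.

```python
strings = ['city1', 'class5', 'room2', 'city2']
substrings = ['class', 'city']

# Substrings in the outer loop: results are grouped by substring.
by_substring = []
for sub in substrings:
    for s in strings:
        if s.find(sub) != -1 and s not in by_substring:
            by_substring.append(s)

# Strings in the outer loop: results follow the input order.
by_input = [s for s in strings
            if any(s.find(sub) != -1 for sub in substrings)]

print(by_substring)  # ['class5', 'city1', 'city2']
print(by_input)      # ['city1', 'class5', 'city2']
```

Note that the second form keeps duplicate input strings, whereas the "not in" check in the first form drops them.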

Method #4 : Using the filter function and a lambda function:
The built-in filter() function takes two arguments, a function and an iterable, and returns an iterator that yields the elements of the iterable for which the function returns a truthy value.

Here the first argument is a lambda function that takes a string x and returns True if any element of the substrings list occurs in x, and False otherwise. The second argument is the strings list, which is the iterable to be filtered.

For the sample input, the lambda returns True for ‘city1’, ‘class5’, and ‘city2’, so the iterator yields exactly those elements, and list() collects them into a list.

Python3




# Initialize the list of strings and the list of substrings
strings = ['city1', 'class5', 'room2', 'city2']
substrings = ['class', 'city']
 
# Use the filter function and a lambda function to filter the strings
filtered_strings = list(filter(lambda x: any(substring in x for substring in substrings), strings))
 
# Print the filtered strings
print(filtered_strings)
#This code is contributed by Edula Vinay Kumar Reddy


Output

['city1', 'class5', 'city2']

Time complexity: O(n * m), where n is the length of the strings list and m is the length of the substrings list
Auxiliary Space: O(k), where k is the length of the filtered_strings list
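Since filter() returns an iterator rather than a list, its results are produced on demand and can be consumed only once. The following sketch illustrates this with the same sample data; the variable names are illustrative.

```python
strings = ['city1', 'class5', 'room2', 'city2']
substrings = ['class', 'city']

matches = filter(lambda s: any(sub in s for sub in substrings), strings)

first = next(matches)   # pulls only the first matching element
rest = list(matches)    # consumes the remaining matches
again = list(matches)   # the iterator is now exhausted

print(first)  # city1
print(rest)   # ['class5', 'city2']
print(again)  # []
```

Wrapping the call in list(), as the program above does, is the simplest way to get a reusable result.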

Method #5: Using a for loop:

Step-by-step approach:

  1. Define a function called “Filter” that takes two arguments: “string” and “substr”.
  2. Create an empty list called “filtered_list” to store the filtered strings.
  3. Use a for loop to iterate over each string in the input list “string”.
  4. Inside the first for loop, use another for loop to iterate over each substring in the filter list “substr”.
  5. Use an if statement to check if the current substring is present in the current string.
  6. If the substring is found in the string, add the string to the “filtered_list” using the “append” method and break out of the inner loop using the “break” keyword.
  7. Once all the substrings have been checked for the current string, move on to the next string in the input list.
  8. After all the strings have been checked against all the substrings, return the final filtered list using the “return” keyword.
  9. Define the input list of strings as “string” and the filter list of substrings as “substr”.
  10. Call the “Filter” function with the “string” and “substr” arguments and store the result in “filtered_list”.
  11. Print the “filtered_list” using the “print” statement.

Python3




# Define a function to filter a list of strings based on another list of substrings
def Filter(string, substr):
    # Create an empty list to store the filtered strings
    filtered_list = []
     
    # Loop over each string in the input list
    for s in string:
        # Loop over each substring in the filter list
        for sub in substr:
            # Check if the substring is in the current string
            if sub in s:
                # If it is, add the string to the filtered list and break out of the inner loop
                filtered_list.append(s)
                break
                 
    # Return the final filtered list
    return filtered_list
 
# Define the input list of strings and the filter list of substrings
string = ['city1', 'class5', 'room2', 'city2']
substr = ['class', 'city']
 
# Call the filter function with the input lists and print the result
filtered_list = Filter(string, substr)
print(filtered_list)


Output

['city1', 'class5', 'city2']

Time complexity: O(nm), where n is the length of the input string list and m is the length of the filter substring list.
Auxiliary space:  O(k), where k is the length of the filtered list.

Method #6: Using the “any” function and a generator expression:

Step-by-step approach:

  • Define a function named “filter_strings” that takes two arguments: a list of strings and a list of substrings.
  • Use the “any” function inside a generator expression to build the filter condition: for each string in the input list, it yields True if any substring in the filter list occurs in that string.
  • Pair each string with its condition using “zip”, and use the built-in “filter” function to keep only the pairs whose condition is True.
  • Extract the strings from the remaining pairs into a list and return it.

Below is the implementation of the above approach:

Python3




def filter_strings(string_list, substr_list):
    # Create a filter condition using the "any" function and a generator expression
    filter_cond = (any(sub in s for sub in substr_list) for s in string_list)
     
    # Use the "filter" function to filter the input list based on the filter condition
    filtered_iterator = filter(lambda x: x[1], zip(string_list, filter_cond))
     
    # Convert the filtered iterator to a list and return it
    filtered_list = [x[0] for x in filtered_iterator]
    return filtered_list

string_list = ['city1', 'class5', 'room2', 'city2']
substr_list = ['class', 'city']
 
filtered_list = filter_strings(string_list, substr_list)
print(filtered_list)


Output

['city1', 'class5', 'city2']

Time complexity: O(n*m), where n is the length of the input list and m is the number of substrings in the filter list.
Auxiliary space: O(k), where k is the length of the filtered list; the generator expression, zip, and filter are all evaluated lazily.
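The zip-and-filter pipeline above selects items according to a parallel stream of booleans. As an alternative sketch, the standard library's itertools.compress does the same selection in one call. This is a swapped-in technique rather than the article's method, and the function name filter_with_compress is hypothetical.

```python
from itertools import compress

def filter_with_compress(string_list, substr_list):
    # One boolean per string, computed lazily by the generator expression
    mask = (any(sub in s for sub in substr_list) for s in string_list)
    # compress() yields the items whose matching mask value is truthy
    return list(compress(string_list, mask))

result = filter_with_compress(['city1', 'class5', 'room2', 'city2'],
                              ['class', 'city'])
print(result)  # ['city1', 'class5', 'city2']
```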

Method #7: Using the str.contains() method of a pandas Series

  1. Import the pandas module: import pandas as pd
  2. Define a function named filter_strings that takes two parameters: string_list and substr_list.
  3. Create a DataFrame using the pd.DataFrame() constructor and pass a dictionary with a single column named ‘string’. The values of this column are taken from the string_list.
  4. Use the str.contains() method of the ‘string’ column in the DataFrame to check if each string contains any of the substrings in substr_list. Joining the substrings with the ‘|’ character creates a regex pattern that matches any of them; substrings that contain regex metacharacters should be escaped with re.escape() first so that they are matched literally.
  5. Store the resulting boolean mask (True or False) in the filter_cond variable.
  6. Use the boolean mask filter_cond to filter the DataFrame (df.loc[filter_cond, ‘string’]) and select only the rows where the condition is True.
  7. Convert the filtered DataFrame column to a list using the tolist() method and store it in the filtered_list variable.
  8. Return the filtered_list from the filter_strings function.
  9. Define a string_list variable that contains the list of strings to filter.
  10. Define a substr_list variable that contains the list of substrings to match against the strings.
  11. Call the filter_strings function with string_list and substr_list as arguments and store the result in the filtered_list variable.
  12. Print the filtered_list to display the filtered strings that match the substrings.

Python3




import re
import pandas as pd

def filter_strings(string_list, substr_list):
    df = pd.DataFrame({'string': string_list})
    # Escape each substring so regex metacharacters match literally,
    # then join with '|' to match any of them
    pattern = '|'.join(map(re.escape, substr_list))
    filter_cond = df['string'].str.contains(pattern)
    filtered_list = df.loc[filter_cond, 'string'].tolist()
    return filtered_list

string_list = ['city1', 'class5', 'room2', 'city2']
substr_list = ['class', 'city']

filtered_list = filter_strings(string_list, substr_list)
print(filtered_list)


Output

['city1', 'class5', 'city2']

Time complexity: The time complexity of this method depends on the implementation of the str.contains() method in pandas. It can be considered O(n * m), where n is the length of string_list and m is the number of substrings in substr_list.

Auxiliary space: This method requires additional space to store the DataFrame. The space complexity is O(n), where n is the length of string_list.
