We are given a list of pattern strings and a single input string, and we need to find every pattern that is a good enough (close) match for the input string. Example:
Input: patterns = ['ape', 'apple', 'peach', 'puppy'], input = 'appel'
Output: ['apple', 'ape']
We can solve this problem quickly in Python using the built-in function difflib.get_close_matches().
How does the difflib.get_close_matches() function work in Python?
difflib.get_close_matches(word, possibilities, n, cutoff) accepts four parameters, of which n and cutoff are optional. word is the sequence for which close matches are desired, and possibilities is a list of sequences against which to match word. Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0. Optional argument cutoff (default 0.6) is a float in the range [0, 1]; possibilities that do not score at least that similarity to word are ignored. The best matches (no more than n) among the possibilities are returned in a list, sorted by similarity score, most similar first.
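For instance, here is a small sketch (separate from the main program below, but using the same toy list) of how the optional n and cutoff arguments change the result:

Python3

# Effect of the optional n and cutoff arguments
from difflib import get_close_matches

patterns = ['ape', 'apple', 'peach', 'puppy']

# n limits how many matches are returned
print(get_close_matches('appel', patterns, n=1))         # ['apple']

# a higher cutoff keeps only the strongest matches
print(get_close_matches('appel', patterns, cutoff=0.8))  # ['apple']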
Python3
# Function to find all close matches of
# input string in given list of possible strings
from difflib import get_close_matches

def closeMatches(patterns, word):
    print(get_close_matches(word, patterns))

# Driver program
if __name__ == "__main__":
    word = 'appel'
    patterns = ['ape', 'apple', 'peach', 'puppy']
    closeMatches(patterns, word)
References: https://docs.python.org/2/library/difflib.html
Output:
['apple', 'ape']
Time complexity: get_close_matches() scores word against every pattern using difflib.SequenceMatcher (a Ratcliff/Obershelp-style matcher), whose similarity computation is, in the worst case, proportional to the product of the two string lengths. With n patterns, an input word of length m, and patterns of length about k, the overall worst-case cost is therefore on the order of O(n*m*k).
Space complexity: O(n), where n is the number of patterns, since in the worst case every pattern clears the cutoff and is stored together with its similarity score before the top matches are selected.
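To make the scoring behind these estimates concrete, the sketch below prints the similarity ratios that difflib.SequenceMatcher (the matcher get_close_matches() relies on) assigns to the toy data; a ratio is 2.0*M/T, where M is the number of matched characters and T is the total number of characters in both strings.

Python3

# Similarity scores behind the example above
from difflib import SequenceMatcher

word = 'appel'
for pattern in ['ape', 'apple', 'peach', 'puppy']:
    score = SequenceMatcher(None, word, pattern).ratio()
    print(pattern, round(score, 2))

# Output:
# ape 0.75
# apple 0.8
# peach 0.4
# puppy 0.4
# Only 'apple' and 'ape' clear the default cutoff of 0.6.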