Thursday, July 4, 2024
HomeData ModellingDynamic ProgrammingImplementing Regular Expression Matching

Implementing Regular Expression Matching

Given two strings S and P where S consists of only lowercase English alphabets while P consists of lowercase English alphabets as well as special characters ‘.’ and ‘*’, the task is to implement a function to test regular expression such that:

  • '.' Matches any single character.​​​​
  • '*' Matches zero or more of the preceding element.

Note: For each appearance of the character ‘*', there will be a previous valid character to match.

Examples:

Input: s = "aaa", p = "a"
Output: false
Explanation: "a" does not match the entire string "aaa".

Input: s = "abb", p = "a.*"
Output: true
Explanation: replace . with b then p becomes ab* now replace * with one preceeding character hence p becomes abb.

Approach: To solve the problem follow the below observations:

  • Suppose: s = “aa” and p = “aa”, since all the characters in both s and p are the same, it’s a match.
  • Suppose:  s = “aabb” and p = “aab*”, We know that substrings bb and b* match because * can be replaced by one b. Since we already know that the remaining substrings “aa” and “aa” are matched, the whole strings are also a match.

What can we infer from this? Right, if we have a solution of part of a problem, we can use that partial result and can go forward. Also, we can use the already calculated result without calculating it again.

Does this ring a bell ????? Yes, this problem satisfies the following two properties –

  • Optimal Substructure — Any problem has optimal substructure property if its overall optimal solution can be constructed from the optimal solutions of its subproblems.
  • Overlapping Subproblems — Any problem has overlapping sub-problems if finding its solution involves solving the same subproblem multiple times.

It is now evident that we can use Dynamic Programming to solve this problem. Below are the steps —

  • Create a boolean 2D dp array with sizes as boolean[][] dp = new boolean[s.length() + 1][p.length() + 1]. We are adding extra 1 to incorporate the case in case either or both of the strings are empty.
  • If both strings are empty, then it’s a match, thus, dp[0][0] = true.
  • Let’s take an example s = "aab" and p = "c*a*b" and create a DP table.

c

*

a

*

b

0

1

2

3

4

5

0

TRUE

FALSE

TRUE

FALSE

TRUE

FALSE

a

1

FALSE

FALSE

FALSE

TRUE

TRUE

FALSE

a

2

FALSE

FALSE

FALSE

FALSE

TRUE

FALSE

b

3

FALSE

FALSE

FALSE

FALSE

FALSE

TRUE

  • First column — it means p is empty and it will match to s only if s is also empty which we have stored in dp[0][0]. Thus, remaining values of dp[0][i] will be false.
  • First row — this is not so easy. It means which p matches empty s. The answer is either an empty pattern or a pattern that represents an empty string such as "a*""x*y*""l*m*n*" and so on. In the above example, if s = "" and p = "c*", then due to *c can be replaced by 0 cs which gives us an empty string. Hence, dp[0][2] = true.
  • For non-empty strings, let’s say that s[i - 1] == p[j - 1] this means the (i – 1)th and (j – 1)th characters are same. This means, we have to check if the remaining strings are a match or not. If they are a match, then the current substrings will be a match, otherwise they won’t be a match i.e., dp[i][j] = dp[i - 1][j - 1]. We’re taking (i – 1)th and (j – 1)th characters to offset empty strings as we’re assuming our strings start from index 1.
  • If p[j - 1] == ".", then it means any single character can be matched. Therefore, here also, we will have to check if the remaining string is a match or not. Thus, dp[i][j] = dp[i - 1][j - 1].
  • If p[j - 1] == "*", then it means either it’s represents an empty string (0 characters), thus dp[i][j] = dp[i][j - 2] or s[i - 1] == p[j - 2] || p[j - 2] == ".", then current character of string equals the char preceding *' in pattern so the result is dp[i-1][j].

Below is the implementation of above approach:

C++




#include <iostream>
#include <vector>
using namespace std;
 
bool isMatch(string s, string p) {
    int rows = s.length();
    int columns = p.length();
 
    // Base conditions
    if (rows == 0 && columns == 0) {
        return true;
    }
    if (columns == 0) {
        return false;
    }
 
    // DP array
    vector<vector<bool>> dp(rows + 1, vector<bool>(columns + 1, false));
 
    // Empty string and empty pattern are a match
    dp[0][0] = true;
 
    // Deals with patterns with *
    for (int i = 2; i <= columns; i++) {
        if (p[i - 1] == '*') {
            dp[0][i] = dp[0][i - 2];
        }
    }
 
    // For remaining characters
    for (int i = 1; i <= rows; i++) {
        for (int j = 1; j <= columns; j++) {
            if (s[i - 1] == p[j - 1] || p[j - 1] == '.') {
                dp[i][j] = dp[i - 1][j - 1];
            } else if (j > 1 && p[j - 1] == '*') {
                dp[i][j] = dp[i][j - 2];
                if (p[j - 2] == '.' || p[j - 2] == s[i - 1]) {
                    dp[i][j] = dp[i][j] || dp[i - 1][j];
                }
            }
        }
    }
    return dp[rows][columns];
}
 
int main() {
    cout << boolalpha << isMatch("aab", "a.*") << endl;
    return 0;
}


Java




// Java Code for above approach
import java.io.*;
import java.util.Scanner;
public class RegularExpressionMatching {
 
    public static boolean isMatch(String s, String p)
    {
        int rows = s.length();
        int columns = p.length();
 
        /// Base conditions
        if (rows == 0 && columns == 0) {
            return true;
        }
        if (columns == 0) {
            return false;
        }
 
        // DP array
        boolean[][] dp = new boolean[rows + 1][columns + 1];
 
        // Empty string and empty pattern
        // are a match
        dp[0][0] = true;
 
        // Deals with patterns with *
        for (int i = 2; i < columns + 1; i++) {
            if (p.charAt(i - 1) == '*') {
                dp[0][i] = dp[0][i - 2];
            }
        }
 
        // For remaining characters
        for (int i = 1; i < rows + 1; i++) {
            for (int j = 1; j < columns + 1; j++) {
                if (s.charAt(i - 1) == p.charAt(j - 1)
                    || p.charAt(j - 1) == '.') {
                    dp[i][j] = dp[i - 1][j - 1];
                }
                else if (j > 1 && p.charAt(j - 1) == '*') {
                    dp[i][j] = dp[i][j - 2];
                    if (p.charAt(j - 2) == '.'
                        || p.charAt(j - 2)
                               == s.charAt(i - 1)) {
                        dp[i][j] = dp[i][j] | dp[i - 1][j];
                    }
                }
            }
        }
        return dp[rows][columns];
    }
 
    public static void main(String[] args)
    {
        System.out.println(isMatch("aab", "a.*"));
    }
}


Python3




def isMatch(s: str, p: str) -> bool:
    rows, columns = (len(s), len(p))
    # Base conditions
    if rows == 0 and columns == 0:
        return True
    if columns == 0:
        return False
    # DP array
    dp = [[False for j in range(columns + 1)] for i in range(rows + 1)]
    # Since empty string and empty pattern are a match
    dp[0][0] = True
    # Deals with patterns containing *
    for i in range(2, columns + 1):
        if p[i - 1] == '*':
            dp[0][i] = dp[0][i - 2]
    # For remaining characters
    for i in range(1, rows + 1):
        for j in range(1, columns + 1):
            if s[i - 1] == p[j - 1] or p[j - 1] == '.':
                dp[i][j] = dp[i - 1][j - 1]
            elif j > 1 and p[j - 1] == '*':
                dp[i][j] = dp[i][j - 2]
                if p[j - 2] == '.' or p[j - 2] == s[i - 1]:
                    dp[i][j] = dp[i][j] or dp[i - 1][j]
    return dp[rows][columns]
   
if __name__ == '__main__':
    print(isMatch("aa", "a"))
    print(isMatch("aa", "a*"))
    print(isMatch("ab", "."))
    print(isMatch("aab", "c*a*b"))
    print(isMatch("mississippi", "mis*is*p*."))
    print(isMatch("", ".*"))


Javascript




var isMatch = function (s, p) {
    const rows = s.length;
    const columns = p.length;
    /// Base conditions
    if (rows == 0 && columns == 0) {
        return true;
    }
    if (columns == 0) {
        return false;
    }
    // DP array
    const dp = Array.from({ length: s.length + 1 }, () => [false]);
    // Empty string and empty pattern are a match
    dp[0][0] = true;
    // Deals with patterns with *
    for (let i = 1; i < columns + 1; i++) {
        if (p[i - 1] === '*') {
            dp[0][i] = dp[0][i - 2];
        }
        else {
            dp[0][i] = false;
        };
    }
    // For remaining characters
    for (let i = 1; i < rows + 1; i++) {
        for (let j = 1; j < columns + 1; j++) {
            if (p[j - 1] === '*') {
                if (p[j - 2] === s[i - 1] || p[j - 2] === '.') {
                    dp[i][j] = dp[i][j - 2] || dp[i - 1][j];
                } else {
                    dp[i][j] = dp[i][j - 2];
                }
            } else if (p[j - 1] === s[i - 1] || p[j - 1] === '.') {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                dp[i][j] = false;
            }
        }
    }
    return dp[rows][columns];
};
 
console.log(isMatch("aa", "a"));
console.log(isMatch("aa", "a*"));
console.log(isMatch("an", "."));
console.log(isMatch("aab", "c*a*b"));
console.log(isMatch("mississippi", "mis*is*p*."));
console.log(isMatch("", ".*"));
console.log(isMatch("ab", ".*c"));


Output

true



Time Complexity: O(m×n)
Auxiliary Space: O(m×n)

Feeling lost in the world of random DSA topics, wasting time without progress? It’s time for a change! Join our DSA course, where we’ll guide you on an exciting journey to master DSA efficiently and on schedule.
Ready to dive in? Explore our Free Demo Content and join our DSA course, trusted by over 100,000 neveropen!

Dominic Rubhabha Wardslaus
Dominic Rubhabha Wardslaushttps://neveropen.dev
infosec,malicious & dos attacks generator, boot rom exploit philanthropist , wild hacker , game developer,
RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments