Sunday, September 22, 2024
Google search engine
HomeData Modelling & AIFind the Longest Common Substring using Binary search and Rolling Hash

Find the Longest Common Substring using Binary search and Rolling Hash

Given two strings X and Y, the task is to find the length of the longest common substring. 

Examples:

Input: X = “neveropen”, y = “GeeksQuiz” 
Output:
Explanation: The longest common substring is “Geeks” and is of length 5.

Input: X = “abcdxyz”, y = “xyzabcd” 
Output:
Explanation: The longest common substring is “abcd” and is of length 4.

Input: X = “zxabcdezy”, y = “yzabcdezx” 
Output:
Explanation: The longest common substring is “abcdez” and is of length 6.

Longest Common Substring using Dynamic Programming:

This problem can be solved using dynamic programming in O(len(X) * len(Y)), see this. In this article we are going to discuss about an efficient approach.

Longest Common Substring using Binary Search and Rolling Hash

Pre-requisites:

Observation:

If there is a common substring of length K in both the strings, then there will be common substrings of length 0, 1, …, K – 1. Hence, binary search on answer can be applied.\

Follow the below steps to implement the idea:

  • Smallest possible answer(low) = 0 and largest possible answer(high) = min(len(X), len(Y)), Range of binary search will be [0, min(len(X), len(Y))].
    • For every mid, check if there exists a common substring of length mid, if exists then update low, else update high.
    • To check the existence of a common substring of length K, Polynomial rolling hash function can be used. 
      • Iterate over all the windows of size K  in string X and string Y and get the hash
      • If there is a common hash return True, else return False.

Below is the implementation of this approach.

Java




import java.util.HashSet;
import java.util.Set;
 
class ComputeHash {
    private long[] hash;
    private long[] invMod;
    private long mod;
    private long p;
 
    // Generates hash in O(n(log(n)))
    public ComputeHash(String s, long p, long mod) {
        int n = s.length();
        this.hash = new long[n];
        this.invMod = new long[n];
        this.mod = mod;
        this.p = p;
 
        long pPow = 1;
        long hashValue = 0;
 
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            c = (char) (c - 'A' + 1);
            hashValue = (hashValue + c * pPow) % this.mod;
            this.hash[i] = hashValue;
            this.invMod[i] = (long)(Math.pow(pPow, this.mod - 2) % this.mod);
            pPow = (pPow * this.p) % this.mod;
        }
    }
 
    // Return hash of a window in O(1)
    public long getHash(int l, int r) {
        if (l == 0) {
            return this.hash[r];
        }
 
        long window = (this.hash[r] - this.hash[l - 1]) % this.mod;
        return (window * this.invMod[l]) % this.mod;
    }
}
 
public class Main {
    // Function to get the longest common substring
    public static int longestCommonSubstr(String X, String Y) {
    int n = X.length();
    int m = Y.length();
 
    long p1 = 31;
    long p2 = 37;
    long m1 = (long) (Math.pow(10, 9) + 9);
    long m2 = (long) (Math.pow(10, 9) + 7);
 
    // Initialize two hash objects
    // with different p1, p2, m1, m2
    // to reduce collision
    ComputeHash hashX1 = new ComputeHash(X, p1, m1);
    ComputeHash hashX2 = new ComputeHash(X, p2, m2);
 
    ComputeHash hashY1 = new ComputeHash(Y, p1, m1);
    ComputeHash hashY2 = new ComputeHash(Y, p2, m2);
 
    // Function that returns the existence
    // of a common substring of length k
    int low = 0, high = Math.min(n, m);
    int answer = 0;
    while (low <= high) {
        int mid = (low + high) / 2;
        if (exists(mid, X, Y)) {
        answer = mid;
        low = mid + 1;
    } else {
        high = mid - 1;
    }
 
    }
    return answer;
}
    private static boolean exists(int k, String X, String Y) {
    for (int i = 0; i <= X.length() - k; i++) {
        for (int j = 0; j <= Y.length() - k; j++) {
            if (X.substring(i, i + k).equals(Y.substring(j, j + k))) {
                return true;
            }
        }
    }
    return false;
}
 
    public static void main(String[] args) {
    String X = "neveropen";
    String Y = "GeeksQuiz";
    System.out.println(longestCommonSubstr(X, Y));
}
}


Python3




# Python code to implement the approach
 
# Function to implement rolling hash
class ComputeHash:
 
    # Generates hash in O(n(log(n)))
    def __init__(self, s, p, mod):
        n = len(s)
        self.hash = [0] * n
        self.inv_mod = [0] * n
        self.mod = mod
        self.p = p
 
        p_pow = 1
        hash_value = 0
 
        for i in range(n):
            c = ord(s[i]) - 65 + 1
            hash_value = (hash_value + c * p_pow) % self.mod
            self.hash[i] = hash_value
            self.inv_mod[i] = pow(p_pow, self.mod - 2, self.mod)
            p_pow = (p_pow * self.p) % self.mod
 
    # Return hash of a window in O(1)
    def get_hash(self, l, r):
 
        if l == 0:
            return self.hash[r]
 
        window = (self.hash[r] - self.hash[l - 1]) % self.mod
        return (window * self.inv_mod[l]) % self.mod
 
# Function to get the longest common substring
def longestCommonSubstr(X, Y, n, m):
 
    p1, p2 = 31, 37
    m1, m2 = pow(10, 9) + 9, pow(10, 9) + 7
 
    # Initialize two hash objects
    # with different p1, p2, m1, m2
    # to reduce collision
    hash_X1 = ComputeHash(X, p1, m1)
    hash_X2 = ComputeHash(X, p2, m2)
 
    hash_Y1 = ComputeHash(Y, p1, m1)
    hash_Y2 = ComputeHash(Y, p2, m2)
 
    # Function that returns the existence
    # of a common substring of length k
    def exists(k):
 
        if k == 0:
            return True
 
        st = set()
         
        # Iterate on X and get hash tuple
        # of all windows of size k
        for i in range(n - k + 1):
            h1 = hash_X1.get_hash(i, i + k - 1)
            h2 = hash_X2.get_hash(i, i + k - 1)
 
            cur_window_hash = (h1, h2)
             
            # Put the hash tuple in the set
            st.add(cur_window_hash)
 
        # Iterate on Y and get hash tuple
        # of all windows of size k
        for i in range(m - k + 1):
            h1 = hash_Y1.get_hash(i, i + k - 1)
            h2 = hash_Y2.get_hash(i, i + k - 1)
 
            cur_window_hash = (h1, h2)
             
            # If hash exists in st return True
            if cur_window_hash in st:
                return True
        return False
 
    # Binary Search on length
    answer = 0
    low, high = 0, min(n, m)
 
    while low <= high:
        mid = (low + high) // 2
 
        if exists(mid):
            answer = mid
            low = mid + 1
        else:
            high = mid - 1
 
    return answer
 
 
# Driver Code
if __name__ == '__main__':
    X = 'neveropen'
    Y = 'GeeksQuiz'
    print(longestCommonSubstr(X, Y, len(X), len(Y)))


Javascript




// javascript code to implement the approach
class ComputeHash {
  constructor(mod) {
    this.mod = mod;
  }
  compute(s) {
    let hashValue = 0n;
    let pPow = 1n;
    for (let i = 0; i < s.length; i++) {
      let c = BigInt(s.charCodeAt(i) - "A".charCodeAt(0) + 1);
      hashValue = (hashValue + c * pPow) % this.mod;
      pPow = pPow * 26n % this.mod;
    }
    return hashValue;
  }
}
 
function longestCommonSubstr(s, t) {
  const mod = BigInt(10**9+7);
  const p1 = new ComputeHash(mod);
  const p2 = new ComputeHash(mod);
  let left = 0, right = Math.min(s.length, t.length);
  while (left < right) {
    const mid = Math.floor((left + right + 1) / 2);
    let found = false;
    const set = new Set();
    for (let i = 0; i + mid <= s.length; i++) {
      const hashValue = p1.compute(s.substring(i, i + mid));
      set.add(hashValue);
    }
    for (let i = 0; i + mid <= t.length; i++) {
      const hashValue = p2.compute(t.substring(i, i + mid));
      if (set.has(hashValue)) {
        found = true;
        break;
      }
    }
    if (found) {
      left = mid;
    } else {
      right = mid - 1;
    }
  }
  return left;
}
 
console.log(longestCommonSubstr("ABABAB", "BABABA"));  // expected output: 3
 
//code is implemented by chetanbargal


Output

5

Time Complexity: O(n * log(m1)) + O(m * log((m1)) + O((n + m) * log(min(n, m)))

  1. Generating hash object takes O(n*log(m1)), where n is the length of string and m1 = pow(10, 9) + 7.
  2. Binary search takes O(log(min(n, m))), where n, m are the lengths of both strings.
  3. Hash of a window takes O(1) time.
  4. Exist function takes O(n + m) time.

Auxiliary Space: O(n + m)

Feeling lost in the world of random DSA topics, wasting time without progress? It’s time for a change! Join our DSA course, where we’ll guide you on an exciting journey to master DSA efficiently and on schedule.
Ready to dive in? Explore our Free Demo Content and join our DSA course, trusted by over 100,000 neveropen!

RELATED ARTICLES

Most Popular

Recent Comments