Find most similar sentence in the file to the input sentence | NLP

26 July 2024

In this article, we will find the sentence in a file that is most similar to a given input sentence, where similarity is measured by the number of words the two sentences share.

Example:

File content:
"This is movie."
"This is romantic movie"
"This is a girl."

Input: "This is a boy"

Similar sentence to input: 
"This is a girl", "This is movie".

Approach:

Create a list that stores all the unique words of the file.
Clean every sentence of the file (remove punctuation and stop words, apply stemming, etc.), then convert it into a binary vector: one entry per word in the list, 1 if the sentence contains that word and 0 otherwise.
Convert the input sentence into a binary vector in the same way.
Count the words that the input sentence shares with each sentence of the file (the similarity index) and store these counts in a list.
Find the maximum similarity index and return the sentence, or sentences, that reach it.
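
The steps above can be sketched without any external library. This is only a minimal illustration under stated assumptions: the function names (tokens, to_binary, most_similar), the sample sentences and the tiny stop-word set are all hypothetical, and stemming is left out to keep the sketch dependency-free.

```python
import string


def tokens(sentence):
    """Lower-case a sentence and split it into words, dropping punctuation."""
    table = str.maketrans("", "", string.punctuation)
    return sentence.lower().translate(table).split()


def to_binary(words, vocabulary):
    """Binary vector: 1 if the vocabulary word occurs in `words`, else 0."""
    present = set(words)
    return [1 if w in present else 0 for w in vocabulary]


def most_similar(sentences, query, stop_words=frozenset()):
    """Return (best-matching sentences, similarity index of every sentence)."""
    # Steps 1-2: clean the sentences and collect the unique words.
    cleaned = [[w for w in tokens(s) if w not in stop_words] for s in sentences]
    vocabulary = sorted({w for words in cleaned for w in words})
    vectors = [to_binary(words, vocabulary) for words in cleaned]
    # Step 3: binary vector of the input sentence.
    query_vector = to_binary(tokens(query), vocabulary)
    # Step 4: similarity index = number of shared vocabulary words.
    scores = [sum(q * v for q, v in zip(query_vector, vec)) for vec in vectors]
    # Step 5: keep every sentence that reaches the maximum.
    best = max(scores)
    return [s for s, sc in zip(sentences, scores) if sc == best], scores


# Hypothetical data, not taken from the article's file.
sentences = ["The cat sat on the mat.", "Dogs chase the ball.", "A cat chased the mouse."]
stop_words = {"the", "a", "on", "and"}
best, scores = most_similar(sentences, "the cat and the mouse", stop_words)
print(scores)  # [1, 0, 2]
print(best)    # ['A cat chased the mouse.']
```

Because no stemming is applied here, "chase" and "chased" count as different words; the NLTK-based code in this article uses a stemmer to reduce such inflected forms to a common stem.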

Content of the file (romyyy.txt): [screenshot not available]

Code to get a similar sentence:

Python3

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

# One-time downloads: the stop-word list and the tokenizer models
# used by sent_tokenize/word_tokenize.
nltk.download('stopwords')
nltk.download('punkt')  # recent NLTK releases may ask for 'punkt_tab' instead

ps = PorterStemmer()

# Read the file and split it into sentences.
with open('romyyy.txt') as f:
    a = sent_tokenize(f.read())

stop_words = set(stopwords.words('english'))
punc = r'''!()-[]{};:'"\, <>./?@#$%^&*_~'''

# Clean every sentence: drop punctuation and stop words, then stem.
# Stop words are checked on the lower-cased token before stemming,
# because a stem such as 'thi' (from 'this') is not in the stop-word list.
outer_1 = []
for sentence in a:
    inner_1 = set()
    for word in word_tokenize(sentence):
        word = word.lower()
        if word not in punc and word not in stop_words:
            inner_1.add(ps.stem(word))
    outer_1.append(inner_1)

# rvector: all unique words of the file, in a fixed order.
rvector = sorted(set().union(*outer_1))

# Binary vector for each sentence of the file.
outer = [[1 if w in words else 0 for w in rvector] for words in outer_1]

# Binary vector for the input sentence.
comparison = input("Input: ")
check = {ps.stem(w.lower()) for w in word_tokenize(comparison)}
check1 = [1 if w in check else 0 for w in rvector]

# Similarity index: the number of vocabulary words shared with the input,
# i.e. the dot product of the two binary vectors. A sentence identical
# to the input naturally receives the highest score.
ds = [sum(x * y for x, y in zip(check1, vector)) for vector in outer]

# Print every sentence that reaches the maximum similarity index.
maximum = max(ds)
print()
print("Similar sentences: ")
for i in range(len(ds)):
    if ds[i] == maximum:
        print(a[i])

Output: [screenshot not available]
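
The scoring step in the code above multiplies two binary vectors element by element and sums the result. For 0/1 vectors built over the same word list, that sum is simply the number of words the two sentences have in common, so the same similarity index could also be computed directly on sets. The sketch below illustrates this equivalence; the function names and the word sets are hypothetical, made-up data.

```python
def dot_product_score(query_vector, sentence_vector):
    """Similarity index as the dot product of two equal-length binary vectors."""
    return sum(q * s for q, s in zip(query_vector, sentence_vector))


def overlap_score(query_words, sentence_words):
    """The same quantity computed as the size of the set intersection."""
    return len(set(query_words) & set(sentence_words))


# Hypothetical, already-cleaned and stemmed word sets.
vocabulary = ["boy", "girl", "movi", "romant"]
sentence_words = {"romant", "movi"}
query_words = {"movi", "boy"}

sentence_vector = [1 if w in sentence_words else 0 for w in vocabulary]
query_vector = [1 if w in query_words else 0 for w in vocabulary]

print(dot_product_score(query_vector, sentence_vector))  # 1
print(overlap_score(query_words, sentence_words))        # 1
```

The set version avoids building the vectors at all, which may be convenient for large vocabularies; the vector version mirrors the approach described in this article more literally.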
