Python | Measure similarity between two sentences using cosine similarity

5 September 2024

0

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them.
Similarity = (A.B) / (||A||.||B||) where A and B are vectors.

Cosine similarity and nltk toolkit module are used in this program. To execute this program nltk must be installed in your system. In order to install nltk module follow the steps below –

1. Open terminal(Linux).
2. sudo pip3 install nltk
3. python3
4. import nltk
5. nltk.download(‘all’)

Functions used:

nltk.tokenize: It is used for tokenization. Tokenization is the process by which big quantity of text is divided into smaller parts called tokens. word_tokenize(X) split the given sentence X into words and return list.

nltk.corpus: In this program, it is used to get a list of stopwords. A stop word is a commonly used word (such as “the”, “a”, “an”, “in”).

Below is the Python implementation –

# Program to measure the similarity between  
# two sentences using cosine similarity. 
from nltk.corpus import stopwords 
from nltk.tokenize import word_tokenize 
  
# X = input("Enter first string: ").lower() 
# Y = input("Enter second string: ").lower() 
X ="I love horror movies"
Y ="Lights out is a horror movie"
  
# tokenization 
X_list = word_tokenize(X)  
Y_list = word_tokenize(Y) 
  
# sw contains the list of stopwords 
sw = stopwords.words('english')  
l1 =[];l2 =[] 
  
# remove stop words from the string 
X_set = {w for w in X_list if not w in sw}  
Y_set = {w for w in Y_list if not w in sw} 
  
# form a set containing keywords of both strings  
rvector = X_set.union(Y_set)  
for w in rvector: 
    if w in X_set: l1.append(1) # create a vector 
    else: l1.append(0) 
    if w in Y_set: l2.append(1) 
    else: l2.append(0) 
c = 0
  
# cosine formula  
for i in range(len(rvector)): 
        c+= l1[i]*l2[i] 
cosine = c / float((sum(l1)*sum(l2))**0.5) 
print("similarity: ", cosine) 

Output:

similarity:  0.2886751345948129

Whether you’re preparing for your first job interview or aiming to upskill in this ever-evolving tech landscape, neveropen Courses are your key to success. We provide top-quality content at affordable prices, all geared towards accelerating your growth in a time-bound manner. Join the millions we’ve already empowered, and we’re here to do the same for you. Don’t miss out – check it out now!

Python | Measure similarity between two sentences using cosine similarity

Run Local AWS Cloud Stack using LocalStack on Linux

Learn Terraform Automation in 3 days using Video Courses

How To Expose Ansible AWX Service using Nginx Ingress

LEAVE A REPLY Cancel reply

Most Popular

Samsung offers free screen replacements for users still suffering green line issues

7 Best Free Antiviruses for Mac in 2024: Are They Any Good? by Katarina Glamoslija

Is Microsoft Teams Secure? Use Teams Safely in 2024 by Tyler Cross

Interview With Willem Dewulf – CEO of ProBackup by Shauli Zacks

Recent Comments

EDITOR PICKS

Samsung offers free screen replacements for users still suffering green line issues

7 Best Free Antiviruses for Mac in 2024: Are They Any Good? by Katarina Glamoslija

Is Microsoft Teams Secure? Use Teams Safely in 2024 by Tyler Cross

POPULAR POSTS

Samsung offers free screen replacements for users still suffering green line issues

7 Best Free Antiviruses for Mac in 2024: Are They Any Good? by Katarina Glamoslija

Is Microsoft Teams Secure? Use Teams Safely in 2024 by Tyler Cross

POPULAR CATEGORY

ABOUT US

FOLLOW US