Lesk Algorithm in NLP – Python

26 July 2024

In Natural Language Processing (NLP), word sense disambiguation (WSD) is the task of determining which “sense” (meaning) of a word is intended in a specific context. People usually do this without conscious effort.

The Lesk algorithm is a seminal dictionary-based approach to word sense disambiguation, introduced by Michael E. Lesk in 1986. It rests on the idea that words appearing together in a text are related to one another, and that this relationship shows up in their dictionary definitions. To disambiguate two (or more) words, the original algorithm picks the combination of dictionary senses whose definitions share the most words.

The algorithm assumes that words in a given “neighborhood” (a portion of text) tend to share a common topic. A simplified version of the Lesk algorithm compares the dictionary definition of each sense of the ambiguous word with the words in its neighborhood.
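The pairwise comparison of dictionary senses can be sketched in a few lines. This is a minimal illustration, not Lesk's original program: the glosses are a small hand-written toy dictionary for the words pine and cone, and the names TOY_GLOSSES, gloss_overlap and best_sense_pair are made up for this example.

```python
from itertools import product

# Toy glosses for illustration only (an assumption, not loaded from a real dictionary).
TOY_GLOSSES = {
    "pine": [
        "kinds of evergreen tree with needle-shaped leaves",
        "waste away through sorrow or illness",
    ],
    "cone": [
        "solid body which narrows to a point",
        "something of this shape whether solid or hollow",
        "fruit of certain evergreen trees",
    ],
}


def gloss_overlap(gloss_a, gloss_b):
    """Number of distinct words the two glosses have in common."""
    return len(set(gloss_a.lower().split()) & set(gloss_b.lower().split()))


def best_sense_pair(word_a, word_b, glosses=TOY_GLOSSES):
    """Return (index_a, index_b, overlap) for the sense pair with the largest overlap."""
    pairs = product(enumerate(glosses[word_a]), enumerate(glosses[word_b]))
    scored = [(i, j, gloss_overlap(ga, gb)) for (i, ga), (j, gb) in pairs]
    return max(scored, key=lambda item: item[2])


i, j, score = best_sense_pair("pine", "cone")
print(TOY_GLOSSES["pine"][i])  # the tree sense of "pine"
print(TOY_GLOSSES["cone"][j])  # the fruit sense of "cone"
print(score)                   # 2 shared words: "of" and "evergreen"
```

Note that the function word “of” counts toward the overlap here. Practical variants usually drop such stop words before comparing definitions.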

A basic implementation of the simplified Lesk algorithm involves the following steps:

For each sense of the word being disambiguated, count the number of words that appear both in the neighborhood of the word and in the dictionary definition of that sense.
Pick the sense with the highest count.
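The two steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: TOY_SENSES is a hypothetical two-sense inventory written for this example (a real system would use a dictionary such as WordNet), tokens are split on whitespace, and ties go to the sense listed first.

```python
# Hypothetical toy sense inventory, written by hand for this example.
TOY_SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits of money and makes loans",
        "river side": "sloping land beside a river or other body of water",
    }
}


def simplified_lesk(context_words, word, senses=TOY_SENSES):
    """Return (sense_name, overlap) for the sense sharing the most words with the context."""
    context = {w.lower() for w in context_words}
    best_sense, best_count = None, -1
    for name, definition in senses[word].items():
        # Step 1: count words present in both the neighborhood and the definition
        count = len(context & set(definition.lower().split()))
        # Step 2: keep the sense with the greatest count (first one wins on ties)
        if count > best_count:
            best_sense, best_count = name, count
    return best_sense, best_count


print(simplified_lesk("he deposited money at the bank".split(), "bank"))
print(simplified_lesk("they fished from the bank of the river".split(), "bank"))
```

In the first sentence only “money” overlaps with a definition, so the financial sense is chosen. In the second, “river” and “of” both match the river definition, so that sense is chosen.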

Now let’s look at some examples using NLTK’s implementation of the Lesk algorithm.

Python3

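# In a Jupyter notebook, add %%capture as the first line of the cell
# to hide the download log; it is not valid in a plain Python script.
import nltk
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

# Downloads every NLTK resource, which is large; the examples here
# only need WordNet and the tokenizer models.
nltk.download('all')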

Now that we have imported the required libraries and functions, let’s apply them to some example sentences.

Python3

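def get_semantic(seq, key_word):
    # Tokenize the sentence to get the context words
    tokens = word_tokenize(seq)

    # lesk() returns the WordNet synset whose definition overlaps
    # most with the context, or None if the word has no synsets
    synset = lesk(tokens, key_word)
    return synset.definition() if synset is not None else None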

This helper function simply keeps the example code short and readable.

Example 1: Sentences that use the same word with different meanings.

Python3

keyword = 'book'
seq1 = 'I love reading books on coding.'
seq2 = 'The table was already booked by someone else.'
  
print(get_semantic(seq1, keyword)) 
print(get_semantic(seq2, keyword)) 

Output:

a number of sheets (ticket or stamps etc.) bound together on one edge

arrange for and reserve (something for someone else) in advance

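Here the output gives a definition of the keyword for each of the two sentences: a noun sense of “book” for the first and the verb sense of reserving for the second.

Example 2: Another word with different meanings.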

Python3

keyword = 'jam'
seq1 = 'My mother prepares very yummy jam.'
seq2 = 'Signal jammers are the reason for no signal.'
  
print(get_semantic(seq1, keyword)) 
print(get_semantic(seq2, keyword))

Output:

press tightly together or cram

deliberate radiation or reflection of electromagnetic energy for the purpose of disrupting enemy use of electronic devices or systems
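In the first sentence, “jam” is used as a noun, yet the definition printed for it is a verb sense. One plausible reason is that none of the context words overlap with any definition, so the choice comes down to how ties are broken. A common refinement is to restrict the candidate senses to the expected part of speech; NLTK’s lesk function accepts an optional pos argument for this. The sketch below illustrates the idea without NLTK, using a hypothetical hand-written sense list (TOY_JAM_SENSES) and a made-up helper pick_sense.

```python
# Hypothetical toy inventory: (part_of_speech, definition) pairs for "jam".
TOY_JAM_SENSES = [
    ("v", "press tightly together or cram"),
    ("n", "preserve of crushed fruit"),
    ("n", "deliberate radiation of electromagnetic energy to disrupt enemy use of electronic devices"),
]


def pick_sense(context_words, senses, pos=None):
    """Simplified Lesk over `senses`, optionally keeping only senses tagged `pos`."""
    context = {w.lower() for w in context_words}
    candidates = [s for s in senses if pos is None or s[0] == pos]
    if not candidates:
        return None
    # max() returns the first candidate on ties, so list order decides
    # the result when no definition overlaps with the context
    return max(candidates, key=lambda s: len(context & set(s[1].lower().split())))


tokens = "My mother prepares very yummy jam".split()
print(pick_sense(tokens, TOY_JAM_SENSES))           # all overlaps are 0, so the first (verb) sense wins
print(pick_sense(tokens, TOY_JAM_SENSES, pos="n"))  # only noun senses are considered
```

With the real library, the analogous call would pass pos='n' to lesk along with the tokens and the keyword. The result still depends on WordNet’s definitions and on the words in the sentence, so it is worth checking the output rather than assuming it.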
