Sunday, November 17, 2024

FLAIR – A Framework for NLP

What is FLAIR?

Flair is a simple framework for state-of-the-art NLP. It is a powerful library developed by Zalando Research and is built on top of PyTorch.

What are the Features available in Flair?

  1. Flair supports a number of word embeddings for NLP tasks, such as FastText, ELMo, GloVe, BERT and its variants, XLM, and Byte Pair Embeddings, as well as its own Flair Embeddings.
  2. The Flair Embedding is based on the concept of contextual string embeddings, which are used for sequence labelling.
  3. Using Flair you can also combine different word embeddings together to get better results.
  4. Flair supports a number of languages.

Contextual String Embeddings:

In this word embedding, each character of the text is passed to a character language model, and the input representation of a word is then taken from the forward and backward LSTMs.

The input representation for the word ‘Washington’ is computed from its context: the forward model sees the text up to the end of the word, and the backward model sees the text from the end of the sentence back to the start of the word. The character states at the first and last characters of each word are taken in order to generate the word embedding.

In the accompanying diagram, the red mark for the word ‘Washington’ is the forward LSTM output and the blue mark is the backward LSTM output. The forward and backward outputs are concatenated to obtain the input representation of the word ‘Washington’.

The input representation is then fed to a forward and backward LSTM that performs the particular task you are dealing with. In the diagram, that task is NER.
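
The extraction step can be illustrated with a toy sketch. The code below is not Flair's implementation: `running_states` is a hypothetical stand-in for a character-level LSTM that produces one scalar state per character, and `contextual_word_embeddings` is an illustrative helper. It only shows how a word's representation could be read off from a forward pass (state at the word's last character) and a backward pass (state at the word's first character) and then combined.

```python
import math


def running_states(chars):
    """Toy stand-in for a character-level LSTM: one scalar state per character."""
    state, states = 0.0, []
    for ch in chars:
        # The new state depends on the previous state and the current character.
        state = math.tanh(0.5 * state + ord(ch) / 128.0)
        states.append(state)
    return states


def contextual_word_embeddings(text):
    """Return (word, (forward_state, backward_state)) for each whitespace-separated word."""
    forward = running_states(text)               # reads left to right
    backward = running_states(text[::-1])[::-1]  # reads right to left, realigned to text positions
    result, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word) - 1
        # Forward state at the word's last character, backward state at its first character.
        result.append((word, (forward[end], backward[start])))
        pos = end + 1
    return result


for word, embedding in contextual_word_embeddings("Geeks for Geeks"):
    print(word, embedding)
```

Because each state carries information about the characters read before it, the two occurrences of 'Geeks' end up with different pairs of values, which is the property that makes the embedding contextual.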

Installation of Flair:

You should have PyTorch >=1.1 and Python >=3.6 installed. To install PyTorch with Anaconda, run the command below:

conda install -c pytorch pytorch

To install Flair, run:

pip install flair
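
After installation, a quick check of the environment can save some debugging. The following is a minimal sketch using only the Python standard library; `environment_report` is a hypothetical helper name, and it only checks whether the packages can be found, not which versions are installed.

```python
import sys
from importlib import util


def environment_report(min_python=(3, 6)):
    """Report whether Python is recent enough and whether torch and flair can be found."""
    return {
        "python_ok": sys.version_info >= min_python,
        "torch_found": util.find_spec("torch") is not None,
        "flair_found": util.find_spec("flair") is not None,
    }


print(environment_report())
```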

Working of Flair

1) Flair Datatypes:

Flair offers two types of objects. They are:

  1. Sentence
  2. Token

To get the number of tokens in a sentence:

Python3




import flair
from flair.data import Sentence
  
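# take a sentence
s = Sentence('Lazyroar is Awesome.')

# printing the sentence shows its text and its tokens
print(s)

# the number of tokens is also available directly
print(len(s))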
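
To make the idea of tokens concrete, here is a rough sketch of how a sentence string might be split into tokens. It uses a simple regular expression as a stand-in; Flair's own tokenizer is more sophisticated, and `simple_tokenize` is a hypothetical helper, not part of Flair.

```python
import re


def simple_tokenize(text):
    """Split text into word tokens and single punctuation tokens (illustrative only)."""
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_tokenize('Lazyroar is Awesome.')
print(tokens)       # ['Lazyroar', 'is', 'Awesome', '.']
print(len(tokens))  # 4
```

Note that the final period becomes a token of its own, so the sentence has four tokens rather than three.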


2) NER Tags:

To predict tags for a given sentence we will use a pre-trained model as shown below:

Python3




import flair
from flair.data import Sentence
from flair.models import SequenceTagger
  
# input a sentence
s = Sentence('Lazyroar is Awesome.')
  
# load the pre-trained NER tagger
tagger_NER = SequenceTagger.load('ner')
  
# run NER over sentence
tagger_NER.predict(s)
print(s)
print('The following NER tags are found:\n')
  
# iterate and print
for entity in s.get_spans('ner'):
    print(entity)


3) Word Embeddings:

Word embeddings provide a vector for each word of the text. As discussed earlier, Flair supports many word embeddings, including its own Flair Embeddings. Here we will see how to use some of them.

A) Classic Word Embeddings – This class of word embeddings is static: each distinct word is given exactly one pre-computed embedding. Most of the common word embeddings fall into this category, including the GloVe embedding.

Python3




import flair
from flair.data import Sentence
from flair.embeddings import WordEmbeddings
  
# using glove embedding
GloVe_embedding = WordEmbeddings('glove')
  
# input a  sentence
s = Sentence('Geeks for Geeks helps me study.')
  
# embed the sentence
GloVe_embedding.embed(s)
  
# print the embedded tokens
for token in s:
    print(token)
    print(token.embedding)


Note: The embedding for the word ‘Geeks’ is the same for both occurrences.
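
The reason a static embedding behaves this way can be sketched as a plain table lookup. The table, its values, and the `static_embed` helper below are hypothetical and only illustrate the principle; real GloVe vectors have many more dimensions.

```python
# Hypothetical pre-computed table of word vectors.
embedding_table = {
    "geeks": [0.1, 0.3],
    "for": [0.7, 0.2],
    "helps": [0.4, 0.9],
}


def static_embed(tokens, table, dim=2):
    """Look up one fixed vector per token; unknown tokens get a zero vector."""
    return [table.get(token.lower(), [0.0] * dim) for token in tokens]


vectors = static_embed(["Geeks", "for", "Geeks"], embedding_table)
print(vectors[0] == vectors[2])  # True: same word, same vector, whatever the context
```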

B) Flair Embedding – This works on the concept of contextual string embeddings and captures latent syntactic-semantic information. The word embeddings are contextualized by their surrounding words, so the same word gets different embeddings depending on its surrounding text.

Python3




import flair
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings
  
# using the forward flair embedding
forward_flair_embedding = FlairEmbeddings('news-forward-fast')
  
# input the sentence
s = Sentence('Geeks for Geeks helps me study.')
  
# embed words in the input sentence
forward_flair_embedding.embed(s)
  
# print the embedded tokens
for token in s:
    print(token)
    print(token.embedding)


Note: Here the embeddings for the word ‘Geeks’ differ between the two occurrences, depending on the contextual information around them.

C) Stacked Embeddings – Using these embeddings you can combine different embeddings together. Let’s see how to combine GloVe, forward and backward Flair embeddings:

Python3




import flair
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, WordEmbeddings
from flair.embeddings import StackedEmbeddings

# flair embeddings
forward_flair_embedding = FlairEmbeddings('news-forward-fast')
backward_flair_embedding = FlairEmbeddings('news-backward-fast')
  
# glove embedding
GloVe_embedding = WordEmbeddings('glove')
  
# create an object which combines the three embeddings
stacked_embeddings = StackedEmbeddings([forward_flair_embedding,
                                        backward_flair_embedding,
                                        GloVe_embedding])
  
# input the sentence
s = Sentence('Geeks for Geeks helps me study.')
                                          
# embed the input sentence with the stacked embedding
stacked_embeddings.embed(s)
  
# print the embedded tokens
for token in s:
    print(token)
    print(token.embedding)
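
Conceptually, stacking concatenates the vectors that each embedding produces for a token, so the stacked vector's length is the sum of the individual lengths. The sketch below illustrates this with plain Python lists; the vectors and the `stack` helper are hypothetical and much smaller than real embeddings.

```python
def stack(*vectors):
    """Concatenate several embedding vectors of one token into a single vector."""
    stacked = []
    for vector in vectors:
        stacked.extend(vector)
    return stacked


# Hypothetical per-token vectors from three different embeddings.
forward_vec = [0.1, 0.2, 0.3]
backward_vec = [0.4, 0.5, 0.6]
glove_vec = [0.7, 0.8]

token_embedding = stack(forward_vec, backward_vec, glove_vec)
print(token_embedding)       # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(len(token_embedding))  # 8 = 3 + 3 + 2
```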


4) Document Embeddings:

Unlike word embeddings, document embeddings give a single embedding for the entire text. The document embeddings offered in Flair are:

  • A) Transformer Document Embeddings
  • B) Sentence Transformer Document Embeddings
  • C) Document RNN Embeddings
  • D) Document Pool Embeddings

Let’s have a look at how the Document Pool Embeddings work:

Document Pool Embeddings – This is a very simple document embedding. It pools over all the word embeddings in the text and returns their average.

Python3




import flair
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, DocumentPoolEmbeddings
  
# init the glove word embedding
GloVe_embedding = WordEmbeddings('glove')
  
# init the document embedding
doc_embeddings = DocumentPoolEmbeddings([GloVe_embedding])
  
# input the sentence
s = Sentence('Geeks for Geeks helps me study.')
                                          
# embed the input sentence with the document embedding
doc_embeddings.embed(s)
  
# print the document embedding
print(s.embedding)
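
The pooling operation itself is just an element-wise average over the token vectors. Here is a minimal sketch with made-up two-dimensional vectors; `mean_pool` is a hypothetical helper, not a Flair function.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one document vector."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


token_vectors = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
print(mean_pool(token_vectors))  # [3.0, 4.0]
```

The result has the same dimensionality as a single word embedding, regardless of how many tokens the text contains.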


The other document embeddings can be used in a similar way.

5) Training a Text Classification Model using Flair:

We are going to use the ‘TREC_6’ dataset available in Flair. You can also use your own datasets. To train our model we will use the Document RNN Embeddings, which train an RNN over all the word embeddings in a sentence. The word embeddings we will use are GloVe and the forward Flair embedding.

Python3




from flair.data import Corpus
from flair.datasets import TREC_6
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentRNNEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer
  
# load the corpus
corpus = TREC_6()
  
# create a label dictionary
# (newer Flair releases expect a label_type argument here, e.g.
# corpus.make_label_dictionary(label_type='question_class'))
label_Dictionary = corpus.make_label_dictionary()
  
# list of word embeddings to be used
word_embeddings = [WordEmbeddings('glove'), FlairEmbeddings('news-forward-fast')]
  
# init document embeddings and pass the word embeddings list
doc_embeddings = DocumentRNNEmbeddings(word_embeddings, hidden_size=250)
  
# creating the text classifier
# (newer Flair releases also expect label_type to be passed here)
text_classifier = TextClassifier(doc_embeddings, label_dictionary=label_Dictionary)
  
# init the text classifier trainer
model_trainer = ModelTrainer(text_classifier, corpus)
  
# train your model
model_trainer.train('resources/taggers/trec',
                    learning_rate=0.1,
                    mini_batch_size=40,
                    anneal_factor=0.5,
                    patience=5,
                    max_epochs=200)


Results of training:

The accuracy of the model is around 95%.
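
The `learning_rate`, `anneal_factor` and `patience` arguments in the training call work together: the learning rate starts at 0.1 and is multiplied by the anneal factor when the validation score stops improving. The sketch below simulates this under the assumption of a common reduce-on-plateau rule; `anneal_on_plateau` and the scores are hypothetical, and the exact counting inside Flair's scheduler may differ.

```python
def anneal_on_plateau(scores, learning_rate=0.1, anneal_factor=0.5, patience=5):
    """Return the learning rate used in each epoch, given per-epoch validation scores."""
    best = float("-inf")
    bad_epochs = 0
    history = []
    for score in scores:
        history.append(learning_rate)
        if score > best:
            best, bad_epochs = score, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # No improvement for more than `patience` epochs: shrink the learning rate.
            learning_rate *= anneal_factor
            bad_epochs = 0
    return history


# Hypothetical validation scores: one improvement, then a long plateau.
scores = [0.80, 0.85] + [0.85] * 7
print(anneal_on_plateau(scores))
```

In this simulation the rate stays at 0.1 for the first eight epochs and is halved for the ninth, after six epochs in a row without improvement.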

Predictions: Now we can load the trained model and make predictions:

Python3




from flair.data import Sentence
from flair.models import TextClassifier

# load the trained model saved by the trainer
c = TextClassifier.load('resources/taggers/trec/final-model.pt')
  
# input example sentence
s = Sentence('Who is the President of India ?')
  
# predict the class of the sentence
c.predict(s)
  
# print the labels
print(s.labels)


Output:

[HUM (1.0)]

You should now have a rough idea of how to use the Flair library.
