Python Bigrams

Some English words appear together more often than others, for example “sky high,” “do or die,” “best performance,” and “heavy rain.” Identifying such word pairs in text documents can help with tasks such as sentiment analysis. The first step is to generate pairs of adjacent words from existing sentences, preserving their original order. These pairs are called bigrams. The NLTK library for Python provides a bigrams function that generates these pairs from a sequence of tokens.

Example

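import nltk

# word_tokenize needs NLTK's Punkt tokenizer data, downloaded once with
# nltk.download(); the resource name ("punkt" or "punkt_tab") depends on the NLTK version.

word_data = "The best performance can bring in sky high success."

# Split the sentence into word and punctuation tokens.
nltk_tokens = nltk.word_tokenize(word_data)

# nltk.bigrams returns a generator, so wrap it in list() to print the pairs.
print(list(nltk.bigrams(nltk_tokens)))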

Running the above program produces the following output:

[('The', 'best'), ('best', 'performance'), ('performance', 'can'), ('can', 'bring'),
('bring', 'in'), ('in', 'sky'), ('sky', 'high'), ('high', 'success'), ('success', '.')]
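
To see how the pairs are formed, the same pairing can be sketched in plain Python without NLTK. This is only an illustration: make_bigrams is a hypothetical helper name, not part of NLTK, and the tokens are written out by hand instead of coming from a tokenizer.

```python
def make_bigrams(tokens):
    """Return adjacent token pairs in their original order."""
    # zip stops at the shorter input, so the last token starts no pair.
    return list(zip(tokens, tokens[1:]))


# Hand-written tokens (an assumption for this sketch; no tokenizer is used).
tokens = ["The", "best", "performance", "can", "bring",
          "in", "sky", "high", "success", "."]

pairs = make_bigrams(tokens)
print(pairs[:3])
# [('The', 'best'), ('best', 'performance'), ('performance', 'can')]
```

A list of n tokens yields n - 1 bigrams, so the ten tokens here give nine pairs.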

From this result we can count how often each pair occurs in a given text. Frequent pairs such as “best performance” or “heavy rain” can then serve as one signal of the general sentiment of the text.
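
Counting pair frequencies might look like the following minimal sketch. It assumes simple lowercase, whitespace-based splitting in place of the NLTK tokenizer so that it runs with the standard library alone. The helper name bigram_counts and the sample sentence are made up for illustration.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs in lowercased, whitespace-split text."""
    # Assumption: str.split() stands in for a real tokenizer here.
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


sample = "heavy rain today and heavy rain tomorrow"
counts = bigram_counts(sample)

# The most frequent pair and its count.
print(counts.most_common(1))
# [(('heavy', 'rain'), 2)]
```

With NLTK available, the tokens from nltk.word_tokenize could be used instead of str.split(), which also separates punctuation from words.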
