Python Bigrams
Some English words appear together more frequently than others, for example “sky high,” “do or die,” “best performance,” and “heavy rain.” When analyzing text documents, identifying such word pairs can help with tasks such as sentiment analysis. The first step is to generate pairs of adjacent words from existing sentences, preserving their original order. These pairs are called bigrams. The NLTK library for Python provides a bigrams function that generates these pairs from a sequence of tokens.
Example
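import nltk
# word_tokenize requires NLTK's tokenizer data; download it once with nltk.download() if it is missing
word_data = "The best performance can bring in sky high success."
nltk_tokens = nltk.word_tokenize(word_data)
# nltk.bigrams returns a generator, so wrap it in list() before printing
print(list(nltk.bigrams(nltk_tokens)))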
Running the above program produces the following output:
[('The', 'best'), ('best', 'performance'), ('performance', 'can'), ('can', 'bring'),
('bring', 'in'), ('in', 'sky'), ('sky', 'high'), ('high', 'success'), ('success', '.')]
These pairs can then be counted to produce a statistical analysis of how often each one occurs in a given text. Frequently occurring pairs can serve as features that may help indicate the overall sentiment of the text.
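The frequency analysis mentioned above can be sketched with the standard library alone. This is a minimal illustration, not part of NLTK: the helper name bigram_counts and the sample sentence are hypothetical, and the text is split on whitespace as a simplifying assumption instead of using nltk.word_tokenize.

```python
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs in text (hypothetical helper, not an NLTK function)."""
    # Simplifying assumption: lowercase and split on whitespace rather than
    # using a real tokenizer, so punctuation stays attached to words.
    tokens = text.lower().split()
    # zip pairs each token with its successor, preserving the original order.
    return Counter(zip(tokens, tokens[1:]))

sample = "heavy rain today and heavy rain tomorrow"
counts = bigram_counts(sample)

# Show the single most frequent pair and its count.
print(counts.most_common(1))  # [(('heavy', 'rain'), 2)]
```

Given a token list, nltk.bigrams yields the same adjacent pairs, so its output could be passed to Counter in the same way.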