Python Stemming Algorithms
In natural language processing, we often encounter situations where several words share a common root. For example, the words “agree,” “agreeing,” and “agreeable” share the root “agree.” A search involving any of these words should treat them as forms of the same root word. It is therefore useful to reduce every word to its root form, a process called stemming. The NLTK library provides stemmers that perform this reduction and return the stem of each word. Note that a stem is not always a dictionary word: “position,” for example, is reduced to “posit.”
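To make the search motivation concrete, here is a minimal sketch of a stem-based index. The `toy_stem` function, its suffix list, and the sample documents are all hypothetical and exist only for illustration; `toy_stem` is a crude suffix stripper, not one of NLTK's stemming algorithms.

```python
from collections import defaultdict

# Hypothetical toy stemmer: strips a few common English suffixes.
# It is NOT the Porter, Lancaster, or Snowball algorithm.
SUFFIXES = ("able", "ing", "s")


def toy_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(documents, stem=toy_stem):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index


def search(index, query, stem=toy_stem):
    """Return the ids of documents that share a stem with the query word."""
    return sorted(index.get(stem(query), set()))


# Made-up sample documents, keyed by an arbitrary id.
documents = {
    1: "they agree on the plan",
    2: "an agreeable compromise",
    3: "nobody was agreeing",
    4: "the plan failed",
}
index = build_index(documents)
print(search(index, "agreeing"))  # [1, 2, 3]
```

Because both the indexed words and the query pass through the same `stem` function, any form of a word finds documents containing its other forms. In practice a real stemmer, such as the `stem` method of an NLTK stemmer object, could be passed as the `stem` argument instead, although its stems may differ from the toy ones shown here.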
NLTK provides three commonly used stemming algorithms: Porter, Lancaster, and Snowball. They produce slightly different results for some words. The following example applies all three stemmers to the same sentence and prints their results.
import nltk
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem import SnowballStemmer

# word_tokenize needs NLTK's Punkt tokenizer data. If it is missing, download it
# once with nltk.download('punkt') (newer NLTK releases may ask for 'punkt_tab').

porter_stemmer = PorterStemmer()
lanca_stemmer = LancasterStemmer()
sb_stemmer = SnowballStemmer("english")

word_data = "Aging head of famous crime family decides to transfer his position to one of his subalterns"

# First, split the sentence into word tokens
nltk_tokens = nltk.word_tokenize(word_data)

# Next, find the stem of each token with each stemmer
print('***PorterStemmer****\n')
for w_port in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_port, porter_stemmer.stem(w_port)))

print('\n***LancasterStemmer****\n')
for w_lanca in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_lanca, lanca_stemmer.stem(w_lanca)))

print('\n***SnowballStemmer****\n')
for w_snow in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_snow, sb_stemmer.stem(w_snow)))
Running the above program, we get the following output:
***PorterStemmer****
Actual: Aging || Stem: age
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: famou
Actual: crime || Stem: crime
Actual: family || Stem: famili
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transfer
Actual: his || Stem: hi
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: one
Actual: of || Stem: of
Actual: his || Stem: hi
Actual: subalterns || Stem: subaltern
***LancasterStemmer****
Actual: Aging || Stem: ag
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: fam
Actual: crime || Stem: crim
Actual: family || Stem: famy
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transf
Actual: his || Stem: his
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: on
Actual: of || Stem: of
Actual: his || Stem: his
Actual: subalterns || Stem: subaltern
***SnowballStemmer****
Actual: Aging || Stem: age
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: famous
Actual: crime || Stem: crime
Actual: family || Stem: famili
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transfer
Actual: his || Stem: his
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: one
Actual: of || Stem: of
Actual: his || Stem: his
Actual: subalterns || Stem: subaltern
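Since most tokens stem identically under all three algorithms, it can be easier to list only the tokens on which the stemmers disagree. The following is a minimal sketch of that idea. The helper name `stem_disagreements` and the dictionary labels are assumptions introduced for illustration, and the sentence is split with `str.split()` rather than `nltk.word_tokenize`, which is sufficient here only because the sentence contains no punctuation.

```python
def stem_disagreements(tokens, stemmers):
    """Return (token, {label: stem}) pairs where the stemmers do not all agree.

    `stemmers` maps a label to any callable that turns a word into a stem.
    """
    rows = []
    seen = set()
    for token in tokens:
        if token in seen:  # skip repeated tokens such as "to" or "his"
            continue
        seen.add(token)
        stems = {label: stem(token) for label, stem in stemmers.items()}
        if len(set(stems.values())) > 1:
            rows.append((token, stems))
    return rows


if __name__ == "__main__":
    try:
        from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer
    except ImportError:
        print("NLTK is not installed; install it to run this comparison.")
    else:
        stemmers = {
            "porter": PorterStemmer().stem,
            "lancaster": LancasterStemmer().stem,
            "snowball": SnowballStemmer("english").stem,
        }
        sentence = ("Aging head of famous crime family decides to transfer "
                    "his position to one of his subalterns")
        # Print only the tokens whose stems differ between the stemmers.
        for token, stems in stem_disagreements(sentence.split(), stemmers):
            print(token, stems)
```

The helper accepts plain callables rather than stemmer objects, so it works with any function that maps a word to a stem, including a custom one.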