Python – Chunks and Chinks
Chunking is the process of grouping adjacent words into phrases based on their properties, such as their part-of-speech tags. In the following example, we generate chunks by defining a grammar. The grammar specifies the sequence of tags, such as determiners, adjectives, and nouns, that a run of words must follow to form a chunk. The program prints the resulting chunk tree, and result.draw() displays it graphically.
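import nltk
# Each token is a (word, part-of-speech tag) pair.
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# NP chunk: an optional determiner, any number of adjectives, then a noun.
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)
result.draw()  # opens a window that draws the chunk tree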
Running the above program prints the chunk tree, with each matched phrase grouped under an NP node, and opens a window that draws the tree.
Changing the grammar changes which words are grouped, as the next program shows.
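import nltk
# Each token is a (word, part-of-speech tag) pair.
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# NP chunk: an optional determiner followed by any number of adjectives (no noun).
grammar = "NP: {<DT>?<JJ>*}"
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()  # opens a window that draws the chunk tree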
Running the above program prints the new chunk tree and draws it in a window.
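To make the idea of a tag pattern more concrete, the following is a minimal sketch of how such a pattern could be matched against a tagged sentence using only the standard re module. It is a simplified illustration, not NLTK's implementation: the function name chunk_by_tags and the illustrative pattern <DT>?<JJ>*<NN> (an optional determiner, any number of adjectives, then a noun) are assumptions made for this example, and the sketch handles only literal tag names, alternation such as <VBD|IN>, and the quantifiers ?, * and +.

```python
import re

def chunk_by_tags(tagged, tag_pattern, label="NP"):
    """Group each run of tokens whose tags match tag_pattern into (label, tokens).

    Hypothetical helper for illustration only; not part of NLTK.
    """
    # Encode the tag sequence as one string, e.g. "<DT><JJ><JJ><NN>".
    tags = "".join(f"<{tag}>" for _, tag in tagged)
    # Wrap every <TAG> element in a group so that ?, * and + apply to the whole tag.
    regex = re.sub(r"<([^>]+)>", r"(?:<(?:\1)>)", tag_pattern)

    result, pos = [], 0
    for match in re.finditer(regex, tags):
        if match.start() == match.end():
            continue  # ignore empty matches
        # The number of "<" before a character offset is the token index.
        start = tags.count("<", 0, match.start())
        end = tags.count("<", 0, match.end())
        result.extend(tagged[pos:start])           # tokens outside any chunk
        result.append((label, tagged[start:end]))  # the chunk itself
        pos = end
    result.extend(tagged[pos:])                    # trailing tokens
    return result

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

chunks = chunk_by_tags(sentence, "<DT>?<JJ>*<NN>")
for item in chunks:
    print(item)
```

With this pattern, "The small red flower" and "the window" are each grouped under an NP label, while "flew" and "through" stay outside any chunk. Dropping <NN> from the pattern leaves the nouns outside the chunks as well.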
Chinking
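Chinking is the process of removing a sequence of tokens from a chunk. If the sequence appears in the middle of a chunk, those tokens are removed, leaving two chunks where there was one. In the following example, the grammar first chunks the whole sentence and then chinks the tokens that match the second rule.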
import nltk
# Each token is a (word, part-of-speech tag) pair.
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
grammar = r"""
NP:
    {<.*>+}      # Chunk everything
    }<JJ|NN>+{   # Chink sequences of JJ and NN
"""
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()  # opens a window that draws the chunk tree
Running the above program prints the chunk tree and draws it in a window.
In the resulting tree, the chinked tokens sit outside the NP chunks, and the tokens that remain on either side of them form separate chunks. Chinking, in other words, removes unwanted tokens from a chunk.
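The splitting behaviour of chinking can be sketched in plain Python as well. The example below is a simplified illustration under stated assumptions, not NLTK's implementation: the function name chink_by_tags is hypothetical, the whole sentence is treated as one initial chunk, and the illustrative chink pattern <VBD|IN>+ (one or more past-tense verbs or prepositions) is chosen because such words do not belong inside a noun phrase.

```python
import re

def chink_by_tags(tagged, chink_pattern, label="NP"):
    """Treat the whole sentence as one chunk, then remove tokens matching chink_pattern.

    Hypothetical helper for illustration only; not part of NLTK.
    """
    # Encode the tag sequence as one string, e.g. "<DT><JJ><JJ><NN>".
    tags = "".join(f"<{tag}>" for _, tag in tagged)
    # Wrap every <TAG> element in a group so that ?, * and + apply to the whole tag.
    regex = re.sub(r"<([^>]+)>", r"(?:<(?:\1)>)", chink_pattern)

    # Mark every token covered by a match of the chink pattern.
    chinked = [False] * len(tagged)
    for match in re.finditer(regex, tags):
        start = tags.count("<", 0, match.start())
        end = tags.count("<", 0, match.end())
        for i in range(start, end):
            chinked[i] = True

    # Runs of unmarked tokens become chunks; marked tokens are left outside.
    result, current = [], []
    for token, is_chinked in zip(tagged, chinked):
        if is_chinked:
            if current:
                result.append((label, current))
                current = []
            result.append(token)
        else:
            current.append(token)
    if current:
        result.append((label, current))
    return result

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

chinked_result = chink_by_tags(sentence, "<VBD|IN>+")
for item in chinked_result:
    print(item)
```

Because "flew" and "through" sit in the middle of the initial chunk, removing them leaves two NP chunks, "The small red flower" and "the window". A chink that matches at the edge of a chunk would simply leave a smaller chunk behind.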