Python Chunks and Chinks
Chunking is the process of grouping adjacent words into phrases based on their part-of-speech tags. In the following example, we create chunks by defining a grammar. The grammar describes the tag order a noun phrase (NP) should follow, such as a determiner, then adjectives, then a noun. The program prints the chunked sentence as a tree and also draws it graphically.
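import nltk
# A POS-tagged sentence: a list of (word, tag) pairs
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# NP chunk: an optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)
result.draw()  # opens a window showing the tree (needs a graphical display)
When we run the above program, we get the following output:
(S
  (NP The/DT small/JJ red/JJ flower/NN)
  flew/VBD
  through/IN
  (NP the/DT window/NN))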
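To make the tag pattern less opaque, the matching step can be sketched with Python's standard re module. This is a simplified illustration under stated assumptions, not how NLTK implements RegexpParser: the helper name chunk_noun_phrases is hypothetical, and the pattern assumes a noun phrase is an optional determiner, any number of adjectives, and one noun.

```python
import re

# A POS-tagged sentence: a list of (word, tag) pairs
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

def chunk_noun_phrases(tagged):
    """Return the words of each span matching <DT>?<JJ>*<NN> (hypothetical helper)."""
    # Encode the tag sequence as one string, e.g. "<DT><JJ><JJ><NN>..."
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    # Assumed NP pattern: optional determiner, any adjectives, one noun
    pattern = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")
    chunks = []
    for match in pattern.finditer(tag_string):
        # Counting "<" before a character offset converts it to a token index
        start = tag_string.count("<", 0, match.start())
        end = tag_string.count("<", 0, match.end())
        chunks.append([word for word, _ in tagged[start:end]])
    return chunks

print(chunk_noun_phrases(sentence))
# [['The', 'small', 'red', 'flower'], ['the', 'window']]
```

The sketch returns plain lists of words rather than a tree, which is enough to show which tokens a pattern of this shape would group together.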
Chinking
Chinking is the process of removing a sequence of tokens from a chunk. If the tokens appear in the middle of a chunk, they are removed and two smaller chunks remain, one on each side.
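import nltk
# A POS-tagged sentence: a list of (word, tag) pairs
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# First chunk the whole sentence as one NP, then chink (remove) adjectives and nouns
grammar = r"""
  NP:
    {<.*>+}       # Chunk everything
    }<JJ|NN>+{    # Chink sequences of JJ and NN
  """
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()  # opens a window showing the tree (needs a graphical display)
When we run the above program, we get the following output:
(S
  (NP The/DT)
  small/JJ
  red/JJ
  flower/NN
  (NP flew/VBD through/IN the/DT)
  window/NN)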
As you can see, the tokens that match the chink pattern are left outside the noun-phrase chunks, and the remaining tokens form separate chunks. This process of removing text that should not be part of a chunk is called chinking.
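The idea of chinking can also be sketched without NLTK: start with the whole sentence as one chunk, then cut out every run of tokens whose tag is in a chosen set. This is an illustrative sketch only. The function name chink and its chink_tags parameter are hypothetical, and the tag set JJ and NN is an assumption taken from the grammar comment.

```python
from itertools import groupby

# A POS-tagged sentence: a list of (word, tag) pairs
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

def chink(tagged, chink_tags):
    """Treat the sentence as one chunk, then cut out tokens tagged with chink_tags."""
    result = []
    # groupby splits the sentence into consecutive runs of chinked / kept tokens
    for is_chinked, group in groupby(tagged, key=lambda pair: pair[1] in chink_tags):
        tokens = list(group)
        if is_chinked:
            result.extend(tokens)   # chinked tokens stay outside any chunk
        else:
            result.append(tokens)   # each remaining run forms one chunk (a list)
    return result

for item in chink(sentence, {"JJ", "NN"}):
    print(item)
```

In the returned list, each chunk is a list of (word, tag) pairs and each chinked token is a bare pair, so a chink in the middle of the sentence leaves one chunk on either side of it.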