Python Chunks and Chinks

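Chunking is the process of grouping adjacent words into phrases (chunks) based on their part-of-speech tags. In the following example, we create chunks by defining a grammar: a pattern over tags that describes which tag sequence forms a chunk. Here an NP (noun phrase) chunk is an optional determiner (DT), followed by any number of adjectives (JJ), followed by a noun (NN). The program prints the resulting chunk tree and also draws it graphically.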

import nltk

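# A sentence that has already been tokenized and part-of-speech tagged
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# NP chunk: optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)   # text form of the chunk tree
result.draw()   # opens a window showing the tree (requires Tkinter)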

When we run the above program, we get the following output:

[Figure: the chunk tree drawn by result.draw()]
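To make the idea behind the grammar concrete, here is a simplified, standard-library-only sketch of tag-pattern chunking. It assumes the pattern (an optional determiner, any number of adjectives, then a noun, written in NLTK as <DT>?<JJ>*<NN>) is hand-translated into an ordinary Python regular expression and matched against a string of tags. The helper chunk and the constant NP_PATTERN are hypothetical and not part of NLTK; RegexpParser works on a broadly similar principle but also handles labels, multiple rules, chinking and tree output.

```python
import re

# The same tagged sentence used in the NLTK example
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# Hypothetical hand translation of the tag pattern <DT>?<JJ>*<NN>
NP_PATTERN = r"(?:<DT>)?(?:<JJ>)*<NN>"


def chunk(tagged, tag_regex, label="NP"):
    """Hypothetical helper: group runs of tokens whose tags match tag_regex.

    Returns a flat list in which each chunk is a (label, [tokens]) pair and
    every token outside a chunk stays a plain (word, tag) pair.
    """
    # Encode the tags as one string, e.g. "<DT><JJ><JJ><NN>..."
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    result, pos = [], 0
    for match in re.finditer(tag_regex, tag_string):
        # Each token contributes exactly one "<", so counting them before a
        # character offset converts that offset into a token index.
        start = tag_string.count("<", 0, match.start())
        end = tag_string.count("<", 0, match.end())
        if start == end:
            continue  # ignore empty matches
        result.extend(tagged[pos:start])  # tokens that are not chunked
        result.append((label, tagged[start:end]))
        pos = end
    result.extend(tagged[pos:])  # trailing unchunked tokens
    return result


result = chunk(sentence, NP_PATTERN)
for item in result:
    print(item)
```

On the example sentence this yields two NP chunks, The small red flower and the window, with flew and through left outside any chunk.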


Chinking

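Chinking is the process of removing a sequence of tokens from a chunk. If the matching sequence spans the whole chunk, the whole chunk is removed. If it appears at the start or end of a chunk, those tokens are removed and a smaller chunk remains. If it appears in the middle of a chunk, those tokens are removed and the chunk is split into two chunks.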
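The three cases can be illustrated with another simplified, standard-library-only sketch. The helper chink is hypothetical (it is not an NLTK function): it takes the tokens of a single chunk and a chink pattern written as a plain Python regular expression over a tag string, and returns the pieces of the chunk that remain together with the tokens that were removed.

```python
import re


def chink(chunk_tokens, tag_regex):
    """Hypothetical helper: remove every run of tokens whose tags match tag_regex.

    Returns (pieces, removed): the parts of the chunk that survive, as a list
    of token lists, and the tokens that were chinked out.
    """
    tag_string = "".join(f"<{tag}>" for _, tag in chunk_tokens)
    pieces, removed, pos = [], [], 0
    for match in re.finditer(tag_regex, tag_string):
        # One "<" per token, so counting them maps character offsets to token indices
        start = tag_string.count("<", 0, match.start())
        end = tag_string.count("<", 0, match.end())
        if start == end:
            continue  # ignore empty matches
        if start > pos:
            pieces.append(chunk_tokens[pos:start])  # piece before this chink
        removed.extend(chunk_tokens[start:end])
        pos = end
    if pos < len(chunk_tokens):
        pieces.append(chunk_tokens[pos:])  # piece after the last chink
    return pieces, removed


noun_phrase = [("the", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN")]

whole = chink(noun_phrase, r"(?:<DT>|<JJ>|<NN>)+")  # matches the whole chunk
edge = chink(noun_phrase, r"<NN>")                   # matches at the end
middle = chink(noun_phrase, r"(?:<JJ>)+")            # matches in the middle

for name, outcome in [("whole", whole), ("edge", edge), ("middle", middle)]:
    print(name, outcome)
```

In the first case nothing is left of the chunk, in the second the smaller chunk the small red remains, and in the third the chunk is split into the and flower.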

import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

grammar = r"""
NP:
    {<.*>+}       # Chunk everything
    }<JJ|NN>+{    # Chink sequences of JJ and NN
"""
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()

When we run the above program, we get the following output:

[Figure: the chunk tree drawn by result.draw()]

As you can see, the token sequences that match the chink pattern (here small red flower and window) are removed from the chunk, and the remaining tokens form separate NP chunks. This process of removing from a chunk the tokens that should not be part of it is called chinking.
