ProCoder Cafe

Python – Chunks and Chinks

Chunking is the process of grouping adjacent words into phrases based on their part-of-speech tags. In the following example, we generate chunks by defining a grammar. The grammar specifies the sequence of tags, such as determiners, adjectives, and nouns, that a chunk must follow. The figure after the code shows the resulting chunks graphically.

import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# NP chunk: an optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)
result.draw()

When running the above program, we get the following output –

[Figure: Python - Chunks and Chinks]
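To make the role of the grammar more concrete, here is a minimal pure-Python sketch of how a tag pattern such as `<DT>?<JJ>*<NN>` can be matched against a sequence of part-of-speech tags. It is a simplified illustration, not NLTK's actual implementation, and the helper name `find_chunks` is hypothetical.

```python
import re

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]


def find_chunks(tagged, tag_pattern):
    """Return (start, end) token spans whose tags match tag_pattern.

    Simplified illustration only: the tags are joined into one string such
    as "<DT><JJ><NN>", and every <TAG> in the pattern is wrapped in a group
    so that ? and * apply to the whole tag rather than to one character.
    """
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    regex = re.sub(r"<([^>]+)>", r"(?:<\1>)", tag_pattern)
    spans = []
    for match in re.finditer(regex, tag_string):
        if match.start() == match.end():
            continue  # ignore empty matches
        # Convert character offsets back into token indices.
        start = tag_string[:match.start()].count("<")
        end = tag_string[:match.end()].count("<")
        spans.append((start, end))
    return spans


spans = find_chunks(sentence, "<DT>?<JJ>*<NN>")
chunks = [" ".join(word for word, _ in sentence[s:e]) for s, e in spans]
print(chunks)
```

Each span marks one noun-phrase chunk. Tokens that fall outside every span, such as "flew" and "through", stay outside the chunks.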

Changing the grammar, we get another output, as shown below.

import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"),("flower", "NN"),
("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# NP chunk: an optional determiner followed by any number of adjectives
grammar = "NP: {<DT>?<JJ>*}"

chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()

When running the above program, we get the following output:

[Figure: Python - Chunks and Chinks]

Chinking

Chinking is the process of removing a sequence of tokens from a chunk. If the sequence appears in the middle of a chunk, it is removed and two chunks are left in its place.

import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

grammar = r"""
NP:
    {<.*>+}       # Chunk everything
    }<JJ|NN>+{    # Chink sequences of JJ and NN
"""
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence)
print(result)
result.draw()

When we run the above program, we get the following output –

[Figure: Python - Chunks and Chinks]

As you can see, the adjective and noun tokens have been taken out of the single large chunk, and the remaining tokens are left as separate, smaller chunks. This removal of a tag sequence from a chunk is called chinking.
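The splitting effect of a chink can also be sketched in plain Python. The example below starts from one chunk that covers the whole sentence and cuts it wherever a run of JJ or NN tokens occurs. It is a simplified illustration under that assumption, not NLTK's implementation, and the helper name `chink` is hypothetical.

```python
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]


def chink(tagged, chunk_span, chink_tags):
    """Split one chunk, given as (start, end), around chinked tokens.

    Tokens whose tag is in chink_tags are dropped from the chunk; the
    tokens left on either side of them become separate, smaller chunks.
    """
    start, end = chunk_span
    pieces, current = [], []
    for word, tag in tagged[start:end]:
        if tag in chink_tags:
            if current:          # close the chunk collected so far
                pieces.append(current)
                current = []
        else:
            current.append((word, tag))
    if current:                  # keep a trailing chunk, if any
        pieces.append(current)
    return pieces


# Chunk everything first, then chink sequences of JJ and NN.
pieces = chink(sentence, (0, len(sentence)), {"JJ", "NN"})
for piece in pieces:
    print(" ".join(word for word, _ in piece))
```

With these inputs, the one large chunk is reduced to two smaller ones: "The" and "flew through the".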
