Python – Text Processing State Machine

A state machine is a model for controlling the flow of data through an application. It can be pictured as a directed graph consisting of a set of nodes (the states) and a set of transition functions that move the machine from one state to another. Processing a text file often involves sequentially reading each block of text and performing an appropriate action on each block. The meaning of a block depends on the type of the previous block and the type of the subsequent block.
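
As a rough illustration of how the meaning of a line can depend on what came before it, the sketch below groups consecutive non-blank lines into blocks. The function name split_blocks and the state names "between" and "inside" are assumptions made for this example only; they are not part of the program discussed later.

```python
def split_blocks(lines):
    """Group consecutive non-blank lines into blocks (lists of lines)."""
    blocks = []
    state = "between"  # "between" blocks, or "inside" one
    for line in lines:
        is_blank = not line.strip()
        if state == "between" and not is_blank:
            # Transition between -> inside: this line starts a new block
            blocks.append([line])
            state = "inside"
        elif state == "inside" and not is_blank:
            # Stay inside: this line continues the current block
            blocks[-1].append(line)
        elif state == "inside" and is_blank:
            # Transition inside -> between: a blank line closes the block
            state = "between"
        # A blank line while "between" changes nothing
    return blocks


print(split_blocks(["title", "", "first line", "second line", "", "last"]))
```

The same non-blank line is treated differently depending on the current state: it either opens a new block or extends the existing one.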

Consider a case where the text input must be a continuous repetition of the sequence AGC (for protein analysis). As long as this sequence is maintained in the input, the machine outputs True for each input. However, if the sequence deviates, the output becomes False and remains False thereafter. This ensures that further processing stops, even if more blocks of the correct sequence exist later in the input.

The following program defines a generic StateMachine class with methods to start the machine, step through one input at a time, and feed it a whole list of inputs. The TextSeq subclass supplies the transition rules for the AGC sequence.

class StateMachine:

    # Initialization: reset to the start state defined by the subclass
    def start(self):
        self.state = self.startState

    # Step through input: consume one input, update the state, return the output
    def step(self, inp):
        (s, o) = self.getNextValues(self.state, inp)
        self.state = s
        return o

    # Loop input: restart the machine, then step through every input in order
    def feeder(self, inputs):
        self.start()
        return [self.step(inp) for inp in inputs]

# Determine TRUE or FALSE status
class TextSeq(StateMachine):
    startState = 0
    def getNextValues(self, state, inp):
        if state == 0 and inp == 'A':
            return (1, True)
        elif state == 1 and inp == 'G':
            return (2, True)
        elif state == 2 and inp == 'C':
            return (0, True)
        else:
            # Any other combination leads to state 3, which is never left
            return (3, False)

InSeq = TextSeq()

x = InSeq.feeder(['A', 'A', 'A'])
print(x)

y = InSeq.feeder(['A', 'G', 'C', 'A', 'C', 'A', 'G'])
print(y)

When we run the above program, we get the following output −

[True, False, False]
[True, True, True, True, False, False, False]

In the result for x, the AGC pattern fails on the second input, because G is expected after the first “A” but another “A” is found. The result remains False from then on. In the result for y, the AGC pattern holds through the fourth input, so the result remains True until that point. Starting from the fifth input, the result becomes False because G is expected but C is found, and it stays False even though the last two inputs would otherwise fit the pattern.
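
If only the position of the first deviation is of interest, the same transitions can be written as a lookup table. The following is a minimal sketch under that assumption, not a replacement for the classes above; the names AGC_TRANSITIONS, FAILED and first_mismatch are hypothetical and introduced only for this example.

```python
# (state, symbol) -> next state, for the repeating A -> G -> C cycle
AGC_TRANSITIONS = {(0, "A"): 1, (1, "G"): 2, (2, "C"): 0}
FAILED = 3  # same failure state number used by TextSeq


def first_mismatch(symbols):
    """Return the index of the first symbol that breaks the cycle, or None."""
    state = 0
    for index, symbol in enumerate(symbols):
        # Any pair missing from the table sends the machine to FAILED
        state = AGC_TRANSITIONS.get((state, symbol), FAILED)
        if state == FAILED:
            return index
    return None


print(first_mismatch(["A", "A", "A"]))                      # 1
print(first_mismatch(["A", "G", "C", "A", "C", "A", "G"]))  # 4
print(first_mismatch(["A", "G", "C", "A", "G", "C"]))       # None
```

The indices are zero-based, so 1 and 4 correspond to the second and fifth inputs described above.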
