Python text processing state machine

A state machine is a model for controlling the flow of a program. It can be viewed as a directed graph consisting of a set of nodes (the states) and a set of transition functions that move the machine from one state to the next. Processing a text file typically involves reading each block of the file sequentially and performing certain operations on each block as it is read. The meaning of a block depends on the types of the blocks that precede it and the blocks that follow it, and a state machine keeps track of that context while the input is consumed.
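The directed-graph view can be sketched as a lookup table in which each key is an edge label (current state plus input) and each value is the node the edge leads to. This is only an illustration: the state names, the symbols, and the run helper are hypothetical and are not part of the program discussed later.

```python
# Hypothetical transition table: (current state, input symbol) -> next state.
# The state names and symbols are made up for illustration.
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "start",
}

def run(symbols, state="start"):
    """Follow one edge per symbol; return the final state, or None if no edge matches."""
    for symbol in symbols:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return None
    return state

print(run("ab"))  # back at the start node
print(run("aa"))  # no edge for ("seen_a", "a")
```

Keeping the edges in a dictionary makes the graph explicit: adding a state or a transition means adding an entry, not another branch in the code.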

Consider a scenario where the input text must be a continuously repeating AGC sequence (used in protein analysis). As long as the input follows this sequence, the state of the machine remains TRUE. As soon as the input deviates from it, the state becomes FALSE and stays FALSE thereafter. This ensures that further processing stops even if more blocks of the correct sequence appear later.

The following program defines a state machine with methods for starting the machine, stepping through one input at a time, and feeding it a whole list of inputs.

class StateMachine:

    # Initialize the machine with the subclass's start state
    def start(self):
        self.state = self.startState

    # Step through the input: consume one item and return its output
    def step(self, inp):
        (s, o) = self.getNextValues(self.state, inp)
        self.state = s
        return o

    # Loop through the input
    def feeder(self, inputs):
        self.start()
        return [self.step(inp) for inp in inputs]

# Determine the TRUE or FALSE state
class TextSeq(StateMachine):
    startState = 0

    def getNextValues(self, state, inp):
        if state == 0 and inp == 'A':
            return (1, True)
        elif state == 1 and inp == 'G':
            return (2, True)
        elif state == 2 and inp == 'C':
            return (0, True)
        else:
            # State 3 is a dead state: no transition leaves it
            return (3, False)

InSeq = TextSeq()

x = InSeq.feeder(['A','A','A'])
print(x)

y = InSeq.feeder(['A', 'G', 'C', 'A', 'C', 'A', 'G'])
print(y)

When we run the above program, we get the following output:

[True, False, False]
[True, True, True, True, False, False, False]

In the result for x, the AGC pattern fails on the second input, which is another 'A' where 'G' was expected. From then on, the result stays False. In the result for y, the AGC pattern holds through the fourth input, so the result is True up to that point. From the fifth input onward, the result is False because G was expected but C was found.
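The same behaviour can also be sketched with a transition table instead of an if/elif chain. This is an alternative formulation, not the program above: the names NEXT_STATE, DEAD, and agc_flags are hypothetical, and the sketch assumes the same state numbering (0, 1, 2 for progress through AGC, 3 for the failed state).

```python
# Hypothetical table-driven version of the AGC checker.
# Keys are (state, input); values are the next state.
NEXT_STATE = {(0, "A"): 1, (1, "G"): 2, (2, "C"): 0}
DEAD = 3  # failed state: once entered, it is never left

def agc_flags(inputs):
    """Return one True/False flag per input, mirroring feeder() above."""
    state = 0
    flags = []
    for inp in inputs:
        # Any pair missing from the table, including every pair that
        # starts in the dead state, leads to the dead state.
        state = NEXT_STATE.get((state, inp), DEAD)
        flags.append(state != DEAD)
    return flags

print(agc_flags("AAA"))          # [True, False, False]
print(agc_flags("AGCACAG"))      # [True, True, True, True, False, False, False]
print(all(agc_flags("AGCAGC")))  # True: the whole input follows the pattern
```

Because the failed state has no entries in the table, the "remains False thereafter" rule falls out of the lookup default and needs no separate branch. Wrapping the result in all() reduces the per-input flags to a single answer for the whole input.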
