Effect of the Side Effect Machines in Edit Metric Decoding
MetadataShow full item record
The development of general edit metric decoders is a challenging problem, especially with the inclusion of additional biological restrictions that can occur in DNA error correcting codes. Side effect machines (SEMs), an extension of finite state machines, can provide efficient decoding algorithms for such edit metric codes. However, finding a good machine poses its own set of challenges and is itself considered as an open problem with no general solution. Previous studies utilizing evolutionary computation techniques, such as genetic algorithms and evolutionary programming to search for good SEMs have found success in terms of decoding accuracy. However, they all worked with extremely constricted problem spaces i.e. a single code or codes of the same length. Therefore a general approach that works well across codes of different lengths is yet to be formalized. In this research, several codes of varying lengths are used to study the effectiveness of evolutionary programming (EP) as a general approach for finding efficient edit metric decoders. Two classification methods—direct and fuzzy—are compared while also changing some of the EP settings to observe how the decoding accuracy is affected. The final SEMs are verified against an additional dataset to test their general effectiveness. Regardless of the code length, the best results are found using the fuzzy classification methods. For codes of length 10, a maximum accuracy of up to 99.4% is achieved for distance 1 whereas distance 2 and 3 achieve up to 97.1% and 85.9%, respectively. Unsurprisingly, the accuracy suffers for longer codes, as the maximum accuracies achieved by codes of length 14 were 92.4%, 85.7% and 69.2% for distance 1, 2, and 3 respectively. Additionally, the machines are examined for potential bloat by comparing the number of visited states against the number of total states. The study has found some machines with at least one unvisited state. The bloat is seen more in larger machines than it is in smaller machines. Furthermore, the results are analyzed to find potential trends and relationships among the parameters. The trend that is most consistently noticed is that — when allowed, the longer codes generally show a propensity for larger machines.