• An Abstract Algebraic Theory of L-Fuzzy Relations for Relational Databases

      Chowdhury, Abdul Wazed; Department of Computer Science
      Classical relational databases lack proper ways to manage certain real-world situations including imprecise or uncertain data. Fuzzy databases overcome this limitation by allowing each entry in a table to be a fuzzy set, where each element of the corresponding domain is assigned a membership degree from the real interval [0,1]. However, this fuzzy mechanism is inadequate for modelling scenarios where data may be incomparable. We are therefore interested in a further generalization of fuzzy databases to L-fuzzy databases, in which the characteristic function of a fuzzy set maps into an arbitrary complete Brouwerian lattice L. From the query language perspective, the fuzzy database language FSQL extends the regular Structured Query Language (SQL) with fuzzy-specific constructions. In addition, the L-fuzzy query language LFSQL introduces appropriate linguistic operations to define and manipulate inexact data in an L-fuzzy database. This research mainly focuses on defining the semantics of LFSQL. Doing so requires an abstract algebraic theory in which all the properties of, and operations on, L-fuzzy relations can be proven. In our study, we show that the theory of arrow categories forms a suitable framework for this purpose, and we therefore define the semantics of LFSQL in the abstract notion of an arrow category. In addition, we implement the operations of L-fuzzy relations in Haskell and develop a parser that translates algebraic expressions into our implementation.
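      As a rough illustration of the underlying idea only (not the thesis's Haskell or arrow-category development), the Python sketch below represents an L-fuzzy relation as a table of membership degrees drawn from a small lattice containing incomparable elements, and composes two such relations with the usual join-of-meets formula; the lattice, relation names and data are our own hypothetical examples.

        # Diamond lattice 0 < a, b < 1 with a and b incomparable; degrees are
        # taken from this lattice instead of the real interval [0, 1].
        MEET = {
            ('0', '0'): '0', ('0', 'a'): '0', ('0', 'b'): '0', ('0', '1'): '0',
            ('a', '0'): '0', ('a', 'a'): 'a', ('a', 'b'): '0', ('a', '1'): 'a',
            ('b', '0'): '0', ('b', 'a'): '0', ('b', 'b'): 'b', ('b', '1'): 'b',
            ('1', '0'): '0', ('1', 'a'): 'a', ('1', 'b'): 'b', ('1', '1'): '1',
        }
        JOIN = {
            ('0', '0'): '0', ('0', 'a'): 'a', ('0', 'b'): 'b', ('0', '1'): '1',
            ('a', '0'): 'a', ('a', 'a'): 'a', ('a', 'b'): '1', ('a', '1'): '1',
            ('b', '0'): 'b', ('b', 'a'): '1', ('b', 'b'): 'b', ('b', '1'): '1',
            ('1', '0'): '1', ('1', 'a'): '1', ('1', 'b'): '1', ('1', '1'): '1',
        }

        def compose(R, S, xs, ys, zs):
            """L-fuzzy relational composition: (R;S)(x,z) = join over y of meet(R(x,y), S(y,z))."""
            out = {}
            for x in xs:
                for z in zs:
                    acc = '0'  # least element of the lattice
                    for y in ys:
                        acc = JOIN[(acc, MEET[(R[(x, y)], S[(y, z)])])]
                    out[(x, z)] = acc
            return out

        # A tiny example: people x skills and skills x projects, graded in the lattice.
        people, skills, projects = ['p1'], ['s1', 's2'], ['j1']
        R = {('p1', 's1'): 'a', ('p1', 's2'): 'b'}
        S = {('s1', 'j1'): '1', ('s2', 'j1'): 'b'}
        print(compose(R, S, people, skills, projects))   # {('p1', 'j1'): '1'}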
    • Approximation Algorithms using Allegories and Coq

      Chowdhury, Durjay; Department of Computer Science
      In this thesis, we implement several approximation algorithms for solving optimization problems on graphs. The result computed by such an algorithm may or may not be optimal; the approximation factor of an algorithm indicates how close the computed result is to an optimal solution. We verify two properties of each algorithm in this thesis. First, we show that the algorithm computes a solution to the problem, and, second, we show that the approximation factor is satisfied. To implement these algorithms, we use the algebraic theory of relations, i.e., the theory of allegories and various extensions thereof. An implementation of various kinds of lattices and the theory of categories is required for the declaration of allegories. The programming language and interactive theorem prover Coq is used for the implementation, a language based on Higher-Order Logic (HOL) with dependent types that supports both reasoning and program execution. In addition to the abstract theory, we provide a model of set-theoretic relations between finite sets; this model is executable and used in our examples. Finally, we provide an example for each of the approximation algorithms.
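      The abstract does not name the specific algorithms treated in the thesis, so as a purely illustrative example of the kind of algorithm and approximation-factor argument involved, here is the classic factor-2 approximation for minimum vertex cover in Python (our sketch, unrelated to the Coq development):

        def vertex_cover_2approx(edges):
            """Greedy 2-approximation for minimum vertex cover: repeatedly take
            both endpoints of an uncovered edge. The chosen edges form a matching,
            so any optimal cover must contain at least one endpoint of each,
            which yields the approximation factor of 2."""
            cover = set()
            for u, v in edges:
                if u not in cover and v not in cover:
                    cover.add(u)
                    cover.add(v)
            return cover

        edges = [(1, 2), (2, 3), (3, 4), (4, 1), (2, 4)]
        print(vertex_cover_2approx(edges))   # e.g. {1, 2, 3, 4}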
    • Automatic evolution of conceptual building architectures

      Coia, Corrado; Department of Computer Science (2012-04-03)
      This thesis describes research in which genetic programming is used to automatically evolve shape grammars that construct three dimensional models of possible external building architectures. A completely automated fitness function is used, which evaluates the three dimensional building models according to different geometric properties such as surface normals, height, building footprint, and more. In order to evaluate the buildings on the different criteria, a multi-objective fitness function is used. The results obtained from the automated system were successful in satisfying the multiple objective criteria as well as creating interesting and unique designs that a human-aided system might not discover. In this study of evolutionary design, the architectures created are not meant to be fully functional and structurally sound blueprints for constructing a building, but are meant to be inspirational ideas for possible architectural designs. The evolved models are applicable for today's architectural industries as well as in the video game and movie industries. Many new avenues for future work have also been discovered and highlighted.
    • Automatic Inference of Graph Models for Complex Networks with Genetic Programming

      Bailey, Alexander; Department of Computer Science (Brock University, 2013-07-26)
      Complex networks can arise naturally and spontaneously from all things that act as a part of a larger system. From the patterns of socialization between people to the way biological systems organize themselves, complex networks are ubiquitous, but are currently poorly understood. A number of algorithms, designed by humans, have been proposed to describe the organizational behaviour of real-world networks, and these have recently led to breakthroughs in genetics, medicine, epidemiology, neuroscience, telecommunications and the social sciences. The algorithms, called graph models, represent significant human effort. Deriving accurate graph models is non-trivial, time-intensive, challenging and may only yield useful results for very specific phenomena. An automated approach can greatly reduce the human effort required and, if effective, provide a valuable tool for understanding the large decentralized systems of interrelated things around us. To the best of the author's knowledge, this thesis proposes the first method for the automatic inference of graph models for complex networks with varied properties, with and without community structure. Furthermore, to the best of the author's knowledge, it is the first application of genetic programming to the automatic inference of graph models. The system and methodology were tested against benchmark data and shown to be capable of reproducing close approximations to well-known algorithms designed by humans. Furthermore, when used to infer a model for real biological data, the resulting model was more representative than models currently used in the literature.
    • Automatic Structure Generation using Genetic Programming and Fractal Geometry

      Bergen, Steve; Department of Computer Science (Brock University, 2012-02-09)
      Three dimensional model design is a well-known and studied field, with numerous real-world applications. However, the manual construction of these models can often be time-consuming to the average user, despite the advantages offered through computational advances. This thesis presents an approach to the design of 3D structures using evolutionary computation and L-systems, which involves the automated production of such designs using a strict set of fitness functions. These functions focus on the geometric properties of the models produced, as well as their quantifiable aesthetic value - a topic which has not been widely investigated with respect to 3D models. New extensions to existing aesthetic measures are discussed and implemented in the presented system in order to produce designs which are visually pleasing. The system itself facilitates the construction of models requiring minimal user initialization and no user-based feedback throughout the evolutionary cycle. The genetic programming evolved models are shown to satisfy multiple criteria, conveying a relationship between their assigned aesthetic value and their perceived aesthetic value. Exploration into the applicability and effectiveness of a multi-objective approach to the problem is also presented, with a focus on both performance and visual results. Although subjective, these results offer insight into future applications and study in the field of computational aesthetics and automated structure design.
    • Bio-inspired optimization & sampling technique for side-chain packing in MCCE

      Comte, Pascal; Department of Computer Science (Brock University, 2010-10-26)
      The prediction of proteins' conformation helps to understand their exhibited functions, allows for modeling and allows for the possible synthesis of the studied protein. Our research is focused on a sub-problem of protein folding known as side-chain packing, whose computational complexity has been proven to be NP-Hard. The motivation behind our study is to offer the scientific community a means to obtain faster conformation approximations for small to large proteins than currently available methods. As the size of proteins increases, current techniques become unusable due to the exponential nature of the problem. We investigated the capabilities of a hybrid genetic algorithm / simulated annealing technique to predict the low-energy conformational states of various sized proteins and to generate statistical distributions of the studied proteins' molecular ensemble for pKa predictions. Our algorithm produced errors within acceptable margins of experimental results and offered considerable speed-up, depending on the protein and on the rotameric states' resolution used.
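      For readers unfamiliar with the simulated annealing half of such a hybrid, the Metropolis acceptance rule at its core looks roughly as follows (a generic sketch, not the thesis's implementation):

        import math
        import random

        def accept(delta_energy, temperature):
            """Metropolis criterion: always accept a rotamer move that lowers the
            energy, and accept a worsening move with probability exp(-dE/T) so the
            search can escape local minima while the temperature is still high."""
            if delta_energy <= 0:
                return True
            return random.random() < math.exp(-delta_energy / temperature)

        # At a high temperature a +2.0 energy move is accepted fairly often;
        # at a low temperature it is almost never accepted.
        print(sum(accept(2.0, 5.0) for _ in range(1000)))   # roughly 670
        print(sum(accept(2.0, 0.1) for _ in range(1000)))   # roughly 0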
    • Bounds on edit metric codes with combinatorial DNA constraints

      Sun, Jing.; Department of Computer Science (Brock University, 2010-03-09)
      The design of a large and reliable DNA codeword library is a key problem in DNA based computing. DNA codes, namely sets of fixed length edit metric codewords over the alphabet {A, C, G, T}, satisfy certain combinatorial constraints with respect to biological and chemical restrictions of DNA strands. The primary constraints that we consider are the reverse-complement constraint and the fixed GC-content constraint, as well as the basic edit distance constraint between codewords. We focus on exploring the theory underlying DNA codes and discuss several approaches to searching for optimal DNA codes. We use Conway's lexicode algorithm and an exhaustive search algorithm to produce provably optimal DNA codes for small parameter values, and we propose a genetic algorithm to search for sub-optimal DNA codes with relatively large parameter values, whose sizes can be taken as reasonable lower bounds on the sizes of DNA codes. Furthermore, we provide tables of bounds on the sizes of DNA codes with lengths from 1 to 9 and minimum distances from 1 to 9.
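      To make the constraints concrete, the sketch below (our illustration in Python, not the thesis code) checks the edit distance, reverse-complement and fixed GC-content constraints and greedily builds a small DNA code in the spirit of a lexicode construction:

        from itertools import product

        def edit_distance(a, b):
            """Levenshtein (insertion/deletion/substitution) distance."""
            dp = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                prev, dp[0] = dp[0], i
                for j, cb in enumerate(b, 1):
                    prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                             prev + (ca != cb))
            return dp[-1]

        def reverse_complement(w):
            comp = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
            return ''.join(comp[c] for c in reversed(w))

        def gc_content(w):
            return sum(c in 'GC' for c in w)

        def greedy_lexicode(n, d):
            """Scan length-n words in lexicographic order, keeping a word if it has
            GC-content n // 2 and is at edit distance >= d from its own reverse
            complement, from every kept word and from every kept word's reverse
            complement."""
            code = []
            for w in map(''.join, product('ACGT', repeat=n)):
                if gc_content(w) != n // 2:
                    continue
                if edit_distance(w, reverse_complement(w)) < d:
                    continue
                if all(edit_distance(w, c) >= d and
                       edit_distance(w, reverse_complement(c)) >= d for c in code):
                    code.append(w)
            return code

        print(greedy_lexicode(4, 3))   # a small (not necessarily optimal) DNA code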
    • A Centrality Based Multi-Objective Disease-Gene Association Approach Using Genetic Algorithms

      Kennedy Collins, Tyler; Department of Computer Science
      The Disease Gene Association Problem (DGAP) is a bioinformatics problem in which genes are ranked with respect to how involved they are in the presentation of a particular disease. Previous approaches have shown the strength of both Monte Carlo and evolutionary computation (EC) based techniques. Typically these past approaches improve ranking measures, develop new gene relation definitions, or implement more complex EC systems. This thesis presents a hybrid approach which implements a multi-objective genetic algorithm, where the input consists of centrality measures based on various relational biological evidence types merged into a complex network. In an effort to explore the effectiveness of the technique compared to past work, multiple objective settings and different EC parameters are studied, including the development of a new exchange methodology, safe dealer-based (SDB) crossover. Successful results are shown for breast cancer and Parkinson's disease compared to previous EC techniques and popular known databases. In addition, the newly developed methodology is successfully applied to Alzheimer’s, further demonstrating the flexibility of the technique. Across all three case studies, the strongest results were produced by the shortest-path-based measures stress and betweenness in a single-objective parameter setting. When used in conjunction in a multi-objective environment, competitive results were also obtained, but these fell short of the single-objective settings studied as part of this work. Lastly, while SDB crossover fell short of expectations on breast cancer and Parkinson's, it achieved the best results when applied to Alzheimer’s, illustrating the potential of the technique for future study.
    • Characterizing Dynamic Optimization Benchmarks for the Comparison of Multi-Modal Tracking Algorithms

      Bond, Ron; Department of Computer Science
      Population-based metaheuristics, such as particle swarm optimization (PSO), have been employed to solve many real-world optimization problems. Although it is often sufficient to find a single solution to these problems, there are cases where identifying multiple, diverse solutions can be beneficial or even required. Some of these problems are further complicated by a change in their objective function over time. This type of optimization is referred to as dynamic, multi-modal optimization. Algorithms which exploit multiple optima in a search space are identified as niching algorithms. Although numerous dynamic niching algorithms have been developed, their performance is often measured solely on their ability to find a single, global optimum. Furthermore, the comparisons often use synthetic benchmarks whose landscape characteristics are generally limited and unknown. This thesis provides a landscape analysis of the dynamic benchmark functions commonly developed for multi-modal optimization. The benchmark analysis results reveal that the mechanisms responsible for dynamism in the current dynamic benchmarks do not significantly affect landscape features, thus suggesting a lack of representation for problems whose landscape features vary over time. This analysis is used in a comparison of current niching algorithms to identify the effects that specific landscape features have on niching performance. Two performance metrics are proposed to measure both the scalability and accuracy of the niching algorithms. The algorithm comparison results demonstrate which algorithms are best suited for a variety of dynamic environments. This comparison also examines each of the algorithms in terms of their niching behaviours and analyzes the range of, and trade-off between, scalability and accuracy when tuning the algorithms' respective parameters. These results contribute to the understanding of current niching techniques as well as the problem features that ultimately dictate their success.
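      For context, the canonical particle update underlying the PSO variants compared in this thesis has the following form; niching algorithms differ mainly in how the neighbourhood best is selected (a generic sketch, not any specific algorithm from the comparison):

        import random

        def pso_step(position, velocity, personal_best, neighbourhood_best,
                     w=0.729, c1=1.49, c2=1.49):
            """One canonical PSO velocity/position update per dimension. Niching
            variants typically replace the single global best with a per-species
            or per-subswarm neighbourhood best."""
            new_velocity = [w * v
                            + c1 * random.random() * (pb - x)
                            + c2 * random.random() * (nb - x)
                            for x, v, pb, nb in zip(position, velocity,
                                                    personal_best, neighbourhood_best)]
            new_position = [x + v for x, v in zip(position, new_velocity)]
            return new_position, new_velocity

        pos, vel = [0.5, -1.0], [0.0, 0.0]
        print(pso_step(pos, vel, personal_best=[0.2, -0.8], neighbourhood_best=[0.0, 0.0]))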
    • Comparison of classification ability of hyperball algorithms to neural network and k-nearest neighbour algorithms

      Zibamanzar-Mofrad, Tanaby; Department of Computer Science (2012-04-03)
      The main focus of this thesis is to evaluate and compare the Hyperball learning algorithm (HBL) to other learning algorithms. In this work HBL is compared to feed-forward artificial neural networks using back-propagation learning, K-nearest neighbour, and ID3 algorithms. In order to evaluate the similarity of these algorithms, we carried out three experiments using nine benchmark data sets from the UCI machine learning repository. The first experiment compares HBL to the other algorithms when the sample size of the dataset changes. The second experiment compares HBL to the other algorithms when the dimensionality of the data changes. The last experiment compares HBL to the other algorithms with respect to their level of agreement with the data target values. Our observations showed that, considering classification accuracy as a measure, HBL performs as well as most ANN variants. Additionally, we deduced that HBL's classification accuracy outperforms ID3's and K-nearest neighbour's for the selected data sets.
    • Complete computational sequence characterization of mobile element variations in the human genome using meta-personal genome data

      Girilishena, Yaroslava; Department of Computer Science
      While a large number of methods have been developed to detect such types of genome sequence variations as single nucleotide polymorphisms (SNPs) and small indels, comparatively fewer methods have been developed for finding structural variants (SVs) and, in particular, mobile element insertions (MEIs). Moreover, almost all of these methods can detect only the breakpoints of an SV, sometimes only approximately, and do not provide complete sequences representing the SVs. The main objective of our research is to develop a set of computer algorithms that provide complete genome sequence characterization for insertional structural variants in the human genome via local de novo sequence assembly or progressive assembly using discordant and concordant read pairs and split-reads. An essential component of our approach is utilizing all personal genome data available in the public domain, as opposed to the standard approach of using a single set of personal genome sequences. The developed tool is the first system that provides full sequence characterization of SVs. Overall, the characterization success rate for Alu is 75.03%, with a mean of discordant and split-reads higher than 94 reads. For SVA, it is 71.43% with a threshold of 363 reads, and for L1 the values are 77.78% and 355, respectively. The results showed that SV characterization depends on the allele frequency and is influenced by the repetitiveness of flanking regions; addressing these problems is therefore a key to further improvements.
    • Construction of 1-Deletion-Correcting Ternary Codes

      Li, Zhiyuan; Department of Computer Science (2013-04-08)
      Finding large deletion correcting codes is an important issue in coding theory, and many researchers have studied this topic over the years. Varshamov and Tenengolts constructed the Varshamov-Tenengolts codes (VT codes), and Levenshtein showed that the Varshamov-Tenengolts codes are perfect binary one-deletion correcting codes in 1992. Tenengolts constructed T codes to handle the non-binary cases. However, the T codes are neither optimal nor perfect, which leaves room for improvement. Later, Bours showed that perfect deletion-correcting codes have a close relationship with design theory; by this approach, Wang and Yin constructed perfect 5-deletion correcting codes of length 7 for large alphabet sizes. In our research, we focus on how to extend or combinatorially construct large codes with longer lengths, few deletions and a small but non-binary alphabet, especially a ternary one. After a brief study, we discovered some properties of T codes and produced some large codes by three different ways of extending existing good codes.
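      For readers unfamiliar with the codes named above, the binary Varshamov-Tenengolts code VT_a(n) consists of all length-n binary words whose weighted sum of positions is congruent to a modulo n+1, and it corrects one deletion exactly when the deletion balls of distinct codewords are disjoint. The Python sketch below (ours, for illustration) enumerates VT_0(6) and verifies that property by brute force:

        from itertools import product

        def vt_code(n, a=0):
            """Binary VT code: all x in {0,1}^n with sum(i * x_i, i = 1..n) = a mod n+1."""
            return [x for x in product((0, 1), repeat=n)
                    if sum(i * xi for i, xi in enumerate(x, 1)) % (n + 1) == a]

        def deletion_ball(x):
            """All words obtained from x by deleting exactly one symbol."""
            return {x[:i] + x[i + 1:] for i in range(len(x))}

        def corrects_one_deletion(code):
            """A code corrects one deletion iff no two distinct codewords can
            produce the same shortened word."""
            seen = {}
            for c in code:
                for y in deletion_ball(c):
                    if y in seen and seen[y] != c:
                        return False
                    seen[y] = c
            return True

        code = vt_code(6)
        print(len(code), corrects_one_deletion(code))   # 10 True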
    • Data mining using L-fuzzy concept analysis.

      Saha, Sajal; Department of Computer Science
      Association rules in data mining are implications between attributes of objects that hold in all instances of the given data. These rules are very useful for determining the properties of the data, such as the essential features of products that determine the purchase decisions of customers. Normally the data is given as binary (or crisp) tables relating objects with their attributes by yes-no entries. We propose a relational theory for generating attribute implications from many-valued contexts, i.e., contexts where the relationship between objects and attributes is given by a range of degrees from no to yes. This degree is usually taken from a suitable lattice where the smallest element corresponds to the classical no and the greatest element corresponds to the classical yes. Previous related work handled many-valued contexts by transforming them into crisp (yes-no) contexts, either by scaling or by choosing a minimal degree of membership, and then applying the standard methods of formal concept analysis to the resulting crisp context. In our proposal, we handle a many-valued context as is, i.e., without transforming it into a crisp one. The advantage of this approach is that we work with the original data, without performing a transformation step that modifies the data in advance.
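      To make the contrast concrete, the crisp (yes-no) derivation operators of standard formal concept analysis can be written as below; in the L-fuzzy setting proposed here, the yes-no membership test would instead return a degree from the lattice. The sketch and its toy context are our own illustration, not the relational theory developed in the thesis:

        def intent(objects, context):
            """Attributes shared by all of the given objects (crisp derivation operator)."""
            all_attributes = set().union(*context.values())
            return {m for m in all_attributes if all(m in context[g] for g in objects)}

        def extent(attributes, context):
            """Objects that possess all of the given attributes."""
            return {g for g, attrs in context.items() if attributes <= attrs}

        # A tiny crisp context: each object is mapped to the set of attributes it has.
        context = {
            'milk':  {'liquid', 'food'},
            'bread': {'solid', 'food'},
            'water': {'liquid'},
        }

        print(extent({'liquid', 'food'}, context))   # {'milk'}
        print(intent({'milk', 'bread'}, context))    # {'food'}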
    • Decoding algorithms using side-effect machines

      Brown, Joseph Alexander; Department of Computer Science (Brock University, 2010-03-09)
      Bioinformatics applies computers to problems in molecular biology. Previous research has not addressed edit metric decoders. Decoders for quaternary edit metric codes are finding use in bioinformatics problems with applications to DNA. By using side effect machines we hope to be able to provide efficient decoding algorithms for this open problem. Two ideas for decoding algorithms are presented and examined. Both decoders use Side Effect Machines (SEMs) which are generalizations of finite state automata. Single Classifier Machines (SCMs) use a single side effect machine to classify all words within a code. Locking Side Effect Machines (LSEMs) use multiple side effect machines to create a tree structure of subclassification. The goal is to examine these techniques and provide new decoders for existing codes. Presented are ideas for best practices for the creation of these two types of new edit metric decoders.
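      As a rough illustration of the side effect machine idea, the sketch below runs a word through a finite state machine and records how often each state is visited; it is this vector of visit counts that a classifier can use to decide which codeword a received word belongs to. The three-state transition table is an arbitrary example of ours, not an evolved machine from the thesis:

        def run_sem(word, transitions, n_states, start=0):
            """Run a side-effect-machine-style automaton over a DNA word and return
            the vector of state visit counts used as classification features."""
            counts = [0] * n_states
            state = start
            for symbol in word:
                state = transitions[(state, symbol)]
                counts[state] += 1
            return counts

        # An arbitrary 3-state machine over the DNA alphabet (illustration only).
        transitions = {
            (0, 'A'): 0, (0, 'C'): 1, (0, 'G'): 2, (0, 'T'): 1,
            (1, 'A'): 2, (1, 'C'): 1, (1, 'G'): 0, (1, 'T'): 2,
            (2, 'A'): 1, (2, 'C'): 0, (2, 'G'): 2, (2, 'T'): 0,
        }
        print(run_sem('ACGTTG', transitions, n_states=3))   # [2, 2, 2]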
    • Deep Learning Concepts for Evolutionary Art

      Tanjil, Fazle; Department of Computer Science
      A deep convolutional neural network (CNN) trained on millions of images forms a very high-level abstract overview of any given target image. Our primary goal is to use this high-level content information of a given target image to guide the automatic evolution of images. We use genetic programming (GP) to evolve procedural textures and incorporate a pre-trained deep CNN model into the fitness evaluation. We are not performing any training; rather, we pass a target image through the pre-trained deep CNN and use its high-level representation as the fitness guide for evolved images. We develop a preprocessing strategy called Mean Minimum Matrix Strategy (MMMS), which reduces the dimensions and identifies the most relevant high-level activation maps. The technique using reduced activation matrices as a fitness measure shows promising results: GP is able to guide the evolution of textures such that they share characteristics with the target image. We also experiment with the fully connected “classifier” layers of the deep CNN, and the evolved images are able to achieve high confidence scores from the deep CNN module for some tested target images. Finally, we implement our own shallow convolutional neural network with a fixed set of filters. Experiments show that this basic CNN had limited effectiveness, likely due to the lack of training. In conclusion, the research shows the potential for using deep learning concepts in evolutionary art. As deep CNN models become better understood, they can be used more effectively for evolutionary art.
    • A Deep Learning Pipeline for Classifying Different Stages of Alzheimer's Disease from fMRI Data.

      Kazemi, Yosra; Department of Computer Science
      Alzheimer’s disease (AD) is an irreversible, progressive neurological disorder that causes loss of memory and thinking skills. Many different methods and algorithms have been applied to extract patterns from neuroimaging data in order to distinguish different stages of AD. However, the similarity of the brain patterns in older adults and across different stages makes the classification of different stages a challenge for researchers. In this thesis, the convolutional neural network architecture AlexNet was applied to fMRI datasets to classify different stages of the disease. We classified five different stages of Alzheimer’s using a deep learning algorithm: the method successfully classified normal healthy control (NC), significant memory concern (SMC), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI), and Alzheimer’s disease (AD). The model was implemented using GPU high-performance computing. Before applying any classification, the fMRI data were rigorously preprocessed to reduce noise. Then, low- to high-level features were extracted and learned using the AlexNet model. Our experiments show significant improvement in classification; the average accuracy of the model was 97.63%. We then tested our model on test datasets to evaluate the accuracy per class, obtaining an accuracy of 94.97% for AD, 95.64% for EMCI, 95.89% for LMCI, 98.34% for NC, and 94.55% for SMC.
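      As a rough sketch of how a pre-trained AlexNet can be repurposed for a five-class problem, the snippet below (using torchvision; it reproduces neither the thesis's preprocessing pipeline nor its training setup) swaps the final ImageNet classifier layer for a 5-way output:

        import torch
        import torch.nn as nn
        from torchvision import models

        # Load AlexNet with pre-trained ImageNet weights and replace the last
        # classifier layer so it predicts 5 classes (NC, SMC, EMCI, LMCI, AD).
        model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
        model.classifier[6] = nn.Linear(4096, 5)

        # fMRI volumes or slices would have to be converted to the 3-channel
        # 224x224 inputs AlexNet expects; a random batch is used here only to
        # show the shapes involved.
        dummy_batch = torch.randn(8, 3, 224, 224)
        logits = model(dummy_batch)
        print(logits.shape)   # torch.Size([8, 5])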
    • Disease-Gene Association Using a Genetic Algorithm

      Tahmasebipour, Koosha; Department of Computer Science (Brock University, 2014-10-09)
      Understanding the relationship between genetic diseases and the genes associated with them is an important problem regarding human health. The vast amount of data created from a large number of high-throughput experiments performed in the last few years has resulted in an unprecedented growth in computational methods to tackle the disease gene association problem. Nowadays, it is clear that a genetic disease is not a consequence of a defect in a single gene. Instead, the disease phenotype is a reflection of various genetic components interacting in a complex network. In fact, genetic diseases, like any other phenotype, occur as a result of various genes working in sync with each other in a single or several biological module(s). Using a genetic algorithm, our method tries to evolve communities containing the set of potential disease genes likely to be involved in a given genetic disease. Having a set of known disease genes, we first obtain a protein-protein interaction (PPI) network containing all the known disease genes. All the other genes inside the procured PPI network are then considered as candidate disease genes as they lie in the vicinity of the known disease genes in the network. Our method attempts to find communities of potential disease genes strongly working with one another and with the set of known disease genes. As a proof of concept, we tested our approach on 16 breast cancer genes and 15 Parkinson's Disease genes. We obtained comparable or better results than CIPHER, ENDEAVOUR and GPEC, three of the most reliable and frequently used disease-gene ranking frameworks.
    • Disease-Gene Association Using Genetic Programming

      Entezari Heravi, Ashkan; Department of Computer Science
      Mutations in genes, which are simple changes in our DNA, can result in undesirable phenotypes known as genetic diseases or disorders. These small changes, which happen frequently, can have extreme consequences. Understanding and identifying these changes, and associating the mutated genes with genetic diseases, can play an important role in our health by enabling better diagnostic and therapeutic strategies for these genetic diseases. As a result of years of experiments, there is a vast amount of data regarding the human genome and different genetic diseases that still needs to be processed properly to extract useful information. This work is an effort to analyze some useful datasets and to apply different techniques to associate genes with genetic diseases. Two genetic diseases were studied here: Parkinson’s disease and breast cancer. Using genetic programming, we analyzed the complex network around the known disease genes of the aforementioned diseases and generated a ranking of genes according to their relevance to these diseases. In order to generate these rankings, centrality measures of all nodes in the complex network surrounding the known disease genes of the given genetic disease were calculated. Using genetic programming, all the nodes were assigned scores based on the similarity of their centrality measures to those of the known disease genes. The obtained results showed that this method is successful at finding these patterns in centrality measures, and the highly ranked genes are good candidate disease genes worthy of further study. Using standard benchmark tests, we tested our approach against ENDEAVOUR and CIPHER, two well-known disease gene ranking frameworks, and obtained comparable results.
    • Effect of the Side Effect Machines in Edit Metric Decoding

      Banik, Sharnendu; Department of Computer Science
      The development of general edit metric decoders is a challenging problem, especially with the inclusion of additional biological restrictions that can occur in DNA error correcting codes. Side effect machines (SEMs), an extension of finite state machines, can provide efficient decoding algorithms for such edit metric codes. However, finding a good machine poses its own set of challenges and is itself considered an open problem with no general solution. Previous studies utilizing evolutionary computation techniques, such as genetic algorithms and evolutionary programming, to search for good SEMs have found success in terms of decoding accuracy. However, they all worked with extremely constricted problem spaces, i.e., a single code or codes of the same length; a general approach that works well across codes of different lengths is yet to be formalized. In this research, several codes of varying lengths are used to study the effectiveness of evolutionary programming (EP) as a general approach for finding efficient edit metric decoders. Two classification methods, direct and fuzzy, are compared while also changing some of the EP settings to observe how the decoding accuracy is affected. The final SEMs are verified against an additional dataset to test their general effectiveness. Regardless of the code length, the best results are found using the fuzzy classification methods. For codes of length 10, a maximum accuracy of up to 99.4% is achieved for distance 1, whereas distances 2 and 3 achieve up to 97.1% and 85.9%, respectively. Unsurprisingly, the accuracy suffers for longer codes, as the maximum accuracies achieved by codes of length 14 were 92.4%, 85.7% and 69.2% for distances 1, 2, and 3, respectively. Additionally, the machines are examined for potential bloat by comparing the number of visited states against the number of total states. The study found some machines with at least one unvisited state, and the bloat is seen more in larger machines than in smaller ones. Furthermore, the results are analyzed to find potential trends and relationships among the parameters. The most consistently observed trend is that, when allowed, the longer codes generally show a propensity for larger machines.
    • Elliptic Curve Cryptography using Computational Intelligence

      Ribaric, Tim; Department of Computer Science
      Public-key cryptography is a fundamental component of modern electronic communication that can be constructed with many different mathematical processes. Presently, cryptosystems based on elliptic curves are becoming popular due to their strong cryptographic strength relative to small key sizes. At the heart of these schemes is the complexity of the elliptic curve discrete logarithm problem (ECDLP). Pollard’s Rho algorithm is a well-known method for solving the ECDLP and thereby breaking ciphers based on elliptic curves for reasonably small key sizes (up to approximately 100 bits in length). It has the same time complexity as other known methods but is advantageous due to smaller memory requirements. This study considers how to speed up the Rho process by modifying a key component: the iterating function, which is the part of the algorithm responsible for determining what point is considered next when looking for the solution to the ECDLP. It is replaced with an alternative that is found through an evolutionary process. This alternative consistently and significantly decreases the number of iterations required by Pollard’s Rho algorithm to successfully find the sought-after solution.
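      For readers unfamiliar with the algorithm, the sketch below shows Pollard's Rho with its standard three-way-partition iterating function, solving a toy discrete logarithm in a prime-order subgroup of Z_107^* rather than on an elliptic curve (our simplification; the thesis works with elliptic curve groups and evolves a replacement for the step function below):

        import random

        p, n = 107, 53          # modulus and (prime) order of the subgroup generated by g
        g, h = 4, 75            # solve g^x = h (mod p); here x happens to be 29

        def step(X, a, b):
            """Standard iterating function: partition the group into three sets by
            X mod 3 and either square, multiply by g, or multiply by h, updating
            the exponents so that X = g^a * h^b always holds."""
            if X % 3 == 0:
                return (X * X) % p, (2 * a) % n, (2 * b) % n
            if X % 3 == 1:
                return (g * X) % p, (a + 1) % n, b
            return (h * X) % p, a, (b + 1) % n

        def pollard_rho():
            while True:
                a0, b0 = random.randrange(n), random.randrange(n)
                tort = hare = (pow(g, a0, p) * pow(h, b0, p) % p, a0, b0)
                while True:                      # Floyd cycle detection
                    tort = step(*tort)
                    hare = step(*step(*hare))
                    if tort[0] == hare[0]:
                        break
                db = (hare[2] - tort[2]) % n
                if db:                           # otherwise retry from a new start
                    return (tort[1] - hare[1]) * pow(db, -1, n) % n

        print(pollard_rho())   # 29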