Browsing M.Sc. Computer Science by Title
Now showing items 4968 of 88

Lossy Compression of Quality Values in NextGeneration Sequencing DataIn this work we address the compression of SAM files which is the standard output file for DNA assembly. We specifically study lossy compression techniques used for quality values reported in the SAM file and we analyse the impact of such lossy techniques in the CRAM format. We also study the impact of these lossy techniques in the SNP calling process. Our results show that lossy techniques allow a better compression ratio than the one obtained with the original quality values. We also show that SNP calling performance is not negatively affected. Moreover we confirmed that some of the lossy techniques can even boost the SNP calling performance.

Modal and Relevance Logics for Qualitative Spatial ReasoningQualitative Spatial Reasoning (QSR) is an alternative technique to represent spatial relations without using numbers. Regions and their relationships are used as qualitative terms. Mostly peer qualitative spatial reasonings has two aspect: (a) the first aspect is based on inclusion and it focuses on the ”partof” relationship. This aspect is mathematically covered by mereology. (b) the second aspect focuses on topological nature, i.e., whether they are in ”contact” without having a common part. Mereotopology is a mathematical theory that covers these two aspects. The theoretical aspect of this thesis is to use classical propositional logic with nonclassical relevance logic to obtain a logic capable of reasoning about Boolean algebras i.e., the mereological aspect of QSR. Then, we extended the logic further by adding modal logic operators in order to reason about topological contact i.e., the topological aspect of QSR. Thus, we name this logic Modal Relevance Logic (MRL). We have provided a natural deduction system for this logic by defining inference rules for the operators and constants used in our (MRL) logic and shown that our system is correct. Furthermore, we have used the functional programming language and interactive theorem prover Coq to implement the definitions and natural deduction rules in order to provide an interactive system for reasoning in the logic.

Modeling Metal Protein Complexes from Experimental Extended Xray Absorption Fine Structure using Computational IntelligenceExperimental Extended Xray Absorption Fine Structure (EXAFS) spectra carry information about the chemical structure of metal protein complexes. However, pre dicting the structure of such complexes from EXAFS spectra is not a simple task. Currently methods such as Monte Carlo optimization or simulated annealing are used in structure refinement of EXAFS. These methods have proven somewhat successful in structure refinement but have not been successful in finding the global minima. Multiple population based algorithms, including a genetic algorithm, a restarting ge netic algorithm, differential evolution, and particle swarm optimization, are studied for their effectiveness in structure refinement of EXAFS. The oxygenevolving com plex in S1 is used as a benchmark for comparing the algorithms. These algorithms were successful in finding new atomic structures that produced improved calculated EXAFS spectra over atomic structures previously found.

Modelling and Proving Cryptographic Protocols in the Spi Calculus using CoqThe spi calculus is a process algebra used to model cryptographic protocols. A process calculus is a means of modelling a system of concurrently interacting agents, and provides tools for the description of communications and synchronizations between those agents. The spi calculus is based on Robin Milner's pi calculus, which was itself based upon his Calculus of Communicating Systems (CCS). It was created by Martin Abadi and Andrew D. Gordon as an expansion of the pi calculus intended to focus on cryptographic protocols, and adds features such as the encryption and decryption of messages using keys. The Coq proof system is an interactive theorem prover that allows for the definition of types and functions, and provides means by which to prove properties about them. The spi calculus has been implemented in Coq and subsequently used to model and show an example proof of a property about a simple cryptographic protocol. This required the implementation of both the syntax and the semantics of the calculus, as well as the rules and axioms used to manipulate the syntax in proofs. We discuss the spi calculus in detail as defined by Abadi and Gordon, then the various challenges faced during the implementation of the calculus and our rationale for the decisions made in the process.

A MultiObjective Genetic Algorithm with Side Effect Machines for Motif DiscoveryUnderstanding the machinery of gene regulation to control gene expression has been one of the main focuses of bioinformaticians for years. We use a multiobjective genetic algorithm to evolve a specialized version of side effect machines for degenerate motif discovery. We compare some suggested objectives for the motifs they find, test different multiobjective scoring schemes and probabilistic models for the background sequence models and report our results on a synthetic dataset and some biological benchmarking suites. We conclude with a comparison of our algorithm with some widely used motif discovery algorithms in the literature and suggest future directions for research in this area.

Multiobjective Genetic Algorithms for Multidepot VRP with Time WindowsEfficient routing and scheduling has significant economic implications for many realworld situations arising in transportation logistics, scheduling, and distribution systems, among others. This work considers both the single depot vehicle routing problem with time windows (VRPTW) and the multidepot vehicle routing problem with time windows (MDVRPTW). An agelayered population structure genetic algorithm is proposed for both variants of the vehicle routing problem. To the best of the author’s knowledge, this is first work to provide a multiobjective genetic algorithm approach for the MDVRPTW using wellknown benchmark data with up to 288 customers.

MultiObjective Genetic Algorithms for the Single Allocation Hub Location ProblemHub Location Problems play vital economic roles in transportation and telecommunication networks where goods or people must be efficiently transferred from an origin to a destination point whilst direct origindestination links are impractical. This work investigates the single allocation hub location problem, and proposes a genetic algorithm (GA) approach for it. The effectiveness of using a singleobjective criterion measure for the problem is ﬁrst explored. Next, a multiobjective GA employing various ﬁtness evaluation strategies such as Pareto ranking, sum of ranks, and weighted sum strategies is presented. The effectiveness of the multiobjective GA is shown by comparison with an Integer Programming strategy, the only other multiobjective approach found in the literature for this problem. Lastly, two new crossover operators are proposed and an empirical study is done using small to large problem instances of the Civil Aeronautics Board (CAB) and Australian Post (AP) data sets.

Network Similarity Measures and Automatic Construction of Graph Models using Genetic ProgrammingA complex network is an abstract representation of an intricate system of interrelated elements where the patterns of connection hold significant meaning. One particular complex network is a social network whereby the vertices represent people and edges denote their daily interactions. Understanding social network dynamics can be vital to the mitigation of disease spread as these networks model the interactions, and thus avenues of spread, between individuals. To better understand complex networks, algorithms which generate graphs exhibiting observed properties of realworld networks, known as graph models, are often constructed. While various efforts to aid with the construction of graph models have been proposed using statistical and probabilistic methods, genetic programming (GP) has only recently been considered. However, determining that a graph model of a complex network accurately describes the target network(s) is not a trivial task as the graph models are often stochastic in nature and the notion of similarity is dependent upon the expected behavior of the network. This thesis examines a number of wellknown network properties to determine which measures best allowed networks generated by different graph models, and thus the models themselves, to be distinguished. A proposed metaanalysis procedure was used to demonstrate how these network measures interact when used together as classifiers to determine network, and thus model, (dis)similarity. The analytical results form the basis of the fitness evaluation for a GP system used to automatically construct graph models for complex networks. The GPbased automatic inference system was used to reproduce existing, wellknown graph models as well as a realworld network. Results indicated that the automatically inferred models exemplified functional similarity when compared to their respective target networks. This approach also showed promise when used to infer a model for a mammalian brain network.

New Contig Creation Algorithm for the de novo DNA Assembly ProblemDNA assembly is among the most fundamental and difficult problems in bioinformatics. Near optimal assembly solutions are available for bacterial and small genomes, however assembling large and complex genomes especially the human genome using NextGenerationSequencing (NGS) technologies is shown to be very difficult because of the highly repetitive and complex nature of the human genome, short read lengths, uneven data coverage and tools that are not specifically built for human genomes. Moreover, many algorithms are not even scalable to human genome datasets containing hundreds of millions of short reads. The DNA assembly problem is usually divided into several subproblems including DNA data error detection and correction, contig creation, scaffolding and contigs orientation; each can be seen as a distinct research area. This thesis specifically focuses on creating contigs from the short reads and combining them with outputs from other tools in order to obtain better results. Three different assemblers including SOAPdenovo [Li09], Velvet [ZB08] and Meraculous [CHS+11] are selected for comparative purposes in this thesis. Obtained results show that this thesis’ work produces comparable results to other assemblers and combining our contigs to outputs from other tools, produces the best results outperforming all other investigated assemblers.

Nonphotorealistic Rendering with Cartesian Genetic Programming using Graphic Processing UnitsNonphotorealistic rendering (NPR) is concerned with the algorithm generation of images having unrealistic characteristics, for example, oil paintings or watercolour. Using genetic programming to evolve aesthetically pleasing NPR images is a relatively new approach in the art field, and in the majority of cases it takes a lot of time to generate results. With use of Cartesian genetic programming (CGP) and graphic processing units (GPUs), we can improve the performance of NPR image evolution. Evolutionary NPR can render images with interesting, and often unexpected, graphic effects. CGP provides a means to eliminate large, inefficient rendering expressions, while GPU acceleration parallelizes the calculations, which minimizes the time needed to get results. By using these tools, we can speed up the image generation process. Experiments revealed that CGP expressions are more concise, and search is more exploratory, than in treebased approaches. Implementation of the system with GPUs showed significant speedup.

Object Classification using LFuzzy Concept AnalysisObject classification and processing have become a coordinated piece of modern industrial manufacturing systems, generally utilized in a manual or computerized inspection process. Vagueness is a common issue related to object classification and analysis such as the ambiguity in input data, the overlapping boundaries among the classes or regions, and the indefiniteness in defining or extracting features and relations among them. The main purpose of this thesis is to construct, define, and implement an abstract algebraic framework for Lfuzzy relations to represent the uncertainties involved at every stage of the object classification. This is done to handle the proposed vagueness that is found in the process of object classification such as retaining information as much as possible from the original data for making decisions at the highest level making the ultimate output or result of the associated system with least uncertainty.

ObjectOriented Genetic Programming for the Automatic Inference of Graph Models for Complex NetworksComplex networks are systems of entities that are interconnected through meaningful relationships. The result of the relations between entities forms a structure that has a statistical complexity that is not formed by random chance. In the study of complex networks, many graph models have been proposed to model the behaviours observed. However, constructing graph models manually is tedious and problematic. Many of the models proposed in the literature have been cited as having inaccuracies with respect to the complex networks they represent. However, recently, an approach that automates the inference of graph models was proposed by Bailey [10] The proposed methodology employs genetic programming (GP) to produce graph models that approximate various properties of an exemplary graph of a targeted complex network. However, there is a great deal already known about complex networks, in general, and often specific knowledge is held about the network being modelled. The knowledge, albeit incomplete, is important in constructing a graph model. However it is difficult to incorporate such knowledge using existing GP techniques. Thus, this thesis proposes a novel GP system which can incorporate incomplete expert knowledge that assists in the evolution of a graph model. Inspired by existing graph models, an abstract graph model was developed to serve as an embryo for inferring graph models of some complex networks. The GP system and abstract model were used to reproduce wellknown graph models. The results indicated that the system was able to evolve models that produced networks that had structural similarities to the networks generated by the respective target models.

Objective reduction in manyobjective optimization problemsManyobjective optimization problems (MaOPs) are multiobjective optimization problems which have more than three objectives. MaOPs face significant challenges because of search efficiency, computational cost, decision making, and visualization. Many wellknown multiobjective evolutionary algorithms do not scale well with an increasing number of objectives. The objective reduction can alleviate such difficulties. However, most research in objective reduction use nondominated sorting or Pareto ranking. However, Pareto is effective in problems having less than four objectives. In this research, we use two approaches to objective reduction: randombased and linear coefficientbased. We use the sum of ranks instead of Pareto Ranking. When applied to manyobjective problems, the sum of ranks has outperformed many other optimization approaches. We also use the age layered population structure (ALPS). We use ALPS in our approach to remove premature convergence and improve results. The performance of the proposed methods has been studied extensively on the famous benchmark problem DTLZ. The original GA and ALPS outperform the objective reduction algorithms in many test cases of DTLZ. Among all reduction algorithms, a linear coefficient based reduction algorithm provides better performance for some problems in this test suite. Random based reduction is not an appropriate strategy for reducing objectives.

Particle swarm optimization for twoconnected networks with bounded ringsThe TwoConnected Network with Bounded Ring (2CNBR) problem is a network design problem addressing the connection of servers to create a survivable network with limited redirections in the event of failures. Particle Swarm Optimization (PSO) is a stochastic populationbased optimization technique modeled on the social behaviour of flocking birds or schooling fish. This thesis applies PSO to the 2CNBR problem. As PSO is originally designed to handle a continuous solution space, modification of the algorithm was necessary in order to adapt it for such a highly constrained discrete combinatorial optimization problem. Presented are an indirect transcription scheme for applying PSO to such discrete optimization problems and an oscillating mechanism for averting stagnation.

Passive Solar Building Design Using Genetic ProgrammingPassive solar building design is the process of designing a building while considering sunlight exposure for receiving heat in winter and rejecting heat in summer. The main goal of a passive solar building design is to remove or reduce the need of mechanical and electrical systems for cooling and heating, and therefore saving energy costs and reducing environmental impact. This research will use evolutionary computation to design passive solar buildings. Evolutionary design is used in many research projects to build 3D models for structures automatically. In this research, we use a mixture of split grammar and stringrewriting for generating new 3D structures. To evaluate energy costs, the EnergyPlus system is used. This is a comprehensive building energy simulation system, which will be used alongside the genetic programming system. In addition, genetic programming will also consider other design and geometry characteristics of the building as search objectives, for example, window placement, building shape, size, and complexity. In passive solar designs, reducing energy that is needed for cooling and heating are two objectives of interest. Experiments show that smaller buildings with no windows and skylights are the most energy efficient models. Window heat gain is another objective used to encourage models to have windows. In addition, window and volume based objectives are tried. To examine the impact of environment on designs, experiments are run on five different geographic locations. Also, both single floor models and multifloor models are examined in this research. According to the experiments, solutions from the experiments were consistent with respect to materials, sizes, and appearance, and satisfied problem constraints in all instances.

Properties and algorithms of the (n, k)arrangement graphsThe (n, k)arrangement interconnection topology was first introduced in 1992. The (n, k )arrangement graph is a class of generalized star graphs. Compared with the well known nstar, the (n, k )arrangement graph is more flexible in degree and diameter. However, there are few algorithms designed for the (n, k)arrangement graph up to present. In this thesis, we will focus on finding graph theoretical properties of the (n, k) arrangement graph and developing parallel algorithms that run on this network. The topological properties of the arrangement graph are first studied. They include the cyclic properties. We then study the problems of communication: broadcasting and routing. Embedding problems are also studied later on. These are very useful to develop efficient algorithms on this network. We then study the (n, k )arrangement network from the algorithmic point of view. Specifically, we will investigate both fundamental and application algorithms such as prefix sums computation, sorting, merging and basic geometry computation: finding convex hull on the (n, k )arrangement graph. A literature review of the stateoftheart in relation to the (n, k)arrangement network is also provided, as well as some open problems in this area.

Properties and algorithms of the (n, k)star graphsThe (n, k)star interconnection network was proposed in 1995 as an attractive alternative to the nstar topology in parallel computation. The (n, k )star has significant advantages over the nstar which itself was proposed as an attractive alternative to the popular hypercube. The major advantage of the (n, k )star network is its scalability, which makes it more flexible than the nstar as an interconnection network. In this thesis, we will focus on finding graph theoretical properties of the (n, k )star as well as developing parallel algorithms that run on this network. The basic topological properties of the (n, k )star are first studied. These are useful since they can be used to develop efficient algorithms on this network. We then study the (n, k )star network from algorithmic point of view. Specifically, we will investigate both fundamental and application algorithms for basic communication, prefix computation, and sorting, etc. A literature review of the stateoftheart in relation to the (n, k )star network as well as some open problems in this area are also provided.

Properties and Algorithms of the (n,k)Arrangement Graphs and Augmented CubesThe (n, k)arrangement graph was first introduced in 1992 as a generalization of the star graph topology. Choosing an arrangement topology is more efficient in comparison with a star graph as we can have a closer number of nodes to what is needed. Also it has other advantages such as a lower degree and a smaller diameter, depending on k. In this thesis we investigate the problem of finding k(n − k) disjoint paths from a source node to k(n−k) target nodes in an (n, k)arrangement interconnection network such that no path has length more than diameter+(n−k)+2, where diameter is the maximum length of shortest path between any two nodes in the graph. These disjoint paths are built by routing to all neighbors of the source node and fixing specific elements in each of the k positions of the node representation in an (n, k)arrangement graph. Moreover, a simple routing is presented for finding n disjoint paths between two nodes which are located in different subgraphs. The lengths are no more than d(t, s) + 4, for d(t, s) being the shortest path length between two nodes s and t. This routing algorithm needs O(n^2) time to find all n these paths. In addition to arrangement graphs, we also study augmented cubes, first introduced in 2002, a desirable variation of the hypercube. An augmented cube of dimension n has a higher degree and a lower diameter in comparison with the hypercube. We introduce an O(n^3) algorithm for finding disjoint shortest paths from a single source node to 2n − 1 different target nodes.

Properties and algorithms of the hyperstar graph and its related graphsThe hyperstar interconnection network was proposed in 2002 to overcome the drawbacks of the hypercube and its variations concerning the network cost, which is defined by the product of the degree and the diameter. Some properties of the graph such as connectivity, symmetry properties, embedding properties have been studied by other researchers, routing and broadcasting algorithms have also been designed. This thesis studies the hyperstar graph from both the topological and algorithmic point of view. For the topological properties, we try to establish relationships between hyperstar graphs with other known graphs. We also give a formal equation for the surface area of the graph. Another topological property we are interested in is the Hamiltonicity problem of this graph. For the algorithms, we design an allport broadcasting algorithm and a singleport neighbourhood broadcasting algorithm for the regular form of the hyperstar graphs. These algorithms are both optimal timewise. Furthermore, we prove that the folded hyperstar, a variation of the hyperstar, to be maixmally faulttolerant.

Properties and Algorithms of the KCube GraphsThe KCube interconnection topology was rst introduced in 2010. The KCube graph is a compound graph of a Kautz digraph and hypercubes. Compared with the at tractive Kautz digraph and well known hypercube graph, the KCube graph could accommodate as many nodes as possible for a given indegree (and outdegree) and the diameter of interconnection networks. However, there are few algorithms designed for the KCube graph. In this thesis, we will concentrate on nding graph theoretical properties of the KCube graph and designing parallel algorithms that run on this network. We will explore several topological properties, such as bipartiteness, Hamiltonianicity, and symmetry property. These properties for the KCube graph are very useful to develop efficient algorithms on this network. We will then study the KCube network from the algorithmic point of view, and will give an improved routing algorithm. In addition, we will present two optimal broadcasting algorithms. They are fundamental algorithms to many applications. A literature review of the state of the art network designs in relation to the KCube network as well as some open problems in this field will also be given.