M.Sc. Computer Science
Browse by
Recent Submissions

Data mining using Lfuzzy concept analysis.Association rules in data mining are implications between attributes of objects that hold in all instances of the given data. These rules are very useful to determine the properties of the data such as essential features of products that determine the purchase decisions of customers. Normally the data is given as binary (or crisp) tables relating objects with their attributes by yesno entries. We propose a relational theory for generating attribute implications from manyvalued contexts, i.e, where the relationship between objects and attributes is given by a range of degrees from no to yes. This degree is usually taken from a suitable lattice where the smallest element corresponds to the classical no and the greatest element corresponds to the classical yes. Previous related work handled manyvalued contexts by transforming the context by scaling or by choosing a minimal degree of membership to a crisp (yesno) context. Then the standard methods of formal concept analysis were applied to this crisp context. In our proposal, we will handle a manyvalued context as is, i.e., without transforming it into a crisp one. The advantage of this approach is that we work with the original data without performing a transformation step which modifies the data in advance.

Object Classification using LFuzzy Concept AnalysisObject classification and processing have become a coordinated piece of modern industrial manufacturing systems, generally utilized in a manual or computerized inspection process. Vagueness is a common issue related to object classification and analysis such as the ambiguity in input data, the overlapping boundaries among the classes or regions, and the indefiniteness in defining or extracting features and relations among them. The main purpose of this thesis is to construct, define, and implement an abstract algebraic framework for Lfuzzy relations to represent the uncertainties involved at every stage of the object classification. This is done to handle the proposed vagueness that is found in the process of object classification such as retaining information as much as possible from the original data for making decisions at the highest level making the ultimate output or result of the associated system with least uncertainty.

A Functional Programming Language with Patterns and CopatternsSince the emergence of coinductive data types in functional programming languages, various languages such as Haskell and Coq tried different ways in dealing with them. Yet, none of them dealt with coinductive data types properly. In lazy languages such as Haskell, both inductive data types and coinductive data types are gathered and mixed in one list. Moreover, some languages such as Coq used the same constructors that are used for inductive data types as a tool to tackle coinductive data types, and while other languages such as Haskell did use destructors, they did not use them properly. Coinductive data types behave differently than inductive data types and therefore, it is more appropriate to deal with them differently. In this thesis, we propose a new functional programming language where coinductive data types are dealt with in a dual approach in which coinductive data types are defined by observation and inductive data types are defined by constructors. This approach is more appropriate in dealing with coinductive data types whose importance comes from their role in creating a safer and more sophisticated software.

A Hybrid Approach to Network Robustness Optimization using Edge Rewiring and Edge AdditionNetworks are ubiquitous in the modern world. From computer and telecommunication networks to road networks and power grids, networks make up many crucial pieces of infrastructure that we interact with on a daily basis. These networks can be subjected to damage from many different sources, both random and targeted. If one of these networks receives too much damage, it may be rendered inoperable, which can have disastrous consequences. For this reason, it is in the best interests of those responsible for these networks to ensure that they are highly robust to failure. Since it is not usually feasible to rebuild most existing networks from scratch to make them more resilient, it is necessary to have an approach that can modify an existing network to make it more robust to failure. Previous work has established several methods of accomplishing this task, including edge rewiring and edge addition. Both of these methods can be very useful for optimizing network robustness, but each comes with its own set of limitations. This thesis proposes a new hybrid approach to network robustness optimization that combines both of these approaches. Four edge rewiring based metaheuristic approaches were modified to incorporate one of three different edge addition strategies. A comparative study was performed on these new hybrid optimizers, comparing them to each other and to the vanilla edge rewiring only approach on both synthetic and real world networks. Experiments showed that this new hybrid approach to network robustness optimization leads to much more highly robust networks than an edge rewiring only approach.

Effect of the Side Effect Machines in Edit Metric DecodingThe development of general edit metric decoders is a challenging problem, especially with the inclusion of additional biological restrictions that can occur in DNA error correcting codes. Side effect machines (SEMs), an extension of finite state machines, can provide efficient decoding algorithms for such edit metric codes. However, finding a good machine poses its own set of challenges and is itself considered as an open problem with no general solution. Previous studies utilizing evolutionary computation techniques, such as genetic algorithms and evolutionary programming to search for good SEMs have found success in terms of decoding accuracy. However, they all worked with extremely constricted problem spaces i.e. a single code or codes of the same length. Therefore a general approach that works well across codes of different lengths is yet to be formalized. In this research, several codes of varying lengths are used to study the effectiveness of evolutionary programming (EP) as a general approach for finding efficient edit metric decoders. Two classification methods—direct and fuzzy—are compared while also changing some of the EP settings to observe how the decoding accuracy is affected. The final SEMs are verified against an additional dataset to test their general effectiveness. Regardless of the code length, the best results are found using the fuzzy classification methods. For codes of length 10, a maximum accuracy of up to 99.4% is achieved for distance 1 whereas distance 2 and 3 achieve up to 97.1% and 85.9%, respectively. Unsurprisingly, the accuracy suffers for longer codes, as the maximum accuracies achieved by codes of length 14 were 92.4%, 85.7% and 69.2% for distance 1, 2, and 3 respectively. Additionally, the machines are examined for potential bloat by comparing the number of visited states against the number of total states. The study has found some machines with at least one unvisited state. The bloat is seen more in larger machines than it is in smaller machines. Furthermore, the results are analyzed to find potential trends and relationships among the parameters. The trend that is most consistently noticed is that — when allowed, the longer codes generally show a propensity for larger machines.

Swarmbased Algorithms for Neural Network TrainingThe main focus of this thesis is to compare the ability of various swarm intelligence algorithms when applied to the training of artificial neural networks. In order to compare the performance of the selected swarm intelligence algorithms both classification and regression datasets were chosen from the UCI Machine Learning repository. Swarm intelligence algorithms are compared in terms of training loss, training accuracy, testing loss, testing accuracy, hidden unit saturation, and overfitting. Our observations showed that Particle Swarm Optimization (PSO) was the best performing algorithm in terms of Training loss and Training accuracy. However, it was also found that the performance of PSO dropped considerably when examining the testing loss and testing accuracy results. For the classification problems, it was found that firefly algorithm, ant colony optimization, and fish school search outperformed PSO for testing loss and testing accuracy. It was also observed that ant colony optimization was the algorithm that performed the best in terms of hidden unit saturation.

A RelationAlgebraic Approach to L  Fuzzy TopologyAny science deals with the study of certain models of the real world. However, a model is always an abstraction resulting in some uncertainty, which must be considered. The theory of fuzzy sets is one way of formalizing one of the types of uncertainty that occurs when modeling real objects. Fuzzy sets have been applied in various realworld problems such as control system engineering, image processing, and weather forecasting systems. This research focuses on applying the categorical framework of abstract L  fuzzy relations to Lfuzzy topology with ideas, concepts and methods of the theory of Lfuzzy sets. Since Lfuzzy sets were introduced to deal with the problem of approximate reasoning, t − norm based operations are essential in the definition of L  fuzzy topologies. We use the abstract theory of arrow categories with additional t − norm based connectives to define L  fuzzy topologies abstractly. In particular, this thesis will provide an abstract relational definition of an L  fuzzy topology, consider bases of topological spaces, continuous maps, and the first two separation axioms T0 and T1. The resulting theory of L  fuzzy topological spaces provides the foundation for applications and algorithms in areas such as digital topology, i.e., analyzing images using topological features.

Objective reduction in manyobjective optimization problemsManyobjective optimization problems (MaOPs) are multiobjective optimization problems which have more than three objectives. MaOPs face significant challenges because of search efficiency, computational cost, decision making, and visualization. Many wellknown multiobjective evolutionary algorithms do not scale well with an increasing number of objectives. The objective reduction can alleviate such difficulties. However, most research in objective reduction use nondominated sorting or Pareto ranking. However, Pareto is effective in problems having less than four objectives. In this research, we use two approaches to objective reduction: randombased and linear coefficientbased. We use the sum of ranks instead of Pareto Ranking. When applied to manyobjective problems, the sum of ranks has outperformed many other optimization approaches. We also use the age layered population structure (ALPS). We use ALPS in our approach to remove premature convergence and improve results. The performance of the proposed methods has been studied extensively on the famous benchmark problem DTLZ. The original GA and ALPS outperform the objective reduction algorithms in many test cases of DTLZ. Among all reduction algorithms, a linear coefficient based reduction algorithm provides better performance for some problems in this test suite. Random based reduction is not an appropriate strategy for reducing objectives.

Using Deep Learning for Predicting Stock TrendsDeep learning has shown great promise in solving complicated problems in recent years. One applicable area is finance. In this study, deep learning will be used to test the predictability of stock trends. Stock markets are known to be volatile, prices fluctuate, and there are many complicated financial indicators involved. While the opinion of researchers differ about the predictability of stocks, it has been shown by previous empirical studies that some aspects of stock markets can be predictable to some extent. Various data including news or financial indicators can be used to predict stock prices. In this study, the focus will be on using past stock prices and using technical indicators to increase the performance of the results. The goal of this study is to measure the accuracy of predictions and evaluate the results. Historical data is gathered for Apple, Microsoft, Google and Intel stocks. A prediction model is created by using past data and technical indicators were used as features in the model. The experiments were performed by using long shortterm memory networks. Different approaches and techniques were tested to boost the performance of the results. To prove the usability of the final model in the real world and measure the profitability of results backtesting was performed. The final results show that while it is not possible to predict the exact price of a stock in the future to gain profitable results, deep learning can be used to predict the trend of stock markets to generate buy and sell signals.

Surface Areas of Some Interconnection NetworksAn interesting property of an interconnected network (G) is the number of nodes at distance i from an arbitrary processor (u), namely the node centered surface area. This is an important property of a network due to its applications in various fields of study. In this research, we investigate on the surface area of two important interconnection networks, (n, k)arrangement graphs and (n, k)star graphs. Abundant works have been done to achieve a formula for the surface area of these two classes of graphs, but in general, it is not trivial to find an algorithm to compute the surface area of such graphs in polynomial time or to find an explicit formula with polynomially many terms in regards to the graph's parameters. Among these studies, the most efficient formula in terms of computational complexity is the one that Portier and Vaughan proposed which allows us to compute the surface area of a special case of (n, k)arrangement and (n, k)star graphs when k = n1, in linear time which is a tremendous improvement over the naive solution with complexity order of O(n * n!). The recurrence we propose here has the linear computational complexity as well, but for a much wider family of graphs, namely A(n, k) for any arbitrary n and k in their defined range. Additionally, for (n, k)star graphs we prove properties that can be used to achieve a simple recurrence for its surface area.

A Centrality Based MultiObjective DiseaseGene Association Approach Using Genetic AlgorithmsThe Disease Gene Association Problem (DGAP) is a bioinformatics problem in which genes are ranked with respect to how involved they are in the presentation of a particular disease. Previous approaches have shown the strength of both Monte Carlo and evolutionary computation (EC) based techniques. Typically these past approaches improve ranking measures, develop new gene relation definitions, or implement more complex EC systems. This thesis presents a hybrid approach which implements a multiobjective genetic algorithm, where input consists of centrality measures based on various relational biological evidence types merged into a complex network. In an effort to explore the effectiveness of the technique compared to past work, multiple objective settings and different EC parameters are studied including the development of a new exchange methodology, safe dealerbased (SDB) crossover. Successful results with respect to breast cancer and Parkinson's disease compared to previous EC techniques and popular known databases are shown. In addition, the newly developed methodology is also successfully applied to Alzheimer’s, further demonstrating the flexibility of the technique. Across all three cases studies the strongest results were produced by the shortest pathbased measures stress and betweenness in a single objective parameter setting. When used in conjunction in a multiobjective environment, competitive results were also obtained but fell short of the single objective settings studied as part of this work. Lastly, while SDB crossover fell short of expectations on breast cancer and Parkinson's, it achieved the best results when applied to Alzheimer’s, illustrating the potential of the technique for future study.

Modelling and Proving Cryptographic Protocols in the Spi Calculus using CoqThe spi calculus is a process algebra used to model cryptographic protocols. A process calculus is a means of modelling a system of concurrently interacting agents, and provides tools for the description of communications and synchronizations between those agents. The spi calculus is based on Robin Milner's pi calculus, which was itself based upon his Calculus of Communicating Systems (CCS). It was created by Martin Abadi and Andrew D. Gordon as an expansion of the pi calculus intended to focus on cryptographic protocols, and adds features such as the encryption and decryption of messages using keys. The Coq proof system is an interactive theorem prover that allows for the definition of types and functions, and provides means by which to prove properties about them. The spi calculus has been implemented in Coq and subsequently used to model and show an example proof of a property about a simple cryptographic protocol. This required the implementation of both the syntax and the semantics of the calculus, as well as the rules and axioms used to manipulate the syntax in proofs. We discuss the spi calculus in detail as defined by Abadi and Gordon, then the various challenges faced during the implementation of the calculus and our rationale for the decisions made in the process.

Eﬃcient Merging and Decomposition Variants of Cooperative Particle Swarm Optimization for Large Scale ProblemsFor largescale optimization problems (LSOPs), an increased problem size reduces performance by both increasing the landscape complexity, as well as exponentially increasing the search space size. These contributing factors make up the "curse of dimensionality", which is addressed either by improving the search operator of the metaheuristic or decomposing the highdimensional problem into smaller subproblems. Unfortunately, nonseparable LSOPs contain a scaling number of variable dependencies which should be optimized together but are often separated into different subproblems due to insufficient grouping strategies. Various particle swarm optimization (PSO) techniques have been proposed in order to address these LSOPs, either through the improvement of search operators or utilizing decomposition. However, there is a lack of comparison between them showing which PSO variant performs best for specific types of LSOPs. Additionally, decomposition variants which utilize a cooperative PSO (CPSO) approach still struggle to properly group related variables in more difficult nonseparable multimodal problems. In an attempt to better optimize these nonseparable LSOPs, this thesis introduces two new adaptive decomposition and merging CPSO algorithms, referred to as DCPSO2 and MCPSO2 respectively, which offer a new regrouping strategy by adaptively splitting and merging stagnated subswarms according to the their fitness. These algorithms proposed in this thesis are then compared against existing CPSO variants in order to establish the best decompositionbased PSO algorithm for LSOPs. Results show that the decomposition and merging variants are able to perform competitively with previously wellestablished CPSO algorithms for largescale problems across all problem classes. DCPSO ranks the highest in terms of accuracy across all nonseparable problems while MCPSO and MCPSO2 prove to have the fastest convergence amongst all algorithms.

Deep Learning Concepts for Evolutionary ArtA deep convolutional neural network (CNN) trained on millions of images forms a very highlevel abstract overview of any given target image. Our primary goal is to use this highlevel content information of a given target image to guide the automatic evolution of images. We use genetic programming (GP) to evolve procedural textures. We incorporate a pretrained deep CNN model into the fitness. We are not performing any training, but rather, we pass a target image through the pretrained deep CNN and use its the highlevel representation as the fitness guide for evolved images. We develop a preprocessing strategy called Mean Minimum Matrix Strategy (MMMS) which reduces the dimensions and identifies the most relevant highlevel activation maps. The technique using reduced activation matrices for a fitness shows promising results. GP is able to guide the evolution of textures such that they have shared characteristics with the target image. We also experiment with the fully connected “classifier” layers of the deep CNN. The evolved images are able to achieve high confidence scores from the deep CNN module for some tested target images. Finally, we implement our own shallow convolutional neural network with a fixed set of filters. Experiments show that the basic CNN had limited effectiveness, likely due to the lack of training. In conclusion, the research shows the potential for using deep learning concepts in evolutionary art. As deep CNN models become better understood, they will be able to be used more effectively for evolutionary art.

Modal and Relevance Logics for Qualitative Spatial ReasoningQualitative Spatial Reasoning (QSR) is an alternative technique to represent spatial relations without using numbers. Regions and their relationships are used as qualitative terms. Mostly peer qualitative spatial reasonings has two aspect: (a) the first aspect is based on inclusion and it focuses on the ”partof” relationship. This aspect is mathematically covered by mereology. (b) the second aspect focuses on topological nature, i.e., whether they are in ”contact” without having a common part. Mereotopology is a mathematical theory that covers these two aspects. The theoretical aspect of this thesis is to use classical propositional logic with nonclassical relevance logic to obtain a logic capable of reasoning about Boolean algebras i.e., the mereological aspect of QSR. Then, we extended the logic further by adding modal logic operators in order to reason about topological contact i.e., the topological aspect of QSR. Thus, we name this logic Modal Relevance Logic (MRL). We have provided a natural deduction system for this logic by defining inference rules for the operators and constants used in our (MRL) logic and shown that our system is correct. Furthermore, we have used the functional programming language and interactive theorem prover Coq to implement the definitions and natural deduction rules in order to provide an interactive system for reasoning in the logic.

Image Evolution Using 2D Power SpectraProcedurally generated textures have seen use in many applications, are a highinterest topic when exploring evolutionary algorithms, and hold a central interest for digital art. However, there is an existing difficulty in finding suitable heuristics for measuring perceived qualities of an image. Particular difficulty remains for quantifying aspects of style and shape. In an attempt to bridge the divide between computer vision and cognitive perception, one set of proposed measures from previous studies relate to image spatial frequencies. Based on existing research which uses power spectral density of spatial frequencies as an effective metric for image classification and retrieval, we believe this measure and others based on Fourier decomposition may be effective for guiding evolutionary texture synthesis. We briefly compare some alternative means of using frequency analysis to guide evolution of shape and composition, and refine fitness measures based on Fourier analysis and spatial frequency. Our exploration has been conducted with the goals of improving intuition of these measures, evaluating the utility of these measures for image composition, and observing possible adaptations of their use in digital evolutionary art. Multiple evolutionary guidance schemes with consideration of the spatial frequencies' power spectra and phase have been evaluated across numerous targets with mixed results. We will display our exploration of power spectral density measures and their effectiveness as used for evolutionary algorithm fitness targets, particularly for basic compositional guidance in evolutionary art. We also observe and analyze a previously identified phenomenon of spatial properties which could lead to further consideration of visual comfort and aesthetics.

A Deep Learning Pipeline for Classifying Different Stages of Alzheimer's Disease from fMRI Data.Abstract Alzheimer’s disease (AD) is an irreversible, progressive neurological disorder that causes memory and thinking skill loss. Many different methods and algorithms have been applied to extract patterns from neuroimaging data in order to distinguish different stages of AD. However, the similarity of the brain patterns in older adults and in different stages makes the classification of different stages a challenge for researchers. In this thesis, convolutional neuronal network architecture AlexNet was applied to fMRI datasets to classify different stages of the disease. We classified five different stages of Alzheimer’s using a deep learning algorithm. The method successfully classified normal healthy control (NC), significant memory concern (SMC), early mild cognitive impair (EMCI), late cognitive mild impair (LMCI), and Alzheimer’s disease (AD). The model was implemented using GPU high performance computing. Before applying any classification, the fMRI data were strictly preprocessed to avoid any noise. Then, low to high level features were extracted and learned using the AlexNet model. Our experiments show significant improvement in classification. The average accuracy of the model was 97.63%. We then tested our model on test datasets to evaluate the accuracy of the model per class, obtaining an accuracy of 94.97% for AD, 95.64% for EMCI, 95.89% for LMCI, 98.34% for NC, and 94.55% for SMC.

Properties and Algorithms of the (n,k)Arrangement Graphs and Augmented CubesThe (n, k)arrangement graph was first introduced in 1992 as a generalization of the star graph topology. Choosing an arrangement topology is more efficient in comparison with a star graph as we can have a closer number of nodes to what is needed. Also it has other advantages such as a lower degree and a smaller diameter, depending on k. In this thesis we investigate the problem of finding k(n − k) disjoint paths from a source node to k(n−k) target nodes in an (n, k)arrangement interconnection network such that no path has length more than diameter+(n−k)+2, where diameter is the maximum length of shortest path between any two nodes in the graph. These disjoint paths are built by routing to all neighbors of the source node and fixing specific elements in each of the k positions of the node representation in an (n, k)arrangement graph. Moreover, a simple routing is presented for finding n disjoint paths between two nodes which are located in different subgraphs. The lengths are no more than d(t, s) + 4, for d(t, s) being the shortest path length between two nodes s and t. This routing algorithm needs O(n^2) time to find all n these paths. In addition to arrangement graphs, we also study augmented cubes, first introduced in 2002, a desirable variation of the hypercube. An augmented cube of dimension n has a higher degree and a lower diameter in comparison with the hypercube. We introduce an O(n^3) algorithm for finding disjoint shortest paths from a single source node to 2n − 1 different target nodes.

Complete computational sequence characterization of mobile element variations in the human genome using metapersonal genome dataWhile a large number of methods have been developed to detect such types of genome sequence variations as single nucleotide polymorphisms (SNPs) and small indels, comparatively fewer methods have been developed for finding structural variants (SVs) and in particular mobile elements insertions (MEIs). Moreover, almost all these methods can detect only the breakpoints of an occurred SV, sometimes with approximation, and do not provide complete sequences representing the SVs. The main objective of our research is to develop a set of computer algorithms to provide complete genome sequence characterization for insertional structural variants in the human genomes via local de novo sequence assembly or progressive assembly using discordant and concordant read pairs and splitreads. An essential component of our approach involves utilizing all personal genome data available in the public domain vs. the standard way of using one set of personal genome sequences. The developed tool is the first system that provides full sequence characterization of SVs. Overall, the characterization success rate for Alu is 75.03% with the mean of discordant and splitreads higher than 94 reads. For SVA, it is 71.43% with the threshold of 363 reads. And for L1 the values are 77.78% and 355 respectively. The results showed that the SV characterization depends on the allele frequency and is influenced by the repetitiveness of flanking regions. Therefore, addressing these problems is a key to further improvements.

Learning Strategies for Evolved Cooperating MultiAgent Teams in Pursuit DomainThis study investigates how genetic programming (GP) can be effectively used in a multiagent system to allow agents to learn to communicate. Using the predatorprey scenario and a cooperative learning strategy, communication protocols are compared as multiple predator agents learn the meaning of commands in order to achieve their common goal of first finding, and then tracking prey. This work is divided into three parts. The first part uses a simple GP language in the Pursuit Domain Development Kit (PDP) to investigate several communication protocols, and compares the predators' ability to find and track prey when the prey moves both linearly and randomly. The second part, again in the PDP environment, enhances the GP language and fitness measure in search of a better solution for when the prey moves randomly. The third part uses the Ms. PacMan Development Toolkit to test how the enhanced GP language performs in a game environment. The outcome of each part of this study reveals emergent behaviours in different forms of message sending patterns. The results from Part 1 reveal a general synchronization behaviour emerging from simple message passing among agents. Additionally, the results show a learned behaviour in the best result which resembles the behaviour of guards and reinforcements found in popular stealth video games. The outcomes from Part 2 reveal an emergent message sending pattern such that one agent is designated as the "sending" agent and the remaining agents are designated as "receiving" agents. Evolved agents in the Ms. PacMan simulator show an emergent sending pattern in which there is one agent that sends messages when it is in view of the prey. In addition, it is shown that evolved agents in both Part 2 and Part 3 are able to learn a language. For example, "sending" agents are able to make decisions about when and what type of command to send and "receiving" agents are able to associate the intended meaning to commands.