Recent Submissions

  • Pandemic: A Graph Evolution Story

    Dube, Michael; Houghten, Sheridan; Ashlock, Daniel (IEEE, 2019-07)
    The Graph Evolution Tool (GET)was built to generate personal contact networks representing who can infect whom within a community. The tool is expanded in order to permit an infection scheme which divides the community into different districts, thus permitting within-district and between-district infections. The evolutionary algorithm comprising GET is expanded upon to simulate communities which include 512 individuals in up to eight districts, initially infecting one person in one district and spreading through a community. The overall goal is to generate communities that will maximize the length of an epidemic. The problem associated with adequately exploring the numerous parameters accompanying evolutionary algorithms is addressed using a point packing and insight from previous work. The Susceptible-Infected-Removed (SIR)model of infection was chosen as it provides a sufficient balance of simplicity and complexity for the problem.
  • Using Genetic Programming to Investigate a Novel Model of Resting Energy Expenditure for Bariatric Surgery Patients

    Hughes, James Alexander; Reid, Ryan E. R.; Houghten, Sheridan; Andersen, Ross E. (IEEE, 2020-10-27)
    Traditionally, models developed to estimate resting energy expenditure (REE) in the bariatric population have been limited to linear modelling based on data from `normal' or `overweight' individuals - not `obese'. This type of modelling can be restrictive and yield functions which poorly estimate this important physiological outcome.Linear and nonlinear models of REE for individuals after bariatric surgery are developed with linear regression and symbolic regression via genetic programming. Features not traditionally used in REE modelling were also incorporated and analyzed and genetic programming's intrinsic feature selection was used as a measure of feature importance.A collection of effective new linear and nonlinear models were generated. The linear models generated outperformed the nonlinear on testing data, although the nonlinear models fit the training data better. Ultimately, the newly developed linear models showed an improvement over existing models and the feature importance analysis suggested that the typically used features (age, weight, and height) were the most important.
  • Deep Learning for the Prediction of Stock Market Trends

    Fazeli, Arvand; Houghten, Sheridan (IEEE, 2019-12)
    In this study, deep learning will be used to test the predictability of stock trends. Stock markets are known to be volatile, prices fluctuate, and there are many complicated financial indicators involved. Various data including news or financial indicators can be used to predict stock prices. In this study, the focus will be on using past stock prices and using technical indicators to increase the performance of the results. The goal of this study is to measure the accuracy of predictions and evaluate the results. Historical data is gathered for Apple, Microsoft, Google and Intel stocks. A prediction model is created by using past data and technical indicators were used as features in the model. The experiments were performed by using long short-term memory networks. Different approaches and techniques were tested to boost the performance of the results. To prove the usability of the final model in the real world and measure the profitability of results backtesting was performed. The final results show that while it is not possible to predict the exact price of a stock in the future to gain profitable results, deep learning can be used to predict the trend of stock markets to generate buy and sell signals.
  • Compression of Biological Networks using a Genetic Algorithm with Localized Merge

    Houghten, Sheridan; Romualdo, Angelo; Collins, Tyler K.; Brown, Joseph Alexander (IEEE, 2019-07)
    Network graphs appear in a number of important biological data problems, recording information relating to protein-protein interactions, gene regulation, transcription regulation and much more. These graphs are of such a significant size that they are impossible for a human to understand. Furthermore, the ever-expanding quantity of such information means that there are storage issues. To help address these issues, it is common for applications to compress nodes to form supernodes of similarly connected components. In previous graph compression studies it was noted that such supernodes often contain points from disparate parts of the graph. This study aims to correct this flaw by only allowing merges to occur within a local neighbourhood rather than across the entire graph. This restriction was found to not only produce more meaningful compressions, but also to reduce the overall distortion created by the compression for two out of three biological networks studied.
  • Descriptive Symbolic Models of Gaits from Parkinson's Disease Patients

    Hughes, James Alexander; Houghten, Sheridan; Brown, Joseph Alexander (IEEE, 2019-07)
    Parkinson's disease (PD) is a degenerative disorder of the central nervous system that has many debilitating symptoms which affect the patient's motor system and can cause significant changes in their gait. By using genetic programming, we aim to develop descriptive symbolic nonlinear models of PD patient gait from time series data recorded from pressure sensors under subjects' feet. When compared to popular types of linear regression (OLS and LASSO), the nonlinear models fit their data better and generalize to unseen data significantly better. It was found that models developed for healthy control subjects generalized to other control subjects well, however the models trained on subjects with PD did not generalize well to other PD patients, which complicates the issue of being able to detect the progression of the disease. It is suspected that health care professionals can have difficulty classifying PD due to a lack of accurate data from patient reports; having individually trained models for active monitoring of patients would help in effectively diagnosing PD.
  • Cryptanalysis of RSA: Integer Prime Factorization Using Genetic Algorithms

    Rutkowski, Emilia; Houghten, Sheridan (IEEE, 2020-07)
    In recent years, researchers have been exploring alternative methods to solving Integer Prime Factorization, the decomposition of an integer into its prime factors. This has direct application to cryptanalysis of RSA, as one means of breaking such a cryptosystem requires factorization of a large number that is the product of two prime numbers. This paper applies three different genetic algorithms to solve this issue, utilizing mathematical knowledge concerning distribution of primes to improve the algorithms. The best of the three genetic algorithms has a chromosome that represents m in the equation prime = 6 m ± 1, and is able to factor a number of up to 22 decimal digits. This is a significantly larger number than the largest factored by comparable methods in earlier work. This leads to the conclusion that approaches such as genetic algorithms are a promising avenue of research into the problem of integer factorization.
  • Evolutionary Graph Compression and Diffusion Methods for City Discovery in Role Playing Games

    Brown, Joseph Alexander; Ashlock, Daniel; Houghten, Sheridan; Romualdo, Angelo (IEEE, 2020-07)
    Cities, while exciting in their visualization and permitting several layouts, do not take into account the placement of crucial characters which might be part of the narrative. Narrative graphs, a connected graph of all potential and existing relations within a game, can enable an ability to find a Nonplayer Character (NPC) who is likely to live nearby, under the assumption that those who interact most frequently are also close in distance. We examine the use of an evolutionary graph compression method and a method using simulated diffusion to cluster features based on relational information about players to generate relationally intimate groups. This clustering can be used to generate information about the game world and cities to inform PCG as to how the connectivity of these areas is, and should be, arranged. The algorithms are validated as being human competitive.
  • Gait Model Analysis of Parkinson’s Disease Patients under Cognitive Load

    Hughes, James Alexander; Houghten, Sheridan; Brown, Joseph Alexander (IEEE, 2020-07)
    Parkinson's disease is a neurodegenerative disease that affects close to 10 million with various symptoms including tremors and changes in gait. Observing differences or changes in an individual's manifestations of gait may provide a mechanism to identify Parkinson's disease and understand specific changes. In this study, timeseries data from both Control subjects and Parkinson's disease patients was modelled with symbolic regression and extreme gradient boosting. Model effectiveness was analyzed along with the differences in the models between modelling strategies, between Control subjects and Parkinson's disease patients, and between normal walking and walking while under a cognitive load. Both modelling strategies were found to effective. The symbolic regression models were more easily interpreted, while extreme gradient boosting had higher overall accuracy. Interpretation of the models identified certain characteristics that distinguished Control subjects from Parkinson's disease patients and normal walking conditions from walking while under a cognitive load.
  • Extracting Information from Weighted Contact Networks via Genetic Algorithms

    Rutkowski, Emilia; Houghten, Sheridan; Brown, Joseph Alexander (IEEE, 2020-10-27)
    Epidemic contact tracing examines the movement of infection through a population based upon links in a contact network, and weighted networks represent the potential of transfer of the contagion. Graph compression reduces the size of a network by merging groups of nodes into supernodes. This study considers the use of genetic algorithms to select the nodes to be merged, grouping together highly connected sections of the graphs. Examined is a dataset that is extracted from contacts that occurred during several days of the "Infectious: Stay Away" event. The incorporation of weights, to indicate the strength of interactions between individuals, is an important contribution of this work. The demonstrated outcomes are that by including weighted information on the edges, there is more effective detection of highly interacting subgroups when compared to the unweighted version of graphs. These methods not only compress the networks with a low rate of distortion, but also the identification of supernodes in the networks allows for better targeting of interventions by public health upon individuals in such groups. This is crucial because when one member becomes infected, all members of the group are exposed to the contagion.
  • Evolving the Curve

    Dube, Michael; Houghten, Sheridan; Ashlock, Daniel; Hughes, James Alexander (IEEE, 2020-10-27)
    Evolutionary algorithms are used to generate personal contact networks, modelling human populations, that are most likely to match a given epidemic profile. The Susceptible-Infected-Removed (SIR) model is used and also expanded upon to allow for an extended period of infection, termed the SIIR model. The networks generated for each of these models are thoroughly evaluated for their ability to match nine different epidemic profiles. The addition of the SIIR model showed that the model of infection has an impact on the networks generated. For the SIR and SIIR models, these differences were relatively minor in most cases.
  • Effective Side Effect Machines for Decoding

    Banik, Sharnendu; Houghten, Sheridan (IEEE, 2020-10-27)
    The development of general edit metric decoders is a challenging problem, especially with the inclusion of additional biological restrictions that can occur when using error correcting codes in biological applications. Side effect machines (SEMs), an extension of finite state machines, can provide efficient decoding algorithms for such edit metric codes.Several codes of varying lengths are used to study the effectiveness of evolutionary programming (EP) as a general approach for finding SEMs for edit metric decoding. Direct and fuzzy classification methods are compared while also changing some of the EP settings to observe how decoding accuracy is affected. Regardless of code length, the best results are found using the fuzzy classification methods. For codes of length 10, a maximum accuracy of up to 99.4% is achieved for distance 1 whereas distance 2 and 3 achieve up to 97.1% and 85.9%, respectively. The accuracy suffers for longer codes, as the maximum accuracies achieved by codes of length 14 were 92.4%, 85.7% and 69.2% for distance 1, 2, and 3 respectively. Additionally, the SEMs are examined for potential bloat by comparing the number of reachable states against the total number of states. Bloat is seen more in larger machines than it is in smaller machines. Furthermore, the results are analyzed to find potential trends and relationships among the parameters, with the most consistent trend being that, when allowed, the longer codes generally show a propensity for larger machines.
  • Modelling of Vaccination Strategies for Epidemics using Evolutionary Computation

    Dube, Michael; Houghten, Sheridan; Ashlock, Daniel (IEEE, 2020-07)
    Personal contact networks that represent social interactions can be used to identify who can infect whom during the spread of an epidemic. The structure of a personal contact network has great impact upon both epidemic duration and the total number of infected individuals. A vaccine, with varying degrees of success, can reduce both the length and spread of an epidemic, but in the case of a limited supply of vaccine a vaccination strategy must be chosen, and this has a significant effect on epidemic behaviour.In this study we consider four different vaccination strategies and compare their effects upon epidemic duration and spread. These are random vaccination, high degree vaccination, ring vaccination, and the base case of no vaccination. All vaccinations are applied as the epidemic progresses, as opposed to in advance. The strategies are initially applied to static personal contact networks that are known ahead of time. They are then applied to personal contact networks that are evolved as the vaccination strategy is applied. When any form of vaccination is applied, all strategies reduce both duration and spread of the epidemic. When applied to a static network, random vaccination performs poorly in terms of reducing epidemic duration in comparison to strategies that take into account connectivity of the network. However, it performs surprisingly well when applied on the evolved networks, possibly because the evolutionary algorithm is unable to take advantage of a fixed strategy.
  • Genetic programming for improved cryptanalysis of elliptic curve cryptosystems

    Ribaric, Tim; Houghten, Sheridan (IEEE, 2017-06)
    Public-key cryptography is a fundamental compo- nent of modern electronic communication that can be constructed with many different mathematical processes. Presently, cryp- tosystems based on elliptic curves are becoming popular due to strong cryptographic strength per small key size. At the heart of these schemes is the intractability of the elliptic curve discrete logarithm problem (ECDLP). Pollard’s Rho algorithm is a well known method for solving the ECDLP and thereby breaking ciphers based on elliptic curves. It has the same time complexity as other known methods but is advantageous due to smaller memory requirements. This paper considers how to speed up the Rho process by modifying a key component: the iterating function, which is the part of the algorithm responsible for determining what point is considered next when looking for a collision. It is replaced with an alternative that is found through an evolutionary process. This alternative consistently and significantly decreases the number of iterations required by Pollard’s Rho Algorithm to successfully find a solution to the ECDLP.
  • Lossy Compression of Quality Values in Sequencing Data

    Morales, Veronica Suaste; Houghten, Sheridan (Institute of Electrical and Electronics Engineers (IEEE), 2021-09-01)
    The dropping cost of sequencing human DNA has allowed for fast development of several projects around the world generating huge amounts of DNA sequencing data. This deluge of data has run up against limited storage space, a problem that researchers are trying to solve through compression techniques. In this study we address the compression of SAM files, the standard output files for DNA alignment. We specifically study lossy compression techniques used for quality values reported in the SAM file and analyze the impact of such lossy techniques on the CRAM format. We present a series of experiments using a data set corresponding to individual NA12878 with three different fold coverages. We introduce a new lossy model, dynamic binning, and compare its performance to other lossy techniques, namely Illumina binning, LEON and QVZ. We analyze the compression ratio when using CRAM and also study the impact of the lossy techniques on SNP calling. Our results show that lossy techniques allow a better CRAM compression ratio. Furthermore, we show that SNP calling performance is not negatively affected and may even be boosted.
  • Models of Parkinson's Disease Patient Gait

    Hughes, James Alexander; Houghten, Sheridan; Brown, Joseph Alexander (Institute of Electrical and Electronics Engineers (IEEE), 2020-11)
    Parkinson's Disease is a disorder with diagnostic symptoms that include a change to a walking gait. The disease is problematic to diagnose. An objective method of monitoring the gait of a patient is required to ensure the effectiveness of diagnosis and treatments. We examine the suitability of Extreme Gradient Boosting (XGBoost) and Artificial Neural Network (ANN) Models compared to Symbolic Regression (SR) using genetic programming that was demonstrated to be successful in previous works on gait. The XGBoost and ANN models are found to out-perform SR, but the SR model is more human explainable.
  • A centrality based multi-objective approach to disease gene association

    Collins, Tyler K.; Houghten, Sheridan (Elsevier BV, 2020-06)
    Disease Gene Association nds genes that are involved in the presentation of a given genetic disease. We present a hybrid approach which implements a multi-objective genetic algorithm, where input consists of centrality measures based on various relational biological evidence types merged into a complex network. Multiple objective settings and parameters are studied including the development of a new exchange methodology, safe dealer-based crossover. Successful results with respect to breast cancer and Parkinson's disease compared to previous techniques and popular known databases are shown. In addition, the newly developed methodology is also successfully applied to Alzheimer's disease, further demonstrating its flexibility. Across all three case studies the strongest results were produced by the shortest path-based measures stress and betweenness, either in a single objective parameter setting or when used in conjunction in a multi-objective environment. The new crossover technique achieved the best results when applied to Alzheimer's disease.
  • Restarting and recentering genetic algorithm variations for DNA fragment assembly: The necessity of a multi-strategy approach

    Hughes, James Alexander; Houghten, Sheridan; Ashlock, Daniel (Elsevier BV, 2016-12)
    DNA Fragment assembly – an NP-Hard problem – is one of the major steps in of DNA sequencing. Multiple strategies have been used for this problem, including greedy graph-based algorithms, deBruijn graphs, and the overlap-layout-consensus approach. This study focuses on the overlap-layout-consensus approach. Heuristics and computational intelligence methods are combined to exploit their respective benefits. These algorithm combinations were able to produce high quality results surpassing the best results obtained by a number of competitive algorithms specially designed and tuned for this problem on thirteen of sixteen popular benchmarks. This work also reinforces the necessity of using multiple search strategies as it is clearly observed that algorithm performance is dependent on problem instance; without a deeper look into many searches, top solutions could be missed entirely.
  • Build a Sporadic Group in your Basement

    Becker, Paul; Derka, Martin; Houghten, Sheridan; Ulrich, Jennifer (MAA, 2017-04)
    All simple finite groups are classified as members of specific families. With one exception, these families are infinite collections of groups sharing similar structures. The exceptional family of sporadic groups contains exactly twenty-six members. The five Mathieu groups are the most accessible of these sporadic cases. In this article, we explore connections between Mathieu groups and error-correcting communication codes. These connections permit simple, visual representations of the three largest Mathieu groups: M24, M23, and M22. Along the way, we provide a brief, nontechnical introduction to the field of coding theory.
  • Peer-to-Peer Energy Trading and Energy Conversion in Interconnected Multi-Energy Microgrids Using Multi-Agent Deep Reinforcement Learning

    Chen, Tianyi; Bu, Shengrong; Liu, Xue; Kang, Jikun; Yu, F. Richard; Han, Zhu (Institute of Electrical and Electronics Engineers (IEEE), 2021)
    A key aspect of multi-energy microgrids (MEMGs) is the capability to efficiently convert and store energy in order to reduce the costs and environmental impact. Peer-to-peer (P2P) energy trading is a novel paradigm for decentralized energy market designs. In this paper, we investigate the external P2P energy trading problem and internal energy conversion problem within interconnected residential, commercial and industrial MEMGs. These two problems are complex decision-making problems with enormous high-dimensional data and uncertainty, so a multi-agent deep reinforcement learning approach combining the multi-agent actor-critic algorithm with the twin delayed deep deterministic policy gradient algorithm is proposed. The proposed approach can handle the high-dimensional continuous action space and aligns with the nature of P2P energy trading with multiple MEMGs. Simulation results based on three real-world MG datasets show that the proposed approach significantly reduces each MG's average hourly operation cost. The impact of carbon tax pricing is also considered.