• Development of a Transcriptome-Based Genome Assembly Tool and Whole Genome Sequencing for Autism Spectrum Disorders

      Baldwin, Robert; Centre for Biotechnology
      This thesis consisted of two independent projects. The first involved developing a software tool that uses transcriptome data to improve genome assemblies. The second involved processing and analyzing whole genome sequencing (WGS) data from the ASPIRE autism spectrum disorder (ASD) cohort. The first project produced the bioinformatics software called RDNA. This free tool was written in Perl and should be valuable for users interested in genome assembly. Comparative assessment between RDNA and the leading transcript-based scaffolding software showed that RDNA can significantly improve genome assemblies while making relatively few scaffolding connection errors. RDNA also makes possible the assembly of scaffolding connections, including gap filling, using BLAST. The second project was undertaken with collaborators. The ASPIRE ASD cohort consisted of several hundred probands from both simplex and multiplex families. WGS was performed on 120 of these individuals, who were selected based on membership in two phenotype clusters (C1 and C2). These individuals had a relatively high rate of intellectual disability (ID) compared to heavily studied ASD cohorts such as the Simons Simplex Collection (SSC), indicating a significant involvement of de novo sequence variants. Analysis of rare single nucleotide variants (SNVs) and insertions/deletions (indels) identified large risk factors for severe neurodevelopmental disorders (NDDs), two of which were previously observed de novo among individuals with severe, undiagnosed NDDs. On this basis, ABCA1 was found to be a novel candidate risk gene. Gene Ontology (GO) analysis of rare loss-of-function and missense SNVs indicated the importance of lipid metabolic processes and synaptic signalling.
Overall, the genetic variation examined by this study pertained to a modest number of cases, consistent with previous findings that ASD is a genetically heterogeneous disorder with a complex genetic architecture.