Identification and characterization of polymorphic mobile elements (MEs) in humans
Retrotransposons are mobile elements (MEs) that propagate in a “copy and paste” fashion in genomes via RNA intermediates. In the human genome, retrotransposons consist of long terminal repeats (LTRs), long interspersed elements (LINEs), short interspersed elements (SINEs), SINE-VNTR-Alus (SVAs), and processed pseudogenes (PPSGs), and they collectively contribute close to 50% of the genome. Some members of these ME classes continue to undergo retrotransposition, thereby generating a type of structural variation (SV) within and between human populations through the presence or absence of ME insertions at specific genomic locations. A large number of such polymorphic MEs have been previously reported and documented, including cases associated with diseases, but with limited sequence characterization and genotype analysis. In this study, we performed extensive computational analysis and compilation of polymorphic MEs from multiple sources. We focused on characterizing the complete sequences representing the insertion alleles and pre-integration alleles of polymorphic ME loci, using methods that included, for many entries, local sequence assembly based on rich personal genome sequence data. Further, we performed in silico genotyping and analyzed the population distribution of these polymorphic MEs in 2600 human subjects representing 28 well-recognized populations around the world, and we carried out a phylogenetic analysis of these subjects using the polymorphic MEs as markers. We identified a total of 4400 polymorphic MEs with full sequence characterization for both the pre-integration and insertion alleles. Among these, 1267 entries represent new insertions not previously documented in the Database of Retrotransposon Insertion Polymorphisms in humans (dbRIP), and 1777 entries represent ME insertions outside the current human reference genome.
Both within individual populations and across all samples as a whole, all five ME types displayed a similar allele distribution pattern, with the majority having an allele frequency at 0.5, while differences across ME types are also seen at the very low frequency range. Nevertheless, polymorphic MEs do show substantial geographic differentiation, with numerous continent-specific loci identified. Polymorphic ME-based clustering of human subjects correlates well with what is known about the history and relationships of human populations, indicating the usefulness of polymorphic MEs as markers for studying human evolution. Furthermore, polymorphic MEs were found to contribute to both coding and regulatory sequences, signifying their potential contribution to the phenotypic diversity present among human populations and individuals. In conclusion, polymorphic MEs represent a significant source of human genetic diversity, with the potential to impact the structure, function, and evolution of the human genome.
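As an illustration of the allele-frequency and geographic-differentiation analysis summarized above, the following minimal Python sketch computes per-group insertion allele frequencies from diploid presence/absence genotypes and flags loci whose insertion allele is seen in only one group. It is not the pipeline used in this study: the function names, the genotype encoding (0, 1, or 2 copies of the insertion allele, `None` for a missing call), the group labels, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def insertion_allele_frequencies(genotypes, sample_groups):
    """Return {locus: {group: insertion allele frequency}}.

    genotypes: {locus: {sample: count}}, where count is the number of
        insertion alleles (0, 1, or 2) or None for a missing call
        (hypothetical encoding).
    sample_groups: {sample: group label}, e.g. a population or continent.
    """
    freqs = {}
    for locus, calls in genotypes.items():
        ins = defaultdict(int)    # insertion alleles observed per group
        total = defaultdict(int)  # called alleles per group
        for sample, count in calls.items():
            if count is None:     # skip missing genotype calls
                continue
            group = sample_groups[sample]
            ins[group] += count
            total[group] += 2     # diploid: two alleles per called sample
        freqs[locus] = {g: ins[g] / total[g] for g in total}
    return freqs


def group_specific_loci(freqs):
    """Return {locus: group} for loci whose insertion allele occurs in exactly one group."""
    specific = {}
    for locus, by_group in freqs.items():
        present = [g for g, f in by_group.items() if f > 0]
        if len(present) == 1:
            specific[locus] = present[0]
    return specific


# Toy data (hypothetical samples, groups, and loci).
sample_groups = {"s1": "AFR", "s2": "AFR", "s3": "EUR", "s4": "EUR"}
genotypes = {
    "locus_A": {"s1": 1, "s2": 2, "s3": 0, "s4": 0},
    "locus_B": {"s1": 1, "s2": 0, "s3": 1, "s4": None},
}

freqs = insertion_allele_frequencies(genotypes, sample_groups)
specific = group_specific_loci(freqs)
print(freqs)
print(specific)
```

In this toy input, `locus_A` carries the insertion allele only in the group labeled `AFR`, so it is reported as group-specific, whereas `locus_B` is shared between groups. A real analysis would also need to account for sample size and genotyping error before calling a locus specific to any group.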