Large-scale analysis and machine learning of the distribution of genetic traits in populations

PI: Dominika Hozakowska-Roszkowska

We speculate that different locations in the DNA are exposed to varying degrees of evolutionary pressure, and that this pressure differs between populations depending on genetic background, environment and geographic location. It is already known that population groups differ in their rates of SNPs (single nucleotide polymorphisms), and that some SNPs are rare globally but occur frequently in a specific population. It is likely that this holds not only for single SNPs but also for entire regions of the DNA, with important implications for the susceptibility of entire populations to certain diseases or conditions, which could explain the different prevalence of disease subtypes between populations. Besides environmental factors, differences in the local distribution of SNP density may mean that the same SNP has different consequences in different sub-populations, for example in a patient group where that SNP lies in an evolutionarily more conserved region.

Therefore, we are developing a novel method for the large-scale analysis of genetic data, aimed at uncovering statistically significant differences in the distribution of genetic traits and DNA modifications between populations.
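To make the idea of a statistically significant difference between populations concrete, the following is a minimal sketch and not the method developed in this project. It assumes the simplest possible case: a single SNP, two populations, and a Pearson chi-square test on the 2x2 table of allele counts. The function name `allele_frequency_test` and all counts are hypothetical and chosen for illustration only.

```python
import math


def allele_frequency_test(alt_a, total_a, alt_b, total_b):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    alt_x   -- number of alternate alleles observed in population x
    total_x -- total number of alleles sampled in population x
    Returns (chi2 statistic, p-value). Illustrative helper, not the project's method.
    """
    observed = [
        [alt_a, total_a - alt_a],
        [alt_b, total_b - alt_b],
    ]
    n = total_a + total_b
    row_sums = [total_a, total_b]
    col_sums = [alt_a + alt_b, n - (alt_a + alt_b)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            if expected == 0:
                # Allele is absent (or fixed) in both populations: no evidence of a difference.
                return 0.0, 1.0
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the alternate allele is common in population A, rare in B.
chi2, p = allele_frequency_test(alt_a=90, total_a=100, alt_b=10, total_b=100)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
```

In a genome-wide setting, such a test would be repeated across many sites or regions, so the resulting p-values would need correction for multiple testing.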

The increasing number of sequenced human genomes allows for an in-depth analysis of evolutionary conservation. More and more consortia are being formed, producing a tremendous wealth of data that is rapidly made available to the public. Examples include the 1000 Genomes Project and national initiatives, such as those in Denmark and Great Britain, which seek to sequence over 100,000 genomes.

Analyzing the entire genome and using the available data from different consortia requires a lot of computational power. Therefore, all processing, data handling and calculations in the project are performed on ABACUS2.0, located at SDU in Odense, Denmark.

Published by Desirée Suhr Pérez on January 8, 2020