Mathematical and Computational Biology Stream

Elucidating the genetic architecture of human complex traits beyond single variant genome wide association studies

Faculty: Sridharan Devarajan (CNS) and Bratati Kahali (CBR)

Complex traits and diseases in humans result from the effects of multiple genetic loci acting in conjunction with environmental influences. The genome-wide association study (GWAS) is a robust, widely used approach for studying associations between genome-scale genetic variation and complex diseases or phenotypic traits. However, the genetic variants detected by GWAS explain only part of the phenotypic variance of these diseases and traits. This missing heritability can be explained by several genetic mechanisms, including high-effect contributions from rare and structural variants, gene-gene interactions, and gene-environment interactions. The genes or genetic loci affecting one complex trait often affect other traits as well and are thus pleiotropic. Their interactions with other genes or genetic loci could also determine the extent to which multiple traits are affected. Thus, when the phenotypic traits associated with one genetic locus are modulated by other loci, the result is variation in disease penetrance and expressivity. This thesis will investigate how gene-gene interactions, gene-environment interactions, and inter-trait correlation can elucidate the biology of complex traits, with the aim of explaining the genetic architecture of metabolic disorders (central adiposity, type 2 diabetes, abnormal lipid profile, hypertension) and their relation to cognitive function.

We will use whole-genome sequencing data. Jointly analyzing many genomes increases sample size and statistical power, but it also poses a substantial computational challenge. We will work on jointly analyzing tens of thousands of genomes by optimizing the use of computing resources, parsing the datasets efficiently, and extracting biological information from raw sequence read datasets. We will further work on computation that continues to scale sustainably as the number of genomes grows. This is a research problem in big genetic data and has rarely been attempted in India. We will employ machine learning, multifactor dimensionality reduction, mediation analysis, and linear or logistic regression models to evaluate the cross-sectional associations among the above-mentioned disease measures, while adjusting for age, gender, population stratification, and commonly correlated phenotypic measures.
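To make one of the methods named above concrete, the following is a minimal sketch of the core step of multifactor dimensionality reduction (MDR) for detecting a gene-gene interaction between two loci. It is an illustration under stated assumptions, not the project's pipeline: the function names (`mdr_pair`, `best_pair`) and the toy genotypes are hypothetical, the data are constructed so that only the combination of the first two loci predicts case status, and the cross-validation and permutation testing that a real MDR analysis requires are omitted.

```python
from collections import defaultdict
from itertools import combinations


def mdr_pair(genotypes, status, i, j):
    """Score one pair of loci with the core MDR step (no cross-validation).

    genotypes: one list of genotype codes per individual
    status:    0 for control, 1 for case (assumes at least one of each)
    Returns (high_risk_cells, balanced_accuracy).
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio

    # Count controls and cases in each two-locus genotype cell.
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for g, y in zip(genotypes, status):
        counts[(g[i], g[j])][y] += 1

    # A cell is "high risk" if its case:control ratio exceeds the threshold.
    high_risk = {cell for cell, (ctrl, case) in counts.items()
                 if case > threshold * ctrl}

    # Pool cells into one binary attribute and measure how well it classifies.
    tp = sum(1 for g, y in zip(genotypes, status)
             if y == 1 and (g[i], g[j]) in high_risk)
    tn = sum(1 for g, y in zip(genotypes, status)
             if y == 0 and (g[i], g[j]) not in high_risk)
    balanced_accuracy = 0.5 * (tp / n_cases + tn / n_controls)
    return high_risk, balanced_accuracy


def best_pair(genotypes, status):
    """Return the pair of locus indices with the highest balanced accuracy."""
    n_loci = len(genotypes[0])
    return max(combinations(range(n_loci), 2),
               key=lambda pair: mdr_pair(genotypes, status, *pair)[1])


# Hypothetical toy data: an individual is a case only when loci 0 and 1 differ;
# locus 2 is unrelated to case status.
toy_genotypes = [
    [0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1],  # controls
    [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1],  # cases
]
toy_status = [0, 0, 0, 0, 1, 1, 1, 1]

cells, accuracy = mdr_pair(toy_genotypes, toy_status, 0, 1)
top_pair = best_pair(toy_genotypes, toy_status)
```

In this toy example neither locus 0 nor locus 1 is associated with case status on its own, which is the kind of signal a single-variant test would miss. A regression model with an interaction term and covariates such as age and population-stratification components would be the natural complement for quantifying the effect.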

This project will train a student in advanced computational genomics and statistical human genetics.