Mathematical and Computational Biology Stream

Identifying population-level genetic variations in the Indian population

Faculty: Bratati Kahali (CBR), Govindan Rangarajan (Department of Mathematics), and Balaji Jayaprakash (CNS)

What genetic factors can make humans susceptible to specific diseases or heritable traits?

How do we measure these genetic factors? They are genetic variations observed across a population or, much more rarely, only within families. They include single nucleotide polymorphisms, insertions and deletions, copy number variations, and other kinds of genetic variation. These variations can determine an individual's susceptibility to disease.

How do we uncover these genetic variations in our own population? We will analyze whole genome sequencing (WGS) data from tens of thousands of Indian individuals to estimate the variations in our population. This is a huge computational challenge, because such data are most valuable in aggregate: jointly analyzing all the genomes increases sample size and statistical power. We will jointly analyze tens of thousands of genomes by optimizing the use of computing resources, enabling virtualization, and parsing the datasets efficiently, so that the analysis can actually be completed and biological information obtained from raw sequence read datasets. We will further work on making the computation scale in a sustainable and unrestrained manner. This is a research problem in genetic big data, and it has never been done before in India. Next, we will construct a better imputation reference panel by merging genetic variant information from Indian WGS with existing cosmopolitan imputation panels. This will facilitate accurate identification of genetic loci for susceptibility to complex multifactorial diseases in the Indian population.
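To make the point about sample size concrete, here is a minimal Python sketch of how the precision of an allele-frequency estimate improves as more genomes are analyzed together. It is an illustration only, not the project's pipeline: the function names and the toy genotypes are hypothetical, and the standard-error formula assumes unrelated individuals, so that the 2N sampled chromosomes can be treated as independent binomial draws.

```python
import math


def allele_frequency(genotypes):
    """Estimate the alternate-allele frequency from diploid genotypes.

    Each genotype is coded as the number of alternate alleles (0, 1 or 2).
    """
    n_chromosomes = 2 * len(genotypes)
    return sum(genotypes) / n_chromosomes


def standard_error(p, n_individuals):
    """Binomial standard error of a frequency estimated from 2N chromosomes.

    Assumes independent chromosomes (unrelated individuals).
    """
    return math.sqrt(p * (1 - p) / (2 * n_individuals))


# Hypothetical toy cohort of 10 individuals at a single variant site.
genotypes = [0, 1, 0, 2, 0, 0, 1, 0, 0, 0]
p_hat = allele_frequency(genotypes)  # 4 alternate alleles / 20 chromosomes = 0.2
print(f"estimated frequency: {p_hat}")

# The standard error shrinks as 1 / sqrt(N), so rare variants need large cohorts.
for n in (100, 10_000):
    print(f"N = {n}: standard error at p = 0.01 is {standard_error(0.01, n):.6f}")
```

Under these assumptions, a hundredfold increase in the number of individuals reduces the standard error tenfold, which is one reason rare variants can only be characterized reliably in large, jointly analyzed cohorts.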

This project will train a student in advanced computational and statistical human genetics.