Reconstructing the demographic history of populations and species is a fundamental task in evolutionary analysis.
We develop methods for inferring past demography from small numbers of complete individual genomes.
These methods use the ancestral information encoded in numerous loosely linked genomic loci,
together with a genealogy sampler that accounts for uncertainty in the local ancestry at each locus.
The Generalized Phylogenetic Coalescent Sampler (G-PhoCS), which is based on this approach, was introduced in
a paper we published in Nature Genetics in 2011, where we used it to infer ancient human demography by analyzing
the complete sequenced genomes of six human individuals.
We are currently interested in two specific applications of this approach:
(1) studying the dynamics leading to
species divergence and speciation through analysis of genomes sequenced from closely related species
(see the 2014 study on the origins of domestic dogs); and (2) investigating the demography and evolution of early
human populations in Africa through analysis of
ancient DNA and the genomes of individuals from divergent human populations.
Integration of Functional and Population Genomic Data
A central challenge in genomics is to find constructive ways to integrate diverse types of genomic data.
We are interested in methods for integrating sequence variation data (e.g., individual genome sequences) with
functional genomic data (e.g., ChIP-seq, DNase-seq, RNA-seq). Through this we hope to gain insights into the interplay
between biochemical functions of the genome (e.g., transcription, protein binding) and the forces of natural selection
acting on the DNA sequence. In a paper published in
MBE in 2013, we present INSIGHT, a method for inferring natural selection in short
interspersed genomic elements. In
a companion paper in Nature Genetics, we use this method to study how natural selection has shaped the DNA sequence
at 1.4 million binding sites of 78 transcription factors. We have recently scaled this approach
genome-wide using diverse types of functional genomic data and a simple clustering technique
(see paper on fitCons).
Despite being a classic problem in computational biology, nearly half a century old, the
fundamental task of
reconstructing evolutionary trees (phylogenies) from short sequences still poses interesting
theoretical challenges. Most of our work in
this area focuses on the distance-based approach for phylogenetic reconstruction. We study
and develop algorithms that have provable reconstruction guarantees
(Gronau et al., 2012), as well as methods for computing more statistically robust evolutionary
distances (Gronau et al., 2009;
Doerr et al., 2009).
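
To make the distance-based approach concrete, here is a minimal sketch of its basic workflow: estimate pairwise evolutionary distances from aligned sequences, then choose the tree topology that best fits those distances. It is a textbook illustration, not the algorithms from the papers cited above. It assumes the Jukes-Cantor (JC69) substitution model and resolves a single four-taxon tree (a quartet) with the classic four-point method. The function names and the toy sequences are hypothetical.

```python
import math
from itertools import combinations


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor (JC69) distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    # Observed proportion of differing sites.
    p = sum(1 for x, y in zip(seq_a, seq_b) if x != y) / len(seq_a)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # Saturation: the JC69 correction is undefined at or beyond p = 3/4.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def resolve_quartet(seqs):
    """Pick the quartet split with the smallest sum of within-pair distances.

    seqs maps four taxon names to aligned sequences. The result is a pair
    of pairs, e.g. (("A", "B"), ("C", "D")) for the split AB|CD.
    """
    names = sorted(seqs)
    if len(names) != 4:
        raise ValueError("exactly four taxa are required")
    dist = {
        frozenset(pair): jc69_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(names, 2)
    }
    a, b, c, d = names
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    # Four-point method: for distances that fit a tree, the true split
    # has the smallest of the three pairwise sums.
    return min(splits, key=lambda s: dist[frozenset(s[0])] + dist[frozenset(s[1])])


if __name__ == "__main__":
    toy = {
        "A": "ACGTACGTACGTACGTACGT",
        "B": "ACGTACGTACGTACGTACGA",
        "C": "ACGTTCGTAGGTACCTACGT",
        "D": "ACGTTCGTAGGTACCTACGA",
    }
    print(resolve_quartet(toy))
```

With short sequences the estimated distances are noisy, so the smallest sum can point to the wrong split. That sensitivity is why reconstruction guarantees and more robust distance estimates matter in this setting.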