Model-based inference in phylogeography: from single species to communities

CompPhylo workshops introduce a range of tools for making statistical inferences about historical processes from genetic and genomic data. After an introduction to statistical approaches to model-based inference, participants are introduced to inference frameworks that span taxonomic scales, from single-species demographic inference, to multi-species comparative analysis, to inference at the scale of the whole community. Participants will also get hands-on experience using approximate Bayesian computation, supervised machine learning, and composite likelihood methods for model comparison.

Participants will work on sample datasets but are also encouraged to bring their own data. The advantages and limitations of each method, its fit to participants' datasets, field sampling design, and the selection and use of genetic markers will be actively discussed. In addition, to strengthen connections and exchanges between researchers, participants will have the opportunity to present their own work in the evenings.

Upcoming events:

Details about methods

ABLE

ABLE is a composite likelihood method for the joint inference of arbitrary population histories and the genome-wide recombination rate. It makes use of the distribution of blockwise site frequency spectrum (bSFS) patterns, which retain information on the variation in genealogies spanning short-range linkage blocks across the genome. ABLE does not require phased data, as the bSFS does not distinguish the sampled lineage in which a mutation has occurred. As with the SFS, outgroup information can also be ignored by folding the bSFS. ABLE takes advantage of OpenMP parallelization and is tailored for studying population histories of model as well as non-model species.

ABLE stands for Approximate Blockwise Likelihood Estimation. It is written in C/C++ and authored by Champak Beeravolu Reddy.
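To make the bSFS concrete, the following Python sketch tabulates it for toy data. It is not ABLE's code or input format: the function names `block_sfs` and `bsfs`, the 0/1 allele encoding, and the example blocks are assumptions made for illustration. Each block is reduced to a vector of site counts per allele-frequency class, and the bSFS is the tally of how often each vector occurs across blocks. Only allele counts per site are used, so no phase information is needed, and folding replaces derived-allele counts with minor-allele counts.

```python
from collections import Counter


def block_sfs(block, fold=False):
    """Return the SFS of one block as a tuple of site counts.

    `block` is a list of sites; each site is a list of 0/1 alleles,
    one per sampled lineage (0 = ancestral, 1 = derived). This encoding
    is an assumption of the sketch, not ABLE's input format.
    """
    n = len(block[0])                 # number of sampled lineages
    size = n // 2 if fold else n - 1  # number of frequency classes
    counts = [0] * size
    for site in block:
        k = sum(site)                 # copies of allele 1; lineage order is irrelevant
        if k == 0 or k == n:
            continue                  # monomorphic site, not part of the SFS
        if fold:
            k = min(k, n - k)         # minor-allele count, no outgroup needed
        counts[k - 1] += 1
    return tuple(counts)


def bsfs(blocks, fold=False):
    """Tally how many blocks show each within-block SFS pattern."""
    return Counter(block_sfs(b, fold) for b in blocks)


# Three toy blocks sampled from four lineages.
blocks = [
    [[1, 0, 0, 0], [1, 1, 0, 0]],  # one singleton, one doubleton
    [[0, 0, 0, 1], [0, 1, 1, 0]],  # same pattern, carried by different lineages
    [[1, 1, 1, 0]],                # one tripleton
]

print(bsfs(blocks))             # Counter({(1, 1, 0): 2, (0, 0, 1): 1})
print(bsfs(blocks, fold=True))  # Counter({(1, 1): 2, (1, 0): 1})
```

The first two blocks fall into the same bSFS class even though different lineages carry the mutations, which is why phasing is unnecessary. In the folded version the tripleton becomes a minor-allele singleton.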

CAMI

CAMI employs a stochastic algorithm to simulate communities assembled under environmental filtering, competitive exclusion, and neutral species assembly processes, simultaneously considering phylogenetic and phenotypic information from species in local and regional communities. CAMI parameterizes the relative strength of the assembly processes to mimic strong to weak non-neutral assembly. CAMI implements a model-based inference procedure using two approximate approaches, random forests and approximate Bayesian computation. Additionally, because the strength of non-neutral assembly models is parameterized, the strength parameter can be estimated.

CAMI stands for Community Assembly Model Inference and is implemented as an R package.

MESS

MESS is a novel comparative phylogeographic model grounded in community ecological theory. This integrative approach makes use of four data axes (distributions of traits, abundances, genetic diversities/divergences, and phylogenetic patterns) to enable testing alternative community assembly models (neutral vs non-neutral) and estimating parameters underlying different assembly processes (e.g. dispersal vs in situ speciation). This method capitalizes on the widespread use of DNA barcoding and meta-barcoding approaches and is implemented in the software package MESS co-developed by I. Overcast & M. Ruffley.

Multi-DICE

Multi-DICE is an R package for constructing hierarchical co-demographic models and simulating multi-taxa summary statistic vectors in order to perform comparative demographic inference within a single, unified analysis. Multi-DICE simulations have previously been used with supervised machine learning (specifically random forests) and approximate Bayesian computation (ABC) for statistical inference of temporal synchrony in single-population size changes across multiple taxa. Additionally, a currently unpublished modification of Multi-DICE is introduced that deploys a co-demographic model of population pairs to investigate congruence in co-divergence.
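The idea of a hierarchical co-demographic model can be sketched as follows. This is not Multi-DICE code (Multi-DICE is an R package) but a minimal Python illustration under assumed names: a hyperparameter, here called `zeta`, gives the proportion of taxa whose population size changed at one shared time, and the remaining taxa draw their times independently. The function name `draw_pulse_times`, the uniform priors, and the time bounds are all hypothetical.

```python
import random


def draw_pulse_times(n_taxa, time_prior=(10_000.0, 500_000.0), seed=None):
    """Draw one set of size-change times under a toy hierarchical prior.

    Returns (zeta, times): zeta is the proportion of taxa sharing a single
    synchronous event, and times holds one event time per taxon
    (arbitrary time units). Requires n_taxa >= 2.
    """
    rng = random.Random(seed)
    # Hyperprior on the number of synchronous taxa. A "shared" event
    # involving exactly one taxon is meaningless, so 1 is excluded.
    n_sync = rng.choice([0] + list(range(2, n_taxa + 1)))
    zeta = n_sync / n_taxa
    shared_time = rng.uniform(*time_prior)
    times = [shared_time] * n_sync
    # Remaining taxa respond idiosyncratically, each with its own time.
    times += [rng.uniform(*time_prior) for _ in range(n_taxa - n_sync)]
    rng.shuffle(times)  # which taxa are synchronous is itself random
    return zeta, times


zeta, times = draw_pulse_times(n_taxa=8, seed=42)
print(f"zeta = {zeta:.3f}")
for taxon, t in enumerate(times, start=1):
    print(f"taxon {taxon}: {t:,.0f}")
```

In a full analysis each draw would parameterize one multi-taxa simulation, and the resulting summary statistic vector would be paired with `zeta` as a training or reference example for random forest or ABC inference.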

PipeMaster

PipeMaster is an R package for building demographic models and simulating data under the coalescent model. The current implementation can simulate Sanger-type and next-generation sequencing data for single species or species complexes. It is also possible to simulate single-locus data for hierarchical demographic models of comparative phylogeography and species trees with one horizontal connection (phylogenetic networks). PipeMaster simulates summary statistics and coalescent trees, and it calculates the same summary statistics on empirical data. The user can use these summary statistics to perform approximate Bayesian computation (ABC) or supervised machine learning (SML) for model and parameter inference.
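The simulate-then-compare logic behind ABC can be illustrated with a minimal rejection sampler. This Python sketch does not use PipeMaster (an R package) or its API. The toy simulator, the single summary statistic (mean number of segregating sites per locus), the uniform prior on the scaled mutation rate `theta`, and all function names are assumptions chosen to keep the example short.

```python
import random


def simulate_segregating_sites(theta, n_samples, rng):
    """Number of segregating sites at one non-recombining locus.

    Standard neutral coalescent with infinite sites: while k lineages
    remain, the next event is a mutation with probability
    theta / (theta + k - 1), otherwise a coalescence.
    """
    s = 0
    for k in range(n_samples, 1, -1):
        p_mut = theta / (theta + k - 1)
        while rng.random() < p_mut:
            s += 1
    return s


def mean_segregating_sites(theta, n_samples, n_loci, rng):
    """Summary statistic: average segregating sites across loci."""
    total = sum(simulate_segregating_sites(theta, n_samples, rng)
                for _ in range(n_loci))
    return total / n_loci


def abc_rejection(observed, n_sims=2000, keep_fraction=0.05,
                  n_samples=10, n_loci=20, prior=(0.1, 20.0), seed=1):
    """Keep the prior draws whose simulated summary is closest to `observed`."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior)                        # draw from the prior
        stat = mean_segregating_sites(theta, n_samples, n_loci, rng)
        draws.append((abs(stat - observed), theta))        # distance to data
    draws.sort()
    n_keep = max(1, int(n_sims * keep_fraction))
    return [theta for _, theta in draws[:n_keep]]          # approximate posterior


if __name__ == "__main__":
    # Pretend the empirical data gave the expected value for theta = 5.
    observed = 5.0 * sum(1.0 / i for i in range(1, 10))
    posterior = abc_rejection(observed)
    print(f"accepted {len(posterior)} draws, "
          f"posterior mean theta = {sum(posterior) / len(posterior):.2f}")
```

Real analyses use many summary statistics, so the distance becomes multivariate and usually needs scaling. The same table of simulated parameters and statistics can instead be used to train a supervised classifier or regressor.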

What is phylogeography/comparative phylogeography?

Who should attend?

CompPhylo workshops are geared toward practicing field biologists with little or no computational experience. Attendees should have general knowledge of evolutionary biology and of population genetic and phylogenetic analyses. Basic knowledge of R, bash/Linux, and Python scripting is useful.

Workshop attendees will need to bring a laptop computer.

Past events:

Acknowledgements

PoreCamp - which inspired the design of this workshop and also of this site.

RADCamp - RAD-Seq assembly and analysis workshop resources.

ANU, University of Adelaide, UT Arlington, and UNAM - for hosting previous workshops on Multi-DICE.