3. Session - Training and Inference of Coalescent Models

In this last session, we will apply what we have learned about neural networks to a different architecture, one suited to processing graphs.

Graph Neural Networks (GNNs) are a relatively new and exciting area of research in the field of deep learning. GNNs are designed to operate on graph-structured data, where the input data is represented as a graph, consisting of nodes and edges. GNNs have shown great potential in various fields such as social network analysis, molecule structure prediction, and recommender systems.

One of the core components of GNNs is the graph convolution operation. Unlike regular convolutions, which operate on fixed grid-like structures such as images, graph convolutions operate on irregular graph structures. Graph convolutions aim to propagate information between nodes in a graph, taking into account the graph structure.

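The key difference between graph convolutions and regular convolutions lies in the neighborhood they operate on. Both use learnable weights that are shared across positions, but a regular convolution slides a fixed-size kernel over a grid in which every pixel has the same number of neighbors in the same arrangement. In a graph, each node can have a different number of neighbors and the neighbors have no canonical order. A graph convolution therefore aggregates the features of each node's neighbors with an order-independent function (such as a sum or a mean) and then applies the shared learnable weights. This allows the model to learn node representations that capture the graph structure, which can be used for various downstream tasks.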
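To make the aggregation step concrete, here is a minimal NumPy sketch of a single graph convolution in the style of a GCN layer: add self-loops, normalize the adjacency matrix symmetrically, average over neighbors, and apply a shared weight matrix followed by a ReLU. The function name `gcn_layer`, the toy path graph, and the random weights are illustrative assumptions, not part of the course material.

```python
import numpy as np


def gcn_layer(adj, features, weight):
    """One graph-convolution step (illustrative sketch, not a library API)."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                      # add self-loops so a node keeps its own features
    deg = a_hat.sum(axis=1)                      # node degrees including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Aggregate neighbor features, apply the shared weights, then ReLU
    return np.maximum(a_norm @ features @ weight, 0.0)


# Toy example: a path graph 0 - 1 - 2 with one-hot node features
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.eye(3)
rng = np.random.default_rng(0)
weight = rng.normal(size=(3, 2))                 # hypothetical weights; normally learned

hidden = gcn_layer(adj, features, weight)
print(hidden.shape)
```

The same `weight` matrix is used for every node. Only the normalized adjacency matrix changes from graph to graph, which is what lets one layer handle neighborhoods of any size.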

Some common tasks that GNNs are used for include:

  1. Node classification: In this task, each node in the graph is assigned a label based on the graph structure and its features.

  2. Link prediction: Here, the goal is to predict missing links in a graph, given the existing graph structure.

  3. Graph classification: In this task, the entire graph is assigned a label, based on the structure and features of the nodes and edges.

  4. Graph generation: This task involves generating new graphs that have similar structure and properties to a given set of input graphs.

Overall, GNNs and graph convolutions are a promising approach for learning from graph-structured data, with applications across a wide range of fields.

3.0 Background - ARGs, msprime, tsinfer, tsdate

Ancestral recombination graphs (ARGs) are graph structures that represent the genealogical history of a set of sampled genomes, including the coalescence and recombination events that relate them. They capture the patterns of genetic inheritance and recombination that occur in populations, and can be used to simulate genetic data, infer evolutionary histories, and estimate population parameters.

The software package msprime is a popular tool for simulating ARGs under a variety of demographic scenarios. msprime uses a coalescent simulation framework, which models the process by which lineages coalesce over time, to generate ARGs that reflect the demographic history of a population.

Once an ARG is generated, it can be analyzed using tskit, a library for storing and processing ARGs in the succinct tree sequence format. tskit provides efficient data structures and algorithms for manipulating ARGs, and can be used for tasks such as iterating over the local trees along the genome, computing population genetic statistics, and extracting the nodes and edges of the graph.

Another important tool for working with ARGs is tsinfer, which is a method for inferring ARGs directly from phased genotype data. tsinfer first reconstructs putative ancestral haplotypes from the variation shared among samples, and then uses a haplotype copying model to match each haplotype to older ancestors, producing an ARG topology that is consistent with the observed genetic data. This can be useful for reconstructing the evolutionary history of a population, or for identifying regions of the genome that have been subject to natural selection.

Finally, tsdate is a method for estimating the timescale of evolutionary events in an ARG. Given an ARG topology (for example, one inferred by tsinfer) and a mutation rate, tsdate uses a Bayesian approach based on the mutations observed along the branches to estimate the age of each ancestral node, and hence the time to the most recent common ancestor (TMRCA) of the samples in each local tree. This information can be used to date the evolutionary events that are represented in the ARG, such as the time of a population split or the onset of a selective sweep.

Overall, ARGs and the tools for working with them represent a powerful framework for studying the evolutionary history of populations and the genetic variation that underlies it. These tools are increasingly being used in fields such as population genetics, evolutionary biology, and human genetics, and are likely to play an important role in future research in these areas.

3.1 Simulation of ARGs under the Beta-coalescent