ML24 | Polymath Jr

Generative Machine Learning Models for Data Assimilation

This is a brief introduction to the 2024 Polymath Jr project that will be run by Ricardo Baptista and Giulio Trigila.

Broad Goal Of The Project. Generative machine learning aims to characterize probability distributions from samples. These distributions quantify our beliefs about the possible outcomes of an experiment or hypothesis, such as:

Is it likely to rain in New York tomorrow?
Will the Boston Red Sox win the World Series in 2024?

In practice, we summarize our beliefs by computing properties of these distributions, such as their moments and quantiles. This project focuses on the distribution of the state of a dynamical system given a stream of noisy observations, which is known as the filtering distribution. We will build algorithms that sample from filtering distributions using tools from optimal transport. When the data arrive sequentially in time and are incorporated online, this procedure is also known as data assimilation.
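To make the idea of sampling a filtering distribution concrete, here is a minimal sketch of a bootstrap particle filter, a standard baseline rather than the optimal transport methods this project will develop. The scalar linear-Gaussian model, the function name bootstrap_particle_filter, and all parameter values are assumptions chosen purely for illustration.

```python
import numpy as np

def bootstrap_particle_filter(observations, n_particles=1000, a=0.9,
                              process_std=0.5, obs_std=1.0, seed=0):
    """Approximate the filtering means E[x_t | y_1, ..., y_t].

    Assumed toy model (illustrative, not from the project description):
        x_t = a * x_{t-1} + process noise,   y_t = x_t + observation noise.
    """
    rng = np.random.default_rng(seed)
    # Initial ensemble: samples from an assumed standard normal prior.
    particles = rng.normal(0.0, 1.0, size=n_particles)
    means = []
    for y in observations:
        # Forecast step: push every particle through the noisy dynamics.
        particles = a * particles + rng.normal(0.0, process_std, size=n_particles)
        # Analysis step: weight particles by the Gaussian likelihood of y.
        log_w = -0.5 * ((y - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        w /= w.sum()
        means.append(np.sum(w * particles))
        # Resample so the ensemble is again a set of equally weighted samples.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return np.array(means)

# Synthetic experiment: simulate a hidden state and its noisy observations.
rng = np.random.default_rng(1)
T = 50
x = np.zeros(T)
y = np.zeros(T)
state = 0.0
for t in range(T):
    state = 0.9 * state + rng.normal(0.0, 0.5)
    x[t] = state
    y[t] = state + rng.normal(0.0, 1.0)

filtered_means = bootstrap_particle_filter(y)
print("RMSE of raw observations:", np.sqrt(np.mean((y - x) ** 2)))
print("RMSE of filtered means:  ", np.sqrt(np.mean((filtered_means - x) ** 2)))
```

The resampling step is where this baseline tends to struggle: when few particles receive significant weight, the ensemble collapses. Transport-based approaches instead seek a map that moves forecast samples to samples of the filtering distribution.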

The applications of this project range from improving weather forecasts to financial modeling. For example, data assimilation is a core operational procedure for estimating the state of the atmosphere from limited measurements at weather stations. It is run routinely by weather prediction services and atmospheric research centers across the globe. Robust and unbiased methods for these tasks would significantly benefit such applications.

General references.

Relevant papers.

Normalizing Flows for Probabilistic Modeling and Inference by G. Papamakarios et al.
An Optimal Transport Formulation of Bayes’ Law for Nonlinear Filtering Algorithms by A. Taghvaei and B. Hosseini.
Optimal Transport Particle Filters by M. Al-Jarrah, B. Hosseini and A. Taghvaei.
A family of nonparametric density estimation algorithms by E.G. Tabak and C.V. Turner.
Data Driven Optimal Transport by E.G. Tabak and G. Trigila.