Published on February 18, 2026–Updated on February 18, 2026
Two new published papers co-authored by Charles Bouveyron
Two new published papers co-authored by Charles Bouveyron are available on HAL.
➊ Scaling Optimal Transport to High-Dimensional Gaussian Distributions with Application to Domain Adaptation
Charles Bouveyron, 3IA Chairholder, Marco Corneli, Junior Professor associated with the 3IA
Abstract: Optimal transport (OT) has recently become very popular in machine learning, with application to several sub-fields such as clustering, dictionary learning and domain adaptation. However, OT is known to face challenges when dealing with high-dimensional data, such as images, texts or omics data. Most current OT approaches for high-dimensional situations rely on projections of the data or measures onto low-dimensional spaces, which inevitably results in information loss. In this work, we consider the case of high-dimensional Gaussian distributions with parsimonious covariance structures and lower intrinsic dimension. We exhibit a simplified closed-form expression of the 2-Wasserstein distance with an efficient and robust calculation procedure based on a low-dimensional decomposition of empirical covariance matrices, without relying on data projections. Furthermore, we provide a closed-form expression for the Monge map, which involves the exact calculation of the square-root and inverse square-root of the source distribution covariance matrix. This approach offers analytical and computational advantages, as demonstrated by our numerical experiments, which quantitatively evaluate these benefits in comparison to existing methods. In addition to being able to compute both the $W_2^2$ distance and the transport map, our method can compete with model-free methods, in high dimension, even in the case of non-Gaussian distributions. Moreover, it proves to be of particular interest in the context of unsupervised domain adaptation for supervised classification.
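For readers unfamiliar with the quantities named in this abstract, the sketch below implements the classical closed-form squared 2-Wasserstein distance and Monge map between two Gaussians with full covariance matrices. It is only an illustration of the textbook formulas, not the low-dimensional procedure proposed in the paper; the function names (`psd_sqrt`, `gaussian_w2_squared`, `gaussian_monge_map`) are hypothetical, and the source covariance is assumed to be invertible.

```python
import numpy as np

def psd_sqrt(mat):
    """Square root of a symmetric positive semi-definite matrix (eigendecomposition)."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def gaussian_w2_squared(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2).

    Classical formula: ||m1 - m2||^2 + tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2}).
    """
    S1_half = psd_sqrt(S1)
    cross = psd_sqrt(S1_half @ S2 @ S1_half)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))

def gaussian_monge_map(m1, S1, m2, S2):
    """Monge map T(x) = m2 + A (x - m1) sending N(m1, S1) onto N(m2, S2).

    A = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}; assumes S1 is invertible.
    """
    S1_half = psd_sqrt(S1)
    S1_inv_half = np.linalg.inv(S1_half)
    A = S1_inv_half @ psd_sqrt(S1_half @ S2 @ S1_half) @ S1_inv_half
    # A is symmetric, so rows of x (shape (n, d)) can be mapped with x @ A
    return lambda x: m2 + (x - m1) @ A

# Toy example (made-up values): source N(0, I), target N((3, 4), 4 I)
m1, S1 = np.zeros(2), np.eye(2)
m2, S2 = np.array([3.0, 4.0]), 4.0 * np.eye(2)
print(gaussian_w2_squared(m1, S1, m2, S2))
print(gaussian_monge_map(m1, S1, m2, S2)(np.array([[1.0, 1.0]])))
```

In this toy case the distance can be checked by hand: the mean term contributes 25 and the covariance term contributes 2, and the map reduces to T(x) = m2 + 2x. Both matrix square roots cost O(d^3) in the dimension d, which is the kind of cost a low-dimensional decomposition aims to avoid.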
➋ Stick-Breaking Embedded Topic Model with Continuous Optimal Transport for Online Analysis of Document Streams
Federica Granese, Researcher at Centre Inria d'Université Côte d'Azur, Serena Villata, 3IA Scientific Director and Chairholder, Charles Bouveyron, 3IA Chairholder
Abstract: Online topic models are unsupervised algorithms to identify latent topics in data streams that continuously evolve over time. Although these methods naturally align with real-world scenarios, they have received considerably less attention from the community compared to their offline counterparts, due to specific additional challenges. To tackle these issues, we present SB-SETM, an innovative model extending the Embedded Topic Model (ETM) to process data streams by merging models formed on successive partial document batches. To this end, SB-SETM (i) leverages a truncated stick-breaking construction for the topic-per-document distribution, enabling the model to automatically infer from the data the appropriate number of active topics at each timestep; and (ii) introduces a merging strategy for topic embeddings based on a continuous formulation of optimal transport adapted to the high dimensionality of the latent topic space. Numerical experiments show SB-SETM outperforming baselines on simulated scenarios. We extensively test it on a real-world corpus of news articles covering the Russian-Ukrainian war throughout 2022-2023.
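To give an idea of what a truncated stick-breaking construction looks like, here is a minimal sketch of the generic technique. It is not the SB-SETM implementation: the function name `truncated_stick_breaking`, the Beta(1, alpha) draws, the truncation level and the 0.01 activity threshold are all assumptions chosen for illustration.

```python
import numpy as np

def truncated_stick_breaking(v):
    """Turn K-1 break fractions in (0, 1) into K non-negative weights summing to 1.

    Weight k is v_k times the stick length left after the first k-1 breaks;
    the last weight takes whatever remains (this is the truncation).
    """
    v = np.asarray(v, dtype=float)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)))
    return np.concatenate((v * remaining[:-1], remaining[-1:]))

# Illustration with made-up settings: K = 10 topics, concentration alpha = 2
rng = np.random.default_rng(0)
K, alpha = 10, 2.0
fractions = rng.beta(1.0, alpha, size=K - 1)   # assumed prior on the break fractions
weights = truncated_stick_breaking(fractions)

# Topics with negligible weight can be treated as inactive (threshold is hypothetical)
n_active = int(np.sum(weights > 0.01))
print(weights.round(3), n_active)
```

Because each weight is carved out of what the previous ones left over, later topics tend to receive small weights, so the truncation level acts as an upper bound while the number of topics that actually matter is determined by the break fractions.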