We are pleased to share new publications by 3IA Côte d’Azur researchers.
Forty-Third International Conference on Machine Learning (ICML 2026), July 2026, Seoul
- Variance-Reduced (ε,δ)-Unlearning using Forget Set Gradients
Martin Van Waerebeke, Giovanni Neglia (3IA Chairholder), Kevin Scaman, Marco Lorenzi (3IA Chairholder), El-Mahdi El-Mhamdi
Abstract: In machine unlearning, (ε,δ)-unlearning is a popular framework that provides formal guarantees on the effectiveness of removing a subset of the training data, the forget set, from a trained model. For strongly convex objectives, existing first-order methods achieve (ε,δ)-unlearning, but they use the forget set only to calibrate injected noise, never as a direct optimization signal. In contrast, efficient empirical heuristics often exploit the forget samples (e.g., via gradient ascent) but come with no formal unlearning guarantees. We bridge this gap by presenting the Variance-Reduced Unlearning (VRU) algorithm. To the best of our knowledge, VRU is the first first-order algorithm that directly includes forget set gradients in its update rule while provably satisfying (ε,δ)-unlearning. We establish the convergence of VRU and show that incorporating the forget set yields strictly improved rates, i.e., a better dependence on the achieved error compared to existing first-order (ε,δ)-unlearning methods. Moreover, we prove that, in a low-error regime, VRU asymptotically outperforms any first-order method that ignores the forget set. Experiments corroborate our theory, showing consistent gains over both state-of-the-art certified unlearning methods and empirical baselines that explicitly leverage the forget set.
Read the paper
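For context on the baseline the abstract contrasts against: one common pattern in certified (ε,δ)-unlearning is to bound how far the approximately unlearned parameters can be from those of a model retrained without the forget set, then add Gaussian noise scaled to that bound. The sketch below shows only that generic calibration step, using the classical Gaussian mechanism (σ = Δ·sqrt(2 ln(1.25/δ))/ε, stated for 0 < ε < 1). It is not VRU, and the distance bound passed as `sensitivity` is an assumed input, not something derived here.

```python
import math

import numpy as np


def gaussian_noise_scale(sensitivity: float, eps: float, delta: float) -> float:
    """Noise std of the classical Gaussian mechanism (stated for 0 < eps < 1)."""
    if not (0.0 < eps < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("this bound requires 0 < eps < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


def noisy_release(theta: np.ndarray, sensitivity: float, eps: float, delta: float,
                  rng: np.random.Generator) -> np.ndarray:
    """Publish theta plus isotropic Gaussian noise calibrated to `sensitivity`.

    `sensitivity` stands for an assumed bound on the L2 distance between theta
    and the parameters of a model retrained from scratch without the forget set.
    """
    sigma = gaussian_noise_scale(sensitivity, eps, delta)
    return theta + rng.normal(0.0, sigma, size=theta.shape)
```

In this generic scheme the forget set influences the result only through the size of the bound, which is the limitation the abstract points out.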
The 32nd International Conference on Principles and Practice of Constraint Programming (CP 2026), July 2026, Lisbon
- The Distance Constraint on Sequence Variables
Margaux Schmied (3IA Ph.D. student), Augustin Delecluse, Jean-Charles Régin (3IA Chairholder), Pierre Schaus
Abstract: Sequence variables provide a compact framework for modeling routing and scheduling problems in constraint programming, constructing solutions through successive insertions of nodes into a partial sequence.
We study the propagation of a distance constraint on these variables and propose new admissible lower bounds on the cost of any feasible extension. These bounds, derived from relaxations of the insertion process, make it possible to detect and eliminate infeasible insertions.
Experiments on the traveling salesman problem with time windows and prize collecting show that these filtering rules significantly reduce the search space compared to existing methods.
Read the paper
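To illustrate the kind of reasoning involved (not the new bounds proposed in the paper), the sketch below uses the textbook cheapest-insertion detour: inserting node v between consecutive nodes a and b of the partial sequence adds d(a,v) + d(v,b) - d(a,b). Assuming the distance matrix satisfies the triangle inequality and the partial sequence already holds at least two nodes (for example a fixed start and end), the current length plus the detour of any single required insertion is a valid lower bound. That is enough to prune some insertions against a hypothetical maximum length `max_length`.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def route_length(seq: List[int], d: Matrix) -> float:
    """Total distance of the partial sequence as currently fixed."""
    return sum(d[a][b] for a, b in zip(seq, seq[1:]))


def insertion_costs(seq: List[int], v: int, d: Matrix) -> List[float]:
    """Detour d(a,v) + d(v,b) - d(a,b) for inserting v between each consecutive pair (a, b)."""
    return [d[a][v] + d[v][b] - d[a][b] for a, b in zip(seq, seq[1:])]


def lower_bound(seq: List[int], required: List[int], d: Matrix) -> float:
    """Admissible when d obeys the triangle inequality: every required node must
    still be inserted somewhere, so the final length is at least the current
    length plus the cheapest detour of the hardest-to-place node."""
    base = route_length(seq, d)
    return base + max((min(insertion_costs(seq, v, d)) for v in required), default=0.0)


def filter_insertions(seq: List[int], candidates: List[int], d: Matrix,
                      max_length: float) -> Dict[int, List[int]]:
    """For each candidate v, keep the positions i (between seq[i] and seq[i+1])
    whose detour alone does not already push the length above max_length."""
    base = route_length(seq, d)
    return {
        v: [i for i, c in enumerate(insertion_costs(seq, v, d)) if base + c <= max_length]
        for v in candidates
    }
```

For example, with four nodes on a line at positions 0, 10, 5 and 12, `d[a][b] = abs(x[a] - x[b])` and the partial sequence [0, 1], `lower_bound([0, 1], [2, 3], d)` returns 14, which in this small case equals the best completion 0, 2, 3, 1. With `max_length = 13`, node 3 has no remaining feasible insertion.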
The 17th International Conference on Information Processing in Computer-Assisted Interventions (IPCAI 2026), July 2026, Nagoya
This paper will be published in the International Journal of Computer Assisted Radiology and Surgery.
Abstract: Purpose: Achieving fine-grained understanding of surgical gestures remains a fundamental challenge in computer vision, due to the subtle and temporally overlapping nature of surgical motions. Gesture boundaries, where transitions between surgical actions occur, present challenges for precise temporal localization. We propose a temporal boundary analysis framework that improves overall surgical gesture segmentation by explicitly modeling transitions between actions. While most existing methods rely on both RGB and kinematic data, our approach operates on RGB-only video, without requiring additional annotations or computational overhead at inference.
Methods: We introduce a Temporal Boundary Distillation Module (TBDM) that leverages privileged information during training to learn boundary-aware features. TBDM employs cross-attention between class-present and class-absent temporal regions derived from ground-truth annotations, explicitly encoding transition information. A lightweight projection layer learns boundary-aware features through knowledge distillation from TBDM, supervised by a classification loss and a distillation loss (MSE). At inference, only the trained projection layer is required, so no additional computational cost is incurred.
Results: We evaluated TBDM on the CholecT50 and RARP-45 surgical datasets. TBDM consistently improved baseline models across all metrics, achieving an edit score improvement of up to +8.5 on CholecT50. On RARP-45, our approach achieved a state-of-the-art edit score (81.4) and F1@50 (77.9), demonstrating effectiveness across different architectures and datasets.
Read the paper
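The training objective described in the Methods paragraph, a classification loss plus an MSE distillation loss for the lightweight projection layer, can be sketched with NumPy. This is a minimal illustration under assumptions: the teacher features stand in for TBDM outputs computed from privileged ground-truth information during training and are treated as fixed targets, the weight `lam` is hypothetical, and the shapes (T frames, C gesture classes, D feature dimensions) are illustrative.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean frame-wise cross-entropy; logits (T, C), labels (T,) integer gesture ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def mse(student: np.ndarray, teacher: np.ndarray) -> float:
    """Mean squared error between student (projection) and teacher features, shape (T, D)."""
    return float(np.mean((student - teacher) ** 2))


def projection_loss(logits: np.ndarray, labels: np.ndarray, student_feats: np.ndarray,
                    teacher_feats: np.ndarray, lam: float = 1.0) -> float:
    """Classification loss plus lam * distillation (MSE) toward fixed teacher features.

    lam is a hypothetical weighting, not a value taken from the paper.
    """
    return cross_entropy(logits, labels) + lam * mse(student_feats, teacher_feats)
```

Because the teacher branch is only needed to produce targets during training, the inference path reduces to the projection layer, which matches the abstract's claim of no extra inference cost.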
IEEE International Conference on Robotics and Automation (ICRA 2026), June 2026, Vienna
- DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-time Optical Flow and Stereo Estimation
Tushar Anand, Maheswar Bora, Antitza Dantcheva (3IA Chairholder), Abhijit Das
Abstract: In this work, we propose a novel Mamba block, DenVisCoM, as well as a novel hybrid architecture specifically tailored for accurate, real-time estimation of optical flow and disparity. Given that such multi-view geometry and motion tasks are fundamentally related, we propose a unified architecture to tackle them jointly. Specifically, the proposed hybrid architecture combines DenVisCoM with a Transformer-based attention block and simultaneously addresses real-time inference, memory footprint, and accuracy for the joint estimation of motion and dense 3D perception. We extensively analyze the trade-off between accuracy and real-time processing on a large number of datasets. Our experimental results and related analysis suggest that the proposed model can accurately estimate optical flow and disparity in real time. All models and associated code are available here.
Read the paper
- Learning-Based Fusion for Robust Multi-Spectral Visual Servoing
Enrico Fiasché, Siddharth Singh Savner, Ezio Malis (3IA Chairholder), Philippe Martinet
Abstract: Multispectral sensors, which measure multiple wavelength bands beyond the standard red, green, and blue channels, capture richer information than conventional RGB cameras. Such enriched data is especially valuable in visual servoing, where robot control critically depends on image content. However, leveraging multiple spectral bands (typically around a dozen) directly within real-time visual servoing constitutes a significant challenge. The only prior work tackled this problem using a Pixel Selection strategy based on image gradients. This paper introduces a learning-based framework to enhance Multi-Spectral Visual Servoing (MSVS) by fusing data from multispectral cameras into a single, robust representation for control. An autoencoder is employed to compress multispectral inputs into a noise-attenuated 2D image, which is then used within a standard rule-based Direct Visual Servoing (DVS) scheme. Comparison experiments both with simulated data and with a real robot in complex and unstructured environments show that the proposed learning-based fusion maintains stable convergence and improves positioning accuracy under noisy conditions while preserving computational efficiency.
Read the paper
ICRA Workshop on "Geometry in the Age of Data Driven Robotics"
- Introducing Sylvester Forms to Robotics: Efficient Closed-Form Pose Estimation
Jana Vráblíková, Ezio Malis (3IA Chairholder), Laurent Busé
Abstract: Pose estimation from 3D-to-3D correspondences is fundamental in robotics and computer vision, with strong relevance to real-time perception and localization. It is commonly formulated as a nonlinear optimization problem that can be reduced to a polynomial system and solved in closed form. In this paper, we introduce a new class of resultant-based polynomial solvers that exploits Sylvester forms to reduce elimination complexity. By integrating Sylvester forms into a hidden-variable formulation, we derive closed-form solvers operating in lower degrees, producing smaller elimination matrices and lower computational cost. Experiments on the KITTI dataset show that the proposed solvers are accurate and faster than state-of-the-art closed-form methods. Beyond the proposed solver, our results highlight a broader point that is particularly relevant for geometric robotics: geometric methods and data-driven methods need not be opposed. While the solver itself is derived from exact algebraic structure, its numerical performance depends on implementation choices such as the order of monomials that induces the block decomposition of the elimination matrix. Since we currently do not have a principled method for selecting the ordering that gives the best numerical conditioning, this work suggests a hybrid direction in which offline learning optimizes such choices while preserving the solver's exact geometric structure.
Read the paper
- Lie Group Error Coordinates for Symmetry-Aware Reinforcement Learning applied to Quadrotor Low-Level Control
Andrea Pagnini, Ezio Malis (3IA Chairholder)
Abstract: As data-driven methods become prevalent in robotics, a key question is whether classical geometric structures are still relevant or whether they can be learned from data. We argue that geometry is not an alternative to learning but a design tool that shapes what must be learned. In this paper, we show that encoding the right symmetry in the observations of an RL agent reduces the effective complexity of the control problem at the representation level, prior to any architectural choice. We demonstrate this principle on quadrotor low-level control, expressing tracking errors as Lie group quantities in the desired body frame. We show that this coordinate choice improves sample efficiency and enables zero-shot generalization to unseen trajectories, suggesting that the right choice of error coordinates can improve learning without relying on architectural changes.
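The idea of expressing tracking errors in the desired body frame can be sketched for the pose part alone. Assuming world-frame positions and body-to-world rotation matrices, the group error g_d^-1 g gives an attitude error R_d^T R and a position error R_d^T (p - p_d); the rotation is turned into a 3-vector with the SO(3) logarithm. This is a generic construction rather than the paper's exact observation vector (velocity and angular-velocity errors are omitted here, and the helper `rot_z` is only for illustration). One property it makes easy to check: applying the same rigid motion to both the actual and the desired pose leaves the observation unchanged.

```python
import numpy as np


def so3_log(R: np.ndarray) -> np.ndarray:
    """Rotation vector (SO(3) logarithm) of R; not numerically robust near an angle of pi."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * vee  # first-order approximation near the identity
    return theta / (2.0 * np.sin(theta)) * vee


def se3_error_observation(p: np.ndarray, R: np.ndarray,
                          p_d: np.ndarray, R_d: np.ndarray) -> np.ndarray:
    """Pose error g_d^-1 g, expressed in the desired body frame, as a 6-vector."""
    R_e = R_d.T @ R            # attitude error in the desired body frame
    e_p = R_d.T @ (p - p_d)    # position error rotated into the desired body frame
    return np.concatenate([e_p, so3_log(R_e)])


def rot_z(a: float) -> np.ndarray:
    """Illustrative helper: rotation by angle a about the z axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
```

With this choice, an agent sees the same observation for a trajectory and for any rigidly moved copy of it, so that variation no longer has to be learned from data; whether the dynamics share the full symmetry (gravity, for instance, does not) is a separate question the sketch does not address.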
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026), June 2026, Denver (USA)
- THEval. Evaluation Framework for Talking Head Video Generation
Nabyl Quignon, Baptiste Chopin, Yaohui Wang, Antitza Dantcheva (3IA Chairholder)
Abstract: Video generation has achieved remarkable progress, with generated videos increasingly resembling real ones. However, the rapid advance in generation has outpaced the development of adequate evaluation metrics. Currently, the assessment of talking head generation relies primarily on a limited set of metrics, evaluating general video quality and lip synchronization, and on user studies. Motivated by this, we propose a new evaluation framework comprising eight metrics spanning three dimensions: (i) quality, (ii) naturalness, and (iii) synchronization. In selecting the metrics, we emphasize efficiency as well as alignment with human preferences. Accordingly, we focus on analyzing the fine-grained dynamics of the head, mouth, and eyebrows, as well as face quality. Our extensive experiments on 85,000 videos generated by 17 state-of-the-art models suggest that while many algorithms excel at lip synchronization, they struggle with expressiveness and artifact-free details. These videos were generated from a novel real dataset that we curated to mitigate training-data bias. Our benchmark framework is intended to track the improvement of generative methods. The original code, dataset, and leaderboards will be publicly released and regularly updated with new methods to reflect progress in the field.
Read the paper
25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026), May 2026, Paphos
- Fast and Robust Information Spreading in the Noisy PULL Model: Extended Abstract
Niccolò d'Archivio, Amos Korman, Emanuele Natale (3IA Chairholder), Robin Vacus
Abstract: Efficient information spreading in stochastic multi-agent systems is a core challenge when communication is noisy, bandwidth-limited, and agents lack global coordination. Yet biological systems—such as ant colonies and fish schools—routinely overcome these constraints: a small number of informed individuals can reliably guide large, uncoordinated populations using minimal, noisy signals. Motivated by these observations, we investigate how reliable information dissemination can be achieved in bio-inspired stochastic settings with limited communication and no global control.
We analyze the noisy PULL(h) model, a general setting that spans from rumor spreading to majority consensus: a subset of source agents hold initial preferences, and the goal is to converge to the majority preference. Agents passively observe noisy messages from h randomly sampled peers per round. Prior work shows that convergence requires Ω(n/h) rounds, where n is the number of agents, even under favorable conditions. We ask: how far can one push simplicity (no synchronization and minimal message size) without compromising convergence speed?
We present a quasi self-stabilizing protocol using only 2-bit messages that converges from arbitrary initial states despite severe noise and asynchrony. It achieves an optimal convergence time of O((n/h) log n) with high probability, and O(log n) time in the fully connected case h = n. A key subroutine is an even simpler 1-bit protocol assuming simultaneous start, based on a natural two-phase “listen-then-amplify” mechanism reminiscent of biological strategies.
Together, our results connect biologically inspired heuristics with provable guarantees for robust, efficient information dissemination in highly unreliable and uncoordinated systems.
Read the paper
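The observation model described in this abstract can be simulated in a few lines. The sketch below is a simplification under stated assumptions: messages are single bits rather than the protocol's 2-bit messages, peers are sampled uniformly with replacement, and noise flips each observed bit independently with probability `flip_prob`. It only generates what agents see in one round; it does not implement the paper's protocol.

```python
import numpy as np


def noisy_pull_round(messages: np.ndarray, h: int, flip_prob: float,
                     rng: np.random.Generator) -> np.ndarray:
    """One round of observations in a simplified noisy PULL(h) setting.

    messages: array of n binary messages (0/1), one displayed by each agent.
    Each agent samples h peers uniformly at random (with replacement, an
    assumption) and sees each sampled bit flipped independently with
    probability flip_prob (binary symmetric noise, also an assumption).
    Returns an (n, h) array of observed bits, one row per observing agent.
    """
    n = len(messages)
    peers = rng.integers(0, n, size=(n, h))      # which peers each agent pulls from
    observed = messages[peers]                   # the bits those peers display
    flips = rng.random((n, h)) < flip_prob       # independent noise events
    return np.where(flips, 1 - observed, observed)
```

For instance, `noisy_pull_round(msgs, h, 0.3, np.random.default_rng(0)).mean(axis=1)` gives each agent's noisy estimate of the fraction of 1s it sampled; with few informed sources and strong noise, such estimates are barely biased, which is why amplification mechanisms are needed.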
23rd European Semantic Web Conference (ESWC 2026), May 2026, Dubrovnik
- Link Prediction or Perdition: the Seeds of Instability in Knowledge Graph Embeddings
Guillaume Méroué (3IA Ph.D. student), Fabien Gandon (3IA Chairholder), Pierre Monnin
Best paper award
Abstract: Knowledge graph embedding models (KGEMs) constitute the main link prediction approach for completing knowledge graphs. Standard evaluation protocols emphasize rank-based metrics such as MRR or Hits@K but usually overlook the influence of random seeds on result stability. Moreover, these metrics conceal potential instabilities in individual predictions and in the organization of embedding spaces. In this work, we conduct a systematic stability analysis of multiple KGEMs across several datasets. We find that high-performance models actually produce divergent predictions at the triple level and highly variable embedding spaces. By isolating stochastic factors (i.e., initialization, triple ordering, negative sampling, dropout, and hardware), we show that each independently induces instability of comparable magnitude. Furthermore, for a given model, hyperparameter configurations with better MRR are not guaranteed to be more stable. Finally, voting, although a known remediation mechanism, provides only a limited improvement in stability. These findings highlight critical limitations of current benchmarking protocols and raise concerns about the reliability of KGEMs for knowledge graph completion.
Read the paper
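The point that aggregate rank metrics can hide prediction-level instability is easy to see on a toy example. The sketch below computes MRR and Hits@K from ranks, plus a simple top-k overlap (Jaccard index) between the predictions of two training seeds for the same query. The overlap measure and the optimistic tie handling in `rank_of` are illustrative choices, not necessarily the stability measures used in the paper.

```python
import numpy as np


def rank_of(scores, true_idx: int) -> int:
    """1-based rank of the true candidate; ties are resolved optimistically (an assumption)."""
    scores = np.asarray(scores, dtype=float)
    return int(1 + np.sum(scores > scores[true_idx]))


def mrr(ranks) -> float:
    """Mean reciprocal rank over a list of 1-based ranks."""
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))


def hits_at_k(ranks, k: int) -> float:
    """Fraction of queries whose true answer is ranked within the top k."""
    return float(np.mean(np.asarray(ranks) <= k))


def topk_jaccard(scores_a, scores_b, k: int) -> float:
    """Jaccard overlap of the top-k predicted entities for one query under two seeds."""
    top_a = set(np.argsort(-np.asarray(scores_a, dtype=float))[:k].tolist())
    top_b = set(np.argsort(-np.asarray(scores_b, dtype=float))[:k].tolist())
    return len(top_a & top_b) / len(top_a | top_b)
```

For instance, with candidate scores [0.9, 0.8, 0.1, 0.7] under one seed and [0.9, 0.1, 0.8, 0.2] under another, and the true entity at index 0, both seeds rank it first (identical contribution to MRR and Hits@K), yet their top-2 sets overlap with a Jaccard index of only 1/3.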
Transactions on Machine Learning Research, May 2026
- Think2SQL: Blueprinting Reward Density and Advantage Scaling for Effective Text-to-SQL Reasoning
Simone Papicchio, Simone Rossi, Luca Cagliero, Paolo Papotti (3IA Chairholder)
Abstract: While Large Language Models (LLMs) have advanced the state of the art in Text-to-SQL, robust reasoning in complex, multi-table environments remains a bottleneck for parameter-efficient models. This paper presents a systematic empirical study on injecting reasoning capabilities into Text-to-SQL through the lens of Reinforcement Learning with Verifiable Rewards (RLVR) for the Qwen3 model family. We uncover a critical interplay between reward density, advantage scaling, and model capacity. Our analysis yields four primary insights. First, we propose a novel execution-guided dense reward function that significantly outperforms binary signals and existing state-of-the-art rewards by providing granular feedback at the instance level. Second, we analyze the mechanics of advantage calculation, demonstrating that while large models thrive on sparse signals with aggressive advantage scaling, smaller models require dense rewards and conservative scaling to improve Text-to-SQL performance. Third, we evaluate the impact of cold start, showing that distillation does not always benefit RLVR performance and that supervised fine-tuned models are prone to distributional mimicry. Fourth, we map the Pareto frontier of training efficiency, providing insights for optimizing Text-to-SQL reasoning under computational constraints. Our findings culminate in the Think2SQL family: our 4B-parameter model demonstrates reasoning capabilities competitive with state-of-the-art models such as o3. We release our models, datasets, and code to create a blueprint for RLVR optimization in Text-to-SQL.
Read the paper
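To make the interplay between reward density and advantage scaling concrete, here is a sketch in the style of group-relative (GRPO-like) advantage normalization. Both reward functions are hypothetical stand-ins: a binary execution-match reward and a dense row-level F1 reward. The paper's execution-guided dense reward is not reproduced here, and comparing result rows as sets (ignoring duplicates and ordering) is an assumption.

```python
from typing import Iterable, Tuple

import numpy as np

Row = Tuple


def binary_reward(pred_rows: Iterable[Row], gold_rows: Iterable[Row]) -> float:
    """Sparse signal: 1 if the predicted result set matches the gold set exactly, else 0."""
    return float(set(pred_rows) == set(gold_rows))


def row_f1_reward(pred_rows: Iterable[Row], gold_rows: Iterable[Row]) -> float:
    """Hypothetical dense signal: F1 between predicted and gold result rows (as sets)."""
    pred, gold = set(pred_rows), set(gold_rows)
    if not pred and not gold:
        return 1.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def group_advantages(rewards, scale_by_std: bool = True, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages: center rewards on the group mean, optionally divide by the std."""
    r = np.asarray(rewards, dtype=float)
    adv = r - r.mean()
    if scale_by_std:
        adv = adv / (r.std() + eps)
    return adv
```

If every sampled query in a group returns the wrong rows, the binary rewards are all 0 and every advantage is 0, so the group contributes no learning signal. A partial-credit reward such as row F1 still separates the candidates: for gold rows (1,), (2,), (3,), predictions {(1,)}, {(1,), (2,)} and {(4,)} score 0.5, 0.8 and 0.0.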