Publications of the month - March 2026

Published on March 26, 2026. Updated on March 26, 2026.
3IA Côte d'Azur researchers March 2026 publications

We are pleased to share the new publications by 3IA Côte d’Azur researchers.

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026), June 2026, Denver (USA)

  • "Illustrator’s Depth: Monocular Layer Index Prediction for Image Decomposition", Nissim Maruani (3IA Ph.D.), Peiying Zhang, Siddhartha Chaudhuri, Matthew Fisher, Nanxuan Zhao, Vladimir G. Kim, Pierre Alliez (3IA Chairholder), Mathieu Desbrun, Wang Yifan
Abstract: We introduce Illustrator's Depth, a novel definition of depth that addresses a key challenge in digital content creation: decomposing flat images into editable, ordered layers. Inspired by an artist's compositional process, illustrator's depth assigns a layer index to each pixel, forming an interpretable image decomposition through a discrete, globally consistent ordering of elements optimized for editability. We also propose and train a neural network using a curated dataset of layered vector graphics to predict layering directly from raster inputs. Our layer index inference unlocks a range of powerful downstream applications. In particular, it significantly outperforms state-of-the-art baselines for image vectorization while also enabling high-fidelity text-to-vector-graphics generation, automatic 3D relief generation from 2D images, and intuitive depth-aware editing. By reframing depth from a physical quantity to a creative abstraction, illustrator's depth prediction offers a new foundation for editable image decomposition.

Read the article
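For readers curious what a layer-index prediction buys in practice, here is a minimal illustrative sketch (not the authors' code): once a network has assigned a layer index to every pixel, an editable decomposition is simply the ordered stack of binary masks, one per index. The toy `pred` array below is a hypothetical 2×3 prediction.

```python
import numpy as np

def decompose_into_layers(layer_index):
    """Turn a per-pixel layer-index map into an ordered stack of binary
    masks, back-to-front (index 0 = bottom layer). Each mask can then be
    edited, recolored, or vectorized independently."""
    n_layers = int(layer_index.max()) + 1
    return [layer_index == i for i in range(n_layers)]

# Hypothetical 2x3 prediction: background (0), a shape (1), a highlight (2)
pred = np.array([[0, 1, 1],
                 [0, 2, 1]])
layers = decompose_into_layers(pred)
```

The discrete, globally consistent ordering is what makes the downstream applications (vectorization, relief generation, depth-aware editing) straightforward: each mask is already a self-contained, ordered element.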

29th Annual Conference on Artificial Intelligence and Statistics (AISTATS 2026), May 2026, Tangier (Morocco)

  • "Reconciling Communication Compression and Byzantine-Robustness in Distributed Learning", Diksha Gupta, Antonio Honsell, Chuan Xu, Nirupam Gupta, Giovanni Neglia (3IA Chairholder)
Abstract: Distributed learning enables scalable model training over decentralized data, but remains hindered by Byzantine faults and high communication costs. While both challenges have been studied extensively in isolation, their interplay has received limited attention. Prior work has shown that naively combining communication compression with Byzantine-robust aggregation can severely weaken resilience to faulty nodes. The current state-of-the-art, Byz-DASHA-PAGE, leverages a momentum-based variance reduction scheme to counteract the negative effect of compression noise on Byzantine robustness. In this work, we introduce RoSDHB, a new algorithm that integrates classical Polyak momentum with a coordinated compression strategy. Theoretically, RoSDHB matches the convergence guarantee of Byz-DASHA-PAGE under the standard (G,B)-gradient dissimilarity model, but relies on milder assumptions. Empirically, RoSDHB demonstrates stronger robustness while achieving substantial communication savings compared to Byz-DASHA-PAGE.

Read the article
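The two ingredients the abstract names, classical Polyak (heavy-ball) momentum and communication compression, can be combined with a Byzantine-robust aggregator in a few lines. The sketch below is only illustrative of that combination, under assumed choices (top-k as the compressor, coordinate-wise median as the aggregator); it is not the paper's RoSDHB algorithm or code.

```python
import numpy as np

def top_k_compress(v, k):
    """Keep only the k largest-magnitude coordinates (a common compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def robust_aggregate(msgs):
    """Coordinate-wise median: a classic Byzantine-robust aggregator."""
    return np.median(np.stack(msgs), axis=0)

def distributed_step(x, grads, momenta, beta=0.9, k=2, lr=0.1):
    """One round: each worker updates its Polyak momentum, compresses it,
    and the server robustly aggregates the compressed messages."""
    msgs = []
    for i, g in enumerate(grads):
        momenta[i] = beta * momenta[i] + (1 - beta) * g  # heavy-ball momentum
        msgs.append(top_k_compress(momenta[i], k))       # compression
    return x - lr * robust_aggregate(msgs), momenta
```

Even with one worker sending arbitrarily large values, the median keeps the aggregated update bounded, which is the kind of interplay between compression noise and robustness that the paper analyzes.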

23rd European Semantic Web Conference (ESWC 2026), May 2026, Dubrovnik (Croatia)

  • "Link Prediction or Perdition: the Seeds of Instability in Knowledge Graph Embeddings" by Guillaume Méroué (3IA Ph.D. student), Fabien Gandon (3IA Chairholder), and Pierre Monnin
Abstract: Knowledge graph embedding models (KGEMs) constitute the main link prediction approach to complete knowledge graphs. Standard evaluation protocols emphasize rank-based metrics such as MRR or Hits@K, but usually overlook the influence of random seeds on result stability. Moreover, these metrics conceal potential instabilities in individual predictions and in the organization of embedding spaces. In this work, we conduct a systematic stability analysis of multiple KGEMs across several datasets. We find that high-performance models actually produce divergent predictions at the triple level and highly variable embedding spaces. By isolating stochastic factors (i.e., initialization, triple ordering, negative sampling, dropout, hardware), we show that each independently induces instability of comparable magnitude. Furthermore, for a given model, hyperparameter configurations with better MRR are not guaranteed to be more stable. Finally, voting, albeit a known remediation mechanism, provides only a limited enhancement of stability. These findings highlight critical limitations of current benchmarking protocols, and raise concerns about the reliability of KGEMs for knowledge graph completion.

Read the article
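The triple-level instability the paper measures can be reproduced in miniature. The sketch below trains a toy TransE model (one of the standard KGEM families) with two different seeds and compares their top-1 tail predictions; it is a deliberately simplified stand-in for the paper's protocol, with assumed hyperparameters throughout.

```python
import numpy as np

def train_transe(triples, n_ent, n_rel, dim=8, seed=0, epochs=100,
                 lr=0.05, margin=1.0):
    """Toy TransE trained with SGD. The seed controls both the random
    initialization and the negative sampling, two of the stochastic
    factors whose influence the paper isolates."""
    rng = np.random.default_rng(seed)
    E = rng.normal(scale=0.1, size=(n_ent, dim))
    R = rng.normal(scale=0.1, size=(n_rel, dim))
    for _ in range(epochs):
        for h, r, t in triples:
            h_neg = int(rng.integers(n_ent))     # corrupted head (negative)
            d_pos = E[h] + R[r] - E[t]
            d_neg = E[h_neg] + R[r] - E[t]
            # hinge: update only while the negative is within the margin
            if np.linalg.norm(d_pos) + margin > np.linalg.norm(d_neg):
                E[h] -= lr * d_pos; E[t] += lr * d_pos; R[r] -= lr * d_pos
                E[h_neg] += lr * d_neg; E[t] -= lr * d_neg; R[r] += lr * d_neg
    return E, R

def predict_tail(E, R, h, r):
    """Top-1 tail prediction: the entity minimizing ||E[h] + R[r] - E[t]||."""
    return int(np.argmin(np.linalg.norm(E[h] + R[r] - E, axis=1)))
```

Comparing `predict_tail` outputs across seeds, triple by triple, gives exactly the kind of prediction-agreement rate that rank-based metrics like MRR average away.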

International Symposium on Biomedical Imaging (ISBI 2026), April 2026, London (England)

  • "Resource-Efficient Automatic Refinement of Segmentations via Weak Supervision from Light Feedback" by Alix de Langlais, Benjamin Billot, Théo Aguilar Vidal, Marc-Olivier Gauci (3IA Fellow), Hervé Delingette (3IA Chairholder)
Abstract: Delineating anatomical regions is a key task in medical image analysis. Manual segmentation achieves high accuracy but is labor-intensive and prone to variability, thus prompting the development of automated approaches. Recently, a breadth of foundation models has enabled automated segmentations across diverse anatomies and imaging modalities, but these may not always meet the clinical accuracy standards. While segmentation refinement strategies can improve performance, current methods depend on heavy user interactions or require fully supervised segmentations for training. Here, we present SCORE (Segmentation COrrection from Regional Evaluations), a weakly supervised framework that learns to refine mask predictions only using light feedback during training. Specifically, instead of relying on dense training image annotations, SCORE introduces a novel loss that leverages region-wise quality scores and over/under-segmentation error labels. We demonstrate SCORE on humerus CT scans, where it considerably improves initial predictions from TotalSegmentator, and achieves performance on par with existing refinement methods, while greatly reducing their supervision requirements and annotation time.
Our code is available here

Read the article

19th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026), March 2026, Rabat (Morocco)

  • "Stakeholder Suite: A Unified AI Framework for Mapping Actors, Topics and Arguments in Public Debates" (Demo track) by Mohamed Chenene, Jeanne Rouhier, Jean Daniélou, Mihir Sarkar, and Elena Cabrio (3IA Chairholder)
Abstract: Public debates surrounding infrastructure and energy projects involve complex networks of stakeholders, arguments, and evolving narratives. Understanding these dynamics is crucial for anticipating controversies and informing engagement strategies, yet existing tools in media intelligence largely rely on descriptive analytics with limited transparency. This paper presents Stakeholder Suite, a framework deployed in operational contexts for mapping actors, topics, and arguments within public debates. The system combines actor detection, topic modeling, argument extraction and stance classification in a unified pipeline. Tested on multiple energy infrastructure projects as a case study, the approach delivers fine-grained, source-grounded insights while remaining adaptable to diverse domains. The framework achieves strong retrieval precision and stance accuracy, producing arguments judged relevant in 75% of pilot use cases. Beyond quantitative metrics, the tool has proven effective for operational use: helping project teams visualize networks of influence, identify emerging controversies, and support evidence-based decision-making.

Read the article
  • "CacheNotes: Task-Aware Key-Value Cache Compression for Reasoning-Intensive Knowledge Tasks" by Giulio Corallo, Orion Weller, Fabio Petroni, Paolo Papotti (3IA Chairholder)
Abstract: Integrating external knowledge into Large Language Models (LLMs) is crucial for many real-world applications, yet current methods like Retrieval-Augmented Generation (RAG) face limitations with broad, multi-source queries, while long-context models are computationally prohibitive.
We introduce CACHENOTES: Task-Aware Key-Value Cache Compression. Given a task description and a corpus, CACHENOTES first generates a sequence of Compression-Planning-Tokens (CPTs), an offline task-focused distillation pass that identifies and organizes key information from the corpus. These CPTs are then used to guide a one-time compression of the corpus into a compact, reusable KV cache, which is then used alone at inference time to efficiently answer diverse, reasoning-intensive queries — eliminating repeated retrieval or context expansion.
Experiments on LongBench show that, on Question-Answering tasks at a 20× compression, CACHENOTES outperforms RAG by over 8 F1 points and reduces latency by over 4×. On RULER, it surpasses previous query-agnostic compression methods by 55 points, narrowing the gap to query-aware compression approaches. Additional results on real-world enterprise and synthetic datasets demonstrate its strong performance on multi-hop and broad-coverage queries.

Read the article
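Mechanically, "a one-time compression of the corpus into a compact, reusable KV cache" means discarding most key/value pairs and answering queries over the survivors. The sketch below illustrates that idea only; the importance scores are passed in as a plain array, standing in for what the paper's task-aware CPT pass would compute, so this is not the CACHENOTES method itself.

```python
import numpy as np

def compress_kv_cache(K, V, importance, keep_ratio=0.05):
    """One-time compression: keep only the key/value pairs of the tokens
    with the highest importance scores (in the paper, the ranking comes
    from the offline CPT pass). Original token order is preserved."""
    k = max(1, int(len(importance) * keep_ratio))
    keep = np.sort(np.argsort(importance)[-k:])
    return K[keep], V[keep]

def attend(q, K, V):
    """Single-query softmax attention over a (possibly compressed) cache."""
    logits = K @ q / np.sqrt(K.shape[1])
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V
```

At a 20× compression, inference touches only 5% of the original cache, which is where the latency reductions reported in the abstract come from.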

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026), March 2026, Tucson (USA)

  • "Denoise, Divide, Distill, and Predict (D3P): Towards Forecasting Long-horizon Real-world Anomaly from Normalcy" by Quentin Mérilleau, Snehashis Majhi, Antitza Dantcheva (3IA Chairholder), Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, François Brémond (3IA Chairholder)
Abstract: Forecasting abnormal human behavior (AHB) in unconstrained real-world environments is critical for enabling proactive safety interventions. Unlike short-term anomaly detection, long-horizon forecasting offers a vital reaction window but remains underexplored due to three core challenges: (i) noisy, complex human–agent interactions; (ii) weak temporal coupling between normal observations and distant anomalies; and (iii) data scarcity limiting the scalability of autoregressive models.
To address these, we propose D3P (Denoise, Divide, Distill, and Predict), a novel encoder–decoder framework that bridges denoised pasts with distilled autoregressive futures. Our Differential Past Encoder (DiPE) disentangles scene-level and object-level dynamics via differential attention, suppressing irrelevant interactions and enhancing discriminative cues. The Distilled Future Auto-Regressive Decoder (D-FAD) adopts a divide-and-conquer strategy, segmenting future queries into temporal chunks for sequential prediction, while leveraging distillation to balance robustness and latency.
We validate our approach on the AHB-F benchmark, the only dataset dedicated to abnormal behavior forecasting, and further integrate D-FAD with several state-of-the-art methods. In all cases, our framework consistently outperforms prior work in both forecasting accuracy and computational efficiency.

Read the article
  • "MuSACo: Multimodal Subject-Specific Selection and Adaptation for Expression Recognition with Co-Training" by Muhammad Osama Zeeshan, Natacha Gillet, Alessandro Lameiras Koerich, Marco Pedersoli, François Brémond (3IA Chairholder), Eric Granger
Abstract: Personalized expression recognition (ER) involves adapting a machine learning model to subject-specific data for improved recognition of expressions with considerable interpersonal variability. Subject-specific ER can benefit significantly from multi-source domain adaptation (MSDA) methods — where each domain corresponds to a specific subject — to improve model accuracy and robustness.
Despite promising results, state-of-the-art MSDA approaches often overlook multimodal information or blend sources into a single domain, limiting subject diversity and failing to explicitly capture unique subject-specific characteristics.
To address these limitations, we introduce MuSACo, a multimodal subject-specific selection and adaptation method for ER based on co-training. It leverages complementary information across multiple modalities and multiple source domains for subject-specific adaptation. This makes MuSACo particularly relevant for affective computing applications in digital health, such as patient-specific assessment for stress or pain, where subject-level nuances are crucial.
MuSACo selects source subjects relevant to the target and generates pseudo-labels using the dominant modality for class-aware learning, in conjunction with a class-agnostic loss to learn from less confident target samples. Finally, source features from each modality are aligned, while only confident target features are combined.
Experimental results on challenging multimodal ER datasets — BioVid, StressID, and BAH — show that MuSACo outperforms UDA (blending) and state-of-the-art MSDA methods.
Code available here

Read the article
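The pseudo-labeling step the abstract describes, keeping only confident predictions from the dominant modality and leaving the rest to a class-agnostic loss, can be sketched in a few lines. This is an illustrative fragment with an assumed confidence threshold, not MuSACo's released code.

```python
import numpy as np

def confident_pseudo_labels(probs, threshold=0.9):
    """Pseudo-labels from the dominant modality: keep only target samples
    whose max class probability clears the confidence threshold; the
    remaining, less confident samples are left for class-agnostic learning."""
    confidence = probs.max(axis=1)
    keep = confidence >= threshold
    return np.argmax(probs[keep], axis=1), keep
```

The `keep` mask is also what decides which target features enter the final modality alignment, mirroring the "only confident target features are combined" step.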

26th French-Speaking Conference on Knowledge Extraction and Management (EGC 2026), January 2026, Anglet (France)

  • "Q²Forge : Générer des collections de questions de compétences et requêtes SPARQL pour interroger des graphes de connaissances en langue naturelle" ("Q²Forge: Generating Competency Question Collections and SPARQL Queries for Querying Knowledge Graphs in Natural Language") by Yousouf Taghzouti, Franck Michel, Tao Jiang, Louis Felix Nothias (Junior Professor associated with the 3IA), Fabien Gandon (3IA Chairholder)
Abstract: SPARQL is the standard query language for accessing knowledge graphs (KGs). However, formulating SPARQL queries remains a challenging task for non-expert users, and a time-consuming one even for experienced practitioners. Best practices recommend documenting KGs with competency questions (CQs) and example queries in order to contextualise the knowledge they contain and illustrate their potential applications. In practice, this is rarely done, or only with a limited number of examples.
Large Language Models (LLMs) are increasingly used in conversational agents and demonstrate a wide range of applications, from simple question answering to code generation in specific languages. Nevertheless, training and evaluating these models to produce high-quality SPARQL queries from natural language questions requires large collections of question–query pair examples.
This paper presents Q²Forge, an approach aimed at automatically generating new competency questions for a knowledge graph and their corresponding SPARQL queries. Q²Forge validates these queries iteratively through human evaluation and an LLM acting as a judge. The tool is open source, generic, extensible and modular: its different modules (CQ generation, query generation and refinement) can be used separately or as an integrated pipeline. The result is a complete workflow, from competency question formulation to query evaluation, facilitating the creation of question–query benchmark datasets for any target KG.

Read the article
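The iterative workflow described above, generate competency questions, generate a SPARQL query for each, refine until a judge accepts, can be captured as a small control loop. The three callables below are placeholders for the tool's modules (CQ generation, query generation/refinement, LLM or human judging); none of this is Q²Forge's real API.

```python
def q2forge_pipeline(kg_description, generate_cqs, generate_query, judge,
                     max_rounds=3):
    """Sketch of the iterative CQ -> SPARQL -> judge loop. Each query is
    refined with the judge's feedback until accepted or the round budget
    runs out, yielding a question-query benchmark for the target KG."""
    pairs = []
    for cq in generate_cqs(kg_description):
        query = generate_query(cq, feedback=None)
        for _ in range(max_rounds):
            verdict, feedback = judge(cq, query)
            if verdict == "accept":
                break
            query = generate_query(cq, feedback=feedback)  # refinement step
        pairs.append((cq, query))
    return pairs
```

Because the modules are injected as callables, they can be used separately or swapped out, which reflects the modularity the paper emphasizes.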
  • "Généraliser l’adaptation de modèles de langue frugaux pour l’extraction de motifs RDF à partir de texte" ("Generalising Frugal Language Model Adaptation for RDF Pattern Extraction from Text") by Célian Ringwald (3IA Ph.D. alumnus), Fabien Gandon (3IA Chairholder), Catherine Faron, Franck Michel, Hanna Abi Akl
Abstract: Small language models have demonstrated strong performance for RDF relation extraction from SHACL shapes. This paper, drawn from our work accepted at K-CAP 2025, investigates their ability to jointly handle both Datatype and Object Property types. The main challenge lies in the extraction of rare properties. To address this, we explore several strategies: stratified sampling, loss weighting, data resampling and pattern-based synthetic generation. The best results are achieved when each property reaches a minimum occurrence threshold in the training data. Our data, results and code are made publicly available to ensure reproducibility. This work thus proposes concrete methods for training specialised SLMs and opens up new perspectives for semantic relation extraction.

Read the article

IEEE/CVF International Conference on Computer Vision (ICCV 2025), October 2025, Honolulu, Hawaii (USA)

  • "Mixture of Experts Guided by Gaussian Splatters Matters: A new Approach to Weakly-Supervised Video Anomaly Detection" by Giacomo D’Amicantonio, Snehashis Majhi, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, François Brémond (3IA Chairholder), Egor Bondarev
Abstract: Video Anomaly Detection (VAD) is a challenging task due to the variability of anomalous events and the limited availability of labeled data. Under the Weakly-Supervised VAD (WSVAD) paradigm, only video-level labels are provided during training, while predictions are made at the frame level. Although state-of-the-art models perform well on simple anomalies (e.g., explosions), they struggle with complex real-world events (e.g., shoplifting). This difficulty stems from two key issues: (1) the inability of current models to address the diversity of anomaly types, as they process all categories with a shared model, overlooking category-specific features; and (2) the weak supervision signal, which lacks precise temporal information, limiting the ability to capture nuanced anomalous patterns blended with normal events. To address these challenges, we propose Gaussian Splatting-guided Mixture of Experts (GS-MoE), a novel framework that employs a set of expert models, each specialized in capturing specific anomaly types. These experts are guided by a temporal Gaussian splatting loss, enabling the model to leverage temporal consistency and enhance weak supervision. The Gaussian splatting approach encourages a more precise and comprehensive representation of anomalies by focusing on temporal segments most likely to contain abnormal events. The predictions from these specialized experts are integrated through a mixture-of-experts mechanism to model complex relationships across diverse anomaly patterns. Our approach achieves state-of-the-art performance, with a 91.58% AUC on the UCF-Crime dataset, and demonstrates superior results on XD-Violence and MSAD datasets. By leveraging category-specific expertise and temporal guidance, GS-MoE sets a new benchmark for VAD under weak supervision.

Read the article / Read the article with supplementary
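The two mechanisms the abstract pairs, a temporal Gaussian emphasis and a mixture-of-experts combination, have simple generic forms, sketched below under assumed shapes (a gate over per-expert frame-score matrices). This illustrates the structure only; the paper's actual Gaussian splatting loss and expert architecture are more involved.

```python
import numpy as np

def gaussian_temporal_weights(n_frames, center, width):
    """1-D Gaussian over frame indices: concentrates the training signal
    on the temporal segment most likely to contain the anomaly."""
    t = np.arange(n_frames)
    return np.exp(-0.5 * ((t - center) / width) ** 2)

def mixture_of_experts(expert_scores, gate_logits):
    """Fuse per-expert frame-level anomaly scores with a softmax gate:
    an (E,) gate over an (E, T) score matrix yields (T,) fused scores."""
    g = np.exp(gate_logits - gate_logits.max())
    g /= g.sum()
    return g @ expert_scores
```

Weighting frame-level losses by such a Gaussian is one way weak video-level labels can be sharpened into a temporally localized signal, which is the intuition behind the guidance described above.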
  • "Scaling Action Detection: AdaTAD++ with Transformer-Enhanced Temporal-Spatial Adaptation" (with supplementary material) by Tanay Agrawal, Abid Ali, Antitza Dantcheva (3IA Chairholder), François Brémond (3IA Chairholder)
Abstract: Temporal Action Detection (TAD) is essential for analyzing long-form videos by identifying and segmenting actions within untrimmed sequences. While recent innovations like Temporal Informative Adapters (TIA) have improved resolution, memory constraints still limit large video processing. To address this, we introduce AdaTAD++, an enhanced framework that decouples temporal and spatial processing within adapters, organizing them into independently trainable modules. Our novel two-step training strategy first optimizes for high temporal and low spatial resolution, then vice versa, allowing the model to utilize both high spatial and temporal resolutions during inference, while maintaining training efficiency. Additionally, we incorporate a more sophisticated temporal module capable of capturing long-range dependencies more effectively than previous methods. Experiments on benchmark datasets, including ActivityNet-1.3, THUMOS14, and EPIC-Kitchens 100, demonstrate that AdaTAD++ achieves state-of-the-art performance. We also explore various adapter configurations, discussing their trade-offs regarding resource constraints and performance, providing valuable insights into their optimal application.

Read the article

MultiMediate Challenge: Multi-modal Group Behaviour Analysis for Artificial Mediation, part of the 33rd ACM International Conference on Multimedia, October 2025, Dublin (Ireland)

  • "MultiMediate'25: Cross-cultural Multi-domain Engagement Estimation" by Daksitha Senel Withanage Don, Marius Funk, Michal Balazia, Huajian Qiu, Shogo Okada, François Brémond (3IA Chairholder), Jan Alexandersson, Andreas Bulling, Elisabeth André, Philipp Müller
Abstract: Estimating the momentary level of participant engagement is an important prerequisite for assistive systems that support human interactions. Previous work has addressed this task in within-domain evaluation scenarios, i.e. training and testing on the same dataset. This is in contrast to real-life scenarios where domain shifts between training and testing data frequently occur. With MultiMediate '24, we present the first challenge addressing multi-domain engagement estimation. As training data, we utilise the NOXI database of dyadic novice-expert interactions. In addition to within-domain test data, we add two new test domains. First, we introduce recordings following the NOXI protocol but covering languages that are not present in the NOXI training data. Second, we collected novel engagement annotations on the MPIIGroupInteraction dataset, which consists of group discussions between three to four people. In this way, MultiMediate '24 evaluates the ability of approaches to generalise across factors such as language and cultural background, group size, task, and screen-mediated vs. face-to-face interaction. This paper describes the MultiMediate '24 challenge, presents baseline results, and discusses selected challenge solutions.

Read the article