Paper by Giovanni Neglia and Team accepted at NeurIPS 2025

Published on November 26, 2025 Updated on November 26, 2025
Dates: December 2-7, 2025
Location: San Diego Convention Center, California (USA)

The paper “Streaming Federated Learning with Markovian Data” has been accepted at NeurIPS 2025 in San Diego.

Co-authored by Huỳnh Tấn Khiêm (Ph.D. student, Inria), Malcolm Egan (Tenured Research Scientist, Inria), Giovanni Neglia (3IA Chairholder, Researcher, Head of NEO Team, Inria), and Jean-Marie Gorce (Research Director, Inria, on secondment from INSA Lyon), the paper will be presented at the San Diego poster session on December 4.

Abstract: Federated learning (FL) is now recognized as a key framework for communication-efficient collaborative learning. Most theoretical and empirical studies, however, rely on the assumption that clients have access to pre-collected data sets, with limited investigation into scenarios where clients continuously collect data. 
In many real-world applications, particularly when data is generated by physical or biological processes, client data streams are often modeled by non-stationary Markov processes. Unlike standard i.i.d. sampling, the performance of FL with Markovian data streams remains poorly understood due to the statistical dependencies between client samples over time. In this paper, we investigate whether FL can still support collaborative learning with Markovian data streams. Specifically, we analyze the performance of Minibatch SGD, Local SGD, and a variant of Local SGD with momentum. We answer affirmatively under standard assumptions and smooth non-convex client objectives: the sample complexity is proportional to the inverse of the number of clients, with a communication complexity comparable to the i.i.d. scenario. However, the sample complexity for Markovian data streams remains higher than for i.i.d. sampling. Our analysis is validated via experiments with real pollution monitoring time series data.
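
To make the setting in the abstract more concrete, here is a minimal sketch of Local SGD in which each client draws its samples from a Markov chain instead of i.i.d. It is not the authors' implementation or experimental setup. The two-state chain, the scalar quadratic loss (convex, unlike the non-convex objectives analyzed in the paper), and all names and parameter values (`markov_stream`, `local_sgd`, `p_stay`, `lr`, and so on) are assumptions chosen only for illustration.

```python
import random


def markov_stream(rng, p_stay, values, state=0):
    """Yield samples from a two-state Markov chain (hypothetical toy data source).

    The chain stays in its current state with probability p_stay, so
    consecutive samples are statistically dependent rather than i.i.d.
    """
    while True:
        yield values[state]
        if rng.random() > p_stay:
            state = 1 - state


def local_sgd(num_clients=8, rounds=200, local_steps=5, lr=0.05,
              p_stay=0.9, seed=0):
    """Toy Local SGD on the loss f(w) = E[(w - x)^2] / 2 with streaming data."""
    rng = random.Random(seed)
    # One stream per client; the symmetric chain over {0.0, 2.0} has a
    # uniform stationary distribution, so the minimizer of f is w = 1.0.
    streams = [
        markov_stream(rng, p_stay, (0.0, 2.0), state=rng.randrange(2))
        for _ in range(num_clients)
    ]
    w = 5.0  # global model, deliberately initialized far from the optimum
    for _ in range(rounds):
        local_models = []
        for stream in streams:
            w_local = w  # each client starts the round from the global model
            for _ in range(local_steps):
                x = next(stream)      # fresh sample from the client's stream
                grad = w_local - x    # gradient of (w - x)^2 / 2
                w_local -= lr * grad
            local_models.append(w_local)
        # Communication step: the server averages the local models.
        w = sum(local_models) / num_clients
    return w


if __name__ == "__main__":
    print(local_sgd())
```

In this toy setup, raising `p_stay` makes successive samples more strongly correlated, which loosely illustrates why dependent streams can need more samples than i.i.d. data, while averaging over more clients reduces the noise in the global model.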
