Paolo Papotti cosigned a paper accepted at SIGMOD 2025
Published on September 19, 2025
Dates: June 22–27, 2025
Location: Berlin, Germany
We are very pleased to announce that a paper cosigned by Paolo Papotti has been accepted for SIGMOD 2025.
The paper, titled “Logical and Physical Optimizations for SQL Query Execution over Large Language Models”, is cosigned by Dario Satriani (Ph.D. student at University of Basilicata), Enzo Veltri (Ph.D. student at University of Basilicata), Donatello Santoro (Ph.D. student at University of Basilicata), Sara Rosato (data engineer at SLB), Simone Varriale (Master’s student at EURECOM and Politecnico di Torino), and Paolo Papotti (3IA Chairholder and associate professor in the Data Science department at EURECOM).
Abstract:
Interacting with Large Language Models (LLMs) via declarative queries is increasingly popular for tasks like question answering and data extraction, thanks to their ability to process vast unstructured data. However, LLMs often struggle with answering complex factual questions, exhibiting low precision and recall in the returned data.
This challenge highlights that executing queries on LLMs remains a largely unexplored domain, where traditional data processing assumptions often fall short. Conventional query optimization, typically cost-driven, overlooks LLM-specific quality challenges such as contextual understanding. Just as new physical operators are designed to address the unique characteristics of LLMs, optimization must consider these quality challenges. Our results highlight that adhering strictly to conventional query optimization principles fails to generate the best plans in terms of result quality.
To tackle this challenge, we present a novel approach to enhance SQL results by applying query optimization techniques specifically adapted for LLMs. We introduce a database system, GALOIS, that sits between the query and the LLM, effectively using the latter as a storage layer. We design alternative physical operators tailored for LLM-based query execution and adapt traditional optimization strategies to this novel context. For example, while pushing down operators in the query plan reduces execution cost (fewer calls to the model), it might complicate the call to the LLM and deteriorate result quality. Additionally, these models lack a traditional catalog for optimization, leading us to develop methods to dynamically gather such metadata during query execution.
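The pushdown trade-off described above can be sketched in a few lines. The snippet below is a hypothetical illustration, not the GALOIS implementation: the LLM is modeled as a stubbed prompt-to-rows function, and two physical plans answer the same query (capitals of countries whose name starts with “I”), one scanning broadly and filtering locally, the other pushing the predicate into the prompt. All names (`llm`, the plan functions, the prompts) are illustrative assumptions.

```python
def llm(prompt: str) -> list[str]:
    """Stand-in for a model call; each returned line is 'country,capital'.

    A real system would call an actual LLM here; canned answers keep the
    sketch self-contained and deterministic.
    """
    canned = {
        "List every EU country and its capital as 'country,capital', one per line.":
            ["France,Paris", "Italy,Rome", "Ireland,Dublin"],
        "List every EU country starting with 'I' and its capital as 'country,capital', one per line.":
            ["Italy,Rome", "Ireland,Dublin"],
    }
    return canned[prompt]

def plan_scan_then_filter() -> list[tuple[str, str]]:
    # Plan A: broad scan, predicate applied locally. More model output to
    # pay for, but the prompt stays simple, which tends to preserve quality.
    rows = [tuple(line.split(","))
            for line in llm("List every EU country and its capital as "
                            "'country,capital', one per line.")]
    return [r for r in rows if r[0].startswith("I")]

def plan_filter_pushdown() -> list[tuple[str, str]]:
    # Plan B: predicate pushed into the prompt. Fewer/cheaper calls, but the
    # harder question may hurt precision and recall on a real model.
    return [tuple(line.split(","))
            for line in llm("List every EU country starting with 'I' and its "
                            "capital as 'country,capital', one per line.")]
```

With a perfect model both plans agree; the paper's point is that on a real LLM the cheaper pushdown plan is not always the best plan in result quality, so the optimizer must weigh both cost and quality.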
Our solution is compatible with any LLM and balances the trade-off between query result quality and execution cost. Experiments show quality improvements of up to 144% over natural-language questions and 29% over direct SQL execution, highlighting the advantages of integrating database solutions with LLMs.