Diffusion-Based Quality Control of Medical Image Segmentations across Multiple Organs
Abstract: Medical image segmentation, powered by deep learning, has revolutionized automated analysis pipelines for large-scale population studies. However, state-of-the-art methods are prone to hallucinations that lead to anatomically implausible segmentations. With manual correction impractical at scale, automated quality control (QC) techniques have emerged to address this challenge. While promising, existing QC methods are designed for a single organ, which limits their generalizability to other applications. To overcome this limitation, we propose no-new Quality Control (nnQC), a robust QC framework based on a diffusion-generative paradigm that self-adapts to any input organ dataset. Central to nnQC is a novel Team of Experts (ToE) architecture, in which two specialized experts independently process an image and its predicted segmentation, generating a pair of independent embeddings, or opinions. A weighted conditional module combines the opinions to guide sampling within a diffusion process, enabling the accurate generation of a spatially aware pseudo-ground truth (pGT) that is used to predict QC scores. We evaluated nnQC on seven organs using publicly available datasets. By adapting the network through extracted dataset information, or fingerprints, and leveraging the proposed ToE framework, nnQC consistently outperforms state-of-the-art methods across all experiments, including cases where segmentation masks are highly degraded or completely missing, confirming it as a versatile, off-the-shelf QC solution across different organs.
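As a rough illustration of two steps described in the abstract (combining the experts' opinions, then scoring a predicted segmentation against the pGT), a minimal sketch follows. The abstract does not specify the weighting scheme or the QC metric, so the softmax-weighted sum, the Dice score, and all function and parameter names below are assumptions made for illustration, not the nnQC implementation. The diffusion sampling that produces the pGT is omitted.

```python
import numpy as np


def combine_opinions(image_emb, seg_emb, logits):
    """Blend two expert embeddings with softmax weights.

    Assumption: a softmax-weighted sum stands in for the paper's
    weighted conditional module, whose exact form is not given here.
    """
    logits = np.asarray(logits, dtype=float)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights[0] * np.asarray(image_emb) + weights[1] * np.asarray(seg_emb)


def dice_score(pred_mask, pgt_mask):
    """Dice overlap between a predicted mask and a pseudo-ground truth.

    Assumption: Dice is used here as an example QC score.
    Two empty masks are treated as perfect agreement.
    """
    pred = np.asarray(pred_mask).astype(bool)
    pgt = np.asarray(pgt_mask).astype(bool)
    denom = pred.sum() + pgt.sum()
    if denom == 0:
        return 1.0
    intersection = np.logical_and(pred, pgt).sum()
    return float(2.0 * intersection / denom)


if __name__ == "__main__":
    # Hypothetical embeddings from the image expert and the segmentation expert.
    image_emb = np.array([1.0, 0.0, 2.0])
    seg_emb = np.array([3.0, 4.0, 0.0])
    condition = combine_opinions(image_emb, seg_emb, logits=[0.0, 0.0])
    print("conditioning vector:", condition)

    # Toy masks: the prediction overlaps the pGT on one of two foreground pixels.
    pred = np.array([[1, 1, 0, 0]])
    pgt = np.array([[0, 1, 1, 0]])
    print("QC score (Dice):", dice_score(pred, pgt))

    # A completely missing prediction scores zero against a non-empty pGT.
    print("QC score, missing mask:", dice_score(np.zeros_like(pgt), pgt))
```

In this sketch a low score flags a segmentation for review; how the score is thresholded would depend on the organ and the application.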