Aberrant Crypt Foci (ACFs) are considered early markers of colorectal neoplasms, yet their segmentation remains challenging because annotated data are limited and lesions vary widely. We propose CtrlEndoDiff, a diffusion-based synthetic augmentation framework that generates realistic ACF images under diverse artefact conditions and shapes. By conditioning a denoising diffusion probabilistic model on multi-class masks (blood, dye, and waste) and on depth cues, our method expands both the morphological and the contextual variability of small ACF datasets. We evaluate performance gains on five segmentation architectures: three convolutional networks (U-Net, U-Net++, and DeepLab V3+) and two models with transformer backbones (TransNetR and PVT-CASCADE). In our experiments, the less complex CNN models achieve the largest boosts in test Dice (e.g., +22.1% for U-Net), while DeepLab V3+ sees a +8.4% gain. The transformer-backbone architectures also benefit, with improvements of +0.6% for TransNetR and +2.7% for PVT-CASCADE. Qualitative assessments confirm that the diffusion-generated images replicate the annotated artefacts and reduce the performance drop that typically occurs when training with limited real data. Our results indicate that CtrlEndoDiff improves ACF segmentation accuracy by adding visually realistic synthetic samples to the training process.
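The gains above are reported in test Dice. As a reminder of what that metric measures, the following is a minimal sketch of the Dice coefficient for binary masks. It is not the authors' evaluation code: the function name `dice_score`, the nested-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks.

    Masks are nested lists of 0/1 values (an illustrative representation,
    not the format used in the paper).
    """
    flat_pred = [int(v) for row in pred for v in row]
    flat_target = [int(v) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels labelled as lesion in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each mask.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
print(round(dice_score(predicted, reference), 4))  # 2*2 / (3+3) = 0.6667
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground pixels.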
Chapter
2026-01-01T00:00:00+00:00
16128 LNCS
319 - 329
10