Instance-wise distribution control of text-to-image diffusion models

Oct 31, 2025
Weng Ian Chan, Hiroaki Santo, Yasuyuki Matsushita, Fumio Okura
Abstract
Text-to-image diffusion models are increasingly used to generate synthetic datasets for downstream vision tasks. However, they often inherit biases from large-scale training data, which can result in unbalanced attribute distributions in the generated images. While prior efforts have attempted to mitigate these biases, most focus on single-object images and struggle to control attributes across object instances in multi-instance generations. To address this limitation, we propose an instance-wise control of the attribute distribution by fine-tuning diffusion models with guidance from a pre-trained object detector and an attribute classifier. Our approach aligns the attribute distribution over object instances in generated images with a user-defined distribution, which enables precise control over attribute proportions at the instance level. Experiments across various objects and attributes demonstrate that our method generates high-quality, multi-instance images that match the specified distribution, supporting the scalable creation of distribution-aware synthetic datasets for in-the-wild vision tasks.
Publication: Pattern Recognition
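The core idea in the abstract, aligning the attribute distribution over detected object instances with a user-defined target, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the use of a KL-divergence objective, and the assumption that the attribute classifier outputs per-instance logits are all illustrative choices.

```python
import numpy as np

def distribution_alignment_loss(instance_logits, target_dist, eps=1e-8):
    """Hypothetical alignment objective: KL(target || empirical) between a
    user-defined attribute distribution and the mean classifier prediction
    over detected instances in a generated image.

    instance_logits: (num_instances, num_attributes) attribute-classifier
                     scores for each instance found by the object detector.
    target_dist:     (num_attributes,) user-specified attribute proportions.
    """
    # Softmax over attribute scores for each detected instance.
    z = instance_logits - instance_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Empirical attribute distribution across instances in this image.
    empirical = probs.mean(axis=0)
    return float(np.sum(target_dist * (np.log(target_dist + eps)
                                       - np.log(empirical + eps))))
```

In a fine-tuning loop, a loss of this form would be computed on detector/classifier outputs for generated images and back-propagated into the diffusion model, driving the per-instance attribute proportions toward the target. For example, with four detected instances split evenly between two attributes, a 50/50 target yields a loss near zero, while a skewed target yields a positive loss.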