Dataset Gallery

ConsistCompose3M

A 3.4M-sample multimodal dataset for layout-grounded image composition, annotated with layouts and subject identities.


Overview

Data Statistics

Text2Image, Single-Reference, Multi-Reference

Category 1

Layout Text to Image

Category 2

Single Subject

Subject → Background → ROI → Output

Category 2 — Full Wall

Single Subject Image Wall

Category 3

Multi-subject

Sub 1 → Sub 2 → Background → ROI → Output

Category 3 — Full Wall

Multi-subject Image Wall (FLUX)

Category 3 — Qwen

Multi-subject Image Wall (Qwen Image)

Citation

BibTeX

If you find our work useful, please cite:

@article{shi2025consistcompose,
  title={ConsistCompose: Unified Multimodal Layout Control for Image Composition},
  author={Shi, Xuanke and Li, Boxuan and Han, Xiaoyang and Cai, Zhongang and
          Yang, Lei and Lin, Dahua and Wang, Quan},
  journal={arXiv preprint arXiv:2511.18333},
  year={2025}
}