A 3.4M-sample multimodal dataset for layout-grounded image composition with layout and identity annotations.
If you find our work useful, please cite:
@article{shi2025consistcompose,
  title={ConsistCompose: Unified Multimodal Layout Control for Image Composition},
  author={Shi, Xuanke and Li, Boxuan and Han, Xiaoyang and Cai, Zhongang and Yang, Lei and Lin, Dahua and Wang, Quan},
  journal={arXiv preprint arXiv:2511.18333},
  year={2025}
}