Controllable Diffusion Models for Fine-Grained Image Editing via Prompt-Guided Semantic Inpainting

Authors

  • Elham Mohammed Thabit Abdul-Ameer, University of Karbala / College of Computer Science and Information Technology

Keywords:

Diffusion models, image editing, semantic inpainting, prompt-guided editing, cross-attention, mask encoder, dynamic prompt tokens

Abstract

Diffusion models have transformed image synthesis, yet most fail to support fine-grained editing guided by user intent. We present PromptEditDiff, a new prompt-based method for fine-grained image editing via semantic inpainting. Our model combines a cross-attention mechanism, a specially designed mask encoder, and dynamic prompt tokens, enabling precise, region-specific modifications in response to text prompts. Extensive experiments on the CelebA-HQ and COCO datasets demonstrate that PromptEditDiff substantially outperforms state-of-the-art baselines in both photorealism and prompt alignment, reducing FID to 6.2 and achieving an 84.7% human preference rate over the baselines. Objective metrics and user studies alike show that PromptEditDiff enables more accurate, intuitive, and controllable image editing, paving the way for accessible, prompt-driven editing of visual content.
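The abstract describes restricting prompt-conditioned cross-attention to a user-specified region. A minimal sketch of that idea, in NumPy, is shown below: image features attend over text-prompt token embeddings, and the resulting edit is blended back only where the binary mask is active. All names (`masked_cross_attention`, the feature shapes) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_cross_attention(img_feats, prompt_tokens, mask):
    """Blend prompt-conditioned features into image features,
    restricted to the masked (editable) region.

    img_feats:     (N, d) flattened spatial features (queries)
    prompt_tokens: (T, d) text-prompt token embeddings (keys/values)
    mask:          (N,) binary region mask; 1 = editable, 0 = keep
    """
    d = img_feats.shape[-1]
    # Scaled dot-product attention: each spatial location attends
    # over the prompt tokens.
    attn = softmax(img_feats @ prompt_tokens.T / np.sqrt(d))  # (N, T)
    edited = attn @ prompt_tokens                             # (N, d)
    # Outside the mask the original features pass through unchanged.
    m = mask[:, None]
    return m * edited + (1.0 - m) * img_feats

# Toy usage: only masked locations are modified.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
tokens = rng.normal(size=(3, 8))
region = np.array([1.0, 0.0, 1.0, 0.0])
out = masked_cross_attention(feats, tokens, region)
```

In a full diffusion pipeline this blend would sit inside each denoising step, with the mask produced by the learned mask encoder rather than given directly; this sketch only illustrates the region-gating mechanism itself.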


Downloads

Published

2025-12-31