Recent work has shown a variety of GAN-based methods for semantic image manipulation guided by text descriptions. An approach based on GAN inversion can achieve versatile image manipulation without a time-consuming preprocessing stage. However, this approach lacks self-adaptation because of the intrinsic conflict among its multi-objective losses. It is also not robust when applied to text-guided image manipulation, because the search space is vast and ambiguous. To solve these problems, we propose RAIN, a novel framework based on GAN inversion that achieves robust and adaptive text-driven image manipulation. As shown in Fig. 1(c), RAIN contains two main parts: CEV Initialization and RAGAN inversion. CEV Initialization adaptively provides a Candidate Editing Vector (CEV) in a short time. RAGAN inversion is a multi-stage optimization scheme that uses the CEV as prior knowledge to prune the search space. In RAGAN inversion, we also explore how to improve the vision-language model's perception capability to restrict the search space further. The objective of the paper is to guarantee semantic correctness and image quality in a time-constrained scenario, in comparison with SOTA text-guided image manipulation methods. Extensive experiments show that RAIN can manipulate images guided by text descriptions while achieving both robustness and self-adaptation.