Text to Image Conversion using Stable Diffusion
Ashly Correya1, Amrutha N2
1Ashly Correya, Department of Computer Science, St. Albert’s College, Kochi (Kerala), India.
2Amrutha N, Department of Computer Science, St. Albert’s College, Kochi (Kerala), India.
Manuscript received on 25 April 2024 | Revised Manuscript received on 04 May 2024 | Manuscript Accepted on 15 May 2024 | Manuscript published on 30 May 2024 | PP: 17-20 | Volume-4 Issue-1 May 2024 | Retrieval Number: 100.1/ijdm.A163904010524 | DOI: 10.54105/ijdm.A1639.04010524
© The Authors. Published by Lattice Science Publication (LSP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: In this paper, we present a technique for translating textual descriptions into images using Stable Diffusion, with particular emphasis on the latent diffusion model (LDM). Our approach departs from conventional Generative Adversarial Network (GAN) methods such as AttnGAN and offers improved accuracy and diversity in the generated images. Using the LAION-5B dataset, we fine-tune the Stable Diffusion model and validate the method through extensive experimentation and comparative analysis, observing superior performance on text-to-image conversion tasks. By adopting diffusion-based techniques, we overcome some of the limitations of previous methodologies, achieving higher fidelity in image generation while maintaining diverse outputs. Our method captures the intricate details and nuances specified in textual descriptions, enabling a more faithful translation from text to image. The significance of this work extends beyond technical improvement: it opens new possibilities for creative expression and content generation, and it democratizes image creation by letting users translate their ideas into visual representations with little effort. We aim to inspire further exploration and innovation in text-to-image conversion. The success of Stable Diffusion-based methods underscores their potential in domains including computer vision, graphic design, and multimedia content creation, and we anticipate further progress in intelligent image synthesis and interpretation as these techniques are refined and optimized.
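For readers unfamiliar with diffusion models, the noise-prediction objective that diffusion fine-tuning typically minimizes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear noise schedule, its start and end values, the four-element stand-in latent, and all function names are hypothetical choices made for the example, and the placeholder prediction stands in for a text-conditioned denoising network.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T (linear schedule; values are assumptions)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def add_noise(z0, t, alpha_bars, rng):
    """Forward process: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    a = math.sqrt(alpha_bars[t])
    s = math.sqrt(1.0 - alpha_bars[t])
    zt = [a * z + s * e for z, e in zip(z0, eps)]
    return zt, eps

def noise_prediction_loss(eps_true, eps_pred):
    """Mean squared error between the true and the predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred)) / len(eps_true)

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
z0 = [0.5, -1.0, 0.25, 2.0]      # hypothetical stand-in for an encoded image latent
zt, eps = add_noise(z0, t=500, alpha_bars=alpha_bars, rng=rng)
zero_guess = [0.0] * len(eps)    # placeholder for a text-conditioned denoiser's output
print(noise_prediction_loss(eps, zero_guess))
```

In a latent diffusion model the latent would come from an image autoencoder and the prediction from a network conditioned on the text prompt; fine-tuning adjusts that network so the loss above decreases.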
Keywords: Text-to-Image Conversion, Stable Diffusion, Latent Diffusion Model, Fine-Tuning, LAION-5B Dataset.
Scope of the Article: Data Science