Real-Time Text-to-Image synthesis with LCM-LoRA

Posted by : Junseo Park on Mar 1, 2024

Category : project

🚀 "Provides text-to-image synthesis in real-time
through LCM-LoRA" 🌟

The Latent Diffusion Model (LDM) has brought innovation to reliable image generation. Nevertheless, the inherently slow sampling process of diffusion models hinders real-time generation, negatively impacting user experience.
Efforts to accelerate LDM generally fall into two categories:
1. Utilizing advanced ODE solvers such as DDIM and DPMSolver, which drastically reduce the $1,000$ time steps of DDPM.
2. Distilling LDM.

The slow generation speed of traditional LDMs makes it challenging to satisfy user experience.
Deploying a site to test user satisfaction requires significant time and resources.

Introducing LCM-LoRA into LDM (SDXL) to enable near real-time image synthesis:
- The recently introduced Latent Consistency Model (LCM), inspired by the Consistency Model (CM), allows for the application of CM in latent space.
  - It can create an origin from any point on the ODE path.
  - This enables efficient synthesis of high-resolution images with as few as $1$ to $4$ time steps.
- LCM-LoRA acts as an adapter that can be attached to existing LDMs, reducing training costs.
Building a website using Flask and ChatGPT.

As demonstrated in the demo video on Github, the model generates images in response to user text input in near real-time.
Although not perfectly real-time due to communication delays between the server and user and the time required to save images, the experience is significantly faster and more satisfying than before.
- Feedback was collected from acquaintances to evaluate user satisfaction.