Reference Point-Based Interpolation of CLIP Embeddings for Controlling Text-To-Image Generation

Abstract

This thesis investigates interpolation methods within the context of text-to-image generation, focusing on the latent space of the CLIP (Contrastive Language-Image Pretraining) model. Our work explores the effectiveness of linear interpolation (lerp) and spherical linear interpolation (slerp) in generating coherent and smooth transitions between text prompts. Results indicate that slerp outperforms lerp, particularly with complex prompts, by producing more visually coherent images. Additionally, a novel reference-based interpolation method is introduced, leveraging cosine similarity to guide the interpolation path through the latent space. While manual selection of reference points demonstrated improved interpolation quality, automatic selection methods showed varying levels of success. Despite these advancements, limitations related to dataset quality and the initial embeddings were identified, highlighting areas for future research. The findings contribute to the broader understanding of interpolation methods in multimodal AI, offering insights into sampling from text-to-image generation models.
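To make the difference between the two interpolation schemes concrete, here is a minimal sketch of lerp and slerp in NumPy. It is not taken from the notebooks: the function names, the `eps` threshold, and the toy 2-D vectors (standing in for flattened CLIP text embeddings) are assumptions chosen for illustration.

```python
import numpy as np


def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Linear interpolation: a point on the straight line from a to b."""
    return (1.0 - t) * a + t * b


def slerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-7) -> np.ndarray:
    """Spherical linear interpolation: a point on the arc from a to b."""
    a_unit = a / np.linalg.norm(a)
    b_unit = b / np.linalg.norm(b)
    # Angle between the vectors; clip guards against rounding outside [-1, 1].
    omega = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    sin_omega = np.sin(omega)
    if sin_omega < eps:
        # Nearly (anti)parallel vectors: the slerp weights become unstable,
        # so this sketch falls back to lerp (an assumed design choice).
        return lerp(a, b, t)
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / sin_omega


# Toy unit vectors (assumption: real inputs would be CLIP text embeddings).
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
mid_lerp = lerp(a, b, 0.5)
mid_slerp = slerp(a, b, 0.5)
print(np.linalg.norm(mid_lerp), np.linalg.norm(mid_slerp))
```

For unit-length endpoints, the slerp midpoint stays on the unit sphere, while the lerp midpoint has a smaller norm. That shrinkage is one plausible reason lerp can yield less coherent images.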

Structure

00 interpolates CLIP embeddings using lerp and slerp and generates images from the interpolated embeddings
01 evaluates the generated images from 00
02 uses manually selected prompts to improve the interpolation quality
03 evaluates the generated images from 02
04 runs experiments with the EOT token of CLIP
05 selects the reference points for the interpolation automatically
06 evaluates the generated images from 05
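Steps 02 and 05 route the interpolation through reference points chosen with the help of cosine similarity. The sketch below shows one way this could look. It is hypothetical throughout: the names `pick_reference` and `interpolate_via_reference`, the scoring rule (mean cosine similarity to both endpoints), and the choice to pass through the reference at t = 0.5 are illustrative assumptions, not the method implemented in the notebooks.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def slerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-7) -> np.ndarray:
    """Spherical linear interpolation, falling back to lerp for tiny angles."""
    omega = np.arccos(np.clip(cosine_similarity(a, b), -1.0, 1.0))
    if np.sin(omega) < eps:
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)


def pick_reference(start: np.ndarray, end: np.ndarray, candidates: list) -> int:
    """ASSUMED heuristic: index of the candidate that is, on average,
    most cosine-similar to both endpoints."""
    scores = [
        0.5 * (cosine_similarity(c, start) + cosine_similarity(c, end))
        for c in candidates
    ]
    return int(np.argmax(scores))


def interpolate_via_reference(start, end, ref, t: float) -> np.ndarray:
    """Piecewise slerp: start -> ref for t in [0, 0.5], ref -> end afterwards."""
    if t <= 0.5:
        return slerp(start, ref, 2.0 * t)
    return slerp(ref, end, 2.0 * t - 1.0)


# Toy 3-D vectors standing in for prompt embeddings (assumption).
start = np.array([1.0, 0.0, 0.0])
end = np.array([0.0, 1.0, 0.0])
candidates = [
    np.array([0.0, 0.0, 1.0]),  # orthogonal to both endpoints
    np.array([1.0, 1.0, 1.0]),  # partially similar to both endpoints
]
best = pick_reference(start, end, candidates)
path = [
    interpolate_via_reference(start, end, candidates[best], t)
    for t in np.linspace(0.0, 1.0, 5)
]
```

In this toy setup the second candidate is selected, and the path starts at `start`, passes through the reference at the midpoint, and ends at `end`. A manual variant would skip `pick_reference` and pass in the embedding of a hand-picked prompt.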

Folders and files

ba_utils
data
gen_out/fid
plt_out
.gitignore
00_genarate_p1.ipynb
00_generate_p2.ipynb
00_generate_p3.ipynb
00_generate_p4.ipynb
00_generate_p5.ipynb
01_evaluate_qualitative.ipynb
01_evaluate_quantitative.ipynb
02_manual_reference_p4.ipynb
02_manual_reference_p5.ipynb
03_evaluate_manual_reference_p4.ipynb
03_evaluate_manual_reference_p5.ipynb
04_cluster.ipynb
05_automatic_references.ipynb
06_evaluate_automatic_reference_qualitative.ipynb
06_evaluate_automatic_reference_quantitative.ipynb
README.md

Some Results

[Two result images, each captioned: using slerp and reference points]

traberph/CLIP-Interpolation
