The Multi-Modal Text/Image Search using CLIP project uses CLIP to let users search for images with natural-language descriptions. Built on Weaviate, it supports multi-modal search, combining text and images: users can describe an image or provide an image directly as the query. The interface is simple and customizable, and various image formats are supported.
This example application spins up a Weaviate instance using the multi2vec-clip module, imports a few sample images (you can add your own images, too!) and provides a very simple search frontend written in Python using Flask.
Model Credits: This demo uses the clip-ViT-B-32-multilingual-v1 model from SBERT.net. Shoutout to Nils Reimers and his colleagues for the great Sentence Transformers models.
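To give an idea of how the multi2vec-clip module is wired up, here is a minimal schema sketch using the Weaviate Python client (v3). The class and property names (`ClipImage`, `image`, `filename`) are illustrative assumptions and may differ from what the demo's import script actually creates.

```python
import weaviate

# Connect to the local Weaviate instance started by start.sh
client = weaviate.Client("http://localhost:8080")

# A class whose objects are vectorized by the multi2vec-clip module.
# The "image" blob property holds base64-encoded image data.
class_obj = {
    "class": "ClipImage",  # assumed name, for illustration only
    "vectorizer": "multi2vec-clip",
    "moduleConfig": {
        "multi2vec-clip": {
            "imageFields": ["image"],
        }
    },
    "properties": [
        {"name": "image", "dataType": ["blob"]},
        {"name": "filename", "dataType": ["string"]},
    ],
}

client.schema.create_class(class_obj)
```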
- Docker & Docker-Compose: Required to set up the Weaviate instance.
- Bash: Necessary for executing the provided setup scripts.
- Python and pip: The frontend is implemented in Python; pip is needed to install the dependencies listed in `requirements.txt`.
- Run Docker on your machine.
- Run the `start.sh` script: `$ bash start.sh`
- Open your browser at http://localhost:5000
- To stop the server, press CTRL + C.
- Use the `stop.sh` script when finished: `$ bash stop.sh`
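Once the instance is up, the frontend's text search essentially boils down to a nearText query against Weaviate. Below is a minimal sketch with the Weaviate Python client (v3); the class name `ClipImage` and the `filename` property are assumptions and may not match the demo's actual schema.

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# Ask the multi2vec-clip module to embed the query text and return
# the images whose CLIP vectors are closest to it.
result = (
    client.query
    .get("ClipImage", ["filename"])  # class/property names assumed
    .with_near_text({"concepts": ["a dog playing in the snow"]})
    .with_limit(3)
    .do()
)

print(result["data"]["Get"]["ClipImage"])
```

Because the same module also embeds images, an image-to-image search works the same way by swapping `with_near_text` for `with_near_image`.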
Simply add your images to the `./images` folder prior to running the import script. The script looks for the `.jpg` file ending, but Weaviate supports other image types as well; you can adapt the script to use those if you like.
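For a rough picture of what the import does, the sketch below (again using the v3 Python client) base64-encodes every `.jpg` in `./images` and stores it as a blob property. It is an illustration under assumed class/property names, not the demo's actual import script.

```python
import base64
import os

import weaviate

client = weaviate.Client("http://localhost:8080")

IMAGE_DIR = "./images"

for filename in sorted(os.listdir(IMAGE_DIR)):
    if not filename.endswith(".jpg"):
        continue
    with open(os.path.join(IMAGE_DIR, filename), "rb") as f:
        # Weaviate blob properties expect base64-encoded content.
        encoded = base64.b64encode(f.read()).decode("utf-8")
    client.data_object.create(
        {"image": encoded, "filename": filename},
        "ClipImage",  # assumed class name, for illustration only
    )
```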
The images used in this demo are licensed as follows:
- Photo by Michael on Unsplash
- Photo by Bas Peperzak on Unsplash
- Photo by David Köhler on Unsplash
- Photo by eggbank on Unsplash
- Photo by John McArthur on Unsplash
It is a minimal example using only 5 images, but you can add any number of images yourself!