Commit 70800f6

Add a guide to explain how to choose a model.

qdequele committed Sep 2, 2024 · 1 parent fcf2f4e

Showing 2 changed files with 159 additions and 0 deletions.
5 changes: 5 additions & 0 deletions config/sidebar-guides.json
@@ -55,6 +55,11 @@
"title": "Artificial Intelligence",
"slug": "ai",
"routes": [
{
"source": "guides/choosing-model.mdx",
"label": "Choosing the best model for semantic search",
"slug": "choosing_model"
},
{
"source": "guides/langchain.mdx",
"label": "Implementing semantic search with LangChain",
154 changes: 154 additions & 0 deletions guides/choosing-model.mdx
@@ -0,0 +1,154 @@
# Choosing the best model and service for semantic search

## Introduction

Semantic search is transforming search technology by providing more accurate and relevant results. Using Meilisearch, developers can implement semantic search to enhance user experience. However, with many models and services available, choosing the right one can be challenging. This guide will help you understand the key factors to consider when selecting the best model and service for your semantic search needs.

In this guide, we will explore the following models and services:
| Model/Service | Dimensions | Context Length (tokens) |
|--------------------------------------------------------------------------------------------------------|----------------|-------------------------|
| [Cohere embed-english-v3.0](https://docs.cohere.com/docs/embeddings) | 1024 | 512 |
| [Cohere embed-english-light-v3.0](https://docs.cohere.com/docs/embeddings) | 384 | 512 |
| [Cohere embed-multilingual-v3.0](https://docs.cohere.com/docs/embeddings) | 1024 | 512 |
| [Cohere embed-multilingual-light-v3.0](https://docs.cohere.com/docs/embeddings) | 384 | 512 |
| [OpenAI text-embedding-3-small](https://platform.openai.com/docs/guides/embeddings) | 1536 | 8192 |
| [OpenAI text-embedding-3-large](https://platform.openai.com/docs/guides/embeddings) | 3072 | 8192 |
| [Mistral](https://mistral.ai/product/embeddings/) | 1024 | 8192 |
| [VoyageAI voyage-2](https://voyageai.com/) | 1024 | 4000 |
| [VoyageAI voyage-large-2](https://voyageai.com/) | 1536 | 16000 |
| [VoyageAI voyage-multilingual-2](https://voyageai.com/) | 1024 | 32000 |
| [Jina Colbert v2](https://jina.ai/news/jina-colbert-v2-multilingual-late-interaction-retriever-for-embedding-and-reranking/) | 128, 96, or 64 | 8192 |
| [OSS all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | 384 | 512 |
| [OSS bge-small-en-v1.5](https://huggingface.co/Cloudflare/bge-small-en-v1.5) | 384 | 512 |
| [OSS bge-large-en-v1.5](https://huggingface.co/Cloudflare/bge-large-en-v1.5) | 1024 | 512 |

Disclaimer: all tests were run on the Meilisearch Cloud Build plan. Performance will differ on the Pro plan, and even more so on the Enterprise plan.

## Factors to consider

### 1. Relevancy: striking the perfect balance

Relevancy is crucial for effective search, as it ensures that users find the most pertinent results quickly. In the realm of semantic search, achieving a balance between relevancy and speed is essential to provide a seamless user experience.

Meilisearch's hybrid search can significantly enhance relevancy by combining the strengths of full-text search and semantic search. For precise queries, full-text search often yields better results than semantic search alone; by integrating both methods, you can leverage the advantages of each and offer users the best of both worlds.
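As an illustration, here is a minimal hybrid query; the `products` index name and the `default` embedder name are placeholders, and `semanticRatio` controls the blend between the two methods:

```typescript
// Hybrid search: blend full-text and semantic ranking in a single query.
// Assumes a Meilisearch instance with an embedder named "default" already
// configured on a hypothetical "products" index.
const response = await fetch("http://localhost:7700/indexes/products/search", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer MEILISEARCH_API_KEY",
  },
  body: JSON.stringify({
    q: "warm jacket for hiking",
    hybrid: {
      embedder: "default",
      semanticRatio: 0.5, // 0 = full-text only, 1 = semantic only
    },
  }),
});
const { hits } = await response.json();
```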

When selecting a model, consider your specific use case, such as the need for multilingual support, handling multi-modal data, or addressing domain-specific requirements. If you have a highly specialized use case or need to support a particular language, it may be beneficial to explore models that can be trained on your data or opt for multilingual models.

The performance difference between a very small model and a large model is not always substantial. Smaller models are generally less expensive and faster, making them a practical choice in many scenarios. Therefore, it is often worth considering smaller models for their cost-effectiveness and speed.

Additionally, fine-tuning the template Meilisearch uses to build each document's embedding input (the `documentTemplate` embedder setting) can further enhance relevancy. The more accurately the template captures what matters in your documents, the better the search results will be, leading to a more satisfying user experience.
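A minimal sketch of such a template, assuming an OpenAI embedder and hypothetical `title` and `description` fields (adapt the names to your own documents):

```typescript
// Declare an embedder whose documentTemplate (a Liquid template) controls
// exactly which fields are turned into the embedding input. A tighter,
// more descriptive template generally produces more relevant vectors.
await fetch("http://localhost:7700/indexes/products/settings/embedders", {
  method: "PATCH",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer MEILISEARCH_API_KEY",
  },
  body: JSON.stringify({
    default: {
      source: "openAi",
      model: "text-embedding-3-small",
      apiKey: "OPENAI_API_KEY",
      documentTemplate:
        "A product named {{doc.title}}, described as: {{doc.description}}",
    },
  }),
});
```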

### 2. Search performance: lightning-fast results

In today's fast-paced digital landscape, users expect instant gratification. Providing an "as-you-type" search experience can greatly enhance user satisfaction and keep them engaged with your platform. To achieve lightning-fast search performance, consider using a local model to minimize latency by eliminating the need for round trips to the model service.

If a remote model is necessary, hosting it close to the Meilisearch engine, such as on AWS, can significantly reduce latency. The table below shows latency benchmarks for various models and services so you can make an informed decision based on your performance requirements:

| Model/Service | Latency |
|--------------------------------------------------------------------------------------------------------|------------|
| [Cloudflare bge-small-en-v1.5](https://huggingface.co/Cloudflare/bge-small-en-v1.5) | ±800ms |
| [Cloudflare bge-large-en-v1.5](https://huggingface.co/Cloudflare/bge-large-en-v1.5) | ±500ms |
| [Cohere embed-english-v3.0](https://docs.cohere.com/docs/embeddings) | ±170ms |
| [Cohere embed-english-light-v3.0](https://docs.cohere.com/docs/embeddings) | ±160ms |
| [Local gte-small](https://huggingface.co/thenlper/gte-small) | ±20ms |
| [Local all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | ±10ms |
| [Local bge-small-en-v1.5](https://huggingface.co/Cloudflare/bge-small-en-v1.5) | ±20ms |
| [Local bge-large-en-v1.5](https://huggingface.co/Cloudflare/bge-large-en-v1.5) | ±60ms |
| [Mistral](https://mistral.ai/product/embeddings/) | ±200ms |
| [Jina Colbert v2](https://jina.ai/news/jina-colbert-v2-multilingual-late-interaction-retriever-for-embedding-and-reranking/) | ±400ms |
| [OpenAI text-embedding-3-small](https://platform.openai.com/docs/guides/embeddings) | ±460ms |
| [OpenAI text-embedding-3-large](https://platform.openai.com/docs/guides/embeddings) | ±750ms |
| [VoyageAI voyage-2](https://voyageai.com/) | ±350ms |
| [VoyageAI voyage-large-2](https://voyageai.com/) | ±400ms |

Here you can see that there are some clear winners in terms of latency. However, latency is not the same as throughput, so we also need to take a close look at indexing time.
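For reference, the "Local" rows above correspond to models running inside the Meilisearch engine itself through the `huggingFace` embedder source, which removes the network round trip entirely. A minimal sketch, assuming one of the open-source models from the table:

```typescript
// Run the embedding model inside Meilisearch (huggingFace source): no
// external API call per document or query, hence the ±10-60ms latencies
// measured above. Field names in the template are illustrative.
await fetch("http://localhost:7700/indexes/products/settings/embedders", {
  method: "PATCH",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer MEILISEARCH_API_KEY",
  },
  body: JSON.stringify({
    default: {
      source: "huggingFace",
      model: "BAAI/bge-small-en-v1.5",
      documentTemplate: "{{doc.title}}: {{doc.description}}",
    },
  }),
});
```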

### 3. Indexation performance: efficient and scalable

Indexation performance is another critical aspect to consider when choosing a model and service for semantic search. The speed at which your data can be indexed directly impacts the overall efficiency and scalability of your search solution. Local models without GPUs may have slower indexation speeds due to limited processing power, while third-party services offer varying speeds and limitations based on their infrastructure and service agreements. It is essential to evaluate these factors to ensure that your chosen model and service can handle your data volume and indexing requirements effectively, providing a seamless and efficient search experience.

Meilisearch auto-batches embedding requests to optimize performance, which helps manage the workload and improves overall indexing speed on large volumes of data. It also automatically retries calls when rate limited, so the indexing process continues without significant interruption. Even so, rate limits or slowdowns imposed by the service provider can still affect the consistency and reliability of indexing performance; understanding these limitations and planning accordingly helps mitigate potential bottlenecks.

Several factors influence indexation performance: the model's location (a model close to the engine reduces round-trip time), the payload size the API accepts per call, the rate limits imposed by the service provider, and the number of dimensions the model produces.

To give you a better understanding of indexation performance, we've compiled benchmarks for 10k e-commerce documents (indexing for full-text search and filters took 7s):

| Model/Service | Indexation Time |
|--------------------------------------------------------------------------------------------------------|-----------------|
| [Cohere embed-english-v3.0](https://docs.cohere.com/docs/embeddings) | 43s |
| [Cohere embed-english-light-v3.0](https://docs.cohere.com/docs/embeddings) | 16s |
| [OpenAI text-embedding-3-small](https://platform.openai.com/docs/guides/embeddings) | 95s |
| [OpenAI text-embedding-3-large](https://platform.openai.com/docs/guides/embeddings) | 151s |
| [Cloudflare bge-small-en-v1.5](https://huggingface.co/Cloudflare/bge-small-en-v1.5) | 152s |
| [Cloudflare bge-large-en-v1.5](https://huggingface.co/Cloudflare/bge-large-en-v1.5) | 159s |
| [Jina Colbert v2](https://jina.ai/news/jina-colbert-v2-multilingual-late-interaction-retriever-for-embedding-and-reranking/) | 375s |
| [VoyageAI voyage-large-2](https://voyageai.com/) | 409s |
| [Mistral](https://mistral.ai/product/embeddings/) | 409s |
| [Local all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | 880s |
| [Local bge-small-en-v1.5](https://huggingface.co/Cloudflare/bge-small-en-v1.5) | 3379s |
| [Local bge-large-en-v1.5](https://huggingface.co/Cloudflare/bge-large-en-v1.5) | 9132s |
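If you want to reproduce measurements like these on your own data, a rough approach is to enqueue a document batch and poll the resulting task until Meilisearch reports it as processed. A sketch, assuming a local instance and a `documents.json` payload:

```typescript
import { readFileSync } from "node:fs";

const host = "http://localhost:7700";
const headers = {
  "Content-Type": "application/json",
  Authorization: "Bearer MEILISEARCH_API_KEY",
};

// Enqueue the batch and start the clock.
const start = Date.now();
const enqueued = await fetch(`${host}/indexes/products/documents`, {
  method: "POST",
  headers,
  body: readFileSync("documents.json", "utf8"),
});
const { taskUid } = await enqueued.json();

// Poll the task queue until the batch is fully processed (or fails).
let status = "enqueued";
while (status === "enqueued" || status === "processing") {
  await new Promise((resolve) => setTimeout(resolve, 1_000));
  const task = await fetch(`${host}/tasks/${taskUid}`, { headers });
  ({ status } = await task.json());
}
console.log(`Indexation ${status} after ${(Date.now() - start) / 1000}s`);
```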

### 4. Price: balancing cost and performance

While local embedders are free, most services charge per million tokens. Here's a breakdown of the pricing for each platform:

- [Cohere](https://docs.cohere.com/docs/pricing):
- $0.10 per million tokens
- [OpenAI](https://openai.com/pricing):
- $0.13 per million tokens for text-embedding-3-large
- $0.02 per million tokens for text-embedding-3-small
- [Cloudflare](https://developers.cloudflare.com/workers-ai/platform/pricing/):
- $0.011/1,000 Neurons
- [Jina](https://jina.ai/pricing/):
- $0.18 per million tokens
- [Mistral](https://mistral.ai/product/embeddings/):
- $0.10 per million tokens
- [VoyageAI](https://voyageai.com/pricing):
- $0.10 per million tokens for voyage-2
- $0.12 per million tokens for voyage-large-2
- $0.12 per million tokens for voyage-multilingual-2
- Local:
- Free; for optimal performance or high volume, contact the sales team about higher-performance options.
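To put these figures in perspective, here is a back-of-the-envelope estimate; the 200-token average per document is an assumption you should replace with a measurement of your own corpus:

```typescript
// Hypothetical one-time embedding cost: 10,000 documents averaging
// 200 tokens each, with OpenAI text-embedding-3-small at $0.02/1M tokens.
const documents = 10_000;
const avgTokensPerDoc = 200; // assumption: measure your own data
const pricePerMillionTokens = 0.02;

const totalTokens = documents * avgTokensPerDoc; // 2,000,000 tokens
const cost = (totalTokens / 1_000_000) * pricePerMillionTokens;
console.log(`Estimated embedding cost: $${cost.toFixed(2)}`); // $0.04
```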

As your search needs grow and scale, it may become more cost-effective to invest in your own GPU machine. This allows you to generate embeddings using tools like Ollama or a REST configuration in Meilisearch. By having your own hardware, you can potentially reduce long-term costs and have greater control over the performance and scalability of your search solution.
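As a sketch of that setup, here is how an embedder might point at a self-hosted Ollama server; the URL and the `nomic-embed-text` model are assumptions to adapt to your deployment:

```typescript
// Point Meilisearch at a self-hosted Ollama server for embedding
// generation, instead of a paid third-party API.
await fetch("http://localhost:7700/indexes/products/settings/embedders", {
  method: "PATCH",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer MEILISEARCH_API_KEY",
  },
  body: JSON.stringify({
    default: {
      source: "ollama",
      url: "http://localhost:11434/api/embeddings",
      model: "nomic-embed-text",
      documentTemplate: "{{doc.title}}: {{doc.description}}",
    },
  }),
});
```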

Initially, it is often best to start with a common model from the list provided. These models are typically well-documented and widely used, making them easier to implement without the need for extensive training. As you become more familiar with the model and its capabilities, you can consider migrating it to a cloud service like AWS. Many services offer this option, allowing you to leverage their infrastructure for improved performance and scalability. Alternatively, you can run an equivalent open-source model on your own hardware, giving you even more flexibility and control over your search solution in the long term.

Additionally, it could be worth contacting our sales team. They have the expertise to help you find the best solution tailored to your specific needs. Whether it's optimizing your current setup or exploring new options, our team can provide valuable insights and assistance to ensure you get the most out of your search solution.

## Going further

While this article provides a comprehensive overview, we did not delve deeply into optimization techniques. There are several additional optimizations that can be explored to further enhance the performance of semantic search with Meilisearch:

- Experiment with different presets (query vs. document) for models that offer this option to potentially improve relevancy.
- Evaluate specialized models for specific applications to assess their performance and suitability for your use case.
- Explore models that provide a reranking function to further refine search results.
- Test higher-tier accounts on each platform to check for improved performance and reduced rate limiting.
- Investigate parameters for receiving quantized data directly from the API to optimize data transfer and processing.

## Conclusion

| Model/Service | Dimensions | Context Length (tokens) | Latency | Indexation Time | Pricing (per million tokens) |
|--------------------------------------------------------------------------------------------------------|------------|--------------------------|------------|-----------------|------------------------------|
| [Cohere embed-english-v3.0](https://docs.cohere.com/docs/embeddings) | 1024 | 512 | ±170ms | 43s | $0.10 |
| [Cohere embed-english-light-v3.0](https://docs.cohere.com/docs/embeddings) | 384 | 512 | ±160ms | 16s | $0.10 |
| [OpenAI text-embedding-3-small](https://platform.openai.com/docs/guides/embeddings) | 1536 | 8192 | ±460ms | 95s | $0.02 |
| [OpenAI text-embedding-3-large](https://platform.openai.com/docs/guides/embeddings) | 3072 | 8192 | ±750ms | 151s | $0.13 |
| [Mistral](https://mistral.ai/product/embeddings/) | 1024 | 8192 | ±200ms | 409s | $0.10 |
| [VoyageAI voyage-2](https://voyageai.com/) | 1024 | 4000 | ±350ms | 330s | $0.10 |
| [VoyageAI voyage-large-2](https://voyageai.com/) | 1536 | 16000 | ±400ms | 409s | $0.12 |
| [Jina Colbert v2](https://jina.ai/news/jina-colbert-v2-multilingual-late-interaction-retriever-for-embedding-and-reranking/) | 128, 96, or 64 | 8192 | ±400ms | 375s | $0.18 |
| [OSS all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | 384 | 512 | ±10ms | 880s | Free (contact sales for performance options) |
| [OSS bge-small-en-v1.5](https://huggingface.co/Cloudflare/bge-small-en-v1.5) | 384 | 512 | ±20ms | 3379s | Free (contact sales for performance options) |
| [OSS bge-large-en-v1.5](https://huggingface.co/Cloudflare/bge-large-en-v1.5) | 1024 | 512 | ±60ms | 9132s | Free (contact sales for performance options) |

Choosing the right model and service for semantic search with Meilisearch involves carefully balancing several key factors: relevancy, search performance, indexation performance, and cost. Each option presents its own set of trade-offs:

- Cloud-based services like Cohere and OpenAI offer excellent relevancy and reasonable latency, with Cohere's embed-english-light-v3.0 standing out for its balance of speed and performance.
- Local models provide the fastest search latency but may struggle with indexation speed on limited hardware.
- Emerging services like Mistral and VoyageAI show promise with competitive pricing and performance.
- Open-source models offer cost-effective solutions for those willing to manage their own infrastructure.

Ultimately, the best choice depends on your specific use case, budget, and performance requirements. For many applications, starting with a cloud-based service like Cohere or OpenAI provides a good balance of ease of use, performance, and cost. As your needs grow, consider exploring local or specialized models, or contact Meilisearch's sales team for tailored solutions.

Remember that optimizing your search experience goes beyond model selection. Fine-tuning Meilisearch's hybrid search capabilities, customizing embedding templates, and exploring advanced features can significantly enhance your semantic search implementation.
