Why does it take so long to run PropertyGraphIndex with Llama 3 and a multilingual embedding model? #13944
Replies: 2 comments
-
@hoangcuongnguyen2001 Hello! I'm a bot that can help you solve bugs, answer questions, and become a contributor. I'm here to help you while you wait for a human maintainer.

The long runtime for building the graph in your case could be due to the choice of embedding model. In the provided code, the embedding model is set with `embed_model = HuggingFaceEmbedding(model_name="intfloat/multilingual-e5-base", trust_remote_code=True)`. HuggingFace models, especially large ones, can be slower than OpenAI's embedding models. If performance is a critical factor, you might want to switch to an OpenAI embedding model, which is generally optimized for speed and efficiency, for example `Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")`. Switching to an OpenAI model could reduce the runtime for building the graph. However, this does not necessarily mean you have to choose OpenAI models over open-source ones; it depends on your specific requirements for speed versus other factors such as cost or model openness.

Additionally, the `use_async: bool = True` setting on PropertyGraphIndex can help optimize the runtime by leveraging asynchronous processing. The LlamaIndex framework provides various OpenAI embedding models, including `text-embedding-3-small`, `text-embedding-3-large`, and `text-embedding-ada-002`. These models are designed to be efficient and can be a good choice if you need faster performance [1][2][3][4].
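As a rough illustration, here is a minimal sketch of how the embedding-model swap and the async setting could be wired together. The `"./data"` path is a placeholder, an `OPENAI_API_KEY` is assumed to be set in the environment, and constructor argument names may differ slightly between `llama_index` versions:

```python
# Minimal sketch (not the exact code from this thread): swap in an OpenAI
# embedding model and keep async extraction enabled when building the index.
from llama_index.core import PropertyGraphIndex, Settings, SimpleDirectoryReader
from llama_index.embeddings.openai import OpenAIEmbedding

# Hosted embedding model, generally faster than running a local HuggingFace model on CPU.
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

documents = SimpleDirectoryReader("./data").load_data()

# use_async=True lets the extractors issue LLM and embedding calls concurrently;
# show_progress=True makes it easier to see whether extraction is advancing or stuck.
index = PropertyGraphIndex.from_documents(
    documents,
    use_async=True,
    show_progress=True,
)
```

Keeping `show_progress=True` is also a cheap way to tell whether a long run is genuinely slow or simply stalled on one step.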
-
@hoangcuongnguyen2001
-
I am trying to deploy PropertyGraphIndex, similar to this example: Property Graph Index - LlamaIndex. The only two things I did differently from that example are (a rough sketch of the setup follows this list):
- using Llama 3 as the LLM instead of GPT-3.5-turbo, and
- using the multilingual HuggingFace embedding model intfloat/multilingual-e5-base instead of an OpenAI embedding model.
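For reference, the pipeline looks roughly like the sketch below; the Ollama backend for Llama 3 and the `"./data"` path are illustrative assumptions, not the exact original code:

```python
# Rough sketch of the setup described above; the Ollama backend for Llama 3 and
# the "./data" path are illustrative assumptions, not the exact original code.
from llama_index.core import PropertyGraphIndex, Settings, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Local Llama 3 (assumed to be served via Ollama) and a multilingual embedding model.
Settings.llm = Ollama(model="llama3", request_timeout=360.0)
Settings.embed_model = HuggingFaceEmbedding(
    model_name="intfloat/multilingual-e5-base", trust_remote_code=True
)

documents = SimpleDirectoryReader("./data").load_data()
index = PropertyGraphIndex.from_documents(documents, show_progress=True)
```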
However, rather than getting the graph in seconds as in the example with GPT-3.5-turbo, the pipeline had still not produced any results after 10 minutes of running, at which point I interrupted it.
So, why did it take so long to build the graph in my case, and what does this mean for the choice of LLM and embedding models in PropertyGraphIndex? Does it mean I have to choose OpenAI models, rather than open-source ones, for this task?