🤖 MMORE RAG

💡 TL;DR

The RAG module enables the creation of a modular RAG inference pipeline for your indexed multimodal documents, using two inference modes:

  1. API: Creates a server hosting the pipeline
  2. Local: Runs the inference locally (⚠️ can be slow when running local models ⚠️)

You can customize various parts of the pipeline by defining an inference RAG configuration file.

💻 Minimal Example:

Here is a minimal example of creating a RAG pipeline hosted through a LangServe server.

  1. Create your RAG Inference config file

    # RAG Config
    rag: 
        # LLM Config
        llm: 
            llm_name: "gpt-4o-mini" # Any supported model
            max_new_tokens: 100
            temperature: 0.8
        # Retriever Config
        retriever:
            db:
                uri: ./proc_demo.db
            hybrid_search_weight: 0.5
            k: 5
        # Prompt Args
        system_prompt: "Answer the question using the context.\n\nContext: {context}"
    # Mode Config
    mode: api
    mode_args:
        endpoint: '/rag'
        port: '8000'
        host: 'localhost'
  2. Start your RAG pipeline using the run_rag.py script and your config file

    python src/mmore/rag/run_rag.py --config_file /path/to/config.yaml
  3. Query the server like any other LangServe server, for example with curl (or from Python, as shown after the examples below)

    curl --location --request POST http://localhost:8000/rag/invoke \
    -H 'Content-Type: application/json' \
    -d '{
        "input": {
            "input": "What is Meditron?",
            "collection_name": "med_docs"
        }
    }'
    curl --location --request GET http://localhost:8000/rag/input_schema \
    -H 'Content-Type: application/json' 
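
If you prefer to query the endpoint from Python rather than curl, LangServe also exposes a RemoteRunnable client. Here is a minimal sketch, assuming the server from the config above is running on localhost:8000 and a med_docs collection has been indexed:

    from langserve import RemoteRunnable

    # Connect to the /rag endpoint served by run_rag.py
    rag = RemoteRunnable("http://localhost:8000/rag")

    # Same payload as the curl example above
    answer = rag.invoke({
        "input": "What is Meditron?",
        "collection_name": "med_docs",
    })
    print(answer)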

See examples/rag for other use cases.

🔍 Modules

The RAG pipeline decomposes into two main modules:

  1. The Retriever, which retrieves multimodal documents from the database.
  2. The LLM, which wraps different types of multimodal-capable LLMs.

Retriever

Here is an example of how to use the retriever module on its own. Note that it assumes you have already created a DB using the indexer module.

  1. Create a config:

    db:
      uri: ./demo.db
      name: db_name
    hybrid_search_weight: 0.5
    k: 5
  2. Retrieve on the vector store using the Retriever class:

    from src.mmore.rag.retriever import Retriever
    
    # Create the Retriever
    retriever = Retriever.from_config('/path/to/your/retriever_config.yaml')
    
    # Retrieve the top 3 documents using a hybrid approach (i.e. dense + sparse embeddings)
    retriever.retrieve(
        'What is Meditron?',
        k=3,
        collection_name="my_docs",
        search_type="hybrid"  # Options: "dense", "sparse", "hybrid"
    )
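
Since the retriever is a standard LangChain retriever (see Customization below), the call above is expected to return a list of LangChain Document objects. A minimal sketch of inspecting the results, assuming that return type:

    # Inspect the retrieved documents (assumes LangChain Document objects)
    docs = retriever.retrieve(
        'What is Meditron?',
        k=3,
        collection_name="my_docs",
        search_type="hybrid",
    )
    for doc in docs:
        print(doc.page_content[:200])  # first characters of the matched text
        print(doc.metadata)            # any metadata stored at indexing time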

LLM

Here is an example of how to use the LLM module on its own.

  1. Create a config file:

    llm_name: gpt-4o-mini
    max_new_tokens: 150
    temperature: 0.7
  2. Query the LLM:

    from src.mmore.rag.llm import LLM
    
    # Create the LLM
    llm = LLM.from_config('/path/to/your/llm_config.yaml')
    
    # Create your messages
    messages = [
        (
            "system",
            "You are a helpful assistant that translates English to French. Translate the user sentence.",
        ),
        ("human", "I love Meditron."),
    ]
    
    # Invoke the LLM on the messages
    llm.invoke(messages)
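
Because the LLM wraps a LangChain chat model, invoke should return a chat message whose text is available on its content attribute. A minimal sketch, assuming that behaviour:

    # Assumes invoke returns a standard LangChain chat message (e.g. AIMessage)
    response = llm.invoke(messages)
    print(response.content)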

🔧 Customization

Our RAG pipeline is built to take full advantage of LangChain abstractions, providing compatibility with all the components LangChain offers.

Retriever

Our retriever is a LangChain BaseRetriever. If you want to create a custom retriever (e.g. a GraphRetriever), simply make it inherit from this class and use it as described in our examples; a sketch follows below.
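
For illustration, here is a minimal sketch of a custom retriever built directly on LangChain's BaseRetriever interface; the class name and keyword-matching logic are hypothetical and not part of mmore:

    from typing import List

    from langchain_core.callbacks import CallbackManagerForRetrieverRun
    from langchain_core.documents import Document
    from langchain_core.retrievers import BaseRetriever


    class KeywordRetriever(BaseRetriever):
        """Toy retriever that returns documents containing the query string."""

        documents: List[Document]
        k: int = 5

        def _get_relevant_documents(
            self, query: str, *, run_manager: CallbackManagerForRetrieverRun
        ) -> List[Document]:
            # Naive keyword match over the stored documents
            hits = [d for d in self.documents if query.lower() in d.page_content.lower()]
            return hits[: self.k]

Such a class can then be used wherever the pipeline expects a retriever.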

LLM

Our LLMs inherit from LangChain's BaseChatModel base class. If you want to create a custom LLM, simply make it inherit from this class and use it as described in our examples; a sketch follows below.
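
As an illustration of what inheriting from BaseChatModel involves, here is a minimal sketch; the echo model below is purely hypothetical and not part of mmore:

    from typing import Any, List, Optional

    from langchain_core.callbacks import CallbackManagerForLLMRun
    from langchain_core.language_models import BaseChatModel
    from langchain_core.messages import AIMessage, BaseMessage
    from langchain_core.outputs import ChatGeneration, ChatResult


    class EchoChatModel(BaseChatModel):
        """Toy chat model that echoes the last message it receives."""

        @property
        def _llm_type(self) -> str:
            return "echo-chat-model"

        def _generate(
            self,
            messages: List[BaseMessage],
            stop: Optional[List[str]] = None,
            run_manager: Optional[CallbackManagerForLLMRun] = None,
            **kwargs: Any,
        ) -> ChatResult:
            # Wrap the reply in the ChatResult structure LangChain expects
            reply = AIMessage(content=messages[-1].content)
            return ChatResult(generations=[ChatGeneration(message=reply)])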

⚠️ Note that we support HuggingFace Hub models, so a simpler solution is to push a model to the hub and use the class as defined.