A repository demonstrating various approaches to RAG, chunking, and other LLM-related algorithms.

This repository shows how to ingest data from various sources into Chroma vector storage, combining approaches from LlamaIndex and LangChain. The focus is not on determining which algorithms or libraries are superior; rather, it serves as a demonstration of data ingestion into Chroma. There are also RAG classes that generate descriptions for the different indices.
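To make the core flow concrete, the sketch below ingests a couple of documents into a local Chroma collection and queries it. This is a minimal sketch assuming the `chromadb` client; the path, collection name, and documents are placeholders, not this repository's actual configuration.

```python
# Minimal sketch: ingest documents into Chroma and query them.
# The path and collection name are illustrative placeholders.
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="demo_docs")

# Chroma embeds the documents with its default embedding function.
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Chroma is an open-source vector database.",
        "RAG retrieves relevant chunks to ground LLM answers.",
    ],
)

results = collection.query(query_texts=["What is Chroma?"], n_results=1)
print(results["documents"])
```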
- Chunking Necessity: Vector databases need documents split into chunks for retrieval and prompt generation.
- Query Result Variability: The same query will return different content depending on how the document is chunked.
- Even-Size Chunks: The simplest approach is to split the document into roughly even-size chunks, but this can split related content across chunk boundaries.
- Chunking by Atomic Elements: By identifying atomic elements, you can chunk by combining elements rather than splitting raw text.
- Results in more coherent chunks
- Example: combining content under the same section header into the same chunk (see the sketch below).
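The contrast between the two strategies can be sketched with two LangChain splitters: a size-based splitter that may cut mid-section, and a header-aware splitter that keeps content under the same header together. This assumes the `langchain-text-splitters` package and is not tied to this repository's ingestion code.

```python
# Sketch: even-size chunking vs. chunking by atomic elements (headers).
# Assumes langchain-text-splitters; not this repo's exact pipeline.
from langchain_text_splitters import (
    MarkdownHeaderTextSplitter,
    RecursiveCharacterTextSplitter,
)

md_text = """# Setup
Install the dependencies before running anything.

# Usage
Run the app and open the dashboard in a browser."""

# Even-size chunks: may split a section's content across boundaries.
size_splitter = RecursiveCharacterTextSplitter(chunk_size=60, chunk_overlap=0)
print(size_splitter.split_text(md_text))

# Header-aware chunks: content under one header stays in one chunk.
header_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=[("#", "section")])
for doc in header_splitter.split_text(md_text):
    print(doc.metadata, doc.page_content)
```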
In this project, RAG is used in many places, such as search, chat, and smart updating of descriptions. The rags package contains several RAG implementations; not all of them are in active use, but they are interchangeable and can be swapped in as replacements.
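One way to keep such implementations swappable is a shared interface; the sketch below is a hypothetical illustration of that pattern, with invented class and method names rather than the actual classes in the rags package.

```python
# Hypothetical sketch of interchangeable RAG implementations behind one
# interface; names are illustrative, not taken from the rags package.
from abc import ABC, abstractmethod


class BaseRAG(ABC):
    @abstractmethod
    def retrieve(self, query: str) -> list[str]:
        """Return chunks relevant to the query."""

    @abstractmethod
    def generate(self, query: str, chunks: list[str]) -> str:
        """Produce an answer grounded in the retrieved chunks."""


class KeywordRAG(BaseRAG):
    """Trivial keyword-overlap retriever; a real one would query Chroma."""

    def __init__(self, corpus: list[str]):
        self.corpus = corpus

    def retrieve(self, query: str) -> list[str]:
        terms = set(query.lower().split())
        return [c for c in self.corpus if terms & set(c.lower().split())]

    def generate(self, query: str, chunks: list[str]) -> str:
        return f"Answer to {query!r} based on {len(chunks)} chunk(s)."


def answer(rag: BaseRAG, query: str) -> str:
    # Any BaseRAG subclass can be swapped in here unchanged.
    return rag.generate(query, rag.retrieve(query))
```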
```bash
conda env create -n machinelearning -f environment.yml
conda activate machinelearning
pip install -r requirements.txt
streamlit run app.py --server.port 8011 --server.enableCORS false
```
Precise Zero-Shot Dense Retrieval without Relevance Labels (Hypothetical Document Embeddings, HyDE)
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
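HyDE, for example, asks an LLM to write a hypothetical answer first, then retrieves with the embedding of that answer instead of the raw query. The sketch below outlines that flow with placeholder LLM and embedding callables; it is a reading of the paper's idea, not this repository's exact implementation.

```python
# Sketch of the HyDE retrieval flow with placeholder model callables.
# `generate_text` and `embed` stand in for any LLM / embedding model.
from typing import Callable


def hyde_retrieve(
    query: str,
    generate_text: Callable[[str], str],
    embed: Callable[[str], list[float]],
    collection,  # e.g. a Chroma collection
    n_results: int = 3,
):
    # 1. Ask the LLM for a hypothetical passage that would answer the query.
    hypothetical_doc = generate_text(f"Write a short passage that answers: {query}")
    # 2. Embed the hypothetical passage, not the query itself.
    doc_embedding = embed(hypothetical_doc)
    # 3. Retrieve real chunks nearest to that embedding.
    return collection.query(query_embeddings=[doc_embedding], n_results=n_results)
```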
You must install Ollama to activate the local models.
- Check the ollama_option.json file to turn on the local models.
- Check llms.py to find the local models.
- Check embeddings.py to find the embedding models.
- Check llms.py to find the LLM models.

Different features can use different models; each feature configures its own in a separate part of the codebase, as the table below shows. A hypothetical sketch of such a configuration follows the table.
Title | File
---|---
ingestion (chunking and indexing) | `__init__.py`
search | `__init__.py`
chat | `__init__.py`
readme | `__init__.py`
dashboard | `__init__.py`
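As a hedged illustration, a feature package's `__init__.py` might pin its model choices like the hypothetical snippet below; the actual setting names and model identifiers in this repository may differ.

```python
# Hypothetical per-feature model configuration in a package __init__.py.
# The setting names and model identifiers are illustrative assumptions,
# not the repository's actual values.
EMBEDDING_MODEL = "text-embedding-3-small"  # embeddings for this feature
LLM_MODEL = "gpt-4o-mini"                   # generation model
USE_LOCAL_MODELS = False                    # set True to route through Ollama
```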
```bash
export SERPAPI_API_KEY="e7945........."
export OPENAI_API_KEY="sk-........."
export GROQ_API_KEY="gsk_........."
export ANTHROPIC_API_KEY="sk-ant-........."
export LANGCHAIN_API_KEY="ls__........."
export NVIDIA_API_KEY="nvapi-........."
export HUGGING_FACE_TOKEN="hf_........."
export COHERE_API_KEY="zFiHtBT........."
export CO_API_KEY="zFiHtBT........."
```