Inspired by the vision of the talks "Search Engines History and Future" (FOSSASIA Singapore 2016, video) and "Search Engines of the Future" (QtCon Berlin 2016), this project aims to bring that vision of a "future search engine" to life. The prediction of both talks was:
"Future search engines will answer to all questions!"
Large Language Models (LLMs) today provide a new approach to answering all questions. The combination of LLMs with search engines is known as Retrieval Augmented Generation (RAG). Search engines with exact-match, similar-match (e.g. using Levenshtein distance), or synonym-match full-text indexing (e.g. frameworks using Lucene or similar technologies) are not sufficient to match an instruct query (a "prompt") semantically with a text corpus that is used to provide context for the prompt. Instead, a dense vector similarity search on embeddings is the right approach to implement RAG.
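As a minimal illustration of this approach, the following sketch embeds a tiny corpus and a prompt with a BERT model and retrieves the most similar entries with Faiss. The example texts and the cosine-similarity setup are illustrative and not prescribed by this project.

```python
# Minimal sketch of dense vector similarity search for RAG, assuming the
# sentence-transformers and faiss-cpu packages are installed.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-multilingual-uncased")  # example model

corpus = [
    "YaCy is a decentralized peer-to-peer search engine.",
    "Faiss performs efficient similarity search on dense vectors.",
    "Levenshtein distance measures the edit distance between strings.",
]

# Embed the corpus and store the vectors in a flat inner-product index.
embeddings = model.encode(corpus, convert_to_numpy=True)
faiss.normalize_L2(embeddings)                  # cosine similarity via inner product
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

# Embed the prompt and retrieve the most similar corpus entries as context.
query = model.encode(["How do I search many vectors quickly?"], convert_to_numpy=True)
faiss.normalize_L2(query)
scores, ids = index.search(query, 2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {corpus[i]}")
```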
In this repository we are building a search portal that leverages the power of both Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) to create a comprehensive, responsive, and cutting-edge search engine that fulfills the prediction above.
YaCy users can acquire large text corpora through web crawling and by importing large data dumps provided as WARC or ZIM files. This project specifically uses the data dump files from YaCy to deliver context to LLMs using RAG.
Data acquisition with YaCy then becomes a pre-processing step for creating an expert-like chatbot that is especially potent in the knowledge domain of the YaCy web crawls. This motivates the name of this project: "YaCy Expert".
Not only the naming but also the approach follows the (old) AI concept of "Expert Systems" in the tradition of "General Problem Solving" (1958), "A model of inexact reasoning in medicine" (1975) and "Internist-I, an Experimental Computer-Based Diagnostic Consultant for General Internal Medicine" (1982). This concept divides the expert system into two subsystems: the inference engine and the knowledge base. This is reflected in the YaCy Expert architecture.
YaCy Expert provides a web interface that acts as a wrapper for two separate backend services: an LLM (acting as the inference engine of the expert system) and a RAG system (acting as the knowledge base of the expert system). The LLM is hosted as a drop-in replacement for the OpenAI chat API using llama.cpp, and the RAG system is embedded into the server backend of the YaCy Expert web interface.
```mermaid
graph LR;
    UI -- acts as a wrapper for --> LLM;
    UI -- embedded into --> RAG;
    RAG -- uses --> Faiss;
    LLM -- drop-in replacement of OpenAI Chat API --> llama.cpp;
```
This is a Python project. It should run on Linux, macOS, and Windows, and it can also run in a distributed environment where the LLM is hosted separately. You need:
- one server running python
- another (or the same) server running an LLM, either natively (recommended on macOS) or in a docker container (e.g. on Windows or Linux).
- one or more YaCy index dump(s)
- RAM for the LLM and the vector index; both can be very large. We recommend a lot of RAM (64 GB and above).
Although designed for powerful hardware (macOS or Linux with GPU), we are also working on a proof-of-concept installation for two Raspberry Pi SBCs.
(this is a stub)
You must go through the following steps:
- provide data for a vector search
- do indexing of the provided data
- provide an LLM backend
- run the yacy_expert backend server
- open the chat web page to talk to the expert
We use a Faiss index here. The index is provided by the following scripts:
- `knowledge_indexing.py`
- `knowledge_search_server.py`

The first script generates a Faiss index; the second one provides access to the index with a Flask web server.
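To illustrate the role of the second script, here is a minimal sketch of how a Faiss index could be exposed through a Flask endpoint with the same query parameters used later in this guide. This is not the actual code of `knowledge_search_server.py`; the file names, model choice, and response shape are assumptions.

```python
# Hedged sketch: serve nearest-neighbour results from a Faiss index via Flask.
import faiss
from flask import Flask, jsonify, request
from sentence_transformers import SentenceTransformer

app = Flask(__name__)
model = SentenceTransformer("bert-base-multilingual-uncased")   # example model
index = faiss.read_index("knowledge/example.jsonlist.faiss")    # hypothetical file name

@app.route("/yacysearch.json")
def yacysearch():
    query = request.args.get("query", "")
    count = int(request.args.get("count", 3))
    vector = model.encode([query], convert_to_numpy=True)
    faiss.normalize_L2(vector)
    scores, ids = index.search(vector, count)
    # A real server would map the vector ids back to the indexed documents.
    items = [{"score": float(s), "id": int(i)}
             for s, i in zip(scores[0], ids[0]) if i >= 0]
    return jsonify({"items": items})

if __name__ == "__main__":
    app.run(port=8094)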
Run your YaCy instance and generate a search index. This search index defines the corpus of knowledge that you want to use to enrich YaCy Expert using RAG. When you believe that your index is well-curated, do an index export:
- click on "Index Export/Import" (
/IndexExport_p.html
) - keep the default export settings and use "Full Data Records"; click on "Export" and wait until the export is finished.
- Copy the exported
.jsonlist
file from YaCyDATA/EXPORT/
path to theknowledge
path inside this repository
The Faiss indexing is applied to all `.jsonlist` files inside the `knowledge` path.
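If you want to inspect the exported records, the following snippet assumes the `.jsonlist` files are in JSON Lines format (one JSON document per line) and simply prints the field names of the first record of each file; the available fields depend on the YaCy export.

```python
# Inspect the first record of each .jsonlist file in the knowledge path.
import glob
import json

for path in glob.glob("knowledge/*.jsonlist"):
    with open(path, encoding="utf-8") as f:
        first = json.loads(f.readline())
    print(path, "->", sorted(first.keys()))
```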
- indexing uses a BERT language model to tokenize texts and to compute embeddings. Such embeddings can be optimized by choosing the right BERT model. The default model `bert-base-multilingual-uncased` can be replaced with a custom model that can be declared in a `.ini` file. For each `jsonlist`/`jsonl` file you may create a file `<filename>.jsonlist.ini` declaring the desired model (a sketch of how such a file can be read follows after this list), e.g. for a German BERT model:
```ini
[DEFAULT]
model_name = dbmdz/bert-base-german-uncased
dimension = 768
```
- run `python3 knowledge_indexing.py`. This is a long-running process which takes about 1 hour for 10,000 index entries, depending on the speed of your computer (the best observed speed is about 4x this). A multi-million-entry index may take several days or weeks!
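The following is a hypothetical sketch of the per-file model selection described above: it reads an optional `<filename>.jsonlist.ini` with `configparser` and embeds a text with the selected model. It is not the code of `knowledge_indexing.py`; the pooling strategy, defaults, and file names are assumptions.

```python
# Hedged sketch: pick the embedding model per .jsonlist file from its .ini.
import configparser
import os

from sentence_transformers import SentenceTransformer

def load_model_for(jsonlist_path: str) -> SentenceTransformer:
    model_name = "bert-base-multilingual-uncased"     # default model named above
    ini_path = jsonlist_path + ".ini"                 # e.g. example.jsonlist.ini
    if os.path.exists(ini_path):
        config = configparser.ConfigParser()
        config.read(ini_path)
        model_name = config["DEFAULT"].get("model_name", model_name)
    return SentenceTransformer(model_name)

model = load_model_for("knowledge/example.jsonlist")  # hypothetical file name
vectors = model.encode(["Ein Beispieltext für die Indexierung."])
print(vectors.shape)  # (1, 768) for a 768-dimensional BERT model
```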
The produced index files with extension `.faiss` are also placed inside the `knowledge` path. Running the `knowledge_indexing.py` script again will not recompute already produced Faiss index files.
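The skip behaviour can be pictured roughly like this, assuming the index for each `.jsonlist` file is stored next to it with an added `.faiss` extension (the naming scheme is an assumption, not taken from `knowledge_indexing.py`):

```python
# Hedged sketch: only build a Faiss index for files that do not have one yet.
import glob
import os

import faiss

for jsonlist in glob.glob("knowledge/*.jsonlist"):
    target = jsonlist + ".faiss"
    if os.path.exists(target):
        print(f"skipping {jsonlist}, {target} already exists")
        continue
    index = faiss.IndexFlatIP(768)   # dimension from the .ini, 768 by default
    # ... embed the documents of `jsonlist` and call index.add(vectors) here ...
    faiss.write_index(index, target)
```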
The Faiss index is meant to be used as a RAG context provider, but we also provide a YaCy-like search API that can run separately!
- run `python3 knowledge_search.py`. This will set up a web server which provides a search API at `http://localhost:8094/yacysearch.json`. You can search e.g. with the URL `http://localhost:8094/yacysearch.json?query=one%20two%20three&count=3` -- this is the same path as in YaCy, so you can use the same HTML front-ends for the search (an example request follows after this list).
- open `knowledge_search_html/index.html`. This will provide you with a very simple test search interface which uses the knowledge_search_server to retrieve documents from the Faiss vector index.
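With the search server running, the API can also be queried from Python. The URL and parameters below are taken from the description above; the response is printed as raw JSON because the exact response schema is not specified here.

```python
# Query the local knowledge search API and print the raw JSON response.
import requests

response = requests.get(
    "http://localhost:8094/yacysearch.json",
    params={"query": "one two three", "count": 3},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```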
The 'industrial' way to get an LLM backend is to use the OpenAI API. We want to be independent of a commercial solution, but our own solution shall be compatible with the OpenAI API. However, to prove that this approach is compatible, one of the options here must actually be the commercial binding to the OpenAI interface. That means we have two options:
- copy a language model into the `model` subdirectory
- build the docker image containing the llama.cpp framework
- start the llama.cpp docker container with the OpenAI API
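Once the container is running, the OpenAI-compatible chat endpoint can be exercised as sketched below. The host, port, endpoint path, and model name are assumptions for a typical local llama.cpp server setup and may differ in your configuration.

```python
# Example request against an OpenAI-compatible chat completion endpoint.
import requests

payload = {
    "model": "local-model",   # placeholder; the server answers with the model it has loaded
    "messages": [
        {"role": "system", "content": "You are YaCy Expert, a helpful assistant."},
        {"role": "user", "content": "What is a Faiss index?"},
    ],
}
response = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed default host and port
    json=payload,
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```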