Commit ce926a3 (1 parent: 04e9828)
Showing 19 changed files with 1,886 additions and 1,552 deletions.
Binary file modified (not shown): data/crawler/5d1d4bae-0137-4cfb-9783-64f67098e434/data_level0.bin, +4.8 MB (150%)
Binary file modified (not shown): data/crawler/5d1d4bae-0137-4cfb-9783-64f67098e434/header.bin, +0 Bytes (100%)
Binary file modified (not shown): data/crawler/5d1d4bae-0137-4cfb-9783-64f67098e434/index_metadata.pickle, +170 KB (150%)
Binary file modified (not shown): data/crawler/5d1d4bae-0137-4cfb-9783-64f67098e434/length.bin, +11.7 KB (150%)
Binary file modified (not shown): data/crawler/5d1d4bae-0137-4cfb-9783-64f67098e434/link_lists.bin, +26.1 KB (150%)
@@ -0,0 +1,3 @@
## Documentation Bot

::: documentation_query_utils
@@ -0,0 +1,8 @@
# Documentation Bot

- This bot crawls the OpenML documentation, embeds it in a Chroma vector store, and uses an LLM to answer questions about the project.

## How to run

- First, run the crawler to fetch the documentation from OpenML. This creates a `data` folder containing the crawled documentation. ```python run_crawler.py```
- For inference, run ```uvicorn documentation_query:app --host 0.0.0.0 --port 8083 &```
Empty file.
@@ -0,0 +1,4 @@
https://openml.github.io/openml-python/main/
https://docs.openml.org/
https://openml.org/apis/
https://github.com/openml/openml-python/tree/develop/openml
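
A crawler typically starts from a seed list like the four URLs above. As a minimal sketch of how such a list could be cleaned before crawling (the helper name `load_seed_urls` and its filtering rules are assumptions for illustration; this commit does not show how the list is actually read):

```python
from urllib.parse import urlparse


def load_seed_urls(text: str) -> list[str]:
    """Return unique absolute http(s) URLs from newline-separated text, in order.

    Hypothetical helper, not part of the commit.
    """
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comment lines
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # skip anything that is not an absolute web URL
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# The four seed URLs from the diff, plus one duplicate to show deduplication.
seed_text = """\
https://openml.github.io/openml-python/main/
https://docs.openml.org/
https://openml.org/apis/
https://github.com/openml/openml-python/tree/develop/openml
https://docs.openml.org/
"""
seeds = load_seed_urls(seed_text)
print(len(seeds))
```

Deduplicating while preserving order keeps the crawl deterministic and avoids fetching the same start page twice.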
@@ -0,0 +1,12 @@
beautifulsoup4==4.12.3
fastapi==0.112.2
httpx==0.27.0
langchain==0.2.14
langchain_community==0.2.12
langchain_core==0.2.35
langchain_ollama==0.1.1
pandas==2.2.2
Requests==2.32.3
tenacity==8.3.0
torch==2.3.0
tqdm==4.66.4
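
Because every dependency above is pinned with `==`, a quick check of the active environment against the pins can catch version drift before running the bot. The following is a sketch under assumptions: `parse_pins` and `find_mismatches` are hypothetical helper names, only `name==version` lines are handled, and how `importlib.metadata` matches names such as `Requests` or `langchain_community` may vary with the Python version.

```python
from importlib import metadata


def parse_pins(text: str) -> dict[str, str]:
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # ignore blanks, comments, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)}; installed is None if the package is missing."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


sample = "fastapi==0.112.2\nhttpx==0.27.0\n"
pins = parse_pins(sample)
for name, (pinned, installed) in find_mismatches(pins).items():
    print(f"{name}: pinned {pinned}, installed {installed}")
```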
@@ -0,0 +1,34 @@
import os

from documentation_query_utils import ChromaStore, Crawler

recrawl_websites = True

crawled_files_data_path = "../data/crawler/crawled_data.csv"
chroma_path = "../data/crawler/"
model_name = "BAAI/bge-small-en"  # embedding model
generation_model_name = "llama3"  # served through Ollama

# Crawl settings
num_of_websites_to_crawl = None  # None crawls all websites

# Make sure the output directory exists
os.makedirs(chroma_path, exist_ok=True)

# Crawl the websites and save the data
crawler = Crawler(
    crawled_files_data_path=crawled_files_data_path,
    recrawl_websites=recrawl_websites,
    num_of_websites_to_crawl=num_of_websites_to_crawl,
)
crawler.do_crawl()

# Initialize the ChromaStore and embed the data
chroma_store = ChromaStore(
    model_name=model_name,
    crawled_files_data_path=crawled_files_data_path,
    chroma_file_path=chroma_path,
    generation_model_name=generation_model_name,
)
if recrawl_websites:
    chroma_store.read_data_and_embed()
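
The script crawls pages and then embeds them so that questions can later be matched against the stored text. The following toy sketch illustrates only that embed-and-retrieve idea. It deliberately swaps the `BAAI/bge-small-en` model and Chroma for a bag-of-words vector and an in-memory list so that it runs without dependencies. The names `embed`, `cosine`, and `TinyStore`, as well as the sample texts, are hypothetical and are not part of `documentation_query_utils`.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyStore:
    """In-memory stand-in for a vector store (assumption: not the real ChromaStore)."""

    def __init__(self):
        self.docs = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def query(self, question: str, k: int = 1) -> list[str]:
        """Return the k stored texts most similar to the question."""
        q = embed(question)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = TinyStore()
store.add("Use the Python API to download a dataset from OpenML.")
store.add("A task defines the target and the evaluation procedure.")
print(store.query("How do I download a dataset?"))
```

In a retrieval-augmented setup, the retrieved passages would then be passed to the generation model as context; that step is omitted here.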