
Add LLM RAG example #89

Merged
merged 51 commits into from
Jan 25, 2024

Conversation

@htahir1 (Contributor) commented Jan 24, 2024

No description provided.

@htahir1 htahir1 requested a review from wjayesh January 24, 2024 14:07

@htahir1 (Contributor, Author) left a comment:

You didn't need to change ALL the licences, just for your project :-) Other than that, I left some reviews.

llm-agents/src/pipelines/index_builder.py (outdated, resolved)
Comment on lines 1 to 2
langchain==0.0.263
openai==0.27.2
Contributor Author:

Just for fun, can we try and use the latest versions to see what happens?

@@ -1,7 +1,5 @@
langchain==0.0.263
openai==0.27.2
slack-bolt==1.16.2
slack-sdk==3.20.0
zenml==0.43.0
Contributor Author:

We should upgrade to 0.55.0.

Comment on lines 39 to 42
# examples_readme_urls = get_nested_readme_urls(repo_url)
# docs_urls = get_all_pages(docs_url)
# website_urls = get_all_pages(website_url)
# all_urls = docs_urls + website_urls + [release_notes_url]
Contributor Author:

Remove

Contributor:

I wanted to keep this as an option for people to include a greater number of documents as data sources for the vector store.
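If the extra data sources should stay available as an option, a flag could gate them instead of commented-out code; a minimal sketch, where the loader helpers are stand-in stubs for the real ones in `index_builder.py`:

```python
# Hypothetical sketch: gate the optional data sources behind a flag instead
# of commenting them out. The two loader helpers below are stubs standing in
# for the real implementations in index_builder.py.
from typing import List


def get_all_pages(base_url: str) -> List[str]:
    # Stub for illustration; the real helper crawls the site.
    return [base_url]


def gather_urls(
    docs_url: str,
    website_url: str,
    release_notes_url: str,
    include_extra_sources: bool = False,
) -> List[str]:
    """Collect the URLs to index, optionally pulling in the extra sources."""
    urls = [docs_url]
    if include_extra_sources:
        # More documents for the vector store, kept behind a flag.
        urls = (
            get_all_pages(docs_url)
            + get_all_pages(website_url)
            + [release_notes_url]
        )
    return urls
```

With the flag off, callers get the minimal set; turning it on includes the larger document collection without editing the source.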

Contributor Author:

Why do we need this? All objects get pickled automatically.

Contributor:

The cloudpickle in the default implementation didn't work for the AgentExecutor class AFAIR. So I implemented this pickle materializer instead.

llm-agents/src/pipelines/index_builder.py (outdated, resolved)


-@step(enable_cache=True)
+@step(enable_cache=False)
Contributor Author:

We should put all config in a YAML file, tbh. Btw, why do we disable cache here?

Contributor:

Sometimes, if the step fails and the URLs are not loaded, the next pipeline run skips the step via the cache and passes a None object into the next step. This leads to some obscure errors. It doesn't happen usually, but it's an edge case the DSD guys ran into, sometime around the first bootcamp.
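An alternative to disabling caching outright is to fail loudly when the step produces nothing, so a None or empty artifact is never cached in the first place; a hedged sketch, where the step body and its logic are hypothetical:

```python
# Hypothetical step body: raise instead of returning an empty result, so a
# failed load is never cached and re-runs always re-execute this step.
from typing import List


def load_urls(sources: List[str]) -> List[str]:
    """Keep only fetchable URLs; refuse to succeed with an empty result."""
    urls = [s for s in sources if s.startswith("http")]
    if not urls:
        # Raising marks the step as failed, so nothing is cached and the
        # next pipeline run re-executes it instead of reusing a bad artifact.
        raise RuntimeError("No URLs were loaded; refusing to return an empty result.")
    return urls
```

With a guard like this, `enable_cache=True` stays safe: only successful, non-empty results ever enter the cache.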

@wjayesh wjayesh merged commit 7d1fa76 into main Jan 25, 2024
1 of 3 checks passed

2 participants