Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #520 from MacOS/main
Add documentation workflow and documentation itself
Showing 9 changed files with 433 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
# This workflow uses actions that are not certified by GitHub. | ||
# They are provided by a third-party and are governed by | ||
# separate terms of service, privacy policy, and support | ||
# documentation. | ||
|
||
# Sample workflow for building and deploying a Jekyll site to GitHub Pages | ||
name: Documentation | ||
|
||
on: | ||
  push: | ||
    branches: ["master"] | ||
    paths: ["docs/**"] # only changes in the docs directory trigger the workflow | ||
|
||
  # Allows you to run this workflow manually from the Actions tab | ||
  workflow_dispatch: | ||
|
||
# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages | ||
permissions: | ||
  contents: read | ||
  pages: write | ||
  id-token: write | ||
|
||
# Allow one concurrent deployment | ||
concurrency: | ||
  group: "pages" | ||
  cancel-in-progress: true | ||
|
||
jobs: | ||
  # Build job | ||
  build: | ||
    runs-on: ubuntu-latest | ||
    defaults: | ||
      run: | ||
        working-directory: docs | ||
|
||
    steps: | ||
      - name: Checkout | ||
        uses: actions/checkout@v3 | ||
|
||
      - name: Setup Ruby | ||
        uses: ruby/setup-ruby@v1 | ||
        with: | ||
          ruby-version: '3.1' # Not needed with a .ruby-version file | ||
          bundler-cache: true # runs 'bundle install' and caches installed gems automatically | ||
          cache-version: 0 # Increment this number if you need to re-download cached gems | ||
          working-directory: '${{ github.workspace }}/docs' | ||
|
||
      - name: Setup Pages | ||
        id: pages | ||
        uses: actions/configure-pages@v3 | ||
|
||
      - name: Build with Jekyll | ||
        # Outputs to the './_site' directory by default | ||
        run: bundle exec jekyll build --baseurl "${{ steps.pages.outputs.base_path }}" | ||
        env: | ||
          JEKYLL_ENV: production | ||
|
||
      - name: Upload artifact | ||
        # Automatically uploads an artifact from the './_site' directory by default | ||
        uses: actions/upload-pages-artifact@v1 | ||
        with: | ||
          path: "docs/_site/" | ||
|
||
|
||
  # Deployment job | ||
  deploy: | ||
    environment: | ||
      name: github-pages | ||
      url: ${{ steps.deployment.outputs.page_url }} | ||
    runs-on: ubuntu-latest | ||
    needs: build | ||
|
||
    steps: | ||
      - name: Deploy to GitHub Pages | ||
        id: deployment | ||
        uses: actions/deploy-pages@v2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
source 'https://rubygems.org' | ||
|
||
|
||
# | ||
# Gems that are locked to a version. | ||
# | ||
gem "jekyll", "~> 4.3.2" # installed by `gem jekyll` | ||
gem "just-the-docs", "0.7.0" # pinned to the current release | ||
|
||
|
||
# | ||
# Gems that are only loaded if they are configured correctly. | ||
# | ||
gem 'jekyll-default-layout' | ||
gem 'jekyll-github-metadata' | ||
|
||
|
||
# | ||
# Gems that are loaded irrespective of site configuration. | ||
# | ||
gem 'jekyll-seo-tag', group: :jekyll_plugins | ||
gem 'jekyll-include-cache', group: :jekyll_plugins |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
GEM | ||
  remote: https://rubygems.org/ | ||
  specs: | ||
    addressable (2.8.5) | ||
      public_suffix (>= 2.0.2, < 6.0) | ||
    base64 (0.2.0) | ||
    colorator (1.1.0) | ||
    concurrent-ruby (1.2.2) | ||
    em-websocket (0.5.3) | ||
      eventmachine (>= 0.12.9) | ||
      http_parser.rb (~> 0) | ||
    eventmachine (1.2.7) | ||
    faraday (2.7.12) | ||
      base64 | ||
      faraday-net_http (>= 2.0, < 3.1) | ||
      ruby2_keywords (>= 0.0.4) | ||
    faraday-net_http (3.0.2) | ||
    ffi (1.15.5) | ||
    forwardable-extended (2.6.0) | ||
    google-protobuf (3.24.3-arm64-darwin) | ||
    google-protobuf (3.24.3-x86_64-linux) | ||
    http_parser.rb (0.8.0) | ||
    i18n (1.14.1) | ||
      concurrent-ruby (~> 1.0) | ||
    jekyll (4.3.2) | ||
      addressable (~> 2.4) | ||
      colorator (~> 1.0) | ||
      em-websocket (~> 0.5) | ||
      i18n (~> 1.0) | ||
      jekyll-sass-converter (>= 2.0, < 4.0) | ||
      jekyll-watch (~> 2.0) | ||
      kramdown (~> 2.3, >= 2.3.1) | ||
      kramdown-parser-gfm (~> 1.0) | ||
      liquid (~> 4.0) | ||
      mercenary (>= 0.3.6, < 0.5) | ||
      pathutil (~> 0.9) | ||
      rouge (>= 3.0, < 5.0) | ||
      safe_yaml (~> 1.0) | ||
      terminal-table (>= 1.8, < 4.0) | ||
      webrick (~> 1.7) | ||
    jekyll-default-layout (0.1.5) | ||
      jekyll (>= 3.0, < 5.0) | ||
    jekyll-github-metadata (2.16.0) | ||
      jekyll (>= 3.4, < 5.0) | ||
      octokit (>= 4, < 7, != 4.4.0) | ||
    jekyll-include-cache (0.2.1) | ||
      jekyll (>= 3.7, < 5.0) | ||
    jekyll-sass-converter (3.0.0) | ||
      sass-embedded (~> 1.54) | ||
    jekyll-seo-tag (2.8.0) | ||
      jekyll (>= 3.8, < 5.0) | ||
    jekyll-watch (2.2.1) | ||
      listen (~> 3.0) | ||
    just-the-docs (0.7.0) | ||
      jekyll (>= 3.8.5) | ||
      jekyll-include-cache | ||
      jekyll-seo-tag (>= 2.0) | ||
      rake (>= 12.3.1) | ||
    kramdown (2.4.0) | ||
      rexml | ||
    kramdown-parser-gfm (1.1.0) | ||
      kramdown (~> 2.0) | ||
    liquid (4.0.4) | ||
    listen (3.8.0) | ||
      rb-fsevent (~> 0.10, >= 0.10.3) | ||
      rb-inotify (~> 0.9, >= 0.9.10) | ||
    mercenary (0.4.0) | ||
    octokit (6.1.1) | ||
      faraday (>= 1, < 3) | ||
      sawyer (~> 0.9) | ||
    pathutil (0.16.2) | ||
      forwardable-extended (~> 2.6) | ||
    public_suffix (5.0.3) | ||
    rake (13.0.6) | ||
    rb-fsevent (0.11.2) | ||
    rb-inotify (0.10.1) | ||
      ffi (~> 1.0) | ||
    rexml (3.2.6) | ||
    rouge (4.1.3) | ||
    ruby2_keywords (0.0.5) | ||
    safe_yaml (1.0.5) | ||
    sass-embedded (1.67.0-arm64-darwin) | ||
      google-protobuf (~> 3.23) | ||
    sass-embedded (1.67.0-x86_64-linux-gnu) | ||
      google-protobuf (~> 3.23) | ||
    sawyer (0.9.2) | ||
      addressable (>= 2.3.5) | ||
      faraday (>= 0.17.3, < 3) | ||
    terminal-table (3.0.2) | ||
      unicode-display_width (>= 1.1.1, < 3) | ||
    unicode-display_width (2.4.2) | ||
    webrick (1.8.1) | ||
|
||
PLATFORMS | ||
  arm64-darwin-23 | ||
  x86_64-linux | ||
|
||
DEPENDENCIES | ||
  jekyll (~> 4.3.2) | ||
  jekyll-default-layout | ||
  jekyll-github-metadata | ||
  jekyll-include-cache | ||
  jekyll-seo-tag | ||
  just-the-docs (= 0.7.0) | ||
|
||
BUNDLED WITH | ||
   2.3.26 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
# Site settings | ||
# These are used to personalize your new site. If you look in the HTML files, | ||
# you will see them accessed via {{ site.title }}, {{ site.github_repo }}, and so on. | ||
# You can create any custom variable you would like, and they will be accessible | ||
# in the templates via {{ site.myvariable }}. | ||
title: llmware | ||
description: llmware is an enterprise-grade LLM-based development framework, including tools and fine-tuned models. | ||
theme: just-the-docs | ||
url: https://llmware-ai.github.io/llmware/ | ||
baseurl: "" # the subpath of your site | ||
favicon_ico: "/assets/images/favicon.ico" | ||
|
||
# Enable or disable heading anchors | ||
heading_anchors: true | ||
|
||
# Aux links for the upper right navigation | ||
aux_links: | ||
  llmware repository: https://github.com/llmware-ai/llmware | ||
|
||
lsi: false | ||
safe: true | ||
highlighter: rouge | ||
|
||
gist: | ||
  noscript: false | ||
kramdown: | ||
  math_engine: mathjax | ||
  syntax_highlighter: rouge | ||
|
||
|
||
callouts_level: quiet # or loud | ||
callouts: | ||
  highlight: | ||
    color: yellow | ||
  important: | ||
    title: Important | ||
    color: blue | ||
  new: | ||
    title: New | ||
    color: green | ||
  note: | ||
    title: Note | ||
    color: purple | ||
  warning: | ||
    title: Warning | ||
    color: red | ||
|
||
|
||
plugins: | ||
- jekyll-default-layout | ||
- jekyll-seo-tag | ||
- jekyll-github-metadata | ||
- jekyll-include-cache |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
--- | ||
layout: default | ||
title: Introduction by Examples | ||
nav_order: 2 | ||
permalink: /examples | ||
--- | ||
# Introduction by Examples | ||
We introduce ``llmware`` through self-contained examples. | ||
|
||
|
||
# Your first library and query | ||
|
||
{: .note } | ||
> The code here is a modified version of [example-1-create_first_library.py](https://github.com/llmware-ai/llmware/blob/main/fast_start/example-1-create_first_library.py). | ||
> The adjustments are made to ease understanding for this post. | ||
In this introduction, we will walk through the steps of creating a **library**. | ||
To create a ``library`` in ``llmware``, we instantiate a ``Library`` object and call | ||
the ``add_files`` method, which parses the files, chunks the text, and indexes it. | ||
We will also download the sample files we provide, which you can use for any experimentation you | ||
might want to do. | ||
|
||
|
||
**Configuring llmware** | ||
Before we get started, we can adjust the configuration of ``llmware``. | ||
For example, we can decide which **text collection** database to use, and set the logging level. | ||
By default, ``llmware`` uses MongoDB as the text collection database and has a ``debug_mode`` level | ||
of ``0``. | ||
This means that by default, ``llmware`` will show the status manager and print errors. | ||
The status manager is useful for large parsing jobs. | ||
In this ``library`` introduction, we will change both the text collection database and the ``debug_mode``. | ||
As the text collection database, we will choose ``sqlite``. | ||
We will also set ``debug_mode`` to ``2``, which shows the name of each file as it is parsed, i.e. file-by-file progress. | ||
```python | ||
from llmware.configs import LLMWareConfig | ||
|
||
LLMWareConfig().set_active_db("sqlite") | ||
LLMWareConfig().set_config("debug_mode", 2) | ||
``` | ||
|
||
**Downloading sample files** | ||
We start by downloading the sample files we need. | ||
``llmware`` provides a set of sample files which we use throughout our examples. | ||
The following code snippet downloads these sample files, and in doing so creates the directories | ||
*Agreements*, *Invoices*, *UN-Resolutions-500*, *SmallLibrary*, *FinDocs*, and *AgreementsLarge*. | ||
If you want to get the newest version of the sample files, you can set ``over_write=True``. | ||
However, we encourage you to try it out with your own files once you are comfortable enough with ``llmware``. | ||
```python | ||
from llmware.setup import Setup | ||
|
||
sample_files_path = Setup().load_sample_files(over_write=False) | ||
``` | ||
``sample_files_path`` is the path where the files are stored. | ||
Assuming your user name is ``foo``, the path on Linux would be ``'/home/foo/llmware_data/sample_files'``. | ||
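As a rough illustration of where that path comes from, the following sketch rebuilds it from a home directory using only the standard library. The folder names ``llmware_data`` and ``sample_files`` are taken from the example path above; that this layout holds on every platform and version is an assumption, and ``expected_sample_files_path`` is a hypothetical helper, not part of ``llmware``. In real code, prefer the value returned by ``load_sample_files``.

```python
import os


def expected_sample_files_path(home=None):
    """Build the sample-files path described above (assumed layout, not an llmware API)."""
    # Fall back to the current user's home directory when none is given.
    if home is None:
        home = os.path.expanduser("~")
    return os.path.join(home, "llmware_data", "sample_files")


# Path for the example user 'foo' from the text.
print(expected_sample_files_path("/home/foo"))
```

Comparing this value with ``sample_files_path`` is a quick way to check whether your installation uses the default location.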
|
||
|
||
**Creating a library** | ||
Now that we have data, we can start to create our library. | ||
In ``llmware``, a **library** is a collection of unstructured data. | ||
Currently, ``llmware`` supports *text* and *images*. | ||
The following code creates an empty ``library`` with the name ``my_llmware_library``. | ||
```python | ||
from llmware.library import Library | ||
|
||
library = Library().create_new_library('my_llmware_library') | ||
``` | ||
|
||
**Adding files to a library** | ||
Now that we have created a ``library``, we are ready to *add files* to it. | ||
Currently, the ``add_files`` method supports pdf, pptx, docx, xlsx, csv, md, txt, json, wav, zip, jpg, and png. | ||
The method will automatically choose the correct parser, based on the file extension. | ||
```python | ||
library.add_files('/home/foo/llmware_data/sample_files/Agreements') | ||
``` | ||
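To make the idea of choosing a parser by file extension concrete, here is a minimal, standalone sketch of extension-based dispatch. It is not how ``llmware`` implements ``add_files``: the mapping, the parser family names, and the ``choose_parser`` helper are all hypothetical and only reuse the list of extensions given above.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a parser family.
# The grouping and the names are illustrative, not taken from llmware.
PARSER_BY_EXTENSION = {
    ".pdf": "pdf",
    ".pptx": "office",
    ".docx": "office",
    ".xlsx": "office",
    ".csv": "text",
    ".md": "text",
    ".txt": "text",
    ".json": "text",
    ".wav": "voice",
    ".zip": "archive",
    ".jpg": "image",
    ".png": "image",
}


def choose_parser(file_name):
    """Return the parser family for a file, or None if the extension is not listed."""
    # Lower-case the suffix so that 'contract.PDF' and 'contract.pdf' match.
    extension = Path(file_name).suffix.lower()
    return PARSER_BY_EXTENSION.get(extension)


for name in ["contract.PDF", "notes.txt", "photo.png", "movie.mp4"]:
    print(name, "->", choose_parser(name))
```

A lookup table like this keeps the list of supported formats in one place; files with an unlisted extension simply map to ``None`` and can be skipped.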
|
||
**The library card** | ||
A ``library`` keeps track of its inventory, similar to a good librarian. | ||
We do this with a *library card*. | ||
At the time of writing, a library card has the keys _id, library_name, embedding, knowledge_graph, unique_doc_id, documents, blocks, images, pages, tables, and account_name. | ||
```python | ||
updated_library_card = library.get_library_card() | ||
doc_count = updated_library_card["documents"] | ||
block_count = updated_library_card["blocks"] | ||
updated_library_card.keys() | ||
``` | ||
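Because the library card is a plain dictionary, its counts can be combined with ordinary Python. The sketch below uses a hand-written stand-in for a card, with made-up values and only a few of the keys listed above, to compute the average number of blocks per document; ``blocks_per_document`` is a hypothetical helper, not an ``llmware`` function.

```python
# Hypothetical library card with made-up values; a real card comes from
# library.get_library_card() and contains more keys.
sample_library_card = {
    "library_name": "my_llmware_library",
    "documents": 4,
    "blocks": 120,
    "pages": 36,
}


def blocks_per_document(card):
    """Average number of blocks per document, or 0.0 for an empty library."""
    documents = card.get("documents", 0)
    if documents == 0:
        # Avoid dividing by zero when no files have been added yet.
        return 0.0
    return card.get("blocks", 0) / documents


print(blocks_per_document(sample_library_card))
```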
|
||
You can also find out where the library is stored via the ``library_main_path`` attribute. | ||
Again, assuming your user name is *foo* and you are on a Linux system, the ``library_main_path`` is ``'/home/foo/llmware_data/accounts/llmware/my_llmware_library'``. | ||
```python | ||
library.library_main_path | ||
``` | ||
|
||
**Querying a library** | ||
Finally, we are ready to execute a query against our library. | ||
Remember that the text is indexed automatically when we add it to the library. | ||
The result of a ``Query`` is a list of dictionaries, where one dictionary is one result. | ||
A result dictionary has a wide range of useful keys. | ||
A few important keys in the dictionary are *text*, *file_source*, *page_num*, *doc_ID*, *block_ID*, and | ||
*matches*. | ||
In the following, we query the library for the base salary, return the first ten results, and | ||
iterate over the results. | ||
```python | ||
from llmware.retrieval import Query | ||
|
||
query_results = Query(library).text_query('base salary', result_count=10) | ||
|
||
for query_result in query_results: | ||
    text = query_result["text"] | ||
    file_source = query_result["file_source"] | ||
    page_number = query_result["page_num"] | ||
    doc_id = query_result["doc_ID"] | ||
    block_id = query_result["block_ID"] | ||
    matches = query_result["matches"] | ||
``` | ||
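Since each result is an ordinary dictionary, the list can be post-processed without any ``llmware`` calls. As a small sketch, the code below groups hits by source file and collects the pages on which they occur. The sample results are made up and contain only a subset of the keys described above; ``pages_by_file`` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical results that mimic the shape described above; the values are made up.
sample_results = [
    {"text": "The base salary shall be ...", "file_source": "agreement_a.pdf", "page_num": 3},
    {"text": "Base salary is reviewed annually ...", "file_source": "agreement_b.pdf", "page_num": 2},
    {"text": "In addition to the base salary ...", "file_source": "agreement_a.pdf", "page_num": 1},
    {"text": "Any increase in base salary ...", "file_source": "agreement_a.pdf", "page_num": 3},
]


def pages_by_file(results):
    """Map each source file to the sorted list of distinct pages that contain a hit."""
    pages = defaultdict(set)
    for result in results:
        pages[result["file_source"]].add(result["page_num"])
    return {file_source: sorted(page_set) for file_source, page_set in pages.items()}


for file_source, page_numbers in pages_by_file(sample_results).items():
    print(file_source, page_numbers)
```

The same function can be applied to ``query_results`` from the snippet above, assuming every result carries the ``file_source`` and ``page_num`` keys.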
|
||
You can take a look at all the keys that are returned by calling ``keys()``. | ||
```python | ||
query_results[0].keys() | ||
``` |