Merge pull request #520 from MacOS/main
Add documentation workflow and documentation itself
doberst authored Mar 15, 2024
2 parents bbb49db + c94bde6 commit 37e7e0a
Showing 9 changed files with 433 additions and 0 deletions.
76 changes: 76 additions & 0 deletions .github/workflows/pages.yml
@@ -0,0 +1,76 @@
# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.

# Sample workflow for building and deploying a Jekyll site to GitHub Pages
name: Documentation

on:
  push:
    branches: ["master"]
    paths: ["docs/**"] # only changes in the docs directory trigger the workflow

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow one concurrent deployment
concurrency:
  group: "pages"
  cancel-in-progress: true

jobs:
  # Build job
  build:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: docs

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.1' # Not needed with a .ruby-version file
          bundler-cache: true # runs 'bundle install' and caches installed gems automatically
          cache-version: 0 # Increment this number if you need to re-download cached gems
          working-directory: '${{ github.workspace }}/docs'

      - name: Setup Pages
        id: pages
        uses: actions/configure-pages@v3

      - name: Build with Jekyll
        # Outputs to the './_site' directory by default
        run: bundle exec jekyll build --baseurl "${{ steps.pages.outputs.base_path }}"
        env:
          JEKYLL_ENV: production

      - name: Upload artifact
        # Automatically uploads an artifact from the './_site' directory by default
        uses: actions/upload-pages-artifact@v1
        with:
          path: "docs/_site/"


  # Deployment job
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build

    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v2
1 change: 1 addition & 0 deletions README.md
@@ -2,6 +2,7 @@
![Static Badge](https://img.shields.io/badge/python-3.9_%7C_3.10%7C_3.11-blue?color=blue)
![PyPI - Version](https://img.shields.io/pypi/v/llmware?color=blue)
[![discord](https://img.shields.io/badge/Chat%20on-Discord-blue?logo=discord&logoColor=white)](https://discord.gg/MhZn5Nc39h)
[![Documentation](https://github.com/llmware-ai/llmware/actions/workflows/pages.yml/badge.svg)](https://github.com/llmware-ai/llmware/actions/workflows/pages.yml)

🧰🛠️🔩**The Ultimate Toolkit for Building LLM Apps**

22 changes: 22 additions & 0 deletions docs/Gemfile
@@ -0,0 +1,22 @@
source 'https://rubygems.org'


#
# Gems that are locked to a version.
#
gem "jekyll", "~> 4.3.2" # installed by `gem jekyll`
gem "just-the-docs", "0.7.0" # pinned to the current release


#
# Gems that are only loaded if they are configured correctly.
#
gem 'jekyll-default-layout'
gem 'jekyll-github-metadata'


#
# Gems that are loaded irrespective of site configuration.
#
gem 'jekyll-seo-tag', group: :jekyll_plugins
gem 'jekyll-include-cache', group: :jekyll_plugins
107 changes: 107 additions & 0 deletions docs/Gemfile.lock
@@ -0,0 +1,107 @@
GEM
  remote: https://rubygems.org/
  specs:
    addressable (2.8.5)
      public_suffix (>= 2.0.2, < 6.0)
    base64 (0.2.0)
    colorator (1.1.0)
    concurrent-ruby (1.2.2)
    em-websocket (0.5.3)
      eventmachine (>= 0.12.9)
      http_parser.rb (~> 0)
    eventmachine (1.2.7)
    faraday (2.7.12)
      base64
      faraday-net_http (>= 2.0, < 3.1)
      ruby2_keywords (>= 0.0.4)
    faraday-net_http (3.0.2)
    ffi (1.15.5)
    forwardable-extended (2.6.0)
    google-protobuf (3.24.3-arm64-darwin)
    google-protobuf (3.24.3-x86_64-linux)
    http_parser.rb (0.8.0)
    i18n (1.14.1)
      concurrent-ruby (~> 1.0)
    jekyll (4.3.2)
      addressable (~> 2.4)
      colorator (~> 1.0)
      em-websocket (~> 0.5)
      i18n (~> 1.0)
      jekyll-sass-converter (>= 2.0, < 4.0)
      jekyll-watch (~> 2.0)
      kramdown (~> 2.3, >= 2.3.1)
      kramdown-parser-gfm (~> 1.0)
      liquid (~> 4.0)
      mercenary (>= 0.3.6, < 0.5)
      pathutil (~> 0.9)
      rouge (>= 3.0, < 5.0)
      safe_yaml (~> 1.0)
      terminal-table (>= 1.8, < 4.0)
      webrick (~> 1.7)
    jekyll-default-layout (0.1.5)
      jekyll (>= 3.0, < 5.0)
    jekyll-github-metadata (2.16.0)
      jekyll (>= 3.4, < 5.0)
      octokit (>= 4, < 7, != 4.4.0)
    jekyll-include-cache (0.2.1)
      jekyll (>= 3.7, < 5.0)
    jekyll-sass-converter (3.0.0)
      sass-embedded (~> 1.54)
    jekyll-seo-tag (2.8.0)
      jekyll (>= 3.8, < 5.0)
    jekyll-watch (2.2.1)
      listen (~> 3.0)
    just-the-docs (0.7.0)
      jekyll (>= 3.8.5)
      jekyll-include-cache
      jekyll-seo-tag (>= 2.0)
      rake (>= 12.3.1)
    kramdown (2.4.0)
      rexml
    kramdown-parser-gfm (1.1.0)
      kramdown (~> 2.0)
    liquid (4.0.4)
    listen (3.8.0)
      rb-fsevent (~> 0.10, >= 0.10.3)
      rb-inotify (~> 0.9, >= 0.9.10)
    mercenary (0.4.0)
    octokit (6.1.1)
      faraday (>= 1, < 3)
      sawyer (~> 0.9)
    pathutil (0.16.2)
      forwardable-extended (~> 2.6)
    public_suffix (5.0.3)
    rake (13.0.6)
    rb-fsevent (0.11.2)
    rb-inotify (0.10.1)
      ffi (~> 1.0)
    rexml (3.2.6)
    rouge (4.1.3)
    ruby2_keywords (0.0.5)
    safe_yaml (1.0.5)
    sass-embedded (1.67.0-arm64-darwin)
      google-protobuf (~> 3.23)
    sass-embedded (1.67.0-x86_64-linux-gnu)
      google-protobuf (~> 3.23)
    sawyer (0.9.2)
      addressable (>= 2.3.5)
      faraday (>= 0.17.3, < 3)
    terminal-table (3.0.2)
      unicode-display_width (>= 1.1.1, < 3)
    unicode-display_width (2.4.2)
    webrick (1.8.1)

PLATFORMS
  arm64-darwin-23
  x86_64-linux

DEPENDENCIES
  jekyll (~> 4.3.2)
  jekyll-default-layout
  jekyll-github-metadata
  jekyll-include-cache
  jekyll-seo-tag
  just-the-docs (= 0.7.0)

BUNDLED WITH
   2.3.26
53 changes: 53 additions & 0 deletions docs/_config.yml
@@ -0,0 +1,53 @@
# Site settings
# These are used to personalize your new site. If you look in the HTML files,
# you will see them accessed via {{ site.title }}, {{ site.github_repo }}, and so on.
# You can create any custom variable you would like, and they will be accessible
# in the templates via {{ site.myvariable }}.
title: llmware
description: llmware is an enterprise-grade LLM-based development framework, including tools and fine-tuned models.
theme: just-the-docs
url: https://llmware-ai.github.io/llmware/
baseurl: "" # the subpath of your site
favicon_ico: "/assets/images/favicon.ico"

# Enable or disable heading anchors
heading_anchors: true

# Aux links for the upper right navigation
aux_links:
  llmware repository: https://github.com/llmware-ai/llmware

lsi: false
safe: true
highlighter: rouge

gist:
  noscript: false
kramdown:
  math_engine: mathjax
  syntax_highlighter: rouge


callouts_level: quiet # or loud
callouts:
  highlight:
    color: yellow
  important:
    title: Important
    color: blue
  new:
    title: New
    color: green
  note:
    title: Note
    color: purple
  warning:
    title: Warning
    color: red


plugins:
  - jekyll-default-layout
  - jekyll-seo-tag
  - jekyll-github-metadata
  - jekyll-include-cache
Binary file added docs/assets/images/favicon.ico
Binary file not shown.
117 changes: 117 additions & 0 deletions docs/examples.md
@@ -0,0 +1,117 @@
---
layout: default
title: Introduction by Examples
nav_order: 2
permalink: /examples
---
# Introduction by Examples
We introduce ``llmware`` through self-contained examples.


# Your first library and query

{: .note }
> The code here is a modified version of [example-1-create_first_library.py](https://github.com/llmware-ai/llmware/blob/main/fast_start/example-1-create_first_library.py).
> The adjustments are made to ease understanding for this post.

In this introduction, we will walk through the steps of creating a **library**.
To create a ``library`` in ``llmware``, we instantiate a ``library`` object and call
the ``add_files`` method, which parses the files, chunks the text, and indexes it.
We will also download the sample files we provide, which you can use for any experimentation you
might want to do.


**Configuring llmware**
Before we get started, we can adjust the configuration of ``llmware``.
For example, we can choose which **text collection** database to use and set the logging level.
By default, ``llmware`` uses MongoDB as the text collection database and has a ``debug_mode`` level
of ``0``.
This means that by default, ``llmware`` will show the status manager and print errors.
The status manager is useful for large parsing jobs.
In this ``library`` introduction, we will change both the text collection database and the ``debug_mode``.
As the text collection database, we will choose ``sqlite``.
We will also set ``debug_mode`` to ``2``, which shows the name of each file as it is parsed, i.e. file-by-file progress.
```python
from llmware.configs import LLMWareConfig

LLMWareConfig().set_active_db("sqlite")
LLMWareConfig().set_config("debug_mode", 2)
```

**Downloading sample files**
We start by downloading the sample files we need.
``llmware`` provides a set of sample files which we use throughout our examples.
The following code snippet downloads these sample files, and in doing so creates the directories
*Agreements*, *Invoices*, *UN-Resolutions-500*, *SmallLibrary*, *FinDocs*, and *AgreementsLarge*.
If you want to get the newest version of the sample files, you can set ``over_write=True``.
However, we encourage you to try it out with your own files once you are comfortable enough with ``llmware``.
```python
from llmware.setup import Setup

sample_files_path = Setup().load_sample_files(over_write=False)
```
``sample_files_path`` is the path where the files are stored.
Assuming that your user name is ``foo``, the path on Linux would be ``'/home/foo/llmware_data/sample_files'``.
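
To make the path layout concrete, here is a minimal sketch that rebuilds the Linux path from the paragraph above using only the standard library. The helper names ``sample_files_dir`` and ``sample_folder`` are hypothetical and not part of ``llmware``, and the sketch assumes the home directory is ``/home/<user name>``, which may differ on your system.

```python
from pathlib import PurePosixPath


def sample_files_dir(user_name: str) -> PurePosixPath:
    """Build the Linux sample-files path for a user (assumes /home/<user_name>)."""
    return PurePosixPath("/home") / user_name / "llmware_data" / "sample_files"


def sample_folder(user_name: str, folder: str) -> PurePosixPath:
    """Path of one sample directory, e.g. 'Agreements', inside the sample files."""
    return sample_files_dir(user_name) / folder


print(sample_files_dir("foo"))
print(sample_folder("foo", "Agreements"))
```

In practice you would simply use the value returned by ``load_sample_files`` instead of rebuilding the path by hand.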


**Creating a library**
Now that we have data, we can start to create our library.
In ``llmware``, a **library** is a collection of unstructured data.
Currently, ``llmware`` supports *text* and *images*.
The following code creates an empty ``library`` with the name ``my_llmware_library``.
```python
from llmware.library import Library

library = Library().create_new_library('my_llmware_library')
```

**Adding files to a library**
Now that we have created a ``library``, we are ready to *add files* to it.
Currently, the ``add_files`` method supports pdf, pptx, docx, xlsx, csv, md, txt, json, wav, zip, jpg, and png.
The method will automatically choose the correct parser based on the file extension.
```python
library.add_files('/home/foo/llmware_data/sample_files/Agreements')
```
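
The idea of picking a parser from the file extension can be sketched as follows. This is only an illustration of the dispatch pattern, not how ``llmware`` implements it: the mapping, the parser labels, and the function ``choose_parser`` are all hypothetical.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file extension to a parser label (illustration only).
PARSER_BY_EXTENSION = {
    ".pdf": "pdf",
    ".pptx": "office",
    ".docx": "office",
    ".xlsx": "office",
    ".csv": "text",
    ".md": "text",
    ".txt": "text",
    ".json": "text",
    ".wav": "voice",
    ".zip": "archive",
    ".jpg": "image",
    ".png": "image",
}


def choose_parser(file_name: str) -> Optional[str]:
    """Return the parser label for a file, or None if the extension is unsupported."""
    extension = Path(file_name).suffix.lower()  # case-insensitive match
    return PARSER_BY_EXTENSION.get(extension)


for name in ["contract.PDF", "notes.txt", "archive.tar"]:
    print(name, "->", choose_parser(name))
```

Returning ``None`` for unknown extensions lets the caller decide whether to skip the file or report it.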

**The library card**
A ``library`` keeps track of its inventory, similar to a good librarian.
We do this with a *library card*.
At the time of writing, a library card has the keys _id, library_name, embedding, knowledge_graph, unique_doc_id, documents, blocks, images, pages, tables, and account_name.
```python
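# Fetch the library card, which holds the library's current statistics
updated_library_card = library.get_library_card()
doc_count = updated_library_card["documents"]
block_count = updated_library_card["blocks"]
# List all keys available on the library card
updated_library_card.keys()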
```
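
Since a library card behaves like a dictionary, a short summary can be built from its counters. The sketch below uses a hand-written dictionary with the keys listed above. All values are made up for illustration, and ``summarize_card`` is a hypothetical helper, not an ``llmware`` function.

```python
# Keys follow the list above; every value here is invented for illustration.
example_card = {
    "_id": 1,
    "library_name": "my_llmware_library",
    "embedding": [],
    "knowledge_graph": "no",
    "unique_doc_id": 3,
    "documents": 3,
    "blocks": 120,
    "images": 0,
    "pages": 45,
    "tables": 2,
    "account_name": "llmware",
}


def summarize_card(card: dict) -> str:
    """Format the main counters of a library card as one line."""
    return (
        f"{card['library_name']}: {card['documents']} documents, "
        f"{card['blocks']} blocks, {card['pages']} pages"
    )


print(summarize_card(example_card))
```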

You can also find out where the library is stored via the ``library_main_path`` attribute.
Again, assuming your user name is *foo* and you are on a Linux system, the ``library_main_path`` is ``'/home/foo/llmware_data/accounts/llmware/my_llmware_library'``.
```python
library.library_main_path
```

**Querying a library**
Finally, we are ready to execute a query against our library.
Remember that the text is indexed automatically when we add it to the library.
The result of a ``Query`` is a list of dictionaries, where one dictionary is one result.
A result dictionary has a wide range of useful keys.
A few important keys in the dictionary are *text*, *file_source*, *page_num*, *doc_ID*, *block_ID*, and
*matches*.
In the following, we query the library for the base salary, return the first ten results, and
iterate over the results.
```python
from llmware.retrieval import Query

query_results = Query(library).text_query('base salary', result_count=10)

for query_result in query_results:
    text = query_result["text"]
    file_source = query_result["file_source"]
    page_number = query_result["page_num"]
    doc_id = query_result["doc_ID"]
    block_id = query_result["block_ID"]
    matches = query_result["matches"]
```
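
Because the results are plain dictionaries, they are easy to post-process with standard Python. The following sketch groups page numbers by source file. The sample results are made-up stand-ins that only use keys named above, and ``pages_by_source`` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up results for illustration; real results carry more keys.
sample_results = [
    {"text": "Base salary of ...", "file_source": "agreement_a.pdf", "page_num": 2},
    {"text": "The base salary ...", "file_source": "agreement_b.pdf", "page_num": 5},
    {"text": "Annual base salary ...", "file_source": "agreement_a.pdf", "page_num": 7},
]


def pages_by_source(results: list) -> dict:
    """Map each source file to the pages on which a result was found."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result["file_source"]].append(result["page_num"])
    return dict(grouped)


for source, pages in pages_by_source(sample_results).items():
    print(source, pages)
```

The same function should work on the list returned by ``text_query``, as long as each result contains the ``file_source`` and ``page_num`` keys.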

You can take a look at all the keys that are returned by calling ``keys()``.
```python
query_results[0].keys()
```