Merge pull request #520 from MacOS/main

Add documentation workflow and documentation itself
llmware-ai · Mar 15, 2024 · 37e7e0a
2 parents bbb49db + c94bde6

Showing 9 changed files with 433 additions and 0 deletions.
diff --git a/.github/workflows/pages.yml b/.github/workflows/pages.yml
@@ -0,0 +1,76 @@
+# This workflow uses actions that are not certified by GitHub.
+# They are provided by a third-party and are governed by
+# separate terms of service, privacy policy, and support
+# documentation.
+
+# Sample workflow for building and deploying a Jekyll site to GitHub Pages
+name: Documentation
+
+on:
+  push:
+    branches: ["main"]
+    paths: ["docs/**"] # only changes in the docs directory trigger the workflow
+
+  # Allows you to run this workflow manually from the Actions tab
+  workflow_dispatch:
+
+# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+# Allow one concurrent deployment
+concurrency:
+  group: "pages"
+  cancel-in-progress: true
+
+jobs:
+  # Build job
+  build:
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: docs
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+
+      - name: Setup Ruby
+        uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: '3.1' # Not needed with a .ruby-version file
+          bundler-cache: true # runs 'bundle install' and caches installed gems automatically
+          cache-version: 0 # Increment this number if you need to re-download cached gems
+          working-directory: '${{ github.workspace }}/docs'
+
+      - name: Setup Pages
+        id: pages
+        uses: actions/configure-pages@v3
+
+      - name: Build with Jekyll
+        # Outputs to the './_site' directory by default
+        run: bundle exec jekyll build --baseurl "${{ steps.pages.outputs.base_path }}"
+        env:
+          JEKYLL_ENV: production
+
+      - name: Upload artifact
+        # Automatically uploads an artifact from the './_site' directory by default
+        uses: actions/upload-pages-artifact@v1
+        with:
+          path: "docs/_site/"
+
+
+  # Deployment job
+  deploy:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    needs: build
+
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v2
diff --git a/README.md b/README.md
@@ -2,6 +2,7 @@
 ![Static Badge](https://img.shields.io/badge/python-3.9_%7C_3.10%7C_3.11-blue?color=blue)
 ![PyPI - Version](https://img.shields.io/pypi/v/llmware?color=blue)
 [![discord](https://img.shields.io/badge/Chat%20on-Discord-blue?logo=discord&logoColor=white)](https://discord.gg/MhZn5Nc39h)   
+[![Documentation](https://github.com/llmware-ai/llmware/actions/workflows/pages.yml/badge.svg)](https://github.com/llmware-ai/llmware/actions/workflows/pages.yml)
 
 🧰🛠️🔩**The Ultimate Toolkit for Building LLM Apps**
 

diff --git a/docs/Gemfile b/docs/Gemfile
@@ -0,0 +1,22 @@
+source 'https://rubygems.org'
+
+
+#
+# Gems that are locked to a version.
+#
+gem "jekyll", "~> 4.3.2" # installed by `gem jekyll`
+gem "just-the-docs", "0.7.0" # pinned to the current release
+
+
+#
+# Gems that are only loaded if they are configured correctly.
+#
+gem 'jekyll-default-layout'
+gem 'jekyll-github-metadata'
+
+
+#
+# Gems that are loaded irrespective of site configuration.
+#
+gem 'jekyll-seo-tag', group: :jekyll_plugins
+gem 'jekyll-include-cache', group: :jekyll_plugins
diff --git a/docs/Gemfile.lock b/docs/Gemfile.lock
@@ -0,0 +1,107 @@
+GEM
+  remote: https://rubygems.org/
+  specs:
+    addressable (2.8.5)
+      public_suffix (>= 2.0.2, < 6.0)
+    base64 (0.2.0)
+    colorator (1.1.0)
+    concurrent-ruby (1.2.2)
+    em-websocket (0.5.3)
+      eventmachine (>= 0.12.9)
+      http_parser.rb (~> 0)
+    eventmachine (1.2.7)
+    faraday (2.7.12)
+      base64
+      faraday-net_http (>= 2.0, < 3.1)
+      ruby2_keywords (>= 0.0.4)
+    faraday-net_http (3.0.2)
+    ffi (1.15.5)
+    forwardable-extended (2.6.0)
+    google-protobuf (3.24.3-arm64-darwin)
+    google-protobuf (3.24.3-x86_64-linux)
+    http_parser.rb (0.8.0)
+    i18n (1.14.1)
+      concurrent-ruby (~> 1.0)
+    jekyll (4.3.2)
+      addressable (~> 2.4)
+      colorator (~> 1.0)
+      em-websocket (~> 0.5)
+      i18n (~> 1.0)
+      jekyll-sass-converter (>= 2.0, < 4.0)
+      jekyll-watch (~> 2.0)
+      kramdown (~> 2.3, >= 2.3.1)
+      kramdown-parser-gfm (~> 1.0)
+      liquid (~> 4.0)
+      mercenary (>= 0.3.6, < 0.5)
+      pathutil (~> 0.9)
+      rouge (>= 3.0, < 5.0)
+      safe_yaml (~> 1.0)
+      terminal-table (>= 1.8, < 4.0)
+      webrick (~> 1.7)
+    jekyll-default-layout (0.1.5)
+      jekyll (>= 3.0, < 5.0)
+    jekyll-github-metadata (2.16.0)
+      jekyll (>= 3.4, < 5.0)
+      octokit (>= 4, < 7, != 4.4.0)
+    jekyll-include-cache (0.2.1)
+      jekyll (>= 3.7, < 5.0)
+    jekyll-sass-converter (3.0.0)
+      sass-embedded (~> 1.54)
+    jekyll-seo-tag (2.8.0)
+      jekyll (>= 3.8, < 5.0)
+    jekyll-watch (2.2.1)
+      listen (~> 3.0)
+    just-the-docs (0.7.0)
+      jekyll (>= 3.8.5)
+      jekyll-include-cache
+      jekyll-seo-tag (>= 2.0)
+      rake (>= 12.3.1)
+    kramdown (2.4.0)
+      rexml
+    kramdown-parser-gfm (1.1.0)
+      kramdown (~> 2.0)
+    liquid (4.0.4)
+    listen (3.8.0)
+      rb-fsevent (~> 0.10, >= 0.10.3)
+      rb-inotify (~> 0.9, >= 0.9.10)
+    mercenary (0.4.0)
+    octokit (6.1.1)
+      faraday (>= 1, < 3)
+      sawyer (~> 0.9)
+    pathutil (0.16.2)
+      forwardable-extended (~> 2.6)
+    public_suffix (5.0.3)
+    rake (13.0.6)
+    rb-fsevent (0.11.2)
+    rb-inotify (0.10.1)
+      ffi (~> 1.0)
+    rexml (3.2.6)
+    rouge (4.1.3)
+    ruby2_keywords (0.0.5)
+    safe_yaml (1.0.5)
+    sass-embedded (1.67.0-arm64-darwin)
+      google-protobuf (~> 3.23)
+    sass-embedded (1.67.0-x86_64-linux-gnu)
+      google-protobuf (~> 3.23)
+    sawyer (0.9.2)
+      addressable (>= 2.3.5)
+      faraday (>= 0.17.3, < 3)
+    terminal-table (3.0.2)
+      unicode-display_width (>= 1.1.1, < 3)
+    unicode-display_width (2.4.2)
+    webrick (1.8.1)
+
+PLATFORMS
+  arm64-darwin-23
+  x86_64-linux
+
+DEPENDENCIES
+  jekyll (~> 4.3.2)
+  jekyll-default-layout
+  jekyll-github-metadata
+  jekyll-include-cache
+  jekyll-seo-tag
+  just-the-docs (= 0.7.0)
+
+BUNDLED WITH
+   2.3.26
diff --git a/docs/_config.yml b/docs/_config.yml
@@ -0,0 +1,53 @@
+# Site settings
+# These are used to personalize your new site. If you look in the HTML files,
+# you will see them accessed via {{ site.title }}, {{ site.github_repo }}, and so on.
+# You can create any custom variable you would like, and they will be accessible
+# in the templates via {{ site.myvariable }}.
+title: llmware
+description: llmware is an enterprise-grade LLM-based development framework, including tools and fine-tuned models.
+theme: just-the-docs
+url: https://llmware-ai.github.io/llmware/
+baseurl: "" # the subpath of your site
+favicon_ico: "/assets/images/favicon.ico"
+
+# Enable or disable heading anchors
+heading_anchors: true
+
+# Aux links for the upper right navigation
+aux_links:
+  llmware repository: https://github.com/llmware-ai/llmware
+
+lsi: false
+safe: true
+highlighter: rouge
+
+gist:
+  noscript: false
+kramdown:
+  math_engine: mathjax
+  syntax_highlighter: rouge
+
+
+callouts_level: quiet # or loud
+callouts:
+  highlight:
+    color: yellow
+  important:
+    title: Important
+    color: blue
+  new:
+    title: New
+    color: green
+  note:
+    title: Note
+    color: purple
+  warning:
+    title: Warning
+    color: red
+
+
+plugins:
+  - jekyll-default-layout
+  - jekyll-seo-tag
+  - jekyll-github-metadata
+  - jekyll-include-cache
diff --git a/docs/assets/images/favicon.ico b/docs/assets/images/favicon.ico
diff --git a/docs/examples.md b/docs/examples.md
@@ -0,0 +1,117 @@
+---
+layout: default
+title: Introduction by Examples
+nav_order: 2
+permalink: /examples
+---
+# Introduction by Examples
+We introduce ``llmware`` through self-contained examples.
+
+
+# Your first library and query
+
+{: .note }
+> The code here is a modified version of [example-1-create_first_library.py](https://github.com/llmware-ai/llmware/blob/main/fast_start/example-1-create_first_library.py).
+> The adjustments were made to make the example easier to follow.
+
+In this introduction, we walk through the steps of creating a **library**.
+To create a ``library`` in ``llmware``, we instantiate a ``Library`` object and call
+its ``add_files`` method, which parses the files, chunks the text into blocks, and indexes it.
+We also download the sample files that ``llmware`` provides, which you can use for any experimentation you
+might want to do.
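+
+To make "chunking" a little more concrete, here is a minimal sketch of splitting extracted text into fixed-size, overlapping blocks. It is plain Python and is *not* ``llmware``'s parser; the function ``chunk_text`` and its ``block_size`` and ``overlap`` parameters are hypothetical, and splitting on raw character counts is a deliberate simplification.
+
+```python
+def chunk_text(text, block_size=400, overlap=50):
+    """Split text into overlapping character blocks (illustrative only, not llmware's parser)."""
+    if overlap >= block_size:
+        raise ValueError("overlap must be smaller than block_size")
+    blocks = []
+    step = block_size - overlap  # each new block starts `step` characters after the previous one
+    for start in range(0, len(text), step):
+        block = text[start:start + block_size]
+        if block.strip():  # skip blocks that contain only whitespace
+            blocks.append(block)
+        if start + block_size >= len(text):  # this block already reaches the end of the text
+            break
+    return blocks
+
+
+sample = "The executive shall receive a base salary. " * 30
+blocks = chunk_text(sample, block_size=200, overlap=20)
+print(len(blocks), "blocks")
+```
+
+The overlap means the end of one block is repeated at the start of the next, so a phrase that straddles a boundary still appears whole in at least one block.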
+
+
+**Configuring llmware**
+Before we get started, we can adjust the configuration of ``llmware``.
+For example, we can choose which **text collection** database to use and set the logging level.
+By default, ``llmware`` uses MongoDB as the text collection database and has a ``debug_mode`` level
+of ``0``.
+This means that, by default, ``llmware`` shows the status manager and prints errors.
+The status manager is useful for large parsing jobs.
+In this ``library`` introduction, we change both the text collection database and the ``debug_mode``.
+As the text collection database, we choose ``sqlite``.
+We also set ``debug_mode`` to ``2``, which shows the name of the file being parsed, i.e., a file-by-file progress report.
+```python
+from llmware.configs import LLMWareConfig
+
+LLMWareConfig().set_active_db("sqlite")
+LLMWareConfig().set_config("debug_mode", 2)
+```
+
+**Downloading sample files**
+We start by downloading the sample files we need.
+``llmware`` provides a set of sample files, which we use throughout our examples.
+The following code snippet downloads these sample files and, in doing so, creates the directories
+*Agreements*, *Invoices*, *UN-Resolutions-500*, *SmallLibrary*, *FinDocs*, and *AgreementsLarge*.
+If you want to get the newest version of the sample files, you can set ``over_write=True``.
+However, once you are comfortable enough with ``llmware``, we encourage you to try it out with your own files.
+```python
+from llmware.setup import Setup
+
+sample_files_path = Setup().load_sample_files(over_write=False)
+```
+``sample_files_path`` is the path where the files are stored.
+Assuming that your user name is ``foo``, on Linux the path would be ``'/home/foo/llmware_data/sample_files'``.
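+
+If you want to check which sample folders were downloaded, the standard library is enough to list the sub-directories of ``sample_files_path``. The helper ``list_sample_folders`` is hypothetical; the demo below runs it on a throwaway directory that only mimics part of the sample layout, so it works without ``llmware`` installed.
+
+```python
+import tempfile
+from pathlib import Path
+
+
+def list_sample_folders(sample_files_path):
+    """Return the sorted names of the sub-directories under sample_files_path (hypothetical helper)."""
+    root = Path(sample_files_path)
+    return sorted(p.name for p in root.iterdir() if p.is_dir())
+
+
+# Demo on a temporary directory that mimics part of the sample_files layout.
+with tempfile.TemporaryDirectory() as tmp:
+    for name in ("Invoices", "Agreements"):
+        (Path(tmp) / name).mkdir()
+    (Path(tmp) / "readme.txt").write_text("not a folder")  # plain files are ignored
+    demo_folders = list_sample_folders(tmp)
+
+print(demo_folders)
+```
+
+On your machine, you would call ``list_sample_folders(sample_files_path)`` with the value returned by ``load_sample_files``.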
+
+
+**Creating a library**
+Now that we have data, we can start to create our library.
+In ``llmware``, a **library** is a collection of unstructured data.
+Currently, ``llmware`` supports *text* and *images*.
+The following code creates an empty ``library`` with the name ``my_llmware_library``.
+```python
+from llmware.library import Library
+
+library = Library().create_new_library('my_llmware_library')
+```
+
+**Adding files to a library**
+Now that we have created a ``library``, we are ready to *add files* to it.
+Currently, the ``add_files`` method supports pdf, pptx, docx, xlsx, csv, md, txt, json, wav, zip, jpg, and png files.
+The method automatically chooses the correct parser based on the file extension.
+```python
+import os
+
+# Build the folder path from the value returned by load_sample_files() instead of hard-coding it.
+library.add_files(os.path.join(sample_files_path, "Agreements"))
+```
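+
+The idea of picking a parser from the file extension can be sketched as a simple lookup table. The category labels (``"pdf"``, ``"office"``, ``"text"``, and so on) and the helper ``choose_parser`` are hypothetical and do not correspond to ``llmware``'s internal parser classes; only the list of extensions comes from the paragraph above.
+
+```python
+from pathlib import Path
+
+# Hypothetical grouping of the supported extensions listed above.
+PARSER_BY_EXTENSION = {
+    ".pdf": "pdf",
+    ".pptx": "office", ".docx": "office", ".xlsx": "office",
+    ".csv": "text", ".md": "text", ".txt": "text", ".json": "text",
+    ".wav": "audio",
+    ".jpg": "image", ".png": "image",
+    ".zip": "archive",
+}
+
+
+def choose_parser(file_name):
+    """Return the parser category for a file, or None if the extension is unsupported."""
+    return PARSER_BY_EXTENSION.get(Path(file_name).suffix.lower())
+
+
+for name in ["Employment_Agreement.pdf", "notes.TXT", "diagram.svg"]:
+    print(name, "->", choose_parser(name))
+```
+
+Lower-casing the suffix makes the lookup case-insensitive, and unsupported extensions simply map to ``None``.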
+
+**The library card**
+A ``library`` keeps an inventory of its contents, similar to a good librarian.
+It does this with a *library card*.
+At the time of this writing, a library card has the keys _id, library_name, embedding, knowledge_graph, unique_doc_id, documents, blocks, images, pages, tables, and account_name.
+```python
+updated_library_card = library.get_library_card()
+doc_count = updated_library_card["documents"]
+block_count = updated_library_card["blocks"]
+updated_library_card.keys()
+```
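+
+Because the library card is a plain dictionary, it is easy to summarize. The sketch below uses a hand-written stand-in card with illustrative numbers (not real output) so that it runs without a database; with ``llmware`` you would pass the result of ``library.get_library_card()`` instead. The helper ``summarize_card`` is hypothetical.
+
+```python
+def summarize_card(card):
+    """Format the library name and its main counters as one line (hypothetical helper)."""
+    counters = ("documents", "blocks", "images", "pages", "tables")
+    parts = [f"{key}={card.get(key, 0)}" for key in counters]  # missing keys default to 0
+    return f"{card.get('library_name', '?')}: " + ", ".join(parts)
+
+
+# Stand-in card with illustrative numbers only.
+example_card = {
+    "library_name": "my_llmware_library",
+    "documents": 3,
+    "blocks": 120,
+    "images": 0,
+    "pages": 24,
+    "tables": 2,
+}
+print(summarize_card(example_card))
+```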
+
+You can also find out where the library is stored via the ``library_main_path`` attribute.
+Again, assuming your user name is *foo* and you are on a Linux system, ``library_main_path`` is ``'/home/foo/llmware_data/accounts/llmware/my_llmware_library'``.
+```python
+library.library_main_path
+```
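+
+The example path follows the pattern ``<home>/llmware_data/accounts/<account_name>/<library_name>``. Assuming that layout and the default account name ``llmware`` (both inferred from the example path, not taken from an ``llmware`` API), the expected location can be sketched with ``pathlib``. The helper ``expected_library_path`` is hypothetical; compare its result with ``library.library_main_path`` on your machine.
+
+```python
+from pathlib import Path
+
+
+def expected_library_path(library_name, account_name="llmware", home=None):
+    """Build the library folder path assumed from the example above (hypothetical helper)."""
+    base = Path(home) if home is not None else Path.home()
+    return base / "llmware_data" / "accounts" / account_name / library_name
+
+
+print(expected_library_path("my_llmware_library", home="/home/foo"))
+```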
+
+**Querying a library**
+Finally, we are ready to execute a query against our library.
+Remember that the text is indexed automatically when we add it to the library.
+The result of a ``Query`` is a list of dictionaries, where one dictionary is one result.
+A result dictionary has a wide range of useful keys.
+A few important keys in the dictionary are *text*, *file_source*, *page_num*, *doc_ID*, *block_ID*, and
+*matches*.
+In the following example, we query the library for "base salary", return at most ten results, and
+iterate over them.
+```python
+from llmware.retrieval import Query
+
+query_results = Query(library).text_query('base salary', result_count=10)
+
+for query_result in query_results:
+    text = query_result["text"]
+    file_source = query_result["file_source"]
+    page_number = query_result["page_num"]
+    doc_id = query_result["doc_ID"]
+    block_id = query_result["block_ID"]
+    matches = query_result["matches"]
+```
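+
+Since each result is a plain dictionary, standard Python is enough to post-process the results. For example, the sketch below groups the hits by ``file_source`` and collects the pages on which each file matched. The stand-in results are hand-written with made-up file names so the snippet runs on its own; in practice you would pass ``query_results``. The helper ``pages_by_file`` is hypothetical.
+
+```python
+from collections import defaultdict
+
+
+def pages_by_file(results):
+    """Map each file_source to the sorted, de-duplicated page numbers where it matched."""
+    grouped = defaultdict(set)
+    for result in results:
+        grouped[result["file_source"]].add(result["page_num"])
+    return {source: sorted(pages) for source, pages in grouped.items()}
+
+
+# Hand-written stand-ins that only carry the two keys used here.
+example_results = [
+    {"file_source": "agreement_a.pdf", "page_num": 3},
+    {"file_source": "agreement_b.pdf", "page_num": 1},
+    {"file_source": "agreement_a.pdf", "page_num": 1},
+    {"file_source": "agreement_a.pdf", "page_num": 3},
+]
+print(pages_by_file(example_results))
+```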
+
+You can take a look at all the keys that are returned by calling ``keys()``.
+```python
+query_results[0].keys()
+```