Skip to content
This repository has been archived by the owner on Oct 14, 2024. It is now read-only.

Commit

Permalink
fix: add missing content in migration (#4)
Browse files Browse the repository at this point in the history
fix: add missing content in migration
  • Loading branch information
0xHieu01 authored Mar 22, 2024
2 parents dce03b7 + 0cffc9a commit bedc795
Show file tree
Hide file tree
Showing 26 changed files with 1,452 additions and 883 deletions.
Binary file added docs/about/assets/solar-punk.webp
Binary file not shown.
Binary file added docs/about/assets/vision-1.webp
Binary file not shown.
74 changes: 74 additions & 0 deletions docs/about/vision.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
---
title: Jan's Vision
slug: /vision
description: Jan is a desktop application that turns computers into thinking machines.
keywords:
[
Jan AI,
Jan,
ChatGPT alternative,
local AI,
private AI,
conversational AI,
no-subscription fee,
large language model,
about Jan,
desktop application,
thinking machine,
jan vision,
]
---

## Jan's vision is to shape a future where humans and machines collaborate, continuing our legacy as toolmakers

Throughout history, humanity has thrived by mastering tools, from [controlling fire](https://en.wikipedia.org/wiki/Control_of_fire_by_early_humans) to [inventing the wheel](https://en.wikipedia.org/wiki/Wheel). These leaps weren't just about survival, they were foundational to our civilization.

Today, we stand on the brink of a new frontier with artificial intelligence. AI is not merely another tool, it represents a new form of collaboration between humans and machines - promising to enhance our creativity, augment our lives, and deepen our understanding of the world.

![jan ai shapes the future](./assets/vision-1.webp)

In the future we envision, AI will be as integral to our lives as fire and the wheel once were, with each individual having their own machines/robots. Mastering AI, like mastering fire, will require understanding its potential, respecting its power, and learning to control it for the betterment of humanity.

### Inspired by Science Fiction, Grounded in Optimism

Our vision is influenced by the harmonious coexistence of humans and machines in science fiction. From the helpful companionship of [C3PO](https://tr.wikipedia.org/wiki/C-3PO) and [Jarvis](https://en.wikipedia.org/wiki/J.A.R.V.I.S.) to the strategic alliances in [Halo](https://www.imdb.com/title/tt2934286/), these stories showcase a future where technology amplifies human potential.

### Jan's Role in Shaping the Future

Jan is our contribution to this future - a tool designed to augment human capabilities, not replace them. We are committed to developing AI that works for humanity, enhancing our creativity, productivity, and well-being. With Jan, we aim to empower individuals and communities to achieve more, together.

Our vision is not just a dream, it's a blueprint for a future where technology and humanity harmonize to unlock unprecedented possibilities.

## How we imagine the world in the future

We are fundamentally optimistic about the future. Jan aligns with the [Solarpunk movement](https://en.wikipedia.org/wiki/Solarpunk), which envisions a world where technology and nature coexist and flourish together. We reject the notion of climate doomerism and instead, focus on the positive impact we can make with AI.

![solarpunk and jan](./assets/solar-punk.webp)

Imagine a world where every individual is empowered by their own robots, where machines are not just tools but partners in our journey. This is the future Jan is striving to create.

Now, let's take a glimpse into this future through a day in the life of Emre, a reflection of how Jan's vision manifests in everyday life.

## A Day in the Life of Emre in 2050

> In 2050, Emre wakes up to the gentle sound of birds chirping, a soothing alarm created by **his own AI robot, Jan**. As he gets ready for the day, **Jan has already prepared** his schedule, factoring in his preferences and the day's weather.
>
> At breakfast, Emre discusses his upcoming project with **Jan, who offers insights and suggestions**, enhancing Emre's creativity. As he heads to work, his self-driving car, **integrated with Jan**, takes the most scenic and efficient route, allowing Emre to enjoy a moment of tranquility.
>
> In the office, Emre collaborates with colleagues from around the globe in a virtual workspace. **Jan assists** by translating languages in real-time and organizing ideas, making collaboration seamless and productive.
>
> During lunch, Emre decides to explore a new hobby. **Jan quickly curates** a list of resources and connects Emre with a virtual mentor, making learning accessible and enjoyable.
>
> In the afternoon, Emre takes a break to connect with nature. His smart garden, **managed by Jan**, is thriving, blending technology with the natural world in perfect harmony.
>
> As the day winds down, Emre reflects on his accomplishments. **With Jan's help**, he's been able to focus on what truly matters, achieving a balance between work, personal growth, and well-being.
>
> In 2050, Jan is more than just a tool, it's an integral part of Emre's life, **augmenting his abilities** and enabling him to live a more fulfilling life.
What a day, hah!

---

Jan's vision commits to developing thinking machines that work alongside humans - learning, adapting, and contributing to a broader, smarter society. This journey isn't just about technology. It's about creating a future where humans and machines collaborate.

Let's build the future together - join the journey!
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
---
title: 'Rag Is Not Enough: Lessons from Beating GPT-3.5 on Specialized Tasks with Mistral 7B'
description: 'Creating Open Source Alternatives to Outperform ChatGPT'
slug: /blog/surpassing-chatgpt-with-open-source-alternatives
tags: [Open Source ChatGPT Alternatives, Outperform ChatGPT]
authors: [hahuyhoang411, 0xsage, automaticcat]
date: 2024-03-17
---

## Abstract

We present a straightforward approach to adapting small, open-source models for specialized use cases, that can surpass GPT 3.5 performance with RAG. With it, we were able to get superior results on Q&A over [technical documentation](https://nitro.jan.ai/docs) describing a small [codebase](https://github.com/janhq/nitro).

In short, (3) extending a general foundation model like [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) with strong math and coding, and (7) training it over a high-quality, synthetic dataset generated from the intended corpus, and (2) adding RAG capabilities, can lead to significant accuracy improvements.

Problems still arise with catastrophic forgetting in general tasks, commonly observed during specialized domain fine-tuning. In our case, this is likely exacerbated by our lack of access to Mistral’s original training dataset and various compression techniques used in our approach to keep the model small.

## Selecting a Strong Foundation Model

[Mistral 7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) outshines both [Meta's Llama-2 7B](https://huggingface.co/meta-llama/Llama-2-7b) and [Google's Gemma 7B](https://huggingface.co/google/gemma-7b) in key benchmarks, making it our choice for a base model. Starting with a strong foundation like Mistral allowed us to achieve greater accuracy in our specialized adaptations.

![Mistral vs LLama vs Gemma](assets/mistral-comparasion.png)

_Figure 1._ Mistral 7B excels in benchmarks, ranking among the top foundational models.

_Note: we are not sponsored by the Mistral team. Though many folks in their community do like to run Mistral locally using our desktop client - [Jan](https://jan.ai/)._

## Cost-Effectively Improving the Base Model

Mistral alone has known, poor math capabilities, which we needed for our highly technical use case. Thus, we tested all model variants on top of Mistral, from foundation models to finetunes to model merges, in order to find a stronger base model to receive our own finetuning.

![Merged model vs finetuned models](assets/stealth-comparasion.png)

_Figure 2._ The merged model, Stealth, doubles the mathematical capabilities of its foundational model while retaining the performance in other tasks.

We found merging models is quick and cost-effective, enabling fast adjustments based on the result of each iteration.

We ended up with [Stealth 7B v1.1](https://huggingface.co/jan-hq/stealth-v1.1), a [SLERP](https://github.com/Digitous/LLM-SLERP-Merge) merge of Mistral with the following:

- [WizardMath](https://huggingface.co/WizardLM/WizardMath-7B-V1.1) for its math capabilities.
- [WizardCoder](https://huggingface.co/WizardLM/WizardCoder-Python-7B-V1.0) for its coding capabilities.
- Our own [Trinity](https://huggingface.co/jan-hq/trinity-v1.2) model for its versatility across general tasks.

This particular combination yielded the best tradeoff across mathematical & technical reasoning while retaining the most pre-merge performance on general tasks.

## DPO Finetuning

Merging different LLMs can lead to a mixed answering style because each model was originally trained on different types of data.

Thus, we applied Direct Preference Optimization ([DPO](https://arxiv.org/abs/2305.18290)) using the [Intel's Orca DPO pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) dataset, chosen for its helpful answering style in general, math and coding concentration.

This approach results in a final model - [Stealth 7B v1.2](https://huggingface.co/jan-hq/stealth-v1.2), with minimal loss, and realign to our technical preferences.

## Using Our Technical Documentation

With the base model ready, we started on our specific use case.

Jan is an open-source & bootstrapped project - at one point during our unanticipated growth, we received 1 customer support ticket per minute, with no one to handle customer service.

So, we directed our efforts toward training a model to answer user questions based on existing technical documentation.

Specifically, we trained it on Nitro [docs](https://nitro.jan.ai/docs). For context, Nitro is the default inference engine for Jan. It’s a serious server implementation of LlamaCPP, written in C++, with multimodal, queues, and other production-level server capabilities.

It made an interesting corpus because it was rife with post-2023 technical jargon, edge cases, and poor informational layout.

## Generating a Training Dataset for GPT-4

The first step was to transform Nitro’s unstructured format into a synthetic Q&A dataset designed for [instruction tuning](https://arxiv.org/pdf/2109.01652.pdf).

The text was split into chunks of 300-token segments with 30-token overlaps. This helped to avoid a [lost-in-the-middle](https://arxiv.org/abs/2307.03172) problem where LLM can’t use context efficiently to answer given questions.

The chunks were then given to GPT-4 with 8k context length to generate 3800 Q&A pairs. The [training dataset](https://huggingface.co/datasets/jan-hq/nitro_binarized_v2) is available on HuggingFace.

## Training

The training was done with supervised finetuning (SFT) from the [Hugging Face's alignment handbook](https://github.com/huggingface/alignment-handbook) based on the [Huggingface's Zephyr Beta](https://github.com/huggingface/alignment-handbook/tree/main/recipes/zephyr-7b-beta) guidelines.

We used consumer-grade, dual Nvidia RTX 4090s for the training. The end-to-end training took 18 minutes. We found optimal hyperparameters in LoRA for this specific task to be `r = 256` and `alpha = 512`.

This final model is publicly available at https://huggingface.co/jan-hq/nitro-v1.2-e3.

![Using LLM locally](assets/nitro-on-jan.png)

_Figure 3._ Using the new finetuned model in [Jan](https://jan.ai/).

## Improving Results With Rag

As an additional step, we also added [Retrieval Augmented Generation (RAG)](https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/) as an experiment parameter.

A simple RAG setup was done using **[Llamaindex](https://www.llamaindex.ai/)** and the **[bge-en-base-v1.5 embedding](https://huggingface.co/BAAI/bge-base-en-v1.5)** model for efficient documentation retrieval and question-answering. The RAG implementation is publicly available at at https://github.com/janhq/open-foundry/blob/main/rag-is-not-enough/rag/nitro_rag.ipynb

## Benchmarking the Results

We curated a new set of [50 multiple-choice questions](https://github.com/janhq/open-foundry/blob/main/rag-is-not-enough/rag/mcq_nitro.csv) (MCQ) based on the Nitro docs. The questions had varying levels of difficulty and had trick components that challenged the model's ability to discern misleading information.

![Opensource model outperforms GPT](assets/rag-comparasion.png)

_Figure 4._ Comparison between fine-tuned model and OpenAI's GPT.

_Table 1._ Result of Benchmarking Different Model With RAG.
| Approach | Performance |
| ----------------------------------------------------------------------------------- | ----------- |
| GPT-3.5 with RAG | 56.7% |
| GPT-4 with RAG | 64.3% |
| Merged 7B Model ([Stealth 7B](https://huggingface.co/jan-hq/stealth-v1.3)) with RAG | 47.7% |
| Finetuned 7B Model (Nitro 7B) with RAG | 57.8% |

This indicates that with task-specific training, we can improve an open-source, Small Language Model to the level of GPT-3.5 on domain knowledge.

Notably, the finetuned with RAG approach also demonstrated more consistency across benchmarking, as indicated by its lower standard deviation.

## Conclusion

We conclude that this combination of model merging finetuning and RAG yields promise. This finding is relevant for teams and individuals that need specialized, technical SLMs that need to run in resource-constrained or highly secured environments, where GPT may not be an option.

Anecdotally, we’ve had some success using this model in practice to onboard new team members to the Nitro codebase.

A full research report with more statistics can be found at https://github.com/janhq/open-foundry/blob/main/rag-is-not-enough/README.md.

## References

[1] Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le. Finetuned Language Models Are Zero-Shot Learners. _arXiv preprint arXiv:2109.01652_, 2021. URL: https://arxiv.org/abs/2109.01652

[2] Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang. WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct. _arXiv preprint arXiv:2308.09583_, 2023. URL: https://arxiv.org/abs/2308.09583

[3] Luo, Y., Yang, Z., Meng, F., Li, Y., Zhou, J., & Zhang, Y. An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning. _arXiv preprint arXiv:2308.08747_,2023 URL: https://arxiv.org/abs/2308.08747

[4] Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang. WizardCoder: Empowering Code Large Language Models with Evol-Instruct., _arXiv preprint arXiv:2306.08568_, 2023. URL: https://arxiv.org/abs/2306.08568

[5] SciPhi-AI, Agent Search. GitHub. URL: https://github.com/SciPhi-AI/agent-search

[6] Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang. "Lost in the Middle: How Language Models Use Long Contexts." _arXiv preprint arXiv:2307.03172_, 2023. URL: https://arxiv.org/abs/2307.03172

[7] Luo, H., Sun, Q., Xu, C., Zhao, P., Lou, J., Tao, C., Geng, X., Lin, Q., Chen, S., & Zhang, D. WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct. _arXiv preprint arXiv:2308.09583_, 2023. URL: https://arxiv.org/abs/2308.09583

[8] nlpxucan et al., WizardLM. GitHub. URL: https://github.com/nlpxucan/WizardLM
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading

0 comments on commit bedc795

Please sign in to comment.