feat: Add FAQ Section to README (#263)
ishaansehgal99 authored Feb 28, 2024
1 parent 7bd5641 commit 42a5965

Kaito follows the classic Kubernetes Custom Resource Definition (CRD)/controller design pattern. Users manage a `workspace` custom resource which describes the GPU requirements and the inference specification. Kaito controllers automate the deployment by reconciling the `workspace` custom resource.
<div align="left">
<img src="docs/img/arch.png" width=80% title="Kaito architecture" alt="Kaito architecture">
</div>

The above figure presents the Kaito architecture overview. Its major components consist of:
The detailed usage for Kaito supported models can be found in [**HERE**](presets

The number of supported models in Kaito is growing! Please check [this](./docs/How-to-add-new-models.md) document to see how to add a new supported model.

## FAQ

### How to upgrade the existing deployment to use the latest model configuration?

When using hosted public models, a user can manually delete the existing inference workload (`Deployment` or `StatefulSet`), and the workspace controller will create a new one with the latest preset configuration (e.g., the image version) defined in the current release. For private models, it is recommended to create a new workspace with a new image version in the Spec.
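
For example, assuming the workload created for a `falcon-7b-instruct` workspace is named `workspace-falcon-7b-instruct` (names may differ in your cluster), a minimal sketch:

```
# Delete the existing inference workload; the workspace controller
# reconciles the workspace and recreates the workload with the
# latest preset configuration.
kubectl delete deployment workspace-falcon-7b-instruct

# Watch for the replacement workload to come back up.
kubectl get deployment workspace-falcon-7b-instruct -w
```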

### How to update model/inference parameters to override the Kaito Preset Configuration?

To update model or inference parameters for a deployed service, perform a `kubectl edit` on the workload type, which could be either a `StatefulSet` or `Deployment`.
For example, to enable 4-bit quantization on a `falcon-7b-instruct` deployment, you would execute:

```
kubectl edit deployment workspace-falcon-7b-instruct
```

Within the deployment configuration, locate the command section and modify it as follows:

Original command:
```
accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all inference_api.py --pipeline text-generation --torch_dtype bfloat16
```
Modified command to enable 4-bit quantization:
```
accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all inference_api.py --pipeline text-generation --torch_dtype bfloat16 --load_in_4bit
```
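
Saving the edit triggers a rolling update of the inference pods. Assuming the deployment name used in the example above, the rollout can be watched with:

```
kubectl rollout status deployment/workspace-falcon-7b-instruct
```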

For a comprehensive list of inference parameters for the text-generation models, refer to the following options:
- `pipeline`: The model pipeline for the pre-trained model. For text-generation models this can be either `text-generation` or `conversational`.
- `pretrained_model_name_or_path`: Path to the pretrained model or model identifier from huggingface.co/models.
- Additional parameters such as `state_dict`, `cache_dir`, `from_tf`, `force_download`, `resume_download`, `proxies`, `output_loading_info`, `allow_remote_files`, `revision`, `trust_remote_code`, `load_in_4bit`, `load_in_8bit`, `torch_dtype`, and `device_map` can also be customized as needed.

If you need a parameter that is not documented here, please file an issue so it can be considered for future inclusion.
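
To illustrate how boolean flags such as `--load_in_4bit` typically translate into model-loading keyword arguments, here is a small self-contained Python sketch. The parser below is a hypothetical re-creation for illustration only, not Kaito's actual `inference_api.py`:

```python
import argparse

# Hypothetical subset of the inference flags shown above; the real
# inference_api.py parser may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--pipeline", choices=["text-generation", "conversational"],
                    default="text-generation")
parser.add_argument("--torch_dtype", default="bfloat16")
parser.add_argument("--load_in_4bit", action="store_true")
parser.add_argument("--load_in_8bit", action="store_true")

args = parser.parse_args(
    ["--pipeline", "text-generation", "--torch_dtype", "bfloat16", "--load_in_4bit"]
)

# Boolean flags become keyword arguments passed to the model loader.
model_kwargs = {
    "torch_dtype": args.torch_dtype,
    "load_in_4bit": args.load_in_4bit,
    "load_in_8bit": args.load_in_8bit,
}
print(model_kwargs)
# {'torch_dtype': 'bfloat16', 'load_in_4bit': True, 'load_in_8bit': False}
```

An unset `store_true` flag simply defaults to `False`, which is why omitting `--load_in_4bit` leaves quantization disabled.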

### What is the difference between instruct and non-instruct models?
The main distinction lies in their intended use cases. Instruct models are fine-tuned versions optimized
for interactive chat applications. They are typically the preferred choice for most implementations due to their enhanced performance in
conversational contexts.

On the other hand, non-instruct, or raw models, are designed for further fine-tuning. Future developments in Kaito may include features that allow users to
apply fine-tuned weights to these raw models.

## Contributing

[Read more](docs/contributing/readme.md)