feat: Add FAQ Section to README (#263)
ishaansehgal99 authored Feb 28, 2024
1 parent 7bd5641 commit 42a5965

Kaito follows the classic Kubernetes Custom Resource Definition (CRD)/controller design pattern. Users manage a `workspace` custom resource which describes the GPU requirements and the inference specification. Kaito controllers automate the deployment by reconciling the `workspace` custom resource.
<div align="left">
<img src="docs/img/arch.png" width=80% title="Kaito architecture" alt="Kaito architecture">
</div>

The above figure presents the Kaito architecture overview. Its major components consist of:
The detailed usage for Kaito supported models can be found in [**HERE**](presets

The number of supported models in Kaito is growing! Please check [this](./docs/How-to-add-new-models.md) document to see how to add a new supported model.

## FAQ

### How to upgrade the existing deployment to use the latest model configuration?

When using hosted public models, a user can manually delete the existing inference workload (`Deployment` or `StatefulSet`), and the workspace controller will create a new one with the latest preset configuration (e.g., the image version) defined in the current release. For private models, it is recommended to create a new workspace with a new image version in the Spec.
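
For example, assuming the workload created for a `falcon-7b-instruct` workspace is named `workspace-falcon-7b-instruct` (names may differ in your cluster), a minimal sketch:

```
# Delete the existing inference workload; the workspace controller
# reconciles the workspace and recreates the workload with the
# latest preset configuration.
kubectl delete deployment workspace-falcon-7b-instruct

# Watch for the replacement workload to come back up.
kubectl get deployment workspace-falcon-7b-instruct -w
```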

### How to update model/inference parameters to override the Kaito Preset Configuration?

To update model or inference parameters for a deployed service, perform a `kubectl edit` on the workload type, which could be either a `StatefulSet` or `Deployment`.
For example, to enable 4-bit quantization on a `falcon-7b-instruct` deployment, you would execute:

```
kubectl edit deployment workspace-falcon-7b-instruct
```

Within the deployment configuration, locate the command section and modify it as follows:

Original command:
```
accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all inference_api.py --pipeline text-generation --torch_dtype bfloat16
```
Modified command to enable 4-bit quantization:
```
accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all inference_api.py --pipeline text-generation --torch_dtype bfloat16 --load_in_4bit
```
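
Saving the edit triggers a rolling update of the inference pods. Assuming the deployment name used in the example above, the rollout can be watched with:

```
kubectl rollout status deployment/workspace-falcon-7b-instruct
```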

For a comprehensive list of inference parameters for the text-generation models, refer to the following options:
- `pipeline`: The model pipeline for the pre-trained model. For text-generation models this can be either `text-generation` or `conversational`.
- `pretrained_model_name_or_path`: Path to the pretrained model or model identifier from huggingface.co/models.
- Additional parameters such as `state_dict`, `cache_dir`, `from_tf`, `force_download`, `resume_download`, `proxies`, `output_loading_info`, `allow_remote_files`, `revision`, `trust_remote_code`, `load_in_4bit`, `load_in_8bit`, `torch_dtype`, and `device_map` can also be customized as needed.

If you need a parameter that is not documented here, please file an issue so it can be considered for future inclusion.
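
To illustrate how boolean flags such as `--load_in_4bit` typically translate into model-loading keyword arguments, here is a small self-contained Python sketch. The parser below is a hypothetical re-creation for illustration only, not Kaito's actual `inference_api.py`:

```python
import argparse

# Hypothetical subset of the inference flags shown above; the real
# inference_api.py parser may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--pipeline", choices=["text-generation", "conversational"],
                    default="text-generation")
parser.add_argument("--torch_dtype", default="bfloat16")
parser.add_argument("--load_in_4bit", action="store_true")
parser.add_argument("--load_in_8bit", action="store_true")

args = parser.parse_args(
    ["--pipeline", "text-generation", "--torch_dtype", "bfloat16", "--load_in_4bit"]
)

# Boolean flags become keyword arguments passed to the model loader.
model_kwargs = {
    "torch_dtype": args.torch_dtype,
    "load_in_4bit": args.load_in_4bit,
    "load_in_8bit": args.load_in_8bit,
}
print(model_kwargs)
# {'torch_dtype': 'bfloat16', 'load_in_4bit': True, 'load_in_8bit': False}
```

An unset `store_true` flag simply defaults to `False`, which is why omitting `--load_in_4bit` leaves quantization disabled.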

### What is the difference between instruct and non-instruct models?
The main distinction lies in their intended use cases. Instruct models are fine-tuned versions optimized
for interactive chat applications. They are typically the preferred choice for most implementations due to their enhanced performance in
conversational contexts.

On the other hand, non-instruct, or raw models, are designed for further fine-tuning. Future developments in Kaito may include features that allow users to
apply fine-tuned weights to these raw models.

## Contributing

[Read more](docs/contributing/readme.md)