diff --git a/README.md b/README.md
index e92e742..4067f05 100755
--- a/README.md
+++ b/README.md
@@ -153,16 +153,88 @@ To launch RAG:
 # Inference
-⌛️ in progress..
+
+## Chat Inference
+- **📚 Dataset type** prepare your dataset in the `ChatDataset`, examples available [here](docs/dataset_example.md#-chat-dataset) format.
+- **📝 Configs Example**: [sft.json](configs/exp/train/sft/sft.json)
+- **🖥️ CLI launch command**
+```bash
+python -m turbo_alignment inference_chat --inference_settings_path configs/exp/train/sft/sft.json
+```
+
+
+## Classification Inference
+- **📚 Dataset type** prepare your dataset in the `ChatDataset`, examples available [here](docs/dataset_example.md#-chat-dataset) format.
+- **📝 Configs Example**: [classification_inference.json](configs/exp/inference/classification/classification_inference.json)
+- **🖥️ CLI launch command**
+```bash
+python -m turbo_alignment inference_classification --inference_settings_path configs/exp/train/sft/sft.json
+```
+
+
+## Multimodal Inference
+- **📚 Dataset type** prepare your dataset in the `ChatDataset`, examples available [here](docs/dataset_example.md#-chat-dataset) format.
+- **📝 Configs Example**: [mlp.json](configs/exp/inference/multimodal/mlp.json)
+- **🖥️ CLI launch command**
+```bash
+python -m turbo_alignment inference_multimodal --inference_settings_path configs/exp/train/sft/sft.json
+```
+
+## RAG Inference
+- **📚 Dataset type** prepare your dataset in the `ChatDataset`, examples available [here](docs/dataset_example.md#-chat-dataset) format.
+- **📝 Configs Example**: [rag_inference.json](configs/exp/inference/rag/rag_inference.json)
+- **🖥️ CLI launch command**
+```bash
+python -m turbo_alignment inference_rag --inference_settings_path configs/exp/train/sft/sft.json
+```
 # Sampling
-⌛️ in progress..
+
+## Random Sampling
+- **📚 Dataset type** prepare your dataset in the `SamplingRMDataset`, examples available [here](docs/dataset_example.md#-sampling-dataset) format.
+- **📝 Configs Example**: [random.json](tests/fixtures/configs/sampling/base.json)
+- **🖥️ CLI launch command**
+```bash
+python -m turbo_alignment random_sample --experiment_settings_path tests/fixtures/configs/sampling/base.json
+```
+
+
+## RSO Sampling
+- **📚 Dataset type** prepare your dataset in the `SamplingRMDataset`, examples available [here](docs/dataset_example.md#-sampling-dataset) format.
+- **📝 Configs Example**: [rso.json](tests/fixtures/configs/sampling/rso.json)
+- **🖥️ CLI launch command**
+```bash
+python -m turbo_alignment rso_sample --experiment_settings_path tests/fixtures/configs/sampling/rso.json
+```
+
+
+## Reward Model Sampling
+- **📚 Dataset type** prepare your dataset in the `ChatDataset`, examples available [here](docs/dataset_example.md#-sampling-dataset) format.
+- **📝 Configs Example**: [rm.json](tests/fixtures/configs/sampling/rm.json)
+- **🖥️ CLI launch command**
+```bash
+python -m turbo_alignment rm_sample --experiment_settings_path tests/fixtures/configs/sampling/rm.json
+```
 # Common
-⌛️ in progress..
+
+## Merge Adapters to base model
+- **📝 Configs Example**: [llama.json](configs/utils/merge_adapters_to_base/llama.json)
+- **🖥️ CLI launch command**
+```bash
+python -m turbo_alignment merge_adapters_to_base --settings_path configs/utils/merge_adapters_to_base/llama.json
+```
+
+
+## Preprocess Multimodal Dataset
+- **📝 Configs Example**: [coco2014_clip.json](configs/utils/preprocess/coco2014_clip.json)
+- **🖥️ CLI launch command**
+```bash
+python -m turbo_alignment preprocess_multimodal_dataset --settings_path configs/utils/preprocess/coco2014_clip.json
+```
diff --git a/configs/utils/convert_to_base/llama.json b/configs/utils/merge_adapters_to_base/llama.json
similarity index 100%
rename from configs/utils/convert_to_base/llama.json
rename to configs/utils/merge_adapters_to_base/llama.json
diff --git a/tutorials/multimodal/create_tutorial_dataset.py b/tutorials/multimodal/create_tutorial_dataset.py
index ee7582d..edb0c3e 100644
--- a/tutorials/multimodal/create_tutorial_dataset.py
+++ b/tutorials/multimodal/create_tutorial_dataset.py
@@ -1,6 +1,3 @@
-import json
-import random
-import subprocess
 from pathlib import Path
 from typing import Any
@@ -9,15 +6,13 @@ from turbo_alignment.common.data.io import write_jsonl
 from turbo_alignment.dataset.chat.models import ChatMessageRole
 from turbo_alignment.dataset.multimodal.models import (
-    MultimodalChatMessage,
     MultimodalDatasetRecord,
     MultimodalImageMessage,
     MultimodalTextMessage,
 )
-from turbo_alignment.settings.modality import Modality
-def convert_to_multimodal_record(row):
+def convert_to_multimodal_record(row: dict[str, Any]) -> MultimodalDatasetRecord:
     return MultimodalDatasetRecord(
         id=row['id'],
         messages=[