Jockey is a conversational video agent designed for complex video workflows. It combines the following technologies:
- Twelve Labs Video Understanding Platform: Offers an API suite for integrating state-of-the-art (“SOTA”) video foundation models (VFMs) that understand contextual information from your videos. The platform works with video natively without the need for intermediary representations like pre-generated captions.
- Large Language Models (LLMs): Logically plan execution steps, interact with users, and pass video-related tasks to the Twelve Labs Video Understanding Platform. LLMs interpret natural language instructions and translate them into actionable tasks.
- LangGraph: Orchestrates the interaction between the Twelve Labs API suite and LLMs. LangGraph enables the creation of stateful, multi-step applications, allowing for complex video processing workflows.
Together, these components allow Jockey to perform accurate video operations based on natural language instructions.
NOTE: Jockey is currently in alpha development. It may be unstable or behave unexpectedly. Use caution when implementing Jockey in production environments.
- Intelligent task allocation: Jockey efficiently distributes workloads between LLMs for logical planning and user interaction, and VFMs for video understanding.
- Native video processing: Unlike systems that rely on pre-generated captions, Jockey works with video content directly, enabling more accurate and nuanced operations.
- Flexible architecture: Built on LangGraph, Jockey's modular design allows for easy customization and extension to suit specific use cases.
- Multiple deployment options: Supports both terminal-based deployment for quick testing and development, and API server deployment for integration into larger applications.
- Comprehensive video manipulation: Capable of tasks such as clip selection, video editing, and content analysis, all driven by natural language instructions.
Use cases include but are not limited to the following:
- Compiling and editing video clips
- Summarizing videos
- Generating chapters and highlights
- Searching for clips or videos using natural language queries
- Creating custom video compilations based on specific criteria
- Answering questions about video content
Ensure the following prerequisites are met before installing and using Jockey.
- Operating System: macOS
- CPU: M1 or newer
- RAM: 8GB minimum
- Git: Any recent version.
  - Installation instructions: Git Downloads.
  - Verify the installation: Run the `git --version` command.
- Python: Version 3.11 or higher.
  - Installation instructions: Python Releases for macOS.
  - Verify the installation: Run the `python3 --version` command.
- FFmpeg: Must be accessible in your system's `PATH` environment variable.
  - Installation instructions: Download FFmpeg and add it to the `PATH` environment variable.
  - Verify the installation: Run the `ffmpeg -version` command.
- Docker: Required for running the Jockey API server.
  - Installation instructions: Get Docker.
  - Verify the installation: Run the `docker --version` command.
- Docker Compose V2: Required for running the Jockey API server.
  - Installation instructions: Overview of installing Docker Compose.
  - Verify the installation: Run the `docker compose version` command. If you see a message similar to "docker: 'compose' is not a docker command," you may have V1. To update your Docker Compose version, see the Migrate to Compose V2 page of the official Docker documentation.
- Twelve Labs API Key
- LLM Provider API Key: Jockey supports Azure OpenAI and OpenAI. Retrieve the following based on your chosen provider:
  - For Azure: Azure OpenAI endpoint, API key, and API version. For instructions, see the Retrieve key and endpoint section of the official Azure OpenAI documentation.
  - For OpenAI: OpenAI API key. For instructions, see the Account setup section of the official OpenAI documentation.
- LangSmith API Key: To deploy the Jockey API server, you need a LangSmith API key. You can test LangGraph locally with the free developer plan.
  - Log in to LangGraph (https://www.langchain.com/langgraph) and click the settings icon at the bottom left.
  - Go to API Keys and click Create API Key at the top right.
  - Generate your key and save it under the `LANGSMITH_API_KEY` variable in your `.env` file.
- Familiarity with Python and basic command-line operations is recommended.
- Familiarity with LangGraph is recommended to use Jockey with the LangGraph API server.
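The command-line checks in the prerequisites list can also be run in one pass. The following is a minimal sketch, not part of Jockey itself: the helper names `missing_tools` and `python_is_supported` are hypothetical, and the script only checks that the tools are on `PATH` and that the interpreter version is high enough. It does not verify Docker Compose V2 or any API keys.

```python
import shutil
import sys

# Command-line tools named in the prerequisites list above.
REQUIRED_TOOLS = ["git", "ffmpeg", "docker"]
MIN_PYTHON = (3, 11)


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    if not python_is_supported():
        print("Python 3.11 or higher is required.")
    if not missing and python_is_supported():
        print("Basic prerequisite checks passed.")
```

Run the script with `python3` before installing Jockey. The `which` parameter exists only so the lookup can be replaced in tests.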
This section guides you through the process of installing Jockey on your system. Please ensure all the prerequisites are met before proceeding with the installation. If you encounter any issues, please refer to the Troubleshooting page or reach out on the Multimodal Minds Discord server for assistance.
Open a terminal, navigate to the directory where you want to install Jockey, and enter the following command:
git clone https://github.com/twelvelabs-io/tl-jockey.git
- Create a new virtual environment:
cd tl-jockey && python3 -m venv venv
- Activate your virtual environment:
source venv/bin/activate
- (Optional) Verify that your virtual environment is activated:
echo $VIRTUAL_ENV
The output should display the path to your virtual environment directory, as shown in the example below:
/Users/tl/jockey/tl-jockey/venv
This indicates that your virtual environment is activated. If you see an empty line, your virtual environment is not activated; activate it using the `source venv/bin/activate` command.
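The same check can be made from inside Python, which is useful in scripts that should refuse to run outside the virtual environment. This is a small sketch using the standard comparison of `sys.prefix` and `sys.base_prefix`; the function name `in_virtualenv` is hypothetical and not part of Jockey.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter is running inside a venv."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```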
Install the required Python packages:
pip3 install -r requirements.txt
Jockey uses environment variables for configuration and comes with an `example.env` file to help you get started.
- In the `tl-jockey` directory, copy the `example.env` file to a new file named `.env`.
- Open the newly created `.env` file in a text editor.
- Replace the placeholders with your actual values. See the tables below for details.
Common variables
Variable | Description | Example |
---|---|---|
LANGSMITH_API_KEY | Your LangGraph SDK API key. | lsv2_... |
TWELVE_LABS_API_KEY | Your Twelve Labs API key. | tlk_987654321 |
LLM_PROVIDER | The LLM provider you wish to use. Possible values are AZURE and OPENAI. | AZURE |
HOST_PUBLIC_DIR | Directory for storing rendered videos. | ./output |
HOST_VECTOR_DB_DIR | Directory for vector database storage. | ./vector_db |
LLM provider-specific variables
For Azure OpenAI:
Variable | Description | Example |
---|---|---|
AZURE_OPENAI_ENDPOINT | Your Azure OpenAI endpoint URL. | https://your-resource-name.openai.azure.com/ |
AZURE_OPENAI_API_KEY | Your Azure OpenAI API key. | 987654321 |
AZURE_OPENAI_API_VERSION | The API version you're using. | 2023-12-01-preview |
For OpenAI:
Variable | Description | Example |
---|---|---|
OPENAI_API_KEY | Your OpenAI API key. | 987654321 |
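Before starting Jockey, it can help to confirm that the `.env` file contains every variable listed in the tables above. The sketch below is an illustration only: `parse_env` and `missing_vars` are hypothetical helpers, the parser handles only plain `KEY=VALUE` lines, and treating all common variables as required is an assumption rather than documented Jockey behavior.

```python
from pathlib import Path

# Variable names taken from the tables above.
COMMON_VARS = [
    "LANGSMITH_API_KEY",
    "TWELVE_LABS_API_KEY",
    "LLM_PROVIDER",
    "HOST_PUBLIC_DIR",
    "HOST_VECTOR_DB_DIR",
]
PROVIDER_VARS = {
    "AZURE": [
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_API_VERSION",
    ],
    "OPENAI": ["OPENAI_API_KEY"],
}


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(values):
    """Return names of required variables that are absent or empty."""
    required = list(COMMON_VARS)
    provider = values.get("LLM_PROVIDER", "").upper()
    # Unknown providers add no extra requirements in this sketch.
    required += PROVIDER_VARS.get(provider, [])
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("Missing:", missing_vars(parse_env(env_path.read_text())) or "none")
    else:
        print(".env not found in the current directory")
```

Run the script from the `tl-jockey` directory. It does not check that `LLM_PROVIDER` holds one of the two supported values, and it does not validate the keys themselves.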
This section provides instructions on how to deploy and use Jockey. Note that Jockey supports the following deployment options:
- Terminal-based deployment: Ideal for quick testing, development work, and debugging.
- LangGraph API server deployment: Suitable for building and debugging end-to-end user applications.
This document covers the terminal-based deployment. If you're a developer looking to integrate Jockey into your application, see the Deploy and Use Jockey with the LangGraph API Server page.
The terminal deployment provides immediate feedback and allows for easy interaction with Jockey.
Terminal Example Jockey Video Walkthrough
- Activate your virtual environment:
source venv/bin/activate
- Run the following command:
python3 -m jockey terminal
- Jockey will initialize and display a startup message. Wait for the prompt indicating it's ready for input.
- Once Jockey is ready, you can start interacting with it using natural language commands.
Begin by providing an index ID in your initial prompt, as shown in the example below:
Use index 65f747a50db0463b8996bde2. I'm trying to create a funny video focusing on Gordon Ramsay. Can you find 3 clips of Gordon yelling at his chefs about scrambled eggs and then a final clip where Gordon bangs his head on a table. After you find all those clips, lets edit them together into one video.
Note that in some cases, such as summarizing videos or generating chapters and highlights, you must also provide a video ID.
You can continue the conversation by providing new instructions or asking questions, as shown in the following example:
This is awesome but the last clip is too long. Lets shorten the last clip where Gordon hits his head on the table by making it start one second later. Then combine all the clips into a single video again.
- When you've finished, exit terminal mode using the `Ctrl+C` keyboard shortcut.
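If you prepare prompts in a script before pasting them into the terminal, a small helper can ensure the index ID is always included. This is only a sketch: `build_initial_prompt` is a hypothetical function, and the "Use video ..." wording is an assumption, since Jockey interprets natural language rather than a fixed syntax.

```python
def build_initial_prompt(index_id, request, video_id=None):
    """Compose a first message that names the index and, optionally, a video."""
    parts = [f"Use index {index_id}."]
    if video_id:
        # Assumed phrasing; tasks such as summarization need a video ID.
        parts.append(f"Use video {video_id}.")
    parts.append(request.strip())
    return " ".join(parts)


if __name__ == "__main__":
    print(build_initial_prompt(
        "65f747a50db0463b8996bde2",
        "Find 3 clips of Gordon yelling at his chefs about scrambled eggs.",
    ))
```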
The terminal version of Jockey provides verbose output for debugging purposes:
- The outputs from all of the individual components are displayed.
- Tool calls and their results are also displayed.
To adjust the verbosity of the output, modify the `parse_langchain_events_terminal()` function in `jockey/util.py`.
Note that the tags for the individual components are set in app.py.
To integrate Jockey into your application, use an HTTP client library or the LangGraph Python SDK.
For a basic example of how to interact with Jockey programmatically, refer to the client.ipynb Jupyter notebook in the project repository. For more detailed information, see the LangGraph Examples page.