In an age where video content dominates our digital interactions, finding key information within hours of footage can feel like searching for a needle in a haystack. VideoWise transforms the way you manage and interact with video content by making it searchable, interactive, and insightful. Whether you're navigating a training session, analyzing a lecture, or creating engaging content, VideoWise makes working with videos more efficient and effective.
At its core, VideoWise provides a web application for uploading videos, which are then transcribed using WhisperX, a fast and accurate tool built on OpenAI's Whisper model. Each sentence is tied to a precise timestamp, enabling effortless navigation through hours of content without the frustration of scrubbing timelines.
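To illustrate the transcription flow, here is a minimal sketch of how WhisperX produces sentence-level timestamps, following the public `whisperx` Python API. This is not VideoWise's internal code; the input file name and parameters are placeholders.

```python
import whisperx

device = "cuda"  # or "cpu" if no NVIDIA GPU is available
model = whisperx.load_model("large-v2", device, compute_type="float16")

audio = whisperx.load_audio("lecture.mp4")       # placeholder input file
result = model.transcribe(audio, batch_size=16)  # raw segments with coarse timestamps

# Forced alignment refines each segment's start/end times
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)

for seg in aligned["segments"]:
    print(f"[{seg['start']:7.2f}s -> {seg['end']:7.2f}s] {seg['text'].strip()}")
```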
Going beyond transcription, VideoWise integrates with Ollama, enabling users to interact with an AI assistant to ask questions about the video, generate summaries, or even create quizzes and documentation. Export options let users save the AI-powered chats in various formats or download the transcribed video with subtitles applied.
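As a rough sketch of what such a chat request looks like, the snippet below posts a transcript plus a user question to Ollama's standard `/api/chat` endpoint. The `num_ctx`/`num_predict` options are standard Ollama API fields; injecting the transcript as a system message mirrors the "Inject Video Context" option described later, but the exact prompt format here is an assumption.

```python
import requests

OLLAMA_API_URL = "http://127.0.0.1:11434/api/chat"  # matches the default .env example
transcript = "[00:00:01] Welcome to the course..."  # placeholder transcript text

response = requests.post(OLLAMA_API_URL, json={
    "model": "llama3.1:latest",
    "stream": False,
    "options": {"num_ctx": 16000, "num_predict": 4000},  # context / max response length
    "messages": [
        {"role": "system", "content": f"Answer questions about this video transcript:\n{transcript}"},
        {"role": "user", "content": "Summarize the main topics covered."},
    ],
})
print(response.json()["message"]["content"])
```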
- 📤 Seamless Video Uploads: Quickly upload your videos to get started.
- 🎙️ Accurate Transcription and Translation: WhisperX ensures high-quality transcriptions in multiple languages (en, fr, de, es, it, ja, zh, nl, uk, pt).
- ⏱️ Timestamped Navigation: Automatically associate transcribed sentences with their timestamps, enabling effortless navigation through video content.
- 🤖 AI-Powered Interactions: Communicate with an AI using Ollama to ask questions about the transcribed video, generate summaries, or create quizzes.
- 📦 Flexible Export Options: Export AI-powered chats in various formats or download the transcribed video with subtitles applied.
To run VideoWise, you will need:
- Ollama (tested with v0.3.12 and v0.4.5)
- Docker Desktop (tested with v24.0.2) or Docker Engine (tested with v24.0.7)
- [OPTIONAL] NVIDIA GPU with drivers installed, for optimal transcription performance
VideoWise offers two different installation methods:
The first method, Simple Setup, is ideal for users who want a fast and simple installation: it runs the entire application on a single machine.
- Install the required dependencies (Ollama and Docker)
- Configure the Ollama API URL in the `.env` file: substitute `<your_machine_ip>` in `OLLAMA_API_URL` with the actual IP of the machine running Ollama, for example `OLLAMA_API_URL=http://127.0.0.1:11434/api/chat` (a quick connectivity check is sketched after these steps).
- [OPTIONAL] Modify the WhisperX and Ollama configuration in the `.env` file:
  ```
  # WhisperX Configuration
  WHISPER_MODEL="large-v2"        # Whisper transcription model; available models are {tiny, base, small, medium, large-v2, large-v3}

  # Ollama Configuration
  OLLAMA_MODEL="llama3.1:latest"  # Ollama chat model
  OLLAMA_CTX_LEN=16000            # Ollama model context length
  OLLAMA_MAX_PRED_LEN=4000        # Ollama model max response length
  ```
- Run Docker Compose to build and start the application. Include `--profile gpu` if your machine has an NVIDIA GPU:
  ```
  docker-compose --profile gpu up --build
  ```
At the end of the process, you'll be able to access the application on port 80 (e.g., `http://localhost:80`).
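Before starting the stack, it can help to verify that the URL you put in `OLLAMA_API_URL` is actually reachable. The sketch below probes Ollama's standard `/api/tags` endpoint (which lists installed models); the example URL is the assumed default from above.

```python
import requests

# Derive the base URL from the value configured in .env (assumed example value)
ollama_api_url = "http://127.0.0.1:11434/api/chat"
base_url = ollama_api_url.rsplit("/api/", 1)[0]

try:
    tags = requests.get(f"{base_url}/api/tags", timeout=5).json()
    print("Ollama reachable; installed models:", [m["name"] for m in tags.get("models", [])])
except requests.RequestException as exc:
    print("Ollama is not reachable:", exc)
```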
The modular setup allows more flexibility and is ideal for separating services onto different machines (e.g., running the WhisperX transcription service on a GPU-equipped system). This setup requires manual configuration of each service; a reachability probe covering all modules is sketched after the deployment steps below.
- Application Modules
- 🌐 Main Service: Acts as a central hub for all communication between modules.
- 🖥️ Web UI Client: The front-end interface for the application.
- 📁 FileSystem Service: Manages uploaded/generated files and handles video streaming.
- 🐍 Python Service: Interfaces with WhisperX for transcription and performs file conversions (HTML to PDF/DOCX).
- 🗄️ Database Service: PostgreSQL instance storing non-file data (e.g., chats, users).
- Steps
- Install Docker on every machine where a service will run, and Ollama on the machine that will provide the AI chat functionality.
- Deploy the Database Service
  - Navigate to the Database service directory and run the provided script:
    ```
    cd videowise-db
    ./start_db.sh
    ```
- Deploy the FileSystem Service
  - Navigate to the FileSystem service directory and build the Docker image:
    ```
    cd videowise-filesystem-service
    docker build -t videowise-filesystem-service .
    ```
  - Run the service, mounting the `uploads` directory for persistent storage:
    ```
    docker run -d \
      --name videowise-filesystem-service \
      -v $(pwd)/videowise-filesystem-service/uploads:/app/uploads \
      -p 8081:8081 \
      videowise-filesystem-service
    ```
- Configure and Deploy the Python Service
  - Set the required environment variables for the Python service: edit `/videowise-python-service/Dockerfile` and uncomment the following lines (23-24):
    ```
    ENV FILESYSTEM_API_URL="http://<your_fs_service_ip>:8081"
    ENV WHISPER_MODEL="large-v2"
    ```
    Replace `<your_fs_service_ip>` with the IP address of the machine running the FileSystem service.
  - Build the Docker image:
    ```
    cd videowise-python-service
    docker build -t videowise-python-service .
    ```
  - Run the Python service (see the GPU probe after this list). For machines with a GPU:
    ```
    docker run -d \
      --name videowise-python-service \
      --gpus all \
      -p 8000:8000 \
      videowise-python-service
    ```
    For machines without a GPU:
    ```
    docker run -d \
      --name videowise-python-service \
      -p 8000:8000 \
      videowise-python-service
    ```
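If you used the GPU variant, a quick way to confirm the container actually sees the GPU is a one-line PyTorch probe run inside it. This assumes the image ships with `torch` (a WhisperX dependency); the container name matches the `docker run` command above.

```python
# Run inside the container, e.g.:
#   docker exec videowise-python-service python -c "import torch; print(torch.cuda.is_available())"
import torch

print("CUDA available:", torch.cuda.is_available())  # True if the GPU is visible
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 3090"
```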
- Configure and Deploy the Main Service
  - Set the required environment variables for the Main Service: edit `/videowise-main-service/Dockerfile` and uncomment the lines under:
    ```
    # --- External Services ---
    ENV OLLAMA_API_URL="http://<your_ollama_ip>:11434/api/chat"
    ENV QUARKUS_DATASOURCE_JDBC_URL="jdbc:postgresql://<your_db_ip>:5432/video_transcriptions_db"
    ENV FILESYSTEM_API_URL="http://<your_fs_service_ip>:8081"
    ENV FILESYSTEM_STREAMING_API_URL="http://<your_fs_service_ip>:8081"
    ENV WHISPER_API_URL="http://<your_python_service_ip>:8000"

    # --- WhisperX config ---
    ENV WHISPER_MODEL="large-v2"

    # --- Ollama config ---
    ENV OLLAMA_MODEL="llama3.1:latest"
    ENV OLLAMA_CTX_LEN=16000
    ENV OLLAMA_MAX_PRED_LEN=4000
    ```
    Replace the placeholders (`<...>`) with the corresponding service IP addresses.
  - Build the Docker image:
    ```
    cd videowise-main-service
    docker build -t videowise-main-service .
    ```
  - Run the Main service:
    ```
    docker run -d \
      --name videowise-main-service \
      -p 8080:8080 \
      videowise-main-service
    ```
- Deploy the Web UI Client
  - Set the environment variable for the Web UI Client: edit `/videowise-ui-client/Dockerfile` and uncomment the line:
    ```
    ENV MAIN_SERVICE_URL="http://<your_main_service_ip>:8080"
    ```
    Replace `<your_main_service_ip>` with the IP address of the machine running the Main Service.
  - Build the Web UI Client:
    ```
    cd videowise-ui-client
    docker build -t videowise-ui-client .
    ```
  - Run the Web UI Client:
    ```
    docker run -d --name videowise-ui-client -p 80:80 videowise-ui-client
    ```
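Once every module is up, a simple TCP probe can confirm that each host/port pair is reachable before you open the UI. The host placeholders below mirror the ones used in the Dockerfiles (`<your_ui_client_ip>` is an additional placeholder introduced here); the ports come from the `docker run` commands above. The services' HTTP health endpoints aren't documented, so this only checks that the ports accept connections.

```python
import socket

# Placeholder hosts; substitute the IPs used in your Dockerfiles
services = {
    "Database (PostgreSQL)": ("<your_db_ip>", 5432),
    "FileSystem Service":    ("<your_fs_service_ip>", 8081),
    "Python Service":        ("<your_python_service_ip>", 8000),
    "Main Service":          ("<your_main_service_ip>", 8080),
    "Web UI Client":         ("<your_ui_client_ip>", 80),
}

for name, (host, port) in services.items():
    try:
        with socket.create_connection((host, port), timeout=3):
            print(f"OK      {name} at {host}:{port}")
    except OSError as exc:
        print(f"FAILED  {name} at {host}:{port} ({exc})")
```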
- 💬 Create a new Chat: Begin by clicking on the New Chat button.
- 📥 Upload a Video: Drag and drop your video file onto the right side of the interface.
- ⏳ Wait for Transcription: Allow the system to process and transcribe the video content.
- 🤖 Interact with AI: Check the "Inject Video Context" option to provide the AI with video context, then ask questions, generate summaries, or even create quizzes.
- 💾 Export Options: Export the transcribed video with embedded subtitles or save the AI chat content in PDF, Word, or TXT format.
- By default, the Python Service employs the `large-v2` model for video transcription. You can change this setting in the `.env` file (Simple Setup) or in `videowise-main-service/Dockerfile` (Modular Setup):
  ```
  ENV WHISPER_MODEL="large-v2"
  ```
- To optimize memory usage, the server automatically clears models from memory after 5 minutes of inactivity. You can adjust the timeout or disable it entirely in `videowise-python-service/main.py`:
  ```python
  # videowise-python-service/main.py
  whisperx_manager = WhisperXModelManager(
      model_name=model_name,
      device="cuda" if torch.cuda.is_available() else "cpu",
      timeout=300,       # Release timeout, in seconds
      auto_release=True  # True: release resources after the timeout when unused; False: disable auto-release
  )
  ```
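For readers curious how such an auto-release manager can work, here is a minimal sketch of the idle-timeout pattern using `threading.Timer`. It illustrates the general technique only, not VideoWise's actual `WhisperXModelManager` implementation; the loader callback is a placeholder.

```python
import threading

class IdleReleaseManager:
    """Loads a resource lazily and frees it after `timeout` seconds of inactivity."""

    def __init__(self, loader, timeout=300, auto_release=True):
        self.loader = loader        # callable that builds the expensive resource (placeholder)
        self.timeout = timeout
        self.auto_release = auto_release
        self._resource = None
        self._timer = None
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            if self._resource is None:
                self._resource = self.loader()  # load on first use
            self._reschedule()
            return self._resource

    def _reschedule(self):
        if not self.auto_release:
            return
        if self._timer is not None:
            self._timer.cancel()                # every use resets the countdown
        self._timer = threading.Timer(self.timeout, self._release)
        self._timer.daemon = True
        self._timer.start()

    def _release(self):
        with self._lock:
            self._resource = None               # drop the reference so memory can be reclaimed
```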
- The default Large Language Model (LLM) for AI interactions is `"llama3.1:latest"`. You can change this setting in the `.env` file (Simple Setup) or in `videowise-main-service/Dockerfile` (Modular Setup):
  ```
  ENV OLLAMA_MODEL="llama3.1:latest"
  ```
  The maximum response length is capped at 4,000 tokens and the context length is set to 16,000 tokens. These can also be adjusted:
  ```
  ENV OLLAMA_CTX_LEN=16000
  ENV OLLAMA_MAX_PRED_LEN=4000
  ```
- Both the Database and FileSystem services store their data on externally mounted volumes, which simplifies moving either service to another machine:
  - For the FileSystem Service, copy the contents of `videowise-filesystem-service/uploads` into the same folder on the new machine.
  - For the Database Service, copy the contents of `videowise-db/db_data` into the same folder on the new machine.
- Speaker diarization (identifying speakers in audio) is currently not supported.
- Each chat is limited to a single video as context.
- The system integrates exclusively with Ollama and does not yet support other AI models such as ChatGPT, Gemini, or Claude.
- Audio file uploads are not supported.
- Minor bugs may occur as VideoWise is actively under development.
- Enable multi-video uploads within a single chat.
- Add support for popular AI models (e.g., ChatGPT, Gemini, Claude).
- Implement speaker diarization.
- Add functionality to upload videos directly from YouTube URLs.
- Introduce voice chat for real-time interaction with AI.
- Support direct uploads of audio files.
- Expand question presets for improved AI interactions.