Blog Link: Easy-GPT4o - reproduce GPT-4o in less than 200 lines
Easy-GPT4o, open-source: reproduce GPT-4o's capabilities using older OpenAI APIs in less than 200 lines of code.
Why did I start this project? It is just a toy project and a simple demo, but it lets me test a few ideas:
- Developers can build their own GPT-4o using existing APIs. By leveraging available tools, developers can easily access the capabilities of advanced models.
- End-to-end models provide low latency but limited customization. This project explores the trade-off between latency and customization, highlighting the benefits and limitations of each approach.
- The combined power of multiple models can outperform a single multimodal model. This project demonstrates the effectiveness of a collaborative approach, leveraging the collective intelligence of various models to achieve superior results.
Requirements:
- Python 3.6 or higher
- OpenAI Python package (`openai`)
- FFmpeg (for audio extraction)
Installation:
1. Clone the repository: `git clone https://github.com/Chivier/easy-gpt4o`
2. Install the required Python packages: `pip install -r requirements.txt`
3. Download and install FFmpeg from the official website: https://ffmpeg.org/
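After installing, you can sanity-check the prerequisites with a short snippet (illustrative only, not part of the repository):

```python
import shutil
import sys

# The project needs Python 3.6+ and FFmpeg available on PATH
assert sys.version_info >= (3, 6), "Python 3.6 or higher is required"

ffmpeg = shutil.which("ffmpeg")
if ffmpeg is None:
    print("FFmpeg not found on PATH - install it from https://ffmpeg.org/")
else:
    print("FFmpeg found at", ffmpeg)
```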
Usage:

    # Set your OpenAI API key
    export OPENAI_API_KEY=xxxxxxx
    python main.py input_video.mp4 output_audio.mp3

Replace `input_video.mp4` with the path to your input video file, and `output_audio.mp3` with the desired path for the output audio file.
How it works:
- Extracts the audio track from the video file
- Transcribes the audio with the OpenAI Whisper API
- Generates image descriptions for key frames in the video with the OpenAI GPT-4 Turbo API
- Combines the audio transcription and image descriptions into a comprehensive response
- Converts the response to speech with the OpenAI TTS API
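The five steps above could be wired together roughly like this. This is a minimal sketch assuming the official `openai` Python client; the helper names, model choices, and prompt are illustrative guesses, and the real `main.py` may differ:

```python
# Sketch of the pipeline: extract audio -> transcribe -> describe frames
# -> combine -> synthesize speech. Helper names are illustrative.
import base64
import subprocess

def extract_audio(video_path, audio_path):
    # FFmpeg: -vn drops the video stream, -y overwrites an existing file
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", audio_path],
                   check=True)

def transcribe(client, audio_path):
    # Whisper API: speech-to-text on the extracted audio
    with open(audio_path, "rb") as f:
        return client.audio.transcriptions.create(model="whisper-1",
                                                  file=f).text

def describe_frame(client, jpeg_bytes):
    # GPT-4 Turbo with vision: describe one key frame, sent as base64
    b64 = base64.b64encode(jpeg_bytes).decode()
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Briefly describe this video frame."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content

def combine(transcript, frame_descriptions):
    # Merge the audio transcript and per-frame descriptions into one context
    visuals = "\n".join(f"- {d}" for d in frame_descriptions)
    return f"Transcript:\n{transcript}\n\nKey frames:\n{visuals}"

def synthesize(client, text, out_path):
    # TTS API: turn the combined response back into speech
    client.audio.speech.create(model="tts-1", voice="alloy",
                               input=text).write_to_file(out_path)
```

Chaining several API calls this way trades the low latency of a true end-to-end model for full control over each stage, which is exactly the trade-off this project sets out to demonstrate.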
Demo videos: `a.mp4`, `a1.mov`, `a2.mov`, `b.mp4`, `b.mov`
TODO:
- Replace the OpenAI APIs with open-source models
- Support streaming video processing
- Use RAG to store long-term memory