PhotoTranslator: Generate Creative Sentences in a Foreign Language from a Photo

Important

Created by Preternatural AI, an exhaustive client-side AI infrastructure for Swift.
This project and the frameworks it uses are currently in the alpha stage of development.

The PhotoTranslator app uses OpenAI's Vision API to bring translation into the user's surroundings. The user takes a photo, and an on-device YOLO model identifies the objects in it. The app then generates creative sentences in the target language about the picture as a whole and about each object, and uses the ElevenLabs API to produce audio of each sentence in that language, making language learning engaging and immersive.

Usage

Supported Platforms

macOS, iOS, iPadOS

To install and run the PhotoTranslator app:

Download and open the project
Add your OpenAI API Key in the LLMClientManager file:

// AIManagers/LLMClientManager
private static let client: any LLMRequestHandling = OpenAI.Client(
    apiKey: "YOUR_API_KEY"
)

You can get an OpenAI API key on the OpenAI developer website. Note that you have to set up billing and add a small amount of money for the API calls to work (this will cost you less than 1 dollar).

Add your ElevenLabs API Key in the TTSClientManager file:

// AIManagers/TTSClientManager
static let client = ElevenLabs.Client(apiKey: "YOUR_API_KEY")

ElevenLabs is a text-to-speech service that the PhotoTranslator app uses to generate audio of the translated sentence in the target language. You can get your ElevenLabs API key on the ElevenLabs website; it is located in your user profile.

Select the target language for translation. The app is currently set to Hindi.

// AIManagers/LLMClientManager
private static let targetLanguage = "Hindi"

Create a speaker for the target language in AIManagers/Speakers. The app is currently set to HindiSpeaker:

// AIManagers/Speakers
// change the speaker to your target language
// you can find the voice for your target language on the ElevenLabs website
struct HindiSpeaker: Speaker { 
    let speakerName: String = "Akshay"
    let elevenLabsVoiceID = "qO2mI1DuN2aagyvZHwwt"
}

Run the app on a physical device (iPhone, iPad, or Mac), since a camera is required to take a photo.
Take a photo and wait for the app to generate creative sentences about it in your target language, each with an English translation.
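
As a rough sketch of the target-language and speaker steps above, the snippet below switches the app to Spanish. The `Speaker` protocol is re-declared here only so the example compiles on its own; it is inferred from the HindiSpeaker example, and the project's real protocol may differ. `SpanishSpeaker`, `YOUR_SPEAKER_NAME`, and `YOUR_VOICE_ID` are placeholders, not real ElevenLabs values.

```swift
import Foundation

// Assumption: a stand-in for the project's `Speaker` protocol, inferred from
// the HindiSpeaker example. The real protocol may declare more requirements.
protocol Speaker {
    var speakerName: String { get }
    var elevenLabsVoiceID: String { get }
}

// Hypothetical speaker for Spanish. Replace both placeholder values with a
// voice chosen on the ElevenLabs website.
struct SpanishSpeaker: Speaker {
    let speakerName: String = "YOUR_SPEAKER_NAME"
    let elevenLabsVoiceID: String = "YOUR_VOICE_ID"
}

// Keep the language name and the speaker in sync when switching languages.
let targetLanguage = "Spanish"
let speaker: any Speaker = SpanishSpeaker()

print("Target language: \(targetLanguage), voice: \(speaker.speakerName)")
```

In the project itself, the language string belongs in AIManagers/LLMClientManager and the new speaker type in AIManagers/Speakers, as described above.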

Known bug: the photo is currently rotated 90 degrees on iPhone and iPad.

Key Concepts

The PhotoTranslator app was developed to demonstrate the following key concepts:

Using OpenAI's Vision API
Function calling to get structured data from LLMs
Integrating ElevenLabs Multilingual Audio generation
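
To illustrate the function-calling concept, here is a minimal sketch of how structured data returned by an LLM might be decoded in Swift. With function calling, the model returns its answer as a JSON string of function arguments, which can be decoded into `Codable` types. The type names, field names, and sample JSON below are all assumptions made for illustration; they are not taken from LLMClientManager, and obtaining the arguments string from the AI framework is not shown.

```swift
import Foundation

// Hypothetical shape of one generated sentence; field names are assumptions.
struct TranslatedSentence: Codable, Equatable {
    let sentence: String            // text in the target language
    let transliteration: String     // Latin-script rendering of the sentence
    let englishTranslation: String  // English meaning
}

// Hypothetical arguments of the function the model is asked to call.
struct PhotoDescription: Codable {
    let overall: TranslatedSentence     // sentence about the whole picture
    let objects: [TranslatedSentence]   // one sentence per detected object
}

// Decodes the JSON arguments string of a function call into typed values.
func decodePhotoDescription(from json: String) throws -> PhotoDescription {
    try JSONDecoder().decode(PhotoDescription.self, from: Data(json.utf8))
}

// Hand-written sample standing in for what a model might return.
let argumentsJSON = """
{
  "overall": {
    "sentence": "मेज़ पर एक कप है।",
    "transliteration": "Mez par ek cup hai.",
    "englishTranslation": "There is a cup on the table."
  },
  "objects": [
    {
      "sentence": "यह एक बिल्ली है।",
      "transliteration": "Yah ek billi hai.",
      "englishTranslation": "This is a cat."
    }
  ]
}
"""

do {
    let description = try decodePhotoDescription(from: argumentsJSON)
    print(description.overall.englishTranslation)
    print("Objects described: \(description.objects.count)")
} catch {
    print("Could not decode function-call arguments: \(error)")
}
```

Decoding into typed values means a malformed or incomplete model response fails at one well-defined point instead of surfacing later in the UI.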

Preternatural Frameworks

The following Preternatural Frameworks were used in this project:

AI: The definitive, open-source Swift framework for interfacing with generative AI.
Media: Media makes it stupid simple to work with media capture & playback in Swift.

Technical Specifications

The PhotoTranslator app combines several AI models and services in the following steps:

The user captures a photo
The photo is analyzed by the YOLOv8 on-device model, which detects and identifies individual objects within the image. Each object is highlighted with uniquely colored, numbered boxes. See PhotoObjectDetectionManager for the implementation.
The processed photo is sent to OpenAI using the completion API with function calling. This step generates creative sentences in the app's target language about the picture as a whole and about each individual object identified in it. A transliteration and an English translation are also provided for each sentence. See LLMClientManager for the implementation.
Finally, the translated text is converted into spoken audio using ElevenLabs' voice synthesis technology, so the user can learn how to say the sentence in the app's target foreign language. See TTSClientManager for implementation.

As a result, the PhotoTranslator app exemplifies the effective integration of diverse AI technologies to create a comprehensive and interactive language learning tool.
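
The steps above can be sketched as a small pipeline with one protocol per stage. Every name below is a hypothetical stand-in rather than one of the project's actual types (PhotoObjectDetectionManager, LLMClientManager, and TTSClientManager are not reproduced here), and the stub implementations return canned data instead of calling YOLOv8, OpenAI, or ElevenLabs.

```swift
import Foundation

// All names in this sketch are hypothetical, not the project's real types.
struct DetectedObject {
    let index: Int      // number drawn on the object's bounding box
    let label: String   // class name reported by the detector
}

struct GeneratedSentence {
    let text: String
    let transliteration: String
    let english: String
}

struct SpokenSentence {
    let sentence: GeneratedSentence
    let audio: Data
}

protocol ObjectDetecting {
    func detectObjects(in photo: Data) throws -> [DetectedObject]
}

protocol SentenceGenerating {
    func sentences(for photo: Data, objects: [DetectedObject],
                   language: String) throws -> [GeneratedSentence]
}

protocol SpeechSynthesizing {
    func audio(for text: String) throws -> Data
}

struct PhotoTranslationPipeline {
    let detector: any ObjectDetecting
    let generator: any SentenceGenerating
    let synthesizer: any SpeechSynthesizing
    let language: String

    // Detect objects, generate sentences, then synthesize audio for each one.
    func run(photo: Data) throws -> [SpokenSentence] {
        let objects = try detector.detectObjects(in: photo)
        let sentences = try generator.sentences(for: photo, objects: objects,
                                                language: language)
        return try sentences.map { sentence in
            SpokenSentence(sentence: sentence,
                           audio: try synthesizer.audio(for: sentence.text))
        }
    }
}

// Stubs that return canned data so the sketch runs without any network calls.
struct StubDetector: ObjectDetecting {
    func detectObjects(in photo: Data) throws -> [DetectedObject] {
        [DetectedObject(index: 1, label: "cup")]
    }
}

struct StubGenerator: SentenceGenerating {
    func sentences(for photo: Data, objects: [DetectedObject],
                   language: String) throws -> [GeneratedSentence] {
        // One sentence for the whole picture, then one per detected object.
        let overall = GeneratedSentence(text: "[\(language)] whole picture",
                                        transliteration: "", english: "")
        let perObject = objects.map {
            GeneratedSentence(text: "[\(language)] object \($0.index): \($0.label)",
                              transliteration: "", english: "")
        }
        return [overall] + perObject
    }
}

struct StubSynthesizer: SpeechSynthesizing {
    func audio(for text: String) throws -> Data { Data(text.utf8) }
}

let pipeline = PhotoTranslationPipeline(detector: StubDetector(),
                                        generator: StubGenerator(),
                                        synthesizer: StubSynthesizer(),
                                        language: "Hindi")

if let spoken = try? pipeline.run(photo: Data()) {
    for item in spoken {
        print(item.sentence.text, "-", item.audio.count, "bytes of audio")
    }
}
```

Keeping each stage behind its own protocol is one possible design: it lets the detector, the LLM client, and the text-to-speech client be replaced or stubbed independently, for example in tests.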

License

This package is licensed under the MIT License.
