JavaScript & Swift SDK #81

ashvardanian · 2024-04-19T05:01:12Z

How many AI models can run on-device out of the box? UForm multimodal embeddings can 🥳

Model	Parameters	Languages	Architecture
`uform3-image-text-english-large` 🆕	365M	1	6 text layers, ViT-L/14, 6 multimodal layers
`uform3-image-text-english-base`	143M	1	2 text layers, ViT-B/16, 2 multimodal layers
`uform3-image-text-english-small` 🆕	79M	1	2 text layers, ViT-S/16, 2 multimodal layers
`uform3-image-text-multilingual-base`	206M	21	8 text layers, ViT-B/16, 4 multimodal layers
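
For a rough sense of what "on-device" means in terms of size, the parameter counts in the table can be turned into an approximate weight footprint. This is a back-of-the-envelope sketch, not a measurement: it assumes every parameter is stored at one precision (4 bytes for float32, 2 bytes for float16) and ignores activations, tokenizer files, and runtime overhead. The helper name `approxWeightMegabytes` is made up for illustration and is not part of any UForm SDK.

```javascript
// Parameter counts (in millions) copied from the table above.
const models = {
    'uform3-image-text-english-large': 365,
    'uform3-image-text-english-base': 143,
    'uform3-image-text-english-small': 79,
    'uform3-image-text-multilingual-base': 206,
};

// Hypothetical helper: approximate weight size in megabytes (10^6 bytes),
// assuming every parameter is stored with the same number of bytes.
// Millions of parameters times bytes per parameter gives millions of bytes.
function approxWeightMegabytes(millionsOfParameters, bytesPerParameter) {
    return millionsOfParameters * bytesPerParameter;
}

for (const [name, millions] of Object.entries(models)) {
    const f32 = approxWeightMegabytes(millions, 4);
    const f16 = approxWeightMegabytes(millions, 2);
    console.log(`${name}: ~${f32} MB in float32, ~${f16} MB in float16`);
}
```

Actual file sizes depend on the export format and on any quantization applied, so these figures are only an order-of-magnitude guide.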

JavaScript

Load the models and preprocessors for different modalities:

import assert from 'assert'; // Node.js built-in, used by the checks below
import { getModel, Modality } from 'uform';
import { TextProcessor, TextEncoder, ImageEncoder, ImageProcessor } from 'uform';

const { configPath, modalityPaths, tokenizerPath } = await getModel({
    modelId: 'unum-cloud/uform3-image-text-english-small',
    modalities: [Modality.TextEncoder, Modality.ImageEncoder],
});

Embed images:

const imageProcessor = new ImageProcessor(configPath);
await imageProcessor.init();
const processedImages = await imageProcessor.process("path/to/image.png");

const imageEncoder = new ImageEncoder(modalityPaths.image_encoder, imageProcessor);
await imageEncoder.init();
const imageOutput = await imageEncoder.encode(processedImages);
assert(imageOutput.embeddings.dims.length === 2, "Output should be 2D");

Embed queries:

const textProcessor = new TextProcessor(configPath, tokenizerPath);
await textProcessor.init();
const processedTexts = await textProcessor.process("a small red panda in a zoo");

const textEncoder = new TextEncoder(modalityPaths.text_encoder, textProcessor);
await textEncoder.init();
const textOutput = await textEncoder.encode(processedTexts);
assert(textOutput.embeddings.dims.length === 2, "Output should be 2D");
await textEncoder.dispose();
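
Once both embeddings are available, a common way to compare an image with a query is cosine similarity. The following is a minimal sketch, assuming the embedding values have already been pulled out of `imageOutput.embeddings` and `textOutput.embeddings` into flat numeric arrays of equal length (how to do that depends on the backend tensor type and is not shown here). `cosineSimilarity` and `rankBySimilarity` are illustrative helpers, not part of the `uform` package, and the vectors are toy values rather than real model outputs.

```javascript
// Illustrative helper, not part of uform: cosine similarity of two
// equal-length numeric arrays (plain arrays or Float32Array).
function cosineSimilarity(a, b) {
    if (a.length !== b.length) {
        throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
    }
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) return 0; // zero vector: define as 0
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Illustrative helper: candidates sorted from most to least similar,
// each entry carrying its original index and its score.
function rankBySimilarity(query, candidates) {
    return candidates
        .map((vector, index) => ({ index, score: cosineSimilarity(query, vector) }))
        .sort((x, y) => y.score - x.score);
}

// Toy 4-dimensional vectors standing in for real embeddings.
const textVector = new Float32Array([1, 0, 1, 0]);
const imageVectors = [
    new Float32Array([0, 1, 0, 1]), // orthogonal to the query
    new Float32Array([2, 0, 2, 0]), // same direction as the query
    new Float32Array([1, 1, 1, 1]), // partially aligned
];

const ranking = rankBySimilarity(textVector, imageVectors);
console.log(ranking.map(({ index, score }) => `${index}: ${score.toFixed(3)}`));
```

Cosine similarity ignores vector length, so the second candidate ranks first even though its magnitude differs from the query's.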

Swift

Embed images:

let imageModel = try await ImageEncoder(modelName: "unum-cloud/uform3-image-text-english-small")
let imageURL = "https://github.com/ashvardanian/ashvardanian/blob/master/demos/bbq-on-beach.jpg?raw=true"
guard let url = URL(string: imageURL),
    let imageSource = CGImageSourceCreateWithURL(url as CFURL, nil),
    let cgImage = CGImageSourceCreateImageAtIndex(imageSource, 0, nil) else {
    throw Exception("Could not load image from URL: \(imageURL)")
}

let imageEmbedding: Embedding = try imageModel.encode(cgImage)
let imageVector: [Float32] = imageEmbedding.asFloats()

Embed queries:

let textModel = try await TextEncoder(modelName: "unum-cloud/uform3-image-text-english-small")
let text = "A group of friends enjoy a barbecue on a sandy beach, with one person grilling over a large black grill, while the other sits nearby, laughing and enjoying the camaraderie."
let textEmbedding: Embedding = try textModel.encode(text)
let textVector: [Float32] = textEmbedding.asFloats()

Python

Load model:

from uform import get_model, Modality

model_name = 'unum-cloud/uform3-image-text-english-small'
modalities = [Modality.TEXT_ENCODER, Modality.IMAGE_ENCODER]
processors, models = get_model(model_name, modalities=modalities)

Embed images:

import requests
from io import BytesIO
from PIL import Image

image_url = 'https://media-cdn.tripadvisor.com/media/photo-s/1b/28/6b/53/lovely-armenia.jpg'
image = Image.open(BytesIO(requests.get(image_url).content))

processor_image = processors[Modality.IMAGE_ENCODER]
model_image = models[Modality.IMAGE_ENCODER]
image_data = processor_image(image)
image_features, image_embedding = model_image.encode(image_data, return_features=True)

Embed queries:

text = 'a cityscape bathed in the warm glow of the sun, with varied architecture and a towering, snow-capped mountain rising majestically in the background'

model_text = models[Modality.TEXT_ENCODER]
processor_text = processors[Modality.TEXT_ENCODER]

text_data = processor_text(text)
text_features, text_embedding = model_text.encode(text_data, return_features=True)


xenova

Love it! 😄 Just some minor suggestions/improvements/comments.

javascript/encoders.mjs

javascript/encoders_test.js

Co-authored-by: Joshua Lochner <[email protected]>

ashvardanian · 2024-04-25T03:13:39Z

🎉 This PR is included in version 3.0.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

ashvardanian added 21 commits April 16, 2024 02:36

Improve: Fetch modalities separately

2246f13

Fix: Compatibility with older models

b310e90

Make: Rename files

a2f77d2

Merge branch 'main-dev' of https://github.com/unum-cloud/uform into main-dev

30d40a0

Add: Placeholder for JavaScript SDK

acbb77a

Docs: Improve export process

2351fe9

Merge branch 'main-dev'

bb7ca9d

Merge branch 'main' of https://github.com/ashvardanian/uform

38949f3

Break: Deprecate old ONNX structure

94ebd6e

Improve: Support different models with Swift

479ae61

Add: JavaScript library placeholder

45479bd

Merge branch 'main' of https://github.com/ashvardanian/uform

a2e1a86

Make: Consistent naming between Python and TS

2f81413

Improve: Separate text and image processors

eb88296

Make: Deprecate TypeScript for JavaScript

a391b6d

Add: Text processor for JS

50c71c8

Fix: Mismatch in the input types for text

19c0c30

Fix: Passing tests in JavaScript

7ac33bd

Fix: Rename image inputs

4f1568f

Improve: Separate encoders & processors

cccfc62

Improve: PAss tests for small models

b790519

ashvardanian force-pushed the main branch from b148f84 to b790519 Compare April 20, 2024 04:27

ashvardanian added 8 commits April 20, 2024 22:18

Improve: Test more models

605bfc8

Improve: Test many models in JS

0c2aa28

Add: Text and image cross-referencing in JS

766963c

Merge branch 'main' of https://github.com/ashvardanian/uform

ffca7f6

Add: Initial decoder exporters

6b3f8cd

Fix: Transposing channels in JS

4c1ac18

Improve: Uniform APIs across JS, Py, and Swift

9bf5fe3

Improve: Error handling in Swift

3e1e576

ashvardanian added 9 commits April 23, 2024 11:57

Merge branch 'main' of https://github.com/ashvardanian/uform

18a3bb6

Improve: Image pre-processing in Swift

f8654b5

Improve: Hide temporary files

37d7f52

Merge branch 'main' of https://github.com/ashvardanian/uform

d82a1a1

Improve: Pretty-print benchmarks

67b083f

Make: Add development dependencies

8e38b2e

Improve: Reduce warnings

96df21d

Improve: Move inputs to same device as model

91c86a1

Docs: Reorganize

6d5f1ce

ashvardanian changed the title ~~JavaScript SDK~~ JavaScript & Swift SDK Apr 24, 2024

ashvardanian added 4 commits April 24, 2024 05:54

Improve: Extend benchmarks

1f556b8

Docs: Add examples

47b7a49

Improve: Refresh CLI for new models

ebd7f66

Docs: Reference for Py and Swift

d00204f

xenova approved these changes Apr 24, 2024


ashvardanian and others added 6 commits April 24, 2024 23:50

Docs: Typo

c6f773c

Co-authored-by: Joshua Lochner <[email protected]>

Improve: Backend-agnostic .data extraction in JS

6d4b614

Co-authored-by: Joshua Lochner <[email protected]>

Fix: add_special_tokens argument in JS

cf25160

Co-authored-by: Joshua Lochner <[email protected]>

Improve: Multi-GPU support in Py

917a4a8

Add: Parallel decoding bench

f195b66

Merge branch 'main' of https://github.com/ashvardanian/uform

f4b19a8

ashvardanian merged commit 641b8c0 into unum-cloud:main-dev Apr 25, 2024
1 check passed

ashvardanian added the released label Apr 25, 2024
