
Releases: modularml/mojo

Mojo 24.6

17 Dec 18:05

Release 24.6

We are excited to announce the release of MAX 24.6, featuring a preview of MAX GPU! At the heart of this release is MAX GPU, the first vertically integrated generative AI serving stack that eliminates the dependency on vendor-specific computation libraries such as NVIDIA's CUDA.

MAX GPU is built on two groundbreaking technologies. The first is MAX Engine, a high-performance AI model compiler and runtime built with innovative Mojo GPU kernels for NVIDIA GPUs, free from CUDA or ROCm dependencies. The second is MAX Serve, a sophisticated Python-native serving layer engineered specifically for LLM applications. MAX Serve expertly handles complex request batching and scheduling, delivering consistent and reliable performance even under heavy workloads.
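Since MAX Serve speaks an OpenAI-compatible chat-completions protocol, a client can talk to it with a standard JSON payload. The sketch below builds such a payload with only the standard library; the endpoint URL, port, and model identifier are illustrative assumptions, not values from this release note.

```python
import json

# Hypothetical local MAX Serve endpoint (host, port, and path are
# assumptions based on the OpenAI-compatible convention).
MAX_SERVE_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> str:
    """Serialize an OpenAI-style chat-completions request body as JSON."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)


# Example body for a Llama model (model name is an assumed identifier):
body = build_chat_request("llama-3", "Write a haiku about GPUs.")
```

The resulting JSON string can then be POSTed to the serving endpoint with any HTTP client (`curl`, `urllib.request`, or the `openai` Python client pointed at the local base URL).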

For additional details, check out the changelog and the release announcement.

Mojo 24.5

26 Sep 21:26

Release 24.5

We are excited to announce the release of MAX 24.5! This release adds support for installing MAX as a conda package with magic, a powerful new package and virtual environment manager. We're also introducing two new Python APIs, MAX Graph and MAX Driver, which will ultimately provide the same low-level programming interface as the Mojo Graph API. MAX Engine performance for Llama 3 has improved, with 24.5 generating tokens an average of 15% to 48% faster. Lastly, this release adds support for Python 3.12 and drops support for Python 3.8 and Ubuntu 20.04.
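The magic-based install might look like the following sketch. The installer URL, subcommand names, and package name are taken from Modular's documentation of this era and should be treated as assumptions that may have changed since:

```shell
# Install the magic package manager (assumed installer URL).
curl -ssL https://magic.modular.com | bash

# Create a new project environment and add MAX as a conda package
# (project name is illustrative; "max" is the assumed package name).
magic init max-demo
cd max-demo
magic add max

# Run Python inside the managed environment to verify the install.
magic run python3 --version
```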

For additional details, check out the changelog and the release announcement.