
Releases: modularml/mojo

Mojo 24.6

17 Dec 18:05

Release 24.6

We are excited to announce the release of MAX 24.6, featuring a preview of MAX GPU! At the heart of this release is MAX GPU, the first vertically integrated generative AI serving stack that eliminates the dependency on vendor-specific computation libraries such as NVIDIA's CUDA.

MAX GPU is built on two groundbreaking technologies. The first is MAX Engine, a high-performance AI model compiler and runtime built with innovative Mojo GPU kernels for NVIDIA GPUs, free from CUDA or ROCm dependencies. The second is MAX Serve, a sophisticated Python-native serving layer engineered specifically for LLM applications. MAX Serve expertly handles complex request batching and scheduling, delivering consistent and reliable performance even under heavy workloads.
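Since MAX Serve speaks an OpenAI-compatible chat-completions protocol, a client can talk to it with a standard JSON payload. The sketch below builds such a payload with only the standard library; the endpoint URL, port, and model identifier are illustrative assumptions, not values from this release note.

```python
import json

# Hypothetical local MAX Serve endpoint (host, port, and path are
# assumptions based on the OpenAI-compatible convention).
MAX_SERVE_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> str:
    """Serialize an OpenAI-style chat-completions request body as JSON."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)


# Example body for a Llama model (model name is an assumed identifier):
body = build_chat_request("llama-3", "Write a haiku about GPUs.")
```

The resulting JSON string can then be POSTed to the serving endpoint with any HTTP client (`curl`, `urllib.request`, or the `openai` Python client pointed at the local base URL).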

For additional details, check out the changelog and the release announcement.

Mojo 24.5

26 Sep 21:26

Release 24.5

We are excited to announce the release of MAX 24.5! This release adds support for installing MAX as a conda package with magic, a powerful new package and virtual environment manager. We're also introducing two new Python APIs, MAX Graph and MAX Driver, which will ultimately provide the same low-level programming interface as the Mojo Graph API. MAX Engine performance for Llama 3 has improved, with 24.5 generating tokens an average of 15% to 48% faster. Lastly, this release adds support for Python 3.12 and drops support for Python 3.8 and Ubuntu 20.04.
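The magic-based install might look like the following sketch. The installer URL, subcommand names, and package name are taken from Modular's documentation of this era and should be treated as assumptions that may have changed since:

```shell
# Install the magic package manager (assumed installer URL).
curl -ssL https://magic.modular.com | bash

# Create a new project environment and add MAX as a conda package
# (project name is illustrative; "max" is the assumed package name).
magic init max-demo
cd max-demo
magic add max

# Run Python inside the managed environment to verify the install.
magic run python3 --version
```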

For additional details, check out the changelog and the release announcement.