TT-Metalium is our low-level programming model for developing kernels that run on Tenstorrent hardware.
TT-NN is our neural network OP library with PyTorch-like Python and C++ APIs.
```python
import ttnn
import torch

with ttnn.manage_device(device_id=0) as device:
    a = torch.ones((5, 7))
    b = torch.ones((1, 7))
    a = ttnn.from_torch(a, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
    b = ttnn.from_torch(b, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
    output = a + b
    output = ttnn.to_torch(output)

print(output)
```
Simple conversion APIs let you move PyTorch tensors to and from Tenstorrent hardware.
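The add in the example above relies on broadcasting: `b` has a single row, which is applied to every row of `a`. As a hedged, host-only sketch, the same semantics can be written in plain Python (no ttnn or device required; this is illustrative only, not how ttnn executes the op):

```python
# Plain-Python sketch of the broadcast add from the example above.
# ttnn executes the real op on device with bfloat16 tiles; this only
# mirrors the shape semantics.
a = [[1.0] * 7 for _ in range(5)]  # stands in for torch.ones((5, 7))
b = [[1.0] * 7]                    # stands in for torch.ones((1, 7))

# b's single row is added element-wise to each of a's five rows.
out = [[x + y for x, y in zip(row, b[0])] for row in a]
print(out[0][0])  # → 2.0
```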
```python
import ttnn
import torch

with ttnn.manage_device(device_id=0) as device, ttnn.tracer.trace():
    a = torch.ones((5, 7))
    b = torch.ones((1, 7))
    a = ttnn.from_torch(a, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
    b = ttnn.from_torch(b, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
    output = a + b
    output = ttnn.to_torch(output)

ttnn.tracer.visualize(output)
```
We also provide tools to review the graphs you create and run in TT-NN.
We have working demos for models such as ResNet, BERT, Falcon7B and Falcon40B.
These are some performance metrics for our models running on Grayskull (GS). We are constantly improving them, and we publish the latest metrics on GitHub Actions.
| Model | Batch size | GS end-to-end throughput [1] | GS on-device throughput [2] | Target GS end-to-end throughput [1] |
|---|---|---|---|---|
| ResNet-50 (fps) | 20 | 2070 | 6943 | 10000 |
| BERT-Large (sen/s) | 12 | 362 | 406 | 410 |
| TT-NN Falcon-7B decode (t/s) | 32 | 135 | coming soon | 140 |
| ViT | Coming end of March | | | |
| U-Net | Coming end of March | | | |
[1] - End-to-end throughput is measured by dividing the batch size by the accelerator inference time, and is reported per second.
[2] - On-device throughput is measured by directly counting the clock cycles of the operations executed on device.
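To make footnote [1] concrete, here is a hedged sketch of the calculation, using the ResNet-50 row above for illustrative numbers (the inference time is back-solved from the table, not a measured value):

```python
# End-to-end throughput per footnote [1]: batch size / accelerator inference time.
batch_size = 20
inference_time_s = 20 / 2070  # hypothetical time to run one batch, in seconds

throughput_fps = batch_size / inference_time_s
print(round(throughput_fps))  # → 2070
```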
We are writing efficient versions of the following models to run on our Wormhole (WH) architecture (N300 2x WH card):
| Model | N300 (2x WH) Throughput |
|---|---|
| Falcon-7B | Coming end of March |
| Mistral-7B | Coming end of March |
| Mamba-2.8B | Coming end of March |
| Stable Diffusion | Coming end of March |
We are also mapping models to run on our T3000 workstation with a 2x4 mesh of Wormhole devices (8x WH), and on Galaxy systems with a 4x8 mesh of Wormhole devices (32x WH):
| Model | T3000 (8x WH) Throughput | Galaxy (32x WH) Throughput |
|---|---|---|
| Falcon-40B | Coming end of March | Coming end of March |
| LLaMA-2-70B | Coming end of March | Coming end of March |
| Mixtral-8x7B | Coming end of March | Coming end of March |
- Installing
- Getting started
- Documentation
- Troubleshooting and debugging tips
- Contributing
- Communication
Table of Contents generated with markdown-toc.
Note: All features are currently fully tested only on Grayskull E150 accelerators. We are working on functionality for other Tenstorrent architectures.
For all the instructions needed to set up your Tenstorrent accelerator and this software, please refer to our full installation instructions.
Then see Getting started to begin using this project.
If you just finished building from source, you can skip ahead to running an example.
Otherwise, you must set the necessary environment variables each time you use this project:
```sh
export ARCH_NAME=<arch name>
export TT_METAL_HOME=<appropriate value based on installation method above>
```
where `<arch name>` is your target architecture, for example `grayskull` or `wormhole_b0`.
If you're setting up an environment from source, you must further set up and activate the environment with:
```sh
export PYTHONPATH=<this repo dir>
export TT_METAL_ENV=dev
source build/python_env/bin/activate
```
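As a concrete illustration, a source setup on a Grayskull card might look like the following. The checkout path is hypothetical; substitute your own repo location:

```shell
# Hypothetical values for a source checkout on a Grayskull E150.
export ARCH_NAME=grayskull
export TT_METAL_HOME="$HOME/tt-metal"   # hypothetical repo location
export PYTHONPATH="$TT_METAL_HOME"
export TT_METAL_ENV=dev

# Activate the dev Python environment if it has been built.
[ -f "$TT_METAL_HOME/build/python_env/bin/activate" ] && . "$TT_METAL_HOME/build/python_env/bin/activate"
echo "ARCH_NAME=$ARCH_NAME"
```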
After installing, please refer to our Getting Started page in our documentation.
Note that example programs are only available through source installation at this time.
Please refer to our documentation.
In addition to our documentation above, you can check out relevant sections in the contribution standards if you ever need hardware troubleshooting help or debugging tips.
We are excited to move our development to the public, open-source domain. However, we are not yet adequately staffed to review contributions in an expedient and manageable time frame. In the meantime, please review the contributor's guide for more information about contribution standards.
If you would like to contribute, your submissions must pass post-commit regressions. For more information on running tests locally and in CI, please refer to the relevant section in the contributor's guide and read it in its entirety.
Announcements from the Tenstorrent team regarding this project will be in the discussions page.
We also have a Discord channel that you can join, where you can discuss with other members of the community and with developers. You may use this invite link.
If you would like to formally propose a new feature, report a bug, or have issues with permissions, please file through GitHub issues.