
🤘 TT-NN operator library, and TT-Metalium low level kernel programming model.


swimdi/tt-metal

 
 


ttnn logo

TT-NN is a Python and C++ neural network operator library.


Grayskull (GS) Models

| Model | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
|---|---|---|---|---|
| ResNet-50 (fps) | 20 | 2,850 | 7,200 | 10,000 |
| BERT-Large (sen/s) | 12 | 362 | 406 | 410 |
| Falcon7B-decode (t/s) | 32 | 135 | 135 | 140 |
| ViT (fps) | 8 | 480 | 1,570 | 2,000 |
| T5 small (sen/s) | | 140 | | |
| Bloom (sen/s) | | 70 | | |
| U-Net | coming soon | | | |

[1] - Observed from the host. Includes dispatch overhead and kernel execution time.

[2] - Ignoring host overhead. Kernel execution time only.
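
As a rough illustration of footnotes [1] and [2], the gap between the two throughput columns can be turned into an estimated host overhead per batch. This is a sketch, not part of TT-NN: the helper name `host_overhead_per_batch` is hypothetical, and it assumes that throughput equals batch size divided by time per batch and that host overhead is simply the difference between the two times.

```python
def host_overhead_per_batch(batch: int, end_to_end: float, device: float) -> float:
    """Estimate host overhead in seconds per batch (hypothetical helper).

    batch:      number of items processed per batch
    end_to_end: throughput observed from the host, in items per second
    device:     kernel-only throughput, in items per second
    """
    if batch <= 0 or end_to_end <= 0 or device <= 0:
        raise ValueError("batch and throughputs must be positive")
    end_to_end_time = batch / end_to_end  # seconds per batch seen by the host
    device_time = batch / device          # seconds per batch spent in kernels
    return end_to_end_time - device_time


# ResNet-50 row: batch 20, 2,850 fps end to end, 7,200 fps on the device.
overhead = host_overhead_per_batch(batch=20, end_to_end=2850, device=7200)
print(f"{overhead * 1e3:.2f} ms per batch")
```

Under these assumptions the ResNet-50 row works out to roughly 4.2 ms of host overhead per batch, while a row whose two columns match, such as Falcon7B-decode, yields zero.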

Wormhole (WH) Models

Note

All model demos in this table function on both N150 and N300 Wormhole cards, unless otherwise stated.

| Model | Gen. Token [3] | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
|---|---|---|---|---|---|
| Falcon7B-decode | 129th | 32 | 11.6 t/s/u - 371 t/s | 15.4 t/s/u - 493 t/s | 21 t/s/u |
| Mistral-7B-decode | 33rd | 32 | 10.9 t/s/u - 349 t/s | 13.3 t/s/u - 426 t/s | 21 t/s/u |
| Mamba-2.8B-decode | any | 32 | 9.2 t/s/u - 295 t/s | 13.1 t/s/u - 419 t/s | 22 t/s/u |
| BERT-Large (sen/s) [4] | any | 8 | 270 | 340 | 400 |
| Stable Diffusion 1.4 512x512 (sec/img) | | 1 | 8s | 5s | |

[1] - Observed from the host. Includes dispatch overhead and kernel execution time.

[2] - Ignoring host overhead. Kernel execution time only.

[3] - Generating the i-th token in a sequence while the kv_cache already holds i-1 rows.

[4] - This model demo does not work on N150. It does work on N300.
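
Footnote [3] can be pictured with a toy decode loop. This is only an illustrative sketch in plain Python, assuming one kv_cache row is appended per generated token; `decode_trace` is a hypothetical name, and no real model or TT-NN call is involved.

```python
def decode_trace(num_tokens: int) -> list[tuple[int, int]]:
    """Return (token index, rows already in the cache) for each decode step."""
    kv_cache = []  # stand-in for the real cache: one entry per processed token
    trace = []
    for i in range(1, num_tokens + 1):
        # Token i is generated while the cache holds the i-1 earlier rows.
        trace.append((i, len(kv_cache)))
        kv_cache.append(("key", "value", i))  # placeholder row for token i
    return trace


token_index, cached_rows = decode_trace(129)[-1]
print(token_index, cached_rows)  # 129 128
```

In these terms, a "129th" entry in the Gen. Token column describes a decode step that runs against 128 cached rows.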

T3000 (2x4 mesh of WHs) Models

| Model | Technique | Gen. Token [3] | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
|---|---|---|---|---|---|---|
| Falcon7B-decode | Data Parallel | 129th | 256 | 4.4 t/s/u - 1114 t/s | coming soon | 21 t/s/u |
| LLaMA-2-70B-decode | Tensor Parallel | 129th | 32 | 8.5 t/s/u - 272 t/s | 13.9 t/s/u - 445 t/s | 20 t/s/u |
| LLaMA-3-70B-decode | Tensor Parallel | 129th | 32 | 8.1 t/s/u - 257 t/s | 13.9 t/s/u - 445 t/s | 20 t/s/u |
| Falcon40B-decode | Tensor Parallel | 129th | 32 | 1.5 t/s/u - 48 t/s | 14.0 t/s/u - 448 t/s | 30 t/s/u |
| Mixtral7Bx8-decode | Tensor Parallel | 129th | 32 | 7.0 t/s/u - 225 t/s | 27.0 t/s/u - 864 t/s | 28 t/s/u |
| ResNet50 | Data Parallel | coming soon | | | | |
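
The throughput cells in the decode tables pair a per-user rate (t/s/u) with an aggregate rate (t/s). Below is a minimal sketch of how the two appear to relate, assuming t/s/u means tokens per second per user and that all users in the batch decode concurrently. `aggregate_throughput` is a hypothetical helper, not a TT-NN function. The published aggregates seem to be derived from unrounded measurements, so the product can differ from the table by a few t/s.

```python
def aggregate_throughput(tokens_per_sec_per_user: float, batch: int) -> float:
    """Approximate total tokens per second across all users in a batch."""
    if batch <= 0:
        raise ValueError("batch must be positive")
    return tokens_per_sec_per_user * batch


# LLaMA-2-70B-decode device throughput: 13.9 t/s/u at batch 32.
print(round(aggregate_throughput(13.9, 32)))  # 445
```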

Using TT-NN ops and tensors

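import ttnn
import torch

# Open device 0; the context manager closes it on exit.
with ttnn.manage_device(device_id=0) as device:
    # Host tensors: b has a single row and is broadcast across the 5 rows of a.
    a = torch.ones((5, 7))
    b = torch.ones((1, 7))

    # Move both tensors to the device as bfloat16 in tile layout.
    a = ttnn.from_torch(a, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
    b = ttnn.from_torch(b, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)

    # Add on the device, then copy the result back into a torch tensor.
    output = a + b
    output = ttnn.to_torch(output)

print(output)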
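
For checking the result on the host, the addition above can be mirrored without a device. The sketch below is plain Python over nested lists, not TT-NN code; `broadcast_row_add` is a hypothetical helper that only handles the case used above, where a one-row operand is broadcast across every row of the other.

```python
def broadcast_row_add(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Add the single row of b to every row of a (hypothetical reference helper)."""
    if len(b) != 1:
        raise ValueError("b must have exactly one row")
    row = b[0]
    if any(len(r) != len(row) for r in a):
        raise ValueError("column counts must match")
    return [[x + y for x, y in zip(r, row)] for r in a]


a = [[1.0] * 7 for _ in range(5)]  # same shape as torch.ones((5, 7))
b = [[1.0] * 7]                    # same shape as torch.ones((1, 7))
expected = broadcast_row_add(a, b)
print(len(expected), len(expected[0]), expected[0][0])  # 5 7 2.0
```

Every element of the (5, 7) reference result is 2.0, a value that bfloat16 represents exactly, so the tensor printed by the device example should match it.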

TT-Metalium logo

TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

Getting started

Get started with simple kernels.
