[FFE - E2E] Open Llama 3B

No due date, 72% complete

Core operations support is required for the Llama 3B model.

Ops that are currently lowered through tt-forge (up to emission to TTIR):

Add - Already supported e2e
Concatenate - Requires support in Forge and MLIR
Embedding - Requires support in Forge and MLIR
Hslice - Should be removed from the model
Hstack - Should be removed from the model
Matmul - Requires support in Forge; MLIR already has it
Multiply - Already supported e2e
Narrow - Required via the reshape op in both Forge and MLIR
Pad_tile - Potentially redundant
Reciprocal - Requires support in Forge and MLIR
Reduce_avg - Requires support in Forge; MLIR already has it
Sigmoid - Requires support in Forge and MLIR
Softmax - Already supported e2e
Sparse_matmul - Should be removed from the model
Sqrt - Requires support in Forge and MLIR
Squeeze - Required via the reshape op in both Forge and MLIR
Tile_broadcast - Potentially redundant
Transpose - Currently WIP
Unsqueeze - Required via the reshape op in both Forge and MLIR
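
The Squeeze and Unsqueeze entries above are listed as going through the reshape op. A minimal sketch of what that could mean in practice: both ops only change the shape, so a lowering only needs to compute the target shape and hand it to reshape. The helper names `squeeze_shape` and `unsqueeze_shape` are hypothetical and are not part of tt-forge or tt-mlir; Narrow is not covered here.

```python
from typing import List, Sequence


def squeeze_shape(shape: Sequence[int], dim: int) -> List[int]:
    """Target shape for a reshape that drops the size-1 axis at `dim`."""
    dim = dim % len(shape)  # allow negative axes
    if shape[dim] != 1:
        raise ValueError(f"cannot squeeze axis {dim} of size {shape[dim]}")
    return [s for i, s in enumerate(shape) if i != dim]


def unsqueeze_shape(shape: Sequence[int], dim: int) -> List[int]:
    """Target shape for a reshape that inserts a size-1 axis at `dim`."""
    dim = dim % (len(shape) + 1)  # the result has one more axis than the input
    out = list(shape)
    out.insert(dim, 1)
    return out
```

Because the element count is unchanged in both cases, the reshape is always valid for the shapes these helpers return.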

The following basic Llama 3B building blocks should also be supported:

Embeddings
Self-attention
MLP
RMS Norm
LM head
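
To show how the building blocks relate to the op list, here is a rough sketch of RMS Norm and of the SiLU activation used in Llama-style MLPs, written only in terms of ops named above (Multiply, Reduce_avg, Add, Sqrt, Reciprocal, Sigmoid). It works on plain Python lists for a single vector rather than tensors. The function names and the `eps` default are assumptions for illustration, not the Forge API or the model's actual configuration.

```python
import math
from typing import List


def multiply(a: List[float], b: List[float]) -> List[float]:
    """Elementwise Multiply."""
    return [x * y for x, y in zip(a, b)]


def reduce_avg(a: List[float]) -> float:
    """Reduce_avg over the only axis of a vector."""
    return sum(a) / len(a)


def reciprocal(x: float) -> float:
    """Reciprocal of a scalar."""
    return 1.0 / x


def sigmoid(x: float) -> float:
    """Sigmoid of a scalar."""
    return 1.0 / (1.0 + math.exp(-x))


def rms_norm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMS Norm: x * 1/sqrt(mean(x^2) + eps) * weight (eps default is assumed)."""
    mean_sq = reduce_avg(multiply(x, x))            # Multiply + Reduce_avg
    inv_rms = reciprocal(math.sqrt(mean_sq + eps))  # Add + Sqrt + Reciprocal
    return [xi * inv_rms * wi for xi, wi in zip(x, weight)]  # Multiply


def silu(x: List[float]) -> List[float]:
    """SiLU activation: x * sigmoid(x), i.e. Multiply + Sigmoid."""
    return [xi * sigmoid(xi) for xi in x]
```

With `eps=0.0` and unit weights, the output of `rms_norm` has a mean square of 1, which is a quick sanity check for any lowering of this block.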