I understand that CUDA operators can be wrapped with PyTorch's cpp_extension and called directly from Python. Why would one need to rebuild the entire network in C++?
Thanks for raising this question. For now, we rebuilt the entire network in C++ to maximize efficiency and minimize PyTorch overhead. We are also considering wrapping our core kernels as torchao extensions.
Does using torch extensions lead to performance degradation compared to rebuilding the entire network in C++?
```cpp
JointTransformerBlock::JointTransformerBlock(int dim, int num_attention_heads, int attention_head_dim, bool context_pre_only, Tensor::ScalarType dtype, Device device) :
    dim(dim),
    dim_head(attention_head_dim / num_attention_heads),
    num_heads(num_attention_heads),
    context_pre_only(context_pre_only),
    norm1(dim, false, dtype, device),
    norm1_context(dim, context_pre_only, dtype, device),
    qkv_proj(dim, dim * 3, true, dtype, device),
    qkv_proj_context(dim, dim * 3, true, dtype, device),
    norm_q(dim_head, 1e-6, false, dtype, device),
    norm_k(dim_head, 1e-6, false, dtype, device),
    norm_added_q(dim_head, 1e-6, false, dtype, device),
    norm_added_k(dim_head, 1e-6, false, dtype, device),
    attn(num_attention_heads, attention_head_dim / num_attention_heads, device),
    out_proj(dim, dim, true, dtype, device),
    out_proj_context(dim, dim, true, dtype, device),
    norm2(dim, 1e-6, false, dtype, device),
    norm2_context(dim, 1e-6, false, dtype, device),
    mlp_fc1(dim, dim * 4, true, dtype, device),
    mlp_fc2(dim * 4, dim, true, dtype, device),
    mlp_context_fc1(dim, dim * 4, true, dtype, device),
    mlp_context_fc2(dim * 4, dim, true, dtype, device)
{
    registerChildren
        (norm1, "norm1")
        (norm1_context, "norm1_context")
        (qkv_proj, "qkv_proj")
        (qkv_proj_context, "qkv_proj_context")
        (norm_q, "norm_q")
        (norm_k, "norm_k")
        (norm_added_q, "norm_added_q")
        (norm_added_k, "norm_added_k")
        (attn, "attn")
        (out_proj, "out_proj")
        (out_proj_context, "out_proj_context")
        (norm2, "norm2")
        (norm2_context, "norm2_context")
        (mlp_fc1, "mlp_fc1")
        (mlp_fc2, "mlp_fc2")
        (mlp_context_fc1, "mlp_context_fc1")
        (mlp_context_fc2, "mlp_context_fc2")
    ;
}
```