
Why not use PyTorch’s cpp_extension? #73

Closed
Lenan22 opened this issue Jan 4, 2025 · 2 comments
Labels
question Further information is requested

Comments


Lenan22 commented Jan 4, 2025

```cpp
JointTransformerBlock::JointTransformerBlock(int dim, int num_attention_heads, int attention_head_dim, bool context_pre_only, Tensor::ScalarType dtype, Device device) :
    dim(dim),
    dim_head(attention_head_dim / num_attention_heads),
    num_heads(num_attention_heads),
    context_pre_only(context_pre_only),
    norm1(dim, false, dtype, device),
    norm1_context(dim, context_pre_only, dtype, device),
    qkv_proj(dim, dim * 3, true, dtype, device),
    qkv_proj_context(dim, dim * 3, true, dtype, device),
    norm_q(dim_head, 1e-6, false, dtype, device),
    norm_k(dim_head, 1e-6, false, dtype, device),
    norm_added_q(dim_head, 1e-6, false, dtype, device),
    norm_added_k(dim_head, 1e-6, false, dtype, device),
    attn(num_attention_heads, attention_head_dim / num_attention_heads, device),
    out_proj(dim, dim, true, dtype, device),
    out_proj_context(dim, dim, true, dtype, device),
    norm2(dim, 1e-6, false, dtype, device),
    norm2_context(dim, 1e-6, false, dtype, device),
    mlp_fc1(dim, dim * 4, true, dtype, device),
    mlp_fc2(dim * 4, dim, true, dtype, device),
    mlp_context_fc1(dim, dim * 4, true, dtype, device),
    mlp_context_fc2(dim * 4, dim, true, dtype, device)
{
    registerChildren
        (norm1, "norm1")
        (norm1_context, "norm1_context")
        (qkv_proj, "qkv_proj")
        (qkv_proj_context, "qkv_proj_context")
        (norm_q, "norm_q")
        (norm_k, "norm_k")
        (norm_added_q, "norm_added_q")
        (norm_added_k, "norm_added_k")
        (attn, "attn")
        (out_proj, "out_proj")
        (out_proj_context, "out_proj_context")
        (norm2, "norm2")
        (norm2_context, "norm2_context")
        (mlp_fc1, "mlp_fc1")
        (mlp_fc2, "mlp_fc2")
        (mlp_context_fc1, "mlp_context_fc1")
        (mlp_context_fc2, "mlp_context_fc2")
        ;
}
```

I understand that CUDA operators can be wrapped directly with PyTorch's cpp_extension so they can be implemented and called from Python. Why would one need to rebuild the entire network in C++?
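For reference, wrapping an individual CUDA operator with cpp_extension typically looks something like the minimal sketch below; `binding.cpp`, `my_gemm`, and `my_gemm_cuda` are hypothetical placeholder names, not code from this project.

```cpp
// binding.cpp -- minimal sketch of exposing a custom CUDA operator through
// PyTorch's C++ extension mechanism (torch/extension.h).
#include <torch/extension.h>

// Hypothetical kernel launcher, assumed to be defined in a separate .cu file.
torch::Tensor my_gemm_cuda(torch::Tensor a, torch::Tensor b);

// Thin wrapper with input checks; this is what Python code would call.
torch::Tensor my_gemm(torch::Tensor a, torch::Tensor b) {
    TORCH_CHECK(a.is_cuda() && b.is_cuda(), "inputs must be CUDA tensors");
    return my_gemm_cuda(a, b);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("my_gemm", &my_gemm, "custom GEMM kernel (CUDA)");
}
```

Built with torch.utils.cpp_extension, this exposes just the kernel as a Python-callable op while the surrounding network stays in PyTorch, whereas the constructor above re-implements the whole block in C++.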

lmxyy added the question label on Jan 11, 2025
lmxyy (Collaborator) commented Jan 11, 2025

Thanks for raising this question. For now, we rebuilt the entire network in C++ to maximize efficiency and minimize PyTorch's overhead. We are also considering wrapping our core kernels into torchao as extensions.

lmxyy closed this as completed on Jan 11, 2025
Lenan22 (Author) commented Jan 13, 2025

> Thanks for raising this question. For now, we rebuilt the entire network in C++ to maximize efficiency and minimize PyTorch's overhead. We are also considering wrapping our core kernels into torchao as extensions.

Does wrapping the kernels as torch extensions lead to performance degradation compared with rebuilding the entire network in C++?
