Why does the 2DGS model have 3 scales, and why does the third element of scales have a gradient? #468

Open
insomniaaac opened this issue Oct 27, 2024 · 2 comments

@insomniaaac
I noticed that in examples/simple_trainer_2dgs.py, the 2DGS model has 3 scales, whereas in the original 2DGS repo there are only 2 scales.

Furthermore, I found that in the gsplat CUDA implementation only scales[0] and scales[1] are used, but when I print self.splats["scales"].grad, the third element has a gradient!

@Eightu

Eightu commented Oct 29, 2024

In section 4.1 of the original 2DGS paper, it is explained as follows: '... the scaling factors into a 3×3 diagonal matrix S whose last entry is zero.'
I think that even if the third diagonal element of S is 0, it may still receive a gradient, simply because the gradient computation is carried out in full and the chain rule propagates through every entry. I'm also learning.
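
A minimal PyTorch sketch of this point (my own illustration, not gsplat code): a leaf parameter always receives a gradient tensor of its full shape, and components that never enter the forward computation simply get zeros.

import torch

# Minimal sketch, not gsplat code: an [N, 3] leaf whose third column is never
# used in the forward pass still gets a full-shape .grad tensor; the unused
# column just receives zeros.
scales = torch.randn(4, 3, requires_grad=True)
out = (scales[:, 0] * scales[:, 1]).sum()  # third column is never touched
out.backward()
print(scales.grad.shape)   # torch.Size([4, 3])
print(scales.grad[:, 2])   # all zeros, but the entries exist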

@insomniaaac
Author

In the CUDA kernel:

template <typename T>
__global__ void fully_fused_projection_fwd_2dgs_kernel(
const uint32_t C,
const uint32_t N,
const T *__restrict__ means, // [N, 3]: Gaussian means. (i.e. source points)
const T *__restrict__ quats, // [N, 4]: Quaternions (No need to be normalized): This is the rotation component (for 2D)
const T *__restrict__ scales, // [N, 3]: Scales. [N, 3] scales for x, y, z
const T *__restrict__ viewmats, // [C, 4, 4]: Camera-to-World coordinate mat
// [R t]
// [0 1]
const T *__restrict__ Ks, // [C, 3, 3]: Projective transformation matrix
// [f_x 0 c_x]
// [0 f_y c_y]
// [0 0 1] : f_x, f_y are focal lengths, c_x, c_y is coords for camera center on screen space
const int32_t image_width, // Image width pixels
const int32_t image_height, // Image height pixels
const T near_plane, // Near clipping plane (for finite range used in z sorting)
const T far_plane, // Far clipping plane (for finite range used in z sorting)
const T radius_clip, // Radius clipping threshold (throw away small primitives)
// outputs
int32_t *__restrict__ radii, // [C, N] The maximum radius of the projected Gaussians in pixel unit. Int32 tensor of shape [C, N].
T *__restrict__ means2d, // [C, N, 2] 2D means of the projected Gaussians.
T *__restrict__ depths, // [C, N] The z-depth of the projected Gaussians.
T *__restrict__ ray_transforms, // [C, N, 3, 3] Transformation matrices that transform xy-planes in pixel spaces into splat coordinates (WH)^T in equation (9) in paper
T *__restrict__ normals // [C, N, 3] The normals in camera spaces.
) {

Only scales[0] and scales[1] are used, even though the shape of scales is [N, 3]:
mat3<T> RS_camera =
R * quat_to_rotmat<T>(glm::make_vec4(quats)) *
mat3<T>(scales[0], 0.0 , 0.0,
0.0 , scales[1], 0.0,
0.0 , 0.0 , 1.0);

However, I found that in the backward pass:

std::tuple<torch::Tensor, torch::Tensor, torch::Tensor, torch::Tensor>
fully_fused_projection_bwd_2dgs_tensor(
// fwd inputs
const torch::Tensor &means, // [N, 3]
const torch::Tensor &quats, // [N, 4]
const torch::Tensor &scales, // [N, 2]
const torch::Tensor &viewmats, // [C, 4, 4]
const torch::Tensor &Ks, // [C, 3, 3]
const uint32_t image_width,
const uint32_t image_height,
// fwd outputs
const torch::Tensor &radii, // [C, N]
const torch::Tensor &ray_transforms, // [C, N, 3, 3]
// grad outputs
const torch::Tensor &v_means2d, // [C, N, 2]
const torch::Tensor &v_depths, // [C, N]
const torch::Tensor &v_normals, // [C, N, 3]
const torch::Tensor &v_ray_transforms, // [C, N, 3, 3]
const bool viewmats_requires_grad
) {

Here scales is annotated as [N, 2], so v_scales is [N, 2] too, because it is created with zeros_like:

torch::Tensor v_scales = torch::zeros_like(scales);

However, in the kernel function:

template <typename T>
__global__ void fully_fused_projection_bwd_2dgs_kernel(
// fwd inputs
const uint32_t C,
const uint32_t N,
const T *__restrict__ means, // [N, 3]
const T *__restrict__ quats, // [N, 4]
const T *__restrict__ scales, // [N, 3]
const T *__restrict__ viewmats, // [C, 4, 4]
const T *__restrict__ Ks, // [C, 3, 3]
const int32_t image_width,
const int32_t image_height,
// fwd outputs
const int32_t *__restrict__ radii, // [C, N]
const T *__restrict__ ray_transforms, // [C, N, 3, 3]
// grad outputs
const T *__restrict__ v_means2d, // [C, N, 2]
const T *__restrict__ v_depths, // [C, N]
const T *__restrict__ v_normals, // [C, N, 3]
// grad inputs
T *__restrict__ v_ray_transforms, // [C, N, 3, 3]
T *__restrict__ v_means, // [N, 3]
T *__restrict__ v_quats, // [N, 4]
T *__restrict__ v_scales, // [N, 3]
T *__restrict__ v_viewmats // [C, 4, 4]
) {

scales is marked as [N, 3] and v_scales is [N, 3] too!
In the kernel, the pointer offset calculation is:
vec2<T> scale = glm::make_vec2(scales + gid * 3);

if (warp_group_g.thread_rank() == 0) {
v_quats += gid * 4;
v_scales += gid * 3;
gpuAtomicAdd(v_quats, v_quat[0]);
gpuAtomicAdd(v_quats + 1, v_quat[1]);
gpuAtomicAdd(v_quats + 2, v_quat[2]);
gpuAtomicAdd(v_quats + 3, v_quat[3]);
gpuAtomicAdd(v_scales, v_scale[0]);
gpuAtomicAdd(v_scales + 1, v_scale[1]);
}

Here the v_scales pointer is advanced with a stride of 3, but v_scale and scale are actually glm::vec2!
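
To make the layout assumption concrete, here is a small NumPy sketch of my own (not repo code): with a flattened [N, 3] buffer, an offset of gid * 3 plus reads of only the first two entries is consistent, while the same offset applied to an [N, 2] buffer would land in the wrong Gaussian or run off the end.

import numpy as np

# Sketch of the indexing the kernel performs; illustrative only, not gsplat code.
N = 4
scales3 = np.arange(N * 3, dtype=np.float32).reshape(N, 3)  # [N, 3] layout
flat3 = scales3.ravel()

gid = 2
# kernel-style access: scales + gid * 3, reading two entries (glm::vec2)
kernel_view = flat3[gid * 3 : gid * 3 + 2]
assert np.array_equal(kernel_view, scales3[gid, :2])         # consistent with [N, 3]

# with an [N, 2] buffer, the same gid * 3 offset is wrong:
flat2 = scales3[:, :2].copy().ravel()
wrong_view = flat2[gid * 3 : gid * 3 + 2]                    # reads another Gaussian's row
print(kernel_view, wrong_view)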

I am confused by these inconsistent shape annotations.

Can we just change the 2DGS API to pass scales as [N, 2] and align the behavior with the original 2DGS?
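
In the meantime, a possible caller-side workaround (a hedged sketch only; the padding helper below is something I made up, not part of the gsplat API): keep the learnable parameter as [N, 2], as in the original 2DGS, and append a constant third column before handing the tensor to the current kernels, so the stored parameter never carries a third gradient entry.

import torch

# Illustrative sketch only, not the gsplat API.
scales2 = torch.nn.Parameter(torch.rand(1000, 2))  # [N, 2], as in the original 2DGS

def pad_scales(s2: torch.Tensor) -> torch.Tensor:
    # Append a constant third column so the current [N, 3] buffer layout is
    # satisfied; the pad is not learnable and receives no gradient.
    pad = torch.zeros_like(s2[:, :1])
    return torch.cat([s2, pad], dim=1)               # [N, 3]

scales3 = pad_scales(scales2)
# scales3 is passed wherever the kernel expects an [N, 3] buffer;
# after backward, scales2.grad stays [N, 2].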
