According to your example code, the number of frames to encode in video mode should be 1 + 4*(N - 1). That means we have to drop several frames whenever the clip length does not match that form. Can we encode frames in image mode and decode the latents in video mode to keep all frames and get temporally interpolated outputs?
Apologies for the late response. We tried encoding the video in image mode (N frames of pixels -> N frames of latents) and decoding it in video mode (N frames of latents -> 1 + 4*(N-1) frames of pixels). Unfortunately, this does not yield a temporally interpolated output. Image-mode encoding treats each frame independently, so the resulting latents do not contain effective motion information, and the video decoded in video mode is not temporally smooth. Interestingly, the 3D Decoder can achieve some degree of interpolation for small motions. Please note that this effect is extremely limited, so the 3D Decoder should not be considered a frame interpolation model.
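To make the frame counts discussed here concrete, below is a minimal Python sketch of the arithmetic only. The function names and the `stride` parameter are hypothetical and are not part of the CV-VAE code; the factor of 4 is taken from the 1 + 4*(N - 1) relation quoted in this thread.

```python
def valid_video_length(num_frames: int, stride: int = 4) -> int:
    """Largest length <= num_frames of the form 1 + stride * (N - 1).

    Hypothetical helper: it only computes how many frames survive if the
    tail of a clip is trimmed to fit video-mode encoding.
    """
    if num_frames < 1:
        raise ValueError("need at least one frame")
    return 1 + stride * ((num_frames - 1) // stride)


def latent_length(num_frames: int, stride: int = 4) -> int:
    """Number of latent frames N after trimming and video-mode encoding."""
    return 1 + (valid_video_length(num_frames, stride) - 1) // stride


def decoded_length(num_latents: int, stride: int = 4) -> int:
    """Number of pixel frames produced by video-mode decoding of N latents."""
    if num_latents < 1:
        raise ValueError("need at least one latent frame")
    return 1 + stride * (num_latents - 1)


if __name__ == "__main__":
    frames = 16
    kept = valid_video_length(frames)
    print(f"video mode: keep {kept} of {frames} frames, drop {frames - kept}")
    print(f"video mode: {kept} frames -> {latent_length(frames)} latents")
    # Image-mode encoding gives one latent per frame, so decoding those
    # latents in video mode produces many more frames than the input had.
    print(f"image->video: {frames} latents -> {decoded_length(frames)} frames")
```

With a 16-frame clip, trimming keeps 13 frames (4 latents), while the image-encode/video-decode route would turn 16 latents into 61 output frames, which, as noted above, are not a reliable temporal interpolation.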
CV-VAE/cvvae_inference_video.py
Line 32 in 7c69a06
Thank you for sharing the wonderful work!