Decoder #8
Hi @y1131388949. Each skeleton frame is 17 joints, and each joint is 3 numbers: (x, y, z). So you need a 5×17×3 tensor as input to the encoder. For the decoder, copy the fifth frame 20 times to make a 20×17×3 tensor and input that to the decoder.
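A minimal sketch of that decoder input, assuming NumPy arrays and the shapes from the comment above (5 observed frames, 17 joints, 3 coordinates); the helper name here is hypothetical, not the repo's API:

```python
import numpy as np

# Hypothetical helper (not the repo's API): build the decoder input by
# repeating the last observed frame, as described in the comment above.
def make_decoder_input(encoder_input, target_len=20):
    """encoder_input: (input_len, num_joints, 3) array of observed poses."""
    last_frame = encoder_input[-1:]                   # (1, num_joints, 3)
    return np.repeat(last_frame, target_len, axis=0)  # (target_len, num_joints, 3)

obs = np.random.randn(5, 17, 3)       # 5 observed frames, 17 joints, (x, y, z)
dec_in = make_decoder_input(obs, 20)  # -> (20, 17, 3), fed to the decoder
```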
Thank you very much for your answer, it was very useful to me. I noticed that in the H36MDataset_v3 file, the selected keypoints for the human body are _MAJOR_JOINTS = [0, 1, 2, 5, 6, 7, 11, 12, 13, 14, 16, 17, 18, 24, 25, 26], which is a total of 16 keypoints, not 17 as you said. Which parts of the human body do these indices correspond to? The keypoint diagrams I've found online don't seem to match.
@y1131388949 As far as I remember, this is the skeleton structure: [skeleton diagram]. If there are 16 joints used in the data-loading part, I guess it's because the hip joint is removed.
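For reference, selecting those indices from the full skeleton is plain fancy indexing; a sketch assuming the raw H3.6M skeleton has 32 joints per frame (an assumption, not confirmed in this thread):

```python
import numpy as np

_MAJOR_JOINTS = [0, 1, 2, 5, 6, 7, 11, 12, 13, 14, 16, 17, 18, 24, 25, 26]

# Assumes a 32-joint raw H3.6M skeleton (an assumption); selecting
# the major joints keeps 16 of them and drops the rest.
full_pose = np.random.randn(5, 32, 3)        # (frames, joints, xyz)
major_pose = full_pose[:, _MAJOR_JOINTS, :]  # -> (5, 16, 3)
print(major_pose.shape)
```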
I noticed that in both training and validation, the decoder's input is the ground-truth sequence to be predicted, but what should the decoder's input be when using the trained STPoseTransformer for inference? Your paper states that the last frame of the input sequence is copied as the decoder input; what exactly does that look like? If I want to use my own estimated 3D human keypoints as input for prediction, what format should the decoder input take?