Explanation of output shape [1, 117] (World landmarks for pose) or [1, 195] (Pose landmarks) of pose_landmarks_detector.tflite in Mediapipe #5622

mbkamran · 2024-09-12T14:54:18Z

I downloaded the pose_landmaker_lite.task file from the official Mediapipe guide for Pose Landmark Detection here:

In order to access its .tflite models, I unzipped it using `unzip pose_landmaker_lite.task` and got 2 files: `pose_detector.tflite` and `pose_landmarks_detector.tflite`.
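Because `unzip` works on the bundle, the `.task` file is evidently a ZIP archive, so the same extraction can be done from Python with the standard-library `zipfile` module. This is a minimal sketch; `extract_task_bundle` is a hypothetical helper name, not part of MediaPipe:

```python
import zipfile
from pathlib import Path


def extract_task_bundle(task_path, out_dir):
    """Extract every file packed in a .task bundle and return the member names.

    Hypothetical helper: relies only on the .task file being a ZIP archive.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(task_path) as bundle:
        names = bundle.namelist()
        bundle.extractall(out)
    return sorted(names)
```

For example, `extract_task_bundle("pose_landmaker_lite.task", "pose_landmaker_lite")` should return the two `.tflite` file names listed above.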

Question 1: How do we interpret these models and how are they being used for tasks?

`pose_landmarks_detector.tflite` appears to be the one for pose estimation: visualizing the structure and outputs of both models in the Netron app shows that this is the model with the pose landmark outputs.
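The output names and shapes that Netron displays can also be listed programmatically. This is a minimal sketch, assuming TensorFlow is installed (it uses `tf.lite.Interpreter`); the helper names are hypothetical, and the printed names are the raw tensor names, which may differ from the labels Netron shows from the model metadata:

```python
def describe_outputs(output_details):
    """Format the dicts returned by Interpreter.get_output_details() as 'name: shape'."""
    lines = []
    for detail in output_details:
        shape = [int(s) for s in detail["shape"]]
        lines.append(f"{detail['name']}: {shape}")
    return lines


def load_output_details(model_path):
    """Load a .tflite model and return its output tensor details."""
    # Imported lazily so describe_outputs() can be used without TensorFlow.
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter.get_output_details()


# Usage (assumes the extracted file is in the working directory):
# print("\n".join(describe_outputs(load_output_details("pose_landmarks_detector.tflite"))))
```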

However, I have difficulty understanding the shapes and meaning of the two outputs "Pose landmarks" (output shape `[1,195]`) and "World landmarks for pose" (output shape `[1,117]`).

Question 2: How do we interpret the shapes `[1,195]` and `[1,117]`?
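One reading that makes the arithmetic work, stated here as an assumption rather than taken from this issue: MediaPipe's own pose graphs appear to decode this model as 39 landmarks (the 33 BlazePose body keypoints plus auxiliary points), which gives 39 × 5 = 195 values for the image landmarks (x, y, z, visibility, presence) and 39 × 3 = 117 values for the world landmarks (x, y, z). This is a minimal NumPy sketch under that assumption; the column order, the sigmoid on visibility and presence, and the helper name are all assumptions:

```python
import numpy as np

NUM_LANDMARKS = 39       # assumption: 33 body keypoints + 6 auxiliary points
NUM_BODY_LANDMARKS = 33  # BlazePose body keypoints


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_pose_outputs(pose_landmarks, world_landmarks):
    """Split the flat [1,195] and [1,117] outputs into per-landmark arrays.

    Assumes 5 values per image landmark (x, y, z, visibility, presence)
    and 3 values per world landmark (x, y, z).
    """
    image = np.asarray(pose_landmarks, dtype=np.float32).reshape(NUM_LANDMARKS, 5)
    world = np.asarray(world_landmarks, dtype=np.float32).reshape(NUM_LANDMARKS, 3)
    body = slice(0, NUM_BODY_LANDMARKS)  # keep only the body keypoints
    return {
        "image_xyz": image[body, :3],
        # assumption: visibility and presence are raw logits
        "visibility": _sigmoid(image[body, 3]),
        "presence": _sigmoid(image[body, 4]),
        "world_xyz": world[body],
    }
```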

And finally,

Question 3: How do we interpret the structure of the model, especially how it relates to BlazePose and MobileNetV2? Also, is there any support for fine-tuning, i.e., reusing the trained backbone of this model and writing a custom head?
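Short of fine-tuning the backbone itself, one common workaround (not something this issue confirms MediaPipe supports) is to keep the `.tflite` model frozen and train a small custom head on its outputs, for example on the 117 world-landmark values per frame. This is a minimal NumPy sketch of a softmax-regression head under that assumption; the feature choice, helper names, and hyperparameters are hypothetical:

```python
import numpy as np


def train_softmax_head(features, labels, num_classes, lr=0.1, epochs=200, seed=0):
    """Fit a linear softmax classifier with full-batch gradient descent.

    features: (n, d) array, e.g. flattened [1,117] outputs from a frozen model.
    labels:   (n,) integer class ids.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def predict_head(features, w, b):
    """Return the most likely class id for each feature row."""
    return np.argmax(features @ w + b, axis=1)
```

The backbone stays untouched here, so this only adapts what the model already outputs; changing the backbone's learned features would still require access to a trainable (non-TFLite) version of the model.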


mbkamran added the type:others label Sep 12, 2024

google-ml-butler bot assigned ayushgdev Sep 12, 2024

kuaashish assigned kuaashish and unassigned ayushgdev Sep 13, 2024

kuaashish added the task:pose landmarker, type:support, and platform:python labels and removed the type:others label Sep 13, 2024
