Previously, I tried to extend the Python API with the ability to keep the data on the GPU (#230), and I ran into some weird behaviors (they seemed weird back then, but now it's obvious that it was just a lack of understanding of how the data is laid out in memory).
This PR, however, provides a fully functional extension.
NOTE: this change adds an extra dependency: cupy.
The targeted function is get_data(), and both modes of providing data (memory view / deep copy) were implemented for GPU as well.
This was tested on an Nvidia AGX Orin 32GB, with JetPack 5.1.2 and ZED_SDK_4.1.4.
Shoutout to @andreacelani for the discussion that led to figuring out how to implement this correctly (look into the closed PR #230 for details).
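For readers unfamiliar with the two modes mentioned above, here is a minimal sketch of the difference between a memory view and a deep copy. It is not the code from this PR: it uses NumPy on the CPU so it runs without a GPU, and `sdk_buffer` is a hypothetical stand-in for memory owned by the SDK. CuPy arrays follow the same view/copy semantics for slicing and `.copy()`, so the idea should carry over to the GPU path.

```python
import numpy as np

# Hypothetical stand-in for an SDK-owned image buffer:
# 2x2 pixels, 4 channels, one byte per channel.
sdk_buffer = bytearray(2 * 2 * 4)

# "Memory view" mode: wrap the existing memory without copying it.
view = np.frombuffer(sdk_buffer, dtype=np.uint8).reshape(2, 2, 4)

# "Deep copy" mode: allocate new memory and copy the pixels into it.
copy = view.copy()

# Simulate the owner overwriting its buffer, e.g. on the next grab.
sdk_buffer[0] = 255

print(view[0, 0, 0])                 # 255: the view follows the buffer
print(copy[0, 0, 0])                 # 0: the copy keeps the old pixel
print(np.shares_memory(view, copy))  # False
```

The view is cheaper but is only valid as long as the underlying buffer is not overwritten; the deep copy costs an allocation and a transfer but can be kept across grabs.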
Benchmarking with an ML pipeline:
@andreacelani did some benchmarking with impressive results: #230 (comment)
Additionally, I tested it myself using a real feed from a ZED Mini with a simple pipeline (see picture), and here are my findings:
TL;DR:
Details:
Notes:
(from ultralytics import YOLO), and a custom-trained PyTorch YOLOv8 model.
With HD2K grabbing, my pipeline wasn't saturating the 15 FPS rate, thus grabbing was seemingly slower on GPU (faulty read).
4-channel to 3-channel reduction, resizing (to meet the 640x640 expected input), and normalization.
PCL just to simulate real work. (Code details are here: Feature/get data gpu #230 (comment).)
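To make the preprocessing step in the notes above concrete, here is a rough sketch of a 4-channel to 3-channel reduction, a resize to 640x640, and normalization. This is not the benchmarked code (that is linked in #230): the function name `preprocess`, the nearest-neighbour resize, the plain stretch without letterboxing, and the `xp` parameter are all assumptions made for illustration. It runs with NumPy as written; passing `cupy` as `xp` is intended to keep the work on the GPU, but that path is untested here.

```python
import numpy as np


def preprocess(frame, size=640, xp=np):
    """Turn an HxWx4 uint8 frame into a 1x3xSIZExSIZE float32 array in [0, 1].

    `xp` is the array module (numpy by default); this is a hypothetical
    helper, not part of the ZED SDK or of this PR.
    """
    # 4-channel to 3-channel reduction: drop the last (alpha) channel.
    rgb = frame[..., :3]

    # Nearest-neighbour resize via index lookup (stretches, no letterbox).
    h, w = rgb.shape[:2]
    rows = xp.arange(size) * h // size
    cols = xp.arange(size) * w // size
    resized = rgb[rows[:, None], cols[None, :]]

    # Normalization: uint8 [0, 255] to float32 [0, 1].
    norm = resized.astype(xp.float32) / 255.0

    # HWC to NCHW, the layout PyTorch models usually expect.
    return xp.ascontiguousarray(norm.transpose(2, 0, 1)[None])


# Synthetic frame at HD2K resolution (2208x1242), 4 channels.
frame = np.full((1242, 2208, 4), 255, dtype=np.uint8)
batch = preprocess(frame)
print(batch.shape, batch.dtype)  # (1, 3, 640, 640) float32
```

A real pipeline would likely use a proper interpolating resize and may need a channel reorder depending on the model's training data.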