No prediction segmentation fault #323
I updated to hyperpose-2.1.0 and I get:

When I analyze the test data from sh scripts/download-test-data.sh, the model is able to analyze only two images. If max_batch_size is not restricted to 1, no output image is generated. Terminal log:
Is this the issue: HW=368x432.onnx with --w 368 --h 432 in your CLI invocation? As for your last example, it's good to see that 2 images are produced correctly. The seg fault comes from these 35 lines below. I would recommend you find the exact line that segfaults, even just by putting some prints.
The issue happens with all the models. Am I the only one having this issue?
@salvador-blanco Can you tell me what operating system you use with the RTX 2080? I don't run these on an RTX 2080, but I think I know what causes your seg fault: batching requires an enormous amount of GPU RAM. When running batched images I hit about 20-28 GB of memory utilization on my Nvidia Xavier. The AGX Xavier shares RAM between CPU and GPU, so the GPU has 32 GB of memory accessible; with an RTX 2080 you may be a little limited by the fixed 8 GB. Even so, when running batched, I managed to overflow 32 GB of memory on the OpenPose COCO x368 model, resulting in a seg fault, while the other models stayed below 28 GB. Instead I ran processing with batch size 1, supplying one image at a time programmatically, and it went through, a little slower but without seg faults.
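A quick way to test the memory hypothesis, assuming the standard NVIDIA tools are installed, is to watch GPU memory while the example runs; nvidia-smi covers desktop GPUs and tegrastats covers Jetson/Xavier boards, where CPU and GPU share RAM:
watch -n 1 nvidia-smi   # desktop GPU: watch the memory-usage column while the batch runs
sudo tegrastats         # Jetson/Xavier: reports the shared CPU/GPU RAM usage
If usage climbs to the card's limit right before the crash, the seg fault is very likely an out-of-memory condition.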
@salvador-blanco the log is:

Could you please tell me more about your TensorRT version, OS version, or other details like that? Alternatively, you can use gdb to find the segmentation point: run the program under gdb and, when the segmentation point shows up, print a backtrace. Or you can try our docker image with all dependencies in verified versions.
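For reference, a typical gdb session with one of the example binaries from this thread might look like the following (the model and flags are only illustrative; substitute your own):
gdb --args ./example.operator_api_batched_images_paf --model_file ../data/models/openpose_thin.onnx --input_width 432 --input_height 368
(gdb) run
(gdb) bt   # after the segmentation fault is reported, print the backtrace to see where it crashed
With a debug build the backtrace will include file names and line numbers.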
@ganler I can reproduce the segfaults when I use a large batched data set on a larger model. Try to reproduce with OpenPoseCOCO at something like 650x368 with 100+ images in the test set; it segfaults many images into the processing.
@stubbb Yeah, but I think in @salvador-blanco's case it might not be the issue, as he got a segmentation fault even when using a batch size of 1.
This is the log from:
@salvador-blanco Hi, it seems there are few useful details in your log, so could you please re-compile the program with debug flags, say '-Og', so that the binary carries detailed debugging information? To be more specific, I encourage you to do the following:
rm CMakeCache.txt
cmake .. -DCMAKE_BUILD_TYPE=Debug
cmake --build . # or simply make -j
First, check whether there are still errors (in case your errors come from the illegal use of SIMD instructions). I hope you find my suggestions helpful.
Hi @ganler I made a clean installation with Ubuntu 18.04, since that was mentioned as the test-bed, but the problem persists. I followed your recommendation and here is the log:
I tried on a different computer with a GeForce GTX 960 and encountered the same issue.
Same problem with lopps-resnet50-V2-HW=368x432.onnx: input shape mismatch. The other models are OK.
The lopps model being HW=368x432 means height 368 and width 432, i.e. 432x368 in width-by-height terms. Flip your dimensions and it should be OK.
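Assuming the same example binary used earlier in this thread, the invocation with the flipped dimensions would look like this (the model path is only illustrative):
./example.operator_api_batched_images_paf --model_file ../data/models/lopps-resnet50-V2-HW=368x432.onnx --input_width 432 --input_height 368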
Sadly it's not OK; the problem lies elsewhere.
Can you share how you invoke the model? I managed to correct this error every time I got it, and I used this model.
Yes, I also get the input shape mismatch error, and I'm pretty sure I'm setting the H and W properly, e.g.:
Here is the typical output:
Your network is actually broken: Network Input Shape: (-1, 3, -1, -1) <- this line shows the correct input size for me even when I mess up the flags, not (-1, -1). Everything up to the error looks the same for me, though. I run it compiled natively on an Nvidia Xavier AGX, and with lopps-resnet50-V2-HW=368x432.onnx I achieved 19 fps.
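If you want to confirm whether the downloaded model file itself is the problem, one option (assuming Python with the onnx package is available) is to print the declared input straight from the ONNX graph:
python3 -c "import onnx; m = onnx.load('lopps-resnet50-V2-HW=368x432.onnx'); print(m.graph.input)"
A healthy export should list a 4-dimensional input with the channel dimension equal to 3; if the file is truncated or corrupted, onnx.load will fail outright.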
In the Dockerised version, you mean? Is this something the maintainers need to fix? |
That might be the thing; I run it natively with no issue. You can try downloading the model, going into the container, and swapping it out.
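A minimal sketch of that swap, assuming a running container (the container name and the destination path inside the image are placeholders; adjust them to your setup):
docker ps   # find the name of the running HyperPose container
docker cp lopps-resnet50-V2-HW=368x432.onnx <container_name>:/hyperpose/data/models/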
When analyzing complex images/videos, after some time the code stops and I get
Segmentation fault (core dumped)
with no output image or video. My intuition tells me that this happens with "complex" videos/images. I am using an RTX 2080, so I don't think resources are the problem.
For example, the following image:
https://ibb.co/xCwn1TC
When analyzing that image with:
./example.operator_api_batched_images_paf --model_file ../data/models/openpose_thin.onnx --input_width 432 --input_height 368
There is no output image
Terminal Log:
When analyzing that image with:
./example.operator_api_batched_images_paf --model_file ../data/models/openpose_coco.onnx --input_width 656 --input_height 368
An output image is generated: https://ibb.co/PN0RmNV
Terminal Log:
When analyzing this video https://youtu.be/Rme8aTAWXxc with the following:
./example.operator_api_video_paf --model_file ../data/models/openpose_coco.onnx --input_video ../data/media/CA.mp4 --input_width 656 --input_height 368
I get an empty video file
Terminal log:
I would appreciate your help.