Use DMA-BUF with HW encoding #206

merged 1 commit into from
Aug 5, 2023

Conversation

nowrep
Contributor

@nowrep nowrep commented Jan 16, 2023

No description provided.

@ammen99
Owner

ammen99 commented Jan 16, 2023

Hi, and thank you for working on this! I have briefly looked through the code and it looks ok. However, if we go for the dmabuf approach, wouldn't it make sense to use the wlr-export-dmabuf protocol to avoid copying the pixel data altogether?

@nowrep
Contributor Author

nowrep commented Jan 16, 2023

use the wlr-export-dmabuf protocol to avoid copying the pixel data altogether?

Even with wlr-export-dmabuf, the client is still supposed to copy the frame? And copying the frame in wf-recorder would mean doing it in OpenGL.

https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/master/protocol/wlr-export-dmabuf-unstable-v1.xml#L80

    <enum name="flags">
      <description summary="frame flags">
        Special flags that should be respected by the client.
      </description>
      <entry name="transient" value="0x1"
             summary="clients should copy frame before processing"/>
    </enum>

@ammen99
Owner

ammen99 commented Jan 16, 2023

use the wlr-export-dmabuf protocol to avoid copying the pixel data altogether?

Even with wlr-export-dmabuf, the client is still supposed to copy the frame? And copying the frame in wf-recorder would mean doing it in OpenGL.

https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/master/protocol/wlr-export-dmabuf-unstable-v1.xml#L80

    <enum name="flags">
      <description summary="frame flags">
        Special flags that should be respected by the client.
      </description>
      <entry name="transient" value="0x1"
             summary="clients should copy frame before processing"/>
    </enum>

Yeah, I see, I thought the flag isn't always set but looking at the wlroots code, it always is. I'll read the code in a bit more depth soon and will merge if everything is ok.
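The `transient` flag discussed above is a single bit (value `0x1` in the quoted XML). A minimal sketch of how a client could test for it, assuming the flags arrive as a plain 32-bit bitfield; the names `kFrameFlagTransient` and `must_copy_before_processing` are hypothetical and not taken from wf-recorder:

```cpp
#include <cstdint>

// Value of the "transient" entry in the protocol's flags enum (0x1 in the XML above).
constexpr std::uint32_t kFrameFlagTransient = 0x1;

// Hypothetical helper: true when the compositor asks clients to copy the
// frame before processing it.
constexpr bool must_copy_before_processing(std::uint32_t flags) {
    return (flags & kFrameFlagTransient) != 0;
}
```

Since wlroots reportedly always sets this flag, a client following the protocol would always take the copy path.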

@llyyr
Contributor

llyyr commented Jan 17, 2023

Doesn't work, tried both vulkan and opengl backends on wlroots

$ wf-recorder -d /dev/dri/renderD129
selected region 0,0 0x0
Failed to copy frame, retrying...

Also, I think there should be an option to not use dmabuf, e.g. --no-dmabuf, similar to --no-damage.

@nowrep
Contributor Author

nowrep commented Jan 17, 2023

It should be

wf-recorder -c h264_vaapi -d /dev/dri/renderD128 out.mp4

but it seems to fail even before starting encoding for you.

@llyyr
Contributor

llyyr commented Jan 17, 2023

Tried that, still the same error.

fwiw i'm on sway/wlroots both built from latest git

@nowrep
Contributor Author

nowrep commented Jan 17, 2023

Does this help?

diff --git a/src/main.cpp b/src/main.cpp
index 26d73fd..34cc43d 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -305,7 +305,7 @@ static void frame_handle_linux_dmabuf(void *, struct zwlr_screencopy_frame_v1 *f
 
     if (!buffer.wl_buffer) {
         buffer.bo = gbm_bo_create(gbm_device, buffer.width,
-            buffer.height, format, GBM_BO_USE_RENDERING);
+            buffer.height, format, GBM_BO_USE_LINEAR | GBM_BO_USE_RENDERING);
         if (buffer.bo == NULL)
         {
             std::cerr << "Failed to create gbm bo" << std::endl;

@llyyr
Contributor

llyyr commented Jan 17, 2023

This works now but only with scale_vaapi=format=nv12, which results in yuvj420p videos that have incorrect colors. Changing the filter to format=nv12 like in #207 doesn't work with your branch

Using video filter: format=nv12
Impossible to convert between the formats supported by the filter 'Parsed_format_0' and the filter 'auto_scale_0'
Failed to configure graph filter: Function not implemented

@nowrep
Contributor Author

nowrep commented Jan 17, 2023

Yes, software filters won't work now because there's no transfer to CPU anymore.
I cannot reproduce wrong colors on 7900xtx + mesa-git, what is your GPU?

@llyyr
Contributor

llyyr commented Jan 17, 2023

6600xt + mesa 22.3.3

[image: screenshot with grim]

[image: screencopy with dmabuf]

@nowrep
Contributor Author

nowrep commented Jan 17, 2023

Actually this should work: -F scale_vaapi,hwdownload,format=nv12,hwupload

@ammen99
Owner

ammen99 commented Jan 17, 2023

Actually this should work: -F scale_vaapi,hwdownload,format=nv12,hwupload

Sounds quite suboptimal to download the data just to scale it and upload again .. But unfortunately I have no idea what filters or options need to be set for this to work.

@llyyr
Contributor

llyyr commented Jan 17, 2023

It doesn't work, I still get the same incorrect colors

@nowrep
Contributor Author

nowrep commented Jan 17, 2023

Sounds quite suboptimal to download the data just to scale it and upload again

Indeed, but if someone wants that .... (but the filter format is wrong and it doesn't work)

It doesn't work, I still get the same incorrect colors

In any case, this is not related to this pull request.

ffmpeg -h filter=scale_vaapi

Maybe changing some of these options can fix the colors for you?
My recording when played in mpv looks identical to what I see on screen.

@llyyr
Contributor

llyyr commented Jan 17, 2023

In any case, this is not related to this pull request.

Yes, this shouldn't block the PR but there should be a --no-dmabuf option

@nowrep nowrep force-pushed the dmabuf branch 2 times, most recently from 5a471fd to 97a2f4e Compare January 17, 2023 15:32
@nowrep
Contributor Author

nowrep commented Jan 17, 2023

It will now map the dmabuf to memory if there is a hwupload in the filter. You can try with -F format=nv12,hwupload.

Also I've managed to reproduce the wrong colors, if I use the sw format conversion instead of scale_vaapi : )
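The behaviour described here (map the dmabuf to CPU memory only when the filter chain contains `hwupload`) can be sketched as a substring check on the filter string. This is an assumption about how the decision might be made, in the style of the `params.codec.find("vaapi")` test that appears in the PR; the function name `filter_requests_hwupload` is hypothetical:

```cpp
#include <string>

// Hypothetical helper: true when the user-supplied filter chain contains a
// hwupload step, meaning frames are expected to start in CPU memory.
bool filter_requests_hwupload(const std::string& filter) {
    return filter.find("hwupload") != std::string::npos;
}
```

With this check, `format=nv12,hwupload` would select the mapped (CPU-visible) path, while `scale_vaapi=format=nv12` would stay entirely on the GPU.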

@llyyr
Contributor

llyyr commented Jan 17, 2023

Incorrect colors are fixed by #208

@nowrep nowrep force-pushed the dmabuf branch 2 times, most recently from ae8864a to 3f16a47 Compare January 17, 2023 15:50
@llyyr
Contributor

llyyr commented Jan 18, 2023

With dmabuf, wf-recorder segfaults when either the width or the height is an odd number, and gets stuck on "Failed to copy frame, retrying..." if it's even. dmabuf doesn't support copying regions right now anyway, so we either have to crop the region with ffmpeg internally or allow not using dmabuf.

@llyyr
Contributor

llyyr commented Jan 20, 2023

Hangs when bad geometry is provided; this happens with or without any other options. Before, it would properly select the whole output.

$ wf-recorder -g "asdf" -f out.mp4
Bad geometry: asdf, capturing whole output instead.
selected region 0,0 0x0
Failed to copy frame, retrying...

@nowrep
Contributor Author

nowrep commented Jan 20, 2023

Hangs when bad geometry is provided

Can't reproduce. Setting geometry to invalid value behaves exactly the same as if you don't set any geometry at all.

@llyyr
Contributor

llyyr commented Jan 20, 2023

Ah, sorry, that has nothing to do with geometry, I misunderstood. It fails and gets stuck when using dmabuf if the provided rendering device isn't the one that the wlroots-based compositor started with on multi-GPU systems. Perhaps a message that lets the user know whether it's using dmabuf or screencopy would be nice.

Also, you can't Ctrl+C out of wf-recorder when it's stuck on "Failed to copy frame, retrying...", you have to send SIGKILL. It never reaches MAX_FRAME_FAILURES when using dmabuf

@nowrep
Contributor Author

nowrep commented Jan 20, 2023

Also, you can't Ctrl+C out of wf-recorder when it's stuck on "Failed to copy frame, retrying...", you have to send SIGKILL. It never reaches MAX_FRAME_FAILURES when using dmabuf

That's because it will never try again to capture the frame - not related to this PR.

diff --git a/src/main.cpp b/src/main.cpp
index 49ee1bc..9761da4 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -253,6 +253,9 @@ static void frame_handle_failed(void *, struct zwlr_screencopy_frame_v1 *) {
     {
         std::cerr << "Failed to copy frame too many times, exiting!" << std::endl;
         exit_main_loop = true;
+    } else
+    {
+        buffer_copy_done = true;
     }
 }
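The effect of the patch above can be illustrated in isolation: a failed copy either ends the recording (too many failures) or marks the copy as finished so the waiting loop can request another frame. The sketch below is a standalone model, not the project's code; the `CaptureState` struct, the counter, and the limit of 16 are assumptions, since the thread only names `MAX_FRAME_FAILURES` without giving its value:

```cpp
#include <iostream>

// Assumed limit; the thread only names MAX_FRAME_FAILURES, not its value.
constexpr int kMaxFrameFailures = 16;

// Hypothetical stand-in for the recorder's global capture state.
struct CaptureState {
    int failed_frames = 0;
    bool exit_main_loop = false;
    bool buffer_copy_done = false;
};

// Called once per failed copy. Giving up sets exit_main_loop; otherwise
// buffer_copy_done is set so the waiting loop wakes up and requests a new frame.
void on_frame_failed(CaptureState& state) {
    std::cerr << "Failed to copy frame, retrying..." << std::endl;
    ++state.failed_frames;
    if (state.failed_frames > kMaxFrameFailures) {
        std::cerr << "Failed to copy frame too many times, exiting!" << std::endl;
        state.exit_main_loop = true;
    } else {
        state.buffer_copy_done = true;
    }
}
```

Without the `else` branch, nothing signals the waiting loop after a failure, which matches the reported symptom of the failure limit never being reached.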
 

@llyyr
Contributor

llyyr commented Jan 20, 2023

that works, but there's no need for it to be an else statement is there?

looks good to me now

Owner

@ammen99 ammen99 left a comment

Thanks once more for working on this! I finally got the time to test, I can reproduce the washed out colors, but that is a problem from before, so it can be fixed separately. All else seems to be working fine, I am no dmabuf expert, but the implementation looks reasonable.

I have one small comment regarding logging, but that should be easy to fix :)
Are there any issues with this PR as of now? I am not sure how the problems on @llyyr 's system are to be resolved.

@@ -857,6 +1026,57 @@ int main(int argc, char *argv[])
wl_registry_add_listener(registry, &registry_listener, NULL);
sync_wayland();

if (params.codec.find("vaapi") != std::string::npos)
Owner

There are many cases here, some of which may not be apparent to the user. I think it would be nice to print messages not just in the error cases, but basically every time we make a decision; that way the user can see what is going on internally (e.g. when we use dmabuf, whether we use the same device as the compositor, etc).

@nowrep
Contributor Author

nowrep commented Aug 5, 2023

I can reproduce the washed out colors

This is a Mesa bug, but it has already been fixed.

Are there any issues with this PR as of now? I am not sure how the problems on @llyyr 's system are to be resolved.

There are no issues I am aware of. It does use way more memory than is needed, but that's because it uses 16 buffers, and software encoding has the same issue.
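To put the 16-buffer remark into numbers, here is a rough estimate under stated assumptions: a 32-bit pixel format (4 bytes per pixel), no row padding, and a 1920x1080 output. None of these figures come from the thread, and the helper name `total_buffer_bytes` is hypothetical:

```cpp
#include <cstdint>

// Rough estimate of the memory held by a pool of frame buffers.
// Ignores stride padding and any driver-side alignment.
constexpr std::uint64_t total_buffer_bytes(std::uint32_t width,
                                           std::uint32_t height,
                                           std::uint32_t bytes_per_pixel,
                                           std::uint32_t buffer_count) {
    return std::uint64_t{width} * height * bytes_per_pixel * buffer_count;
}
```

For the assumed 1920x1080 output, `total_buffer_bytes(1920, 1080, 4, 16)` is 132,710,400 bytes, roughly 127 MiB held by the pool regardless of how many buffers are in flight.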

@nowrep
Contributor Author

nowrep commented Aug 5, 2023

Fixed one mistake + added info messages about enabling/disabling dmabuf capture.

Owner

@ammen99 ammen99 left a comment

Alright, then we can merge this :)

@ammen99
Owner

ammen99 commented Aug 5, 2023

The CI also needs an update ^

@ammen99 ammen99 merged commit 3a2416f into ammen99:master Aug 5, 2023
1 check passed
@llyyr
Contributor

llyyr commented Aug 5, 2023

There are no issues I am aware of.

I just described my issue a third of a screen above your comment. With the current git master it's not possible for me to use hardware encoding without making my system freeze to the point where even the reset button doesn't work and I have to turn off power and turn it back on. I have no interest in trying to debug AMD's garbage but there should still be an option to not always use dmabuf when using hardware encoding

@ammen99
Owner

ammen99 commented Aug 5, 2023

There are no issues I am aware of.

I just described my issue a third of a screen above your comment. With the current git master it's not possible for me to use hardware encoding without making my system freeze to the point where even the reset button doesn't work and I have to turn off power and turn it back on. I have no interest in trying to debug AMD's garbage but there should still be an option to not always use dmabuf when using hardware encoding

I did read your message but for some reason assumed the problem was somehow resolved ... Yeah, if drivers are that broken, we should have a workaround. I thought however some of the earlier versions of this PR worked for you? Do you happen to know which change triggers the crash?

@llyyr
Contributor

llyyr commented Aug 5, 2023

I thought however some of the earlier versions of this PR worked for you? Do you happen to know which change triggers the crash?

The same version of this PR that worked before doesn't work now, so I can only assume it's a bug somewhere in wlroots/mesa(radeonsi)/amdgpu. However having to do a full reset for every single bisect is not something I have time for, so someone else will have to figure out what's broken and where and why.

@nowrep
Contributor Author

nowrep commented Aug 5, 2023

Yeah, if drivers are that broken, we should have a workaround

I wouldn't say that drivers are that broken, I can say that I had no GPU reset while developing this 😃

Without kernel and compositor logs it's really hard to say what's wrong, but if you get a full system freeze then it sounds like a kernel issue. Usually the kernel can recover from a GPU hang, although you'll have to restart your session because compositors mostly don't handle GPU resets well.

@nowrep
Contributor Author

nowrep commented Aug 5, 2023

@llyyr Does this work?

wf-recorder -c h264_vaapi -F "hwupload,scale_vaapi=format=nv12" -f out.mp4

@llyyr
Contributor

llyyr commented Aug 6, 2023

That still doesn't work but at least it doesn't produce a gpu hang.

out.log

Usually the kernel can recover from a GPU hang

Citation needed. AMD recovers from gpu resets maybe 5-10% of the time. Their own devs tell you that it's the userspace application's fault if it causes AMD GPUs to hang and fail to reset, rather than that nothing in userspace should be able to do that in the first place.

@nowrep
Contributor Author

nowrep commented Aug 6, 2023

Thanks for the log, it's the screencopy capture that fails. The frame shouldn't be sent to the encoder if the copy failed, so that has to be fixed.
What compositor are you using and can you also check if there are some errors in compositor log?

Citation needed. AMD recovers from gpu resets maybe 5-10% of the time

If this is your experience then there is definitely something very wrong.

@llyyr
Contributor

llyyr commented Aug 6, 2023

What compositor are you using and can you also check if there are some errors in compositor log?

sway and wlroots built from git, with the vulkan backend but it also happens on the gles backend.

Aug 06 11:01:47 altina sway[28345]: 00:00:30.084 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.096 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.105 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.112 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.119 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.127 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.134 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.141 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.148 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.155 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.163 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.171 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.178 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.185 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.192 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.199 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.207 [ERROR] [wlr] [render/vulkan/texture.c:508] Format XR24 (0x34325258) can't be used with modifier INVALID (0x00FFFFFFFFFFFFFF)
Aug 06 11:01:47 altina sway[28345]: 00:00:30.214 [ERROR] [wlr] [render/dmabuf_linux.c:83] drmIoctl(IMPORT_SYNC_FILE) failed: Inappropriate ioctl for device
Aug 06 11:01:47 altina sway[28345]: 00:00:30.214 [ERROR] [wlr] [render/vulkan/pass.c:388] Failed to sync render buffer
Aug 06 11:01:47 altina sway[28345]: 00:00:30.232 [ERROR] [wlr] [render/dmabuf_linux.c:83] drmIoctl(IMPORT_SYNC_FILE) failed: Inappropriate ioctl for device
Aug 06 11:01:47 altina sway[28345]: 00:00:30.232 [ERROR] [wlr] [render/vulkan/pass.c:388] Failed to sync render buffer
Aug 06 11:01:47 altina sway[28345]: 00:00:30.251 [ERROR] [wlr] [render/dmabuf_linux.c:83] drmIoctl(IMPORT_SYNC_FILE) failed: Bad file descriptor
Aug 06 11:01:47 altina sway[28345]: 00:00:30.251 [ERROR] [wlr] [render/vulkan/pass.c:388] Failed to sync render buffer
Aug 06 11:01:47 altina sway[28345]: 00:00:30.281 [ERROR] [wlr] [render/dmabuf_linux.c:83] drmIoctl(IMPORT_SYNC_FILE) failed: Bad file descriptor
Aug 06 11:01:47 altina sway[28345]: 00:00:30.281 [ERROR] [wlr] [render/vulkan/pass.c:388] Failed to sync render buffer
Aug 06 11:01:47 altina sway[28345]: 00:00:30.551 [ERROR] [wlr] [render/dmabuf_linux.c:83] drmIoctl(IMPORT_SYNC_FILE) failed: Bad file descriptor

edit: it appears to work on the gles backend if I specify -F "hwupload,scale_vaapi=format=nv12", but not if I don't
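The two hexadecimal values in the log lines above can be read directly. A DRM format code packs four ASCII characters into 32 bits, least significant byte first, and the modifier `0x00FFFFFFFFFFFFFF` is the value the log itself labels `INVALID`, as opposed to the linear modifier `0`. A small sketch for decoding them; the constant and function names are local to this example rather than taken from `drm_fourcc.h`:

```cpp
#include <cstdint>
#include <string>

// Modifier values as they appear in this thread: 0 is linear (see the later
// patch comment), and 0x00ffffffffffffff is what the log prints as INVALID.
constexpr std::uint64_t kDrmFormatModLinear = 0;
constexpr std::uint64_t kDrmFormatModInvalid = 0x00ffffffffffffffULL;

// Decode a DRM fourcc code into its four characters, lowest byte first.
std::string fourcc_to_string(std::uint32_t fourcc) {
    std::string out;
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<char>((fourcc >> shift) & 0xff));
    }
    return out;
}
```

Here `fourcc_to_string(0x34325258)` yields `XR24`, matching the format name printed by wlroots.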

@nowrep
Contributor Author

nowrep commented Aug 6, 2023

Thanks, now I can see what's wrong. Also, a word of advice: when you are using a very non-standard setup (vulkan, 10bit, rc kernel, ...) it's usually a good idea to mention that ;)

@llyyr
Contributor

llyyr commented Aug 6, 2023

vulkan, 10bit, rc kernel, ...)

Doesn't work on gles either. Not on 10bit. I am on 6.5-rc3 but it doesn't work on older kernels either.

@nowrep
Contributor Author

nowrep commented Aug 6, 2023

This should fix the INVALID modifier being sent and the compositor should import it correctly.

diff --git a/src/main.cpp b/src/main.cpp
index 093070d..92e3c9e 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -317,8 +317,9 @@ static void frame_handle_linux_dmabuf(void *, struct zwlr_screencopy_frame_v1 *f
     buffer.height = height;
 
     if (!buffer.wl_buffer) {
-        buffer.bo = gbm_bo_create(gbm_device, buffer.width,
-            buffer.height, format, GBM_BO_USE_LINEAR | GBM_BO_USE_RENDERING);
+        const uint64_t modifier = 0; // DRM_FORMAT_MOD_LINEAR
+        buffer.bo = gbm_bo_create_with_modifiers(gbm_device, buffer.width,
+            buffer.height, format, &modifier, 1);
         if (buffer.bo == NULL)
         {
             std::cerr << "Failed to create gbm bo" << std::endl;

@llyyr
Contributor

llyyr commented Aug 6, 2023

That fixes it on vulkan when inserting hwupload into the filters. However, I still get the system freeze without it on both gles and vulkan, on both 8 bit and 10 bit, and on both 6.3.9 and 6.5-rc3. The only remaining variable here is that I'm using mesa-git; I'll downgrade to an older version later.

@nowrep
Contributor Author

nowrep commented Aug 6, 2023

Thanks for testing. So now the copy works but trying to import the dmabuf into vaapi doesn't. Unfortunately it's going to be more difficult to debug with the system freeze, and I assume there are no dmesg errors.

Also, the 10bit formats should be added to the format table. It probably still works, but may confuse some filters.

@llyyr
Contributor

llyyr commented Aug 7, 2023

I was able to bisect it down to https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24151

@nowrep
Contributor Author

nowrep commented Aug 7, 2023

export AMD_DEBUG=noefc should disable the feature and work around the issue.

@llyyr
Contributor

llyyr commented Aug 8, 2023

Should I open an issue on Mesa for this?

@nowrep
Contributor Author

nowrep commented Aug 8, 2023

Yes, please.

@llyyr
Contributor

llyyr commented Aug 8, 2023

Would be helpful if you could attach necessary information that I might've missed here https://gitlab.freedesktop.org/mesa/mesa/-/issues/9497
