-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ruby: Add source frame scaling via screen
functions
#1508
base: master
Are you sure you want to change the base?
Conversation
I like it! My Mac is currently out of action so I can't test it myself right now, though |
Yeah, the reason this is still in draft is that I started working on a larger PR to allow all the software rendering functions in I've also been planning on backporting some of my other work to OpenGL too, so maybe I'll look at that first. |
### OpenGL rendering (all platforms) - librashader specifies that the output viewport should be the same size as the output framebuffer texture. This was not the case for the OpenGL render code. * Previously, librashader under OpenGL would render onto an intermediate framebuffer sized to the "output" size (the entire window view size), rather than the "target" size (the final composited size of the game area within the output area). The `libra_viewport_t` would be sized to the target size, but the underlying buffer would be larger. * Now, we size the intermediate framebuffer to the "target" size, let librashader render onto it with a `libra_viewport_t` that matches that size. In the final pass we sample this buffer within an area of the "output"-sized buffer as appropriate. This prior behavior would lead to scaling issues with shaders in the Metal backend. The same issues did not seem to be obviously present in OpenGL in my testing, but we should nevertheless probably fix this in case it is causing any of the subtle issues with shaders that have been reported, and also in case something breaks in the future as a result of not following this recommendation. (The above is also unrelated to the scaling issues addressed by #1508) ### CGL fix-ups (macOS) - Backports the native fullscreen and monitor selection options to OpenGL, and removes unnecessary custom window code from the OpenGL driver. - Removes a seemingly unnecessary output call from `reshape` that would cause flickering during resizes. This has been tested on macOS but should probably be tested on other platforms as well to make sure nothing breaks. Co-authored-by: jcm <[email protected]>
Does this mean that if 320 and 256 pixel wide content is mixed, that the framebuffer will be scaled to 320 pixels? Or will it still be 1280, but only when the content is mixed? I ask because resampling 256 pixel wide content to 320 pixels would introduce either significant blur or aliasing. Some shaders can handle wide framebuffers and will display correctly even with horizontally duplicated pixels. Those shaders would look worse with a 320 pixel wide framebuffer in this case. It also makes shaders that aim for sharp pixels impossible. I'm not familiar enough with the codebase to understand what scaleX and scaleY in screen.cpp are. |
With the Metal implementation in this version, for mixed content with pixel widths of 256 and 320, the buffers would be scaled horizontally before shader passes at all times, not just when pixel clocks are mixed in the same frame. This is up to the video backend to decide whether to do or not. In theory, if ruby could be informed when content is mixed, ruby could decide to only perform this horizontal scaling for content that is not or is not likely to be mixed. An API for that does not currently exist though. As for shaders that try to allocate textures larger than the D3D or Metal max texture size in response to extra-wide buffers, I suppose that a bug should be filed against those shaders (such as crt-maximus-royale in the OP) regardless of what ends up being considered proper on the ares end. Lastly I'd say the main benefit here visually speaking is the vertical scaling for progressive frames. One way or another, the source framebuffer should probably not end up being 480 pixels high for a 240p frame (this creates the artifacts in the first example, often leads to improper numbers of scanlines being drawn, etc.). It may be worth looking at |
I agree and that's why I was excited to see this PR! Horizontal scaling is a difficult problem. Outputting the least common multiple of the possible pixel widths breaks many shaders, often subtly, as they make assumptions about the horizontal resolution for the sake of simplicity or performance. Resampling to one of the resolutions on mixed content would also cause issues. I'm any case, this issue may be out of scope here. If I understand correctly, you are preserving the existing horizontal scaling behavior on the Metal backend (e.g. always output 1280 pixels wide in Mega Drive/Genesis)? |
Without this PR the Metal backend takes in the 1280-wide buffer and passes that to librashader, which renders it onto an offscreen texture the same size as the final viewport, and then a final Metal pass blits that texture onto the viewport. With this PR the buffers are always resampled down to 320 width before being passed to librashader. If this PR ever becomes not-a-draft, that would probably not be the final behavior. |
(Marked as draft pending some further testing and any feedback)
This PR exposes several methods in
screen
to the video backend inruby
, in order to resolve certain libretro shader scaling issues, and downscale source frame buffers as appropriate to improve shader performance.Background
ares's current rendering strategy, broadly, is to render frames on the CPU one at a time before sending them to the video backend. The frames sent to the video backend can have many duplicated pixels present, as a shortcut to account for cases where frames contain scanlines with different pixel clocks. Unless we scale these buffers down to the appropriate size before sending them to librashader, we can introduce artifacts, cause incorrect shader behavior, or even crash if we cause a shader to create an overly large texture.
By allowing
screen
'ssetScale
,setInterlace
andsetProgressive
methods to call into the video backend, we can signal that ruby should scale down the framebuffers appropriately before sending them to shaders. This is (in theory of course) more efficient than performing this scaling on the CPU, and more streamlined logically in terms of making the video backend responsible for the final frame buffer creation.Proof of concept
To test this approach, I added rudimentary implementations of these functions for the Metal backend. These methods basically do the following in a separate render pass, prior to shader passes:
This approach is basic and may need refinement, but is a solid improvement on existing behavior in my testing.
Future work
(Disclaimer; not trying to speak for maintainers here; just thinking out loud):
Per conversations on the ares Discord, it seems as though ares may in the future move toward a renderer that leans more heavily on the video backend, perhaps similar to the approach taken by something like CLK, where the core generates a video signal, and the display backend is entirely responsible for synthesizing that signal into frames. This could have numerous advantages in terms of accuracy, performance and latency.
While not terribly significant, this PR could be considered a small step in this direction, as it moves some of the work generating the final source frame from the CPU to the GPU/display backend.
This does, however, also mean that for other video backends to achieve parity with Metal in terms of framebuffer scaling, they will need to implement
setScale
,setInterlace
andsetProgressive
themselves in some fashion.Examples
crt/crt-maximus-royale.slangp
used here, not because it is my favorite shader but because it seems to detect interlacing changes on the fly, and also creates very large textures.SNES 240p, first with PR then without PR:
Sega 32x at combined 320/256 pixel width, first with PR then without PR:
Sonic 2 switching from interlaced to progressive during runtime (works both ways) (crashes without PR):