Create language binding generation tools for SDL3 #6

Open
flibitijibibo opened this issue Mar 19, 2024 · 17 comments

Comments

@flibitijibibo
Owner

Introductory Information:

SDL3 is the upcoming major release of SDL that breaks the ABI for the first time in over 10 years, in favor of dramatically improving and redesigning core APIs. It is currently being used by Source 2 internally, but will likely be adopted very quickly by the countless projects currently using SDL2, including FNA.

SDL2# is the C# binding for SDL2, originally built for FNA shortly before SDL 2.0 was officially tagged. The binding has been handwritten and hand-maintained for the entirety of its lifespan. This worked well for most of that time, but in recent years it's gotten increasingly difficult to keep up with upstream SDL changes.

The Project:

Since many different languages want to have SDL bindings, there is now a proposal for SDL3 to maintain machine-readable API files that can easily generate bindings for numerous languages:

libsdl-org/SDL#6337

This bounty specifically targets C#; the goal is to make machine readable definitions and then make a program that generates SDL3.cs, which should look and feel exactly like SDL2#.

The machine-readable files will be hosted in the SDL repo; ideally we'll be able to use the SDL headers directly, but if we have to generate something similar to dynapi that should be fine too.
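As a rough illustration of the idea (not the format SDL actually settled on), a generator could read a machine-readable description of each function and emit a C# P/Invoke declaration. The JSON layout, the `C_TO_CS` type table and the `emit_function` helper below are all hypothetical.

```python
import json

# Hypothetical machine-readable description of one C function.
API_JSON = """
[
  {"name": "SDL_Delay", "returns": "void",
   "params": [{"name": "ms", "type": "Uint32"}]}
]
"""

# Assumed mapping from C type names to C# type names (illustrative only).
C_TO_CS = {"void": "void", "Uint32": "uint", "int": "int"}


def emit_function(func, lib="SDL3"):
    """Return a C# P/Invoke declaration for one function description."""
    params = ", ".join(
        "{} {}".format(C_TO_CS[p["type"]], p["name"]) for p in func["params"]
    )
    return (
        '[DllImport("{}", CallingConvention = CallingConvention.Cdecl)]\n'
        "public static extern {} {}({});"
    ).format(lib, C_TO_CS[func["returns"]], func["name"], params)


if __name__ == "__main__":
    for func in json.loads(API_JSON):
        print(emit_function(func))
```

A real generator would also need to cover structs, enums, callbacks and pointer parameters, which is where most of the discussion in this thread ends up.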

The C# generator script and the generated bindings will be hosted in the FNA repository directly, with SDL itself being the new submodule rather than a new SDL3-CS repo.

Prerequisites:

Experience with scripting and working with both C headers and machine-readable data formats will help a lot; the language isn't locked down but existing scripts for SDL use Python and Perl. Knowing C# will probably help but is not as important as the other prerequisites.

Example Games:

XNA/FNA games will be making use of this by generating SDL3 bindings that look and act similar to SDL2#, and will be the basis for the SDL3_FNAPlatform, which will live side-by-side with SDL2_FNAPlatform for maximum compatibility with the existing catalog of games.

How Much Can flibit Help?

I'm not great with automated bindings generation, but having worked on SDL2# for a decade I should be able to explain any possible quirks that C# (for example) might have that will affect the API definitions. There are others working on other languages that will probably want to test and give feedback as well.

Budget/Timeline

If done as a part-time gig, I would expect this to take about a month, with the majority of the work being at the first and last steps (the first being getting something generated at all, and the last being upstreaming it for SDL 3.0). There is $4000 USD allocated for this project.

@lithiumtoast
Sponsor

lithiumtoast commented Mar 19, 2024

Hello, I would like to apply for this bounty and work on it part-time.

Introduction

Back in 2020/2021 I was trying to create C# bindings for a specific C library called sokol. I had the same problem outlined here: changes upstream resulted in a lot of manual work to fix and update the C# bindings. So I experimented with automating the process by using libclang to parse the C header and generate the C# bindings. See this sokol GitHub issue for my original approach.

My findings have been put into the following projects as tools with a lot of learnings and documentation:

I have tried using the tooling on various C libraries, with some success in early, limited testing, including: SDL, FAudio, FNA3D, Theorafile, imgui, sokol, libuv, flecs, etc.

You can find an example of the generated bindings at https://github.com/bottlenoselabs/SDL-cs/blob/main/src/cs/production/SDL/Generated/SDL.gen.cs. This file is kept up to date through GitHub Actions pipelines, with Dependabot opening a new PR when upstream has been updated.

Problems

As others have described in the linked SDL GitHub issue, using libclang has some issues, most notably with C function-like macros. I have documented some general rules for the C code to make it suitable for bindgen to C# and other languages: https://github.com/bottlenoselabs/CAstFfi?tab=readme-ov-file#limitations-is-the-c-library-ffi-ready.

The best outcome is that changes are made upstream to SDL to make it bindgen-friendly. Worst case is that some workarounds would be needed for generating the C# bindings, and these would be unique to FNA.

The good news is that SDL2#, and thus FNA, is not using the full feature set of SDL. For the purposes of this bounty, a subset of the larger problem could be solved by focusing on generating only the C# bindings that are used by FNA. The larger problem could be tackled as a separate milestone after additional feedback.

Questions

  • Can all the work remain and be released under an open source license?
  • Which organization or person owns the IP?

Proposed Deadline

I would be working on this part-time.

Two Weeks of March 18th to March 31st

  • Create a new Git repository under my name lithiumtoast to demonstrate the tooling as a test run replacement for SDL2#.
  • Re-write the CAstFfi project (renamed c2ffi) with tests to solve some bugs as described here: Anonymous fields not used bottlenoselabs/c2cs#160.

Week of April 1st to April 7th

  • Iterate on any feedback.

@flibitijibibo
Owner Author

This bit from the README should be able to answer both:

All source code written is yours to keep. This is NOT a work for hire; you will retain full copyright ownership of the code you write. All I am asking is that we get the right to use/publish that code under the license used by the project. For example, if a project is released under zlib, we get to use that code under zlib until we decide to change the license, at which point we will ask you for permission to do so first. If you decide that code you wrote is useful for some proprietary project and it makes you a zillion dollars, fine by me!

The rest sounds okay, so will mark as In Progress.

@Susko3

Susko3 commented Mar 19, 2024

I'm interested in taking on this.

I've recently been experimenting with generating SDL3 bindings from header files. I've used a combination of ClangSharpPInvokeGenerator, C# Source Generators and manually written .cs and ClangSharp configuration files.

  • ClangSharp is really good at creating "bit-perfect" bindings, including things like unmanaged function pointers
  • Sourcegen is used to automatically generate friendly overloads for char * functions, using string / ReadOnlySpan<byte> (for UTF8 literals)
  • Manually written .cs files are required for macro functions, and typedefs
    • typedefs could realistically be automatically generated with a simple python script or similar
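A minimal sketch of what such a typedef script might look like, assuming only the simplest `typedef <type> <name>;` form needs handling. The header excerpt, the `PRIMITIVES` table and the choice of C# `using` aliases as the output are all assumptions for illustration.

```python
import re

# Illustrative header excerpt; real SDL headers contain far more varied typedefs.
HEADER = """
typedef Uint32 SDL_DisplayID;
typedef Uint32 SDL_WindowID;
typedef struct SDL_Window SDL_Window;
"""

# Assumed mapping from SDL fixed-width types to .NET types.
PRIMITIVES = {
    "Uint8": "System.Byte",
    "Uint32": "System.UInt32",
    "Sint32": "System.Int32",
}

# Matches only the simplest form: `typedef <one-word type> <name>;`
SIMPLE_TYPEDEF = re.compile(r"^typedef\s+(\w+)\s+(\w+);$", re.MULTILINE)


def typedef_aliases(header_text):
    """Return C# using-alias lines for typedefs of known primitive types."""
    lines = []
    for c_type, alias in SIMPLE_TYPEDEF.findall(header_text):
        if c_type in PRIMITIVES:
            lines.append("using {} = {};".format(alias, PRIMITIVES[c_type]))
    return lines


if __name__ == "__main__":
    print("\n".join(typedef_aliases(HEADER)))
```

Struct, function-pointer and multi-word typedefs are deliberately skipped here; a regex is unlikely to be enough for those.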

I have some questions about the C# side of things:

  • What are your requirements for C# and .NET versions?
    • The SDL2# project is structured in a weird way and has #if NET6_0_OR_GREATER etc. sprinkled throughout. I'd like to avoid that if possible.
  • Are you considering releasing the bindings as a nuget package? That way projects other than FNA could use them.
  • How important are friendly C# function definitions (eg. in, out, ref params instead of raw pointers)?
    • I presume char * functions are really important to have friendly definitions to prevent memory leaks

I'm also happy to help with creating demo SDL3 applications to check that the generated bindings are simple to use and correct.
Possibly even cross-check the bindings with header files.

@flibitijibibo
Owner Author

Note that @lithiumtoast may have already started, so definitely coordinate with them if you both want to work on this at the same time.

To answer the questions (all of which are good!):

  • SDL3.cs will need to compile on VS2010, so .NET 4.0 is the minimum spec - however since it's generated we can go harder with the newer .NET features if it makes stuff run faster.
  • For SDL3 I think I'm actually going to just keep our bindings internal to FNA (though they'll still be public like SDL2) - we like the style of SDL2# but basically nobody else does, so the hope is that we can have our bindings our way and the C# community can use the same definitions to make something more to their liking on NuGet.
  • For SDL2# this is how we did it:
    • For all pointers that can be null, we need an IntPtr overload so that IntPtr.Zero can be passed
    • For memory that SDL will end up initializing, we use out, otherwise we use ref
    • For char* we ended up doing UTF8 marshaling ourselves, so it was a combination of byte[] and byte*, with the UTF8 and memory helpers at the top. .NET 4 unfortunately still needs this but maybe the modern .NET config can use Utf8Marshaler.
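The pointer rules above could be expressed in a generator roughly as follows. This is only a sketch: the `pointer_param_forms` helper and its two flags are hypothetical, and how a generator would learn whether SDL initializes the memory or whether NULL is allowed is left open.

```python
def pointer_param_forms(cs_type, name, sdl_initializes, nullable):
    """Return the C# parameter spellings to generate for one pointer parameter.

    sdl_initializes: True when SDL fills in the memory (use `out`),
                     False when the caller provides the data (use `ref`).
    nullable:        True when the C API accepts NULL for this pointer.
    """
    modifier = "out" if sdl_initializes else "ref"
    forms = ["{} {} {}".format(modifier, cs_type, name)]
    if nullable:
        # A second overload lets callers pass IntPtr.Zero for NULL.
        forms.append("IntPtr {}".format(name))
    return forms


if __name__ == "__main__":
    print(pointer_param_forms("int", "w", sdl_initializes=True, nullable=True))
    print(pointer_param_forms("SDL_Rect", "rect", sdl_initializes=False, nullable=False))
```

Each returned spelling corresponds to one overload of the enclosing function, so a function with several nullable pointers would multiply the number of overloads.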

@Susko3

Susko3 commented Mar 19, 2024

SDL3.cs will need to compile on VS2010, so .NET 4.0 is the minimum spec - however since it's generated we can go harder with the newer .NET features if it makes stuff run faster.

I'm not sure I understand what you mean by this. Even if the code is generated, VS2010 will still have to compile it.

Unless you mean that there'll be two versions of each function, one for .NET 4.0 compatibility, and one using modern syntax and features. And preprocessor directives could be used to compile only the function relevant for the tooling used.

@lithiumtoast
Sponsor

lithiumtoast commented Mar 19, 2024

Hey @Susko3, if you want to share this bounty that's OK with me. It would be great to get some help.

  • ClangSharp: I originally was using ClangSharp but ended up outgrowing the project. I wanted to add additional things, make the user experience better, etc. I found out later, after talking to some other folks in the space, that I really only wanted to target cross-platform C libraries. I did not care about generating bindings for Windows APIs or C++ libraries, which is a major use case for ClangSharp internally at Microsoft. I decided to roll my own so I could make changes as I wanted and not have to rely on getting PRs in upstream or forking ClangSharp. This is a common thread for Silk.NET and others that leverage libclang for cross-platform APIs.

  • SourceGenerators: I have tried source generators and found them useful only in specific contexts, where you have some C# code you want to write inside your end application project and want to generate code from that code. I did not find source generators useful for creating C# bindings when the C API is fixed via source control. Furthermore, I discovered that using libclang on Windows gives different results in specific situations than using libclang on Linux or macOS, because of how compilers are implemented and how macro objects are used for conditional compilation. This makes source generators not really the right tool for the job, because the information used as input to source generators depends on the machine they run on. Instead, to capture all the information, the bindgen needs to run on all the different development environments (Windows, macOS, Linux, etc.), merge the results into a platform-agnostic machine-readable file, and then generate C# code from that file.
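The merge step described above might look something like this sketch. The input shape (one `{symbol: declaration}` dictionary per platform), the symbol names and both helper functions are hypothetical and are not taken from c2ffi.

```python
def merge_platforms(per_platform):
    """Merge {platform: {symbol: declaration}} into
    {symbol: {declaration: [platforms that produced it]}}."""
    merged = {}
    for platform in sorted(per_platform):
        for symbol, decl in per_platform[platform].items():
            merged.setdefault(symbol, {}).setdefault(decl, []).append(platform)
    return merged


def platform_specific(merged, platform_count):
    """Return symbols that are missing on some platform or declared differently."""
    return sorted(
        symbol
        for symbol, decls in merged.items()
        if len(decls) != 1 or len(next(iter(decls.values()))) != platform_count
    )


if __name__ == "__main__":
    # Made-up extraction results, one per development environment.
    extracted = {
        "windows": {"ExampleSize": "uint32", "ExampleFlags": "uint32"},
        "linux": {"ExampleSize": "uint64", "ExampleFlags": "uint32"},
        "macos": {"ExampleSize": "uint64", "ExampleFlags": "uint32"},
    }
    merged = merge_platforms(extracted)
    print(platform_specific(merged, len(extracted)))
```

Only the symbols reported by `platform_specific` would need per-platform handling in the generated C#; everything else can be emitted once.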

@lithiumtoast
Sponsor

lithiumtoast commented Apr 2, 2024

Posting here with an update.

I re-wrote and renamed CAstFfi to https://github.com/bottlenoselabs/c2ffi. I did this while setting up end-to-end tests for GitHub Actions, making sure everything works correctly (so far) on all desktop platforms (Windows, macOS, Ubuntu) when using libclang. What's changed from CAstFfi is simpler logic, various bug fixes, and an updated data model for the machine-readable FFI that follows libclang more closely.

My plan is to update https://github.com/bottlenoselabs/c2cs to use c2ffi this week, then finish creating that example repository for bindgen of SDL. I expect to have that repository ready for a first round of feedback on April 9th.

@lithiumtoast
Sponsor

lithiumtoast commented Apr 14, 2024

Hello, I have things ready for some feedback. Apologies for the timeline being a bit later than what I mentioned earlier.

You can find the generated C# code here from the latest SDL main commit: https://github.com/lithiumtoast/SDL-cs/blob/main/src/cs/production/SDL/Generated/SDL.gen.cs
The GitHub Actions workflow run is here, where you can find the machine-readable files in the artifacts: https://github.com/lithiumtoast/SDL-cs/actions/runs/8680613309
If there are any questions about how it works I would be happy to explain.

At this time it doesn't include SDL_image and friends. However, it should be sufficient to have some discussion and ask questions.

I'll get us started:

  1. Is there a specific branch of the SDL repository to use for bindgen? I'm assuming main is fine for SDL3.

  2. I see that SDL2# uses tabs. The currently generated C# code is using spaces. Do you want the generated code to be using tabs?

  3. The generated code as it is right now has a dependency on the following code via NuGet: https://github.com/bottlenoselabs/c2cs/tree/main/src/cs/production/C2CS.Runtime. I think you would like this supporting code to be added directly to the bindgen to remove the dependency. What do you think? Should the generated C# code have zero dependencies?

  4. Some header files from SDL have been ignored for the purposes of the bindgen. You can see which ones here: https://github.com/lithiumtoast/SDL-cs/blob/94ca7d6d863aad5169d0bac86afc1cf432728a69/bindgen/config-extract.json#L12. Is this assumption okay? Should the generated C# code cover all of SDL? Or is there only a subset that should be covered for the purposes of C#?

  5. On that note, it's unfortunate that SDL_stdinc.h has so many functions that are not useful in C# but SDL_Free is in that header. You can see in the extract config mentioned above that a lot of functions and macro objects are ignored from SDL_stdinc.h. Perhaps the case could be made to move the memory-related stuff to its own header? Is it possible to make changes to the SDL repository for the purposes of bindgen?

  6. Since you mentioned that .NET 4.0 is the minimum spec, it seems like there will need to be two generated versions of the C# code: one for the .NET 4 target and one for the .NET 8+ target. Is that okay? How much can these versions differ from each other? The reason is that modern .NET has made some significant changes to interoperability, some of which are fairly good for performance:

    • C# function pointers for callbacks from C. In the .NET 4 generated version, C# delegates will have to be used for callbacks. This already makes the two versions of the C# bindings significantly different. Is that okay?

    • Disabled runtime marshalling. For example, System.Boolean is properly mapped to bool in C with runtime marshalling disabled (see CBool in the C2CS.Runtime for when it is enabled by default). The only "undesired" consequence is that ref and out are not supported, so unsafe pointers will need to be used in C#. Is this taking it too far? I suppose that if it is disabled, the P/Invoke functions in C# could have some generated "wrappers" as a compromise. Should runtime marshalling be disabled in the .NET 8+ version?

    • UTF8 string literals. I have generally been handling char * as a CString in C2CS.Runtime. I see that in SDL2#, like you mentioned, the approach was to just have wrapper methods that use System.String in C#. I think it would be better if the .NET 8+ version of the bindgen used UTF8 string literals wherever possible.

  7. Please take a look at the generated code mentioned at the beginning. Is there anything else not mentioned that would need to be changed?

@kg
Sponsor

kg commented Apr 14, 2024

  1. The generated code as it is right now has a dependency on the following code via NuGet: https://github.com/bottlenoselabs/c2cs/tree/main/src/cs/production/C2CS.Runtime. I think you would like this supporting code to be added directly to the bindgen to remove the dependency. What do you think? Should the generated C# code have zero dependencies?

Aside from the "nuget or not" question, it would be ideal from a security/supply-chain perspective if the NuGet is one that's built and distributed by the SDL organization and not an unknown third party.

  1. Since you mentioned that .NET 4.0 is the minimum spec, it seems like there will need to be two generated versions of the C# code: one for the .NET 4 target and one for the .NET 8+ target. Is that okay? How much can these versions differ from each other? The reason is that modern .NET has made some significant changes to interoperability, some of which are fairly good for performance:

If you're not aware, you can use the NET8_0_OR_GREATER preprocessor define in C# if you want to generate a single file that supports both runtimes. I don't know if that will be the ideal approach, though.
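For the single-file approach, the generator would emit both spellings of a member inside a preprocessor conditional. A small sketch, where the `conditional_member` helper and the example field are hypothetical:

```python
def conditional_member(modern, legacy, symbol="NET8_0_OR_GREATER"):
    """Wrap two C# spellings of one member in a preprocessor conditional."""
    return "\n".join(["#if " + symbol, modern, "#else", legacy, "#endif"])


# Illustrative struct field: a function pointer on modern .NET, a raw pointer otherwise.
print(conditional_member(
    "public unsafe delegate* unmanaged[Cdecl]<IntPtr, void> callback;",
    "public IntPtr callback;",
))
```

The trade-off is that every member that differs between runtimes appears twice in the generated file, which is the kind of clutter raised earlier in the thread about SDL2#.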

@flibitijibibo
Owner Author

Will definitely do a deep dive tomorrow, but to answer what I can ASAP...

Is there a specific branch of the SDL repository to use for bindgen? I'm assuming main is fine for SDL3.

Yep, we'll be using the main branch until SDL4 starts.

I see that SDL2# uses tabs. The currently generated C# code is using spaces. Do you want the generated code to be using tabs?

It'd be nice to have, but since it's autogenerated anyway it's not too big a deal what the exact style is.

The generated code as it is right now has a dependency on the following code via NuGet: https://github.com/bottlenoselabs/c2cs/tree/main/src/cs/production/C2CS.Runtime. I think you would like this supporting code to be added directly to the bindgen to remove the dependency. What do you think? Should the generated C# code have zero dependencies?

I'd say remove them if possible; we were able to do all our bindings without dependencies up until now so it should be possible to keep that going.

Some header files from SDL have been ignored for the purposes of the bindgen. You can see which ones here: https://github.com/lithiumtoast/SDL-cs/blob/94ca7d6d863aad5169d0bac86afc1cf432728a69/bindgen/config-extract.json#L12. Is this assumption okay? Should the generated C# code cover all of SDL? Or is there only a subset that should be covered for the purposes of C#?

Audio and IOStream might get used (it's rare but some real use cases have come up before).

On that note, it's unfortunate that SDL_stdinc.h has so many functions that are not useful in C# but SDL_Free is in that header. You can see in the extract config mentioned above that a lot of functions and macro objects are ignored from SDL_stdinc.h. Perhaps the case could be made to move the memory-related stuff to its own header? Is it possible to make changes to the SDL repository for the purposes of bindgen?

Sam/Ryan might consider something like SDL_math.h, stdint, etc, but worst case we can ignore this header and generate the allocator functions ourselves.

Since you mentioned that .NET 4.0 is the minimum spec, it seems like there will need to be two generated versions of the C# code: one for the .NET 4 target and one for the .NET 8+ target. Is that okay? How much can these versions differ from each other? The reason is that modern .NET has made some significant changes to interoperability, some of which are fairly good for performance:

This is fine with me - as long as the code using the functions isn't any different it's okay for the generated stuff to do more optimal things internally (we have a handful of these in SDL2.cs). If it's the case that it would break ABI to have two different versions, the ABI should prefer the old way over the new way (someone making their own bindings with this generator could optimize for themselves if they wanted to though!).

Please take a look at the generated code mentioned at the beginning. Is there anything else not mentioned that would need to be changed?

Will take a look soon!

@lithiumtoast
Sponsor

If you're not aware, you can use the NET8_0_OR_GREATER preprocessor define in C# if you want to generate a single file that supports both runtimes. I don't know if that will be the ideal approach, though.

Yup, I am doing that in AssemblyAttributes.gen.cs, a supporting assembly attributes file that sits alongside the main generated C# file.

Speaking of which: the use of partial on the static class makes it possible for the generated C# code to be split across multiple files if that makes things easier. For example, it would be possible to have one generated C# file to match each C header, or to have one C# file for each struct, enum, etc.
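A sketch of the one-file-per-header variant, assuming the generator already knows which header each generated member came from. The `split_by_header` helper, the `.gen.cs` naming and the placeholder members are illustrative only.

```python
import os
from collections import defaultdict


def split_by_header(declarations, class_name="SDL"):
    """Group (header, C# member text) pairs into one partial-class file per header.

    Returns {file name: file contents}.
    """
    by_header = defaultdict(list)
    for header, member in declarations:
        by_header[header].append(member)

    files = {}
    for header, members in by_header.items():
        body = "\n".join("    " + member for member in members)
        file_name = os.path.splitext(header)[0] + ".gen.cs"
        files[file_name] = "public static partial class {}\n{{\n{}\n}}\n".format(
            class_name, body
        )
    return files


if __name__ == "__main__":
    generated = split_by_header([
        ("SDL_video.h", "public const int A = 1;"),
        ("SDL_audio.h", "public const int B = 2;"),
        ("SDL_video.h", "public const int C = 3;"),
    ])
    for name in sorted(generated):
        print(name)
        print(generated[name])
```

Because every file declares the same `partial` class, hand-written extras can live in their own file without touching the generated output.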

worst case we can ignore this header and generate the allocator functions ourselves

This could be a good example of using another C# file with partial to write supporting "extra" functions that should be rolled into the static class.

@lithiumtoast
Sponsor

lithiumtoast commented Apr 30, 2024

Update: April 29th 2024

https://github.com/lithiumtoast/SDL-cs

  • Removed the dependency on the NuGet package. I actually forgot I had an option for this already so this was really low-hanging fruit. I renamed the namespace bottlenoselabs.C2CS.Runtime to simply Bindgen.Runtime though, so there is no branding.

Remaining work:

  • Add a second GitHub Actions workflow (or roll more steps into the existing one) for the .NET Framework target to generate bindings with different options. The output directory for generated C# bindings will need to be adjusted with a base Target Framework Moniker (TFM). For example, ../Generated/net40/*.cs vs ../Generated/net80/*.cs.
  • Iterate on adding different configuration options to support .NET Framework vs. modern .NET. This will probably result in simplifying the configuration options into a single .NET version, with the existing options rolled into checks for whether a feature is supported in that version. For example, C# function pointers require at least C# 9, which corresponds to at least .NET 5, with a fallback to delegates if that version of .NET does not support them.
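One way the version-driven configuration could work is sketched below. The parsing rules, function names and return values are assumptions, and only the moniker spellings that appear in this thread (`net40`, `net80`) plus the dotted form (`net8.0`) are handled; monikers such as `net472` or `netstandard2.0` are not.

```python
import re


def parse_tfm(tfm):
    """Parse monikers such as 'net40', 'net80' or 'net8.0' into (major, minor)."""
    match = re.fullmatch(r"net(\d+)\.(\d+)", tfm) or re.fullmatch(r"net(\d)(\d)", tfm)
    if match is None:
        raise ValueError("unsupported TFM: " + tfm)
    return int(match.group(1)), int(match.group(2))


def callback_style(tfm):
    """C# function pointers need C# 9 / .NET 5; older targets fall back to delegates."""
    return "function_pointer" if parse_tfm(tfm) >= (5, 0) else "delegate"


def output_dir(tfm):
    """Directory for the generated files of one target, following ../Generated/<tfm>/."""
    return "../Generated/{}".format(tfm)


if __name__ == "__main__":
    for tfm in ("net40", "net80"):
        print(tfm, callback_style(tfm), output_dir(tfm))
```

Other feature switches (UTF8 literals, disabled runtime marshalling, `Span<T>`) could be added as further functions of the parsed version in the same way.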

@lithiumtoast
Sponsor

lithiumtoast commented May 8, 2024

Update: May 8th 2024

Apologies for the delay as I caught COVID and was not functioning most of the time.

https://github.com/lithiumtoast/SDL-cs

  • Added support for generating code that is compatible with .NET Framework 4.0 and .NET 8.
    • A Target Framework Moniker (TFM) is used as input to determine what version of .NET is available for the generated code.
    • Lack of support for Span<T> is pretty grim in .NET Framework 4.0. Polyfilling Span<T> with the NuGet package System.Memory is not even an option since it's only supported on .NET Framework 4.5+. Have to fall back to using arrays with a method/property for the edge case of fixed buffer fields inside structs. However, all structs are still 100% blittable.

Remaining work:

  • Fix a bug with enums not having values in c2ffi. For example, SDL_Scancode is not being generated correctly and is currently empty in the generated C# files.
  • Look into using ref and out for function parameters.
  • Generate code for SDL_image and friends.

@TerensTare

TerensTare commented Jun 17, 2024

A bit late to the party but here's my attempt repo, which uses Python with tree sitter to parse the sources and generate the bindings. Currently it needs libsdl-org/SDL#9907 to work properly or else in/out parameters cannot be distinguished properly.

I tried to stay as close as I could to the design of SDL2# and use old C# features as well, but I don't have an old compiler to find the minimum C# version needed.

Feel free to ask any questions or raise any concerns about the code.

Update: Here's some sample code that shows how to use the generated API: gist

@flibitijibibo
Owner Author

SDL_GPU's public interface is starting to slow down, so I'm going to try and put some time into this before we fully freeze - @lithiumtoast, is it possible for GitHub Actions to do a daily run that pulls the latest main branch of SDL and regenerates the bindings? That way we can try to get SDL3_FNAPlatform scribbled out without having to take away too much of your time!

@lithiumtoast
Sponsor

is it possible for GitHub Actions to do a daily run that pulls the latest main branch of SDL and regenerates the bindings

Yeah I can see what I can do this weekend.

@flibitijibibo
Owner Author

This ended up getting stomped on a bit by a new generator by @thatcosmonaut and @cryy22 - it uses an ffi.json but it otherwise is an all-new generator:

https://github.com/flibitijibibo/SDL3-CS/tree/main/GenerateBindings

It includes a lot of additional annotations to deal with ref/out, and falls back with a warning like WARN_UNKNOWN_POINTER_PARAMETER when the metadata doesn't cover a function's parameters yet.
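The annotate-or-warn behaviour can be pictured with the sketch below. It is not the actual GenerateBindings code: the `ANNOTATIONS` table, the `pointer_param` helper and the warning text format are assumptions; only the warning name comes from the comment above.

```python
import sys

# Hypothetical annotation table: (function, parameter) -> "ref" | "out" | "in".
ANNOTATIONS = {
    ("SDL_GetWindowSize", "w"): "out",
    ("SDL_GetWindowSize", "h"): "out",
}


def pointer_param(function, param, cs_type, annotations=ANNOTATIONS, warn=None):
    """Return the C# spelling of a pointer parameter.

    Falls back to IntPtr, and reports a warning, when no annotation exists yet.
    """
    if warn is None:
        warn = lambda message: print(message, file=sys.stderr)
    modifier = annotations.get((function, param))
    if modifier is None:
        warn("WARN_UNKNOWN_POINTER_PARAMETER: {}({})".format(function, param))
        return "IntPtr {}".format(param)
    return "{} {} {}".format(modifier, cs_type, param)


if __name__ == "__main__":
    print(pointer_param("SDL_GetWindowSize", "w", "int"))
    print(pointer_param("SDL_ExampleFunction", "data", "int"))
```

The fallback keeps the output compilable for functions nobody has annotated yet, while the warning marks where metadata still needs to be written.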

SDL3_FNAPlatform is in progress over here: FNA-XNA/FNA#494

There's still work to be done but we're getting close!
