Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for custom allocators #413

Merged
merged 1 commit into from
Aug 9, 2024

Conversation

ningvin
Copy link
Contributor

@ningvin ningvin commented Aug 7, 2024

Overview

Addresses the proposal in #372.
It indeed took a while until I got around to implement it and this change turned out to be more involved than I expected...

Disclaimer

I am mainly a C++ developer and only rarely write pure C. I tried to stick to the coding style / naming conventions / etc. used throughout MIR but I might have missed some things. You have been warned :-)

Implementation

The general purpose memory management routines (malloc, calloc, realloc, free) are bundled in struct MIR_alloc, which is just a collection of function pointers and some user data. Similarly, function pointers to executable code related memory management functions (mem_map, mem_unmap, mem_protect) are bundled in MIR_code_alloc.

Users wanting to provide custom allocators can do so by calling newly introduced function MIR_init2 (name to be discussed), which takes pointers to both a MIR_alloc and a MIR_code_alloc. Both can be NULL, which results in a default implementation being used. Both pointers are persisted in the MIR_context (fields alloc and code_alloc) for easy consumption by other functions.

The following convenience functions are provided to work with allocators and replace the "raw" functions of corresponding name:

  • void *MIR_malloc (MIR_alloc_t alloc, size_t size)
  • void *MIR_calloc (MIR_alloc_t alloc, size_t num, size_t size)
  • void *MIR_realloc (MIR_alloc_t alloc, void *ptr, size_t old_size, size_t new_size) (a little different than plain realloc, see next section)
  • void MIR_free (MIR_alloc_t alloc, void *ptr)

Similarly, for the executable code management functions:

  • void *MIR_mem_map (MIR_code_alloc_t code_alloc, size_t len)
  • int MIR_mem_unmap (MIR_code_alloc_t code_alloc, void *ptr, size_t len)
  • int MIR_mem_protect (MIR_code_alloc_t code_alloc, void *ptr, size_t len, MIR_mem_protect_t prot)

I initially planned to only feed MIR_alloc into the MIR_context and feed MIR_code_alloc instead into MIR_gen_init, but this turned out to be impossible without yet more involved refactorings.

As it currently stands, the core API (exposed by mir.h, mir-gen.h, etc.) did not change, i.e. existing projects should compile and run just like before without requiring any changes. Internal APIs (like mir-varr.h, mir-htab.h, etc.) did however change, though I tried to keep these to a minimum. More on this in the next section.

Potentially Controversial Design Decision

VARRs keep a pointer to their allocator

Both VARRs and HTABs (as well as bitmaps) now require passing an allocator when calling their create functions. I wanted to avoid also passing allocators when invoking other operations that might need to manage memory (push, expand, destroy, ...), as this would have severly cluttered the API.

Instead, I opted for VARRs storing an additional pointer to the allocator which was passed in when creating them, which is then used for further operations. Slightly increases their memory footprint though.

realloc takes the previous size as an additional parameter

Alternative title: C's and C++'s allocation APIs are not friends.

As I mentioned initially, I come from a C++ background. When designing the allocator interface for MIR, I tried to do it in a way that would also play somewhat nicely with what C++ has to offer. Sadly, what C++ has to offer is less than spectacular: std::allocator is a complete mess and its intended replacement std::pmr::polymorphic_allocator / std::pmr::memory_resource shares some of its quirks:

  • Both do not support a realloc-style operation
  • Both require users to pass the size of the allocation when calling deallocate

To support as wide a range of allocators as possible without complicating the API too much, I opted for the following compromise:

  • MIR_realloc takes the previous size as an additional parameter. This allows allocator implementations to translate calls to MIR_realloc into a allocate - memcpy - free sequence. As MIR_realloc is only called inside mir-varr.h, required changes on MIR's part were minimal.
  • MIR_free on the other hand does not take the previous size. This would have required a lot of changes on MIR's part as there are a decent amount of allocations whose size is dynamically determined at runtime. Users wishing to use an API requiring this will need to do their own bookkeeping.

Of course, we could always go more into one or the other direction and:

  • go with a fully C-style allocation interface, removing the quirks of MIR_realloc
  • go with a fully C++-style allocation interface and only provide MIR_alloc / MIR_free, requiring VARR to always copy manually upon expansion
  • or something other in between... Ah the choices :-)

Documentation

I added some documentation, both for users of MIR as a library as well as developers of MIR, in CUSTOM-ALLOCATORS.md.

Testing

This is partly covered by the existing tests, though it may be nice to add some allocator specific ones. While it would be pretty straightforward to test that a given allocator has been called, the other way around (ensuring that malloc, free, etc. are not called when a custom allocator is passed) is a bit harder. Not sure if there is some cross platform magic to override malloc and the others...

Things I am unsure about

  • Thread safety: As far as I can tell, users supplying custom allocators wishing to call MIR from multiple threads simply need to ensure their supplied functions are thread safe.

Conclusion

I hope this fits MIR and my ramblings did not scare you away already :-)
Let me know if there are things that need improvement / polish.

This change allows users to provide custom allocators during context
creation.
Instead of using `malloc`, `free`, etc. directly, MIR will then instead
use these user-provided allocators for memory management.
@vnmakarov vnmakarov merged commit 8ce5a06 into vnmakarov:master Aug 9, 2024
@vnmakarov
Copy link
Owner

Users wanting to provide custom allocators can do so by calling newly introduced function MIR_init2 (name to be discussed), which takes pointers to both a MIR_alloc and a MIR_code_alloc. Both can be NULL, which results in a default implementation being used. Both pointers are persisted in the MIR_context (fields alloc and code_alloc) for easy consumption by other functions.

Thank you for providing the compatibility.

As it currently stands, the core API (exposed by mir.h, mir-gen.h, etc.) did not change, i.e. existing projects should compile and run just like before without requiring any changes. Internal APIs (like mir-varr.h, mir-htab.h, etc.) did however change, though I tried to keep these to a minimum. More on this in the next section.

I don't consider mir-varr etc as part MIR API. So it is ok to me.

Documentation

Thank you for providing a good documentation (I'll fix one typo later)

Things I am unsure about

* **Thread safety:** As far as I can tell, users supplying custom allocators wishing to call MIR from multiple threads simply need to ensure their supplied functions are thread safe.

I think it is quite reasonable. All widely-used allocators already implement this feature and it should be if the allocator need to be high performant as providing thread-safety affect the allocator design.

Conclusion

I hope this fits MIR and my ramblings did not scare you away already :-) Let me know if there are things that need improvement / polish.

It is a big step forward. I think we will have time to polish it after getting feedback from MIR users. We need to finalize this only before the next MIR release which probably will not happen this year.

I think you should add yourself (as first person) to copyright to the new files. If you don't want it, you can add yourself as an author.

Thank you very much for the PR. I consider your API choices reasonable and most probably I'd made the analogous ones.
You did a big and good job. It is actually the biggest PR I received for MIR-project.

I merged the PR but found some problems. The biggest one is parallel bootstrap failure. So I reverted the patch. I'll continue to work on the patch and the PR you just sent recently on the next week. I feel it is important to add this code to MIR-project.

@ningvin
Copy link
Contributor Author

ningvin commented Aug 10, 2024

Thank you very much for the PR. I consider your API choices reasonable and most probably I'd made the analogous ones.
You did a big and good job.

Glad to hear that, happy I can contribute :-)

I think you should add yourself (as first person) to copyright to the new files. If you don't want it, you can add yourself as an author.

I have seen other projects adopt an AUTHORS.md file, e.g. the Godot Engine. They then proceed to simply reference that in their copyright, e.g. like so (for Godot they also break down the different development periods, not sure if that is necessary):

/* Copyright (c) 2014-present Godot Engine contributors (see AUTHORS.md). */
/* Copyright (c) 2007-2014 Juan Linietsky, Ariel Manzur.                  */

I do not feel particularly strong for any of these, so either would work for me (copyright / author / AUTHORS.md). The latter probably scales better for larger projects.

I'll continue to work on the patch and the PR you just sent recently on the next week.

Let me know if I can be of help :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants