
[WIP] IL2CPU IL-level Optimization #169

Draft
wants to merge 6 commits into base: master

Conversation

@ascpixi commented Oct 14, 2022

This pull request implements the base building blocks for IL-level optimization, alongside a set of basic optimization passes. The IL would normally be optimized by a JIT at run-time; as IL2CPU is an AOT compiler, these optimizations have to be performed ahead of time instead.

Optimization passes to implement:

  • Property inlining
  • Method inlining
  • Loop unrolling
  • Control flow reordering
  • Redundant instruction elimination

This is a work-in-progress pull request.
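
Conceptually, the building blocks boil down to a pass-based optimizer. Below is a minimal sketch of what that could look like; the interface, the instruction model, and the signatures are illustrative assumptions (the pass names are taken from the discussion below), not the actual code in this PR.

```csharp
// Illustrative sketch only - the real Optimizer and pass types are likely
// shaped differently, and ILInstruction is a stand-in for whatever
// instruction model IL2CPU uses internally.
using System.Collections.Generic;

record ILInstruction(string OpCode, object Operand = null);

// A pass rewrites the IL of a single method in place.
interface IOptimizationPass
{
    void Run(List<ILInstruction> method);
}

// Cheap pass: rewrites calls to direct (auto-implemented) property accessors
// into plain field loads/stores, with no callee analysis required.
class InlineDirectPropertiesPass : IOptimizationPass
{
    public void Run(List<ILInstruction> method) { /* ... */ }
}

// More expensive pass: inlines small methods, which requires local analysis
// and instruction correction at every inlined call site.
class InlineMethodsPass : IOptimizationPass
{
    public void Run(List<ILInstruction> method) { /* ... */ }
}

class Optimizer
{
    // Cheap, targeted passes run first so the costlier ones have less to do.
    readonly List<IOptimizationPass> passes = new()
    {
        new InlineDirectPropertiesPass(),
        new InlineMethodsPass(),
    };

    public void Optimize(List<ILInstruction> method)
    {
        foreach (var pass in passes)
            pass.Run(method);
    }
}
```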

@terminal-cs (Contributor)

What kind of benefits will this have? Faster running speeds, smaller compile size, faster compile times, etc.?

@ascpixi (Author) commented Oct 14, 2022

> What kind of benefits will this have? Faster running speeds, smaller compile size, faster compile times, etc.?

Smaller compile size and faster running speeds. Compile times will increase, but I plan to address this in a future PR; for now, if you need fast compilation, you can simply disable optimization in your build profile.

@terminal-cs (Contributor)

Alrighty, what kind of performance gains can we expect? Anything significant?

@ascpixi (Author) commented Oct 14, 2022

> Alrighty, what kind of performance gains can we expect? Anything significant?

As new passes get added, you can expect quite sizable performance gains. For now, there is only a direct property inlining pass, which drastically improves performance when using properties: it removes the need for the CPU to jmp to a memory address, meaning the pipeline does not get cleared. The performance is the same as if you had used a field, because the IL call instruction is directly replaced with stfld/stsfld.
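
As a rough illustration (hypothetical class and member names, not taken from this PR):

```csharp
// Hypothetical example - illustrative names only.
class Player
{
    // A "direct" (auto-implemented) property: its accessors only touch the
    // compiler-generated backing field.
    public int Health { get; set; }
}

class Demo
{
    static void Main()
    {
        var player = new Player();

        // Unoptimized IL:  call/callvirt set_Health(int32) -> jump into the setter.
        // After the pass:  stfld on the backing field      -> no call, no jump.
        player.Health = 100;

        // A getter call can likewise be replaced with a plain ldfld.
        System.Console.WriteLine(player.Health);
    }
}
```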

Other planned features, such as method inlining and control flow reordering, will boost performance even more. Method inlining will avoid jmps altogether, which will drastically improve performance in loops, and control flow reordering will prioritize the branch that is most likely to be taken, reducing the number of jumps in that scenario as well.
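
To make the method inlining point concrete, here's a hedged sketch (hypothetical code, not part of this PR) of what inlining effectively does to a loop:

```csharp
// Hypothetical example - illustrative only.
static class InliningDemo
{
    static int Square(int x) => x * x;

    static int SumOfSquares(int[] values)
    {
        int sum = 0;
        foreach (int v in values)
            sum += Square(v);   // each iteration: call -> jmp into Square and back
        return sum;
    }

    // After method inlining, the loop behaves as if it had been written as:
    static int SumOfSquaresInlined(int[] values)
    {
        int sum = 0;
        foreach (int v in values)
            sum += v * v;       // callee body pasted in; no call, no jump
        return sum;
    }
}
```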

@ascpixi (Author) commented Oct 14, 2022

This article covers a good portion of the optimizations that a JIT would normally perform (and that, in our case, the Optimizer class has to perform, as we lack a JIT).

@zarlo (Member) commented Oct 15, 2022

Method inlining. yes, yes, yes, yes 1000 times yes

This would let us speed up the current canvas with little work.

@quajak (Member) commented Oct 15, 2022

This is a great PR! The approach looks very sensible for now. Regarding optimization, a very big improvement would be to figure out when we actually need to push and pop values to the stack, and when we can just keep them in registers.

@terminal-cs (Contributor)

What about compile times and sizes, how will those be affected?

@ascpixi (Author) commented Oct 25, 2022

> What about compile times and sizes, how will those be affected?

Compile times will be extended, as the compiler will need to perform extra passes over each method. Depending on the complexity of the pass, it can take the compiler anywhere from a millisecond to a full second to process a method. For example, if a method has a lot of calls that can be inlined, the InlineMethodsPass (not yet committed) will need to perform local analysis and instruction correction for each inlined method call.

This is why there are additional passes like InlineDirectPropertiesPass that will inline every direct property without the need for any method analysis, reducing the load on InlineMethodsPass, which will perform a (relatively) more complex method analysis routine.

As for binary size, this may vary depending on the set of optimization passes you'll be using. Method inlining will introduce a few more bytes to the final binary, but redundant instruction elimination should balance that out. IIRC, IL2CPU already only compiles in the methods it will need, as it uses a scanner. The reason the final kernel binary is so big is that Cosmos initializes a large majority of devices for you, even if you're not going to use them; so, for example, the network driver will be initialized outside of your kernel code, meaning you don't really have a choice in whether it gets included or not.

As an example, the CAI can be used, as it's extensible and allows kernel authors to choose whether to enable it or not. Compile a kernel that doesn't reference any CAI classes, and then search for AudioBuffer in the assembly file IL2CPU creates; you'll find that no references to that class exist. After adding an audio card initialization routine and re-compiling, you'll notice that these references get created.
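
For reference, the kind of "audio card initialization routine" meant here is roughly the one shown in the Cosmos audio documentation; treat the exact namespaces, class names, and signatures in this sketch as assumptions, since they may differ from the shipped CAI API.

```csharp
// Rough sketch of a CAI-based audio setup (namespaces and member names are
// assumptions; consult the Cosmos audio docs for the exact API).
using Cosmos.HAL.Drivers.Audio;
using Cosmos.System.Audio;
using Cosmos.System.Audio.IO;

class AudioExample
{
    static void EnableAudio(byte[] wavBytes)
    {
        var mixer = new AudioMixer();
        var stream = MemoryAudioStream.FromWave(wavBytes);
        var driver = AC97.Initialize(bufferSize: 4096);

        mixer.Streams.Add(stream);

        var audioManager = new AudioManager()
        {
            Stream = mixer,
            Output = driver,
        };
        audioManager.Enable();

        // Once a routine like this is referenced from kernel code, the scanner
        // pulls in the related CAI types (e.g. AudioBuffer); before that, no
        // references to them show up in the emitted assembly.
    }
}
```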

In the cases mentioned previously, the optimizer can't really help you, as it can't simply take out code that it knows a part of the kernel uses; not only would that be dangerous, but that burden shouldn't lie on the compiler at all. A solution would be to refactor all drivers whose initialization can be delegated so that it happens through public API methods (like the CAI).
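
A hedged sketch of this refactor idea (all type names hypothetical): move unconditional driver initialization out of the kernel's startup path and behind a public API method, so the scanner only pulls the driver in when user code actually calls it.

```csharp
// Hypothetical sketch - none of these types exist in Cosmos under these names.
static class HypotheticalNetworkDriver
{
    public static void Initialize() { /* talk to the hardware... */ }
}

// Today (conceptually): the driver is initialized unconditionally during
// kernel startup, so the scanner must always compile it in.
static class EagerStartup
{
    public static void BeforeRun() => HypotheticalNetworkDriver.Initialize();
}

// After the refactor: initialization is only reachable through a public API
// method (the CAI model), so it is compiled in only when a kernel calls it.
static class Network
{
    public static void Enable() => HypotheticalNetworkDriver.Initialize();
}
```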

TL;DR: this comment.

@MishaTy (Contributor) commented Oct 25, 2022

It is also big because each plug is included, even if it may not be used.

@quajak (Member) commented Oct 31, 2022

Are all plugs included? They get scanned, but I would expect only the required plugs to actually be emitted.

@ascpixi (Author) commented Jan 11, 2023

This PR is currently inactive, but as new IL optimizers have come out lately, there is a possibility of using such a project (like DistIL) to reduce the amount of work we would need to do inside IL2CPU. Optimization itself can introduce a lot of buggy behavior, so a lot of upkeep would be required to keep this stable (or, at least, stable by IL2CPU standards).

However, I won't close this PR, as it's not yet confirmed whether these projects would be suitable for IL2CPU; it might be the case that writing an external IL optimizer, suited to IL2CPU but not directly associated with or exclusive to it, would be the best option here.

If anyone wants to take over this PR, let me know, as I'm currently occupied with research into operating system development with NAOT.
