-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add e2e tests that model switching costs and add func arg lowering for the air pipeline #566
Conversation
96adfab
to
576f8e0
Compare
576f8e0
to
c24f9bb
Compare
There are 2 tests which repeat the same matmul N times. Can you update the comments in the tests specifying what they're testing, so that it's clear they're both useful? Have you run this locally with success? CI has numerical errors which aren't zeros... is the offset thing not working? FYI I created a task which is slightly related (compiler side though, not runtime) #541 I think these are useful numerical tests, so happy include them here. But for benchmarking runtime overheads, should we also have some separate code, something along the lines of #415 ? |
Yes I can update the comments, what they are testing is that what happens if you dont switch and just keep calling the same kernel, so they establish the baseline for the switching case, Ya the CI caught an error that I didnt locally becuase I was using uniform inputs, its a fun one where we are wrongly swapping argument locations on the hal.interface.bindings (kind of stuff that is tickled with these "serious" models :) )
The way I am using these tests are with Tracy profiles which give me ns precision timing of all runtime functions,. Of course #415 is useful for the whole model performance benchmarking though, so we should have that for all these "models" too. |
so far havent seen any issues but thats a good feature! |
Thanks, looks good to me. I've left only minor suggestions, so accepting in advance.
Nice fix! I would probably have done this as a separate PR, but they're both pretty small so fine. |
Another change we are making and will have in all iree-compile commands running something beyond a single dispatch is the use of
--iree-scheduling-optimize-bindings=false
. This is needed because there is currently no way to pass runtime constants to generated kernel. In a typical iree backend they are used to pass two thingswith the above flag we wont generate offseted buffers and we dont support dynamic shapes yet so for now this flag allows us to run models where such offsets would otherwise occur.
It was also discovered that the lowering used by the air pipeline does not respect the binding number for the hal.binding ops so we lower them before the
AIRRtToNpuPass
pass can do it.