set NOMINSIZE, LTO #796

Merged
merged 1 commit into from
Oct 18, 2024

Conversation

matthiasdiener
Contributor

Setting NOMINSIZE and LTO results in a ~10% speedup in enqueue time for a simple microbenchmark like:

import pyopencl as cl
import time

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

prg = cl.Program(ctx, """
__kernel void sum()
{

}
""").build()


knl = prg.sum

start = time.time()

for _ in range(100000):
    knl(queue, (1,), None)

# Intentionally disabled, we are only interested in the enqueue time
# queue.finish()

print(time.time() - start)
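For comparing builds before and after this change, a per-call figure can be easier to read than a total. The following is a minimal sketch, not part of the PR: `time_per_call` and its parameters are hypothetical names, and it assumes that taking the best of several repeats is an acceptable way to reduce noise.

```python
import time


def time_per_call(fn, n_calls=100_000, n_warmup=1_000, repeats=5):
    """Return the best observed average time per call of fn, in seconds.

    Hypothetical helper for illustration; not part of pyopencl.
    """
    # Warm up so one-time setup costs do not skew the measurement.
    for _ in range(n_warmup):
        fn()

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(n_calls):
            fn()
        elapsed = time.perf_counter() - start
        # Keep the fastest repeat as the least-disturbed estimate.
        best = min(best, elapsed / n_calls)
    return best
```

With the `knl` and `queue` objects from the microbenchmark above, it could be called as `time_per_call(lambda: knl(queue, (1,), None))`. As in the original loop, nothing waits on the queue, so only the enqueue cost is measured.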

@inducer inducer merged commit f842a99 into inducer:main Oct 18, 2024
17 checks passed
@inducer
Owner

inducer commented Oct 18, 2024

Thanks!

@matthiasdiener matthiasdiener deleted the nominsize branch October 18, 2024 18:59