Leaderboard requirements #40


msaroufim commented Dec 5, 2024

So right now the bot can take in Python and CUDA programs, run them, and post stdout. Instead, we'd like the runtime and ncu outputs of a solution to be saved in an artifact so we can render a leaderboard and have people compete on producing the fastest kernel for a given target.

There are two goals:

  1. Produce a large set of useful kernels that can be used to train an LLM to produce better kernels
  2. Create a fun feedback loop where GPU MODE folks can go from watching a lecture to writing their first performant kernel

And we'd be targeting a launch in January 2025
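As a concrete example of what saving run results as an artifact could look like, here is a rough sketch (not the current bot code) that captures stdout, wall-clock runtime, and ncu output into a JSON file; the paths, flags, and field names are assumptions:

```python
# Rough sketch (not the current bot code) of capturing runtime and ncu output
# into a JSON artifact instead of only posting stdout. Paths, flags, and field
# names are assumptions.
import json
import subprocess
import time

def run_and_profile(binary: str, artifact_path: str = "artifact.json") -> dict:
    # Run the compiled kernel once and capture stdout plus wall-clock time.
    start = time.perf_counter()
    run = subprocess.run([binary], capture_output=True, text=True, check=True)
    wall_clock_s = time.perf_counter() - start

    # Profile the same binary with Nsight Compute; --csv keeps the output easy to parse.
    ncu = subprocess.run(["ncu", "--csv", binary], capture_output=True, text=True)

    artifact = {
        "stdout": run.stdout,
        "wall_clock_s": wall_clock_s,
        "ncu_output": ncu.stdout,
    }
    with open(artifact_path, "w") as f:
        json.dump(artifact, f, indent=2)
    return artifact
```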

Dimensions of the competition

Even for a simple kernel like softmax, llm.c has over 700 LOC dedicated to various variants (https://github.com/karpathy/llm.c/blob/master/dev/cuda/softmax_forward.cu), so we could start with a competition to produce the fastest softmax. Even for softmax, users can compete on:

  1. The GPU type: a softmax kernel tuned for an H100 won't look like one tuned for a T4
  2. Dtype: fp8, fp16
  3. Input shape

One benefit of having a community, per Jordan Jurafsky, is that people can tell us what kernels they find interesting, but the "fastest softmax" in the west could be an interesting dimension for a first practice round.

Table schema

We'd likely need a few tables where new kernels can be added, where some likely columns are:

  1. Problem table: reference code or a UUID for the specific problem setting. The idea here is that we want companies and individuals to submit "interesting kernels" they want people to compete on
  2. Submission information: code for the submission, Discord username of the person who submitted it, time of submission
  3. Run information: a UUID pointing to run details, which would include stdout and ncu outputs

We also need some versioning for our benchmarking setup
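To make that concrete, here is one way the three tables could be sketched as plain Python dataclasses; the exact column names and the benchmark_version field are assumptions, not a settled schema:

```python
# Rough sketch of the three tables as Python dataclasses; column names and the
# benchmark_version field are assumptions to make the idea concrete.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from uuid import UUID

@dataclass
class Problem:
    problem_id: UUID
    name: str                  # e.g. "softmax_forward"
    reference_code: str        # PyTorch reference implementation
    created_by: str            # Discord username of whoever submitted the problem
    bounty: Optional[str] = None

@dataclass
class Submission:
    submission_id: UUID
    problem_id: UUID
    code: str                  # the submitted kernel
    discord_username: str
    submitted_at: datetime

@dataclass
class Run:
    run_id: UUID
    submission_id: UUID
    gpu: str                   # e.g. "H100", "T4"
    dtype: str                 # e.g. "fp16"
    shape: str
    stdout: str
    ncu_output: str
    mean_runtime_ms: float
    benchmark_version: str     # versioning for the benchmarking setup
```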

NCU outputs

One thing we'd also like to learn is which NCU outputs experts look at when figuring out how to optimize their kernels, so we're open to ideas for how to add telemetry to figure out what people are looking at:

  1. Make each ncu output a collapsible field that people have to expand
  2. Put the whole results in the equivalent of an online Excel spreadsheet, track where people move their mouse, and collect that data

Discord submission flow

Similarly to /run modal/github train.py, we want a run leaderboard <kernel_problem> <dtype> <GPU> <shape> train.py

GPU could probably be implicit; we can run "all".

Then the backend needs to take in this kernel, run it, make sure it matches the correctness of a reference, and if it does, time it and rank it among all existing solutions on the leaderboard.

Optionally, we'd also want a run leaderboard <kernel_problem> without a train.py to give the top entries with links to their code.

And finally a run new_leaderboard_problem where people get a few fields: the problem name, the reference solution, an optional bounty, and the Discord name of the person who created that kernel.
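On the bot side, a rough sketch of what the submission command could look like with discord.py app commands follows; the command name, option names, and the run_submission stub are assumptions, not the actual bot code:

```python
# Rough sketch of the submission command using discord.py app commands; the
# command name, options, and run_submission stub are assumptions.
import discord
from discord import app_commands

class LeaderboardBot(discord.Client):
    def __init__(self):
        super().__init__(intents=discord.Intents.default())
        self.tree = app_commands.CommandTree(self)

bot = LeaderboardBot()

async def run_submission(problem: str, dtype: str, gpu: str, shape: str, code: str) -> str:
    # Placeholder for the backend: check correctness against the reference,
    # time the kernel, then rank it among existing leaderboard entries.
    return f"{problem} ({dtype}, {gpu}, {shape}): queued for benchmarking"

@bot.tree.command(name="leaderboard", description="Submit a kernel to a leaderboard problem")
@app_commands.describe(problem="Kernel problem name", dtype="e.g. fp16",
                       gpu="Target GPU or 'all'", shape="Input shape, e.g. 4096x4096")
async def leaderboard(interaction: discord.Interaction, problem: str, dtype: str,
                      gpu: str, shape: str, submission: discord.Attachment):
    await interaction.response.defer()  # benchmarking takes longer than Discord's 3s window
    code = (await submission.read()).decode()
    result = await run_submission(problem, dtype, gpu, shape, code)
    await interaction.followup.send(f"{interaction.user.name}: {result}")
```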

What is a reference

On random inputs:

  • PyTorch code
  • Some tolerance values
  • Optional: Latency target
  • Cold starts
  • Number of runs to average
  • If benchmarking methodology proves incorrect, how do we track and invalidate or rerun old results
  • Top-level metrics: wall clock, ncu output, peak memory
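Putting those pieces together, a minimal sketch of checking a submission against a PyTorch reference on random inputs and timing it might look like the following; the softmax problem, tolerance values, and timing loop are illustrative assumptions, not the final benchmarking methodology:

```python
# Minimal sketch of checking a submission against a PyTorch reference and timing it.
# The softmax problem, tolerances, shapes, and run counts are illustrative assumptions.
import torch

def reference(x: torch.Tensor) -> torch.Tensor:
    return torch.softmax(x, dim=-1)

def check_and_time(submitted_kernel, shape=(4096, 4096), dtype=torch.float16,
                   rtol=1e-2, atol=1e-3, n_runs=100):
    x = torch.randn(shape, dtype=dtype, device="cuda")  # random inputs

    # Correctness: compare against the PyTorch reference within tolerance.
    torch.testing.assert_close(submitted_kernel(x), reference(x), rtol=rtol, atol=atol)

    # Warmup to avoid counting cold starts, then average wall clock over n_runs.
    for _ in range(10):
        submitted_kernel(x)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(n_runs):
        submitted_kernel(x)
    end.record()
    torch.cuda.synchronize()
    mean_ms = start.elapsed_time(end) / n_runs

    # Peak memory is one of the top-level metrics alongside wall clock and ncu output.
    peak_mem = torch.cuda.max_memory_allocated()
    return {"mean_ms": mean_ms, "peak_memory_bytes": peak_mem}
```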

Meeting minutes Dec 5

  • Compiler flags should be user-determined
  • Schema really needs to be locked down
  • Do people submit the launcher or just the kernel?
  • What do driver scripts look like for different languages?
  • How to include unverified submissions: we don't save their code, but mark them as unverified and keep them as a reference bar
  • LLM-generated kernels: have Claude as a baseline
  • Website without Discord so people can investigate the results; I agree that Discord is the submission flow and the website is for result investigation
  • How to break apart the work