Right now the bot can take in Python and CUDA programs, run them, and post stdout. Instead, we'd like the runtime and ncu outputs of a solution to be saved in an artifact so we can render a leaderboard and have people compete on producing the fastest kernel for a given target.
There are 2 goals:
Produce a large set of useful kernels that can be used to train an LLM to produce better kernels
Create a fun feedback loop where gpu mode folks can go from watching a lecture to writing their first performant kernel
GPU type: a softmax kernel tuned for an H100 won't look like one tuned for a T4
Dtype: fp8, fp16
Input shape
One benefit of having a community, per Jordan Jurafsky, is that people can tell us which kernels they find interesting; the "fastest softmax in the west" could be an interesting dimension for a first practice round.
Table schema
We'd likely need a few tables where new kernels can be added; some likely columns are:
Problem table: reference code or a UUID for the specific problem setting. The idea is that companies and individuals can submit "interesting kernels" they want people to compete on
Submission information: the code for the submission, the Discord username of the person who submitted it, and the time of submission
UUID to run information, which would include stdout and ncu outputs
We also need some versioning for our benchmarking setup
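As a rough starting point, the three tables might look like the following sqlite sketch. All table and column names here are assumptions, not a locked-down schema; the benchmark_version column on runs is one way to carry the versioning mentioned above.

```python
import sqlite3

# Illustrative schema only: names and types are assumptions, not a final design.
SCHEMA = """
CREATE TABLE problems (
    problem_id     TEXT PRIMARY KEY,  -- UUID for the specific problem setting
    name           TEXT NOT NULL,
    reference_code TEXT NOT NULL,     -- reference implementation to check correctness against
    created_by     TEXT NOT NULL      -- Discord name of the problem author
);
CREATE TABLE submissions (
    submission_id    TEXT PRIMARY KEY,
    problem_id       TEXT NOT NULL REFERENCES problems(problem_id),
    code             TEXT NOT NULL,   -- the submitted kernel
    discord_username TEXT NOT NULL,
    submitted_at     TEXT NOT NULL    -- time of submission
);
CREATE TABLE runs (
    run_id            TEXT PRIMARY KEY,  -- UUID to run information
    submission_id     TEXT NOT NULL REFERENCES submissions(submission_id),
    stdout            TEXT,
    ncu_output        TEXT,
    wall_clock_ms     REAL,
    benchmark_version TEXT NOT NULL      -- versioning for the benchmarking setup
);
"""

def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched leaderboard schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Splitting submissions from runs means a submission can be re-run (e.g. after a benchmark-version bump) without losing the original code or author.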
NCU outputs
One thing we'd also like to learn is which NCU outputs experts look at when figuring out how to optimize their kernels, so we're open to ideas for how to add telemetry that shows what people are looking at:
Make each NCU output a field that people have to expand
Put the whole results in the equivalent of an online Excel spreadsheet, see where people move their mouse, and collect that data
Discord submission flow
Similarly to /run modal/github train.py, we want a run leaderboard <kernel_problem> <dtype> <GPU> <shape> train.py command
GPU could probably be implicit; we can run "all"
Then the backend needs to take in this kernel, run it, and make sure it matches the correctness of a reference; if it does, time it and rank it among all existing solutions in the leaderboard.
Optionally, we'd also want run leaderboard <kernel_problem> without a train.py to show the top entries with links to their code.
And finally, a run new_leaderboard_problem where people fill in a few fields: the problem name, the reference solution, an optional bounty, and the Discord name of the person who created that kernel.
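Before any Discord plumbing, the three command shapes above could be handled by a plain string parser. This is a sketch only: the action names and returned dict layout are assumptions, and nothing Discord-specific is modeled.

```python
import shlex

def parse_command(message: str) -> dict:
    """Parse the three proposed leaderboard commands from a raw message string.

    Forms handled (argument order follows the proposal above):
      run leaderboard <kernel_problem> <dtype> <GPU> <shape> train.py
      run leaderboard <kernel_problem>
      run new_leaderboard_problem <fields...>
    """
    parts = shlex.split(message)
    if parts[:2] == ["run", "new_leaderboard_problem"]:
        # remaining fields: problem name, reference solution, optional bounty, creator
        return {"action": "new_problem", "fields": parts[2:]}
    if parts[:2] == ["run", "leaderboard"]:
        args = parts[2:]
        if len(args) == 1:
            # no train.py attached: just show the top entries for this problem
            return {"action": "show_top", "problem": args[0]}
        if len(args) == 5:
            problem, dtype, gpu, shape, script = args
            return {"action": "submit", "problem": problem, "dtype": dtype,
                    "gpu": gpu, "shape": shape, "script": script}
    raise ValueError(f"unrecognized command: {message!r}")
```

For example, parse_command("run leaderboard softmax fp16 H100 1024x1024 train.py") yields a submit action with all four dimensions filled in.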
What is a reference?
On random inputs
PyTorch code
Some tolerance values
Optional: Latency target
Cold starts
Number of runs to average
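Putting the pieces above together (random inputs, tolerances, a warmup for cold starts, a run count to average), a correctness-then-timing harness might look like this CPU-only sketch. The function names, defaults, and the pure-Python softmax are all illustrative stand-ins; a real harness would run the PyTorch reference and the submitted kernel on the GPU.

```python
import math
import random
import time

def softmax_ref(xs):
    """Stand-in reference implementation (the real one would be PyTorch code)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def evaluate(candidate, n=1024, rtol=1e-4, atol=1e-6, n_runs=10, seed=0):
    """Check a candidate against the reference on random inputs, then time it.

    Returns (correct, mean_seconds); tolerances and run count are assumed defaults.
    Incorrect submissions are never timed.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ref = softmax_ref(xs)
    got = candidate(xs)
    if any(abs(g - r) > atol + rtol * abs(r) for g, r in zip(got, ref)):
        return False, None
    candidate(xs)  # one warmup run to absorb cold starts
    start = time.perf_counter()
    for _ in range(n_runs):
        candidate(xs)
    return True, (time.perf_counter() - start) / n_runs
```

The mean over n_runs is what would feed the leaderboard ranking; the optional latency target could be checked against the same number.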
If the benchmarking methodology proves incorrect, how do we track and invalidate or rerun old results?
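One way to handle that: stamp every run with the benchmark-setup version and invalidate in bulk whenever the methodology changes. A minimal sketch, with assumed field names:

```python
# Bump this constant whenever the benchmarking methodology changes;
# the version string format is an assumption.
BENCHMARK_VERSION = "2024.12"

def invalidate_stale(runs, current=BENCHMARK_VERSION):
    """Mark runs from older benchmark versions invalid; return them for re-running.

    `runs` is a list of dicts each carrying a "benchmark_version" field,
    mirroring the versioning column suggested for the runs table.
    """
    stale = []
    for run in runs:
        run["valid"] = run["benchmark_version"] == current
        if not run["valid"]:
            stale.append(run)
    return stale
```

The returned list doubles as the re-run queue, so old leaderboard entries are refreshed rather than silently dropped.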
Top-level metrics: wall clock, ncu output, peak memory
Meeting minutes Dec 5
Compiler flags should be user-determined
Schema really needs to be locked down
Do people submit the launcher or just the kernel?
What do driver scripts look like for different languages?
How to include unverified submissions: don't save their code, mark them as unverified but keep the result as a reference bar
LLM-generated kernels: have Claude as a baseline
Website without Discord so people can investigate the results. I agree that Discord is the submission flow and the website is for result investigation.
We'd be targeting a launch in January 2025.

Dimension of the competition

Even for a simple kernel like softmax, llm.c has over 700 LOC dedicated to various variants (https://github.com/karpathy/llm.c/blob/master/dev/cuda/softmax_forward.cu), so we could start with a competition to produce the fastest softmax; even for softmax, users can compete on GPU type, dtype, and input shape.