Effort Engine (potential method of speeding up LLM matmul by dropping some calculations) #6731
netrunnereve started this conversation in General · 0 replies
This has been making the rounds on social media and I think it's worth posting here so our devs and users can discuss it. I'm still at work and haven't read through the whole thing in detail, but it's basically an approach that skips certain multiplications whose contribution to the output is negligible.
Apparently this gives better results than dropping full layers, though I don't see any perplexity curves posted.
Article: https://kolinko.github.io/effort/
HN thread: https://news.ycombinator.com/item?id=40067677
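For anyone who wants a feel for the idea before reading the article: below is a rough NumPy sketch, not the article's actual algorithm. It approximates a matrix-vector product by keeping only the largest-magnitude entries of the input vector (the `effort` fraction and the `sparse_matvec` helper are my own illustrative names), so only the corresponding weight columns are ever touched.

```python
import numpy as np

def sparse_matvec(W, x, effort=0.3):
    """Approximate W @ x using only the top `effort` fraction of |x|.

    Skipped entries contribute nothing, so the matching columns of W
    are never read or multiplied -- that's where the savings come from.
    """
    k = max(1, int(effort * x.size))
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest |x|
    return W[:, idx] @ x[idx]

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)
x = rng.standard_normal(512).astype(np.float32)

full = W @ x
approx = sparse_matvec(W, x, effort=0.3)
rel_err = np.linalg.norm(full - approx) / np.linalg.norm(full)
```

Whether this translates into real speedups depends on the kernel: the column gather has to be cheaper than the multiplications it avoids, which is exactly the kind of implementation question the article digs into.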