Effort Engine (potential method of speeding up LLM matmul by dropping some calculations) #6731
netrunnereve started this conversation in General · 0 replies
This has been making the rounds on social media and I think it's worth posting here so our devs and users can discuss it. I'm still at work and haven't read through the whole thing in detail, but it's basically an approach that skips certain multiplications whose contribution to the output is negligible.
Apparently this gives better results than dropping full layers, though I don't see any perplexity curves posted.
Article: https://kolinko.github.io/effort/
HN thread: https://news.ycombinator.com/item?id=40067677
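For anyone who wants a feel for the idea before reading the article: below is a rough NumPy sketch, not the article's actual algorithm. It approximates a matrix-vector product by keeping only the largest-magnitude entries of the input vector (the `effort` fraction and the `sparse_matvec` helper are my own illustrative names), so only the corresponding weight columns are ever touched.

```python
import numpy as np

def sparse_matvec(W, x, effort=0.3):
    """Approximate W @ x using only the top `effort` fraction of |x|.

    Skipped entries contribute nothing, so the matching columns of W
    are never read or multiplied -- that's where the savings come from.
    """
    k = max(1, int(effort * x.size))
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest |x|
    return W[:, idx] @ x[idx]

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512)).astype(np.float32)
x = rng.standard_normal(512).astype(np.float32)

full = W @ x
approx = sparse_matvec(W, x, effort=0.3)
rel_err = np.linalg.norm(full - approx) / np.linalg.norm(full)
```

Whether this translates into real speedups depends on the kernel: the column gather has to be cheaper than the multiplications it avoids, which is exactly the kind of implementation question the article digs into.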