More gpu memory saving for llama #20

sfc-gh-zhwang · 2023-10-01T06:20:25Z

Remove the qkv_buf_tmp_ -> qkv_buf_ repeat_kv hack in llamacontextdecoding to save GPU memory.
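For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes the hack wrote the projection output into qkv_buf_tmp_ with only the model's KV heads, then expanded K and V to the full query head count in qkv_buf_. The function name, the shapes (64 query heads, 8 KV heads, head size 128, fp16) and the token count are hypothetical and are not taken from this PR.

```python
def qkv_buffer_bytes(tokens, head_num, kv_head_num, size_per_head,
                     bytes_per_elem=2, repeat_kv=False):
    """Bytes needed for a fused Q/K/V activation buffer over `tokens` tokens.

    With repeat_kv=True, K and V are stored once per query head
    (the expanded layout); otherwise once per KV head (the compact layout).
    """
    kv_heads = head_num if repeat_kv else kv_head_num
    total_heads = head_num + 2 * kv_heads  # Q heads + K heads + V heads
    return tokens * total_heads * size_per_head * bytes_per_elem


# Hypothetical grouped-query shape: batch 4 x sequence length 2048, fp16.
tokens = 4 * 2048
compact = qkv_buffer_bytes(tokens, head_num=64, kv_head_num=8, size_per_head=128)
expanded = qkv_buffer_bytes(tokens, head_num=64, kv_head_num=8, size_per_head=128,
                            repeat_kv=True)

mib = 1024 * 1024
print(f"compact buffer:  {compact / mib:.0f} MiB")
print(f"expanded buffer: {expanded / mib:.0f} MiB")
print(f"both resident:   {(compact + expanded) / mib:.0f} MiB")
```

Under these assumptions, keeping only the compact layout avoids the expanded buffer entirely, and the saving grows with the ratio of query heads to KV heads.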

sfc-gh-zhwang added 18 commits September 30, 2023 19:19

commit

aa9176c

commit

b3e68ec

commit

7645f04

commit

2bdbed5

commit

aaec0de

commit

bbf8791

commit

d81a7df

commit

ed1e2c7

commit

8f02927

commit

c4705f6

commit

5c338ff

commit

21167b2

commit

6c4524b

commit

3359362

commit

6d14988

commit

7aa3f45

commit

b358534

commit

5523f1e

sfc-gh-zhwang changed the title ~~Zhwang/more mem~~ More gpu memory saving for llama Oct 1, 2023

sfc-gh-zhwang added 6 commits September 30, 2023 23:23

commit

81a856c

commit

8aa0208

commit

c74dd65

commit

24d6602

commit

c051547

commit

0854463
