Hi, Molly.
I'm writing to discuss some thoughts on RWKV. Recently we have been using some tier-one models based on Transformers, and their high memory-bandwidth usage is a real headache. I haven't tried your project on an on-device chip yet, but I'm wondering whether RWKV outperforms those Transformer-based models on bandwidth usage, since it has no KV cache.
I will look further into the KV cache and into on-device testing of your project. I would be grateful for any hints on this issue.
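To make the question concrete, here is a rough back-of-the-envelope sketch of the arithmetic I have in mind. The KV-cache formula is the standard one for attention (keys plus values, per layer, per KV head, per token). The model dimensions and the `state_elems_per_layer` value are hypothetical placeholders, not measurements of any real model or of RWKV, and the per-layer state size of a given RWKV version would need to be taken from its own code.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes of cached keys and values for one sequence.

    The leading factor 2 accounts for storing both K and V.
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def recurrent_state_bytes(n_layers, state_elems_per_layer, bytes_per_elem=2):
    """Bytes of a fixed-size recurrent state; note there is no seq_len term.

    state_elems_per_layer is an assumed input here, not an RWKV constant.
    """
    return n_layers * state_elems_per_layer * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical Transformer configuration, for illustration only.
    cfg = dict(n_layers=32, n_kv_heads=32, head_dim=128)
    for seq_len in (1024, 8192, 32768):
        mib = kv_cache_bytes(seq_len=seq_len, **cfg) / 2**20
        print(f"seq_len={seq_len:6d}  KV cache = {mib:.0f} MiB")

    # Hypothetical recurrent state size, constant across context lengths.
    state_mib = recurrent_state_bytes(n_layers=32, state_elems_per_layer=4096 * 64) / 2**20
    print(f"recurrent state = {state_mib:.0f} MiB at any seq_len")
```

Since each decoding step of a standard attention model has to read the whole cache, the bytes moved per generated token grow with context length, while a fixed-size state would keep that cost flat. Whether this translates into a real advantage on device is exactly what I would like to confirm.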