Experiments with the RWKV model. The code currently reproduces the v4 model, but this will change as I experiment.
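
For context, the core of the v4 time-mixing block is the WKV recurrence. The sketch below is a minimal pure-Python version for a single channel, written for illustration and not taken from this repo. The names `wkv_recurrent`, `w`, and `u` are assumptions: `w` is taken to be a positive decay rate and `u` the bonus applied to the current token. It leaves out the running-max trick that practical implementations use for numerical stability, so it is only suitable for small inputs.

```python
import math

def wkv_recurrent(k, v, w, u):
    """Single-channel WKV (RWKV v4 style), computed as a recurrence.

    k, v: lists of floats, one key and one value per time step
    w: positive decay rate (assumed), u: bonus for the current token
    """
    a = 0.0  # decayed running sum of exp(k_i) * v_i over past tokens
    b = 0.0  # decayed running sum of exp(k_i) over past tokens
    decay = math.exp(-w)
    out = []
    for k_t, v_t in zip(k, v):
        cur = math.exp(u + k_t)  # weight of the current token
        # Weighted average of past values and the current value.
        out.append((a + cur * v_t) / (b + cur))
        # Fold the current token into the state for the next step.
        a = decay * a + math.exp(k_t) * v_t
        b = decay * b + math.exp(k_t)
    return out
```

With no history at the first step, the first output is just `v[0]`; later outputs are exponentially decayed averages of earlier values, which is what lets the model run as an RNN at inference time.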
Known issues:
- training is slow, probably due to a bug in the code.
Relevant links:
- original RWKV repo
- mostly followed the pico implementation instead: rwkv-decon