How do I train on a translation dataset? (e.g. {"english": "...", "chinese": "..."}) #1046
I am trying to fine-tune Mistral 7B on Sundanese, and I am confused about how exactly the Chinese-Alpaca fine-tuners trained the model on their translation dataset, which can be found here: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Training-Details#Training-Data They showed an example dataset structure along the lines of {"english": "...", "chinese": "..."}. Of course, I don't see a dataset format option that fits that, so do I just use the raw-corpus completion dataset type and put the whole pair inside the text key?
I am of course going to use my own Sundanese dataset, but I am asking to figure out the best way to do this so that the model learns the translations as well as possible. Any help would be appreciated! Thank you.
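For illustration, here is a minimal sketch of the completion-style option described in the question: each parallel pair is flattened into one string under a `text` key and written as one JSON object per line. The `sundanese` field name, the "English:" / "Sundanese:" labels, and the helper names are assumptions made for this example, not a format prescribed by any particular trainer.

```python
import json

# Hypothetical parallel pair; the target side is a placeholder, not real data.
pairs = [
    {"english": "Good morning.", "sundanese": "<Sundanese translation here>"},
]


def to_completion_record(pair, src_key="english", tgt_key="sundanese"):
    """Flatten one parallel pair into a single free-text training example."""
    # The language labels are an illustrative choice; any consistent
    # template would serve the same purpose.
    text = f"English: {pair[src_key]}\nSundanese: {pair[tgt_key]}"
    return {"text": text}


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping non-ASCII characters readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    print(to_jsonl(to_completion_record(p) for p in pairs))
```

With this layout the model only ever sees the pair as running text, so the translation direction is signalled solely by the labels in the template.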
Replies: 1 comment 2 replies
The Alpaca dataset includes examples where the instruction is "Translate the following input from __ to ___", so the same instruction format can be used for translation pairs.
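As a rough sketch of that suggestion, the snippet below turns a parallel pair into Alpaca-style records with `instruction`, `input`, and `output` keys. The exact instruction wording, the language names, the helper names, and the choice to emit both translation directions are assumptions for illustration; the sample strings are placeholders rather than real data.

```python
import json


def to_alpaca_record(source_text, target_text,
                     src_lang="English", tgt_lang="Sundanese"):
    """Build one Alpaca-style record for a single translation direction."""
    return {
        "instruction": f"Translate the following input from {src_lang} to {tgt_lang}.",
        "input": source_text,
        "output": target_text,
    }


def to_alpaca_both_directions(english_text, sundanese_text):
    """Emit two records per pair so both directions appear in training."""
    return [
        to_alpaca_record(english_text, sundanese_text, "English", "Sundanese"),
        to_alpaca_record(sundanese_text, english_text, "Sundanese", "English"),
    ]


if __name__ == "__main__":
    # Placeholder pair; substitute rows from your own dataset.
    for record in to_alpaca_both_directions("Good morning.",
                                            "<Sundanese translation here>"):
        print(json.dumps(record, ensure_ascii=False))
```

Emitting both directions is optional. If only one direction matters for your use case, calling `to_alpaca_record` alone keeps the dataset half the size.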