Hi,

I'd like to ask whether the data needs any special formatting for fine-tuning. I've seen different approaches online, for example:

1. Record each example as a single string in the form `<s>[INST] {instruction} [/INST] {response} </s>` and feed it directly to the SFTTrainer.
2. Pre-tokenize the data into `input_ids` and `attention_mask` first, and only then hand it to the SFTTrainer.

Which of these is the better approach? I'm also curious how necessary the `attention_mask` actually is during fine-tuning: as far as Hugging Face's current SFTTrainer goes, there doesn't seem to be a parameter for specifying the name of this mask, so I'm not sure whether it would even be used if provided, or whether this information is needed at all.

Thanks for taking the time to read this; any advice is appreciated.
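For concreteness, here is a minimal sketch of what I mean by approach 2. The checkpoint name is just a placeholder; any causal-LM tokenizer behaves the same way:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; substitute any model you have access to.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tok.pad_token = tok.eos_token  # Llama tokenizers ship without a pad token

batch = tok(
    [
        "<s>[INST] What is 2 + 2? [/INST] 4 </s>",
        "<s>[INST] Hi [/INST] Hello </s>",
    ],
    add_special_tokens=False,  # the template above already includes <s>
    padding=True,              # pad the shorter example up to the longer one
    return_tensors="pt",
)

# The tokenizer emits both tensors: attention_mask is 1 for real tokens
# and 0 for padding, so attention skips the padded positions.
print(batch["input_ids"].shape)
print(batch["attention_mask"])
```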
If you're writing your own training script, I'd suggest just going with approach 1; it's simple and effective.

This question can be answered at different depths. It comes down to things like whether you 1. train on the user input, 2. use flash attention, 3. use packing, and so on. So my suggestion is to just get familiar with axolotl, hahaha; it prepares all these model inputs for you.
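For reference, a minimal sketch of approach 1 with TRL, assuming a recent release (the `dataset_text_field` argument moved from the SFTTrainer constructor into SFTConfig around TRL 0.9, so check the signature of your installed version; the checkpoint name is a placeholder):

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Toy dataset: each row is one pre-formatted training string (approach 1).
train_dataset = Dataset.from_dict({
    "text": [
        "<s>[INST] What is 2 + 2? [/INST] 4 </s>",
        "<s>[INST] Name a prime number. [/INST] 7 </s>",
    ]
})

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",  # placeholder; any causal LM works
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="sft-demo",
        dataset_text_field="text",  # column holding the formatted strings
    ),
)
trainer.train()
```

When you hand SFTTrainer raw text like this, it runs the tokenizer internally, so the `input_ids` and `attention_mask` from approach 2 are produced for you; you don't need to supply or name the mask yourself.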