Using a ggmlv3 q6_K model, inference drops characters #30
Comments
Judging from the notes in the code, this kind of streaming output can easily hit this problem when a BPE tokenizer meets Chinese, Japanese, or Korean text (a sketch of the mechanism follows).
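A minimal, self-contained sketch of that failure mode (independent of llama-cpp-python; the 4-byte chunking below is a hypothetical token split): every CJK character here is three bytes in UTF-8, and a byte-level BPE stream can hand a character's bytes to two different tokens. Decoding each chunk on its own silently drops the straddling characters, while Python's incremental UTF-8 decoder buffers partial bytes until they complete:

```python
import codecs

# "深度學習" is 4 characters x 3 UTF-8 bytes = 12 bytes. Slicing it
# into 4-byte chunks imitates a byte-level BPE stream whose token
# boundaries fall in the middle of characters (hypothetical split).
data = "深度學習".encode("utf-8")
chunks = [data[i:i + 4] for i in range(0, len(data), 4)]

# Naive per-chunk decoding: bytes that straddle a chunk boundary are
# invalid on their own and get discarded -- the "dropped characters" bug.
naive = "".join(c.decode("utf-8", errors="ignore") for c in chunks)
print(naive)  # 深習 (度 and 學 are lost)

# An incremental decoder buffers incomplete sequences and emits each
# character only once all of its bytes have arrived.
decoder = codecs.getincrementaldecoder("utf-8")()
safe = "".join(decoder.decode(c) for c in chunks)
print(safe)   # 深度學習
```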
@wennycooper I just ran a test myself. The model you are using is in the ggml v3 format, which the ggml project has already marked as deprecated, so I suggest switching to gguf (a conversion sketch follows this comment). Here is the test I ran against the gguf model, and the output was fine:

```python
# CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama_cpp_python==0.2.6
# pip install langchain==0.0.298
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Load the Llama-2-based model
llm = LlamaCpp(
    model_path="/path/to/Taiwan-LLaMa-13b-1.0.Q4_0.gguf",
    n_gpu_layers=100,
    n_batch=8,
    n_ctx=512,
    temperature=0.1,
    max_tokens=512,
    callback_manager=callback_manager,
)

# response = run_simple_qa(llm, query)
prompt_template = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {} ASSISTANT:"""
prompt = prompt_template.format("什麼是深度學習?")
response = llm(prompt)
print(response)
```

Edit: I just tested the gguf q6_k model as well, and it also works without issues.
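For migrating an existing ggml v3 file, llama.cpp ships a conversion script. A hedged sketch of the invocation, assuming a llama.cpp checkout of roughly this era (the script name and flags may differ in your version, and both paths are placeholders, so check `--help` first):

```bash
# Convert a deprecated ggml v3 file to gguf instead of re-downloading.
# Script name/flags and both paths below are assumptions; verify with:
#   python convert-llama-ggml-to-gguf.py --help
python convert-llama-ggml-to-gguf.py \
    --input /path/to/Taiwan-LLaMa-13b-q6_K.ggmlv3.bin \
    --output /path/to/Taiwan-LLaMa-13b.Q6_K.gguf
```

Re-quantizing from the original Hugging Face weights is the other route if the converted file misbehaves.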
Thanks!
Original issue:

Hello,

I quantized the model to the q6_K format with ggml's `quantize` tool, then ran inference with the following code:

```python
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
```

The output drops characters, as shown below:

深度學是機器學的一子集,基人工神經結。使得計算機能通別模式大量中學,而不需要明編程。深度學算法用分、進行和別模式