How to prevent llama-cpp-python from printing text on the CLI #7140

rites1095 · 2024-05-08T07:18:35Z


I am using llama-cpp-python and running the code below:
```python
from llama_cpp import Llama
import timeit

from PyPDF2 import PdfReader

start = timeit.default_timer()
path = r'C:\Users\f162\data\cc.pdf'
pnb_path = r'D:\llama_cpp\SBI.pdf'
reader = PdfReader(pnb_path)
number_of_pages = len(reader.pages)
print(number_of_pages)
page = reader.pages[0]
text = page.extract_text()

text1 = text.splitlines()

# Keep only the first 15 lines of the first page
new_text1 = ' '.join(text1[:15])
print(new_text1)

prompt = "extract account name Branch Name, Branch Address, from the " + new_text1

print(prompt)

# Load the model only once: loading it twice doubles the load time and the log output
llm = Llama(
    model_path=r"D:\contract_note\llama-2-13b-chat.Q5_K_S.gguf",
    chat_format="chatml",
    n_ctx=2048,
)

x = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that outputs in JSON.",
        },
        {"role": "user", "content": prompt},
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "branch name": {"type": "string"},
                "branch address": {"type": "string"},
                "customer address": {"type": "string"},
            },
            "required": ["branch name", "branch address", "customer address"],
        },
    },
    temperature=0.2,
)

# The generated text is under choices[0]["message"]["content"]
# (choices[0]["content"] does not exist and raises a KeyError)
content = x['choices'][0]['message']['content']
print(len(content))
print(content)

end = timeit.default_timer()
print(end - start)
```
When I run it, a lot of text is printed to the console, shown below. How can I prevent this?
```
02002149 02 MAR 2024 INR 4886.63RIBHU SHARMA Drawing Power11 Mar 2024 IFS CodeSavings Account Description Balance as on Search forAccount Name 0.0081/209 SECTOR 8, PRATAP NAGAR, JAIPUR, 302033 BranchAccount NumberDate KUMBHA MARG PRATAP NAGAR JAIPUR 15 JUL 2021 to 02 MAR 20247119593780351105648681 Interest Rate(%p.a.)Address CIF No. Yes Nomination RegisteredSBIN0031840 MICR Code2.7000State Bank of India Date Credit Balance DetailsRef No./Cheque
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from D:\contract_note\llama-2-13b-chat.Q5_K_S.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv  0:            general.architecture str       = llama
llama_model_loader: - kv  1:                general.name str       = LLaMA v2
llama_model_loader: - kv  2:            llama.context_length u32       = 4096
llama_model_loader: - kv  3:           llama.embedding_length u32       = 5120
llama_model_loader: - kv  4:             llama.block_count u32       = 40
llama_model_loader: - kv  5:         llama.feed_forward_length u32       = 13824
llama_model_loader: - kv  6:         llama.rope.dimension_count u32       = 128
llama_model_loader: - kv  7:         llama.attention.head_count u32       = 40
llama_model_loader: - kv  8:       llama.attention.head_count_kv u32       = 40
llama_model_loader: - kv  9:   llama.attention.layer_norm_rms_epsilon f32       = 0.000010
llama_model_loader: - kv 10:             general.file_type u32       = 16
llama_model_loader: - kv 11:            tokenizer.ggml.model str       = llama
llama_model_loader: - kv 12:           tokenizer.ggml.tokens arr[str,32000]  = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 13:           tokenizer.ggml.scores arr[f32,32000]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 14:         tokenizer.ggml.token_type arr[i32,32000]  = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 15:        tokenizer.ggml.bos_token_id u32       = 1
llama_model_loader: - kv 16:        tokenizer.ggml.eos_token_id u32       = 2
llama_model_loader: - kv 17:      tokenizer.ggml.unknown_token_id u32       = 0
llama_model_loader: - kv 18:        general.quantization_version u32       = 2
llama_model_loader: - type f32:  81 tensors
llama_model_loader: - type q5_K: 281 tensors
llama_model_loader: - type q6_K:  1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format      = GGUF V2
llm_load_print_meta: arch       = llama
llm_load_print_meta: vocab type    = SPM
llm_load_print_meta: n_vocab     = 32000
llm_load_print_meta: n_merges     = 0
llm_load_print_meta: n_ctx_train   = 4096
llm_load_print_meta: n_embd      = 5120
llm_load_print_meta: n_head      = 40
llm_load_print_meta: n_head_kv    = 40
llm_load_print_meta: n_layer     = 40
llm_load_print_meta: n_rot      = 128
llm_load_print_meta: n_embd_head_k  = 128
llm_load_print_meta: n_embd_head_v  = 128
llm_load_print_meta: n_gqa      = 1
llm_load_print_meta: n_embd_k_gqa   = 5120
llm_load_print_meta: n_embd_v_gqa   = 5120
llm_load_print_meta: f_norm_eps    = 0.0e+00
llm_load_print_meta: f_norm_rms_eps  = 1.0e-05
llm_load_print_meta: f_clamp_kqv   = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale  = 0.0e+00
llm_load_print_meta: n_ff       = 13824
llm_load_print_meta: n_expert     = 0
llm_load_print_meta: n_expert_used  = 0
llm_load_print_meta: causal attn   = 1
llm_load_print_meta: pooling type   = 0
llm_load_print_meta: rope type    = 0
llm_load_print_meta: rope scaling   = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned  = unknown
llm_load_print_meta: ssm_d_conv    = 0
llm_load_print_meta: ssm_d_inner   = 0
llm_load_print_meta: ssm_d_state   = 0
llm_load_print_meta: ssm_dt_rank   = 0
llm_load_print_meta: model type    = 13B
llm_load_print_meta: model ftype   = Q5_K - Small
llm_load_print_meta: model params   = 13.02 B
llm_load_print_meta: model size    = 8.36 GiB (5.51 BPW)
llm_load_print_meta: general.name   = LLaMA v2
llm_load_print_meta: BOS token    = 1 '<s>'
llm_load_print_meta: EOS token    = 2 '</s>'
llm_load_print_meta: UNK token    = 0 '<unk>'
llm_load_print_meta: LF token     = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =  0.18 MiB
llm_load_tensors:    CPU buffer size = 8555.93 MiB
....................................................................................................
llama_new_context_with_model: n_ctx   = 512
llama_new_context_with_model: n_batch  = 512
llama_new_context_with_model: n_ubatch  = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:    CPU KV buffer size =  400.00 MiB
llama_new_context_with_model: KV self size = 400.00 MiB, K (f16): 200.00 MiB, V (f16): 200.00 MiB
llama_new_context_with_model:    CPU output buffer size =   0.12 MiB
llama_new_context_with_model:    CPU compute buffer size =  85.01 MiB
llama_new_context_with_model: graph nodes = 1286
llama_new_context_with_model: graph splits = 1
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
```

clort81 · 2024-05-25T21:49:56Z


The standard llama.cpp `main` program writes its debug output to stderr; I expect the Python bindings do the same. In a *nix terminal you can redirect stderr to a file or to /dev/null to make it disappear:

```sh
./your_program_name 2>/dev/null
```
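Since the question uses Windows paths: in cmd.exe the equivalent redirect is `2>nul`, and in PowerShell it is `2>$null`. Both silence stderr for the whole process, including any errors you might want to see. If you would rather silence only the model load from inside the script, one option is to temporarily point file descriptor 2 at `os.devnull`. `contextlib.redirect_stderr` is not enough for this, because it only swaps Python's `sys.stderr` object, while C code such as llama.cpp writes straight to the underlying descriptor. Below is a minimal sketch, assuming (as the reply suggests) that the log goes to stderr; `suppress_fd_output` is a hypothetical helper name, not part of llama-cpp-python.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_fd_output(fd: int = 2, target: str = os.devnull):
    """Temporarily redirect an OS-level file descriptor (2 = stderr by default).

    Hypothetical helper: unlike contextlib.redirect_stderr, this also catches
    text that C libraries write directly to the descriptor.
    """
    # Flush Python's own buffers so pending text is not sent to the wrong place.
    sys.stdout.flush()
    sys.stderr.flush()
    saved_fd = os.dup(fd)  # keep a copy of the original descriptor
    with open(target, "w") as sink:
        os.dup2(sink.fileno(), fd)  # point the descriptor at the sink
        try:
            yield
        finally:
            sys.stdout.flush()
            sys.stderr.flush()
            os.dup2(saved_fd, fd)  # restore the original descriptor
            os.close(saved_fd)
```

You would wrap only the constructor, e.g. `with suppress_fd_output(): llm = Llama(model_path=..., n_ctx=2048)`, so your own `print` calls still appear. llama-cpp-python's `Llama` constructor also accepts a `verbose` argument, and passing `verbose=False` is intended to silence this logging; how completely it does so may depend on the installed version.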
