[bug]: "_pickle.UnpicklingError: pickle data was truncated" #65
Comments
Errors in the ingestion process can lead to unsynchronised or corrupt data; there are several possible causes and several possible resulting failure modes.
The fix would seem to require a catch and rollback in the injest.py file's _process_sources function. Additionally, a repair script seems to be needed to remove past corrupt ingestions, since otherwise they crash the query function.
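As a purely illustrative sketch of that catch-and-rollback idea (the _process_sources body, the source_to_documents helper and the vectordb add/delete calls are assumptions, not the actual code in injest.py):

```python
# Illustrative sketch only: the real _process_sources signature, the
# source_to_documents helper and the vectordb add/delete calls are
# assumptions, not the actual implementation in injest.py.
from logging import getLogger

logger = getLogger(__name__)


def _process_sources(vectordb, sources) -> bool:
    added_ids: list[str] = []
    try:
        for source in sources:
            # hypothetical helper: turns one uploaded source into documents
            docs = source_to_documents(source)
            # assumption: the vector store returns the ids it wrote
            added_ids.extend(vectordb.add_documents(docs))
        return True
    except Exception:
        logger.exception('Ingestion failed, rolling back partial writes')
        if added_ids:
            # remove whatever was written before the failure so the db is
            # not left in a half-ingested, unsynchronised state
            vectordb.delete(ids=added_ids)
        return False
```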
Hi, all the data is first received over the network and processed before being added to the vector db. It seems the plug was pulled right when it was doing so. A graceful shutdown at this point would be a feature request; please open a new issue for that. From #66, it seems you mount a folder for the persistent directory, so let's use that local folder.
I have not tested this personally, but the docs should not be lying. Having a backup will be vital here.
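For illustration, a minimal sketch of such a backup, assuming the persistent directory is bind-mounted to a local path; all paths here are placeholders:

```python
# Minimal backup sketch, assuming the persistent directory is bind-mounted
# locally; paths are placeholders and should be adapted to the setup.
import shutil
from datetime import datetime
from pathlib import Path


def backup_persistent_storage(src: str, backup_root: str) -> Path:
    stamp = datetime.now().strftime('%Y%m%d-%H%M%S')
    dest = Path(backup_root) / f'persistent_storage-{stamp}'
    # best taken while the container is stopped, so the vector db files
    # are not being written to mid-copy
    shutil.copytree(src, dest)
    return dest


if __name__ == '__main__':
    print(backup_persistent_storage('./persistent_storage', './backups'))
```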
I appreciate the above and the need for regular overnight backups. I will open a feature request. I would also think there needs to be some sort of journal, with a rollback of incomplete jobs on restart if required.
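A minimal sketch of what such a journal could look like, purely as an illustration; the file location and record format are assumptions:

```python
# Purely illustrative journal sketch: write a marker before an ingestion job
# starts and remove it on success, so an incomplete job can be detected (and
# its data rolled back or re-ingested) on the next startup. The file location
# and record format are assumptions.
import json
from pathlib import Path

JOURNAL = Path('persistent_storage/ingest_journal.json')


def begin_job(job_id: str, sources: list[str]) -> None:
    JOURNAL.write_text(json.dumps({'job': job_id, 'sources': sources}))


def end_job() -> None:
    JOURNAL.unlink(missing_ok=True)


def incomplete_job_on_startup() -> dict | None:
    # a journal file left over at startup means the previous run died
    # in the middle of an ingestion job
    if JOURNAL.exists():
        return json.loads(JOURNAL.read_text())
    return None
```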
Describe the bug
Context chat queries are failing today with a 500 internal server error in the context_chat_backend docker container.
Could this be because the Docker container was restarted while an embedding task was running?
I also had an issue yesterday with the latest groupfolders update, which caused the fulltextsearch process to fail and trigger 500 errors for Nextcloud as a whole (see files_fulltextsearch issue #265). Could embedding corruption have occurred during this process?
To Reproduce
Steps to reproduce the behavior:
Expected behavior
This depends on the cause. If it is due to a Docker container stop, then I would expect a graceful shutdown. If the embedding process fails due to another error (e.g. the fulltextsearch bug), then it should stop gracefully. If there is a corrupt pickle file, then I would expect an error recovery procedure.
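One possible starting point for such a recovery procedure would be to scan the mounted persistent directory for pickle files that fail to load the same way as in the traceback below. This is only a hedged sketch; the path and directory layout are assumptions about the setup, and anything it flags would still need to be restored from a backup or re-ingested:

```python
# Hedged sketch: scan the persist directory for pickle files that raise the
# same UnpicklingError as in the traceback, so the affected data can be
# restored from backup or re-ingested. The path is an assumption about how
# the persistent folder is mounted.
import pickle
from pathlib import Path


def find_corrupt_pickles(persist_dir: str) -> list[Path]:
    corrupt: list[Path] = []
    for path in Path(persist_dir).rglob('*.pickle'):
        try:
            with open(path, 'rb') as f:
                pickle.load(f)
        except (pickle.UnpicklingError, EOFError):
            # same failure mode as the 500 error on /query
            corrupt.append(path)
    return corrupt


if __name__ == '__main__':
    for p in find_corrupt_pickles('./persistent_storage'):
        print('corrupt:', p)
```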
Server logs (if applicable)
NA
Context Chat Backend logs (if applicable, from the docker container)
Screenshots
See logs
Setup Details (please complete the following information):
Additional context
docker logs context_chat_backend:
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from persistent_storage/model_files/dolphin-2.2.1-mistral-7b.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = ehartford_dolphin-2.2.1-mistral-7b
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 17
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32002] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32002] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32002] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 32000
llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 19: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q5_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens cache size = 5
llm_load_vocab: token to piece cache size = 0.1637 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32002
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q5_K - Medium
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.78 GiB (5.67 BPW)
llm_load_print_meta: general.name = ehartford_dolphin-2.2.1-mistral-7b
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 32000 '<|im_end|>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_print_meta: EOT token = 32000 '<|im_end|>'
llm_load_print_meta: max token length = 48
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: Tesla P40, compute capability 6.1, VMM: yes
Device 1: Tesla P40, compute capability 6.1, VMM: yes
llm_load_tensors: ggml ctx size = 0.41 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 85.94 MiB
llm_load_tensors: CUDA0 buffer size = 2495.28 MiB
llm_load_tensors: CUDA1 buffer size = 2311.78 MiB
...................................................................................................
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 544.00 MiB
llama_kv_cache_init: CUDA1 KV buffer size = 480.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 0.12 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
llama_new_context_with_model: CUDA0 compute buffer size = 640.01 MiB
llama_new_context_with_model: CUDA1 compute buffer size = 640.02 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 72.02 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 3
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Model metadata: {'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.eos_token_id': '32000', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '32768', 'general.name': 'ehartford_dolphin-2.2.1-mistral-7b', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '17'}
Using fallback chat format: llama-2
TRACE: 192.168.0.73:52868 - ASGI [23] Send {'type': 'http.response.start', 'status': 500, 'headers': '<...>'}
INFO: 192.168.0.73:52868 - "POST /query HTTP/1.1" 500 Internal Server Error
TRACE: 192.168.0.73:52868 - ASGI [23] Send {'type': 'http.response.body', 'body': '<6386 bytes>'}
TRACE: 192.168.0.73:52868 - ASGI [23] Raised exception
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/uvicorn/protocols/http/h11_impl.py", line 396, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in call
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/message_logger.py", line 84, in call
raise exc from None
File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/message_logger.py", line 80, in call
await self.app(scope, inner_receive, inner_send)
File "/usr/local/lib/python3.11/dist-packages/fastapi/applications.py", line 1054, in call
await super().call(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/errors.py", line 186, in call
raise exc
File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/errors.py", line 164, in call
await self.app(scope, receive, _send)
File "/app/context_chat_backend/ocs_utils.py", line 75, in call
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/exceptions.py", line 65, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 756, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 193, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/utils.py", line 74, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/controller.py", line 325, in _
return process_context_query(
^^^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/chain/one_shot.py", line 65, in process_context_query
context_docs = get_context_docs(user_id, query, vectordb, ctx_limit, scope_type, scope_list)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/chain/context.py", line 37, in get_context_docs
context_docs = user_client.similarity_search(query, k=ctx_limit, filter=ctx_filter)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 349, in similarity_search
docs_and_scores = self.similarity_search_with_score(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 439, in similarity_search_with_score
results = self.__query_collection(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/langchain_core/utils/utils.py", line 36, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 156, in __query_collection
return self._collection.query(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/api/models/Collection.py", line 195, in query
query_results = self._client._query(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/telemetry/opentelemetry/init.py", line 146, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/rate_limiting/init.py", line 47, in wrapper
return f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/api/segment.py", line 737, in _query
vector_reader = self._manager.get_segment(collection_id, VectorReader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/telemetry/opentelemetry/init.py", line 146, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/manager/local.py", line 217, in get_segment
instance = self._instance(segment)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/manager/local.py", line 246, in _instance
instance = cls(self._system, segment)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 107, in init
self._persist_data = PersistentData.load_from_file(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 70, in load_from_file
ret = cast(PersistentData, pickle.load(f))
^^^^^^^^^^^^^^
_pickle.UnpicklingError: pickle data was truncated
TRACE: 192.168.0.73:52868 - HTTP connection lost