Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug]: "_pickle.UnpicklingError: pickle data was truncated" #65

Open
ga-it opened this issue Aug 18, 2024 · 3 comments
Open

[bug]: "_pickle.UnpicklingError: pickle data was truncated" #65

ga-it opened this issue Aug 18, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@ga-it
Copy link

ga-it commented Aug 18, 2024

Describe the bug
Context chat queries are failing today with a 500 internal server error in the context_chat_backend docker container.

Could this be because the Docker container was restarted while an embedding task was running?
I also had the issue with the latest groupfolders update yesterday that caused the fulltextsearch process to fail and trigger 500 errors for Nextcloud as a whole (see files_fulltextsearch issue #265). Could embedding corruption have occurred during this process?

To Reproduce
Steps to reproduce the behavior:

  1. Run context_chat query via Nextcloud Assistant
  2. Trigger background process via cron
  3. Context chat fails

Expected behavior
This depends on the cause. If it is due to a docker container stop, then I would expect a graceful shutdown. If the embedding process fails due to another error (e.g. the fulltextsearch bug), then the embedding process should gracefully stop. If there is a corrupt pickle file, then I would expect an error recovery procedure.

Server logs (if applicable)
NA

Context Chat Backend logs (if applicable, from the docker container)

See below

Screenshots
See logs

Setup Details (please complete the following information):

  • Nextcloud Version: Nextcloud Hub 8 (29.0.4)
  • AppAPI Version: 3.1.0
  • Context Chat PHP Version 8.2.22
  • Context Chat Backend Version 3.0.1
  • Nextcloud deployment method: Docker latest stable
  • Context Chat Backend deployment method: manual, remote
  • chroma.sqlite3 is 175Gb (nextcloud file repository is TB+)

Additional context

docker logs context_chat_backend:

llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from persistent_storage/model_files/dolphin-2.2.1-mistral-7b.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = ehartford_dolphin-2.2.1-mistral-7b
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 17
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32002] = ["", "", "", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32002] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32002] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 32000
llama_model_loader: - kv 18: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 19: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q5_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens cache size = 5
llm_load_vocab: token to piece cache size = 0.1637 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32002
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q5_K - Medium
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.78 GiB (5.67 BPW)
llm_load_print_meta: general.name = ehartford_dolphin-2.2.1-mistral-7b
llm_load_print_meta: BOS token = 1 ''
llm_load_print_meta: EOS token = 32000 '<|im_end|>'
llm_load_print_meta: UNK token = 0 ''
llm_load_print_meta: PAD token = 0 ''
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_print_meta: EOT token = 32000 '<|im_end|>'
llm_load_print_meta: max token length = 48
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: Tesla P40, compute capability 6.1, VMM: yes
Device 1: Tesla P40, compute capability 6.1, VMM: yes
llm_load_tensors: ggml ctx size = 0.41 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 85.94 MiB
llm_load_tensors: CUDA0 buffer size = 2495.28 MiB
llm_load_tensors: CUDA1 buffer size = 2311.78 MiB
...................................................................................................
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 544.00 MiB
llama_kv_cache_init: CUDA1 KV buffer size = 480.00 MiB
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 0.12 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
llama_new_context_with_model: CUDA0 compute buffer size = 640.01 MiB
llama_new_context_with_model: CUDA1 compute buffer size = 640.02 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 72.02 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 3
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Model metadata: {'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.eos_token_id': '32000', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '32768', 'general.name': 'ehartford_dolphin-2.2.1-mistral-7b', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '17'}
Using fallback chat format: llama-2
TRACE: 192.168.0.73:52868 - ASGI [23] Send {'type': 'http.response.start', 'status': 500, 'headers': '<...>'}
INFO: 192.168.0.73:52868 - "POST /query HTTP/1.1" 500 Internal Server Error
TRACE: 192.168.0.73:52868 - ASGI [23] Send {'type': 'http.response.body', 'body': '<6386 bytes>'}
TRACE: 192.168.0.73:52868 - ASGI [23] Raised exception
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/uvicorn/protocols/http/h11_impl.py", line 396, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in call
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/message_logger.py", line 84, in call
raise exc from None
File "/usr/local/lib/python3.11/dist-packages/uvicorn/middleware/message_logger.py", line 80, in call
await self.app(scope, inner_receive, inner_send)
File "/usr/local/lib/python3.11/dist-packages/fastapi/applications.py", line 1054, in call
await super().call(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/errors.py", line 186, in call
raise exc
File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/errors.py", line 164, in call
await self.app(scope, receive, _send)
File "/app/context_chat_backend/ocs_utils.py", line 75, in call
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/middleware/exceptions.py", line 65, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 756, in call
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.11/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/fastapi/routing.py", line 193, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/utils.py", line 74, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/controller.py", line 325, in _
return process_context_query(
^^^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/chain/one_shot.py", line 65, in process_context_query
context_docs = get_context_docs(user_id, query, vectordb, ctx_limit, scope_type, scope_list)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/context_chat_backend/chain/context.py", line 37, in get_context_docs
context_docs = user_client.similarity_search(query, k=ctx_limit, filter=ctx_filter)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 349, in similarity_search
docs_and_scores = self.similarity_search_with_score(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 439, in similarity_search_with_score
results = self.__query_collection(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/langchain_core/utils/utils.py", line 36, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/langchain_community/vectorstores/chroma.py", line 156, in __query_collection
return self._collection.query(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/api/models/Collection.py", line 195, in query
query_results = self._client._query(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/telemetry/opentelemetry/init.py", line 146, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/rate_limiting/init.py", line 47, in wrapper
return f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/api/segment.py", line 737, in _query
vector_reader = self._manager.get_segment(collection_id, VectorReader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/telemetry/opentelemetry/init.py", line 146, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/manager/local.py", line 217, in get_segment
instance = self._instance(segment)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/manager/local.py", line 246, in _instance
instance = cls(self._system, segment)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 107, in init
self._persist_data = PersistentData.load_from_file(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 70, in load_from_file
ret = cast(PersistentData, pickle.load(f))
^^^^^^^^^^^^^^
_pickle.UnpicklingError: pickle data was truncated
TRACE: 192.168.0.73:52868 - HTTP connection lost

@ga-it ga-it added the bug Something isn't working label Aug 18, 2024
@ga-it
Copy link
Author

ga-it commented Aug 22, 2024

Errors in the ingestion process can lead to unsynchronised or corrupt data.

Causes of errors could be:

  1. CUDA tensor issues
  2. File decoding errors
  3. Text splitting problems
  4. Network connectivity issues during database interaction
  5. Database-specific errors during document insertion
  6. Memory limitations, especially when handling large files
  7. Timeouts
  8. Process / docker container halts / crashes
  9. Nextcloud server errors during embedding process

Results could be:

  1. Partially added documents in the database
  2. Inconsistent or missing metadata
  3. Corrupted embedding data
  4. Unsynchronized data between the internal dictionary/lists and the vector database

The fix would seem to require a catch and rollback in the injest.py file and _process_sources function.

Additionally it would seem that a repair script is required to remove past corrupt ingestion or this crashes the query function.
This should identify:

  1. Missing Embeddings: Check if any documents in the database have missing embeddings. You can query the database for documents where the embedding field is null or empty.
  2. Metadata Mismatches: Compare the metadata (filename, timestamp, etc.) in the database with the actual files in your storage. Look for inconsistencies, missing metadata, or outdated timestamps.
  3. Orphaned Embeddings: Identify embeddings in the database that no longer correspond to any existing file in your storage. This might indicate that a file was deleted without properly removing its associated data from the database.

@kyteinsky
Copy link
Contributor

Hi, all the data is first received through the network and processed before being added to the vector db. It seems the plug was pulled right when it was doing so. A graceful shutdown at this point would be a feature request. Please open a new issue for that.

From #66, it seems you mount a folder for the persistent directory so let's use that local folder.

  1. shut down context_chat_backend
  2. cd /data/context_chat_backend/persistent_storage/vector_db_data
  3. sqlite3 chroma.sqlite3
  4. check if the db looks good using commands: .tables, select * from collections;, select * from embeddings; and select * from embedding_metadata;
  5. make a backup of the entire "vector_db_data" folder if all is good
  6. sqlite3 /path/to/db/chroma.sqlite3 "select s.id, c.name from segments s join collections c on s.collection=c.id where s.scope='VECTOR';". This should give you all the ids which should be the folder names in "vector_db_data" folder. Delete those folders and take a note of the collection names Vector_something. Instructions are from here: https://cookbook.chromadb.dev/strategies/rebuilding/#rebuilding-a-collection
  7. start context_chat_backend to build the index

I have not tested this personally but the docs should not be lying. Having a backup will be vital here.

@ga-it
Copy link
Author

ga-it commented Sep 3, 2024

I appreciate the above and need for regular overnight backups.

I will open a feature request. I would think there need to be some sort of journal with a rollback of incomplete jobs on restart if required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants