[BUG]: White screen crash during indexing/embedding. Too many docs in workspace? Workspace needs a cleaning up function? #928

Closed
Tiberius1313 opened this issue Mar 19, 2024 · 10 comments · Fixed by #2365
Labels
possible bug Bug was reported but is not confirmed or is unable to be replicated.

Comments

@Tiberius1313

How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

Adding a new document seems to be a bit delayed, and selecting the file is then very laggy. Clicking "move 1 file to workspace" results in a white screen crash.

The issue seems to be caused by too many files in the workspace. In my case the error appears once the folder contains more than ~2500 files or exceeds 100 MB in size (RAM usage is only at 66%).

If I delete the files in the folder "custom-documents"*, or rather move them to another folder, all files are removed from the workspace and the problem seems to be solved. But I'm not sure whether these files are still needed, or whether my workaround is a good solution.

*(C:\Users\[name]\AppData\Roaming\anythingllm-desktop\storage\documents\custom-documents).

Are there known steps to reproduce?

System: 14-Core Intel Xeon E5-2690 v4, 3166 MHz on MSI X99S Gaming 7 (MS-7885), 8x 16 GB DDR4-3200 SDRAM, NVIDIA GeForce GTX 970, Windows 10 Pro 10.0.19045.4170, AnythingLLM 1.3.1

It could be that the error occurs after more than 3000 files in the mentioned folder.
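
To compare a folder against the thresholds mentioned above, the file count and total size can be measured with a short script. This is only a minimal sketch in Python and is not part of AnythingLLM; `folder_stats` is a hypothetical helper name.

```python
import os
from pathlib import Path


def folder_stats(folder):
    """Return (file_count, total_bytes) for all regular files under folder."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = Path(root) / name
            try:
                total += path.stat().st_size
            except OSError:
                # File vanished or is unreadable; skip it rather than abort.
                continue
            count += 1
    return count, total
```

Calling `folder_stats` on the custom-documents folder quoted above returns the two numbers to compare against the ~2500 files and 100 MB observed in this report.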

@Tiberius1313 added the "possible bug" label Mar 19, 2024
@Tiberius1313 changed the title from "[BUG]: White screen crash during indexing/embedding. Too many docs in workspace? Workspace need a cleaning up function?" to "[BUG]: White screen crash during indexing/embedding. Too many docs in workspace? Workspace needs a cleaning up function?" Mar 19, 2024
@shatfield4
Collaborator

What type of files are you uploading to the document uploader? The crash might be caused by hidden files that are incomplete. Also, what is the reason for uploading 3000 files at one time? The lag happens because you are loading so many files at once that your CPU cannot keep up with rendering all 3000 of them.

If you want to embed this many files without relying on your CPU to do it, we suggest you use a cloud embedding model like OpenAI, since it can handle far more than your local CPU can.

@Tiberius1313
Author

@shatfield4 thank you for your response.

I'm uploading PDFs, but I'm not uploading them all at the same time. The crash occurs even if I add just a single file. Once the folder is emptied, the system works fine again.

Is it necessary to keep those files in that folder? It seems that if I delete them I might get lower-quality responses, but I'm not sure that's really the case. Do those files play a part in the retrieval process, or is only the information in the DB used for that?

@shatfield4
Collaborator

It is necessary to keep the files in that folder because each of them is a metadata file, and they are how the document picker knows which files are in the workspace and available. If you delete the files in the custom-documents folder manually, the documents are still embedded in your vector database. So even though your workspace says there are no documents, you will still get context from those documents and RAG will still happen (deleting those files does not make the results less accurate).

This makes me think that you have a PDF file that may be corrupt or empty that is causing the crash. Is there a certain file that you can upload to replicate this bug consistently or does this only happen when you just upload lots of documents?
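
One way to look for a corrupt or empty file is to scan the folder for metadata files that are blank or fail to parse. The following is a rough sketch in Python, assuming the metadata files are JSON files with a `.json` extension; `find_bad_metadata` is a hypothetical helper and not part of AnythingLLM.

```python
import json
from pathlib import Path


def find_bad_metadata(folder):
    """Return paths of .json files under folder that are empty or fail to parse."""
    bad = []
    for path in sorted(Path(folder).rglob("*.json")):
        try:
            text = path.read_text(encoding="utf-8")
            if not text.strip():
                # A blank file cannot describe a document.
                bad.append(path)
                continue
            json.loads(text)
        except (OSError, UnicodeDecodeError, json.JSONDecodeError):
            # Unreadable, wrongly encoded, or syntactically invalid JSON.
            bad.append(path)
    return bad
```

An empty result would suggest that no single metadata file is malformed, although it cannot rule out a problem in the original PDF itself.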

@Tiberius1313
Author

Thank you for explaining. I now understand better how the DB and the JSON files work together.

Meanwhile I have uploaded and processed new files. At first everything worked smoothly, but once I reached 2600 files the same error started to occur again. Sometimes I was able to add new documents after restarting, but once the folder contained more than 2700 files the error appeared every time. That means selecting a file in "My Documents" is slow, and when I press "Move 1 file to workspace" AnythingLLM crashes to a white screen after 20 seconds.

So I don't think the error is related to a single corrupt file but to the size of the files in the folder "custom-documents".

@jafrank88

I am seeing this as well. There is some cliff that is hit around 180k vectors maybe. I can upload far more files, but if I try to embed even one more I get the white screen of death and no Task Manager activity related to background processing. Using Desktop v.1.40

@Tiberius1313
Author

Thank you for confirming. But I don't think it's related to the DB and the vectors, because the issue is gone once you delete the files from custom-documents, and that step does not affect the DB.
I think the bug is simply caused by Windows File Explorer's limited ability to handle a large number of files. This process seems to be a single-core application, and when the response takes too long AnythingLLM seems to run into a timeout error.
Maybe the problem could be solved by storing the files in a subfolder structure, with the comparison of new and already embedded files made via an index?
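
The subfolder-plus-index idea could look roughly like the sketch below. It is written in Python purely for illustration and does not reflect how AnythingLLM stores documents; `shard_path`, `add_document`, the two-character hash prefix, and the in-memory `index` dict are all assumptions.

```python
import hashlib
from pathlib import Path


def shard_path(root, filename, width=2):
    """Map a filename to root/<first hex chars of its SHA-256 hash>/filename."""
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    return Path(root) / digest[:width] / filename


def add_document(root, filename, content, index):
    """Store content in its shard folder and record it in index.

    Returns False without touching the disk if filename is already indexed.
    """
    if filename in index:
        return False
    target = shard_path(root, filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    index[filename] = str(target.relative_to(root))
    return True
```

With a two-character prefix the files spread over up to 256 subfolders, so no single directory has to list thousands of entries, and the duplicate check becomes a dictionary lookup instead of a directory scan.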

@wallartup

You probably hit a vector DB limit somehow.

A fix could be the ability to deploy several instances of lancedb, so that not all documents are tied to a single DB instance. After a certain number of vector chunks, it would "cut" over to another DB instance.
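
The "cut over" idea can be sketched independently of any particular database. The Python class below only tracks which numbered shard should receive the next chunks; it makes no LanceDB calls, `ShardRouter` is a hypothetical name, and whether a per-instance limit exists at all is unconfirmed in this thread.

```python
class ShardRouter:
    """Assign vector chunks to numbered shards, opening a new shard once one is full."""

    def __init__(self, max_chunks_per_shard):
        if max_chunks_per_shard <= 0:
            raise ValueError("max_chunks_per_shard must be positive")
        self.max_chunks = max_chunks_per_shard
        self.counts = [0]  # chunk count per shard; shard 0 starts empty

    def assign(self, n_chunks=1):
        """Return the index of the shard that receives the next n_chunks chunks."""
        # Open a new shard only if the current one is non-empty and would overflow.
        if self.counts[-1] > 0 and self.counts[-1] + n_chunks > self.max_chunks:
            self.counts.append(0)
        self.counts[-1] += n_chunks
        return len(self.counts) - 1
```

A document's chunks stay together in one shard under this scheme, so a query would have to search every shard and merge the results.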

@scimantica

Adding another voice to this - am also seeing the same behaviour above a certain number of documents.

@timothycarambat
Member

Moved to #2317

6 participants