Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huge sqlite database when writing via API calls #1163

Open
1 of 2 tasks
MHOOO opened this issue Aug 16, 2024 · 10 comments
Open
1 of 2 tasks

Huge sqlite database when writing via API calls #1163

MHOOO opened this issue Aug 16, 2024 · 10 comments

Comments

@MHOOO
Copy link

MHOOO commented Aug 16, 2024

Describe the current behavior

I have a python backend service that continuously updates tables with new data on a self-hosted grist-core instance via docker. Over the past months, this has apparently resulted in an SQLite DB that is over 900MB large without snapshots. I could now get rid of most history states via the /{did}/states/remove endpoint, resulting in an SQLite DB that is only 500 KB large.
Is there some way to disable history writing when using the Python / REST API or have it automatically clean up states every now & then?

Steps to reproduce

  1. Startup any grist core instance via e.g. docker
  2. Spam row updates
  3. See your SQLite DB grow infinitely!

Describe the expected behavior

SQLite DB should not grow infinitely on row updates

Where have you encountered this bug?

Instance information (when self-hosting only)

  • Grist instance:

    • Version: 1.1.12
    • URL (if it's OK for you to share it): -
    • Installation mode: docker
    • Architecture: single-worker I think
  • Browser name, version and platforms on which you could reproduce the bug: chromium

@MHOOO
Copy link
Author

MHOOO commented Aug 16, 2024

Oh, another thing I noticed, is the grist-sessions.db growing upto 900MB as well. However I could just delete it and grist ran as normal. What is stored inside the table for it to grow this large? Am I potentially creating new sessions with every REST API call?

@fflorent
Copy link
Collaborator

Hello,

Is there some way to disable history writing when using the Python / REST API or have it automatically clean up states every now & then?

Not yet, but I think your request is already tracked here (when you say "have it automatically clean up states every now & then"):
#1121

I tend to think it should be quite easy and straightforward to implement it (not 100% sure), if you want to and have time to take a look. I would be glad to help if so!

@MHOOO
Copy link
Author

MHOOO commented Aug 16, 2024

I guess exposing ACTION_HISTORY_MAX_ROWS & ACTION_HISTORY_MAX_BYTES as environment Variables should work, right?
I'm amazed I did not reach the 1000 rows, since that would be only 1000s in my case (I have a status table that I update every second).

@fflorent
Copy link
Collaborator

I would say so. As stated Paul, we should also check the value and ensure it is strictly positive:

 Pruning to 0 would currently leave the document unusable, since in there is currently no way to work with a document with a null "current last action". I'm sure this could be overcome. An alternative could be to add an empty action and then prune everything before it.

I tend to think we may also let the user configure the other variables too, without much opinion.

If you agree that #1121 would solve your issue, would you agree to close your issue and continue the discussion there?

@MHOOO
Copy link
Author

MHOOO commented Aug 16, 2024

Yes, that would work for me. Though I would still like to know what caused grist-sessions.db to grow this large. Am I potentially misusing the system by creating a new session with every request?

@fflorent
Copy link
Collaborator

fflorent commented Aug 16, 2024

No, that's because of something you already pointed out:

I have a python backend service that continuously updates tables with new data on a self-hosted grist-core instance via docker

Every action you make on the document, whether it is through the UI or the API, adds a new history entry, explaining why with how you use the document it went so big.

@MHOOO
Copy link
Author

MHOOO commented Aug 20, 2024

Oh, I was under the assumption that the history was stored inside the actual document SQLite DB and not the grist-sessions.db. OK, good to know

@paulfitz
Copy link
Member

Oh, I was under the assumption that the history was stored inside the actual document SQLite DB and not the grist-sessions.db

That's correct, history is stored inside the SQLite DB, in the _gristsys_ActionHistory and _gristsys_ActionHistoryBranch tables.

The grist-sessions.db table stores session information, when Redis is not available. It does sound like you might be getting a lot of session churn somehow? Are the requests made anonymously or with an api key?

@fflorent
Copy link
Collaborator

I have missed you talked about grist-sessions.db and not the document, sorry for the confusion @MHOOO

@MHOOO
Copy link
Author

MHOOO commented Sep 2, 2024

The grist-sessions.db table stores session information, when Redis is not available. It does sound like you might be getting a lot of session churn somehow? Are the requests made anonymously or with an api key?

I have set GRIST_API_KEY to a valid key, so these should be made non-anonymously

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants