api/rest: files limit per bucket #209

Open
diegodelemos opened this issue May 20, 2019 · 1 comment
Comments

@diegodelemos
Member

We have been having problems with the number of files Files-Rest can handle in CERN Opendata.

We put a workaround in place in the past, and the investigation we went through can be accessed here.

As a summary:

  • This feature in Opendata adds files from the CLI using the Files-Rest Pythonic API
  • Profiling shows a large number of SQL queries for the files being added (roughly 10 times the number of files)
  • Memory usage grows polynomially and is never released
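
For reference, the two measurements above (SQL statements per file and peak memory) could be taken with a small harness like the sketch below. It is only an illustration under loud assumptions: it uses the standard library's `sqlite3` and `tracemalloc` as a stand-in for the real database and does not call any Files-Rest API. The `files` table, the `profile_bulk_insert` function and the file names are hypothetical.

```python
import sqlite3
import tracemalloc


def profile_bulk_insert(n_files):
    """Insert n_files rows; return (statements per file, peak traced bytes).

    Stand-in for adding files to a bucket: the table layout is hypothetical
    and is not the Files-Rest schema.
    """
    statements = []
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (bucket_id TEXT, key TEXT, size INTEGER)")
    # From here on, record every SQL statement the connection executes
    # (this also captures implicit transaction statements, if any).
    conn.set_trace_callback(statements.append)

    tracemalloc.start()
    for i in range(n_files):
        conn.execute(
            "INSERT INTO files (bucket_id, key, size) VALUES (?, ?, ?)",
            ("bucket-1", f"file-{i}.root", 0),
        )
    conn.commit()
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    conn.close()

    return len(statements) / n_files, peak_bytes


if __name__ == "__main__":
    # Repeat with growing sizes to see how both numbers scale.
    for n in (100, 1000, 4000):
        ratio, peak = profile_bulk_insert(n)
        print(f"{n} files: {ratio:.2f} statements/file, peak {peak} bytes")
```

Running the same loop against the real code path at several sizes would show whether the statements-per-file ratio stays near the observed factor of 10 and how quickly peak memory grows.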

We should find out what the limits of a bucket in Invenio are, so that we can add them to the documentation and eventually solve the memory problem. To do so, we can replicate the conditions on Opendata:

  • Create ~4000 files through the Pythonic API
  • Create ~4000 files through the REST API
  • Try to list those files and operate on them

@tiborsimko
Member

Example: one dataset record consisting of 27K files: http://opendata.cern.ch/record/8884
