Fix: Upload Endpoints #565

Merged
merged 18 commits into from
Jun 20, 2024

Conversation

1yam
Collaborator

@1yam 1yam commented Apr 25, 2024

Issue

Issue Description:
The current file upload endpoint reads the entire file before checking if it exceeds the maximum allowed size. The size limits are 25 MB for unauthenticated uploads (MAX_UNAUTHENTICATED_UPLOAD_FILE_SIZE) and 1000 MB for authenticated uploads (MAX_UPLOAD_FILE_SIZE).

Proposed Solutions:

  • Read the file in chunks and check if the size exceeds the maximum allowed size.
  • Use a temporary file to handle the data, which helps avoid using too much memory.
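The two proposals above can be combined roughly as follows. This is a minimal sketch, not the code from this PR: the helper name spool_with_limit, the FileTooLarge exception and the 64 KiB chunk size are hypothetical, and the limits are assumed to be expressed in MiB.

```python
import tempfile
from typing import IO, BinaryIO

# Limits quoted in the issue description (assumed here to mean MiB).
MAX_UNAUTHENTICATED_UPLOAD_FILE_SIZE = 25 * 1024 * 1024
MAX_UPLOAD_FILE_SIZE = 1000 * 1024 * 1024

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not taken from the PR


class FileTooLarge(Exception):
    """Raised as soon as the running total exceeds the allowed size."""


def spool_with_limit(
    source: BinaryIO, max_size: int, chunk_size: int = CHUNK_SIZE
) -> IO[bytes]:
    """Copy `source` into a temporary file, aborting once `max_size` is exceeded."""
    temp_file = tempfile.TemporaryFile()
    total = 0
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break  # end of the upload
            total += len(chunk)
            if total > max_size:
                # Stop early: at most one extra chunk has been read into memory.
                raise FileTooLarge(f"upload exceeds {max_size} bytes")
            temp_file.write(chunk)
    except BaseException:
        temp_file.close()  # the temporary file is deleted when closed
        raise
    temp_file.seek(0)  # rewind so the caller can read from the start
    return temp_file
```

With this shape, memory use is bounded by the chunk size rather than by the upload size, and an oversized upload is rejected before the whole body has been read.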


codecov bot commented Apr 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.67%. Comparing base (555c166) to head (526de08).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #565      +/-   ##
==========================================
+ Coverage   94.60%   94.67%   +0.07%     
==========================================
  Files          89       89              
  Lines        4724     4770      +46     
  Branches      648      652       +4     
==========================================
+ Hits         4469     4516      +47     
+ Misses        233      232       -1     
  Partials       22       22              


@1yam 1yam marked this pull request as ready for review May 2, 2024 10:14
Member

@hoh hoh left a comment

Thanks. Would it make sense to add unit tests for the behavior specific to MultipartUploadedFile and RawUploadedFile?

src/aleph/web/controllers/storage.py (outdated, resolved)
@aliel aliel self-requested a review May 7, 2024 20:30
@hoh
Member

hoh commented May 8, 2024

@philogicae this should fix the file upload issue you reported.

@hoh hoh requested a review from Psycojoker May 8, 2024 14:53
@1yam
Collaborator Author

1yam commented May 8, 2024

post = await request.post()

Based on the aiohttp docs, the post() method looks like it's made for MultipartForm:

If method is not POST, PUT, PATCH, TRACE or DELETE or content_type is not empty or application/x-www-form-urlencoded or multipart/form-data returns empty multidict.

@hoh Maybe I should add a check and use read() for RawUploadedFile?

@aliel
Member

aliel commented May 15, 2024

We should use the multipart() method instead of post(). From the aiohttp docs:

You might have noticed a big warning in the example above. The general issue is that aiohttp.web.BaseRequest.post() reads the whole payload in memory, resulting in possible OOM errors. To avoid this, for multipart uploads, you should use aiohttp.web.BaseRequest.multipart() which returns a multipart reader:

=>
https://docs.aiohttp.org/en/stable/web_quickstart.html#file-uploads
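A rough sketch of what streaming a multipart part with a size check could look like. It is not the code merged in this PR: save_part_with_limit and UploadTooLarge are hypothetical names, and the helper is duck-typed so that it only relies on an awaitable read_chunk(size) method, which is what the part returned by aiohttp's multipart reader provides.

```python
import asyncio
import tempfile
from typing import IO


class UploadTooLarge(Exception):
    """Raised when the streamed part exceeds the allowed size."""


async def save_part_with_limit(part, max_size: int, chunk_size: int = 8192) -> IO[bytes]:
    """Stream `part` into a temporary file, chunk by chunk.

    `part` is any object with an async `read_chunk(size)` method that
    returns empty bytes at the end of the data.
    """
    temp_file = tempfile.TemporaryFile()
    total = 0
    try:
        while True:
            chunk = await part.read_chunk(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            if total > max_size:
                raise UploadTooLarge(f"upload exceeds {max_size} bytes")
            # Blocking write, kept simple for the sketch.
            temp_file.write(chunk)
    except BaseException:
        temp_file.close()
        raise
    temp_file.seek(0)
    return temp_file


# Inside an aiohttp handler, the wiring would look roughly like this
# (assumption: the file is the first part of the multipart body):
#
#     reader = await request.multipart()
#     part = await reader.next()
#     temp_file = await save_part_with_limit(part, max_size)
```

The write to the temporary file is synchronous here, which briefly blocks the event loop on each chunk; an asynchronous file library such as aiofiles would avoid that.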

@hoh

@Psycojoker
Collaborator

I did a quick general review because I don't know the business code very well. Some of my comments are more questions/suggestions; don't hesitate to ask if you have questions :)

@1yam
Collaborator Author

1yam commented May 30, 2024

@Psycojoker I made some updates:

  • Refactored UploadFile to avoid code duplication (you were right: with the changes made, it was much easier to do)
  • Used aiofiles
  • Addressed most of your comments

The main thing I didn't change is that we still write the upload to a temporary file first. This avoids launching the workflow if the file is too large, and we need to calculate the hash early as a safety check.
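To illustrate why the temporary file and the early hash fit together, here is a minimal sketch that computes the hash during the same pass that spools the data to disk. The function name spool_and_hash is hypothetical, and SHA-256 is an assumption; the hash algorithm is not stated in this discussion.

```python
import hashlib
import tempfile
from typing import IO, Iterable, Tuple


def spool_and_hash(chunks: Iterable[bytes], max_size: int) -> Tuple[IO[bytes], str]:
    """Write `chunks` to a temporary file and hash them in a single pass.

    Returns the rewound temporary file and the hex digest, so the caller
    can compare the digest against an expected value before starting any
    further processing.
    """
    hasher = hashlib.sha256()  # assumption: the real algorithm may differ
    temp_file = tempfile.TemporaryFile()
    total = 0
    try:
        for chunk in chunks:
            total += len(chunk)
            if total > max_size:
                raise ValueError(f"upload exceeds {max_size} bytes")
            hasher.update(chunk)
            temp_file.write(chunk)
    except BaseException:
        temp_file.close()
        raise
    temp_file.seek(0)
    return temp_file, hasher.hexdigest()
```

Because the digest is known as soon as the last chunk is written, both the size check and the hash check can run before the rest of the workflow is launched.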

I wasn't using the manager correctly; that should be fixed now.

@1yam 1yam requested a review from Psycojoker May 30, 2024 11:19
setup.cfg (outdated, resolved)
src/aleph/web/controllers/storage.py (outdated, resolved)
src/aleph/web/controllers/storage.py (outdated, resolved)
src/aleph/web/controllers/storage.py (outdated, resolved)
src/aleph/web/controllers/storage.py (outdated, resolved)
tests/api/test_storage.py (outdated, resolved)
@Psycojoker
Collaborator

Hugo suggested that we could do a call at some point if you want me to explain some of my comments.

@Psycojoker
Collaborator

We did a pair programming session, and it should be good on my side regarding good Python practices.

Lyam told me he needed to do one last test with the frontend before confirming it's good.

@1yam
Collaborator Author

1yam commented Jun 12, 2024

@hoh @Psycojoker I just finished my last test. Everything looks to be working as intended, so I would say it's ready to merge.

@hoh hoh merged commit 833bbdf into aleph-im:main Jun 20, 2024
4 checks passed