Read media files more efficiently from zipped files #9761

Open
rtibbles opened this issue Oct 5, 2022 · 3 comments · May be fixed by #12805

rtibbles commented Oct 5, 2022

Currently in our implementation of H5P, we load the entire H5P file into the frontend and then extract all its constituent files into Blob objects and create URLs for them.

This has the distinct advantage of being very robust and predictable. It has the slight downside of causing incredibly long loading times for H5P files that contain large bundled media files such as video or audio.

Once #9157 has been implemented, it would be useful to augment the zip file wrapper in the following ways:

Instead of reading the entire file at once, first read the file metadata and listing in the zip file via a range request. To accomplish this, we can try to either use zipinfo.js directly or vendor it and modify it for our needs.

For reference, this is a Python implementation of a similar mechanism: https://github.com/saulpw/unzip-http.
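As a rough illustration of the listing step, here is a minimal Python sketch that reads only the tail of an archive to find the end of central directory record and then parses the central directory. It is not taken from zipinfo.js or unzip-http. `list_zip_entries` and `fetch_range` are hypothetical names, and `fetch_range(start, end)` stands in for an HTTP range request by slicing an in-memory archive. Zip64 archives and encrypted entries are not handled.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end of central directory record
CD_SIGNATURE = b"PK\x01\x02"    # central directory file header
EOCD_SIZE = 22                  # fixed-size part of the EOCD record
CD_HEADER_SIZE = 46             # fixed-size part of each directory entry
MAX_COMMENT = 65535             # longest possible trailing archive comment


def list_zip_entries(fetch_range, total_size):
    """Return (name, compressed_size, uncompressed_size, local_header_offset) tuples.

    fetch_range(start, end) is a hypothetical callable returning the bytes
    start..end inclusive, in the way an HTTP Range request would.
    """
    # Request 1: the tail of the file, which must contain the EOCD record.
    tail_start = max(0, total_size - (EOCD_SIZE + MAX_COMMENT))
    tail = fetch_range(tail_start, total_size - 1)
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("end of central directory record not found")
    fields = struct.unpack("<4s4H2IH", tail[pos:pos + EOCD_SIZE])
    entry_count, cd_size, cd_offset = fields[4], fields[5], fields[6]

    # Request 2 is only needed when the directory is not already in the tail.
    if cd_offset >= tail_start:
        start = cd_offset - tail_start
        directory = tail[start:start + cd_size]
    else:
        directory = fetch_range(cd_offset, cd_offset + cd_size - 1)

    entries = []
    cursor = 0
    for _ in range(entry_count):
        header = struct.unpack(
            "<4s6H3I5H2I", directory[cursor:cursor + CD_HEADER_SIZE]
        )
        if header[0] != CD_SIGNATURE:
            raise ValueError("corrupt central directory")
        flags = header[3]
        compressed_size, uncompressed_size = header[8], header[9]
        name_len, extra_len, comment_len = header[10], header[11], header[12]
        local_header_offset = header[16]
        name_start = cursor + CD_HEADER_SIZE
        raw_name = directory[name_start:name_start + name_len]
        # Bit 11 of the flags marks a UTF-8 encoded file name.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        entries.append(
            (name, compressed_size, uncompressed_size, local_header_offset)
        )
        cursor = name_start + name_len + extra_len + comment_len
    return entries


# Demo: build a small archive in memory and list it through fake range requests.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("h5p.json", b"{}")
    archive.writestr("content/videos/clip.mp4", b"\x00" * 5000)
data = buffer.getvalue()
requests_made = []


def fetch_range(start, end):
    requests_made.append((start, end))
    return data[start:end + 1]


entries = list_zip_entries(fetch_range, len(data))
```

For small archives the whole central directory usually sits inside the first tail request, so the listing costs a single round trip. Larger archives need at most one more.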

Once we have the listing of all the files, we load and unzip, in a minimal number of requests, all files below a certain size cutoff (say 500KB). Loading of any file at or above that cutoff is deferred.
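One way to keep the number of requests small is to coalesce the byte ranges of adjacent small files. The sketch below assumes that approach. `Entry`, `plan_requests` and `max_gap` are hypothetical names, and comparing the uncompressed size against the cutoff is an assumption, since the spec above does not say which size is meant.

```python
from typing import List, NamedTuple, Tuple

SIZE_CUTOFF = 500 * 1024  # the "say 500KB" cutoff from the description


class Entry(NamedTuple):
    name: str
    offset: int       # offset of the entry's local header in the archive
    stored_size: int  # bytes the entry occupies in the archive
    size: int         # uncompressed size


def plan_requests(
    entries: List[Entry], cutoff: int = SIZE_CUTOFF, max_gap: int = 0
) -> Tuple[List[Tuple[int, int]], List[str]]:
    """Return (byte ranges to fetch eagerly, names of deferred files).

    Ranges are inclusive (start, end) pairs. Two ranges are merged when the
    gap between them is at most max_gap bytes.
    """
    eager = sorted((e for e in entries if e.size < cutoff), key=lambda e: e.offset)
    deferred = [e.name for e in entries if e.size >= cutoff]

    ranges: List[Tuple[int, int]] = []
    for entry in eager:
        start = entry.offset
        end = entry.offset + entry.stored_size - 1
        if ranges and start - ranges[-1][1] - 1 <= max_gap:
            # Close enough to the previous range: extend it instead.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges, deferred


example = [
    Entry("h5p.json", 0, 100, 200),
    Entry("content/video.mp4", 100, 600_000, 700_000),
    Entry("content/a.js", 600_100, 50, 80),
    Entry("content/b.js", 600_150, 50, 80),
]
ranges, deferred = plan_requests(example)
```

Raising `max_gap` trades a few wasted bytes for fewer round trips, which may be worthwhile when many small files are separated by slightly larger ones.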

For any files not loaded through the above mechanism, we do the following:

  • First, we enhance the zipcontent endpoint to support range requests, which will allow video and audio files to be played easily. Because extracted files have no seek method in Python 3.6, we will have to accept some inefficiency on Python 3.6 backends.
  • Secondly, we enhance the zipfile wrapper to accept a function that generates the URL for a large file. If this function is present, the wrapper defers to it for large files; if not, it behaves as it currently does. We then inject this function where it is needed to provide appropriate references to the zipcontent endpoint.
  • Thirdly, we will update the zipcontent endpoint to serve files from within any compressed file format (again).
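To make the range request step more concrete, here is a minimal sketch of serving a byte range from a file inside a zip using Python's `zipfile` module. It is not the actual zipcontent endpoint code. `parse_range` and `read_member_range` are hypothetical helpers, only single-range headers are handled, and the fallback branch reflects the Python 3.6 case, where the extracted file object cannot seek and the leading bytes have to be read and discarded.

```python
import io
import re
import zipfile

RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header, size):
    """Parse a single-range header into an inclusive (start, end), or None."""
    match = RANGE_RE.match(header.strip())
    if not match:
        return None
    first, last = match.groups()
    if not first and not last:
        return None
    if not first:
        # Suffix form "bytes=-N": the final N bytes of the file.
        length = int(last)
        if length == 0:
            return None
        start, end = max(0, size - length), size - 1
    else:
        start = int(first)
        end = min(int(last), size - 1) if last else size - 1
    if start > end or start >= size:
        return None
    return start, end


def read_member_range(archive, name, start, end):
    """Return bytes start..end (inclusive) of one file inside the archive."""
    with archive.open(name) as member:
        if member.seekable():
            member.seek(start)
        else:
            # No seek support: read and discard everything before start.
            remaining = start
            while remaining > 0:
                chunk = member.read(min(remaining, 64 * 1024))
                if not chunk:
                    break
                remaining -= len(chunk)
        return member.read(end - start + 1)


# Demo with an in-memory archive standing in for an H5P file.
payload = bytes(range(256)) * 10
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as out:
    out.writestr("content/audio.mp3", payload)

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    size = archive.getinfo("content/audio.mp3").file_size
    start, end = parse_range("bytes=10-19", size)
    body = read_member_range(archive, "content/audio.mp3", start, end)
```

A real endpoint would also need to return status 206 with a `Content-Range` header, and 416 when `parse_range` rejects the request. Those parts are left out here.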
rtibbles self-assigned this Oct 5, 2022

rtibbles commented Nov 3, 2022

A recent issue showed the usefulness of this: when the H5P contains particularly large media files (whether because they are very long or have not been adequately compressed), this can cause excessive memory usage on memory-constrained client devices, which in turn causes fflate to return undefined from its unzip command.

Avoiding the loading and unzipping of large media files until they are demanded would help to prevent this, while still achieving what client-side unpacking of H5P files was intended to do, which is to avoid the hundreds of HTTP requests initiated by a more naive implementation.


rtibbles commented Jan 2, 2024

I have a slight concern with this approach: the ability to generate an object URL for a MediaSource object, while currently widely supported, is being phased out and will be dropped in the future.

MediaSource objects instead have to be programmatically attached via the srcObject attribute. To avoid the complexity here, I think it might be simpler to handle files too large to be loaded as part of the main zip file in this way instead:

  • First, we enhance the zipcontent endpoint to support range requests, which will allow video and audio files to be played easily. Because extracted files have no seek method in Python 3.6, we will have to accept some inefficiency on Python 3.6 backends.
  • Secondly, we enhance the zipfile wrapper to accept a function that generates the URL for a large file. If this function is present, the wrapper defers to it for large files; if not, it behaves as it currently does. We then inject this function where it is needed to provide appropriate references to the zipcontent endpoint.
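The second point could look roughly like the following. The real wrapper lives in the frontend, so this Python version only illustrates the control flow. `ZipWrapperSketch`, `large_file_url`, the `blob:` placeholder and the `/zipcontent/...` path are all hypothetical and not taken from the codebase.

```python
from typing import Callable, Dict, Optional

SIZE_CUTOFF = 500 * 1024  # same illustrative cutoff as in the description


class ZipWrapperSketch:
    """Illustrative stand-in for the zip file wrapper (hypothetical API)."""

    def __init__(
        self,
        sizes: Dict[str, int],
        large_file_url: Optional[Callable[[str], str]] = None,
        cutoff: int = SIZE_CUTOFF,
    ):
        self.sizes = sizes                    # file name -> uncompressed size
        self.large_file_url = large_file_url  # optional injected URL generator
        self.cutoff = cutoff

    def _local_url(self, name: str) -> str:
        # Placeholder for the current behaviour: unzip the file on the
        # client and create an object URL for the resulting Blob.
        return "blob:" + name

    def url_for(self, name: str) -> str:
        is_large = self.sizes[name] >= self.cutoff
        if is_large and self.large_file_url is not None:
            # Defer to the injected function, which would point at the
            # zipcontent endpoint so the file is fetched via range requests.
            return self.large_file_url(name)
        # No generator supplied, or a small file: behave as before.
        return self._local_url(name)


def zipcontent_url(name: str) -> str:
    # Hypothetical URL pattern, for illustration only.
    return "/zipcontent/example.h5p/" + name


sizes = {"h5p.json": 200, "content/video.mp4": 40 * 1024 * 1024}
with_generator = ZipWrapperSketch(sizes, large_file_url=zipcontent_url)
without_generator = ZipWrapperSketch(sizes)
```

Keeping the generator optional means existing callers that never pass one see no change in behaviour, which matches the "if not, it behaves as it currently does" requirement.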

rtibbles commented

Spec has been updated to address @rtibbles' concerns here.
