Fixes #2257 : Temporary Directory Cleanup for Materializers #2560
base: develop
4f4e883 to cc37a09
Docs are updated. Fixed a couple of materializers I had missed.
docs/book/user-guide/advanced-guide/data-management/handle-custom-data-types.md
Some tests are failing in the CI, and you'll have to run the formatting script as well.
d50b890 to 972fd23
Ran the formatter and fixed the errors reported in the CI/CD check script output. One unit test in the numpy materializer is failing, but it fails on the develop branch as well. Otherwise it seems clean.
4d6c3fe to 8fa0db8
src/zenml/integrations/xgboost/materializers/xgboost_dmatrix_materializer.py
…reation onto the tempfile module
matrix = lgb.Dataset(temp_file, free_raw_data=False)
# Copy from artifact store to temporary file
fileio.copy(filepath, temp_file)
matrix = lgb.Dataset(temp_file, free_raw_data=False)
Ah, this was a tricky bug to catch :) With the current implementation, the matrix is lazily loaded later on, so if you dump it to a temp file with clean-up, the load will fail because the file is already gone. Clean-up here is therefore not feasible without heavy rework.
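To make the failure mode concrete, here is a minimal standard-library sketch. `LazyDataset` is a hypothetical stand-in for an object that only reads its file on first use; it is not the LightGBM API, and the file name is made up for illustration.

```python
import os
import tempfile


class LazyDataset:
    """Hypothetical stand-in: remembers a path and reads it only on first use."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._data = None

    def construct(self) -> bytes:
        # The file is opened here, not in __init__.
        if self._data is None:
            with open(self.path, "rb") as f:
                self._data = f.read()
        return self._data


def load_with_cleanup() -> LazyDataset:
    """Build the dataset inside a self-cleaning temporary directory."""
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_file = os.path.join(temp_dir, "train.bin")
        with open(temp_file, "wb") as f:
            f.write(b"rows")
        dataset = LazyDataset(temp_file)  # nothing has been read yet
    # Leaving the with-block removed temp_dir, so temp_file is gone.
    return dataset


def lazy_read_fails_after_cleanup() -> bool:
    """Return True if the deferred read hits the deleted file."""
    dataset = load_with_cleanup()
    try:
        dataset.construct()
    except FileNotFoundError:
        return True
    return False
```

In this sketch the object outlives the directory it points into, which is the situation described in the comment above: the cleanup itself works, but it runs before the data has actually been read.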
Describe changes
I changed how the integration materializers create temporary files and directories (#2257) so that they share a unified approach that doesn't require manual cleanup methods. The tempfile module provides context managers that handle cleanup for us.
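As a rough illustration of the pattern, the sketch below writes to a temporary directory and copies the result elsewhere. It assumes a plain local destination: `save_via_temp_dir` is a hypothetical helper, and `shutil.copy` stands in for whatever copy call a real materializer would use to reach the artifact store.

```python
import os
import shutil
import tempfile


def save_via_temp_dir(payload: bytes, destination_dir: str, filename: str = "data.bin") -> str:
    """Write payload to a temp file, copy it to destination_dir, return the temp dir path."""
    os.makedirs(destination_dir, exist_ok=True)
    with tempfile.TemporaryDirectory() as temp_dir:
        temp_file = os.path.join(temp_dir, filename)
        with open(temp_file, "wb") as f:
            f.write(payload)
        # Assumption: a local copy stands in for the artifact-store copy.
        shutil.copy(temp_file, os.path.join(destination_dir, filename))
    # The context manager has already deleted temp_dir and its contents here.
    return temp_dir
```

The returned path is only useful for checking that the directory no longer exists; no explicit cleanup call is needed, even if the write or copy raises.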
I didn't add anything to the tests because I'm not certain where we should enforce the usage of tempfile:
I think 1 and 2 are the best places to enforce this pattern. We don't need to test whether or not the tempfile context managers are working, we know they do, they have upstream tests. Any software tests of this pattern would really need to ask the question "are your integration materializers using tempfile context managers to manage your temporary files?", which is going to be difficult to do in code.
If the maintainers want me to take a crack at improving the tests to answer this question for the materializers, I will, but I am not sure that's the best way to accomplish compliance.
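One possible approximation of such a check, offered only as a sketch, is a static scan that flags calls named `mkdtemp` or `mkstemp`, since those leave cleanup to the caller. The function name `find_manual_temp_calls` and the set of flagged names are assumptions for illustration; the scan cannot confirm that cleanup actually happens, and it would miss aliased imports.

```python
import ast

# Assumption: these two names are treated as "manual cleanup" temp helpers.
MANUAL_TEMP_CALLS = {"mkdtemp", "mkstemp"}


def find_manual_temp_calls(source: str) -> list:
    """Return sorted (line, name) pairs for calls such as tempfile.mkdtemp() or mkdtemp()."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute):
            name = func.attr  # e.g. tempfile.mkdtemp
        else:
            name = getattr(func, "id", None)  # e.g. a bare mkdtemp
        if name in MANUAL_TEMP_CALLS:
            hits.append((node.lineno, name))
    return sorted(hits)
```

A test could run this over each materializer's source and assert that the result is empty, though that only enforces a naming convention rather than the behavior itself.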
Pre-requisites
Please ensure you have done the following:
develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing branch to develop.
Types of changes