Making it possible to recursively clean files in sub-directories #173
…clean sub-directories Signed-off-by: Adam.Dybbroe <[email protected]>
Signed-off-by: Adam.Dybbroe <[email protected]>
…ip logfile handler Signed-off-by: Adam.Dybbroe <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #173 +/- ##
==========================================
- Coverage 90.00% 89.48% -0.52%
==========================================
Files 24 26 +2
Lines 5100 5277 +177
==========================================
+ Hits 4590 4722 +132
- Misses 510 555 +45
You can have a look, and see what you think @TAlonglong @pnuu and @mraspaud
Some change requests inline.
if args.verbose:
    logging.basicConfig(level=logging.DEBUG, handlers=[handler], format=msgformat)
elif args.quiet:
    logging.basicConfig(level=logging.DEBUG, handlers=[handler], format=msgformat)
else:
    logging.basicConfig(level=logging.DEBUG, handlers=[handler], format=msgformat)
These are all equal. Should they have levels DEBUG/WARNING/INFO (or even ERROR for the quiet case)?
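To illustrate the point, here is a minimal sketch of how the three branches could map to different levels. The helper name select_log_level is hypothetical and not part of the PR, and the choice of WARNING for quiet mode is only one of the two options mentioned above (ERROR being the other).

```python
import logging


def select_log_level(verbose, quiet):
    """Map the two command-line flags to a logging level (hypothetical helper)."""
    if verbose:
        return logging.DEBUG
    if quiet:
        # Assumption: WARNING for quiet mode; ERROR would be the stricter choice.
        return logging.WARNING
    return logging.INFO


# Stand-ins for the handler and format string used in the script.
handler = logging.StreamHandler()
msgformat = "[%(asctime)s %(levelname)s] %(message)s"

logging.basicConfig(level=select_log_level(verbose=False, quiet=True),
                    handlers=[handler], format=msgformat)
```

With a helper like this, a single basicConfig call replaces the three identical ones.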
try:
    if os.path.isdir(filename):
        if not os.listdir(filename):
            os.rmdir(filename)
Directory deletion doesn't need publishing?
if len(files_in_dir) == 0:
    if is_dry_run:
        LOGGER.info("Would remove empty directory: %s", dirpath)
    else:
        try:
            os.rmdir(dirpath)
        except OSError:
            LOGGER.debug("Was trying to remove empty directory, but failed. Should not have come here!")
Extract to a new _remove_empty_directory() function.
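One possible shape for the extracted function, as a rough sketch only. The signature, the boolean return value and the use of os.listdir() to detect an empty directory are assumptions; the PR computes files_in_dir elsewhere.

```python
import logging
import os
import tempfile

LOGGER = logging.getLogger(__name__)


def _remove_empty_directory(dirpath, is_dry_run):
    """Remove dirpath if it is empty and return True if it was actually removed."""
    if os.listdir(dirpath):
        # Directory still has content, nothing to do.
        return False
    if is_dry_run:
        LOGGER.info("Would remove empty directory: %s", dirpath)
        return False
    try:
        os.rmdir(dirpath)
    except OSError:
        LOGGER.debug("Failed to remove empty directory: %s", dirpath)
        return False
    return True


# Small usage example on a temporary directory tree.
with tempfile.TemporaryDirectory() as base:
    empty_dir = os.path.join(base, "empty")
    os.mkdir(empty_dir)
    dry_run_result = _remove_empty_directory(empty_dir, is_dry_run=True)
    still_there = os.path.isdir(empty_dir)
    real_result = _remove_empty_directory(empty_dir, is_dry_run=False)
    gone = not os.path.exists(empty_dir)
```

Returning a boolean is a design choice made here so the caller could count removed directories; the PR may not need it.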
def clean_files_and_dirs(pub, filepaths, ref_time, stat_time_checker, is_dry_run):
    """From a list of file paths and a reference time clean files and directories."""
The docstring should be in imperative mood.
Suggested change:
-     """From a list of file paths and a reference time clean files and directories."""
+     """Clean files and directories defined by a list of file paths and a reference time."""
was_removed = False
if not is_dry_run:
    was_removed = remove_file(filepath, pub)
else:
    # print("Would remove %s" % filepath)
    LOGGER.info("Would remove %s" % filepath)

if was_removed:
    section_files += 1
    section_size += stat.st_size
The extra branch can be removed by rearranging the code.
Suggested change:
- was_removed = False
- if not is_dry_run:
-     was_removed = remove_file(filepath, pub)
- else:
-     # print("Would remove %s" % filepath)
-     LOGGER.info("Would remove %s" % filepath)
- if was_removed:
-     section_files += 1
-     section_size += stat.st_size
+ if not is_dry_run:
+     was_removed = remove_file(filepath, pub)
+     section_files += 1
+     section_size += stat.st_size
+ else:
+     LOGGER.info("Would remove %s" % filepath)
recursive = info.get('recursive')
if recursive and recursive == 'true':
    recursive = True
else:
    recursive = False
Suggested change:
- recursive = info.get('recursive')
- if recursive and recursive == 'true':
-     recursive = True
- else:
-     recursive = False
+ recursive = info.getboolean('recursive', False)
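A small standalone sketch of what the suggested call does, assuming info is a configparser section proxy (the PR does not show how info is created). The section names and option values are made up for illustration.

```python
import configparser

# Hypothetical configuration with one section that sets the flag and one that does not.
config = configparser.ConfigParser()
config.read_string("""
[with_flag]
recursive = true

[without_flag]
days = 1
""")

# The second positional argument is the fallback used when the option is missing.
recursive_on = config["with_flag"].getboolean("recursive", False)
recursive_off = config["without_flag"].getboolean("recursive", False)
```

Note that getboolean() also accepts values such as yes, on and 1 (case-insensitively) and raises ValueError for anything it does not recognise, so it is slightly more permissive than the original comparison against the exact string 'true'.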
kws = {}
for key in ["days", "hours", "minutes", "seconds"]:
    try:
        kws[key] = int(info[key])
    except KeyError:
        pass

ref_time = datetime.utcnow() - timedelta(**kws)
This should be a separate function, _get_reference_time() for example.
def test_clean_dir_non_recursive(fake_tree_of_some_files, tmp_path, caplog):
    """Test cleaning a directory for files of a certain age."""
The docstring doesn't match the actual test (nor the test function name).
"""Test cleaning a directory tree for files of a certain age.

Here we test using the modification time to determine when the file has been 'created'.
"""
Suggested change:
- """Test cleaning a directory tree for files of a certain age.
- Here we test using the modification time to determine when the file has been 'created'.
- """
+ """Test cleaning a directory tree for files which were created before the given time."""
"""Test cleaning a directory tree for files of a certain age.

Here we test using the modification time to determine when the file has been 'created'.
"""
Suggested change:
- """Test cleaning a directory tree for files of a certain age.
- Here we test using the modification time to determine when the file has been 'created'.
- """
+ """Test cleaning a directory tree for files that have not been modified after the given time."""
Refactor the file cleaning a little, and make it possible to recursively clean sub-directories
At SMHI we use the remove_it.py script to clean up after CSPP SDR processing, for instance. However, CSPP leaves behind a deep tree of files, something like 3-4 GB, and these are not needed once all granules have been processed (they are, however, likely needed while SDR granule processing is going on). The current remove_it.py script only does a file glob and does not walk down into the tree, so it requires something like the list of templates below in order to clean properly (and this one was not enough in our case):
Instead I would like to be able to do something like this: