implements processing files in serial manner && watch for new files && smarter delete to preserve largest file #57

Open
wants to merge 7 commits into master
Conversation

DZamataev

Implements processing files in a serial manner when num_processes==1, which makes DB insertions happen right after each hash instead of at the end of processing a single chunk (a whole library).

Also adds some prints for the total number of files to be processed and for successful inserts to the database.

…ype.

Error log was:

```
File "C:\Python37\lib\site-packages\magic\magic.py", line 196, in errorcheck_null
    raise MagicException(err)
magic.magic.MagicException: b"cannot read `filename.jpg' (Permission denied)"
```
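For context, a minimal sketch of how this unreadable-file error could be guarded, assuming python-magic's `magic.from_file` and that `MagicException` is exposed on the top-level `magic` module (as in the single-file python-magic distribution); this is not necessarily the fix the commit applies:

```python
import magic  # python-magic

def safe_mime_type(path):
    """Return the file's MIME type, or None when libmagic cannot read it,
    e.g. the Windows "Permission denied" error in the traceback above."""
    try:
        return magic.from_file(path, mime=True)
    except magic.MagicException:
        return None
```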
@DZamataev
Author

Thanks for sharing the awesome script, @philipbl!
It's really nice, but I've faced some issues on my Windows machine.
Installation was not flawless, I think because the pip requirements do not include everything. I had to rerun the script several times, getting error messages for absent modules (mongo, libmagic and so on), which I installed with pip one by one.
The next issue was multiprocessing. I am sorting thousands of files, and parallel execution led to stuck DB writes. The hashes were going OK, they were fast, but no real writes to the DB occurred after each hash, so when I interrupted my script the database was empty. I tried to fix it at first by using chunks on the parallel map function, but it didn't work because the files entries were not considered iterable by the parallel map; I don't know why. So I had to stick to serial execution, which I implemented in this PR. Please consider merging it, because it's a big yet seamless improvement. Another benefit of a single-process run with the serial map function is instant keyboard interruption; for some reason I had to wait for minutes after pressing CTRL+C with the parallel map approach.
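For illustration, a minimal self-contained sketch of the serial fallback described above; `hash_file` and `insert_into_db` are stand-ins, not the script's actual helpers:

```python
import hashlib
from multiprocessing import Pool

def hash_file(path):
    # Stand-in for the script's image-hashing step.
    with open(path, "rb") as f:
        return path, hashlib.md5(f.read()).hexdigest()

def insert_into_db(record):
    # Stand-in for the DB write; the real script inserts into MongoDB.
    print("inserted", record)

def process(files, num_processes=1):
    if num_processes == 1:
        # Serial path (this PR): insert right after each hash, so an
        # interrupted run keeps everything hashed so far, and CTRL+C
        # takes effect immediately.
        for f in files:
            insert_into_db(hash_file(f))
    else:
        # Parallel path: pool.map returns only after the whole chunk is
        # hashed, so nothing reaches the DB until the end.
        with Pool(num_processes) as pool:
            for record in pool.map(hash_file, files):
                insert_into_db(record)
```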
Best regards.

@DZamataev DZamataev changed the title implements processing files in serial manner implements processing files in serial manner and watch for new files Feb 1, 2019
…nt given. When a file modification occurs recursively in this path, the modified file will be added as if the `add` command had been chosen. Useful when you are sorting your library and adding new images to it. Removing is not supported yet.

Also bumps the Pillow version because of Python 3.6 compatibility issues I experienced on Windows. Not tested thoroughly after the update.
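A minimal sketch of what such a watch loop could look like using the watchdog package; `add_image` is a hypothetical stand-in for whatever the `add` command does per file:

```python
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class AddOnModify(FileSystemEventHandler):
    def on_modified(self, event):
        if not event.is_directory:
            add_image(event.src_path)  # hypothetical: same work as `add`

def watch(path):
    # Watch the path recursively and add files as they are modified.
    observer = Observer()
    observer.schedule(AddOnModify(), path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
```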
@DZamataev
Author

Added a function I needed to watch for incoming files and add them as they are modified. Also updated Pillow.

@DZamataev
Author

DZamataev commented Feb 1, 2019

> It's really nice, but I've faced some issues on my Windows machine.
> Installation was not flawless, I think because the pip requirements do not include everything. I had to rerun the script several times, getting error messages for absent modules (mongo, libmagic and so on), which I installed with pip one by one.

No issues after uninstalling Python 3.7 and installing Python 3.6 with the updated version of the Pillow dependency.
Only one extra dependency installation is necessary on Windows: libmagic. Use `pip install python-magic`.

…ameter to sort files by size and preserve the largest file on delete.
@DZamataev DZamataev changed the title implements processing files in serial manner and watch for new files implements processing files in serial manner && watch for new files && smarter delete to preserve largest file Feb 4, 2019
@DZamataev
Author

DZamataev commented Feb 4, 2019

Now I have also implemented the --filter-largest parameter to sort files by size and preserve the largest file on delete.
It is disabled by default for compatibility. Simply add --filter-largest to the find --delete command and it will delete only the smaller duplicates, leaving the largest file in place.
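For illustration, the selection logic might look like this sketch (hypothetical, not the PR's actual code):

```python
import os

def files_to_delete(duplicate_paths, filter_largest=False):
    """Given paths of files sharing the same hash, return those to delete.

    With filter_largest, paths are sorted by size descending so the
    largest file is the one kept; the rest are deleted.
    """
    if filter_largest:
        duplicate_paths = sorted(duplicate_paths,
                                 key=os.path.getsize, reverse=True)
    # Keep the first entry, delete the rest.
    return duplicate_paths[1:]
```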

@DZamataev
Author

DZamataev commented Feb 5, 2019

There was a critical issue, which is now fixed, but the tests still don't pass because of the added option.

@DZamataev
Author

DZamataev commented Feb 5, 2019

The tests will pass from now on. BTW, I have not covered the added features with tests.
