Look into optimizations for the initial "gather paths to analyze" phase #15

ssokolow · 2014-08-21T00:21:20Z

Unlike the other steps, I've done practically nothing to optimize the initial recursive tree traversal phase.

I'll want to do some cost-benefit research on the following, as well as identify other potential improvements:

- Look into the performance effect of checking whether excludes contain meta-characters and using simple string matching if they don't.
- As I understand it, fnmatch.fnmatch uses regexes internally and doesn't cache them. Given how many times it gets called, I should try using re.compile with fnmatch.translate instead.
- I should also look into the performance effect of programmatically combining multiple fnmatch.translate outputs so the ignore check can be handled in a single pass.
- Look into the memory-I/O trade-offs inherent in doing one stat call for each file and then caching the result so it can be used both for sizeClassifier and for things like inode-based hardlink detection.
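
For reference, here is a minimal sketch of how the ideas listed above might fit together. It is an illustration under stated assumptions, not the project's actual code: the helper names (`compile_excludes`, `is_excluded`, `gather_paths`) are hypothetical, excludes are assumed to be matched against base names only, and it assumes a recent Python 3 where `fnmatch.translate` emits scoped flags so its outputs can be joined with `|`. Whether any of this is faster in practice still needs to be measured.

```python
import fnmatch
import os
import re
import stat

# Characters that make a pattern a glob rather than a plain literal.
GLOB_CHARS = frozenset('*?[')


def compile_excludes(patterns):
    """Split excludes into a set of literals and one combined regex.

    Hypothetical helper: literal patterns need only a set lookup, while
    glob patterns are translated once and merged into a single regex.
    """
    literals = set()
    translated = []
    for pat in patterns:
        if GLOB_CHARS.intersection(pat):
            translated.append(fnmatch.translate(pat))
        else:
            literals.add(pat)
    combined = None
    if translated:
        # One alternation means one regex match per name.
        combined = re.compile('|'.join('(?:%s)' % t for t in translated))
    return literals, combined


def is_excluded(name, literals, combined):
    """Case-sensitive check (like fnmatch.fnmatchcase) against all excludes."""
    if name in literals:
        return True
    return combined is not None and combined.match(name) is not None


def gather_paths(root, excludes):
    """Walk root and return {path: stat result} for regular files.

    Each file is stat'ed exactly once; the cached result can later
    supply both st_size and (st_dev, st_ino) without more I/O.
    """
    literals, combined = compile_excludes(excludes)
    stat_cache = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames
                       if not is_excluded(d, literals, combined)]
        for name in filenames:
            if is_excluded(name, literals, combined):
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # File vanished or is unreadable; skip it.
            if stat.S_ISREG(st.st_mode):
                stat_cache[path] = st
    return stat_cache
```

The memory side of the trade-off is visible here: `stat_cache` holds one stat result per file for the whole run, in exchange for never calling stat on the same path twice.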


ssokolow changed the title ~~Look into optimization the initial "gather paths to analyze" phase~~ Look into optimizations for the initial "gather paths to analyze" phase Aug 21, 2014

ssokolow added the enhancement label Aug 21, 2014

ssokolow pushed a commit that referenced this issue Aug 21, 2014

Move TODOs for os.walk()-phase optimizations to issue #15.

428fdf7

ssokolow added the research label Aug 21, 2014

ssokolow added the confirmed label May 5, 2015
