Look into optimizations for the initial "gather paths to analyze" phase #15

ssokolow opened this issue Aug 21, 2014 · 0 comments

Unlike the other steps, I've done practically nothing to optimize the initial recursive tree traversal phase.

I'll want to do some cost-benefit research on the following and identify other potential improvements:

  • Look into the performance effect of checking whether excludes contain meta-characters and using simple string matching if they don't.
  • As I understand it, fnmatch.fnmatch uses regexes internally and relies only on a bounded internal cache of compiled patterns, on top of normalizing its arguments on every call. Given how many times it gets called, I should try using re.compile with fnmatch.translate instead.
  • I should also look into what the performance effect is of programmatically combining multiple fnmatch.translate outputs so the ignore check can be handled in a single pass.
  • Look into the memory-I/O trade-offs inherent in doing one stat call for each file and then caching it so it can be used both for sizeClassifier and for things like inode-based hardlink detection.
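
As a starting point for benchmarking, here is a minimal sketch of how the ideas above could fit together. It is not the project's current code: `build_exclude_matcher` and `gather_paths` are hypothetical names, the matcher is case-sensitive (like `fnmatch.fnmatchcase`), and it assumes excludes are matched against bare file and directory names rather than full paths. Whether any of it actually beats plain `fnmatch.fnmatch` depends on the Python version, since the module keeps its own cache of compiled patterns, so it would need measuring.

```python
import fnmatch
import os
import re

# Characters fnmatch treats as wildcards; a pattern without them is a literal.
_META_CHARS = frozenset('*?[')


def build_exclude_matcher(patterns):
    """Return a function reporting whether a name matches any pattern.

    Hypothetical helper: literal patterns go into a set for O(1) lookup,
    and the remaining globs are combined into one compiled regex.
    """
    literals = set()
    globs = []
    for pattern in patterns:
        if _META_CHARS.intersection(pattern):
            globs.append(pattern)
        else:
            literals.add(pattern)

    combined = None
    if globs:
        # fnmatch.translate() returns regex source anchored at the end, so
        # each alternative can be wrapped in a group and joined with "|".
        combined = re.compile('|'.join(
            '(?:%s)' % fnmatch.translate(glob) for glob in globs))

    def is_excluded(name):
        if name in literals:
            return True
        return combined is not None and combined.match(name) is not None

    return is_excluded


def gather_paths(root, is_excluded):
    """Walk root and return {path: stat result} for non-excluded files.

    Each file is stat'ed exactly once; later phases can read the size and
    the (st_dev, st_ino) pair from the cached result.
    """
    stat_cache = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk() skips them.
        dirnames[:] = [d for d in dirnames if not is_excluded(d)]
        for name in filenames:
            if is_excluded(name):
                continue
            path = os.path.join(dirpath, name)
            try:
                # lstat() so symlinks are not followed.
                stat_cache[path] = os.lstat(path)
            except OSError:
                continue  # File vanished or is unreadable; skip it.
    return stat_cache
```

The cached results trade memory for I/O: one stat result per file stays resident, but size grouping can use `st_size` and hardlink detection can compare `(st_dev, st_ino)` pairs without touching the filesystem again.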

ssokolow changed the title from 'Look into optimization the initial "gather paths to analyze" phase' to 'Look into optimizations for the initial "gather paths to analyze" phase' on Aug 21, 2014