-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up Store.last-modified
by comparing hashes instead of file contents
#2335
base: main
Are you sure you want to change the base?
Conversation
3239903
to
b908da8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this provides a huge speedup for large values!
Looking at the overall algorithm used by last_modified
, I believe there could be complexity explosion with fork/merge: it doesn't detect that a commit was already checked when reached from another path? I wonder if you have some ideas on how to fix this also :)
| Some x, Some y -> not (equal_contents x y) | ||
let+ e = Tree.find_tree tree key in | ||
match (e, current_tree) with | ||
| Some x, Some y -> Tree.hash x <> Tree.hash y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the generic <>
function isn't guaranteed to work for some hash representations, maybe consider using an equal_hash
function instead? (as done in other modules)
modified [path], in ascending order of date. [depth] is the maximum depth | ||
to be explored in the commit graph, if any. Default value for [number] is | ||
1. *) | ||
(** [last_modified ?n c k] is the list of the last [n] commits that modified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(** [last_modified ?n c k] is the list of the last [n] commits that modified | |
(** [last_modified ?depth ?n c k] is the list of the last [n] commits that modified |
@@ -1153,9 +1153,9 @@ module Make (B : Backend.S) = struct | |||
let current, current_depth = Heap.pop_minimum heap in | |||
let parents = Commit.parents current in | |||
let tree = Commit.tree current in | |||
let* current_value = Tree.find tree key in | |||
let* current_tree = Tree.find_tree tree key in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could also be optimized away since, except for the very first call, we already fetched the current_tree
on the previous recursive call on line 1175 below :) (but we can do this in a follow-up PR if it's unclear how to)
This commit changes the
Store.last_modified
function to compare the precomputed file hashes instead of their contents, as the latter is significantly (~10 to 20x) slower. :)(More specifically,
Tree.find
is way slower thanTree.find_tree
)It also changes the documentation of the function where variable
n
is wrongly referred to asnumber
.