Fixed FileRevs to handle renaming in another branch #60
Hey @linzhp, I really like that you added some unit tests! I will try to test this later today or tomorrow and will keep you informed about progress. @sduenas, if I test this successfully, is it OK if I merge it?
This pull request contains an updated test repository. To reproduce the problem, you could run the old code on the test repository attached in this PR.
Hey @linzhp, I tried to reproduce this with your code, but I was not successful.
Can you help me test it?
I think you just reproduced the bug that this PR is trying to fix. So try applying the changes in this PR, and the error will be gone.
__path_query__ = '''SELECT file_path FROM file_links,
(SELECT MAX(id) id FROM file_links WHERE file_id = ? AND commit_id <= ? ORDER BY commit_id DESC) fp
WHERE file_links.id = fp.id'''
__path_query__ = '''SELECT rev, file_path FROM file_links fl JOIN scmlog s
Isn't ORDER BY commit_id DESC, fl.id DESC
sufficient to get max file_links id?
I will try to test this again in the next few days. Today I was not successful :(
On Sunday, 23 February 2014, Zhongpeng Lin wrote:
In pycvsanaly2/extensions/FileRevs.py:
@@ -32,9 +34,8 @@ class FileRevs:
from scmlog s, action_files af where s.id = af.commit_id and s.repository_id = ? order by s.id'''
# This query selects the newest entry for those cases with two filepaths
# for the same file. See #3 for more info.
- __path_query__ = '''SELECT file_path FROM file_links,
- (SELECT MAX(id) id FROM file_links WHERE file_id = ? AND commit_id <= ? ORDER BY commit_id DESC) fp
- WHERE file_links.id = fp.id'''
+ __path_query__ = '''SELECT rev, file_path FROM file_links fl JOIN scmlog s
Isn't ORDER BY commit_id DESC, fl.id DESC sufficient to get max file_links id?
Probably it is enough, but I think your query can return more than one row, while the current query returns only one. Moreover, I think the current query is more readable.
@andygrunwald what do you think? Which one do you think is better?
Can you come up with a query that only returns one row and is guaranteed to be the right one, i.e., without the error in #65?
@sduenas I checked the queries.
The new query is fine, too. From my point of view, the new query is much more readable, because it does not need a subquery to generate a virtual table like the previous one does.
You are right, the new query can return more than one row. The previous query returns only one.
But it is possible to add a LIMIT 1 to the new query so that it returns only one row.
What do you think about this, @linzhp?
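To make the LIMIT 1 suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layouts, the ON and WHERE clauses, and the sample rows are assumptions made only for illustration; the full query from this PR is not shown in this thread, so this is not the actual `__path_query__`.

```python
import sqlite3

# In-memory database with a simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scmlog (id INTEGER PRIMARY KEY, rev TEXT);
CREATE TABLE file_links (
    id INTEGER PRIMARY KEY, file_id INTEGER, commit_id INTEGER, file_path TEXT);
INSERT INTO scmlog VALUES (1, 'aaa'), (2, 'bbb');
INSERT INTO file_links VALUES (1, 7, 1, 'old/name.py'), (2, 7, 2, 'new/name.py');
""")

# Assumed shape of a join-based query ordered as suggested in the review.
BASE_QUERY = """SELECT s.rev, fl.file_path
FROM file_links fl JOIN scmlog s ON fl.commit_id = s.id
WHERE fl.file_id = ? AND fl.commit_id <= ?
ORDER BY fl.commit_id DESC, fl.id DESC"""

# Without LIMIT every matching link comes back, newest first.
all_rows = conn.execute(BASE_QUERY, (7, 2)).fetchall()
# With LIMIT 1 only the newest link is returned.
one_row = conn.execute(BASE_QUERY + " LIMIT 1", (7, 2)).fetchall()

print(all_rows)
print(one_row)
```

The ordering does the work of MAX(id) here, and LIMIT 1 restores the single-row guarantee that the subquery version had.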
@andygrunwald if you find that it works, merge the changes, but before that, I think the problem with the query that I mentioned above should be fixed. Thanks to both of you for your work.
Same problem here. I've been unable to make this patch work with the input.tar.gz repo.
Zhongpeng, can you please post your config file and the commands you used to start it? And what was the result in the database before and after?
On Monday, 3 March 2014, Santiago Dueñas wrote:
Sorry, it was my bad. I made the test pass, but I didn't update the existing calls to FileRevs to use the new |
So I have to enable the Metrics extension or the Blame extension to see the new results, right?
Yes. Sent from my iPad
Any problems with testing this pull request?
I executed it on our test repository with the Metrics extension enabled. But there seems to be a change in behaviour. From my point of view, the behaviour of the patch (metrics only if a file changes) is correct.
Conflicts: pycvsanaly2/extensions/Blame.py pycvsanaly2/extensions/FileRevs.py pycvsanaly2/extensions/Metrics.py tests/file_path_test.py
Interesting... I just merged the master branch into this branch, deleted all tables, and ran CVSAnalY and Metrics again, and I could see 11 rows. Could you check out the branch https://github.com/linzhp/CVSAnalY/tree/concurrent_file_path to your local directory, and try again?
I made a bigger test today. Maybe this helps to understand my results.
My config file:
## Run in debug mode?
debug = True
#
## Run quiet
quiet = False
#
## Enable profiling
# profile = False
#
# repo_logfile = None
# save_logfile = None
# no_parse = False
#
## Database parameters
db_driver = 'mysql'
db_user = 'root'
db_password = None
# db_database = 'cvsanaly'
db_hostname = 'localhost'
## Extensions
## No extensions enable by default
extensions = ['Metrics']
#extensions = ['Content']
#
## Metrics extension options
# metrics_all = False
# metrics_noerr = False
My command:
Current master:
Current pr/60:
linzhp/CVSAnalY/tree/concurrent_file_path:
Results:
Everything you did seems good to me. As far as I can tell, the remaining error and the ID 6 were not introduced by this PR, and this PR was not meant to fix bugs in the Metrics extension. So you may want to create another issue for them. I don't see any problem with this PR now.
You are right.
I checked everything. A huge thanks to @linzhp for his endurance in fixing, explaining, and answering my testing questions :)
Fixed FileRevs to handle renaming in another branch
Adding Thanks @andygrunwald for reviewing! |
Problem
Commit B and commit C share the same parent A, but B and C are in different branches. B renamed file1 to file1.renamed, while C modified file1. C was made at a later time than B. When the branch of C is finally merged into the branch of B, commit C will appear after commit B in git log. When deciding the path of file1 at commit C, FileRevs found that the latest record in the file_links table was the one related to commit B, so it used file1.renamed as the path, which is wrong.
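The scenario can be reproduced in miniature with Python's built-in sqlite3 module. This is only a sketch: the simplified file_links schema, the numeric commit ids (A=1, B=2, C=3 in git log order), and the helper name `old_path_at` are assumptions for illustration. The query is the old `__path_query__` quoted in the review, minus the ORDER BY inside the subquery, which has no effect on MAX(id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_links ("
    "id INTEGER PRIMARY KEY, file_id INTEGER, commit_id INTEGER, file_path TEXT)"
)
conn.executemany(
    "INSERT INTO file_links (id, file_id, commit_id, file_path) VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "file1"),          # commit A (id 1) adds file1
        (2, 1, 2, "file1.renamed"),  # commit B (id 2) renames it on its branch
    ],
)

# Old lookup: newest link whose commit id is not greater than the given commit.
OLD_PATH_QUERY = """SELECT file_path FROM file_links,
    (SELECT MAX(id) id FROM file_links WHERE file_id = ? AND commit_id <= ?) fp
    WHERE file_links.id = fp.id"""


def old_path_at(file_id, commit_id):
    """Hypothetical helper: resolve a path the way the old query does."""
    row = conn.execute(OLD_PATH_QUERY, (file_id, commit_id)).fetchone()
    return row[0] if row else None


# Commit C (id 3) modified file1 on another branch, yet the lookup
# follows B's rename because it only compares commit ids.
path_at_c = old_path_at(1, 3)
print(path_at_c)
```

Ordering by commit id alone cannot tell that B is not in C's history, which is why the lookup picks the renamed path.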
Solution
Use is_ancestor method in repositoryhandler to decide whether a commit is an ancestor for another. In this case, B is not an ancestor of C. Only use the latest file_links record related to ancestors of a commit to decide the file path.
An alternative implementation is to use the commit graph to determine the ancestry relation. I am not sure which one is better. Since #40 is not yet accepted, I tried not to make this PR dependent on it.
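The idea of only trusting file_links records from ancestors can be sketched in plain Python. This is not the repositoryhandler API: the in-memory `PARENTS` graph, the `FileLink` tuple, and both functions below are hypothetical stand-ins, closer in spirit to the commit-graph alternative than to the actual implementation in this PR.

```python
from collections import namedtuple

# Hypothetical record: which commit set which path for the file.
FileLink = namedtuple("FileLink", "commit file_path")

# Hypothetical commit graph (child -> parents): B and C both descend
# from A, and M merges the two branches.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "M": ["B", "C"]}


def is_ancestor(graph, ancestor, descendant):
    """True if `ancestor` is `descendant` itself or reachable via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, []))
    return False


def path_at(graph, links, commit):
    """Return the newest link whose commit is an ancestor of `commit`.

    `links` is assumed to be in insertion (git log) order.
    """
    for link in reversed(links):
        if is_ancestor(graph, link.commit, commit):
            return link.file_path
    return None


LINKS = [FileLink("A", "file1"), FileLink("B", "file1.renamed")]

print(path_at(PARENTS, LINKS, "C"))  # B is skipped: it is not an ancestor of C
print(path_at(PARENTS, LINKS, "M"))  # the merge commit sees B's rename
```

With this filter, commit C resolves to file1 while the merge commit still resolves to file1.renamed, at the cost of one ancestry check per candidate record.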
@andygrunwald @sduenas please review, thanks!