Implements #288: support for two additional version ids (<bid>b.min and <bid>b.orig) #309

rjust · 2020-03-11T06:26:58Z

I took a first stab at implementing the two new version ids (#288). I focused on backward compatibility to avoid breaking existing functionality in the 2.0.0 release.

I ended up using b.min and b.orig (as opposed to b-min and b-orig) because I found Lang-2b.min and -v2b.min easier to read than Lang-2b-min and -v2b-min.

I added a basic test for checking those new version ids.
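The accepted version-id forms can be summarized as a small grammar. Below is a minimal Python sketch of such a check. It does not reproduce Defects4J's own vid regex in its Perl framework. `VID_PATTERN`, `VersionId`, and `parse_vid` are hypothetical names, and treating a plain `b` as `b.min` is an assumption about the backward-compatible default:

```python
import re
from typing import NamedTuple

# Hypothetical pattern for the version-id forms discussed in this PR:
#   <bid>f, <bid>b, <bid>b.min, <bid>b.orig
# The .min/.orig suffixes are only accepted after "b"; f.min and f.orig
# are rejected because they were deferred in this discussion.
VID_PATTERN = re.compile(r"(?P<bid>[1-9][0-9]*)(?:(?P<fixed>f)|b(?:\.(?P<variant>min|orig))?)")


class VersionId(NamedTuple):
    bid: int
    buggy: bool
    variant: str  # "min" or "orig" for buggy versions, "" for the fixed version


def parse_vid(vid: str) -> VersionId:
    """Parse a version id string into its bug id, kind, and variant."""
    m = VID_PATTERN.fullmatch(vid)
    if m is None:
        raise ValueError(
            f"Wrong version id format: {vid!r} "
            "(expected <bid>f, <bid>b, <bid>b.min, or <bid>b.orig)"
        )
    bid = int(m.group("bid"))
    if m.group("fixed"):
        return VersionId(bid, buggy=False, variant="")
    # Assumption: a plain "b" keeps its old meaning, i.e., behaves like "b.min".
    return VersionId(bid, buggy=True, variant=m.group("variant") or "min")
```

For example, `parse_vid("2b.orig")` returns `VersionId(bid=2, buggy=True, variant='orig')`, while `parse_vid("2f.orig")` raises `ValueError`. Keeping the error message in sync with the pattern avoids the kind of mismatch between message and regex that a later commit in this PR addresses.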

…e patches.

…extensions

…before giving up.

…nto vid-extensions

…nto simplify-apply

…nto vid-extensions

…nto simplify-apply

…lify-apply

…extensions

…nto vid-extensions

Greg4cr · 2020-03-11T07:27:58Z

@rjust - one minor suggestion/thought:

Should we also add version id <bid>f.orig as an alias for <bid>f? The notions of "min" and "orig" have no meaning for the fixed version, but such an alias could prevent accidental misuse.

I'm fine accepting this as is, but the thought occurred to me.
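One way to picture the alias idea is a lookup table that maps every accepted suffix to the version it actually denotes, resolved before any command runs. This is a minimal Python sketch, not Defects4J code. `SUFFIX_ALIASES` and `canonical_vid` are hypothetical, the `f.min`/`f.orig` entries are the aliases suggested above, and mapping plain `b` to `b.min` is an assumption:

```python
# Hypothetical alias table: each accepted version-id suffix maps to the
# version it actually denotes. "f.min" and "f.orig" are the suggested
# aliases; mapping plain "b" to "b.min" is an assumption about
# backward compatibility.
SUFFIX_ALIASES = {
    "b": "b.min",
    "b.min": "b.min",
    "b.orig": "b.orig",
    "f": "f",
    "f.min": "f",
    "f.orig": "f",
}


def canonical_vid(vid: str) -> str:
    """Return the canonical form of a version id, e.g. '3f.orig' -> '3f'."""
    # Split the leading bug id digits from the suffix.
    bid = vid[: len(vid) - len(vid.lstrip("0123456789"))]
    suffix = vid[len(bid):]
    if not bid or suffix not in SUFFIX_ALIASES:
        raise ValueError(f"Unknown version id: {vid!r}")
    return bid + SUFFIX_ALIASES[suffix]
```

With a single table, both the refactoring of `b` into a true alias and any later `f` aliases reduce to adding or changing one entry.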

jose · 2020-03-11T08:29:48Z

Thanks @rjust. Quick questions:

- Should we also provide the original (unminimized) patches alongside the minimal ones? E.g., defects4j/framework/projects/Chart/patches/1.src.patch and defects4j/framework/projects/Chart/patches/1.src.patch.orig? I'm aware that D4J can derive the original patch at checkout, but I'm wondering whether providing it would be useful for others. Or perhaps we could create yet another d4j command that provides the original patch for a minimized patch.
- Don't we have to update the scripts/D4J commands that depend on the metadata of each patch? Let me give you an example. Suppose that I check out the original buggy version of Chart-1 and then run the d4j-coverage command. Quoting the d4j-coverage command:

By default, code coverage is measured only for the classes modified by the bug fix.

In other words, by default, code coverage is measured only for the classes modified by the minimal bug fix. Since I checked out the original buggy version of Chart-1, shouldn't the d4j-coverage command measure coverage for the classes modified by the original bug fix instead?

Greg4cr · 2020-03-11T12:30:33Z

@jose @rjust - If we provide a .orig patch, we may also want to provide .orig versions of modified_classes, loaded_classes, and relevant_tests as well. Any thoughts on this?

rjust · 2020-03-11T14:43:32Z

Duplicating all metadata for the .orig versions seems a bit overkill and would grow the repo even more quickly. Given that most files would be identical for .min and .orig, maybe symlinks would be an option, though. Also, computing patches and the set of modified files is super quick and could be done on the fly.
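To illustrate computing modified classes on the fly: the file headers of a unified diff already name every touched file. The following is a minimal Python sketch, assuming unified-diff patches and a known source directory. `modified_classes` is a hypothetical helper, not part of Defects4J, and it deliberately simplifies by looking only at `---`/`+++` header pairs:

```python
from __future__ import annotations


def modified_classes(patch_text: str, src_dir: str) -> set[str]:
    """Derive fully qualified class names touched by a unified diff.

    Simplified sketch: only file-header pairs ("--- path" immediately
    followed by "+++ path") are inspected, and only .java files under
    src_dir count.
    """
    prefix = src_dir.rstrip("/") + "/"
    lines = patch_text.splitlines()
    classes: set[str] = set()
    for old, new in zip(lines, lines[1:]):
        if not (old.startswith("--- ") and new.startswith("+++ ")):
            continue
        for header in (old, new):
            # Drop the "--- "/"+++ " marker and any tab-separated timestamp.
            path = header[4:].split("\t", 1)[0].strip()
            if path == "/dev/null":  # side of an added or deleted file
                continue
            if path.startswith(("a/", "b/")):  # git-style one-level prefix
                path = path[2:]
            if path.startswith(prefix) and path.endswith(".java"):
                rel = path[len(prefix):-len(".java")]
                classes.add(rel.replace("/", "."))
    return classes
```

Because both header lines are read, the result does not depend on whether the patch is applied forward or in reverse. Patches with other top-level directory prefixes (the multiple `-p` settings mentioned in the commits) would need extra prefix handling.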

I agree that the notion of instrumenting modified classes by default is confusing. We could fix this by more precisely documenting the default. Each command does support custom class lists, so a user can always override the default -- having such class lists precomputed would make this easier. Alternatively, we can compute the set of modified files prior to running a command rather than using pre-computed metadata. Computing the set of relevant tests would be more of an issue, though.

Based on our discussion so far, it seems that we have the following goals:

- Consistent defaults between .min and .orig: either (1) always use the .min metadata or (2) precompute (where necessary) metadata for both .min and .orig.
- Minimize runtime: computing loaded classes and relevant tests on the fly is far too time-consuming.
- Convenience: accessing metadata on GitHub (or locally) allows for quick sanity checks and analyses.

Here is a concrete proposal:

- Command: we add a top-level command that can export (any) metadata for a bug.
- Consistency: all commands use the metadata for the checked-out project version. For example, coverage will instrument different classes for .min and .orig.
- Metadata repo: we move the precomputed metadata to a dedicated repository and add metadata for .min and .orig.

@Greg4cr, I am inclined to leave f.min and f.orig for later. Since we want to refactor b to be an actual alias rather than a dedicated version anyway, we could tackle the f versions at the same time. This also allows us to get broader feedback first, which could influence future changes.

Greg4cr · 2020-03-12T06:26:12Z

@rjust - I like your proposal. If we have a command that can generate the metadata for the original version, we can actually keep our repo quite clean while still enabling access to data:

- Our repo contains artifacts for the min version, as those require curation.
- The artifacts for the orig version do not need curation and can be extracted on demand with a command.

jose · 2020-03-12T10:35:07Z

@rjust, I do agree with your proposal. It would be tricky to keep the metadata repository up to date and synchronized with the D4J repository, but I think we can manage that. (I do prefer to have all metadata in place, i.e., without having to run any command or even clone D4J. It's just easier to consult metadata if it has been pre-computed.)

rjust · 2020-03-12T17:07:27Z

@jose I was thinking about a submodule solution, but I agree it will be less convenient. As an alternative, having a snapshot of all pre-computed metadata in a dedicated repo (clearly linked from the D4J README) and keeping only necessary metadata in D4J proper might be acceptable.

Btw, I just noticed that the current metadata in D4J proper is not "complete" -- for example, the modified_classes directory does not provide the list of modified test classes, but the patches directory does provide all test patches.

Let's group all the metadata into:

- Curated (e.g., minimized src patches)
- Pre-computed [instant] (e.g., modified files, original src patches, test patches)
- Pre-computed [long-running] (e.g., loaded classes, relevant tests)

We can put all of the above metadata (and possibly more) into a dedicated repo. We need to keep the curated files in D4J proper -- and probably want to keep the results of long-running analyses as well. I don't think we need to store metadata that can be instantly computed with a command (the dedicated metadata repo will still provide immediate access online). If we want to make pre-computed [instant] metadata available locally, then the init script could populate the corresponding folders.
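The storage rule in this proposal can be expressed as a classification plus one predicate. This is a minimal Python sketch; the metadata keys are illustrative names taken from the examples in the grouping, not Defects4J's actual identifiers, and `keep_in_d4j` is hypothetical:

```python
from enum import Enum


class Category(Enum):
    CURATED = "curated"
    INSTANT = "pre-computed [instant]"
    LONG_RUNNING = "pre-computed [long-running]"


# Illustrative classification following the grouping above; the keys are
# hypothetical names, not Defects4J's actual metadata identifiers.
CATEGORIES = {
    "minimized_src_patch": Category.CURATED,
    "modified_files": Category.INSTANT,
    "original_src_patch": Category.INSTANT,
    "test_patch": Category.INSTANT,
    "loaded_classes": Category.LONG_RUNNING,
    "relevant_tests": Category.LONG_RUNNING,
}


def keep_in_d4j(kind: str) -> bool:
    """Curated and long-running metadata stay in D4J proper; instant
    metadata can be recomputed by a command (or by the init script)."""
    return CATEGORIES[kind] is not Category.INSTANT
```

Under this rule, only the minimized src patches, loaded classes, and relevant tests would be stored in D4J proper. Everything in the instant category would live in the dedicated repo and be regenerated locally when needed.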

Any thoughts?

rjust · 2020-03-12T22:12:30Z

@Greg4cr and @jose, for this particular PR, let's make a decision on what the semantics of the -r flag (test, coverage, etc.) should be for b.orig project versions and make sure that this is implemented correctly. This likely involves more code changes and possibly a few metadata changes.

I think that we should tackle the re-org of the metadata, too, but in another PR.

jose · 2020-03-13T18:08:21Z

IMO, the -r flag should use .orig metadata whenever the original buggy version is checked out and .min metadata whenever the minimized buggy version is checked out. It might be a good idea to first address the re-org of the metadata in another PR and then revisit this.

Greg4cr · 2020-03-16T08:50:42Z

I agree with @jose - the current relevant_tests are generated based on the minimized patch. We should have a second set of relevant_tests generated for the original patch. If -r is used for the original version, we should run the relevant tests for that, not the ones calculated for the minimized version.

Given the growing complexity of the .orig/.min split, we may want to go ahead and tackle the creation of a metadata repo (or, at least, a refactoring of the metadata in our repo), and then refactor the code to use it.

One suggestion for the metadata repo - maintain separate .orig and .min directories for each type of metadata. This could prevent each folder from becoming too messy. For a given project, we would have the following directory structure:

Project
- failing_tests
- loaded_classes
  - orig
  - min
- modified_classes
  - orig
  - min
- patches
  - orig
  - min
- relevant_tests
  - orig
  - min
- trigger_tests

(assuming we create static metadata files instead of generating on the fly)
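With this layout, the -r semantics described above amount to picking the subdirectory that matches the checked-out variant. This is a minimal Python sketch, assuming the directory structure proposed here. `metadata_file` and `SPLIT_KINDS` are hypothetical, and the file names in the usage are illustrative:

```python
from pathlib import Path

# Metadata kinds that the proposed layout splits into orig/ and min/
# subdirectories; failing_tests and trigger_tests stay shared.
SPLIT_KINDS = {"loaded_classes", "modified_classes", "patches", "relevant_tests"}


def metadata_file(project_dir: Path, kind: str, file_name: str, variant: str) -> Path:
    """Locate a metadata file for the checked-out variant ("min" or "orig")."""
    if variant not in ("min", "orig"):
        raise ValueError(f"Unknown variant: {variant!r}")
    base = Path(project_dir) / kind
    if kind in SPLIT_KINDS:
        base = base / variant
    return base / file_name
```

For a b.orig checkout, `metadata_file(Path("Chart"), "relevant_tests", "1", "orig")` resolves to `Chart/relevant_tests/orig/1`, while shared kinds such as trigger_tests ignore the variant.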

jose · 2020-03-16T10:05:03Z

Yes, let's go ahead and address the metadata repository first. @Greg4cr, I like your suggestion.

mernst · 2024-08-04T16:16:57Z

@rjust What is the status of this pull request?

rjust added 20 commits March 9, 2020 22:44

- First stab at adding support for b.min and b.orig (3488d18)
- Added patchutils as a dependency (af8d6e9)
- Account for the fact that we need to strip the top-level dir from some patches. (dae65c9)
- Merge branch 'master' of https://github.com/rjust/defects4j into vid-extensions (b984623)
- Use the same apply command for Git and Svn; try multiple -p settings before giving up. (95158b7)
- Merge branch 'simplify-apply' of https://github.com/rjust/defects4j into vid-extensions (94ead5c)
- Moved subroutine to where it is called. (2437a01)
- Avoid crlf-related issues when applying Chart patches. (1f70a7f)
- Merge branch 'fix-crlf-issue' of https://github.com/rjust/defects4j into simplify-apply (9939817)
- Merge branch 'fix-crlf-issue' of https://github.com/rjust/defects4j into vid-extensions (7bbec92)
- Merge branch 'simplify-apply' of https://github.com/rjust/defects4j into vid-extensions (ca96085)
- Redirect all output. (5a91dfa)
- Output more debugging information in case apply_patch fails. (6b196e2)
- Merge branch 'fix-crlf-issue' of https://github.com/rjust/defects4j into simplify-apply (00b5908)
- Merge branch 'master' of https://github.com/rjust/defects4j into simplify-apply (9b03063)
- Merge branch 'master' of https://github.com/rjust/defects4j into vid-extensions (6f0db47)
- Merge branch 'simplify-apply' of https://github.com/rjust/defects4j into vid-extensions (5838155)
- Added a helper function to enumerate all project ids. (9b608cb)
- Added a test script. (127f49a)
- Updated travis config. (a09597b)

rjust requested review from Greg4cr and jose March 11, 2020 06:27

rjust added 2 commits March 11, 2020 07:01

- Merge branch 'master' of https://github.com/rjust/defects4j into vid-extensions (4c34cd2)
- Minor tweak to align error message with regex used for checking a vid. (0126ba7)
- Special case for JodaTime; stronger test case. (ae1a65a)
- Mockito projects need to be compiled before running test. (1908415)

rjust mentioned this pull request Mar 13, 2020

Implements #304: Adds deprecated-bugs.csv and refactors commit-db into active-bugs.csv #312

Merged

- Merge branch 'master' of https://github.com/rjust/defects4j into vid-extensions (b12a7f0)
- Merge ../defects4j-branch-master into vid-extensions (a514530)
