-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overload == and isequal for checking equality of Terms and Schemas #174
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good but needs some work. At a high level, could you say more about what the desired behavior is in the summary of the pull request at the top? When ARE two terms equal? When are two schema equal? Are there any potentially surprising situations where two things would NOT be equal? Is the behavior of isequal
and ==
different?
Also, there are a lot of changes here that don't have anything to do with checking equality, both whitespace/formatting, but also some functional stuff (at least one which breaks an existing test). Can you remove all the changes that aren't directly required to implement the proposed behavior for ==
and isequal
? And make sure that the existing tests all run. Best to check that locally but the travis CI checks will also tell you that (you can view the builds on the
Hi, Thanks for reviewing. I'll do the changes and add a summary of expected behaviour. |
Codecov Report
@@ Coverage Diff @@
## master #174 +/- ##
=========================================
+ Coverage 86.47% 86.87% +0.4%
=========================================
Files 9 9
Lines 510 564 +54
=========================================
+ Hits 441 490 +49
- Misses 69 74 +5
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking very close! I think the last thing (which I didn't think of before) is that there aren't tests for some of the added ==
methods, in particular the FunctionTerm
and MatrixTerm
s.
I have fixed all things you asked except adding tests for Matrix terms. I do not know how to create a matrix term or how to include one in a formula. Can you please give an example of one such term ? |
Co-Authored-By: Dave Kleinschmidt <[email protected]>
They come from doing |
New tests still not passing. |
Hi! I have fixed the broken tests. The latest build is now passing. |
@test schema(df) != schema(df7) | ||
@test schema(df) != schema(df8) | ||
@test schema(df8) != schema(df) | ||
@test apply_schema(f, schema(df)) == apply_schema(f, schema(df5)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First I was confused because schema(df) != schema(df5)
, so the result of applying them to the same formula shouldn't be equal. But then I realized that f
has already had a schema applied to it above, so this isn't testing what I thought it was testing. Which makes me wonder, what is the purpose of this test? apply_schema(@formula(y ~ 1 + a + log(b) + c + b&c), schema(df5))
would produce an error because y
isn't found in df5
. But since you've already applied a schema with y
in it above, you're actually just re-applying the schema terms that ARE found. I'm not sure that's a particularly useful test, unless you were specifically trying to test this edge case... so can you explain what the goal of this test is?
This PR aims to add functionality to compare two schema using == operator and isequal function. There is no difference between the two methods except that of the treatment of floating point numbers by == and isequal. Two schemas are equal if they are comprised by the same terms. All the terms to the two schema should be common. Every term in the first schema should have an equivalent term in the second schema.
A special case may arise where the currently proposed implementation does not function properly. Example:-
df1 = (y = y, a = 1:9, b = b, c = repeat(["d", "e", "f"], 3))
df2 = (y = [df1.y; df1.y], a = [1:9; 1:9], b = [b; b], c = [df1.c; df1.c])
are unequal because the variance of the Continuous terms varies inversely with number of data points.
This PR will close #173.