Overload == and isequal for checking equality of Terms and Schemas #174

ghost · 2020-02-22T10:01:22Z

This PR aims to add functionality to compare two schema using == operator and isequal function. There is no difference between the two methods except that of the treatment of floating point numbers by == and isequal. Two schemas are equal if they are comprised by the same terms. All the terms to the two schema should be common. Every term in the first schema should have an equivalent term in the second schema.

Constant terms are equal if the same is number represented by them.
Formula terms are equal if the two terms have same LHS components and same RHS components respectively.
Function terms are equal if the same function applied is to both and same arguments are passed to both.
Interaction terms are equal if the same terms are present in the interaction for both terms.
Intercept terms are equal if they have the same width.
Continuous terms are equal if all invariants are equal.
Categorical terms are equal if the width and contrast matrices of the terms are equal respectively.
Terms with differently named variables are always unequal.

A special case may arise where the currently proposed implementation does not function properly. Example:-
df1 = (y = y, a = 1:9, b = b, c = repeat(["d", "e", "f"], 3))
df2 = (y = [df1.y; df1.y], a = [1:9; 1:9], b = [b; b], c = [df1.c; df1.c])
are unequal because the variance of the Continuous terms varies inversely with number of data points.

This PR will close #173.

kleinschmidt

This looks pretty good but needs some work. At a high level, could you say more about what the desired behavior is in the summary of the pull request at the top? When ARE two terms equal? When are two schema equal? Are there any potentially surprising situations where two things would NOT be equal? Is the behavior of isequal and == different?

Also, there are a lot of changes here that don't have anything to do with checking equality, both whitespace/formatting, but also some functional stuff (at least one which breaks an existing test). Can you remove all the changes that aren't directly required to implement the proposed behavior for == and isequal? And make sure that the existing tests all run. Best to check that locally but the travis CI checks will also tell you that (you can view the builds on the

src/terms.jl

test/schema.jl

ghost · 2020-02-29T18:11:28Z

Hi, Thanks for reviewing. I'll do the changes and add a summary of expected behaviour.

codecov-io · 2020-03-09T05:50:44Z

Codecov Report

Merging #174 into master will increase coverage by 0.4%.
The diff coverage is 90%.

@@            Coverage Diff            @@
##           master     #174     +/-   ##
=========================================
+ Coverage   86.47%   86.87%   +0.4%     
=========================================
  Files           9        9             
  Lines         510      564     +54     
=========================================
+ Hits          441      490     +49     
- Misses         69       74      +5

Impacted Files	Coverage Δ
src/schema.jl	`84% <100%> (+2.01%)`	⬆️
src/terms.jl	`87.21% <81.25%> (-0.82%)`	⬇️
src/contrasts.jl	`87.95% <0%> (+0.45%)`	⬆️
src/temporal_terms.jl	`84% <0%> (+4%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 749fdcd...ee16796. Read the comment docs.

kleinschmidt

This is looking very close! I think the last thing (which I didn't think of before) is that there aren't tests for some of the added == methods, in particular the FunctionTerm and MatrixTerms.

src/schema.jl

src/terms.jl

test/schema.jl

src/terms.jl

ghost · 2020-03-16T08:11:25Z

I have fixed all things you asked except adding tests for Matrix terms. I do not know how to create a matrix term or how to include one in a formula. Can you please give an example of one such term ?

Co-Authored-By: Dave Kleinschmidt <[email protected]>

kleinschmidt · 2020-03-16T15:54:18Z

They come from doing apply_schema, which creates them on the RHS of the formula.

kleinschmidt · 2020-03-30T14:29:57Z

New tests still not passing.

ghost · 2020-03-31T13:38:08Z

Hi! I have fixed the broken tests. The latest build is now passing.

kleinschmidt · 2020-03-31T14:32:03Z

test/schema.jl

+    @test schema(df) != schema(df7)
+    @test schema(df) != schema(df8)
+    @test schema(df8) != schema(df)
+    @test apply_schema(f, schema(df)) == apply_schema(f, schema(df5))


First I was confused because schema(df) != schema(df5), so the result of applying them to the same formula shouldn't be equal. But then I realized that f has already had a schema applied to it above, so this isn't testing what I thought it was testing. Which makes me wonder, what is the purpose of this test? apply_schema(@formula(y ~ 1 + a + log(b) + c + b&c), schema(df5)) would produce an error because y isn't found in df5. But since you've already applied a schema with y in it above, you're actually just re-applying the schema terms that ARE found. I'm not sure that's a particularly useful test, unless you were specifically trying to test this edge case... so can you explain what the goal of this test is?

ghost requested a review from kleinschmidt February 22, 2020 10:11

kleinschmidt requested changes Feb 29, 2020

View reviewed changes

src/terms.jl Outdated Show resolved Hide resolved

test/schema.jl Show resolved Hide resolved

Overload == and isequal to check schema equality

0a6d78d

Remove unneccessary changes.

1ca55c0

kleinschmidt requested changes Mar 15, 2020

View reviewed changes

Consistent formatting style and add more tests.

e6f8b87

ghost requested a review from kleinschmidt March 16, 2020 08:15

Update src/terms.jl

2f03a7c

Co-Authored-By: Dave Kleinschmidt <[email protected]>

amCap1712 added 2 commits March 16, 2020 23:14

Add more tests.

ef367d7

Fix broken tests.

993829b

Fix broken tests.

ee16796

kleinschmidt reviewed Mar 31, 2020

View reviewed changes

kleinschmidt mentioned this pull request Sep 21, 2021

equality and hash for terms and schemas #241

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Overload == and isequal for checking equality of Terms and Schemas #174

Overload == and isequal for checking equality of Terms and Schemas #174

ghost commented Feb 22, 2020 •

edited by ghost

Loading

kleinschmidt left a comment

ghost commented Feb 29, 2020

codecov-io commented Mar 9, 2020 •

edited

Loading

kleinschmidt left a comment

ghost commented Mar 16, 2020

kleinschmidt commented Mar 16, 2020

kleinschmidt commented Mar 30, 2020

ghost commented Mar 31, 2020

kleinschmidt Mar 31, 2020

Overload == and isequal for checking equality of Terms and Schemas #174

Are you sure you want to change the base?

Overload == and isequal for checking equality of Terms and Schemas #174

Conversation

ghost commented Feb 22, 2020 • edited by ghost Loading

kleinschmidt left a comment

Choose a reason for hiding this comment

ghost commented Feb 29, 2020

codecov-io commented Mar 9, 2020 • edited Loading

Codecov Report

kleinschmidt left a comment

Choose a reason for hiding this comment

ghost commented Mar 16, 2020

kleinschmidt commented Mar 16, 2020

kleinschmidt commented Mar 30, 2020

ghost commented Mar 31, 2020

kleinschmidt Mar 31, 2020

Choose a reason for hiding this comment

ghost commented Feb 22, 2020 •

edited by ghost

Loading

codecov-io commented Mar 9, 2020 •

edited

Loading