Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shouldn't take object properties order into consideration #65

Open
awvalenti opened this issue May 9, 2016 · 28 comments
Open

Shouldn't take object properties order into consideration #65

awvalenti opened this issue May 9, 2016 · 28 comments
Labels

Comments

@awvalenti
Copy link
Contributor

I liked your library a lot, much simpler than most of the others. I found that a test here failed because the "id" property was in the end of the object instead of the beginning. According to json.org: "An object is an unordered set of name/value pairs".

@awvalenti
Copy link
Contributor Author

Here's a simple test case for the scenario described:

assertEquals(Json.parse("{\"a\":1,\"b\":2}"), Json.parse("{\"b\":2,\"a\":1}"));

@bernardosulzbach
Copy link
Contributor

Right now names and values are stored using lists, so an order-independent comparison could be as expensive as O(m·n) where m and n are the number of values. How do other reference libraries do it?

I don't care about this from a practical standpoint (never showed up to me until now), but from a design perspective, this seems to be an issue. Good catch, @awvalenti.

@ralfstx
Copy link
Owner

ralfstx commented May 10, 2016

Actually, considering objects with the same set of members but a different order as not equal was a conscious decision.

I see JsonObject and JsonArray more as a representations of a JSON text that has been parsed or that is about to be read, than as collection classes to store data in an application. That's also why these classes don't implement Java Collection interfaces. And since the resulting JSON strings would be different, I didn't consider the JsonObjects equal.

The JsonObject#equals() method as it is currently implemented is a valid equivalence relation, as it is symmetric, reflexive, and transitive.

However, I understand the request and suppose that it would be helpful to ignore the order, e.g. in unit tests. How about adding an additional equals method like equalsIgnoreOrder(), comparable to String#equalsIgnoreCase()?

@bernardosulzbach
Copy link
Contributor

@ralfstx I like the idea. I don't know if your reasoning for the equals method has been documented anywhere else, but it would be nice to have it in the source, not only on this issue.

Also, I would appreciate if the internal representation of JsonObject didn't change. This should not cause a performance hit, right?

@ralfstx
Copy link
Owner

ralfstx commented May 10, 2016

@mafagafogigante right, we should add JavaDoc for JsonObject#equals(). Currently, it inherits from Object#equals. Good point!

As for the performance, I don't think it's a problem if the new equalsIgnoreOrder() is slower than equals(), the documentation could explain that there is a price. We shouldn't sacrifice the internal structure just to speed up this method. Even if it's O(n^2), I suppose it will be fast enough for reasonable input sizes. Moreover, there's still a small hash table that will make it much faster for smaller objects.

@bernardosulzbach
Copy link
Contributor

@ralfstx glad we agree on Javadoc'ing what we consider JsonObject equality. Documenting the decision is even more important than the decision itself.

I don't really care about the performance of equalsIgnoreOrder(), I just don't want the internal representation to change to allow for this. Again, good that we are on the same page about that too.

@awvalenti I suppose that this completely addresses the raised issue (as long as design goes). Do you have any remarks or comments on points we missed?

@awvalenti
Copy link
Contributor Author

Hello,

I see now that it was a conscious decision, especially after I found a test case specifically stating that. I don't quite agree that JsonObject and Array should represent a "text JSON", although I understand your point of view.

An equalsIgnoreOrder() would work. However, test output would be something like "expected true, found false" - not very informative.

Another possible solution would be to provide a toMap() method on JsonObject and toList() on JsonArray. They would lead to more clear test outputs, like "expected {a=1,b=2}, found {c=3}".

What do you think?

@bernardosulzbach
Copy link
Contributor

I think I disagree mainly because of the design rationale of this library. We start by giving it a equalsIgnoreOrder() because it seems almost a requirement to conform to JSON, then we get toMap and toList and two days later this is a clone of Jackson. It is not my call in the end, but I think that toMap and toList are a bit overboard. Not to mention that implementing those yourself if you need them shouldn't be hard.

@awvalenti
Copy link
Contributor Author

Clone of Jackson? I've used Jackson a little and had problems with it, but why would these features lead minimal-json to becoming a copy of Jackson?

And what's your opinion about test output on failure?

@bernardosulzbach
Copy link
Contributor

bernardosulzbach commented May 19, 2016

minimal-json cannot and should not get "nice features" for the sake of convenience.

And what's your opinion about test output on failure?

I've missed that one, please provide a brief summary of what the question is.

@awvalenti
Copy link
Contributor Author

The question is:

An equalsIgnoreOrder() would work. However, test output would be something like "expected true, found false" - not very informative.

Another possible solution would be to provide a toMap() method on JsonObject and toList() on JsonArray. They would lead to more clear test outputs, like "expected {a=1,b=2}, found {c=3}".

@bernardosulzbach
Copy link
Contributor

I do agree that "expected true, found false" is not very informative, as any reasonable developer would. However, I don't think that we should expose internals or add methods to address this type of issue. Better messages in your assertion calls would remediate it, as in "the JsonArray does not have the expected elements". Furthermore, if you want the explicit values to be shown, it shouldn't be hard to do in a high level language such as Java for your own project, locally.

minimal-json (emphasis mine, again) is meant to stay away from such verbosity and expose only a clean and short API, which it has been successful in doing so far. It is not meant to be a fully-fledged, we-solve-everything, gargantuan library. This is why I think that these sort of conversion methods do not belong in here.

@awvalenti
Copy link
Contributor Author

awvalenti commented May 28, 2016

You got me convinced that minimal-json should in fact be minimal :).

However, equalsIgnoreOrder() also follows the direction of non-minimal approach, although on a lesser degree than toMap().

My opinion remains that equals() should not consider elements order. This is in accordance with JSON specification, which, for me, should be the top priority. To do this, I see two ways: one is O(m . n) and the other one is O(n) (fixed previous typo saying that it was O(1)).

The O(m . n) solution keeps the current implementation of JsonObject and just modifies the equalsmethod to iterate on the two ArrayLists.

The O(n) solution would be to use LinkedHashMap instead of ArrayLists. LinkedHashMap works as a map, but preserves insertion order. As far as I know, this would result in a O(n) implementation (simply call LinkedHashMap#equals).

I don't see any drawbacks on the second suggestion. Do you?

@bernardosulzbach
Copy link
Contributor

To do this, I see two ways: one is O(m . n) and the other one is O(1).

O(1) was a typo, right? The second solution is linear, not constant.

The O(n) solution would be to use LinkedHashMap instead of ArrayLists. LinkedHashMap works as a map, but preserves insertion order. As far as I know, this would result in a O(n) implementation (simply call LinkedHashMap#equals).

Equality checks would be linear indeed. However, populating a properly written dynamic array (ArrayList here) is substantially faster than populating a LinkedHashMap. The doubly linked list is not very expensive, but every insertion will call hashCode() now, which is expensive in some cases, while ArrayList's add() never calls it.

If we had to comply to the standard, your latter solution would be a right way to do it. However, as this does not seem like a requirement, I am against it. The performance hit is not trivial even for use cases I have.

I agree with you that equalsIgnoreOrder() is also against the design philosophy. However, differently from toMap() and toList() (which I will tell you to write yourself), I don't think it is a sensible thing to do to ask everyone to write ordering-independant equality checks. This is debatable, however, and may not be implemented if @ralfstx thinks it should not be.

I just need to make it very clear to you that the increased cost for addition would introduce a noticeable performance regression. This would affect users of minimal-json. If it wasn't for this, I would also defend the ordering-independent equality behavior.

@awvalenti
Copy link
Contributor Author

Oops, O(1) was a typo, indeed. Sorry!

I see your point about performance and agree with it.

I haven't understand this part: "ask everyone to write ordering-independant equality checks". Which checks would those be?

About toMap() and toList(), it seems to me now that they could simply be implemented on the test case (something like assertEquals(toMap(expectedJsonObject), toMap(obtainedJsonObject)).

@awvalenti
Copy link
Contributor Author

Wait a second... I thought the map to be used would be a generic Map<K, V>. But it's not: JsonObject keys are always Strings. The map would be Map<String, JsonValue>. String's hashCode, from what I searched, is O(n) on the first call (where n is its length) and O(1) on subsequent calls (since the result is cached). In addition, properties names are usually just a few characters.

Are there examples in which this would raise performance issues even so?

@bernardosulzbach
Copy link
Contributor

I haven't understand this part: "ask everyone to write ordering-independant equality checks". Which checks would those be?

I simply meant that I believe the library should provide an equalsIgnoreOrder() method. This is not something trivial to delegate to users. It is to some users, but not for all of them, I suppose. Also, by simply having an equals() and an equalsIgnoreOrder() methods it is made clear that equals() may not ignore ordering. Documentation in code at its best.

I am not sure if String hashCode() is cached on all implementations. However, I am fairly confident that it is on all that I use. Just remember that, in the end, even if it "works" for me and you, we shouldn't ignore the rest of the world.

Still on the caching issue, I am fairly certain that the library would not reuse the String objects of one Json object in other Json objects that also have it. This implies that the effective hashCode caching would not happen as you think it will. See this line of code to understand what I am talking about.

There is also the case in which you are creating JsonObject objects from Java code, in which you could ensure String reuse (and therefore hashCode caching) yourself, but the JsonParser cannot do it at parsing time.

In addition, properties names are usually just a few characters.

This is a good point. However, "usually" is not good enough for a library that needs to deal with big amounts of data (sometimes in real time). Additionally, usually there are tons of objects, each with several properties of its own, which implies a lot of hashCode() calls that are not going to be cached.

@ralfstx
Copy link
Owner

ralfstx commented May 30, 2016

@awvalenti we don't use HashMap for performance reasons. Our implementation is faster, especially for small objects, has a smaller memory footprint, and does not create wrapper objects for every element, which results in less work for the garbage collector. The same applies to LinkedHashMap.

@ralfstx
Copy link
Owner

ralfstx commented May 30, 2016

@mafagafogigante I'm not generally opposed to the idea of a toMap() method, but I see your concerns. In particular, I wonder how much value a Map<String, JsonValue> would provide beyond the use case of unit testing (which I absolutely see). I'm also afraid that some users would expect a recursive implementation–which I'd consider out of scope (and the generic type rules it out anyway). But then again, some tests may actually require a deep comparison, and toMap() would be useless for this purpose...

The implementation of toMap() would be straightforward, maybe even clearer than an implementation of equalsIgnoreCase() - e.g. how would you expect the latter to handle duplicate keys? With the semantics of a map, it's obvious that subsequent members would overwrite preceding ones.

I wouldn't consider performance to be an issue, as this function would not be used in the context of parsing or writing JSON, would it?

@bernardosulzbach
Copy link
Contributor

@mafagafogigante I'm not generally opposed to the idea of a toMap() method, but I see your concerns.

@ralfstx I think you misunderstood me a little bit. I don't really care if toMap() and toList() are implemented as long as the internal implementation of the current objects is left unchanged. I think they shouldn't be implemented on a lean library, and your arguments against a toMap() seem to make this even more obvious.

The implementation of toMap() would be straightforward, maybe even clearer than an implementation of equalsIgnoreCase() - e.g. how would you expect the latter to handle duplicate keys?

You mean equalsIgnoreOrdering(), right? Well, I haven't seen duplicate keys used in practice but as far as I remember the standard is indifferent about it. However, for instance, this reference implementation does not allow for duplicate keys. This library does, which is also correct according to the JSON standard. Ultimately, I do not believe that the standard defines when two JSON objects are equal. I think that a sensible thing to do is to see if each and every key-value pair of an object has an identical key-value pair in the other object, not allowing the same key-value pair to be matched multiple times.

This would make {a: b} != {a: b, a: b} but {a: b, a: b} == {a: b, a: b}.

Implementation-wise: sort by a chained comparator of a key comparator and a value comparator, walk from first to last element and fail as soon as you get two entries that differ. This assumes you will also fail fast on the number of entries being different.

Then we get "fast" comparison in O(n) with what we currently have and "correct" comparison in O(n lg n), which is only required in a few cases. Selection sort would be a good idea if most objects happened to be different as failing it midway would reduce the running time substantially, but any superlinear algorithm seems a simpler (and more likely correct) solution. Just don't let users even specify which algorithm they want us to use internally at runtime for equalsIgnoreCase().

@awvalenti
Copy link
Contributor Author

This "duplicate key" thing is a bit scary :)... I was thinking about it yesterday. I thought the JSON spec prohibited it, but it simply says nothing about it, leaving for the implementation to decide. By the way, if duplicate keys are deliberately accepted by minimal-json, I believe it should have unit tests for that. Does it?

In my opinion, toMap would only be useful if recursive.

What if we simply add equalsIgnoreOrder and implement it by creating temporary LinkedHashMaps, adding elements from both JsonObjects and return one.equals(another)? Silly as it may sound, it's a simple, O(n) solution, which will not affect current implementation at all and will rarely be called, anyway.

@awvalenti
Copy link
Contributor Author

awvalenti commented May 31, 2016

@mafagafogigante About the caching issue, you're right, it wouldn't work as expected. I was going to suggest calling String.intern(nonCachedString), but just found out that it is extremely slow!

@bernardosulzbach
Copy link
Contributor

By the way, if duplicate keys are deliberately accepted by minimal-json, I believe it should have unit tests for that. Does it?

I don't know. It is well documented, so I never even bothered checking it. Buy you may check it yourself. And yes, there should be some test cases using duplicate keys.

Additionally, a quick read of the JsonObject tend to make things very clear. But not many users are willing to read through the library code.

In my opinion, toMap would only be useful if recursive.

And that's what most users would expect it to be.

What if we simply add equalsIgnoreOrder and implement it by creating temporary LinkedHashMaps, adding elements from both JsonObjects and return one.equals(another)?

I agree with you about implementing it. But if we are going to do it, I want it to be done right. Your solution allows for {a: b} == {a: b, a: b}, which I think are not the same JSON object and should produce false.

@bernardosulzbach
Copy link
Contributor

I was going to suggest calling String.intern(nonCachedString)

Let's not go there. The implementation has changed a lot in the past few years and I don't think a library should be messing with the string table.

I think this is going to end on implement equalsIgnoreOrdering(), however that is done.

@awvalenti
Copy link
Contributor Author

awvalenti commented May 31, 2016

Like I said, I gave up suggesting the String.intern thing.

Your solution allows for {a: b} == {a: b, a: b}

You're right. If the library allows {a:b, a:b}, equals shouldn't behave this way. (by the way, I created a pull request with two new unit tests to document the "multi-keys" behavior)

I think this is going to end on implement equalsIgnoreOrdering(), however that is done.

I agree.

I'm not sure if I understood your proposal of solution, but it seemed O(n . log n) to me if it involves sorting before comparing.

@bernardosulzbach
Copy link
Contributor

Yes, asymptotical worst case of the simplest one is n · log n, which shouldn't be a big problem. In addition, I couldn't think of a way to test for equality as we want to in linear time.

ralfstx added a commit that referenced this issue Jun 19, 2016
Explain the conditions which imply equality for JsonObject and JsonArray.
In particular, point out that JsonObjects are considered equal only if
the members have the same order.

See #65
@ralfstx
Copy link
Owner

ralfstx commented Jun 19, 2016

In my opinion, toMap would only be useful if recursive.

I think I agree on that. In fact, calling equals() on the returned map would in turn call the default equals() method on nested JsonObjects. However, the obvious return value for toMap would be Map<String, JsonValue>, which does not allow for a recursive implementation. Moreover, a recursive method, if used excessively, may result in a performance disaster.

Considering a deep comparison of JsonObjects, an equalsIgnoreOrder would be equally useless if not implemented recursively, wouldn't it?

@awvalenti
Copy link
Contributor Author

Probably. Isn't the ordinary equals recursive?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants