Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Binary Search for Sorted Collections #3207

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

joshuabambrick
Copy link

Added search method to collection prototype as per #3169. This is automatically used by where and findWhere when it is possible to do so to provide speedups without user needing to do anything.

For toFind, the user may provide:

  • a comparison function which compares takes one value and returns 1 if the sought value has higher index, -1 if its lower and 0 if the value matches (comparator unused)
  • a value to search for (the value of the model[comparator] sought - requires comparator to be a string)
  • a model to search for (where comparator is a function)

Users may also request that it is the index of the model that is returned or that the model of highest/lowest index which matches the comparison is sought.

When no match is found, undefined is returned, or if returnIndex is true then -1 is returned instead.

One point that may need to be considered is that the behaviour when a model is input as toFind but the comparator is a string is different from that when comparator is a function. Furthermore, the name of this method may also need some more thought to emphasise that it is for binary searches or that it only works on sorted collections so binarySearch or the more user-friendly sortedSearched might be preferable.

Added `search` method to collection prototype. automatically used by
`where` and `findWhere` when it is possible to do so.

user may provide:
- a comparison function (comparator unused)
- a value to search for (the value of the model[comparator] sought -
requires comparator to be a string)
- a model to search for (where `comparator` is a function)

Users may also request that it is the index of the model that is
returned or that the model of highest/lowest index which matches the
comparison is sought.

One point that may need to be considered is that the behaviour when a
model is input as `toFind` but the `comparator` is a string is different
from that when `comparator` is a function.
Fixed small bugs with `where` and `findWhere` and the erroneous tests
which didn't catch these issues.
@joshuabambrick joshuabambrick changed the title Added binary search for Collections (fixed) Added Binary Search for Sorted Collections (fixed) Jun 25, 2014
@joshuabambrick joshuabambrick changed the title Added Binary Search for Sorted Collections (fixed) Added Binary Search for Sorted Collections Jun 26, 2014
} else if (_.isFunction(this.comparator)) {
// Use the `comparator` function, `toFind` is a model.
if (this.comparator.length === 2) {
compare = _.bind(this.comparator, this, toFind);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clever!

@megawac
Copy link
Collaborator

megawac commented Aug 9, 2014

I do think _.sortedIndex should be delegated to for the same cases as sort delegates to sortBy, in order to prevent possible inconsistencies between this and sort (jashkenas/underscore#1768)

If `toFind` is a function, the `compare` method will be bound to the
model.

Removed option to return model and always return index.
@joshuabambrick
Copy link
Author

@megawac Could you possibly elaborate on this a bit more? I think using _.sortedIndex would require considerable code replication which I think is best avoided if possible. I have read through the issue you linked to but I don't quite follow what this would achieve. Cheers.

@joshuabambrick
Copy link
Author

@megawac In fact, looking at the underscore source code, there is no obvious, efficient way to use _.sortedIndex - the iterator is not passed the index of the element currently being looked at which makes it difficult to work with the getMin and getMax options.

@megawac
Copy link
Collaborator

megawac commented Aug 21, 2014

getMin and getMax are trivially interchangeable. Just increment by 1 once and then until x[i] !== x[i+1]

This function has too many unnecessary options and params. I'd ditch those options and make a second functions sortedIndexRight or whatever if necessary.

@joshuabambrick
Copy link
Author

getMin and getMax are the only options at present. Whilst your suggestion is probably fair in the usual case, it does increase the complexity of this function from O(lg n) to O(n) - equivalent to a full loop of the collection when these parameters are set - so I wouldn't refer to them as unnecessary.

In any case, I would quite like to hear a bit more about why you think _.sortedIndex should be used.

@megawac
Copy link
Collaborator

megawac commented Aug 21, 2014

  1. You could easily optimize for the duplicate case but I don't think thats
    necessary.

  2. code redundancy, simplicity and future-compatibility with _.sortBy

@joshuabambrick
Copy link
Author

future-compatibility with _.sortBy

might be worthwhile, although I would quite like to hear why you think this method could ever become incompatible.

I'm not sure that your suggested change would necessarily improve code redundancy or simplicity. As far as I can see, the binary search code would still need to be there for the case of a two parameter comparator and so all we would be doing is adding more code for the other cases, effectively achieving the same thing by two possible means, rather that just one.

Furthermore, I think whether or not optimising for the case where multiple models compare equally need to be considered. I think this is actually quite a common situation - perhaps even occurring in the majority of cases and I suppose it would be slightly disappointing to not optimise efficiency where it is possible, since optimising efficiency where it is possible was the goal of this method. If you have an alternative strategy for optimising the duplicate cases whilst maintaining the use of _.sortedIndex that may be worth considering.

@megawac
Copy link
Collaborator

megawac commented Aug 21, 2014

  1. Read the underscore issue I linked earlier.

  2. this is a sorted index I don't really see the rational of needing a
    right sorted index at all. The sort position is the left sorted i ndex +1
    except for equal item cases as we've noted. Its not worth the code bloat to
    handle the edge case efficiently in my opinion

  3. I've always been against exposing options args as they usually go poorly
    documented and lead to behavior that is confusing.
    On Aug 21, 2014 6:48 PM, "Josh Bambrick" [email protected] wrote:

future-compatibility with _.sortBy

might be worthwhile, although I would quite like to hear why you think
this method could ever become incompatible.

I'm not sure that your suggested change would necessarily improve code
redundancy or simplicity. As far as I can see, the binary search code would
still need to be there for the case of a two parameter comparator and so
all we would be doing is adding more code for the other cases, effectively
achieving the same thing by two possible means, rather that just one.

Furthermore, I think whether or not optimising for the case where multiple
models compare equally need to be considered. I think this is actually
quite a common situation - perhaps even occurring in the majority of cases
and I suppose it would be slightly disappointing to not optimise efficiency
where it is possible, since optimising efficiency where it is possible was
the goal of this method. If you have an alternative strategy for optimising
the duplicate cases whilst maintaining the use of _.sortedIndex that may
be worth considering.


Reply to this email directly or view it on GitHub
#3207 (comment).

@joshuabambrick
Copy link
Author

It would be great to get a few more opinions on this. If others agree with you, I would be happy to implement your ideas. That being said, if this gets included, I will gladly also write up documentation to ensure that the use of this method is clear - it would surely take less time than that we've already spent discussing it.

I'm also not fully sure what you mean by 'options args' or 'right sorted' but I hope I have interpreted them correctly.

An overview of different viewpoints

@megawac

  • the getMin and getMax options are unnecessary and you should assume that the min is wanted (or max should be a different function)
  • it is not necessary to find the min/max efficiently, as these are edge cases - if you even support finding them at all
  • improved comparison is needed (beyond greater/less) in order to maintain consistency with the sorting carried out by Array.prototype.sort - suggesting delegating to _.sortedIndex for the same cases as Backbone.Collection.prototype.sort delegates to _.sortBy.
  • the use of _.sortedIndex would reduce code redundancy and improve simplicity
  • the option of having toFind as a function, to use as the compare function (see code) should be removed

@joshbambrick

  • getMin and getMax are useful and should not be discarded - as evidenced by their use in Backbone.Collection.prototype.where. I have added code to throw an error if they are both set in order to make their use easier and provide feedback. Adding a separate method for searching for the max would clutter the API, require extra code, and likely result in code repetition.
  • optimising the search for min and max is important as it ensures O(lg n) run time for this function in all cases - and search optimisation is the key goal of this method - furthermore, the liklihood of several models comparing as equal is higher than @megawac believes
  • maintaining consistency with Array.prototype.sort is a 'nice-to-have' but not worth the costs (described below), will not be guaranteed by documentation, and unnecessary, such to the extent that it is not currently support by Backbone or underscore
  • the best way to maintain consistency with Array.prototype.sort, would be adding a comparison function to the underscore api, which can be used by _.sortedIndex - the use of this new method would allow fixing Backbone.Collection.prototype.search by changing a single line
  • since the two argument comparator format could not use _.sortedIndex, as such all current code (except possibly the current strategy to find the min/max) would be necessary, in addition, the code to use _.sortedIndex and more separate code to (now inefficiently) search for the min/max
  • I am open to the idea of removing the option of having toFind as a function. However it is worth considering that it would provide greater flexibility and a nice way for users of the comparator as a function to compare on whatever criteria they like, without having to create a model that compares equal to pass. Furthermore, the cost of this is only a single line, and I am happy to ensure that it is documented well.

I think that @megawac's main issue with the code as it stands is the inconsistency with Array.prototype.sort, as he posted about in an underscore issue. If the idea of adding a comparison method, such as that he wrote in the issue, to underscore is implemented, I think we can easily and neatly sort that out. Even if this method isn't part of the public API, making it available to Backbone, possibly via a "mostly internal" function, like iteratee, would simplify things enormously.
Alternatively, since the function is so short, I could just copy an paste it into my pull request, at least until it gets added to underscore.

@joshuabambrick
Copy link
Author

It might improve the interface to take the second parameter as getMax and if that parameter is not true, assume that the user wants the minimum.

now only have `getMax` boolean, if this is not truthy then the min is
fetched
bump test underscore to 1.7.0 (unreleased)
@joshuabambrick
Copy link
Author

bumped test underscore to 1.7.0 to include @megawac's new _.comparator method for consistency with Array.prototype.sort and _.sortedIndex
also only have getMax as a boolean, and otherwise efficiently fetch the minimum (with no time big-O complexity cost)

updated code to match changes to backbone.js
@megawac
Copy link
Collaborator

megawac commented Jan 3, 2015

@joshbambrick that's not 1.7 but the master working branch.

@joshuabambrick
Copy link
Author

you're right. I wasn't quite sure what the protocol was in this situation.

@akre54
Copy link
Collaborator

akre54 commented Feb 17, 2015

@megawac @joshbambrick where are we at with this? Worth adding to 1.2.0 or holding off a bit more?

@megawac
Copy link
Collaborator

megawac commented Feb 17, 2015

@akre54 I feel we should off on this until underscores methods are improved. It could reduce the bloat from this implementation

@akre54
Copy link
Collaborator

akre54 commented Feb 17, 2015

Ok I'll leave it be for now. Thanks.

@akre54 akre54 mentioned this pull request Apr 16, 2015
3 tasks
@joshuabambrick
Copy link
Author

we could update the pull request we a local copy of _.comparator, and then update it to use that defined in underscore, when it becomes available.

@joshuabambrick
Copy link
Author

see most recent pull request for version with local copy of comparator (and also reset underscore.js version - was a mistake last time)

@joshuabambrick
Copy link
Author

could consider renaming to binarySearch to be Java-like, or sortedIndexOf since it does a similar thing to some underscore.js methods

to avoid having to update this file at all, adding the newline back in at the end
@joshuabambrick
Copy link
Author

Any updates on the status of this? If the compare slows the search down too much, as for _.comparator, we could put this back to the original version with plain old >/< comparison.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants