Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Nearest Neighbor] Code and basic test for nearest neighbor #28

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

kushalpokharel
Copy link
Contributor

@kushalpokharel kushalpokharel commented Oct 14, 2023

I have added two functions: mindist and minmaxdistance, which will be helpful in pruning the tree while searching for the nearest neighbor. The code doesn't work yet. I have some doubts about the implementation:

  • Should we add the distance functions to the Value module as I have added or do we make it specific to module Point in n dimension?
  • If I make another module called Point with floatarray representing points in n dimensions. How do I make sure that the second parameter is of type Envelope?

@kushalpokharel kushalpokharel marked this pull request as draft October 14, 2023 19:26
@kushalpokharel kushalpokharel changed the title Add mindist and minmaxdist functions. Tests not working [Nearest Neighbor] Add mindist and minmaxdist functions. Tests not working Oct 14, 2023
test/basic.ml Outdated
let mindist (p:floatarray) (e:envelope) =
let (x0,y0,x1,y1) = Rtree.Rectangle.coords e in
let gete ind = if ind==0 then x0
else if ind==1 then y0 else if ind==2 then x1 else y1 in

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiple if-else is not a good idea IMO
Maybe you can use a match statement here, something like this

let gete = match ind with 
...
| 1 -> ...
|2 -> ...
...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @AryanGodara here, also note the difference between = and == in OCaml (both can be used to check for equality): https://cs3110.github.io/textbook/chapters/mut/refs.html?highlight=physical#physical-equality

test/basic.ml Outdated

let minmaxdist p e =
let (x0,y0,x1,y1) = Rtree.Rectangle.coords e in
let gete ind = if ind==0 then x0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @AryanGodara. I also need some feedback on how the point shall be stored?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @AryanGodara, I will change this but I also need feedback on where the point is stored in the code. @patricoferris I am waiting for the feedback

@@ -43,6 +43,9 @@ module type Value = sig

val envelope : t -> envelope
(** Given a value, calculates the envelope for it. *)

val mindist: t -> envelope -> float
val minmaxdist: t -> envelope -> float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to be exposed in the interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to test these functions as well. Is that not necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that they're a part of the Value signature then we need them here and can be tested individually. Could we add some documentation strings to them ? minmaxdist is a little opaque I think ?

test/basic.ml Outdated
end

module R = Rtree.Make (Rtree.Rectangle) (V)

module Vp = struct
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is Vp short for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is short for (Value for Point)

@kushalpokharel
Copy link
Contributor Author

Hi @patricoferris, please review the latest changes. I have listed my approach below which is based on the paper you provided.

  • Within the envelope, we have two functions mindist and minmaxdist which we need to provide to calculate the distance from a point to that envelope (bounding box).
  • The nearest neighbor function is available inside rtree which takes a rtree and a point (float list) and traverses each node and sees if the distance from the envelope of the node to the point is greater than the minmaxdist. If the distance is greater then it doesn't traverse, else it traverses that node.
  • I have added tests for two distance functions and one nearest_neighbor.

There might be some mistakes as I have only tested a basic case for nearest neighbor. Please guide me on how to create a deterministic large R tree to correctly test the nearest neighbor function.

@kushalpokharel kushalpokharel changed the title [Nearest Neighbor] Add mindist and minmaxdist functions. Tests not working [Nearest Neighbor] Code and basic test for nearest neighbor Oct 27, 2023
@kushalpokharel kushalpokharel marked this pull request as ready for review October 27, 2023 10:18
@kushalpokharel
Copy link
Contributor Author

@patricoferris, today is the last day for some contribution, if you could review this and hopefully we could resolve this, it would be an addition for the outreachy application

@patricoferris
Copy link
Contributor

@harshey1103 please do still reference unmerged PRs in your application :))

Copy link
Contributor

@patricoferris patricoferris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great step in the right direction @kushalpokharel. When I was thinking about the interface for this feature, I thought the distance functions would end up in the Rtree_intf.Value module? Users would then provide their own distance functions from whatever item they're storing to whatever envelope they are using? This would allow us to have a nearest neighbour function that works for all Rtree's and not just points, what do you think? Am I missing something?

src/rectangle.ml Outdated

let mindist p e =
let x0, y0, x1, y1 = (x0 e, y0 e, x1 e, y1 e) in
let gete ind = match ind with 0 -> x0 | 1 -> y0 | 2 -> x1 | _ -> y1 in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these indirections are potentially a little more confusing than helpful. I think it would be clearer and more maintainable to write these in full, even if it seems a little more verbose. Seem my overview comment before making changes to this directly though.

src/rectangle.ml Outdated

let minmaxdist p e =
let x0, y0, x1, y1 = (x0 e, y0 e, x1 e, y1 e) in
let gete ind = match ind with 0 -> x0 | 1 -> y0 | 2 -> x1 | _ -> y1 in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the previous comment.

src/rectangle.ml Outdated
let minmaxdist p e =
let x0, y0, x1, y1 = (x0 e, y0 e, x1 e, y1 e) in
let gete ind = match ind with 0 -> x0 | 1 -> y0 | 2 -> x1 | _ -> y1 in
let rm k =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A slightly more descriptive name would be helpful here, and also so we don't confuse rm and rM.

src/rectangle.ml Outdated
in

let rM k =
if get p k >= (gete k +. gete (k + 2)) /. 2. then gete k else gete (k + 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gete indirection could just be replace with direct use of the x0 etc. variables no ?

@@ -77,6 +89,9 @@ module type Envelope = sig

val contains : t -> t -> bool
(** [contains a b] asks whether [b] is contained by [a]. *)

val minmaxdist : float list -> t -> float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these needed in envelope?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could put this in a separate module called Point, or create a function in an envelope, to calculate the distance from a point to that envelope. This seemed easier to me. I know this is not the appropriate place for the function. Need to find its home.

@kushalpokharel
Copy link
Contributor Author

kushalpokharel commented Oct 30, 2023

This is a great step in the right direction @kushalpokharel. When I was thinking about the interface for this feature, I thought the distance functions would end up in the Rtree_intf.Value module? Users would then provide their own distance functions from whatever item they're storing to whatever envelope they are using? This would allow us to have a nearest neighbour function that works for all Rtree's and not just points, what do you think? Am I missing something?

Exactly my thought and my question in the first comment. I wanted to do the same. However, calculating distances from a value to an envelope was confusing to me. I think the users can give the distance function between two envelopes in the R-tree because we could turn the value into an envelope. Is this approach correct?

@kushalpokharel
Copy link
Contributor Author

kushalpokharel commented Nov 2, 2023

Hi @patricoferris, I made some changes to the code especially the location of distance functions inside V module. Can you review this?

Regarding other comments:

  • Naming is still the same, will change after the structure is set
  • about using x0 instead of gete, what I am actually doing is setting up the function for one axis. For example: if I was to implement distance for 3D I would just have to add gete 2. If this doesn't make sense to you, we can discuss.

@kushalpokharel
Copy link
Contributor Author

Hi @patricoferris, I made some changes to the code especially the location of distance functions inside V module. Can you review this?

Regarding other comments:

  • Naming is still the same, will change after the structure is set
  • about using x0 instead of gete, what I am actually doing is setting up the function for one axis. For example: if I was to implement distance for 3D I would just have to add gete 2. If this doesn't make sense to you, we can discuss.

Hi @patricoferris, can you provide some feedback on this?

test/distance Outdated
let gete ind = match ind with 0 -> x0 | 1 -> y0 | 2 -> x1 | _ -> y1 in
let rm k =
if get p k <= (gete k +. gete (k + 2)) /. 2. then gete k else gete (k + 2)
in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this file committed by mistake? If so can it be removed ?

test/persist.ml Outdated
@@ -10,6 +10,9 @@ module Line = struct
let y0 = Float.min arr.(1) arr.(3) in
let y1 = Float.max arr.(1) arr.(3) in
Rtree.Rectangle.v ~x0 ~y0 ~x1 ~y1

let mindist _a _b= 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do run a dune build @fmt --auto to run the formatter as currently the CI will not like the _b= part I think

@@ -43,6 +43,9 @@ module type Value = sig

val envelope : t -> envelope
(** Given a value, calculates the envelope for it. *)

val mindist: t -> envelope -> float
val minmaxdist: t -> envelope -> float
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that they're a part of the Value signature then we need them here and can be tested individually. Could we add some documentation strings to them ? minmaxdist is a little opaque I think ?

@patricoferris
Copy link
Contributor

about using x0 instead of gete, what I am actually doing is setting up the function for one axis. For example: if I was to implement distance for 3D I would just have to add gete 2. If this doesn't make sense to you, we can discuss.

I don't quite understand, don't you already use gete 2 in this 2D case ?

let gete ind = match ind with 0 -> x0 | 1 -> y0 | 2 -> x1 | _ -> y1 in

@kushalpokharel
Copy link
Contributor Author

about using x0 instead of gete, what I am actually doing is setting up the function for one axis. For example: if I was to implement distance for 3D I would just have to add gete 2. If this doesn't make sense to you, we can discuss.

I don't quite understand, don't you already use gete 2 in this 2D case ?

let gete ind = match ind with 0 -> x0 | 1 -> y0 | 2 -> x1 | _ -> y1 in

Sorry for the late reply, I missed the notification for this. I am using gete inside of the functions like dist (in mindist).
mindist function is calculated by identifying the nearest line of the bounding box parallel to x-axis and similarly the same process is done for the lines parallel to y-axis and if we work in higher dimension, similar is done to other axes. Hence, I want to get the coordinates according to the axis that I am working on.
For example: For minimum distance from point to rectangle: I do mindist 0 +. mindist 1.
Now mindist is parametrized by i (the axis we are working on), the gete will give us the points relating to that axis.
Does this make sense to you?

@kushalpokharel
Copy link
Contributor Author

@patricoferris, I have made some variable name changes and deleted the file. Please review these changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants