-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/knn #200
base: master
Are you sure you want to change the base?
Feature/knn #200
Conversation
Closes #201 |
} | ||
|
||
def nearestNeighbors(timeseries: Array[Double], | ||
featureDim: Int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For multiline indentation, we use the convention used here: https://github.com/sryza/spark-timeseries/blob/master/src/main/scala/com/cloudera/sparkts/models/ARIMA.scala#L80
} | ||
|
||
/** | ||
* @author debasish83, xiangzhe, santanu.das |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use author tags in spark-ts and let readers use git blame to track down the originators of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 other colleagues contributed to the idea and so I wanted to include them...I can add these in log messages....
val queryPoint = timeseries(queryStart + featureDim) | ||
minHeap.clear() | ||
targetArray.update(i, Neighbor(matchedTarget, queryPoint)) | ||
i = i + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i += 1?
else { | ||
minHeap.add((targetIndex, distance)) | ||
} | ||
j = j + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
j += 1
import java.util.Comparator | ||
import scala.collection.JavaConverters._ | ||
|
||
class BoundedPriorityQueue[E](k: Int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark has a BoundedPriorityQueue implementation - would it make sense to rely on that? https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/BoundedPriorityQueue.scala
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah for BLAS I had to add the spark package and I can use the BoundedPriorityQueue from there...
@@ -339,6 +339,7 @@ | |||
</execution> | |||
</executions> | |||
</plugin> | |||
<!-- clenaup artifact sign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this needed for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somehow my local compilation was failing on signing the artifact...I need to install some plugin...will fix it
Hi @debasish83, thanks for your contribution! I left a few comments inline. One broader comment: are you able to scaladoc for the main public classes and methods? |
@sryza We would like to contribute KNN model to the spark-ts package. Opening it for initial review. I will introduce KNNModel and implement the prediction/forecast using it. Also the lag function should use TimeSeries.lag..we optimized it for performance.
Reference: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/regression.py