-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement scores for FDatairregular
objects as described in #609
#610
Conversation
(testing included to assert equality with the `FDataGrid` case)
score: FDataIrregular, | ||
squared: bool = True, | ||
weights: NDArrayFloat | None = None, | ||
) -> float: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[pep8] reported by reviewdog 🐶
DAR201 Missing "Returns" in Docstring: - return
@@ -554,6 +605,23 @@ | |||
return _multioutput_score_grid(error, multioutput) | |||
|
|||
|
|||
@mean_absolute_percentage_error.register # type: ignore[attr-defined, misc] | |||
def _mean_absolute_percentage_error_fdatairregular( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[pep8] reported by reviewdog 🐶
WPS118 Found too long name: _mean_absolute_percentage_error_fdatairregular > 45
epsilon = np.finfo(np.float64).eps | ||
|
||
if np.any(np.abs(y_true.values) < epsilon): | ||
warnings.warn('Zero denominator', RuntimeWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[pep8] reported by reviewdog 🐶
B028 No explicit stacklevel argument found. The warn method from the warnings module uses a stacklevel of 1 by default. This will only show a stack trace for the line on which the warn method is called. It is therefore recommended to use a stacklevel of 2 or greater to provide more information to the user.
skfda/tests/test_scoring.py
Outdated
@@ -461,3 +469,101 @@ def test_negative_msle(self) -> None: | |||
y_true_grid, | |||
y_pred_grid, | |||
) | |||
|
|||
|
|||
############### Test irregular data scoring #################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[pep8] reported by reviewdog 🐶
E266 too many leading '#' for block comment
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #610 +/- ##
===========================================
+ Coverage 86.65% 86.67% +0.02%
===========================================
Files 156 156
Lines 13322 13380 +58
===========================================
+ Hits 11544 11597 +53
- Misses 1778 1783 +5 ☔ View full report in Codecov by Sentry. |
The integral of the score is normalized because each integral is divided by | ||
the length of the curve's domain. | ||
|
||
If the score is vector-valued, then the mean of each codimension integral |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this what we want? Is what we do for the other types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the question is regarding whether to divide by the length of the curve's domain or by the length of the FDataIrregular
object's domain. This is the only difference that there is between the results of FDataGrid
scores and the FDataIrregular
that I implemented. As I said in #609, I think that dividing by each curve's domain length is more accurate, as the integral of that curve is being made only taking into account its particular domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I meant the treatment of vector-valued functions, but you also raised an interesting point that I didn't notice, and maybe we should discuss in the meeting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answering the initial question, then: yes, for other types we also take the mean of each codimension integral in the case of vector-valued functions. I think it is a reasonable design decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can merge it for now, as the only remaining issue is the length we use in the division, and there are other PRs waiting for this to be merged.
I propose to keep the associated issue #609 open and mention in it explicitly the problem with the quotient lengths, to be solved in the future.
I'll explain the problem with the quotient lengths in #609. Related to said problem is the design (or definition) of the integral of discretized functional observations (regular or irregular) when the grid endpoints do not coincide with the domain's. I will create another issue to discuss this (edit: this is the issue: #619). |
This pull request depends on #608 (integrating
FDataIrregular
objects is needed to implement the scores).As explained in #609 ,
mean_absolute_error
,mean_absolute_percentage_error
,mean_squared_error
andmean_squared_log_error
have been implemented in the case when bothy_true
andy_pred
areFDataIrregular
objects.Test cases have been included to ensure the same score is obtained if the
FDataIrregular
objects are obtained fromFDataGrid
's.