-
Notifications
You must be signed in to change notification settings - Fork 991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce Source Hint in colnamesInt Function for Error Clarity #6008
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6008 +/- ##
==========================================
- Coverage 97.53% 97.53% -0.01%
==========================================
Files 80 80
Lines 14916 14915 -1
==========================================
- Hits 14548 14547 -1
Misses 368 368 ☔ View full report in Codecov by Sentry. |
The error messages should be meaningful. Adding the line number where the call took place provides no such information to the user and has only drawbacks over using traces, e.g. line numbers need to be updated every time code is added. I converted it back to a draft. |
Co-authored-by: Michael Chirico <[email protected]>
@MichaelChirico Is these changes in source Information enough or their is need for more context . |
@MichaelChirico @ben-schwen @jangorecki @HughParsonage can someone review this PR please and suggest if there is need for further changes :) |
inst/tests/tests.Rraw
Outdated
@@ -72,6 +72,7 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) { | |||
which.last = data.table:::which.last | |||
`-.IDate` = data.table:::`-.IDate` | |||
haszlib = data.table:::haszlib | |||
colnamesInt = data.table:::colnamesInt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
colnamesInt
tests are mostly in nafill.Rraw so we might want to add the tests there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
src/utils.c
Outdated
if (!isNewList(x)) | ||
error(_("'x' argument must be data.table compatible")); | ||
error(_("In %s 'x' argument must be data.table compatible"),CHAR(STRING_ELT(source, 0))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of the "In %s" but would tend to something like
error(_("%s\n'x' argument must be data.table compatible"),CHAR(STRING_ELT(source, 0)));
or
error(_("'x' argument must be data.table compatible for %s"),CHAR(STRING_ELT(source, 0)));
Ofc the source message then need to follow this style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍 ,
Now error messages looks something like :-
While matching column names from 'cols' to columns in 'x'
'x' argument must be data.table compatible
I guess the tricky part here are the nested calls where the function is not directly called by the user itself but happening as a subcall, e.g. as with Definitely already an improvement but still not perfect. |
Co-authored-by: Benjamin Schwendinger <[email protected]>
Co-authored-by: Benjamin Schwendinger <[email protected]>
Co-authored-by: Benjamin Schwendinger <[email protected]>
@jangorecki WDYT about Alternatively, we could have |
NULL and negatives sounds good. We just need to differentiate each of the errors raised by it, so couple negative numbers. |
It seems that this PR might not be suitable for addressing the issue, so should i close this PR and work on the alternate approach ? Additionally, which alternate approach is preferable "letting caller handle the error directly" or " creating custom error class" . |
I think using custom error class is a more natural way to accomplish this, but am wary of calls within C to the same routine: Line 117 in b4ae3ef
Is calling
I would instead include a 'revert' commit in this branch, then start the new approach from scratch in same PR. That way full context of review history remains available in the same place for future readers. |
I would aim to keep this function as fast as possible, therefore calling trycatch probably is not great. Maybe R wrapper that calls C function and translates negative error codes into custom conditions. |
Happy to benchmark the two, but is it really so performance-critical? It doesn't seem like it would be called more than 1-3 times in any given operation, must be <100 times in a typical analysis. Here's a very quick comparison: foo1 = function(x) if (anyNA(out <- match(x, 1:10))) stop("not found") else out
foo2 = function(x) if (anyNA(out <- match(x, 1:10))) -1L else out
bar1 = function(x) tryCatch(foo1(x), error=stop)
bar2 = function(x) if (any((out <- foo1(x)) < 0)) stop('not found') else out
microbenchmark(times = 1e4, bar1 = try(bar1(0), silent=TRUE), bar2 = try(bar2(0), silent=TRUE))
# Unit: microseconds
# expr min lq mean median uq max neval cld
# bar1 93.525 98.0795 105.53914 100.4415 103.268 6373.577 10000 a
# bar2 79.119 83.1405 88.39692 85.1870 87.651 2154.560 10000 b
microbenchmark(times = 1e4, bar1 = bar1(2:4), bar2 = bar2(2:4))
# Unit: microseconds
# expr min lq mean median uq max neval cld
# bar1 5.925 6.3295 7.166895 6.8225 7.509 1659.777 10000 a
# bar2 1.459 1.6050 1.850732 1.6950 1.859 20.135 10000 b So yes, there is some overhead, but on scale of 10μs --> up to 1ms difference in a full analysis script. |
Summary:
Added an optional
source
argument to thecolnamesInt
function to provide a hint about the source of errors.Details:
source
to thecolnamesInt
function.source
argument to provide hints about the source of errors.colnamesInt
was called.tests.Rraw
.This PR Closes #5039