Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Source Hint in colnamesInt Function for Error Clarity #6008

Draft
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

Nj221102
Copy link
Contributor

@Nj221102 Nj221102 commented Mar 17, 2024

Summary:
Added an optional source argument to the colnamesInt function to provide a hint about the source of errors.

Details:

  • Added a new argument source to the colnamesInt function.
  • Used the source argument to provide hints about the source of errors.
  • This enhancement improves the clarity of error messages, making it easier for users to identify the source of errors.
  • Added the source wherever the function colnamesInt was called.
  • Writing test for the testing the accuracy of the updated error messages in tests.Rraw.

This PR Closes #5039

Copy link

codecov bot commented Mar 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.53%. Comparing base (7fbc314) to head (c20c7a0).

❗ Current head c20c7a0 differs from pull request most recent head 992f066. Consider uploading reports for the commit 992f066 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6008      +/-   ##
==========================================
- Coverage   97.53%   97.53%   -0.01%     
==========================================
  Files          80       80              
  Lines       14916    14915       -1     
==========================================
- Hits        14548    14547       -1     
  Misses        368      368              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ben-schwen
Copy link
Member

ben-schwen commented Mar 17, 2024

The error messages should be meaningful. Adding the line number where the call took place provides no such information to the user and has only drawbacks over using traces, e.g. line numbers need to be updated every time code is added.

I converted it back to a draft.

@ben-schwen ben-schwen marked this pull request as draft March 17, 2024 23:28
@HughParsonage HughParsonage changed the title **Enhancement: Introduce Source Hint in colnamesInt Function for Error Clarity** Introduce Source Hint in colnamesInt Function for Error Clarity Mar 18, 2024
@HughParsonage HughParsonage marked this pull request as ready for review March 18, 2024 01:21
@HughParsonage HughParsonage marked this pull request as draft March 18, 2024 01:22
R/data.table.R Outdated Show resolved Hide resolved
Nj221102 and others added 2 commits March 18, 2024 11:58
@Nj221102
Copy link
Contributor Author

Nj221102 commented Mar 18, 2024

@MichaelChirico Is these changes in source Information enough or their is need for more context .

@Nj221102 Nj221102 requested a review from MichaelChirico March 18, 2024 09:53
@Nj221102 Nj221102 marked this pull request as ready for review March 18, 2024 15:23
@Nj221102
Copy link
Contributor Author

Nj221102 commented Mar 19, 2024

@MichaelChirico @ben-schwen @jangorecki @HughParsonage can someone review this PR please and suggest if there is need for further changes :)

R/wrappers.R Outdated Show resolved Hide resolved
@@ -72,6 +72,7 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) {
which.last = data.table:::which.last
`-.IDate` = data.table:::`-.IDate`
haszlib = data.table:::haszlib
colnamesInt = data.table:::colnamesInt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

colnamesInt tests are mostly in nafill.Rraw so we might want to add the tests there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

src/utils.c Outdated
if (!isNewList(x))
error(_("'x' argument must be data.table compatible"));
error(_("In %s 'x' argument must be data.table compatible"),CHAR(STRING_ELT(source, 0)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of the "In %s" but would tend to something like
error(_("%s\n'x' argument must be data.table compatible"),CHAR(STRING_ELT(source, 0)));
or
error(_("'x' argument must be data.table compatible for %s"),CHAR(STRING_ELT(source, 0)));

Ofc the source message then need to follow this style

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍 ,
Now error messages looks something like :-

While matching column names from 'cols' to columns in 'x'
'x' argument must be data.table compatible

src/utils.c Outdated Show resolved Hide resolved
@ben-schwen
Copy link
Member

ben-schwen commented Mar 19, 2024

I guess the tricky part here are the nested calls where the function is not directly called by the user itself but happening as a subcall, e.g. as with setdiff_ and .duplicated.helper, so I'm not sure about the source messages there.

Definitely already an improvement but still not perfect.

NEWS.md Outdated Show resolved Hide resolved
Nj221102 and others added 2 commits March 19, 2024 18:38
Co-authored-by: Benjamin Schwendinger <[email protected]>
Co-authored-by: Benjamin Schwendinger <[email protected]>
Co-authored-by: Benjamin Schwendinger <[email protected]>
@Nj221102 Nj221102 marked this pull request as draft March 19, 2024 13:16
@Nj221102 Nj221102 marked this pull request as ready for review March 19, 2024 14:34
@MichaelChirico
Copy link
Member

@jangorecki WDYT about colnamesInt() returning NULL, or negative integers, instead of erroring itself? Then let caller handle the error. Might be easier than trying to jam everything into this source= argument.

Alternatively, we could have colnamesInt() signal a custom error class, and caller can tryCatch() its way to giving a more helpful error. WDYT?

@jangorecki
Copy link
Member

NULL and negatives sounds good. We just need to differentiate each of the errors raised by it, so couple negative numbers.

@Nj221102
Copy link
Contributor Author

@jangorecki WDYT about colnamesInt() returning NULL, or negative integers, instead of erroring itself? Then let caller handle the error. Might be easier than trying to jam everything into this source= argument.

Alternatively, we could have colnamesInt() signal a custom error class, and caller can tryCatch() its way to giving a more helpful error. WDYT?

It seems that this PR might not be suitable for addressing the issue, so should i close this PR and work on the alternate approach ? Additionally, which alternate approach is preferable "letting caller handle the error directly" or " creating custom error class" .

@MichaelChirico
Copy link
Member

We just need to differentiate each of the errors raised by it, so couple negative numbers.

I think using custom error class is a more natural way to accomplish this, but am wary of calls within C to the same routine:

SEXP ricols = PROTECT(colnamesInt(obj, cols, ScalarLogical(TRUE))); protecti++; // nafill cols=NULL which turns into seq_along(obj)

Is calling tryCatch() from C going to be a huge mess? Might we move that colnamesInt() call to R instead?

should i close this PR and work on the alternate approach

I would instead include a 'revert' commit in this branch, then start the new approach from scratch in same PR. That way full context of review history remains available in the same place for future readers.

@jangorecki
Copy link
Member

I would aim to keep this function as fast as possible, therefore calling trycatch probably is not great. Maybe R wrapper that calls C function and translates negative error codes into custom conditions.

@MichaelChirico
Copy link
Member

I would aim to keep this function as fast as possible, therefore calling trycatch probably is not great. Maybe R wrapper that calls C function and translates negative error codes into custom conditions.

Happy to benchmark the two, but is it really so performance-critical? It doesn't seem like it would be called more than 1-3 times in any given operation, must be <100 times in a typical analysis.

Here's a very quick comparison:

foo1 = function(x) if (anyNA(out <- match(x, 1:10))) stop("not found") else out
foo2 = function(x) if (anyNA(out <- match(x, 1:10))) -1L else out

bar1 = function(x) tryCatch(foo1(x), error=stop)
bar2 = function(x) if (any((out <- foo1(x)) < 0)) stop('not found') else out

microbenchmark(times = 1e4, bar1 = try(bar1(0), silent=TRUE), bar2 = try(bar2(0), silent=TRUE))
# Unit: microseconds
#  expr    min      lq      mean   median      uq      max neval cld
#  bar1 93.525 98.0795 105.53914 100.4415 103.268 6373.577 10000  a 
#  bar2 79.119 83.1405  88.39692  85.1870  87.651 2154.560 10000   b
microbenchmark(times = 1e4, bar1 = bar1(2:4), bar2 = bar2(2:4))
# Unit: microseconds
#  expr   min     lq     mean median    uq      max neval cld
#  bar1 5.925 6.3295 7.166895 6.8225 7.509 1659.777 10000  a 
#  bar2 1.459 1.6050 1.850732 1.6950 1.859   20.135 10000   b

So yes, there is some overhead, but on scale of 10μs --> up to 1ms difference in a full analysis script.

@Nj221102 Nj221102 marked this pull request as draft March 28, 2024 06:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

colnamesInt makes inscrutable errors
5 participants