Introduce Source Hint in colnamesInt Function for Error Clarity #6008

Nj221102 · 2024-03-17T19:44:09Z

Summary:
Added an optional source argument to the colnamesInt function to provide a hint about the source of errors.

Details:

Added a new argument source to the colnamesInt function.
Used the source argument to provide hints about the source of errors.
This enhancement improves the clarity of error messages, making it easier for users to identify the source of errors.
Added the source wherever the function colnamesInt was called.
Writing test for the testing the accuracy of the updated error messages in tests.Rraw.

This PR Closes #5039

codecov · 2024-03-17T19:51:30Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.53%. Comparing base (7fbc314) to head (c20c7a0).

❗ Current head c20c7a0 differs from pull request most recent head 992f066. Consider uploading reports for the commit 992f066 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6008      +/-   ##
==========================================
- Coverage   97.53%   97.53%   -0.01%     
==========================================
  Files          80       80              
  Lines       14916    14915       -1     
==========================================
- Hits        14548    14547       -1     
  Misses        368      368

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ben-schwen · 2024-03-17T23:25:13Z

The error messages should be meaningful. Adding the line number where the call took place provides no such information to the user and has only drawbacks over using traces, e.g. line numbers need to be updated every time code is added.

I converted it back to a draft.

R/data.table.R

Co-authored-by: Michael Chirico <[email protected]>

Nj221102 · 2024-03-18T09:22:52Z

@MichaelChirico Is these changes in source Information enough or their is need for more context .

Nj221102 · 2024-03-19T09:11:29Z

@MichaelChirico @ben-schwen @jangorecki @HughParsonage can someone review this PR please and suggest if there is need for further changes :)

R/wrappers.R

ben-schwen · 2024-03-19T12:33:57Z

inst/tests/tests.Rraw

@@ -72,6 +72,7 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) {
  which.last = data.table:::which.last
  `-.IDate` = data.table:::`-.IDate`
  haszlib = data.table:::haszlib
+  colnamesInt = data.table:::colnamesInt


colnamesInt tests are mostly in nafill.Rraw so we might want to add the tests there.

ben-schwen · 2024-03-19T12:39:12Z

src/utils.c

  if (!isNewList(x))
-    error(_("'x' argument must be data.table compatible"));
+    error(_("In %s 'x' argument must be data.table compatible"),CHAR(STRING_ELT(source, 0)));


I'm not a big fan of the "In %s" but would tend to something like
error(_("%s\n'x' argument must be data.table compatible"),CHAR(STRING_ELT(source, 0)));
or
error(_("'x' argument must be data.table compatible for %s"),CHAR(STRING_ELT(source, 0)));

Ofc the source message then need to follow this style

Done 👍 ,
Now error messages looks something like :-

While matching column names from 'cols' to columns in 'x' 'x' argument must be data.table compatible

src/utils.c

ben-schwen · 2024-03-19T12:41:52Z

I guess the tricky part here are the nested calls where the function is not directly called by the user itself but happening as a subcall, e.g. as with setdiff_ and .duplicated.helper, so I'm not sure about the source messages there.

Definitely already an improvement but still not perfect.

NEWS.md

Co-authored-by: Benjamin Schwendinger <[email protected]>

MichaelChirico · 2024-03-19T20:07:51Z

@jangorecki WDYT about colnamesInt() returning NULL, or negative integers, instead of erroring itself? Then let caller handle the error. Might be easier than trying to jam everything into this source= argument.

Alternatively, we could have colnamesInt() signal a custom error class, and caller can tryCatch() its way to giving a more helpful error. WDYT?

jangorecki · 2024-03-21T11:19:34Z

NULL and negatives sounds good. We just need to differentiate each of the errors raised by it, so couple negative numbers.

Nj221102 · 2024-03-21T13:15:05Z

@jangorecki WDYT about colnamesInt() returning NULL, or negative integers, instead of erroring itself? Then let caller handle the error. Might be easier than trying to jam everything into this source= argument.

Alternatively, we could have colnamesInt() signal a custom error class, and caller can tryCatch() its way to giving a more helpful error. WDYT?

It seems that this PR might not be suitable for addressing the issue, so should i close this PR and work on the alternate approach ? Additionally, which alternate approach is preferable "letting caller handle the error directly" or " creating custom error class" .

MichaelChirico · 2024-03-25T04:30:13Z

We just need to differentiate each of the errors raised by it, so couple negative numbers.

I think using custom error class is a more natural way to accomplish this, but am wary of calls within C to the same routine:

data.table/src/nafill.c

Line 117 in b4ae3ef

    
           SEXP ricols = PROTECT(colnamesInt(obj, cols, ScalarLogical(TRUE))); protecti++; // nafill cols=NULL which turns into seq_along(obj)

Is calling tryCatch() from C going to be a huge mess? Might we move that colnamesInt() call to R instead?

should i close this PR and work on the alternate approach

I would instead include a 'revert' commit in this branch, then start the new approach from scratch in same PR. That way full context of review history remains available in the same place for future readers.

jangorecki · 2024-03-27T14:55:28Z

I would aim to keep this function as fast as possible, therefore calling trycatch probably is not great. Maybe R wrapper that calls C function and translates negative error codes into custom conditions.

MichaelChirico · 2024-03-27T16:02:34Z

I would aim to keep this function as fast as possible, therefore calling trycatch probably is not great. Maybe R wrapper that calls C function and translates negative error codes into custom conditions.

Happy to benchmark the two, but is it really so performance-critical? It doesn't seem like it would be called more than 1-3 times in any given operation, must be <100 times in a typical analysis.

Here's a very quick comparison:

foo1 = function(x) if (anyNA(out <- match(x, 1:10))) stop("not found") else out
foo2 = function(x) if (anyNA(out <- match(x, 1:10))) -1L else out

bar1 = function(x) tryCatch(foo1(x), error=stop)
bar2 = function(x) if (any((out <- foo1(x)) < 0)) stop('not found') else out

microbenchmark(times = 1e4, bar1 = try(bar1(0), silent=TRUE), bar2 = try(bar2(0), silent=TRUE))
# Unit: microseconds
#  expr    min      lq      mean   median      uq      max neval cld
#  bar1 93.525 98.0795 105.53914 100.4415 103.268 6373.577 10000  a 
#  bar2 79.119 83.1405  88.39692  85.1870  87.651 2154.560 10000   b
microbenchmark(times = 1e4, bar1 = bar1(2:4), bar2 = bar2(2:4))
# Unit: microseconds
#  expr   min     lq     mean median    uq      max neval cld
#  bar1 5.925 6.3295 7.166895 6.8225 7.509 1659.777 10000  a 
#  bar2 1.459 1.6050 1.850732 1.6950 1.859   20.135 10000   b

So yes, there is some overhead, but on scale of 10μs --> up to 1ms difference in a full analysis script.

nitish jha added 2 commits March 18, 2024 00:52

Added the source of the error in every error message

a788233

loading colnamesInt in tests.Rraw

6f24cfa

Nj221102 requested review from HughParsonage, jangorecki and MichaelChirico as code owners March 17, 2024 19:44

ben-schwen marked this pull request as draft March 17, 2024 23:28

HughParsonage added the enhancement label Mar 18, 2024

HughParsonage changed the title ~~**Enhancement: Introduce Source Hint in colnamesInt Function for Error Clarity**~~ Introduce Source Hint in colnamesInt Function for Error Clarity Mar 18, 2024

HughParsonage marked this pull request as ready for review March 18, 2024 01:21

HughParsonage marked this pull request as draft March 18, 2024 01:22

MichaelChirico reviewed Mar 18, 2024

View reviewed changes

R/data.table.R Outdated Show resolved Hide resolved

Nj221102 and others added 2 commits March 18, 2024 11:58

Update R/data.table.R

4cc8181

Co-authored-by: Michael Chirico <[email protected]>

changed source message

ce7f5c7

Nj221102 requested a review from MichaelChirico March 18, 2024 09:53

Merge branch 'master' into side

331b032

Nj221102 marked this pull request as ready for review March 18, 2024 15:23

Nj221102 and others added 2 commits March 19, 2024 15:51

Merge branch 'master' into side

651c2c6

added new enty

f4a65ea

ben-schwen reviewed Mar 19, 2024

View reviewed changes

R/wrappers.R Outdated Show resolved Hide resolved

ben-schwen reviewed Mar 19, 2024

View reviewed changes

src/utils.c Outdated Show resolved Hide resolved

ben-schwen reviewed Mar 19, 2024

View reviewed changes

NEWS.md Outdated Show resolved Hide resolved

Nj221102 and others added 2 commits March 19, 2024 18:38

Update R/wrappers.R

589c265

Co-authored-by: Benjamin Schwendinger <[email protected]>

Update src/utils.c

2af7e78

Co-authored-by: Benjamin Schwendinger <[email protected]>

Update NEWS.md

8eeb5c8

Co-authored-by: Benjamin Schwendinger <[email protected]>

Nj221102 marked this pull request as draft March 19, 2024 13:16

shifting tests from tests.Rraw to nafill.Rraw

2833dd0

Nj221102 marked this pull request as ready for review March 19, 2024 14:34

nitish jha added 2 commits March 19, 2024 22:34

format change

2f71063

fixing

c2fefb5

Nj221102 and others added 2 commits March 28, 2024 11:25

Merge branch 'Rdatatable:master' into side

0e33b8f

reverting changes

b45a193

Nj221102 marked this pull request as draft March 28, 2024 06:21

nitish jha and others added 5 commits March 28, 2024 11:54

fixing mistakes

40ef6ab

reverting more changes

a412628

final revert

b44a8f9

removing extra space

c20c7a0

Merge branch 'Rdatatable:master' into side

992f066

MichaelChirico mentioned this pull request Apr 11, 2024

Added skip_absent arguement to colnamesInt() #6068

Merged

MichaelChirico mentioned this pull request Nov 20, 2024

colnamesInt makes inscrutable errors #5039

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce Source Hint in colnamesInt Function for Error Clarity #6008

Introduce Source Hint in colnamesInt Function for Error Clarity #6008

Nj221102 commented Mar 17, 2024 •

edited

Loading

codecov bot commented Mar 17, 2024 •

edited

Loading

ben-schwen commented Mar 17, 2024 •

edited

Loading

Nj221102 commented Mar 18, 2024 •

edited

Loading

Nj221102 commented Mar 19, 2024 •

edited

Loading

ben-schwen Mar 19, 2024

Nj221102 Mar 19, 2024

ben-schwen Mar 19, 2024

Nj221102 Mar 19, 2024

ben-schwen commented Mar 19, 2024 •

edited

Loading

MichaelChirico commented Mar 19, 2024

jangorecki commented Mar 21, 2024

Nj221102 commented Mar 21, 2024

MichaelChirico commented Mar 25, 2024

jangorecki commented Mar 27, 2024

MichaelChirico commented Mar 27, 2024

Introduce Source Hint in colnamesInt Function for Error Clarity #6008

Are you sure you want to change the base?

Introduce Source Hint in colnamesInt Function for Error Clarity #6008

Conversation

Nj221102 commented Mar 17, 2024 • edited Loading

codecov bot commented Mar 17, 2024 • edited Loading

Codecov Report

ben-schwen commented Mar 17, 2024 • edited Loading

Nj221102 commented Mar 18, 2024 • edited Loading

Nj221102 commented Mar 19, 2024 • edited Loading

ben-schwen Mar 19, 2024

Choose a reason for hiding this comment

Nj221102 Mar 19, 2024

Choose a reason for hiding this comment

ben-schwen Mar 19, 2024

Choose a reason for hiding this comment

Nj221102 Mar 19, 2024

Choose a reason for hiding this comment

ben-schwen commented Mar 19, 2024 • edited Loading

MichaelChirico commented Mar 19, 2024

jangorecki commented Mar 21, 2024

Nj221102 commented Mar 21, 2024

MichaelChirico commented Mar 25, 2024

jangorecki commented Mar 27, 2024

MichaelChirico commented Mar 27, 2024

Nj221102 commented Mar 17, 2024 •

edited

Loading

codecov bot commented Mar 17, 2024 •

edited

Loading

ben-schwen commented Mar 17, 2024 •

edited

Loading

Nj221102 commented Mar 18, 2024 •

edited

Loading

Nj221102 commented Mar 19, 2024 •

edited

Loading

ben-schwen commented Mar 19, 2024 •

edited

Loading