-
-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "format string" support for tensors #655
Conversation
I think this PR could use some other eyes / opinions. Not so much because of the added code (you're welcome to review of course), but more so because of the added syntax. @mratsim @HugoGranstrom @Clonkk (feel free to ping anyone else!) If we decide to add it to arraymancer directly (and not e.g. as a submodule that can be imported |
## | ||
## Inputs: | ||
## - Input Tensor | ||
## - specifier: A format specifier similar to those used in format strings, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't those be an enum instead of static[string] ?
In my experience custom string format specifier are rather confusing.
Multiple format output could even be different proc so it's easier to navigate code and change it. We could also introduce a CT switch -d:... to choose which prettyPrint version is used on $
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an enum is the first thing I tried but it was not very flexible. Using a string is how I let the user configure the individual element display in the same flexible manner that nim's strformat allows. For example, you can display integers as hex or as binary values, you can force the display of the sign for positive values, you can specify the number of digits in total and on the decimal part or (with nim 2.2) you can ask to use the "math-like" "j" complex display format (which displays complex numbers as 2.3-7.1j
) which is particularly useful for a tensor library in the context of signal processing.
Good stuff over all. To me the biggest discussion to be had is about using string as specifier. For non-mutually exclusive options we could use a set (something done in the std lib). |
Thanks for working on this @AngelEzquerra 😄 I agree with Clonkk that string formats are quite confusing. type
PrintMode = enum
arrayMode
singleLineMode
prettyMode
proc pretty*[T](t: Tensor[T], formatString: string, showHeader: bool = true, printMode: PrintModeEnum = prettyMode): string =
... Or is it something I'm missing here, some combination of tokens that can't be handled this way? |
The only downside of enum I can see is you have 2 non mutually formatting options you need 3 enums (optA, optB, optA_B). And the more non mutually exclusive options you add the worse it gets. That is why I.think we should also.consider set instead of enum since it can still work well even if we want/need different formatting options. On the other hand if we want to keep limited possibility then enum is simpler. |
I agree with you that it is more flexible. Do we have any plans to introduce any mutually inclusive options, though? If we do, it is easy enough to add an overload that just calls the set-version with the single argument in a set. :) |
Thank you @Vindaar for asking others to comment and thank you for those comments guys! :-) Just to give more context to other people, the main reason I made this PR is that while the current pretty printing of tensors is nice (and pretty! :-) ), it makes it hard to compare arraymancer's results with numpy's, which uses pretty different format, and, above all, it makes it really hard to use arraymancer's own output in your arraymancer code (e.g. when creating a test or using arraymancer to build a simulator). Another reason is to be able to control the format of the elements themselves (e.g. to use hex format, etc). Finally, I wanted to be able to do all of those while using nim's strformat, which I use extensively and find very useful and nice. Another comment I'd make is that I tried several options before landing on the current proposal. The first thing I tried is to add an enum argument to One small tweak that could be done to the current proposal is to force these "tokens" to surround the "element level format". That is, instead of (or perhaps in addition to) adding "[]" to "+06.2f" (i.e. "[]+06.2f"), we could do "[+06.2f]". |
Yes it is great and something I have also been a bit annoyed at some times that the string representation is a bit odd. 😅
One doesn't exclude the other. You can specify the array format with an enum and the element format with a string (like in my example). Or was there something that didn't work out with that approach? :) |
That is almost identical to the first thing I tried to do :-) echo &"Error the input tensor ({t:[]}) does not have the right rank ({t.rank})" Or something like this: echo {t=:+06.8f[]} And get a nice representation of my tensor in which every element has the "+06.f" format and I can copy it into my code because it is valid nim syntax. |
Yeah I understand, enum felt limited too when i thought about it. In your opinion do you think a set of options would work ?
for example would display hexadecimal in array format without the header. This seems flexible enough while allowing for high cardinality of options and at the same time doesn't require sanitizing input and is resilient to change in the specs (if an option disappears or is split into 2, it will instantly be a compile time error). Since enums can have a string representation, we could define
Since enum can have a string value it shouldn't be too difficult to use the enum in a strformat. EDIT :
This is a great argument, I am on my way to the airport and will look a bit more into your proposal. |
That is what the
Okay, I think I'm starting to see the problem, it is specifically the |
FYI, strformat strings also give compile time errors because they are all static strings :-) This helped me more than once when I was working on this feature.
Sorry, can you explain that a bit more? how would you do that? I'm not sure I understand your proposal. |
Yes, I guess I should have been clear about that too 😅. I wanted an easy way to use the non pretty syntax in format strings. I could have done that with a new function (e.g. Plus |
My idea was to use set to pass options to pretty print : Typically something like : var A : Tensor[float] = randomTensor(3, 5 ,6, 1.0)
echo a.prettyPrint({optNoHeader, optHexFormat, ... }) This make having inclusive options easy and from a doc / api point of view it makes a lot of sense and it is easy to know what's possible and what isn't. It's nice to have but won't solve We can always go for the string specifier and build a way to generate the string specifier from a set of enums (like have |
What if we extended the version of |
Yes, this is what I had in mind in my last paragraph. But it can be done after this PR once string specifier are merged |
00970d0
to
78647a4
Compare
78647a4
to
33288f2
Compare
src/arraymancer/tensor/display.nim
Outdated
## - Input Tensor. | ||
## - precision: The number of decimals printed (for float tensors), | ||
## _including_ the decimal point. | ||
## - showHeader: If true (the default) show a dscription header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add row-major or col-major indication in the header ? That would be very helpful when passing arraymancer's buffer to other interface (like Numpy or Julia) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, but #660 raises a good point about our colMajor support.
src/arraymancer/tensor/display.nim
Outdated
## - "||": "Pretty-print" the tensor _without_ a header. This | ||
## can also be combined with "<>" (i.e. "<>||") to | ||
## explicitly enable the default mode, which is pretty | ||
## printing with a header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AngelEzquerra Could we have a specifier to choose between showing the content 'in-memory' order OR in 'index' order ?
This seems good enough for me. I think we should merge this whenever CI is green to avoir PR rot. cc @Vindaar cc @mratsim cc @HugoGranstrom |
src/arraymancer/tensor/display.nim
Outdated
## - Input Tensor. | ||
## - precision: The number of decimals printed (for float tensors), | ||
## _including_ the decimal point. | ||
## - showHeader: If true (the default) show a dscription header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## - showHeader: If true (the default) show a dscription header | |
## - showHeader: If true (the default) show a description header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix this typo in a commit taking care of all your comments.
func getTensorDispMode(specifier: string): tensorDispMode = | ||
## Get the display mode based on the special "tensor specifier tokens" | ||
let brackets = "[" in specifier and "]" in specifier | ||
let colon_brackets = specifier.count(':') == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we test for 2 :
here? Shouldn't it be one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"brackets" is when you use [] while "colon brackets" is when you use :: that's why we must check for a count of 2 colons.
# <:> is a shortcut for <>[:] | ||
multi_line_array | ||
elif brackets or logic_brackets: | ||
# <> is a shortcut for <>[] _when_ not combined with ||, ::, [:] or [::] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, ::
is not mentioned in the doc strings, no?
result = specifier.replace("::") | ||
.replace("[:]") | ||
.replace("<:>") | ||
.replace("<>") | ||
.replace("[]") | ||
.replace("||") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could use multiReplace
, which will be more faster. But it doesn't really matter for this purpose (the overhead of the actual printing far exceeds this of course).
# when T is SomeFloat: | ||
# result = t.map_inline((x.formatBiggestFloat(precision = precision)).len).max | ||
# else: | ||
# result = t.map_inline(($x).len).max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be removed, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely.
## Describe the tensor in terms of its shape and type (in a "compact" way) | ||
## Only used for array-style printing | ||
# Most if not all tensors element types are part of the system or complex | ||
# modules so we can remove them from the type without must information loss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# modules so we can remove them from the type without must information loss | |
# modules so we can remove them from the type without much information loss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix this typo in a commit taking care of all your comments.
|
||
proc prettyImpl*[T]( | ||
t: Tensor[T], precision: int, specifier: static string): string = | ||
## Non public implementation of the pretty function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Non public implementation of the pretty function | |
## Public implementation of the pretty function |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users are not expected to use the prettyImpl
function. It is "public" in the sense that it is exported from p_disp.nim
, but p_disp.nim
itself is not exported so in the end it should be non-public?
Awesome work and sorry again for the very long delay! |
This comes in 2 forms: 1. Add a version of the pretty function that takes a format string as its input instead of a precision value. Also add a new "showHeader" argument to the original pretty function. - This new version of pretty lets you specify the format string that must be used to display each element. It also adds some new tensor-specific format string "tokens" that are used to control how tensors are displayed (beyond the format of each element). See below for more details. 2. Add a formatValue procedure that takes a tensor as its first input. This makes it possible to control how tensors are displayed in format strings, in the exact same way as if you were using the new pretty(specifier) procedure. This new formatValue procedure takes all of the standard format specifiers, such as "f", "g", "+", etc. (and including, in nim 2.2 and above, the 'j' specifier for complex tensors) which are used to configure of each tensor element is displayed. In addition to those standard format specifiers you can use a few, tensor specific modifiers which can be combined to achieve different results: - "[:]": Display the tensor as if it were a nim "array". This makes it easy to use the representation of a tensor in your own code. No header is shown. - "<>[:]": Same as "[:]" but displays a header describing the tensor type and shape. - "[]": Same as "[:]" but displays the tensor in a single line. No header is shown. - "<>[:]": Same as "[]" but displays a header describing the tensor type and shape. - "||": "Pretty-print" the tensor _without_ a header. - "<>||": Same as "||" but displays a header describing the tensor type and shape. This is the default display mode. - "<>": When used on its own (i.e. not combined with "[:]", "[]" and "||" as shown above) it is equivalent to "<>[]" (i.e. single-line, nim-like representation with a header). Cannot wrap the element specifier. Notes: - The default format (i.e. when none of these tensor-specific tokens is used) is pretty printing with a header (i.e. "<>||"). - These tensor specific modifiers can placed at the start or at the end of the format specifier (e.g. "<>[:]06.2f", or "06.2f<>[:]"), or they can "wrap" the element specifier (e.g. "<>[:06.2f]"). - When wrapping you cannot use the "compact forms" of these specifiers (i.e. "<:>" and "<|>"). - When using the wrapping form of "[:]", start with "[:" and end with either "]" or ":]" (i.e. both "[:06.2f]" and "[:06.2f:]" are valid). - This version of this function does not have a `showHeader` argument because to disable the header you can simply select the right format specifier. Fixes after PR comments More fixes
33288f2
to
545491d
Compare
Thanks! 🥳 |
This comes in 2 forms: 1. Add a version of the pretty function that takes a format string as its input instead of a precision value. Also add a new "showHeader" argument to the original pretty function. - This new version of pretty lets you specify the format string that must be used to display each element. It also adds some new tensor-specific format string "tokens" that are used to control how tensors are displayed (beyond the format of each element). See below for more details. 2. Add a formatValue procedure that takes a tensor as its first input. This makes it possible to control how tensors are displayed in format strings, in the exact same way as if you were using the new pretty(specifier) procedure. This new formatValue procedure takes all of the standard format specifiers, such as "f", "g", "+", etc. (and including, in nim 2.2 and above, the 'j' specifier for complex tensors) which are used to configure of each tensor element is displayed. In addition to those standard format specifiers you can use a few, tensor specific modifiers which can be combined to achieve different results: - "[:]": Display the tensor as if it were a nim "array". This makes it easy to use the representation of a tensor in your own code. No header is shown. - "<>[:]": Same as "[:]" but displays a header describing the tensor type and shape. - "[]": Same as "[:]" but displays the tensor in a single line. No header is shown. - "<>[:]": Same as "[]" but displays a header describing the tensor type and shape. - "||": "Pretty-print" the tensor _without_ a header. - "<>||": Same as "||" but displays a header describing the tensor type and shape. This is the default display mode. - "<>": When used on its own (i.e. not combined with "[:]", "[]" and "||" as shown above) it is equivalent to "<>[]" (i.e. single-line, nim-like representation with a header). Cannot wrap the element specifier. Notes: - The default format (i.e. when none of these tensor-specific tokens is used) is pretty printing with a header (i.e. "<>||"). - These tensor specific modifiers can placed at the start or at the end of the format specifier (e.g. "<>[:]06.2f", or "06.2f<>[:]"), or they can "wrap" the element specifier (e.g. "<>[:06.2f]"). - When wrapping you cannot use the "compact forms" of these specifiers (i.e. "<:>" and "<|>"). - When using the wrapping form of "[:]", start with "[:" and end with either "]" or ":]" (i.e. both "[:06.2f]" and "[:06.2f:]" are valid). - This version of this function does not have a `showHeader` argument because to disable the header you can simply select the right format specifier. Fixes after PR comments More fixes
This comes in 2 forms:
The special, tensor-specific tokens added by this change are:
Note that these are only used to control how the tensors themselves are displayed as a whole and are removed before displaying the individual elements.