UTF8 bytes count support as columnKind? #466
Thank you for the suggestion! @michaelcfanning FYI
+1, good suggestion.
Next steps: investigate Xcode and VS Code behavior for managing column data.
In a SARIF 2.1.0 region object, the byteOffset property counts from the start of the artifact, not from the start of the line indicated by the startLine property. So it can in principle be used for UTF-8 bytes, but I suspect it is not convenient for text-oriented tools.
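A minimal Go sketch of where that inconvenience comes from: to express a line-relative UTF-8 byte column through byteOffset, a tool first has to find where the line starts within the whole artifact. The helper name byteOffsetFor is hypothetical, and the sketch assumes "\n" is the only line terminator.

```go
package main

import (
	"bytes"
	"fmt"
)

// byteOffsetFor (hypothetical helper) converts a 1-based line number and a
// 1-based UTF-8 byte column into a 0-based byte offset from the start of the
// artifact, the quantity SARIF's region.byteOffset counts.
// Assumption: lines end with "\n" only; byteColumn is not range-checked.
func byteOffsetFor(content []byte, line, byteColumn int) (int, error) {
	offset := 0
	// Skip line-1 complete lines: this whole-file scan is the extra work a
	// text-oriented tool needs because byteOffset is not line-relative.
	for l := 1; l < line; l++ {
		i := bytes.IndexByte(content[offset:], '\n')
		if i < 0 {
			return 0, fmt.Errorf("line %d is past the end of the artifact", line)
		}
		offset += i + 1
	}
	return offset + byteColumn - 1, nil
}

func main() {
	src := []byte("package p\n\nvar s = \"héllo\"\n")
	off, err := byteOffsetFor(src, 3, 9) // the opening quote on line 3
	if err != nil {
		panic(err)
	}
	fmt.Println(off, string(src[off])) // prints: 19 "
}
```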
Document location for issue: §3.14 run object.
haya14busa: "I propose to add utf8CodeUnits or bytes (in UTF8 or bytes in the text encoding of the file) as columnKind."
Propose to add utf8CodeUnits as a possible value to
Can or should SARIF support UTF-8 byte counts as a columnKind?

utf16CodeUnits seems handy for programming languages that use UTF-16 as their default string representation, but many other languages use UTF-8 by default, and UTF-8 is the de facto standard encoding for text files these days. SARIF supports unicodeCodePoints as an alternative, but I don't think it is convenient for either the tools that output SARIF or the tools that consume it: to support unicodeCodePoints, tools need extra encode/decode steps.

For example, consider supporting replacements as a consumer. If the column is a (UTF-8) byte count, replacements are simple to support, because the consumer can just replace the bytes in the specified range.
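To illustrate the replacement case, here is a minimal Go sketch; the function names are hypothetical and not taken from any SARIF library. Columns are 1-based and the range is end-exclusive, as with SARIF's endColumn. The byte-based version is a plain slice, while the code-point version has to decode every character before each column first.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// replaceBytes applies a replacement whose range is given as 1-based UTF-8
// byte columns [startCol, endCol) within one line: plain slicing, no decoding.
// Assumes the columns are valid for the line.
func replaceBytes(line string, startCol, endCol int, text string) string {
	return line[:startCol-1] + text + line[endCol-1:]
}

// codePointColumnToByteIndex converts a 1-based Unicode code point column into
// a 0-based byte index; every code point before the column must be decoded.
func codePointColumnToByteIndex(line string, col int) int {
	idx := 0
	for n := 1; n < col && idx < len(line); n++ {
		_, size := utf8.DecodeRuneInString(line[idx:])
		idx += size
	}
	return idx
}

// replaceCodePoints applies the same replacement when the columns count
// Unicode code points (columnKind "unicodeCodePoints").
func replaceCodePoints(line string, startCol, endCol int, text string) string {
	start := codePointColumnToByteIndex(line, startCol)
	end := codePointColumnToByteIndex(line, endCol)
	return line[:start] + text + line[end:]
}

func main() {
	line := `msg := "héllo wörld"`
	// Replace "wörld" with "world". The span is byte columns 16..22 but code
	// point columns 15..20, because "é" and "ö" each take two bytes.
	fmt.Println(replaceBytes(line, 16, 22, "world"))
	fmt.Println(replaceCodePoints(line, 15, 20, "world"))
}
```

Both calls print msg := "héllo world"; only the second one has to decode the text in front of the range.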
If tools need to use unicodeCodePoints, they have to read and decode the content before the specified range just to get the correct position.

Actual Examples
Go
Go uses byte counts for AST node positions: https://golang.org/pkg/go/token/#Position
The Go analysis framework (https://pkg.go.dev/golang.org/x/tools/go/analysis) uses these positions as well.
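A small Go program, using only the standard go/parser, go/ast and go/token packages, that shows token.Position.Column counting bytes. The helper identPosition is hypothetical. The identifier x is preceded by the two-byte "é", so go/token reports byte column 19, whereas its column in Unicode code points would be 18.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// identPosition (hypothetical helper) parses src and returns the position
// go/token reports for the first identifier with the given name.
func identPosition(src, name string) (token.Position, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return token.Position{}, err
	}
	var pos token.Position
	found := false
	ast.Inspect(file, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok && !found && id.Name == name {
			pos = fset.Position(id.Pos())
			found = true
		}
		return !found // stop descending once the identifier is found
	})
	if !found {
		return token.Position{}, fmt.Errorf("identifier %q not found", name)
	}
	return pos, nil
}

func main() {
	// "é" is one code point but two UTF-8 bytes.
	src := "package p\n\nvar s = \"é\"; var x = 1\n"
	pos, err := identPosition(src, "x")
	if err != nil {
		panic(err)
	}
	fmt.Println(pos.Line, pos.Column) // prints: 3 19
}
```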
Vim
Vim uses the byte index as the column position (see :help col(), for example).

I haven't researched other tools, but I believe there are many others that use the UTF-8 byte count as the column.
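For a producer that gets byte columns from Go or Vim, this is roughly what emitting SARIF 2.1.0 requires today. A minimal Go sketch, with a hypothetical helper name, converting a 1-based UTF-8 byte column into the column values for unicodeCodePoints and utf16CodeUnits; it assumes the byte column points at the first byte of a character.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// columnsFromByteColumn (hypothetical helper) converts a 1-based UTF-8 byte
// column into the 1-based columns for columnKind "unicodeCodePoints" and
// "utf16CodeUnits".
// Assumption: byteCol points at the start of a character, not mid-sequence.
func columnsFromByteColumn(line string, byteCol int) (codePointCol, utf16Col int, err error) {
	if byteCol < 1 || byteCol-1 > len(line) {
		return 0, 0, fmt.Errorf("byte column %d out of range", byteCol)
	}
	prefix := line[:byteCol-1] // everything before the target character
	// Both conversions must decode the whole prefix of the line.
	codePointCol = utf8.RuneCountInString(prefix) + 1
	utf16Col = len(utf16.Encode([]rune(prefix))) + 1 // surrogate pairs count twice
	return codePointCol, utf16Col, nil
}

func main() {
	line := `s := "😀é" + x`
	// x starts at byte column 17: "😀" is 4 bytes, "é" is 2 bytes.
	cp, u16, err := columnsFromByteColumn(line, 17)
	if err != nil {
		panic(err)
	}
	fmt.Println(cp, u16) // prints: 13 14
}
```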
Proposal
I propose to add utf8CodeUnits or bytes (UTF-8 bytes, or bytes in the text encoding of the file) as a columnKind value.
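A minimal Go sketch of what the proposal would mean for a consumer: mapping a column of each kind to a byte index within its line. The ColumnKind type and byteIndex function are hypothetical, and utf8CodeUnits is the value proposed here, not part of SARIF 2.1.0. With the proposed kind the mapping is just col - 1; the two existing kinds require walking the line.

```go
package main

import "fmt"

// ColumnKind mirrors SARIF's run.columnKind values (hypothetical Go type).
type ColumnKind string

const (
	UTF16CodeUnits    ColumnKind = "utf16CodeUnits"    // SARIF 2.1.0
	UnicodeCodePoints ColumnKind = "unicodeCodePoints" // SARIF 2.1.0
	UTF8CodeUnits     ColumnKind = "utf8CodeUnits"     // proposed; NOT in SARIF 2.1.0
)

// byteIndex maps a 1-based column of the given kind to a 0-based byte index
// into line, which is what a consumer needs to slice or patch the text.
func byteIndex(line string, col int, kind ColumnKind) (int, error) {
	if col < 1 {
		return 0, fmt.Errorf("column %d is not 1-based", col)
	}
	switch kind {
	case UTF8CodeUnits:
		// The column already is a byte position: no decoding at all.
		if col-1 > len(line) {
			return 0, fmt.Errorf("column %d is past the end of the line", col)
		}
		return col - 1, nil
	case UnicodeCodePoints, UTF16CodeUnits:
		units := 1 // column number of the character starting at byte i
		for i, r := range line {
			if units == col {
				return i, nil
			}
			if units > col {
				break // col points into the middle of a surrogate pair
			}
			if kind == UTF16CodeUnits && r > 0xFFFF {
				units += 2 // encoded as a surrogate pair in UTF-16
			} else {
				units++
			}
		}
		if units == col {
			return len(line), nil // column just past the last character
		}
		return 0, fmt.Errorf("column %d does not start a character in this line", col)
	default:
		return 0, fmt.Errorf("unknown columnKind %q", kind)
	}
}

func main() {
	line := `s := "😀é" + x`
	// The same character, x, expressed under each columnKind.
	cases := []struct {
		kind ColumnKind
		col  int
	}{{UTF8CodeUnits, 17}, {UnicodeCodePoints, 13}, {UTF16CodeUnits, 14}}
	for _, c := range cases {
		i, err := byteIndex(line, c.col, c.kind)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s %d -> byte index %d (%q)\n", c.kind, c.col, i, line[i:i+1])
	}
}
```

All three entries locate the same x at byte index 16.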