docs: ADR for string as alias or type #3

tristanmenzel · 2024-06-05T20:20:52Z

No description provided.

architecture-decisions/2024-06-05_string-alias-or-type.md

Loedn

In my opinion option 2 is the best path forward, not hiding what's underneath makes sure the developer is conscious of what they're doing.
In the future we could tackle this level of abstraction at a library level.

joe-p · 2024-06-07T16:13:01Z

In my opinion option 2 is the best path forward, not hiding what's underneath makes sure the developer is conscious of what they're doing.

I feel like option 2 is hiding what's underneath. In this case Bytes and Str and the same exact data and have the same exact capabilities.

A lot of the strengths of option 2 make assumptions about string opcodes available in the AVM, which I think is rather unlikely. It's not clear to me when you'd want to do this on-chain and I think any use case would be an edge case rather than the norm. I look at it similar to the AVM JSON opcodes. json_ref exists and people can use it, but parsing JSON data is extremely expensive and it doesn't make sense to do it on-chain most of the time.

This is why I think option 1 makes sense and if we ever want to have a string class we make it available and people can construct it from bytes, but it should not be the "default" class for strings because the operations will be expensive (even if implemented as an opcode) and should be avoided by most developers.

An example for why it'd be nice to have an alias rather than a separate class

Option 1

hasUnitName(unitName: str, asset: Asset) {
  return asset.unitName === unitName
}

Option 2

hasUnitName(unitName: str, asset: Asset) {
  return asset.unitName.bytes === unitName
}

A developer naturally thinks of unit names as stings, but it would be erroneous to represent them as such because it's not enforced. Thus they might be confused why they need to .bytes to compare with the unitName string but not other strings.

I know this stuff seems inconsequential but similar things have been a point of friction in both PyTeal/Beaker and Algorand Python

achidlow · 2024-06-13T04:17:15Z

In this case Bytes and Str and the same exact data and have the same exact capabilities.

You can't get the length of a UTF-8 string, let alone index it, on the AVM without it being an O(N) operation in terms of op codes.

Logically, a UTF-8 string of 3 characters(*) has a length of 3, but the number of bytes to encode that can be anywhere from 3 to 12, and there's no simple way to compute that without iterating.

(*) even this is simplifying things by assuming each character means code-point, rather than grapheme.

joe-p · 2024-07-03T18:32:10Z

You can't get the length of a UTF-8 string, let alone index it, on the AVM without it being an O(N) operation in terms of op codes.

Logically, a UTF-8 string of 3 characters(*) has a length of 3, but the number of bytes to encode that can be anywhere from 3 to 12, and there's no simple way to compute that without iterating.

Yes that is exactly my point. We won't be implementing character-based functions, thus there is no functional difference between the bytes class and the string class regardless of which option we choose.

Also this ADR seems to be duplicating 2024-05-21_primitive-bytes-and-strings.md, no?

tristanmenzel · 2024-07-03T18:43:01Z

Also this ADR seems to be duplicating 2024-05-21_primitive-bytes-and-strings.md, no?

There's a bit of crossover, I think 2024-05-21_primitive-bytes-and-strings.md focused largely on a solution for bytes. If string is not a separate type then it would follow the solution selected for bytes. The proposal here is that if string is its own type, can we choose something more appropriate than what we choose for bytes. I think using native strings makes a lot of sense here as long as we don't need to modify the native prototype.

joe-p · 2024-07-03T18:49:08Z

There's a bit of crossover, I think 2024-05-21_primitive-bytes-and-strings.md focused largely on a solution for bytes

Then we should modify that ADR to only focus on bytes. We basically now have two ADRs for the same thing. Some of the information is in that ADR and some in this and it's rather confusing.

I think using native strings makes a lot of sense here as long as we don't need to modify the native prototype.

@robdmoore, @Loedn, and myself discussed this and reached a general consensus that a modified prototype is acceptable. We can simply not support the character-based functions and add additional functions for byte-based operations. Even byte-based functions are also seldomly used on-chain with strings.

…ng a separate ADR to document that choice

robdmoore · 2024-07-04T16:53:50Z

@robdmoore, @Loedn, and myself discussed this and reached a general consensus that a modified prototype is acceptable. We can simply not support the character-based functions and add additional functions for byte-based operations. Even byte-based functions are also seldomly used on-chain with strings.

Tristan was walking me through yesterday and showed what it looked like to have strings with no prototype modification. I'm not opposed to prototype modification, but it's definitely fair to consider having it vs not needing it given the downside of prototype modification is you need to have a side-effect import.

Given we believe people seldomly do byte stuff with strings, the extra (fairly idiomatic JavaScript) Bytes(myString) syntax to convert the string to bytes to do bytes stuff might be considered acceptable, especially if BytesCompat allows for strings to be passed in to things like asset name assignment and log values etc. so the cases where you'd need to do the conversion are quite limited.

It's also really easy to add prototype stuff after initial implementation too so we can see how it feels?

robdmoore reviewed Jun 6, 2024

View reviewed changes

architecture-decisions/2024-06-05_string-alias-or-type.md Outdated Show resolved Hide resolved

robdmoore reviewed Jun 6, 2024

View reviewed changes

architecture-decisions/2024-06-05_string-alias-or-type.md Outdated Show resolved Hide resolved

Loedn reviewed Jun 6, 2024

View reviewed changes

tristanmenzel added 3 commits July 3, 2024 10:24

docs: ADR for string as alias or type

3022b5d

docs: Highlight the option which tealscript uses

664c74f

docs: Include option for using native strings

e5e54b5

tristanmenzel force-pushed the string-alias-or-type branch from 1a35a59 to e5e54b5 Compare July 3, 2024 17:55

docs: Remove references to string from the bytes ADR in favour of usi…

24b3642

…ng a separate ADR to document that choice

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: ADR for string as alias or type #3

docs: ADR for string as alias or type #3

tristanmenzel commented Jun 5, 2024

Loedn left a comment

joe-p commented Jun 7, 2024 •

edited

Loading

achidlow commented Jun 13, 2024

joe-p commented Jul 3, 2024

tristanmenzel commented Jul 3, 2024

joe-p commented Jul 3, 2024

robdmoore commented Jul 4, 2024

docs: ADR for string as alias or type #3

Are you sure you want to change the base?

docs: ADR for string as alias or type #3

Conversation

tristanmenzel commented Jun 5, 2024

Loedn left a comment

Choose a reason for hiding this comment

joe-p commented Jun 7, 2024 • edited Loading

Option 1

Option 2

achidlow commented Jun 13, 2024

joe-p commented Jul 3, 2024

tristanmenzel commented Jul 3, 2024

joe-p commented Jul 3, 2024

robdmoore commented Jul 4, 2024

joe-p commented Jun 7, 2024 •

edited

Loading