Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: ADR for string as alias or type #3

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

tristanmenzel
Copy link
Collaborator

No description provided.

Copy link
Contributor

@Loedn Loedn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion option 2 is the best path forward, not hiding what's underneath makes sure the developer is conscious of what they're doing.
In the future we could tackle this level of abstraction at a library level.

@joe-p
Copy link
Contributor

joe-p commented Jun 7, 2024

In my opinion option 2 is the best path forward, not hiding what's underneath makes sure the developer is conscious of what they're doing.

I feel like option 2 is hiding what's underneath. In this case Bytes and Str and the same exact data and have the same exact capabilities.

A lot of the strengths of option 2 make assumptions about string opcodes available in the AVM, which I think is rather unlikely. It's not clear to me when you'd want to do this on-chain and I think any use case would be an edge case rather than the norm. I look at it similar to the AVM JSON opcodes. json_ref exists and people can use it, but parsing JSON data is extremely expensive and it doesn't make sense to do it on-chain most of the time.

This is why I think option 1 makes sense and if we ever want to have a string class we make it available and people can construct it from bytes, but it should not be the "default" class for strings because the operations will be expensive (even if implemented as an opcode) and should be avoided by most developers.

An example for why it'd be nice to have an alias rather than a separate class

Option 1

hasUnitName(unitName: str, asset: Asset) {
  return asset.unitName === unitName
}

Option 2

hasUnitName(unitName: str, asset: Asset) {
  return asset.unitName.bytes === unitName
}

A developer naturally thinks of unit names as stings, but it would be erroneous to represent them as such because it's not enforced. Thus they might be confused why they need to .bytes to compare with the unitName string but not other strings.

I know this stuff seems inconsequential but similar things have been a point of friction in both PyTeal/Beaker and Algorand Python

@achidlow
Copy link

In this case Bytes and Str and the same exact data and have the same exact capabilities.

You can't get the length of a UTF-8 string, let alone index it, on the AVM without it being an O(N) operation in terms of op codes.

Logically, a UTF-8 string of 3 characters(*) has a length of 3, but the number of bytes to encode that can be anywhere from 3 to 12, and there's no simple way to compute that without iterating.

(*) even this is simplifying things by assuming each character means code-point, rather than grapheme.

@joe-p
Copy link
Contributor

joe-p commented Jul 3, 2024

You can't get the length of a UTF-8 string, let alone index it, on the AVM without it being an O(N) operation in terms of op codes.

Logically, a UTF-8 string of 3 characters(*) has a length of 3, but the number of bytes to encode that can be anywhere from 3 to 12, and there's no simple way to compute that without iterating.

Yes that is exactly my point. We won't be implementing character-based functions, thus there is no functional difference between the bytes class and the string class regardless of which option we choose.

Also this ADR seems to be duplicating 2024-05-21_primitive-bytes-and-strings.md, no?

@tristanmenzel
Copy link
Collaborator Author

Also this ADR seems to be duplicating 2024-05-21_primitive-bytes-and-strings.md, no?

There's a bit of crossover, I think 2024-05-21_primitive-bytes-and-strings.md focused largely on a solution for bytes. If string is not a separate type then it would follow the solution selected for bytes. The proposal here is that if string is its own type, can we choose something more appropriate than what we choose for bytes. I think using native strings makes a lot of sense here as long as we don't need to modify the native prototype.

@joe-p
Copy link
Contributor

joe-p commented Jul 3, 2024

There's a bit of crossover, I think 2024-05-21_primitive-bytes-and-strings.md focused largely on a solution for bytes

Then we should modify that ADR to only focus on bytes. We basically now have two ADRs for the same thing. Some of the information is in that ADR and some in this and it's rather confusing.

I think using native strings makes a lot of sense here as long as we don't need to modify the native prototype.

@robdmoore, @Loedn, and myself discussed this and reached a general consensus that a modified prototype is acceptable. We can simply not support the character-based functions and add additional functions for byte-based operations. Even byte-based functions are also seldomly used on-chain with strings.

@robdmoore
Copy link
Contributor

@robdmoore, @Loedn, and myself discussed this and reached a general consensus that a modified prototype is acceptable. We can simply not support the character-based functions and add additional functions for byte-based operations. Even byte-based functions are also seldomly used on-chain with strings.

Tristan was walking me through yesterday and showed what it looked like to have strings with no prototype modification. I'm not opposed to prototype modification, but it's definitely fair to consider having it vs not needing it given the downside of prototype modification is you need to have a side-effect import.

Given we believe people seldomly do byte stuff with strings, the extra (fairly idiomatic JavaScript) Bytes(myString) syntax to convert the string to bytes to do bytes stuff might be considered acceptable, especially if BytesCompat allows for strings to be passed in to things like asset name assignment and log values etc. so the cases where you'd need to do the conversion are quite limited.

It's also really easy to add prototype stuff after initial implementation too so we can see how it feels?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants