human readable and random ids for messages by default #1892

Closed
samuelstroschein opened this issue Dec 14, 2023 · 26 comments
Assignees
Labels
scope: inlang/sdk Related to source-code/sdk. type: feature New feature or request

Comments

@samuelstroschein
Member

samuelstroschein commented Dec 14, 2023

Problem

  1. @inlang/editor needs a message id generation algorithm to add a "create a message" button
  2. devs choosing message ids leads to multiple problems like namespacing, renaming IDs and thereby breaking the relation to translations, or Programmatic Linting #1889

Proposal

Apps choose message IDs for users that are human-readable and memorizable but have no "meaning" by default.

blue_dot_map
car_sky_keyboard
phone_table_chocolate
...

Pros

  • already best practice for large projects because ids should (must!) have no meaning
  • better UX/DX because apps don't need to prompt users for ids
  • a wide class of bugs is eliminated like choosing paraglide incompatible ids
  • users can still search and memorize messages
    • "did you change blue_dot_map?"
    • "we have a missing translation for car_sky_keyboard"

Cons

  • maybe unexpected behavior for devs. I propose to implement this as a test and wait for user reactions.

// own function for tree-shakability
import { generateMessageId } from "@inlang/sdk"

// checks the project for id conflicts
const id = generateMessageId({ project })

project.query.message.create({ id }) 
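
A minimal sketch of what such a generator could look like, assuming a tiny hypothetical word list and a plain set of existing ids in place of a real project object. The names `WORDS`, `randomHumanId`, and `generateUniqueId` are illustrative only and are not part of `@inlang/sdk`.

```typescript
// Hypothetical word list; a real one would be much larger.
const WORDS: readonly string[] = ["blue", "dot", "map", "car", "sky", "keyboard", "phone", "table"];

// Joins `wordCount` randomly chosen words with underscores.
function randomHumanId(wordCount: number, random: () => number = Math.random): string {
  const parts: string[] = [];
  for (let i = 0; i < wordCount; i++) {
    parts.push(WORDS[Math.floor(random() * WORDS.length)]);
  }
  return parts.join("_");
}

// Retries until the generated id is not already taken, up to `maxAttempts` times.
function generateUniqueId(
  existingIds: ReadonlySet<string>,
  wordCount: number = 3,
  maxAttempts: number = 100,
  random: () => number = Math.random
): string {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const candidate = randomHumanId(wordCount, random);
    if (!existingIds.has(candidate)) return candidate;
  }
  throw new Error("no free id found; use more words or a larger word list");
}

const existing = new Set<string>(["blue_dot_map"]);
const newId = generateUniqueId(existing);
console.log(newId); // three words joined by "_", never "blue_dot_map"
```

The conflict check here only sees ids that already exist locally; ids created concurrently on other branches would still need a large enough pool to make collisions unlikely.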
@maige-app maige-app bot added scope: inlang/sdk Related to source-code/sdk. type: feature New feature or request labels Dec 14, 2023
@samuelstroschein
Member Author

samuelstroschein commented Dec 14, 2023

@inlang/editor @inlang/ide-extension @inlang/paraglide-js @martin-lysk good idea? The implementation cost is relatively low.

@samuelstroschein
Member Author

samuelstroschein commented Dec 14, 2023

The idea is inspired by what3words.com. @felixhaeberle, in the IDE extension you can auto-fill the generated message id. I expect most devs will just hit enter.

@samuelstroschein
Member Author

Lovely, a lib exists for this: https://www.npmjs.com/package/human-id. Three words give 15 million possibilities, which should be enough for even the largest enterprise use cases. Adding a fourth word increases the possibilities many times over.

@samuelstroschein samuelstroschein changed the title human readable but random ids for messages human readable and random ids for messages by default Dec 14, 2023
@samuelstroschein samuelstroschein self-assigned this Dec 14, 2023
@NiklasBuchfink
Member

NiklasBuchfink commented Dec 14, 2023

It depends on the developer's workflow and whether they like it. I only mention this because these could be possible thoughts:

  1. I'm building a modal for the user; why isn't everything prefixed with "modal_user_" since autocomplete can help me with that? (human readable ids are good for memorizing short-term and for autocompletion)
  2. Clean code says good variable names don't need explanation. We link content with these ids and without context inside the id, I don't know what is behind it (... unless I use the vs code extension and as we know, the problems begin when we start renaming ids)

An alias/comment/description/context field may be necessary. Again, it is something to fill in and find names.

@samuelstroschein
Member Author

@NiklasBuchfink Clean code says good variable names don't need explanation. We link content with these ids and without context inside the id,

This is exactly the problem. Every large enterprise project states: Do not link message ids/keys to content. It breaks everywhere. Hence, my proposal to choose a default for inlang users that is human readable but has no meaning.

@martin-lysk
Contributor

We could start with this approach as the message name and let people change it, so the user doesn't get stuck in the "hmmm what would be a good name for that thing" loop. As an id I am not convinced, since it lacks some properties, but I need to think on this a bit more.

@samuelstroschein
Member Author

samuelstroschein commented Dec 14, 2023

@martin-lysk as an Id i am not convinced since it lacks some properties but i need to think on this a bit more

What properties are missing?

  • unique ✅
  • immutable ✅ (if we disallow ID renames which we will do once "keys" are introduced as a concept)
  • human-readable ✅ (avoids the need for "keys" altogether for fresh projects)

@openscript
Contributor

openscript commented Dec 14, 2023

I like the idea very much! As a dev, I sometimes recall the id to reference messages I repeatedly use. This would become harder if the ids don't reflect the message. If the IDE helps me select a message's id (maybe with fuzzy finding), that's even better than trying to recall ids and coming up with some structure for the id names.

@LorisSigrist
Collaborator

I have to say I was initially very sceptical of using random IDs, however, the more I think about it the more I come around to it. Many developers will likely have the same experience.

I do have one concern.
Devs won't be ok with completely random IDs unless we provide an alternate way of finding messages. I really like @openscript's suggestion of fuzzy-finding messages by content.

(Perhaps the IDE extension could kick in after someone types m., treating any text afterwards as a search query and suggest messages)
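
A rough sketch of what searching messages by content could look like, assuming a simple in-order character match as the "fuzzy" rule. The `MessageEntry` type and both function names are hypothetical; a real IDE extension would likely rank results and use a proper fuzzy-matching library.

```typescript
// Hypothetical shape: a message id mapped to its source-language text.
type MessageEntry = { id: string; text: string };

// True if every character of `query` appears in `text` in order (case-insensitive).
function fuzzyMatches(query: string, text: string): boolean {
  const q = query.toLowerCase();
  const t = text.toLowerCase();
  let qi = 0;
  for (let ti = 0; ti < t.length && qi < q.length; ti++) {
    if (t.charAt(ti) === q.charAt(qi)) qi++;
  }
  return qi === q.length;
}

// Returns the ids of all messages whose text fuzzily matches the query.
function searchByContent(messages: readonly MessageEntry[], query: string): string[] {
  return messages.filter((m) => fuzzyMatches(query, m.text)).map((m) => m.id);
}

const exampleMessages: MessageEntry[] = [
  { id: "blue_dot_map", text: "Ok" },
  { id: "car_sky_keyboard", text: "Export format" },
  { id: "phone_table_chocolate", text: "Delete account" },
];

console.log(searchByContent(exampleMessages, "expfmt")); // ["car_sky_keyboard"]
```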

Prefilling message-id fields with randomly generated ids would nudge developers towards the optimal workflow, but without forcing them. They can still use meaningful ids if they want. If we provide the appropriate IDE tooling, devs will come around to it. That's the Tailwind effect.

@martin-lysk
Contributor

The concern I have here: One should not reuse messages in different contexts.
Think about a button in a delete modal that the user should confirm with the very generic message "Ok". One creates a message with the id "blue_dot_map". Cool, we now have one message with the label "Ok".

Now the next feature is developed: a screen with information about a new feature. Again, the initial iteration just contains "Ok" as a dismiss button.

Fuzzy search will bring up the "blue_dot_map" message, and if we choose it, two buttons in completely different contexts reference the same "Ok" message. This is more likely to happen with this approach since developers lose the information about where the message should be used. Such a case would also be hard to catch in a code review.

updateLoginScreen() {
   button.setText($blue_dot_map)
}

vs.

updateLoginScreen() {
   button.setText($new_feature_dismiss)
}

I do see the point that developers should not struggle with giving messages a meaningful name, though.
A good article about naming and idea behind this:

https://lokalise.com/blog/translation-keys-naming-and-organizing/

I think messages should have an id (immutable / unique) and a name, maybe even aliases.

@felixhaeberle
Contributor

felixhaeberle commented Dec 15, 2023

I think messages should have an id (immutable / unique) and a name maybe even aliases

Yes. This is the way to go.

Treat the "name" as any other (meta) information attached to a message, like a category (modal) or department (marketing).


What's really important for the dev is the ID, and we should simply design a great UX in the IDE extension to search by any of the meta information or unique id & provide great auto-filling / discovering.

Additionally: Very high incentive to then install the IDE extension because without, you are stuck with id gibberish.

IDE extension: It's the same with Git. Almost nobody uses only the command line for Git anymore when you have built-in Git functionality with a nice GUI UX in your IDE. And Git extensions are skyrocketing in installs.

But this doesn't have to be the case ultimately, because paraglide could also offer resolving from key OR from id. Duplicate key names could be found by a lint rule. Tree-shaking could also be preserved.
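
To make the "resolve from key OR from id" idea concrete, here is a small sketch. It assumes a hypothetical `AliasedMessage` shape with an immutable random id plus optional human-chosen aliases (keys); none of these names come from paraglide or the inlang SDK.

```typescript
// Hypothetical message shape: immutable random id plus optional human-chosen aliases.
type AliasedMessage = { id: string; aliases: string[] };

// Resolves a reference that may be either an id or an alias.
function resolveMessage(
  messages: readonly AliasedMessage[],
  ref: string
): AliasedMessage | undefined {
  return messages.find((m) => m.id === ref || m.aliases.some((alias) => alias === ref));
}

// Lint-style check: returns every alias that is claimed by more than one message.
function findDuplicateAliases(messages: readonly AliasedMessage[]): string[] {
  const owners = new Map<string, Set<string>>();
  for (const m of messages) {
    for (const alias of m.aliases) {
      const ids = owners.get(alias) ?? new Set<string>();
      ids.add(m.id);
      owners.set(alias, ids);
    }
  }
  const duplicates: string[] = [];
  owners.forEach((ids, alias) => {
    if (ids.size > 1) duplicates.push(alias);
  });
  return duplicates;
}

const aliased: AliasedMessage[] = [
  { id: "blue_dot_map", aliases: ["delete_modal_confirm"] },
  { id: "car_sky_keyboard", aliases: ["new_feature_dismiss"] },
  { id: "phone_table_chocolate", aliases: ["new_feature_dismiss"] },
];

console.log(resolveMessage(aliased, "delete_modal_confirm")?.id); // "blue_dot_map"
console.log(findDuplicateAliases(aliased)); // ["new_feature_dismiss"]
```

Note that when an alias is duplicated, `resolveMessage` silently returns the first match, which is exactly the situation the lint check is meant to surface.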


  1. I'm building a modal for the user; why isn't everything prefixed with "modal_user_" since autocomplete can help me with that? (human readable ids are good for memorizing short-term and for autocompletion)
  2. Clean code says good variable names don't need explanation. We link content with these ids and without context inside the id, I don't know what is behind it (... unless I use the vs code extension and as we know, the problems begin when we start renaming ids)

Both can be solved through either resolve from key or from id, or with a great UX in conjunction with the IDE extension. Let's face it – the problem is complex & we need tooling to make it better.

In the end, looking at big enterprises, no other implementation besides the unique id will scale.

@samuelstroschein
Member Author

samuelstroschein commented Dec 15, 2023

Let's conclude the discussion 📺 watch the LOOM

Proposal

Introduce random, human-readable IDs by default.

  • unique ✅ (three words have a minimum of 15 million unique ids which can be extended with more words)
  • immutable ✅ (has no meaning -> will not be renamed)
  • human-readable ✅ (eliminates the need to come up with names and naming conventions!)
blue_dot_map
car_sky_keyboard
phone_table_chocolate
...

Why

  1. Random IDs are a necessity for any large project and any app that is non-dev facing.

  2. The only question is whether we introduce human or non-human readable IDs. If we introduce random human-readable IDs, we eliminate the need to think about and implement name logic for most inlang projects.

  3. Thinking about naming is just wrong. If inlang users, and everyone in an organization, need to agree on naming conventions and read overwhelming articles like this, we won't make internationalization simple (enough).

  4. Inlang's ecosystem will provide context through pre-rendering UIs or similar mechanisms in the future; pushing meaning into a message ID/name is redundant.

Additional notes

@martin-lysk
Contributor

will be part of #1844

@ferdnyc

ferdnyc commented Jan 7, 2024

A somewhat devil's-advocate reaction follows. (IOW, I'm not trying to dispute this proposal or argue against it. Consider this as coming from a place of neutrality -- neither for nor against the idea.)

@samuelstroschein

  • already best practice for large projects because ids should (must!) have no meaning

[citation needed]?

@martin-lysk shared the "overwhelming article" (...? it's a 5-minute, large-font read), which contains arguments/advice in direct opposition to what's proposed here. So it feels like there should at least be some sort of supporting evidence on the pro side, as well.

  • better UX/DX because apps don't need to prompt users for ids

That's fair, and a good argument for at least some sort of automatically-generated ID scheme.

  • a wide class of bugs is eliminated like choosing paraglide incompatible ids

Surely a sufficiently good IDE can prevent that even when IDs are user-chosen, though? Sort of conflating unrelated things, here -- again, devil's advocate.

"Ensure users cannot choose invalid IDs" is solvable in more ways than "choose IDs for the user", isn't it? Even if the latter does technically avoid the former problem, in a swatting-a-fly-with-a-sledgehammer sort of way.

  • users can still search and memorize messages

    • "did you change blue_dot_map?"
    • "we have a missing translation for car_sky_keyboard"

They can, but is there any empirical data indicating that they will? Or is that merely a hypothetical scenario?

If a piece of code has a message ID blue_dot_map that needs to be updated, what's the real-world data (or even anecdata) on how users will discuss that message?

Are they more likely to say:

Did you change blue_dot_map?

or will they ignore randomly-chosen IDs and resort to contextual descriptions, like:

Did you change the translation for the export format label in the render dialog?

@samuelstroschein
Member Author

Hey @ferdnyc,

I am replying to address your concern, but please do not reply. This discussion is closed. We formed a decision. Re-opening this discussion would take resources from other tasks.

Before I start, it is crucial to understand that anything we implement at inlang needs to work across an organization and, therefore, across different teams with different needs. I assume that you are coming from a dev (only) perspective, which fails inlang's mission to make globalization of software simple(r).

[...] shared the "overwhelming article" (...? it's a 5-minute, large-font read), which contains arguments/advice in direct opposition to what's proposed here.

The article is overwhelming because this 5-minute read is part of the hundreds, if not thousands, of hours that larger teams will spend discussing naming conventions. Wasted hours, because a consensus will not emerge. Rules like "describe in the ID where messages are used" will be ignored, will differ between teams, and sometimes can't even be established.

For example, we know that users want to create messages via Fink. They have no context to create a message according to a "provide context rule". And neither might a system that automatically creates messages (think of automatic extraction).

Surely a sufficiently good IDE can prevent that even when IDs are user-chosen, though?

Every app in the ecosystem (designers, translators, marketing, ...) would need this validation. Yes, we could add a mechanism to the linting system, but why lint something that we can (likely) avoid altogether by using human readable IDs instead of random hashes?
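
For reference, the kind of validation being discussed might look roughly like this. The sketch assumes that a "compatible" id is one usable as a JavaScript identifier, which is an assumption made for illustration, and the reserved-word list is deliberately partial. `isValidMessageId` is a hypothetical name, not an existing lint rule.

```typescript
// Assumption for this sketch: a compatible id must be usable as a JavaScript identifier.
// Partial list of reserved words, for illustration only.
const RESERVED_WORDS = new Set<string>([
  "delete", "new", "class", "function", "return", "default", "import", "export",
]);

// Accepts ids that start with a letter, "_" or "$" and continue with letters, digits, "_" or "$".
function isValidMessageId(id: string): boolean {
  return /^[A-Za-z_$][A-Za-z0-9_$]*$/.test(id) && !RESERVED_WORDS.has(id);
}

console.log(isValidMessageId("blue_dot_map"));     // true
console.log(isValidMessageId("modal-user-title")); // false (hyphens)
console.log(isValidMessageId("1st_step"));         // false (leading digit)
console.log(isValidMessageId("delete"));           // false (reserved word)
```

Generated ids built from a vetted word list and underscores pass such a check by construction, which is why the check becomes unnecessary for them.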

That's fair, and a good argument for at least some sort of automatically-generated ID scheme.

You got to the bottom of the proposal here. This discussion is not about preventing you from aliasing messages, merely about our ID system being human-readable instead of random hashes. We believe human-readable ids will eliminate the need for naming discussions.

They can, but is there any empirical data indicating that they will? Or is that merely a hypothetical scenario?

Experience we have in i18n software. Naming conventions are rotten because they don't work for i18n, where different teams need to agree on a convention.

or will they ignore randomly-chosen IDs and resort to contextual descriptions, like:

Nothing prevents that. In that moment, we achieved our goal. The ID of a message became meaningless, and naming discussions are eliminated :)

@martin-lysk
Contributor

martin-lysk commented Jan 12, 2024

@opral/inlang-cli @opral/inlang-cli @opral/inlang-fink @opral/inlang-ide-extension

Please check the spreadsheet of terms we plan to use for human IDs (I will share the link in Discord).

The table has a total of 4 tabs with different "adjectives", "nouns", "adverbs", "verbs".

Please take 30 minutes to check the current words for:

  • Uniqueness. Bad example: live vs. life
  • Pronounceability. Bad example: draught
  • Politically incorrect, hurtful, or negatively connotated words. Bad example: fuck, master, bitch
  • Spellings that differ between British and American English. Bad example: energize vs. energise

Just delete the ones where you see problems. If you are unsure about one of those properties, that's reason enough to drop it. No discussion needed!
Add good new words. In the end we need 256 words per category to get enough ids out of the combination.

Please only change column A. Columns C and D will provide you with example ids including the term defined in A #excel_magic

Please react to this comment with a rocket once you are done 🚀
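
Part of this review can be automated. The sketch below assumes one word is drawn from each of the four categories; it flags words that appear more than once across the lists and counts the resulting id space. The function names and the sample words are made up for illustration.

```typescript
// Finds words that occur more than once across all lists (case-insensitive).
function findDuplicateWords(lists: readonly (readonly string[])[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const list of lists) {
    for (const word of list) {
      const key = word.toLowerCase();
      if (seen.has(key)) duplicates.add(key);
      seen.add(key);
    }
  }
  return Array.from(duplicates);
}

// Number of distinct ids when one word is drawn from each list.
function countCombinations(listSizes: readonly number[]): number {
  return listSizes.reduce((product, size) => product * size, 1);
}

console.log(findDuplicateWords([["happy", "fly"], ["bird", "Fly"]])); // ["fly"]
console.log(countCombinations([256, 256, 256, 256])); // 4294967296, i.e. 2^32
```

Homophones ("live" vs. "life"), pronounceability, and connotation still need a human pass; only exact duplicates are caught here.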

@samuelstroschein
Member Author

@martin-lysk i pressed 🚀 because I thought ppl were excited. i doubt that people went through the spreadsheet https://docs.google.com/spreadsheets/d/1AsAgZi9V8R_5xxSK8-spp0mkLojlT-0MFVozcF0MZ6I.

going through it now

@NiklasBuchfink
Member

My notes:

  • we should add Jurgen as an Easter egg too
  • we got fink and finch, not sure if this is confusing somehow. Finch is the English translation of the German Fink
  • I see the awful-niklas-arrogant-mix incoming, but I'm okay with that 😄

Is it correct that fink can be translated with:

  • a betrayer, traitor, snitch
  • an unpleasant or contemptible person
  • a person who informs on people to the authorities

@samuelstroschein
Member Author

we got fink and finch, not sure if this is confusing somehow. Finch is the English translation of the German Fink

just change it

@jldec jldec self-assigned this Jan 30, 2024
@jldec
Contributor

jldec commented Jan 30, 2024

I'll make a pass on this today since there's a cost to making changes to the word lists e.g. impacting mocks / tests.

  • additional scan to remove unwanted words
  • fix weird adverbs like "dai" (used to be daily)
  • remove duplicates (fly)

@samuelstroschein
Member Author

@GauBen

GauBen commented Dec 24, 2024

Hey folks!

I have some feedback on this feature. First of all I admit it's a brilliant idea to have random keys. No more bike-shedding on key names for "Close" vs "close" vs "close" (as not far) and stuff like that.

I have some suggestions to make it even better, at least for my use cases.

My first criticism is that the token names are long! Often longer than the few words they replace. That plus the hint takes up a lot of screen real estate.

A second criticism, more related to my workflow, is that token names are of random length. It makes editing the JSON files by hand more exhausting than it should because the start of strings are not aligned.

I would solve these two issues at once by embracing entropy: short and fully random identifiers. 8 random consonants have more entropy than the current 4 words. I'd love to have the extension autogenerate a short random identifier, whose length would be configurable (e.g. for a small app, 4 letters would be more than enough). What do you think?

@samuelstroschein
Member Author

Hi @GauBen,

We would love shorter random keys!

Problem: Distributed system. In larger projects, thousands of people will create messages in different branches. Each generated message ID must be unique. Otherwise, we need to handle ID conflicts (which will be a pain).

If you develop a random human-readable ID system that will not lead to ID conflicts in large projects, we will quickly merge a PR!


It makes editing the JSON files by hand more exhausting than it should because the start of strings are not aligned.

Have you tried Sherlock or Fink?

We believe that editing JSON files by hand will fade out once the tools become better, and rather invest time into making the tools better than trying to fix manual editing by hand. If you haven't used Sherlock or Fink, I'd like to know why.

@GauBen

GauBen commented Dec 25, 2024

Thanks for your quick response! 🎅

How important is "human-readable"? Current identifiers are in English, which is far from universal. Offering to generate random identifiers (like xhmpsdtb) would offer the same conflict safety (20^8>256^4) at a much smaller size for organizations that do not need to say the keys out loud.
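
The comparison can be checked with a few lines. This sketch assumes a 20-letter consonant alphabet (leaving out "y", which is my assumption) and 256 words per position; `entropyBits` and `randomConsonantId` are hypothetical helpers, and `Math.random` is used only for illustration.

```typescript
// Entropy in bits of an id built from `length` symbols drawn uniformly from `alphabetSize` choices.
function entropyBits(alphabetSize: number, length: number): number {
  return length * Math.log2(alphabetSize);
}

// Assumed alphabet: 20 consonants, "y" left out.
const CONSONANTS = "bcdfghjklmnpqrstvwxz";

// Builds an id of `length` random consonants.
function randomConsonantId(length: number, random: () => number = Math.random): string {
  let id = "";
  for (let i = 0; i < length; i++) {
    id += CONSONANTS.charAt(Math.floor(random() * CONSONANTS.length));
  }
  return id;
}

console.log(entropyBits(20, 8).toFixed(1));  // "34.6" bits for 8 consonants
console.log(entropyBits(256, 4).toFixed(1)); // "32.0" bits for 4 words out of 256 each
console.log(randomConsonantId(8));           // an 8-letter id in the style of "xhmpsdtb"
```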

I'm absolutely not advocating to remove the current id system, it's great for many use-cases, I'm suggesting adding a new generator.

Yes I use Fink, it's a great tool! For small updates (i.e. when the wording is too long for the UI) I favor updating the JSON files directly, the feedback loop is shorter.

@samuelstroschein
Member Author

How important is "human-readable"?

Important. Human readable enables saying "please change human-blue-moon" or searching for human-blue-moon.

The current problem might be that we have 4 words instead of 3. We did that because the chance of a conflict was too high with 3 words. Solvable if we increase the word pool?
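
A back-of-the-envelope way to see the difference is the birthday-bound approximation. The sketch assumes ids are drawn uniformly at random, 256 words per position (the figure mentioned earlier in the thread), and a hypothetical project with 10,000 messages.

```typescript
// Birthday-bound approximation: probability that at least two of `n` uniformly random ids
// collide when there are `poolSize` possible ids.
function collisionProbability(n: number, poolSize: number): number {
  return 1 - Math.exp(-(n * (n - 1)) / (2 * poolSize));
}

const threeWordPool = Math.pow(256, 3); // 16,777,216 possible ids
const fourWordPool = Math.pow(256, 4);  // 4,294,967,296 possible ids

// Hypothetical project size of 10,000 messages.
console.log(collisionProbability(10000, threeWordPool)); // roughly 0.95
console.log(collisionProbability(10000, fourWordPool));  // roughly 0.012
```

Under these assumptions, a larger word pool helps quickly: the pool size grows with the cube of the words per position, so doubling the list for three words gives eight times as many ids.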

For small updates (i.e. when the wording is too long for the UI) I favor updating the JSON files directly, the feedback loop is shorter.

How could we improve Sherlock to remove the desire to manually edit JSON files?

9 participants