human readable and random ids for messages by default #1892

samuelstroschein · 2023-12-14T16:49:41Z

Problem

@inlang/editor needs a message id generation algorithm to add a "create a message" button
devs choosing message ids leads to multiple problems like namespacing, renaming IDs and thereby breaking the relation to translations, or Programmatic Linting #1889

Proposal

Apps choose message IDs for users that are human-readable and memorizable but have no "meaning" by default.

blue_dot_map
car_sky_keyboard
phone_table_chocolate
...

Pros

already best practice for large projects because ids should (must!) have no meaning
better UX/DX because apps don't need to prompt users for ids
a wide class of bugs is eliminated like choosing paraglide incompatible ids
users can still search and memorize messages
- "did you change blue_dot_map?"
- "we have a missing translation for car_sky_keyboard"

Cons

maybe unexpected behavior for devs. I propose to implement this as a test and wait for user reactions.

// own function for tree-shakability 
import { generateMessageId } from "@inlang/sdk"

// checks the project for id conflicts
const id = generateMessageId({ project })

project.query.message.create({ id })
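
A minimal sketch of how such a generator could check for conflicts. This is not the actual `@inlang/sdk` implementation: the word pool, the `existingIds` parameter (standing in for `{ project }`), the injectable `random` function, and the retry limit are all assumptions made to keep the example self-contained.

```typescript
// Hypothetical word pool; a real list would be much larger and curated.
const WORDS = ["blue", "dot", "map", "car", "sky", "keyboard", "phone", "table", "chocolate"];

type RandomFn = () => number; // returns a float in [0, 1), like Math.random

function pickWord(random: RandomFn): string {
  return WORDS[Math.floor(random() * WORDS.length)];
}

// Assumed signature: a plain set of existing ids stands in for the project.
function generateMessageId(
  existingIds: ReadonlySet<string>,
  random: RandomFn = Math.random,
  maxAttempts = 100,
): string {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const id = [pickWord(random), pickWord(random), pickWord(random)].join("_");
    if (!existingIds.has(id)) return id; // no conflict with an existing message
  }
  throw new Error("could not find a free id; the word pool may be exhausted");
}

// Usage: generate an id that does not clash with the ids already in use.
const existing = new Set<string>(["blue_dot_map"]);
const freshId = generateMessageId(existing);
console.log(freshId);
```

Retrying on conflict only covers ids visible to the caller; ids created concurrently on other branches would need a separate check.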


samuelstroschein · 2023-12-14T16:50:04Z

@inlang/editor @inlang/ide-extension @inlang/paraglide-js @martin-lysk good idea? The implementation cost is relatively low

samuelstroschein · 2023-12-14T16:53:56Z

The idea is inspired by what3words.com. @felixhaeberle in the ide extension you can auto fill the generated message id. i expect most devs will just hit enter

samuelstroschein · 2023-12-14T17:00:29Z

Lovely, a lib exists for this https://www.npmjs.com/package/human-id. Three words have 15 million possibilities, which should be enough for even the largest enterprise use cases. Adding a fourth word multiplies the possibilities further.

NiklasBuchfink · 2023-12-14T17:09:20Z

It depends on the developer's workflow and whether they like it. I only mention this because these could be possible thoughts:

I'm building a modal for the user; why isn't everything prefixed with "modal_user_" since autocomplete can help me with that? (human readable ids are good for memorizing short-term and for autocompletion)
Clean code says good variable names don't need explanation. We link content with these ids and without context inside the id, I don't know what is behind it (... unless I use the vs code extension and as we know, the problems begin when we start renaming ids)

An alias/comment/description/context field may be necessary. Again, it is something to fill in and find names.

samuelstroschein · 2023-12-14T17:11:31Z

@NiklasBuchfink Clean code says good variable names don't need explanation. We link content with these ids and without context inside the id,

This is exactly the problem. Every large enterprise project states: Do not link message ids/keys to content. It breaks everywhere. Hence, my proposal to choose a default for inlang users that is human readable but has no meaning.

martin-lysk · 2023-12-14T19:56:59Z

We could start with this approach as the message name and let people change it, if we don't want the user to get stuck in the "hmmm what would be a good name for that thing" loop - as an Id i am not convinced since it lacks some properties but i need to think on this a bit more

samuelstroschein · 2023-12-14T20:22:55Z

@martin-lysk as an Id i am not convinced since it lacks some properties but i need to think on this a bit more

What properties are missing?

unique ✅
immutable ✅ (if we disallow ID renames which we will do once "keys" are introduced as a concept)
human-readable ✅ (avoids the need for "keys" altogether for fresh projects)

openscript · 2023-12-14T20:52:42Z

I like the idea very much! As a dev, I sometimes recall the id to reference messages I repeatedly use. This would become harder if the ids don't reflect the message. If the IDE helps me select a message's id (maybe with fuzzy finding), that is even better than trying to recall ids and coming up with some structure for the id names.

LorisSigrist · 2023-12-15T08:25:40Z

I have to say I was initially very sceptical of using random IDs, however, the more I think about it the more I come around to it. Many developers will likely have the same experience.

I do have one concern.
Devs won't be ok with completely random IDs, unless we provide an alternate way of finding messages. I really like @openscript's suggestion of fuzzy-finding messages by content.

(Perhaps the IDE extension could kick in after someone types m., treating any text afterwards as a search query and suggest messages)

Prefilling message-id fields with randomly generated ids would nudge developers towards the optimal workflow, but without forcing them. They can still use meaningful ids if they want. If we provide the appropriate IDE tooling, devs will come around to it. That's the Tailwind effect.
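
A rough sketch of what finding messages by content could look like. The `Message` shape and both function names are hypothetical, and the matching rule (case-insensitive subsequence match) is just one simple choice of fuzzy matching, not something the thread settled on.

```typescript
// Hypothetical message shape for illustration only.
interface Message {
  id: string;
  text: string;
}

// True when every character of `query` appears in `text` in the same order.
function isSubsequence(query: string, text: string): boolean {
  let matched = 0;
  for (let i = 0; i < text.length && matched < query.length; i++) {
    if (text[i] === query[matched]) matched++;
  }
  return matched === query.length;
}

// Returns all messages whose content fuzzily matches the query.
function searchMessages(messages: readonly Message[], query: string): Message[] {
  const needle = query.toLowerCase();
  return messages.filter((m) => isSubsequence(needle, m.text.toLowerCase()));
}

// Usage: typing "dlt" after `m.` would suggest the delete message.
const catalog: Message[] = [
  { id: "blue_dot_map", text: "Ok" },
  { id: "car_sky_keyboard", text: "Delete account" },
];
console.log(searchMessages(catalog, "dlt").map((m) => m.id));
```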

martin-lysk · 2023-12-15T09:27:34Z

The concern I have here: One should not reuse messages in different contexts.
Think about a button in a delete modal that the user confirms with the very generic label "Ok". One creates a message with the id "blue_dot_map". Cool, we now have one message with the label "Ok".

Now the next feature is developed: a screen with information about a new feature. Again, the initial iteration just contains "Ok" as a dismiss button.

Fuzzy search will bring up the "blue_dot_map" message, and if we choose it, two buttons in completely different contexts reference the same "Ok" message. This is more likely to happen with this approach since we as developers lose the information about where the message should be used. Such a case would also be hard to catch in a code review.

updateLoginScreen() {
   button.setText($blue_dot_map)
}

vs.

updateLoginScreen() {
   button.setText($new_feature_dismiss)
}

I see the point that developers should not struggle with giving messages a meaningful name, though.
A good article about naming and idea behind this:

https://lokalise.com/blog/translation-keys-naming-and-organizing/

I think messages should have an id (immutable / unique) and a name maybe even aliases

felixhaeberle · 2023-12-15T11:35:43Z

I think messages should have an id (immutable / unique) and a name maybe even aliases

Yes. This is the way to go.

Treat the "name" like any other (meta) information attached to a message, like a category (modal) or department (marketing).

What's really important for the dev is the ID, and we should simply design a great UX in the IDE extension to search by any of the meta information or unique id & provide great auto-filling / discovering.

Additionally: Very high incentive to then install the IDE extension because without, you are stuck with id gibberish.

IDE extension: It's the same with Git. Almost nobody uses only the command line for Git anymore when the IDE has built-in Git functionality with a nice GUI. And Git extensions are skyrocketing in installs.

But this doesn't have to be the case ultimately, because paraglide could also offer resolving from key OR from id. Duplications in key names could be found by a lint rule. Tree-shaking could also be preserved.
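
A small sketch of the "resolve from key OR from id" idea together with a duplicate-key lint check. The `Message` shape, `resolveMessage`, and `findDuplicateAliases` are hypothetical names for illustration; they are not paraglide or inlang SDK APIs.

```typescript
// Hypothetical message shape: an immutable generated id plus optional aliases.
interface Message {
  id: string; // e.g. "blue_dot_map"
  aliases: string[]; // human-chosen names, e.g. "new_feature_dismiss"
  text: string;
}

// Resolve a message from either its id or one of its aliases.
function resolveMessage(messages: readonly Message[], key: string): Message | undefined {
  return messages.find((m) => m.id === key || m.aliases.includes(key));
}

// Lint check: report every alias that is claimed by more than one message.
function findDuplicateAliases(messages: readonly Message[]): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const m of messages) {
    for (const alias of m.aliases) {
      owners.set(alias, [...(owners.get(alias) ?? []), m.id]);
    }
  }
  const duplicates = new Map<string, string[]>();
  owners.forEach((ids, alias) => {
    if (ids.length > 1) duplicates.set(alias, ids);
  });
  return duplicates;
}

// Usage: both lookups return the same message; the lint flags "ok_button".
const messages: Message[] = [
  { id: "blue_dot_map", aliases: ["delete_modal_confirm", "ok_button"], text: "Ok" },
  { id: "car_sky_keyboard", aliases: ["new_feature_dismiss", "ok_button"], text: "Ok" },
];
console.log(resolveMessage(messages, "blue_dot_map") === resolveMessage(messages, "delete_modal_confirm"));
console.log(findDuplicateAliases(messages));
```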

I'm building a modal for the user; why isn't everything prefixed with "modal_user_" since autocomplete can help me with that? (human readable ids are good for memorizing short-term and for autocompletion)

Clean code says good variable names don't need explanation. We link content with these ids and without context inside the id, I don't know what is behind it (... unless I use the vs code extension and as we know, the problems begin when we start renaming ids)

Both can be solved through either resolve from key or from id, or with a great UX in conjunction with the IDE extension. Let's face it – the problem is complex & we need tooling to make it better.

In the end, looking at big enterprises, no other implementation besides the unique id will scale.

samuelstroschein · 2023-12-15T15:13:03Z

Let's conclude the discussion 📺 watch the LOOM

Proposal

Introduce random, human-readable IDs by default.

unique ✅ (three words have a minimum of 15 million unique ids which can be extended with more words)
immutable ✅ (has no meaning -> will not be renamed)
human-readable ✅ (eliminates the need to come up with names and naming conventions!)

blue_dot_map
car_sky_keyboard
phone_table_chocolate
...

Why

Random IDs are a necessity for any large project and any app that is non-dev facing.
The only question is whether we introduce human or non-human readable IDs. If we introduce random human-readable IDs, we eliminate the need to think about and implement name logic for most inlang projects.
Thinking about naming is just wrong. If inlang users, and everyone in an organization, need to agree on naming conventions and read overwhelming articles like this, we won't make internationalization simple (enough).
Inlang's ecosystem will provide context through pre-rendering UIs or similar mechanisms in the future; pushing meaning into a message ID/name is redundant.

Additional notes

SEARCH: this is a follow-up issue. Fuzzy stuff or not is not important atm. We will see what users request.
https://discord.com/channels/897438559458430986/1185239478172909568/1185267189624873190

martin-lysk · 2023-12-20T00:40:54Z

will be part of #1844

martin-lysk · 2023-12-21T00:36:50Z

Some inspirations for word dicts

https://blog.asana.com/2011/09/6-sad-squid-snuggle-softly/
https://github.com/moby/moby/blob/master/pkg/namesgenerator/names-generator.go#L131
https://github.com/PerWiklander/IdentifierSentence/blob/master/src/main/java/biz/wiklander/tools/IdentifierSentence.java
https://github.com/EmpowerCode/human-readable-ids.js

ferdnyc · 2024-01-07T02:27:42Z

A somewhat devil's-advocate reaction follows. (IOW, I'm not trying to dispute this proposal or argue against it. Consider this as coming from a place of neutrality -- neither for nor against the idea.)

@samuelstroschein

already best practice for large projects because ids should (must!) have no meaning

[citation needed]?

@martin-lysk shared the "overwhelming article" (...? it's a 5-minute, large-font read), which contains arguments/advice in direct opposition to what's proposed here. So it feels like there should at least be some sort of supporting evidence on the pro side, as well.

better UX/DX because apps don't need to prompt users for ids

That's fair, and a good argument for at least some sort of automatically-generated ID scheme.

a wide class of bugs is eliminated like choosing paraglide incompatible ids

Surely a sufficiently good IDE can prevent that even when IDs are user-chosen, though? Sort of conflating unrelated things, here -- again, devil's advocate.

"Ensure users cannot choose invalid IDs" is solvable in more ways than "choose IDs for the user", isn't it? Even if the latter does technically avoid the former problem, in a swatting-a-fly-with-a-sledgehammer sort of way.

users can still search and memorize messages

"did you change blue_dot_map?"

"we have a missing translation for car_sky_keyboard"

They can, but is there any empirical data indicating that they will? Or is that merely a hypothetical scenario?

If a piece of code has a message ID blue_dot_map that needs to be updated, what's the real-world data (or even anecdata) on how users will discuss that message?

Are they more likely to say:

Did you change blue_dot_map?

or will they ignore randomly-chosen IDs and resort to contextual descriptions, like:

Did you change the translation for the export format label in the render dialog?

samuelstroschein · 2024-01-07T15:00:39Z

Hey @ferdnyc,

I am replying to address your concern, but please do not reply. This discussion is closed. We formed a decision. Re-opening this discussion would take resources from other tasks.

Before I start, it is crucial to understand that anything we implement at inlang needs to work across an organization and, therefore, across different teams with different needs. I assume that you are coming from a dev (only) perspective, which fails inlang's mission to make globalization of software simple(r).

[...] shared the "overwhelming article" (...? it's a 5-minute, large-font read), which contains arguments/advice in direct opposition to what's proposed here.

The article is overwhelming because this 5-minute read is part of the hundreds, if not thousands, of hours that larger teams will spend discussing naming conventions. Wasted hours because a consensus will not emerge. Rules like "describe in the ID where messages are used" will be ignored, will differ between teams, and sometimes can't even be established.

For example, we know that users want to create messages via Fink. They have no context to create a message according to a "provide context rule". And neither might a system that automatically creates messages (think of automatic extraction).

Surely a sufficiently good IDE can prevent that even when IDs are user-chosen, though?

Every app in the ecosystem (designers, translators, marketing, ...) would need this validation. Yes, we could add a mechanism to the linting system, but why lint something that we can (likely) avoid altogether by using human readable IDs instead of random hashes?

That's fair, and a good argument for at least some sort of automatically-generated ID scheme.

You got to the bottom of the proposal here. This discussion is not about preventing you from aliasing messages, merely that our ID system is human readable instead of random hashes. We believe human-readable ids will eliminate the need for naming discussions.

They can, but is there any empirical data indicating that they will? Or is that merely a hypothetical scenario?

Experience we have in i18n software. Naming conventions are rotten because they don't work for i18n, where different teams need to agree on a convention.

or will they ignore randomly-chosen IDs and resort to contextual descriptions, like:

Nothing prevents that. In that moment, we achieved our goal. The ID of a message became meaningless, and naming discussions are eliminated :)

martin-lysk · 2024-01-12T09:11:42Z

@opral/inlang-cli @opral/inlang-cli @opral/inlang-fink @opral/inlang-ide-extension

Please check the spreadsheet of terms we plan to use for human ids (I will share the link in Discord).

The spreadsheet has 4 tabs, one per word category: "adjectives", "nouns", "adverbs", "verbs".

Please take 30 minutes to check the current words for:

- Uniqueness. Bad example: live vs. life
- Pronounceability. Bad example: draught
- Politically incorrect, hurtful, or negatively connotated words. Bad example: fuck, master, bitch
- Spelling differences between British and American English. Bad example: energize vs. energise

Just delete the ones where you see problems. If you are unsure about one of those properties, that is reason enough to drop it - no discussion needed!
Add good new words - in the end we need 256 words per category to get enough ids out of the combination.

Please only change column A. Columns C and D will show example ids that include the term defined in column A #excel_magic

Please react to this comment with a rocket if you are done 🚀
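
For reference, a quick sketch of the arithmetic behind the 256-words-per-category target, assuming each id takes exactly one word from each category and every word is equally likely. `idSpace` is a hypothetical helper name.

```typescript
// Number of distinct ids when each of `wordCount` slots draws from
// `wordsPerCategory` equally likely words.
function idSpace(wordsPerCategory: number, wordCount: number): number {
  return Math.pow(wordsPerCategory, wordCount);
}

console.log(idSpace(256, 3)); // 256^3 = 16,777,216
console.log(idSpace(256, 4)); // 256^4 = 4,294,967,296
```

With 256 words per category, three words give about 16.8 million ids, in the same range as the 15 million figure quoted earlier in the thread, and each extra word multiplies the space by 256.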

samuelstroschein · 2024-01-15T21:58:57Z

@martin-lysk i pressed 🚀 because I thought ppl were excited. i doubt that people went through the spreadsheet https://docs.google.com/spreadsheets/d/1AsAgZi9V8R_5xxSK8-spp0mkLojlT-0MFVozcF0MZ6I.

going through it now

NiklasBuchfink · 2024-01-17T10:53:03Z

My notes:

we should add Jurgen as an Easter egg too
we got fink and finch, not sure if this is confusing somehow. Finch is the English translation of the German Fink
I see the awful-niklas-arrogant-mix incoming, but I'm okay with that 😄

Is it correct that fink can be translated with:

a betrayer, traitor, snitch
an unpleasant or contemptible person
a person who informs on people to the authorities

samuelstroschein · 2024-01-17T14:34:52Z

we got fink and finch, not sure if this is confusing somehow. Finch is the English translation of the German Fink

just change it

jldec · 2024-01-30T17:29:54Z

I'll make a pass on this today since there's a cost to making changes to the word lists e.g. impacting mocks / tests.

additional scan to remove unwanted words
fix weird adverbs like "dai" (used to be daily)
remove duplicates (fly)

samuelstroschein · 2024-04-06T00:14:16Z

conversation continues in https://linear.app/opral/issue/MESDK-12/human-readable-and-random-ids-for-messages-by-default

GauBen · 2024-12-24T20:35:06Z

Hey folks!

I have some feedback on this feature. First of all I admit it's a brilliant idea to have random keys. No more bike-shedding on key names for "Close" vs "close" vs "close" (as not far) and stuff like that.

I have some suggestions to make it even better, at least for my use cases.

My first criticism is that the token names are long! Often longer than the few words they replace. That plus the hint takes up a lot of screen real estate.

A second criticism, more related to my workflow, is that token names are of random length. It makes editing the JSON files by hand more exhausting than it should because the start of strings are not aligned.

I would solve these two issues at once by embracing entropy: short and fully random identifiers. 8 random consonants have more entropy than the current 4 words. I'd love to have the extension autogenerate a short random identifier, whose length would be configurable (e.g. for a small app, 4 letters would be more than enough). What do you think?

samuelstroschein · 2024-12-25T13:57:20Z

Hi @GauBen,

We would love shorter random keys!

Problem: Distributed system. In larger projects, thousands of people will create messages in different branches. Each generated message ID must be unique. Otherwise, we need to handle ID conflicts (which will be a pain).

If you develop a random human-readable ID system that will not lead to ID conflicts in large projects, we will quickly merge a PR!

It makes editing the JSON files by hand more exhausting than it should because the start of strings are not aligned.

Have you tried Sherlock or Fink?

We believe that editing JSON files by hand will fade out once the tools become better, and rather invest time into making the tools better than trying to fix manual editing by hand. If you haven't used Sherlock or Fink, I'd like to know why.

GauBen · 2024-12-25T20:21:20Z

Thanks for your quick response! 🎅

How important is "human-readable"? Current identifiers are in English, which are not nearly as universal. Offering to generate random identifiers (like xhmpsdtb) would offer the same conflict safety (20^8>256^4) at a much smaller size for organizations that do not need to say the keys out loud.

I'm absolutely not advocating to remove the current id system, it's great for many use-cases, I'm suggesting adding a new generator.

Yes I use Fink, it's a great tool! For small updates (i.e. when the wording is too long for the UI) I favor updating the JSON files directly, the feedback loop is shorter.
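
The 20^8 > 256^4 comparison can be checked in bits of entropy. This sketch assumes uniformly random, independent choices, an alphabet of 20 consonants (taken from the 20^8 figure above), and 256 words per slot (the target mentioned earlier in the thread). `entropyBits` is a hypothetical helper name.

```typescript
// Entropy in bits of an id made of `length` independent, uniformly random
// symbols drawn from an alphabet of `alphabetSize` symbols.
function entropyBits(alphabetSize: number, length: number): number {
  return length * Math.log2(alphabetSize);
}

const consonantId = entropyBits(20, 8); // 8 random consonants, about 34.6 bits
const fourWordId = entropyBits(256, 4); // 4 words from 256-word pools, 32 bits

console.log(consonantId.toFixed(2), fourWordId.toFixed(2));
console.log(consonantId > fourWordId);
```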

samuelstroschein · 2024-12-26T13:15:30Z

How important is "human-readable"?

Important. Human readable enables saying "please change human-blue-moon" or searching for human-blue-moon.

The current problem might be that we have 4 words instead of 3. We did that because the chance of a conflict with 3 words was too high. Solvable if we increase the word pool?
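
To make the conflict risk concrete, here is a sketch using the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n ids drawn uniformly at random from a space of N. The 256-word pool size comes from earlier in the thread; the 10,000-message project size is a hypothetical example, and `collisionProbability` is an illustrative name.

```typescript
// Approximate probability that at least two of `idCount` uniformly random ids
// collide when drawn from `spaceSize` possible ids (birthday bound).
function collisionProbability(idCount: number, spaceSize: number): number {
  const exponent = (idCount * (idCount - 1)) / (2 * spaceSize);
  return -Math.expm1(-exponent); // equals 1 - exp(-exponent), accurate for small values
}

const messagesInProject = 10_000; // hypothetical project size
const threeWords = Math.pow(256, 3);
const fourWords = Math.pow(256, 4);

console.log(collisionProbability(messagesInProject, threeWords));
console.log(collisionProbability(messagesInProject, fourWords));
```

Under these assumptions, three words make at least one conflict very likely (roughly 95%) at 10,000 messages, while four words bring it down to about 1%. A larger word pool shifts the three-word case in the same direction.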

For small updates (i.e. when the wording is too long for the UI) I favor updating the JSON files directly, the feedback loop is shorter.

How could we improve Sherlock to remove the desire to manually edit JSON files?

maige-app bot added scope: inlang/sdk Related to source-code/sdk. type: feature New feature or request labels Dec 14, 2023

samuelstroschein changed the title ~~human readable but random ids for messages~~ human readable and random ids for messages by default Dec 14, 2023

samuelstroschein self-assigned this Dec 14, 2023

samuelstroschein mentioned this issue Dec 19, 2023

aliases in paraglide js #1920

Closed

martin-lysk assigned martin-lysk and unassigned samuelstroschein Dec 20, 2023

This was referenced Dec 22, 2023

Improve Paraglide Compiler output #1940

Merged

[bug] Message IDs clashing with internal variables #1938

Closed

jldec self-assigned this Jan 30, 2024

jldec unassigned martin-lysk Jan 30, 2024

jldec mentioned this issue Feb 9, 2024

WIP 1844 Part 1: auto-generated human-IDs and aliases #2108

Merged

22 tasks

samuelstroschein mentioned this issue Feb 28, 2024

[bug] plugin-inlang-json and plugin-inlang-i18next compile namespaces to invalid js #1577

Closed

samuelstroschein closed this as not planned Apr 6, 2024

jldec mentioned this issue Apr 8, 2024

Design for human IDs and aliases opral/inlang-sdk#23

Closed

jannesblobel mentioned this issue Jun 17, 2024

Standardize i18n keys activist-org/activist#902

Closed

2 tasks

bugproof mentioned this issue Oct 14, 2024

i18n brainstorming sveltejs/kit#553

Open

samuelstroschein closed this as completed Oct 14, 2024
