Serialize identifiers to ESTree #2521

ArnaudBarre · 2024-02-27T01:40:39Z

Do you want to go in this direction?

It can also be seen as a loss for people building other things on top later on (both for merging them and using skipping reference_id).

I'm also quite ok with keeping the AST as close as possible to ESTree without loosing information (align name for 1-1 nodes, camelCase name, string enum, ...) and having a small wrapper to convert to ESTree.

This wrapper could be in a first time in TS and then in Rust. Given that Prettier needs some very custom AST transformations, a small wrapper will be needed in that case anyway so I'm fine saying the current AST for identifier is fine as is.

Also this rename = "Identifier" doesn't seem to be entirely handled by the wasm-pack, the generated types looks like this:

export interface Identifier extends Span {
    type: "Identifier";
    name: Atom;
}

export interface Identifier extends Span {
    type: "Identifier";
    name: Atom;
}

export type TSTypePredicateName = IdentifierName | TSThisType;

codspeed-hq · 2024-02-27T01:47:57Z

CodSpeed Performance Report

Merging #2521 will degrade performances by 7.18%

_{Comparing ArnaudBarre:indentifier (0120b65) with main (46e7791)}

Summary

⚡ 1 improvements
❌ 2 regressions
✅ 24 untouched benchmarks

🆕 2 new benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`main`	`ArnaudBarre:indentifier`	Change
🆕	`codegen_sourcemap[react.development.js]`	N/A	15.5 ms	N/A
🆕	`codegen_sourcemap[typescript.js]`	N/A	1.6 s	N/A
❌	`linter[RadixUIAdoptionSection.jsx]`	5.6 ms	6 ms	-6.64%
⚡	`parser[RadixUIAdoptionSection.jsx]`	350.9 µs	319.4 µs	+9.87%
❌	`semantic[RadixUIAdoptionSection.jsx]`	392.6 µs	422.9 µs	-7.18%

Boshen · 2024-02-27T05:09:02Z

Shall we try and keep these spec names until something blows up?

ArnaudBarre · 2024-02-27T10:41:16Z

Yeah we can. After some thoughts, this is really the "easy" part for compatibility. Given that's this requires fixing also type codegen, let's tackle other issues before this one. I will work on the import attributes/withClause mismatch tonight

overlookmotel · 2024-03-01T03:55:56Z

Just to put in my 2 cents: I think if we're going for ESTree compatibility, we should go for 100% compatibility. It's simpler for users to have that unambiguity.

Once #2409 comes to fruition (granted, that may be a while) it'll be working off the binary representation, so this information won't be lost, only hidden. An AST visitor could expose BindingIdentifier and friends as selectors (much the same way Babel has selectors which don't correspond exactly to AST node types e.g. Function which acts as FunctionDeclaration | FunctionExpression | ArrowFunctionExpression). That would give a lot of the benefits without the downside of having a "slightly wacky" AST.

But in the AST presented as a JS object, in my opinion it's better to be 100% ESTree compatible.

Any conversion/translation required to achieve that is going to be faster if done on Rust side during serialization, than on JS side.

And one other thing: If we do achieve 100% compat, it'll be easier to import a ton of tests from another library without having to comment out a bunch as "we do it differently".

overlookmotel · 2024-03-01T04:09:18Z

By the way, I'm not saying we have to do it now, or that you have to do it Arnaud! I'm just saying that in my view, I think it should be our eventual end goal.

ArnaudBarre · 2024-03-01T09:53:22Z

Yeah I agree OXC should provide an ESTree compatible output as builtin, but this should not block better features as you suggested.
My current plan:

Getting the parser wrapper for Prettier usable on my real world codebase (means that there is enough information for all common nodes)
Getting the parser wrapper for Prettier passing prettier tests (means that the output contains at least enough information for all nodes)
Publish the package (can be under the oxc org, we will see at this point)
Split the wrapper in two (OXC -> TSESTree and TSESTree -> Prettier tree (subtitles differences)) and have the first part pass typescript-eslint tests (means the wrapper should hide any extra information so AST snapshot works)
Verify that is also works for Babel TS tests (stage 4 spec only)
Rewrite the transformation in rRst as part of the @oxc/parser crates (with maybe some help, I still need to see how long the first parts will take)

overlookmotel · 2024-03-01T13:18:02Z

Thanks for coming back @ArnaudBarre and outlining your plan. Sounds great. Glad you agree that OXC's JS interface should provide a TSESTree-compatible AST ideally.

I believe that all transformations required for that should be achievable on Rust side during serialization. Beyond using the #[serde(rename)] etc tags, you can also manually implement impl Serialize for T and tell serde to create whatever arbitrary JSON you like. e.g. Adding rest to end of elements for ArrayPattern etc can be done in this way.

Writing Serialize implementations by hand is not for the faint-hearted, as serde's API is pretty horrible. I doubt as someone unfamiliar with Rust you'd want to tackle that. But I'd be happy to.

There's one part of this I don't understand. What's required in the TSESTree -> Prettier tree conversion? Can you give an example please?

Boshen · 2024-03-01T13:19:02Z

I want to identify all the issues to serializing these AST nodes that have more information to nodes that have less.

For example, it occurs to me that we can't deserialize an estree AST back to oxc AST if there is information loss - a possible fix is to add extra information to the serialized node so we can deserialize it back.

This is just one example, there may be more but I'm not sure.

I'm all in for 100% estree serialization, but we want to sit on the problem for a bit longer so we can identify all the issues prior.

overlookmotel · 2024-03-01T13:27:43Z

we can't deserialize an estree AST back to oxc AST if there is information loss

I think can reconstruct the information from context.

e.g. for BindingPattern, kind field can contain a general {type: 'Identifier', ...} in JS, but we know because it's BindingPattern that this field needs to be deserialized in Rust as a BindingIdentifier.

I doubt there'd be many ambiguities. e.g. x in x = 1 is always an IdentifierReference, and in let x it's always a BindingIdentifier.

serde may even deal with it automatically if there's no ambiguity.

a possible fix is to add extra information to the serialized node so we can deserialize it back

This would work for an AST which was originally created by OXC, but if the user alters the AST in JS (e.g. inserts new nodes), this extra metadata wouldn't be present.

overlookmotel · 2024-03-01T13:48:03Z

On this specific PR... @ArnaudBarre would you like to split out the parts of this PR which are uncontroversial into a separate PR which can get merged before decision is made about whether to "lower" type or not ? e.g.

#[cfg_attr(feature = "serde", serde(skip))]
pub reference_id: Cell<Option<ReferenceId>>,

ArnaudBarre · 2024-03-01T14:54:08Z

Yeah I don't really know what's the optimal way to convert the complex cases, I think rest element handling is probably the most complex one. If it requires tricks specific to serde that copilot don't know maybe I will need help of be happy that someone does it. But I think having all the difference laid out as a TS implementation is a better start before diving in (at least for me).

For things specific to Prettier, see this file and few others in the folder above. This show that even if almost all parsers are estree like, there is always subtleties, mostly because all TS only part is less standardized than the pure ESTree part.

For the "loss" information part, I think this is simpler for OXC to have two mode, one output closer to OXC internal representation so communication can go back and force, and one fully ESTree compatible where you can't go back to OXC. I don't see a usage for something in between, and it will I think be costly to maintain.

@overlookmotel which part is "uncontroversial" for you? For me both result in potential information loss. But I'm happy to discuss skipping nodes and merging type identifier separately anyway. An other example I've seen is the span info the trailing comma is is not part of ESTree.

overlookmotel · 2024-03-01T15:49:44Z

which part is "uncontroversial" for you? For me both result in potential information loss.

Ah sorry, my mistake. I see what you mean.

I can write a Serialize impl to deal with rest.

Concerning 2 modes: I'm not sure how easy this would be to do with serde. I think it assumes there's one canonical way to serialize a type, and don't think it allows multiple variations (but maybe I'm wrong and there is a workaround to do that).

Presumably it's OK to add extra fields (e.g. symbolId on BindingIdentifier) and still be ESTree-compatible, as no software will be looking for that field.

As far as conflating BindingIdentifier/IdentifierReference -> Identifier, personally I don't think it's as bad as both of you seem to, because I think it's clear from context (so the transform isn't lossy, it's just not explicit).

Thanks for info on Prettier. I'll try and get my head around that soon as I get a chance.

Boshen · 2024-03-04T12:05:07Z

I'll try and get the TypeScript types compiling and get this merged tomorrow.

overlookmotel · 2024-03-04T15:42:45Z

What the hell? How on earth did this speed up the parser? As far as I can see, this PR purely changes the Serialize impls, which aren't used in parser benchmarks. Any idea @Boshen?

overlookmotel · 2024-03-04T15:52:36Z

Oh I get it. CodSpeed is comparing this PR rebased on main vs baseline of 1-week old main. So other changes made in past week which speeded up parser are showing as due to this PR. It's annoying CodSpeed is misleading in this way sometimes.

Should we alter benchmark CI job for PRs so it runs benchmarks on actual HEAD commit of the PR, rather than the PR auto-rebased on main?

WIP: uniform serialize identifier

7287434

github-actions bot added the A-ast Area - AST label Feb 27, 2024

This was referenced Feb 27, 2024

fix(ast): few serialization issues #2522

Merged

feat(parser): serialize to estree #2463

Open

update

0120b65

Boshen marked this pull request as ready for review March 4, 2024 14:50

Boshen approved these changes Mar 4, 2024

View reviewed changes

Boshen enabled auto-merge (squash) March 4, 2024 14:52

Boshen merged commit f6709e4 into oxc-project:main Mar 4, 2024
16 checks passed

ArnaudBarre deleted the indentifier branch March 4, 2024 23:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Serialize identifiers to ESTree #2521

Serialize identifiers to ESTree #2521

ArnaudBarre commented Feb 27, 2024

codspeed-hq bot commented Feb 27, 2024 •

edited

Loading

Boshen commented Feb 27, 2024

ArnaudBarre commented Feb 27, 2024

overlookmotel commented Mar 1, 2024 •

edited

Loading

overlookmotel commented Mar 1, 2024 •

edited

Loading

ArnaudBarre commented Mar 1, 2024

overlookmotel commented Mar 1, 2024 •

edited

Loading

Boshen commented Mar 1, 2024

overlookmotel commented Mar 1, 2024 •

edited

Loading

overlookmotel commented Mar 1, 2024

ArnaudBarre commented Mar 1, 2024

overlookmotel commented Mar 1, 2024 •

edited

Loading

Boshen commented Mar 4, 2024

overlookmotel commented Mar 4, 2024 •

edited

Loading

overlookmotel commented Mar 4, 2024 •

edited

Loading

Serialize identifiers to ESTree #2521

Serialize identifiers to ESTree #2521

Conversation

ArnaudBarre commented Feb 27, 2024

codspeed-hq bot commented Feb 27, 2024 • edited Loading

CodSpeed Performance Report

Merging #2521 will degrade performances by 7.18%

Summary

Benchmarks breakdown

Boshen commented Feb 27, 2024

ArnaudBarre commented Feb 27, 2024

overlookmotel commented Mar 1, 2024 • edited Loading

overlookmotel commented Mar 1, 2024 • edited Loading

ArnaudBarre commented Mar 1, 2024

overlookmotel commented Mar 1, 2024 • edited Loading

Boshen commented Mar 1, 2024

overlookmotel commented Mar 1, 2024 • edited Loading

overlookmotel commented Mar 1, 2024

ArnaudBarre commented Mar 1, 2024

overlookmotel commented Mar 1, 2024 • edited Loading

Boshen commented Mar 4, 2024

overlookmotel commented Mar 4, 2024 • edited Loading

overlookmotel commented Mar 4, 2024 • edited Loading

codspeed-hq bot commented Feb 27, 2024 •

edited

Loading

overlookmotel commented Mar 1, 2024 •

edited

Loading

overlookmotel commented Mar 1, 2024 •

edited

Loading

overlookmotel commented Mar 1, 2024 •

edited

Loading

overlookmotel commented Mar 1, 2024 •

edited

Loading

overlookmotel commented Mar 1, 2024 •

edited

Loading

overlookmotel commented Mar 4, 2024 •

edited

Loading

overlookmotel commented Mar 4, 2024 •

edited

Loading