Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize identifiers to ESTree #2521

Merged
merged 2 commits into from
Mar 4, 2024
Merged

Conversation

ArnaudBarre
Copy link
Contributor

Re #2463

Do you want to go in this direction?

It can also be seen as a loss for people building other things on top later on (both for merging them and using skipping reference_id).

I'm also quite ok with keeping the AST as close as possible to ESTree without loosing information (align name for 1-1 nodes, camelCase name, string enum, ...) and having a small wrapper to convert to ESTree.

This wrapper could be in a first time in TS and then in Rust. Given that Prettier needs some very custom AST transformations, a small wrapper will be needed in that case anyway so I'm fine saying the current AST for identifier is fine as is.

Also this rename = "Identifier" doesn't seem to be entirely handled by the wasm-pack, the generated types looks like this:

export interface Identifier extends Span {
    type: "Identifier";
    name: Atom;
}

export interface Identifier extends Span {
    type: "Identifier";
    name: Atom;
}

export type TSTypePredicateName = IdentifierName | TSThisType;

@github-actions github-actions bot added the A-ast Area - AST label Feb 27, 2024
Copy link

codspeed-hq bot commented Feb 27, 2024

CodSpeed Performance Report

Merging #2521 will degrade performances by 7.18%

Comparing ArnaudBarre:indentifier (0120b65) with main (46e7791)

Summary

⚡ 1 improvements
❌ 2 regressions
✅ 24 untouched benchmarks

🆕 2 new benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main ArnaudBarre:indentifier Change
🆕 codegen_sourcemap[react.development.js] N/A 15.5 ms N/A
🆕 codegen_sourcemap[typescript.js] N/A 1.6 s N/A
linter[RadixUIAdoptionSection.jsx] 5.6 ms 6 ms -6.64%
parser[RadixUIAdoptionSection.jsx] 350.9 µs 319.4 µs +9.87%
semantic[RadixUIAdoptionSection.jsx] 392.6 µs 422.9 µs -7.18%

@Boshen
Copy link
Member

Boshen commented Feb 27, 2024

Shall we try and keep these spec names until something blows up?

@ArnaudBarre
Copy link
Contributor Author

Yeah we can. After some thoughts, this is really the "easy" part for compatibility. Given that's this requires fixing also type codegen, let's tackle other issues before this one. I will work on the import attributes/withClause mismatch tonight

@overlookmotel
Copy link
Contributor

overlookmotel commented Mar 1, 2024

Just to put in my 2 cents: I think if we're going for ESTree compatibility, we should go for 100% compatibility. It's simpler for users to have that unambiguity.

Once #2409 comes to fruition (granted, that may be a while) it'll be working off the binary representation, so this information won't be lost, only hidden. An AST visitor could expose BindingIdentifier and friends as selectors (much the same way Babel has selectors which don't correspond exactly to AST node types e.g. Function which acts as FunctionDeclaration | FunctionExpression | ArrowFunctionExpression). That would give a lot of the benefits without the downside of having a "slightly wacky" AST.

But in the AST presented as a JS object, in my opinion it's better to be 100% ESTree compatible.

Any conversion/translation required to achieve that is going to be faster if done on Rust side during serialization, than on JS side.

And one other thing: If we do achieve 100% compat, it'll be easier to import a ton of tests from another library without having to comment out a bunch as "we do it differently".

@overlookmotel
Copy link
Contributor

overlookmotel commented Mar 1, 2024

By the way, I'm not saying we have to do it now, or that you have to do it Arnaud! I'm just saying that in my view, I think it should be our eventual end goal.

@ArnaudBarre
Copy link
Contributor Author

Yeah I agree OXC should provide an ESTree compatible output as builtin, but this should not block better features as you suggested.
My current plan:

  • Getting the parser wrapper for Prettier usable on my real world codebase (means that there is enough information for all common nodes)
  • Getting the parser wrapper for Prettier passing prettier tests (means that the output contains at least enough information for all nodes)
  • Publish the package (can be under the oxc org, we will see at this point)
  • Split the wrapper in two (OXC -> TSESTree and TSESTree -> Prettier tree (subtitles differences)) and have the first part pass typescript-eslint tests (means the wrapper should hide any extra information so AST snapshot works)
  • Verify that is also works for Babel TS tests (stage 4 spec only)
  • Rewrite the transformation in rRst as part of the @oxc/parser crates (with maybe some help, I still need to see how long the first parts will take)

@overlookmotel
Copy link
Contributor

overlookmotel commented Mar 1, 2024

Thanks for coming back @ArnaudBarre and outlining your plan. Sounds great. Glad you agree that OXC's JS interface should provide a TSESTree-compatible AST ideally.

I believe that all transformations required for that should be achievable on Rust side during serialization. Beyond using the #[serde(rename)] etc tags, you can also manually implement impl Serialize for T and tell serde to create whatever arbitrary JSON you like. e.g. Adding rest to end of elements for ArrayPattern etc can be done in this way.

Writing Serialize implementations by hand is not for the faint-hearted, as serde's API is pretty horrible. I doubt as someone unfamiliar with Rust you'd want to tackle that. But I'd be happy to.

There's one part of this I don't understand. What's required in the TSESTree -> Prettier tree conversion? Can you give an example please?

@Boshen
Copy link
Member

Boshen commented Mar 1, 2024

I want to identify all the issues to serializing these AST nodes that have more information to nodes that have less.

For example, it occurs to me that we can't deserialize an estree AST back to oxc AST if there is information loss - a possible fix is to add extra information to the serialized node so we can deserialize it back.

This is just one example, there may be more but I'm not sure.

I'm all in for 100% estree serialization, but we want to sit on the problem for a bit longer so we can identify all the issues prior.

@overlookmotel
Copy link
Contributor

overlookmotel commented Mar 1, 2024

we can't deserialize an estree AST back to oxc AST if there is information loss

I think can reconstruct the information from context.

e.g. for BindingPattern, kind field can contain a general {type: 'Identifier', ...} in JS, but we know because it's BindingPattern that this field needs to be deserialized in Rust as a BindingIdentifier.

I doubt there'd be many ambiguities. e.g. x in x = 1 is always an IdentifierReference, and in let x it's always a BindingIdentifier.

serde may even deal with it automatically if there's no ambiguity.

a possible fix is to add extra information to the serialized node so we can deserialize it back

This would work for an AST which was originally created by OXC, but if the user alters the AST in JS (e.g. inserts new nodes), this extra metadata wouldn't be present.

@overlookmotel
Copy link
Contributor

On this specific PR... @ArnaudBarre would you like to split out the parts of this PR which are uncontroversial into a separate PR which can get merged before decision is made about whether to "lower" type or not ? e.g.

#[cfg_attr(feature = "serde", serde(skip))]
pub reference_id: Cell<Option<ReferenceId>>,

@ArnaudBarre
Copy link
Contributor Author

Yeah I don't really know what's the optimal way to convert the complex cases, I think rest element handling is probably the most complex one. If it requires tricks specific to serde that copilot don't know maybe I will need help of be happy that someone does it. But I think having all the difference laid out as a TS implementation is a better start before diving in (at least for me).

For things specific to Prettier, see this file and few others in the folder above. This show that even if almost all parsers are estree like, there is always subtleties, mostly because all TS only part is less standardized than the pure ESTree part.

For the "loss" information part, I think this is simpler for OXC to have two mode, one output closer to OXC internal representation so communication can go back and force, and one fully ESTree compatible where you can't go back to OXC. I don't see a usage for something in between, and it will I think be costly to maintain.

@overlookmotel which part is "uncontroversial" for you? For me both result in potential information loss. But I'm happy to discuss skipping nodes and merging type identifier separately anyway. An other example I've seen is the span info the trailing comma is is not part of ESTree.

@overlookmotel
Copy link
Contributor

overlookmotel commented Mar 1, 2024

which part is "uncontroversial" for you? For me both result in potential information loss.

Ah sorry, my mistake. I see what you mean.

I can write a Serialize impl to deal with rest.

Concerning 2 modes: I'm not sure how easy this would be to do with serde. I think it assumes there's one canonical way to serialize a type, and don't think it allows multiple variations (but maybe I'm wrong and there is a workaround to do that).

Presumably it's OK to add extra fields (e.g. symbolId on BindingIdentifier) and still be ESTree-compatible, as no software will be looking for that field.

As far as conflating BindingIdentifier/IdentifierReference -> Identifier, personally I don't think it's as bad as both of you seem to, because I think it's clear from context (so the transform isn't lossy, it's just not explicit).

Thanks for info on Prettier. I'll try and get my head around that soon as I get a chance.

@Boshen
Copy link
Member

Boshen commented Mar 4, 2024

I'll try and get the TypeScript types compiling and get this merged tomorrow.

@Boshen Boshen marked this pull request as ready for review March 4, 2024 14:50
@Boshen Boshen enabled auto-merge (squash) March 4, 2024 14:52
@Boshen Boshen merged commit f6709e4 into oxc-project:main Mar 4, 2024
16 checks passed
@overlookmotel
Copy link
Contributor

overlookmotel commented Mar 4, 2024

parser

What the hell? How on earth did this speed up the parser? As far as I can see, this PR purely changes the Serialize impls, which aren't used in parser benchmarks. Any idea @Boshen?

@overlookmotel
Copy link
Contributor

overlookmotel commented Mar 4, 2024

Oh I get it. CodSpeed is comparing this PR rebased on main vs baseline of 1-week old main. So other changes made in past week which speeded up parser are showing as due to this PR. It's annoying CodSpeed is misleading in this way sometimes.

Should we alter benchmark CI job for PRs so it runs benchmarks on actual HEAD commit of the PR, rather than the PR auto-rebased on main?

@ArnaudBarre ArnaudBarre deleted the indentifier branch March 4, 2024 23:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area - AST
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants