Babel or estree? #1384

Closed
wooorm opened this issue Dec 15, 2020 · 6 comments · Fixed by #1399
Labels
💪 phase/solved Post is done 💬 type/discussion This is a request for comments

Comments

@wooorm
Member

wooorm commented Dec 15, 2020

Subject of the discussion

With #1382, we now have a JavaScript syntax tree.

The tree starts out in estree: as markdown + mdx.js is parsed simultaneously, I needed a JavaScript parser in micromark-extension-mdxjs, and I chose a small and fast one: acorn. Which comes with estree.
Acorn is small, 30kb minzipped. acorn-jsx is 4kb. astring (a generator) is also 4kb.

Previously, in this project, we used Babel for plugins.
Babel is giant. @babel/core, which has methods to run Babel plugins, is about 220kb minzipped. @babel/generator is 63kb. @babel/parser is 60kb. @babel/traverse is 165kb (it includes both the parser and the generator).

Estree has the drawback of being a fragmented ecosystem: there are no nice parsers that support comments; there are no tree-walkers or compilers that support JSX. And importantly, as we use JSX, we’d want to turn JSX into function calls (React/preact/vue), but those are all Babel plugins. We could use estree but then users would still need to run Babel afterwards.

Babel has the drawback of being giant and slow. But the good thing is that the JSX -> JS compilers all live there.

Problem

What should we go with?
We can’t turn JSX -> JS unless we’re using Babel (well, we could: the Babel plugin to turn JSX -> _jsx() / React.createElement is about 800 lines). Most users probably want to use Babel plugins to turn their fancy features into whatever.
An estree-only system as a base for MDX would be ✨✨✨. @mdx-js/runtime is now 350kb minzipped. That could go down to 100kb or less?

@wooorm wooorm added 💬 type/discussion This is a request for comments 🙆 yes/confirmed This is confirmed and ready to be worked on labels Dec 15, 2020
@ChristianMurphy
Member

Estree has the drawback of being a fragmented ecosystem: there are no nice parsers that support comments; there are no tree-walkers or compilers that support JSX

ESLint's parser and walker have solid ESTree + Comment + JSX support
https://github.com/eslint/espree
https://github.com/eslint/eslint-visitor-keys

Prettier has an estree printer with Comment + JSX support for code gen https://github.com/prettier/prettier/blob/902d524d2f1776efe0b110c1a24813d4d7fcb9d0/src/language-js/printer-estree.js
escodegen is close to having ESTree + JSX support estools/escodegen#391
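
To make the visitor-keys idea concrete, here is a minimal sketch of a walker driven by a table that maps each node type to its child fields. The table below is a hand-written subset assumed for illustration (the real eslint-visitor-keys package ships a complete one), and `walk` and `tree` are hypothetical names, not part of any library.

```javascript
// Hand-written subset of a visitor-keys table (an assumption for this sketch).
const visitorKeys = {
  Program: ['body'],
  ExpressionStatement: ['expression'],
  JSXElement: ['openingElement', 'children', 'closingElement'],
  JSXOpeningElement: ['name', 'attributes'],
  JSXClosingElement: ['name'],
  JSXAttribute: ['name', 'value'],
  JSXExpressionContainer: ['expression'],
  JSXIdentifier: [],
  Identifier: [],
  Literal: []
}

// Depth-first walk: visit the node, then descend into the fields listed for
// its type. Unknown node types are treated as leaves.
function walk(node, visit) {
  if (!node || typeof node.type !== 'string') return
  visit(node)
  for (const key of visitorKeys[node.type] || []) {
    const value = node[key]
    if (Array.isArray(value)) {
      for (const child of value) walk(child, visit)
    } else {
      walk(value, visit)
    }
  }
}

// Hand-built tree for `<a href="#">{x}</a>;`, roughly the shape a JSX-aware
// estree parser produces (positional info omitted).
const tree = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'JSXElement',
      openingElement: {
        type: 'JSXOpeningElement',
        name: {type: 'JSXIdentifier', name: 'a'},
        attributes: [{
          type: 'JSXAttribute',
          name: {type: 'JSXIdentifier', name: 'href'},
          value: {type: 'Literal', value: '#'}
        }],
        selfClosing: false
      },
      children: [{
        type: 'JSXExpressionContainer',
        expression: {type: 'Identifier', name: 'x'}
      }],
      closingElement: {
        type: 'JSXClosingElement',
        name: {type: 'JSXIdentifier', name: 'a'}
      }
    }
  }]
}

const seen = []
walk(tree, (node) => seen.push(node.type))
console.log(seen.join(' > '))
```

Because the walker only depends on the table, supporting JSX (or any other extension) is a matter of adding entries rather than changing the traversal.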

@ChristianMurphy
Member

I'm coming at this from the perspective of personally using MDX more as a build tool than as a runtime component, and of liking both proposal-stage syntax and TypeScript features.
I'm drawn more towards Babel: the ability to parse new syntax, the option to support TypeScript syntax, and the broad support for Babel within Node/JavaScript tools are all draws.
Because of mostly using it as a build tool, bundle size is less of a priority for me.

If we have to pick just one, I'd lean babel.

That being said, do we need to pick just one?
Could the JavaScript parsing strategy be made pluggable?

@ChristianMurphy
Member

ChristianMurphy commented Dec 16, 2020

Offering another consideration, if bundle size is the primary goal.
Acorn may not be the smallest option, wasm can pack smaller than JS, for example https://bundlephobia.com/result?p=@swc/[email protected]
and still allows for custom transforms if needed https://swc.rs/docs/usage-plugin
or other estree like javascript based parsers such as https://github.com/meriyah/meriyah and https://github.com/KFlash/seafox

/cc @ChristopherBiscardi since this approach has some potential tie ins to https://github.com/mdx-js/rust


edit: correction bundlephobia ignores wasm, the library may be faster, but it is not smaller https://unpkg.com/browse/@swc/[email protected]/

@johno
Member

johno commented Dec 16, 2020

Thanks for all this research folks! I'd lean towards something smaller than Babel but I'm not very opinionated there. There are lots of client-side usages of MDX that won't go away, and Babel is pretty huge and pretty slow in comparison to other options. Considering we're mostly only using Babel for internals we could port it away without users really needing to know the difference.

Also, with wooorm's new JSX parsing, we can drop a bunch of the internals we use and manipulate the AST directly!

@ChristopherBiscardi
Member

@ChristianMurphy I definitely wouldn't hold up any changes here based on the work in /rust. If our priority is small, then wasm is probably not the answer at the moment. swc is what I'm planning to use for /rust's js parsing and we could invest there more in the future but it's not a solution for today's in-browser use cases IMO.

that said, swc is hella faster than babel in my experience from working with it in toast (via the Rust APIs), and will work well for node-backed stuff if we're looking for a speed boost at some point in the future (TBD, caveats apply, /rust is an experiment, etc)

@wooorm
Member Author

wooorm commented Dec 17, 2020

ESLint's parser and walker have solid ESTree + Comment + JSX support
[...]
escodegen is close to having ESTree + JSX support [...]
@ChristianMurphy

espree seems to be a tiny wrapper around acorn and acorn-jsx 🤔
And a year old stalled PR is not really “close” 😅
Those visitor keys are great btw! Especially as espree is ± the same AST as acorn + acorn-jsx!

Porting our internals from Babel to estree is not a lot of work. Three small plugins:

const BabelPluginApplyMdxProp = require('babel-plugin-apply-mdx-type-prop')
const BabelPluginExtractImportNames = require('babel-plugin-extract-import-names')
const BabelPluginExtractExportNames = require('babel-plugin-extract-export-names')
.
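
As a rough illustration of why the port is small, here is a sketch of collecting import and export names directly from an estree `Program`. It is not the code of the plugins named above; `importNames`, `exportNames`, and the hand-built `program` are hypothetical, and default exports, re-exports, and destructuring patterns are deliberately left out.

```javascript
// Collect the local names bound by `import` declarations.
function importNames(program) {
  const names = []
  for (const node of program.body) {
    if (node.type !== 'ImportDeclaration') continue
    // Default, namespace, and named specifiers all carry a `local` identifier.
    for (const specifier of node.specifiers) names.push(specifier.local.name)
  }
  return names
}

// Collect names exposed by `export const …`, `export function …` and `export {a as b}`.
function exportNames(program) {
  const names = []
  for (const node of program.body) {
    if (node.type !== 'ExportNamedDeclaration') continue
    const declaration = node.declaration
    if (declaration && declaration.type === 'VariableDeclaration') {
      for (const declarator of declaration.declarations) {
        // Only plain identifiers; patterns are skipped in this sketch.
        if (declarator.id.type === 'Identifier') names.push(declarator.id.name)
      }
    } else if (declaration && declaration.id) {
      names.push(declaration.id.name) // function or class declaration
    }
    for (const specifier of node.specifiers) names.push(specifier.exported.name)
  }
  return names
}

// Hand-built tree (positions omitted) for:
//   import React, {useState as useS} from 'react'
//   export const title = 'Hi'
//   export {useS as hook}
const id = (name) => ({type: 'Identifier', name})
const program = {
  type: 'Program',
  sourceType: 'module',
  body: [
    {
      type: 'ImportDeclaration',
      source: {type: 'Literal', value: 'react'},
      specifiers: [
        {type: 'ImportDefaultSpecifier', local: id('React')},
        {type: 'ImportSpecifier', imported: id('useState'), local: id('useS')}
      ]
    },
    {
      type: 'ExportNamedDeclaration',
      source: null,
      specifiers: [],
      declaration: {
        type: 'VariableDeclaration',
        kind: 'const',
        declarations: [{
          type: 'VariableDeclarator',
          id: id('title'),
          init: {type: 'Literal', value: 'Hi'}
        }]
      }
    },
    {
      type: 'ExportNamedDeclaration',
      source: null,
      declaration: null,
      specifiers: [{type: 'ExportSpecifier', local: id('useS'), exported: id('hook')}]
    }
  ]
}

console.log(importNames(program), exportNames(program))
```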

For a nice JSX serializer, we could look into adding that to either escodegen/astring/or whatever else is nice.
But as we’re thinking of compiling JSX away, that’s not needed. Rather, forking babel-helper-builder-react-jsx-experimental for estree seems to be the way to go (not sure about Vue though...).
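
To give a feel for what compiling JSX away on estree could involve, here is a minimal sketch that rewrites a `JSXElement` node into a `pragma(type, props, ...children)` call node. It is an assumption-laden illustration, not a port of the Babel helper: `toCall`, `toCallee`, and `toValue` are hypothetical names, and spread attributes, member or namespaced names, fragments, and JSX whitespace trimming are not handled.

```javascript
// Build the callee from a dotted pragma such as `React.createElement`.
function toCallee(pragma) {
  return pragma
    .split('.')
    .map((name) => ({type: 'Identifier', name}))
    .reduce((object, property) => ({
      type: 'MemberExpression', object, property, computed: false, optional: false
    }))
}

// Map an attribute value to an expression node.
function toValue(value) {
  if (value === null) return {type: 'Literal', value: true} // `<a b />` means `b: true`
  if (value.type === 'JSXExpressionContainer') return value.expression
  return value // already a string Literal
}

function toCall(node, pragma = 'React.createElement') {
  const name = node.openingElement.name
  if (name.type !== 'JSXIdentifier') throw new Error('Only simple names are handled')
  // Lowercase names become strings (host elements), others stay references.
  const type = /^[a-z]/.test(name.name)
    ? {type: 'Literal', value: name.name}
    : {type: 'Identifier', name: name.name}

  const attributes = node.openingElement.attributes
  const props = attributes.length === 0
    ? {type: 'Literal', value: null}
    : {
        type: 'ObjectExpression',
        properties: attributes.map((attribute) => {
          if (attribute.type !== 'JSXAttribute' || attribute.name.type !== 'JSXIdentifier') {
            throw new Error('Spread and namespaced attributes are not handled')
          }
          return {
            type: 'Property', kind: 'init', method: false, shorthand: false, computed: false,
            key: {type: 'Literal', value: attribute.name.name}, // a string key is always valid
            value: toValue(attribute.value)
          }
        })
      }

  const children = []
  for (const child of node.children) {
    if (child.type === 'JSXElement') children.push(toCall(child, pragma))
    else if (child.type === 'JSXText') children.push({type: 'Literal', value: child.value})
    else if (child.type === 'JSXExpressionContainer' && child.expression.type !== 'JSXEmptyExpression') {
      children.push(child.expression)
    }
  }

  return {type: 'CallExpression', callee: toCallee(pragma), arguments: [type, props, ...children], optional: false}
}

// Hand-built tree for `<Box id="a"><b>hi</b>{x}</Box>` (positions omitted).
const jsxName = (name) => ({type: 'JSXIdentifier', name})
const element = (name, attributes, children) => ({
  type: 'JSXElement',
  openingElement: {type: 'JSXOpeningElement', name: jsxName(name), attributes, selfClosing: false},
  closingElement: {type: 'JSXClosingElement', name: jsxName(name)},
  children
})
const input = element(
  'Box',
  [{type: 'JSXAttribute', name: jsxName('id'), value: {type: 'Literal', value: 'a'}}],
  [
    element('b', [], [{type: 'JSXText', value: 'hi'}]),
    {type: 'JSXExpressionContainer', expression: {type: 'Identifier', name: 'x'}}
  ]
)

// Represents: React.createElement(Box, {"id": "a"}, React.createElement("b", null, "hi"), x)
const call = toCall(input)
```

The resulting tree contains no JSX nodes, so any plain estree generator could print it.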

wooorm added a commit that referenced this issue Dec 20, 2020
This removes the last three custom Babel plugins we had and replaces
them with estree versions.
Furthermore, it removes `@babel/generator`.

For the plugins, we were only looking at ESM import/exports, but right
now we’re delegating work to `periscopic` to look at which things are
defined in the top-level scope.
It’s a bit more complex, but this matches better with intentions,
fixes some bugs, and prepares for a potential future where other ES
constructs are allowed, so all in all should be a nice improvement.

For serializing, we’re switching to `astring`, and handling JSX for now
internally (could be externalized later).
`astring` seems fast and is incredibly small, but is not very popular.
We might perhaps see bugs in serialization in the future because of that,
but all our tests seem fine, so I’m not too worried about that.
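
As an illustration of what handling JSX internally during serialization can look like, here is a minimal handler-per-node-type sketch. It is standalone and does not use the `astring` API; `handlers` and `serialize` are hypothetical names, and only a handful of node types are covered (string escaping inside attributes is simplified to `JSON.stringify`).

```javascript
// One small function per node type; each returns the source text for that node.
const handlers = {
  Identifier: (node) => node.name,
  Literal: (node) => JSON.stringify(node.value),
  JSXIdentifier: (node) => node.name,
  JSXText: (node) => node.value,
  JSXExpressionContainer: (node) => '{' + serialize(node.expression) + '}',
  JSXAttribute: (node) =>
    serialize(node.name) + (node.value ? '=' + serialize(node.value) : ''),
  JSXOpeningElement: (node) =>
    '<' + serialize(node.name) +
    node.attributes.map((attribute) => ' ' + serialize(attribute)).join('') +
    (node.selfClosing ? ' />' : '>'),
  JSXClosingElement: (node) => '</' + serialize(node.name) + '>',
  JSXElement: (node) =>
    serialize(node.openingElement) +
    node.children.map((child) => serialize(child)).join('') +
    (node.closingElement ? serialize(node.closingElement) : '')
}

// Dispatch on `node.type`; fail loudly on anything this sketch does not cover.
function serialize(node) {
  const handler = handlers[node.type]
  if (!handler) throw new Error('Cannot serialize `' + node.type + '`')
  return handler(node)
}

// Hand-built tree for `<a href="#">{x}<br /></a>` (positions omitted).
const jsx = {
  type: 'JSXElement',
  openingElement: {
    type: 'JSXOpeningElement',
    name: {type: 'JSXIdentifier', name: 'a'},
    attributes: [{
      type: 'JSXAttribute',
      name: {type: 'JSXIdentifier', name: 'href'},
      value: {type: 'Literal', value: '#'}
    }],
    selfClosing: false
  },
  children: [
    {type: 'JSXExpressionContainer', expression: {type: 'Identifier', name: 'x'}},
    {
      type: 'JSXElement',
      openingElement: {
        type: 'JSXOpeningElement',
        name: {type: 'JSXIdentifier', name: 'br'},
        attributes: [],
        selfClosing: true
      },
      children: [],
      closingElement: null
    }
  ],
  closingElement: {type: 'JSXClosingElement', name: {type: 'JSXIdentifier', name: 'a'}}
}

const output = serialize(jsx)
console.log(output)
```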

Estree remains a somewhat fragmented ecosystem, such as that the tree
walkers in `periscopic` and `astring` are different, so we might also
consider writing our own serializer in the future.
Or, when we implement Babel’s React JSX transform ourselves, could switch
to another generator, or at least drop the JSX serialization code here.

Because of these changes, we can drop `@babel/core` and
`@babel/generator` from `@mdx-js/mdx`, which drops the bundle size
from 349kb to 111kb.
That’s 68%.
Pretty nice.
This should improve downloading and parsing time of bundles
significantly.
Of course, we currently still have JSX in the output, so folks will
have to resort to Babel (or `buble-jsx-only`) in another step.

For performance, v2 (micromark) was already an improvement over v1.
On 1000 simple files totalling about 1mb of MDX:

* v1: 3739ms
* v2: 2734ms (26% faster)
* v2 (w/o babel): 1392ms (63% faster).

Of course, this all really depends on what type of stuff is in your MDX.
But it looks pretty sweet!
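
For reference, the percentages quoted above can be reproduced as relative reductions against the starting value. This is only a small arithmetic check; `percentReduction` is a hypothetical helper, and reading "faster" as "reduction in total time relative to v1" is an assumption about how the figures were derived.

```javascript
// Relative reduction from `before` to `after`, as a percentage of `before`.
function percentReduction(before, after) {
  return ((before - after) / before) * 100
}

const bundle = percentReduction(349, 111) // bundle size in kb
const v2 = percentReduction(3739, 2734) // v1 -> v2, in ms
const v2WithoutBabel = percentReduction(3739, 1392) // v1 -> v2 without Babel, in ms

console.log(bundle.toFixed(1), v2.toFixed(1), v2WithoutBabel.toFixed(1))
```

Under that reading the values come out at roughly 68.2, 26.9, and 62.8, which is consistent with the quoted 68%, 26% (apparently truncated rather than rounded), and 63%.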

✨

Related to GH-1046.
Related to GH-1152.
Related to GH-1338.
Closes GH-704.
Closes GH-1384.
johno pushed a commit that referenced this issue Dec 20, 2020
@wooorm wooorm added 💪 phase/solved Post is done and removed 🙆 yes/confirmed This is confirmed and ready to be worked on labels Dec 20, 2020