Add an experimental way to export an instance ES module. #22867

Open · wants to merge 5 commits into base: main

@brendandahl
Collaborator

brendandahl commented Nov 6, 2024

Adds a new mode -sMODULARIZE=instance which will change the output to be more of a static ES module. As mentioned in the docs, there will be a default async init function exported and named exports that correspond to Wasm and runtime exports. See the docs and test for an example.

Internally, the module will now have an init function that wraps nearly all of the code except some top level variables that will be exported. When the init function is run, the top level variables are then updated which will in turn update the module exports. E.g.

async function init(moduleArgs) {
  function foo() {};
  x_foo = foo;
  x_bar = wasmExports['bar'];
}
var x_foo, x_bar;
export {x_foo as foo, x_bar as bar};

Note: I alternatively tried to keep everything at the top level scope and move only the code that reads from moduleArg into an init function. This would make it possible to get rid of the x_func vars and directly add export to vars/functions we want to export. However, there are lots of things that read from moduleArg in many different spots and ways which makes this challenging.
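
The "update the top-level variables, which in turn updates the module exports" step relies on ES module live bindings. A minimal sketch of that behaviour, emulated with getters so it runs as a plain script (roughly how some bundlers lower `export {x as y}` to CommonJS). The `moduleExports` object and the stand-in `wasmExports` are illustrative assumptions, not actual Emscripten output:

```javascript
// Emulates `var x_foo, x_bar; export {x_foo as foo, x_bar as bar};`
// using getters. All names here are hypothetical stand-ins.
var x_foo, x_bar;
const moduleExports = {};
Object.defineProperty(moduleExports, 'foo', { enumerable: true, get: () => x_foo });
Object.defineProperty(moduleExports, 'bar', { enumerable: true, get: () => x_bar });

// Stand-in for the instantiated Wasm exports.
const wasmExports = { bar: () => 42 };

function init() {
  function foo() { return 'foo'; }
  x_foo = foo;
  x_bar = wasmExports['bar'];
}

// The export names exist before init runs, but their values are not set yet.
console.log(Object.keys(moduleExports), typeof moduleExports.foo);
init();
// After init, the same export names resolve to the assigned functions.
console.log(moduleExports.foo(), moduleExports.bar());
```

The set of export names is fixed up front; only the values behind them change when `init` runs, which is the property the importer sees.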

@@ -18,7 +18,7 @@ moduleRtn = Module;

#endif // WASM_ASYNC_COMPILATION

-#if ASSERTIONS
+#if ASSERTIONS && MODULARIZE != 'static'
Collaborator

Why not use -sMODULARIZE=instance to match the old -sMODULARIZE_INSTANCE setting? Do you think the word "static" is more descriptive of what is happening here than "instance"? I think of the two modes as "single instance" vs "factory/multiple instance", but maybe "static" makes sense on some level?

Collaborator Author

I'm open to changing it, but "instance" is confusing to me. When I hear "instance", it makes me think I can create multiple instances, i.e. a factory. I chose "static" since the exports to the outside world don't change and there's only one static instance.

Other ideas: static_instance, singleton, esmodule. Or alternatively, we add a whole different flag that will enable all the settings we want in the "new output world" e.g. strict, es modules, no _ exports....

Collaborator

I'm kind of leaning towards bringing back the old name -sMODULARIZE_INSTANCE until we can think of a better one. It has exactly the same meaning as the old option, I believe, right?

It's interesting to me that the word "instance" implies multiple instances to you. I don't think it has that connotation for me.

Collaborator Author

I've renamed it instance for now. Later I'd like to look into the idea of having two independent options of FACTORY(aka MODULARIZE) and then MODULE_FORMAT = none/esm/umd/... as mentioned below and in the other bug.
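
The "single instance" vs "factory/multiple instance" distinction debated in this thread can be sketched as follows. This is only an illustration of the two shapes, with hypothetical names (`createModule`, `increment`); it is not generated Emscripten code:

```javascript
// Factory shape: every call builds a fresh, independent instance.
// (Real -sMODULARIZE output returns a Promise; this sketch is synchronous.)
function createModule() {
  let counter = 0;
  return { increment: () => ++counter };
}

// Instance shape: the module scope itself is the one and only instance.
let instanceCounter = 0;
function increment() {
  return ++instanceCounter;
}

const a = createModule();
const b = createModule();
a.increment();
console.log(a.increment(), b.increment()); // separate state per factory call
console.log(increment(), increment());     // one shared state
```

In the instance shape there is nothing to call to get a second copy, which is why names such as "singleton" were also floated.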

@kripken
Member

kripken commented Nov 7, 2024

> Internally, the module will now have an init function that wraps nearly all of the code except some top level variables that will be exported. When the init function is run, the top level variables are then updated which will in turn update the module exports.

I think I follow this, but can you maybe explain a bit about how this improves on the current status? That is, what is changed, and what problems this fixes by changing it?

@brendandahl
Collaborator Author

> > Internally, the module will now have an init function that wraps nearly all of the code except some top level variables that will be exported. When the init function is run, the top level variables are then updated which will in turn update the module exports.
>
> I think I follow this, but can you maybe explain a bit about how this improves on the current status? That is, what is changed, and what problems this fixes by changing it?

Do you mean the overall goals of this new ES module output API or why the init function is needed?

@brendandahl
Collaborator Author

To answer both:

The goals of a static ES module:

  1. Produce a module that has a more familiar shape to JS developers where the exports are statically declared.
  2. Make it easier for bundlers to do tree shaking/dead code elimination on ES modules produced by emscripten. Currently, bundlers like rollup can't handle the modularize output where everything is dynamically assigned to the Module object. The use case for this is that a developer uses a prebuilt emscripten module, but only uses a small part of the library. Ideally, the developer in this case would still be able to do DCE after the module has been produced.

As for the init function, there's no way to pass arguments into an imported ES module, so we have to separate out this step into an init function.
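
A minimal sketch of why the init step is needed: an `import` statement has nowhere to put arguments, so configuration has to arrive through a call made after the import. It is written as a single script, with the `export` keywords shown only in comments. `print` is used as an example option, and `greet` is a hypothetical export:

```javascript
// Sketch only: stands in for a generated module.mjs. `greet` is hypothetical.
var x_greet; // would be: export {x_greet as greet};

/* export default */ async function init(moduleArgs = {}) {
  // Options are read here because `import` itself cannot carry them.
  const print = moduleArgs.print ?? console.log;
  x_greet = () => print('hello from the module');
}

// Importer side, standing in for:
//   import init, {greet} from './module.mjs';
const lines = [];
init({ print: (text) => lines.push(text) }).then(() => {
  x_greet();
  console.log(lines);
});
```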

@kripken
Member

kripken commented Nov 7, 2024

I think I see, thanks. And, looking at the current output of -sMODULARIZE -sEXPORT_ES6, it looks like we have

export default Module;

So "more static" here means to replace the big Module object with individual exports per function, as written above,

export {x_foo as foo, x_bar as bar};

?

What I am still a little confused on is why that is more "static". I guess the big Module object is not introspectable, but x_foo is written dynamically in the example above - can bundlers really reason about the runtime assignment to it?

@sbc100
Collaborator

sbc100 commented Nov 7, 2024

> I think I see, thanks. And, looking at the current output of -sMODULARIZE -sEXPORT_ES6, it looks like we have
>
> export default Module;
>
> So "more static" here means to replace the big Module object with individual exports per function, as written above,
>
> export {x_foo as foo, x_bar as bar};
>
> ?
>
> What I am still a little confused on is why that is more "static". I guess the big Module object is not introspectable, but x_foo is written dynamically in the example above - can bundlers really reason about the runtime assignment to it?

I believe they can/will, yes. You are correct that these still get dynamically assigned internally, but from the POV of the importer/user they are static. Over time we can work to make these even more static, using things like the ESM integration proposal, but even before then, IIUC, the bundlers can deal with this kind of thing.

@brendandahl
Collaborator Author

Yeah, the tools seem much better at reasoning about exports, even with the later assignment in the init function.

Using Exported Functions

// main.mjs
import init, {foo} from './module.mjs'
await init();
foo();

// module.mjs
export default async function init() {
  _export__foo = function() { console.log('foo'); };
  _export__bar = function() { console.log('bar'); };
}
var _export__foo, _export__bar;
export { _export__foo as foo, _export__bar as bar}

Rollup Output (notice bar is gone)

async function init() {
  _export__foo = function() { console.log('foo'); };
}
var _export__foo;
await init();
_export__foo();

Closure Output (notice bar is gone)

async function a(){b=function(){console.log("foo")}}var b;(async function(){await a();b()})();

Using A Module Object (the current modularize + es6)

// main.mjs
import Module from './module.mjs'
const module = await Module();
module['foo']();

// module.mjs
export default async function Module() {
  var module = {};
  module['foo'] = function() { console.log('foo'); };
  module['bar'] = function() { console.log('bar'); };
  return module;
}

Rollup Output (bar still exists)

async function Module() {
  var module = {};
  module['foo'] = function() { console.log('foo'); };
  module['bar'] = function() { console.log('bar'); };
  return module;
}
const module = await Module();
module['foo']();

Closure Output (bar still exists)

async function a(){return{foo:function(){console.log("foo")},bar:function(){console.log("bar")}}};(async function(){(await a()).foo()})();

@kripken
Member

kripken commented Nov 7, 2024

Interesting. How do they do that, though? It seems that to do any kind of useful optimization here we need to know exactly what can be assigned to these exports. I guess it doesn't matter that they can be assigned more than once, but can they even infer the content written to them? That is, @brendandahl you gave this example now

 _export__foo = function() { console.log('foo'); };

but the actual code would be less static, I think? Don't we capture these from wasm exports, like this?

 _export__foo = exports.foo;

Do these tools actually manage to figure out what exports is, going back through the wasm compile&link stuff, to the wasm binary file?

@sbc100
Collaborator

sbc100 commented Nov 7, 2024

> Do these tools actually manage to figure out what exports is, going back through the wasm compile&link stuff, to the wasm binary file?

I think the idea is that we could extend the bundlers so that they can do that (assuming they can't already do it today).

@brendandahl
Collaborator Author

brendandahl commented Nov 7, 2024

> Don't we capture these from wasm exports, like this?

Yes, for the first stage rollup would only be able to get rid of runtime/library JS that is unused. The next stage is to create a rollup plugin that knows about wasm exports and can remove them if they're never used. This is where it gets more tricky and it's unclear exactly how we want to do that. I see two options:

  1. Use source phase imports to identify wasm exports in the plugin.
  2. Create some special emscripten output that makes it easy for a plugin to identify wasm exports.

Option 1 seems like the more general, standards-based approach, but I haven't looked into this yet.
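
As a rough illustration of option 1: a source phase import is expected to hand over a `WebAssembly.Module`, and the standard `WebAssembly.Module.exports()` call lists that module's exports without instantiating it. A minimal sketch, assuming a hand-assembled two-function binary as a stand-in for real Emscripten output (the byte array and the `foo`/`bar` names are illustrative only, and how a bundler plugin would actually obtain the module is left open):

```javascript
// Hand-assembled Wasm binary exporting two empty functions, "foo" and "bar".
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: () -> ()
  0x03, 0x03, 0x02, 0x00, 0x00,                   // function section: two funcs of type 0
  0x07, 0x0d, 0x02,                               // export section: two entries
  0x03, 0x66, 0x6f, 0x6f, 0x00, 0x00,             //   "foo" -> func 0
  0x03, 0x62, 0x61, 0x72, 0x00, 0x01,             //   "bar" -> func 1
  0x0a, 0x07, 0x02,                               // code section: two bodies
  0x02, 0x00, 0x0b,                               //   body 0: no locals, end
  0x02, 0x00, 0x0b,                               //   body 1: no locals, end
]);

// Compile only; nothing is instantiated or executed.
const wasmModule = new WebAssembly.Module(bytes);

// Each descriptor has a `name` and a `kind` ("function", "memory", ...).
const exportNames = WebAssembly.Module.exports(wasmModule)
  .filter((entry) => entry.kind === 'function')
  .map((entry) => entry.name);

console.log(exportNames);
```

A plugin with this list could, in principle, compare it against the names the importing code actually uses.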

Alternatively, we could keep the current module output and try to make bundlers understand that. I don't see this as a great option though, since we'd need to support this in several bundlers, whereas using export seems like something tools already support.

@kripken
Member

kripken commented Nov 7, 2024

I see, thanks! Makes sense. So there are a few more steps here, after this, that I didn't fully get before.

@curiousdannii
Contributor

In the world of ES Modules, do wrapper functions really have any actual advantage? Wouldn't it be most beneficial to have no wrapper? Everything applicable could be exported directly, and DCE-ed by the bundler if not used.

We'd then need a separate setting for factory output.

Old discussion on the same topic: #11792 (comment)

@sbc100
Collaborator

sbc100 commented Nov 8, 2024

> In the world of ES Modules, do wrapper functions really have any actual advantage?

When we can avoid wrappers we obviously should. However, until the Wasm ESM integration proposal (https://github.com/WebAssembly/esm-integration) lands we cannot directly export wasm functions, so we need to export something that is then dynamically updated to point to the wasm function once it's available.

The second issue is that emscripten startup takes arguments (see INCOMING_MODULE_JS_API). This means that even if we have full ESM integration we would still need to delay module instantiation in most cases until those arguments are provided.

@brendandahl
Collaborator Author

> Old discussion on the same topic: #11792 (comment)

Thanks for pointing to this issue. I hadn't seen it before, but I think it echoes a lot of what I was thinking. I like your idea of two independent options: FACTORY (aka MODULARIZE) and MODULE_FORMAT = none/esm/umd/.... Ideally, in emscripten we'd just handle creating the factory/instance version, and always produce an ESM. Then we'd let rollup handle outputting the desired module format.

@brendandahl changed the title from "Add an experimental way to export a "static" ES module." to "Add an experimental way to export an instance ES module." on Nov 12, 2024
@brendandahl
Collaborator Author

For posterity, it seems webpack is unable to do tree shaking as thoroughly as rollup and closure. Using the example of "exported functions" in this comment, the bar function still remains in the output.

I tried with:

// package.json
{
  "sideEffects": false
}

// webpack.config.js
const path = require('path');

module.exports = {
  entry: "./main.mjs",
  output: {
    path: path.resolve(__dirname, "dist"),
  },
  optimization: {
    usedExports: true
  },
  stats: {
    // Display bailout reasons
    optimizationBailout: true,
  },
};

If I instead use an even "more static" module, tree shaking does work.

export let fillSquare = function() {
  console.log('fillSquare');
};
export let fillCircle = function() {
  console.log('fillCircle');
};

export default async function init() {
}
