Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helpers design doc #94

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Helpers design doc #94

wants to merge 2 commits into from

Conversation

GeoffreyBooth
Copy link
Member

I started writing a design doc for the helper functions we’re considering adding, to help people reimplement only small portions of hooks (see README). The idea is that if someone wants to change a small part of what happens within Node’s internal resolve, for example, rather than copy/pasting all the code from Node’s internal resolve and changing one thing (and then this loader getting outdated with the next version of Node where Node’s internal resolve changes) we provide lots of functions that make up all the steps of Node’s resolve, and the hook author can sting those together with whatever minor changes they want as part of the user resolve hook.

On a practical level this will mean making public many of the functions in https://github.com/nodejs/node/blob/3350d9610864af3219de7dd20e3ac18b3c214c52/lib/internal/modules/esm/resolve.js, and splitting some of the larger ones into smaller ones that become public. Because many functions reference other helper functions, I’m thinking of a pattern where most of these internal calls become references that get passed in. So for example within packageResolve there’s a call to parsePackageName; the latter can be passed in as an argument to packageResolve, if the user desires to override parsePackageName within packageResolve. Leaving it undefined would mean that Node’s default internal parsePackageName would be used.

I’m opening this PR as a draft to get initial feedback on this general approach. If you all think it’s good and the best way to address how to make these functions into public APIs, then I’ll keep going on this design doc and try to define all of them (at least for resolve, for now) and then we can implement this.

@GeoffreyBooth GeoffreyBooth added the documentation Improvements or additions to documentation label Jul 10, 2022
@cspotcode
Copy link
Contributor

Does this design account for composing two loaders that are both trying to use these helpers?

Decorator pattern can work well for stuff like this. Decorator pattern could actually work for all hooks: these helpers, existing hooks, and the fs hooks.

// my-loader.mjs
export function create(hooks: NodeLoaderHooks) {
    return {
        ...hooks,
        resolve(url: Url, context) { /* resolve hook goes here.  Can call `hooks.resolve()` if appropriate */ },
        fileExists(path: string) { /* augment fileExists */ },
        findPackageJson(packageName: string, base: string, isScoped: boolean) { /* ... */ }
        // All other hook behaviors are unmodified by this loader.  Other loaders might modify them
    }
}

The snippet above has them all on a single namespace, but could also work splitting into multiple namespaces.

export function create(hooks) {
    ...hooks,
    fs: {
        ...hooks.fs,
        fileExists() { /* augment fileExists */ }
    }
}

@GeoffreyBooth
Copy link
Member Author

Does this design account for composing two loaders that are both trying to use these helpers?

The idea is that each helper would be a “pure” function, operating only on its input; hence the need for passing in a lot more input, from references to other functions to also containers of state like the module cache. But once every helper function is pure and stateless, they should be able to be used by any loader in a chain without conflict.

@cspotcode
Copy link
Contributor

I'm thinking about a situation where one loader needs to augment the behavior of a helper function, and another loader needs to pass an implementation of that helper function into another helper function. The second loader needs a reference to the first loader's augmentations of the helper.

@arcanis
Copy link
Contributor

arcanis commented Jul 10, 2022

I'm thinking about a situation where one loader needs to augment the behavior of a helper function, and another loader needs to pass an implementation of that helper function into another helper function. The second loader needs a reference to the first loader's augmentations of the helper.

Presumably, with ambiant loaders (I'm working on them), one loader could just override the node:loader_utils built-in module to point to its custom implementation, right?

@GeoffreyBooth
Copy link
Member Author

Presumably, with ambiant loaders (I’m working on them), one loader could just override the node:loader_utils built-in module to point to its custom implementation, right?

This isn’t a use case I considered, like say if you want to override a helper like packageResolve and have your custom version be used by all following loaders; but I suppose it should be possible, since loaders can override/replace/mock builtin imports. Loaders probably wouldn’t and shouldn’t be able to override the version of packageResolve that Node itself uses within its internal resolve, whether or not that gets called, because we have lots of protections against userland code messing with Node’s internals. The point of creating these helpers isn’t for them to be overridden, it’s for them to be used within user-defined hooks.

I’m still hoping to avoid needing lots of defined hooks, like separate hooks for every filesystem operation etc. For now my goal is only to reduce boilerplate among hooks that users are writing, not provide even more ways to hook into Node’s internals (at least, not as part of this effort).

Getting back to my original question, does anyone have any feedback about this general design pattern?

@cspotcode
Copy link
Contributor

To override node:loader_utils in an ambient loader, do you imagine it aliases the existing node:loader_utils -- possibly coming from another loader in the chain -- to a new specifier? And then emits its own code that imports from that new specifier?

Something like:

import * as wrappedHelpers from 'loader_utils_generated_alias_46873'; // avoid collisions
export function findPackageJson() { wrappedHelpers.findPackageJson(/* augmentations */) /* augmentations */ }

If helper A calls helper B, calls helper C, then helper C's dependencies should be pass-able to helper A. So that any helper can accept a Partial<Helpers> which will apply to all direct and transitive operations.

I dunno if there's any performance difference between a single createHelpers(helpers) factory / class and passing a helpers object into each function. Some of these might be hot codepaths. I've read node's stance on performance considerations.

@GeoffreyBooth
Copy link
Member Author

To override node:loader_utils in an ambient loader

I was thinking that each helper would be available on module, as in import { packageResolve } from 'module'. So you could override them like any other import.

But please, let’s stop focusing on overriding helpers. The question is whether the API I’ve started to sketch out in this draft PR is good or could be improved.

doc/design/helpers.md Outdated Show resolved Hide resolved
@cspotcode
Copy link
Contributor

It might help to get a usage example for invoking packageResolve with a custom implementation of tryStatSync, to make sure we're on the same page as far as intended usage in those situations.

@GeoffreyBooth
Copy link
Member Author

GeoffreyBooth commented Jul 12, 2022

It might help to get a usage example for invoking packageResolve with a custom implementation of tryStatSync, to make sure we’re on the same page as far as intended usage in those situations.

I’ve added an example of the intended/expected usage: https://github.com/nodejs/loaders/pull/94/commits/0d263e3b7c9ff83812aa02a96e99cbbf47d7e444.

I’ll add another example with a custom implementation of one of the child functions.

@cspotcode
Copy link
Contributor

The situation I'm thinking of is: what's intended implementation when tryStatSync needs to be customized within a call to packageResolve? Knowing that packageResolve is going to call findPackageJson, which will call tryStatSync

@GeoffreyBooth
Copy link
Member Author

The situation I’m thinking of is: what’s intended implementation when tryStatSync needs to be customized within a call to packageResolve? Knowing that packageResolve is going to call findPackageJson, which will call tryStatSync

At the moment I can’t think of a realistic example for why you’d want to customize tryStatSync, but assuming you had one, it would be something like this (within the example I linked to above):

  const tryStatSync = () => true; // Or whatever
  const pathToPackage = fileURLToPath(packageResolve(specifier, context.parentURL, context.conditions, {
    tryStatSync,
  }));

@cspotcode
Copy link
Contributor

I'm thinking about virtual filesystem situations. Not sure if that's intended to be done in this way, or via the virtual filesystem hooks.

But I like that we got clarification on the ability to pass transitive dependencies into a helper. So a single overrides object can be passed into every helper invocation, ensuring a consistent set of overrides are used for each (transitive) operation.

@arcanis
Copy link
Contributor

arcanis commented Jul 12, 2022

Fwiw my thinking for virtual filesystems is that we would be able to override node:fs the same way as the hooks.

One note: perhaps it'd be simpler if the helpers too a fs reference rather than each individual function. This way users could just do:

import fs from 'node:fs';

packageResolve(..., {fs});

Rather than (which could get unwieldy if there was multiple FS IO slots):

import {tryStatSync} from 'node:fs';

packageResolve(..., {tryStatSync});

* @param {Set<string>} conditions
* @returns {resolved: URL, format? : string}
*/
function packageResolve(specifier, base, conditions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Shouldn't conditions be part of a parameter bag? I think it's reasonable to imagine we would want to add more settings affecting the resolution down the road; making each one a separate argument would make the function difficult to use (especially when you only want to mention some of them).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the signature of the real function in https://github.com/nodejs/node/blob/3350d9610864af3219de7dd20e3ac18b3c214c52/lib/internal/modules/esm/resolve.js. We can change that, but I wasn’t thinking of that right now.


```js
function packageResolve(specifier, base, conditions, {
parsePackageName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are parsePackageName & the other functions below other helpers that you just didn't list for brievety?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted to get feedback on this design before I wrote it out for all the other helpers. These are all functions in https://github.com/nodejs/node/blob/3350d9610864af3219de7dd20e3ac18b3c214c52/lib/internal/modules/esm/resolve.js

@GeoffreyBooth
Copy link
Member Author

Fwiw my thinking for virtual filesystems is that we would be able to override node:fs the same way as the hooks.

I think the question here is whether you want to override all filesystem calls, including app code that uses fs, or only file IO that happens as a result of import and require. If the latter, then you should create your set of overrides for readFile, readFileSync and so on, and pass your new versions of those functions into the helpers like I have tryStat in this example. Then the virtual filesystem you’ve created would apply to module loading only, and an app could still access files on the real hard disk etc.

If you really want to override all file IO, you could just replace the fs module itself via a loader. When a resolve hook sees fs or node:fs or its variations, something other than the Node builtin gets returned instead. We wouldn’t need to create distinct hooks for all the filesystem methods, I don’t think (or at least, try to get it to work with the hooks we have before proposing to add more).

@cspotcode
Copy link
Contributor

It's still my understanding that the thought experiment from #89 (comment) requires composing helper functions from two different loaders into one. This design by itself doesn't seem to fill that need. So for this design to make sense, I think there needs to be some understanding that an additional feature will fill the cross-loader composition requirement.

@GeoffreyBooth
Copy link
Member Author

I think there needs to be some understanding that an additional feature will fill the cross-loader composition requirement.

Can you explain a bit what this requirement is? There’s a lot of stuff in that comment you’re linking to.

My goal with these helpers is just to reduce boilerplate. You could use them in a sorta-compositional way by having one loader override these helpers, so that when the next loader imports them they get the custom versions. So if the first loader uses resolve and load to return custom source whenever packageResolve is imported from 'module', this custom packageResolve would be what gets used by the next loader in the chain (once we get the “ambient loaders” idea working). Maybe this isn’t robust enough for your needs? I’m not sure what exactly you need, or what use case you’re trying to solve.

These helpers aren’t meant to solve all outstanding feature requests for loaders. It’s just one part of the roadmap. If there’s anything about creating these helpers that hinders any future use cases, or any way that the design of these helpers can be improved to better enable those use cases, please make specific suggestions on what to change.

@guybedford
Copy link

The API needs more work, but overall these seem like positive directions.

One approach might also be to just start simple - I don't think anyone would reject exposing isBareSpecifier. Could well be an easy way to start.

With package boundary functions, there are some questions as to whether it should directly return configuration, or if you should use path-based APIs primarily - getPackageBoundary(url) -> boundary + getPackageConfig(boundary) versus getPackageConfig(url) -> { config, boundary } sort of thing. Minor stuff but some design decisions in play.

Note that packageResolve doesn't return the path to the package in node_modules/pkg/ but instead handles the full resolution of a package including exports resolution etc etc. That's why it needs conditions. A function to resolve the path to a package wouldn't need any conditions parameter.

Perhaps having packageResolve mean what you thought it meant would work better for a helper. But that design also needs to be incorporated into how you then handle resolution within the package. A separate resolvePackageExport(subpath, packageUrl, conditions) can work. Although the paths for own-name and imports are still left out of this design flow which might make it easy for users to forget these if they are trying to write their own routines. So this gets a little harder to decompose well in a way that doesn't just expose partial functionality - I think with these pieces we need to ensure we have all the pieces in the design before exposing any one.

@cspotcode
Copy link
Contributor

cspotcode commented Jul 26, 2022

The yarn team might have some good insight on the design, since they have similar functions in their yarn PnP API.

For example, referring to #94 (comment):

Note that packageResolve ... handles the full resolution ... That's why it needs conditions. A function to resolve the path to a package wouldn't need any conditions parameter.

The latter exists in yarn's PnP API, named resolveToUnqualified. It has a well-scoped responsibility: it resolves a specifier to a package's root path, but does not attempt sub-path resolution. (IIRC, hopefully @arcanis can correct me if I'm wrong)
https://github.com/nodejs/node/blob/a506aa76a8cfb6438a0b4bc7604623bc6518a7c9/lib/internal/modules/esm/resolve.js#L774-L806

This I think clearly matches the way yarn and ts-node need to interop.

  • yarn is responsible for resolving foo/bar to /path/to/pnp/virtual/foo.zip or /monorepo/packages/foo. resolveToUnqualified does that.
  • ts-node is responsible for resolving from there to /monorepo/packages/foo/src/mod.ts based on various rules for mapping between file extensions, outDir -> rootDir, CommonJS support.

So when pnp and ts-node hooks are both active, the resolver should use:

  • yarn's implementation of resolveToUnqualified
  • ts-node's implementation of legacyMainResolve

I acknowledge that I'm jumping ahead, because I'm talking about composing hooks from two loaders. But I think it's still relevant to the design of these helpers.

@merceyz
Copy link
Member

merceyz commented Jul 26, 2022

yarn is responsible for resolving foo/bar to /path/to/pnp/virtual/foo.zip or /monorepo/packages/foo. resolveToUnqualified does that.

In this case it would resolve to /path/to/pnp/virtual/foo.zip/bar
https://yarnpkg.com/advanced/pnpapi/#resolvetounqualified

@JakobJingleheimer
Copy link
Member

Regarding augmenting functionality, could these helpers be implemented in ESM? If so, then couldn't loaders be used to customise their dependencies?

// exposed on node:module

import { whatever } from 'node:fs';

export default function packageResolve() {  }
// custom-loader.mjs

import * as td from 'testdouble';

export async function load(url, { parentURL }, nextLoad) {
  await td.replaceEsm('node:fs', { whatever: fakeWhatever });
  
  const { packageResolve } = await import('node:module');

  // …
}

function fakeWhatever() {  }

@GeoffreyBooth
Copy link
Member Author

Regarding augmenting functionality, could these helpers be implemented in ESM? If so, then couldn’t loaders be used to customise their dependencies?

At the moment, at least, all internal code must be CommonJS. As far as I understand it (and I’m sure this is a simplification), we’d have to rebuild how the internal code gets assembled to be passed into V8 at startup for it to become ESM, since it’s not like internal code uses either the CJS or ESM loaders to run.

@rafaelliu
Copy link

Is exposing the current logic as pieced helpers the right approach? I see two use cases that relate to this:

  1. Customizing the way node loads packages: involves making arbitrary changes to node's resolve/load hooks. That's fine if it's completely new logic, but doing small tweaks of existing logic is very cumbersome and this seems like is the focus of OP
  2. Leveraging some node logic to get package information: some code (likely a framework) wants to find the path of a package that node would load (or the non-deferenced symlinked path[2], or its package.json location[1], or more generally all resolved export, or the actual extension, etc). Note this is not the same as import.meta.resolve (or even require.resolve)

I can see how helpers would be a perfect fit for (2), but just exposing the logic without allowing for overriding it would still cause a lot of rewriting and maintenance for use case (1), no? A few examples:

  1. A customer loader that exposes a synthetic endpoint (e.g. "./myframework-metadata") would only need to a different packageExportsResolve, but has to re-write - almost verbatim - the functions leading to it: packageExportsResolve <- packageResolve <- moduleResolve <- defaultResolve
  2. @cspotcode 's typescript loader needs to know how to resolve to ".ts"-like extensions which means customizing finalizeResolution and legacyMainResolve, but in reality ts-node also has to basically copy the whole code changing the calls

Essentially, the deeper in the call stack the behavior trying to be customized is, he more replication there will be. A simple wrapper around the helpers would allow to change references to point to custom functions but it would be patching the internal loader and is hacky. As a compromise couldn't node expose the a DefaultLoader prototype it uses itself to instantiate a loader object used as the default loader (i.e. today's defaultResolve would call defaultLoader.resolve, and so forth)?

To a limited extent, this is what require does with the Module prototype with its setters and getters, but ESM could use actual functions and be more consistent defining everything in a prototype. A few things this would achieve:

  1. It would allow users to override specific methods of DefaultLoader down the prototype chain to instantiate their own custom loaders
  2. Users could still instantiate the DefaultLoader themselves to use node's internal logic as helpers, or any other loader in their dependencies really
  3. The actual defaultResolve instance used internally by node doesn't have to be exposed
  4. The API surface would be basically the same as helpers, just packaged differently

I'd image this is achievable through adapters and shims without much change to the current public spec. But it could be taken further later with decomposed prototypes for resolve and load, loader chaining could the use object references as nextResolve and nextLoad, there could be a way to get the an instance of the current prototype being used that would generalize import.meta.resolve into just createLoader().resolve, documented list of "stable" methods

[1] nodejs/node#33460
[2] nodejs/node#3402

@GeoffreyBooth
Copy link
Member Author

couldn’t node expose the a DefaultLoader prototype it uses itself to instantiate a loader object

No, because then the entire class becomes a public API surface and we could essentially never change any part of it without those changes being breaking. This is what happened with CommonJS monkey-patching and is why ESM is comparatively locked down.

@cspotcode
Copy link
Contributor

Well, all the internal CJS resolver functions are still private, so that's not the best example.

@rafaelliu
Copy link

Yeah, I don't think this is the same as require. CJS implementation allows for ad-hoc customization of certain behavior by using a Module prototype to get/set function references used in their private logic. I'm sure there were good reasons for it but with the benefit of hindsight, it doesn't look right.

I'd separate the concern into two parts:

  1. Exposing internal logic so users can use as helpers or selectively override on their own loaders: means that logic has to be backward compatible and thus harder to evolve and maintain
  2. Allowing change of current logic via monkey-patching: means we could have code stepping onto each other toes

The suggestion is to expose the prototype and not the instance used by node, and either can be frozen so (2) is not a problem. Customization would happen through prototype extension into a new class/prototype that would then be used in custom loaders. Everything is immutable.

As for (1), the API surface could be made to match whatever helpers were going to be created. Maintainers would choose what logic makes the "stable"/"public" cut, and the rest could be private. Granted, the more that makes to the prototype, the more useful it would be and if that's small enough there's no benefit. But it would still have the benefit of later be dialed up as see fit.

My guess is the biggest risk of having more rather than less logic as part of the prototype is for use cases of using it as helpers. If used to create custom loaders, the authors of those libraries are fairly aware they are working at a lower level, documentation could state the stability of each method (much like it does today for some features), and the library users would essentially have to enable a feature flag to use them (i.e. --loader).

The API surface for the helper use case and custom loader case doesn't even have to be the same. The custom loader prototype could be passed to a factory method that node calls to create custom loader, and only a subset of it - possibly even keeping with the named function exports idea, but still using the prototype instance behind the scenes - could be available for importing to use as helper. This would be quite a break tho.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants