Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: preserve vfile data from unified processor #236

Closed

Conversation

stefanprobst
Copy link
Contributor

@stefanprobst stefanprobst commented May 29, 2022

unified plugins can store custom information on the resulting VFile on the data property. this can be used e.g. to pass a table of contents or similar.

this PR preserves the information on VFile.data.

note that this is done only for the remark pipeline, mdx-bundler seems to not expose the vfile unfortunately (although i haven't used it personally so i might be missing something)

x-ref: #216

@changeset-bot
Copy link

changeset-bot bot commented May 29, 2022

⚠️ No Changeset found

Latest commit: d7df86a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@stefanprobst
Copy link
Contributor Author

stefanprobst commented May 30, 2022

for mdx, we could maybe convert any vfile.data to a javascript export via a rehype plugin that runs as the last one in the processing pipeline (although it's probably better to just document how to do this rather than add by default):

import { valueToEstree } from "estree-util-value-to-estree";

function convertVFileDataToExport() {
  return function transformer(tree, vfile) {
    const node = {
      type: "mdxjsEsm",
      data: {
        estree: {
          type: "Program",
          sourceType: "module",
          body: [
            {
              type: "ExportNamedDeclaration",
              declaration: {
                type: "VariableDeclaration",
                declarations: [
                  {
                    type: "VariableDeclarator",
                    id: {
                      type: "Identifier",
                      name: "data",
                    },
                    init: valueToEstree(vfile.data),
                  },
                ],
                kind: "const",
              },
              specifiers: [],
              source: null,
            },
          ],
        },
      },
    };

    tree.children.push(node);
  };
}

@schickling
Copy link
Collaborator

Thanks a lot for raising this PR @stefanprobst. I'm pretty busy right now but hope to find some time soon to act on this! 🤞

@wrthwhl
Copy link

wrthwhl commented Jun 27, 2022

Hi @stefanprobst, I am really interested in this but primarily for mdx documents. I saw that the PR you raised seems to implement changes only for markdown. Wouldn't it make sense for mdx, too?
Thanks in advance!

@stefanprobst
Copy link
Contributor Author

hi, unfortunately there is no way to access vfile.data when using mdx-bundler. i think for mdx, there are two options:

@wrthwhl
Copy link

wrthwhl commented Jun 28, 2022

Thanks so much for taking the time to clarify and explain! Will use a custom hook as per your suggestion here for now.

@schickling
Copy link
Collaborator

Thanks a lot again for opening this PR @stefanprobst. I've just given it a more thorough look and while I agree with the problems/use cases it addresses, I think there are some open questions/problems we should address before merging:

1. Increased document size

Adding a data property to each document comes at the "cost" of increasing the document file sizes for all users (even users who might not need this feature):

CleanShot 2022-07-05 at 11 36 31@2x

A possible solution to this concern could be to introduce a configuration option to let users opt-in to this functionality.

2. Divergence of Markdown vs MDX behaviour

To what ever degree possible, I'd like to keep the behaviour of Markdown and MDX processing as similar as possible and avoid "gaps" / asymmetries in terms functionality, APIs and behaviour. Before merging this PR I'd like to better explore and implement ways to maintain the current level of consistency between the MD and MDX.


As an alternative approach I could also imagine introducing a way to let the user expose remark/rehype/MDX processing data via computedFields. I'd be curious to hear more thoughts/feedback on this as I think it might be the cleanest/simplest solution to the underlying problems.

@stefanprobst
Copy link
Contributor Author

stefanprobst commented Jul 18, 2022

thanks for your comments @schickling! some more thoughts:

(i) increased document size

  • the rawDocumentData "extra data" the arrow points to in the image above comes from a unified plugin which is added by contentlayer (see here). not sure if this should actually be added by default, because unified plugins which have not specifically been written for contentlayer will not know where to find this info. the canonical way to pass information about the currently processed file to a unified pipeline is via a VFile:

    // implicitly
    processor().process({ value: mdString, path: mdFilePath })
    // or explicitly
    const vfile = new VFile({ value: mdString, path: mdFilePath })
    processor().process(vfile)
  • other unified plugins will have been explicitly added by users, and i think their output on data should be part of the resulting document.

  • i do think a concern is that users may not want to send everything on body.data to the client, but rather pick specific properties. however, the even bigger issue here is not sending body.raw (e.g. the next-contentlayer example sends content as both html and raw markdown in getStaticProps) - so this would need some recommended pattern in the docs i think

(ii) divergence of markdown and mdx behavior

  • generally, i think some asymmetries between md and mdx are to be expected - after all, they are parsed differently

  • vfile.data does work for both, but unfortunately not when using @mdx-js/esbuild (which mdx-bundler wraps)

  • mdx can add arbitrary data as named exports to the AST, but this data will only be available once the resulting javascript has been evaled

(iii) use computed fields

  • i agree that using computedFields would be great!

  • with the above, it's kind of already possible:

    const Post = defineDocumentType(() => {
      return {
        name: "Post",
        description: "Post",
        filePathPattern: `posts/**/*.mdx`,
        contentType: "mdx",
        fields: {
          /* ... */
        },
        computedFields: {
          readingTime: {
            description: "Reading time",
            type: "number",
            resolve(doc) {
              return doc["body"].data["readingTime"];
            },
          },
        },
      };
    });

    Note that body.data itself will be correctly typed if unified plugins use module augmentation.

  • i think the main thing missing is support for non-primitive types on computedFields. for example:

    const Post = defineDocumentType(() => {
      return {
        name: "Post",
        description: "Post",
        filePathPattern: `posts/**/*.mdx`,
        contentType: "mdx",
        fields: {
          /* ... */
        },
        computedFields: {
          toc: {
            description: 'Table of contents',
            // FIXME: How to use types exported by plugins here?
            type: 'TableOfContents',
            resolve(doc) {
              return doc['body'].data['toc']
            },
          },
        },
      };
    });
  • regarding the suggestion here: one thing to consider is that there can be multiple markdown/mdx fields per document (e.g. body, but also an additional frontmatter field like summary which should be parsed as md/mdx), so passing in a single vfile to resolve(doc, { vfile } may not be sufficient

@stefanprobst stefanprobst marked this pull request as draft August 3, 2022 09:50
@stale
Copy link

stale bot commented Aug 30, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the meta: stale label Aug 30, 2022
@schickling
Copy link
Collaborator

Still WIP.

@stale stale bot removed the meta: stale label Aug 31, 2022
@stale
Copy link

stale bot commented Sep 25, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants