feat: preserve vfile data from unified processor #236

stefanprobst · 2022-05-29T19:48:53Z

unified plugins can store custom information on the resulting VFile on the data property. this can be used e.g. to pass a table of contents or similar.

this PR preserves the information on VFile.data.

note that this is done only for the remark pipeline, mdx-bundler seems to not expose the vfile unfortunately (although i haven't used it personally so i might be missing something)

x-ref: #216

changeset-bot · 2022-05-29T19:48:56Z

⚠️ No Changeset found

Latest commit: d7df86a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

stefanprobst · 2022-05-30T10:04:02Z

for mdx, we could maybe convert any vfile.data to a javascript export via a rehype plugin that runs as the last one in the processing pipeline (although it's probably better to just document how to do this rather than add by default):

import { valueToEstree } from "estree-util-value-to-estree";

function convertVFileDataToExport() {
  return function transformer(tree, vfile) {
    const node = {
      type: "mdxjsEsm",
      data: {
        estree: {
          type: "Program",
          sourceType: "module",
          body: [
            {
              type: "ExportNamedDeclaration",
              declaration: {
                type: "VariableDeclaration",
                declarations: [
                  {
                    type: "VariableDeclarator",
                    id: {
                      type: "Identifier",
                      name: "data",
                    },
                    init: valueToEstree(vfile.data),
                  },
                ],
                kind: "const",
              },
              specifiers: [],
              source: null,
            },
          ],
        },
      },
    };

    tree.children.push(node);
  };
}

schickling · 2022-05-31T08:49:27Z

Thanks a lot for raising this PR @stefanprobst. I'm pretty busy right now but hope to find some time soon to act on this! 🤞

wrthwhl · 2022-06-27T19:47:57Z

Hi @stefanprobst, I am really interested in this but primarily for mdx documents. I saw that the PR you raised seems to implement changes only for markdown. Wouldn't it make sense for mdx, too?
Thanks in advance!

stefanprobst · 2022-06-27T20:25:16Z

hi, unfortunately there is no way to access vfile.data when using mdx-bundler. i think for mdx, there are two options:

allow not using mdx-bundler for mdx processing, and instead allow providing a custom unified pipeline. this is what i am suggesting in [suggestion] allow mdx processing without mdx-bundler #247 and i've been using this approach personally.
while vfile.data is the only place for plugins to put additional data when processing markdown, "mdx-aware" plugins can use a different approach and add the data as a named export to the syntax tree. this is what the comment above shows and what e.g. is done here for table of contents, but it's up to individual plugins if they support this. for contentlayer users, these named exports would then be available via useMDXComponent (but needs useMDXComponent does not allow accessing named exports #246 to be fixed first)

wrthwhl · 2022-06-28T08:46:11Z

Thanks so much for taking the time to clarify and explain! Will use a custom hook as per your suggestion here for now.

schickling · 2022-07-05T10:03:24Z

Thanks a lot again for opening this PR @stefanprobst. I've just given it a more thorough look and while I agree with the problems/use cases it addresses, I think there are some open questions/problems we should address before merging:

1. Increased document size

Adding a data property to each document comes at the "cost" of increasing the document file sizes for all users (even users who might not need this feature):

A possible solution to this concern could be to introduce a configuration option to let users opt-in to this functionality.

2. Divergence of Markdown vs MDX behaviour

To what ever degree possible, I'd like to keep the behaviour of Markdown and MDX processing as similar as possible and avoid "gaps" / asymmetries in terms functionality, APIs and behaviour. Before merging this PR I'd like to better explore and implement ways to maintain the current level of consistency between the MD and MDX.

As an alternative approach I could also imagine introducing a way to let the user expose remark/rehype/MDX processing data via computedFields. I'd be curious to hear more thoughts/feedback on this as I think it might be the cleanest/simplest solution to the underlying problems.

stefanprobst · 2022-07-18T14:00:56Z

thanks for your comments @schickling! some more thoughts:

(i) increased document size

the rawDocumentData "extra data" the arrow points to in the image above comes from a unified plugin which is added by contentlayer (see here). not sure if this should actually be added by default, because unified plugins which have not specifically been written for contentlayer will not know where to find this info. the canonical way to pass information about the currently processed file to a unified pipeline is via a VFile:
```
// implicitly
processor().process({ value: mdString, path: mdFilePath })
// or explicitly
const vfile = new VFile({ value: mdString, path: mdFilePath })
processor().process(vfile)
```
other unified plugins will have been explicitly added by users, and i think their output on data should be part of the resulting document.
i do think a concern is that users may not want to send everything on body.data to the client, but rather pick specific properties. however, the even bigger issue here is not sending body.raw (e.g. the next-contentlayer example sends content as both html and raw markdown in getStaticProps) - so this would need some recommended pattern in the docs i think

(ii) divergence of markdown and mdx behavior

generally, i think some asymmetries between md and mdx are to be expected - after all, they are parsed differently
vfile.data does work for both, but unfortunately not when using @mdx-js/esbuild (which mdx-bundler wraps)
mdx can add arbitrary data as named exports to the AST, but this data will only be available once the resulting javascript has been evaled

(iii) use computed fields

i agree that using computedFields would be great!

with the above, it's kind of already possible:

const Post = defineDocumentType(() => {
  return {
    name: "Post",
    description: "Post",
    filePathPattern: `posts/**/*.mdx`,
    contentType: "mdx",
    fields: {
      /* ... */
    },
    computedFields: {
      readingTime: {
        description: "Reading time",
        type: "number",
        resolve(doc) {
          return doc["body"].data["readingTime"];
        },
      },
    },
  };
});

Note that body.data itself will be correctly typed if unified plugins use module augmentation.

i think the main thing missing is support for non-primitive types on computedFields. for example:

const Post = defineDocumentType(() => {
  return {
    name: "Post",
    description: "Post",
    filePathPattern: `posts/**/*.mdx`,
    contentType: "mdx",
    fields: {
      /* ... */
    },
    computedFields: {
      toc: {
        description: 'Table of contents',
        // FIXME: How to use types exported by plugins here?
        type: 'TableOfContents',
        resolve(doc) {
          return doc['body'].data['toc']
        },
      },
    },
  };
});

regarding the suggestion here: one thing to consider is that there can be multiple markdown/mdx fields per document (e.g. body, but also an additional frontmatter field like summary which should be parsed as md/mdx), so passing in a single vfile to resolve(doc, { vfile } may not be sufficient

stale · 2022-08-30T17:49:50Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

schickling · 2022-08-31T10:55:26Z

Still WIP.

stale · 2022-09-25T14:45:36Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

feat: preserve vfile data from unified processor

5f742b4

test: add test for passing vfile.data from unified processor

f6655d5

fix: better types

d7df86a

schickling added the meta: feedback-wanted label Jul 5, 2022

stefanprobst marked this pull request as draft August 3, 2022 09:50

stale bot added the meta: stale label Aug 30, 2022

stale bot removed the meta: stale label Aug 31, 2022

stale bot added the meta: stale label Sep 25, 2022

schickling added the meta: never-stale label Sep 25, 2022

stale bot removed the meta: stale label Sep 25, 2022

essential-randomness mentioned this pull request Nov 14, 2022

Provide a way to populate field values from Markdown/MDX processing pipeline #216

Open

stefanprobst closed this Feb 27, 2024

stefanprobst deleted the feat/vfile-data branch February 27, 2024 05:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: preserve vfile data from unified processor #236

feat: preserve vfile data from unified processor #236

stefanprobst commented May 29, 2022 •

edited

Loading

changeset-bot bot commented May 29, 2022 •

edited

Loading

stefanprobst commented May 30, 2022 •

edited

Loading

schickling commented May 31, 2022

wrthwhl commented Jun 27, 2022

stefanprobst commented Jun 27, 2022

wrthwhl commented Jun 28, 2022

schickling commented Jul 5, 2022

stefanprobst commented Jul 18, 2022 •

edited

Loading

stale bot commented Aug 30, 2022

schickling commented Aug 31, 2022

stale bot commented Sep 25, 2022

feat: preserve vfile data from unified processor #236

feat: preserve vfile data from unified processor #236

Conversation

stefanprobst commented May 29, 2022 • edited Loading

changeset-bot bot commented May 29, 2022 • edited Loading

⚠️ No Changeset found

stefanprobst commented May 30, 2022 • edited Loading

schickling commented May 31, 2022

wrthwhl commented Jun 27, 2022

stefanprobst commented Jun 27, 2022

wrthwhl commented Jun 28, 2022

schickling commented Jul 5, 2022

1. Increased document size

2. Divergence of Markdown vs MDX behaviour

stefanprobst commented Jul 18, 2022 • edited Loading

stale bot commented Aug 30, 2022

schickling commented Aug 31, 2022

stale bot commented Sep 25, 2022

stefanprobst commented May 29, 2022 •

edited

Loading

changeset-bot bot commented May 29, 2022 •

edited

Loading

stefanprobst commented May 30, 2022 •

edited

Loading

stefanprobst commented Jul 18, 2022 •

edited

Loading