Sitemap Generation #1232

Open
thescientist13 opened this issue May 18, 2024 · 5 comments · Fixed by #1240 · May be fixed by #1260
Labels: alpha.3 · CLI · documentation (Greenwood specific docs) · feature (New feature or request) · v0.30.0

Comments

@thescientist13
Member

thescientist13 commented May 18, 2024

Summary

Called out in our Slack channel, but Greenwood should definitely have some support for sitemaps, which are XML files used to tell search engines about the content and pages contained within a site, particularly for larger sites and/or sites where links between pages may not be as consistent.
https://developers.google.com/search/docs/crawling-indexing/sitemaps/overview

A sitemap tells search engines which pages and files you think are important in your site, and also provides valuable information about these files. For example, when the page was last updated and any alternate language versions of the page.

Here is a basic example:
https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/foo.html</loc>
    <lastmod>2022-06-04</lastmod>
  </url>
</urlset>
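The sitemap protocol requires `<loc>` values to be entity-escaped (for example `&` becomes `&amp;`), which starts to matter once routes contain query strings. A minimal sketch of hypothetical `escapeXml` / `toUrlEntry` helpers, assuming the caller passes an absolute URL and an optional `YYYY-MM-DD` date:

```javascript
// Hypothetical helpers (not part of Greenwood): build one <url> entry.
// The sitemap protocol requires &, <, >, ' and " to be entity-escaped.
const XML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  "'": '&apos;',
  '"': '&quot;'
};

function escapeXml(value) {
  return String(value).replace(/[&<>'"]/g, (char) => XML_ENTITIES[char]);
}

function toUrlEntry(loc, lastmod) {
  // <lastmod> is optional, so only emit it when a date is given
  const lastmodTag = lastmod ? `<lastmod>${escapeXml(lastmod)}</lastmod>` : '';

  return `<url><loc>${escapeXml(loc)}</loc>${lastmodTag}</url>`;
}

console.log(toUrlEntry('https://www.example.com/search?q=a&b=c', '2022-06-04'));
// <url><loc>https://www.example.com/search?q=a&amp;b=c</loc><lastmod>2022-06-04</lastmod></url>
```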

Details

I think the approach used in Next.js is probably good enough for Greenwood, supporting either of these options:

  1. ✅ Static File, e.g. sitemap.xml - will be copied automatically to the output
  2. Dynamic File, e.g. sitemap.xml.js - will be provided a copy of the Greenwood graph and be expected to return valid XML
    // src/sitemap.xml.js (proposed API)
    export async function sitemap(compilation) {
      const urls = compilation.graph.map((page) => {
        return `
          <url>
            <loc>http://www.example.com${page.route}</loc>
          </url>
        `;
      });

      // join the entries (interpolating an array directly inserts commas)
      // and keep the XML declaration at the very start of the output
      return `<?xml version="1.0" encoding="UTF-8"?>
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
          ${urls.join('')}
        </urlset>
      `;
    }
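One detail worth pinning down for the dynamic option is how absolute URLs are formed, since concatenating a base URL and a route can produce double slashes. A sketch using the WHATWG `URL` constructor, assuming graph entries expose a `route` string and that `baseUrl` is a hypothetical setting (not an existing Greenwood option):

```javascript
// Sketch: build sitemap XML from a Greenwood-style graph.
// Only `page.route` is used; `baseUrl` is a hypothetical parameter.
function buildSitemap(graph, baseUrl) {
  const urls = graph
    // new URL() resolves each route against the base, avoiding "//" joins
    .map((page) => `  <url><loc>${new URL(page.route, baseUrl).href}</loc></url>`)
    .join('\n');

  // The XML declaration must be the very first thing in the document
  return `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${urls}
</urlset>
`;
}

console.log(buildSitemap([{ route: '/' }, { route: '/blog/' }], 'https://www.example.com/'));
```

Because routes start with `/`, they resolve against the origin only, so any path segment in `baseUrl` (e.g. a site hosted under `/docs/`) would be dropped; a prefix-aware join would be needed in that case.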

Might want to wait until after #955 is merged, since we might want to piggyback off any solutions there re: extending the ability for pages to be more than just markdown (.md) or JavaScript (.js).

thescientist13 added the documentation (Greenwood specific docs), CLI, and feature (New feature or request) labels May 18, 2024
thescientist13 added this to the 1.0 milestone May 18, 2024
@thescientist13
Member Author

For now, a couple of ways to implement this manually would be to:

  1. Create and maintain a src/sitemap.xml and use a copy plugin to put it into the output directory
  2. After the Greenwood build step, read the contents of graph.json in the output directory and generate the file from it

@jstockdi

For 2, would it be a copy plugin? i.e., the plugin would generate a temporary file, then pass:

     {
        from: tempPath,
        to: new URL(`sitemap.xml`, outputDir)
     }
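Since the `to` target is resolved with `new URL()`, note that relative resolution depends on whether the base URL ends in a slash. Greenwood's `outputDir` is presumably a directory URL ending in `/`; this sketch, with hypothetical paths, shows what happens when a base does not:

```javascript
// WHATWG URL resolution replaces the last path segment of the base
// unless the base ends with "/".
const withSlash = new URL('sitemap.xml', 'file:///project/public/');
const withoutSlash = new URL('sitemap.xml', 'file:///project/public');

console.log(withSlash.href);    // file:///project/public/sitemap.xml
console.log(withoutSlash.href); // file:///project/sitemap.xml
```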

@thescientist13
Member Author

thescientist13 commented May 19, 2024

@jstockdi
Greenwood should automatically generate a graph.json file for you, which will be available in the output directory after running greenwood build (technically it's also there during development, in the .greenwood/ tmp folder).

So after running greenwood build, a simple Node script should suffice:

// sitemap-gen.js
import fs from 'fs';
import graph from './public/graph.json' with { type: 'json' };

const urls = graph.map((page) => {
  return `
    <url>
      <loc>http://www.example.com${page.route}</loc>
    </url>
  `;
}).join('\n');

// the XML declaration must be the very first thing in the file,
// so the template starts with it (no leading newline or indentation)
fs.writeFileSync('./public/sitemap.xml', `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  ${urls}
</urlset>
`);
# after running Greenwood build, or add to your npm scripts...
$ node sitemap-gen.js
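The `with { type: 'json' }` import attribute syntax is only available in newer Node.js releases. A hedged alternative for the same script reads graph.json with `fs` instead; the `./public/graph.json` default is taken from the script above and may differ if your output directory is configured differently. `readGraph` is a hypothetical helper name:

```javascript
import fs from 'node:fs';

// Alternative to the JSON import attribute: read and parse graph.json manually.
function readGraph(graphPath = './public/graph.json') {
  // graph.json is written by `greenwood build`; fail clearly if it is missing
  if (!fs.existsSync(graphPath)) {
    throw new Error(`No graph found at ${graphPath}; run greenwood build first`);
  }

  return JSON.parse(fs.readFileSync(graphPath, 'utf-8'));
}
```

The rest of sitemap-gen.js can then use `const graph = readGraph();` unchanged.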

Edit: sorry, I think you were referencing option 1, in which case, yes, a copy plugin would do the trick, e.g.:

function myCopySitemapPlugin() {
  return {
    type: 'copy',
    name: 'plugin-copy-sitemap',
    provider: (compilation) => {
      const filename = 'sitemap.xml';
      const { userWorkspace, outputDir } = compilation.context;

      // use backticks so that ${filename} is actually interpolated
      return [{
        from: new URL(`./${filename}`, userWorkspace),
        to: new URL(`./${filename}`, outputDir)
      }];
    }
  };
}
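If src/sitemap.xml might not exist in every project, a variant of the plugin above could return no assets in that case instead of pointing the copy at a missing file. A sketch, assuming `userWorkspace` and `outputDir` are `file:` directory URLs as in the example above:

```javascript
import fs from 'node:fs';

// Variant of the copy plugin that only copies sitemap.xml when it exists.
function myCopySitemapPlugin() {
  return {
    type: 'copy',
    name: 'plugin-copy-sitemap',
    provider: (compilation) => {
      const filename = 'sitemap.xml';
      const { userWorkspace, outputDir } = compilation.context;
      const sitemapUrl = new URL(`./${filename}`, userWorkspace);

      // fs.existsSync accepts file: URL objects
      if (!fs.existsSync(sitemapUrl)) {
        return [];
      }

      return [{
        from: sitemapUrl,
        to: new URL(`./${filename}`, outputDir)
      }];
    }
  };
}
```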

Otherwise, to generate dynamically for now, the above script sample should also work. 🎯

@jstockdi

jstockdi commented May 20, 2024

Actually, I was thinking of using a copy plugin...

Read the graph, write the generated file to the scratch directory, then copy it to the final output directory.

import fs from 'fs';

const greenwoodPluginSitemap = [{
  type: 'copy',
  name: 'plugin-copy-sitemap',
  provider: async (compilation) => {
    const fileName = 'sitemap.xml';
    const { outputDir, scratchDir } = compilation.context;

    // the graph is available on the compilation object
    const urls = compilation.graph.map((page) => {
      return `
        <url>
          <loc>http://www.example.com${page.route}</loc>
        </url>
      `;
    }).join('\n');

    // write the generated sitemap to the scratch directory first...
    // (the XML declaration must be the very first thing in the file)
    const sitemapFromUrl = new URL(`./${fileName}`, scratchDir);
    fs.writeFileSync(sitemapFromUrl, `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  ${urls}
</urlset>
`);

    // ...then let the copy lifecycle move it into the output directory
    return [{
      from: sitemapFromUrl,
      to: new URL(`./${fileName}`, outputDir)
    }];
  }
}];
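One thing to watch with this approach is that `fs.writeFileSync` throws if the scratch directory does not exist yet when the provider runs. A defensive sketch of a hypothetical `writeScratchFile` helper that creates the directory first and returns the URL to hand to the copy lifecycle:

```javascript
import fs from 'node:fs';

// Hypothetical helper: write generated content into a scratch directory URL,
// creating the directory (and any parents) if needed.
function writeScratchFile(scratchDir, fileName, contents) {
  fs.mkdirSync(scratchDir, { recursive: true });

  const fileUrl = new URL(`./${fileName}`, scratchDir);
  fs.writeFileSync(fileUrl, contents);

  return fileUrl;
}
```

Inside the provider, `const sitemapFromUrl = writeScratchFile(scratchDir, fileName, xml);` would then replace the direct `fs.writeFileSync` call.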

@thescientist13
Member Author

thescientist13 commented May 26, 2024

So, for the two different options here, from a contributing perspective, here are my initial thoughts:

Static Sitemap

For a static sitemap in the root workspace folder, e.g. src/sitemap.xml, it should be as simple as following one of the existing "copy"-based features / plugins, like our robots.txt plugin:
https://github.com/ProjectEvergreen/greenwood/blob/master/packages/cli/src/plugins/copy/plugin-copy-robots.js

Dynamic Sitemap

As for supporting a dynamic flavor of this, e.g. src/sitemap.xml.js, I'm not sure off the top of my head what the best way to instrument it would be, mainly because the development and production workflows are slightly different.

For development, we could make a resource plugin with a serve lifecycle that checks in shouldServe whether the dynamic flavor exists, and then the serve function would be something like this?

// lifecycle methods of a resource plugin, where this.compilation is available
async function shouldServe(url) {
  return url.pathname.endsWith('sitemap.xml.js');
}

async function serve(url) {
  // await import() resolves directly to the module namespace object
  const { generateSitemap } = await import(url);
  const sitemap = await generateSitemap(this.compilation);

  return new Response(sitemap, { headers: { 'Content-Type': 'text/xml' } });
}
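A variant closer to the prose above would match the browser's request for /sitemap.xml and only serve it when src/sitemap.xml.js exists. This sketch is shaped like a Greenwood resource plugin (`this.compilation`, `shouldServe` / `serve`), but the class name is hypothetical, it does not extend Greenwood's actual resource interface, and the `generateSitemap` export name follows this discussion rather than a confirmed API:

```javascript
import fs from 'node:fs';

// Hypothetical dev-server resource for a dynamic sitemap.
class DynamicSitemapResource {
  constructor(compilation) {
    this.compilation = compilation;
    this.sitemapModuleUrl = new URL('./sitemap.xml.js', compilation.context.userWorkspace);
  }

  // Handle /sitemap.xml only when the user defined src/sitemap.xml.js
  async shouldServe(url) {
    return url.pathname === '/sitemap.xml' && fs.existsSync(this.sitemapModuleUrl);
  }

  async serve() {
    const { generateSitemap } = await import(this.sitemapModuleUrl.href);
    const sitemap = await generateSitemap(this.compilation);

    return new Response(sitemap, { headers: { 'Content-Type': 'text/xml' } });
  }
}
```

Node caches imported ES modules, so edits to sitemap.xml.js during development may need a cache-busting query string on the import URL to show up without a restart.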

For production, we could probably run similar logic to serve during the bundle command, except writing out a file instead of returning a Response object.
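A sketch of that production step, assuming the same hypothetical `generateSitemap` export and that `userWorkspace` / `outputDir` are `file:` directory URLs; the function name and where it would be called from within the bundle command are assumptions:

```javascript
import fs from 'node:fs';

// Run the user's sitemap.xml.js at build time and write the result to
// <outputDir>/sitemap.xml instead of returning a Response.
async function bundleDynamicSitemap(compilation) {
  const { userWorkspace, outputDir } = compilation.context;
  const moduleUrl = new URL('./sitemap.xml.js', userWorkspace);

  if (!fs.existsSync(moduleUrl)) {
    return null; // no dynamic sitemap defined, nothing to do
  }

  const { generateSitemap } = await import(moduleUrl.href);
  const sitemap = await generateSitemap(compilation);
  const outputUrl = new URL('./sitemap.xml', outputDir);

  await fs.promises.mkdir(outputDir, { recursive: true });
  await fs.promises.writeFile(outputUrl, sitemap);

  return outputUrl;
}
```

Sharing one `generateSitemap` contract between the dev server and the build keeps the two workflows producing the same XML.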

Testing

Greenwood tests are basically black-box tests: you can create an exact version of any Greenwood project + config, run the CLI, and just test the output; in either case, that a sitemap.xml file is generated in the output folder.
https://github.com/ProjectEvergreen/greenwood/tree/master/packages/cli/test/cases

We would probably want one test case each for static and dynamic sitemaps.

Documentation

I think for now the best place to document these would probably be the Styles and Assets page.

jstockdi added a commit to jstockdi/greenwood that referenced this issue Jun 5, 2024
thescientist13 linked a pull request Jun 5, 2024 that will close this issue
jstockdi added a commit to jstockdi/greenwood that referenced this issue Jun 14, 2024
jstockdi added a commit to jstockdi/greenwood that referenced this issue Jun 14, 2024
thescientist13 mentioned this issue Jun 22, 2024
thescientist13 pushed a commit to jstockdi/greenwood that referenced this issue Jun 22, 2024
thescientist13 added a commit that referenced this issue Jun 22, 2024
* #1232 - Adding default static sitemap plugin

* 1232 - Adding meta files documentation

* update meta files docs and usage examples

---------

Co-authored-by: Owen Buckley <[email protected]>
thescientist13 linked a pull request Jun 22, 2024 that will close this issue
jstockdi added a commit to jstockdi/greenwood that referenced this issue Jul 21, 2024
jstockdi linked a pull request Jul 21, 2024 that will close this issue
jstockdi added a commit to jstockdi/greenwood that referenced this issue Jul 21, 2024
jstockdi added a commit to jstockdi/greenwood that referenced this issue Jul 21, 2024
jstockdi added a commit to jstockdi/greenwood that referenced this issue Jul 21, 2024
jstockdi added a commit to jstockdi/greenwood that referenced this issue Jul 22, 2024
jstockdi added a commit to jstockdi/greenwood that referenced this issue Jul 22, 2024
thescientist13 linked a pull request Jul 23, 2024 that will close this issue
jstockdi added a commit to jstockdi/greenwood that referenced this issue Jul 27, 2024
jstockdi added a commit to jstockdi/greenwood that referenced this issue Jul 27, 2024
jstockdi added a commit to jstockdi/greenwood that referenced this issue Jul 27, 2024