noindex/nofollow on a per-page basis #4

Open · 2 tasks
pierremanceaux opened this issue Oct 1, 2020 · 10 comments

Comments

pierremanceaux commented Oct 1, 2020

Opening an issue, following this #coderedcms thread.

Prerequisites

To better understand how noindex/nofollow work, this is a good starting point.

Goal

Ability to control on a per-page basis the presence of noindex and nofollow robot instructions.

How do I imagine it?

Off the top of my head, the most needed feature would be the ability to noindex a page, in order to have better control over which parts of a website get indexed by search engines.

The most basic approach would be to have a checkbox in the "SEO" tab, like the following:

  • Exclude this page from being indexed by search engines

or another version for more technical users:

  • "noindex" this page (using "<meta>" tag)

It would then output <meta name="robots" content="noindex"> in the head of the page. That's it.
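
For illustration, a minimal sketch of what that checkbox could look like on a Wagtail page model; the `seo_noindex` field and the mixin name are hypothetical, not an existing coderedcms API:

```python
from django.db import models

from wagtail.admin.edit_handlers import FieldPanel  # wagtail.admin.panels in newer versions
from wagtail.core.models import Page


class NoindexPageMixin(models.Model):
    """Hypothetical mixin adding a per-page noindex checkbox to the Promote/SEO tab."""

    seo_noindex = models.BooleanField(
        default=False,
        verbose_name="Exclude this page from being indexed by search engines",
    )

    promote_panels = Page.promote_panels + [FieldPanel("seo_noindex")]

    class Meta:
        abstract = True
```

and in the `<head>` of the base page template:

```
{% if page.seo_noindex %}<meta name="robots" content="noindex">{% endif %}
```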

Edit: Exclude no-indexed pages from the sitemap. See this comment.

Suggestions to make it better

  • Let the user choose whether the noindex directive should be applied via a <meta> tag in the <head>, or via a response header (X-Robots-Tag). Could be a global (site-wide) setting, or local to each page; not sure what would be best. (See the sketch after this list.)
  • Add a similar checkbox to enable nofollow
  • Apply noindex or nofollow to the current page, and all child pages too. Useful when, for example, you have a pure SEA marketing group of pages that you want to keep isolated. For instance www.company.com/lp/, where every child page could have noindex preset, inheriting the setting from /lp/.
  • A way to control which bots to target (tricky, and quite advanced I guess).
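
A rough sketch of the header-based variant, combined with the inheritance idea: a mixin that walks the page's ancestors and emits an `X-Robots-Tag` response header. Again, `seo_noindex`/`seo_nofollow` are hypothetical field names, and this is only one possible shape for the feature:

```python
class XRobotsTagMixin:
    """Sketch: apply noindex/nofollow via the X-Robots-Tag response header.

    Use as e.g. `class LandingPage(XRobotsTagMixin, Page)`.
    """

    def robots_directives(self):
        # Inherit the flags from any ancestor, so a whole subtree such as
        # /lp/ can be excluded by setting the flag once on its root page.
        pages = self.get_ancestors(inclusive=True).specific()
        directives = []
        if any(getattr(p, "seo_noindex", False) for p in pages):
            directives.append("noindex")
        if any(getattr(p, "seo_nofollow", False) for p in pages):
            directives.append("nofollow")
        return directives

    def serve(self, request, *args, **kwargs):
        response = super().serve(request, *args, **kwargs)
        directives = self.robots_directives()
        if directives:
            response["X-Robots-Tag"] = ", ".join(directives)
        return response
```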
vsalvino (Contributor) commented Oct 1, 2020

Would it make sense to also add the nofollow URLs into the robots.txt?

pierremanceaux (Author) commented Oct 1, 2020

@vsalvino I honestly don't know. I don't know what the SEO best practices look like in 2020. I personally like the modularity of the meta tags in the head; it also makes things simple to debug, since you see everything in the page source. I don't imagine myself checking a robots.txt file to know what's excluded. An SEO expert could help you on this, I'm not the right person :)

moojen commented Oct 7, 2020

> Would it make sense to also add the nofollow URLs into the robots.txt?

Yes it would. Robots.txt is still used by Google and others to see what is allowed to be indexed. So I would definitely include this, either as part of the same feature, or as a separate feature.
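
For reference, a robots.txt along those lines might look like the following (the paths are only examples); worth noting that a robots.txt `Disallow` blocks crawling, which is related to but not quite the same thing as a `noindex` directive:

```
User-agent: *
Disallow: /admin/
Disallow: /lp/
```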

pierremanceaux (Author) commented

One more point I forgot to raise: the sitemap. Today we have a custom solution in place (far from perfect) to generate the sitemap based on pages that don't have the noindex flag. It is apparently bad practice to have pages in the sitemap that should not be indexed.

It means that we cannot use the Wagtail implementation to generate our sitemap (see https://docs.wagtail.io/en/v2.1.1/reference/contrib/sitemaps.html#basic-configuration). So I believe it would be important to take that into consideration when working on this issue.
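
One possible way to keep the stock `wagtail.contrib.sitemaps` view usable despite this: the view builds its URL set by calling `get_sitemap_urls()` on each page, so a mixin that returns an empty list for no-indexed pages would exclude them. A minimal sketch, again assuming a hypothetical `seo_noindex` flag:

```python
class NoindexSitemapMixin:
    """Sketch: hide no-indexed pages from wagtail.contrib.sitemaps."""

    def get_sitemap_urls(self, *args, **kwargs):
        # Returning an empty list drops this page from the generated sitemap.
        if getattr(self, "seo_noindex", False):
            return []
        return super().get_sitemap_urls(*args, **kwargs)
```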

vsalvino (Contributor) commented

It would definitely make sense for us to provide a better sitemap if Wagtail's is limiting. I'd be happy to review a PR, @pierremanceaux, if you're willing to share your implementation.

pierremanceaux (Author) commented

Hey @vsalvino, here is what we have for now. Keep in mind that this code is 4 years old and probably needs some polishing, but hopefully it helps! ;)

View

```python
from django.conf import settings
from django.core.cache import cache
from django.http import HttpResponse
from django.views.decorators.cache import never_cache


@never_cache
def sitemap_view(request):
    # The rendered XML is cached server-side; @never_cache only stops
    # browsers and proxies from keeping a stale copy.
    cache_key = 'wagtail-sitemap:' + str(request.site.id)
    sitemap_xml = cache.get(cache_key)

    if not sitemap_xml:
        sitemap_xml = Sitemap().render()
        cache.set(cache_key, sitemap_xml,
                  getattr(settings, 'WAGTAILSITEMAPS_CACHE_TIMEOUT', 6000))

    return HttpResponse(sitemap_xml, content_type="text/xml; charset=utf-8")
```

Sitemap generation

```python
from django.template.loader import render_to_string
from wagtail.core.models import Site

# JobSinglePage is a project-specific page model; import it from
# wherever it lives in your project.


class Sitemap(object):
    # Page types that should never appear in the sitemap.
    EXCLUDED_TYPES = [
        JobSinglePage,
    ]
    template = 'sitemap.xml'

    @staticmethod
    def _get_urls():
        site = Site.objects.filter(is_default_site=True).select_related("root_page").get()
        # Live, public descendants of the root page that are not flagged
        # noindex (seo_robot_meta is a project-specific field).
        pages_qs = (
            site.root_page.get_descendants(inclusive=True)
            .live()
            .public()
            .exclude(basepage__seo_robot_meta__icontains="noindex")
            .order_by('path')
            .specific()
        )

        for page in pages_qs.iterator():
            # TODO: filter in the queryset instead (e.g. with PageQuerySet.not_type())
            if type(page) in Sitemap.EXCLUDED_TYPES:
                continue
            for url in page.get_sitemap_urls():
                yield url

    def render(self):
        return render_to_string(self.template, {
            'urlset': self._get_urls(),
        })
```

Template

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{% spaceless %}
{% for url in urlset %}
  <url>
    <loc>{{ url.location }}</loc>
    {% if url.lastmod %}<lastmod>{{ url.lastmod|date:"Y-m-d" }}</lastmod>{% endif %}
    <changefreq>weekly</changefreq>
  </url>
{% endfor %}
{% endspaceless %}
</urlset>
```
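
For completeness, wiring the view up would look roughly like this, assuming it lives in a module such as `sitemaps.views` (the import path is illustrative):

```python
from django.urls import path

from sitemaps.views import sitemap_view

urlpatterns = [
    path("sitemap.xml", sitemap_view, name="sitemap"),
]
```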

anefta commented Nov 30, 2020

Hi,
per-page noindex and nofollow is super critical in SEO nowadays. SEOs mostly use robots.txt to block admin areas and to disallow bots, not individual pages.

I hope this feature will be added to the next release!

benlamptey-gocity commented

Has this been achieved?

gideonaa commented

Hi, checking in about 4 years later, wondering if this has been implemented? Or perhaps people are using some other solution?

vsalvino (Contributor) commented

This has not been a priority or a need for us. However, if someone is willing to implement it, including tests and docs, I would be willing to review and merge it.
