noindex/nofollow on a per-page basis #4
Comments
Would it make sense to also add the nofollow URLs into the |
@vsalvino I honestly don't know. I don't know what the SEO best practices look like in 2020. I personally like the modularity of the |
Yes it would. This still is used by Google and others to see what is allowed to index. So I would definitely include this as something which should be included, either in the same feature, or in a separate feature. |
One more point I forgot to raise: the sitemap. Today we have a custom solution in place (far from perfect) to generate the sitemap based on pages not having the It means that we cannot use the Wagtail implementation to generate our sitemap (see https://docs.wagtail.io/en/v2.1.1/reference/contrib/sitemaps.html#basic-configuration). So I believe it would be important to take that into consideration when working on this issue. |
It would definitely make sense for us to provide a better sitemap, if Wagtail's is limiting. I'd be happy to review a PR @pierremanceaux if you would be willing to share your implementation? |
Hey @vsalvino, here is what we have for now. Keep in mind that this code is 4 years old and probably needs some polishing, but hopefully it helps! ;)

View

    from django.conf import settings
    from django.core.cache import cache
    from django.http import HttpResponse
    from django.views.decorators.cache import never_cache


    @never_cache
    def sitemap_view(request):
        # Serve the rendered XML from the cache, keyed per site; rebuild it on a miss.
        cache_key = 'wagtail-sitemap:' + str(request.site.id)
        sitemap_xml = cache.get(cache_key)
        if not sitemap_xml:
            sitemap = Sitemap()
            sitemap_xml = sitemap.render()
            cache.set(cache_key, sitemap_xml, getattr(settings, 'WAGTAILSITEMAPS_CACHE_TIMEOUT', 6000))
        response = HttpResponse(sitemap_xml)
        response['Content-Type'] = "text/xml; charset=utf-8"
        return response

Sitemap generation

    from django.template.loader import render_to_string
    # The import path depends on the Wagtail version (wagtail.core.models in 2.x, wagtail.models in 3.0+).
    from wagtail.core.models import Site

    # JobSinglePage is a project-specific page model, imported from the project's own app.


    class Sitemap(object):
        EXCLUDED_TYPES = [
            JobSinglePage
        ]
        template = 'sitemap.xml'

        @staticmethod
        def _get_urls():
            site = Site.objects.filter(is_default_site=True).select_related("root_page").get()
            # Keep only live, public pages whose robots meta field does not contain "noindex".
            pages_qs = site.root_page.get_descendants(
                inclusive=True
            ).live().public().exclude(basepage__seo_robot_meta__icontains="noindex").order_by('path')\
                .specific()
            for page in pages_qs.iterator():
                # TODO: replace this by filtering this in the queryset
                if type(page) in Sitemap.EXCLUDED_TYPES:
                    continue
                for url in page.get_sitemap_urls():
                    yield url

        def render(self):
            return render_to_string(self.template, {
                'urlset': self._get_urls()
            })

Template

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    {% spaceless %}
    {% for url in urlset %}
    <url>
    <loc>{{ url.location }}</loc>
    {% if url.lastmod %}<lastmod>{{ url.lastmod|date:"Y-m-d" }}</lastmod>{% endif %}
    <changefreq>weekly</changefreq>
    </url>
    {% endfor %}
    {% endspaceless %}
    </urlset>
Hi, I hope this feature will be added to next release! |
Has this been achieved? |
Hi, checking in about 4 years later, wondering if this has been implemented? Or perhaps people are using some other solution? |
This has not been a priority or a need for us. However, if someone is willing to implement it, including tests and docs, I would be willing to review and merge it. |
Opening an issue, following this #coderedcms thread.
Prerequisites
To better understand how noindex/nofollow work, this is a good starting point.
Goal
Ability to control, on a per-page basis, the presence of noindex and nofollow robot instructions.
How do I imagine it?
Off the top of my head, the most needed feature would be, first, the ability to noindex a page, in order to have better control over which parts of a website are indexed by search engines.
The most basic approach would be to have a checkbox in the "SEO" tab, like the following:
or another version for more technical users:
<meta>
" tag)
It would then output
<meta name="robots" content="noindex">
in the head of the page. That's it.
Edit: Exclude no-indexed pages from the sitemap. See this comment.
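As a rough illustration of the output described above, here is a minimal, framework-free sketch of how two per-page flags could be turned into the robots meta tag. The function name `robots_meta_tag` and its `noindex` / `nofollow` parameters are hypothetical, chosen for this example only; they are not part of Wagtail or CodeRed CMS.

```python
from html import escape


def robots_meta_tag(noindex: bool = False, nofollow: bool = False) -> str:
    """Return a robots <meta> tag, or an empty string when no directive is set.

    Hypothetical helper for illustration; the flags would come from per-page
    settings such as a checkbox in the "SEO" tab.
    """
    directives = []
    if noindex:
        directives.append("noindex")
    if nofollow:
        directives.append("nofollow")
    if not directives:
        # Nothing to restrict: emit no tag and leave the default crawler behaviour.
        return ""
    content = ", ".join(directives)
    return f'<meta name="robots" content="{escape(content)}">'


print(robots_meta_tag(noindex=True))
```

A page template could then print the returned string inside the head of the page, so pages with neither flag set render exactly as they do today.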
Suggestions to make it better
noindex directive should be applied via a tag in the <head>, or via request headers. Could be a global setting (site-wide), or local to each page; not sure what would be best.
nofollow
noindex or nofollow on the current page, and all child pages too. Useful when, for example, you have a pure SEA marketing group of pages that you want to be isolated. For instance, under www.company.com/lp/, every child page could have noindex preset, inheriting this setting from /lp/.
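The inheritance and header suggestions can be sketched without any framework as follows. This is only one possible design, under stated assumptions: `PageNode`, `effective_robots`, and `robots_headers` are hypothetical names, and a value of `None` is assumed to mean "inherit from the parent page". `X-Robots-Tag` is the HTTP response header that search engines such as Google accept as an alternative to the robots meta tag.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PageNode:
    """Hypothetical stand-in for a page in a tree; not a Wagtail model."""
    path: str
    parent: Optional["PageNode"] = None
    noindex: Optional[bool] = None   # None means "inherit from parent"
    nofollow: Optional[bool] = None  # None means "inherit from parent"


def _resolve(page: PageNode, attr: str) -> bool:
    """Walk up the tree until a page sets the flag explicitly; default to False."""
    node: Optional[PageNode] = page
    while node is not None:
        value = getattr(node, attr)
        if value is not None:
            return value
        node = node.parent
    return False


def effective_robots(page: PageNode) -> str:
    """Return the directives that apply to this page, e.g. "noindex, nofollow"."""
    names = ("noindex", "nofollow")
    return ", ".join(name for name in names if _resolve(page, name))


def robots_headers(page: PageNode) -> Dict[str, str]:
    """Return the response header to send instead of (or alongside) a meta tag."""
    value = effective_robots(page)
    return {"X-Robots-Tag": value} if value else {}


root = PageNode("/")
lp = PageNode("/lp/", parent=root, noindex=True)
offer = PageNode("/lp/offer/", parent=lp)  # inherits noindex from /lp/
print(effective_robots(offer), robots_headers(offer))
```

Using a three-state flag (set, unset, inherit) lets a child page opt back in to indexing by setting `noindex=False` explicitly, at the cost of a walk up the tree on each lookup.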