SitemapGenerator Usage
Please add your name, your site, and the number of links in your sitemap and, if you feel like it, a small snippet of cool code showing how SitemapGenerator made your life easier.
Sambit Behera, bookprice.co, ~280M links
Produced sitemaps with more than 280M links in a matter of 4.3 hours by running the sitemap generation in parallel (parallel gem) over 4 cores.
....
Sitemap stats: 35,622,979 links / 713 sitemaps / 129m58s
Sitemap stats: 35,622,979 links / 713 sitemaps / 130m04s
Sitemap stats: 35,622,979 links / 713 sitemaps / 130m11s
Sitemap stats: 35,622,979 links / 713 sitemaps / 131m18s
....
15460.43 real 39030.90 user 14347.09 sys
Parallel.each(domains, :in_processes => 4) do |domain|
  SitemapGenerator::Sitemap.default_host = "http://#{domain}"
  SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/#{domain}"
  SitemapGenerator::Sitemap.adapter = SitemapGenerator::FileAdapter.new
  SitemapGenerator::Sitemap.create do
    add '/', changefreq: 'monthly', priority: 1.0
    add '/signup', changefreq: 'monthly', priority: 0.8
    add '/login', changefreq: 'monthly', priority: 0.8
    add '/about', changefreq: 'monthly', priority: 0.8
    add '/contact', changefreq: 'monthly', priority: 0.8
    add '/faq', changefreq: 'monthly', priority: 0.8
    add '/careers', changefreq: 'monthly', priority: 0.8
    add '/privacy', changefreq: 'monthly', priority: 0.8
    add '/terms', changefreq: 'monthly', priority: 0.8
    add '/password_resets/new', changefreq: 'monthly', priority: 0.64
    ...
  end
end
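As a rough check on the figures above, the `time` output can be converted into wall-clock hours and an average number of busy cores, and a per-process link count into a minimum number of sitemap files. This is only a back-of-the-envelope sketch: it assumes the sitemap protocol's limit of 50,000 URLs per file, and the method names are made up for illustration.

```ruby
# Assumption: the sitemap protocol's limit of 50,000 URLs per sitemap file.
def minimum_sitemap_files(links, max_links_per_file: 50_000)
  (links / max_links_per_file.to_f).ceil
end

# Average number of busy cores: CPU seconds used per wall-clock second.
def average_busy_cores(real:, user:, sys:)
  (user + sys) / real
end

# Figures copied from the output above.
hours = 15_460.43 / 3600
cores = average_busy_cores(real: 15_460.43, user: 39_030.90, sys: 14_347.09)
files = minimum_sitemap_files(35_622_979)

puts format('%.2f hours wall clock', hours)
puts format('%.2f cores busy on average', cores)
puts "#{files} sitemap files at minimum"
```

The file count works out to 713, which matches the number of sitemaps reported in the stats lines.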
Andrew Cetinick, www.sherpi.com, 233,939 links, 4m40s
Sitemap stats: 233,939 links / 5 sitemaps / 4m40s
A Rake task on Heroku pushes the sitemaps to an S3 bucket. I also added this route to my Rails app so that requests for the sitemaps are redirected to S3:
get '/sitemaps/:filename.xml.gz' => 'pages#sitemap'
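The `pages#sitemap` action itself is not shown. A minimal sketch of the URL-building part might look like the following, assuming a hypothetical bucket name and a `sitemaps/` key prefix (neither comes from the original entry). The helper name is also made up.

```ruby
# Hypothetical helper: maps the :filename segment captured by the route
# (e.g. "sitemap1" for /sitemaps/sitemap1.xml.gz) to a public S3 URL.
# The bucket name and key prefix are placeholders, not the author's values.
def sitemap_s3_url(filename, bucket: 'example-sitemaps-bucket', prefix: 'sitemaps')
  # Accept only plain file names so a request cannot reach other keys.
  unless filename.match?(/\A[A-Za-z0-9_-]+\z/)
    raise ArgumentError, "unexpected sitemap name: #{filename.inspect}"
  end

  "https://#{bucket}.s3.amazonaws.com/#{prefix}/#{filename}.xml.gz"
end

# In the controller this might be used roughly as:
#   def sitemap
#     redirect_to sitemap_s3_url(params[:filename])
#   end
puts sitemap_s3_url('sitemap1')
```

Depending on the Rails version and its open-redirect settings, `redirect_to` may also need `allow_other_host: true` for a URL on another domain.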
Adam Salter, www.answermyoffice.com, 72,956 links, 2m03s
Zipcode.find(:all, :include => :city).each do |z|
  sitemap.add zipcode_path(:state => z.city.state, :city => z.city, :zipcode => z)
end
Rob Biedenharn, stylepath.com, Sitemap stats: 4,684,358 links, 6h21m31s
Category.find_in_order.each do |category|
  sitemap.add category_page_path(category), :changefreq => 'daily', :priority => 0.6
  Product.interesting_from_category(category.id, 0, nil, true).each do |product|
    sitemap.add details_id_path(product), :changefreq => 'weekly', :priority => 0.5
  end
end
This also runs against a Rails 1.2.2 project. Only a few changes were needed:
- Provide a String#present? (which was easy, since I already had String#nonblank?).
- Cope with the change from app/controllers/application.rb to app/controllers/application_controller.rb by adding require 'app/controllers/application' to lib/sitemap_generator/helper.rb.
mattmueller, 1.9 million urls
It took about 2 hours to generate on a very powerful production server without nicing it. If you decide to nice it (we tried a niceness of 15), expect that sort of load to take more than 8 hours.
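For reference, niceness can be set from the shell (for example by prefixing the command with `nice -n 15`) or from inside the Ruby process. The sketch below shows the second approach on a Unix-like system. The method name is hypothetical, and it is not necessarily how this site applied it.

```ruby
# Raise the niceness of the current process (lower its scheduling priority)
# before heavy work. It never lowers niceness, which usually needs privileges.
def renice_self(niceness)
  current = Process.getpriority(Process::PRIO_PROCESS, 0) # 0 = this process
  target = [current, niceness].max
  Process.setpriority(Process::PRIO_PROCESS, 0, target)
  target
end

renice_self(15)
# ... sitemap generation would run here at the lowered priority ...
```

The trade-off is the one described above: the job yields CPU time to everything else on the server, so it can take several times longer.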
openc, 104+million urls for OpenCorporates
Takes several days to generate. Runs weekly on a worker server (which also processes Resque jobs); the files are then SCP’d to a shared folder on the app server, which is symlinked from production.
Eric Hochberger, 300k+ urls for The Hollywood Gossip
Since my main sitemap takes too long for Google to process, I take advantage of sitemap_generator’s multiple config option. I generate smaller sitemaps for rapidly changing content such as news.
I use Heroku and S3 (via the Wave Adapter). Because Google’s Webmaster Tools requires submitted sitemaps to be on the same domain, I use 302 redirects to point to the sitemaps in the S3 bucket. Google now indexes them beautifully!
Alexandru-Emil Lupu, 1200k+ urls for Shop With Me
Resque task
Sitemap stats: 1,242,638 links / 25 sitemaps / 17m45s
businessprofiles, ~130M pages indexed for the corporate registration directory, Business Profiles
We store the sitemap files, which take around a week to generate, on S3 space and have Rails routes to appropriately direct requests to sitemap.xml on our primary app server. The gem allowed us to index the site much more efficiently and has resulted in improved indexation by Google of our many millions of pages.
A simple Sidekiq job runs daily and generates the sitemap of all “changes”.
Sitemap stats: 2,607,677 links / 56 sitemaps / 15m11s
Diego Mayer-Cantu, Inventively.com, 10M links
Like many others, we run sitemap generation as a worker job on a separate server. We currently generate over 10 million links, usually in under an hour.
Sitemap stats: 10,942,929 links / 219 sitemaps / 42m35s
Christoph Weil, pricendo.com, ~12M links
The sitemap runs as a weekly cron job. We currently generate sitemaps with ~12M product links, using find_each with a batch size of 25k and including all images of each product. Great gem – we highly recommend it!
Sitemap stats: 12,886,563 links / 573 sitemaps / 497m55s (incl. ~5M images)
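The original code for this entry is not shown. The sketch below illustrates the two ideas mentioned, batching and attaching images to each link, in plain Ruby so that it runs without Rails. The product hash layout (`:path`, `:image_urls`) and both method names are hypothetical. The `:images` option with `:loc` entries follows sitemap_generator's documented image support.

```ruby
# Build the arguments for one sitemap `add` call from a product record.
def sitemap_entry(product)
  [product[:path], { images: product[:image_urls].map { |url| { loc: url } } }]
end

# Walk records in fixed-size batches. With ActiveRecord, find_each does this
# with one query per batch; here an in-memory array stands in for the table.
def each_in_batches(records, batch_size: 25_000)
  records.each_slice(batch_size) do |batch|
    batch.each { |record| yield record }
  end
end

products = [
  { path: '/products/1', image_urls: ['https://www.example.com/img/1a.jpg'] },
  { path: '/products/2', image_urls: [] }
]

each_in_batches(products, batch_size: 1) do |product|
  path, options = sitemap_entry(product)
  # Inside SitemapGenerator::Sitemap.create this would be: add path, options
  puts "#{path} #{options[:images].length} image(s)"
end
```

In a real Rails app the outer loop would more likely be something like `Product.includes(:images).find_each(batch_size: 25_000)`, with the model and association names adjusted to the application.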
Jack Kinsella. I’ve been using this gem for perhaps seven years to power my law notes business, Oxbridge Notes.
class GenerateSitemapService
  # Without this, the `x_url` helpers are only available through the `linkset` instance
  include Rails.application.routes.url_helpers

  def initialize(linkset, default_store: 'gb')
    @linkset = linkset
    @store = default_store
  end

  def run
    add_gb_specific_pages
    add_australia_specific_pages # not shown
    ...
  end

  private

  # Reader for the linkset passed in from the `create` block
  attr_reader :linkset
  attr_accessor :store

  def add_gb_specific_pages
    self.store = 'gb'
    Product.active.in_store(store).find_each do |product|
      add product_path(product), lastmod: product.updated_at
    end
    ...
  end

  def add(url, options = {})
    defaults = {
      # This is set by default, but I have no idea how often my site changes,
      # so I'll use :lastmod instead.
      changefreq: nil,
      host: host_based_on_store
    }
    linkset.add url, defaults.merge(options)
  end

  def host_based_on_store
    HostDeterminer.for_store(store) # returns something like https://www.example.com
  end
end

SitemapGenerator::Sitemap.create do
  # The sitemap variable is explicitly made available by the library
  # maintainers within the `create` scope
  GenerateSitemapService.new(sitemap).run
end
Boris Tveritnev, ladendirekt.de. ~30M links, gtin-lookup.com. ~80M links.
We run sitemap generation for ladendirekt.de as four independent tasks, each of which takes ~1h. We upload the results to block storage (via an S3 client) and proxy requests with nginx.
Generating gtin-lookup.com takes a tad longer: ~5 hrs. The rest is the same: S3 to block storage + nginx to proxy requests.